This commit starts out by consolidating all `str` extension traits into one
`StrExt` trait to be included in the prelude. This means that
`UnicodeStrPrelude`, `StrPrelude`, and `StrAllocating` have all been merged into
one `StrExt` exported by the standard library. Some functionality is currently
duplicated with the `StrExt` present in libcore.
This commit also currently avoids any methods which require any form of pattern
to operate. These functions will be stabilized via a separate RFC.
Next, the stability of methods and structures is as follows:
Stable
* from_utf8_unchecked
* CowString - after moving to std::string
* StrExt::as_bytes
* StrExt::as_ptr
* StrExt::bytes/Bytes - also made a struct instead of a typedef
* StrExt::char_indices/CharIndices - CharIndices was renamed from CharOffsets
* StrExt::chars/Chars
* StrExt::is_empty
* StrExt::len
* StrExt::lines/Lines
* StrExt::lines_any/LinesAny
* StrExt::slice_unchecked
* StrExt::trim
* StrExt::trim_left
* StrExt::trim_right
* StrExt::words/Words - also made a struct instead of a typedef
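As a rough illustration of the stable surface listed above, the sketch below uses only `len`, `is_empty`, `chars`, `char_indices`, `lines`, `bytes`, `as_bytes` and `trim`. It assumes a current stable Rust toolchain, where these are inherent methods on `str` rather than methods of the `StrExt` trait from this commit. The helpers `summarize` and `char_offsets` are hypothetical names for illustration.

```rust
// Sketch against current stable Rust (assumption): the methods below are
// inherent on `str` today; in this commit they lived on `StrExt`.

/// Returns (byte length, char count, line count, is_empty).
/// `summarize` is a hypothetical helper for illustration only.
fn summarize(s: &str) -> (usize, usize, usize, bool) {
    let byte_len = s.len();             // length in bytes, not in chars
    let char_count = s.chars().count(); // number of Unicode scalar values
    let line_count = s.lines().count(); // lines split on "\n" or "\r\n"
    (byte_len, char_count, line_count, s.is_empty())
}

/// Pairs every char with its starting byte offset, via `char_indices`.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let text = "  h\u{e9}llo\nw\u{f6}rld  ";
    let trimmed = text.trim(); // strips leading and trailing whitespace
    println!("{:?}", summarize(trimmed));
    println!("{:?}", char_offsets("a\u{e9}!"));
    // `as_bytes` and `bytes` expose the same underlying bytes.
    println!("{}", trimmed.as_bytes().len() == trimmed.bytes().count());
}
```

Note that `len` counts bytes, so it differs from `chars().count()` whenever the string contains non-ASCII characters.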
Unstable
* from_utf8 - the return type was changed to a `Result`, but the error type has
yet to prove itself
* from_c_str - this function will be handled by the c_str RFC
* FromStr - this trait will have an associated error type eventually
* StrExt::escape_default - needs iterators at least, unsure if it should make
the cut
* StrExt::escape_unicode - needs iterators at least, unsure if it should make
the cut
* StrExt::slice_chars - this function has yet to prove itself
* StrExt::slice_shift_char - awaiting conventions about slicing and shifting
* StrExt::graphemes/Graphemes - this functionality may only be in libunicode
* StrExt::grapheme_indices/GraphemeIndices - this functionality may only be in
libunicode
* StrExt::width - this functionality may only be in libunicode
* StrExt::utf16_units - this functionality may only be in libunicode
* StrExt::nfd_chars - this functionality may only be in libunicode
* StrExt::nfkd_chars - this functionality may only be in libunicode
* StrExt::nfc_chars - this functionality may only be in libunicode
* StrExt::nfkc_chars - this functionality may only be in libunicode
* StrExt::is_char_boundary - naming is uncertain with container conventions
* StrExt::char_range_at - naming is uncertain with container conventions
* StrExt::char_range_at_reverse - naming is uncertain with container conventions
* StrExt::char_at - naming is uncertain with container conventions
* StrExt::char_at_reverse - naming is uncertain with container conventions
* StrVector::concat - this functionality may be replaced with iterators, but
it's not certain at this time
* StrVector::connect - as with concat, may be deprecated in favor of iterators
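To make a few of the unstable items above concrete, here is a minimal sketch of `from_utf8` returning a `Result`, of `is_char_boundary`, and of concatenation done with iterators instead of `StrVector::concat`. It assumes current stable Rust, where `std::str::from_utf8` returns `Result<&str, Utf8Error>`; the exact error type at the time of this commit may have differed. `decode` and `concat_with_iter` are hypothetical helper names.

```rust
use std::str;

/// Decodes `bytes` as UTF-8, turning the error into a readable message.
/// `decode` is a hypothetical helper, not part of this commit.
fn decode(bytes: &[u8]) -> Result<&str, String> {
    str::from_utf8(bytes)
        .map_err(|e| format!("invalid UTF-8 after {} valid bytes", e.valid_up_to()))
}

/// Concatenates string slices with an iterator rather than a dedicated trait.
fn concat_with_iter(parts: &[&str]) -> String {
    parts.iter().copied().collect()
}

fn main() {
    println!("{:?}", decode(b"hello"));
    println!("{:?}", decode(&[b'h', b'i', 0xff]));

    // U+00E9 occupies bytes 1..3, so offset 2 falls inside the character.
    let s = "a\u{e9}";
    println!("{} {}", s.is_char_boundary(1), s.is_char_boundary(2));

    println!("{}", concat_with_iter(&["str", "ing"]));
}
```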
Deprecated
* StrAllocating and UnicodeStrPrelude have been merged into StrExt
* eq_slice - compiler implementation detail
* from_str - use the inherent parse() method
* is_utf8 - call from_utf8 instead
* replace - call the method instead
* truncate_utf16_at_nul - this is an implementation detail of Windows and does
not need to be exposed
* utf8_char_width - moved to libunicode
* utf16_items - moved to libunicode
* is_utf16 - moved to libunicode
* Utf16Items - moved to libunicode
* Utf16Item - moved to libunicode
* Utf16Encoder - moved to libunicode
* AnyLines - renamed to LinesAny and made a struct
* SendStr - use CowString<'static> instead
* str::raw - all functionality is deprecated
* StrExt::into_string - call to_string() instead
* StrExt::repeat - use iterators instead
* StrExt::char_len - use .chars().count() instead
* StrExt::is_alphanumeric - use .chars().all(..)
* StrExt::is_whitespace - use .chars().all(..)
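The replacements suggested in the deprecation list can be sketched as follows. This assumes current stable Rust, so the spellings may differ slightly from what compiled at the time of this commit. Each function is a hypothetical stand-in named after the deprecated item it replaces.

```rust
/// Replacement for `char_len`: count Unicode scalar values.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// Replacement for `is_alphanumeric`: test every char.
/// Note that `all` returns true for an empty string.
fn is_alphanumeric(s: &str) -> bool {
    s.chars().all(|c| c.is_alphanumeric())
}

/// Replacement for `is_whitespace`.
fn is_whitespace(s: &str) -> bool {
    s.chars().all(|c| c.is_whitespace())
}

/// Replacement for `repeat`, built from iterators.
fn repeat_with_iter(s: &str, n: usize) -> String {
    std::iter::repeat(s).take(n).collect()
}

/// Replacement for `from_str`: the inherent `parse` method.
fn parse_int(s: &str) -> Option<i32> {
    s.parse().ok()
}

/// Replacement for `is_utf8`: call `from_utf8` and inspect the result.
fn is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    println!("{}", char_len("h\u{e9}"));
    println!("{} {}", is_alphanumeric("abc123"), is_whitespace(" \t\n"));
    println!("{}", repeat_with_iter("ab", 3));
    println!("{:?} {:?}", parse_int("42"), parse_int("x"));
    println!("{}", is_utf8(&[0xff]));
    // `into_string` becomes `to_string`.
    let owned: String = "text".to_string();
    println!("{}", owned);
}
```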
Pending deprecation -- while slicing syntax is being worked out, these methods
are all #[unstable]
* Str - while currently used for generic programming, this trait will be
replaced with one of [], deref coercions, or a generic conversion trait.
* StrExt::slice - use slicing syntax instead
* StrExt::slice_to - use slicing syntax instead
* StrExt::slice_from - use slicing syntax instead
* StrExt::lev_distance - deprecated with no replacement
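For the slicing methods above, the sketch below shows the range-based slicing syntax as it exists in current stable Rust. This is an assumption about where the syntax ended up; it was still being worked out when this commit landed. The wrapper functions are hypothetical and only mirror the names of the methods they replace.

```rust
/// `s.slice(begin, end)` written with slicing syntax.
fn slice(s: &str, begin: usize, end: usize) -> &str {
    &s[begin..end]
}

/// `s.slice_to(end)` written with slicing syntax.
fn slice_to(s: &str, end: usize) -> &str {
    &s[..end]
}

/// `s.slice_from(begin)` written with slicing syntax.
fn slice_from(s: &str, begin: usize) -> &str {
    &s[begin..]
}

fn main() {
    let s = "hello world";
    println!("{}", slice(s, 0, 5));
    println!("{}", slice_to(s, 5));
    println!("{}", slice_from(s, 6));

    // Indexing panics when an offset is not on a char boundary;
    // `get` is the non-panicking variant and returns `None` instead.
    println!("{:?}", "h\u{e9}".get(0..2));
}
```

Offsets are byte offsets in all three forms, so they must fall on char boundaries.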
Awaiting stabilization due to patterns and/or matching
* StrExt::contains
* StrExt::contains_char
* StrExt::split
* StrExt::splitn
* StrExt::split_terminator
* StrExt::rsplitn
* StrExt::match_indices
* StrExt::split_str
* StrExt::starts_with
* StrExt::ends_with
* StrExt::trim_chars
* StrExt::trim_left_chars
* StrExt::trim_right_chars
* StrExt::find
* StrExt::rfind
* StrExt::find_str
* StrExt::subslice_offset
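The methods in this last group all take some form of pattern. As a hedged illustration of what that means, the sketch below uses the pattern-accepting methods as they exist in current stable Rust, where a `char`, a `&str` or a closure can serve as the pattern. The design was not settled at the time of this commit, so this is not what the pending RFC specified. `fields` and `key_value` are hypothetical helpers.

```rust
/// Splits a comma-separated line and trims each field.
fn fields(line: &str) -> Vec<&str> {
    line.split(',').map(|f| f.trim()).collect()
}

/// Splits `key=value` at the first `=` only, using `splitn`.
fn key_value(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.splitn(2, '=');
    let key = parts.next()?;
    let value = parts.next()?;
    Some((key, value))
}

fn main() {
    println!("{:?}", fields("a, b ,c"));
    println!("{:?}", key_value("k=v=w"));

    let s = "hello";
    // The same methods accept chars, string slices and closures.
    println!("{}", s.contains("ell"));
    println!("{}", s.starts_with('h') && s.ends_with("lo"));
    println!("{:?} {:?}", s.find('l'), s.rfind('l'));
    println!("{:?}", s.find(|c: char| c == 'o'));
    println!("{:?}", "abcabc".match_indices("bc").collect::<Vec<_>>());
}
```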
86 lines
3.2 KiB
Rust
// Copyright 2012-2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! # The Unicode Library
//!
//! Unicode-intensive functions for `char` and `str` types.
//!
//! This crate provides a collection of Unicode-related functionality,
//! including decompositions, conversions, etc., and provides traits
//! implementing these functions for the `char` and `str` types.
//!
//! The functionality included here is only that which is necessary to
//! provide for basic string-related manipulations. This crate does not
//! (yet) aim to provide a full set of Unicode tables.

#![crate_name = "unicode"]
#![experimental]
#![crate_type = "rlib"]
#![doc(html_logo_url = "http://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png",
       html_favicon_url = "http://www.rust-lang.org/favicon.ico",
       html_root_url = "http://doc.rust-lang.org/nightly/",
       html_playground_url = "http://play.rust-lang.org/")]
#![no_std]
#![feature(globs, macro_rules, slicing_syntax, unboxed_closures)]

extern crate core;

// regex module
pub use tables::regex;

mod normalize;
mod tables;
mod u_char;
mod u_str;

// re-export char so that std et al see it correctly
/// Character manipulation (`char` type, Unicode Scalar Value)
///
/// This module provides the `Char` and `UnicodeChar` traits, as well as their
/// implementation for the primitive `char` type, in order to allow basic character
/// manipulation.
///
/// A `char` actually represents a
/// *[Unicode Scalar Value](http://www.unicode.org/glossary/#unicode_scalar_value)*,
/// as it can contain any Unicode code point except high-surrogate and
/// low-surrogate code points.
///
/// As such, only values in the ranges \[0x0,0xD7FF\] and \[0xE000,0x10FFFF\]
/// (inclusive) are allowed. A `char` can always be safely cast to a `u32`;
/// however the converse is not always true due to the above range limits
/// and, as such, should be performed via the `from_u32` function..
pub mod char {
    pub use core::char::{MAX, from_u32, is_digit_radix, to_digit};
    pub use core::char::{from_digit, escape_unicode, escape_default};
    pub use core::char::{len_utf8_bytes, Char};

    pub use normalize::{decompose_canonical, decompose_compatible, compose};

    pub use tables::normalization::canonical_combining_class;
    pub use tables::UNICODE_VERSION;

    pub use u_char::{is_alphabetic, is_XID_start, is_XID_continue};
    pub use u_char::{is_lowercase, is_uppercase, is_whitespace};
    pub use u_char::{is_alphanumeric, is_control, is_digit};
    pub use u_char::{to_uppercase, to_lowercase, width, UnicodeChar};
}

pub mod str {
    pub use u_str::{UnicodeStr, Words, Graphemes, GraphemeIndices};
    pub use u_str::{utf8_char_width, is_utf16, Utf16Items, Utf16Item};
    pub use u_str::{utf16_items, Utf16Encoder};
}

// this lets us use #[deriving(..)]
mod std {
    pub use core::clone;
    pub use core::cmp;
    pub use core::fmt;
}
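
The `char` module documentation in this file states that a `char` always casts safely to `u32`, while the reverse must go through `from_u32` because of the surrogate gap. A minimal sketch of that rule, assuming current stable Rust and its `std::char::from_u32` (the file here re-exports the function from `core::char`); `to_char` is a hypothetical wrapper:

```rust
/// Converts a `u32` to a `char` if it is a Unicode scalar value,
/// i.e. in 0x0..=0xD7FF or 0xE000..=0x10FFFF.
/// `to_char` is a hypothetical wrapper around `std::char::from_u32`.
fn to_char(value: u32) -> Option<char> {
    std::char::from_u32(value)
}

fn main() {
    // char -> u32 is always a plain cast.
    println!("{:#x}", 'a' as u32);

    // u32 -> char is checked: surrogates and out-of-range values yield None.
    println!("{:?}", to_char(0x61));
    println!("{:?}", to_char(0xD800));
    println!("{:?}", to_char(0x110000));
}
```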