Commit graph

228 commits

Author SHA1 Message Date
Oliver Middleton
9673ace339 Fix a couple of issues in from_utf8 docs 2016-02-14 18:38:37 +00:00
Oliver Middleton
bc8495abd8 Fix doc error for Utf8Error 2016-02-14 17:52:05 +00:00
Carlos E. Garcia
02aa0aff2f Minor spelling fixes 2016-02-09 11:52:39 -05:00
Kamal Marhubi
129a6239d2 docs: Standardize on 'Errors' header in std docs 2016-02-01 21:41:29 -05:00
Ulrik Sverdrup
ba9a3bc453 core: Use raw pointers to avoid aliasing in str::split_at_mut
Introduce private function from_raw_parts_mut for str to factor out the logic.

We want to use raw pointers here instead of duplicating a &mut str, to
be on safer ground w.r.t rust aliasing rules.
2016-01-21 15:25:49 +01:00
Alex Crichton
9a4f43b9b6 std: Stabilize APIs for the 1.7 release
This commit stabilizes and deprecates the FCP (final comment period) APIs for
the upcoming 1.7 beta release. The specific APIs which changed were:

Stabilized

* `Path::strip_prefix` (renamed from `relative_from`)
* `path::StripPrefixError` (new error type returned from `strip_prefix`)
* `Ipv4Addr::is_loopback`
* `Ipv4Addr::is_private`
* `Ipv4Addr::is_link_local`
* `Ipv4Addr::is_multicast`
* `Ipv4Addr::is_broadcast`
* `Ipv4Addr::is_documentation`
* `Ipv6Addr::is_unspecified`
* `Ipv6Addr::is_loopback`
* `Ipv6Addr::is_unique_local`
* `Ipv6Addr::is_multicast`
* `Vec::as_slice`
* `Vec::as_mut_slice`
* `String::as_str`
* `String::as_mut_str`
* `<[T]>::clone_from_slice` - the `usize` return value is removed
* `<[T]>::sort_by_key`
* `i32::checked_rem` (and other signed types)
* `i32::checked_neg` (and other signed types)
* `i32::checked_shl` (and other signed types)
* `i32::checked_shr` (and other signed types)
* `i32::saturating_mul` (and other signed types)
* `i32::overflowing_add` (and other signed types)
* `i32::overflowing_sub` (and other signed types)
* `i32::overflowing_mul` (and other signed types)
* `i32::overflowing_div` (and other signed types)
* `i32::overflowing_rem` (and other signed types)
* `i32::overflowing_neg` (and other signed types)
* `i32::overflowing_shl` (and other signed types)
* `i32::overflowing_shr` (and other signed types)
* `u32::checked_rem` (and other unsigned types)
* `u32::checked_neg` (and other unsigned types)
* `u32::checked_shl` (and other unsigned types)
* `u32::saturating_mul` (and other unsigned types)
* `u32::overflowing_add` (and other unsigned types)
* `u32::overflowing_sub` (and other unsigned types)
* `u32::overflowing_mul` (and other unsigned types)
* `u32::overflowing_div` (and other unsigned types)
* `u32::overflowing_rem` (and other unsigned types)
* `u32::overflowing_neg` (and other unsigned types)
* `u32::overflowing_shl` (and other unsigned types)
* `u32::overflowing_shr` (and other unsigned types)
* `ffi::IntoStringError`
* `CString::into_string`
* `CString::into_bytes`
* `CString::into_bytes_with_nul`
* `From<CString> for Vec<u8>`
* `From<CString> for Vec<u8>`
* `IntoStringError::into_cstring`
* `IntoStringError::utf8_error`
* `Error for IntoStringError`

Deprecated

* `Path::relative_from` - renamed to `strip_prefix`
* `Path::prefix` - use `components().next()` instead
* `os::unix::fs` constants - moved to the `libc` crate
* `fmt::{radix, Radix, RadixFmt}` - not used enough to stabilize
* `IntoCow` - conflicts with `Into` and may come back later
* `i32::{BITS, BYTES}` (and other integers) - not pulling their weight
* `DebugTuple::formatter` - will be removed
* `sync::Semaphore` - not used enough and confused with system semaphores

Closes #23284
cc #27709 (still lots more methods though)
Closes #27712
Closes #27722
Closes #27728
Closes #27735
Closes #27729
Closes #27755
Closes #27782
Closes #27798
2016-01-16 11:03:10 -08:00
Ulrik Sverdrup
cadcd70775 UTF-8 validation: Add missing if conditional for short input
We need to guard that `len` is large enough for the fast skip loop.
2016-01-14 14:59:55 +01:00
Ulrik Sverdrup
11e3de39d9 Add fast path for ASCII in UTF-8 validation
This speeds up the ascii case (and long stretches of ascii in otherwise
mixed UTF-8 data) when checking UTF-8 validity.

Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14!
On xml and mostly english language input (en.wikipedia xml dump),
throughput increases by a factor 7.

On mostly non-ASCII input, performance increases slightly or is the
same.

The UTF-8 validation is rewritten to use indexed access; since all
access is preceded by a (mandatory for validation) length check, they
are statically elided by llvm and this formulation is in fact the best
for performance. A previous version had losses due to slice to iterator
conversions.

A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.

Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.

Old code is `regular`, this PR is called `fast`.

Datasets:

- `ascii` is just ascii (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia xml dump
- `enwik10` is 100MB of an en.wikipedia xml dump
- `jawik10` is 10MB of a ja.wikipedia xml dump

```
test from_utf8_ascii_fast        ... bench:         140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular     ... bench:       1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast          ... bench:      10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular       ... bench:      12,250 ns/iter (+/- 437) = 418 MB/s
test from_utf8_dewik10_fast      ... bench:   6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular   ... bench:  11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast       ... bench:  14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular    ... bench:  93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast      ... bench:  29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular   ... bench:  29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```

Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
2016-01-12 21:57:04 +01:00
Ori Avtalion
d01ec6ccca Add missing links to str docs 2015-12-11 22:52:28 +02:00
Alex Crichton
464cdff102 std: Stabilize APIs for the 1.6 release
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below

Stabilized APIs

* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
  marked as `#[stable]` and the methods appearing via traits for primitives like
  `char` and `str` are now also marked as stable. Note that the extension traits
  themeselves are marked as unstable as they're imported via the prelude. The
  `try!` macro was also moved from the standard library into libcore to have the
  same interface. Otherwise the functions all have copied stability from the
  standard library now.
* The `#![no_std]` attribute
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)

Deprecated APIs

* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`

New APIs (still unstable)

* `<[T]>::sort_by_key` (added to mirror `min_by_key`)

Closes #27585
Closes #27704
Closes #27707
Closes #27710
Closes #27711
Closes #27727
Closes #27740
Closes #27744
Closes #27799
Closes #27801
cc #27801 (doesn't close as `Chars` is still unstable)
Closes #28968
2015-12-05 15:09:44 -08:00
bors
040a77f772 Auto merge of #29952 - petrochenkov:depr, r=brson
Part of https://github.com/rust-lang/rust/issues/29935

The deprecation lint is still called "deprecated", so people can continue using `#[allow(deprecated)]` and similar things.
2015-11-23 20:08:49 +00:00
Manish Goregaokar
c0f9a39e5c Mark slice_error_fail as a cold path 2015-11-23 09:34:58 +05:30
Vadim Petrochenkov
a613059e3f Rename #[deprecated] to #[rustc_deprecated] 2015-11-20 16:11:20 +03:00
Devon Hollowood
da5dd298d4 Add information about str::parse() in FromStr docs 2015-11-20 00:41:10 -08:00
Vadim Petrochenkov
7e2ffc7090 Add missing annotations and some tests 2015-11-18 01:24:21 +03:00
Kevin Butler
82784cb89d libcore: deny warnings in doctests 2015-11-12 05:16:08 +00:00
Vadim Petrochenkov
2ef07f0519 Remove stability annotations from trait impl items
Remove `stable` stability annotations from inherent impls
2015-11-06 00:13:46 +03:00
bors
e02ada6d38 Auto merge of #29254 - alexcrichton:stabilize-1.5, r=brson
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:

Stabilized APIs:

* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
   is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
  but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`

Deprecated APIs

* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`

Closes #27706
Closes #27725
cc #27726 (align not stabilized yet)
Closes #27734
Closes #27737
Closes #27742
Closes #27743
Closes #27772
Closes #27774
Closes #27777
Closes #27781
cc #27788 (a few remaining methods though)
Closes #27790
Closes #27793
Closes #27796
Closes #27810
cc #28147 (not all parts stabilized)
2015-10-25 16:38:38 +00:00
Alex Crichton
ff49733274 std: Stabilize library APIs for 1.5
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:

Stabilized APIs:

* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
   is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
  but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`

Deprecated APIs

* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`

Closes #27706
Closes #27725
cc #27726 (align not stabilized yet)
Closes #27734
Closes #27737
Closes #27742
Closes #27743
Closes #27772
Closes #27774
Closes #27777
Closes #27781
cc #27788 (a few remaining methods though)
Closes #27790
Closes #27793
Closes #27796
Closes #27810
cc #28147 (not all parts stabilized)
2015-10-25 09:36:32 -07:00
Steve Klabnik
b17433292d Unsafety -> Safety in doc headings
Follow https://doc.rust-lang.org/book/documentation.html#special-sections
2015-10-23 11:42:14 -04:00
Andrew Paseltiner
1162b3752c Correct spelling in docs 2015-10-13 09:44:11 -04:00
Andrew Chin
dce58baff0 Trivial typo fix: from_utrf8 should be from_utf8 2015-10-10 23:33:43 -04:00
Cristi Cobzarenco
4b308b44e1 typos: fix a grabbag of typos all over the place 2015-10-08 19:49:31 +01:00
Steve Klabnik
4d73da92f0 Improve documentation for the from_utf8 family
Our docs were very basic for the various versions of from_utf8, so
this commit beefs them up.

It also improves docs for the &str variant's error, Utf8Error.
2015-10-02 19:42:25 -04:00
Steve Klabnik
80130005da Add some docs for FromString::from_str
@marchelzo pointed out on IRC that this doesn't have docs, so, let's
change that.
2015-09-30 17:42:41 -04:00
bors
54f7b1d455 Auto merge of #28632 - alexcrichton:update-match-indices, r=Kimundi
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).

cc #27743
2015-09-26 20:06:51 +00:00
Alex Crichton
d5f2d3b177 std: Update MatchIndices to return a subslice
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).

cc #27743
2015-09-25 09:29:23 -07:00
Simon Sapin
081278eb7a Utf8Error::valid_up_to: make documented semantics more precise/useful 2015-09-24 18:54:12 +02:00
bors
cedbd998a4 Auto merge of #28339 - alexcrichton:stabilize-1.4, r=aturon
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.

Stabilized APIs:

* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`

Deprecated APIs

* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`

Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
2015-09-13 19:45:15 +00:00
Alex Crichton
f0b1326dc7 std: Stabilize/deprecate features for 1.4
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.

Stabilized APIs:

* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`

Deprecated APIs

* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`

Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
2015-09-11 09:48:48 -07:00
Erick Tryzelaar
fbd91a732b Optimize string comparison by using memcmp
llvm seems to be having some trouble optimizing the iterator-based
string comparsion method into some equivalent to memcmp. This
explicitly calls out to the memcmp intrinisic in order to allow
llvm to generate better code. In some manual benchmarking, this
memcmp-based approach is 20 times faster than the iterator approach.
2015-09-10 16:46:59 -07:00
Andre Bogus
9cca96545f some more clippy-based improvements 2015-09-08 00:36:29 +02:00
bors
94ddfc7707 Auto merge of #28119 - nagisa:bytesderef, r=alexcrichton 2015-09-04 12:34:03 +00:00
Alex Crichton
48615a68fb std: Account for CRLF in {str, BufRead}::lines
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.

The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.

[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md

Closes #28032
2015-09-03 23:01:41 -07:00
Manish Goregaokar
a520568ae7 Elide lifetimes in libcore 2015-09-03 17:46:35 +05:30
Simonas Kazlauskas
25dce09a96 Change explicit BytesDeref impl into Cloned iterator 2015-08-31 14:48:28 +03:00
bors
b0f77ba26a Auto merge of #28101 - ijks:24214-str-bytes, r=alexcrichton
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-31 09:15:55 +00:00
Daan Rijks
dacf2725ec Add overrides to iterator methods for str::Bytes
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-30 17:32:50 +02:00
Simon Sapin
e33650c16f Add .as_str() to str::Chars and str::CharIndices. See #27775. 2015-08-28 22:44:17 +02:00
bors
de67d62c6b Auto merge of #27474 - bluss:twoway-reverse, r=brson
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-18 02:02:57 +00:00
Ulrik Sverdrup
01e8812461 StrSearcher: Additional comments and small code moves
Break out a separate static method to create the "byteset".
2015-08-16 22:37:18 +02:00
Alex Crichton
b7dcf272d9 core: Fill out issues for unstable features 2015-08-15 18:09:16 -07:00
Niko Matsakis
91b3e9cac0 Fallout in libs -- misc missing bounds uncovered by WF checks. 2015-08-12 17:58:56 -04:00
Tobias Bucher
5309fbb6c9 Make str::as_bytes_mut private 2015-08-09 22:05:22 +02:00
Tobias Bucher
22ec5f4af7 Replace many uses of mem::transmute with more specific functions
The replacements are functions that usually use a single `mem::transmute` in
their body and restrict input and output via more concrete types than `T` and
`U`. Worth noting are the `transmute` functions for slices and the `from_utf8*`
family for mutable slices. Additionally, `mem::transmute` was often used for
casting raw pointers, when you can already cast raw pointers just fine with
`as`.
2015-08-09 22:05:22 +02:00
Ulrik Sverdrup
2b82c072c7 StrSearcher: Improve inner loop in TwoWaySearcher::next, next_back
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.

If these two steps are combined by using the indexing of the last
needle byte's position as bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and elminating the panic case for the checked
indexing that was previously used.

Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.

```
before:

test bb_in_aa::twoway_find                  ... bench:       4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind                 ... bench:       3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find           ... bench:       7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind          ... bench:       6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find           ... bench:       3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind          ... bench:       3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find           ... bench:       3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind          ... bench:       3,616 ns/iter (+/- 34) = 705 MB/s

with this commit:

test bb_in_aa::twoway_find                  ... bench:       2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind                 ... bench:       2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find           ... bench:       4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind          ... bench:       4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find           ... bench:       2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind          ... bench:       2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find           ... bench:       3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind          ... bench:       3,019 ns/iter (+/- 50) = 844 MB/s
```

- `bb_in_aa`: fast skip due to byteset filter loop improves.
- 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
2015-08-07 13:41:17 +02:00
Alex Crichton
5cccf3cd25 syntax: Implement #![no_core]
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.

[rfc]: https://github.com/rust-lang/rfcs/pull/1184
2015-08-03 17:23:01 -07:00
Ulrik Sverdrup
7ebae85bb8 StrSearcher: Implement the full two way algorithm in reverse for rfind
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-02 20:09:35 +02:00
Alex Crichton
5af6cf9fa4 std: Remove the curious inner module
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.

The libcore inner module was also trimmed down a bit to the bare bones.
2015-07-29 14:18:24 -07:00
Björn Steinbrink
000e870554 Inline eq_slice_() into eq_slice()
eq_slice_() used to be a common implementation for two function that
both called it, but of those only eq_slice() is left, so we can as well
directly inline the code.
2015-07-21 18:30:18 +02:00