Commit graph

201 commits

Author SHA1 Message Date
Alex Crichton
d5f2d3b177 std: Update MatchIndices to return a subslice
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).

cc #27743
2015-09-25 09:29:23 -07:00
bors
cedbd998a4 Auto merge of #28339 - alexcrichton:stabilize-1.4, r=aturon
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.

Stabilized APIs:

* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`

Deprecated APIs

* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`

Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
2015-09-13 19:45:15 +00:00
Alex Crichton
f0b1326dc7 std: Stabilize/deprecate features for 1.4
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.

Stabilized APIs:

* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`

Deprecated APIs

* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`

Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
2015-09-11 09:48:48 -07:00
Erick Tryzelaar
fbd91a732b Optimize string comparison by using memcmp
llvm seems to be having some trouble optimizing the iterator-based
string comparsion method into some equivalent to memcmp. This
explicitly calls out to the memcmp intrinisic in order to allow
llvm to generate better code. In some manual benchmarking, this
memcmp-based approach is 20 times faster than the iterator approach.
2015-09-10 16:46:59 -07:00
Andre Bogus
9cca96545f some more clippy-based improvements 2015-09-08 00:36:29 +02:00
bors
94ddfc7707 Auto merge of #28119 - nagisa:bytesderef, r=alexcrichton 2015-09-04 12:34:03 +00:00
Alex Crichton
48615a68fb std: Account for CRLF in {str, BufRead}::lines
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.

The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.

[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md

Closes #28032
2015-09-03 23:01:41 -07:00
Manish Goregaokar
a520568ae7 Elide lifetimes in libcore 2015-09-03 17:46:35 +05:30
Simonas Kazlauskas
25dce09a96 Change explicit BytesDeref impl into Cloned iterator 2015-08-31 14:48:28 +03:00
bors
b0f77ba26a Auto merge of #28101 - ijks:24214-str-bytes, r=alexcrichton
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-31 09:15:55 +00:00
Daan Rijks
dacf2725ec Add overrides to iterator methods for str::Bytes
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-30 17:32:50 +02:00
Simon Sapin
e33650c16f Add .as_str() to str::Chars and str::CharIndices. See #27775. 2015-08-28 22:44:17 +02:00
bors
de67d62c6b Auto merge of #27474 - bluss:twoway-reverse, r=brson
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-18 02:02:57 +00:00
Ulrik Sverdrup
01e8812461 StrSearcher: Additional comments and small code moves
Break out a separate static method to create the "byteset".
2015-08-16 22:37:18 +02:00
Alex Crichton
b7dcf272d9 core: Fill out issues for unstable features 2015-08-15 18:09:16 -07:00
Niko Matsakis
91b3e9cac0 Fallout in libs -- misc missing bounds uncovered by WF checks. 2015-08-12 17:58:56 -04:00
Tobias Bucher
5309fbb6c9 Make str::as_bytes_mut private 2015-08-09 22:05:22 +02:00
Tobias Bucher
22ec5f4af7 Replace many uses of mem::transmute with more specific functions
The replacements are functions that usually use a single `mem::transmute` in
their body and restrict input and output via more concrete types than `T` and
`U`. Worth noting are the `transmute` functions for slices and the `from_utf8*`
family for mutable slices. Additionally, `mem::transmute` was often used for
casting raw pointers, when you can already cast raw pointers just fine with
`as`.
2015-08-09 22:05:22 +02:00
Ulrik Sverdrup
2b82c072c7 StrSearcher: Improve inner loop in TwoWaySearcher::next, next_back
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.

If these two steps are combined by using the indexing of the last
needle byte's position as bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and elminating the panic case for the checked
indexing that was previously used.

Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.

```
before:

test bb_in_aa::twoway_find                  ... bench:       4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind                 ... bench:       3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find           ... bench:       7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind          ... bench:       6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find           ... bench:       3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind          ... bench:       3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find           ... bench:       3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind          ... bench:       3,616 ns/iter (+/- 34) = 705 MB/s

with this commit:

test bb_in_aa::twoway_find                  ... bench:       2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind                 ... bench:       2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find           ... bench:       4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind          ... bench:       4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find           ... bench:       2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind          ... bench:       2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find           ... bench:       3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind          ... bench:       3,019 ns/iter (+/- 50) = 844 MB/s
```

- `bb_in_aa`: fast skip due to byteset filter loop improves.
- 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
2015-08-07 13:41:17 +02:00
Alex Crichton
5cccf3cd25 syntax: Implement #![no_core]
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.

[rfc]: https://github.com/rust-lang/rfcs/pull/1184
2015-08-03 17:23:01 -07:00
Ulrik Sverdrup
7ebae85bb8 StrSearcher: Implement the full two way algorithm in reverse for rfind
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-02 20:09:35 +02:00
Alex Crichton
5af6cf9fa4 std: Remove the curious inner module
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.

The libcore inner module was also trimmed down a bit to the bare bones.
2015-07-29 14:18:24 -07:00
Björn Steinbrink
000e870554 Inline eq_slice_() into eq_slice()
eq_slice_() used to be a common implementation for two function that
both called it, but of those only eq_slice() is left, so we can as well
directly inline the code.
2015-07-21 18:30:18 +02:00
bors
fec23d9d59 Auto merge of #27168 - brson:stdprim, r=steveklabnik
This makes the primitive descriptions on the front page read properly
as descriptions of types and not of the associated modules.

Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
    
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
    
This should help clear up confusion when readers find themselves on
the wrong page.

This also removes all the duplicate `#[doc(primitive)]` tags in various places (especially core), so the core docs will no longer attempt to document the primitives for now. Seems like an acceptable tradeoff to get some cleanup for std.
2015-07-21 13:06:45 +00:00
Brian Anderson
8497c428e5 std: Create separate docs for the primitives
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.

This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.

This should help clear up confusion when readers find themselves on
the wrong page.
2015-07-20 13:18:06 -07:00
Lee Jeffery
a219917e3f Fix doc comment parsing in macros. 2015-07-18 11:34:59 +01:00
Simon Sapin
7469914e96 Add str::split_at_mut 2015-07-13 16:21:43 +02:00
Simon Sapin
f9005512a9 Implement IndexMut for String and str.
... matching the existing Index impls.
There is no reason not to if String implement DerefMut.

The code removed in `src/librustc/middle/effect.rs` was added in #9750
to prevent things like `s[0] = 0x80` where `s: String`,
but I belive became unnecessary when the Index(Mut) traits were introduced.
2015-07-13 16:21:43 +02:00
Ulrik Sverdrup
0da69969d1 core: Use memcmp in is_prefix_of / is_suffix_of
The basic str equality in core::str calls memcmp, re-use the same
function in StrSearcher's is_prefix_of, is_suffix_of.
2015-07-04 15:10:20 +02:00
Ulrik Sverdrup
274bb24efd StrSearcher: Explicitly separate the long and short cases
This is needed to not drop performance, after the trait-based changes.
Force separate versions of the next method to be generated for the short
and long period cases.
2015-06-24 16:22:09 +02:00
Ulrik Sverdrup
71006bd654 StrSearcher: Use trait to specialize two way algorithm by case
Use a trait to be able to implement both the fast search that skips to
each match, and the slower search that emits `Reject` intervals
regularly. The latter is important for uses of `next_reject`.
2015-06-21 19:58:56 +02:00
Ulrik Sverdrup
a6dd2031a3 StrSearcher: Specialize is_prefix_of/is_suffix_of for &str 2015-06-21 19:58:56 +02:00
Ulrik Sverdrup
b890b7bbc7 StrSearcher: Update substring search to use the Two Way algorithm
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Much thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
2015-06-21 19:58:50 +02:00
Alex Crichton
2d389c125e std: Stabilize the str_matches feature
This commit stabilizes the `str::{matches, rmatches}` functions and iterators,
but renames the unstable feature for the `str::{matches,rmatches}_indices`
function to `str_match_indices` due to the comment present on the functions
about the iterator's return value.
2015-06-17 09:07:16 -07:00
Alex Crichton
a05ed9936d std: Remove two internal str_internals functions
These were just exposed to be used elsewhere at some point, but neither is
currently being used so just make them private again.
2015-06-17 09:07:16 -07:00
Alex Crichton
c14d86fd3f core: Split apart the global core feature
This commit shards the broad `core` feature of the libcore library into finer
grained features. This split groups together similar APIs and enables tracking
each API separately, giving a better sense of where each feature is within the
stabilization process.

A few minor APIs were deprecated along the way:

* Iterator::reverse_in_place
* marker::NoCopy
2015-06-17 09:06:59 -07:00
bors
af8a4a0805 Auto merge of #26313 - steveklabnik:fix_str_docs, r=alexcrichton
Because these structures are created by a macro, the doc comments
don't quite work: the leading /// isn't stripped. Instead, just
use #[doc] so that they render correctly.
2015-06-16 01:57:34 +00:00
Steve Klabnik
759a5d1022 Fix up Split docs
Because these structures are created by a macro, the doc comments
don't quite work: the leading /// isn't stripped. Instead, just
use #[doc] so that they render correctly.
2015-06-15 12:25:10 -04:00
Ulrik Sverdrup
d43bf53948 Add str::split_at
Implement RFC rust-lang/rfcs#1123

Add str method str::split_at(mid: usize) -> (&str, &str).
2015-06-10 09:15:07 +02:00
Ulrik Sverdrup
4e8afd65f9 docs: Update FromStr documentation
Fixes #25250
2015-05-11 03:41:53 +02:00
Nick Hamann
a1898f890d Convert #[lang="..."] to #[lang = "..."]
In my opinion this looks nicer, but also it matches the whitespace generally
used for stability markers more closely.
2015-05-09 14:50:28 -05:00
Sean McArthur
aaa3641754 core: impl AsRef<[u8]> for str 2015-05-08 17:13:54 -07:00
Joseph Crail
464069a4bf Fix spelling errors in documentation. 2015-05-04 13:21:27 -04:00
Jan Bujak
91ea0c4f12 Add #[inline(always)] to str::from_utf8_unchecked
Without the inline annotation this:
    str::from_utf8_unchecked( slice::from_raw_parts( ptr, len ) )
doesn't get inlined which can be pretty brutal performance-wise
when used in an inner loop of a low level string manipulation method.
2015-05-03 12:09:40 +02:00
Alex Crichton
eeb94886ad std: Remove deprecated/unstable num functionality
This commit removes all the old casting/generic traits from `std::num` that are
no longer in use by the standard library. This additionally removes the old
`strconv` module which has not seen much use in quite a long time. All generic
functionality has been supplanted with traits in the `num` crate and the
`strconv` module is supplanted with the [rust-strconv crate][rust-strconv].

[rust-strconv]: https://github.com/lifthrasiir/rust-strconv

This is a breaking change due to the removal of these deprecated crates, and the
alternative crates are listed above.

[breaking-change]
2015-04-21 11:37:43 -07:00
Alex Crichton
69ded69d63 std: Remove deprecated AsOsStr/Str/AsSlice traits
Cleaning out more deprecated items
2015-04-21 11:37:34 -07:00
bors
7397bdc9c5 Auto merge of #24620 - pczarn:model-lexer-issues, r=cmr
Fixes #15679
Fixes #15878
Fixes #15882
Closes #15883
2015-04-21 14:37:53 +00:00
Piotr Czarnecki
13bc8afa4b Model lexer: Fix remaining issues 2015-04-21 12:02:12 +02:00
Steve Klabnik
1dd8a651ba Make iterator struct docs more consistent.
Fixes #24008.
2015-04-20 09:55:40 -04:00
Tamir Duberstein
29ac04402d Positive case of len() -> is_empty()
`s/(?<!\{ self)(?<=\.)len\(\) == 0/is_empty()/g`
2015-04-14 20:26:03 -07:00