Commit graph

29 commits

Author SHA1 Message Date
bors
de67d62c6b Auto merge of #27474 - bluss:twoway-reverse, r=brson
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-18 02:02:57 +00:00
Ulrik Sverdrup
01e8812461 StrSearcher: Additional comments and small code moves
Break out a separate static method to create the "byteset".
2015-08-16 22:37:18 +02:00
Alex Crichton
b7dcf272d9 core: Fill out issues for unstable features 2015-08-15 18:09:16 -07:00
Ulrik Sverdrup
2b82c072c7 StrSearcher: Improve inner loop in TwoWaySearcher::next, next_back
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.

If these two steps are combined by using the indexing of the last
needle byte's position as bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and elminating the panic case for the checked
indexing that was previously used.

Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.

```
before:

test bb_in_aa::twoway_find                  ... bench:       4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind                 ... bench:       3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find           ... bench:       7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind          ... bench:       6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find           ... bench:       3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind          ... bench:       3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find           ... bench:       3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind          ... bench:       3,616 ns/iter (+/- 34) = 705 MB/s

with this commit:

test bb_in_aa::twoway_find                  ... bench:       2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind                 ... bench:       2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find           ... bench:       4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind          ... bench:       4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find           ... bench:       2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind          ... bench:       2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find           ... bench:       3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind          ... bench:       3,019 ns/iter (+/- 50) = 844 MB/s
```

- `bb_in_aa`: fast skip due to byteset filter loop improves.
- 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
2015-08-07 13:41:17 +02:00
Alex Crichton
5cccf3cd25 syntax: Implement #![no_core]
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.

[rfc]: https://github.com/rust-lang/rfcs/pull/1184
2015-08-03 17:23:01 -07:00
Ulrik Sverdrup
7ebae85bb8 StrSearcher: Implement the full two way algorithm in reverse for rfind
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-02 20:09:35 +02:00
Alex Crichton
5af6cf9fa4 std: Remove the curious inner module
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.

The libcore inner module was also trimmed down a bit to the bare bones.
2015-07-29 14:18:24 -07:00
Ulrik Sverdrup
0da69969d1 core: Use memcmp in is_prefix_of / is_suffix_of
The basic str equality in core::str calls memcmp, re-use the same
function in StrSearcher's is_prefix_of, is_suffix_of.
2015-07-04 15:10:20 +02:00
Ulrik Sverdrup
274bb24efd StrSearcher: Explicitly separate the long and short cases
This is needed to not drop performance, after the trait-based changes.
Force separate versions of the next method to be generated for the short
and long period cases.
2015-06-24 16:22:09 +02:00
Ulrik Sverdrup
71006bd654 StrSearcher: Use trait to specialize two way algorithm by case
Use a trait to be able to implement both the fast search that skips to
each match, and the slower search that emits `Reject` intervals
regularly. The latter is important for uses of `next_reject`.
2015-06-21 19:58:56 +02:00
Ulrik Sverdrup
a6dd2031a3 StrSearcher: Specialize is_prefix_of/is_suffix_of for &str 2015-06-21 19:58:56 +02:00
Ulrik Sverdrup
b890b7bbc7 StrSearcher: Update substring search to use the Two Way algorithm
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Much thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
2015-06-21 19:58:50 +02:00
Alex Crichton
c14d86fd3f core: Split apart the global core feature
This commit shards the broad `core` feature of the libcore library into finer
grained features. This split groups together similar APIs and enables tracking
each API separately, giving a better sense of where each feature is within the
stabilization process.

A few minor APIs were deprecated along the way:

* Iterator::reverse_in_place
* marker::NoCopy
2015-06-17 09:06:59 -07:00
Tamir Duberstein
29ac04402d Positive case of len() -> is_empty()
`s/(?<!\{ self)(?<=\.)len\(\) == 0/is_empty()/g`
2015-04-14 20:26:03 -07:00
Andrew Paseltiner
6fa16d6a47 pluralize doc comment verbs and add missing periods 2015-04-13 13:57:51 -04:00
Marvin Löbel
fbba28e246 Added smoke tests for new methods.
Fixed bug in existing StrSearcher impl
2015-04-05 18:52:58 +02:00
Marvin Löbel
c04f22a667 Refactored core::str::pattern to become a user-facing module and hide away
CharEq.
2015-04-05 18:52:57 +02:00
Marvin Löbel
1b4cddcbfd Implemented remaining string pattern iterators.
- Added missing reverse versions of methods
- Added [r]matches()
- Generated the string pattern iterators with a macro
- Added where bounds to the methods returning reverse iterators
  for better error messages.
2015-04-05 18:52:57 +02:00
Alex Crichton
d4a2c94180 std: Clean out #[deprecated] APIs
This commit cleans out a large amount of deprecated APIs from the standard
library and some of the facade crates as well, updating all users in the
compiler and in tests as it goes along.
2015-03-31 15:49:57 -07:00
Niko Matsakis
76ead08108 Remove auto-deref'ing Pattern impl because it conflicts with other
possible blanket impls and also triggers internal overflow. Add some
special cases for common uses (&&str, &String) for now; bounds-targeting
deref coercions are probably the right longer term answer.
2015-03-23 18:05:20 -04:00
Joseph Crail
857035ade7 Fix spelling errors in comments.
I corrected misspelled comments in several crates.
2015-03-19 00:48:08 -04:00
Steve Klabnik
64ab111b53 Example -> Examples
This brings comments in line with https://github.com/rust-lang/rfcs/blob/master/text/0505-api-comment-conventions.md#using-markdown
2015-03-11 21:11:40 -04:00
Marvin Löbel
c8dd2d066d Addressed PR comments 2015-02-20 00:58:15 +01:00
Marvin Löbel
a641996796 Fix tidy and rebase fallout
Added a few bugfixes and additional testcases
2015-02-20 00:58:07 +01:00
Marvin Löbel
c1de0a0f9e Added a Pattern impl that delegates to the dereference of a type.
This allows to match with a `&String` or `&&str`, for example.
2015-02-20 00:58:06 +01:00
Marvin Löbel
f9ef8cd555 Refactored code into Searcher traits with naive implementations
Made the family of Split iterators use the Pattern API

Renamed the Matcher traits into Searcher
2015-02-20 00:57:38 +01:00
Marvin Löbel
13ea9062a9 Made match_indices use the generic pattern API 2015-02-20 00:32:59 +01:00
Marvin Löbel
bc09c1ddc5 Made str::MatchIndices a private implementantion detail 2015-02-20 00:32:59 +01:00
Marvin Löbel
54f0bead81 Added string pattern traits and basic implementantions 2015-02-20 00:32:59 +01:00