StrSearcher: Implement the complete reverse case for the two way algorithm
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first; on a mismatch at character k of v, skip k forward.
Then match u; on a mismatch, skip period(x) forward.
To avoid O(mn) behavior after a mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
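For reference, period(x) as used above can be pinned down with a naive function. This is only an illustrative sketch: the helper name `period` is hypothetical, and the quadratic scan is not how the searcher computes it.

```rust
/// Smallest p >= 1 such that x[i] == x[i + p] for every valid i.
/// Returns 0 for an empty slice. Naive O(n^2) reference, for illustration only.
fn period(x: &[u8]) -> usize {
    (1..=x.len())
        .find(|&p| (0..x.len() - p).all(|i| x[i] == x[i + p]))
        .unwrap_or(0)
}
```

A literally periodic needle such as "abababab" has a period much smaller than its length, while a needle like "abc" has a period equal to its length.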
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The documentation of the maximal_suffix methods was updated, and the
algorithms no longer use !0 and wrapping add: variable left is now
1 larger and offset is 1 smaller.
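A sketch of a maximal suffix computation in the style described, with `left` and `offset` starting at 0 so that no !0 sentinel or wrapping add is needed. The free-function form and the `order_greater` flag are assumptions for illustration; this is not claimed to be the exact code of the commit.

```rust
/// Returns (start, period) of the maximal suffix of `arr`.
/// With `order_greater == false` the usual byte order is used,
/// with `true` the reversed order.
fn maximal_suffix(arr: &[u8], order_greater: bool) -> (usize, usize) {
    let mut left = 0; // start of the current candidate suffix
    let mut right = 1; // start of the suffix being compared against it
    let mut offset = 0; // number of bytes already matched between the two
    let mut period = 1; // period of the candidate found so far

    while let Some(&a) = arr.get(right + offset) {
        // `left + offset` is in bounds whenever `right + offset` is.
        let b = arr[left + offset];
        if (a < b && !order_greater) || (a > b && order_greater) {
            // The compared suffix is smaller: the period is the whole span so far.
            right += offset + 1;
            offset = 0;
            period = right - left;
        } else if a == b {
            // Advance through a repetition of the current period.
            if offset + 1 == period {
                right += offset + 1;
                offset = 0;
            } else {
                offset += 1;
            }
        } else {
            // The compared suffix is larger: restart from it.
            left = right;
            right += 1;
            offset = 0;
            period = 1;
        }
    }
    (left, period)
}
```

For the needle "abababab" the maximal suffix under the usual order is "bababab", which starts at index 1 and has period 2.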
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
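A sketch of the byteset idea, assuming a 64-bit set indexed by the low 6 bits of each byte. The function names and the exact bit layout are assumptions for illustration. Since every byte of a periodic needle already occurs in its first period, scanning that prefix yields the same set.

```rust
/// Approximate set of the bytes in `bytes`: each byte sets the bit
/// selected by its low 6 bits.
fn byteset_create(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |set, &b| set | (1u64 << (b & 0x3f)))
}

/// May give false positives (bytes sharing their low 6 bits),
/// but never false negatives.
fn byteset_contains(set: u64, byte: u8) -> bool {
    (set >> (byte & 0x3f)) & 1 != 0
}

/// `period` is assumed to be a true period of `needle`,
/// with 1 <= period <= needle.len().
fn byteset_for_periodic(needle: &[u8], period: usize) -> u64 {
    byteset_create(&needle[..period])
}
```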
Example benchmark, before (rfind) and after (twoway_rfind), showing the
removal of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
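The benchmark input above is written in shorthand. In Rust it could be built roughly as follows; the helper name `periodic_inputs` is hypothetical and the benchmark harness itself is not shown.

```rust
/// Builds (needle, haystack) for the periodic benchmark:
/// needle = "ab" * 100, haystack = ("bb" + "ab" * 100) * 100.
fn periodic_inputs() -> (String, String) {
    let needle = "ab".repeat(100);
    let haystack = format!("bb{}", needle).repeat(100);
    (needle, haystack)
}
```

Timing `haystack.rfind(needle.as_str())` on these inputs exercises the reverse search with a periodic needle.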
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the last byte of the needle
against the byteset.
If these two steps are combined by using the indexing of the last
needle byte's position as bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and eliminating the panic case for the checked
indexing that was previously used.
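A sketch of the combined check, assuming a 64-bit byteset indexed by the low 6 bits of a byte and a non-empty needle. The function names are hypothetical. The single `get` on the position of the last needle byte serves both as the end-of-haystack test and as the byteset lookup, so there is no panicking index.

```rust
/// Assumed byteset membership test (low 6 bits select the bit).
fn byteset_contains(set: u64, byte: u8) -> bool {
    (set >> (byte & 0x3f)) & 1 != 0
}

/// Advances `pos` in steps of `needle_len` until the byte under the
/// needle's last position is in the byteset. Returns the candidate
/// position, or None once the needle no longer fits in the haystack.
/// Assumes `needle_len >= 1`.
fn skip_to_candidate(
    haystack: &[u8],
    needle_len: usize,
    byteset: u64,
    mut pos: usize,
) -> Option<usize> {
    loop {
        // Out of bounds here means the needle does not fit: stop.
        let tail = *haystack.get(pos + needle_len - 1)?;
        if byteset_contains(byteset, tail) {
            return Some(pos);
        }
        // The tail byte cannot be part of any match, so skip a whole needle.
        pos += needle_len;
    }
}
```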
Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.
```
before:
test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s
with this commit:
test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s
```
- `bb_in_aa`: the fast skip due to the byteset filter loop improves.
- `short_1let_long`, `short_2let_long`, `short_3let_long`: searches for 1, 2, or 3 ASCII bytes improve.
This commit is an implementation of [RFC 1184][rfc] which tweaks the behavior of
the `#![no_std]` attribute and adds a new `#![no_core]` attribute. The
`#![no_std]` attribute now injects `extern crate core` at the top of the crate
as well as the libcore prelude into all modules (in the same manner as the
standard library's prelude). The `#![no_core]` attribute disables both std and
core injection.
[rfc]: https://github.com/rust-lang/rfcs/pull/1184
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.
The libcore inner module was also trimmed down a bit to the bare bones.
This is needed to avoid a performance drop after the trait-based changes.
Force separate versions of the next method to be generated for the short
and long period cases.
Use a trait to be able to implement both the fast search that skips to
each match, and the slower search that emits `Reject` intervals
regularly. The latter is important for uses of `next_reject`.
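A sketch of how one generic search loop can serve both modes through a trait. All names here (`Strategy`, `MatchOnly`, `RejectAndMatch`, `Step`, `next_step`) are illustrative assumptions, and a naive comparison stands in for the two way loop. The needle is assumed non-empty, and end-of-haystack handling is simplified.

```rust
#[derive(Debug, PartialEq)]
enum Step {
    Match(usize, usize),
    Reject(usize, usize),
}

trait Strategy {
    type Output;
    /// Whether the loop should return after every rejected position.
    fn use_early_reject() -> bool;
    fn rejecting(a: usize, b: usize) -> Self::Output;
    fn matching(a: usize, b: usize) -> Self::Output;
}

/// Fast mode: only matches are reported, rejections collapse to None.
struct MatchOnly;
impl Strategy for MatchOnly {
    type Output = Option<(usize, usize)>;
    fn use_early_reject() -> bool { false }
    fn rejecting(_: usize, _: usize) -> Self::Output { None }
    fn matching(a: usize, b: usize) -> Self::Output { Some((a, b)) }
}

/// Slower mode: Reject intervals are emitted regularly.
struct RejectAndMatch;
impl Strategy for RejectAndMatch {
    type Output = Step;
    fn use_early_reject() -> bool { true }
    fn rejecting(a: usize, b: usize) -> Self::Output { Step::Reject(a, b) }
    fn matching(a: usize, b: usize) -> Self::Output { Step::Match(a, b) }
}

/// Naive stand-in for the search loop, generic over the strategy.
fn next_step<S: Strategy>(haystack: &[u8], needle: &[u8], pos: &mut usize) -> S::Output {
    let old_pos = *pos;
    loop {
        if *pos + needle.len() > haystack.len() {
            *pos = haystack.len();
            return S::rejecting(old_pos, *pos);
        }
        if &haystack[*pos..*pos + needle.len()] == needle {
            let start = *pos;
            *pos += needle.len();
            return S::matching(start, *pos);
        }
        *pos += 1;
        if S::use_early_reject() {
            return S::rejecting(old_pos, *pos);
        }
    }
}
```

Because `use_early_reject` is a constant per strategy, each instantiation of the loop can be optimized separately.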
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.
Fixes #25483, a performance bug: that particular case now completes faster
in optimized Rust than in Ruby (but they share the same order of magnitude).
Many thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.
I used quickcheck to fuzz test forward and reverse searching thoroughly.
The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.
The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.
A bad case for the two way algorithm is:
```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>();
```
For this particular case, two way is not much faster than the naive
implementation it replaces.
This commit shards the broad `core` feature of the libcore library into finer
grained features. This split groups together similar APIs and enables tracking
each API separately, giving a better sense of where each feature is within the
stabilization process.
A few minor APIs were deprecated along the way:
* Iterator::reverse_in_place
* marker::NoCopy
- Added missing reverse versions of methods
- Added [r]matches()
- Generated the string pattern iterators with a macro
- Added where bounds to the methods returning reverse iterators
for better error messages.
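A small sketch of how the forward and reverse iterators relate, using `match_indices` and `rmatch_indices` from the standard library. The helper name `forward_and_reverse` is hypothetical.

```rust
/// Collects the start offsets of all matches of `pat` in `haystack`,
/// once searching from the front and once from the back.
fn forward_and_reverse(haystack: &str, pat: &str) -> (Vec<usize>, Vec<usize>) {
    let fwd = haystack.match_indices(pat).map(|(i, _)| i).collect();
    let rev = haystack.rmatch_indices(pat).map(|(i, _)| i).collect();
    (fwd, rev)
}
```

For non-overlapping matches the reverse iterator yields the same offsets in the opposite order.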
This commit cleans out a large amount of deprecated APIs from the standard
library and some of the facade crates as well, updating all users in the
compiler and in tests as it goes along.
possible blanket impls and also triggers internal overflow. Add some
special cases for common uses (&&str, &String) for now; bounds-targeting
deref coercions are probably the right longer term answer.