Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.
To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100
```
test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s
```
part of #24407
I'm not sure whether I should be trying to explain the general rule in the E0210 explanation or just point people to the RFC. However, if we go with the latter option I think that the RFC will need to be revised slightly, since it is not quite as gentle as I would like.
Also, the link to RFC 1023 is not the correct one (it should be https://github.com/rust-lang/rfcs/blob/master/text/1023-rebalancing-coherence.md), but the correct one is too long. I'm aware of @michaelsproul's PR https://github.com/rust-lang/rust/pull/26290 from awhile back, but it doesn't seem to be working. Has there not been a new snapshot yet?
#27360 removed a padding field full of uint8_t's, but didn't remove
the use. This didn't get picked up presumably because (a) bors
doesn't have any BSD builders, and/or (b) #[cfg]'d out blocks don't
get linted.
```
rustc: x86_64-unknown-freebsd/stage1/lib/rustlib/x86_64-unknown-freebsd/lib/liblibc
src/liblibc/lib.rs:1099:42: 1099:49 error: unused import, #[deny(unused_imports)] on by default
src/liblibc/lib.rs:1099 use types::common::c99::{uint8_t, uint32_t, int32_t};
^~~~~~~
error: aborting due to previous error
fatal runtime error: Could not unwind stack, error = 159555904
```
this makes the second code block consistent with the first code block -- other than being in reversed order, the first code block claims b is u16 and c is u32, whereas the second code block claims the opposite. seems to be an obvious typo.
This is *mostly* reducing *my* use of *italics* but there's some other misc changes interspersed as I went along.
This updates the italicizing alphabetically from `a` to `ra`.
r? @steveklabnik
#27360 removed a padding field full of uint8_t's, but didn't remove
the use. This didn't get picked up presumably because (a) bors
doesn't have any BSD builders, and/or (b) #[cfg]'d out blocks don't
get linted.
```
rustc: x86_64-unknown-freebsd/stage1/lib/rustlib/x86_64-unknown-freebsd/lib/liblibc
src/liblibc/lib.rs:1099:42: 1099:49 error: unused import, #[deny(unused_imports)] on by default
src/liblibc/lib.rs:1099 use types::common::c99::{uint8_t, uint32_t, int32_t};
^~~~~~~
error: aborting due to previous error
fatal runtime error: Could not unwind stack, error = 159555904
```
There are still problems in both the design and implementation of this, so we don't want it landing in 1.2.
cc @arielb1 @nikomatsakis
cc #27364
r? @alexcrichton
The following APIs were all marked with a `#[stable]` tag:
* process::Child::id
* error::Error::is
* error::Error::downcast
* error::Error::downcast_ref
* error::Error::downcast_mut
* io::Error::get_ref
* io::Error::get_mut
* io::Error::into_inner
* hash::Hash::hash_slice
* hash::Hasher::write_{i,u}{8,16,32,64,size}
This isn't actually necessary any more with the advent of `$crate` and changes
in the compiler to expand macros to `::core::$foo` in the context of a
`#![no_std]` crate.
The libcore inner module was also trimmed down a bit to the bare bones.
As there’s no C++ runtime any more there’s really no point in having anything but Rust tags being made.
I’ve also taken the liberty of excluding the compiler parts of this in the `librust%,,` pattern substitution. Whether or not this is “correct” will depend on whether you want tags for the compiler or for general use. For myself, I want it for general use.
I’m not sure how much people use the tags files anyway. I definitely do, but with Racer existing the tags files aren’t quite so necessary.
This is a minor [breaking-change], as it changes what
`boxed_str.to_owned()` does (previously it would deref to `&str` and
call `to_owned` on that to get a `String`). However `Box<str>` is such an
exceptionally rare type that this is not expected to be a serious
concern. Also a `Box<str>` can be freely converted to a `String` to
obtain the previous result anyway.
I think this was just missed when `Send` and `Sync` were redone, since it seems odd to not be able to use things like `Arc<AtomicPtr>`. If it was intentional feel free to just close this.
I used another test as a template for writing mine, so I hope I got all the headers and stuff right.