Make RangeInclusive just a two-field struct
Not being an enum improves ergonomics and consistency, especially since NonEmpty variant wasn't prevented from being empty. It can still be iterable without an extra "done" bit by making the range have !(start <= end), which is even possible without changing the Step trait.
Implements merged https://github.com/rust-lang/rfcs/pull/1980; tracking issue https://github.com/rust-lang/rust/issues/28237.
This is definitely a breaking change to anything consuming `RangeInclusive` directly (not as an Iterator) or constructing it without using the sugar. Is there some change that would make sense before this so compilation failures could be compatibly fixed ahead of time?
r? @aturon (as FCP proposer on the RFC)
Not being an enum improves ergonomics, especially since NonEmpty could be Empty. It can still be iterable without an extra "done" bit by making the range have !(start <= end), which is even possible without changing the Step trait.
Implements RFC 1980
Correct some stability versions
These were found by running tidy on stable versions of rust and finding
features stabilised with the wrong version numbers.
Added links to types in from_utf8 description
References #29375. Link to types mentioned in the documentation for `from_utf8` (`str`, `&[u8`], etc). Paragraphs were reformatted to keep any one line from being excessively long, but are otherwise unchanged.
Added links to from_utf8 methods in Utf8Error
Referencing #29375. Linked the `from_utf8` methods for both `String` and `str` in the description. Also linked the `u8` to its documentation
Rename TryFrom's associated type and implement str::parse using TryFrom.
Per discussion on the tracking issue, naming `TryFrom`'s associated type `Error` is generally more consistent with similar traits in the Rust ecosystem, and what people seem to assume it should be called. It also helps disambiguate from `Result::Err`, the most common "Err".
See https://github.com/rust-lang/rust/issues/33417#issuecomment-269108968.
`TryFrom<&str>` and `FromStr` are equivalent, so have the latter provide the former to ensure that. Using `TryFrom` in the implementation of `str::parse` means types that implement either trait can use it. When we're ready to stabilize `TryFrom`, we should update `FromStr` to
suggest implementing `TryFrom<&str>` instead for new code.
See https://github.com/rust-lang/rust/issues/33417#issuecomment-277175994
and https://github.com/rust-lang/rust/issues/33417#issuecomment-277253827.
Refs #33417.
Their relationship is:
* `resume_from = error_len.map(|l| l + valid_up_to)`
* error_len is always one of None, Some(1), Some(2), or Some(3).
When I started using resume_from I almost always ended up subtracting
valid_up_to to obtain error_len.
Therefore the latter is what should be provided in the first place.
UTF-8 validation: Compute block end upfront
Simplify the conditional used for ensuring that the whole word loop is
only used if there are at least two whole words left to read.
This makes the function slightly smaller and simpler, a 0-5% reduction
in runtime for various test cases.
Use more specific panic message for &str slicing errors
Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.
It reports the first error it finds in:
1. begin out of bounds
2. end out of bounds
3. begin <= end violated
3. begin not char boundary
5. end not char boundary.
Example:
&"abcαβγ"[..4]
thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'
Fixes#38052
Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.
Example:
&"abcαβγ"[..4]
thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside `α` (bytes 3..5) of `abcαβγ`'
Simplify the conditional used for ensuring that the whole word loop is
only used if there are at least two whole words left to read.
This makes the function slightly smaller and simpler, a 0-5% reduction
in runtime for various test cases.
Improve .chars().count()
Use a simpler loop to count the `char` of a string: count the
number of non-continuation bytes. Use `count += <conditional>` which the
compiler understands well and can apply loop optimizations to.
benchmark descriptions and results for two configurations:
- ascii: ascii text
- cy: cyrillic text
- jp: japanese text
- words ascii: counting each split_whitespace item from the ascii text
- words jp: counting each split_whitespace item from the jp text
```
x86-64 rustc -Copt-level=3
name orig_ ns/iter cmov_ ns/iter diff ns/iter diff %
count_ascii 1,453 (1755 MB/s) 1,398 (1824 MB/s) -55 -3.79%
count_cy 5,990 (856 MB/s) 2,545 (2016 MB/s) -3,445 -57.51%
count_jp 3,075 (1169 MB/s) 1,772 (2029 MB/s) -1,303 -42.37%
count_words_ascii 4,157 (521 MB/s) 1,797 (1205 MB/s) -2,360 -56.77%
count_words_jp 3,337 (1071 MB/s) 1,772 (2018 MB/s) -1,565 -46.90%
x86-64 rustc -Ctarget-feature=+avx -Copt-level=3
name orig_ ns/iter cmov_ ns/iter diff ns/iter diff %
count_ascii 1,444 (1766 MB/s) 763 (3343 MB/s) -681 -47.16%
count_cy 5,871 (874 MB/s) 1,527 (3360 MB/s) -4,344 -73.99%
count_jp 2,874 (1251 MB/s) 1,073 (3351 MB/s) -1,801 -62.67%
count_words_ascii 4,131 (524 MB/s) 1,871 (1157 MB/s) -2,260 -54.71%
count_words_jp 3,253 (1099 MB/s) 1,331 (2686 MB/s) -1,922 -59.08%
```
I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time),
but the code in this PR was always winning `count_words_ascii` in particular (counting
many small strings); this solution is an improvement without tradeoffs.
Use a simpler loop to count the `char` of a string: count the
number of non-continuation bytes. Use `count += <conditional>` which the
compiler understands well and can apply loop optimizations to.