rust/library/core/src
bors f624427f87 Auto merge of #90414 - thomcc:count-chars-faster, r=nagisa
Optimize `core::str::Chars::count`

I wrote this a while ago after seeing this function as a bottleneck in a profile, but never got around to contributing it. I saw it again, and so here it is. The implementation is fairly complex, but I tried to explain what's happening at both a high level (in the header comment for the file), and in line comments in the impl. Hopefully it's clear enough.

This implementation (`case00_cur_libcore` in the benchmarks below) is somewhat consistently around 4x to 5x faster than the old implementation (`case01_old_libcore` in the benchmarks below), for a wide variety of workloads, without regressing performance on any of the workload sizes I've tried.

I also improved the benchmarks for this code, so that they explicitly check text in different languages and of different sizes (err, the cross product of language x size). The results of the benchmarks are here:

<details>
<summary>Benchmark results</summary>
<pre>
test str::char_count::emoji_huge::case00_cur_libcore       ... bench:      20,216 ns/iter (+/- 3,673) = 17931 MB/s
test str::char_count::emoji_huge::case01_old_libcore       ... bench:     108,851 ns/iter (+/- 12,777) = 3330 MB/s
test str::char_count::emoji_huge::case02_iter_increment    ... bench:     329,502 ns/iter (+/- 4,163) = 1100 MB/s
test str::char_count::emoji_huge::case03_manual_char_len   ... bench:     223,333 ns/iter (+/- 14,167) = 1623 MB/s
test str::char_count::emoji_large::case00_cur_libcore      ... bench:         293 ns/iter (+/- 6) = 19331 MB/s
test str::char_count::emoji_large::case01_old_libcore      ... bench:       1,681 ns/iter (+/- 28) = 3369 MB/s
test str::char_count::emoji_large::case02_iter_increment   ... bench:       5,166 ns/iter (+/- 85) = 1096 MB/s
test str::char_count::emoji_large::case03_manual_char_len  ... bench:       3,476 ns/iter (+/- 62) = 1629 MB/s
test str::char_count::emoji_medium::case00_cur_libcore     ... bench:          48 ns/iter (+/- 0) = 14750 MB/s
test str::char_count::emoji_medium::case01_old_libcore     ... bench:         217 ns/iter (+/- 4) = 3262 MB/s
test str::char_count::emoji_medium::case02_iter_increment  ... bench:         642 ns/iter (+/- 7) = 1102 MB/s
test str::char_count::emoji_medium::case03_manual_char_len ... bench:         445 ns/iter (+/- 3) = 1591 MB/s
test str::char_count::emoji_small::case00_cur_libcore      ... bench:          18 ns/iter (+/- 0) = 3777 MB/s
test str::char_count::emoji_small::case01_old_libcore      ... bench:          23 ns/iter (+/- 0) = 2956 MB/s
test str::char_count::emoji_small::case02_iter_increment   ... bench:          66 ns/iter (+/- 2) = 1030 MB/s
test str::char_count::emoji_small::case03_manual_char_len  ... bench:          29 ns/iter (+/- 1) = 2344 MB/s
test str::char_count::en_huge::case00_cur_libcore          ... bench:      25,909 ns/iter (+/- 39,260) = 13299 MB/s
test str::char_count::en_huge::case01_old_libcore          ... bench:     102,887 ns/iter (+/- 3,257) = 3349 MB/s
test str::char_count::en_huge::case02_iter_increment       ... bench:     166,370 ns/iter (+/- 12,439) = 2071 MB/s
test str::char_count::en_huge::case03_manual_char_len      ... bench:     166,332 ns/iter (+/- 4,262) = 2071 MB/s
test str::char_count::en_large::case00_cur_libcore         ... bench:         281 ns/iter (+/- 6) = 19160 MB/s
test str::char_count::en_large::case01_old_libcore         ... bench:       1,598 ns/iter (+/- 19) = 3369 MB/s
test str::char_count::en_large::case02_iter_increment      ... bench:       2,598 ns/iter (+/- 167) = 2072 MB/s
test str::char_count::en_large::case03_manual_char_len     ... bench:       2,578 ns/iter (+/- 55) = 2088 MB/s
test str::char_count::en_medium::case00_cur_libcore        ... bench:          44 ns/iter (+/- 1) = 15295 MB/s
test str::char_count::en_medium::case01_old_libcore        ... bench:         201 ns/iter (+/- 51) = 3348 MB/s
test str::char_count::en_medium::case02_iter_increment     ... bench:         322 ns/iter (+/- 40) = 2090 MB/s
test str::char_count::en_medium::case03_manual_char_len    ... bench:         319 ns/iter (+/- 5) = 2109 MB/s
test str::char_count::en_small::case00_cur_libcore         ... bench:          15 ns/iter (+/- 0) = 2333 MB/s
test str::char_count::en_small::case01_old_libcore         ... bench:          14 ns/iter (+/- 0) = 2500 MB/s
test str::char_count::en_small::case02_iter_increment      ... bench:          30 ns/iter (+/- 1) = 1166 MB/s
test str::char_count::en_small::case03_manual_char_len     ... bench:          30 ns/iter (+/- 1) = 1166 MB/s
test str::char_count::ru_huge::case00_cur_libcore          ... bench:      16,439 ns/iter (+/- 3,105) = 19777 MB/s
test str::char_count::ru_huge::case01_old_libcore          ... bench:      89,480 ns/iter (+/- 2,555) = 3633 MB/s
test str::char_count::ru_huge::case02_iter_increment       ... bench:     217,703 ns/iter (+/- 22,185) = 1493 MB/s
test str::char_count::ru_huge::case03_manual_char_len      ... bench:     157,330 ns/iter (+/- 19,188) = 2066 MB/s
test str::char_count::ru_large::case00_cur_libcore         ... bench:         243 ns/iter (+/- 6) = 20905 MB/s
test str::char_count::ru_large::case01_old_libcore         ... bench:       1,384 ns/iter (+/- 51) = 3670 MB/s
test str::char_count::ru_large::case02_iter_increment      ... bench:       3,381 ns/iter (+/- 543) = 1502 MB/s
test str::char_count::ru_large::case03_manual_char_len     ... bench:       2,423 ns/iter (+/- 429) = 2096 MB/s
test str::char_count::ru_medium::case00_cur_libcore        ... bench:          42 ns/iter (+/- 1) = 15119 MB/s
test str::char_count::ru_medium::case01_old_libcore        ... bench:         180 ns/iter (+/- 4) = 3527 MB/s
test str::char_count::ru_medium::case02_iter_increment     ... bench:         402 ns/iter (+/- 45) = 1579 MB/s
test str::char_count::ru_medium::case03_manual_char_len    ... bench:         280 ns/iter (+/- 29) = 2267 MB/s
test str::char_count::ru_small::case00_cur_libcore         ... bench:          12 ns/iter (+/- 0) = 2666 MB/s
test str::char_count::ru_small::case01_old_libcore         ... bench:          12 ns/iter (+/- 0) = 2666 MB/s
test str::char_count::ru_small::case02_iter_increment      ... bench:          19 ns/iter (+/- 0) = 1684 MB/s
test str::char_count::ru_small::case03_manual_char_len     ... bench:          14 ns/iter (+/- 1) = 2285 MB/s
test str::char_count::zh_huge::case00_cur_libcore          ... bench:      15,053 ns/iter (+/- 2,640) = 20067 MB/s
test str::char_count::zh_huge::case01_old_libcore          ... bench:      82,622 ns/iter (+/- 3,602) = 3656 MB/s
test str::char_count::zh_huge::case02_iter_increment       ... bench:     230,456 ns/iter (+/- 7,246) = 1310 MB/s
test str::char_count::zh_huge::case03_manual_char_len      ... bench:     220,595 ns/iter (+/- 11,624) = 1369 MB/s
test str::char_count::zh_large::case00_cur_libcore         ... bench:         227 ns/iter (+/- 65) = 20792 MB/s
test str::char_count::zh_large::case01_old_libcore         ... bench:       1,136 ns/iter (+/- 144) = 4154 MB/s
test str::char_count::zh_large::case02_iter_increment      ... bench:       3,147 ns/iter (+/- 253) = 1499 MB/s
test str::char_count::zh_large::case03_manual_char_len     ... bench:       2,993 ns/iter (+/- 400) = 1577 MB/s
test str::char_count::zh_medium::case00_cur_libcore        ... bench:          36 ns/iter (+/- 5) = 16388 MB/s
test str::char_count::zh_medium::case01_old_libcore        ... bench:         142 ns/iter (+/- 18) = 4154 MB/s
test str::char_count::zh_medium::case02_iter_increment     ... bench:         379 ns/iter (+/- 37) = 1556 MB/s
test str::char_count::zh_medium::case03_manual_char_len    ... bench:         364 ns/iter (+/- 51) = 1620 MB/s
test str::char_count::zh_small::case00_cur_libcore         ... bench:          11 ns/iter (+/- 1) = 3000 MB/s
test str::char_count::zh_small::case01_old_libcore         ... bench:          11 ns/iter (+/- 1) = 3000 MB/s
test str::char_count::zh_small::case02_iter_increment      ... bench:          20 ns/iter (+/- 3) = 1650 MB/s
</pre>
</details>

I also added fairly thorough tests for different sizes and alignments. This completes on my machine in 0.02s, which is surprising given how thorough they are, but it seems to detect bugs in the implementation. (I haven't run the tests on a 32 bit machine yet since before I reworked the code a little though, so... hopefully I'm not about to embarrass myself).

This uses similar SWAR-style techniques to the `is_ascii` impl I contributed in https://github.com/rust-lang/rust/pull/74066, so I'm going to request review from the same person who reviewed that one. That said am not particularly picky, and might not have the correct syntax for requesting a review from someone (so it goes).

r? `@nagisa`
2022-02-06 08:34:48 +00:00
..
alloc Fix a bunch of typos 2021-12-14 16:40:43 +01:00
array Switch all libraries to the 2021 edition 2021-12-23 19:03:47 +08:00
char Rollup merge of #93392 - GKFX:char-docs, r=scottmcm 2022-01-31 06:58:32 +01:00
convert Rollup merge of #92382 - clarfonthey:const_convert, r=scottmcm 2022-01-15 02:25:14 +01:00
fmt Create core::fmt::ArgumentV1 with generics instead of fn pointer 2022-01-29 13:52:19 +00:00
future Rollup merge of #92887 - pietroalbini:pa-bootstrap-update, r=Mark-Simulacrum 2022-01-30 08:37:46 -08:00
hash change PhantomData type for BuildHasherDefault 2022-01-07 00:39:48 +01:00
iter Fix a typo from #92899 2022-01-28 01:35:33 +00:00
macros update cfg(bootstrap)s 2022-01-28 15:01:07 +01:00
mem Add MaybeUninit::as_bytes 2022-01-19 21:27:29 +00:00
num doc: use U+2212 for minus sign in integer MIN/MAX text 2022-02-04 17:59:53 +01:00
ops Add a minimal working append_const_msg argument 2022-01-26 00:48:08 +11:00
panic Add PanicInfo::can_unwind which indicates whether a panic handler is 2022-01-17 00:39:28 +00:00
prelude update cfg(bootstrap)s 2022-01-28 15:01:07 +01:00
ptr Make NonNull::new const 2022-01-23 23:04:39 +09:00
slice Auto merge of #86988 - thomcc:chunky-splitz-says-no-checking, r=the8472 2022-02-01 10:11:59 +00:00
str Fix comment grammar for do_count_chars 2022-02-05 11:17:10 -08:00
stream Remove P: Unpin bound on impl Stream for Pin 2021-12-17 11:14:02 +08:00
sync Add rustc_diagnostic_item attribute to AtomicBool 2022-01-13 23:32:49 +01:00
task Implement data and vtable getters for RawWaker 2021-12-17 04:30:13 +08:00
unicode Regenerate tables for Unicode 14.0.0 2021-10-06 17:49:33 -07:00
any.rs Reverts #92135 because perf regression 2021-12-26 16:02:33 +03:00
ascii.rs Add #[must_use] to remaining core functions 2021-10-30 18:21:29 -04:00
bool.rs Constify bool::then{,_some} 2021-12-15 00:11:23 +08:00
borrow.rs Make Borrow and BorrowMut impls const 2021-12-04 21:57:39 +09:00
cell.rs update cfg(bootstrap)s 2022-01-28 15:01:07 +01:00
clone.rs Update Copy/Clone documentation WRT arrays 2021-11-08 13:11:59 -05:00
cmp.rs Edit docs introduction for std::cmp::PartialOrd 2022-01-28 00:46:04 -06:00
default.rs Add #[must_use] to remaining core functions 2021-10-30 18:21:29 -04:00
ffi.rs Use target_family = "wasm" 2021-11-10 08:35:42 -08:00
hint.rs Add is_riscv_feature_detected!; modify impl of hint::spin_loop 2022-01-05 15:44:52 +08:00
internal_macros.rs Added docs to internal_macro const 2021-10-22 10:07:35 +13:00
intrinsics.rs Document about some behaviors of const_(de)allocate and add some tests. 2022-01-29 19:13:23 +09:00
lazy.rs Use UnsafeCell::get_mut() in core::lazy::OnceCell::get_mut() 2021-12-30 05:04:44 +02:00
lib.rs Rollup merge of #92887 - pietroalbini:pa-bootstrap-update, r=Mark-Simulacrum 2022-01-30 08:37:46 -08:00
marker.rs Update Copy/Clone documentation WRT arrays 2021-11-08 13:11:59 -05:00
option.rs Fix is_some_with tests. 2022-01-19 00:12:35 +01:00
panic.rs Allow panic!("{}", computed_str) in const fn. 2021-09-15 21:56:43 +01:00
panicking.rs Change TerminatorKind::Abort to call the panic handler instead of 2022-01-17 00:39:34 +00:00
pin.rs Add #[must_use] to remaining core functions 2021-10-30 18:21:29 -04:00
primitive.rs mv std libs to library/ 2020-07-27 19:51:13 -05:00
primitive_docs.rs Fix annotation of code blocks 2022-02-01 21:44:53 +00:00
result.rs Fix is_some_with tests. 2022-01-19 00:12:35 +01:00
time.rs Improve Duration::try_from_secs_f32/64 accuracy by directly processing exponent and mantissa 2022-01-26 18:14:25 +03:00
tuple.rs mv std libs to library/ 2020-07-27 19:51:13 -05:00
unit.rs mv std libs to library/ 2020-07-27 19:51:13 -05:00