Commit graph

14 commits

Author SHA1 Message Date
Stuart Cook
1c892e829c
Rollup merge of #147436 - okaneco:eq_ignore_ascii_autovec, r=scottmcm
slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization

- Refactor the current functionality into a helper function
- Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
- Add a codegen test checking for vectorization and no panicking
- Add benches for `eq_ignore_ascii_case`

---

The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.

Benchmarks - Cases below 16 bytes are unaffected, cases above all show sizeable improvements.
```
before:
    str::eq_ignore_ascii_case::bench_large_str_eq         4942.30ns/iter +/- 48.20
    str::eq_ignore_ascii_case::bench_medium_str_eq         632.01ns/iter +/- 16.87
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        16.28ns/iter  +/- 0.45
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        35.23ns/iter  +/- 2.28
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq       7.56ns/iter  +/- 0.22
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq    2.64ns/iter  +/- 0.06
after:
    str::eq_ignore_ascii_case::bench_large_str_eq         611.63ns/iter +/- 28.29
    str::eq_ignore_ascii_case::bench_medium_str_eq         77.10ns/iter +/- 19.76
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        3.49ns/iter  +/- 0.39
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        3.50ns/iter  +/- 0.27
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq      7.27ns/iter  +/- 0.09
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq   2.60ns/iter  +/- 0.05
```
2026-01-27 17:36:35 +11:00
Juho Kahala
29596f87be rename uN::{gather,scatter}_bits to uN::{extract,deposit}_bits 2026-01-26 06:28:42 +02:00
Yotam Ofek
f82dd820a5 Use rand crate more idiomatically 2026-01-07 22:31:33 +02:00
quaternic
7e7d724373 Implement benchmarks for uN::{gather,scatter}_bits 2025-12-03 20:39:51 +02:00
okaneco
a5ba24843d Relocate bench and use str corpora for data
Add #[inline(always)] to inner function and check not for filecheck test
2025-10-08 14:42:33 -04:00
okaneco
b031c51664 slice/ascii: Optimize eq_ignore_ascii_case with auto-vectorization
Refactor the current functionality into a helper function
Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
Add a codegen test
Add benches for `eq_ignore_ascii_case`

The optimized function is initially only enabled for x86_64 which has `sse2` as
part of its baseline, but none of the code is platform specific. Other
platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the
optimized path is gated behind a length check for greater than or equal to 16.
2025-10-07 05:52:37 -04:00
Pascal S. de Kloe
2143a3f0c2 benchmarks for exponent fmt of integers 2025-08-22 17:40:48 +02:00
Pascal S. de Kloe
7704a39685 fmt benchmarks for binary, octal and hex 2025-07-25 16:03:47 +02:00
sayantn
7443d039a5
Update stdarch 2025-05-01 20:01:43 +05:30
Thalia Archibald
988eb19970 library: Use size_of from the prelude instead of imported
Use `std::mem::{size_of, size_of_val, align_of, align_of_val}` from the
prelude instead of importing or qualifying them.

These functions were added to all preludes in Rust 1.80.
2025-03-06 20:20:38 -08:00
Eric Huss
0aa634e71a Migrate coretests to Rust 2024 2025-02-13 13:10:21 -08:00
Eric Huss
b7c975b22e library: Update rand to 0.9.0 2025-02-13 12:20:55 -08:00
Pascal S. de Kloe
962fec2193 black_box integer-input on fmt benches 2025-01-30 18:57:23 +01:00
bjorn3
b6a3841942 Put all coretests in a separate crate 2025-01-26 10:26:36 +00:00