slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization
- Refactor the current functionality into a helper function
- Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
- Add a codegen test checking for vectorization and no panicking
- Add benches for `eq_ignore_ascii_case`
---
The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation.
Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.
Benchmarks - Cases below 16 bytes are unaffected, cases above all show sizeable improvements.
```
before:
str::eq_ignore_ascii_case::bench_large_str_eq 4942.30ns/iter +/- 48.20
str::eq_ignore_ascii_case::bench_medium_str_eq 632.01ns/iter +/- 16.87
str::eq_ignore_ascii_case::bench_str_17_bytes_eq 16.28ns/iter +/- 0.45
str::eq_ignore_ascii_case::bench_str_31_bytes_eq 35.23ns/iter +/- 2.28
str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq 7.56ns/iter +/- 0.22
str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq 2.64ns/iter +/- 0.06
after:
str::eq_ignore_ascii_case::bench_large_str_eq 611.63ns/iter +/- 28.29
str::eq_ignore_ascii_case::bench_medium_str_eq 77.10ns/iter +/- 19.76
str::eq_ignore_ascii_case::bench_str_17_bytes_eq 3.49ns/iter +/- 0.39
str::eq_ignore_ascii_case::bench_str_31_bytes_eq 3.50ns/iter +/- 0.27
str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq 7.27ns/iter +/- 0.09
str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq 2.60ns/iter +/- 0.05
```
lint: Use rustc_apfloat for `overflowing_literals`, add f16 and f128
Switch to parsing float literals for overflow checks using `rustc_apfloat` rather than host floats. This avoids small variations in platform support and makes it possible to start checking `f16` and `f128` as well.
Using APFloat matches what we try to do elsewhere to avoid platform inconsistencies.
remove `#[deprecated]` from unstable & internal `SipHasher13` and `24` types
These types are unstable and `doc(hidden)` (under the internal feature `hashmap_internals`). Deprecating them only adds noise (`#[allow(deprecated)]`) to all places where they are used, so this PR removes the deprecation attributes from them.
It also includes a few other small cleanups in separate commits, including one I overlooked in rust-lang/rust#151228.
Update backtrace and windows-bindgen
Supersedes the backtrace bump in rust-lang/rust#151659
This is mostly just renaming `windows_targets` to `windows_link` but it needs to be done in tandem with the backtrace submodule update. The reason for doing this is that backtrace is both copy/pasted into std (via being a submodule) and published as an independent crate.
Switch to parsing float literals for overflow checks using
`rustc_apfloat` rather than host floats. This avoids small variations in
platform support and makes it possible to start checking `f16` and
`f128` as well.
Using APFloat matches what we try to do elsewhere to avoid platform
inconsistencies.
Suggest changing `iter`/`into_iter` when the other was meant
When encountering a call to `iter` that should have been `into_iter` and vice-versa, provide a structured suggestion:
```
error[E0271]: type mismatch resolving `<IntoIter<{integer}, 3> as IntoIterator>::Item == &{integer}`
--> $DIR/into_iter-when-iter-was-intended.rs:5:37
|
LL | let _a = [0, 1, 2].iter().chain([3, 4, 5].into_iter());
| ----- ^^^^^^^^^^^^^^^^^^^^^ expected `&{integer}`, found integer
| |
| required by a bound introduced by this call
|
note: the method call chain might not have had the expected associated types
--> $DIR/into_iter-when-iter-was-intended.rs:5:47
|
LL | let _a = [0, 1, 2].iter().chain([3, 4, 5].into_iter());
| --------- ^^^^^^^^^^^ `IntoIterator::Item` is `{integer}` here
| |
| this expression has type `[{integer}; 3]`
note: required by a bound in `std::iter::Iterator::chain`
--> $SRC_DIR/core/src/iter/traits/iterator.rs:LL:COL
help: consider not consuming the `[{integer}, 3]` to construct the `Iterator`
|
LL - let _a = [0, 1, 2].iter().chain([3, 4, 5].into_iter());
LL + let _a = [0, 1, 2].iter().chain([3, 4, 5].iter());
|
```
Finish addressing the original case in rust-lang/rust#68095. Only the case of chaining a `Vec` or `[]` is left unhandled.
Rollup of 2 pull requests
Successful merges:
- rust-lang/rust#151612 (Update documentation for `cold_path`, `likely`, and `unlikely`)
- rust-lang/rust#151670 (compiletest: Parse aux `proc-macro` directive into struct)
Update documentation for `cold_path`, `likely`, and `unlikely`
* Add a note recommending benchmarks to `cold_path`, as other hints have
* Note that `cold_path` can be used to implement `likely` and `unlikely`
* Update the tracking issue for the `likely_unlikely` feature
Tracking issue: https://github.com/rust-lang/rust/issues/136873
Tracking issue: https://github.com/rust-lang/rust/issues/151619
When encountering a call to `iter` that should have been `into_iter` and vice-versa, provide a structured suggestion:
```
error[E0271]: type mismatch resolving `<IntoIter<{integer}, 3> as IntoIterator>::Item == &{integer}`
--> $DIR/into_iter-when-iter-was-intended.rs:5:37
|
LL | let _a = [0, 1, 2].iter().chain([3, 4, 5].into_iter());
| ----- ^^^^^^^^^^^^^^^^^^^^^ expected `&{integer}`, found integer
| |
| required by a bound introduced by this call
|
note: the method call chain might not have had the expected associated types
--> $DIR/into_iter-when-iter-was-intended.rs:5:47
|
LL | let _a = [0, 1, 2].iter().chain([3, 4, 5].into_iter());
| --------- ^^^^^^^^^^^ `IntoIterator::Item` is `{integer}` here
| |
| this expression has type `[{integer}; 3]`
note: required by a bound in `std::iter::Iterator::chain`
--> $SRC_DIR/core/src/iter/traits/iterator.rs:LL:COL
help: consider not consuming the `[{integer}, 3]` to construct the `Iterator`
|
LL - let _a = [0, 1, 2].iter().chain([3, 4, 5].into_iter());
LL + let _a = [0, 1, 2].iter().chain([3, 4, 5].iter());
|
```
Fix 'the the' typo in library/core/src/array/iter.rs
This PR fixes a small grammatical error in a safety comment within `library/core/src/array/iter.rs` where the word "the" was duplicated.
No functional changes.
Various refactors to the proc_macro bridge
This reduces the amount of types, traits and other abstractions that are involved with the bridge, which should make it easier to understand and modify. This should also help a bit with getting rid of the type marking hack, which is complicating the code a fair bit.
Fixes: rust-lang/rust#139810
Fix(lib/win/thread): Ensure `Sleep`'s usage passes over the requested duration under Win7
Fixesrust-lang/rust#149935. See the added comment for more details.
This makes the concerned test now reproducibly pass, for us at least. Also, testing this separately revealed successful: see the issue.
@rustbot label C-bug I-flaky-test O-windows-7 T-libs A-time A-thread
std: avoid tearing `dbg!` prints
Fixes https://github.com/rust-lang/rust/issues/136703.
This is an alternative to rust-lang/rust#149859. Instead of formatting everything into a string, this PR makes multi-expression `dbg!` expand into multiple nested matches, with the final match containing a single `eprint!`. By using macro recursion and relying on hygiene, this allows naming every bound value in that `eprint!`.
CC @orlp
r? libs
Other hints have a note recommending benchmarks to ensure they actually
do what is intended. This is also applicable for `cold_path`, so add a
note here.
std: `sleep_until` on Motor and VEX
This PR:
* Forwards the public `sleep_until` to the private `sleep_until` on Motor OS
* Adds a `sleep_until` implementation on VEX that yields until the deadline has passed
CC @lasiotus
CC @lewisfm @tropicaaal @Gavin-Niederman @max-niederman
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
#[derive(Copy, Clone)]
#[repr(simd)]
struct JustOne([$elem_type; 1]);
let one = JustOne([value]);
// SAFETY: 0 is always in-bounds because we're shuffling
// a simd type with exactly one element.
unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic, it does not actually use it anywhere yet.
Fix(lib/win/net): Remove hostname support under Win7
Fixesrust-lang/rust#150896. `GetHostNameW` is not available under Windows 7, leading to dynamic linking failures upon program executions. For now, as it is still unstable, this therefore appropriately cfg-gates the feature in order to mark the Win7 as unsupported with regards to this particular feature. Porting the functionality for Windows 7 would require changing the underlying system call and so more work for the immediate need.
@rustbot label C-bug O-windows-7 T-libs A-io
Three targets, covering A32 and T32 instructions, and soft-float and
hard-float ABIs. Hard-float not available in Thumb mode. Atomics
in Thumb mode require __sync* functions from compiler-builtins.
Fix compilation of std/src/sys/pal/uefi/tests.rs
Dropped the `align` test since the `POOL_ALIGNMENT` and `align_size` items it uses do not exist.
The other changes are straightforward fixes for places where the test code drifted from the current API, since the tests are not yet built in CI for the UEFI target.
CC @Ayush1325
Don't use default build-script fingerprinting in `test`
This changes the `test` build script so that it does not use the default fingerprinting mechanism in cargo which causes a full scan of the package every time it runs. This build script does not depend on any of the files in the package.
This is the recommended approach for writing build scripts.
Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native
## Summary
This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`.
On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics.
This change is intended as a **temporary workaround** for upstream LLVM poor codegen. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted.
## Problem
When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb`
instruction. This causes a **~22x performance regression**.
Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.
## Root cause (upstream)
The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns.
An upstream issue has been filed by @folkertdev to track the root cause: llvm/llvm-project#176906
Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern.
## Solution
Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.
## Godbolt Links (Rust 1.92)
| Pattern | Target | Link | Result |
|---------|--------|------|--------|
| Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` |
| Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) |
| SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` |
| SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) |
## Benchmark Results
**CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512)
### Default Target (SSE2) — Mixed
| Size | Before | After | Change |
|------|--------|-------|--------|
| 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** |
| 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** |
| 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** |
| 32 B | 17.7 GB/s | 15.8 GB/s | -11% |
| 64 B | 28.6 GB/s | 25.1 GB/s | -12% |
| 256 B | 51.5 GB/s | 48.6 GB/s | ~same |
| 1 KB | 64.9 GB/s | 60.7 GB/s | ~same |
| 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same |
### Native Target (AVX-512) — Up to 24x Faster
| Size | Before | After | Speedup |
|------|--------|-------|---------|
| 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** |
| 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** |
| 16 B | ~7 GB/s | ~7 GB/s | ~same |
| 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** |
| 64 B | 2.9 GB/s | 23.2 GB/s | **8x** |
| 256 B | 2.9 GB/s | 47.2 GB/s | **16x** |
| 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** |
| 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** |
### Summary
- **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged
- **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s)
Note: this is the pure ascii path, but the story is similar for the others.
See linked bench project.
## Test Plan
- [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512
- [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there)
- [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
- [x] Benchmarks confirm performance improvement
- [x] Tidy checks pass
## Reproduction / Test Projects
Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation
- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer
## Related Issues
- issue opened by @folkertdev llvm/llvm-project#176906
- Regression introduced in https://github.com/rust-lang/rust/pull/130733
Dropped the `align` test since the `POOL_ALIGNMENT` and `align_size`
items it uses do not exist.
The other changes are straightforward fixes for places where the test
code drifted from the current API, since the tests are not yet built in
CI for the UEFI target.
This changes the `test` build script so that it does not use the default
fingerprinting mechanism in cargo which causes a full scan of the
package every time it runs. This build script does not depend on any of
the files in the package.
This is the recommended approach for writing build scripts.