Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native ## Summary This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`. On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics. This change is intended as a **temporary workaround** for upstream LLVM poor codegen. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted. ## Problem When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb` instruction. This causes a **~22x performance regression**. Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs. ## Root cause (upstream) The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns. An upstream issue has been filed by @folkertdev to track the root cause: llvm/llvm-project#176906 Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern. ## Solution Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features. ## Godbolt Links (Rust 1.92) | Pattern | Target | Link | Result | |---------|--------|------|--------| | Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` | | Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) | | SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` | | SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) | ## Benchmark Results **CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512) ### Default Target (SSE2) — Mixed | Size | Before | After | Change | |------|--------|-------|--------| | 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** | | 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** | | 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** | | 32 B | 17.7 GB/s | 15.8 GB/s | -11% | | 64 B | 28.6 GB/s | 25.1 GB/s | -12% | | 256 B | 51.5 GB/s | 48.6 GB/s | ~same | | 1 KB | 64.9 GB/s | 60.7 GB/s | ~same | | 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same | ### Native Target (AVX-512) — Up to 24x Faster | Size | Before | After | Speedup | |------|--------|-------|---------| | 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** | | 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** | | 16 B | ~7 GB/s | ~7 GB/s | ~same | | 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** | | 64 B | 2.9 GB/s | 23.2 GB/s | **8x** | | 256 B | 2.9 GB/s | 47.2 GB/s | **16x** | | 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** | | 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** | ### Summary - **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged - **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s) Note: this is the pure ascii path, but the story is similar for the others. See linked bench project. ## Test Plan - [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512 - [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there) - [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations) - [x] Benchmarks confirm performance improvement - [x] Tidy checks pass ## Reproduction / Test Projects Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation - `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison - `fuzz/` - Compares old/new implementations with libfuzzer ## Related Issues - issue opened by @folkertdev llvm/llvm-project#176906 - Regression introduced in https://github.com/rust-lang/rust/pull/130733 |
||
|---|---|---|
| .github | ||
| compiler | ||
| library | ||
| LICENSES | ||
| src | ||
| tests | ||
| .clang-format | ||
| .editorconfig | ||
| .git-blame-ignore-revs | ||
| .gitattributes | ||
| .gitignore | ||
| .gitmodules | ||
| .ignore | ||
| .mailmap | ||
| bootstrap.example.toml | ||
| Cargo.lock | ||
| Cargo.toml | ||
| CODE_OF_CONDUCT.md | ||
| configure | ||
| CONTRIBUTING.md | ||
| COPYRIGHT | ||
| INSTALL.md | ||
| LICENSE-APACHE | ||
| license-metadata.json | ||
| LICENSE-MIT | ||
| package.json | ||
| README.md | ||
| RELEASES.md | ||
| REUSE.toml | ||
| rust-bors.toml | ||
| rustfmt.toml | ||
| triagebot.toml | ||
| typos.toml | ||
| x | ||
| x.ps1 | ||
| x.py | ||
| yarn.lock | ||
This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.
Why Rust?
-
Performance: Fast and memory-efficient, suitable for critical services, embedded devices, and easily integrated with other languages.
-
Reliability: Our rich type system and ownership model ensure memory and thread safety, reducing bugs at compile-time.
-
Productivity: Comprehensive documentation, a compiler committed to providing great diagnostics, and advanced tooling including package manager and build tool (Cargo), auto-formatter (rustfmt), linter (Clippy) and editor support (rust-analyzer).
Quick Start
Read "Installation" from The Book.
Installing from Source
If you really want to install from source (though this is not recommended), see INSTALL.md.
Getting Help
See https://www.rust-lang.org/community for a list of chat platforms and forums.
Contributing
See CONTRIBUTING.md.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.
Trademark
The Rust Foundation owns and protects the Rust and Cargo trademarks and logos (the "Rust Trademarks").
If you want to use these names or brands, please read the Rust language trademark policy.
Third-party logos may be subject to third-party copyrights and trademarks. See Licenses for details.