Find a file
Jonathan Brouwer 13f0399a57
Rollup merge of #151259 - bonega:fix-is-ascii-avx512, r=folkertdev
Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native

## Summary

This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`.

On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics.

This change is intended as a **temporary workaround** for upstream LLVM poor codegen. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted.

  ## Problem

  When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb`
  instruction. This causes a **~22x performance regression**.

  Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.

## Root cause (upstream)

The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns.

An upstream issue has been filed by @folkertdev  to track the root cause: llvm/llvm-project#176906

Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern.

  ## Solution

  Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.

  ## Godbolt Links (Rust 1.92)

  | Pattern | Target | Link | Result |
  |---------|--------|------|--------|
  | Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` |
  | Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) |
  | SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` |
  | SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) |

  ## Benchmark Results

  **CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512)

  ### Default Target (SSE2) — Mixed

  | Size | Before | After | Change |
  |------|--------|-------|--------|
  | 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** |
  | 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** |
  | 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** |
  | 32 B | 17.7 GB/s | 15.8 GB/s | -11% |
  | 64 B | 28.6 GB/s | 25.1 GB/s | -12% |
  | 256 B | 51.5 GB/s | 48.6 GB/s | ~same |
  | 1 KB | 64.9 GB/s | 60.7 GB/s | ~same |
  | 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same |

  ### Native Target (AVX-512) — Up to 24x Faster

  | Size | Before | After | Speedup |
  |------|--------|-------|---------|
  | 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** |
  | 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** |
  | 16 B | ~7 GB/s | ~7 GB/s | ~same |
  | 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** |
  | 64 B | 2.9 GB/s | 23.2 GB/s | **8x** |
  | 256 B | 2.9 GB/s | 47.2 GB/s | **16x** |
  | 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** |
  | 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** |

  ### Summary

  - **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged
  - **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s)

  Note: this is the pure ascii path, but the story is similar for the others.
  See linked bench project.

  ## Test Plan

  - [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512
  - [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there)
  - [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
  - [x] Benchmarks confirm performance improvement
  - [x] Tidy checks pass

  ## Reproduction / Test Projects

  Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

  - `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
  - `fuzz/` - Compares old/new implementations with libfuzzer

  ## Related Issues
  - issue opened by @folkertdev llvm/llvm-project#176906
  - Regression introduced in https://github.com/rust-lang/rust/pull/130733
2026-01-24 08:18:05 +01:00
.github ci: Move lockfile updates to a script 2026-01-19 05:23:45 +00:00
compiler Rollup merge of #150556 - thejpster:add-thumbv7a-thumbv7r-thumbv8r, r=petrochenkov 2026-01-24 08:18:05 +01:00
library Rollup merge of #151259 - bonega:fix-is-ascii-avx512, r=folkertdev 2026-01-24 08:18:05 +01:00
LICENSES Synchronize Unicode license text from unicode.org 2024-11-20 00:54:12 -08:00
src Rollup merge of #150556 - thejpster:add-thumbv7a-thumbv7r-thumbv8r, r=petrochenkov 2026-01-24 08:18:05 +01:00
tests Rollup merge of #151259 - bonega:fix-is-ascii-avx512, r=folkertdev 2026-01-24 08:18:05 +01:00
.clang-format Add .clang-format 2024-06-26 05:56:00 +08:00
.editorconfig editorconfig: don't use nonexistant syntax 2025-08-24 10:37:19 -05:00
.git-blame-ignore-revs git: ignore 60600a6fa4 for blame purposes 2025-04-17 11:50:24 +08:00
.gitattributes Mark .pp files as Rust 2025-03-29 12:39:06 +01:00
.gitignore ignore build-rust-analyzer even if it's a symlink 2025-11-12 16:41:41 +01:00
.gitmodules Update to LLVM 21 2025-08-01 10:17:04 +02:00
.ignore change config.toml to bootstrap.toml for bootstrap module 2025-03-17 12:56:41 +05:30
.mailmap Add temporary new bors e-mail address to the mailmap 2026-01-14 18:01:50 +01:00
bootstrap.example.toml Change how we build offload as a single Step 2025-12-22 23:50:11 +01:00
Cargo.lock Update Cargo.lock 2026-01-22 19:03:11 +01:00
Cargo.toml Bump Clippy version -> 0.1.95 2026-01-22 16:55:20 +01:00
CODE_OF_CONDUCT.md Remove the code of conduct; instead link https://www.rust-lang.org/conduct.html 2019-10-05 22:55:19 +02:00
configure Ensure ./configure works when configure.py path contains spaces 2024-02-16 18:57:22 +00:00
CONTRIBUTING.md chore: remove dead links 2025-09-22 10:15:50 +01:00
COPYRIGHT Merge commit '500e0ff187' into clippy-subtree-update 2026-01-09 10:37:00 +01:00
INSTALL.md Rename rust.use-lld to rust.bootstrap-override-lld in INSTALL.md 2026-01-14 14:20:45 -06:00
LICENSE-APACHE Merge commit '500e0ff187' into clippy-subtree-update 2026-01-09 10:37:00 +01:00
license-metadata.json Update license metadata 2025-02-15 16:48:37 +01:00
LICENSE-MIT Merge commit '500e0ff187' into clippy-subtree-update 2026-01-09 10:37:00 +01:00
package.json Update browser-ui-test version to 0.23.1 2026-01-07 13:55:53 +01:00
README.md Merge commit '500e0ff187' into clippy-subtree-update 2026-01-09 10:37:00 +01:00
RELEASES.md Use version 1.93.0 in RELEASES.md instead of 1.93 2026-01-20 12:57:16 +01:00
REUSE.toml Use yarn instead of npm in tidy 2025-11-17 10:58:13 +02:00
rust-bors.toml Add S-blocked to labels_blocking_approval 2026-01-19 16:01:16 +01:00
rustfmt.toml Rename tests/rustdoc into tests/rustdoc-html 2026-01-07 14:23:30 +01:00
triagebot.toml misc: roll bootstrap reviewers for src/tools/build-manifest 2026-01-21 09:59:34 +08:00
typos.toml spellcheck: "numer" as in "numerator" 2026-01-19 15:31:25 +01:00
x fix ./x readdir logic when CDPATH is set 2025-09-19 16:48:05 +02:00
x.ps1 use & instead of start-process in x.ps1 2023-12-09 09:46:16 -05:00
x.py Reformat Python code with ruff 2024-12-04 23:03:44 +01:00
yarn.lock Update npm package, remove another unused one, and fix some typing errors that started popping when we moved to yarn 2025-11-17 10:58:18 +02:00

This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.

Why Rust?

  • Performance: Fast and memory-efficient, suitable for critical services, embedded devices, and easily integrated with other languages.

  • Reliability: Our rich type system and ownership model ensure memory and thread safety, reducing bugs at compile-time.

  • Productivity: Comprehensive documentation, a compiler committed to providing great diagnostics, and advanced tooling including package manager and build tool (Cargo), auto-formatter (rustfmt), linter (Clippy) and editor support (rust-analyzer).

Quick Start

Read "Installation" from The Book.

Installing from Source

If you really want to install from source (though this is not recommended), see INSTALL.md.

Getting Help

See https://www.rust-lang.org/community for a list of chat platforms and forums.

Contributing

See CONTRIBUTING.md.

License

Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.

See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.

Trademark

The Rust Foundation owns and protects the Rust and Cargo trademarks and logos (the "Rust Trademarks").

If you want to use these names or brands, please read the Rust language trademark policy.

Third-party logos may be subject to third-party copyrights and trademarks. See Licenses for details.