rust/tests/assembly-llvm
Stuart Cook a6e8a31b86
Rollup merge of #151611 - bonega:improve-is-slice-is-ascii-performance, r=folkertdev
Improve is_ascii performance on x86_64 with explicit SSE2 intrinsics

# Summary

Improves `slice::is_ascii` performance on SSE2 targets by roughly 1.5-2x on larger inputs.
AVX-512 keeps similar performance characteristics.

This builds on the work already merged in rust-lang/rust#151259.
In particular, this PR improves the default SSE2 performance, so I no longer consider this a temporary fix.
Thanks to @folkertdev for suggesting that I consider `as_chunk` again.

# The implementation:
- Uses 64-byte chunks with 4x 16-byte SSE2 loads OR'd together
- Extracts the MSB mask with a single `pmovmskb` instruction
- Falls back to usize-at-a-time SWAR for inputs < 64 bytes
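
The three points above can be sketched roughly as follows. This is an illustrative approximation, not the code merged in the PR: the names `is_ascii_sketch` and `is_ascii_swar` are hypothetical, and handling the sub-64-byte tail with the same SWAR routine is an assumption.

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};

/// SWAR fallback (hypothetical helper): checks the high bit of every byte,
/// one `usize` at a time, then finishes the leftover bytes individually.
fn is_ascii_swar(bytes: &[u8]) -> bool {
    const WORD: usize = core::mem::size_of::<usize>();
    // 0x80 repeated in every byte of the word.
    const HIGH_BITS: usize = usize::from_ne_bytes([0x80; WORD]);

    let mut words = bytes.chunks_exact(WORD);
    for w in &mut words {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(w);
        if usize::from_ne_bytes(buf) & HIGH_BITS != 0 {
            return false;
        }
    }
    words.remainder().iter().all(|b| *b < 0x80)
}

/// SSE2 path (hypothetical sketch): 64-byte chunks, four 16-byte loads OR'd
/// together, one `pmovmskb` per chunk.
#[cfg(target_arch = "x86_64")]
fn is_ascii_sketch(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(64);
    for chunk in &mut chunks {
        let p = chunk.as_ptr() as *const __m128i;
        // SAFETY: SSE2 is part of the x86_64 baseline, `chunk` is exactly
        // 64 bytes long, and `_mm_loadu_si128` permits unaligned loads.
        let mask = unsafe {
            let a = _mm_loadu_si128(p);
            let b = _mm_loadu_si128(p.add(1));
            let c = _mm_loadu_si128(p.add(2));
            let d = _mm_loadu_si128(p.add(3));
            let combined = _mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d));
            // `pmovmskb`: gathers the most significant bit of each byte.
            _mm_movemask_epi8(combined)
        };
        if mask != 0 {
            return false;
        }
    }
    // Assumption: inputs shorter than 64 bytes, and any tail, use SWAR.
    is_ascii_swar(chunks.remainder())
}

/// On other architectures this sketch simply uses the SWAR routine.
#[cfg(not(target_arch = "x86_64"))]
fn is_ascii_sketch(bytes: &[u8]) -> bool {
    is_ascii_swar(bytes)
}
```

OR-ing the four vectors before extracting the mask means only one `pmovmskb` and one branch are needed per 64 bytes, since any byte with its top bit set leaves that bit set in the combined vector.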

# Performance impact (vs before rust-lang/rust#151259):
- AVX-512: 34-48x faster
- SSE2: 1.5-2x faster

  <details>
  <summary>Benchmark Results (click to expand)</summary>

  Benchmarked on AMD Ryzen 9 9950X (AVX-512 capable). Values show relative performance (1.00 = fastest).
  Throughput tops out at 139 GB/s for large inputs.

  ### early_non_ascii

  | Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
  |------------|------------|----------|------------|----------|
  | 64 | 1.01 | **1.00** | 13.45 | 1.13 |
  | 1024 | 1.01 | **1.00** | 13.53 | 1.14 |
  | 65536 | 1.01 | **1.00** | 13.99 | 1.12 |
  | 1048576 | 1.02 | **1.00** | 13.29 | 1.12 |

  ### late_non_ascii

  | Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
  |------------|------------|----------|------------|----------|
  | 64 | **1.00** | 1.01 | 13.37 | 1.13 |
  | 1024 | 1.10 | **1.00** | 42.42 | 1.95 |
  | 65536 | **1.00** | 1.06 | 42.22 | 1.73 |
  | 1048576 | **1.00** | 1.03 | 34.73 | 1.46 |

  ### pure_ascii

  | Input Size | new_avx512 | new_sse2 | old_avx512 | old_sse2 |
  |------------|------------|----------|------------|----------|
  | 4 | 1.03 | **1.00** | 1.75 | 1.32 |
  | 8 | **1.00** | 1.14 | 3.89 | 2.06 |
  | 16 | **1.00** | 1.04 | 1.13 | 1.62 |
  | 32 | 1.07 | 1.19 | 5.11 | **1.00** |
  | 64 | **1.00** | 1.13 | 13.32 | 1.57 |
  | 128 | **1.00** | 1.01 | 19.97 | 1.55 |
  | 256 | **1.00** | 1.02 | 27.77 | 1.61 |
  | 1024 | **1.00** | 1.02 | 41.34 | 1.84 |
  | 4096 | 1.02 | **1.00** | 45.61 | 1.98 |
  | 16384 | 1.01 | **1.00** | 48.67 | 2.04 |
  | 65536 | **1.00** | 1.03 | 43.86 | 1.77 |
  | 262144 | **1.00** | 1.06 | 41.44 | 1.79 |
  | 1048576 | 1.02 | **1.00** | 35.36 | 1.44 |

  </details>

## Reproduction / Test Projects

Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer

Relates to: https://github.com/llvm/llvm-project/issues/176906
2026-01-26 14:36:21 +11:00
asm s390x: support f16 and f16x8 in inline assembly 2026-01-09 18:42:46 +01:00
auxiliary Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
compiletest-self-test compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
libs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
naked-functions naked functions: emit .private_extern on macos 2026-01-06 16:48:04 +01:00
nvptx-kernel-abi Nvptx: Use llbc as default linker 2025-12-19 21:39:48 +01:00
sanitizer/kcfi compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
simd Ignore intrinsic calls in cross-crate-inlining cost model 2025-09-05 20:44:49 -04:00
stack-protector Rollup merge of #148849 - saethlin:windows-stack-protectors, r=wesleywiser 2025-12-18 18:37:14 +01:00
targets Add ARMv6 bare-metal targets 2026-01-24 17:29:25 +00:00
aarch64-pointer-auth.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
aarch64-xray.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
align_offset.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
breakpoint.rs Ignore intrinsic calls in cross-crate-inlining cost model 2025-09-05 20:44:49 -04:00
c-variadic-arm.rs implement va_arg for arm in rustc itself 2025-09-08 13:46:28 +02:00
closure-inherit-target-feature.rs sess: default to v0 symbol mangling 2025-11-19 11:55:09 +00:00
cmse.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
cstring-merging.rs Fix cstring-merging test for Hexagon target 2026-01-23 23:45:36 -06:00
dwarf-mixed-versions-lto.rs Fix tests/assembly-llvm/dwarf-mixed-versions-lto.rs test failure on riscv64 2025-07-23 11:14:07 +00:00
dwarf4.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
dwarf5.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
emit-intel-att-syntax.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
force-target-feature.rs Add an experimental unsafe(force_target_feature) attribute. 2025-08-22 01:26:26 +02:00
is_aligned.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
issue-83585-small-pod-struct-equality.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
large_data_threshold.rs Add -Z large-data-threshold 2026-01-07 11:57:48 -08:00
loongarch-float-struct-abi.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
manual-eq-efficient.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
niche-prefer-zero.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-arch-default.rs Nvptx: Use llbc as default linker 2025-12-19 21:39:48 +01:00
nvptx-arch-emit-asm.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-arch-link-arg.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-arch-target-cpu.rs Nvptx: Use llbc as default linker 2025-12-19 21:39:48 +01:00
nvptx-atomics.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-c-abi-arg-v7.rs Nvptx: Use llbc as default linker 2025-12-19 21:39:48 +01:00
nvptx-c-abi-ret-v7.rs Nvptx: Use llbc as default linker 2025-12-19 21:39:48 +01:00
nvptx-internalizing.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-linking-binary.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-linking-cdylib.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
nvptx-safe-naming.rs Nvptx: Use llbc as default linker 2025-12-19 21:39:48 +01:00
panic-no-unwind-no-uwtable.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
panic-unwind-no-uwtable.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
pic-relocation-model.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
pie-relocation-model.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
powerpc64-struct-abi.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
reg-struct-return.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
regparm-module-flag.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
riscv-float-struct-abi.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
riscv-soft-abi-with-float-features.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
rust-abi-arg-attr.rs adding minicore to test file to avoid duplicating lang error 2026-01-09 02:30:33 +00:00
s390x-backchain-toggle.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
s390x-vector-abi.rs stabilize s390x_target_feature_vector 2025-11-06 12:49:48 +01:00
simd-bitmask.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
simd-intrinsic-gather.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
simd-intrinsic-mask-load.rs Add alignment parameter to simd_masked_{load,store} 2025-11-04 02:30:59 +05:30
simd-intrinsic-mask-reduce.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
simd-intrinsic-mask-store.rs Add alignment parameter to simd_masked_{load,store} 2025-11-04 02:30:59 +05:30
simd-intrinsic-scatter.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
simd-intrinsic-select.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
slice-is-ascii.rs Mark is_ascii_sse2 as #[inline] 2026-01-25 20:05:08 +01:00
slice-is_ascii.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
slp-vectorize-closure.rs Add regression test for closure loop vectorization 2025-12-09 23:02:14 +09:00
small_data_threshold.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
sparc-struct-abi.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
stack-probes.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
static-relocation-model.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
strict_provenance.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
tail-call-infinite-recursion.rs add assembly test for infinite recursion with become 2025-11-13 16:57:02 +01:00
target-feature-multiple.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
wasm_exceptions.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
x86-return-float.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
x86_64-array-pair-load-store-merge.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
x86_64-bigint-helpers.rs x86_64-bigint-helpers test: update test assertion 2025-10-09 12:28:06 +00:00
x86_64-cmp.rs Merge similar output checks in assembly-llvm/x86_64-cmp 2025-09-16 11:49:21 -07:00
x86_64-floating-point-clamp.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
x86_64-fortanix-unknown-sgx-lvi-generic-load.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
x86_64-fortanix-unknown-sgx-lvi-generic-ret.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
x86_64-fortanix-unknown-sgx-lvi-inline-assembly.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
x86_64-function-return.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
x86_64-indirect-branch-cs-prefix.rs Add -Zindirect-branch-cs-prefix option 2025-08-17 16:51:42 +02:00
x86_64-mcount.rs Test instrument-mcount 2025-08-26 13:44:00 +00:00
x86_64-no-jump-tables.rs Stabilize -Zjump-tables=<bool> into -Cjump-table=<bool> 2025-11-03 08:12:16 -06:00
x86_64-sse_crc.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
x86_64-typed-swap.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00
x86_64-windows-float-abi.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
x86_64-windows-i128-abi.rs compiletest: rename add-core-stubs to add-minicore 2025-11-02 16:20:06 +01:00
x86_64-xray.rs Rename tests/assembly into tests/assembly-llvm 2025-07-22 14:27:48 +02:00