Commit graph

287 commits

Author SHA1 Message Date
Stuart Cook
3d102a7812
Rollup merge of #150893 - ZuseZ4:move-un-register-lib, r=oli-obk
offload: move (un)register lib into global_ctors

Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards.
What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes.

Together, these changes mean our generated IR uses far fewer globals, which in turn simplifies the refactoring in https://github.com/rust-lang/rust/pull/150683, where I introduce a new variant of our offload intrinsic.

r? oli-obk
2026-01-28 19:03:51 +11:00
Manuel Drehwald
35ce8ab120 adjust testcase for new logic 2026-01-27 10:43:21 -08:00
Stuart Cook
1c892e829c
Rollup merge of #147436 - okaneco:eq_ignore_ascii_autovec, r=scottmcm
slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization

- Refactor the current functionality into a helper function
- Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
- Add a codegen test checking for vectorization and no panicking
- Add benches for `eq_ignore_ascii_case`

---

The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16.
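
As a rough illustration of the shape described above (not the actual library code), the sketch below gates a chunked comparison behind the 16-byte length check. The function names are hypothetical, and it uses `chunks_exact(16)` as a stand-in for `as_chunks` so that it also builds on older toolchains. Accumulating the per-chunk result with `&=` avoids an early exit inside the chunk, which is the kind of loop an optimizer can usually vectorize.

```rust
/// Simple byte-by-byte comparison, used for short inputs and for the tail.
fn eq_ignore_ascii_case_simple(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.eq_ignore_ascii_case(y))
}

/// Hypothetical chunked variant: compares 16 bytes at a time without
/// branching inside a chunk. Assumes nothing beyond stable `std`.
pub fn eq_ignore_ascii_case_chunked(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Inputs shorter than 16 bytes take the simple path.
    if a.len() < 16 {
        return eq_ignore_ascii_case_simple(a, b);
    }

    let mut a_chunks = a.chunks_exact(16);
    let mut b_chunks = b.chunks_exact(16);
    for (ca, cb) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        // Fold the whole chunk into one flag; only branch once per chunk.
        let mut equal = true;
        for (x, y) in ca.iter().zip(cb) {
            equal &= x.to_ascii_lowercase() == y.to_ascii_lowercase();
        }
        if !equal {
            return false;
        }
    }

    // Compare whatever is left after the last full chunk.
    eq_ignore_ascii_case_simple(a_chunks.remainder(), b_chunks.remainder())
}
```

Whether a given compiler version actually vectorizes the inner loop is not guaranteed, which is why the PR pairs the change with a codegen test.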

Benchmarks: cases below 16 bytes are unaffected, while all cases of 16 bytes or more show sizeable improvements.
```
before:
    str::eq_ignore_ascii_case::bench_large_str_eq         4942.30ns/iter +/- 48.20
    str::eq_ignore_ascii_case::bench_medium_str_eq         632.01ns/iter +/- 16.87
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        16.28ns/iter  +/- 0.45
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        35.23ns/iter  +/- 2.28
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq       7.56ns/iter  +/- 0.22
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq    2.64ns/iter  +/- 0.06
after:
    str::eq_ignore_ascii_case::bench_large_str_eq         611.63ns/iter +/- 28.29
    str::eq_ignore_ascii_case::bench_medium_str_eq         77.10ns/iter +/- 19.76
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        3.49ns/iter  +/- 0.39
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        3.50ns/iter  +/- 0.27
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq      7.27ns/iter  +/- 0.09
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq   2.60ns/iter  +/- 0.05
```
2026-01-27 17:36:35 +11:00
Jonathan Pallant
6ecb3f33f0
Adds two new Tier 3 targets - aarch64v8r-unknown-none and aarch64v8r-unknown-none-softfloat.
The existing `aarch64-unknown-none` target assumes Armv8.0-A as a baseline. However, Arm recently released the Arm Cortex-R82 processor which is the first to implement the Armv8-R AArch64 mode architecture. This architecture is similar to Armv8-A AArch64, however it has a different set of mandatory features, and is based off of Armv8.4. It is largely unrelated to the existing Armv8-R architecture target (`armv8r-none-eabihf`), which only operates in AArch32 mode.

The second `aarch64v8r-unknown-none-softfloat` target allows for possible Armv8-R AArch64 CPUs with no FPU, or for use-cases where FPU register stacking is not desired. As with the existing `aarch64-unknown-none` target we have coupled FPU support and Neon support together - there is no 'has FPU but does not have NEON' target proposed even though the architecture technically allows for it.

This PR was developed by Ferrous Systems on behalf of Arm. Arm is the owner of these changes.
2026-01-26 12:43:52 +00:00
bors
873d4682c7 Auto merge of #151337 - the8472:bail-before-memcpy2, r=Mark-Simulacrum
optimize `vec.extend(slice.to_vec())`, take 2

Redoing https://github.com/rust-lang/rust/pull/130998
It was reverted in https://github.com/rust-lang/rust/pull/151150 due to flakiness. I have traced this to layout randomization perturbing the test (the failure reproduces locally with layout randomization), which is now excluded.
2026-01-25 19:45:35 +00:00
Matthias Krüger
0de96f455d
Rollup merge of #151405 - heiher:fix-cli, r=Mark-Simulacrum
LoongArch: Fix call-llvm-intrinsics test
2026-01-25 16:27:23 +01:00
Matthias Krüger
f6a8326a99
Rollup merge of #151404 - heiher:fix-dae, r=Mark-Simulacrum
LoongArch: Fix direct-access-external-data test

On LoongArch targets, `-Cdirect-access-external-data` defaults to `no`. Since copy relocations are not supported, `dso_local` is not emitted under `-Crelocation-model=static`, unlike on other targets.
2026-01-25 16:27:22 +01:00
Matthias Krüger
9dffb21112
Rollup merge of #150065 - is57primenumber:add-slice-cse-test, r=Mark-Simulacrum
add CSE optimization tests for iterating over slice

This PR is a regression test for issue rust-lang/rust#119573.
It verifies that Common Subexpression Elimination (CSE) is correctly applied across various slice iteration patterns.
2026-01-25 07:42:59 +01:00
Matthias Krüger
b651be2191
Rollup merge of #145393 - clubby789:issue-138497, r=Mark-Simulacrum
Add codegen test for removing trailing zeroes from `NonZero`

Closes rust-lang/rust#138497
2026-01-25 07:42:56 +01:00
bors
75963ce795 Auto merge of #151065 - nagisa:add-preserve-none-abi, r=petrochenkov
abi: add a rust-preserve-none calling convention

This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.

For relatively tight loops implemented with tail calling (`become`), each of the functions with the regular calling convention is still responsible for restoring the initial value of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of:

    foo:
        push r12
        ; do things
        pop r12
        jmp next_step

This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or other uses.

I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come up with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
2026-01-25 02:49:32 +00:00
Matthias Krüger
3a69035338
Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee
add `simd_splat` intrinsic

Add `simd_splat` which lowers to the LLVM canonical splat sequence.

```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```

Right now we try to fake it using one of

```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```

or (in `stdarch`)

```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```

Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:

- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804

---

As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.

Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.

Currently this just adds the intrinsic, it does not actually use it anywhere yet.
2026-01-24 21:04:15 +01:00
Simonas Kazlauskas
6db94dbc25 abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and
is particularly useful in combination with the new `explicit_tail_calls`
feature.

For relatively tight loops implemented with tail calling (`become`), each
of the functions with the regular calling convention is still responsible
for restoring the initial value of the preserved registers. So it is not
unusual to end up with a situation where each step in the tail call loop
is spilling and reloading registers, along the lines of:

    foo:
        push r12
        ; do things
        pop r12
        jmp next_step

This adds up quickly, especially when most of the clobberable registers
are already used to pass arguments or other uses.

I was thinking of making the name of this ABI a little less LLVM-derived
and more like a conceptual inverse of `rust-cold`, but could not come up
with a great name (`rust-cold` is itself not a great name: cold in what
context? from which perspective? is it supposed to mean that the
function is rarely called?)
2026-01-24 19:23:17 +02:00
Folkert de Vries
71f34429ac
const-eval: do not call immediate_const_vector on vector of pointers 2026-01-24 10:40:47 +01:00
Jonathan Brouwer
13f0399a57
Rollup merge of #151259 - bonega:fix-is-ascii-avx512, r=folkertdev
Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native

## Summary

This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`.

On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics.

This change is intended as a **temporary workaround** for poor upstream LLVM codegen. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted.

  ## Problem

  When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb` instruction. This causes a **~22x performance regression**.

  Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.

## Root cause (upstream)

The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns.

An upstream issue has been filed by @folkertdev  to track the root cause: llvm/llvm-project#176906

Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern.

  ## Solution

  Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.
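
A minimal sketch of the idea, assuming a hypothetical `is_ascii_sketch` function rather than the real library code: on x86_64 each 16-byte chunk is loaded with `_mm_loadu_si128` and its high bits are collected with `_mm_movemask_epi8`, so a zero mask means every byte in the chunk is ASCII. Other architectures fall back to a scalar check here purely so the example builds everywhere.

```rust
/// Hypothetical helper: true if every byte in `bytes` is ASCII.
pub fn is_ascii_sketch(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(16);
    for chunk in chunks.by_ref() {
        if !chunk_is_ascii(chunk) {
            return false;
        }
    }
    // Tail shorter than 16 bytes: plain scalar check.
    chunks.remainder().iter().all(|b| b.is_ascii())
}

#[cfg(target_arch = "x86_64")]
fn chunk_is_ascii(chunk: &[u8]) -> bool {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};
    assert_eq!(chunk.len(), 16);
    // SAFETY: SSE2 is part of the x86_64 baseline, `chunk` holds exactly
    // 16 readable bytes, and `_mm_loadu_si128` has no alignment requirement.
    let mask = unsafe {
        let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        // Bit i of the mask is the most significant bit of byte i.
        _mm_movemask_epi8(v)
    };
    mask == 0
}

#[cfg(not(target_arch = "x86_64"))]
fn chunk_is_ascii(chunk: &[u8]) -> bool {
    // Scalar fallback so the sketch compiles on other targets.
    chunk.iter().all(|b| b.is_ascii())
}
```

The sketch checks one vector at a time for clarity; combining several loads before taking the mask is a natural refinement but is not shown here.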

  ## Godbolt Links (Rust 1.92)

  | Pattern | Target | Link | Result |
  |---------|--------|------|--------|
  | Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` |
  | Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) |
  | SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` |
  | SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) |

  ## Benchmark Results

  **CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512)

  ### Default Target (SSE2) — Mixed

  | Size | Before | After | Change |
  |------|--------|-------|--------|
  | 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** |
  | 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** |
  | 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** |
  | 32 B | 17.7 GB/s | 15.8 GB/s | -11% |
  | 64 B | 28.6 GB/s | 25.1 GB/s | -12% |
  | 256 B | 51.5 GB/s | 48.6 GB/s | ~same |
  | 1 KB | 64.9 GB/s | 60.7 GB/s | ~same |
  | 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same |

  ### Native Target (AVX-512) — Up to 24x Faster

  | Size | Before | After | Speedup |
  |------|--------|-------|---------|
  | 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** |
  | 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** |
  | 16 B | ~7 GB/s | ~7 GB/s | ~same |
  | 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** |
  | 64 B | 2.9 GB/s | 23.2 GB/s | **8x** |
  | 256 B | 2.9 GB/s | 47.2 GB/s | **16x** |
  | 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** |
  | 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** |

  ### Summary

  - **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged
  - **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s)

  Note: this is the pure ascii path, but the story is similar for the others.
  See linked bench project.

  ## Test Plan

  - [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512
  - [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there)
  - [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
  - [x] Benchmarks confirm performance improvement
  - [x] Tidy checks pass

  ## Reproduction / Test Projects

  Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

  - `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
  - `fuzz/` - Compares old/new implementations with libfuzzer

  ## Related Issues
  - issue opened by @folkertdev llvm/llvm-project#176906
  - Regression introduced in https://github.com/rust-lang/rust/pull/130733
2026-01-24 08:18:05 +01:00
bors
9283d592de Auto merge of #151389 - scottmcm:vec-repeat, r=joboet
Use `repeat_packed` when calculating layouts in `RawVec`

Seeing whether this helps the icounts seen in https://github.com/rust-lang/rust/pull/148769#issuecomment-3769921666
2026-01-23 07:24:11 +00:00
Andreas Liljeqvist
c609cce8cf Merge is_ascii codegen tests using revisions
Combine the x86_64 and loongarch64 is_ascii tests into a single file
using compiletest revisions. Both now test assembly output:

- X86_64: Verifies no broken kshiftrd/kshiftrq instructions (AVX-512 fix)
- LA64: Verifies vmskltz.b instruction is used (auto-vectorization)
2026-01-22 22:18:00 +01:00
Matthew Maurer
b639b0a4d8 llvm: Tolerate dead_on_return attribute changes
The attribute now has a size parameter and sorts differently:
* Explicitly omit size parameter during construction on 23+
* Tolerate alternate sorting in tests

https://github.com/llvm/llvm-project/pull/171712
2026-01-21 23:39:03 +00:00
Scott McMurray
c3f309e32b Use repeat_packed when calculating layouts in RawVec 2026-01-21 01:11:12 -08:00
Jacob Pratt
43d2006c25
Rollup merge of #150436 - va-list-copy, r=workingjubilee,RalfJung
`c_variadic`: impl `va_copy` and `va_end` as Rust intrinsics

tracking issue: https://github.com/rust-lang/rust/issues/44930

Implement `va_copy` as (the rust equivalent of) `memcpy`, which is the behavior of all current LLVM targets. By providing our own implementation, we can guarantee its behavior. These guarantees are important for implementing c-variadics in e.g. const-eval.

Discussed in [#t-compiler/const-eval > c-variadics in const-eval](https://rust-lang.zulipchat.com/#narrow/channel/146212-t-compiler.2Fconst-eval/topic/c-variadics.20in.20const-eval/with/565509704).

I've also updated the comment for `Drop` a bit. The background here is that the C standard requires that `va_end` is used in the same function (and really, in the same scope) as the corresponding `va_start` or `va_copy`. That is because historically `va_start` would start a scope, which `va_end` would then close. e.g.

https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol

```c
#define         va_start(ap, parmN)     {\
        va_buf  _va;\
        _vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define         va_end(ap)      }
#define         va_arg(ap, mode)        *((mode *)_vaarg(ap, sizeof (mode)))
```

The C standard still has to consider such implementations, but for Rust they are irrelevant. Hence we can use `Clone` for `va_copy` and `Drop` for `va_end`.
2026-01-20 19:46:29 -05:00
Jamie Hill-Daniel
76438f032a Add codegen test for issue 138497 2026-01-20 21:37:31 +00:00
Folkert de Vries
dd9241d150
c_variadic: use Clone instead of LLVM va_copy 2026-01-20 18:38:50 +01:00
Nikita Popov
0be66603ac Avoid passing addrspacecast to lifetime intrinsics
Since LLVM 22 the alloca must be passed directly. Do this by
stripping the addrspacecast if it exists.
2026-01-20 14:47:04 +01:00
WANG Rui
d977471ce2 LoongArch: Fix call-llvm-intrinsics test 2026-01-20 19:43:06 +08:00
WANG Rui
e3f198ec05 LoongArch: Fix direct-access-external-data test
On LoongArch targets, `-Cdirect-access-external-data` defaults to `no`.
Since copy relocations are not supported, `dso_local` is not emitted
under `-Crelocation-model=static`, unlike on other targets.
2026-01-20 16:26:15 +08:00
Stuart Cook
1262ff906b
Rollup merge of #150288 - offload-bench-fix, r=ZuseZ4
Add scalar support for offload

This PR adds scalar support to the offload feature. The scalar management has two main parts:

On the host side, each scalar arg is cast to the `ix` type, zero-extended to `i64`, and passed to the kernel in that form.
On the device, each scalar arg (an `i64` at that point) is truncated to `ix` and then cast back to the original type.
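
The round trip can be modelled in plain Rust as a sketch. The real implementation operates on LLVM IR; the helper names below are hypothetical and only two scalar types are shown. An `f32` is reinterpreted as its 32-bit integer pattern and widened to 64 bits on the "host" side, then narrowed and reinterpreted back on the "device" side.

```rust
/// Host side (sketch): bit-cast the scalar to an integer of the same
/// width, then zero-extend to 64 bits.
pub fn pack_f32(x: f32) -> u64 {
    u64::from(x.to_bits())
}

/// Device side (sketch): truncate to the original width, then bit-cast
/// back to the original type.
pub fn unpack_f32(raw: u64) -> f32 {
    f32::from_bits(raw as u32)
}

/// Same scheme for a 16-bit signed integer. Going through `u16` first
/// makes the widening a zero extension rather than a sign extension.
pub fn pack_i16(x: i16) -> u64 {
    u64::from(x as u16)
}

pub fn unpack_i16(raw: u64) -> i16 {
    (raw as u16) as i16
}
```

Because the widening is a zero extension and the narrowing simply drops the upper bits, the round trip preserves the original bit pattern for any scalar that fits in 64 bits.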

r? @ZuseZ4
2026-01-20 18:00:08 +11:00
Marcelo Domínguez
307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Folkert de Vries
80c0b99de0
add simd_splat intrinsic 2026-01-19 16:48:28 +01:00
Jonathan Brouwer
a56e2d3037
Rollup merge of #151071 - gen-openmp-metadata, r=nnethercote
Generate openmp metadata

LLVM has an openmp-opt pass, which is part of the default O3 pipeline.
The pass bails if we don't have a global called openmp, so let's generate it if people enable our experimental offload feature. openmp is a superset of the offload feature, so they share optimizations.
In follow-up PRs I'll start verifying that LLVM optimizes Rust the way we want it.

r? compiler
2026-01-19 08:31:31 +01:00
The 8472
2b8f4a562f avoid phi node for pointers flowing into Vec appends 2026-01-18 21:03:14 +01:00
Andreas Liljeqvist
a0f9a15b4a Fix is_ascii performance regression on AVX-512 CPUs
When `[u8]::is_ascii()` is compiled with `-C target-cpu=native` on
AVX-512 CPUs, LLVM generates inefficient code. Because `is_ascii` is
marked `#[inline]`, it gets inlined and recompiled with the user's
target settings. The previous implementation used a counting loop that
LLVM auto-vectorizes to `pmovmskb` on SSE2, but with AVX-512 enabled,
LLVM uses k-registers and extracts bits individually with ~31
`kshiftrd` instructions.

This fix replaces the counting loop with explicit SSE2 intrinsics
(`_mm_loadu_si128`, `_mm_or_si128`, `_mm_movemask_epi8`) for x86_64.
`_mm_movemask_epi8` compiles to `pmovmskb`, forcing efficient codegen
regardless of CPU features.

Benchmark results on AMD Ryzen 5 7500F (Zen 4 with AVX-512):
- Default build: ~73 GB/s → ~74 GB/s (no regression)
- With -C target-cpu=native: ~3 GB/s → ~67 GB/s (22x improvement)

The loongarch64 implementation retains the original counting loop
since it doesn't have this issue.

Regression from: https://github.com/rust-lang/rust/pull/130733
2026-01-17 17:38:51 +01:00
Manuel Drehwald
5c85d522d0 Generate global openmp metadata to trigger llvm openmp-opt pass 2026-01-16 14:57:32 -05:00
Jacob Pratt
6912c676cd
Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang/rust#135024
2026-01-15 19:35:46 -05:00
Jieyou Xu
cd79ff2e2c
Revert "avoid phi node for pointers flowing into Vec appends #130998"
This reverts PR <https://github.com/rust-lang/rust/pull/130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
2026-01-15 09:37:16 +08:00
bors
86a49fd71f Auto merge of #130998 - the8472:bail-before-memcpy, r=nnethercote
avoid phi node for pointers flowing into Vec appends

Elide temporary allocations in patterns like `vec.append(slice.to_vec())`
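
For illustration, the kind of code this targets looks roughly like the first function below, where a temporary `Vec` is created only to be copied into the destination. The function names are hypothetical, and the sketch uses `extend`, as in the title of the follow-up PR. The second function is the hand-written form without the temporary that the optimized code should ideally match.

```rust
/// Pattern with a temporary: `to_vec` allocates, then `extend` copies
/// the elements into `dst` and drops the temporary.
pub fn append_via_temp(dst: &mut Vec<u8>, src: &[u8]) {
    dst.extend(src.to_vec());
}

/// Equivalent result without the intermediate allocation.
pub fn append_direct(dst: &mut Vec<u8>, src: &[u8]) {
    dst.extend_from_slice(src);
}
```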

related discussion: https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/nocapture.20and.20allocation.20elimination
2026-01-14 16:36:26 +00:00
Jonathan Brouwer
b431a5e685
Rollup merge of #151067 - ui_test_no_should_fail, r=lqd
Avoid should-fail in two ui tests and a codegen-llvm test

`should-fail` is only meant for testing the compiletest framework itself. It checks that the test runner itself panicked.

With this there are still a bunch of rustdoc-html tests that use it due to this test suite not supporting anything like `//@ doc-fail`.
2026-01-14 11:05:40 +01:00
bjorn3
15112eee67 Avoid should-fail in a codegen-llvm test 2026-01-13 15:21:20 +00:00
Hans Wennborg
6ca950136d Relax test expectation for @__llvm_profile_runtime_user
After https://github.com/llvm/llvm-project/pull/174174 it has profile
info marking it cold.
2026-01-12 11:03:07 +01:00
The 8472
468eb45b3f avoid phi node for pointers flowing into Vec appends 2026-01-12 02:54:30 +01:00
Stuart Cook
30585ebbd3
Rollup merge of #150494 - extern_linkage_dso_local, r=bjorn3
Fix dso_local for external statics with linkage

Tracking issue of the feature: rust-lang/rust#127488

DSO local attributes are not correctly applied to extern statics with `#[linkage = "foo"]` because we generate an internal global for such statics, and then we evaluate (and apply) DSO attributes on the internal one instead.

Fix this by applying DSO local attributes on the actually extern ones, too.
2026-01-11 14:27:55 +11:00
Flakebi
91d4e40e02
Add amdgpu_dispatch_ptr intrinsic
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.

The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.

The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
2026-01-09 10:41:37 +01:00
is57primenumber
65c17223c1 add CSE optimization tests for iterating over slice 2026-01-05 05:17:56 +09:00
Matthias Krüger
1494755275
Rollup merge of #150426 - ZuseZ4:offload-register-lib, r=davidtwco
Update offload test and verify that tgt_(un)register_lib have the right type

Apparently, we weren't running offload tests when Enzyme wasn't built. Time to fix that.
Also adds a test mode which generates the host IR, but does not expect device IR/artifacts. This way, we don't have to handle artifacts and paths in our tests.
Also removes some outdated documentation.

cc `@Kevinsala`, `@Sa4dUs`

closes: https://github.com/rust-lang/rust/issues/150415

~~blocked on `needs-offload` infrastructure landing in https://github.com/rust-lang/rust/pull/150427~~
2026-01-04 21:14:05 +01:00
Manuel Drehwald
fa584faca5 Update test and verify that tgt_(un)register_lib have the right type 2026-01-04 06:58:31 -08:00
bors
f57b9e6f56 Auto merge of #150564 - rwardd:rwardd/option_or_codegen_tests, r=scottmcm
Added codegen tests for different forms of `Option::or`

Adds tests to check the output of the different ways of writing `Option::or`

Fixes rust-lang/rust#124533
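
To make "different forms" concrete, here is a sketch of three equivalent ways to write the operation. The function names and the choice of `u32` are assumptions for illustration; the exact forms and types covered by the tests may differ.

```rust
/// The library method.
pub fn or_method(a: Option<u32>, b: Option<u32>) -> Option<u32> {
    a.or(b)
}

/// The same logic spelled out with `match`.
pub fn or_match(a: Option<u32>, b: Option<u32>) -> Option<u32> {
    match a {
        Some(x) => Some(x),
        None => b,
    }
}

/// The same logic spelled out with `if`/`else`.
pub fn or_if(a: Option<u32>, b: Option<u32>) -> Option<u32> {
    if a.is_some() { a } else { b }
}
```

All three return the same value for every input, so a codegen test can reasonably expect them to compile to comparable code.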
2026-01-03 22:47:35 +00:00
Ryan Ward
a2fcb0de18 fix: add CHECK directives to ret comments and be more pervasive with directive contents 2026-01-03 12:50:38 +10:30
Ryan
3df06f5083
fix: use std::num::NonZero instead of extern crate and extend information in CHECK- directives
Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
2026-01-03 10:53:54 +10:30
bors
85c8ff69cb Auto merge of #150606 - JonathanBrouwer:rollup-lue4jqz, r=JonathanBrouwer
Rollup of 6 pull requests

Successful merges:

 - rust-lang/rust#150425 (mapping an error from cmd.spawn() in npm::install)
 - rust-lang/rust#150444 (Expose kernel launch options as offload intrinsic args)
 - rust-lang/rust#150495 (Correct hexagon "unwinder_private_data_size")
 - rust-lang/rust#150578 (Fix a typo in the docs of AsMut for rust-lang/rust#149609)
 - rust-lang/rust#150581 (mir_build: Separate match lowering for string-equality and scalar-equality)
 - rust-lang/rust#150594 (Fix typo in the docs of `CString::from_vec_with_nul`)

r? `@ghost`
`@rustbot` modify labels: rollup
2026-01-02 19:45:27 +00:00
bors
5497a36a7f Auto merge of #149658 - Enselic:non-zero-opt, r=Mark-Simulacrum
tests/codegen-llvm/some-non-zero-from-atomic-optimization.rs: New test

Closes rust-lang/rust#60044 which has one 👍 and one ❤️  vote and just **E-needs-test**.
2026-01-02 16:29:24 +00:00
Marcelo Domínguez
58e2610f71 Expose workgroup/thread dims as intrinsic args 2026-01-02 11:50:32 +01:00
Ryan Ward
66c4ead02d fix: added further CHECK-SAME labels and replaced all struct input tests with NonZero<u8> input 2026-01-02 12:54:17 +10:30