Commit graph

3289 commits

Author SHA1 Message Date
Stuart Cook
3d102a7812
Rollup merge of #150893 - ZuseZ4:move-un-register-lib, r=oli-obk
offload: move (un)register lib into global_ctors

Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards.
What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes.

Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in https://github.com/rust-lang/rust/pull/150683, where I introduce a new variant of our offload intrinsic.

r? oli-obk
2026-01-28 19:03:51 +11:00
Manuel Drehwald
1f11bf6649 Leave note to drop tgt_init_all_rtls in the future 2026-01-27 10:43:22 -08:00
Manuel Drehwald
7eae36f017 Add an early return if handling multiple offload calls 2026-01-27 10:43:03 -08:00
bors
873d4682c7 Auto merge of #151337 - the8472:bail-before-memcpy2, r=Mark-Simulacrum
optimize `vec.extend(slice.to_vec())`, take 2

Redoing https://github.com/rust-lang/rust/pull/130998
It was reverted in https://github.com/rust-lang/rust/pull/151150 due to flakiness. I have traced this to layout randomization perturbing the test (the failure reproduces locally with layout randomization), which is now excluded.
2026-01-25 19:45:35 +00:00
bors
75963ce795 Auto merge of #151065 - nagisa:add-preserve-none-abi, r=petrochenkov
abi: add a rust-preserve-none calling convention

This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.

For relatively tight loops implemented with tail calling (`become`) each of the function with the regular calling convention is still responsible for restoring the initial value of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of:

    foo:
        push r12
        ; do things
        pop r12
        jmp next_step

This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or other uses.

I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
2026-01-25 02:49:32 +00:00
Matthias Krüger
3a69035338
Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee
add `simd_splat` intrinsic

Add `simd_splat` which lowers to the LLVM canonical splat sequence.

```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```

Right now we try to fake it using one of

```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```

or (in `stdarch`)

```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```

Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:

- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804

---

As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.

Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.

Currently this just adds the intrinsic, it does not actually use it anywhere yet.
2026-01-24 21:04:15 +01:00
Simonas Kazlauskas
6db94dbc25 abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and
is particularly useful in combination with the new `explicit_tail_calls`
feature.

For relatively tight loops implemented with tail calling (`become`) each
of the function with the regular calling convention is still responsible
for restoring the initial value of the preserved registers. So it is not
unusual to end up with a situation where each step in the tail call loop
is spilling and reloading registers, along the lines of:

    foo:
        push r12
        ; do things
        pop r12
        jmp next_step

This adds up quickly, especially when most of the clobberable registers
are already used to pass arguments or other uses.

I was thinking of making the name of this ABI a little less LLVM-derived
and more like a conceptual inverse of `rust-cold`, but could not come
with a great name (`rust-cold` is itself not a great name: cold in what
context? from which perspective? is it supposed to mean that the
function is rarely called?)
2026-01-24 19:23:17 +02:00
Jonathan Brouwer
48b9a6c298
Rollup merge of #151527 - tgross35:f16-fixme-cleanup, r=folkertdev
Clean up or resolve cfg-related instances of `FIXME(f16_f128)`

* Replace target-specific config that has a `FIXME` with `cfg(target_has_reliable_f*)`
* Take care of trivial intrinsic-related FIXMEs
* Split `FIXME(f16_f128)` into `FIXME(f16)`, `FIXME(f128)`, or `FIXME(f16,f128)` to more clearly identify what they block

The individual commit messages have more details.
2026-01-23 11:07:57 +01:00
Jonathan Brouwer
dec8d6ebcf
Rollup merge of #150780 - fzakaria:fzakaria/section-threshold, r=jackh726
Add -Z large-data-threshold

This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.
2026-01-23 11:07:55 +01:00
Trevor Gross
490b307740 cleanup: Start splitting FIXME(f16_f128) into f16, f128, or f16,f128
Make it easier to identify which FIXMEs are blocking stabilization of
which type.
2026-01-22 23:41:57 -06:00
Jonathan Brouwer
704eaef9d4
Rollup merge of #151465 - RalfJung:fn-call-vars, r=mati865
codegen: clarify some variable names around function calls

I looked at rust-lang/rust#145932 to try to understand how it works, and quickly got lost in the variable names -- what refers to the caller, what to the callee? So here's my attempt at making those more clear. Hopefully the new names are correct.^^

Cc @JamieCunliffe
2026-01-22 13:35:42 +01:00
Jacob Pratt
512cc8d785
Rollup merge of #151442 - clubby789:crate-type-port, r=JonathanBrouwer
Port `#![crate_type]` to the attribute parser

Tracking issue: https://github.com/rust-lang/rust/issues/131229

~~Note that the actual parsing that is used in the compiler session is unchanged, as it must happen very early on; this just ports the validation logic.~~

Also added `// tidy-alphabetical-start` to `check_attr.rs` to make it a bit less conflict-prone
2026-01-22 00:37:43 -05:00
Jamie Hill-Daniel
66b78b700b Port crate_type to attribute parser 2026-01-22 02:34:28 +00:00
León Orell Valerian Liehr
558a59258e
Support debuginfo for assoc const bindings 2026-01-21 18:52:08 +01:00
Ralf Jung
29ed211215 codegen: clarify some variable names around function calls 2026-01-21 18:01:30 +01:00
Jacob Pratt
2206d935f7
Rollup merge of #149209 - lto_refactors8, r=jackh726
Move LTO to OngoingCodegen::join

This will make it easier to in the future move all this code to link_binary.

Follow up to https://github.com/rust-lang/rust/pull/147810
Part of https://github.com/rust-lang/compiler-team/issues/908
2026-01-21 02:04:01 -05:00
Manuel Drehwald
43111396e3 move initialization of omp/ol runtimes into global_ctor/dtor 2026-01-20 20:06:08 -05:00
Jacob Pratt
db9ff0d44f
Rollup merge of #151429 - s390x, r=durin42
s390x: Support aligned stack datalayout

LLVM 23 will mark the stack as aligned for more efficient code:

https://github.com/llvm/llvm-project/pull/176041

r? durin42

@rustbot label llvm-main
2026-01-20 19:46:32 -05:00
Jacob Pratt
43d2006c25
Rollup merge of #150436 - va-list-copy, r=workingjubilee,RalfJung
`c_variadic`: impl `va_copy` and `va_end` as Rust intrinsics

tracking issue: https://github.com/rust-lang/rust/issues/44930

Implement `va_copy` as (the rust equivalent of) `memcpy`, which is the behavior of all current LLVM targets. By providing our own implementation, we can guarantee its behavior. These guarantees are important for implementing c-variadics in e.g. const-eval.

Discussed in [#t-compiler/const-eval > c-variadics in const-eval](https://rust-lang.zulipchat.com/#narrow/channel/146212-t-compiler.2Fconst-eval/topic/c-variadics.20in.20const-eval/with/565509704).

I've also updated the comment for `Drop` a bit. The background here is that the C standard requires that `va_end` is used in the same function (and really, in the same scope) as the corresponding `va_start` or `va_copy`. That is because historically `va_start` would start a scope, which `va_end` would then close. e.g.

https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol

```c
#define         va_start(ap, parmN)     {\
        va_buf  _va;\
        _vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define         va_end(ap)      }
#define         va_arg(ap, mode)        *((mode *)_vaarg(ap, sizeof (mode)))
```

The C standard still has to consider such implementations, but for Rust they are irrelevant. Hence we can use `Clone` for `va_copy` and `Drop` for `va_end`.
2026-01-20 19:46:29 -05:00
Matthew Maurer
39296ff8f8 s390x: Support aligned stack datalayout
LLVM 23 will mark the stack as aligned for more efficient code:

https://github.com/llvm/llvm-project/pull/176041
2026-01-20 20:21:11 +00:00
Folkert de Vries
dd9241d150
c_variadic: use Clone instead of LLVM va_copy 2026-01-20 18:38:50 +01:00
Nikita Popov
08da3685ed Don't use evex512 with LLVM 22
As Intel has walked back on the existence of AVX 10.1-256, LLVM
no longer uses evex512 and avx-10.n-512 are now avx-10.n instead,
so we can skip all the special handling on LLVM 22.
2026-01-20 14:47:09 +01:00
Nikita Popov
0be66603ac Avoid passing addrspacecast to lifetime intrinsics
Since LLVM 22 the alloca must be passed directly. Do this by
stripping the addrspacecast if it exists.
2026-01-20 14:47:04 +01:00
Nikita Popov
bf3ac98d69 Update amdgpu data layout
This changed in:
853760bca6
2026-01-20 14:46:58 +01:00
Stuart Cook
1262ff906b
Rollup merge of #150288 - offload-bench-fix, r=ZuseZ4
Add scalar support for offload

This PR adds scalar support to the offload feature. The scalar management has two main parts:

On the host side, each scalar arg is casted to `ix` type, zero extended to `i64` and passed to the kernel like that.
On the device, the each scalar arg (`i64` at that point), is truncated to `ix` and then casted to the original type.

r? @ZuseZ4
2026-01-20 18:00:08 +11:00
Marcelo Domínguez
307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Folkert de Vries
80c0b99de0
add simd_splat intrinsic 2026-01-19 16:48:28 +01:00
bors
d940e56841 Auto merge of #151363 - JonathanBrouwer:rollup-yIXELnN, r=JonathanBrouwer
Rollup of 2 pull requests

Successful merges:

 - rust-lang/rust#151336 (Port rustc codegen attrs)
 - rust-lang/rust#151359 (compiler: Temporarily re-export `assert_matches!` to reduce stabilization churn)

r? @ghost
2026-01-19 13:09:33 +00:00
Jonathan Brouwer
a56e2d3037
Rollup merge of #151071 - gen-openmp-metadata, r=nnethercote
Generate openmp metadata

LLVM has an openmp-opt pass, which is part of the default O3 pipeline.
The pass bails if we don't have a global called openmp, so let's generate it if people enable our experimental offload feature. openmp is a superset of the offload feature, so they share optimizations.
In follow-up PRs I'll start verifying that LLVM optimizes Rust the way we want it.

r? compiler
2026-01-19 08:31:31 +01:00
Zalathar
7ec34defe9 Temporarily re-export assert_matches! to reduce stabilization churn 2026-01-19 18:26:53 +11:00
Edvin Bryntesson
9a931e8bf2
Port #[rustc_allocator_zeroed_variant] to attr parser 2026-01-18 20:13:13 +01:00
Manuel Drehwald
5c85d522d0 Generate global openmp metadata to trigger llvm openmp-opt pass 2026-01-16 14:57:32 -05:00
Jacob Pratt
6912c676cd
Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang/rust#135024
2026-01-15 19:35:46 -05:00
The 8472
3df0dc8803 mark rust_dealloc as captures(address)
Co-authored-by: Ralf Jung <post@ralfj.de>
2026-01-15 20:38:40 +01:00
Jieyou Xu
cd79ff2e2c
Revert "avoid phi node for pointers flowing into Vec appends #130998"
This reverts PR <https://github.com/rust-lang/rust/pull/130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
2026-01-15 09:37:16 +08:00
Jonathan Brouwer
d23e780a57
Rollup merge of #150966 - arch-powerpc64le, r=petrochenkov
rustc_target: Remove unused Arch::PowerPC64LE

This variant has been added in https://github.com/rust-lang/rust/pull/147645, but actually unused since target_arch for powerpc64le- targets is "powerpc64". (The difference between powerpc64- and powerpc64le- targets is identified by target_endian.)

Note: This is an internal cleanup and does NOT remove `powerpc64le-*` targets.
2026-01-14 22:29:57 +01:00
bors
86a49fd71f Auto merge of #130998 - the8472:bail-before-memcpy, r=nnethercote
avoid phi node for pointers flowing into Vec appends

Elide temporary allocations in patterns like `vec.append(slice.to_vec())`

related discussion: https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/nocapture.20and.20allocation.20elimination
2026-01-14 16:36:26 +00:00
Taiki Endo
7d80e7d720 rustc_target: Remove unused Arch::PowerPC64LE
target_arch for powerpc64le- targets is "powerpc64".
2026-01-14 23:12:57 +09:00
Marcelo Domínguez
2c9c5d14a2 Allow bounded types 2026-01-14 11:37:31 +01:00
Marcelo Domínguez
bc751adcdb Minor doc and ty fixes 2026-01-14 11:37:31 +01:00
Nicholas Nethercote
3aa31788b5 Remove Deref/DerefMut impl for Providers.
It's described as a "backwards compatibility hack to keep the diff
small". Removing it requires only a modest amount of churn, and the
resulting code is clearer without the invisible derefs.
2026-01-14 15:55:59 +11:00
The 8472
e6071522db mark rust_dealloc as captures(address)
Co-authored-by: Ralf Jung <post@ralfj.de>
2026-01-12 02:54:22 +01:00
Nicholas Nethercote
5e510929c6 Remove useless call to erase_and_anonymize_regions.
The only thing we do with the result is consult the `.def_id` field,
which is unaffected by `erase_and_anonymize_regions`.
2026-01-12 09:22:58 +11:00
Matthias Krüger
cbdfa9167f
Rollup merge of #150908 - llvm-f16-cfg, r=nikic
llvm: Update `reliable_f16` configuration for LLVM22

Since yesterday, the LLVM `main` branch should have working `f16` on all platforms that Rust supports; this will be LLVM version 22, so update how `cfg(target_has_reliable_f16)` is set to reflect this.

Within the rust-lang organization, this currently has no effect. The goal is to start catching problems as early as possible in external CI that runs top-of-tree rust against top-of-tree LLVM, and once testing for the rust-lang bump to LLVM 22 starts. Hopefully this will mean that we can fix any problems that show up before the bump actually happens, meaning `f16` will be about ready for stabilization at that point (with some considerations for the GCC patch at [1] propagating).

References:

* 919021b0df
* 054ee2f870
* db26ce5c55
* 549d7c4f35
* 4903c6260c

[1]: 8b6a18ecaf
2026-01-11 09:56:50 +01:00
Stuart Cook
30585ebbd3
Rollup merge of #150494 - extern_linkage_dso_local, r=bjorn3
Fix dso_local for external statics with linkage

Tracking issue of the feature: rust-lang/rust#127488

DSO local attributes are not correctly applied to extern statics with `#[linkage = "foo"]` as we generate an internal global for such statics, and the we evaluate (and apply) DSO attributes on the internal one instead.

Fix this by applying DSO local attributes on the actually extern ones, too.
2026-01-11 14:27:55 +11:00
Trevor Gross
07fa70e104 llvm: Update reliable_f16 configuration for LLVM22
Since yesterday, the LLVM `main` branch should have working `f16` on all
platforms that Rust supports; this will be LLVM version 22, so update
how `cfg(target_has_reliable_f16)` is set to reflect this.

Within the rust-lang organization, this currently has no effect. The
goal is to start catching problems as early as possible in external CI
that runs top-of-tree rust against top-of-tree LLVM, and once testing
for the rust-lang bump to LLVM 22 starts. Hopefully this will mean that
we can fix any problems that show up before the bump actually happens,
meaning `f16` will be about ready for stabilization at that point (with
some considerations for the GCC patch at [1] propagating).

References:

* 919021b0df
* 054ee2f870
* db26ce5c55
* 549d7c4f35
* 4903c6260c

[1]: 8b6a18ecaf
2026-01-09 19:44:24 -06:00
Tshepang Mbambo
229673ac85
make sentence more simple 2026-01-09 22:49:32 +02:00
Tshepang Mbambo
8e61f0de27
cg_llvm: add a pause to make comment less confusing 2026-01-09 22:47:59 +02:00
Flakebi
91d4e40e02
Add amdgpu_dispatch_ptr intrinsic
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.

The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.

The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
2026-01-09 10:41:37 +01:00
Matthias Krüger
d464630301
Rollup merge of #150811 - defid-aliases, r=bjorn3
Store defids instead of symbol names in the aliases list

I was honestly surprised this worked in the past. This causes a cycle error since we now compute a symbol name in codegen_attrs, and then compute codegen attrs when we try to get the symbol name.

It only worked when there weren't any codegen attributes to begin with, causing symbol name computation to skip the call to codegen_attrs.

Like this we won't have the same problem.

r? @bjorn3
2026-01-08 22:21:21 +01:00