`c_variadic`: impl `va_copy` and `va_end` as Rust intrinsics
tracking issue: https://github.com/rust-lang/rust/issues/44930
Implement `va_copy` as (the rust equivalent of) `memcpy`, which is the behavior of all current LLVM targets. By providing our own implementation, we can guarantee its behavior. These guarantees are important for implementing c-variadics in e.g. const-eval.
Discussed in [#t-compiler/const-eval > c-variadics in const-eval](https://rust-lang.zulipchat.com/#narrow/channel/146212-t-compiler.2Fconst-eval/topic/c-variadics.20in.20const-eval/with/565509704).
I've also updated the comment for `Drop` a bit. The background here is that the C standard requires that `va_end` is used in the same function (and really, in the same scope) as the corresponding `va_start` or `va_copy`. That is because historically `va_start` would open a scope, which `va_end` would then close, as in this cfront header:
https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol
```c
#define va_start(ap, parmN) {\
va_buf _va;\
_vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define va_end(ap) }
#define va_arg(ap, mode) *((mode *)_vaarg(ap, sizeof (mode)))
```
The C standard still has to consider such implementations, but for Rust they are irrelevant. Hence we can use `Clone` for `va_copy` and `Drop` for `va_end`.
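To picture the `Clone`/`Drop` mapping, here is a minimal toy model. It is not the real `core::ffi::VaList`: `ToyVaList`, its fields, and `next_arg` are hypothetical names, and the argument area is modeled as a plain slice. It only illustrates that a plain copy of the cursor state yields an independent copy (the `va_copy` role) and that dropping it needs no cleanup (the `va_end` role).

```rust
/// Hypothetical stand-in for a variadic argument cursor; NOT `core::ffi::VaList`.
#[derive(Clone)] // cloning copies the cursor state as-is, much like a `memcpy`
struct ToyVaList<'a> {
    args: &'a [i64], // stand-in for the caller's argument area
    pos: usize,      // index of the next argument to read
}

impl<'a> ToyVaList<'a> {
    /// Reads the next argument, playing the role of `va_arg`.
    fn next_arg(&mut self) -> Option<i64> {
        let value = self.args.get(self.pos).copied();
        if value.is_some() {
            self.pos += 1;
        }
        value
    }
}

impl Drop for ToyVaList<'_> {
    /// Plays the role of `va_end`; there is nothing to release in this model.
    fn drop(&mut self) {}
}

fn main() {
    let storage = [10, 20, 30];
    let mut ap = ToyVaList { args: &storage, pos: 0 };
    assert_eq!(ap.next_arg(), Some(10));

    // The `va_copy` role: the copy starts at the same position as `ap`...
    let mut copy = ap.clone();
    // ...and then advances independently of it.
    assert_eq!(ap.next_arg(), Some(20));
    assert_eq!(copy.next_arg(), Some(20));
    // Both values are dropped at the end of scope (the `va_end` role).
}
```

Because dropping happens wherever the value goes out of scope, this model has no notion of "the same scope as `va_start`", which is the restriction the paragraph above argues Rust does not need.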
As Intel has walked back on the existence of AVX 10.1-256, LLVM
no longer uses evex512 and avx-10.n-512 are now avx-10.n instead,
so we can skip all the special handling on LLVM 22.
Add scalar support for offload
This PR adds scalar support to the offload feature. The scalar management has two main parts:
On the host side, each scalar argument is cast to the `ix` type, zero-extended to `i64`, and passed to the kernel in that form.
On the device, each scalar argument (an `i64` at that point) is truncated to `ix` and then cast back to its original type.
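The round trip can be sketched in plain Rust. This is only an illustration of the bit manipulation, not the code the compiler generates; the helper names (`pack_f32`, `unpack_f32`, `pack_i16`, `unpack_i16`) are hypothetical, and `u64` stands in for the `i64` kernel argument.

```rust
/// Host side: reinterpret the `f32` as a 32-bit integer, then zero-extend to 64 bits.
fn pack_f32(x: f32) -> u64 {
    let bits: u32 = x.to_bits(); // cast to the same-width integer type
    bits as u64 // widening an unsigned integer zero-extends
}

/// Device side: truncate to 32 bits, then reinterpret as the original type.
fn unpack_f32(raw: u64) -> f32 {
    let bits = raw as u32; // truncation keeps the low 32 bits
    f32::from_bits(bits)
}

/// Same scheme for a 16-bit signed integer.
fn pack_i16(x: i16) -> u64 {
    (x as u16) as u64 // go through `u16` so the widening is a zero extension
}

fn unpack_i16(raw: u64) -> i16 {
    (raw as u16) as i16
}

fn main() {
    assert_eq!(unpack_f32(pack_f32(-1.5)), -1.5);
    // Zero extension: the upper 48 bits stay clear even for negative values.
    assert_eq!(pack_i16(-1), 0xFFFF);
    assert_eq!(unpack_i16(pack_i16(-1)), -1);
}
```

Zero extension (rather than sign extension) is harmless here because the device side truncates back to the original width before reinterpreting the bits.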
r? @ZuseZ4
Generate openmp metadata
LLVM has an openmp-opt pass, which is part of the default O3 pipeline.
The pass bails if we don't have a global called openmp, so let's generate it if people enable our experimental offload feature. openmp is a superset of the offload feature, so they share optimizations.
In follow-up PRs I'll start verifying that LLVM optimizes Rust the way we want it.
r? compiler
Add amdgpu_dispatch_ptr intrinsic
There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.
The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?
Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
Tracking issue: rust-lang/rust#135024
This reverts PR <https://github.com/rust-lang/rust/pull/130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
rustc_target: Remove unused Arch::PowerPC64LE
This variant was added in https://github.com/rust-lang/rust/pull/147645, but it is actually unused, since `target_arch` for powerpc64le- targets is "powerpc64". (The difference between powerpc64- and powerpc64le- targets is identified by `target_endian`.)
Note: This is an internal cleanup and does NOT remove `powerpc64le-*` targets.
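From user code, the same distinction looks like the sketch below. The helper name `is_powerpc64le` is hypothetical; the `cfg` keys `target_arch` and `target_endian` are the ones named above.

```rust
/// Returns true only when compiling for a little-endian 64-bit PowerPC target.
fn is_powerpc64le() -> bool {
    // The architecture is "powerpc64" for both endiannesses;
    // only `target_endian` tells the two target families apart.
    cfg!(all(target_arch = "powerpc64", target_endian = "little"))
}

fn main() {
    println!("compiling for powerpc64le: {}", is_powerpc64le());
}
```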
It's described as a "backwards compatibility hack to keep the diff
small". Removing it requires only a modest amount of churn, and the
resulting code is clearer without the invisible derefs.
llvm: Update `reliable_f16` configuration for LLVM22
Since yesterday, the LLVM `main` branch should have working `f16` on all platforms that Rust supports; this will be LLVM version 22, so update how `cfg(target_has_reliable_f16)` is set to reflect this.
Within the rust-lang organization, this currently has no effect. The goal is to start catching problems as early as possible in external CI that runs top-of-tree rust against top-of-tree LLVM, and once testing for the rust-lang bump to LLVM 22 starts. Hopefully this will mean that we can fix any problems that show up before the bump actually happens, meaning `f16` will be about ready for stabilization at that point (with some considerations for the GCC patch at [1] propagating).
References:
* 919021b0df
* 054ee2f870
* db26ce5c55
* 549d7c4f35
* 4903c6260c
[1]: 8b6a18ecaf
Fix dso_local for external statics with linkage
Tracking issue of the feature: rust-lang/rust#127488
DSO local attributes are not correctly applied to extern statics with `#[linkage = "foo"]`, because we generate an internal global for such statics and then evaluate (and apply) the DSO attributes on the internal one instead.
Fix this by applying DSO local attributes on the actually extern ones, too.
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.
The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.
The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.
Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
Store defids instead of symbol names in the aliases list
I was honestly surprised this worked in the past. Storing symbol names causes a cycle error, since we now compute a symbol name in codegen_attrs and then compute codegen attrs when we try to get the symbol name.
It only worked when there weren't any codegen attributes to begin with, which caused symbol name computation to skip the call to codegen_attrs.
Storing DefIds instead avoids this problem.
r? @bjorn3
`c_variadic`: provide our own `va_arg` implementation for more targets
tracking issue: https://github.com/rust-lang/rust/issues/44930
Provide our own implementations in order to guarantee the behavior of `va_arg`. We will only be able to stabilize `c_variadic` on targets where we know and guarantee the properties of `va_arg`.
r? workingjubilee
tests/ui/runtime/on-broken-pipe/with-rustc_main.rs: Not needed so remove
related: https://github.com/rust-lang/rust/issues/145899#issuecomment-3705550673
print error from EnzymeWrapper::get_or_init(sysroot) as a note
r? @ZuseZ4
e.g.
1. when libEnzyme is not found
```shell
$ rustc +stage1 -Z autodiff=Enable -C lto=fat src/main.rs
error: autodiff backend not found in the sysroot: failed to find a `libEnzyme-21` folder in the sysroot candidates:
* /Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib
|
= note: it will be distributed via rustup in the future
```
2. when libEnzyme could not be loaded successfully
```shell
rustc +stage1 -Z autodiff=Enable -C lto=fat src/main.rs
error: failed to load our autodiff backend: DlOpen { source: "dlopen(/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib, 0x0005): tried: \'/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib\' (slice is not valid mach-o file), \'/System/Volumes/Preboot/Cryptexes/OS/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib\' (no such file), \'/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib\' (slice is not valid mach-o file)" }
```
This flag allows specifying the threshold size for placing static data
in large data sections when using the medium code model on x86-64.
When using -Ccode-model=medium, data smaller than this threshold uses
RIP-relative addressing (32-bit offsets), while larger data uses
absolute 64-bit addressing. This allows the compiler to generate more
efficient code for smaller data while still supporting data larger than
2GB.
This mirrors the -mlarge-data-threshold flag available in GCC and Clang.
The default threshold is 65536 bytes (64KB) if not specified, matching
LLVM's default behavior.
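As a rough mental model of the rule described above, the sketch below classifies a static by its size. It is a toy, not the compiler's logic: `Addressing`, `classify`, and `DEFAULT_LARGE_DATA_THRESHOLD` are hypothetical names, and the treatment of data exactly at the threshold is an assumption, since the description only covers "smaller" and "larger".

```rust
/// Default threshold stated above: 65536 bytes (64KB).
const DEFAULT_LARGE_DATA_THRESHOLD: u64 = 65536;

#[derive(Debug, PartialEq)]
enum Addressing {
    /// 32-bit RIP-relative offsets (small data).
    RipRelative,
    /// Absolute 64-bit addressing (large data sections).
    Absolute64,
}

/// Classifies a static of `size` bytes under the medium code model.
/// Assumption: data exactly at the threshold is treated as small.
fn classify(size: u64, threshold: Option<u64>) -> Addressing {
    let threshold = threshold.unwrap_or(DEFAULT_LARGE_DATA_THRESHOLD);
    if size > threshold {
        Addressing::Absolute64
    } else {
        Addressing::RipRelative
    }
}

fn main() {
    // A 1 KiB table stays RIP-relative under the default threshold.
    assert_eq!(classify(1024, None), Addressing::RipRelative);
    // A 1 MiB buffer is placed in a large data section.
    assert_eq!(classify(1 << 20, None), Addressing::Absolute64);
    // Lowering the threshold moves the 1 KiB table to large data as well.
    assert_eq!(classify(1024, Some(512)), Addressing::Absolute64);
}
```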