The attribute now has a size parameter and sorts differently:
* Explicitly omit size parameter during construction on 23+
* Tolerate alternate sorting in tests
https://github.com/llvm/llvm-project/pull/171712
`c_variadic`: impl `va_copy` and `va_end` as Rust intrinsics
tracking issue: https://github.com/rust-lang/rust/issues/44930
Implement `va_copy` as (the rust equivalent of) `memcpy`, which is the behavior of all current LLVM targets. By providing our own implementation, we can guarantee its behavior. These guarantees are important for implementing c-variadics in e.g. const-eval.
Discussed in [#t-compiler/const-eval > c-variadics in const-eval](https://rust-lang.zulipchat.com/#narrow/channel/146212-t-compiler.2Fconst-eval/topic/c-variadics.20in.20const-eval/with/565509704).
I've also updated the comment for `Drop` a bit. The background here is that the C standard requires that `va_end` is used in the same function (and really, in the same scope) as the corresponding `va_start` or `va_copy`. That is because historically `va_start` would start a scope, which `va_end` would then close. e.g.
https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol
```c
#define va_start(ap, parmN) {\
va_buf _va;\
_vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define va_end(ap) }
#define va_arg(ap, mode) *((mode *)_vaarg(ap, sizeof (mode)))
```
The C standard still has to consider such implementations, but for Rust they are irrelevant. Hence we can use `Clone` for `va_copy` and `Drop` for `va_end`.
Add scalar support for offload
This PR adds scalar support to the offload feature. The scalar management has two main parts:
On the host side, each scalar arg is casted to `ix` type, zero extended to `i64` and passed to the kernel like that.
On the device, the each scalar arg (`i64` at that point), is truncated to `ix` and then casted to the original type.
r? @ZuseZ4
Generate openmp metadata
LLVM has an openmp-opt pass, which is part of the default O3 pipeline.
The pass bails if we don't have a global called openmp, so let's generate it if people enable our experimental offload feature. openmp is a superset of the offload feature, so they share optimizations.
In follow-up PRs I'll start verifying that LLVM optimizes Rust the way we want it.
r? compiler
Add amdgpu_dispatch_ptr intrinsic
There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.
The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?
Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
Tracking issue: rust-lang/rust#135024
This reverts PR <https://github.com/rust-lang/rust/pull/130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
Avoid should-fail in two ui tests and a codegen-llvm test
`should-fail` is only meant for testing the compiletest framework itself. It checks that the test runner itself panicked.
With this there are still a bunch of rustdoc-html tests that use it due to this test suite not supporting anything like `//@ doc-fail`.
Fix dso_local for external statics with linkage
Tracking issue of the feature: rust-lang/rust#127488
DSO local attributes are not correctly applied to extern statics with `#[linkage = "foo"]` as we generate an internal global for such statics, and the we evaluate (and apply) DSO attributes on the internal one instead.
Fix this by applying DSO local attributes on the actually extern ones, too.
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.
The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.
The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.
Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
Update offload test and verify that tgt_(un)register_lib have the right type
Apparently, we weren't running offload tests when Enzyme wasn't built. Time to fix that.
Also adds a test mode which generates the host IR, but does not expect device IR/artifacts. This way, we don't have to handle artifacts and paths in our tests.
Also removes some outdated documentation.
cc `@Kevinsala,` `@Sa4dUs`
closes: https://github.com/rust-lang/rust/issues/150415
~~blocked on `needs-offload` infrastructure landing in https://github.com/rust-lang/rust/pull/150427~~
Added codegen tests for different forms of `Option::or`
Adds tests to check the output of the different ways of writing `Option::or`
Fixesrust-lang/rust#124533
tests/codegen-llvm/some-non-zero-from-atomic-optimization.rs: New test
Closesrust-lang/rust#60044 which has one 👍 and one ❤️ vote and just **E-needs-test**.
Allow inline calls to offload intrinsic
Removes explicit insertion point handling and recovers the pointer at the end of the saved basic block.
r? `@ZuseZ4`
fixes: https://github.com/rust-lang/rust/issues/150413
This test currently doesn't fulfill its purpose, as `external dso_local`
can still match `external {{.*}}`. Fix this by using CHECK-NOT directives.
Also, this test is expanded to all platforms where it makes sense, instead
of restricting to loongarch.
Replace Rvalue::NullaryOp by a variant in mir::Operand.
Based on https://github.com/rust-lang/rust/pull/148151
This PR fully removes the MIR `Rvalue::NullaryOp`. After rust-lang/rust#148151, it was only useful for runtime checks like `ub_checks`, `contract_checks` and `overflow_checks`.
These are "runtime" checks, boolean constants that may only be `true` in codegen. It depends on a rustc flag passed to codegen, so we need to represent those flags cross-crate.
This PR replaces those runtime checks by special variants in MIR `ConstValue`. This allows code that expects constants to manipulate those as such, even if we may not always be able to evaluate them to actual scalars.
Move shared offload globals and define per-kernel globals once
This PR moves the shared LLVM global variables logic out of the `offload` intrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic.
r? `@ZuseZ4`
tracking:
- https://github.com/rust-lang/rust/issues/131513
`rustc_scalable_vector(N)`
Supercedes rust-lang/rust#118917.
Initial experimental implementation of rust-lang/rfcs#3838. Introduces a `rustc_scalable_vector(N)` attribute that can be applied to types with a single `[$ty]` field (for `u{16,32,64}`, `i{16,32,64}`, `f{32,64}`, `bool`). `rustc_scalable_vector` types are lowered to scalable vectors in the codegen backend.
As with any unstable feature, there will necessarily be follow-ups as we experiment and find cases that we've not considered or still need some logic to handle, but this aims to be a decent baseline to start from.
See rust-lang/rust#145052 for request for a lang experiment.
Introduces `BackendRepr::ScalableVector` corresponding to scalable
vector types annotated with `repr(scalable)` which lowers to a scalable
vector type in LLVM.
Co-authored-by: Jamie Cunliffe <Jamie.Cunliffe@arm.com>
fix va_list test by adding a llvmir signext check
s390x has no option to directly pass 32bit values therefor i32 parameters need an optional llvmir signext attribute.