This was a bit more invasive than I had kind of hoped. An alternate
approach would be to add an extra call_intrinsic_with_attrs() that would
have the new-in-this-change signature for call_intrinsic, but this felt
about equivalent and made it a little easier to audit the relevant
callsites of call_intrinsic().
Add a test for the cold attribute
This adds a test for the cold attribute to verify that it actually does something, and that it applies correctly in all the positions it is expected to work.
Offload host2
r? `@oli-obk`
A follow-up to my previous gpu host PR. With this, I can (in theory) run a sufficiently simple Rust function on GPUs. I tested it on AMD, where the amdgcn tartget of rustc causes issues due to Addressspace castings, which might not be valid. If I (manually) fix them, I can run the generated IR on an AMD GPU. This should conceptually also work on NVIDIA or Intel. I updated the dev-guide acordingly: https://rustc-dev-guide.rust-lang.org/offload/usage.html
I am unhappy with the amount of standalone functions in my offload code, so in my second commit I bundled some of the code around two structs which are Rust versions of the LLVM/Offload structs which they represent. The structs themselves only have doc comments. Since I directly lower everything to llvm-ir I didn't saw a big value in modelling the struct member variables.
Add vsx register support for ppc inline asm, and implement preserves_flag option
This should address the last(?) missing pieces of inline asm for ppc:
* Explicit VSX register support. ISA 2.06 (POWER7) added a 64x128b register overlay extending the fpr's to 128b, and unifies them with the vmx (altivec) registers. Implementations details within gcc/llvm percolate up, and require using the `x` template modifier. I have updated the inline asm to implicitly include this for vsx arguments which do not specify it. ~~Support for the gcc codegen backend is still a todo.~~
* Implement the `preserves_flags` option. All ABI's, and all ISAs store their flags in `cr`, and the carry bit lives inside `xer`. The other status registers hold sticky bits or control bits which do not affect branch instructions.
There is some interest in the e500 (powerpcspe) port. Architecturally, it has a very different FP ISA, and includes a simd extension called SPR (which is not IBM's cell SPE). Notably, it does not have altivec/fpr/vsx registers. It also has an SPE accumulator register which its ABI marks as volatile, but I am not sure if the compiler uses it.
Implemented preserves_flags on powerpc by making it do
nothing. This prevents having two different ways to mark
`cr0` as clobbered. clang and gcc alias `cr0` to `cc`.
The gcc inline documentation does not state what this does
on powerpc* targets, but inspection of the source shows
it is equivalent to condition register field `cr0`, so it
should not be added.
Where supported, VSX is a 64x128b register set which encompasses
both the floating point and vector registers.
In the type tests, xvsqrtdp is used as it is the only two-argument
vsx opcode supported by all targets on llvm. If you need to copy
a vsx register, the preferred way is "xxlor xt, xa, xa".
This adds a test for the cold attribute to verify that it actually does
something, and that it applies correctly in all the positions it is
expected to work.
kcfi: only reify trait methods when dyn-compatible
fixes https://github.com/rust-lang/rust/issues/146853
Only generate a `ReifyShim` for trait method calls if the trait is dyn-compatible.
Until now kcfi would generate a `ReifyShim` whenever a trait method was cast to a function pointer. But technically the shim is only needed for dyn-compatible traits (where the method might end up in a vtable).
Up to this point that was only slightly inefficient, but in combination with c-variadic trait methods it is wrong. For c-variadic trait methods the generated shim is incorrect, and that is why c-variadic methods make a trait no longer dyn-compatible: we should simply never generate a `ReifyShim` that is c-variadic.
With this change the documentation on `ReifyReason` is now actually correct:
> If KCFI is enabled, creating a function pointer from a method on a dyn-compatible trait. This includes the case of converting `::call`-like methods on closure-likes to function pointers.
cc ```@maurer``` ```@workingjubilee```
r? ```@rcvalle```
bless autodiff batching test
This pr blesses a broken test and unblocks running rust in the Enzyme CI: https://github.com/EnzymeAD/Enzyme/pull/2430
Enzyme is the plugin used by our std::autodiff and (future) std::batching modules, both of which are not build by default.
In the near future we also hope to enable std::autodiff in the Rust CI.
This test is the only one to combine two features, automatic differentiation and batching/vectorization. This combination is even more experimental than either feature on its own. I have a wip branch in which I enable more vectorization/batching and as part of that I'll think more about how to write those tests in a robust way (and likely change the interface). Until that lands, I don't care too much about what specific IR we generate here; it's just nice to track changes.
r? compiler
If the `LocalRef` is `LocalRef::Place`, we can refer to it directly,
because the local of place is an indirect pointer.
Such a statement is `_1 = &(_2.1)`.
If the `LocalRef` is `LocalRef::Operand`,
the `OperandRef` should provide the pointer of the reference.
Such a statement is `_1 = &((*_2).1)`.
But there is a special case that hasn't been handled, scalar pairs like `(&[i32; 16], i32)`.
Fix autodiff empty ret regression
closes https://github.com/rust-lang/rust/issues/147144
The two gsoc summer projects caused a bit of churn, which was to be expected, especially since we don't run autodiff in CI yet.
This adds a void return testcase that we should have had anyway, and fixes the regression.
r? `@Zalathar` (Just guessing since I've seen you in a few LLVM PRs and Oli is probably still busy. Feel free to reroll!)
Skip cleanups on unsupported targets
This commit is an update to the `AbortUnwindingCalls` MIR pass in the compiler. Specifically a new boolean is added for "can this target possibly unwind" and if that's `false` then terminators are all adjusted to be unreachable/not present. The end result is that this fixesrust-lang/rust#140293 for wasm targets.
The motivation for this PR is that currently on WebAssembly targets the usage of the `C-unwind` ABI can lead LLVM to either (a) emit exception-handling instructions or (b) hit a LLVM-ICE-style codegen error. WebAssembly as a base instruction set does not support unwinding at all, and a later proposal to WebAssembly, the exception-handling proposal, was what enabled this. This means that the current intent of WebAssembly targets is that they maintain the baseline of "don't emit exception-handling instructions unless enabled". The commit here is intended to restore this behavior by skipping these instructions even when `C-unwind` is present.
Exception-handling is a relatively tricky and also murky topic in WebAssembly, however. There are two sets of instructions LLVM can emit for WebAssembly exceptions, Rust's Emscripten target supports exceptions, WASI targets do not, the LLVM flags to enable this are not always obvious, and additionally this all touches on "changing exception-handling behavior should be a target-level concern, not a feature". Effectively WebAssembly's exception-handling integration into Rust is not finalized at this time. The best idea at this time is that a parallel set of targets will eventually be added which support exceptions, but it's not clear if/when to do this. In the meantime the goal is to keep existing targets working while still enabling experimentation with exception-handling with `-Zbuild-std` and various permutations of LLVM flags.
To that extent this commit does not blanket disable these landing pads and cleanup routines for WebAssembly but instead checks to see if panic=unwind is enabled or if `+exception-handling` is enabled. Tests are updated here as well to account for this where, by default, using a `C-unwind` ABI won't affect Rust codegen at all. If `+exception-handling` is enabled, however, then Rust codegen will look like native platforms where exceptions are caught and the program aborts. More-or-less I've done my best to keep exceptions working on wasm where it's possible to have them work, but turned them off where they're not supposed to be emitted.
Closesrust-lang/rust#140293
tests: relax expectations after llvm change 902ddda120a5
LLVM 22 is able to drop assumes that seem to not help further optimizations, which actually seems to dramatically _help_ further optimizations in some of our small test cases.
I'm a little unclear how to fix the last failure, in `tests/codegen-llvm/issues/issue-122600-ptr-discriminant-update.rs`:
```
-; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite, inaccessiblemem: write) uwtable
+; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: readwrite, inaccessiblemem: write) uwtable
define void ``@update(ptr`` noundef captures(none) %s) unnamed_addr #0 {
start:
- %_3.sroa.0.0.copyload = load i8, ptr %s, align 1
- %0 = trunc nuw i8 %_3.sroa.0.0.copyload to i1
- %1 = xor i1 %0, true
- tail call void ``@llvm.assume(i1`` %1)
store i8 1, ptr %s, align 1
ret void
}
```
I'm just not conversant enough in LLVM IR to follow the changes here.
``@rustbot`` label llvm-main
r? nikic
Add attributes for #[global_allocator] functions
Emit `#[rustc_allocator]` etc. attributes on the functions generated by the `#[global_allocator]` macro, which will emit LLVM attributes like `"alloc-family"`. If the module with the global allocator participates in LTO, this ensures that the attributes typically emitted on the allocator declarations are not lost if the definition is imported.
There is a similar issue when the allocator shim is used, but I've opted not to fix that case in this PR, because doing that cleanly is somewhat gnarly.
Related to https://github.com/rust-lang/rust/issues/145995.
Emit `#[rustc_allocator]` etc. attributes on the functions generated
by the `#[global_allocator]` macro, which will emit LLVM attributes
like `"alloc-family"`. If the module with the global allocator
participates in LTO, this ensures that the attributes typically
emitted on the allocator declarations are not lost if the
definition is imported.
Mark float intrinsics with no preconditions as safe
Note: for ease of reviewing, the list of safe intrinsics is sorted in the first commit, and then safe intrinsics are added in the second commit.
All *recently added* float intrinsics have been correctly marked as safe to call due to the fact that they have no preconditions. This adds the remaining float intrinsics which are safe to call to the safe intrinsic list, and removes the unsafe blocks around their calls.
---
Side note: this may want a try run before being added to the queue, since I'm not sure if there's any tier-2 code that uses these intrinsics that might not be tested on the usual PR flow. We've already uncovered a few places in subtrees that do this, and it's worth double-checking before clogging up the queue.
emit attribute for readonly non-pure inline assembly
fixes https://github.com/rust-lang/rust/issues/146761
Provide a better `MemoryEffects` to LLVM when an inline assembly block specifies `readonly` but not `pure`. That means that the assembly block may not perform any writes, but that there still may be side effects from its instructions.
I haven't been able to find a case yet where this actually matters, though. So the test checks that the right attribute is applied, but the generated assembly is equivalent to not specifying `readonly` at all.
r? ````@nikic````
cc ````@Amanieu````
LLVM 22 is able to drop assumes that seem to not help further
optimizations, which actually seems to dramatically _help_ further
optimizations in some of our small test cases.