implement `carryless_mul`
tracking issue: https://github.com/rust-lang/rust/issues/152080
ACP: https://github.com/rust-lang/libs-team/issues/738
This defers to LLVM's `llvm.clmul` when available, and otherwise falls back to a method from the `polyval` crate ([link](https://github.com/RustCrypto/universal-hashes/blob/master/polyval/src/field_element/soft/soft64.rs)).
Some things are missing, which I think we can defer:
- the ACP has some discussion about additional methods, but I'm not sure exactly what is wanted or how to implement it efficiently
- the SIMD intrinsic is not yet `const` (I think I ran into a bootstrapping issue). That is fine for now, I think in `stdarch` we can't really use this intrinsic at the moment, we'd only want the scalar version to replace some riscv intrinsics.
- the SIMD intrinsic is not implemented for the gcc and cranelift backends. That should be reasonably straightforward once we have a const eval implementation though.
Set hidden visibility on naked functions in compiler-builtins
88b46460fa made builtin functions hidden, but it doesn't apply to naked functions, which are generated through a different code path.
This was discovered in rust-lang/rust#151486 where aarch64 outline atomics showed up in shared objects, overriding the symbols from compiler-rt.
Start using pattern types in libcore (NonZero and friends)
part of rust-lang/rust#136006
This PR only changes the internal representation of `NonZero`, `NonMax`, ... and other integral range types in libcore. This subsequently affects other types made up of it, but nothing really changes except that the field of `NonZero` is now accessible safely in contrast to the `rustc_layout_scalar_range_start` attribute, which has all kinds of obscure rules on how to properly access its field.
llvm will look at both
1. the values of "target-features" and
2. the function string attributes.
this removes the redundant function string attribute because it is not needed at all.
rustc sets the `+backchain` attribute through `target_features_attr(...)`
Don't compute FnAbi for LLVM intrinsics in backends
~~This removes support for `extern "unadjusted"` for anything other than LLVM intrinsics. It only makes sense in the context of calling LLVM intrinsics anyway as it exposes the way the LLVM backend internally represents types. Perhaps it should be renamed to `extern "llvm-intrinsic"`?~~
Follow up to https://github.com/rust-lang/rust/pull/148533
Cleanup offload datatransfer
There are 3 steps to run code on a GPU: Copy data from the host to the device, launch the kernel, and move it back.
At the moment, we have a single variable describing the memory handling to do in each step, but that makes it hard for LLVM's opt pass to understand what's going on. We therefore split it into three variables, each only including the bits relevant for the corresponding stage.
cc @jdoerfert @kevinsala
r? compiler
c-variadic: make `va_arg` match on `Arch` exhaustive
tracking issue: https://github.com/rust-lang/rust/issues/44930
Continuing from https://github.com/rust-lang/rust/pull/150094, the more annoying cases remain. These are mostly very niche targets without Clang `va_arg` implementations, and so it might just be easier to defer to LLVM instead of us getting the ABI subtly wrong. That does mean we cannot stabilize c-variadic on those targets I think.
Alternatively we could ask target maintainers to contribute an implementation. I'd honestly prefer they make that change to LVM though (likely by just using `CodeGen::emitVoidPtrVAArg`) that we can mirror.
r? @workingjubilee
Remove dummy loads on offload codegen
The current logic generates two dummy loads to prevent some globals from being optimized away. This blocks memtransfer loop hoisting optimizations, so it's time to remove them.
r? @ZuseZ4
skip codegen for intrinsics with big fallback bodies if backend does not need them
This hopefully fixes the perf regression from https://github.com/rust-lang/rust/pull/148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.
Fixes https://github.com/rust-lang/rust/issues/149945
Cc @scottmcm @bjorn3
offload: move (un)register lib into global_ctors
Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards.
What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes.
Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in https://github.com/rust-lang/rust/pull/150683, where I introduce a new variant of our offload intrinsic.
r? oli-obk
abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.
For relatively tight loops implemented with tail calling (`become`) each of the function with the regular calling convention is still responsible for restoring the initial value of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of:
foo:
push r12
; do things
pop r12
jmp next_step
This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or other uses.
I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
#[derive(Copy, Clone)]
#[repr(simd)]
struct JustOne([$elem_type; 1]);
let one = JustOne([value]);
// SAFETY: 0 is always in-bounds because we're shuffling
// a simd type with exactly one element.
unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic, it does not actually use it anywhere yet.
This is the conceptual opposite of the rust-cold calling convention and
is particularly useful in combination with the new `explicit_tail_calls`
feature.
For relatively tight loops implemented with tail calling (`become`) each
of the function with the regular calling convention is still responsible
for restoring the initial value of the preserved registers. So it is not
unusual to end up with a situation where each step in the tail call loop
is spilling and reloading registers, along the lines of:
foo:
push r12
; do things
pop r12
jmp next_step
This adds up quickly, especially when most of the clobberable registers
are already used to pass arguments or other uses.
I was thinking of making the name of this ABI a little less LLVM-derived
and more like a conceptual inverse of `rust-cold`, but could not come
with a great name (`rust-cold` is itself not a great name: cold in what
context? from which perspective? is it supposed to mean that the
function is rarely called?)
Clean up or resolve cfg-related instances of `FIXME(f16_f128)`
* Replace target-specific config that has a `FIXME` with `cfg(target_has_reliable_f*)`
* Take care of trivial intrinsic-related FIXMEs
* Split `FIXME(f16_f128)` into `FIXME(f16)`, `FIXME(f128)`, or `FIXME(f16,f128)` to more clearly identify what they block
The individual commit messages have more details.
Add -Z large-data-threshold
This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64.
When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB.
This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.
codegen: clarify some variable names around function calls
I looked at rust-lang/rust#145932 to try to understand how it works, and quickly got lost in the variable names -- what refers to the caller, what to the callee? So here's my attempt at making those more clear. Hopefully the new names are correct.^^
Cc @JamieCunliffe
Port `#![crate_type]` to the attribute parser
Tracking issue: https://github.com/rust-lang/rust/issues/131229
~~Note that the actual parsing that is used in the compiler session is unchanged, as it must happen very early on; this just ports the validation logic.~~
Also added `// tidy-alphabetical-start` to `check_attr.rs` to make it a bit less conflict-prone