Don't compute FnAbi for LLVM intrinsics in backends
~~This removes support for `extern "unadjusted"` for anything other than LLVM intrinsics. It only makes sense in the context of calling LLVM intrinsics anyway as it exposes the way the LLVM backend internally represents types. Perhaps it should be renamed to `extern "llvm-intrinsic"`?~~
Follow-up to https://github.com/rust-lang/rust/pull/148533
abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.
For relatively tight loops implemented with tail calls (`become`), each of the functions involved is, with the regular calling convention, still responsible for restoring the initial values of the preserved registers. So it is not unusual to end up with a situation where each step in the tail-call loop is spilling and reloading registers, along the lines of:
```asm
foo:
    push r12
    ; do things
    pop r12
    jmp next_step
```
This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or for other purposes.
I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come up with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?).
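As a hedged sketch of how this could combine with explicit tail calls (the `"rust-preserve-none"` ABI string spelling and its feature gate are assumptions drawn from this description, not confirmed):

```rust
#![feature(explicit_tail_calls)]
// NOTE: whatever feature gate ends up gating the new ABI string would also be
// needed here; the `"rust-preserve-none"` spelling itself is an assumption.

extern "rust-preserve-none" fn step(n: u64, acc: u64) -> u64 {
    if n == 0 {
        acc
    } else {
        // With a preserve-none convention, no step of the loop has to
        // push/pop callee-saved registers around the tail call.
        become step(n - 1, acc.wrapping_mul(n))
    }
}

fn factorial(n: u64) -> u64 {
    step(n, 1)
}
```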
add `simd_splat` intrinsic
Add `simd_splat`, which lowers to the canonical LLVM splat sequence:
```llvm
%v0 = insertelement <N x elem> poison, elem %x, i32 0
%splat = shufflevector <N x elem> %v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic; it is not actually used anywhere yet.
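For illustration, a hedged sketch of what using the intrinsic could look like once something is wired up to it (the exact module path and signature of `simd_splat` are assumptions here):

```rust
#![feature(repr_simd, core_intrinsics)]
#![allow(internal_features, non_camel_case_types)]

#[repr(simd)]
#[derive(Copy, Clone)]
struct u32x8([u32; 8]);

fn splat(x: u32) -> u32x8 {
    // Assumed to live next to the other SIMD intrinsics; lowers to the
    // canonical insertelement + shufflevector sequence shown above.
    unsafe { core::intrinsics::simd::simd_splat(x) }
}
```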
It's described as a "backwards compatibility hack to keep the diff
small". Removing it requires only a modest amount of churn, and the
resulting code is clearer without the invisible derefs.
Introduces `BackendRepr::ScalableVector`, corresponding to scalable vector types annotated with `repr(scalable)`, which lower to scalable vector types in LLVM.
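A rough sketch of where the new variant sits in the enum (the existing variant names are paraphrased and the `ScalableVector` fields are assumptions; only the variant name itself comes from this change):

```rust
// Stub standing in for rustc_abi::Scalar, just to keep the sketch self-contained.
pub struct Scalar;

// Hedged sketch of rustc_abi::BackendRepr with the new variant.
pub enum BackendRepr {
    Scalar(Scalar),
    ScalarPair(Scalar, Scalar),
    SimdVector { element: Scalar, count: u64 },
    // New: a `repr(scalable)` vector, lowered to LLVM's `<vscale x N x T>`.
    ScalableVector { element: Scalar, count: u64 },
    Memory { sized: bool },
}
```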
Co-authored-by: Jamie Cunliffe <Jamie.Cunliffe@arm.com>
Overhaul filename handling for cross-compiler consistency
This PR overhauls the way we handle filenames in the compiler and `rmeta` in order to achieve cross-compiler consistency (i.e. having the same path no matter whether the filename was created in the current compiler session or is coming from `rmeta`).
This is required as some parts of the compiler rely on consistent paths for the soundness of generated code (see rust-lang/rust#148328).
In order to achieve consistency, this PR takes multiple steps:
- by making `RealFileName` immutable
- by only having `SourceMap::to_real_filename` create `RealFileName`
  - currently, `RealFileName` can be created from any `Path` and remapped afterwards, which creates consistency issues
- by also making `RealFileName` hold its working directory, embeddable name, and the remapped scopes
  - this removes the need for a `Session` to know the current(!) scopes and cwd, which is invalid as they may not be equal to the scopes used when creating the filename
In order for `SourceMap::to_real_filename` to know which scopes to apply, `FilePathMapping` now takes the current remapping scopes, which makes `FileNameDisplayPreference` and company useless, so they are removed.
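As a hypothetical illustration of the information `RealFileName` carries after this change (the actual rustc definition differs in shape and naming; every field below is an assumption drawn from the description above):

```rust
use std::path::PathBuf;

// Stand-in for rustc's remap-scope flags; not a real rustc type.
struct RemapScopes(u32);

// Hypothetical sketch, not the real definition.
struct RealFileName {
    /// Path as seen on the local filesystem, if known.
    local_path: Option<PathBuf>,
    /// The (possibly remapped) name that is safe to embed in artifacts.
    embeddable_name: PathBuf,
    /// Working directory captured when the filename was created.
    working_dir: PathBuf,
    /// Remapping scopes that were in effect at creation time.
    remapped_scopes: RemapScopes,
}
```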
This PR is split up into multiple commits (unfortunately not atomic), which should help with reviewing the changes.
Unblocks https://github.com/rust-lang/rust/pull/147611
Fixes https://github.com/rust-lang/rust/issues/148328
Update `rustc_codegen_gcc` rotate operation documentation
## Description
This PR resolves a TODO comment in the `rustc_codegen_gcc` backend by documenting that the rotate operations (`rotate_left` and `rotate_right`) already implement the optimized branchless algorithm referenced in that comment.
The existing implementation already uses the optimal branchless rotation pattern:
- For left rotation: `(x << n) | (x >> (-n & (width-1)))`
- For right rotation: `(x >> n) | (x << (-n & (width-1)))`
This pattern avoids branches and generates efficient machine code across different platforms, which was the goal mentioned in the original TODO.
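For reference, a small Rust sketch of the same branchless pattern (not the actual `rustc_codegen_gcc` code, just an illustration of the formula above for `u32`):

```rust
/// Branchless rotate-left: (x << n) | (x >> (-n & (width - 1))).
/// Masking with `width - 1` keeps both shift amounts in range, so no
/// branch is needed for n == 0 or n == width.
fn rotate_left_u32(x: u32, n: u32) -> u32 {
    let mask = u32::BITS - 1; // 31
    (x << (n & mask)) | (x >> (n.wrapping_neg() & mask))
}

/// Branchless rotate-right: (x >> n) | (x << (-n & (width - 1))).
fn rotate_right_u32(x: u32, n: u32) -> u32 {
    let mask = u32::BITS - 1;
    (x >> (n & mask)) | (x << (n.wrapping_neg() & mask))
}
```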
## Changes
- Removed the TODO comment that suggested implementing the algorithm from https://blog.regehr.org/archives/1063
Remove -Zoom=panic
There are major questions remaining about the reentrancy this allows. It doesn't have any users on GitHub outside of a single panic=abort project that uses it to show backtraces. It can still be emulated through `#[alloc_error_handler]` or `set_alloc_error_hook`, depending on whether you use the standard library or not. And finally, it makes it harder to do various improvements to the allocator shim.
With this PR, the sole remaining symbol in the allocator shim that is not effectively emulating weak symbols is the one that prevents skipping the allocator shim on stable even when it would otherwise be empty because libstd + `#[global_allocator]` is used.
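For example, a hedged sketch of emulating the old `-Zoom=panic` behaviour with the unstable `set_alloc_error_hook` API from std:

```rust
#![feature(alloc_error_hook)]

use std::alloc::{set_alloc_error_hook, Layout};

fn oom_panic_hook(layout: Layout) {
    // Turn allocation failure into a panic, roughly what -Zoom=panic did.
    panic!("memory allocation of {} bytes failed", layout.size());
}

fn main() {
    set_alloc_error_hook(oom_panic_hook);
    // ... rest of the program ...
}
```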
Closes https://github.com/rust-lang/rust/issues/43596
Fixes https://github.com/rust-lang/rust/issues/126683
This register is only supported on the `powerpc*spe` targets. It is only recognized by LLVM; GCC does not accept it as a clobber, nor does it support these targets. This is a volatile register, so it is included in `clobber_abi`.
This implements a new unstable compiler flag `-Zannotate-moves` that makes
move and copy operations visible in profilers by creating synthetic debug
information. This is achieved with zero runtime cost by manipulating debug
info scopes to make moves/copies appear as calls to `compiler_move<T, SIZE>`
and `compiler_copy<T, SIZE>` marker functions in profiling tools.
This allows developers to identify expensive move/copy operations in their
code using standard profiling tools, without requiring specialized tooling
or runtime instrumentation.
The implementation works at codegen time. When processing MIR operands
(`Operand::Move` and `Operand::Copy`), the codegen creates an `OperandRef`
with an optional `move_annotation` field containing an `Instance` of the
appropriate profiling marker function. When storing the operand,
`store_with_annotation()` wraps the store operation in a synthetic debug
scope that makes it appear inlined from the marker.
Two marker functions (`compiler_move` and `compiler_copy`) are defined
in `library/core/src/profiling.rs`. These are never actually called -
they exist solely as debug info anchors.
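A hedged sketch of what such marker functions could look like (the actual definitions in `library/core/src/profiling.rs` may differ):

```rust
// Never called at runtime; they only exist so synthetic debug scopes can
// reference them, making annotated moves/copies show up as inlined calls
// like `compiler_move<T, SIZE>` in a profiler.
#[inline(never)]
pub fn compiler_move<T, const SIZE: usize>() {}

#[inline(never)]
pub fn compiler_copy<T, const SIZE: usize>() {}
```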
Operations are only annotated if the type:
- Meets the size threshold (default: 65 bytes, configurable via
`-Zannotate-moves=SIZE`)
- Has a non-scalar backend representation (scalars use registers,
not memcpy)
This has a very small impact on object file size. With the default limit it's well under 0.1%, and even with a very small limit of 8 bytes it's still only ~1.5%. This could be enabled by default.