replace box_new with lower-level intrinsics
The `box_new` intrinsic is super special: during THIR construction it turns into an `ExprKind::Box` (formerly known as the `box` keyword), which then during MIR building turns into a special instruction sequence that invokes the exchange_malloc lang item (which has a name from a different time) and a special MIR statement to represent a shallowly-initialized `Box` (which raises [interesting opsem questions](https://github.com/rust-lang/rust/issues/97270)).
This PR is the n-th attempt to get rid of `box_new`. That's non-trivial because it usually causes a perf regression: replacing `box_new` by naive unsafe code will incur extra copies in debug builds, making the resulting binaries a lot slower, and will generate a lot more MIR, making compilation measurably slower. Furthermore, `vec!` is a macro, so the exact code it expands to is highly relevant for borrow checking, type inference, and temporary scopes.
To avoid those problems, this PR does its best to make the MIR almost exactly the same as what it was before. `box_new` is used in two places, `Box::new` and `vec!`:
- For `Box::new` that is fairly easy: the `move_by_value` intrinsic is basically all we need. However, to avoid the extra copy that would usually be generated for the argument of a function call, we need to special-case this intrinsic during MIR building. That's what the first commit does.
- `vec!` is a lot more tricky. As a macro, its details leak to stable code, so almost every variant I tried broke either type inference or the lifetimes of temporaries in some ui test or ended up accepting unsound code due to the borrow checker not enforcing all the constraints I hoped it would enforce. I ended up with a variant that involves a new intrinsic, `fn write_box_via_move<T>(b: Box<MaybeUninit<T>>, x: T) -> Box<MaybeUninit<T>>`, that writes a value into a `Box<MaybeUninit<T>>` and returns that box again. In exchange we can get rid of somewhat similar code in the lowering for `ExprKind::Box`, and the `exchange_malloc` lang item. (We can also get rid of `Rvalue::ShallowInitBox`; I didn't include that in this PR -- I think @cjgillot has a commit for this somewhere [around here](https://github.com/rust-lang/rust/pull/147862/commits).)
See [here](https://github.com/rust-lang/rust/pull/148190#issuecomment-3457454814) for the latest perf numbers. Most of the regressions are in deep-vector which consists entirely of an invocation of `vec!`, so any change to that macro affects this benchmark disproportionally.
This is my first time even looking at MIR building code, so I am very low confidence in that part of the patch, in particular when it comes to scopes and drops and things like that.
I also had do nerf some clippy tests because clippy gets confused by the new expansion of `vec!` so it makes fewer suggestions when `vec!` is involved.
### `vec!` FAQ
- Why does `write_box_via_move` return the `Box` again? Because we need to expand `vec!` to a bunch of method invocations without any blocks or let-statements, or else the temporary scopes (and type inference) don't work out.
- Why is `box_assume_init_into_vec_unsafe` (unsoundly!) a safe function? Because we can't use an unsafe block in `vec!` as that would necessarily also include the `$x` (due to it all being one big method invocation) and therefore interpret the user's code as being inside `unsafe`, which would be bad (and 10 years later, we still don't have safe blocks for macros like this).
- Why does `write_box_via_move` use `Box` as input/output type, and not, say, raw pointers? Because that is the only way to get the correct behavior when `$x` panics or has control effects: we need the `Box` to be dropped in that case. (As a nice side-effect this also makes the intrinsic safe, which is imported as explained in the previous bullet.)
- Can't we make it safe by having `write_box_via_move` return `Box<T>`? Yes we could, but there's no easy way for the intrinsic to convert its `Box<MaybeUninit<T>>` to a `Box<T>`. Transmuting would be unsound as the borrow checker would no longer properly enforce that lifetimes involved in a `vec!` invocation behave correctly.
- Is this macro truly cursed? Yes, yes it is.
New float tests in core are failing on clif with issues like the
following:
Undefined symbols for architecture arm64:
"_coshf128", referenced from:
__RNvMNtCshY0fR2o0hOA_3std4f128C4f1284coshCs5TKtJxXQNGL_9coretests in coretests-e38519c0cc90db54.coretests.44b6247a565e10d1-cgu.10.rcgu.o
"_exp2f128", referenced from:
__RNvMNtCshY0fR2o0hOA_3std4f128C4f1284exp2Cs5TKtJxXQNGL_9coretests in coretests-e38519c0cc90db54.coretests.44b6247a565e10d1-cgu.10.rcgu.o
...
Disable f128 math unless the symbols are known to be available, which
for now is only glibc targets. This matches the LLVM backend.
abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.
For relatively tight loops implemented with tail calling (`become`) each of the function with the regular calling convention is still responsible for restoring the initial value of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of:
foo:
push r12
; do things
pop r12
jmp next_step
This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or other uses.
I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
#[derive(Copy, Clone)]
#[repr(simd)]
struct JustOne([$elem_type; 1]);
let one = JustOne([value]);
// SAFETY: 0 is always in-bounds because we're shuffling
// a simd type with exactly one element.
unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic, it does not actually use it anywhere yet.
This is the conceptual opposite of the rust-cold calling convention and
is particularly useful in combination with the new `explicit_tail_calls`
feature.
For relatively tight loops implemented with tail calling (`become`) each
of the function with the regular calling convention is still responsible
for restoring the initial value of the preserved registers. So it is not
unusual to end up with a situation where each step in the tail call loop
is spilling and reloading registers, along the lines of:
foo:
push r12
; do things
pop r12
jmp next_step
This adds up quickly, especially when most of the clobberable registers
are already used to pass arguments or other uses.
I was thinking of making the name of this ABI a little less LLVM-derived
and more like a conceptual inverse of `rust-cold`, but could not come
with a great name (`rust-cold` is itself not a great name: cold in what
context? from which perspective? is it supposed to mean that the
function is rarely called?)
* Fix CI disk space issue for abi-cafe tests
Port disk space cleanup script from rust-lang/rust to free space on
GitHub Actions runners before running abi-cafe tests.
* ci: update free-disk-space.sh to match rust-lang/rust@d29e478
* ci: set RUNNER_ENVIRONMENT=github-hosted for free-disk-space script
* Revert indentation change
---------
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
`c_variadic`: impl `va_copy` and `va_end` as Rust intrinsics
tracking issue: https://github.com/rust-lang/rust/issues/44930
Implement `va_copy` as (the rust equivalent of) `memcpy`, which is the behavior of all current LLVM targets. By providing our own implementation, we can guarantee its behavior. These guarantees are important for implementing c-variadics in e.g. const-eval.
Discussed in [#t-compiler/const-eval > c-variadics in const-eval](https://rust-lang.zulipchat.com/#narrow/channel/146212-t-compiler.2Fconst-eval/topic/c-variadics.20in.20const-eval/with/565509704).
I've also updated the comment for `Drop` a bit. The background here is that the C standard requires that `va_end` is used in the same function (and really, in the same scope) as the corresponding `va_start` or `va_copy`. That is because historically `va_start` would start a scope, which `va_end` would then close. e.g.
https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol
```c
#define va_start(ap, parmN) {\
va_buf _va;\
_vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define va_end(ap) }
#define va_arg(ap, mode) *((mode *)_vaarg(ap, sizeof (mode)))
```
The C standard still has to consider such implementations, but for Rust they are irrelevant. Hence we can use `Clone` for `va_copy` and `Drop` for `va_end`.