Add `f16` inline ASM support for s390x
tracking issue: https://github.com/rust-lang/rust/issues/116909
cc https://github.com/rust-lang/rust/issues/125398
Support the `f16x8` type in inline assembly. Only with the `nnp-assist` feature are there any instructions that make use of this type. Based on the riscv implementation I now cast to `i16x8` when that feature is not enabled.
As far as I'm aware there are no instructions operating on `f16` scalar values. Should we still add support for using them in inline assembly?
r? @tgross35
cc @uweigand
Destabilise `target-spec-json`
Per rust-lang/compiler-team#944:
> Per https://github.com/rust-lang/rust/issues/71009, the ability to load target spec JSONs was stabilised accidentally. Within the team, we've always considered the format to be unstable and have changed it freely. This has been feasible as custom targets can only be used with core, like any other target, and so custom targets de-facto require nightly to be used (i.e. to build core manually or use Cargo's -Zbuild-std).
>
> Current build-std RFCs (https://github.com/rust-lang/rfcs/pull/3873, https://github.com/rust-lang/rfcs/pull/3874) propose a mechanism for building core on stable (at the request of Rust for Linux), which, combined with a stable target-spec-json format, would permit the current format to be used much more widely on stable toolchains. This would prevent us from improving the format - making it less tied to LLVM, switching to TOML, enabling keys in the spec to be stabilised individually, etc.
>
> De-stabilising the format gives us the opportunity to improve the format before it is too challenging to do so. Internal company toolchains and projects like Rust for Linux already use target-spec-json, but must use nightly at some point while doing so, so while it could be inconvenient for those users to destabilise this, it is hoped that a minimal alternative that we could choose to stabilise can be proposed relatively quickly.
Ensure that static initializers are acyclic for NVPTX
NVPTX does not support cycles in static initializers (see rust-lang/rust#146787). LLVM produces an error when attempting to generate code for such constructs, like self-referential structs.
To avoid LLVM UB, we emit a post-monomorphization error on the Rust side before reaching codegen.
This is achieved by analyzing a subgraph of the "mono item graph" that only contains statics.
1. Calculate the strongly connected components (SCCs) of the graph.
2. Check for cycles (more than one node in an SCC or one node that references itself).
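The two steps above can be sketched as follows. This is a minimal illustration, not the compiler's implementation: the `Graph` type, the use of plain indices for statics, and the function names are all assumptions made for the example, and Tarjan's algorithm is used here simply as one well-known way to compute SCCs.

```rust
/// Hypothetical stand-in for the statics-only subgraph of the mono item
/// graph: `graph[i]` lists the statics referenced by static `i`'s initializer.
pub type Graph = Vec<Vec<usize>>;

/// Bookkeeping for Tarjan's SCC algorithm.
struct Tarjan<'g> {
    graph: &'g Graph,
    index: Vec<Option<usize>>,
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next_index: usize,
    sccs: Vec<Vec<usize>>,
}

impl<'g> Tarjan<'g> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        self.stack.push(v);
        self.on_stack[v] = true;

        // Copy the shared reference so iterating does not borrow `self`.
        let graph = self.graph;
        for &w in &graph[v] {
            let w_index = self.index[w];
            match w_index {
                None => {
                    // Unvisited successor: recurse, then propagate its lowlink.
                    self.visit(w);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[w]);
                }
                Some(i) if self.on_stack[w] => {
                    // Back edge into the component currently being built.
                    self.lowlink[v] = self.lowlink[v].min(i);
                }
                Some(_) => {}
            }
        }

        // `v` is the root of an SCC: pop the whole component off the stack.
        if Some(self.lowlink[v]) == self.index[v] {
            let mut scc = Vec::new();
            loop {
                let w = self.stack.pop().expect("root is still on the stack");
                self.on_stack[w] = false;
                scc.push(w);
                if w == v {
                    break;
                }
            }
            self.sccs.push(scc);
        }
    }
}

/// Step 1: calculate the strongly connected components.
pub fn strongly_connected_components(graph: &Graph) -> Vec<Vec<usize>> {
    let n = graph.len();
    let mut state = Tarjan {
        graph,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if state.index[v].is_none() {
            state.visit(v);
        }
    }
    state.sccs
}

/// Step 2: keep SCCs with more than one node, or one self-referencing node.
pub fn cyclic_statics(graph: &Graph) -> Vec<Vec<usize>> {
    strongly_connected_components(graph)
        .into_iter()
        .filter(|scc| scc.len() > 1 || graph[scc[0]].contains(&scc[0]))
        .collect()
}
```

Each component returned by `cyclic_statics` would correspond to one post-monomorphization error in this model.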
NVPTX does not support cycles in static initializers. LLVM produces an error when attempting to codegen such constructs (like self-referential structs).
To avoid producing LLVM UB, we instead emit a post-monomorphization error on
the Rust side before reaching codegen.
This is achieved by analysing a subgraph of the "mono item graph" that
only contains statics:
1. Calculate the strongly connected components (SCCs) of the graph
2. Check for cycles (more than one node in an SCC or exactly one node
which references itself)
Stabilize 29 RISC-V target features (`riscv_ratified_v2`)
This commit stabilizes RISC-V target features that meet the following constraints:
* Describes a ratified extension.
* Implemented on Rust 1.88.0 or before.
Waiting for four+ version cycles seems sufficient.
* Does not disrupt current rustc's target feature (cf. rust-lang/rust#140570) + ABI (cf. rust-lang/rust#132618) handling.
It excludes `E` and all floating-point arithmetic extensions. The `Zfinx` family does not involve floating-point registers, but it is not being stabilized for now to avoid possible confusion with the `F` extension family.
* Not vector-related (floating point and integer).
While integer vector subsets should not cause any ABI issues (as they don't use ABI-dependent floating point registers), we need to discuss before stabilizing them.
* Supported by the lowest LLVM version supported by rustc (LLVM 20).
List of target features to be stabilized:
1. `b`
2. `za64rs` (no-RT)
3. `za128rs` (no-RT)
4. `zaamo`
5. `zabha`
6. `zacas`
7. `zalrsc`
8. `zama16b` (no-RT)
9. `zawrs`
10. `zca`
11. `zcb`
12. `zcmop`
13. `zic64b` (no-RT)
14. `zicbom`
15. `zicbop` (no-RT)
16. `zicboz`
17. `ziccamoa` (no-RT)
18. `ziccif` (no-RT)
19. `zicclsm` (no-RT)
20. `ziccrse` (no-RT)
21. `zicntr`
22. `zicond`
23. `zicsr`
24. `zifencei`
25. `zihintntl`
26. `zihintpause`
27. `zihpm`
28. `zimop`
29. `ztso`
Of these, 20 (29 minus the 9 "no-RT" target features) support runtime detection through `std::arch::is_riscv_feature_detected!()`.
Corresponding PR for the Reference: rust-lang/reference#1987
Enable thumb interworking on ARMv7A/R and ARMv8R bare-metal targets
This flag enables the `#[instruction_set(arm::t32)]` attribute for the armv7a/armv7r/armv8r targets, which all support Thumb interworking but were missing this flag.
Target maintainers are `@chrisnc`, `@rust-lang/arm-maintainers`, and `@rust-embedded/arm` (including me).
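As a rough illustration of what the flag makes possible, the sketch below marks one function as Thumb (T32) and one as Arm (A32). The function names and bodies are hypothetical; the `cfg_attr` wrapper is an addition for the example so that the same code also compiles on non-Arm hosts, where the attribute is not accepted.

```rust
/// Hypothetical helper compiled as Thumb (T32) code on Arm targets.
#[cfg_attr(target_arch = "arm", instruction_set(arm::t32))]
pub fn checksum_thumb(bytes: &[u8]) -> u32 {
    bytes.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// The same logic compiled as Arm (A32) code on Arm targets.
#[cfg_attr(target_arch = "arm", instruction_set(arm::a32))]
pub fn checksum_arm(bytes: &[u8]) -> u32 {
    bytes.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}
```

With interworking available, either function can call the other regardless of which instruction set each was compiled for.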
RISC-V: Implement (Zkne or Zknd) intrinsics correctly
On rust-lang/stdarch#1765, it has been pointed out that two RISC-V (64-bit only) intrinsics that perform AES key scheduling have the wrong target feature.
The `aes64ks1i` and `aes64ks2` instructions require *either* the Zkne (scalar cryptography: AES encryption) or the Zknd (scalar cryptography: AES decryption) extension (or both), but the corresponding Rust intrinsics (in `core::arch::riscv64`) required *both* the Zkne and Zknd extensions.
An excerpt from the original intrinsics:
```rust
#[target_feature(enable = "zkne", enable = "zknd")]
```
To fix that, we need to:
1. Represent a condition where *either* Zkne or Zknd is available and
2. Work around an issue: the `llvm.riscv.aes64ks1i` / `llvm.riscv.aes64ks2` LLVM intrinsics require either the Zkne or the Zknd extension.
This PR attempts to resolve them by:
1. Adding a perma-unstable RISC-V target feature: `zkne_or_zknd` (implied from both `zkne` and `zknd`) and
2. Using inline assembly to construct machine code directly (because `zkne_or_zknd` alone implies neither Zkne nor Zknd, we cannot use LLVM intrinsics).
The author confirmed that we can construct an AES key scheduling function with decent performance using fixed `aes64ks1i` and `aes64ks2` intrinsics (with optimization enabled).
Don't use `matches!` when `==` suffices
In the codebase we sometimes use `matches!` for values that can actually just be compared. Replace them with `==`.
Subset of rust-lang/rust#149933.
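A small sketch of the kind of replacement this describes. The `Mode` enum and the function names are hypothetical and exist only for the example; the comparison form requires the type to implement `PartialEq`.

```rust
/// Hypothetical fieldless enum; `PartialEq` is what makes `==` available.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Fast,
    Safe,
}

/// Before: a pattern match where a plain comparison would do.
pub fn is_fast_with_matches(mode: Mode) -> bool {
    matches!(mode, Mode::Fast)
}

/// After: the same check written as a comparison.
pub fn is_fast_with_eq(mode: Mode) -> bool {
    mode == Mode::Fast
}

/// `matches!` remains the right tool when the pattern binds data or has a guard.
pub fn is_some_even(value: Option<u32>) -> bool {
    matches!(value, Some(n) if n % 2 == 0)
}
```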
* No method named `allow_toggle()` exists on the type, but based on the
documentation of both `requires_nightly()` and `toggle_allowed()` it seems that
`toggle_allowed()` is the intended method to mention.
* Add `()` to the mention of `in_cfg()` to make it clear that a method is being
referred to, and to match the presence of `()` in the mention of
`toggle_allowed()`.
Fix d32 usage in Arm target specs
Fixes https://github.com/rust-lang/rust/issues/149399 - after checking with an LLVM engineer, Rust's feature implications do correctly map to LLVM's. The target specs originally had +vfp3,+d16, but were mistakenly changed to +vfp3,-d32, which disables vfp3 again.
Some targets specify +vfp2,-d32, and since vfp2 shouldn't imply d32 the -d32 is unneeded.
The list of Arm features is quite old, and since Arm is now a target maintainer of many of them, we'll go in and update them. We should probably add vfp3d16 and similar, as Rust has no way to express these right now after d16 was removed.
The LLVM features expand like this:
```
vfp4 -> vfp3 + fp16 + vfp4d16 + vfp4sp
vfp4d16 -> vfp3d16 + fp16 + fp64 + vfp4d16sp
vfp4sp -> vfp3sp + fp16 + d32 + vfp4d16sp
vfp4d16sp -> vfp3d16sp + fp16
vfp3 -> vfp2 + vfp3d16 + vfp3sp
vfp3d16 -> vfp2 + fp64 + vfp3d16sp
vfp3sp -> vfp2 + d32 + vfp3d16sp
vfp3d16sp -> vfp2sp
vfp2 -> vfp2sp + fp64
vfp2sp -> fpregs
```
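To make the consequences of this table easier to check, here is a minimal sketch that computes the transitive closure of the implications listed above. The table contents are copied from the list; the `expand` and `implies` helpers are hypothetical names for this example and are not part of rustc or LLVM.

```rust
use std::collections::BTreeSet;

/// The LLVM feature implications listed above, as (feature, directly implied).
const IMPLICATIONS: &[(&str, &[&str])] = &[
    ("vfp4", &["vfp3", "fp16", "vfp4d16", "vfp4sp"]),
    ("vfp4d16", &["vfp3d16", "fp16", "fp64", "vfp4d16sp"]),
    ("vfp4sp", &["vfp3sp", "fp16", "d32", "vfp4d16sp"]),
    ("vfp4d16sp", &["vfp3d16sp", "fp16"]),
    ("vfp3", &["vfp2", "vfp3d16", "vfp3sp"]),
    ("vfp3d16", &["vfp2", "fp64", "vfp3d16sp"]),
    ("vfp3sp", &["vfp2", "d32", "vfp3d16sp"]),
    ("vfp3d16sp", &["vfp2sp"]),
    ("vfp2", &["vfp2sp", "fp64"]),
    ("vfp2sp", &["fpregs"]),
];

/// Returns `feature` plus everything it implies, directly or transitively.
pub fn expand(feature: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut worklist = vec![feature];
    while let Some(f) = worklist.pop() {
        if !seen.insert(f) {
            continue; // already expanded
        }
        if let Some((_, implied)) = IMPLICATIONS.iter().find(|(name, _)| *name == f) {
            worklist.extend(implied.iter().copied());
        }
    }
    seen
}

/// True if enabling `feature` also enables `other`.
pub fn implies(feature: &'static str, other: &str) -> bool {
    expand(feature).contains(other)
}
```

Under this table, `implies("vfp3", "d32")` holds (via `vfp3sp`), which is why `-d32` undoes `+vfp3`, while `implies("vfp2", "d32")` and `implies("vfp3d16", "d32")` do not.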
`-neon` might be unnecessary too in many of these cases, but some default CPUs that Rust specifies turn Neon on, so that needs a bit more care. I can't see any LLVM CPUs that enable D32.
Old description:
> Fixes https://github.com/rust-lang/rust/issues/149399 - this implication was likely a mistake and isn't enforced by LLVM. This is a breaking change and any users specifying `vfp2/3/4` via `-Ctarget-features` or the `target_feature` attribute will need to add `d32` in to get the same behaviour. The target features are unstable so this is ok for Rust, and this is necessary as otherwise there's no way to specify a `vfp2-d16` configuration, for example.
>
> I expect these targets would have been broken by https://github.com/rust-lang/rust/pull/149173 as `-d32` would have disabled any `+vfpX` feature before it. With the removal of the implication the `-d32` went back to being unnecessary, but I've removed it anyway.
>
> As `@RalfJung` pointed out, thumbv7a-nuttx-eabihf looks to have been relying on this implication so I've added `+d32` to its target spec.
rustc_target: Add `efiapi` ABI support for LoongArch
This commit adds basic `efiapi` ABI support for LoongArch by recognizing `extern "efiapi"` in the ABI map and inline asm clobber handling, and mapping it to the C calling convention.
This change is intentionally submitted ahead of the full LoongArch UEFI target support. While UEFI binaries are ultimately produced as PE images, LoongArch UEFI applications can already be developed by building ELF objects, applying relocation fixups, and converting them to PE in a later step. For such workflows, having `efiapi` properly recognized by the compiler is a prerequisite, even without a dedicated UEFI target.
Landing this ABI support early helps unblock LoongArch UEFI application and driver development, and allows the remaining UEFI-specific pieces to be introduced incrementally in follow-up patches.
MCP: https://github.com/rust-lang/compiler-team/issues/953
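For readers unfamiliar with the ABI string, the sketch below shows what code using `extern "efiapi"` looks like. It is only an illustration: the status constants, the `Callback` alias, and the function names are hypothetical and do not come from the UEFI specification or from this change.

```rust
/// Hypothetical status codes for illustration only; real UEFI code would use
/// the `EFI_STATUS` values defined by the UEFI specification.
pub const STATUS_SUCCESS: usize = 0;
pub const STATUS_INVALID_PARAMETER: usize = 2;

/// A function pointer type using the `efiapi` calling convention.
pub type Callback = extern "efiapi" fn(value: u32) -> usize;

/// A function defined with the `efiapi` ABI, as firmware-facing code would be.
pub extern "efiapi" fn handle_event(value: u32) -> usize {
    if value == 0 {
        STATUS_INVALID_PARAMETER
    } else {
        STATUS_SUCCESS
    }
}

/// Calls through the `efiapi` function pointer.
pub fn dispatch(callback: Callback, value: u32) -> usize {
    callback(value)
}
```

On a target where `efiapi` maps to the C calling convention, as described above for LoongArch, `handle_event` is called exactly like an `extern "C"` function.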
Because some AES key scheduling instructions require *either* Zkne or
Zknd extension, we must have a target feature to represent
`(Zkne || Zknd)`.
This commit adds (perma-unstable) target feature to the RISC-V
architecture: `zkne_or_zknd` for this purpose.
Helped-by: sayantn <sayantn05@gmail.com>
Many aarch64 targets without LSE in the baseline enable the
`outline-atomics` feature, which uses runtime detection of LSE for its
faster atomic ops. This provides nontrivial performance improvements on
most hardware from the past decade, at a small cost to anything pre-LSE.
This matches what Clang does [1].
[1]: e24f90190c
The `fmt::Debug` impl for `TyAndLayout<'a, Ty>` requires `fmt::Display`
on the `Ty` parameter. In `ArgAbi`, `TyAndLayout`'s `Ty` is instantiated
with a parameter that implements `TyAbiInterface`. `TyAbiInterface`
only required `fmt::Debug` be implemented on `Self`, not `fmt::Display`,
which meant that it wasn't actually possible to debug print `ArgAbi`.
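The bound mismatch can be reproduced in miniature. In the sketch below, `Layout`, `AbiInterface`, `Arg`, and `Int32` are hypothetical stand-ins for `TyAndLayout`, `TyAbiInterface`, `ArgAbi`, and a concrete type. The sketch assumes the resolution is to add `fmt::Display` as a supertrait of the interface trait; the message above does not spell out the fix, so treat that part as an assumption.

```rust
use std::fmt;

/// Stand-in for `TyAndLayout`: its `Debug` output prints `ty` via `Display`.
pub struct Layout<T> {
    pub ty: T,
    pub size: u64,
}

impl<T: fmt::Display> fmt::Debug for Layout<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Layout")
            .field("ty", &format_args!("{}", self.ty))
            .field("size", &self.size)
            .finish()
    }
}

/// Stand-in for `TyAbiInterface`. With only `fmt::Debug` as a supertrait, the
/// `Debug` impl for `Arg<T>` below would not compile, because `Layout<T>: Debug`
/// needs `T: Display`. Adding `fmt::Display` here is the assumed fix.
pub trait AbiInterface: fmt::Debug + fmt::Display {}

/// Stand-in for `ArgAbi`.
pub struct Arg<T: AbiInterface> {
    pub layout: Layout<T>,
}

impl<T: AbiInterface> fmt::Debug for Arg<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Arg").field("layout", &self.layout).finish()
    }
}

/// A hypothetical concrete type that satisfies the interface.
#[derive(Debug)]
pub struct Int32;

impl fmt::Display for Int32 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "i32")
    }
}

impl AbiInterface for Int32 {}
```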
Introduces `BackendRepr::ScalableVector` corresponding to scalable
vector types annotated with `repr(scalable)` which lowers to a scalable
vector type in LLVM.
Co-authored-by: Jamie Cunliffe <Jamie.Cunliffe@arm.com>
Add new Tier-3 target: riscv64im-unknown-none-elf
This PR proposes to add riscv64im-unknown-none-elf, a subset of the already supported riscv64imac-unknown-none-elf.
The motivation behind this PR is that we want to standardize (most) zkVMs on riscv64im-none and riscv64ima-none. Having different variants of RISC-V extensions also seems to be within expectation, at least with respect to riscv32.
Note: This does not mean that we will be able to remove [riscv32im-risc0-zkvm-elf](https://doc.rust-lang.org/rustc/platform-support/riscv32im-risc0-zkvm-elf.html) -- I am not aware of all of the dependents for this.
**Tier-3 Policy**
> A tier 3 target must have a designated developer or developers (the "target maintainers") on record to be CCed when issues arise regarding the target. (The mechanism to track and CC such developers may evolve over time.)
I assigned the Rust Embedded Working Group, since they already maintain riscv64imac, though I am happy to assign myself.
> Targets must use naming consistent with any existing targets; for instance, a target for the same CPU or OS as an existing Rust target should use the same name for that CPU or OS. Targets should normally use the same names and naming conventions as used elsewhere in the broader ecosystem beyond Rust (such as in other toolchains), unless they have a very good reason to diverge. Changing the name of a target can be highly disruptive, especially once the target reaches a higher tier, so getting the name right is important even for a tier 3 target.
It follows the naming convention of the other bare metal riscv targets
> Tier 3 targets may have unusual requirements to build or use, but must not create legal issues or impose onerous legal terms for the Rust project or for Rust developers or users.
This has the same requirements as riscv{32, 64}imac
> Neither this policy nor any decisions made regarding targets shall create any binding agreement or estoppel by any party. If any member of an approving Rust team serves as one of the maintainers of a target, or has any legal or employment requirement (explicit or implicit) that might affect their decisions regarding a target, they must recuse themselves from any approval decisions regarding the target's tier status, though they may otherwise participate in discussions.
> Tier 3 targets should attempt to implement as much of the standard libraries as possible and appropriate (core for most targets, alloc for targets that can support dynamic memory allocation, std for targets with an operating system or equivalent layer of system-provided functionality), but may leave some code unimplemented (either unavailable or stubbed out as appropriate), whether because the target makes it impossible to implement or challenging to implement. The authors of pull requests are not obligated to avoid calling any portions of the standard library on the basis of a tier 3 target not implementing those portions.
> The target must provide documentation for the Rust community explaining how to build for the target, using cross-compilation if possible. If the target supports running binaries, or running tests (even if they do not pass), the documentation must explain how to run such binaries or tests for the target, using emulation if possible or dedicated hardware if necessary.
> Tier 3 targets must not impose burden on the authors of pull requests, or other developers in the community, to maintain the target. In particular, do not post comments (automated or manual) on a PR that derail or suggest a block on the PR based on a tier 3 target. Do not send automated messages or notifications (via any medium, including via `@`) to a PR author or others involved with a PR regarding a tier 3 target, unless they have opted into such messages.
> Patches adding or updating tier 3 targets must not break any existing tier 2 or tier 1 target, and must not knowingly break another tier 3 target without approval of either the compiler team or the maintainers of the other tier 3 target.
> Tier 3 targets must be able to produce assembly using at least one of rustc's supported backends from any host target. (Having support in a fork of the backend is not sufficient, it must be upstream.)
Acknowledging the above.
The amdgpu target uses vector types in various places. The vector types
can be used on all architectures; there is no associated target feature
that needs to be enabled.
The largest vector type found in LLVM intrinsics is `v32i32`
(`[32 x i32]`) for mfma intrinsics. Note that while this intrinsic is
only supported on some architectures, the vector type itself is
supported on all architectures.
Restrict spe_acc to PowerPC SPE targets
Update the tests, add powerpc-*-gnuspe testing, and create a distinct clobber_abi list for PowerPC SPE targets.
Note, the SPE target does not have vector, vector-scalar, or floating-point specific registers.
r? `@Amanieu`
callconv: fix mips64 aggregate argument passing for C FFI
MIPS64 needs to put a padding argument before an aggregate argument when
this argument is in an odd-number position, starting from 0, and has an
alignment of 16 bytes or higher, e.g.
`void foo(int a, max_align_t b);` is the same as
`void foo(int a, long _padding, max_align_t b);`
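The rule can be written down as a small predicate. This sketch takes the description above literally and works on argument positions; `ArgInfo` and both function names are hypothetical, and the real calling-convention code tracks register offsets, which this model does not attempt to reproduce.

```rust
/// Hypothetical summary of an argument, just enough to state the rule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArgInfo {
    pub is_aggregate: bool,
    pub align_bytes: u64,
}

/// True if a padding argument goes before the argument at `position`
/// (0-based): it is an aggregate, is aligned to 16 bytes or more, and sits
/// in an odd-numbered position.
pub fn needs_padding(position: usize, arg: ArgInfo) -> bool {
    arg.is_aggregate && arg.align_bytes >= 16 && position % 2 == 1
}

/// Positions in `args` that would receive a padding argument under this rule.
pub fn padded_positions(args: &[ArgInfo]) -> Vec<usize> {
    args.iter()
        .enumerate()
        .filter(|&(position, &arg)| needs_padding(position, arg))
        .map(|(position, _)| position)
        .collect()
}
```

For `void foo(int a, max_align_t b)`, `b` is a 16-byte-aligned aggregate at position 1, so this model reports padding before it, matching the equivalent signature shown above.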
Enable `outline-atomics` by default on more AArch64 platforms
The baseline Armv8.0 ISA doesn't have atomics instructions, but in
practice most hardware is at least Armv8.1-A (2014), which includes
single-instruction atomics as part of the LSE feature. As a performance
optimization for these cases, GCC and LLVM have the `-moutline-atomics` flag
to turn atomic operations into calls to symbols like `__aarch64_cas1_acq`.
These can do runtime feature detection and use the LSE instructions if
available, falling back to more portable load-exclusive/store-exclusive
loops.
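The dispatch idea can be modelled in portable Rust. This is only a conceptual sketch: the real `__aarch64_cas*` helpers are implemented in assembly inside compiler-builtins, whereas here both paths are written with `AtomicU32`, the retry loop merely stands in for a load-exclusive/store-exclusive sequence, and every function name is hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Runtime check for LSE. Other architectures report `false` in this sketch.
#[cfg(target_arch = "aarch64")]
pub fn has_lse() -> bool {
    std::arch::is_aarch64_feature_detected!("lse")
}

#[cfg(not(target_arch = "aarch64"))]
pub fn has_lse() -> bool {
    false
}

/// Models the single-instruction path: one compare-and-swap, no retry loop.
pub fn cas_single(cell: &AtomicU32, expected: u32, new: u32) -> u32 {
    match cell.compare_exchange(expected, new, Ordering::Acquire, Ordering::Acquire) {
        Ok(previous) | Err(previous) => previous,
    }
}

/// Models the fallback path: retry until the swap succeeds or the value
/// really differs (a weak exchange may fail spuriously).
pub fn cas_retry_loop(cell: &AtomicU32, expected: u32, new: u32) -> u32 {
    loop {
        match cell.compare_exchange_weak(expected, new, Ordering::Acquire, Ordering::Acquire) {
            Ok(previous) => return previous,
            Err(actual) if actual != expected => return actual,
            Err(_) => continue,
        }
    }
}

/// Models an outlined helper: pick a path at runtime, return the old value.
pub fn outlined_cas(cell: &AtomicU32, expected: u32, new: u32) -> u32 {
    if has_lse() {
        cas_single(cell, expected, new)
    } else {
        cas_retry_loop(cell, expected, new)
    }
}
```

The caller sees the same result on either path; only the cost of the operation differs, plus the overhead of the call and the feature check.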
Since the recent 3b50253b57 ("compiler-builtins: plumb LSE support
for aarch64 on linux") our builtins support this LSE optimization, and
since 6936bb975a ("Dynamically enable LSE for aarch64 rust provided
intrinsics"), std will set the flag as part of its startup code. The first
commit in this PR configures this to work on all platforms built with
`outline-atomics`, not just Linux.
Thus, enable `outline-atomics` by default on Android, OpenBSD, Windows,
and Fuchsia platforms that don't have LSE in the baseline. The feature is
already enabled on Linux. Platform-specific details are included in each
commit message.
The current implementation can still be accessed by setting
`-Ctarget-feature=-outline-atomics`. Setting `-Ctarget-feature=+lse` or
a relevant CPU will use the single-instruction atomics without the call
overhead. https://rust.godbolt.org/z/dsdrzszoe
Link: https://learn.arm.com/learning-paths/servers-and-cloud-computing/lse/intro/
Original Clang outline-atomics benchmarks: https://reviews.llvm.org/D91157#2435844
try-job: aarch64-msvc-*
try-job: arm-android
try-job: dist-android
try-job: dist-aarch64-llvm-mingw
try-job: dist-aarch64-msvc
try-job: dist-various-*
try-job: test-various
Clang has recently started doing this, as of LLVM commit 5d774ec8d183
("[Driver] Enable outline atomics for OpenBSD/aarch64") [1]. Thus, do
the same here.
[1]: 5d774ec8d1
Clang has done this by default since LLVM commit 1a963d3278c2 ("[Driver]
Make -moutline-atomics default for aarch64-fuchsia targets"), [1], so do
the same here.
[1]: 1a963d3278
Per LLVM commit c5e7e649d537 ("[AArch64][Clang][Linux] Enable
out-of-line atomics by default.") [1], Clang enables these on Android.
Thus, do the same in Rust.
[1]: c5e7e649d5
c-variadic: bpf and spirv do not support c-variadic definitions
tracking issue: https://github.com/rust-lang/rust/issues/44930
Emit a nice error message on bpf and spirv targets when a c-variadic function is defined. Spirv also does not support c-variadic calls, while bpf appears to allow them.
r? `@workingjubilee`
This register is only supported on the *powerpc*spe targets. It is
only recognized by LLVM. gcc does not accept this as a clobber, nor
does it support these targets.
This is a volatile register, thus it is included with clobber_abi.
Fix armv4t- and armv5te- bare metal targets
These two targets currently force on the LLVM feature `+atomics-32`. LLVM doesn't appear to actually be able to emit 32-bit load/store atomics for these targets despite this feature, and emits calls to a shim function called `__sync_lock_test_and_set_4`, which nothing in the Rust standard library supplies.
See [#t-compiler/arm > __sync_lock_test_and_set_4 on Armv5TE](https://rust-lang.zulipchat.com/#narrow/channel/242906-t-compiler.2Farm/topic/__sync_lock_test_and_set_4.20on.20Armv5TE/with/553724827) for more details.
Experimenting with clang and gcc (as logged in that zulip thread) shows that C code cannot do atomic load/stores on that architecture either (at least, not without a library call inserted).
So, the safest thing to do is probably turn off `+atomics-32` for these two Tier 3 targets.
I asked `@Lokathor` and he said he didn't even use atomics on the `armv4t-none-eabi`/`thumbv4t-none-eabi` target he maintains.
I was unable to reach `@QuinnPainter` for comment for `armv5te-none-eabi`/`thumbv5te-none-eabi`.
The second commit renames the base target spec `spec::base::thumb` to `spec::base::arm_none` and changes `armv4t-none-eabi`/`thumbv4t-none-eabi` and `armv5te-none-eabi`/`thumbv5te-none-eabi` to use it. This harmonises the frame-pointer and linker options across the bare-metal Arm EABI and EABIHF targets.
You could make an argument for harmonising `armv7a-none-*`, `armv7r-none-*` and `armv8r-none-*` as well, but that can be another PR.