Similarly to the i686 and x86_64 MinGW targets, Rust needs to provide the
right chkstk symbol for AArch64 to avoid relying on the linker to
provide it.
CC https://github.com/rust-lang/rust/issues/150725
We can't drop the `unsafe` here because it is required at the `libm`
MSRV. Instead, we will need to `allow` the lint.
This reverts commit 96ac3624abc144db930d94504a9c67aad7b949ed.
stdarch subtree update
Subtree update of `stdarch` to 61119062fb.
Created using https://github.com/rust-lang/josh-sync.
r? `@sayantn`
My first josh sync, it lgtm, but let me know if I missed something.
I'm especially looking forward to the AMD GPU module, which we want to use for the offload project.
Due to an erroneous overflow threshold, `expm1f` was incorrectly
returning `inf` for inputs in the range `[88.72169, 88.72283]`. This
additionally caused `sinhf` to return `NaN` for inputs in that range.
The bug was ported from the original in musl, which has since been fixed
in [1].
[1]: https://git.musl-libc.org/cgit/musl/commit/?id=964104f9f0e056cf58d9defa0b716d7756f040f6
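To illustrate why an overflowing `expm1f` turns into `NaN` in `sinhf`, here is a minimal sketch. It assumes `sinhf` is built on the identity `sinh(x) = (t + t / (t + 1)) / 2` with `t = expm1(|x|)`, as in the musl-derived code; the function name `sinh_from_expm1` is hypothetical and not part of `libm`.

```rust
/// Hypothetical helper: compute sinh(|x|) from `t = expm1(|x|)`.
///
/// Uses e^x - e^-x = t + t / (t + 1), which follows from e^x = t + 1.
pub fn sinh_from_expm1(t: f32) -> f32 {
    // If `t` is erroneously `inf`, then `t / (t + 1.0)` is `inf / inf`,
    // which is NaN, and the NaN propagates through the sum.
    0.5 * (t + t / (t + 1.0))
}
```

With a large but finite `t` the result stays finite, so the `NaN` only appears once `expm1f` overflows too early.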
Avoid using `Ord::clamp` in the `f16`-specific part of the generic
`scalbn`.
It turned out to be redundant anyway, as both callsites follow a pattern
like
```
if n < negative_val {
    let foo = (n + positive_val).clamp(negative_val, positive_val);
}
```
and `n < negative_val < 0` implies `n + positive_val < positive_val`.
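The implication above means the upper bound of the clamp can never take effect inside that branch. A small sketch of the equivalence follows; the function names are hypothetical, and using `max` as the replacement is an assumption for illustration rather than a statement of what the patch does.

```rust
/// The original pattern: clamp on both sides.
/// Callers are assumed to guarantee `n < negative_val < 0 < positive_val`.
pub fn adjust_with_clamp(n: i32, negative_val: i32, positive_val: i32) -> i32 {
    (n + positive_val).clamp(negative_val, positive_val)
}

/// Only the lower bound is kept. Since `n < 0`, the sum is already
/// strictly below `positive_val`, so the upper bound is never hit.
pub fn adjust_with_max(n: i32, negative_val: i32, positive_val: i32) -> i32 {
    (n + positive_val).max(negative_val)
}
```

Both functions agree for every `n` that satisfies the branch condition.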
Fixes: rust-lang/compiler-builtins#1046
Enable `outline-atomics` by default on more AArch64 platforms
The baseline Armv8.0 ISA doesn't have atomics instructions, but in
practice most hardware is at least Armv8.1-A (2014), which includes
single-instruction atomics as part of the LSE feature. As a performance
optimization for these cases, GCC and LLVM have the `-moutline-atomics` flag
to turn atomic operations into calls to symbols like `__aarch64_cas1_acq`.
These can do runtime feature detection and use the LSE instructions if
available, falling back to more portable load-exclusive/store-exclusive
loops.
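As a rough illustration, a one-byte compare-and-swap with acquire ordering is the kind of operation that is expected to become a call to a symbol like `__aarch64_cas1_acq` when `outline-atomics` is enabled and LSE is not in the baseline. The function `try_lock` below is a hypothetical example, and the exact code generated depends on the compiler version and target flags.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical one-byte lock flag: 0 means free, 1 means held.
/// Returns `true` if this call changed the flag from 0 to 1.
pub fn try_lock(flag: &AtomicU8) -> bool {
    // A 1-byte CAS with acquire ordering on success. The source code is the
    // same with or without LSE; only the lowering chosen by the backend differs.
    flag.compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
}
```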
Since the recent 3b50253b57 ("compiler-builtins: plumb LSE support
for aarch64 on linux") our builtins support this LSE optimization, and
since 6936bb975a ("Dynamically enable LSE for aarch64 rust provided
intrinsics"), std will set the flag as part of its startup code. The first
commit in this PR configures this to work on all platforms built with
`outline-atomics`, not just Linux.
Thus, enable `outline-atomics` by default on Android, OpenBSD, Windows,
and Fuchsia platforms that don't have LSE in the baseline. The feature is
already enabled on Linux. Platform-specific details are included in each
commit message.
The current implementation can still be accessed by setting
`-Ctarget-feature=-outline-atomics`. Setting `-Ctarget-feature=+lse` or
a relevant CPU will use the single-instruction atomics without the call
overhead. https://rust.godbolt.org/z/dsdrzszoe
Link: https://learn.arm.com/learning-paths/servers-and-cloud-computing/lse/intro/
Original Clang outline-atomics benchmarks: https://reviews.llvm.org/D91157#2435844
try-job: aarch64-msvc-*
try-job: arm-android
try-job: dist-android
try-job: dist-aarch64-llvm-mingw
try-job: dist-aarch64-msvc
try-job: dist-various-*
try-job: test-various
Currently the benchmark CI jobs print multiple pages of paths from the
extracted archive, since `tar` is run with `v`. This is a lot of output
that is usually just noise in CI.
Switch to printing the paths from Python instead, limiting to a depth of
three segments (and deduplicating). Removing it completely was an
option, but it's still nice to have a hint about what gets updated.
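The truncate-and-deduplicate idea can be sketched as follows. The actual change is made in the Python CI script; this version is written in Rust purely for illustration, and `summarize_paths` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Truncate each path to at most `max_depth` segments and deduplicate.
/// Empty and `.` segments are skipped; the result is sorted.
pub fn summarize_paths<'a, I>(paths: I, max_depth: usize) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut seen = BTreeSet::new();
    for path in paths {
        let prefix: Vec<&str> = path
            .split('/')
            .filter(|segment| !segment.is_empty() && *segment != ".")
            .take(max_depth)
            .collect();
        if !prefix.is_empty() {
            seen.insert(prefix.join("/"));
        }
    }
    seen.into_iter().collect()
}
```

Many files under the same directory collapse into a single line, which keeps a hint about what was updated without the full listing.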
Jorge hasn't been very involved with these crates for a while (thank you
for getting these super important projects going!). Update the `authors`
field to include, as far as I am aware, everyone who has effectively
maintained `compiler-builtins` at some point in time.
This field is dropped from non-published crates.
This was originally attempted at [1], but the numbers seemed to indicate
that tests weren't being run or counted completely. That issue appears
to be resolved, so add benchmarks for AArch64.
[1]: https://github.com/rust-lang/compiler-builtins/pull/930
* `repe` is "repeat while equal", which only makes sense for string
comparisons. Change it to `rep`. (The encoding is the same so there is
no performance change.)
* Remove an unneeded `test`. This was added in ae557bde4efc ("Skip rep
movsb in copy_backward if possible"). The `jz` was removed in
ef37a23d8417 ("Remove branches around rep movsb/stosb") but the `test`
was missed.
* Remove an incorrect `preserves_flags`; `add` and `sub` affect flags.
Discussion: https://github.com/rust-lang/compiler-builtins/pull/911
Fixes: ef37a23d8417 ("Remove branches around rep movsb/stosb")
Fixes: c30322aafc9c ("Align destination in mem* instructions.")
[ Added details to the commit message - Trevor ]
This is kind of a retry at rust-lang/compiler-builtins#898. One of the
problems there was that it would have added overhead and regressed
performance for typical inputs.
Unlike that PR, this doesn't aim for sub-linear scaling; the cost of
evaluating `fmod(x, y)` is still roughly proportional to `log2(|x/y|)`.
However, the constant factor is much better. Running the
`random` benchmarks locally, I got wall-time reductions of:

    fmodf16:  -56.9%
    fmodf:    -85.0%
    fmod:     -95.4%
    fmodf128: -98.7%
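To show where a cost proportional to `log2(|x/y|)` comes from, here is a minimal shift-and-subtract remainder on unsigned integers. This is not the algorithm from this change, only a sketch of the scaling; `rem_shift_subtract` is a hypothetical name, and it returns the step count alongside the remainder.

```rust
/// Compute `x % y` by binary long division, also returning the number of
/// compare/subtract steps taken. Panics if `y == 0`.
pub fn rem_shift_subtract(mut x: u64, y: u64) -> (u64, u32) {
    assert!(y != 0, "division by zero");
    if x < y {
        return (x, 0);
    }
    // Align the top set bit of `y` with the top set bit of `x`.
    // Since `x >= y`, this difference cannot underflow.
    let mut shift = y.leading_zeros() - x.leading_zeros();
    let mut steps = 0;
    loop {
        let d = y << shift;
        if x >= d {
            x -= d;
        }
        steps += 1;
        if shift == 0 {
            break;
        }
        shift -= 1;
    }
    (x, steps)
}
```

The loop runs once per bit of difference in magnitude between `x` and `y`, so improving the constant factor means doing less work per step or handling several bits per step.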
So far we haven't been running the `mem_icount` benches in CI, but this
would be useful. Use a glob pattern for the test so this and future
icount benchmarks all get run.
The latest release of gungraun uses global symbols to register tests.
Since it doesn't know about modules, these conflict.
Add the module name so this isn't an issue, but keep the modules around
because they are useful for organization.
`iai-callgrind` was renamed to `gungraun` and had a new release. Update
everything to match.
There shouldn't be any changes to observable behavior here.
Build outline atomic symbols on all targets that have `outline-atomics`
enabled, rather than only on Linux. Since this is no longer OS-specific,
also rename the module.
`cfg(target_family = "...")` can be set multiple times, and thus
`CARGO_CFG_TARGET_FAMILY` can also contain comma-separated values,
similar to `CARGO_CFG_TARGET_FEATURE`.
This allows `cargo build --target wasm32-unknown-emscripten -p
musl-math-sys` to work, and would become more important if we were to add
e.g. `cfg(target_family = "darwin")` in the future as discussed in
https://github.com/rust-lang/rust/issues/100343.
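A build script can handle this by splitting the value on commas instead of comparing the whole string. The helpers below are a minimal sketch with hypothetical names; the value `unix,wasm` used in the usage note is assumed for illustration.

```rust
/// Check whether a comma-separated `CARGO_CFG_TARGET_FAMILY` value
/// contains the given family.
pub fn has_target_family(raw: &str, family: &str) -> bool {
    raw.split(',').any(|entry| entry.trim() == family)
}

/// Read the variable Cargo sets for build scripts and check it.
/// Returns `false` if the variable is unset.
pub fn target_family_from_env(family: &str) -> bool {
    std::env::var("CARGO_CFG_TARGET_FAMILY")
        .map(|raw| has_target_family(&raw, family))
        .unwrap_or(false)
}
```

With a value like `unix,wasm`, a direct comparison against `"unix"` fails, while `has_target_family("unix,wasm", "unix")` succeeds.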
Rustc commit 055e05a338 / builtins commit 2fb3a1871bc9 ("Mark float
intrinsics with no preconditions as safe") changed `fma` and other
intrinsics to not be unsafe to call. Unfortunately we can't remove the
`unsafe` just yet since the rustc we pin for benchmarks is older than
this.
Add back `unsafe` but allow it to be unused.
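The pattern looks roughly like the sketch below. `fused_mul_add` is a hypothetical stand-in for an intrinsic that newer toolchains declare safe; it is not the real intrinsic.

```rust
/// Hypothetical stand-in for an intrinsic that is a safe function on new
/// toolchains but an `unsafe fn` on the older pinned one.
fn fused_mul_add(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

/// The `unsafe` block is required where the callee is still `unsafe fn`.
/// Where it is safe, the block is redundant and `unused_unsafe` would fire,
/// so the lint is allowed here.
#[allow(unused_unsafe)]
pub fn fma(x: f64, y: f64, z: f64) -> f64 {
    unsafe { fused_mul_add(x, y, z) }
}
```

This keeps the same source compiling without warnings on both the old and the new toolchain.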