Currently the default release profile enables LTO and single CGU builds,
which makes it very slow to build. Most tests are better run with
optimizations enabled since this allows testing a much larger number of
inputs, so it is inconvenient that building can sometimes take
significantly longer than running the tests.
Remedy this by doing the following:
* Move the existing `release` profile to `release-opt`.
* With the above, the default `release` profile is untouched (16 CGUs
and thin local LTO).
* `release-checked` inherits `release`, so no LTO or single CGU.
This means that the simple `cargo test --release` becomes much faster
for local development. We are able to enable the other profiles as
needed in CI.
Tests should ideally still be run with `--profile release-checked` to
ensure that no debug assertions or unexpected wrapping math are hit.
`no-panic` still needs a single CGU, so must be run with `--profile
release-opt`. Since it is not possible to detect CGU or profile
configuration from within build scripts, the `ENSURE_NO_PANIC`
environment variable must now always be set.
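
The environment variable check described above could be handled in a build script roughly as follows. This is only a sketch: the helper names and the `assert_no_panic` cfg are hypothetical, as is the rule for which values count as "set"; only the `ENSURE_NO_PANIC` name comes from the text.

```rust
use std::env;

/// Hypothetical helper: interpret the raw value of `ENSURE_NO_PANIC`.
/// Here any non-empty value other than "0" counts as enabled; the real
/// build script may use a different rule.
fn no_panic_requested(raw: Option<&str>) -> bool {
    matches!(raw, Some(v) if !v.is_empty() && v != "0")
}

/// Directives the build script would print for Cargo, collected into a
/// vector so the logic can be tested without running a build.
fn build_script_directives(raw: Option<&str>) -> Vec<String> {
    // Rebuild whenever the variable changes.
    let mut out = vec!["cargo:rerun-if-env-changed=ENSURE_NO_PANIC".to_string()];
    if no_panic_requested(raw) {
        // `assert_no_panic` is an illustrative cfg name, not a confirmed one.
        out.push("cargo:rustc-cfg=assert_no_panic".to_string());
    }
    out
}

/// What the body of `main` in a `build.rs` could look like.
fn run_build_script() {
    let raw = env::var("ENSURE_NO_PANIC").ok();
    for line in build_script_directives(raw.as_deref()) {
        println!("{line}");
    }
}
```

Under these assumptions, an invocation would look something like `ENSURE_NO_PANIC=1 cargo test --profile release-opt`.
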
The Cargo feature `checked` was added in 410b0633a6b9 ("Overhaul tests")
and later removed in e4ac1399062c ("swap stable to be unstable, checked
is now debug_assertions"). However, there are a few remaining uses of
`feature = "checked"` that did not get removed. Clean these up here.
This function is significantly slower than all others, so it includes an
override in `EXTREMELY_SLOW_TESTS`. Without it, PR CI takes ~1 hour and
the extensive tests in CI take ~1 day.
Certain functions (`fmodf128`) are significantly slower than others,
to the point that running the default number of tests adds tens of
minutes to PR CI and increases extensive test time to ~1 day. It does
not make sense to do this by default, so introduce `EXTREMELY_SLOW_TESTS`
to the test configuration, which allows listing specific tests that need
a reduced iteration count.
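
A minimal sketch of how such an override table could work. The table layout, the divisor value, and the `iteration_count` helper are assumptions for illustration and do not reflect the actual test configuration.

```rust
/// Hypothetical override table: (function name, divisor applied to the
/// default iteration count). The value 50 is made up for the example.
const EXTREMELY_SLOW_TESTS: &[(&str, u64)] = &[("fmodf128", 50)];

/// Return the number of iterations to run for `fn_name`, reducing the
/// default for functions listed in `EXTREMELY_SLOW_TESTS`.
fn iteration_count(fn_name: &str, default_iters: u64) -> u64 {
    match EXTREMELY_SLOW_TESTS.iter().find(|(name, _)| *name == fn_name) {
        // Always run at least one iteration.
        Some((_, divisor)) => (default_iters / divisor).max(1),
        None => default_iters,
    }
}
```

Functions that are not listed keep the default count, so the override stays opt-in per test.
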
With the new routines, some of our tests are running close to their
timeouts. Increase the timeout for test jobs, and set a short timeout
for all other jobs that did not have one.
Make things more consistent with other APIs that work with a bitwise
representation of the exponent. That is, use `u32` when working with a
bitwise (biased) representation, and use `i32` when the bitwise
representation has been adjusted for bias and may be negative.
Every place this has been used so far has an `as i32`, so this change
makes things cleaner anyway.
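
To illustrate the convention, here is a small `f32`-only sketch. The free functions and constant names are hypothetical stand-ins, not the crate's actual trait methods.

```rust
const F32_SIG_BITS: u32 = 23; // explicit significand bits in an f32
const F32_EXP_MASK: u32 = 0xff; // 8-bit exponent field
const F32_EXP_BIAS: u32 = 127;

/// Raw (biased) exponent field: never negative, so `u32`.
fn exp_biased(x: f32) -> u32 {
    (x.to_bits() >> F32_SIG_BITS) & F32_EXP_MASK
}

/// Exponent with the bias removed: may be negative, so `i32`.
fn exp_unbiased(x: f32) -> i32 {
    exp_biased(x) as i32 - F32_EXP_BIAS as i32
}
```

With the signed return type, callers no longer need their own `as i32` cast after unbiasing.
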
Currently our XFAILs are open ended; we do not check that they actually
fail, so we have no easy way of knowing when a previously-failing test
starts passing. Introduce a new enum that we return from overrides to
give us more flexibility here, including the ability to assert that
expected failures happen.
With the new enum, it is also possible to specify ULP via return value
rather than passing a `&mut u32` parameter.
This includes refactoring of `precision.rs` to be more accurate about
where errors come from, if possible.
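
The idea can be sketched as follows. The enum name, its variants, and the `apply`/`check` helpers are assumptions made for illustration; the real definitions may differ.

```rust
/// Hypothetical value returned from a per-test override.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckAction {
    /// Check as usual with the default tolerance.
    AssertSuccess,
    /// Expected failure (XFAIL): it is an error if the check passes.
    AssertFailure(&'static str),
    /// Check with a custom ULP tolerance, replacing a `&mut u32` out-parameter.
    AssertWithUlp(u32),
    /// Do not check this input at all.
    Skip,
}

/// Plain tolerance check: `ulp_error` must not exceed `allowed`.
fn check(ulp_error: u32, allowed: u32) -> Result<(), String> {
    if ulp_error <= allowed {
        Ok(())
    } else {
        Err(format!("ULP error {ulp_error} exceeds allowed {allowed}"))
    }
}

/// Apply an override decision to a measured ULP error.
fn apply(action: CheckAction, ulp_error: u32, default_ulp: u32) -> Result<(), String> {
    match action {
        CheckAction::Skip => Ok(()),
        CheckAction::AssertSuccess => check(ulp_error, default_ulp),
        CheckAction::AssertWithUlp(ulp) => check(ulp_error, ulp),
        // Invert the result so that an unexpectedly passing test is reported.
        CheckAction::AssertFailure(reason) => match check(ulp_error, default_ulp) {
            Ok(()) => Err(format!("expected failure did not occur: {reason}")),
            Err(_) => Ok(()),
        },
    }
}
```

Because an XFAIL that starts passing now produces an error, stale overrides show up as soon as the underlying issue is fixed.
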
Fixes: https://github.com/rust-lang/libm/issues/455
Additionally, make use of this version to implement `ceil` and `ceilf`.
Musl's `ceilf` algorithm seems to work better for all versions of the
functions. Testing with a generic version of musl's `ceil` routine
showed the following regressions:
    icount::icount_bench_ceil_group::icount_bench_ceil logspace:setup_ceil()
    Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
      Baselines:        softfloat|softfloat
      Instructions:         14064|13171     (+6.78005%) [+1.06780x]
      L1 Hits:              16697|15803     (+5.65715%) [+1.05657x]
      L2 Hits:                  0|0         (No change)
      RAM Hits:                 7|8         (-12.5000%) [-1.14286x]
      Total read+write:     16704|15811     (+5.64797%) [+1.05648x]
      Estimated Cycles:     16942|16083     (+5.34104%) [+1.05341x]
    icount::icount_bench_ceilf_group::icount_bench_ceilf logspace:setup_ceilf()
    Performance has regressed: Instructions (14732 > 9901) regressed by +48.7931% (>+5.00000)
      Baselines:        softfloat|softfloat
      Instructions:         14732|9901      (+48.7931%) [+1.48793x]
      L1 Hits:              17494|12611     (+38.7202%) [+1.38720x]
      L2 Hits:                  0|0         (No change)
      RAM Hits:                 6|6         (No change)
      Total read+write:     17500|12617     (+38.7018%) [+1.38702x]
      Estimated Cycles:     17704|12821     (+38.0860%) [+1.38086x]
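
For reference, the bit-manipulation approach used by musl's `ceilf` can be sketched for `f32` as below. This is a non-generic illustration with a hypothetical function name, and it omits the floating-point exception raising that the C code performs; it is not the generic routine itself.

```rust
/// `f32` ceiling via direct manipulation of the bit representation.
fn ceilf_sketch(x: f32) -> f32 {
    let mut bits = x.to_bits();
    // Unbiased exponent.
    let e = ((bits >> 23) & 0xff) as i32 - 0x7f;

    if e >= 23 {
        // Already an integer, or infinite, or NaN.
        return x;
    }

    if e >= 0 {
        // Mask covering the fractional bits of the significand.
        let m = 0x007f_ffffu32 >> (e as u32);
        if bits & m == 0 {
            return x; // no fractional part
        }
        if bits >> 31 == 0 {
            // Positive: bump past the next integer before truncating.
            bits += m;
        }
        bits &= !m;
        f32::from_bits(bits)
    } else if bits >> 31 != 0 {
        // -1 < x <= -0 rounds to -0.
        -0.0
    } else if bits << 1 != 0 {
        // 0 < x < 1 rounds to 1.
        1.0
    } else {
        // +0 stays +0.
        x
    }
}
```
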
`exp` does not perform any form of unbiasing, so there isn't any reason
it should be signed. Change this.
Additionally, add `EPSILON` to the `Float` trait.
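
As an illustration of why an associated `EPSILON` constant is useful in generic code, consider the following sketch. The trait here is a heavily reduced, hypothetical stand-in for the crate's `Float` trait.

```rust
use core::ops::Sub;

/// Reduced stand-in for a float abstraction; only what the example needs.
trait Float: Copy + PartialOrd + Sub<Output = Self> {
    /// Difference between 1.0 and the next larger representable value.
    const EPSILON: Self;
    fn abs(self) -> Self;
}

impl Float for f32 {
    const EPSILON: Self = f32::EPSILON;
    fn abs(self) -> Self {
        if self < 0.0 { -self } else { self }
    }
}

impl Float for f64 {
    const EPSILON: Self = f64::EPSILON;
    fn abs(self) -> Self {
        if self < 0.0 { -self } else { self }
    }
}

/// Generic comparison that needs a per-type epsilon.
fn within_epsilon<F: Float>(a: F, b: F) -> bool {
    (a - b).abs() <= F::EPSILON
}
```
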
Musl commit 97e9b73d59 ("math: new software sqrt") adds a new algorithm
using Goldschmidt division. Port this algorithm to Rust and make it
generic, which shows a notable performance improvement over the existing
algorithm.
This also allows adding square root routines for `f16` and `f128`.
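
The core Goldschmidt iteration can be illustrated in floating point as follows. This is a simplified sketch and not the ported algorithm: the real routine is described as generic and is not reproduced here, while this version works on `f64` directly and seeds the iteration with the widely cited "fast inverse square root" bit trick, which is an assumption chosen to keep the example short. It is intended for positive, normal inputs.

```rust
/// Floating-point illustration of Goldschmidt's square root iteration.
fn goldschmidt_sqrt(s: f64) -> f64 {
    if s.is_nan() || s < 0.0 {
        return f64::NAN;
    }
    if s == 0.0 || s.is_infinite() {
        return s;
    }

    // Rough estimate of 1/sqrt(s) from the bit pattern (accurate to a few
    // percent for normal inputs). The constant is the commonly quoted
    // double-precision "magic number".
    let y0 = f64::from_bits(0x5FE6_EB50_C7B5_37A9u64 - (s.to_bits() >> 1));

    let mut x = s * y0; // converges to sqrt(s)
    let mut h = 0.5 * y0; // converges to 1 / (2 * sqrt(s))

    // Each step roughly squares the residual `r`, so a handful of
    // iterations is enough for double precision.
    for _ in 0..5 {
        let r = 0.5 - x * h;
        x += x * r;
        h += h * r;
    }
    x
}
```

Note that the loop uses only multiplications and additions, which is what makes this family of algorithms attractive for software implementations.
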
Any architecture-specific float operations are likely to consist of only
a few instructions, but the softfloat implementations are much more
complex. Ensure this is what gets tested.
`cc` automatically reads the optimization level from Cargo's `OPT_LEVEL`
variable, so we don't need to set it explicitly. Remove the explicit
setting so that running in a debugger makes more sense.