Additionally, make use of this version to implement `floor` and
`floorf`.
Similar to `ceil`, musl's `ceilf` routine seems to work better for all
float widths than the `ceil` algorithm. Trying with the `ceil` (`f64`)
algorithm produced the following regressions:
icount::icount_bench_floor_group::icount_bench_floor logspace:setup_floor()
Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
Baselines: softfloat|softfloat
Instructions: 14064|13171 (+6.78005%) [+1.06780x]
L1 Hits: 16821|15802 (+6.44855%) [+1.06449x]
L2 Hits: 0|0 (No change)
RAM Hits: 8|9 (-11.1111%) [-1.12500x]
Total read+write: 16829|15811 (+6.43856%) [+1.06439x]
Estimated Cycles: 17101|16117 (+6.10535%) [+1.06105x]
icount::icount_bench_floorf128_group::icount_bench_floorf128 logspace:setup_floorf128()
Baselines: softfloat|softfloat
Instructions: 166868|N/A (*********)
L1 Hits: 221429|N/A (*********)
L2 Hits: 1|N/A (*********)
RAM Hits: 34|N/A (*********)
Total read+write: 221464|N/A (*********)
Estimated Cycles: 222624|N/A (*********)
icount::icount_bench_floorf16_group::icount_bench_floorf16 logspace:setup_floorf16()
Baselines: softfloat|softfloat
Instructions: 143029|N/A (*********)
L1 Hits: 176517|N/A (*********)
L2 Hits: 1|N/A (*********)
RAM Hits: 13|N/A (*********)
Total read+write: 176531|N/A (*********)
Estimated Cycles: 176977|N/A (*********)
icount::icount_bench_floorf_group::icount_bench_floorf logspace:setup_floorf()
Performance has regressed: Instructions (14732 > 10441) regressed by +41.0976% (>+5.00000)
Baselines: softfloat|softfloat
Instructions: 14732|10441 (+41.0976%) [+1.41098x]
L1 Hits: 17616|13027 (+35.2268%) [+1.35227x]
L2 Hits: 0|0 (No change)
RAM Hits: 8|6 (+33.3333%) [+1.33333x]
Total read+write: 17624|13033 (+35.2260%) [+1.35226x]
Estimated Cycles: 17896|13237 (+35.1968%) [+1.35197x]
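
For reference, the bit-manipulation approach used by musl's `floorf` can be
sketched roughly as below. This is an illustrative `f32`-only version, not
the generic implementation added by this change; the function name
`floor_f32_sketch` is hypothetical, and the floating-point exception
signalling that musl forces is omitted.

```rust
/// Hypothetical `f32`-only sketch of a musl-style `floorf`.
/// Assumption: inexact-exception signalling is intentionally left out.
fn floor_f32_sketch(x: f32) -> f32 {
    let mut bits = x.to_bits();
    // Unbiased exponent (f32 has a bias of 127 and 23 mantissa bits).
    let e = ((bits >> 23) & 0xff) as i32 - 127;

    if e >= 23 {
        // Already integral, or infinity / NaN: return unchanged.
        return x;
    }

    if e >= 0 {
        // Mask covering the fractional bits of the mantissa.
        let m: u32 = 0x007f_ffff >> e;
        if bits & m == 0 {
            return x; // no fractional part
        }
        if bits >> 31 != 0 {
            // Negative: bump the magnitude so truncation rounds toward -inf.
            bits += m;
        }
        bits &= !m;
        f32::from_bits(bits)
    } else if bits >> 31 == 0 {
        // 0 <= x < 1
        0.0
    } else if bits << 1 != 0 {
        // -1 < x < 0
        -1.0
    } else {
        // x is -0.0
        x
    }
}
```

The same structure carries over to other widths by swapping the exponent
bias, mantissa width, and integer type, which is presumably what makes it a
reasonable base for a generic routine.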