rust/tests
Stuart Cook 2e4e196a5b
Rollup merge of #136457 - calder:master, r=tgross35
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.
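
To make "reordering" concrete, here is a minimal sketch in stable Rust of the transformation the compiler is not allowed to do on its own. The function names and the choice of 8 accumulators are illustrative assumptions, not taken from the benchmark repository. The sequential version forms a single dependency chain of additions. The reassociated version keeps independent partial sums that map naturally onto SIMD lanes, but it may round differently on general inputs, which is why the compiler needs explicit permission.

```rust
/// Strict left-to-right sum: every addition depends on the previous one.
fn dot_sequential(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        sum += x * y;
    }
    sum
}

/// Hand-reassociated sum using 8 independent accumulators (illustrative lane count).
fn dot_reassociated(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    let len = a.len().min(b.len());
    let (a, b) = (&a[..len], &b[..len]);

    let mut acc = [0.0f32; LANES];
    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Grab the leftover elements before the chunk iterators are consumed.
    let (a_tail, b_tail) = (a_chunks.remainder(), b_chunks.remainder());

    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            // Each accumulator only depends on its own previous value.
            acc[i] += ca[i] * cb[i];
        }
    }

    // Combine the partial sums, then fold in the tail.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail) {
        sum += x * y;
    }
    sum
}
```

Both functions agree exactly on inputs whose partial sums are exactly representable, but can differ in the last bits otherwise.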

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on an i7-10875H.

### C++: 10us 

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us 

With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
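// Requires nightly and #![feature(core_intrinsics)] at the crate root.
use core::intrinsics::{fadd_algebraic, fmul_algebraic};

fn dot(a: &[f32], b: &[f32]) -> f32 {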
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us 

With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
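
As a rough sketch of what call sites could look like with per-operation wrappers, the example below uses a hypothetical stand-in trait so that it compiles on stable Rust. The trait `AlgebraicSketch` and the method names `alg_add` and `alg_mul` are assumptions made for illustration only. They fall back to ordinary arithmetic here, whereas the real wrappers would sit behind the `float_algebraic` feature and forward to the `core::intrinsics::f*_algebraic` intrinsics.

```rust
/// Hypothetical stand-in for the proposed wrappers (not the real API).
trait AlgebraicSketch: Sized {
    fn alg_add(self, rhs: Self) -> Self;
    fn alg_mul(self, rhs: Self) -> Self;
}

impl AlgebraicSketch for f32 {
    // On stable these are plain operations; the real wrappers would
    // additionally allow the optimizer to reassociate them.
    fn alg_add(self, rhs: f32) -> f32 {
        self + rhs
    }
    fn alg_mul(self, rhs: f32) -> f32 {
        self * rhs
    }
}

/// Dot product that opts in to relaxed semantics one operation at a time.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .fold(0.0f32, |sum, (&x, &y)| sum.alg_add(x.alg_mul(y)))
}
```

The point of this shape is that the opt-in stays local to each operation, so surrounding code keeps strict floating point semantics.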

# Alternatives Considered

https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the meantime, processors have evolved and we're leaving major performance on the table by not letting the compiler vectorize floating point reductions. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
2025-04-05 13:18:12 +11:00
..
assembly Auto merge of #138503 - bjorn3:string_merging, r=tmiasko 2025-03-28 10:18:32 +00:00
auxiliary tests: use minicore more 2025-02-24 09:26:54 +00:00
codegen Rollup merge of #136457 - calder:master, r=tgross35 2025-04-05 13:18:12 +11:00
codegen-units Remove -Zinline-in-all-cgus and clean up CGU partitioning tests 2025-01-27 23:48:47 -05:00
coverage Bless tests 2025-04-02 19:59:26 +08:00
coverage-run-rustdoc Update coverage-run-rustdoc output 2025-03-28 10:35:53 +01:00
crashes add TypingMode::Borrowck 2025-04-03 11:13:10 +02:00
debuginfo Rollup merge of #137967 - mustartt:fix-aix-test-hangs, r=workingjubilee 2025-03-11 13:30:50 +01:00
incremental Rollup merge of #139153 - compiler-errors:incr-comp-closure, r=oli-obk 2025-03-31 14:36:22 +02:00
mir-opt Auto merge of #132527 - DianQK:gvn-stmt-iter, r=oli-obk 2025-04-03 19:17:33 +00:00
pretty Auto merge of #138492 - lcnr:rm-inline_const_pat, r=oli-obk 2025-04-01 14:20:46 +00:00
run-make Move link-self-contained-consistency test to a more reasonable location 2025-04-03 15:41:38 +02:00
rustdoc Correctly handle line comments in attributes and generate extern crates 2025-03-27 11:18:43 +01:00
rustdoc-gui Rollup merge of #137539 - GuillaumeGomez:copy-content-tests, r=notriddle 2025-02-25 13:07:34 +01:00
rustdoc-js
rustdoc-js-std Remove the common prelude module 2025-02-11 13:04:27 -08:00
rustdoc-json rustdoc-json: Add test for #[automatically_derived] attribute 2025-03-31 20:42:49 +00:00
rustdoc-ui Rollup merge of #139328 - GuillaumeGomez:fix-panic-output-137970, r=fmease 2025-04-04 21:54:57 +02:00
ui Auto merge of #139390 - matthiaskrgr:rollup-l64euwx, r=matthiaskrgr 2025-04-04 23:03:57 +00:00
ui-fulldeps compiletest: Require //~ annotations even if error-pattern is specified 2025-04-03 11:08:55 +03:00
COMPILER_TESTS.md