rust/library
Stuart Cook 2e4e196a5b
Rollup merge of #136457 - calder:master, r=tgross35
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is roughly 8x slower than C++ on modern x86-64 CPUs (84us versus 10us in the measurements below). The root cause is that stable Rust has no way to let the compiler reorder floating-point operations for better vectorization.
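
As a small illustration of why the compiler may not reorder these operations on its own: floating-point addition is not associative, so regrouping a sum can change the result. The sketch below uses two hypothetical helpers, `sum_left` and `sum_right`, that are not part of this change.

```rust
/// Adds three values grouped from the left: (a + b) + c.
/// Hypothetical helper for illustration only.
fn sum_left(a: f32, b: f32, c: f32) -> f32 {
    (a + b) + c
}

/// Adds the same three values grouped from the right: a + (b + c).
/// Hypothetical helper for illustration only.
fn sum_right(a: f32, b: f32, c: f32) -> f32 {
    a + (b + c)
}
```

With `a = 1.0e8`, `b = -1.0e8`, and `c = 1.0`, `sum_left` returns `1.0` while `sum_right` returns `0.0`, because `-1.0e8 + 1.0` rounds back to `-1.0e8` in `f32`. A vectorized reduction regroups the sum in a similar way, which is why the compiler needs explicit permission to do it.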

See https://github.com/calder/dot-bench for benchmarks. The measurements below were taken on an i7-10875H.

### C++: 10us 

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us 

With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
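// Nightly only: goes at the crate root to enable the unstable intrinsics.
#![feature(core_intrinsics)]
use std::intrinsics::{fadd_algebraic, fmul_algebraic};

fn dot(a: &[f32], b: &[f32]) -> f32 {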
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us 

With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>
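
For comparison, one common way to get a similar effect on stable Rust is to write the reassociation out by hand with several independent partial sums. The sketch below is an assumption-laden illustration, not code from this change or from the benchmark repository: `dot_lanes` and `LANES` are hypothetical names, and the version has not been benchmarked here.

```rust
/// Number of independent partial sums. Eight `f32` values fill one
/// 256-bit AVX register; this value is an assumption, not a tuned choice.
const LANES: usize = 8;

/// Dot product with the reassociation spelled out manually.
/// Hypothetical helper for illustration only.
fn dot_lanes(a: &[f32], b: &[f32]) -> f32 {
    // Only the common prefix of the two slices is used.
    let len = a.len().min(b.len());
    let (a, b) = (&a[..len], &b[..len]);

    // Each lane accumulates every LANES-th product independently,
    // which removes the serial dependency on a single `sum`.
    let mut acc = [0.0f32; LANES];
    let mut a_chunks = a.chunks_exact(LANES);
    let mut b_chunks = b.chunks_exact(LANES);
    for (ca, cb) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        for lane in 0..LANES {
            acc[lane] += ca[lane] * cb[lane];
        }
    }

    // Combine the partial sums, then add the leftover tail elements.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_chunks.remainder().iter().zip(b_chunks.remainder()) {
        sum += x * y;
    }
    sum
}
```

This fixes the summation order in the source rather than leaving it to the compiler, so the result can differ in rounding from the simple loop above, and every reduction has to be restructured by hand. The algebraic operations avoid that manual work.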

# Proposed Change

Add wrappers around the `core::intrinsics::f*_algebraic` intrinsics to `f16`, `f32`, `f64`, and `f128`, gated on a new `float_algebraic` feature.

# Alternatives Considered

https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of the various options for supporting fast math in Rust, but it is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the meantime, processors have evolved, and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps

try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
2025-04-05 13:18:12 +11:00
alloc Fix testing with randomized layouts enabled 2025-04-03 15:30:01 +00:00
alloctests Add a test for Weak created from UniqueArc::downgrade 2025-03-29 12:13:38 +08:00
backtrace@9d2c34e7e6 Update backtrace 2025-02-13 14:32:50 -08:00
core Rollup merge of #136457 - calder:master, r=tgross35 2025-04-05 13:18:12 +11:00
coretests Auto merge of #139213 - bjorn3:cg_clif_test_coretests, r=jieyouxu 2025-04-04 11:59:59 +00:00
panic_abort Migrate panic_abort to Rust 2024 2025-03-11 09:46:34 -07:00
panic_unwind Mark imports of #[rustc_std_internal_symbol] items with this attribute 2025-03-17 14:06:56 +00:00
portable-simd Merge commit 'c14f2fc3eb' into sync-from-portable-simd-2025-03-19 2025-03-19 00:58:47 -04:00
proc_macro allow wasm_c_abi in proc_macro bridge 2025-03-25 08:22:35 +01:00
profiler_builtins Migrate profiler_builtins to Rust 2024 2025-03-11 09:46:35 -07:00
rtstartup Revert changes for rtstartup 2025-03-10 21:23:31 +08:00
rustc-std-workspace-alloc Migrated the rustc-std-workspace crates to Rust 2024 2025-03-11 09:46:35 -07:00
rustc-std-workspace-core Migrated the rustc-std-workspace crates to Rust 2024 2025-03-11 09:46:35 -07:00
rustc-std-workspace-std Migrated the rustc-std-workspace crates to Rust 2024 2025-03-11 09:46:35 -07:00
std Rollup merge of #136457 - calder:master, r=tgross35 2025-04-05 13:18:12 +11:00
stdarch@9426bb5658 Update stdarch 2025-03-06 11:11:55 -08:00
sysroot Migrate the sysroot crate to Rust 2024 2025-03-11 09:46:35 -07:00
test Migrate test to Rust 2024 2025-03-11 09:46:34 -07:00
unwind Rollup merge of #137621 - Berrysoft:cygwin-std, r=joboet 2025-03-17 05:47:49 -04:00
windows_targets Migrate windows-targets to Rust 2024 2025-03-11 09:46:35 -07:00
Cargo.lock compiler and tools dependencies 2025-04-01 20:48:17 +00:00
Cargo.toml Add opt-level = "s" for more std symbolication crates 2025-04-01 20:50:19 +00:00