Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
%v0 = insertelement <N x elem> poison, elem %x, i32 0
%splat = shufflevector <N x elem> %v0, <N x elem> poison, <N x i32> zeroinitializer
```
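As a rough illustration of what that two-instruction sequence computes, here is a plain-Rust model. It is not how rustc or LLVM implement anything; the name `splat_via_shuffle` and the use of `Option` to stand in for `poison` lanes are assumptions made purely for this sketch.

```rust
/// Hypothetical model of the canonical splat sequence on a fixed-size array.
/// `None` stands in for a `poison` lane. Requires `N > 0`.
fn splat_via_shuffle<const N: usize>(x: u32) -> [u32; N] {
    // Step 1 (`insertelement`): only lane 0 is defined, all other lanes are poison.
    let mut v0: [Option<u32>; N] = [None; N];
    v0[0] = Some(x);

    // Step 2 (`shufflevector` with a `zeroinitializer` mask):
    // every output lane reads lane 0 of the first operand.
    let mask = [0usize; N];
    mask.map(|i| v0[i].expect("lane 0 is always defined"))
}
```

Because the mask only ever selects lane 0, the poison lanes are never read, which is why the sequence is well-defined.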
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with SIMD vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this only adds the intrinsic; it is not used anywhere yet.
commit 04db0fb5da
1 changed file with 25 additions and 0 deletions
@@ -348,6 +348,31 @@ pub(super) fn codegen_simd_intrinsic_call<'tcx>(
             ret.write_cvalue(fx, ret_lane);
         }
 
+        sym::simd_splat => {
+            intrinsic_args!(fx, args => (value); intrinsic);
+
+            if !ret.layout().ty.is_simd() {
+                report_simd_type_validation_error(fx, intrinsic, span, ret.layout().ty);
+                return;
+            }
+            let (lane_count, lane_ty) = ret.layout().ty.simd_size_and_type(fx.tcx);
+
+            if value.layout().ty != lane_ty {
+                fx.tcx.dcx().span_fatal(
+                    span,
+                    format!(
+                        "[simd_splat] expected element type {lane_ty:?}, got {got:?}",
+                        got = value.layout().ty
+                    ),
+                );
+            }
+
+            for i in 0..lane_count {
+                let ret_lane = ret.place_lane(fx, i.into());
+                ret_lane.write_cvalue(fx, value);
+            }
+        }
+
         sym::simd_neg
         | sym::simd_bswap
         | sym::simd_bitreverse
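The lowering in this hunk is deliberately simple: after validating the types, it writes the scalar into every lane of the return place. A minimal stand-alone sketch of that loop, assuming a mutable slice as a stand-in for the SIMD return place (the name `splat_lanes` is hypothetical and not part of the compiler), might look like this:

```rust
/// Hypothetical model of the per-lane write loop: `ret` plays the role of
/// the SIMD return place and `value` is the scalar being broadcast.
fn splat_lanes<T: Copy>(ret: &mut [T], value: T) {
    // Mirrors `for i in 0..lane_count { ret.place_lane(..).write_cvalue(.., value) }`.
    for lane in ret.iter_mut() {
        *lane = value;
    }
}
```

In this model the element-type check from the hunk is enforced by the generic parameter `T` at compile time, whereas the backend has to compare `value.layout().ty` against `lane_ty` itself and report a fatal error on mismatch.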