Some intrinsics take `i64` or `u64` arguments which typically means that they're
using 64-bit registers and aren't actually available on x86. This commit adds a
check to stdsimd-verify to assert this and moves around some intrinsics that I
believe should only be available on x86_64.
This commit was checked in many places against gcc/clang/MSVC using godbolt.org
to ensure that we're agreeing with what other compilers are doing.
Closes#304
This was historically done as the contents of the `i686` module wouldn't
actually compile on i586 for various reasons. I believe I've tracked this down
to #300 where LLVM refuses to compile a function using the `x86_mmx` type
without actually enabling the `mmx` feature (sort of reasonably so!). This
commit will now compile in both the `i586` and `i686` modules of this crate into
the `i586-unknown-linux-gnu` target, and the relevant functions now also enable
the `mmx` feature if they're using the `__m64` type.
I believe this is uncovering a more widespread problem where the `__m64` isn't
usable outside the context of `mmx`-enabled functions. The i686 and x86_64
targets have this feature enabled by default which is why it's worked there, but
they're not enabled for the i586 target. We'll probably want to consider this
when stabilizing!
In rust-lang/rust#47743 the SIMD types in the Rust ABI are being switched to
getting passed through memory unconditionally rather than through LLVM's
immediate registers. This means that a bunch of our assert_instr invocations
started breaking as LLVM has more efficient methods of dealing with memory than
the instructions themselves.
This switches `assert_instr` to unconditionally use a shim that is an `extern`
function which should revert back to the previous behavior, using the simd types
as immediates and testing the same.
* Remove `PartialEq for __m64`
This helps to strip the public API of the vendor type for now, although this may
come back at a later date!
* Remove `PartialEq for __m128i`
Like the previous commit, but for another type!
* Remove `PartialEq for __m256i`
Same as previous commit!
I believe we're reserving the `simd` module for exclusively the portable types
and their operations, so this commit moves the various x86-specific types from
the portable modules to the `x86` module. Along the way this also adds some doc
blocks for all the existing x86 types.
This is primarily doing to avoid falling into a portability trap by accident,
and in general makes the vendor types (on x86) going towards as minimal as they
can be. Along the way some tests were cleaned up which were still using the
portable types.
They need to be structured *somehow* to be the right bit width but ideally we
wouldn't have the intrinsics rely on the particulars about how they're
represented.
This commit provides insurance that intrinsics are only introduced with known
canonical types (`__m128i` and such) instead of also allowing `u8x16` for
example.
This commit updates to the latest nightly's syntax where `#[target_feature =
"+foo"]` is now deprecated in favor of `#[target_feature(enable = "foo")]`.
Additionally `#[target_feature]` can only be applied to `unsafe` functions for
now.
Along the way this removes a few exampels that were just left around and also
disables the `fxsr` modules as that target feature will need to land in upstream
rust-lang/rust first as it's currently unknown to the compiler.
- `_m_paddb` for `_mm_add_pi8`
- `_m_paddw` for `_mm_add_pi16`
- `_m_paddd` for `_mm_add_pi32`
- `_m_paddsb` for `_mm_adds_pi8`
- `_m_paddsw` for `_mm_adds_pi16`
- `_m_paddusb` for `_mm_adds_pu8`
- `_m_paddusw` for `_mm_adds_pu16`
* Implement `_m_psubb`
* Implement `_m_psubw`
* Implement `_m_psubd`
* Implement `_m_psubsb`
* Implement `_m_psubsw`
* Implement `_m_psubusb`
* Implement `_m_psubusw`
* Have the subtraction intrinsic naming consistent with the addition ones
E.g. use `_mm_sub_pi8` instead of `_m_psubb`
* Implement all subtraction aliases for the `_mm_*` variants
- `_m_psubb` for `_mm_sub_pi8`
- `_m_psubw` for `_mm_sub_pi16`
- `_m_psubd` for `_mm_sub_pi32`
- `_m_psubsb` for `_mm_subs_pi8`
- `_m_psubsw` for `_mm_subs_pi16`
- `_m_psubusb` for `_mm_subs_pu8`
- `_m_psubusw` for `_mm_subs_pu16`
This commit starts the migration towards Intel's types one intrinsic at a time,
starting with `_mm_add_ss`. This is mostly just to get a feel for what the tests
will start to look like.
* [core/runtime] use getauxval on non-x86 platforms
* test coresimd::auxv against auxv crate
* add test files from auxv crate
* [arm] use simd_test macro
* formatting
* missing docs
* improve docs
* reading /proc/self/auxv succeeds only if reading all fields succeeds
* remove cc-crate build dependency
* getauxval succeeds only if hwcap/hwcap2 are non-zero
* fix formatting
* move getauxval to stdsimd
* delete getauxval-wrapper.c
* remove auxv crate dev-dependency from coresimd
* Completes SSE and adds some MMX intrinsics
MMX:
- `_mm_cmpgt_pi{8,16,32}`
- `_mm_unpack{hi,lo}_pi{8,16,32}`
SSE (is now complete):
- `_mm_cvtp{i,u}{8,16}_ps`
- add test for `_m_pmulhuw`
* fmt and clippy
* add an exception for intrinsics using cvtpi2ps
* assert_instr check for failed inlining
* Fix `call` instructions showing up in some intrinsics
The ABI of types like `u8x8` as they're defined isn't actually the underlying
type we need for LLVM, but only `__m64` currently satisfies that. Apparently
this (and the casts involved) caused some extraneous instructions for a number
of intrinsics. They've all moved over to the `__m64` type now to ensure that
they're what the underlying interface is.
* Allow PIC-relative `call` instructions on x86
These should be harmless when evaluating whether we failed inlining
* Fix sse::_mm_cvtsi32_ss and sse::_mm_cvtsi64_ss
By using LLVM builtins, the expected instruction
is correctly generated on all platforms.
* Use LLVM builtins for storeu*
Just to make sure that the wrong instructions is not related to
Rust code.
* sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps
And mmx:
_mm_cmpgt_pi8
_mm_cmpgt_pi16
_mm_unpackhi_pi16
_mm_unpacklo_pi8
_mm_unpacklo_pi16
* Fix: literal out of range
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in which is
then parsed in the `stdsimd-verify` crate to verify that everything we write
down is matched against the upstream definitions.
Currently the checks are pretty loose to get this compiling but a few intrinsics
were fixed as a result of this. For example:
* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signededness of the argument and return were flipped
* `_popcnt64` - the signededness of the argument was flipped and the argument
was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of `imm8:
i32` which Intel was using. (we were also internally inconsistent)
* A number of intrinsics working with `__m64` were instead working with i64/u64,
so they're now corrected to operate with the vector types instead.
Currently the verifications performed are:
* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.
The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence with Intel's own types. Otherwise types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i` right now, althoguh this may want to
change in the future.
Finally we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the XML
and the instruction listed in `assert_instr`, and those'll need to be taken care
of in a future commit.
Closes#240