347 commits

Author SHA1 Message Date
Alex Crichton
4b66abaede
Move x86-specific types to the vendor module (#293)
I believe we're reserving the `simd` module for exclusively the portable types
and their operations, so this commit moves the various x86-specific types from
the portable modules to the `x86` module. Along the way this also adds some doc
blocks for all the existing x86 types.
2018-01-19 21:20:44 -06:00
Alex Crichton
e19b6d9efd
Remove Into/From between x86 and portable types (#292)
This is primarily done to avoid falling into a portability trap by accident,
and in general keeps the vendor types (on x86) as minimal as they can be. Along
the way some tests that were still using the portable types were cleaned up.
2018-01-19 20:15:07 -06:00
Alex Crichton
54452230a7
Add an example of SIMD-powered hex encoding (#291)
This is lifted from an example I found elsewhere and shows off runtime
dispatch along with a good number of intrinsics used together.
2018-01-19 16:53:38 -06:00
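A minimal sketch of what runtime-dispatched hex encoding can look like. This is not the example added in the commit: it assumes today's stable `std::arch` API rather than the `stdsimd` crate as it stood here, and the function names (`hex_encode`, `hex_encode_ssse3`, `hex_encode_scalar`, `encode_dispatch`) are hypothetical.

```rust
/// Scalar fallback: writes two lowercase hex digits per input byte.
fn hex_encode_scalar(src: &[u8], dst: &mut [u8]) {
    const TABLE: &[u8; 16] = b"0123456789abcdef";
    for (i, &b) in src.iter().enumerate() {
        dst[2 * i] = TABLE[(b >> 4) as usize];
        dst[2 * i + 1] = TABLE[(b & 0x0f) as usize];
    }
}

/// SSSE3 path: encodes 16 input bytes per iteration, scalar code for the tail.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "ssse3")]
unsafe fn hex_encode_ssse3(src: &[u8], dst: &mut [u8]) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    assert!(dst.len() >= src.len() * 2);
    // The 16 hex digits double as a lookup table for `pshufb`.
    let table = _mm_loadu_si128(b"0123456789abcdef".as_ptr() as *const __m128i);
    let nibble_mask = _mm_set1_epi8(0x0f);
    let chunks = src.len() / 16;
    for c in 0..chunks {
        let input = _mm_loadu_si128(src.as_ptr().add(16 * c) as *const __m128i);
        // Split every byte into its low and high nibble.
        let lo = _mm_and_si128(input, nibble_mask);
        let hi = _mm_and_si128(_mm_srli_epi16::<4>(input), nibble_mask);
        // Map each nibble to its ASCII digit.
        let lo_hex = _mm_shuffle_epi8(table, lo);
        let hi_hex = _mm_shuffle_epi8(table, hi);
        // Interleave so the high digit of each byte comes first.
        let out = dst.as_mut_ptr().add(32 * c) as *mut __m128i;
        _mm_storeu_si128(out, _mm_unpacklo_epi8(hi_hex, lo_hex));
        _mm_storeu_si128(out.add(1), _mm_unpackhi_epi8(hi_hex, lo_hex));
    }
    hex_encode_scalar(&src[16 * chunks..], &mut dst[32 * chunks..]);
}

/// Picks the SSSE3 path only when the running CPU reports support for it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn encode_dispatch(src: &[u8], dst: &mut [u8]) {
    if is_x86_feature_detected!("ssse3") {
        // SAFETY: the feature check above guarantees SSSE3 is available.
        unsafe { hex_encode_ssse3(src, dst) }
    } else {
        hex_encode_scalar(src, dst)
    }
}

/// Non-x86 targets always use the scalar path.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn encode_dispatch(src: &[u8], dst: &mut [u8]) {
    hex_encode_scalar(src, dst)
}

pub fn hex_encode(src: &[u8]) -> String {
    let mut dst = vec![0u8; src.len() * 2];
    encode_dispatch(src, &mut dst);
    String::from_utf8(dst).expect("hex digits are ASCII")
}
```

Callers only see `hex_encode`; the feature check happens on every call here, which a real implementation might cache.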
Alex Crichton
faf5aea427
Reduce implicit reliance on structure of __m* types (#290)
They need to be structured *somehow* to be the right bit width but ideally we
wouldn't have the intrinsics rely on the particulars about how they're
represented.
2018-01-19 14:44:31 -06:00
Alex Crichton
330a124568
Update stdsimd-verify for vendor types (#289)
This commit provides insurance that intrinsics are only introduced with known
canonical types (`__m128i` and such) instead of also allowing `u8x16` for
example.
2018-01-19 12:11:21 -06:00
Alex Crichton
30b1145ef7
Migrate the i586::avx2 module to vendor types (#287) 2018-01-19 10:32:16 -06:00
Alex Crichton
1ad6d5fa88
Migrate the x86_64 folder to vendor types (#284) 2018-01-19 10:30:25 -06:00
messense
8deae9ce66 Update links in Cargo.toml to rust-lang-nursery/stdsimd (#288) 2018-01-18 20:23:50 -06:00
Alex Crichton
c5afde07d2
Migrate the i586::avx module to vendor types (#286)
Closes #285
2018-01-18 11:21:03 -06:00
Alex Crichton
5c8867c7c3
Update target_feature syntax (#283)
This commit updates to the latest nightly's syntax where `#[target_feature =
"+foo"]` is now deprecated in favor of `#[target_feature(enable = "foo")]`.
Additionally `#[target_feature]` can only be applied to `unsafe` functions for
now.

Along the way this removes a few examples that were just left around and also
disables the `fxsr` modules as that target feature will need to land in upstream
rust-lang/rust first as it's currently unknown to the compiler.
2018-01-17 09:45:02 -06:00
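For illustration, a small sketch of the newer `#[target_feature(enable = "...")]` syntax on an `unsafe` function, paired with a runtime check. It assumes stable `std` and the `is_x86_feature_detected!` macro rather than this crate's own detection macro at the time; the function names `sum` and `sum_avx2` are hypothetical.

```rust
/// Compiled with AVX2 enabled, so the compiler may vectorize the loop with it.
/// Marked `unsafe` because calling it on a CPU without AVX2 is undefined behavior.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Safe wrapper: only takes the AVX2 path after a runtime feature check.
fn sum(xs: &[i32]) -> i32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { sum_avx2(xs) };
        }
    }
    // Portable fallback for other CPUs and architectures.
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```

The old spelling would have been `#[target_feature = "+avx2"]`, which the commit message describes as deprecated.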
Josef Ippisch
8deead27f2 Implement addition aliases (#281)
- `_m_paddb` for `_mm_add_pi8`
- `_m_paddw` for `_mm_add_pi16`
- `_m_paddd` for `_mm_add_pi32`
- `_m_paddsb` for `_mm_adds_pi8`
- `_m_paddsw` for `_mm_adds_pi16`
- `_m_paddusb` for `_mm_adds_pu8`
- `_m_paddusw` for `_mm_adds_pu16`
2018-01-13 12:08:53 -06:00
Josef Ippisch
50cf00372d MMX subtraction instructions (#280)
* Implement `_m_psubb`

* Implement `_m_psubw`

* Implement `_m_psubd`

* Implement `_m_psubsb`

* Implement `_m_psubsw`

* Implement `_m_psubusb`

* Implement `_m_psubusw`

* Have the subtraction intrinsic naming consistent with the addition ones

E.g. use `_mm_sub_pi8` instead of `_m_psubb`

* Implement all subtraction aliases for the `_mm_*` variants

- `_m_psubb` for `_mm_sub_pi8`
- `_m_psubw` for `_mm_sub_pi16`
- `_m_psubd` for `_mm_sub_pi32`
- `_m_psubsb` for `_mm_subs_pi8`
- `_m_psubsw` for `_mm_subs_pi16`
- `_m_psubusb` for `_mm_subs_pu8`
- `_m_psubusw` for `_mm_subs_pu16`
2018-01-12 17:10:51 -06:00
Alex Crichton
e77ebf194a
Migrate the i686 module to vendor types (#279)
* Migrate `i686::sse` to vendor types

* Migrate `i686::sse2` to vendor types

* Migrate i686::sse41 to vendor types

* Migrate i686::sse42 to vendor types
2018-01-12 14:08:20 -06:00
Alex Crichton
48a7490711
Make rustc's job a little easier in sse42 (#277)
Move all the casts from `__m128i` to `i8x16` outside the macro invocations so
rustc only has to resolve a few function calls, not thousands!
2018-01-12 11:37:06 -06:00
Alex Crichton
feb8c2b152
Migrate i586::ssse3 to vendor types (#275) 2018-01-11 23:18:35 -06:00
Alex Crichton
fde52cb334
Migrate i586::sse41 to vendor types (#276) 2018-01-11 23:18:15 -06:00
Alex Crichton
3148881fa2 Move travis workaround earlier
Try to get it used on OSX as well
2018-01-11 08:24:11 -08:00
Alex Crichton
5467c0a008
Migrate i586::sse3 to vendor types (#274) 2018-01-11 10:13:26 -06:00
Alex Crichton
6d8d2f81e9
Migrate a bunch of i586::sse2 to native types (#273) 2018-01-10 12:42:26 -06:00
Alex Crichton
baf9d0e7e0
Migrate the i686::sse module to vendor types (#269)
This migrates the entire `i686::sse` module (and touches a few others) to the
vendor types.
2018-01-09 13:38:09 -06:00
Jef
248f5441bb Make splat a const fn 2018-01-09 18:38:47 +01:00
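A rough sketch of why a `const fn` splat is useful: it allows vector constants to be built at compile time. The `I32x4` type below is a hypothetical array-backed stand-in, not the crate's actual portable vector type.

```rust
/// Hypothetical stand-in for a 4-lane portable vector type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct I32x4([i32; 4]);

impl I32x4 {
    /// Broadcasts `value` to every lane; usable in constant contexts.
    const fn splat(value: i32) -> Self {
        I32x4([value; 4])
    }
}

/// Because `splat` is a `const fn`, this is evaluated at compile time.
const ONES: I32x4 = I32x4::splat(1);
```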
Alex Crichton
fd2cc3bc05
Migrate _mm_add_ss to __m128 (#265)
This commit starts the migration towards Intel's types one intrinsic at a time,
starting with `_mm_add_ss`. This is mostly just to get a feel for what the tests
will start to look like.
2018-01-09 09:49:08 -06:00
gnzlbg
58664a6f54 More run-time detection improvements (#242)
* [core/runtime] use getauxval on non-x86 platforms

* test coresimd::auxv against auxv crate

* add test files from auxv crate

* [arm] use simd_test macro

* formatting

* missing docs

* improve docs

* reading /proc/self/auxv succeeds only if reading all fields succeeds

* remove cc-crate build dependency

* getauxval succeeds only if hwcap/hwcap2 are non-zero

* fix formatting

* move getauxval to stdsimd

* delete getauxval-wrapper.c

* remove auxv crate dev-dependency from coresimd
2018-01-09 09:23:45 -06:00
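The "succeeds only if reading all fields succeeds" rule can be sketched as a parser over the auxiliary vector's key/value pairs. This is an assumption-laden illustration, not the crate's code: the `AuxVec` struct and `parse_auxv` function are hypothetical, and the input is taken as already-decoded machine words. The `AT_*` values are the Linux ones.

```rust
// Linux auxiliary vector keys.
const AT_NULL: usize = 0;
const AT_HWCAP: usize = 16;
const AT_HWCAP2: usize = 26;

/// Hypothetical container for the two capability bitfields.
#[derive(Debug, PartialEq)]
struct AuxVec {
    hwcap: usize,
    hwcap2: usize,
}

/// Parses (key, value) pairs and returns `None` unless *both* fields were found.
fn parse_auxv(words: &[usize]) -> Option<AuxVec> {
    let mut hwcap = None;
    let mut hwcap2 = None;
    for pair in words.chunks_exact(2) {
        match pair[0] {
            AT_NULL => break, // terminator entry
            AT_HWCAP => hwcap = Some(pair[1]),
            AT_HWCAP2 => hwcap2 = Some(pair[1]),
            _ => {} // unrelated entries are skipped
        }
    }
    Some(AuxVec {
        hwcap: hwcap?,
        hwcap2: hwcap2?,
    })
}
```

Returning `None` on a partial read lets the caller fall through to another detection method instead of acting on incomplete data.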
Alex Crichton
94fe929a03 Update to a released syn/quote version 2018-01-08 10:10:52 -08:00
Josef Ippisch
705c34b4eb Implement all addition MMX intrinsics (#266)
* Implement `_mm_add_pi16`

* Implement `_mm_add_pi8`

* Implement `_mm_add_pi32`

* Implement `_mm_adds_pi16`

* Implement `_mm_adds_pi8`

* Implement `_mm_adds_pu8`

* Implement `_mm_adds_pu16`
2018-01-06 12:36:05 -06:00
Jake Goulding
4667c63113 Add RDTSC and RDTSCP intrinsics (#264) 2018-01-05 13:30:26 -06:00
gnzlbg
4bb1ea5a05 Completes SSE and adds some MMX intrinsics (#247)
* Completes SSE and adds some MMX intrinsics

MMX:

- `_mm_cmpgt_pi{8,16,32}`
- `_mm_unpack{hi,lo}_pi{8,16,32}`

SSE (is now complete):

- `_mm_cvtp{i,u}{8,16}_ps`
- add test for `_m_pmulhuw`

* fmt and clippy

* add an exception for intrinsics using cvtpi2ps
2018-01-04 10:15:23 -06:00
Alex Crichton
4f1f2bd550 Add an exception for vzeroall/vzeroupper on Windows
These apparently blow the 20 instruction limit with all the loads/stores.
2018-01-03 16:02:35 -08:00
Alex Crichton
3441968ffa Turn down debug level on release mode
Apparently helps fix errors about codeview registers on MSVC!
2018-01-03 15:59:31 -08:00
Alex Crichton
edbfae36c0
Lower the instruction limit to 20 (#262)
Right now it's 30, which is a bit high; most of the intrinsics requiring all
these instructions ended up needing to be fixed anyway.
2018-01-03 17:21:01 -06:00
Alex Crichton
07ebce51b8
Assert intrinsic implementations are inlined properly (#261)
* assert_instr check for failed inlining

* Fix `call` instructions showing up in some intrinsics

The ABI of types like `u8x8` as they're defined isn't actually the underlying
type we need for LLVM, but only `__m64` currently satisfies that. Apparently
this (and the casts involved) caused some extraneous instructions for a number
of intrinsics. They've all moved over to the `__m64` type now to ensure that
they're what the underlying interface is.

* Allow PIC-relative `call` instructions on x86

These should be harmless when evaluating whether we failed inlining
2018-01-03 16:37:45 -06:00
gwenn
acc8d3de10 Use llvm builtins where possible (#260)
* Fix sse::_mm_cvtsi32_ss and sse::_mm_cvtsi64_ss

By using LLVM builtins, the expected instruction
is correctly generated on all platforms.

* Use LLVM builtins for storeu*

Just to make sure that the wrong instruction is not related to
Rust code.
2018-01-03 15:18:34 -06:00
gwenn
983b72d189 Last missing avx and avx2 intrinsics (#258)
* avx: _mm256_cvtss_f32, avx2: _mm256_cvtsd_f64, _mm256_cvtsi256_si32

* avx2: _mm256_slli_si256, _mm256_srli_si256

And aliases:
_mm256_bslli_epi128
_mm256_bsrli_epi128
2018-01-02 14:33:02 -06:00
Alex Crichton
ec373ba107 Update to syn master 2018-01-02 12:32:27 -08:00
Alex Crichton
59ed27cc95 Fix stdsimd-verify for syn master 2017-12-31 09:52:16 -08:00
Alex Crichton
3403b6f06a Fix compile with syn master 2017-12-31 09:19:44 -08:00
gwenn
802a379a4a sse2: remove duplicates and move intrinsics to x86_64 file (#256)
* sse2: remove duplicates from i686 file

_mm_cvtsi64x_si128
_mm_cvtsi64_si128
_mm_cvtsi128_si64
_mm_cvtsi128_si64x

* sse2: move _mm_cvtsi64_sd and _mm_cvtsi64x_sd to x86_64 file
2017-12-31 00:58:14 -06:00
Adam Niederer
9141a063c9 Add bswap (#257) 2017-12-31 00:57:04 -06:00
gwenn
5ca8c0aa93 sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps (#255)
* sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps

And mmx:
_mm_cmpgt_pi8
_mm_cmpgt_pi16
_mm_unpackhi_pi16
_mm_unpacklo_pi8
_mm_unpacklo_pi16

* Fix: literal out of range
2017-12-30 11:19:44 -06:00
gwenn
17edf649af Fix some assert_instr (#254)
* Fix some assert_instr

Missing assert_instr:
- _mm_cvtsi32_si128
- _mm_cvtsi128_si32
- _mm_loadl_epi64
- _mm_storel_epi64
- _mm_move_epi64
- _mm_cvtsd_f64
- _mm_setzero_pd
- _mm_load1_pd
- _mm_load_pd1
- _mm_loaddup_pd

Wrong instruction used:
- _mm_hsub_pi16

* Try to fix CI build by disabling some asserts

* Exclude some assert_instr on (x86_64, linux)
2017-12-30 11:19:00 -06:00
Alex Crichton
be461b1377
Verify Intel intrinsics against upstream definitions (#251)
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in which is
then parsed in the `stdsimd-verify` crate to verify that everything we write
down is matched against the upstream definitions.

Currently the checks are pretty loose to get this compiling but a few intrinsics
were fixed as a result of this. For example:

* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signedness of the argument and return were flipped
* `_popcnt64` - the signedness of the argument was flipped and the argument
  was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of `imm8:
  i32` which Intel was using. (we were also internally inconsistent)
* A number of intrinsics working with `__m64` were instead working with i64/u64,
  so they're now corrected to operate with the vector types instead.

Currently the verifications performed are:

* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.

The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence with Intel's own types. Otherwise types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i` right now, although this may want to
change in the future.

Finally we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the XML
and the instruction listed in `assert_instr`, and those'll need to be taken care
of in a future commit.

Closes #240
2017-12-29 11:52:27 -06:00
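The three verifications listed in this commit can be sketched roughly as follows. This is not the `stdsimd-verify` implementation: the `Signature` struct, the `sig` helper, and `verify` are hypothetical, and both sides are assumed to be already parsed into simple string-based records.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one intrinsic's definition.
#[derive(Debug, Clone, PartialEq)]
struct Signature {
    args: Vec<String>,
    ret: Option<String>,
    cpuid: Vec<String>,
}

/// Convenience constructor for the sketch.
fn sig(args: &[&str], ret: Option<&str>, cpuid: &[&str]) -> Signature {
    Signature {
        args: args.iter().map(|s| s.to_string()).collect(),
        ret: ret.map(str::to_string),
        cpuid: cpuid.iter().map(|s| s.to_string()).collect(),
    }
}

/// Returns one message per mismatch between the Rust and upstream definitions.
fn verify(
    rust: &HashMap<String, Signature>,
    intel: &HashMap<String, Signature>,
) -> Vec<String> {
    let mut errors = Vec::new();
    let mut names: Vec<&String> = rust.keys().collect();
    names.sort(); // deterministic report order
    for name in names {
        let ours = &rust[name];
        // Check 1: every Rust name must exist upstream.
        let theirs = match intel.get(name) {
            Some(found) => found,
            None => {
                errors.push(format!("{name}: not defined upstream"));
                continue;
            }
        };
        // Check 2: arguments and return value must agree.
        if ours.args != theirs.args || ours.ret != theirs.ret {
            errors.push(format!("{name}: argument or return types differ"));
        }
        // Check 3: every upstream CPUID feature must be enabled in Rust.
        for feature in &theirs.cpuid {
            if !ours.cpuid.contains(feature) {
                errors.push(format!("{name}: CPUID feature `{feature}` not enabled"));
            }
        }
    }
    errors
}
```

Exact string comparison is stricter than the loose type matching the commit describes; a closer sketch would map several Rust types onto one vendor type.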
gwenn
44a168a0b8 sse2: implements last remaining intrinsics (#244)
* sse2: __m64 related intrinsics

_mm_add_si64
_mm_mul_su32
_mm_sub_si64
_mm_cvtpi32_pd
_mm_set_epi64
_mm_set1_epi64
_mm_setr_epi64

* sse2: _mm_load_sd, _mm_loadh_pd, _mm_loadl_pd

* sse2: _mm_store_sd, _mm_storeh_pd, _mm_storel_pd

* sse2: _mm_shuffle_pd, _mm_move_sd

* sse2: _mm_cast*

_mm_castpd_ps
_mm_castpd_si128
_mm_castps_pd
_mm_castps_si128
_mm_castsi128_pd
_mm_castsi128_ps

* sse2: add some tests

* Try to fix AppVeyor build

* sse2: add more tests

* sse2: fix assert_instr for _mm_shuffle_pd

* Try to fix Travis build

* sse2: try to fix AppVeyor build

* sse2: try to fix AppVeyor build
2017-12-28 10:22:08 -06:00
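As a brief illustration of the `_mm_cast*` family: these reinterpret the bits of a vector without converting values. The sketch below assumes today's stable `std::arch::x86_64` API, and `bits_of` is a hypothetical helper with a portable fallback for other targets.

```rust
/// Returns the raw bit pattern of `x` by casting a float vector to an integer vector.
#[cfg(target_arch = "x86_64")]
fn bits_of(x: f32) -> u32 {
    use std::arch::x86_64::{_mm_castps_si128, _mm_cvtsi128_si32, _mm_set_ss};
    // SAFETY: SSE and SSE2 are part of the x86_64 baseline.
    unsafe {
        let floats = _mm_set_ss(x); // x in lane 0, zeros elsewhere
        let ints = _mm_castps_si128(floats); // same bits, integer vector type
        _mm_cvtsi128_si32(ints) as u32 // extract lane 0
    }
}

/// Portable fallback with the same result.
#[cfg(not(target_arch = "x86_64"))]
fn bits_of(x: f32) -> u32 {
    x.to_bits()
}
```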
Jonathan Goodman
3857c3e88a fix sse4a _mm_stream_{ss, sd} tests and docs 2017-12-27 22:32:49 +01:00
Alex Crichton
9aa4e30859 Update to syn master 2017-12-27 07:56:38 -08:00
gnzlbg
42ec76a3ff [sse4a] implement non-immediate-mode intrinsics (#249) 2017-12-22 10:14:41 -06:00
gnzlbg
1db6841813 [fmt] --force rustfmt-nightly 2017-12-22 00:24:23 +01:00
gnzlbg
52cc1abe2c [fmt] remove fn_call_width option (was removed upstream) 2017-12-22 00:24:23 +01:00
gnzlbg
5850282a1c use repr(align) to ensure proper alignment in tests 2017-12-22 00:24:23 +01:00
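A small sketch of the `repr(align)` technique, assuming a hypothetical `Aligned16` wrapper and `is_16_byte_aligned` helper (neither is taken from the crate's tests). Wrapping test data this way guarantees the alignment that aligned load/store intrinsics require.

```rust
/// Hypothetical wrapper that forces 16-byte alignment on its contents.
#[repr(align(16))]
struct Aligned16([f32; 4]);

/// Checks whether the referenced value sits on a 16-byte boundary.
fn is_16_byte_aligned<T>(value: &T) -> bool {
    (value as *const T as usize) % 16 == 0
}
```

A bare `[f32; 4]` only guarantees 4-byte alignment, so whether it happens to land on a 16-byte boundary depends on where the compiler places it.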
gnzlbg
4fb9420acb
Fix rustfmt (#239)
* [fmt] manually fix some formatting
* [fmt] reformat with rustfmt-nightly
* [clippy] fix clippy issues
2017-12-14 19:57:53 +01:00
gnzlbg
5ce0c13009 [ci] powerpc/powerpc64/powerpc64le (#237)
* [ci] add powerpc/powerpc64 build bots

* unbreak stdsimd builds for targets without run-time
2017-12-14 10:44:20 -06:00