user0/rust - Forgejo: Beyond coding. We Forge.

user0/rust

Author	SHA1	Message	Date
Alex Crichton	a2403de290	Don't count `nop` instructions after functions Looks like disassemblers will fill this in and/or LLVM inserts them for alignment, not useful to us in calculations.	2018-01-28 20:35:37 -08:00
Alex Crichton	263937a499	Fix a lint warning on aarch64	2018-01-28 20:32:11 -08:00
Alex Crichton	11f7b7e38e	Verify intrinsics don't leak to i686 (#305 ) Some intrinsics take `i64` or `u64` arguments which typically means that they're using 64-bit registers and aren't actually available on x86. This commit adds a check to stdsimd-verify to assert this and moves around some intrinsics that I believe should only be available on x86_64. This commit was checked in many places against gcc/clang/MSVC using godbolt.org to ensure that we're agreeing with what other compilers are doing. Closes #304	2018-01-28 22:31:31 -06:00
Alex Crichton	aefb22c51e	Don't distinguish between i586/i686 (#301 ) This was historically done as the contents of the `i686` module wouldn't actually compile on i586 for various reasons. I believe I've tracked this down to #300 where LLVM refuses to compile a function using the `x86_mmx` type without actually enabling the `mmx` feature (sort of reasonably so!). This commit will now compile in both the `i586` and `i686` modules of this crate into the `i586-unknown-linux-gnu` target, and the relevant functions now also enable the `mmx` feature if they're using the `__m64` type. I believe this is uncovering a more widespread problem where the `__m64` isn't usable outside the context of `mmx`-enabled functions. The i686 and x86_64 targets have this feature enabled by default which is why it's worked there, but they're not enabled for the i586 target. We'll probably want to consider this when stabilizing!	2018-01-28 22:31:22 -06:00
Alex Crichton	e0aed0fffc	Fix tests for a future nightly (#297 ) In rust-lang/rust#47743 the SIMD types in the Rust ABI are being switched to getting passed through memory unconditionally rather than through LLVM's immediate registers. This means that a bunch of our assert_instr invocations started breaking as LLVM has more efficient methods of dealing with memory than the instructions themselves. This switches `assert_instr` to unconditionally use a shim that is an `extern` function which should revert back to the previous behavior, using the simd types as immediates and testing the same.	2018-01-25 12:13:48 -06:00
Alex Crichton	73794e5035	Update stdsimd-verify to print out instruction differences Too many to deal with for now...	2018-01-20 10:13:26 -08:00
Alex Crichton	30694efc68	Remove PartialEq impls for x86 types (#294 ) * Remove `PartialEq for __m64` This helps to strip the public API of the vendor type for now, although this may come back at a later date! * Remove `PartialEq for __m128i` Like the previous commit, but for another type! * Remove `PartialEq for __m256i` Same as previous commit!	2018-01-20 11:24:09 -06:00
Alex Crichton	4b66abaede	Move x86-specific types to the `vendor` module (#293 ) I believe we're reserving the `simd` module for exclusively the portable types and their operations, so this commit moves the various x86-specific types from the portable modules to the `x86` module. Along the way this also adds some doc blocks for all the existing x86 types.	2018-01-19 21:20:44 -06:00
Alex Crichton	e19b6d9efd	Remove Into/From between x86 and portable types (#292 ) This is primarily doing to avoid falling into a portability trap by accident, and in general makes the vendor types (on x86) going towards as minimal as they can be. Along the way some tests were cleaned up which were still using the portable types.	2018-01-19 20:15:07 -06:00
Alex Crichton	54452230a7	Add an example of SIMD-powered hex encoding (#291 ) This is lifted from an example elsewhere I found and shows off runtime dispatching along with a lot of intrinsics being used in a bunch.	2018-01-19 16:53:38 -06:00
Alex Crichton	faf5aea427	Reduce implicit reliance on structure of __m* types (#290 ) They need to be structured somehow to be the right bit width but ideally we wouldn't have the intrinsics rely on the particulars about how they're represented.	2018-01-19 14:44:31 -06:00
Alex Crichton	330a124568	Update stdsimd-verify for vendor types (#289 ) This commit provides insurance that intrinsics are only introduced with known canonical types (`__m128i` and such) instead of also allowing `u8x16` for example.	2018-01-19 12:11:21 -06:00
Alex Crichton	30b1145ef7	Migrate the `i586::avx2` module to vendor types (#287 )	2018-01-19 10:32:16 -06:00
Alex Crichton	1ad6d5fa88	Migrate the x86_64 folder to vendor types (#284 )	2018-01-19 10:30:25 -06:00
messense	8deae9ce66	Update links in Cargo.toml to rust-lang-nursery/stdsimd (#288 )	2018-01-18 20:23:50 -06:00
Alex Crichton	c5afde07d2	Migrate the `i586::avx` module to vendor types (#286 ) Closes #285	2018-01-18 11:21:03 -06:00
Alex Crichton	5c8867c7c3	Update `target_feature` syntax (#283 ) This commit updates to the latest nightly's syntax where `#[target_feature = "+foo"]` is now deprecated in favor of `#[target_feature(enable = "foo")]`. Additionally `#[target_feature]` can only be applied to `unsafe` functions for now. Along the way this removes a few exampels that were just left around and also disables the `fxsr` modules as that target feature will need to land in upstream rust-lang/rust first as it's currently unknown to the compiler.	2018-01-17 09:45:02 -06:00
Josef Ippisch	8deead27f2	Implement addition aliases (#281 ) - `_m_paddb` for `_mm_add_pi8` - `_m_paddw` for `_mm_add_pi16` - `_m_paddd` for `_mm_add_pi32` - `_m_paddsb` for `_mm_adds_pi8` - `_m_paddsw` for `_mm_adds_pi16` - `_m_paddusb` for `_mm_adds_pu8` - `_m_paddusw` for `_mm_adds_pu16`	2018-01-13 12:08:53 -06:00
Josef Ippisch	50cf00372d	MMX subtraction instructions (#280 ) * Implement `_m_psubb` * Implement `_m_psubw` * Implement `_m_psubd` * Implement `_m_psubsb` * Implement `_m_psubsw` * Implement `_m_psubusb` * Implement `_m_psubusw` * Have the subtraction intrinsic naming consistent with the addition ones E.g. use `_mm_sub_pi8` instead of `_m_psubb` * Implement all subtraction aliases for the `_mm_*` variants - `_m_psubb` for `_mm_sub_pi8` - `_m_psubw` for `_mm_sub_pi16` - `_m_psubd` for `_mm_sub_pi32` - `_m_psubsb` for `_mm_subs_pi8` - `_m_psubsw` for `_mm_subs_pi16` - `_m_psubusb` for `_mm_subs_pu8` - `_m_psubusw` for `_mm_subs_pu16`	2018-01-12 17:10:51 -06:00
Alex Crichton	e77ebf194a	Migrate the i686 module to vendor types (#279 ) * Migrate `i686::sse` to vendor types * Migrate `i686::sse2` to vendor types * Migrate i686::sse41 to vendor types * Migrate i686::sse42 to vendor types	2018-01-12 14:08:20 -06:00
Alex Crichton	48a7490711	Make rustc's job a little esaier in sse42 (#277 ) Move all the casts from `__m128i` to `i8x16` outside the macro invocations so rustc only has to resolve a few function calls, not thousands!	2018-01-12 11:37:06 -06:00
Alex Crichton	feb8c2b152	Migrate i586::ssse3 to vendor types (#275 )	2018-01-11 23:18:35 -06:00
Alex Crichton	fde52cb334	Migrate i586::sse41 to vendor types (#276 )	2018-01-11 23:18:15 -06:00
Alex Crichton	3148881fa2	Move travis workaround earlier Try to get it used on OSX as well	2018-01-11 08:24:11 -08:00
Alex Crichton	5467c0a008	Migrate i586::sse3 to vendor types (#274 )	2018-01-11 10:13:26 -06:00
Alex Crichton	6d8d2f81e9	Migrate a bunch of i586::sse2 to native types (#273 )	2018-01-10 12:42:26 -06:00
Alex Crichton	baf9d0e7e0	Migrate the `i686::sse` module to vendor types (#269 ) This migrates the entire `i686::sse` module (and touches a few others) to the vendor types.	2018-01-09 13:38:09 -06:00
Jef	248f5441bb	Make `splat` a const fn	2018-01-09 18:38:47 +01:00
Alex Crichton	fd2cc3bc05	Migrate `_mm_add_ss` to `__m128` (#265 ) This commit starts the migration towards Intel's types one intrinsic at a time, starting with `_mm_add_ss`. This is mostly just to get a feel for what the tests will start to look like.	2018-01-09 09:49:08 -06:00
gnzlbg	58664a6f54	More run-time detection improvements (#242 ) * [core/runtime] use getauxval on non-x86 platforms * test coresimd::auxv against auxv crate * add test files from auxv crate * [arm] use simd_test macro * formatting * missing docs * improve docs * reading /proc/self/auxv succeeds only if reading all fields succeeds * remove cc-crate build dependency * getauxval succeeds only if hwcap/hwcap2 are non-zero * fix formatting * move getauxval to stdsimd * delete getauxval-wrapper.c * remove auxv crate dev-dependency from coresimd	2018-01-09 09:23:45 -06:00
Alex Crichton	94fe929a03	Update to a released syn/quote version	2018-01-08 10:10:52 -08:00
Josef Ippisch	705c34b4eb	Implement all addition MMX intrinsics (#266 ) * Implement `_mm_add_pi16` * Implement `_mm_add_pi8` * Implement `_mm_add_pi32` * Implement `_mm_adds_pi16` * Implement `_mm_adds_pi8` * Implement `_mm_adds_pu8` * Implement `_mm_adds_pu16`	2018-01-06 12:36:05 -06:00
Jake Goulding	4667c63113	Add RDTSC and RDTSCP intrinsics (#264 )	2018-01-05 13:30:26 -06:00
gnzlbg	4bb1ea5a05	Completes SSE and adds some MMX intrinsics (#247 ) * Completes SSE and adds some MMX intrinsics MMX: - `_mm_cmpgt_pi{8,16,32}` - `_mm_unpack{hi,lo}_pi{8,16,32}` SSE (is now complete): - `_mm_cvtp{i,u}{8,16}_ps` - add test for `_m_pmulhuw` * fmt and clippy * add an exception for intrinsics using cvtpi2ps	2018-01-04 10:15:23 -06:00
Alex Crichton	4f1f2bd550	Add an exception for vzeroall/vzeroupper on Windows These apparently blow the 20 intstruction limit with all the loads/stores.	2018-01-03 16:02:35 -08:00
Alex Crichton	3441968ffa	Turn down debug level on release mode Apparently helps fix errors about codeview registers on MSVC!	2018-01-03 15:59:31 -08:00
Alex Crichton	edbfae36c0	Lower the instruction limit to 20 (#262 ) Right now it's 30 which is a bit high, most of the intrinsics requiring all these instructions ended up needing to be fixed anyway.	2018-01-03 17:21:01 -06:00
Alex Crichton	07ebce51b8	Assert intrinsic implementations are inlined properly (#261 ) * assert_instr check for failed inlining * Fix `call` instructions showing up in some intrinsics The ABI of types like `u8x8` as they're defined isn't actually the underlying type we need for LLVM, but only `__m64` currently satisfies that. Apparently this (and the casts involved) caused some extraneous instructions for a number of intrinsics. They've all moved over to the `__m64` type now to ensure that they're what the underlying interface is. * Allow PIC-relative `call` instructions on x86 These should be harmless when evaluating whether we failed inlining	2018-01-03 16:37:45 -06:00
gwenn	acc8d3de10	Use llvm builtins where possible (#260 ) * Fix sse::_mm_cvtsi32_ss and sse::_mm_cvtsi64_ss By using LLVM builtins, the expected instruction is correctly generated on all platforms. * Use LLVM builtins for storeu* Just to make sure that the wrong instructions is not related to Rust code.	2018-01-03 15:18:34 -06:00
gwenn	983b72d189	Last missing avx and avx2 intrinsics (#258 ) * avx: _mm256_cvtss_f32, avx2: _mm256_cvtsd_f64, _mm256_cvtsi256_si32 * avx2: _mm256_slli_si256, _mm256_srli_si256 And aliases: _mm256_bslli_epi128 _mm256_bsrli_epi128	2018-01-02 14:33:02 -06:00
Alex Crichton	ec373ba107	Update to `syn` master	2018-01-02 12:32:27 -08:00
Alex Crichton	59ed27cc95	Fix stdsimd-verify for syn master	2017-12-31 09:52:16 -08:00
Alex Crichton	3403b6f06a	Fix compile with `syn` master	2017-12-31 09:19:44 -08:00
gwenn	802a379a4a	sse2: remove duplicates and move intrinsics to x86_64 file (#256 ) * sse2: remove duplicates from i686 file _mm_cvtsi64x_si128 _mm_cvtsi64_si128 _mm_cvtsi128_si64 _mm_cvtsi128_si64x * sse2: move _mm_cvtsi64_sd and _mm_cvtsi64x_sd to x86_64 file	2017-12-31 00:58:14 -06:00
Adam Niederer	9141a063c9	Add bswap (#257 )	2017-12-31 00:57:04 -06:00
gwenn	5ca8c0aa93	sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps (#255 ) * sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps And mmx: _mm_cmpgt_pi8 _mm_cmpgt_pi16 _mm_unpackhi_pi16 _mm_unpacklo_pi8 _mm_unpacklo_pi16 * Fix: literal out of range	2017-12-30 11:19:44 -06:00
gwenn	17edf649af	Fix some assert_instr (#254 ) * Fix some assert_instr Missing assert_instr: - _mm_cvtsi32_si128 - _mm_cvtsi128_si32 - _mm_loadl_epi64 - _mm_storel_epi64 - _mm_move_epi64 - _mm_cvtsd_f64 - _mm_setzero_pd - _mm_load1_pd - _mm_load_pd1 - _mm_loaddup_pd Wrong intrusction used: - _mm_hsub_pi16 * Try to fix CI build by disabling some asserts * Exclude some assert_instr on (x86_64, linux)	2017-12-30 11:19:00 -06:00
Alex Crichton	be461b1377	Verify Intel intrinsics against upstream definitions (#251 ) This commit adds a new crate for testing that the intrinsics listed in this crate do indeed match the upstream definition of each intrinsic. A pre-downloaded XML description of all Intel intrinsics is checked in which is then parsed in the `stdsimd-verify` crate to verify that everything we write down is matched against the upstream definitions. Currently the checks are pretty loose to get this compiling but a few intrinsics were fixed as a result of this. For example: * `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX * `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX * `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX * `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX * `_mm_tzcnt_32` - erroneously had `u32` in the name * `_mm_tzcnt_64` - erroneously had `u64` in the name * `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms * `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms * `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms * `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms * `_mm_extract_epi64` - erroneously available on 32-bit platforms * `_mm_insert_epi64` - erroneously available on 32-bit platforms * `_mm256_extract_epi16` - erroneously returned i32 instead of i16 * `_mm256_extract_epi8` - erroneously returned i32 instead of i8 * `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32 * `_popcnt32` - the signededness of the argument and return were flipped * `_popcnt64` - the signededness of the argument was flipped and the argument was too large bit-wise * `_mm_tzcnt_32` - the return value's sign was flipped * `_mm_tzcnt_64` - the return value's sign was flipped * A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of `imm8: i32` which Intel was using. (we were also internally inconsistent) * A number of intrinsics working with `__m64` were instead working with i64/u64, so they're now corrected to operate with the vector types instead. Currently the verifications performed are: * Each name in Rust is defined in the XML document * The arguments/return values all agree. * The CPUID features listed in the XML document are all enabled in Rust as well. The type matching right now is pretty loose and has a lot of questionable changes. Future commits will touch these up to be more strict and require closer adherence with Intel's own types. Otherwise types like `i32x8` (or any integers with 256 bits) all match up to `__m256i` right now, althoguh this may want to change in the future. Finally we're also not testing the instruction listed in the XML right now. There's a huge number of discrepancies between the instruction listed in the XML and the instruction listed in `assert_instr`, and those'll need to be taken care of in a future commit. Closes #240	2017-12-29 11:52:27 -06:00
gwenn	44a168a0b8	sse2: implements last remaining intrinsics (#244 ) * sse2: __m64 related intrinsics _mm_add_si64 _mm_mul_su32 _mm_sub_si64 _mm_cvtpi32_pd _mm_set_epi64 _mm_set1_epi64 _mm_setr_epi64 * sse2: _mm_load_sd, _mm_loadh_pd, _mm_loadl_pd * sse2: _mm_store_sd, _mm_storeh_pd, _mm_storel_pd * sse2: _mm_shuffle_pd, _mm_move_sd * sse2: _mm_cast* _mm_castpd_ps _mm_castpd_si128 _mm_castps_pd _mm_castps_si128 _mm_castsi128_pd _mm_castsi128_ps * sse2: add some tests * Try to fix AppVeyor build * sse2: add more tests * sse2: fix assert_instr for _mm_shuffle_pd * Try to fix Travis build * sse2: try to fix AppVeyor build * sse2: try to fix AppVeyor build	2017-12-28 10:22:08 -06:00
Jonathan Goodman	3857c3e88a	fix sse4a _mm_stream_{ss, sd} tests and docs	2017-12-27 22:32:49 +01:00

1 2 3 4 5 ...

347 commits