Commit graph

67 commits

Author SHA1 Message Date
Alex Crichton
cf738b0d36
Attempt to fix tests on master (#662)
* Attempt to fix tests on master

* Make all doctests use items from the real `std` rather than this
  crate, it's just easier
* Handle debuginfo weirdness by flagging functions as `no_mangle` that
  we're looking for instructions within.

* Handle double undescores in symbol names
2019-01-30 15:11:35 -08:00
Peter Jin
2c924adce6 Fixes to the nvptx target spec json: disable merge-functions, (#653)
and set the correct datalayout string.
2019-01-25 12:51:13 -07:00
gnzlbg
1d1266b185 Readme from std_detect 2019-01-22 18:49:24 +01:00
gnzlbg
8bfa74b5e7 Enable passing allow_failure builds (#644) 2019-01-22 08:57:17 -08:00
gnzlbg
11c624e488 Refactor stdsimd
This commit:

* renames `coresimd` to `core_arch` and `stdsimd` to `std_detect`

* `std_detect` does no longer depend on `core_arch` - it is a freestanding
  `no_std` library that only depends on `core` - it is renamed to `std_detect`

* moves the top-level coresimd and stdsimd directories into the appropriate
  crates/... directories - this simplifies creating crate.io releases of these crates

* moves the top-level `coresimd` and `stdsimd` sub-directories into their
  corresponding crates in `crates/{core_arch, std_detect}`.
2019-01-22 17:04:25 +01:00
gnzlbg
c4983c50d2 Fix android build jobs 2019-01-21 21:37:45 +01:00
Peter Jin
d30c29e926 Add a build libcore-only nvptx64 test (using xargo).
This also disables the "integer_atomics" feature on nvptx/nvptx64.
2018-12-29 12:02:16 +01:00
Alex Crichton
24b3977f6a
Run multithreaded quiet tests (#622)
We historically have run single-threaded verbose tests because we were
faulting all over the place due to bugs in rustc itself, primarily
around calling conventions and passing values around. Those bugs have
all since been fixed so we should be clear to run multithreaded tests
quietly on CI nowadays!

Closes #621
2018-12-14 13:28:23 -06:00
Alex Crichton
cb921381c4
Rewrite simd128 and wasm support (#620)
* Update representation of `v128`
* Rename everything with new naming convention of underscores and no
  modules/impls
* Remove no longer necessary `wasm_simd128` feature
* Remove `#[target_feature]` attributes (use `#[cfg]` instead)
* Update `assert_instr` tests
* Update some implementations as LLVM has evolved
* Allow some more esoteric syntax in `#[assert_instr]`
* Adjust the safety of APIs where appropriate
* Remove macros in favor of hand-coded implementations
* Comment out the tests for now as there's no known runtime for these
  yet
2018-12-13 20:17:30 -06:00
Alex Crichton
591ce8fe6f Add retries to a number of downloads 2018-12-13 15:30:17 -08:00
gnzlbg
e375261a1c remove intel_sde feature 2018-11-11 12:37:44 +01:00
gnzlbg
25352920e1 silence shellcheck warning 2018-11-11 12:37:44 +01:00
gnzlbg
a3acafad81 pass RUSTFLAGS to docker 2018-11-11 12:37:44 +01:00
gnzlbg
b1782e71ef travis linux VM do not all support avx2 2018-11-11 12:37:44 +01:00
gnzlbg
8d1ae0234a add mips docker containers 2018-11-11 12:37:44 +01:00
gnzlbg
eee3d5e6f0 fix clippy and shellcheck issues 2018-11-11 12:37:44 +01:00
gnzlbg
51d9585ece cleanup travis and run.sh scripts 2018-11-11 12:37:44 +01:00
Kaz Wesley
7fda54f9bc fix _mm_castsi128_pd and _mm_castpd_si128 impls (#581)
* fix _mm_castsi128_pd and _mm_castpd_si128 impls

The _mm_castX_Y SSE intrinsics are "reinterpreting" casts; LLVM's
simd_cast is a "converting" cast. Replace simd_cast with mem::transmute.
Fixes #55249

* Temporarily pin CI

* Fix i686 segfaults

* Fix wasm CI

Output of `wasm2wat` has changed!

* Fix AppVeyor with an older nightly
2018-10-23 18:10:54 +02:00
Alex Crichton
31faffa592 Remove lld-shim.rs no longer needed on wasm
Bugs are fixed upstream!
2018-09-17 11:32:10 +02:00
Alex Crichton
c1965d33a8
Rename wasm32 memory intrinsics (#560)
The official name of the memory intrinsics has changed to `memory.size` and
`memory.grow`, so let's reflect that with our naming as well! Additionally they
have an argument of which memory to operate on with LLVM and must always be zero
currently.
2018-09-06 15:34:05 -07:00
gnzlbg
3daebfbc0b Add wasm32 simd128 intrinsics (#549)
* Add wasm32 simd128 intrinsics

* test wasm32 simd128 instructions

* Run wasm tests like all other tests

* use modules instead of types to access wasm simd128 interpretations

* generate docs for wasm32-unknown-unknown

* fix typo

* Enable #[assert_instr] on wasm32

* Shell out to Node's `execSync` to execute `wasm2wat` over our wasm file
* Parse the wasm file line-by-line, looking for various function markers and
  such
* Use the `elem` section to build a function pointer table, allowing us to map
  exactly from function pointer to a function
* Avoid losing debug info (the names section) in release mode by stripping
  `--strip-debug` from `rust-lld`.

* remove exclude list from Cargo.toml

* fix assert_instr for non-wasm targets

* re-format assert-instr changes

* add crate that uses assert_instr

* Fix instructions having extra quotes

* Add assert_instr for wasm memory intrinsics

* Remove hacks for git wasm-bindgen

* add wasm_simd128 feature

* make wasm32 build correctly

* run simd128 tests on ci

* remove wasm-assert-instr-tests
2018-08-15 09:20:33 -07:00
Alex Crichton
f1e4ebd8de
Fix compile of stdsimd on powerpc with no flags (#531)
We're running into issues updating with rust-lang/rust#52535, so we need to get
this working without `RUSTFLAGS` enabling the `altivec` feature
2018-07-20 11:54:33 -05:00
Luca Barbato
8d8d81aa35 Drop the not really supported PowerPC 32bit target
The LLVM backend has known issues and even for them the main
development target is PowerPC 64bit Little Endian.
2018-07-11 15:41:07 +02:00
Luca Barbato
77243a10a1 Check the documentation for the supported powerpc64
PowerPC 64bit Little Endian is the main development target currently.
2018-07-11 15:41:07 +02:00
Luca Barbato
409f648047 Make the dox.sh more verbose
Make easier spot where the errors happen.
2018-07-11 15:41:07 +02:00
gnzlbg
d5cf70cac5 [s390x] add CI
This commit tests `s390x-unknown-linux-gnu` on CI using `qemu-user`.

Closes #499 .
2018-06-26 14:54:07 +02:00
gnzlbg
e70ae5558f add CI for Android 2018-06-23 16:09:27 +02:00
Luca Barbato
3d618b3cd6 Do not run the altivec tests for powerpc64
The big endian variant will be supported properly later.
2018-05-23 18:16:14 +02:00
Luca Barbato
9888c6ce82 Update proc macro2 (#455)
* Update to proc_macro2 0.4 and related

* Update to proc_macro2 0.4 and related

* Update to proc_macro2 0.4 and related

* Add proc_macro_gen feature

* Update to the new rustfmt cli

* A few proc-macro2 stylistic updates

* Disable RUST_BACKTRACE by default

* Allow rustfmt failure for now

* Disable proc-macro2 nightly feature in verify-x86

Currently this causes bugs on nightly due to upstream rustc bugs, this should be
temporary

* Attempt to thwart mergefunc

* Use static relocation model on i686
2018-05-21 13:37:41 -05:00
gnzlbg
8ea9bc53f1 Initial PowerPC altivec and VSX support (#447)
* add some powerpc/powerpc64 altivec/vsx intrinsics

* temporarily make IntoBits/FromBits inline(always)

* include powerpc64 module; use inline(always) from/into_bits only on powerpc
2018-05-16 12:10:19 -05:00
gnzlbg
c0bf5d9c42 Workarounds for all/any mask reductions on x86, armv7, and aarch64 (#425)
* Work arounds for LLVM6 code-gen bugs in all/any reductions

This commit adds workarounds for the mask reductions: `all` and `any`.

64-bit wide mask types (`m8x8`, `m16x4`, `m32x2`)

`x86_64` with `MMX` enabled

```asm
all_8x8:
 push    rbp
 mov     rbp, rsp
 movzx   eax, byte, ptr, [rdi, +, 7]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi, +, 6]
 movd    xmm1, eax
 punpcklwd xmm1, xmm0
 movzx   eax, byte, ptr, [rdi, +, 5]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi, +, 4]
 movd    xmm2, eax
 punpcklwd xmm2, xmm0
 punpckldq xmm2, xmm1
 movzx   eax, byte, ptr, [rdi, +, 3]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi, +, 2]
 movd    xmm1, eax
 punpcklwd xmm1, xmm0
 movzx   eax, byte, ptr, [rdi, +, 1]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi]
 movd    xmm3, eax
 punpcklwd xmm3, xmm0
 punpckldq xmm3, xmm1
 punpcklqdq xmm3, xmm2
 movdqa  xmm0, xmmword, ptr, [rip, +, LCPI9_0]
 pand    xmm3, xmm0
 pcmpeqw xmm3, xmm0
 pshufd  xmm0, xmm3, 78
 pand    xmm0, xmm3
 pshufd  xmm1, xmm0, 229
 pand    xmm1, xmm0
 movdqa  xmm0, xmm1
 psrld   xmm0, 16
 pand    xmm0, xmm1
 movd    eax, xmm0
 and     al, 1
 pop     rbp
 ret
any_8x8:
 push    rbp
 mov     rbp, rsp
 movzx   eax, byte, ptr, [rdi, +, 7]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi, +, 6]
 movd    xmm1, eax
 punpcklwd xmm1, xmm0
 movzx   eax, byte, ptr, [rdi, +, 5]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi, +, 4]
 movd    xmm2, eax
 punpcklwd xmm2, xmm0
 punpckldq xmm2, xmm1
 movzx   eax, byte, ptr, [rdi, +, 3]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi, +, 2]
 movd    xmm1, eax
 punpcklwd xmm1, xmm0
 movzx   eax, byte, ptr, [rdi, +, 1]
 movd    xmm0, eax
 movzx   eax, byte, ptr, [rdi]
 movd    xmm3, eax
 punpcklwd xmm3, xmm0
 punpckldq xmm3, xmm1
 punpcklqdq xmm3, xmm2
 movdqa  xmm0, xmmword, ptr, [rip, +, LCPI8_0]
 pand    xmm3, xmm0
 pcmpeqw xmm3, xmm0
 pshufd  xmm0, xmm3, 78
 por     xmm0, xmm3
 pshufd  xmm1, xmm0, 229
 por     xmm1, xmm0
 movdqa  xmm0, xmm1
 psrld   xmm0, 16
 por     xmm0, xmm1
 movd    eax, xmm0
 and     al, 1
 pop     rbp
 ret
```

After this PR for `m8x8`, `m16x4`, `m32x2`:

```asm
all_8x8:
 push    rbp
 mov     rbp, rsp
 movq    mm0, qword, ptr, [rdi]
 pmovmskb eax, mm0
 cmp     eax, 255
 sete    al
 pop     rbp
 ret
any_8x8:
 push    rbp
 mov     rbp, rsp
 movq    mm0, qword, ptr, [rdi]
 pmovmskb eax, mm0
 test    eax, eax
 setne   al
 pop     rbp
 ret
```

x86` with `MMX` enabled

Before this PR:

```asm
all_8x8:
 call    L9$pb
L9$pb:
 pop     eax
 mov     ecx, dword, ptr, [esp, +, 4]
 movzx   edx, byte, ptr, [ecx, +, 7]
 movd    xmm0, edx
 movzx   edx, byte, ptr, [ecx, +, 6]
 movd    xmm1, edx
 punpcklwd xmm1, xmm0
 movzx   edx, byte, ptr, [ecx, +, 5]
 movd    xmm0, edx
 movzx   edx, byte, ptr, [ecx, +, 4]
 movd    xmm2, edx
 punpcklwd xmm2, xmm0
 punpckldq xmm2, xmm1
 movzx   edx, byte, ptr, [ecx, +, 3]
 movd    xmm0, edx
 movzx   edx, byte, ptr, [ecx, +, 2]
 movd    xmm1, edx
 punpcklwd xmm1, xmm0
 movzx   edx, byte, ptr, [ecx, +, 1]
 movd    xmm0, edx
 movzx   ecx, byte, ptr, [ecx]
 movd    xmm3, ecx
 punpcklwd xmm3, xmm0
 punpckldq xmm3, xmm1
 punpcklqdq xmm3, xmm2
 movdqa  xmm0, xmmword, ptr, [eax, +, LCPI9_0-L9$pb]
 pand    xmm3, xmm0
 pcmpeqw xmm3, xmm0
 pshufd  xmm0, xmm3, 78
 pand    xmm0, xmm3
 pshufd  xmm1, xmm0, 229
 pand    xmm1, xmm0
 movdqa  xmm0, xmm1
 psrld   xmm0, 16
 pand    xmm0, xmm1
 movd    eax, xmm0
 and     al, 1
 ret
any_8x8:
 call    L8$pb
L8$pb:
 pop     eax
 mov     ecx, dword, ptr, [esp, +, 4]
 movzx   edx, byte, ptr, [ecx, +, 7]
 movd    xmm0, edx
 movzx   edx, byte, ptr, [ecx, +, 6]
 movd    xmm1, edx
 punpcklwd xmm1, xmm0
 movzx   edx, byte, ptr, [ecx, +, 5]
 movd    xmm0, edx
 movzx   edx, byte, ptr, [ecx, +, 4]
 movd    xmm2, edx
 punpcklwd xmm2, xmm0
 punpckldq xmm2, xmm1
 movzx   edx, byte, ptr, [ecx, +, 3]
 movd    xmm0, edx
 movzx   edx, byte, ptr, [ecx, +, 2]
 movd    xmm1, edx
 punpcklwd xmm1, xmm0
 movzx   edx, byte, ptr, [ecx, +, 1]
 movd    xmm0, edx
 movzx   ecx, byte, ptr, [ecx]
 movd    xmm3, ecx
 punpcklwd xmm3, xmm0
 punpckldq xmm3, xmm1
 punpcklqdq xmm3, xmm2
 movdqa  xmm0, xmmword, ptr, [eax, +, LCPI8_0-L8$pb]
 pand    xmm3, xmm0
 pcmpeqw xmm3, xmm0
 pshufd  xmm0, xmm3, 78
 por     xmm0, xmm3
 pshufd  xmm1, xmm0, 229
 por     xmm1, xmm0
 movdqa  xmm0, xmm1
 psrld   xmm0, 16
 por     xmm0, xmm1
 movd    eax, xmm0
 and     al, 1
 ret
```

After this PR:

```asm
all_8x8:
 mov     eax, dword, ptr, [esp, +, 4]
 movq    mm0, qword, ptr, [eax]
 pmovmskb eax, mm0
 cmp     eax, 255
 sete    al
 ret
any_8x8:
 mov     eax, dword, ptr, [esp, +, 4]
 movq    mm0, qword, ptr, [eax]
 pmovmskb eax, mm0
 test    eax, eax
 setne   al
 ret
```

`aarch64`

Before this PR:

```asm
all_8x8:
 ldr     d0, [x0]
 umov    w8, v0.b[0]
 umov    w9, v0.b[1]
 tst     w8, #0xff
 umov    w10, v0.b[2]
 cset    w8, ne
 tst     w9, #0xff
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[3]
 and     w8, w8, w9
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[4]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[5]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[6]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[7]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 and     w8, w9, w8
 cset    w9, ne
 and     w0, w9, w8
 ret
any_8x8:
 ldr     d0, [x0]
 umov    w8, v0.b[0]
 umov    w9, v0.b[1]
 orr     w8, w8, w9
 umov    w9, v0.b[2]
 orr     w8, w8, w9
 umov    w9, v0.b[3]
 orr     w8, w8, w9
 umov    w9, v0.b[4]
 orr     w8, w8, w9
 umov    w9, v0.b[5]
 orr     w8, w8, w9
 umov    w9, v0.b[6]
 orr     w8, w8, w9
 umov    w9, v0.b[7]
 orr     w8, w8, w9
 tst     w8, #0xff
 cset    w0, ne
 ret
```

After this PR:

```asm
all_8x8:
 ldr     d0, [x0]
 mov     v0.d[1], v0.d[0]
 uminv   b0, v0.16b
 fmov    w8, s0
 tst     w8, #0xff
 cset    w0, ne
 ret
any_8x8:
 ldr     d0, [x0]
 mov     v0.d[1], v0.d[0]
 umaxv   b0, v0.16b
 fmov    w8, s0
 tst     w8, #0xff
 cset    w0, ne
 ret
```

`ARMv7` + `neon`

Before this PR:

```asm
all_8x8:
 vmov.i8 d0, #0x1
 vldr    d1, [r0]
 vtst.8  d0, d1, d0
 vext.8  d1, d0, d0, #4
 vand    d0, d0, d1
 vext.8  d1, d0, d0, #2
 vand    d0, d0, d1
 vdup.8  d1, d0[1]
 vand    d0, d0, d1
 vmov.u8 r0, d0[0]
 and     r0, r0, #1
 bx      lr
any_8x8:
 vmov.i8 d0, #0x1
 vldr    d1, [r0]
 vtst.8  d0, d1, d0
 vext.8  d1, d0, d0, #4
 vorr    d0, d0, d1
 vext.8  d1, d0, d0, #2
 vorr    d0, d0, d1
 vdup.8  d1, d0[1]
 vorr    d0, d0, d1
 vmov.u8 r0, d0[0]
 and     r0, r0, #1
 bx      lr
```

After this PR:

```asm
all_8x8:
 vldr    d0, [r0]
 b       <m8x8 as All>::all

<m8x8 as All>::all:
 vpmin.u8 d16, d0, d16
 vpmin.u8 d16, d16, d16
 vpmin.u8 d0, d16, d16
 b       m8x8::extract

any_8x8:
 vldr    d0, [r0]
 b       <m8x8 as Any>::any

<m8x8 as Any>::any:
 vpmax.u8 d16, d0, d16
 vpmax.u8 d16, d16, d16
 vpmax.u8 d0, d16, d16
 b       m8x8::extract
```

(note: inlining does not work properly on ARMv7)

128-bit wide mask types (`m8x16`, `m16x8`, `m32x4`, `m64x2`)

`x86_64` with SSE2 enabled

Before this PR:

```asm
all_8x16:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rip, +, LCPI9_0]
 movdqa  xmm1, xmmword, ptr, [rdi]
 pand    xmm1, xmm0
 pcmpeqb xmm1, xmm0
 pmovmskb eax, xmm1
 xor     ecx, ecx
 cmp     eax, 65535
 mov     eax, -1
 cmovne  eax, ecx
 and     al, 1
 pop     rbp
 ret
any_8x16:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rip, +, LCPI8_0]
 movdqa  xmm1, xmmword, ptr, [rdi]
 pand    xmm1, xmm0
 pcmpeqb xmm1, xmm0
 pmovmskb eax, xmm1
 neg     eax
 sbb     eax, eax
 and     al, 1
 pop     rbp
 ret
```

After this PR:

```asm
all_8x16:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rdi]
 pmovmskb eax, xmm0
 cmp     eax, 65535
 sete    al
 pop     rbp
 ret
any_8x16:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rdi]
 pmovmskb eax, xmm0
 test    eax, eax
 setne   al
 pop     rbp
 ret
```

`aarch64`

Before this PR:

```asm
all_8x16:
 ldr     q0, [x0]
 umov    w8, v0.b[0]
 umov    w9, v0.b[1]
 tst     w8, #0xff
 umov    w10, v0.b[2]
 cset    w8, ne
 tst     w9, #0xff
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[3]
 and     w8, w8, w9
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[4]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[5]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[6]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[7]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[8]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[9]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[10]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[11]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[12]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[13]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[14]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 umov    w10, v0.b[15]
 and     w8, w9, w8
 cset    w9, ne
 tst     w10, #0xff
 and     w8, w9, w8
 cset    w9, ne
 and     w0, w9, w8
 ret
any_8x16:
 ldr     q0, [x0]
 umov    w8, v0.b[0]
 umov    w9, v0.b[1]
 orr     w8, w8, w9
 umov    w9, v0.b[2]
 orr     w8, w8, w9
 umov    w9, v0.b[3]
 orr     w8, w8, w9
 umov    w9, v0.b[4]
 orr     w8, w8, w9
 umov    w9, v0.b[5]
 orr     w8, w8, w9
 umov    w9, v0.b[6]
 orr     w8, w8, w9
 umov    w9, v0.b[7]
 orr     w8, w8, w9
 umov    w9, v0.b[8]
 orr     w8, w8, w9
 umov    w9, v0.b[9]
 orr     w8, w8, w9
 umov    w9, v0.b[10]
 orr     w8, w8, w9
 umov    w9, v0.b[11]
 orr     w8, w8, w9
 umov    w9, v0.b[12]
 orr     w8, w8, w9
 umov    w9, v0.b[13]
 orr     w8, w8, w9
 umov    w9, v0.b[14]
 orr     w8, w8, w9
 umov    w9, v0.b[15]
 orr     w8, w8, w9
 tst     w8, #0xff
 cset    w0, ne
 ret
```

After this PR:

```asm
all_8x16:
 ldr     q0, [x0]
 uminv   b0, v0.16b
 fmov    w8, s0
 tst     w8, #0xff
 cset    w0, ne
 ret
any_8x16:
 ldr     q0, [x0]
 umaxv   b0, v0.16b
 fmov    w8, s0
 tst     w8, #0xff
 cset    w0, ne
 ret
```

 `ARMv7` + `neon`

Before this PR:

```asm
all_8x16:
 vmov.i8 q0, #0x1
 vld1.64 {d2, d3}, [r0]
 vtst.8  q0, q1, q0
 vext.8  q1, q0, q0, #8
 vand    q0, q0, q1
 vext.8  q1, q0, q0, #4
 vand    q0, q0, q1
 vext.8  q1, q0, q0, #2
 vand    q0, q0, q1
 vdup.8  q1, d0[1]
 vand    q0, q0, q1
 vmov.u8 r0, d0[0]
 and     r0, r0, #1
 bx      lr
any_8x16:
 vmov.i8 q0, #0x1
 vld1.64 {d2, d3}, [r0]
 vtst.8  q0, q1, q0
 vext.8  q1, q0, q0, #8
 vorr    q0, q0, q1
 vext.8  q1, q0, q0, #4
 vorr    q0, q0, q1
 vext.8  q1, q0, q0, #2
 vorr    q0, q0, q1
 vdup.8  q1, d0[1]
 vorr    q0, q0, q1
 vmov.u8 r0, d0[0]
 and     r0, r0, #1
 bx      lr
```

After this PR:

```asm
all_8x16:
 vld1.64 {d0, d1}, [r0]
 b       <m8x16 as All>::all

<m8x16 as All>::all:
 vpmin.u8 d0, d0, d
 b       <m8x8 as All>::all
any_8x16:
 vld1.64 {d0, d1}, [r0]
 b       <m8x16 as Any>::any

<m8x16 as Any>::any:
 vpmax.u8 d0, d0, d1
 b       <m8x8 as Any>::any
```

The inlining problems are pretty bad on ARMv7 + NEON.

256-bit wide mask types (`m8x32`, `m16x16`, `m32x8`, `m64x4`)

With SSE2 enabled

Before this PR:

```asm
all_8x32:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rip, +, LCPI17_0]
 movdqa  xmm1, xmmword, ptr, [rdi]
 pand    xmm1, xmm0
 movdqa  xmm2, xmmword, ptr, [rdi, +, 16]
 pand    xmm2, xmm0
 pcmpeqb xmm2, xmm0
 pcmpeqb xmm1, xmm0
 pand    xmm1, xmm2
 pmovmskb eax, xmm1
 xor     ecx, ecx
 cmp     eax, 65535
 mov     eax, -1
 cmovne  eax, ecx
 and     al, 1
 pop     rbp
 ret
 any_8x32:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rdi]
 por     xmm0, xmmword, ptr, [rdi, +, 16]
 movdqa  xmm1, xmmword, ptr, [rip, +, LCPI16_0]
 pand    xmm0, xmm1
 pcmpeqb xmm0, xmm1
 pmovmskb eax, xmm0
 neg     eax
 sbb     eax, eax
 and     al, 1
 pop     rbp
 ret
```

After this PR:

```asm
all_8x32:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rdi]
 pmovmskb eax, xmm0
 cmp     eax, 65535
 jne     LBB17_1
 movdqa  xmm0, xmmword, ptr, [rdi, +, 16]
 pmovmskb ecx, xmm0
 mov     al, 1
 cmp     ecx, 65535
 je      LBB17_3
LBB17_1:
 xor     eax, eax
LBB17_3:
 pop     rbp
 ret
any_8x32:
 push    rbp
 mov     rbp, rsp
 movdqa  xmm0, xmmword, ptr, [rdi]
 pmovmskb ecx, xmm0
 mov     al, 1
 test    ecx, ecx
 je      LBB16_1
 pop     rbp
 ret
LBB16_1:
 movdqa  xmm0, xmmword, ptr, [rdi, +, 16]
 pmovmskb eax, xmm0
 test    eax, eax
 setne   al
 pop     rbp
 ret
```

With AVX enabled

Before this PR:

```asm
all_8x32:
 push    rbp
 mov     rbp, rsp
 vmovaps ymm0, ymmword, ptr, [rdi]
 vandps  ymm0, ymm0, ymmword, ptr, [rip, +, LCPI25_0]
 vextractf128 xmm1, ymm0, 1
 vpxor   xmm2, xmm2, xmm2
 vpcmpeqb xmm1, xmm1, xmm2
 vpcmpeqd xmm3, xmm3, xmm3
 vpxor   xmm1, xmm1, xmm3
 vpcmpeqb xmm0, xmm0, xmm2
 vpxor   xmm0, xmm0, xmm3
 vinsertf128 ymm0, ymm0, xmm1, 1
 vandps  ymm0, ymm0, ymm1
 vpermilps xmm1, xmm0, 78
 vandps  ymm0, ymm0, ymm1
 vpermilps xmm1, xmm0, 229
 vandps  ymm0, ymm0, ymm1
 vpsrld  xmm1, xmm0, 16
 vandps  ymm0, ymm0, ymm1
 vpsrlw  xmm1, xmm0, 8
 vandps  ymm0, ymm0, ymm1
 vpextrb eax, xmm0, 0
 and     al, 1
 pop     rbp
 vzeroupper
 ret
any_8x32:
 push    rbp
 mov     rbp, rsp
 vmovaps ymm0, ymmword, ptr, [rdi]
 vandps  ymm0, ymm0, ymmword, ptr, [rip, +, LCPI24_0]
 vextractf128 xmm1, ymm0, 1
 vpxor   xmm2, xmm2, xmm2
 vpcmpeqb xmm1, xmm1, xmm2
 vpcmpeqd xmm3, xmm3, xmm3
 vpxor   xmm1, xmm1, xmm3
 vpcmpeqb xmm0, xmm0, xmm2
 vpxor   xmm0, xmm0, xmm3
 vinsertf128 ymm0, ymm0, xmm1, 1
 vorps   ymm0, ymm0, ymm1
 vpermilps xmm1, xmm0, 78
 vorps   ymm0, ymm0, ymm1
 vpermilps xmm1, xmm0, 229
 vorps   ymm0, ymm0, ymm1
 vpsrld  xmm1, xmm0, 16
 vorps   ymm0, ymm0, ymm1
 vpsrlw  xmm1, xmm0, 8
 vorps   ymm0, ymm0, ymm1
 vpextrb eax, xmm0, 0
 and     al, 1
 pop     rbp
 vzeroupper
 ret
```

After this PR:

```asm
all_8x32:
 push    rbp
 mov     rbp, rsp
 vmovdqa ymm0, ymmword, ptr, [rdi]
 vxorps  xmm1, xmm1, xmm1
 vcmptrueps ymm1, ymm1, ymm1
 vptest  ymm0, ymm1
 setb    al
 pop     rbp
 vzeroupper
 ret
any_8x32:
 push    rbp
 mov     rbp, rsp
 vmovdqa ymm0, ymmword, ptr, [rdi]
 vptest  ymm0, ymm0
 setne   al
 pop     rbp
 vzeroupper
 ret
```

---

Closes #362 .

* test avx on all x86 targets

* disable assert_instr on avx test

* enable all appropriate features

* disable assert_instr on x86+avx

* the fn_must_use is stable

* fix nbody example on armv7

* fixup

* fixup

* enable 64-bit wide mask MMX optimizations on x86_64 only

* remove coresimd dependency on cfg_if

* allow wasm to fail

* use an env variable to disable assert_instr tests

* disable m32x2 mask MMX optimization on macos

* move cfg_if to coresimd/macros.rs
2018-05-04 16:03:45 -05:00
gnzlbg
30962e58e6 fix errors/warnings from the stabilization of cfg_target_feature and target_feature (#432)
* fix build after stabilization of cfg_target_feature and target_feature

* fix doc tests

* fix spurious unused_attributes warning

* fix more unused attribute warnings

* More unnecessary target features

* Remove no longer needed trait imports

* Remove fixed upstream workarounds

* Fix parsing the #[assert_instr] macro

Following upstream proc_macro changes

* Fix form and parsing of #[simd_test]

* Don't use Cargo features for testing modes

Instead use RUSTFLAGS with `--cfg`. This'll help us be compatible with the
latest Cargo where a tweak to workspaces and features made the previous
invocations we had invalid.

* Don't thread RUSTFLAGS through docker

* Re-gate on x86 verification

Closes #411
2018-04-26 21:54:15 -05:00
gnzlbg
87ce896543
Documents arithmetic reduction semantics (#412)
* documents arithmetic reduction semantics
2018-04-05 19:36:04 +02:00
gnzlbg
cae02b7fa0 update ubuntu version 2018-04-03 15:40:22 +02:00
gnzlbg
0239a1a0aa update intel SDE version 2018-04-03 15:40:22 +02:00
gnzlbg
ff53ec6cb2 add arm neon vector types (#384) 2018-03-20 09:11:50 -05:00
gnzlbg
68c53c1e55 Split protable vector types tests into multiple crates (#379)
* split the portable vector tests into separate crates

* use rustc reductions
2018-03-18 10:55:20 -05:00
gnzlbg
2762e2ca9a [mips/mips64: msa] add add_a_b intrinsic (#365)
* [mips64/msa] add add_a_b intrinsic

* add make/file to mips64el's Dockerfile

* add run-time detection support for mips64

* add mips64 build bot

* generate docs for mips64

* fix linux test

* cleanup rt-detection

* support mips64/mips64el in stdsimd-test

* support asserting instructions with  in their name

* better error msgs for the auxv_crate test

* debug auxv on mips64

* override run-time detection on mips msa tests

* remove unused #[macro_use]

* try another MIPS cpu

* detect default TARGET in simd-test-macro

* use mips64r2-generic

* disable unused function in mips tests

* move msa to mips

* remove mips from ci

* split into mips and mips64 modules

* add rt-detection for 32-bit mips

* fmt

* remove merge error

* add norun build bots for mips

* add -p to avoid changing the cwd

* fixup

* refactor run-time detection module
2018-03-10 12:22:54 -06:00
gnzlbg
be0b7f41fc adds AArch64's {s,u,f}{min,max}{v,p} and ARM's {vmov}{n,l} (#345)
* adds {s,u,f}{min,max}{v,p} AArch64 intrinsics
* adds {vmov}{n,l} ARM intrinsics

Closes #314 .
2018-03-07 09:31:14 -06:00
gnzlbg
548290b801 Prepare portable packed vector types for RFCs (#338)
* Prepare portable packed SIMD vector types for RFCs

This commit cleans up the implementation of the Portable Packed Vector Types
(PPTV), adds some new features, and makes some breaking changes.

The implementation is moved to `coresimd/src/ppvt` (they are
still exposed via `coresimd::simd`).

As before, the vector types of a certain width are implemented in the `v{width}`
submodules. The `macros.rs` file has been rewritten as an `api` module that
exposes the macros to implement each API.

It should now hopefully be really clear where each API is implemented, and which types
implement these APIs. It should also now be really clear which APIs are tested and how.

- boolean vectors of the form `b{element_size}x{number_of_lanes}`.
- reductions: arithmetic, bitwise, min/max, and boolean - only the facade,
  and a naive working implementation. These need to be implemented
  as `llvm.experimental.vector.reduction.{...}` but this needs rustc support first.
- FromBits trait analogous to `{f32,f64}::from_bits` that perform "safe" transmutes.
  Instead of writing `From::from`/`x.into()` (see below for breaking changes) now you write
  `FromBits::from_bits`/`x.into_bits()`.
- portable vector types implement `Default` and `Hash`
- tests for all portable vector types and all portable operations (~2000 new tests).
- (hopefully) comprehensive implementation of bitwise transmutes and lane-wise
  casts (before `From` and the `.as_...` methods where implemented "when they were needed".
- documentation for PPTV (not great yet, but better than nothing)
- conversions/transmutes from/to x86 architecture specific vector types

- `store/load` API has been replaced with `{store,load}_{aligned,unaligned}`
- `eq,ne,lt,le,gt,ge` APIs now return boolean vectors
- The `.as_{...}` methods have been removed. Lane-wise casts are now performed by `From`.
- `From` now perform casts (see above). It used to perform bitwise transmutes.
- `simd` vectors' `replace` method's result is now `#[must_use]`.

* enable backtrace and nocapture

* unalign load/store fail test by 1 byte

* update arm and aarch64 neon modules

* fix arm example

* fmt

* clippy and read example that rustfmt swallowed

* reductions should take self

* rename add/mul -> sum/product; delete other arith reductions

* clean up fmt::LowerHex impl

* revert incorret doc change

* make Hash equivalent to [T; lanes()]

* use travis_wait to increase timeout limit to 20 minutes

* remove travis_wait; did not help

* implement reductions on top of the llvm.experimental.vector.reduction intrinsics

* implement cmp for boolean vectors

* add missing eq impl file

* implement default

* rename llvm intrinsics

* fix aarch64 example error

* replace #[inline(always)] with #[inline]

* remove cargo clean from run.sh

* workaround broken product in aarch64

* make boolean vector constructors const fn

* fix more reductions on aarch64

* fix min/max reductions on aarch64

* remove whitespace

* remove all boolean vector types except for b8xN

* use a sum reduction fallback on aarch64

* disable llvm add reduction for aarch64

* rename the llvm intrinsics to use llvm names

* remove old macros.rs file
2018-03-05 14:32:35 -06:00
Alex Crichton
39b5ec91ae
Reorganize and refactor source tree (#324)
With RFC 2325 looking close to being accepted, I took a crack at
reorganizing this repository to being more amenable for inclusion in
libstd/libcore. My current plan is to add stdsimd as a submodule in
rust-lang/rust and then use `#[path]` to include the modules directly
into libstd/libcore.

Before this commit, however, the source code of coresimd/stdsimd
themselves were not quite ready for this. Imports wouldn't compile for
one reason or another, and the organization was also different than the
RFC itself!

In addition to moving a lot of files around, this commit has the
following major changes:

* The `cfg_feature_enabled!` macro is now renamed to
  `is_target_feature_detected!`
* The `vendor` module is now called `arch`.
* Under the `arch` module is a suite of modules like `x86`, `x86_64`,
  etc. One per `cfg!(target_arch)`.
* The `is_target_feature_detected!` macro was removed from coresimd.
  Unfortunately libcore has no ability to export unstable macros, so for
  now all feature detection is canonicalized in stdsimd.

The `coresimd` and `stdsimd` crates have been updated to the planned
organization in RFC 2325 as well. The runtime bits saw the largest
amount of refactoring, seeing a good deal of simplification without the
core/std split.
2018-02-18 10:07:35 +09:00
Alex Crichton
5b445c5cac Update doc generation with recent devlopments 2018-01-28 22:00:13 -08:00
Alex Crichton
c5afde07d2
Migrate the i586::avx module to vendor types (#286)
Closes #285
2018-01-18 11:21:03 -06:00
Alex Crichton
be461b1377
Verify Intel intrinsics against upstream definitions (#251)
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in which is
then parsed in the `stdsimd-verify` crate to verify that everything we write
down is matched against the upstream definitions.

Currently the checks are pretty loose to get this compiling but a few intrinsics
were fixed as a result of this. For example:

* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signededness of the argument and return were flipped
* `_popcnt64` - the signededness of the argument was flipped and the argument
  was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of `imm8:
  i32` which Intel was using. (we were also internally inconsistent)
* A number of intrinsics working with `__m64` were instead working with i64/u64,
  so they're now corrected to operate with the vector types instead.

Currently the verifications performed are:

* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.

The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence with Intel's own types. Otherwise types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i` right now, althoguh this may want to
change in the future.

Finally we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the XML
and the instruction listed in `assert_instr`, and those'll need to be taken care
of in a future commit.

Closes #240
2017-12-29 11:52:27 -06:00
gnzlbg
5ce0c13009 [ci] powerpc/powerpc64/powerpc64le (#237)
* [ci] add powerpc/powerpc64 build bots

* unbreak stdsimd builds for targets without run-time
2017-12-14 10:44:20 -06:00
gnzlbg
b8a4b397ad update docs (#217)
* update docs

* cargo clean deletes previous docs

* remove stdsimd from coresimd examples

* use stdsimd instead of coresimd in core docs

* add stdsimd as a dev-dependency of coresimd
2017-11-27 10:47:23 -08:00
gnzlbg
426621f021
Add FXSAVE/FXRSTOR, update Intel SDE, fix xsave tests (#205)
* [x86] add run-time detection for fxsr
* [x86] add i386 fxsr intrinsics: FXSAVE,FXRSTOR
* [x86_64] add x86_64 fxsr intrinsics: FXSAVE64/FXRSTOR64
* [x86-runtime]: document xsave detection further
* [x86] disable xsaves and xsaves64 tests
2017-11-22 15:25:15 +01:00
Alex Crichton
922345c005 Use workspaces and fix tests
* Enable a Cargo workspace for the repo
* Disable tests for proc-macro crates
* Move back to mounting source directory read-only
* Refactor test invocation to only test one crate with `--all`
2017-11-22 13:42:58 +01:00
gnzlbg
b940d3311a fix doc script 2017-11-22 13:42:58 +01:00
gnzlbg
14d0903309 refactor no_std components into the coresimd crate 2017-11-22 13:42:58 +01:00