Update CONTRIBUTING

This commit is contained in:
Alex Crichton 2017-09-25 13:44:20 -07:00
parent a03935143f
commit 1d22fe36b9

View file

@ -1,131 +1,30 @@
This document is intended to be a guide for documenting the process of adding
new vendor intrinsics to this crate.
# Contributing to stdsimd
If you decide to implement a set of vendor intrinsics, please check the set of
open issues to make sure somebody else isn't already working on them. If no
such issue exists, then create a new issue and state the intrinsics you'd like
to implement.
The `stdsimd` crate is more than willing to accept contributions! First you'll
probably want to check out the repository and make sure that tests pass for you:
At a high level, each vendor intrinsic should correspond to a single exported
Rust function with an appropriate `target_feature` attribute. Here's an
example for `_mm_adds_epi16`:
```rust
/// Add packed 16-bit integers in `a` and `b` using saturation.
#[inline(always)]
#[target_feature = "+sse2"]
#[cfg_attr(test, assert_instr(paddsw))]
pub fn _mm_adds_epi16(a: i16x8, b: i16x8) -> i16x8 {
unsafe { paddsw(a, b) }
}
```
$ git clone https://github.com/rust-lang-nursery/stdsimd
$ cd stdsimd
$ cargo +nightly test
```
Let's break this down:
Remember that this repository requires the nightly channel of Rust! If any of
the above steps don't work, [please let us know][new]!
* The `#[inline(always)]` is added because vendor intrinsic functions generally
should always be inlined because the intent of a vendor intrinsic is to
correspond to a single particular CPU instruction. A vendor intrinsic that
is compiled into an actual function call could be quite disastrous for
performance.
* The `#[target_feature = "+sse2"]` attribute intructs the compiler to generate
code with the `sse2` target feature enabled, *regardless* of the target
platform. That is, even if you're compiling for a platform that doesn't
support `sse2`, the compiler will still generate code for `_mm_adds_epi16`
*as if* `sse2` support existed. Without this attribute, the compiler might
not generate the intended CPU instruction.
* The `#[cfg_attr(test, assert_instr(paddsw))]` attribute indicates that when
we're testing the crate we'll assert that the `paddsw` instruction is
generated inside this function, ensuring that the SIMD intrinsic truly is an
intrinsic for the instruction!
* The types of the vectors given to the intrinsic should generally match the
types as provided in the vendor interface. We'll talk about this more below.
* The implementation of the vendor intrinsic is generally very simple.
Remember, the goal is to compile a call to `_mm_adds_epi16` down to a single
particular CPU instruction. As such, the implementation typically defers to a
compiler intrinsic (in this case, `paddsw`) when one is available. More on
this below as well.
Next up you can [find an issue][issues] to help out on, we've selected a few
with the [`help wanted`][help] and [`impl-period`][impl] tags which could
particularly use some help. You may be most interested in [#40][vendor],
implementing all vendor intrinsics on x86. That issue's got some good pointers
about where to get started!
Once a function has been added, you should also add at least one test for basic
functionality. Here's an example for `_mm_adds_epi16`:
[new]: https://github.com/rust-lang-nursery/stdsimd/issues/new
[issues]: https://github.com/rust-lang-nursery/stdsimd/issues
[help]: https://github.com/rust-lang-nursery/stdsimd/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22
[impl]: https://github.com/rust-lang-nursery/stdsimd/issues?q=is%3Aissue+is%3Aopen+label%3Aimpl-period
[vendor]: https://github.com/rust-lang-nursery/stdsimd/issues/40
```rust
#[test]
fn _mm_adds_epi16() {
let a = i16x8::new(0, 1, 2, 3, 4, 5, 6, 7);
let b = i16x8::new(8, 9, 10, 11, 12, 13, 14, 15);
let r = sse2::_mm_adds_epi16(a, b);
let e = i16x8::new(8, 10, 12, 14, 16, 18, 20, 22);
assert_eq!(r, e);
}
```
If you've got general questions feel free to [join us on gitter][gitter] and ask
around! Feel free to ping either @BurntSushi or @alexcrichton with questions.
Finally, once that's done, check off the corresponding box in the `TODO.md`
file in the root of this repository.
## Determining types
Determining the function signature of each vendor intrinsic can be tricky
depending on the specificity of the vendor API. For SSE, Intel generally has
three types in their interface:
* `__m128` consists of 4 single-precision (32-bit) floating point numbers.
* `__m128d` consists of 2 double-precision (64-bit) floating point numbers.
* `__m128i` consists of `N` integers, where `N` can be 16, 8, 4 or 2. The
corresponding bit sizes for each value of `N` are 8-bit, 16-bit, 32-bit and
64-bit, respectively. Finally, there are signed and unsigned variants for
each value of `N`, which means `__m128i` can be mapped to one of eight
possible concrete integer types.
In terms of the `stdsimd` crate, the first two floating point types have a
straight-forward translation. `__m128` maps to `f32x4` while `__m128d` maps to
`f64x2`.
Unfortunately, since `__m128i` can correspond to any number of integer types
we need to actually inspect the vendor intrinsic to determine the type.
Sometimes this is hinted at in the name of intrinsic itself. Continuing with
our previous example, `_mm_adds_epi16`, we can infer that it is a signed
operation on an integer vector consisting of eight 16-bit integers. Namely, the
`epi` means signed (where as `epu` means unsigned) and `16` means 16-bit.
Fortunately, Clang (and LLVM) have determined the specific concrete integer
types for most of the vendor intrinsics already, but they aren't available in
any easily access away (as far as this author knows). For example, you can
see
[the types for `_mm_adds_epi16` in Clang's `emmintrin.h` header file](https://github.com/llvm-mirror/clang/blob/dcd8d797b20291f1a6b3e0ddda085aa2bbb382a8/lib/Headers/emmintrin.h#L2180-L2200).
## Writing the implementation
An implementation of an intrinsic (so far) generally has one of three shapes:
1. The vendor intrinsic does not have any corresponding compiler intrinsic, so
you must write the implementation in such a way that the compiler will
recognize it and produce the desired codegen. For example, the
`_mm_add_epi16` intrinsic (note the missing `s` in `add`) is implemented
via `a + b`, which compiles down to LLVM's cross platform SIMD vector API.
2. The vendor intrinsic *does* have a corresponding compiler intrinsic, so you
must write an `extern` block to bring that intrinsic into scope and then
call it. The example above (`_mm_adds_epi16`) uses this approach.
3. The vendor intrinsic has a parameter that must be a *constant* value when
given to the CPU instruction, where that constant is often a parameter that
impacts the operation of the intrinsic. This means the implementation of the
vendor intrinsic must guarantee that a particular parameter be a constant.
This is tricky because Rust doesn't (yet) have a stable way of doing this,
so we have to do it ourselves. How you do it can vary, but one particularly
gnarly example is
[`_mm_cmpestri`](https://github.com/BurntSushi/stdsimd/blob/ff6021b72e8cc1e7db942847d99278fe0056c245/src/x86/sse42.rs#L286)
(make sure to look at the `constify_imm8!` macro).
## References
The compiler intrinsics available to us through LLVM can be found here:
https://gist.github.com/anonymous/a25d3e3b4c14ee68d63bd1dcb0e1223c
The Intel vendor intrinsic API can be found here:
https://gist.github.com/anonymous/25d752fda8521d29699a826b980218fc
The Clang header files for vendor intrinsics can also be incredibly useful.
When in doubt, Do What Clang Does:
https://github.com/llvm-mirror/clang/tree/master/lib/Headers
[gitter]: https://gitter.im/rust-impl-period/WG-libs-simd