PTX support, take 2
- You can generate PTX using `--emit=asm` and the right (custom) target. Which
then you can run on a NVIDIA GPU.
- You can compile `core` to PTX. [Xargo] also works and it can compile some
other crates like `collections` (but I doubt all of those make sense on a GPU)
[Xargo]: https://github.com/japaric/xargo
- You can create "global" functions, which can be "called" by the host, using
the `"ptx-kernel"` ABI, e.g. `extern "ptx-kernel" fn kernel() { .. }`. Every
other function is a "device" function and can only be called by the GPU.
- Intrinsics like `__syncthreads()` and `blockIdx.x` are available as
`"platform-intrinsics"`. These intrinsics are *not* in the `core` crate but
any Rust user can create "bindings" to them using an `extern
"platform-intrinsics"` block. See example at the end.
- Trying to emit PTX with `-g` (debuginfo); you get an LLVM error. But I don't
think PTX can contain debuginfo anyway so `-g` should be ignored and a warning
should be printed ("`-g` doesn't work with this target" or something).
- "Single source" support. You *can't* write a single source file that contains
both host and device code. I think that should be possible to implement that
outside the compiler using compiler plugins / build scripts.
- The equivalent to CUDA `__shared__` which it's used to declare memory that's
shared between the threads of the same block. This could be implemented using
attributes: `#[shared] static mut SCRATCH_MEMORY: [f32; 64]` but hasn't been
implemented yet.
- Built-in targets. This PR doesn't add targets to the compiler just yet but one
can create custom targets to be able to emit PTX code (see the example at the
end). The idea is to have people experiment with this feature before
committing to it (built-in targets are "insta-stable")
- All functions must be "inlined". IOW, the `.rlib` must always contain the LLVM
bitcode of all the functions of the crate it was produced from. Otherwise, you
end with "undefined references" in the final PTX code but you won't get *any*
linker error because no linker is involved. IOW, you'll hit a runtime error
when loading the PTX into the GPU. The workaround is to use `#[inline]` on
non-generic functions and to never use `#[inline(never)]` but this may not
always be possible because e.g. you could be relying on third party code.
- Should `--emit=asm` generate a `.ptx` file instead of a `.s` file?
TL;DR Use Xargo to turn a crate into a PTX module (a `.s` file). Then pass that
PTX module, as a string, to the GPU and run it.
The full code is in [this repository]. This section gives an overview of how to
run Rust code on a NVIDIA GPU.
[this repository]: https://github.com/japaric/cuda
- Create a custom target. Here's the 64-bit NVPTX target (NOTE: the comments
are not valid because this is supposed to be a JSON file; remove them before
you use this file):
``` js
// nvptx64-nvidia-cuda.json
{
"arch": "nvptx64", // matches LLVM
"cpu": "sm_20", // "oldest" compute capability supported by LLVM
"data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
"llvm-target": "nvptx64-nvidia-cuda",
"max-atomic-width": 0, // LLVM errors with any other value :-(
"os": "cuda", // matches LLVM
"panic-strategy": "abort",
"target-endian": "little",
"target-pointer-width": "64",
"target-vendor": "nvidia", // matches LLVM -- not required
}
```
(There's a 32-bit target specification in the linked repository)
- Write a kernel
``` rust
extern "platform-intrinsic" {
fn nvptx_block_dim_x() -> i32;
fn nvptx_block_idx_x() -> i32;
fn nvptx_thread_idx_x() -> i32;
}
/// Copies an array of `n` floating point numbers from `src` to `dst`
pub unsafe extern "ptx-kernel" fn memcpy(dst: *mut f32,
src: *const f32,
n: usize) {
let i = (nvptx_block_dim_x() as isize)
.wrapping_mul(nvptx_block_idx_x() as isize)
.wrapping_add(nvptx_thread_idx_x() as isize);
if (i as usize) < n {
*dst.offset(i) = *src.offset(i);
}
}
```
- Emit PTX code
```
$ xargo rustc --target nvptx64-nvidia-cuda --release -- --emit=asm
Compiling core v0.0.0 (file://..)
(..)
Compiling nvptx-builtins v0.1.0 (https://github.com/japaric/nvptx-builtins)
Compiling kernel v0.1.0
$ cat target/nvptx64-nvidia-cuda/release/deps/kernel-*.s
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_20
.address_size 64
// .globl memcpy
.visible .entry memcpy(
.param .u64 memcpy_param_0,
.param .u64 memcpy_param_1,
.param .u64 memcpy_param_2
)
{
.reg .pred %p<2>;
.reg .s32 %r<5>;
.reg .s64 %rd<12>;
ld.param.u64 %rd7, [memcpy_param_2];
mov.u32 %r1, %ntid.x;
mov.u32 %r2, %ctaid.x;
mul.wide.s32 %rd8, %r2, %r1;
mov.u32 %r3, %tid.x;
cvt.s64.s32 %rd9, %r3;
add.s64 %rd10, %rd9, %rd8;
setp.ge.u64 %p1, %rd10, %rd7;
@%p1 bra LBB0_2;
ld.param.u64 %rd3, [memcpy_param_0];
ld.param.u64 %rd4, [memcpy_param_1];
cvta.to.global.u64 %rd5, %rd4;
cvta.to.global.u64 %rd6, %rd3;
shl.b64 %rd11, %rd10, 2;
add.s64 %rd1, %rd6, %rd11;
add.s64 %rd2, %rd5, %rd11;
ld.global.u32 %r4, [%rd2];
st.global.u32 [%rd1], %r4;
LBB0_2:
ret;
}
```
- Run it on the GPU
``` rust
// `kernel.ptx` is the `*.s` file we got in the previous step
const KERNEL: &'static str = include_str!("kernel.ptx");
driver::initialize()?;
let device = Device(0)?;
let ctx = device.create_context()?;
let module = ctx.load_module(KERNEL)?;
let kernel = module.function("memcpy")?;
let h_a: Vec<f32> = /* create some random data */;
let h_b = vec![0.; N];
let d_a = driver::allocate(bytes)?;
let d_b = driver::allocate(bytes)?;
// Copy from host to GPU
driver::copy(h_a, d_a)?;
// Run `memcpy` on the GPU
kernel.launch(d_b, d_a, N)?;
// Copy from GPU to host
driver::copy(d_b, h_b)?;
// Verify
assert_eq!(h_a, h_b);
// `d_a`, `d_b`, `h_a`, `h_b` are dropped/freed here
```
---
cc @alexcrichton @brson @rkruppe
> What has changed since #34195?
- `core` now can be compiled into PTX. Which makes it very easy to turn `no_std`
crates into "kernels" with the help of Xargo.
- There's now a way, the `"ptx-kernel"` ABI, to generate "global" functions. The
old PR required a manual step (it was hack) to "convert" "device" functions
into "global" functions. (Only "global" functions can be launched by the host)
- Everything is unstable. There are not "insta stable" built-in targets this
time (\*). The users have to use a custom target to experiment with this
feature. Also, PTX instrinsics, like `__syncthreads` and `blockIdx.x`, are now
implemented as `"platform-intrinsics"` so they no longer live in the `core`
crate.
(\*) I'd actually like to have in-tree targets because that makes this target
more discoverable, removes the need to lug around .json files, etc.
However, bundling a target with the compiler immediately puts it in the path
towards stabilization. Which gives us just two cycles to find and fix any
problem with the target specification. Afterwards, it becomes hard to tweak
the specification because that could be a breaking change.
A possible solution could be "unstable built-in targets". Basically, to use an
unstable target, you'll have to also pass `-Z unstable-options` to the compiler.
And unstable targets, being unstable, wouldn't be available on stable.
> Why should this be merged?
- To let people experiment with the feature out of tree. Having easy access to
the feature (in every nightly) allows this. I also think that, as it is, it
should be possible to start prototyping type-safe single source support using
build scripts, macros and/or plugins.
- It's a straightforward implementation. No different that adding support for
any other architecture.
struct field reordering and optimization
This is work in progress. The goal is to divorce the order of fields in source code from the order of fields in the LLVM IR, then optimize structs (and tuples/enum variants)by always ordering fields from least to most aligned. It does not work yet. I intend to check compiler memory usage as a benchmark, and a crater run will probably be required.
I don't know enough of the compiler to complete this work unaided. If you see places that still need updating, please mention them. The only one I know of currently is debuginfo, which I'm putting off intentionally until a bit later.
r? @eddyb
After the fix of #37453 in PR #37369, instead of pointing at only the
cast type, point at the full cast span when a cast needs a dereference:
```
error: casting `&{float}` as `f32` is invalid
--> ../../../src/test/ui/mismatched_types/cast-rfc0401.rs:81:30
|
81 | vec![0.0].iter().map(|s| s as f32).collect::<Vec<f32>>();
| ^^^^^^^^ cannot cast `&{float}` as `f32`
|
help: did you mean `*s`?
--> ../../../src/test/ui/mismatched_types/cast-rfc0401.rs:81:30
|
81 | vec![0.0].iter().map(|s| s as f32).collect::<Vec<f32>>();
| ^
```
instead of
```
error: casting `&{float}` as `f32` is invalid
--> ../../../src/test/ui/mismatched_types/cast-rfc0401.rs:81:35
|
81 | vec![0.0].iter().map(|s| s as f32).collect::<Vec<f32>>();
| - ^^^
| |
| |
| did you mean `*s`?
| cannot cast `&{float}` as `f32`
```
Point arg num mismatch errors back to their definition
This PR updates the arg num errors (like E0061) to point back at the function definition where they were defined.
Before:
```
error[E0061]: this function takes 2 parameters but 1 parameter was supplied
--> E0061.rs:18:7
|
18 | f(0);
| ^
|
= note: the following parameter types were expected:
= note: u16, &str
```
Now:
```
error[E0061]: this function takes 2 parameters but 1 parameter was supplied
--> E0061.rs:18:7
|
11 | fn f(a: u16, b: &str) {}
| ------------------------ defined here
...
18 | f(0);
| ^ expected 2 parameters
```
This is an incremental improvement. We probably want to underline only the function name and also have support for functions defined in crates outside of the current crate.
r? @nikomatsakis
Show `Trait` instead of `<Struct as Trait>` in E0323
For a given file
```
trait Foo {
fn bar(&self);
}
pub struct FooConstForMethod;
impl Foo for FooConstForMethod {
const bar: u64 = 1;
}
```
show
```
error[E0323]: item `bar` is an associated const, which doesn't match its trait `Foo`
```
instead of
```
error[E0323]: item `bar` is an associated const, which doesn't match its trait `<FooConstForMethod as Foo>`
```
Fix#37618
Show multiline spans in full if short enough
When dealing with multiline spans that span few lines, show the complete span instead of restricting to the first character of the first line.
For example, instead of:
```
% ./rustc file2.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> file2.rs:13:9
|
13 | foo(1 + bar(x,
| ^ trait `{integer}: std::ops::Add<()>` not satisfied
|
```
show
```
% ./rustc file2.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> file2.rs:13:9
|
13 | foo(1 + bar(x,
| ________^ starting here...
14 | | y),
| |_____________^ ...ending here: trait `{integer}: std::ops::Add<()>` not satisfied
|
```
The [proposal in internals](https://internals.rust-lang.org/t/proposal-for-multiline-span-comments/4242/6) outlines the reasoning behind this.
For a given file
```
trait Foo {
fn bar(&self);
}
pub struct FooConstForMethod;
impl Foo for FooConstForMethod {
const bar: u64 = 1;
}
```
show
```
error[E0323]: item `bar` is an associated const, which doesn't match its trait `Foo`
```
instead of
```
error[E0323]: item `bar` is an associated const, which doesn't match its trait `<FooConstForMethod as Foo>`
```
Note that the tests have been updated to initialize the local
variables; originally it was enough just to declare them.
Back when I started this, the `layout_cache` contained entries even
just for types that had been declared but not initialized. Apparently
things have changed in the interim so that if I want one of those
layouts to be computed, I need to actually initialize the value.
(Incidentally, this shows a weakness in the strategy of just walking
the `layout_cache`; the original strategy of using a MIR visitor would
probably have exhibited more robustness in terms of consistent output,
but it had other weaknesses so I chose not to reimplement it. At
least, not yet.)
----
Also, I have updated tests to avoid target-specific alignments.
When dealing with multiline spans that span few lines, show the complete
span instead of restricting to the first character of the first line.
For example, instead of:
```
% ./rustc foo.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> foo.rs:13:9
|
13 | foo(1 + bar(x,
| ^ trait `{integer}: std::ops::Add<()>` not satisfied
|
```
show
```
% ./rustc foo.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> foo.rs:13:9
|
13 | foo(1 + bar(x,
| ________^ starting here...
14 | | y),
| |_____________^ ...ending here: trait `{integer}: std::ops::Add<()>` not satisfied
|
```
test: Move missing-items to a ui test
This test is failing on nightly for unknown reasons, and my best guess is a
difference in grep versions which is interpreting symbols differently. For now
let's just move this to a ui test and hope it fixes nightlies.
This test is failing on nightly for unknown reasons, and my best guess is a
difference in grep versions which is interpreting symbols differently. For now
let's just move this to a ui test and hope it fixes nightlies.
On fmt string with unescaped `{` note how to escape
On cases of malformed format strings where a `{` hasn't been properly escaped, like `println!("{");`, present a NOTE explaining how to escape the `{` char.
Fix#34300.
Add foreign formatting directive detection.
This teaches `format_args!` how to interpret format printf- and
shell-style format directives. This is used in cases where there are
unused formatting arguments, and the reason for that *might* be because
the programmer is trying to use the wrong kind of formatting string.
This was prompted by an issue encountered by simulacrum on the #rust IRC
channel. In short: although `println!` told them that they weren't using
all of the conversion arguments, the problem was in using printf-syle
directives rather than ones `println!` would undertand.
Where possible, `format_args!` will tell the programmer what they should
use instead. For example, it will suggest replacing `%05d` with `{:0>5}`,
or `%2$.*3$s` with `{1:.3$}`. Even if it cannot suggest a replacement,
it will explicitly note that Rust does not support that style of directive,
and direct the user to the `std::fmt` documentation.
-----
**Example**: given:
```rust
fn main() {
println!("%.*3$s %s!\n", "Hello,", "World", 4);
println!("%1$*2$.*3$f", 123.456);
}
```
The compiler outputs the following:
```text
error: multiple unused formatting arguments
--> local/fmt.rs:2:5
|
2 | println!("%.*3$s %s!\n", "Hello,", "World", 4);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: argument never used
--> local/fmt.rs:2:30
|
2 | println!("%.*3$s %s!\n", "Hello,", "World", 4);
| ^^^^^^^^
note: argument never used
--> local/fmt.rs:2:40
|
2 | println!("%.*3$s %s!\n", "Hello,", "World", 4);
| ^^^^^^^
note: argument never used
--> local/fmt.rs:2:49
|
2 | println!("%.*3$s %s!\n", "Hello,", "World", 4);
| ^
= help: `%.*3$s` should be written as `{:.2$}`
= help: `%s` should be written as `{}`
= note: printf formatting not supported; see the documentation for `std::fmt`
= note: this error originates in a macro outside of the current crate
error: argument never used
--> local/fmt.rs:6:29
|
6 | println!("%1$*2$.*3$f", 123.456);
| ^^^^^^^
|
= help: `%1$*2$.*3$f` should be written as `{0:1$.2$}`
= note: printf formatting not supported; see the documentation for `std::fmt`
```
Don't provide hint to add lifetime on impl items
``` rust
use std::str::FromStr;
pub struct Foo<'a> {
field: &'a str,
}
impl<'a> FromStr for Foo<'a> {
type Err = ();
fn from_str(path: &str) -> Result<Self, ()> {
Ok(Foo { field: path })
}
}
```
would give the following hint:
``` nocode
help: consider using an explicit lifetime parameter as shown: fn from_str(path: &'a str) -> Result<Self, ()>
--> <anon>:9:5
|
9 | fn from_str(path: &str) -> Result<Self, ()> {
| ^
```
which is never correct, since then there will be a lifetime mismatch between the `impl` and the trait.
Remove this hint for all `impl` items.
Re: #37363.
On cases of malformed format strings where a `{` hasn't been properly
escaped, like `println!("{");`, present a note explaining how to escape
the `{` char.
Group unused import warnings per import list
Given a file
``` rust
use std::collections::{BinaryHeap, BTreeMap, BTreeSet};
fn main() {}
```
Show a single warning, instead of three for each unused import:
``` nocode
warning: unused imports, #[warn(unused_imports)] on by default
--> file2.rs:1:24
|
1 | use std::collections::{BinaryHeap, BTreeMap, BTreeSet};
| ^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^
```
Include support for lints pointing at `MultilineSpan`s, instead of just
`Span`s.
Fixes#16132.
This teaches `format_args!` how to interpret format printf- and
shell-style format directives. This is used in cases where there are
unused formatting arguments, and the reason for that *might* be because
the programmer is trying to use the wrong kind of formatting string.
This was prompted by an issue encountered by simulacrum on the #rust IRC
channel. In short: although `println!` told them that they weren't using
all of the conversion arguments, the problem was in using printf-syle
directives rather than ones `println!` would undertand.
Where possible, `format_args!` will tell the programmer what they should
use instead. For example, it will suggest replacing `%05d` with `{:0>5}`,
or `%2$.*3$s` with `{1:.3$}`. Even if it cannot suggest a replacement,
it will explicitly note that Rust does not support that style of directive,
and direct the user to the `std::fmt` documentation.
Don't provide hint to add lifetime on impl items that implement a trait.
```rust
use std::str::FromStr;
pub struct Foo<'a> {
field: &'a str,
}
impl<'a> FromStr for Foo<'a> {
type Err = ();
fn from_str(path: &str) -> Result<Self, ()> {
Ok(Foo { field: path })
}
}
```
would give the following hint:
```nocode
help: consider using an explicit lifetime parameter as shown: fn from_str(path: &'a str) -> Result<Self, ()>
--> <anon>:9:5
|
9 | fn from_str(path: &str) -> Result<Self, ()> {
| ^
```
which is never correct, since then there will be a lifetime mismatch
between the impl and the trait.
Remove this hint for impl items that implement a trait.
Point to type argument span when used as trait
Given the following code:
``` rust
struct Foo<T: Clone>(T);
use std::ops::Add;
impl<T: Clone, Add> Add for Foo<T> {
type Output = usize;
fn add(self, rhs: Self) -> Self::Output {
unimplemented!();
}
}
```
present the following output:
``` nocode
error[E0404]: `Add` is not a trait
--> file3.rs:5:21
|
5 | impl<T: Clone, Add> Add for Okok<T> {
| --- ^^^ expected trait, found type parameter
| |
| type parameter defined here
```
Fixes#35987.
Include type of missing trait methods in error
Provide either a span pointing to the original definition of missing
trait items, or a message with the inferred definitions.
Fixes#24626. Follow up to PR #36371.
If PR #37369 lands, missing trait items that present a multiline span will be able to show the entirety of the item definition on the error itself, instead of just the first line.
Given a file
```rust
use std::collections::{BinaryHeap, BTreeMap, BTreeSet};
fn main() {}
```
Show a single warning, instead of three for each unused import:
```nocode
warning: unused imports, #[warn(unused_imports)] on by default
--> foo.rs:1:24
|
1 | use std::collections::{BinaryHeap, BTreeMap, BTreeSet};
| ^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^
```
Include support for lints pointing at `MultilineSpan`s, instead of just
`Span`s.
Given the following code:
```rust
struct Foo<T: Clone>(T);
use std::ops::Add;
impl<T: Clone, Add> Add for Foo<T> {
type Output = usize;
fn add(self, rhs: Self) -> Self::Output {
unimplemented!();
}
}
```
present the following output:
```nocode
error[E0404]: `Add` is not a trait
--> file3.rs:5:21
|
5 | impl<T: Clone, Add> Add for Okok<T> {
| --- ^^^ expected trait, found type parameter
| |
| type parameter defined here
```