This is the first part of #39018. One of the common things for new users
coming from more dynamic languages like JavaScript, Python or Ruby is to
use `+` to concatenate strings. However, this doesn't work that way in
Rust unless the first type is a `String`. This commit adds a check for
this use case and outputs a new error as well as a suggestion to guide
the user towards the desired behavior. It also adds a new test case to
test the output of the error.
rustc: Remove all "consider using an explicit lifetime parameter" suggestions
These give so many incorrect suggestions that having them is
detrimental to the user experience. The compiler should not be
suggesting changes to the code that are wrong - it is infuriating: not
only is the compiler telling you that _you don't understand_ borrowing,
_the compiler itself_ appears to not understand borrowing. It does not
inspire confidence.
r? @nikomatsakis
These give so many incorrect suggestions that having them is
detrimental to the user experience. The compiler should not be
suggesting changes to the code that are wrong - it is infuriating: not
only is the compiler telling you that _you don't understand_ borrowing,
_the compiler itself_ appears to not understand borrowing. It does not
inspire confidence.
Point to immutable borrow arguments and fields when trying to use them as
mutable borrows. Add label to primary span on "cannot borrow as mutable"
errors.
Present the following output when trying to access an immutable borrow's
field as mutable:
```
error[E0389]: cannot borrow data mutably in a `&` reference
--> $DIR/issue-38147-1.rs:27:9
|
26 | fn f(&self) {
| ----- use `&mut self` here to make mutable
27 | f.s.push('x');
| ^^^ assignment into an immutable reference
```
And the following when trying to access an immutable struct field as mutable:
```
error: cannot borrow immutable borrowed content `*self.s` as mutable
--> $DIR/issue-38147-3.rs:17:9
|
12 | s: &'a String
| ------------- use `&'a mut String` here to make mutable
...|
16 | fn f(&self) {
| ----- use `&mut self` here to make mutable
17 | self.s.push('x');
| ^^^^^^ cannot borrow as mutable
```
Use multiline Diagnostic for candidate in other module
```
error[E0574]: expected struct, variant or union type, found enum `Result`
--> $DIR/issue-16058.rs:19:9
|
19 | Result {
| ^^^^^^ not a struct, variant or union type
|
= help: possible better candidates are found in other modules, you can import them into scope:
`use std::fmt::Result;`
`use std::io::Result;`
`use std:🧵:Result;`
error: aborting due to previous error
```
E0034: provide disambiguated syntax for candidates
For a given file
```rust
trait A { fn foo(&self) {} }
trait B : A { fn foo(&self) {} }
fn bar<T: B>(a: &T) {
a.foo()
}
```
provide the following output
```
error[E0034]: multiple applicable items in scope
--> file.rs:6:5
|
6 | a.foo(1)
| ^^^ multiple `foo` found
|
note: candidate #1 is defined in the trait `A`
--> file.rs:2:11
|
2 | trait A { fn foo(&self, a: usize) {} }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to use it here write `A::foo(&a, 1)` instead
--> file.rs:6:5
|
6 | a.foo(1)
| ^^^
note: candidate #2 is defined in the trait `B`
--> file.rs:3:15
|
3 | trait B : A { fn foo(&self, a: usize) {} }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to use it here write `B::foo(&a, 1)` instead
--> file.rs:6:5
|
6 | a.foo(1)
| ^^^
```
Fix#37767.
Fix lint attributes on non-item nodes.
Currently, late lint checking uses two HIR visitors: LateContext and
IdVisitor. IdVisitor only overrides visit_id, and for each node searches
for builtin lints previously added to the session; LateContext overrides
a number of methods, and runs late lints. When LateContext encounters an
item, it first has IdVisitor walk everything in it except nested items
(OnlyBodies), then recurses into it itself - i.e. there are two separate
walks.
Aside from apparently being unnecessary, this separation prevents lint
attributes (allow/deny/warn) on non-item HIR nodes from working
properly. Test case:
```rust
// generates warning without this change
fn main() { #[allow(unreachable_code)] loop { break; break; } }
```
LateContext contains logic to merge attributes seen into the current lint
settings while walking (with_lint_attrs), but IdVisitor does not. So
such attributes will affect late lints (because they are called from
LateContext), and if the node contains any items within it, they will
affect builtin lints within those items (because that IdVisitor is run
while LateContext is within the attributed node), but otherwise the
attributes will be ignored for builtin lints.
This change simply removes IdVisitor and moves its visit_id into
LateContext itself. Hopefully this doesn't break anything...
Also added walk calls to visit_lifetime and visit_lifetime_def
respectively, so visit_lifetime_def will recurse into the lifetime and
visit_lifetime will recurse into the name. In principle this could
confuse lint plugins. This is "necessary" because walk_lifetime calls
visit_id on the lifetime; of course, an alternative would be directly
calling visit_id (which would require manually iterating over the
lifetimes in visit_lifetime_def), but that seems less clean.
Use multiline Diagnostic for "relevant impl" list
Provide the following output:
```
error[E0277]: the trait bound `Bar: Foo<usize>` is not satisfied
--> $DIR/issue-21659-show-relevant-trait-impls-2.rs:38:8
|
38 | f1.foo(1usize);
| ^^^ the trait `Foo<usize>` is not implemented for `Bar`
|
= help: the following implementations were found:
<Bar as Foo<i8>>
<Bar as Foo<i16>>
<Bar as Foo<i32>>
<Bar as Foo<u8>>
and 2 others
error: aborting due to previous error
```
instead of
```
error[E0277]: the trait bound `Bar: Foo<usize>` is not satisfied
--> $DIR/issue-21659-show-relevant-trait-impls-2.rs:38:8
|
38 | f1.foo(1usize);
| ^^^ the trait `Foo<usize>` is not implemented for `Bar`
|
= help: the following implementations were found:
= help: <Bar as Foo<i8>>
= help: <Bar as Foo<i16>>
= help: <Bar as Foo<i32>>
= help: <Bar as Foo<u8>>
= help: and 2 others
error: aborting due to previous error
```
Provide the following output:
```
error[E0277]: the trait bound `Bar: Foo<usize>` is not satisfied
--> $DIR/issue-21659-show-relevant-trait-impls-2.rs:38:8
|
38 | f1.foo(1usize);
| ^^^ the trait `Foo<usize>` is not implemented for `Bar`
|
= help: the following implementations were found:
<Bar as Foo<i8>>
<Bar as Foo<i16>>
<Bar as Foo<i32>>
<Bar as Foo<u8>>
and 2 others
error: aborting due to previous error
```
instead of
```
error[E0277]: the trait bound `Bar: Foo<usize>` is not satisfied
--> $DIR/issue-21659-show-relevant-trait-impls-2.rs:38:8
|
38 | f1.foo(1usize);
| ^^^ the trait `Foo<usize>` is not implemented for `Bar`
|
= help: the following implementations were found:
= help: <Bar as Foo<i8>>
= help: <Bar as Foo<i16>>
= help: <Bar as Foo<i32>>
= help: <Bar as Foo<u8>>
= help: and 2 others
error: aborting due to previous error
```
resolve: Do not use "resolve"/"resolution" in error messages
Use less jargon-y wording instead.
`cannot find <struct> <S> in <this scope>` and `cannot find <struct> <S> in <module a::b>` are used for base messages (this also harmonizes nicely with "you can import it into scope" suggestions) and `not found in <this scope>` and `not found in <a::b>` are used for short labels in fall-back case.
I tweaked some other diagnostics to avoid using "resolve" (see, e.g., `librustc_resolve/macros.rs`), but haven't touched messages for imports.
Closes https://github.com/rust-lang/rust/issues/38750
r? @nrc
For a given file
```rust
trait A { fn foo(&self) {} }
trait B : A { fn foo(&self) {} }
fn bar<T: B>(a: &T) {
a.foo()
}
```
provide the following output
```
error[E0034]: multiple applicable items in scope
--> file.rs:6:5
|
6 | a.foo(1)
| ^^^ multiple `foo` found
|
note: candidate #1 is defined in the trait `A`
--> file.rs:2:11
|
2 | trait A { fn foo(&self, a: usize) {} }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to use it here write `A::foo(&a, 1)` instead
--> file.rs:6:5
|
6 | a.foo(1)
| ^^^
note: candidate #2 is defined in the trait `B`
--> file.rs:3:15
|
3 | trait B : A { fn foo(&self, a: usize) {} }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to use it here write `B::foo(&a, 1)` instead
--> file.rs:6:5
|
6 | a.foo(1)
| ^^^
```
Teach diagnostics to correct margin of multiline messages
Make the suggestion list have a correct padding:
```
error[E0308]: mismatched types
--> file.rs:3:20
|
3 | let x: usize = "";
| ^^ expected usize, found reference
|
= note: expected type `usize`
= note: found type `&'static str`
= help: here are some functions which might fulfill your needs:
- .len()
- .foo()
- .bar()
```
Make the suggestion list have a correct padding:
```
error[E0308]: mismatched types
--> file.rs:3:20
|
3 | let x: usize = "";
| ^^ expected usize, found reference
|
= note: expected type `usize`
= note: found type `&'static str`
= help: here are some functions which might fulfill your needs:
- .len()
- .foo()
- .bar()
```
PTX support, take 2
- You can generate PTX using `--emit=asm` and the right (custom) target. Which
then you can run on a NVIDIA GPU.
- You can compile `core` to PTX. [Xargo] also works and it can compile some
other crates like `collections` (but I doubt all of those make sense on a GPU)
[Xargo]: https://github.com/japaric/xargo
- You can create "global" functions, which can be "called" by the host, using
the `"ptx-kernel"` ABI, e.g. `extern "ptx-kernel" fn kernel() { .. }`. Every
other function is a "device" function and can only be called by the GPU.
- Intrinsics like `__syncthreads()` and `blockIdx.x` are available as
`"platform-intrinsics"`. These intrinsics are *not* in the `core` crate but
any Rust user can create "bindings" to them using an `extern
"platform-intrinsics"` block. See example at the end.
- Trying to emit PTX with `-g` (debuginfo); you get an LLVM error. But I don't
think PTX can contain debuginfo anyway so `-g` should be ignored and a warning
should be printed ("`-g` doesn't work with this target" or something).
- "Single source" support. You *can't* write a single source file that contains
both host and device code. I think that should be possible to implement that
outside the compiler using compiler plugins / build scripts.
- The equivalent to CUDA `__shared__` which it's used to declare memory that's
shared between the threads of the same block. This could be implemented using
attributes: `#[shared] static mut SCRATCH_MEMORY: [f32; 64]` but hasn't been
implemented yet.
- Built-in targets. This PR doesn't add targets to the compiler just yet but one
can create custom targets to be able to emit PTX code (see the example at the
end). The idea is to have people experiment with this feature before
committing to it (built-in targets are "insta-stable")
- All functions must be "inlined". IOW, the `.rlib` must always contain the LLVM
bitcode of all the functions of the crate it was produced from. Otherwise, you
end with "undefined references" in the final PTX code but you won't get *any*
linker error because no linker is involved. IOW, you'll hit a runtime error
when loading the PTX into the GPU. The workaround is to use `#[inline]` on
non-generic functions and to never use `#[inline(never)]` but this may not
always be possible because e.g. you could be relying on third party code.
- Should `--emit=asm` generate a `.ptx` file instead of a `.s` file?
TL;DR Use Xargo to turn a crate into a PTX module (a `.s` file). Then pass that
PTX module, as a string, to the GPU and run it.
The full code is in [this repository]. This section gives an overview of how to
run Rust code on a NVIDIA GPU.
[this repository]: https://github.com/japaric/cuda
- Create a custom target. Here's the 64-bit NVPTX target (NOTE: the comments
are not valid because this is supposed to be a JSON file; remove them before
you use this file):
``` js
// nvptx64-nvidia-cuda.json
{
"arch": "nvptx64", // matches LLVM
"cpu": "sm_20", // "oldest" compute capability supported by LLVM
"data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
"llvm-target": "nvptx64-nvidia-cuda",
"max-atomic-width": 0, // LLVM errors with any other value :-(
"os": "cuda", // matches LLVM
"panic-strategy": "abort",
"target-endian": "little",
"target-pointer-width": "64",
"target-vendor": "nvidia", // matches LLVM -- not required
}
```
(There's a 32-bit target specification in the linked repository)
- Write a kernel
``` rust
extern "platform-intrinsic" {
fn nvptx_block_dim_x() -> i32;
fn nvptx_block_idx_x() -> i32;
fn nvptx_thread_idx_x() -> i32;
}
/// Copies an array of `n` floating point numbers from `src` to `dst`
pub unsafe extern "ptx-kernel" fn memcpy(dst: *mut f32,
src: *const f32,
n: usize) {
let i = (nvptx_block_dim_x() as isize)
.wrapping_mul(nvptx_block_idx_x() as isize)
.wrapping_add(nvptx_thread_idx_x() as isize);
if (i as usize) < n {
*dst.offset(i) = *src.offset(i);
}
}
```
- Emit PTX code
```
$ xargo rustc --target nvptx64-nvidia-cuda --release -- --emit=asm
Compiling core v0.0.0 (file://..)
(..)
Compiling nvptx-builtins v0.1.0 (https://github.com/japaric/nvptx-builtins)
Compiling kernel v0.1.0
$ cat target/nvptx64-nvidia-cuda/release/deps/kernel-*.s
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_20
.address_size 64
// .globl memcpy
.visible .entry memcpy(
.param .u64 memcpy_param_0,
.param .u64 memcpy_param_1,
.param .u64 memcpy_param_2
)
{
.reg .pred %p<2>;
.reg .s32 %r<5>;
.reg .s64 %rd<12>;
ld.param.u64 %rd7, [memcpy_param_2];
mov.u32 %r1, %ntid.x;
mov.u32 %r2, %ctaid.x;
mul.wide.s32 %rd8, %r2, %r1;
mov.u32 %r3, %tid.x;
cvt.s64.s32 %rd9, %r3;
add.s64 %rd10, %rd9, %rd8;
setp.ge.u64 %p1, %rd10, %rd7;
@%p1 bra LBB0_2;
ld.param.u64 %rd3, [memcpy_param_0];
ld.param.u64 %rd4, [memcpy_param_1];
cvta.to.global.u64 %rd5, %rd4;
cvta.to.global.u64 %rd6, %rd3;
shl.b64 %rd11, %rd10, 2;
add.s64 %rd1, %rd6, %rd11;
add.s64 %rd2, %rd5, %rd11;
ld.global.u32 %r4, [%rd2];
st.global.u32 [%rd1], %r4;
LBB0_2:
ret;
}
```
- Run it on the GPU
``` rust
// `kernel.ptx` is the `*.s` file we got in the previous step
const KERNEL: &'static str = include_str!("kernel.ptx");
driver::initialize()?;
let device = Device(0)?;
let ctx = device.create_context()?;
let module = ctx.load_module(KERNEL)?;
let kernel = module.function("memcpy")?;
let h_a: Vec<f32> = /* create some random data */;
let h_b = vec![0.; N];
let d_a = driver::allocate(bytes)?;
let d_b = driver::allocate(bytes)?;
// Copy from host to GPU
driver::copy(h_a, d_a)?;
// Run `memcpy` on the GPU
kernel.launch(d_b, d_a, N)?;
// Copy from GPU to host
driver::copy(d_b, h_b)?;
// Verify
assert_eq!(h_a, h_b);
// `d_a`, `d_b`, `h_a`, `h_b` are dropped/freed here
```
---
cc @alexcrichton @brson @rkruppe
> What has changed since #34195?
- `core` now can be compiled into PTX. Which makes it very easy to turn `no_std`
crates into "kernels" with the help of Xargo.
- There's now a way, the `"ptx-kernel"` ABI, to generate "global" functions. The
old PR required a manual step (it was hack) to "convert" "device" functions
into "global" functions. (Only "global" functions can be launched by the host)
- Everything is unstable. There are not "insta stable" built-in targets this
time (\*). The users have to use a custom target to experiment with this
feature. Also, PTX instrinsics, like `__syncthreads` and `blockIdx.x`, are now
implemented as `"platform-intrinsics"` so they no longer live in the `core`
crate.
(\*) I'd actually like to have in-tree targets because that makes this target
more discoverable, removes the need to lug around .json files, etc.
However, bundling a target with the compiler immediately puts it in the path
towards stabilization. Which gives us just two cycles to find and fix any
problem with the target specification. Afterwards, it becomes hard to tweak
the specification because that could be a breaking change.
A possible solution could be "unstable built-in targets". Basically, to use an
unstable target, you'll have to also pass `-Z unstable-options` to the compiler.
And unstable targets, being unstable, wouldn't be available on stable.
> Why should this be merged?
- To let people experiment with the feature out of tree. Having easy access to
the feature (in every nightly) allows this. I also think that, as it is, it
should be possible to start prototyping type-safe single source support using
build scripts, macros and/or plugins.
- It's a straightforward implementation. No different that adding support for
any other architecture.
struct field reordering and optimization
This is work in progress. The goal is to divorce the order of fields in source code from the order of fields in the LLVM IR, then optimize structs (and tuples/enum variants)by always ordering fields from least to most aligned. It does not work yet. I intend to check compiler memory usage as a benchmark, and a crater run will probably be required.
I don't know enough of the compiler to complete this work unaided. If you see places that still need updating, please mention them. The only one I know of currently is debuginfo, which I'm putting off intentionally until a bit later.
r? @eddyb