Properly ban the negation of unsigned integers in type-checking.
Lint-time banning of unsigned negation appears to be vestigial from a time it was feature-gated.
But now it always errors and we do have the ability to deref the checking of e.g. `-0`, through the trait obligation fulfillment context, which will only succeed/error when the `0` gets inferred to a specific type.
The two removed tests are the main reason for finally cleaning this up, they need changing all the time when refactoring the HIR-based `rustc_const_eval` and/or `rustc_passes::consts`, as warnings pile up.
Some i128 tests
* Add some FFI tests for i128 on architectures where we have sort of working "C" FFI support. On all other architectures we ignore the test.
* enhance the u128 overflow tests
Additional cleanup to rustc_trans
Removes `BlockAndBuilder`, `FunctionContext`, and `MaybeSizedValue`.
`LvalueRef` is used instead of `MaybeSizedValue`, which has the added benefit of making functions operating on `Lvalue`s be able to take just that (since it encodes the type with an `LvalueTy`, which can carry discriminant information) instead of a `MaybeSizedValue` and a discriminant.
r? @eddyb
rustc: Stabilize the `proc_macro` feature
This commit stabilizes the `proc_macro` and `proc_macro_lib` features in the
compiler to stabilize the "Macros 1.1" feature of the language. Many more
details can be found on the tracking issue, #35900.
Closes#35900
Fix transmute::<T, U> where T requires a bigger alignment than U
For transmute::<T, U> we simply pointercast the destination from a U
pointer to a T pointer, without providing any alignment information,
thus LLVM assumes that the destination is aligned to hold a value of
type T, which is not necessarily true. This can lead to LLVM emitting
machine instructions that assume said alignment, and thus cause aborts.
To fix this, we need to provide the actual alignment to store_operand()
and in turn to store() so they can set the proper alignment information
on the stores and LLVM can emit the proper machine instructions.
Fixes#32947
prefer hyphens in test files named after issue numbers
We have a lot of tests with filenames honoring particular issues by
number. Typically, these are called issue-${issue_no}.rs (note the
hyphen):
```
$ find . -regextype posix-egrep -regex '.*/issue-[0-9]*.rs' | wc
1289 1289 35935
```
We also had a much smaller number of files that are like this, but don't
have a hyphen in between the substring `issue` and the number:
```
$ find . -regextype posix-egrep -regex '.*/issue[0-9]*.rs'
./debuginfo/issue14411.rs
./debuginfo/issue12886.rs
./debuginfo/issue13213.rs
./debuginfo/issue22656.rs
./debuginfo/issue7712.rs
./compile-fail/issue32829.rs
./run-pass/issue24353.rs
./run-pass/issue34796.rs
./run-pass/issue18173.rs
./run-pass/issue22346.rs
./run-pass/auxiliary/issue13507.rs
./run-pass/issue26127.rs
./run-pass/issue22008.rs
./run-pass/issue34569.rs
./run-pass/issue29927.rs
./run-pass/issue36260.rs
```
Some would argue that the inconsistency is æsthetically displeasing,
hence this trivial patch. (Note that run-pass/auxiliary/issue13507.rs
has an excuse; it's `use`d in run-pass/issue-13507-2.rs; the matter of
there being two different compile-fail tests with different name
conventions for issue no. 32829 is also neglected here for the sake of
keeping this trivial cleanup patch as trivial as possible for ease of
review.)
Fix debuginfo for unsized struct members
The member was given the size of a fat pointer, which caused
llvm to emit DWARF attributes for a 128-bit bitfield.
This commit stabilizes the `proc_macro` and `proc_macro_lib` features in the
compiler to stabilize the "Macros 1.1" feature of the language. Many more
details can be found on the tracking issue, #35900.
Closes#35900
Fix pre-cfg_attr notation in comment
Commit aa3b1261b1 has changed notation
in the test from `#[ignore(cfg(ignorecfg))]` to `#[cfg_attr(ignorecfg, ignore)]`,
but missed to change the comment in the accompanying Makefile.
Commit aa3b1261b1 has changed notation
in the test from `#[ignore(cfg(ignorecfg))]` to `#[cfg_attr(ignorecfg, ignore)]`,
but missed to change the comment in the accompanying Makefile.
i128 and u128 support
Brings i128 and u128 support to nightly rust, behind a feature flag. The goal of this PR is to do the bulk of the work for 128 bit integer support. Smaller but just as tricky features needed for stabilisation like 128 bit enum discriminants are left for future PRs.
Rebased version of #37900, which in turn was a rebase + improvement of #35954 . Sadly I couldn't reopen#37900 due to github. There goes my premium position in the homu queue...
[plugin-breaking-change]
cc #35118 (tracking issue)
For transmute::<T, U> we simply pointercast the destination from a U
pointer to a T pointer, without providing any alignment information,
thus LLVM assumes that the destination is aligned to hold a value of
type T, which is not necessarily true. This can lead to LLVM emitting
machine instructions that assume said alignment, and thus cause aborts.
To fix this, we need to provide the actual alignment to store_operand()
and in turn to store() so they can set the proper alignment information
on the stores and LLVM can emit the proper machine instructions.
Fixes#32947
PTX support, take 2
- You can generate PTX using `--emit=asm` and the right (custom) target. Which
then you can run on a NVIDIA GPU.
- You can compile `core` to PTX. [Xargo] also works and it can compile some
other crates like `collections` (but I doubt all of those make sense on a GPU)
[Xargo]: https://github.com/japaric/xargo
- You can create "global" functions, which can be "called" by the host, using
the `"ptx-kernel"` ABI, e.g. `extern "ptx-kernel" fn kernel() { .. }`. Every
other function is a "device" function and can only be called by the GPU.
- Intrinsics like `__syncthreads()` and `blockIdx.x` are available as
`"platform-intrinsics"`. These intrinsics are *not* in the `core` crate but
any Rust user can create "bindings" to them using an `extern
"platform-intrinsics"` block. See example at the end.
- Trying to emit PTX with `-g` (debuginfo); you get an LLVM error. But I don't
think PTX can contain debuginfo anyway so `-g` should be ignored and a warning
should be printed ("`-g` doesn't work with this target" or something).
- "Single source" support. You *can't* write a single source file that contains
both host and device code. I think that should be possible to implement that
outside the compiler using compiler plugins / build scripts.
- The equivalent to CUDA `__shared__` which it's used to declare memory that's
shared between the threads of the same block. This could be implemented using
attributes: `#[shared] static mut SCRATCH_MEMORY: [f32; 64]` but hasn't been
implemented yet.
- Built-in targets. This PR doesn't add targets to the compiler just yet but one
can create custom targets to be able to emit PTX code (see the example at the
end). The idea is to have people experiment with this feature before
committing to it (built-in targets are "insta-stable")
- All functions must be "inlined". IOW, the `.rlib` must always contain the LLVM
bitcode of all the functions of the crate it was produced from. Otherwise, you
end with "undefined references" in the final PTX code but you won't get *any*
linker error because no linker is involved. IOW, you'll hit a runtime error
when loading the PTX into the GPU. The workaround is to use `#[inline]` on
non-generic functions and to never use `#[inline(never)]` but this may not
always be possible because e.g. you could be relying on third party code.
- Should `--emit=asm` generate a `.ptx` file instead of a `.s` file?
TL;DR Use Xargo to turn a crate into a PTX module (a `.s` file). Then pass that
PTX module, as a string, to the GPU and run it.
The full code is in [this repository]. This section gives an overview of how to
run Rust code on a NVIDIA GPU.
[this repository]: https://github.com/japaric/cuda
- Create a custom target. Here's the 64-bit NVPTX target (NOTE: the comments
are not valid because this is supposed to be a JSON file; remove them before
you use this file):
``` js
// nvptx64-nvidia-cuda.json
{
"arch": "nvptx64", // matches LLVM
"cpu": "sm_20", // "oldest" compute capability supported by LLVM
"data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
"llvm-target": "nvptx64-nvidia-cuda",
"max-atomic-width": 0, // LLVM errors with any other value :-(
"os": "cuda", // matches LLVM
"panic-strategy": "abort",
"target-endian": "little",
"target-pointer-width": "64",
"target-vendor": "nvidia", // matches LLVM -- not required
}
```
(There's a 32-bit target specification in the linked repository)
- Write a kernel
``` rust
extern "platform-intrinsic" {
fn nvptx_block_dim_x() -> i32;
fn nvptx_block_idx_x() -> i32;
fn nvptx_thread_idx_x() -> i32;
}
/// Copies an array of `n` floating point numbers from `src` to `dst`
pub unsafe extern "ptx-kernel" fn memcpy(dst: *mut f32,
src: *const f32,
n: usize) {
let i = (nvptx_block_dim_x() as isize)
.wrapping_mul(nvptx_block_idx_x() as isize)
.wrapping_add(nvptx_thread_idx_x() as isize);
if (i as usize) < n {
*dst.offset(i) = *src.offset(i);
}
}
```
- Emit PTX code
```
$ xargo rustc --target nvptx64-nvidia-cuda --release -- --emit=asm
Compiling core v0.0.0 (file://..)
(..)
Compiling nvptx-builtins v0.1.0 (https://github.com/japaric/nvptx-builtins)
Compiling kernel v0.1.0
$ cat target/nvptx64-nvidia-cuda/release/deps/kernel-*.s
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_20
.address_size 64
// .globl memcpy
.visible .entry memcpy(
.param .u64 memcpy_param_0,
.param .u64 memcpy_param_1,
.param .u64 memcpy_param_2
)
{
.reg .pred %p<2>;
.reg .s32 %r<5>;
.reg .s64 %rd<12>;
ld.param.u64 %rd7, [memcpy_param_2];
mov.u32 %r1, %ntid.x;
mov.u32 %r2, %ctaid.x;
mul.wide.s32 %rd8, %r2, %r1;
mov.u32 %r3, %tid.x;
cvt.s64.s32 %rd9, %r3;
add.s64 %rd10, %rd9, %rd8;
setp.ge.u64 %p1, %rd10, %rd7;
@%p1 bra LBB0_2;
ld.param.u64 %rd3, [memcpy_param_0];
ld.param.u64 %rd4, [memcpy_param_1];
cvta.to.global.u64 %rd5, %rd4;
cvta.to.global.u64 %rd6, %rd3;
shl.b64 %rd11, %rd10, 2;
add.s64 %rd1, %rd6, %rd11;
add.s64 %rd2, %rd5, %rd11;
ld.global.u32 %r4, [%rd2];
st.global.u32 [%rd1], %r4;
LBB0_2:
ret;
}
```
- Run it on the GPU
``` rust
// `kernel.ptx` is the `*.s` file we got in the previous step
const KERNEL: &'static str = include_str!("kernel.ptx");
driver::initialize()?;
let device = Device(0)?;
let ctx = device.create_context()?;
let module = ctx.load_module(KERNEL)?;
let kernel = module.function("memcpy")?;
let h_a: Vec<f32> = /* create some random data */;
let h_b = vec![0.; N];
let d_a = driver::allocate(bytes)?;
let d_b = driver::allocate(bytes)?;
// Copy from host to GPU
driver::copy(h_a, d_a)?;
// Run `memcpy` on the GPU
kernel.launch(d_b, d_a, N)?;
// Copy from GPU to host
driver::copy(d_b, h_b)?;
// Verify
assert_eq!(h_a, h_b);
// `d_a`, `d_b`, `h_a`, `h_b` are dropped/freed here
```
---
cc @alexcrichton @brson @rkruppe
> What has changed since #34195?
- `core` now can be compiled into PTX. Which makes it very easy to turn `no_std`
crates into "kernels" with the help of Xargo.
- There's now a way, the `"ptx-kernel"` ABI, to generate "global" functions. The
old PR required a manual step (it was hack) to "convert" "device" functions
into "global" functions. (Only "global" functions can be launched by the host)
- Everything is unstable. There are not "insta stable" built-in targets this
time (\*). The users have to use a custom target to experiment with this
feature. Also, PTX instrinsics, like `__syncthreads` and `blockIdx.x`, are now
implemented as `"platform-intrinsics"` so they no longer live in the `core`
crate.
(\*) I'd actually like to have in-tree targets because that makes this target
more discoverable, removes the need to lug around .json files, etc.
However, bundling a target with the compiler immediately puts it in the path
towards stabilization. Which gives us just two cycles to find and fix any
problem with the target specification. Afterwards, it becomes hard to tweak
the specification because that could be a breaking change.
A possible solution could be "unstable built-in targets". Basically, to use an
unstable target, you'll have to also pass `-Z unstable-options` to the compiler.
And unstable targets, being unstable, wouldn't be available on stable.
> Why should this be merged?
- To let people experiment with the feature out of tree. Having easy access to
the feature (in every nightly) allows this. I also think that, as it is, it
should be possible to start prototyping type-safe single source support using
build scripts, macros and/or plugins.
- It's a straightforward implementation. No different that adding support for
any other architecture.
travis: Update the OSX image we run tests in
The current image is `xcode7.3`, Travis's current default. Unfortunately this
has a version of LLDB which doesn't support debuginfo-lldb tests (see #32520),
so we're not running LLDB tests on Travis yet.
This switches us to the newest image from Travis, `xcode8.2`, which should have
a newer version of LLDB we can run tests against.