Fix trait upcasting to dyn type with no principal when there are projections
#126660 (which I had originally authored, lol) had a subtle bug that is the moral equivalent of #114036, which is that when upcasting from `dyn Principal<Projection = Ty> + AutoTrait` to `dyn AutoTrait`, we were dropping the trait ref for `Principal` but not its projections (if there were any).
With debug assertions enabled, this triggers the assertion I luckily added in a2a0cfe825, but even without debug assertions this is a logical bug since we had a dyn type with just a projection bound but no principal, so it caused a type mismatch.
This does not need an FCP because it should've been covered by the FCP in #126660; we just weren't testing the case of casting from a `dyn` type with projections 😸
Fixes #139418
r? `@oli-obk` (or anyone)
compiler: report error when a trait object type param references `Self`
Fixes #139082.
Emits an error when `Self` is found in the projection bounds of a trait
object. In type aliases, `Self` has no meaning, so `type A = &'static
dyn B` where `trait B = Fn() -> Self` expands to `type A = &'static
dyn Fn() -> Self`, which is illegal and causes the region solver to bail
out when it hits the uninferred `Self`.
r? `@compiler-errors` `@fee1-dead`
Tell LLVM about impossible niche tags
I was trying to find a better way of emitting discriminant calculations, but sadly had no luck.
So here's a fairly small PR with the bits that did seem worth bothering:
1. As the [`TagEncoding::Niche` docs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.TagEncoding.html#variant.Niche) describe, it's possible to end up with a dead value in the input that isn't already communicated via either the range parameter attribute or the range load metadata attribute. So this adds an `llvm.assume` in non-debug mode to tell LLVM about that. (That way it can tell that the sides of the `select` have disjoint possible values.)
2. I'd written a bunch more tests, or at least made them parameterized, in the process of trying things out, so this checks in those tests to hopefully help future people not trip on the same weird edge cases, such as when the tag type is `i8` yet there's still a variant index and discriminant of `258`, which doesn't fit in that tag type because the enum is really weird.
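To make the "dead value" in point 1 concrete, here is a small model of niche decoding written as plain Rust. It is a sketch under assumed parameters: the enum `E`, the constants, and `decode_variant` are hypothetical, and the layout values are illustrative rather than what rustc necessarily picks. The decoding roughly follows the "relative index, then `select`" shape; when the untagged variant falls inside the niche variant range, the tag value that would name it can never occur, and the `debug_assert_ne!` stands in for the `llvm.assume` this PR emits.

```rust
// Hypothetical layout for `enum E { A, B(bool), C }` (illustrative values only):
// - `B` is the untagged variant: its `bool` payload (0 or 1) is stored directly.
// - `A` and `C` are niche-encoded using byte values a `bool` never takes.
// - Variant indices 0..=2 map to tags starting at `NICHE_START`,
//   so A -> 2, C -> 4, and tag 3 (which would name `B`) never occurs.
const UNTAGGED_VARIANT: u32 = 1;
const NICHE_VARIANTS_START: u32 = 0;
const NICHE_VARIANTS_END: u32 = 2;
const NICHE_START: u8 = 2;

/// Decodes a raw tag byte into a variant index.
fn decode_variant(tag: u8) -> u32 {
    // Wrapping subtraction sends payload values (0, 1) far outside the niche range.
    let relative = tag.wrapping_sub(NICHE_START) as u32;
    let in_niche_range = relative <= NICHE_VARIANTS_END - NICHE_VARIANTS_START;
    if in_niche_range {
        let variant = relative + NICHE_VARIANTS_START;
        // The dead value: a niche tag never names the untagged variant.
        // This check plays the role of the `llvm.assume` described above.
        debug_assert_ne!(variant, UNTAGGED_VARIANT, "impossible niche tag");
        variant
    } else {
        UNTAGGED_VARIANT
    }
}

fn main() {
    for tag in [0u8, 1, 2, 4] {
        println!("tag {tag} -> variant {}", decode_variant(tag));
    }
}
```

Knowing that `relative` can never equal the untagged variant's relative index is what makes the two arms of the `select` provably disjoint.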
Make error message for missing fields with `..` and without `..` more consistent
When `..` is not present, we say "missing field `bar` in initializer", but when it is present we say "missing mandatory field `bar`". I don't see why the primary error message should change, b/c the root cause is the same.
Let's harmonize these error messages and instead use a label to explain that `..` is required b/c it's not defaulted.
r? estebank
Fixes #139445.
The additional errors aren't great, but the first one is still good and
it's the most important, and imperfect errors are better than ICEing.
This can happen when invalid syntax is passed to a declarative macro. We
shouldn't be too strict about the token stream position once the parser
has rejected the invalid syntax.
Fixes #139248.
Do not visit whole crate to compute `lints_that_dont_need_to_run`.
This allows reusing the computed lint levels instead of revisiting the whole crate.
add sret handling for scalar autodiff
r? `@oli-obk`
This fixes one of the TODOs I left in my previous batching PR.
This one handles sret for scalar autodiff. `sret` mostly shows up when we try to return a lot of scalar floats.
People often start testing autodiff with toy functions that just use a few scalars as inputs and outputs, and those were the most likely to be affected by this issue. So this fix should hopefully make learning/teaching a bit easier.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
Stop calling `source_span` query in significant drop order code
`source_span` is only meant for incremental tracking. I don't really think we need to highlight the whole drop impl span anyways; it can be quite large.
r? oli-obk
Remove support for `extern "rust-intrinsic"` blocks
Part of rust-lang/rust#132735
Looked manageable and there didn't appear to have been progress in the last two weeks,
so decided to give it a try.
Implement `super let`
Tracking issue: https://github.com/rust-lang/rust/issues/139076
This implements `super let` as proposed in #139080, based on the following two equivalence rules.
1. For all expressions `$expr` in any context, these are equivalent:
   - `& $expr`
   - `{ super let a = & $expr; a }`
2. And, additionally, these are equivalent in any context when `$expr` is a temporary (aka rvalue):
   - `& $expr`
   - `{ super let a = $expr; & a }`
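For comparison, the ordinary `let` behavior these rules build on can be observed on stable Rust with a drop-logging type. In this sketch (`Temp`, `temp`, `f`, and `run` are hypothetical helpers), a `&temp(..)` that is the direct initializer of a `let` is lifetime-extended to the end of the enclosing block, while a temporary nested inside a function call argument is dropped at the `;`.

```rust
use std::cell::RefCell;

thread_local! {
    // Event log used to observe when temporaries are dropped.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

fn log(msg: &str) {
    LOG.with(|l| l.borrow_mut().push(msg.to_string()));
}

struct Temp(&'static str);

impl Drop for Temp {
    fn drop(&mut self) {
        log(&format!("drop {}", self.0));
    }
}

fn temp(name: &'static str) -> Temp {
    Temp(name)
}

// Takes a reference but returns an unrelated value.
fn f(_: &Temp) -> u32 {
    0
}

fn run() -> Vec<String> {
    LOG.with(|l| l.borrow_mut().clear());
    {
        // `&temp(..)` as the direct initializer: extended to the end of the block.
        let _a = &temp("extended");
        // `temp(..)` nested in a call argument: not extended, dropped at this `;`.
        let _b = f(&temp("not extended"));
        log("end of block");
    }
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    for event in run() {
        println!("{event}");
    }
}
```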
So far, this experiment has a few interesting results:
## Interesting result 1
In this snippet:
```rust
super let a = f(&temp());
```
I originally expected that the temporary `temp()` would be dropped at the end of the statement (`;`), just like in a regular `let`, because `temp()` is not subject to temporary lifetime extension.
However, it turns out that that would break the fundamental equivalence rules.
For example, in
```rust
g(&f(&temp()));
```
the temporary `temp()` will be dropped at the `;`.
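This drop timing can be checked on stable Rust with a drop-logging type. In this sketch (hypothetical `temp`, `f`, `g`, and `run` helpers), `temp()` is only dropped after `g` has returned, at the end of the whole statement:

```rust
use std::cell::RefCell;

thread_local! {
    // Event log used to observe call and drop order.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

fn log(msg: &str) {
    LOG.with(|l| l.borrow_mut().push(msg.to_string()));
}

struct Temp;

impl Drop for Temp {
    fn drop(&mut self) {
        log("drop temp");
    }
}

fn temp() -> Temp {
    Temp
}

fn f(_: &Temp) -> u32 {
    log("f called");
    1
}

fn g(_: &u32) {
    log("g called");
}

fn run() -> Vec<String> {
    LOG.with(|l| l.borrow_mut().clear());
    g(&f(&temp())); // `temp()` lives until this `;`, i.e. after `g` returns
    log("after statement");
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    println!("{:?}", run());
}
```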
The first equivalence rule tells us this must be equivalent:
```rust
g({ super let a = &f(&temp()); a });
```
But that means that `temp()` must live until the last `;` (after `g()`), not just the first `;` (after `f()`).
While this was somewhat surprising to me at first, it does match the exact behavior we need for `pin!()`: The following _should work_. (See also https://github.com/rust-lang/rust/issues/138718)
```rust
g(pin!(f(&mut temp())));
```
Here, `temp()` lives until the end of the statement. This makes sense from the perspective of the user, as no other `;` or `{}` are visible. Whether `pin!()` uses a `{}` block internally or not should be irrelevant.
This means that _nothing_ in a `super let` statement will be dropped at the end of that super let statement. It does not even need its own scope.
This raises questions that are useful for later on:
- Will this make temporaries live _too long_ in cases where `super let` is used not in a hidden block in a macro, but as a visible statement in code like the following?
  ```rust
  let writer = {
      super let file = File::create(&format!("/home/{user}/test"));
      Writer::new(&file)
  };
  ```
- Is a `let` statement in a block still the right syntax for this? Considering it has _no_ scope of its own, maybe neither a block nor a statement should be involved.
This leads me to think that instead of `{ super let $pat = $init; $expr }`, we might want to consider something like `let $pat = $init in $expr` or `$expr where $pat = $init`. Although there are also issues with these, as it isn't obvious anymore if `$init` should be subject to temporary lifetime extension. (Do we want both `let _ = _ in ..` and `super let _ = _ in ..`?)
## Interesting result 2
What about `super let x;` without initializer?
```rust
let a = {
    super let x;
    x = temp();
    &x
};
```
This works fine with the implementation in this PR: `x` is extended to live as long as `a`.
While it matches my expectations, a somewhat interesting thing to realize is that these are _not_ equivalent:
- `super let x = $expr;`
- `super let x; x = $expr;`
In the first case, all temporaries in `$expr` will live at least as long as (the result of) the surrounding block.
In the second case, temporaries will be dropped at the end of the assignment statement. (Because the assignment statement itself "is not `super`".)
This difference in behavior might be confusing, but it _might_ be useful.
One might want to extend the lifetime of a variable without extending all the temporaries in the initializer expression.
On the other hand, that can also be expressed as:
- `let x = $expr; super let x = x;` (w/o temporary lifetime extension), or
- `super let x = { $expr };` (w/ temporary lifetime extension)
So, this raises these questions:
- Do we want to accept `super let x;` without initializer at all?
- Does it make sense for statements other than let statements to be "super"? An expression statement also drops temporaries at its `;`, so now that we discovered that `super let` basically disables that `;` (see interesting result 1), is there a use to having other statements without their own scope? (I don't think that's ever useful?)
## Interesting result 3
This works now:
```rust
super let Some(x) = a.get(i) else { return };
```
I didn't put in any special cases for `super let else`. This is just the behavior that 'naturally' falls out when implementing `super let` without thinking of the `let else` case.
- Should `super let else` work?
## Interesting result 4
This 'works':
```rust
fn main() {
    super let a = 123;
}
```
I didn't put in any special cases for `super let` at function scope. I had expected the code to cause an ICE or other weird failure when used at function body scope, because there's no way to let the variable live as long as the result of the function.
This raises the question:
- Does this mean that this behavior is the natural/expected behavior when `super let` is used at function scope? Or is this just a quirk and should we explicitly disallow `super let` in a function body? (Probably the latter.)
---
The questions above do not need an answer to land this PR. These questions should be considered when redesigning/rfc'ing/stabilizing the feature.
Add new `PatKind::Missing` variants
To avoid some ugly uses of `kw::Empty` when handling "missing" patterns, e.g. in bare fn tys. Helps with #137978. Details in the individual commits.
r? `@oli-obk`
fix usage of `autodiff` macro with inner functions
This PR adds additional handling into the expansion step of the `std::autodiff` macro (in `compiler/rustc_builtin_macros/src/autodiff.rs`), which allows the macro to be applied to inner functions.
```rust
#![feature(autodiff)]
use std::autodiff::autodiff;
fn main() {
    #[autodiff(d_inner, Forward, Dual, DualOnly)]
    fn inner(x: f32) -> f32 {
        x * x
    }
}
```
Previously, the compiler didn't allow this because the expansion only handled `Annotatable::Item` and `Annotatable::AssocItem`, not `Annotatable::Stmt`. This resulted in the rather generic error
```
error: autodiff must be applied to function
 --> src/main.rs:6:5
  |
6 | /     fn inner(x: f32) -> f32 {
7 | |         x * x
8 | |     }
  | |_____^
error: could not compile `enzyme-test` (bin "enzyme-test") due to 1 previous error
```
This issue was originally reported [here](https://github.com/EnzymeAD/rust/issues/184).
Quick question: would it make sense to add a ui test to ensure there is no regression on this?
This is my first contribution, so I'm extra grateful for any piece of feedback!! :D
r? `@oli-obk`
Tracking issue for autodiff: #124509
Prevent a test from seeing forbidden numbers in the rustc version
The final CHECK-NOT directive in this test was able to see past the end of the enclosing function, and find the substring `753` or `754` in the git hash in the rustc version number, causing false failures in CI whenever the git hash happens to contain those digits in sequence.
Adding an explicit check for `ret` prevents the CHECK-NOT directive from seeing past the end of the function.
---
Manually tested by adding `// CHECK-NOT: rustc` after the existing CHECK-NOT directives, and demonstrating that the new check prevents it from seeing the rustc version string.
Update the minimum external LLVM to 19
With this change, we'll have stable support for LLVM 19 and 20.
For reference, the previous increase to LLVM 18 was #130487.
cc `@rust-lang/wg-llvm`
r? nikic
Implement `SliceIndex` for `ByteStr`
Implement `Index` and `IndexMut` for `ByteStr` in terms of `SliceIndex`. Implement it for the same types that `&[u8]` supports (a superset of those supported for `&str`, which does not have `usize` and `ops::IndexRange`).
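The "implement `Index`/`IndexMut` in terms of `SliceIndex`" pattern can be sketched on stable Rust with a hypothetical unsized `Bytes` wrapper. This is a simplification, not the PR's code: here the generic impls simply delegate to `[u8]`, so range indexing yields `&[u8]`, whereas the PR implements `SliceIndex` for `ByteStr` itself. A single blanket impl is what gives the wrapper exactly the index types `[u8]` supports.

```rust
use std::ops::{Index, IndexMut};
use std::slice::SliceIndex;

/// Hypothetical stand-in for `ByteStr`: an unsized wrapper around `[u8]`.
#[repr(transparent)]
struct Bytes([u8]);

impl Bytes {
    fn new(bytes: &[u8]) -> &Bytes {
        // SAFETY: `Bytes` is `#[repr(transparent)]` over `[u8]`, so the
        // pointer cast preserves both layout and the slice-length metadata.
        unsafe { &*(bytes as *const [u8] as *const Bytes) }
    }

    fn new_mut(bytes: &mut [u8]) -> &mut Bytes {
        // SAFETY: same layout argument as `new`.
        unsafe { &mut *(bytes as *mut [u8] as *mut Bytes) }
    }
}

// One generic impl covers every index type `[u8]` accepts:
// `usize`, `Range`, `RangeFrom`, `RangeInclusive`, and so on.
impl<I: SliceIndex<[u8]>> Index<I> for Bytes {
    type Output = I::Output;

    fn index(&self, index: I) -> &I::Output {
        &self.0[index]
    }
}

impl<I: SliceIndex<[u8]>> IndexMut<I> for Bytes {
    fn index_mut(&mut self, index: I) -> &mut I::Output {
        &mut self.0[index]
    }
}

fn main() {
    let view = Bytes::new(b"hello");
    println!("{:?} {:?}", view[1], &view[1..3]);

    let mut buf = *b"hello";
    let view_mut = Bytes::new_mut(&mut buf);
    view_mut[0] = b'j';
    println!("{:?}", buf);
}
```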
At the same time, move compare and index traits to a separate file in the `bstr` module, to give it more space to grow as more functionality is added (e.g., iterators and string-like ops). Order the items in `bstr/traits.rs` similarly to `str/traits.rs`.
cc `@joshtriplett`
`ByteStr`/`ByteString` tracking issue: https://github.com/rust-lang/rust/issues/134915
Apply `Recovery::Forbidden` when reparsing pasted macro fragments.
Fixes #137874.
The changes to the output of `tests/ui/associated-consts/issue-93835.rs`
partly undo the changes seen when `NtTy` was removed in #133436, which
is good.
r? `@petrochenkov`
Autodiff batching
Enzyme supports batching, which is especially well known from ML, where it's used when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image) and a target vector. Based on how close our prediction is to the target, we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is pass in a batch of 8/16/... images and targets and compute the gradients for all of them at once, which allows better optimizations.
Enzyme supports batching in two ways. The first one (which I implemented here) just accepts a batch size,
and then each Dual/Duplicated argument has not one but N shadow arguments. So instead of
```rs
for i in 0..100 {
    df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in (0..100).step_by(4) {
    df(x[i+0], x[i+1], x[i+2], x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
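To make "N shadow arguments" concrete in forward mode: with a batch width of 4, a single call evaluates the primal once and propagates four shadow (tangent) values per Dual argument, returning four directional derivatives. The sketch below is hand-written under that assumption; it is not what Enzyme generates, and `f` and `df_batched` are hypothetical.

```rust
/// Primal function: f(x, y) = x * y + x.
fn f(x: f64, y: f64) -> f64 {
    x * y + x
}

/// Hand-written forward-mode derivative of `f` with a batch width of 4:
/// one primal input pair, but four shadow values per argument.
/// Lane k computes the directional derivative along (dx[k], dy[k]).
fn df_batched(x: f64, y: f64, dx: [f64; 4], dy: [f64; 4]) -> (f64, [f64; 4]) {
    let primal = f(x, y);
    let mut tangents = [0.0; 4];
    for k in 0..4 {
        // d(x*y + x) = dx*y + x*dy + dx
        tangents[k] = dx[k] * y + x * dy[k] + dx[k];
    }
    (primal, tangents)
}

fn main() {
    // Seeding lanes with unit directions yields both partials at (2, 3)
    // in one call: lane 0 = df/dx = y + 1, lane 1 = df/dy = x.
    let (value, d) = df_batched(2.0, 3.0, [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]);
    println!("f = {value}, df/dx = {}, df/dy = {}", d[0], d[1]);
}
```

The primal work (`f(x, y)`) is shared across all lanes, which is one source of the savings batching can offer.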
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next few days.
I will also add more tests for both modes.
For anyone preferring a more interactive explanation, here's a video of Tim's LLVM dev talk, where he presents his work: https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Expose algebraic floating point intrinsics
# Problem
A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.
See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on an i7-10875H.
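The reason the compiler may not reorder the `sum +=` loop on its own is that IEEE 754 addition is not associative: regrouping changes rounding, so a vectorized (reassociated) reduction can return a different result than the sequential one. A minimal demonstration, with hypothetical helper names:

```rust
/// Sums three values left to right, as written in source order.
fn sum_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Sums the same values with the additions regrouped.
fn sum_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

fn main() {
    let left = sum_left(0.1, 0.2, 0.3);
    let right = sum_right(0.1, 0.2, 0.3);
    // Prints "0.6000000000000001 vs 0.6": each grouping rounds differently.
    println!("{left} vs {right}");
}
```

Opting individual operations into reassociation is what allows the compiler to pick the faster grouping anyway.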
### C++: 10us ✅
With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>
### Nightly Rust: 10us ✅
With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>
### Stable Rust: 84us ❌
With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>
# Proposed Change
Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
# Alternatives Considered
https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.
In the meantime, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.
# References
* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
Use the span of the whole bound when the diagnostic talks about a bound
While it makes sense that the host predicate only points to the `~const` part (whether the actual trait bound is satisfied is checked separately), the user-facing diagnostic talks about the entire trait bound, so it makes more sense to highlight the entire bound.
r? `@compiler-errors` or `@fee1-dead`
Fix `Debug` impl for `LateParamRegionKind`.
It uses `Br` prefixes which are inappropriate and appear to have been incorrectly copy/pasted from the `Debug` impl for `BoundRegionKind`.
r? `@BoxyUwU`
Fix 2024 edition doctest panic output
Fixes#137970.
The problem was that the output was actually displayed by rustc itself because we're exiting with `Result<(), String>`, and the display is really not great. So instead, we get the output, we print it and then we return an `ExitCode`.
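For context, returning `Err(String)` from `main` makes the standard library print `Error: ` followed by the `Debug` form of the string, so newlines show up as escaped `\n` inside quotes. A minimal sketch of the approach described above (print the captured output, then return an `ExitCode`), with a hypothetical `run_doctest` standing in for the real doctest runner and a hypothetical `result_style_report` reproducing the old formatting:

```rust
use std::process::ExitCode;

/// Hypothetical stand-in for running a doctest and capturing its output.
fn run_doctest() -> Result<(), String> {
    Err("thread 'main' panicked at src/main.rs:3:5:\nassertion failed".to_string())
}

/// What `fn main() -> Result<(), String>` ends up printing on error:
/// the error is reported with `Debug` formatting.
fn result_style_report(err: &str) -> String {
    format!("Error: {err:?}")
}

fn main() -> ExitCode {
    match run_doctest() {
        Ok(()) => ExitCode::SUCCESS,
        Err(output) => {
            // Print the captured output verbatim instead of its `Debug` form.
            eprintln!("{output}");
            ExitCode::FAILURE
        }
    }
}
```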
r? `@aDotInTheVoid`