Do not visit whole crate to compute `lints_that_dont_need_to_run`.
This allows to reuse the computed lint levels instead of re-visiting the whole crate.
add sret handling for scalar autodiff
r? `@oli-obk`
Fixing one of the todo's which I left in my previous batching PR.
This one handles sret for scalar autodiff. `sret` mostly shows up when we try to return a lot of scalar floats.
People often start testing autodiff which toy functions which just use a few scalars as inputs and outputs, and those were the most likely to be affected by this issue. So this fix should make learning/teaching hopefully a bit easier.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
Stop calling `source_span` query in significant drop order code
`source_span` is only meant for incremental tracking. I don't really think we need to highlight the whole drop impl span anyways; it can be quite large.
r? oli-obk
Remove support for `extern "rust-intrinsic"` blocks
Part of rust-lang/rust#132735
Looked manageable and there didn't appear to have been progress in the last two weeks,
so decided to give it a try.
Implement `super let`
Tracking issue: https://github.com/rust-lang/rust/issues/139076
This implements `super let` as proposed in #139080, based on the following two equivalence rules.
1. For all expressions `$expr` in any context, these are equivalent:
- `& $expr`
- `{ super let a = & $expr; a }`
2. And, additionally, these are equivalent in any context when `$expr` is a temporary (aka rvalue):
- `& $expr`
- `{ super let a = $expr; & a }`
So far, this experiment has a few interesting results:
## Interesting result 1
In this snippet:
```rust
super let a = f(&temp());
```
I originally expected temporary `temp()` would be dropped at the end of the statement (`;`), just like in a regular `let`, because `temp()` is not subject to temporary lifetime extension.
However, it turns out that that would break the fundamental equivalence rules.
For example, in
```rust
g(&f(&temp()));
```
the temporary `temp()` will be dropped at the `;`.
The first equivalence rule tells us this must be equivalent:
```rust
g({ super let a = &f(&temp()); a });
```
But that means that `temp()` must live until the last `;` (after `g()`), not just the first `;` (after `f()`).
While this was somewhat surprising to me at first, it does match the exact behavior we need for `pin!()`: The following _should work_. (See also https://github.com/rust-lang/rust/issues/138718)
```rust
g(pin!(f(&mut temp())));
```
Here, `temp()` lives until the end of the statement. This makes sense from the perspective of the user, as no other `;` or `{}` are visible. Whether `pin!()` uses a `{}` block internally or not should be irrelevant.
This means that _nothing_ in a `super let` statement will be dropped at the end of that super let statement. It does not even need its own scope.
This raises questions that are useful for later on:
- Will this make temporaries live _too long_ in cases where `super let` is used not in a hidden block in a macro, but as a visible statement in code like the following?
```rust
let writer = {
super let file = File::create(&format!("/home/{user}/test"));
Writer::new(&file)
};
```
- Is a `let` statement in a block still the right syntax for this? Considering it has _no_ scope of its own, maybe neither a block nor a statement should be involved
This leads me to think that instead of `{ super let $pat = $init; $expr }`, we might want to consider something like `let $pat = $init in $expr` or `$expr where $pat = $init`. Although there are also issues with these, as it isn't obvious anymore if `$init` should be subject to temporary lifetime extension. (Do we want both `let _ = _ in ..` and `super let _ = _ in ..`?)
## Interesting result 2
What about `super let x;` without initializer?
```rust
let a = {
super let x;
x = temp();
&x
};
```
This works fine with the implementation in this PR: `x` is extended to live as long as `a`.
While it matches my expectations, a somewhat interesting thing to realize is that these are _not_ equivalent:
- `super let x = $expr;`
- `super let x; x = $expr;`
In the first case, all temporaries in $expr will live at least as long as (the result of) the surrounding block.
In the second case, temporaries will be dropped at the end of the assignment statement. (Because the assignment statement itself "is not `super`".)
This difference in behavior might be confusing, but it _might_ be useful.
One might want to extend the lifetime of a variable without extending all the temporaries in the initializer expression.
On the other hand, that can also be expressed as:
- `let x = $expr; super let x = x;` (w/o temporary lifetime extension), or
- `super let x = { $expr };` (w/ temporary lifetime extension)
So, this raises these questions:
- Do we want to accept `super let x;` without initializer at all?
- Does it make sense for statements other than let statements to be "super"? An expression statement also drops temporaries at its `;`, so now that we discovered that `super let` basically disables that `;` (see interesting result 1), is there a use to having other statements without their own scope? (I don't think that's ever useful?)
## Interesting result 3
This works now:
```rust
super let Some(x) = a.get(i) else { return };
```
I didn't put in any special cases for `super let else`. This is just the behavior that 'naturally' falls out when implementing `super let` without thinking of the `let else` case.
- Should `super let else` work?
## Interesting result 4
This 'works':
```rust
fn main() {
super let a = 123;
}
```
I didn't put in any special cases for `super let` at function scope. I had expected the code to cause an ICE or other weird failure when used at function body scope, because there's no way to let the variable live as long as the result of the function.
This raises the question:
- Does this mean that this behavior is the natural/expected behavior when `super let` is used at function scope? Or is this just a quirk and should we explicitly disallow `super let` in a function body? (Probably the latter.)
---
The questions above do not need an answer to land this PR. These questions should be considered when redesigning/rfc'ing/stabilizing the feature.
Add new `PatKind::Missing` variants
To avoid some ugly uses of `kw::Empty` when handling "missing" patterns, e.g. in bare fn tys. Helps with #137978. Details in the individual commits.
r? ``@oli-obk``
fix usage of `autodiff` macro with inner functions
This PR adds additional handling into the expansion step of the `std::autodiff` macro (in `compiler/rustc_builtin_macros/src/autodiff.rs`), which allows the macro to be applied to inner functions.
```rust
#![feature(autodiff)]
use std::autodiff::autodiff;
fn main() {
#[autodiff(d_inner, Forward, Dual, DualOnly)]
fn inner(x: f32) -> f32 {
x * x
}
}
```
Previously, the compiler didn't allow this due to only handling `Annotatable::Item` and `Annotatable::AssocItem` and missing the handling of `Annotatable::Stmt`. This resulted in the rather generic error
```
error: autodiff must be applied to function
--> src/main.rs:6:5
|
6 | / fn inner(x: f32) -> f32 {
7 | | x * x
8 | | }
| |_____^
error: could not compile `enzyme-test` (bin "enzyme-test") due to 1 previous error
```
This issue was originally reported [here](https://github.com/EnzymeAD/rust/issues/184).
Quick question: would it make sense to add a ui test to ensure there is no regression on this?
This is my first contribution, so I'm extra grateful for any piece of feedback!! :D
r? `@oli-obk`
Tracking issue for autodiff: #124509
Prevent a test from seeing forbidden numbers in the rustc version
The final CHECK-NOT directive in this test was able to see past the end of the enclosing function, and find the substring `753` or `754` in the git hash in the rustc version number, causing false failures in CI whenever the git hash happens to contain those digits in sequence.
Adding an explicit check for `ret` prevents the CHECK-NOT directive from seeing past the end of the function.
---
Manually tested by adding `// CHECK-NOT: rustc` after the existing CHECK-NOT directives, and demonstrating that the new check prevents it from seeing the rustc version string.
The final CHECK-NOT directive in this test was able to see past the end of the
enclosing function, and find the substring 753 or 754 in the git hash in the
rustc version number, causing false failures in CI.
Adding an explicit check for `ret` prevents the CHECK-NOT directive from seeing
past the end of the function.
Update the minimum external LLVM to 19
With this change, we'll have stable support for LLVM 19 and 20.
For reference, the previous increase to LLVM 18 was #130487.
cc `@rust-lang/wg-llvm`
r? nikic
Implement `SliceIndex` for `ByteStr`
Implement `Index` and `IndexMut` for `ByteStr` in terms of `SliceIndex`. Implement it for the same types that `&[u8]` supports (a superset of those supported for `&str`, which does not have `usize` and `ops::IndexRange`).
At the same time, move compare and index traits to a separate file in the `bstr` module, to give it more space to grow as more functionality is added (e.g., iterators and string-like ops). Order the items in `bstr/traits.rs` similarly to `str/traits.rs`.
cc `@joshtriplett`
`ByteStr`/`ByteString` tracking issue: https://github.com/rust-lang/rust/issues/134915
Apply `Recovery::Forbidden` when reparsing pasted macro fragments.
Fixes#137874.
The changes to the output of `tests/ui/associated-consts/issue-93835.rs`
partly undo the changes seen when `NtTy` was removed in #133436, which
is good.
r? ``@petrochenkov``
Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Expose algebraic floating point intrinsics
# Problem
A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.
See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.
### C++: 10us ✅
With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
#pragma clang fp reassociate(on)
float sum = 0.0;
for (size_t i = 0; i < len; ++i) {
sum += a[i] * b[i];
}
return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>
### Nightly Rust: 10us ✅
With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>
### Stable Rust: 84us ❌
With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum += a[i] * b[i];
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>
# Proposed Change
Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
# Alternatives Considered
https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.
In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.
# References
* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
Use the span of the whole bound when the diagnostic talks about a bound
While it makes sense that the host predicate only points to the `~const` part, as whether the actual trait bound is satisfied is checked separately, the user facing diagnostic is talking about the entire trait bound, at which point it makes more sense to just highlight the entire bound
r? `@compiler-errors` or `@fee1-dead`
Fix `Debug` impl for `LateParamRegionKind`.
It uses `Br` prefixes which are inappropriate and appear to have been incorrectly copy/pasted from the `Debug` impl for `BoundRegionKind`.
r? `@BoxyUwU`
Fix 2024 edition doctest panic output
Fixes#137970.
The problem was that the output was actually displayed by rustc itself because we're exiting with `Result<(), String>`, and the display is really not great. So instead, we get the output, we print it and then we return an `ExitCode`.
r? ````@aDotInTheVoid````
add `TypingMode::Borrowck`
Shares the first commit with #138499, doesn't really matter which PR to land first 😊😁
Introduces `TypingMode::Borrowck` which unlike `TypingMode::Analysis`, uses the hidden type computed by HIR typeck as the initial value of opaques instead of an unconstrained infer var. This is a part of https://github.com/rust-lang/types-team/issues/129.
Using this new `TypingMode` is unfortunately a breaking change for now, see tests/ui/impl-trait/non-defining-uses/as-projection-term.rs. Using an inference variable as the initial value results in non-defining uses in the defining scope. We therefore only enable it if with `-Znext-solver=globally` or `-Ztyping-mode-borrowck`
To do that the PR contains the following changes:
- `TypeckResults::concrete_opaque_type` are already mapped to the definition of the opaque type
- writeback now checks that the non-lifetime parameters of the opaque are universal
- for this, `fn check_opaque_type_parameter_valid` is moved from `rustc_borrowck` to `rustc_trait_selection`
- we add a new `query type_of_opaque_hir_typeck` which, using the same visitors as MIR typeck, attempts to merge the hidden types from HIR typeck from all defining scopes
- done by adding a `DefiningScopeKind` flag to toggle between using borrowck and HIR typeck
- the visitors stop checking that the MIR type matches the HIR type. This is trivial as the HIR type are now used as the initial hidden types of the opaque. This check is useful as a safeguard when not using `TypingMode::Borrowck`, but adding it to the new structure is annoying and it's not soundness critical, so I intend to not add it back.
- add a `TypingMode::Borrowck` which behaves just like `TypingMode::Analysis` except when normalizing opaque types
- it uses `type_of_opaque_hir_typeck(opaque)` as the initial value after replacing its regions with new inference vars
- it uses structural lookup in the new solver
fixes#112201, fixes#132335, fixes#137751
r? `@compiler-errors` `@oli-obk`
Pass correct param-env to `error_implies`
Duplicated comment from the test:
In the error reporting code, when reporting fulfillment errors for goals A and B, we try to see if elaborating A will result in another goal that can equate with B. That would signal that B is "implied by" A, allowing us to skip reporting it, which is beneficial for cutting down on the number of diagnostics we report.
In the new trait solver especially, but even in the old trait solver through things like defining opaque type usages, this `can_equate` call was not properly taking the param-env of the goals, resulting in nested obligations that had empty param-envs. If one of these nested obligations was a `ConstParamHasTy` goal, then we would ICE, since those goals are particularly strict about the param-env they're evaluated in.
This is morally a fix for <https://github.com/rust-lang/rust/issues/139314>, but that repro uses details about how defining usages in the `check_opaque_well_formed` code can spring out of type equality, and will likely stop failing soon coincidentally once we start using `PostBorrowck` mode in that check. Instead, we use lazy normalization to end up generating an alias-eq goal whose nested goals woul trigger the ICE instead, since this is a lot more stable.
Fixes https://github.com/rust-lang/rust/issues/139314
r? ``@oli-obk`` or reassign
Fixes#137874.
Removes `tests/crashes/137874.rs`; the new test is simpler (defines its
own macro) but tests the same thing.
The changes to the output of `tests/ui/associated-consts/issue-93835.rs`
partly undo the changes seen when `NtTy` was removed in #133436, which
is good.
Initial support for auto traits with default bounds
This PR is part of ["MCP: Low level components for async drop"](https://github.com/rust-lang/compiler-team/issues/727)
Tracking issue: #138781
Summary: https://github.com/rust-lang/rust/pull/120706#issuecomment-1934006762
### Intro
Sometimes we want to use type system to express specific behavior and provide safety guarantees. This behavior can be specified by various "marker" traits. For example, we use `Send` and `Sync` to keep track of which types are thread safe. As the language develops, there are more problems that could be solved by adding new marker traits:
- to forbid types with an async destructor to be dropped in a synchronous context a trait like `SyncDrop` could be used [Async destructors, async genericity and completion futures](https://sabrinajewson.org/blog/async-drop).
- to support [scoped tasks](https://without.boats/blog/the-scoped-task-trilemma/) or in a more general sense to provide a [destruction guarantee](https://zetanumbers.github.io/book/myosotis.html) there is a desire among some users to see a `Leak` (or `Forget`) trait.
- Withoutboats in his [post](https://without.boats/blog/changing-the-rules-of-rust/) reflected on the use of `Move` trait instead of a `Pin`.
All the traits proposed above are supposed to be auto traits implemented for most types, and usually implemented automatically by compiler.
For backward compatibility these traits have to be added implicitly to all bound lists in old code (see below). Adding new default bounds involves many difficulties: many standard library interfaces may need to opt out of those default bounds, and therefore be infected with confusing `?Trait` syntax, migration to a new edition may contain backward compatibility holes, supporting new traits in the compiler can be quite difficult and so forth. Anyway, it's hard to evaluate the complexity until we try the system on a practice.
In this PR we introduce new optional lang items for traits that are added to all bound lists by default, similarly to existing `Sized`. The examples of such traits could be `Leak`, `Move`, `SyncDrop` or something else, it doesn't matter much right now (further I will call them `DefaultAutoTrait`'s). We want to land this change into rustc under an option, so it becomes available in bootstrap compiler. Then we'll be able to do standard library experiments with the aforementioned traits without adding hundreds of `#[cfg(not(bootstrap))]`s. Based on the experiments, we can come up with some scheme for the next edition, in which such bounds are added in a more targeted way, and not just everywhere.
Most of the implementation is basically a refactoring that replaces hardcoded uses of `Sized` with iterating over a list of traits including both `Sized` and the new traits when `-Zexperimental-default-bounds` is enabled (or just `Sized` as before, if the option is not enabled).
### Default bounds for old editions
All existing types, including generic parameters, are considered `Leak`/`Move`/`SyncDrop` and can be forgotten, moved or destroyed in generic contexts without specifying any bounds. New types that cannot be, for example, forgotten and do not implement `Leak` can be added at some point, and they should not be usable in such generic contexts in existing code.
To both maintain this property and keep backward compatibility with existing code, the new traits should be added as default bounds _everywhere_ in previous editions. Besides the implicit `Sized` bound contexts that includes supertrait lists and trait lists in trait objects (`dyn Trait1 + ... + TraitN`). Compiler should also generate implicit `DefaultAutoTrait` implementations for foreign types (`extern { type Foo; }`) because they are also currently usable in generic contexts without any bounds.
#### Supertraits
Adding the new traits as supertraits to all existing traits is potentially necessary, because, for example, using a `Self` param in a trait's associated item may be a breaking change otherwise:
```rust
trait Foo: Sized {
fn new() -> Option<Self>; // ERROR: `Option` requires `DefaultAutoTrait`, but `Self` is not `DefaultAutoTrait`
}
// desugared `Option`
enum Option<T: DefaultAutoTrait + Sized> {
Some(T),
None,
}
```
However, default supertraits can significantly affect compiler performance. For example, if we know that `T: Trait`, the compiler would deduce that `T: DefaultAutoTrait`. It also implies proving `F: DefaultAutoTrait` for each field `F` of type `T` until an explicit impl is be provided.
If the standard library is not modified, then even traits like `Copy` or `Send` would get these supertraits.
In this PR for optimization purposes instead of adding default supertraits, bounds are added to the associated items:
```rust
// Default bounds are generated in the following way:
trait Trait {
fn foo(&self) where Self: DefaultAutoTrait {}
}
// instead of this:
trait Trait: DefaultAutoTrait {
fn foo(&self) {}
}
```
It is not always possible to do this optimization because of backward compatibility:
```rust
pub trait Trait<Rhs = Self> {}
pub trait Trait1 : Trait {} // ERROR: `Rhs` requires `DefaultAutoTrait`, but `Self` is not `DefaultAutoTrait`
```
or
```rust
trait Trait {
type Type where Self: Sized;
}
trait Trait2<T> : Trait<Type = T> {} // ERROR: `???` requires `DefaultAutoTrait`, but `Self` is not `DefaultAutoTrait`
```
Therefore, `DefaultAutoTrait`'s are still being added to supertraits if the `Self` params or type bindings were found in the trait header.
#### Trait objects
Trait objects requires explicit `+ Trait` bound to implement corresponding trait which is not backward compatible:
```rust
fn use_trait_object(x: Box<dyn Trait>) {
foo(x) // ERROR: `foo` requires `DefaultAutoTrait`, but `dyn Trait` is not `DefaultAutoTrait`
}
// implicit T: DefaultAutoTrait here
fn foo<T>(_: T) {}
```
So, for a trait object `dyn Trait` we should add an implicit bound `dyn Trait + DefaultAutoTrait` to make it usable, and allow relaxing it with a question mark syntax `dyn Trait + ?DefaultAutoTrait` when it's not necessary.
#### Foreign types
If compiler doesn't generate auto trait implementations for a foreign type, then it's a breaking change if the default bounds are added everywhere else:
```rust
// implicit T: DefaultAutoTrait here
fn foo<T: ?Sized>(_: &T) {}
extern "C" {
type ExternTy;
}
fn forward_extern_ty(x: &ExternTy) {
foo(x); // ERROR: `foo` requires `DefaultAutoTrait`, but `ExternTy` is not `DefaultAutoTrait`
}
```
We'll have to enable implicit `DefaultAutoTrait` implementations for foreign types at least for previous editions:
```rust
// implicit T: DefaultAutoTrait here
fn foo<T: ?Sized>(_: &T) {}
extern "C" {
type ExternTy;
}
impl DefaultAutoTrait for ExternTy {} // implicit impl
fn forward_extern_ty(x: &ExternTy) {
foo(x); // OK
}
```
### Unresolved questions
New default bounds affect all existing Rust code complicating an already complex type system.
- Proving an auto trait predicate requires recursively traversing the type and proving the predicate for it's fields. This leads to a significant performance regression. Measurements for the stage 2 compiler build show up to 3x regression.
- We hope that fast path optimizations for well known traits could mitigate such regressions at least partially.
- New default bounds trigger some compiler bugs in both old and new trait solver.
- With new default bounds we encounter some trait solver cycle errors that break existing code.
- We hope that these cases are bugs that can be addressed in the new trait solver.
Also migration to a new edition could be quite ugly and enormous, but that's actually what we want to solve. For other issues there's a chance that they could be solved by a new solver.