The `ErrorId` variant takes a u16 so that `DiagnosticMessageId` can retain
its `Copy` status (the present author's first choice having been the "EXXX"
code as a string).
The duplicated "type mismatch resolving `{}`" literal is unfortunate, but
the `struct_span_err!` macro (which we want to mark that error code as
used) is fussy about taking a literal, and the one-time-diagnostics set
needs an owned string.
This is concerning #33941 and probably #45805!
... rather than being gated by -Z saturating-float-casts.
There are several reasons for this:
1. Const eval already implements this behavior.
2. Unlike with float->int casts, this behavior is uncontroversially the
right behavior and it is not as performance critical. Thus there is no
particular need to make the bug fix for u128->f32 casts opt-in.
3. Having two orthogonal features under one flag is silly, and never
should have happened in the first place.
4. Benchmarking float->int casts with the -Z flag should not pick up
performance changes due to the u128->f32 casts (assuming there are any).
Fixes#41799
incr.comp.: Verify stability of incr. comp. hashes and clean up various other things.
The main contribution of this PR is that it adds the `-Z incremental-verify-ich` functionality. Normally, when the red-green tracking system determines that a certain query result has not changed, it does not re-compute the incr. comp. hash (ICH) for that query result because that hash is already known. `-Z incremental-verify-ich` tells the compiler to re-hash the query result and compare the new hash against the cached hash. This is a rather thorough way of
- testing hashing implementation stability,
- finding missing `[input]` annotations on `DepNodes`, and
- finding missing read-edges,
since both a missed read and a missing `[input]` annotation can lead to something being marked as green instead of red and thus will have a different hash than it should have.
Case in point, implementing this verification logic and activating it for all `src/test/incremental` tests has revealed several such oversights, all of which are fixed in this PR.
r? @nikomatsakis
Saturating casts between integers and floats
Introduces a new flag, `-Z saturating-float-casts`, which makes code generation for int->float and float->int casts safe (`undef`-free), implementing [the saturating semantics laid out by](https://github.com/rust-lang/rust/issues/10184#issuecomment-299229143) @jorendorff for float->int casts and overflowing to infinity for `u128::MAX` -> `f32`.
Constant evaluation in trans was changed to behave like HIR const eval already did, i.e., saturate for u128->f32 and report an error for problematic float->int casts.
Many thanks to @eddyb, whose APFloat port simplified many parts of this patch, and made HIR constant evaluation recognize dangerous float casts as mentioned above.
Also thanks to @ActuallyaDeviloper whose branchless implementation served as inspiration for this implementation.
cc #10184#41799fixes#45134
This affects regular code generation as well as constant evaluation in trans,
but not the HIR constant evaluator because that one returns an error for
overflowing casts and NaN-to-int casts. That error is conservatively
correct and we should be careful to not accept more code in constant
expressions.
The changes to code generation are guarded by a new -Z flag, to be able
to evaluate the performance impact. The trans constant evaluation changes
are unconditional because they have no run time impact and don't affect
type checking either.
Allow overriding the TLS model
This PR adds the ability to override the default "global-dynamic" TLS model with a more specific one through a target json option or a command-line option. This allows for better code generation in certain situations.
This is similar to the `-ftls-model=` option in GCC and Clang.
Avoid repetition on “use of unstable library feature 'rustc_private'”
This PR fixes the error by only emitting it when the span contains a real file (is not inside a macro) - and making sure it's emitted only once per span.
The first check was needed because spans-within-macros seem to differ a lot and "fixing" them to the real location is not trivial (and the method that does this is private to another module). It also feels like there always will be an error on import, with the real file name, so not sure there's a point to re-emit the same error at macro use.
Fix#44953.
rustc: Allow target-specific default cgus
Some targets, like msp430 and nvptx, don't work with multiple codegen units
right now for bugs or fundamental reasons. To expose this allow targets to
express a default.
Closes#45000
Some targets, like msp430 and nvptx, don't work with multiple codegen units
right now for bugs or fundamental reasons. To expose this allow targets to
express a default.
Closes#45000
rustc: Don't inline in CGUs at -O0
This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
treat `#[inline]` functions by translating them into all codegen units that
they're needed within, marking the linkage as `internal`. This commit changes
the behavior so that in debug mode (compiling at `-O0`) rustc will instead only
translate `#[inline]` functions into *one* codegen unit, forcing all other
codegen units to reference this one copy.
The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overal work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example `String` would be basically
inlined into all codegen units if used). The strategy in this commit should
reduce the cost of `#[inline]` functions to being equivalent to one codegen
unit, which is only translating and codegen'ing inline functions once.
Collected [data] shows that this does indeed improve the situation from [before]
as the overall cpu-clock time increases at a much slower rate and when pinned to
one core rustc does not consume significantly more wall clock time than with one
codegen unit.
One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collideA so the crate name/disambiguator is mixed in to the symbol name
hash in these situations.
[data]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334880911
[before]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334583384
Add -Zmutable-noalias flag
We disabled noalias on mutable references a long time ago when it was clear that llvm was incorrectly handling this in relation to unwinding edges.
Since then, a few things have happened:
* llvm has cleaned up a bunch of the issues (I'm told)
* we've added a nounwind codegen option
As such, I would like to add this -Z flag so that we can evaluate if the codegen bugs still exist, and if this significantly affects the codegen of different projects, with an eye towards permanently re-enabling it (or at least making it a stable option).
Document that `-C ar=PATH` doesn't do anything
Are there any plans to use an external archiver in the future?
IIRC, it was used before, but its use was replaced with LLVM's built-in archive management machinery. I can't found a relevant PR though. EDIT: Found it - https://github.com/rust-lang/rust/pull/26926!
The `-C` option is stable so it still can't be removed right away even if there are no plans to use it (but maybe it can be deprecated?).
Target specifications have a field for archiver as well, which is unused too (these ones are unstable, so I guess it can be removed).
r? @alexcrichton
This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
treat `#[inline]` functions by translating them into all codegen units that
they're needed within, marking the linkage as `internal`. This commit changes
the behavior so that in debug mode (compiling at `-O0`) rustc will instead only
translate `#[inline]` functions into *one* codegen unit, forcing all other
codegen units to reference this one copy.
The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overal work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example `String` would be basically
inlined into all codegen units if used). The strategy in this commit should
reduce the cost of `#[inline]` functions to being equivalent to one codegen
unit, which is only translating and codegen'ing inline functions once.
Collected [data] shows that this does indeed improve the situation from [before]
as the overall cpu-clock time increases at a much slower rate and when pinned to
one core rustc does not consume significantly more wall clock time than with one
codegen unit.
One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collideA so the crate name/disambiguator is mixed in to the symbol name
hash in these situations.
[data]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334880911
[before]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334583384
update FIXME(#6298) to point to open issue 15020
update FIXME(#6268) to point to RFC 811
update FIXME(#10520) to point to RFC 1751
remove FIXME for emscripten issue 4563 and include target in `test_estimate_scaling_factor`
remove FIXME(#18207) since node_id isn't used for `ref` pattern analysis
remove FIXME(#6308) since DST was implemented in #12938
remove FIXME(#2658) since it was decided to not reorganize module
remove FIXME(#20590) since it was decided to stay conservative with projection types
remove FIXME(#20297) since it was decided that solving the issue is unnecessary
remove FIXME(#27086) since closures do correspond to structs now
remove FIXME(#13846) and enable `function_sections` for windows
remove mention of #22079 in FIXME(#22079) since this is a general FIXME
remove FIXME(#5074) since the restriction on borrow were lifted
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows *incremental* LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:
* In one mode we'll *only* perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
* In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
* Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
* We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
* This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
* Finally this allows us to reuse some artifacts such as our `TargetMachine`
creation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!
This commit is a refactoring of the LTO backend in Rust to support compilations
with multiple codegen units. The immediate result of this PR is to remove the
artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer
term this is intended to lay the groundwork for LTO with incremental compilation
and ultimately be the underpinning of ThinLTO support.
The problem here that needed solving is that when rustc is producing multiple
codegen units in one compilation LTO needs to merge them all together.
Previously only upstream dependencies were merged and it was inherently relied
on that there was only one local codegen unit. Supporting this involved
refactoring the optimization backend architecture for rustc, namely splitting
the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM
module has been optimized it may be blocked and queued up for LTO, and only
after LTO are modules code generated.
Non-LTO compilations should look the same as they do today backend-wise, we'll
spin up a thread for each codegen unit and optimize/codegen in that thread. LTO
compilations will, however, send the LLVM module back to the coordinator thread
once optimizations have finished. When all LLVM modules have finished optimizing
the coordinator will invoke the LTO backend, producing a further list of LLVM
modules. Currently this is always a list of one LLVM module. The coordinator
then spawns further work to run LTO and code generation passes over each module.
In the course of this refactoring a number of other pieces were refactored:
* Management of the bytecode encoding in rlibs was centralized into one module
instead of being scattered across LTO and linking.
* Some internal refactorings on the link stage of the compiler was done to work
directly from `CompiledModule` structures instead of lists of paths.
* The trans time-graph output was tweaked a little to include a name on each
bar and inflate the size of the bars a little