Avoid no-op unlink+link dances in incr comp
Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that.
The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory.
This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links.
`cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`:
```
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
52.56 0.024950 25 978 485 unlink
34.38 0.016318 22 727 linkat
13.06 0.006200 24 249 unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00 0.047467 24 1954 485 total
```
```
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
42.83 0.014521 57 252 unlink
38.41 0.013021 26 486 linkat
18.77 0.006362 25 249 unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00 0.033904 34 987 total
```
This reduces the number of hard-links that are causing perf troubles, noted in https://github.com/rust-lang/rust/issues/64291 and https://github.com/rust-lang/rust/issues/137560
Remove `#[cfg(not(test))]` gates in `core`
These gates are unnecessary now that unit tests for `core` are in a separate package, `coretests`, instead of in the same files as the source code. They previously prevented the two `core` versions from conflicting with each other.
Add `#[define_opaques]` attribute and require it for all type-alias-impl-trait sites that register a hidden type
Instead of relying on the signature of items to decide whether they are constraining an opaque type, the opaque types that the item constrains must be explicitly listed.
A previous version of this PR used an actual attribute, but had to keep the resolved `DefId`s in a side table.
Now we just lower to fields in the AST that have no surface syntax, instead a builtin attribute macro fills in those fields where applicable.
Note that for convenience referencing opaque types in associated types from associated methods on the same impl will not require an attribute. If that causes problems `#[defines()]` can be used to overwrite the default of searching for opaques in the signature.
One wart of this design is that closures and static items do not have generics. So since I stored the opaques in the generics of functions, consts and methods, I would need to add a custom field to closures and statics to track this information. During a T-types discussion we decided to just not do this for now.
fixes#131298
Speed up target feature computation
The LLVM backend calls `LLVMRustHasFeature` twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.
r? `@bjorn3`
These gates are unnecessary now that unit tests for `core` are in a
separate package, `coretests`, instead of in the same files as the
source code. They previously prevented the two `core` versions from
conflicting with each other.
I might re-implement it in the future, but would probably do so by
replacing cranelift-jit. cranelift-jit's api doesn't quite work well for
lazy jitting. And even so it adds complexity to cranelift-jit and breaks
cranelift-jit outside of x86_64.
Update `compiler-builtins` to 0.1.149
Includes a change to make a subset of math symbols available on all platforms [1], and disables `f16` on aarch64 without neon [2].
[1]: https://github.com/rust-lang/compiler-builtins/pull/763
[2]: https://github.com/rust-lang/compiler-builtins/pull/775
try-job: aarch64-gnu
try-job: aarch64-gnu-debug
try-job: armhf-gnu
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-aarch64-linux
try-job: dist-arm-linux
try-job: dist-armv7-linux
try-job: dist-x86_64-linux
try-job: test-various
Currently it is called twice, once with `allow_unstable` set to true and
once with it set to false. This results in some duplicated work. Most
notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for
every feature, and it's moderately slow. For very short running
compilations on platforms with many features (e.g. a `check` build of
hello-world on x86) this is a significant fraction of runtime.
This commit changes `target_features_cfg` so it is only called once, and
it now returns a pair of feature sets. This halves the number of
`LLVMRustHasFeature` calls.
rename BackendRepr::Vector → SimdVector
For many Rustaceans, "vector" does not imply "SIMD", so let's be more clear in this type that is used pervasively in the compiler.
r? `@workingjubilee`
The embedded bitcode should always be prepared for LTO/ThinLTO
Fixes#115344. Fixes#117220.
There are currently two methods for generating bitcode that used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.
When using with `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain bitcode, then the bitcode is used for LTO. We run the Call Graph Profile Pass twice on the same module.
This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.
r? nikic