user0/rust

Author	SHA1	Message	Date
bors	2219766af6	Auto merge of #152605 - scottmcm:box-drop-alignment, r=Mark-Simulacrum Pass alignments through the shim as `Alignment` (not `usize`) We're using `Layout` on both sides, so might as well skip the transmutes back and forth to `usize`. The mir-opt test shows that doing so allows simplifying the boxed-slice drop slightly, for example.	2026-02-15 13:38:45 +00:00
Jonathan Brouwer	e6ca590153	Rollup merge of #152404 - durin42:llvm-23-instcombine-shrink-constant, r=Mark-Simulacrum tests: adapt align-offset.rs for InstCombine improvements in LLVM 23 Upstream [has improved InstCombine](`8d2078332c`) so that it can shrink added constants using known zeroes, which caused a little bit of change in this test. As far as I can tell either output is fine, so we just accept both. @rustbot label: +llvm-main	2026-02-14 22:11:54 +01:00
Jonathan Brouwer	33c2a6eba9	Rollup merge of #151365 - RalfJung:unsafe-unpin-opsem, r=BoxyUwU UnsafePinned: implement opsem effects of UnsafeUnpin This implements the next step for https://github.com/rust-lang/rust/issues/125735: actually making `UnsafePinned` have special opsem effects by suppressing the `noalias` even if the type is wrapped in an `Unpin` wrapper. For backwards compatibility we also still keep the `Unpin` hack, i.e. a type must be both `Unpin` and `UnsafeUnpin` to get `noalias`.	2026-02-14 22:11:53 +01:00
Jonathan Brouwer	6d625cc074	Rollup merge of #145024 - Kmeakin:km/optimize-slice-index/v3, r=Mark-Simulacrum Optimize indexing slices and strs with inclusive ranges Instead of separately checking for `end == usize::MAX` and `end + 1 > slice.len()`, we can check for `end >= slice.len()`. Also consolidate all the str indexing related panic functions into a single function which reports the correct error depending on the arguments, as the slice indexing code already does. The downside of all this is that the panic message is slightly less specific when trying to index with `[..=usize::MAX]`: instead of saying "attempted to index str up to maximum usize" it just says "end byte index {end} out of bounds". But this is a rare enough case that I think it is acceptable	2026-02-14 22:11:52 +01:00
bors	f8463896a9	Auto merge of #150681 - meithecatte:always-discriminate, r=JonathanBrouwer,Nadrieril Make operational semantics of pattern matching independent of crate and module The question of "when does matching an enum against a pattern of one of its variants read its discriminant" is currently an underspecified part of the language, causing weird behavior around borrowck, drop order, and UB. Of course, in the common cases, the discriminant must be read to distinguish the variant of the enum, but currently the following exceptions are implemented: 1. If the enum has only one variant, we currently skip the discriminant read. - This has the advantage that single-variant enums behave the same way as structs in this regard. - However, it means that if the discriminant exists in the layout, we can't say that this discriminant being invalid is UB. This makes me particularly uneasy in its interactions with niches – consider the following example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=5904a6155cbdd39af4a2e7b1d32a9b1a)), where miri currently doesn't detect any UB (because the semantics don't specify any): <details><summary>Example 1</summary> ```rust #![allow(dead_code)] use core::mem::{size_of, transmute}; #[repr(u8)] enum Inner { X(u8), } enum Outer { A(Inner), B(u8), } fn f(x: &Inner) { match x { Inner::X(v) => { println!("{v}"); } } } fn main() { assert_eq!(size_of::<Inner>(), 2); assert_eq!(size_of::<Outer>(), 2); let x = Outer::B(42); let y = &x; f(unsafe { transmute(y) }); } ``` </details> 2. For the purpose of the above, enums marked with `#[non_exhaustive]` are always considered to have multiple variants when observed from foreign crates, but the actual number of variants is considered in the current crate. 
- This means that whether code has UB can depend on which crate it is in: https://github.com/rust-lang/rust/issues/147722 - In another case of `#[non_exhaustive]` affecting the runtime semantics, its presence or absence can change what gets captured by a closure, and by extension, the drop order: https://github.com/rust-lang/rust/issues/147722#issuecomment-3674554872 - Also at the above link, there is an example where removing `#[non_exhaustive]` can cause borrowck to suddenly start failing in another crate. 3. Moreover, we currently make a more specific check: we only read the discriminant if there is more than one inhabited variant in the enum. - This means that the semantics can differ between `foo<!>`, and a copy of `foo` where `T` was manually replaced with `!`: rust-lang/rust#146803 - Moreover, due to the privacy rules for inhabitedness, it means that the semantics of code can depend on the module in which it is located. - Additionally, this inhabitedness rule is even uglier due to the fact that closure capture analysis needs to happen before we can determine whether types are uninhabited, which means that whether the discriminant read happens has a different answer specifically for capture analysis. 
- For the two above points, see the following example ([playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=a07d8a3ec0b31953942e96e2130476d9)): <details><summary>Example 2</summary> ```rust #![allow(unused)] mod foo { enum Never {} struct PrivatelyUninhabited(Never); pub enum A { V(String, String), Y(PrivatelyUninhabited), } fn works(mut x: A) { let a = match x { A::V(ref mut a, _) => a, _ => unreachable!(), }; let b = match x { A::V(_, ref mut b) => b, _ => unreachable!(), }; a.len(); b.len(); } fn fails(mut x: A) { let mut f = || match x { A::V(ref mut a, _) => (), _ => unreachable!(), }; let mut g = || match x { A::V(_, ref mut b) => (), _ => unreachable!(), }; f(); g(); } } use foo::A; fn fails(mut x: A) { let a = match x { A::V(ref mut a, _) => a, _ => unreachable!(), }; let b = match x { A::V(_, ref mut b) => b, _ => unreachable!(), }; a.len(); b.len(); } fn fails2(mut x: A) { let mut f = || match x { A::V(ref mut a, _) => (), _ => unreachable!(), }; let mut g = || match x { A::V(_, ref mut b) => (), _ => unreachable!(), }; f(); g(); } ``` </details> In light of the above, and following the discussion at rust-lang/rust#138961 and rust-lang/rust#147722, this PR ~~makes it so that, operationally, matching on an enum always reads its discriminant.~~ introduces the following changes to this behavior: - matching on a `#[non_exhaustive]` enum will always introduce a discriminant read, regardless of whether the enum is from an external crate - uninhabited variants now count just like normal ones, and don't get skipped in the checks As per the discussion below, the resolution for point (1) above is that it should land as part of a separate PR, so that the subtler decision can be more carefully considered. Note that this is a breaking change, due to the aforementioned changes in borrow checking behavior, new UB (or at least UB newly detected by miri), as well as drop order around closure captures. 
However, it seems to me that the combination of this PR with rust-lang/rust#138961 should have smaller real-world impact than rust-lang/rust#138961 by itself. Fixes rust-lang/rust#142394 Fixes rust-lang/rust#146590 Fixes rust-lang/rust#146803 (though already marked as duplicate) Fixes parts of rust-lang/rust#147722 Fixes rust-lang/miri#4778 r? @Nadrieril @RalfJung @rustbot label +A-closures +A-patterns +T-opsem +T-lang	2026-02-14 12:53:09 +00:00
Scott McMurray	774268afc1	Pass alignments through the shim as `Alignment` (not `usize`) We're using `Layout` on both sides, so might as well skip the transmutes back and forth to `usize`. The mir-opt test shows that doing so allows simplifying the boxed-slice drop slightly, for example.	2026-02-14 01:39:16 -08:00
Stuart Cook	8b036c7b72	Rollup merge of #152486 - fneddy:s390x_simplify_backchain, r=dingxiangfei2009 remove redundant backchain attribute in codegen llvm will look at both 1. the values of `"target-features"` and 2. the function string attributes. this patch removes the redundant function string attribute because it is not needed at all. rustc sets the `+backchain` attribute through `target_features_attr(...)` `d34f1f9314/compiler/rustc_codegen_llvm/src/attributes.rs (L590)` `d34f1f9314/compiler/rustc_codegen_llvm/src/attributes.rs (L326-L337)`	2026-02-13 15:19:13 +11:00
Eddy (Eduard) Stefes	c6f57becfc	remove redundant backchain attribute in codegen llvm will look at both 1. the values of "target-features" and 2. the function string attributes. this removes the redundant function string attribute because it is not needed at all. rustc sets the `+backchain` attribute through `target_features_attr(...)`	2026-02-12 09:58:25 +01:00
Ralf Jung	590c1c9966	UnsafePinned: implement opsem effects of UnsafeUnpin	2026-02-12 09:09:35 +01:00
Jacob Pratt	b1b6533077	Rollup merge of #142680 - beetrees:sparc64-float-struct-abi, r=tgross35 Fix passing/returning structs with the 64-bit SPARC ABI Fixes the 64-bit SPARC part of rust-lang/rust#115609 by replacing the current implementation with a new implementation modelled on the RISC-V calling convention code ([SPARC ABI reference](https://sparc.org/wp-content/uploads/2014/01/SCD.2.4.1.pdf.gz)). Pinging `sparcv9-sun-solaris` target maintainers: @psumbera @kulikjak Fixes rust-lang/rust#115336 Fixes rust-lang/rust#115399 Fixes rust-lang/rust#122620 Fixes https://github.com/rust-lang/rust/issues/147883 r? @workingjubilee	2026-02-12 00:41:05 -05:00
Karl Meakin	262cd76333	Optimize `SliceIndex<str>` for `RangeInclusive` Replace `self.end() == usize::MAX` and `self.end() + 1 > slice.len()` with `self.end() >= slice.len()`. Same reasoning as previous commit. Also consolidate the str panicking functions into a single function.	2026-02-10 23:19:01 +00:00
Karl Meakin	625b18027d	Optimize `SliceIndex::get` impl for `RangeInclusive` The checks for `self.end() == usize::MAX` and `self.end() + 1 > slice.len()` can be replaced with `self.end() >= slice.len()`, since `self.end() < slice.len()` implies both `self.end() <= slice.len()` and `self.end() < usize::MAX`.	2026-02-10 20:29:45 +00:00
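
The equivalence claimed in the commit above can be checked with a small standalone sketch. The function names `out_of_bounds_two_checks` and `out_of_bounds_one_check` are hypothetical and only mirror the conditions quoted in the message; they are not the library's actual code.

```rust
/// The original pair of checks as described in the commit message:
/// first rule out `end == usize::MAX` (where `end + 1` would overflow),
/// then test `end + 1 > len`.
fn out_of_bounds_two_checks(end: usize, len: usize) -> bool {
    // The `||` short-circuits, so `end + 1` is never evaluated at usize::MAX.
    end == usize::MAX || end + 1 > len
}

/// The single comparison the commit replaces them with.
fn out_of_bounds_one_check(end: usize, len: usize) -> bool {
    end >= len
}

fn main() {
    let ends = [0, 1, 7, 8, 9, usize::MAX - 1, usize::MAX];
    let lens = [0, 1, 8, usize::MAX - 1, usize::MAX];
    for &end in &ends {
        for &len in &lens {
            // Both formulations must agree on every (end, len) pair.
            assert_eq!(
                out_of_bounds_two_checks(end, len),
                out_of_bounds_one_check(end, len),
                "mismatch for end={end}, len={len}"
            );
        }
    }
    println!("both formulations agree on all sampled inputs");
}
```

The sampled values include the boundary cases around `usize::MAX`, which is where the two-check version needed its special case.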
Karl Meakin	d74b276d1d	Precommit tests for `SliceIndex` method codegen Add a `codegen-llvm` test to check the number of `icmp` instructions generated for each `SliceIndex` method on the various range types. This will be updated in the next commit when `SliceIndex::get` is optimized for `RangeInclusive`.	2026-02-10 20:29:45 +00:00
Folkert de Vries	c9b5c934ca	Fix passing/returning structs with the 64-bit SPARC ABI Co-authored-by: beetrees <b@beetr.ee>	2026-02-10 12:39:45 +01:00
Augie Fackler	aefb9a9ae2	tests: adapt align-offset.rs for InstCombine improvements in LLVM 23 Upstream has improved InstCombine so that it can shrink added constants using known zeroes, which caused a little bit of change in this test. As far as I can tell either output is fine, so we just accept both.	2026-02-09 15:53:38 -05:00
bors	1c316d3461	Auto merge of #152361 - JonathanBrouwer:rollup-Qkwz1vN, r=JonathanBrouwer Rollup of 5 pull requests Successful merges: - rust-lang/rust#151869 (add test for codegen of SIMD vector from array repeat) - rust-lang/rust#152077 (bootstrap: always propagate `CARGO_TARGET_{host}_LINKER`) - rust-lang/rust#126100 (Reword the caveats on `array::map`) - rust-lang/rust#152275 (Stop having two different alignment constants) - rust-lang/rust#152325 (Remove more adhoc groups that correspond to teams)	2026-02-08 21:42:19 +00:00
Jonathan Brouwer	b566ac2c47	Rollup merge of #151869 - folkertdev:simd-array-repeat, r=Mark-Simulacrum add test for codegen of SIMD vector from array repeat fixes https://github.com/rust-lang/rust/issues/97804 It appears that this issue was fixed silently in LLVM 19. The original codegen was terrible, but starting at LLVM 19 `opt` is able to generate good code. https://llvm.godbolt.org/z/5vq8scP6q cc @programmerjake	2026-02-08 21:06:28 +01:00
Jonathan Brouwer	16c7ee5c05	Rollup merge of #151640 - ZuseZ4:cleanup-datatransfer, r=nnethercote Cleanup offload datatransfer There are 3 steps to run code on a GPU: Copy data from the host to the device, launch the kernel, and move it back. At the moment, we have a single variable describing the memory handling to do in each step, but that makes it hard for LLVM's opt pass to understand what's going on. We therefore split it into three variables, each only including the bits relevant for the corresponding stage. cc @jdoerfert @kevinsala r? compiler	2026-02-08 19:15:26 +01:00
Manuel Drehwald	6de0591c0b	Split ol mapper into more specific to/kernel/from mapper and move init_all_rtls into global ctor	2026-02-07 17:34:39 -08:00
Jonathan Brouwer	27d6b3c9b7	Rollup merge of #151576 - tgross35:stabilize-cold-path, r=jhpratt Stabilize `core::hint::cold_path` `cold_path` has been around unstably for a while and is a rather useful tool to have. It does what it is supposed to and there are no known remaining issues, so stabilize it here (including const). Newly stable API: ```rust // in core::hint pub const fn cold_path(); ``` I have opted to exclude `likely` and `unlikely` for now since they have had some concerns about ease of use that `cold_path` doesn't suffer from. `cold_path` is also significantly more flexible; in addition to working with boolean `if` conditions, it can be used in `match` arms, `if let`, closures, and other control flow blocks. `likely` and `unlikely` are also possible to implement in user code via `cold_path`, if desired. Closes: https://github.com/rust-lang/rust/issues/136873 (tracking issue) --- There has been some design and implementation work for making `#[cold]` function in more places, such as `if` arms, `match` arms, and closure bodies. Considering a stable `cold_path` will cover all of these usecases, it does not seem worth pursuing a more powerful `#[cold]` as an alternative way to do the same thing. If the lang team agrees, then: Closes: https://github.com/rust-lang/rust/issues/26179 Closes: https://github.com/rust-lang/rust/pull/120193	2026-02-07 13:06:35 +01:00
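
As a rough illustration of the remark that `likely` and `unlikely` can be written in user code on top of `cold_path`, here is a minimal sketch. It assumes a toolchain that may not yet ship the stabilized function, so it defines a local stand-in named `cold_path` (an empty `#[cold]` function); that stand-in and the `parse_digit` helper are hypothetical and only approximate the real hint.

```rust
/// Local stand-in for `core::hint::cold_path`, used here as an assumption so
/// the sketch builds on toolchains without the stabilized function. On a
/// toolchain that has it, `use core::hint::cold_path;` would replace this.
#[cold]
#[inline(never)]
fn cold_path() {}

/// Marks the `true` branch as the cold one and passes the value through.
#[inline(always)]
fn unlikely(b: bool) -> bool {
    if b {
        cold_path();
    }
    b
}

/// Marks the `false` branch as the cold one and passes the value through.
#[inline(always)]
fn likely(b: bool) -> bool {
    if !b {
        cold_path();
    }
    b
}

/// The hint also fits in a `match` arm, not only in a boolean `if`.
fn parse_digit(c: char) -> Option<u32> {
    match c.to_digit(10) {
        Some(d) => Some(d),
        None => {
            cold_path(); // non-digit input is expected to be rare
            None
        }
    }
}

fn main() {
    assert!(likely(1 + 1 == 2));
    assert!(!unlikely(1 + 1 == 3));
    assert_eq!(parse_digit('7'), Some(7));
    assert_eq!(parse_digit('x'), None);
}
```

The hints never change results; they only tell the optimizer which paths are expected to be rare.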
Jonathan Brouwer	f163864627	Rollup merge of #152128 - zmodem:matches-logical-or-141497, r=nikic Adopt matches-logical-or-141497.rs to LLVM HEAD After http://github.com/llvm/llvm-project/pull/178977, the and + icmp are folded to trunc.	2026-02-05 08:32:57 +01:00
Jonathan Brouwer	b66ead827c	Rollup merge of #152020 - Sa4dUs:offload-remove-dummy-loads, r=ZuseZ4 Remove dummy loads on offload codegen The current logic generates two dummy loads to prevent some globals from being optimized away. This blocks memtransfer loop hoisting optimizations, so it's time to remove them. r? @ZuseZ4	2026-02-05 08:32:45 +01:00
Hans Wennborg	23e5b2499f	Adopt matches-logical-or-141497.rs to LLVM HEAD After http://github.com/llvm/llvm-project/pull/178977, the and + icmp are folded to trunc.	2026-02-04 19:20:10 +01:00
bors	db3e99bbab	Auto merge of #150605 - RalfJung:fallback-intrinsic-skip, r=mati865 skip codegen for intrinsics with big fallback bodies if backend does not need them This hopefully fixes the perf regression from https://github.com/rust-lang/rust/pull/148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list. Fixes https://github.com/rust-lang/rust/issues/149945 Cc @scottmcm @bjorn3	2026-02-04 17:12:58 +00:00
Marcelo Domínguez	212c8c3811	Remove dummy loads	2026-02-04 15:26:56 +01:00
Jonathan Brouwer	9fd5712bf5	Rollup merge of #151526 - ZuseZ4:fix-autodiff-codegen-tests, r=oli-obk Fix autodiff codegen tests Preparing autodiff for release on nightly. Since we haven't been running these tests in CI, they regressed over the last months. These changes fix this and hopefully make the tests more robust for the future. r? compiler	2026-02-04 14:39:19 +01:00
Folkert de Vries	ef7a7809c7	add test for simd from array repeat codegen	2026-02-03 22:44:44 +01:00
Jacob Pratt	e2c5b89d2a	Rollup merge of #151958 - chahar-ritik:add-slp-vectorization-test, r=jieyouxu Add codegen test for SLP vectorization close: rust-lang/rust#142519 This PR adds a codegen regression test for rust-lang/rust#142519. A regression in LLVM caused auto-vectorization to fail, leading to significant performance loss. The SLP vectorizer correctly groups the 4-byte operations into <4 x i8> vectors. The loop state is maintained in SIMD registers (phi <4 x i8>). The test remains robust across architectures (AArch64 vs x86_64) by allowing flexible store types (i32 or <4 x i8>).	2026-02-02 23:12:05 -05:00
ltdk	28feae0c87	Move bigint helper tracking issues	2026-02-02 18:45:26 -05:00
Ritik Chahar	8476e893e7	Update min-llvm-version: 22 Co-authored-by: Nikita Popov <github@npopov.com>	2026-02-02 16:47:09 +05:30
ritik chahar	6176945223	fix: remove space for tidy and only for x86_64	2026-02-02 16:05:08 +05:30
ritik chahar	0830a5a928	fix: add min-llvm-version	2026-02-02 15:44:50 +05:30
ritik chahar	95ac5673ce	Fix SLP vectorization test CHECK patterns	2026-02-02 15:38:26 +05:30
ritik chahar	c64f9a0fc4	Add backlink to issue	2026-02-02 07:38:14 +05:30
ritik chahar	1c396d24dd	Restrict test to x86_64 per reviewer feedback	2026-02-01 22:14:13 +05:30
ritik chahar	0a60bd653d	fix: remove trailing newline for tidy	2026-02-01 22:09:05 +05:30
ritik chahar	2292d53b7b	Add codegen test for SLP vectorization	2026-02-01 21:41:43 +05:30
Nikita Popov	acb5ee2f84	Disable append-elements.rs test with debug assertions The IR is a bit different (in particular wrt naming) if debug-assertions-std is enabled. Peculiarly, the issue goes away if overflow-check-std is also enabled, which is why CI did not catch this.	2026-01-30 13:01:22 +01:00
Stuart Cook	3d102a7812	Rollup merge of #150893 - ZuseZ4:move-un-register-lib, r=oli-obk offload: move (un)register lib into global_ctors Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards. What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes. Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in https://github.com/rust-lang/rust/pull/150683, where I introduce a new variant of our offload intrinsic. r? oli-obk	2026-01-28 19:03:51 +11:00
Manuel Drehwald	35ce8ab120	adjust testcase for new logic	2026-01-27 10:43:21 -08:00
Stuart Cook	1c892e829c	Rollup merge of #147436 - okaneco:eq_ignore_ascii_autovec, r=scottmcm slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization - Refactor the current functionality into a helper function - Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function - Add a codegen test checking for vectorization and no panicking - Add benches for `eq_ignore_ascii_case` --- The optimized function is initially only enabled for x86_64 which has `sse2` as part of its baseline, but none of the code is platform specific. Other platforms with SIMD instructions may also benefit from this implementation. Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a length check for greater than or equal to 16. Benchmarks - Cases below 16 bytes are unaffected, cases above all show sizeable improvements. ``` before: str::eq_ignore_ascii_case::bench_large_str_eq 4942.30ns/iter +/- 48.20 str::eq_ignore_ascii_case::bench_medium_str_eq 632.01ns/iter +/- 16.87 str::eq_ignore_ascii_case::bench_str_17_bytes_eq 16.28ns/iter +/- 0.45 str::eq_ignore_ascii_case::bench_str_31_bytes_eq 35.23ns/iter +/- 2.28 str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq 7.56ns/iter +/- 0.22 str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq 2.64ns/iter +/- 0.06 after: str::eq_ignore_ascii_case::bench_large_str_eq 611.63ns/iter +/- 28.29 str::eq_ignore_ascii_case::bench_medium_str_eq 77.10ns/iter +/- 19.76 str::eq_ignore_ascii_case::bench_str_17_bytes_eq 3.49ns/iter +/- 0.39 str::eq_ignore_ascii_case::bench_str_31_bytes_eq 3.50ns/iter +/- 0.27 str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq 7.27ns/iter +/- 0.09 str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq 2.60ns/iter +/- 0.05 ```	2026-01-27 17:36:35 +11:00
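
The structure described above (a scalar helper, a chunked fast path, and a length gate at 16 bytes) might look roughly like the following sketch. It swaps in `chunks_exact` where the PR uses `as_chunks`, and the names `CHUNK`, `eq_ignore_ascii_case_scalar` and `eq_ignore_ascii_case_chunked` are hypothetical; whether the inner loop actually vectorizes depends on the compiler and target, and this is not the standard library's implementation.

```rust
/// Chunk width for the fast path; 16 matches the length gate in the PR text.
const CHUNK: usize = 16;

/// Plain byte-by-byte comparison, standing in for the pre-existing logic.
fn eq_ignore_ascii_case_scalar(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.eq_ignore_ascii_case(y))
}

/// Chunked comparison: fixed-size blocks are compared without early exits
/// inside a block, which gives the optimizer a simple loop to vectorize.
fn eq_ignore_ascii_case_chunked(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Short inputs do not benefit, so they take the scalar path.
    if a.len() < CHUNK {
        return eq_ignore_ascii_case_scalar(a, b);
    }
    let a_chunks = a.chunks_exact(CHUNK);
    let b_chunks = b.chunks_exact(CHUNK);
    // Grab the tails up front; they are shorter than CHUNK.
    let (a_rest, b_rest) = (a_chunks.remainder(), b_chunks.remainder());
    for (ca, cb) in a_chunks.zip(b_chunks) {
        // Accumulate with `&` so the whole block is processed branch-free.
        let block_equal = ca
            .iter()
            .zip(cb)
            .fold(true, |acc, (x, y)| acc & x.eq_ignore_ascii_case(y));
        if !block_equal {
            return false;
        }
    }
    eq_ignore_ascii_case_scalar(a_rest, b_rest)
}

fn main() {
    let a = b"The Quick Brown Fox Jumps Over The Lazy Dog";
    let b = b"the quick brown fox jumps over the lazy dog";
    assert!(eq_ignore_ascii_case_chunked(a, b));
    assert!(!eq_ignore_ascii_case_chunked(a, b"the quick brown fox jumps over the lazy cat"));
    assert!(eq_ignore_ascii_case_chunked(b"Short", b"sHORT"));
}
```

Checking equality once per block rather than once per byte is the design choice that trades a slightly later exit on mismatches for a loop body without data-dependent branches.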
Trevor Gross	2e365985be	Stabilize `core::hint::cold_path` `cold_path` has been around unstably for a while and is a rather useful tool to have. It does what it is supposed to and there are no known remaining issues, so stabilize it here (including const). Newly stable API: // in core::hint pub const fn cold_path(); I have opted to exclude `likely` and `unlikely` for now since they have had some concerns about ease of use that `cold_path` doesn't suffer from. `cold_path` is also significantly more flexible; in addition to working with boolean `if` conditions, it can be used in `match` arms, `if let`, closures, and other control flow blocks. `likely` and `unlikely` are also possible to implement in user code via `cold_path`, if desired.	2026-01-26 13:43:06 -06:00
Jonathan Pallant	6ecb3f33f0	Adds two new Tier 3 targets - `aarch64v8r-unknown-none` and `aarch64v8r-unknown-none-softfloat`. The existing `aarch64-unknown-none` target assumes Armv8.0-A as a baseline. However, Arm recently released the Arm Cortex-R82 processor which is the first to implement the Armv8-R AArch64 mode architecture. This architecture is similar to Armv8-A AArch64, however it has a different set of mandatory features, and is based off of Armv8.4. It is largely unrelated to the existing Armv8-R architecture target (`armv8r-none-eabihf`), which only operates in AArch32 mode. The second `aarch64v8r-unknown-none-softfloat` target allows for possible Armv8-R AArch64 CPUs with no FPU, or for use-cases where FPU register stacking is not desired. As with the existing `aarch64-unknown-none` target we have coupled FPU support and Neon support together - there is no 'has FPU but does not have NEON' target proposed even though the architecture technically allows for it. This PR was developed by Ferrous Systems on behalf of Arm. Arm is the owner of these changes.	2026-01-26 12:43:52 +00:00
bors	873d4682c7	Auto merge of #151337 - the8472:bail-before-memcpy2, r=Mark-Simulacrum optimize `vec.extend(slice.to_vec())`, take 2 Redoing https://github.com/rust-lang/rust/pull/130998 It was reverted in https://github.com/rust-lang/rust/pull/151150 due to flakiness. I have traced this to layout randomization perturbing the test (the failure reproduces locally with layout randomization), which is now excluded.	2026-01-25 19:45:35 +00:00
Matthias Krüger	0de96f455d	Rollup merge of #151405 - heiher:fix-cli, r=Mark-Simulacrum LoongArch: Fix call-llvm-intrinsics test	2026-01-25 16:27:23 +01:00
Matthias Krüger	f6a8326a99	Rollup merge of #151404 - heiher:fix-dae, r=Mark-Simulacrum LoongArch: Fix direct-access-external-data test On LoongArch targets, `-Cdirect-access-external-data` defaults to `no`. Since copy relocations are not supported, `dso_local` is not emitted under `-Crelocation-model=static`, unlike on other targets.	2026-01-25 16:27:22 +01:00
Matthias Krüger	9dffb21112	Rollup merge of #150065 - is57primenumber:add-slice-cse-test, r=Mark-Simulacrum add CSE optimization tests for iterating over slice This PR is a regression test for issue rust-lang/rust#119573. This PR introduces a new regression test to verify a critical optimization known as Common Subexpression Elimination (CSE) is correctly applied during various slice iteration patterns.	2026-01-25 07:42:59 +01:00
Matthias Krüger	b651be2191	Rollup merge of #145393 - clubby789:issue-138497, r=Mark-Simulacrum Add codegen test for removing trailing zeroes from `NonZero` Closes rust-lang/rust#138497	2026-01-25 07:42:56 +01:00
bors	75963ce795	Auto merge of #151065 - nagisa:add-preserve-none-abi, r=petrochenkov abi: add a rust-preserve-none calling convention This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature. For relatively tight loops implemented with tail calling (`become`) each of the functions with the regular calling convention is still responsible for restoring the initial value of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of: foo: push r12 ; do things pop r12 jmp next_step This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or other uses. I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come up with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)	2026-01-25 02:49:32 +00:00
Matthias Krüger	3a69035338	Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee add `simd_splat` intrinsic Add `simd_splat` which lowers to the LLVM canonical splat sequence. ```llvm insertelement <N x elem> poison, elem %x, i32 0 shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer ``` Right now we try to fake it using one of ```rust fn splat(x: u32) -> u32x8 { u32x8::from_array([x; 8]) } ``` or (in `stdarch`) ```rust fn splat(value: $elem_type) -> $name { #[derive(Copy, Clone)] #[repr(simd)] struct JustOne([$elem_type; 1]); let one = JustOne([value]); // SAFETY: 0 is always in-bounds because we're shuffling // a simd type with exactly one element. unsafe { simd_shuffle!(one, one, [0; $len]) } } ``` Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples: - https://github.com/rust-lang/rust/issues/60637 - https://github.com/rust-lang/rust/issues/137407 - https://github.com/rust-lang/rust/issues/122623 - https://github.com/rust-lang/rust/issues/97804 --- As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends. Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below. Currently this just adds the intrinsic, it does not actually use it anywhere yet.	2026-01-24 21:04:15 +01:00

332 commits