std: allow after-main use of synchronization primitives
By creating an unnamed thread handle when the actual one has already been destroyed, synchronization primitives using thread parking can be used even outside the Rust runtime.
This also fixes an inefficiency in the queue-based `RwLock`: if `thread::current` was not initialized yet, it will create a new handle on every parking attempt without initializing `thread::current`. The private `current_or_unnamed` function introduced here fixes this.
By creating an unnamed thread handle when the actual one has already been destroyed, synchronization primitives using thread parking can be used even outside the Rust runtime.
This also fixes an inefficiency in the queue-based `RwLock`: if `thread::current` was not initialized yet, it will create a new handle on every parking attempt without initializing `thread::current`. The private `current_or_unnamed` function introduced here fixes this.
Rwlock downgrade
Tracking Issue: #128203
This PR adds a `downgrade` method for `RwLock` / `RwLockWriteGuard` on all currently supported platforms.
Outstanding questions:
- [x] ~~Does the `futex.rs` change affect performance at all? It doesn't seem like it will but we can't be certain until we bench it...~~
- [x] ~~Should the SOLID platform implementation [be ported over](https://github.com/rust-lang/rust/pull/128219#discussion_r1693470090) to the `queue.rs` implementation to allow it to support downgrades?~~
This commit fixes a memory ordering bug in the futex implementation
(`Relaxed` -> `Release` on `downgrade`).
This commit also removes a badly written test that deadlocked and
replaces it with a more reasonable test based on an already-tested
`downgrade` test from the parking-lot crate.
split up the first paragraph of doc comments for better summaries
used `./x clippy -Aclippy::all '-Wclippy::too_long_first_doc_paragraph' library/core library/alloc` to find these issues.
The channel's `Block::new` was causing a stack overflow because it held
32 item slots, instantiated on the stack before moving to `Box::new`.
The 32x multiplier made modestly-large item sizes untenable.
That block is now initialized directly on the heap.
Fixes#102246
Since the stabilization in #127679 has reached stage0, 1.82-beta, we can
start using `&raw` freely, and even the soft-deprecated `ptr::addr_of!`
and `ptr::addr_of_mut!` can stop allowing the unstable feature.
I intentionally did not change any documentation or tests, but the rest
of those macro uses are all now using `&raw const` or `&raw mut` in the
standard library.
In the implementation of `force_mut`, I chose performance over safety.
For `LazyLock` this isn't really a choice; the code has to be unsafe.
But for `LazyCell`, we can have a full-safe implementation, but it will
be a bit less performant, so I went with the unsafe approach.
The top-level docs for `LazyLock` included two lines of code, each
with an accompanying comment, that were identical and with nearly-
identical comments. This looks like an oversight from a past edit
which was perhaps trying to rewrite an existing example but ended
up duplicating rather than replacing, though I haven't gone back
through the Git history to check.
This commit removes what I personally think is the less-clear of
the two examples.
Signed-off-by: Andrew Lilley Brinker <alilleybrinker@gmail.com>
Rollup of 9 pull requests
Successful merges:
- #127474 (doc: Make block of inline Deref methods foldable)
- #129678 (Deny imports of `rustc_type_ir::inherent` outside of type ir + new trait solver)
- #129738 (`rustc_mir_transform` cleanups)
- #129793 (add extra linebreaks so rustdoc can identify the first sentence)
- #129804 (Fixed some typos in the standard library documentation/comments)
- #129837 (Actually parse stdout json, instead of using hacky contains logic.)
- #129842 (Fix LLVM ABI NAME for riscv64imac-unknown-nuttx-elf)
- #129843 (Mark myself as on vacation for triagebot)
- #129858 (Replace walk with visit so we dont skip outermost expr kind in def collector)
Failed merges:
- #129777 (Add `unreachable_pub`, round 4)
- #129868 (Remove kobzol vacation status)
r? `@ghost`
`@rustbot` modify labels: rollup
Rollup of 7 pull requests
Successful merges:
- #123813 (Add `REDUNDANT_IMPORTS` lint for new redundant import detection)
- #126697 ([RFC] mbe: consider the `_` in 2024 an expression)
- #127159 (match lowering: Hide `Candidate` from outside the lowering algorithm)
- #128244 (Peel off explicit (or implicit) deref before suggesting clone on move error in borrowck, remove some hacks)
- #128431 (Add myself as VxWorks target maintainer for reference)
- #128438 (Add special-case for [T, 0] in dropck_outlives)
- #128457 (Fix docs for OnceLock::get_mut_or_init)
r? `@ghost`
`@rustbot` modify labels: rollup
std: implement the `once_wait` feature
Tracking issue: #127527
This additionally adds a `wait_force` method to `Once` that doesn't panic on poison.
I also took the opportunity and cleaned up up the code of the queue-based implementation a bit.
Use ThreadId instead of TLS-address in `ReentrantLock`
Fixes#123458
`ReentrantLock` currently uses the address of a thread local variable as an ID that's unique across all currently running threads. This can lead to uninituitive behavior as in #123458 if TLS blocks get reused. This PR changes `ReentrantLock` to instead use the `ThreadId` provided by `std` as the unique ID. `ThreadId` guarantees uniqueness across the lifetime of the whole process, so we don't need to worry about reusing IDs of terminated threads. The main appeal of this PR is thus the possibility of changing the `ReentrantLock` API to guarantee that if a thread leaks a lock guard, no other thread may ever acquire that lock again.
This does entail some complications:
- previously, the only way to retrieve the current thread ID would've been using `thread::current().id()` which creates a temporary `Arc` and which isn't available in TLS destructors. As part of this PR, the thread ID instead gets cached in its own thread local, as suggested [here](https://github.com/rust-lang/rust/issues/123458#issuecomment-2038207704).
- `ThreadId` is always 64-bit whereas the current implementation uses a usize-sized ID. Since this ID needs to be updated atomically, we can't simply use a single atomic variable on 32 bit platforms. Instead, we fall back to using a (sound) seqlock on 32-bit platforms, which works because only one thread at a time can write to the ID. This seqlock is technically susceptible to the ABA problem, but the attack vector to create actual unsoundness has to be very specific:
- You would need to be able to lock+unlock the lock exactly 2^31 times (or a multiple thereof) while a thread trying to lock it sleeps
- The sleeping thread would have to suspend after reading one half of the thread id but before reading the other half
- The teared result from combining the halves of the thread ID would have to exactly line up with the sleeping thread's ID
The risk of this occurring seems slim enough to be acceptable to me, but correct me if I'm wrong. This also means that the size of the lock increases by 8 bytes on 32-bit platforms, but this also shouldn't be an issue.
Performance wise, I did some crude testing of the only case where this could lead to real slowdowns, which is the case of locking a `ReentrantLock` that's already locked by the current thread. On both aarch64 and x86-64, there is (expectedly) pretty much no performance hit. I didn't have any 32-bit platforms to test the seqlock performance on, so I did the next best thing and just forced the 64-bit platforms to use the seqlock implementation. There, the performance degraded by ~1-2ns/(lock+unlock) on x86-64 and ~6-8ns/(lock+unlock) on aarch64, which is measurable but seems acceptable to me seeing as 32-bit platforms should be a small minority anyways.
cc `@joboet` `@RalfJung` `@CAD97`