Replace pthread `RwLock` with custom implementation
This is one of the last items in #93740. I'm doing `RwLock` first because it is more self-contained and has less tradeoffs to make. The motivation is explained in the documentation, but in short: the pthread rwlock is slow and buggy and `std` can do much better. I considered implementing a parking lot, as was discussed in the tracking issue, but settled for the queue-based version because writing self-balancing binary trees is not fun in Rust...
This is a rather complex change, so I have added quite a bit of documentation to help explain it. Please point out any part that could be explained better.
~~The read performance is really good, I'm getting 4x the throughput of the pthread version and about the same performance as usync/parking_lot on an Apple M1 Max in the usync benchmark suite, but the write performance still falls way behind what usync and parking_lot achieve. I tried using a separate queue lock like what usync uses, but that didn't help. I'll try to investigate further in the future, but I wanted to get some eyes on this first.~~ [Resolved](https://github.com/rust-lang/rust/pull/110211#issuecomment-1513682336)
r? `@m-ou-se`
CC `@kprotty`
`local_key_cell_methods` has been stable for a while and provides a much less
clunky way to interface with thread-local variables.
Additionaly add context to the documentation about why types with interior
mutability are needed.
Document memory orderings of `thread::{park, unpark}`
Document `thread::park/unpark` as having acquire/release synchronization. Without that guarantee, even the example in the documentation can deadlock:
```rust
let flag = Arc::new(AtomicBool::new(false));
let t2 = thread::spawn(move || {
while !flag.load(Ordering::Acquire) {
thread::park();
}
});
flag.store(true, Ordering::Release);
t2.thread().unpark();
// t1: flag.store(true)
// t1: thread.unpark()
// t2: flag.load() == false
// t2 now parks, is immediately unblocked but never
// acquires the flag, and thus spins forever
```
Multiple calls to `unpark` should also maintain a release sequence to make sure operations released by previous `unpark`s are not lost:
```rust
let a = Arc::new(AtomicBool::new(false));
let b = Arc::new(AtomicBool::new(false));
let t2 = thread::spawn(move || {
while !a.load(Ordering::Acquire) || !b.load(Ordering::Acquire) {
thread::park();
}
});
thread::spawn(move || {
a.store(true, Ordering::Release);
t2.thread().unpark();
});
b.store(true, Ordering::Release);
t2.thread().unpark();
// t1: a.store(true)
// t1: t2.unpark()
// t3: b.store(true)
// t3: t2.unpark()
// t2 now parks, is immediately unblocked but never
// acquires the store of `a`, only the store of `b` which
// was released by the most recent unpark, and thus spins forever
```
This is of course a contrived example, but is reasonable to rely upon in real code.
Note that all implementations of park/unpark already comply with the rules, it's just undocumented.
Don't claim `LocalKey::with` prevents a reference to be sent across threads
The documentation for `LocalKey` claims that `with` yields a reference that cannot be sent across threads, but this is false since you can easily do that with scoped threads. What it actually prevents is the reference from outliving the current thread.
Restructure and rename std thread_local internals to make it less of a maze
Every time I try to work on std's thread local internals, it feels like I'm trying to navigate a confusing maze made of macros, deeply nested modules, and types with multiple names/aliases. Time to clean it up a bit.
This PR:
- Exports `Key` with its own name (`Key`), instead of `__LocalKeyInner`
- Uses `pub macro` to put `__thread_local_inner` into a (unstable, hidden) module, removing `#[macro_export]`, removing it from the crate root.
- Removes the `__` from `__thread_local_inner`.
- Removes a few unnecessary `allow_internal_unstable` features from the macros
- Removes the `libstd_thread_internals` feature. (Merged with `thread_local_internals`.)
- And removes it from the unstable book
- Gets rid of the deeply nested modules for the `Key` definitions (`mod fast` / `mod os` / `mod statik`).
- Turns a `#[cfg]` mess into a single `cfg_if`, now that there's no `#[macro_export]` anymore that breaks with `cfg_if`.
- Simplifies the `cfg_if` conditions to not repeat the conditions.
- Removes useless `normalize-stderr-test`, which were left over from when the `Key` types had different names on different platforms.
- Removes a seemingly unnecessary `realstd` re-export on `cfg(test)`.
This PR changes nothing about the thread local implementation. That's for a later PR. (Which should hopefully be easier once all this stuff is a bit cleaned up.)
For larger applications it's important that users set `RUST_MIN_STACK`
at the start of their program because `min_stack` caches the value.
Not doing so can lead to their `env::set_var` call surprisingly not having any effect.
This allows removing all the platform-dependent code from `library/std/src/thread/local.rs` and `library/std/src/thread/mod.rs`
Signed-off-by: Ayush Singh <ayushsingh1325@gmail.com>
Move __thread_local_inner macro in crate:🧵:local to crate::sys.
Currently, the tidy check does not fail for `library/std/src/thread/local.rs` even though it contains platform specific code. This is beacause target_family did not exist at the time the tidy checks were written [1].
[1]: https://github.com/rust-lang/rust/pull/105861#discussion_r1125841678
Signed-off-by: Ayush Singh <ayushsingh1325@gmail.com>
std tests: use __OsLocalKeyInner from realstd
This is basically the same as https://github.com/rust-lang/rust/pull/100201, but for __OsLocalKeyInner:
Some std tests are failing in Miri on Windows because [this static](a377893da2/library/std/src/sys/windows/thread_local_key.rs (L234-L239)) is getting duplicated, and Miri does not handle that properly -- Miri does not support this magic `.CRT$XLB` linker section, but instead just looks up this particular hard-coded static in the standard library. This PR lets the test suite use the std static instead of having its own copy.
Fixes https://github.com/rust-lang/miri/issues/2754
r? `@thomcc`
Unify id-based thread parking implementations
Multiple platforms currently use thread-id-based parking implementations (NetBSD and SGX[^1]). Even though the strategy does not differ, these are duplicated for each platform, as the id is encoded into an atomic thread variable in different ways for each platform.
Since `park` is only called by one thread, it is possible to move the thread id into a separate field. By ensuring that the field is only written to once, before any other threads access it, these accesses can be unsynchronized, removing any restrictions on the size and niches of the thread id.
This PR also renames the internal `thread_parker` modules to `thread_parking`, as that name now better reflects their contents. I hope this does not add too much reviewing noise.
r? `@m-ou-se`
`@rustbot` label +T-libs
[^1]: SOLID supports this as well, I will switch it over in a follow-up PR.