Rewrite native thread-local storage
(part of #110897)
The current native thread-local storage implementation has become quite messy, uses nondescriptive names, and adds unnecessary code to the macro expansion. This PR tries to fix that with a new implementation that also allows more layout optimizations and potentially improves performance by eliminating unnecessary TLS accesses.
This does not change the recursive initialization behaviour I described in [this comment](https://github.com/rust-lang/rust/issues/110897#issuecomment-1525705682), so it should be a library-only change. Changing that behaviour should be quite easy with the new implementation, however.
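
For reviewers who want a quick sanity check that the user-visible contract is unaffected, here is a minimal sketch of the lazy, once-per-thread initialization that `thread_local!` provides. It only uses the public macro and does not touch the internals changed here; `INIT_RUNS`, `COUNTER` and `init_runs_in_fresh_thread` are hypothetical names made up for this illustration, and it deliberately does not exercise the recursive case.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical counter: how often the lazy initializer below has run,
// summed over all threads.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Lazily initialized: the initializer block runs on a thread's first
    // access to COUNTER, not when the thread starts.
    static COUNTER: Cell<u32> = {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        Cell::new(0)
    };
}

/// Hypothetical helper: spawns a fresh thread that accesses COUNTER
/// `accesses` times and returns how many initializer runs that caused.
/// Assumes it is not called from several threads at once.
fn init_runs_in_fresh_thread(accesses: u32) -> usize {
    let before = INIT_RUNS.load(Ordering::SeqCst);
    thread::spawn(move || {
        for _ in 0..accesses {
            COUNTER.with(|c| c.set(c.get() + 1));
        }
    })
    .join()
    .unwrap();
    INIT_RUNS.load(Ordering::SeqCst) - before
}

fn main() {
    // A thread that never touches COUNTER never runs the initializer.
    assert_eq!(init_runs_in_fresh_thread(0), 0);
    // A thread that touches it several times runs the initializer once.
    assert_eq!(init_runs_in_fresh_thread(3), 1);
}
```

Both assertions are expected to hold before and after this PR, since only the implementation behind the macro changes.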
r? `@m-ou-se`
`@rustbot` label +T-libs