Use atomic RMW for `{mutex, rwlock, cond, srwlock}_get_or_create_id` functions
This is required for #1963
`{mutex, rwlock, cond, srwlock}_get_or_create_id()` currently checks whether an ID field is 0 using an atomic read, allocate one and get a new ID if it is, then write it in a separate atomic write. This is fine without weak memory. For instance, in `pthread_mutex_lock` which may be called by two threads concurrently, only one thread can read 0, create and then write a new ID, the later-run thread will always see the newly created ID and never 0.
```rust
fn pthread_mutex_lock(&mut self, mutex_op: &OpTy<'tcx, Tag>) -> InterpResult<'tcx, i32> {
let this = self.eval_context_mut();
let kind = mutex_get_kind(this, mutex_op)?.check_init()?;
let id = mutex_get_or_create_id(this, mutex_op)?;
let active_thread = this.get_active_thread();
```
However, with weak memory behaviour, both threads may read 0: the first thread has to see 0 because nothing else was written to it, and the second thread is not guaranteed to observe the latest value, causing a duplicate mutex to be created and both threads "successfully" acquiring the lock at the same time.
This is a pretty typical pattern requiring the use of atomic RMWs. RMW *always* reads the latest value in a location, so only one thread can create the new mutex and ID, all others scheduled later will see the new ID.