Improve performance of spsc_queue and stream. This PR makes two main changes: 1. It switches the `spsc_queue` node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer only counter of the number of node eligible to be cached. 2. It separates the consumer and producers fields of `spsc_queue` and `stream` into a producer cache line and consumer cache line. Overall, it speeds up `mpsc` in `spsc` mode by 2-10x. Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure. The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there, it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC. Benchmark runner is [here]( |
||
|---|---|---|
| .. | ||
| mpsc | ||
| barrier.rs | ||
| condvar.rs | ||
| mod.rs | ||
| mutex.rs | ||
| once.rs | ||
| rwlock.rs | ||