Clarify HashMap's capacity handling.
HashMap has two notions of "capacity":
- "Usable capacity": the number of elements a hash map can hold without
resizing. This is the meaning of "capacity" used in HashMap's API,
e.g. the `with_capacity()` function.
- "Internal capacity": the number of allocated slots. Except for the
zero case, it is always larger than the usable capacity (because some
slots must be left empty) and is always a power of two.
HashMap's code is confusing because it does a poor job of
distinguishing these two meanings. I propose using two different terms
for these two concepts. Because "capacity" is already used in HashMap's
API to mean "usable capacity", I will use a different word for "internal
capacity". I propose "span", though I'm happy to consider other names.
Clean up hasher discussion on HashMap
* We never want to make guarantees about protecting against attacks.
* "True randomness" is not the right terminology to be using in this
context.
* There is significantly more nuance to the performance of SipHash than
"somewhat slow".
r? @steveklabnik
Follow up to discussion on #35371
This commit does the following.
- Changes the terminology for capacities used within HashMap's code.
"Internal capacity" is now consistently "raw capacity", and "usable
capacity" is now consistently just "capacity". This makes the code
easier to understand.
- Reworks capacity and raw capacity computations. Raw capacity
computations are now handled in a single place:
`DefaultResizePolicy::raw_capacity()`. This function correctly returns
zero when given zero, which means that the following cases now result
in a capacity of zero when they previously did not.
* `Hash{Map,Set}::with_capacity(0)`
* `Hash{Map,Set}::with_capacity_and_hasher(0)`
* `Hash{Map,Set}::shrink_to_fit()`, when used with a hash map/set whose
elements have all been removed
- Strengthens the language used in the comments describing the above
functions, to make it clearer when they will result in a map/set with
a capacity of zero. The new language is based on the language used for
the corresponding functions in `Vec`.
- Adds tests for the above zero-capacity cases.
- Removes `test_resize_policy` because it is no longer useful.
The following `HashMap` creation functions don't allocate heap storage for elements.
```
HashMap::new()
HashMap::default()
HashMap::with_hasher()
```
This is good, because it's surprisingly common to create a HashMap and never
use it. So that case should be cheap.
However, `HashSet` does not have the same behaviour. The corresponding creation
functions *do* allocate heap storage for the default number of non-zero
elements (which is 32 slots for 29 elements).
```
HashMap::new()
HashMap::default()
HashMap::with_hasher()
```
This commit gives `HashSet` the same behaviour as `HashMap`, by simply calling
the corresponding `HashMap` functions (something `HashSet` already does for
`with_capacity` and `with_capacity_and_hasher`). It also reformats one existing
`HashSet` construction to use a consistent single-line format.
This speeds up rustc itself by 1.01--1.04x on most of the non-tiny
rustc-benchmarks.
* We never want to make guarantees about protecting against attacks.
* "True randomness" is not the right terminology to be using in this
context.
* There is significantly more nuance to the performance of SipHash than
"somewhat slow".
Implement 1581 (FusedIterator)
* [ ] Implement on patterns. See https://github.com/rust-lang/rust/issues/27721#issuecomment-239638642.
* [ ] Handle OS Iterators. A bunch of iterators (`Args`, `Env`, etc.) in libstd wrap platform specific iterators. The current ones all appear to be well-behaved but can we assume that future ones will be?
* [ ] Does someone want to audit this? On first glance, all of the iterators on which I implemented `FusedIterator` appear to be well-behaved but there are a *lot* of them so a second pair of eyes would be nice.
* I haven't touched rustc internal iterators (or the internal rand) because rustc doesn't actually call `fuse()`.
* `FusedIterator` can't be implemented on `std::io::{Bytes, Chars}`.
Closes: #35602 (Tracking Issue)
Implements: rust-lang/rfcs#1581
Because of changes to how Rust acquires randomness HashMap is not
guaranteed to be DoS resistant. This commit reflects these changes in
the docs themselves and provides an alternative method to creating
a hash that is resistant if needed.
This is a rebase and extension of #31356 where we cache the keys in thread local
storage. This should give us a nice speed bost in creating hash maps along with
mostly retaining the property that all maps have a nondeterministic iteration
order.
Closes#27243
rand: don't block before random pool is initialized
If we attempt a read with getrandom() on Linux the syscall can block
before the random pool is initialized unless the GRND_NONBLOCK flag is
passed. This flag causes getrandom() to instead return EAGAIN while the
pool is uninitialized. To avoid downstream users of crate or std
functionality that have no ability to avoid this blocking behavior this
change causes Rust to read bytes from /dev/urandom while getrandom()
would block and once getrandom() is available to use that. Fixes#32953.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Make HashSet::Insert documentation more consistent
I have made the HashSet::Insert documentation more consistent in the use of the term 'value' vs 'key'. Also clarified that if _this_ value is present true is returned, instead of the ambiguous 'a value present'.
r? @steveklabnik