rust/library/core/src
bors 8c4fc9d9a4 Auto merge of #94598 - scottmcm:prefix-free-hasher-methods, r=Amanieu
Add a dedicated length-prefixing method to `Hasher`

This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
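
To make the first goal concrete, here is a minimal sketch of the collision that a length prefix avoids. It runs on stable and does not call the new unstable method directly. `Recorder`, `feed_without_prefix`, and `feed_with_prefix` are hypothetical names introduced only for this illustration.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte it is fed instead of mixing it.
#[derive(Default)]
struct Recorder(Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not used in this sketch; only the recorded bytes matter.
    }
}

/// Feeds only the elements of each part, with no length prefix.
fn feed_without_prefix(parts: &[&[u8]]) -> Vec<u8> {
    let mut recorder = Recorder::default();
    for part in parts {
        for byte in *part {
            byte.hash(&mut recorder);
        }
    }
    recorder.0
}

/// Feeds each part through `Hash for [u8]`, which writes the length first.
fn feed_with_prefix(parts: &[&[u8]]) -> Vec<u8> {
    let mut recorder = Recorder::default();
    for part in parts {
        part.hash(&mut recorder);
    }
    recorder.0
}
```

Without the prefix, `[1, 2, 3]` followed by `[4, 5]` feeds the same bytes as `[1, 2]` followed by `[3, 4, 5]`. With the prefix, the two byte sequences differ.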

This does not change rustc-hash, since that's in an external crate, but it could potentially use the new method in the future.

Fixes #94026

r? rust-lang/libs

---

The core of this change is the following two new methods on `Hasher`:

```rust
pub trait Hasher {
    /// Writes a length prefix into this hasher, as part of being prefix-free.
    ///
    /// If you're implementing [`Hash`] for a custom collection, call this before
    /// writing its contents to this `Hasher`.  That way
    /// `(collection![1, 2, 3], collection![4, 5])` and
    /// `(collection![1, 2], collection![3, 4, 5])` will provide different
    /// sequences of values to the `Hasher`.
    ///
    /// The `impl<T> Hash for [T]` includes a call to this method, so if you're
    /// hashing a slice (or array or vector) via its `Hash::hash` method,
    /// you should **not** call this yourself.
    ///
    /// This method is only for providing domain separation.  If you want to
    /// hash a `usize` that represents part of the *data*, then it's important
    /// that you pass it to [`Hasher::write_usize`] instead of to this method.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(hasher_prefixfree_extras)]
    /// # // Stubs to make the `impl` below pass the compiler
    /// # struct MyCollection<T>(Option<T>);
    /// # impl<T> MyCollection<T> {
    /// #     fn len(&self) -> usize { todo!() }
    /// # }
    /// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
    /// #     type Item = T;
    /// #     type IntoIter = std::iter::Empty<T>;
    /// #     fn into_iter(self) -> Self::IntoIter { todo!() }
    /// # }
    ///
    /// use std::hash::{Hash, Hasher};
    /// impl<T: Hash> Hash for MyCollection<T> {
    ///     fn hash<H: Hasher>(&self, state: &mut H) {
    ///         state.write_length_prefix(self.len());
    ///         for elt in self {
    ///             elt.hash(state);
    ///         }
    ///     }
    /// }
    /// ```
    ///
    /// # Note to Implementers
    ///
    /// If you've decided that your `Hasher` is willing to be susceptible to
    /// Hash-DoS attacks, then you might consider skipping hashing some or all
    /// of the `len` provided in the name of increased performance.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_length_prefix(&mut self, len: usize) {
        self.write_usize(len);
    }

    /// Writes a single `str` into this hasher.
    ///
    /// If you're implementing [`Hash`], you generally do not need to call this,
    /// as the `impl Hash for str` does, so you can just use that.
    ///
    /// This includes the domain separator for prefix-freedom, so you should
    /// **not** call `Self::write_length_prefix` before calling this.
    ///
    /// # Note to Implementers
    ///
    /// The default implementation of this method includes a call to
    /// [`Self::write_length_prefix`], so if your implementation of `Hasher`
    /// doesn't care about prefix-freedom and you've thus overridden
    /// that method to do nothing, there's no need to override this one.
    ///
    /// This method is available to be overridden separately from the others
    /// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
    /// can be used to provide prefix-freedom cheaper than hashing a length.
    ///
    /// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
    /// them into a buffer), then you can hash the bytes of the `str` followed
    /// by a single `0xFF` byte.
    ///
    /// If your `Hasher` works in chunks, you can also do this by being careful
    /// about how you pad partial chunks.  If the chunks are padded with `0x00`
    /// bytes then just hashing an extra `0xFF` byte doesn't necessarily
    /// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
    /// the same sequence of chunks.  But if you pad with `0xFF` bytes instead,
    /// ensuring at least one padding byte, then it can often provide
    /// prefix-freedom cheaper than hashing the length would.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_str(&mut self, s: &str) {
        self.write_length_prefix(s.len());
        self.write(s.as_bytes());
    }
}
```

The `Hash` implementations for slices and containers are updated to call `write_length_prefix` instead of `write_usize`.

`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing.  But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available) it overrides `write_str` to continue to use the add-non-UTF-8-byte approach.
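
The chunking problem can be sketched as follows. This is an illustrative model, not how any particular hasher is implemented. The 8-byte `CHUNK` size and the helpers `pad_chunks`, `zero_padded_then_terminator`, and `ff_padded` are assumptions made for the example.

```rust
/// Hypothetical chunk size for the illustration.
const CHUNK: usize = 8;

/// Splits `bytes` into fixed-size chunks, filling the tail with `pad`.
/// With `force_pad`, an extra all-padding chunk is emitted when the input
/// is empty or already fills its last chunk exactly, so that at least one
/// padding byte is always present.
fn pad_chunks(bytes: &[u8], pad: u8, force_pad: bool) -> Vec<[u8; CHUNK]> {
    let mut out = Vec::new();
    for piece in bytes.chunks(CHUNK) {
        let mut chunk = [pad; CHUNK];
        chunk[..piece.len()].copy_from_slice(piece);
        out.push(chunk);
    }
    if force_pad && bytes.len() % CHUNK == 0 {
        out.push([pad; CHUNK]);
    }
    out
}

/// Models a chunked hasher that zero-pads each `write` call on its own and
/// then receives a separate `write_u8(0xFF)` terminator.
fn zero_padded_then_terminator(s: &str) -> Vec<[u8; CHUNK]> {
    let mut chunks = pad_chunks(s.as_bytes(), 0x00, false);
    chunks.extend(pad_chunks(&[0xFF], 0x00, false));
    chunks
}

/// Models padding the string bytes with `0xFF` instead, always adding at
/// least one padding byte. UTF-8 never contains `0xFF`, so the padding
/// cannot be confused with string data.
fn ff_padded(s: &str) -> Vec<[u8; CHUNK]> {
    pad_chunks(s.as_bytes(), 0xFF, true)
}
```

Under this model, `zero_padded_then_terminator("a")` and `zero_padded_then_terminator("a\u{0}")` produce identical chunks, while `ff_padded` keeps the two strings apart.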

---

Compatibility:

Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.
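
A small sketch of that compatibility argument, using a local stand-in trait so it compiles on stable. `SketchHasher`, `Existing`, `NoPrefix`, `hash_slice_old`, and `hash_slice_new` are hypothetical names; only the forwarding default mirrors the code above.

```rust
/// Hypothetical stand-in for the relevant part of `Hasher`.
trait SketchHasher {
    fn write(&mut self, bytes: &[u8]);

    fn write_usize(&mut self, n: usize) {
        self.write(&n.to_ne_bytes());
    }

    /// Default mirrors the change: forward the length to `write_usize`.
    fn write_length_prefix(&mut self, len: usize) {
        self.write_usize(len);
    }
}

/// An "existing" hasher that knows nothing about the new method.
#[derive(Default)]
struct Existing(Vec<u8>);

impl SketchHasher for Existing {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
}

/// A hasher that accepts Hash-DoS risk and skips hashing lengths.
#[derive(Default)]
struct NoPrefix(Vec<u8>);

impl SketchHasher for NoPrefix {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    fn write_length_prefix(&mut self, _len: usize) {}
}

/// Old slice hashing: the length goes through `write_usize`.
fn hash_slice_old<H: SketchHasher>(state: &mut H, data: &[u8]) {
    state.write_usize(data.len());
    state.write(data);
}

/// New slice hashing: the length goes through `write_length_prefix`.
fn hash_slice_new<H: SketchHasher>(state: &mut H, data: &[u8]) {
    state.write_length_prefix(data.len());
    state.write(data);
}
```

For `Existing`, both functions feed identical bytes, so its hashes are unchanged. `NoPrefix` sees only the element bytes when hashed through `hash_slice_new`.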
2022-05-06 09:43:57 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| alloc | fix Layout struct member naming style | 2022-04-11 13:35:18 +08:00 |
| array | trivial cfg(bootstrap) changes | 2022-04-05 23:18:40 +02:00 |
| async_iter | Add Stream alias for AsyncIterator | 2022-03-15 20:59:13 -07:00 |
| char | No need to check the assert all the time. | 2022-04-16 19:30:23 +01:00 |
| convert | Rollup merge of #96006 - hkBst:patch-2, r=Dylan-DPC | 2022-04-13 17:35:37 +02:00 |
| ffi | generalize "incoherent impls" impl for custom types | 2022-05-05 10:53:00 +02:00 |
| fmt | Rollup merge of #95438 - m-ou-se:sync-unsafe-cell, r=joshtriplett | 2022-04-04 20:41:32 +02:00 |
| future | Rename IntoFuture::Future to IntoFuture::IntoFuture | 2022-03-10 20:51:52 +01:00 |
| hash | For now, don't change the details of hashing a str | 2022-05-06 00:14:44 -07:00 |
| iter | This aligns the inline attributes of existing __iterator_get_unchecked with those of next() on adapters that have both. | 2022-05-02 20:54:46 +02:00 |
| macros | Fix some links in the standard library | 2022-05-01 00:02:34 +03:00 |
| mem | MaybeUninit array cleanup | 2022-04-15 20:53:50 -04:00 |
| num | Update int_roundings methods from feedback | 2022-05-04 23:20:29 -04:00 |
| ops | Add do yeet expressions to allow experimentation in nightly | 2022-04-30 17:40:27 -07:00 |
| panic | Auto merge of #96348 - overdrivenpotato:inline-location, r=the8472 | 2022-04-30 16:33:12 +00:00 |
| prelude | Create 2024 edition | 2022-04-02 02:45:49 -04:00 |
| ptr | Fix typo in offset_from documentation | 2022-05-02 14:41:21 +00:00 |
| slice | This aligns the inline attributes of existing __iterator_get_unchecked with those of next() on adapters that have both. | 2022-05-02 20:54:46 +02:00 |
| str | Make some usize-typed masks definition agnostic to the size of usize | 2022-04-15 17:04:59 +02:00 |
| sync | Rollup merge of #95354 - dtolnay:rustc_const_stable, r=lcnr | 2022-04-02 03:34:21 +02:00 |
| task | Rollup merge of #89869 - kpreid:from-doc, r=yaahc | 2022-02-17 06:29:57 +01:00 |
| unicode | Regenerate tables for Unicode 14.0.0 | 2021-10-06 17:49:33 -07:00 |
| any.rs | Use implicit capture syntax in format_args | 2022-03-10 10:23:40 -05:00 |
| ascii.rs | Inline <EscapeDefault as Iterator>::next | 2022-03-10 15:35:22 +01:00 |
| bool.rs | Stabilize bool::then_some | 2022-05-04 13:22:08 +02:00 |
| borrow.rs | Make Borrow and BorrowMut impls const | 2021-12-04 21:57:39 +09:00 |
| cell.rs | Add tracking issue for sync_unsafe_cell. | 2022-03-29 19:54:00 +02:00 |
| clone.rs | trivial cfg(bootstrap) changes | 2022-04-05 23:18:40 +02:00 |
| cmp.rs | Derive Eq for std::cmp::Ordering, instead of using manual impl. | 2022-03-16 11:36:31 -05:00 |
| default.rs | Add documentation | 2022-04-07 20:03:24 -04:00 |
| hint.rs | Add core::hint::must_use | 2022-03-08 10:58:03 -08:00 |
| internal_macros.rs | ignore a doctest for the non-exported macro | 2022-05-03 18:33:56 +09:00 |
| intrinsics.rs | Rollup merge of #96174 - RalfJung:no-run-transmute, r=scottmcm | 2022-05-05 19:34:22 -07:00 |
| lazy.rs | Rollup merge of #89869 - kpreid:from-doc, r=yaahc | 2022-02-17 06:29:57 +01:00 |
| lib.rs | Auto merge of #96010 - eduardosm:Unique-on-top-of-NonNull, r=m-ou-se,tmiasko | 2022-04-17 05:26:08 +00:00 |
| marker.rs | trivial cfg(bootstrap) changes | 2022-04-05 23:18:40 +02:00 |
| option.rs | Add do yeet expressions to allow experimentation in nightly | 2022-04-30 17:40:27 -07:00 |
| panic.rs | resolve the conflict in compiler/rustc_session/src/parse.rs | 2022-03-16 20:12:30 +08:00 |
| panicking.rs | trivial cfg(bootstrap) changes | 2022-04-05 23:18:40 +02:00 |
| pin.rs | Fix formatting error in pin.rs docs | 2022-04-10 12:41:31 -07:00 |
| primitive.rs | mv std libs to library/ | 2020-07-27 19:51:13 -05:00 |
| primitive_docs.rs | Use implicit capture syntax in format_args | 2022-03-10 10:23:40 -05:00 |
| result.rs | Add do yeet expressions to allow experimentation in nightly | 2022-04-30 17:40:27 -07:00 |
| time.rs | Adjust feature names that disagree on const stabilization version | 2022-03-31 12:34:48 -07:00 |
| tuple.rs | Implement tuples using recursion | 2022-04-12 16:23:36 -03:00 |
| unit.rs | Use implicit capture syntax in format_args | 2022-03-10 10:23:40 -05:00 |