Use associated numeric consts in documentation
Now when the associated constants on int/float types are stabilized and the recommended way of accessing said constants (#68952). We can start using it in this repository, and recommend it via documentation example code.
This PR is the reincarnation of #67913 minus the actual adding + stabilization of said constants. (EDIT: Now it's only changing the documentation. So users will see the new consts, but we don't yet update the internal code)
Because of how fast bit rot happens to PRs that touch this many files, it does not try to replace 100% of the old usage of the constants in the entire repo, but a good chunk of them.
Expand on platform details of `include_xxx` macros
This is a small detail that is not explicitly mentioned, but it left me scratching my head for a while until I looked into its implementation details. Maybe worth mentioning.
Stabilize float::to_int_unchecked
This renames and stabilizes unsafe floating point to integer casts, which are intended to be the substitute for the currently unsound `as` behavior, once that changes to safe-but-slower saturating casts. As such, I believe this also likely unblocks #10184 (our oldest I-unsound issue!), as once this rolls out to stable it would be far easier IMO to change the behavior of `as` to be safe by default.
This does not stabilize the trait or the associated method, as they are deemed internal implementation details (and consumers should not, generally, want to expose them, as in practice all callers likely know statically/without generics what the return type is).
Closes#67058
Overhaul of the `AllocRef` trait to match allocator-wg's latest consens; Take 2
GitHub won't let me reopen#69889 so I make a new PR.
In addition to #69889 this fixes the unsoundness of `RawVec::into_box` when using allocators supporting overallocating. Also it uses `MemoryBlock` in `AllocRef` to unify `_in_place` methods by passing `&mut MemoryBlock`. Additionally, `RawVec` now checks for `size_of::<T>()` again and ignore every ZST. The internal capacity of `RawVec` isn't used by ZSTs anymore, as `into_box` now requires a length to be specified.
r? @Amanieu
fixesrust-lang/wg-allocators#38fixesrust-lang/wg-allocators#41fixesrust-lang/wg-allocators#44fixesrust-lang/wg-allocators#51
add `unused_braces` lint
Add the lint `unused_braces` which is warn by default.
`unused_parens` is also extended and now checks anon consts.
closes#68387
r? @varkor
Fix incorrect documentation for `str::{split_at, split_at_mut}`
The documentation for each method currently states:
> Panics if `mid` is not on a UTF-8 code point boundary, or if it is beyond the last code point of the string slice.
However, this is not consistent with the real behavior, or that of the corresponding methods for `[T]` slices. A comment inside each of the `str` methods states:
> is_char_boundary checks that the index is in [0, .len()]
That is what I would expect the behavior to be, and in fact this seems to be the real behavior. For example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e03dcc209d4dd176df2297523f9fee1)):
```rust
fn main() {
// Prints ("abc", "") and doesn't panic
println!("{:?}", "abc".split_at(3));
}
```
In this case, I would interpret "the last code point of the string slice" to mean the byte at index 2 in UTF-8. However, it is possible to pass an index of 3, which is definitely "beyond the last code point of the string slice".
I think that this is much clearer, but feel free to bikeshed.
Optimize strip_prefix and strip_suffix with str patterns
As mentioned in https://github.com/rust-lang/rust/issues/67302#issuecomment-585639226.
I'm not sure whether adding these methods to `Pattern` is desirable—but they have default implementations so the change is backwards compatible. Plus it seems like they're slated for wholesale replacement soon anyway? #56345
----
Constructing a Searcher in strip_prefix and strip_suffix is
unnecessarily slow when the pattern is a fixed-length string. Add
strip_prefix and strip_suffix methods to the Pattern trait, and add
optimized implementations of these methods in the str implementation.
The old implementation is retained as the default for these methods.
Constructing a Searcher in strip_prefix and strip_suffix is
unnecessarily slow when the pattern is a fixed-length string. Add
strip_prefix and strip_suffix methods to the Pattern trait, and add
optimized implementations of these methods in the str implementation.
The old implementation is retained as the default for these methods.
Add Result<Result<T, E>, E>::flatten -> Result<T, E>
This PR makes this possible (modulo type inference):
```rust
assert_eq!(Ok(6), Ok(Ok(6)).flatten());
```
Tracking issue: #70142
<sub>largely cribbed directly from <https://github.com/rust-lang/rust/pull/60256></sub>
This renames and stabilizes unsafe floating point to integer casts, which are
intended to be the substitute for the currently unsound `as` behavior, once that
changes to safe-but-slower saturating casts.
Shrink Unicode tables (even more)
This shrinks the Unicode tables further, building upon the wins in #68232 (the previous counts differ due to an interim Unicode version update, see #69929.
The new data structure is slower by around 3x, on the benchmark of looking up every Unicode scalar value in each data set sequentially in every data set included. Note that for ASCII, the exposed functions on `char` optimize with direct branches, so ASCII will retain the same performance regardless of internal optimizations (or the reverse). Also, note that the size reduction due to the skip list (from where the performance losses come) is around 40%, and, as a result, I believe the performance loss is acceptable, as the routines are still quite fast. Anywhere where this is hot, should probably be using a custom data structure anyway (e.g., a raw bitset) or something optimized for frequently seen values, etc.
This PR updates both the bitset data structure, and introduces a new data structure similar to a skip list. For more details, see the [main.rs] of the table generator, which describes both. The commits mostly work individually and document size wins.
As before, this is tested on all valid chars to have the same results as nightly (and the canonical Unicode data sets), happily, no bugs were found.
[main.rs]: fb4a715e18/src/tools/unicode-table-generator/src/main.rs
Set | Previous | New | % of old | Codepoints | Ranges |
----------------|---------:|------:|-----------:|-----------:|-------:|
Alphabetic | 3055 | 1599 | 52% | 132875 | 695 |
Case Ignorable | 2136 | 949 | 44% | 2413 | 410 |
Cased | 934 | 359 | 38% | 4286 | 141 |
Cc | 43 | 9 | 20% | 65 | 2 |
Grapheme Extend | 1774 | 813 | 46% | 1979 | 344 |
Lowercase | 985 | 867 | 88% | 2344 | 652 |
N | 1266 | 419 | 33% | 1781 | 133 |
Uppercase | 934 | 777 | 83% | 1911 | 643 |
White_Space | 140 | 37 | 26% | 25 | 10 |
----------------|----------|-------|------------|------------|--------|
Total | 11267 | 5829 | 51% | - | - |
In practice, for the two data sets that still use the bitset encoding (uppercase
and lowercase) this is not a significant win, so just drop it entirely. It costs
us about 5 bytes, and the complexity is nontrivial.
This arranges for the sparser sets (everything except lower and uppercase) to be
encoded in a significantly smaller context. However, it is also a performance
trade-off (roughly 3x slower than the bitset encoding). The 40% size reduction
is deemed to be sufficiently important to merit this performance loss,
particularly as it is unlikely that this code is hot anywhere (and if it is,
paying the memory cost for a bitset that directly represents the data seems
worthwhile).
Alphabetic : 1599 bytes (- 937 bytes)
Case_Ignorable : 949 bytes (- 822 bytes)
Cased : 359 bytes (- 429 bytes)
Cc : 9 bytes (- 15 bytes)
Grapheme_Extend: 813 bytes (- 675 bytes)
Lowercase : 863 bytes
N : 419 bytes (- 619 bytes)
Uppercase : 776 bytes
White_Space : 37 bytes (- 46 bytes)
Total table sizes: 5824 bytes (-3543 bytes)
Proposal: `fold_self` and `try_fold_self` for Iterators
This pull request proposes & implements two new methods on Iterators: `fold_self` and `try_fold_self`. These are variants of `fold` and `try_fold` that use the first element in the iterator as the initial accumulator.
Let me know if a public feature like this requires an RFC, or if this pull request is sufficient as place for discussion.
- Added `Iterator::fold_first`, which is like `fold`, but uses the first element in the iterator as the initial accumulator
- Includes doc and doctest
- Rebase commit; see #65222 for details
Co-Authored-By: Tim Vermeulen <tvermeulen@me.com>