Refactor core::char::EscapeDefault and co. structures
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.
This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.
This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct. This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.
As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once. On 64-bit platforms, it also reduces size of some of
the structs:
| Struct | Before | After |
|----------------------------+--------+-------+
| core::char::EscapeUnicode | 16 | 12 |
| core::char::EscapeDefault | 16 | 12 |
| core::char::EscapeDebug | 16 | 16 |
My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators. With this change this
will became trivial. It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.
This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.
This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct. This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.
As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once. On 64-bit platforms, it also reduces size of some of
the structs:
| Struct | Before | After |
|----------------------------+--------+-------+
| core::char::EscapeUnicode | 16 | 12 |
| core::char::EscapeDefault | 16 | 12 |
| core::char::EscapeDebug | 16 | 16 |
My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators. With this change this
will became trivial. It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
Use associated items of `char` instead of freestanding items in `core::char`
The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
According to Godbolt¹, on x86_64 using binary and produces slightly
better code than using subtraction. Readability of both is pretty
much equivalent so might just as well use the shorter option.
¹ https://rust.godbolt.org/z/9jM3ejbMx
Stabilize const char convert
Split out `const_char_from_u32_unchecked` from `const_char_convert` and stabilize the rest, i.e. stabilize the following functions:
```Rust
impl char {
pub const fn from_u32(self, i: u32) -> Option<char>;
pub const fn from_digit(self, num: u32, radix: u32) -> Option<char>;
pub const fn to_digit(self, radix: u32) -> Option<u32>;
}
// Available through core::char and std::char
mod char {
pub const fn from_u32(i: u32) -> Option<char>;
pub const fn from_digit(num: u32, radix: u32) -> Option<char>;
}
```
And put the following under the `from_u32_unchecked` const stability gate as it needs `Option::unwrap` which isn't const-stable (yet):
```Rust
impl char {
pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
// Available through core::char and std::char
mod char {
pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
```
cc the tracking issue #89259 (which I'd like to keep open for `const_char_from_u32_unchecked`).
Avoid duplication of doc comments in `std::char` constants and functions
For those consts and functions, only the summary is kept and a reference to the `char` associated const/method is included.
Additionaly, re-exported functions have been converted to function definitions that call the previously re-exported function. This makes it easier to add a deprecated attribute to these functions in the future.
Bump bootstrap compiler to 1.61.0 beta
This PR bumps the bootstrap compiler to the 1.61.0 beta. The first commit changes the stage0 compiler, the second commit applies the "mechanical" changes and the third and fourth commits apply changes explained in the relevant comments.
r? `@Mark-Simulacrum`
For those consts and functions, only the summary is kept and a reference to the `char` associated const/method is included.
Additionaly, re-exported functions have been converted to function definitions that call the previously re-exported function. This makes it easier to add a deprecated attribute to these functions in the future.
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.
A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
Clarify documentation on char::MAX
As mentioned in https://github.com/rust-lang/rust/issues/91836#issuecomment-994106874, the documentation on `char::MAX` is not quite correct – USVs are not "only ones within a certain range", they are code points _outside_ a certain range. I have corrected this and given the actual numbers as there is no reason to hide them.
Previously suggested in https://github.com/rust-lang/rfcs/issues/2854.
It makes sense to have this since `char` implements `From<u8>`. Likewise
`u32`, `u64`, and `u128` (since #79502) implement `From<char>`.
Allow reverse iteration of lowercase'd/uppercase'd chars
The PR implements `DoubleEndedIterator` trait for `ToLowercase` and `ToUppercase`.
This enables reverse iteration of lowercase/uppercase variants of character sequences.
One of use cases: determining whether a char sequence is a suffix of another one.
Example:
```rust
fn endswith_ignore_case(s1: &str, s2: &str) -> bool {
for eob in s1
.chars()
.flat_map(|c| c.to_lowercase())
.rev()
.zip_longest(s2.chars().flat_map(|c| c.to_lowercase()).rev())
{
match eob {
EitherOrBoth::Both(c1, c2) => {
if c1 != c2 {
return false;
}
}
EitherOrBoth::Left(_) => return true,
EitherOrBoth::Right(_) => return false,
}
}
true
}
```
Make char conversion functions unstably const
The char conversion functions like `char::from_u32` do trivial computations and can easily be converted into const fns. Only smaller tricks are needed to avoid non-const standard library functions like `Result::ok` or `bool::then_some`.
Tracking issue: https://github.com/rust-lang/rust/issues/89259
Add #[must_use] to remaining core functions
I've run out of compelling reasons to group functions together across crates so I'm just going to go module-by-module. This is everything remaining from the `core` crate.
Ignored by clippy for reasons unknown:
```rust
core::alloc::Layout unsafe fn for_value_raw<T: ?Sized>(t: *const T) -> Self;
core::any const fn type_name_of_val<T: ?Sized>(_val: &T) -> &'static str;
```
Ignored by clippy because of `mut`:
```rust
str fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str);
```
<del>
Ignored by clippy presumably because a caller might want `f` called for side effects. That seems like a bad usage of `map` to me.
```rust
core::cell::Ref<'b, T> fn map<U: ?Sized, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, T>;
core::cell::Ref<'b, T> fn map_split<U: ?Sized, V: ?Sized, F>(orig: Ref<'b, T>, f: F) -> (Ref<'b, U>, Ref<'b, V>);
```
</del>
Parent issue: #89692
r? ```@joshtriplett```