This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.
This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
Makes the iterator 2*usize larger, but I doubt that matters much.
In exchange, we save a lot on instruction count.
In the absence of delegation syntax,
we must forward all the specialized impls manually…
Add ASCII fast-path for `char::is_grapheme_extended`
I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search.
This introduces a fast-path for ASCII chars which do not have this property.
The `lookup` is thus completely gone from profiles.
---
As a followup, maybe it’s worth implementing this fast path directly in `unicode_data` so that it can check for the lower bound directly before going to a potentially expensive binary search.
I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search.
This introduces a fast-path for ASCII chars which do not have this property.
The `lookup` is thus completely gone from profiles.
detects redundant imports that can be eliminated.
for #117772 :
In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
Decompose singular `matches!` with or-patterns to individual `matches!`
statements to enable branchless code output. The following functions
were changed:
- `is_ascii_alphanumeric`
- `is_ascii_hexdigit`
- `is_ascii_punctuation`
Add codegen tests
Co-authored-by: George Bateman <george.bateman16@gmail.com>
Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
Implement FusedIterator for DecodeUtf16 when the inner iterator does
I have just implemented an iterator that wraps `DecodeUtf16` and wanted to implement `FusedIterator` for my iterator when I noticed that `DecodeUtf16` currently doesn't implement `FusedIterator` at all.
A quick look at the code of `DecodeUtf16` revealed that `DecodeUtf16::next` only returns `None` when its inner iterator returns `None`:
3462f79e94/library/core/src/char/decode.rs (L45)
As a result, we can implement `FusedIterator` for `DecodeUtf16` when the inner iterator does.
I'm following the example of #96397 here and consider this change minor and non-controversial, which is why I haven't added an RFC. I have also added the required feature name (`"decode_utf16_fused_iterator"`), however without adding a chapter to the Rust Unstable book (same as #96397).
This means that `EscapeIterInner::as_str` no longer needs unsafe code, because the type system ensures the internal buffer is only ASCII, and thus valid UTF-8.
Refactor core::char::EscapeDefault and co. structures
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.
This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.
This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct. This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.
As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once. On 64-bit platforms, it also reduces size of some of
the structs:
| Struct | Before | After |
|----------------------------+--------+-------+
| core::char::EscapeUnicode | 16 | 12 |
| core::char::EscapeDefault | 16 | 12 |
| core::char::EscapeDebug | 16 | 16 |
My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators. With this change this
will became trivial. It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.
This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.
This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct. This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.
As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once. On 64-bit platforms, it also reduces size of some of
the structs:
| Struct | Before | After |
|----------------------------+--------+-------+
| core::char::EscapeUnicode | 16 | 12 |
| core::char::EscapeDefault | 16 | 12 |
| core::char::EscapeDebug | 16 | 16 |
My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators. With this change this
will became trivial. It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
Use associated items of `char` instead of freestanding items in `core::char`
The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
According to Godbolt¹, on x86_64 using binary and produces slightly
better code than using subtraction. Readability of both is pretty
much equivalent so might just as well use the shorter option.
¹ https://rust.godbolt.org/z/9jM3ejbMx
Stabilize const char convert
Split out `const_char_from_u32_unchecked` from `const_char_convert` and stabilize the rest, i.e. stabilize the following functions:
```Rust
impl char {
pub const fn from_u32(self, i: u32) -> Option<char>;
pub const fn from_digit(self, num: u32, radix: u32) -> Option<char>;
pub const fn to_digit(self, radix: u32) -> Option<u32>;
}
// Available through core::char and std::char
mod char {
pub const fn from_u32(i: u32) -> Option<char>;
pub const fn from_digit(num: u32, radix: u32) -> Option<char>;
}
```
And put the following under the `from_u32_unchecked` const stability gate as it needs `Option::unwrap` which isn't const-stable (yet):
```Rust
impl char {
pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
// Available through core::char and std::char
mod char {
pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
```
cc the tracking issue #89259 (which I'd like to keep open for `const_char_from_u32_unchecked`).