Commit graph

128 commits

Author SHA1 Message Date
Markus Reiter
4edf12d33e
Improve escape methods. 2024-05-09 17:04:30 +02:00
Markus Reiter
16981ba406
Avoid panicking branch in EscapeIterInner. 2024-05-08 21:52:32 +02:00
Markus Reiter
e3fc97be2b
Inline EscapeDebug::size_hint. 2024-05-08 21:52:31 +02:00
Arpad Borsos
488598c183
Add a lower bound check to unicode-table-generator output
This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.

This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
2024-04-20 10:16:45 +02:00
bors
1c19595575 Auto merge of #122616 - Jules-Bertholet:casemappingiter-layout, r=Nilstrieb
Optimize `core::char::CaseMappingIter`

Godbolt says this saves a few instructions…

`@rustbot` label T-libs A-layout C-optimization
2024-03-29 07:02:56 +00:00
Daniel Paoliello
d261647c93 Import the 2021 prelude in the core crate 2024-03-25 13:12:06 -07:00
Ralf Jung
987ef4c922 move assert_unsafe_preconditions to its own file
These macros and functions are not intrinsics, after all.
2024-03-23 18:44:17 +01:00
Jules Bertholet
2f1ab2ce09
Reimplement CaseMappingIter with core::array::IntoIter
Makes the iterator 2*usize larger, but I doubt that matters much.
In exchange, we save a lot on instruction count.

In the absence of delegation syntax,
we must forward all the specialized impls manually…
2024-03-18 23:07:28 -04:00
Jules Bertholet
1c137b7582
Optimize core::char::CaseMappingIter layout
Godbolt says this saves a few instructions…
2024-03-16 23:42:06 -04:00
Ben Kimock
5a93a59fd5 Distinguish between library and lang UB in assert_unsafe_precondition 2024-03-08 18:53:58 -05:00
bors
bdde2a80ae Auto merge of #121138 - Swatinem:grapheme-extend-ascii, r=cuviper
Add ASCII fast-path for `char::is_grapheme_extended`

I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search.

This introduces a fast-path for ASCII chars which do not have this property.

The `lookup` is thus completely gone from profiles.

---

As a followup, maybe it’s worth implementing this fast path directly in `unicode_data` so that it can check for the lower bound directly before going to a potentially expensive binary search.
2024-03-05 10:28:55 +00:00
Arpad Borsos
8eaaa6e610
Add ASCII fast-path for char::is_grapheme_extended
I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search.

This introduces a fast-path for ASCII chars which do not have this property.

The `lookup` is thus completely gone from profiles.
2024-02-15 12:00:34 +01:00
Markus Reiter
746a58d435
Use generic NonZero internally. 2024-02-15 08:09:42 +01:00
Ben Kimock
61118ffd04 Rewrite assert_unsafe_precondition around the new intrinsic 2024-02-08 11:52:14 -05:00
Chris Denton
ab716d0635
Use assert_unsafe_precondition for char::from_u32_unchecked
Co-Authored-By: joboet <jonasboettiger@icloud.com>
2023-12-15 15:15:24 +00:00
surechen
40ae34194c remove redundant imports
detects redundant imports that can be eliminated.

for #117772 :

In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
2023-12-10 10:56:22 +08:00
Mark Rousskov
efe54e24aa Substitute version placeholders 2023-11-15 19:40:51 -05:00
okaneco
465ffc9ca7 Refactor some char, u8 ascii functions to be branchless
Decompose singular `matches!` with or-patterns to individual `matches!`
statements to enable branchless code output. The following functions
were changed:
- `is_ascii_alphanumeric`
- `is_ascii_hexdigit`
- `is_ascii_punctuation`

Add codegen tests

Co-authored-by: George Bateman <george.bateman16@gmail.com>
Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
2023-10-26 21:48:36 -04:00
bors
64368d0279 Auto merge of #110729 - ColinFinck:decode-utf16-fused-iterator, r=dtolnay
Implement FusedIterator for DecodeUtf16 when the inner iterator does

I have just implemented an iterator that wraps `DecodeUtf16` and wanted to implement `FusedIterator` for my iterator when I noticed that `DecodeUtf16` currently doesn't implement `FusedIterator` at all.
A quick look at the code of `DecodeUtf16` revealed that `DecodeUtf16::next` only returns `None` when its inner iterator returns `None`:
3462f79e94/library/core/src/char/decode.rs (L45)

As a result, we can implement `FusedIterator` for `DecodeUtf16` when the inner iterator does.

I'm following the example of #96397 here and consider this change minor and non-controversial, which is why I haven't added an RFC. I have also added the required feature name (`"decode_utf16_fused_iterator"`), however without adding a chapter to the Rust Unstable book (same as #96397).
2023-10-15 17:09:37 +00:00
Mark Rousskov
787d32324c Bump version placeholders 2023-10-03 20:26:36 -04:00
bors
feb06732c0 Auto merge of #114299 - clarfonthey:char-min, r=dtolnay,BurntSushi
Add char::MIN

ACP: rust-lang/libs-team#252
Tracking issue: #114298

r? `@rust-lang/libs-api`
2023-09-08 00:02:48 +00:00
ltdk
9fce8abe0b I'm mathematically challenged 2023-07-31 15:08:52 -04:00
ltdk
bd6ccf31de Can't compare usize and u32 2023-07-31 13:24:30 -04:00
ltdk
0165a4cf5f Use u32::from for MIN/MAX examples 2023-07-31 13:22:16 -04:00
ltdk
b64f3c7181 Add note on gap for MIN/MAX 2023-07-31 13:21:42 -04:00
ltdk
f65fbe9517 Add char::MIN 2023-07-31 12:34:55 -04:00
Lukas Markeffsky
f86b34730d impl TryFrom<char> for u16 2023-07-25 19:58:00 +02:00
Scott McMurray
28449daa22 ascii::Char-ify the escaping code
This means that `EscapeIterInner::as_str` no longer needs unsafe code, because the type system ensures the internal buffer is only ASCII, and thus valid UTF-8.
2023-05-12 19:37:02 -07:00
Matthias Krüger
ea0b6504fa
Rollup merge of #111009 - scottmcm:ascii-char, r=BurntSushi
Add `ascii::Char` (ACP#179)

ACP second: https://github.com/rust-lang/libs-team/issues/179#issuecomment-1527900570
New tracking issue: https://github.com/rust-lang/rust/issues/110998

For now this is an `enum` as `@kupiakos` [suggested](https://github.com/rust-lang/libs-team/issues/179#issuecomment-1527959724), with the variants under a different feature flag.

There's lots more things that could be added here, and place for further doc updates, but this seems like a plausible starting point PR.

I've gone through and put an `as_ascii` next to every `is_ascii`: on `u8`, `char`, `[u8]`, and `str`.

As a demonstration, made a commit updating some formatting code to use this: https://github.com/scottmcm/rust/commit/ascii-char-in-fmt (I don't want to include that in this PR, though, because that brings in perf questions that don't exist if this is just adding new unstable APIs.)
2023-05-04 19:18:21 +02:00
Scott McMurray
8c781b0906 Add the basic ascii::Char type 2023-05-03 22:09:33 -07:00
Dylan DPC
f916c44aec
Rollup merge of #105076 - mina86:a, r=scottmcm
Refactor core::char::EscapeDefault and co. structures

Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.

This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.

This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct.  This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.

As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once.  On 64-bit platforms, it also reduces size of some of
the structs:

    | Struct                     | Before | After |
    |----------------------------+--------+-------+
    | core::char::EscapeUnicode  |     16 |    12 |
    | core::char::EscapeDefault  |     16 |    12 |
    | core::char::EscapeDebug    |     16 |    16 |

My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators.  With this change this
will became trivial.  It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
2023-05-02 11:44:50 +05:30
Michal Nazarewicz
4d0f7e2f39 review 2023-04-30 03:59:11 +02:00
Colin Finck
60fd119a29
Implement FusedIterator for DecodeUtf16 when the inner iterator does 2023-04-24 13:34:36 +02:00
Deadbeef
76dbe29104 rm const traits in libcore 2023-04-16 06:49:27 +00:00
Michal Nazarewicz
45104397e5 Refactor core::char::EscapeDefault and co. structures
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.

This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.

This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct.  This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.

As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once.  On 64-bit platforms, it also reduces size of some of
the structs:

    | Struct                     | Before | After |
    |----------------------------+--------+-------+
    | core::char::EscapeUnicode  |     16 |    12 |
    | core::char::EscapeDefault  |     16 |    12 |
    | core::char::EscapeDebug    |     16 |    16 |

My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators.  With this change this
will became trivial.  It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
2023-04-05 19:09:55 +02:00
bors
adb4bfd25d Auto merge of #105671 - lukas-code:depreciate-char, r=scottmcm
Use associated items of `char` instead of freestanding items in `core::char`

The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
2023-02-12 11:09:06 +00:00
Tobias Bucher
77c85e9cba Remove a couple of #[doc(hidden)] pub fn and their #[feature] gates 2023-02-10 08:06:35 +01:00
Lukas Markeffsky
76e216f29b Use associated items of char instead of freestanding items in core::char 2023-01-14 11:58:41 +01:00
Pietro Albini
f6762c2035 update stabilization version numbers 2022-12-28 09:18:42 -05:00
Michal Nazarewicz
28162ad970 char: µoptimise UTF-16 surrogates decoding
According to Godbolt¹, on x86_64 using binary and produces slightly
better code than using subtraction.  Readability of both is pretty
much equivalent so might just as well use the shorter option.

¹ https://rust.godbolt.org/z/9jM3ejbMx
2022-12-23 14:15:33 +01:00
Matthias Krüger
8c77da87d7
Rollup merge of #102470 - est31:stabilize_const_char_convert, r=joshtriplett
Stabilize const char convert

Split out `const_char_from_u32_unchecked` from `const_char_convert` and stabilize the rest, i.e. stabilize the following functions:

```Rust
impl char {
    pub const fn from_u32(self, i: u32) -> Option<char>;
    pub const fn from_digit(self, num: u32, radix: u32) -> Option<char>;
    pub const fn to_digit(self, radix: u32) -> Option<u32>;
}

// Available through core::char and std::char
mod char {
    pub const fn from_u32(i: u32) -> Option<char>;
    pub const fn from_digit(num: u32, radix: u32) -> Option<char>;
}
```

And put the following under the `from_u32_unchecked` const stability gate as it needs `Option::unwrap` which isn't const-stable (yet):

```Rust
impl char {
    pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}

// Available through core::char and std::char
mod char {
    pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
```

cc the tracking issue #89259 (which I'd like to keep open for `const_char_from_u32_unchecked`).
2022-11-14 19:26:15 +01:00
Sky
a6372525ce
Clarify the possible return values of len_utf16 2022-10-16 11:06:19 -04:00
est31
176c44c08e Stabilize const_char_convert 2022-09-29 14:26:56 +02:00
est31
12c15a2bfe Split out from_u32_unchecked from const_char_convert
It relies on the Option::unwrap function which is not const-stable (yet).
2022-09-29 14:26:24 +02:00
Akshay
591c1f25b2 introduce {char, u8}::is_ascii_octdigit 2022-09-27 11:55:13 +05:30
Pietro Albini
3975d55d98
remove cfg(bootstrap) 2022-09-26 10:14:45 +02:00
Sage Mitchell
4a3e169da7
Make char::is_lowercase and char::is_uppercase const
Implements #101400.
2022-09-04 08:07:53 -07:00
Jane Losare-Lusby
bf7611d55e Move error trait into core 2022-08-22 13:28:25 -07:00
Cameron Steffen
17ddcb434b Improve primitive/std docs separation and headers 2022-08-20 16:50:29 -05:00
Vincenzo Palazzo
23bd7cbcb1 docs: remove repetition
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-08-09 21:54:05 +00:00