Commit graph

41 commits

Author SHA1 Message Date
Markus Reiter
3628a8f326
Remove unneeded parentheses. 2025-03-08 12:56:00 +01:00
Markus Reiter
90ebc24607
Use intrinsics::assume instead of hint::assert_unchecked. 2025-03-07 20:19:12 +01:00
Markus Reiter
22725588d3
Never inline lookup_slow. 2025-03-07 20:17:52 +01:00
Markus Reiter
34ac75be28
Add second precondition for skip_search. 2025-03-06 21:38:39 +01:00
Markus Reiter
222adac953
Allow optimizing out panic_bounds_check in Unicode checks. 2025-03-06 21:38:39 +01:00
Urgau
8e61502484 core: add #![warn(unreachable_pub)] 2025-01-20 18:35:32 +01:00
Jakub Beránek
536516f949
Reformat Python code with ruff 2024-12-04 23:03:44 +01:00
Boxy
22998f0785 update cfgs 2024-11-27 15:14:54 +00:00
Ralf Jung
eddab479fd stabilize const_unicode_case_lookup 2024-11-12 15:13:31 +01:00
bors
cf2b370ad0 Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt
make char::is_whitespace unstably const

I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.
2024-11-06 04:07:32 +00:00
Matthias Krüger
b438a5cd2a
Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35
unicode_data.rs: show command for generating file

https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes https://github.com/rust-lang/rust/issues/131640.
2024-11-03 12:08:51 +01:00
Ralf Jung
0804815e69 make char::is_whitespace unstably const 2024-11-02 10:17:16 +01:00
Ralf Jung
720d618b5f unicode_data.rs: show command for generating file 2024-11-02 10:06:52 +01:00
Ralf Jung
66351a6184 get rid of a whole bunch of unnecessary rustc_const_unstable attributes 2024-11-02 09:59:55 +01:00
Ralf Jung
90e4f10f6c switch unicode-data back to 'static' 2024-10-13 11:53:06 +02:00
Matthias Krüger
4428d6f363
Rollup merge of #130101 - RalfJung:const-cleanup, r=fee1-dead
some const cleanup: remove unnecessary attributes, add const-hack indications

I learned that we use `FIXME(const-hack)` on top of the "const-hack" label. That seems much better since it marks the right place in the code and moves around with the code. So I went through the PRs with that label and added appropriate FIXMEs in the code. IMO this means we can then remove the label -- Cc ``@rust-lang/wg-const-eval.``

I also noticed some const stability attributes that don't do anything useful, and removed them.

r? ``@fee1-dead``
2024-09-12 19:03:41 +02:00
Marcondiro
c8d9bd488a Bump unicode printable to version 16.0.0 2024-09-10 11:13:35 +02:00
Marcondiro
bdda4ec2f5 Bump unicode_data to version 16.0.0 2024-09-10 10:50:20 +02:00
Ralf Jung
332fa6aa6e add FIXME(const-hack) 2024-09-08 23:08:40 +02:00
Nicholas Nethercote
c5dadd0408 Use #[rustfmt::skip] on some use groups to prevent reordering.
`use` declarations will be reformatted in #125443. Very rarely, there is
a desire to force a group of `use` declarations together in a way that
auto-formatting will break up. E.g. when you want a single comment to
apply to a group. #126776 dealt with all of these in the codebase,
ensuring that no comments intended for multiple `use` declarations would
end up in the wrong place. But some people were unhappy with it.

This commit uses `#[rustfmt::skip]` to create these custom `use` groups
in an idiomatic way for a few of the cases changed in #126776. This
works because rustfmt treats any `use` item annotated with
`#[rustfmt::skip]` as a barrier and won't reorder other `use` items
around it.
2024-07-19 13:26:48 +10:00
Nicholas Nethercote
75b6ec9800 Avoid comments that describe multiple use items.
There are some comments describing multiple subsequent `use` items. When
the big `use` reformatting happens some of these `use` items will be
reordered, possibly moving them away from the comment. With this
additional level of formatting it's not really feasible to have comments
of this type. This commit removes them in various ways:

- merging separate `use` items when appropriate;

- inserting blank lines between the comment and the first `use` item;

- outright deletion (for comments that are relatively low-value);

- adding a separate "top-level" comment.

We also entirely skip formatting for four library files that contain
nothing but `pub use` re-exports, where reordering would be painful.
2024-07-17 08:02:46 +10:00
Arpad Borsos
488598c183
Add a lower bound check to unicode-table-generator output
This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.

This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
2024-04-20 10:16:45 +02:00
Marcondiro
e9870b5df3
Bump Unicode printables to version 15.1, align to unicode_data 2024-03-28 11:21:52 +01:00
Marcondiro
01fa7209d5
Bump Unicode to version 15.1.0, regenerate tables 2024-02-09 17:35:46 +01:00
Trevor Gross
22d00dcd47 Apply changes to fix python linting errors 2023-06-16 20:56:01 -04:00
Martin Gammelsæter
54f55efb9a Use hex literal for INDEX_MASK 2023-03-21 09:59:47 +01:00
Martin Gammelsæter
355e1dda1d Improve case mapping encoding scheme
The indices are encoded as `u32`s in the range of invalid `char`s, so
that we know that if any mapping fails to parse as a `char` we should
use the value for lookup in the multi-table.

This avoids the second binary search in cases where a multi-`char`
mapping is needed.

Idea from @nikic
2023-03-16 21:42:15 +01:00
Martin Gammelsæter
f9bd884385 Split unicode case LUTs in single and multi variants
The majority of char case replacements are single char replacements,
so storing them as [char; 3] wastes a lot of space.

This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.

This reduces the binary size for programs using all of these tables
with roughly 24K bytes.
2023-03-16 12:34:04 +01:00
Martin Gammelsæter
8a4eb9e3a8 Skip serializing ascii chars in case LUTs
Since ascii chars are already handled by a special case in the
`to_lower` and `to_upper` functions, there's no need to waste space on
them in the LUTs.
2023-03-15 17:27:23 +01:00
jonathanCogan
db47071df2 Replace libstd, libcore, liballoc in line comments. 2022-12-30 14:00:42 +01:00
Thom Chiovoloni
ac55092a14
Bump Unicode to version 15.0.0, regenerate tables 2022-09-14 13:21:19 -07:00
Sage Mitchell
2b328ea5ee
Address feedback from PR #101401 2022-09-04 08:07:53 -07:00
Sage Mitchell
4a3e169da7
Make char::is_lowercase and char::is_uppercase const
Implements #101400.
2022-09-04 08:07:53 -07:00
Bruce A. MacNaughton
5d048eb69d add #inline 2022-07-20 16:13:54 -07:00
Bruce A. MacNaughton
e5d4de3912 generated code 2022-07-19 18:03:33 -07:00
Nilstrieb
3358a41acb Add unicode fast path to is_printable
Before, it would enter the full expensive check even for normal ascii
characters. Now, it skips the check for the ascii characters in
`32..127`. This range was checked manually from the current behavior.
2022-05-31 10:51:35 +02:00
Josh Stone
459a7e340c Regenerate tables for Unicode 14.0.0 2021-10-06 17:49:33 -07:00
Smitty
bdfcb88e8b Use HTTPS links where possible 2021-06-23 16:26:46 -04:00
Miccah Castorina
e48c68479e Add a check for ASCII characters in to_upper and to_lower
This extra check has better performance. See discussion here:
https://internals.rust-lang.org/t/to-upper-speed/13896
2021-02-26 11:39:36 -06:00
Aleksey Kladov
88da5682c3 Privatize some of libcore unicode_internals
My understanding is that these API are perma unstable, so it doesn't
make sense to pollute docs & IDE completion[1] with them.

[1]: https://github.com/rust-analyzer/rust-analyzer/issues/6738
2020-12-07 16:16:42 +03:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00