Commit graph

22 commits

Author SHA1 Message Date
Nicholas Nethercote
c5dadd0408 Use #[rustfmt::skip] on some use groups to prevent reordering.
`use` declarations will be reformatted in #125443. Very rarely, there is
a desire to force a group of `use` declarations together in a way that
auto-formatting will break up. E.g. when you want a single comment to
apply to a group. #126776 dealt with all of these in the codebase,
ensuring that no comments intended for multiple `use` declarations would
end up in the wrong place. But some people were unhappy with it.

This commit uses `#[rustfmt::skip]` to create these custom `use` groups
in an idiomatic way for a few of the cases changed in #126776. This
works because rustfmt treats any `use` item annotated with
`#[rustfmt::skip]` as a barrier and won't reorder other `use` items
around it.
2024-07-19 13:26:48 +10:00
Nicholas Nethercote
75b6ec9800 Avoid comments that describe multiple use items.
There are some comments describing multiple subsequent `use` items. When
the big `use` reformatting happens some of these `use` items will be
reordered, possibly moving them away from the comment. With this
additional level of formatting it's not really feasible to have comments
of this type. This commit removes them in various ways:

- merging separate `use` items when appropriate;

- inserting blank lines between the comment and the first `use` item;

- outright deletion (for comments that are relatively low-value);

- adding a separate "top-level" comment.

We also entirely skip formatting for four library files that contain
nothing but `pub use` re-exports, where reordering would be painful.
2024-07-17 08:02:46 +10:00
Arpad Borsos
488598c183
Add a lower bound check to unicode-table-generator output
This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.

This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
2024-04-20 10:16:45 +02:00
Marcondiro
e9870b5df3
Bump Unicode printables to version 15.1, align to unicode_data 2024-03-28 11:21:52 +01:00
Marcondiro
01fa7209d5
Bump Unicode to version 15.1.0, regenerate tables 2024-02-09 17:35:46 +01:00
Trevor Gross
22d00dcd47 Apply changes to fix python linting errors 2023-06-16 20:56:01 -04:00
Martin Gammelsæter
54f55efb9a Use hex literal for INDEX_MASK 2023-03-21 09:59:47 +01:00
Martin Gammelsæter
355e1dda1d Improve case mapping encoding scheme
The indices are encoded as `u32`s in the range of invalid `char`s, so
that we know that if any mapping fails to parse as a `char` we should
use the value for lookup in the multi-table.

This avoids the second binary search in cases where a multi-`char`
mapping is needed.

Idea from @nikic
2023-03-16 21:42:15 +01:00
Martin Gammelsæter
f9bd884385 Split unicode case LUTs in single and multi variants
The majority of char case replacements are single char replacements,
so storing them as [char; 3] wastes a lot of space.

This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.

This reduces the binary size for programs using all of these tables
with roughly 24K bytes.
2023-03-16 12:34:04 +01:00
Martin Gammelsæter
8a4eb9e3a8 Skip serializing ascii chars in case LUTs
Since ascii chars are already handled by a special case in the
`to_lower` and `to_upper` functions, there's no need to waste space on
them in the LUTs.
2023-03-15 17:27:23 +01:00
jonathanCogan
db47071df2 Replace libstd, libcore, liballoc in line comments. 2022-12-30 14:00:42 +01:00
Thom Chiovoloni
ac55092a14
Bump Unicode to version 15.0.0, regenerate tables 2022-09-14 13:21:19 -07:00
Sage Mitchell
2b328ea5ee
Address feedback from PR #101401 2022-09-04 08:07:53 -07:00
Sage Mitchell
4a3e169da7
Make char::is_lowercase and char::is_uppercase const
Implements #101400.
2022-09-04 08:07:53 -07:00
Bruce A. MacNaughton
5d048eb69d add #inline 2022-07-20 16:13:54 -07:00
Bruce A. MacNaughton
e5d4de3912 generated code 2022-07-19 18:03:33 -07:00
Nilstrieb
3358a41acb Add unicode fast path to is_printable
Before, it would enter the full expensive check even for normal ascii
characters. Now, it skips the check for the ascii characters in
`32..127`. This range was checked manually from the current behavior.
2022-05-31 10:51:35 +02:00
Josh Stone
459a7e340c Regenerate tables for Unicode 14.0.0 2021-10-06 17:49:33 -07:00
Smitty
bdfcb88e8b Use HTTPS links where possible 2021-06-23 16:26:46 -04:00
Miccah Castorina
e48c68479e Add a check for ASCII characters in to_upper and to_lower
This extra check has better performance. See discussion here:
https://internals.rust-lang.org/t/to-upper-speed/13896
2021-02-26 11:39:36 -06:00
Aleksey Kladov
88da5682c3 Privatize some of libcore unicode_internals
My understanding is that these API are perma unstable, so it doesn't
make sense to pollute docs & IDE completion[1] with them.

[1]: https://github.com/rust-analyzer/rust-analyzer/issues/6738
2020-12-07 16:16:42 +03:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00