Commit graph

20 commits

Author SHA1 Message Date
Arpad Borsos
488598c183
Add a lower bound check to unicode-table-generator output
This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.

This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
2024-04-20 10:16:45 +02:00
Marcondiro
e9870b5df3
Bump Unicode printables to version 15.1, align to unicode_data 2024-03-28 11:21:52 +01:00
Marcondiro
01fa7209d5
Bump Unicode to version 15.1.0, regenerate tables 2024-02-09 17:35:46 +01:00
Trevor Gross
22d00dcd47 Apply changes to fix python linting errors 2023-06-16 20:56:01 -04:00
Martin Gammelsæter
54f55efb9a Use hex literal for INDEX_MASK 2023-03-21 09:59:47 +01:00
Martin Gammelsæter
355e1dda1d Improve case mapping encoding scheme
The indices are encoded as `u32`s in the range of invalid `char`s, so
that we know that if any mapping fails to parse as a `char` we should
use the value for lookup in the multi-table.

This avoids the second binary search in cases where a multi-`char`
mapping is needed.

Idea from @nikic
2023-03-16 21:42:15 +01:00
Martin Gammelsæter
f9bd884385 Split unicode case LUTs in single and multi variants
The majority of char case replacements are single char replacements,
so storing them as [char; 3] wastes a lot of space.

This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.

This reduces the binary size for programs using all of these tables
with roughly 24K bytes.
2023-03-16 12:34:04 +01:00
Martin Gammelsæter
8a4eb9e3a8 Skip serializing ascii chars in case LUTs
Since ascii chars are already handled by a special case in the
`to_lower` and `to_upper` functions, there's no need to waste space on
them in the LUTs.
2023-03-15 17:27:23 +01:00
jonathanCogan
db47071df2 Replace libstd, libcore, liballoc in line comments. 2022-12-30 14:00:42 +01:00
Thom Chiovoloni
ac55092a14
Bump Unicode to version 15.0.0, regenerate tables 2022-09-14 13:21:19 -07:00
Sage Mitchell
2b328ea5ee
Address feedback from PR #101401 2022-09-04 08:07:53 -07:00
Sage Mitchell
4a3e169da7
Make char::is_lowercase and char::is_uppercase const
Implements #101400.
2022-09-04 08:07:53 -07:00
Bruce A. MacNaughton
5d048eb69d add #inline 2022-07-20 16:13:54 -07:00
Bruce A. MacNaughton
e5d4de3912 generated code 2022-07-19 18:03:33 -07:00
Nilstrieb
3358a41acb Add unicode fast path to is_printable
Before, it would enter the full expensive check even for normal ascii
characters. Now, it skips the check for the ascii characters in
`32..127`. This range was checked manually from the current behavior.
2022-05-31 10:51:35 +02:00
Josh Stone
459a7e340c Regenerate tables for Unicode 14.0.0 2021-10-06 17:49:33 -07:00
Smitty
bdfcb88e8b Use HTTPS links where possible 2021-06-23 16:26:46 -04:00
Miccah Castorina
e48c68479e Add a check for ASCII characters in to_upper and to_lower
This extra check has better performance. See discussion here:
https://internals.rust-lang.org/t/to-upper-speed/13896
2021-02-26 11:39:36 -06:00
Aleksey Kladov
88da5682c3 Privatize some of libcore unicode_internals
My understanding is that these API are perma unstable, so it doesn't
make sense to pollute docs & IDE completion[1] with them.

[1]: https://github.com/rust-analyzer/rust-analyzer/issues/6738
2020-12-07 16:16:42 +03:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00