Commit graph

12 commits

Author SHA1 Message Date
Martin Gammelsæter
54f55efb9a Use hex literal for INDEX_MASK 2023-03-21 09:59:47 +01:00
Martin Gammelsæter
355e1dda1d Improve case mapping encoding scheme
The indices are encoded as `u32`s in the range of invalid `char`s, so
that we know that if any mapping fails to parse as a `char` we should
use the value for lookup in the multi-table.

This avoids the second binary search in cases where a multi-`char`
mapping is needed.

Idea from @nikic
2023-03-16 21:42:15 +01:00
Martin Gammelsæter
f9bd884385 Split unicode case LUTs in single and multi variants
The majority of char case replacements are single char replacements,
so storing them as [char; 3] wastes a lot of space.

This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.

This reduces the binary size for programs using all of these tables
with roughly 24K bytes.
2023-03-16 12:34:04 +01:00
Martin Gammelsæter
8a4eb9e3a8 Skip serializing ascii chars in case LUTs
Since ascii chars are already handled by a special case in the
`to_lower` and `to_upper` functions, there's no need to waste space on
them in the LUTs.
2023-03-15 17:27:23 +01:00
Thom Chiovoloni
ac55092a14
Bump Unicode to version 15.0.0, regenerate tables 2022-09-14 13:21:19 -07:00
Sage Mitchell
2b328ea5ee
Address feedback from PR #101401 2022-09-04 08:07:53 -07:00
Sage Mitchell
4a3e169da7
Make char::is_lowercase and char::is_uppercase const
Implements #101400.
2022-09-04 08:07:53 -07:00
Bruce A. MacNaughton
5d048eb69d add #inline 2022-07-20 16:13:54 -07:00
Bruce A. MacNaughton
e5d4de3912 generated code 2022-07-19 18:03:33 -07:00
Josh Stone
459a7e340c Regenerate tables for Unicode 14.0.0 2021-10-06 17:49:33 -07:00
Miccah Castorina
e48c68479e Add a check for ASCII characters in to_upper and to_lower
This extra check has better performance. See discussion here:
https://internals.rust-lang.org/t/to-upper-speed/13896
2021-02-26 11:39:36 -06:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00
Renamed from src/libcore/unicode/unicode_data.rs (Browse further)