Split unicode case LUTs in single and multi variants

user0/rust

Most char case replacements map one character to a single character,
so storing every replacement as `[char; 3]` wastes a lot of space.
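To see where the waste comes from, the sketch below compares per-entry sizes. It assumes each table entry is stored as a `(key, value)` tuple. The aliases `MultiEntry` and `SingleEntry` are hypothetical names for illustration, not identifiers from `unicode_data.rs`. Because `char` is 4 bytes, an entry that always reserves three output chars is twice as large as one that stores a single char.

```rust
use std::mem::size_of;

/// Hypothetical entry layout where every mapping reserves room for three chars.
type MultiEntry = (char, [char; 3]);

/// Hypothetical entry layout for the common single-character case.
type SingleEntry = (char, char);

/// Bytes saved for each mapping stored as a `SingleEntry` instead of a `MultiEntry`.
fn bytes_saved_per_single_mapping() -> usize {
    size_of::<MultiEntry>() - size_of::<SingleEntry>()
}
```

With four 4-byte `char`s versus two, each single-character mapping moved out of the three-char layout saves 8 bytes.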

This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.
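A minimal sketch of a lookup over split tables, assuming both are sorted by key and searched with `binary_search_by`. The table names `LOWERCASE_SINGLE` and `LOWERCASE_MULTI` and their contents are hypothetical (a few real Unicode mappings chosen for illustration); the generated tables in `unicode_data.rs` may be encoded differently. The function keeps a `[char; 3]` return shape, padded with `'\0'`.

```rust
// Hypothetical single-character table, sorted by key for binary search.
static LOWERCASE_SINGLE: &[(char, char)] = &[
    ('A', 'a'),
    ('B', 'b'),
    ('\u{C0}', '\u{E0}'), // À -> à
];

// Hypothetical multi-character table, also sorted by key.
static LOWERCASE_MULTI: &[(char, [char; 3])] = &[
    // İ (U+0130) lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.
    ('\u{130}', ['i', '\u{307}', '\0']),
];

/// Returns the lowercase mapping of `c`, padded with '\0' to three chars.
fn to_lower(c: char) -> [char; 3] {
    // Most mappings are single-character, so search the compact table first.
    if let Ok(i) = LOWERCASE_SINGLE.binary_search_by(|&(key, _)| key.cmp(&c)) {
        return [LOWERCASE_SINGLE[i].1, '\0', '\0'];
    }
    // Fall back to the (much smaller) multi-character table.
    if let Ok(i) = LOWERCASE_MULTI.binary_search_by(|&(key, _)| key.cmp(&c)) {
        return LOWERCASE_MULTI[i].1;
    }
    // No entry in either table: the character is its own lowercase.
    [c, '\0', '\0']
}
```

Only the rare multi-character mappings pay for three output slots; every other entry stores just one `char` as its value.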

This reduces the binary size of programs that use all of these tables
by roughly 24K bytes.
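As a rough back-of-the-envelope check, assume each single-character mapping shrinks from a 16-byte `(char, [char; 3])` entry to an 8-byte `(char, char)` entry and nothing else changes. The saving then scales with the number of single-character mappings $N_{\text{single}}$:

```latex
\Delta = N_{\text{single}} \times (16 - 8)\ \text{bytes},
\qquad
N_{\text{single}} \approx \frac{24\,000}{8} = 3\,000
```

Under that assumption, a saving of roughly 24K bytes corresponds to on the order of 3,000 single-character mappings across both tables. This is an estimate from the stated saving, not a count taken from the tables.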


Martin Gammelsæter

2023-03-16 11:56:33 +01:00

parent 8a4eb9e3a8

commit f9bd884385

2 changed files with 1008 additions and 1695 deletions

library/core/src/unicode/unicode_data.rs (2645 lines changed; diff suppressed because it is too large)
