This adds a dedicated check for the lower bound
(if it is outside of ASCII range) to the output of the `unicode-table-generator` tool.
This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now,
as that is the only one with a lower bound outside of ASCII.
The indices are encoded as `u32`s in the range of invalid `char`s, so
that we know that if any mapping fails to parse as a `char` we should
use the value for lookup in the multi-table.
This avoids the second binary search in cases where a multi-`char`
mapping is needed.
Idea from @nikic
The majority of char case replacements are single char replacements,
so storing them as [char; 3] wastes a lot of space.
This commit splits the replacement tables for both `to_lower` and
`to_upper` into two separate tables, one with single-character mappings
and one with multi-character mappings.
This reduces the binary size for programs using all of these tables
with roughly 24K bytes.
Since ascii chars are already handled by a special case in the
`to_lower` and `to_upper` functions, there's no need to waste space on
them in the LUTs.
Before, it would enter the full expensive check even for normal ascii
characters. Now, it skips the check for the ascii characters in
`32..127`. This range was checked manually from the current behavior.