Shrink unicode case-mapping LUTs by 24k
I was looking into the binary bloat of a small program using `str::to_lowercase` and `str::to_uppercase`, and noticed that the lookup tables used for case mapping contain a lot of zero bytes. The reason is that some characters map to up to three characters when lowercased or uppercased, so the LUTs store a `[char; 3]` for every character. However, the vast majority of characters map to a single character, so most entries look like `(lowerc, [upperc, '\0', '\0'])`.
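To make the padding concrete, here is a minimal sketch of a table in that shape. The table name, the five entries, and both helper functions are made up for illustration; the real tables in the standard library are generated and much larger.

```rust
use std::mem::size_of;

/// Hypothetical miniature of the padded layout: each entry pairs a character
/// with up to three replacement characters, padded with '\0'.
/// Entries are sorted by key so they can be binary-searched.
static UPPERCASE_TABLE: &[(char, [char; 3])] = &[
    ('a', ['A', '\0', '\0']),
    ('b', ['B', '\0', '\0']),
    ('\u{DF}', ['S', 'S', '\0']),                      // ß -> SS
    ('\u{149}', ['\u{2BC}', 'N', '\0']),               // ŉ -> ʼN
    ('\u{390}', ['\u{399}', '\u{308}', '\u{301}']),    // ΐ -> three code points
];

/// Look up the uppercase mapping; characters without an entry map to themselves.
fn to_upper(c: char) -> [char; 3] {
    match UPPERCASE_TABLE.binary_search_by(|&(key, _)| key.cmp(&c)) {
        Ok(i) => UPPERCASE_TABLE[i].1,
        Err(_) => [c, '\0', '\0'],
    }
}

/// Count the bytes in the table that are spent on '\0' padding.
fn padding_bytes() -> usize {
    UPPERCASE_TABLE
        .iter()
        .map(|(_, mapped)| mapped.iter().filter(|&&c| c == '\0').count() * size_of::<char>())
        .sum()
}
```

Every entry occupies 16 bytes regardless of how many replacement characters it actually needs, so a single-character mapping spends half of its entry on padding.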
This PR introduces a new encoding scheme for these tables.
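The description does not spell out the encoding, so the following is only a sketch of one way such a table could be made more compact, not necessarily what this PR does. It assumes that each entry stores a single `u32`: if the value is a valid `char`, it is the one-character mapping itself; otherwise a flag bit is set and the remaining bits index a small side table of `[char; 3]` entries. All names here (`MULTI_FLAG`, `UPPER_COMPACT`, `UPPER_MULTI`, `to_upper_compact`) are hypothetical.

```rust
/// Hypothetical flag: no valid `char` has this bit set, so it can mark
/// "this value is an index into UPPER_MULTI" without ambiguity.
const MULTI_FLAG: u32 = 0x8000_0000;

/// Hypothetical compact table: one u32 per entry instead of a `[char; 3]`.
/// Entries are sorted by key so they can be binary-searched.
static UPPER_COMPACT: &[(char, u32)] = &[
    ('a', 'A' as u32),
    ('b', 'B' as u32),
    ('\u{DF}', MULTI_FLAG | 0),
    ('\u{149}', MULTI_FLAG | 1),
    ('\u{390}', MULTI_FLAG | 2),
];

/// Side table holding only the rare multi-character mappings.
static UPPER_MULTI: &[[char; 3]] = &[
    ['S', 'S', '\0'],
    ['\u{2BC}', 'N', '\0'],
    ['\u{399}', '\u{308}', '\u{301}'],
];

/// Look up the uppercase mapping; characters without an entry map to themselves.
fn to_upper_compact(c: char) -> [char; 3] {
    match UPPER_COMPACT.binary_search_by(|&(key, _)| key.cmp(&c)) {
        Err(_) => [c, '\0', '\0'],
        Ok(i) => {
            let value = UPPER_COMPACT[i].1;
            match char::from_u32(value) {
                // A valid scalar value is the single-character mapping itself.
                Some(single) => [single, '\0', '\0'],
                // Otherwise strip the flag and index the side table.
                None => UPPER_MULTI[(value & !MULTI_FLAG) as usize],
            }
        }
    }
}
```

Under this assumed layout the common single-character entries shrink from 16 to 8 bytes each, and only the multi-character mappings pay for a full `[char; 3]`.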
The change reduces the size of my test binary by about 24K.
I've also done some `#[bench]`marks on unicode-heavy test data, and found that the performance of both `str::to_lowercase` and `str::to_uppercase` improves by up to 20%. These measurements are obviously very dependent on the character distribution of the data.
Someone else will have to decide whether this more complex scheme is worth it or not; I was just goofing around a bit and here's what came out of it 🤷‍♂️ No hard feelings if this isn't wanted!
This directory contains some source code for the Rust project, including:
- The bootstrapping build system
- Various submodules for tools, like cargo, tidy, etc.
For more information on how various parts of the compiler work, see the rustc dev guide.