user0/rust

Author	SHA1	Message	Date
Markus Reiter	3628a8f326	Remove unneeded parentheses.	2025-03-08 12:56:00 +01:00
Markus Reiter	224dad154b	Fix formatting.	2025-03-08 12:47:40 +01:00
Markus Reiter	90ebc24607	Use `intrinsics::assume` instead of `hint::assert_unchecked`.	2025-03-07 20:19:12 +01:00
Markus Reiter	22725588d3	Never inline `lookup_slow`.	2025-03-07 20:17:52 +01:00
Markus Reiter	34ac75be28	Add second precondition for `skip_search`.	2025-03-06 21:38:39 +01:00
Markus Reiter	222adac953	Allow optimizing out `panic_bounds_check` in Unicode checks.	2025-03-06 21:38:39 +01:00
bjorn3	1fcae03369	Rustfmt	2025-02-08 22:12:13 +00:00
Boxy	22998f0785	update cfgs	2024-11-27 15:14:54 +00:00
Ralf Jung	eddab479fd	stabilize const_unicode_case_lookup	2024-11-12 15:13:31 +01:00
bors	cf2b370ad0	Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt make char::is_whitespace unstably const I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.	2024-11-06 04:07:32 +00:00
Matthias Krüger	b438a5cd2a	Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35 unicode_data.rs: show command for generating file https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool, now we just have to mention that in the comment. :) Fixes https://github.com/rust-lang/rust/issues/131640.	2024-11-03 12:08:51 +01:00
Ralf Jung	0804815e69	make char::is_whitespace unstably const	2024-11-02 10:17:16 +01:00
Ralf Jung	720d618b5f	unicode_data.rs: show command for generating file	2024-11-02 10:06:52 +01:00
Ralf Jung	66351a6184	get rid of a whole bunch of unnecessary rustc_const_unstable attributes	2024-11-02 09:59:55 +01:00
Matthias Krüger	fb42a4581b	Rollup merge of #131647 - jieyouxu:unicode-table-generator, r=Mark-Simulacrum Register `src/tools/unicode-table-generator` as a runnable tool It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool. This tool seems to take two possible args: 1. (Mandatory) path to `library/core/src/unicode/unicode_data.rs`, and 2. (Optional) path to generate a test file. I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2). I'm not sure about how this tool is supposed to be run. `Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool running logic expects the bin name to be derived from the tool name. I also added a triagebot message to remind to not manually edit the library source file and edit the tool then regenerate instead, but this should probably be a tidy check (if that's desirable then that can be in a follow-up PR, though may be overkill). Helps with #131640 but does not close it because still no docs. r? `@Mark-Simulacrum` (since I think you authored this tool?)	2024-10-20 16:54:09 +02:00
许杰友 Jieyou Xu (Joe)	75a9c86a77	unicode-table-generator: sync comments These comments were updated on master but not through this tool, so the comments in the tool became outdated. Sync the comments to stay consistent.	2024-10-13 19:33:10 +08:00
许杰友 Jieyou Xu (Joe)	d21aa86c65	unicode-table-generator: match bin name with tool name Bootstrap assumes that the binary name is the same as tool name, just makes everyone's lives easier.	2024-10-13 19:14:06 +08:00
Ralf Jung	90e4f10f6c	switch unicode-data back to 'static'	2024-10-13 11:53:06 +02:00
Michael Goulet	c682aa162b	Reformat using the new identifier sorting from rustfmt	2024-09-22 19:11:29 -04:00
Nicholas Nethercote	84ac80f192	Reformat `use` declarations. The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.	2024-07-29 08:26:52 +10:00
Arpad Borsos	488598c183	Add a lower bound check to `unicode-table-generator` output This adds a dedicated check for the lower bound (if it is outside of ASCII range) to the output of the `unicode-table-generator` tool. This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now, as that is the only one with a lower bound outside of ASCII.	2024-04-20 10:16:45 +02:00
KaDiWa	ad2b34d0e3	remove some unneeded imports	2023-04-12 19:27:18 +02:00
Martin Gammelsæter	54f55efb9a	Use hex literal for INDEX_MASK	2023-03-21 09:59:47 +01:00
Martin Gammelsæter	355e1dda1d	Improve case mapping encoding scheme The indices are encoded as `u32`s in the range of invalid `char`s, so that we know that if any mapping fails to parse as a `char` we should use the value for lookup in the multi-table. This avoids the second binary search in cases where a multi-`char` mapping is needed. Idea from @nikic	2023-03-16 21:42:15 +01:00
Martin Gammelsæter	f9bd884385	Split unicode case LUTs in single and multi variants The majority of char case replacements are single char replacements, so storing them as [char; 3] wastes a lot of space. This commit splits the replacement tables for both `to_lower` and `to_upper` into two separate tables, one with single-character mappings and one with multi-character mappings. This reduces the binary size for programs using all of these tables with roughly 24K bytes.	2023-03-16 12:34:04 +01:00
Martin Gammelsæter	8a4eb9e3a8	Skip serializing ascii chars in case LUTs Since ascii chars are already handled by a special case in the `to_lower` and `to_upper` functions, there's no need to waste space on them in the LUTs.	2023-03-15 17:27:23 +01:00
Sage Mitchell	2b328ea5ee	Address feedback from PR #101401	2022-09-04 08:07:53 -07:00
Sage Mitchell	4a3e169da7	Make `char::is_lowercase` and `char::is_uppercase` const Implements #101400.	2022-09-04 08:07:53 -07:00
bors	ce36e88256	Auto merge of #100497 - kadiwa4:remove_clone_into_iter, r=cjgillot Avoid cloning a collection only to iterate over it `@rustbot` label: +C-cleanup	2022-08-28 18:31:08 +00:00
Yuki Okushi	e31bedc9cf	Rollup merge of #100924 - est31:closure_to_fn_ptr, r=Mark-Simulacrum Smaller improvements of tidy and the unicode generator	2022-08-27 13:14:19 +09:00
est31	754b3e7567	Change hint to correct path	2022-08-23 19:06:27 +02:00
est31	0a6af989f6	Simplify unicode_downloads.rs Reduce duplication by moving fetching logic into a dedicated function.	2022-08-23 19:04:07 +02:00
KaDiWa	4eebcb9910	avoid cloning and then iterating	2022-08-13 16:16:52 +02:00
Bruce A. MacNaughton	5d048eb69d	add #inline	2022-07-20 16:13:54 -07:00
Bruce A. MacNaughton	89ace470dc	formatted	2022-07-19 18:03:18 -07:00
Bruce A. MacNaughton	d4819632e2	working updates	2022-07-19 17:35:19 -07:00
T-O-R-U-S	72a25d05bf	Use implicit capture syntax in format_args This updates the standard library's documentation to use the new syntax. The documentation is worthwhile to update as it should be more idiomatic (particularly for features like this, which are nice for users to get acquainted with). The general codebase is likely more hassle than benefit to update: it'll hurt git blame, and generally updates can be done by folks updating the code if (and when) that makes things more readable with the new format. A few places in the compiler and library code are updated (mostly just due to already having been done when this commit was first authored).	2022-03-10 10:23:40 -05:00
Josh Stone	6b0b417299	Let unicode-table-generator fail gracefully for bitsets The "Alphabetic" property in Unicode 14 grew too big for the bitset representation, panicking "cannot pack 264 into 8 bits". However, we were already choosing the skiplist for that anyway, so this doesn't need to be a hard failure. That panic is now a returned `Err`, and then in `emit_codepoints` we automatically defer to skiplist.	2021-10-06 17:35:49 -07:00
Josh Stone	e159d42a9a	Redo #81358 in unicode-table-generator	2021-10-06 15:45:17 -07:00
Mark Rousskov	c746be2219	Migrate to 2021	2021-09-20 22:21:42 -04:00
Jade	3cf820e17d	rfc3052: Remove authors field from Cargo manifests Since RFC 3052 soft deprecated the authors field anyway, hiding it from crates.io, docs.rs, and making Cargo not add it by default, and it is not generally up to date/useful information, we should remove it from crates in this repo.	2021-07-29 14:56:05 -07:00
Matthias Krüger	ba6b4274b5	unicode_table_generator: fix clippy::writeln_empty_string, clippy::useless_format, clippy::for_kv_map	2020-08-24 00:43:50 +02:00
Izzy Swart	b809f453ca	Fix typo "biset" -> "bitset"	2020-08-06 16:13:29 -07:00
mark	2c31b45ae8	mv std libs to library/	2020-07-27 19:51:13 -05:00
Lzu Tao	fff822fead	Migrate to numeric associated consts	2020-06-10 01:35:47 +00:00
Pyfisch	7f4048c710	Store UNICODE_VERSION as a tuple Remove the UnicodeVersion struct containing major, minor and update fields and replace it with a 3-tuple containing the version number. As the value of each field is limited to 255 use u8 to store them.	2020-04-11 12:56:25 +02:00
Mark Rousskov	ad679a7f43	Update the documentation comment	2020-03-27 19:02:23 -04:00
Mark Rousskov	b6bc906004	Remove separate encoding for a single nonzero-mapping byte In practice, for the two data sets that still use the bitset encoding (uppercase and lowercase) this is not a significant win, so just drop it entirely. It costs us about 5 bytes, and the complexity is nontrivial.	2020-03-27 19:02:23 -04:00
Mark Rousskov	9c1ceece20	Add skip list based implementation for smaller encoding This arranges for the sparser sets (everything except lower and uppercase) to be encoded in a significantly smaller context. However, it is also a performance trade-off (roughly 3x slower than the bitset encoding). The 40% size reduction is deemed to be sufficiently important to merit this performance loss, particularly as it is unlikely that this code is hot anywhere (and if it is, paying the memory cost for a bitset that directly represents the data seems worthwhile). Table sizes: Alphabetic: 1599 bytes (-937 bytes); Case_Ignorable: 949 bytes (-822 bytes); Cased: 359 bytes (-429 bytes); Cc: 9 bytes (-15 bytes); Grapheme_Extend: 813 bytes (-675 bytes); Lowercase: 863 bytes; N: 419 bytes (-619 bytes); Uppercase: 776 bytes; White_Space: 37 bytes (-46 bytes). Total table sizes: 5824 bytes (-3543 bytes)	2020-03-27 19:02:23 -04:00
Mark Rousskov	33b9e6f5cf	Add richer printing	2020-03-24 16:24:47 -04:00
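
Commit 355e1dda1d in the table above describes storing case mappings as `u32` values, where any value that does not parse as a `char` is treated as an index into a separate multi-character table. The following is a minimal sketch of that idea, not the generated code itself: the table names, the two toy entries, and the exact `INDEX_MASK` value are assumptions chosen for illustration.

```rust
// Assumed flag bit: any value with this bit set is above 0x10FFFF,
// so `char::from_u32` rejects it and we know it is an index instead.
const INDEX_MASK: u32 = 0x40_0000;

// Hypothetical single-mapping table, sorted by key for binary search.
// The value is either a valid `char` (as u32) or `INDEX_MASK | index`.
static LOWER_SINGLE: &[(char, u32)] = &[
    ('\u{C0}', 0xE0),            // U+00C0 maps to the single char U+00E0
    ('\u{130}', INDEX_MASK | 0), // U+0130 needs a multi-char mapping, index 0
];

// Hypothetical multi-mapping table, padded with '\0'.
static LOWER_MULTI: &[[char; 3]] = &[['i', '\u{307}', '\0']];

fn to_lower(c: char) -> [char; 3] {
    // ASCII is handled by a special case and is not stored in the tables.
    if c.is_ascii() {
        return [c.to_ascii_lowercase(), '\0', '\0'];
    }
    match LOWER_SINGLE.binary_search_by(|&(key, _)| key.cmp(&c)) {
        Ok(i) => {
            let raw = LOWER_SINGLE[i].1;
            match char::from_u32(raw) {
                // Valid scalar value: a plain one-to-one mapping.
                Some(mapped) => [mapped, '\0', '\0'],
                // Invalid scalar value: strip the flag bit and index
                // the multi table, with no second binary search.
                None => LOWER_MULTI[(raw & (INDEX_MASK - 1)) as usize],
            }
        }
        // No entry: the character maps to itself.
        Err(_) => [c, '\0', '\0'],
    }
}
```

In this sketch a single binary search serves both kinds of mapping, which is the saving the commit message attributes to the scheme.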

62 commits
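
Commit 6b0b417299 in the log turns a packing panic ("cannot pack 264 into 8 bits") into a returned `Err`, with `emit_codepoints` then deferring to the skiplist. A rough sketch of that control flow might look like the following. Only the name `emit_codepoints` and the error text come from the commit message; `emit_bitset`, the `Encoding` enum, and the function signatures are hypothetical.

```rust
// Hypothetical result type: which encoding was chosen for a property.
#[derive(Debug, PartialEq)]
enum Encoding {
    Bitset(Vec<u8>),
    Skiplist,
}

// Hypothetical packing step: every value must fit in one byte.
// Previously this kind of failure would have been a panic.
fn emit_bitset(values: &[usize]) -> Result<Vec<u8>, String> {
    let mut packed = Vec::with_capacity(values.len());
    for &v in values {
        if v > u8::MAX as usize {
            return Err(format!("cannot pack {} into 8 bits", v));
        }
        packed.push(v as u8);
    }
    Ok(packed)
}

// Try the bitset first and fall back to the skiplist on error.
// (Signature assumed; the real function emits generated source.)
fn emit_codepoints(values: &[usize]) -> Encoding {
    match emit_bitset(values) {
        Ok(bytes) => Encoding::Bitset(bytes),
        Err(_) => Encoding::Skiplist,
    }
}
```

The point of the design is that an oversized property no longer aborts the generator; it simply ends up with the encoding that was going to be chosen for it anyway.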