Commit graph

46 commits

Author SHA1 Message Date
Ross MacArthur
f7256d28d1
Require issue = "none" over issue = "0" in unstable attributes 2019-12-21 13:16:18 +02:00
David Tolnay
28eb31f8dc
Make libcore/unicode/tables.rs compatible with rustfmt 2019-11-29 20:17:11 -08:00
David Tolnay
f4cff27792
Make libcore/unicode/printable.rs compatible with rustfmt 2019-11-29 20:17:10 -08:00
David Tolnay
95e00bfed8
Format libcore with rustfmt
This commit applies rustfmt with default settings to files in
src/libcore *that are not involved in any currently open PR* to minimize
merge conflicts. The list of files involved in open PRs was determined
by querying GitHub's GraphQL API with this script:
https://gist.github.com/dtolnay/aa9c34993dc051a4f344d1b10e4487e8

With the list of files from the script in `outstanding_files`, the
relevant commands were:

    $ find src/libcore -name '*.rs' | xargs rustfmt --edition=2018
    $ rg libcore outstanding_files | xargs git checkout --

Repeating this process several months apart should get us coverage of
most of the rest of libcore.
2019-11-26 23:02:11 -08:00
Guanqun Lu
ba7d1b80d0 it's more pythonic to use 'is not None' in python files 2019-09-06 15:14:25 +08:00
Aleksey Kladov
a0c186c34f remove XID and Pattern_White_Space unicode tables from libcore
They are only used by rustc_lexer, and are not needed elsewhere.

So we move the relevant definitions into rustc_lexer (while the actual
unicode data comes from the unicode-xid crate) and make the rest of
the compiler use it.
2019-09-04 13:11:11 +03:00
Matthew Jasper
7b41fd2158 Make some items in core::unicode private
They were reachable through opaque macros defined in `core`
2019-08-05 23:50:47 +01:00
Mazdak Farrokhzad
68d94bd741
Rollup merge of #62084 - euclio:unicode-table-tweak, r=kennytm
allow clippy::unreadable_literal in unicode tables

Also modifies the generation script to emit 2018 edition paths.
2019-07-26 18:56:33 +02:00
Andy Russell
dee3d27d9d
allow clippy::unreadable_literal in unicode tables
Also modifies the generation script to emit 2018 edition paths.
2019-07-12 22:51:37 -04:00
Josh Stone
de1e489115 Regenerate character tables for Unicode 12.1 2019-07-12 16:29:40 -07:00
Josh Stone
76128c304d Update unicode scripts for the current coding style 2019-07-12 16:28:58 -07:00
Mazdak Farrokhzad
327c54ed02
Rollup merge of #60081 - pawroman:cleanup_unicode_script, r=varkor
Refactor unicode.py script

Hi, I noticed that the `unicode.py` script used some deprecated escapes in regular expressions. E.g. `\d`, `\w`, `\.` will be illegal in the future without "raw strings". This is now fixed. I have also cleaned up the script quite a bit.

## Escape deprecation

OK (note the `r`):
`re.compile(r"\d")`

Deprecated (from Python 3.6 onwards, see [here][link1] and [here][link2]):
`re.compile("\d")`.

[link1]: https://docs.python.org/3.6/whatsnew/3.6.html#deprecated-python-behavior
[link2]: https://bugs.python.org/issue27364

This was evident running the script using Python 3.7 like so:

```
$ python3 -Wall unicode.py
unicode.py:227: DeprecationWarning: invalid escape sequence \w
  re1 = re.compile("^ *([0-9A-F]+) *; *(\w+)")
unicode.py:228: DeprecationWarning: invalid escape sequence \.
  re2 = re.compile("^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
unicode.py:453: DeprecationWarning: invalid escape sequence \d
  pattern = "for Version (\d+)\.(\d+)\.(\d+) of the Unicode"
```

The documentation states that
> A backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases.
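
As a minimal sketch of the raw-string fix, the three patterns quoted in the warnings above can be written with an `r` prefix. The sample input lines are illustrative assumptions modelled on the Unicode data file format; they are not taken from the script.

```python
import re

# The patterns quoted in the warnings above, rewritten as raw strings
# so that \w, \. and \d reach the regex engine unchanged.
re1 = re.compile(r"^ *([0-9A-F]+) *; *(\w+)")
re2 = re.compile(r"^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
pattern = r"for Version (\d+)\.(\d+)\.(\d+) of the Unicode"

# Hypothetical sample lines (assumed format, for illustration only).
single_line = "00AA          ; Alphabetic"
range_line = "0041..005A    ; Alphabetic"
readme_line = "for Version 12.0.0 of the Unicode Standard"

single = re1.match(single_line).groups()
ranged = re2.match(range_line).groups()
version = re.search(pattern, readme_line).groups()

print(single)   # code point and property name
print(ranged)   # range start, range end, property name
print(version)  # major, minor, patch as strings
```

With the `r` prefix, running the file under `python3 -Wall` should no longer report invalid escape sequences for these literals.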

## Testing

To test my changes, I had to add support for choosing the Unicode version to use. The script defaults to the latest release (12.0.0 at the moment; the repo has 11.0.0 checked in).

The script generates the exact same output for version 11.0.0 with Python 2.7 and 3.7 and no longer generates any deprecation warnings:

```
$ python3 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 --version
Python 2.7.16
$ python3 --version
Python 3.7.3
```

## Extra functionality

Furthermore, the script will check for and download the latest Unicode version by default (without the `-v` argument). The `--help` output is below:

```
$ ./unicode.py --help
usage: unicode.py [-h] [-v VERSION]

Regenerate Unicode tables (tables.rs).

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        Unicode version to use (if not specified, defaults to
                        latest available final release).
```
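
A command-line interface with this help text could be declared roughly as follows. This is a sketch only, assuming the script uses `argparse`; the helper name `build_parser` is hypothetical and the real script may be organised differently.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the usage and help text shown above.
    parser = argparse.ArgumentParser(
        prog="unicode.py",
        description="Regenerate Unicode tables (tables.rs).",
    )
    parser.add_argument(
        "-v", "--version",
        default=None,  # None stands for "look up the latest final release"
        help="Unicode version to use (if not specified, defaults to "
             "latest available final release).",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(["-v", "11.0.0"])
print("Using Unicode version:", args.version)
```

Leaving the default as `None` lets the caller distinguish "no version given" from an explicit version and only then query for the latest release.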

## Cleanups

I have cleaned up the code quite a bit, with Python best practices and code style in mind. I'm happy to provide more details and rationale for all my changes if the reviewers so desire.

One externally visible change is that the Unicode data will now be downloaded into the `src/libcore/unicode/downloaded` directory, in a subdirectory named after the Unicode version:

```
$ pwd
.../rust/src/libcore/unicode
$ exa -T downloaded/
downloaded
├── 11.0.0
│  ├── DerivedCoreProperties.txt
│  ├── DerivedNormalizationProps.txt
│  ├── PropList.txt
│  ├── ReadMe.txt
│  ├── Scripts.txt
│  ├── SpecialCasing.txt
│  └── UnicodeData.txt
└── 12.0.0
   ├── DerivedCoreProperties.txt
   ├── DerivedNormalizationProps.txt
   ├── PropList.txt
   ├── ReadMe.txt
   ├── Scripts.txt
   ├── SpecialCasing.txt
   └── UnicodeData.txt
```
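
The per-version layout shown above could be computed along these lines. This is a sketch under assumptions: the names `FETCH_DIR` and `data_file_path` are hypothetical and only the directory structure comes from the listing.

```python
from pathlib import Path

# Directory name taken from the listing above; the constant name is assumed.
FETCH_DIR = Path("downloaded")


def data_file_path(version, filename):
    # Hypothetical helper: one subdirectory per Unicode version.
    return FETCH_DIR / version / filename


path = data_file_path("12.0.0", "UnicodeData.txt")
print(path.as_posix())
```

Keeping each version in its own subdirectory means data for 11.0.0 and 12.0.0 can coexist without overwriting each other.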
2019-07-06 22:14:33 +02:00
Paweł Romanowski
2b47a085dd Address review remarks in unicode.py 2019-07-01 19:43:48 +02:00
Paweł Romanowski
60ccf89693
Apply suggestions from code review
Co-Authored-By: varkor <github@varkor.com>
2019-06-10 20:45:58 +02:00
Paweł Romanowski
2c9c978e1d Refactor and document unicode.py script 2019-04-19 11:42:08 +02:00
Paweł Romanowski
edbc27da2d Fix tidy errors 2019-04-18 17:14:31 +02:00
Paweł Romanowski
a580421afb More cleanups for unicode.py 2019-04-18 16:16:34 +02:00
Paweł Romanowski
89feb6d5fd Clean up unicode.py script 2019-04-18 15:30:50 +02:00
Taiki Endo
360432f1e8 libcore => 2018 2019-04-18 14:47:35 +09:00
Mark Rousskov
2a663555dd Remove licenses 2018-12-25 21:08:33 -07:00
ljedrz
d0c64bb296 cleanup: remove static lifetimes from consts 2018-12-04 12:46:10 +01:00
Mazdak Farrokhzad
e15c62d61f revert making internal APIs const fn. 2018-11-10 01:10:07 +01:00
Mazdak Farrokhzad
5b89877dda constify parts of libcore. 2018-11-10 01:07:32 +01:00
bors
c4156768aa Auto merge of #51609 - dscorbett:is_numeric, r=alexcrichton
Treat gc=No characters as numeric

[`char::is_numeric`](https://doc.rust-lang.org/std/primitive.char.html#method.is_numeric) and [`char::is_alphanumeric`](https://doc.rust-lang.org/std/primitive.char.html#method.is_alphanumeric) are documented to be defined “in terms of the Unicode General Categories 'Nd', 'Nl', 'No'”, but unicode.py does not group 'No' with the other 'N' categories. These functions therefore currently return `false` for characters like ⟨¾⟩ and ⟨①⟩.
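
The categories of the two example characters can be checked with Python's standard `unicodedata` module. The `is_numeric` function below is a hypothetical Python analogue of the documented definition, not the Rust implementation, and the exact data depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# General categories named in the char::is_numeric documentation.
NUMERIC_CATEGORIES = {"Nd", "Nl", "No"}


def is_numeric(ch):
    # Hypothetical analogue: numeric means general category Nd, Nl or No.
    return unicodedata.category(ch) in NUMERIC_CATEGORIES


# U+00BE VULGAR FRACTION THREE QUARTERS, U+2460 CIRCLED DIGIT ONE,
# plus an ordinary digit and a letter for comparison.
for ch in ("\u00be", "\u2460", "7", "a"):
    print(repr(ch), unicodedata.category(ch), is_numeric(ch))
```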
2018-08-01 17:44:25 +00:00
Pazzaz
ad7621d42e Handle array manually in string case conversion methods 2018-07-06 17:20:39 +02:00
David Corbett
5150ff0c72 Treat gc=No characters as numeric 2018-06-17 13:47:47 -04:00
Josh Stone
f81e34b825 Regenerate character tables for Unicode 11 2018-06-11 10:54:30 -07:00
varkor
b6539372e9 Fix tables.rs 2018-05-21 19:12:36 +01:00
varkor
2fa22effb6 Avoid counting characters and add explanatory comment to test 2018-05-21 18:57:54 +01:00
varkor
d7aa35eb1b Use Grapheme_Extend instead of Mn 2018-05-21 18:57:54 +01:00
varkor
d3c257b0ae Use the correct output directory for downloading Unicode files 2018-05-21 18:57:54 +01:00
varkor
4694d20170 Escape combining characters in escape_debug 2018-05-21 18:57:54 +01:00
varkor
b72faf5795 Keep tables.rs copyright notice up to date 2018-05-21 18:57:54 +01:00
varkor
a0b5d3813e Download unicode data files in directory of unicode.py 2018-05-21 18:57:54 +01:00
varkor
f53022f88d Update unicode/tables.rs with Mn 2018-05-21 18:57:54 +01:00
Vadzim Dambrouski
f29e62aadf Fix a warning in libcore on 16bit targets.
This code assumes that usize is at least 32 bits wide, which is not the case on
16-bit targets. It produces a warning that fails compilation
on MSP430 if deny(warnings) is enabled.
It is very unlikely that anyone would actually use this code on
a microcontroller, but since unicode was merged into libcore we
have to compile it on 16-bit targets.
2018-05-01 17:48:31 +03:00
Simon Sapin
ef41788cf3 Mark the rest of the unicode feature flag as perma-unstable. 2018-04-12 00:13:53 +02:00
Simon Sapin
1ca2905cda Dedicated tracking issue for UnicodeVersion and UNICODE_VERSION. 2018-04-12 00:13:53 +02:00
Simon Sapin
670e85339a Move core::char::printable to core::unicode::printable 2018-04-12 00:13:53 +02:00
Simon Sapin
d4ed1e6fa4 Merge unstable Utf16Encoder into EncodeUtf16 2018-04-12 00:13:53 +02:00
Simon Sapin
0d9afcd9b9 Merge core::unicode::str into core::str
And the UnicodeStr trait into StrExt
2018-04-12 00:13:52 +02:00
Simon Sapin
33358dc3c5 Remove the CharExt trait, now that libcore has inherent methods for char 2018-04-12 00:13:52 +02:00
Simon Sapin
34c52534f7 Move the rest of core::unicode::char to core::unicode 2018-04-12 00:13:52 +02:00
Simon Sapin
955450212a Move char decoding iterators into a separate private module. 2018-04-12 00:13:52 +02:00
Simon Sapin
939692409d Reexport from core::unicode::char in core::char rather than vice versa 2018-04-12 00:13:52 +02:00
Simon Sapin
5807be7ccb Move contents of libstd_unicode into libcore 2018-04-12 00:13:43 +02:00