Auto merge of #60261 - matklad:one-escape, r=petrochenkov

introduce unescape module

A WIP PR to gauge early feedback

Currently, we deal with escape sequences twice: once when we [lex](112f7e9ac5/src/libsyntax/parse/lexer/mod.rs (L928-L1065)) a string, and a second time when we [unescape](112f7e9ac5/src/libsyntax/parse/mod.rs (L313-L366)) literals. Note that we also produce different sets of diagnostics in these two cases.

This PR aims to remove this duplication, by introducing a new `unescape` module as a single source of truth for character escaping rules.

I think this would be a useful cleanup by itself, but I also need this for https://github.com/rust-lang/rust/pull/59706.

In its current state, the PR has an `unescape` module which fully (modulo bugs) deals with string and char literals. I am quite happy with the state of this module.

What this PR doesn't have yet:
* [x] handling of byte and byte string literals (should be simple to add)
* [x] good diagnostics
* [x] actual removal of code from lexer (giant `scan_char_or_byte` should go away completely)
* [x] performance check
* [x] general cleanup of the new code

Diagnostics will be the most labor-intensive part here, but they are mostly a matter of correctly adjusting spans to sub-tokens. In the current setup, `unescape` produces a plain old `enum` describing the various problems, and these are rendered into the `Handler` separately. This split is not strictly required (it is possible to just pass the `Handler` in), but I like the separation between diagnostics and logic that this approach imposes, and that separation should again be useful for #59706.
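To make the split between logic and diagnostics concrete, here is a minimal sketch of the idea. It is not the code from this PR. The names `EscapeError`, `unescape_str`, `scan_hex` and `render` are assumptions chosen for illustration, only a handful of escapes are handled, and each reported range covers the whole escape rather than the finer sub-token spans the real diagnostics use. The unescaping pass reports plain data through a callback, and a separate function turns that data into text:

```rust
use std::iter::Peekable;
use std::ops::Range;
use std::str::CharIndices;

/// Hypothetical error set: plain data, with no knowledge of how it is reported.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EscapeError {
    InvalidEscape,
    TooShortHexEscape,
    InvalidCharInHexEscape,
    OutOfRangeHexEscape,
}

/// Walks the body of a string literal (quotes already stripped) and reports
/// every character, or every problem, together with its byte range.
pub fn unescape_str<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        let res = if c != '\\' {
            Ok(c)
        } else {
            match chars.next() {
                Some((_, 'n')) => Ok('\n'),
                Some((_, 't')) => Ok('\t'),
                Some((_, '\\')) => Ok('\\'),
                Some((_, '"')) => Ok('"'),
                Some((_, 'x')) => scan_hex(&mut chars),
                // Unknown escape, or a backslash at the very end of the input.
                _ => Err(EscapeError::InvalidEscape),
            }
        };
        // The range ends where the next unread character begins.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        callback(start..end, res);
    }
}

/// Reads exactly two hex digits after `\x` and checks the ASCII range.
fn scan_hex(chars: &mut Peekable<CharIndices<'_>>) -> Result<char, EscapeError> {
    let mut value = 0u32;
    for _ in 0..2 {
        let (_, c) = chars.next().ok_or(EscapeError::TooShortHexEscape)?;
        let digit = c.to_digit(16).ok_or(EscapeError::InvalidCharInHexEscape)?;
        value = value * 16 + digit;
    }
    if value > 0x7F {
        return Err(EscapeError::OutOfRangeHexEscape);
    }
    Ok(value as u8 as char)
}

/// Rendering lives apart from the logic; a real compiler would emit into its
/// diagnostics handler here instead of building a `String`.
pub fn render(src: &str, range: Range<usize>, err: EscapeError) -> String {
    let msg = match err {
        EscapeError::InvalidEscape => "unknown character escape",
        EscapeError::TooShortHexEscape => "numeric character escape is too short",
        EscapeError::InvalidCharInHexEscape => "invalid character in numeric character escape",
        EscapeError::OutOfRangeHexEscape => {
            "this form of character escape may only be used with characters in the range [\\x00-\\x7f]"
        }
    };
    format!("error: {} at bytes {}..{}: `{}`", msg, range.start, range.end, &src[range.start..range.end])
}
```

A caller that only needs the unescaped value can ignore the `Err` cases, while a caller that validates literals can pass each `Err` and its range to `render`. Both share the same escaping rules, which is the point of having a single module.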

cc @eddyb, @petrochenkov
bors 2019-05-06 00:16:16 +00:00
commit 46d0ca00ad
28 changed files with 1047 additions and 784 deletions

View file

@ -1,3 +1,4 @@
// compile-flags: -Z continue-parse-after-error
// ignore-tidy-tab
fn main() {
@ -76,7 +77,6 @@ raw { \n
println!("\x7B}\u8 {", 1);
//~^ ERROR incorrect unicode escape sequence
//~| ERROR argument never used
// note: raw strings don't escape `\xFF` and `\u{FF}` sequences
println!(r#"\x7B}\u{8} {"#, 1);

View file

@ -1,13 +1,13 @@
error: incorrect unicode escape sequence
--> $DIR/format-string-error-2.rs:77:20
--> $DIR/format-string-error-2.rs:78:20
|
LL | println!("\x7B}\u8 {", 1);
| ^^-
| |
| help: format of unicode escape sequences uses braces: `\u{8}`
| |
| help: format of unicode escape sequences uses braces: `\u{8}`
error: invalid format string: expected `'}'`, found `'a'`
--> $DIR/format-string-error-2.rs:5:5
--> $DIR/format-string-error-2.rs:6:5
|
LL | format!("{
| - because of this opening brace
@ -17,7 +17,7 @@ LL | a");
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'b'`
--> $DIR/format-string-error-2.rs:9:5
--> $DIR/format-string-error-2.rs:10:5
|
LL | format!("{ \
| - because of this opening brace
@ -28,7 +28,7 @@ LL | b");
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'\'`
--> $DIR/format-string-error-2.rs:11:18
--> $DIR/format-string-error-2.rs:12:18
|
LL | format!(r#"{ \
| - ^ expected `}` in format string
@ -38,7 +38,7 @@ LL | format!(r#"{ \
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'\'`
--> $DIR/format-string-error-2.rs:15:18
--> $DIR/format-string-error-2.rs:16:18
|
LL | format!(r#"{ \n
| - ^ expected `}` in format string
@ -48,7 +48,7 @@ LL | format!(r#"{ \n
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'e'`
--> $DIR/format-string-error-2.rs:21:5
--> $DIR/format-string-error-2.rs:22:5
|
LL | format!("{ \n
| - because of this opening brace
@ -59,7 +59,7 @@ LL | e");
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'a'`
--> $DIR/format-string-error-2.rs:25:5
--> $DIR/format-string-error-2.rs:26:5
|
LL | {
| - because of this opening brace
@ -69,7 +69,7 @@ LL | a");
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'a'`
--> $DIR/format-string-error-2.rs:29:5
--> $DIR/format-string-error-2.rs:30:5
|
LL | {
| - because of this opening brace
@ -79,7 +79,7 @@ LL | a
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'b'`
--> $DIR/format-string-error-2.rs:35:5
--> $DIR/format-string-error-2.rs:36:5
|
LL | { \
| - because of this opening brace
@ -90,7 +90,7 @@ LL | b");
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'b'`
--> $DIR/format-string-error-2.rs:40:5
--> $DIR/format-string-error-2.rs:41:5
|
LL | { \
| - because of this opening brace
@ -101,7 +101,7 @@ LL | b \
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'\'`
--> $DIR/format-string-error-2.rs:45:8
--> $DIR/format-string-error-2.rs:46:8
|
LL | raw { \
| - ^ expected `}` in format string
@ -111,7 +111,7 @@ LL | raw { \
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'\'`
--> $DIR/format-string-error-2.rs:50:8
--> $DIR/format-string-error-2.rs:51:8
|
LL | raw { \n
| - ^ expected `}` in format string
@ -121,7 +121,7 @@ LL | raw { \n
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'e'`
--> $DIR/format-string-error-2.rs:57:5
--> $DIR/format-string-error-2.rs:58:5
|
LL | { \n
| - because of this opening brace
@ -132,7 +132,7 @@ LL | e");
= note: if you intended to print `{`, you can escape it using `{{`
error: invalid format string: expected `'}'`, found `'a'`
--> $DIR/format-string-error-2.rs:67:5
--> $DIR/format-string-error-2.rs:68:5
|
LL | {
| - because of this opening brace
@ -142,13 +142,13 @@ LL | asdf}
= note: if you intended to print `{`, you can escape it using `{{`
error: 1 positional argument in format string, but no arguments were given
--> $DIR/format-string-error-2.rs:70:17
--> $DIR/format-string-error-2.rs:71:17
|
LL | println!("\t{}");
| ^^
error: invalid format string: expected `'}'` but string was terminated
--> $DIR/format-string-error-2.rs:74:27
--> $DIR/format-string-error-2.rs:75:27
|
LL | println!("\x7B}\u{8} {", 1);
| -^ expected `'}'` in format string
@ -157,14 +157,6 @@ LL | println!("\x7B}\u{8} {", 1);
|
= note: if you intended to print `{`, you can escape it using `{{`
error: argument never used
--> $DIR/format-string-error-2.rs:77:28
|
LL | println!("\x7B}\u8 {", 1);
| ------------ ^ argument never used
| |
| formatting specifier missing
error: invalid format string: unmatched `}` found
--> $DIR/format-string-error-2.rs:82:21
|
@ -181,5 +173,5 @@ LL | println!(r#"\x7B}\u8 {"#, 1);
|
= note: if you intended to print `}`, you can escape it using `}}`
error: aborting due to 19 previous errors
error: aborting due to 18 previous errors

View file

@ -1,20 +1,20 @@
error: this form of character escape may only be used with characters in the range [\x00-\x7f]
--> $DIR/ascii-only-character-escape.rs:4:16
--> $DIR/ascii-only-character-escape.rs:4:14
|
LL | let x = "\x80";
| ^^
| ^^^^
error: this form of character escape may only be used with characters in the range [\x00-\x7f]
--> $DIR/ascii-only-character-escape.rs:5:16
--> $DIR/ascii-only-character-escape.rs:5:14
|
LL | let y = "\xff";
| ^^
| ^^^^
error: this form of character escape may only be used with characters in the range [\x00-\x7f]
--> $DIR/ascii-only-character-escape.rs:6:16
--> $DIR/ascii-only-character-escape.rs:6:14
|
LL | let z = "\xe2";
| ^^
| ^^^^
error: aborting due to 3 previous errors

View file

@ -34,11 +34,11 @@ error: byte constant must be ASCII. Use a \xHH escape for a non-ASCII byte
LL | b'é';
| ^
error: unterminated byte constant: b'a
--> $DIR/byte-literals.rs:14:5
error: unterminated byte constant
--> $DIR/byte-literals.rs:14:6
|
LL | b'a
| ^^^
| ^^^^
error: aborting due to 7 previous errors

View file

@ -23,10 +23,10 @@ LL | b"é";
| ^
error: unterminated double quote byte string
--> $DIR/byte-string-literals.rs:9:7
--> $DIR/byte-string-literals.rs:9:6
|
LL | b"a
| _______^
| ______^
LL | | }
| |__^

View file

@ -9,32 +9,27 @@ fn main() {
let _ = b'\u';
//~^ ERROR incorrect unicode escape sequence
//~^^ ERROR unicode escape sequences cannot be used as a byte or in a byte string
let _ = b'\x5';
//~^ ERROR numeric character escape is too short
let _ = b'\xxy';
//~^ ERROR invalid character in numeric character escape: x
//~^^ ERROR invalid character in numeric character escape: y
let _ = '\x5';
//~^ ERROR numeric character escape is too short
let _ = '\xxy';
//~^ ERROR invalid character in numeric character escape: x
//~^^ ERROR invalid character in numeric character escape: y
let _ = b"\u{a4a4} \xf \u";
//~^ ERROR unicode escape sequences cannot be used as a byte or in a byte string
//~^^ ERROR invalid character in numeric character escape:
//~^^^ ERROR incorrect unicode escape sequence
//~^^^^ ERROR unicode escape sequences cannot be used as a byte or in a byte string
let _ = "\xf \u";
//~^ ERROR invalid character in numeric character escape:
//~^^ ERROR form of character escape may only be used with characters in the range [\x00-\x7f]
//~^^^ ERROR incorrect unicode escape sequence
//~^^ ERROR incorrect unicode escape sequence
let _ = "\u8f";
//~^ ERROR incorrect unicode escape sequence

View file

@ -18,88 +18,58 @@ LL | let _ = b'\u';
|
= help: format of unicode escape sequences is `\u{...}`
error: unicode escape sequences cannot be used as a byte or in a byte string
--> $DIR/issue-23620-invalid-escapes.rs:10:15
|
LL | let _ = b'\u';
| ^^
error: numeric character escape is too short
--> $DIR/issue-23620-invalid-escapes.rs:14:17
--> $DIR/issue-23620-invalid-escapes.rs:13:15
|
LL | let _ = b'\x5';
| ^
| ^^^
error: invalid character in numeric character escape: x
--> $DIR/issue-23620-invalid-escapes.rs:17:17
--> $DIR/issue-23620-invalid-escapes.rs:16:17
|
LL | let _ = b'\xxy';
| ^
error: invalid character in numeric character escape: y
--> $DIR/issue-23620-invalid-escapes.rs:17:18
|
LL | let _ = b'\xxy';
| ^
error: numeric character escape is too short
--> $DIR/issue-23620-invalid-escapes.rs:21:16
--> $DIR/issue-23620-invalid-escapes.rs:19:14
|
LL | let _ = '\x5';
| ^
| ^^^
error: invalid character in numeric character escape: x
--> $DIR/issue-23620-invalid-escapes.rs:24:16
--> $DIR/issue-23620-invalid-escapes.rs:22:16
|
LL | let _ = '\xxy';
| ^
error: invalid character in numeric character escape: y
--> $DIR/issue-23620-invalid-escapes.rs:24:17
|
LL | let _ = '\xxy';
| ^
error: unicode escape sequences cannot be used as a byte or in a byte string
--> $DIR/issue-23620-invalid-escapes.rs:28:15
--> $DIR/issue-23620-invalid-escapes.rs:25:15
|
LL | let _ = b"\u{a4a4} \xf \u";
| ^^^^^^^^
error: invalid character in numeric character escape:
--> $DIR/issue-23620-invalid-escapes.rs:28:27
--> $DIR/issue-23620-invalid-escapes.rs:25:27
|
LL | let _ = b"\u{a4a4} \xf \u";
| ^
error: incorrect unicode escape sequence
--> $DIR/issue-23620-invalid-escapes.rs:28:28
--> $DIR/issue-23620-invalid-escapes.rs:25:28
|
LL | let _ = b"\u{a4a4} \xf \u";
| ^^ incorrect unicode escape sequence
|
= help: format of unicode escape sequences is `\u{...}`
error: unicode escape sequences cannot be used as a byte or in a byte string
--> $DIR/issue-23620-invalid-escapes.rs:28:28
|
LL | let _ = b"\u{a4a4} \xf \u";
| ^^
error: invalid character in numeric character escape:
--> $DIR/issue-23620-invalid-escapes.rs:34:17
--> $DIR/issue-23620-invalid-escapes.rs:30:17
|
LL | let _ = "\xf \u";
| ^
error: this form of character escape may only be used with characters in the range [\x00-\x7f]
--> $DIR/issue-23620-invalid-escapes.rs:34:16
|
LL | let _ = "\xf \u";
| ^^
error: incorrect unicode escape sequence
--> $DIR/issue-23620-invalid-escapes.rs:34:18
--> $DIR/issue-23620-invalid-escapes.rs:30:18
|
LL | let _ = "\xf \u";
| ^^ incorrect unicode escape sequence
@ -107,12 +77,12 @@ LL | let _ = "\xf \u";
= help: format of unicode escape sequences is `\u{...}`
error: incorrect unicode escape sequence
--> $DIR/issue-23620-invalid-escapes.rs:39:14
--> $DIR/issue-23620-invalid-escapes.rs:34:14
|
LL | let _ = "\u8f";
| ^^--
| |
| help: format of unicode escape sequences uses braces: `\u{8f}`
| |
| help: format of unicode escape sequences uses braces: `\u{8f}`
error: aborting due to 18 previous errors
error: aborting due to 13 previous errors

View file

@ -1,14 +1,14 @@
error: numeric character escape is too short
--> $DIR/lex-bad-char-literals-1.rs:3:8
--> $DIR/lex-bad-char-literals-1.rs:3:6
|
LL | '\x1'
| ^
| ^^^
error: numeric character escape is too short
--> $DIR/lex-bad-char-literals-1.rs:7:8
--> $DIR/lex-bad-char-literals-1.rs:7:6
|
LL | "\x1"
| ^
| ^^^
error: unknown character escape: \u{25cf}
--> $DIR/lex-bad-char-literals-1.rs:11:7

View file

@ -3,6 +3,10 @@ error: character literal may only contain one codepoint
|
LL | 'nope'
| ^^^^^^
help: if you meant to write a `str` literal, use double quotes
|
LL | "nope"
| ^^^^^^
error[E0601]: `main` function not found in crate `lex_bad_char_literals_2`
|

View file

@ -1,5 +1,5 @@
//
// This test needs to the last one appearing in this file as it kills the parser
static c: char =
' //~ ERROR: character literal may only contain one codepoint
' //~ ERROR: unterminated character literal
;

View file

@ -1,8 +1,8 @@
error: character literal may only contain one codepoint: '●
error: unterminated character literal
--> $DIR/lex-bad-char-literals-4.rs:4:5
|
LL | '●
| ^^
| ^^^^
error: aborting due to previous error

View file

@ -3,18 +3,30 @@ error: character literal may only contain one codepoint
|
LL | let x: &str = 'ab';
| ^^^^
help: if you meant to write a `str` literal, use double quotes
|
LL | let x: &str = "ab";
| ^^^^
error: character literal may only contain one codepoint
--> $DIR/lex-bad-char-literals-6.rs:4:19
|
LL | let y: char = 'cd';
| ^^^^
help: if you meant to write a `str` literal, use double quotes
|
LL | let y: char = "cd";
| ^^^^
error: character literal may only contain one codepoint
--> $DIR/lex-bad-char-literals-6.rs:6:13
|
LL | let z = 'ef';
| ^^^^
help: if you meant to write a `str` literal, use double quotes
|
LL | let z = "ef";
| ^^^^
error[E0277]: can't compare `&str` with `char`
--> $DIR/lex-bad-char-literals-6.rs:9:10

View file

@ -0,0 +1,14 @@
// compile-flags: -Z continue-parse-after-error
fn main() {
let _: char = '';
//~^ ERROR: empty character literal
let _: char = '\u{}';
//~^ ERROR: empty unicode escape (must have at least 1 hex digit)
// Next two are OK, but may befool error recovery
let _ = '/';
let _ = b'/';
let _ = ' hello // here's a comment
//~^ ERROR: unterminated character literal
}

View file

@ -0,0 +1,20 @@
error: empty character literal
--> $DIR/lex-bad-char-literals-7.rs:3:20
|
LL | let _: char = '';
| ^
error: empty unicode escape (must have at least 1 hex digit)
--> $DIR/lex-bad-char-literals-7.rs:5:20
|
LL | let _: char = '\u{}';
| ^^^^
error: unterminated character literal
--> $DIR/lex-bad-char-literals-7.rs:12:13
|
LL | let _ = ' hello // here's a comment
| ^^^^^^^^
error: aborting due to 3 previous errors

View file

@ -0,0 +1,10 @@
macro_rules! black_hole {
($($tt:tt)*) => {}
}
fn main() {
black_hole! { '\u{FFFFFF}' }
//~^ ERROR: invalid unicode character escape
black_hole! { "this is surrogate: \u{DAAA}" }
//~^ ERROR: invalid unicode character escape
}

View file

@ -0,0 +1,18 @@
error: invalid unicode character escape
--> $DIR/literals-are-validated-before-expansion.rs:6:20
|
LL | black_hole! { '\u{FFFFFF}' }
| ^^^^^^^^^^
|
= help: unicode escape must be at most 10FFFF
error: invalid unicode character escape
--> $DIR/literals-are-validated-before-expansion.rs:8:39
|
LL | black_hole! { "this is surrogate: \u{DAAA}" }
| ^^^^^^^^
|
= help: unicode escape must not be a surrogate
error: aborting due to 2 previous errors

View file

@ -1,8 +1,8 @@
error: unterminated unicode escape (needed a `}`)
--> $DIR/new-unicode-escapes-1.rs:2:21
--> $DIR/new-unicode-escapes-1.rs:2:14
|
LL | let s = "\u{2603";
| ^
| ^^^^^^^
error: aborting due to previous error

View file

@ -1,8 +1,8 @@
error: overlong unicode escape (must have at most 6 hex digits)
--> $DIR/new-unicode-escapes-2.rs:2:17
--> $DIR/new-unicode-escapes-2.rs:2:14
|
LL | let s = "\u{260311111111}";
| ^^^^^^^^^^^^
| ^^^^^^^^^^^^^^^^
error: aborting due to previous error

View file

@ -1,16 +1,16 @@
error: invalid unicode character escape
--> $DIR/new-unicode-escapes-3.rs:2:14
--> $DIR/new-unicode-escapes-3.rs:2:15
|
LL | let s1 = "\u{d805}";
| ^^^^^^^^^^
| ^^^^^^^^
|
= help: unicode escape must not be a surrogate
error: invalid unicode character escape
--> $DIR/new-unicode-escapes-3.rs:3:14
--> $DIR/new-unicode-escapes-3.rs:3:15
|
LL | let s2 = "\u{ffffff}";
| ^^^^^^^^^^^^
| ^^^^^^^^^^
|
= help: unicode escape must be at most 10FFFF

View file

@ -1,6 +1,5 @@
// run-rustfix
fn main() {
println!("{}", "●●"); //~ ERROR character literal may only contain one codepoint
//~^ ERROR format argument must be a string literal
println!("●●"); //~ ERROR character literal may only contain one codepoint
}

View file

@ -2,5 +2,4 @@
fn main() {
println!(''); //~ ERROR character literal may only contain one codepoint
//~^ ERROR format argument must be a string literal
}

View file

@ -8,15 +8,5 @@ help: if you meant to write a `str` literal, use double quotes
LL | println!("●●");
| ^^^^
error: format argument must be a string literal
--> $DIR/str-as-char.rs:4:14
|
LL | println!('●●');
| ^^^^
help: you might be missing a string literal to format with
|
LL | println!("{}", '●●');
| ^^^^^
error: aborting due to 2 previous errors
error: aborting due to previous error