Commit graph

161 commits

Author SHA1 Message Date
Nicholas Nethercote
9df1576e1d Rename ParseSess::span_diagnostic as ParseSess::dcx. 2023-12-18 16:06:21 +11:00
Nicholas Nethercote
cde19c016e Rename Handler as DiagCtxt. 2023-12-18 16:06:19 +11:00
bors
3ad8e2d129 Auto merge of #118897 - nnethercote:more-unescaping-cleanups, r=fee1-dead
More unescaping cleanups

More minor improvements I found while working on #118699.

r? `@fee1-dead`
2023-12-16 08:52:06 +00:00
Nicholas Nethercote
e3b7ecc1ef Remove one use of span_bug_no_panic.
It's unclear why this is used here. All entries in the third column of
`UNICODE_ARRAY` are covered by `ASCII_ARRAY`, so if the lookup fails
it's a genuine compiler bug. It was added way back in #29837, for no
clear reason.

This commit changes it to `span_bug`, which is more typical.
2023-12-14 15:53:55 +11:00
Nicholas Nethercote
423bf4233d Rename the span args to emit_unescape_error.
The `span` arg is described in a comment as "interior span of the
literal, without quotes", which is incorrect. It's actually the span of
the error part of the literal, corresponding to `range`.

This commit renames `span` and `span_without_quotes` to make things
clearer, and fixes the erroneous comment.
2023-12-13 10:05:57 +11:00
Nicholas Nethercote
4cfdbd328b Add spacing information to delimiters.
This is an extension of the previous commit. It means the output of
something like this:
```
stringify!(let a: Vec<u32> = vec![];)
```
goes from this:
```
let a: Vec<u32> = vec![] ;
```
With this PR, it now produces this string:
```
let a: Vec<u32> = vec![];
```
2023-12-11 09:36:40 +11:00
Nicholas Nethercote
925f7fad57 Improve print_tts by changing tokenstream::Spacing.
`tokenstream::Spacing` appears on all `TokenTree::Token` instances,
both punct and non-punct. Its current usage:
- `Joint` means "can join with the next token *and* that token is a
  punct".
- `Alone` means "cannot join with the next token *or* can join with the
  next token but that token is not a punct".

The fact that `Alone` is used for two different cases is awkward.
This commit augments `tokenstream::Spacing` with a new variant
`JointHidden`, resulting in:
- `Joint` means "can join with the next token *and* that token is a
  punct".
- `JointHidden` means "can join with the next token *and* that token is a
  not a punct".
- `Alone` means "cannot join with the next token".

This *drastically* improves the output of `print_tts`. For example,
this:
```
stringify!(let a: Vec<u32> = vec![];)
```
currently produces this string:
```
let a : Vec < u32 > = vec! [] ;
```
With this PR, it now produces this string:
```
let a: Vec<u32> = vec![] ;
```
(The space after the `]` is because `TokenTree::Delimited` currently
doesn't have spacing information. The subsequent commit fixes this.)

The new `print_tts` doesn't replicate original code perfectly. E.g.
multiple space characters will be condensed into a single space
character. But it's much improved.

`print_tts` still produces the old, uglier output for code produced by
proc macros. Because we have to translate the generated code from
`proc_macro::Spacing` to the more expressive `token::Spacing`, which
results in too much `proc_macro::Along` usage and no
`proc_macro::JointHidden` usage. So `space_between` still exists and
is used by `print_tts` in conjunction with the `Spacing` field.

This change will also help with the removal of `Token::Interpolated`.
Currently interpolated tokens are pretty-printed nicely via AST pretty
printing. `Token::Interpolated` removal will mean they get printed with
`print_tts`. Without this change, that would result in much uglier
output for code produced by decl macro expansions. With this change, AST
pretty printing and `print_tts` produce similar results.

The commit also tweaks the comments on `proc_macro::Spacing`. In
particular, it refers to "compound tokens" rather than "multi-char
operators" because lifetimes aren't operators.
2023-12-11 09:19:09 +11:00
bors
63d16b5a98 Auto merge of #117472 - jmillikin:stable-c-str-literals, r=Nilstrieb
Stabilize C string literals

RFC: https://rust-lang.github.io/rfcs/3348-c-str-literal.html

Tracking issue: https://github.com/rust-lang/rust/issues/105723

Documentation PR (reference manual): https://github.com/rust-lang/reference/pull/1423

# Stabilization report

Stabilizes C string and raw C string literals (`c"..."` and `cr#"..."#`), which are expressions of type [`&CStr`](https://doc.rust-lang.org/stable/core/ffi/struct.CStr.html). Both new literals require Rust edition 2021 or later.

```rust
const HELLO: &core::ffi::CStr = c"Hello, world!";
```

C strings may contain any byte other than `NUL` (`b'\x00'`), and their in-memory representation is guaranteed to end with `NUL`.

## Implementation

Originally implemented by PR https://github.com/rust-lang/rust/pull/108801, which was reverted due to unintentional changes to lexer behavior in Rust editions < 2021.

The current implementation landed in PR https://github.com/rust-lang/rust/pull/113476, which restricts C string literals to Rust edition >= 2021.

## Resolutions to open questions from the RFC

* Adding C character literals (`c'.'`) of type `c_char` is not part of this feature.
  * Support for `c"..."` literals does not prevent `c'.'` literals from being added in the future.
* C string literals should not be blocked on making `&CStr` a thin pointer.
  * It's possible to declare constant expressions of type `&'static CStr` in stable Rust (as of v1.59), so C string literals are not adding additional coupling on the internal representation of `CStr`.
* The unstable `concat_bytes!` macro should not accept `c"..."` literals.
  * C strings have two equally valid `&[u8]` representations (with or without terminal `NUL`), so allowing them to be used in `concat_bytes!` would be ambiguous.
* Adding a type to represent C strings containing valid UTF-8 is not part of this feature.
  * Support for a hypothetical `&Utf8CStr` may be explored in the future, should such a type be added to Rust.
2023-12-01 13:33:55 +00:00
Nilstrieb
21a870515b Fix clippy::needless_borrow in the compiler
`x clippy compiler -Aclippy::all -Wclippy::needless_borrow --fix`.

Then I had to remove a few unnecessary parens and muts that were exposed
now.
2023-11-21 20:13:40 +01:00
sjwang05
f88cf0206f
Move unclosed delim errors to separate function 2023-11-11 13:39:08 -08:00
sjwang05
a49368f00b
Correctly handle while-let-chains 2023-11-10 12:13:53 -08:00
sjwang05
9455259450
Catch an edge case 2023-11-09 20:07:17 -08:00
sjwang05
0094238157
Catch stray { in let-chains 2023-11-09 18:47:49 -08:00
John Millikin
0f41bc21b9 Stabilize C string literals 2023-11-01 09:16:34 +09:00
Esteban Küber
50ca5ef07f When encountering unclosed delimiters during parsing, check for diff markers
Fix #116252.
2023-10-30 00:56:46 +00:00
Michael Goulet
b2d2184ede Format all the let chains in compiler 2023-10-13 08:59:36 +00:00
Nicholas Nethercote
bb9c2f50c3 Reorder an expression to improve readability. 2023-10-12 08:46:15 +11:00
Nicholas Nethercote
becf4942a2 Rename Token::is_op as Token::is_punct.
For consistency with `proc_macro::Punct`.
2023-10-12 08:46:15 +11:00
beetrees
072d8c8bbc
Fix suggestion for attempting to define a string with single quotes 2023-08-16 21:51:57 +01:00
bjorn3
ef2da4a49b Remove reached_eof from ParseSess
It was only ever set in a function which isn't called anywhere.
2023-08-13 13:33:37 +00:00
Matthias Krüger
23815467a2 inline format!() args up to and including rustc_middle 2023-07-30 13:18:33 +02:00
bors
23405bb123 Auto merge of #113476 - fee1-dead-contrib:c-str-lit, r=petrochenkov
Reimplement C-str literals

This reverts #113334, cc `@fmease.`

While converting lexer tokens to ast Tokens in `rustc_parse`, we check the edition of the span of the token. If the edition < 2021, we split the token into two, one being the identifier and other being the str literal.
2023-07-25 12:04:34 +00:00
Deadbeef
a0376e9ec2 extract common code 2023-07-25 09:24:12 +00:00
Matthias Krüger
ed4c5fef72 fix some clippy::style findings
comparison_to_empty
iter_nth_zero
for_kv_map
manual_next_back
redundant_pattern
2023-07-23 23:36:56 +02:00
Deadbeef
df9bd80d74 reimplement C string literals 2023-07-23 06:54:07 +00:00
Hankai Zhang
6336da9a75 Use a better link 2023-06-10 14:46:11 -04:00
Hankai Zhang
e5fccf927d Update links to Rust Reference page on literals in diagnostic
Instead of linking to the old Rust Reference site on static.rust-lang.org,
link to the current website doc.rust-lang.org/stable/reference instead in
diagnostic about incorrect literals.
2023-06-10 12:34:16 -04:00
Nicholas Nethercote
01e33a3600 Avoid &format("...") calls in error message code.
Error message all end up passing into a function as an `impl
Into<{D,Subd}iagnosticMessage>`. If an error message is creatd as
`&format("...")` that means we allocate a string (in the `format!`
call), then take a reference, and then clone (allocating again) the
reference to produce the `{D,Subd}iagnosticMessage`, which is silly.

This commit removes the leading `&` from a lot of these cases. This
means the original `String` is moved into the
`{D,Subd}iagnosticMessage`, avoiding the double allocations. This
requires changing some function argument types from `&str` to `String`
(when all arguments are `String`) or `impl
Into<{D,Subd}iagnosticMessage>` (when some arguments are `String` and
some are `&str`).
2023-05-16 17:59:56 +10:00
Dylan DPC
4891f02cff
Rollup merge of #108801 - fee1-dead-contrib:c-str, r=compiler-errors
Implement RFC 3348, `c"foo"` literals

RFC: https://github.com/rust-lang/rfcs/pull/3348
Tracking issue: #105723
2023-05-05 18:40:33 +05:30
Nicholas Nethercote
6b62f37402 Restrict From<S> for {D,Subd}iagnosticMessage.
Currently a `{D,Subd}iagnosticMessage` can be created from any type that
impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static,
str>`, which are reasonable. It also includes `&String`, which is pretty
weird, and results in many places making unnecessary allocations for
patterns like this:
```
self.fatal(&format!(...))
```
This creates a string with `format!`, takes a reference, passes the
reference to `fatal`, which does an `into()`, which clones the
reference, doing a second allocation. Two allocations for a single
string, bleh.

This commit changes the `From` impls so that you can only create a
`{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static,
str>`. This requires changing all the places that currently create one
from a `&String`. Most of these are of the `&format!(...)` form
described above; each one removes an unnecessary static `&`, plus an
allocation when executed. There are also a few places where the existing
use of `&String` was more reasonable; these now just use `clone()` at
the call site.

As well as making the code nicer and more efficient, this is a step
towards possibly using `Cow<'static, str>` in
`{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing
the `From<&'a str>` impls to `From<&'static str>`, which is doable, but
I'm not yet sure if it's worthwhile.
2023-05-03 08:44:39 +10:00
Deadbeef
d30c668175 make cook generic 2023-05-02 10:32:08 +00:00
Deadbeef
abb181dfd9 make it semantic error 2023-05-02 10:32:08 +00:00
Deadbeef
4c01d494b8 refactor unescape 2023-05-02 10:32:07 +00:00
Deadbeef
8ff3903643 initial step towards implementing C string literals 2023-05-02 10:30:09 +00:00
clubby789
0138513635 Fix static string lints 2023-04-25 18:59:55 +01:00
Josh Soref
e09d0d2a29 Spelling - compiler
* account
* achieved
* advising
* always
* ambiguous
* analysis
* annotations
* appropriate
* build
* candidates
* cascading
* category
* character
* clarification
* compound
* conceptually
* constituent
* consts
* convenience
* corresponds
* debruijn
* debug
* debugable
* debuggable
* deterministic
* discriminant
* display
* documentation
* doesn't
* ellipsis
* erroneous
* evaluability
* evaluate
* evaluation
* explicitly
* fallible
* fulfill
* getting
* has
* highlighting
* illustrative
* imported
* incompatible
* infringing
* initialized
* into
* intrinsic
* introduced
* javascript
* liveness
* metadata
* monomorphization
* nonexistent
* nontrivial
* obligation
* obligations
* offset
* opaque
* opportunities
* opt-in
* outlive
* overlapping
* paragraph
* parentheses
* poisson
* precisely
* predecessors
* predicates
* preexisting
* propagated
* really
* reentrant
* referent
* responsibility
* rustonomicon
* shortcircuit
* simplifiable
* simplifications
* specify
* stabilized
* structurally
* suggestibility
* translatable
* transmuting
* two
* unclosed
* uninhabited
* visibility
* volatile
* workaround

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2023-04-17 16:09:18 -04:00
bors
9693b178fc Auto merge of #110252 - matthiaskrgr:rollup-ovaixra, r=matthiaskrgr
Rollup of 8 pull requests

Successful merges:

 - #109810 (Replace rustdoc-ui/{c,z}-help tests with a stable run-make test )
 - #110035 (fix: ensure bad `#[test]` invocs retain correct AST)
 - #110089 (sync::mpsc: synchronize receiver disconnect with initialization)
 - #110103 (Report overflows gracefully with new solver)
 - #110122 (Fix x check --stage 1 when download-ci-llvm=false)
 - #110133 (Do not use ImplDerivedObligationCause for inherent impl method error reporting)
 - #110135 (Revert "Don't recover lifetimes/labels containing emojis as character literals")
 - #110235 (Fix `--extend-css` option)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
2023-04-12 22:19:29 +00:00
Matthias Krüger
57393be6fb
Rollup merge of #110135 - compiler-errors:revert-108031, r=davidtwco
Revert "Don't recover lifetimes/labels containing emojis as character literals"

Reverts PR #108031 per https://github.com/rust-lang/rust/pull/109754#issuecomment-1490452045

Fixes (doesnt close until beta backported) #109746

This reverts commit e3f9db5fc3.
This reverts commit 98b82aedba.
This reverts commit 380fa26413.
2023-04-12 22:04:35 +02:00
DaniPopes
677357d32b
Fix typos in compiler 2023-04-10 22:02:52 +02:00
Michael Goulet
a047064d6b Revert "Don't recover lifetimes/labels containing emojis as character literals"
Reverts PR #108031
Fixes (doesnt close until beta backported) #109746

This reverts commit e3f9db5fc3.
This reverts commit 98b82aedba.
This reverts commit 380fa26413.
2023-04-10 06:52:41 +00:00
Nilstrieb
4b4948c2e3 Remove identity casts 2023-04-09 23:22:14 +02:00
Nilstrieb
81c320ea77 Fix some clippy::complexity 2023-04-09 23:22:14 +02:00
Oli Scherer
7edd1d8799 Replace another lock with an append-only vec 2023-04-04 09:01:44 +00:00
Maybe Waffle
775bacd1b8 Simplify sort_by calls 2023-03-07 18:13:41 +00:00
yukang
f808877bbf refactor parse_token_trees to not return unmatched_delims 2023-02-28 07:57:17 +00:00
yukang
9ce7472db4 rename unmatched_braces to unmatched_delims 2023-02-28 07:57:17 +00:00
yukang
65ad5f8de7 remove duplicated diagnostic for unclosed delimiter 2023-02-28 07:57:17 +00:00
许杰友 Jieyou Xu (Joe)
380fa26413
Don't recover lifetimes/labels containing emojis as character literals
Note that at the time of this commit, `unic-emoji-char` seems to have
data tables only up to Unicode 5.0, but Unicode is already newer than
this.

A newer emoji such as `🥺` will not be recognized as an emoji
but older emojis such as `🐱` will.
2023-02-14 17:31:58 +08:00
clubby789
521c5f36d6 Migrate rustc_parse to derive diagnostics 2023-02-06 14:40:35 +00:00
Matthias Krüger
e3048c7838
Rollup merge of #104012 - chenyukang:yukang/fix-103882-deli-indentation, r=petrochenkov
Improve unexpected close and mismatch delimiter hint in TokenTreesReader

Fixes #103882
Fixes #68987
Fixes #69259

The inner indentation mismatching will be covered by outer block, the new added function `report_error_prone_delim_block` will find out the error prone candidates for reporting.
2023-01-28 11:11:05 +01:00