move unescape module to rustc_lexer
It makes sense to keep the definition of escape sequences closer to the lexer itself, and it is also a bit of code that I would like to share with rust-analyzer.
r? @petrochenkov
The idea here is to make a reusable library out of the existing
rust-lexer, by separating out pure lexing and rustc-specific concerns,
like spans, error reporting an interning.
So, rustc_lexer operates directly on `&str`, produces simple tokens
which are a pair of type-tag and a bit of original text, and does not
report errors, instead storing them as flags on the token.
Remove support for 1-token lookahead from the lexer
`StringReader` maintained `peek_token` and `peek_span_src_raw` for look ahead.
`peek_token` was used only by rustdoc syntax coloring. After moving peeking logic into highlighter, I was able to remove `peek_token` from the lexer. I tried to use `iter::Peekable`, but that wasn't as pretty as I hoped, due to buffered fatal errors. So I went with hand-rolled peeking.
After that I've noticed that the only peeking behavior left was for raw tokens to test tt jointness. I've rewritten it in terms of trivia tokens, and not just spans.
After that it became possible to simplify the awkward constructor of the lexer, which could return `Err` if the first peeked token contained error.
The reader itself doesn't need ability to peek tokens, so it's better
if clients implement this functionality.
This hopefully becomes especially easy once we use iterator interface
for lexer, but this is not too easy at the moment, because of buffered
errors.
Allow attributes in formal function parameters
Implements https://github.com/rust-lang/rust/issues/60406.
This is my first contribution to the compiler and since this is a large and complex project, I am not fully aware of the consequences of the changes I have made.
**TODO**
- [x] Forbid some built-in attributes.
- [x] Expand cfg/cfg_attr
lexer: Disallow bare CR in raw byte strings
Handles bare CR ~but doesn't translate `\r\n` to `\n` yet in raw strings yet~ and translates CRLF to LF in raw strings.
As a side-note I think it'd be good to change the `unescape_` to return plain iterators to reduce some boilerplate (e.g. `has_error` could benefit from collecting `Result<T>` and aborting early on errors) but will do that separately, unless I missed something here that prevents it.
@matklad @petrochenkov thoughts?
It was commented out as part of
8a8e497ae7.
Done probably by accident, since the code in question was moved to a
match arm, along with newly introduced logic to detect bare CRs in raw
strings.