In the common case, the string value in a string literal Token is the
same as the string value in a string literal LitKind. (The exception is
when escapes or \r are involved.) This patch takes advantage of that to
avoid calling str_lit() and re-interning the string in that case. This
speeds up incremental builds for a few of the rustc-benchmarks, the best
by 3%.
Rollup of 12 pull requests
Successful merges:
- #50302 (Add query search order check)
- #50320 (Fix invalid path generation in rustdoc search)
- #50349 (Rename "show type declaration" to "show declaration")
- #50360 (Clarify wordings of the `unstable_name_collision` lint.)
- #50365 (Use two vectors in nearest_common_ancestor.)
- #50393 (Allow unaligned reads in constants)
- #50401 (Revert "Implement FromStr for PathBuf")
- #50406 (Forbid constructing empty identifiers from concat_idents)
- #50407 (Always inline simple BytePos and CharPos methods.)
- #50416 (check if the token is a lifetime before parsing)
- #50417 (Update Cargo)
- #50421 (Fix ICE when using a..=b in a closure.)
Failed merges:
Implement tool_attributes feature (RFC 2103)
cc #44690
This is currently just a rebased and compiling (hopefully) version of #47773.
Let's see if travis likes this. I will add the implementation for `tool_lints` this week.
Discovered in #50061 we're falling off the "happy path" of using a stringified
token stream more often than we should. This was due to the fact that a
user-written token like `0xf` is equality-different from the stringified token
of `15` (despite being semantically equivalent).
This patch updates the call to `eq_unspanned` with an even more awful solution,
`probably_equal_for_proc_macro`, which ignores the value of each token and
basically only compares the structure of the token stream, assuming that the AST
doesn't change just one token at a time.
While this is a step towards fixing #50061 there is still one regression
from #49154 which needs to be fixed.
`char_lit` uses an allocation in order to ignore '_' chars in \u{...}
literals. This patch changes it to not do that by processing the chars
more directly.
This improves various rustc-perf benchmark measurements by up to 6%,
particularly regex, futures, clap, coercions, hyper, and encoding.
proc_macro: Avoid cached TokenStream more often
This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.
This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in #43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (#48644 and #49846) related to using this stale cache.
This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.
Closes#48644Closes#49846
Merge the std_unicode crate into the core crate
[The standard library facade](https://github.com/rust-lang/rust/issues/27783) has historically contained a number of crates with different roles, but that number has decreased over time. `rand` and `libc` have moved to crates.io, and [`collections` was merged into `alloc`](https://github.com/rust-lang/rust/pull/42648). Today we have `core` that applies everywhere, `std` that expects a full operating system, and `alloc` in-between that only requires a memory allocator (which can be provided by users)… and `std_unicode`, which doesn’t really have a reason to be separate anymore. It contains functionality based on Unicode data tables that can be large, but as long as relevant functions are not called the tables should be removed from binaries by linkers.
This deprecates the unstable `std_unicode` crate and moves all of its contents into `core`, replacing them with `pub use` reexports. The crate can be removed later. This also removes the `CharExt` trait (replaced with inherent methods in libcore) and `UnicodeStr` trait (merged into `StrExt`). There traits were both unstable and not intended to be used or named directly.
A number of new items are newly-available in libcore and instantly stable there, but only if they were already stable in libstd.
Fixes#49319.
Use sort_by_cached_key where appropriate
A follow-up to https://github.com/rust-lang/rust/pull/48639, converting various slice sorting calls to `sort_by_cached_key` when the key functions are more expensive.
This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so the way `proc_macro` works today is that it stringifies an AST node and then
reparses for a list of tokens.
This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in #43230 some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (#48644 and #49846) related to using this stale cache.
This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.
Closes#48644Closes#49846
Impressing confused Python users with magical diagnostics is perhaps
worth this not-grossly-unreasonable (only 40ish lines) extra complexity
in the parser?
Thanks to Vadim Petrochenkov for guidance.
This resolves#46836.
Expand macros in `extern {}` blocks
This permits macro and proc-macro and attribute invocations (the latter only with the `proc_macro` feature of course) in `extern {}` blocks, gated behind a new `macros_in_extern` feature.
A tracking issue is now open at #49476closes#48747
Expand Attributes on Statements and Expressions
This enables attribute-macro expansion on statements and expressions while retaining the `stmt_expr_attributes` feature requirement for attributes on expressions.
closes#41475
cc #38356 @petrochenkov @jseyfried
r? @nrc
Fix escaped backslash in windows file not found message
When a module is declared, but no matching file exists, rustc gives
an error like `help: name the file either foo.rs or foo/mod.rs inside
the directory "src/bar"`. However, at on windows, the backslash was
double-escaped when naming the directory.
It did this because the string was printed in debug mode (`"{:?}"`) to
surround it with quotes. However, it should just be printed like any
other directory in an error message and surrounded by escaped quotes,
rather than relying on the debug print to add quotes (`"\"{}\""`).
I also checked the test suite to see if this output is being correctly tested. It's not - it only tests up to the word "directory". Presumably this is so that the test is not dependent on its exact position in the source tree. I don't know a better way to test this, unless the test suite supports regex?
When a module is declared, but no matching file exists, rustc gives
an error like 'help: name the file either foo.rs or foo/mod.rs inside
the directory "src/bar"'. However, at on windows, the backslash was
double-escaped when naming the directory.
It did this because the string was printed in debug mode ( "{:?}" ) to
surround it with quotes. However, it should just be printed like any
other directory in an error message and surrounded by escaped quotes,
rather than relying on the debug print to add quotes ( "\"{}\"" ).
libsyntax: Remove obsolete.rs
This little piece of infra is obsolete (ha-ha) and is unlikely to be used in the future, even if new obsolete syntax appears.