syntax: Optimize some literal parsing
Currently in the `wasm-bindgen` project we have a very very large crate that's
procedurally generated, `web-sys`. To generate this crate we parse all of a
browser's WebIDL and we then generate bindings for all of the APIs contained
within.
The resulting Rust file is 18MB in size (wow!) and currently takes a very long
time to compile in debug mode. On the nightly compiler a *debug* build takes 90s
to finish. I was curious what was taking so long, and upon investigation a
*massive* portion of the time turned out to be spent in the `lit_token`
method of the compiler, primarily formatting strings via `format!`.
Upon some more investigation it turned out that `byte_str_lit` was allocating an
error message once per byte, causing a very large number of allocations for
large literals, of which wasm-bindgen generates quite a few (some are multiple
megabytes in size).
This commit fixes the issue by lazily allocating the error message, only doing
so if the error message is actually needed (which should be never). As a result,
the debug mode compilation time for our `web-sys` crate decreased from 90s to
20s, a very nice improvement! (although we've still got some work to do).
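To illustrate the shape of the change, here is a simplified sketch (not the actual compiler code: `byte_str_lit` reports diagnostics rather than plain `String`s, and all names below are mine):

```rust
// Simplified illustration of the allocation pattern; not the real
// `byte_str_lit` code, which goes through the diagnostics machinery.

// Before: the error message is formatted for every byte, even though it is
// almost never used, so a multi-megabyte literal allocates a String per byte.
fn check_byte_eager(b: u8, offset: usize) -> Result<u8, String> {
    let msg = format!("invalid byte {} at offset {}", b, offset);
    if b.is_ascii() { Ok(b) } else { Err(msg) }
}

// After: the message is only built on the (rare) error path.
fn check_byte_lazy(b: u8, offset: usize) -> Result<u8, String> {
    if b.is_ascii() {
        Ok(b)
    } else {
        Err(format!("invalid byte {} at offset {}", b, offset))
    }
}
```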
Fixes #51602
r? @estebank
Here I have addressed the case where `in` was not expected right after an `if` block. Regarding type ascription, I am not sure whether the approach I implemented is the best one. I also think one more test case could be added to cover the type ascription case, though I don't have one at this point. I will ping you again once all existing test cases pass.
A few cleanups and minor improvements for the lexer
- improve readability by adjusting the formatting of some function signatures and adding some newlines
- reorder some functions for easier reading
- remove redundant `'static` in `const`s
- remove some explicit `return`s
- read directly to a `String` in `gather_comments_and_literals`
- change `unwrap_or!` (macro) to `unwrap_or` (function)
- move an `assert!`ion from `try_next_token` (called in a loop) to `try_real_token` after all calls to `try_next_token`
- `#[inline]` some one-liner functions
- assign directly from an `if-else` expression
- refactor a `match` to `map_or` (see the sketch after this list)
- add a `token::is_irrelevant` function to detect tokens that are not "`real`"
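As an illustration of the `map_or` item above (a generic sketch in my own words, not the lexer's actual code):

```rust
// Generic example of the shape of a `match`-to-`map_or` refactor; the names
// and logic are invented for illustration and are not taken from the lexer.
fn width_before(prev: Option<char>) -> usize {
    match prev {
        Some(c) => c.len_utf8(),
        None => 0,
    }
}

// Same logic expressed with `map_or`, avoiding the explicit match arms.
fn width_before_refactored(prev: Option<char>) -> usize {
    prev.map_or(0, |c| c.len_utf8())
}
```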
The error and check for this already existed, but the parser didn't try to parse trait method arguments as patterns, so the error was never emitted. This change surfaces the error, giving better diagnostics than plain parse errors.
Replace push loops with extend() where possible
Where that wasn't possible, I set the vector's capacity up front instead.
According to my [simple benchmark](https://gist.github.com/ljedrz/568e97621b749849684c1da71c27dceb) `extend`ing a vector can be over **10 times** faster than `push`ing to it in a loop:
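For reference, a minimal sketch of the two patterns being compared (my own illustration, not the code from the linked gist):

```rust
// Minimal illustration of the benchmarked patterns, not the gist's exact code.
fn squares_push_loop(n: u32) -> Vec<u32> {
    let mut out = Vec::new();
    for i in 0..n {
        // Each push may trigger a reallocation as the vector grows.
        out.push(i * i);
    }
    out
}

fn squares_extend(n: u32) -> Vec<u32> {
    let mut out = Vec::new();
    // `extend` can use the iterator's size hint to reserve capacity up front.
    out.extend((0..n).map(|i| i * i));
    out
}
```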
10 elements (6.1 times faster):
```
test bench_extension ... bench: 75 ns/iter (+/- 23)
test bench_push_loop ... bench: 458 ns/iter (+/- 142)
```
100 elements (11.12 times faster):
```
test bench_extension ... bench: 87 ns/iter (+/- 26)
test bench_push_loop ... bench: 968 ns/iter (+/- 3,528)
```
1000 elements (11.04 times faster):
```
test bench_extension ... bench: 311 ns/iter (+/- 9)
test bench_push_loop ... bench: 3,436 ns/iter (+/- 233)
```
Seems like a good idea to use `extend` as much as possible.
Ever plagued by #43081, the compiler can return surprising spans in situations
related to procedural macros. This is exhibited by #47983, where whenever a
procedural macro is invoked in a nested item context it fails to have
correct span information.
While #43230 provided a "hack" to cache the token stream used for each item in
the compiler, it's not a full-blown solution. This commit extends that "hack" a
bit further to also work for nested items.
Previously in the parser the `parse_item` method would collect the tokens for an
item into a cache on the item itself. It turned out, however, that nested items
were parsed through the `parse_item_` method, so they didn't receive similar
treatment. To remedy this situation the hook for collecting tokens was moved
into `parse_item_` instead of `parse_item`.
Afterwards the token collection scheme was updated to support nested collection
of tokens. This is implemented by tracking `TokenStream` tokens instead of
`TokenTree` to allow for collecting items into streams at intermediate layers
and having them interleaved in the upper layers.
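As a rough picture of what "nested collection" means here (a toy sketch, not rustc's implementation; every name below is invented for illustration):

```rust
// Toy model: one token buffer per nesting level. When a nested item finishes,
// its collected stream is spliced into the enclosing item's buffer, so outer
// items end up with the inner tokens interleaved in the right places.
#[derive(Clone, Debug)]
struct Token(String);

#[derive(Clone, Debug, Default)]
struct TokenStream(Vec<Token>);

#[derive(Default)]
struct Collector {
    stack: Vec<TokenStream>, // last entry = innermost item being parsed
}

impl Collector {
    fn start_item(&mut self) {
        self.stack.push(TokenStream::default());
    }

    fn record(&mut self, tok: Token) {
        if let Some(top) = self.stack.last_mut() {
            top.0.push(tok);
        }
    }

    // Finish the innermost item: return its stream and splice it into the
    // parent's buffer so the parent's collected stream stays complete.
    fn finish_item(&mut self) -> TokenStream {
        let done = self.stack.pop().expect("finish_item without start_item");
        if let Some(parent) = self.stack.last_mut() {
            parent.0.extend(done.0.iter().cloned());
        }
        done
    }
}

fn main() {
    let mut c = Collector::default();
    c.start_item();                  // outer item, e.g. a `mod`
    c.record(Token("mod".into()));
    c.start_item();                  // nested item, e.g. a `fn` inside it
    c.record(Token("fn".into()));
    let inner = c.finish_item();     // the nested item's own stream
    let outer = c.finish_item();     // the outer stream includes nested tokens
    assert_eq!((inner.0.len(), outer.0.len()), (1, 2));
}
```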
All in all, this...
Closes #47983
Prepare proc_macro for decoupling it from the rest of the compiler.
This is #49219 up to the point where the bridge is introduced. Aside from moving some code around, the largest change is the rewrite of `proc_macro::quote` to be simpler and do less introspection.
I'd like to also extend `quote!` with `${stmt;...;expr}` instead of just `$variable` (and maybe even `$(... $iter ...)*`), which seems pretty straightforward now, but I don't know if/when I should.
r? @alexcrichton or @dtolnay cc @jseyfried @petrochenkov