rustdoc: fix & clean up handling of cross-crate higher-ranked parameters
Preparatory work for the refactoring planned in #113015 (for correctness & maintainability).
---
1. Render the higher-ranked parameters of cross-crate function pointer types **(*)**.
2. Replace occurrences of `collect_referenced_late_bound_regions()` (CRLBR) with `bound_vars()`.
The former is quite problematic and the use of the latter allows us to yank a lot of hacky code **(†)**
as you can tell from the diff! :)
3. Add support for cross-crate higher-ranked types (`#![feature(non_lifetime_binders)]`).
We were previously ICE'ing on them (see `inline_cross/non_lifetime_binders.rs`).
---
**(*)**: Extracted from test `inline_cross/fn-type.rs`:
```diff
- fn(_: &'z fn(_: &'b str), _: &'a ()) -> &'a ()
+ for<'z, 'a, '_unused> fn(_: &'z for<'b> fn(_: &'b str), _: &'a ()) -> &'a ()
```
**(†)**: It returns an `FxHashSet` which isn't *predictable* or *stable* wrt. source code (`.rmeta`) changes. To elaborate, the ordering of late-bound regions doesn't necessarily reflect the ordering found in the source code. It does seem to be stable across compilations but modifying the source code of the to-be-documented crates (like adding or renaming items) may result in a different order:
<details><summary>Example</summary>
Let's assume that we're documenting the cross-crate re-export of `produce` from the code below. On `master`, rustdoc would render the list of binders as `for<'x, 'y, 'z>`. However, once you add back the functions `a`–`l`, it would be rendered as `for<'z, 'y, 'x>` (reverse order)! Results may vary. `bound_vars()` fixes this as it returns them in source order.
```rs
// pub fn a() {}
// pub fn b() {}
// pub fn c() {}
// pub fn d() {}
// pub fn e() {}
// pub fn f() {}
// pub fn g() {}
// pub fn h() {}
// pub fn i() {}
// pub fn j() {}
// pub fn k() {}
// pub fn l() {}
pub fn produce() -> impl for<'x, 'y, 'z> Trait<'z, 'y, 'x> {}
pub trait Trait<'a, 'b, 'c> {}
impl Trait<'_, '_, '_> for () {}
```
</details>
Further, as the name suggests, CRLBR only collects *referenced* regions and thus we drop unused binders. `bound_vars()` contains unused binders on the other hand. Let's stay closer to the source where possible and keep unused binders.
Lastly, using `bound_vars()` allows us to get rid of
* the deduplication and alphabetical sorting hack in `simplify.rs`
* the weird field `bound_params` on `EqPredicate`
both of which were introduced by me in #102707 back when I didn't know better.
To illustrate, let's look at the cross-crate bound `T: for<'a, 'b> Trait<A<'a> = (), B<'b> = ()>`.
* With CRLBR + `EqPredicate.bound_params`, *before* bounds simplification we would have the bounds `T: Trait`, `for<'a> <T as Trait>::A<'a> == ()` and `for<'b> <T as Trait>::B<'b> == ()` which required us to merge `for<>`, `for<'a>` and `for<'b>` into `for<'a, 'b>` in a deterministic manner and without introducing duplicate binders.
* With `bound_vars()`, we now have the bounds `for<'a, b> T: Trait`, `<T as Trait>::A<'a> == ()` and `<T as Trait>::B<'b> == ()` before bound simplification similar to rustc itself. This obviously no longer requires any funny merging of `for<>`s. On top of that `for<'a, 'b>` is guaranteed to be in source order.
rustdoc: speed up processing of cross-crate fns to fix a perf regression
* The first commit doesn't affect perf but get's rid of a `.clone()` and a bunch of lines of code. I can drop it if you'd like me to
* The second commit, *“reduce the amount of `asyncness` query executions”*, addresses the perf regression introduced in #116084
r? `@ghost`
rename mir::Constant -> mir::ConstOperand, mir::ConstKind -> mir::Const
Also, be more consistent with the `to/eval_bits` methods... we had some that take a type and some that take a size, and then sometimes the one that takes a type is called `bits_for_ty`.
Turns out that `ty::Const`/`mir::ConstKind` carry their type with them, so we don't need to even pass the type to those `eval_bits` functions at all.
However this is not properly consistent yet: in `ty` we have most of the methods on `ty::Const`, but in `mir` we have them on `mir::ConstKind`. And indeed those two types are the ones that correspond to each other. So `mir::ConstantKind` should actually be renamed to `mir::Const`. But what to do with `mir::Constant`? It carries around a span, that's really more like a constant operand that appears as a MIR operand... it's more suited for `syntax.rs` than `consts.rs`, but the bigger question is, which name should it get if we want to align the `mir` and `ty` types? `ConstOperand`? `ConstOp`? `Literal`? It's not a literal but it has a field called `literal` so it would at least be consistently wrong-ish...
``@oli-obk`` any ideas?
Pretty-print argument-position impl trait to name it.
This removes a corner case.
RPIT and TAIT keep having no name, and it would be wrong to use the one in HIR (Ident::empty), so I make this case ICE.
Custom code classes in docs warning
Fixes https://github.com/rust-lang/rust/issues/115938.
This PR does two things:
1. Unless the `custom_code_classes_in_docs` feature is enabled, it will use the old codeblock tag parser.
2. If there is a codeblock tag that starts with a `.`, it will emit a behaviour change warning.
Hopefully this is the last missing part for this feature until stabilization.
Follow-up of https://github.com/rust-lang/rust/pull/110800.
r? `@notriddle`
move things out of mir/mod.rs
This moves a bunch of things out of `mir/mod.rs`:
- all const-related stuff to a new file consts.rs
- all statement/place/operand-related stuff to a new file statement.rs
- all pretty-printing related stuff to pretty.rs
`mod.rs` started out with 3100 lines and ends up with 1600. :)
Also there was some pretty-printing stuff in terminator.rs, that also got moved to pretty.rs, and I reordered things in pretty.rs so that it can be grouped by functionality.
Only the commit "use pretty_print_const_value from MIR constant 'extra' printing" has any behavior changes; it resolves the issue of having a fancy and a very crude pretty-printer for `ConstValue`.
r? `@oli-obk`
rustdoc-search: add support for type parameters
r? `@GuillaumeGomez`
## Preview
* https://notriddle.com/rustdoc-html-demo-4/advanced-search/rustdoc/read-documentation/search.html
* https://notriddle.com/rustdoc-html-demo-4/advanced-search/std/index.html?search=option%3Coption%3CT%3E%3E%20-%3E%20option%3CT%3E
* https://notriddle.com/rustdoc-html-demo-4/advanced-search/std/index.html?search=option%3CT%3E,%20E%20-%3E%20result%3CT,%20E%3E
* https://notriddle.com/rustdoc-html-demo-4/advanced-search/std/index.html?search=-%3E%20option%3CT%3E
## Description
When writing a type-driven search query in rustdoc, specifically one with more than one query element, non-existent types become generic parameters instead of auto-correcting (which is currently only done for single-element queries) or giving no result. You can also force a generic type parameter by writing `generic:T` (and can force it to not use a generic type parameter with something like `struct:T` or whatever, though if this happens it means the thing you're looking for doesn't exist and will give you no results).
There is no syntax provided for specifying type constraints for generic type parameters.
When you have a generic type parameter in a search query, it will only match up with generic type parameters in the actual function, not concrete types that match, not concrete types that implement a trait. It also strictly matches based on when they're the same or different, so `option<T>, option<U> -> option<U>` matches `Option::and`, but not `Option::or`. Similarly, `option<T>, option<T> -> option<T>` matches `Option::or`, but not `Option::and`.
## Motivation
This feature is motivated by the many "combinitor"-type functions found in generic libraries, such as Option, Future, Iterator, and Entry. These highly-generic functions have names that are almost completely arbitrary, and a type signature that tells you what it actually does.
This PR is a major step towards[^closure] being able to easily search for generic functions by their type signature instead of by name. Some examples of combinators that can be found using this PR (try them out in the preview):
* `option<option<T>> -> option<T>` returns Option::flatten
* `option<T> -> result<T>` returns Option::ok_or
* `option<result<T>> -> result<option<T>>` returns Option::transpose
* `entry<K, V>, FnOnce -> V` returns `Entry::or_insert_with` (and `or_insert_with_key`, since there's no way to specify the generics on FnOnce)
[^closure]:
For this feature to be as useful as it ought to be, you should be able to search for *trait-associated types* and *closures*. This PR does not implement either of these: they are **Future possibilities**.
Trait-associated types would allow queries like `option<T> -> iterator<item=T>` to return `Option::iter`. We should also allow `option<T> -> iterator<T>` to match the associated type version.
Closures would make a good way to query for things like `Option::map`. Closure support needs associated types to be represented in the search index, since `FnOnce() -> i32` desugars to `FnOnce<Output=i32, ()>`, so associated trait types should be implemented first. Also, we'd want to expose an easy way to query closures without specifying which of the three traits you want.
Accept additional user-defined syntax classes in fenced code blocks
Part of #79483.
This is a re-opening of https://github.com/rust-lang/rust/pull/79454 after a big update/cleanup. I also converted the syntax to pandoc as suggested by `@notriddle:` the idea is to be as compatible as possible with the existing instead of having our own syntax.
## Motivation
From the original issue: https://github.com/rust-lang/rust/issues/78917
> The technique used by `inline-c-rs` can be ported to other languages. It's just super fun to see C code inside Rust documentation that is also tested by `cargo doc`. I'm sure this technique can be used by other languages in the future.
Having custom CSS classes for syntax highlighting will allow tools like `highlight.js` to be used in order to provide highlighting for languages other than Rust while not increasing technical burden on rustdoc.
## What is the feature about?
In short, this PR changes two things, both related to codeblocks in doc comments in Rust documentation:
* Allow to disable generation of `language-*` CSS classes with the `custom` attribute.
* Add your own CSS classes to a code block so that you can use other tools to highlight them.
#### The `custom` attribute
Let's start with the new `custom` attribute: it will disable the generation of the `language-*` CSS class on the generated HTML code block. For example:
```rust
/// ```custom,c
/// int main(void) {
/// return 0;
/// }
/// ```
```
The generated HTML code block will not have `class="language-c"` because the `custom` attribute has been set. The `custom` attribute becomes especially useful with the other thing added by this feature: adding your own CSS classes.
#### Adding your own CSS classes
The second part of this feature is to allow users to add CSS classes themselves so that they can then add a JS library which will do it (like `highlight.js` or `prism.js`), allowing to support highlighting for other languages than Rust without increasing burden on rustdoc. To disable the automatic `language-*` CSS class generation, you need to use the `custom` attribute as well.
This allow users to write the following:
```rust
/// Some code block with `{class=language-c}` as the language string.
///
/// ```custom,{class=language-c}
/// int main(void) {
/// return 0;
/// }
/// ```
fn main() {}
```
This will notably produce the following HTML:
```html
<pre class="language-c">
int main(void) {
return 0;
}</pre>
```
Instead of:
```html
<pre class="rust rust-example-rendered">
<span class="ident">int</span> <span class="ident">main</span>(<span class="ident">void</span>) {
<span class="kw">return</span> <span class="number">0</span>;
}
</pre>
```
To be noted, we could have written `{.language-c}` to achieve the same result. `.` and `class=` have the same effect.
One last syntax point: content between parens (`(like this)`) is now considered as comment and is not taken into account at all.
In addition to this, I added an `unknown` field into `LangString` (the parsed code block "attribute") because of cases like this:
```rust
/// ```custom,class:language-c
/// main;
/// ```
pub fn foo() {}
```
Without this `unknown` field, it would generate in the DOM: `<pre class="language-class:language-c language-c">`, which is quite bad. So instead, it now stores all unknown tags into the `unknown` field and use the first one as "language". So in this case, since there is no unknown tag, it'll simply generate `<pre class="language-c">`. I added tests to cover this.
Finally, I added a parser for the codeblock attributes to make it much easier to maintain. It'll be pretty easy to extend.
As to why this syntax for adding attributes was picked: it's [Pandoc's syntax](https://pandoc.org/MANUAL.html#extension-fenced_code_attributes). Even if it seems clunkier in some cases, it's extensible, and most third-party Markdown renderers are smart enough to ignore Pandoc's brace-delimited attributes (from [this comment](https://github.com/rust-lang/rust/pull/110800#issuecomment-1522044456)).
## Raised concerns
#### It's not obvious when the `language-*` attribute generation will be added or not.
It is added by default. If you want to disable it, you will need to use the `custom` attribute.
#### Why not using HTML in markdown directly then?
Code examples in most languages are likely to contain `<`, `>`, `&` and `"` characters. These characters [require escaping](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre) when written inside the `<pre>` element. Using the \`\`\` code blocks allows rustdoc to take care of escaping, which means doc authors can paste code samples directly without manually converting them to HTML.
cc `@poliorcetics`
r? `@notriddle`