rustc_errors: Add (heuristic) Syntax Highlighting for `rustc --explain`
This PR adds a feature that enables `rustc --explain <error>` to have syntax highlighted code blocks. Due to performance, size and complexity constraints, the highlighter is very heuristc, relying on conventions for capitalizations and such to infer what an identifier represents. The details for the implementation are specified below.
# Changes
1. Change `term::entrypoint` to `term::entrypoint_with_formatter`, which takes an optional third argument, which is a function pointer to a formatter. ([compiler/rustc_errors/src/markdown/mod.rs](https://github.com/rust-lang/rust/compare/main...JayanAXHF:rust:rustc_colored_explain?expand=1#diff-a6e139cadbc2e6922d816eb08f9e2c7b48304d09e6588227e2b70215c4f0725c))
2. Change `MdStream::write_anstream_buf` to be a wrapper around a new function, `MdStream::write_anstream_buf_with_formatter`, which takes a function pointer to a formatter. ([compiler/rustc_errors/src/markdown/mod.rs](https://github.com/rust-lang/rust/compare/main...JayanAXHF:rust:rustc_colored_explain?expand=1#diff-a6e139cadbc2e6922d816eb08f9e2c7b48304d09e6588227e2b70215c4f0725c))
3. Change [`compiler/rustc_driver_impl/src/lib.rs`](https://github.com/rust-lang/rust/compare/main...JayanAXHF:rust:rustc_colored_explain?expand=1#diff-39877a2556ea309c89384956740d5892a59cef024aa9473cce16bbdd99287937) to call `MdStream::write_anstream_buf_with_formatter` instead of `MdStream::write_anstream_buf`.
4. Add a `compiler/rustc_driver_impl/src/highlighter.rs` file, which contains the actual syntax highlighter.
# Implementation Details
1. The highlighter starts from the `highlight` function defined in `compiler/rustc_driver_impl/src/highlighter.rs`. It creates a new instance of the `Highlighter` struct, and calls its `highlight_rustc_lexer` function to start highlighting.
2. The `highlight_rustc_lexer` function uses `rustc_lexer` to lex the code into `Token`s. `rustc_lexer` was chosen since it preserves the newlines after scanning.
3. Based on the kind of token (`TokenKind`), we color the corresponding lexeme.
## Highlighter Implementation
### Identifiers
1. All identifiers that match a (non-exhaustive and minimal) list of keywords are coloured magenta.
2. An identifier that begins with a capital letter is assumed as a type. There is no distinction between a `Trait` and a type, since that would involve name resolution, and the parts of `rustc` that perform name resolution on code do not preserve the original formatting. (An attempt to use `rustc_parse`'s lexer and `TokenStream` was made, which was then printed with the pretty printer, but failed to preserve the formatting and was generally more complex to work with)
3. An identifier that is immediately followed by a parenthesis is recognized as a function identifier, and coloured blue.
## Literals
5. A `String` literal (or its corresponding `Raw`, `C` and `Byte` versions) is colored green.
6. All other literals are colored bright red (orange-esque)
## Everything Else
Everything else is colored bright white and dimmed, to create a grayish colour.
---
# Demo
<img width="1864" height="2136" alt="image" src="https://github.com/user-attachments/assets/b17d3a71-e641-4457-be85-5e5b1cea2954" />
<caption> Command: <code>rustc --explain E0520</code> </caption>
---
This description was not generated by an LLM (:p)
cc: @bjorn3
This commit adds a heuristics-based syntax highlighter for the `rustc
--explain` command. It uses `rsutc_lexer`'s lexer to parse input in
tokens, and matches on them to determine their color.
Allow invoking all help options at once
Implements rust-lang/rust#150442
Help messages are printed in the order they are invoked, but only once (e.g. `--help -C help --help` prints regular help then codegen help).
Lint groups (`-Whelp`) are always printed last due to necessarily coming later in the compiler pipeline.
Adds `help` flags to `-C` and `-Z` to avoid an error in `build_session_options`.
cc @bjorn3 [#t-compiler/major changes > Allow combining `--help -C help -Z help -… compiler-team#954 @ 💬](https://rust-lang.zulipchat.com/#narrow/channel/233931-t-compiler.2Fmajor-changes/topic/Allow.20combining.20.60--help.20-C.20help.20-Z.20help.20-.E2.80.A6.20compiler-team.23954/near/564358190) - this currently maintains the behaviour of unrecognized options always failing-fast, as this happens within `getopt`s parsing, before we can even check if help options are present.
Make `--print=check-cfg` output compatible `--check-cfg` arguments
This PR changes significantly the output of the unstable `--print=check-cfg` option.
Specifically it removes the ad-hoc resemblance with `--print=cfg` in order to output a simplified but still compatible `--check-cfg` arguments.
The goal is to future proof the output of `--print=check-cfg` like `--check-cfg` is, and the best way to do that is to use it's syntax.
This is particularly relevant for [RFC3905](https://github.com/rust-lang/rfcs/pull/3905) which wants to introduce a new predicate: `version(...)`.
Destabilise `target-spec-json`
Per rust-lang/compiler-team#944:
> Per https://github.com/rust-lang/rust/issues/71009, the ability to load target spec JSONs was stabilised accidentally. Within the team, we've always considered the format to be unstable and have changed it freely. This has been feasible as custom targets can only be used with core, like any other target, and so custom targets de-facto require nightly to be used (i.e. to build core manually or use Cargo's -Zbuild-std).
>
> Current build-std RFCs (https://github.com/rust-lang/rfcs/pull/3873, https://github.com/rust-lang/rfcs/pull/3874) propose a mechanism for building core on stable (at the request of Rust for Linux), which combined with a stable target-spec-json format, permit the current format to be used much more widely on stable toolchains. This would prevent us from improving the format - making it less tied to LLVM, switching to TOML, enabling keys in the spec to be stabilised individually, etc.
>
> De-stabilising the format gives us the opportunity to improve the format before it is too challenging to do so. Internal company toolchains and projects like Rust for Linux already use target-spec-json, but must use nightly at some point while doing so, so while it could be inconvenient for those users to destabilise this, it is hoped that an minimal alternative that we could choose to stabilise can be proposed relatively quickly.
This commit refactors `SourceMap` and most importantly `RealFileName` to
make it self-contained in order to achieve cross-compiler consistency.
This is achieved:
- by making `RealFileName` immutable
- by only having `SourceMap::to_real_filename` create `RealFileName`
- by also making `RealFileName` holds it's working directory,
it's embeddable name and the remapped scopes
- by making most `FileName` and `RealFileName` methods take a scope as
an argument
In order for `SourceMap::to_real_filename` to know which scopes to apply
`FilePathMapping` now takes the current remapping scopes to apply, which
makes `FileNameDisplayPreference` and company useless and are removed.
The scopes type `RemapPathScopeComponents` was moved from
`rustc_session::config` to `rustc_span`.
The previous system for scoping the local/remapped filenames
`RemapFileNameExt::for_scope` is no longer useful as it's replaced by
methods on `FileName` and `RealFileName`.
Using a defaulted `CodegenBackend` method that querying for zstd support should
automatically print a safe value of `false` on any backend that doesn't
specifically indicate the presence or absence of zstd.
Tests for `-Zdebuginfo-compression=zstd` need to be skipped if LLVM was built
without support for zstd compression.
Currently, compiletest relies on messy and fragile heuristics to detect whether
the compiler's LLVM was built with zstd support. But the compiler itself
already knows whether LLVM has zstd or not, so it's easier for compiletest to
just ask the compiler.
refactor: Move to anstream + anstyle for styling
`rustc` uses [`termcolor`](https://crates.io/crates/termcolor) for styling and writing, while `annotate-snippets` uses [`anstyle`](https://crates.io/crates/anstyle) for styling and currently writes directly to a `String`. When rendering directly to a terminal, there isn't/shouldn't be any differences. Still, there are differences in the escape sequences, which leads to slightly different output in JSON and SVG tests. As part of my work to have `rustc` use `annotate-snippets`, and to reduce the test differences between the two, I switched `rustc` to use `anstlye` and [`anstream`](https://crates.io/crates/anstream) for styling and writing.
The first commit migrates to `anstyle` and `anstream` and notably does not change the output. This is because it includes extra formatting to ensure that `anstyle` + `anstream` match the current output exactly. Most of this code is unnecessary, as it adds redundant resets or uses 256-color (8-bit) when it could be using 4-bit color. The subsequent commits remove this extra formatting while maintaining the correct output when rendered.
[Zulip discussion](https://rust-lang.zulipchat.com/#narrow/channel/147480-t-compiler.2Fdiagnostics/topic/annotate-snippets.20hurdles)
This makes it clearer that it is set by the build system rather than by
the rustc that compiles the current rustc. It also avoids bootstrap
needing to pass --check-cfg llvm_enzyme to rustc.
This schema is helpful for people writing custom target spec JSON. It
can provide autocomplete in the editor, and also serves as documentation
when there are documentation comments on the structs, as `schemars` will
put them in the schema.
a more general version of https://github.com/rust-lang/rust/pull/146080.
after a bit of hacking in [`fluent.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_fluent_macro/src/fluent.rs), i discovered that i'm not the only one that is bad at following guidelines 😅. this pr lowercases the first letter of all the error messages in the codebase.
(i did not change things that are traditionally uppercased such as _MIR_, _ABI_ or _C_)
i think it's reasonable to run a `@bors try` so all the test suite is checked, as i cannot run some of the tests on my machine. i double checked (and replaced manually) all the old error messages, but better be safe than sorry.
in the future i will try to add a check in `x test tidy` that errors if an error message starts with an uppercase letter.
This was done in #145740 and #145947. It is causing problems for people
using r-a on anything that uses the rustc-dev rustup package, e.g. Miri,
clippy.
This repository has lots of submodules and subtrees and various
different projects are carved out of pieces of it. It seems like
`[workspace.dependencies]` will just be more trouble than it's worth.
Emit warning when there is no space between `-o` and arg
Closesrust-lang/rust#142812
`getopt` doesn't seem to have an API to check this, so we have to check the args manually.
r? compiler
Refactor Translator
My main motivation was to simplify the usage of `SilentEmitter` for users like rustfmt. A few refactoring opportunities arose along the way.
* Replace `Translate` trait with `Translator` struct
* Replace `Emitter: Translate` with `Emitter::translator`
* Split `SilentEmitter` into `FatalOnlyEmitter` and `SilentEmitter`
Reduce uses of `hir_crate`.
I tried rebasing my old incremental-HIR branch. This is a by-product, which is required if we want to get rid of `hir_crate` entirely.
The second commit is a drive-by cleanup. It can be pulled into its own PR.
r? ````@oli-obk````