Rearrange symbol-mangling chapter out of codegen-options.

Eric Huss 2022-06-01 14:01:22 -07:00
parent 5d61dcaad5
commit d5d4619e98
4 changed files with 81 additions and 81 deletions

@ -3,7 +3,6 @@
- [What is rustc?](what-is-rustc.md)
- [Command-line Arguments](command-line-arguments.md)
- [Codegen Options](codegen-options/index.md)
- [Symbol Mangling](codegen-options/symbol-mangling.md)
- [Lints](lints/index.md)
- [Lint Levels](lints/levels.md)
- [Lint Groups](lints/groups.md)
@ -53,4 +52,6 @@
- [Instrumentation-based Code Coverage](instrument-coverage.md)
- [Linker-plugin-based LTO](linker-plugin-lto.md)
- [Exploit Mitigations](exploit-mitigations.md)
- [Symbol Mangling](symbol-mangling/index.md)
- [v0 Symbol Format](symbol-mangling/v0.md)
- [Contributing to `rustc`](contributing.md)

@ -577,7 +577,7 @@ change in the future.
See the [Symbol Mangling] chapter for details on symbol mangling and the mangling format.
[name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[Symbol Mangling]: symbol-mangling.md
[Symbol Mangling]: ../symbol-mangling/index.md
## target-cpu

@ -0,0 +1,52 @@
# Symbol Mangling
[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation.
The encoded names are used by the linker to associate the name with the thing it refers to.
The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option.
[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[`-C symbol-mangling-version`]: ../codegen-options/index.md#symbol-mangling-version
## Per-item control
The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item.
The [`#[export_name]` attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static.
Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item.
The [`#[link_name]` attribute][reference-link_name] can be used to change that name.
<!--
FIXME: This is incomplete for wasm, per https://github.com/rust-lang/rust/blob/d4c364347ce65cf083d4419195b8232440928d4d/compiler/rustc_symbol_mangling/src/lib.rs#L191-L210
-->
[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute
[reference-export_name]: ../../reference/abi.html#the-export_name-attribute
[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute
[reference-extern-block]: ../../reference/items/external-blocks.html
## Decoding
The encoded names may need to be decoded in some situations.
For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user.
Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers.
In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names.
[`rustfilt`] is a CLI tool which can demangle names.
An example of running rustfilt:
```text
$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function
foo::example_function
```
[`rustc-demangle`]: https://crates.io/crates/rustc-demangle
[`rustfilt`]: https://crates.io/crates/rustfilt
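
As a rough illustration of what a demangler does with the `rustfilt` example above, the following sketch decodes only a small subset of the v0 grammar described in the v0 Symbol Format chapter: crate roots (`C`), nested paths (`N`) with a lowercase namespace, `s` disambiguators (skipped rather than printed), and plain ASCII identifiers. `Parser` and `demangle_subset` are hypothetical names for this sketch; they are not part of `rustc-demangle` or `rustfilt`, which should be used for real symbols.

```rust
// Hypothetical sketch: demangles only a tiny subset of the v0 grammar.
// Anything outside that subset (Punycode, closures, impls, generic args,
// backrefs, instantiating crates, ...) yields `None`.
struct Parser<'a> {
    sym: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<u8> {
        self.sym.get(self.pos).copied()
    }

    fn next(&mut self) -> Option<u8> {
        let b = self.peek()?;
        self.pos += 1;
        Some(b)
    }

    fn eat(&mut self, b: u8) -> bool {
        if self.peek() == Some(b) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    // base-62-number: characters from [0-9a-zA-Z] terminated by `_`.
    // The value is not needed because this sketch never prints disambiguators.
    fn skip_base62(&mut self) -> Option<()> {
        loop {
            match self.next()? {
                b'_' => return Some(()),
                b if b.is_ascii_alphanumeric() => continue,
                _ => return None,
            }
        }
    }

    // decimal-number: `0`, or a non-zero digit followed by more digits.
    fn decimal(&mut self) -> Option<usize> {
        if self.eat(b'0') {
            return Some(0);
        }
        let start = self.pos;
        while matches!(self.peek(), Some(b'0'..=b'9')) {
            self.pos += 1;
        }
        std::str::from_utf8(&self.sym[start..self.pos]).ok()?.parse().ok()
    }

    // identifier: optional `s` disambiguator, then the byte length,
    // an optional `_` separator, and the identifier bytes themselves.
    fn identifier(&mut self) -> Option<&'a str> {
        if self.eat(b's') {
            self.skip_base62()?;
        }
        if self.peek() == Some(b'u') {
            return None; // Punycode identifiers are not handled here.
        }
        let len = self.decimal()?;
        self.eat(b'_');
        let sym = self.sym;
        let bytes = sym.get(self.pos..self.pos.checked_add(len)?)?;
        self.pos += len;
        std::str::from_utf8(bytes).ok()
    }

    // path: only `C` (crate root) and `N` (nested path) are understood.
    fn path(&mut self, out: &mut String) -> Option<()> {
        match self.next()? {
            b'C' => out.push_str(self.identifier()?),
            b'N' => {
                let namespace = self.next()?;
                if !namespace.is_ascii_lowercase() {
                    return None; // e.g. closures need `{closure#N}` rendering.
                }
                self.path(out)?;
                let ident = self.identifier()?;
                out.push_str("::");
                out.push_str(ident);
            }
            _ => return None,
        }
        Some(())
    }
}

fn demangle_subset(symbol: &str) -> Option<String> {
    let rest = symbol.strip_prefix("_R")?;
    let mut parser = Parser { sym: rest.as_bytes(), pos: 0 };
    let mut out = String::new();
    parser.path(&mut out)?;
    // Reject trailing data (such as an instantiating crate) instead of ignoring it.
    if parser.pos != parser.sym.len() {
        return None;
    }
    Some(out)
}
```

Like `rustfilt` in the example above, this sketch drops the crate disambiguator (`skwGfYPst2Cb_`) from the output, so `_RNvCskwGfYPst2Cb_3foo16example_function` becomes `foo::example_function`.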
## Mangling versions
`rustc` supports different mangling versions which encode the names in different ways.
The legacy version (which is currently the default) is not described here.
The "v0" mangling scheme addresses several limitations of the legacy format,
and is described in the [v0 Symbol Format](v0.md) chapter.

@ -1,57 +1,4 @@
# Symbol Mangling
[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation.
The encoded names are used by the linker to associate the name with the thing it refers to.
The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option.
[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling
[`-C symbol-mangling-version`]: index.md#symbol-mangling-version
## Per-item control
The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item.
The [`#[export_name]`attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static.
Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item.
The [`#[link_name]` attribute][reference-link_name] can be used to change that name.
<!--
FIXME: This is incomplete for wasm, per https://github.com/rust-lang/rust/blob/d4c364347ce65cf083d4419195b8232440928d4d/compiler/rustc_symbol_mangling/src/lib.rs#L191-L210
-->
[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute
[reference-export_name]: ../../reference/abi.html#the-export_name-attribute
[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute
[reference-extern-block]: ../../reference/items/external-blocks.html
## Decoding
The encoded names may need to be decoded in some situations.
For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user.
Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers.
In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names.
[`rustfilt`] is a CLI tool which can demangle names.
An example of running rustfilt:
```text
$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function
foo::example_function
```
[`rustc-demangle`]: https://crates.io/crates/rustc-demangle
[`rustfilt`]: https://crates.io/crates/rustfilt
## Mangling versions
`rustc` supports different mangling versions which encode the names in different ways.
The legacy version (which is currently the default) is not described here.
The "v0" mangling scheme addresses several limitations of the legacy format,
and is [described below](#v0-mangling-format).
## v0 mangling format
# v0 Symbol Format
The v0 mangling format was introduced in [RFC 2603].
It has the following properties:
@ -78,7 +25,7 @@ There is no standardized demangled form of the symbols,
though suggestions are provided for how to demangle a symbol.
Implementers may choose to demangle in different ways.
### Grammar notation
## Grammar notation
The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax.
A consolidated summary can be found in the [Symbol grammar summary][summary].
@ -93,7 +40,7 @@ A consolidated summary can be found in the [Symbol grammar summary][summary].
| Option | <sub>opt</sub> | <nobr>A → *B*<sub>opt</sub> *C*</nobr> | An optional element. |
| Literal | `monospace` | <nobr>A → `G`</nobr> | A terminal matching the exact characters case-sensitive. |
### Symbol name
## Symbol name
[symbol-name]: #symbol-name
> symbol-name → `_R` *[decimal-number]*<sub>opt</sub> *[path]* *[instantiating-crate]*<sub>opt</sub> *[vendor-specific-suffix]*<sub>opt</sub>
@ -128,7 +75,7 @@ The final part is an optional *[vendor-specific-suffix]*.
>
> Recommended demangling: `<std::path::PathBuf>::new`
### Symbol path
## Symbol path
[path]: #symbol-path
> path → \
@ -156,7 +103,7 @@ The initial tag character can be used to determine which kind of path it represe
| `I` | *[generic-args]* | Generic arguments. |
| `B` | *[backref]* | A back reference. |
#### Path: Crate root
### Path: Crate root
[crate-root]: #path-crate-root
> crate-root → `C` *[identifier]*
@ -196,7 +143,7 @@ the *[disambiguator]* is used to make the name unique across the crate graph.
>
> Recommended demangling: `mycrate::example`
#### Path: Inherent impl
### Path: Inherent impl
[inherent-impl]: #path-inherent-impl
> inherent-impl → `M` *[impl-path]* *[type]*
@ -230,7 +177,7 @@ It consists of the character `M` followed by an *[impl-path]* to the impl's pare
>
> Recommended demangling: `<mycrate::Example>::foo`
#### Path: Trait impl
### Path: Trait impl
[trait-impl]: #path-trait-impl
> trait-impl → `X` *[impl-path]* *[type]* *[path]*
@ -268,7 +215,7 @@ It consists of the character `X` followed by an *[impl-path]* to the impl's pare
>
> Recommended demangling: `<mycrate::Example as mycrate::Trait>::foo`
#### Path: Impl
### Path: Impl
[impl-path]: #path-impl
> impl-path → *[disambiguator]*<sub>opt</sub> *[path]*
@ -316,7 +263,7 @@ The *[disambiguator]* can be used to distinguish between multiple impls within t
> * `foo`: `<mycrate::Example>::foo`
> * `bar`: `<mycrate::Example>::bar`
#### Path: Trait definition
### Path: Trait definition
[trait-definition]: #path-trait-definition
> trait-definition → `Y` *[type]* *[path]*
@ -350,7 +297,7 @@ It consists of the character `Y` followed by the *[type]* which is the `Self` ty
>
> Recommended demangling: `<mycrate::Example as mycrate::Trait>::example`
#### Path: Nested path
### Path: Nested path
[nested-path]: #path-nested-path
> nested-path → `N` *[namespace]* *[path]* *[identifier]*
@ -415,7 +362,7 @@ For example, entities like closures, tuple-like struct constructors, and anonymo
> * `x`: `mycrate::main::{closure#0}`
> * `y`: `mycrate::main::{closure#1}`
#### Path: Generic arguments
### Path: Generic arguments
[generic-args]: #path-generic-arguments
[generic-arg]: #path-generic-arguments
@ -462,7 +409,7 @@ Each *[generic-arg]* is either a *[lifetime]* (starting with the character `L`),
>
> Recommended demangling: `mycrate::example::<i32, 1>`
#### Namespace
### Namespace
[namespace]: #namespace
> namespace → *[lower]* | *[upper]*
@ -482,7 +429,7 @@ Uppercase namespaces are:
>
> See *[nested-path]* for recommended demangling.
### Identifier
## Identifier
[identifier]: #identifier
[undisambiguated-identifier]: #identifier
[bytes]: #identifier
@ -515,7 +462,7 @@ The `_` is mandatory if the *bytes* starts with a decimal digit or `_` in order
>
> The *[disambiguator]* may or may not be displayed; see recommendations for rules that use *identifier*.
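
A minimal sketch of the ASCII case of the identifier rule in this section, assuming the name needs no Punycode: the byte length as a decimal number, then a `_` separator only when the bytes begin with a decimal digit or `_`, then the bytes themselves. `encode_ascii_identifier` is a hypothetical helper, not a `rustc` API, and it leaves out the optional *[disambiguator]*.

```rust
// Hypothetical helper: encodes an ASCII `undisambiguated-identifier`.
// Returns `None` for names that would need the Punycode (`u`) form.
fn encode_ascii_identifier(ident: &str) -> Option<String> {
    if !ident.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_') {
        return None;
    }
    // The `_` separator is mandatory when the bytes start with a digit or `_`,
    // so that a decoder can tell where the decimal length ends.
    let needs_separator = matches!(ident.bytes().next(), Some(b'0'..=b'9') | Some(b'_'));
    let mut out = ident.len().to_string();
    if needs_separator {
        out.push('_');
    }
    out.push_str(ident);
    Some(out)
}
```

For example, `example_function` encodes as `16example_function`, matching the identifier in the `rustfilt` example, while `_x` needs the separator and encodes as `2__x`.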
#### Punycode identifiers
### Punycode identifiers
[Punycode identifiers]: #punycode-identifiers
Because some environments are restricted to ASCII alphanumerics and `_`,
@ -565,7 +512,7 @@ Here are some examples:
[Punycode]: https://tools.ietf.org/html/rfc3492
### Disambiguator
## Disambiguator
[disambiguator]: #disambiguator
> disambiguator → `s` *[base-62-number]*
@ -582,7 +529,7 @@ This allows disambiguators that are encoded sequentially to use minimal bytes.
>
> The *disambiguator* may or may not be displayed; see recommendations for rules that use *disambiguator*.
### Lifetime
## Lifetime
[lifetime]: #lifetime
> lifetime → `L` *[base-62-number]*
@ -632,7 +579,7 @@ Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime
>
> Recommended demangling: `mycrate::example::<for<'a, 'b> fn(&'a u8, &'b u16)>`
### Const
## Const
[const]: #const
[const-data]: #const
[hex-digit]: #const
@ -695,7 +642,7 @@ The encoding of the *const-data* depends on the type:
> Recommended demangling: `mycrate::example::<305419896>`
### Type
## Type
[type]: #type
[basic-type]: #basic-type
[array-type]: #array-type
@ -881,7 +828,7 @@ The type encodings based on the initial tag character are:
>
> Recommended demangling: `mycrate::example::<[u16; 8]>`
### Binder
## Binder
[binder]: #binder
> binder → `G` *[base-62-number]*
@ -903,7 +850,7 @@ For example, in `for<'a, 'b> fn(for<'c> fn (...))`, any <em>[lifetime]</em>s in
> A *binder* may be printed using `for<…>` syntax listing the lifetimes as recommended in *[lifetime]*.
> See *[lifetime]* for an example.
### Backref
## Backref
[backref]: #backref
> backref → `B` *[base-62-number]*
@ -954,7 +901,7 @@ This is ensured by not allowing optional or repeating elements at the end of sub
>
> Recommended demangling: `mycrate::example::<mycrate::Example, mycrate::Example>`
### Instantiating crate
## Instantiating crate
[instantiating-crate]: #instantiating-crate
> instantiating-crate → *[path]*
@ -988,7 +935,7 @@ so it is usually encoded as a *[backref]* to the *[crate-root]* encoded elsewher
>
> Recommended demangling: `<std::path::Path>::new::<str>`
### Vendor-specific suffix
## Vendor-specific suffix
[vendor-specific-suffix]: #vendor-specific-suffix
[suffix]: #vendor-specific-suffix
@ -1030,7 +977,7 @@ the suffixed name has the same semantics as the original.
>
> Recommended demangling: `mycrate::EXAMPLE::__getit::__KEY`
### Common rules
## Common rules
[decimal-number]: #common-rules
[digit]: #common-rules
[lower]: #common-rules
@ -1054,7 +1001,7 @@ A *digit* is an ASCII number.
A *lower* and *upper* is an ASCII lower and uppercase letter respectively.
### base-62-number
## base-62-number
[base-62-number]: #base-62-number
> [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_`
@ -1088,7 +1035,7 @@ Examples:
| 63 | `10_` |
| 1000 | `g7_` |
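
The examples in this table can be reproduced with a short sketch. It assumes the v0 rule that 0 is encoded as a lone `_`, and any other value is encoded as the value minus one, written in base 62 with the digits `0`-`9`, `a`-`z`, `A`-`Z` (in that order), followed by `_`. `encode_base62` and `decode_base62` are hypothetical helper names used only for illustration.

```rust
// Digit alphabet for base-62-number: 0-9, then a-z, then A-Z.
const DIGITS: &[u8; 62] = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encodes `value` as a base-62-number, including the trailing `_`.
fn encode_base62(value: u64) -> String {
    if value == 0 {
        return "_".to_string();
    }
    let mut n = value - 1;
    let mut digits = Vec::new();
    loop {
        digits.push(DIGITS[(n % 62) as usize]);
        n /= 62;
        if n == 0 {
            break;
        }
    }
    digits.reverse(); // Most significant digit first.
    let mut s = String::from_utf8(digits).expect("digits are ASCII");
    s.push('_');
    s
}

// Decodes a base-62-number (with its trailing `_`); `None` on malformed input.
fn decode_base62(s: &str) -> Option<u64> {
    let body = s.strip_suffix('_')?;
    if body.is_empty() {
        return Some(0);
    }
    let mut n: u64 = 0;
    for b in body.bytes() {
        let d = DIGITS.iter().position(|&c| c == b)? as u64;
        n = n.checked_mul(62)?.checked_add(d)?;
    }
    n.checked_add(1)
}
```

Shifting by one means `0_` decodes to 1 instead of duplicating the lone `_` encoding of 0, so every value has a single shortest form.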
### Symbol grammar summary
## Symbol grammar summary
[summary]: #symbol-grammar-summary
The following is a summary of all of the productions of the symbol grammar.
@ -1189,7 +1136,7 @@ The following is a summary of all of the productions of the symbol grammar.
> [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` \
> [upper] → `A` | `B` | `C` | `D` | `E` | `F` | `G` | `H` | `I` | `J` | `K` | `L` | `M` | `N` | `O` | `P` | `Q` | `R` | `S` | `T` | `U` | `V` | `W` | `X` | `Y` | `Z`
### Encoding of Rust entities
## Encoding of Rust entities
The following are guidelines for how Rust entities are encoded in a symbol.
The compiler has some latitude in how an entity is encoded as long as the symbol is unambiguous.