From d5d4619e98a459fa8018c906e9f0c0352231ce17 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Wed, 1 Jun 2022 14:01:22 -0700 Subject: [PATCH] Rearrange symbol-mangling chapter out of codegen-options. --- src/doc/rustc/src/SUMMARY.md | 3 +- src/doc/rustc/src/codegen-options/index.md | 2 +- src/doc/rustc/src/symbol-mangling/index.md | 52 +++++++++ .../v0.md} | 105 +++++------------- 4 files changed, 81 insertions(+), 81 deletions(-) create mode 100644 src/doc/rustc/src/symbol-mangling/index.md rename src/doc/rustc/src/{codegen-options/symbol-mangling.md => symbol-mangling/v0.md} (94%) diff --git a/src/doc/rustc/src/SUMMARY.md b/src/doc/rustc/src/SUMMARY.md index b06f62f89166..e5ad2f1bf592 100644 --- a/src/doc/rustc/src/SUMMARY.md +++ b/src/doc/rustc/src/SUMMARY.md @@ -3,7 +3,6 @@ - [What is rustc?](what-is-rustc.md) - [Command-line Arguments](command-line-arguments.md) - [Codegen Options](codegen-options/index.md) - - [Symbol Mangling](codegen-options/symbol-mangling.md) - [Lints](lints/index.md) - [Lint Levels](lints/levels.md) - [Lint Groups](lints/groups.md) @@ -53,4 +52,6 @@ - [Instrumentation-based Code Coverage](instrument-coverage.md) - [Linker-plugin-based LTO](linker-plugin-lto.md) - [Exploit Mitigations](exploit-mitigations.md) +- [Symbol Mangling](symbol-mangling/index.md) + - [v0 Symbol Format](symbol-mangling/v0.md) - [Contributing to `rustc`](contributing.md) diff --git a/src/doc/rustc/src/codegen-options/index.md b/src/doc/rustc/src/codegen-options/index.md index f77851cdec2d..38666ba726bd 100644 --- a/src/doc/rustc/src/codegen-options/index.md +++ b/src/doc/rustc/src/codegen-options/index.md @@ -577,7 +577,7 @@ change in the future. See the [Symbol Mangling] chapter for details on symbol mangling and the mangling format. 
[name mangling]: https://en.wikipedia.org/wiki/Name_mangling -[Symbol Mangling]: symbol-mangling.md +[Symbol Mangling]: ../symbol-mangling/index.md ## target-cpu diff --git a/src/doc/rustc/src/symbol-mangling/index.md b/src/doc/rustc/src/symbol-mangling/index.md new file mode 100644 index 000000000000..be58f2b41b8d --- /dev/null +++ b/src/doc/rustc/src/symbol-mangling/index.md @@ -0,0 +1,52 @@ +# Symbol Mangling + +[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation. +The encoded names are used by the linker to associate the name with the thing it refers to. + +The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option. + +[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling +[`-C symbol-mangling-version`]: ../codegen-options/index.md#symbol-mangling-version + +## Per-item control + +The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item. + +The [`#[export_name]`attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static. + +Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item. +The [`#[link_name]` attribute][reference-link_name] can be used to change that name. + + + +[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute +[reference-export_name]: ../../reference/abi.html#the-export_name-attribute +[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute +[reference-extern-block]: ../../reference/items/external-blocks.html + +## Decoding + +The encoded names may need to be decoded in some situations. +For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user. +Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers. 
+In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names. +[`rustfilt`] is a CLI tool which can demangle names. + +An example of running rustfilt: + +```text +$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function +foo::example_function +``` + +[`rustc-demangle`]: https://crates.io/crates/rustc-demangle +[`rustfilt`]: https://crates.io/crates/rustfilt + +## Mangling versions + +`rustc` supports different mangling versions which encode the names in different ways. +The legacy version (which is currently the default) is not described here. +The "v0" mangling scheme addresses several limitations of the legacy format, +and is described in the [v0 Symbol Format](v0.md) chapter. diff --git a/src/doc/rustc/src/codegen-options/symbol-mangling.md b/src/doc/rustc/src/symbol-mangling/v0.md similarity index 94% rename from src/doc/rustc/src/codegen-options/symbol-mangling.md rename to src/doc/rustc/src/symbol-mangling/v0.md index 1f7c4b805e3d..408e6b1244a3 100644 --- a/src/doc/rustc/src/codegen-options/symbol-mangling.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -1,57 +1,4 @@ -# Symbol Mangling - -[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation. -The encoded names are used by the linker to associate the name with the thing it refers to. - -The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option. - -[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling -[`-C symbol-mangling-version`]: index.md#symbol-mangling-version - -## Per-item control - -The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item. - -The [`#[export_name]`attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static. 
- -Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item. -The [`#[link_name]` attribute][reference-link_name] can be used to change that name. - - - -[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute -[reference-export_name]: ../../reference/abi.html#the-export_name-attribute -[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute -[reference-extern-block]: ../../reference/items/external-blocks.html - -## Decoding - -The encoded names may need to be decoded in some situations. -For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user. -Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers. -In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names. -[`rustfilt`] is a CLI tool which can demangle names. - -An example of running rustfilt: - -```text -$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function -foo::example_function -``` - -[`rustc-demangle`]: https://crates.io/crates/rustc-demangle -[`rustfilt`]: https://crates.io/crates/rustfilt - -## Mangling versions - -`rustc` supports different mangling versions which encode the names in different ways. -The legacy version (which is currently the default) is not described here. -The "v0" mangling scheme addresses several limitations of the legacy format, -and is [described below](#v0-mangling-format). - -## v0 mangling format +# v0 Symbol Format The v0 mangling format was introduced in [RFC 2603]. It has the following properties: @@ -78,7 +25,7 @@ There is no standardized demangled form of the symbols, though suggestions are provided for how to demangle a symbol. Implementers may choose to demangle in different ways. 
-### Grammar notation +## Grammar notation The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax. A consolidated summary can be found in the [Symbol grammar summary][summary]. @@ -93,7 +40,7 @@ A consolidated summary can be found in the [Symbol grammar summary][summary]. | Option | opt | A → *B*opt *C* | An optional element. | | Literal | `monospace` | A → `G` | A terminal matching the exact characters case-sensitive. | -### Symbol name +## Symbol name [symbol-name]: #symbol-name > symbol-name → `_R` *[decimal-number]*opt *[path]* *[instantiating-crate]*opt *[vendor-specific-suffix]*opt @@ -128,7 +75,7 @@ The final part is an optional *[vendor-specific-suffix]*. > > Recommended demangling: `::new` -### Symbol path +## Symbol path [path]: #symbol-path > path → \ @@ -156,7 +103,7 @@ The initial tag character can be used to determine which kind of path it represe | `I` | *[generic-args]* | Generic arguments. | | `B` | *[backref]* | A back reference. | -#### Path: Crate root +### Path: Crate root [crate-root]: #path-crate-root > crate-root → `C` *[identifier]* @@ -196,7 +143,7 @@ the *[disambiguator]* is used to make the name unique across the crate graph. 
> > Recommended demangling: `mycrate::example` -#### Path: Inherent impl +### Path: Inherent impl [inherent-impl]: #path-inherent-impl > inherent-impl → `M` *[impl-path]* *[type]* @@ -230,7 +177,7 @@ It consists of the character `M` followed by an *[impl-path]* to the impl's pare > > Recommended demangling: `::foo` -#### Path: Trait impl +### Path: Trait impl [trait-impl]: #path-trait-impl > trait-impl → `X` *[impl-path]* *[type]* *[path]* @@ -268,7 +215,7 @@ It consists of the character `X` followed by an *[impl-path]* to the impl's pare > > Recommended demangling: `::foo` -#### Path: Impl +### Path: Impl [impl-path]: #path-impl > impl-path → *[disambiguator]*opt *[path]* @@ -316,7 +263,7 @@ The *[disambiguator]* can be used to distinguish between multiple impls within t > * `foo`: `::foo` > * `bar`: `::bar` -#### Path: Trait definition +### Path: Trait definition [trait-definition]: #path-trait-definition > trait-definition → `Y` *[type]* *[path]* @@ -350,7 +297,7 @@ It consists of the character `Y` followed by the *[type]* which is the `Self` ty > > Recommended demangling: `::example` -#### Path: Nested path +### Path: Nested path [nested-path]: #path-nested-path > nested-path → `N` *[namespace]* *[path]* *[identifier]* @@ -415,7 +362,7 @@ For example, entities like closures, tuple-like struct constructors, and anonymo > * `x`: `mycrate::main::{closure#0}` > * `y`: `mycrate::main::{closure#1}` -#### Path: Generic arguments +### Path: Generic arguments [generic-args]: #path-generic-arguments [generic-arg]: #path-generic-arguments @@ -462,7 +409,7 @@ Each *[generic-arg]* is either a *[lifetime]* (starting with the character `L`), > > Recommended demangling: `mycrate::example::` -#### Namespace +### Namespace [namespace]: #namespace > namespace → *[lower]* | *[upper]* @@ -482,7 +429,7 @@ Uppercase namespaces are: > > See *[nested-path]* for recommended demangling. 
-### Identifier +## Identifier [identifier]: #identifier [undisambiguated-identifier]: #identifier [bytes]: #identifier @@ -515,7 +462,7 @@ The `_` is mandatory if the *bytes* starts with a decimal digit or `_` in order > > The *[disambiguator]* may or may not be displayed; see recommendations for rules that use *identifier*. -#### Punycode identifiers +### Punycode identifiers [Punycode identifiers]: #punycode-identifiers Because some environments are restricted to ASCII alphanumerics and `_`, @@ -565,7 +512,7 @@ Here are some examples: [Punycode]: https://tools.ietf.org/html/rfc3492 -### Disambiguator +## Disambiguator [disambiguator]: #disambiguator > disambiguator → `s` *[base-62-number]* @@ -582,7 +529,7 @@ This allows disambiguators that are encoded sequentially to use minimal bytes. > > The *disambiguator* may or may not be displayed; see recommendations for rules that use *disambiguator*. -### Lifetime +## Lifetime [lifetime]: #lifetime > lifetime → `L` *[base-62-number]* @@ -632,7 +579,7 @@ Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime > > Recommended demangling: `mycrate::example:: fn(&'a u8, &'b u16)>` -### Const +## Const [const]: #const [const-data]: #const [hex-digit]: #const @@ -695,7 +642,7 @@ The encoding of the *const-data* depends on the type: > Recommended demangling: `mycrate::example::<305419896>` -### Type +## Type [type]: #type [basic-type]: #basic-type [array-type]: #array-type @@ -881,7 +828,7 @@ The type encodings based on the initial tag character are: > > Recommended demangling: `mycrate::example::<[u16; 8]>` -### Binder +## Binder [binder]: #binder > binder → `G` *[base-62-number]* @@ -903,7 +850,7 @@ For example, in `for<'a, 'b> fn(for<'c> fn (...))`, any [lifetime]s in > A *binder* may be printed using `for<…>` syntax listing the lifetimes as recommended in *[lifetime]*. > See *[lifetime]* for an example. 
-### Backref +## Backref [backref]: #backref > backref → `B` *[base-62-number]* @@ -954,7 +901,7 @@ This is ensured by not allowing optional or repeating elements at the end of sub > > Recommended demangling: `mycrate::example::` -### Instantiating crate +## Instantiating crate [instantiating-crate]: #instantiating-crate > instantiating-crate → *[path]* @@ -988,7 +935,7 @@ so it is usually encoded as a *[backref]* to the *[crate-root]* encoded elsewher > > Recommended demangling: `::new::` -### Vendor-specific suffix +## Vendor-specific suffix [vendor-specific-suffix]: #vendor-specific-suffix [suffix]: #vendor-specific-suffix @@ -1030,7 +977,7 @@ the suffixed name has the same semantics as the original. > > Recommended demangling: `mycrate::EXAMPLE::__getit::__KEY` -### Common rules +## Common rules [decimal-number]: #common-rules [digit]: #common-rules [lower]: #common-rules @@ -1054,7 +1001,7 @@ A *digit* is an ASCII number. A *lower* and *upper* is an ASCII lower and uppercase letter respectively. -### base-62-number +## base-62-number [base-62-number]: #base-62-number > [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_` @@ -1088,7 +1035,7 @@ Examples: | 63 | `10_` | | 1000 | `g7_` | -### Symbol grammar summary +## Symbol grammar summary [summary]: #symbol-grammar-summary The following is a summary of all of the productions of the symbol grammar. @@ -1189,7 +1136,7 @@ The following is a summary of all of the productions of the symbol grammar. > [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` \ > [upper] → `A` | `B` | `C` | `D` | `E` | `F` | `G` | `H` | `I` | `J` | `K` | `L` | `M` | `N` | `O` | `P` | `Q` | `R` | `S` | `T` | `U` | `V` | `W` | `X` | `Y` | `Z` -### Encoding of Rust entities +## Encoding of Rust entities The following are guidelines for how Rust entities are encoded in a symbol. 
The compiler has some latitude in how an entity is encoded as long as the symbol is unambiguous.
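
Reviewer note on the relocated *base-62-number* section: its examples table (63 → `10_`, 1000 → `g7_`) can be cross-checked with a small sketch. This is not code from `rustc`. The function name `encode_base62` is hypothetical, and the sketch assumes the rule as given in RFC 2603: zero is a bare `_`, and any other value `n` is written as `n - 1` in base 62 using the digit order `0-9`, `a-z`, `A-Z`, followed by `_`.

```rust
/// Hypothetical helper (not part of rustc): encodes `value` as a v0
/// base-62-number, assuming the RFC 2603 rule described above.
fn encode_base62(value: u64) -> String {
    // Digit alphabet in ascending order: 0-9, then a-z, then A-Z.
    const DIGITS: &[u8; 62] =
        b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Zero is the terminator on its own, with no digits.
    if value == 0 {
        return "_".to_string();
    }

    // Every other value is stored off by one.
    let mut n = value - 1;
    let mut reversed = Vec::new();
    loop {
        // Collect digits least-significant first.
        reversed.push(DIGITS[(n % 62) as usize]);
        n /= 62;
        if n == 0 {
            break;
        }
    }
    reversed.reverse();

    let mut out = String::from_utf8(reversed).expect("digits are ASCII");
    out.push('_');
    out
}
```

Under those assumptions, `encode_base62(63)` and `encode_base62(1000)` give the `10_` and `g7_` rows shown in the table. The off-by-one is what lets the most common value, zero, cost a single byte.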