rust/src/librustdoc
Michael Goulet b3f307434e
Rollup merge of #119468 - notriddle:notriddle/compression, r=GuillaumeGomez
rustdoc-search: tighter encoding for f index

Depends on https://github.com/rust-lang/rust/pull/119457

Two optimizations for the function signature search:

* Instead of using JSON arrays, like `[1,20]`, it uses VLQ
  hex with no commas, like `[aAd]`.
* This also adds backrefs: if you have more than one function
  with exactly the same signature, it'll not only store it once,
  it'll *decode* it once, and store in the typeIdMap only once.

Based partially on discussions on zulip:
https://rust-lang.zulipchat.com/#narrow/stream/266220-t-rustdoc/topic/search.20index.20size

Performance
-----------

https://notriddle.com/rustdoc-html-demo-8/compression-perf-v2/index.html

### memory/time profiler output (for more details, consult the above link)

<table>
<thead><tr><th>benchmark<th>before<th>after</tr></thead>
<tbody>
<tr><th>arti<td>

```
user: 002.789 s
sys:  000.390 s
wall: 002.096 s
child_RSS_high:     440796 KiB
group_mem_high:     414924 KiB
```

</td><td>

```
user: 002.295 s
sys:  000.278 s
wall: 001.738 s
child_RSS_high:     314588 KiB
group_mem_high:     285220 KiB
```

</td></tr><tr><th>cortex-m<td>

```
user: 000.127 s
sys:  000.030 s
wall: 000.134 s
child_RSS_high:      60264 KiB
group_mem_high:      23824 KiB
```

</td><td>

```
user: 000.136 s
sys:  000.038 s
wall: 000.137 s
child_RSS_high:      59204 KiB
group_mem_high:      22712 KiB
```

</td></tr><tr><th>sqlx<td>

```
user: 000.887 s
sys:  000.118 s
wall: 000.592 s
child_RSS_high:     190408 KiB
group_mem_high:     157804 KiB
```

</td><td>

```
user: 000.798 s
sys:  000.101 s
wall: 000.525 s
child_RSS_high:     159292 KiB
group_mem_high:     126292 KiB
```

</td></tr><tr><th>stm32f4<td>

```
user: 013.884 s
sys:  005.399 s
wall: 013.149 s
child_RSS_high:    1942244 KiB
group_mem_high:    1954916 KiB
```

</td><td>

```
user: 006.128 s
sys:  003.297 s
wall: 007.994 s
child_RSS_high:    1038108 KiB
group_mem_high:    1023900 KiB
```

</td></tr><tr><th>ripgrep<td>

```
user: 000.441 s
sys:  000.063 s
wall: 000.264 s
child_RSS_high:     109180 KiB
group_mem_high:      74272 KiB
```

</td><td>

```
user: 000.408 s
sys:  000.044 s
wall: 000.238 s
child_RSS_high:     101488 KiB
group_mem_high:      66000 KiB
```

</td></tr></tbody></table>

Size change
-----------

standard library without gzip:

```console
$ du -bs search-index-old.js search-index-new.js
4976370 search-index-old.js
4404391 search-index-new.js
```

((4976370-4404391)/4404391)*100% = 12.9%

with gzip:

```console
$ du -hs search-index-old.js.gz search-index-new.js.gz
520K    search-index-old.js.gz
504K    search-index-new.js.gz

$ du -bs search-index-old.js.gz search-index-new.js.gz
522092  search-index-old.js.gz
507654  search-index-new.js.gz
```

((522092-507654)/507654)*100% = 2.8%

Benchmarks are similarly shrunk.

Without gzip:

```console
$ du -hs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js
10555067        tmp/arti/toolchain_old/doc/search-index.js
8921236 tmp/arti/toolchain_new/doc/search-index.js
77018   tmp/cortex-m/toolchain_old/doc/search-index.js
66676   tmp/cortex-m/toolchain_new/doc/search-index.js
2876330 tmp/sqlx/toolchain_old/doc/search-index.js
2436812 tmp/sqlx/toolchain_new/doc/search-index.js
63632890        tmp/stm32f4/toolchain_old/doc/search-index.js
52337438        tmp/stm32f4/toolchain_new/doc/search-index.js
631150  tmp/ripgrep/toolchain_old/doc/search-index.js
541646  tmp/ripgrep/toolchain_new/doc/search-index.js
```

With gzip:

```console
$ du -bs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js.gz
1618852 tmp/arti/toolchain_old/doc/search-index.js.gz
1582007 tmp/arti/toolchain_new/doc/search-index.js.gz
16109   tmp/cortex-m/toolchain_old/doc/search-index.js.gz
15831   tmp/cortex-m/toolchain_new/doc/search-index.js.gz
422257  tmp/sqlx/toolchain_old/doc/search-index.js.gz
411507  tmp/sqlx/toolchain_new/doc/search-index.js.gz
4454761 tmp/stm32f4/toolchain_old/doc/search-index.js.gz
4334924 tmp/stm32f4/toolchain_new/doc/search-index.js.gz
98312   tmp/ripgrep/toolchain_old/doc/search-index.js.gz
96864   tmp/ripgrep/toolchain_new/doc/search-index.js.gz

$ du -hs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.j
s.gz
1.6M    tmp/arti/toolchain_old/doc/search-index.js.gz
1.6M    tmp/arti/toolchain_new/doc/search-index.js.gz
24K     tmp/cortex-m/toolchain_old/doc/search-index.js.gz
24K     tmp/cortex-m/toolchain_new/doc/search-index.js.gz
424K    tmp/sqlx/toolchain_old/doc/search-index.js.gz
412K    tmp/sqlx/toolchain_new/doc/search-index.js.gz
4.3M    tmp/stm32f4/toolchain_old/doc/search-index.js.gz
4.2M    tmp/stm32f4/toolchain_new/doc/search-index.js.gz
108K    tmp/ripgrep/toolchain_old/doc/search-index.js.gz
104K    tmp/ripgrep/toolchain_new/doc/search-index.js.gz
```
2024-01-05 23:41:42 -05:00
..
clean Remove Session methods that duplicate DiagCtxt methods. 2023-12-24 08:05:28 +11:00
doctest pass unused_extern_crates in librustdoc::doctest::make_test 2023-04-25 17:20:58 +03:00
formats rustdoc: Move AssocItemRender and RenderMode to html::render. 2023-11-29 00:56:47 +00:00
html Rollup merge of #119468 - notriddle:notriddle/compression, r=GuillaumeGomez 2024-01-05 23:41:42 -05:00
json Introduce const Trait (always-const trait bounds) 2023-12-27 12:51:32 +01:00
passes Rollup merge of #119538 - nnethercote:cleanup-errors-5, r=compiler-errors 2024-01-05 10:57:21 -05:00
theme rustdoc: merge theme css into rustdoc.css 2023-09-15 07:40:17 -07:00
askama.toml Remove unneeded minus sign in jinja tags 2023-03-06 11:38:15 +01:00
Cargo.toml Update itertools to 0.11. 2023-11-22 08:13:21 +11:00
config.rs Rename EarlyDiagCtxt methods to match DiagCtxt. 2023-12-23 13:23:28 +11:00
core.rs Rollup merge of #119601 - nnethercote:Emitter-cleanups, r=oli-obk 2024-01-05 10:57:24 -05:00
docfs.rs remove redundant imports 2023-12-10 10:56:22 +08:00
doctest.rs Rename EmitterWriter as HumanEmitter. 2024-01-05 10:02:40 +11:00
error.rs Remove crate visibility modifier in libs, tests 2022-05-21 00:32:47 -04:00
externalfiles.rs Rename many DiagCtxt arguments. 2023-12-18 16:06:22 +11:00
fold.rs rustdoc: bind typedef inner type items to the folding system 2023-08-26 00:15:02 +02:00
lib.rs Remove more Session methods that duplicate DiagCtxt methods. 2023-12-24 08:17:47 +11:00
lint.rs fix some typos found scrolling through the docs 2023-12-22 18:28:19 -05:00
markdown.rs Minimize pub usage in source_map.rs. 2023-11-02 19:35:00 +11:00
README.md rust-lang.github.io/rustc-dev-guide -> rustc-dev-guide.rust-lang.org 2020-03-10 17:08:18 -03:00
scrape_examples.rs Remove Session methods that duplicate DiagCtxt methods. 2023-12-24 08:05:28 +11:00
theme.rs Rename many DiagCtxt arguments. 2023-12-18 16:06:22 +11:00
visit.rs rustdoc: Rename clean items from typedef to type alias 2023-08-21 13:56:22 -07:00
visit_ast.rs Rename Session::span_diagnostic as Session::dcx. 2023-12-18 16:06:21 +11:00
visit_lib.rs Correctly handle --document-hidden-items 2023-07-14 17:25:09 +02:00

For more information about how librustdoc works, see the rustc dev guide.