Auto merge of #88362 - pietroalbini:bump-stage0, r=Mark-Simulacrum

Pin bootstrap checksums and add a tool to update it automatically

⚠️ ⚠️ This is just a proactive hardening we're performing on the build system, and it's not prompted by any known compromise. If you're aware of security issues being exploited please [check out our responsible disclosure page](https://www.rust-lang.org/policies/security). ⚠️ ⚠️

---

This PR aims to improve Rust's supply chain security by pinning the checksums of the bootstrap compiler downloaded by `x.py`, so that a compromised `static.rust-lang.org` cannot affect builds of the compiler. The checksums are stored in `src/stage0.json`, which replaces `src/stage0.txt`. This PR also adds a tool to automatically update the bootstrap compiler.

The changes in this PR were originally discussed in [Zulip](https://zulip-archive.rust-lang.org/stream/241545-t-release/topic/pinning.20stage0.20hashes.html).

## Potential attack

Before this PR, an attacker who wanted to compromise the bootstrap compiler would "just" need to:

1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries and corresponding `.sha256` files to `static.rust-lang.org`.

There is no signature verification in `x.py` as we don't want the build system to depend on GPG. Also, since the checksums were not pinned inside the repository, they were downloaded from `static.rust-lang.org` too: this only protected against accidental changes in `static.rust-lang.org` that didn't change the `*.sha256` files. The attack would allow the attacker to compromise past and future invocations of `x.py`.
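
To make the weakness concrete, here is a minimal sketch of the pre-PR flow, assuming an in-memory dictionary as a stand-in for `static.rust-lang.org`. The helper names (`fetch_unpinned`, `make_server`) and the file name `rustc.tar.xz` are hypothetical and only serve the illustration; this is not the actual `bootstrap.py` code.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_unpinned(server: dict, name: str) -> bytes:
    """Pre-PR flow: the artifact and its expected hash come from the same server."""
    artifact = server[name]
    expected = server[name + ".sha256"].split()[0]
    if sha256_hex(artifact) != expected:
        raise RuntimeError("failed verification")
    return artifact


def make_server(payload: bytes) -> dict:
    # Hypothetical in-memory stand-in for static.rust-lang.org.
    return {
        "rustc.tar.xz": payload,
        "rustc.tar.xz.sha256": sha256_hex(payload) + "  rustc.tar.xz\n",
    }


# Accidental corruption: the tarball changes but the .sha256 file does not.
corrupted = make_server(b"genuine tarball")
corrupted["rustc.tar.xz"] = b"truncated"
try:
    fetch_unpinned(corrupted, "rustc.tar.xz")
    detected = False
except RuntimeError:
    detected = True

# Deliberate attack: both files are replaced, so the check passes.
compromised = make_server(b"backdoored tarball")
accepted = fetch_unpinned(compromised, "rustc.tar.xz")
```

The corrupted download is rejected, while the fully replaced pair is accepted, since nothing outside the server vouches for the hash.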

## Mitigations introduced in this PR

This PR adds pinned checksums for all the bootstrap components in `src/stage0.json` instead of downloading the checksums from `static.rust-lang.org`. This changes the attack scenario to:

1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries to `static.rust-lang.org`.
3. Land a (reviewed) change in the `rust-lang/rust` repository changing the pinned hashes.

Even with a successful attack, existing clones of the Rust repository won't be affected, and once the attack is detected, reverting the change to the pinned hashes should be enough to protect against it. This also enables further mitigations to be implemented in following PRs, such as verifying signatures when pinning new checksums (removing the trust-on-first-use aspect of this PR) and adding a check in CI making sure a PR updating the checksum has not been tampered with (see the future improvements section).
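
A minimal sketch of the pinned check, assuming the checksums are held in a dictionary keyed by the path of the artifact relative to the download server. The function name `verify_pinned`, the artifact path, and the payloads are hypothetical; only the error message mirrors the one added to `bootstrap.py` in this PR.

```python
import hashlib


def verify_pinned(checksums: dict, url: str, payload: bytes) -> bool:
    """Check a downloaded payload against the hash pinned in the repository."""
    if url not in checksums:
        raise RuntimeError(
            "src/stage0.json doesn't contain a checksum for {}".format(url)
        )
    return hashlib.sha256(payload).hexdigest() == checksums[url]


# Hypothetical artifact path and contents, for illustration only.
URL = "dist/2021-01-01/rustc-beta-x86_64-unknown-linux-gnu.tar.xz"
genuine = b"genuine tarball"
pinned = {URL: hashlib.sha256(genuine).hexdigest()}

ok = verify_pinned(pinned, URL, genuine)
rejected = not verify_pinned(pinned, URL, b"backdoored tarball")
```

Because the expected hash lives in the repository, replacing files on the server alone no longer passes verification. An artifact with no pinned hash is treated as an error rather than silently trusted.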

## Additional changes

There are additional changes implemented in this PR to enable the mitigation:

* The `src/stage0.txt` file has been replaced with `src/stage0.json`. The reasoning for the change is that, unlike the custom format we were using before, JSON has existing tooling to read and manipulate it, and the slight challenge of manually editing JSON files (no comments, no trailing commas) is not a problem thanks to the new `bump-stage0` tool.

* A new tool has been added to the repository, `bump-stage0`. When invoked, the tool automatically calculates which release should be used as the bootstrap compiler given the current version and channel, gathers all the relevant checksums and updates `src/stage0.json`. The tool can be invoked by running:

  ```
  ./x.py run src/tools/bump-stage0
  ```

* Support for downloading releases from `https://dev-static.rust-lang.org` has been removed, as it's not possible to verify checksums there (it's customary to replace existing artifacts there if a rebuild is warranted). This will require a change to the release process to avoid bumping the bootstrap compiler on beta before the stable release.
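
As a rough illustration of how the new file is consumed, the sketch below parses a sample document using the top-level keys that `bootstrap.py` reads in this PR (`dist_server`, `compiler`, `rustfmt`, `checksums_sha256`). The dates, target triple, and digest are placeholders, and `rustfmt_key` is a hypothetical helper, not a function in the repository.

```python
import json

# Placeholder content: only the key names are taken from this PR.
SAMPLE = """
{
  "dist_server": "https://static.rust-lang.org",
  "compiler": {"date": "2021-01-01", "version": "beta"},
  "rustfmt": {"date": "2021-01-02", "version": "nightly"},
  "checksums_sha256": {
    "dist/2021-01-02/rustfmt-nightly-x86_64-unknown-linux-gnu.tar.xz": "placeholder-digest"
  }
}
"""


class Stage0Toolchain:
    def __init__(self, stage0_payload):
        self.date = stage0_payload["date"]
        self.version = stage0_payload["version"]

    def channel(self):
        return self.version + "-" + self.date


def rustfmt_key(toolchain, build, tarball_suffix=".tar.xz"):
    """Hypothetical helper: build the path used to look up the pinned hash."""
    filename = "rustfmt-{}-{}{}".format(toolchain.version, build, tarball_suffix)
    return "dist/{}/{}".format(toolchain.date, filename)


data = json.loads(SAMPLE)
rustfmt = Stage0Toolchain(data["rustfmt"]) if data.get("rustfmt") is not None else None
key = rustfmt_key(rustfmt, "x86_64-unknown-linux-gnu")
download_url = "{}/{}".format(data["dist_server"], key)
```

Keeping the server base separate from the relative path means the same relative path can serve as both the checksum lookup key and the tail of the download URL.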

## Future improvements

* Add signature verification as part of `bump-stage0`, which would require the attacker to also obtain the release signing keys in order to successfully compromise the bootstrap compiler. This would be fine to add now, as the burden of installing the tool to verify signatures would only be placed on whoever updates the bootstrap compiler, instead of everyone compiling Rust.

* Add a check on CI that ensures the checksums in `src/stage0.json` are the expected ones. If a PR changes the stage0 file, CI should also run the `bump-stage0` tool and fail if the output in CI doesn't match the committed file. This prevents the PR author from tweaking the output of the tool manually, which would otherwise be close to impossible for a human to detect.

* Automate creating the PRs bumping the bootstrap compiler, by setting up a scheduled job in GitHub Actions that runs the tool and opens a PR.

* Investigate whether a similar mitigation can be done for "download from CI" components like the prebuilt LLVM.
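
The proposed CI check could look roughly like the sketch below, which compares the committed file with a freshly regenerated one and reports the entries that differ. It is only an illustration of the idea: `tampered_entries` is a hypothetical name, the sample data is made up, and a real check might simply compare the two files byte for byte.

```python
def tampered_entries(committed: dict, regenerated: dict) -> list:
    """Return the checksum keys whose pinned value differs between the two files."""
    a = committed.get("checksums_sha256", {})
    b = regenerated.get("checksums_sha256", {})
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Hypothetical data: the committed file has one hash altered by hand.
regenerated = {
    "checksums_sha256": {
        "dist/2021-01-01/cargo.tar.xz": "aaaa",
        "dist/2021-01-01/rustc.tar.xz": "bbbb",
    }
}
committed = {
    "checksums_sha256": {
        "dist/2021-01-01/cargo.tar.xz": "aaaa",
        "dist/2021-01-01/rustc.tar.xz": "ffff",
    }
}

mismatches = tampered_entries(committed, regenerated)
ci_should_fail = bool(mismatches)
```

Listing the differing keys, rather than only signalling a mismatch, would make it easier for a reviewer to see which artifact was affected.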

r? `@Mark-Simulacrum`
bors 2021-09-06 16:01:17 +00:00
commit 8ceea01bb4
13 changed files with 677 additions and 149 deletions

@ -4,6 +4,7 @@ import contextlib
import datetime
import distutils.version
import hashlib
import json
import os
import re
import shutil
@ -24,19 +25,17 @@ def support_xz():
except tarfile.CompressionError:
return False
def get(url, path, verbose=False, do_verify=True):
suffix = '.sha256'
sha_url = url + suffix
def get(base, url, path, checksums, verbose=False, do_verify=True):
with tempfile.NamedTemporaryFile(delete=False) as temp_file:
temp_path = temp_file.name
with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as sha_file:
sha_path = sha_file.name
try:
if do_verify:
download(sha_path, sha_url, False, verbose)
if url not in checksums:
raise RuntimeError("src/stage0.json doesn't contain a checksum for {}".format(url))
sha256 = checksums[url]
if os.path.exists(path):
if verify(path, sha_path, False):
if verify(path, sha256, False):
if verbose:
print("using already-download file", path)
return
@ -45,23 +44,17 @@ def get(url, path, verbose=False, do_verify=True):
print("ignoring already-download file",
path, "due to failed verification")
os.unlink(path)
download(temp_path, url, True, verbose)
if do_verify and not verify(temp_path, sha_path, verbose):
download(temp_path, "{}/{}".format(base, url), True, verbose)
if do_verify and not verify(temp_path, sha256, verbose):
raise RuntimeError("failed verification")
if verbose:
print("moving {} to {}".format(temp_path, path))
shutil.move(temp_path, path)
finally:
delete_if_present(sha_path, verbose)
delete_if_present(temp_path, verbose)
def delete_if_present(path, verbose):
"""Remove the given file if present"""
if os.path.isfile(path):
if verbose:
print("removing", path)
os.unlink(path)
if os.path.isfile(temp_path):
if verbose:
print("removing", temp_path)
os.unlink(temp_path)
def download(path, url, probably_big, verbose):
@ -98,14 +91,12 @@ def _download(path, url, probably_big, verbose, exception):
exception=exception)
def verify(path, sha_path, verbose):
def verify(path, expected, verbose):
"""Check if the sha256 sum of the given path is valid"""
if verbose:
print("verifying", path)
with open(path, "rb") as source:
found = hashlib.sha256(source.read()).hexdigest()
with open(sha_path, "r") as sha256sum:
expected = sha256sum.readline().split()[0]
verified = found == expected
if not verified:
print("invalid checksum:\n"
@ -176,15 +167,6 @@ def require(cmd, exit=True):
sys.exit(1)
def stage0_data(rust_root):
"""Build a dictionary from stage0.txt"""
nightlies = os.path.join(rust_root, "src/stage0.txt")
with open(nightlies, 'r') as nightlies:
lines = [line.rstrip() for line in nightlies
if not line.startswith("#")]
return dict([line.split(": ", 1) for line in lines if line])
def format_build_time(duration):
"""Return a nicer format for build time
@ -372,13 +354,22 @@ def output(filepath):
os.rename(tmp, filepath)
class Stage0Toolchain:
def __init__(self, stage0_payload):
self.date = stage0_payload["date"]
self.version = stage0_payload["version"]
def channel(self):
return self.version + "-" + self.date
class RustBuild(object):
"""Provide all the methods required to build Rust"""
def __init__(self):
self.date = ''
self.checksums_sha256 = {}
self.stage0_compiler = None
self.stage0_rustfmt = None
self._download_url = ''
self.rustc_channel = ''
self.rustfmt_channel = ''
self.build = ''
self.build_dir = ''
self.clean = False
@ -402,11 +393,10 @@ class RustBuild(object):
will move all the content to the right place.
"""
if rustc_channel is None:
rustc_channel = self.rustc_channel
rustfmt_channel = self.rustfmt_channel
rustc_channel = self.stage0_compiler.version
bin_root = self.bin_root(stage0)
key = self.date
key = self.stage0_compiler.date
if not stage0:
key += str(self.rustc_commit)
if self.rustc(stage0).startswith(bin_root) and \
@ -445,19 +435,23 @@ class RustBuild(object):
if self.rustfmt() and self.rustfmt().startswith(bin_root) and (
not os.path.exists(self.rustfmt())
or self.program_out_of_date(self.rustfmt_stamp(), self.rustfmt_channel)
or self.program_out_of_date(
self.rustfmt_stamp(),
"" if self.stage0_rustfmt is None else self.stage0_rustfmt.channel()
)
):
if rustfmt_channel:
if self.stage0_rustfmt is not None:
tarball_suffix = '.tar.xz' if support_xz() else '.tar.gz'
[channel, date] = rustfmt_channel.split('-', 1)
filename = "rustfmt-{}-{}{}".format(channel, self.build, tarball_suffix)
filename = "rustfmt-{}-{}{}".format(
self.stage0_rustfmt.version, self.build, tarball_suffix,
)
self._download_component_helper(
filename, "rustfmt-preview", tarball_suffix, key=date
filename, "rustfmt-preview", tarball_suffix, key=self.stage0_rustfmt.date
)
self.fix_bin_or_dylib("{}/bin/rustfmt".format(bin_root))
self.fix_bin_or_dylib("{}/bin/cargo-fmt".format(bin_root))
with output(self.rustfmt_stamp()) as rustfmt_stamp:
rustfmt_stamp.write(self.rustfmt_channel)
rustfmt_stamp.write(self.stage0_rustfmt.channel())
# Avoid downloading LLVM twice (once for stage0 and once for the master rustc)
if self.downloading_llvm() and stage0:
@ -518,7 +512,7 @@ class RustBuild(object):
):
if key is None:
if stage0:
key = self.date
key = self.stage0_compiler.date
else:
key = self.rustc_commit
cache_dst = os.path.join(self.build_dir, "cache")
@ -527,12 +521,21 @@ class RustBuild(object):
os.makedirs(rustc_cache)
if stage0:
url = "{}/dist/{}".format(self._download_url, key)
base = self._download_url
url = "dist/{}".format(key)
else:
url = "https://ci-artifacts.rust-lang.org/rustc-builds/{}".format(self.rustc_commit)
base = "https://ci-artifacts.rust-lang.org"
url = "rustc-builds/{}".format(self.rustc_commit)
tarball = os.path.join(rustc_cache, filename)
if not os.path.exists(tarball):
get("{}/{}".format(url, filename), tarball, verbose=self.verbose, do_verify=stage0)
get(
base,
"{}/{}".format(url, filename),
tarball,
self.checksums_sha256,
verbose=self.verbose,
do_verify=stage0,
)
unpack(tarball, tarball_suffix, self.bin_root(stage0), match=pattern, verbose=self.verbose)
def _download_ci_llvm(self, llvm_sha, llvm_assertions):
@ -542,7 +545,8 @@ class RustBuild(object):
if not os.path.exists(rustc_cache):
os.makedirs(rustc_cache)
url = "https://ci-artifacts.rust-lang.org/rustc-builds/{}".format(llvm_sha)
base = "https://ci-artifacts.rust-lang.org"
url = "rustc-builds/{}".format(llvm_sha)
if llvm_assertions:
url = url.replace('rustc-builds', 'rustc-builds-alt')
# ci-artifacts are only stored as .xz, not .gz
@ -554,7 +558,14 @@ class RustBuild(object):
filename = "rust-dev-nightly-" + self.build + tarball_suffix
tarball = os.path.join(rustc_cache, filename)
if not os.path.exists(tarball):
get("{}/{}".format(url, filename), tarball, verbose=self.verbose, do_verify=False)
get(
base,
"{}/{}".format(url, filename),
tarball,
self.checksums_sha256,
verbose=self.verbose,
do_verify=False,
)
unpack(tarball, tarball_suffix, self.llvm_root(),
match="rust-dev",
verbose=self.verbose)
@ -816,7 +827,7 @@ class RustBuild(object):
def rustfmt(self):
"""Return config path for rustfmt"""
if not self.rustfmt_channel:
if self.stage0_rustfmt is None:
return None
return self.program_config('rustfmt')
@ -1040,19 +1051,12 @@ class RustBuild(object):
self.update_submodule(module[0], module[1], recorded_submodules)
print("Submodules updated in %.2f seconds" % (time() - start_time))
def set_normal_environment(self):
def set_dist_environment(self, url):
"""Set download URL for normal environment"""
if 'RUSTUP_DIST_SERVER' in os.environ:
self._download_url = os.environ['RUSTUP_DIST_SERVER']
else:
self._download_url = 'https://static.rust-lang.org'
def set_dev_environment(self):
"""Set download URL for development environment"""
if 'RUSTUP_DEV_DIST_SERVER' in os.environ:
self._download_url = os.environ['RUSTUP_DEV_DIST_SERVER']
else:
self._download_url = 'https://dev-static.rust-lang.org'
self._download_url = url
def check_vendored_status(self):
"""Check that vendoring is configured properly"""
@ -1161,17 +1165,14 @@ def bootstrap(help_triggered):
build_dir = build.get_toml('build-dir', 'build') or 'build'
build.build_dir = os.path.abspath(build_dir.replace("$ROOT", build.rust_root))
data = stage0_data(build.rust_root)
build.date = data['date']
build.rustc_channel = data['rustc']
with open(os.path.join(build.rust_root, "src", "stage0.json")) as f:
data = json.load(f)
build.checksums_sha256 = data["checksums_sha256"]
build.stage0_compiler = Stage0Toolchain(data["compiler"])
if data.get("rustfmt") is not None:
build.stage0_rustfmt = Stage0Toolchain(data["rustfmt"])
if "rustfmt" in data:
build.rustfmt_channel = data['rustfmt']
if 'dev' in data:
build.set_dev_environment()
else:
build.set_normal_environment()
build.set_dist_environment(data["dist_server"])
build.build = args.build or build.build_triple()
build.update_submodules()

@ -13,38 +13,18 @@ from shutil import rmtree
import bootstrap
class Stage0DataTestCase(unittest.TestCase):
"""Test Case for stage0_data"""
def setUp(self):
self.rust_root = tempfile.mkdtemp()
os.mkdir(os.path.join(self.rust_root, "src"))
with open(os.path.join(self.rust_root, "src",
"stage0.txt"), "w") as stage0:
stage0.write("#ignore\n\ndate: 2017-06-15\nrustc: beta\ncargo: beta\nrustfmt: beta")
def tearDown(self):
rmtree(self.rust_root)
def test_stage0_data(self):
"""Extract data from stage0.txt"""
expected = {"date": "2017-06-15", "rustc": "beta", "cargo": "beta", "rustfmt": "beta"}
data = bootstrap.stage0_data(self.rust_root)
self.assertDictEqual(data, expected)
class VerifyTestCase(unittest.TestCase):
"""Test Case for verify"""
def setUp(self):
self.container = tempfile.mkdtemp()
self.src = os.path.join(self.container, "src.txt")
self.sums = os.path.join(self.container, "sums")
self.bad_src = os.path.join(self.container, "bad.txt")
content = "Hello world"
self.expected = hashlib.sha256(content.encode("utf-8")).hexdigest()
with open(self.src, "w") as src:
src.write(content)
with open(self.sums, "w") as sums:
sums.write(hashlib.sha256(content.encode("utf-8")).hexdigest())
with open(self.bad_src, "w") as bad:
bad.write("Hello!")
@ -53,11 +33,11 @@ class VerifyTestCase(unittest.TestCase):
def test_valid_file(self):
"""Check if the sha256 sum of the given file is valid"""
self.assertTrue(bootstrap.verify(self.src, self.sums, False))
self.assertTrue(bootstrap.verify(self.src, self.expected, False))
def test_invalid_file(self):
"""Should verify that the file is invalid"""
self.assertFalse(bootstrap.verify(self.bad_src, self.sums, False))
self.assertFalse(bootstrap.verify(self.bad_src, self.expected, False))
class ProgramOutOfDate(unittest.TestCase):
@ -99,7 +79,6 @@ if __name__ == '__main__':
TEST_LOADER = unittest.TestLoader()
SUITE.addTest(doctest.DocTestSuite(bootstrap))
SUITE.addTests([
TEST_LOADER.loadTestsFromTestCase(Stage0DataTestCase),
TEST_LOADER.loadTestsFromTestCase(VerifyTestCase),
TEST_LOADER.loadTestsFromTestCase(ProgramOutOfDate)])

@ -523,7 +523,7 @@ impl<'a> Builder<'a> {
install::Src,
install::Rustc
),
Kind::Run => describe!(run::ExpandYamlAnchors, run::BuildManifest),
Kind::Run => describe!(run::ExpandYamlAnchors, run::BuildManifest, run::BumpStage0),
}
}

@ -31,7 +31,7 @@
//! When you execute `x.py build`, the steps executed are:
//!
//! * First, the python script is run. This will automatically download the
//! stage0 rustc and cargo according to `src/stage0.txt`, or use the cached
//! stage0 rustc and cargo according to `src/stage0.json`, or use the cached
//! versions if they're available. These are then used to compile rustbuild
//! itself (using Cargo). Finally, control is then transferred to rustbuild.
//!

@ -82,3 +82,24 @@ impl Step for BuildManifest {
builder.run(&mut cmd);
}
}
#[derive(Debug, PartialOrd, Ord, Copy, Clone, Hash, PartialEq, Eq)]
pub struct BumpStage0;
impl Step for BumpStage0 {
type Output = ();
const ONLY_HOSTS: bool = true;
fn should_run(run: ShouldRun<'_>) -> ShouldRun<'_> {
run.path("src/tools/bump-stage0")
}
fn make_run(run: RunConfig<'_>) {
run.builder.ensure(BumpStage0);
}
fn run(self, builder: &Builder<'_>) -> Self::Output {
let mut cmd = builder.tool_cmd(Tool::BumpStage0);
builder.run(&mut cmd);
}
}

@ -15,7 +15,7 @@ use std::fs;
use std::path::PathBuf;
use std::process::Command;
use build_helper::{output, t};
use build_helper::output;
use crate::cache::INTERNER;
use crate::config::Target;
@ -227,14 +227,4 @@ $ pacman -R cmake && pacman -S mingw-w64-x86_64-cmake
if let Some(ref s) = build.config.ccache {
cmd_finder.must_have(s);
}
if build.config.channel == "stable" {
let stage0 = t!(fs::read_to_string(build.src.join("src/stage0.txt")));
if stage0.contains("\ndev:") {
panic!(
"bootstrapping from a dev compiler in a stable release, but \
should only be bootstrapping from a released compiler!"
);
}
}
}

@ -377,6 +377,7 @@ bootstrap_tool!(
LintDocs, "src/tools/lint-docs", "lint-docs";
JsonDocCk, "src/tools/jsondocck", "jsondocck";
HtmlChecker, "src/tools/html-checker", "html-checker";
BumpStage0, "src/tools/bump-stage0", "bump-stage0";
);
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq, Ord, PartialOrd)]