Rollup merge of #135557 - estebank:wtf8, r=fee1-dead

Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix #76869.
This commit is contained in:
Matthias Krüger 2025-01-22 20:37:24 +01:00 committed by GitHub
commit b4266b0bcd
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
14 changed files with 88 additions and 20 deletions

View file

@ -101,6 +101,8 @@ pub fn load_errors(testfile: &Path, revision: Option<&str>) -> Vec<Error> {
rdr.lines()
.enumerate()
// We want to ignore utf-8 failures in tests during collection of annotations.
.filter(|(_, line)| line.is_ok())
.filter_map(|(line_num, line)| {
parse_expected(last_nonfollow_error, line_num + 1, &line.unwrap(), revision).map(
|(which, error)| {

View file

@ -58,7 +58,7 @@ pub fn check(tests_path: impl AsRef<Path>, bad: &mut bool) {
let mut expected_revisions = BTreeSet::new();
let contents = std::fs::read_to_string(test).unwrap();
let Ok(contents) = std::fs::read_to_string(test) else { continue };
// Collect directives.
iter_header(&contents, &mut |HeaderLine { revision, directive, .. }| {