Do not GC the current active incremental session directory

In `setup_dep_graph`, we set up a session directory for the current
incremental compilation session, load the dep graph, and then GC stale
incremental compilation sessions for the crate. The freshly-created
session directory ends up in this list of potentially-GC'd directories
but in practice is not typically even considered for GC because the new
directory is neither finalized nor `is_old_enough_to_be_collected`.

Unfortunately, `is_old_enough_to_be_collected` is a simple time check,
and if `load_dep_graph` is slow enough it's possible for the
freshly-created session directory to be tens of seconds old already.
Once it is old enough to be *eligible* for GC, we try to `flock::Lock`
it. Acquiring the lock is taken as proof that nobody else owns the
directory, and so that it is a stale working directory.
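
To make the timing problem concrete, here is a minimal sketch of an
age-based eligibility test. The function name, its signature, and the
10-second threshold are assumptions for illustration, not the actual
`is_old_enough_to_be_collected`.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for an age-based GC eligibility check.
/// It only compares timestamps; it knows nothing about who owns the directory.
fn is_old_enough(created: SystemTime, now: SystemTime, threshold: Duration) -> bool {
    match now.duration_since(created) {
        Ok(age) => age > threshold,
        // `created` is in the future relative to `now`: treat as not old.
        Err(_) => false,
    }
}
```

With a threshold like this, a directory created 2 seconds ago is skipped,
but the same directory checked after a 30-second dep graph load is
eligible even though this process is still using it.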

Because we hold the lock in the same process, the behavior of
`flock::Lock` is dependent on platform-specifics about file locking
APIs. `fcntl(F_SETLK)`-style locks used on non-Linux Unices do not
provide mutual exclusion internal to a process. `fcntl_locking(2)` on
Linux describes some relevant problems:

```
       The record locks described above are associated with the process
       (unlike the open file description locks described below).  This
       has some unfortunate consequences:

       *  If a process closes any file descriptor referring to a file,
          then all of the process's locks on that file are released, [...]

       *  The threads in a process share locks.  In other words, a
          multithreaded program can't use record locking to ensure that
          threads don't simultaneously access the same region of a file.
```
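
The behavior described above can be reproduced with a small sketch that
takes the same record lock through two separate file descriptions in
one process. This is not how `flock::Lock` is written. The struct
layout and constant values below are assumptions that hold for 64-bit
Linux only (real code would take them from the `libc` crate), and all
function names are made up for the example.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::raw::{c_int, c_short};
use std::os::unix::io::AsRawFd;
use std::path::Path;

// ASSUMPTION: 64-bit Linux layout of `struct flock` and Linux constant values.
#[repr(C)]
struct RecordLock {
    l_type: c_short,
    l_whence: c_short,
    l_start: i64,
    l_len: i64,
    l_pid: c_int,
}

const F_SETLK: c_int = 6; // non-blocking "set lock" command
const F_WRLCK: c_short = 1; // exclusive (write) lock
const SEEK_SET: c_short = 0;

unsafe extern "C" {
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
}

/// Try to take an exclusive record lock on the whole file without blocking.
fn try_record_lock(file: &File) -> bool {
    let lock =
        RecordLock { l_type: F_WRLCK, l_whence: SEEK_SET, l_start: 0, l_len: 0, l_pid: 0 };
    // SAFETY: `lock` matches the assumed kernel layout and outlives the call.
    unsafe { fcntl(file.as_raw_fd(), F_SETLK, &lock as *const RecordLock) == 0 }
}

fn open_for_record_lock(path: &Path) -> io::Result<File> {
    OpenOptions::new().read(true).write(true).create(true).truncate(false).open(path)
}

/// Lock one file through two separate file descriptions in this process.
/// Returns whether the first and the second attempt reported success.
fn lock_twice_with_fcntl() -> io::Result<(bool, bool)> {
    let path = std::env::temp_dir().join(format!("fcntl-sketch-{}.lock", std::process::id()));
    let first = open_for_record_lock(&path)?;
    let second = open_for_record_lock(&path)?;
    let result = (try_record_lock(&first), try_record_lock(&second));
    std::fs::remove_file(&path)?;
    Ok(result)
}
```

Under those assumptions I would expect both attempts to report success,
because record locks belong to the process and the second request does
not conflict with the first.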

An `fcntl` lock will appear to successfully lock the fresh incremental
compilation directory, at which point we remove it shortly before using
it for incremental compilation. Saving incremental compilation state
later fails, taking rustc down with an error like
```
[..]/target/debug/incremental/crate-<hash>/<name>/dep-graph.part.bin: No such file or directory (os error 2)
```

The release-lock-on-close behavior has uncomfortable consequences for
the freshly-opened file description for the lock, but I think in
practice it isn't an issue. If we would close the file, we failed to
acquire the lock, so someone else had the lock and we're not releasing
locks prematurely.

`flock(LOCK_EX)` doesn't seem to have these same issues, and because
`flock::Lock::new` always opens a new file description when locking, I
don't think Linux can have this issue.
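
For contrast, a sketch of the same experiment with `flock(2)`. Again
this is only an illustration with made-up function names, and the
`LOCK_EX`/`LOCK_NB` values are assumed from Linux headers rather than
taken from the `libc` crate.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// ASSUMPTION: constant values as defined on Linux.
const LOCK_EX: c_int = 2; // exclusive lock
const LOCK_NB: c_int = 4; // do not block if the lock is held

unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

/// Try to take an exclusive `flock` lock without blocking.
fn try_flock_exclusive(file: &File) -> bool {
    // SAFETY: the descriptor is valid for the lifetime of `file`.
    unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) == 0 }
}

fn open_for_flock(path: &Path) -> io::Result<File> {
    OpenOptions::new().read(true).write(true).create(true).truncate(false).open(path)
}

/// Lock one file through two separate file descriptions in this process.
/// Returns whether the first and the second attempt reported success.
fn lock_twice_with_flock() -> io::Result<(bool, bool)> {
    let path = std::env::temp_dir().join(format!("flock-sketch-{}.lock", std::process::id()));
    let first = open_for_flock(&path)?;
    let second = open_for_flock(&path)?;
    let result = (try_flock_exclusive(&first), try_flock_exclusive(&second));
    std::fs::remove_file(&path)?;
    Ok(result)
}
```

On a local Linux filesystem I would expect the second attempt to fail,
since `flock` locks are tied to the open file description and each
`open` creates a new one.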

From reading `LockFileEx` on MSDN I *think* Windows has locking
semantics similar to `flock`, but I haven't tested there at all.

My conclusion is that there is no way to write a pure-POSIX
`flock::Lock::new` which guarantees mutual exclusion across different
file descriptions of the same file in the same process, and
`flock::Lock::new` must not be used for that purpose. So, instead, avoid
considering the current incremental session directory for GC in the
first place. Our own `sess` is evidence we're alive and using it.
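
The shape of the fix, reduced to a standalone sketch. The helper name
and the directory names in the usage note are hypothetical. The real
change lives inside `garbage_collect_session_directories` and works on
rustc's own types.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: drop the live session directory from the list of
/// directory names before any age or lock based GC decision is made.
fn gc_candidates<'a>(names: &'a [String], session_directory: &Path) -> Vec<&'a str> {
    // `file_name` is `None` for paths ending in `..`; then nothing is skipped.
    let current: Option<&OsStr> = session_directory.file_name();
    names
        .iter()
        .map(|name| name.as_str())
        .filter(|name| current != Some(OsStr::new(*name)))
        .collect()
}
```

Given names `s-old-1`, `s-current-working`, and `s-old-2`, and a session
directory path ending in `s-current-working`, only the two old names
remain as candidates.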
iximeow 2025-10-17 16:41:55 +00:00 committed by iximeow
parent a41214f9bd
commit 4e816d8bc5

@@ -721,11 +721,37 @@ pub(crate) fn garbage_collect_session_directories(sess: &Session) -> io::Result<
        }
    }
    let current_session_directory_name =
        session_directory.file_name().expect("session directory is not `..`");
    // Now garbage collect the valid session directories.
    let deletion_candidates =
        lock_file_to_session_dir.items().filter_map(|(lock_file_name, directory_name)| {
            debug!("garbage_collect_session_directories() - inspecting: {}", directory_name);
            if directory_name.as_str() == current_session_directory_name {
                // Skipping our own directory is, unfortunately, important for correctness.
                //
                // To summarize #147821: we will try to lock directories before deciding they can be
                // garbage collected, but the ability of `flock::Lock` to detect a lock held *by the
                // same process* varies across file locking APIs. Then, if our own session directory
                // has become old enough to be eligible for GC, we are beholden to platform-specific
                // details about detecting our own lock on the session directory.
                //
                // POSIX `fcntl(F_SETLK)`-style file locks are maintained across a process. On
                // systems where this is the mechanism for `flock::Lock`, there is no way to
                // discover if an `flock::Lock` has been created in the same process on the same
                // file. Attempting to set a lock on the lockfile again will succeed, even if the
                // lock was set by another thread, on another file descriptor. Then we would
                // garbage collect our own live directory, unable to tell it was locked perhaps by
                // this same thread.
                //
                // It's not clear that `flock::Lock` can be fixed for this in general, and our own
                // incremental session directory is the only one which this process may own, so skip
                // it here and avoid the problem. We know it's not garbage anyway: we're using it.
                return None;
            }
            let Ok(timestamp) = extract_timestamp_from_session_dir(directory_name) else {
                debug!(
                    "found session-dir with malformed timestamp: {}",