Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr
# Problem
Writes of just an incomplete UTF-8 byte sequence (e.g. `b"\xC3"` or `b"\xF0\x9F"`) to stdout/stderr with a Windows console attached error with `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` even though further writes could complete the codepoint. This is currently a rare occurence since the [linewritershim](2c56ea38b0/library/std/src/io/buffered/linewritershim.rs) implementation flushes complete lines immediately and buffers up to 1024 bytes for incomplete lines. It can still happen as described in #83258.
The problem will become more pronounced once the developer can switch stdout/stderr from line-buffered to block-buffered or immediate when the changes in the "Switchable buffering for Stdout" pull request (#78515) get merged.
# Patch description
If there is at least one valid UTF-8 codepoint all valid UTF-8 is passed through to the extracted `write_valid_utf8_to_console()` fn. The new code only comes into play if `write()` is being passed a short byte slice comprising an incomplete UTF-8 codepoint. In this case up to three bytes are buffered in the `IncompleteUtf8` struct associated with `Stdout` / `Stderr`. The bytes are accepted one at a time. As soon as an error can be detected `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` is returned. Once a complete UTF-8 codepoint is received it is passed to the `write_valid_utf8_to_console()` and the buffer length is set to zero.
Calling `flush()` will neither error nor write anything if an incomplete codepoint is present in the buffer.
# Tests
Currently there are no Windows-specific tests for console writing code at all. Writing (regression) tests for this problem is a bit challenging since unit tests and UI tests don't run in a console and suddenly popping up another console window might be surprising to developers running the testsuite and it might not work at all in CI builds. To just test the new functionality in unit tests the code would need to be refactored. Some guidance on how to proceed would be appreciated.
# Public API changes
* `std::str::verifications::utf8_char_width()` would be exposed as `std::str::utf8_char_width()` behind the "str_internals" feature gate.
# Related issues
* Fixes#83258.
* PR #78515 will exacerbate the problem.
# Open questions
* Add tests?
* Squash into one commit with better commit message?
Use the return value of readdir_r() instead of errno
POSIX says:
> If successful, the readdir_r() function shall return zero; otherwise,
> an error number shall be returned to indicate the error.
But we were previously using errno instead of the return value. This
led to issue #86649.
POSIX says:
> If successful, the readdir_r() function shall return zero; otherwise,
> an error number shall be returned to indicate the error.
But we were previously using errno instead of the return value. This
led to issue #86649.
Unbox mutexes, condvars and rwlocks on hermit
[RustyHermit](https://github.com/hermitcore/rusty-hermit) provides now movable synchronization primitives and we are able to unbox mutexes and condvars.
Change WASI's `RawFd` from `u32` to `c_int` (`i32`).
WASI previously used `u32` as its `RawFd` type, since its "file descriptors"
are unsigned table indices, and there's no fundamental reason why WASI can't
have more than 2^31 handles.
However, this creates myriad little incompability problems with code
that also supports Unix platforms, where `RawFd` is `c_int`. While WASI
isn't a Unix, it often shares code with Unix, and this difference made
such shared code inconvenient. #87329 is the most recent example of such
code.
So, switch WASI to use `c_int`, which is `i32`. This will mean that code
intending to support WASI should ideally avoid assuming that negative file
descriptors are invalid, even though POSIX itself says that file descriptors
are never negative.
This is a breaking change, but `RawFd` is considerd an experimental
feature in [the documentation].
[the documentation]: https://doc.rust-lang.org/stable/std/os/wasi/io/type.RawFd.html
r? `@alexcrichton`
WASI previously used `u32` as its `RawFd` type, since its "file descriptors"
are unsigned table indices, and there's no fundamental reason why WASI can't
have more than 2^31 handles.
However, this creates myriad little incompability problems with code
that also supports Unix platforms, where `RawFd` is `c_int`. While WASI
isn't a Unix, it often shares code with Unix, and this difference made
such shared code inconvenient. #87329 is the most recent example of such
code.
So, switch WASI to use `c_int`, which is `i32`. This will mean that code
intending to support WASI should ideally avoid assuming that negative file
descriptors are invalid, even though POSIX itself says that file descriptors
are never negative.
This is a breaking change, but `RawFd` is considerd an experimental
feature in [the documentation].
[the documentation]: https://doc.rust-lang.org/stable/std/os/wasi/io/type.RawFd.html
Move `os_str_bytes` to `sys::unix`
Followup to #84967, with `OsStrExt` and `OsStringExt` moved out of `sys_common`, there is no reason anymore for `os_str_bytes` to live in `sys_common` and not in sys. This pr moves it to the location `sys::unix::os_str` and reuses the code on other platforms via `#[path]` (as is common in `sys`) instead of importing.
Change environment variable getters to error recoverably
This PR changes the standard library environment variable getter functions to error recoverably (i.e. not panic) when given an invalid value.
On some platforms, it is invalid for environment variable names to contain `'\0'` or `'='`, or for their values to contain `'\0'`. Currently, the standard library panics when manipulating environment variables with names or values that violate these invariants. However, this behavior doesn't make a lot of sense, at least in the case of getters. If the environment variable is missing, the standard library just returns an error value, rather than panicking. It doesn't make sense to treat the case where the variable is invalid any differently from that. See the [internals thread](https://internals.rust-lang.org/t/why-should-std-var-panic/14847) for discussion. Thus, this PR changes the functions to error recoverably in this case as well.
If desired, I could change the functions that manipulate environment variables in other ways as well. I didn't do that here because it wasn't entirely clear what to change them to. Should they error silently or do something else? If someone tells me how to change them, I'm happy to implement the changes.
This fixes#86082, an ICE that arises from the current behavior. It also adds a regression test to make sure the ICE does not occur again in the future.
`@rustbot` label +T-libs
r? `@joshtriplett`
Bump bootstrap compiler to 1.55
Changing the cfgs for stdarch is missing, but my understanding is that we don't need to do it as part of this PR?
r? `@Mark-Simulacrum`
Add Linux-specific pidfd process extensions (take 2)
Continuation of #77168.
I addressed the following concerns from the original PR:
- make `CommandExt` and `ChildExt` sealed traits
- wrap file descriptors in `PidFd` struct representing ownership over the fd
- add `take_pidfd` to take the fd out of `Child`
- close fd when dropped
Tracking Issue: #82971
Update VxWork's UNIX support
1. VxWorks does not provide glibc
2. VxWorks does provide `sigemptyset` and `sigaddset`
Note: these changes are concurrent to [this PR](https://github.com/rust-lang/libc/pull/2295) in libc.
Background:
Over the last year, pidfd support was added to the Linux kernel. This
allows interacting with other processes. In particular, this allows
waiting on a child process with a timeout in a race-free way, bypassing
all of the awful signal-handler tricks that are usually required.
Pidfds can be obtained for a child process (as well as any other
process) via the `pidfd_open` syscall. Unfortunately, this requires
several conditions to hold in order to be race-free (i.e. the pid is not
reused).
Per `man pidfd_open`:
```
· the disposition of SIGCHLD has not been explicitly set to SIG_IGN
(see sigaction(2));
· the SA_NOCLDWAIT flag was not specified while establishing a han‐
dler for SIGCHLD or while setting the disposition of that signal to
SIG_DFL (see sigaction(2)); and
· the zombie process was not reaped elsewhere in the program (e.g.,
either by an asynchronously executed signal handler or by wait(2)
or similar in another thread).
If any of these conditions does not hold, then the child process
(along with a PID file descriptor that refers to it) should instead
be created using clone(2) with the CLONE_PIDFD flag.
```
Sadly, these conditions are impossible to guarantee once any libraries
are used. For example, C code runnng in a different thread could call
`wait()`, which is impossible to detect from Rust code trying to open a
pidfd.
While pid reuse issues should (hopefully) be rare in practice, we can do
better. By passing the `CLONE_PIDFD` flag to `clone()` or `clone3()`, we
can obtain a pidfd for the child process in a guaranteed race-free
manner.
This PR:
This PR adds Linux-specific process extension methods to allow obtaining
pidfds for processes spawned via the standard `Command` API. Other than
being made available to user code, the standard library does not make
use of these pidfds in any way. In particular, the implementation of
`Child::wait` is completely unchanged.
Two Linux-specific helper methods are added: `CommandExt::create_pidfd`
and `ChildExt::pidfd`. These methods are intended to serve as a building
block for libraries to build higher-level abstractions - in particular,
waiting on a process with a timeout.
I've included a basic test, which verifies that pidfds are created iff
the `create_pidfd` method is used. This test is somewhat special - it
should always succeed on systems with the `clone3` system call
available, and always fail on systems without `clone3` available. I'm
not sure how to best ensure this programatically.
This PR relies on the newer `clone3` system call to pass the `CLONE_FD`,
rather than the older `clone` system call. `clone3` was added to Linux
in the same release as pidfds, so this shouldn't unnecessarily limit the
kernel versions that this code supports.
Unresolved questions:
* What should the name of the feature gate be for these newly added
methods?
* Should the `pidfd` method distinguish between an error occurring
and `create_pidfd` not being called?