Eliminate bunch of copies of error codepath from Utf8LossyChunksIter
Using a macro to stamp out 7 identical copies of the nontrivial slicing logic to exit this loop didn't seem like a necessary use of a macro. The early return case can be handled by `break` without practically any changes to the logic inside the loop.
All this code is from early 2014 (#12062—nearly 8 years ago; pre-1.0) so it's possible there were compiler limitations that forced the macro way at the time.
Confirmed that `x.py bench library/alloc --stage 0 --test-args from_utf8_lossy` is unaffected on my machine.
Using a macro to stamp out 7 identical copies of the nontrivial slicing
logic to exit this loop didn't seem like a necessary use of a macro. The
early return case can be handled by `break` without practically any
changes to the logic inside the loop.
All this code is from early 2014 (7.5 years old, pre-1.0) so it's
possible there were compiler limitations that forced the macro way at
the time.
Confirmed that `x.py bench library/alloc --stage 0 --test-args from_utf8_lossy`
is unaffected on my machine.
The functions are now `unsafe` and they use `Option::unwrap_unchecked` instead of `unwrap_or_0`
`unwrap_or_0` was added in 42357d772b. I guess `unwrap_unchecked` was not available back then.
Given this example:
```rust
pub fn first_char(s: &str) -> Option<char> {
s.chars().next()
}
```
Previously, the following assembly was produced:
```asm
_ZN7example10first_char17ha056ddea6bafad1cE:
.cfi_startproc
test rsi, rsi
je .LBB0_1
movzx edx, byte ptr [rdi]
test dl, dl
js .LBB0_3
mov eax, edx
ret
.LBB0_1:
mov eax, 1114112
ret
.LBB0_3:
lea r8, [rdi + rsi]
xor eax, eax
mov r9, r8
cmp rsi, 1
je .LBB0_5
movzx eax, byte ptr [rdi + 1]
add rdi, 2
and eax, 63
mov r9, rdi
.LBB0_5:
mov ecx, edx
and ecx, 31
cmp dl, -33
jbe .LBB0_6
cmp r9, r8
je .LBB0_9
movzx esi, byte ptr [r9]
add r9, 1
and esi, 63
shl eax, 6
or eax, esi
cmp dl, -16
jb .LBB0_12
.LBB0_13:
cmp r9, r8
je .LBB0_14
movzx edx, byte ptr [r9]
and edx, 63
jmp .LBB0_16
.LBB0_6:
shl ecx, 6
or eax, ecx
ret
.LBB0_9:
xor esi, esi
mov r9, r8
shl eax, 6
or eax, esi
cmp dl, -16
jae .LBB0_13
.LBB0_12:
shl ecx, 12
or eax, ecx
ret
.LBB0_14:
xor edx, edx
.LBB0_16:
and ecx, 7
shl ecx, 18
shl eax, 6
or eax, ecx
or eax, edx
ret
```
After this change, the assembly is reduced to:
```asm
_ZN7example10first_char17h4318683472f884ccE:
.cfi_startproc
test rsi, rsi
je .LBB0_1
movzx ecx, byte ptr [rdi]
test cl, cl
js .LBB0_3
mov eax, ecx
ret
.LBB0_1:
mov eax, 1114112
ret
.LBB0_3:
mov eax, ecx
and eax, 31
movzx esi, byte ptr [rdi + 1]
and esi, 63
cmp cl, -33
jbe .LBB0_4
movzx edx, byte ptr [rdi + 2]
shl esi, 6
and edx, 63
or edx, esi
cmp cl, -16
jb .LBB0_7
movzx ecx, byte ptr [rdi + 3]
and eax, 7
shl eax, 18
shl edx, 6
and ecx, 63
or ecx, edx
or eax, ecx
ret
.LBB0_4:
shl eax, 6
or eax, esi
ret
.LBB0_7:
shl eax, 12
or eax, edx
ret
```
This commit makes the following functions from `core::str` `const fn`:
- `from_utf8[_mut]` (`feature(const_str_from_utf8)`)
- `from_utf8_unchecked_mut` (`feature(const_str_from_utf8_unchecked_mut)`)
- `Utf8Error::{valid_up_to,error_len}` (`feature(const_str_from_utf8)`)
Add #[must_use] to remaining core functions
I've run out of compelling reasons to group functions together across crates so I'm just going to go module-by-module. This is everything remaining from the `core` crate.
Ignored by clippy for reasons unknown:
```rust
core::alloc::Layout unsafe fn for_value_raw<T: ?Sized>(t: *const T) -> Self;
core::any const fn type_name_of_val<T: ?Sized>(_val: &T) -> &'static str;
```
Ignored by clippy because of `mut`:
```rust
str fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str);
```
<del>
Ignored by clippy presumably because a caller might want `f` called for side effects. That seems like a bad usage of `map` to me.
```rust
core::cell::Ref<'b, T> fn map<U: ?Sized, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, T>;
core::cell::Ref<'b, T> fn map_split<U: ?Sized, V: ?Sized, F>(orig: Ref<'b, T>, f: F) -> (Ref<'b, U>, Ref<'b, V>);
```
</del>
Parent issue: #89692
r? ```@joshtriplett```
Add #[must_use] to expensive computations
The unifying theme for this commit is weak, admittedly. I put together a list of "expensive" functions when I originally proposed this whole effort, but nobody's cared about that criterion. Still, it's a decent way to bite off a not-too-big chunk of work.
Given the grab bag nature of this commit, the messages I used vary quite a bit. I'm open to wording changes.
For some reason clippy flagged four `BTreeSet` methods but didn't say boo about equivalent ones on `HashSet`. I stared at them for a while but I can't figure out the difference so I added the `HashSet` ones in.
```rust
// Flagged by clippy.
alloc::collections::btree_set::BTreeSet<T> fn difference<'a>(&'a self, other: &'a BTreeSet<T>) -> Difference<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn symmetric_difference<'a>(&'a self, other: &'a BTreeSet<T>) -> SymmetricDifference<'a, T>
alloc::collections::btree_set::BTreeSet<T> fn intersection<'a>(&'a self, other: &'a BTreeSet<T>) -> Intersection<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn union<'a>(&'a self, other: &'a BTreeSet<T>) -> Union<'a, T>;
// Ignored by clippy, but not by me.
std::collections::HashSet<T, S> fn difference<'a>(&'a self, other: &'a HashSet<T, S>) -> Difference<'a, T, S>;
std::collections::HashSet<T, S> fn symmetric_difference<'a>(&'a self, other: &'a HashSet<T, S>) -> SymmetricDifference<'a, T, S>
std::collections::HashSet<T, S> fn intersection<'a>(&'a self, other: &'a HashSet<T, S>) -> Intersection<'a, T, S>;
std::collections::HashSet<T, S> fn union<'a>(&'a self, other: &'a HashSet<T, S>) -> Union<'a, T, S>;
```
Parent issue: #89692
r? ```@joshtriplett```
Using short-circuiting operators makes it easier to perform some kinds
of source code analysis, like MC/DC code coverage (a requirement in
safety-critical environments). The optimized x86_64 assembly is
equivalent between the old and new versions.
Old assembly of that condition:
```
mov rax, qword ptr [rdi + rdx + 8]
or rax, qword ptr [rdi + rdx]
test rax, r9
je .LBB0_7
```
New assembly of that condition:
```
mov rax, qword ptr [rdi + rdx]
or rax, qword ptr [rdi + rdx + 8]
test rax, r8
je .LBB0_7
```
My change to use `Type::def_id()` (formerly `Type::def_id_full()`) in
more places caused some docs to show up that used to be missed by
rustdoc. Those docs contained unescaped square brackets, which triggered
linkcheck errors. This commit escapes the square brackets and adds this
particular instance to the linkcheck exception list.
The unifying theme for this commit is weak, admittedly. I put together a
list of "expensive" functions when I originally proposed this whole
effort, but nobody's cared about that criterion. Still, it's a decent
way to bite off a not-too-big chunk of work.
Given the grab bag nature of this commit, the messages I used vary quite
a bit.
Add #[must_use] to from_value conversions
I added two methods to the list myself. Clippy did not flag them because they take `mut` args, but neither modifies their argument.
```rust
core::str const unsafe fn from_utf8_unchecked_mut(v: &mut [u8]) -> &mut str;
std::ffi::CString unsafe fn from_raw(ptr: *mut c_char) -> CString;
```
I put a custom note on `from_raw`:
```rust
#[must_use = "call `drop(from_raw(ptr))` if you intend to drop the `CString`"]
pub unsafe fn from_raw(ptr: *mut c_char) -> CString {
```
Parent issue: #89692
r? ``@joshtriplett``
Add #[must_use] to string/char transformation methods
These methods could be misconstrued as modifying their arguments instead of returning new values.
Where possible I made the note recommend a method that does mutate in place.
Parent issue: #89692
These methods could be misconstrued as modifying their arguments instead
of returning new values.
Where possible I made the note recommend a method that does mutate in
place.
Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr
# Problem
Writes of just an incomplete UTF-8 byte sequence (e.g. `b"\xC3"` or `b"\xF0\x9F"`) to stdout/stderr with a Windows console attached error with `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` even though further writes could complete the codepoint. This is currently a rare occurence since the [linewritershim](2c56ea38b0/library/std/src/io/buffered/linewritershim.rs) implementation flushes complete lines immediately and buffers up to 1024 bytes for incomplete lines. It can still happen as described in #83258.
The problem will become more pronounced once the developer can switch stdout/stderr from line-buffered to block-buffered or immediate when the changes in the "Switchable buffering for Stdout" pull request (#78515) get merged.
# Patch description
If there is at least one valid UTF-8 codepoint all valid UTF-8 is passed through to the extracted `write_valid_utf8_to_console()` fn. The new code only comes into play if `write()` is being passed a short byte slice comprising an incomplete UTF-8 codepoint. In this case up to three bytes are buffered in the `IncompleteUtf8` struct associated with `Stdout` / `Stderr`. The bytes are accepted one at a time. As soon as an error can be detected `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` is returned. Once a complete UTF-8 codepoint is received it is passed to the `write_valid_utf8_to_console()` and the buffer length is set to zero.
Calling `flush()` will neither error nor write anything if an incomplete codepoint is present in the buffer.
# Tests
Currently there are no Windows-specific tests for console writing code at all. Writing (regression) tests for this problem is a bit challenging since unit tests and UI tests don't run in a console and suddenly popping up another console window might be surprising to developers running the testsuite and it might not work at all in CI builds. To just test the new functionality in unit tests the code would need to be refactored. Some guidance on how to proceed would be appreciated.
# Public API changes
* `std::str::verifications::utf8_char_width()` would be exposed as `std::str::utf8_char_width()` behind the "str_internals" feature gate.
# Related issues
* Fixes#83258.
* PR #78515 will exacerbate the problem.
# Open questions
* Add tests?
* Squash into one commit with better commit message?
The libs-api team agrees to allow const_trait_impl to appear in the
standard library as long as stable code cannot be broken (they are
properly gated) this means if the compiler teams thinks it's okay, then
it's okay.
My priority on constifying would be:
1. Non-generic impls (e.g. Default) or generic impls with no
bounds
2. Generic functions with bounds (that use const impls)
3. Generic impls with bounds
4. Impls for traits with associated types
For people opening constification PRs: please cc me and/or oli-obk.
This will not affect ABI since the other variant of the enum is bigger.
It may break some code, but that would be very strange: usually people
don't continue after the first `Done` (or `None` for a normal iterator).
Remove some doc aliases
As per the new doc alias policy in https://github.com/rust-lang/std-dev-guide/pull/25, this removes some controversial doc aliases:
- `malloc`, `alloc`, `realloc`, etc.
- `length` (alias for `len`)
- `delete` (alias for `remove` in collections and also file/directory deletion)
r? `@joshtriplett`
Stabilize `str::from_utf8_unchecked` as `const`
This stabilizes `unsafe fn str::from_utf8_unchecked` as `const` pending FCP on #75196. By the time FCP finishes, the beta will have already been cut, so I've set 1.55 as the stable-since version.
(should also be +relnotes but I don't have the permission to do that)
r? `@m-ou-se`
Closes#75196
Due to the std/alloc split, it is not possible to make
`alloc::collections::TryReserveError::AllocError` non-exhaustive without
having an unstable, doc-hidden method to construct (which negates the
benefits from `#[non_exhaustive]`.