Commit graph

147 commits

Author SHA1 Message Date
ltdk
edd318c313 Add {floor,ceil}_char_boundary methods to str 2022-02-07 13:34:08 -05:00
Thom Chiovoloni
41f821461f
Fix comment grammar for do_count_chars 2022-02-05 11:17:10 -08:00
Thom Chiovoloni
ebbccaf6bf
Respond to review feedback, and improve implementation somewhat 2022-02-05 11:15:18 -08:00
Thom Chiovoloni
628b217326
Optimize core::str::Chars::count 2022-02-05 11:15:17 -08:00
Frank Steffahn
a957cefda6 Fix a bunch of typos 2021-12-14 16:40:43 +01:00
japm48
0d7b830139 doc: fix typo in comments
dereferencable -> dereferenceable
2021-12-12 00:27:27 +01:00
David Tolnay
4b0a9c9bc3
Delete Utf8Lossy::from_str 2021-12-08 22:54:51 -08:00
bors
94bec90702 Auto merge of #91244 - dtolnay:lossy, r=Mark-Simulacrum
Eliminate bunch of copies of error codepath from Utf8LossyChunksIter

Using a macro to stamp out 7 identical copies of the nontrivial slicing logic to exit this loop didn't seem like a necessary use of a macro. The early return case can be handled by `break` without practically any changes to the logic inside the loop.

All this code is from early 2014 (#12062—nearly 8 years ago; pre-1.0) so it's possible there were compiler limitations that forced the macro way at the time.

Confirmed that `x.py bench library/alloc --stage 0 --test-args from_utf8_lossy` is unaffected on my machine.
2021-11-30 01:08:56 +00:00
David Tolnay
c6810a569f
Clarify safety comment on using i to index into self.source 2021-11-26 12:57:36 -08:00
David Tolnay
2be9a8349f
Eliminate bunch of copies of error codepath from Utf8LossyChunksIter
Using a macro to stamp out 7 identical copies of the nontrivial slicing
logic to exit this loop didn't seem like a necessary use of a macro. The
early return case can be handled by `break` without practically any
changes to the logic inside the loop.

All this code is from early 2014 (7.5 years old, pre-1.0) so it's
possible there were compiler limitations that forced the macro way at
the time.

Confirmed that `x.py bench library/alloc --stage 0 --test-args from_utf8_lossy`
is unaffected on my machine.
2021-11-25 19:52:45 -08:00
David Tolnay
553a84c445
Saner formatting for UTF8_CHAR_WIDTH table 2021-11-25 18:18:36 -08:00
Eduardo Sánchez Muñoz
23637e20cd libcore: assume the input of next_code_point and next_code_point_reverse is UTF-8-like
The functions are now `unsafe` and they use `Option::unwrap_unchecked` instead of `unwrap_or_0`

`unwrap_or_0` was added in 42357d772b. I guess `unwrap_unchecked` was not available back then.

Given this example:

```rust
pub fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}
```

Previously, the following assembly was produced:

```asm
_ZN7example10first_char17ha056ddea6bafad1cE:
	.cfi_startproc
	test	rsi, rsi
	je	.LBB0_1
	movzx	edx, byte ptr [rdi]
	test	dl, dl
	js	.LBB0_3
	mov	eax, edx
	ret
.LBB0_1:
	mov	eax, 1114112
	ret
.LBB0_3:
	lea	r8, [rdi + rsi]
	xor	eax, eax
	mov	r9, r8
	cmp	rsi, 1
	je	.LBB0_5
	movzx	eax, byte ptr [rdi + 1]
	add	rdi, 2
	and	eax, 63
	mov	r9, rdi
.LBB0_5:
	mov	ecx, edx
	and	ecx, 31
	cmp	dl, -33
	jbe	.LBB0_6
	cmp	r9, r8
	je	.LBB0_9
	movzx	esi, byte ptr [r9]
	add	r9, 1
	and	esi, 63
	shl	eax, 6
	or	eax, esi
	cmp	dl, -16
	jb	.LBB0_12
.LBB0_13:
	cmp	r9, r8
	je	.LBB0_14
	movzx	edx, byte ptr [r9]
	and	edx, 63
	jmp	.LBB0_16
.LBB0_6:
	shl	ecx, 6
	or	eax, ecx
	ret
.LBB0_9:
	xor	esi, esi
	mov	r9, r8
	shl	eax, 6
	or	eax, esi
	cmp	dl, -16
	jae	.LBB0_13
.LBB0_12:
	shl	ecx, 12
	or	eax, ecx
	ret
.LBB0_14:
	xor	edx, edx
.LBB0_16:
	and	ecx, 7
	shl	ecx, 18
	shl	eax, 6
	or	eax, ecx
	or	eax, edx
	ret
```

After this change, the assembly is reduced to:

```asm
_ZN7example10first_char17h4318683472f884ccE:
	.cfi_startproc
	test	rsi, rsi
	je	.LBB0_1
	movzx	ecx, byte ptr [rdi]
	test	cl, cl
	js	.LBB0_3
	mov	eax, ecx
	ret
.LBB0_1:
	mov	eax, 1114112
	ret
.LBB0_3:
	mov	eax, ecx
	and	eax, 31
	movzx	esi, byte ptr [rdi + 1]
	and	esi, 63
	cmp	cl, -33
	jbe	.LBB0_4
	movzx	edx, byte ptr [rdi + 2]
	shl	esi, 6
	and	edx, 63
	or	edx, esi
	cmp	cl, -16
	jb	.LBB0_7
	movzx	ecx, byte ptr [rdi + 3]
	and	eax, 7
	shl	eax, 18
	shl	edx, 6
	and	ecx, 63
	or	ecx, edx
	or	eax, ecx
	ret
.LBB0_4:
	shl	eax, 6
	or	eax, esi
	ret
.LBB0_7:
	shl	eax, 12
	or	eax, edx
	ret
```
2021-11-21 17:05:55 +01:00
Maybe Waffle
573a00e3f9 Fill in tracking issues for const_str_from_utf8 and const_str_from_utf8_unchecked_mut features 2021-11-18 14:04:01 +03:00
Maybe Waffle
cf6f64a963 Make slice->str conversion and related functions const
This commit makes the following functions from `core::str` `const fn`:
- `from_utf8[_mut]` (`feature(const_str_from_utf8)`)
- `from_utf8_unchecked_mut` (`feature(const_str_from_utf8_unchecked_mut)`)
- `Utf8Error::{valid_up_to,error_len}` (`feature(const_str_from_utf8)`)
2021-11-18 00:50:42 +03:00
bors
c7e4740ec1 Auto merge of #86336 - camsteffen:char-array-pattern, r=joshtriplett
impl Pattern for char array

Closes #39511
Closes #86329
2021-10-31 15:45:39 +00:00
Matthias Krüger
88e5ae2dd3
Rollup merge of #89786 - jkugelman:must-use-len-and-is_empty, r=joshtriplett
Add #[must_use] to len and is_empty

Parent issue: #89692

r? `@joshtriplett`
2021-10-31 13:20:05 +01:00
Matthias Krüger
95750ae439
Rollup merge of #89897 - jkugelman:must-use-core, r=joshtriplett
Add #[must_use] to remaining core functions

I've run out of compelling reasons to group functions together across crates so I'm just going to go module-by-module. This is everything remaining from the `core` crate.

Ignored by clippy for reasons unknown:

```rust
core::alloc::Layout   unsafe fn for_value_raw<T: ?Sized>(t: *const T) -> Self;
core::any             const fn type_name_of_val<T: ?Sized>(_val: &T) -> &'static str;
```

Ignored by clippy because of `mut`:

```rust
str   fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str);
```

<del>
Ignored by clippy presumably because a caller might want `f` called for side effects. That seems like a bad usage of `map` to me.

```rust
core::cell::Ref<'b, T>   fn map<U: ?Sized, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, T>;
core::cell::Ref<'b, T>   fn map_split<U: ?Sized, V: ?Sized, F>(orig: Ref<'b, T>, f: F) -> (Ref<'b, U>, Ref<'b, V>);
```
</del>

Parent issue: #89692

r? ```@joshtriplett```
2021-10-31 09:20:26 +01:00
Matthias Krüger
a26b1d2259
Rollup merge of #89835 - jkugelman:must-use-expensive-computations, r=joshtriplett
Add #[must_use] to expensive computations

The unifying theme for this commit is weak, admittedly. I put together a list of "expensive" functions when I originally proposed this whole effort, but nobody's cared about that criterion. Still, it's a decent way to bite off a not-too-big chunk of work.

Given the grab bag nature of this commit, the messages I used vary quite a bit. I'm open to wording changes.

For some reason clippy flagged four `BTreeSet` methods but didn't say boo about equivalent ones on `HashSet`. I stared at them for a while but I can't figure out the difference so I added the `HashSet` ones in.

```rust
// Flagged by clippy.
alloc::collections::btree_set::BTreeSet<T>   fn difference<'a>(&'a self, other: &'a BTreeSet<T>) -> Difference<'a, T>;
alloc::collections::btree_set::BTreeSet<T>   fn symmetric_difference<'a>(&'a self, other: &'a BTreeSet<T>) -> SymmetricDifference<'a, T>
alloc::collections::btree_set::BTreeSet<T>   fn intersection<'a>(&'a self, other: &'a BTreeSet<T>) -> Intersection<'a, T>;
alloc::collections::btree_set::BTreeSet<T>   fn union<'a>(&'a self, other: &'a BTreeSet<T>) -> Union<'a, T>;

// Ignored by clippy, but not by me.
std::collections::HashSet<T, S>              fn difference<'a>(&'a self, other: &'a HashSet<T, S>) -> Difference<'a, T, S>;
std::collections::HashSet<T, S>              fn symmetric_difference<'a>(&'a self, other: &'a HashSet<T, S>) -> SymmetricDifference<'a, T, S>
std::collections::HashSet<T, S>              fn intersection<'a>(&'a self, other: &'a HashSet<T, S>) -> Intersection<'a, T, S>;
std::collections::HashSet<T, S>              fn union<'a>(&'a self, other: &'a HashSet<T, S>) -> Union<'a, T, S>;
```

Parent issue: #89692

r? ```@joshtriplett```
2021-10-31 09:20:24 +01:00
John Kugelman
6745e8da06 Add #[must_use] to len and is_empty 2021-10-30 19:25:12 -04:00
John Kugelman
68b0d86294 Add #[must_use] to remaining core functions 2021-10-30 18:21:29 -04:00
Matthias Krüger
86087f906d
Rollup merge of #90371 - Veykril:patch-2, r=jyn514
Fix incorrect doc link

Looks like a copy paste mistake
2021-10-30 14:36:59 +02:00
Lukas Wirth
29a4e4a009
Fix incorrect doc link 2021-10-28 11:51:00 +02:00
Pietro Albini
a5a8bb0125
replace | with || in string validation
Using short-circuiting operators makes it easier to perform some kinds
of source code analysis, like MC/DC code coverage (a requirement in
safety-critical environments). The optimized x86_64 assembly is
equivalent between the old and new versions.

Old assembly of that condition:

```
mov  rax, qword ptr [rdi + rdx + 8]
or   rax, qword ptr [rdi + rdx]
test rax, r9
je   .LBB0_7
```

New assembly of that condition:

```
mov  rax, qword ptr [rdi + rdx]
or   rax, qword ptr [rdi + rdx + 8]
test rax, r8
je   .LBB0_7
```
2021-10-27 17:00:49 +02:00
Noah Lev
865d99f82b docs: Escape brackets to satisfy the linkchecker
My change to use `Type::def_id()` (formerly `Type::def_id_full()`) in
more places caused some docs to show up that used to be missed by
rustdoc. Those docs contained unescaped square brackets, which triggered
linkcheck errors. This commit escapes the square brackets and adds this
particular instance to the linkcheck exception list.
2021-10-22 14:08:43 -07:00
Adam Skoufis
4b59b35b76
Add missing word to FromStr trait docs 2021-10-14 13:47:54 +11:00
John Kugelman
21f4677744 Add #[must_use] to expensive computations
The unifying theme for this commit is weak, admittedly. I put together a
list of "expensive" functions when I originally proposed this whole
effort, but nobody's cared about that criterion. Still, it's a decent
way to bite off a not-too-big chunk of work.

Given the grab bag nature of this commit, the messages I used vary quite
a bit.
2021-10-12 23:27:17 -04:00
the8472
b55a3c5d15
Rollup merge of #89778 - jkugelman:must-use-as_type-conversions, r=joshtriplett
Add #[must_use] to as_type conversions

Clippy missed these:

```rust
alloc::string::String   fn as_mut_str(&mut self) -> &mut str;
core::mem::NonNull<T>   unsafe fn as_uninit_mut<'a>(&mut self) -> &'a MaybeUninit<T>;
str                     unsafe fn as_bytes_mut(&mut self) -> &mut [u8];
str                     fn as_mut_ptr(&mut self) -> *mut u8;
```

Parent issue: #89692

r? ````@joshtriplett````
2021-10-12 14:53:08 +02:00
John Kugelman
06e625f7d5 Add #[must_use] to as_type conversions 2021-10-11 13:57:38 -04:00
Guillaume Gomez
96ffc74fe3
Rollup merge of #89753 - jkugelman:must-use-from_value-conversions, r=joshtriplett
Add #[must_use] to from_value conversions

I added two methods to the list myself. Clippy did not flag them because they take `mut` args, but neither modifies their argument.

```rust
core::str           const unsafe fn from_utf8_unchecked_mut(v: &mut [u8]) -> &mut str;
std::ffi::CString   unsafe fn from_raw(ptr: *mut c_char) -> CString;
```

I put a custom note on `from_raw`:

```rust
#[must_use = "call `drop(from_raw(ptr))` if you intend to drop the `CString`"]
pub unsafe fn from_raw(ptr: *mut c_char) -> CString {
```

Parent issue: #89692

r? ``@joshtriplett``
2021-10-11 14:11:45 +02:00
John Kugelman
cf2bcd10ed Add #[must_use] to from_value conversions 2021-10-10 19:00:33 -04:00
Matthias Krüger
0c04b1fc03
Rollup merge of #89718 - jkugelman:must-use-is_condition-tests, r=joshtriplett
Add #[must_use] to is_condition tests

There's nothing insightful to say about these so I didn't write any extra explanations.

Parent issue: #89692
2021-10-10 18:22:23 +02:00
bors
0c87288f92 Auto merge of #89219 - nickkuk:str_split_once_get_unchecked, r=Mark-Simulacrum
Use get_unchecked in str::[r]split_once

This PR removes indices checking in `str::split_once` and `str::rsplit_once` methods.
2021-10-10 12:29:48 +00:00
John Kugelman
475e9925a7 Add #[must_use] to is_condition tests
There's nothing insightful to say about these so I didn't write any
extra explanations.
2021-10-09 21:27:13 -04:00
Matthias Krüger
827b540424
Rollup merge of #89694 - jkugelman:must-use-string-transforms, r=joshtriplett
Add #[must_use] to string/char transformation methods

These methods could be misconstrued as modifying their arguments instead of returning new values.

Where possible I made the note recommend a method that does mutate in place.

Parent issue: #89692
2021-10-09 11:56:07 +02:00
Matthias Krüger
36db658796
Rollup merge of #88707 - sylvestre:split_example, r=yaahc
String.split_terminator: Add an example when using a slice of chars
2021-10-09 11:55:58 +02:00
John Kugelman
54d807cfc7 Add #[must_use] to string/char transformation methods
These methods could be misconstrued as modifying their arguments instead
of returning new values.

Where possible I made the note recommend a method that does mutate in
place.
2021-10-09 01:01:40 -04:00
nickkuk
a35aaa2108 Use get_unchecked in str::[r]split_once 2021-10-05 14:42:08 +05:00
The8472
5e1428e18b manually inline function 2021-09-11 12:29:34 +02:00
The8472
66195d8bc4 optimization continuation byte validation of strings containing multibyte chars
```
old, -O2, x86-64
test str::str_validate_emoji                                    ... bench:       4,606 ns/iter (+/- 64)

new, -O2, x86-64
test str::str_validate_emoji                                    ... bench:       3,837 ns/iter (+/- 60)
```
2021-09-11 00:25:41 +02:00
The8472
b6278664af optimize utf8_is_cont_byte() to speed up str.chars().count()
it shows consistent improvements across several x86_64 feature levels

```
old, -O2, x86-64
test str::str_char_count_emoji                                  ... bench:       1,924 ns/iter (+/- 26)
test str::str_char_count_lorem                                  ... bench:         879 ns/iter (+/- 12)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64
test str::str_char_count_emoji                                  ... bench:       1,878 ns/iter (+/- 21)
test str::str_char_count_lorem                                  ... bench:         851 ns/iter (+/- 11)
test str::str_char_count_lorem_short                            ... bench:           4 ns/iter (+/- 0)

old, -O2, x86-64-v2
test str::str_char_count_emoji                                  ... bench:       1,477 ns/iter (+/- 46)
test str::str_char_count_lorem                                  ... bench:         675 ns/iter (+/- 15)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64-v2
test str::str_char_count_emoji                                  ... bench:       1,323 ns/iter (+/- 39)
test str::str_char_count_lorem                                  ... bench:         593 ns/iter (+/- 18)
test str::str_char_count_lorem_short                            ... bench:           4 ns/iter (+/- 0)

old, -O2, x86-64-v3
test str::str_char_count_emoji                                  ... bench:         748 ns/iter (+/- 7)
test str::str_char_count_lorem                                  ... bench:         348 ns/iter (+/- 2)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64-v3
test str::str_char_count_emoji                                  ... bench:         650 ns/iter (+/- 4)
test str::str_char_count_lorem                                  ... bench:         301 ns/iter (+/- 1)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)
```
2021-09-11 00:25:41 +02:00
Mark Rousskov
b4e7649d6d Bump stage0 compiler to 1.56 2021-09-08 20:51:05 -04:00
Sylvestre Ledru
d4031d092d String.split_terminator: Add an example when using a slice of chars 2021-09-06 23:25:38 +02:00
bors
cc9bb1522e Auto merge of #83342 - Count-Count:win-console-incomplete-utf8, r=m-ou-se
Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr

# Problem
Writes of just an incomplete UTF-8 byte sequence (e.g. `b"\xC3"` or `b"\xF0\x9F"`)  to stdout/stderr with a Windows console attached error with `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` even though further writes could complete the codepoint. This is currently a rare occurence since the [linewritershim](2c56ea38b0/library/std/src/io/buffered/linewritershim.rs) implementation flushes complete lines immediately and buffers up to 1024 bytes for incomplete lines. It can still happen as described in #83258.

The problem will become more pronounced once the developer can switch stdout/stderr from line-buffered to block-buffered or immediate when the changes in the "Switchable buffering for Stdout" pull request (#78515) get merged.

# Patch description
If there is at least one valid UTF-8 codepoint all valid UTF-8 is passed through to the extracted `write_valid_utf8_to_console()` fn. The new code only comes into play if `write()` is being passed a short byte slice comprising an incomplete UTF-8 codepoint. In this case up to three bytes are buffered in the `IncompleteUtf8` struct associated with `Stdout` / `Stderr`. The bytes are accepted one at a time. As soon as an error can be detected `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` is returned. Once a complete UTF-8 codepoint is received it is passed to the `write_valid_utf8_to_console()` and the buffer length is set to zero.

Calling `flush()` will neither error nor write anything if an incomplete codepoint is present in the buffer.

# Tests
Currently there are no Windows-specific tests for console writing code at all. Writing (regression) tests for this problem is a bit challenging since unit tests and UI tests don't run in a console and suddenly popping up another console window might be surprising to developers running the testsuite and it might not work at all in CI builds. To just test the new functionality in unit tests the code would need to be refactored. Some guidance on how to proceed would be appreciated.

# Public API changes
* `std::str::verifications::utf8_char_width()` would be exposed as `std::str::utf8_char_width()` behind the "str_internals" feature gate.

# Related issues
* Fixes #83258.
* PR #78515 will exacerbate the problem.

# Open questions
* Add tests?
* Squash into one commit with better commit message?
2021-09-02 03:31:17 +00:00
Deadbeef
b5afa6807b
Constified Default implementations
The libs-api team agrees to allow const_trait_impl to appear in the
standard library as long as stable code cannot be broken (they are
properly gated) this means if the compiler teams thinks it's okay, then
it's okay.

My priority on constifying would be:

	1. Non-generic impls (e.g. Default) or generic impls with no
	   bounds
	2. Generic functions with bounds (that use const impls)
	3. Generic impls with bounds
	4. Impls for traits with associated types

For people opening constification PRs: please cc me and/or oli-obk.
2021-08-17 07:15:54 +00:00
Ali Malik
e43254aad1 Fix may not to appropriate might not or must not 2021-07-29 01:15:20 -04:00
Cameron Steffen
28f7890a29 Add char array without ref Pattern impl 2021-07-28 16:13:46 -05:00
Cameron Steffen
5e4f2128e2 impl Pattern for char array 2021-07-28 16:13:45 -05:00
Frank Steffahn
69dd992f95 Add TrustedRandomAccessNoCoerce supertrait without requirements or guarantees about subtype coercions
Update all the TrustedRandomAccess impls to also implement the new supertrait
2021-07-28 14:33:35 +02:00
Jacob Pratt
36f02f3523
Stabilize const_fn_transmute 2021-07-27 16:03:09 -04:00
Alexis Bourget
101a146db9 Fix #85462 by adding a marker flag
This will not affect ABI since the other variant of the enum is bigger.
It may break some code, but that would be very strange: usually people
don't continue after the first `Done` (or `None` for a normal iterator).
2021-07-11 17:45:12 +02:00