Commit graph

498 commits

Author SHA1 Message Date
Broono Lu
16b7fd2272 Fix src/libcore/str/mod.rs doc comments 2019-12-21 18:12:46 +08:00
Mark Rousskov
82184440ec Propagate cfg bootstrap 2019-12-18 12:16:19 -05:00
Mazdak Farrokhzad
a6f817f429
Rollup merge of #67249 - ranma42:improve-starts-with-literal-char, r=BurntSushi
Improve code generated for `starts_with(<literal char>)`

This PR includes two minor improvements to the code generated when checking for string prefix/suffix.

The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation.

The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method.

The resulting code should be equivalent in the generic case (one char is being encoded versus one char being decoded), but it becomes easy to optimize in the case of a literal char, which in most cases a developer might expect to be at least as simple as that of a literal string.

This PR should fix #41993
2019-12-16 17:33:01 +01:00
Mazdak Farrokhzad
1c12dc8cdf
Rollup merge of #66735 - SOF3:feature/str_strip, r=KodrAus
Add str::strip_prefix and str::strip_suffix

Introduces a counterpart for `Path::strip_prefix` on `str`.

This was also discussed in https://internals.rust-lang.org/t/pre-pr-path-strip-prefix-counterpart-in-str/11364/.
2019-12-16 05:23:33 +01:00
SOFe
6176051dd0
Set tracking issue for str_strip 2019-12-15 17:07:57 +08:00
Oliver Scherer
5e17e39881 Require stable/unstable annotations for the constness of all stable functions with a const modifier 2019-12-13 11:27:02 +01:00
Andrea Canciani
de7fefa04c Minor cleanup in Pattern::{is_prefix_of,is_suffix_of} for char 2019-12-12 21:09:17 +01:00
Andrea Canciani
1f6d0234db Prefer encoding the char when checking for string prefix/suffix
This enables constant folding when matching a literal char.

Fixes #41993.
2019-12-12 03:33:46 +01:00
Andrea Canciani
cc863f02a3 Improve str prefix/suffix comparison
The comparison can be performed on the raw bytes, as the chars can
only match if their UTF8 encoding matches.

This avoids the `is_char_boundary` checks and translates to a straight
`u8` slice comparison which is optimized to a memcmp or inline
comparison where appropriate.
2019-12-11 21:24:35 +01:00
Andre Bogus
7f87dd1bc0 Some small readability improvements 2019-12-11 20:49:46 +01:00
Sen Jiang
52649ddfbd
Fix documentation of pattern for str::matches()
Made it the same as rmatches()
2019-12-03 14:31:41 -08:00
SOFe
4718e20fcf
Fixed formatting issues 2019-11-26 19:33:06 +08:00
SOFe
2e2e0dfc1a
Improved comments to clarify sasumptions in str::strip_prefix 2019-11-26 17:42:43 +08:00
SOFe
9badc33cda
Add str::strip_prefix and str::strip_suffix 2019-11-25 19:36:47 +08:00
Oliver Scherer
02f9167f94 Have tidy ensure that we document all unsafe blocks in libcore 2019-11-06 11:04:42 +01:00
Jeff Dickey
d9ec5fa88c doc(str): show example of chars().count() under len()
the docs are great at explaining that .len() isn't like in other
languages but stops short of explaining how to get the character length.

r? @steveklabnik
2019-11-01 20:18:33 -07:00
Mikko Rantanen
040d88dda1
Remove leading :: from paths in doc examples 2019-10-20 21:13:47 +03:00
Mark Rousskov
d0a6805b0e Allow unused attributes to avoid incremental bug 2019-10-04 11:11:58 -04:00
Mark Rousskov
f359a94849 Snap cfgs to new beta 2019-09-25 08:42:46 -04:00
Oliver Scherer
7767e7fb16 Stabilize str::len, [T]::len, is_empty and str::as_bytes as const fn 2019-09-24 12:56:44 +02:00
Julian Gehring
c4d0c285fe Fix word repetition in str documentation
Fixes a few repetitions of "like like" in the `trim*` methods documentation of `str`.
2019-08-31 17:38:23 +01:00
Dodo
080fdb8184 add missing #[repr(C)] on a union 2019-08-28 17:38:24 +02:00
Ilija Tovilo
3a6a29b4ec
Use associated_type_bounds where applicable - closes #61738 2019-08-08 22:39:15 +02:00
Maximilian Roos
3325ff6df4
comments from @lzutao 2019-07-29 12:26:59 -04:00
Maximilian Roos
624c5da1aa
impl Debug for Chars 2019-07-29 12:17:59 -04:00
Mateusz Mikuła
124f6ef7cd Fix clippy::len_zero warnings 2019-07-18 15:14:56 +02:00
bors
e34d4ae869 Auto merge of #61339 - jridgewell:pointer-alignment, r=BurntSushi
Optimize pointer alignment in utf8 validation

This uses (and reuses) the u8 arrays's inherent block alignment when checking whether the current index is block aligned.

I initially thought that this would just move the expensive `align_offset` call out of the while loop and replace it with a subtraction and bitwise AND. But it appears this optimizes much better, too...

before: https://rust.godbolt.org/z/WIPvWl
after: https://rust.godbolt.org/z/-jBPoW

## Benchmarks

https://github.com/jridgewell/faster-from_utf8/tree/pointer-alignment

```
test from_utf8_2_bytes_fast      ... bench:         310 ns/iter (+/- 42) = 1290 MB/s
test from_utf8_2_bytes_regular   ... bench:         309 ns/iter (+/- 24) = 1294 MB/s

test from_utf8_3_bytes_fast      ... bench:       1,027 ns/iter (+/- 62) = 1168 MB/s
test from_utf8_3_bytes_regular   ... bench:       1,513 ns/iter (+/- 611) = 793 MB/s

test from_utf8_4_bytes_fast      ... bench:       1,788 ns/iter (+/- 26) = 1342 MB/s
test from_utf8_4_bytes_regular   ... bench:       1,907 ns/iter (+/- 181) = 1258 MB/s

test from_utf8_all_bytes_fast    ... bench:       3,463 ns/iter (+/- 97) = 1155 MB/s
test from_utf8_all_bytes_regular ... bench:       4,083 ns/iter (+/- 89) = 979 MB/s

test from_utf8_ascii_fast        ... bench:          88 ns/iter (+/- 4) = 28988 MB/s
test from_utf8_ascii_regular     ... bench:          88 ns/iter (+/- 8) = 28988 MB/s

test from_utf8_cyr_fast          ... bench:       7,707 ns/iter (+/- 531) = 665 MB/s
test from_utf8_cyr_regular       ... bench:       8,202 ns/iter (+/- 135) = 625 MB/s

test from_utf8_enwik8_fast       ... bench:   1,135,756 ns/iter (+/- 84,450) = 8804 MB/s
test from_utf8_enwik8_regular    ... bench:   1,145,468 ns/iter (+/- 79,601) = 8730 MB/s

test from_utf8_jawik10_fast      ... bench:  12,723,844 ns/iter (+/- 473,247) = 785 MB/s
test from_utf8_jawik10_regular   ... bench:  13,384,596 ns/iter (+/- 666,997) = 747 MB/s

test from_utf8_mixed_fast        ... bench:       2,321 ns/iter (+/- 123) = 2081 MB/s
test from_utf8_mixed_regular     ... bench:       2,702 ns/iter (+/- 408) = 1788 MB/s

test from_utf8_mostlyasc_fast    ... bench:         249 ns/iter (+/- 10) = 14666 MB/s
test from_utf8_mostlyasc_regular ... bench:         276 ns/iter (+/- 5) = 13231 MB/s
```
2019-07-17 12:13:36 +00:00
Justin Ridgewell
3b6e8ed502 Do not use pointer alignment on unsupported platforms 2019-07-05 16:39:56 -04:00
Mazdak Farrokhzad
4049a3cf1b
Rollup merge of #62316 - khuey:efficient_last, r=sfackler
When possible without changing semantics, implement Iterator::last in terms of DoubleEndedIterator::next_back for types in liballoc and libcore.

Provided that the iterator has finite length and does not trigger user-provided code, this is safe.

What follows is a full list of the DoubleEndedIterators in liballoc/libcore and whether this optimization is safe, and if not, why not.

src/liballoc/boxed.rs
Box: Pass through to avoid defeating optimization of the underlying DoubleIterator implementation. This has no correctness impact.

src/liballoc/collections/binary_heap.rs
Iter: Pass through to avoid defeating optimizations on slice::Iter
IntoIter: Not safe, changes Drop order
Drain: Not safe, changes Drop order

src/liballoc/collections/btree/map.rs
Iter: Safe to call next_back, invokes no user defined code.
IterMut: ditto
IntoIter: Not safe, changes Drop order
Keys: Safe to call next_back, invokes no user defined code.
Values: ditto
ValuesMut: ditto
Range: ditto
RangeMut: ditto

src/liballoc/collections/btree/set.rs
Iter: Safe to call next_back, invokes no user defined code.
IntoIter: Not safe, changes Drop order
Range: Safe to call next_back, invokes no user defined code.

src/liballoc/collections/linked_list.rs
Iter: Safe to call next_back, invokes no user defined code.
IterMut: ditto
IntoIter: Not safe, changes Drop order

src/liballoc/collections/vec_deque.rs
Iter: Safe to call next_back, invokes no user defined code.
IterMut: ditto
IntoIter: Not safe, changes Drop order
Drain: ditto

src/liballoc/string.rs
Drain: Safe because return type is a primitive (char)

src/liballoc/vec.rs
IntoIter: Not safe, changes Drop order
Drain: ditto
Splice: ditto

src/libcore/ascii.rs
EscapeDefault: Safe because return type is a primitive (u8)

src/libcore/iter/adapters/chain.rs
Chain: Not safe, invokes user defined code (Iterator impl)

src/libcore/iter/adapters/flatten.rs
FlatMap: Not safe, invokes user defined code (Iterator impl)
Flatten: ditto
FlattenCompat: ditto

src/libcore/iter/adapters/mod.rs
Rev: Not safe, invokes user defined code (Iterator impl)
Copied: ditto
Cloned: Not safe, invokes user defined code (Iterator impl and T::clone)
Map: Not safe, invokes user defined code (Iterator impl + closure)
Filter: ditto
FilterMap: ditto
Enumerate: Not safe, invokes user defined code (Iterator impl)
Skip: ditto
Fuse: ditto
Inspect: ditto

src/libcore/iter/adapters/zip.rs
Zip: Not safe, invokes user defined code (Iterator impl)

src/libcore/iter/range.rs
ops::Range: Not safe, changes Drop order, but ALREADY HAS SPECIALIZATION
ops::RangeInclusive: ditto

src/libcore/iter/sources.rs
Repeat: Not safe, calling last should iloop.
Empty: No point, iterator is at most one item long.
Once: ditto
OnceWith: ditto

src/libcore/option.rs
Item: No point, iterator is at most one item long.
Iter: ditto
IterMut: ditto
IntoIter: ditto

src/libcore/result.rs
Iter: No point, iterator is at most one item long
IterMut: ditto
IntoIter: ditto

src/libcore/slice/mod.rs
Split: Not safe, invokes user defined closure
SplitMut: ditto
RSplit: ditto
RSplitMut: ditto
Windows: Safe, already has specialization
Chunks: ditto
ChunksMut: ditto
ChunksExact: ditto
ChunksExactMut: ditto
RChunks: ditto
RChunksMut: ditto
RChunksExact: ditto
RChunksExactMut: ditto

src/libcore/str/mod.rs
Chars: Safe, already has specialization
CharIndices: ditto
Bytes: ditto
Lines: Safe to call next_back, invokes no user defined code.
LinesAny: Deprecated
Everything that is generic over P: Pattern: Not safe because Pattern invokes user defined code.
SplitWhitespace: Safe to call next_back, invokes no user defined code.
SplitAsciiWhitespace: ditto

This is attempt 2 of #60130.

r? @sfackler
2019-07-04 01:38:56 +02:00
Kohei Takahashi
e5ede80a9a
Fixed document bug, those replaced each other
Introduced by #58005
2019-07-03 15:59:40 +09:00
Kyle Huey
db16e17212 When possible without changing semantics, implement Iterator::last in terms of DoubleEndedIterator::next_back for types in liballoc and libcore.
Provided that the iterator has finite length and does not trigger user-provided code, this is safe.

What follows is a full list of the DoubleEndedIterators in liballoc/libcore and whether this optimization is safe, and if not, why not.

src/liballoc/boxed.rs
Box: Pass through to avoid defeating optimization of the underlying DoubleIterator implementation. This has no correctness impact.

src/liballoc/collections/binary_heap.rs
Iter: Pass through to avoid defeating optimizations on slice::Iter
IntoIter: Not safe, changes Drop order
Drain: Not safe, changes Drop order

src/liballoc/collections/btree/map.rs
Iter: Safe to call next_back, invokes no user defined code.
IterMut: ditto
IntoIter: Not safe, changes Drop order
Keys: Safe to call next_back, invokes no user defined code.
Values: ditto
ValuesMut: ditto
Range: ditto
RangeMut: ditto

src/liballoc/collections/btree/set.rs
Iter: Safe to call next_back, invokes no user defined code.
IntoIter: Not safe, changes Drop order
Range: Safe to call next_back, invokes no user defined code.

src/liballoc/collections/linked_list.rs
Iter: Safe to call next_back, invokes no user defined code.
IterMut: ditto
IntoIter: Not safe, changes Drop order

src/liballoc/collections/vec_deque.rs
Iter: Safe to call next_back, invokes no user defined code.
IterMut: ditto
IntoIter: Not safe, changes Drop order
Drain: ditto

src/liballoc/string.rs
Drain: Safe because return type is a primitive (char)

src/liballoc/vec.rs
IntoIter: Not safe, changes Drop order
Drain: ditto
Splice: ditto

src/libcore/ascii.rs
EscapeDefault: Safe because return type is a primitive (u8)

src/libcore/iter/adapters/chain.rs
Chain: Not safe, invokes user defined code (Iterator impl)

src/libcore/iter/adapters/flatten.rs
FlatMap: Not safe, invokes user defined code (Iterator impl)
Flatten: ditto
FlattenCompat: ditto

src/libcore/iter/adapters/mod.rs
Rev: Not safe, invokes user defined code (Iterator impl)
Copied: ditto
Cloned: Not safe, invokes user defined code (Iterator impl and T::clone)
Map: Not safe, invokes user defined code (Iterator impl + closure)
Filter: ditto
FilterMap: ditto
Enumerate: Not safe, invokes user defined code (Iterator impl)
Skip: ditto
Fuse: ditto
Inspect: ditto

src/libcore/iter/adapters/zip.rs
Zip: Not safe, invokes user defined code (Iterator impl)

src/libcore/iter/range.rs
ops::Range: Not safe, changes Drop order, but ALREADY HAS SPECIALIZATION
ops::RangeInclusive: ditto

src/libcore/iter/sources.rs
Repeat: Not safe, calling last should iloop.
Empty: No point, iterator is at most one item long.
Once: ditto
OnceWith: ditto

src/libcore/option.rs
Item: No point, iterator is at most one item long.
Iter: ditto
IterMut: ditto
IntoIter: ditto

src/libcore/result.rs
Iter: No point, iterator is at most one item long
IterMut: ditto
IntoIter: ditto

src/libcore/slice/mod.rs
Split: Not safe, invokes user defined closure
SplitMut: ditto
RSplit: ditto
RSplitMut: ditto
Windows: Safe, already has specialization
Chunks: ditto
ChunksMut: ditto
ChunksExact: ditto
ChunksExactMut: ditto
RChunks: ditto
RChunksMut: ditto
RChunksExact: ditto
RChunksExactMut: ditto

src/libcore/str/mod.rs
Chars: Safe, already has specialization
CharIndices: ditto
Bytes: ditto
Lines: Safe to call next_back, invokes no user defined code.
LinesAny: Deprecated
Everything that is generic over P: Pattern: Not safe because Pattern invokes user defined code.
SplitWhitespace: Safe to call next_back, invokes no user defined code.
SplitAsciiWhitespace: ditto
2019-07-02 13:45:29 -07:00
Ralf Jung
7b97cf9431 make sure to_ascii_lowercase actually leaves upper-case non-ASCII characters alone 2019-06-10 12:42:43 +02:00
Napen123
1b6b7597ed Add examples for make_ascii_{uppercase, lowercase} 2019-06-08 18:28:29 -06:00
Justin Ridgewell
3d2c4ff456 Optimize pointer alignment in utf8 validation
This uses (and reuses) the u8 arrays's inherent block alignment when checking whether the current index is block aligned.

I initially thought that this would just move the expensive `align_offset` call out of the while loop and replace it with a subtraction and bitwise AND. But it appears this optimizes much better, too...

before: https://rust.godbolt.org/z/WIPvWl
after: https://rust.godbolt.org/z/-jBPoW

https://github.com/jridgewell/faster-from_utf8/tree/pointer-alignment

```
test from_utf8_2_bytes_fast      ... bench:         310 ns/iter (+/- 42) = 1290 MB/s
test from_utf8_2_bytes_regular   ... bench:         309 ns/iter (+/- 24) = 1294 MB/s

test from_utf8_3_bytes_fast      ... bench:       1,027 ns/iter (+/- 62) = 1168 MB/s
test from_utf8_3_bytes_regular   ... bench:       1,513 ns/iter (+/- 611) = 793 MB/s

test from_utf8_4_bytes_fast      ... bench:       1,788 ns/iter (+/- 26) = 1342 MB/s
test from_utf8_4_bytes_regular   ... bench:       1,907 ns/iter (+/- 181) = 1258 MB/s

test from_utf8_all_bytes_fast    ... bench:       3,463 ns/iter (+/- 97) = 1155 MB/s
test from_utf8_all_bytes_regular ... bench:       4,083 ns/iter (+/- 89) = 979 MB/s

test from_utf8_ascii_fast        ... bench:          88 ns/iter (+/- 4) = 28988 MB/s
test from_utf8_ascii_regular     ... bench:          88 ns/iter (+/- 8) = 28988 MB/s

test from_utf8_cyr_fast          ... bench:       7,707 ns/iter (+/- 531) = 665 MB/s
test from_utf8_cyr_regular       ... bench:       8,202 ns/iter (+/- 135) = 625 MB/s

test from_utf8_enwik8_fast       ... bench:   1,135,756 ns/iter (+/- 84,450) = 8804 MB/s
test from_utf8_enwik8_regular    ... bench:   1,145,468 ns/iter (+/- 79,601) = 8730 MB/s

test from_utf8_jawik10_fast      ... bench:  12,723,844 ns/iter (+/- 473,247) = 785 MB/s
test from_utf8_jawik10_regular   ... bench:  13,384,596 ns/iter (+/- 666,997) = 747 MB/s

test from_utf8_mixed_fast        ... bench:       2,321 ns/iter (+/- 123) = 2081 MB/s
test from_utf8_mixed_regular     ... bench:       2,702 ns/iter (+/- 408) = 1788 MB/s

test from_utf8_mostlyasc_fast    ... bench:         249 ns/iter (+/- 10) = 14666 MB/s
test from_utf8_mostlyasc_regular ... bench:         276 ns/iter (+/- 5) = 13231 MB/s
```
2019-05-29 23:52:05 -04:00
Steven Fackler
8a22bc3b30 Revert "Add implementations of last in terms of next_back on a bunch of DoubleEndedIterators."
This reverts commit 3e86cf36b5.
2019-05-22 14:09:34 -07:00
Mazdak Farrokhzad
088c99410b
Rollup merge of #60443 - RalfJung:as_ptr, r=SimonSapin
as_ptr returns a read-only pointer

Add comments to `as_ptr` methods to warn that these are read-only pointers, and writing to them is UB.

[It was pointed out](https://internals.rust-lang.org/t/as-ptr-vs-as-mut-ptr/9940) that `CStr` does not even have an `as_mut_ptr`. I originally was going to add one, but there is no method at all that would mutate a `CStr`. Was that a deliberate choice or should I add an `as_mut_ptr` (similar to [what I did for `str`](https://github.com/rust-lang/rust/pull/58200))?
2019-05-14 22:00:11 +02:00
Mazdak Farrokhzad
bab03cecfe
Rollup merge of #60130 - khuey:efficient_last, r=sfackler
Add implementations of last in terms of next_back on a bunch of DoubleEndedIterators

Provided a `DoubleEndedIterator` has finite length, `Iterator::last` is equivalent to `DoubleEndedIterator::next_back`. But searching forwards through the iterator when it's unnecessary is obviously not good for performance. I ran into this on one of the collection iterators.

I tried adding appropriate overloads for a bunch of the iterator adapters like filter, map, etc, but I ran into a lot of type inference failures after doing so.

The other interesting case is what to do with `Repeat`. Do we consider it part of the contract that `Iterator::last` will loop forever on it? The docs do say that the iterator will be evaluated until it returns None. This is also relevant for the adapters, it's trivially easy to observe whether a `Map` adapter invoked its closure a zillion times or just once for the last element.
2019-05-14 22:00:09 +02:00
Ralf Jung
30cf0e4251 clarify wording 2019-05-02 13:36:30 +02:00
Ralf Jung
1e47250540 as_ptr returns a read-only pointer 2019-05-01 17:59:48 +02:00
Mazdak Farrokhzad
c9b70144a3
Rollup merge of #60356 - JohnTitor:stabilize-as-mut-ptr, r=Centril
Stabilize str::as_mut_ptr

Closes #58215
2019-04-29 22:22:42 +02:00
Mazdak Farrokhzad
eb3c53071a
Rollup merge of #59946 - mernen:patch-2, r=ehuss
Fix equivalent string in escape_default docs

This newline should be escaped.
2019-04-29 22:22:34 +02:00
Yuki OKUSHI
4c0f01cc27 Stabilize str::as_mut_ptr 2019-04-29 02:33:50 +09:00
varkor
aa388f1d11 ignore-tidy-filelength on all files with greater than 3000 lines 2019-04-25 21:39:09 +01:00
Kyle Huey
3e86cf36b5 Add implementations of last in terms of next_back on a bunch of DoubleEndedIterators.
r?Manishearth
2019-04-19 21:52:43 -07:00
Mazdak Farrokhzad
3ad9fcccbb
Rollup merge of #60098 - Centril:libcore-deny-more, r=varkor
libcore: deny `elided_lifetimes_in_paths`

r? @varkor
2019-04-19 06:03:30 +02:00
Mazdak Farrokhzad
291e44b381
Rollup merge of #60023 - koalatux:nth-back, r=scottmcm
implement specialized nth_back() for Bytes, Fuse and Enumerate

Hi,

After my first PR has been successfully merged, here is my second pull request :-)

Also this PR contains some specializations for the problem discussed in #54054.
2019-04-19 06:03:12 +02:00
Mazdak Farrokhzad
dbfbadeac4 libcore: deny more... 2019-04-19 01:37:12 +02:00
Taiki Endo
360432f1e8 libcore => 2018 2019-04-18 14:47:35 +09:00
Adrian Friedli
cc2689a253
implement nth_back for Bytes 2019-04-16 20:41:23 +02:00
Daniel Luz
e2613f521a
Fix equivalent string in escape_default 2019-04-13 16:44:21 -03:00