Commit graph

185 commits

Author SHA1 Message Date
bors
86bd45979a Auto merge of #130223 - LaihoE:faster_str_replace, r=thomcc
optimize str.replace

Adds a fast path for str.replace for the ascii to ascii case. This allows for autovectorizing the code. Also should this instead be done with specialization? This way we could remove one branch. I think it is the kind of branch that is easy to predict though.

Benchmark for the fast path (replace all "a" with "b" in the rust wikipedia article, using criterion) :
| N        | Speedup | Time New (ns) | Time Old (ns) |
|----------|---------|---------------|---------------|
| 2        | 2.03    | 13.567        | 27.576        |
| 8        | 1.73    | 17.478        | 30.259        |
| 11       | 2.46    | 18.296        | 45.055        |
| 16       | 2.71    | 17.181        | 46.526        |
| 37       | 4.43    | 18.526        | 81.997        |
| 64       | 8.54    | 18.670        | 159.470       |
| 200      | 9.82    | 29.634        | 291.010       |
| 2000     | 24.34   | 81.114        | 1974.300      |
| 20000    | 30.61   | 598.520       | 18318.000     |
| 1000000  | 29.31   | 33458.000     | 980540.000    |
2024-10-17 16:20:02 +00:00
Stuart Cook
dd4f062b07
Rollup merge of #128399 - mammothbane:master, r=Amanieu,tgross35
liballoc: introduce String, Vec const-slicing

This change `const`-qualifies many methods on `Vec` and `String`, notably `as_slice`, `as_str`, `len`. These changes are made behind the unstable feature flag `const_vec_string_slice`.

## Motivation
This is to support simultaneous variance over ownership and constness. I have an enum type that may contain either `String` or `&str`, and I want to produce a `&str` from it in a possibly-`const` context.

```rust
enum StrOrString<'s> {
    Str(&'s str),
    String(String),
}

impl<'s> StrOrString<'s> {
    const fn as_str(&self) -> &str {
        match self {
             // In a const-context, I really only expect to see this variant, but I can't switch the implementation
             // in some mode like #[cfg(const)] -- there has to be a single body
             Self::Str(s) => s,

             // so this is a problem, since it's not `const`
             Self::String(s) => s.as_str(),
        }
    }
}
```

Currently `String` and `Vec` don't support this, but can without functional changes. Similar logic applies for `len`, `capacity`, `is_empty`.

## Changes

The essential thing enabling this change is that `Unique::as_ptr` is `const`. This lets us convert `RawVec::ptr` -> `Vec::as_ptr` -> `Vec::as_slice` -> `String::as_str`.

I had to move the `Deref` implementations into `as_{str,slice}` because `Deref` isn't `#[const_trait]`, but I would expect this change to be invisible up to inlining. I moved the `DerefMut` implementations as well for uniformity.
2024-10-07 15:37:06 +11:00
Nathan Perry
d793766a61 liballoc: introduce String, Vec const-slicing
This change `const`-qualifies many methods on Vec and String, notably
`as_slice`, `as_str`, `len`. These changes are made behind the unstable
feature flag `const_vec_string_slice` with the following tracking issue:

https://github.com/rust-lang/rust/issues/129041
2024-10-06 19:58:35 -04:00
Laiho
4484085b18 Add fast path for ascii to ascii in str::replace 2024-09-23 19:24:06 +03:00
Michael Goulet
c682aa162b Reformat using the new identifier sorting from rustfmt 2024-09-22 19:11:29 -04:00
Michael Goulet
493852ccd6
Rollup merge of #130408 - okaneco:into_lossy_refactor, r=Noratrieb
Avoid re-validating UTF-8 in `FromUtf8Error::into_utf8_lossy`

Part of the unstable feature `string_from_utf8_lossy_owned` - #129436

Refactor `FromUtf8Error::into_utf8_lossy` to copy valid UTF-8 bytes into the buffer, avoiding double validation of bytes.
Add tests that mirror the `String::from_utf8_lossy` tests.
2024-09-21 15:18:56 -04:00
okaneco
b94c5a169b Avoid re-validating UTF-8 in FromUtf8Error::into_utf8_lossy
Refactor `into_utf8_lossy` to copy valid UTF-8 bytes into the buffer,
avoiding double validation of bytes.
Add tests that mirror the `String::from_utf8_lossy` tests
2024-09-20 20:56:07 -04:00
GnomedDev
5f85f73f63
[Clippy] Swap unnecessary_owned_empty_strings to use diagnostic item instead of path 2024-09-19 13:13:43 +01:00
GnomedDev
89532c0f30
[Clippy] Swap unnecessary_to_owned to use diagnostic item instead of path 2024-09-19 13:13:42 +01:00
GnomedDev
28f4c8293a
[Clippy] Swap single_char_add_str/format_push_string to use diagnostic items instead of paths 2024-09-19 13:13:20 +01:00
GnomedDev
5e4716888a
[Clippy] Swap option_as_ref_deref to use diagnostic items instead of paths 2024-09-19 13:13:19 +01:00
Matthias Krüger
df3cf91b63
Rollup merge of #129439 - okaneco:vec_string_lossy, r=Noratrieb
Implement feature `string_from_utf8_lossy_owned` for lossy conversion from `Vec<u8>` to `String` methods

Accepted ACP: https://github.com/rust-lang/libs-team/issues/116
Tracking issue: #129436

Implement feature for lossily converting from `Vec<u8>` to `String`
- Add `String::from_utf8_lossy_owned`
- Add `FromUtf8Error::into_utf8_lossy`

---
Related to #64727, but unsure whether to mark it "fixed" by this PR.
That issue partly asks for in-place replacement of the original allocation. We fulfill the other half of that request with these functions.

closes #64727
2024-09-15 16:01:36 +02:00
Urgau
843708a32e Add missing #[allow(missing_docs)] on hack functions in alloc 2024-09-09 13:44:09 +02:00
Pavel Grigorenko
f7b0b22137 Fix elided_named_lifetimes in code 2024-08-31 15:35:41 +03:00
okaneco
65abcc2bcc Implement feature string_from_utf8_lossy_owned
Implement feature for lossily converting from `Vec<u8>` to `String`
- Add `String::from_utf8_lossy_owned`
- Add `FromUtf8Error::into_utf8_lossy`
2024-08-22 23:37:52 -04:00
Michael Howell
3312f5d652 alloc: make to_string_str! a bit less complex 2024-08-07 10:06:54 -07:00
Michael Howell
1b587a6e76 alloc: add ToString specialization for &&str
Fixes #128690
2024-08-06 14:37:33 -07:00
Matthias Krüger
1f700139f8
Rollup merge of #127586 - zachs18:more-must-use, r=cuviper
Add `#[must_use]` to some `into_raw*` functions.

cc #121287

r? ``@cuviper``

Adds `#[must_use = "losing the pointer will leak memory"]`[^1] to `Box::into_raw(_with_allocator)`, `Vec::into_raw_parts(_with_alloc)`, `String::into_raw_parts`[^2], and `rc::{Rc, Weak}::into_raw_with_allocator` (Rc's normal `into_raw` and all of `Arc`'s `into_raw*`s are already `must_use`).

Adds `#[must_use = "losing the raw <resource name may leak resources"]` to `IntoRawFd::into_raw_fd`, `IntoRawSocket::into_raw_socket`, and `IntoRawHandle::into_raw_handle`.

[^1]: "*will* leak memory" may be too-strong wording (since `Box`/`Vec`/`String`/`rc::Weak` might not have a backing allocation), but I left it as-is for simplicity and consistency.

[^2]: `String::into_raw_parts`'s `must_use` message is changed from the previous (possibly misleading) "`self` will be dropped if the result is not used".
2024-08-03 11:17:42 +02:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Benoît du Garreau
772315de7c Remove generic lifetime parameter of trait Pattern
Use a GAT for `Searcher` associated type because this trait is always
implemented for every lifetime anyway.
2024-07-15 12:12:44 +02:00
Zachary S
0d49862998 Clarify/add must_use messages for more into_raw* functions of alloc types. 2024-07-10 13:05:03 -05:00
Zachary S
e0ed696d2f Mitigate focused memory leaks in alloc doctests for Miri.
If/when `-Zmiri-disable-leak-check` is able to be used at test-granularity, it should applied to these tests instead of unleaking.
2024-07-06 22:35:19 -05:00
bors
bfa3635df9 Auto merge of #99969 - calebsander:feature/collect-box-str, r=dtolnay
alloc: implement FromIterator for Box<str>

`Box<[T]>` implements `FromIterator<T>` using `Vec<T>` + `into_boxed_slice()`.
Add analogous `FromIterator` implementations for `Box<str>`
matching the current implementations for `String`.
Remove the `Global` allocator requirement for `FromIterator<Box<str>>` too.

ACP: https://github.com/rust-lang/libs-team/issues/196
2024-05-19 02:13:06 +00:00
Jon Gjengset
0beba9699c Clarify how String::leak and into_boxed_str differ 2024-05-18 19:17:43 +02:00
Caleb Sander
c92c228260 alloc: implement FromIterator for Box<str>
Box<[T]> implements FromIterator<T> using Vec<T> + into_boxed_slice().
Add analogous FromIterator implementations for Box<str>
matching the current implementations for String.
Remove the Global allocator requirement for FromIterator<Box<str>> too.
2024-05-05 10:29:57 -07:00
JirCep
f840da757f WS fix. 2024-04-27 17:18:25 +02:00
JirCep
825a3b1f09 String.truncate calls Vec.truncate, in turn, and that states
"is greater or equal to". Beside common sense.
2024-04-27 16:57:20 +02:00
bors
4d570eea02 Auto merge of #123909 - dtolnay:utf8chunks, r=joboet
Stabilize `Utf8Chunks`

Pending FCP in https://github.com/rust-lang/rust/issues/99543.

This PR includes the proposed modification in https://github.com/rust-lang/libs-team/issues/190 as agreed in https://github.com/rust-lang/rust/issues/99543#issuecomment-2050406568.
2024-04-26 17:41:24 +00:00
David Tolnay
61cf00464e
Stabilize Utf8Chunks 2024-04-24 15:27:47 -07:00
Matthias Krüger
21deaed4a1
Rollup merge of #122201 - coolreader18:doc-clone_from, r=dtolnay
Document overrides of `clone_from()` in core/std

As mentioned in https://github.com/rust-lang/rust/pull/96979#discussion_r1379502413

Specifically, when an override doesn't just forward to an inner type, document the behavior and that it's preferred over simply assigning a clone of source. Also, change instances where the second parameter is "other" to "source".

I reused some of the wording over and over for similar impls, but I'm not sure that the wording is actually *good*. Would appreciate feedback about that.

Also, now some of these seem to provide pretty specific guarantees about behavior (e.g. will reuse the exact same allocation iff the len is the same), but I was basing it off of the docs for [`Box::clone_from`](https://doc.rust-lang.org/1.75.0/std/boxed/struct.Box.html#method.clone_from-1) - I'm not sure if providing those strong guarantees is actually good or not.
2024-04-17 18:01:37 +02:00
Jani Mustonen
418535b798 doc: mention heap allocation earlier in String docs
Just a tiny addition.

Helps with #123263.
2024-04-01 00:04:57 +03:00
Matthias Krüger
0029a11d7d
Rollup merge of #122835 - compiler-errors:deref-pure, r=Nadrieril
Require `DerefMut` and `DerefPure` on `deref!()` patterns when appropriate

Waiting on the deref pattern syntax pr to merge

r? nadrieril
2024-03-26 21:23:48 +01:00
Michael Goulet
b56279569b Require DerefPure for patterns 2024-03-25 19:39:45 -04:00
Reed
58e2ad3d58 Amended wording 2024-03-24 12:20:39 -07:00
Reed
7b3ef13146 Fix a typo in the alloc::string::String docs 2024-03-18 11:39:00 -07:00
Noa
c0e913fdd7
Document overrides of clone_from()
Specifically, when an override doesn't just forward to an inner type,
document the behavior and that it's preferred over simply assigning
a clone of source. Also, change instances where the second parameter is
"other" to "source".
2024-03-08 12:27:24 -06:00
Kornel
78fb977d6b try_with_capacity for Vec, VecDeque, String
#91913
2024-03-01 18:24:02 +00:00
Jacob Pratt
6d865038a5
Rollup merge of #120291 - pitaj:string-sliceindex, r=Amanieu
Have `String` use `SliceIndex` impls from `str`

This PR simplifies the implementation of `Index` and `IndexMut` on `String`, and in the process enables indexing `String` by any user types that implement `SliceIndex<str>`.

Similar to #47832

r? libs

Not sure if this warrants a crater run.
2024-02-29 05:25:26 -05:00
Peter Jaszkowiak
5d59d0c7d7 have String use SliceIndex impls from str 2024-02-27 09:41:32 -07:00
许杰友 Jieyou Xu (Joe)
bfeea294cc
Document args returned from String::into_raw_parts 2024-02-26 19:32:32 +00:00
许杰友 Jieyou Xu (Joe)
26bdf2900a
Rearrange String::from_raw_parts doc argument order to match code argument order 2024-02-26 19:32:26 +00:00
Esteban Küber
0e89465672 On type error of method call arguments, look at confusables for suggestion 2024-02-22 18:04:55 +00:00
Esteban Küber
e5b3c7ef14 Add rustc_confusables annotations to some stdlib APIs
Help with common API confusion, like asking for `push` when the data structure really has `append`.

```
error[E0599]: no method named `size` found for struct `Vec<{integer}>` in the current scope
  --> $DIR/rustc_confusables_std_cases.rs:17:7
   |
LL |     x.size();
   |       ^^^^
   |
help: you might have meant to use `len`
   |
LL |     x.len();
   |       ~~~
help: there is a method with a similar name
   |
LL |     x.resize();
   |       ~~~~~~
```

#59450
2024-02-22 18:04:55 +00:00
Ali MJ Al-Nasrawy
38654ad741
Rollup merge of #95967 - CAD97:from-utf16, r=dtolnay
Add explicit-endian String::from_utf16 variants

This adds the following APIs under `feature(str_from_utf16_endian)`:

```rust
impl String {
    pub fn from_utf16le(v: &[u8]) -> Result<String, FromUtf16Error>;
    pub fn from_utf16le_lossy(v: &[u8]) -> String;
    pub fn from_utf16be(v: &[u8]) -> Result<String, FromUtf16Error>;
    pub fn from_utf16be_lossy(v: &[u8]) -> String;
}
```

These are versions of `String::from_utf16` that explicitly take [UTF-16LE and UTF-16BE](https://unicode.org/faq/utf_bom.html#gen7). Notably, we can do better than just the obvious `decode_utf16(v.array_chunks::<2>().copied().map(u16::from_le_bytes)).collect()` in that:

- We handle the case where the byte slice is not an even number of bytes, and
- In the case that the UTF-16 is native endian and the slice is aligned, we can forward to `String::from_utf16`.

If the Unicode Consortium actively defines how to handle character replacement when decoding a UTF-16 bytestream with a trailing odd byte, I was unable to find reference. However, the behavior implemented here is fairly self-evidently correct: replace the single errant byte with the replacement character.
2023-10-11 03:53:16 +03:00
Jason Newcomb
d464b72970 Add more diagnostic items for clippy 2023-10-05 18:21:47 -04:00
Christopher Durham
5facc32e22 fix char imports 2023-09-29 00:04:57 -04:00
Christopher Durham
1efea31385 add str_from_utf16_endian tracking issue 2023-09-28 23:56:27 -04:00
Christopher Durham
3d448bd067 style nits
Co-authored-by: David Tolnay <dtolnay@gmail.com>
2023-09-28 23:56:10 -04:00
CAD97
8047f8fb51 Add feature(str_from_utf16_endian) 2023-09-28 23:54:38 -04:00
Tshepang Mbambo
e47cd2f250
string.rs: remove "Basic usage" text
Only a single example is given
2023-08-02 11:17:57 +02:00