core/char: Speed up `to_digit()` for `radix <= 10`
I noticed that `char::to_digit()` seems to do a bit of extra work to handle `[a-zA-Z]` characters even when the radix makes them irrelevant. Since `to_digit(10)` seems to be the most common case (at least in the `rust` codebase), I thought it might be worthwhile to add a fast path for it, and according to the benchmarks I added in one of the commits, it pays off. I also added another fast path for the `radix < 10` case, which also has a positive effect.
It is entirely possible that I'm measuring something unrelated, though, so please verify these numbers and let me know if I missed something!
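For illustration, here is a minimal sketch of the fast-path idea (hypothetical code, not the actual patch): for `radix <= 10` only the ASCII digits `'0'..='9'` can be valid, so the `[a-zA-Z]` handling can be skipped entirely.
```rust
// Hypothetical sketch of the fast path; not the verbatim std source.
fn to_digit_sketch(c: char, radix: u32) -> Option<u32> {
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    // Fast path: one subtraction plus one range check.
    if radix <= 10 {
        let digit = (c as u32).wrapping_sub('0' as u32);
        return if digit < radix { Some(digit) } else { None };
    }
    // Slow path: letters count as digits 10..=35.
    let digit = match c {
        '0'..='9' => c as u32 - '0' as u32,
        'a'..='z' => c as u32 - 'a' as u32 + 10,
        'A'..='Z' => c as u32 - 'A' as u32 + 10,
        _ => return None,
    };
    if digit < radix { Some(digit) } else { None }
}
```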
### Before
```
# Run 1
test char::methods::bench_to_digit_radix_10 ... bench: 16,265 ns/iter (+/- 1,774)
test char::methods::bench_to_digit_radix_16 ... bench: 13,938 ns/iter (+/- 2,479)
test char::methods::bench_to_digit_radix_2 ... bench: 13,090 ns/iter (+/- 524)
test char::methods::bench_to_digit_radix_36 ... bench: 14,236 ns/iter (+/- 1,949)
# Run 2
test char::methods::bench_to_digit_radix_10 ... bench: 16,176 ns/iter (+/- 1,589)
test char::methods::bench_to_digit_radix_16 ... bench: 13,896 ns/iter (+/- 3,140)
test char::methods::bench_to_digit_radix_2 ... bench: 13,158 ns/iter (+/- 1,112)
test char::methods::bench_to_digit_radix_36 ... bench: 14,206 ns/iter (+/- 1,312)
# Run 3
test char::methods::bench_to_digit_radix_10 ... bench: 16,221 ns/iter (+/- 2,423)
test char::methods::bench_to_digit_radix_16 ... bench: 14,361 ns/iter (+/- 3,926)
test char::methods::bench_to_digit_radix_2 ... bench: 13,097 ns/iter (+/- 671)
test char::methods::bench_to_digit_radix_36 ... bench: 14,388 ns/iter (+/- 1,068)
```
### After
```
# Run 1
test char::methods::bench_to_digit_radix_10 ... bench: 11,521 ns/iter (+/- 552)
test char::methods::bench_to_digit_radix_16 ... bench: 12,926 ns/iter (+/- 684)
test char::methods::bench_to_digit_radix_2 ... bench: 11,266 ns/iter (+/- 1,085)
test char::methods::bench_to_digit_radix_36 ... bench: 14,213 ns/iter (+/- 614)
# Run 2
test char::methods::bench_to_digit_radix_10 ... bench: 11,424 ns/iter (+/- 1,042)
test char::methods::bench_to_digit_radix_16 ... bench: 12,854 ns/iter (+/- 1,193)
test char::methods::bench_to_digit_radix_2 ... bench: 11,193 ns/iter (+/- 716)
test char::methods::bench_to_digit_radix_36 ... bench: 14,249 ns/iter (+/- 3,514)
# Run 3
test char::methods::bench_to_digit_radix_10 ... bench: 11,469 ns/iter (+/- 685)
test char::methods::bench_to_digit_radix_16 ... bench: 12,852 ns/iter (+/- 568)
test char::methods::bench_to_digit_radix_2 ... bench: 11,275 ns/iter (+/- 1,356)
test char::methods::bench_to_digit_radix_36 ... bench: 14,188 ns/iter (+/- 1,501)
```
I ran the benchmark using:
```sh
python x.py bench src/libcore --stage 1 --keep-stage 0 --test-args "bench_to_digit"
```
Add mem::forget_unsized() for forgetting unsized values
~~Allows passing values of `T: ?Sized` types to `mem::drop` and `mem::forget`.~~
Adds `mem::forget_unsized()` that accepts `T: ?Sized`.
I had to revert the PR that removed the `forget` intrinsic and replaced it with `ManuallyDrop`: https://github.com/rust-lang/rust/pull/40559
We can't use `ManuallyDrop::new()` here because it requires `T: Sized`, and we don't have support for unsized return values yet (will we ever?).
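For reference, a rough sketch of the shape of the new function under those constraints (nightly-only; assumes the restored intrinsic and unsized function parameters):
```rust
#![feature(core_intrinsics, unsized_fn_params)]

// Sketch only: the point is that the signature has no implicit
// `T: Sized` bound, which is exactly why `ManuallyDrop::new()` is
// not usable here.
pub fn forget_unsized<T: ?Sized>(t: T) {
    // The restored `forget` intrinsic takes `t` by value and simply
    // never runs its destructor.
    unsafe { core::intrinsics::forget(t) }
}
```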
r? @eddyb
Add link to std::mem::size_of to size_of intrinsic documentation
The other intrinsics with safe/stable alternatives already have documentation to this effect.
Document optimizations enabled by FusedIterator
When reading this I wondered what “some significant optimizations” referred to. As far as I can tell from reading the code, the specialization of `.fuse()` is the only case where `FusedIterator` has any impact at all. Is this accurate, @Stebalien?
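To make that concrete, here is a hypothetical example of the effect: once an iterator implements `FusedIterator`, the `Fuse` wrapper returned by `.fuse()` can skip its `done`-flag bookkeeping, because the inner iterator already promises to return `None` forever after the first `None`.
```rust
use std::iter::FusedIterator;

// Hypothetical iterator used only for illustration.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

// Opting in: after the first `None`, `next()` keeps returning `None`.
impl FusedIterator for Countdown {}

fn main() {
    let mut it = Countdown(2).fuse();
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), Some(0));
    assert_eq!(it.next(), None);
    // Still `None`, without `Fuse` tracking a separate `done` flag.
    assert_eq!(it.next(), None);
}
```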
Make PhantomData #[structural_match]
fixes https://github.com/rust-lang/rust/issues/55028
This makes `PhantomData<T>` structurally matchable, irrespective of whether `T` is, per the discussion at this week's language team meeting (the general consensus was that this is a bug-fix).
Any type that uses `#[derive(PartialEq, Eq)]` and was previously not `#[structural_match]` solely because it contains a `PhantomData<T>` will now be `#[structural_match]`.
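As a hypothetical illustration of the change:
```rust
use std::marker::PhantomData;

// Previously denied #[structural_match] only because of PhantomData:
// `String` is not structurally matchable, but here it appears only
// inside the PhantomData marker.
#[derive(PartialEq, Eq)]
struct Tagged<T>(u32, PhantomData<T>);

const ZERO: Tagged<String> = Tagged(0, PhantomData);

fn is_zero(t: Tagged<String>) -> bool {
    // With this fix, the constant is usable as a pattern.
    matches!(t, ZERO)
}

fn main() {
    assert!(is_zero(Tagged(0, PhantomData)));
    assert!(!is_zero(Tagged(1, PhantomData)));
}
```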
r? @nikomatsakis
Use read_unaligned instead of read in transmute_copy
Closes: #55044
This change could cause a performance regression on non-x86 platforms, but it also fixes some of the UB lurking in existing programs. An alternative would be to document the alignment requirements of `transmute_copy` instead.
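For reference, a minimal sketch of what the changed body amounts to (not the verbatim std source):
```rust
// Sketch: `read_unaligned` copies byte-wise, so `src` no longer has to
// be aligned for `U`. With plain `ptr::read`, a `&T` that is
// under-aligned for `U` (e.g. T = [u8; 8], U = u64) would be UB.
pub unsafe fn transmute_copy<T, U>(src: &T) -> U {
    std::ptr::read_unaligned(src as *const T as *const U)
}
```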
Fix Rc/Arc allocation layout
* Rounds the allocation layout up to a multiple of the alignment
* Adds a convenience method `Layout::pad_to_align` to perform the rounding (sketched below)
Closes #55747
cc #55724
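A minimal sketch of the rounding the new method performs (alignments from `Layout` are always powers of two, so a mask works):
```rust
use std::alloc::Layout;

// Sketch of the padding logic; not the verbatim std source.
fn pad_to_align(layout: Layout) -> Layout {
    let align = layout.align(); // guaranteed to be a power of two
    // Round the size up to the next multiple of `align`.
    let padded_size = (layout.size() + align - 1) & !(align - 1);
    Layout::from_size_align(padded_size, align).unwrap()
}

fn main() {
    let layout = Layout::from_size_align(10, 8).unwrap();
    assert_eq!(pad_to_align(layout).size(), 16);
}
```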
Implement rotate using funnel shift on LLVM >= 7
Implement the rotate_left and rotate_right operations using
llvm.fshl and llvm.fshr if they are available (LLVM >= 7).
Originally I wanted to expose the funnel_shift_left and
funnel_shift_right intrinsics and implement rotate_left and
rotate_right on top of them. However, emulation of funnel
shifts requires emitting a conditional to check for zero shift
amount, which is not necessary for rotates. I was uncomfortable
doing that here, as I don't want to rely on LLVM to optimize
away that conditional (and for variable rotates, I'm not sure it
can). We should revisit that question when we raise our minimum
version requirement to LLVM 7 and don't need emulation code
anymore.
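To make the relationship concrete: a rotate is a funnel shift with both inputs equal, so the shift amounts can be masked and the zero-shift conditional falls away. A hypothetical helper showing the semantics (not the emitted IR):
```rust
// rotate_left(x, n) is fshl(x, x, n); masking the shift amounts makes
// the zero-shift case work out naturally (x << 0 | x >> 0 == x).
fn rotate_left32(x: u32, n: u32) -> u32 {
    let n = n % 32;
    (x << n) | (x >> ((32 - n) % 32))
}

fn main() {
    assert_eq!(rotate_left32(0x8000_0001, 1), 0x0000_0003);
    assert_eq!(rotate_left32(0xDEAD_BEEF, 0), 0xDEAD_BEEF);
    assert_eq!(rotate_left32(1, 31), 0x8000_0000);
}
```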
Fixes #52457.
Rename `CoerceSized` to `DispatchFromDyn`
This renames `CoerceSized` to `DispatchFromDyn` and reverses the direction, so that, for example, you write
```rust
impl<T: Unsize<U>, U> DispatchFromDyn<*const U> for *const T {}
```
instead of
```rust
impl<T: Unsize<U>, U> DispatchFromDyn<*const T> for *const U {}
```
This way the trait is really just a subset of `CoerceUnsized`.
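As a hedged sketch of the new direction for a user-defined receiver type (nightly features assumed; `MyPtr` is hypothetical):
```rust
#![feature(unsize, dispatch_from_dyn)]

use std::marker::Unsize;
use std::ops::{Deref, DispatchFromDyn};

// Hypothetical smart pointer wrapping a type that already supports
// unsizing coercions.
struct MyPtr<T: ?Sized>(Box<T>);

impl<T: ?Sized> Deref for MyPtr<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

// Same direction as the `*const` example above: the impl is written
// for the pointer to the concrete `T`, with the unsized `U` pointer as
// the trait's type parameter, mirroring `CoerceUnsized`.
impl<T: ?Sized + Unsize<U>, U: ?Sized> DispatchFromDyn<MyPtr<U>> for MyPtr<T> {}
```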
The checks in object_safety.rs are updated for the new trait, and some documentation and method names in there are updated for the new trait name (e.g. `receiver_is_coercible` is now called `receiver_is_dispatchable`). Since the trait now works in the opposite direction, some code had to be updated for that as well.
I did not update the error messages for invalid `CoerceSized` (now `DispatchFromDyn`) implementations, except to find/replace `CoerceSized` with `DispatchFromDyn`. Will ask for suggestions in the PR thread.