optimize zipping over array iterators
Fixes #115339 (somewhat)
The new assembly:
```asm
zip_arrays:
.cfi_startproc
vmovups (%rdx), %ymm0
leaq 32(%rsi), %rcx
vxorps %xmm1, %xmm1, %xmm1
vmovups %xmm1, -24(%rsp)
movq $0, -8(%rsp)
movq %rsi, -88(%rsp)
movq %rdi, %rax
movq %rcx, -80(%rsp)
vmovups %ymm0, -72(%rsp)
movq $0, -40(%rsp)
movq $32, -32(%rsp)
movq -24(%rsp), %rcx
vmovups (%rsi,%rcx), %ymm0
vorps -72(%rsp,%rcx), %ymm0, %ymm0
vmovups %ymm0, (%rsi,%rcx)
vmovups (%rsi), %ymm0
vmovups %ymm0, (%rdi)
vzeroupper
retq
```
This is still longer than the slice version given in the issue, but at least it eliminates the terrible `vpextrb`/`orb` chain. I guess this is due to excessive memcpys again (I haven't looked at the LLVM IR)?
The `TrustedLen` specialization is a drive-by change since I had to do something for the default impl anyway to be able to specialize the `TrustedRandomAccessNoCoerce` impl.
This takes a whole 3 lines in `compiler/` since it lowers to `CastKind::Transmute` in MIR *exactly* the same way the existing `intrinsics::transmute` does; it just doesn't have the fancy checking in `hir_typeck`.
Added to enable experimenting with the request in <https://github.com/rust-lang/rust/pull/106281#issuecomment-1496648190> and because the portable-simd folks might be interested for dependently-sized array-vector conversions.
It also simplifies a couple of places in `core`.
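For context, here's a hedged sketch of the limitation this sidesteps: `mem::transmute` rejects dependently-sized casts even when the two sizes are equal for every `N`, so generic code today falls back to `transmute_copy`, a bitwise copy through a reference (the function below is illustrative, not from the PR):
```rust
use std::mem;

// `mem::transmute(a)` does not compile in this generic context: the
// size check can't prove `[[u8; 1]; N]` and `[u8; N]` always match,
// even though they do for every `N`.
unsafe fn flatten<const N: usize>(a: [[u8; 1]; N]) -> [u8; N] {
    // SAFETY: the two types have identical size and alignment.
    unsafe { mem::transmute_copy(&a) }
}

fn main() {
    let flat = unsafe { flatten([[1u8], [2], [3]]) };
    assert_eq!(flat, [1, 2, 3]);
}
```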
A successful advance is now signalled by returning `0`, and other values now represent the remaining number of steps that could not be advanced, as opposed to the number of steps that were advanced during a partial `advance_by`. This simplifies adapters a bit, replacing some `match`/`if` with arithmetic. Whether this is beneficial overall depends on whether `advance_by` is mostly used as a building block for other iterator methods and adapters, or whether we also see uses by users where `Result` might be more useful.
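To make the new convention concrete, here's a minimal model using a free function as a stand-in for the trait method (the real signature differs; this just shows the arithmetic):
```rust
// Model of the new convention: return how many of the requested steps
// could *not* be taken, with `0` meaning complete success.
fn advance_by(iter: &mut impl Iterator, n: usize) -> usize {
    for taken in 0..n {
        if iter.next().is_none() {
            return n - taken; // steps remaining
        }
    }
    0
}

// With remaining-count semantics, a chain-style adapter needs no
// `match` on a partial result: whatever the first half couldn't
// advance is fed straight to the second half.
fn chain_advance_by(a: &mut impl Iterator, b: &mut impl Iterator, n: usize) -> usize {
    let remaining = advance_by(a, n);
    advance_by(b, remaining)
}

fn main() {
    let (mut a, mut b) = (0..3, 10..20);
    assert_eq!(chain_advance_by(&mut a, &mut b, 5), 0); // 3 from `a`, 2 from `b`
    assert_eq!(b.next(), Some(12));
}
```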
`.into_iter()` on arrays was slower than it needed to be (especially compared to the slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`.
This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better.
I added one simple demonstration of the improvement as a codegen test.
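As a sketch of the idea (a hypothetical standalone version; the real type is internal to `core` and carries the full iterator plumbing):
```rust
/// Like `Range<usize>`, but with the invariant `start <= end`
/// established at construction, so the methods below never have to
/// consider degenerate ranges like `10..4`.
struct IndexRange {
    start: usize,
    end: usize,
}

impl IndexRange {
    fn new(start: usize, end: usize) -> Option<IndexRange> {
        (start <= end).then_some(IndexRange { start, end })
    }

    fn len(&self) -> usize {
        // No saturating or checked math: the invariant rules out
        // underflow, which is exactly the extra information LLVM gets.
        self.end - self.start
    }

    fn next(&mut self) -> Option<usize> {
        if self.start < self.end {
            let i = self.start;
            self.start += 1;
            Some(i)
        } else {
            None
        }
    }
}

fn main() {
    let mut r = IndexRange::new(0, 3).unwrap();
    assert_eq!(r.len(), 3);
    assert_eq!(r.next(), Some(0));
}
```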
`array::IntoIter` has a bunch of really handy logic for dealing with partial arrays, but it's currently hamstrung by only being creatable from a fully-initialized array.
This PR adds two new constructors:
- a safe & const `empty`, since `[].into_iter()` gives `<T, 0>`, not `<T, N>` (demonstrated in the sketch after this list).
- an unsafe `from_raw_parts`, to allow experimentation with new uses.
(Slice & vec iterators don't need `from_raw_parts` because you `from_raw_parts` the slice or vec instead, but there's no useful way to make a `<[T; N]>::from_raw_parts`, so I think this is a reasonable place to have one.)
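To make the first bullet concrete, here's a small demonstration of the `<T, 0>` vs `<T, N>` mismatch (ordinary stable code, not the new constructor itself):
```rust
use std::array::IntoIter;

// Generic code that wants an `IntoIter<i32, N>` for a *specific* `N`.
fn consume<const N: usize>(mut it: IntoIter<i32, N>) -> Option<i32> {
    it.next()
}

fn main() {
    // Fine: the literal has length 3, so this is `IntoIter<i32, 3>`.
    assert_eq!(consume([1, 2, 3].into_iter()), Some(1));

    // There's no way to write the empty case: `[].into_iter()` has type
    // `IntoIter<i32, 0>`, a different type from `IntoIter<i32, 3>`,
    // hence the safe & const `empty` constructor.
    // let it: IntoIter<i32, 3> = [].into_iter(); // mismatched types
}
```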
Removes the implementations that depend on the user-definable trait `Copy`.
This only fixes the regressions to ensure it can merge in 1.55; it does not modify `vec::IntoIter`.
This method on the `Iterator` trait is `doc(hidden)`, and about half of its implementations were `doc(hidden)` as well. This adds the attribute to the remaining implementations.
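As a sketch of the shape of the change (illustrative names, not the actual std items):
```rust
pub trait ExampleIterator {
    // Hidden on the trait: rustdoc omits it from the trait's page.
    #[doc(hidden)]
    fn internal_detail(&self) -> usize {
        0
    }
}

pub struct MyIter;

impl ExampleIterator for MyIter {
    // The attribute this change adds to the remaining implementations,
    // so the override is hidden on the implementing type's page too.
    #[doc(hidden)]
    fn internal_detail(&self) -> usize {
        1
    }
}
```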
- Initialization can use `transmute_copy` to do the bitwise copy.
- `as_slice` can use `get_unchecked` and `MaybeUninit::slice_get_ref`, and `as_mut_slice` can do the same with their mutable counterparts.
- `next` and `next_back` can use the corresponding `Range` methods.
- `Clone` doesn't need any unsafety, and we can dynamically update the
new range to get partial drops if `T::clone` panics.
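For the last point, here's a self-contained sketch of the partial-drop idea (not the actual `core` code: this standalone version needs `unsafe` for its own `Drop`, whereas the real `Clone` reuses the iterator's existing drop logic and stays fully safe):
```rust
use std::mem::MaybeUninit;

/// Clones elements forward one at a time, counting how many finished.
/// If `T::clone` panics partway, `Drop` runs and drops exactly the
/// `init` elements that were completed: no leak, no double-drop.
struct PartialArray<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    init: usize, // elements 0..init are initialized
}

impl<T: Clone, const N: usize> PartialArray<T, N> {
    fn clone_from_slice(src: &[T]) -> Self {
        let mut out = PartialArray {
            data: [const { MaybeUninit::uninit() }; N],
            init: 0,
        };
        for item in src.iter().take(N) {
            // A panic in `clone()` leaves `init` counting only the
            // elements that were fully written, keeping `Drop` correct.
            out.data[out.init] = MaybeUninit::new(item.clone());
            out.init += 1;
        }
        out
    }
}

impl<T, const N: usize> Drop for PartialArray<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.data[..self.init] {
            // SAFETY: elements 0..init were initialized above.
            unsafe { slot.assume_init_drop() };
        }
    }
}

fn main() {
    let src = [String::from("a"), String::from("b")];
    let partial: PartialArray<String, 2> = PartialArray::clone_from_slice(&src);
    drop(partial); // drops exactly the cloned elements
}
```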