speed up `String::push` and `String::insert`
Addresses the concerns described in #116235.
The performance gain comes mainly from avoiding temporary buffers.
Complex pattern matching in `encode_utf8` (introduced in #67569) has been simplified to a comparison and an exhaustive `match` in the `encode_utf8_raw_unchecked` helper function. It takes a slice of `MaybeUninit<u8>` because otherwise we'd have to construct a normal slice to uninitialized data, which is not desirable, I guess.
Several functions still have that [unneeded zeroing](https://rust.godbolt.org/z/5oKfMPo7j), but a single instruction is not that important, I guess.
`@rustbot` label T-libs C-optimization A-str