librustc_data_structures: Speedup union of sparse and dense hybrid set
This optimization speeds up the union of a hybrid bitset when that
switches it from a sparse representation to a dense bitset. It now
clones the dense bitset and integrate only the spare elements instead of
densifying the sparse bitset, initializing all elements, and then a
union on two dense bitset, touching all words a second time.
It's not completely certain if the added complexity is worth it but I would
like to hear some feedback in any case. Benchmark results from my machine:
```
Now: bit_set::union_hybrid_sparse_to_dense ... bench: 72 ns/iter (+/- 5)
Previous: bit_set::union_hybrid_sparse_to_dense ... bench: 90 ns/iter (+/- 6)
```
This being the second iteration of trying to improve the speed, since I missed the return value in the first, and forgot to run the relevant tests. Oops.