The loop to load all the known impls from external crates seems to have been used because `ty::populate_implementations_for_trait_if_necessary` wasn't doing its job, and solely relying on it resulted in loading only impls in the same crate as the trait.
Coherence for `librustc` was reduced from 18.310s to 0.610s, from stage1 to stage2.
Interestingly, type checking also went from 46.232s to 42.003s, though that could be noise or unrelated improvements.
On a smaller scale, `fn main() {}` now spends 0.003s in coherence instead of 0.368s, which fixes#22068.
It also peaks at only 1.2MB, instead of 16MB of heap usage.