* Reuse memory
* simplify `next_def_id`, avoid multiple hashing and unnecessary lookups
* remove `all_fake_def_ids`, use the global map instead (probably not a good step toward parallelization, though...)
* convert `add_deref_target` to iterative implementation
* use `ArrayVec` where we know the max number of elements
* minor touchups here and there
* avoid building temporary vectors that get appended to other vectors
At most places I may or may not be doing the compiler's job is this PR.