offload: move (un)register lib into global_ctors
Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards.
What we should rather do is initialize it once in the binary startup code, and tear it down at the end of the binary execution. Here I implement these changes.
Together, our generated IR has a lot less usage of globals, which in turn simplifies the refactoring in https://github.com/rust-lang/rust/pull/150683, where I introduce a new variant of our offload intrinsic.
r? oli-obk