The `gpu-kernel` calling convention has several restrictions that were
not enforced by the compiler until now.
Add the following restrictions:
1. Cannot be async
2. Cannot be called directly
3. Cannot return values; the return type must be `()` or `!`
4. Arguments should be primitives, i.e. passed by value. More complicated
   types can work if you know what you are doing, but how they are passed
   is unintuitive and requires knowledge of ABI/compiler internals.
5. The export name should be unmangled, either through `no_mangle` or
   `export_name`. Kernels are looked up by name from the CPU side, so a
   mangled name makes them hard to find and is almost certainly
   unintentional.