https://patricktoulme.substack.com/p/cutile-on-blackwell-nvidias-compiler
https://developer.nvidia.com/blog/optimizing-communication-for-mixture-of-experts-training-with-hybrid-expert-parallel/