20240930 SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

20241013 MoEUT: Mixture-of-Experts Universal Transformers