20240930 SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
20241013 MoEUT: Mixture-of-Experts Universal Transformers