ICML 2024

Scaling Beyond the GPU Memory Limit for Large Mixture-of-Experts Model Training