ICML 2025

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training