ICML 2024

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot