ICML 2025

Learning In-context $n$-grams with Transformers: Sub-$n$-grams Are Near-Stationary Points