AISTATS 2025

Superiority of Multi-Head Attention: A Theoretical Study in Shallow Transformers in In-Context Linear Regression