ICML 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

The Questioner