ICML 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context