ICML 2025

Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers