NeurIPS 2024
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
Abstract
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
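For readers who want a concrete picture of the components the abstract names, the following is a minimal sketch of one Transformer layer combining positional encoding, multi-head dot-product self-attention and a feed-forward layer. It is an illustration only and not the paper's construction: the sinusoidal encoding, the ReLU feed-forward layer, the residual connections, the absence of layer normalization and causal masking, and all function names (`sinusoidal_encoding`, `attention_head`, `feed_forward`, `transformer_layer`) are assumptions chosen for brevity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def sinusoidal_encoding(seq_len, dim):
    """Sinusoidal absolute positional encoding (one common choice, assumed here)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def attention_head(x, w_q, w_k, w_v):
    """One scaled dot-product self-attention head (no causal mask)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    out = []
    for q_i in q:
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in k]
        weights = softmax(scores)
        out.append([sum(w * v_j[d] for w, v_j in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def feed_forward(x, w1, w2):
    """Position-wise two-layer feed-forward network with ReLU."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def transformer_layer(x, heads, w_o, w1, w2):
    """Multi-head attention, then feed-forward, each with a residual connection."""
    head_outs = [attention_head(x, *h) for h in heads]
    # Concatenate the head outputs along the feature dimension.
    concat = [sum((ho[t] for ho in head_outs), []) for t in range(len(x))]
    x = add(x, matmul(concat, w_o))
    return add(x, feed_forward(x, w1, w2))


# Hypothetical toy setup: random weights, 5 tokens, model width 4, 2 heads.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


seq_len, dim, n_heads = 5, 4, 2
head_dim = dim // n_heads
pe = sinusoidal_encoding(seq_len, dim)
x = add(rand_matrix(seq_len, dim), pe)
heads = [tuple(rand_matrix(dim, head_dim) for _ in range(3)) for _ in range(n_heads)]
out = transformer_layer(x, heads, rand_matrix(dim, dim),
                        rand_matrix(dim, 8), rand_matrix(8, dim))
print(len(out), len(out[0]))
```

In this sketch, the two parameters the abstract singles out correspond to `n_heads` and to how many times `transformer_layer` is applied in sequence; stacking more layers would simply mean feeding `out` back in with fresh weights.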
Topics
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Theory
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture
Machine Learning > Learning Types > Representation Learning
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Theory