Systematicity Emerges in Transformers when Abstract Grammatical Roles Guide Attention

Ayush K Chakravarthy; Jacob Labe Russin; Randall O’Reilly

NAACL 2022

Abstract

Systematicity is thought to be a key inductive bias possessed by humans that is lacking in standard natural language processing systems, such as those based on transformers. In this work, we investigate the extent to which the failure of transformers on systematic generalization tests can be attributed to a lack of linguistic abstraction in their attention mechanism. We develop a novel modification to the transformer that implements two separate input streams: a role stream controls the attention distributions (i.e., the queries and keys) at each layer, and a filler stream determines the values. Our results show that systematic generalization improves when abstract role labels are assigned to input sequences and provided to the role stream.
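The role/filler split described in the abstract can be illustrated with a minimal single-head sketch in NumPy. It assumes a single attention layer with no multi-head projection, residual connections, or layer normalization; the function and parameter names (`role_filler_attention`, `W_q`, `W_k`, `W_v`) and all dimensions are hypothetical and not taken from the paper. What it shows is that queries and keys are computed from role embeddings only, so the attention pattern stays the same when different fillers occupy the same roles, while the values (and hence the output) still depend on the fillers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def role_filler_attention(roles, fillers, W_q, W_k, W_v):
    """Hypothetical single-head attention with separate role and filler streams.

    roles:   (seq_len, d_role)   role embeddings (assumed given)
    fillers: (seq_len, d_filler) filler/word embeddings (assumed given)
    Returns (output, weights): output is (seq_len, d_v), weights is (seq_len, seq_len).
    """
    q = roles @ W_q                      # queries come only from the role stream
    k = roles @ W_k                      # keys come only from the role stream
    v = fillers @ W_v                    # values come only from the filler stream
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)   # attention distribution depends on roles alone
    return weights @ v, weights

# Usage: same role sequence, two different filler sequences.
rng = np.random.default_rng(0)
seq_len, d_role, d_filler, d_k, d_v = 4, 8, 16, 8, 16
W_q = rng.normal(size=(d_role, d_k))
W_k = rng.normal(size=(d_role, d_k))
W_v = rng.normal(size=(d_filler, d_v))
roles = rng.normal(size=(seq_len, d_role))
fillers_a = rng.normal(size=(seq_len, d_filler))
fillers_b = rng.normal(size=(seq_len, d_filler))  # e.g. different words in the same roles

out_a, w_a = role_filler_attention(roles, fillers_a, W_q, W_k, W_v)
out_b, w_b = role_filler_attention(roles, fillers_b, W_q, W_k, W_v)
print(np.allclose(w_a, w_b))  # True: swapping fillers leaves the attention pattern unchanged
```

Because the attention weights are a function of the roles alone, a pattern learned for one set of words can in principle be reused for novel words in the same roles, which is the intuition behind the systematicity claim; how role labels are assigned and propagated across layers in the actual model is not captured by this sketch.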