Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation

Ta-Chung Chi; Ting-Han Fan; Alexander Rudnicky; Peter Ramadge

EMNLP 2023

Abstract

Conventional wisdom holds that Transformers, unlike recurrent models, cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of Weight-Sharing, Adaptive-Depth, and Sliding-Dilated-Attention, RegularGPT constructs working memory along the depth dimension, thereby enabling efficient and successful modeling of regular languages such as PARITY. We further test RegularGPT on the task of natural language length extrapolation and, surprisingly, find that it rediscovers the local windowed attention effect deemed necessary in prior work for length extrapolation.
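To make the idea of working memory along the depth dimension concrete, the following is a minimal sketch of how PARITY can be resolved in a logarithmic number of layers. It is an illustration only, not the authors' implementation: the function names, the `chunk` parameter, and the hard-coded XOR rule are assumptions, whereas RegularGPT learns its combination rule with attention. The same rule is applied at every layer (weight sharing), the number of layers grows with the input length (adaptive depth), and each position combines states spaced `dilation` apart (a sliding, dilated pattern).

```python
from typing import List, Tuple


def running_parity(bits: List[int]) -> List[int]:
    """Recurrent-style baseline: one state, updated once per token."""
    out, state = [], 0
    for b in bits:
        state ^= b
        out.append(state)
    return out


def prefix_parity_by_depth(bits: List[int], chunk: int = 2) -> Tuple[List[int], int]:
    """Hypothetical depth-wise computation of prefix parity.

    At each layer, position i combines its own state with the states at
    i - dilation, i - 2*dilation, ..., up to `chunk` states in total.
    The dilation is multiplied by `chunk` after every layer, so the span
    covered by each position grows as chunk ** depth.
    """
    if chunk < 2:
        raise ValueError("chunk must be at least 2")
    states = list(bits)
    dilation, depth = 1, 0
    while dilation < len(states):
        states = [
            sum(
                states[i - k * dilation]
                for k in range(chunk)
                if i - k * dilation >= 0  # positions before the start are skipped
            ) % 2
            for i in range(len(states))
        ]
        dilation *= chunk
        depth += 1
    return states, depth


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1, 1]
    states, depth = prefix_parity_by_depth(bits, chunk=2)
    print(states == running_parity(bits), depth)
```

With `chunk=2` a sequence of length 7 needs three layers, while the recurrent baseline needs seven sequential steps; a larger `chunk` trades wider per-layer combination for fewer layers.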

Topics

Artificial Intelligence > Core AI > Memory; Machine Learning > Optimization & Theory > Learning Theory; Deep Learning > Architectures > Transformers; Natural Language Processing > Generation > Language Modeling; Artificial Intelligence > Core AI > Reasoning; Artificial Intelligence > Core AI > Language

Keywords

transformer architecture; attention mechanism; working memory; weight sharing; length extrapolation; regular language; sliding attention
