Attention is Turing-Complete

Jorge Pérez; Pablo Barceló; Javier Marinković

JMLR 2021

Abstract

Alternatives to recurrent neural networks, in particular, architectures based on self-attention, are gaining momentum for processing input sequences. In spite of their relevance, the computational properties of such networks have not yet been fully explored. We study the computational power of the Transformer, one of the most paradigmatic architectures exemplifying self-attention. We show that the Transformer with hard-attention is Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. Our study also reveals some minimal sets of elements needed to obtain this completeness result.

Topics

Machine Learning > Optimization & Theory > Theory
Deep Learning > Architectures > Transformers
Deep Learning > Architectures > Neural Networks

Keywords

transformer architecture, recurrent neural network, Turing completeness
