Recurrent Attention for the Transformer

Jan Rosendahl; Christian Herold; Frithjof Petrick; Hermann Ney

EMNLP 2021

Abstract

In this work, we conduct a comprehensive investigation of one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism by a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to include such a recurrency into the attention mechanism. Verifying their performance across different translation tasks, we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.
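To make the idea of a recurrent connection in cross-attention concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions and not one of the variants proposed in the paper, whose exact formulations are not given in the abstract. It assumes a single attention head, step-by-step decoding, and a hypothetical scalar `w_prev` that adds the previous step's attention weights to the current attention energies. The function name and all parameter names are invented for this example.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def recurrent_cross_attention(queries, keys, values, w_prev):
    """Single-head cross-attention computed one target step at a time.

    queries: (I, d) decoder states; keys, values: (J, d) encoder states.
    w_prev:  hypothetical scalar weight on the previous attention weights
             (an assumption for illustration, not taken from the paper).
    """
    num_targets, d = queries.shape
    num_sources = keys.shape[0]
    prev = np.zeros(num_sources)  # no alignment history before the first step
    contexts, alphas = [], []
    for i in range(num_targets):
        # Standard scaled dot-product energies for target step i.
        energies = keys @ queries[i] / np.sqrt(d)
        # Assumed recurrent term: bias energies by the previous attention weights.
        energies = energies + w_prev * prev
        alpha = softmax(energies)
        contexts.append(alpha @ values)
        alphas.append(alpha)
        prev = alpha  # expose this step's decision to the next step
    return np.stack(contexts), np.stack(alphas)
```

With `w_prev = 0.0` the sketch reduces to ordinary scaled dot-product cross-attention, which gives a simple baseline to compare against. Note that the dependence on the previous step forces sequential computation over target positions, unlike standard Transformer attention, which is computed for all positions in parallel during training.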

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Architectures > Neural Networks
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Generation > Machine Translation
Artificial Intelligence > Core AI > Language

Keywords

transformer architecture, machine translation, encoder-decoder attention, recurrent attention, alignment modeling
