Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions

Javier Ferrando; Marta R. Costa-jussà

2021 EMNLP EMNLP 2021

Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions

Abstract

AbstractThis work proposes an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we prove that attention weights systematically make alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens to regulate the contribution in the prediction of the two contexts, the source and the prefix of the target sequence. We provide evidence about the influence of wrong alignments on the model behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that largely reduce the word alignment error rate compared to standard induced alignments from attention weights.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Javier Ferrando , Marta R. Costa-jussà

Topics

Artificial Intelligence > Core AI > Interpretability Natural Language Processing > Applications > Machine Translation Natural Language Processing > Generation > Machine Translation Deep Learning > Models > Transformers Deep Learning > Techniques > Attention

Keywords

transformer architecture attention mechanism neural machine translation word alignment encoder-decoder attention

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021