Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Gongbo Tang; Mathias Müller; Annette Rios; Rico Sennrich

EMNLP 2018

Abstract

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
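The "shorter network paths" argument in the abstract can be made concrete with a small sketch. The function names and the simplified path-length formulas below are illustrative assumptions, not taken from the paper: a unidirectional RNN is assumed to need one recurrent step per intervening position, a stack of non-dilated convolutions with kernel width k is assumed to need enough layers for one receptive field to cover both words, and self-attention is assumed to connect any two positions in a single layer.

```python
import math


def rnn_path_length(distance: int) -> int:
    """Recurrent steps needed to carry information across `distance` positions.

    Assumption: a unidirectional RNN that advances one position per step.
    """
    return distance


def cnn_path_length(distance: int, kernel_width: int) -> int:
    """Stacked convolution layers needed before one representation covers both words.

    Assumption: non-dilated convolutions, so each layer widens the
    receptive field by (kernel_width - 1) positions.
    """
    if kernel_width < 2:
        raise ValueError("kernel_width must be at least 2 to connect distinct positions")
    return math.ceil(distance / (kernel_width - 1))


def self_attention_path_length(distance: int) -> int:
    """Layers needed for self-attention to connect two positions.

    Assumption: full (unmasked) attention, so any pair is linked in one layer.
    """
    return 1 if distance > 0 else 0


if __name__ == "__main__":
    # Compare the three architectures for a subject and verb 10 tokens apart.
    d = 10
    print("RNN:", rnn_path_length(d))
    print("CNN (k=3):", cnn_path_length(d, kernel_width=3))
    print("Self-attention:", self_attention_path_length(d))
```

Under these assumptions, the path grows linearly with distance for the RNN, more slowly for the CNN, and stays constant for self-attention. The paper's finding is that this shorter path does not by itself translate into better subject-verb agreement over long distances.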

Topics

Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Machine Translation
Deep Learning > Learning Types > Representation Learning

Keywords

word sense disambiguation, neural machine translation, convolutional neural network, recurrent neural network, long-range dependency, subject-verb agreement, self-attentional network
