Surprisingly Easy Hard-Attention for Sequence to Sequence Learning

Shiv Shankar; Siddhant Garg; Sunita Sarawagi

EMNLP 2018

Abstract

In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence to sequence learning. The method combines the advantage of sharp focus in hard attention and the implementation ease of soft attention. On five translation tasks we show effortless and consistent gains in BLEU compared to existing attention mechanisms.
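
To make the phrase "beam approximation of the joint distribution between attention and output" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract gives no formulas, so the function names (`softmax`, `beam_joint_output`), the renormalisation over the kept positions, and the toy scores are all assumptions made for illustration. The idea illustrated is that the output distribution is treated as a mixture over attended positions, P(y) = sum over a of P(a) P(y | a), and only the k most probable positions a are kept.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def beam_joint_output(attn_scores, per_position_logits, k):
    """Approximate P(y) = sum_a P(a) * P(y | a) with the top-k positions a.

    attn_scores: one unnormalised attention score per source position.
    per_position_logits: for each source position, output logits computed
        as if attention were focused on that position alone (hypothetical
        inputs; a real model would produce these from its decoder state).
    k: number of attention positions kept in the beam.
    """
    attn = softmax(attn_scores)
    # Keep the k positions with the highest attention probability.
    top = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]
    # Assumption: renormalise attention mass over the kept positions so the
    # result is a proper distribution.
    mass = sum(attn[i] for i in top)
    vocab_size = len(per_position_logits[0])
    out = [0.0] * vocab_size
    for i in top:
        p_y_given_a = softmax(per_position_logits[i])
        weight = attn[i] / mass
        for v in range(vocab_size):
            out[v] += weight * p_y_given_a[v]
    return out


# Toy example: three source positions, a two-word output vocabulary.
attn_scores = [2.0, 0.5, -1.0]
per_position_logits = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = beam_joint_output(attn_scores, per_position_logits, k=2)
```

With k equal to the number of source positions, the sketch computes the full mixture; with k = 1 it reduces to committing to the single most probable position, which is the hard-attention extreme. Soft attention, by contrast, averages the inputs before computing a single output distribution.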

Topics

Machine Learning > Optimization & Theory > Optimization
Deep Learning > Architectures > Neural Networks
Deep Learning > Techniques
Natural Language Processing > Generation > Machine Translation
Deep Learning > Learning Types > Representation Learning
Deep Learning > Techniques > Attention

Keywords

attention mechanism; machine translation; neural machine translation; sequence-to-sequence learning; beam search; hard attention; neural network; sequence to sequence learning; beam approximation
