On the Sub-layer Functionalities of Transformer Decoder

Yilin Yang; Longyue Wang; Shuming Shi; Prasad Tadepalli; Stefan Lee; Zhaopeng Tu

EMNLP 2020


Abstract

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages, developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight into when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance. This yields a significant reduction in computation and number of parameters, and consequently a significant boost to both training and inference speed.
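To give a rough sense of scale for the claim about dropping the feed-forward module, the sketch below counts the learnable parameters of a single standard Transformer decoder layer with and without that module. It is an illustration only, not the authors' code: the function name `decoder_layer_params` is hypothetical, and the sizes `d_model = 512` and `d_ff = 2048` are assumed from the commonly used "base" Transformer configuration rather than taken from this paper. Embeddings, the encoder, and the output projection are not counted, so the saving as a share of the whole model would be smaller.

```python
def decoder_layer_params(d_model: int, d_ff: int, with_ffn: bool = True) -> int:
    """Count learnable parameters in one standard Transformer decoder layer.

    Assumes biased linear projections and one LayerNorm per sub-layer
    (an assumption for illustration, not a detail from the paper).
    """
    # One attention block: Q, K, V and output projections, weights plus biases.
    attention = 4 * (d_model * d_model + d_model)
    # One LayerNorm: a gain vector and a bias vector.
    layer_norm = 2 * d_model
    # Self-attention and encoder-decoder attention sub-layers.
    total = 2 * (attention + layer_norm)
    if with_ffn:
        # Feed-forward sub-layer: d_model -> d_ff -> d_model, with biases.
        ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
        total += ffn + layer_norm
    return total


# Assumed "base" sizes; the paper's actual configuration may differ.
full = decoder_layer_params(512, 2048, with_ffn=True)
slim = decoder_layer_params(512, 2048, with_ffn=False)
saved_fraction = (full - slim) / full
print(f"with FFN: {full}, without FFN: {slim}, saved: {saved_fraction:.1%}")
```

Under these assumed sizes the feed-forward sub-layer holds roughly half of a decoder layer's parameters, which is consistent with the abstract describing its removal as a significant reduction.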



Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Machine Translation
Deep Learning > Learning Types > Representation Learning
Artificial Intelligence > Core AI > Natural Language Processing

Keywords

representation learning, neural machine translation, transformer decoder, information propagation, probe task, feed-forward module
