On the Sparsity of Neural Machine Translation Models

Yong Wang; Longyue Wang; Victor Li; Zhaopeng Tu

EMNLP 2020

Abstract

Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.
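The abstract names the two ingredients, pruning redundant parameters and later rejuvenating them, without saying how either is done. The following is a minimal, hypothetical sketch of one prune-then-rejuvenate cycle on a flat list of weights. It assumes magnitude-based pruning (drop the smallest-magnitude fraction of weights) and a small uniform re-initialization of the pruned positions; both choices, and the helper names `magnitude_prune_mask`, `apply_mask`, and `rejuvenate`, are illustrative assumptions rather than the authors' method. In a real NMT system the sparse model would be retrained between the two steps, and the rejuvenated dense model trained further.

```python
import random
from typing import List


def magnitude_prune_mask(weights: List[float], sparsity: float) -> List[bool]:
    """Assumed criterion: mark the `sparsity` fraction of weights with the
    smallest absolute value as pruned (False); all others are kept (True)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude (stable for ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]


def apply_mask(weights: List[float], mask: List[bool]) -> List[float]:
    """Zero out pruned positions, producing the sparse model's weights."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


def rejuvenate(weights: List[float], mask: List[bool],
               init_scale: float = 0.01, seed: int = 0) -> List[float]:
    """Hypothetical rejuvenation: give each pruned position a small random
    value in [-init_scale, init_scale] so it can be trained again; kept
    weights are left unchanged."""
    rng = random.Random(seed)
    return [w if keep else rng.uniform(-init_scale, init_scale)
            for w, keep in zip(weights, mask)]


if __name__ == "__main__":
    w = [0.5, -0.01, 0.2, 0.003, -0.7, 0.05]
    mask = magnitude_prune_mask(w, sparsity=0.5)
    sparse = apply_mask(w, mask)       # [0.5, 0.0, 0.2, 0.0, -0.7, 0.0]
    revived = rejuvenate(sparse, mask)  # pruned slots get small fresh values
    print(mask, sparse, revived)
```

Keeping the mask separate from the weights makes the two phases explicit: the mask defines which parameters the sparse model ignores, and rejuvenation reuses exactly those positions instead of adding new capacity.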

Keywords

neural machine translation, model pruning, sparse model, parameter sparsity, parameter rejuvenation, parameter reuse
