Distilling Knowledge for Search-based Structured Prediction

Yijia Liu; Wanxiang Che; Huaipeng Zhao; Bing Qin; Ting Liu

ACL 2018

Abstract

Many natural language processing tasks can be modeled as structured prediction and solved as a search problem. In this paper, we distill an ensemble of multiple models trained with different initializations into a single model. In addition to learning to match the ensemble's probability output on the reference states, we also use the ensemble to explore the search space and learn from the states encountered during exploration. Experimental results on two typical search-based structured prediction tasks, transition-based dependency parsing and neural machine translation, show that distillation effectively improves the single model's performance. Over strong baselines, the final model achieves improvements of 1.32 in LAS and 2.65 in BLEU score on these two tasks respectively, and it outperforms the greedy structured prediction models in the previous literature.
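The two learning signals named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ensemble's output is the average of its members' softmax distributions, that the single model is trained with cross-entropy against those soft targets, and that exploration samples actions from the ensemble. All names (`ensemble_distribution`, `distillation_loss`, `explore`, and the toy transition system) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw action scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_distribution(member_scores):
    """Average the action distributions of all ensemble members (assumed combination rule)."""
    dists = [softmax(s) for s in member_scores]
    n_actions = len(dists[0])
    return [sum(d[a] for d in dists) / len(dists) for a in range(n_actions)]


def distillation_loss(teacher_probs, student_scores):
    """Cross-entropy of the student's distribution against the teacher's soft targets."""
    student_probs = softmax(student_scores)
    return -sum(q * math.log(p) for q, p in zip(teacher_probs, student_probs))


def explore(teacher_fn, initial_state, step_fn, is_final, rng):
    """Sample one trajectory from the teacher; return each visited state with its soft target."""
    state, visited = initial_state, []
    while not is_final(state):
        probs = teacher_fn(state)
        visited.append((state, probs))
        action = rng.choices(range(len(probs)), weights=probs)[0]
        state = step_fn(state, action)
    return visited


# Hypothetical toy search problem: a state is the tuple of actions taken so far,
# there are two actions, and the search ends after three steps.
def toy_teacher(state):
    # Two made-up ensemble members that score the actions differently.
    member_scores = [[1.0, 0.0], [0.5, 0.2 * len(state)]]
    return ensemble_distribution(member_scores)


def toy_step(state, action):
    return state + (action,)


def toy_is_final(state):
    return len(state) == 3


# States reached by exploration, each paired with the ensemble's distribution.
explored = explore(toy_teacher, (), toy_step, toy_is_final, random.Random(0))

# Loss of an untrained student (all-zero scores) summed over the explored states.
total_loss = sum(distillation_loss(q, [0.0, 0.0]) for _, q in explored)
```

In this sketch, distilling on reference states would amount to calling `distillation_loss` on the states along the gold action sequence, while `explore` supplies additional states that the reference sequence never visits.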

Topics

Machine Learning > Application Areas > Knowledge Distillation
Natural Language Processing > Understanding > Parsing
Natural Language Processing > Applications > Machine Translation
Deep Learning > Learning Types > Knowledge Distillation
Machine Learning > Core Methods > Structured Prediction
Deep Learning > Learning Types > Structured Prediction

Keywords

model compression, structured prediction, knowledge distillation, machine translation, dependency parsing, ensemble method, search-based learning
