Attention-Guided Answer Distillation for Machine Reading Comprehension

Minghao Hu; Yuxing Peng; Furu Wei; Zhen Huang; Dongsheng Li; Nan Yang; Ming Zhou

EMNLP 2018

Abstract

Although current reading comprehension systems have advanced significantly, their strong performance often comes at the cost of ensembling numerous models. Existing approaches are also vulnerable to adversarial attacks. This paper tackles both problems with knowledge distillation, which transfers knowledge from an ensemble model to a single model. We first demonstrate that vanilla knowledge distillation applied to answer span prediction is effective for reading comprehension systems. We then propose two novel approaches that not only penalize predictions on confusing answers but also guide training with alignment information distilled from the ensemble. Experiments show that our best student model loses only 0.4% F1 on the SQuAD test set compared to the ensemble teacher, while running 12x faster during inference. It even outperforms the teacher on adversarial SQuAD datasets and the NarrativeQA benchmark.
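As a rough illustration of what vanilla knowledge distillation on answer span prediction can look like, the sketch below mixes a hard loss on the gold span with a soft loss against an averaged ensemble distribution over start and end positions. It is not the authors' implementation: the function names, the temperature, the mixing weight `alpha`, and the choice to average ensemble probabilities are all assumptions made for the example, and the two novel approaches mentioned in the abstract are not covered.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_distributions(member_logits, temperature=1.0):
    """Average the softened distributions of several teacher models.

    member_logits: one list of per-token logits per ensemble member
    (hypothetical input format chosen for this sketch).
    """
    dists = [softmax(logits, temperature) for logits in member_logits]
    n_members = len(dists)
    return [sum(col) / n_members for col in zip(*dists)]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy H(target, predicted) for two distributions."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def span_distillation_loss(student_start, student_end,
                           teacher_start, teacher_end,
                           gold_start, gold_end,
                           temperature=2.0, alpha=0.5, eps=1e-12):
    """Weighted sum of a hard loss (gold span) and a soft loss (teacher).

    alpha = 1.0 trains only on the gold labels; alpha = 0.0 trains only
    on the ensemble's softened start/end distributions.
    """
    # Hard loss: negative log-likelihood of the gold start and end indices.
    hard = (-math.log(softmax(student_start)[gold_start] + eps)
            - math.log(softmax(student_end)[gold_end] + eps))

    # Soft loss: cross-entropy between teacher and student distributions,
    # both softened with the same temperature.
    soft = (cross_entropy(average_distributions(teacher_start, temperature),
                          softmax(student_start, temperature))
            + cross_entropy(average_distributions(teacher_end, temperature),
                            softmax(student_end, temperature)))

    return alpha * hard + (1.0 - alpha) * soft
```

With `alpha=0.0`, a student whose logits match the teacher's incurs a lower loss than one that ranks the positions in the opposite order, which is the behaviour the soft term is meant to encourage.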

Topics

Machine Learning > Application Areas > Knowledge Distillation

Natural Language Processing > Applications > Machine Reading Comprehension

Machine Learning > Learning Types > Knowledge Distillation

Keywords

model compression, attention mechanism, knowledge distillation, machine reading comprehension, adversarial attack, model ensemble, neural network, answer span prediction
