Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation

Bowen Wu; Huan Zhang; MengYuan Li; Zongsheng Wang; Qihang Feng; Junhong Huang; Baoxun Wang

2020 AACL AACL 2020

Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation

Abstract

AbstractRecently, BERT has become an essential ingredient of various NLP deep models due to its effectiveness and universal-usability. However, the online deployment of BERT is often blocked by its large-scale parameters and high computational cost. There are plenty of studies showing that the knowledge distillation is efficient in transferring the knowledge from BERT into the model with a smaller size of parameters. Nevertheless, current BERT distillation approaches mainly focus on task-specified distillation, such methodologies lead to the loss of the general semantic knowledge of BERT for universal-usability. In this paper, we propose a sentence representation approximating oriented distillation framework that can distill the pre-trained BERT into a simple LSTM based model without specifying tasks. Consistent with BERT, our distilled model is able to perform transfer learning via fine-tuning to adapt to any sentence-level downstream task. Besides, our model can further cooperate with task-specific distillation procedures. The experimental results on multiple NLP tasks from the GLUE benchmark show that our approach outperforms other task-specific distillation methods or even much larger models, i.e., ELMO, with efficiency well-improved.

🚀 Conference Pioneer — AACL 2020

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — bert compression

🐣 Hot Topic Early Bird — large language model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Bowen Wu , Huan Zhang , MengYuan Li , Zongsheng Wang , Qihang Feng , Junhong Huang , Baoxun Wang

Topics

Machine Learning > Application Areas > Knowledge Distillation Natural Language Processing > Resources & Methods > Large Language Models Machine Learning > Learning Paradigms > Transfer Learning

Keywords

transfer learning knowledge distillation bert compression model distillation text encoding sentence representation sentence embedding large language model

Download PDF

Related papers

Can Monolingual Pretrained Models Help Cross-Lingual Classification? 2020

Text Simplification with Reinforcement Learning Using Supervised Rewards on Grammaticality, Meaning Preservation, and Simplicity 2020

ISA: An Intelligent Shopping Assistant 2020

Social Media Medical Concept Normalization using RoBERTa in Ontology Enriched Text Similarity Framework 2020

Overcoming Resistance: The Normalization of an Amazonian Tribal Language 2020