Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng; Minghui Qiu; Yaliang Li; Hai-Tao Zheng; Ying Shen

AAAI 2021


Abstract

Although pre-trained language models such as BERT have achieved appealing performance in a wide range of Natural Language Processing (NLP) tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is to use knowledge distillation to compress these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the performance of the student models. To tackle this problem, we propose a method that learns to augment data for BERT knowledge distillation in target domains with scarce labeled data, by learning a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the proposed method generates samples from a stationary distribution near the target data and adopts a reinforced controller to automatically refine the augmentation strategy according to the performance of the student. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines on different NLP tasks, and that for data-scarce domains, the compressed student models even perform better than the original large teacher model, with far fewer parameters (only ~13.3%), when only a few labeled examples are available.
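To make the teacher-to-student transfer mentioned in the abstract concrete, the following is a minimal sketch of a generic soft-label distillation loss for a single example. It is not taken from the paper: the abstract does not state the exact objective, so the function names `softmax` and `distillation_loss`, the `temperature` and `alpha` parameters, and the temperature-squared scaling are all assumptions borrowed from the commonly used formulation of knowledge distillation. The cross-domain augmentation and the reinforced controller are not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical per-example loss: soft teacher targets plus hard label.

    alpha weights the soft term; (1 - alpha) weights the hard term.
    Both weights and the temperature are illustrative assumptions.
    """
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Hard term: cross-entropy of the student against the gold label.
    ce = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    print(distillation_loss([1.5, 0.7, -0.8], teacher, label=0))
```

In a data-scarce target domain, the student sees few such teacher distributions, which is the gap the proposed augmentation is meant to fill by supplying additional inputs on which this kind of loss can be evaluated.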


Authors

Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

Topics

Machine Learning > Application Areas > Data Augmentation
Machine Learning > Application Areas > Domain Adaptation
Machine Learning > Application Areas > Knowledge Distillation
Machine Learning > Application Areas > Model Compression
Natural Language Processing > Resources & Methods > Transfer Learning
Deep Learning > Learning Types > Knowledge Distillation

Keywords

model compression, domain adaptation, knowledge distillation, data augmentation, language model
