One-Teacher and Multiple-Student Knowledge Distillation on Sentiment Classification

Xiaoqin Chang; Sophia Yat Mei Lee; Suyang Zhu; Shoushan Li; Guodong Zhou

COLING 2022

Abstract

Knowledge distillation is an effective method for transferring knowledge from a large pre-trained teacher model to a compact student model. However, in previous studies, the distilled student models are still large and remain impractical in highly speed-sensitive systems (e.g., an IR system). In this study, we aim to distill a deep pre-trained model into an extremely compact shallow model such as a CNN. Specifically, we propose a novel one-teacher and multiple-student knowledge distillation approach that distills a deep pre-trained teacher model into multiple shallow student models combined through ensemble learning. Moreover, we leverage large-scale unlabeled data to improve the performance of the students. Empirical studies on three sentiment classification tasks demonstrate that our approach achieves better results with far fewer parameters (0.9%-18%) and extremely high speedup ratios (100X-1000X).
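The abstract does not state the training objective or how the students are combined, so the following is only a rough sketch of the general idea under stated assumptions: a temperature-softened KL divergence as the distillation loss (the common Hinton-style formulation, not necessarily the authors' choice) and simple probability averaging as the ensemble rule. The function names `distillation_loss` and `ensemble_predict` and the temperature value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: a standard soft-label objective; the paper's loss may differ.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_predict(students_logits):
    """Average the class probabilities of several students.

    Assumption: plain averaging; the paper's ensemble rule may differ.
    Returns (predicted class index, averaged probabilities).
    """
    probs = [softmax(logits) for logits in students_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg


if __name__ == "__main__":
    # Teacher logits could come from labeled or unlabeled text alike,
    # since the soft-label loss needs no gold label.
    print(distillation_loss([3.0, -1.0], [1.0, 0.5]))
    print(ensemble_predict([[2.0, 0.0], [0.0, 1.0], [1.5, 0.0]]))
```

Because the loss depends only on the teacher's output distribution, the same function applies to unlabeled examples, which is one plausible way large-scale unlabeled data could be used to train the students.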

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Knowledge Distillation

Keywords

model compression, ensemble learning, knowledge distillation, sentiment classification, convolutional neural network, teacher-student learning
