How to Select One Among All ? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

Tianda Li; Ahmad Rashid; Aref Jafari; Pranav Sharma; Ali Ghodsi; Mehdi Rezagholizadeh

2021 EMNLP EMNLP 2021

How to Select One Among All ? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

Abstract

AbstractKnowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge in a large neural network into a smaller one. Even though KD has shown promise on a wide range of Natural Language Processing (NLP) applications, little is understood about how one KD algorithm compares to another and whether these approaches can be complimentary to each other. In this work, we evaluate various KD algorithms on in-domain, out-of-domain and adversarial testing. We propose a framework to assess adversarial robustness of multiple KD algorithms. Moreover, we introduce a new KD algorithm, Combined-KD, which takes advantage of two promising approaches (better training scheme and more efficient data augmentation). Our extensive experimental results show that Combined-KD achieves state-of-the-art results on the GLUE benchmark, out-of-domain generalization, and adversarial robustness compared to competitive methods.

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Tianda Li , Ahmad Rashid , Aref Jafari , Pranav Sharma , Ali Ghodsi , Mehdi Rezagholizadeh

Topics

Machine Learning > Application Areas > Domain Generalization Machine Learning > Application Areas > Efficient Computing Machine Learning > Application Areas > Knowledge Distillation Machine Learning > Application Areas > Model Compression Machine Learning > Learning Types > Knowledge Distillation Deep Learning > Learning Types > Knowledge Distillation Natural Language Processing > Applications > Natural Language Understanding Artificial Intelligence > Core AI > Knowledge Distillation

Keywords

model compression adversarial robustness domain generalization transfer learning knowledge distillation natural language understanding glue benchmark

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021