GAML-BERT: Improving BERT Early Exiting by Gradient Aligned Mutual Learning

Wei Zhu; Xiaoling Wang; Yuan Ni; Guotong Xie

2021 EMNLP EMNLP 2021

GAML-BERT: Improving BERT Early Exiting by Gradient Aligned Mutual Learning

Abstract

AbstractIn this work, we propose a novel framework, Gradient Aligned Mutual Learning BERT (GAML-BERT), for improving the early exiting of BERT. GAML-BERT’s contributions are two-fold. We conduct a set of pilot experiments, which shows that mutual knowledge distillation between a shallow exit and a deep exit leads to better performances for both. From this observation, we use mutual learning to improve BERT’s early exiting performances, that is, we ask each exit of a multi-exit BERT to distill knowledge from each other. Second, we propose GA, a novel training method that aligns the gradients from knowledge distillation to cross-entropy losses. Extensive experiments are conducted on the GLUE benchmark, which shows that our GAML-BERT can significantly outperform the state-of-the-art (SOTA) BERT early exiting methods.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — bert early exiting

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Wei Zhu , Xiaoling Wang , Yuan Ni , Guotong Xie

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Application Areas > Knowledge Distillation Natural Language Processing > Resources & Methods > Large Language Models Machine Learning > Application Areas > Model Compression Deep Learning > Models > Transformers Deep Learning > Techniques > Knowledge Distillation

Keywords

model compression knowledge distillation neural network optimization mutual learning gradient alignment early exiting bert early exiting

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021