Efficient Knowledge Distillation from an Ensemble of Teachers

Takashi Fukuda; Masayuki Suzuki; Gakuto Kurata; Samuel Thomas; Jia Cui; Bhuvana Ramabhadran

INTERSPEECH 2017

Abstract

This paper describes the effectiveness of knowledge distillation using teacher-student training for building accurate and compact neural networks. We show that, with knowledge distillation, information from multiple acoustic models, such as very deep VGG networks and Long Short-Term Memory (LSTM) models, can be used to train standard convolutional neural network (CNN) acoustic models for a variety of systems requiring a quick turnaround. We examine two strategies for leveraging multiple teacher labels when training student models. In the first technique, the weights of the student model are updated by switching teacher labels at the minibatch level. In the second, student models are trained on multiple streams of information from various teacher distributions via data augmentation. We show that standard CNN acoustic models can achieve recognition accuracy comparable to that of the teacher VGG and LSTM acoustic models with a much smaller number of model parameters. Additionally, we investigate the effectiveness of using broadband teacher labels as privileged knowledge for training better narrowband acoustic models within this framework. We show the benefit of this simple technique by training narrowband student models with broadband teacher soft labels on the Aurora 4 task.
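As an illustrative sketch only, and not the authors' implementation, the two strategies named in the abstract could look roughly like this in NumPy. The linear softmax student, the random choice of teacher for each minibatch, the hyperparameters, and all function names are assumptions made for the example; the abstract does not say how teachers are scheduled or how the augmented streams are constructed.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def soft_label_cross_entropy(student_probs, teacher_probs):
    """Mean cross-entropy between teacher soft labels and student outputs."""
    return float(-(teacher_probs * np.log(student_probs + 1e-12)).sum(axis=1).mean())


def train_switching(features, teacher_posteriors, num_epochs=20,
                    batch_size=32, lr=0.5, seed=0):
    """Strategy 1 (sketch): switch the teacher at the minibatch level.

    Assumption: the teacher is drawn uniformly at random per minibatch.
    The student here is a hypothetical linear softmax model, standing in
    for the CNN acoustic model described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, dim = features.shape
    num_classes = teacher_posteriors[0].shape[1]
    weights = np.zeros((dim, num_classes))
    for _ in range(num_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            teacher = rng.integers(len(teacher_posteriors))
            targets = teacher_posteriors[teacher][idx]
            probs = softmax(features[idx] @ weights)
            # Gradient of soft-label cross-entropy w.r.t. the logits is
            # (probs - targets); chain through the linear layer.
            grad = features[idx].T @ (probs - targets) / len(idx)
            weights -= lr * grad
    return weights


def augment_with_teachers(features, teacher_posteriors):
    """Strategy 2 (sketch): one copy of the data per teacher.

    Assumption: "multiple streams via data augmentation" is modelled as
    repeating each frame once per teacher, paired with that teacher's
    soft labels, so a single training pass sees every teacher.
    """
    stacked_features = np.concatenate([features] * len(teacher_posteriors), axis=0)
    stacked_targets = np.concatenate(teacher_posteriors, axis=0)
    return stacked_features, stacked_targets
```

In this sketch the first strategy exposes the student to one teacher per update, while the second presents every frame with every teacher's distribution, at the cost of a training set that grows with the number of teachers.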

Authors

Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran

Topics

Machine Learning > Application Areas > Knowledge Distillation
Deep Learning > Architectures > Neural Networks
Machine Learning > Application Areas > Model Compression
Machine Learning > Learning Types > Knowledge Distillation
Deep Learning > Techniques > Knowledge Distillation

Keywords

model compression, knowledge distillation, acoustic modeling, acoustic model, convolutional neural network, long short-term memory, teacher student training, neural network
