Shrinking Bigfoot: Reducing wav2vec 2.0 footprint

Zilun Peng; Akshay Budhkar; Ilana Tuil; Jason Levy; Parinaz Sobhani; Raphael Cohen; Jumana Nassour

EMNLP 2021

Abstract

Wav2vec 2.0 is a state-of-the-art speech recognition model that maps speech audio waveforms into latent representations. The largest version of wav2vec 2.0 contains 317 million parameters. Hence, the inference latency of wav2vec 2.0 will be a bottleneck in production, leading to high costs and a significant environmental footprint. To improve wav2vec 2.0's applicability to a production setting, we explore multiple model compression methods borrowed from the domain of large language models. Using a teacher-student approach, we distilled the knowledge from the original wav2vec 2.0 model into a student model that is 2 times faster and 4.8 times smaller than the original model. More importantly, the student model is 2 times more energy efficient than the original model in terms of CO2 emissions. These efficiency gains are accomplished with only a 7% degradation in word error rate (WER). Our quantized model is 3.6 times smaller than the original model, with only a 0.1% degradation in WER. To the best of our knowledge, this is the first work that compresses wav2vec 2.0.
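The abstract does not spell out the training objective behind the teacher-student approach. As a rough illustration only, the sketch below shows the commonly used temperature-softened KL-divergence distillation loss over a single frame of output logits. The function names, the temperature value, and the choice of this particular loss are assumptions made for illustration; they are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: a generic distillation objective, not necessarily
    the one used by the authors. The T^2 factor is the usual rescaling that
    keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Example: the loss is zero when the student matches the teacher exactly
# and positive when their output distributions differ.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In practice such a term would be averaged over all frames in a batch and possibly combined with a supervised loss on the transcripts; those details are beyond what the abstract states.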

Topics

Machine Learning > Application Areas > Knowledge Distillation
Machine Learning > Application Areas > Model Compression
Deep Learning > Techniques > Model Architecture
Deep Learning > Techniques > Knowledge Distillation
Deep Learning > Learning Types > Knowledge Distillation
Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Recognition > Speech Recognition

Keywords

model compression, knowledge distillation, speech recognition, energy efficiency, wav2vec 2.0, word error rate, speech recognition model, teacher-student approach
