Low-resource Low-footprint Wake-word Detection using Knowledge Distillation

Arindam Ghosh; Mark Fuhs; Deblin Bagchi; Bahman Farahani; Monika Woszczyna

INTERSPEECH 2022

Abstract

As virtual assistants have become more diverse and specialized, so has the demand for application- or brand-specific wake words. However, the wake-word-specific datasets typically used to train wake-word detectors are costly to create. In this paper, we explore two techniques for leveraging acoustic modeling data from large-vocabulary speech recognition to improve a purpose-built wake-word detector: transfer learning and knowledge distillation. We also explore how these techniques interact with time-synchronous training targets to improve detection latency. Experiments are presented on the open-source "Hey Snips" dataset and a more challenging in-house far-field dataset. Using phone-synchronous targets and knowledge distillation from a large acoustic model, we are able to improve accuracy across dataset sizes for both datasets while reducing latency.
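The abstract does not state the exact distillation objective, so the following is only a minimal sketch of a generic frame-level knowledge distillation loss, not the authors' formulation. It assumes the student (a small wake-word detector) and the teacher (a large acoustic model) emit logits over the same set of output classes for a given frame; the function names, the mixing weight `alpha`, and the `temperature` parameter are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      alpha=0.5, temperature=2.0):
    """Hypothetical per-frame loss: alpha * hard CE + (1 - alpha) * soft KL.

    hard_label is the index of the reference class for this frame
    (for example, a phone-synchronous target). All parameter values
    here are assumptions chosen for illustration.
    """
    # Soft term: KL(teacher || student) on temperature-softened posteriors.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student_soft))

    # Hard term: cross-entropy of the student against the reference label.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[hard_label])

    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    # Toy two-class frame: the teacher and student disagree.
    print(distillation_loss([2.0, 0.0], [0.0, 2.0], hard_label=1))
```

In this sketch, setting `alpha=1.0` reduces to ordinary supervised training on the wake-word labels, while `alpha=0.0` trains the student purely to match the teacher's softened posteriors.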

Topics

Machine Learning > Application Areas > Knowledge Distillation
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Learning Types > Transfer Learning
Deep Learning > Techniques > Knowledge Distillation

Keywords

transfer learning, knowledge distillation, speech recognition, low-resource learning, acoustic model, wake-word detection
