INTERSPEECH 2021

Robust Continuous On-Device Personalization for Automatic Speech Recognition

Abstract

On-device personalization of an all-neural automatic speech recognition (ASR) model can be achieved efficiently by fine-tuning the last few layers of the model. This approach has been shown to be effective for adapting the model to recognize rare named entities using only a small amount of data. To reliably perform continuous on-device learning, it is important for the training process to be completely autonomous without manual intervention. Our simulation studies show that training over many rounds may eventually lead to a significant model drift if the personalized model is indiscriminately accepted at the end of each training round. It is important to have appropriate acceptance criteria in place to guard the model against drifting. Moreover, for storage efficiency, it is desirable to persist the model weights in quantized form. We found that quantizing and dequantizing the model weights in between training rounds can prevent the model from learning effectively. This issue can be circumvented by adding noise to the quantized weights at the start of each training round.
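The quantization issue described above can be illustrated with a toy sketch. The abstract does not specify the quantization scheme, the noise distribution, or the training procedure, so everything here is an assumption made for illustration: a uniform scalar quantizer with a hypothetical step size `STEP`, a fixed per-round weight `update` standing in for a round of fine-tuning, and uniform noise of half a step in each direction. When the per-round update is smaller than half a quantization step, rounding back to the grid discards it every round; adding noise before training lets the update survive rounding some of the time, so progress can accumulate across rounds.

```python
import random

STEP = 0.1  # hypothetical quantization step size (assumption, not from the paper)


def quantize(w, step=STEP):
    """Map a float weight to the index of its nearest grid point."""
    return round(w / step)


def dequantize(q, step=STEP):
    """Map a grid index back to a float weight."""
    return q * step


def run_rounds(num_rounds, update, add_noise, seed=0):
    """Simulate training rounds with the weight persisted in quantized form.

    `update` is a stand-in for the net weight change produced by one round
    of fine-tuning. If `add_noise` is True, uniform noise in
    [-STEP/2, STEP/2] is added to the dequantized weight at the start of
    each round (an illustrative choice of noise distribution).
    """
    rng = random.Random(seed)
    q = quantize(0.0)  # the persisted, quantized weight
    for _ in range(num_rounds):
        w = dequantize(q)
        if add_noise:
            w += rng.uniform(-STEP / 2, STEP / 2)
        w += update       # one round of "training"
        q = quantize(w)   # persist in quantized form again
    return dequantize(q)


if __name__ == "__main__":
    # The update (0.01) is far below half a step (0.05).
    print("without noise:", run_rounds(200, 0.01, add_noise=False))
    print("with noise:   ", run_rounds(200, 0.01, add_noise=True))
```

Without noise the weight never leaves its starting grid point, because every round begins exactly on the grid and the small update is rounded away. With noise, the rounded result equals the pre-noise weight plus the update in expectation, which is one plausible reading of why noise helps; the paper's actual mechanism and noise design may differ.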
