Adjunct-Emeritus Distillation for Semi-Supervised Language Model Adaptation

Scott Novotney; Yile Gu; Ivan Bulyko

INTERSPEECH 2021

Abstract

To improve customer privacy, commercial speech applications are reducing human transcription of customer data. This hurts language model training because fewer in-domain transcripts are available. Prior work demonstrated that training on automated transcripts alone provides only modest gains because it reinforces recognition errors. We consider a new condition, in which a model trained on historical human transcripts is available to us, but the transcripts themselves are not. To overcome temporal drift in vocabulary and topics, we propose a novel extension of knowledge distillation, adjunct-emeritus distillation, in which two imperfect teachers jointly train a student model. We conduct experiments on an English voice assistant domain and simulate a one-year gap in human transcription. Unlike fine-tuning, our approach is architecture agnostic. It achieves a 14% relative reduction in perplexity over the baseline approach of freezing model development, and it also improves over the baseline of knowledge distillation.
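The abstract does not spell out how the two teachers are combined, so the following is only a minimal sketch of one common way to distill from two teachers: linearly interpolate their next-word distributions and train the student with cross-entropy against that mixture. The function names, the `weight` parameter, and the reading of "emeritus" as the model trained on historical human transcripts and "adjunct" as a model trained on recent automated transcripts are assumptions made for illustration, not the authors' stated formulation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_teachers(p_emeritus, p_adjunct, weight):
    """Interpolate two teacher distributions over the same vocabulary.

    `weight` is the share given to the adjunct teacher; this linear
    mixture is an assumed combination rule, not taken from the paper.
    """
    return [
        weight * a + (1.0 - weight) * e
        for e, a in zip(p_emeritus, p_adjunct)
    ]


def distillation_loss(student_logits, p_emeritus, p_adjunct, weight=0.5):
    """Cross-entropy of the student against the mixed soft target."""
    target = mix_teachers(p_emeritus, p_adjunct, weight)
    student = softmax(student_logits)
    # Skip zero-probability targets; they contribute nothing to the sum.
    return -sum(t * math.log(s) for t, s in zip(target, student) if t > 0.0)


if __name__ == "__main__":
    # Toy three-word vocabulary; the numbers are made up for illustration.
    p_emeritus = [0.7, 0.2, 0.1]
    p_adjunct = [0.1, 0.3, 0.6]
    print(distillation_loss([0.0, 0.0, 0.0], p_emeritus, p_adjunct))
```

Because the loss only consumes the teachers' output distributions, the student can have any architecture, which matches the abstract's point that the approach is architecture agnostic.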

Topics

Machine Learning > Application Areas > Knowledge Distillation
Natural Language Processing > Generation > Language Modeling
Speech & Audio > Recognition > Speech Recognition
Machine Learning > Learning Types > Knowledge Distillation
Machine Learning > Learning Paradigms > Semi-Supervised Learning

Keywords

semi-supervised learning, knowledge distillation, speech recognition, language model adaptation, model adaptation, language model, temporal drift
