Continual Learning for Multi-Dialect Acoustic Models

Brady Houston; Katrin Kirchhoff

INTERSPEECH 2020

Abstract

Using data from multiple dialects has shown promise in improving neural network acoustic models. While such training can improve the performance of an acoustic model on a single dialect, it can also produce a model that performs well on multiple dialects. However, training an acoustic model on pooled data from multiple dialects takes a significant amount of time and computing resources, and the model must be retrained every time a new dialect is added. In contrast, sequential transfer learning (fine-tuning) does not require retraining on all data, but may result in catastrophic forgetting of previously seen dialects. Using data from four English dialects, we demonstrate that, with loss functions that mitigate catastrophic forgetting, sequential transfer learning can be used to train multi-dialect acoustic models that narrow the WER gap between the best case (combined training) and the worst case (fine-tuning) by up to 65%. Continual learning shows great promise in minimizing training time while approaching the performance of models that require much more training time.
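To make the "WER gap" figure concrete, the following is a minimal sketch of one natural reading of that metric: the fraction of the distance between the fine-tuning WER (worst case) and the combined-training WER (best case) that a continual-learning model recovers. The function name `gap_closed` and all WER values are illustrative assumptions, not the paper's code or results.

```python
def gap_closed(wer_finetune: float, wer_combined: float, wer_continual: float) -> float:
    """Fraction of the fine-tuning vs. combined-training WER gap that is recovered.

    Returns 0.0 if the continual model is no better than plain fine-tuning and
    1.0 if it matches combined training. (Hypothetical helper; this reading of
    the metric is an assumption.)
    """
    gap = wer_finetune - wer_combined
    if gap <= 0:
        raise ValueError("fine-tuning WER must exceed combined-training WER")
    return (wer_finetune - wer_continual) / gap


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
example = gap_closed(wer_finetune=20.0, wer_combined=10.0, wer_continual=13.5)
print(f"{example:.0%} of the gap closed")
```

Under this reading, a reduction of "up to 65%" means the continual-learning model's WER sits roughly two thirds of the way from the fine-tuning result toward the combined-training result.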

Topics

Machine Learning > Learning Types > Continual Learning
Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Learning Types > Transfer Learning

Keywords

continual learning, catastrophic forgetting, acoustic model, dialect recognition, sequential transfer learning
