Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation

M.A. Tuğtekin Turan; Emmanuel Vincent; Denis Jouvet

INTERSPEECH 2020

Abstract

Current automatic speech recognition (ASR) systems trained on native speech often perform poorly when applied to non-native or accented speech. In this work, we propose to compute x-vector-like accent embeddings and use them as auxiliary inputs to an acoustic model trained on native data only, in order to improve the recognition of multi-accent data comprising native, non-native, and accented speech. In addition, we leverage untranscribed accented training data by means of semi-supervised learning. Our experiments show that acoustic models trained with the proposed accent embeddings outperform those trained with conventional i-vector or x-vector speaker embeddings, and achieve a 15% relative word error rate (WER) reduction on non-native and accented speech w.r.t. acoustic models trained with regular spectral features only. Semi-supervised training using just 1 hour of untranscribed speech per accent yields an additional 15% relative WER reduction w.r.t. models trained on native data only.
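A common way to feed an utterance-level vector (such as an i-vector, x-vector, or the accent embedding described above) to a frame-level acoustic model as an "auxiliary input" is to repeat it for every frame and concatenate it to the spectral features. The sketch below illustrates that idea and the relative WER reduction metric quoted in the abstract. It is a minimal illustration under stated assumptions, not the authors' implementation: the embedding extractor is not shown, and the function names, feature dimensions, and WER values are hypothetical.

```python
import numpy as np


def append_auxiliary_embedding(frames: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Concatenate one utterance-level embedding to every acoustic frame.

    frames:    (num_frames, feat_dim) spectral features, e.g. filterbanks.
    embedding: (emb_dim,) utterance-level vector (hypothetical accent embedding).
    Returns:   (num_frames, feat_dim + emb_dim) augmented acoustic model input.
    """
    if frames.ndim != 2 or embedding.ndim != 1:
        raise ValueError("expected frames of shape (T, D) and embedding of shape (E,)")
    # Repeat the same embedding for every frame so it acts as a constant side input.
    tiled = np.tile(embedding, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to baseline_wer, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((200, 40))   # 200 frames of 40-dim features (illustrative)
    accent_emb = rng.standard_normal(32)     # hypothetical 32-dim accent embedding
    augmented = append_auxiliary_embedding(feats, accent_emb)
    print(augmented.shape)                   # (200, 72)
    # Illustrative numbers only, not results from the paper: 20% -> 17% WER.
    print(round(relative_wer_reduction(20.0, 17.0), 2))  # 0.15
```

Because the embedding is constant over the utterance, the network can use it to shift its internal representations per accent without any change to the frame-level feature pipeline.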

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Recognition > Automatic Speech Recognition
Machine Learning > Learning Types > Domain Adaptation
Machine Learning > Learning Paradigms > Semi-Supervised Learning

Keywords

unsupervised learning, semi-supervised learning, automatic speech recognition, speaker embedding, acoustic model, acoustic model adaptation, accent adaptation, accent embedding