A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems

Xiaoqiang Wang; Yanqing Liu; Sheng Zhao; Jinyu Li

INTERSPEECH 2021

Abstract

It is challenging to customize a transducer-based automatic speech recognition (ASR) system with context information that is dynamic and unavailable during model training. In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large context lists. Experiments show that the model improves on the baseline ASR model with about 50% relative word error rate reduction, and that it significantly outperforms baseline methods such as contextual LM biasing. The model also shows excellent performance on out-of-vocabulary terms not seen during training.
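To make the headline metric concrete, the following is a minimal sketch of how a word error rate (WER) and a relative WER reduction are conventionally computed. It is not taken from the paper: the function names `word_error_rate` and `relative_wer_reduction` and the example sentences are hypothetical, and the paper's own scoring setup may differ (for instance in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Fraction of the baseline WER removed by the corrected system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - corrected_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterance with a context-related name error.
    reference = "send a message to katarzyna"
    baseline = "send a message to catherine"   # ASR output before correction
    corrected = "send a message to katarzyna"  # output after spelling correction
    base_wer = word_error_rate(reference, baseline)
    corr_wer = word_error_rate(reference, corrected)
    print(base_wer, corr_wer, relative_wer_reduction(base_wer, corr_wer))
```

Under this definition, a 50% relative reduction means the corrected system makes half as many word errors as the baseline on the same test set, for example going from a WER of 10% to 5%.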

Topics

Deep Learning > Architectures > Neural Networks

Speech & Audio > Recognition > Automatic Speech Recognition

Machine Learning > Learning Types > Transfer Learning

Keywords

speech recognition, automatic speech recognition, spelling correction, context encoder, out-of-vocabulary term, transducer-based speech recognition, contextual spelling correction, language model biasing

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021