INTERSPEECH 2023

Model-Internal Slot-triggered Biasing for Domain Expansion in Neural Transducer ASR Models

Abstract

Personal rare word recognition is an important yet challenging task for end-to-end speech recognition. Contextual biasing has proven effective for this problem, but existing biasing mechanisms can introduce errors through false-biasing and are difficult to extend to many domains. To address these limitations, we propose a neural biasing design with a streaming, model-internal slot classifier that is trained to categorise the domain of each word piece before it is emitted. The neural biasing module can therefore be triggered in a controlled way, which permits natural scaling to many domains while reducing false-biasing and computational cost. Experiments on diverse domain slot types (application names, communications and playlist names) demonstrate that the proposed architecture yields 26% to 58% relative improvements in personal rare word recognition with minimal impact (0.6% relative) on general data.
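
As a rough illustration of the triggering idea described in the abstract, the following minimal sketch gates a per-domain biasing function on the output of a slot classifier. It is not the authors' implementation: the function name `slot_triggered_step`, the `"none"` label for general speech, the additive combination, and the 0.5 threshold are all assumptions made for this example, and the "biasers" are toy stand-ins for a neural biasing module.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def slot_triggered_step(
    hidden: Vector,
    slot_probs: Sequence[float],
    slot_names: Sequence[str],
    biasers: Dict[str, Callable[[Vector], Vector]],
    threshold: float = 0.5,  # hypothetical trigger threshold
) -> Tuple[Vector, Optional[str]]:
    """Apply at most one domain biaser, and only when the classifier fires.

    Returns the (possibly biased) hidden vector and the slot that fired,
    or None when biasing was skipped.
    """
    # Pick the most likely slot for the upcoming word piece.
    best = max(range(len(slot_probs)), key=lambda i: slot_probs[i])
    slot = slot_names[best]

    # Assumed convention: "none" marks general speech. Skipping the biaser
    # here is what avoids false-biasing and saves computation.
    if slot == "none" or slot_probs[best] < threshold or slot not in biasers:
        return list(hidden), None

    # Only the biaser for the predicted domain runs (additive combination
    # is an assumption of this sketch).
    bias = biasers[slot](hidden)
    return [h + b for h, b in zip(hidden, bias)], slot


# Toy biasers standing in for per-domain neural biasing modules.
toy_biasers = {
    "app_name": lambda h: [0.5 for _ in h],
    "playlist": lambda h: [-0.5 for _ in h],
}
slots = ["none", "app_name", "playlist"]

biased, fired = slot_triggered_step([1.0, 2.0], [0.1, 0.8, 0.1], slots, toy_biasers)
unbiased, skipped = slot_triggered_step([1.0, 2.0], [0.9, 0.05, 0.05], slots, toy_biasers)
```

In this sketch, adding a new domain only requires a new slot label and a new entry in the biaser dictionary, which mirrors the scaling argument made in the abstract; the per-step cost stays at one biaser call at most.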
