Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

Sai Muralidhar Jayanthi; Devang Kulshreshtha; Saket Dingliwal; Srikanth Ronanki; Sravan Bodapati

EMNLP 2023

Abstract

Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques have been used to improve the recognition of rare words and/or domain-specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, which restricts real-world usability. To address this, we first propose a “Retrieve and Copy” mechanism that improves latency while retaining accuracy even when scaled to a large catalog. We also propose a training strategy to overcome the degradation in recall that occurs at such scale because of the increased number of confusing entities. Overall, our approach achieves up to 6% more Word Error Rate reduction (WERR) and a 3.6% absolute improvement in F1 when compared to a strong baseline. Our method also allows for large catalog sizes of up to 20K without significantly affecting WER and F1 scores, while achieving at least a 20% inference speedup per acoustic frame.
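The abstract reports one relative metric (WERR) and one absolute metric (F1 improvement), which are easy to conflate. The sketch below shows the standard definitions of both, assuming WERR is the relative reduction in WER against a baseline and F1 is computed from entity-level counts. The function names and all numbers in the usage lines are hypothetical illustrations, not values or code from the paper.

```python
def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction, as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 from raw counts, using the identity F1 = 2TP / (2TP + FP + FN)."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0  # no predictions and no reference entities
    return 2 * true_pos / denominator


# Hypothetical numbers, chosen only to illustrate the two kinds of metric.
relative_gain = werr(baseline_wer=10.0, new_wer=8.0)   # 20.0 (% relative reduction)
baseline_f1 = f1_score(true_pos=80, false_pos=20, false_neg=20)   # 0.8
absolute_f1_gain = 100.0 * (0.836 - baseline_f1)  # about 3.6 points, absolute
```

A relative figure such as WERR scales with the baseline, so the same WERR corresponds to a smaller absolute WER change when the baseline is already low. An absolute F1 improvement is a plain difference in percentage points.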

Topics

Machine Learning > Core Methods > Metric Learning
Machine Learning > Application Areas > Knowledge Distillation
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition, word error rate, entity retrieval, contextual biasing, rare word recognition, named entity recognition
