Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

Bolaji Yusuf; Murali Karthick Baskar; Andrew Rosenberg; Bhuvana Ramabhadran

2024 INTERSPEECH INTERSPEECH 2024

Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

Abstract

This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performance and we propose a model which does SSR by combining a RNN-Transducer-based ASR system with an audio-prefixed language model (LM). The ASR system transcribes ongoing audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM, which speculates likely completions for the transcriptions. We experiment with a variety of ASR datasets on which show the efficacy our method and the feasibility of SSR as a method of reducing ASR latency.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing and Speech & Audio

🧭 Keyword Pioneer — speculative recognition

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Bolaji Yusuf , Murali Karthick Baskar , Andrew Rosenberg , Bhuvana Ramabhadran

Topics

Machine Learning > Optimization & Theory > Optimization Natural Language Processing > Resources & Methods > Large Language Models Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition language model low-rank adaptation speculative recognition

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024