Adding User Feedback To Enhance CB-Whisper

Raul Monteiro

INTERSPEECH 2024


Abstract

Contextual biasing has been shown to improve Whisper's recall of named entities and domain-specific words. A recent system, CB-Whisper, goes a step further by integrating an open-vocabulary keyword-spotting (OV-KWS) classifier that retrieves keywords from an external database to form a restricted biasing list. Because CB-Whisper relies heavily on text-to-speech (TTS) models to generate the keyword audio, it inherits their weaknesses on graphemes with non-trivial phonetic transcriptions. This work proposes an extension to CB-Whisper that uses user feedback to extend the keyword database with audio extracted from natural speech. We experiment with different learning strategies for the OV-KWS classifier to assess how well it generalizes across domains, namely between TTS-generated and natural-speech keyword audio, and to unseen languages.
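The abstract does not say how the OV-KWS classifier scores keywords or how user feedback is stored, so the following is only a hypothetical sketch of the data flow it describes. It assumes each keyword's reference audio (TTS-generated, or a natural-speech segment extracted after user feedback) has already been mapped to a fixed-size embedding, and it substitutes plain cosine similarity for the learned classifier. `KeywordDatabase`, `add_user_feedback`, `biasing_list`, `threshold` and `max_keywords` are illustrative names, not CB-Whisper's API, and how the resulting list conditions Whisper's decoding is outside this sketch.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class KeywordEntry:
    """A keyword and its reference audio embeddings (hypothetical representation)."""
    text: str
    tts_embeddings: list[list[float]] = field(default_factory=list)
    natural_embeddings: list[list[float]] = field(default_factory=list)

    def references(self) -> list[list[float]]:
        # Both TTS-generated and user-supplied natural-speech references are searched.
        return self.tts_embeddings + self.natural_embeddings


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; a stand-in for the learned OV-KWS classifier score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class KeywordDatabase:
    def __init__(self) -> None:
        self.entries: dict[str, KeywordEntry] = {}

    def add_tts(self, text: str, embedding: list[float]) -> None:
        # Reference audio synthesized from the keyword's graphemes by a TTS model.
        self.entries.setdefault(text, KeywordEntry(text)).tts_embeddings.append(embedding)

    def add_user_feedback(self, text: str, segment_embedding: list[float]) -> None:
        # The user confirms or corrects a keyword; the matching natural-speech
        # segment is stored as an extra reference for that keyword.
        self.entries.setdefault(text, KeywordEntry(text)).natural_embeddings.append(segment_embedding)

    def biasing_list(self, utterance_windows: list[list[float]],
                     threshold: float = 0.8, max_keywords: int = 10) -> list[str]:
        """Return the restricted biasing list: keywords spotted in the utterance."""
        if not utterance_windows:
            return []
        scored = []
        for entry in self.entries.values():
            refs = entry.references()
            if not refs:
                continue
            # Best match between any reference and any window of the utterance.
            score = max(cosine(r, w) for r in refs for w in utterance_windows)
            if score >= threshold:
                scored.append((score, entry.text))
        scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by text
        return [text for _, text in scored[:max_keywords]]
```

In this toy setup, a keyword whose TTS reference does not resemble how it is actually spoken is missed until a natural-speech reference is added through feedback, which mirrors the motivation given above.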



Topics

Machine Learning > Application Areas > Domain Adaptation
Machine Learning > Application Areas > Domain Generalization
Speech & Audio > Recognition > Speech Recognition

Keywords

domain generalization, speech recognition, keyword spotting, open vocabulary


Related papers

Reshape Dimensions Network for Speaker Recognition (2024)

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification (2024)

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch (2024)

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions (2024)

K-means and hierarchical clustering of f0 contours (2024)