INTERSPEECH 2024

Adding User Feedback To Enhance CB-Whisper

Abstract

Contextual biasing has been shown to improve Whisper's recall of named entities and domain-specific words. The recent CB-Whisper system goes a step further and integrates a classifier for open-vocabulary keyword spotting (OV-KWS) that retrieves keywords from an external database to form a restricted biasing list. Because CB-Whisper depends heavily on text-to-speech (TTS) models to generate the keyword audio, it inherits their weaknesses, notably on graphemes with non-trivial phonetic transcriptions. This work proposes an extension to CB-Whisper that leverages user feedback to extend the keyword database with audio extracted from natural speech. We experiment with different learning strategies for the OV-KWS classifier to assess its domain generalization capabilities on TTS-generated and natural-speech keyword audio and on unseen languages.

Authors