2024 INTERSPEECH INTERSPEECH 2024

Prompt Tuning for Speech Recognition on Unknown Spoken Name Entities

Abstract

This paper explores the challenge of recognising relevant but previously unheard named entities in spoken input. This scenario pertains to real-world applications where establishing an automatic speech recognition (ASR) model trained on new entity phrases may not be efficient. We propose a technique that involves fine-tuning a Whisper model with a list of entity phrases as prompts. We establish a task-specific dataset where stratification of different entity phrases supports evaluation of three different scenarios in which entities might be encountered. We focus our analysis on a seen-but-unheard scenario, reflecting a situation where only textual representations of novel entity phrases are available for a commercial banking assistant bot. We show that a model tuned to anticipate prompts reflecting novel named entities makes substantial improvements in entity recall over non-tuned baseline models, and meaningful improvements in performance over models fine-tuned without a prompt.

🧭 Keyword Pioneer — unknown spoken entity
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Natural Language Processing, Reinforcement Learning, Speech & Audio
🌉 Interdisciplinary Bridge — Natural Language Processing and Speech & Audio