Towards interfacing large language models with ASR systems using confidence measures and prompting

Maryam Naderi; Enno Hermann; Alexandre Nanchen; Sevada Hovsepyan; Mathew Magimai.-Doss

INTERSPEECH 2024

Abstract

As large language models (LLMs) grow in parameter size and capabilities, such as interaction through prompting, they open up new ways of interfacing with automatic speech recognition (ASR) systems beyond rescoring n-best lists. This work investigates post-hoc correction of ASR transcripts with LLMs. To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods. Our results indicate that this can improve the performance of less competitive ASR systems.
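The confidence-based filtering idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Hypothesis` container, the mean-of-word-confidences aggregation, the 0.9 threshold, and the `corrector` callable standing in for an LLM prompt are hypothetical and are not taken from the paper, which proposes a range of filtering methods that the abstract does not detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """An ASR transcript with per-word confidence scores (assumed format)."""
    text: str
    word_confidences: List[float]  # one score in [0, 1] per word


def utterance_confidence(hyp: Hypothesis) -> float:
    """Aggregate word scores by their mean (an assumed choice of aggregation)."""
    if not hyp.word_confidences:
        return 0.0  # no evidence: treat the transcript as uncertain
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def correct_if_uncertain(
    hyp: Hypothesis,
    corrector: Callable[[str], str],
    threshold: float = 0.9,  # hypothetical value, would need tuning on held-out data
) -> str:
    """Send only low-confidence transcripts to the corrector.

    Transcripts at or above the threshold are returned unchanged, so the
    corrector cannot introduce errors into likely accurate output.
    """
    if utterance_confidence(hyp) >= threshold:
        return hyp.text
    return corrector(hyp.text)
```

In use, `corrector` would wrap a prompted LLM call. Keeping it as a plain callable means the filtering logic can be checked independently of any particular model or API.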

Authors

Maryam Naderi, Enno Hermann, Alexandre Nanchen, Sevada Hovsepyan, Mathew Magimai.-Doss

Topics

Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Generation > Text Generation
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

zero-shot learning, prompt engineering, automatic speech recognition, confidence measure, post-hoc correction, large language model
