INTERSPEECH 2024

Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses

Abstract

This paper investigates using pre-trained large language models (LLMs) to improve multilingual automatic speech recognition (ASR) outputs. Current popular methods feed the N-best ASR output into LLMs. Although this approach improves results, obtaining N-best hypotheses is time-consuming, and they are sometimes unavailable. To develop a more general method, this paper investigates LLM-based ASR error correction with 1-best hypotheses. We fine-tuned a multilingual LLM covering more than 100 languages and used it to correct errors in 1-best hypotheses from different speech foundation models. The experiments show that the proposed method effectively improves ASR results using only 1-best hypotheses. We also observed that knowledge transfer in the LLM between languages that share a writing system can effectively correct hypotheses for low-resource languages.
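
As an illustration of the setup described above, the following is a minimal sketch of how 1-best hypotheses might be paired with reference transcripts for fine-tuning, and how a correction could be scored by word error rate (WER). The abstract does not specify the data format or the evaluation metric, so the function names (`make_training_pairs`, `wer`), the `input`/`target` field names, and the choice of WER are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Token-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def make_training_pairs(samples: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (1-best hypothesis, reference) tuples into source/target records.

    The field names are hypothetical; any sequence-to-sequence fine-tuning
    pipeline would define its own schema.
    """
    return [{"input": hyp, "target": ref} for hyp, ref in samples]


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    one_best = "the cat sit on mat"
    pairs = make_training_pairs([(one_best, reference)])
    print(pairs[0])
    print(f"WER before correction: {wer(reference, one_best):.3f}")
```

Whitespace tokenization is only reasonable for languages that separate words with spaces; for other writing systems, a character-level error rate computed with the same `edit_distance` over characters is the usual substitute.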
