Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses

Sheng Li; Chen Chen; Chin Yuen Kwok; Chenhui Chu; Eng Siong Chng; Hisashi Kawai

INTERSPEECH 2024

Abstract

This paper investigates using pre-trained large language models (LLMs) to improve multilingual automatic speech recognition (ASR) outputs. Currently popular methods feed the N-best ASR output into LLMs. Although this approach improves results, N-best hypotheses are time-consuming to obtain and are sometimes unavailable. To develop a more general method, this paper investigates LLM-based ASR error correction with 1-best hypotheses. We fine-tuned a multilingual LLM covering more than 100 languages and used it to correct errors in the 1-best hypotheses produced by different speech foundation models. The experiments show that the proposed method effectively improves the ASR results using only 1-best hypotheses. Moreover, we observed that knowledge transfer within the LLM between languages that share a writing system can effectively correct hypotheses for low-resource languages.
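As a rough illustration of the 1-best setup described in the abstract, the sketch below pairs each 1-best hypothesis with its reference transcript to form fine-tuning examples for a text-to-text model. The prompt wording, the language tag, and the names `CorrectionExample`, `build_example`, and `build_dataset` are assumptions made for illustration; the abstract does not specify the input format the authors used.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class CorrectionExample:
    source: str  # text given to the model (prompt plus 1-best hypothesis)
    target: str  # reference transcript the model should produce


def build_example(hypothesis: str, reference: str, lang: str) -> CorrectionExample:
    # Hypothetical prompt template; the paper's actual format is not given here.
    prompt = f"Correct the ASR transcript ({lang}): {hypothesis.strip()}"
    return CorrectionExample(source=prompt, target=reference.strip())


def build_dataset(rows: Iterable[Tuple[str, str, str]]) -> List[CorrectionExample]:
    # Each row is (1-best hypothesis, reference transcript, language code).
    # Rows with an empty hypothesis or reference are skipped.
    return [
        build_example(hyp, ref, lang)
        for hyp, ref, lang in rows
        if hyp.strip() and ref.strip()
    ]


if __name__ == "__main__":
    rows = [
        ("i red the book yesterday", "i read the book yesterday", "en"),
        ("", "empty hypothesis is dropped", "en"),
    ]
    for example in build_dataset(rows):
        print(example.source, "->", example.target)
```

Only one hypothesis per utterance is needed, so this preparation step works with any recognizer that returns a single transcript, which is the motivation the abstract gives for moving away from N-best inputs.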

Topics

Natural Language Processing > Applications > Machine Translation

Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

speech foundation model, ASR error correction, multilingual ASR, large language model, 1-best hypothesis
