INTERSPEECH 2024

Online Knowledge Distillation of Decoder-Only Large Language Models for Efficient Speech Recognition

Abstract

Large language models (LLMs), which show promising performance in generation tasks, have proven applicable to a wide range of tasks. Although there are several approaches to adapting LLMs as decoders in speech recognition, these approaches can slow down inference, which is an important issue for product-level systems. To address this problem, we introduce online knowledge distillation methods that transfer information from decoder-only LLMs to a more compact Transformer decoder during the training phase. Applying our proposed methods to a multilingual low-resource dataset, we achieved an 8.2% relative character error rate (CER) reduction compared to the LLM decoder model, at much lower inference cost, and a 34.7% relative CER reduction compared to the attention-based encoder-decoder (AED) model. Furthermore, we obtained a 14.9% relative CER reduction at the same inference cost on a general Korean dataset.
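
As a rough illustration of the idea of transferring information from a large teacher decoder to a compact student decoder, the sketch below computes a generic per-token distillation loss: a cross-entropy term on the reference token plus a temperature-scaled KL term that pulls the student distribution toward the teacher's. This is the textbook formulation, not necessarily the loss used in the paper. The function names, the `alpha` and `temperature` values, and the assumption that teacher and student share one output vocabulary are all hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.5, temperature=2.0):
    """Generic per-token distillation loss (illustrative, not the paper's).

    Assumes both decoders emit logits over the same vocabulary.
    alpha weights the soft (teacher) term against the hard-label term.
    """
    # Hard-label term: cross-entropy of the student on the reference token.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[target_index])

    # Soft term: KL(teacher || student) on temperature-softened distributions.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = sum(
        t * math.log(t / s)
        for t, s in zip(teacher_soft, student_soft)
        if t > 0.0
    )

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return (1.0 - alpha) * hard_loss + alpha * temperature ** 2 * soft_loss
```

In an online setting, the teacher logits would be produced in the same training step as the student logits rather than precomputed. Only the student decoder would then be needed at inference time, which is where the cost saving would come from.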


Authors