Online Knowledge Distillation of Decoder-Only Large Language Models for Efficient Speech Recognition
Abstract
Large language models (LLMs), which show promising performance on generation tasks, have proven applicable to a wide range of tasks. Although several approaches adapt LLMs as decoders for speech recognition, they can slow down inference, which is an important issue for product-level systems. To address this problem, we introduce online knowledge distillation methods that transfer information from a decoder-only LLM to a more compact Transformer decoder during the training phase. Applying the proposed methods to a multilingual low-resource dataset, we achieved an 8.2% relative character error rate (CER) reduction compared to the LLM decoder model at a much lower inference cost, and a 34.7% relative CER reduction compared to the attention-based encoder-decoder (AED) model. Furthermore, we obtained a 14.9% relative CER reduction at the same inference cost on a general Korean dataset.
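As a rough illustration of what transferring information from a teacher decoder to a student decoder can look like, the sketch below computes a generic token-level distillation loss: the KL divergence between the teacher's and the student's output distributions, averaged over token positions. This is the standard soft-label formulation, not necessarily the loss used in this work. The function names, the `temperature` parameter, and the assumption that both decoders score the same vocabulary are hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one vector of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean over token positions of KL(teacher || student).

    Both arguments are lists of per-token logit vectors. Assumption for this
    sketch: teacher and student share one vocabulary, so the vectors align.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)  # teacher distribution (soft target)
        q = softmax(s_row, temperature)  # student distribution
        total += sum(
            pi * (math.log(pi) - math.log(qi))
            for pi, qi in zip(p, q)
            if pi > 0.0
        )
    return total / len(teacher_logits)


# Toy logits for two token positions over a three-symbol vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
loss = kd_loss(teacher, student, temperature=2.0)
```

In an online setting, a term of this kind would typically be added to the student's usual recognition loss while the teacher is being trained in the same run. The teacher is needed only during training, so inference cost is that of the compact decoder alone.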