Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition

Eesung Kim; Aditya Jajodia; Cindy Tseng; Divya Neelagiri; Taeyeon Ki; Vijendra Raj Apsingekar

INTERSPEECH 2023

Abstract

In production scenarios that require frequent changes, it is inefficient to repeatedly train and update an entire end-to-end (E2E) model for spoken language understanding (SLU). In this paper, we present a study on efficiently adapting E2E SLU models built on a pre-trained automatic speech recognition (ASR) model. Specifically, we propose an ASR-based E2E SLU model that integrates an additional decoder for SLU and a fusion module that combines the acoustic representation from the shared encoder with the text transcript representation from the ASR decoder. Furthermore, we investigate the effectiveness of an adapter module that fine-tunes only a small number of parameters for semantic and transcript predictions. The experimental results show that the proposed model outperforms other competitive baselines in intent accuracy, SLU F1 score, and word error rate (WER) on the FSC, SLURP, and Samsung in-house SLU datasets.
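The abstract does not spell out the internals of the fusion or adapter modules, so the following is only a minimal pure-Python sketch under stated assumptions. It assumes a standard residual bottleneck adapter (down-projection, ReLU, up-projection, residual add) and a simple fusion in which each ASR-decoder token state attends over the shared-encoder frames with single-head scaled dot-product attention, with the attended context concatenated to the token state. All names (`BottleneckAdapter`, `attend`, `fuse`) and dimensions are hypothetical and are not the authors' implementation. `num_params` illustrates why adapters are cheap: with a model dimension of 512 and a bottleneck of 32, one adapter has 2·512·32 + 32 + 512 = 33,312 parameters, versus 262,656 for a single full 512×512 linear layer with bias.

```python
import math
import random

# Hypothetical sketch only: a generic residual bottleneck adapter and a generic
# attention-based fusion of acoustic and text representations.


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used as illustrative weights or features."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class BottleneckAdapter:
    """h -> h + W_up @ relu(W_down @ h + b_down) + b_up."""

    def __init__(self, d_model, bottleneck, rng):
        self.w_down = rand_matrix(bottleneck, d_model, rng)  # bottleneck x d_model
        self.b_down = [0.0] * bottleneck
        # Zero-initialised up-projection: the adapter starts as an identity map.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]  # d_model x bottleneck
        self.b_up = [0.0] * d_model

    def num_params(self):
        """Trainable parameters: two projection matrices plus their biases."""
        d, r = len(self.w_up), len(self.w_down)
        return 2 * d * r + r + d

    def __call__(self, h):
        z = [max(0.0, v + b) for v, b in zip(matvec(self.w_down, h), self.b_down)]
        up = [v + b for v, b in zip(matvec(self.w_up, z), self.b_up)]
        return [hi + ui for hi, ui in zip(h, up)]  # residual connection


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def fuse(acoustic, text):
    """Each text token attends over acoustic frames; concatenate token and context."""
    return [tok + attend(tok, acoustic, acoustic) for tok in text]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, bottleneck = 8, 2
    adapter = BottleneckAdapter(d_model, bottleneck, rng)
    acoustic = rand_matrix(5, d_model, rng)  # 5 shared-encoder frames
    text = rand_matrix(3, d_model, rng)  # 3 ASR-decoder token states
    fused = fuse(acoustic, [adapter(t) for t in text])
    print(len(fused), len(fused[0]))  # prints "3 16"
```

Zero-initialising the up-projection makes each adapter an identity map at the start of fine-tuning, a common choice so that the frozen pre-trained ASR model's behaviour is preserved until only the small set of adapter weights is updated.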