INTERSPEECH 2024

Boosting CTC-based ASR using inter-layer attention-based CTC loss

Abstract

This paper addresses improving the performance of CTC-based models by leveraging the intermediate outputs of all encoder layers with an attention mechanism. Several previous studies have used the intermediate outputs of encoder layers to modify CTC-based models. Here, we focus on the role of each Transformer encoder layer: for each encoder layer, two CTC losses are computed by weighting the intermediate outputs of its lower and upper layers with an attention mechanism. Dividing the layers into two groups in this way is expected to let the loss take both acoustic and linguistic features into account. Experimental results show that the proposed method improves on the baseline recognition performance on the TEDLIUM2 speech data, achieving a WER of 9.9% on the dev set and 11.8% on the test set. Our method outperforms the conventional methods in WER with only a slight increase in inference time, as measured by RTF.
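
The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the attention weights are a softmax over one scalar score per layer, and that the layers are split at a single index into a lower and an upper group. The names `attend_over_layers`, `split_and_fuse`, and `split` are hypothetical, and the scores are fixed numbers here, whereas in a real model they would be produced by a learned attention module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_layers(layer_outputs, scores):
    """Fuse per-layer features with attention weights (assumed form).

    layer_outputs: list of layers; each layer is a list of frames;
                   each frame is a list of floats of equal length.
    scores:        one scalar score per layer (hypothetical stand-in
                   for a learned attention score).
    Returns the weighted sum over layers, with shape [frames][dim].
    """
    weights = softmax(scores)
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    fused = [[0.0] * dim for _ in range(num_frames)]
    for w, layer in zip(weights, layer_outputs):
        for t in range(num_frames):
            for d in range(dim):
                fused[t][d] += w * layer[t][d]
    return fused


def split_and_fuse(layer_outputs, scores, split):
    """Fuse the lower group (layers before `split`) and the upper group
    (layers from `split` onward) separately. The split point is an
    assumption made for illustration only."""
    lower = attend_over_layers(layer_outputs[:split], scores[:split])
    upper = attend_over_layers(layer_outputs[split:], scores[split:])
    return lower, upper


# Toy example: 4 encoder layers, 2 frames, feature dimension 2.
layer_outputs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 3.0]],
    [[5.0, 2.0], [2.0, 5.0]],
    [[7.0, 4.0], [4.0, 7.0]],
]
scores = [0.0, 0.0, 0.0, 0.0]  # equal scores give a plain average per group
lower, upper = split_and_fuse(layer_outputs, scores, split=2)
```

In a setup like this, each fused representation would be projected to the output vocabulary and passed to its own CTC loss, giving the two losses mentioned in the abstract. How those losses are weighted and combined is not stated there, so it is left out of the sketch.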
