2022 INTERSPEECH INTERSPEECH 2022

Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism

Abstract

This work addresses automatic speech recognition (ASR) of a low-resource language using a translation corpus, which includes the simultaneous translation of the low-resource language. In multi-lingual events such as international meetings and court proceedings, simultaneous interpretation by a human is often available for speeches of low-resource languages. In this setting, we can assume that the content of its back-translation is the same as the transcription of the original speech. Thus, the former is expected to enhance the later process. We formulate this framework as a joint process of ASR and machine translation (MT) and implement it with a combination of cross attention mechanisms of the ASR encoder and the MT encoder. We evaluate the proposed method using the spoken language translation corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC), achieving a significant improvement in the ASR word error rate (WER) of Khmer by 8.9% relative. The effectiveness is also confirmed in the Fisher-CallHome-Spanish corpus with the reduction of WER in Spanish by 1.7% relative.

🌉 Interdisciplinary Bridge — Natural Language Processing and Speech & Audio
🧭 Keyword Pioneer — cross attention
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio