Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism

Kak Soky; Sheng Li; Masato Mimura; Chenhui Chu; Tatsuya Kawahara

INTERSPEECH 2022

Abstract

This work addresses automatic speech recognition (ASR) of a low-resource language using a translation corpus that includes simultaneous translation of the low-resource language. In multilingual events such as international meetings and court proceedings, simultaneous interpretation by a human is often available for speech in low-resource languages. In this setting, we can assume that the content of the back-translation of the interpretation is the same as the transcription of the original speech. Thus, the former is expected to enhance the latter. We formulate this framework as a joint process of ASR and machine translation (MT) and implement it by combining the cross attention mechanisms over the ASR encoder and the MT encoder. We evaluate the proposed method on the spoken language translation corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC), achieving a significant relative reduction of 8.9% in the ASR word error rate (WER) for Khmer. The effectiveness is also confirmed on the Fisher-CallHome-Spanish corpus, with a relative WER reduction of 1.7% for Spanish.
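The combination of cross attention over two encoders can be illustrated with a minimal sketch. The abstract does not say how the two attention outputs are merged, so the example below assumes a plain element-wise sum of the two context vectors. It also omits the learned query, key, and value projections and the multi-head structure a real Transformer decoder would use. All names (`cross_attention`, `fused_context`, `asr_memory`, `mt_memory`) are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, memory):
    """Scaled dot-product attention of one decoder query over encoder states."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, state)) / math.sqrt(d)
        for state in memory
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, memory))
        for i in range(d)
    ]
    return context, weights


def fused_context(query, asr_memory, mt_memory):
    """Attend to both encoders, then merge the two context vectors.

    ASSUMPTION: the merge is an element-wise sum; the source only states
    that the two cross attention mechanisms are combined.
    """
    asr_ctx, asr_w = cross_attention(query, asr_memory)
    mt_ctx, mt_w = cross_attention(query, mt_memory)
    fused = [a + m for a, m in zip(asr_ctx, mt_ctx)]
    return fused, asr_w, mt_w


def random_states(n, d, rng):
    """Stand-in for encoder outputs: n random vectors of dimension d."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
D_MODEL = 8
asr_memory = random_states(20, D_MODEL, rng)  # e.g. 20 speech frames
mt_memory = random_states(6, D_MODEL, rng)    # e.g. 6 translation tokens
query = random_states(1, D_MODEL, rng)[0]     # one decoder state

fused, asr_w, mt_w = fused_context(query, asr_memory, mt_memory)
print(len(fused), len(asr_w), len(mt_w))  # prints: 8 20 6
```

Each decoder step thus sees evidence from both the speech frames and the translation tokens. Other merge choices, such as concatenation followed by a projection or a learned gate, would fit the same outline.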

Topics

Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Automatic Speech Recognition
Deep Learning > Learning Types > Deep Learning

Keywords

machine translation, automatic speech recognition, low-resource language, cross attention, simultaneous translation, word error rate, cross attention mechanism
