ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation

Yuka Ko; Katsuhito Sudoh; Sakriani Sakti; Satoshi Nakamura

INTERSPEECH 2021

Abstract

End-to-end speech translation (ST) translates source-language speech directly into the target language without the intermediate automatic speech recognition (ASR) output that a cascading approach relies on. End-to-end ST avoids error propagation from intermediate ASR results, but its performance still lags behind the cascading approach. A recent effort to close this gap is multi-task learning with ASR as an auxiliary task. However, previous multi-task learning for end-to-end ST computes the cross-entropy (CE) loss of the ASR task against one-hot references and does not account for ASR confusion. In this study, we propose a novel end-to-end ST training method that computes the ASR loss against ASR posterior distributions given by a pre-trained model, which we call ASR posterior-based loss. The proposed method is expected to account for possible ASR confusion caused by competing hypotheses with similar pronunciations. In our Fisher Spanish-to-English translation experiments, the proposed method achieved better BLEU results than the baseline trained with standard CE loss with label smoothing.
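
The contrast drawn in the abstract, between ASR targets built from one-hot references and targets taken from a pre-trained model's posterior, can be illustrated with a small sketch. The abstract does not state the exact loss formula, so the code below assumes a cross-entropy between a target distribution and the model's output distribution, a uniform variant of label smoothing, and a simple weighted sum of the ST and ASR losses. All function names, the `epsilon` value, and `asr_weight` are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax(logits):
    """Softmax computed through log_softmax for stability."""
    return [math.exp(lp) for lp in log_softmax(logits)]


def soft_cross_entropy(student_logits, target_probs):
    """Mean over positions of -sum_v target(v) * log student(v)."""
    total = 0.0
    for logits, probs in zip(student_logits, target_probs):
        log_q = log_softmax(logits)
        total -= sum(p * lq for p, lq in zip(probs, log_q))
    return total / len(student_logits)


def label_smoothed_targets(labels, vocab_size, epsilon=0.1):
    """Baseline-style targets: one-hot references mixed with a uniform
    distribution (assumed smoothing variant, not taken from the paper)."""
    targets = []
    for label in labels:
        row = [epsilon / vocab_size] * vocab_size
        row[label] += 1.0 - epsilon
        targets.append(row)
    return targets


def posterior_targets(teacher_logits):
    """Posterior-style targets: the output distribution of a pre-trained
    ASR model, here represented by hypothetical teacher logits."""
    return [softmax(row) for row in teacher_logits]


def multitask_loss(st_loss, asr_loss, asr_weight=0.5):
    """Assumed multi-task objective: weighted sum of ST and ASR losses."""
    return (1.0 - asr_weight) * st_loss + asr_weight * asr_loss


# Toy example: 2 output positions, vocabulary of 4 tokens.
student_logits = [[2.0, 1.5, -1.0, -1.0], [0.0, 0.2, 3.0, -2.0]]
teacher_logits = [[2.2, 2.0, -3.0, -3.0], [-1.0, 0.0, 4.0, -2.0]]
references = [0, 2]

baseline_asr_loss = soft_cross_entropy(
    student_logits, label_smoothed_targets(references, vocab_size=4))
posterior_asr_loss = soft_cross_entropy(
    student_logits, posterior_targets(teacher_logits))
```

In this toy setup the teacher assigns nearly equal mass to tokens 0 and 1 at the first position, so the posterior targets encode that confusion, whereas the label-smoothed targets spread the smoothing mass uniformly over all tokens regardless of how confusable they are.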

Topics

Machine Learning > Optimization & Theory > Loss Functions
Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Automatic Speech Recognition
Machine Learning > Learning Types > Multi-Task Learning

Keywords

multi-task learning, automatic speech recognition, label smoothing, posterior distribution, end-to-end speech translation, neural network, ASR posterior

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021