Self-Training for End-to-End Speech Translation

Juan Pino; Qiantong Xu; Xutai Ma; Mohammad Javad Dousti; Yun Tang

INTERSPEECH 2020

Abstract

One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade model and by an end-to-end speech translation model. This provides gains of 8.3 and 5.7 BLEU over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. We also investigate how the quality of the pseudo-labels affects results. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by generating pseudo-labels directly with an end-to-end model instead of a cascade model.
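The abstract describes self-training at a high level: a teacher (a cascade of speech recognition followed by machine translation, or an end-to-end model) translates unlabeled audio, and the resulting pseudo-labels are added to the labeled data to train an end-to-end student. The sketch below illustrates only that data flow, under stated assumptions. All names (`make_cascade`, `pseudo_label`, `self_train`, `memorize`) are hypothetical, the models are toy dictionary lookups, and the "training" step is a stand-in lookup table rather than a neural network. It is not the authors' implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases: "audio" is any hashable object here (real systems
# would use feature arrays); a translation is a target-language string.
Audio = object
Pair = Tuple[Audio, str]
STModel = Callable[[Audio], str]


def make_cascade(asr: Callable[[Audio], str], mt: Callable[[str], str]) -> STModel:
    """Compose an ASR model and an MT model into a cascade speech translator."""
    def cascade(audio: Audio) -> str:
        transcript = asr(audio)  # speech -> source-language text
        return mt(transcript)    # source text -> target-language text
    return cascade


def pseudo_label(teacher: STModel, unlabeled: List[Audio]) -> List[Pair]:
    """Translate each unlabeled utterance with the teacher to get (audio, pseudo-label) pairs."""
    return [(audio, teacher(audio)) for audio in unlabeled]


def self_train(
    labeled: List[Pair],
    unlabeled: List[Audio],
    teacher: STModel,
    train_fn: Callable[[List[Pair]], STModel],
) -> STModel:
    """One round of self-training: pseudo-label, merge with real labels, train a student."""
    pseudo = pseudo_label(teacher, unlabeled)
    return train_fn(labeled + pseudo)


def memorize(pairs: List[Pair]) -> STModel:
    """Toy stand-in for training: the 'student' just looks up the audio it has seen."""
    table = dict(pairs)
    return lambda audio: table.get(audio, "")


# Toy English-to-French components (hypothetical data).
asr = {"a1": "hello", "a2": "thank you"}.get
mt = {"hello": "bonjour", "thank you": "merci"}.get
labeled = [("a0", "salut")]
unlabeled = ["a1", "a2"]

# Round 1: a cascade teacher produces the pseudo-labels.
teacher = make_cascade(asr, mt)
student = self_train(labeled, unlabeled, teacher, memorize)

# Round 2: the end-to-end student itself acts as the teacher.
student2 = self_train(labeled, unlabeled, student, memorize)

print(student("a1"), student2("a2"))  # bonjour merci
```

Passing `student` back in as the teacher mirrors the abstract's final point: pseudo-labels can come directly from an end-to-end model instead of a cascade. A real system would replace `memorize` with actual model training and would typically care about pseudo-label quality, which the abstract says the paper investigates.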

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Speech Recognition
Natural Language Processing > Generation > Machine Translation
Machine Learning > Learning Paradigms > Self-Supervised Learning

Keywords

semi-supervised learning, pseudo-label generation, cascade model, speech translation, end-to-end speech translation, speech recognition, pre-training
