2020
INTERSPEECH
Self-Training for End-to-End Speech Translation
Abstract
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.
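The pseudo-labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`make_cascade`, `pseudo_label`, `self_training_data`) are hypothetical, the "models" are toy lookup tables standing in for real ASR and MT systems, and combining the pseudo-labeled pairs with the original labeled pairs is an assumption about how the student's training set is built.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: an "audio clip" is just a string ID here.
Audio = str
Example = Tuple[Audio, str]  # (audio, target-language text)


def make_cascade(asr: Callable[[Audio], str],
                 mt: Callable[[str], str]) -> Callable[[Audio], str]:
    """Compose ASR and MT into a cascade speech translation teacher."""
    return lambda audio: mt(asr(audio))


def pseudo_label(teacher: Callable[[Audio], str],
                 unlabeled: List[Audio]) -> List[Example]:
    """Translate each unlabeled clip with the teacher to get pseudo-labels."""
    return [(audio, teacher(audio)) for audio in unlabeled]


def self_training_data(labeled: List[Example],
                       teacher: Callable[[Audio], str],
                       unlabeled: List[Audio]) -> List[Example]:
    """Labeled pairs followed by pseudo-labeled pairs (assumed union)."""
    return labeled + pseudo_label(teacher, unlabeled)


# Toy lookup tables in place of real models (illustrative only).
toy_asr = {"clip_1": "hello", "clip_2": "thank you"}.get
toy_mt = {"hello": "bonjour", "thank you": "merci"}.get

teacher = make_cascade(toy_asr, toy_mt)
train_set = self_training_data(
    [("clip_0", "au revoir")], teacher, ["clip_1", "clip_2"]
)
```

In this sketch an end-to-end teacher would simply be a different callable passed in place of `teacher`, which mirrors the abstract's comparison between cascade-generated and end-to-end-generated pseudo-labels.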
Authors
Topics
Machine Learning > Learning Types > Semi-Supervised Learning
Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Speech Recognition
Natural Language Processing > Generation > Machine Translation
Machine Learning > Learning Paradigms > Self-Supervised Learning