End-to-End Speech Translation with Knowledge Distillation

Yuchen Liu; Hao Xiong; Jiajun Zhang; Zhongjun He; Hua Wu; Haifeng Wang; Chengqing Zong

INTERSPEECH 2019

Abstract

End-to-end speech translation (ST), which translates source-language speech directly into target-language text, has attracted intensive attention in recent years. Compared to conventional pipeline systems, end-to-end ST models have the potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model, which combines automatic speech recognition (ASR) and machine translation (MT). In this paper, we propose a knowledge distillation approach to improve ST by transferring knowledge from text translation. Specifically, we first train a text translation model, regarded as the teacher model, and then train the ST model to learn the output probabilities of the teacher model through knowledge distillation. Experiments on the English-French Augmented LibriSpeech and English-Chinese TED corpora show that end-to-end ST is feasible for both similar and dissimilar language pairs. In addition, with the guidance of the teacher model, the end-to-end ST model gains significant improvements of over 3.5 BLEU points.
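As a rough illustration of the idea of training a student to match a teacher's output probabilities, the following is a minimal sketch of a token-level distillation loss in plain Python. The abstract does not give the exact loss, so everything here is an assumption: the function names, the interpolation weight `lam` between the teacher term and the reference term, and the `temperature` applied to the teacher logits are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Return log-probabilities for one position's logits (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(student_logits, teacher_logits, target_ids,
                      lam=0.8, temperature=1.0):
    """Average per-token loss mixing a teacher term and a reference term.

    student_logits, teacher_logits: lists of per-position logit lists (T x V).
    target_ids: reference token ids, one per position (length T).
    lam: hypothetical weight on the teacher term (assumption, not from the paper).
    temperature: hypothetical softening of the teacher distribution (assumption).
    """
    total = 0.0
    for s_logits, t_logits, y in zip(student_logits, teacher_logits, target_ids):
        log_p_student = log_softmax(s_logits)
        q_teacher = [math.exp(v) for v in log_softmax(t_logits, temperature)]
        # Cross-entropy of the student against the teacher's soft distribution.
        kd = -sum(q * lp for q, lp in zip(q_teacher, log_p_student))
        # Negative log-likelihood of the reference token under the student.
        nll = -log_p_student[y]
        total += lam * kd + (1.0 - lam) * nll
    return total / len(target_ids)


if __name__ == "__main__":
    # Toy example: 2 target positions, vocabulary of 3 tokens.
    teacher = [[2.0, 0.0, -1.0], [0.0, 3.0, 0.0]]   # stands in for the MT teacher
    student = [[1.0, 0.5, -0.5], [0.2, 1.0, 0.1]]   # stands in for the ST student
    print(distillation_loss(student, teacher, target_ids=[0, 1]))
```

In this sketch, setting `lam=0.0` reduces the loss to ordinary cross-entropy against the reference translation, while `lam=1.0` trains the student purely on the teacher's distribution. In a real system the logits would come from the ST and MT decoders at each target position.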

Topics

Machine Learning > Application Areas > Knowledge Distillation

Natural Language Processing > Applications > Machine Translation

Speech & Audio > Recognition > Speech Recognition

Keywords

knowledge distillation, machine translation, automatic speech recognition, speech translation, neural network

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

VAE-Based Regularization for Deep Speaker Embedding 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019