Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data

Yuhao Zhang; Chen Xu; Bojie Hu; Chunliang Zhang; Tong Xiao; Jingbo Zhu

AAAI 2023


Abstract

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the model's ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). As a result, the speech translation model can learn from both unlabeled and labeled data, especially when source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can handle both normal and noisy text data. Our system sets new state-of-the-art results on the MuST-C En-De, MuST-C En-Fr, and LibriSpeech En-Fr tasks.
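The abstract does not specify how the text is corrupted for the denoising method. As a hypothetical illustration only, one common way to train a text encoder that tolerates noisy input is to corrupt the clean source-language transcript (here by randomly masking or deleting tokens) and pair the corrupted version with the clean one. The names `add_text_noise` and `make_denoising_pair`, the `<mask>` token, and the noise probabilities are all assumptions for this sketch, not the authors' procedure.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, assumed for this sketch


def add_text_noise(tokens, mask_prob, delete_prob, rng, mask_token=MASK):
    """Corrupt a token list by random deletion and masking (illustrative noise model)."""
    if mask_prob < 0 or delete_prob < 0 or mask_prob + delete_prob > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    noisy = []
    for tok in tokens:
        r = rng.random()  # uniform in [0, 1)
        if r < delete_prob:
            continue  # drop the token entirely
        if r < delete_prob + mask_prob:
            noisy.append(mask_token)  # replace the token with a mask symbol
        else:
            noisy.append(tok)  # keep the token unchanged
    return noisy


def make_denoising_pair(tokens, rng, mask_prob=0.15, delete_prob=0.1):
    """Return (noisy_input, clean_target); the default rates are assumptions."""
    return add_text_noise(tokens, mask_prob, delete_prob, rng), list(tokens)


if __name__ == "__main__":
    clean = "the cat sat on the mat".split()
    noisy, target = make_denoising_pair(clean, random.Random(0))
    print(noisy, target)
```

Using a seeded `random.Random` instance keeps the corruption reproducible, and returning the clean sequence alongside the noisy one lets a training loop feed both normal and noisy text to the encoder.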



Topics

Artificial Intelligence > Core AI > Multimodal Learning; Artificial Intelligence > Learning Paradigms > Transfer Learning; Deep Learning > Architectures > Transformers; Deep Learning > Techniques > Transfer Learning; Speech & Audio > Recognition > Speech Translation

Keywords

transfer learning, speech processing, text encoder, end-to-end translation, speech translation, denoising method, modality adaptation

