Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings

Junkun Chen; Renjie Zheng; Atsuhito Kita; Mingbo Ma; Liang Huang

2021 EMNLP EMNLP 2021

Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings

Abstract

AbstractSimultaneous translation is vastly different from full-sentence translation, in the sense that it starts translation before the source sentence ends, with only a few words delay. However, due to the lack of large-scale, high-quality simultaneous translation datasets, most such systems are still trained on conventional full-sentence bitexts. This is far from ideal for the simultaneous scenario due to the abundance of unnecessary long-distance reorderings in those bitexts. We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation. Experiments on Zh→En and Ja→En simultaneous translation show substantial improvements (up to +2.7 BLEU) with the addition of these generated pseudo-references.

🧭 Keyword Pioneer — pseudo-reference generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Junkun Chen , Renjie Zheng , Atsuhito Kita , Mingbo Ma , Liang Huang

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Application Areas > Efficient Computing

Keywords

simultaneous translation word reordering pseudo-reference generation

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021