2021 ACL ACL 2021

[RETRACTED] Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training

Abstract

AbstractContext-aware neural machine translation (NMT) remains challenging due to the lack of large-scale document-level parallel corpora. To break the corpus bottleneck, in this paper we aim to improve context-aware NMT by taking the advantage of the availability of both large-scale sentence-level parallel dataset and source-side monolingual documents. To this end, we propose two pre-training tasks. One learns to translate a sentence from source language to target language on the sentence-level parallel dataset while the other learns to translate a document from deliberately noised to original on the monolingual documents. Importantly, the two pre-training tasks are jointly and simultaneously learned via the same model, thereafter fine-tuned on scale-limited parallel documents from both sentence-level and document-level perspectives. Experimental results on four translation tasks show that our approach significantly improves translation performance. One nice property of our approach is that the fine-tuned model can be used to translate both sentences and documents.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing
🧭 Keyword Pioneer — context-aware neural machine translation
🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Interdisciplinary, Machine Learning, Natural Language Processing