Cross-lingual Visual Pre-training for Multimodal Machine Translation

Ozan Caglayan; Menekse Kuyu; Mustafa Sercan Amac; Pranava Madhyastha; Erkut Erdem; Aykut Erdem; Lucia Specia

EACL 2021

Abstract

Pre-trained language models have been shown to substantially improve performance in many natural language tasks. Although the early focus of such models was single-language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend translation language modelling (Lample and Conneau, 2019) with masked region classification and perform pre-training with three-way parallel vision & language corpora. We show that when fine-tuned for multimodal machine translation, these models obtain state-of-the-art performance. We also provide qualitative insights into the usefulness of the learned grounded representations.
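As a rough illustration of the pre-training input described in the abstract, the sketch below builds one masked example from a source sentence, its translation, and a set of image regions. It is a toy under stated assumptions and not the authors' implementation: the mask symbols, the `</s>` separator, the 15% masking rate, the function names, and the use of region class labels in place of real region feature vectors are all hypothetical choices made for readability.

```python
import random

# Hypothetical placeholder symbols; the paper's actual vocabulary is not given here.
MASK_TOKEN = "[MASK]"
MASK_REGION = "[MASK_REGION]"
SEPARATOR = "</s>"


def mask_sequence(items, mask_symbol, prob, rng):
    """Mask each item independently with probability `prob`.

    Returns (masked_items, targets); targets[i] holds the original item
    at masked positions and None everywhere else.
    """
    masked, targets = [], []
    for item in items:
        if rng.random() < prob:
            masked.append(mask_symbol)
            targets.append(item)
        else:
            masked.append(item)
            targets.append(None)
    return masked, targets


def build_example(src_tokens, tgt_tokens, region_labels, prob=0.15, seed=0):
    """Concatenate a masked sentence pair with masked image regions.

    Assumption: each region is represented only by its class label, which
    doubles as the prediction target when that region is masked.
    """
    rng = random.Random(seed)
    masked_src, src_targets = mask_sequence(src_tokens, MASK_TOKEN, prob, rng)
    masked_tgt, tgt_targets = mask_sequence(tgt_tokens, MASK_TOKEN, prob, rng)
    masked_reg, reg_targets = mask_sequence(region_labels, MASK_REGION, prob, rng)
    return {
        # The separator is never masked and has no prediction target.
        "inputs": masked_src + [SEPARATOR] + masked_tgt + masked_reg,
        "text_targets": src_targets + [None] + tgt_targets,
        "region_targets": reg_targets,
    }


if __name__ == "__main__":
    example = build_example(
        ["a", "dog", "runs"], ["ein", "Hund", "rennt"], ["dog", "grass"]
    )
    print(example["inputs"])
```

In this toy setup, a model would be trained to recover the original word at each masked text position and the class label at each masked region position, with the other language and the unmasked regions available as context.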

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Deep Learning > Techniques > Pretraining; Natural Language Processing > Applications > Machine Translation; Natural Language Processing > Generation > Machine Translation

Keywords

cross-lingual representation; image captioning; visual grounding; pre-trained language model; multimodal machine translation; masked region classification
