Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

Alireza Mohammadshahi; Rémi Lebret; Karl Aberer

EMNLP 2019

Abstract

In this paper, we propose a new approach to learning multimodal multilingual embeddings for matching images with their relevant captions in two languages. We combine two existing objective functions to bring images and captions close together in a joint embedding space while adapting the alignment of word embeddings between the languages in our model. We show that our approach generalizes better, achieving state-of-the-art performance on the text-to-image and image-to-text retrieval tasks and on the caption-caption similarity task. Two multimodal multilingual datasets are used for evaluation: Multi30k, with German and English captions, and Microsoft COCO, with English and Japanese captions.
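To make the idea of summing two objectives concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes (1) a bidirectional hinge-based triplet ranking loss over cosine similarity for matching image-caption pairs, and (2) a least-squares alignment penalty ||W x - y||^2 over a bilingual dictionary of word-vector pairs, combined as a weighted sum. The paper's actual objectives, weighting and training procedure may differ. The names `ranking_loss`, `alignment_loss`, `combined_loss`, `margin` and `lam` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ranking_loss(images: List[Vector], captions: List[Vector], margin: float = 0.2) -> float:
    """Assumed bidirectional hinge ranking loss, summed over all negatives.

    images[i] and captions[i] are taken to be a matching pair; every j != i
    is treated as a negative in both retrieval directions.
    """
    n = len(images)
    s = [[cosine(images[i], captions[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i as query: the wrong caption j should score below caption i.
            loss += max(0.0, margin - s[i][i] + s[i][j])
            # Caption i as query: the wrong image j should score below image i.
            loss += max(0.0, margin - s[i][i] + s[j][i])
    return loss


def matvec(W: Matrix, x: Vector) -> List[float]:
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def alignment_loss(W: Matrix, src_words: List[Vector], tgt_words: List[Vector]) -> float:
    """Assumed mean squared distance between mapped source and target word vectors."""
    total = 0.0
    for x, y in zip(src_words, tgt_words):
        wx = matvec(W, x)
        total += sum((a - b) ** 2 for a, b in zip(wx, y))
    return total / len(src_words)


def combined_loss(images, captions, W, src_words, tgt_words,
                  margin: float = 0.2, lam: float = 1.0) -> float:
    """Hypothetical joint objective: retrieval loss plus weighted alignment loss."""
    return ranking_loss(images, captions, margin) + lam * alignment_loss(W, src_words, tgt_words)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    caps = [[1.0, 0.0], [0.0, 1.0]]
    # Perfectly matched pairs and an identity dictionary give zero loss.
    print(combined_loss(imgs, caps, identity, [[1.0, 0.0]], [[1.0, 0.0]]))
```

Here `lam` trades off the retrieval structure of the joint space against cross-lingual alignment. In a real system, the encoders and the mapping `W` would be learned jointly by gradient descent, which this pure-Python sketch omits.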

Topics

Machine Learning > Core Methods > Metric Learning
Machine Learning > Core Methods > Embedding Learning
Computer Vision > Generation > Image Captioning
Natural Language Processing > Applications > Information Retrieval
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

multimodal learning, cross-modal retrieval, text-to-image retrieval, image-text matching, multilingual word embedding, joint embedding space, caption similarity

Related papers

Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation 2019

Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference 2019

A Boundary-aware Neural Model for Nested Named Entity Recognition 2019

Iterative Dual Domain Adaptation for Neural Machine Translation 2019

A Multi-Pairwise Extension of Procrustes Analysis for Multilingual Word Translation 2019