Image Pivoting for Learning Multilingual Multimodal Representations

Spandana Gella; Rico Sennrich; Frank Keller; Mirella Lapata

EMNLP 2017

Abstract

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
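The abstract does not spell out the loss, so the following is only a minimal sketch of what a pairwise ranking loss with interchangeable symmetric and asymmetric similarities could look like. It assumes a sum-of-hinges max-margin formulation, cosine similarity as the symmetric choice, and a negative squared order-violation penalty as the asymmetric choice. The function names, the margin value, and the way the two language-specific losses are summed around the image pivot are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_sim(a, b):
    """Symmetric similarity: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def order_sim(a, b):
    """Asymmetric similarity (assumed form): negative squared order violation.

    In general order_sim(a, b) != order_sim(b, a).
    """
    return -sum(max(0.0, y - x) ** 2 for x, y in zip(a, b))


def pairwise_ranking_loss(images, captions, sim, margin=0.1):
    """Sum-of-hinges ranking loss; images[i] is assumed to match captions[i]."""
    loss = 0.0
    for i, (img, cap) in enumerate(zip(images, captions)):
        pos = sim(img, cap)  # score of the matching pair
        for k in range(len(images)):
            if k == i:
                continue
            # A mismatched caption should score lower than the match by `margin`.
            loss += max(0.0, margin - pos + sim(img, captions[k]))
            # A mismatched image should score lower than the match by `margin`.
            loss += max(0.0, margin - pos + sim(images[k], cap))
    return loss


def pivot_loss(images, captions_l1, captions_l2, sim, margin=0.1):
    """Hypothetical image-pivot objective: one ranking loss per language.

    The two caption sets share the images but need not be translations
    of each other.
    """
    return (pairwise_ranking_loss(images, captions_l1, sim, margin)
            + pairwise_ranking_loss(images, captions_l2, sim, margin))
```

Passing `cosine_sim` or `order_sim` as the `sim` argument switches between the symmetric and asymmetric variants without changing the ranking loss itself. In a real system the vectors would come from learned image and sentence encoders rather than hand-written lists.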

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Metric Learning
Natural Language Processing > Applications > Information Retrieval
Computer Vision > Core AI > Multimodal Learning
Machine Learning > Learning Types > Multimodal Learning
Deep Learning > Learning Types > Multi-Modal Learning
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

multimodal representation, pairwise ranking loss, image-text matching, multilingual representation, cross-lingual retrieval, semantic textual similarity, multilingual multimodal representation
