Learning Visually Grounded Sentence Representations

Douwe Kiela; Alexis CONNEAU; Allan Jabri; Maximilian Nickel

2018 NAACL NAACL 2018

Learning Visually Grounded Sentence Representations

Abstract

AbstractWe investigate grounded sentence representations, where we train a sentence encoder to predict the image features of a given caption—i.e., we try to “imagine” how a sentence would be depicted visually—and use the resultant features as sentence representations. We examine the quality of the learned representations on a variety of standard sentence representation quality benchmarks, showing improved performance for grounded models over non-grounded ones. In addition, we thoroughly analyze the extent to which grounding contributes to improved performance, and show that the system also learns improved word embeddings.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Machine Learning

🧭 Keyword Pioneer — visually grounded representation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Douwe Kiela , Alexis CONNEAU , Allan Jabri , Maximilian Nickel

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Representation Learning Computer Vision > Core AI > Multimodal Learning

Keywords

representation learning multimodal learning visual grounding sentence representation image feature visually grounded representation

Download PDF

Related papers

A Melody-Conditioned Lyrics Language Model 2018

Before Name-Calling: Dynamics and Triggers of Ad Hominem Fallacies in Web Argumentation 2018

Automated Essay Scoring in the Presence of Biased Ratings 2018

Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input 2018

QuickEdit: Editing Text & Translations by Crossing Words Out 2018