A Corpus for Reasoning about Natural Language Grounded in Photographs

Alane Suhr; Stephanie Zhou; Ally Zhang; Iris Zhang; Huajun Bai; Yoav Artzi

ACL 2019

Abstract

We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge.
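The task described above, deciding whether a sentence is true about a pair of photographs, can be framed as binary classification scored by accuracy. The following is a minimal sketch under stated assumptions: the `Example` class, its field names, the `always_true` baseline, and the toy sentences are all hypothetical and are not taken from the corpus or its release format.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One hypothetical example: a sentence, two photographs, and a truth label."""
    sentence: str
    left_image: str   # identifier of the first photograph (assumed field name)
    right_image: str  # identifier of the second photograph (assumed field name)
    label: bool       # True if the sentence is true about the image pair


def accuracy(predict: Callable[[Example], bool], examples: Sequence[Example]) -> float:
    """Fraction of examples whose predicted truth value matches the label."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(predict(ex) == ex.label for ex in examples)
    return correct / len(examples)


def always_true(example: Example) -> bool:
    """Trivial baseline that ignores both the sentence and the images."""
    return True


# Toy data invented for illustration; not drawn from the corpus.
toy_examples = [
    Example("Each image contains two dogs.", "left_0.png", "right_0.png", True),
    Example("One image shows more bottles than the other.", "left_1.png", "right_1.png", False),
    Example("There are at most three birds in total.", "left_2.png", "right_2.png", True),
    Example("Both images show an empty street.", "left_3.png", "right_3.png", False),
]

baseline_accuracy = accuracy(always_true, toy_examples)
```

A real system would replace `always_true` with a model that reads the sentence and both images. The constant baseline is included only to show how a prediction function plugs into the metric.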

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Natural Language Processing > Applications > Machine Reading Comprehension
Natural Language Processing > Applications > Natural Language Inference
Machine Learning > Learning Types > Multi-Modal Learning
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Analysis > Visual Question Answering

Keywords

natural language inference, multimodal learning, image captioning, visual reasoning, compositional reasoning, joint reasoning, natural language grounding, image caption, semantic diversity, comparative reasoning, compare-and-contrast task
