Seeing the World through Text: Evaluating Image Descriptions for Commonsense Reasoning in Machine Reading Comprehension

Diana Galván-Sosa; Jun Suzuki; Kyosuke Nishida; Koji Matsuda; Kentaro Inui

2020 COLING COLING 2020

Seeing the World through Text: Evaluating Image Descriptions for Commonsense Reasoning in Machine Reading Comprehension

Abstract

AbstractDespite recent achievements in natural language understanding, reasoning over commonsense knowledge still represents a big challenge to AI systems. As the name suggests, common sense is related to perception and as such, humans derive it from experience rather than from literary education. Recent works in the NLP and the computer vision field have made the effort of making such knowledge explicit using written language and visual inputs, respectively. Our premise is that the latter source fits better with the characteristics of commonsense acquisition. In this work, we explore to what extent the descriptions of real-world scenes are sufficient to learn common sense about different daily situations, drawing upon visual information to answer script knowledge questions.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Diana Galván-Sosa , Jun Suzuki , Kyosuke Nishida , Koji Matsuda , Kentaro Inui

Topics

Natural Language Processing > Understanding > Semantic Analysis Natural Language Processing > Applications > Machine Reading Comprehension

Keywords

visual reasoning machine reading comprehension commonsense reasoning script knowledge image description

Download PDF

Related papers

Persuasiveness of News Editorials depending on Ideology and Personality 2020

A Graph Representation of Semi-structured Data for Web Question Answering 2020

Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations 2020

Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism 2020

End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network 2020