ConceptBert: Concept-Aware Representation for Visual Question Answering

François Gardères; Maryam Ziaeefard; Baptiste Abeloos; Freddy Lecue

EMNLP 2020


Abstract

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. A VQA model combines visual and textual features in order to answer questions grounded in an image. Current work in VQA focuses on questions that can be answered by analyzing the question and the image alone. We present a concept-aware algorithm, ConceptBert, for questions that require common sense or basic factual knowledge from external structured content. Given an image and a question in natural language, ConceptBert uses the visual elements of the image and a Knowledge Graph (KG) to infer the correct answer. We introduce a multi-modal representation that learns a joint Concept-Vision-Language embedding inspired by the popular BERT architecture. We use the ConceptNet KG to encode common sense knowledge and evaluate our methodology on the Outside Knowledge-VQA (OK-VQA) and VQA datasets.
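To make the idea of a joint Concept-Vision-Language representation concrete, here is a minimal toy sketch under loudly stated assumptions. It is not the ConceptBert architecture: the paper learns the joint embedding with a BERT-inspired model, whereas this sketch swaps in plain vector concatenation. The tiny knowledge graph `TOY_KG`, the embeddings, and the helper names `concept_vector`, `joint_embedding`, and `score_answers` are all hypothetical and stand in for ConceptNet lookups and learned features.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy knowledge graph standing in for ConceptNet-style edges:
# each concept maps to a list of related concepts.
TOY_KG: Dict[str, List[str]] = {
    "umbrella": ["rain", "shelter"],
    "rain": ["water", "weather"],
}


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors component-wise."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def concept_vector(tokens: List[str], kg: Dict[str, List[str]],
                   concept_emb: Dict[str, Vector]) -> Vector:
    """Pool embeddings of KG neighbours of the question tokens (hypothetical lookup)."""
    related = [c for t in tokens for c in kg.get(t, []) if c in concept_emb]
    return mean_pool([concept_emb[c] for c in related])


def joint_embedding(vision: Vector, language: Vector, concept: Vector) -> Vector:
    """Concatenate the three modality vectors (a simplification, not BERT-style fusion)."""
    return vision + language + concept


def score_answers(joint: Vector, answer_weights: Dict[str, Vector]) -> Dict[str, float]:
    """Score each candidate answer with a linear classifier over the joint vector."""
    return {a: sum(w * x for w, x in zip(ws, joint)) for a, ws in answer_weights.items()}


# Toy usage with hand-picked (not learned) values.
concept_emb = {"rain": [1.0, 0.0], "shelter": [0.0, 1.0]}
concept = concept_vector(["why", "umbrella"], TOY_KG, concept_emb)
joint = joint_embedding([1.0, 0.0], [0.0, 1.0], concept)
scores = score_answers(joint, {
    "rain": [0, 0, 0, 0, 1, 0],
    "sun": [0, 0, 0, 0, -1, 0],
})
best = max(scores, key=scores.get)
print(joint, scores, best)
```

Here the question token "umbrella" pulls in the concepts "rain" and "shelter" from the toy graph, so the answer "rain" scores highest even though neither toy visual nor toy language vector encodes it directly. This loosely mirrors the motivation for adding external knowledge, but a learned model would replace every hand-picked number.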



Topics

Machine Learning > Application Areas > Knowledge Distillation; Deep Learning > Architectures > Transformers; Computer Vision > Analysis > Scene Understanding; Natural Language Processing > Applications > Question Answering; Knowledge & Reasoning > Representation > Knowledge Graphs; Computer Vision > Core AI > Multimodal Learning; Deep Learning > Learning Types > Multi-Modal Learning

Keywords

visual question answering, multimodal learning, knowledge graph, concept embedding, common sense reasoning, BERT architecture, multi-modal representation, concept recognition, ConceptNet

