Answer Them All! Toward Universal Visual Question Answering Models

Robik Shrestha; Kushal Kafle; Christopher Kanan

CVPR 2019

Abstract

Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.

Keywords

benchmark evaluation, domain generalization, visual question answering, multi-modal learning, natural language understanding, image understanding, universal model, synthetic dataset, synthetic reasoning, natural image understanding