2019
CVPR
Answer Them All! Toward Universal Visual Question Answering Models
Abstract
Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features and answer vocabularies. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.
Topics
Computer Vision > Processing > Image Restoration
Natural Language Processing > Applications > Question Answering
Artificial Intelligence > Core AI > Reasoning
Computer Vision > Core AI > Multimodal Learning
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Applications > Visual Question Answering