Answer-Type Prediction for Visual Question Answering

Kushal Kafle; Christopher Kanan

2016 CVPR CVPR 2016

Answer-Type Prediction for Visual Question Answering

Abstract

Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Natural Language Processing

📈 Trend Setter — Multi-Modal Learning

🧭 Keyword Pioneer — answer type prediction

🐣 Hot Topic Early Bird — visual question answering

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio