Where to Look: Focus Regions for Visual Question Answering

Kevin J. Shih; Saurabh Singh; Derek Hoiem

CVPR 2016

Abstract

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space where they are compared for relevance with an inner product. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the recently released VQA dataset, which features free-form human-annotated questions and answers.
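The abstract's core scoring step maps a question embedding and per-region visual features into a shared space and compares them with an inner product. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The projection matrices `W_text` and `W_img` and all helper names are hypothetical. Two further choices are also assumptions made only for the sketch: a softmax turns region scores into weights, and a weighted average pools the regions. How question and region features are extracted is left out.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, x: Vector) -> Vector:
    """Linear map W @ x into the (hypothetical) shared embedding space."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def dot(a: Vector, b: Vector) -> float:
    """Inner product used as the relevance measure."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into weights that sum to 1 (assumed normalization)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_weights(W_text: Matrix, W_img: Matrix,
                   question: Vector, regions: List[Vector]) -> List[float]:
    """Project the question and every region into the shared space,
    score each region by inner product, then softmax the scores."""
    q = project(W_text, question)
    scores = [dot(q, project(W_img, r)) for r in regions]
    return softmax(scores)


def pooled_feature(weights: List[float], regions: List[Vector]) -> Vector:
    """Relevance-weighted average of the raw region features (assumed pooling)."""
    dim = len(regions[0])
    return [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]


# Toy usage: identity projections, 2-D features, two candidate regions.
identity = [[1.0, 0.0], [0.0, 1.0]]
question_vec = [1.0, 0.0]
region_feats = [[1.0, 0.0], [0.0, 1.0]]
weights = region_weights(identity, identity, question_vec, region_feats)
print(weights)  # approximately [0.731, 0.269]
print(pooled_feature(weights, region_feats))
```

Because relevance is a plain inner product in the shared space, a region whose projection aligns with the projected question receives a larger weight. In the toy example, the first region points in the same direction as the question and therefore dominates the pooled feature.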
