Visual Question Answering with Question Representation Update (QRU)

Ruiyu Li; Jiaya Jia

NeurIPS 2016 (NIPS 2016)

Abstract

Our method reasons jointly over natural language questions and images. Given a natural language question about an image, our model iteratively updates the question representation by selecting image regions relevant to the query, and learns to give the correct answer. The model stacks several reasoning layers to exploit complex visual relations in the visual question answering (VQA) task. The proposed network is trainable end-to-end through back-propagation, with its weights initialized from a pre-trained convolutional neural network (CNN) and a pre-trained gated recurrent unit (GRU). Evaluated on the challenging COCO-QA and VQA datasets, our method yields state-of-the-art performance.
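The abstract describes the question representation update only at a high level. The Python sketch below illustrates the general idea of stacked reasoning layers that refine a question vector by attending over image regions. It is a toy under stated assumptions, not the paper's formulation: the fusion function h_i = tanh(W_q q + W_v v_i), the attention vector w_att, the answer classifier W_out, the toy dimensions and the random weights are all hypothetical. The vectors q0 and regions are random stand-ins for the GRU question encoding and the CNN region features. No training is performed.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reasoning_layer(q, regions, W_q, W_v, w_att):
    """One hypothetical reasoning layer (not the paper's exact equations).

    For every image region v_i, fuse it with the current question vector q
    into a candidate h_i = tanh(W_q q + W_v v_i), score each candidate with
    w_att, and return the attention-weighted average of the candidates as
    the updated question representation, plus the attention weights.
    """
    q_proj = matvec(W_q, q)
    candidates = [
        [math.tanh(a + b) for a, b in zip(q_proj, matvec(W_v, v))] for v in regions
    ]
    scores = [sum(w * h for w, h in zip(w_att, c)) for c in candidates]
    alpha = softmax(scores)  # relevance of each region to the current question
    new_q = [
        sum(a * c[d] for a, c in zip(alpha, candidates)) for d in range(len(q_proj))
    ]
    return new_q, alpha


def qru_forward(q0, regions, layers):
    """Apply several reasoning layers in sequence, updating the question each time."""
    q, attentions = q0, []
    for W_q, W_v, w_att in layers:
        q, alpha = reasoning_layer(q, regions, W_q, W_v, w_att)
        attentions.append(alpha)
    return q, attentions


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 4-d question vector, 5 regions with 3-d features,
# 2 reasoning layers, 6 candidate answers. All weights are random.
rng = random.Random(0)
Q_DIM, V_DIM, N_REGIONS, N_LAYERS, N_ANSWERS = 4, 3, 5, 2, 6

q0 = [rng.uniform(-1, 1) for _ in range(Q_DIM)]  # stand-in for a GRU question encoding
# Stand-in for CNN features of candidate image regions.
regions = [[rng.uniform(0, 1) for _ in range(V_DIM)] for _ in range(N_REGIONS)]
layers = [
    (
        random_matrix(Q_DIM, Q_DIM, rng),
        random_matrix(Q_DIM, V_DIM, rng),
        [rng.uniform(-1, 1) for _ in range(Q_DIM)],
    )
    for _ in range(N_LAYERS)
]
W_out = random_matrix(N_ANSWERS, Q_DIM, rng)  # hypothetical answer classifier

final_q, attentions = qru_forward(q0, regions, layers)
answer_probs = softmax(matvec(W_out, final_q))
best_answer = max(range(N_ANSWERS), key=lambda k: answer_probs[k])
```

Each entry of `attentions` is a probability distribution over the five regions, one per reasoning layer, so a reader can inspect which regions each layer relied on. Because the updated question is a convex combination of tanh outputs, its entries stay in (-1, 1) however many layers are stacked. In a trained model these weights would be learned end-to-end by back-propagation, as the abstract states. Here they are random, so `best_answer` carries no meaning.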

Keywords

visual question answering, multimodal learning, visual reasoning, convolutional neural network, recurrent neural network, image understanding, question representation, image reasoning
