Self-Critical Reasoning for Robust Visual Question Answering

Jialin Wu; Raymond Mooney

NeurIPS 2019

Abstract

Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than those of other competitive answer candidates. The influential regions are determined either from human visual/textual explanations or automatically from just the significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art, i.e., 49.5% using textual explanations and 48.5% using automatically
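The idea behind the self-critical objective can be illustrated with a minimal sketch. It assumes that a sensitivity score to the most influential image region is already available for each answer candidate, and it uses a hinge-style penalty, which is an illustrative choice rather than something the abstract specifies. The function name and its inputs are hypothetical.

```python
def self_critical_penalty(sens_correct, sens_competitors):
    """Hinge-style penalty (an assumed form, for illustration only).

    sens_correct: sensitivity of the correct answer to the influential region.
    sens_competitors: sensitivities of competing answers to that same region.
    Both inputs are hypothetical; how they are computed is not shown here.
    The penalty is positive only when some competitor is more sensitive to
    the influential region than the correct answer is.
    """
    return sum(max(0.0, s - sens_correct) for s in sens_competitors)


# The correct answer relies on the region more than both competitors: no penalty.
no_penalty = self_critical_penalty(0.8, [0.3, 0.5])

# One competitor (0.9) relies on the region more than the correct answer (0.4).
some_penalty = self_critical_penalty(0.4, [0.9, 0.1])
```

Under these assumptions, minimizing the penalty pushes the correct answer to depend on the influential region at least as much as any competing answer does.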

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Application Areas > Domain Generalization
Natural Language Processing > Applications > Question Answering
Artificial Intelligence > Core AI > Reasoning
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Generation > Visual Question Answering
Computer Vision > Applications > Question Answering
Deep Learning > Learning Types > Robustness

Keywords

visual question answering, self-supervised learning, attention mechanism, multimodal learning, visual explanation, robust generalization, language prior, self-critical training, question-answer distribution, robust visual question answering
