Heterogeneous Graph Learning for Visual Commonsense Reasoning

Weijiang Yu; Jingwen Zhou; Weihao Yu; Xiaodan Liang; Nong Xiao

NeurIPS 2019

Abstract

The visual commonsense reasoning (VCR) task aims to push the field toward cognition-level reasoning. A model must predict the correct answer and also provide a convincing reasoning path (rationale), which yields three sub-tasks: Q->A, QA->R, and Q->AR. The task poses great challenges both in the semantic alignment between the vision and language domains and in the knowledge reasoning needed to generate persuasive reasoning paths. Existing works either resort to powerful end-to-end networks that cannot produce interpretable reasoning paths, or explore only the intra-relationships among visual objects (a homogeneous graph) while ignoring the cross-domain semantic alignment between visual concepts and linguistic words. In this paper, we propose a new Heterogeneous Graph Learning (HGL) framework that seamlessly integrates intra-graph and inter-graph reasoning to bridge the vision and language domains. HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module, which interactively refine reasoning paths for semantic agreement. Moreover, HGL integrates a contextual voting module that exploits long-range visual context for better global reasoning. Experiments on the large-scale Visual Commonsense Reasoning benchmark demonstrate the superior performance of the proposed modules on all three tasks, improving accuracy by 5% on Q->A, 3.5% on QA->R, and 5.8% on Q->AR.
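The abstract does not state the graph update equations. The following is a rough illustration only: a generic sketch of one inter-graph (cross-domain) message-passing step in which answer-word nodes aggregate features from another domain's nodes. It assumes scaled dot-product edge weights, a ReLU-projected residual update, and toy 2-d features. The function names (`cross_domain_edges`, `heterogeneous_step`) and the update rule are hypothetical and are not the HGL authors' formulation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def row_softmax(m: Matrix) -> Matrix:
    """Softmax over each row, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_domain_edges(dst: Matrix, src: Matrix) -> Matrix:
    """Hypothetical heterogeneous edge weights: each destination node (e.g. an
    answer word) attends over all source nodes (e.g. visual objects) using
    scaled dot-product similarity. Shape: n_dst x n_src, rows sum to 1."""
    scale = 1.0 / math.sqrt(len(dst[0]))
    scores = matmul(dst, transpose(src))
    return row_softmax([[s * scale for s in row] for row in scores])


def heterogeneous_step(dst: Matrix, src: Matrix, weight: Matrix) -> Matrix:
    """One assumed inter-graph step: aggregate source features along the
    cross-domain edges, project them with `weight`, apply ReLU, and add the
    result residually to the destination node features."""
    edges = cross_domain_edges(dst, src)            # n_dst x n_src
    messages = matmul(matmul(edges, src), weight)   # n_dst x d
    return [[x + max(0.0, m) for x, m in zip(drow, mrow)]
            for drow, mrow in zip(dst, messages)]


# Toy usage with made-up features (d = 2); identity projection for simplicity.
vision = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 visual object nodes
question = [[1.0, 1.0]]                          # 1 question word node
answer = [[0.2, 0.8], [0.9, 0.1]]                # 2 answer word nodes
identity = [[1.0, 0.0], [0.0, 1.0]]

answer_v = heterogeneous_step(answer, vision, identity)       # vision -> answer (VAHG-style)
answer_vq = heterogeneous_step(answer_v, question, identity)  # question -> answer (QAHG-style)
```

Here the two calls only mimic the primal/dual pairing named in the abstract (vision-to-answer, then question-to-answer). The real modules, their intra-graph reasoning, and the contextual voting module are not reproduced.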

Keywords

visual question answering, semantic alignment, knowledge reasoning, heterogeneous graph, visual commonsense reasoning, cross-domain reasoning, heterogeneous graph learning
