Counterfactual Vision and Language Learning

Ehsan Abbasnejad; Damien Teney; Amin Parvaneh; Javen Shi; Anton van den Hengel

CVPR 2020

Abstract

The ongoing success of visual question answering methods has been somewhat surprising given that, at its most general, the problem requires understanding the entire variety of both visual and language stimuli. It is particularly remarkable that this success has been achieved on the basis of comparatively small datasets, given the scale of the problem. One explanation is that it has been accomplished partly by exploiting bias in the datasets rather than by developing deeper multi-modal reasoning. This fundamentally limits the generalization of these methods, and thus their practical applicability. We propose a method that addresses this problem by introducing counterfactuals into training. In doing so, we leverage structural causal models for counterfactual evaluation to formulate alternatives, for instance, questions that could be asked of the same image set. We show that simulating plausible alternative training data through this process results in better generalization.
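As a rough illustration of what counterfactual evaluation in a structural causal model involves, the sketch below follows the standard three steps (abduction, action, prediction) on a toy linear model. It is not the paper's method: the class `LinearSCM`, its `weight` parameter, and the numeric values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Toy structural causal model y = weight * x + noise (hypothetical example)."""

    weight: float  # assumed causal effect of x on y

    def mechanism(self, x: float, noise: float) -> float:
        # Structural equation: the outcome is fixed once x and the noise are known.
        return self.weight * x + noise

    def abduct(self, x_obs: float, y_obs: float) -> float:
        # Abduction: recover the exogenous noise consistent with the observation.
        return y_obs - self.weight * x_obs

    def counterfactual(self, x_obs: float, y_obs: float, x_alt: float) -> float:
        # Action: replace x with the alternative value x_alt.
        # Prediction: re-run the mechanism while keeping the abducted noise.
        noise = self.abduct(x_obs, y_obs)
        return self.mechanism(x_alt, noise)


scm = LinearSCM(weight=2.0)
# Observed x = 1 and y = 3; ask what y would have been had x been 2.
y_cf = scm.counterfactual(x_obs=1.0, y_obs=3.0, x_alt=2.0)
print(y_cf)  # 5.0
```

In the visual question answering setting described in the abstract, the intervened variable would play the role of an alternative question or image, and the counterfactual outcome would serve as simulated training data; how that is actually modeled is not specified in this summary.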

Topics

Artificial Intelligence > Core AI > Causal Inference
Deep Learning > Models > Generative Models
Computer Vision > Generation > Image Captioning
Artificial Intelligence > Core AI > Reasoning
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Multimodal Learning

Keywords

causal inference; visual question answering; multimodal learning; multi-modal learning; counterfactual reasoning; structural causal model
