Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering

Zujie Liang; Weitao Jiang; Haifeng Hu; Jiaying Zhu

EMNLP 2020

Abstract

In the task of Visual Question Answering (VQA), most state-of-the-art models tend to learn spurious correlations in the training set and perform poorly on out-of-distribution test data. Some methods of generating counterfactual samples have been proposed to alleviate this problem. However, most previous methods simply add the generated counterfactual samples to the training data for augmentation and do not fully utilize them. Therefore, we introduce a novel self-supervised contrastive learning mechanism to learn the relationship between original samples, factual samples and counterfactual samples. With the better cross-modal joint embeddings learned from the auxiliary training objective, the reasoning capability and robustness of the VQA model are boosted significantly. We demonstrate the effectiveness of our method by surpassing current state-of-the-art models on the VQA-CP dataset, a diagnostic benchmark for assessing the VQA model’s robustness.
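The abstract does not give the exact form of the contrastive objective, so the following is only an illustrative sketch of the general idea: pull the joint embedding of an original sample toward its factual sample and push it away from its counterfactual sample. The InfoNCE-style loss, the cosine similarity, the `temperature` parameter, and the function names are all assumptions made for this example, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negative, temperature=1.0):
    """InfoNCE-style loss with one positive and one negative (assumed form).

    anchor:   joint embedding of the original sample
    positive: joint embedding of the factual sample
    negative: joint embedding of the counterfactual sample
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = math.exp(cosine(anchor, negative) / temperature)
    # Low when the anchor is closer to the positive than to the negative.
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings, hypothetical values for illustration only.
original = [1.0, 0.0]
factual = [0.9, 0.1]
counterfactual = [-1.0, 0.0]

loss_aligned = contrastive_loss(original, factual, counterfactual)
loss_swapped = contrastive_loss(original, counterfactual, factual)
```

In this toy setup `loss_aligned` is smaller than `loss_swapped`, which is the behavior such an auxiliary objective would reward: the original sample sits near its factual sample and far from its counterfactual one.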

Topics

Machine Learning > Learning Types > Contrastive Learning
Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Contrastive Learning
Natural Language Processing > Applications > Visual Question Answering
Computer Vision > Analysis > Visual Question Answering
Computer Vision > Applications > Visual Question Answering

Keywords

contrastive learning, visual question answering, self-supervised learning, out-of-distribution generalization, cross-modal embedding, out-of-distribution robustness, counterfactual sample
