Unshuffling Data for Improved Generalization in Visual Question Answering

Damien Teney; Ehsan Abbasnejad; Anton van den Hengel

ICCV 2021

Abstract

Generalization beyond the training distribution is a core challenge in machine learning. The common practice of mixing and shuffling examples when training neural networks may not be optimal in this regard. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization. We describe a training procedure to capture the patterns that are stable across environments while discarding spurious ones. The method takes a step beyond correlation-based learning: the choice of the partitioning allows injecting information about the task that cannot otherwise be recovered from the joint distribution of the training data. We demonstrate multiple use cases with the task of visual question answering, which is notorious for dataset biases. We obtain significant improvements on VQA-CP, using environments built from prior knowledge, existing metadata, or unsupervised clustering. We also obtain improvements on GQA using annotations of "equivalent questions", and on multi-dataset training (VQA v2 / Visual Genome) by treating the datasets as distinct environments.
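The partitioning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each training example carries a metadata field that can serve as the environment label. The function name `partition_into_environments`, the `question_type` field, and the toy examples are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not cover the training procedure itself.

```python
from collections import defaultdict


def partition_into_environments(examples, key):
    """Group examples into non-i.i.d. subsets ("environments").

    `key` maps an example to its environment label, e.g. a metadata
    field or a cluster index. This helper is a hypothetical illustration.
    """
    environments = defaultdict(list)
    for example in examples:
        environments[key(example)].append(example)
    return dict(environments)


# Toy data: the field names below are assumptions made for this sketch.
examples = [
    {"question": "What color is the car?", "answer": "red", "question_type": "what color"},
    {"question": "Is there a dog?", "answer": "yes", "question_type": "is there"},
    {"question": "What color is the sky?", "answer": "blue", "question_type": "what color"},
]

# Instead of shuffling everything together, keep one subset per label.
environments = partition_into_environments(
    examples, key=lambda ex: ex["question_type"]
)

for name, subset in environments.items():
    print(name, len(subset))
```

Swapping the `key` function for a cluster assignment or a dataset identifier would give the other kinds of environments the abstract lists (unsupervised clustering, multi-dataset training).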


Topics

Machine Learning > Application Areas > Domain Generalization
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Types > Domain Generalization
Computer Vision > Core AI > Computer Vision
Natural Language Processing > Applications > Visual Question Answering

Keywords

visual question answering, out-of-distribution generalization, spurious correlation, dataset bias, non-i.i.d. training, environment partitioning
