Supervising the Transfer of Reasoning Patterns in VQA

Corentin Kervadec; Christian Wolf; Grigory Antipov; Moez Baccouche; Madiha Nadri

NeurIPS 2021

Abstract

Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, which hinders generalization. It has recently been shown that better reasoning patterns emerge in the attention layers of a state-of-the-art VQA model when it is trained on perfect (oracle) visual inputs. This provides evidence that deep neural networks can learn to reason when training conditions are favorable enough. However, transferring this learned knowledge to deployable models is a challenge, as much of it is lost during the transfer. We propose a method for knowledge transfer based on a regularization term in our loss function, supervising the sequence of required reasoning operations. We provide a theoretical analysis based on PAC-learning, showing that such program prediction can lead to decreased sample complexity under mild hypotheses. We also demonstrate the effectiveness of this approach experimentally on the GQA dataset and show its complementarity to BERT-like self-supervised pre-training.
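
The abstract does not spell out the form of the regularization term, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the model emits one score vector per reasoning step, that the ground-truth program is a sequence of operation indices, and that the two loss terms are combined with a scalar weight. The function names and the `weight` parameter are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against an integer class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def loss_with_program_supervision(answer_logits, answer,
                                  op_logits_seq, op_seq, weight=1.0):
    """Answer loss plus a weighted program-prediction term (illustrative only).

    answer_logits : scores over candidate answers
    answer        : index of the correct answer
    op_logits_seq : one score vector per reasoning step (assumed model output)
    op_seq        : ground-truth operation index per step (assumed annotation)
    weight        : hypothetical trade-off coefficient for the regularizer
    """
    if len(op_logits_seq) != len(op_seq):
        raise ValueError("predicted and ground-truth programs differ in length")
    answer_term = cross_entropy(answer_logits, answer)
    if not op_seq:
        return answer_term
    # Average the per-step cross-entropy over the program sequence.
    program_term = sum(
        cross_entropy(logits, op) for logits, op in zip(op_logits_seq, op_seq)
    ) / len(op_seq)
    return answer_term + weight * program_term
```

Setting `weight=0.0` in this sketch recovers the plain answer-classification loss, which makes the auxiliary term's role as a regularizer explicit.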

Topics

Machine Learning > Learning Types > Self-Supervised Learning; Machine Learning > Optimization & Theory > Learning Theory; Deep Learning > Architectures > Neural Networks; Deep Learning > Learning Types > Transfer Learning; Deep Learning > Techniques > Attention; Computer Vision > Applications > Visual Question Answering

Keywords

representation learning; visual question answering; self-supervised learning; attention mechanism; knowledge transfer; neural network; reasoning pattern

Related papers

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data 2021

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation 2021

Test-Time Personalization with a Transformer for Human Pose Estimation 2021

NTopo: Mesh-free Topology Optimization using Implicit Neural Representations 2021

Scalable Intervention Target Estimation in Linear Models 2021