In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering

Peter Vickers; Nikolaos Aletras; Emilio Monti; Loic Barrault

ACL 2021

Abstract

Visual Question Answering (VQA) methods aim to leverage visual input to answer questions that may require complex reasoning over entities. Current models are trained on labelled data that may be insufficient to learn complex knowledge representations. In this paper, we propose a new method to enhance the reasoning capabilities of a multi-modal pretrained model (Vision+Language BERT) by integrating facts extracted from an external knowledge base. Evaluation on the KVQA benchmark demonstrates that our method outperforms competitive baselines by 19%, achieving new state-of-the-art results. We also perform an extensive analysis, including an ablation study, that highlights the limitations of our best-performing model.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Knowledge Distillation
Natural Language Processing > Applications > Question Answering
Knowledge & Reasoning > Representation > Knowledge Graphs
Natural Language Processing > Applications > Visual Question Answering

Keywords

transfer learning, visual question answering, multimodal learning, knowledge base, vision language model, knowledge reasoning, knowledge integration, external knowledge, multi-modal reasoning, fact extraction, vision language, external knowledge base, multi-modal pretrained model
