Faithful Multimodal Explanation for Visual Question Answering

Jialin Wu; Raymond Mooney

2019 ACL ACL 2019

Faithful Multimodal Explanation for Visual Question Answering

Abstract

AbstractAI systems’ ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have enabled significant progress on many challenging problems such as visual question answering (VQA). However, most of them are opaque black boxes with limited explanatory capability. This paper presents a novel approach to developing a high-performing VQA system that can elucidate its answers with integrated textual and visual explanations that faithfully reflect important aspects of its underlying reasoning while capturing the style of comprehensible human explanations. Extensive experimental evaluation demonstrates the advantages of this approach compared to competing methods using both automated metrics and human evaluation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning

📈 Trend Setter — Vision-Language Models

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jialin Wu , Raymond Mooney

Topics

Artificial Intelligence > Core AI > Interpretability Artificial Intelligence > Core AI > Multimodal Learning Deep Learning > Architectures > Neural Networks Deep Learning > Learning Types > Representation Learning Deep Learning > Learning Types > Multi-Modal Learning Deep Learning > Models > Vision-Language Models

Keywords

visual question answering text generation multimodal learning model interpretability neural network

Download PDF

Related papers

What do phone embeddings learn about Phonology? 2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages 2019

Understanding Undesirable Word Embedding Associations 2019

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text 2019

Domain Adaptation of Neural Machine Translation by Lexicon Induction 2019