Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

Qing Li; Jianlong Fu; Dongfei Yu; Tao Mei; Jiebo Luo

EMNLP 2018

Abstract

In Visual Question Answering (VQA), most existing approaches adopt the pipeline of representing an image via pre-trained CNNs, and then using the uninterpretable CNN features in conjunction with the question to predict the answer. Although such end-to-end models might report promising performance, they rarely provide any insight, apart from the answer, into the VQA process. In this work, we propose to break up the end-to-end VQA into two steps: explaining and reasoning, in an attempt towards a more explainable VQA by shedding light on the intermediate results between these two steps. To that end, we first extract attributes and generate descriptions as explanations for an image. Next, a reasoning module utilizes these explanations in place of the image to infer an answer. The advantages of such a breakdown include: (1) the attributes and captions reflect what the system extracts from the image, and thus can provide some insight into the predicted answer; (2) these intermediate results can help identify whether the image understanding part or the answer inference part is at fault when the predicted answer is wrong. We conduct extensive experiments on a popular VQA dataset and our system achieves performance comparable to the baselines, yet with the added benefits of explainability and the inherent ability to further improve with higher quality explanations.
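The two-step breakdown described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Explanation`, `tell_and_answer`, `toy_explain`, and `toy_reason` are hypothetical, and the toy functions are hand-written stand-ins for the learned attribute extractor, caption generator, and reasoning module. The only point it tries to show is the interface: the reasoning step receives the attributes and caption in place of the image, and those intermediate results are returned alongside the answer so they can be inspected.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Explanation:
    """Human-readable intermediate result of the explaining step (hypothetical type)."""
    attributes: List[str]
    caption: str


def tell_and_answer(
    image: Any,
    question: str,
    explain: Callable[[Any], Explanation],
    reason: Callable[[str, Explanation], str],
) -> Tuple[str, Explanation]:
    # Step 1 (explaining): turn the image into attributes and a caption.
    explanation = explain(image)
    # Step 2 (reasoning): the reasoner sees only the question and the
    # explanation, never the image itself.
    answer = reason(question, explanation)
    # Return the explanation too, so a wrong answer can be traced to
    # either the explaining step or the reasoning step.
    return answer, explanation


def toy_explain(image: Any) -> Explanation:
    # Assumption: a fixed output standing in for learned models.
    return Explanation(
        attributes=["dog", "frisbee", "grass"],
        caption="a dog catching a frisbee on the grass",
    )


def toy_reason(question: str, explanation: Explanation) -> str:
    # Assumption: a trivial keyword match standing in for a learned reasoner.
    words = set(question.lower().strip("?").split())
    return "yes" if any(attr in words for attr in explanation.attributes) else "no"


answer, explanation = tell_and_answer(None, "Is there a dog?", toy_explain, toy_reason)
print(answer)                  # yes
print(explanation.attributes)  # ['dog', 'frisbee', 'grass']
```

Because the explanation is returned with the answer, a reader can check whether an error came from a missing attribute or misleading caption, or from the reasoning over a correct explanation.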

Topics

Computer Vision > Generation > Image Captioning
Natural Language Processing > Applications > Question Answering

Keywords

visual question answering, attribute extraction, image captioning, explainable AI, semantic reasoning
