Multi-grained Attention with Object-level Grounding for Visual Question Answering

Pingping Huang; Jianhui Huang; Yuqing Guo; Min Qiao; Yong Zhu

ACL 2019

Abstract

Attention mechanisms are widely used in Visual Question Answering (VQA) to search for visual clues related to the question. Most approaches train attention models from a coarse-grained association between sentences and images, which tends to fail on small objects or uncommon concepts. To address this problem, this paper proposes a multi-grained attention method that learns explicit word-object correspondence through two types of word-level attention, complementing the sentence-image association. Evaluated on the VQA benchmark, the multi-grained attention model achieves performance competitive with state-of-the-art models. The visualized attention maps show that adding object-level grounding leads to a better understanding of the images and locates the attended objects more precisely.
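
The abstract gives no equations, so the following is only an illustrative sketch of the general idea of combining a coarse sentence-level attention with a fine word-level attention over detected object regions. The dot-product scoring, the averaging over words, the `mix` weight, and all function names are assumptions made for illustration; they are not the authors' architecture, and the paper's two specific types of word-level attention are not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sentence_attention(question_vec, regions):
    # Coarse-grained: a single distribution over regions for the whole question.
    # question_vec: (dim,), regions: (num_regions, dim)
    scores = regions @ question_vec
    return softmax(scores)


def word_attention(word_vecs, regions):
    # Fine-grained: one distribution over regions per word,
    # then averaged over words (an assumed pooling choice).
    # word_vecs: (num_words, dim)
    scores = word_vecs @ regions.T           # (num_words, num_regions)
    per_word = softmax(scores, axis=1)
    return per_word.mean(axis=0)


def multi_grained_attention(question_vec, word_vecs, regions, mix=0.5):
    # Convex combination of the two distributions (hypothetical fusion rule).
    coarse = sentence_attention(question_vec, regions)
    fine = word_attention(word_vecs, regions)
    weights = mix * coarse + (1.0 - mix) * fine
    attended = weights @ regions             # weighted sum of region features
    return weights, attended


rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))            # 5 object regions, 8-dim features
word_vecs = rng.normal(size=(4, 8))          # 4 question words
question_vec = word_vecs.mean(axis=0)        # crude sentence embedding
weights, attended = multi_grained_attention(question_vec, word_vecs, regions)
```

Because both inputs to the fusion step are probability distributions over the same regions, their convex combination is again a distribution, which keeps the attended feature a weighted average of the region features.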

Topics

Natural Language Processing > Applications > Question Answering
Interdisciplinary > Cognitive Science > Perception
Computer Vision > Core AI > Multimodal Learning
Natural Language Processing > Applications > Visual Question Answering

Keywords

visual question answering, attention mechanism, visual grounding, multi-grained attention, object grounding, object-level grounding, word-object correspondence
