Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

Zhe Gan; Yu Cheng; Ahmed Kholy; Linjie Li; Jingjing Liu; Jianfeng Gao

ACL 2019

Abstract

This paper presents a new model for visual dialog, the Recurrent Dual Attention Network (ReDAN), which uses multi-step reasoning to answer a series of questions about an image. In each question-answering turn of a dialog, ReDAN infers the answer progressively through multiple reasoning steps. In each step of the reasoning process, the semantic representation of the question is updated based on the image and the previous dialog history, and the recurrently refined representation is used for further reasoning in the subsequent step. On the VisDial v1.0 dataset, the proposed ReDAN model achieves a new state-of-the-art NDCG score of 64.47%. Visualization of the reasoning process further demonstrates that ReDAN can locate context-relevant visual and textual clues via iterative refinement, which can lead to the correct answer step by step.
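The reasoning loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the dot-product attention, the single `tanh` update, the weight matrix `W`, and all function names and dimensions are assumptions chosen only to show how a question vector might be refined over several steps by attending to both image regions and dialog history.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    # Dot-product attention (an assumption, not the paper's attention form):
    # weight each memory row by its similarity to the query.
    weights = softmax(memory @ query)
    return weights @ memory, weights

def multi_step_reasoning(question, image_regions, history_turns, W, num_steps=3):
    # Hypothetical recurrent refinement: at each step the current query
    # attends to the image and to the dialog history, and the two attended
    # contexts are used to update the query for the next step.
    query = question
    for _ in range(num_steps):
        visual_ctx, img_w = attend(query, image_regions)
        textual_ctx, hist_w = attend(query, history_turns)
        query = np.tanh(W @ np.concatenate([query, visual_ctx, textual_ctx]))
    return query, img_w, hist_w

# Toy inputs: random vectors stand in for learned features.
rng = np.random.default_rng(0)
d = 8
question = rng.normal(size=d)
image_regions = rng.normal(size=(5, d))   # 5 image regions
history_turns = rng.normal(size=(4, d))   # 4 previous dialog turns
W = rng.normal(size=(d, 3 * d)) * 0.1     # illustrative update weights

refined, img_w, hist_w = multi_step_reasoning(question, image_regions, history_turns, W)
```

Here `img_w` and `hist_w` are the attention weights from the final step; inspecting them per step is one way to visualize which regions and turns such a loop focuses on.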

Topics

Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Generation > Dialogue Systems
Natural Language Processing > Applications > Question Answering
Natural Language Processing > Applications > Dialogue Systems
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Architectures > Recurrent Neural Networks

Keywords

visual question answering, attention mechanism, visual dialog, iterative refinement, recurrent neural network, dialogue system, multi-step reasoning, dual attention mechanism
