Region under Discussion for visual dialog

Mauricio Mazuecos; Franco M. Luque; Jorge Sánchez; Hernán Maina; Thomas Vadora; Luciana Benotti

EMNLP 2021

Abstract

Visual dialog is assumed to require the dialog history to generate correct responses during a dialog. However, it is not clear from previous work how the dialog history is needed for visual dialog. In this paper we define what it means for a visual question to require dialog history, and we release a subset of the GuessWhat?! questions whose responses change completely depending on their dialog history. We propose a novel interpretable representation that visually grounds dialog history: the Region under Discussion. It constrains the image's spatial features according to a semantic representation of the history inspired by the information structure notion of Question under Discussion. We evaluate the architecture on task-specific multimodal models and the visual transformer model LXMERT.
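To make the idea of constraining spatial features concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the helper names `narrow_region` and `spatial_features`, the four coarse relations, the rule of halving the region, and the 4-dimensional normalized box features are all hypothetical choices made for illustration. The sketch assumes that a yes/no answer to a spatial question narrows the region being discussed, and that an object's box is then described relative to that region rather than the whole image.

```python
from typing import Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def narrow_region(region: Box, relation: str, answer: bool) -> Box:
    """Hypothetical rule: keep the half of `region` consistent with the answer."""
    x0, y0, x1, y1 = region
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    left, right = (x0, y0, xm, y1), (xm, y0, x1, y1)
    top, bottom = (x0, y0, x1, ym), (x0, ym, x1, y1)
    # Each relation maps to (box if the answer is yes, box if the answer is no).
    halves = {"left": (left, right), "right": (right, left),
              "top": (top, bottom), "bottom": (bottom, top)}
    yes_box, no_box = halves[relation]
    return yes_box if answer else no_box


def spatial_features(obj: Box, region: Box) -> Box:
    """Express the object's box corners as fractions of the region's extent."""
    x0, y0, x1, y1 = region
    w, h = x1 - x0, y1 - y0
    return ((obj[0] - x0) / w, (obj[1] - y0) / h,
            (obj[2] - x0) / w, (obj[3] - y0) / h)


image = (0.0, 0.0, 100.0, 100.0)
# Assumed dialog history: "Is it on the left?" answered "yes".
region = narrow_region(image, "left", True)
candidate = (25.0, 50.0, 50.0, 100.0)

print(spatial_features(candidate, image))   # relative to the whole image
print(spatial_features(candidate, region))  # relative to the narrowed region
```

In this toy setup, the same candidate box gets different spatial features once the history has narrowed the region, which is the intuition behind a follow-up question such as "the one on the right?" changing its answer depending on what was asked before.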

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Processing > Video Understanding
Natural Language Processing > Generation > Dialogue Systems
Artificial Intelligence > Core AI > Language
Deep Learning > Learning Types > Multi-Modal Learning
Artificial Intelligence > Core AI > Multi-Modal Learning
Artificial Intelligence > Core AI > Dialogue Systems

Keywords

visual question answering, multimodal learning, visual grounding, visual dialog, multimodal interaction, dialogue system, image feature, dialog history, question under discussion
