Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts

Ece Takmaz; Mario Giulianelli; Sandro Pezzelle; Arabella Sinclair; Raquel Fernández

EMNLP 2020

Abstract

Dialogue participants often refer to entities or situations repeatedly within a conversation, which contributes to its cohesiveness. Subsequent references exploit the common ground accumulated by the interlocutors and therefore have distinctive properties: they tend to be shorter, and they reuse expressions that were effective in previous mentions. In this paper, we tackle the generation of first and subsequent references in visually grounded dialogue. We propose a generation model that produces referring utterances grounded in both the visual and the conversational context. To assess the referring effectiveness of its output, we also implement a reference resolution system. Our experiments and analyses show that the model produces better, more effective referring utterances than a model not grounded in the dialogue context, and that it generates subsequent references exhibiting linguistic patterns akin to those of humans.
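The two properties of subsequent references named above, being shorter and reusing earlier expressions, can be made concrete with a small measurement over a reference chain (the successive mentions of one target within a dialogue). The following is a minimal sketch, assuming plain whitespace tokenization and a simple "fraction of tokens already seen in earlier mentions" notion of reuse; the function names and the example chain are hypothetical illustrations, not the paper's data or its exact analysis.

```python
from typing import Dict, List


def tokenize(utterance: str) -> List[str]:
    # Hypothetical tokenizer: lowercase + whitespace split (a simplifying assumption).
    return utterance.lower().split()


def chain_statistics(chain: List[str]) -> List[Dict[str, float]]:
    """For each mention in a reference chain, report its length in tokens and
    the fraction of its tokens that already appeared in earlier mentions."""
    seen: set = set()
    stats: List[Dict[str, float]] = []
    for utterance in chain:
        tokens = tokenize(utterance)
        # Count tokens reused from *previous* mentions only; the vocabulary of
        # the current mention is added to `seen` after it has been scored.
        reused = sum(1 for t in tokens if t in seen)
        stats.append({
            "length": len(tokens),
            "reuse": reused / len(tokens) if tokens else 0.0,
        })
        seen.update(tokens)
    return stats


if __name__ == "__main__":
    # Hypothetical chain: a first reference followed by two subsequent ones.
    chain = [
        "the man in a red jacket holding a cup",
        "red jacket cup guy",
        "red jacket",
    ]
    for i, s in enumerate(chain_statistics(chain)):
        print(i, s["length"], round(s["reuse"], 2))
```

On this made-up chain, mention length drops from 9 to 4 to 2 tokens while the reuse fraction rises from 0.0 to 0.75 to 1.0, which is the shorten-and-reuse pattern the abstract describes. A real analysis would likely use a proper tokenizer and may filter function words before computing reuse.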

Keywords

multimodal learning, referring expression, visual grounding, visual dialogue, dialogue system, conversational context, reference generation, grounded dialogue, subsequent reference