doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Song Feng; Hui Wan; Chulaka Gunasekara; Siva Patel; Sachindra Joshi; Luis Lastras

2020 EMNLP EMNLP 2020

doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Abstract

AbstractWe introduce doc2dial, a new dataset of goal-oriented dialogues that are grounded in the associated documents. Inspired by how the authors compose documents for guiding end users, we first construct dialogue flows based on the content elements that corresponds to higher-level relations across text sections as well as lower-level relations between discourse units within a section. Then we present these dialogue flows to crowd contributors to create conversational utterances. The dataset includes over 4500 annotated conversations with an average of 14 turns that are grounded in over 450 documents from four domains. Compared to the prior document-grounded dialogue datasets, this dataset covers a variety of dialogue scenes in information-seeking conversations. For evaluating the versatility of the dataset, we introduce multiple dialogue modeling tasks and present baseline approaches.

🧭 Keyword Pioneer — document grounding

🐣 Hot Topic Early Bird — dataset creation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Song Feng , Hui Wan , Chulaka Gunasekara , Siva Patel , Sachindra Joshi , Luis Lastras

Topics

Artificial Intelligence > Core AI > Human-AI Interaction

Keywords

dataset creation conversational ai information seeking goal-oriented dialogue dialogue system dialogue dataset document grounding

Download PDF

Related papers

Fast semantic parsing with well-typedness guarantees 2020

Detecting Objectifying Language in Online Professor Reviews 2020

Analogous Process Structure Induction for Sub-event Sequence Prediction 2020

Aspect Sentiment Classification with Aspect-Specific Opinion Spans 2020

Robust and Interpretable Grounding of Spatial References with Relation Networks 2020