Simple Conversational Data Augmentation for Semi-supervised Abstractive Dialogue Summarization

Jiaao Chen; Diyi Yang

EMNLP 2021

Abstract

Abstractive conversation summarization has received growing attention, but most current state-of-the-art summarization models rely heavily on human-annotated summaries. To reduce the dependence on labeled summaries, in this work we present a simple yet effective set of Conversational Data Augmentation (CODA) methods for semi-supervised abstractive conversation summarization, including random swapping/deletion to perturb the discourse relations inside conversations, dialogue-acts-guided insertion to interrupt the development of conversations, and conditional-generation-based substitution to replace utterances with paraphrases generated from the conversation context. To further utilize unlabeled conversations, we combine CODA with two-stage noisy self-training, in which we first pre-train the summarization model on unlabeled conversations with pseudo summaries and then fine-tune it on labeled conversations. Experiments on recent conversation summarization datasets demonstrate the effectiveness of our methods over several state-of-the-art data augmentation baselines.
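Random swapping/deletion and dialogue-acts-guided insertion amount to simple edits over the list of utterances in a conversation. The sketch below illustrates them under stated assumptions: a conversation is represented as a list of (speaker, utterance) pairs, and the swap count, deletion probability, and the `DA_TEMPLATES` table of filler utterances are hypothetical choices, not the paper's settings. In the paper, insertion is guided by dialogue acts; here a fixed template table stands in for that component. Conditional-generation-based substitution and the two-stage noisy self-training loop require a trained generator and summarizer and are not shown.

```python
import random
from typing import Dict, List, Tuple

# Assumption: a conversation is a list of (speaker, utterance) pairs.
Turn = Tuple[str, str]

# Hypothetical dialogue-act fillers used for insertion; the acts and the
# phrasings are illustrative, not the inventory used in the paper.
DA_TEMPLATES: Dict[str, List[str]] = {
    "backchannel": ["Uh-huh.", "I see.", "Right."],
    "hedge": ["Well, I'm not sure.", "Maybe."],
    "clarification": ["Sorry, what do you mean?"],
}


def random_swap(conv: List[Turn], n_swaps: int, rng: random.Random) -> List[Turn]:
    """Swap randomly chosen pairs of utterances to perturb discourse order."""
    out = list(conv)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(conv: List[Turn], p: float, rng: random.Random) -> List[Turn]:
    """Drop each utterance independently with probability p, keeping at least one."""
    if not conv:
        return []
    kept = [turn for turn in conv if rng.random() >= p]
    return kept if kept else [rng.choice(conv)]


def dialogue_act_insert(conv: List[Turn], n_inserts: int, rng: random.Random) -> List[Turn]:
    """Insert short dialogue-act utterances at random positions to interrupt the flow."""
    out = list(conv)
    speakers = sorted({speaker for speaker, _ in conv}) or ["Speaker"]
    for _ in range(n_inserts):
        act = rng.choice(sorted(DA_TEMPLATES))
        text = rng.choice(DA_TEMPLATES[act])
        # randint is inclusive, so the filler may also be appended at the end.
        out.insert(rng.randint(0, len(out)), (rng.choice(speakers), text))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # A made-up toy conversation for illustration.
    conv = [
        ("Ann", "Are we still meeting at noon?"),
        ("Ben", "Yes, at the usual cafe."),
        ("Ann", "Great, I'll bring the slides."),
    ]
    for op in (lambda c: random_swap(c, 1, rng),
               lambda c: random_delete(c, 0.3, rng),
               lambda c: dialogue_act_insert(c, 1, rng)):
        print(op(conv))
```

Each function returns a new list, so the original labeled conversation is left intact and several augmented copies can be drawn from one example by reusing the same seeded `rng`; the reference summary stays paired with every perturbed copy.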

Keywords

semi-supervised learning, data augmentation, text generation, dialogue summarization, abstractive summarization, noisy self-training, conversational data augmentation
