EMNLP 2021
Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization
Abstract
This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
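To make the curriculum idea concrete, here is a minimal sketch of ordering training pairs by an abstractiveness score. The abstract does not define the paper's metric, so the score used here (the fraction of summary n-grams that do not appear in the source document) is an assumed, commonly used proxy. The function names `novel_ngram_ratio` and `curriculum_order`, the whitespace tokenization, and the easy-to-hard ordering are likewise illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple


def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams absent from the source.

    Assumed proxy for abstractiveness; not the paper's exact metric.
    """
    def ngrams(text: str) -> List[Tuple[str, ...]]:
        # Simple lowercase whitespace tokenization (an assumption).
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        # Summary shorter than n tokens: treat as fully extractive.
        return 0.0
    source_ngrams = set(ngrams(source))
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


def curriculum_order(
    pairs: List[Tuple[str, str]], n: int = 2
) -> List[Tuple[str, str]]:
    """Sort (source, summary) pairs from least to most abstractive.

    Presenting less abstractive examples first is an assumed ordering.
    """
    return sorted(pairs, key=lambda pair: novel_ngram_ratio(pair[0], pair[1], n))


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    data = [(doc, "a feline rested"), (doc, "the cat sat")]
    for source, summary in curriculum_order(data):
        print(round(novel_ngram_ratio(source, summary), 2), summary)
```

A training loop could then draw batches from the sorted list in order, or gradually widen the pool of eligible examples as training progresses; which schedule works best is an empirical question the sketch does not address.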
Topics
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Generation > Summarization
Natural Language Processing > Applications > Summarization
Machine Learning > Learning Types > Data Augmentation
Machine Learning > Learning Paradigms > Curriculum Learning