SumTitles: a Summarization Dataset with Low Extractiveness

Valentin Malykh; Konstantin Chernis; Ekaterina Artemova; Irina Piontkovskaya

COLING 2020

Abstract

Existing dialogue summarization corpora are highly extractive. We introduce a methodology for evaluating dataset extractiveness and present a new low-extractiveness corpus of movie dialogues for abstractive text summarization, along with a baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. We also present the alignment algorithm that we use to construct the corpus.
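The abstract does not spell out how extractiveness is measured. To make the notion concrete, here is a minimal sketch of one widely used proxy: the extractive fragment coverage and density of Grusky et al. (2018, Newsroom), computed over lowercased whitespace tokens. This is a swapped-in, standard measure and not necessarily the methodology proposed for SumTitles; the function names are hypothetical. Coverage is the share of summary tokens that lie inside text copied from the source, and density weights each copied fragment by its length, so long verbatim spans push it up.

```python
from typing import List, Sequence, Tuple


def extractive_fragments(article: Sequence[str], summary: Sequence[str]) -> List[List[str]]:
    """Greedily collect token sequences shared by summary and article
    (the extractive-fragment procedure of Grusky et al., 2018)."""
    fragments: List[List[str]] = []
    i = 0
    while i < len(summary):
        best: List[str] = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match as far as both sequences agree.
                i_end, j_end = i, j
                while (i_end < len(summary) and j_end < len(article)
                       and summary[i_end] == article[j_end]):
                    i_end += 1
                    j_end += 1
                # Keep the longest match starting at summary position i.
                if i_end - i > len(best):
                    best = list(summary[i:i_end])
                j = j_end
            else:
                j += 1
        if best:
            fragments.append(best)
        # Skip past the longest fragment, or a single novel token.
        i += max(len(best), 1)
    return fragments


def coverage_and_density(article: str, summary: str) -> Tuple[float, float]:
    """Return (coverage, density) over lowercased whitespace tokens.

    An empty summary yields (0.0, 0.0) to avoid division by zero.
    """
    a_tokens = article.lower().split()
    s_tokens = summary.lower().split()
    if not s_tokens:
        return 0.0, 0.0
    frags = extractive_fragments(a_tokens, s_tokens)
    coverage = sum(len(f) for f in frags) / len(s_tokens)
    density = sum(len(f) ** 2 for f in frags) / len(s_tokens)
    return coverage, density


if __name__ == "__main__":
    dialogue = "I told you the keys are on the table"
    # A summary copied verbatim from the dialogue is maximally extractive.
    print(coverage_and_density(dialogue, "the keys are on the table"))  # (1.0, 6.0)
    # A summary with no shared tokens scores zero on both measures.
    print(coverage_and_density(dialogue, "she reminds him where they were"))  # (0.0, 0.0)
```

Averaging these scores over all dialogue/summary pairs gives a corpus-level view: a low-extractiveness dataset should show low average coverage and density.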

Topics

Natural Language Processing > Generation > Summarization

Keywords

dialogue summarization, abstractive summarization
