EMNLP 2018
Learning To Split and Rephrase From Wikipedia Edit History
Abstract
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia’s edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.