Does syntax help discourse segmentation? Not so much

Chloé Braud; Ophélie Lacroix; Anders Søgaard

EMNLP 2017

Abstract

Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that only relies on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation.
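To make the task concrete, discourse segmentation can be viewed as labeling each token of a document as either beginning a new discourse unit or continuing the current one. The sketch below illustrates only that framing, not the authors' model: the function names, the `B`/`I` label scheme, the example sentence, and its part-of-speech tags are all hand-written assumptions for illustration.

```python
from typing import List, Tuple


def segments_to_labels(segments: List[List[str]]) -> List[Tuple[str, str]]:
    """Flatten discourse units into (token, label) pairs.

    'B' marks the first token of a unit, 'I' every other token.
    (Hypothetical label scheme, chosen for illustration.)
    """
    labeled = []
    for segment in segments:
        for i, token in enumerate(segment):
            labeled.append((token, "B" if i == 0 else "I"))
    return labeled


def labels_to_segments(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Rebuild discourse units from a token stream and its B/I labels."""
    segments: List[List[str]] = []
    for token, label in zip(tokens, labels):
        # Start a new unit on 'B', or if no unit has been opened yet.
        if label == "B" or not segments:
            segments.append([token])
        else:
            segments[-1].append(token)
    return segments


# Hand-written example: one sentence split into two units.
units = [["He", "said"], ["that", "he", "was", "tired"]]
# Hand-assigned universal POS tags; a POS-only segmenter would see
# these tags rather than syntactic trees as its input features.
pos_tags = ["PRON", "VERB", "SCONJ", "PRON", "AUX", "ADJ"]

pairs = segments_to_labels(units)
tokens = [tok for tok, _ in pairs]
labels = [lab for _, lab in pairs]
```

Because the labels are assigned over a flat token stream, the same representation applies whether or not sentence boundaries are known in advance.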

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Domain Generalization

Keywords

universal dependencies, dependency parsing, part-of-speech tagging, discourse segmentation, cross-linguistic model, part-of-speech information
