2022
IJCNLP
IJCNLP 2022
GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing
Abstract
AbstractA lack of large-scale human-annotated data has hampered the hierarchical discourse parsing of Chinese. In this paper, we present GCDT, the largest hierarchical discourse treebank for Mandarin Chinese in the framework of Rhetorical Structure Theory (RST). GCDT covers over 60K tokens across five genres of freely available text, using the same relation inventory as contemporary RST treebanks for English. We also report on this dataset’s parsing experiments, including state-of-the-art (SOTA) scores for Chinese RST parsing and RST parsing on the English GUM dataset, using cross-lingual training in Chinese and English with multilingual embeddings.
🌉
Interdisciplinary Bridge
— Deep Learning and Interdisciplinary and Machine Learning and Natural Language Processing
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Generalization
Deep Learning > Techniques > Pretraining
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Multilingual NLP
Interdisciplinary > Linguistics > Computational Linguistics