TED-CDB: A Large-Scale Chinese Discourse Relation Dataset on TED Talks

Wanqiu Long; Bonnie Webber; Deyi Xiong

EMNLP 2020

Abstract

Different genres are known to differ in their communicative properties, and discourse relations in Chinese have previously been annotated only over news text. We have therefore created the TED-CDB dataset. TED-CDB comprises a large set of TED talks in Chinese that have been manually annotated according to the goals and principles of the Penn Discourse Treebank, adapted to features of Chinese that are not present in English. It serves as a unique Chinese corpus of spoken discourse. Benchmark experiments show that TED-CDB poses a challenge for state-of-the-art discourse relation classifiers, whose F1 performance on 4-way classification is 60%. This is a dramatic drop of 35% from performance on the news text in the Chinese Discourse Treebank. Transfer learning experiments have been carried out with TED-CDB for both same-language cross-domain transfer and same-domain cross-language transfer. Both demonstrate that TED-CDB can improve the performance of systems being developed for languages other than Chinese and would be helpful for insufficient or unbalanced data in other corpora. The dataset and our Chinese annotation guidelines will be made freely available.

Keywords

transfer learning, cross-lingual transfer, discourse parsing, Chinese NLP, cross-domain transfer, discourse analysis, discourse relation, implicit relation, Chinese corpus, Chinese discourse relation, TED talk
