Do UD Trees Match Mention Spans in Coreference Annotations?

Martin Popel; Zdeněk Žabokrtský; Anna Nedoluzhko; Michal Novák; Daniel Zeman

EMNLP 2021

Abstract

One can find dozens of data resources for various languages in which coreference (a relation between two or more expressions that refer to the same real-world entity) is manually annotated. One could also assume that such expressions usually constitute syntactically meaningful units; however, in most coreference projects, mention spans have been annotated simply by delimiting token intervals, i.e., independently of any syntactic representation. We argue that it could be advantageous to make syntactic and coreference annotations convergent in the long term. We present a pilot empirical study focused on matches and mismatches between hand-annotated linear mention spans and automatically parsed syntactic trees that follow Universal Dependencies conventions. The study covers 9 datasets for 8 different languages.
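To make the notion of a span "matching" a tree concrete, here is a minimal sketch in Python. It rests on one assumption that the abstract does not state: a mention span is counted as matching when its tokens are exactly one node of the dependency tree together with all of that node's descendants. The paper's actual criteria may differ. The function names and the example head indices are hypothetical and hand-written for illustration; they are not parser output.

```python
from typing import List, Set


def subtree_nodes(heads: List[int], root: int) -> Set[int]:
    """Return the 1-based ids of `root` and all of its descendants.

    `heads[i - 1]` is the head of token i, with 0 marking the sentence root
    (the same convention as the HEAD column of a CoNLL-U file).
    """
    children = {i: [] for i in range(1, len(heads) + 1)}
    for child, head in enumerate(heads, start=1):
        if head != 0:
            children[head].append(child)
    nodes: Set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        nodes.add(node)
        stack.extend(children[node])
    return nodes


def span_matches_subtree(heads: List[int], start: int, end: int) -> bool:
    """Check whether tokens start..end (1-based, inclusive) form a full subtree.

    Assumed criterion (not taken from the paper): exactly one token in the
    span has its head outside the span, and that token's subtree equals the span.
    """
    span = set(range(start, end + 1))
    tops = [t for t in span if heads[t - 1] not in span]
    return len(tops) == 1 and subtree_nodes(heads, tops[0]) == span


# Hypothetical hand-written analysis of "The old dog barked loudly":
# The -> dog, old -> dog, dog -> barked, barked -> root, loudly -> barked
heads = [3, 3, 4, 0, 4]
full_np = span_matches_subtree(heads, 1, 3)     # "The old dog"
partial_np = span_matches_subtree(heads, 2, 3)  # "old dog", determiner left out
```

Under this assumed criterion, "The old dog" is a full subtree headed by "dog", whereas "old dog" is not, because the subtree of "dog" also contains the determiner.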

Topics

Natural Language Processing > Understanding > Coreference Resolution
Natural Language Processing > Understanding > Syntax
Interdisciplinary > Linguistics > Computational Linguistics
Interdisciplinary > Linguistics > Syntax

Keywords

syntactic parsing, universal dependencies, coreference resolution, syntactic tree, syntactic annotation, empirical analysis, mention span
