2020 COLING COLING 2020

TwiConv: A Coreference-annotated Corpus of Twitter Conversations

Abstract

AbstractThis article introduces TwiConv, an English coreference-annotated corpus of microblog conversations from Twitter. We describe the corpus compilation process and the annotation scheme, and release the corpus publicly, along with this paper. We manually annotated nominal coreference in 1756 tweets arranged in 185 conversation threads. The annotation achieves satisfactory annotation agreement results. We also present a new method for mapping the tweet contents with distributed stand-off annotations, which can easily be adapted to different annotation tasks.

🧭 Keyword Pioneer — distributed annotation
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Natural Language Processing