Using Linguistic Resources to Evaluate the Quality of Annotated Corpora

Max Silberztein

COLING 2018

Abstract

Statistical and neural-network-based methods that compute their results by comparing a text under analysis with a reference corpus assume that the reference corpus is sufficiently complete and reliable. In this article, I conduct several experiments on an extract of the Open American National Corpus to test this assumption.

Authors

Max Silberztein

Topics

Machine Learning > Learning Types > Unsupervised Learning
Machine Learning > Optimization & Theory > Statistical Learning

Keywords

corpus quality, annotated corpus, reference corpus, neural network, statistical method

Related papers

DialEdit: Annotations for Spoken Conversational Image Editing 2018

Downward Compatible Revision of Dialogue Annotation 2018

Zero Pronoun Resolution with Attention-based Neural Network 2018

Triad-based Neural Network for Coreference Resolution 2018

Challenges of language technologies for the indigenous languages of the Americas 2018