Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction

Christopher Bryant; Mariano Felice; Ted Briscoe

ACL 2017


Abstract

Until now, error type performance for Grammatical Error Correction (GEC) systems could only be measured in terms of recall because system output is not annotated. To overcome this problem, we introduce ERRANT, a grammatical ERRor ANnotation Toolkit designed to automatically extract edits from parallel original and corrected sentences and classify them according to a new, dataset-agnostic, rule-based framework. This not only facilitates error type evaluation at different levels of granularity, but can also be used to reduce annotator workload and standardise existing GEC datasets. Human experts rated the automatic edits as “Good” or “Acceptable” in at least 95% of cases, so we applied ERRANT to the system output of the CoNLL-2014 shared task to carry out a detailed error type analysis for the first time.
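To make the idea of extracting edits from a parallel original and corrected sentence concrete, here is a minimal sketch in Python. It is not ERRANT's implementation or API: it assumes whitespace tokenisation, uses the standard library's generic `difflib.SequenceMatcher` as a stand-in for the toolkit's own alignment, and the function name `extract_edits` and the coarse tags `M` (missing), `U` (unnecessary) and `R` (replacement) are illustrative choices. The rule-based classification into detailed error types described in the abstract is not attempted here.

```python
from difflib import SequenceMatcher


def extract_edits(original, corrected):
    """Return one tuple per differing span between two sentences.

    Each tuple is (tag, start, end, original_tokens, corrected_tokens),
    where start/end are token offsets into the original sentence.
    Hypothetical helper for illustration only; not part of ERRANT.
    """
    # Assumption: plain whitespace tokenisation is good enough for a demo.
    orig_tokens = original.split()
    corr_tokens = corrected.split()

    # Generic sequence alignment from the standard library, used here as a
    # stand-in for a linguistically informed alignment.
    matcher = SequenceMatcher(a=orig_tokens, b=corr_tokens, autojunk=False)

    # Map difflib's operation names to coarse, illustrative edit tags.
    tags = {"insert": "M", "delete": "U", "replace": "R"}

    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # identical spans are not edits
        edits.append((tags[op], i1, i2, orig_tokens[i1:i2], corr_tokens[j1:j2]))
    return edits


if __name__ == "__main__":
    for edit in extract_edits("I has a apple", "I have an apple"):
        print(edit)
```

Applying the same function to both the reference corrections and a system's output yields edits in one shared format. That is the precondition for comparing the two and reporting precision as well as recall for each error type.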



Topics

Machine Learning > Core Methods > Classification
Interdisciplinary > Linguistics > Computational Linguistics

Keywords

grammatical error correction; automatic annotation; error type classification; parallel sentence; rule-based framework

