CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

Jingheng Ye; Zishan Xu; Yinghui Li; Linlin Song; Qingyu Zhou; Hai-Tao Zheng; Ying Shen; Wenhao Jiang; Hong-Gee Kim; Ruitong Liu; Xin Su; Zifei Shan

ACL 2025

Abstract

This paper focuses on the interpretability of Grammatical Error Correction (GEC) evaluation metrics, which has received little attention in previous studies. To bridge this gap, we introduce **CLEME2.0**, a reference-based metric that describes four fundamental aspects of GEC systems: hit-correction, wrong-correction, under-correction, and over-correction. Together, these aspects expose critical qualities of GEC systems and help locate their drawbacks. Evaluating systems by combining these aspects also yields higher consistency with human judgments than other reference-based and reference-less metrics. Extensive experiments on two human judgment datasets and six reference datasets demonstrate the effectiveness and robustness of our method, which achieves a new state-of-the-art result. Our code is released at https://github.com/THUKElab/CLEME.

Topics

Artificial Intelligence > Core AI > Interpretability; Natural Language Processing > Resources & Methods > Text Representation; Machine Learning > Core Methods > Evaluation; Deep Learning > Optimization & Theory > Evaluation; Natural Language Processing > Applications > Text Processing

Keywords

text editing; grammatical error correction; evaluation metric; reference-based metric
