A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution

Mariya Borovikova; Loïc Grobol; Anaïs Halftermeyer; Sylvie Billot

ACL 2022

Abstract

We propose a method for investigating the interpretability of metrics used for the coreference resolution task by comparing them with human judgments. We provide a corpus annotated with different error types and with human evaluations of their severity. Our preliminary analysis shows that the metrics considerably overlook several error types and, more generally, underweight errors relative to human judges. This study is conducted on French texts, but the methodology is language-independent.
