Common Law Annotations: Investigating the Stability of Dialog System Output Annotations

Seunggun Lee; Alexandra DeLucia; Nikita Nangia; Praneeth Ganedi; Ryan Guan; Rubing Li; Britney Ngaw; Aditya Singhal; Shalaka Vaidya; Zijun Yuan; Lining Zhang; João Sedoc

ACL 2023

Abstract

Metrics for Inter-Annotator Agreement (IAA), like Cohen’s Kappa, are crucial for validating annotated datasets. Although high agreement is often used to show the reliability of annotation procedures, it is insufficient to ensure validity or reproducibility. While researchers are encouraged to increase annotator agreement, this can lead to specific and tailored annotation guidelines. We hypothesize that this may result in diverging annotations from different groups. To study this, we first propose the Lee et al. Protocol (LEAP), a standardized and codified annotation protocol. LEAP strictly enforces transparency in the annotation process, which ensures reproducibility of annotation guidelines. Using LEAP to annotate a dialog dataset, we empirically show that while research groups may create reliable guidelines by raising agreement, this can cause divergent annotations across different research groups, thus questioning the validity of the annotations. Therefore, we caution NLP researchers against using reliability as a proxy for reproducibility and validity.
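To make the distinction between reliability and reproducibility concrete, the following minimal sketch computes Cohen's Kappa for pairs of annotators. The label lists, the group names, and the helper `cohen_kappa` are hypothetical illustrations, not data or code from the paper. In this toy setup each group agrees well internally, yet the two groups do not agree with each other beyond chance.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention assumed by this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (1 = "appropriate", 0 = "not appropriate").
# These are illustrative values only, not annotations from the paper.
group1_ann1 = [1, 1, 1, 1, 0, 0, 0, 0]
group1_ann2 = [1, 1, 1, 1, 0, 0, 0, 1]
group2_ann1 = [1, 1, 0, 0, 0, 0, 1, 1]
group2_ann2 = [1, 1, 0, 0, 0, 0, 1, 0]

within_g1 = cohen_kappa(group1_ann1, group1_ann2)  # agreement inside group 1
within_g2 = cohen_kappa(group2_ann1, group2_ann2)  # agreement inside group 2
across = cohen_kappa(group1_ann1, group2_ann1)     # agreement between groups

print(f"within group 1: {within_g1:.2f}")
print(f"within group 2: {within_g2:.2f}")
print(f"across groups:  {across:.2f}")
```

With these made-up labels, both within-group scores come out at 0.75 while the cross-group score is 0.00, which is the pattern the abstract warns about: high agreement inside a group says little about whether another group would produce the same annotations.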

Topics

Machine Learning > Optimization & Theory > Statistical Learning; Machine Learning > Optimization & Theory > Theory; Natural Language Processing > Applications > Dialogue Systems; Machine Learning > Optimization & Theory > Evaluation; Artificial Intelligence > Core AI > Dialogue Systems; Deep Learning > Learning Types > Evaluation

Keywords

inter-annotator agreement; dialog system; dataset annotation; reproducibility study; annotation guideline; cohen kappa; annotation stability; dialog system output
