2024 COLING COLING 2024

PyRater: A Python Toolkit for Annotation Analysis

Abstract

AbstractWe introduce PyRater, an open-source Python toolkit designed for analysing corpora annotations. When creating new annotated language resources, probabilistic models of annotation are the state-of-the-art solution for identifying the best annotators, retrieving the gold standard, and more generally separating annotation signal from noise. PyRater offers a unified interface for several such models and includes an API for the addition of new ones. Additionally, the toolkit has built-in functions to read datasets with multiple annotations and plot the analysis outcomes. In this work, we also demonstrate a novel application of PyRater to zero-shot classifiers, where it effectively selects the best-performing prompt. We make PyRater available to the research community.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio