NeurIPS (NIPS) 2011

Ranking annotators for crowdsourced labeling tasks

Abstract

With the advent of crowdsourcing services it has become quite cheap and reasonably effective to get a dataset labeled by multiple annotators in a short amount of time. Various methods have been proposed to estimate the consensus labels by correcting for the bias of annotators with different kinds of expertise. Often we have low-quality annotators or spammers, i.e., annotators who assign labels randomly (e.g., without actually looking at the instance). Spammers can drive up the cost of acquiring labels and can potentially degrade the quality of the consensus labels. In this paper we formalize the notion of a spammer and define a score which can be used to rank the annotators, with the spammers having a score close to zero and the good annotators having a score close to one.
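The abstract does not state the score itself, so the following is only a minimal sketch of one score with the described behavior for binary labels. It assumes each annotator's sensitivity (true positive rate) and specificity (true negative rate) are already known or estimated; the function names, the formula `(sensitivity + specificity - 1) ** 2`, and the example annotators are illustrative assumptions, not taken from the text above.

```python
from typing import Dict, List, Tuple


def spammer_score(sensitivity: float, specificity: float) -> float:
    """Illustrative score in [0, 1] for a binary annotator (an assumption,
    not a formula given in the abstract).

    An annotator whose labels are independent of the true label satisfies
    sensitivity + specificity = 1, so the score is 0. A perfect annotator
    has sensitivity = specificity = 1, so the score is 1.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    return (sensitivity + specificity - 1.0) ** 2


def rank_annotators(
    rates: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = {name: spammer_score(sens, spec) for name, (sens, spec) in rates.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotators: (sensitivity, specificity).
annotators = {
    "careful": (0.95, 0.90),
    "coin_flip": (0.50, 0.50),        # labels at random
    "always_positive": (1.00, 0.00),  # same label for every instance
}

ranking = rank_annotators(annotators)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Under this assumed score, both the random labeler and the annotator who always answers "positive" score zero, since neither looks at the instance. Note that an annotator who consistently flips the true label would also score one here, because such labels are still fully informative once inverted.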
