Finding Patterns in Noisy Crowds: Regression-based Annotation Aggregation for Crowdsourced Data

Natalie Parde; Rodney Nielsen

EMNLP 2017

Abstract

Crowdsourcing offers a convenient means of obtaining labeled data quickly and inexpensively. However, crowdsourced labels are often noisier than expert-annotated data, making it difficult to aggregate them meaningfully. We present an aggregation approach that learns a regression model from crowdsourced annotations to predict aggregated labels for instances that have no expert adjudications. The predicted labels achieve a correlation of 0.594 with expert labels on our data, outperforming the best alternative aggregation method by 11.9%. Our approach also outperforms the alternatives on third-party datasets.
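
The general idea in the abstract (fit a regression model on instances that have both crowd annotations and an expert label, then use it to predict an aggregated label where no expert label exists) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary features (mean and spread of the crowd labels), the ridge-regularized least-squares fit, the helper names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
import statistics

def featurize(annotations):
    """Hypothetical features for one instance: bias, mean label, label spread."""
    return [1.0, statistics.mean(annotations), max(annotations) - min(annotations)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(rows, targets, lam=1e-3):
    """Least squares with a small ridge penalty (also applied to the bias) for stability."""
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(xtx, xty)

def predict(weights, annotations):
    """Predict an aggregated label from one instance's crowd annotations."""
    return sum(w * f for w, f in zip(weights, featurize(annotations)))

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Toy data (invented for illustration): crowd labels per instance, plus expert labels.
crowd_labels = [[1, 2, 3], [4, 4, 5], [2, 2, 2], [5, 3, 4], [1, 1, 2]]
expert_labels = [2.0, 4.5, 2.0, 4.0, 1.0]

weights = fit_ridge([featurize(a) for a in crowd_labels], expert_labels)

# Aggregate an instance that has crowd labels but no expert adjudication.
prediction = predict(weights, [3, 4, 3])

# In-sample agreement with the expert labels, measured as in the abstract by correlation.
train_corr = pearson([predict(weights, a) for a in crowd_labels], expert_labels)
```

In practice the learned aggregator would be evaluated on held-out expert-labeled instances rather than in-sample, and compared against simple baselines such as the plain mean of the crowd labels.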

Topics

Machine Learning > Core Methods > Regression
Machine Learning > Application Areas > Data Augmentation
Machine Learning > Learning Types > Supervised Learning

Keywords

data augmentation, label noise, crowdsourced annotation, regression model, annotation aggregation
