2016 ICML ICML 2016

Exact Exponent in Optimal Rates for Crowdsourcing

Abstract

Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(\pi), where m is the number of workers and I(\pi) is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m \ge \frac1I(\pi)\log\frac1ε in order to achieve an εmisclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🧭 Keyword Pioneer — error exponent
🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Speech & Audio
🐣 Hot Topic Early Bird — information theory