spark-crowd: A Spark Package for Learning from Crowdsourced Big Data

Enrique G. Rodrigo; Juan A. Aledo; José A. Gámez

JMLR 2019


Abstract

As data sets grow in size, manually labeling data becomes infeasible for small groups of experts. It is therefore common to rely on crowdsourcing platforms, which provide inexpensive but noisy labels. Although implementations of algorithms that tackle this problem exist, none of them focuses on scalability, which limits their application to relatively small data sets. In this paper, we present spark-crowd, an Apache Spark package for learning from crowdsourced data with scalability in mind.
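To make the problem setting concrete, the following is a minimal sketch of the simplest way to learn from noisy crowdsourced labels: majority voting over the annotations given to each example. It is plain, single-machine Python and does not use or reproduce the spark-crowd API; the function name `majority_vote`, the `(example_id, annotator_id, label)` tuple layout, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate noisy labels by majority vote, one result per example.

    annotations: iterable of (example_id, annotator_id, label) tuples
                 (hypothetical layout, chosen for this illustration).
    Returns a dict mapping example_id -> most frequent label.
    Ties are broken in favour of the smallest label so that the
    result is deterministic (an arbitrary choice for this sketch).
    """
    votes = defaultdict(Counter)
    for example_id, _annotator_id, label in annotations:
        votes[example_id][label] += 1

    result = {}
    for example_id, counter in votes.items():
        top = max(counter.values())
        result[example_id] = min(l for l, c in counter.items() if c == top)
    return result


# Three annotators ("a", "b", "c") labelling three examples with binary labels.
annotations = [
    (0, "a", 1), (0, "b", 1), (0, "c", 0),
    (1, "a", 0), (1, "b", 0), (1, "c", 1),
    (2, "a", 1), (2, "b", 0),  # a tie, resolved to the smaller label
]

aggregated = majority_vote(annotations)
print(aggregated)  # {0: 1, 1: 0, 2: 0}
```

Majority voting treats every annotator as equally reliable and holds all votes in memory, so it only serves as a baseline; handling annotator quality and data sets too large for one machine is the gap the abstract describes.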



Topics

Machine Learning > Learning Types > Weakly Supervised Learning

Machine Learning > Optimization & Theory > Distributed Learning

Data Science & Analytics > Applications > Information Retrieval

Machine Learning > Learning Types > Crowdsourcing

Keywords

scalable machine learning, label aggregation, crowdsourced learning, scalable computing, noisy label, Apache Spark

