2019
JMLR
spark-crowd: A Spark Package for Learning from Crowdsourced Big Data
Abstract
As data sets grow in size, manually labeling them becomes infeasible for small groups of experts. It is therefore common to rely on crowdsourcing platforms, which provide inexpensive but noisy labels. Although implementations of algorithms for learning from such labels exist, none of them focuses on scalability, which limits their application to relatively small data sets. In this paper, we present spark-crowd, an Apache Spark package for learning from crowdsourced data with scalability in mind. © JMLR 2019.
🐝 Cross-Pollinator: Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Machine Learning, Reinforcement Learning
🌉 Interdisciplinary Bridge: Data Science & Analytics and Machine Learning
🧭 Keyword Pioneer: scalable computing