2014 JMLR JMLR 2014

A Reliable Effective Terascale Linear Learning System

Abstract

We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, (The number of features here refers to the number of non-zero entries in the data matrix.) billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, up to our knowledge, the most scalable and efficient linear learning system reported in the literature. (All the empirical evaluation reported in this work was carried out between May-Oct 2011.) We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices. [abs] [ pdf ][ bib ] © JMLR 2014. (edit, beta)

🧭 Keyword Pioneer — terascale learning
🐣 Hot Topic Early Bird — distributed learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio