AdaDelay: Delay Adaptive Distributed Stochastic Optimization

Suvrit Sra; Adams Wei Yu; Mu Li; Alex Smola

2016 AISTATS AISTATS 2016

AdaDelay: Delay Adaptive Distributed Stochastic Optimization

Abstract

We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be differently slow at different times. In contrast to existing approaches, we do not impose a worst-case bound on the delays experienced but rather allow the updates to be sensitive to the actual delays experienced. This sensitivity allows use of larger stepsizes, which can help speed up initial convergence without having to wait too long for slower machines; the global convergence rate is still preserved. We experiment with different delay patterns, and obtain noticeable improvements for large-scale real datasets with billions of examples and features.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — delay adaptive

🐣 Hot Topic Early Bird — distributed optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Suvrit Sra , Adams Wei Yu , Mu Li , Alex Smola

Topics

Machine Learning > Optimization & Theory > Distributed Learning Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

stochastic optimization convex optimization distributed optimization delay adaptive

Download PDF

Related papers

Bipartite Correlation Clustering: Maximizing Agreements 2016

Precision Matrix Estimation in High Dimensional Gaussian Graphical Models with Faster Rates 2016

On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes 2016

Time-Varying Gaussian Process Bandit Optimization 2016

Bayesian Markov Blanket Estimation 2016