Is Local SGD Better than Minibatch SGD?

Blake Woodworth; Kumar Kshitij Patel; Sebastian Stich; Zhen Dai; Brian Bullins; Brendan McMahan; Ohad Shamir; Nathan Srebro

ICML 2020

Abstract

We study local SGD (also known as parallel SGD and federated SGD), a natural and frequently used distributed optimization method. Its theoretical foundations are currently lacking, and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. Our contributions are threefold: (1) for quadratic objectives, we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) for general convex objectives, we provide the first guarantee that at least sometimes improves over minibatch SGD, although it does not always improve over, or even match, minibatch SGD; (3) we show that local SGD indeed does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
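For readers unfamiliar with the two methods compared in the abstract, the following is a minimal sketch of how they differ mechanically. It is not taken from the paper: the parameter names `M` (machines), `K` (stochastic gradients per machine per round), and `R` (communication rounds), the one-dimensional quadratic objective, and the Gaussian noise model are all assumptions chosen for illustration. Both methods use the same budget of `M * K` stochastic gradients per round; minibatch SGD evaluates them all at one point and takes a single step, while local SGD lets each machine take `K` sequential steps before the iterates are averaged.

```python
import random


def noisy_grad(x, a, b, sigma, rng):
    """Stochastic gradient of f(x) = 0.5*a*x**2 - b*x with additive Gaussian noise."""
    return a * x - b + sigma * rng.gauss(0.0, 1.0)


def minibatch_sgd(x0, a, b, M, K, R, lr, sigma, seed=0):
    """R steps, each averaging M*K stochastic gradients taken at the same point."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        grads = [noisy_grad(x, a, b, sigma, rng) for _ in range(M * K)]
        x -= lr * sum(grads) / len(grads)
    return x


def local_sgd(x0, a, b, M, K, R, lr, sigma, seed=0):
    """R rounds; in each, M machines run K local SGD steps, then iterates are averaged."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            x_m = x  # every machine starts the round from the shared iterate
            for _ in range(K):
                x_m -= lr * noisy_grad(x_m, a, b, sigma, rng)
            local_iterates.append(x_m)
        x = sum(local_iterates) / M  # the only communication step in the round
    return x


if __name__ == "__main__":
    # Hypothetical settings; the minimizer of this objective is b / a = 2.0.
    settings = dict(a=0.5, b=1.0, M=4, K=10, R=20, lr=0.5, sigma=1.0)
    print("minibatch SGD:", minibatch_sgd(0.0, **settings))
    print("local SGD:    ", local_sgd(0.0, **settings))
```

A single run on a toy problem like this only shows the mechanics of the two update rules. It says nothing about which method has the better guarantee, which is the question the paper addresses through upper and lower bounds.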

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Statistical Learning
Computer Science > Systems > Distributed Systems
Mathematics & Optimization > Optimization > Distributed Learning
Machine Learning > Learning Types > Deep Learning

Keywords

federated learning, stochastic gradient descent, convex optimization, distributed optimization, parallel optimization, local SGD, minibatch SGD, quadratic objective
