LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning

Tianyi Chen; Georgios Giannakis; Tao Sun; Wotao Yin

NeurIPS 2018

Abstract

This paper presents a new class of gradient methods for distributed machine learning that adaptively skip gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients. The resultant gradient-based algorithms are termed Lazily Aggregated Gradient, justifying our acronym LAG used henceforth. Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex cases; and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of lagged gradients. Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
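The idea of reusing outdated gradients can be illustrated with a minimal sketch. The abstract does not state the skipping rule, so everything below is an assumption made for illustration: the function name `lazy_aggregated_gd`, the least-squares losses, the parameter `xi`, and the trigger itself (a worker uploads a fresh gradient only when it differs from its last uploaded one by more than a fraction of the most recent parameter change). It should not be read as the authors' exact algorithm.

```python
import numpy as np

def lazy_aggregated_gd(A_list, b_list, step, iters, xi=0.1):
    """Gradient descent on sum_m 0.5*||A_m theta - b_m||^2 with lazy uploads.

    Hypothetical trigger (an assumption, not taken from the source): worker m
    uploads only if its gradient moved by more than a threshold tied to the
    latest change in theta; otherwise the server reuses the stale gradient.
    """
    M = len(A_list)
    theta = np.zeros(A_list[0].shape[1])
    # Every worker uploads once at the start.
    last = [A.T @ (A @ theta - b) for A, b in zip(A_list, b_list)]
    agg = np.sum(last, axis=0)          # server-side aggregated gradient
    uploads = M
    for _ in range(iters):
        new_theta = theta - step * agg
        move_sq = float(np.sum((new_theta - theta) ** 2))
        theta = new_theta
        threshold = xi * move_sq / (step ** 2 * M ** 2)
        for m in range(M):
            g = A_list[m].T @ (A_list[m] @ theta - b_list[m])
            if float(np.sum((g - last[m]) ** 2)) > threshold:
                agg = agg + (g - last[m])   # refresh only this worker's term
                last[m] = g
                uploads += 1
            # else: skip the upload and keep the lagged gradient in agg
    return theta, uploads

# Synthetic heterogeneous data: workers differ in the scale of their features.
rng = np.random.default_rng(0)
scales = [0.1, 0.3, 1.0, 3.0]
A_list = [s * rng.standard_normal((20, 3)) for s in scales]
b_list = [A @ np.ones(3) + 0.01 * rng.standard_normal(20) for A in A_list]
A_all, b_all = np.vstack(A_list), np.concatenate(b_list)
L = np.linalg.norm(A_all, 2) ** 2       # smoothness constant of the total loss
theta, uploads = lazy_aggregated_gd(A_list, b_list, step=0.5 / L, iters=2000)
theta_star = np.linalg.lstsq(A_all, b_all, rcond=None)[0]
print("uploads:", uploads, "of", len(A_list) * 2001)
print("distance to least-squares solution:", np.linalg.norm(theta - theta_star))
```

In this sketch each worker still evaluates its gradient locally and only the upload is skipped; the abstract also mentions skipping the gradient calculation itself, which would need a server-side trigger that this example does not attempt.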

Topics

Artificial Intelligence > Learning Paradigms > Federated Learning
Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Optimization & Theory > Stochastic Methods

Keywords

stochastic gradient; distributed learning; gradient descent; communication efficiency; adaptive optimization; heterogeneous data; gradient method; adaptive skipping
