Coordinating Distributed Example Orders for Provably Accelerated Training

A. Feder Cooper; Wentao Guo; Duc Khiem Pham; Tiancheng Yuan; Charlie Ruan; Yucheng Lu; Christopher M De Sa

NeurIPS 2023

Abstract

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples, achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
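
To make the idea of ordering examples by balancing their gradients more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function names balance_reorder and max_prefix_norm are hypothetical, the gradients are small hand-written vectors, and the exact mean gradient is used for centering where an online method would rely on a stale estimate from the previous epoch. Each centered gradient greedily receives a sign that keeps a running signed sum small; examples with a positive sign keep their relative order at the front of the next epoch's permutation, and examples with a negative sign are appended in reverse.

```python
import math


def _mean(grads):
    """Coordinate-wise mean of a list of equal-length gradient vectors."""
    n, dim = len(grads), len(grads[0])
    return [sum(g[d] for g in grads) / n for d in range(dim)]


def balance_reorder(order, grads):
    """Return a new example order from one greedy sign-balancing pass.

    order: current permutation of example indices.
    grads: grads[i] is the gradient vector of example i (illustrative data).
    Assumption: centering uses the exact mean rather than a stale estimate.
    """
    dim = len(grads[0])
    mean = _mean(grads)
    running = [0.0] * dim  # running signed sum of centered gradients
    front, back = [], []
    for idx in order:
        centered = [grads[idx][d] - mean[d] for d in range(dim)]
        plus = sum((running[d] + centered[d]) ** 2 for d in range(dim))
        minus = sum((running[d] - centered[d]) ** 2 for d in range(dim))
        if plus <= minus:
            # Sign +1: the example keeps its relative position at the front.
            running = [running[d] + centered[d] for d in range(dim)]
            front.append(idx)
        else:
            # Sign -1: the example goes to the back, in reversed order.
            running = [running[d] - centered[d] for d in range(dim)]
            back.append(idx)
    return front + back[::-1]


def max_prefix_norm(order, grads):
    """Largest Euclidean norm over all prefix sums of centered gradients."""
    dim = len(grads[0])
    mean = _mean(grads)
    running = [0.0] * dim
    worst = 0.0
    for idx in order:
        running = [running[d] + grads[idx][d] - mean[d] for d in range(dim)]
        worst = max(worst, math.sqrt(sum(x * x for x in running)))
    return worst
```

On the toy input grads = [[1.0], [1.0], [-1.0], [-1.0]] with starting order [0, 1, 2, 3], one pass interleaves the positive and negative gradients, so the prefix sums of the new order stay closer to zero than those of the starting order. A single pass is not guaranteed to improve every input; the sketch only shows the mechanics of the reordering step.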

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

stochastic gradient descent, distributed training, convergence rate, kernel thinning, gradient balancing