Delay-Adaptive Distributed Stochastic Optimization

Zhaolin Ren; Zhengyuan Zhou; Linhai Qiu; Ajay Deshpande; Jayant Kalagnanam

2020 AAAI AAAI 2020

Delay-Adaptive Distributed Stochastic Optimization

Abstract

Abstract In large-scale optimization problems, distributed asynchronous stochastic gradient descent (DASGD) is a commonly used algorithm. In most applications, there are often a large number of computing nodes asynchronously computing gradient information. As such, the gradient information received at a given iteration is often stale. In the presence of such delays, which can be unbounded, the convergence of DASGD is uncertain. The contribution of this paper is twofold. First, we propose a delay-adaptive variant of DASGD where we adjust each iteration's step-size based on the size of the delay, and prove asymptotic convergence of the algorithm on variationally coherent stochastic problems, a class of functions which properly includes convex, quasi-convex and star-convex functions. Second, we extend the convergence results of standard DASGD, used usually for problems with bounded domains, to problems with unbounded domains. In this way, we extend the frontier of theoretical guarantees for distributed asynchronous optimization, and provide new insights for practitioners working on large-scale optimization problems.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — asynchronous computing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhaolin Ren , Zhengyuan Zhou , Linhai Qiu , Ajay Deshpande , Jayant Kalagnanam

Topics

Machine Learning > Optimization & Theory > Distributed Learning Mathematics & Optimization > Optimization > Stochastic Methods Machine Learning > Optimization & Theory > Stochastic Methods Mathematics & Optimization > Optimization > Distributed Optimization

Keywords

stochastic optimization stochastic gradient descent convex optimization distributed optimization asynchronous optimization asynchronous computing delay-adaptive step-size delay-adaptive optimization

Download PDF

Related papers

Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions 2020

CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning 2020

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention 2020

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy 2020

Multi-Point Semantic Representation for Intent Classification 2020