Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima

Brian Swenson; Ryan Murray; H. Vincent Poor; Soummya Kar

JMLR 2022

Abstract

Gradient descent (GD)-based algorithms are an indispensable tool for optimizing modern machine learning models. The paper considers distributed stochastic GD (D-SGD), a network-based variant of GD. Distributed algorithms play an important role in large-scale machine learning problems as well as in the Internet of Things (IoT) and related applications. The paper addresses two main questions. First, we study convergence of D-SGD to critical points when the loss function is nonconvex and nonsmooth. We consider a broad range of nonsmooth loss functions, including those of practical interest in modern deep learning. It is shown that, for each fixed initialization, D-SGD converges to critical points of the loss with probability one. Next, we consider the problem of avoiding saddle points. It is well known that classical GD avoids saddle points; however, analogous results have been absent for distributed variants of GD. For this problem, we again assume that loss functions may be nonconvex and nonsmooth, but are smooth in a neighborhood of a saddle point. It is shown that, for any fixed initialization, D-SGD avoids such saddle points with probability one. Results are proved by studying the underlying (distributed) gradient flow, using the ordinary differential equation (ODE) method of stochastic approximation.
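To make the setting concrete, here is a minimal Python/NumPy sketch of one common consensus + gradient form of D-SGD. In this form each agent n mixes its iterate with its neighbors' and takes a noisy step on its own local loss: x_n(k+1) = x_n(k) - beta_k * sum over neighbors m of (x_n(k) - x_m(k)) - alpha_k * (grad f_n(x_n(k)) + xi_n(k+1)). Everything specific below is an assumption for illustration and is not taken from the paper: the four-agent ring graph, the hypothetical local losses f_n (whose sum has a saddle at the origin and minima at (0, +1) and (0, -1)), the Gaussian noise xi, and the step-size schedules alpha_k and beta_k. All agents start exactly at the saddle point, so under these assumptions the run illustrates the saddle-avoidance behavior described above.

```python
import numpy as np

# Hypothetical heterogeneous linear terms (assumption). The rows sum to zero,
# so the network-wide loss sum_n f_n equals 4 * (x**2/2 + y**4/4 - y**2/2).
C = np.array([[0.5, 0.2], [-0.5, 0.1], [0.3, -0.4], [-0.3, 0.1]])


def local_grads(X, C):
    """Row n is the gradient of the hypothetical local loss
    f_n(x, y) = x**2/2 + y**4/4 - y**2/2 + C[n, 0]*x + C[n, 1]*y at X[n]."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x, y**3 - y]) + C


def ring_laplacian(N):
    """Graph Laplacian L = D - A of an undirected ring of N >= 3 agents."""
    A = np.zeros((N, N))
    for n in range(N):
        A[n, (n + 1) % N] = A[n, (n - 1) % N] = 1.0
    return np.diag(A.sum(axis=1)) - A


def d_sgd(C, num_iters=20000, noise_std=0.1, seed=0):
    """Consensus + gradient D-SGD sketch; graph, noise, and schedules are assumptions."""
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    L = ring_laplacian(N)
    X = np.zeros((N, 2))  # every agent starts exactly at the saddle (0, 0)
    for k in range(num_iters):
        alpha = 0.2 / (k + 1) ** 0.6  # gradient step: sum diverges, sum of squares converges
        beta = 0.2 / (k + 1) ** 0.3   # consensus weight decays more slowly than alpha
        noise = rng.normal(scale=noise_std, size=X.shape)  # zero-mean gradient noise
        # (L @ X)[n] equals the sum over neighbors m of (X[n] - X[m])
        X = X - beta * (L @ X) - alpha * (local_grads(X, C) + noise)
    return X


X = d_sgd(C)
print(X.mean(axis=0))  # network average; should be close to (0, +1) or (0, -1)
```

Here alpha_k decays faster than beta_k, so agreement among agents is enforced on a faster time scale than descent. This two-time-scale choice is one standard way to keep the agents' iterates close to their network average while that average follows the gradient flow of the summed loss; other schedules are possible.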

Keywords

stochastic gradient descent, nonconvex optimization, convergence analysis, distributed learning, distributed optimization, nonsmooth optimization, gradient flow, saddle point, critical point

Related papers

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping 2022

LinCDE: Conditional Density Estimation via Lindsey's Method 2022

Causal Classification: Treatment Effect Estimation vs. Outcome Prediction 2022

Provable Tensor-Train Format Tensor Completion by Riemannian Optimization 2022

Power Iteration for Tensor PCA 2022