Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Xiangru Lian; Yijun Huang; Yuncheng Li; Ji Liu

NIPS (NeurIPS) 2015

Abstract

Asynchronous parallel implementations of stochastic gradient (SG) have been widely used to train deep neural networks and have achieved many practical successes recently. However, existing theory cannot explain their convergence and speedup properties, mainly because of the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill this gap and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one over a computer network and the other on a shared memory system. We establish an ergodic convergence rate of $O(1/\sqrt{K})$ for both algorithms and prove that linear speedup is achievable if the number of workers is bounded by $\sqrt{K}$, where $K$ is the total number of iterations. Our results generalize and improve existing analyses for convex minimization.
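To make the setting concrete, the following is a minimal serial simulation of SG with stale gradients. It is a sketch under stated assumptions, not the paper's algorithms: the toy objective, the uniform random delay model, and the names `objective`, `gradient`, and `async_sg` are all hypothetical choices for illustration. Each update applies a noisy gradient evaluated at an iterate up to `max_delay` steps old, which loosely mimics a worker that read the parameters before other workers updated them. The function also reports the average squared gradient norm over the iterates, one common way to state an ergodic rate for nonconvex problems.

```python
import numpy as np


def objective(x):
    # Hypothetical smooth nonconvex toy objective: sum_i x_i^2 / (1 + x_i^2).
    return float(np.sum(x**2 / (1.0 + x**2)))


def gradient(x):
    # Exact gradient of the toy objective above.
    return 2.0 * x / (1.0 + x**2) ** 2


def async_sg(x0, num_iters, step, max_delay, noise_std, seed=0):
    """Serially simulate SG with bounded gradient staleness.

    Assumption: the delay at each step is drawn uniformly from
    {0, ..., min(k, max_delay)}; real systems have other delay patterns.
    """
    rng = np.random.default_rng(seed)
    history = [np.asarray(x0, dtype=float)]  # recent iterates, newest last
    sq_grad_norms = []
    for k in range(num_iters):
        x = history[-1]
        # Track the true squared gradient norm at the current iterate.
        sq_grad_norms.append(float(np.sum(gradient(x) ** 2)))
        # Pick a stale iterate, as if a slow worker had read it earlier.
        delay = int(rng.integers(0, min(k, max_delay) + 1))
        stale_x = history[-1 - delay]
        # Stochastic gradient: exact gradient plus Gaussian noise.
        noise = noise_std * rng.standard_normal(stale_x.shape)
        noisy_grad = gradient(stale_x) + noise
        # Apply the stale gradient to the current parameters.
        history.append(x - step * noisy_grad)
        # Keep only as many iterates as the staleness bound requires.
        history = history[-(max_delay + 1):]
    return history[-1], float(np.mean(sq_grad_norms))


x0 = np.array([2.0, -1.5])
x_final, ergodic_sq_grad = async_sg(
    x0, num_iters=2000, step=0.02, max_delay=4, noise_std=0.1, seed=0
)
print("final objective:", objective(x_final))
print("average squared gradient norm:", ergodic_sq_grad)
```

Setting `max_delay=0` recovers ordinary serial SG, so the two settings can be compared directly on the same seed. The step size here is deliberately small relative to the delay bound; larger steps combined with larger delays can destabilize the iteration.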

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Deep Learning > Optimization & Theory > Optimization
Deep Learning > Optimization & Theory > Stochastic Methods

Keywords

stochastic gradient descent, nonconvex optimization, convergence analysis, distributed learning, distributed optimization, asynchronous parallel optimization, asynchronous parallel computation
