Gradient Descent Finds Global Minima of Deep Neural Networks

Simon Du; Jason Lee; Haochuan Li; Liwei Wang; Xiyu Zhai

2019 ICML ICML 2019

Gradient Descent Finds Global Minima of Deep Neural Networks

Abstract

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐣 Hot Topic Early Bird — global convergence

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Simon Du , Jason Lee , Haochuan Li , Liwei Wang , Xiyu Zhai

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Theory Machine Learning > Learning Types > Deep Learning Deep Learning > Optimization & Theory > Optimization

Keywords

global convergence gradient descent gradient descent optimization deep neural network training dynamics residual network global minimum over-parameterized network

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019