Gradient Descent Can Take Exponential Time to Escape Saddle Points

Simon S Du; Chi Jin; Jason Lee; Aarti Singh; Barnabás Póczos; Michael I. Jordan

NIPS (NeurIPS) 2017

Abstract

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points—it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
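
The mechanism behind the contrast between GD and perturbed GD can be illustrated with a toy sketch. This is not the paper's construction and does not reproduce the exponential-time result: it assumes a hand-picked function f(x, y) = 0.5x^2 - 0.5y^2 + 0.25y^4 and starts GD exactly on the saddle's stable manifold (y = 0), a measure-zero initialization chosen only to make the effect visible. The function `run`, its parameters (`lr`, `radius`, `tol`, `seed`), and the simplified perturbation rule (a small uniform kick whenever the gradient norm is below `tol`) are all illustrative assumptions, not the algorithms of Ge et al. or Jin et al.

```python
import random


def grad(x, y):
    # Gradient of the assumed toy function
    # f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4,
    # which has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1).
    return x, y**3 - y


def run(perturb, steps=500, lr=0.1, radius=1e-3, tol=1e-3, seed=0):
    """Run (optionally perturbed) gradient descent on the toy function."""
    rng = random.Random(seed)
    x, y = 1.0, 0.0  # start on the stable manifold of the saddle (y = 0)
    for _ in range(steps):
        gx, gy = grad(x, y)
        if perturb and (gx * gx + gy * gy) ** 0.5 < tol:
            # Simplified perturbation: a small random kick whenever the
            # gradient is nearly zero (hypothetical rule for illustration).
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y


if __name__ == "__main__":
    print("plain GD ends at:    ", run(perturb=False))
    print("perturbed GD ends at:", run(perturb=True))
```

In this sketch plain GD keeps y at exactly 0.0 and converges to the saddle, while the perturbed variant leaves the saddle and settles near one of the two minima (staying within a small neighborhood of it, since the simplified rule keeps kicking wherever the gradient is small).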

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Continuous Optimization
Deep Learning > Optimization & Theory > Theory
Mathematics & Optimization > Optimization > Non-Convex Optimization

Keywords

non-convex optimization; gradient descent; convergence rate; saddle point; escape time
