On the Quality of the Initial Basin in Overspecified Neural Networks

Itay Safran; Ohad Shamir

2016 ICML ICML 2016

On the Quality of the Initial Basin in Overspecified Neural Networks

Abstract

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the \emphgeometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger (“overspecified”) networks, which accords with some recent empirical and theoretical observations.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🧭 Keyword Pioneer — overspecified networks

🐣 Hot Topic Early Bird — neural network optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Itay Safran , Ohad Shamir

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Theory Deep Learning > Architectures > Neural Networks Deep Learning > Optimization & Theory > Theory

Keywords

non-convex optimization neural network optimization overspecified networks initialization loss landscape global minimum relu network

Download PDF

Related papers

Associative Long Short-Term Memory 2016

Recycling Randomness with Structure for Sublinear time Kernel Expansions 2016

Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues 2016

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization 2016

Hawkes Processes with Stochastic Excitations 2016