Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Yuanzhi Li; Tengyu Ma; Hongyang Zhang

COLT 2018

Abstract

We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank-$r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it as $UU^\top$ with $U\in \mathbb R^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that, starting from a small initialization, gradient descent approximately recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations. The results resolve the conjecture of Gunasekar et al. (2017) under the restricted isometry property. With some technical modifications, the technique also applies to one-hidden-layer neural networks with quadratic activations.
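To make the setup concrete, here is a minimal NumPy sketch of the procedure the abstract describes; it is not the authors' code. It runs plain gradient descent on a full $d\times d$ factor $U$ (no rank constraint) from the small initialization $U_0=\alpha I$. Several pieces are illustrative assumptions rather than details taken from the paper: the helper names `sense`, `gd_matrix_sensing`, and `make_problem` are hypothetical; the loss is scaled as $\frac{1}{4m}\sum_i(\langle A_i, UU^\top\rangle - y_i)^2$ so that its gradient is $\big(\frac{1}{m}\sum_i r_i A_i\big)U$ for symmetric $A_i$; the sensing matrices are symmetrized Gaussians; and the sizes $d=30$, $r=2$, $m=400$, $\alpha=10^{-6}$, $\eta=0.1$, and 1000 steps were chosen by hand. Note that $m=400$ is below $d(d+1)/2=465$, the number of free entries of a symmetric $30\times 30$ matrix, so the measurements alone do not determine an arbitrary symmetric matrix.

```python
import numpy as np


def sense(A_flat, X):
    # <A_i, X> for all i at once; row i of A_flat is A_i flattened row-major.
    return A_flat @ X.ravel()


def gd_matrix_sensing(A_flat, y, d, alpha=1e-6, eta=0.1, steps=1000):
    # Gradient descent on f(U) = (1/(4m)) * sum_i (<A_i, U U^T> - y_i)^2
    # with an over-parameterized d x d factor U and small init U_0 = alpha * I.
    m = y.shape[0]
    U = alpha * np.eye(d)
    for _ in range(steps):
        residual = sense(A_flat, U @ U.T) - y        # shape (m,)
        # (1/m) * sum_i residual_i * A_i; symmetric because every A_i is.
        grad_X = (residual @ A_flat).reshape(d, d) / m
        U = U - eta * (grad_X @ U)                   # grad of f w.r.t. U
    return U


def make_problem(d=30, r=2, m=400, seed=0):
    # Hypothetical test instance: rank-r PSD ground truth, symmetric Gaussian A_i.
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((d, r)))  # d x r, orthonormal columns
    X_star = V @ V.T
    gauss = rng.standard_normal((m, d, d))
    A = (gauss + gauss.transpose(0, 2, 1)) / 2        # symmetrize each A_i
    A_flat = A.reshape(m, d * d)
    y = sense(A_flat, X_star)                         # noiseless measurements
    return A_flat, y, X_star


d = 30
A_flat, y, X_star = make_problem(d=d, r=2, m=400, seed=0)
U = gd_matrix_sensing(A_flat, y, d, alpha=1e-6, eta=0.1, steps=1000)
rel_err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative Frobenius error of U U^T: {rel_err:.2e}")
```

The same loop also covers a one-hidden-layer network with quadratic activations, under the added assumption of unit output weights: with hidden weights $u_j$ as the columns of $U$, its prediction on input $x$ is $\sum_j (u_j^\top x)^2 = x^\top UU^\top x = \langle xx^\top, UU^\top\rangle$. Training that network with the squared loss is therefore the sensing problem above with $A_i = x_i x_i^\top$. The abstract notes that analyzing this case needs technical modifications.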

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Neural Networks

Keywords

matrix factorization, gradient descent, implicit regularization, quadratic activation neural network, algorithmic regularization
