Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

Etienne Boursier; Loucas Pillaud-Vivien; Nicolas Flammarion

NeurIPS 2022

Abstract

The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden-layer ReLU neural networks for the mean squared error at small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, two notable phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle-to-saddle dynamics.
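The setting described in the abstract can be explored numerically with a minimal sketch, given below. It is an illustration under stated assumptions, not the authors' code: gradient flow is approximated by gradient descent with a small step size, the inputs are random orthonormal vectors, the labels alternate in sign, and the initialisation is balanced (each output weight has the same magnitude as the norm of its hidden weight vector). The function name `simulate_gradient_flow` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def simulate_gradient_flow(n=4, d=10, m=50, init_scale=1e-3,
                           lr=1e-2, steps=20000, seed=0):
    """Approximate gradient flow on a one-hidden-layer ReLU network.

    All names and default values are illustrative assumptions.
    Returns the mean squared error recorded at every step.
    """
    rng = np.random.default_rng(seed)

    # Orthonormal inputs: the rows of X are orthonormal (requires n <= d).
    Q, _ = np.linalg.qr(rng.standard_normal((d, n)))
    X = Q.T                                              # shape (n, d)

    # Assumed labels: alternating +1 / -1.
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)

    # Small, balanced initialisation (an assumption of this sketch):
    # |a_j| equals the norm of w_j, with a random sign.
    W = init_scale * rng.standard_normal((m, d))         # hidden weights
    signs = rng.choice([-1.0, 1.0], size=m)
    a = signs * np.linalg.norm(W, axis=1)                # output weights

    losses = np.empty(steps)
    for t in range(steps):
        pre = X @ W.T                                    # pre-activations, (n, m)
        hidden = np.maximum(pre, 0.0)                    # ReLU
        residual = hidden @ a - y                        # prediction error, (n,)
        losses[t] = 0.5 * np.mean(residual ** 2)

        # Gradients of the loss (1 / (2n)) * sum_k residual_k^2.
        grad_a = hidden.T @ residual / n
        active = (pre > 0).astype(float)
        grad_W = (residual[:, None] * active * a[None, :]).T @ X / n

        # Small explicit Euler step as a stand-in for continuous-time flow.
        a -= lr * grad_a
        W -= lr * grad_W

    return losses
```

Plotting the returned losses against the step index on a logarithmic time axis is one way to look for plateaus separated by sharp drops, which is the kind of behaviour the saddle-to-saddle description refers to. Whether the plateaus are clearly visible depends on `init_scale` and on the labels chosen.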

Authors

Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion

Topics

Machine Learning > Core Methods > Regression
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Theory
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Theory

Keywords

mean squared error, gradient flow, implicit bias, ReLU network, optimization dynamics, saddle-to-saddle dynamics, gradient flow dynamics

Related papers

Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching (2022)

A Theoretical View on Sparsely Activated Networks (2022)

Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks (2022)

Matryoshka Representation Learning (2022)

Off-Policy Evaluation with Deficient Support Using Side Information (2022)