One-pass Stochastic Gradient Descent in overparametrized two-layer neural networks

Hanjing Zhu; Jiaming Xu

AISTATS 2021

Abstract

There has been a recent surge of interest in understanding the convergence of gradient descent (GD) and stochastic gradient descent (SGD) in overparameterized neural networks. Most previous work assumes that the training data is provided a priori in a batch, while less attention has been paid to the important setting where the training data arrives in a stream. In this paper, we study the streaming data setup and show that, with overparameterization and random initialization, the prediction error of two-layer neural networks under one-pass SGD converges in expectation. The convergence rate depends on the eigen-decomposition of the integral operator associated with the so-called neural tangent kernel (NTK). A key step of our analysis is to show, using the VC dimension and McDiarmid’s inequality, that a random kernel function converges to the NTK with high probability.
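To make the streaming setup concrete, here is a minimal NumPy sketch of one-pass SGD on a two-layer ReLU network. The abstract does not state the parameterization, loss, step size, or data distribution, so everything below is an assumption chosen for illustration: the common NTK-style scaling f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) with Gaussian hidden weights and fixed random output signs, squared loss, inputs drawn uniformly from the unit sphere, and a noiseless linear target. The function names are hypothetical and not taken from the paper.

```python
import numpy as np


def init_network(m, d, rng):
    """Random initialization (assumed): Gaussian hidden weights, fixed +/-1 output signs."""
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a


def predict(W, a, X):
    """f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x), evaluated on each row of X."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)


def one_pass_sgd(stream, W, a, lr):
    """Train only the hidden layer; each (x, y) from the stream is used exactly once."""
    m = W.shape[0]
    W = W.copy()
    for x, y in stream:
        pre = W @ x                                      # pre-activations, shape (m,)
        residual = a @ np.maximum(pre, 0.0) / np.sqrt(m) - y
        grad_f = np.outer(a * (pre > 0), x) / np.sqrt(m)  # gradient of f(x) w.r.t. W
        W -= lr * residual * grad_f                      # step on 0.5 * (f(x) - y)^2
    return W


def sample_stream(n, d, rng, target):
    """Yield n fresh samples with x uniform on the unit sphere (an assumed distribution)."""
    for _ in range(n):
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)
        yield x, target(x)


def target(x):
    """Hypothetical noiseless target: the first coordinate of x."""
    return x[0]


rng = np.random.default_rng(0)
m, d = 256, 5                                            # width m much larger than d
W0, a = init_network(m, d, rng)

# Held-out points to estimate the prediction error before and after training.
X_test = rng.standard_normal((500, d))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_test = X_test[:, 0]

err_before = np.mean((predict(W0, a, X_test) - y_test) ** 2)
W1 = one_pass_sgd(sample_stream(3000, d, rng, target), W0, a, lr=0.5)
err_after = np.mean((predict(W1, a, X_test) - y_test) ** 2)
print(err_before, err_after)
```

Because every sample is drawn fresh and then discarded, the loop never revisits data, which is what distinguishes this setting from batch GD or multi-epoch SGD. The held-out error printed at the end is only an empirical proxy for the expected prediction error studied in the paper.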

Authors

Hanjing Zhu, Jiaming Xu

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Theory
Machine Learning > Learning Paradigms > Transfer Learning
Machine Learning > Optimization & Theory > Stochastic Methods
Deep Learning > Learning Types > Representation Learning

Keywords

representation learning, neural tangent kernel, stochastic gradient descent, feature learning, convergence analysis, streaming data, overparameterized neural network

Related papers

Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions (2021)

Semi-Supervised Learning with Meta-Gradient (2021)

Accelerating Metropolis-Hastings with Lightweight Inference Compilation (2021)

When MAML Can Adapt Fast and How to Assist When It Cannot (2021)

On the convergence of the Metropolis algorithm with fixed-order updates for multivariate binary probability distributions (2021)