Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Tianyang Hu; Wenjia Wang; Cong Lin; Guang Cheng

AISTATS 2021

Abstract

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the L2 estimation error as a function of the GD iteration; this bound stays away from zero unless the early-stopping time is chosen delicately. In turn, through a comprehensive analysis of L2-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network trained with L2 regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of the L2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the L2 regularization approach improves training robustness and works for a wider range of neural networks.
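The kernel ridge regression that the abstract compares against can be illustrated with a small NumPy sketch. This is not the paper's code or experimental setup. It assumes the well-known closed-form neural tangent kernel of a bias-free one-hidden-layer ReLU network in which only the first-layer weights are trained and the inputs have unit norm; the kernel analyzed in the paper may differ. The target function, noise level, sample size, and ridge values are hypothetical choices made only for illustration.

```python
import numpy as np

def relu_ntk(X1, X2):
    """Closed-form NTK of a bias-free one-hidden-layer ReLU network.

    Assumption (not taken from the paper): only first-layer weights are
    trained and every row of X1 and X2 has unit Euclidean norm.
    """
    G = np.clip(X1 @ X2.T, -1.0, 1.0)  # inner products, clipped for arccos
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

def krr_predict(X_train, y_train, X_test, ridge):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y, then predict."""
    K = relu_ntk(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(y_train)), y_train)
    return relu_ntk(X_test, X_train) @ alpha

def on_circle(theta):
    """Map angles to unit-norm points in the plane."""
    return np.column_stack([np.cos(theta), np.sin(theta)])

# Hypothetical toy problem: noisy samples of sin(2 * theta) on the unit circle.
rng = np.random.default_rng(0)
n, noise_sd = 200, 0.5
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = on_circle(theta)
y = np.sin(2.0 * theta) + noise_sd * rng.standard_normal(n)

theta_test = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
X_test = on_circle(theta_test)
f_test = np.sin(2.0 * theta_test)  # noise-free target used for evaluation

# A tiny ridge nearly interpolates the noisy labels; a moderate ridge smooths them.
mse = {}
for ridge in (1e-8, 1.0):
    pred = krr_predict(X, y, X_test, ridge)
    mse[ridge] = float(np.mean((pred - f_test) ** 2))
    print(f"ridge={ridge:g}  test MSE against the true function = {mse[ridge]:.4f}")
```

The near-zero ridge stands in for fitting the noisy labels almost exactly, while the larger ridge plays the role of the L2 penalty. The error is measured against the noise-free target, which matches the abstract's notion of recovering the true function rather than predicting noisy labels.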

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Optimization & Theory > Theory
Deep Learning > Optimization & Theory > Theory

Keywords

neural tangent kernel, gradient descent, L2 regularization, kernel ridge regression, early stopping, minimax optimal rate, overparameterized neural network, overparametrized neural network
