On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Peizhong Ju; Xiaojun Lin; Ness Shroff

2021 ICML ICML 2021

On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Abstract

In this paper, we study the generalization performance of min $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

Authors

Peizhong Ju , Xiaojun Lin , Ness Shroff

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Neural Network Optimization Deep Learning > Optimization & Theory > Theory

Keywords

neural tangent kernel double descent gradient descent generalization error relu activation two-layer neural network

Download PDF

Related papers

GRAND: Graph Neural Diffusion 2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits 2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution 2021

Dataset Dynamics via Gradient Flows in Probability Space 2021