The effect of Leaky ReLUs on the training and generalization of overparameterized networks

Yinglong Guo; Shaohan Li; Gilad Lerman

AISTATS 2024

Abstract

We investigate the training and generalization errors of overparameterized neural networks (NNs) with a wide class of leaky rectified linear unit (ReLU) functions. More specifically, we carefully upper bound both the convergence rate of the training error and the generalization error of such NNs and investigate the dependence of these bounds on the Leaky ReLU parameter, $\alpha$. We show that $\alpha =-1$, which corresponds to the absolute value activation function, is optimal for the training error bound. Furthermore, in special settings, it is also optimal for the generalization error bound. Numerical experiments empirically support the practical choices guided by the theory.
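The correspondence between $\alpha = -1$ and the absolute value function can be checked directly. The following is a minimal sketch, assuming the common piecewise convention $\sigma_\alpha(x) = x$ for $x \ge 0$ and $\sigma_\alpha(x) = \alpha x$ for $x < 0$; the paper's exact parameterization may differ, and the function name `leaky_relu` and the sample values are illustrative only.

```python
def leaky_relu(x: float, alpha: float) -> float:
    """Leaky ReLU under an assumed piecewise convention:
    identity for x >= 0, slope alpha for x < 0."""
    return x if x >= 0 else alpha * x


# Illustrative inputs on both sides of zero (not taken from the paper).
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]

# alpha = 0 gives the standard ReLU.
relu_like = [leaky_relu(x, 0.0) for x in samples]

# alpha = -1 flips the sign of negative inputs, which is the absolute value.
abs_like = [leaky_relu(x, -1.0) for x in samples]

print(relu_like)
print(abs_like)
```

Under this convention, negative values of $\alpha$ make the activation non-monotone, which is why $\alpha = -1$ coincides with $|x|$ rather than with a conventional "leaky" slope.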

Authors

Yinglong Guo, Shaohan Li, Gilad Lerman

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization

Deep Learning > Architectures > Neural Networks

Deep Learning > Techniques > Model Architecture

Deep Learning > Optimization & Theory > Theory

Keywords

neural network optimization, neural network theory, generalization error, generalization bound, convergence rate, activation function, overparameterized network, overparameterized neural network, leaky relu, training error

Related papers

Causal Bandits with General Causal Models and Interventions (2024)

Boundary-Aware Uncertainty for Feature Attribution Explainers (2024)

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective (2024)

A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning (2024)

Pure Exploration in Bandits with Linear Constraints (2024)