Optimal Bump Functions for Shallow ReLU networks: Weight Decay, Depth Separation, Curse of Dimensionality

Stephan Wojtowytsch

JMLR 2024

Abstract

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose average parameters and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the average weight variable grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality. © JMLR 2024.
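To make the setting in the abstract concrete, here is a minimal sketch of a finite-width network with a single hidden layer and ReLU activation, together with a weight decay penalty. It is an illustration in dimension $d = 1$ only, not the minimizer constructed in the paper. The function names `shallow_relu` and `weight_decay` are hypothetical, and the penalty form (half the sum of squared inner and outer weights, biases excluded) is an assumption rather than the paper's exact regularizer. The example uses the standard one-dimensional hat function, which takes the value 1 at the origin and 0 outside the unit interval.

```python
import numpy as np

def shallow_relu(x, a, w, b):
    """Evaluate f(x) = sum_i a_i * relu(w_i . x + b_i) for a batch x of shape (n, d)."""
    pre = x @ w.T + b                  # pre-activations, shape (n, m)
    return np.maximum(pre, 0.0) @ a    # outputs, shape (n,)

def weight_decay(a, w):
    """Assumed penalty: half the squared norm of outer and inner weights (biases excluded)."""
    return 0.5 * (np.sum(a ** 2) + np.sum(w ** 2))

# Hat function in d = 1: relu(x + 1) - 2 * relu(x) + relu(x - 1).
a = np.array([1.0, -2.0, 1.0])         # outer weights
w = np.array([[1.0], [1.0], [1.0]])    # inner weights, shape (m, d)
b = np.array([1.0, 0.0, -1.0])         # biases

# Test points: the origin, the boundary, outside the unit interval, and one interior point.
x = np.array([[0.0], [1.0], [-1.0], [2.5], [-3.0], [0.5]])
values = shallow_relu(x, a, w, b)
penalty = weight_decay(a, w)

print(values)   # label 1 at the origin, 0 at and beyond |x| = 1
print(penalty)
```

In one dimension three neurons suffice to meet the label constraints exactly. The abstract's statements concern how the cost of such a bump scales as the dimension $d$ grows, which this one-dimensional sketch does not capture.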

Authors

Stephan Wojtowytsch

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Neural Networks

Keywords

curse of dimensionality, weight decay regularization, ReLU network, depth separation, bump function

Related papers

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks 2024

Convergence for nonconvex ADMM, with applications to CT imaging 2024

Functional Directed Acyclic Graphs 2024

Sum-of-norms clustering does not separate nearby balls 2024

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning 2024