Random Feature Amplification: Feature Learning and Generalization in Neural Networks

Spencer Frei; Niladri S. Chatterji; Peter L. Bartlett

JMLR 2023

Abstract

In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics "amplify" these weak, random features to strong, useful features.
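The setting in the abstract can be illustrated with a small NumPy sketch. It is not the paper's construction or experiment. The cluster means, dimensions, initialization scale, step size, and random (rather than adversarial) label flips are all assumptions chosen for illustration, as is the choice to fix the second-layer weights and train only the first layer. The helper names (`make_xor_data`, `forward`, `logistic_loss`, `gd_step`) are hypothetical.

```python
import numpy as np

def make_xor_data(n, d, noise_rate, rng, scale=3.0):
    """XOR-like clusters: +/-mu1 carry label +1, +/-mu2 carry label -1 (assumed layout)."""
    mu1 = np.zeros(d); mu1[0] = scale
    mu2 = np.zeros(d); mu2[1] = scale
    means = np.stack([mu1, -mu1, mu2, -mu2])
    cluster = rng.integers(0, 4, size=n)
    x = means[cluster] + rng.standard_normal((n, d))
    y_clean = np.where(cluster < 2, 1.0, -1.0)
    # Flip a fixed fraction of labels at random (a stand-in for adversarial corruption).
    y = y_clean.copy()
    flip = rng.choice(n, size=int(noise_rate * n), replace=False)
    y[flip] *= -1.0
    return x, y, y_clean

def forward(w, a, x):
    """Two-layer ReLU network f(x) = sum_j a_j * relu(<w_j, x>)."""
    return np.maximum(x @ w.T, 0.0) @ a

def logistic_loss(w, a, x, y):
    """Mean of log(1 + exp(-y * f(x)))."""
    return np.mean(np.logaddexp(0.0, -y * forward(w, a, x)))

def gd_step(w, a, x, y, lr):
    """One full-batch gradient descent step on the first-layer weights only."""
    pre = x @ w.T                                   # pre-activations, shape (n, m)
    f = np.maximum(pre, 0.0) @ a                    # network outputs, shape (n,)
    g = -y / (1.0 + np.exp(y * f))                  # d(loss_i)/d(f_i), shape (n,)
    grad = ((g[:, None] * (pre > 0)) * a[None, :]).T @ x / len(y)  # shape (m, d)
    return w - lr * grad

rng = np.random.default_rng(0)
n, d, m = 400, 20, 64                               # samples, input dim, neurons (assumed)
x, y, y_clean = make_xor_data(n, d, noise_rate=0.1, rng=rng)

w = 0.01 * rng.standard_normal((m, d))              # small random initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed second layer (assumption)

loss_init = logistic_loss(w, a, x, y)
for _ in range(500):
    w = gd_step(w, a, x, y, lr=0.1)
loss_final = logistic_loss(w, a, x, y)

# Evaluate on fresh data with clean labels.
x_test, _, y_test = make_xor_data(2000, d, noise_rate=0.0, rng=rng)
test_acc = np.mean(np.sign(forward(w, a, x_test)) == y_test)
print(f"train loss {loss_init:.3f} -> {loss_final:.3f}, clean test accuracy {test_acc:.3f}")
```

Because the two positive clusters sit at opposite points, as do the two negative ones, each class has mean zero and no linear classifier can separate them. The small initialization makes every neuron start out only weakly aligned with the cluster directions, which is the regime the abstract describes.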

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Optimization & Theory > Learning Theory

Deep Learning > Architectures > Neural Networks

Keywords

feature learning, binary classification, gradient descent, ReLU network, random feature amplification, XOR function

Related papers

Flexible Model Aggregation for Quantile Regression 2023

Efficient Computation of Rankings from Pairwise Comparisons 2023

Efficient Structure-preserving Support Tensor Train Machine 2023

Attacks against Federated Learning Defense Systems and their Mitigation 2023

How Do You Want Your Greedy: Simultaneous or Repeated? 2023