Towards Understanding Learning in Neural Networks with Linear Teachers

Roei Sarussi; Alon Brutzkus; Amir Globerson

ICML 2021


Abstract

Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if network weights converge to two weight clusters, this will imply an approximately linear decision boundary. Finally, we show a condition on the optimization that leads to weight clustering. We provide empirical results that validate our theoretical analysis.
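The link between weight clustering and a linear decision boundary can be illustrated with a minimal sketch. It assumes a two-layer Leaky ReLU network of the form v * (sum_i sigma(w_i . x) - sum_j sigma(u_j . x)) with a fixed second layer; that form, the names `leaky_relu` and `clustered_network`, and the values of `k`, `v`, `alpha`, and `d` are illustrative assumptions, not details taken from the abstract. In the idealized case where every positive-output neuron equals one vector w and every negative-output neuron equals one vector u, the network's sign should match that of the linear classifier (w - u) . x, because Leaky ReLU with a positive slope is strictly increasing.

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    # Leaky ReLU with slope alpha > 0 on the negative side (strictly increasing).
    return np.where(z >= 0, z, alpha * z)

def clustered_network(X, w, u, k=5, v=1.0, alpha=0.1):
    # Hypothetical two-layer network in the perfectly clustered case:
    # k hidden neurons with output weight +v all equal w,
    # k hidden neurons with output weight -v all equal u.
    W = np.tile(w, (k, 1))  # shape (k, d)
    U = np.tile(u, (k, 1))  # shape (k, d)
    pos = leaky_relu(X @ W.T, alpha).sum(axis=1)
    neg = leaky_relu(X @ U.T, alpha).sum(axis=1)
    return v * (pos - neg)

rng = np.random.default_rng(0)
d = 10                           # input dimension (assumed for illustration)
w = rng.normal(size=d)           # cluster center for positive neurons
u = rng.normal(size=d)           # cluster center for negative neurons
X = rng.normal(size=(1000, d))   # random test points

net_sign = np.sign(clustered_network(X, w, u))
lin_sign = np.sign(X @ (w - u))  # linear classifier with weight vector w - u

# Fraction of points where the network and the linear classifier agree.
agreement = float(np.mean(net_sign == lin_sign))
print(agreement)
```

This covers only exact clustering. The abstract's claim concerns weights that converge to two clusters, where the boundary is approximately rather than exactly linear.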


Authors

Roei Sarussi, Alon Brutzkus, Amir Globerson

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Learning Theory
Deep Learning > Architectures > Neural Networks
Machine Learning > Learning Types > Supervised Learning
Deep Learning > Optimization & Theory > Theory

Keywords

stochastic gradient descent, neural network training, linear separability, linear teacher, weight clustering, linearly separable data, neural network

