Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

Eran Malach; Pritish Kamath; Emmanuel Abbe; Nathan Srebro

ICML 2021

Abstract

We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.
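For readers unfamiliar with the term, the tangent kernel of a differentiable model f(x; θ) at parameters θ_0 is the inner product of parameter gradients, K(x, x') = ⟨∇_θ f(x; θ_0), ∇_θ f(x'; θ_0)⟩. The following is a minimal sketch of that definition, not code from the paper. The two-layer tanh network, the function names `tangent_features` and `tangent_kernel`, and all sizes are illustrative assumptions.

```python
import numpy as np

def tangent_features(x, W, v):
    """Gradient of f(x; W, v) = v . tanh(W x) with respect to all parameters.

    The toy architecture is an assumption made for illustration only.
    """
    h = np.tanh(W @ x)                       # hidden activations
    grad_v = h                               # df/dv
    grad_W = np.outer(v * (1.0 - h**2), x)   # df/dW via the chain rule
    return np.concatenate([grad_W.ravel(), grad_v])

def tangent_kernel(X, W, v):
    """Gram matrix K[i, j] = <grad f(x_i), grad f(x_j)> at fixed (W, v)."""
    Phi = np.stack([tangent_features(x, W, v) for x in X])
    return Phi @ Phi.T

# Hypothetical sizes and a random initialization.
rng = np.random.default_rng(0)
d, width, n = 5, 16, 8
W = rng.normal(size=(width, d)) / np.sqrt(d)
v = rng.normal(size=width) / np.sqrt(width)
X = rng.normal(size=(n, d))

K = tangent_kernel(X, W, v)   # (n, n) symmetric positive semidefinite matrix
```

A kernel method using `K` fits a predictor that is linear in the fixed gradient features, whereas gradient descent on the network itself lets those features change during training. The abstract's comparison is between these two settings.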

Topics

Machine Learning > Optimization & Theory > Learning Theory

Machine Learning > Optimization & Theory > Optimization

Deep Learning > Techniques > Model Architecture

Deep Learning > Optimization & Theory > Theory

Deep Learning > Learning Types > Representation Learning

Keywords

representation learning, neural network optimization, differentiable learning, tangent kernels, gradient descent, kernel methods, neural network

Related papers

GRAND: Graph Neural Diffusion 2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits 2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution 2021

Dataset Dynamics via Gradient Flows in Probability Space 2021