Learning Neural Network Subspaces

Mitchell Wortsman; Maxwell C Horton; Carlos Guestrin; Ali Farhadi; Mohammad Rastegari

ICML 2021

Abstract

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method and in a single training run. At a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
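
The idea of learning a line of models in a single run can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a logistic-regression model in NumPy instead of a deep network, it leaves out any regularization the full method may use, and the names `make_data`, `train_line`, and `accuracy` are hypothetical. At each step it samples a point on the segment between two endpoint weight vectors, evaluates the loss there, and passes the gradient back to both endpoints.

```python
import numpy as np


def make_data(n=200, seed=1):
    """Toy, linearly separable 2-D binary classification data (assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    return X, y


def train_line(X, y, steps=2000, lr=0.5, seed=0):
    """Train two endpoints so that points on the segment between them fit the data."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w1 = rng.normal(scale=0.1, size=d)  # first endpoint of the line
    w2 = rng.normal(scale=0.1, size=d)  # second endpoint of the line
    for _ in range(steps):
        alpha = rng.uniform()                 # random position on the line
        w = (1 - alpha) * w1 + alpha * w2     # weights at that position
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # logistic predictions
        grad = X.T @ (p - y) / len(y)         # gradient of the mean log loss w.r.t. w
        # Chain rule: dw/dw1 = (1 - alpha) and dw/dw2 = alpha.
        w1 -= lr * (1 - alpha) * grad
        w2 -= lr * alpha * grad
    return w1, w2


def accuracy(w, X, y):
    """Fraction of examples classified correctly by the linear model w."""
    return float(np.mean((X @ w > 0) == (y == 1)))


if __name__ == "__main__":
    X, y = make_data()
    w1, w2 = train_line(X, y)
    midpoint = 0.5 * (w1 + w2)  # the subspace midpoint mentioned in the abstract
    for name, w in [("endpoint 1", w1), ("midpoint", midpoint), ("endpoint 2", w2)]:
        print(name, accuracy(w, X, y))
```

In this toy setting the endpoints and the midpoint all fit the data; the sketch only shows the mechanics of training a line and says nothing about the diversity or calibration gains reported for neural networks.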

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Neural Networks
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Learning Types > Representation Learning
Deep Learning > Learning Types > Ensemble Learning

Keywords

ensemble learning, subspace learning, model calibration, neural network optimization, label noise, stochastic weight averaging, weight averaging, neural network ensemble, neural network subspace
