On the Predictability of Pruning Across Scales

Jonathan S Rosenfeld; Jonathan Frankle; Michael Carbin; Nir Shavit

ICML 2021

Abstract

We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
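For readers unfamiliar with the pruning method the abstract refers to, the following is a minimal sketch of unstructured iterative magnitude pruning on a single weight array. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 20% per-round pruning fraction, and the random weights are hypothetical, and the retraining that a real pipeline performs between pruning rounds is omitted.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Deactivate `fraction` of the still-active weights with the smallest magnitude."""
    active = np.flatnonzero(mask)              # flat indices of surviving weights
    n_prune = int(fraction * active.size)      # how many to remove this round
    if n_prune == 0:
        return mask.copy()
    magnitudes = np.abs(weights.ravel()[active])
    # Flat indices of the n_prune smallest-magnitude surviving weights.
    drop = active[np.argsort(magnitudes, kind="stable")[:n_prune]]
    new_mask = mask.copy().ravel()
    new_mask[drop] = False
    return new_mask.reshape(mask.shape)

def iterative_prune(weights, rounds, fraction=0.2):
    """Apply magnitude pruning repeatedly and record the density after each round."""
    mask = np.ones(weights.shape, dtype=bool)
    densities = [1.0]
    for _ in range(rounds):
        # Assumption: a real pipeline would retrain the surviving weights here
        # before pruning again; this sketch keeps the weights fixed.
        mask = magnitude_prune(weights, mask, fraction)
        densities.append(float(mask.mean()))
    return mask, densities

# Hypothetical example: random weights standing in for a trained layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
mask, densities = iterative_prune(w, rounds=3)
```

Because each round removes a fixed fraction of the remaining weights, density shrinks geometrically: three rounds at 20% leave densities of 0.8, 0.64, and 0.512. This is the "density" axis along which the abstract says error is predictable.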

Topics

Artificial Intelligence > Core AI > Model Compression
Machine Learning > Optimization & Theory > Learning Theory
Deep Learning > Techniques > Model Architecture
Machine Learning > Core Methods > Model Compression
Deep Learning > Optimization & Theory > Optimization
Deep Learning > Optimization & Theory > Model Compression

Keywords

model compression, network pruning, neural network optimization, scaling law, model generalization, unstructured pruning, network compression, magnitude pruning
