SWALP: Stochastic Weight Averaging in Low Precision Training

Guandao Yang; Tianyi Zhang; Polina Kirichenko; Junwen Bai; Andrew Gordon Wilson; Chris De Sa

ICML 2019

Abstract

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.
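The averaging idea described in the abstract can be illustrated on a one-dimensional noisy quadratic. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `quantize` and `swalp_quadratic`, the fixed-point format (8-bit words with 4 fractional bits), stochastic rounding as the quantizer, the constant learning rate, and all default values are hypothetical choices made for illustration. Only the weight (the gradient accumulator) is quantized here, and the running average is kept in full precision.

```python
import math
import random


def quantize(x, frac_bits, word_bits, rng):
    """Stochastically round x onto a signed fixed-point grid and clip to its range."""
    step = 2.0 ** -frac_bits
    hi = (2 ** (word_bits - 1) - 1) * step
    lo = -(2 ** (word_bits - 1)) * step
    scaled = x / step
    floor = math.floor(scaled)
    # Round up with probability equal to the fractional part, so E[Q(x)] = x
    # whenever x lies inside the representable range.
    rounded = floor + (1 if rng.random() < scaled - floor else 0)
    return min(hi, max(lo, rounded * step))


def swalp_quadratic(w_star=0.3, w0=2.0, lr=0.1, warmup=200, steps=20000,
                    frac_bits=4, word_bits=8, noise=0.1, seed=0):
    """Low-precision SGD on f(w) = 0.5 * (w - w_star)^2 with iterate averaging.

    Returns (last low-precision iterate, average of post-warm-up iterates).
    """
    rng = random.Random(seed)
    w = quantize(w0, frac_bits, word_bits, rng)
    w_avg, n_avg = 0.0, 0
    for t in range(warmup + steps):
        # Noisy gradient of the quadratic objective (assumed Gaussian noise).
        grad = (w - w_star) + rng.gauss(0.0, noise)
        # The updated weight is stored in low precision.
        w = quantize(w - lr * grad, frac_bits, word_bits, rng)
        if t >= warmup:
            # Running mean of the low-precision iterates, kept in full precision.
            n_avg += 1
            w_avg += (w - w_avg) / n_avg
    return w, w_avg


if __name__ == "__main__":
    w_last, w_avg = swalp_quadratic()
    print("last iterate:", w_last, "averaged iterate:", w_avg)
```

In this sketch the last iterate always sits on the quantization grid (multiples of 1/16), so it cannot get closer to the assumed optimum 0.3 than the nearest grid point, while the average of many iterates is not restricted to the grid and can land between grid points.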

Topics

Machine Learning > Optimization & Theory > Optimization

Machine Learning > Application Areas > Efficient Computing

Keywords

stochastic weight averaging, low precision training, gradient accumulator, SGD iterate
