A Bregman Learning Framework for Sparse Neural Networks

Leon Bungert; Tim Roith; Daniel Tenbrinck; Martin Bürger

JMLR 2022

Abstract

We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, which is a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training, the proposed family of algorithms constitutes a regrowth strategy for neural networks that is solely optimization-based without additional heuristics. Our Bregman learning framework starts the training with very few initial parameters, successively adding only significant ones to obtain a sparse and expressive network. The proposed approach is extremely easy and efficient, yet supported by the rich mathematical theory of inverse scale space methods. We derive a statistically profound sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay and additional convergence proofs in the convex regime. Using only $3.4\%$ of the parameters of ResNet-18 we achieve $90.2\%$ test accuracy on CIFAR-10, compared to $93.6\%$ using the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework also has a huge potential for integrating sparse backpropagation and resource-friendly training. Code is available at https://github.com/TimRoith/BregmanLearning.
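To make the inverse scale space idea concrete, the following is a minimal sketch of a linearized Bregman update of the kind the abstract calls LinBreg. It assumes an $\ell_1$ regularizer $J(\theta) = \lambda\|\theta\|_1$ combined with a quadratic term weighted by $1/(2\delta)$, so that the update reduces to a gradient step on a dual variable `v` followed by soft thresholding. The names `soft_threshold`, `linbreg_step`, `tau`, `lam`, and `delta`, as well as the toy least-squares problem, are illustrative assumptions and are not taken from the authors' repository.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise shrinkage: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def linbreg_step(v, grad, tau, lam, delta=1.0):
    """One linearized Bregman step, assuming J(theta) = lam * ||theta||_1.

    v     : dual (subgradient-like) variable, same shape as the parameters
    grad  : (stochastic) gradient of the loss at the current parameters
    tau   : step size
    delta : weight of the quadratic term in the assumed elastic-net regularizer
    """
    v_new = v - tau * grad                          # gradient step on the dual variable
    theta_new = delta * soft_threshold(v_new, lam)  # map back to sparse parameters
    return v_new, theta_new

# Toy sparse least-squares problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
theta_true = np.zeros(20)
theta_true[[2, 7]] = [1.5, -2.0]
b = A @ theta_true

v = np.zeros(20)       # starting from v = 0 gives theta = 0: no active parameters
theta = np.zeros(20)
tau = 1.0 / np.linalg.norm(A, 2) ** 2
active_counts = []
for _ in range(200):
    grad = A.T @ (A @ theta - b)
    v, theta = linbreg_step(v, grad, tau, lam=1.0)
    active_counts.append(int(np.count_nonzero(theta)))
```

In this sketch a parameter becomes nonzero only once the accumulated gradient in `v` exceeds the threshold `lam` in magnitude, which is one way to read the abstract's description of starting with very few parameters and successively adding significant ones. Tracking `active_counts` shows how many parameters are active at each iteration.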

Topics

Machine Learning > Optimization & Theory > Optimization; Deep Learning > Architectures > Neural Networks; Deep Learning > Techniques > Model Architecture

Keywords

stochastic optimization, network pruning, parameter efficient, mirror descent, sparse neural network, inverse scale space
