ResIST: Layer-wise decomposition of ResNets for distributed training

Chen Dun; Cameron R. Wolfe; Christopher M. Jermaine; Anastasios Kyrillidis

UAI 2022

Abstract

We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats until convergence. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training. In comparison to common protocols, like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in communication and compute requirements, while being competitive with respect to model performance.
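The round structure described in the abstract can be illustrated with a toy, single-process sketch. It is not the authors' implementation. Each residual block is stood in for by a single scalar "weight", `local_train` is a placeholder for real local SGD, and all function names are hypothetical. The sketch also assumes that each block is assigned to exactly one worker per round and that aggregation simply averages whatever copies come back, which the abstract does not specify.

```python
import random

def partition_blocks(num_blocks, num_workers, rng):
    """Randomly assign each block index to exactly one worker (assumed scheme)."""
    order = list(range(num_blocks))
    rng.shuffle(order)
    # Round-robin over the shuffled order; sorting keeps blocks in depth order.
    return [sorted(order[w::num_workers]) for w in range(num_workers)]

def local_train(sub_params, local_iters, lr=0.1):
    """Placeholder for local SGD: minimizes 0.5 * (w - 1)^2 for each scalar weight."""
    trained = dict(sub_params)
    for _ in range(local_iters):
        for block in trained:
            trained[block] -= lr * (trained[block] - 1.0)
    return trained

def resist_round(global_params, num_workers, local_iters, rng):
    """One round: decompose, train sub-networks independently, aggregate."""
    assignment = partition_blocks(len(global_params), num_workers, rng)
    returned = {}
    for blocks in assignment:
        # Only this worker's blocks are "communicated"; it never sees the full model.
        sub_params = {b: global_params[b] for b in blocks}
        for b, w in local_train(sub_params, local_iters).items():
            returned.setdefault(b, []).append(w)
    # Aggregate: average the copies of each block that came back (assumed rule).
    new_params = list(global_params)
    for b, copies in returned.items():
        new_params[b] = sum(copies) / len(copies)
    return new_params, assignment

rng = random.Random(0)
params = [0.0] * 8                      # toy "global ResNet" with 8 blocks
for _ in range(3):                      # new random sub-networks every round
    params, assignment = resist_round(params, num_workers=4, local_iters=5, rng=rng)
```

In this sketch each of the 4 workers holds 2 of the 8 blocks per round, so per-round communication and memory per worker are a fraction of the full model, which is the property the abstract emphasizes.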

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Neural Networks

Keywords

distributed training, residual network, data-parallel training, local SGD, layer-wise decomposition
