meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Xu Sun; Xuancheng Ren; Shuming Ma; Houfeng Wang

2017 ICML ICML 2017

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Abstract

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1–4\% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🧭 Keyword Pioneer — gradient sparsification

🐣 Hot Topic Early Bird — computational efficiency

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xu Sun , Xuancheng Ren , Shuming Ma , Houfeng Wang

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Application Areas > Efficient Computing Machine Learning > Application Areas > Model Compression Deep Learning > Optimization & Theory > Neural Network Optimization

Keywords

model compression deep learning neural network optimization computational efficiency gradient sparsification neural network back propagation

Download PDF

Related papers

Bottleneck Conditional Density Estimation 2017

Constrained Policy Optimization 2017

Near-Optimal Design of Experiments via Regret Minimization 2017

Input Convex Neural Networks 2017

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation 2017