Normalization Helps Training of Quantized LSTM

Lu Hou; Jinhua Zhu; James Kwok; Fei Gao; Tao Qin; Tie-yan Liu

2019 NIPS NeurIPS 2019

Normalization Helps Training of Quantized LSTM

Abstract

The long-short-term memory (LSTM), though powerful, is memory and computa\x02tion expensive. To alleviate this problem, one approach is to compress its weights by quantization. However, existing quantization methods usually have inferior performance when used on LSTMs. In this paper, we first show theoretically that training a quantized LSTM is difficult because quantization makes the exploding gradient problem more severe, particularly when the LSTM weight matrices are large. We then show that the popularly used weight/layer/batch normalization schemes can help stabilize the gradient magnitude in training quantized LSTMs. Empirical results show that the normalized quantized LSTMs achieve significantly better results than their unnormalized counterparts. Their performance is also comparable with the full-precision LSTM, while being much smaller in size.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🐣 Hot Topic Early Bird — weight quantization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Lu Hou , Jinhua Zhu , James Kwok , Fei Gao , Tao Qin , Tie-yan Liu

Topics

Artificial Intelligence > Core AI > Model Compression Deep Learning > Techniques > Normalization Machine Learning > Application Areas > Model Compression Deep Learning > Optimization & Theory > Model Compression

Keywords

neural network quantization batch normalization long short-term memory weight quantization weight normalization layer normalization gradient explosion

Download PDF

Related papers

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test 2019

Metalearned Neural Memory 2019

Model Similarity Mitigates Test Set Overuse 2019

Continual Unsupervised Representation Learning 2019

Reinforcement Learning with Convex Constraints 2019