Adaptive Loss-Aware Quantization for Multi-Bit Networks

Zhongnan Qu; Zimu Zhou; Yun Cheng; Lothar Thiele

2020 CVPR CVPR 2020

Adaptive Loss-Aware Quantization for Multi-Bit Networks

Abstract

We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on low-resource mobile and embedded platforms. We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that is able to achieve an average bitwidth below one-bit without notable loss in inference accuracy. Unlike previous MBN quantization solutions that train a quantizer by minimizing the error to reconstruct full precision weights, ALQ directly minimizes the quantization-induced error on the loss function involving neither gradient approximation nor full precision maintenance. ALQ also exploits strategies including adaptive bitwidth, smooth bitwidth reduction, and iterative trained quantization to allow a smaller network size without loss in accuracy. Experiment results on popular image datasets show that ALQ outperforms state-of-the-art compressed networks in terms of both storage and accuracy.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — low-resource deployment

🐣 Hot Topic Early Bird — weight quantization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhongnan Qu , Zimu Zhou , Yun Cheng , Lothar Thiele

Topics

Artificial Intelligence > Core AI > Model Compression Deep Learning > Techniques > Model Architecture Machine Learning > Application Areas > Model Compression Deep Learning > Optimization & Theory > Model Compression

Keywords

neural network quantization model compression model quantization loss function weight quantization network compression neural network low-resource deployment multi-bit network bitwidth reduction

Download PDF

Related papers

Deep Polarization Cues for Transparent Object Segmentation 2020

HRank: Filter Pruning Using High-Rank Feature Map 2020

Panoptic-Based Image Synthesis 2020

Select, Supplement and Focus for RGB-D Saliency Detection 2020

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings 2020