FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Linjie Yang; Qing Jin

2021 AAAI AAAI 2021

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Abstract

Abstract Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During the optimization, the bit-width of each layer / kernel in the model is at a fractional status of two consecutive bit-widths which can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during the quantization-aware training which results in an optimized mixed precision model. Our final models achieve comparable or better performance than previous quantization methods with mixed precision on MobilenetV1/V2, ResNet18 under different resource constraints on ImageNet dataset.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — bit-width optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Linjie Yang , Qing Jin

Topics

Artificial Intelligence > Core AI > Model Compression Machine Learning > Application Areas > Efficient Computing Deep Learning > Techniques > Model Architecture Machine Learning > Application Areas > Model Compression Deep Learning > Optimization & Theory > Optimization Deep Learning > Optimization & Theory > Model Compression Deep Learning > Learning Types > Model Compression

Keywords

model compression model quantization neural network optimization latency reduction neural network inference quantization-aware training mixed precision mixed precision quantization bit-width optimization layer-wise bit-width

Download PDF

Related papers

Contextual Conditional Reasoning 2021

Attention Beam: An Image Captioning Approach (Student Abstract) 2021

Movie Summarization via Sparse Graph Construction 2021

Text Analysis for Understanding Symptoms of Social Anxiety in Student Veterans 2021

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs 2021