CVPR 2018

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Abstract

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based visual recognition models call for efficient on-device inference schemes. We propose a quantization scheme along with a co-designed training procedure allowing inference to be carried out using integer-only arithmetic while preserving an end-to-end model accuracy that is close to floating-point inference. Inference using integer-only arithmetic performs better than floating-point arithmetic on typical ARM CPUs and can be implemented on integer-arithmetic-only hardware such as mobile accelerators (e.g., Qualcomm Hexagon). By quantizing both activations and weights as 8-bit integers, we obtain close to a 4x reduction in memory footprint compared to 32-bit floating-point representations. Even on MobileNets, a model family known for runtime efficiency, our quantization approach results in an improved tradeoff between latency and accuracy on popular ARM CPUs for ImageNet classification and COCO detection.
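
To make the 8-bit representation and the memory claim concrete, here is a minimal sketch in plain Python. The abstract does not spell out the mapping between real values and integers, so this assumes a common affine form, real ≈ scale * (q - zero_point). The helper names (`choose_params`, `quantize`, `dequantize`) and the sample weights are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
from array import array


def choose_params(lo, hi, num_bits=8):
    """Pick (scale, zero_point) mapping the real range [lo, hi] onto [0, 2**num_bits - 1].

    Assumption: an affine mapping real ~= scale * (q - zero_point).
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    # Keep the zero point inside the representable integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map real values to unsigned 8-bit codes, clamping to the valid range."""
    qmax = 2 ** num_bits - 1
    return array(
        "B",
        (max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values),
    )


def dequantize(codes, scale, zero_point):
    """Recover approximate real values from the integer codes."""
    return [scale * (q - zero_point) for q in codes]


# Hypothetical weights stored as 32-bit floats (typecode "f").
weights = array("f", [-1.0, -0.5, 0.0, 0.25, 1.0])
scale, zero_point = choose_params(min(weights), max(weights))
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

# Storage cost of the raw values in each representation.
float_bytes = weights.itemsize * len(weights)
int8_bytes = q_weights.itemsize * len(q_weights)
```

Each weight drops from four bytes to one, which is where the factor of four comes from; the reduction is "close to" 4x rather than exactly 4x because quantization parameters such as `scale` and `zero_point` must also be stored. The sketch covers storage and round-trip error only: it does not show the integer-only arithmetic or the co-designed training procedure that the abstract describes.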
