Boosting Deep Neural Network Efficiency with Dual-Module Inference

Liu Liu; Lei Deng; Zhaodong Chen; Yuke Wang; Shuangchen Li; Jingwei Zhang; Yihua Yang; Zhenyu Gu; Yufei Ding; Yuan Xie

ICML 2020

Abstract

Deep neural networks (DNNs) deliver high-quality results in machine learning tasks, but meeting stringent latency requirements and energy constraints is challenging because of the memory-bound and compute-bound execution patterns of DNNs. We propose big-little dual-module inference, which dynamically skips unnecessary memory accesses and computations to accelerate DNN inference. Exploiting the noise resilience of nonlinear activation functions, we use a lightweight little module that approximates the original DNN layer, termed the big module, to compute activations in the insensitive region, which is more noise-resilient. The expensive memory accesses and computations of the big module are thereby reduced, since its results are calculated only in the sensitive region. For memory-bound models such as recurrent neural networks (RNNs), our method reduces overall memory accesses by 40% on average and achieves a 1.54x to 1.75x speedup on a commodity CPU-based server platform with negligible impact on model quality. In addition, our method reduces the operations of compute-bound models such as convolutional neural networks (CNNs) by 3.02x, with only a 0.5% accuracy drop.
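
The big-little idea in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the layer is a single tanh layer, the little module is a truncated-SVD low-rank approximation of the big weight matrix, and the names `build_little_module`, `dual_module_tanh`, and the `threshold` value are hypothetical. The little module estimates every pre-activation cheaply; only the rows whose estimate falls in the non-saturated (sensitive) region of tanh are recomputed with the big module.

```python
import numpy as np


def build_little_module(W_big, rank):
    """Hypothetical little module: a rank-`rank` approximation of W_big.

    Truncated SVD is an assumption of this sketch, chosen only because it
    is a simple way to get a cheap approximation of a dense layer.
    """
    U, S, Vt = np.linalg.svd(W_big, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # shape (n_out, rank)
    B = Vt[:rank]               # shape (rank, n_in)
    return A, B


def dual_module_tanh(x, W_big, A, B, threshold=2.0):
    """Compute tanh(W_big @ x) approximately with big-little inference.

    Rows whose little-module pre-activation has magnitude >= threshold lie
    where tanh is nearly flat (the insensitive region), so the approximate
    value is kept. The remaining rows (the sensitive region) are recomputed
    exactly, touching only those rows of W_big.
    """
    z = A @ (B @ x)                       # little module: cheap estimate
    sensitive = np.abs(z) < threshold     # mask of rows needing the big module
    z[sensitive] = W_big[sensitive] @ x   # big module on selected rows only
    return np.tanh(z), sensitive
```

In this sketch, `1 - sensitive.mean()` is the fraction of big-module weight rows that are never read for a given input, which is the source of the memory-access savings the abstract describes. How large that fraction is depends on the input, the threshold, and the quality of the little module.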

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Techniques > Model Architecture

Keywords

convolutional neural network, recurrent neural network, deep neural network, memory optimization, dual-module inference
