A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Wei Niu; Zhenglun Kong; Geng Yuan; Weiwen Jiang; Jiexiong Guan; Caiwen Ding; Pu Zhao; Sijia Liu; Bin Ren; Yanzhi Wang

IJCAI 2021

Abstract

Transformer-based deep learning models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model meets both the resource and real-time specifications of mobile devices. Our framework applies a compiler-aware neural architecture optimization method (CANAO), which can generate an optimal compressed model that balances accuracy and latency. We achieve up to 7.8x speedup over TensorFlow-Lite with only minor accuracy loss. We present two types of BERT applications on mobile devices: Question Answering (QA) and Text Generation. Both can be executed in real time, with latency as low as 45 ms. Videos demonstrating the framework can be found at https://www.youtube.com/watch?v=_WIRvK_2PZI

Topics

Machine Learning > Application Areas > Efficient Computing

Deep Learning > Techniques > Model Architecture

Keywords

model compression, model compilation, real-time inference, mobile deployment, neural architecture optimization
