PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination

Saurabh Goyal; Anamitra Roy Choudhury; Saurabh Raje; Venkatesan Chakaravarthy; Yogish Sabharwal; Ashish Verma

ICML 2020

Abstract

We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model while maintaining accuracy. It works by: (a) exploiting redundancy pertaining to word-vectors (intermediate transformer block outputs) and eliminating the redundant vectors; (b) determining which word-vectors to eliminate through a strategy for measuring their significance based on the self-attention mechanism; and (c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark show that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with < 1% loss in accuracy. We show that PoWER-BERT offers a significantly better trade-off between accuracy and inference time than prior methods. We also demonstrate that our method attains up to 6.8x reduction in inference time with < 1% loss in accuracy when applied over ALBERT, a highly compressed version of BERT. The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT.
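Step (b) can be illustrated with a small sketch. It assumes, as a simplification, that a word-vector's significance is the total attention it receives, summed over all heads and all query positions of one block's self-attention, and that the least significant vectors are dropped. The function names `significance_scores` and `eliminate_word_vectors` are hypothetical and are not taken from the PoWER-BERT repository; plain nested lists stand in for tensors so the example has no dependencies.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def significance_scores(attn: Sequence[Matrix]) -> List[float]:
    """Score each word-vector by the total attention it receives.

    attn[h][i][j] is the softmax attention that query word i pays to word j
    in head h (each row sums to 1). Assumed score for word j: the sum of
    attn[h][i][j] over all heads h and all query positions i.
    """
    n = len(attn[0])
    scores = [0.0] * n
    for head in attn:
        for row in head:
            for j, a in enumerate(row):
                scores[j] += a
    return scores


def eliminate_word_vectors(
    hidden: Sequence[List[float]], attn: Sequence[Matrix], keep: int
) -> Tuple[List[List[float]], List[int]]:
    """Keep the `keep` highest-scoring word-vectors, preserving sequence order.

    hidden[j] is the output vector of word j from the current block.
    Returns the retained vectors and their original positions.
    """
    scores = significance_scores(attn)
    # Rank positions by descending score; sorted() is stable, so ties keep index order.
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    kept = sorted(ranked[:keep])  # restore the original word order
    return [hidden[j] for j in kept], kept


if __name__ == "__main__":
    # Two heads over four words; every query attends 0.3 to word 0 and 0.7 to word 2.
    attn = [[[0.3, 0.0, 0.7, 0.0] for _ in range(4)] for _ in range(2)]
    hidden = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    vectors, positions = eliminate_word_vectors(hidden, attn, keep=2)
    print(positions, vectors)
```

Under these assumptions, applying the same selection at each successive block with non-increasing `keep` counts yields a progressive elimination schedule. Choosing how many vectors each block keeps is what step (c) learns, and it is not modeled in this sketch.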

Keywords

model compression, BERT model, inference time, word vector, transformer model
