A Survey on Model Compression and Acceleration for Pretrained Language Models

Canwen Xu; Julian McAuley

AAAI 2023

Abstract

Despite achieving state-of-the-art performance on many NLP tasks, Transformer-based pretrained language models (PLMs) have high energy costs and long inference delays that prevent broader adoption, including in edge and mobile computing. Efficient NLP research aims to account comprehensively for computation, time, and carbon emissions across the entire life cycle of NLP, including data preparation, model training, and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics, and methodology.

Topics

Artificial Intelligence > Core AI > Model Compression
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Application Areas > Model Compression
Deep Learning > Optimization & Theory > Model Compression
Deep Learning > Optimization & Theory > Efficient Computing
Natural Language Processing > Resources & Methods > Pretraining

Keywords

model compression, efficient inference, neural network optimization, efficient computing, inference optimization, model acceleration, pretrained language model, inference delay
