Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

Don Dennis; Abhishek Shetty; Anish Prasad Sevekari; Kazuhito Koishida; Virginia Smith

NeurIPS 2023

Abstract

Knowledge distillation is commonly used to compress an ensemble of models into a single model. In this work we study the problem of progressive ensemble distillation: given a large, pretrained teacher model, we seek to decompose the model into an ensemble of smaller student models with low inference cost. The resulting ensemble allows for flexibly tuning accuracy vs. inference cost, which can be useful for a multitude of applications in efficient inference. Our method, B-DISTIL, uses a boosting procedure that allows function-composition-based aggregation rules to construct expressive ensembles whose performance is similar to the teacher's while using much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across a variety of image, speech, and sensor datasets. Our method comes with strong theoretical guarantees in terms of convergence as well as generalization.
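The accuracy vs. inference cost trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' B-DISTIL procedure: it assumes the student outputs are combined by plain summation (the paper's aggregation rules are based on function composition), and the names `progressive_predict`, `argmax`, and the toy `students` are hypothetical.

```python
from typing import Callable, List, Sequence

# A "student" here is any callable mapping an input to a list of class scores.
Student = Callable[[Sequence[float]], List[float]]


def progressive_predict(
    students: Sequence[Student], x: Sequence[float], budget: int
) -> List[float]:
    """Sum the scores of the first `budget` students (assumed additive rule)."""
    if not 1 <= budget <= len(students):
        raise ValueError("budget must be between 1 and the number of students")
    total = list(students[0](x))
    for student in students[1:budget]:
        scores = student(x)
        total = [t + s for t, s in zip(total, scores)]
    return total


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy students with fixed outputs, purely for illustration.
students: List[Student] = [
    lambda x: [0.6, 0.4],
    lambda x: [-0.5, 0.3],
    lambda x: [-0.3, 0.2],
]

cheap = argmax(progressive_predict(students, [0.0], budget=1))
full = argmax(progressive_predict(students, [0.0], budget=3))
print(cheap, full)
```

Raising `budget` evaluates more students and costs more at inference time. In this toy setup the later students change the predicted class, which is the kind of refinement a progressive ensemble is meant to allow.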


Authors

Don Dennis, Abhishek Shetty, Anish Prasad Sevekari, Kazuhito Koishida, Virginia Smith

Topics

Machine Learning > Application Areas > Efficient Computing
Machine Learning > Application Areas > Knowledge Distillation
Machine Learning > Learning Types > Ensemble Learning
Deep Learning > Optimization & Theory > Model Compression
Deep Learning > Learning Types > Knowledge Distillation

Keywords

model compression, ensemble learning, knowledge distillation, efficient inference, student model
