2021 IJCAI IJCAI 2021

Self-boosting for Feature Distillation

Abstract

Knowledge distillation is a simple but effective method for model compression, which obtains a better-performing small network (Student) by learning from a well-trained large network (Teacher). However, when the difference in the model sizes of Student and Teacher is large, the gap in capacity leads to poor performance of Student. Existing methods focus on seeking simplified or more effective knowledge from Teacher to narrow the Teacher-Student gap, while we address this problem by Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which eases the Teacher-Student gap by feature integration and self-distillation of Student. Three different modules are designed for feature integration to enhance the discriminability of Student's feature, which leads to improving the order of convergence in theory. Moreover, an easy-to-operate self-distillation strategy is put forward to stabilize the training process and promote the performance of Student, without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning
🐣 Hot Topic Early Bird — feature distillation
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio