DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

Xianing Chen; Qiong Cao; Yujie Zhong; Jing Zhang; Shenghua Gao; Dacheng Tao

CVPR 2022

Abstract

Transformers have been successfully applied to computer vision owing to the powerful modelling capacity of self-attention. However, the strong performance of transformers depends heavily on enormous numbers of training images, so a data-efficient transformer solution is urgently needed. In this work, we propose an early knowledge distillation framework, termed DearKD, to improve the data efficiency of transformers. DearKD is a two-stage framework: it first distills the inductive biases from the early intermediate layers of a CNN into the transformer, and then gives the transformer full play by training it without distillation. DearKD can also be applied to the extreme data-free case where no real images are available; for this setting we propose a boundary-preserving intra-divergence loss based on DeepInversion to further close the performance gap against the full-data counterpart. Extensive experiments on ImageNet, partial ImageNet, the data-free setting and other downstream tasks demonstrate the superiority of DearKD over its baselines and state-of-the-art methods.
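
As a rough illustration of the two-stage schedule described above, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes that stage-one distillation is a mean-squared-error match between transformer features and early CNN features that have already been projected to the same shape, weighted by a hypothetical coefficient `alpha`. It also assumes that the switch to stage two happens at a hypothetical epoch `distill_epochs`. The actual loss, layer pairing and schedule used by DearKD are not given in this abstract, and the data-free DeepInversion component is not modelled.

```python
"""Hedged sketch of a two-stage early-distillation loss schedule.

All names (mse, distill_weight, total_loss, alpha, distill_epochs) are
hypothetical; MSE feature matching is an assumed stand-in for the paper's loss.
"""
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_weight(epoch: int, distill_epochs: int, alpha: float = 1.0) -> float:
    """Stage 1 (epoch < distill_epochs): distillation is active with weight alpha.
    Stage 2: the weight is 0, so the transformer trains without distillation."""
    return alpha if epoch < distill_epochs else 0.0


def total_loss(task_loss: float,
               vit_feats: List[Sequence[float]],
               cnn_feats: List[Sequence[float]],
               epoch: int,
               distill_epochs: int,
               alpha: float = 1.0) -> float:
    """Task loss plus a weighted feature-matching term averaged over paired layers.

    vit_feats[i] is assumed to be already projected to the shape of cnn_feats[i],
    where cnn_feats holds features from early intermediate CNN layers.
    """
    w = distill_weight(epoch, distill_epochs, alpha)
    if w == 0.0:
        # Stage 2: the CNN teacher is no longer consulted.
        return task_loss
    kd = sum(mse(v, c) for v, c in zip(vit_feats, cnn_feats)) / len(cnn_feats)
    return task_loss + w * kd


if __name__ == "__main__":
    vit = [[1.0, 2.0], [0.0, 0.0]]  # toy transformer features for two layers
    cnn = [[1.0, 0.0], [1.0, 1.0]]  # toy early-CNN features for the same layers
    print(total_loss(0.5, vit, cnn, epoch=10, distill_epochs=100))   # stage 1
    print(total_loss(0.5, vit, cnn, epoch=150, distill_epochs=100))  # stage 2
```

With these toy features the per-layer errors are 2.0 and 1.0, so in stage one a task loss of 0.5 becomes 2.0, while in stage two it stays at 0.5. Setting the weight to exactly zero, rather than decaying it to a small residual, mirrors the abstract's description of the second stage as training without distillation.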

Topics

Machine Learning > Application Areas > Knowledge Distillation; Deep Learning > Architectures > Transformers; Deep Learning > Techniques > Knowledge Distillation; Deep Learning > Learning Types > Transfer Learning

Keywords

vision transformer, transfer learning, self-supervised learning, knowledge distillation, inductive bias, data-efficient learning, deep inversion
