Isotonic Data Augmentation for Knowledge Distillation

Wanyun Cui; Sen Yan

IJCAI 2021

Abstract

Knowledge distillation uses both real hard labels and soft labels predicted by a teacher model as supervision. Intuitively, we expect the soft label probabilities and the hard label probabilities to be concordant. However, in real knowledge distillation, we found critical rank violations between hard labels and soft labels for augmented samples. For example, for an augmented sample x = 0.7 * cat + 0.3 * panda, a meaningful soft label distribution should preserve the ranking of the hard label: P(cat|x)>P(panda|x)>P(other|x). But real teacher models often violate this ranking, e.g. P(tiger|x)>P(panda|x)>P(cat|x). We attribute the rank violations to the increased difficulty of understanding augmented samples for the teacher model. Empirically, we found that the violations harm knowledge transfer. In this paper, we refer to eliminating rank violations in data augmentation for knowledge distillation as isotonic data augmentation (IDA). We use isotonic regression (IR), a classic statistical algorithm, to eliminate the rank violations. We show that IDA can be modeled as a tree-structured IR problem and give an optimal algorithm with O(c*log(c)) time complexity, where c is the number of labels. To further reduce the time complexity, we also propose a GPU-friendly approximation algorithm with linear time complexity. We have verified on various datasets and data augmentation baselines that (1) rank violation is a general phenomenon for data augmentation in knowledge distillation, and (2) our proposed IDA algorithms effectively increase the accuracy of knowledge distillation by resolving the rank violations.
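To make the notion of a rank violation concrete, the following is a minimal sketch (not the authors' algorithm) that only detects violations: it lists every pair of classes whose order under the hard label is reversed under the soft label. The function name `rank_violations`, the four-class layout, and all probability values are assumptions chosen for illustration; the sketch does not perform the isotonic regression step that IDA uses to repair the soft label.

```python
import numpy as np

def rank_violations(hard, soft):
    """Return class-index pairs (i, j) where hard[i] > hard[j] but soft[i] < soft[j].

    Illustrative helper only: it detects violations and does not fix them.
    """
    hard = np.asarray(hard, dtype=float)
    soft = np.asarray(soft, dtype=float)
    # higher[i, j] is True when class i outranks class j in the hard label.
    higher = hard[:, None] > hard[None, :]
    # lower[i, j] is True when class i is ranked below class j in the soft label.
    lower = soft[:, None] < soft[None, :]
    i, j = np.nonzero(higher & lower)
    return list(zip(i.tolist(), j.tolist()))

# Hypothetical class order: [cat, panda, tiger, dog].
# Hard label of the mixed sample x = 0.7 * cat + 0.3 * panda.
hard = [0.7, 0.3, 0.0, 0.0]

# Made-up teacher outputs: one violating, one concordant.
violating = [0.2, 0.3, 0.4, 0.1]    # tiger > panda > cat
concordant = [0.6, 0.25, 0.1, 0.05]

print(rank_violations(hard, violating))
print(rank_violations(hard, concordant))
```

In the violating example the teacher ranks tiger above panda and panda above cat, so the pairs (cat, panda), (cat, tiger), and (panda, tiger) are all reported; the concordant example yields no pairs.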

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Application Areas > Data Augmentation

Machine Learning > Application Areas > Knowledge Distillation

Machine Learning > Application Areas > Model Compression

Keywords

model compression, knowledge distillation, data augmentation, isotonic regression, soft label, rank violation
