Data-Distortion Guided Self-Distillation for Deep Neural Networks

Ting-Bing Xu; Cheng-lin Liu

AAAI 2019

Abstract

Knowledge distillation is an effective technique that has been widely used for transferring knowledge from one network to another. Although it improves network performance, its dependence on accompanying assistive models complicates the training of a single network and incurs large memory and time costs. In this paper, we design a more elegant self-distillation mechanism that transfers knowledge between different distorted versions of the same training data without relying on accompanying models. Specifically, the potential capacity of a single network is exploited by learning consistent global feature distributions and posterior distributions (class probabilities) across these distorted versions of the data. Extensive experiments on multiple datasets (CIFAR-10/100 and ImageNet) demonstrate that the proposed method effectively improves the generalization performance of various network architectures (such as AlexNet, ResNet, Wide ResNet, and DenseNet) and outperforms existing distillation methods with little extra training effort.
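The mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions, so everything specific here is an assumption made for illustration: a symmetric KL divergence stands in for the posterior-consistency term, the squared distance between batch-mean feature vectors (a linear-kernel MMD estimate) stands in for the global feature-distribution term, and the names `self_distillation_loss`, `lam_kl`, and `lam_feat` and their default values are hypothetical. The inputs are the logits and global features that one network produces for two distorted copies (a and b) of the same batch.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(logits_a, logits_b):
    # Assumed posterior-consistency term: batch mean of KL(a||b) + KL(b||a).
    total = 0.0
    for za, zb in zip(logits_a, logits_b):
        pa, pb = softmax(za), softmax(zb)
        total += kl_divergence(pa, pb) + kl_divergence(pb, pa)
    return total / len(logits_a)

def mean_feature_gap(feats_a, feats_b):
    # Assumed feature-distribution term: squared L2 distance between the
    # batch-mean feature vectors of the two distorted views.
    dim = len(feats_a[0])
    mean_a = [sum(f[d] for f in feats_a) / len(feats_a) for d in range(dim)]
    mean_b = [sum(f[d] for f in feats_b) / len(feats_b) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))

def cross_entropy(logits, labels):
    # Standard supervised loss, averaged over the batch.
    total = 0.0
    for z, y in zip(logits, labels):
        total -= math.log(softmax(z)[y] + 1e-12)
    return total / len(logits)

def self_distillation_loss(logits_a, logits_b, feats_a, feats_b, labels,
                           lam_kl=1.0, lam_feat=0.0005):
    # Hypothetical combination: supervised loss on both views plus the two
    # consistency terms. The weights are illustrative, not from the paper.
    supervised = cross_entropy(logits_a, labels) + cross_entropy(logits_b, labels)
    posterior = symmetric_kl(logits_a, logits_b)
    feature = mean_feature_gap(feats_a, feats_b)
    return supervised + lam_kl * posterior + lam_feat * feature
```

In a training loop, both views would pass through the same network in one step, so no teacher model needs to be kept in memory. When the two views yield identical logits and features, both consistency terms vanish and only the supervised loss remains.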

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Application Areas > Knowledge Distillation
Deep Learning > Architectures > Neural Networks
Machine Learning > Application Areas > Model Compression
Machine Learning > Core Methods > Feature Learning
Deep Learning > Optimization & Theory > Model Compression
Deep Learning > Learning Types > Knowledge Distillation

Keywords

model compression, feature distribution, knowledge distillation, data augmentation, posterior distribution, deep neural network, model generalization, generalization performance, neural network, data distortion
