EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su; Ching-Hsun Tseng; Bin Pu; Lei Zhao; Jiewen Yang; Zhuangzhuang Chen; Shin-Jye Lee

ICCV 2025

Abstract

Knowledge distillation (KD) enables a smaller "student" model to mimic a larger "teacher" model by transferring knowledge from the teacher's outputs or features. However, most KD methods treat all samples uniformly, overlooking the varying learning value of each sample and thereby limiting effectiveness. In this paper, we propose Entropy-based Adaptive Knowledge Distillation (EA-KD), a simple yet effective plug-and-play KD method that prioritizes learning from valuable samples. EA-KD quantifies each sample's learning value by strategically combining the entropies of the teacher and student outputs, then dynamically reweights the distillation loss to place greater emphasis on high-entropy samples. Extensive experiments across diverse KD frameworks and tasks, including image classification, object detection, and large language model (LLM) distillation, demonstrate that EA-KD consistently enhances performance, achieving state-of-the-art results with negligible computational cost. Our code is available at: https://github.com/cpsu00/EA-KD
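
The reweighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: the per-sample weight is taken as the plain sum of teacher and student entropies, the weights are normalized to a batch mean of 1, and the function names (`entropy_weighted_kd_loss` and its helpers) are hypothetical. The authors' repository should be treated as the reference for the actual method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one sample's logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the usual per-sample logit-distillation term."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def entropy_weighted_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Batch KD loss in which high-entropy samples receive larger weights.

    ASSUMPTION: weight = H(teacher) + H(student), normalized so the
    weights average to 1 over the batch. This is an illustrative choice,
    not the combination used in the paper.
    """
    per_sample_kl, raw_weights = [], []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_t = softmax(t_logits, temperature)
        p_s = softmax(s_logits, temperature)
        per_sample_kl.append(kl_divergence(p_t, p_s))
        raw_weights.append(entropy(p_t) + entropy(p_s))
    mean_w = sum(raw_weights) / len(raw_weights)
    if mean_w == 0.0:
        # Degenerate batch with zero entropy everywhere: fall back to uniform.
        weights = [1.0] * len(raw_weights)
    else:
        weights = [w / mean_w for w in raw_weights]
    loss = sum(w * kl for w, kl in zip(weights, per_sample_kl)) / len(weights)
    return loss, weights

# Sample 0 has a confident teacher; sample 1 has a uniform (uncertain) teacher.
teacher = [[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
student = [[5.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
loss, weights = entropy_weighted_kd_loss(teacher, student)
print(loss, weights)
```

In this sketch the weights only rescale an existing per-sample distillation term, which is one way to read the "plug-and-play" claim: the same reweighting could in principle wrap other KD losses without changing the model or the training loop.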

Topics

Machine Learning > Application Areas > Knowledge Distillation
Deep Learning > Techniques > Model Architecture
Machine Learning > Application Areas > Model Compression
Deep Learning > Optimization & Theory > Model Compression
Deep Learning > Learning Types > Knowledge Distillation

Keywords

model compression, knowledge distillation, entropy-based method, adaptive weighting, student-teacher model, large language model, student teacher, sample prioritization
