Knowledge Distillation for Language Models

Yuqiao Wen; Freda Shi; Lili Mou

2025 NAACL NAACL 2025

Knowledge Distillation for Language Models

Abstract

AbstractKnowledge distillation (KD) aims to transfer the knowledge of a teacher (usually a large model) to a student (usually a small one). In this tutorial, our goal is to provide participants with a comprehensive understanding of the techniques and applications of KD for language models. After introducing the basic concepts including intermediate-layer matching and prediction matching, we will present advanced techniques such as reinforcement learning-based KD and multi-teacher distillation. For applications, we will focus on KD for large language models (LLMs), covering topics ranging from LLM sequence compression to LLM self-distillation. The target audience is expected to know the basics of machine learning and NLP, but do not have to be familiar with the details of math derivation and neural models

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio