Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search

Pengtao Xie; Xuefeng Du

CVPR 2022

Abstract

Knowledge distillation has shown great effectiveness for improving neural architecture search (NAS). Mutual knowledge distillation (MKD), where a group of models mutually generate knowledge to train each other, has achieved promising results in many applications. In existing MKD methods, mutual knowledge distillation is performed between models without scrutiny: a worse-performing model is allowed to generate knowledge to train a better-performing model, which may lead to collective failures. To address this problem, we propose a performance-aware MKD (PAMKD) approach for NAS, where knowledge generated by model A is allowed to train model B only if A performs better than B. We propose a three-level optimization framework to formulate PAMKD, where three learning stages are performed end-to-end: 1) each model trains an initial model independently; 2) the initial models are evaluated on a validation set and better-performing models generate knowledge to train worse-performing models; 3) architectures are updated by minimizing a validation loss. Experimental results on a variety of datasets demonstrate that our method is effective.
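The performance-aware rule in stage 2 can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of a temperature-softened KL divergence as the distillation signal, and the averaging over teachers are all assumptions made for illustration. Only the gating condition (model A may teach model B only if A scores better than B on the validation set) comes from the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def allowed_teachers(val_scores):
    """For each model index, list the models with a strictly higher
    validation score. Only these may generate knowledge for it."""
    return {
        student: [t for t, s in enumerate(val_scores) if s > val_scores[student]]
        for student in range(len(val_scores))
    }


def distillation_loss(student_idx, logits_per_model, val_scores, temperature=2.0):
    """Hypothetical gated distillation loss for one example: the mean KL
    divergence from each permitted teacher to the student. Returns 0.0
    when no model outperforms the student (assumed choice)."""
    teachers = allowed_teachers(val_scores)[student_idx]
    if not teachers:
        return 0.0
    student_probs = softmax(logits_per_model[student_idx], temperature)
    losses = [
        kl_divergence(softmax(logits_per_model[t], temperature), student_probs)
        for t in teachers
    ]
    return sum(losses) / len(losses)


# Made-up validation accuracies and per-model logits for a single example.
val_scores = [0.90, 0.85, 0.93]
logits = [[2.0, 0.5, 0.1], [0.3, 1.2, 0.4], [2.5, 0.2, 0.0]]
gates = allowed_teachers(val_scores)
```

With these made-up scores, model 2 is the best performer and receives no knowledge from the others, while model 1 is taught by both model 0 and model 2. In the full method this gating sits inside a three-level optimization with an architecture update in stage 3, which the sketch does not attempt to reproduce.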

Topics

Machine Learning > Optimization & Theory > Optimization; Machine Learning > Application Areas > Knowledge Distillation; Machine Learning > Learning Types > Multi-Task Learning; Machine Learning > Learning Types > Knowledge Distillation; Deep Learning > Techniques > Knowledge Distillation; Deep Learning > Models > Foundation Models; Deep Learning > Learning Types > Knowledge Distillation; Artificial Intelligence > Core AI > Knowledge Distillation

Keywords

model compression; knowledge transfer; knowledge distillation; neural architecture search; model ensemble; mutual learning; architecture optimization; mutual knowledge distillation; performance-aware learning
