2024 CVPR CVPR 2024

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Abstract

Contemporary machine learning which involves training large neural networks on massive datasets faces significant computational challenges. Dataset distillation as a recent emerging strategy aims to compress real-world datasets for efficient training. However this line of research currently struggles with large-scale and high-resolution datasets hindering its practicality and feasibility. Thus we re-examine existing methods and identify three properties essential for real-world applications: realism diversity and efficiency. As a remedy we propose RDED a novel computationally-efficient yet effective data distillation paradigm to enable both diversity and realism of the distilled data. Extensive empirical results over various model architectures and datasets demonstrate the advancement of RDED: we can distill a dataset to 10 images per class from full ImageNet-1K within 7 minutes achieving a notable 42% accuracy with ResNet-18 on a single RTX-4090 GPU (while the SOTA only achieves 21% but requires 6 hours). Code: https://github.com/LINs-lab/RDED.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning
🧭 Keyword Pioneer — data realism
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio