Di[M]O: Distilling Masked Diffusion Models into One-step Generator

Yuanzhi Zhu; Xi Wang; Stéphane Lathuilière; Vicky Kalogeiton

ICCV 2025


Abstract

Masked Diffusion Models (MDMs) have emerged as a powerful generative modeling technique. Despite their remarkable results, they typically suffer from slow inference because sampling requires multiple steps. In this paper, we propose Di[M]O, a novel approach that distills masked diffusion models into a one-step generator. Di[M]O addresses two key challenges: (1) the intractability of using intermediate-step information for one-step generation, which we solve through token-level distribution matching that optimizes the model's output logits with an "on-policy" framework and the help of an auxiliary model; and (2) the lack of entropy in the initial distribution, which we address through a token initialization strategy that injects randomness while keeping the input close to the teacher's training distribution. We demonstrate Di[M]O's effectiveness on both class-conditional and text-conditional image generation, achieving performance competitive with the multi-step teacher's outputs while drastically reducing inference time. To our knowledge, we are the first to achieve one-step distillation of masked diffusion models and the first to apply discrete distillation to text-to-image generation, opening new paths for efficient generative modeling.
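The abstract names two mechanisms without spelling them out. What follows is a minimal, illustrative Python sketch of both under stated assumptions; it is not the authors' implementation. It assumes, hypothetically, that the initial sequence is mostly [MASK] tokens, with each position replaced by a uniformly random vocabulary token with probability `r_init`. It also substitutes a plain forward KL between the teacher's and the auxiliary model's per-token predictions, restricted to masked positions, for the paper's token-level distribution matching objective. The names `MASK`, `init_tokens`, `r_init`, and `token_level_divergence` are hypothetical, and the sketch does not model the gradient path back to the generator's logits.

```python
import math
import random

MASK = -1  # hypothetical id standing in for the [MASK] token


def init_tokens(seq_len, vocab_size, r_init, rng):
    """Assumed initialization: each position is [MASK] with probability
    1 - r_init, otherwise a token drawn uniformly from the vocabulary.
    This injects randomness while keeping the input mostly masked, like
    the inputs a masked diffusion teacher sees during training."""
    return [rng.randrange(vocab_size) if rng.random() < r_init else MASK
            for _ in range(seq_len)]


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def token_level_divergence(teacher_logits, aux_logits, masked):
    """Illustrative stand-in for token-level distribution matching:
    average per-token KL(teacher || auxiliary) over masked positions only.
    The divergence actually used in the paper may differ."""
    terms = [kl(softmax(t), softmax(a))
             for t, a, m in zip(teacher_logits, aux_logits, masked) if m]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    print(init_tokens(16, 1024, r_init=0.1, rng=rng))
    teacher = [[1.0, 2.0, 0.5], [0.0, 0.0, 3.0]]
    aux = [[2.0, 1.0, 0.5], [3.0, 0.0, 0.0]]
    print(token_level_divergence(teacher, aux, masked=[True, True]))
```

Averaging only over masked positions mirrors the fact that an MDM predicts tokens only where the input is masked; unmasked positions carry no prediction to match.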



Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Knowledge Distillation
Deep Learning > Models > Diffusion Models
Deep Learning > Models > Generative Models
Deep Learning > Techniques > Model Architecture
Computer Vision > Generation > Image Generation

Keywords

model compression, image generation, knowledge distillation, model distillation, diffusion model, one-step generation, masked diffusion, token-level distribution matching, masked diffusion model

