QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Haoxuan Wang; Yuzhang Shang; Zhihang Yuan; Junyi Wu; Junchi Yan; Yan Yan

ICCV 2025

Abstract

The practical deployment of diffusion models is still hindered by high memory and computational overhead. Although quantization paves the way for model compression and acceleration, existing methods struggle to achieve low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty and propose adjusting these distributions through weight finetuning to make them more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while enhancing quantization efficiency. Our method demonstrates its efficacy across three high-resolution image generation tasks, obtaining state-of-the-art performance across multiple bit-width settings.
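To make the claim that imbalanced activation distributions make low-bit quantization hard more concrete, here is a minimal sketch. It is an illustrative assumption, not code from the paper. It applies plain min-max 4-bit uniform quantization to two toy activation sets: one spread evenly over [0, 1], and the same set with a single large value added to skew the distribution. The helper names `quantize_uniform` and `mse` are hypothetical. Modeling "imbalance" as one outlier is a simplification of the distributions the paper analyzes.

```python
def quantize_uniform(values, n_bits):
    """Asymmetric min-max uniform fake-quantization (illustrative helper).

    Maps each value to one of 2**n_bits integer levels spanning [min, max],
    then maps it back to a real value so the rounding error can be measured.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for v in values:
        q = round((v - lo) / scale)      # nearest integer level
        q = max(0, min(levels, q))       # clamp to the representable range
        out.append(lo + q * scale)       # dequantize back to a real value
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Balanced toy activations: 1000 values spread evenly over [0, 1].
balanced = [i / 999 for i in range(1000)]
# Imbalanced toy activations: the same bulk plus one large value that stretches the range.
imbalanced = balanced + [100.0]

q_balanced = quantize_uniform(balanced, 4)
q_imbalanced = quantize_uniform(imbalanced, 4)

err_balanced = mse(balanced, q_balanced)
err_imbalanced = mse(imbalanced, q_imbalanced)

print(f"4-bit MSE, balanced:   {err_balanced:.5f}")
print(f"4-bit MSE, imbalanced: {err_imbalanced:.5f}")
print("levels used, balanced:  ", len(set(q_balanced)))
print("levels used, imbalanced:", len(set(q_imbalanced)))
```

With the large value present, the 4-bit grid stretches to [0, 100], so every value in [0, 1] rounds to the lowest level and only 2 of the 16 levels are used. The balanced set uses all 16, and its error is hundreds of times smaller. Clipping the range during calibration is one common mitigation, at the cost of clipping error. Per the abstract, QuEST instead finetunes weights so that the activation distributions themselves become more quantization-friendly.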

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Models > Diffusion Models
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Model Compression

Keywords

model compression, model quantization, diffusion model, low-bit quantization, activation distribution, diffusion model quantization, selective finetuning
