Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

Yuhao Liu; Zhanghan Ke; Fang Liu; Nanxuan Zhao; Rynson W.H. Lau

CVPR 2024

Abstract

Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with diverse low-level tasks that require detail preservation. To overcome this limitation, we present a new Diff-Plugin framework that enables a single pre-trained diffusion model to generate high-fidelity results across a variety of low-level tasks. Specifically, we first propose a lightweight Task-Plugin module with a dual-branch design that provides task-specific priors, guiding the diffusion process to preserve image content. We then propose a Plugin-Selector that automatically selects different Task-Plugins based on the text instruction, allowing users to edit images by specifying multiple low-level tasks in natural language. We conduct extensive experiments on 8 low-level vision tasks. The results demonstrate the superiority of Diff-Plugin over existing methods, particularly in real-world scenarios. Our ablations further validate that Diff-Plugin is stable, schedulable, and supports robust training across different dataset sizes.
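The selection-and-scheduling idea in the abstract can be illustrated with a toy sketch. This is not the paper's Plugin-Selector, which is a learned module: the sketch swaps in a plain keyword lookup, and every name in it (`TASK_KEYWORDS`, `select_plugins`, `run_schedule`, the task labels) is hypothetical. It only shows how a text instruction naming several tasks could be turned into an ordered chain of per-task plugins.

```python
from typing import Callable, Dict, List

# Hypothetical keyword table. The paper's Plugin-Selector is learned;
# a keyword lookup is used here only to keep the sketch self-contained.
TASK_KEYWORDS: Dict[str, List[str]] = {
    "desnow": ["snow"],
    "derain": ["rain"],
    "dehaze": ["haze", "fog"],
    "deblur": ["blur"],
}


def select_plugins(instruction: str) -> List[str]:
    """Return task names in the order they are first mentioned."""
    text = instruction.lower()
    hits = []
    for task, words in TASK_KEYWORDS.items():
        positions = [text.find(w) for w in words if w in text]
        if positions:
            hits.append((min(positions), task))
    return [task for _, task in sorted(hits)]


def run_schedule(image, instruction: str,
                 plugins: Dict[str, Callable]) -> object:
    """Apply the selected plugins one after another to the image."""
    for task in select_plugins(instruction):
        image = plugins[task](image)
    return image


if __name__ == "__main__":
    # Stand-in plugins that just record which task ran (no real models).
    stubs = {t: (lambda img, t=t: img + [t]) for t in TASK_KEYWORDS}
    print(run_schedule([], "remove the snow and then the haze", stubs))
```

In this sketch an instruction that matches no keyword selects no plugin and the input is returned unchanged; how the actual framework handles unmatched instructions is not stated in the abstract.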

Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Processing > Image Editing
Computer Vision > Processing > Image Restoration

Keywords

image restoration, image editing, diffusion model, latent diffusion, low-level vision, detail preservation, task-plugin module, text instruction
