h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform

Toan Nguyen; Kien Do; Duc Kieu; Thin Nguyen

2025 CVPR CVPR 2025

h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform

Abstract

We introduce a theoretical framework for diffusion-based image editing by formulating it as a reverse-time bridge modeling problem. This approach modifies the backward process of a pretrained diffusion model to construct a bridge that converges to an implicit distribution associated with the editing target at time 0. Building on this framework, we propose h-Edit, a novel editing method that utilizes Doob's h-transform and Langevin Monte Carlo to decompose the update of an intermediate edited sample into two components: a "reconstruction" term and an "editing" term. This decomposition provides flexibility, allowing the reconstruction term to be computed via existing inversion techniques and enabling the combination of multiple editing terms to handle complex editing tasks. To our knowledge, h-Edit is the first training-free method capable of performing simultaneous text-guided and reward-model-based editing. Extensive experiments, both quantitative and qualitative, show that h-Edit outperforms state-of-the-art baselines in terms of editing effectiveness and faithfulness.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — doob's h-transform

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Toan Nguyen , Kien Do , Duc Kieu , Thin Nguyen

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Processing > Image Editing Computer Vision > Generation > Image Editing

Keywords

image editing diffusion model text-guided editing langevin monte carlo doob h-transform doob's h-transform reverse-time process reverse-time bridge

Download PDF

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025