AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Qifan Yu; Wei Chow; Zhongqi Yue; Kaihang Pan; Yang Wu; Xiaoyang Wan; Juncheng Li; Siliang Tang; Hanwang Zhang; Yueting Zhuang

CVPR 2025

Abstract

Instruction-based image editing aims to modify specific image elements with natural language instructions. However, current models in this domain often struggle to execute complex user instructions accurately, as they are trained on low-quality data with limited editing types. We present AnyEdit, a comprehensive multi-modal instruction editing dataset, comprising 2.5 million high-quality editing pairs spanning over 20 editing types and five domains. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Using the dataset, we further train a novel AnyEdit Stable Diffusion with task-aware routing and learnable task embedding for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models. This presents prospects for developing instruction-driven image editing models that support human creativity.
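
The abstract names task-aware routing with a learnable task embedding but does not spell out the mechanism. As a rough illustration only, the sketch below shows one common way such routing can be set up: each editing type owns an embedding vector, and a linear gate turns that embedding into softmax weights over a few experts. This is not the authors' implementation. The class name `TaskAwareRouter`, the `EDIT_TYPES` subset, the number of experts, and the random initialisation are all assumptions made for the example; in a real model the embeddings and gate would be trained, and the experts would be network modules rather than fixed vectors.

```python
import math
import random

# Hypothetical subset of editing types, for illustration only.
EDIT_TYPES = ["add", "remove", "replace", "style_change"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TaskAwareRouter:
    """Toy router: one embedding per editing type, one gate row per expert.

    Weights are randomly initialised here; in training they would be learned.
    """

    def __init__(self, num_experts, dim, seed=0):
        rng = random.Random(seed)
        self.task_embedding = {
            t: [rng.gauss(0, 1) for _ in range(dim)] for t in EDIT_TYPES
        }
        self.gate = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)
        ]

    def route(self, edit_type):
        """Return softmax weights over experts for the given editing type."""
        emb = self.task_embedding[edit_type]
        logits = [sum(w * e for w, e in zip(row, emb)) for row in self.gate]
        return softmax(logits)


def combine(expert_outputs, weights):
    """Weighted sum of per-expert feature vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


router = TaskAwareRouter(num_experts=3, dim=4)
weights = router.route("remove")
# Stand-in expert outputs; real experts would process image features.
features = combine([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

The point of the design is that the editing type, rather than the image content alone, decides how much each expert contributes, so a single model can serve many editing types without one task's behaviour overwriting another's.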

Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation
Computer Vision > Processing > Image Editing

Keywords

image generation, multimodal learning, image editing, text-to-image generation, instruction following, diffusion model, stable diffusion, image manipulation, instruction-based editing, task-aware routing
