Inpaint-Anywhere: Zero-Shot Multi-Identity Inpainting with Efficient Diffusion Transformer

Junsheng Luan; lei zhao; Wei Xing

2026 AAAI AAAI 2026

Inpaint-Anywhere: Zero-Shot Multi-Identity Inpainting with Efficient Diffusion Transformer

Abstract

Abstract Subject-driven generation, which aims to synthesize visual content for a given identity V* with specific attributes, has garnered increasing attention in recent years. While existing methods demonstrate impressive identity consistency for both single and multiple identities, they often lack user-specified spatial control. Recent approaches, such as OminiControl-2 and EasyControl, enable inpainting conditioned on a single identity but fall short in multi-identity scenarios. In this paper, we introduce BoundID, a dataset synthesis pipeline for generating multi-identity images with bounding box annotations, and introduce Inpaint-Anywhere, a diffusion transformer framework for multi-identity inpainting. Given multiple identity references and corresponding masks, our method simultaneously generates all desired identities at precise locations while achieving both high identity and prompt fidelity. Extensive experiments show that Inpaint-Anywhere achieves state-of-the-art performance in multi-identity inpainting.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — multi-identity generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Junsheng Luan , lei zhao , Wei Xing

Topics

Machine Learning > Learning Types > Zero-Shot Learning Deep Learning > Models > Diffusion Models Computer Vision > Processing > Image Editing

Keywords

zero-shot learning image editing image inpainting diffusion transformer multi-identity generation

Download PDF

Related papers

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding 2026

Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views 2026

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning 2026

HDGS: Hierarchical Dynamic Gaussian Splatting for Urban Driving Scenes 2026