Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter

Jianhui Zhang; Shen Cheng; Qirui Sun; Jia Liu; Wang Luyang; Chaoyu Feng; Chen Fang; Lei Lei; Jue Wang; Shuaicheng Liu

ICCV 2025

Abstract

In this work, we present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Unlike existing methods, which are limited to lower resolutions, our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment, two critical challenges in image inpainting that intensify with increasing resolution and texture complexity. Patch-Adapter leverages a two-stage adapter architecture to scale the diffusion model's resolution from 1K to 4K+ without requiring structural overhauls: (1) Dual Context Adapter: learns coherence between masked and unmasked regions at reduced resolutions to establish global structural consistency. (2) Reference Patch Adapter: implements a patch-level attention mechanism for full-resolution inpainting, preserving local detail fidelity through adaptive feature fusion. This dual-stage architecture addresses the scalability gap in high-resolution inpainting by decoupling global semantics from localized refinement. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance on the OpenImages and photo-concept-bucket datasets, outperforming existing methods in both perceptual quality and text-prompt adherence. The code is available at: https://github.com/Roveer/Patch-Based-Adapter
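As a rough illustration of the coarse-to-fine decoupling described in the abstract, the sketch below computes the two working geometries such a pipeline would need: a reduced-resolution size for a global pass and a grid of full-resolution patches for a local pass. It is not taken from the released code. The function names, the 1024-pixel defaults, and the non-overlapping tiling are all assumptions made for illustration, and the diffusion model and adapters themselves are not modeled.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def coarse_size(height: int, width: int, long_side: int = 1024) -> Tuple[int, int]:
    """Return the (height, width) of a downscaled copy whose longer side is
    at most `long_side`. Hypothetical helper: the 1024 default is an assumed
    stand-in for the "1K" resolution mentioned in the abstract."""
    scale = long_side / max(height, width)
    if scale >= 1.0:
        # Image is already small enough; no downscaling needed.
        return height, width
    return max(1, round(height * scale)), max(1, round(width * scale))


def patch_grid(height: int, width: int, patch: int = 1024) -> List[Box]:
    """Tile the full-resolution image with non-overlapping patches in
    row-major order. Edge patches are clipped to the image bounds.
    Non-overlapping tiling is an assumption of this sketch."""
    boxes: List[Box] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            bottom = min(top + patch, height)
            right = min(left + patch, width)
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    h, w = 2160, 3840  # a 4K UHD frame
    print("coarse pass size:", coarse_size(h, w))
    print("number of full-resolution patches:", len(patch_grid(h, w)))
```

For a 3840x2160 image, this gives a 1024x576 coarse pass and 12 full-resolution patches (3 rows by 4 columns), with the bottom row clipped to 112 pixels in height.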

Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation
Computer Vision > Processing > Image Restoration

Keywords

image inpainting, diffusion model, high resolution, prompt alignment, high-resolution image, content consistency, patch-level attention
