Eliminating the Cross-Domain Misalignment in Text-guided Image Inpainting

Muqi Huang; Chaoyue Wang; Yong Luo; Lefei Zhang

IJCAI 2024


Abstract

Text-guided image inpainting has quickly become a prominent task in user-directed image synthesis; it aims to complete occluded image regions according to a provided textual prompt. However, current methods often struggle with the disparity between low-level pixel data and high-level semantic descriptions, so the inpainted regions fail to harmonize with the original image, either structurally or texturally. In this study, we introduce a Structure-Aware Inpainting Learning scheme and an Asymmetric Cross-Domain Attention to address these cross-domain misalignment challenges. The structure-aware learning scheme uses features of an intermediate modality as structure guidance to bridge the gap between textual information and low-level pixels. Meanwhile, the asymmetric cross-domain attention enhances texture consistency between the inpainted and unmasked regions. Experiments on leading datasets such as MS-COCO and Open Images show that our method surpasses state-of-the-art text-guided image inpainting methods. Code is released at: https://github.com/MucciH/ECDM-inpainting
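
The abstract does not say how the asymmetric cross-domain attention is computed. As a rough illustration only, the sketch below assumes one plausible reading of "asymmetric": feature tokens inside the hole (mask value True) attend only to tokens from the unmasked region, while unmasked tokens pass through unchanged. Information then flows one way, from known content into the hole. The function names, the use of raw features as keys and values (no learned query/key/value projections), and the single-head setup are simplifying assumptions, not the authors' ECDM implementation. The released repository is the authority on the actual module.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_masked_attention(features: List[Vector], mask: List[bool]) -> List[Vector]:
    """Hypothetical one-way attention (illustrative assumption, not ECDM's code).

    - Tokens with mask=True (inside the hole) act as queries over the
      unmasked tokens, which serve as both keys and values.
    - Tokens with mask=False (known content) are copied through unchanged,
      so the hole never influences the unmasked region.
    """
    context = [f for f, m in zip(features, mask) if not m]
    if not context:
        raise ValueError("at least one unmasked token is required")

    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product, as in standard attention
    out: List[Vector] = []
    for f, m in zip(features, mask):
        if not m:
            out.append(list(f))  # known region: pass through untouched
            continue
        scores = [scale * sum(a * b for a, b in zip(f, k)) for k in context]
        weights = softmax(scores)
        # Weighted average of unmasked features fills the hole token.
        out.append([sum(w * k[d] for w, k in zip(weights, context)) for d in range(dim)])
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    hole = [False, False, True]
    print(asymmetric_masked_attention(feats, hole))
```

In this toy call the third token lies in the hole and scores the two known tokens equally, so it becomes their average, [0.5, 0.5]; the first two tokens are returned as-is. Restricting queries to the hole and keys/values to the known region is what makes the sketch "asymmetric". A real module would typically add learned projections and multiple heads.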

Topics

Computer Vision > Generation > Image Generation
Computer Vision > Processing > Image Editing
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

image generation, attention mechanism, cross-domain learning, multi-modal learning, image inpainting, feature fusion, text-guided image inpainting, cross-domain alignment, texture consistency