IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement

Zhihao Shi; Dong Huo; Yuhongze Zhou; Yan Min; Juwei Lu; Xinxin Zuo

CVPR 2025

Abstract

Current 3D inpainting and object removal methods are largely limited to front-facing scenes and struggle on diverse, "unconstrained" scenes where the camera orientation and trajectory are unrestricted. To bridge this gap, we introduce a novel approach that produces inpainted 3D scenes with consistent visual quality and coherent underlying geometry across both front-facing and unconstrained scenes. Specifically, we propose a robust 3D inpainting pipeline that incorporates geometric priors and a multi-view refinement network trained via test-time adaptation, building on a pre-trained image inpainting model. Additionally, we develop a novel inpainting mask detection technique that derives targeted inpainting masks from object masks, improving performance on unconstrained scenes. To validate the efficacy of our approach, we create a challenging and diverse benchmark that spans a wide range of scenes. Comprehensive experiments demonstrate that our proposed method substantially outperforms existing state-of-the-art approaches.

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Computer Vision > Analysis > 3D Vision
Computer Vision > Processing > Image Editing
Computer Vision > Processing > Image Restoration
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

test-time adaptation, scene reconstruction, geometry-guided learning, view synthesis, geometry prior, scene completion, 3D inpainting, multi-view refinement

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025