Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model

Haobo Jiang; Jin Xie; Jian Yang; Liang Yu; Jianmin Zheng

CVPR 2025

Abstract

This paper introduces ZeroMatch, a novel zero-shot RGB-D point cloud registration framework that aims at robust 3D matching on unseen data without any task-specific training. Our core idea is to use the powerful zero-shot image representation of Stable Diffusion, learned through extensive pre-training on large-scale data, to enhance point-cloud geometric descriptors for robust matching. Specifically, we combine the handcrafted geometric descriptor FPFH with Stable-Diffusion features to create point descriptors that are both locally and contextually aware, enabling reliable RGB-D registration with zero-shot capability. This approach is based on our observation that Stable-Diffusion features effectively encode discriminative global contextual cues, naturally alleviating the feature ambiguity that FPFH often encounters in scenes with repetitive patterns or low overlap. To further enhance the cross-view consistency of Stable-Diffusion features for improved matching, we propose a coupled-image input mode that concatenates the source and target images into a single input, replacing the original single-image mode. This design enables both inter-image and prompt-to-image consistency attention, facilitating robust cross-view feature interaction and alignment. Finally, we use feature nearest neighbors to construct putative correspondences for hypothesize-and-verify transformation estimation. Extensive experiments on 3DMatch, ScanNet, and ScanLoNet verify the strong zero-shot matching ability of our method.
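The matching stage described in the abstract (fused descriptors, nearest-neighbor correspondences, hypothesize-and-verify estimation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for FPFH histograms (33-dimensional) and Stable-Diffusion features, and the fusion rule (weighted concatenation of L2-normalized vectors), the weight `w`, the feature dimension 16, and the RANSAC settings are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fuse_descriptors(geo, vis, w=0.5):
    # Hypothetical fusion rule: weighted concatenation of normalized features.
    return np.hstack([(1.0 - w) * l2_normalize(geo), w * l2_normalize(vis)])

def nearest_neighbor_matches(src_desc, tgt_desc):
    # For each source descriptor, index of the closest target descriptor.
    d2 = ((src_desc[:, None, :] - tgt_desc[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def kabsch(P, Q):
    # Least-squares rigid transform (R, t) such that R @ P[i] + t ~ Q[i].
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def ransac_registration(src, tgt, matches, iters=200, thresh=0.05, seed=0):
    # Hypothesize from 3 random correspondences, verify by counting inliers.
    rng = np.random.default_rng(seed)
    corr = tgt[matches]
    best_R, best_t, best_inl = np.eye(3), np.zeros(3), -1
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], corr[idx])
        inl = np.linalg.norm(src @ R.T + t - corr, axis=1) < thresh
        if inl.sum() > best_inl:
            best_inl = int(inl.sum())
            if best_inl >= 3:  # refit on all inliers of the best hypothesis
                R, t = kabsch(src[inl], corr[inl])
            best_R, best_t = R, t
    return best_R, best_t, best_inl

# Synthetic stand-ins (assumptions): 50 points, random "FPFH" and "SD" features.
rng = np.random.default_rng(42)
N = 50
src = rng.uniform(0.0, 10.0, size=(N, 3))
a = np.deg2rad(30.0)
R_gt = np.array([[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a),  np.cos(a), 0.0],
                 [0.0, 0.0, 1.0]])
t_gt = np.array([0.5, -0.2, 1.0])
perm = rng.permutation(N)                      # target points arrive shuffled
tgt = (src @ R_gt.T + t_gt)[perm]

src_geo, src_vis = rng.uniform(size=(N, 33)), rng.normal(size=(N, 16))
tgt_geo = src_geo[perm] + 0.01 * rng.normal(size=(N, 33))
tgt_vis = src_vis[perm] + 0.01 * rng.normal(size=(N, 16))

clean_matches = nearest_neighbor_matches(fuse_descriptors(src_geo, src_vis),
                                         fuse_descriptors(tgt_geo, tgt_vis))
matches = clean_matches.copy()
matches[:10] = np.roll(matches[:10], 1)        # simulate 10 ambiguous matches

R_est, t_est, n_inliers = ransac_registration(src, tgt, matches)
print("inliers:", n_inliers)
```

In this toy setup the ten deliberately corrupted matches play the role of the ambiguous correspondences that the verification step is meant to reject; in a real pipeline the descriptors would come from an FPFH implementation and from intermediate Stable-Diffusion feature maps sampled at the projected point locations.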

Topics

Machine Learning > Learning Types > Zero-Shot Learning
Deep Learning > Models > Diffusion Models
Deep Learning > Techniques > Pretraining
Computer Vision > Analysis > 3D Vision
Machine Learning > Learning Paradigms > Zero-Shot Learning
Deep Learning > Learning Types > Zero-Shot Learning

Keywords

zero-shot learning; feature extraction; point cloud registration; feature matching; stable diffusion; 3d matching; zero-shot matching; rgb-d registration; geometric descriptor
