Position-Aware Guided Point Cloud Completion with CLIP Model

Feng Zhou; Qi Zhang; Ju Dai; Lei Li; Qing Fan; Junliang Xing

2025 AAAI AAAI 2025

Position-Aware Guided Point Cloud Completion with CLIP Model

Abstract

Abstract Point cloud completion aims to recover partial geometric and topological shapes caused by equipment defects or limited viewpoints. Current methods either solely rely on the 3D coordinates of the point cloud to complete it or incorporate additional images with well-calibrated intrinsic parameters to guide the geometric estimation of the missing parts. Although these methods have achieved excellent performance by directly predicting the location of complete points, the extracted features lack fine-grained information regarding the location of the missing area. To address this issue, we propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts through a weighted map learning mechanism. In addition, we establish a Point-Text-Image triplet corpus PCI-TI and MVP-TI based on the existing unimodal point cloud completion dataset and use the pre-trained vision-language model CLIP to provide richer detail information for 3D shapes, thereby enhancing performance. Extensive quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art point cloud completion methods.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🧭 Keyword Pioneer — position-aware module

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Feng Zhou , Qi Zhang , Ju Dai , Lei Li , Qing Fan , Junliang Xing

Topics

Artificial Intelligence > Core AI > Multimodal Learning Computer Vision > Analysis > 3D Vision Deep Learning > Models > Transformers Deep Learning > Learning Types > Multi-Modal Learning Computer Vision > Processing > Point Cloud Processing Deep Learning > Models > Vision-Language Models

Keywords

3d reconstruction multimodal learning multi-modal learning vision-language model clip model spatial information multimodal framework point cloud completion 3d shape understanding position-aware module

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025