Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance

Duc-Hai Pham; Duc-Dung Nguyen; Anh Pham; Tuan Ho; Phong Nguyen; Khoi Nguyen; Rang Nguyen

2025 AAAI AAAI 2025

Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance

Abstract

Abstract Accurate prediction of 3D semantic occupancy from 2D visual images is vital in enabling autonomous agents to comprehend their surroundings for planning and navigation. State-of-the-art methods typically employ fully supervised approaches, necessitating a huge labeled dataset acquired through expensive LiDAR sensors and meticulous voxel-wise labeling by human annotators. The resource-intensive nature of this annotating process significantly hampers the application and scalability of these methods. We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data. Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues, facilitating a more efficient training process. Our framework exhibits notable properties: (1) Generalizability, applicable to various 3D semantic scene completion approaches, including 2D-3D lifting and 3D-2D transformer methods. (2) Effectiveness, as demonstrated through experiments on SemanticKITTI and NYUv2, wherein our method achieves up to 85% of the fully-supervised performance using only 10% labeled data. This approach not only reduces the cost and labor associated with data annotation but also demonstrates the potential for broader adoption in camera-based systems for 3D semantic occupancy prediction.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Duc-Hai Pham , Duc-Dung Nguyen , Anh Pham , Tuan Ho , Phong Nguyen , Khoi Nguyen , Rang Nguyen

Topics

Machine Learning > Learning Types > Semi-Supervised Learning Computer Vision > Analysis > 3D Vision Machine Learning > Learning Paradigms > Semi-Supervised Learning Deep Learning > Models > Foundation Models

Keywords

semi-supervised learning scene understanding foundation model vision foundation model semantic scene completion semantic occupancy scene completion 3d semantic occupancy 2d-3d lifting

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025