SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection

Hyo-Jun Lee; Yeong Jun Koh; Hanul Kim; Hyunseop Kim; Yonguk Lee; Jinu Lee

2025 CVPR CVPR 2025

SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection

Abstract

Existing view transformations in vision-centric 3D Semantic Scene Completion (SSC) inevitably experience erroneous feature duplication in the reconstructed voxel space due to occlusions, leading to a dilution of informative contexts. Furthermore, semantic classes exhibit high variability in their appearance in real-world driving scenarios. To address these issues, we introduce a novel 3D SSC method, called SOAP, including two key components: an occluded region-aware view projection and a scene-adaptive decoder. The occluded region-aware view projection effectively converts 2D image features into voxel space, refining the duplicated features of occluded regions using information gathered from previous observations. The scene-adaptive decoder guides query embeddings to learn diverse driving environments based on a comprehensive semantic repository. Extensive experiments validate that the proposed SOAP significantly outperforms existing methods for the vision-centric 3D SSC on automated driving datasets, SemanticKITTI and SSCBench. Code is available at https://github.com/gywns6287/SOAP.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — view projection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hyo-Jun Lee , Yeong Jun Koh , Hanul Kim , Hyunseop Kim , Yonguk Lee , Jinu Lee

Topics

Deep Learning > Architectures > Transformers Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Semantic Segmentation Computer Vision > Domain-Specific > Autonomous Driving Computer Vision > Processing > 3D Vision

Keywords

3d reconstruction scene understanding autonomous driving 3d vision semantic scene completion view projection voxel space

Download PDF

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025