AAAI 2025

SOVGaussian: Sparse-View 3D Gaussian Splatting for Open-Vocabulary Scene Understanding

Abstract

Modeling 3D open-vocabulary language fields is challenging yet highly desirable. Despite great progress, existing approaches rely heavily on a large number of training views to construct language-embedded 3D scenes, which is often impractical in real-world scenarios. This paper introduces SOVGaussian, the first method for few-shot novel-view open-vocabulary language querying. We introduce a depth-constrained neural language field to mitigate the geometry degradation caused by overfitting to the training views. Rather than supervising directly with dense depth maps, which are only loosely accurate, we propose Language-Aware Depth Distillation (LAD), which uses open-vocabulary object masks to ensure intra-object geometric accuracy within the language field. To further refine the language-geometry consistency of the language field, we propose a novel Language-Guided Outlier Pruning (LOP) strategy, which identifies floating 3D Gaussian primitives that overfit the training views based on their language-grouped densities. Comprehensive experiments demonstrate that SOVGaussian reconstructs a superior scene representation from few-shot images, outperforming existing state-of-the-art methods and achieving significantly better performance on novel-view language querying and synthesis.
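The abstract does not say how LAD and LOP are computed, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It makes these assumptions: (1) LAD is treated as a per-object depth loss where, inside each object mask, relative monocular depth is aligned to the rendered depth with a least-squares scale and shift; (2) LOP groups Gaussians by the prototype their language feature most resembles, measures density by the distance to the k-th nearest neighbour within the group, and prunes Gaussians whose distance far exceeds the group median. The function names `masked_scale_shift_depth_loss` and `language_guided_prune_mask`, along with the parameters `k` and `ratio`, are hypothetical.

```python
import numpy as np


def masked_scale_shift_depth_loss(rendered, mono, masks):
    """Hypothetical LAD-style loss (assumption, not the paper's formula).

    rendered: (H, W) depth rendered from the Gaussians.
    mono:     (H, W) relative depth from a monocular estimator.
    masks:    (K, H, W) boolean open-vocabulary object masks.
    Each object gets its own scale/shift, so the supervision only has to be
    consistent within an object, not across the whole dense depth map.
    """
    total, count = 0.0, 0
    for m in masks:
        if m.sum() < 2:
            continue  # need at least two pixels to fit a scale and a shift
        x, y = mono[m], rendered[m]
        A = np.stack([x, np.ones_like(x)], axis=1)
        (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit of s*x + t to y
        total += np.abs(s * x + t - y).sum()  # L1 residual after alignment
        count += x.size
    return total / max(count, 1)


def language_guided_prune_mask(positions, lang_feats, prototypes, k=8, ratio=3.0):
    """Hypothetical LOP-style pruning (assumption, not the paper's rule).

    positions:  (N, 3) Gaussian centers.
    lang_feats: (N, D) per-Gaussian language features.
    prototypes: (C, D) object/text embeddings used to group the Gaussians.
    Returns a boolean keep-mask of shape (N,).
    """
    # Group each Gaussian by its most cosine-similar prototype.
    f = lang_feats / np.linalg.norm(lang_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    groups = np.argmax(f @ p.T, axis=1)

    keep = np.ones(len(positions), dtype=bool)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        if idx.size <= k:
            continue  # too few members to estimate a k-NN density
        pts = positions[idx]
        # Dense pairwise distances: O(n^2) memory, fine for a sketch only.
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        knn = np.sort(d, axis=1)[:, k]  # column 0 is the point itself
        # Low density (large k-NN distance) relative to the group => floater.
        keep[idx[knn > ratio * np.median(knn)]] = False
    return keep
```

Computing density per language group, rather than over the whole scene, means a sparse but legitimate object is not penalized for being less dense than a neighboring object. In a real pipeline the loss would be written in an autodiff framework so gradients reach the rendered depth, and the k-NN search would use a spatial index.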
