DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling; Yichen Sheng; Zhi Tu; Wentian Zhao; Cheng Xin; Kun Wan; Lantao Yu; Qianyu Guo; Zixun Yu; Yawen Lu; Xuanmao Li; Xingpeng Sun; Rohan Ashok; Aniruddha Mukherjee; Hao Kang; Xiangrui Kong; Gang Hua; Tianyi Zhang; Bedrich Benes; Aniket Bera

CVPR 2024

Abstract

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible.

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Neural Networks
Deep Learning > Techniques > Pretraining
Computer Vision
Computer Vision > Analysis > 3D Vision
Computer Vision > Generation > Image Generation
Deep Learning > Models > Neural Networks
Computer Vision > Core AI > Computer Vision
Computer Vision > Generation > 3D Generation
Deep Learning > Application Areas > Computer Vision

Keywords

representation learning; 3D vision; foundation model; neural radiance field; novel view synthesis; 3D representation learning; scene dataset
