SuperPrimitive: Scene Reconstruction at a Primitive Level

Kirill Mazur; Gwangbin Bae; Andrew J. Davison

CVPR 2024

Abstract

Joint camera pose and dense geometry estimation from a set of images or a monocular video remains a challenging problem due to its computational complexity and inherent visual ambiguities. Most dense incremental reconstruction systems operate directly on image pixels and solve for their 3D positions using multi-view geometry cues. Such pixel-level approaches suffer from ambiguities or violations of multi-view consistency (e.g., caused by textureless or specular surfaces). We address this issue with a new image representation, which we call a SuperPrimitive. SuperPrimitives are obtained by splitting images into semantically correlated local regions and enhancing them with estimated surface normal directions, both of which are predicted by state-of-the-art single-image neural networks. This provides a local geometry estimate per SuperPrimitive, while their relative positions are adjusted based on multi-view observations. We demonstrate the versatility of our new representation by addressing three 3D reconstruction tasks: depth completion, few-view structure from motion, and monocular dense visual odometry. Project page: https://makezur.github.io/SuperPrimitive/
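To make the idea of "local geometry per region, with only its relative position left to solve" concrete, here is a minimal sketch under a strong simplifying assumption: each region is treated as a single plane with one known normal. This is not the paper's method, which the abstract does not specify at this level. The function names, the pinhole intrinsics tuple and the single `scale` parameter are all hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: a planar stand-in for "local geometry up to one
# unknown per region". All names here are hypothetical.

def pixel_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a ray with unit z-component (pinhole model)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def unit_scale_depth(normal, ray):
    """z-depth at which `ray` meets the plane normal . X = 1."""
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    return 1.0 / denom


def segment_depths(pixels, normal, scale, intrinsics):
    """Depths for all pixels of one region lying on the plane normal . X = scale.

    The normal fixes the shape of the region; `scale` is the single free
    parameter that multi-view observations would have to determine.
    """
    fx, fy, cx, cy = intrinsics
    return [
        scale * unit_scale_depth(normal, pixel_ray(u, v, fx, fy, cx, cy))
        for u, v in pixels
    ]
```

For example, `segment_depths([(0, 0), (10, 5)], (0.0, 0.0, 1.0), 2.0, (100.0, 100.0, 0.0, 0.0))` describes a fronto-parallel region and gives the same depth for both pixels. Changing `scale` moves the whole region along the viewing rays without altering its orientation.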

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Representation Learning
Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Depth Estimation
Computer Vision > Processing > Image Restoration
Computer Vision > Processing > Image Processing
Computer Vision > Processing > 3D Vision

Keywords

3D reconstruction, depth estimation, 3D vision, structure from motion, surface normal estimation, scene reconstruction, surface normal, visual odometry, depth completion, multi-view geometry, dense reconstruction

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024