Learning To Recover 3D Scene Shape From a Single Image

Wei Yin; Jianming Zhang; Oliver Wang; Simon Niklaus; Long Mai; Simon Chen; Chunhua Shen

CVPR 2021

Abstract

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and a possibly unknown camera focal length. We investigate this problem in detail and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth.
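To illustrate why an unknown depth shift matters for 3D shape, the following is a minimal NumPy sketch, not the authors' code. It assumes a pinhole camera with the principal point at the image center, and every name and numeric value in it (`unproject`, `plane_fit_residual`, the synthetic slanted plane, the focal length of 50 pixels, the shift of 1.0) is hypothetical. It back-projects a depth map of a flat surface into a point cloud twice: once with the correct depth and once with a leftover shift.

```python
import numpy as np

def unproject(depth, focal_length, shift=0.0):
    """Back-project a depth map to an (H*W, 3) point cloud.

    Assumes a pinhole camera whose principal point is the image center.
    `shift` is added to every depth value before back-projection.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    z = depth + shift
    x = (u - cx) * z / focal_length
    y = (v - cy) * z / focal_length
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def plane_fit_residual(points):
    """RMS distance of the points from their least-squares plane."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values[-1] / np.sqrt(len(points))

# Synthetic scene (an assumption for illustration): the plane z = 0.5 * x + 2.
f_true = 50.0
h, w = 64, 64
k = (np.arange(w) - (w - 1) / 2.0) / f_true   # x / z for each pixel column
depth = np.tile(2.0 / (1.0 - 0.5 * k), (h, 1))

# Correct depth: the point cloud is planar up to floating-point error.
residual_true = plane_fit_residual(unproject(depth, f_true))

# Depth with a leftover shift of 1.0: the flat surface becomes curved.
residual_shifted = plane_fit_residual(unproject(depth, f_true, shift=1.0))

print(residual_true, residual_shifted)
```

In this sketch the shifted reconstruction is no longer planar, whereas a wrong focal length alone would only rescale the x and y coordinates relative to z. Both quantities therefore have to be estimated before the point cloud reflects the true scene shape, which is the role the abstract assigns to the second stage.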

Authors

Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Depth Estimation
Machine Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Representation Learning
Computer Vision > Processing > Depth Estimation
Computer Vision > Processing > 3D Vision

Keywords

monocular depth estimation; zero-shot generalization; scene geometry; 3D scene reconstruction; depth prediction; point cloud encoder; 3D scene shape; depth shift; camera focal length
