Multi-View Stereo by Temporal Nonparametric Fusion

Yuxin Hou; Juho Kannala; Arno Solin

2019 ICCV ICCV 2019

Multi-View Stereo by Temporal Nonparametric Fusion

Abstract

We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model has capability to leverage information from previous latent-space encodings of the scene. This model uses pairs of images and poses, which are passed through an encoder-decoder model for disparity estimation. The novelty lies in soft-constraining the bottleneck layer by a nonparametric Gaussian process prior. We propose a pose-kernel structure that encourages similar poses to have resembling latent spaces. The flexibility of the Gaussian process (GP) prior provides adapting memory for fusing information from nearby views. We train the encoder-decoder and the GP hyperparameters jointly end-to-end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and demonstrate how our scheme can run in real-time on smart devices.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning

🐣 Hot Topic Early Bird — multi-view stereo

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Yuxin Hou , Juho Kannala , Arno Solin

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Depth Estimation

Keywords

disparity estimation depth estimation gaussian process prior multi-view stereo encoder-decoder architecture temporal fusion

Download PDF

Related papers

Hierarchical Self-Attention Network for Action Localization in Videos 2019

StructureFlow: Image Inpainting via Structure-Aware Appearance Flow 2019

Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild 2019

Compact Trilinear Interaction for Visual Question Answering 2019

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image 2019