Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video

Mohammed Suhail; Erika Lu; Zhengqi Li; Noah Snavely; Leonid Sigal; Forrester Cole

CVPR 2023

Abstract

We propose a method to decompose a video into a background and a set of foreground layers, where the background captures stationary elements while the foreground layers capture moving objects along with their associated effects (e.g., shadows and reflections). Our approach is designed for unconstrained monocular videos, with arbitrary camera and object motion. Prior work that tackles this problem assumes that the video can be mapped onto a fixed 2D canvas, severely limiting the possible space of camera motion. Instead, our method applies recent progress in monocular camera pose and depth estimation to create a full RGBD video layer for the background, along with a video layer for each foreground object. To solve the underconstrained decomposition problem, we propose a new loss formulation based on multi-view consistency. We test our method on challenging videos with complex camera motion and show significant qualitative improvement over current approaches.
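To make the layer decomposition concrete, the sketch below shows how a background pixel and a stack of foreground layers could be recombined into an observed frame. The abstract does not specify the compositing model, so this assumes standard back-to-front alpha ("over") blending; the names `composite_over` and `reconstruction_error` are hypothetical and are not taken from the paper. It does not implement the multi-view consistency loss.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, ...]        # RGB values in [0, 1]
Layer = Tuple[Pixel, float]      # (color, alpha) of one foreground layer at a pixel


def composite_over(background: Pixel, foreground_layers: Sequence[Layer]) -> Pixel:
    """Blend foreground layers over a background pixel, back to front.

    Assumption: standard alpha 'over' compositing, not necessarily the
    paper's exact formulation.
    """
    out = background
    for color, alpha in foreground_layers:
        # Each layer covers what is behind it in proportion to its alpha.
        out = tuple(alpha * c + (1.0 - alpha) * o for c, o in zip(color, out))
    return out


def reconstruction_error(target: Pixel, predicted: Pixel) -> float:
    """Mean absolute error between an observed pixel and its recomposite."""
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)


if __name__ == "__main__":
    background = (0.2, 0.4, 0.6)
    shadow = ((0.0, 0.0, 0.0), 0.5)    # semi-transparent dark effect layer
    print(composite_over(background, [shadow]))
```

Under this assumption, an effect such as a shadow is a semi-transparent layer that darkens the background, while an opaque object layer (alpha of 1) fully replaces whatever lies behind it.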

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Generation > Video Generation
Computer Vision > Processing > Video Processing
Computer Vision > Processing > Video Understanding
Computer Vision > Analysis > Video Understanding

Keywords

depth estimation; neural rendering; monocular video; camera pose; camera pose estimation; multi-view consistency; foreground segmentation; layer separation; video decomposition; foreground layer
