Learning Semantic-Aware Dynamics for Video Prediction

Xinzhu Bei; Yanchao Yang; Stefano Soatto

CVPR 2021

Abstract

We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions and capturing the evolution of semantically consistent regions in the video. The scene layout (semantic map) and motion (optical flow) are decomposed into layers, which are predicted and fused with their context to generate future layouts and motions. The appearance of the scene is warped from past frames using the predicted motion in co-visible regions; dis-occluded regions are synthesized with content-aware inpainting utilizing the predicted scene layout. The result is a predictive model that explicitly represents objects and learns their class-specific motion, which we evaluate on video prediction benchmarks.
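The warp-then-inpaint composition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `backward_warp` and `compose` are hypothetical, the flow is given by hand rather than predicted, the visibility mask is approximated by whether the sampled location falls inside the image, and a constant fill stands in for the learned, layout-conditioned inpainting.

```python
import numpy as np


def backward_warp(frame, flow):
    """Warp a past frame with a flow field (nearest-neighbour sampling).

    frame: (H, W, C) array, the past frame.
    flow:  (H, W, 2) array; flow[..., 0] is the horizontal and flow[..., 1]
           the vertical offset from each target pixel to its source pixel.
    Returns the warped frame and a boolean (H, W) mask that is True where
    the source location lies inside the image (assumed co-visible here).
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    visible = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    # Clip so indexing is always valid; invalid pixels are masked out later.
    warped = frame[np.clip(src_y, 0, h - 1), np.clip(src_x, 0, w - 1)]
    return warped, visible


def compose(warped, visible, inpainted):
    """Keep warped pixels where visible, otherwise use inpainted content."""
    return np.where(visible[..., None], warped, inpainted)


# Toy example: the scene shifts one pixel to the right, so the leftmost
# column has no source pixel in the past frame and must be filled in.
frame = np.arange(16, dtype=float).reshape(4, 4, 1)
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1.0  # each target pixel samples from its left neighbour
warped, visible = backward_warp(frame, flow)
placeholder = np.full_like(frame, -1.0)  # stand-in for an inpainting network
out = compose(warped, visible, placeholder)
```

In the paper's setting the flow, the visibility mask, and the inpainted content would all come from learned components; the sketch only shows how the three pieces combine into one predicted frame.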

Topics

Computer Vision > Generation > Video Generation
Computer Vision > Processing > Video Understanding
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Architectures > Convolutional Neural Networks

Keywords

semantic segmentation, video prediction, motion estimation, optical flow, image inpainting, semantic map, motion prediction
