Flexible Spatio-Temporal Networks for Video Prediction

Chaochao Lu; Michael Hirsch; Bernhard Schölkopf

CVPR 2017

Abstract

We describe a modular framework for video frame prediction. We refer to it as a Flexible Spatio-Temporal Network (FSTN) as it allows the extrapolation of a video sequence as well as the estimation of synthetic frames lying in between observed frames and thus the generation of slow-motion videos. By devising a customized objective function comprising decoding, encoding, and adversarial losses, we are able to mitigate the common problem of blurry predictions, managing to retain high-frequency information even for relatively distant future predictions. We propose and analyse different training strategies to optimize our model. Extensive experiments on several challenging public datasets demonstrate both the versatility and validity of our model.
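The abstract names three loss terms but does not define them here, so the following is only a minimal sketch of how such a combined objective might be assembled. Everything specific in it is an assumption made for illustration and is not taken from the paper: the use of mean squared error for the decoding (pixel-space) and encoding (latent-space) terms, the non-saturating form of the adversarial term, the weights `w_dec`, `w_enc`, `w_adv`, and all function names.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_adversarial_loss(disc_prob_fake, eps=1e-12):
    """Non-saturating generator loss, -log D(predicted frame).

    disc_prob_fake is the discriminator's probability that the predicted
    frame is real. This particular form is an assumption of the sketch.
    """
    return -math.log(max(disc_prob_fake, eps))


def combined_loss(pred_frame, true_frame, pred_code, true_code,
                  disc_prob_fake, w_dec=1.0, w_enc=1.0, w_adv=0.1):
    """Weighted sum of decoding, encoding, and adversarial terms.

    Hypothetical interpretation: the decoding term compares frames in
    pixel space, the encoding term compares latent codes. The weights
    are illustrative placeholders, not values from the paper.
    """
    decoding = mse(pred_frame, true_frame)
    encoding = mse(pred_code, true_code)
    adversarial = generator_adversarial_loss(disc_prob_fake)
    return w_dec * decoding + w_enc * encoding + w_adv * adversarial
```

In this sketch the adversarial term is the one that penalizes predictions the discriminator can tell apart from real frames, which is the usual motivation for adding such a term when pure reconstruction losses yield blurry output.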

Authors

Chaochao Lu, Michael Hirsch, Bernhard Schölkopf

Topics

Computer Vision > Generation > Video Generation
Deep Learning > Optimization & Theory > Loss Functions
Deep Learning > Learning Types > Generative Models

Keywords

adversarial learning, video prediction, frame interpolation, adversarial loss, spatio-temporal network, frame prediction, slow-motion video
