Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava; Abhinav Shrivastava

CVPR 2024

Abstract

Diffusion models have made significant strides in image generation, mastering tasks such as unconditional image synthesis, text-image translation, and image-to-image conversions. However, their capability falls short in the realm of video prediction, mainly because they treat videos as a collection of independent images, relying on external constraints such as temporal attention mechanisms to enforce temporal coherence. In our paper, we introduce a novel model class that treats video as a continuous multi-dimensional process rather than a series of discrete frames. Through extensive experimentation, we establish state-of-the-art performance in video prediction, validated on benchmark datasets including KTH, BAIR, Human3.6M, and UCF101.

Authors

Gaurav Shrivastava, Abhinav Shrivastava

Topics

Deep Learning > Models > Diffusion Models

Computer Vision > Generation > Video Generation

Artificial Intelligence > Core AI > Computer Vision

Keywords

video generation, video prediction, generative model, diffusion model, temporal coherence, continuous process
