EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Zihao Zhang; Haoran Chen; Haoyu Zhao; Guansong Lu; Yanwei Fu; Hang Xu; Zuxuan Wu

CVPR 2025

Abstract

Handling complex or nonlinear motion patterns has long posed challenges for video frame interpolation. Although recent advances in diffusion-based methods offer improvements over traditional optical flow-based approaches, they still struggle to generate sharp, temporally consistent frames in scenarios with large motion. To address this limitation, we introduce EDEN, an Enhanced Diffusion for high-quality large-motion vidEo frame iNterpolation. Our approach first utilizes a transformer-based tokenizer to produce refined latent representations of the intermediate frames for diffusion models. We then enhance the diffusion transformer with temporal attention across the process and incorporate a start-end frame difference embedding to guide the generation of dynamic motion. Extensive experiments demonstrate that EDEN achieves state-of-the-art results across popular benchmarks, including nearly a 10% LPIPS reduction on DAVIS and SNU-FILM, and an 8% improvement on DAIN-HD.
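The abstract mentions a start-end frame difference embedding but does not say how it is computed. The following is a minimal sketch of one plausible form, not the authors' implementation. It assumes the difference is a single scalar (one minus the cosine similarity of the two flattened frames) and that the scalar is turned into a sinusoidal vector, as is commonly done for diffusion timesteps. The function names, the `dim` and `scale` parameters, and the choice of cosine similarity are all hypothetical.

```python
import math


def frame_difference(start, end):
    """Scalar motion proxy (assumption): 1 - cosine similarity of two
    flattened frames given as equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(start, end))
    norm = math.sqrt(sum(a * a for a in start)) * math.sqrt(sum(b * b for b in end))
    if norm == 0.0:
        # An all-zero frame has no direction; treat the pair as "no difference".
        return 0.0
    return 1.0 - dot / norm


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Map a scalar to a dim-sized vector of sines and cosines at
    geometrically spaced frequencies (dim is assumed to be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]


def difference_embedding(start, end, dim=8, scale=1000.0):
    """Hypothetical conditioning vector: embed the scaled frame difference.
    In a diffusion transformer it could be added to the timestep embedding."""
    return sinusoidal_embedding(scale * frame_difference(start, end), dim)


if __name__ == "__main__":
    still = difference_embedding([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
    moving = difference_embedding([1.0, 0.0], [0.0, 1.0])
    print(len(still), len(moving))
```

Under these assumptions, a pair of identical frames yields a difference near zero, while orthogonal frames yield a difference of one, so the conditioning vector changes with the amount of motion between the two input frames.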

Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Video Generation
Computer Vision > Processing > Video Processing
Deep Learning > Learning Types > Representation Learning

Keywords

video frame interpolation, diffusion model, temporal attention, large motion, transformer tokenizer

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025