2024 CVPR CVPR 2024

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

Abstract

In this paper we introduce Fairy a minimalist yet robust adaptation of image-editing diffusion models enhancing them for video editing applications. Our approach centers on the concept of anchor-based cross-frame attention a mechanism that implicitly propagates diffusion features across frames ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses limitations of previous models including memory and processing speed. It also improves temporal consistency through a unique data augmentation strategy. This strategy renders the model equivariant to affine transformations in both source and target images. Remarkably efficient Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds outpacing prior works by at least 44x. A comprehensive user study involving 1000 generated samples confirms that our approach delivers superior quality decisively outperforming established methods.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning
🧭 Keyword Pioneer — anchor-based attention
🐣 Hot Topic Early Bird — video synthesis
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio