Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

Guozhen Zhang; Yuhan Zhu; Haonan Wang; Youxin Chen; Gangshan Wu; Limin Wang

CVPR 2023

Abstract

Effectively extracting inter-frame motion and appearance information is important for video frame interpolation (VFI). Previous works either extract both types of information in a mixed way or devise a separate module for each type, which leads to representation ambiguity and low efficiency. In this paper, we propose a new module that explicitly extracts motion and appearance information via a unified operation. Specifically, we rethink how information is processed in inter-frame attention and reuse its attention map for both appearance feature enhancement and motion information extraction. Furthermore, for efficient VFI, the proposed module can be seamlessly integrated into a hybrid CNN and Transformer architecture. This hybrid pipeline reduces the computational complexity of inter-frame attention while preserving detailed low-level structure information. Experimental results demonstrate that, for both fixed- and arbitrary-timestep interpolation, our method achieves state-of-the-art performance on various datasets. Meanwhile, our approach has a lower computational overhead than models with comparable performance. The source code and models are available at https://github.com/MCG-NJU/EMA-VFI.
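To make the idea of reusing one attention map for two purposes concrete, here is a minimal NumPy sketch. It is not the released EMA-VFI code. The function name `inter_frame_attention`, the projection matrices, the global (non-windowed) attention, and in particular the choice to read motion as the attention-weighted expected position minus the query position are all assumptions made for illustration; the abstract only states that the attention map is reused.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def inter_frame_attention(feat0, feat1, coords, w_q, w_k, w_v):
    """Hypothetical sketch: one attention map, two outputs.

    feat0, feat1: (N, C) token features of the two input frames.
    coords:       (N, 2) pixel coordinates of each token.
    w_q, w_k, w_v: (C, C) projection matrices (assumed, randomly set below).
    """
    q = feat0 @ w_q                      # queries come from frame 0
    k = feat1 @ w_k                      # keys come from frame 1
    v = feat1 @ w_v                      # values come from frame 1
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N), rows sum to 1

    # Use 1: appearance enhancement, aggregating frame-1 features into frame 0.
    appearance = feat0 + attn @ v

    # Use 2 (assumed formulation): expected matched position minus own position.
    motion = attn @ coords - coords
    return appearance, motion, attn


rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
n = h * w
ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
feat0 = rng.standard_normal((n, c))
feat1 = rng.standard_normal((n, c))
w_q, w_k, w_v = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))

appearance, motion, attn = inter_frame_attention(feat0, feat1, coords, w_q, w_k, w_v)
print(appearance.shape, motion.shape)
```

Because each row of `attn` is a convex combination over frame-1 positions, the motion vector in this sketch can never point outside the token grid. The sketch attends over all tokens, which scales quadratically with their number; the abstract addresses that cost through the hybrid CNN and Transformer pipeline.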

Topics

Deep Learning > Architectures > Transformers; Deep Learning > Models > Generative Models; Computer Vision > Processing > Video Processing; Computer Vision > Core AI > Computer Vision

Keywords

appearance modeling; video frame interpolation; motion estimation; appearance feature; inter-frame attention; motion extraction; hybrid cnn transformer
