Shape-Aware Text-Driven Layered Video Editing

Yao-Chih Lee; Ji-Ze Genevieve Jang; Yi-Ting Chen; Elizabeth Qiu; Jia-Bin Huang

2023 CVPR CVPR 2023

Shape-Aware Text-Driven Layered Video Editing

Abstract

Temporal consistency is essential for video editing applications. Existing work on layered representation of videos allows propagating edits consistently to each frame. These methods, however, can only edit object appearance rather than object shape changes due to the limitation of using a fixed UV mapping field for texture atlas. We present a shape-aware, text-driven video editing method to tackle this challenge. To handle shape changes in video editing, we first propagate the deformation field between the input and edited keyframe to all frames. We then leverage a pre-trained text-conditioned diffusion model as guidance for refining shape distortion and completing unseen regions. The experimental results demonstrate that our method can achieve shape-aware consistent video editing and compare favorably with the state-of-the-art.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🐣 Hot Topic Early Bird — video editing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yao-Chih Lee , Ji-Ze Genevieve Jang , Yi-Ting Chen , Elizabeth Qiu , Jia-Bin Huang

Topics

Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Core AI > Procedural Generation Deep Learning > Models > Diffusion Models Computer Vision > Generation > Video Generation Computer Vision > Processing > Video Processing

Keywords

diffusion model video editing text-to-image diffusion temporal consistency shape deformation layered representation text-driven editing

Download PDF

Related papers

CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching 2023

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos 2023

Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement 2023

EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata 2023