Video Harmonization with Triplet Spatio-Temporal Variation Patterns

Zonghui Guo; Xinyu Han; Jie Zhang; Shiguang Shan; Haiyong Zheng

CVPR 2024

Abstract

Video harmonization is an important and challenging task that aims to obtain visually realistic composite videos by automatically adjusting the foreground's appearance to harmonize with the background. Inspired by the short-term and long-term gradual adjustment process of manual harmonization, we present a Video Triplet Transformer framework to model three spatio-temporal variation patterns within videos, i.e., short-term spatial as well as long-term global and dynamic, for video-to-video tasks like video harmonization. Specifically, for short-term harmonization, we adjust the foreground appearance to be consistent with the background in the spatial dimension based on neighboring frames; for long-term harmonization, we not only explore global appearance variations to enhance temporal consistency but also alleviate motion offset constraints to align similar contextual appearances dynamically. Extensive experiments and ablation studies demonstrate the effectiveness of our method, which achieves state-of-the-art performance in video harmonization, video enhancement, and video demoireing tasks. We also propose a temporal consistency metric to better evaluate the harmonized videos. Code is available at https://github.com/zhenglab/VideoTripletTransformer.
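
To make the task setup concrete, the following is a minimal sketch of how a composite frame is formed from a foreground, a background, and a mask, together with a naive proxy for temporal consistency inside the foreground region. It is an illustration only: the names `composite` and `temporal_change_error` are hypothetical, the frames are tiny grayscale grids stored as nested lists, and the proxy is an assumption made for this example, not the temporal consistency metric proposed in the paper and not the Video Triplet Transformer itself.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame, values assumed in [0, 1]


def composite(fg: Frame, bg: Frame, mask: Frame) -> Frame:
    """Paste the foreground onto the background (mask 1 keeps fg, mask 0 keeps bg)."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg, bg, mask)
    ]


def temporal_change_error(
    pred: List[Frame], ref: List[Frame], masks: List[Frame]
) -> float:
    """Naive proxy (an assumption, not the paper's metric).

    Compares frame-to-frame changes of the predicted video with those of a
    reference video, averaged over foreground pixels of the later frame.
    """
    total, count = 0.0, 0.0
    for t in range(1, len(pred)):
        for y, mask_row in enumerate(masks[t]):
            for x, m in enumerate(mask_row):
                d_pred = pred[t][y][x] - pred[t - 1][y][x]
                d_ref = ref[t][y][x] - ref[t - 1][y][x]
                total += m * abs(d_pred - d_ref)
                count += m
    return total / count if count else 0.0
```

A harmonized video whose foreground changes between frames in the same way as the reference scores 0 under this proxy, while foreground flicker raises the score. Background pixels are ignored because the mask weights them by zero.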

Topics

Computer Vision > Processing > Image Editing
Computer Vision > Processing > Video Processing
Computer Vision > Processing > Video Understanding

Keywords

video enhancement; temporal consistency; video harmonization; spatio-temporal variation; foreground-background harmonization
