Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

Kangning Liu; Shuhang Gu; Andres Romero; Radu Timofte

WACV 2021

Abstract

Existing unsupervised video-to-video translation methods fail to produce translated videos that are simultaneously frame-wise realistic, semantics-preserving, and consistent at the video level. In this work, we propose a novel unsupervised video-to-video translation model. Our model decomposes style and content using a specialized encoder-decoder structure and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables style-consistent video translation and also provides a convenient interface for modality-flexible translation. In addition, by changing the input frames and the style codes incorporated in the translation, we propose a video interpolation loss that captures temporal information within the sequence and trains our building blocks in a self-supervised manner. Our model can produce photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods.
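The abstract combines two temporal ideas: bidirectional propagation of inter-frame information and a self-supervised video interpolation loss. The Python sketch below illustrates only the general principle, under loudly stated assumptions. Frames are NumPy arrays. Each RNN direction is replaced by a hypothetical exponential-moving-average recurrence (`alpha` is an illustrative parameter, not from the paper). The hypothetical `interpolation_loss` predicts each interior frame from the forward state at t-1 and the backward state at t+1, then scores the prediction with an L1 distance. This is not the authors' architecture or loss; the model described above uses learned encoder-decoder and bidirectional RNN units instead.

```python
import numpy as np


def recurrent_states(frames, alpha=0.5):
    """Hypothetical stand-in for one RNN direction: an exponential moving
    average whose state at step t summarizes frames 0..t."""
    h = np.asarray(frames[0], dtype=float)
    states = [h]
    for f in frames[1:]:
        h = alpha * h + (1.0 - alpha) * np.asarray(f, dtype=float)
        states.append(h)
    return states


def bidirectional_states(frames, alpha=0.5):
    """Forward states summarize the past; backward states summarize the future."""
    fwd = recurrent_states(frames, alpha)
    bwd = recurrent_states(frames[::-1], alpha)[::-1]
    return fwd, bwd


def interpolation_loss(frames, alpha=0.5):
    """Predict each interior frame t from fwd[t-1] and bwd[t+1] only
    (frame t itself is never seen) and return the mean L1 error."""
    if len(frames) < 3:
        raise ValueError("need at least 3 frames")
    fwd, bwd = bidirectional_states(frames, alpha)
    errors = []
    for t in range(1, len(frames) - 1):
        # Hypothetical "decoder": average the past and future summaries.
        pred = 0.5 * (fwd[t - 1] + bwd[t + 1])
        errors.append(np.mean(np.abs(pred - frames[t])))
    return float(np.mean(errors))


# Usage: a static clip is trivially predictable; a flickering one is not.
static = [np.full((4, 4), 3.0) for _ in range(5)]
flicker = [np.full((4, 4), float(t % 2)) for t in range(5)]
print(interpolation_loss(static))         # 0.0
print(interpolation_loss(flicker) > 0.0)  # True
```

Because the target frame is never fed to the predictor, the supervision signal comes from the video itself, which is what makes a loss of this kind self-supervised. With `alpha=0.0` the toy recurrence keeps no memory, so a linear ramp of frames is interpolated exactly.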

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Unsupervised Learning
Deep Learning > Models > Generative Models

Keywords

unsupervised learning, self-supervised learning, generative adversarial network, bidirectional recurrent neural network, video-to-video translation, style-content decomposition
