Zero-shot High-fidelity and Pose-controllable Character Animation

Bingwen Zhu; Fanyi Wang; Tianyi Lu; Peng Liu; Jingwen Su; Jinxiu Liu; Yanhao Zhang; Zuxuan Wu; Guo-Jun Qi; Yu-Gang Jiang

IJCAI 2024

Abstract

Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistent character appearance and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings to preserve character-independent content and maintain precise alignment of actions; 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details; and 3) a Mask-Guided Decoupling Module (MGDM) that refines distinct feature perception abilities, improving animation fidelity by decoupling the character and background. We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transitions. Extensive experimental results demonstrate that our approach outperforms state-of-the-art training-based methods in terms of character consistency and detail fidelity. Moreover, it maintains a high level of temporal coherence throughout the generated animations.

Topics

Artificial Intelligence > Core AI > Multimodal Learning

Artificial Intelligence > Learning Paradigms > Zero-Shot Learning

Keywords

zero-shot learning, temporal coherence, character animation, image-to-video generation, pose-controllable generation
