ReMask-Animate: Refined Character Image Animation Using Mask-Guided Adapters

Xunzhi Xiang; Haiwei Xue; Zonghong Dai; Di Wang; Minglei Li; Ye Yue; Fei Ma; Weijiang Yu; Heng Chang; Fei Richard Yu

AAAI 2025

Abstract

Pose-controlled human video generation is of significant interest and finds extensive applications in areas such as automated advertising and content creation on social media platforms. While existing methods employing pose sequences and reference images for human image animation have exhibited notable performance, they tend to encounter issues such as specific region blurring, background sharpening, and decreased identity consistency. In this paper, we introduce ReMask-Animate, which utilizes masks as additional priors to guide the model's local visual attention to specific areas, thereby alleviating feature confusion between different regions of the image. Three distinct mask-guided adapters are designed for cross-condition regional fusion of hand and face pose features, mitigating feature confusion between the foreground and background, and enhancing the visual consistency of character identity. Moreover, these lightweight adapters introduce minimal computational overhead and can be seamlessly integrated into specific layers of the backbone architecture. Extensive experiments show that our method outperforms state-of-the-art methods on five metrics in public datasets. Additionally, qualitative evaluations highlight a significant improvement in the quality of generated videos, demonstrating our approach's superiority.
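The abstract does not specify how the adapters are built, so the following is only a generic illustration of the underlying idea: a mask restricts where a condition feature may influence a base feature map. The function name `mask_guided_fuse`, the `scale` parameter, and the toy 2x2 grids are all hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def mask_guided_fuse(base: Grid, cond: Grid, mask: Grid, scale: float = 1.0) -> Grid:
    """Add condition features to base features only where the mask is nonzero.

    Hypothetical sketch, not the paper's adapter: each output element is
    base + scale * mask * cond, so regions with mask == 0 are left untouched.
    """
    if not (len(base) == len(cond) == len(mask)):
        raise ValueError("base, cond and mask must have the same number of rows")
    fused: Grid = []
    for b_row, c_row, m_row in zip(base, cond, mask):
        if not (len(b_row) == len(c_row) == len(m_row)):
            raise ValueError("base, cond and mask rows must have the same length")
        fused.append([b + scale * m * c for b, c, m in zip(b_row, c_row, m_row)])
    return fused


# Toy example: the mask selects only the top-left cell (say, a hand region).
base = [[1.0, 1.0], [1.0, 1.0]]
cond = [[0.5, 0.5], [0.5, 0.5]]
mask = [[1.0, 0.0], [0.0, 0.0]]
fused = mask_guided_fuse(base, cond, mask)
print(fused)
```

In this toy case only the masked cell changes, which is the sense in which a mask can keep features of one region from leaking into another.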

Topics

Computer Vision > Generation > Video Generation
Computer Vision > Processing > Video Processing
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Applications > Computer Vision

Keywords

pose sequence; character animation; image animation; pose-controlled human video generation; character image animation; identity consistency; pose-controlled generation; pose-controlled video generation; mask-guided adapter
