MatAnyone: Stable Video Matting with Consistent Memory Propagation

Peiqing Yang; Shangchen Zhou; Jixin Zhao; Qingyi Tao; Chen Change Loy

CVPR 2025

Abstract

Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To tackle this, we propose MatAnyone, a practical framework designed for target-assigned video matting. Specifically, building on a memory-based framework, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively combines memory from the previous frame. This ensures stable semantic consistency in core regions while maintaining fine details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, further improving matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust, accurate video matting in diverse real-world scenarios, outperforming existing methods. The code and model will be publicly available.
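The abstract does not spell out how region-adaptive memory fusion combines memory from the previous frame, so the following is only a minimal sketch under a stated assumption: a per-pixel weight in [0, 1] decides how much of the current frame's value replaces the previous frame's value. The function name `fuse_memory`, the `change_weight` map, and the linear blend are all hypothetical and are not taken from the paper. The toy example shows the intended effect: core regions (weight 0) keep the previous value for stability, while a boundary pixel (weight 1) takes the current value to preserve detail.

```python
from typing import List

Grid = List[List[float]]


def fuse_memory(prev_memory: Grid, curr_memory: Grid, change_weight: Grid) -> Grid:
    """Blend two same-sized value maps pixel by pixel.

    change_weight[i][j] lies in [0, 1]: 0 keeps the previous frame's value,
    1 takes the current frame's value. This linear blend is an assumed,
    illustrative formulation, not the paper's confirmed method.
    """
    fused = []
    for prev_row, curr_row, w_row in zip(prev_memory, curr_memory, change_weight):
        fused.append([
            w * c + (1.0 - w) * p
            for p, c, w in zip(prev_row, curr_row, w_row)
        ])
    return fused


# Toy 1x3 strip: core foreground, boundary, core background.
prev_memory = [[1.0, 0.5, 0.0]]
curr_memory = [[0.8, 0.9, 0.2]]
change_weight = [[0.0, 1.0, 0.0]]  # only the boundary pixel is allowed to change

fused = fuse_memory(prev_memory, curr_memory, change_weight)
print(fused)  # [[1.0, 0.9, 0.0]]
```

In a real model the weight map would presumably be predicted by the network rather than set by hand, and the blended quantities would be feature tensors rather than scalar values.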

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Domain Adaptation
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Generation > Image Generation
Computer Vision > Processing > Video Processing
Computer Vision > Processing > Semantic Segmentation
Deep Learning > Learning Types > Self-Supervised Learning

Keywords

semantic segmentation, object detection, memory network, object boundary, memory propagation, video matting, human video matting, consistent memory propagation
