Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning

Xiaohan Zou; Wenchao Ma; Shu Zhao

CVPR 2025


Abstract

Recent progress in prompt-based learning has significantly advanced image and video class-incremental learning. However, the prompts learned by these methods often fail to capture the diverse and informative characteristics of videos, and they struggle to generalize to future tasks and classes. To address these challenges, this paper proposes modeling the distribution of space-time prompts conditioned on the input video using a diffusion model. This generative approach lets the model naturally handle the diverse characteristics of videos, leading to more robust prompt learning and better generalization. Additionally, we develop a simple yet effective mechanism that transfers the token-relationship modeling capabilities of pre-trained image transformers to spatio-temporal modeling in videos. Evaluated on four established benchmarks, our approach shows substantial improvements over existing state-of-the-art methods in video class-incremental learning.
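The abstract does not give the model's equations, so the following is only a generic illustration of what "modeling the distribution of prompts conditioned on the input video with a diffusion model" can look like: a standard DDPM noise-prediction loss and ancestral sampler over a prompt vector, conditioned on a video feature. Everything in it is an assumption rather than the authors' method: the linear beta schedule, T = 50, the hypothetical helpers `target_prompt` and `oracle_eps`, and especially the "oracle" denoiser, which stands in for the learned conditional network the paper would train.

```python
import math
import random

# Assumed DDPM setup: T steps with a linear beta schedule (a common default, not from the paper).
T = 50
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_prod = 1.0
for a in ALPHAS:
    _prod *= a
    ALPHA_BARS.append(_prod)


def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = ALPHA_BARS[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]


def noise_prediction_loss(eps_model, x0, cond, rng):
    """Denoising objective ||eps - eps_theta(x_t, t, cond)||^2 at one random step."""
    t = rng.randrange(T)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = q_sample(x0, t, noise)
    pred = eps_model(x_t, t, cond)
    return sum((p - e) ** 2 for p, e in zip(pred, noise)) / len(x0)


def sample_prompt(eps_model, cond, dim, rng):
    """Ancestral DDPM sampling of a prompt vector conditioned on a video feature."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        ab = ALPHA_BARS[t]
        ab_prev = ALPHA_BARS[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t, cond)
        # Estimate x0 from the predicted noise, then form the posterior mean q(x_{t-1} | x_t, x0).
        x0_hat = [(xi - math.sqrt(1.0 - ab) * ei) / math.sqrt(ab) for xi, ei in zip(x, eps)]
        c1 = math.sqrt(ab_prev) * BETAS[t] / (1.0 - ab)
        c2 = math.sqrt(ALPHAS[t]) * (1.0 - ab_prev) / (1.0 - ab)
        mean = [c1 * p + c2 * q for p, q in zip(x0_hat, x)]
        if t > 0:
            var = BETAS[t] * (1.0 - ab_prev) / (1.0 - ab)
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # final step adds no noise
    return x


def target_prompt(cond):
    # Hypothetical deterministic map from a video feature to its "ideal" prompt.
    return [2.0 * c - 1.0 for c in cond]


def oracle_eps(x_t, t, cond):
    # Hypothetical stand-in for a trained network: returns the exact noise given the true x0.
    ab = ALPHA_BARS[t]
    x0 = target_prompt(cond)
    return [(xi - math.sqrt(ab) * x0i) / math.sqrt(1.0 - ab) for xi, x0i in zip(x_t, x0)]


rng = random.Random(0)
video_feature = [0.1, 0.5, 0.9]
prompt = sample_prompt(oracle_eps, video_feature, dim=3, rng=rng)
loss = noise_prediction_loss(oracle_eps, target_prompt(video_feature), video_feature, rng)
```

With the oracle denoiser the sampler recovers `target_prompt(video_feature)` and the loss is essentially zero, which only checks that the forward and reverse processes are consistent. In a real system `eps_model` would be a trained network that takes the video feature as its conditioning input.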

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Continual Learning
Deep Learning > Models > Diffusion Models
Computer Vision > Processing > Video Understanding
Machine Learning > Learning Paradigms > Transfer Learning
Computer Vision > Analysis > Video Understanding
Machine Learning > Learning Paradigms > Continual Learning
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Continual Learning

Keywords

representation learning, continual learning, video classification, class-incremental learning, prompt learning, diffusion model, video diffusion, video representation, vision language, video class-incremental learning, space-time prompt
