Ego-VPA: Egocentric Video Understanding with Parameter-Efficient Adaptation

Tz-Ying Wu; Kyle Min; Subarna Tripathi; Nuno Vasconcelos

WACV 2025


Abstract

Video understanding typically requires fine-tuning the large backbone when adapting to new domains. In this paper, we leverage egocentric video foundation models (Ego-VFMs) based on video-language pre-training and propose a parameter-efficient adaptation for egocentric video tasks, namely Ego-VPA. It employs a local sparse approximation for each video frame/text feature using the basis prompts, and the selected basis prompts are used to synthesize video/text prompts. Since the basis prompts are shared across frames and modalities, Ego-VPA models context fusion and cross-modal transfer in an efficient fashion. Experiments show that Ego-VPA excels in lightweight adaptation (with only 0.84% learnable parameters), largely improving over baselines and reaching the performance of full fine-tuning.
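The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how basis prompts are scored or how the final prompts are synthesized, so the sketch assumes top-k cosine similarity for selection and simply gathers the selected vectors. The names `select_basis_prompts`, `basis`, `frame_features`, and `text_feature`, along with all toy values, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def select_basis_prompts(feature, basis, k):
    """Return indices of the k basis prompts most similar to `feature`.

    Assumption: similarity is cosine and selection is top-k; the abstract
    only states that a local sparse approximation is used.
    """
    scores = [cosine(feature, b) for b in basis]
    order = sorted(range(len(basis)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical toy data: one pool of 4 basis prompts in a 3-d feature space.
# The same pool serves every frame and the text feature (shared basis).
basis = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
frame_features = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]
text_feature = [0.5, 0.6, 0.0]

# Each frame and the text pick a sparse subset of the shared pool.
frame_selection = [select_basis_prompts(f, basis, k=2) for f in frame_features]
text_selection = select_basis_prompts(text_feature, basis, k=2)

# Placeholder "synthesis": gather the selected basis vectors as the prompts.
video_prompts = [[basis[i] for i in idx] for idx in frame_selection]
text_prompts = [basis[i] for i in text_selection]

print(frame_selection)  # [[0, 3], [2, 3]]
print(text_selection)   # [3, 1]
```

In this toy run, both frames and the text feature select basis prompt 3. That overlap is the kind of sharing across frames and modalities the abstract credits for context fusion and cross-modal transfer, although the real mechanism may differ from this sketch.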



Topics

Artificial Intelligence > Core AI > Model Compression
Artificial Intelligence > Learning Paradigms > Transfer Learning
Deep Learning > Techniques > Model Architecture
Computer Vision > Domain-Specific > Egocentric Vision
Deep Learning > Learning Types > Transfer Learning

Keywords

few-shot learning, transfer learning, prompt learning, efficient computing, egocentric video, parameter-efficient adaptation, video-language model, video-language pre-training, basis prompt

