Boosting Semi-Supervised Video Action Detection with Temporal Context

Donghyeon Kwon; Inho Kim; Suha Kwak

WACV 2025

Abstract

This paper studies semi-supervised learning of video action detection (VAD), which assumes that only a small portion of training videos are labeled and the rest remain unlabeled. Existing semi-supervised methods for VAD mainly focus on leveraging the spatial context of unlabeled videos and leave temporal context largely unexplored. To address this, we present a novel semi-supervised learning framework that effectively incorporates spatio-temporal context during training. We first introduce a new augmentation strategy, called temporal cross-view augmentation, to achieve robust representations across clips that depict the same action but are not aligned on the time axis. We also propose a new context fusion method, called global-local context fusion, that enhances the features of each frame by incorporating those of other frames within a clip; this actively leverages the spatio-temporal context of videos and leads to significant performance improvement. Our framework was evaluated on UCF101-24 and JHMDB-21, where it outperformed all existing methods in every evaluation setting.
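The idea of two clips that show the same action but are not aligned on the time axis can be pictured with a small sketch. This is not the paper's implementation: it assumes, purely for illustration, that a "view" is a fixed-length window of frame indices and that the second view is the first one shifted by a bounded random offset. The function names and the `max_shift` parameter are hypothetical.

```python
import random


def sample_cross_view_starts(num_frames, clip_len, max_shift, rng=None):
    """Return start indices of two equal-length clips from one video.

    Illustrative assumption: the second clip is the first one shifted
    along the time axis by at most `max_shift` frames.
    """
    rng = rng or random.Random()
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    last_start = num_frames - clip_len
    start_a = rng.randint(0, last_start)
    # Random temporal shift, clamped so the second clip stays inside the video.
    shift = rng.randint(-max_shift, max_shift)
    start_b = min(max(start_a + shift, 0), last_start)
    return start_a, start_b


def clip_indices(start, clip_len):
    """Frame indices covered by a clip that begins at `start`."""
    return list(range(start, start + clip_len))


# Two temporally misaligned views of the same 40-frame video.
starts = sample_cross_view_starts(num_frames=40, clip_len=8, max_shift=4,
                                  rng=random.Random(0))
view_a = clip_indices(starts[0], 8)
view_b = clip_indices(starts[1], 8)
```

In a training loop of this kind, both views would be fed through the detector and their predictions encouraged to agree on the frames they share; how the paper actually pairs and supervises the views is not stated in the abstract.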

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Computer Vision > Analysis > Action Recognition
Computer Vision > Processing > Video Understanding
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Paradigms > Semi-Supervised Learning

Keywords

semi-supervised learning, action recognition, spatio-temporal learning, temporal context, spatio-temporal context, video action detection, video augmentation
