SSCAP: Self-Supervised Co-Occurrence Action Parsing for Unsupervised Temporal Action Segmentation

Zhe Wang; Hao Chen; Xinyu Li; Chunhui Liu; Yuanjun Xiong; Joseph Tighe; Charless Fowlkes

WACV 2022

Abstract

Temporal action segmentation is the task of assigning an action label to every frame of a video. Annotating every frame in a large corpus of videos to build a comprehensive supervised training set is expensive. In this work we therefore propose SSCAP, an unsupervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos. SSCAP leverages Self-Supervised learning to extract distinguishable features and then applies a novel Co-occurrence Action Parsing algorithm that both captures the correlation among sub-actions underlying the structure of activities and estimates the temporal path of the sub-actions in an accurate and general way. We evaluate on two classic datasets (Breakfast, 50Salads) and on the emerging fine-grained action dataset FineGym, which has more complex activity structures and similar sub-actions. Results show that SSCAP achieves state-of-the-art performance on all datasets and can even outperform some weakly-supervised approaches, demonstrating its effectiveness and generalizability.
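To make the task's output concrete, here is a minimal sketch of how a per-frame labeling corresponds to a set of temporal segments. It illustrates the problem format only and is not the SSCAP algorithm; the function name `frames_to_segments` and the example labels are hypothetical, chosen purely for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for one short video (assumed labels).
frame_labels = ["pour", "pour", "pour", "stir", "stir", "pour", "drink", "drink"]
for label, start, end in frames_to_segments(frame_labels):
    print(f"{label}: frames [{start}, {end})")
```

In the unsupervised setting the labels would be anonymous cluster indices rather than names, so a predicted segmentation is typically matched to the ground-truth classes before it is scored.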

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Computer Vision > Analysis > Action Recognition

Keywords

unsupervised learning, self-supervised learning, video understanding, temporal action segmentation, co-occurrence action parsing, action label classification
