Condensing Action Segmentation Datasets via Generative Network Inversion

Guodong Ding; Rongyu Chen; Angela Yao

CVPR 2025

Abstract

This work presents the first condensation approach for procedural video datasets used in temporal action segmentation (TAS). We propose a condensation framework that leverages a generative prior learned from the dataset, together with network inversion, to condense data into compact latent codes, significantly reducing storage along both the temporal and channel dimensions. Orthogonally, we propose sampling diverse and representative action sequences to minimize video-wise redundancy. Our evaluation on standard benchmarks demonstrates consistent effectiveness in condensing TAS datasets while achieving competitive performance. Specifically, on the Breakfast dataset, our approach reduces storage by over 500x while retaining 83% of the performance obtained by training on the full dataset. Furthermore, when applied to a downstream incremental learning task, it yields superior performance compared to the state of the art.
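The core idea of network inversion can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical frozen linear "generator" (`decode`) and recovers a latent code by gradient descent on the reconstruction error, so that only the short latent code needs to be stored in place of the longer feature vector. All names, dimensions, and hyperparameters here are illustrative assumptions.

```python
import random


def decode(weights, z):
    """Frozen toy generator (assumption): a linear map from latent code to features."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def invert(weights, target, latent_dim, steps=2000, lr=0.01):
    """Optimize only the latent code z; the generator weights are never updated."""
    z = [0.0] * latent_dim
    for _ in range(steps):
        residual = [o - t for o, t in zip(decode(weights, z), target)]
        # Gradient of 0.5 * ||W z - x||^2 with respect to z is W^T (W z - x).
        grad = [
            sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(latent_dim)
        ]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
feature_dim, latent_dim = 16, 4  # toy sizes, not the paper's settings
weights = [
    [random.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(feature_dim)
]
true_z = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]
target = decode(weights, true_z)  # a feature assumed to lie in the generator's range

z_hat = invert(weights, target, latent_dim)
error_before = mse(decode(weights, [0.0] * latent_dim), target)
error_after = mse(decode(weights, z_hat), target)
print(f"reconstruction MSE: {error_before:.4f} -> {error_after:.2e}")
```

In this toy setting, the stored code has 4 values instead of 16. The much larger reduction reported in the abstract depends on the actual generative model and on compressing along the temporal dimension as well, neither of which this sketch models.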

Authors

Guodong Ding, Rongyu Chen, Angela Yao

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Models > Generative Models
Computer Vision > Analysis > Action Recognition
Computer Vision > Processing > Video Processing
Machine Learning > Application Areas > Model Compression
Machine Learning > Learning Types > Representation Learning
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Generative Models

Keywords

video understanding, generative model, temporal action segmentation, video dataset, action segmentation, dataset condensation, latent code, data condensation, network inversion, generative network inversion

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025