Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization

Kun Xia; Le Wang; Sanping Zhou; Nanning Zheng; Wei Tang

2022 CVPR CVPR 2022

Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization

Abstract

The main challenge of Temporal Action Localization is to retrieve subtle human actions from various co-occurring ingredients, e.g., context and background, in an untrimmed video. While prior approaches have achieved substantial progress through devising advanced action detectors, they still suffer from these co-occurring ingredients which often dominate the actual action content in videos. In this paper, we explore two orthogonal but complementary aspects of a video snippet, i.e., the action features and the co-occurrence features. Especially, we develop a novel auxiliary task by decoupling these two types of features within a video snippet and recombining them to generate a new feature representation with more salient action information for accurate action localization. We term our method RefactorNet, which first explicitly factorizes the action content and regularizes its co-occurrence features, and then synthesizes a new action-dominated video representation. Extensive experimental results and ablation studies on THUMOS14 and ActivityNet v1.3 demonstrate that our new representation, combined with a simple action detector, can significantly improve the action localization performance.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — feature refactoring

🐣 Hot Topic Early Bird — temporal action localization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Kun Xia , Le Wang , Sanping Zhou , Nanning Zheng , Wei Tang

Topics

Machine Learning > Core Methods > Representation Learning Computer Vision > Analysis > Action Recognition Computer Vision > Processing > Video Understanding Deep Learning > Learning Types > Representation Learning

Keywords

action recognition feature extraction temporal action localization video representation feature refactoring co-occurrence feature

Download PDF

Related papers

UniCoRN: A Unified Conditional Image Repainting Network 2022

Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis 2022

All-in-One Image Restoration for Unknown Corruption 2022

Stability-Driven Contact Reconstruction From Monocular Color Images 2022

Forecasting Characteristic 3D Poses of Human Actions 2022