Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Chuming Lin; Chengming Xu; Donghao Luo; Yabiao Wang; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Yanwei Fu

CVPR 2021

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, the task aims to infer both the action category and the start and end frames of each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods suffer from both a large number of outputs and heavy tuning of the locations and sizes of the different anchors. In contrast, anchor-free methods are lighter and get rid of redundant hyper-parameters, but have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module that gathers more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints that ensure our model can find accurate boundaries given arbitrary proposals. Extensive experiments show that our method outperforms all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and obtains comparable results on ActivityNet v1.3. Our code will be made available upon publication.
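The abstract does not spell out how the basic predictor or the boundary pooling are computed, so the following is only a minimal sketch under stated assumptions. It assumes the common anchor-free formulation in which each temporal location regresses its distances to the action start and end, and it assumes that boundary pooling takes a channel-wise maximum over the frames near a boundary. The names `decode_segments` and `boundary_max_pool` and the `half_width` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def decode_segments(
    distances: Sequence[Tuple[float, float]],
) -> List[Tuple[float, float]]:
    """Turn per-location (distance_to_start, distance_to_end) into (start, end).

    Assumption: location t predicts how far the action start lies before t
    and how far the action end lies after t, so no anchors are needed.
    """
    segments = []
    for t, (d_start, d_end) in enumerate(distances):
        segments.append((t - d_start, t + d_end))
    return segments


def boundary_max_pool(
    features: Sequence[Sequence[float]],
    center: float,
    half_width: float,
) -> List[float]:
    """Channel-wise max over the frames within half_width of a boundary.

    `features` is a list of frames, each frame a list of channel values.
    Assumption: the most salient response near the boundary is kept per channel.
    """
    last = len(features) - 1
    # Clamp the window to the valid frame range so it is never empty.
    lo = min(max(0, math.floor(center - half_width)), last)
    hi = min(max(lo, math.ceil(center + half_width)), last)
    window = features[lo:hi + 1]
    # zip(*window) regroups values by channel across the selected frames.
    return [max(channel) for channel in zip(*window)]


if __name__ == "__main__":
    # Three locations that all point at the same action spanning frames 0..2.
    print(decode_segments([(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]))

    # Four frames with two channels each; pool around the end boundary at t=2.
    feats = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4], [0.8, 0.1]]
    print(boundary_max_pool(feats, center=2.0, half_width=1.0))
```

In this sketch a coarse proposal from `decode_segments` could be refined by pooling features around its start and end with `boundary_max_pool` and feeding the pooled vectors to a second regressor; that refinement head is left out because the abstract gives no details about it.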

Topics

Computer Vision > Analysis > Action Recognition
Computer Vision > Analysis > Object Detection
Computer Vision > Processing > Video Understanding
Computer Vision > Analysis > Video Understanding

Keywords

action recognition, video understanding, video analysis, boundary detection, anchor-free detection, temporal action localization

Related papers

Learning To Reconstruct High Speed and High Dynamic Range Videos From Events 2021

DeFLOCNet: Deep Image Editing via Flexible Low-Level Controls 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs 2021

Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization 2021

Pose-Guided Human Animation From a Single Image in the Wild 2021