Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection

Yunlu Xu; Chengwei Zhang; Zhanzhan Cheng; Jianwen Xie; Yi Niu; Shiliang Pu; Fei Wu

AAAI 2019

Abstract

This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only video-level labels as supervision and predicts the temporal intervals of multiple actions. Specifically, we first assemble video clips according to class labels using an attention mechanism that learns class-variable attention weights, which helps suppress noise from the background and from other actions. Second, we model the temporal relationships between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of the recurrent neural network into the corresponding action distribution. To generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM), which is fused with the attention weights. Experiments on the THUMOS’14 and ActivityNet1.3 datasets show that our approach outperforms state-of-the-art weakly-supervised methods and performs on par with its fully-supervised counterparts.
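To make the first step of the abstract more concrete, the following is a minimal sketch of class-specific attention pooling over clip features, followed by a simple thresholding of per-clip scores into temporal intervals. It is an illustration under stated assumptions, not the STAR network or ST-GradCAM: the function names (`assemble_by_class`, `intervals_above`), the softmax normalization over time, the fixed threshold, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def assemble_by_class(clip_features, class_scores):
    """Pool clip features into one vector per class (illustrative only).

    clip_features: T lists of D floats, one feature vector per clip.
    class_scores:  C lists of T floats, hypothetical unnormalized attention
                   logits with one row per action class.
    Returns (attention, assembled): attention is C x T, assembled is C x D.
    """
    dim = len(clip_features[0])
    # Assumption: attention is normalized over time separately for each class.
    attention = [softmax(row) for row in class_scores]
    assembled = [
        [sum(w * clip[d] for w, clip in zip(weights, clip_features))
         for d in range(dim)]
        for weights in attention
    ]
    return attention, assembled


def intervals_above(scores, threshold):
    """Return inclusive (start, end) clip-index runs where scores > threshold."""
    runs, start = [], None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t
        elif s <= threshold and start is not None:
            runs.append((start, t - 1))
            start = None
    if start is not None:
        runs.append((start, len(scores) - 1))
    return runs


# Toy example: 4 clips with 2-dimensional features and 2 action classes.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
logits = [[2.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 0.0]]
attention, assembled = assemble_by_class(features, logits)

# Threshold each class's attention row to get candidate intervals per class.
proposals = [intervals_above(row, 0.3) for row in attention]
# proposals == [[(0, 0), (2, 2)], [(1, 1)]]
```

Because each class has its own attention row, clips belonging to other actions or to the background receive low weight in that class's pooled feature, which is the intuition behind the noise suppression described above. In the paper the proposal score also involves ST-GradCAM, which this sketch does not attempt to reproduce.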

Topics

Machine Learning > Core Methods > Classification; Machine Learning > Learning Types > Weakly Supervised Learning; Deep Learning > Architectures > Neural Networks; Computer Vision > Analysis > Action Recognition; Deep Learning > Learning Types > Weakly Supervised Learning; Deep Learning > Architectures > Recurrent Neural Networks

Keywords

video classification; attention mechanism; weakly supervised learning; video understanding; recurrent neural network; action segmentation; temporal localization; action detection; temporal assembly; temporal gradient
