Social Scene Understanding: End-To-End Multi-Person Action Localization and Collective Activity Recognition

Timur Bagautdinov; Alexandre Alahi; Francois Fleuret; Pascal Fua; Silvio Savarese

CVPR 2017

Abstract

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.

Topics

Computer Vision > Analysis > Action Recognition; Computer Vision > Analysis > Activity Recognition; Computer Vision > Analysis > Human Analysis; Computer Vision > Analysis > Object Detection

Keywords

action recognition; human analysis; action localization; recurrent neural network; social behavior; multi-person detection; collective activity; social scene understanding; collective activity recognition
