Parsing Videos of Actions with Segmental Grammars

Hamed Pirsiavash; Deva Ramanan

CVPR 2014

Abstract

Real-world videos of human activities exhibit temporal structure at various scales: long videos are typically composed of multiple action instances, where each instance is itself composed of sub-actions with variable durations and orderings. Temporal grammars can presumably model such hierarchical structure, but are computationally difficult to apply to long video streams. We describe simple grammars that capture hierarchical temporal structure while admitting inference with a finite-state machine. This makes parsing linear-time, constant-storage, and naturally online. We train grammar parameters using a latent structural SVM, where latent sub-actions are learned automatically. We illustrate the effectiveness of our approach over common baselines on a new half-million-frame dataset of continuous YouTube videos.
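
The linear-time, constant-storage, online claim can be illustrated with a much simpler model than the paper's grammar. The sketch below is not the authors' algorithm: it assumes that each segment carries a single label, that a segment's score is the sum of per-frame label scores, and that segments are at most max_len frames long. The function name online_segment_scores and the input format are hypothetical. Under those assumptions, the best parse score of every prefix can be updated frame by frame while storing only the last max_len states, regardless of video length.

```python
from collections import deque


def online_segment_scores(frame_scores, max_len):
    """Yield the best segmentation score of the prefix ending at each frame.

    frame_scores: iterable of dicts mapping label -> per-frame score
                  (hypothetical input format, assumed for this sketch).
    max_len: assumed upper bound on segment length, in frames.
    """
    # Each history entry is (best prefix score, running per-label sums)
    # for one of the last max_len frame boundaries.
    history = deque([(0.0, {})], maxlen=max_len)
    totals = {}  # running sum of frame scores per label
    for scores in frame_scores:
        for label, s in scores.items():
            totals[label] = totals.get(label, 0.0) + s
        # Try every allowed start boundary and every label for the
        # segment that ends at the current frame.
        best = max(
            prev_best + totals[label] - prev_totals.get(label, 0.0)
            for prev_best, prev_totals in history
            for label in totals
        )
        history.append((best, dict(totals)))
        yield best


if __name__ == "__main__":
    stream = [{"a": 1, "b": -1}, {"a": 1, "b": -1}, {"a": -1, "b": 2}]
    print(list(online_segment_scores(stream, max_len=2)))
```

Each frame costs work proportional to max_len times the number of labels, so the total cost grows linearly with the number of frames, and the deque never holds more than max_len entries.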

Authors

Hamed Pirsiavash, Deva Ramanan

Topics

Artificial Intelligence > Core AI > Planning
Machine Learning > Core Methods > Classification
Knowledge & Reasoning > Reasoning > Automated Planning
Computer Vision > Analysis > Video Understanding
Machine Learning > Core Methods > Structured Prediction

Keywords

action recognition, structured prediction, structural learning, latent variable, finite state machine, video parsing, temporal grammar
