Learning Maximum Margin Temporal Warping for Action Recognition

Jiang Wang; Ying Wu

ICCV 2013

Abstract

Temporal misalignment and duration variation in video actions strongly affect the performance of action recognition, yet it is very difficult to specify an effective temporal alignment for action sequences. To address this challenge, this paper proposes a novel discriminative, learning-based temporal alignment method, called maximum margin temporal warping (MMTW), which aligns two action sequences and measures their matching score. Based on the latent structure SVM formulation, MMTW learns a phantom action template that represents an action class for maximum discrimination against other classes. Recognition of this action class is based on the associated learned alignment of the input action. Extensive experiments on five benchmark datasets demonstrate that the MMTW model significantly improves the accuracy and robustness of action recognition under temporal misalignment and variations.
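The abstract does not give the MMTW formulation, so the following is only a minimal sketch of the general idea of scoring an input sequence against a class template through a monotonic temporal alignment. It assumes per-frame feature vectors, a dot-product frame similarity, and an alignment in which each input frame maps to one template frame and the template index advances by 0 or 1 per frame. The names `warp_score` and `classify`, and the assumption that templates are already given rather than learned with a latent structure SVM, are illustrative and not taken from the paper.

```python
import numpy as np


def warp_score(template, sequence):
    """Return (best score, alignment) for a monotonic warping of
    `sequence` onto `template`, found by dynamic programming.

    Assumption (not from the paper): frame similarity is a dot product,
    the first frame maps to template frame 0, the last frame maps to the
    final template frame, and the template index moves by 0 or 1 per frame.
    """
    template = np.asarray(template, dtype=float)
    sequence = np.asarray(sequence, dtype=float)
    n_tpl, n_seq = len(template), len(sequence)
    if n_seq < n_tpl:
        raise ValueError("sequence must be at least as long as the template")

    sim = sequence @ template.T                 # (n_seq, n_tpl) similarities
    best = np.full((n_seq, n_tpl), -np.inf)     # best score ending at (i, t)
    back = np.zeros((n_seq, n_tpl), dtype=int)  # previous template index
    best[0, 0] = sim[0, 0]

    for i in range(1, n_seq):
        for t in range(n_tpl):
            stay = best[i - 1, t]
            advance = best[i - 1, t - 1] if t > 0 else -np.inf
            if advance > stay:
                best[i, t] = advance + sim[i, t]
                back[i, t] = t - 1
            else:
                best[i, t] = stay + sim[i, t]
                back[i, t] = t

    # Recover the alignment by following back-pointers from the end.
    path = [n_tpl - 1]
    for i in range(n_seq - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    path.reverse()
    return float(best[n_seq - 1, n_tpl - 1]), path


def classify(templates, sequence):
    """Pick the class whose (hypothetical, pre-learned) template scores highest."""
    return max(templates, key=lambda name: warp_score(templates[name], sequence)[0])


# Toy usage with two-dimensional frame features.
templates = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
sequence = [[1, 0], [1, 0], [0, 1]]
score, path = warp_score(templates["a"], sequence)
label = classify(templates, sequence)
```

In this toy case the first two input frames align to the first template frame of class "a" and the last frame to the second. In a discriminative setting such as the one the abstract describes, the templates would be learned so that the correct class scores higher than the others by a margin; that learning step is not shown here.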

Authors

Jiang Wang, Ying Wu

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Optimization
Computer Vision > Analysis > Action Recognition

Keywords

action recognition, video analysis, maximum margin, temporal alignment, support vector machine, latent structure, latent structure svm, temporal warping
