Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

Arash Vahdat; Kevin Cannons; Greg Mori; Sangmin Oh; Ilseo Kim

ICCV 2013

Abstract

We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. A novel multiple kernel learning (MKL) latent support vector machine (SVM) is defined that combines and re-weights multiple feature types in a principled fashion while simultaneously operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
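
The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the RBF kernel, the additive split into a global term and a single best-segment term, and every name (`rbf_kernel`, `combined_kernel`, `score_video`, `global_weights`, `segment_weights`, `alphas`) are assumptions made for illustration. The sketch only shows how a weighted sum of kernels can be scored while the segment location is chosen by maximization, as a latent variable would be at inference time.

```python
import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def combined_kernel(feats_a, feats_b, weights):
    """Weighted sum of one kernel per feature type (the MKL-style combination)."""
    return sum(w * rbf_kernel(a, b) for w, a, b in zip(weights, feats_a, feats_b))


def score_video(global_feats, segment_feats, support_global, support_segment,
                alphas, global_weights, segment_weights):
    """Return (score, best_segment_index) for one video.

    global_feats:    list of vectors, one per global feature type
    segment_feats:   list over candidate segments; each is a list of vectors,
                     one per segment-level feature type
    support_global:  list over support examples, same layout as global_feats
    support_segment: list over support examples, same layout as one segment
    alphas:          signed coefficient per support example (hypothetical)
    """
    # Global term: compares the whole video against the support examples.
    global_score = sum(
        a * combined_kernel(global_feats, sg, global_weights)
        for a, sg in zip(alphas, support_global)
    )
    # Segment term: evaluate every candidate segment location.
    segment_scores = [
        sum(a * combined_kernel(seg, ss, segment_weights)
            for a, ss in zip(alphas, support_segment))
        for seg in segment_feats
    ]
    # Latent inference: keep the best-scoring segment, ignore the rest.
    best = int(np.argmax(segment_scores))
    return global_score + segment_scores[best], best
```

In this sketch, segments that score poorly contribute nothing to the final score, which mirrors the abstract's point that unimportant portions of the video can be ignored. Learning the kernel weights and the coefficients jointly is the part the paper addresses and is not shown here.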

Topics

Machine Learning > Core Methods > Classification
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Processing > Video Understanding
Machine Learning > Core Methods > Kernel Methods
Computer Vision > Analysis > Video Understanding

Keywords

multiple kernel learning; support vector machine; latent variable model; latent variable; compositional model; video event detection; temporal clutter
