Active Learning of an Action Detector from Untrimmed Videos

Sunil Bandla; Kristen Grauman

ICCV 2013

Abstract

Collecting and annotating videos of realistic human actions is tedious, yet critical for training action recognition systems. We propose a method to actively request the most useful video annotations among a large set of unlabeled videos. Predicting the utility of annotating unlabeled video is not trivial, since any given clip may contain multiple actions of interest, and it need not be trimmed to temporal regions of interest. To deal with this problem, we propose a detection-based active learner to train action category models. We develop a voting-based framework to localize likely intervals of interest in an unlabeled clip, and use them to estimate the total reduction in uncertainty that annotating that clip would yield. On three datasets, we show our approach can learn accurate action detectors more efficiently than alternative active learning strategies that fail to accommodate the "untrimmed" nature of real video data.
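To make the selection step concrete, here is a minimal sketch of choosing which unlabeled clip to annotate next. It is not the paper's procedure: it assumes each clip already comes with a list of candidate intervals and a detector probability for each (in the paper these would come from the voting-based localization), and it substitutes a simple sum of per-interval binary entropies for the paper's estimate of total uncertainty reduction. The names `binary_entropy`, `clip_utility`, `select_clip`, and the clip ids are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def clip_utility(interval_probs):
    """Sum of per-interval entropies.

    Assumption: this is a stand-in for the expected reduction in
    uncertainty from annotating the clip, not the paper's measure.
    """
    return sum(binary_entropy(p) for p in interval_probs)


def select_clip(candidates):
    """Return the id of the clip with the highest utility.

    candidates maps a clip id to the detector probabilities of its
    candidate intervals (hypothetical input format).
    """
    return max(candidates, key=lambda cid: clip_utility(candidates[cid]))


# Hypothetical pool of untrimmed clips with per-interval detector outputs.
candidates = {
    "clip_a": [0.95, 0.02],  # two confident intervals
    "clip_b": [0.55, 0.40],  # two uncertain intervals
    "clip_c": [0.50],        # one maximally uncertain interval
}
chosen = select_clip(candidates)
```

Summing over intervals, rather than scoring the clip as a whole, reflects the point that an untrimmed clip can contain several actions of interest: a clip with two uncertain intervals can be worth more to annotate than one with a single maximally uncertain interval.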

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Active Learning
Computer Vision > Analysis > Action Recognition
Artificial Intelligence > Learning Paradigms > Active Learning

Keywords

active learning, action recognition, video annotation, uncertainty estimation, untrimmed video, temporal localization, action detection
