Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition

Deepti Ghadiyaram; Du Tran; Dhruv Mahajan

CVPR 2019


Abstract

Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels, which hinders progress toward more advanced video architectures. This paper presents an in-depth study of using large volumes of web videos to pre-train video models for action recognition. Our primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite relying on noisy social-media videos and hashtags, substantially improves the state of the art on three challenging public action recognition datasets. Further, we examine three questions in the construction of weakly-supervised video action datasets. First, given that actions involve interactions with objects, how should one construct a verb-object pre-training label space to benefit transfer learning the most? Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning? Finally, actions are generally less well-localized in long videos than in short videos; since action labels are provided at the video level, how should one choose video clips for best performance, given a fixed budget in number of videos or minutes of video?
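The first question, building a verb-object label space from hashtags, can be illustrated with a minimal sketch. This is not the authors' procedure. It assumes each video comes with a list of hashtag words (already lower-cased and split, without `#`), that separate verb and object vocabularies are given, and that a verb-object pair is kept only if it occurs in at least `min_count` videos. The function names `build_verb_object_labels` and `weak_labels` and the frequency threshold are hypothetical.

```python
from collections import Counter
from itertools import product


def build_verb_object_labels(video_hashtags, verbs, objects, min_count):
    """Return the sorted (verb, object) pairs seen in at least `min_count` videos.

    Hypothetical illustration: `video_hashtags` is a list of per-video hashtag
    lists; `verbs` and `objects` are sets of allowed words.
    """
    counts = Counter()
    for tags in video_hashtags:
        tag_set = set(tags)
        video_verbs = sorted(tag_set & verbs)
        video_objects = sorted(tag_set & objects)
        # Each pair is counted at most once per video, since tags are deduplicated.
        for pair in product(video_verbs, video_objects):
            counts[pair] += 1
    # Keep only pairs frequent enough to provide several training videos.
    return sorted(pair for pair, n in counts.items() if n >= min_count)


def weak_labels(tags, label_space):
    """Return the label-space pairs whose verb and object both appear in `tags`.

    This is a video-level (weak) multi-label target: it says nothing about
    where in the video the action happens.
    """
    tag_set = set(tags)
    return [pair for pair in label_space if pair[0] in tag_set and pair[1] in tag_set]


if __name__ == "__main__":
    videos = [
        ["play", "guitar", "music"],
        ["play", "guitar"],
        ["ride", "horse"],
        ["play", "piano"],
        ["ride", "horse", "beach"],
    ]
    labels = build_verb_object_labels(
        videos, {"play", "ride"}, {"guitar", "piano", "horse"}, min_count=2
    )
    print(labels)  # [('play', 'guitar'), ('ride', 'horse')]
    print(weak_labels(["play", "piano", "guitar"], labels))  # [('play', 'guitar')]
```

The co-occurrence threshold is one simple way to trade label-space size against the number of examples per class; which label space transfers best is the question the paper studies empirically.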

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Deep Learning > Techniques > Pretraining
Computer Vision > Analysis > Action Recognition
Machine Learning > Learning Paradigms > Transfer Learning
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Weakly Supervised Learning

Keywords

action recognition, transfer learning, domain adaptation, weakly-supervised learning, image features, video action recognition, web-scale data, video pre-training
