Beyond Short Snippets: Deep Networks for Video Classification

Joe Yue-Hei Ng; Matthew Hausknecht; Sudheendra Vijayanarasimhan; Oriol Vinyals; Rajat Monga; George Toderici

2015 CVPR CVPR 2015

Beyond Short Snippets: Deep Networks for Video Classification

Abstract

Convolutional neural networks (CNNs) have been exten- sively applied for image recognition problems giving state- of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image infor- mation across a video over longer time periods than previ- ously attempted. We propose two methods capable of han- dling full length videos. The first method explores various convolutional temporal feature pooling architectures, ex- amining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN. Our best networks exhibit significant performance improve- ments over previously published results on the Sports 1 mil- lion dataset (73.1% vs. 60.9%) and the UCF-101 datasets with (88.2% vs. 87.9%) and without additional optical flow information (82.6% vs. 72.8%).

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning

📈 Trend Setter — Transformers

🧭 Keyword Pioneer — temporal feature pooling

🐣 Hot Topic Early Bird — long short-term memory

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Joe Yue-Hei Ng , Matthew Hausknecht , Sudheendra Vijayanarasimhan , Oriol Vinyals , Rajat Monga , George Toderici

Topics

Artificial Intelligence > Core AI > Multimodal Learning Deep Learning > Architectures > Transformers Deep Learning > Architectures > Neural Networks

Keywords

video classification convolutional neural network long short-term memory recurrent neural network temporal feature pooling

Download PDF

Related papers

Long-Term Correlation Tracking 2015

Hierarchically-Constrained Optical Flow 2015

Propagated Image Filtering 2015

Web Scale Photo Hash Clustering on A Single Machine 2015

Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos 2015