CVPR 2015

Beyond Short Snippets: Deep Networks for Video Classification

Abstract

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full length videos. The first method explores various convolutional temporal feature pooling architectures, examining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN. Our best networks exhibit significant performance improvements over previously published results on the Sports 1 million dataset (73.1% vs. 60.9%) and the UCF-101 datasets with (88.2% vs. 87.9%) and without additional optical flow information (82.6% vs. 72.8%).
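
The two ways of aggregating per-frame CNN outputs named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the random `features` array stands in for CNN outputs, max pooling is only one possible pooling choice, and the single-layer LSTM, its gate ordering, the sizes `T`, `D`, `H`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def temporal_max_pool(frame_features):
    """Collapse (T, D) per-frame features into one (D,) video descriptor.

    Max pooling over the time axis ignores frame order.
    """
    return frame_features.max(axis=0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_hidden(frame_features, W, U, b):
    """Run a single-layer LSTM over (T, D) features in frame order.

    W: (4H, D), U: (4H, H), b: (4H,).
    Assumed gate layout along the first axis: input, forget, output, candidate.
    Returns the final hidden state of shape (H,).
    """
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in frame_features:
        z = W @ x + U @ h + b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        o = sigmoid(z[2 * H:3 * H])   # output gate
        g = np.tanh(z[3 * H:])        # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(0)
T, D, H = 30, 16, 8                      # frames, feature size, LSTM units (hypothetical)
features = rng.standard_normal((T, D))   # stand-in for per-frame CNN outputs

pooled = temporal_max_pool(features)

W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h_last = lstm_last_hidden(features, W, U, b)

print(pooled.shape, h_last.shape)
```

In this sketch the pooled descriptor is unchanged if the frames are shuffled, whereas the LSTM state depends on frame order, which is the distinction the abstract draws between the two methods. In either case a classifier would then be applied to the resulting vector.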
