Making Convolutional Networks Recurrent for Visual Sequence Learning

Xiaodong Yang; Pavlo Molchanov; Jan Kautz

CVPR 2018

Abstract

Recurrent neural networks (RNNs) have emerged as a powerful model for a broad range of machine learning problems that involve sequential data. While an abundance of work exists to understand and improve RNNs in the context of language and audio signals, such as language modeling and speech recognition, relatively little attention has been paid to analyzing or modifying RNNs for visual sequences, which by nature have distinct properties. In this paper, we aim to bridge this gap and present the first large-scale exploration of RNNs for visual sequence learning. In particular, with the intention of leveraging the strong generalization capacity of pre-trained convolutional neural networks (CNNs), we propose a novel and effective approach, PreRNN, to make pre-trained CNNs recurrent by transforming convolutional layers or fully connected layers into recurrent layers. We conduct extensive evaluations on three representative visual sequence learning tasks: sequential face alignment, dynamic hand gesture recognition, and action recognition. Our experiments reveal that PreRNN consistently outperforms traditional RNNs and achieves state-of-the-art results on the three applications, suggesting that PreRNN is more suitable for visual sequence learning.
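
The abstract does not spell out how a layer is transformed, so the following is only a minimal sketch of one way to read "transforming a fully connected layer into a recurrent layer". It assumes the pre-trained weights are reused as the input-to-hidden term and that a new hidden-to-hidden matrix is added. All names (`w_pre`, `b_pre`, `u_hh`) and the toy values are hypothetical, and the zero initialization of `u_hh` is an illustrative choice, not something stated in the source.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def fc_forward(w_pre, b_pre, x):
    """Frame-wise pre-trained fully connected layer: y = relu(W x + b)."""
    return relu([a + b for a, b in zip(matvec(w_pre, x), b_pre)])


def recurrent_fc_forward(w_pre, b_pre, u_hh, frames):
    """Same layer with an added hidden-to-hidden term (assumed form):
    h_t = relu(W x_t + b + U h_{t-1}), starting from h_0 = 0.
    w_pre and b_pre are reused unchanged; u_hh is the only new parameter."""
    h = [0.0] * len(b_pre)
    outputs = []
    for x in frames:
        pre = matvec(w_pre, x)   # contribution of the pre-trained weights
        rec = matvec(u_hh, h)    # contribution of the previous hidden state
        h = relu([p + b + r for p, b, r in zip(pre, b_pre, rec)])
        outputs.append(h)
    return outputs


# Hypothetical toy values standing in for pre-trained weights and a
# three-frame input sequence of 2-dimensional features.
W_PRE = [[1.0, -1.0], [0.5, 0.5]]
B_PRE = [0.0, 0.1]
FRAMES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

U_ZERO = [[0.0, 0.0], [0.0, 0.0]]    # no temporal coupling
U_IDENT = [[1.0, 0.0], [0.0, 1.0]]   # previous state is added to the input term

framewise = [fc_forward(W_PRE, B_PRE, x) for x in FRAMES]
recurrent_zero = recurrent_fc_forward(W_PRE, B_PRE, U_ZERO, FRAMES)
recurrent_ident = recurrent_fc_forward(W_PRE, B_PRE, U_IDENT, FRAMES)
```

With `U_ZERO`, the recurrent layer reproduces the frame-wise pre-trained outputs, so under this assumed form the network starts from the pre-trained behaviour and only departs from it as the hidden-to-hidden weights are learned. With `U_IDENT`, the outputs from the second frame onward depend on earlier frames.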

Topics

Deep Learning > Architectures > Neural Networks
Deep Learning > Techniques > Pretraining
Computer Vision > Analysis > Action Recognition
Deep Learning > Learning Types > Representation Learning
Deep Learning > Architectures > Convolutional Neural Networks
Deep Learning > Architectures > Recurrent Neural Networks

Keywords

sequence modeling; action recognition; convolutional neural network; recurrent neural network; pre-trained model; visual sequence learning; sequential face alignment
