Unsupervised Learning of Long-Term Motion Dynamics for Videos

Zelun Luo; Boya Peng; De-An Huang; Alexandre Alahi; Li Fei-Fei

CVPR 2017

Abstract

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, depth, and RGB-D videos.
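One possible reading of "atomic 3D flows" is that each continuous 3D flow vector is mapped to the nearest entry of a small, fixed codebook, so that the decoder predicts a sequence of discrete codes rather than raw vectors. The sketch below illustrates that idea only. The codebook `CODEBOOK` and the helpers `nearest_atom` and `quantize_sequence` are hypothetical names introduced for illustration, and the abstract does not specify how the atomic flows are actually built.

```python
import math

# Hypothetical codebook of "atomic" 3D flows: zero motion plus unit steps
# along each axis. This is an assumption for illustration; the abstract does
# not describe the actual codebook.
CODEBOOK = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]


def nearest_atom(flow):
    """Return the index of the codebook entry closest (Euclidean) to a 3D flow."""
    return min(range(len(CODEBOOK)), key=lambda i: math.dist(flow, CODEBOOK[i]))


def quantize_sequence(flows):
    """Map a sequence of 3D flow vectors to a sequence of atomic-flow indices."""
    return [nearest_atom(f) for f in flows]


# Three time steps of flow for a single point: mostly +x, nearly still, mostly -y.
codes = quantize_sequence([(0.9, 0.1, 0.0), (0.0, 0.0, 0.05), (0.1, -0.8, 0.2)])
print(codes)  # [1, 0, 4]
```

Under this assumption, predicting a flow sequence becomes a per-step classification over codebook indices, which is a smaller output space than regressing every flow vector directly.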

Authors

Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

Topics

Machine Learning > Core Methods > Representation Learning

Machine Learning > Learning Types > Unsupervised Learning

Computer Vision > Analysis > Activity Recognition

Deep Learning > Learning Types > Representation Learning

Deep Learning > Architectures > Recurrent Neural Networks

Keywords

unsupervised learning, representation learning, motion estimation, video understanding, optical flow, recurrent neural network, motion dynamics, temporal representation, activity classification
