Deep Learning of Invariant Features via Simulated Fixations in Video

Will Zou; Shenghuo Zhu; Kai Yu; Andrew Y. Ng

NIPS (NeurIPS) 2012

Abstract

We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariances that become increasingly complex higher in the hierarchy. Although learned from videos, our features are spatial rather than spatio-temporal, and are well suited to extracting features from still images. We apply our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig) and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve a state-of-the-art recognition accuracy of 61% on the STL-10 dataset.
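
The temporal slowness constraint mentioned in the abstract can be illustrated with a minimal sketch. It assumes the penalty is the L1 distance between feature vectors of consecutive frames in a tracked sequence, which is one common form of slowness regularizer; the abstract does not state the exact formulation. The function name `slowness_penalty` and the toy feature values are hypothetical.

```python
def slowness_penalty(features):
    """Sum of L1 distances between feature vectors of consecutive frames.

    `features` is a list of equal-length lists, one per frame of a single
    tracked sequence. This is an assumed form of the slowness term, not
    necessarily the one used in the paper.
    """
    total = 0.0
    for prev, curr in zip(features, features[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr))
    return total


# Hypothetical two-dimensional features for three frames.
steady = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]  # features stay constant
jumpy = [[1.0, 0.5], [0.0, 1.5], [1.0, 0.5]]   # features change every frame

print(slowness_penalty(steady))  # 0.0
print(slowness_penalty(jumpy))   # 4.0
```

A feature extractor trained to keep this penalty small is pushed toward representations that change little while the tracker follows the same object, which is the intuition behind invariance learned from tracked sequences.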

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Object Detection
Computer Vision > Analysis > Semantic Segmentation
Deep Learning > Learning Types > Self-Supervised Learning
Computer Vision > Core AI > Computer Vision
Deep Learning > Learning Types > Representation Learning
Computer Vision > Analysis > Image Classification

Keywords

image classification, feature learning, computer vision, video understanding, deep learning, video analysis, hierarchical network, invariant feature learning, hierarchical feature, invariant feature
