NeurIPS (NIPS) 2012
Deep Learning of Invariant Features via Simulated Fixations in Video
Abstract
We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariances that become increasingly complex higher in the hierarchy. Although learned from videos, our features are spatial rather than spatio-temporal, and are well suited for extracting features from still images. We apply our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig) and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve a state-of-the-art recognition accuracy of 61% on the STL-10 dataset.
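The temporal slowness constraint can be illustrated with a small sketch: features computed from consecutive frames of a tracked sequence are penalized for changing quickly. This is a minimal, hypothetical illustration, not the paper's exact objective. The helper names `encode` and `slowness_penalty`, the linear-filter-plus-L2-pooling encoder, the L1 distance between frames, and the pooling group size are all assumptions made here for clarity. Note that slowness alone is minimized by constant features, so a full training objective would need an additional term (for example a reconstruction or decorrelation term, omitted here) to keep the features informative.

```python
import math
from typing import List, Sequence


def encode(W: Sequence[Sequence[float]], x: Sequence[float], group_size: int) -> List[float]:
    """Hypothetical module: linear filters followed by L2 pooling (an assumed design)."""
    # One linear filter response per row of W.
    responses = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    pooled = []
    # L2-pool non-overlapping groups of `group_size` responses; the small
    # epsilon keeps the square root differentiable at zero.
    for start in range(0, len(responses), group_size):
        group = responses[start:start + group_size]
        pooled.append(math.sqrt(sum(r * r for r in group) + 1e-8))
    return pooled


def slowness_penalty(features: Sequence[Sequence[float]]) -> float:
    """Sum over consecutive frames (t, t+1) of the L1 distance between feature vectors."""
    return sum(
        sum(abs(a - b) for a, b in zip(curr, nxt))
        for curr, nxt in zip(features, features[1:])
    )


if __name__ == "__main__":
    # Hypothetical filters and a short hypothetical tracked patch sequence.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
    track = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.5], [0.8, 0.2, 0.5]]
    feats = [encode(W, frame, group_size=2) for frame in track]
    print(slowness_penalty(feats))
```

Because the tracked patch is assumed to show the same object across frames, a low penalty rewards features that stay stable under the small appearance changes tracking produces, which is the intuition behind learning invariance from simulated fixations.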
🌉 Interdisciplinary Bridge: Computer Vision, Deep Learning, and Machine Learning
📈 Trend Setter: Semantic Segmentation
🧭 Keyword Pioneer: video understanding
🐣 Hot Topic Early Bird: deep learning
🐝 Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio
Authors
Topics
Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Object Detection
Computer Vision > Analysis > Semantic Segmentation
Deep Learning > Learning Types > Self-Supervised Learning
Computer Vision > Core AI > Computer Vision
Deep Learning > Learning Types > Representation Learning
Computer Vision > Analysis > Image Classification