Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

S. M. Ali Eslami; Nicolas Heess; Theophane Weber; Yuval Tassa; David Szepesvari; Koray Kavukcuoglu; Geoffrey E. Hinton

2016 NIPS NeurIPS 2016

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

Abstract

We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects - counting, locating and classifying the elements of a scene - without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network at unprecedented speed. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

📈 Trend Setter — Generative Models

🧭 Keyword Pioneer — probabilistic renderer

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

S. M. Ali Eslami , Nicolas Heess , Theophane Weber , Yuval Tassa , David Szepesvari , Koray Kavukcuoglu , Geoffrey E. Hinton

Topics

Artificial Intelligence > Bayesian & Probabilistic > Probabilistic Modeling Machine Learning > Learning Types > Unsupervised Learning Computer Vision > Analysis > Scene Understanding Computer Vision > Core AI > Computer Vision Deep Learning > Learning Types > Generative Models

Keywords

scene understanding probabilistic inference attention mechanism generative model recurrent neural network variational auto-encoder probabilistic renderer

Download PDF

Related papers

Bayesian Intermittent Demand Forecasting for Large Inventories 2016

Dynamic Network Surgery for Efficient DNNs 2016

Beyond Exchangeability: The Chinese Voting Process 2016

Safe and Efficient Off-Policy Reinforcement Learning 2016

Tagger: Deep Unsupervised Perceptual Grouping 2016