DeepMDP: Learning Continuous Latent Space Models for Representation Learning

Carles Gelada; Saurabh Kumar; Jacob Buckman; Ofir Nachum; Marc G. Bellemare

2019 ICML ICML 2019

DeepMDP: Learning Continuous Latent Space Models for Representation Learning

Abstract

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a \texit{DeepMDP}, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — deep rl

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Carles Gelada , Saurabh Kumar , Jacob Buckman , Ofir Nachum , Marc G. Bellemare

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Deep Learning > Learning Types > Representation Learning

Keywords

representation learning state representation model-based reinforcement learning reward prediction latent space deep rl auxiliary task latent space model

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019