Dynamic-Depth Context Tree Weighting

Shimon Whiteson; João V. Messias

NeurIPS (NIPS) 2017

Abstract

Reinforcement learning (RL) in partially observable settings is challenging because the agent’s observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes dynamic-depth context tree weighting (D2-CTW), a model-learning method that addresses these limitations. D2-CTW dynamically expands a suffix tree while ensuring that the size of the model, but not its depth, remains bounded. We show that D2-CTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply D2-CTW to model-based RL, showing that, on tasks that require memory of past observations, D2-CTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
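For orientation, the fixed-depth context tree weighting recursion that the method's name refers to can be sketched as below. This is a minimal sketch of standard binary CTW with Krichevsky-Trofimov (KT) estimators, not the dynamic-depth algorithm proposed in the paper. The names `FixedDepthCTW` and `sequence_prob`, the binary alphabet, and the zero-padded initial context are all assumptions made for illustration.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) computed stably."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class Node:
    def __init__(self):
        self.counts = [0, 0]  # how often 0 and 1 followed this context
        self.log_pe = 0.0     # log of the KT estimate at this node
        self.log_pw = 0.0     # log of the weighted (mixture) probability
        self.children = {}    # next-older context symbol -> Node


class FixedDepthCTW:
    """Hypothetical illustration class: binary CTW with a fixed maximum depth."""

    def __init__(self, depth):
        self.depth = depth
        self.root = Node()
        self.history = [0] * depth  # assumption: initial context padded with zeros

    def update(self, bit):
        # Context is the last `depth` symbols, most recent first.
        start = len(self.history) - self.depth
        context = self.history[start:][::-1]
        path = [self.root]
        for sym in context:
            path.append(path[-1].children.setdefault(sym, Node()))
        # Refresh estimates from the deepest node back up to the root.
        for d in range(self.depth, -1, -1):
            node = path[d]
            total = node.counts[0] + node.counts[1]
            node.log_pe += math.log((node.counts[bit] + 0.5) / (total + 1))
            node.counts[bit] += 1
            if d == self.depth:
                node.log_pw = node.log_pe
            else:
                # Children not created yet contribute probability 1 (log 0.0).
                log_kids = sum(c.log_pw for c in node.children.values())
                node.log_pw = math.log(0.5) + log_add(node.log_pe, log_kids)
        self.history.append(bit)


def sequence_prob(bits, depth):
    """Probability the fixed-depth CTW mixture assigns to a bit sequence."""
    model = FixedDepthCTW(depth)
    for b in bits:
        model.update(b)
    return math.exp(model.root.log_pw)
```

With `depth=0` the mixture reduces to a single KT estimator, so `sequence_prob([0, 0], 0)` is 0.5 × 0.75 = 0.375. In this sketch the tree can hold up to 2^depth leaves once enough distinct contexts have been seen, which illustrates why a fixed-depth model becomes memory-hungry as the depth grows.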

Authors

João V. Messias, Shimon Whiteson

Topics

Machine Learning > Core Methods > Representation Learning

Reinforcement Learning > Methods > Deep RL

Machine Learning > Learning Types > Reinforcement Learning

Machine Learning > Learning Types > Sequence Modeling

Keywords

suffix trees; model-based reinforcement learning; partial observability; state representation learning; partially observable environment; dynamic-depth context tree weighting; variable-order Markov model
