Learning to Explore and Exploit in POMDPs

Chenghui Cai; Xuejun Liao; Lawrence Carin

NeurIPS (NIPS) 2009

Abstract

A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration and exploitation in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balance between exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.
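The active-learning form of exploration mentioned in the abstract (querying an oracle for the optimal action at a cost) can be illustrated with a toy calculation. The sketch below is not the paper's dual-policy algorithm. It is a myopic, one-step comparison under assumed inputs: a belief over hidden states, a hypothetical reward table `rewards[state][action]`, and a fixed `oracle_cost`. All function names are illustrative.

```python
from typing import List


def exploit_value(belief: List[float], rewards: List[List[float]]) -> float:
    """Best expected one-step reward when acting on the belief alone."""
    n_actions = len(rewards[0])
    return max(
        sum(b * rewards[s][a] for s, b in enumerate(belief))
        for a in range(n_actions)
    )


def oracle_value(
    belief: List[float], rewards: List[List[float]], oracle_cost: float
) -> float:
    """Expected one-step reward if an oracle names the optimal action
    for the true state, minus the (assumed fixed) cost of asking."""
    return sum(b * max(rewards[s]) for s, b in enumerate(belief)) - oracle_cost


def choose_mode(
    belief: List[float], rewards: List[List[float]], oracle_cost: float
) -> str:
    """Return 'query_oracle' when asking is worth its cost, else 'exploit'."""
    if oracle_value(belief, rewards, oracle_cost) > exploit_value(belief, rewards):
        return "query_oracle"
    return "exploit"


# Hypothetical two-state, two-action reward table: the matching action
# earns +10, the mismatched action costs -100.
REWARDS = [[10.0, -100.0], [-100.0, 10.0]]

uncertain = choose_mode([0.5, 0.5], REWARDS, oracle_cost=5.0)
confident = choose_mode([0.99, 0.01], REWARDS, oracle_cost=5.0)
```

With the uniform belief the agent prefers to pay for the oracle, because acting blindly risks the large penalty. With the sharply peaked belief it acts on its own, because the oracle's cost exceeds the small remaining risk. The paper learns this balance jointly with the behavior policy over long horizons; the sketch only shows why the balance depends on how uncertain the agent is about the hidden state.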

Topics

Artificial Intelligence > Core AI > Planning
Machine Learning > Learning Types > Active Learning
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Exploration-Exploitation

Keywords

active learning; reinforcement learning; policy learning; exploration-exploitation trade-off; partially observable Markov decision process; exploration; exploitation; dual-policy method; POMDP; dual-policy learning; partially observable environment
