Unobserved Is Not Equal to Non-existent: Using Gaussian Processes to Infer Immediate Rewards Across Contexts

Hamoon Azizsoltani; Yeo Jin Kim; Markel Sanz Ausin; Tiffany Barnes; Min Chi

IJCAI 2019

Abstract

Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed for long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out of 9 games, the proposed GP-inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.
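
The core idea of inferring immediate rewards from delayed ones can be illustrated with GP regression under linear observations. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes each delayed reward equals the undiscounted sum of the unobserved per-step rewards in its trajectory, that every step is described by a known feature vector, and that per-step rewards have a zero-mean GP prior with a squared-exponential (RBF) kernel. The names `rbf_kernel` and `infer_immediate_rewards` are hypothetical. With prior covariance K and a 0/1 matrix A that maps steps to trajectory sums, the posterior mean of the per-step rewards is K Aᵀ (A K Aᵀ + σ²I)⁻¹ y.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of X1 and X2 (assumed kernel choice).
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def infer_immediate_rewards(X, traj_ids, delayed_rewards, noise=1e-2, length_scale=1.0):
    """Hypothetical helper: GP posterior over per-step rewards given trajectory totals.

    X: (n, d) features of every step; traj_ids: (n,) trajectory index in 0..m-1;
    delayed_rewards: (m,) observed total reward of each trajectory.
    """
    n, m = X.shape[0], len(delayed_rewards)
    # A[i, j] = 1 if step j belongs to trajectory i, so A @ r gives per-trajectory sums.
    A = np.zeros((m, n))
    A[traj_ids, np.arange(n)] = 1.0
    K = rbf_kernel(X, X, length_scale)
    # Covariance of the observed sums: A K A^T + noise^2 I, factored by Cholesky.
    S = A @ K @ A.T + noise**2 * np.eye(m)
    L = np.linalg.cholesky(S)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, delayed_rewards))  # S^{-1} y
    KAt = K @ A.T
    mean = KAt @ alpha                      # posterior mean of per-step rewards
    V = np.linalg.solve(L, KAt.T)           # L^{-1} A K, shape (m, n)
    var = np.diag(K) - np.sum(V**2, axis=0) # posterior marginal variances
    return mean, var

# Usage: 3 trajectories of 4 steps each with random 2-D step features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
traj_ids = np.repeat(np.arange(3), 4)
delayed = np.array([1.0, -0.5, 2.0])
mean, var = infer_immediate_rewards(X, traj_ids, delayed, noise=1e-4)
print(np.bincount(traj_ids, weights=mean))  # close to the delayed totals
```

With small observation noise, the inferred per-step rewards in each trajectory approximately add back up to its delayed reward, while the kernel spreads credit across similar steps. In a setup like the one the abstract describes, the posterior means could then stand in for the reward term r_t in a DQN target r_t + γ max_a' Q(s', a'). A discounted return could be modeled instead by filling A with γ^t rather than 1; this is a variant, not something the abstract specifies.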

Keywords

offline reinforcement learning, Gaussian processes, credit assignment, deep Q-network, reward inference
