Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

Guoxi Zhang; Hisashi Kashima

2023 AAAI AAAI 2023

Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

Abstract

Abstract Offline reinforcement learning (RL) have received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that aims at estimating the data-generating policy. In particular, this work considers a scenario where data are collected from multiple sources. Neglecting data heterogeneity, existing approaches cannot provide good estimates and impede policy learning. To overcome this drawback, the present study proposes a latent variable model and a model-learning algorithm to infer a set of policies from data, which allows an agent to use as behavior policy the policy that best describes a particular trajectory. To illustrate the benefit of such a fine-grained characterization for multi-source data, this work showcases how the proposed model can be incorporated into an existing offline RL algorithm. Lastly, with extensive empirical evaluation this work confirms the risks of neglecting data heterogeneity and the efficacy of the proposed model.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — policy inference

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Guoxi Zhang , Hisashi Kashima

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Core Methods > Clustering Reinforcement Learning > Methods > Offline RL Machine Learning > Learning Types > Multi-Agent Systems

Keywords

offline reinforcement learning policy learning latent variable model data heterogeneity policy inference multi-source datum behavior estimation

Download PDF

Related papers

A Model-Agnostic Heuristics for Selective Classification 2023

Tackling Safe and Efficient Multi-Agent Reinforcement Learning via Dynamic Shielding (Student Abstract) 2023

Head-Free Lightweight Semantic Segmentation with Linear Transformer 2023

Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning 2023

Deep Spiking Neural Networks with High Representation Similarity Model Visual Pathways of Macaque and Mouse 2023