X-MEN: guaranteed XOR-maximum entropy constrained inverse reinforcement learning

Fan Ding; Yexiang Xue

UAI 2022

Abstract

Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems in which prior knowledge is available that optimal policies never violate certain constraints. Conventional approaches that ignore these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the globally optimal reward function at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling (a provable sampling approach that transforms the #P-complete sampling problem into queries to NP oracles) into the framework of maximum entropy IRL. X-MEN also guarantees that the learned IRL agent will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal rewards than baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.
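
To make the idea of maximum entropy IRL under hard constraints concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard maximum entropy IRL gradient (empirical feature averages minus the model's expected features) and restricts the trajectory distribution to constraint-satisfying trajectories. The toy trajectory space, the `features` function, the `is_valid` constraint, and the demonstrations are all hypothetical, and exact enumeration stands in for XOR-sampling, which would only be needed when the valid set is too large to enumerate.

```python
import itertools
import math


def features(traj):
    """Hypothetical features: (number of 1-steps, number of switches)."""
    switches = sum(a != b for a, b in zip(traj, traj[1:]))
    return (sum(traj), switches)


def is_valid(traj):
    """Hypothetical multi-state constraint: first and last step cannot both be 1."""
    return not (traj[0] == 1 and traj[-1] == 1)


def maxent_gradient(theta, valid, demos):
    """Log-likelihood gradient: empirical features minus expected features.

    The expectation is taken under p(traj) proportional to
    exp(theta . features(traj)), with support limited to `valid`.
    """
    scores = [sum(w * f for w, f in zip(theta, features(t))) for t in valid]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    dim = len(theta)
    expected = [
        sum(w * features(t)[k] for w, t in zip(weights, valid)) / z
        for k in range(dim)
    ]
    empirical = [sum(features(d)[k] for d in demos) / len(demos) for k in range(dim)]
    return [e - x for e, x in zip(empirical, expected)]


# Toy setup (assumption): trajectories are binary sequences of length 3.
ALL = list(itertools.product([0, 1], repeat=3))
VALID = [t for t in ALL if is_valid(t)]
DEMOS = [(1, 1, 0), (0, 1, 0), (0, 0, 1)]  # made-up demonstrations, all valid

theta = [0.0, 0.0]
grad0 = maxent_gradient(theta, VALID, DEMOS)
for _ in range(500):
    grad = maxent_gradient(theta, VALID, DEMOS)
    theta = [th + 0.5 * g for th, g in zip(theta, grad)]  # gradient ascent step
final_grad = maxent_gradient(theta, VALID, DEMOS)
```

Because the learned distribution has support only on `VALID`, any trajectory drawn from it satisfies the constraint by construction. This is the property the abstract attributes to X-MEN, obtained here by brute force rather than by sampling.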

Topics

Reinforcement Learning > Applications > Robotics
Machine Learning > Learning Types > Inverse Reinforcement Learning

Keywords

maximum entropy, inverse reinforcement learning, constraint satisfaction, reward function, optimal policy
