OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Jinyi Liu; Zhi Wang; YAN ZHENG; Jianye Hao; Chenjia Bai; Junjie Ye; Zhen Wang; haiyin piao; Yang Sun

2024 AAAI AAAI 2024

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Abstract

Abstract In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, consequently impeding exploration efficiency. Hence, in exploring noisy environments, while optimism-driven exploration serves as a foundation, prudent attention to alleviating unnecessary over-exploration in high-noise areas becomes beneficial. In this work, we propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve a noise-aware optimistic exploration for continuous control. OVD-Explorer proposes a new measurement of the policy's exploration ability considering noise in optimistic perspectives, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — noise-aware exploration

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jinyi Liu , Zhi Wang , YAN ZHENG , Jianye Hao , Chenjia Bai , Junjie Ye , Zhen Wang , haiyin piao , Yang Sun

Topics

Reinforcement Learning > Methods > Deep RL Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Optimization & Theory > Stochastic Methods Machine Learning > Learning Types > Multi-Agent Systems

Keywords

reinforcement learning uncertainty quantification continuous control exploration strategy optimism-based exploration optimism in face of uncertainty value distribution noise-aware exploration

Download PDF

Related papers

Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI 2024

Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables 2024

Suppressing Uncertainty in Gaze Estimation 2024

Mask-Homo: Pseudo Plane Mask-Guided Unsupervised Multi-Homography Estimation 2024

Heterogeneous Test-Time Training for Multi-Modal Person Re-identification 2024