Risk-Averse Stochastic Convex Bandit

Adrian Rivera Cardoso; Huan Xu

AISTATS 2019

Abstract

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first is a descent-type algorithm that is easy to implement. The second, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.

Authors

Adrian Rivera Cardoso, Huan Xu

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning

Machine Learning > Application Areas > Risk Management

Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

online convex optimization, bandit feedback, risk aversion, regret bound, ellipsoid method, stochastic convex bandit

Related papers

Inferring Multidimensional Rates of Aging from Cross-Sectional Data (2019)

On the Interaction Effects Between Prediction and Clustering (2019)

Efficient Linear Bandits through Matrix Sketching (2019)

An Optimal Algorithm for Stochastic Three-Composite Optimization (2019)

Efficient Inference in Multi-task Cox Process Models (2019)