Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates

Xue Wang; Mingcheng Wei; Tao Yao

ICML 2018

Abstract

In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with a latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tighter bound in both the covariate dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments that benchmark the proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm compares favorably with the benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
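The abstract names the minimax concave penalty and a 2-step Weighted Lasso procedure without defining either, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the standard MCP form, lam*|t| - t^2/(2a) for |t| <= a*lam and a*lam^2/2 otherwise, and reads the "linear approximation" as: fit a plain Lasso, then refit a weighted Lasso whose per-coefficient weights are the MCP derivative evaluated at the first-step estimate. The function names, the parameters `lam` and `a`, and the coordinate-descent solver are all hypothetical choices made for illustration, and the sketch uses i.i.d. synthetic data rather than the non-i.i.d. bandit samples the paper addresses.

```python
import numpy as np

def mcp_penalty(t, lam, a):
    """Standard MCP value (assumed form): lam*|t| - t^2/(2a) if |t| <= a*lam, else a*lam^2/2."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

def mcp_derivative(t, lam, a):
    """Derivative of the MCP with respect to |t|: max(0, lam - |t|/a)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.maximum(0.0, lam - t / a)

def soft_threshold(z, w):
    """Soft-thresholding operator used by the coordinate update."""
    return np.sign(z) * max(abs(z) - w, 0.0)

def weighted_lasso(X, y, weights, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + sum_j weights[j]*|b_j|.

    Illustrative solver only; any weighted-Lasso solver could be swapped in.
    """
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n      # x_j^T x_j / n for each column
    resid = y.astype(float).copy()         # residual y - X @ beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]     # add back coordinate j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, weights[j]) / col_sq[j]
            resid -= X[:, j] * beta[j]     # subtract the updated contribution
    return beta

def two_step_weighted_lasso(X, y, lam, a=3.0):
    """Step 1: plain Lasso. Step 2: weighted Lasso with MCP-derivative weights."""
    d = X.shape[1]
    beta_step1 = weighted_lasso(X, y, np.full(d, lam))
    weights = mcp_derivative(beta_step1, lam, a)   # large coefficients get weight 0
    return weighted_lasso(X, y, weights)

# Synthetic sparse regression: only the first two of 20 covariates matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
beta_true = np.zeros(20)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(200)
beta_hat = two_step_weighted_lasso(X, y, lam=0.1)
```

Because the MCP derivative vanishes for coefficients larger than a*lam in magnitude, the second step leaves strong coefficients unpenalized while still shrinking weak ones to zero, which is the intuition behind an MCP estimator matching an oracle estimator that knows the true support.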

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Machine Learning > Core Methods > Regression
Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Core Methods > Feature Selection
Machine Learning > Learning Types > Online Learning
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning, multi-armed bandit, sparse structure, sparse feature, high-dimensional covariate, minimax concave penalty
