Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits

Xinming Liu; Joseph Halpern

UAI 2020

Abstract

While traditional economics assumes that humans are fully rational agents who always maximize their expected utility, in practice, we constantly observe apparently irrational behavior. One explanation is that people have limited computational power, so that they are, quite rationally, making the best decisions they can, given their computational limitations. To test this hypothesis, we consider the multi-armed bandit (MAB) problem. We examine a simple strategy for playing an MAB that can be implemented easily by a probabilistic finite automaton (PFA). Roughly speaking, the PFA sets certain expectations, and plays an arm as long as it meets them. If the PFA has sufficiently many states, it performs near-optimally. Its performance degrades gracefully as the number of states decreases. Moreover, the PFA acts in a "human-like" way, exhibiting a number of standard human biases, like an optimism bias and a negativity bias.
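The abstract describes the strategy only roughly, so the following is a minimal illustrative sketch of an aspiration-style finite-state player, not the construction from the paper. It assumes Bernoulli arms and keeps just two pieces of state: the current arm and a bounded "patience" level. The function name `play_bandit` and the parameters `n_levels` and `drop_prob` are hypothetical and chosen only for this example.

```python
import random


def play_bandit(arm_probs, n_levels, horizon, drop_prob=0.5, seed=0):
    """Simulate a small finite-state player on Bernoulli arms.

    Illustrative assumptions (not taken from the paper):
      n_levels  -- number of patience levels per arm (the memory size)
      drop_prob -- probability that a failed pull lowers the patience level
    Returns the average reward per pull over `horizon` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    arm = rng.randrange(n_arms)
    level = n_levels  # start optimistic about the first arm
    total = 0
    for _ in range(horizon):
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total += reward
        if reward:
            # The arm met expectations: restore patience, up to the cap.
            level = min(level + 1, n_levels)
        elif rng.random() < drop_prob:
            # A failure lowers patience only some of the time.
            level -= 1
        if level == 0 and n_arms > 1:
            # Expectations not met: move to a different arm, reset patience.
            arm = rng.choice([a for a in range(n_arms) if a != arm])
            level = n_levels
        elif level == 0:
            level = n_levels  # single arm: nothing to switch to
    return total / horizon


if __name__ == "__main__":
    # More patience levels mean more states in the automaton.
    for n_levels in (1, 3, 10):
        print(n_levels, play_bandit([0.3, 0.7], n_levels, horizon=10_000))
```

In this sketch the number of automaton states grows with `n_levels` times the number of arms, so varying `n_levels` is one simple way to explore how memory size affects average reward under these assumptions.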

Topics

Artificial Intelligence > Core AI > Game AI
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Reinforcement Learning
Mathematics & Optimization > Optimization > Game Theory
Machine Learning > Learning Types > Multi-Armed Bandits
Artificial Intelligence > Core AI > Game Theory

Keywords

online learning, decision making, bounded rationality, multi-armed bandit, optimism bias, probabilistic finite automaton, negativity bias
