Learning with Bandit Feedback in Potential Games

Amélie Heliou; Johanne Cohen; Panayotis Mertikopoulos

NIPS 2017 (NeurIPS 2017)

Abstract

This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players' side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including strategies they did not play), and the bandit case (where players are only able to observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for approximate Nash equilibria if we introduce a constant exploration factor that guarantees that action choice probabilities never become arbitrarily small. In particular, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1.
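The bandit setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' exact algorithm: the 2x2 common-interest game, the step-size schedule 1/sqrt(n), the constant exploration factor eps, and all function names are assumptions made for illustration. Each player observes only the realized payoff, builds an importance-weighted estimate of the payoff vector, and plays exponential weights mixed with uniform exploration.

```python
import math
import random

# Hypothetical 2x2 common-interest game (hence a potential game): both
# players receive PAYOFF[a1][a2]. Its pure Nash equilibria are (0, 0), (1, 1).
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]


def logit_choice(scores):
    """Exponential weights: probabilities proportional to exp(score)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mixed_strategy(scores, eps):
    """Mix logit choice with uniform play so each prob is >= eps / len(scores)."""
    k = len(scores)
    return [(1.0 - eps) * p + eps / k for p in logit_choice(scores)]


def run(num_rounds=5000, eps=0.05, seed=0):
    """Simulate both players and return their final mixed strategies."""
    rng = random.Random(seed)
    scores = [[0.0, 0.0], [0.0, 0.0]]  # one score vector per player
    for n in range(1, num_rounds + 1):
        step = 1.0 / math.sqrt(n)  # assumed step-size schedule
        strategies = [mixed_strategy(s, eps) for s in scores]
        actions = [rng.choices([0, 1], weights=x)[0] for x in strategies]
        # Bandit feedback: only the realized payoff is observed.
        payoff = PAYOFF[actions[0]][actions[1]]
        for i in range(2):
            a = actions[i]
            # Importance-weighted estimate: payoff / P(a) for the played
            # action, zero for every action that was not played.
            scores[i][a] += step * payoff / strategies[i][a]
    return [mixed_strategy(s, eps) for s in scores]


if __name__ == "__main__":
    for player, strategy in enumerate(run()):
        print(player, [round(p, 3) for p in strategy])
```

With a constant eps, no action's probability can fall below eps / 2 in this two-action example, which mirrors the constant exploration factor mentioned in the abstract. Replacing the constant with a decreasing schedule would correspond to the variant that the abstract associates with convergence to an exact Nash equilibrium.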

Authors

Amélie Heliou, Johanne Cohen, Panayotis Mertikopoulos

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems

Mathematics & Optimization > Optimization > Online Algorithms

Keywords

Nash equilibrium, bandit feedback, no-regret learning, potential game, multi-agent system

Related papers

High-Order Attention Models for Visual Question Answering (2017)

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization (2017)

Premise Selection for Theorem Proving by Deep Graph Embedding (2017)

Neural Program Meta-Induction (2017)

Safe and Nested Subgame Solving for Imperfect-Information Games (2017)