Playing in stochastic environment: from multi-armed bandits to two-player games
Given a zero-sum infinite game, we examine whether the players have optimal memoryless deterministic strategies. It turns out that, under some general conditions, the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits.
two-player zero-sum game
one-player zero-sum game
multi-armed bandit
memoryless deterministic strategy
65-72
Regular Paper
Wieslaw
Zielonka
Wieslaw Zielonka
10.4230/LIPIcs.FSTTCS.2010.65
Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license
https://creativecommons.org/licenses/by-nc-nd/3.0/legalcode