Playing in stochastic environment: from multi-armed bandits to two-player games

Given a zero-sum infinite game, we examine the question of whether the players have optimal memoryless deterministic strategies. It turns out that, under some general conditions, the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits.

Keywords: two-player zero-sum game, one-player zero-sum game, multi-armed bandit, memoryless deterministic strategy
Pages: 65-72
Type: Regular Paper
Author: Wieslaw Zielonka
DOI: 10.4230/LIPIcs.FSTTCS.2010.65
License: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license, https://creativecommons.org/licenses/by-nc-nd/3.0/legalcode