Playing in stochastic environment: from multi-armed bandits to two-player games

Author: Zielonka, Wieslaw
Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Series: Leibniz International Proceedings in Informatics
ISSN: 1868-8969
Date: 2010-12-14
Pages: 65-72
DOI: 10.4230/LIPIcs.FSTTCS.2010.65
Type: article
Language: English

Abstract: Given a zero-sum infinite game, we examine the question of whether the players have optimal memoryless deterministic strategies. It turns out that, under some general conditions, the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits.

Full text: https://drops.dagstuhl.de/storage/00lipics/lipics-vol008-fsttcs2010/LIPIcs.FSTTCS.2010.65/LIPIcs.FSTTCS.2010.65.pdf

Keywords: two-player zero-sum game, one-player zero-sum game, multi-armed bandit, memoryless deterministic strategy