LIPIcs.ICALP.2024.147.pdf
- Filesize: 0.74 MB
- 16 pages
We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games, that runs in polynomial time with high probability, on any input where each payoff is chosen independently from a sufficiently random distribution and the underlying graph of the game is ergodic. This includes the case where an arbitrary set of payoffs has been perturbed by a Gaussian, showing for the first time that deterministic two-player games can be solved efficiently, in the sense of smoothed analysis. More generally, we devise a condition number for deterministic discounted and mean-payoff games played on ergodic graphs, and show that our algorithm runs in time polynomial in this condition number. Our result confirms a previous conjecture of Boros et al., which was claimed as a theorem [Boros et al., 2011] and later retracted [Boros et al., 2018]. It stands in contrast with a recent counter-example by Christ and Yannakakis [Christ and Yannakakis, 2023], showing that Howard’s policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin [Frieze and Sorkin, 2007], and the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart [Akian et al., 2018].
Feedback for Dagstuhl Publishing