Smoothed Analysis of Deterministic Discounted and Mean-Payoff Games

Authors: Bruno Loff, Mateusz Skomra



File

LIPIcs.ICALP.2024.147.pdf
  • Filesize: 0.74 MB
  • 16 pages

Author Details

Bruno Loff
  • LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
Mateusz Skomra
  • LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France

Acknowledgements

MS would like to thank Xavier Allamigeon, Stéphane Gaubert, and Ricardo D. Katz for many useful discussions on mean-payoff games, policy iteration, and the operator approach, for exchanging ideas about the problem of smoothed analysis, for their remarks on a preliminary version of this paper, and for being a perpetual source of friendship and inspiration.

Cite As

Bruno Loff and Mateusz Skomra. Smoothed Analysis of Deterministic Discounted and Mean-Payoff Games. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 147:1-147:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICALP.2024.147

Abstract

We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games that runs in polynomial time with high probability on any input where each payoff is chosen independently from a sufficiently random distribution and the underlying graph of the game is ergodic. This includes the case where an arbitrary set of payoffs has been perturbed by a Gaussian, showing for the first time that deterministic two-player games can be solved efficiently in the sense of smoothed analysis. More generally, we define a condition number for deterministic discounted and mean-payoff games played on ergodic graphs and show that our algorithm runs in time polynomial in this condition number. Our result confirms a previous conjecture of Boros et al., which was claimed as a theorem [Boros et al., 2011] and later retracted [Boros et al., 2018]. It stands in contrast with a recent counterexample by Christ and Yannakakis [Christ and Yannakakis, 2023], showing that Howard’s policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin [Frieze and Sorkin, 2007] and the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart [Akian et al., 2018].
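To make the phrase "policy iteration for deterministic two-player discounted games" concrete, here is a minimal sketch of a textbook nested strategy-iteration loop. It is not the authors' algorithm, which targets ergodic graphs and relies on bias-induced policies and a condition-number analysis; it only shows the basic cycle of evaluating a pair of positional strategies exactly and switching to strictly improving edges. Assumptions not taken from the source: vertices are numbered 0..n-1, `owner[i]` is `"max"` or `"min"`, `out[i]` lists the successors of vertex i, `weight[(i, j)]` is the payoff of edge (i, j), and the value satisfies v_i = max (or min) over successors j of weight[(i, j)] + γ·v_j for a discount factor 0 < γ < 1. All function and parameter names are hypothetical, and exact rational arithmetic is used to avoid floating-point ties.

```python
from fractions import Fraction


def evaluate(succ, weight, gamma):
    """Exact value of a positional strategy profile (one chosen successor per vertex).

    Solves v[i] = weight[(i, succ[i])] + gamma * v[succ[i]] by following each
    path until it closes a new cycle or reaches an already evaluated vertex.
    """
    n = len(succ)
    v = [None] * n
    for start in range(n):
        path, pos = [], {}
        i = start
        while v[i] is None and i not in pos:
            pos[i] = len(path)
            path.append(i)
            i = succ[i]
        if v[i] is None:
            # path[pos[i]:] is a new cycle: its value has a closed form.
            cycle = path[pos[i]:]
            total, disc = Fraction(0), Fraction(1)
            for c in cycle:
                total += disc * weight[(c, succ[c])]
                disc *= gamma
            v[cycle[0]] = total / (1 - disc)
            for c in reversed(cycle[1:]):
                v[c] = weight[(c, succ[c])] + gamma * v[succ[c]]
            path = path[:pos[i]]
        # The remaining path vertices lead into vertices whose values are known.
        for c in reversed(path):
            v[c] = weight[(c, succ[c])] + gamma * v[succ[c]]
    return v


def switch(succ, v, verts, out, weight, gamma, sign):
    """Howard-style greedy switch: move each vertex in verts to its best edge
    (w.r.t. the current values v) if that edge is strictly better.

    sign=+1 for the maximizer, sign=-1 for the minimizer. Returns True if any switch happened.
    """
    changed = False
    for i in verts:
        def score(j, i=i):
            return sign * (weight[(i, j)] + gamma * v[j])
        best = max(out[i], key=score)
        if score(best) > score(succ[i]):
            succ[i] = best
            changed = True
    return changed


def solve(owner, out, weight, gamma):
    """Hypothetical nested strategy iteration: Max improves, Min best-responds."""
    n = len(owner)
    max_verts = [i for i in range(n) if owner[i] == "max"]
    min_verts = [i for i in range(n) if owner[i] == "min"]
    succ = [out[i][0] for i in range(n)]  # arbitrary initial positional strategies
    while True:
        # Inner loop: policy iteration for Min against Max's fixed choices.
        while True:
            v = evaluate(succ, weight, gamma)
            if not switch(succ, v, min_verts, out, weight, gamma, -1):
                break
        # Outer step: Max switches to strictly improving edges; stop when none exist.
        if not switch(succ, v, max_verts, out, weight, gamma, +1):
            return v


# Usage on a hypothetical two-vertex game with discount 1/2.
owner = ["max", "min"]
out = {0: [0, 1], 1: [1, 0]}
weight = {(0, 0): 0, (0, 1): 2, (1, 1): 1, (1, 0): 0}
print(solve(owner, out, weight, Fraction(1, 2)))  # [Fraction(8, 3), Fraction(4, 3)]
```

In this example Max at vertex 0 moves to vertex 1 (2 + ½·4/3 = 8/3) and Min at vertex 1 moves back to vertex 0 (½·8/3 = 4/3, below 1 + ½·4/3 = 5/3 for its self-loop), so both optimality equations hold. The sketch makes no claim about the number of iterations; bounding that number on perturbed inputs is exactly what the paper's analysis is about.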

ACM Subject Classification
  • Theory of computation → Algorithmic game theory
Keywords
  • Mean-payoff games
  • discounted games
  • policy iteration
  • smoothed analysis


References

  1. Ilan Adler, Richard M. Karp, and Ron Shamir. A simplex variant solving an m × d linear program in O(min(m², d²)) expected number of pivot steps. Journal of Complexity, 3(4):372-387, 1987.
  2. Miklós Ajtai. Generating hard instances of the short basis problem. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), pages 1-9, 1999.
  3. M. Akian, J. Cochet-Terrasson, S. Detournay, and S. Gaubert. Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, 2012. URL: https://arxiv.org/abs/1208.0446.
  4. M. Akian, S. Gaubert, and A. Guterman. Tropical polyhedra are equivalent to mean payoff games. Int. J. Algebra Comput., 22(1):125001 (43 pages), 2012. URL: https://doi.org/10.1142/S0218196711006674.
  5. M. Akian, S. Gaubert, and A. Hochart. Ergodicity conditions for zero-sum games. Discrete Contin. Dyn. Syst., 35(9):3901-3931, 2015. URL: https://doi.org/10.3934/dcds.2015.35.3901.
  6. M. Akian, S. Gaubert, and A. Hochart. Generic uniqueness of the bias vector of finite zero-sum stochastic games with perfect information. J. Math. Anal. Appl., 457:1038-1064, 2018. URL: https://doi.org/10.1016/j.jmaa.2017.07.017.
  7. M. Akian, S. Gaubert, and A. Hochart. A game theory approach to the existence and uniqueness of nonlinear Perron-Frobenius eigenvectors. Discrete & Continuous Dynamical Systems - A, 40:207-231, 2020. URL: https://doi.org/10.3934/dcds.2020009.
  8. M. Akian, S. Gaubert, U. Naepels, and B. Terver. Solving irreducible stochastic mean-payoff games and entropy games by relative Krasnoselskii-Mann iteration. In 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023), volume 272 of Leibniz International Proceedings in Informatics (LIPIcs), pages 10:1-10:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.MFCS.2023.10.
  9. X. Allamigeon, P. Benchimol, and S. Gaubert. The tropical shadow-vertex algorithm solves mean payoff games in polynomial time on average. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), volume 8572 of Lecture Notes in Comput. Sci., pages 89-100. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_8.
  10. X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig. Combinatorial simplex algorithms can solve mean payoff games. SIAM J. Optim., 24(4):2096-2117, 2014. URL: https://doi.org/10.1137/140953800.
  11. X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig. Tropicalizing the simplex algorithm. SIAM J. Discrete Math., 29(2):751-795, 2015. URL: https://doi.org/10.1137/130936464.
  12. X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig. Log-barrier interior point methods are not strongly polynomial. SIAM J. Appl. Algebra Geom., 2(1):140-178, 2018. URL: https://doi.org/10.1137/17M1142132.
  13. X. Allamigeon, S. Gaubert, R. D. Katz, and M. Skomra. Universal complexity bounds based on value iteration and application to entropy games. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), volume 229 of Leibniz International Proceedings in Informatics (LIPIcs), pages 126:1-126:20, Dagstuhl, Germany, 2022. Schloss Dagstuhl - Leibniz-Zentrum für Informatik.
  14. X. Allamigeon, S. Gaubert, and M. Skomra. Solving generic nonarchimedean semidefinite programs using stochastic game algorithms. J. Symbolic Comput., 85:25-54, 2018. URL: https://doi.org/10.1016/j.jsc.2017.07.002.
  15. R. Beier and B. Vöcking. Typical properties of winners and losers in discrete optimization. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 343-352. ACM, 2004. URL: https://doi.org/10.1145/1007352.1007409.
  16. Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
  17. M. Bezem, R. Nieuwenhuis, and E. Rodríguez-Carbonell. Exponential behaviour of the Butkovič-Zimmermann algorithm for solving two-sided linear systems in max-algebra. Discrete Appl. Math., 156(18):3506-3509, 2008. URL: https://doi.org/10.1016/j.dam.2008.03.016.
  18. E. Boros, K. Elbassioni, M. Fouz, V. Gurvich, K. Makino, and B. Manthey. Stochastic mean payoff games: smoothed analysis and approximation schemes. In Proceedings of the 38th International Colloquium on Automata, Languages, and Programming (ICALP), volume 6755 of Lecture Notes in Comput. Sci., pages 147-158. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22006-7_13.
  19. Endre Boros, Khaled Elbassioni, Mahmoud Fouz, Vladimir Gurvich, Kazuhisa Makino, and Bodo Manthey. Approximation schemes for stochastic mean payoff games with perfect information and few random positions. Algorithmica, 80:3132-3157, 2018.
  20. Peter Bürgisser and Felipe Cucker. Condition: The geometry of numerical algorithms, volume 349. Springer Science & Business Media, 2013.
  21. Jakub Chaloupka. Parallel algorithms for mean-payoff games: An experimental evaluation. In European Symposium on Algorithms, pages 599-610. Springer, 2009.
  22. K. Chatterjee, M. Henzinger, S. Krinninger, and D. Nanongkai. Polynomial-time algorithms for energy games with special weight structures. Algorithmica, 70(3):457-492, 2014. URL: https://doi.org/10.1007/s00453-013-9843-7.
  23. K. Chatterjee and R. Ibsen-Jensen. The complexity of ergodic mean-payoff games. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), volume 8573 of Lecture Notes in Comput. Sci., pages 122-133. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43951-7_11.
  24. Miranda Christ and Mihalis Yannakakis. The smoothed complexity of policy iteration for Markov decision processes. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), pages 1890-1903, 2023.
  25. J. Cochet-Terrasson, S. Gaubert, and J. Gunawardena. A constructive fixed point theorem for min-max functions. Dyn. Stab. Syst., 14(4):407-433, 1999. URL: https://doi.org/10.1080/026811199281967.
  26. Jean Cochet-Terrasson and Stéphane Gaubert. A policy iteration algorithm for zero-sum stochastic games with mean payoff. Comptes Rendus Mathematique, 343(5):377-382, 2006.
  27. Jean Cochet-Terrasson, Guy Cohen, Stephane Gaubert, Michael Mc Gettrick, and Jean-Pierre Quadrat. Numerical computation of spectral elements in max-plus algebra. In Proceedings of the IFAC Conference on System Structure and Control, 1998.
  28. Daniel Dadush and Sophie Huiberts. A friendly smoothed analysis of the simplex method. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 390-403, 2018.
  29. Eric V. Denardo. Contraction mappings in the theory underlying dynamic programming. SIAM Review, 9(2):165-177, 1967.
  30. Eric V. Denardo and Bennett L. Fox. Multichain Markov renewal programs. SIAM Journal on Applied Mathematics, 16(3):468-487, 1968.
  31. M. Develin and J. Yu. Tropical polytopes and cellular resolutions. Exp. Math., 16(3):277-291, 2007. URL: https://doi.org/10.1080/10586458.2007.10129009.
  32. Vishesh Dhingra and Stéphane Gaubert. How to solve large scale deterministic games with mean payoff by policy iteration. In Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools (ValueTools), pages 12-es, 2006. URL: https://doi.org/10.1145/1190095.1190110.
  33. Y. Disser and N. Mosis. A unified worst case for classical simplex and policy iteration pivot rules. In Proceedings of the 34th International Symposium on Algorithms and Computation (ISAAC), pages 27:1-27:17, 2023.
  34. John Fearnley, Paul Goldberg, Alexandros Hollender, and Rahul Savani. The complexity of gradient descent: CLS = PPAD ∩ PLS. Journal of the ACM, 70(1):1-74, 2022.
  35. John Fearnley, Spencer Gordon, Ruta Mehta, and Rahul Savani. Unique end of potential line. Journal of Computer and System Sciences, 114:1-35, 2020.
  36. Uriel Feige. Relations between average case complexity and approximation complexity. In Proceedings of the 34th annual ACM Symposium on Theory of Computing (STOC), pages 534-543, 2002.
  37. J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer, New York, 2007. URL: https://doi.org/10.1007/978-1-4612-4054-9.
  38. Oliver Friedmann. Exponential lower bounds for solving infinitary payoff games and linear programs. PhD thesis, Ludwig Maximilian University of Munich, 2011.
  39. Alan Frieze and Gregory B. Sorkin. The probabilistic relationship between the assignment and asymmetric traveling salesman problems. SIAM Journal on Computing, 36(5):1435-1452, 2007.
  40. S. Gaubert and J. Gunawardena. The duality theorem for min-max functions. C. R. Acad. Sci., 326(1):43-48, 1998. URL: https://doi.org/10.1016/S0764-4442(97)82710-3.
  41. Loukas Georgiadis, Andrew V Goldberg, Robert E Tarjan, and Renato F Werneck. An experimental study of minimum mean cycle algorithms. In 2009 Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX), pages 1-13. SIAM, 2009.
  42. D. Gillette. Stochastic games with zero stop probabilities. In M. Dresher, A. W. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games III, volume 39 of Ann. of Math. Stud., pages 179-188. Princeton University Press, Princeton, NJ, 1957.
  43. Mika Göös, Alexandros Hollender, Siddhartha Jain, Gilbert Maystre, William Pires, Robert Robere, and Ran Tao. Further collapses in TFNP. In Proceedings of the 37th Computational Complexity Conference (CCC), pages 1-15, 2022.
  44. V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and finding minimax mean cycles in digraphs. Zh. Vychisl. Mat. Mat. Fiz., 28(9):1406-1417, 1988. URL: https://doi.org/10.1016/0041-5553(88)90012-2.
  45. V. A. Gurvich and V. N. Lebedev. A criterion and verification of the ergodicity of cyclic game forms. Russian Math. Surveys, 44(1):243-244, 1989. URL: https://doi.org/10.1070/RM1989v044n01ABEH002010.
  46. N. Halman. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica, 49(1):37-50, 2007. URL: https://doi.org/10.1007/s00453-007-0175-3.
  47. T. D. Hansen, P. B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. J. ACM, 60(1):1-16, 2013. URL: https://doi.org/10.1145/2432622.2432623.
  48. T. D. Hansen and U. Zwick. An improved version of the Random-Facet pivoting rule for the simplex algorithm. In Proceedings of the 47th Annual ACM Symposium on the Theory of Computing (STOC), pages 209-218. ACM, 2015. URL: https://doi.org/10.1145/2746539.2746557.
  49. Thomas Dueholm Hansen and Uri Zwick. Lower bounds for Howard’s algorithm for finding minimum mean-cost cycles. In International Symposium on Algorithms and Computation, pages 415-426. Springer, 2010.
  50. A. J. Hoffman and R. M. Karp. On nonterminating stochastic games. Manag. Sci., 12(5):359-370, 1966. URL: https://doi.org/10.1287/mnsc.12.5.359.
  51. A. Hordijk and A. A. Yushkevich. Blackwell optimality. In E. A. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, volume 40 of Internat. Ser. Oper. Res. Management Sci., pages 231-267. Springer, Boston, MA, 2002. URL: https://doi.org/10.1007/978-1-4615-0805-2_8.
  52. Ronald A. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
  53. Pavel Hubáček and Eylon Yogev. Hardness of continuous local search: Query complexity and cryptographic lower bounds. SIAM Journal on Computing, 49(6):1128-1172, 2020.
  54. R. Ibsen-Jensen and P. B. Miltersen. Solving simple stochastic games with few coin toss positions. In Proceedings of the 20th Annual European Symposium on Algorithms (ESA), volume 7501 of Lecture Notes in Comput. Sci., pages 636-647. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-33090-2_55.
  55. David S Johnson, Christos H Papadimitriou, and Mihalis Yannakakis. How easy is local search? Journal of Computer and System Sciences, 37(1):79-100, 1988.
  56. M. Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Inform. Process. Lett., 68(3):119-124, 1998. URL: https://doi.org/10.1016/S0020-0190(98)00150-1.
  57. L. Kallenberg. Finite state and action MDPs. In E. A. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, volume 40 of Internat. Ser. Oper. Res. Management Sci., pages 21-87. Springer, Boston, MA, 2002. URL: https://doi.org/10.1007/978-1-4615-0805-2_2.
  58. Ricardo David Katz. Max-plus (A, B)-invariant spaces and control of timed discrete-event systems. IEEE Transactions on Automatic Control, 52(2):229-241, 2007.
  59. Jan Křetínský and Tobias Meggendorfer. Efficient strategy iteration for mean payoff in Markov decision processes. In International Symposium on Automated Technology for Verification and Analysis, pages 380-399. Springer, 2017.
  60. T. M. Liggett and S. A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Rev., 11(4):604-607, 1969. URL: https://doi.org/10.1137/1011093.
  61. C. Mathieu and D. B. Wilson. The min mean-weight cycle in a random network. Combin. Probab. Comput., 22(5):763-782, 2013. URL: https://doi.org/10.1017/S0963548313000229.
  62. Nimrod Megiddo and Christos H Papadimitriou. On total functions, existence theorems and computational complexity. Theoretical Computer Science, 81(2):317-324, 1991.
  63. J.-F. Mertens and A. Neyman. Stochastic games. Internat. J. Game Theory, 10(2):53-66, 1981. URL: https://doi.org/10.1007/BF01769259.
  64. Rolf H. Möhring, Martin Skutella, and Frederik Stork. Scheduling with AND/OR precedence constraints. SIAM Journal on Computing, 33(2):393-415, 2004.
  65. Ketan Mulmuley, Umesh V Vazirani, and Vijay V Vazirani. Matching is as easy as matrix inversion. In Proceedings of the 19th annual ACM Symposium on Theory of Computing (STOC), pages 345-354, 1987.
  66. Christos H Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3):498-532, 1994.
  67. Anuj Puri. Theory of hybrid systems and discrete event systems. University of California at Berkeley, 1995.
  68. M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Ser. Probab. Stat. Wiley, Hoboken, NJ, 2005.
  69. T.E.S. Raghavan and Zamir Syed. A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information. Mathematical Programming, 95(3):513-532, 2003.
  70. S. S. Rao, R. Chandrasekaran, and K.P.K. Nair. Algorithms for discounted stochastic games. Journal of Optimization Theory and Applications, 11(6):627-637, 1973.
  71. H. Röglin and B. Vöcking. Smoothed analysis of integer programming. Math. Program., 110(1):21-56, 2007. URL: https://doi.org/10.1007/s10107-006-0055-7.
  72. Steven Rudich. Super-bits, demi-bits, and NP/qpoly-natural proofs. In Proceedings of the International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM/APPROX), pages 85-93, 1997.
  73. Sven Schewe. From parity and payoff games to linear programming. In Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS), pages 675-686, 2009.
  74. Bart Selman, David G Mitchell, and Hector J Levesque. Generating hard satisfiability problems. Artificial Intelligence, 81(1-2):17-29, 1996.
  75. L. S. Shapley. Stochastic games. Proc. Natl. Acad. Sci. USA, 39(10):1095-1100, 1953. URL: https://doi.org/10.1073/pnas.39.10.1095.
  76. Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 770-787, 2018.
  77. Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3):385-463, 2004.
  78. Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res., 36(4):593-603, 2011.
  79. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoret. Comput. Sci., 158(1-2):343-359, 1996. URL: https://doi.org/10.1016/0304-3975(95)00188-3.