Smoothed Analysis of Deterministic Discounted and Mean-Payoff Games

Loff, Bruno; Skomra, Mateusz

doi:10.4230/LIPIcs.ICALP.2024.147

File

LIPIcs.ICALP.2024.147.pdf

Filesize: 0.74 MB
16 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ICALP.2024.147
URN: urn:nbn:de:0030-drops-202908

Author Details

Bruno Loff

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal

Mateusz Skomra

LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France

Acknowledgements

MS would like to thank Xavier Allamigeon, Stéphane Gaubert, and Ricardo D. Katz for many useful discussions on mean-payoff games, policy iteration, and the operator approach, for exchanging ideas about the problem of smoothed analysis, for their remarks on a preliminary version of this paper, and for being a perpetual source of friendship and inspiration.

Cite As

Bruno Loff and Mateusz Skomra. Smoothed Analysis of Deterministic Discounted and Mean-Payoff Games. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 147:1-147:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICALP.2024.147

Abstract

We devise a policy-iteration algorithm for deterministic two-player discounted and mean-payoff games that runs in polynomial time with high probability on any input where each payoff is chosen independently from a sufficiently random distribution and the underlying graph of the game is ergodic. This includes the case where an arbitrary set of payoffs has been perturbed by a Gaussian, showing for the first time that deterministic two-player games can be solved efficiently in the sense of smoothed analysis. More generally, we devise a condition number for deterministic discounted and mean-payoff games played on ergodic graphs, and show that our algorithm runs in time polynomial in this condition number. Our result confirms a previous conjecture of Boros et al., which was claimed as a theorem [Boros et al., 2011] and later retracted [Boros et al., 2018]. It stands in contrast with a recent counter-example by Christ and Yannakakis [Christ and Yannakakis, 2023], showing that Howard’s policy-iteration algorithm does not run in smoothed polynomial time on stochastic single-player mean-payoff games. Our approach is inspired by the analysis of random optimal assignment instances by Frieze and Sorkin [Frieze and Sorkin, 2007], and the analysis of bias-induced policies for mean-payoff games by Akian, Gaubert and Hochart [Akian et al., 2018].
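For readers unfamiliar with the setting, the following is a minimal sketch of generic textbook strategy (policy) iteration on a deterministic discounted game whose payoffs receive Gaussian noise. It is not the algorithm of the paper, which the abstract does not specify, and it makes no use of ergodicity or of the condition number. The function names (evaluate, solve_discounted_game, perturb), the edge-list representation, and the tolerance eps are all assumptions made for illustration.

```python
import random


def evaluate(succ, reward, gamma):
    """Exact discounted value of each vertex when every vertex has one chosen edge."""
    n = len(succ)
    value = [None] * n
    for start in range(n):
        path, pos = [], {}
        u = start
        # Walk forward until we reach an already-valued vertex or close a cycle.
        while value[u] is None and u not in pos:
            pos[u] = len(path)
            path.append(u)
            u = succ[u]
        if value[u] is None:
            # New cycle starting at u: sum the geometric series around it.
            cycle = path[pos[u]:]
            total = sum(gamma ** i * reward[c] for i, c in enumerate(cycle))
            value[u] = total / (1.0 - gamma ** len(cycle))
        # Back-propagate v(w) = r(w) + gamma * v(succ(w)) along the walk.
        for w in reversed(path):
            if value[w] is None:
                value[w] = reward[w] + gamma * value[succ[w]]
    return value


def solve_discounted_game(edges, is_max, gamma, eps=1e-12):
    """edges[u] is a list of (target, payoff); is_max[u] says who owns vertex u.

    Returns (values, choice), where choice[u] indexes the selected edge at u.
    """
    n = len(edges)
    choice = [0] * n  # arbitrary initial strategies for both players

    def current_values():
        succ = [edges[u][choice[u]][0] for u in range(n)]
        reward = [edges[u][choice[u]][1] for u in range(n)]
        return evaluate(succ, reward, gamma)

    def improve(player_is_max, val):
        # Switch every vertex of the given player that has a strictly better edge.
        changed = False
        for u in range(n):
            if is_max[u] != player_is_max:
                continue
            scores = [r + gamma * val[v] for v, r in edges[u]]
            best = max(scores) if player_is_max else min(scores)
            gain = best - val[u] if player_is_max else val[u] - best
            if gain > eps:
                choice[u] = scores.index(best)
                changed = True
        return changed

    while True:
        val = current_values()
        # Inner loop: Min computes a best response to Max's current strategy.
        while improve(False, val):
            val = current_values()
        # Outer step: Max improves against that best response, or we are done.
        if not improve(True, val):
            return val, choice


def perturb(edges, sigma, rng):
    """Add independent Gaussian noise of standard deviation sigma to every payoff."""
    return [[(v, r + rng.gauss(0.0, sigma)) for v, r in out] for out in edges]
```

As a usage example, solve_discounted_game(perturb(edges, 0.1, random.Random(0)), is_max, 0.9) solves a perturbed copy of a game given as edge lists. The sketch terminates on any finite game, but it carries no polynomial running-time guarantee of the kind established in the paper.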

Subject Classification

ACM Subject Classification

Theory of computation → Algorithmic game theory

Keywords

Mean-payoff games
discounted games
policy iteration
smoothed analysis

References

Ilan Adler, Richard M. Karp, and Ron Shamir. A simplex variant solving an m × d linear program in O(min(m², d²)) expected number of pivot steps. Journal of Complexity, 3(4):372-387, 1987.
Miklós Ajtai. Generating hard instances of the short basis problem. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), pages 1-9, 1999.
M. Akian, J. Cochet-Terrasson, S. Detournay, and S. Gaubert. Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information, 2012. URL: https://arxiv.org/abs/1208.0446.
M. Akian, S. Gaubert, and A. Guterman. Tropical polyhedra are equivalent to mean payoff games. Int. J. Algebra Comput., 22(1):125001 (43 pages), 2012. URL: https://doi.org/10.1142/S0218196711006674.
M. Akian, S. Gaubert, and A. Hochart. Ergodicity conditions for zero-sum games. Discrete Contin. Dyn. Syst., 35(9):3901-3931, 2015. URL: https://doi.org/10.3934/dcds.2015.35.3901.
M. Akian, S. Gaubert, and A. Hochart. Generic uniqueness of the bias vector of finite zero-sum stochastic games with perfect information. J. Math. Anal. Appl., 457:1038-1064, 2018. URL: https://doi.org/10.1016/j.jmaa.2017.07.017.
M. Akian, S. Gaubert, and A. Hochart. A game theory approach to the existence and uniqueness of nonlinear Perron-Frobenius eigenvectors. Discrete & Continuous Dynamical Systems - A, 40:207-231, 2020. URL: https://doi.org/10.3934/dcds.2020009.
M. Akian, S. Gaubert, U. Naepels, and B. Terver. Solving irreducible stochastic mean-payoff games and entropy games by relative Krasnoselskii-Mann iteration. In 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023), volume 272 of Leibniz International Proceedings in Informatics (LIPIcs), pages 10:1-10:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.MFCS.2023.10.
X. Allamigeon, P. Benchimol, and S. Gaubert. The tropical shadow-vertex algorithm solves mean payoff games in polynomial time on average. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), volume 8572 of Lecture Notes in Comput. Sci., pages 89-100. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_8.
X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig. Combinatorial simplex algorithms can solve mean payoff games. SIAM J. Optim., 24(4):2096-2117, 2014. URL: https://doi.org/10.1137/140953800.
X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig. Tropicalizing the simplex algorithm. SIAM J. Discrete Math., 29(2):751-795, 2015. URL: https://doi.org/10.1137/130936464.
X. Allamigeon, P. Benchimol, S. Gaubert, and M. Joswig. Log-barrier interior point methods are not strongly polynomial. SIAM J. Appl. Algebra Geom., 2(1):140-178, 2018. URL: https://doi.org/10.1137/17M1142132.
X. Allamigeon, S. Gaubert, R. D. Katz, and M. Skomra. Universal complexity bounds based on value iteration and application to entropy games. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), volume 229 of Leibniz International Proceedings in Informatics (LIPIcs), pages 126:1-126:20, Dagstuhl, Germany, 2022. Schloss Dagstuhl - Leibniz-Zentrum für Informatik.
X. Allamigeon, S. Gaubert, and M. Skomra. Solving generic nonarchimedean semidefinite programs using stochastic game algorithms. J. Symbolic Comput., 85:25-54, 2018. URL: https://doi.org/10.1016/j.jsc.2017.07.002.
R. Beier and B. Vöcking. Typical properties of winners and losers in discrete optimization. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 343-352. ACM, 2004. URL: https://doi.org/10.1145/1007352.1007409.
Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
M. Bezem, R. Nieuwenhuis, and E. Rodríguez-Carbonell. Exponential behaviour of the Butkovič-Zimmermann algorithm for solving two-sided linear systems in max-algebra. Discrete Appl. Math., 156(18):3506-3509, 2008. URL: https://doi.org/10.1016/j.dam.2008.03.016.
E. Boros, K. Elbassioni, M. Fouz, V. Gurvich, K. Makino, and B. Manthey. Stochastic mean payoff games: smoothed analysis and approximation schemes. In Proceedings of the 38th International Colloquium on Automata, Languages, and Programming (ICALP), volume 6755 of Lecture Notes in Comput. Sci., pages 147-158. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22006-7_13.
Endre Boros, Khaled Elbassioni, Mahmoud Fouz, Vladimir Gurvich, Kazuhisa Makino, and Bodo Manthey. Approximation schemes for stochastic mean payoff games with perfect information and few random positions. Algorithmica, 80:3132-3157, 2018.
Peter Bürgisser and Felipe Cucker. Condition: The geometry of numerical algorithms, volume 349. Springer Science & Business Media, 2013.
Jakub Chaloupka. Parallel algorithms for mean-payoff games: An experimental evaluation. In European Symposium on Algorithms, pages 599-610. Springer, 2009.
K. Chatterjee, M. Henzinger, S. Krinninger, and D. Nanongkai. Polynomial-time algorithms for energy games with special weight structures. Algorithmica, 70(3):457-492, 2014. URL: https://doi.org/10.1007/s00453-013-9843-7.
K. Chatterjee and R. Ibsen-Jensen. The complexity of ergodic mean-payoff games. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), volume 8573 of Lecture Notes in Comput. Sci., pages 122-133. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43951-7_11.
Miranda Christ and Mihalis Yannakakis. The smoothed complexity of policy iteration for Markov decision processes. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), pages 1890-1903, 2023.
J. Cochet-Terrasson, S. Gaubert, and J. Gunawardena. A constructive fixed point theorem for min-max functions. Dyn. Stab. Syst., 14(4):407-433, 1999. URL: https://doi.org/10.1080/026811199281967.
Jean Cochet-Terrasson and Stéphane Gaubert. A policy iteration algorithm for zero-sum stochastic games with mean payoff. Comptes Rendus Mathematique, 343(5):377-382, 2006.
Jean Cochet-Terrasson, Guy Cohen, Stéphane Gaubert, Michael Mc Gettrick, and Jean-Pierre Quadrat. Numerical computation of spectral elements in max-plus algebra. In Proceedings of the IFAC Conference on System Structure and Control, 1998.
Daniel Dadush and Sophie Huiberts. A friendly smoothed analysis of the simplex method. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 390-403, 2018.
Eric V. Denardo. Contraction mappings in the theory underlying dynamic programming. Siam Review, 9(2):165-177, 1967.
Eric V. Denardo and Bennett L. Fox. Multichain Markov renewal programs. SIAM Journal on Applied Mathematics, 16(3):468-487, 1968.
M. Develin and J. Yu. Tropical polytopes and cellular resolutions. Exp. Math., 16(3):277-291, 2007. URL: https://doi.org/10.1080/10586458.2007.10129009.
Vishesh Dhingra and Stéphane Gaubert. How to solve large scale deterministic games with mean payoff by policy iteration. In Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools (ValueTools), pages 12-es, 2006. URL: https://doi.org/10.1145/1190095.1190110.
Y. Disser and N. Mosis. A unified worst case for classical simplex and policy iteration pivot rules. In Proceedings of the 34th International Symposium on Algorithms and Computation (ISAAC), pages 27:1-27:17, 2023.
John Fearnley, Paul Goldberg, Alexandros Hollender, and Rahul Savani. The complexity of gradient descent: CLS = PPAD ∩ PLS. Journal of the ACM, 70(1):1-74, 2022.
John Fearnley, Spencer Gordon, Ruta Mehta, and Rahul Savani. Unique end of potential line. Journal of Computer and System Sciences, 114:1-35, 2020.
Uriel Feige. Relations between average case complexity and approximation complexity. In Proceedings of the 34th annual ACM Symposium on Theory of Computing (STOC), pages 534-543, 2002.
J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer, New York, 2007. URL: https://doi.org/10.1007/978-1-4612-4054-9.
Oliver Friedmann. Exponential lower bounds for solving infinitary payoff games and linear programs. PhD thesis, Ludwig Maximilian University of Munich, 2011.
Alan Frieze and Gregory B. Sorkin. The probabilistic relationship between the assignment and asymmetric traveling salesman problems. SIAM Journal on Computing, 36(5):1435-1452, 2007.
S. Gaubert and J. Gunawardena. The duality theorem for min-max functions. C. R. Acad. Sci., 326(1):43-48, 1998. URL: https://doi.org/10.1016/S0764-4442(97)82710-3.
Loukas Georgiadis, Andrew V Goldberg, Robert E Tarjan, and Renato F Werneck. An experimental study of minimum mean cycle algorithms. In 2009 Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX), pages 1-13. SIAM, 2009.
D. Gillette. Stochastic games with zero stop probabilities. In M. Dresher, A. W. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games III, volume 39 of Ann. of Math. Stud., pages 179-188. Princeton University Press, Princeton, NJ, 1957.
Mika Göös, Alexandros Hollender, Siddhartha Jain, Gilbert Maystre, William Pires, Robert Robere, and Ran Tao. Further collapses in TFNP. In Proceedings of the 37th Computational Complexity Conference (CCC), pages 1-15, 2022.
V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and finding minimax mean cycles in digraphs. Zh. Vychisl. Mat. Mat. Fiz., 28(9):1406-1417, 1988. URL: https://doi.org/10.1016/0041-5553(88)90012-2.
V. A. Gurvich and V. N. Lebedev. A criterion and verification of the ergodicity of cyclic game forms. Russian Math. Surveys, 44(1):243-244, 1989. URL: https://doi.org/10.1070/RM1989v044n01ABEH002010.
N. Halman. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica, 49(1):37-50, 2007. URL: https://doi.org/10.1007/s00453-007-0175-3.
T. D. Hansen, P. B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. J. ACM, 60(1):1-16, 2013. URL: https://doi.org/10.1145/2432622.2432623.
T. D. Hansen and U. Zwick. An improved version of the Random-Facet pivoting rule for the simplex algorithm. In Proceedings of the 47th Annual ACM Symposium on the Theory of Computing (STOC), pages 209-218. ACM, 2015. URL: https://doi.org/10.1145/2746539.2746557.
Thomas Dueholm Hansen and Uri Zwick. Lower bounds for Howard’s algorithm for finding minimum mean-cost cycles. In International Symposium on Algorithms and Computation, pages 415-426. Springer, 2010.
A. J. Hoffman and R. M. Karp. On nonterminating stochastic games. Manag. Sci., 12(5):359-370, 1966. URL: https://doi.org/10.1287/mnsc.12.5.359.
A. Hordijk and A. A. Yushkevich. Blackwell optimality. In E. A. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, volume 40 of Internat. Ser. Oper. Res. Management Sci., pages 231-267. Springer, Boston, MA, 2002. URL: https://doi.org/10.1007/978-1-4615-0805-2_8.
Ronald A. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
Pavel Hubáček and Eylon Yogev. Hardness of continuous local search: Query complexity and cryptographic lower bounds. SIAM Journal on Computing, 49(6):1128-1172, 2020.
R. Ibsen-Jensen and P. B. Miltersen. Solving simple stochastic games with few coin toss positions. In Proceedings of the 20th Annual European Symposium on Algorithms (ESA), volume 7501 of Lecture Notes in Comput. Sci., pages 636-647. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-33090-2_55.
David S Johnson, Christos H Papadimitriou, and Mihalis Yannakakis. How easy is local search? Journal of computer and system sciences, 37(1):79-100, 1988.
M. Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Inform. Process. Lett., 68(3):119-124, 1998. URL: https://doi.org/10.1016/S0020-0190(98)00150-1.
L. Kallenberg. Finite state and action MDPs. In E. A. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, volume 40 of Internat. Ser. Oper. Res. Management Sci., pages 21-87. Springer, Boston, MA, 2002. URL: https://doi.org/10.1007/978-1-4615-0805-2_2.
Ricardo David Katz. Max-plus (A, B)-invariant spaces and control of timed discrete-event systems. IEEE Transactions on Automatic Control, 52(2):229-241, 2007.
Jan Křetínský and Tobias Meggendorfer. Efficient strategy iteration for mean payoff in Markov decision processes. In International Symposium on Automated Technology for Verification and Analysis, pages 380-399. Springer, 2017.
T. M. Liggett and S. A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Rev., 11(4):604-607, 1969. URL: https://doi.org/10.1137/1011093.
C. Mathieu and D. B. Wilson. The min mean-weight cycle in a random network. Combin. Probab. Comput., 22(5):763-782, 2013. URL: https://doi.org/10.1017/S0963548313000229.
Nimrod Megiddo and Christos H Papadimitriou. On total functions, existence theorems and computational complexity. Theoretical Computer Science, 81(2):317-324, 1991.
J.-F. Mertens and A. Neyman. Stochastic games. Internat. J. Game Theory, 10(2):53-66, 1981. URL: https://doi.org/10.1007/BF01769259.
Rolf H. Möhring, Martin Skutella, and Frederik Stork. Scheduling with AND/OR precedence constraints. SIAM Journal on Computing, 33(2):393-415, 2004.
Ketan Mulmuley, Umesh V Vazirani, and Vijay V Vazirani. Matching is as easy as matrix inversion. In Proceedings of the 19th annual ACM Symposium on Theory of Computing (STOC), pages 345-354, 1987.
Christos H Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and system Sciences, 48(3):498-532, 1994.
Anuj Puri. Theory of hybrid systems and discrete event systems. University of California at Berkeley, 1995.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Ser. Probab. Stat. Wiley, Hoboken, NJ, 2005.
T.E.S. Raghavan and Zamir Syed. A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information. Mathematical Programming, 95(3):513-532, 2003.
S. S. Rao, R. Chandrasekaran, and K.P.K. Nair. Algorithms for discounted stochastic games. Journal of Optimization Theory and Applications, 11(6):627-637, 1973.
H. Röglin and B. Vöcking. Smoothed analysis of integer programming. Math. Program., 110(1):21-56, 2007. URL: https://doi.org/10.1007/s10107-006-0055-7.
Steven Rudich. Super-bits, demi-bits, and NP/qpoly-natural proofs. In Proceedings of the International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM/APPROX), pages 85-93, 1997.
Sven Schewe. From parity and payoff games to linear programming. In Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS), pages 675-686, 2009.
Bart Selman, David G Mitchell, and Hector J Levesque. Generating hard satisfiability problems. Artificial intelligence, 81(1-2):17-29, 1996.
L. S. Shapley. Stochastic games. Proc. Natl. Acad. Sci. USA, 39(10):1095-1100, 1953. URL: https://doi.org/10.1073/pnas.39.10.1095.
Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 770-787, 2018.
Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3):385-463, 2004.
Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res., 36(4):593-603, 2011.
U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoret. Comput. Sci., 158(1-2):343-359, 1996. URL: https://doi.org/10.1016/0304-3975(95)00188-3.
