Universal Complexity Bounds Based on Value Iteration and Application to Entropy Games

Allamigeon, Xavier; Gaubert, Stéphane; Katz, Ricardo D.; Skomra, Mateusz

doi:10.4230/LIPIcs.ICALP.2022.110

Abstract

We develop value-iteration-based algorithms that solve, in a unified manner, different classes of combinatorial zero-sum games with mean-payoff-type rewards. These algorithms rely on an oracle that evaluates the dynamic programming operator up to a given precision. We show that the number of calls to the oracle needed to determine exact optimal (positional) strategies is, up to a factor polynomial in the dimension, of order R/sep, where the "separation" sep is defined as the minimal difference between distinct values arising from strategies, and R is a metric estimate involving the norm of approximate sub- and super-eigenvectors of the dynamic programming operator. We illustrate this method with two applications. The first is a new proof, leading to improved complexity estimates, of a theorem of Boros, Elbassioni, Gurvich and Makino, showing that turn-based mean-payoff games with a fixed number of random positions can be solved in pseudo-polynomial time. The second concerns entropy games, a model introduced by Asarin, Cervelle, Degorre, Dima, Horn and Kozyakin. The rank of an entropy game is defined as the maximal rank among all the ambiguity matrices determined by strategies of the two players. We show that entropy games of fixed rank, in their original formulation, can be solved in polynomial time, and that an extension of entropy games incorporating weights can be solved in pseudo-polynomial time under the same fixed-rank condition.
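To make the idea of iterating a dynamic programming operator concrete, here is a minimal Python sketch of classical value iteration on a deterministic turn-based mean-payoff game. It is not the algorithm of the paper: it evaluates the operator exactly rather than through an approximate oracle, and it does not compute sub- or super-eigenvectors. The game encoding, the function names `shapley_operator` and `approximate_values`, and the two-node `toy_game` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding: node -> (owner, [(successor, reward), ...]).
# The owner is "max" or "min"; every node has at least one outgoing edge.
Game = Dict[int, Tuple[str, List[Tuple[int, float]]]]


def shapley_operator(game: Game, x: Dict[int, float]) -> Dict[int, float]:
    """Apply the dynamic programming operator T once, exactly."""
    out = {}
    for node, (owner, edges) in game.items():
        candidates = [reward + x[succ] for succ, reward in edges]
        out[node] = max(candidates) if owner == "max" else min(candidates)
    return out


def approximate_values(game: Game, iterations: int) -> Dict[int, float]:
    """Return T^k(0)/k, an approximation of the mean-payoff value vector."""
    x = {node: 0.0 for node in game}
    for _ in range(iterations):
        x = shapley_operator(game, x)
    return {node: x[node] / iterations for node in game}


# Two-node example: Max owns node 0, Min owns node 1.
toy_game: Game = {
    0: ("max", [(0, 1.0), (1, 3.0)]),
    1: ("min", [(0, 0.0), (1, 5.0)]),
}
values = approximate_values(toy_game, 1000)
```

In this toy game the optimal play cycles between the two nodes with rewards 3 and 0, so both values equal 1.5, and the iterates approach that number with an error that shrinks roughly like 1/k. This is the intuition behind a bound of the form R/sep: once the approximation error falls below half the separation between distinct candidate values, the exact value can be identified.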

M. Akian, S. Gaubert, J. Grand-Clément, and J. Guillaud. The operator approach to entropy games. Theor. Comp. Sys., 63(5):1089-1130, July 2019. URL: https://doi.org/10.1007/s00224-019-09925-z.
M. Akian, S. Gaubert, and A. Guterman. Tropical polyhedra are equivalent to mean payoff games. Int. J. Algebra Comput., 22(1):125001 (43 pages), 2012. URL: https://doi.org/10.1142/S0218196711006674.
M. Akian, S. Gaubert, and A. Hochart. A game theory approach to the existence and uniqueness of nonlinear Perron-Frobenius eigenvectors. Discrete & Continuous Dynamical Systems - A, 40:207-231, 2020. URL: https://doi.org/10.3934/dcds.2020009.
M. Akian, S. Gaubert, and R. Nussbaum. A Collatz-Wielandt characterization of the spectral radius of order-preserving homogeneous maps on cones. arXiv:1112.5968, 2011.
M. Akian, A. Sulem, and M. I. Taksar. Dynamic optimization of long-term growth rate for a portfolio with transaction costs and logarithmic utility. Mathematical Finance, 11(2):153-188, April 2001. URL: https://doi.org/10.1111/1467-9965.00111.
X. Allamigeon, S. Gaubert, R. D. Katz, and M. Skomra. Condition numbers of stochastic mean payoff games and what they say about nonarchimedean semidefinite programming. In Proceedings of the 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS), pages 160-167, 2018. URL: http://mtns2018.ust.hk/media/files/0213.pdf.
X. Allamigeon, S. Gaubert, and M. Skomra. Solving generic nonarchimedean semidefinite programs using stochastic game algorithms. J. Symbolic Comput., 85:25-54, 2018. URL: https://doi.org/10.1016/j.jsc.2017.07.002.
V. Anantharam and V. S. Borkar. A variational formula for risk-sensitive reward. SIAM J. Control Optim., 55(2):961-988, 2017. arXiv:1501.00676.
D. Andersson and P. B. Miltersen. The complexity of solving stochastic games on graphs. In Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC), volume 5878 of Lecture Notes in Comput. Sci., pages 112-121. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-10631-6_13.
E. Asarin, J. Cervelle, A. Degorre, C. Dima, F. Horn, and V. Kozyakin. Entropy games and matrix multiplication games. In Proceedings of the 33rd International Symposium on Theoretical Aspects of Computer Science (STACS), volume 47 of LIPIcs. Leibniz Int. Proc. Inform., pages 11:1-11:14, Wadern, 2016. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.STACS.2016.11.
D. Auger, X. Badin de Montjoye, and Y. Strozecki. A generic strategy improvement method for simple stochastic games. In Filippo Bonchi and Simon J. Puglisi, editors, 46th International Symposium on Mathematical Foundations of Computer Science, MFCS 2021, August 23-27, 2021, Tallinn, Estonia, volume 202 of LIPIcs, pages 12:1-12:22. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.MFCS.2021.12.
A. Berman and R. J. Plemmons. Nonnegative matrices in the mathematical sciences. SIAM, 1994.
T. Bewley and E. Kohlberg. The asymptotic theory of stochastic games. Math. Oper. Res., 1(3):197-208, 1976. URL: https://doi.org/10.1287/moor.1.3.197.
J. Bolte, S. Gaubert, and G. Vigeral. Definable zero-sum stochastic games. Mathematics of Operations Research, 40(1):171-191, 2014. URL: https://doi.org/10.1287/moor.2014.0666.
E. Boros, K. Elbassioni, V. Gurvich, and K. Makino. A convex programming-based algorithm for mean payoff stochastic games with perfect information. Optim. Lett., 11(8):1499-1512, 2017. URL: https://doi.org/10.1007/s11590-017-1140-y.
E. Boros, K. Elbassioni, V. Gurvich, and K. Makino. A pseudo-polynomial algorithm for mean payoff stochastic games with perfect information and few random positions. Inform. and Comput., 267:74-95, 2019. URL: https://doi.org/10.1016/j.ic.2019.03.005.
J. M. Borwein and P. B. Borwein. On the complexity of familiar functions and numbers. SIAM Review, 30(4):589-601, 1988.
A. D. Burbanks, R. D. Nussbaum, and C. T. Sparrow. Extension of order-preserving maps on a cone. Proc. Roy. Soc. Edinburgh Sect. A, 133(1):35-59, 2003. URL: https://doi.org/10.1017/S0308210500002274.
K. Chatterjee and R. Ibsen-Jensen. The complexity of ergodic mean-payoff games. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), volume 8573 of Lecture Notes in Comput. Sci., pages 122-133. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43951-7_11.
J. H. E. Cohn. On the value of determinants. Proceedings of the American Mathematical Society, 14(4):581-588, 1963.
A. Condon. The complexity of stochastic games. Inform. and Comput., 96(2):203-224, 1992. URL: https://doi.org/10.1016/0890-5401(92)90048-K.
O. Friedmann. An exponential lower bound for the latest deterministic strategy iteration algorithms. Logical Methods in Computer Science, 7(3:19):1-42, 2011.
S. Gaubert and J. Gunawardena. The Perron-Frobenius theorem for homogeneous, monotone functions. Trans. Amer. Math. Soc., 356(12):4931-4950, 2004. URL: https://doi.org/10.1090/S0002-9947-04-03470-1.
H. Gimbert and F. Horn. Simple stochastic games with few random vertices are easy to solve. In Proceedings of the 11th International Conference on Foundations of Software Science and Computational Structures (FoSSaCS), volume 4962 of Lecture Notes in Comput. Sci., pages 5-19. Springer, 2008. URL: https://doi.org/10.1007/978-3-540-78499-9_2.
V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and finding minimax mean cycles in digraphs. Zh. Vychisl. Mat. Mat. Fiz., 28(9):1406-1417, 1988. URL: https://doi.org/10.1016/0041-5553(88)90012-2.
A. J. Hoffman and R. M. Karp. On nonterminating stochastic games. Manag. Sci., 12(5):359-370, 1966. URL: https://doi.org/10.1287/mnsc.12.5.359.
R. A. Howard and J. E. Matheson. Risk-sensitive Markov decision processes. Management Science, 18(7):356-369, 1972. URL: https://doi.org/10.1287/mnsc.18.7.356.
R. Ibsen-Jensen and P. B. Miltersen. Solving simple stochastic games with few coin toss positions. In Proceedings of the 20th Annual European Symposium on Algorithms (ESA), volume 7501 of Lecture Notes in Comput. Sci., pages 636-647. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-33090-2_55.
M. Jurdziński, M. Paterson, and U. Zwick. A deterministic subexponential algorithm for solving parity games. SIAM J. Comput., 38(4):1519-1532, 2008. URL: https://doi.org/10.1137/070686652.
E. Kohlberg. Invariant half-lines of nonexpansive piecewise-linear transformations. Math. Oper. Res., 5(3):366-372, 1980. URL: https://doi.org/10.1287/moor.5.3.366.
T. M. Liggett and S. A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Rev., 11(4):604-607, 1969. URL: https://doi.org/10.1137/1011093.
J.-F. Mertens and A. Neyman. Stochastic games. Internat. J. Game Theory, 10(2):53-66, 1981. URL: https://doi.org/10.1007/BF01769259.
J.-F. Mertens, S. Sorin, and S. Zamir. Repeated games, volume 55 of Econom. Soc. Monogr. Cambridge University Press, Cambridge, 2015. URL: https://doi.org/10.1017/CBO9781139343275.
A. Neyman. Stochastic games and nonexpansive maps. In A. Neyman and S. Sorin, editors, Stochastic Games and Applications, volume 570 of NATO Science Series C, pages 397-415. Kluwer Academic Publishers, 2003. URL: https://doi.org/10.1007/978-94-010-0189-2_26.
R. D. Nussbaum. Convexity and log convexity for the spectral radius. Linear Algebra Appl., 73:59-122, 1986. URL: https://doi.org/10.1016/0024-3795(86)90233-8.
D. Rosenberg and S. Sorin. An operator approach to zero-sum repeated games. Israel J. Math., 121(1):221-246, 2001. URL: https://doi.org/10.1007/BF02802505.
U. G. Rothblum. Multiplicative Markov decision chains. Mathematics of Operations Research, 9(1):6-24, 1984.
U. G. Rothblum and P. Whittle. Growth optimality for branching Markov decision chains. Mathematics of Operations Research, 7(4):582-601, 1982.
S. M. Rump. Polynomial minimum root separation. Mathematics of Computation, 33(145):327-336, 1979.
M. Skomra. Tropical spectrahedra: Application to semidefinite programming and mean payoff games. PhD thesis, Université Paris-Saclay, 2018. URL: https://pastel.archives-ouvertes.fr/tel-01958741.
M. Skomra. Optimal bounds for bit-sizes of stationary distributions in finite Markov chains. arXiv:2109.04976, 2021.
K. Sladký. On dynamic programming recursions for multiplicative Markov decision chains, pages 216-226. Springer Berlin Heidelberg, Berlin, Heidelberg, 1976. URL: https://doi.org/10.1007/BFb0120753.
L. van den Dries. Tame topology and o-minimal structures, volume 248 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 1998. URL: https://doi.org/10.1017/CBO9780511525919.
R. J. Walker. Algebraic Curves. Springer, New York, 1978.
A. J. Wilkie. Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. J. Amer. Math. Soc., 9(4):1051-1094, 1996.
W. H. M. Zijm. Asymptotic expansions for dynamic programming recursions with general nonnegative matrices. J. Optim. Theory Appl., 54(1):157-191, 1987. URL: https://doi.org/10.1007/BF00940410.
U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoret. Comput. Sci., 158(1-2):343-359, 1996. URL: https://doi.org/10.1016/0304-3975(95)00188-3.
