The Complexity of Infinite-Horizon General-Sum Stochastic Games

Authors Yujia Jin, Vidya Muthukumar, Aaron Sidford



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2023.76.pdf
  • Filesize: 1.21 MB
  • 20 pages

Document Identifiers

Author Details

Yujia Jin
  • Stanford University, CA, USA
Vidya Muthukumar
  • Georgia Institute of Technology, Atlanta, GA, USA
Aaron Sidford
  • Stanford University, CA, USA

Acknowledgements

The authors thank Constantinos Daskalakis, Noah Golowich, and Kaiqing Zhang for kindly coordinating on uploads to arXiv. The authors also thank Aviad Rubinstein and anonymous reviewers for helpful feedback. Part of this work was conducted while the authors were visiting the Simons Institute for the Theory of Computing.

Cite AsGet BibTex

Yujia Jin, Vidya Muthukumar, and Aaron Sidford. The Complexity of Infinite-Horizon General-Sum Stochastic Games. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 76:1-76:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ITCS.2023.76

Abstract

We study the complexity of computing stationary Nash equilibrium (NE) in n-player infinite-horizon general-sum stochastic games. We focus on the problem of computing NE in such stochastic games when each player is restricted to choosing a stationary policy and rewards are discounted. First, we prove that computing such NE is in PPAD (in addition to clearly being PPAD-hard). Second, we consider turn-based specializations of such games where at each state there is at most a single player that can take actions and show that these (seemingly-simpler) games remain PPAD-hard. Third, we show that under further structural assumptions on the rewards computing NE in such turn-based games is possible in polynomial time. Towards achieving these results we establish structural facts about stochastic games of broader utility, including monotonicity of utilities under single-state single-action changes and reductions to settings where each player controls a single state.

Subject Classification

ACM Subject Classification
  • Theory of computation → Complexity classes
Keywords
  • complexity
  • stochastic games
  • general-sum games
  • Nash equilibrium

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Eitan Altman. Flow control using the theory of zero sum markov games. IEEE transactions on automatic control, 39(4):814-818, 1994. Google Scholar
  2. Daniel Andersson and Peter Bro Miltersen. The complexity of solving stochastic games on graphs. In International Symposium on Algorithms and Computation, pages 112-121. Springer, 2009. Google Scholar
  3. Yu Bai, Chi Jin, and Tiancheng Yu. Near-optimal reinforcement learning with self-play. Advances in neural information processing systems, 33:2159-2170, 2020. Google Scholar
  4. Tamer Başar and Geert Jan Olsder. Dynamic noncooperative game theory. SIAM, 1998. Google Scholar
  5. Dimitri P Bertsekas. Dynamic programming and optimal control, volume 1(2). Athena scientific Belmont, MA, 1995. Google Scholar
  6. Shant Boodaghians, Joshua Brakensiek, Samuel B Hopkins, and Aviad Rubinstein. Smoothed complexity of 2-player nash equilibria. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 271-282. IEEE, 2020. Google Scholar
  7. Ron N Borkovsky, Ulrich Doraszelski, and Yaroslav Kryukov. A user’s guide to solving dynamic stochastic games using the homotopy method. Operations Research, 58(4-part-2):1116-1132, 2010. Google Scholar
  8. Xi Chen, Decheng Dai, Ye Du, and Shang-Hua Teng. Settling the complexity of arrow-debreu equilibria in markets with additively separable utilities. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 273-282. IEEE, 2009. Google Scholar
  9. Xi Chen and Xiaotie Deng. Settling the complexity of two-player nash equilibrium. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 261-272. IEEE, 2006. Google Scholar
  10. Xi Chen, David Durfee, and Anthi Orfanou. On the complexity of nash equilibria in anonymous games. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 381-390, 2015. Google Scholar
  11. Xi Chen, Dimitris Paparas, and Mihalis Yannakakis. The complexity of non-monotone markets. Journal of the ACM (JACM), 64(3):1-56, 2017. Google Scholar
  12. Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203-224, 1992. Google Scholar
  13. Vincent Conitzer and Tuomas Sandholm. Complexity results about nash equilibria. arXiv preprint, 2002. URL: http://arxiv.org/abs/cs/0205074.
  14. Partha Dasgupta and Eric Maskin. The existence of equilibrium in discontinuous economic games, i: Theory. The Review of economic studies, 53(1):1-26, 1986. Google Scholar
  15. Constantinos Daskalakis. On the complexity of approximating a nash equilibrium. ACM Transactions on Algorithms (TALG), 9(3):1-35, 2013. Google Scholar
  16. Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a nash equilibrium. SIAM Journal on Computing, 39(1):195-259, 2009. Google Scholar
  17. Constantinos Daskalakis, Noah Golowich, and Kaiqing Zhang. The complexity of markov equilibrium in stochastic games. arXiv preprint, 2022. URL: http://arxiv.org/abs/2204.03991.
  18. Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1466-1478, 2021. Google Scholar
  19. Argyrios Deligkas, John Fearnley, and Rahul Savani. Tree polymatrix games are ppad-hard. arXiv preprint, 2020. URL: http://arxiv.org/abs/2002.12119.
  20. Xiaotie Deng, Yuhao Li, David Henry Mguni, Jun Wang, and Yaodong Yang. On the complexity of computing markov perfect equilibrium in general-sum stochastic games. arXiv preprint, 2021. URL: http://arxiv.org/abs/2109.01795.
  21. Liam Dermed and Charles Isbell. Solving stochastic games. Advances in Neural Information Processing Systems, 22, 2009. Google Scholar
  22. E Allen Emerson and Charanjit S Jutla. Tree automata, mu-calculus and determinacy. In FoCS, volume 91, pages 368-377. Citeseer, 1991. Google Scholar
  23. Kousha Etessami and Mihalis Yannakakis. On the complexity of nash equilibria and other fixed points. SIAM Journal on Computing, 39(6):2531-2597, 2010. Google Scholar
  24. Jerzy Filar and Koos Vrieze. Competitive Markov decision processes. Springer Science & Business Media, 2012. Google Scholar
  25. Aris Filos-Ratsikas, Yiannis Giannakopoulos, Alexandros Hollender, Philip Lazos, and Diogo Poças. On the complexity of equilibrium computation in first-price auctions. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 454-476, 2021. Google Scholar
  26. Aris Filos-Ratsikas, Kristoffer Arnsfelt Hansen, Kasper Høgh, and Alexandros Hollender. Fixp-membership via convex optimization: Games, cakes, and markets. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 827-838. IEEE, 2022. Google Scholar
  27. Arlington M Fink. Equilibrium in a stochastic n-person game. Journal of science of the hiroshima university, series ai (mathematics), 28(1):89-93, 1964. Google Scholar
  28. Jugal Garg, Ruta Mehta, Vijay V Vazirani, and Sadra Yazdanbod. Settling the complexity of leontief and plc exchange markets under exact and approximate equilibria. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 890-901, 2017. Google Scholar
  29. Amy Greenwald, Keith Hall, Roberto Serrano, et al. Correlated q-learning. In ICML, volume 3, pages 242-249, 2003. Google Scholar
  30. Vladimir A Gurvich, Alexander V Karzanov, and LG Khachivan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics, 28(5):85-91, 1988. Google Scholar
  31. Thomas Dueholm Hansen, Peter Bro Miltersen, and Uri Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. Journal of the ACM (JACM), 60(1):1-16, 2013. Google Scholar
  32. P Jean-Jacques Herings and Ronald JAP Peeters. Stationary equilibria in stochastic games: structure, selection, and computation. Journal of Economic Theory, 118:32-60, 2004. Google Scholar
  33. Junling Hu and Michael P Wellman. Nash q-learning for general-sum stochastic games. Journal of machine learning research, 4(Nov):1039-1069, 2003. Google Scholar
  34. Chi Jin, Qinghua Liu, Yuanhao Wang, and Tiancheng Yu. V-learning-a simple, efficient, decentralized algorithm for multiagent rl. arXiv preprint, 2021. URL: http://arxiv.org/abs/2110.14555.
  35. Yujia Jin and Aaron Sidford. Towards tight bounds on the sample complexity of average-reward mdps. In International Conference on Machine Learning, pages 5055-5064. PMLR, 2021. Google Scholar
  36. Marcin Jurdziński, Mike Paterson, and Uri Zwick. A deterministic subexponential algorithm for solving parity games. SIAM Journal on Computing, 38(4):1519-1532, 2008. Google Scholar
  37. Sham Machandranath Kakade et al. On the sample complexity of reinforcement learning. PhD thesis, University of London London, England, 2003. Google Scholar
  38. Ioannis Karatzas, Martin Shubik, and William D Sudderth. A strategic market game with secured lending. Journal of mathematical economics, 28(2):207-247, 1997. Google Scholar
  39. Michael Kearns. Graphical games. Algorithmic game theory, 3:159-180, 2007. Google Scholar
  40. Michael Kearns, Michael L Littman, and Satinder Singh. Graphical models for game theory. arXiv preprint, 2013. URL: http://arxiv.org/abs/1301.2281.
  41. Vijaymohan R Konda and Vivek S Borkar. Actor-critic-type learning algorithms for markov decision processes. SIAM Journal on control and Optimization, 38(1):94-123, 1999. Google Scholar
  42. David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic cournot-nash solution. The Bell Journal of Economics, pages 322-334, 1980. Google Scholar
  43. Michael L Littman et al. Friend-or-foe q-learning in general-sum games. In ICML, volume 1, pages 322-328, 2001. Google Scholar
  44. Zhengyang Liu and Ying Sheng. On the approximation of nash equilibria in sparse win-lose games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32(1), 2018. Google Scholar
  45. Dmitrii Lozovanu. Stationary nash equilibria for average stochastic positional games. In Frontiers of Dynamic Games, pages 139-163. Springer, 2018. Google Scholar
  46. Weichao Mao and Tamer Başar. Provably efficient reinforcement learning in decentralized general-sum markov games. Dynamic Games and Applications, pages 1-22, 2022. Google Scholar
  47. Nimrod Megiddo and Christos H Papadimitriou. On total functions, existence theorems and computational complexity. Theoretical Computer Science, 81(2):317-324, 1991. Google Scholar
  48. Ruta Mehta. Constant rank bimatrix games are ppad-hard. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 545-554, 2014. Google Scholar
  49. Dov Monderer and Lloyd S Shapley. Potential games. Games and economic behavior, 14(1):124-143, 1996. Google Scholar
  50. Roger B Myerson. Game theory: analysis of conflict. Harvard university press, 1997. Google Scholar
  51. John Nash. Non-cooperative games. Annals of mathematics, pages 286-295, 1951. Google Scholar
  52. Christos Papadimitriou and Binghui Peng. Public goods games in directed networks. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 745-762, 2021. Google Scholar
  53. Christos H Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and system Sciences, 48(3):498-532, 1994. Google Scholar
  54. Christos H Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM), 55(3):1-29, 2008. Google Scholar
  55. Julien Pérolat, Florian Strub, Bilal Piot, and Olivier Pietquin. Learning nash equilibrium for general-sum markov games from batch data. In Artificial Intelligence and Statistics, pages 232-241. PMLR, 2017. Google Scholar
  56. HL Prasad, Prashanth LA, and Shalabh Bhatnagar. Two-timescale algorithms for learning nash equilibria in general-sum stochastic games. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 1371-1379, 2015. Google Scholar
  57. Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014. Google Scholar
  58. Aviad Rubinstein. Settling the complexity of computing approximate two-player nash equilibria. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 258-265. IEEE, 2016. Google Scholar
  59. Aviad Rubinstein. Inapproximability of nash equilibrium. SIAM Journal on Computing, 47(3):917-959, 2018. Google Scholar
  60. Grant R Schoenebeck and Salil Vadhan. The computational complexity of nash equilibria in concisely represented games. ACM Transactions on Computation Theory (TOCT), 4(2):1-50, 2012. Google Scholar
  61. Lloyd S Shapley. Stochastic games. Proceedings of the national academy of sciences, 39(10):1095-1100, 1953. Google Scholar
  62. Aaron Sidford, Mengdi Wang, Lin Yang, and Yinyu Ye. Solving discounted stochastic two-player games with near-optimal time and sample complexity. In International Conference on Artificial Intelligence and Statistics, pages 2992-3002. PMLR, 2020. Google Scholar
  63. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484-489, 2016. Google Scholar
  64. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. nature, 550(7676):354-359, 2017. Google Scholar
  65. Ziang Song, Song Mei, and Yu Bai. When can we learn general-sum markov games with a large number of players sample-efficiently? arXiv preprint, 2021. URL: http://arxiv.org/abs/2110.04184.
  66. Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018. Google Scholar
  67. Masayuki Takahashi. Equilibrium points of stochastic non-cooperative n-person games. Journal of Science of the Hiroshima University, Series AI (Mathematics), 28(1):95-99, 1964. Google Scholar
  68. Vijay V Vazirani and Mihalis Yannakakis. Market equilibrium under separable, piecewise-linear, concave utilities. Journal of the ACM (JACM), 58(3):1-25, 2011. Google Scholar
  69. Jens Vöge and Marcin Jurdziński. A discrete strategy improvement algorithm for solving parity games. In International conference on computer aided verification, pages 202-215. Springer, 2000. Google Scholar
  70. Yinyu Ye. The simplex and policy-iteration methods are strongly polynomial for the markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011. Google Scholar
  71. Peyton Young and Shmuel Zamir. Handbook of game theory. Elsevier, 2014. Google Scholar
  72. Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control, pages 321-384, 2021. Google Scholar
  73. Martin Zinkevich, Amy Greenwald, and Michael Littman. Cyclic equilibria in markov games. Advances in Neural Information Processing Systems, 18:1641, 2006. Google Scholar
  74. Uri Zwick and Mike Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1-2):343-359, 1996. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail