The Complexity of Infinite-Horizon General-Sum Stochastic Games

Jin, Yujia; Muthukumar, Vidya; Sidford, Aaron

doi:10.4230/LIPIcs.ITCS.2023.76

Abstract

We study the complexity of computing stationary Nash equilibria (NE) in n-player infinite-horizon general-sum stochastic games. We focus on the problem of computing NE in such stochastic games when each player is restricted to choosing a stationary policy and rewards are discounted. First, we prove that computing such NE is in PPAD (in addition to clearly being PPAD-hard). Second, we consider turn-based specializations of such games, where at each state at most a single player can take actions, and show that these (seemingly simpler) games remain PPAD-hard. Third, we show that under further structural assumptions on the rewards, NE in such turn-based games can be computed in polynomial time. To obtain these results, we establish structural facts about stochastic games that are of broader utility, including monotonicity of utilities under single-state, single-action changes and reductions to settings in which each player controls a single state.
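To make the objects in the abstract concrete, here is a minimal sketch that evaluates stationary policies in a small discounted turn-based game and measures how far a candidate profile is from a stationary NE. It is an illustration under stated assumptions, not the authors' construction. The encoding (`owner`, `trans`, `reward`, `policy`) and all function names are hypothetical. NE is taken to mean that no player can gain from a unilateral deviation starting from any state. Fixed-iteration value iteration stands in for an exact linear solve. Because the other players' policies are held fixed, each player faces a discounted MDP in which some stationary deterministic policy is optimal, so comparing against that MDP's optimal value is enough to detect a profitable deviation.

```python
from typing import List

# Hypothetical encoding of a turn-based discounted game (illustration only):
#   owner[s]        player who chooses the action at state s
#   trans[s][a][t]  probability of moving from s to t under action a
#   reward[i][s][a] reward to player i when action a is played at s
#   policy[s][a]    stationary probability that owner[s] plays a at s
Vec = List[float]


def q_values(i: int, s: int, v: Vec, trans, reward, gamma: float) -> Vec:
    """One-step lookahead value of each action at s for player i, given estimate v."""
    return [
        reward[i][s][a] + gamma * sum(p * v[t] for t, p in enumerate(trans[s][a]))
        for a in range(len(trans[s]))
    ]


def evaluate(i: int, policy, trans, reward, gamma: float, iters: int = 2000) -> Vec:
    """Player i's discounted value at every state when everyone follows `policy`."""
    v = [0.0] * len(trans)
    for _ in range(iters):  # iterative policy evaluation; contracts since gamma < 1
        v = [
            sum(pa * qa for pa, qa in zip(policy[s], q_values(i, s, v, trans, reward, gamma)))
            for s in range(len(trans))
        ]
    return v


def best_response_value(i: int, owner, policy, trans, reward, gamma: float,
                        iters: int = 2000) -> Vec:
    """Player i's optimal value when all other players keep their stationary policies."""
    v = [0.0] * len(trans)
    for _ in range(iters):  # value iteration on the MDP that player i faces
        new_v = []
        for s in range(len(trans)):
            q = q_values(i, s, v, trans, reward, gamma)
            if owner[s] == i:
                new_v.append(max(q))  # player i may deviate at its own states
            else:
                new_v.append(sum(pa * qa for pa, qa in zip(policy[s], q)))  # others fixed
        v = new_v
    return v


def nash_gap(owner, policy, trans, reward, gamma: float) -> float:
    """Largest gain any player can get from a unilateral deviation, over all start states.
    A gap of (numerically) zero means `policy` is a stationary NE under this definition."""
    gap = 0.0
    for i in range(len(reward)):
        current = evaluate(i, policy, trans, reward, gamma)
        best = best_response_value(i, owner, policy, trans, reward, gamma)
        gap = max(gap, max(b - c for b, c in zip(best, current)))
    return gap


# Two players, two states; both players earn 1 per step spent in state 1.
owner = [0, 1]                          # player 0 acts at state 0, player 1 at state 1
trans = [[[1.0, 0.0], [0.0, 1.0]],      # state 0: action 0 stays, action 1 moves to 1
         [[0.0, 1.0], [1.0, 0.0]]]      # state 1: action 0 stays, action 1 moves to 0
reward = [[[0.0, 0.0], [1.0, 1.0]],     # player 0
          [[0.0, 0.0], [1.0, 1.0]]]     # player 1
go_and_stay = [[0.0, 1.0], [1.0, 0.0]]  # move to state 1 and remain there
stay_home = [[1.0, 0.0], [1.0, 0.0]]    # player 0 never leaves state 0
print(nash_gap(owner, go_and_stay, trans, reward, 0.9))  # approximately 0
print(nash_gap(owner, stay_home, trans, reward, 0.9))    # approximately 9
```

States where nobody has a real choice can be encoded with a single action. Checking a candidate profile this way is straightforward. Per the abstract, the hard part is finding an equilibrium, which remains PPAD-hard even for turn-based games.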