On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games

Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, Manolis Zampetakis



File

LIPIcs.ITCS.2024.5.pdf
  • Filesize: 0.94 MB
  • 24 pages

Author Details

Ioannis Anagnostides
  • Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Alkis Kalavasis
  • Department of Computer Science, Yale University, New Haven, CT, USA
Tuomas Sandholm
  • Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Manolis Zampetakis
  • Department of Computer Science, Yale University, New Haven, CT, USA

Acknowledgements

We are grateful to the anonymous ITCS reviewers for their helpful feedback.

Cite As

Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, and Manolis Zampetakis. On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 5:1-5:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.5

Abstract

Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby leading to faster convergence to the set of coarse correlated equilibria (CCE), a standard game-theoretic equilibrium concept. Yet, despite considerable recent progress, the fundamental complexity barriers for learning in normal- and extensive-form games are poorly understood. In this paper, we take a step toward closing this gap by first showing that, barring major complexity breakthroughs, any polynomial-time learning algorithm in extensive-form games needs at least 2^{log^{1/2 - o(1)} |𝒯|} iterations for the average regret to drop below even an absolute constant, where |𝒯| is the number of nodes in the game. This establishes a superpolynomial separation between no-regret learning in normal- and extensive-form games, since in the former class a logarithmic number of iterations suffices to achieve constant average regret. Furthermore, our results imply that algorithms such as multiplicative weights update, as well as its optimistic counterpart, require at least 2^{(log log m)^{1/2 - o(1)}} iterations to attain an O(1)-CCE in m-action normal-form games under any parameterization. These are the first non-trivial, dimension-dependent lower bounds in that setting for the most well-studied algorithms in the literature. From a technical standpoint, we follow a beautiful connection recently made by Foster, Golowich, and Kakade (ICML '23) between sparse CCE and Nash equilibria in the context of Markov games.
Consequently, our lower bounds rule out polynomial-time algorithms well beyond the traditional online learning framework, capturing techniques commonly used for accelerating centralized equilibrium computation.
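As a concrete illustration of the dynamics the abstract refers to, here is a minimal self-play sketch of multiplicative weights update (MWU) in a small two-player zero-sum normal-form game, tracking the row player's average external regret. This is purely illustrative: the game matrix, horizon T, and step size eta below are arbitrary choices for demonstration, not constructions from the paper.

```python
import numpy as np

def mwu_average_regret(payoff, T=400, eta=0.05):
    """Self-play MWU (Hedge) in a two-player zero-sum game given by the
    row player's payoff matrix; returns the row player's average
    external regret after T rounds."""
    m, n = payoff.shape
    x = np.ones(m) / m        # row player's mixed strategy
    y = np.ones(n) / n        # column player's mixed strategy
    cum_util = np.zeros(m)    # cumulative utility of each fixed row action
    realized = 0.0            # row player's realized cumulative utility
    for _ in range(T):
        u_row = payoff @ y    # expected utility of each row action vs. y
        u_col = -payoff.T @ x # zero-sum: column player's utilities
        realized += x @ u_row
        cum_util += u_row
        # multiplicative (exponential-weights) updates, then renormalize
        x = x * np.exp(eta * u_row); x /= x.sum()
        y = y * np.exp(eta * u_col); y /= y.sum()
    # external regret: best fixed action in hindsight minus realized utility
    return (cum_util.max() - realized) / T

# a small zero-sum game (payoffs in [-1, 1]) with an interior equilibrium
game = np.array([[1.0, -1.0], [-1.0, 0.5]])
print(mwu_average_regret(game))
```

The standard Hedge guarantee bounds the average regret here by roughly log(m)/(eta*T) + eta for bounded payoffs, so it shrinks as T grows; the paper's lower bounds concern how fast such quantities can decay, and for which algorithms and game classes.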

Subject Classification

ACM Subject Classification
  • Theory of computation → Convergence and learning in games
Keywords
  • No-regret learning
  • extensive-form games
  • multiplicative weights update
  • optimism
  • lower bounds


References

  1. Jacob Abernethy, Chansoo Lee, and Ambuj Tewari. Perturbation techniques in online learning and optimization. Perturbations, Optimization, and Statistics, 233, 2016.
  2. Robert Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67-96, 1974.
  3. Yakov Babichenko. Query complexity of approximate Nash equilibria. Journal of the ACM, 63(4):36:1-36:24, 2016.
  4. Yakov Babichenko, Christos H. Papadimitriou, and Aviad Rubinstein. Can almost everybody be almost happy? In Proceedings of the Conference on Innovations in Theoretical Computer Science, pages 1-9. ACM, 2016.
  5. Yu Bai, Chi Jin, Song Mei, Ziang Song, and Tiancheng Yu. Efficient phi-regret minimization in extensive-form games via online mirror descent. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  6. Yu Bai, Chi Jin, Song Mei, and Tiancheng Yu. Near-optimal learning of extensive-form games with imperfect information. In International Conference on Machine Learning (ICML), pages 1337-1382. PMLR, 2022.
  7. Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Adithya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, and Markus Zijlstra. Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067-1074, 2022.
  8. Daniel Beaglehole, Max Hopkins, Daniel Kane, Sihan Liu, and Shachar Lovett. Sampling equilibria: Fast no-regret learning in structured games. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3817-3855. SIAM, 2023.
  9. Shai Ben-David, Dávid Pál, and Shai Shalev-Shwartz. Agnostic online learning. In Conference on Learning Theory (COLT), 2009.
  10. David Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1-8, 1956.
  11. Avrim Blum and Yishay Mansour. Learning, regret minimization, and equilibria, 2007.
  12. Christian Borgs, Jennifer T. Chayes, Nicole Immorlica, Adam Tauman Kalai, Vahab S. Mirrokni, and Christos H. Papadimitriou. The myth of the folk theorem. Games and Economic Behavior, 70(1):34-43, 2010.
  13. Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold'em poker is solved. Science, 347(6218), January 2015.
  14. Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, pages 418-424, December 2018.
  15. Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2019.
  16. Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885-890, 2019.
  17. Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
  18. Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM, 2009.
  19. Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
  20. Chirag Chhablani, Michael Sullins, and Ian A. Kash. Multiplicative weight updates for extensive form games. In Autonomous Agents and Multi-Agent Systems, pages 1071-1078. ACM, 2023.
  21. Francis Chu and Joseph Halpern. On the NP-completeness of finding an optimal strategy in games with common payoffs. International Journal of Game Theory, 2001.
  22. Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. From external to swap regret 2.0: An efficient reduction and oblivious adversary for large action spaces, 2023.
  23. Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92:327-348, 2015.
  24. Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. Near-optimal no-regret learning in general games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 27604-27616, 2021.
  25. Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1), 2009.
  26. Constantinos Daskalakis and Noah Golowich. Fast rates for nonparametric online learning: from realizability to learning in games. In Proceedings of the Annual Symposium on Theory of Computing (STOC), pages 846-859. ACM, 2022.
  27. Miroslav Dudík and Geoffrey J. Gordon. A sampling-based approach to computing equilibria in succinct extensive-form games. In UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, pages 151-160. AUAI Press, 2009.
  28. Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, and Yishay Mansour. Regret minimization and convergence to equilibria in general-sum Markov games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 9343-9373. PMLR, 2023.
  29. Gabriele Farina, Tommaso Bianchi, and Tuomas Sandholm. Coarse correlation in extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), volume 34, pages 1934-1941, 2020.
  30. Gabriele Farina, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Simple uncoupled no-regret learning dynamics for extensive-form correlated equilibrium. Journal of the ACM, 69(6):41:1-41:41, 2022.
  31. Gabriele Farina, Christian Kroer, Noam Brown, and Tuomas Sandholm. Stable-predictive optimistic counterfactual regret minimization. In International Conference on Machine Learning (ICML), 2019.
  32. Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Regret circuits: Composability of regret minimizers. In International Conference on Machine Learning, pages 1863-1872, 2019.
  33. Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Better regularization for sequential decision spaces: Fast convergence rates for Nash, correlated, and team equilibria. In Proceedings of the ACM Conference on Economics and Computation (EC), page 432. ACM, 2021.
  34. Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Faster game solving via predictive blackwell approachability: Connecting regret matching and mirror descent. In AAAI Conference on Artificial Intelligence (AAAI), 2021.
  35. Gabriele Farina, Chung-Wei Lee, Haipeng Luo, and Christian Kroer. Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games. In International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 6337-6357. PMLR, 2022.
  36. John Fearnley, Martin Gairing, Paul W. Goldberg, and Rahul Savani. Learning equilibria of games via payoff queries. Journal of Machine Learning Research, 16:1305-1344, 2015.
  37. John Fearnley and Rahul Savani. Finding approximate Nash equilibria of bimatrix games via payoff queries. ACM Trans. Economics and Comput., 4(4):25:1-25:19, 2016.
  38. Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, and Michal Valko. Adapting to game trees in zero-sum imperfect information games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 10093-10135. PMLR, 2023.
  39. Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, and Michal Valko. Local and adaptive mirror descents in extensive-form games, 2023. URL: https://arxiv.org/abs/2309.00656.
  40. Dean Foster and Rakesh Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21:40-55, 1997.
  41. Dylan J. Foster, Noah Golowich, and Sham M. Kakade. Hardness of independent learning and sparse equilibrium computation in Markov games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 10188-10221. PMLR, 2023.
  42. Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, and Éva Tardos. Learning in games: Robustness of fast convergence. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 4727-4735, 2016.
  43. Paul W. Goldberg and Matthew J. Katzman. Lower bounds for the query complexity of equilibria in lipschitz games. Theor. Comput. Sci., 962:113931, 2023.
  44. Geoffrey J. Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. In Proceedings of the 25th International Conference on Machine Learning, pages 360-367. ACM, 2008.
  45. Hédi Hadiji, Sarah Sachs, Tim van Erven, and Wouter M. Koolen. Towards characterizing the first-order query complexity of learning (approximate) Nash equilibria in zero-sum matrix games, 2023.
  46. Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127-1150, 2000.
  47. Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157-325, 2016.
  48. Johannes Heinrich, Marc Lanctot, and David Silver. Fictitious self-play in extensive-form games. In International Conference on Machine Learning (ICML), volume 37 of JMLR Workshop and Conference Proceedings, pages 805-813. JMLR.org, 2015.
  49. Wassily Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. The Annals of Mathematical Statistics, 29(3):700-718, 1958.
  50. Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, and Panayotis Mertikopoulos. No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  51. Yu-Guan Hsieh, Kimon Antonakopoulos, and Panayotis Mertikopoulos. Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium. In Conference on Learning Theory (COLT), volume 134 of Proceedings of Machine Learning Research, pages 2388-2422. PMLR, 2021.
  52. Wan Huang and Bernhard von Stengel. Computing an extensive-form correlated equilibrium in polynomial time. In Internet and Network Economics, 4th International Workshop, WINE 2008, volume 5385 of Lecture Notes in Computer Science, pages 506-513. Springer, 2008.
  53. Albert Xin Jiang and Kevin Leyton-Brown. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347-359, 2015.
  54. Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291-307, 2005.
  55. Ehsan Asadi Kangarshahi, Ya-Ping Hsieh, Mehmet Fatih Sahin, and Volkan Cevher. Let's be honest: An optimal no-regret framework for zero-sum games. In International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 2493-2501. PMLR, 2018.
  56. Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 1994.
  57. Tadashi Kozuno, Pierre Ménard, Rémi Munos, and Michal Valko. Learning in two-player zero-sum partially observable Markov games with perfect recall. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 11987-11998, 2021.
  58. Richard Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 36-41, San Diego, CA, 2003. ACM.
  59. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.
  60. Michael Littman and Peter Stone. A polynomial-time Nash equilibrium algorithm for repeated games. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 48-54, San Diego, CA, 2003.
  61. Arnab Maiti, Ross Boczar, Kevin G. Jamieson, and Lillian J. Ratliff. Query-efficient algorithms to find the unique Nash equilibrium in a two-player zero-sum matrix game, 2023.
  62. Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, and Amy R. Greenwald. Efficient deviation types and learning for hindsight rationality in extensive-form games. In Marina Meila and Tong Zhang, editors, International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 7818-7828. PMLR, 2021.
  63. Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy R. Greenwald, and Michael Bowling. Hindsight and sequential rationality of correlated play. In AAAI Conference on Artificial Intelligence (AAAI), pages 5584-5594. AAAI Press, 2021.
  64. H. Moulin and J.-P. Vial. Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon. International Journal of Game Theory, 7(3-4):201-221, 1978.
  65. Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3):498-532, 1994.
  66. Christos H. Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14:1-14:29, 2008.
  67. Binghui Peng and Aviad Rubinstein. Fast swap regret minimization and applications to approximate correlated equilibria, 2023.
  68. Georgios Piliouras, Ryann Sim, and Stratis Skoulakis. Beyond time-average convergence: Near-optimal uncoupled online learning via clairvoyant multiplicative weights update. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  69. Ju Qi, Ting Feng, Falun Hei, Zhemei Fang, and Yunfeng Luo. Pure monte carlo counterfactual regret minimization, 2023. URL: https://arxiv.org/abs/2309.03084.
  70. Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Conference on Learning Theory, pages 993-1019, 2013.
  71. Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066-3074, 2013.
  72. Julia Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296-301, 1951.
  73. I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3, 1962.
  74. Aviad Rubinstein. Inapproximability of Nash equilibrium. SIAM Journal on Computing, 47(3):917-959, 2018.
  75. Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
  76. Ziang Song, Song Mei, and Yu Bai. Sample-efficient learning of correlated equilibria in extensive-form games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  77. Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems, pages 2989-2997, 2015.
  78. Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. Journal of Machine Learning Research, 4:773-818, 2003.
  79. Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas hold'em. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  80. Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, and Yaodong Yang. Regret-minimizing double oracle for extensive-form games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 33599-33615. PMLR, 2023.
  81. Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, and Paul W. Goldberg. The computational complexity of single-player imperfect-recall games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2878-2887, 2023.
  82. Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220-246, 1996.
  83. Bernhard von Stengel and Françoise Forges. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002-1022, 2008.
  84. V. G. Vovk. Aggregating strategies. In Conference on Learning Theory (COLT), pages 371-386. Morgan Kaufmann, 1990.
  85. Andre Wibisono, Molei Tao, and Georgios Piliouras. Alternating mirror descent for constrained min-max games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  86. Yuepeng Yang and Cong Ma. O(T^-1) convergence of optimistic-follow-the-regularized-leader in two-player zero-sum Markov games. In The Eleventh International Conference on Learning Representations, ICLR 2023. OpenReview.net, 2023.
  87. Brian Hu Zhang and Tuomas Sandholm. Finding and certifying (near-)optimal strategies in black-box extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), pages 5779-5788. AAAI Press, 2021.
  88. Brian Hu Zhang and Tuomas Sandholm. Team correlated equilibria in zero-sum extensive-form games via tree decompositions. In AAAI Conference on Artificial Intelligence (AAAI), pages 5252-5259. AAAI Press, 2022.
  89. Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, and Yu Bai. Policy optimization for Markov games: Unified framework and faster convergence. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  90. Martin Zinkevich, Michael Bowling, Michael Johanson, and Carmelo Piccione. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2007.