On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games

Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, Manolis Zampetakis



File

LIPIcs.ITCS.2024.5.pdf
  • Filesize: 0.94 MB
  • 24 pages

Author Details

Ioannis Anagnostides
  • Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Alkis Kalavasis
  • Department of Computer Science, Yale University, New Haven, CT, USA
Tuomas Sandholm
  • Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Manolis Zampetakis
  • Department of Computer Science, Yale University, New Haven, CT, USA

Acknowledgements

We are grateful to the anonymous ITCS reviewers for their helpful feedback.

Cite As

Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, and Manolis Zampetakis. On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 5:1-5:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.5

Abstract

Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby leading to faster convergence to the set of coarse correlated equilibria (CCE), a standard game-theoretic equilibrium concept. Yet, despite considerable recent progress, the fundamental complexity barriers for learning in normal- and extensive-form games are poorly understood. In this paper, we take a step toward closing this gap by first showing that, barring major complexity breakthroughs, any polynomial-time learning algorithm in extensive-form games needs at least 2^{log^{1/2 - o(1)} |𝒯|} iterations for the average regret to drop below even an absolute constant, where |𝒯| is the number of nodes in the game. This establishes a superpolynomial separation between no-regret learning in normal- and extensive-form games, since in the former class a logarithmic number of iterations suffices to achieve constant average regret. Furthermore, our results imply that algorithms such as multiplicative weights update, as well as its optimistic counterpart, require at least 2^{(log log m)^{1/2 - o(1)}} iterations to attain an O(1)-CCE in m-action normal-form games under any parameterization. These are the first non-trivial, dimension-dependent lower bounds in that setting for the most well-studied algorithms in the literature. From a technical standpoint, we follow a beautiful connection recently made by Foster, Golowich, and Kakade (ICML '23) between sparse CCE and Nash equilibria in the context of Markov games.
Consequently, our lower bounds rule out polynomial-time algorithms well beyond the traditional online learning framework, capturing techniques commonly used for accelerating centralized equilibrium computation.
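As a concrete illustration of the dynamics the abstract refers to, here is a minimal self-play sketch of multiplicative weights update (MWU) in a small two-player zero-sum normal-form game, tracking the row player's average external regret. This is purely illustrative: the game matrix, horizon T, and step size eta below are arbitrary choices for demonstration, not constructions from the paper.

```python
import numpy as np

def mwu_average_regret(payoff, T=400, eta=0.05):
    """Self-play MWU (Hedge) in a two-player zero-sum game given by the
    row player's payoff matrix; returns the row player's average
    external regret after T rounds."""
    m, n = payoff.shape
    x = np.ones(m) / m        # row player's mixed strategy
    y = np.ones(n) / n        # column player's mixed strategy
    cum_util = np.zeros(m)    # cumulative utility of each fixed row action
    realized = 0.0            # row player's realized cumulative utility
    for _ in range(T):
        u_row = payoff @ y    # expected utility of each row action vs. y
        u_col = -payoff.T @ x # zero-sum: column player's utilities
        realized += x @ u_row
        cum_util += u_row
        # multiplicative (exponential-weights) updates, then renormalize
        x = x * np.exp(eta * u_row); x /= x.sum()
        y = y * np.exp(eta * u_col); y /= y.sum()
    # external regret: best fixed action in hindsight minus realized utility
    return (cum_util.max() - realized) / T

# a small zero-sum game (payoffs in [-1, 1]) with an interior equilibrium
game = np.array([[1.0, -1.0], [-1.0, 0.5]])
print(mwu_average_regret(game))
```

The standard Hedge guarantee bounds the average regret here by roughly log(m)/(eta*T) + eta for bounded payoffs, so it shrinks as T grows; the paper's lower bounds concern how fast such quantities can decay, and for which algorithms and game classes.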

Subject Classification

ACM Subject Classification
  • Theory of computation → Convergence and learning in games
Keywords
  • No-regret learning
  • extensive-form games
  • multiplicative weights update
  • optimism
  • lower bounds


References

  1. Jacob Abernethy, Chansoo Lee, and Ambuj Tewari. Perturbation techniques in online learning and optimization. Perturbations, Optimization, and Statistics, 233, 2016.
  2. Robert Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67-96, 1974.
  3. Yakov Babichenko. Query complexity of approximate Nash equilibria. Journal of the ACM, 63(4):36:1-36:24, 2016.
  4. Yakov Babichenko, Christos H. Papadimitriou, and Aviad Rubinstein. Can almost everybody be almost happy? In Proceedings of the Conference on Innovations in Theoretical Computer Science, pages 1-9. ACM, 2016.
  5. Yu Bai, Chi Jin, Song Mei, Ziang Song, and Tiancheng Yu. Efficient phi-regret minimization in extensive-form games via online mirror descent. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  6. Yu Bai, Chi Jin, Song Mei, and Tiancheng Yu. Near-optimal learning of extensive-form games with imperfect information. In International Conference on Machine Learning (ICML), pages 1337-1382. PMLR, 2022.
  7. Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Adithya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, and Markus Zijlstra. Human-level play in the game of diplomacy by combining language models with strategic reasoning. Science, 378(6624):1067-1074, 2022.
  8. Daniel Beaglehole, Max Hopkins, Daniel Kane, Sihan Liu, and Shachar Lovett. Sampling equilibria: Fast no-regret learning in structured games. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3817-3855. SIAM, 2023.
  9. Shai Ben-David, Dávid Pál, and Shai Shalev-Shwartz. Agnostic online learning. In Conference on Learning Theory (COLT), 2009.
  10. David Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1-8, 1956.
  11. Avrim Blum and Yishay Mansour. Learning, regret minimization, and equilibria, 2007.
  12. Christian Borgs, Jennifer T. Chayes, Nicole Immorlica, Adam Tauman Kalai, Vahab S. Mirrokni, and Christos H. Papadimitriou. The myth of the folk theorem. Games and Economic Behavior, 70(1):34-43, 2010.
  13. Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold'em poker is solved. Science, 347(6218), January 2015.
  14. Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, pages 418-424, December 2018.
  15. Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret minimization. In AAAI Conference on Artificial Intelligence (AAAI), 2019.
  16. Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885-890, 2019.
  17. Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
  18. Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM, 2009.
  19. Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.
  20. Chirag Chhablani, Michael Sullins, and Ian A. Kash. Multiplicative weight updates for extensive form games. In Autonomous Agents and Multi-Agent Systems, pages 1071-1078. ACM, 2023.
  21. Francis Chu and Joseph Halpern. On the NP-completeness of finding an optimal strategy in games with common payoffs. International Journal of Game Theory, 2001.
  22. Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. From external to swap regret 2.0: An efficient reduction and oblivious adversary for large action spaces, 2023.
  23. Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92:327-348, 2015.
  24. Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. Near-optimal no-regret learning in general games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 27604-27616, 2021.
  25. Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1), 2009.
  26. Constantinos Daskalakis and Noah Golowich. Fast rates for nonparametric online learning: from realizability to learning in games. In Proceedings of the Annual Symposium on Theory of Computing (STOC), pages 846-859. ACM, 2022.
  27. Miroslav Dudík and Geoffrey J. Gordon. A sampling-based approach to computing equilibria in succinct extensive-form games. In UAI 2009, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, pages 151-160. AUAI Press, 2009.
  28. Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, and Yishay Mansour. Regret minimization and convergence to equilibria in general-sum Markov games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 9343-9373. PMLR, 2023.
  29. Gabriele Farina, Tommaso Bianchi, and Tuomas Sandholm. Coarse correlation in extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), volume 34, pages 1934-1941, 2020.
  30. Gabriele Farina, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Simple uncoupled no-regret learning dynamics for extensive-form correlated equilibrium. Journal of the ACM, 69(6):41:1-41:41, 2022.
  31. Gabriele Farina, Christian Kroer, Noam Brown, and Tuomas Sandholm. Stable-predictive optimistic counterfactual regret minimization. In International Conference on Machine Learning (ICML), 2019.
  32. Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Regret circuits: Composability of regret minimizers. In International Conference on Machine Learning, pages 1863-1872, 2019.
  33. Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Better regularization for sequential decision spaces: Fast convergence rates for Nash, correlated, and team equilibria. In Proceedings of the ACM Conference on Economics and Computation (EC), page 432. ACM, 2021.
  34. Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Faster game solving via predictive blackwell approachability: Connecting regret matching and mirror descent. In AAAI Conference on Artificial Intelligence (AAAI), 2021.
  35. Gabriele Farina, Chung-Wei Lee, Haipeng Luo, and Christian Kroer. Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games. In International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 6337-6357. PMLR, 2022.
  36. John Fearnley, Martin Gairing, Paul W. Goldberg, and Rahul Savani. Learning equilibria of games via payoff queries. Journal of Machine Learning Research, 16:1305-1344, 2015.
  37. John Fearnley and Rahul Savani. Finding approximate Nash equilibria of bimatrix games via payoff queries. ACM Trans. Economics and Comput., 4(4):25:1-25:19, 2016.
  38. Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, and Michal Valko. Adapting to game trees in zero-sum imperfect information games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 10093-10135. PMLR, 2023.
  39. Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, and Michal Valko. Local and adaptive mirror descents in extensive-form games, 2023. URL: https://arxiv.org/abs/2309.00656.
  40. Dean Foster and Rakesh Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21:40-55, 1997.
  41. Dylan J. Foster, Noah Golowich, and Sham M. Kakade. Hardness of independent learning and sparse equilibrium computation in Markov games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 10188-10221. PMLR, 2023.
  42. Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, and Éva Tardos. Learning in games: Robustness of fast convergence. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 4727-4735, 2016.
  43. Paul W. Goldberg and Matthew J. Katzman. Lower bounds for the query complexity of equilibria in lipschitz games. Theor. Comput. Sci., 962:113931, 2023.
  44. Geoffrey J. Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. In Proceedings of the 25th International Conference on Machine Learning, pages 360-367. ACM, 2008.
  45. Hédi Hadiji, Sarah Sachs, Tim van Erven, and Wouter M. Koolen. Towards characterizing the first-order query complexity of learning (approximate) Nash equilibria in zero-sum matrix games, 2023.
  46. Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127-1150, 2000.
  47. Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157-325, 2016.
  48. Johannes Heinrich, Marc Lanctot, and David Silver. Fictitious self-play in extensive-form games. In International Conference on Machine Learning (ICML), volume 37 of JMLR Workshop and Conference Proceedings, pages 805-813. JMLR.org, 2015.
  49. Wassily Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions. The Annals of Mathematical Statistics, 29(3):700-718, 1958.
  50. Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, and Panayotis Mertikopoulos. No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  51. Yu-Guan Hsieh, Kimon Antonakopoulos, and Panayotis Mertikopoulos. Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium. In Conference on Learning Theory (COLT), volume 134 of Proceedings of Machine Learning Research, pages 2388-2422. PMLR, 2021.
  52. Wan Huang and Bernhard von Stengel. Computing an extensive-form correlated equilibrium in polynomial time. In Internet and Network Economics, 4th International Workshop, WINE 2008, volume 5385 of Lecture Notes in Computer Science, pages 506-513. Springer, 2008.
  53. Albert Xin Jiang and Kevin Leyton-Brown. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347-359, 2015.
  54. Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71:291-307, 2005.
  55. Ehsan Asadi Kangarshahi, Ya-Ping Hsieh, Mehmet Fatih Sahin, and Volkan Cevher. Let's be honest: An optimal no-regret framework for zero-sum games. In International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 2493-2501. PMLR, 2018.
  56. Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Fast algorithms for finding randomized strategies in game trees. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 1994.
  57. Tadashi Kozuno, Pierre Ménard, Rémi Munos, and Michal Valko. Learning in two-player zero-sum partially observable Markov games with perfect recall. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), pages 11987-11998, 2021.
  58. Richard Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 36-41, San Diego, CA, 2003. ACM.
  59. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.
  60. Michael Littman and Peter Stone. A polynomial-time Nash equilibrium algorithm for repeated games. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), pages 48-54, San Diego, CA, 2003.
  61. Arnab Maiti, Ross Boczar, Kevin G. Jamieson, and Lillian J. Ratliff. Query-efficient algorithms to find the unique Nash equilibrium in a two-player zero-sum matrix game, 2023.
  62. Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, and Amy R. Greenwald. Efficient deviation types and learning for hindsight rationality in extensive-form games. In Marina Meila and Tong Zhang, editors, International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 7818-7828. PMLR, 2021.
  63. Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy R. Greenwald, and Michael Bowling. Hindsight and sequential rationality of correlated play. In AAAI Conference on Artificial Intelligence (AAAI), pages 5584-5594. AAAI Press, 2021.
  64. H. Moulin and J.-P. Vial. Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon. International Journal of Game Theory, 7(3-4):201-221, 1978.
  65. Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3):498-532, 1994.
  66. Christos H. Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14:1-14:29, 2008.
  67. Binghui Peng and Aviad Rubinstein. Fast swap regret minimization and applications to approximate correlated equilibria, 2023.
  68. Georgios Piliouras, Ryann Sim, and Stratis Skoulakis. Beyond time-average convergence: Near-optimal uncoupled online learning via clairvoyant multiplicative weights update. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  69. Ju Qi, Ting Feng, Falun Hei, Zhemei Fang, and Yunfeng Luo. Pure monte carlo counterfactual regret minimization, 2023. URL: https://arxiv.org/abs/2309.03084.
  70. Alexander Rakhlin and Karthik Sridharan. Online learning with predictable sequences. In Conference on Learning Theory, pages 993-1019, 2013.
  71. Alexander Rakhlin and Karthik Sridharan. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems, pages 3066-3074, 2013.
  72. Julia Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296-301, 1951.
  73. I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3, 1962.
  74. Aviad Rubinstein. Inapproximability of Nash equilibrium. SIAM Journal on Computing, 47(3):917-959, 2018.
  75. Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
  76. Ziang Song, Song Mei, and Yu Bai. Sample-efficient learning of correlated equilibria in extensive-form games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  77. Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems, pages 2989-2997, 2015.
  78. Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. Journal of Machine Learning Research, 4:773-818, 2003.
  79. Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas hold'em. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
  80. Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, and Yaodong Yang. Regret-minimizing double oracle for extensive-form games. In International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 33599-33615. PMLR, 2023.
  81. Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, and Paul W. Goldberg. The computational complexity of single-player imperfect-recall games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2878-2887, 2023.
  82. Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220-246, 1996.
  83. Bernhard von Stengel and Françoise Forges. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002-1022, 2008.
  84. V. G. Vovk. Aggregating strategies. In Conference on Learning Theory (COLT), pages 371-386. Morgan Kaufmann, 1990.
  85. Andre Wibisono, Molei Tao, and Georgios Piliouras. Alternating mirror descent for constrained min-max games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  86. Yuepeng Yang and Cong Ma. O(T^-1) convergence of optimistic-follow-the-regularized-leader in two-player zero-sum Markov games. In The Eleventh International Conference on Learning Representations, ICLR 2023. OpenReview.net, 2023.
  87. Brian Hu Zhang and Tuomas Sandholm. Finding and certifying (near-)optimal strategies in black-box extensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), pages 5779-5788. AAAI Press, 2021.
  88. Brian Hu Zhang and Tuomas Sandholm. Team correlated equilibria in zero-sum extensive-form games via tree decompositions. In AAAI Conference on Artificial Intelligence (AAAI), pages 5252-5259. AAAI Press, 2022.
  89. Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, and Yu Bai. Policy optimization for Markov games: Unified framework and faster convergence. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2022.
  90. Martin Zinkevich, Michael Bowling, Michael Johanson, and Carmelo Piccione. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2007.