Competing Bandits: Learning Under Competition

Authors Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

Thumbnail PDF


  • Filesize: 0.65 MB
  • 27 pages

Document Identifiers

Author Details

Yishay Mansour
Aleksandrs Slivkins
Zhiwei Steven Wu

Cite AsGet BibTex

Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu. Competing Bandits: Learning Under Competition. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 48:1-48:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition--how such systems balance the exploration for learning and the competition for users. Here the users play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self-interested agents which choose among the competing systems. In our model, we consider competition between two multi-armed bandit algorithms faced with the same bandit instance. Users arrive one by one and choose among the two algorithms, so that each algorithm makes progress if and only if it is chosen. We ask whether and to what extent competition incentivizes the adoption of better bandit algorithms. We investigate this issue for several models of user response, as we vary the degree of rationality and competitiveness in the model. Our findings are closely related to the "competition vs. innovation" relationship, a well-studied theme in economics.
  • machine learning
  • game theory
  • competition
  • exploration
  • rationality


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Alekh Agarwal, Sarah Bird, Markus Cozowicz, Miro Dudik, John Langford, Lihong Li, Luong Hoang, Dan Melamed, Siddhartha Sen, Robert Schapire, and Alex Slivkins. Multiworld testing: A system for experimentation, learning, and decision-making, 2016. A white paper, available at URL:
  2. Philippe Aghion, Nicholas Bloom, Richard Blundell, Rachel Griffith, and Peter Howitt. Competition and innovation: An inverted u relationship. Quaterly J. of Economics, 120(2):701-728, 2005. Google Scholar
  3. Susan Athey and Ilya Segal. An efficient dynamic mechanism. Econometrica, 81(6):2463-2485, 2013. A preliminary version has been available as a working paper since 2007. Google Scholar
  4. Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002. Google Scholar
  5. Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77, 2002. Preliminary version in 36th IEEE FOCS, 1995. Google Scholar
  6. Eduardo Azevedo and Daniel Gottlieb. Perfect competition in markets with adverse selection. Econometrica, 85(1):67-105, 2017. Google Scholar
  7. Moshe Babaioff, Robert Kleinberg, and Aleksandrs Slivkins. Truthful mechanisms with implicit payment computation. J. of the ACM, 62(2):10, 2015. Subsumes the conference papers in ACM EC 2010 and ACM EC 2013. Google Scholar
  8. Moshe Babaioff, Yogeshwer Sharma, and Aleksandrs Slivkins. Characterizing truthful multi-armed bandit mechanisms. SIAM J. on Computing (SICOMP), 43(1):194-230, 2014. Preliminary version in 10th ACM EC, 2009. Google Scholar
  9. Gal Bahar, Rann Smorodinsky, and Moshe Tennenholtz. Economic recommendation systems. In 16th ACM Conf. on Electronic Commerce (EC), 2016. Google Scholar
  10. Dirk Bergemann and Juuso Välimäki. The dynamic pivot mechanism. Econometrica, 78(2):771-789, 2010. Preliminary versions have been available since 2006, as Cowles Foundation Discussion Papers #1584 (2006), #1616 (2007) and #1672(2008). Google Scholar
  11. Kostas Bimpikis, Yiangos Papanastasiou, and Nicos Savva. Crowdsourcing exploration. Management Science, 2017. Forthcoming. Google Scholar
  12. Patrick Bolton and Christopher Harris. Strategic Experimentation. Econometrica, 67(2):349-374, 1999. Google Scholar
  13. Sébastien Bubeck and Nicolo Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, 5(1), 2012. Google Scholar
  14. Yeon-Koo Che and Johannes Hörner. Optimal design for social learning. Preprint, 2015. First draft: 2013. Google Scholar
  15. Nikhil Devanur and Sham M. Kakade. The price of truthfulness for pay-per-click auctions. In 10th ACM Conf. on Electronic Commerce (EC), pages 99-106, 2009. Google Scholar
  16. Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. of Machine Learning Research (JMLR), 7:1079-1105, 2006. Google Scholar
  17. Peter Frazier, David Kempe, Jon M. Kleinberg, and Robert Kleinberg. Incentivizing exploration. In ACM Conf. on Economics and Computation (ACM EC), pages 5-22, 2014. Google Scholar
  18. Xavier Gabaix, David Laibson, Deyuan Li, Hongyi Li, Sidney Resnick, and Casper G. de Vries. The impact of competition on prices with numerous firms. J. of Economic Theory, 165:1-24, 2016. Google Scholar
  19. Arpita Ghosh and Patrick Hummel. Learning and incentives in user-generated content: multi-armed bandits with endogenous arms. In Innovations in Theoretical Computer Science Conf. (ITCS), pages 233-246, 2013. Google Scholar
  20. John Gittins, Kevin Glazebrook, and Richard Weber. Multi-Armed Bandit Allocation Indices. John Wiley &Sons, 2011. Google Scholar
  21. Ramakrishna Gummadi, Ramesh Johari, and Jia Yuan Yu. Mean field equilibria of multiarmed bandit games. In 13th ACM Conf. on Electronic Commerce (EC), 2012. Google Scholar
  22. Chien-Ju Ho, Aleksandrs Slivkins, and Jennifer Wortman Vaughan. Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. of Artificial Intelligence Research, 55:317-359, 2016. Preliminary version appeared in ACM EC 2014. Google Scholar
  23. Harold Hotelling. Stability in competition. The Economic Journal, 39(153):41-57, 1929. Google Scholar
  24. Nicole Immorlica, Adam Tauman Kalai, Brendan Lucier, Ankur Moitra, Andrew Postlewaite, and Moshe Tennenholtz. Dueling algorithms. In 43rd ACM Symp. on Theory of Computing (STOC), pages 215-224, 2011. Google Scholar
  25. Sham M. Kakade, Ilan Lobel, and Hamid Nazerzadeh. Optimal dynamic mechanism design and the virtual-pivot mechanism. Operations Research, 61(4):837-854, 2013. Google Scholar
  26. Godfrey Keller, Sven Rady, and Martin Cripps. Strategic Experimentation with Exponential Bandits. Econometrica, 73(1):39-68, 2005. Google Scholar
  27. Robert D. Kleinberg, Bo Waggoner, and E. Glen Weyl. Descending price optimally coordinates search. Working paper, 2016. Preliminary version in ACM EC 2016. Under submission to Econometrica. Google Scholar
  28. Ilan Kremer, Yishay Mansour, and Motty Perry. Implementing the “wisdom of the crowd”. J. of Political Economy, 122:988-1012, 2014. Preliminary version in ACM EC 2014. Google Scholar
  29. Tze Leung Lai and Herbert Robbins. Asymptotically efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 6:4-22, 1985. Google Scholar
  30. Yishay Mansour, Aleksandrs Slivkins, and Vasilis Syrgkanis. Bayesian incentive-compatible bandit exploration. In 15th ACM Conf. on Electronic Commerce (EC), 2015. Google Scholar
  31. Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, and Steven Wu. Bayesian exploration: Incentivizing exploration in bayesian games. Working paper, 2016. available at Preliminary version in ACM EC 2016. Google Scholar
  32. Paul Milgrom and Nancy Stokey. Information, trade and common knowledge. J. of Economic Theory, 26(1):17-27, 1982. Google Scholar
  33. Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005. Google Scholar
  34. Jeffrey M. Perloff and Steven C. Salop. Equilibrium with product differentiation. Review of Economic Studies, LII:107-120, 1985. Google Scholar
  35. Michael Rothschild and Joseph Stiglitz. Equilibrium in competitive insurance markets: An essay on the economics of imperfect information. Quaterly J. of Economics, 90(4):629-649, 1976. Google Scholar
  36. Marc Rysman. The economics of two-sided markets. J. of Economic Perspectives, 23(3):125-144, 2009. Google Scholar
  37. Joseph Schumpeter. Capitalism, Socialism and Democracy. Harper &Brothers, 1942. Google Scholar
  38. Adish Singla and Andreas Krause. Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. In 22nd Intl. World Wide Web Conf. (WWW), pages 1167-1178, 2013. Google Scholar
  39. Aleksandrs Slivkins. Introduction to multi-armed bandits, 2017. A book draft, available at Google Scholar
  40. Andre Veiga and Glen Weyl. Product design in selection markets. Quarterly J. of Economics, 131(2):1007-1056, 2016. Google Scholar
  41. Xavier Vives. Innovation and competitive pressure. J. of Industrial Economics, 56(3), 2008. Google Scholar
  42. Glen Weyl and Alexander White. Let the right ‘one' win: Policy lessons from the new economics of platforms. Competition Policy International, 12(2):29-51, 2014. Google Scholar
  43. Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The k-armed dueling bandits problem. J. Comput. Syst. Sci., 78(5):1538-1556, 2012. Preliminary version in COLT 2009. Google Scholar
  44. Yisong Yue and Thorsten Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. In 26th Intl. Conf. on Machine Learning (ICML), pages 1201-1208, 2009. Google Scholar