Competing Bandits: Learning Under Competition

Mansour, Yishay; Slivkins, Aleksandrs; Wu, Zhiwei Steven

doi:10.4230/LIPIcs.ITCS.2018.48

Abstract

Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition--how such systems balance the exploration for learning and the competition for users. Here the users play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self-interested agents which choose among the competing systems. In our model, we consider competition between two multi-armed bandit algorithms faced with the same bandit instance. Users arrive one by one and choose among the two algorithms, so that each algorithm makes progress if and only if it is chosen. We ask whether and to what extent competition incentivizes the adoption of better bandit algorithms. We investigate this issue for several models of user response, as we vary the degree of rationality and competitiveness in the model. Our findings are closely related to the "competition vs. innovation" relationship, a well-studied theme in economics.

Alekh Agarwal, Sarah Bird, Markus Cozowicz, Miro Dudik, John Langford, Lihong Li, Luong Hoang, Dan Melamed, Siddhartha Sen, Robert Schapire, and Alex Slivkins. Multiworld testing: A system for experimentation, learning, and decision-making, 2016. A white paper, available at URL: https://github.com/Microsoft/mwt-ds/raw/master/images/MWT-WhitePaper.pdf.
Philippe Aghion, Nicholas Bloom, Richard Blundell, Rachel Griffith, and Peter Howitt. Competition and innovation: An inverted u relationship. Quaterly J. of Economics, 120(2):701-728, 2005.
Susan Athey and Ilya Segal. An efficient dynamic mechanism. Econometrica, 81(6):2463-2485, 2013. A preliminary version has been available as a working paper since 2007.
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77, 2002. Preliminary version in 36th IEEE FOCS, 1995.
Eduardo Azevedo and Daniel Gottlieb. Perfect competition in markets with adverse selection. Econometrica, 85(1):67-105, 2017.
Moshe Babaioff, Robert Kleinberg, and Aleksandrs Slivkins. Truthful mechanisms with implicit payment computation. J. of the ACM, 62(2):10, 2015. Subsumes the conference papers in ACM EC 2010 and ACM EC 2013.
Moshe Babaioff, Yogeshwer Sharma, and Aleksandrs Slivkins. Characterizing truthful multi-armed bandit mechanisms. SIAM J. on Computing (SICOMP), 43(1):194-230, 2014. Preliminary version in 10th ACM EC, 2009.
Gal Bahar, Rann Smorodinsky, and Moshe Tennenholtz. Economic recommendation systems. In 16th ACM Conf. on Electronic Commerce (EC), 2016.
Dirk Bergemann and Juuso Välimäki. The dynamic pivot mechanism. Econometrica, 78(2):771-789, 2010. Preliminary versions have been available since 2006, as Cowles Foundation Discussion Papers #1584 (2006), #1616 (2007) and #1672(2008).
Kostas Bimpikis, Yiangos Papanastasiou, and Nicos Savva. Crowdsourcing exploration. Management Science, 2017. Forthcoming.
Patrick Bolton and Christopher Harris. Strategic Experimentation. Econometrica, 67(2):349-374, 1999.
Sébastien Bubeck and Nicolo Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, 5(1), 2012.
Yeon-Koo Che and Johannes Hörner. Optimal design for social learning. Preprint, 2015. First draft: 2013.
Nikhil Devanur and Sham M. Kakade. The price of truthfulness for pay-per-click auctions. In 10th ACM Conf. on Electronic Commerce (EC), pages 99-106, 2009.
Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. of Machine Learning Research (JMLR), 7:1079-1105, 2006.
Peter Frazier, David Kempe, Jon M. Kleinberg, and Robert Kleinberg. Incentivizing exploration. In ACM Conf. on Economics and Computation (ACM EC), pages 5-22, 2014.
Xavier Gabaix, David Laibson, Deyuan Li, Hongyi Li, Sidney Resnick, and Casper G. de Vries. The impact of competition on prices with numerous firms. J. of Economic Theory, 165:1-24, 2016.
Arpita Ghosh and Patrick Hummel. Learning and incentives in user-generated content: multi-armed bandits with endogenous arms. In Innovations in Theoretical Computer Science Conf. (ITCS), pages 233-246, 2013.
John Gittins, Kevin Glazebrook, and Richard Weber. Multi-Armed Bandit Allocation Indices. John Wiley &Sons, 2011.
Ramakrishna Gummadi, Ramesh Johari, and Jia Yuan Yu. Mean field equilibria of multiarmed bandit games. In 13th ACM Conf. on Electronic Commerce (EC), 2012.
Chien-Ju Ho, Aleksandrs Slivkins, and Jennifer Wortman Vaughan. Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. of Artificial Intelligence Research, 55:317-359, 2016. Preliminary version appeared in ACM EC 2014.
Harold Hotelling. Stability in competition. The Economic Journal, 39(153):41-57, 1929.
Nicole Immorlica, Adam Tauman Kalai, Brendan Lucier, Ankur Moitra, Andrew Postlewaite, and Moshe Tennenholtz. Dueling algorithms. In 43rd ACM Symp. on Theory of Computing (STOC), pages 215-224, 2011.
Sham M. Kakade, Ilan Lobel, and Hamid Nazerzadeh. Optimal dynamic mechanism design and the virtual-pivot mechanism. Operations Research, 61(4):837-854, 2013.
Godfrey Keller, Sven Rady, and Martin Cripps. Strategic Experimentation with Exponential Bandits. Econometrica, 73(1):39-68, 2005.
Robert D. Kleinberg, Bo Waggoner, and E. Glen Weyl. Descending price optimally coordinates search. Working paper, 2016. Preliminary version in ACM EC 2016. Under submission to Econometrica.
Ilan Kremer, Yishay Mansour, and Motty Perry. Implementing the “wisdom of the crowd”. J. of Political Economy, 122:988-1012, 2014. Preliminary version in ACM EC 2014.
Tze Leung Lai and Herbert Robbins. Asymptotically efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 6:4-22, 1985.
Yishay Mansour, Aleksandrs Slivkins, and Vasilis Syrgkanis. Bayesian incentive-compatible bandit exploration. In 15th ACM Conf. on Electronic Commerce (EC), 2015.
Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, and Steven Wu. Bayesian exploration: Incentivizing exploration in bayesian games. Working paper, 2016. available at https://arxiv.org/abs/1602.07570. Preliminary version in ACM EC 2016.
Paul Milgrom and Nancy Stokey. Information, trade and common knowledge. J. of Economic Theory, 26(1):17-27, 1982.
Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
Jeffrey M. Perloff and Steven C. Salop. Equilibrium with product differentiation. Review of Economic Studies, LII:107-120, 1985.
Michael Rothschild and Joseph Stiglitz. Equilibrium in competitive insurance markets: An essay on the economics of imperfect information. Quaterly J. of Economics, 90(4):629-649, 1976.
Marc Rysman. The economics of two-sided markets. J. of Economic Perspectives, 23(3):125-144, 2009.
Joseph Schumpeter. Capitalism, Socialism and Democracy. Harper &Brothers, 1942.
Adish Singla and Andreas Krause. Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. In 22nd Intl. World Wide Web Conf. (WWW), pages 1167-1178, 2013.
Aleksandrs Slivkins. Introduction to multi-armed bandits, 2017. A book draft, available at http://research.microsoft.com/en-us/people/slivkins.
Andre Veiga and Glen Weyl. Product design in selection markets. Quarterly J. of Economics, 131(2):1007-1056, 2016.
Xavier Vives. Innovation and competitive pressure. J. of Industrial Economics, 56(3), 2008.
Glen Weyl and Alexander White. Let the right ‘one' win: Policy lessons from the new economics of platforms. Competition Policy International, 12(2):29-51, 2014.
Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The k-armed dueling bandits problem. J. Comput. Syst. Sci., 78(5):1538-1556, 2012. Preliminary version in COLT 2009.
Yisong Yue and Thorsten Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. In 26th Intl. Conf. on Machine Learning (ICML), pages 1201-1208, 2009.

Competing Bandits: Learning Under Competition

Authors Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Keywords

Metrics

References

Thanks for your feedback!

Could not send message