Buying Data over Time: Approximately Optimal Strategies for Dynamic Data-Driven Decisions

Authors Nicole Immorlica, Ian A. Kash, Brendan Lucier

PDF: 14 pages, 0.54 MB

Author Details

Nicole Immorlica
  • Microsoft Research, Cambridge, MA, USA
Ian A. Kash
  • Department of Computer Science, University of Illinois at Chicago, IL, USA
Brendan Lucier
  • Microsoft Research, Cambridge, MA, USA


Part of this work was done while Ian Kash was at Microsoft Research. He gratefully acknowledges support from the National Science Foundation via award CCF 1934915.

Cite As

Nicole Immorlica, Ian A. Kash, and Brendan Lucier. Buying Data over Time: Approximately Optimal Strategies for Dynamic Data-Driven Decisions. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 77:1-77:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

We consider a model where an agent has a repeated decision to make and wishes to maximize their total payoff. Payoffs are influenced by an action taken by the agent, but also by an unknown state of the world that evolves over time. Before choosing an action each round, the agent can purchase noisy samples about the state of the world. The agent has a budget to spend on these samples, and has flexibility in deciding how to spread that budget across rounds. We investigate the problem of choosing a sampling algorithm that optimizes total expected payoff. For example: is it better to buy samples steadily over time, or to buy samples in batches? We solve for the optimal policy, and show that it is a natural instantiation of the latter. Under a more general model that includes per-round fixed costs, we prove that a variation on this batching policy is a 2-approximation.
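
The steady-versus-batched question can be made concrete with a toy calculation. The sketch below is not the paper's model or its optimal policy; it assumes, purely for illustration, a scalar state following a Gaussian random walk, samples with independent Gaussian noise, and a per-round loss equal to the posterior variance of the state estimate. The function name `average_variance` and the parameters `drift_var`, `noise_var`, and `initial_var` are hypothetical. It evaluates two schedules that use the same total number of samples; which one does better depends on the assumed parameters.

```python
def average_variance(schedule, drift_var=1.0, noise_var=4.0, initial_var=1.0):
    """Average posterior variance over all rounds for a sampling schedule.

    schedule[t] is the number of samples bought in round t.
    Assumed toy model (not from the paper): the state is a scalar Gaussian
    random walk, and each sample is the state plus independent Gaussian noise.
    """
    var = initial_var
    total = 0.0
    for samples in schedule:
        # The state drifts before the round, so uncertainty grows.
        var += drift_var
        if samples > 0:
            # Gaussian update: precisions add, one term per sample.
            var = 1.0 / (1.0 / var + samples / noise_var)
        total += var
    return total / len(schedule)


rounds = 12
# Steady: one sample every round.
steady = [1] * rounds
# Batched: three samples every third round (same total budget).
batched = [3 if t % 3 == 0 else 0 for t in range(rounds)]

print("steady :", average_variance(steady))
print("batched:", average_variance(batched))
```

Changing `drift_var`, `noise_var`, or the batch length shifts the comparison, so the printed numbers illustrate the trade-off only under these assumptions.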

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Markov decision processes
Keywords
  • Online Algorithms
  • Value of Data
  • Markov Processes



