Buying Data over Time: Approximately Optimal Strategies for Dynamic Data-Driven Decisions

Immorlica, Nicole; Kash, Ian A.; Lucier, Brendan

doi:10.4230/LIPIcs.ITCS.2021.77

Abstract

We consider a model where an agent has a repeated decision to make and wishes to maximize their total payoff. Payoffs are influenced by an action taken by the agent, but also by an unknown state of the world that evolves over time. Before choosing an action each round, the agent can purchase noisy samples about the state of the world. The agent has a budget to spend on these samples, and has flexibility in deciding how to spread that budget across rounds. We investigate the problem of choosing a sampling algorithm that optimizes total expected payoff. For example: is it better to buy samples steadily over time, or to buy samples in batches? We solve for the optimal policy, and show that it is a natural instantiation of the latter. Under a more general model that includes per-round fixed costs, we prove that a variation on this batching policy is a 2-approximation.

