Differentially Private Continual Releases of Streaming Frequency Moment Estimations

Authors Alessandro Epasto, Jieming Mao, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, Peilin Zhong



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2023.48.pdf
  • Filesize: 0.86 MB
  • 24 pages

Document Identifiers

Author Details

Alessandro Epasto
  • Google, New York, NY, USA
Jieming Mao
  • Google, New York, NY, USA
Andres Munoz Medina
  • Google, New York, NY, USA
Vahab Mirrokni
  • Google, New York, NY, USA
Sergei Vassilvitskii
  • Google, New York, NY, USA
Peilin Zhong
  • Google, New York, NY, USA

Cite AsGet BibTex

Alessandro Epasto, Jieming Mao, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, and Peilin Zhong. Differentially Private Continual Releases of Streaming Frequency Moment Estimations. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 48:1-48:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ITCS.2023.48

Abstract

The streaming model of computation is a popular approach for working with large-scale data. In this setting, there is a stream of items and the goal is to compute the desired quantities (usually data statistics) while making a single pass through the stream and using as little space as possible. Motivated by the importance of data privacy, we develop differentially private streaming algorithms under the continual release setting, where the union of outputs of the algorithm at every timestamp must be differentially private. Specifically, we study the fundamental 𝓁_p (p ∈ [0,+∞)) frequency moment estimation problem under this setting, and give an ε-DP algorithm that achieves (1+η)-relative approximation (∀ η ∈ (0,1)) with polylog(Tn) additive error and uses polylog(Tn)⋅ max(1, n^{1-2/p}) space, where T is the length of the stream and n is the size of the universe of elements. Our space is near optimal up to poly-logarithmic factors even in the non-private setting. To obtain our results, we first reduce several primitives under the differentially private continual release model, such as counting distinct elements, heavy hitters and counting low frequency elements, to the simpler, counting/summing problems in the same setting. Based on these primitives, we develop a differentially private continual release level set estimation approach to address the 𝓁_p frequency moment estimation problem. We also provide a simple extension of our results to the harder sliding window model, where the statistics must be maintained over the past W data items.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Security and privacy
Keywords
  • Differential Privacy
  • Continual Release
  • Sliding Window
  • Streaming Algorithms
  • Distinct Elements
  • Frequency Moment Estimation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Naman Agarwal and Karan Singh. The price of differential privacy for online learning. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 32-40. PMLR, 06-11 August 2017. URL: https://proceedings.mlr.press/v70/agarwal17a.html.
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20-29, 1996. Google Scholar
  3. Alexandr Andoni. High frequency moments via max-stability. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6364-6368. IEEE, 2017. Google Scholar
  4. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sampling. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 363-372. IEEE, 2011. Google Scholar
  5. Ziv Bar-Yossef, Thathachar S Jayram, Ravi Kumar, and D Sivakumar. An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences, 68(4):702-732, 2004. Google Scholar
  6. Mihir Bellare and John Rompel. Randomness-efficient oblivious sampling. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 276-287. IEEE, 1994. Google Scholar
  7. Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The johnson-lindenstrauss transform itself preserves differential privacy. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 410-419. IEEE Computer Society, 2012. URL: https://doi.org/10.1109/FOCS.2012.67.
  8. Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee, and Samson Zhou. How to make your approximation algorithm private: A black-box differentially-private transformation for tunable approximation algorithms of functions with low sensitivity. arXiv preprint, 2022. URL: http://arxiv.org/abs/2210.03831.
  9. Jean Bolot, Nadia Fawaz, S. Muthukrishnan, Aleksandar Nikolov, and Nina Taft. Private decayed predicate sums on streams. In Proceedings of the 16th International Conference on Database Theory, ICDT '13, pages 284-295, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2448496.2448530.
  10. Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pages 283-293, 2007. URL: https://doi.org/10.1109/FOCS.2007.55.
  11. Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, and Uthaipon Tantipongpipat. Fast and memory efficient differentially private-sgd via JL projections. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 19680-19691, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/a3842ed7b3d0fe3ac263bcabd2999790-Abstract.html.
  12. T-H Hubert Chan, Mingfei Li, Elaine Shi, and Wenchang Xu. Differentially private continual monitoring of heavy hitters from distributed streams. In International Symposium on Privacy Enhancing Technologies Symposium, pages 140-159. Springer, 2012. Google Scholar
  13. Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693-703. Springer, 2002. Google Scholar
  14. Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. Cryptology ePrint Archive, Paper 2020/029, 2020. URL: https://eprint.iacr.org/2020/029.
  15. Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, 2005. Google Scholar
  16. Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794-1813, 2002. Google Scholar
  17. Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities. In European Symposium on Algorithms, pages 605-617. Springer, 2003. Google Scholar
  18. Cynthia Dwork. Differential privacy: A survey of results. In International conference on theory and applications of models of computation, pages 1-19. Springer, 2008. Google Scholar
  19. Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Leonard J. Schulman, editor, Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC, pages 715-724. ACM, 2010. Google Scholar
  20. Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N Rothblum, and Sergey Yekhanin. Pan-private streaming algorithms. In ics, pages 66-80, 2010. Google Scholar
  21. Cynthia Dwork, Moni Naor, Omer Reingold, and Guy N. Rothblum. Pure differential privacy for rectangle queries via private partitions. In Proceedings, Part II, of the 21st International Conference on Advances in Cryptology - ASIACRYPT 2015 - Volume 9453, pages 735-751, Berlin, Heidelberg, 2015. Springer-Verlag. URL: https://doi.org/10.1007/978-3-662-48800-3_30.
  22. Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trendsregistered in Theoretical Computer Science, 9(3-4):211-407, 2014. Google Scholar
  23. Hendrik Fichtenberger, Monika Henzinger, and Wolfgang Ost. Differentially private algorithms for graphs under continual observation. In Petra Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, 29th Annual European Symposium on Algorithms, ESA 2021, September 6-8, 2021, Lisbon, Portugal (Virtual Conference), volume 204 of LIPIcs, pages 42:1-42:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ESA.2021.42.
  24. Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science, pages 137-156. Discrete Mathematics and Theoretical Computer Science, 2007. Google Scholar
  25. Philippe Flajolet and G Nigel Martin. Probabilistic counting algorithms for data base applications. Journal of computer and system sciences, 31(2):182-209, 1985. Google Scholar
  26. Sumit Ganguly. Polynomial estimators for high frequency moments. arXiv preprint, 2011. URL: http://arxiv.org/abs/1104.4552.
  27. Hsiang Hsu, Natalia Martinez, Martin Bertran, Guillermo Sapiro, and Flavio P. Calmon. A survey on statistical, information, and estimation—theoretic views on privacy. IEEE BITS the Information Theory Magazine, 1(1):45-56, 2021. Google Scholar
  28. T-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming, pages 405-417, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg. Google Scholar
  29. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307-323, 2006. Google Scholar
  30. Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202-208, 2005. Google Scholar
  31. Palak Jain, Sofya Raskhodnikova, Satchit Sivakumar, and Adam D. Smith. The price of differential privacy under continual observation. CoRR, abs/2112.00828, 2021. URL: http://arxiv.org/abs/2112.00828.
  32. Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Shie Mannor, Nathan Srebro, and Robert C. Williamson, editors, Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 24.1-24.34, Edinburgh, Scotland, 25-27 June 2012. PMLR. URL: https://proceedings.mlr.press/v23/jain12.html.
  33. Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 49-58, 2011. Google Scholar
  34. Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745-754, 2011. Google Scholar
  35. Daniel M Kane, Jelani Nelson, and David P Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161-1178. SIAM, 2010. Google Scholar
  36. Ping Li. Estimators and tail bounds for dimension reduction in l α (0< α ≤ 2) using stable random projections. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, pages 10-19, 2008. Google Scholar
  37. Yi Li and David P Woodruff. A tight lower bound for high frequency moment estimation with small error. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 623-638. Springer, 2013. Google Scholar
  38. Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 37-48, 2011. Google Scholar
  39. Jayadev Misra and David Gries. Finding repeated elements. Science of computer programming, 2(2):143-152, 1982. Google Scholar
  40. Robert Morris. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840-842, 1978. Google Scholar
  41. Victor Perrier, Hassan Jameel Asghar, and Dali Kaafar. Private continual release of real-valued data streams. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society, 2019. URL: https://www.ndss-symposium.org/ndss-paper/private-continual-release-of-real-valued-data-streams/.
  42. Michael Saks and Xiaodong Sun. Space lower bounds for distance approximation in the data stream model. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 360-369, 2002. Google Scholar
  43. Or Sheffet. Differentially private ordinary least squares. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 3105-3114. PMLR, 2017. URL: http://proceedings.mlr.press/v70/sheffet17a.html.
  44. Adam Smith and Abhradeep Thakurta. (nearly) optimal algorithms for private online learning in full-information and bandit settings. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13, pages 2733-2741, Red Hook, NY, USA, 2013. Curran Associates Inc. Google Scholar
  45. Adam D. Smith, Shuang Song, and Abhradeep Thakurta. The flajolet-martin sketch itself preserves differential privacy: Private counting with minimal space. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/e3019767b1b23f82883c9850356b71d6-Abstract.html.
  46. Shuang Song, Susan Little, Sanjay Mehta, Staal Vinterbo, and Kamalika Chaudhuri. Differentially private continual release of graph statistics, 2018. URL: https://doi.org/10.48550/ARXIV.1809.02575.
  47. Robert H. Morris Sr. Counting large numbers of events in small registers. Commun. ACM, 21(10):840-842, 1978. Google Scholar
  48. Mikkel Thorup and Yin Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In SODA, volume 4, pages 615-624, 2004. Google Scholar
  49. Jalaj Upadhyay. Sublinear space private algorithms under the sliding window model. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6363-6372. PMLR, 09-15 June 2019. URL: https://proceedings.mlr.press/v97/upadhyay19a.html.
  50. Lun Wang, Iosif Pinelis, and Dawn Song. Differentially private fractional frequency moments estimation with polylogarithmic space. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL: https://openreview.net/forum?id=7I8LPkcx8V.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail