Private Data Stream Analysis for Universal Symmetric Norm Estimation

Authors Vladimir Braverman, Joel Manning, Zhiwei Steven Wu, Samson Zhou



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2023.45.pdf
  • Filesize: 0.82 MB
  • 24 pages

Document Identifiers

Author Details

Vladimir Braverman
  • Rice University, Houston, TX, USA
Joel Manning
  • Carnegie Mellon University, Pittsburgh, PA, USA
Zhiwei Steven Wu
  • Carnegie Mellon University, Pittsburgh, PA, USA
Samson Zhou
  • University of California Berkeley, CA, USA
  • Rice University, Houston, TX, USA

Cite As Get BibTex

Vladimir Braverman, Joel Manning, Zhiwei Steven Wu, and Samson Zhou. Private Data Stream Analysis for Universal Symmetric Norm Estimation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 45:1-45:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2023.45

Abstract

We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise permutations on an input data stream and include L_p norms, k-support norms, top-k norms, and the box norm as special cases. Although it may be possible to design and analyze a separate mechanism for each symmetric norm, we propose a general parametrizable framework that differentially privately releases a number of sufficient statistics from which the approximation of all symmetric norms can be simultaneously computed. Our framework partitions the coordinates of the underlying frequency vector into different levels based on their magnitude and releases approximate frequencies for the "heavy" coordinates in important levels and releases approximate level sizes for the "light" coordinates in important levels. Surprisingly, our mechanism allows for the release of an arbitrary number of symmetric norm approximations without any overhead or additional loss in privacy. Moreover, our mechanism permits (1+α)-approximation to each of the symmetric norms and can be implemented using sublinear space in the streaming model for many regimes of the accuracy and privacy parameters.

Subject Classification

ACM Subject Classification
  • Security and privacy → Usability in security and privacy
Keywords
  • Differential privacy
  • norm estimation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jayadev Acharya and Ziteng Sun. Communication complexity in locally private distribution estimation and heavy hitters. In Proceedings of the 36th International Conference on Machine Learning, ICML, pages 51-60, 2019. Google Scholar
  2. Gergely Ács, Claude Castelluccia, and Rui Chen. Differentially private histogram publishing through lossy compression. In 12th IEEE International Conference on Data Mining, ICDM, pages 1-10, 2012. Google Scholar
  3. Aditya Akella, Ashwin Bharambe, Mike Reiter, and Srinivasan Seshan. Detecting ddos attacks on isp networks. In Proceedings of the Twenty-Second ACM SIGMOD/PODS Workshop on Management and Processing of Data Streams, pages 1-3, 2003. Google Scholar
  4. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. Google Scholar
  5. Alexandr Andoni. High frequency moments via max-stability. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pages 6364-6368, 2017. Google Scholar
  6. Alexandr Andoni, Huy L. Nguyen, Aleksandar Nikolov, Ilya P. Razenshteyn, and Erik Waingarten. Approximate near neighbors for general symmetric norms. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theoryof Computing, STOC, pages 902-913, 2017. Google Scholar
  7. Andreas Argyriou, Rina Foygel, and Nathan Srebro. Sparse prediction with the k-support norm. In Advances in Neural Information Processing Systems 25: Annual Conference on Neural Information Processing Systems, pages 1466-1474, 2012. Google Scholar
  8. Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Thakurta. Practical locally private heavy hitters. J. Mach. Learn. Res., 21:16:1-16:42, 2020. Google Scholar
  9. Raef Bassily and Adam D. Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC, pages 127-135, 2015. Google Scholar
  10. Rajendra Bhatia. Matrix analysis, volume 169. Springer Science & Business Media, 2013. Google Scholar
  11. Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, Robert Krauthgamer, and Lin F. Yang. Streaming symmetric norms via measure concentration. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 716-729, 2017. Google Scholar
  12. Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The johnson-lindenstrauss transform itself preserves differential privacy. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS, pages 410-419, 2012. Google Scholar
  13. Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee, and Samson Zhou. How to make your approximation algorithm private: A black-box differentially-private transformation for tunable approximation algorithms of functions with low sensitivity. CoRR, abs/2210.03831, 2022. Google Scholar
  14. Jeremiah Blocki, Seunghoon Lee, Tamalika Mukherjee, and Samson Zhou. Differentially private L₂-heavy hitters in the sliding window model. In The Eleventh International Conference on Learning Representations, ICLR, 2023. Google Scholar
  15. Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff. Bptree: An l₂ heavy hitters algorithm using constant memory. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pages 361-376, 2017. Google Scholar
  16. Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, and David P. Woodruff. Beating countsketch for heavy hitters in insertion streams. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 740-753, 2016. Google Scholar
  17. Vladimir Braverman, Elena Grigorescu, Harry Lang, David P. Woodruff, and Samson Zhou. Nearly optimal distinct elements and heavy hitters on sliding windows. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 7:1-7:22, 2018. Google Scholar
  18. Vladimir Braverman, Emanuele Viola, David P. Woodruff, and Lin F. Yang. Revisiting frequency moment estimation in random order streams. In 45th International Colloquium on Automata, Languages, and Programming, ICALP, pages 25:1-25:14, 2018. Google Scholar
  19. Vladimir Braverman, Viska Wei, and Samson Zhou. Symmetric norm estimation and regression on sliding windows. In Computing and Combinatorics - 27th International Conference, COCOON, Proceedings, pages 528-539, 2021. Google Scholar
  20. Leo Breiman. Random forests. Machine learning, 45(1):5-32, 2001. Google Scholar
  21. Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, and Uthaipon Tantipongpipat. Fast and memory efficient differentially private-sgd via JL projections. CoRR, abs/2102.03013, 2021. Google Scholar
  22. Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy. ACM Trans. Algorithms, 15(4):51:1-51:40, 2019. Google Scholar
  23. T.-H. Hubert Chan, Mingfei Li, Elaine Shi, and Wenchang Xu. Differentially private continual monitoring of heavy hitters from distributed streams. In Privacy Enhancing Technologies - 12th International Symposium, PETS Proceedings, pages 140-159, 2012. Google Scholar
  24. Moses Charikar, Kevin C. Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theor. Comput. Sci., 312(1):3-15, 2004. Google Scholar
  25. Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. Proc. Priv. Enhancing Technol., 2020(3):153-174, 2020. Google Scholar
  26. Graham Cormode, S. Muthukrishnan, and Irina Rozenbaum. Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In Proceedings of the 31st International Conference on Very Large Data Bases, pages 25-36, 2005. Google Scholar
  27. Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pages 3571-3580, 2017. Google Scholar
  28. Itai Dinur, Uri Stemmer, David P. Woodruff, and Samson Zhou. On differential privacy and adaptive data analysis with bounded space. In Advances in Cryptology - EUROCRYPT - 42nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Proceedings, Part III, pages 35-65, 2023. Google Scholar
  29. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC, Proceedings, pages 265-284, 2006. Google Scholar
  30. Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin. Pan-private streaming algorithms. In Innovations in Computer Science - ICS. Proceedings, pages 66-80, 2010. Google Scholar
  31. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211-407, 2014. Google Scholar
  32. Cristian Estan, George Varghese, and Mike Fisk. Bitmap algorithms for counting active flows on high speed links. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 153-166, 2003. Google Scholar
  33. Sheldon J. Finkelstein, Mario Schkolnick, and Paolo Tiberio. Physical database design for relational databases. ACM Trans. Database Syst., 13(1):91-128, 1988. Google Scholar
  34. Sumit Ganguly. Taylor polynomial estimator for estimating frequency moments. In Automata, Languages, and Programming - 42nd International Colloquium, ICALP Proceedings, Part I, pages 542-553, 2015. Google Scholar
  35. Sumit Ganguly and David P. Woodruff. High probability frequency moment sketches. In 45th International Colloquium on Automata, Languages, and Programming, ICALP, pages 58:1-58:15, 2018. Google Scholar
  36. Corrado Gini. Variabilità e mutabilità. Reprinted in Memorie di metodologica statistica, 1912. Google Scholar
  37. Nicholas J. A. Harvey, Jelani Nelson, and Krzysztof Onak. Sketching and streaming entropy via approximation theory. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pages 489-498, 2008. Google Scholar
  38. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307-323, 2006. Google Scholar
  39. Piotr Indyk and Andrew McGregor. Declaring independence via the sketching of sketches. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 737-745, 2008. Google Scholar
  40. Piotr Indyk and David P. Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pages 202-208, 2005. Google Scholar
  41. Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC, pages 745-754, 2011. Google Scholar
  42. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1161-1178, 2010. Google Scholar
  43. Bo'az Klartag and Roman Vershynin. Small ball probability and dvoretzky’s theorem. Israel Journal of Mathematics, 157(1):193-207, 2007. Google Scholar
  44. Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, and Yan Chen. Sketch-based change detection: Methods, evaluation, and applications. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 234-247, 2003. Google Scholar
  45. Ping Li. Estimators and tail bounds for dimension reduction in l_α (0 less α ≤ 2) using stable random projections. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 10-19, 2008. Google Scholar
  46. Yi Li and David P. Woodruff. A tight lower bound for high frequency moment estimation with small error. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 16th International Workshop, APPROX 2013, and 17th International Workshop, RANDOM. Proceedings, pages 623-638, 2013. Google Scholar
  47. Yi Li and David P. Woodruff. Input-sparsity low rank approximation in schatten norm. In Proceedings of the 37th International Conference on Machine Learning, ICML, pages 6001-6009, 2020. Google Scholar
  48. Zaoxing Liu, Antonis Manousis, Gregory Vorsanger, Vyas Sekar, and Vladimir Braverman. One sketch to rule them all: Rethinking network flow monitoring with univmon. In Proceedings of the ACM SIGCOMM 2016 Conference, pages 101-114, 2016. Google Scholar
  49. Zaoxing Liu, Samson Zhou, Ori Rottenstreich, Vladimir Braverman, and Jennifer Rexford. Memory-efficient performance monitoring on programmable switches with lean algorithms. In 1st Symposium on Algorithmic Principles of Computer Systems, APOCS, pages 31-44, 2020. Google Scholar
  50. Max O Lorenz. Methods of measuring the concentration of wealth. Publications of the American statistical association, 9(70):209-219, 1905. Google Scholar
  51. Andrew M. McDonald, Massimiliano Pontil, and Dimitris Stamos. Spectral k-support norm regularization. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pages 3644-3652, 2014. Google Scholar
  52. Vitali D Milman and Gideon Schechtman. Asymptotic theory of finite dimensional normed spaces: Isoperimetric inequalities in riemannian manifolds, volume 1200. Springer, 2009. Google Scholar
  53. Darakhshan J. Mir, S. Muthukrishnan, Aleksandar Nikolov, and Rebecca N. Wright. Pan-private algorithms via statistics on sketches. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pages 37-48, 2011. Google Scholar
  54. Noam Nisan. Pseudorandom generators for space-bounded computation. Comb., 12(4):449-461, 1992. Google Scholar
  55. Christopher R Palmer, Georgos Siganos, Michalis Faloutsos, Christos Faloutsos, and Phillip B Gibbons. The connectivity and fault-tolerance of the internet topology, 2001. Google Scholar
  56. Gang Qiao, Weijie J. Su, and Li Zhang. Oneshot differentially private top-k selection. In Proceedings of the 38th International Conference on Machine Learning, ICML, pages 8672-8681, 2021. Google Scholar
  57. Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pages 23-34, 1979. Google Scholar
  58. Or Sheffet. Differentially private ordinary least squares. J. Priv. Confidentiality, 9(1), 2019. Google Scholar
  59. Adam D. Smith, Shuang Song, and Abhradeep Thakurta. The flajolet-martin sketch itself preserves differential privacy: Private counting with minimal space. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, NeurIPS, 2020. Google Scholar
  60. Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, and Peilin Zhong. Efficient symmetric norm regression via linear sketching. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, NeurIPS, pages 828-838, 2019. Google Scholar
  61. Jakub Tetek. Additive noise mechanisms for making randomized approximation algorithms differentially private. CoRR, abs/2211.03695, 2022. Google Scholar
  62. Mikkel Thorup and Yin Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 615-624, 2004. Google Scholar
  63. Lun Wang, Iosif Pinelis, and Dawn Song. Differentially private fractional frequency moments estimation with polylogarithmic space. In The Tenth International Conference on Learning Representations, ICLR, 2022. Google Scholar
  64. Tianhao Wang, Milan Lopuhaä-Zwakenberg, Zitao Li, Boris Skoric, and Ninghui Li. Locally differentially private frequency estimation with consistency. In 27th Annual Network and Distributed System Security Symposium, NDSS, 2020. Google Scholar
  65. David P. Woodruff and Samson Zhou. Separations for estimating large frequency moments on data streams. In 48th International Colloquium on Automata, Languages, and Programming, ICALP, volume 198, pages 112:1-112:21, 2021. Google Scholar
  66. David P. Woodruff and Samson Zhou. Tight bounds for adversarially robust streams and sliding windows via difference estimators. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 1183-1196, 2021. Google Scholar
  67. Bin Wu, Chao Ding, Defeng Sun, and Kim-Chuan Toh. On the moreau-yosida regularization of the vector k-norm related functions. SIAM J. Optim., 24(2):766-794, 2014. Google Scholar
  68. Jia Xu, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, Ge Yu, and Marianne Winslett. Differentially private histogram publication. VLDB J., 22(6):797-822, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail