Private Counting of Distinct Elements in the Turnstile Model and Extensions

Authors: Monika Henzinger, A. R. Sricharan, Teresa Anna Steiner

Author Details

Monika Henzinger
  • Institute of Science and Technology, Klosterneuburg, Austria
A. R. Sricharan
  • Faculty of Computer Science, Doctoral School Computer Science, University of Vienna, Austria
Teresa Anna Steiner
  • Technical University of Denmark, Lyngby, Denmark

Cite As

Monika Henzinger, A. R. Sricharan, and Teresa Anna Steiner. Private Counting of Distinct Elements in the Turnstile Model and Extensions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 40:1-40:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2024.40

Abstract

Privately counting distinct elements in a stream is a fundamental data analysis problem with many applications in machine learning. In the turnstile model, Jain et al. [NeurIPS2023] initiated the study of this problem parameterized by the maximum flippancy of any element, i.e., the number of times that the count of an element changes from 0 to above 0 or vice versa. They give an item-level (ε,δ)-differentially private algorithm whose additive error is tight with respect to that parameterization. In this work, we show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for both item-level (ε,δ)-differential privacy and item-level ε-differential privacy with respect to a different parameterization, namely the sum of all flippancies. Our second result shows that, for a large class of algorithms that includes all existing differentially private algorithms for this problem, the lower bound from item-level differential privacy extends to event-level differential privacy. This partially answers an open question by Jain et al. [NeurIPS2023].
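To make the two parameterizations concrete, the following is a minimal, non-private sketch that computes each element's flippancy and the true distinct count over a turnstile stream. It is not the authors' algorithm and adds no noise. It assumes each update is an (element, +1) insertion or (element, -1) deletion and that counts never drop below 0. The function name `flippancies` is hypothetical and used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical update format: (element, +1) for an insertion, (element, -1) for a deletion.
Update = Tuple[Hashable, int]


def flippancies(stream: List[Update]) -> Tuple[Dict[Hashable, int], List[int]]:
    """Return per-element flippancy and the true distinct count after each update.

    Assumption: deletions only remove copies that are present, so counts stay >= 0.
    """
    counts: Dict[Hashable, int] = defaultdict(int)
    flips: Dict[Hashable, int] = defaultdict(int)
    distinct = 0
    history: List[int] = []
    for element, delta in stream:
        was_present = counts[element] > 0
        counts[element] += delta
        if counts[element] < 0:
            raise ValueError(f"count of {element!r} became negative")
        is_present = counts[element] > 0
        if was_present != is_present:
            # Presence toggled (0 -> above 0, or above 0 -> 0): one flip.
            flips[element] += 1
            distinct += 1 if is_present else -1
        history.append(distinct)
    return dict(flips), history


if __name__ == "__main__":
    stream = [("a", 1), ("b", 1), ("a", 1), ("a", -1), ("a", -1), ("b", -1), ("a", 1)]
    flips, history = flippancies(stream)
    print(flips)                      # {'a': 3, 'b': 2}
    print(history)                    # [1, 2, 2, 2, 1, 0, 1]
    print(max(flips.values()))        # maximum flippancy (Jain et al.): 3
    print(sum(flips.values()))        # sum of all flippancies (this work): 5
```

The maximum flippancy bounds how often any single element enters or leaves the set, whereas the sum of all flippancies bounds how many times the true distinct count changes in total. Since each flip moves the count by exactly 1, the sum equals the number of steps at which `history` changes.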

ACM Subject Classification
  • Security and privacy
Keywords
  • differential privacy
  • turnstile model
  • counting distinct elements

References

  1. Aditya Akella, Ashwin Bharambe, Mike Reiter, and Srinivasan Seshan. Detecting DDoS attacks on ISP networks. In Proceedings of the Workshop on Management and Processing of Data Streams, pages 1-2, 2003.
  2. Daniel N Baker and Ben Langmead. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biology, 20:1-12, 2019.
  3. Jean Bolot, Nadia Fawaz, S. Muthukrishnan, Aleksandar Nikolov, and Nina Taft. Private decayed predicate sums on streams. In Proc. 16th ICDT, pages 284-295, 2013. URL: https://doi.org/10.1145/2448496.2448530.
  4. Mark Bun, Jonathan R. Ullman, and Salil P. Vadhan. Fingerprinting codes and the price of approximate differential privacy. SIAM J. Comput., 47(5):1888-1938, 2018. URL: https://doi.org/10.1137/15M1033587.
  5. Vera Clemens, Lars-Christian Schulz, Marten Gartner, and David Hausheer. DDoS detection in P4 using HYPERLOGLOG and COUNTMIN sketches. In Proc. NOMS 2023, pages 1-6, 2023.
  6. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pages 265-284. Springer, 2006. URL: https://doi.org/10.1007/11681878_14.
  7. Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil P. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proc. 41st STOC, pages 381-390, 2009. URL: https://doi.org/10.1145/1536414.1536467.
  8. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211-407, 2014.
  9. Alessandro Epasto, Jieming Mao, Andres Muñoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, and Peilin Zhong. Differentially private continual releases of streaming frequency moment estimations. In Yael Tauman Kalai, editor, Proc. 14th ITCS, pages 48:1-48:24, 2023. URL: https://doi.org/10.4230/LIPIcs.ITCS.2023.48.
  10. Cristian Estan, George Varghese, and Michael E. Fisk. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Trans. Netw., 14(5):925-937, 2006.
  11. Hendrik Fichtenberger, Monika Henzinger, and Lara Ost. Differentially private algorithms for graphs under continual observation. In Proc. 29th ESA, pages 42:1-42:16, 2021.
  12. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182-209, 1985.
  13. Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Proc. 2007 Conference on Analysis of Algorithms, pages 127-146, 2007.
  14. Badih Ghazi, Ravi Kumar, Jelani Nelson, and Pasin Manurangsi. Private counting of distinct and k-occurring items in time windows. In Proc. 14th ITCS, pages 55:1-55:24, 2023. URL: https://doi.org/10.4230/LIPIcs.ITCS.2023.55.
  15. Monika Henzinger, A. R. Sricharan, and Teresa Anna Steiner. Differentially private histogram, predecessor, and set cardinality under continual observation, 2023. URL: https://arxiv.org/abs/2306.10428.
  16. Palak Jain, Iden Kalemaj, Sofya Raskhodnikova, Satchit Sivakumar, and Adam Smith. Counting distinct elements in the turnstile model with differential privacy under continual observation, 2023. URL: https://arxiv.org/abs/2306.06723.
  17. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. 29th PODS, pages 41-52, 2010.
  18. Matti Karppa and Rasmus Pagh. HyperLogLogLog: cardinality estimation with one log more. In Proc. 28th KDD, pages 753-761, 2022.
  19. Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic. In Proc. 11th EDBT, pages 618-629, 2008.
  20. Dingyu Wang and Seth Pettie. Better cardinality estimators for HyperLogLog, PCSA, and beyond. In Proc. 42nd PODS, pages 317-327, 2023.
  21. Lotte Weedage, Nelly Litvak, and Clara Stegehuis. Locating highly connected clusters in large networks with HyperLogLog counters. J. Complex Networks, 9(2), 2021.