Private Counting of Distinct and k-Occurring Items in Time Windows

Authors Badih Ghazi, Ravi Kumar, Jelani Nelson, Pasin Manurangsi



Author Details

Badih Ghazi
  • Google, Mountain View, CA, USA
Ravi Kumar
  • Google, Mountain View, CA, USA
Jelani Nelson
  • UC Berkeley, CA, USA
  • Google, Mountain View, CA, USA
Pasin Manurangsi
  • Google, Mountain View, CA, USA

Cite As

Badih Ghazi, Ravi Kumar, Jelani Nelson, and Pasin Manurangsi. Private Counting of Distinct and k-Occurring Items in Time Windows. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 55:1-55:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ITCS.2023.55

Abstract

In this work, we study the task of estimating the numbers of distinct and k-occurring items in a time window under the constraint of differential privacy (DP). We consider several variants depending on whether the queries are on general time windows (between times t₁ and t₂), or are restricted to being cumulative (between times 1 and t₂), and depending on whether the DP neighboring relation is event-level or the more stringent item-level. We obtain nearly tight upper and lower bounds on the errors of DP algorithms for these problems. En route, we obtain an event-level DP algorithm for estimating, at each time step, the number of distinct items seen over the last W updates with error polylogarithmic in W; this answers an open question of Bolot et al. (ICDT 2013).
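To make the two quantities in the abstract concrete, the following is a minimal, non-private sketch of the exact queries that a DP algorithm would be approximating. It is not the paper's algorithm and adds no noise. It assumes a stream with one item per time step, 1-indexed inclusive windows, and that "k-occurring" means occurring exactly k times in the window; the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def window_counts(stream: Sequence[Hashable], t1: int, t2: int) -> Counter:
    """Occurrence count of each item at time steps t1..t2 (1-indexed, inclusive)."""
    if not 1 <= t1 <= t2 <= len(stream):
        raise ValueError("need 1 <= t1 <= t2 <= len(stream)")
    return Counter(stream[t1 - 1:t2])


def count_distinct(stream: Sequence[Hashable], t1: int, t2: int) -> int:
    """Exact number of distinct items in the window [t1, t2]."""
    return len(window_counts(stream, t1, t2))


def count_k_occurring(stream: Sequence[Hashable], k: int, t1: int, t2: int) -> int:
    """Exact number of items occurring exactly k times in [t1, t2] (assumed definition)."""
    return sum(1 for c in window_counts(stream, t1, t2).values() if c == k)


stream = ["a", "b", "a", "c", "b", "a"]

# General time window: both endpoints are free.
general = count_distinct(stream, 2, 5)

# Cumulative window: the left endpoint is fixed at time 1.
cumulative = count_distinct(stream, 1, len(stream))

# Items seen exactly twice over the whole stream.
twice = count_k_occurring(stream, 2, 1, len(stream))

# An event-level neighbor differs in a single update (here, time step 4).
event_neighbor = stream[:3] + ["a"] + stream[4:]
neighbor_cumulative = count_distinct(event_neighbor, 1, len(event_neighbor))

print(general, cumulative, twice, neighbor_cumulative)
```

In this toy stream, changing the single update at time step 4 changes the cumulative distinct count by one. Under the item-level relation, all updates involving one item may change between neighboring streams, which is why the abstract calls it the more stringent notion.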

Subject Classification

ACM Subject Classification
  • Theory of computation → Theory of database privacy and security
Keywords
  • Differential Privacy
  • Algorithms
  • Distinct Elements
  • Time Windows

References

  1. Swarup Acharya, Phillip B Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. The Aqua approximate query answering system. In SIGMOD, pages 574-576, 1999.
  2. Naman Agarwal and Karan Singh. The price of differential privacy for online learning. In ICML, pages 32-40, 2017.
  3. Aditya Akella, Ashwin Bharambe, Mike Reiter, and Srinivasan Seshan. Detecting DDoS attacks on ISP networks. In Workshop on Management and Processing of Data Streams, 2003.
  4. Daniel N Baker and Ben Langmead. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biology, 20(1):1-12, 2019.
  5. Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao. Connecting robust shuffle privacy and pan-privacy. In SODA, pages 2384-2403, 2021.
  6. Jean Bolot, Nadia Fawaz, S. Muthukrishnan, Aleksandar Nikolov, and Nina Taft. Private decayed predicate sums on streams. In ICDT, pages 284-295, 2013.
  7. Florian P Breitwieser, DN Baker, and Steven L Salzberg. Krakenuniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology, 19(1):1-10, 2018.
  8. Mark Bun, Jonathan R. Ullman, and Salil P. Vadhan. Fingerprinting codes and the price of approximate differential privacy. SIAM J. Comput., 47(5):1888-1938, 2018.
  9. Adrian Rivera Cardoso and Ryan Rogers. Differentially private histograms under continual observation: Streaming selection into the unknown. In AISTATS, pages 2397-2419, 2022.
  10. T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Trans. Inf. Syst. Secur., 14(3):26:1-26:24, 2011.
  11. Lijie Chen, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. On distributed differential privacy and counting distinct elements. In ITCS, pages 56:1-56:18, 2021.
  12. Steven Chen. Ara, tell me what my campaign forecast looks like for today. URL: https://www.quantcast.com/blog/ara-tell-me-what-my-campaign-forecast-looks-like-for-today/, 2022.
  13. Yunjae Cheong, Federico de Gregorio, and Kihan Kim. The power of reach and frequency in the age of digital advertising: Offline and online media demand different metrics. J. Advertising Res., 50, 2010.
  14. Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In PODS, pages 202-210, 2003.
  15. Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486-503, 2006.
  16. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265-284, 2006.
  17. Cynthia Dwork, Frank McSherry, and Kunal Talwar. The price of privacy and the limits of LP decoding. In STOC, pages 85-94, 2007.
  18. Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In STOC, pages 715-724, 2010.
  19. Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin. Pan-private streaming algorithms. In ICS, pages 66-80, 2010.
  20. Cynthia Dwork, Moni Naor, Omer Reingold, and Guy N. Rothblum. Pure differential privacy for rectangle queries via private partitions. In ASIACRYPT, pages 735-751, 2015.
  21. Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211-407, 2014.
  22. Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and differential privacy. In FOCS, pages 51-60, 2010.
  23. Cristian Estan, George Varghese, and Mike Fisk. Bitmap algorithms for counting active flows on high speed links. In IMC, pages 153-166, 2003.
  24. Hendrik Fichtenberger, Monika Henzinger, and Wolfgang Ost. Differentially private algorithms for graphs under continual observation. In ESA, pages 42:1-42:16, 2021.
  25. Badih Ghazi, Ben Kreuter, Ravi Kumar, Pasin Manurangsi, Jiayu Peng, Evgeny Skvortsov, Yao Wang, and Craig Wright. Multiparty reach and frequency histogram: Private, secure, and practical. PoPETS, 2022(1):373-395, 2022.
  26. Abhradeep Guha Thakurta and Adam Smith. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. NIPS, 26, 2013.
  27. Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In STOC, pages 705-714, 2010.
  28. Monika Henzinger and Jalaj Upadhyay. Constant matters: Fine-grained complexity of differentially private continual observation using completely bounded norms. arXiv, 2022. URL: http://arxiv.org/abs/2202.11205.
  29. Stefan Heule, Marc Nunkesser, and Alexander Hall. Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In EDBT, pages 683-692, 2013.
  30. Palak Jain, Sofya Raskhodnikova, Satchit Sivakumar, and Adam Smith. The price of differential privacy under continual observation. In TPDP@ICML, 2022.
  31. Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In COLT, pages 1-34, 2012.
  32. John D. Leckenby and Jongpil Hong. Using reach/frequency for web media planning. J. Advertising Res., 38, 1998.
  33. Brendan McMahan, Keith Rush, and Abhradeep Guha Thakurta. Private online prefix sums via optimal matrix factorizations. arXiv, 2022. URL: http://arxiv.org/abs/2202.08312.
  34. Frank McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. Commun. ACM, 53(9):89-97, 2010.
  35. Darakhshan J. Mir, S. Muthukrishnan, Aleksandar Nikolov, and Rebecca N. Wright. Pan-private algorithms via statistics on sketches. In Maurizio Lenzerini and Thomas Schwentick, editors, PODS, pages 37-48, 2011.
  36. S. Muthukrishnan and Aleksandar Nikolov. Optimal private halfspace counting via discrepancy. In STOC, pages 1285-1292, 2012.
  37. Sriram Padmanabhan, Bishwaranjan Bhattacharjee, Tim Malkemus, Leslie Cranston, and Matthew Huras. Multi-dimensional clustering: A new data layout scheme in DB2. In SIGMOD, pages 637-641, 2003.
  38. Christopher R Palmer, Phillip B Gibbons, and Christos Faloutsos. ANF: a fast and scalable tool for data mining in massive graphs. In KDD, pages 81-90, 2002.
  39. Victor Perrier, Hassan Jameel Asghar, and Dali Kaafar. Private continual release of real-valued data streams. In NDSS, 2019.
  40. Viswanath Poosala, Peter J Haas, Yannis E Ioannidis, and Eugene J Shekita. Improved histograms for selectivity estimation of range predicates. SIGMOD Record, 25(2):294-305, 1996.
  41. P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A Lorie, and Thomas G Price. Access path selection in a relational database management system. In Readings in Artificial Intelligence and Databases, pages 511-522. Morgan Kaufmann, 1989.
  42. Amit Shukla, Prasad Deshpande, Jeffrey F Naughton, and Karthikeyan Ramasamy. Storage estimation for multidimensional aggregates in the presence of hierarchies. In VLDB, pages 522-531, 1996.
  43. Adam Smith, Shuang Song, and Abhradeep Guha Thakurta. The Flajolet-Martin sketch itself preserves differential privacy: Private counting with minimal space. In NeurIPS, pages 19561-19572, 2020.
  44. Shuang Song, Susan Little, Sanjay Mehta, Staal Vinterbo, and Kamalika Chaudhuri. Differentially private continual release of graph statistics. arXiv, 2018. URL: http://arxiv.org/abs/1809.02575.
  45. Thomas Steinke and Jonathan R. Ullman. Between pure and approximate differential privacy. J. Priv. Confidentiality, 7(2), 2016.
  46. Salil Vadhan. The Complexity of Differential Privacy. Springer, 2017.
  47. Wikipedia contributors. Effective frequency - Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Effective_frequency&oldid=1021978492, 2021. [Online; accessed 18-May-2022].