Locally Private Histograms in All Privacy Regimes

Authors: Clément L. Canonne, Abigail Gentle

Author Details

Clément L. Canonne
  • School of Computer Science, University of Sydney, Australia
Abigail Gentle
  • School of Computer Science, University of Sydney, Australia

Acknowledgements

The authors would like to thank Guy Blanc for the proof of Lemma 17, and Albert Cheu for insightful discussions regarding the use of amplification by shuffling (Theorem 25). This work was done in part while the authors were visiting the Simons Institute for the Theory of Computing.

Cite As

Clément L. Canonne and Abigail Gentle. Locally Private Histograms in All Privacy Regimes. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 25:1-25:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ITCS.2025.25

Abstract

Frequency estimation, a.k.a. histograms, is a workhorse of data analysis, and as such has been thoroughly studied under differential privacy. In particular, computing histograms in the local model of privacy has been the focus of a fruitful recent line of work, and various algorithms have been proposed, achieving the order-optimal 𝓁_∞ error in the high-privacy (small ε) regime while balancing other considerations such as time- and communication-efficiency. However, to the best of our knowledge, the picture is much less clear when it comes to the medium- or low-privacy regime (large ε), despite its increased relevance in practice. In this paper, we investigate locally private histograms, and the closely related distribution learning task, in this medium-to-low privacy regime, and establish near-tight (and somewhat unexpected) bounds on the 𝓁_∞ error achievable. As a direct corollary of our results, we obtain a protocol for histograms in the shuffle model of differential privacy, with accuracy matching previous algorithms but significantly better message and communication complexity.
Our theoretical findings emerge from a novel analysis, which appears to improve bounds across the board for the locally private histogram problem. We back our theoretical findings with an empirical comparison of existing algorithms in all privacy regimes, to assess their typical performance and behaviour beyond the worst-case setting.
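
To make the setting concrete, here is a minimal sketch of a locally private histogram built on k-ary randomized response. This is a standard baseline mechanism, not the protocol analysed in the paper. The function names, the encoding of the domain as {0, ..., k-1}, and the parameter choices are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def krr_randomize(x, k, eps, rng):
    """Illustrative k-ary randomized response (hypothetical helper).

    Reports the true value x in {0, ..., k-1} with probability
    e^eps / (e^eps + k - 1), and otherwise a uniformly random
    value among the k - 1 other values.
    """
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_true:
        return x
    # Draw uniformly from the k - 1 values different from x.
    y = rng.randrange(k - 1)
    return y if y < x else y + 1


def krr_estimate(reports, k, eps):
    """Debias the observed report frequencies into frequency estimates."""
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)  # Pr[report = true value]
    q = 1.0 / (math.exp(eps) + k - 1)            # Pr[report = a given other value]
    counts = Counter(reports)
    # E[observed frequency of v] = q + (p - q) * f_v, so invert that map.
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    k, eps = 4, 2.0
    data = [0] * 5000 + [1] * 3000 + [2] * 1500 + [3] * 500
    reports = [krr_randomize(x, k, eps, rng) for x in data]
    estimates = krr_estimate(reports, k, eps)
    true_freqs = [0.5, 0.3, 0.15, 0.05]
    linf_error = max(abs(e - t) for e, t in zip(estimates, true_freqs))
    print("estimates:", estimates)
    print("l_inf error:", linf_error)
```

In this sketch, a small ε makes each report close to uniform noise, and the debiasing factor 1/(p − q) inflates the error accordingly; with a large ε most reports are truthful and the estimate approaches the empirical histogram.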

Subject Classification

ACM Subject Classification
  • Security and privacy
  • Security and privacy → Usability in security and privacy
  • Security and privacy → Privacy protections
  • Theory of computation → Theory of database privacy and security
Keywords
  • Differential Privacy
  • Local Differential Privacy
  • Histograms
  • Frequency Estimation
  • Lower Bounds
  • Maximum Error
