Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine

Authors: Clément L. Canonne, Yucheng Sun



Author Details

Clément L. Canonne
  • University of Sydney, School of Computer Science, Australia
Yucheng Sun
  • ETH Zürich, Switzerland

Cite As

Clément L. Canonne and Yucheng Sun. Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 23:1-23:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.23

Abstract

Private closeness testing asks to decide whether the underlying probability distributions of two sensitive datasets are identical or differ significantly in statistical distance, while guaranteeing (differential) privacy of the data. As in most (if not all) distribution testing questions studied under privacy constraints, however, previous work assumes that the two datasets are equally sensitive, i.e., must be provided the same privacy guarantees. This is often an unrealistic assumption, as different sources of data come with different privacy requirements; as a result, known closeness testing algorithms might be unnecessarily conservative, "paying" too high a privacy budget for half of the data. In this work, we initiate the study of the closeness testing problem under heterogeneous privacy constraints, where the two datasets come with distinct privacy requirements. We formalize the question and provide algorithms under the three most widely used differential privacy settings, with a particular focus on the local and shuffle models of privacy, and show that one can indeed achieve better sample efficiency when taking into account the two different "epsilon" requirements.
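To make the heterogeneous setting concrete, here is a minimal Python sketch of a naive baseline in the local model. It assumes a domain of size k and uses k-ary randomized response as the local randomizer (a standard mechanism, not necessarily the one used in the paper). Each dataset is privatized under its own epsilon, the two histograms are debiased, and the plug-in total variation distance is compared against a threshold. The function names and the `threshold` parameter are hypothetical, and this baseline is not meant to match the sample complexity of the paper's algorithms.

```python
import math
import random

# Illustrative sketch only (not the algorithm from the paper): a naive
# locally private closeness test in which each dataset has its own epsilon.


def k_rr(x, k, eps, rng):
    """k-ary randomized response: report x w.p. e^eps/(e^eps+k-1),
    otherwise report one of the other k-1 symbols uniformly at random.
    Each individual report is eps-locally differentially private."""
    keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < keep:
        return x
    j = rng.randrange(k - 1)       # index among the k-1 other symbols
    return j + 1 if j >= x else j  # skip over the true symbol x


def debiased_histogram(reports, k, eps):
    """Unbiased estimate of the input distribution from k-RR reports."""
    denom = math.exp(eps) + k - 1
    a = math.exp(eps) / denom  # P(report j | true symbol j)
    b = 1.0 / denom            # P(report j | true symbol != j)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    n = len(reports)
    return [(c / n - b) / (a - b) for c in counts]


def estimated_tv(samples1, eps1, samples2, eps2, k, rng):
    """Privatize each dataset under its own epsilon, then return the plug-in
    estimate of the total variation distance between the debiased histograms."""
    h1 = debiased_histogram([k_rr(x, k, eps1, rng) for x in samples1], k, eps1)
    h2 = debiased_histogram([k_rr(x, k, eps2, rng) for x in samples2], k, eps2)
    return 0.5 * sum(abs(u - v) for u, v in zip(h1, h2))


def naive_private_closeness_test(samples1, eps1, samples2, eps2, k, threshold, rng):
    """Return 'different' if the estimated TV distance exceeds the
    (hypothetical) threshold, and 'same' otherwise."""
    tv = estimated_tv(samples1, eps1, samples2, eps2, k, rng)
    return "different" if tv > threshold else "same"


if __name__ == "__main__":
    rng = random.Random(0)
    k, n = 4, 20000
    uniform = [0.25] * k
    skewed = [0.7, 0.1, 0.1, 0.1]
    a1 = rng.choices(range(k), weights=uniform, k=n)
    a2 = rng.choices(range(k), weights=uniform, k=n)
    b2 = rng.choices(range(k), weights=skewed, k=n)
    # Dataset 1 is more sensitive (eps1 = 1) than dataset 2 (eps2 = 4).
    print(naive_private_closeness_test(a1, 1.0, a2, 4.0, k, 0.2, rng))
    print(naive_private_closeness_test(a1, 1.0, b2, 4.0, k, 0.2, rng))
```

The less sensitive dataset (larger epsilon) yields a much less noisy histogram: with k = 4, the debiasing factor 1/(a - b) is about 3.33 for epsilon = 1 but only about 1.07 for epsilon = 4. This illustrates why privatizing both datasets at the smaller epsilon can be wasteful.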

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Security and privacy
Keywords
  • differential privacy
  • distribution testing
  • local privacy
  • shuffle privacy
