ACM Other Conferences

10.1145/acmotherconferences

0000000

10.5555/0000000

Proceedings of the Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)

APPROX/RANDOM 2024

10.4230/LIPIcs.APPROX/RANDOM.2024.73

10002978

Security and privacy

500

10003752.10003809.10010055

Theory of computation~Streaming, sublinear and near linear time algorithms

500

Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private

https://orcid.org/0000-0002-2046-1627

Tětek

Jakub

INSAIT, Sofia, Bulgaria j.tetek@gmail.com Author

16 09 2024

73:1 73:20

The exponential increase in the amount of available data makes taking advantage of them without violating users' privacy one of the fundamental problems of computer science. This question has been investigated thoroughly under the framework of differential privacy. However, most of the literature has not focused on settings where the amount of data is so large that we are not even able to compute the exact answer in the non-private setting (such as in the streaming setting, sublinear-time setting, etc.). This can often make the use of differential privacy unfeasible in practice.

In this paper, we show a general approach for making Monte-Carlo randomized approximation algorithms differentially private. We only need to assume the error R of the approximation algorithm is sufficiently concentrated around 0 (e.g. 𝔼[|R|] is bounded) and that the function being approximated has a small global sensitivity Δ. Specifically, if we have a randomized approximation algorithm with sufficiently concentrated error which has time/space/query complexity T(n,ρ) with ρ being an accuracy parameter, we can generally speaking get an algorithm with the same accuracy and complexity T(n,Θ(ε ρ)) that is ε-differentially private.

Our technical results are as follows. First, we show that if the error is subexponential, then the Laplace mechanism with error magnitude proportional to the sum of the global sensitivity Δ and the subexponential diameter of the error of the algorithm makes the algorithm differentially private. This is true even if the worst-case global sensitivity of the algorithm is large or infinite. We then introduce a new additive noise mechanism, which we call the zero-symmetric Pareto mechanism. We show that using this mechanism, we can make an algorithm differentially private even if we only assume a bound on the first absolute moment of the error 𝔼[|R|].
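As a rough illustration of the first result, the following Python sketch adds Laplace noise whose scale is proportional to the sum of the global sensitivity Δ and the subexponential diameter of the error. The function names, the constant `c`, and the 1/ε scaling of the noise are assumptions made for this sketch; the abstract only states that the noise magnitude is proportional to that sum. The sketch shows the shape of the mechanism and does not by itself establish the privacy guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(estimate, epsilon, sensitivity, subexp_diameter, c=1.0, rng=None):
    """Add Laplace noise to the output of a randomized approximation algorithm.

    estimate        -- output of the (non-private) approximation algorithm
    epsilon         -- privacy parameter, must be positive
    sensitivity     -- global sensitivity (Delta) of the function being approximated
    subexp_diameter -- subexponential diameter of the algorithm's error
    c               -- hypothetical proportionality constant (an assumption)
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0 or subexp_diameter < 0:
        raise ValueError("sensitivity and subexp_diameter must be non-negative")
    if sensitivity + subexp_diameter == 0:
        raise ValueError("noise scale would be zero")
    if rng is None:
        rng = random.Random()
    # Assumed scale: c * (Delta + subexponential diameter) / epsilon.
    scale = c * (sensitivity + subexp_diameter) / epsilon
    return estimate + laplace_noise(scale, rng)
```

For example, `privatize(estimate, epsilon=1.0, sensitivity=1.0, subexp_diameter=1.0)` perturbs `estimate` with Laplace noise of scale 2 under these assumptions.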

Finally, we use our results to give either the first known or improved sublinear-complexity differentially private algorithms for various problems. This includes results for frequency moments, estimating the average degree of a graph in sublinear time, rank queries, or estimating the size of the maximum matching. Our results raise many new questions and we state multiple open problems.

Differential privacy Randomized approximation algorithms

Daniel Alabi, Omri Ben-Eliezer, and Anamay Chaturvedi. Bounded space differentially private quantiles. arXiv preprint, 2022. URL: https://arxiv.org/abs/2201.03380.

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20-29, 1996.

Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Guha Thakurta. Practical locally private heavy hitters. Advances in Neural Information Processing Systems, 30, 2017.

Petra Berenbrink, Bruce Krayenhoff, and Frederik Mallmann-Trenn. Estimating the number of connected components in sublinear time. Information Processing Letters, 114(11):639-642, 2014.

J Blocki, E Grigorescu, and T Mukherjee. Privately estimating graph parameters in sublinear time. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), 2022.

Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The johnson-lindenstrauss transform itself preserves differential privacy. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 410-419. IEEE, 2012.

Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee, and Samson Zhou. How to Make Your Approximation Algorithm Private: A Black-Box Differentially-Private Transformation for Tunable Approximation Algorithms of Functions with Low Sensitivity. In Nicole Megow and Adam Smith, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023), volume 275 of Leibniz International Proceedings in Informatics (LIPIcs), pages 59:1-59:24, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. 10.4230/LIPIcs.APPROX/RANDOM.2023.59

Jonas Boehler and Florian Kerschbaum. Secure sublinear time differentially private median computation, February 1 2022. US Patent 11,238,167.

Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. Cryptology ePrint Archive, 2020.

Charlie Dickens, Justin Thaler, and Daniel Ting. (nearly) all cardinality estimators are differentially private. arXiv preprint, 2022. URL: https://arxiv.org/abs/2203.15400.

Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities. In Algorithms-ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003. Proceedings 11, pages 605-617. Springer, 2003.

Cynthia Dwork. Differential Privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, pages 1-12, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265-284. Springer, 2006.

Talya Eden, Dana Ron, and C Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th annual ACM SIGACT symposium on theory of computing, pages 722-734, 2018.

Alessandro Epasto, Jieming Mao, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, and Peilin Zhong. Differentially private continual releases of streaming frequency moment estimations. arXiv preprint, 2023. URL: https://arxiv.org/abs/2301.05605.

geetha290krm (https://math.stackexchange.com/users/1064504/geetha290krm). Does f_X+Y(z) = e[f_y(z-x)] hold? Mathematics Stack Exchange, 2022. URL: https://math.stackexchange.com/q/4544852 (version: 2022-10-04).

Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages: Frequency estimation and selection in the shuffle model of differential privacy. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 463-488. Springer, 2021.

Ziyue Huang, Yuan Qiu, Ke Yi, and Graham Cormode. Frequency estimation under multiparty differential privacy: One-shot and streaming. arXiv preprint, 2021. URL: https://arxiv.org/abs/2104.01808.

Haim Kaplan and Uri Stemmer. A note on sanitizing streams with differential privacy. arXiv preprint, 2021. URL: https://arxiv.org/abs/2111.13762.

Zohar Karnin, Kevin Lang, and Edo Liberty. Optimal quantile approximation in streams. In 2016 ieee 57th annual symposium on foundations of computer science (focs), pages 71-78. IEEE, 2016.

Kasper Green Larsen, Rasmus Pagh, and Jakub Tětek. Countsketches, feature hashing and the median of three. In International Conference on Machine Learning, pages 6011-6020. PMLR, 2021.

Christian Janos Lebeda and Jakub Tětek. Better differentially private approximate histograms and heavy hitters using the misra-gries sketch. arXiv preprint, 2023. URL: https://arxiv.org/abs/2301.02457.

Alexander J McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative risk management: concepts, techniques and tools-revised edition. Princeton university press, 2015.

Luca Melis, George Danezis, and Emiliano De Cristofaro. Efficient private statistics with succinct sketches. arXiv preprint, 2015. URL: https://arxiv.org/abs/1508.06110.

Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 37-48, 2011.

Jayadev Misra and David Gries. Finding repeated elements. Science of computer programming, 2(2):143-152, 1982.

Rasmus Pagh and Mikkel Thorup. Improved utility analysis of private countsketch. arXiv preprint, 2022. URL: https://arxiv.org/abs/2205.08397.

Ryan M Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and filters: Pay-as-you-go composition. Advances in Neural Information Processing Systems, 29, 2016.

Harry Sivasubramaniam, Haonan Li, and Xi He. Differentially private sublinear average degree approximation, 2020.

Adam Smith, Shuang Song, and Abhradeep Guha Thakurta. The flajolet-martin sketch itself preserves differential privacy: Private counting with minimal space. Advances in Neural Information Processing Systems, 33:19561-19572, 2020.

Nina Mesing Stausholm. Improved differentially private euclidean distance approximation. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 42-56, 2021.

Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.

Lun Wang, Iosif Pinelis, and Dawn Song. Differentially private fractional frequency moments estimation with polylogarithmic space. arXiv preprint, 2021. URL: https://arxiv.org/abs/2105.12363.

Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. An improved constant-time approximation algorithm for maximum matchings. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 225-234, 2009.

Fuheng Zhao, Dan Qiao, Rachel Redberg, Divyakant Agrawal, Amr El Abbadi, and Yu-Xiang Wang. Differentially private linear sketches: Efficient implementations and applications. arXiv preprint, 2022. URL: https://arxiv.org/abs/2205.09873.

<book-part-wrapper xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" content-type="research-article">

<collection-meta collection-type="book-series">

<collection-id collection-id-type="doi">10.1145/acmotherconferences</collection-id>

<title-group>

<title>ACM Other Conferences</title>

</title-group>

</collection-meta>

<book-meta>

<book-id book-id-type="acm-id">0000000</book-id>

<book-id book-id-type="doi">10.5555/0000000</book-id>

<book-title-group>

<book-title>Proceedings of the Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)</book-title>

<alt-title alt-title-type="acronym">APPROX/RANDOM 2024</alt-title>

</book-title-group>

</book-meta>

<book-part book-part-type="chapter" xml:lang="en">

<book-part-meta>

<book-part-id book-part-id-type="doi">10.4230/LIPIcs.APPROX/RANDOM.2024.73</book-part-id>

<book-part-id book-part-id-type="article-no">73</book-part-id>

<subj-group subj-group-type="ccs2012">

<compound-subject>

<compound-subject-part content-type="code">10002978</compound-subject-part>

<compound-subject-part content-type="text">Security and privacy</compound-subject-part>

<compound-subject-part content-type="weight">500</compound-subject-part>

</compound-subject>

<compound-subject>

<compound-subject-part content-type="code">10003752.10003809.10010055</compound-subject-part>

<compound-subject-part content-type="text">Theory of computation~Streaming, sublinear and near linear time algorithms</compound-subject-part>

<compound-subject-part content-type="weight">500</compound-subject-part>

</compound-subject>

</subj-group>

<title-group>

<title>Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private</title>

</title-group>

<contrib-group>

<contrib>

<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-2046-1627</contrib-id>

<name>

<surname>Tětek</surname>

<given-names>Jakub</given-names>

</name>

<aff>INSAIT, Sofia, Bulgaria</aff>

<email>j.tetek@gmail.com</email>

<role>Author</role>

</contrib>

</contrib-group>

<pub-date date-type="publication">

<day>16</day>

<month>09</month>

<year>2024</year>

</pub-date>

<abstract>

The exponential increase in the amount of available data makes taking advantage of them without violating users' privacy one of the fundamental problems of computer science. This question has been investigated thoroughly under the framework of differential privacy. However, most of the literature has not focused on settings where the amount of data is so large that we are not even able to compute the exact answer in the non-private setting (such as in the streaming setting, sublinear-time setting, etc.). This can often make the use of differential privacy unfeasible in practice.

In this paper, we show a general approach for making Monte-Carlo randomized approximation algorithms differentially private. We only need to assume the error R of the approximation algorithm is sufficiently concentrated around 0 (e.g. 𝔼[|R|] is bounded) and that the function being approximated has a small global sensitivity Δ. Specifically, if we have a randomized approximation algorithm with sufficiently concentrated error which has time/space/query complexity T(n,ρ) with ρ being an accuracy parameter, we can generally speaking get an algorithm with the same accuracy and complexity T(n,Θ(ε ρ)) that is ε-differentially private.

Our technical results are as follows. First, we show that if the error is subexponential, then the Laplace mechanism with error magnitude proportional to the sum of the global sensitivity Δ and the subexponential diameter of the error of the algorithm makes the algorithm differentially private. This is true even if the worst-case global sensitivity of the algorithm is large or infinite. We then introduce a new additive noise mechanism, which we call the zero-symmetric Pareto mechanism. We show that using this mechanism, we can make an algorithm differentially private even if we only assume a bound on the first absolute moment of the error 𝔼[|R|].

Finally, we use our results to give either the first known or improved sublinear-complexity differentially private algorithms for various problems. This includes results for frequency moments, estimating the average degree of a graph in sublinear time, rank queries, or estimating the size of the maximum matching. Our results raise many new questions and we state multiple open problems.

</abstract>

<kwd-group>

<kwd>Differential privacy</kwd>

<kwd>Randomized approximation algorithms</kwd>

</kwd-group>

</book-part-meta>

<back>

<ref-list specific-use="unparsed">

<ref>

<mixed-citation>Daniel Alabi, Omri Ben-Eliezer, and Anamay Chaturvedi. Bounded space differentially private quantiles. arXiv preprint, 2022. URL: https://arxiv.org/abs/2201.03380.</mixed-citation>

</ref>

<ref>

<mixed-citation>Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20-29, 1996.</mixed-citation>

</ref>

<ref>

<mixed-citation>Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Guha Thakurta. Practical locally private heavy hitters. Advances in Neural Information Processing Systems, 30, 2017.</mixed-citation>

</ref>

<ref>

<mixed-citation>Petra Berenbrink, Bruce Krayenhoff, and Frederik Mallmann-Trenn. Estimating the number of connected components in sublinear time. Information Processing Letters, 114(11):639-642, 2014.</mixed-citation>

</ref>

<ref>

<mixed-citation>J Blocki, E Grigorescu, and T Mukherjee. Privately estimating graph parameters in sublinear time. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), 2022.</mixed-citation>

</ref>

<ref>

<mixed-citation>Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The johnson-lindenstrauss transform itself preserves differential privacy. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 410-419. IEEE, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee, and Samson Zhou. How to Make Your Approximation Algorithm Private: A Black-Box Differentially-Private Transformation for Tunable Approximation Algorithms of Functions with Low Sensitivity. In Nicole Megow and Adam Smith, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023), volume 275 of Leibniz International Proceedings in Informatics (LIPIcs), pages 59:1-59:24, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. <pub-id pub-id-type="doi" xlink:href="10.4230/LIPIcs.APPROX/RANDOM.2023.59">10.4230/LIPIcs.APPROX/RANDOM.2023.59</pub-id></mixed-citation>

</ref>

<ref>

<mixed-citation>Jonas Boehler and Florian Kerschbaum. Secure sublinear time differentially private median computation, February 1 2022. US Patent 11,238,167.</mixed-citation>

</ref>

<ref>

<mixed-citation>Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. Cryptology ePrint Archive, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Charlie Dickens, Justin Thaler, and Daniel Ting. (nearly) all cardinality estimators are differentially private. arXiv preprint, 2022. URL: https://arxiv.org/abs/2203.15400.</mixed-citation>

</ref>

<ref>

<mixed-citation>Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities. In Algorithms-ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003. Proceedings 11, pages 605-617. Springer, 2003.</mixed-citation>

</ref>

<ref>

<mixed-citation>Cynthia Dwork. Differential Privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, pages 1-12, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.</mixed-citation>

</ref>

<ref>

<mixed-citation>Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265-284. Springer, 2006.</mixed-citation>

</ref>

<ref>

<mixed-citation>Talya Eden, Dana Ron, and C Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th annual ACM SIGACT symposium on theory of computing, pages 722-734, 2018.</mixed-citation>

</ref>

<ref>

<mixed-citation>Alessandro Epasto, Jieming Mao, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, and Peilin Zhong. Differentially private continual releases of streaming frequency moment estimations. arXiv preprint, 2023. URL: https://arxiv.org/abs/2301.05605.</mixed-citation>

</ref>

<ref>

<mixed-citation>geetha290krm (https://math.stackexchange.com/users/1064504/geetha290krm). Does f_X+Y(z) = e[f_y(z-x)] hold? Mathematics Stack Exchange, 2022. URL: https://math.stackexchange.com/q/4544852 (version: 2022-10-04).</mixed-citation>

</ref>

<ref>

<mixed-citation>Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages: Frequency estimation and selection in the shuffle model of differential privacy. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 463-488. Springer, 2021.</mixed-citation>

</ref>

<ref>

<mixed-citation>Ziyue Huang, Yuan Qiu, Ke Yi, and Graham Cormode. Frequency estimation under multiparty differential privacy: One-shot and streaming. arXiv preprint, 2021. URL: https://arxiv.org/abs/2104.01808.</mixed-citation>

</ref>

<ref>

<mixed-citation>Haim Kaplan and Uri Stemmer. A note on sanitizing streams with differential privacy. arXiv preprint, 2021. URL: https://arxiv.org/abs/2111.13762.</mixed-citation>

</ref>

<ref>

<mixed-citation>Zohar Karnin, Kevin Lang, and Edo Liberty. Optimal quantile approximation in streams. In 2016 ieee 57th annual symposium on foundations of computer science (focs), pages 71-78. IEEE, 2016.</mixed-citation>

</ref>

<ref>

<mixed-citation>Kasper Green Larsen, Rasmus Pagh, and Jakub Tětek. Countsketches, feature hashing and the median of three. In International Conference on Machine Learning, pages 6011-6020. PMLR, 2021.</mixed-citation>

</ref>

<ref>

<mixed-citation>Christian Janos Lebeda and Jakub Tětek. Better differentially private approximate histograms and heavy hitters using the misra-gries sketch. arXiv preprint, 2023. URL: https://arxiv.org/abs/2301.02457.</mixed-citation>

</ref>

<ref>

<mixed-citation>Alexander J McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative risk management: concepts, techniques and tools-revised edition. Princeton university press, 2015.</mixed-citation>

</ref>

<ref>

<mixed-citation>Luca Melis, George Danezis, and Emiliano De Cristofaro. Efficient private statistics with succinct sketches. arXiv preprint, 2015. URL: https://arxiv.org/abs/1508.06110.</mixed-citation>

</ref>

<ref>

<mixed-citation>Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 37-48, 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Jayadev Misra and David Gries. Finding repeated elements. Science of computer programming, 2(2):143-152, 1982.</mixed-citation>

</ref>

<ref>

<mixed-citation>Rasmus Pagh and Mikkel Thorup. Improved utility analysis of private countsketch. arXiv preprint, 2022. URL: https://arxiv.org/abs/2205.08397.</mixed-citation>

</ref>

<ref>

<mixed-citation>Ryan M Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and filters: Pay-as-you-go composition. Advances in Neural Information Processing Systems, 29, 2016.</mixed-citation>

</ref>

<ref>

<mixed-citation>Harry Sivasubramaniam, Haonan Li, and Xi He. Differentially private sublinear average degree approximation, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Adam Smith, Shuang Song, and Abhradeep Guha Thakurta. The flajolet-martin sketch itself preserves differential privacy: Private counting with minimal space. Advances in Neural Information Processing Systems, 33:19561-19572, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Nina Mesing Stausholm. Improved differentially private euclidean distance approximation. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 42-56, 2021.</mixed-citation>

</ref>

<ref>

<mixed-citation>Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.</mixed-citation>

</ref>

<ref>

<mixed-citation>Lun Wang, Iosif Pinelis, and Dawn Song. Differentially private fractional frequency moments estimation with polylogarithmic space. arXiv preprint, 2021. URL: https://arxiv.org/abs/2105.12363.</mixed-citation>

</ref>

<ref>

<mixed-citation>Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. An improved constant-time approximation algorithm for maximum matchings. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 225-234, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Fuheng Zhao, Dan Qiao, Rachel Redberg, Divyakant Agrawal, Amr El Abbadi, and Yu-Xiang Wang. Differentially private linear sketches: Efficient implementations and applications. arXiv preprint, 2022. URL: https://arxiv.org/abs/2205.09873.</mixed-citation>

</ref>

</ref-list>

</back>

</book-part>

</book-part-wrapper>