# Efficient Differentially Private F₀ Linear Sketching

## File

LIPIcs.ICDT.2021.18.pdf
• Filesize: 0.83 MB
• 19 pages

## Acknowledgements

We thank Shuang Song and Abhradeep Guha Thakurta for feedback on a previous version of this manuscript. We thank the anonymous reviewers for constructive suggestions. The work of Rasmus Pagh was partly done while employed at IT University of Copenhagen.

## Cite As

Rasmus Pagh and Nina Mesing Stausholm. Efficient Differentially Private F₀ Linear Sketching. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICDT.2021.18

## Abstract

A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F₀, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ∈ (0,1]^u be given. For x ∈ {0,1}^u we are interested in estimating ||x∘w||₁, where ∘ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ∈ {0,1}^u to Hx ∈ {0,1}^τ for a matrix H sampled from a suitable distribution ℋ. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ∈ {0,1}^τ where Pr[φ_j = 1] = p independently for each entry j, we consider the noisy sketch Hx + φ, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ε = O(1) there exist p < 1/2 and a distribution ℋ of linear sketches of size τ = O(log²(u)ε^{-2}β^{-2}) such that: 1) for random H∼ℋ and noise vector φ, given Hx + φ we can compute an estimate of ||x∘w||₁ that is accurate within a factor 1±β, plus additive error O(log(u)ε^{-2}β^{-2}), with probability 1-u^{-1}, and 2) for every H∼ℋ, Hx + φ is ε-differentially private over the randomness in φ. The special case w = (1,…,1) is unweighted F₀. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F₀, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u.
For fixed ε the size of our sketch is polynomially related to the lower bound of Ω(log(u)β^{-2}) bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of Ω(1/ε) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x₁+x₂) + (φ₁+φ₂), which allows us to estimate ||(x₁+x₂)∘w||₁. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x₁ and x₂. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom filters, but these methods have an error bound no better than O(√m̄), where m̄ is an upper bound on ||x₁||₀ and ||x₂||₀. Our sketch improves on this previous work when β = o(1/√m̄) and log(u)/ε = m̄^{o(1)}. In conclusion, our results both improve the efficiency of existing methods for unweighted F₀ estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union of two input streams.
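
The GF(2) linearity and the randomized-response noise described above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the matrix sampler (a single 1 per column, placed in a uniformly random row) is a hypothetical stand-in and not the paper's distribution ℋ, all function names are invented for illustration, the weights w are ignored, and no estimator for ||x∘w||₁ is included.

```python
import math
import random


def sample_matrix(u, tau, rng):
    """Toy stand-in for H (NOT the paper's distribution): column i has a
    single 1 in a uniformly random row, stored as that row index."""
    return [rng.randrange(tau) for _ in range(u)]


def sketch(H, x, tau):
    """Compute Hx over GF(2): bit j is the parity of the number of set
    entries of x whose column has its 1 in row j."""
    s = [0] * tau
    for i, bit in enumerate(x):
        if bit:
            s[H[i]] ^= 1
    return s


def noise_vector(tau, p, rng):
    """Sample phi in {0,1}^tau with each bit equal to 1 independently w.p. p."""
    return [1 if rng.random() < p else 0 for _ in range(tau)]


def add_gf2(a, b):
    """Entrywise addition over GF(2), i.e. XOR."""
    return [ai ^ bi for ai, bi in zip(a, b)]


def epsilon_per_flipped_bit(p):
    """Randomized response on one bit with flip probability p < 1/2 is
    ln((1-p)/p)-differentially private."""
    return math.log((1 - p) / p)


def combined_flip_probability(p):
    """Probability that a bit of phi1 + phi2 (over GF(2)) equals 1."""
    return 2 * p * (1 - p)


if __name__ == "__main__":
    rng = random.Random(0)
    u, tau, p = 32, 8, 0.25
    H = sample_matrix(u, tau, rng)
    x1 = [rng.randrange(2) for _ in range(u)]
    x2 = [rng.randrange(2) for _ in range(u)]
    phi1, phi2 = noise_vector(tau, p, rng), noise_vector(tau, p, rng)

    # Each party publishes only its noisy sketch Hx + phi.
    noisy1 = add_gf2(sketch(H, x1, tau), phi1)
    noisy2 = add_gf2(sketch(H, x2, tau), phi2)

    # Adding the noisy sketches gives H(x1 + x2) + (phi1 + phi2).
    combined = add_gf2(noisy1, noisy2)
    expected = add_gf2(sketch(H, add_gf2(x1, x2), tau), add_gf2(phi1, phi2))
    print(combined == expected)  # equality holds by linearity over GF(2)
```

In this toy matrix, changing one entry of x flips exactly one bit of Hx, so the noisy sketch is ln((1-p)/p)-differentially private; the paper's construction and its privacy analysis differ. Note also that the combined noise φ₁+φ₂ flips each bit with probability 2p(1-p), which is closer to 1/2 than p, so an estimator working on combined sketches has to account for the larger noise rate.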

## Subject Classification

##### ACM Subject Classification
• Security and privacy → Formal methods and theory of security
##### Keywords
• Differential Privacy
• Linear Sketches
• Weighted F₀ Estimation

## References

1. Mohammad Alaggan, Sébastien Gambs, and Anne-Marie Kermarrec. BLIP: non-interactive differentially-private similarity computation on Bloom filters. In Stabilization, Safety, and Security of Distributed Systems - 14th International Symposium, SSS, pages 202-216, 2012. URL: https://doi.org/10.1007/978-3-642-33536-5_20.
2. Mohammad Alaggan, Sébastien Gambs, Stan Matwin, and Mohammed Tuhin. Sanitization of call detail records via differentially-private Bloom filters. In Data and Applications Security and Privacy XXIX - 29th Annual IFIP WG 11.3 Working Conference, DBSec 2015, pages 223-230, 2015. URL: https://doi.org/10.1007/978-3-319-20810-7_15.
3. Noga Alon, Phillip B Gibbons, Yossi Matias, and Mario Szegedy. Tracking join and self-join sizes in limited storage. Journal of Computer and System Sciences, 64(3):719-747, 2002.
4. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Symposium on the Theory of Computing, pages 20-29, 1996. URL: https://doi.org/10.1145/237814.237823.
5. Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Hongyang Zhang. Learning and 1-bit compressed sensing under asymmetric noise. In Conference on Learning Theory, pages 152-192, 2016.
6. Ziv Bar-Yossef, TS Jayram, Ravi Kumar, D Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In International Workshop on Randomization and Approximation Techniques in Computer Science, pages 1-10, 2002.
7. Valerio Bioglio, Tiziano Bianchi, and Enrico Magli. Secure compressed sensing over finite fields. In International Workshop on Information Forensics and Security (WIFS), pages 191-196, 2014.
8. Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. In Symposium on Foundations of Computer Science, FOCS, pages 410-419, 2012. URL: https://doi.org/10.1109/FOCS.2012.67.
9. Andrei Z. Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485-509, 2003. URL: https://doi.org/10.1080/15427951.2004.10129096.
10. Clément Canonne, Gautam Kamath, and Thomas Steinke. The discrete Gaussian for differential privacy. arXiv preprint arXiv:2004.00010, 2020.
11. Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. IACR Cryptol. ePrint Arch., 2020:29, 2020. URL: https://eprint.iacr.org/2020/029.
12. Reuven Cohen, Liran Katzir, and Aviv Yehezkel. A unified scheme for generalizing cardinality estimators to sum aggregation. Information Processing Letters, 115(2):336-342, 2015.
13. Graham Cormode, Minos N. Garofalakis, Peter J. Haas, and Chris Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 4(1-3):1-294, 2012. URL: https://doi.org/10.1561/1900000004.
14. Damien Desfontaines, Andreas Lochbihler, and David A. Basin. Cardinality estimators do not preserve privacy. PoPETs, 2019(2):26-46, 2019. URL: https://doi.org/10.2478/popets-2019-0018.
15. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In 3rd Theory of Cryptography Conference, TCC, pages 265-284, 2006. URL: https://doi.org/10.1007/11681878_14.
16. Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N Rothblum, and Sergey Yekhanin. Pan-private streaming algorithms. In ICS, pages 66-80, 2010.
17. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014. URL: https://doi.org/10.1561/0400000042.
18. Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 Conference on Computer and Communications Security, pages 1054-1067, 2014. URL: https://doi.org/10.1145/2660267.2660348.
19. Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In AofA: Analysis of Algorithms, pages 137-156, 2007.
20. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182-209, 1985. URL: https://doi.org/10.1016/0022-0000(85)90041-8.
21. Slawomir Goryczka, Li Xiong, and Vaidy S. Sunderam. Secure multiparty aggregation with differential privacy: a comparative study. In Joint 2013 EDBT/ICDT Conferences, EDBT/ICDT '13, pages 155-163, 2013. URL: https://doi.org/10.1145/2457317.2457343.
22. Peter J Haas, Jeffrey F Naughton, S Seshadri, and Lynne Stokes. Sampling-based estimation of the number of distinct values of an attribute. In VLDB, volume 95, pages 311-322, 1995.
23. Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Symposium on Theory of Computing, STOC, pages 705-714, 2010. URL: https://doi.org/10.1145/1806689.1806786.
24. T. S. Jayram and David P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. Transactions on Algorithms, 9(3):26:1-26:17, 2013. URL: https://doi.org/10.1145/2483699.2483706.
25. Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the 29th ACM symposium on Principles of database systems (PODS), pages 41-52, 2010.
26. Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, and Nina Mishra. Privacy via the Johnson-Lindenstrauss transform. J. Priv. Confidentiality, 5(1), 2013. URL: https://doi.org/10.29012/jpc.v5i1.625.
27. Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In VLDB, volume 4, pages 180-191. Toronto, Canada, 2004.
28. Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of ACM International Conference on Management of data (SIGMOD), pages 193-204, 2011.
29. Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Symposium on the Theory of Computing, pages 614-623, 1998. URL: https://doi.org/10.1145/276698.276877.
30. Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. The limits of two-party differential privacy. In 51st Annual Symposium on Foundations of Computer Science, pages 81-90, 2010.
31. Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, volume 7, pages 94-103, 2007.
32. Frank D McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of ACM International Conference on Management of data (SIGMOD), pages 19-30, 2009.
33. Luca Melis, George Danezis, and Emiliano De Cristofaro. Efficient private statistics with succinct sketches. In 23rd Annual Network and Distributed System Security Symposium, NDSS, 2016. URL: https://doi.org/10.14722/ndss.2016.23175.
34. Darakhshan Mir, S Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms: When memory does not help. arXiv preprint arXiv:1009.1544, 2010.
35. Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In Proceedings of the 30th Symposium on Principles of Database Systems (PODS), pages 37-48, 2011.
36. Ilya Mironov. On significance of the least significant bits for differential privacy. In Ting Yu, George Danezis, and Virgil D. Gligor, editors, Conference on Computer and Communications Security, CCS, pages 650-661, 2012. URL: https://doi.org/10.1145/2382196.2382264.
37. Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil P. Vadhan. Computational differential privacy. In Shai Halevi, editor, Advances in Cryptology - CRYPTO, volume 5677 of Lecture Notes in Computer Science, pages 126-142, 2009. URL: https://doi.org/10.1007/978-3-642-03356-8_8.
38. Michael Mitzenmacher, Rasmus Pagh, and Ninh Pham. Efficient estimation for high similarities using odd sketches. In Proceedings of 23rd international conference on World Wide Web (WWW), pages 109-118, 2014.
39. Aleksandar Nikolov. Personal communication. Clarification, 2020.
40. Hagen Sparka, Florian Tschorsch, and Björn Scheuermann. P2KMV: A privacy-preserving counting sketch for efficient and accurate set intersection cardinality estimations. IACR Cryptology ePrint Archive, 2018:234, 2018. URL: http://eprint.iacr.org/2018/234.
41. Rade Stanojevic, Mohamed Nabeel, and Ting Yu. Distributed cardinality estimation of set operations with differential privacy. In IEEE Symposium on Privacy-Aware Computing, PAC, pages 37-48, 2017. URL: https://doi.org/10.1109/PAC.2017.43.
42. Florian Tschorsch and Björn Scheuermann. An algorithm for privacy-preserving distributed user statistics. Computer Networks, 57(14):2775-2787, 2013. URL: https://doi.org/10.1016/j.comnet.2013.05.011.
43. Salil P. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347-450. Springer, 2017. URL: https://doi.org/10.1007/978-3-319-57048-8_7.
44. Saskia Nuñez von Voigt and Florian Tschorsch. RRTxFM: Probabilistic counting for differentially private statistics. In Workshop on Trust and Privacy Aspects of Smart Information Environments (TPSIE), 2019.
45. Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63-69, 1965. URL: http://www.jstor.org/stable/2283137.
46. Royce J Wilson, Celia Yuxin Zhang, William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson. Differentially private SQL with bounded user contribution. Proceedings on Privacy Enhancing Technologies, 2020(2):230-250, 2020.
47. David P. Woodruff. Data streams and applications in computer science. Bulletin of the EATCS, 114, 2014. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/304.