Document

**Published in:** LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F₀, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ∈ (0,1]^u be given. For x ∈ {0,1}^u we are interested in estimating ||x∘w||₁ where ∘ is the Hadamard product (entrywise product).
Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ∈ {0,1}^u to Hx ∈ {0,1}^τ for a matrix H sampled from a suitable distribution ℋ. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ∈ {0,1}^τ where Pr[(φ)_j = 1] = p independently for each entry j, we consider the noisy sketch Hx + φ, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ε = O(1) there exists p < 1/2 and a distribution ℋ of linear sketches of size τ = O(log²(u)ε^{-2}β^{-2}) such that:
1) For random H∼ℋ and noise vector φ, given Hx + φ we can compute an estimate of ||x∘w||₁ that is accurate within a factor 1±β, plus additive error O(log(u)ε^{-2}β^{-2}), w. p. 1-u^{-1}, and
2) For every H∼ℋ, Hx + φ is ε-differentially private over the randomness in φ. The special case w = (1,… ,1) is unweighted F₀. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F₀, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u.
For fixed ε the size of our sketch is polynomially related to the lower bound of Ω(log(u)β^{-2}) bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of Ω(1/ε) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x₁+x₂) + (φ₁+φ₂), which allows us to estimate ||(x₁+x₂)∘w||₁. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x₁ and x₂. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom Filters, but these methods have an error bound no better than O(√{̄{m}}), where ̄{m} is an upper bound on ||x₁||₀ and ||x₂||₀. This improves previous work when β = o (1/√{̄{m}}) and log(u)/ε = ̄{m}^{o(1)}.
In conclusion our results both improve the efficiency of existing methods for unweighted F₀ estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.

Rasmus Pagh and Nina Mesing Stausholm. Efficient Differentially Private F₀ Linear Sketching. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ICDT.2021.18, author = {Pagh, Rasmus and Stausholm, Nina Mesing}, title = {{Efficient Differentially Private F₀ Linear Sketching}}, booktitle = {24th International Conference on Database Theory (ICDT 2021)}, pages = {18:1--18:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-179-5}, ISSN = {1868-8969}, year = {2021}, volume = {186}, editor = {Yi, Ke and Wei, Zhewei}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.18}, URN = {urn:nbn:de:0030-drops-137264}, doi = {10.4230/LIPIcs.ICDT.2021.18}, annote = {Keywords: Differential Privacy, Linear Sketches, Weighted F0 Estimation} }

Document

**Published in:** LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A x B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a cap b|/|a cup b| for (a,b) in A x B.
We consider the approximate version of the problem where we are given thresholds j_1 > j_2 and wish to return a pair from A x B that has Jaccard similarity higher than j_2 if there exists a pair in A x B with Jaccard similarity at least j_1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n^(2-delta)) time if j_1 >= j_2^(1-delta). In particular, for delta=Omega(1), the approximation ratio j_1/j_2 = 1/j_2^delta increases polynomially in 1/j_2.
In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n^(2-Omega(1))) time for j_1/j_2 = 1/j_2^o(1). Specifically, assuming OVC, we prove that for any delta>0 there exists an epsilon>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Omega(n^(2-delta)) for any choice of thresholds j_2 < j_1 < 1-delta, that satisfy j_1 <= j_2^(1-epsilon).

Rasmus Pagh, Nina Mesing Stausholm, and Mikkel Thorup. Hardness of Bichromatic Closest Pair with Jaccard Similarity. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 74:1-74:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ESA.2019.74, author = {Pagh, Rasmus and Stausholm, Nina Mesing and Thorup, Mikkel}, title = {{Hardness of Bichromatic Closest Pair with Jaccard Similarity}}, booktitle = {27th Annual European Symposium on Algorithms (ESA 2019)}, pages = {74:1--74:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-124-5}, ISSN = {1868-8969}, year = {2019}, volume = {144}, editor = {Bender, Michael A. and Svensson, Ola and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.74}, URN = {urn:nbn:de:0030-drops-111951}, doi = {10.4230/LIPIcs.ESA.2019.74}, annote = {Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail