DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.104

Optimal Non-Adaptive Cell Probe Dictionaries and Hashing

Authors: Kasper Green Larsen, Rasmus Pagh, Giuseppe Persiano, Toniann Pitassi, Kevin Yeo, and Or Zamir

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We present a simple and provably optimal non-adaptive cell probe data structure for the static dictionary problem. Our data structure supports storing a set of n key-value pairs from [u]× [u] using s words of space and answering key lookup queries in t = O(lg(u/n)/lg(s/n)) non-adaptive probes. This generalizes a solution to the membership problem (i.e., where no values are associated with keys) due to Buhrman et al. We also present matching lower bounds for the non-adaptive static membership problem in the deterministic setting. Our lower bound implies that both our dictionary algorithm and the preceding membership algorithm are optimal, and in particular that there is an inherent complexity gap in these problems between no adaptivity and one round of adaptivity (with which hashing-based algorithms solve these problems in constant time). Using the ideas underlying our data structure, we also obtain the first implementation of a n-wise independent family of hash functions with optimal evaluation time in the cell probe model.

Cite as

Kasper Green Larsen, Rasmus Pagh, Giuseppe Persiano, Toniann Pitassi, Kevin Yeo, and Or Zamir. Optimal Non-Adaptive Cell Probe Dictionaries and Hashing. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 104:1-104:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{larsen_et_al:LIPIcs.ICALP.2024.104,
  author =	{Larsen, Kasper Green and Pagh, Rasmus and Persiano, Giuseppe and Pitassi, Toniann and Yeo, Kevin and Zamir, Or},
  title =	{{Optimal Non-Adaptive Cell Probe Dictionaries and Hashing}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{104:1--104:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.104},
  URN =		{urn:nbn:de:0030-drops-202471},
  doi =		{10.4230/LIPIcs.ICALP.2024.104},
  annote =	{Keywords: non-adaptive, cell probe, dictionary, hashing}
}

Document

DOI: 10.4230/LIPIcs.SWAT.2024.9

Daisy Bloom Filters

Authors: Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh

Published in: LIPIcs, Volume 294, 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024)

Abstract

A filter is a widely used data structure for storing an approximation of a given set S of elements from some universe 𝒰 (a countable set). It represents a superset S' ⊇ S that is "close to S" in the sense that for x ∉ S, the probability that x ∈ S' is bounded by some ε > 0. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store S exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in S with probability close to 1. Then it would make sense to always include them in S', saving space by not having to represent these elements in the filter. Questions like this have been raised in the context of Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) and Bloom filter implementations that make use of access to learned components (Vaidya, Knorr, Mitzenmacher, and Krask, ICLR 2021). In this paper, we present a lower bound for the expected space that such a filter requires. We also show that the lower bound is asymptotically tight by exhibiting a filter construction that executes queries and insertions in worst-case constant time, and has a false positive rate at most ε with high probability over input sets drawn from a product distribution. We also present a Bloom filter alternative, which we call the Daisy Bloom filter, that executes operations faster and uses significantly less space than the standard Bloom filter.

Cite as

Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh. Daisy Bloom Filters. In 19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 294, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bercea_et_al:LIPIcs.SWAT.2024.9,
  author =	{Bercea, Ioana O. and Houen, Jakob B{\ae}k Tejs and Pagh, Rasmus},
  title =	{{Daisy Bloom Filters}},
  booktitle =	{19th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2024)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-318-8},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{294},
  editor =	{Bodlaender, Hans L.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2024.9},
  URN =		{urn:nbn:de:0030-drops-200491},
  doi =		{10.4230/LIPIcs.SWAT.2024.9},
  annote =	{Keywords: Bloom filters, input distribution, learned data structures}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.ESA.2021

LIPIcs, Volume 204, ESA 2021, Complete Volume

Authors: Petra Mutzel, Rasmus Pagh, and Grzegorz Herman

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

Abstract

LIPIcs, Volume 204, ESA 2021, Complete Volume

Cite as

29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 1-1340, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@Proceedings{mutzel_et_al:LIPIcs.ESA.2021,
  title =	{{LIPIcs, Volume 204, ESA 2021, Complete Volume}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{1--1340},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021},
  URN =		{urn:nbn:de:0030-drops-145808},
  doi =		{10.4230/LIPIcs.ESA.2021},
  annote =	{Keywords: LIPIcs, Volume 204, ESA 2021, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.ESA.2021.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Petra Mutzel, Rasmus Pagh, and Grzegorz Herman

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 0:i-0:xx, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{mutzel_et_al:LIPIcs.ESA.2021.0,
  author =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{0:i--0:xx},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.0},
  URN =		{urn:nbn:de:0030-drops-145816},
  doi =		{10.4230/LIPIcs.ESA.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2021.18

Efficient Differentially Private F₀ Linear Sketching

Authors: Rasmus Pagh and Nina Mesing Stausholm

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F₀, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ∈ (0,1]^u be given. For x ∈ {0,1}^u we are interested in estimating ||x∘w||₁ where ∘ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ∈ {0,1}^u to Hx ∈ {0,1}^τ for a matrix H sampled from a suitable distribution ℋ. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ∈ {0,1}^τ where Pr[(φ)_j = 1] = p independently for each entry j, we consider the noisy sketch Hx + φ, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ε = O(1) there exists p < 1/2 and a distribution ℋ of linear sketches of size τ = O(log²(u)ε^{-2}β^{-2}) such that: 1) For random H∼ℋ and noise vector φ, given Hx + φ we can compute an estimate of ||x∘w||₁ that is accurate within a factor 1±β, plus additive error O(log(u)ε^{-2}β^{-2}), w. p. 1-u^{-1}, and 2) For every H∼ℋ, Hx + φ is ε-differentially private over the randomness in φ. The special case w = (1,… ,1) is unweighted F₀. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F₀, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u. For fixed ε the size of our sketch is polynomially related to the lower bound of Ω(log(u)β^{-2}) bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of Ω(1/ε) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x₁+x₂) + (φ₁+φ₂), which allows us to estimate ||(x₁+x₂)∘w||₁. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x₁ and x₂. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom Filters, but these methods have an error bound no better than O(√{̄{m}}), where ̄{m} is an upper bound on ||x₁||₀ and ||x₂||₀. This improves previous work when β = o (1/√{̄{m}}) and log(u)/ε = ̄{m}^{o(1)}. In conclusion our results both improve the efficiency of existing methods for unweighted F₀ estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.

Cite as

Rasmus Pagh and Nina Mesing Stausholm. Efficient Differentially Private F₀ Linear Sketching. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ICDT.2021.18,
  author =	{Pagh, Rasmus and Stausholm, Nina Mesing},
  title =	{{Efficient Differentially Private F₀ Linear Sketching}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.18},
  URN =		{urn:nbn:de:0030-drops-137264},
  doi =		{10.4230/LIPIcs.ICDT.2021.18},
  annote =	{Keywords: Differential Privacy, Linear Sketches, Weighted F0 Estimation}
}

Document

DOI: 10.4230/LIPIcs.ITC.2020.15

Pure Differentially Private Summation from Anonymous Messages

Authors: Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker

Published in: LIPIcs, Volume 163, 1st Conference on Information-Theoretic Cryptography (ITC 2020)

Abstract

The shuffled (aka anonymous) model has recently generated significant interest as a candidate distributed privacy framework with trust assumptions better than the central model but with achievable error rates smaller than the local model. In this paper, we study pure differentially private protocols in the shuffled model for summation, a very basic and widely used primitive. Specifically: - For the binary summation problem where each of n users holds a bit as an input, we give a pure ε-differentially private protocol for estimating the number of ones held by the users up to an absolute error of O_{ε}(1), and where each user sends O_{ε}(log n) one-bit messages. This is the first pure protocol in the shuffled model with error o(√n) for constant values of ε. Using our binary summation protocol as a building block, we give a pure ε-differentially private protocol that performs summation of real numbers in [0, 1] up to an absolute error of O_{ε}(1), and where each user sends O_{ε}(log³ n) messages each consisting of O(log log n) bits. - In contrast, we show that for any pure ε-differentially private protocol for binary summation in the shuffled model having absolute error n^{0.5-Ω(1)}, the per user communication has to be at least Ω_{ε}(√{log n}) bits. This implies (i) the first separation between the (bounded-communication) multi-message shuffled model and the central model, and (ii) the first separation between pure and approximate differentially private protocols in the shuffled model. Interestingly, over the course of proving our lower bound, we have to consider (a generalization of) the following question that might be of independent interest: given γ ∈ (0, 1), what is the smallest positive integer m for which there exist two random variables X⁰ and X^1 supported on {0, … , m} such that (i) the total variation distance between X⁰ and X^1 is at least 1 - γ, and (ii) the moment generating functions of X⁰ and X^1 are within a constant factor of each other everywhere? We show that the answer to this question is m = Θ(√{log(1/γ)}).

Cite as

Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Pure Differentially Private Summation from Anonymous Messages. In 1st Conference on Information-Theoretic Cryptography (ITC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 163, pp. 15:1-15:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{ghazi_et_al:LIPIcs.ITC.2020.15,
  author =	{Ghazi, Badih and Golowich, Noah and Kumar, Ravi and Manurangsi, Pasin and Pagh, Rasmus and Velingker, Ameya},
  title =	{{Pure Differentially Private Summation from Anonymous Messages}},
  booktitle =	{1st Conference on Information-Theoretic Cryptography (ITC 2020)},
  pages =	{15:1--15:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-151-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{163},
  editor =	{Tauman Kalai, Yael and Smith, Adam D. and Wichs, Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITC.2020.15},
  URN =		{urn:nbn:de:0030-drops-121208},
  doi =		{10.4230/LIPIcs.ITC.2020.15},
  annote =	{Keywords: Pure differential privacy, Shuffled model, Anonymous messages, Summation, Communication bounds}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2020.22

The Space Complexity of Inner Product Filters

Authors: Rasmus Pagh and Johan Sivertsen

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)

Abstract

Motivated by the problem of filtering candidate pairs in inner product similarity joins we study the following inner product estimation problem: Given parameters d∈ℕ, α>β≥0 and unit vectors x,y∈ ℝ^d consider the task of distinguishing between the cases ⟨x,y⟩≤β and ⟨x,y⟩≥α where ⟨x,y⟩ = ∑_{i=1}^d x_i y_i is the inner product of vectors x and y. The goal is to distinguish these cases based on information on each vector encoded independently in a bit string of the shortest length possible. In contrast to much work on compressing vectors using randomized dimensionality reduction, we seek to solve the problem deterministically, with no probability of error. Inner product estimation can be solved in general via estimating ⟨x,y⟩ with an additive error bounded by ε = α - β. We show that d log₂ (√{1-β}/ε) ± Θ(d) bits of information about each vector is necessary and sufficient. Our upper bound is constructive and improves a known upper bound of d log₂(1/ε) + O(d) by up to a factor of 2 when β is close to 1. The lower bound holds even in a stronger model where one of the vectors is known exactly, and an arbitrary estimation function is allowed.

Cite as

Rasmus Pagh and Johan Sivertsen. The Space Complexity of Inner Product Filters. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ICDT.2020.22,
  author =	{Pagh, Rasmus and Sivertsen, Johan},
  title =	{{The Space Complexity of Inner Product Filters}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{22:1--22:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.22},
  URN =		{urn:nbn:de:0030-drops-119468},
  doi =		{10.4230/LIPIcs.ICDT.2020.22},
  annote =	{Keywords: Similarity, estimation, dot product, filtering}
}

Document

DOI: 10.4230/LIPIcs.ESA.2019.10

PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

Authors: Martin Aumüller, Tobias Christiani, Rasmus Pagh, and Michael Vesterli

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Abstract

We present PUFFINN, a parameterless LSH-based index for solving the k-nearest neighbor problem with probabilistic guarantees. By parameterless we mean that the user is only required to specify the amount of memory the index is supposed to use and the result quality that should be achieved. The index combines several heuristic ideas known in the literature. By small adaptions to the query algorithm, we make heuristics rigorous. We perform experiments on real-world and synthetic inputs to evaluate implementation choices and show that the implementation satisfies the quality guarantees while being competitive with other state-of-the-art approaches to nearest neighbor search. We describe a novel synthetic data set that is difficult to solve for almost all existing nearest neighbor search approaches, and for which PUFFINN significantly outperform previous methods.

Cite as

Martin Aumüller, Tobias Christiani, Rasmus Pagh, and Michael Vesterli. PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 10:1-10:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{aumuller_et_al:LIPIcs.ESA.2019.10,
  author =	{Aum\"{u}ller, Martin and Christiani, Tobias and Pagh, Rasmus and Vesterli, Michael},
  title =	{{PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{10:1--10:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.10},
  URN =		{urn:nbn:de:0030-drops-111317},
  doi =		{10.4230/LIPIcs.ESA.2019.10},
  annote =	{Keywords: Nearest Neighbor Search, Locality-Sensitive Hashing, Adaptive Similarity Search}
}

Document

DOI: 10.4230/LIPIcs.ESA.2019.74

Hardness of Bichromatic Closest Pair with Jaccard Similarity

Authors: Rasmus Pagh, Nina Mesing Stausholm, and Mikkel Thorup

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Abstract

Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A x B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a cap b|/|a cup b| for (a,b) in A x B. We consider the approximate version of the problem where we are given thresholds j_1 > j_2 and wish to return a pair from A x B that has Jaccard similarity higher than j_2 if there exists a pair in A x B with Jaccard similarity at least j_1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n^(2-delta)) time if j_1 >= j_2^(1-delta). In particular, for delta=Omega(1), the approximation ratio j_1/j_2 = 1/j_2^delta increases polynomially in 1/j_2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n^(2-Omega(1))) time for j_1/j_2 = 1/j_2^o(1). Specifically, assuming OVC, we prove that for any delta>0 there exists an epsilon>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Omega(n^(2-delta)) for any choice of thresholds j_2 < j_1 < 1-delta, that satisfy j_1 <= j_2^(1-epsilon).

Cite as

Rasmus Pagh, Nina Mesing Stausholm, and Mikkel Thorup. Hardness of Bichromatic Closest Pair with Jaccard Similarity. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 74:1-74:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ESA.2019.74,
  author =	{Pagh, Rasmus and Stausholm, Nina Mesing and Thorup, Mikkel},
  title =	{{Hardness of Bichromatic Closest Pair with Jaccard Similarity}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{74:1--74:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.74},
  URN =		{urn:nbn:de:0030-drops-111951},
  doi =		{10.4230/LIPIcs.ESA.2019.74},
  annote =	{Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity}
}

Document

DOI: 10.4230/DagRep.7.5.1

Theory and Applications of Hashing (Dagstuhl Seminar 17181)

Authors: Martin Dietzfelbinger, Michael Mitzenmacher, Rasmus Pagh, David P. Woodruff, and Martin Aumüller

Published in: Dagstuhl Reports, Volume 7, Issue 5 (2018)

Abstract

This report documents the program and the topics discussed of the 4-day Dagstuhl Seminar 17181 "Theory and Applications of Hashing", which took place May 1-5, 2017. Four long and eighteen short talks covered a wide and diverse range of topics within the theme of the workshop. The program left sufficient space for informal discussions among the 40 participants.

Cite as

Martin Dietzfelbinger, Michael Mitzenmacher, Rasmus Pagh, David P. Woodruff, and Martin Aumüller. Theory and Applications of Hashing (Dagstuhl Seminar 17181). In Dagstuhl Reports, Volume 7, Issue 5, pp. 1-21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Article{dietzfelbinger_et_al:DagRep.7.5.1,
  author =	{Dietzfelbinger, Martin and Mitzenmacher, Michael and Pagh, Rasmus and Woodruff, David P. and Aum\"{u}ller, Martin},
  title =	{{Theory and Applications of Hashing (Dagstuhl Seminar 17181)}},
  pages =	{1--21},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2017},
  volume =	{7},
  number =	{5},
  editor =	{Dietzfelbinger, Martin and Mitzenmacher, Michael and Pagh, Rasmus and Woodruff, David P. and Aum\"{u}ller, Martin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.7.5.1},
  URN =		{urn:nbn:de:0030-drops-82788},
  doi =		{10.4230/DagRep.7.5.1},
  annote =	{Keywords: connections to complexity theory, data streaming applications, hash function construction and analysis, hashing primitives, information retrieval applications, locality-sensitive hashing, machine learning applications}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.42

Range-Efficient Consistent Sampling and Locality-Sensitive Hashing for Polygons

Authors: Joachim Gudmundsson and Rasmus Pagh

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with probability significantly larger than objects with low similarity. We consider LSH for objects that can be represented as point sets in either one or two dimensions. To make the point sets finite size we consider the subset of points on a grid. Directly applying LSH (e.g. min-wise hashing) to these point sets would require time proportional to the number of points. We seek to achieve time that is much lower than direct approaches. Technically, we introduce new primitives for range-efficient consistent sampling (of independent interest), and show how to turn such samples into LSH values. Another application of our technique is a data structure for quickly estimating the size of the intersection or union of a set of preprocessed polygons. Curiously, our consistent sampling method uses transformation to a geometric problem.

Cite as

Joachim Gudmundsson and Rasmus Pagh. Range-Efficient Consistent Sampling and Locality-Sensitive Hashing for Polygons. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 42:1-42:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{gudmundsson_et_al:LIPIcs.ISAAC.2017.42,
  author =	{Gudmundsson, Joachim and Pagh, Rasmus},
  title =	{{Range-Efficient Consistent Sampling and Locality-Sensitive Hashing for Polygons}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{42:1--42:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.42},
  URN =		{urn:nbn:de:0030-drops-82316},
  doi =		{10.4230/LIPIcs.ISAAC.2017.42},
  annote =	{Keywords: Locality-sensitive hashing, probability distribution, polygon, min-wise hashing, consistent sampling}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.MFCS.2017.83

Hardness and Approximation of High-Dimensional Search Problems (Invited Talk)

Authors: Rasmus Pagh

Published in: LIPIcs, Volume 83, 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)

Abstract

Hardness and Approximation of High-Dimensional Search Problems.

Cite as

Rasmus Pagh. Hardness and Approximation of High-Dimensional Search Problems (Invited Talk). In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, p. 83:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{pagh:LIPIcs.MFCS.2017.83,
  author =	{Pagh, Rasmus},
  title =	{{Hardness and Approximation of High-Dimensional Search Problems}},
  booktitle =	{42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)},
  pages =	{83:1--83:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-046-0},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{83},
  editor =	{Larsen, Kim G. and Bodlaender, Hans L. and Raskin, Jean-Francois},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2017.83},
  URN =		{urn:nbn:de:0030-drops-81416},
  doi =		{10.4230/LIPIcs.MFCS.2017.83},
  annote =	{Keywords: Hardness, high-dimensional search}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.SWAT.2016

LIPIcs, Volume 53, SWAT'16, Complete Volume

Authors: Rasmus Pagh

Published in: LIPIcs, Volume 53, 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)

Abstract

LIPIcs, Volume 53, SWAT'16, Complete Volume

Cite as

15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@Proceedings{pagh:LIPIcs.SWAT.2016,
  title =	{{LIPIcs, Volume 53, SWAT'16, Complete Volume}},
  booktitle =	{15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-011-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{53},
  editor =	{Pagh, Rasmus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2016},
  URN =		{urn:nbn:de:0030-drops-60601},
  doi =		{10.4230/LIPIcs.SWAT.2016},
  annote =	{Keywords: Theory of Computation}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.SWAT.2016.0

Front Matter, Table of Contents, Preface, Program Committee, Subreviewers

Authors: Rasmus Pagh

Published in: LIPIcs, Volume 53, 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)

Abstract

Front Matter, Table of Contents, Preface, Program Committee, Subreviewers

Cite as

15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, pp. 0:i-0:xiv, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{pagh:LIPIcs.SWAT.2016.0,
  author =	{Pagh, Rasmus},
  title =	{{Front Matter, Table of Contents, Preface, Program Committee, Subreviewers}},
  booktitle =	{15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016)},
  pages =	{0:i--0:xiv},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-011-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{53},
  editor =	{Pagh, Rasmus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2016.0},
  URN =		{urn:nbn:de:0030-drops-60229},
  doi =		{10.4230/LIPIcs.SWAT.2016.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Program Committee, Subreviewers}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2015.15

Large-Scale Similarity Joins With Guarantees (Invited Talk)

Authors: Rasmus Pagh

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)

Abstract

The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on randomized techniques that have improved guarantees on computational cost but come with a probability of not returning the correct result. The aim of this paper is to give an overview of randomized techniques for high-dimensional similarity search, and discuss recent advances towards making these techniques more widely applicable by eliminating probability of error and improving the locality of data access.

Cite as

Rasmus Pagh. Large-Scale Similarity Joins With Guarantees (Invited Talk). In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 15-24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{pagh:LIPIcs.ICDT.2015.15,
  author =	{Pagh, Rasmus},
  title =	{{Large-Scale Similarity Joins With Guarantees}},
  booktitle =	{18th International Conference on Database Theory (ICDT 2015)},
  pages =	{15--24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-79-8},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{31},
  editor =	{Arenas, Marcelo and Ugarte, Mart{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2015.15},
  URN =		{urn:nbn:de:0030-drops-49995},
  doi =		{10.4230/LIPIcs.ICDT.2015.15},
  annote =	{Keywords: Similarity join, filtering, locality-sensitive hashing, recall}
}

Search Results

Documents authored by Pagh, Rasmus

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as