Pagh, Rasmus ;
Stausholm, Nina Mesing ;
Thorup, Mikkel
Hardness of Bichromatic Closest Pair with Jaccard Similarity
Abstract
Consider collections $A$ and $B$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $A \times B$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|a \cap b| / |a \cup b|$ for $(a, b) \in A \times B$.
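As a point of reference for the problem statement above, the following is a minimal sketch of the naive quadratic-time baseline that compares every red set with every blue set. The function names, the example sets, and the convention that two empty sets have similarity 0 are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets.

    Returning 0.0 for two empty sets is a convention assumed here.
    """
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bichromatic_closest_pair(A, B, threshold):
    """Return the first pair (a, b) in A x B with similarity above threshold.

    Scans all |A| * |B| pairs and returns None if no pair qualifies.
    """
    for a, b in product(A, B):
        if jaccard(a, b) > threshold:
            return a, b
    return None


# Hypothetical example data.
A = [frozenset({1, 2, 3}), frozenset({4, 5})]
B = [frozenset({2, 3, 4}), frozenset({7, 8})]
result = bichromatic_closest_pair(A, B, 0.4)
print(result)
```

In this example the pair ({1, 2, 3}, {2, 3, 4}) has similarity 2/4 = 0.5, so it is returned for a threshold of 0.4, while no pair exceeds a threshold of 0.5.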
We consider the approximate version of the problem where we are given thresholds $j_1 > j_2$ and wish to return a pair from $A \times B$ that has Jaccard similarity higher than $j_2$ if there exists a pair in $A \times B$ with Jaccard similarity at least $j_1$. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in $\tilde{O}(n^{2-\delta})$ time if $j_1 \ge j_2^{1-\delta}$. In particular, for $\delta = \Omega(1)$, the approximation ratio $j_1/j_2 = 1/j_2^{\delta}$ increases polynomially in $1/j_2$.
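The MinHash function mentioned above rests on a well-known property: under a uniformly random permutation of the universe, two sets share the same minimum element with probability equal to their Jaccard similarity. The sketch below illustrates only that property, using explicit random permutations of a small universe. It is not the full Indyk-Motwani scheme, which additionally concatenates hash values and buckets the sets, and the helper names and parameters (`k`, `seed`) are assumptions made for this example.

```python
import random


def random_permutations(universe, k, seed=0):
    """Draw k random permutations of the universe, each as element -> rank."""
    rng = random.Random(seed)
    items = sorted(universe)
    perms = []
    for _ in range(k):
        order = items[:]
        rng.shuffle(order)
        perms.append({x: rank for rank, x in enumerate(order)})
    return perms


def minhash_signature(s, perms):
    """For each permutation, record the smallest rank among the elements of s."""
    return [min(p[x] for x in s) for p in perms]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical example data; the true similarity is 2/4 = 0.5.
a = {1, 2, 3}
b = {2, 3, 4}
perms = random_permutations(a | b, k=2000)
est = estimate_jaccard(minhash_signature(a, perms), minhash_signature(b, perms))
print(est)
```

With 2000 permutations the estimate should land near the true value 0.5; fewer permutations give a cheaper but noisier estimate.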
In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in $O(n^{2-\Omega(1)})$ time for $j_1/j_2 = 1/j_2^{o(1)}$. Specifically, assuming OVC, we prove that for any $\delta > 0$ there exists an $\varepsilon > 0$ such that Bichromatic Closest Pair with Jaccard similarity requires time $\Omega(n^{2-\delta})$ for any choice of thresholds $j_2 < j_1 < 1-\delta$ that satisfy $j_1 \le j_2^{1-\varepsilon}$.
BibTeX Entry
@InProceedings{pagh_et_al:LIPIcs:2019:11195,
author = {Rasmus Pagh and Nina Mesing Stausholm and Mikkel Thorup},
title = {{Hardness of Bichromatic Closest Pair with Jaccard Similarity}},
booktitle = {27th Annual European Symposium on Algorithms (ESA 2019)},
pages = {74:1--74:13},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-124-5},
ISSN = {1868-8969},
year = {2019},
volume = {144},
editor = {Michael A. Bender and Ola Svensson and Grzegorz Herman},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11195},
URN = {urn:nbn:de:0030-drops-111951},
doi = {10.4230/LIPIcs.ESA.2019.74},
annote = {Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity}
}
Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity
Seminar: 27th Annual European Symposium on Algorithms (ESA 2019)
Issue date: 2019
Date of publication: 06.09.2019