Hardness of Bichromatic Closest Pair with Jaccard Similarity

Authors Rasmus Pagh , Nina Mesing Stausholm , Mikkel Thorup



PDF
Thumbnail PDF

File

LIPIcs.ESA.2019.74.pdf
  • Filesize: 0.5 MB
  • 13 pages

Document Identifiers

Author Details

Rasmus Pagh
  • BARC, Copenhagen, Denmark
  • IT University of Copenhagen, Denmark
Nina Mesing Stausholm
  • BARC, Copenhagen, Denmark
  • IT University of Copenhagen, Denmark
Mikkel Thorup
  • BARC, Copenhagen, Denmark
  • University of Copenhagen, Denmark

Acknowledgements

We want to thank A. Rubinstein for helping us understand the background of his results in [Aviad Rubinstein, 2018].

Cite AsGet BibTex

Rasmus Pagh, Nina Mesing Stausholm, and Mikkel Thorup. Hardness of Bichromatic Closest Pair with Jaccard Similarity. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 74:1-74:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.74

Abstract

Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A x B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a cap b|/|a cup b| for (a,b) in A x B. We consider the approximate version of the problem where we are given thresholds j_1 > j_2 and wish to return a pair from A x B that has Jaccard similarity higher than j_2 if there exists a pair in A x B with Jaccard similarity at least j_1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n^(2-delta)) time if j_1 >= j_2^(1-delta). In particular, for delta=Omega(1), the approximation ratio j_1/j_2 = 1/j_2^delta increases polynomially in 1/j_2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n^(2-Omega(1))) time for j_1/j_2 = 1/j_2^o(1). Specifically, assuming OVC, we prove that for any delta>0 there exists an epsilon>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Omega(n^(2-delta)) for any choice of thresholds j_2 < j_1 < 1-delta, that satisfy j_1 <= j_2^(1-epsilon).

Subject Classification

ACM Subject Classification
  • Theory of computation → Problems, reductions and completeness
Keywords
  • fine-grained complexity
  • set similarity search
  • bichromatic closest pair
  • jaccard similarity

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Andrei Z Broder. On the resemblance and containment of documents. In Compression and complexity of sequences 1997. proceedings, pages 21-29. IEEE, 1997. Google Scholar
  2. Lijie Chen. On The Hardness of Approximate and Exact (Bichromatic) Maximum Inner Product. In 33rd Computational Complexity Conference, CCC 2018, June 22-24, 2018, San Diego, CA, USA, pages 14:1-14:45, 2018. URL: https://doi.org/10.4230/LIPIcs.CCC.2018.14.
  3. Lijie Chen and Ryan Williams. An Equivalence Class for Orthogonal Vectors. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 21-40, 2019. URL: https://doi.org/10.1137/1.9781611975482.2.
  4. Tobias Christiani and Rasmus Pagh. Set similarity search beyond MinHash. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1094-1107, 2017. URL: https://doi.org/10.1145/3055399.3055443.
  5. Ashish Goel, Aneesh Sharma, Dong Wang, and Zhijun Yin. Discovering similar users on twitter. In 11th Workshop on Mining and Learning with Graphs, 2013. Google Scholar
  6. Pankaj Gupta, Ashish Goel, Jimmy J. Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. WTF: the who to follow service at twitter. In 22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, May 13-17, 2013, pages 505-514, 2013. URL: https://doi.org/10.1145/2488388.2488433.
  7. Piotr Indyk and Rajeev Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 604-613, 1998. URL: https://doi.org/10.1145/276698.276876.
  8. Rasmus Pagh, Nina Stausholm, and Mikkel Thorup. Hardness of Bichromatic Closest Pair with Jaccard Similarity, 2019. URL: http://arxiv.org/abs/1907.02251.
  9. Aviad Rubinstein. Hardness of approximate nearest neighbor search. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 1260-1268, 2018. URL: https://doi.org/10.1145/3188745.3188916.
  10. Gregory Valiant. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem. J. ACM, 62(2):13:1-13:45, 2015. URL: https://doi.org/10.1145/2728167.
  11. Virginia Vassilevska Williams. Some Open Problems in Fine-Grained Complexity. SIGACT News, 49(4):29-35, 2018. URL: https://doi.org/10.1145/3300150.3300158.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail