Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

Authors Alexandr Andoni, Ilya Razensteyn



PDF
Thumbnail PDF

File

LIPIcs.SoCG.2016.9.pdf
  • Filesize: 444 kB
  • 11 pages

Document Identifiers

Author Details

Alexandr Andoni
Ilya Razensteyn

Cite AsGet BibTex

Alexandr Andoni and Ilya Razensteyn. Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.SoCG.2016.9

Abstract

We prove a tight lower bound for the exponent rho for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the c-approximate nearest neighbor search. In particular, our lower bound matches the bound of rho<= 1/(2c-1)+o(1) for the l_1 space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to the classical Locality-Sensitive Hashing, when the hash function is data-independent. In the latter setting, the best exponent has been already known: for the l_1 space, the tight bound is rho=1/c, with the upper bound from [Indyk-Motwani,STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou,ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that rho>=1/(2c-1)-o(1). To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.
Keywords
  • similarity search
  • high-dimensional geometry
  • LSH
  • data structures
  • lower bounds

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexandr Andoni. Nearest Neighbor Search: the Old, the New, and the Impossible. PhD thesis, Massachusetts Institute of Technology, 2009. Google Scholar
  2. Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 459-468, 2006. Google Scholar
  3. Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008. Google Scholar
  4. Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. Practical and optimal LSH for angular distance. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS'15), 2015. Google Scholar
  5. Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond locality-sensitive hashing. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'14), pages 1018-1028, 2014. Google Scholar
  6. Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the ACM Symposium on the Theory of Computing (STOC'15), 2015. Google Scholar
  7. Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions in nearest neighbor searching with applications to lattice sieving. In Proceedings of the 27th ACM-SIAM Symposium on Discrete Algorithms (SODA'16), 2016. Google Scholar
  8. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60-65, 2003. Google Scholar
  9. Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry (SoCG'04), pages 253-262, 2004. Google Scholar
  10. Moshe Dubiner. Bucketing coding and information theory for the statistical highdimensional nearest-neighbor problem. IEEE Transactions on Information Theory, 56(8):4166-4179, 2010. Google Scholar
  11. Uriel Feige and Gideon Schechtman. On the optimality of the random hyperplane rounding technique for MAX CUT. Random Structures and Algorithms, 20(3):403-440, 2002. Google Scholar
  12. Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Theory of Computing, 8(1):321-350, 2012. Google Scholar
  13. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing (STOC'98), pages 604-613, 1998. Google Scholar
  14. William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Connecticut, 1982), volume 26 of Contemporary Mathematics, pages 189-206. 1984. Google Scholar
  15. Subhash Khot and Nisheeth K. Vishnoi. The unique games conjecture, integrality gap for cut problems and embeddability of negative-type metrics into l₁. J. ACM, 62(1):8:1-8:39, 2015. Google Scholar
  16. Eyal Kushilevitz, Rafail Ostrovky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457-474, 2000. Google Scholar
  17. Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215-245, 1995. Google Scholar
  18. Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. SIAM Journal on Discrete Mathematics, 21(4):930-935, 2007. Google Scholar
  19. Ryan O'Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality sensitive hashing (except when q is tiny). In Proceedings of Innovations in Computer Science (ICS'11), pages 275-283, 2011. Google Scholar
  20. Rina Panigrahy, Kunal Talwar, and Udi Wieder. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS'08), pages 414-423, 2008. Google Scholar
  21. Rina Panigrahy, Kunal Talwar, and Udi Wieder. Lower bounds on near neighbor search via metric expansion. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS'10), pages 805-814, 2010. Google Scholar
  22. Hannan Samet. Foundations of Multidimensional and Metric Data Structures. Elsevier, 2006. Google Scholar
  23. Kengo Terasawa and Yuzuru Tanaka. Spherical LSH for approximate nearest neighbor search on unit hypersphere. In Algorithms and Data Structures, 10th International Workshop, WADS 2007, Halifax, Canada, August 15-17, 2007, Proceedings, pages 27-38, 2007. Google Scholar
  24. Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for similarity search: A survey. CoRR, abs/1408.2927, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail