Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

Andoni, Alexandr; Razensteyn, Ilya

doi:10.4230/LIPIcs.SoCG.2016.9

File

Subject Classification

Keywords

similarity search
high-dimensional geometry
LSH
data structures
lower bounds

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

We prove a tight lower bound for the exponent rho for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the c-approximate nearest neighbor search. In particular, our lower bound matches the bound of rho<= 1/(2c-1)+o(1) for the l_1 space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to the classical Locality-Sensitive Hashing, when the hash function is data-independent. In the latter setting, the best exponent has been already known: for the l_1 space, the tight bound is rho=1/c, with the upper bound from [Indyk-Motwani,STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou,ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that rho>=1/(2c-1)-o(1). To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.

Cite As Get BibTex

Alexandr Andoni and Ilya Razensteyn. Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.SoCG.2016.9

Author Details

Alexandr Andoni

Ilya Razensteyn

References

Alexandr Andoni. Nearest Neighbor Search: the Old, the New, and the Impossible. PhD thesis, Massachusetts Institute of Technology, 2009.
Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 459-468, 2006.
Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008.
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. Practical and optimal LSH for angular distance. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS'15), 2015.
Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond locality-sensitive hashing. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'14), pages 1018-1028, 2014.
Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the ACM Symposium on the Theory of Computing (STOC'15), 2015.
Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions in nearest neighbor searching with applications to lattice sieving. In Proceedings of the 27th ACM-SIAM Symposium on Discrete Algorithms (SODA'16), 2016.
Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60-65, 2003.
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry (SoCG'04), pages 253-262, 2004.
Moshe Dubiner. Bucketing coding and information theory for the statistical highdimensional nearest-neighbor problem. IEEE Transactions on Information Theory, 56(8):4166-4179, 2010.
Uriel Feige and Gideon Schechtman. On the optimality of the random hyperplane rounding technique for MAX CUT. Random Structures and Algorithms, 20(3):403-440, 2002.
Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Theory of Computing, 8(1):321-350, 2012.
Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing (STOC'98), pages 604-613, 1998.
William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Connecticut, 1982), volume 26 of Contemporary Mathematics, pages 189-206. 1984.
Subhash Khot and Nisheeth K. Vishnoi. The unique games conjecture, integrality gap for cut problems and embeddability of negative-type metrics into l₁. J. ACM, 62(1):8:1-8:39, 2015.
Eyal Kushilevitz, Rafail Ostrovky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457-474, 2000.
Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215-245, 1995.
Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. SIAM Journal on Discrete Mathematics, 21(4):930-935, 2007.
Ryan O'Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality sensitive hashing (except when q is tiny). In Proceedings of Innovations in Computer Science (ICS'11), pages 275-283, 2011.
Rina Panigrahy, Kunal Talwar, and Udi Wieder. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS'08), pages 414-423, 2008.
Rina Panigrahy, Kunal Talwar, and Udi Wieder. Lower bounds on near neighbor search via metric expansion. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS'10), pages 805-814, 2010.
Hannan Samet. Foundations of Multidimensional and Metric Data Structures. Elsevier, 2006.
Kengo Terasawa and Yuzuru Tanaka. Spherical LSH for approximate nearest neighbor search on unit hypersphere. In Algorithms and Data Structures, 10th International Workshop, WADS 2007, Halifax, Canada, August 15-17, 2007, Proceedings, pages 27-38, 2007.
Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for similarity search: A survey. CoRR, abs/1408.2927, 2014.

Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

Authors Alexandr Andoni, Ilya Razensteyn

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message