Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a random low-dimensional subspace, and then partitioning the projected vectors according to the Voronoi cells induced by a well-chosen spherical code. This approach generalizes and interpolates between the fast but asymptotically suboptimal hyperplane hashing of Charikar [STOC 2002], and asymptotically optimal but practically often slower hash families of e.g. Andoni - Indyk [FOCS 2006], Andoni - Indyk - Nguyen - Razenshteyn [SODA 2014] and Andoni - Indyk - Laarhoven - Razenshteyn - Schmidt [NIPS 2015]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes appearing in the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike e.g. cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide any order terms which vanish only for large d, thus facilitating an easier parameter selection in practical applications.
For the two-dimensional case, we analytically derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron and 5-cell (the 3-simplex and 4-simplex) and the 16-cell (the 4-orthoplex) achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families A_k and D_k in terms of minimizing the query exponent. We provide lower bounds based on spherical caps, and we predict that in higher dimensions, larger spherical codes exist which outperform orthoplices in terms of the query exponent, and we argue why using the D_k root lattices will likely lead to better results in practice as well (compared to using cross-polytopes), due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.

Thijs Laarhoven. Polytopes, Lattices, and Spherical Codes for the Nearest Neighbor Problem. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 76:1-76:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{laarhoven:LIPIcs.ICALP.2020.76, author = {Laarhoven, Thijs}, title = {{Polytopes, Lattices, and Spherical Codes for the Nearest Neighbor Problem}}, booktitle = {47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)}, pages = {76:1--76:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-138-2}, ISSN = {1868-8969}, year = {2020}, volume = {168}, editor = {Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.76}, URN = {urn:nbn:de:0030-drops-124834}, doi = {10.4230/LIPIcs.ICALP.2020.76}, annote = {Keywords: (approximate) nearest neighbor problem, spherical codes, polytopes, lattices, locality-sensitive hashing (LSH)} }

Document

**Published in:** LIPIcs, Volume 99, 34th International Symposium on Computational Geometry (SoCG 2018)

We take a first step towards a rigorous asymptotic analysis of graph-based methods for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of randomized greedy walks on the approximate nearest neighbor graph. For random data sets of size n = 2^{o(d)} on the d-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor c > 1 in query time n^{rho_{q} + o(1)} and space n^{1 + rho_{s} + o(1)}, for arbitrary rho_{q}, rho_{s} >= 0 satisfying (2c^2 - 1) rho_{q} + 2 c^2 (c^2 - 1) sqrt{rho_{s} (1 - rho_{s})} >= c^4. Graph-based near neighbor searching is especially competitive with hash-based methods for small c and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches optimal hash-based trade-offs of Andoni-Laarhoven-Razenshteyn-Waingarten [Andoni et al., 2017]. We further study how the trade-offs scale when the data set is of size n = 2^{Theta(d)}, and analyze asymptotic complexities when applying these results to lattice sieving.

Thijs Laarhoven. Graph-Based Time-Space Trade-Offs for Approximate Near Neighbors. In 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 99, pp. 57:1-57:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{laarhoven:LIPIcs.SoCG.2018.57, author = {Laarhoven, Thijs}, title = {{Graph-Based Time-Space Trade-Offs for Approximate Near Neighbors}}, booktitle = {34th International Symposium on Computational Geometry (SoCG 2018)}, pages = {57:1--57:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-066-8}, ISSN = {1868-8969}, year = {2018}, volume = {99}, editor = {Speckmann, Bettina and T\'{o}th, Csaba D.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2018.57}, URN = {urn:nbn:de:0030-drops-87700}, doi = {10.4230/LIPIcs.SoCG.2018.57}, annote = {Keywords: approximate nearest neighbor problem, near neighbor graphs, locality-sensitive hashing, locality-sensitive filters, similarity search} }

Document

**Published in:** LIPIcs, Volume 83, 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)

A celebrated technique for finding near neighbors for the angular distance involves using a set of random hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of orthogonal hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions.
In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability (1/pi)^d in dimension d, compared to (1/2)^d when using random hyperplanes. Vectors at angle pi/3 collide with probability (sqrt[3]/pi)^d, compared to (2/3)^d for random hyperplanes, and near-parallel vectors collide with similar asymptotic probabilities in both cases.
For c-approximate nearest neighbor searching, this translates to a decrease in the exponent rho of locality-sensitive hashing (LSH) methods of a factor up to log2(pi) ~ 1.652 compared to hyperplane LSH. For c = 2, we obtain rho ~ 0.302 for hypercube LSH, improving upon the rho ~ 0.377 for hyperplane LSH. We further describe how to use hypercube LSH in practice, and we consider an example application in the area of lattice algorithms.

Thijs Laarhoven. Hypercube LSH for Approximate near Neighbors. In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{laarhoven:LIPIcs.MFCS.2017.7, author = {Laarhoven, Thijs}, title = {{Hypercube LSH for Approximate near Neighbors}}, booktitle = {42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)}, pages = {7:1--7:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-046-0}, ISSN = {1868-8969}, year = {2017}, volume = {83}, editor = {Larsen, Kim G. and Bodlaender, Hans L. and Raskin, Jean-Francois}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2017.7}, URN = {urn:nbn:de:0030-drops-80926}, doi = {10.4230/LIPIcs.MFCS.2017.7}, annote = {Keywords: (approximate) near neighbors, locality-sensitive hashing, large deviations, dimensionality reduction, lattice algorithms} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail