Hypercube LSH for Approximate near Neighbors

Author Thijs Laarhoven



PDF
Thumbnail PDF

File

LIPIcs.MFCS.2017.7.pdf
  • Filesize: 0.63 MB
  • 20 pages

Document Identifiers

Author Details

Thijs Laarhoven

Cite As Get BibTex

Thijs Laarhoven. Hypercube LSH for Approximate near Neighbors. In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.MFCS.2017.7

Abstract

A celebrated technique for finding near neighbors for the angular distance involves using a set of random hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of orthogonal hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions.

In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability (1/pi)^d in dimension d, compared to (1/2)^d when using random hyperplanes. Vectors at angle pi/3 collide with probability (sqrt[3]/pi)^d, compared to (2/3)^d for random hyperplanes, and near-parallel vectors collide with similar asymptotic probabilities in both cases.

For c-approximate nearest neighbor searching, this translates to a decrease in the exponent rho of locality-sensitive hashing (LSH) methods of a factor up to log2(pi) ~ 1.652 compared to hyperplane LSH. For c = 2, we obtain rho ~ 0.302 for hypercube LSH, improving upon the rho ~ 0.377 for hyperplane LSH. We further describe how to use hypercube LSH in practice, and we consider an example application in the area of lattice algorithms.

Subject Classification

Keywords
  • (approximate) near neighbors
  • locality-sensitive hashing
  • large deviations
  • dimensionality reduction
  • lattice algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. SVP challenge, 2015. URL: http://latticechallenge.org/svp-challenge/.
  2. Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Formulas. Dover Publications, 1972. URL: http://people.math.sfu.ca/~cbm/aands/toc.htm.
  3. Dimitris Achlioptas. Database-friendly random projections. In PODS, pages 274-281, 2001. URL: http://dx.doi.org/10.1145/375551.375608.
  4. Alexandr Andoni. Nearest Neighbor Search: the Old, the New, and the Impossible. PhD thesis, Massachusetts Institute of Technology, 2009. URL: http://hdl.handle.net/1721.1/55090.
  5. Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. Practical and optimal LSH for angular distance. In NIPS, pages 1225-1233, 2015. URL: https://papers.nips.cc/paper/5893-practical-and-optimal-lsh-for-angular-distance.
  6. Alexandr Andoni, Piotr Indyk, Huy Lê Nguyên, and Ilya Razenshteyn. Beyond locality-sensitive hashing. In SODA, pages 1018-1028, 2014. URL: http://dx.doi.org/10.1137/1.9781611973402.76.
  7. Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. Optimal hashing-based time-space trade-offs for approximate near neighbors. In SODA, pages 47-66, 2017. URL: http://dx.doi.org/10.1137/1.9781611974782.4.
  8. Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In STOC, pages 793-801, 2015. URL: http://dx.doi.org/10.1145/2746539.2746553.
  9. Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions in nearest neighbor searching with applications to lattice sieving. In SODA, pages 10-24, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch2.
  10. Anja Becker and Thijs Laarhoven. Efficient (ideal) lattice sieving using cross-polytope LSH. In AFRICACRYPT, pages 3-23, 2016. URL: http://dx.doi.org/10.1007/978-3-319-31517-1_1.
  11. Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, 2006. Google Scholar
  12. Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380-388, 2002. URL: http://dx.doi.org/10.1145/509907.509965.
  13. Tobias Christiani. A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In SODA, pages 31-46, 2017. URL: http://dx.doi.org/10.1137/1.9781611974782.3.
  14. Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications (2nd edition). Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-03311-7.
  15. Moshe Dubiner. Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Transactions on Information Theory, 56(8):4166-4179, Aug 2010. URL: http://dx.doi.org/10.1109/TIT.2010.2050814.
  16. Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley, 2000. Google Scholar
  17. Kave Eshghi and Shyamsundar Rajaram. Locality sensitive hash functions based on concomitant rank order statistics. In KDD, pages 221-229, 2008. URL: http://dx.doi.org/10.1145/1401890.1401921.
  18. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pages 604-613, 1998. URL: http://dx.doi.org/10.1145/276698.276876.
  19. Tiefeng Jiang. How many entries of a typical orthogonal matrix can be approximated by independent normals? The Annals of Probability, 34(4):1497-1529, 2006. URL: http://dx.doi.org/10.1214/009117906000000205.
  20. William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(1):189-206, 1984. URL: http://dx.doi.org/10.1090/conm/026/737400.
  21. Christopher Kennedy and Rachel Ward. Fast cross-polytope locality-sensitive hashing. In ITCS, 2017. URL: https://arxiv.org/abs/1602.06922.
  22. Thorsten Kleinjung. Private communication, 2014. Google Scholar
  23. Thijs Laarhoven. Sieving for shortest vectors in lattices using angular locality-sensitive hashing. In CRYPTO, pages 3-22, 2015. URL: http://dx.doi.org/10.1007/978-3-662-47989-6_1.
  24. Thijs Laarhoven and Benne de Weger. Faster sieving for shortest lattice vectors using spherical locality-sensitive hashing. In LATINCRYPT, pages 101-118, 2015. URL: http://dx.doi.org/10.1007/978-3-319-22174-8_6.
  25. Artur Mariano. Private communication., 2016. Google Scholar
  26. Artur Mariano and Christian Bischof. Enhancing the scalability and memory usage of HashSieve on multi-core CPUs. In PDP, pages 545-552, 2016. URL: http://dx.doi.org/10.1109/PDP.2016.31.
  27. Artur Mariano, Thijs Laarhoven, and Christian Bischof. Parallel (probable) lock-free HashSieve: a practical sieving algorithm for the SVP. In ICPP, pages 590-599, 2015. URL: https://eprint.iacr.org/2015/041.
  28. Artur Mariano, Thijs Laarhoven, and Christian Bischof. A parallel variant of LDSieve for the SVP on lattices. PDP, 2017. Google Scholar
  29. Alexander May and Ilya Ozerov. On computing nearest neighbors with applications to decoding of binary linear codes. In EUROCRYPT, pages 203-228, 2015. URL: http://dx.doi.org/10.1007/978-3-662-46800-5_9.
  30. Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. SIAM Journal of Discrete Mathematics, 21(4):930-935, 2007. URL: http://dx.doi.org/10.1137/050646858.
  31. Phong Q. Nguyên and Thomas Vidick. Sieve algorithms for the shortest vector problem are practical. Journal of Mathematical Cryptology, 2(2):181-207, 2008. URL: http://dx.doi.org/10.1515/JMC.2008.009.
  32. Ryan O'Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality sensitive hashing (except when q is tiny). In ICS, pages 276-283, 2011. URL: http://conference.itcs.tsinghua.edu.cn/ICS2011/content/papers/2.html.
  33. Ludwig Schmidt, Matthew Sharifi, and Ignacio Lopez-Moreno. Large-scale speaker identification. In ICASSP, pages 1650-1654, 2014. URL: http://dx.doi.org/10.1109/ICASSP.2014.6853878.
  34. Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice. MIT Press, 2005. URL: http://ttic.uchicago.edu/~gregory/annbook/book.html.
  35. Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden, and Pradeep Dubey. Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. VLDB, 6(14):1930-1941, 2013. URL: http://dx.doi.org/10.14778/2556549.2556574.
  36. Kengo Terasawa and Yuzuru Tanaka. Spherical LSH for approximate nearest neighbor search on unit hypersphere. In WADS, pages 27-38, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73951-7_4.
  37. Kengo Terasawa and Yuzuru Tanaka. Approximate nearest neighbor search for a dataset of normalized vectors. In IEICE Transactions on Information and Systems, volume 92, pages 1609-1619, 2009. URL: http://search.ieice.org/bin/summary.php?id=e92-d_9_1609.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail