Near Neighbor Search via Efficient Average Distortion Embeddings

Authors Deepanshu Kush, Aleksandar Nikolov, Haohua Tang



PDF
Thumbnail PDF

File

LIPIcs.SoCG.2021.50.pdf
  • Filesize: 0.64 MB
  • 14 pages

Document Identifiers

Author Details

Deepanshu Kush
  • University of Toronto, Canada
Aleksandar Nikolov
  • University of Toronto, Canada
Haohua Tang
  • University of Toronto, Canada

Acknowledgements

We want to thank Sushant Sachdeva for a fruitful discussion.

Cite AsGet BibTex

Deepanshu Kush, Aleksandar Nikolov, and Haohua Tang. Near Neighbor Search via Efficient Average Distortion Embeddings. In 37th International Symposium on Computational Geometry (SoCG 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 189, pp. 50:1-50:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.SoCG.2021.50

Abstract

A recent series of papers by Andoni, Naor, Nikolov, Razenshteyn, and Waingarten (STOC 2018, FOCS 2018) has given approximate near neighbour search (NNS) data structures for a wide class of distance metrics, including all norms. In particular, these data structures achieve approximation on the order of p for 𝓁_p^d norms with space complexity nearly linear in the dataset size n and polynomial in the dimension d, and query time sub-linear in n and polynomial in d. The main shortcoming is the exponential in d pre-processing time required for their construction. In this paper, we describe a more direct framework for constructing NNS data structures for general norms. More specifically, we show via an algorithmic reduction that an efficient NNS data structure for a metric ℳ is implied by an efficient average distortion embedding of ℳ into 𝓁₁ or the Euclidean space. In particular, the resulting data structures require only polynomial pre-processing time, as long as the embedding can be computed in polynomial time. As a concrete instantiation of this framework, we give an NNS data structure for 𝓁_p with efficient pre-processing that matches the approximation factor, space and query complexity of the aforementioned data structure of Andoni et al. On the way, we resolve a question of Naor (Analysis and Geometry in Metric Spaces, 2014) and provide an explicit, efficiently computable embedding of 𝓁_p, for p ≥ 1, into 𝓁₁ with average distortion on the order of p. Furthermore, we also give data structures for Schatten-p spaces with improved space and query complexity, albeit still requiring exponential pre-processing when p ≥ 2. We expect our approach to pave the way for constructing efficient NNS data structures for all norms.

Subject Classification

ACM Subject Classification
  • Theory of computation → Nearest neighbor algorithms
Keywords
  • Nearest neighbor search
  • metric space embeddings
  • average distortion embeddings
  • locality-sensitive hashing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexandr Andoni, Dorian Croitoru, and Mihai Patrascu. Hardness of Nearest Neighbor under L-infinity. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS '2008), pages 424-433, 2008. Google Scholar
  2. Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '2006), pages 459-468, 2006. Google Scholar
  3. Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond Locality-Sensitive Hashing. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA '2014), pages 1018-1028, 2014. Available as URL: https://arxiv.org/abs/1306.1547.
  4. Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. Approximate nearest neighbor search in high dimensions. In Proceedings of ICM 2018 (to appear), 2018. Google Scholar
  5. Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms (SODA '2017), pages 47-66, 2017. Available as URL: https://arxiv.org/abs/1608.03580.
  6. Alexandr Andoni, Assaf Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, and Erik Waingarten. Data-dependent hashing via nonlinear spectral gaps. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 787-800. ACM, 2018. URL: https://doi.org/10.1145/3188745.3188846.
  7. Alexandr Andoni, Assaf Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, and Erik Waingarten. Hölder homeomorphisms and approximate nearest neighbors. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 159-169. IEEE Computer Society, 2018. URL: https://doi.org/10.1109/FOCS.2018.00024.
  8. Alexandr Andoni and Ilya Razenshteyn. Optimal Data-Dependent Hashing for Approximate Near Neighbors. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC '2015), pages 793-801, 2015. Available as URL: https://arxiv.org/abs/1501.01062.
  9. Yair Bartal and Lee-Ad Gottlieb. Approximate nearest neighbor search for 𝓁_p-spaces (2lesspless∞). Theor. Comput. Sci., 757:27-35, 2019. URL: https://doi.org/10.1016/j.tcs.2018.07.011.
  10. S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509-522, April 2002. URL: https://doi.org/10.1109/34.993558.
  11. Yoav Benyamini and Joram Lindenstrauss. Geometric Nonlinear Functional Analysis. Vol. 1, volume 48 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2000. Google Scholar
  12. Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry (SoCG '2004), pages 253-262, 2004. Google Scholar
  13. Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity Search in High Dimensions via Hashing. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB '1999), pages 518-529, 1999. Google Scholar
  14. Piotr Indyk. On Approximate Nearest Neighbors under 𝓁_∞ Norm. Journal of Computer and System Sciences, 63(4):627-638, 2001. Google Scholar
  15. Piotr Indyk and Rajeev Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing (STOC '1998), pages 604-613, 1998. Google Scholar
  16. Yi Li, Huy L. Nguyên, and David P. Woodruff. On Sketching Matrix Norms and the Top Singular Vector. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA '2014), pages 1562-1581, 2014. Google Scholar
  17. Yi Li and David P. Woodruff. On Approximating Functions of the Singular Values in a Stream. In Proceedings of the 48th ACM Symposium on the Theory of Computing (STOC '2016), pages 726-739, 2016. Google Scholar
  18. Yi Li and David P. Woodruff. Embeddings of Schatten Norms with Applications to Data Streams. In Proceedings of the 44th International Colloquium on Automata, Languages and Programming (ICALP '2017), pages 60:1-60:14, 2017. Google Scholar
  19. Richard J. Lipton and Robert Endre Tarjan. Applications of a planar separator theorem. SIAM J. Comput., 9(3):615-627, 1980. URL: https://doi.org/10.1137/0209046.
  20. Jiří Matoušek. On Embedding Expanders into 𝓁_p Spaces. Israel Journal of Mathematics, 102:189-197, 1997. Google Scholar
  21. Stefan Meiser. Point location in arrangements of hyperplanes. Inf. Comput., 106(2):286-303, 1993. URL: https://doi.org/10.1006/inco.1993.1057.
  22. Bartlett W. Mel. Seemore: Combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Computation, 9(4):777-804, 1997. URL: https://doi.org/10.1162/neco.1997.9.4.777.
  23. Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013. Google Scholar
  24. Renée J. Miller, Fatemeh Nargesian, Erkang Zhu, Christina Christodoulakis, Ken Q. Pu, and Periklis Andritsos. Making open data transparent: Data discovery on open data. IEEE Data Eng. Bull., 41(2):59-70, 2018. URL: http://sites.computer.org/debull/A18june/p59.pdf.
  25. Assaf Naor. Comparison of Metric Spectral Gaps. Analysis and Geometry in Metric Spaces, 2:1-52, 2014. Google Scholar
  26. Assaf Naor. An average John theorem. arXiv preprint, 2019. URL: http://arxiv.org/abs/1905.01280.
  27. Assaf Naor, Gilles Pisier, and Gideon Schechtman. Impossibility of Dimension Reduction in the Nuclear Norm. In Proceedings of the 29th ACM-SIAM Symposium on Discrete Algorithms (SODA '2018), 2018. Google Scholar
  28. Assaf Naor and Yuval Rabani. On Approximate Nearest Neighbor Search in 𝓁_p, p > 2. Manuscript, available on request, 2006. Google Scholar
  29. Éric Ricard. Hölder Estimates for the Noncommutative Mazur Map. Archiv der Mathematik, 104(1):37-45, 2015. Google Scholar
  30. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. URL: https://doi.org/10.1016/j.tcs.2005.09.023.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail