Near Neighbor Search via Efficient Average Distortion Embeddings

Kush, Deepanshu; Nikolov, Aleksandar; Tang, Haohua

doi:10.4230/LIPIcs.SoCG.2021.50

File

Subject Classification

ACM Subject Classification

Theory of computation → Nearest neighbor algorithms

Keywords

Nearest neighbor search
metric space embeddings
average distortion embeddings
locality-sensitive hashing

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

A recent series of papers by Andoni, Naor, Nikolov, Razenshteyn, and Waingarten (STOC 2018, FOCS 2018) has given approximate near neighbour search (NNS) data structures for a wide class of distance metrics, including all norms. In particular, these data structures achieve approximation on the order of p for 𝓁_p^d norms with space complexity nearly linear in the dataset size n and polynomial in the dimension d, and query time sub-linear in n and polynomial in d. The main shortcoming is the exponential in d pre-processing time required for their construction. In this paper, we describe a more direct framework for constructing NNS data structures for general norms. More specifically, we show via an algorithmic reduction that an efficient NNS data structure for a metric ℳ is implied by an efficient average distortion embedding of ℳ into 𝓁₁ or the Euclidean space. In particular, the resulting data structures require only polynomial pre-processing time, as long as the embedding can be computed in polynomial time. As a concrete instantiation of this framework, we give an NNS data structure for 𝓁_p with efficient pre-processing that matches the approximation factor, space and query complexity of the aforementioned data structure of Andoni et al. On the way, we resolve a question of Naor (Analysis and Geometry in Metric Spaces, 2014) and provide an explicit, efficiently computable embedding of 𝓁_p, for p ≥ 1, into 𝓁₁ with average distortion on the order of p. Furthermore, we also give data structures for Schatten-p spaces with improved space and query complexity, albeit still requiring exponential pre-processing when p ≥ 2. We expect our approach to pave the way for constructing efficient NNS data structures for all norms.

Cite As Get BibTex

Deepanshu Kush, Aleksandar Nikolov, and Haohua Tang. Near Neighbor Search via Efficient Average Distortion Embeddings. In 37th International Symposium on Computational Geometry (SoCG 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 189, pp. 50:1-50:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.SoCG.2021.50

Author Details

Deepanshu Kush

University of Toronto, Canada

Aleksandar Nikolov

University of Toronto, Canada

Haohua Tang

University of Toronto, Canada

References

Alexandr Andoni, Dorian Croitoru, and Mihai Patrascu. Hardness of Nearest Neighbor under L-infinity. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS '2008), pages 424-433, 2008.
Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '2006), pages 459-468, 2006.
Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond Locality-Sensitive Hashing. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA '2014), pages 1018-1028, 2014. Available as URL: https://arxiv.org/abs/1306.1547.
Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. Approximate nearest neighbor search in high dimensions. In Proceedings of ICM 2018 (to appear), 2018.
Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms (SODA '2017), pages 47-66, 2017. Available as URL: https://arxiv.org/abs/1608.03580.
Alexandr Andoni, Assaf Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, and Erik Waingarten. Data-dependent hashing via nonlinear spectral gaps. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 787-800. ACM, 2018. URL: https://doi.org/10.1145/3188745.3188846.
Alexandr Andoni, Assaf Naor, Aleksandar Nikolov, Ilya P. Razenshteyn, and Erik Waingarten. Hölder homeomorphisms and approximate nearest neighbors. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 159-169. IEEE Computer Society, 2018. URL: https://doi.org/10.1109/FOCS.2018.00024.
Alexandr Andoni and Ilya Razenshteyn. Optimal Data-Dependent Hashing for Approximate Near Neighbors. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC '2015), pages 793-801, 2015. Available as URL: https://arxiv.org/abs/1501.01062.
Yair Bartal and Lee-Ad Gottlieb. Approximate nearest neighbor search for 𝓁_p-spaces (2lesspless∞). Theor. Comput. Sci., 757:27-35, 2019. URL: https://doi.org/10.1016/j.tcs.2018.07.011.
S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509-522, April 2002. URL: https://doi.org/10.1109/34.993558.
Yoav Benyamini and Joram Lindenstrauss. Geometric Nonlinear Functional Analysis. Vol. 1, volume 48 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2000.
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry (SoCG '2004), pages 253-262, 2004.
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity Search in High Dimensions via Hashing. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB '1999), pages 518-529, 1999.
Piotr Indyk. On Approximate Nearest Neighbors under 𝓁_∞ Norm. Journal of Computer and System Sciences, 63(4):627-638, 2001.
Piotr Indyk and Rajeev Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing (STOC '1998), pages 604-613, 1998.
Yi Li, Huy L. Nguyên, and David P. Woodruff. On Sketching Matrix Norms and the Top Singular Vector. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA '2014), pages 1562-1581, 2014.
Yi Li and David P. Woodruff. On Approximating Functions of the Singular Values in a Stream. In Proceedings of the 48th ACM Symposium on the Theory of Computing (STOC '2016), pages 726-739, 2016.
Yi Li and David P. Woodruff. Embeddings of Schatten Norms with Applications to Data Streams. In Proceedings of the 44th International Colloquium on Automata, Languages and Programming (ICALP '2017), pages 60:1-60:14, 2017.
Richard J. Lipton and Robert Endre Tarjan. Applications of a planar separator theorem. SIAM J. Comput., 9(3):615-627, 1980. URL: https://doi.org/10.1137/0209046.
Jiří Matoušek. On Embedding Expanders into 𝓁_p Spaces. Israel Journal of Mathematics, 102:189-197, 1997.
Stefan Meiser. Point location in arrangements of hyperplanes. Inf. Comput., 106(2):286-303, 1993. URL: https://doi.org/10.1006/inco.1993.1057.
Bartlett W. Mel. Seemore: Combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Computation, 9(4):777-804, 1997. URL: https://doi.org/10.1162/neco.1997.9.4.777.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013.
Renée J. Miller, Fatemeh Nargesian, Erkang Zhu, Christina Christodoulakis, Ken Q. Pu, and Periklis Andritsos. Making open data transparent: Data discovery on open data. IEEE Data Eng. Bull., 41(2):59-70, 2018. URL: http://sites.computer.org/debull/A18june/p59.pdf.
Assaf Naor. Comparison of Metric Spectral Gaps. Analysis and Geometry in Metric Spaces, 2:1-52, 2014.
Assaf Naor. An average John theorem. arXiv preprint, 2019. URL: http://arxiv.org/abs/1905.01280.
Assaf Naor, Gilles Pisier, and Gideon Schechtman. Impossibility of Dimension Reduction in the Nuclear Norm. In Proceedings of the 29th ACM-SIAM Symposium on Discrete Algorithms (SODA '2018), 2018.
Assaf Naor and Yuval Rabani. On Approximate Nearest Neighbor Search in 𝓁_p, p > 2. Manuscript, available on request, 2006.
Éric Ricard. Hölder Estimates for the Noncommutative Mazur Map. Archiv der Mathematik, 104(1):37-45, 2015.
Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. URL: https://doi.org/10.1016/j.tcs.2005.09.023.

Near Neighbor Search via Efficient Average Distortion Embeddings

Authors Deepanshu Kush, Aleksandar Nikolov, Haohua Tang

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Acknowledgements

References

Thanks for your feedback!

Could not send message

Near Neighbor Search via Efficient Average Distortion Embeddings

Authors Deepanshu Kush, Aleksandar Nikolov, Haohua Tang

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

Acknowledgements

References

Thanks for your feedback!

Could not send message