Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor

Anagnostopoulos, Evangelos; Emiris, Ioannis Z.; Psarros, Ioannis

doi:10.4230/LIPIcs.SOCG.2015.436

File

Subject Classification

Keywords

Approximate nearest neighbor
Randomized embeddings
Curse of dimensionality
Johnson-Lindenstrauss Lemma
Bounded expansion rate
Experimental study

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

The approximate nearest neighbor problem (epsilon-ANN) in Euclidean settings is a fundamental question, which has been addressed by two main approaches: Data-dependent space partitioning techniques perform well when the dimension is relatively low, but are affected by the curse of dimensionality. On the other hand, locality sensitive hashing has polynomial dependence in the dimension, sublinear query time with an exponent inversely proportional to (1+epsilon)^2, and subquadratic space requirement. We generalize the Johnson-Lindenstrauss Lemma to define "low-quality" mappings to a Euclidean space of significantly lower dimension, such that they satisfy a requirement weaker than approximately preserving all distances or even preserving the nearest neighbor. This mapping guarantees, with high probability, that an approximate nearest neighbor lies among the k approximate nearest neighbors in the projected space. These can be efficiently retrieved while using only linear storage by a data structure, such as BBD-trees. Our overall algorithm, given n points in dimension d, achieves space usage in O(dn), preprocessing time in O(dn log n), and query time in O(d n^{rho} log n), where rho is proportional to 1 - 1/loglog n, for fixed epsilon in (0, 1). The dimension reduction is larger if one assumes that point sets possess some structure, namely bounded expansion rate. We implement our method and present experimental results in up to 500 dimensions and 10^6 points, which show that the practical performance is better than predicted by the theoretical analysis. In addition, we compare our approach with E2LSH.

Cite As Get BibTex

Evangelos Anagnostopoulos, Ioannis Z. Emiris, and Ioannis Psarros. Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 436-450, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.SOCG.2015.436

Author Details

Evangelos Anagnostopoulos

Ioannis Z. Emiris

Ioannis Psarros

References

I. Abraham, Y. Bartal, and O. Neiman. Local embeddings of metric spaces. In Proc. 39th ACM Symposium on Theory of Computing, pages 631-640. ACM Press, 2007.
D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003.
A. Andoni and P. Indyk. E²LSH 0.1 User Manual, Implementation of LSH: E2LSH, http://www.mit.edu/~andoni/LSH, 2005.
A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117-122, 2008.
A. Andoni and I. Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In arXiv:1501.01062, to appear in the Proc. 47th ACM Symp. Theory of Computing, STOC'15, 2015.
S. Arya, G. D. da Fonseca, and D. M. Mount. Approximate polytope membership queries. In Proc. 43rd Annual ACM Symp. Theory of Computing, STOC'11, pages 579-586, 2011.
S. Arya, T. Malamatos, and D. M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. J. ACM, 57(1):1:1-1:54, 2009.
S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6):891-923, 1998.
A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proc. 23rd Intern. Conf. Machine Learning, ICML'06, pages 97-104, 2006.
S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In Proc. 40th Annual ACM Symp. Theory of Computing, STOC'08, pages 537-546, 2008.
S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22(1):60-65, 2003.
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. 20th Annual Symp. Computational Geometry, SCG'04, pages 253-262, 2004.
A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion embeddings. In Proc. 44th Annual IEEE Symp. Foundations of Computer Science, FOCS'03, pages 534-, 2003.
S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and their applications. In Proc. 21st Annual Symp. Computational Geometry, SCG'05, pages 150-158, 2005.
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annual ACM Symp. Theory of Computing, STOC'98, pages 604-613, 1998.
P. Indyk and A. Naor. Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms, 3(3), 2007.
H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117-128, 2011.
W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984.
D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proc. 34th Annual ACM Symp. Theory of Computing, STOC'02, pages 741-750, 2002.
R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. In Proc. 15th Annual ACM-SIAM Symp. Discrete Algorithms, SODA'04, pages 798-807, 2004.
S. Meiser. Point location in arrangements of hyperplanes. Inf. Comput., 106(2):286-303, 1993.
D. M. Mount. ANN programming manual: http://www.cs.umd.edu/~mount/ANN/, 2010.
R. O'Donnell, Yi Wu, and Y. Zhou. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Trans. Comput. Theory, 6(1):5:1-5:13, 2014.
R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proc. 17th Annual ACM-SIAM Symp. Discrete Algorithms, SODA'06, pages 1186-1195, 2006.
I. Psarros. Low quality embeddings and approximate nearest neighbors, MSc Thesis, Dept. of Informatics & Telecommunications, University of Athens, 2014.
S. Vempala. Randomly-oriented k-d trees adapt to intrinsic dimension. In Proc. Foundations of Software Technology & Theor. Computer Science, pages 48-57, 2012.

Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor

Authors Evangelos Anagnostopoulos, Ioannis Z. Emiris, Ioannis Psarros

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message