Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor

Authors Evangelos Anagnostopoulos, Ioannis Z. Emiris, Ioannis Psarros



PDF
Thumbnail PDF

File

LIPIcs.SOCG.2015.436.pdf
  • Filesize: 0.52 MB
  • 15 pages

Document Identifiers

Author Details

Evangelos Anagnostopoulos
Ioannis Z. Emiris
Ioannis Psarros

Cite As Get BibTex

Evangelos Anagnostopoulos, Ioannis Z. Emiris, and Ioannis Psarros. Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 436-450, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.SOCG.2015.436

Abstract

The approximate nearest neighbor problem (epsilon-ANN) in Euclidean settings is a fundamental question, which has been addressed by two main approaches: Data-dependent space partitioning techniques perform well when the dimension is relatively low, but are affected by the curse of dimensionality. On the other hand, locality sensitive hashing has polynomial dependence in the dimension, sublinear query time with an exponent inversely proportional to (1+epsilon)^2, and subquadratic space requirement. 

We generalize the Johnson-Lindenstrauss Lemma to define "low-quality" mappings to a Euclidean space of significantly lower dimension, such that they satisfy a requirement weaker than approximately preserving all distances or even preserving the nearest neighbor. This mapping guarantees, with high probability, that an approximate nearest neighbor lies among the k approximate nearest neighbors in the projected space. These can be efficiently retrieved while using only linear storage by a data structure, such as BBD-trees. Our overall algorithm, given n points in dimension d, achieves space usage in O(dn), preprocessing time in O(dn log n), and query time in O(d n^{rho} log n), where rho is proportional to 1 - 1/loglog n, for fixed epsilon in (0, 1). The dimension reduction is larger if one assumes that point sets possess some structure, namely bounded expansion rate. We implement our method and present experimental results in up to 500 dimensions and 10^6 points, which show that the practical performance is better than predicted by the theoretical analysis. In addition, we compare our approach with E2LSH.

Subject Classification

Keywords
  • Approximate nearest neighbor
  • Randomized embeddings
  • Curse of dimensionality
  • Johnson-Lindenstrauss Lemma
  • Bounded expansion rate
  • Experimental study

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. I. Abraham, Y. Bartal, and O. Neiman. Local embeddings of metric spaces. In Proc. 39th ACM Symposium on Theory of Computing, pages 631-640. ACM Press, 2007. Google Scholar
  2. D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003. Google Scholar
  3. A. Andoni and P. Indyk. E²LSH 0.1 User Manual, Implementation of LSH: E2LSH, http://www.mit.edu/~andoni/LSH, 2005.
  4. A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117-122, 2008. Google Scholar
  5. A. Andoni and I. Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In arXiv:1501.01062, to appear in the Proc. 47th ACM Symp. Theory of Computing, STOC'15, 2015. Google Scholar
  6. S. Arya, G. D. da Fonseca, and D. M. Mount. Approximate polytope membership queries. In Proc. 43rd Annual ACM Symp. Theory of Computing, STOC'11, pages 579-586, 2011. Google Scholar
  7. S. Arya, T. Malamatos, and D. M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. J. ACM, 57(1):1:1-1:54, 2009. Google Scholar
  8. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6):891-923, 1998. Google Scholar
  9. A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proc. 23rd Intern. Conf. Machine Learning, ICML'06, pages 97-104, 2006. Google Scholar
  10. S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In Proc. 40th Annual ACM Symp. Theory of Computing, STOC'08, pages 537-546, 2008. Google Scholar
  11. S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22(1):60-65, 2003. Google Scholar
  12. M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. 20th Annual Symp. Computational Geometry, SCG'04, pages 253-262, 2004. Google Scholar
  13. A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion embeddings. In Proc. 44th Annual IEEE Symp. Foundations of Computer Science, FOCS'03, pages 534-, 2003. Google Scholar
  14. S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and their applications. In Proc. 21st Annual Symp. Computational Geometry, SCG'05, pages 150-158, 2005. Google Scholar
  15. P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annual ACM Symp. Theory of Computing, STOC'98, pages 604-613, 1998. Google Scholar
  16. P. Indyk and A. Naor. Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms, 3(3), 2007. Google Scholar
  17. H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):117-128, 2011. Google Scholar
  18. W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984. Google Scholar
  19. D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proc. 34th Annual ACM Symp. Theory of Computing, STOC'02, pages 741-750, 2002. Google Scholar
  20. R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. In Proc. 15th Annual ACM-SIAM Symp. Discrete Algorithms, SODA'04, pages 798-807, 2004. Google Scholar
  21. S. Meiser. Point location in arrangements of hyperplanes. Inf. Comput., 106(2):286-303, 1993. Google Scholar
  22. D. M. Mount. ANN programming manual: http://www.cs.umd.edu/~mount/ANN/, 2010.
  23. R. O'Donnell, Yi Wu, and Y. Zhou. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Trans. Comput. Theory, 6(1):5:1-5:13, 2014. Google Scholar
  24. R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proc. 17th Annual ACM-SIAM Symp. Discrete Algorithms, SODA'06, pages 1186-1195, 2006. Google Scholar
  25. I. Psarros. Low quality embeddings and approximate nearest neighbors, MSc Thesis, Dept. of Informatics & Telecommunications, University of Athens, 2014. Google Scholar
  26. S. Vempala. Randomly-oriented k-d trees adapt to intrinsic dimension. In Proc. Foundations of Software Technology & Theor. Computer Science, pages 48-57, 2012. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail