Algorithm Engineering for High-Dimensional Similarity Search Problems (Invited Talk)

Author Martin Aumüller



PDF
Thumbnail PDF

File

LIPIcs.SEA.2020.1.pdf
  • Filesize: 310 kB
  • 3 pages

Document Identifiers

Author Details

Martin Aumüller
  • IT University of Copenhagen, Denmark

Cite AsGet BibTex

Martin Aumüller. Algorithm Engineering for High-Dimensional Similarity Search Problems (Invited Talk). In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 1:1-1:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.SEA.2020.1

Abstract

Similarity search problems in high-dimensional data arise in many areas of computer science such as data bases, image analysis, machine learning, and natural language processing. One of the most prominent problems is finding the k nearest neighbors of a data point q ∈ ℝ^d in a large set of data points S ⊂ ℝ^d, under same distance measure such as Euclidean distance. In contrast to lower dimensional settings, we do not know of worst-case efficient data structures for such search problems in high-dimensional data, i.e., data structures that are faster than a linear scan through the data set. However, there is a rich body of (often heuristic) approaches that solve nearest neighbor search problems much faster than such a scan on many real-world data sets. As a necessity, the term solve means that these approaches give approximate results that are close to the true k-nearest neighbors. In this talk, we survey recent approaches to nearest neighbor search and related problems. The talk consists of three parts: (1) What makes nearest neighbor search difficult? (2) How do current state-of-the-art algorithms work? (3) What are recent advances regarding similarity search on GPUs, in distributed settings, or in external memory?

Subject Classification

ACM Subject Classification
  • Theory of computation → Nearest neighbor algorithms
  • Computing methodologies → Simulation evaluation
  • Theory of computation → Computational geometry
Keywords
  • Nearest neighbor search
  • Benchmarking

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Thomas D. Ahle, Martin Aumüller, and Rasmus Pagh. Parameter-free locality sensitive hashing for spherical range reporting. In SODA, pages 239-256. SIAM, 2017. Google Scholar
  2. Martin Aumüller, Erik Bernhardsson, and Alexander John Faithfull. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst., 87, 2020. See https://arxiv.org/abs/1807.05614 for an open access version. Google Scholar
  3. Martin Aumüller and Matteo Ceccarello. The role of local intrinsic dimensionality in benchmarking nearest neighbor search. In SISAP, volume 11807 of Lecture Notes in Computer Science, pages 113-127. Springer, 2019. Google Scholar
  4. Martin Aumüller, Tobias Christiani, Rasmus Pagh, and Michael Vesterli. PUFFINN: parameterless and universally fast finding of nearest neighbors. In ESA, volume 144 of LIPIcs, pages 10:1-10:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://github.com/puffinn/puffinn.
  5. Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is "nearest neighbor" meaningful? In ICDT, volume 1540 of Lecture Notes in Computer Science, pages 217-235. Springer, 1999. Google Scholar
  6. Tobias Christiani. Fast locality-sensitive hashing frameworks for approximate near neighbor search. In SISAP, volume 11807 of Lecture Notes in Computer Science, pages 3-17. Springer, 2019. Google Scholar
  7. Yihe Dong, Piotr Indyk, Ilya Razenshteyn, and Tal Wagner. Learning space partitions for nearest neighbor search. In International Conference on Learning Representations, 2020. Google Scholar
  8. Fabian Fier, Nikolaus Augsten, Panagiotis Bouros, Ulf Leser, and Johann-Christoph Freytag. Set similarity joins on MapReduce: An experimental survey. PVLDB, 11(10):1110-1122, 2018. Google Scholar
  9. Fabian Groh, Lukas Ruppert, Patrick Wieschollek, and Hendrik Lensch. Ggnn: Graph-based gpu nearest neighbor search. arXiv preprint arXiv:1912.01059, 2019. Google Scholar
  10. Junfeng He, Sanjiv Kumar, and Shih-Fu Chang. On the difficulty of nearest neighbor search. In ICML. icml.cc / Omnipress, 2012. Google Scholar
  11. Xiao Hu, Ke Yi, and Yufei Tao. Output-optimal massively parallel algorithms for similarity joins. ACM Transactions on Database Systems, 44(2):1-36, April 2019. URL: https://doi.org/10.1145/3311967.
  12. Piotr Indyk and Rajeev Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 604-613, New York, NY, USA, 1998. ACM. URL: https://doi.org/10.1145/276698.276876.
  13. M. Iwasaki and D. Miyazaki. Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data. ArXiv e-prints, October 2018. URL: http://arxiv.org/abs/1810.07355.
  14. Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, pages 489-504, 2018. Google Scholar
  15. Thijs Laarhoven. Graph-based time-space trade-offs for approximate near neighbors. In Symposium on Computational Geometry, volume 99 of LIPIcs, pages 57:1-57:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. Google Scholar
  16. Yury A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824-836, 2020. Google Scholar
  17. Willi Mann, Nikolaus Augsten, and Panagiotis Bouros. An empirical evaluation of set similarity join techniques. Proceedings of the VLDB Endowment, 9(9):636-647, May 2016. URL: https://doi.org/10.14778/2947618.2947620.
  18. Samuel McCauley and Francesco Silvestri. Adaptive MapReduce similarity joins. In BeyondMR@SIGMOD, pages 4:1-4:4. ACM, 2018. Google Scholar
  19. Milos Radovanovic. Hubs in nearest-neighbor graphs: Origins, applications and challenges. In WIMS, pages 5:1-5:4. ACM, 2018. Google Scholar
  20. Suhas Jayaram Subramanya, Devvrit, Rohan Kadekodi, Ravishankar Krishnaswamy, and Harsha Simhadri. DiskANN: Fast accurate billion-point nearest neighbor search on a single node. In NeurIPS 2019, November 2019. URL: https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node/.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail