Chromatic k-Nearest Neighbor Queries

Authors: Thijs van der Horst, Maarten Löffler, Frank Staals




File

LIPIcs.ESA.2022.67.pdf
  • Filesize: 1.09 MB
  • 14 pages


Author Details

Thijs van der Horst
  • Department of Information and Computing Sciences, Utrecht University, The Netherlands
Maarten Löffler
  • Department of Information and Computing Sciences, Utrecht University, The Netherlands
Frank Staals
  • Department of Information and Computing Sciences, Utrecht University, The Netherlands

Acknowledgements

We would like to thank an anonymous reviewer for the randomized solution presented in Section 3.2.1, which led to our current solution for finding 𝒟^k_2(q) in Section 3.2.2.

Cite As

Thijs van der Horst, Maarten Löffler, and Frank Staals. Chromatic k-Nearest Neighbor Queries. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 67:1-67:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.67

Abstract

Let P be a set of n colored points. We develop efficient data structures that store P and can answer chromatic k-nearest neighbor (k-NN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is the key to obtaining fast k-NN classifiers. Our main aim is to obtain query times that are independent of k while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allows us to compute a region containing exactly the k nearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear-space data structures with query times of O(n^{1/2} log n) for points in ℝ¹, and with query times varying between O(n^{2/3} log^{2/3} n) and O(n^{5/6} polylog n), depending on the distance measure used, for points in ℝ². These results extend to higher dimensions as well. Since these query times are still fairly large, we also consider approximations. If we are allowed to report a color that appears at least (1-ε)f^* times, where f^* is the frequency of the most frequent color, we obtain a query time of O(log n + log log_{1/(1-ε)} n) in ℝ¹ and expected query times ranging between Õ(n^{1/2}ε^{-3/2}) and Õ(n^{1/2}ε^{-5/2}) in ℝ² using near-linear space (ignoring polylogarithmic factors).
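To make the query and the two-step approach concrete, here is a minimal Python sketch for points in ℝ¹. It rests on a few assumptions. In one dimension, the k nearest neighbors of q are contiguous in sorted order (assuming distinct distances), so the "region" from the first step is an index interval that binary search finds in O(log n) time. The second step here simply counts colors in O(k) time. The paper instead uses a data structure that reports the most frequent color in the region, which is what makes its query time independent of k; this sketch does not reproduce that structure. All names (`ChromaticKNN1D`, `knn_window_1d`, `mode_color`, `is_approx_answer`) are hypothetical, and ties in the mode are broken by the smallest color label, an arbitrary choice.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical representation: a point in R^1 is (coordinate, color).
Point = Tuple[float, str]


def mode_color(colors: Iterable[str]) -> str:
    """Most frequent color; ties go to the smallest label (an arbitrary choice)."""
    counts = Counter(colors)
    best = max(counts.values())
    return min(c for c, f in counts.items() if f == best)


def chromatic_knn_bruteforce(points: List[Point], q: float, k: int) -> str:
    """Reference definition of the query: O(n log n) per query."""
    nearest = sorted(points, key=lambda p: abs(p[0] - q))[:k]
    return mode_color(c for _, c in nearest)


def knn_window_1d(xs: List[float], q: float, k: int) -> int:
    """Return lo such that xs[lo:lo + k] holds the k points nearest to q.

    Assumes xs is sorted ascending and 1 <= k <= len(xs). Runs in O(log n).
    """
    lo, hi = 0, len(xs) - k
    while lo < hi:
        mid = (lo + hi) // 2
        # If xs[mid + k] (just right of the window) is closer to q than
        # xs[mid], the optimal window starts to the right of mid.
        if q - xs[mid] > xs[mid + k] - q:
            lo = mid + 1
        else:
            hi = mid
    return lo


def is_approx_answer(colors: List[str], c: str, eps: float) -> bool:
    """True if c appears at least (1 - eps) * f* times, f* = max frequency."""
    counts = Counter(colors)
    return counts[c] >= (1 - eps) * max(counts.values())


class ChromaticKNN1D:
    """Hypothetical two-step structure: k-NN interval, then color counting."""

    def __init__(self, points: List[Point]) -> None:
        pts = sorted(points)  # preprocessing: sort by coordinate
        self.xs = [x for x, _ in pts]
        self.colors = [c for _, c in pts]

    def query(self, q: float, k: int) -> str:
        # Step 1: the k nearest neighbors occupy indices [lo, lo + k).
        lo = knn_window_1d(self.xs, q, k)
        # Step 2 (naive stand-in): count colors in O(k) time.
        return mode_color(self.colors[lo:lo + k])
```

For example, with points at 1 (red), 2 (blue), 3 (blue), 5 (red), 6 (red), 8 (blue), and 9 (green), the query q = 4.3, k = 3 selects the points at 3, 5, and 6 and returns "red". The function `is_approx_answer` only checks the (1-ε)f^* guarantee from the approximate setting on a given set of colors; it is not an approximate query algorithm.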

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • data structure
  • nearest neighbor
  • classification

