Let P be a set of n colored points. We develop efficient data structures that store P and can answer chromatic k-nearest neighbor (k-NN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is the key to obtain fast k-NN classifiers. Our main aim is to obtain query times that are independent of k while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the k-nearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of O(n^{1/2} log n) for points in ℝ¹, and with query times varying between O(n^{2/3}log^{2/3} n) and O(n^{5/6} polylog n), depending on the distance measure used, for points in ℝ². These results can be extended to work in higher dimensions as well. Since the query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least (1-ε)f^* times, where f^* is the frequency of the most frequent color, we obtain a query time of O(log n + log log_{1/(1-ε)} n) in ℝ¹ and expected query times ranging between Õ(n^{1/2}ε^{-3/2}) and Õ(n^{1/2}ε^{-5/2}) in ℝ² using near-linear space (ignoring polylogarithmic factors).
@InProceedings{vanderhorst_et_al:LIPIcs.ESA.2022.67, author = {van der Horst, Thijs and L\"{o}ffler, Maarten and Staals, Frank}, title = {{Chromatic k-Nearest Neighbor Queries}}, booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)}, pages = {67:1--67:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-247-1}, ISSN = {1868-8969}, year = {2022}, volume = {244}, editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.67}, URN = {urn:nbn:de:0030-drops-170055}, doi = {10.4230/LIPIcs.ESA.2022.67}, annote = {Keywords: data structure, nearest neighbor, classification} }
Feedback for Dagstuhl Publishing