Alejandro Flores-Velazco, David M. Mount
Creative Commons Attribution 4.0 International license
Nearest-neighbor classification is a fundamental technique in machine learning. Given a training set P of n labeled points in ℝ^d and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest neighbors in P. Answering these queries efficiently has been the focus of extensive research, most of which proposes techniques tailored to the more general problem of ε-approximate nearest-neighbor search. While the latter can only hope to provide query time and space complexities that depend on n, nearest-neighbor classification admits other parameters better suited to its analysis. One such parameter is the number k_ε of ε-border points, which describes the complexity of the boundaries between sets of points of different classes. This paper presents a new data structure called Chromatic AVD. It is the first approach for ε-approximate nearest-neighbor classification whose space and query time complexities depend only on ε, k_ε, and d, while being independent of both n and Δ, the spread of P.
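The query condition described above can be illustrated with a brute-force sketch. This is not the Chromatic AVD data structure from the paper. It is a linear scan, under the standard assumption that a point p is an ε-approximate nearest neighbor of a query q when its distance to q is at most (1 + ε) times the distance from q to its true nearest neighbor. The function name `valid_classes` and the sample data are hypothetical and chosen only for illustration.

```python
import math


def valid_classes(points, labels, query, eps):
    """Return every label an eps-approximate classifier is allowed to report.

    Brute-force illustration only: it scans all n points, so its cost
    depends on n, unlike the data structure described in the abstract.
    """
    dists = [math.dist(p, query) for p in points]
    nearest = min(dists)
    # Assumed definition: p is an eps-approximate nearest neighbor of the
    # query if dist(p, query) <= (1 + eps) * dist(nearest point, query).
    return {lab for d, lab in zip(dists, labels) if d <= (1 + eps) * nearest}


# Hypothetical training set: three labeled points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
labels = ["red", "blue", "blue"]

near_boundary = valid_classes(points, labels, (0.45, 0.0), 0.5)
far_from_boundary = valid_classes(points, labels, (0.1, 0.0), 0.5)
```

For the query at 0.45, which lies close to the boundary between the two classes, both "red" and "blue" are acceptable answers. For the query at 0.1, only "red" is. This suggests why the complexity of the class boundaries, rather than n, is a natural parameter for the classification problem.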
@InProceedings{floresvelazco_et_al:LIPIcs.ESA.2021.44,
author = {Flores-Velazco, Alejandro and Mount, David M.},
title = {{Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification}},
booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)},
pages = {44:1--44:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-204-4},
ISSN = {1868-8969},
year = {2021},
volume = {204},
editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.44},
URN = {urn:nbn:de:0030-drops-146252},
doi = {10.4230/LIPIcs.ESA.2021.44},
annote = {Keywords: approximate nearest-neighbor searching, nearest-neighbor classification, geometric data structures, space-time tradeoffs}
}