Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification

Flores-Velazco, Alejandro; Mount, David M.

doi:10.4230/LIPIcs.ESA.2021.44

File

Subject Classification

ACM Subject Classification

Theory of computation → Computational geometry

Keywords

approximate nearest-neighbor searching
nearest-neighbor classification
geometric data structures
space-time tradeoffs

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

The problem of nearest-neighbor classification is a fundamental technique in machine-learning. Given a training set P of n labeled points in ℝ^d, and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest-neighbors in P. Answering these queries efficiently has been the focus of extensive research, proposing techniques that are mainly tailored towards resolving the more general problem of ε-approximate nearest-neighbor search. While the latest can only hope to provide query time and space complexities dependent on n, the problem of nearest-neighbor classification accepts other parameters more suitable to its analysis. Such is the number k_ε of ε-border points, which describes the complexity of boundaries between sets of points of different classes. This paper presents a new data structure called Chromatic AVD. This is the first approach for ε-approximate nearest-neighbor classification whose space and query time complexities are only dependent on ε, k_ε and d, while being independent on both n and Δ, the spread of P.

Cite As Get BibTex

Alejandro Flores-Velazco and David M. Mount. Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 44:1-44:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.ESA.2021.44

Author Details

Alejandro Flores-Velazco

Department of Computer Science, University of Maryland, College Park, MD, USA

David M. Mount

Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA

References

Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. Approximate nearest neighbor search in high dimensions. arXiv preprint arXiv:1806.09823, 2018.
Fabrizio Angiulli. Fast nearest neighbor condensation for large data sets classification. IEEE Transactions on Knowledge and Data Engineering, 19(11):1450-1464, 2007.
Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Optimal approximate polytope membership. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 270-288. SIAM, 2017.
Sunil Arya, Guilherme D. Da Fonseca, and David M. Mount. Approximate polytope membership queries. SIAM Journal on Computing, 47(1):1-51, 2018.
Sunil Arya, Theocharis Malamatos, and David M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. Journal of the ACM (JACM), 57(1):1, 2009.
Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6):891-923, 1998. URL: https://doi.org/10.1145/293347.293348.
Ricardo Barandela, Francesc J Ferri, and J Salvador Sánchez. Decision boundary preserving prototype selection for nearest neighbor classification. International Journal of Pattern Recognition and Artificial Intelligence, 19(06):787-806, 2005.
Ahmad Biniaz, Sergio Cabello, Paz Carmi, Jean-Lou De Carufel, Anil Maheshwari, Saeed Mehrabi, and Michiel Smid. On the minimum consistent subset problem. In Workshop on Algorithms and Data Structures, pages 155-167. Springer, 2019.
Oren Boiman, Eli Shechtman, and Michal Irani. In defense of nearest-neighbor based image classification. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008.
Paul B. Callahan and S. Rao Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. Journal of the ACM (JACM), 42(1):67-90, 1995.
Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273-297, 1995.
T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Trans. Inf. Theor., 13(1):21-27, January 1967. URL: https://doi.org/10.1109/TIT.1967.1053964.
Luc Devroye. On the inequality of cover and hart in nearest neighbor discrimination. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (1):75-78, 1981.
Raphael A. Finkel and Jon Louis Bentley. Quad trees a data structure for retrieval on composite keys. Acta informatica, 4(1):1-9, 1974.
E. Fix and J. L. Hodges. Discriminatory analysis, nonparametric discrimination: Consistency properties. US Air Force School of Aviation Medicine, Technical Report 4(3):477+, 1951.
Alejandro Flores-Velazco. Social distancing is good for points too! In Proceedings of the 32st Canadian Conference on Computational Geometry, CCCG 2020, August 5-7, 2020, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, 2020.
Alejandro Flores-Velazco and David M. Mount. Guarantees on nearest-neighbor condensation heuristics. In Proceedings of the 31st Canadian Conference on Computational Geometry, CCCG 2019, August 8-10, 2019, University of Alberta, Edmonton, Alberta, Canada, 2019.
Alejandro Flores-Velazco and David M. Mount. Coresets for the nearest-neighbor rule, 2020. URL: http://arxiv.org/abs/2002.06650.
Alejandro Flores-Velazco and David M. Mount. Coresets for the Nearest-Neighbor Rule. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 47:1-47:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.47.
Alejandro Flores-Velazco and David M. Mount. Guarantees on nearest-neighbor condensation heuristics. Computational Geometry, 95:101732, 2021. URL: https://doi.org/10.1016/j.comgeo.2020.101732.
Lee-Ad Gottlieb, Aryeh Kontorovich, and Pinhas Nisnevitch. Near-optimal sample compression for nearest neighbors. In Advances in Neural Information Processing Systems, pages 370-378, 2014.
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, 2020. URL: https://arxiv.org/abs/1908.10396.
Sariel Har-Peled. A replacement for Voronoi diagrams of near linear size. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 94-103, 2001.
P. Hart. The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theor., 14(3):515-516, 1968. URL: https://doi.org/10.1109/TIT.1968.1054155.
Norbert Jankowski and Marek Grochowski. Comparison of instances selection algorithms I. Algorithms survey. In Artificial Intelligence and Soft Computing-ICAISC 2004, pages 598-603. Springer, 2004.
Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734, 2017.
Marc Khoury and Dylan Hadfield-Menell. Adversarial training with Voronoi constraints. CoRR, abs/1905.01019, 2019. URL: http://arxiv.org/abs/1905.01019.
Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.
Neehar Peri, Neal Gupta, W. Ronny Huang, Liam Fowl, Chen Zhu, Soheil Feizi, Tom Goldstein, and John P. Dickerson. Deep k-nn defense against clean-label data poisoning attacks. In European Conference on Computer Vision, pages 55-70. Springer, 2020.
G. L. Ritter, H. B. Woodruff, S. R. Lowry, and T. L. Isenhour. An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21(6):665-669, 1975.
Hanan Samet. The design and analysis of spatial data structures, volume 85. Addison-Wesley Reading, MA, 1990.
Jürgen Schmidhuber. Deep learning in neural networks: An overview. CoRR, abs/1404.7828, 2014. URL: http://arxiv.org/abs/1404.7828.
Chawin Sitawarin and David Wagner. On the robustness of deep k-nearest neighbors. In 2019 IEEE Security and Privacy Workshops (SPW), pages 1-7. IEEE, 2019.
Charles J. Stone. Consistent nonparametric regression. The annals of statistics, pages 595-620, 1977.
Godfried Toussaint. Open problems in geometric methods for instance-based learning. In Jin Akiyama and Mikio Kano, editors, JCDCG, volume 2866 of Lecture Notes in Computer Science, pages 273-283. Springer, 2002. URL: https://doi.org/10.1007/978-3-540-44400-8_29.

Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification

Authors Alejandro Flores-Velazco , David M. Mount

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification

Authors Alejandro Flores-Velazco , David M. Mount

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message