Improved Search of Relevant Points for Nearest-Neighbor Classification

Flores-Velazco, Alejandro

doi:10.4230/LIPIcs.ESA.2022.54

Cite As

Alejandro Flores-Velazco. Improved Search of Relevant Points for Nearest-Neighbor Classification. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 54:1-54:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.54

Abstract

Given a training set P ⊂ ℝ^d, the nearest-neighbor classifier assigns any query point q ∈ ℝ^d to the class of its closest point in P. To answer these classification queries, some training points are more relevant than others. We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in ℝ^d. These relevant points are commonly known as border points, as they define the boundaries of the Voronoi diagram of P that separate points of different classes. Being able to compute this set of points efficiently is crucial to reduce the size of the training set without affecting the accuracy of the nearest-neighbor classifier. Improving over a decades-old result by Clarkson (FOCS'94), Eppstein (SOSA'22) recently proposed an output-sensitive algorithm to find the set of border points of P in 𝒪(n² + nk²) time, where n is the number of training points and k is the number of border points. In this paper, we improve this algorithm to have time complexity 𝒪(nk²) by proving that the first phase of Eppstein's algorithm, which requires 𝒪(n²) time, is unnecessary.
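The definitions in the abstract can be illustrated with a small sketch. This is not the algorithm of the paper, nor Eppstein's or Clarkson's: it only shows the nearest-neighbor rule and, for the special case of distinct points on the real line, a direct way to read off the border points (a point is a border point when a neighbor in sorted order has a different class). The function names `nn_classify` and `border_points_1d` and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nn_classify(points: Sequence[Point], labels: Sequence[str], q: Point) -> str:
    """Return the label of the training point closest to q (Euclidean distance)."""
    best = min(
        range(len(points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], q)),
    )
    return labels[best]


def border_points_1d(xs: Sequence[float], labels: Sequence[str]) -> List[int]:
    """Indices of the border points of distinct 1-D training points.

    Illustrative special case only: in one dimension the Voronoi cells are
    intervals, so a point is a border point exactly when one of its
    neighbors in sorted order has a different label.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    border = []
    for pos, i in enumerate(order):
        # Previous and next point in sorted order (either may be missing).
        neighbors = order[max(pos - 1, 0):pos] + order[pos + 1:pos + 2]
        if any(labels[j] != labels[i] for j in neighbors):
            border.append(i)
    return border


# Hypothetical toy training set on the real line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0]
labels = ["a", "a", "a", "b", "b", "a"]

kept = border_points_1d(xs, labels)

# Classify a grid of queries with the full set and with the border points only.
full_pts = [(x,) for x in xs]
kept_pts = [(xs[i],) for i in kept]
kept_lbl = [labels[i] for i in kept]
queries = [(-2.0 + 0.13 * k,) for k in range(80)]
same = all(
    nn_classify(full_pts, labels, q) == nn_classify(kept_pts, kept_lbl, q)
    for q in queries
)
print(kept, same)
```

In this toy instance the two points at 0.0 and 1.0 are not border points, and dropping them leaves the classification of every sampled query unchanged, which is the property that makes border points a safe reduction of the training set.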

Subject Classification

ACM Subject Classification

Theory of computation → Computational geometry

Keywords

nearest-neighbor classification
nearest-neighbor rule
decision boundaries
border points
relevant points

References

Fabrizio Angiulli. Fast nearest neighbor condensation for large data sets classification. IEEE Transactions on Knowledge and Data Engineering, 19(11):1450-1464, 2007.
Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Optimal approximate polytope membership. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 270-288. SIAM, 2017.
Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Approximate polytope membership queries. SIAM Journal on Computing, 47(1):1-51, 2018.
Sunil Arya, Theocharis Malamatos, and David M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. Journal of the ACM (JACM), 57(1):1, 2009.
Oren Boiman, Eli Shechtman, and Michal Irani. In defense of nearest-neighbor based image classification. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008.
David Bremner, Erik Demaine, Jeff Erickson, John Iacono, Stefan Langerman, Pat Morin, and Godfried Toussaint. Output-sensitive algorithms for computing nearest-neighbour decision boundaries. In Frank Dehne, Jörg-Rüdiger Sack, and Michiel Smid, editors, Algorithms and Data Structures: 8th International Workshop, WADS 2003, Ottawa, Ontario, Canada, July 30 - August 1, 2003. Proceedings, pages 451-461, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg. URL: https://doi.org/10.1007/978-3-540-45078-8_39.
Timothy M Chan. Output-sensitive results on convex hulls, extreme points, and related problems. Discrete & Computational Geometry, 16(4):369-387, 1996.
Kenneth L Clarkson. More output-sensitive geometric algorithms. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 695-702. IEEE, 1994.
Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273-297, 1995.
T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Trans. Inf. Theor., 13(1):21-27, January 1967. URL: https://doi.org/10.1109/TIT.1967.1053964.
Luc Devroye. On the inequality of Cover and Hart in nearest neighbor discrimination. Pattern Analysis and Machine Intelligence, IEEE Transactions on, pages 75-78, 1981.
David Eppstein. Finding relevant points for nearest-neighbor classification. In Symposium on Simplicity in Algorithms (SOSA), pages 68-78. SIAM, 2022.
E. Fix and J. L. Hodges. Discriminatory analysis, nonparametric discrimination: Consistency properties. US Air Force School of Aviation Medicine, Technical Report 4(3):477+, January 1951.
Alejandro Flores-Velazco. Social distancing is good for points too! In Proceedings of the 32nd Canadian Conference on Computational Geometry, CCCG 2020, August 5-7, 2020, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, 2020.
Alejandro Flores-Velazco and David M. Mount. Guarantees on nearest-neighbor condensation heuristics. In Proceedings of the 31st Canadian Conference on Computational Geometry, CCCG 2019, August 8-10, 2019, University of Alberta, Edmonton, Alberta, Canada, 2019.
Alejandro Flores-Velazco and David M. Mount. Coresets for the Nearest-Neighbor Rule. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 47:1-47:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.47.
Alejandro Flores-Velazco and David M. Mount. Boundary-sensitive approach for approximate nearest-neighbor classification. In Petra Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, 29th Annual European Symposium on Algorithms, ESA 2021, September 6-8, 2021, Lisbon, Portugal (Virtual Conference), volume 204 of LIPIcs, pages 44:1-44:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ESA.2021.44.
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, 2020. URL: http://arxiv.org/abs/1908.10396.
Sariel Har-Peled. A replacement for Voronoi diagrams of near linear size. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 94-103, 2001.
Norbert Jankowski and Marek Grochowski. Comparison of instances selection algorithms I. Algorithms survey. In Artificial Intelligence and Soft Computing-ICAISC 2004, pages 598-603. Springer, 2004.
Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint, 2017. URL: http://arxiv.org/abs/1702.08734.
Kamyar Khodamoradi, Ramesh Krishnamurti, and Bodhayan Roy. Consistent subset problem with two labels. In Conference on Algorithms and Discrete Applied Mathematics, pages 131-142. Springer, 2018.
Marc Khoury and Dylan Hadfield-Menell. Adversarial training with Voronoi constraints. CoRR, abs/1905.01019, 2019. URL: http://arxiv.org/abs/1905.01019.
Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint, 2018. URL: http://arxiv.org/abs/1803.04765.
Neehar Peri, Neal Gupta, W. Ronny Huang, Liam Fowl, Chen Zhu, Soheil Feizi, Tom Goldstein, and John P. Dickerson. Deep k-nn defense against clean-label data poisoning attacks. In European Conference on Computer Vision, pages 55-70. Springer, 2020.
Jürgen Schmidhuber. Deep learning in neural networks: An overview. CoRR, abs/1404.7828, 2014. URL: http://arxiv.org/abs/1404.7828.
Chawin Sitawarin and David Wagner. On the robustness of deep k-nearest neighbors. In 2019 IEEE Security and Privacy Workshops (SPW), pages 1-7. IEEE, 2019.
Charles J. Stone. Consistent nonparametric regression. The annals of statistics, pages 595-620, 1977.
Gordon Wilfong. Nearest neighbor problems. In Proceedings of the Seventh Annual Symposium on Computational Geometry, SCG '91, pages 224-233, New York, NY, USA, 1991. ACM. URL: https://doi.org/10.1145/109648.109673.
D. Randall Wilson and Tony R. Martinez. Instance pruning techniques. In Proceedings of the Fourteenth International Conference on Machine Learning, ICML '97, pages 403-411, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc. URL: http://dl.acm.org/citation.cfm?id=645526.657143.
A. V. Zukhba. NP-completeness of the problem of prototype selection in the nearest neighbor method. Pattern Recog. Image Anal., 20(4):484-494, December 2010. URL: https://doi.org/10.1134/S1054661810040097.
