Improved Search of Relevant Points for Nearest-Neighbor Classification

Author Alejandro Flores-Velazco




File

LIPIcs.ESA.2022.54.pdf
  • Filesize: 0.84 MB
  • 10 pages


Author Details

Alejandro Flores-Velazco
  • Department of Computer Science, University of Maryland, College Park, MD, USA

Acknowledgements

Thanks to Prof. David Mount for pointing out Eppstein’s paper [Eppstein, 2022] and for the valuable discussions on the results presented in this paper.

Cite As

Alejandro Flores-Velazco. Improved Search of Relevant Points for Nearest-Neighbor Classification. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 54:1-54:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.54

Abstract

Given a training set P ⊂ ℝ^d, the nearest-neighbor classifier assigns any query point q ∈ ℝ^d to the class of its closest point in P. To answer these classification queries, some training points are more relevant than others. We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in ℝ^d. These relevant points are commonly known as border points, as they define the boundaries of the Voronoi diagram of P that separate points of different classes. Being able to compute this set of points efficiently is crucial to reduce the size of the training set without affecting the accuracy of the nearest-neighbor classifier. Improving over a decades-old result by Clarkson (FOCS'94), Eppstein (SOSA'22) recently proposed an output-sensitive algorithm to find the set of border points of P in 𝒪(n² + nk²) time, where n is the size of P and k is the size of this set. In this paper, we improve this algorithm to run in 𝒪(nk²) time by proving that its first phase, which requires 𝒪(n²) time, is unnecessary.
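To make the definitions above concrete, here is a minimal Python sketch. It is not the algorithm of this paper, nor Eppstein's or Clarkson's: it only illustrates the nearest-neighbor rule in any dimension and finds border points by brute force in the special case d = 1, where Voronoi cells are intervals. The function names `nn_classify` and `border_points_1d` are hypothetical, and the points are assumed to be distinct.

```python
import math

def nn_classify(points, labels, q):
    """Label of the training point closest to q (Euclidean distance)."""
    nearest = min(range(len(points)), key=lambda i: math.dist(points[i], q))
    return labels[nearest]

def border_points_1d(xs, labels):
    """Indices of border points for distinct points on the real line (d = 1).

    In one dimension the Voronoi cells are intervals, so two cells share a
    boundary exactly when their points are consecutive in sorted order.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    border = set()
    for a, b in zip(order, order[1:]):
        if labels[a] != labels[b]:  # this boundary separates two classes
            border.update((a, b))
    return sorted(border)

xs = [0.0, 1.0, 2.0, 5.0, 6.0]
labels = ["red", "red", "red", "blue", "blue"]
border = border_points_1d(xs, labels)  # [2, 3]

# Classifying against the border points alone gives the same answers.
full = [(x,) for x in xs]
reduced = [(xs[i],) for i in border]
reduced_labels = [labels[i] for i in border]
for q in [(-3.0,), (3.4,), (3.6,), (9.0,)]:
    assert nn_classify(full, labels, q) == nn_classify(reduced, reduced_labels, q)
```

In this toy example only two of the five training points are border points (k = 2, n = 5), and discarding the other three leaves every tested query classified as before. In higher dimensions, deciding which Voronoi cells are adjacent is the hard part, which is what the output-sensitive algorithms discussed in the abstract address.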

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • nearest-neighbor classification
  • nearest-neighbor rule
  • decision boundaries
  • border points
  • relevant points


References

  1. Fabrizio Angiulli. Fast nearest neighbor condensation for large data sets classification. IEEE Transactions on Knowledge and Data Engineering, 19(11):1450-1464, 2007.
  2. Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Optimal approximate polytope membership. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 270-288. SIAM, 2017.
  3. Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Approximate polytope membership queries. SIAM Journal on Computing, 47(1):1-51, 2018.
  4. Sunil Arya, Theocharis Malamatos, and David M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. Journal of the ACM (JACM), 57(1):1, 2009.
  5. Oren Boiman, Eli Shechtman, and Michal Irani. In defense of nearest-neighbor based image classification. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008.
  6. David Bremner, Erik Demaine, Jeff Erickson, John Iacono, Stefan Langerman, Pat Morin, and Godfried Toussaint. Output-sensitive algorithms for computing nearest-neighbour decision boundaries. In Frank Dehne, Jörg-Rüdiger Sack, and Michiel Smid, editors, Algorithms and Data Structures: 8th International Workshop, WADS 2003, Ottawa, Ontario, Canada, July 30 - August 1, 2003. Proceedings, pages 451-461, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg. URL: https://doi.org/10.1007/978-3-540-45078-8_39.
  7. Timothy M. Chan. Output-sensitive results on convex hulls, extreme points, and related problems. Discrete & Computational Geometry, 16(4):369-387, 1996.
  8. Kenneth L. Clarkson. More output-sensitive geometric algorithms. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 695-702. IEEE, 1994.
  9. Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273-297, 1995.
  10. T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Trans. Inf. Theor., 13(1):21-27, January 1967. URL: https://doi.org/10.1109/TIT.1967.1053964.
  11. Luc Devroye. On the inequality of Cover and Hart in nearest neighbor discrimination. Pattern Analysis and Machine Intelligence, IEEE Transactions on, pages 75-78, 1981.
  12. David Eppstein. Finding relevant points for nearest-neighbor classification. In Symposium on Simplicity in Algorithms (SOSA), pages 68-78. SIAM, 2022.
  13. E. Fix and J. L. Hodges. Discriminatory analysis, nonparametric discrimination: Consistency properties. US Air Force School of Aviation Medicine, Technical Report 4(3):477+, January 1951.
  14. Alejandro Flores-Velazco. Social distancing is good for points too! In Proceedings of the 32nd Canadian Conference on Computational Geometry, CCCG 2020, August 5-7, 2020, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, 2020.
  15. Alejandro Flores-Velazco and David M. Mount. Guarantees on nearest-neighbor condensation heuristics. In Proceedings of the 31st Canadian Conference on Computational Geometry, CCCG 2019, August 8-10, 2019, University of Alberta, Edmonton, Alberta, Canada, 2019.
  16. Alejandro Flores-Velazco and David M. Mount. Coresets for the Nearest-Neighbor Rule. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 47:1-47:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.47.
  17. Alejandro Flores-Velazco and David M. Mount. Boundary-sensitive approach for approximate nearest-neighbor classification. In Petra Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, 29th Annual European Symposium on Algorithms, ESA 2021, September 6-8, 2021, Lisbon, Portugal (Virtual Conference), volume 204 of LIPIcs, pages 44:1-44:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ESA.2021.44.
  18. Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, 2020. URL: http://arxiv.org/abs/1908.10396.
  19. Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization, 2020. URL: http://arxiv.org/abs/1908.10396.
  20. Sariel Har-Peled. A replacement for Voronoi diagrams of near linear size. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 94-103, 2001.
  21. Norbert Jankowski and Marek Grochowski. Comparison of instances selection algorithms I. Algorithms survey. In Artificial Intelligence and Soft Computing-ICAISC 2004, pages 598-603. Springer, 2004.
  22. Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint, 2017. URL: http://arxiv.org/abs/1702.08734.
  23. Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs, 2017. URL: http://arxiv.org/abs/1702.08734.
  24. Kamyar Khodamoradi, Ramesh Krishnamurti, and Bodhayan Roy. Consistent subset problem with two labels. In Conference on Algorithms and Discrete Applied Mathematics, pages 131-142. Springer, 2018.
  25. Marc Khoury and Dylan Hadfield-Menell. Adversarial training with Voronoi constraints. CoRR, abs/1905.01019, 2019. URL: http://arxiv.org/abs/1905.01019.
  26. Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint, 2018. URL: http://arxiv.org/abs/1803.04765.
  27. Neehar Peri, Neal Gupta, W. Ronny Huang, Liam Fowl, Chen Zhu, Soheil Feizi, Tom Goldstein, and John P. Dickerson. Deep k-nn defense against clean-label data poisoning attacks. In European Conference on Computer Vision, pages 55-70. Springer, 2020.
  28. Jürgen Schmidhuber. Deep learning in neural networks: An overview. CoRR, abs/1404.7828, 2014. URL: http://arxiv.org/abs/1404.7828.
  29. Chawin Sitawarin and David Wagner. On the robustness of deep k-nearest neighbors. In 2019 IEEE Security and Privacy Workshops (SPW), pages 1-7. IEEE, 2019.
  30. Charles J. Stone. Consistent nonparametric regression. The Annals of Statistics, pages 595-620, 1977.
  31. Gordon Wilfong. Nearest neighbor problems. In Proceedings of the Seventh Annual Symposium on Computational Geometry, SCG '91, pages 224-233, New York, NY, USA, 1991. ACM. URL: https://doi.org/10.1145/109648.109673.
  32. D. Randall Wilson and Tony R. Martinez. Instance pruning techniques. In Proceedings of the Fourteenth International Conference on Machine Learning, ICML '97, pages 403-411, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc. URL: http://dl.acm.org/citation.cfm?id=645526.657143.
  33. A. V. Zukhba. NP-completeness of the problem of prototype selection in the nearest neighbor method. Pattern Recog. Image Anal., 20(4):484-494, December 2010. URL: https://doi.org/10.1134/S1054661810040097.