Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification

Authors Alejandro Flores-Velazco , David M. Mount



PDF
Thumbnail PDF

File

LIPIcs.ESA.2021.44.pdf
  • Filesize: 1.25 MB
  • 15 pages

Document Identifiers

Author Details

Alejandro Flores-Velazco
  • Department of Computer Science, University of Maryland, College Park, MD, USA
David M. Mount
  • Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA

Cite AsGet BibTex

Alejandro Flores-Velazco and David M. Mount. Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 44:1-44:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ESA.2021.44

Abstract

The problem of nearest-neighbor classification is a fundamental technique in machine-learning. Given a training set P of n labeled points in ℝ^d, and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest-neighbors in P. Answering these queries efficiently has been the focus of extensive research, proposing techniques that are mainly tailored towards resolving the more general problem of ε-approximate nearest-neighbor search. While the latest can only hope to provide query time and space complexities dependent on n, the problem of nearest-neighbor classification accepts other parameters more suitable to its analysis. Such is the number k_ε of ε-border points, which describes the complexity of boundaries between sets of points of different classes. This paper presents a new data structure called Chromatic AVD. This is the first approach for ε-approximate nearest-neighbor classification whose space and query time complexities are only dependent on ε, k_ε and d, while being independent on both n and Δ, the spread of P.

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • approximate nearest-neighbor searching
  • nearest-neighbor classification
  • geometric data structures
  • space-time tradeoffs

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. Approximate nearest neighbor search in high dimensions. arXiv preprint arXiv:1806.09823, 2018. Google Scholar
  2. Fabrizio Angiulli. Fast nearest neighbor condensation for large data sets classification. IEEE Transactions on Knowledge and Data Engineering, 19(11):1450-1464, 2007. Google Scholar
  3. Sunil Arya, Guilherme D. da Fonseca, and David M. Mount. Optimal approximate polytope membership. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 270-288. SIAM, 2017. Google Scholar
  4. Sunil Arya, Guilherme D. Da Fonseca, and David M. Mount. Approximate polytope membership queries. SIAM Journal on Computing, 47(1):1-51, 2018. Google Scholar
  5. Sunil Arya, Theocharis Malamatos, and David M. Mount. Space-time tradeoffs for approximate nearest neighbor searching. Journal of the ACM (JACM), 57(1):1, 2009. Google Scholar
  6. Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6):891-923, 1998. URL: https://doi.org/10.1145/293347.293348.
  7. Ricardo Barandela, Francesc J Ferri, and J Salvador Sánchez. Decision boundary preserving prototype selection for nearest neighbor classification. International Journal of Pattern Recognition and Artificial Intelligence, 19(06):787-806, 2005. Google Scholar
  8. Ahmad Biniaz, Sergio Cabello, Paz Carmi, Jean-Lou De Carufel, Anil Maheshwari, Saeed Mehrabi, and Michiel Smid. On the minimum consistent subset problem. In Workshop on Algorithms and Data Structures, pages 155-167. Springer, 2019. Google Scholar
  9. Oren Boiman, Eli Shechtman, and Michal Irani. In defense of nearest-neighbor based image classification. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008. Google Scholar
  10. Paul B. Callahan and S. Rao Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. Journal of the ACM (JACM), 42(1):67-90, 1995. Google Scholar
  11. Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273-297, 1995. Google Scholar
  12. T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Trans. Inf. Theor., 13(1):21-27, January 1967. URL: https://doi.org/10.1109/TIT.1967.1053964.
  13. Luc Devroye. On the inequality of cover and hart in nearest neighbor discrimination. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (1):75-78, 1981. Google Scholar
  14. Raphael A. Finkel and Jon Louis Bentley. Quad trees a data structure for retrieval on composite keys. Acta informatica, 4(1):1-9, 1974. Google Scholar
  15. E. Fix and J. L. Hodges. Discriminatory analysis, nonparametric discrimination: Consistency properties. US Air Force School of Aviation Medicine, Technical Report 4(3):477+, 1951. Google Scholar
  16. Alejandro Flores-Velazco. Social distancing is good for points too! In Proceedings of the 32st Canadian Conference on Computational Geometry, CCCG 2020, August 5-7, 2020, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, 2020. Google Scholar
  17. Alejandro Flores-Velazco and David M. Mount. Guarantees on nearest-neighbor condensation heuristics. In Proceedings of the 31st Canadian Conference on Computational Geometry, CCCG 2019, August 8-10, 2019, University of Alberta, Edmonton, Alberta, Canada, 2019. Google Scholar
  18. Alejandro Flores-Velazco and David M. Mount. Coresets for the nearest-neighbor rule, 2020. URL: http://arxiv.org/abs/2002.06650.
  19. Alejandro Flores-Velazco and David M. Mount. Coresets for the Nearest-Neighbor Rule. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 47:1-47:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.47.
  20. Alejandro Flores-Velazco and David M. Mount. Guarantees on nearest-neighbor condensation heuristics. Computational Geometry, 95:101732, 2021. URL: https://doi.org/10.1016/j.comgeo.2020.101732.
  21. Lee-Ad Gottlieb, Aryeh Kontorovich, and Pinhas Nisnevitch. Near-optimal sample compression for nearest neighbors. In Advances in Neural Information Processing Systems, pages 370-378, 2014. Google Scholar
  22. Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, 2020. URL: https://arxiv.org/abs/1908.10396.
  23. Sariel Har-Peled. A replacement for Voronoi diagrams of near linear size. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 94-103, 2001. Google Scholar
  24. P. Hart. The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theor., 14(3):515-516, 1968. URL: https://doi.org/10.1109/TIT.1968.1054155.
  25. Norbert Jankowski and Marek Grochowski. Comparison of instances selection algorithms I. Algorithms survey. In Artificial Intelligence and Soft Computing-ICAISC 2004, pages 598-603. Springer, 2004. Google Scholar
  26. Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734, 2017. Google Scholar
  27. Marc Khoury and Dylan Hadfield-Menell. Adversarial training with Voronoi constraints. CoRR, abs/1905.01019, 2019. URL: http://arxiv.org/abs/1905.01019.
  28. Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018. Google Scholar
  29. Neehar Peri, Neal Gupta, W. Ronny Huang, Liam Fowl, Chen Zhu, Soheil Feizi, Tom Goldstein, and John P. Dickerson. Deep k-nn defense against clean-label data poisoning attacks. In European Conference on Computer Vision, pages 55-70. Springer, 2020. Google Scholar
  30. G. L. Ritter, H. B. Woodruff, S. R. Lowry, and T. L. Isenhour. An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21(6):665-669, 1975. Google Scholar
  31. Hanan Samet. The design and analysis of spatial data structures, volume 85. Addison-Wesley Reading, MA, 1990. Google Scholar
  32. Jürgen Schmidhuber. Deep learning in neural networks: An overview. CoRR, abs/1404.7828, 2014. URL: http://arxiv.org/abs/1404.7828.
  33. Chawin Sitawarin and David Wagner. On the robustness of deep k-nearest neighbors. In 2019 IEEE Security and Privacy Workshops (SPW), pages 1-7. IEEE, 2019. Google Scholar
  34. Charles J. Stone. Consistent nonparametric regression. The annals of statistics, pages 595-620, 1977. Google Scholar
  35. Godfried Toussaint. Open problems in geometric methods for instance-based learning. In Jin Akiyama and Mikio Kano, editors, JCDCG, volume 2866 of Lecture Notes in Computer Science, pages 273-283. Springer, 2002. URL: https://doi.org/10.1007/978-3-540-44400-8_29.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail