Certifiable Robustness for Nearest Neighbor Classifiers

Authors: Austen Z. Fan, Paraschos Koutris

Author Details

Austen Z. Fan
  • Department of Computer Sciences, University of Wisconsin-Madison, WI, USA
Paraschos Koutris
  • Department of Computer Sciences, University of Wisconsin-Madison, WI, USA

Cite As

Austen Z. Fan and Paraschos Koutris. Certifiable Robustness for Nearest Neighbor Classifiers. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Abstract


ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, k-Nearest Neighbors (k-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.
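
To make the definitions above concrete, the following is a minimal brute-force sketch, not the algorithm from the paper. It enumerates every subset repair (maximal consistent subset) of a tiny dataset, runs k-NN on each, and reports both whether the prediction is certifiably robust and how many repairs predict each label (the counting version). The row layout, the single FD "first feature determines the label", Euclidean distance, and the tie-breaking rule are all assumptions chosen for illustration. The enumeration is exponential and assumes a non-empty dataset.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Each row is (features, label). Hypothetical FD for this sketch:
# rows that agree on features[0] must agree on the label.
def violates(r, s):
    return r[0][0] == s[0][0] and r[1] != s[1]

def is_consistent(rows):
    return not any(violates(r, s) for r, s in combinations(rows, 2))

def subset_repairs(rows):
    """Enumerate all maximal consistent subsets by brute force."""
    n = len(rows)
    consistent = [
        frozenset(c)
        for size in range(n + 1)
        for c in combinations(range(n), size)
        if is_consistent([rows[i] for i in c])
    ]
    # Keep only subsets that are not strictly contained in another one.
    maximal = [c for c in consistent if not any(c < d for d in consistent)]
    return [[rows[i] for i in sorted(c)] for c in maximal]

def knn_predict(train, x, k):
    """Majority vote among the k nearest rows (Euclidean distance)."""
    nearest = sorted(train, key=lambda r: dist(r[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # Assumed tie-break: most votes first, then the smallest label.
    return min(votes, key=lambda lab: (-votes[lab], lab))

def label_counts(rows, x, k):
    """Counting version: number of repairs predicting each label."""
    return Counter(knn_predict(rep, x, k) for rep in subset_repairs(rows))

def certifiably_robust(rows, x, k):
    """True if every repair yields the same prediction for x."""
    return len(label_counts(rows, x, k)) == 1

# The first two rows conflict under the FD, so there are two repairs.
rows = [((1, 0.0), "a"), ((1, 0.1), "b"), ((2, 1.0), "a"), ((3, 5.0), "b")]
print(certifiably_robust(rows, (2, 1.0), 1))  # True
print(certifiably_robust(rows, (1, 0.0), 1))  # False
```

In this toy instance the test point (2, 1.0) has the same nearest neighbor in both repairs, so its prediction is robust. The point (1, 0.0) sits next to the conflicting pair, so the two repairs disagree.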

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
  • Theory of computation → Incomplete, inconsistent, and uncertain databases

Keywords
  • Inconsistent databases
  • k-NN classification
  • certifiable robustness


