Certifiable Robustness for Nearest Neighbor Classifiers

Fan, Austen Z.; Koutris, Paraschos

doi:10.4230/LIPIcs.ICDT.2022.6

Abstract

ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, k-Nearest Neighbors (k-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.
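The definition of certifiable robustness in the abstract can be illustrated with a brute-force sketch. This is not the paper's algorithm: it enumerates every repair explicitly, so it takes exponential time and only works on tiny inputs. It assumes that the possible worlds are subset repairs (maximal consistent subsets of the rows), that distance is Euclidean, and that voting ties go to the smallest label. All function names, the row layout, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist

def conflicts(r, s, fds):
    """Rows conflict if they agree on some FD's left side but differ on its right side."""
    return any(
        all(r[i] == s[i] for i in lhs) and any(r[j] != s[j] for j in rhs)
        for lhs, rhs in fds
    )

def subset_repairs(rows, fds):
    """Brute force: all maximal consistent subsets, as frozensets of row indices."""
    n = len(rows)
    consistent = [
        frozenset(c)
        for size in range(n + 1)
        for c in combinations(range(n), size)
        if not any(conflicts(rows[a], rows[b], fds) for a, b in combinations(c, 2))
    ]
    return [s for s in consistent if not any(s < t for t in consistent)]

def knn_predict(train, x, k):
    """Majority vote among the k nearest (point, label) pairs; ties go to the smallest label."""
    nearest = sorted(train, key=lambda pl: dist(pl[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return max(sorted(votes), key=votes.get)

def label_counts(rows, fds, feature_idx, label_idx, x, k):
    """Counting version: how many repairs predict each label for test point x."""
    counts = Counter()
    for repair in subset_repairs(rows, fds):
        train = [
            (tuple(rows[i][j] for j in feature_idx), rows[i][label_idx])
            for i in sorted(repair)
        ]
        counts[knn_predict(train, x, k)] += 1
    return counts

def certifiably_robust(rows, fds, feature_idx, label_idx, x, k):
    """Decision version: True if every repair yields the same prediction."""
    return len(label_counts(rows, fds, feature_idx, label_idx, x, k)) == 1

# Hypothetical data: (sensor_id, reading, label); sensor_id determines the other columns.
rows = [
    ("a", 1.0, "pos"),
    ("a", 1.2, "pos"),  # conflicts with row 0
    ("b", 2.0, "pos"),
    ("c", 9.0, "neg"),
    ("c", 1.1, "neg"),  # conflicts with row 3
]
fds = [((0,), (1, 2))]
x = (1.5,)

print(label_counts(rows, fds, (1,), 2, x, k=1))
print(certifiably_robust(rows, fds, (1,), 2, x, k=3))
```

On this toy input there are four repairs. With k = 1 they disagree on the label of the test point, so the prediction is not certifiably robust. With k = 3 every repair predicts the same label.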

Cite As

Austen Z. Fan and Paraschos Koutris. Certifiable Robustness for Nearest Neighbor Classifiers. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ICDT.2022.6

Author Details

Austen Z. Fan

Department of Computer Sciences, University of Wisconsin-Madison, WI, USA

Paraschos Koutris

Department of Computer Sciences, University of Wisconsin-Madison, WI, USA

Funding

This research was supported in part by National Science Foundation grants CRII-1850348 and III-1910014, as well as a gift by Google.

Supplementary Materials

Audiovisual (Video of the Presentation) https://doi.org/10.5446/57492

