Interactive Proofs for Verifying Machine Learning

Goldwasser, Shafi; Rothblum, Guy N.; Shafer, Jonathan; Yehudayoff, Amir

doi:10.4230/LIPIcs.ITCS.2021.41

Abstract

We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept good hypotheses, and reject bad hypotheses. Both the verifier and the prover are efficient and have access to labeled data samples from an unknown distribution. We are interested in cases where the verifier can use significantly less data than is required for (agnostic) PAC learning, or use a substantially cheaper data source (e.g., using only random samples for verification, even though learning requires membership queries). We believe that today, when data and data-driven algorithms are quickly gaining prominence, the question of verifying purported outcomes of data analyses is very well-motivated.
We show three main results. First, we prove that for a specific hypothesis class, verification is significantly cheaper than learning in terms of sample complexity, even if the verifier engages with the prover only in a single-round (NP-like) protocol. Moreover, for this class we prove that single-round verification is also significantly cheaper than testing closeness to the class. Second, for the broad class of Fourier-sparse boolean functions, we show a multi-round (IP-like) verification protocol, where the prover uses membership queries, and the verifier is able to assess the result while only using random samples. Third, we show that verification is not always more efficient. Namely, we show a class of functions where verification requires as many samples as learning does, up to a logarithmic factor.
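As a rough illustration of the random-sample access described above, the following minimal sketch estimates the loss of one fixed, proposed hypothesis from labeled samples. It is not a protocol from the paper. The function names, the threshold hypotheses, and the noisy data source are all hypothetical, and the sample size comes from the standard Hoeffding bound. It covers only the easy step of estimating the loss of a single hypothesis. Certifying that this loss is close to the best achievable in a hypothesis class is the part where interaction with a prover is meant to help.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Number of samples sufficient, by Hoeffding's inequality, to estimate
    a 0-1 loss within +/- eps with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def empirical_loss(hypothesis, samples):
    """Fraction of labeled samples (x, y) that the hypothesis misclassifies."""
    return sum(1 for x, y in samples if hypothesis(x) != y) / len(samples)


def draw_samples(m, target, noise, rng):
    """Hypothetical data source: x uniform in [0, 1), label target(x),
    flipped independently with probability `noise`."""
    samples = []
    for _ in range(m):
        x = rng.random()
        y = target(x)
        if rng.random() < noise:
            y = 1 - y
        samples.append((x, y))
    return samples


def target(x):
    """Hypothetical ground-truth labeling rule (threshold at 0.5)."""
    return int(x >= 0.5)


def proposed(x):
    """Hypothetical hypothesis handed to the verifier (threshold at 0.6)."""
    return int(x >= 0.6)


rng = random.Random(0)
m = hoeffding_sample_size(eps=0.05, delta=0.01)
estimate = empirical_loss(proposed, draw_samples(m, target, 0.1, rng))
print(f"samples used: {m}, estimated loss: {estimate:.3f}")
```

Under these assumptions the true loss of `proposed` is 0.1 * 0.9 + 0.9 * 0.1 = 0.18, so the printed estimate should fall within 0.05 of that value with probability at least 0.99.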

Cite As

Shafi Goldwasser, Guy N. Rothblum, Jonathan Shafer, and Amir Yehudayoff. Interactive Proofs for Verifying Machine Learning. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 41:1-41:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.ITCS.2021.41

Author Details

Shafi Goldwasser

University of California, Berkeley, CA, USA

Guy N. Rothblum

Weizmann Institute of Science, Rehovot, Israel

Jonathan Shafer

University of California, Berkeley, CA, USA

Amir Yehudayoff

Technion - Israel Institute of Technology, Haifa, Israel

Funding

Goldwasser, Shafi: Research was supported in part by DARPA under Contract No. HR001120C0015.
Rothblum, Guy N.: This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 819702).
Yehudayoff, Amir: Research was supported by ISF grant No. 1162/15.

Acknowledgements

The authors would like to thank Oded Ben-David, Eyal Ben-David, Alessandro Chiesa, Constantinos Daskalakis, Oded Goldreich, Ankur Moitra, Ido Nachum, Orr Paradise and Avishay Tal for insightful discussions and helpful references.

