A New Analysis of Differential Privacy’s Generalization Guarantees

Jung, Christopher; Ligett, Katrina; Neel, Seth; Roth, Aaron; Sharifi-Malvajerdi, Saeed; Shenfeld, Moshe

doi:10.4230/LIPIcs.ITCS.2020.31

Abstract

We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the conditional distribution on datasets induced by the transcript of the interaction is close to its expectation on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to the expectation of the query on the conditional distribution. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the conditional distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work.
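The "summing these two bounds" step can be written as a single triangle inequality. The block below is a sketch of that decomposition only. The symbols (P, \pi, Q_\pi, q, a) are notation assumed here for illustration and are not taken from the abstract, and no concrete constants are claimed.

```latex
% Notation (assumed for illustration, not taken from the abstract):
%   P      : data distribution;  S \sim P^n : the dataset of size n
%   \pi    : transcript of the interaction between analyst and mechanism
%   Q_\pi  : conditional distribution on datasets given \pi
%   q      : a statistical query asked in \pi;  a : the mechanism's answer to q
%   q(P)     = \mathbb{E}_{x \sim P}[q(x)]
%   q(Q_\pi) = \mathbb{E}_{S' \sim Q_\pi}\Big[\tfrac{1}{n}\sum_{i=1}^{n} q(S'_i)\Big]
\[
  \lvert a - q(P) \rvert
  \;\le\;
  \underbrace{\lvert a - q(Q_\pi) \rvert}_{\text{claim 2: sample accuracy}}
  \;+\;
  \underbrace{\lvert q(Q_\pi) - q(P) \rvert}_{\text{claim 1: differential privacy}}
\]
```

The left-hand side is the out-of-sample error. Each term on the right is controlled by one of the two claims, so bounding both bounds the whole.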
An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.
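For readers unfamiliar with the "sample-splitting" baseline mentioned above, the following is a minimal Python sketch of it, assuming queries take values in [0, 1]. The class name `SampleSplittingMechanism` and the helper `hoeffding_width` are hypothetical names introduced here. The error width uses the standard Hoeffding inequality with a union bound over the k queries, and is not a bound from the paper.

```python
import math
import random


class SampleSplittingMechanism:
    """Answers up to k statistical queries, each on its own fresh chunk of data."""

    def __init__(self, data, k):
        if k <= 0 or len(data) < k:
            raise ValueError("need at least one data point per query")
        self.data = list(data)
        self.k = k
        self.chunk_size = len(self.data) // k
        self.used = 0  # number of queries answered so far

    def answer(self, query):
        """Return the empirical mean of `query` on the next unused chunk."""
        if self.used >= self.k:
            raise RuntimeError("query budget exhausted")
        start = self.used * self.chunk_size
        chunk = self.data[start:start + self.chunk_size]
        self.used += 1
        return sum(query(x) for x in chunk) / len(chunk)


def hoeffding_width(n, k, beta):
    """Width t such that all k answers are within t of their expectations
    with probability at least 1 - beta (Hoeffding plus a union bound),
    for [0, 1]-valued queries and chunks of size n // k."""
    m = n // k
    return math.sqrt(math.log(2 * k / beta) / (2 * m))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(10_000)]
    mech = SampleSplittingMechanism(data, k=10)
    first = mech.answer(lambda x: float(x > 0.5))
    # The second query may depend on the first answer (adaptivity); it is
    # still answered on data the analyst has not yet influenced.
    second = mech.answer(lambda x: float(x > first))
    print(first, second, hoeffding_width(len(data), 10, 0.05))
```

Because every query consumes a disjoint chunk, adaptivity cannot cause overfitting, but each answer only benefits from n / k samples. That cost is what mechanisms analyzed through the transfer theorem aim to beat.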

Cite As

Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Moshe Shenfeld. A New Analysis of Differential Privacy’s Generalization Guarantees. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 31:1-31:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.ITCS.2020.31

Author Details

Christopher Jung

University of Pennsylvania, Philadelphia, PA, USA

Katrina Ligett

The Hebrew University, Jerusalem, Israel

Seth Neel

University of Pennsylvania, Philadelphia, PA, USA

Aaron Roth

University of Pennsylvania, Philadelphia, PA, USA

Saeed Sharifi-Malvajerdi

University of Pennsylvania, Philadelphia, PA, USA

Moshe Shenfeld

The Hebrew University, Jerusalem, Israel

Funding

Jung, Christopher: Supported in part by NSF grant AF-1763307.
Ligett, Katrina: Supported in part by Israel Science Foundation (ISF) grant #1044/16, the United States Air Force and DARPA under contracts FA8750-16-C-0022 and FA8750-19-2-0222, and the Federmann Cyber Security Center in conjunction with the Israel National Cyber Directorate.
Neel, Seth: Supported in part by an NSF Graduate Research Fellowship.
Roth, Aaron: Supported in part by NSF grant AF-1763314, the United States Air Force and DARPA under Contract No FA8750-16-C-0022, and a grant from the Sloan Foundation.
Shenfeld, Moshe: Supported in part by Israel Science Foundation (ISF) grant #1044/16, the United States Air Force and DARPA under contracts FA8750-16-C-0022 and FA8750-19-2-0222, and the Federmann Cyber Security Center in conjunction with the Israel National Cyber Directorate. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.

Acknowledgements

We thank Adam Smith for helpful conversations at an early stage of this work, and Daniel Roy for helpful feedback on the presentation of the result.

