Improved Generalization Guarantees in Restricted Data Models

Du, Elbert; Dwork, Cynthia

doi:10.4230/LIPIcs.FORC.2022.6

File

LIPIcs.FORC.2022.6.pdf

Filesize: 0.59 MB
12 pages

Document Identifiers

DOI: 10.4230/LIPIcs.FORC.2022.6
URN: urn:nbn:de:0030-drops-165299

Author Details

Elbert Du

Department of Computer Science, Harvard University, Boston, MA, USA

Cynthia Dwork

Department of Computer Science, Harvard University, Boston, MA, USA

Acknowledgements

The authors are indebted to Guy Rothblum and Pragya Sur for many helpful conversations.

Cite AsGet BibTex

Elbert Du and Cynthia Dwork. Improved Generalization Guarantees in Restricted Data Models. In 3rd Symposium on Foundations of Responsible Computing (FORC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 218, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.FORC.2022.6

Abstract

Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis - even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the quantity of interest on the underlying population. The cost of this protection is the accuracy loss incurred by differential privacy. In this work, inspired by standard models in the genomics literature, we consider data models in which individuals are represented by a sequence of attributes with the property that where distant attributes are only weakly correlated. We show that, under this assumption, it is possible to "re-use" privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.

Subject Classification

ACM Subject Classification

Theory of computation → Machine learning theory
Theory of computation → Design and analysis of algorithms
Theory of computation → Streaming, sublinear and near linear time algorithms

Keywords

Differential Privacy
Adaptive Data Analysis
Transfer Theorem

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. Algorithmic stability for adaptive data analysis. SIAM Journal on Computing, 50(3):STOC16-377, 2021.
Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM), 60(2):1-25, 2013.
Mark Bun, Jonathan Ullman, and Salil Vadhan. Fingerprinting codes and the price of approximate differential privacy. SIAM Journal on Computing, 47(5):1888-1938, 2018.
Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toni Pitassi, Omer Reingold, and Aaron Roth. Generalization in adaptive data analysis and holdout reuse. Advances in Neural Information Processing Systems, 28, 2015.
Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Preserving statistical validity in adaptive data analysis. arXiv e-prints, 2014. URL: http://arxiv.org/abs/1411.2664.
Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Leon Roth. Preserving statistical validity in adaptive data analysis. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 117-126, 2015.
David A. Freedman. A note on screening regression equations. The American Statistician, 37(2):152-155, 1983. URL: http://www.jstor.org/stable/2685877.
Moritz Hardt and Guy N Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 61-70. IEEE, 2010.
Martin Hirzel, Scott Schneider, and Kanat Tangwongsan. Sliding-window aggregation algorithms: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, pages 11-14, 2017.
Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Moshe Shenfeld. A new analysis of differential privacy’s generalization guarantees, 2019. URL: http://arxiv.org/abs/1909.03577.
Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983-1006, 1998.
Montgomery Slatkin. Linkage disequilibrium - understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics, 9:477-485, 2008.