Improved Generalization Guarantees in Restricted Data Models

Authors Elbert Du, Cynthia Dwork



PDF
Thumbnail PDF

File

LIPIcs.FORC.2022.6.pdf
  • Filesize: 0.59 MB
  • 12 pages

Document Identifiers

Author Details

Elbert Du
  • Department of Computer Science, Harvard University, Boston, MA, USA
Cynthia Dwork
  • Department of Computer Science, Harvard University, Boston, MA, USA

Acknowledgements

The authors are indebted to Guy Rothblum and Pragya Sur for many helpful conversations.

Cite AsGet BibTex

Elbert Du and Cynthia Dwork. Improved Generalization Guarantees in Restricted Data Models. In 3rd Symposium on Foundations of Responsible Computing (FORC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 218, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.FORC.2022.6

Abstract

Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis - even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the quantity of interest on the underlying population. The cost of this protection is the accuracy loss incurred by differential privacy. In this work, inspired by standard models in the genomics literature, we consider data models in which individuals are represented by a sequence of attributes with the property that where distant attributes are only weakly correlated. We show that, under this assumption, it is possible to "re-use" privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.

Subject Classification

ACM Subject Classification
  • Theory of computation → Machine learning theory
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Differential Privacy
  • Adaptive Data Analysis
  • Transfer Theorem

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. Algorithmic stability for adaptive data analysis. SIAM Journal on Computing, 50(3):STOC16-377, 2021. Google Scholar
  2. Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM), 60(2):1-25, 2013. Google Scholar
  3. Mark Bun, Jonathan Ullman, and Salil Vadhan. Fingerprinting codes and the price of approximate differential privacy. SIAM Journal on Computing, 47(5):1888-1938, 2018. Google Scholar
  4. Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toni Pitassi, Omer Reingold, and Aaron Roth. Generalization in adaptive data analysis and holdout reuse. Advances in Neural Information Processing Systems, 28, 2015. Google Scholar
  5. Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Preserving statistical validity in adaptive data analysis. arXiv e-prints, 2014. URL: http://arxiv.org/abs/1411.2664.
  6. Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Leon Roth. Preserving statistical validity in adaptive data analysis. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 117-126, 2015. Google Scholar
  7. David A. Freedman. A note on screening regression equations. The American Statistician, 37(2):152-155, 1983. URL: http://www.jstor.org/stable/2685877.
  8. Moritz Hardt and Guy N Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 61-70. IEEE, 2010. Google Scholar
  9. Martin Hirzel, Scott Schneider, and Kanat Tangwongsan. Sliding-window aggregation algorithms: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, pages 11-14, 2017. Google Scholar
  10. Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Moshe Shenfeld. A new analysis of differential privacy’s generalization guarantees, 2019. URL: http://arxiv.org/abs/1909.03577.
  11. Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983-1006, 1998. Google Scholar
  12. Montgomery Slatkin. Linkage disequilibrium - understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics, 9:477-485, 2008. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail