Data Reconstruction: When You See It and When You Don't

Authors: Edith Cohen, Haim Kaplan, Yishay Mansour, Shay Moran, Kobbi Nissim, Uri Stemmer, Eliad Tsfadia



File

LIPIcs.ITCS.2025.39.pdf
  • Filesize: 0.91 MB
  • 23 pages

Author Details

Edith Cohen
  • Google Research, Mountain View, CA, USA
  • Tel Aviv University, Israel
Haim Kaplan
  • Tel Aviv University, Israel
  • Google Research, Tel Aviv, Israel
Yishay Mansour
  • Tel Aviv University, Israel
  • Google Research, Tel Aviv, Israel
Shay Moran
  • Technion, Haifa, Israel
  • Google Research, Tel Aviv, Israel
Kobbi Nissim
  • Georgetown University, Washington, DC, USA
  • Work done while at Google Research, Tel Aviv, Israel
Uri Stemmer
  • Tel Aviv University, Israel
  • Google Research, Tel Aviv, Israel
Eliad Tsfadia
  • Georgetown University, Washington, DC, USA

Acknowledgements

The authors would like to thank Noam Mazor for useful discussions about Kolmogorov complexity.

Cite As

Edith Cohen, Haim Kaplan, Yishay Mansour, Shay Moran, Kobbi Nissim, Uri Stemmer, and Eliad Tsfadia. Data Reconstruction: When You See It and When You Don't. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 39:1-39:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ITCS.2025.39

Abstract

We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from the context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementing questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically,  
- We introduce a new definitional paradigm - Narcissus Resiliency - to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side effect, we demonstrate that Narcissus resiliency captures as special cases multiple well-studied concepts, including differential privacy and other security notions of one-way functions and encryption schemes.
- We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.

Subject Classification

ACM Subject Classification
  • Security and privacy → Human and societal aspects of security and privacy
Keywords
  • differential privacy
  • reconstruction
