Data Reconstruction: When You See It and When You Don't

Cohen, Edith; Kaplan, Haim; Mansour, Yishay; Moran, Shay; Nissim, Kobbi; Stemmer, Uri; Tsfadia, Eliad

doi:10.4230/LIPIcs.ITCS.2025.39

Abstract

We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While the notion is often clear from context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementary questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically:
- We introduce a new definitional paradigm - Narcissus Resiliency - to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side effect, we demonstrate that Narcissus Resiliency captures as special cases multiple well-studied concepts, including differential privacy and security notions for one-way functions and encryption schemes.
- We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.
