Decentralized Data Archival: New Definitions and Constructions

Shi, Elaine; Silver, Rose; Mu, Changrui

doi:10.4230/LIPIcs.ITCS.2026.116

Abstract

We initiate the study of a new abstraction called incremental decentralized data archival (iDDA). Imagine an ever-growing, massive database such as a blockchain, a comprehensive human knowledge base like Wikipedia, or the Internet Archive. We want to build a decentralized archival system for such datasets to ensure long-term robustness and sustainability. We identify several important properties that an iDDA scheme should satisfy. First, to promote heterogeneity and decentralization, we want to encourage even weak nodes with limited space (e.g., users' home computers) to contribute. The minimum space requirement to contribute should be approximately independent of the data size. Second, if a collection of nodes together receives rewards commensurate with contributing a total of m blocks of space, then we want the following assurances: 1) if m is at least the database size, we should be able to reconstruct the entire dataset; and 2) these nodes should actually be committing roughly m space in aggregate. Specifically, when m is much larger than the data size, these nodes should not be able to store only one copy of the database, impersonate arbitrarily many pseudonyms, and collect unbounded rewards.
We propose new definitions that mathematically formalize the aforementioned requirements of an iDDA scheme. We also devise an efficient construction in the random oracle model that satisfies the desired security requirements. Our scheme incurs only Õ(1) audit cost, as well as Õ(1) update cost for both the publisher and each node, where Õ(⋅) hides polylogarithmic factors. Further, the minimum space provisioning required to contribute is as small as polylogarithmic.
Our construction exposes several interesting technical challenges. Specifically, we show that a straightforward application of the standard hierarchical data structure fails, since both our security definition and the underlying cryptographic primitives we employ lack the desired compositional guarantees. We devise novel techniques to overcome these compositional issues, resulting in a construction with provable security while still retaining efficiency. Finally, our new definitions also make a conceptual contribution, and lay the theoretical groundwork for the study of iDDA. We raise several interesting open problems along this direction.
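For readers unfamiliar with the hierarchical data structure mentioned above, the following is a minimal sketch of the plain level-merging idea, assuming the term refers to the usual scheme of levels with geometrically growing capacity. It is not the paper's construction: it contains no cryptography and, as the abstract notes, a straightforward application of this structure does not give the desired security. The class name `HierarchicalLog` and the `rebuild_work` counter are hypothetical names introduced only for illustration. The sketch shows where a polylogarithmic amortized update cost comes from: each appended block is rewritten at most once per level.

```python
class HierarchicalLog:
    """Append-only store whose level i is either empty or holds exactly 2**i blocks.

    Illustrative only: a plain data structure, not a secure iDDA scheme.
    """

    def __init__(self):
        self.levels = []       # levels[i] is None (empty) or a list of 2**i blocks
        self.rebuild_work = 0  # total number of blocks written during rebuilds

    def append(self, block):
        carry = [block]  # blocks waiting to be placed, oldest first
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                # Found an empty level: rebuild it from the carried blocks.
                self.levels[i] = carry
                self.rebuild_work += len(carry)
                return
            # Level i is full: merge it (older blocks) with the carry and move up.
            carry = self.levels[i] + carry
            self.levels[i] = None
            i += 1

    def blocks(self):
        """Return all blocks in insertion order (higher levels hold older blocks)."""
        out = []
        for level in reversed(self.levels):
            if level is not None:
                out.extend(level)
        return out


if __name__ == "__main__":
    n = 1000
    store = HierarchicalLog()
    for x in range(n):
        store.append(x)
    # Average blocks rewritten per append stays logarithmic in n.
    print(store.rebuild_work / n, n.bit_length())
```

After n appends the occupied levels mirror the binary representation of n, and the total rebuild work is at most n times the number of levels, so the amortized cost per append is logarithmic in n.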

