Synthesis-Aided Crash Consistency for Storage Systems

Authors Jacob Van Geffen , Xi Wang, Emina Torlak, James Bornholt



PDF
Thumbnail PDF

File

LIPIcs.ECOOP.2023.35.pdf
  • Filesize: 1.08 MB
  • 26 pages

Document Identifiers

Author Details

Jacob Van Geffen
  • University of Washington, Seattle, WA, USA
Xi Wang
  • University of Washington, Seattle, WA, USA
  • Amazon Web Services, Seattle, WA, USA
Emina Torlak
  • University of Washington, Seattle, WA, USA
  • Amazon Web Services, Seattle, WA, USA
James Bornholt
  • The University of Texas at Austin, TX, USA
  • Amazon Web Services, Seattle, WA, USA

Cite AsGet BibTex

Jacob Van Geffen, Xi Wang, Emina Torlak, and James Bornholt. Synthesis-Aided Crash Consistency for Storage Systems. In 37th European Conference on Object-Oriented Programming (ECOOP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 263, pp. 35:1-35:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ECOOP.2023.35

Abstract

Reliable storage systems must be crash consistent - guaranteed to recover to a consistent state after a crash. Crash consistency is non-trivial as it requires maintaining complex invariants about persistent data structures in the presence of caching, reordering, and system failures. Current programming models offer little support for implementing crash consistency, forcing storage system developers to roll their own consistency mechanisms. Bugs in these mechanisms can lead to severe data loss for applications that rely on persistent storage. This paper presents a new synthesis-aided programming model for building crash-consistent storage systems. In this approach, storage systems can assume an angelic crash-consistency model, where the underlying storage stack promises to resolve crashes in favor of consistency whenever possible. To realize this model, we introduce a new labeled writes interface for developers to identify their writes to disk, and develop a program synthesis tool, DepSynth, that generates dependency rules to enforce crash consistency over these labeled writes. We evaluate our model in a case study on a production storage system at Amazon Web Services. We find that DepSynth can automate crash consistency for this complex storage system, with similar results to existing expert-written code, and can automatically identify and correct consistency and performance issues.

Subject Classification

ACM Subject Classification
  • Software and its engineering → Search-based software engineering
  • Computer systems organization → Secondary storage organization
Keywords
  • program synthesis
  • crash consistency
  • file systems

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. Fences in weak memory models. In Proceedings of the 22nd International Conference on Computer Aided Verification (CAV), pages 258-272, Edinburgh, United Kingdom, July 2010. Google Scholar
  2. Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. Litmus: Running tests against hardware. In Proceedings of the 17th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Saarbrücken, Germany, March-April 2011. Google Scholar
  3. Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O'Connor, Joel Beeren, Yutaka Nagashima, Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein, and Gernot Heiser. Cogent: Verifying high-assurance file system implementations. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 175-188, Atlanta, GA, April 2016. Google Scholar
  4. James Bornholt, Rajeev Joshi, Vytautas Astrauskas, Brendan Cully, Bernhard Kragl, Seth Markle, Kyle Sauri, Drew Schleit, Grant Slatton, Serdar Tasiran, Jacob Van Geffen, and Andrew Warfield. Using lightweight formal methods to validate a key-value storage node in Amazon S3. In Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP), Virtual conference, October 2021. Google Scholar
  5. James Bornholt, Antoine Kaufmann, Jialin Li, Arvind Krishnamurthy, Emina Torlak, and Xi Wang. Specifying and checking file system crash-consistency models. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 83-98, Atlanta, GA, April 2016. Google Scholar
  6. James Bornholt and Emina Torlak. Synthesizing memory models from framework sketches and litmus tests. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 467-481, Barcelona, Spain, June 2017. Google Scholar
  7. Haogang Chen, Tej Chajed, Alex Konradi, Stephanie Wang, Atalay İleri, Adam Chlipala, M. Frans Kaashoek, and Nickolai Zeldovich. Verifying a high-performance crash-safe file system using a tree specification. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP), pages 270-286, Shanghai, China, October 2017. Google Scholar
  8. Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chlipala, M. Frans Kaashoek, and Nickolai Zeldovich. Using Crash Hoare Logic for certifying the FSCQ file system. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), pages 18-37, Monterey, CA, October 2015. Google Scholar
  9. Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi, Eli Barzilay, Jay McCarthy, and Sam Tobin-Hochstadt. A programmable programming language. Communications of the ACM, 61(3):62-71, March 2018. Google Scholar
  10. Christopher Frost, Mike Mammarella, Eddie Kohler, Andrew de los Reyes, Shant Hovsepian, Andrew Matsuoka, and Lei Zhang. Generalized file system dependencies. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP), pages 307-320, Stevenson, WA, October 2007. Google Scholar
  11. Gregory R. Ganger and Yale N. Patt. Metadata update performance in file systems. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 49-60, Monterey, CA, November 1994. Google Scholar
  12. Valerie Henson. The many faces of fsck, September 2007. URL: https://lwn.net/Articles/248180/.
  13. Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Michael Norrish, Rafal Kolanski, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pages 207-220, Big Sky, MT, October 2009. Google Scholar
  14. Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7), 1978. Google Scholar
  15. Hayley LeBlanc, Shankara Pailoor, Om Saran K R E, Isil Dillig, James Bornholt, and Vijay Chidambaram. Chipmunk: Investigating crash-consistency in persistent-memory file systems. In Proceedings of the 18th ACM EuroSys Conference, Rome, Italy, May 2023. Google Scholar
  16. Xavier Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107-115, July 2009. Google Scholar
  17. Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu. A study of Linux file system evolution. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST), pages 31-44, San Jose, CA, February 2013. Google Scholar
  18. Jayashree Mohan, Ashlie Martinez, Soujanya Ponnapalli, Pandian Raju, and Vijay Chidambaram. Finding crash-consistency bugs with bounded black-box crash testing. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 33-50, Carlsbad, CA, October 2018. Google Scholar
  19. Luke Nelson, James Bornholt, Ronghui Gu, Andrew Baumann, Emina Torlak, and Xi Wang. Scaling symbolic evaluation for automated verification of systems code with Serval. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP), pages 225-242, Huntsville, Ontario, Canada, October 2019. Google Scholar
  20. Aina Niemetz, Mathias Preiner, and Armin Biere. Boolector 2.0. Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 9:53-58, 2015. Google Scholar
  21. Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. The log-structured merge-tree (lsm-tree). Acta Informatica, 33(4):351-385, June 1996. Google Scholar
  22. Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. All file systems are not created equal: On the complexity of crafting crash-consistent applications. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 433-448, Broomfield, CO, October 2014. Google Scholar
  23. Vijayan Prabhakaran, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Analysis and evolution of journaling file systems. In Proceedings of the 2005 USENIX Annual Technical Conference, pages 105-120, Anaheim, CA, April 2005. Google Scholar
  24. Ohad Rodeh, Josef Bacik, and Chris Mason. BTRFS: The Linux B-tree filesystem. ACM Transactions on Storage, 9(3):9:1-32, August 2013. Google Scholar
  25. M. Rosenblum and J. Ousterhout. The design and implementation of a log-structured file system. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP), pages 1-15, Pacific Grove, CA, October 1991. Google Scholar
  26. Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing engine for c. In Proceedings of the 13th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Lisbon, Portugal, September 2005. Google Scholar
  27. Helgi Sigurbjarnarson, James Bornholt, Emina Torlak, and Xi Wang. Push-button verification of file systems via crash refinement. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 1-16, Savannah, GA, November 2016. Google Scholar
  28. Emina Torlak and Rastislav Bodik. A lightweight symbolic virtual machine for solver-aided host languages. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 530-541, Edinburgh, United Kingdom, June 2014. Google Scholar
  29. Emina Torlak and Daniel Jackson. Kodkod: A relational model finder. In Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Braga, Portugal, March-April 2007. Google Scholar
  30. Stephen C. Tweedie. Journaling the Linux ext2fs filesystem. In Proceedings of the 4th Annual LinuxExpo, Durham, NC, May 1998. Google Scholar
  31. Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M.K. Martin, and Rajeev Alur. TRANSIT: Specifying protocols with concolic snippets. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 287-296, Seattle, WA, June 2013. Google Scholar
  32. John Wickerson, Mark Batty, Tyler Sorensen, and George A. Constantinides. Automatically comparing memory consistency models. In Proceedings of the 44th ACM Symposium on Principles of Programming Languages (POPL), Paris, France, January 2017. Google Scholar
  33. Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi. Using model checking to find serious file system errors. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 273-287, San Francisco, CA, December 2004. Google Scholar
  34. Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi. eXplode: A lightweight, general system for finding serious storage system errors. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 131-146, Seattle, WA, November 2006. Google Scholar
  35. Mai Zheng, Joseph Tucek, Dachuan Huang, Feng Qin, Mark Lillibridge, Elizabeth S. Yang, Bill W. Zhao, and Shashank Singh. Torturing databases for fun and profit. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 449-464, Broomfield, CO, October 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail