Verified Progress Tracking for Timely Dataflow

Authors Matthias Brun, Sára Decova, Andrea Lattuada, Dmitriy Traytel



PDF
Thumbnail PDF

File

LIPIcs.ITP.2021.10.pdf
  • Filesize: 0.76 MB
  • 20 pages

Document Identifiers

Author Details

Matthias Brun
  • Department of Computer Science, ETH Zürich, Switzerland
Sára Decova
  • Department of Computer Science, ETH Zürich, Switzerland
Andrea Lattuada
  • Department of Computer Science, ETH Zürich, Switzerland
Dmitriy Traytel
  • Department of Computer Science, University of Copenhagen, Denmark

Acknowledgements

We thank David Basin and Timothy Roscoe for supporting this work and Frank McSherry for providing valuable input on our formalization, e.g, by suggesting to consider the implied_frontier notion and to show that it is what local propagation computes. David Cock, Jon Howell, and Frank McSherry provided helpful feedback after reading early drafts of this paper.

Cite AsGet BibTex

Matthias Brun, Sára Decova, Andrea Lattuada, and Dmitriy Traytel. Verified Progress Tracking for Timely Dataflow. In 12th International Conference on Interactive Theorem Proving (ITP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 193, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ITP.2021.10

Abstract

Large-scale stream processing systems often follow the dataflow paradigm, which enforces a program structure that exposes a high degree of parallelism. The Timely Dataflow distributed system supports expressive cyclic dataflows for which it offers low-latency data- and pipeline-parallel stream processing. To achieve high expressiveness and performance, Timely Dataflow uses an intricate distributed protocol for tracking the computation’s progress. We modeled the progress tracking protocol as a combination of two independent transition systems in the Isabelle/HOL proof assistant. We specified and verified the safety of the two components and of the combined protocol. To this end, we identified abstract assumptions on dataflow programs that are sufficient for safety and were not previously formalized.

Subject Classification

ACM Subject Classification
  • Security and privacy → Logic and verification
  • Computing methodologies → Distributed algorithms
  • Software and its engineering → Data flow languages
Keywords
  • safety
  • distributed systems
  • timely dataflow
  • Isabelle/HOL

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Github: Timely dataflow. URL: https://github.com/TimelyDataflow/timely-dataflow/.
  2. Materialize: Incrementally-updated materialized views. URL: https://materialize.com.
  3. Martín Abadi and Michael Isard. Timely dataflow: A model. In Susanne Graf and Mahesh Viswanathan, editors, FORTE 2015, volume 9039 of LNCS, pages 131-145. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-19195-9_9.
  4. Martín Abadi, Frank McSherry, Derek Gordon Murray, and Thomas L. Rodeheffer. Formal analysis of a distributed algorithm for tracking progress. In Dirk Beyer and Michele Boreale, editors, FMOODS/FORTE 2013, volume 7892 of LNCS, pages 5-19. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-38592-6_2.
  5. Clemens Ballarin. Locales: A module system for mathematical theories. J. Autom. Reason., 52(2):123-153, 2014. URL: https://doi.org/10.1007/s10817-013-9284-7.
  6. Véronique Benzaken and Evelyne Contejean. A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra. In Assia Mahboubi and Magnus O. Myreen, editors, CPP 2019, pages 249-261. ACM, 2019. URL: https://doi.org/10.1145/3293880.3294107.
  7. Véronique Benzaken, Evelyne Contejean, Chantal Keller, and E. Martins. A Coq formalisation of SQL’s execution engines. In Jeremy Avigad and Assia Mahboubi, editors, ITP 2018, volume 10895 of LNCS, pages 88-107. Springer, 2018. URL: https://doi.org/10.1007/978-3-319-94821-8_6.
  8. Jasmin Christian Blanchette, Mathias Fleury, and Dmitriy Traytel. Nested multisets, hereditary multisets, and syntactic ordinals in Isabelle/HOL. In Dale Miller, editor, FSCD 2017, volume 84 of LIPIcs, pages 11:1-11:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.FSCD.2017.11.
  9. Matthias Brun, Sára Decova, Andrea Lattuada, and Dmitriy Traytel. Verified progress tracking for timely dataflow (extended report), 2021. URL: https://www.github.com/matthias-brun/progress-tracking-formalization/raw/main/rep.pdf.
  10. Kaustuv Chaudhuri, Damien Doligez, Leslie Lamport, and Stephan Merz. Verifying safety properties with the TLA+ proof system. In Jürgen Giesl and Reiner Hähnle, editors, IJCAR 2010, volume 6173 of LNCS, pages 142-148. Springer, 2010. URL: https://doi.org/10.1007/978-3-642-14203-1_12.
  11. Shumo Chu, Chenglong Wang, Konstantin Weitz, and Alvin Cheung. Cosette: An automated prover for SQL. In CIDR 2017. www.cidrdb.org, 2017. URL: http://cidrdb.org/cidr2017/papers/p51-chu-cidr17.pdf.
  12. Sára Decova. Modelling and verification of the Timely Dataflow progress tracking protocol. Master’s thesis, ETH Zurich, Zurich, 2020. URL: https://doi.org/10.3929/ethz-b-000444762.
  13. Tomás Díaz, Federico Olmedo, and Éric Tanter. A mechanized formalization of GraphQL. In Jasmin Blanchette and Catalin Hritcu, editors, CPP 2020, pages 201-214. ACM, 2020. URL: https://doi.org/10.1145/3372885.3373822.
  14. Victor B. F. Gomes, Martin Kleppmann, Dominic P. Mulligan, and Alastair R. Beresford. Verifying strong eventual consistency in distributed systems. Proc. ACM Program. Lang., 1(OOPSLA):109:1-109:28, 2017. URL: https://doi.org/10.1145/3133933.
  15. Travis Hance, Andrea Lattuada, Chris Hawblitzel, Jon Howell, Rob Johnson, and Bryan Parno. Storage systems are distributed systems (so verify them that way!). In OSDI 2020, pages 99-115. USENIX Association, 2020. URL: https://www.usenix.org/conference/osdi20/presentation/hance.
  16. Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath T. V. Setty, and Brian Zill. Ironfleet: proving practical distributed systems correct. In Ethan L. Miller and Steven Hand, editors, SOSP 2015, pages 1-17. ACM, 2015. URL: https://doi.org/10.1145/2815400.2815428.
  17. Jonas Kastberg Hinrichsen, Jesper Bengtson, and Robbert Krebbers. Actris: session-type based reasoning in separation logic. Proc. ACM Program. Lang., 4(POPL):6:1-6:30, 2020. URL: https://doi.org/10.1145/3371074.
  18. C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666-677, 1978. URL: https://doi.org/10.1145/359576.359585.
  19. Brian Huffman and Ondrej Kuncar. Lifting and Transfer: A modular design for quotients in Isabelle/HOL. In Georges Gonthier and Michael Norrish, editors, CPP 2013, volume 8307 of LNCS, pages 131-146. Springer, 2013. URL: https://doi.org/10.1007/978-3-319-03545-1_9.
  20. Ralf Jung, Robbert Krebbers, Jacques-Henri Jourdan, Ales Bizjak, Lars Birkedal, and Derek Dreyer. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. J. Funct. Program., 28:e20, 2018. URL: https://doi.org/10.1017/S0956796818000151.
  21. Leslie Lamport. Paxos made simple, fast, and Byzantine. In Alain Bui and Hacène Fouchal, editors, OPODIS 2002, volume 3 of Studia Informatica Universalis, pages 7-9. Suger, Saint-Denis, rue Catulienne, France, 2002. Google Scholar
  22. Mohsen Lesani, Christian J. Bell, and Adam Chlipala. Chapar: certified causally consistent distributed key-value stores. In Rastislav Bodík and Rupak Majumdar, editors, POPL 2016, pages 357-370. ACM, 2016. URL: https://doi.org/10.1145/2837614.2837622.
  23. J. Gregory Malecha, Greg Morrisett, Avraham Shinnar, and Ryan Wisnesky. Toward a verified relational database management system. In Manuel V. Hermenegildo and Jens Palsberg, editors, POPL 2010, pages 237-248. ACM, 2010. URL: https://doi.org/10.1145/1706299.1706329.
  24. Frank McSherry, Andrea Lattuada, Malte Schwarzkopf, and Timothy Roscoe. Shared arrangements: practical inter-query sharing for streaming dataflows. Proc. VLDB Endow., 13(10):1793-1806, 2020. URL: http://www.vldb.org/pvldb/vol13/p1793-mcsherry.pdf.
  25. Frank McSherry, Derek Gordon Murray, Rebecca Isaacs, and Michael Isard. Differential dataflow. In CIDR 2013. www.cidrdb.org, 2013. URL: http://cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf.
  26. Derek Gordon Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: a timely dataflow system. In Michael Kaminsky and Mike Dahlin, editors, SOSP 2013, pages 439-455. ACM, 2013. URL: https://doi.org/10.1145/2517349.2522738.
  27. Derek Gordon Murray, Frank McSherry, Michael Isard, Rebecca Isaacs, Paul Barham, and Martín Abadi. Incremental, iterative data processing with timely dataflow. Commun. ACM, 59(10):75-83, 2016. URL: https://doi.org/10.1145/2983551.
  28. Diego Ongaro and John K. Ousterhout. In search of an understandable consensus algorithm. In Garth Gibson and Nickolai Zeldovich, editors, USENIX ATC 2014, pages 305-319. USENIX Association, 2014. URL: https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro.
  29. Lawrence C. Paulson and Jasmin Christian Blanchette. Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In Geoff Sutcliffe, Stephan Schulz, and Eugenia Ternovska, editors, IWIL 2010, volume 2 of EPiC Series in Computing, pages 1-11. EasyChair, 2010. URL: https://easychair.org/publications/paper/wV.
  30. Ilya Sergey, James R. Wilcox, and Zachary Tatlock. Programming and proving with distributed protocols. Proc. ACM Program. Lang., 2(POPL):28:1-28:30, 2018. URL: https://doi.org/10.1145/3158116.
  31. Christoph Sprenger, Tobias Klenze, Marco Eilers, Felix A. Wolf, Peter Müller, Martin Clochard, and David A. Basin. Igloo: soundly linking compositional refinement and separation logic for distributed system verification. Proc. ACM Program. Lang., 4(OOPSLA):152:1-152:31, 2020. URL: https://doi.org/10.1145/3428220.
  32. Makarius Wenzel. Isabelle/Isar - A generic framework for human-readable proof documents. In Roman Matuszewski and Anna Zalewska, editors, From Insight to Proof: Festschrift in Honour of Andrzej Trybulec, volume 10(23) of Studies in Logic, Grammar, and Rhetoric. Uniwersytet w Białymstoku, 2007. Google Scholar
  33. James R. Wilcox, Doug Woos, Pavel Panchekha, Zachary Tatlock, Xi Wang, Michael D. Ernst, and Thomas E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In David Grove and Steve Blackburn, editors, PLDI 2015, pages 357-368. ACM, 2015. URL: https://doi.org/10.1145/2737924.2737958.