Distributed Parallel Build for the Isabelle Archive of Formal Proofs

Authors Fabian Huch , Makarius Wenzel



PDF
Thumbnail PDF

File

LIPIcs.ITP.2024.22.pdf
  • Filesize: 0.85 MB
  • 19 pages

Document Identifiers

Author Details

Fabian Huch
  • Technische Universität München, Garching, Germany
Makarius Wenzel
  • Augsburg, Germany

Acknowledgements

We are immensely grateful to Rainer Kolisch, who shaped the research on project scheduling in operations research with his seminal PSPLIB library, and to Maximilian Kolter, for contributing their invaluable expertise to this project.

Cite AsGet BibTex

Fabian Huch and Makarius Wenzel. Distributed Parallel Build for the Isabelle Archive of Formal Proofs. In 15th International Conference on Interactive Theorem Proving (ITP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 309, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITP.2024.22

Abstract

Motivated by the continuously growing performance demands for the Isabelle Archive of Formal Proofs (AFP), we introduce distributed cluster computing to the Isabelle platform. Parallel build time on a single node has approached 4h-8h in recent years: by supporting multiple nodes, without shared memory nor shared file-systems, we target at a substantial speedup factor to get below the critical limit of 45min total elapsed time. Our distributed build tool is part of the regular Isabelle distribution, but specifically adapted to cope with the structure of projects seen in the AFP. In this work, we address two main challenges: (1) the distributed system architecture that has been implemented in Isabelle/Scala, and (2) the build schedule optimization problem for multi-threaded tasks on multiple compute nodes. We introduce a heuristic tuned to the typical AFP session structure, which can generate good schedules in a few seconds. We reached a total speedup factor of over 100, which is a milestone never before reached. Using this approach, we could build the Isabelle distribution in 8min 10s elapsed time, and the AFP in 35min 40s, or 1h 59min 13s including very slow sessions.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Distributed architectures
  • Software and its engineering → Distributed systems organizing principles
  • Mathematics of computing → Discrete optimization
Keywords
  • Interactive theorem proving
  • Isabelle
  • Archive of Formal Proofs
  • Theorem prover technology
  • Distributed computing
  • Schedule optimization

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Archive of Formal Proofs (AFP). ISSN 2150-914x. URL: https://www.isa-afp.org/about.
  2. Francisco Ballestín. When it is worthwhile to work with the stochastic RCPSP?, 2007. URL: https://doi.org/10.1007/s10951-007-0012-1.
  3. Bruno Barras, Carst Tankink, and Enrico Tassi. Asynchronous processing of Coq documents: From the kernel up to the user interface. In Interactive Theorem Proving (ITP 2015), volume 9236 of LNCS. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-22102-1_4.
  4. J. Blazewicz, J. K. Lenstra, A. Kan, and H. G. Rinnooy. Scheduling subject to resource constraints: classification and complexity. Discrete Applied Mathematics, 5, 1983. URL: https://doi.org/10.1016/0166-218X(83)90012-4.
  5. Miquel Bofill, Jordi Coll, Josep Suy, and Mateu Villaret. Solving the Multi-Mode Resource-Constrained Project Scheduling Problem with SMT. In International Conference on Tools with Artificial Intelligence (ICTAI 2016). IEEE, 2016. URL: https://doi.org/10.1109/ICTAI.2016.0045.
  6. Ahmet Celik, Karl Palmskog, and Milos Gligoric. ICoq: Regression proof selection for large-scale verification projects. In Automated Software Engineering (ASE 2017). IEEE, 2017. URL: https://doi.org/10.1109/ASE.2017.8115630.
  7. Edward Wilson Davis. An exact algorithm for the multiple constrained-resource project scheduling problem. Yale University, 1968. Google Scholar
  8. Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The Lean theorem prover (system description). In Conference on Automated Deduction (CADE 2015), volume 9195 of LNCS. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-21401-6_26.
  9. Evgeny R. Gafarov, Alexander A. Lazarev, and Frank Werner. Approximability results for the resource-constrained project scheduling problem with a single type of resources. Annals of Operations Research, 213, 2014. URL: https://doi.org/10.1007/s10479-012-1106-5.
  10. M. R. Garey and D. S. Johnson. Complexity Results for Multiprocessor Scheduling under Resource Constraints. SIAM Journal on Computing, 4, 1975. URL: https://doi.org/10.1137/0204035.
  11. Martin Josef Geiger. A multi-threaded local search algorithm and computer implementation for the multi-mode, resource-constrained multi-project scheduling problem. European Journal of Operational Research, 256, 2017. URL: https://doi.org/10.1016/j.ejor.2016.07.024.
  12. Daniel Geiss. Optimal Build Strategies for Isabelle. Technical report, Technische Universität München, 2023. Google Scholar
  13. Sönke Hartmann. Project Scheduling with Multiple Modes: A Genetic Algorithm. Annals of Operations Research, 102, 2001. URL: https://doi.org/10.1023/A:1010902015091.
  14. Mark D. Hill and Michael R. Marty. Amdahl’s Law in the Multicore Era. Computer, 41, 2008. URL: https://doi.org/10.1109/MC.2008.209.
  15. Fabian Huch. Formal Entity Graphs as Complex Networks: Assessing Centrality Metrics of the Archive of Formal Proofs. In Conference on Intelligent Computer Mathematics (CICM 2022). Springer, 2022. URL: https://doi.org/10.1007/978-3-031-16681-5_10.
  16. Fabian Huch and Vincent Bode. The Isabelle Community Benchmark. In Practical Aspects of Automated Reasoning (PAAR 2022), volume Vol-3201. CEUR-WS, 2022. URL: https://doi.org/10.48550/arXiv.2209.13894.
  17. Morris A. Jette and Tim Wickberg. Architecture of the Slurm Workload Manager. In Job Scheduling Strategies for Parallel Processing (JSSPP 2023), volume 14283 of LNCS. Springer, 2023. URL: https://doi.org/10.1007/978-3-031-43943-8_1.
  18. Brian Daniel Keller. Models and methods for multiple resource constrained job scheduling under uncertainty. PhD thesis, University of Arizona, 2009. Google Scholar
  19. David Matthews and Makarius Wenzel. Efficient parallel programming in Poly/ML and Isabelle/ML. In Declarative Aspects of Multicore Programming (DAMP 2010). ACM, 2010. URL: https://doi.org/10.1145/1708046.1708058.
  20. Mokhtar Riahi Mohamed. Development of a Distributed Build Platform for Isabelle. Technical report, Technische Universität München, 2022. Google Scholar
  21. Tobias Nipkow and Cornelia Pusch. AVL Trees. Archive of Formal Proofs, 2004. , Formal proof development. URL: https://isa-afp.org/entries/AVL-Trees.html.
  22. Karl Palmskog, Ahmet Celik, and Milos Gligoric. PiCoq: Parallel regression proving for large-scale verification projects. In International Symposium on Software Testing and Analysis (ISSTA 2018), volume 12. ACM, 2018. URL: https://doi.org/10.1145/3213846.3213877.
  23. Robert Pellerin, Nathalie Perrier, and François Berthaut. A survey of hybrid metaheuristics for the resource-constrained project scheduling problem, 2020. URL: https://doi.org/10.1016/j.ejor.2019.01.063.
  24. Tom Reichel, R. Wesley Henderson, Andrew Touchet, Andrew Gardner, and Talia Ringer. Proof Repair Infrastructure for Supervised Models: Building a Large Proof Repair Dataset. In Interactive Theorem Proving (ITP 2023), volume 268 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPIcs.ITP.2023.26.
  25. Christoph Schwindt and Jürgen Zimmermann. Handbook on project management and scheduling, volume 1. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-05443-8.
  26. Frederik Stork. Stochastic Resource-Constrained Project Scheduling. PhD thesis, Technische Universität Berlin, 2001. Google Scholar
  27. Jeffery David Ullman. Polynomial complete scheduling problems. In Proceedings of the Fourth ACM Symposium on Operating System Principles, volume 7 of SIGOPS. ACM, 1973. URL: https://doi.org/10.1145/800009.808055.
  28. Josef Urban. Parallelizing Mizar. In Trends in Contemporary Computer Science. Bialystok University of Technology Publishing Office, 2014. URL: https://doi.org/arXiv:1206.0141v2.
  29. Makarius Wenzel. Parallel Proof Checking in Isabelle/Isar. In Programming Languages for Mechanized Mathematics Systems (PLMMS 2009), 2009. URL: https://files.sketis.net/papers/parallel-isabelle.pdf.
  30. Makarius Wenzel. Shared-Memory Multiprocessing for Interactive Theorem Proving. In Interactive Theorem Proving (ITP 2013), LNCS. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-39634-2_30.
  31. Makarius Wenzel. Asynchronous User Interaction and Tool Integration in Isabelle/PIDE. In Interactive Theorem Proving (ITP 2014), LNCS. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-08970-6_33.
  32. Makarius Wenzel. System description: Isabelle/jEdit in 2014. In User Interfaces for Theorem Provers (UITP 2014), volume 167 of EPTCS, 2014. URL: https://doi.org/10.4204/EPTCS.167.10.
  33. Guidong Zhu, Jonathan F. Bard, and Gang Yu. A branch-and-cut procedure for the multimode resource-constrained project-scheduling problem. Journal on Computing, 18, 2006. URL: https://doi.org/10.1287/ijoc.1040.0121.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail