Seventeen Provers Under the Hammer

Authors Martin Desharnais , Petar Vukmirović , Jasmin Blanchette , Makarius Wenzel



PDF
Thumbnail PDF

File

LIPIcs.ITP.2022.8.pdf
  • Filesize: 0.75 MB
  • 18 pages

Document Identifiers

Author Details

Martin Desharnais
  • Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
  • Max-Planck-Institut für Informatik, Saarland Informatics Campus, Saarbrücken, Germany
Petar Vukmirović
  • Vrije Universiteit Amsterdam, The Netherlands
Jasmin Blanchette
  • Vrije Universiteit Amsterdam, The Netherlands
  • Max-Planck-Institut für Informatik, Saarland Informatics Campus, Saarbrücken, Germany
Makarius Wenzel
  • Augsburg, Germany

Acknowledgements

We are grateful to the maintainers of StarExec for letting us use their service. Developers of automatic theorem provers, including Peter Baumgartner, Ahmed Bhayat, Pascal Fontaine, Konstantin Korovin, Giles Reger, Andrew Reynolds, Philipp Rümmer, Alexander Steen, and Martin Suda have helped us run their systems. Michael Färber, Thibault Gauthier, Konstantin Korovin, Lorenz Leutgeb, Jens Otten, Philipp Rümmer, Stephan Schulz, Hans-Jörg Schurr, Alexander Steen, Martin Suda, Mark Summerfield, Dmitriy Traytel, Josef Urban, and the anonymous reviewers suggested some textual improvements.

Cite AsGet BibTex

Martin Desharnais, Petar Vukmirović, Jasmin Blanchette, and Makarius Wenzel. Seventeen Provers Under the Hammer. In 13th International Conference on Interactive Theorem Proving (ITP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 237, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ITP.2022.8

Abstract

One of the main success stories of automatic theorem provers has been their integration into proof assistants. Such integrations, or "hammers," increase proof automation and hence user productivity. In this paper, we use Isabelle/HOL’s Sledgehammer tool to find out how useful modern provers are at proving formulas in higher-order logic. Our evaluation follows in the steps of Böhme and Nipkow’s Judgment Day study from 2010, but instead of three provers we use 17, including SMT solvers and higher-order provers. Our work offers an alternative yardstick for comparing modern provers, next to the benchmarks and competitions emerging from the TPTP World and SMT-LIB.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Theorem proving algorithms
Keywords
  • Automatic theorem proving
  • interactive theorem proving
  • proof assistants

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Haniel Barbosa, Andrew Reynolds, Daniel El Ouraoui, Cesare Tinelli, and Clark W. Barrett. Extending SMT solvers to higher-order logic. In Pascal Fontaine, editor, CADE-27, volume 11716 of LNCS, pages 35-54. Springer, 2019. Google Scholar
  2. Clark Barrett, Pascal Fontaine, and Cesare Tinelli. The SMT-LIB standard: Version 2.6. Technical report, Department of Computer Science, The University of Iowa, 2017. URL: https://www.SMT-LIB.org/.
  3. Clark W. Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovic, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In Ganesh Gopalakrishnan and Shaz Qadeer, editors, CAV 2011, volume 6806 of LNCS, pages 171-177. Springer, 2011. Google Scholar
  4. Clark W. Barrett, Leonardo de Moura, and Aaron Stump. SMT-COMP: satisfiability modulo theories competition. In Kousha Etessami and Sriram K. Rajamani, editors, CAV 2005, volume 3576 of LNCS, pages 20-23. Springer, 2005. Google Scholar
  5. Peter Baumgartner, Joshua Bax, and Uwe Waldmann. Beagle - A hierarchic superposition theorem prover. In Amy P. Felty and Aart Middeldorp, editors, CADE-25, volume 9195 of LNCS, pages 367-377. Springer, 2015. Google Scholar
  6. Alexander Bentkamp, Jasmin Blanchette, Simon Cruanes, and Uwe Waldmann. Superposition for lambda-free higher-order logic. Log. Meth. Comput. Sci., 17(2):1:1-1:38, 2021. Google Scholar
  7. Alexander Bentkamp, Jasmin Blanchette, Sophie Tourret, and Petar Vukmirović. Superposition for full higher-order logic. In André Platzer and Geoff Sutcliffe, editors, CADE-28, volume 12699 of LNCS, pages 396-412. Springer, 2021. Google Scholar
  8. Alexander Bentkamp, Jasmin Blanchette, Sophie Tourret, Petar Vukmirovic, and Uwe Waldmann. Superposition with lambdas. J. Autom. Reason., 65(7):893-940, 2021. Google Scholar
  9. Ahmed Bhayat and Giles Reger. A combinator-based superposition calculus for higher-order logic. In Nicolas Peltier and Viorica Sofronie-Stokkermans, editors, IJCAR 2020, Part I, volume 12166 of LNCS, pages 278-296. Springer, 2020. Google Scholar
  10. Jasmin Blanchette. Hammering away: A user’s guide to Sledgehammer for Isabelle/HOL, 2021. URL: https://isabelle.in.tum.de/website-Isabelle2021-1/dist/Isabelle2021-1/doc/sledgehammer.pdf.
  11. Jasmin Christian Blanchette, Sascha Böhme, and Lawrence C. Paulson. Extending Sledgehammer with SMT solvers. J. Autom. Reason., 51(1):109-128, 2013. Google Scholar
  12. Jasmin Christian Blanchette, Sascha Böhme, Andrei Popescu, and Nicholas Smallbone. Encoding monomorphic and polymorphic types. Log. Meth. Comput. Sci., 12(4), 2016. Google Scholar
  13. Jasmin Christian Blanchette, David Greenaway, Cezary Kaliszyk, Daniel Kühlwein, and Josef Urban. A learning-based fact selector for Isabelle/HOL. J. Autom. Reason., 57(3):219-244, 2016. Google Scholar
  14. Jasmin Christian Blanchette, Maximilian Haslbeck, Daniel Matichuk, and Tobias Nipkow. Mining the archive of formal proofs. In Manfred Kerber, editor, CICM 2015, volume 9150 of LNCS, pages 1-15. Springer, 2015. Google Scholar
  15. Jasmin Christian Blanchette, Cezary Kaliszyk, Lawrence C. Paulson, and Josef Urban. Hammering towards QED. J. Formaliz. Reason., 9(1):101-148, 2016. Google Scholar
  16. Jasmin Christian Blanchette, Nicolas Peltier, and Simon Robillard. Superposition with datatypes and codatatypes. In Didier Galmiche, Stephan Schulz, and Roberto Sebastiani, editors, IJCAR 2018, volume 10900 of LNCS, pages 370-387. Springer, 2018. Google Scholar
  17. Jasmin Christian Blanchette, Andrei Popescu, Daniel Wand, and Christoph Weidenbach. More SPASS with Isabelle: Superposition with hard sorts and configurable simplification. In Lennart Beringer and Amy Felty, editors, ITP 2012, volume 7406 of LNCS, pages 345-360. Springer, 2012. Google Scholar
  18. François Bobot and Andrei Paskevich. Expressing polymorphic types in a many-sorted language. In Cesare Tinelli and Viorica Sofronie-Stokkermans, editors, FroCoS 2011, volume 6989 of LNCS, pages 87-102. Springer, 2011. Google Scholar
  19. Sascha Böhme. Proving Theorems of Higher-Order Logic with SMT Solvers. PhD thesis, Technische Universität München, 2012. Google Scholar
  20. Sascha Böhme and Tobias Nipkow. Sledgehammer: Judgement Day. In Jürgen Giesl and Reiner Hähnle, editors, IJCAR 2010, volume 6173 of LNCS, pages 107-121. Springer, 2010. Google Scholar
  21. Sascha Böhme and Tjark Weber. Fast LCF-style proof reconstruction for Z3. In Matt Kaufmann and Lawrence C. Paulson, editors, ITP 2010, volume 6172 of LNCS, pages 179-194. Springer, 2010. Google Scholar
  22. Richard Bonichon, David Delahaye, and Damien Doligez. Zenon: An extensible automated theorem prover producing checkable proofs. In Nachum Dershowitz and Andrei Voronkov, editors, LPAR 2007, volume 4790 of LNCS, pages 151-165. Springer, 2007. Google Scholar
  23. Thomas Bouton, Diego Caminha Barbosa De Oliveira, David Déharbe, and Pascal Fontaine. veriT: An open, trustable and efficient SMT-solver. In Renate A. Schmidt, editor, CADE-22, volume 5663 of LNCS, pages 151-156. Springer, 2009. Google Scholar
  24. Chad E. Brown. Satallax: An automatic higher-order prover. In Bernhard Gramlich, Dale Miller, and Uli Sattler, editors, IJCAR 2012, volume 7364 of LNCS, pages 111-117. Springer, 2012. Google Scholar
  25. Chad E. Brown, Thibault Gauthier, Cezary Kaliszyk, Geoff Sutcliffe, and Josef Urban. GRUNGE: A grand unified ATP challenge. In Pascal Fontaine, editor, CADE-27, volume 11716 of LNCS, pages 123-141. Springer, 2019. Google Scholar
  26. Lukas Bulwahn. The new Quickcheck for Isabelle - Random, exhaustive and symbolic testing under one roof. In Chris Hawblitzel and Dale Miller, editors, CPP 2012, volume 7679 of LNCS, pages 92-108. Springer, 2012. Google Scholar
  27. Łukasz Czajka and Cezary Kaliszyk. Hammer for Coq: Automation for dependent type theory, 2018. Google Scholar
  28. Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In C. R. Ramakrishnan and Jakob Rehof, editors, TACAS 2008, volume 4963 of LNCS, pages 337-340. Springer, 2008. Google Scholar
  29. André Duarte and Konstantin Korovin. Implementing superposition in iProver (system description). In Nicolas Peltier and Viorica Sofronie-Stokkermans, editors, IJCAR 2020, Part II, volume 12167 of LNCS, pages 388-397. Springer, 2020. Google Scholar
  30. Jean-Christophe Filliâtre. One logic to use them all. In Maria Paola Bonacina, editor, CADE-24, volume 7898 of LNCS, pages 1-20. Springer, 2013. Google Scholar
  31. Krystof Hoder and Andrei Voronkov. Sine qua non for large theory reasoning. In Nikolaj Bjørner and Viorica Sofronie-Stokkermans, editors, CADE-23, volume 6803 of LNCS, pages 299-314. Springer, 2011. Google Scholar
  32. Edvard K. Holden and Konstantin Korovin. Heterogeneous heuristic optimisation and scheduling for first-order theorem proving. In Fairouz Kamareddine and Claudio Sacerdoti Coen, editors, CICM 2021, volume 12833 of LNCS, pages 107-123. Springer, 2021. Google Scholar
  33. R. J. M. Hughes. Super-combinators: A new implementation method for applicative languages. In LFP 1982, pages 1-10. ACM Press, 1982. Google Scholar
  34. Jan Jakubův and Josef Urban. ENIGMA: efficient learning-based inference guiding machine. In Herman Geuvers, Matthew England, Osman Hasan, Florian Rabe, and Olaf Teschke, editors, CICM 2017, volume 10383 of LNCS, pages 292-302. Springer, 2017. Google Scholar
  35. Cezary Kaliszyk and Josef Urban. HOL(y)Hammer: Online ATP service for HOL Light. Math. Comput. Sci., 9(1):5-22, 2015. Google Scholar
  36. Cezary Kaliszyk and Josef Urban. MizAR 40 for Mizar 40. J. Autom. Reason., 55(3):245-256, 2015. Google Scholar
  37. Konstantin Korovin. iProver - An instantiation-based theorem prover for first-order logic (system description). In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors, IJCAR 2008, volume 5195 of LNCS, pages 292-298. Springer, 2008. Google Scholar
  38. Laura Kovács and Andrei Voronkov. First-order theorem proving and Vampire. In Natasha Sharygina and Helmut Veith, editors, CAV 2013, volume 8044 of LNCS, pages 1-35. Springer, 2013. Google Scholar
  39. Fredrik Lindblad. A focused sequent calculus for higher-order logic. In Stéphane Demri, Deepak Kapur, and Christoph Weidenbach, editors, IJCAR 2014, volume 8562 of LNCS, pages 61-75. Springer, 2014. Google Scholar
  40. Fredrik Lindblad and Marcin Benke. A tool for automated theorem proving in Agda. In Jean-Christophe Filliâtre, Christine Paulin-Mohring, and Benjamin Werner, editors, TYPES 2004, volume 3839 of LNCS, pages 154-169. Springer, 2004. Google Scholar
  41. Jia Meng and Lawrence C. Paulson. Translating higher-order clauses to first-order clauses. J. Autom. Reason., 40(1):35-60, 2008. Google Scholar
  42. Jia Meng and Lawrence C. Paulson. Lightweight relevance filtering for machine-generated resolution problems. J. Applied Logic, 7(1):41-57, 2009. Google Scholar
  43. Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002. Google Scholar
  44. Jens Otten. leanCoP 2.0 and ileanCoP 1.2: High performance lean theorem proving in classical and intuitionistic logic (system descriptions). In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors, IJCAR 2008, volume 5195 of LNCS, pages 283-291. Springer, 2008. Google Scholar
  45. Lawrence C. Paulson and Jasmin Christian Blanchette. Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In Geoff Sutcliffe, Stephan Schulz, and Eugenia Ternovska, editors, IWIL-2010, volume 2 of EPiC, pages 1-11. EasyChair, 2012. Google Scholar
  46. Lawrence C. Paulson and Kong Woei Susanto. Source-level proof reconstruction for interactive theorem proving. In Klaus Schneider and Jens Brandt, editors, TPHOLs 2007, volume 4732 of LNCS, pages 232-245. Springer, 2007. Google Scholar
  47. Andrew Reynolds and Jasmin Christian Blanchette. A decision procedure for (co)datatypes in SMT solvers. J. Autom. Reason., 58(3):341-362, 2017. Google Scholar
  48. Philipp Rümmer. A constraint sequent calculus for first-order logic with linear integer arithmetic. In Iliano Cervesato, Helmut Veith, and Andrei Voronkov, editors, LPAR 2008, volume 5330 of LNCS, pages 274-289. Springer, 2008. Google Scholar
  49. Stephan Schulz, Simon Cruanes, and Petar Vukmirovic. Faster, higher, stronger: E 2.3. In Pascal Fontaine, editor, CADE-27, volume 11716 of LNCS, pages 495-507. Springer, 2019. Google Scholar
  50. Hans-Jörg Schurr, Mathias Fleury, and Martin Desharnais. Reliable reconstruction of fine-grained proofs in a proof assistant. In André Platzer and Geoff Sutcliffe, editors, CADE-28, volume 12699 of LNCS, pages 450-467. Springer, 2021. Google Scholar
  51. Alexander Steen and Christoph Benzmüller. Extensional higher-order paramodulation in Leo-III. J. Autom. Reason., 65(6):775-807, 2021. Google Scholar
  52. Aaron Stump, Geoff Sutcliffe, and Cesare Tinelli. StarExec: A cross-community infrastructure for logic solving. In Stéphane Demri, Deepak Kapur, and Christoph Weidenbach, editors, IJCAR 2014, volume 8562 of LNCS, pages 367-373. Springer, 2014. Google Scholar
  53. Nik Sultana, Jasmin Christian Blanchette, and Lawrence C. Paulson. LEO-II and Satallax on the Sledgehammer test bench. J. Applied Logic, 11(1):91-102, 2013. Google Scholar
  54. Geoff Sutcliffe. The TPTP problem library and associated infrastructure - From CNF to TH0, TPTP v6.4.0. J. Autom. Reason., 59(4):483-502, 2017. Google Scholar
  55. Geoff Sutcliffe. The 10th IJCAR automated theorem proving system competition - CASC-J10. AI Commun., 34(2):163-177, 2021. Google Scholar
  56. Geoff Sutcliffe and Martin Desharnais. The CADE-28 automated theorem proving system competition - CASC-28. AI Communications, 34(4):259-276, 2022. Google Scholar
  57. David A. Turner. A new implementation technique for applicative languages. Softw. Pract. Exper., 9:31-49, 1979. Google Scholar
  58. Josef Urban, Piotr Rudnicki, and Geoff Sutcliffe. ATP and presentation service for Mizar formalizations. J. Autom. Reason., 50(2):229-241, 2013. Google Scholar
  59. Petar Vukmirović, Alexander Bentkamp, Jasmin Blanchette, Simon Cruanes, Visa Nummelin, and Sophie Tourret. Making higher-order superposition work. In André Platzer and Geoff Sutcliffe, editors, CADE-28, volume 12699 of LNCS, pages 415-432. Springer, 2021. Google Scholar
  60. Petar Vukmirović, Jasmin Christian Blanchette, Simon Cruanes, and Stephan Schulz. Extending a brainiac prover to lambda-free higher-order logic. In Tomas Vojnar and Lijun Zhang, editors, TACAS 2019, volume 11427 of LNCS, pages 192-210. Springer, 2019. Google Scholar
  61. Christoph Weidenbach, Dilyana Dimova, Arnaud Fietzke, Rohit Kumar, Martin Suda, and Patrick Wischnewski. SPASS version 3.5. In Renate A. Schmidt, editor, CADE-22, volume 5663 of LNCS, pages 140-145. Springer, 2009. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail