Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq

Authors Jason Gross , Théo Zimmermann , Miraya Poddar-Agrawal , Adam Chlipala

Thumbnail PDF


  • Filesize: 0.65 MB
  • 18 pages

Document Identifiers

Author Details

Jason Gross
  • CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
  • Machine Intelligence Research Institute, Berkeley, CA, USA
Théo Zimmermann
  • Inria, Université Paris Cité, CNRS, IRIF, F-75013, Paris, France
Miraya Poddar-Agrawal
  • Reed College, Portland, OR, USA
Adam Chlipala
  • CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA

Cite AsGet BibTex

Jason Gross, Théo Zimmermann, Miraya Poddar-Agrawal, and Adam Chlipala. Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq. In 13th International Conference on Interactive Theorem Proving (ITP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 237, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


As the adoption of proof assistants increases, there is a need for efficiency in identifying, documenting, and fixing compatibility issues that arise from proof-assistant evolution. We present the Coq Bug Minimizer, a tool for reproducing buggy behavior with minimal and standalone files, integrated with coqbot to trigger automatically on failures from Coq’s reverse dependency compatibility testing. Our tool eliminates the overhead of having to download, set up, compile, and then explore and understand large developments, enabling Coq developers to easily obtain modular test-case files for fast experimentation. In this paper, we describe insights about how test-case reduction is different in Coq than in traditional compilers. We expect that our insights will generalize to other proof assistants. We evaluate the Coq Bug Minimizer on over 150 compatibility testing failures. Our tool succeeds in reducing failures to smaller test cases roughly 75% of the time. The minimizer produces a fully standalone test case 89% of the time, and it is on average about one-third the size of the original test. The average reduced test case compiles in 1.25 seconds, with 75% taking under half a second.

Subject Classification

ACM Subject Classification
  • Software and its engineering → Software evolution
  • Software and its engineering → Maintaining software
  • Software and its engineering → Compilers
  • Software and its engineering → Formal software verification
  • debugging
  • automatic test-case reduction
  • Coq
  • bug minimizer


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Jasmin Christian Blanchette and Tobias Nipkow. Nitpick: A counterexample generator for higher-order logic based on a relational model finder. In Matt Kaufmann and Lawrence C. Paulson, editors, Interactive Theorem Proving, pages 131-146, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg. URL:
  2. Martin Burger, Karsten Lehmann, and Andreas Zeller. Automated debugging in Eclipse. In Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 184-185, New York, NY, USA, 2005. Association for Computing Machinery. URL:
  3. Junjie Chen, Jibesh Patra, Michael Pradel, Yingfei Xiong, Hongyu Zhang, Dan Hao, and Lu Zhang. A survey of compiler testing. ACM Comput. Surv., 53(1), February 2020. URL:
  4. Yang Chen, Alex Groce, Chaoqiang Zhang, Weng-Keen Wong, Xiaoli Fern, Eric Eide, and John Regehr. Taming compiler fuzzers. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 197-208, New York, NY, USA, 2013. Association for Computing Machinery. URL:
  5. Holger Cleve and Andreas Zeller. Finding failure causes through automated testing. In Mireille Ducassé, editor, Proceedings of the Fourth International Workshop on Automated Debugging, AADEBUG 2000, Munich, Germany, August 28-30th, 2000, 2000. URL:
  6. Jason Gross. Coq bug minimizer, January 2015. Presented at URL:
  7. Satia Herfert, Jibesh Patra, and Michael Pradel. Automatically reducing tree-structured test inputs. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pages 861-871, Urbana-Champaign, IL, USA, 2017. IEEE Press. URL:
  8. Josie Holmes, Alex Groce, and Mohammad Amin Alipour. Mitigating (and exploiting) test reduction slippage. In Proceedings of the 7th International Workshop on Automating Test Case Design, Selection, and Evaluation, A-TEST 2016, pages 66-69, New York, NY, USA, 2016. Association for Computing Machinery. URL:
  9. Vu Le, Mehrdad Afshari, and Zhendong Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 216-226, New York, NY, USA, 2014. Association for Computing Machinery. URL:
  10. Lina Ochoa, Thomas Degueule, and Jean-Rémy Falleri. BreakBot: Analyzing the impact of breaking changes to assist library evolution. In 44th IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Results, ICSE (NIER) 2022. IEEE, 2022. Google Scholar
  11. Zoe Paraskevopoulou, Cătălin Hriţcu, Maxime Dénès, Leonidas Lampropoulos, and Benjamin C. Pierce. Foundational property-based testing. In ITP 2015 - 6th conference on Interactive Theorem Proving, volume 9236 of Lecture Notes in Computer Science, Nanjing, China, August 2015. Springer. URL:
  12. John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. Test-case reduction for C compiler bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pages 335-346, New York, NY, USA, 2012. Association for Computing Machinery. URL:
  13. Matthieu Sozeau, Simon Boulier, Yannick Forster, Nicolas Tabareau, and Théo Winterhalter. Coq Coq Correct! verification of type checking and erasure for Coq, in Coq. Proc. ACM Program. Lang., 4(POPL), December 2019. URL:
  14. Daniel S. Wilkerson and Scott McPeak. delta - delta assists you in minimizing "interesting" files subject to a test of their interestingness, February 2006. Presented at URL:
  15. Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 283-294, New York, NY, USA, 2011. Association for Computing Machinery. URL:
  16. Andreas Zeller. Isolating cause-effect chains from computer programs. In Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering, SIGSOFT '02/FSE-10, pages 1-10, New York, NY, USA, 2002. Association for Computing Machinery. URL:
  17. Andreas Zeller. Why Programs Fail: A Guide to Systematic Debugging. Elsevier, 2009. Google Scholar
  18. Théo Zimmermann. Challenges in the collaborative evolution of a proof language and its ecosystem. PhD thesis, Université de Paris, 2019. URL:
  19. Théo Zimmermann, Julien Coolen, Jason Gross, Pierre-Marie Pédrot, and Gaëtan Gilbert. Advantages of maintaining a multi-task project-specific bot: an experience report. working paper, 2022. URL: