Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq

Gross, Jason; Zimmermann, Théo; Poddar-Agrawal, Miraya; Chlipala, Adam

doi:10.4230/LIPIcs.ITP.2022.18

Abstract

As the adoption of proof assistants increases, there is a need for efficiency in identifying, documenting, and fixing compatibility issues that arise from proof-assistant evolution. We present the Coq Bug Minimizer, a tool for reproducing buggy behavior with minimal and standalone files, integrated with coqbot to trigger automatically on failures from Coq’s reverse dependency compatibility testing. Our tool eliminates the overhead of having to download, set up, compile, and then explore and understand large developments, enabling Coq developers to easily obtain modular test-case files for fast experimentation. In this paper, we describe insights about how test-case reduction is different in Coq than in traditional compilers. We expect that our insights will generalize to other proof assistants. We evaluate the Coq Bug Minimizer on over 150 compatibility testing failures. Our tool succeeds in reducing failures to smaller test cases roughly 75% of the time. The minimizer produces a fully standalone test case 89% of the time, and it is on average about one-third the size of the original test. The average reduced test case compiles in 1.25 seconds, with 75% taking under half a second.

Cite As Get BibTex

Jason Gross, Théo Zimmermann, Miraya Poddar-Agrawal, and Adam Chlipala. Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq. In 13th International Conference on Interactive Theorem Proving (ITP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 237, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ITP.2022.18

Author Details

Jason Gross

CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
Machine Intelligence Research Institute, Berkeley, CA, USA

Théo Zimmermann

Inria, Université Paris Cité, CNRS, IRIF, F-75013, Paris, France

Miraya Poddar-Agrawal

Reed College, Portland, OR, USA

Adam Chlipala

CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA

Funding

This work was supported in part by a Google Research Award, National Science Foundation grants CCF-1253229, CCF-1512611, and CCF-1521584, and the National Science Foundation Graduate Research Fellowship under Grant Nos. 1122374 and 1745302. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Supplementary Materials

Dataset (Data and Data Analysis Script) https://doi.org/10.6084/m9.figshare.19141952.v1
Software (Source Code) https://github.com/JasonGross/coq-tools browse archived version

References

Jasmin Christian Blanchette and Tobias Nipkow. Nitpick: A counterexample generator for higher-order logic based on a relational model finder. In Matt Kaufmann and Lawrence C. Paulson, editors, Interactive Theorem Proving, pages 131-146, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg. URL: https://doi.org/10.1007/978-3-642-14052-5_11.
Martin Burger, Karsten Lehmann, and Andreas Zeller. Automated debugging in Eclipse. In Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 184-185, New York, NY, USA, 2005. Association for Computing Machinery. URL: https://doi.org/10.1145/1094855.1094926.
Junjie Chen, Jibesh Patra, Michael Pradel, Yingfei Xiong, Hongyu Zhang, Dan Hao, and Lu Zhang. A survey of compiler testing. ACM Comput. Surv., 53(1), February 2020. URL: https://doi.org/10.1145/3363562.
Yang Chen, Alex Groce, Chaoqiang Zhang, Weng-Keen Wong, Xiaoli Fern, Eric Eide, and John Regehr. Taming compiler fuzzers. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 197-208, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2491956.2462173.
Holger Cleve and Andreas Zeller. Finding failure causes through automated testing. In Mireille Ducassé, editor, Proceedings of the Fourth International Workshop on Automated Debugging, AADEBUG 2000, Munich, Germany, August 28-30th, 2000, 2000. URL: http://arxiv.org/abs/cs/0012009.
Jason Gross. Coq bug minimizer, January 2015. Presented at https://coqpl.cs.washington.edu/2014/07/31/. URL: https://jasongross.github.io/papers/2015-coq-bug-minimizer.pdf.
Satia Herfert, Jibesh Patra, and Michael Pradel. Automatically reducing tree-structured test inputs. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pages 861-871, Urbana-Champaign, IL, USA, 2017. IEEE Press. URL: https://doi.org/10.1109/ase.2017.8115697.
Josie Holmes, Alex Groce, and Mohammad Amin Alipour. Mitigating (and exploiting) test reduction slippage. In Proceedings of the 7th International Workshop on Automating Test Case Design, Selection, and Evaluation, A-TEST 2016, pages 66-69, New York, NY, USA, 2016. Association for Computing Machinery. URL: https://doi.org/10.1145/2994291.2994301.
Vu Le, Mehrdad Afshari, and Zhendong Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 216-226, New York, NY, USA, 2014. Association for Computing Machinery. URL: https://doi.org/10.1145/2594291.2594334.
Lina Ochoa, Thomas Degueule, and Jean-Rémy Falleri. BreakBot: Analyzing the impact of breaking changes to assist library evolution. In 44th IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Results, ICSE (NIER) 2022. IEEE, 2022.
Zoe Paraskevopoulou, Cătălin Hriţcu, Maxime Dénès, Leonidas Lampropoulos, and Benjamin C. Pierce. Foundational property-based testing. In ITP 2015 - 6th conference on Interactive Theorem Proving, volume 9236 of Lecture Notes in Computer Science, Nanjing, China, August 2015. Springer. URL: https://doi.org/10.1007/978-3-319-22102-1_22.
John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. Test-case reduction for C compiler bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pages 335-346, New York, NY, USA, 2012. Association for Computing Machinery. URL: https://doi.org/10.1145/2254064.2254104.
Matthieu Sozeau, Simon Boulier, Yannick Forster, Nicolas Tabareau, and Théo Winterhalter. Coq Coq Correct! verification of type checking and erasure for Coq, in Coq. Proc. ACM Program. Lang., 4(POPL), December 2019. URL: https://doi.org/10.1145/3371076.
Daniel S. Wilkerson and Scott McPeak. delta - delta assists you in minimizing "interesting" files subject to a test of their interestingness, February 2006. Presented at https://web.archive.org/web/20071224085116/http://www.codecon.org/2006/program.html. URL: https://github.com/dsw/delta.
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 283-294, New York, NY, USA, 2011. Association for Computing Machinery. URL: https://doi.org/10.1145/1993498.1993532.
Andreas Zeller. Isolating cause-effect chains from computer programs. In Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering, SIGSOFT '02/FSE-10, pages 1-10, New York, NY, USA, 2002. Association for Computing Machinery. URL: https://doi.org/10.1145/587051.587053.
Andreas Zeller. Why Programs Fail: A Guide to Systematic Debugging. Elsevier, 2009.
Théo Zimmermann. Challenges in the collaborative evolution of a proof language and its ecosystem. PhD thesis, Université de Paris, 2019. URL: https://hal.inria.fr/tel-02451322.
Théo Zimmermann, Julien Coolen, Jason Gross, Pierre-Marie Pédrot, and Gaëtan Gilbert. Advantages of maintaining a multi-task project-specific bot: an experience report. working paper, 2022. URL: https://hal.inria.fr/hal-03479327.

Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq

Authors Jason Gross , Théo Zimmermann , Miraya Poddar-Agrawal , Adam Chlipala

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq

Authors Jason Gross , Théo Zimmermann , Miraya Poddar-Agrawal , Adam Chlipala

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

Supplementary Materials

References

Thanks for your feedback!

Could not send message