Test-Case Reduction via Test-Case Generation: Insights from the Hypothesis Reducer (Tool Insights Paper)

Authors: David R. MacIver, Alastair F. Donaldson



Author Details

David R. MacIver
  • Imperial College London, United Kingdom
Alastair F. Donaldson
  • Imperial College London, United Kingdom

Cite As

David R. MacIver and Alastair F. Donaldson. Test-Case Reduction via Test-Case Generation: Insights from the Hypothesis Reducer (Tool Insights Paper). In 34th European Conference on Object-Oriented Programming (ECOOP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 166, pp. 13:1-13:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


We describe internal test-case reduction, the method of test-case reduction employed by Hypothesis, a widely used property-based testing library for Python. The key idea of internal test-case reduction is that instead of applying test-case reduction externally to generated test cases, we apply it internally, to the sequence of random choices made during generation, so that a test case is reduced by continually re-generating smaller and simpler test cases that continue to trigger some property of interest (e.g., a bug in the system under test). This allows for fully generic test-case reduction without any user intervention and without the need to write a specific test-case reducer for a particular application domain. It also significantly mitigates the impact of the test-case validity problem, by ensuring that any reduced test case is one that could in principle have been generated. We describe the rationale behind this approach, explain its implementation in Hypothesis, and present an extensive evaluation comparing its effectiveness with that of several other test-case reducers, including C-Reduce and delta debugging, on applications including Python auto-formatting, C compilers, and the SymPy symbolic math library. Our hope is that these insights into the reduction mechanism employed by Hypothesis will be useful to researchers interested in randomized testing and test-case reduction, as the crux of the approach is fully generic and should be applicable to any random generator of test cases.
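
The core idea in the abstract, reducing the recorded sequence of random choices and re-running the generator rather than editing the generated value, can be illustrated with a small sketch. This is not the Hypothesis implementation: the names `ChoiceSource`, `Invalid`, `gen_list`, `has_large_element`, and `reduce_choices` are hypothetical, and the two reduction passes (block deletion and per-choice lowering) are a simplified stand-in for whatever passes a real reducer uses. The lowering pass also assumes that a smaller choice tends to remain interesting, which need not hold in general.

```python
from typing import Callable, List


class Invalid(Exception):
    """The choice sequence cannot drive the generator to completion."""


class ChoiceSource:
    """Replays a recorded sequence of choices instead of drawing fresh ones."""

    def __init__(self, choices: List[int]) -> None:
        self.choices = list(choices)
        self.index = 0

    def draw(self, bound: int) -> int:
        """Return the next recorded choice, which must lie in range(bound)."""
        if self.index >= len(self.choices):
            raise Invalid("ran out of choices")
        value = self.choices[self.index]
        if value >= bound:
            raise Invalid("choice out of range")
        self.index += 1
        return value


def gen_list(src: ChoiceSource) -> List[int]:
    """Hypothetical generator: a list of integers in range(100)."""
    result = []
    while src.draw(2) == 1:  # 1 means "add another element", 0 means "stop"
        result.append(src.draw(100))
    return result


def has_large_element(xs: List[int]) -> bool:
    """Hypothetical property of interest (stands in for 'triggers the bug')."""
    return any(x >= 10 for x in xs)


def reduce_choices(
    choices: List[int],
    generate: Callable[[ChoiceSource], object],
    interesting: Callable[[object], bool],
) -> List[int]:
    """Greedily shrink a choice sequence: shorter first, then smaller values."""

    def still_interesting(candidate: List[int]) -> bool:
        try:
            return interesting(generate(ChoiceSource(candidate)))
        except Invalid:
            return False

    best = list(choices)
    assert still_interesting(best), "starting point must be interesting"
    improved = True
    while improved:
        improved = False
        # Pass 1: try deleting contiguous blocks of choices.
        for size in (8, 4, 2, 1):
            i = 0
            while i + size <= len(best):
                candidate = best[:i] + best[i + size:]
                if still_interesting(candidate):
                    best, improved = candidate, True
                else:
                    i += 1
        # Pass 2: try lowering each choice, first to 0, then by binary search.
        for i in range(len(best)):
            if best[i] == 0:
                continue
            if still_interesting(best[:i] + [0] + best[i + 1:]):
                best, improved = best[:i] + [0] + best[i + 1:], True
                continue
            lo, hi = 0, best[i]  # lo is known bad, hi is known good
            while lo + 1 < hi:
                mid = (lo + hi) // 2
                if still_interesting(best[:i] + [mid] + best[i + 1:]):
                    hi = mid
                else:
                    lo = mid
            if hi < best[i]:
                best, improved = best[:i] + [hi] + best[i + 1:], True
    return best


# Choices that generate [3, 57, 8]; the reducer never edits that list directly.
reduced = reduce_choices([1, 3, 1, 57, 1, 8, 0], gen_list, has_large_element)
print(reduced)                          # [1, 10, 0]
print(gen_list(ChoiceSource(reduced)))  # [10]
```

Because every candidate is produced by re-running `gen_list`, the reduced value is by construction something the generator could have produced, which is the validity argument made in the abstract. The reducer itself knows nothing about lists; it only sees integers.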

Subject Classification

ACM Subject Classification
  • Software and its engineering → Software testing and debugging

Keywords
  • Software testing
  • test-case reduction


