Complications for Computational Experiments from Modern Processors

Authors: Johannes K. Fichte, Markus Hecher, Ciaran McCreesh, and Anas Shahab


Author Details

Johannes K. Fichte
  • TU Dresden, Germany
Markus Hecher
  • TU Wien, Austria
  • Universität Potsdam, Germany
Ciaran McCreesh
  • University of Glasgow, UK
Anas Shahab
  • TU Dresden, Germany


Authors are given in alphabetical order. The work has been carried out while the first three authors were visiting the Simons Institute for the Theory of Computing.

Cite As

Johannes K. Fichte, Markus Hecher, Ciaran McCreesh, and Anas Shahab. Complications for Computational Experiments from Modern Processors. In 27th International Conference on Principles and Practice of Constraint Programming (CP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 210, pp. 25:1-25:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


In this paper, we revisit the approach to empirical experiments for combinatorial solvers. We provide a brief survey on tools that can help to make empirical work easier. We illustrate origins of uncertainty in modern hardware and show how strong the influence of certain aspects of modern hardware and its experimental setup can be in an actual experimental evaluation. More specifically, there can be situations where (i) two different researchers run a reasonable-looking experiment comparing the same solvers and come to different conclusions and (ii) one researcher runs the same experiment twice on the same hardware and reaches different conclusions based upon how the hardware is configured and used. We investigate these situations from a hardware perspective. Furthermore, we provide an overview on standard measures, detailed explanations on effects, potential errors, and biased suggestions for useful tools. Alongside the tools, we discuss their feasibility as experiments often run on clusters to which the experimentalist has only limited access. Our work sheds light on a number of benchmarking-related issues which could be considered to be folklore or even myths.
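The point that one researcher can run the same experiment twice on the same hardware and reach different conclusions often comes down to configuration state such as the frequency-scaling governor or turbo boost. One common mitigation is to record that state alongside every run. The sketch below is a minimal, hypothetical illustration of this practice, assuming a Linux sysfs layout (the exact paths vary by kernel version and driver, e.g. `intel_pstate` vs. `acpi-cpufreq`); it is not a tool from the paper.

```python
from pathlib import Path

# Linux sysfs locations for a few settings that influence solver timings.
# These paths are assumptions: they differ across kernels and CPU drivers.
SOURCES = {
    "governor": "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
    "no_turbo": "/sys/devices/system/cpu/intel_pstate/no_turbo",
    "smt": "/sys/devices/system/cpu/smt/control",
}

def snapshot_cpu_config():
    """Read each setting if present; missing files yield None.

    Storing this snapshot next to each experiment's results makes it
    possible to spot configuration drift between two "identical" runs.
    """
    config = {}
    for key, path in SOURCES.items():
        p = Path(path)
        config[key] = p.read_text().strip() if p.exists() else None
    return config

if __name__ == "__main__":
    for key, value in snapshot_cpu_config().items():
        print(f"{key}: {value if value is not None else 'unavailable'}")
```

On a cluster where the experimentalist has no root access, a read-only snapshot like this is often the only feasible check: it cannot fix a `powersave` governor, but it documents that two runs were not comparable.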

Subject Classification

ACM Subject Classification
  • Computer systems organization → Multicore architectures
  • Hardware → Temperature monitoring
  • Hardware → Impact on the environment
  • Hardware → Platform power issues
  • Theory of computation → Design and analysis of algorithms

Keywords
  • Experimenting
  • Combinatorial Solving
  • Empirical Work


