Design Principles for Falsifiable, Replicable and Reproducible Empirical Machine Learning Research

Authors: Daniel Vranješ, Jonas Ehrhardt, René Heesch, Lukas Moddemann, Henrik Sebastian Steude, Oliver Niggemann




File

OASIcs.DX.2024.7.pdf
  • Filesize: 0.55 MB
  • 13 pages

Document Identifiers
  • DOI: 10.4230/OASIcs.DX.2024.7

Author Details

Daniel Vranješ
  • Helmut Schmidt University, Hamburg, Germany
Jonas Ehrhardt
  • Helmut Schmidt University, Hamburg, Germany
René Heesch
  • Helmut Schmidt University, Hamburg, Germany
Lukas Moddemann
  • Helmut Schmidt University, Hamburg, Germany
Henrik Sebastian Steude
  • Helmut Schmidt University, Hamburg, Germany
Oliver Niggemann
  • Helmut Schmidt University, Hamburg, Germany

Cite As

Daniel Vranješ, Jonas Ehrhardt, René Heesch, Lukas Moddemann, Henrik Sebastian Steude, and Oliver Niggemann. Design Principles for Falsifiable, Replicable and Reproducible Empirical Machine Learning Research. In 35th International Conference on Principles of Diagnosis and Resilient Systems (DX 2024). Open Access Series in Informatics (OASIcs), Volume 125, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024). https://doi.org/10.4230/OASIcs.DX.2024.7

Abstract

Machine learning is becoming increasingly important in the diagnosis and planning fields, where data-driven models and algorithms are employed as alternatives to traditional first-principle approaches. Empirical research plays a fundamental role in the machine learning domain. At the heart of impactful empirical research lies the development of clear research hypotheses, which in turn shape the design of experiments. Experiments must be executed with precision to ensure reliable results, followed by statistical analysis to interpret the outcomes. This process is key to either supporting or refuting the initial hypotheses. Despite its importance, research practices vary widely across the machine learning community, and there is no uniform understanding of quality criteria for empirical research. To address this gap, we propose a model of the empirical research process, accompanied by guidelines for upholding the validity of empirical research. Embracing these recommendations can bring greater consistency, enhanced reliability, and increased impact.
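
A minimal sketch of the statistical-analysis step described above: testing whether paired results from repeated runs support the research hypothesis that a proposed model outperforms a baseline. The code is Python and assumes NumPy and SciPy are available; the scores, the choice of test, and the significance level are illustrative assumptions, not values or methods taken from the paper.

    # Illustrative sketch only: scores and alpha are hypothetical,
    # not taken from the paper. Assumes NumPy and SciPy are installed.
    import numpy as np
    from scipy.stats import wilcoxon

    # Paired accuracy scores from 8 runs with identical seeds and data splits.
    baseline = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80])
    proposed = np.array([0.84, 0.80, 0.85, 0.83, 0.84, 0.79, 0.83, 0.82])

    # H0: no difference between the models.
    # H1 (the research hypothesis): the proposed model scores higher.
    stat, p_value = wilcoxon(proposed, baseline, alternative="greater")

    alpha = 0.05  # significance level, fixed before the experiment is run
    print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
    if p_value < alpha:
        print("Reject H0: the results support the research hypothesis.")
    else:
        print("Fail to reject H0: the hypothesis is not supported here.")

A non-parametric paired test (the Wilcoxon signed-rank test) is used in this sketch because a handful of runs rarely justifies normality assumptions, and fixing the significance level before execution reflects the abstract's point that hypotheses must be stated before experiments are designed and carried out.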

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning
Keywords
  • machine learning
  • hypothesis design
  • research design
  • experimental research
  • statistical testing
  • diagnosis
  • planning
