FAIR Jupyter: A Knowledge Graph Approach to Semantic Sharing and Granular Exploration of a Computational Notebook Reproducibility Dataset

Samuel, Sheeba; Mietchen, Daniel

doi:10.4230/TGDK.2.2.4

File

TGDK.2.2.4.pdf

Filesize: 1.7 MB
24 pages

Document Identifiers

DOI: 10.4230/TGDK.2.2.4
URN: urn:nbn:de:0030-drops-225886

Author Details

Sheeba Samuel

Distributed and Self-organizing Systems, Chemnitz University of Technology, Chemnitz, Germany

Daniel Mietchen

FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
Institute for Globally Distributed Open Research and Education (IGDORE)

Acknowledgements

We thank the providers of infrastructure, data, and code that we used in this study. These include the PubMed Central repository at the U.S. National Center for Biotechnology Information and the Ara Cluster at the University of Jena as well as the Jupyter, Python and Conda communities and their respective dependencies. The authors gratefully acknowledge the computing time made available to them on the high-performance computer at the NHR Center of TU Dresden. This center is jointly supported by the Federal Ministry of Education and Research and the state governments participating in the NHR (www.nhr-verein.de/unsere-partner). Special thanks go to JupyterCon, which provided the nucleus for our collaboration. We also thank Ramy-Badr Ahmed and Moritz Schubotz for help with registering the GitHub repositories from our corpus in the Software Heritage archive.

Cite As

Sheeba Samuel and Daniel Mietchen. FAIR Jupyter: A Knowledge Graph Approach to Semantic Sharing and Granular Exploration of a Computational Notebook Reproducibility Dataset. In Special Issue on Resources for Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 2, pp. 4:1-4:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/TGDK.2.2.4

Abstract

The way in which data are shared can affect their utility and reusability. Here, we demonstrate how data that we had previously shared in bulk can be mobilized further through a knowledge graph that allows for much more granular exploration and interrogation. The original dataset is about the computational reproducibility of GitHub-hosted Jupyter notebooks associated with biomedical publications. It contains rich metadata about the publications, associated GitHub repositories and Jupyter notebooks, and the notebooks' reproducibility. We took this dataset, converted it into semantic triples and loaded these into a triple store to create a knowledge graph, FAIR Jupyter, that we made accessible via a web service. This enables granular data exploration and analysis through queries that can be tailored to specific use cases. Such queries may provide details about any of the variables from the original dataset, highlight relationships between them or combine some of the graph’s content with materials from corresponding external resources. We provide a collection of example queries addressing a range of use cases in research and education. We also outline how sets of such queries can be used to profile specific content types, either individually or by class. We conclude by discussing how such a semantically enhanced sharing of complex datasets can both enhance their FAIRness (i.e., their findability, accessibility, interoperability, and reusability) and help identify and communicate best practices, particularly with regard to data quality, standardization, automation and reproducibility.
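
As a rough illustration of the idea summarized in the abstract, the following sketch turns tabular records into subject-predicate-object triples and answers a granular question by pattern matching. It is not the authors' pipeline: the namespace `https://example.org/fj/`, the column names, the two sample records, and the helper functions `row_to_triples` and `match` are all assumptions made up for this example, and an in-memory list stands in for a real triple store and its SPARQL endpoint.

```python
from typing import Iterable, Optional

# Hypothetical namespace and predicate names, for illustration only;
# they are not the vocabulary used by the FAIR Jupyter graph.
BASE = "https://example.org/fj/"

Triple = tuple[str, str, str]


def row_to_triples(kind: str, row: dict[str, object]) -> list[Triple]:
    """Turn one tabular record into (subject, predicate, object) triples."""
    subject = f"{BASE}{kind}/{row['id']}"
    triples = [(subject, f"{BASE}type", f"{BASE}{kind}")]
    for column, value in row.items():
        if column == "id" or value is None:
            continue  # the id is already encoded in the subject IRI
        triples.append((subject, f"{BASE}{column}", str(value)))
    return triples


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Two made-up notebook records standing in for rows of the bulk dataset.
rows = [
    {"id": 1, "repository": "repo-a", "reproducible": True},
    {"id": 2, "repository": "repo-b", "reproducible": False},
]
graph = [t for row in rows for t in row_to_triples("notebook", row)]

# A granular question: which notebooks are flagged as reproducible?
reproducible = [s for s, _, _ in match(graph, p=f"{BASE}reproducible", o="True")]
print(reproducible)
```

In a real deployment the pattern match would be expressed as a SPARQL query against the triple store; the wildcard arguments here play the role of query variables.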

Subject Classification

ACM Subject Classification

Information systems → Entity relationship models
Information systems → Information extraction

Keywords

Knowledge Graph
Computational reproducibility
Jupyter notebooks
FAIR data
PubMed Central
GitHub
Python
SPARQL

