Knowledge Engineering Using Large Language Models

Allen, Bradley P.; Stork, Lise; Groth, Paul

doi:10.4230/TGDK.1.1.3

Abstract

Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models (LLMs) and their capabilities to work effectively with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.

