Explaining Enterprise Knowledge Graphs with Large Language Models and Ontological Reasoning

Baldazzi, Teodoro; Bellomarini, Luigi; Ceri, Stefano; Colombo, Andrea; Gentili, Andrea; Sallinger, Emanuel; Atzeni, Paolo

doi:10.4230/OASIcs.Tannen.1

Abstract

In recent times, the demand for transparency and accountability in AI-driven decisions has intensified, particularly in high-stakes domains like finance and biomedicine. This focus on the provenance of AI-generated conclusions underscores the need for decision-making processes that are not only transparent but also readily interpretable by humans, so as to build the trust of both users and stakeholders. In this context, the integration of state-of-the-art Large Language Models (LLMs) with logic-oriented Enterprise Knowledge Graphs (EKGs) and the broader scope of Knowledge Representation and Reasoning (KRR) methodologies is currently at the cutting edge of industrial and academic research across numerous data-intensive areas. Indeed, such a synergy is paramount, as LLMs bring a layer of adaptability and human-centric understanding that complements the structured insights of EKGs. Conversely, the central role of ontological reasoning is to capture the domain knowledge, accurately handling complex tasks over a given realm of interest, and to infuse the process with transparency and a clear provenance-based explanation of the conclusions drawn, addressing the fundamental challenge of LLMs' inherent opacity and fostering trust and accountability in AI applications. In this paper, we propose a novel neuro-symbolic framework that leverages the underpinnings of provenance in ontological reasoning to enhance state-of-the-art LLMs with domain awareness and explainability, enabling them to act as natural language interfaces to EKGs.

