Explaining Enterprise Knowledge Graphs with Large Language Models and Ontological Reasoning

Authors: Teodoro Baldazzi, Luigi Bellomarini, Stefano Ceri, Andrea Colombo, Andrea Gentili, Emanuel Sallinger, Paolo Atzeni




File

OASIcs.Tannen.1.pdf
  • Filesize: 3.76 MB
  • 20 pages

Author Details

Teodoro Baldazzi
  • Università Roma Tre, Italy
Luigi Bellomarini
  • Banca d'Italia, Roma, Italy
Stefano Ceri
  • Politecnico di Milano, Italy
Andrea Colombo
  • Politecnico di Milano, Italy
Andrea Gentili
  • Banca d'Italia, Roma, Italy
Emanuel Sallinger
  • TU Wien, Austria
  • University of Oxford, UK
Paolo Atzeni
  • Università Roma Tre, Italy

Cite As

Teodoro Baldazzi, Luigi Bellomarini, Stefano Ceri, Andrea Colombo, Andrea Gentili, Emanuel Sallinger, and Paolo Atzeni. Explaining Enterprise Knowledge Graphs with Large Language Models and Ontological Reasoning. In The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen. Open Access Series in Informatics (OASIcs), Volume 119, pp. 1:1-1:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/OASIcs.Tannen.1

Abstract

In recent times, the demand for transparency and accountability in AI-driven decisions has intensified, particularly in high-stakes domains like finance and bio-medicine. This focus on the provenance of AI-generated conclusions underscores the need for decision-making processes that are not only transparent but also readily interpretable by humans, to build the trust of both users and stakeholders. In this context, the integration of state-of-the-art Large Language Models (LLMs) with logic-oriented Enterprise Knowledge Graphs (EKGs) and the broader scope of Knowledge Representation and Reasoning (KRR) methodologies is currently at the cutting edge of industrial and academic research across numerous data-intensive areas. Indeed, such a synergy is paramount as LLMs bring a layer of adaptability and human-centric understanding that complements the structured insights of EKGs. Conversely, the central role of ontological reasoning is to capture the domain knowledge, accurately handling complex tasks over a given realm of interest, and to infuse the process with transparency and a clear provenance-based explanation of the conclusions drawn, addressing the fundamental challenge of LLMs' inherent opacity and fostering trust and accountability in AI applications. In this paper, we propose a novel neuro-symbolic framework that leverages the underpinnings of provenance in ontological reasoning to enhance state-of-the-art LLMs with domain awareness and explainability, enabling them to act as natural language interfaces to EKGs.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Knowledge representation and reasoning
  • Computing methodologies → Natural language processing
  • Theory of computation → Data provenance
Keywords
  • provenance
  • ontological reasoning
  • language models
  • knowledge graphs
