Search Results

Documents authored by Roocke, Nicole


Artifact
InteractiveResource
nlp-tlp/GraphRAG-on-Minerals-Domain

Authors: Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke


Cite as

Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, Nicole Roocke. nlp-tlp/GraphRAG-on-Minerals-Domain (InteractiveResource). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@misc{dagstuhl-artifact-24886,
   title = {{nlp-tlp/GraphRAG-on-Minerals-Domain}}, 
   author = {Scaffidi, Henri and Hodkiewicz, Melinda and Woods, Caitlin and Roocke, Nicole},
   note = {InteractiveResource (visited on 2025-10-15)},
   url = {https://github.com/nlp-tlp/GraphRAG-on-Minerals-Domain},
   doi = {10.4230/artifacts.24886},
}
Document
Research
GraphRAG on Technical Documents - Impact of Knowledge Graph Schema

Authors: Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2 (2025)


Abstract
Retrieval Augmented Generation (RAG) is seeing rapid adoption in industry to enable employees to query information captured in proprietary data for their organisation. In this work, we test the impact of domain-relevant knowledge graph schemas on the results of Microsoft’s GraphRAG pipeline. Our approach aims to address the poor quality of GraphRAG responses on technical reports rich in domain-specific terms. The use case involves technical reports about geology, chemistry and mineral processing published by the Minerals Research Institute of Western Australia (MRIWA). Four schemas are considered: a simple five-class minerals domain expert-developed schema, an expanded eight-class minerals domain schema, the Microsoft GraphRAG auto-generated schema, and a schema-less GraphRAG. These are compared to a conventional baseline RAG. Performance is evaluated using a scoring approach that accounts for the mix of correct, incorrect, additional, and missing content in RAG responses. The results show that the simple five-class minerals domain schema extracts approximately 10% more entities from the MRIWA reports than the other schema options. Additionally, both the five-class and the expanded eight-class minerals domain schemas produce the most factually correct answers and the fewest hallucinations. We attribute this to the minerals-specific schemas extracting more relevant, domain-specific information during the Indexing stage. As a result, the Query stage’s context window includes more high-value content. This contributes to the observed improvement in answer quality compared to the other pipelines. In contrast, pipelines with fewer domain-related entities in the KG retrieve less valuable information, leaving more room for irrelevant content in the context window. Baseline RAG responses were typically shorter and less complete, and contained more hallucinations, than responses from our GraphRAG pipelines.
We provide a complete set of resources at https://github.com/nlp-tlp/GraphRAG-on-Minerals-Domain/tree/main. These resources include links to the MRIWA reports, a set of questions (from simple to challenging) along with domain-expert curated answers, schemas, and evaluations of the pipelines.

Cite as

Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke. GraphRAG on Technical Documents - Impact of Knowledge Graph Schema. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 3:1-3:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@Article{scaffidi_et_al:TGDK.3.2.3,
  author =	{Scaffidi, Henri and Hodkiewicz, Melinda and Woods, Caitlin and Roocke, Nicole},
  title =	{{GraphRAG on Technical Documents - Impact of Knowledge Graph Schema}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:24},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.3},
  URN =		{urn:nbn:de:0030-drops-248131},
  doi =		{10.4230/TGDK.3.2.3},
  annote =	{Keywords: RAG, minerals, local search, global search, entity extraction, competency questions}
}