3 Search Results for "Wang, Jianguo"


Document
Research
GraphRAG on Technical Documents - Impact of Knowledge Graph Schema

Authors: Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2 (2025)


Abstract
Retrieval Augmented Generation (RAG) is seeing rapid adoption in industry to enable employees to query information captured in proprietary data for their organisation. In this work, we test the impact of domain-relevant knowledge graph schemas on the results of Microsoft’s GraphRAG pipeline. Our approach aims to address the poor quality of GraphRAG responses on technical reports rich in domain-specific terms. The use case involves technical reports about geology, chemistry and mineral processing published by the Minerals Research Institute of Western Australia (MRIWA). Four schemas are considered: a simple five-class minerals domain expert-developed schema, an expanded minerals domain schema, the Microsoft GraphRAG auto-generated schema, and a schema-less GraphRAG. These are compared to a conventional baseline RAG. Performance is evaluated using a scoring approach that accounts for the mix of correct, incorrect, additional, and missing content in RAG responses. The results show that the simple five-class minerals domain schema extracts approximately 10% more entities from the MRIWA reports than the other schema options. Additionally, both the five-class and the expanded eight-class minerals domain schemas produce the most factually correct answers and the fewest hallucinations. We attribute this to the minerals-specific schemas extracting more relevant, domain-specific information during the Indexing stage. As a result, the Query stage’s context window includes more high-value content. This contributes to the observed improvement in answer quality compared to the other pipelines. In contrast, pipelines with fewer domain-related entities in the KG retrieve less valuable information, leaving more room for irrelevant content in the context window. Baseline RAG responses were typically shorter, less complete, and contained more hallucinations compared to our GraphRAG pipelines. 
We provide a complete set of resources at https://github.com/nlp-tlp/GraphRAG-on-Minerals-Domain/tree/main. These resources include links to the MRIWA reports, a set of questions (from simple to challenging) along with domain-expert curated answers, schemas, and evaluations of the pipelines.

Cite as

Henri Scaffidi, Melinda Hodkiewicz, Caitlin Woods, and Nicole Roocke. GraphRAG on Technical Documents - Impact of Knowledge Graph Schema. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 3:1-3:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@Article{scaffidi_et_al:TGDK.3.2.3,
  author =	{Scaffidi, Henri and Hodkiewicz, Melinda and Woods, Caitlin and Roocke, Nicole},
  title =	{{GraphRAG on Technical Documents - Impact of Knowledge Graph Schema}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:24},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.3},
  URN =		{urn:nbn:de:0030-drops-248131},
  doi =		{10.4230/TGDK.3.2.3},
  annote =	{Keywords: RAG, minerals, local search, global search, entity extraction, competency questions}
}
Document
Guiding Geospatial Analysis Processes in Dealing with Modifiable Areal Unit Problems

Authors: Guoray Cai and Yue Hao

Published in: LIPIcs, Volume 346, 13th International Conference on Geographic Information Science (GIScience 2025)


Abstract
Geospatial analysis has been widely applied in different domains for critical decision making. However, the results of spatial analysis are often plagued with uncertainties due to measurement errors, choice of data representations, and unintended transformation artifacts. A well known example of such problems is the Modifiable Areal Unit Problem (MAUP) which has well documented effects on the outcome of spatial analysis on area-aggregated data. Existing methods for addressing the effects of MAUP are limited, are technically complex, and are often inaccessible to practitioners. As a result, analysts tend to ignore the effects of MAUP in practice due to lack of expertise, high cognitive loads, and resource limitations. To address these challenges, this paper proposes a machine-guidance approach to augment the analyst’s capacity in mitigating the effect of MAUP. Based on an analysis of practical challenges faced by human analysts, we identified multiple opportunities for the machine to guide the analysts by alerting to the rise of MAUP, assessing the impact of MAUP, choosing mitigation methods, and generating visual guidance messages using GIS functions and tools. For each of the opportunities, we characterize the behavior patterns and the underlying guidance strategies that generate the behavior. We illustrate the behavior of machine guidance using a hotspot analysis scenario in the context of crime policing, where MAUP has strong effects on the patterns of crime hotspots. Finally, we describe the computational framework used to build a prototype guidance system and identify a number of research questions to be addressed. We conclude by discussing how the machine guidance approach could be an answer to some of the toughest problems in geospatial analysis.

Cite as

Guoray Cai and Yue Hao. Guiding Geospatial Analysis Processes in Dealing with Modifiable Areal Unit Problems. In 13th International Conference on Geographic Information Science (GIScience 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 346, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{cai_et_al:LIPIcs.GIScience.2025.14,
  author =	{Cai, Guoray and Hao, Yue},
  title =	{{Guiding Geospatial Analysis Processes in Dealing with Modifiable Areal Unit Problems}},
  booktitle =	{13th International Conference on Geographic Information Science (GIScience 2025)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-378-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{346},
  editor =	{Sila-Nowicka, Katarzyna and Moore, Antoni and O'Sullivan, David and Adams, Benjamin and Gahegan, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2025.14},
  URN =		{urn:nbn:de:0030-drops-238433},
  doi =		{10.4230/LIPIcs.GIScience.2025.14},
  annote =	{Keywords: Machine Guidance, Geo-Spatial Analysis, Modifiable Areal Unit Problem (MAUP)}
}
Document
Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees

Authors: Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin’s well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified against the similarity threshold. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.
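
As a rough illustration of the query type defined in this abstract, the following minimal sketch answers a cosine threshold query over unit-length sparse vectors using an inverted index. It is not the paper's method: it traverses every relevant index list in full and implements neither the Threshold Algorithm's partial traversal nor the stopping condition and traversal strategies the abstract describes. The function names, the dict-based sparse vector representation, and the sample data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict: dimension -> value) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return {dim: v / norm for dim, v in vec.items()}


def build_inverted_index(database):
    """Map each dimension to the (vector id, value) pairs that are nonzero in it."""
    index = defaultdict(list)
    for vec_id, vec in database.items():
        for dim, value in vec.items():
            index[dim].append((vec_id, value))
    return index


def cosine_threshold_query(index, query, threshold):
    """Return the ids of vectors whose cosine similarity to `query` exceeds `threshold`.

    Assumes every indexed vector and the query are unit length, so cosine
    similarity equals the dot product. Every index list for a nonzero query
    dimension is read in full (no early stopping).
    """
    if threshold <= 0:
        # Vectors sharing no dimension with the query never enter `scores`,
        # so this sketch is only correct for positive thresholds.
        raise ValueError("threshold must be positive")
    scores = defaultdict(float)
    for dim, q_value in query.items():
        for vec_id, value in index.get(dim, []):
            scores[vec_id] += q_value * value
    return sorted(vec_id for vec_id, score in scores.items() if score > threshold)


# Hypothetical toy data: three sparse vectors, normalized before indexing.
database = {
    "a": normalize({0: 3.0, 1: 4.0}),
    "b": normalize({0: 1.0}),
    "c": normalize({2: 1.0}),
}
index = build_inverted_index(database)
query = normalize({0: 1.0, 1: 1.0})
matches = cosine_threshold_query(index, query, 0.8)
```

Because normalization happens once at indexing time, each query costs only the multiplications along the index lists for its nonzero dimensions. The abstract's contribution concerns how much of those lists can be skipped, which this sketch does not attempt.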

Cite as

Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou. Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{li_et_al:LIPIcs.ICDT.2019.11,
  author =	{Li, Yuliang and Wang, Jianguo and Pullman, Benjamin and Bandeira, Nuno and Papakonstantinou, Yannis},
  title =	{{Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.11},
  URN =		{urn:nbn:de:0030-drops-103135},
  doi =		{10.4230/LIPIcs.ICDT.2019.11},
  annote =	{Keywords: Vector databases, Similarity search, Cosine, Threshold Algorithm}
}