,
Daniel Hernández
,
Luis Galárraga
,
Giorgos Flouris
,
Irini Fundulaki
,
Katja Hose
Creative Commons Attribution 4.0 International license
The popularity of knowledge graphs (KGs) owes credit to their flexible data model, which is suitable for data integration from multiple sources. Several KG-based applications, such as trust assessment, view maintenance, or data valuation on dynamic data, rely on the ability to compute provenance explanations for query results. This need becomes more urgent in federated query processing systems, which allow the online consumption of heterogeneous and decentralized Web data. However, the problem of computing and interacting with provenance has received little attention, especially in the federated setting. On those grounds, this paper introduces the NPCS (Native Provenance Computation for SPARQL) approach, and its federated variant Fed-NPCS, that compute provenance for SPARQL query results. Both approaches build upon spm-semirings to annotate the results of monotonic and non-monotonic SPARQL queries with their provenance. Due to their reliance on query rewriting techniques, the approaches are directly applicable to already deployed SPARQL engines and federations using different reification schemes, including RDF-star. Our experimental evaluation shows that our novel query rewriting approach brings significant run-time improvements w.r.t. the state-of-the-art across both centralized and federated settings. In centralized settings, our tests on two popular SPARQL engines (GraphDB and Stardog) reveal substantial runtime gains over existing query rewriting solutions, enabling scalability to RDF graphs with billions of triples. In federated settings, our experiments on the FedShop benchmark with GraphDB show the viability of Fed-NPCS for federations with up to 200 sources.
@Article{asma_et_al:TGDK.4.1.4,
author = {Asma, Zubaria and Hern\'{a}ndez, Daniel and Gal\'{a}rraga, Luis and Flouris, Giorgos and Fundulaki, Irini and Hose, Katja},
title = {{Native Provenance Computation for Federated and Non-Federated SPARQL Queries}},
journal = {Transactions on Graph Data and Knowledge},
pages = {4:1--4:43},
ISSN = {2942-7517},
year = {2026},
volume = {4},
number = {1},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/TGDK.4.1.4},
URN = {urn:nbn:de:0030-drops-259642},
doi = {10.4230/TGDK.4.1.4},
annote = {Keywords: native provenance computation, federated SPARQL queries, data provenance, NPCS, Fed-NPCS}
}
archived version