DROPS

Document

Invited Talk

DOI: 10.4230/LIPIcs.STACS.2026.1

Query Languages for Machine-Learning Models (Invited Talk)

Authors: Martin Grohe

Published in: LIPIcs, Volume 364, 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Abstract

In my invited talk and this accompanying paper, I discuss two logics for weighted finite structures: first-order logic with summation (FO(SUM)) and its recursive extension IFP(SUM). These logics originate from foundational work by Grädel, Gurevich, and Meer in the 1990s. In recent joint work with Standke, Steegmans, and Van den Bussche, we have investigated these logics as query languages for machine learning models, specifically neural networks, which are naturally represented as weighted graphs. I present illustrative examples of queries to neural networks that can be expressed in these logics and discuss fundamental results on their expressiveness and computational complexity.

Cite as

Martin Grohe. Query Languages for Machine-Learning Models (Invited Talk). In 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 364, pp. 1:1-1:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{grohe:LIPIcs.STACS.2026.1,
  author =	{Grohe, Martin},
  title =	{{Query Languages for Machine-Learning Models}},
  booktitle =	{43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)},
  pages =	{1:1--1:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-412-3},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{364},
  editor =	{Mahajan, Meena and Manea, Florin and McIver, Annabelle and Thắng, Nguyễn Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2026.1},
  URN =		{urn:nbn:de:0030-drops-254904},
  doi =		{10.4230/LIPIcs.STACS.2026.1},
  annote =	{Keywords: Expressive power of query languages, fixed-point logics, weighted structures, neural networks, explainable AI}
}

Document

Invited Paper

DOI: 10.4230/OASIcs.RW.2024/2025.7

Modern Datalog: Concepts, Methods, Applications (Invited Paper)

Authors: Markus Krötzsch

Published in: OASIcs, Volume 138, Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 & RW 2025)

Abstract

Pure Datalog is arguably the most fundamental rule language, elegant and simple, but also often too limited to be useful in practice. This has motivated the introduction of many new expressive features, ranging from datatypes and related functions, over aggregates and semi-ring generalisations, to existential quantifiers and complex terms. In spite of their variety, all these approaches remain true to the nature of Datalog as a direct, pattern-based way of computing on structured data. We therefore find that a modern notion of Datalog is emerging, distinctly different from other approaches of logic programming and with its own set of related methods and applications. In this course, we introduce Datalog and its most common extensions, and explain when and how these features can be used together (which is often, but not always, safe to do). We further look at modern Datalog systems and some of their primary use cases. Hands-on work with Datalog and its extensions is done with the free Datalog engine Nemo (https://knowsys.github.io/nemo-doc/). The course is accessible to all audiences and does not assume specific prior knowledge.

Cite as

Markus Krötzsch. Modern Datalog: Concepts, Methods, Applications (Invited Paper). In Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 & RW 2025). Open Access Series in Informatics (OASIcs), Volume 138, pp. 7:1-7:41, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{krotzsch:OASIcs.RW.2024/2025.7,
  author =	{Kr\"{o}tzsch, Markus},
  title =	{{Modern Datalog: Concepts, Methods, Applications}},
  booktitle =	{Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 \& RW 2025)},
  pages =	{7:1--7:41},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-405-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{138},
  editor =	{Artale, Alessandro and Bienvenu, Meghyn and Garc{\'\i}a, Yazm{\'\i}n Ib\'{a}\~{n}ez and Murlak, Filip},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.RW.2024/2025.7},
  URN =		{urn:nbn:de:0030-drops-250524},
  doi =		{10.4230/OASIcs.RW.2024/2025.7},
  annote =	{Keywords: Datalog, query language, knowledge representation and reasoning, logic programming, Horn logic, SPARQL, datatypes and aggregation, lecture notes, tutorial}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.20

A Formal Language Perspective on Factorized Representations

Authors: Benny Kimelfeld, Wim Martens, and Matthias Niewerth

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Factorized representations (FRs) are a well-known tool to succinctly represent results of join queries and were originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization, of which we offer a selection in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of path multiset representations, a formalism that was recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.

Cite as

Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A Formal Language Perspective on Factorized Representations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{kimelfeld_et_al:LIPIcs.ICDT.2025.20,
  author =	{Kimelfeld, Benny and Martens, Wim and Niewerth, Matthias},
  title =	{{A Formal Language Perspective on Factorized Representations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.20},
  URN =		{urn:nbn:de:0030-drops-229614},
  doi =		{10.4230/LIPIcs.ICDT.2025.20},
  annote =	{Keywords: Databases, relational databases, graph databases, factorized databases, regular path queries, compact representations}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.18

A Framework for Extraction and Transformation of Documents

Authors: Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We present a theoretical framework for the extraction and transformation of text documents as a two-phase process: The first phase uses document spanners to extract information from the input document. The second phase transforms the extracted information into a suitable output. To support several reasonable extract-transform scenarios, we propose for the first phase an extension of document spanners from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans instead of only single spans. We focus on multispanners described by regex formulas, and we prove that these have the same desirable properties as standard regular spanners. To formalize the second phase, we consider transformations that map every pair document-tuple, where each tuple comes from the (multi)span-relation extracted in the first phase, into a new output document. The specification of the two phases is what we call an extract-transform (ET) program, which covers practically relevant extract-transform tasks. In this paper, our main technical goal is to identify a broad class of ET programs that can be evaluated efficiently. We specifically focus on the scenario of regular ET programs: the extraction phase is given by a regex multispanner and the transformation phase is given by a regular string-to-string function. We show that for any regular ET program, given an input document, we can enumerate all final output documents with output-linear delay after linear preprocessing. As a side effect, we characterize the expressive power of regular ET programs and also show that they have desirable properties, like being closed under composition.

Cite as

Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt. A Framework for Extraction and Transformation of Documents. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{riveros_et_al:LIPIcs.ICDT.2025.18,
  author =	{Riveros, Cristian and Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Framework for Extraction and Transformation of Documents}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.18},
  URN =		{urn:nbn:de:0030-drops-229593},
  doi =		{10.4230/LIPIcs.ICDT.2025.18},
  annote =	{Keywords: Information extraction, Document spanners, Transducers, Query evaluation}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.29

FC-Datalog as a Framework for Efficient String Querying

Authors: Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that, when extended with constraints for regular languages, can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.

Cite as

Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger. FC-Datalog as a Framework for Efficient String Querying. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
  author =	{Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
  title =	{{FC-Datalog as a Framework for Efficient String Querying}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{29:1--29:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
  URN =		{urn:nbn:de:0030-drops-229708},
  doi =		{10.4230/LIPIcs.ICDT.2025.29},
  annote =	{Keywords: Information extraction, word equations, datalog, document spanners, regex}
}

Document

Resource Paper

DOI: 10.4230/TGDK.2.2.7

Whelk: An OWL EL+RL Reasoner Enabling New Use Cases

Authors: James P. Balhoff and Christopher J. Mungall

Published in: TGDK, Volume 2, Issue 2 (2024): Special Issue on Resources for Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 2, Issue 2

Abstract

Many tasks in the biosciences rely on reasoning with large OWL terminologies (Tboxes), often combined with even larger databases. In particular, a common task is retrieval queries that utilize relational expressions; for example, “find all genes expressed in the brain or any part of the brain”. Automated reasoning on these ontologies typically relies on scalable reasoners targeting the EL subset of OWL, such as ELK. While the introduction of ELK has been transformative in the incorporation of reasoning into bio-ontology quality control and production pipelines, we have encountered limitations when applying it to use cases involving high throughput query answering or reasoning about datasets describing instances (Aboxes). Whelk is a fast OWL reasoner for combined EL+RL reasoning. As such, it is particularly useful for many biological ontology tasks, particularly those characterized by large Tboxes using the EL subset of OWL, combined with Aboxes targeting the RL subset of OWL. Whelk is implemented in Scala and utilizes immutable functional data structures, which provides advantages when performing incremental or dynamic reasoning tasks. Whelk supports querying complex class expressions at a substantially greater rate than ELK, and can answer queries or perform incremental reasoning tasks in parallel, enabling novel applications of OWL reasoning.

Cite as

James P. Balhoff and Christopher J. Mungall. Whelk: An OWL EL+RL Reasoner Enabling New Use Cases. In Special Issue on Resources for Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 2, pp. 7:1-7:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@Article{balhoff_et_al:TGDK.2.2.7,
  author =	{Balhoff, James P. and Mungall, Christopher J.},
  title =	{{Whelk: An OWL EL+RL Reasoner Enabling New Use Cases}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{7:1--7:17},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.2.7},
  URN =		{urn:nbn:de:0030-drops-225918},
  doi =		{10.4230/TGDK.2.2.7},
  annote =	{Keywords: Web Ontology Language, OWL, Semantic Web, ontology, reasoner}
}

Document

Vision

DOI: 10.4230/TGDK.1.1.8

Machine Learning and Knowledge Graphs: Existing Gaps and Future Research Challenges

Authors: Claudia d'Amato, Louis Mahon, Pierre Monnin, and Giorgos Stamou

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

The graph model is nowadays largely adopted to model a wide range of knowledge and data, spanning from social networks to knowledge graphs (KGs), representing a successful paradigm of how symbolic and transparent AI can scale on the World Wide Web. However, due to their unprecedented volume, they are generally tackled by Machine Learning (ML) and mostly numeric-based methods such as graph embedding models (KGE) and deep neural networks (DNNs). The latter methods have lately proved very efficient, leading the current AI spring. In this vision paper, we introduce some of the main existing methods for combining KGs and ML, divided into two categories: those using ML to improve KGs, and those using KGs to improve results on ML tasks. From this introduction, we highlight research gaps and perspectives that we deem promising and currently under-explored for the involved research communities, spanning from KG support for LLM prompting, integration of KG semantics in ML models to symbol-based methods, interpretability of ML models, and the need for improved benchmark datasets. In our opinion, such perspectives are stepping stones in an ultimate view of KGs as central assets for neuro-symbolic and explainable AI.

Cite as

Claudia d'Amato, Louis Mahon, Pierre Monnin, and Giorgos Stamou. Machine Learning and Knowledge Graphs: Existing Gaps and Future Research Challenges. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 8:1-8:35, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@Article{damato_et_al:TGDK.1.1.8,
  author =	{d'Amato, Claudia and Mahon, Louis and Monnin, Pierre and Stamou, Giorgos},
  title =	{{Machine Learning and Knowledge Graphs: Existing Gaps and Future Research Challenges}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{8:1--8:35},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.8},
  URN =		{urn:nbn:de:0030-drops-194824},
  doi =		{10.4230/TGDK.1.1.8},
  annote =	{Keywords: Graph-based Learning, Knowledge Graph Embeddings, Large Language Models, Explainable AI, Knowledge Graph Completion \& Curation}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2016.14

Dynamic Graph Queries

Authors: Pablo Muñoz, Nils Vortmeier, and Thomas Zeume

Published in: LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)

Abstract

Graph databases in many applications - semantic web, transport or biological networks among others - are not only large, but also frequently modified. Evaluating graph queries in this dynamic context is a challenging task, as those queries often combine first-order and navigational features. Motivated by recent results on maintaining dynamic reachability, we study the dynamic evaluation of traditional query languages for graphs in the descriptive complexity framework. Our focus is on maintaining regular path queries, and extensions thereof, by first-order formulas. In particular, we are interested in path queries defined by non-regular languages and in extended conjunctive regular path queries (which allow comparing labels of paths based on word relations). Further, we study the closely related problems of maintaining distances in graphs and reachability in product graphs. In this preliminary study, we obtain upper bounds for those problems in restricted settings, such as undirected and acyclic graphs, or under insertions only, and negative results regarding quantifier-free update formulas. In addition, we point out interesting directions for further research.

Cite as

Pablo Muñoz, Nils Vortmeier, and Thomas Zeume. Dynamic Graph Queries. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

@InProceedings{munoz_et_al:LIPIcs.ICDT.2016.14,
  author =	{Mu\~{n}oz, Pablo and Vortmeier, Nils and Zeume, Thomas},
  title =	{{Dynamic Graph Queries}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.14},
  URN =		{urn:nbn:de:0030-drops-57830},
  doi =		{10.4230/LIPIcs.ICDT.2016.14},
  annote =	{Keywords: Dynamic descriptive complexity, graph databases, graph products, reachability, path queries}
}

8 Search Results for "Muñoz, Pablo"
