LIPIcs, Volume 328

28th International Conference on Database Theory (ICDT 2025)

Event

ICDT 2025, March 25-28, 2025, Barcelona, Spain

Editors

Sudeepa Roy
  • Duke University, Durham, NC, USA
Ahmet Kara
  • OTH Regensburg, Germany

Publication Details

  • published at: 2025-03-21
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-364-5

Documents

Document
Complete Volume
LIPIcs, Volume 328, ICDT 2025, Complete Volume

Authors: Sudeepa Roy and Ahmet Kara


Abstract
LIPIcs, Volume 328, ICDT 2025, Complete Volume

Cite as

28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 1-634, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@Proceedings{roy_et_al:LIPIcs.ICDT.2025,
  title =	{{LIPIcs, Volume 328, ICDT 2025, Complete Volume}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{1--634},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025},
  URN =		{urn:nbn:de:0030-drops-229837},
  doi =		{10.4230/LIPIcs.ICDT.2025},
  annote =	{Keywords: LIPIcs, Volume 328, ICDT 2025, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Sudeepa Roy and Ahmet Kara


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 0:i-0:xx, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{roy_et_al:LIPIcs.ICDT.2025.0,
  author =	{Roy, Sudeepa and Kara, Ahmet},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{0:i--0:xx},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.0},
  URN =		{urn:nbn:de:0030-drops-229828},
  doi =		{10.4230/LIPIcs.ICDT.2025.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
The Quest for Faster Join Algorithms (Invited Talk)

Authors: Paraschos Koutris, Shaleen Deep, Austen Fan, and Hangdong Zhao


Abstract
Joins are the cornerstone of relational databases. Surprisingly, even after several decades of research in the systems and theory database community, we still lack an understanding of how to design the fastest possible join algorithm. In this talk, we will present the exciting progress the database theory community has achieved in join algorithms over the last two decades. The talk will revolve around five key ideas fundamentally shaping this research area: tree decompositions, data partitioning, leveraging statistical information, enumeration, and algebraic techniques.

Cite as

Paraschos Koutris, Shaleen Deep, Austen Fan, and Hangdong Zhao. The Quest for Faster Join Algorithms (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 1:1-1:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{koutris_et_al:LIPIcs.ICDT.2025.1,
  author =	{Koutris, Paraschos and Deep, Shaleen and Fan, Austen and Zhao, Hangdong},
  title =	{{The Quest for Faster Join Algorithms}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{1:1--1:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.1},
  URN =		{urn:nbn:de:0030-drops-229428},
  doi =		{10.4230/LIPIcs.ICDT.2025.1},
  annote =	{Keywords: Conjunctive Queries, Joins, Tree Decompositions, Enumeration, Semirings}
}
Document
Invited Talk
Data Management Perspectives on Prescriptive Analytics (Invited Talk)

Authors: Alexandra Meliou, Azza Abouzied, Peter J. Haas, Riddho R. Haque, Anh Mai, and Vasileios Vittis


Abstract
Decision makers in a broad range of domains, such as finance, transportation, manufacturing, and healthcare, often need to derive optimal decisions given a set of constraints and objectives. Traditional solutions to such constrained optimization problems are typically application-specific, complex, and do not generalize. Further, the usual workflow requires slow, cumbersome, and error-prone data movement between a database, and predictive-modeling and optimization packages. All of these problems are exacerbated by the unprecedented size of modern data-intensive optimization problems. The emerging research area of in-database prescriptive analytics aims to provide seamless domain-independent, declarative, and scalable approaches powered by the system where the data typically resides: the database. Integrating optimization with database technology opens up prescriptive analytics to a much broader community, amplifying its benefits. We discuss how deep integration between the DBMS, predictive models, and optimization software creates opportunities for rich prescriptive-query functionality with good scalability and performance. Summarizing some of our main results and ongoing work in this area, we highlight challenges related to usability, scalability, data uncertainty, and dynamic environments, and argue that perspectives from data management research can drive novel strategies and solutions.

Cite as

Alexandra Meliou, Azza Abouzied, Peter J. Haas, Riddho R. Haque, Anh Mai, and Vasileios Vittis. Data Management Perspectives on Prescriptive Analytics (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{meliou_et_al:LIPIcs.ICDT.2025.2,
  author =	{Meliou, Alexandra and Abouzied, Azza and Haas, Peter J. and Haque, Riddho R. and Mai, Anh and Vittis, Vasileios},
  title =	{{Data Management Perspectives on Prescriptive Analytics}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{2:1--2:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.2},
  URN =		{urn:nbn:de:0030-drops-229432},
  doi =		{10.4230/LIPIcs.ICDT.2025.2},
  annote =	{Keywords: Prescriptive analytics, decision making, scalable constrained optimization}
}
Document
Invited Talk
Evaluating First-Order Formulas in Structured Graphs (Invited Talk)

Authors: Szymon Toruńczyk


Abstract
A central problem in database theory concerns the complexity of the query evaluation problem, also called the model-checking problem in finite model theory: the problem of evaluating a given formula in a given structure. Here, I will focus on formulas of first-order logic, and the data complexity (or parameterized complexity) of their evaluation. Leveraging tools from structural graph theory, I will assume that the input structure is a graph which comes from a fixed class of well-structured graphs, such as the class of planar graphs, classes of bounded treewidth or clique-width, or much more general "tame" graph classes, such as the nowhere dense graph classes introduced by Ossona de Mendez and Nešetřil, or classes of bounded twin-width studied by Bonnet, Thomassé, and coauthors. I will survey the recent progress in this area, which connects tools from structural graph theory, from model theory - such as stability and dependence - and from statistical learning theory and computational geometry - such as VC-dimension and ε-nets.

Cite as

Szymon Toruńczyk. Evaluating First-Order Formulas in Structured Graphs (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{torunczyk:LIPIcs.ICDT.2025.3,
  author =	{Toru\'{n}czyk, Szymon},
  title =	{{Evaluating First-Order Formulas in Structured Graphs}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.3},
  URN =		{urn:nbn:de:0030-drops-229449},
  doi =		{10.4230/LIPIcs.ICDT.2025.3},
  annote =	{Keywords: Finite model theory, first-order model checking, graph parameters}
}
Document
Learning Aggregate Queries Defined by First-Order Logic with Counting

Authors: Steffen van Bergerem and Nicole Schweikardt


Abstract
In the logical framework introduced by Grohe and Turán (TOCS 2004) for Boolean classification problems, the instances to classify are tuples from a logical structure, and Boolean classifiers are described by parametric models based on logical formulas. This is a specific scenario for supervised passive learning, where classifiers should be learned based on labelled examples. Existing results in this scenario focus on Boolean classification. This paper presents learnability results beyond Boolean classification. We focus on multiclass classification problems where the task is to assign input tuples to arbitrary integers. To represent such integer-valued classifiers, we use aggregate queries specified by an extension of first-order logic with counting terms called FOC₁. Our main result shows the following: given a database of polylogarithmic degree, within quasi-linear time, we can build an index structure that makes it possible to learn FOC₁-definable integer-valued classifiers in time polylogarithmic in the size of the database and polynomial in the number of training examples.

Cite as

Steffen van Bergerem and Nicole Schweikardt. Learning Aggregate Queries Defined by First-Order Logic with Counting. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 4:1-4:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{vanbergerem_et_al:LIPIcs.ICDT.2025.4,
  author =	{van Bergerem, Steffen and Schweikardt, Nicole},
  title =	{{Learning Aggregate Queries Defined by First-Order Logic with Counting}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{4:1--4:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.4},
  URN =		{urn:nbn:de:0030-drops-229457},
  doi =		{10.4230/LIPIcs.ICDT.2025.4},
  annote =	{Keywords: Supervised learning, multiclass classification problems, counting logic}
}
Document
Bag Containment of Join-On-Free Queries

Authors: George Konstantinidis and Fabio Mogavero


Abstract
Bag-semantics allows for atomic relations and query answers to contain multiple copies of the same data tuple, reflecting real-world database systems more accurately. Deciding containment under bag-semantics (or simply, bag-containment) for two conjunctive queries (CQs) requires determining whether the answer of the first query, taking multiplicities into account, is contained within the answer of the second query, across all databases. Despite numerous attempts over the last thirty years, determining the decidability and complexity of this problem remains open and is one of the prominent challenges in database theory, given its relevance in important applications. Previous works have established the decidability of the problem for specific classes of queries, among which is the bag-containment of projection-free queries (PFQs), i.e., queries without existentially quantified variables, into general CQs. In this work, we push the boundaries further by addressing a broader, yet natural, fragment of CQs, called join-on-free queries (JoFQ), which allows existential variables, while prohibiting joins involving them. We prove decidability of bag-containment of a JoFQ within a general CQ, placing the complexity of the problem in the first non-deterministic layer of the exponential hierarchy. The approach involves a homomorphism-counting reduction to the solution of a system of Diophantine inequalities with a specific structure (an undecidable problem in its general form) and an algorithm designed to address this category of inequalities.
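
To make the bag-containment notion concrete, here is a small Python sketch (illustrative only and not from the paper; the queries, relations, and function names are ours): under bag semantics the multiplicity of an answer is the number of ways the query body can be matched, and bag-containment asks whether these multiplicities are pointwise dominated on every database. The sketch checks the condition on a single toy instance only.

from collections import Counter

# Toy database; answer multiplicities arise from the number of matches of the body.
R = {("a", 1), ("a", 2), ("b", 1)}
S = {1, 2, 3}

def eval_q1(R):
    # Q1(x) :- R(x, y): multiplicity of x = number of y with R(x, y)
    return Counter(x for (x, y) in R)

def eval_q2(R, S):
    # Q2(x) :- R(x, y), S(y): multiplicity of x = number of y with R(x, y) and y in S
    return Counter(x for (x, y) in R if y in S)

a1, a2 = eval_q1(R), eval_q2(R, S)
# Bag-containment of Q1 in Q2 requires a1[x] <= a2[x] for every x on *every*
# database; here we only verify it on this particular instance.
print(all(a1[x] <= a2[x] for x in a1))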

Cite as

George Konstantinidis and Fabio Mogavero. Bag Containment of Join-On-Free Queries. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{konstantinidis_et_al:LIPIcs.ICDT.2025.5,
  author =	{Konstantinidis, George and Mogavero, Fabio},
  title =	{{Bag Containment of Join-On-Free Queries}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.5},
  URN =		{urn:nbn:de:0030-drops-229469},
  doi =		{10.4230/LIPIcs.ICDT.2025.5},
  annote =	{Keywords: Query Containment, Bag Semantics, Bag Containment, Diophantine Problems}
}
Document
PAC: Computing Join Queries with Semi-Covers

Authors: Heba Aamer and Bas Ketsman


Abstract
An increased and growing interest in large-scale data processing has triggered a demand for specialized algorithms that thrive in massively parallel shared-nothing systems. To answer the question of how to efficiently compute join queries in this setting, a rich line of research has emerged specifically for the Massively Parallel Communication (MPC) model. In the MPC model, algorithms are executed in rounds, with each round consisting of a synchronized communication phase and a separate local computation phase. The main cost measure is the load of the algorithm, defined as the maximum number of messages received by any server in any round. We study worst-case optimal algorithms for the join query evaluation problem in the constant-round MPC model. In the single-round variant of MPC, the worst-case optimal load for this problem is well understood and algorithms exist that guarantee this load for any join query. In the constant-round variant of MPC, queries can often be computed with a lower load compared to the single-round variant, but the worst-case optimal load is only known for specific classes of join queries, including graph-like and acyclic join queries, and the associated algorithms use very different techniques. In this paper, we propose a new constant-round MPC algorithm for computing join queries. Our algorithm is correct for every join query and its load matches (up to a polylog factor) the worst-case optimal load for at least all join queries that are acyclic or graph-like.

Cite as

Heba Aamer and Bas Ketsman. PAC: Computing Join Queries with Semi-Covers. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{aamer_et_al:LIPIcs.ICDT.2025.6,
  author =	{Aamer, Heba and Ketsman, Bas},
  title =	{{PAC: Computing Join Queries with Semi-Covers}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.6},
  URN =		{urn:nbn:de:0030-drops-229474},
  doi =		{10.4230/LIPIcs.ICDT.2025.6},
  annote =	{Keywords: Worst-case optimal load, MPC model, join queries}
}
Document
O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN

Authors: Junhao Gan, Anthony Wirth, and Zhuo Zhang


Abstract
In this paper, we investigate three fundamental problems in the Massively Parallel Computation (MPC) model: (i) grid graph connectivity, (ii) approximate Euclidean Minimum Spanning Tree (EMST), and (iii) approximate DBSCAN. Our first result is an O(1)-round Las Vegas (i.e., succeeding with high probability) MPC algorithm for computing the connected components on a d-dimensional c-penetration grid graph ((d,c)-grid graph), where both d and c are positive integer constants. In such a grid graph, each vertex is a point with integer coordinates in ℕ^d, and an edge can only exist between two distinct vertices whose 𝓁_∞-distance is at most c. To our knowledge, the best existing result for computing the connected components (CCs) on (d,c)-grid graphs in the MPC model is to run the state-of-the-art MPC CC algorithms that are designed for general graphs: they achieve O(log log n + log D) [Behnezhad et al., 2019] and O(log log n + log 1/(λ)) [Sepehr Assadi et al., 2019] rounds, respectively, where D is the diameter and λ is the spectral gap of the graph. With our grid graph connectivity technique, our second main result is an O(1)-round Las Vegas MPC algorithm for computing approximate Euclidean MST. The existing state-of-the-art result on this problem is the O(1)-round MPC algorithm proposed by Andoni et al. [Alexandr Andoni et al., 2014], which only guarantees an approximation on the overall weight in expectation. In contrast, our algorithm not only guarantees a deterministic overall weight approximation, but also achieves a deterministic edge-wise weight approximation. The latter property is crucial to many applications, such as finding the Bichromatic Closest Pair and Single-Linkage Clustering. Last, but not least, our third main result is an O(1)-round Las Vegas MPC algorithm for computing an approximate DBSCAN clustering in O(1)-dimensional Euclidean space.
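
As a small sequential illustration of the (d,c)-grid graph defined above (plain BFS in Python, ours for illustration; it is not the paper's O(1)-round MPC algorithm), the sketch below connects points whose 𝓁_∞-distance is at most c and labels connected components.

from collections import deque

def linf(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def grid_graph_components(points, c):
    # Edge between two distinct points iff their l_infinity distance is at most c.
    points = list(points)
    adj = {p: [] for p in points}
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            if linf(p, q) <= c:
                adj[p].append(q)
                adj[q].append(p)
    comp, label = {}, 0   # plain BFS labelling of connected components
    for s in points:
        if s in comp:
            continue
        comp[s] = label
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp[v] = label
                    queue.append(v)
        label += 1
    return comp

print(grid_graph_components([(0, 0), (1, 1), (5, 5)], c=1))
# {(0, 0): 0, (1, 1): 0, (5, 5): 1}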

Cite as

Junhao Gan, Anthony Wirth, and Zhuo Zhang. O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{gan_et_al:LIPIcs.ICDT.2025.7,
  author =	{Gan, Junhao and Wirth, Anthony and Zhang, Zhuo},
  title =	{{O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{7:1--7:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.7},
  URN =		{urn:nbn:de:0030-drops-229483},
  doi =		{10.4230/LIPIcs.ICDT.2025.7},
  annote =	{Keywords: Massively Parallel Computation, Graph Connectivity, Grid Graphs, Euclidean Minimum Spanning Tree, DBSCAN}
}
Document
Enumeration of Minimal Hitting Sets Parameterized by Treewidth

Authors: Batya Kenig and Dan Shlomo Mizrahi


Abstract
Enumerating the minimal hitting sets of a hypergraph is a problem which arises in many data management applications that include constraint mining, discovering unique column combinations, and enumerating database repairs. Previously, Eiter et al. [Thomas Eiter et al., 2003] showed that the minimal hitting sets of an n-vertex hypergraph, with treewidth w, can be enumerated with delay O^*(n^w) (ignoring polynomial factors), with space requirements that scale with the output size. We improve this to fixed-parameter-linear delay, following an FPT preprocessing phase. The memory consumption of our algorithm is exponential with respect to the treewidth of the hypergraph.
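
For readers unfamiliar with the terminology, the brute-force Python sketch below (standard background, ours for illustration; not the paper's FPT algorithm) enumerates the minimal hitting sets of a tiny hypergraph: a hitting set meets every hyperedge, and it is minimal if no proper subset is itself a hitting set.

from itertools import combinations

def minimal_hitting_sets(vertices, hyperedges):
    hyperedges = [frozenset(e) for e in hyperedges]
    found = []
    # Enumerate candidates by increasing size; any hitting set that strictly
    # contains a previously found (hence smaller) hitting set is not minimal.
    for k in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), k):
            s = frozenset(cand)
            if all(s & e for e in hyperedges) and not any(m < s for m in found):
                found.append(s)
    return found

# Hyperedges {a,b} and {b,c}: the minimal hitting sets are {b} and {a,c}.
print(minimal_hitting_sets({"a", "b", "c"}, [{"a", "b"}, {"b", "c"}]))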

Cite as

Batya Kenig and Dan Shlomo Mizrahi. Enumeration of Minimal Hitting Sets Parameterized by Treewidth. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kenig_et_al:LIPIcs.ICDT.2025.8,
  author =	{Kenig, Batya and Mizrahi, Dan Shlomo},
  title =	{{Enumeration of Minimal Hitting Sets Parameterized by Treewidth}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.8},
  URN =		{urn:nbn:de:0030-drops-229498},
  doi =		{10.4230/LIPIcs.ICDT.2025.8},
  annote =	{Keywords: Enumeration, Hitting sets}
}
Document
Query Languages for Neural Networks

Authors: Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche


Abstract
We lay the foundations for a database-inspired approach to interpreting and understanding neural network models by querying them using declarative languages. Towards this end we study different query languages, based on first-order logic, that mainly differ in their access to the neural network model. First-order logic over the reals naturally yields a language which views the network as a black box; only the input-output function defined by the network can be queried. This is essentially the approach of constraint query languages. On the other hand, a white-box language can be obtained by viewing the network as a weighted graph, and extending first-order logic with summation over weight terms. The latter approach is essentially an abstraction of SQL. In general, the two approaches are incomparable in expressive power, as we will show. Under natural circumstances, however, the white-box approach can subsume the black-box approach; this is our main result. We prove the result concretely for linear constraint queries over real functions definable by feedforward neural networks with a fixed number of hidden layers and piecewise linear activation functions.

Cite as

Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche. Query Languages for Neural Networks. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 9:1-9:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{grohe_et_al:LIPIcs.ICDT.2025.9,
  author =	{Grohe, Martin and Standke, Christoph and Steegmans, Juno and Van den Bussche, Jan},
  title =	{{Query Languages for Neural Networks}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{9:1--9:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.9},
  URN =		{urn:nbn:de:0030-drops-229508},
  doi =		{10.4230/LIPIcs.ICDT.2025.9},
  annote =	{Keywords: Expressive power of query languages, Machine learning models, languages for interpretability, explainable AI}
}
Document
Maximizing the Optimality Streak of Deferred Data Structuring (a.k.a. Database Cracking)

Authors: Yufei Tao


Abstract
This paper studies how to minimize the total cost of answering r queries over n elements in an online manner (i.e., the next query is given only after the previous query’s result is ready) when the value r ≤ n is unknown in advance. Traditional indexing, which first builds a complete index on the n elements before answering queries, may be unsuitable because the index’s construction time - usually Ω(n log n) - can become the performance bottleneck. In contrast, for many problems, a lower bound of Ω(n log (1+r)) holds on the total cost of r queries for every r ∈ [1, n]. Matching this lower bound is a primary objective of deferred data structuring (DDS), also known as database cracking in the system community. For a wide class of problems, we present generic reductions to convert traditional indexes into DDS algorithms that match the lower bound for a long range of r. For a decomposable problem, if a data structure can be built in O(n log n) time and has Q(n) query search time, our reduction yields an algorithm that runs in O(n log (1+r)) time for all r ≤ (n log n)/(Q(n)), where the upper bound (n log n)/(Q(n)) is asymptotically the best possible under mild constraints. In particular, if Q(n) = O(log n), then the O(n log (1+r))-time guarantee extends to all r ≤ n, with which we optimally settle a large variety of DDS problems. Our results can be generalized to a class of "spectrum indexable problems", which subsumes the class of decomposable problems.

Cite as

Yufei Tao. Maximizing the Optimality Streak of Deferred Data Structuring (a.k.a. Database Cracking). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{tao:LIPIcs.ICDT.2025.10,
  author =	{Tao, Yufei},
  title =	{{Maximizing the Optimality Streak of Deferred Data Structuring (a.k.a. Database Cracking)}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{10:1--10:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.10},
  URN =		{urn:nbn:de:0030-drops-229512},
  doi =		{10.4230/LIPIcs.ICDT.2025.10},
  annote =	{Keywords: Deferred Data Structuring, Database Cracking, Data Structures}
}
Document
Semantic Foundations of Equality Saturation

Authors: Dan Suciu, Yisu Remy Wang, and Yihong Zhang


Abstract
Equality saturation is an emerging technique for program and query optimization developed in the programming language community. It performs term rewriting over an E-graph, a data structure that compactly represents a program space. Despite its popularity, the theory of equality saturation lags behind the practice. In this paper, we define a fixpoint semantics of equality saturation based on tree automata and uncover deep connections between equality saturation and the chase. We characterize the class of chase sequences that correspond to equality saturation. We study the complexity of termination of equality saturation in three cases: single-instance, all-term-instance, and all-E-graph-instance. Finally, we define a syntactic criterion based on acyclicity that implies equality saturation termination.

Cite as

Dan Suciu, Yisu Remy Wang, and Yihong Zhang. Semantic Foundations of Equality Saturation. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 11:1-11:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{suciu_et_al:LIPIcs.ICDT.2025.11,
  author =	{Suciu, Dan and Wang, Yisu Remy and Zhang, Yihong},
  title =	{{Semantic Foundations of Equality Saturation}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{11:1--11:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.11},
  URN =		{urn:nbn:de:0030-drops-229523},
  doi =		{10.4230/LIPIcs.ICDT.2025.11},
  annote =	{Keywords: the chase, equality saturation, term rewriting, tree automata, query optimization}
}
Document
Tractable Conjunctive Queries over Static and Dynamic Relations

Authors: Ahmet Kara, Zheng Luo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang


Abstract
We investigate the evaluation of conjunctive queries over static and dynamic relations. While static relations are given as input and do not change, dynamic relations are subject to inserts and deletes. We characterise syntactically three classes of queries that admit constant update time and constant enumeration delay. We call such queries tractable. Depending on the class, the preprocessing time is linear, polynomial, or exponential (under data complexity, so the query size is constant). To decide whether a query is tractable, it does not suffice to analyse separately the sub-queries over the static relations and over the dynamic relations, respectively. Instead, we need to take the interaction between the static and the dynamic relations into account. Even when the sub-query over the dynamic relations is not tractable, the overall query can become tractable if the dynamic relations are sufficiently constrained by the static ones.

Cite as

Ahmet Kara, Zheng Luo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Tractable Conjunctive Queries over Static and Dynamic Relations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 12:1-12:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kara_et_al:LIPIcs.ICDT.2025.12,
  author =	{Kara, Ahmet and Luo, Zheng and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Tractable Conjunctive Queries over Static and Dynamic Relations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{12:1--12:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.12},
  URN =		{urn:nbn:de:0030-drops-229534},
  doi =		{10.4230/LIPIcs.ICDT.2025.12},
  annote =	{Keywords: fully dynamic algorithm, constant enumeration delay, constant update time}
}
Document
Symmetric Linear Arc Monadic Datalog and Gadget Reductions

Authors: Manuel Bodirsky and Florian Starke


Abstract
A Datalog program solves a constraint satisfaction problem (CSP) if and only if it derives the goal predicate precisely on the unsatisfiable instances of the CSP. There are three Datalog fragments that are particularly important for finite-domain constraint satisfaction: arc monadic Datalog, linear Datalog, and symmetric linear Datalog, each having good computational properties. We consider the fragment of Datalog where we impose all of these restrictions simultaneously, i.e., we study symmetric linear arc monadic (slam) Datalog. We characterise the CSPs that can be solved by a slam Datalog program as those that have a gadget reduction to a particular Boolean constraint satisfaction problem. We also present exact characterisations in terms of a homomorphism duality (which we call unfolded caterpillar duality), and in universal-algebraic terms (using known minor conditions, namely the existence of quasi Maltsev operations and k-absorptive operations of arity nk, for all n,k ≥ 1). Our characterisations also imply that the question whether a given finite-domain CSP can be expressed by a slam Datalog program is decidable.

Cite as

Manuel Bodirsky and Florian Starke. Symmetric Linear Arc Monadic Datalog and Gadget Reductions. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 13:1-13:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{bodirsky_et_al:LIPIcs.ICDT.2025.13,
  author =	{Bodirsky, Manuel and Starke, Florian},
  title =	{{Symmetric Linear Arc Monadic Datalog and Gadget Reductions}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{13:1--13:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.13},
  URN =		{urn:nbn:de:0030-drops-229548},
  doi =		{10.4230/LIPIcs.ICDT.2025.13},
  annote =	{Keywords: Datalog, Gadget Reductions, Homomorphism Dualities, Minor Conditions}
}
Document
Repairing Databases over Metric Spaces with Coincidence Constraints

Authors: Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc


Abstract
Datasets often contain values that naturally reside in a metric space: numbers, strings, geographical locations, machine-learned embeddings in a vector space, and so on. We study the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to retain consistency while minimizing the total distance between the original values and the repaired ones. We consider what we refer to as coincidence constraints, which include unary key constraints, inclusion constraints, foreign keys, and generally any restriction on the relationship between the numbers of cells of different labels (attributes) coinciding in a single value, for a fixed attribute set. We begin by showing that the problem is APX-hard for general metric spaces. We then present an algorithm solving the problem optimally for tree metrics, which generalize both the line metric (i.e., where repaired values are numbers) and the discrete metric (i.e., where we simply count the number of changed values). Combining our algorithm for tree metrics and a classic result on probabilistic tree embeddings, we design a (high probability) logarithmic-ratio approximation for general metrics. We also study the variant of the problem where we limit the allowed change of each individual value. In this variant, it is already NP-complete to decide the existence of any legal repair for a general metric, and we present a polynomial-time repairing algorithm for the case of a line metric.

Cite as

Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, and David Wajc. Repairing Databases over Metric Spaces with Coincidence Constraints. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kaminsky_et_al:LIPIcs.ICDT.2025.14,
  author =	{Kaminsky, Youri and Kimelfeld, Benny and Livshits, Ester and Naumann, Felix and Wajc, David},
  title =	{{Repairing Databases over Metric Spaces with Coincidence Constraints}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.14},
  URN =		{urn:nbn:de:0030-drops-229554},
  doi =		{10.4230/LIPIcs.ICDT.2025.14},
  annote =	{Keywords: Database repairs, metric spaces, coincidence constraints, inclusion constraints, foreign-key constraints}
}
Document
Query Repairs

Authors: Balder ten Cate, Phokion G. Kolaitis, and Carsten Lutz


Abstract
We formalize and study the problem of repairing database queries based on user feedback in the form of a collection of labeled examples. We propose a framework based on the notion of a proximity pre-order, and we investigate and compare query repairs for conjunctive queries (CQs) using different such pre-orders. The proximity pre-orders we consider are based on query containment and on distance metrics for CQs.

Cite as

Balder ten Cate, Phokion G. Kolaitis, and Carsten Lutz. Query Repairs. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{tencate_et_al:LIPIcs.ICDT.2025.15,
  author =	{ten Cate, Balder and Kolaitis, Phokion G. and Lutz, Carsten},
  title =	{{Query Repairs}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.15},
  URN =		{urn:nbn:de:0030-drops-229566},
  doi =		{10.4230/LIPIcs.ICDT.2025.15},
  annote =	{Keywords: Query Repairs, Databases, Conjunctive Queries, Data Examples, Fitting}
}
Document
Quantum Data Sketches

Authors: Qin Zhang and Mohsen Heidari


Abstract
Recent advancements in quantum technologies, particularly in quantum sensing and simulation, have facilitated the generation and analysis of inherently quantum data. This progress underscores the necessity for developing efficient and scalable quantum data management strategies. This goal faces immense challenges due to the exponential dimensionality of quantum data and its unique quantum properties such as no-cloning and measurement stochasticity. Specifically, classical storage and manipulation of an arbitrary n-qubit quantum state requires exponential space and time. Hence, there is a critical need to revisit foundational data management concepts and algorithms for quantum data. In this paper, we propose succinct quantum data sketches to support basic database operations such as search and selection. We view our work as an initial step towards the development of a quantum data management model, opening up many possibilities for future research in this direction.

Cite as

Qin Zhang and Mohsen Heidari. Quantum Data Sketches. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 16:1-16:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{zhang_et_al:LIPIcs.ICDT.2025.16,
  author =	{Zhang, Qin and Heidari, Mohsen},
  title =	{{Quantum Data Sketches}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{16:1--16:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.16},
  URN =		{urn:nbn:de:0030-drops-229570},
  doi =		{10.4230/LIPIcs.ICDT.2025.16},
  annote =	{Keywords: quantum data representation, data sketching, query execution}
}
Document
Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins

Authors: Kyle Deeds and Timo Camillo Merkl


Abstract
In the last decade, various works have used statistics on relations to improve both the theory and practice of conjunctive query execution. Starting with the AGM bound which took advantage of relation sizes, later works incorporated statistics like functional dependencies and degree constraints. Each new statistic prompted work along two lines: bounding the size of conjunctive query outputs and designing worst-case optimal join algorithms. In this work, we continue in this vein by introducing a new statistic called a partition constraint. This statistic captures latent structure within relations by partitioning them into sub-relations which each have much tighter degree constraints. We show that this approach can both refine existing cardinality bounds and improve existing worst-case optimal join algorithms.
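
As a textbook instance of the AGM bound mentioned above (background material, not a contribution of this paper): for the triangle query Q(x,y,z) :- R(x,y), S(y,z), T(z,x), the fractional edge cover that puts weight 1/2 on each atom yields |Q| ≤ |R|^{1/2} · |S|^{1/2} · |T|^{1/2}, i.e. N^{3/2} when |R| = |S| = |T| = N. Partition constraints refine such bounds by splitting each relation into parts that individually satisfy much tighter degree constraints.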

Cite as

Kyle Deeds and Timo Camillo Merkl. Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{deeds_et_al:LIPIcs.ICDT.2025.17,
  author =	{Deeds, Kyle and Merkl, Timo Camillo},
  title =	{{Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.17},
  URN =		{urn:nbn:de:0030-drops-229588},
  doi =		{10.4230/LIPIcs.ICDT.2025.17},
  annote =	{Keywords: Worst-Case Optimal Joins, Cardinality Bounds, Degeneracy, Degree Constraints, Partition Constraints}
}
Document
A Framework for Extraction and Transformation of Documents

Authors: Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt


Abstract
We present a theoretical framework for the extraction and transformation of text documents as a two-phase process: The first phase uses document spanners to extract information from the input document. The second phase transforms the extracted information into a suitable output. To support several reasonable extract-transform scenarios, we propose for the first phase an extension of document spanners from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans instead of only single spans. We focus on multispanners described by regex formulas, and we prove that these have the same desirable properties as standard regular spanners. To formalize the second phase, we consider transformations that map every document-tuple pair, where the tuple comes from the (multi)span-relation extracted in the first phase, into a new output document. The specification of the two phases is what we call an extract-transform (ET) program, which covers practically relevant extract-transform tasks. In this paper, our main technical goal is to identify a broad class of ET programs that can be evaluated efficiently. We specifically focus on the scenario of regular ET programs: the extraction phase is given by a regex multispanner and the transformation phase is given by a regular string-to-string function. We show that for any regular ET program, given an input document, we can enumerate all final output documents with output-linear delay after linear preprocessing. As a side effect, we characterize the expressive power of regular ET programs and also show that they have desirable properties, like being closed under composition.

Cite as

Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt. A Framework for Extraction and Transformation of Documents. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{riveros_et_al:LIPIcs.ICDT.2025.18,
  author =	{Riveros, Cristian and Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Framework for Extraction and Transformation of Documents}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.18},
  URN =		{urn:nbn:de:0030-drops-229593},
  doi =		{10.4230/LIPIcs.ICDT.2025.18},
  annote =	{Keywords: Information extraction, Document spanners, Transducers, Query evaluation}
}
Document
Beyond Logarithmic Bounds: Querying in Constant Expected Time with Learned Indexes

Authors: Luis Alberto Croquevielle, Guang Yang, Liang Liang, Ali Hadian, and Thomas Heinis


Abstract
Learned indexes leverage machine learning models to accelerate query answering in databases, showing impressive practical performance. However, theoretical understanding of these methods remains incomplete. Existing research suggests that learned indexes have superior asymptotic complexity compared to their non-learned counterparts, but these findings have been established under restrictive probabilistic assumptions. Specifically, for a sorted array with n elements, it has been shown that learned indexes can find a key in O(log(log n)) expected time using at most linear space, compared with O(log n) for non-learned methods. In this work, we prove O(1) expected time can be achieved with at most linear space, thereby establishing the tightest upper bound so far for the time complexity of an asymptotically optimal learned index. Notably, we use weaker probabilistic assumptions than prior research, meaning our work generalizes previous results. Furthermore, we introduce a new measure of statistical complexity for data. This metric exhibits an information-theoretical interpretation and can be estimated in practice. This characterization provides further theoretical understanding of learned indexes, by helping to explain why some datasets seem to be particularly challenging for these methods.
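
To make the learned-index setting concrete, here is a minimal Python sketch (illustrative only; the linear model, the local search, and all names are ours and not the constructions or assumptions analyzed in the paper): fit a model from keys to positions in the sorted array, predict a position for the lookup key, and finish with a local search around the prediction.

import bisect

def fit_linear(keys):
    # Least-squares line mapping key -> rank in the sorted array.
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var = sum((k - mean_x) ** 2 for k in keys)
    cov = sum((k - mean_x) * (i - mean_y) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_y - slope * mean_x

def lookup(keys, model, q):
    slope, intercept = model
    pos = min(max(round(slope * q + intercept), 0), len(keys) - 1)
    # Exponential search outward from the predicted position, then binary search.
    lo, hi, radius = pos, pos, 1
    while lo > 0 and keys[lo] > q:
        lo = max(0, pos - radius)
        radius *= 2
    radius = 1
    while hi < len(keys) - 1 and keys[hi] < q:
        hi = min(len(keys) - 1, pos + radius)
        radius *= 2
    i = bisect.bisect_left(keys, q, lo, hi + 1)
    return i if i < len(keys) and keys[i] == q else None

keys = sorted([3, 8, 15, 16, 23, 42, 99])
print(lookup(keys, fit_linear(keys), 23))   # 4, the index of 23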

Cite as

Luis Alberto Croquevielle, Guang Yang, Liang Liang, Ali Hadian, and Thomas Heinis. Beyond Logarithmic Bounds: Querying in Constant Expected Time with Learned Indexes. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 19:1-19:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{croquevielle_et_al:LIPIcs.ICDT.2025.19,
  author =	{Croquevielle, Luis Alberto and Yang, Guang and Liang, Liang and Hadian, Ali and Heinis, Thomas},
  title =	{{Beyond Logarithmic Bounds: Querying in Constant Expected Time with Learned Indexes}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{19:1--19:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.19},
  URN =		{urn:nbn:de:0030-drops-229603},
  doi =		{10.4230/LIPIcs.ICDT.2025.19},
  annote =	{Keywords: Learned Indexes, Expected Time, Stochastic Processes, R\'{e}nyi Entropy}
}
Document
A Formal Language Perspective on Factorized Representations

Authors: Benny Kimelfeld, Wim Martens, and Matthias Niewerth


Abstract
Factorized representations (FRs) are a well-known tool to succinctly represent results of join queries and have been originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization; of which we offer a selection in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of path multiset representations, a formalism that was recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.

Cite as

Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A Formal Language Perspective on Factorized Representations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kimelfeld_et_al:LIPIcs.ICDT.2025.20,
  author =	{Kimelfeld, Benny and Martens, Wim and Niewerth, Matthias},
  title =	{{A Formal Language Perspective on Factorized Representations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.20},
  URN =		{urn:nbn:de:0030-drops-229614},
  doi =		{10.4230/LIPIcs.ICDT.2025.20},
  annote =	{Keywords: Databases, relational databases, graph databases, factorized databases, regular path queries, compact representations}
}
Document
Targeted Least Cardinality Candidate Key for Relational Databases

Authors: Vasileios Nakos, Hung Q. Ngo, and Charalampos E. Tsourakakis


Abstract
Functional dependencies (FDs) are a central theme in databases, playing a major role in the design of database schemas and the optimization of queries [Ramakrishnan and Gehrke, 2003]. In this work, we introduce the targeted least cardinality candidate key problem (TCAND). This problem is defined over a set of functional dependencies ℱ and a target variable set T ⊆ V, and it aims to find the smallest set X ⊆ V such that the FD X → T can be derived from ℱ. The TCAND problem generalizes the well-known NP-hard problem of finding the least cardinality candidate key [Lucchesi and Osborn, 1978], which has been previously demonstrated to be at least as difficult as the set cover problem. We present an integer programming (IP) formulation for the TCAND problem, analogous to a layered set cover problem. We analyze its linear programming (LP) relaxation from two perspectives: we propose two approximation algorithms and investigate the integrality gap. Our findings indicate that the approximation upper bounds for our algorithms are not significantly improvable through LP rounding, a notable distinction from the standard Set Cover problem. Additionally, we discover that a generalization of the TCAND problem is equivalent to a variant of the Set Cover problem, named Red Blue Set Cover [Carr et al., 2000], which cannot be approximated within a sub-polynomial factor in polynomial time under plausible conjectures [Chlamtáč et al., 2023]. Despite the extensive history surrounding the issue of identifying the least cardinality candidate key, our research contributes new theoretical insights, novel algorithms, and demonstrates that the general TCAND problem poses complexities beyond those encountered in the Set Cover problem.
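
To illustrate the TCAND problem statement (a standard attribute-closure computation plus brute force, ours for illustration; not the IP-based algorithms of the paper): X → T is derivable from ℱ exactly when T is contained in the closure of X under ℱ, and the Python sketch below searches for a smallest such X.

from itertools import combinations

def closure(X, fds):
    # Standard attribute closure: repeatedly fire FDs whose left-hand side is covered.
    closed, changed = set(X), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed

def smallest_target_key(attrs, fds, target):
    # TCAND by brute force: a smallest X with target contained in closure(X).
    for k in range(len(attrs) + 1):
        for X in combinations(sorted(attrs), k):
            if target <= closure(X, fds):
                return set(X)
    return None

attrs = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"A"}, {"C"})]
print(smallest_target_key(attrs, fds, target={"B", "C"}))   # {'A'}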

Cite as

Vasileios Nakos, Hung Q. Ngo, and Charalampos E. Tsourakakis. Targeted Least Cardinality Candidate Key for Relational Databases. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{nakos_et_al:LIPIcs.ICDT.2025.21,
  author =	{Nakos, Vasileios and Ngo, Hung Q. and Tsourakakis, Charalampos E.},
  title =	{{Targeted Least Cardinality Candidate Key for Relational Databases}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.21},
  URN =		{urn:nbn:de:0030-drops-229628},
  doi =		{10.4230/LIPIcs.ICDT.2025.21},
  annote =	{Keywords: functional dependencies, candidate key, approximation algorithms, hardness}
}
Document
Queries with External Predicates

Authors: Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, and Cristina Sirangelo


Abstract
Real-life query languages feature external predicates such as user-defined functions or built-in arithmetic and string operations. These predicates are often infinite, potentially leading to unsafe or non-computable queries. To overcome this, traditional languages, such as SQL, put significant syntactic restrictions on the use of external predicates. These restrictions have been relaxed in a number of modern query languages, each in its own way. Our goal therefore is to provide a theoretical basis for querying with external predicates. To this end, we formalize queries with external predicates based on the notion of access patterns. We develop a suitable evaluation model, based on Turing machines with oracles, and tailor the classical notion of query safety to it. Since query safety is undecidable in general, we can only produce sufficient conditions for guaranteeing safety. We do so by developing an inference system to derive safety and computability for relational algebra, first-order logic, as well as for a language that combines them both.

Cite as

Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, and Cristina Sirangelo. Queries with External Predicates. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 22:1-22:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{guagliardo_et_al:LIPIcs.ICDT.2025.22,
  author =	{Guagliardo, Paolo and Libkin, Leonid and Marsault, Victor and Martens, Wim and Murlak, Filip and Peterfreund, Liat and Sirangelo, Cristina},
  title =	{{Queries with External Predicates}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{22:1--22:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.22},
  URN =		{urn:nbn:de:0030-drops-229635},
  doi =		{10.4230/LIPIcs.ICDT.2025.22},
  annote =	{Keywords: External predicates, Query safety, Computational model, Oracles, Infinite predicates, Access patterns, Relational algebra, First-order logic}
}
Document
A Simple Algorithm for Worst Case Optimal Join and Sampling

Authors: Florent Capelli, Oliver Irwin, and Sylvain Salvati


Abstract
We present an elementary branch and bound algorithm with a simple analysis of why it achieves worst-case optimality for join queries on classes of databases defined respectively by cardinality or acyclic degree constraints. We then show that if one is given a reasonable way for recursively estimating upper bounds on the number of answers of the join queries, our algorithm can be turned into an algorithm for uniformly sampling answers with expected running time Õ(UP/OUT), where UP is the upper bound, OUT is the actual number of answers, and Õ(⋅) ignores polylogarithmic factors. Our approach recovers recent results on worst-case optimal join algorithms and sampling in a modular, clean and elementary way.
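
For readers unfamiliar with the style of algorithm being analyzed, here is a minimal variable-at-a-time join for the triangle query, written purely as background; it is not the paper's branch-and-bound algorithm and makes no optimality claims. All names and data are ours.

# Background sketch (not the paper's algorithm): join the triangle query
# Q(x,y,z) :- R(x,y), S(y,z), T(z,x) by branching on one variable at a time,
# the pattern that worst-case optimal join algorithms build on.

from collections import defaultdict

def triangle_join(R, S, T):
    """Enumerate (x, y, z) with (x,y) in R, (y,z) in S, (z,x) in T."""
    R_by_x = defaultdict(set)   # x -> {y}
    S_by_y = defaultdict(set)   # y -> {z}
    T_by_z = defaultdict(set)   # z -> {x}
    for x, y in R: R_by_x[x].add(y)
    for y, z in S: S_by_y[y].add(z)
    for z, x in T: T_by_z[z].add(x)
    T_xs = {x for (_, x) in T}

    out = []
    for x in set(R_by_x) & T_xs:                 # branch on x
        for y in R_by_x[x] & set(S_by_y):        # branch on y, consistent with R and S
            for z in S_by_y[y]:                  # branch on z, then check T(z, x)
                if x in T_by_z[z]:
                    out.append((x, y, z))
    return out

R = {(1, 2), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1)}
print(sorted(triangle_join(R, S, T)))            # [(1, 2, 3)]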

Cite as

Florent Capelli, Oliver Irwin, and Sylvain Salvati. A Simple Algorithm for Worst Case Optimal Join and Sampling. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{capelli_et_al:LIPIcs.ICDT.2025.23,
  author =	{Capelli, Florent and Irwin, Oliver and Salvati, Sylvain},
  title =	{{A Simple Algorithm for Worst Case Optimal Join and Sampling}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.23},
  URN =		{urn:nbn:de:0030-drops-229647},
  doi =		{10.4230/LIPIcs.ICDT.2025.23},
  annote =	{Keywords: join queries, worst-case optimality, uniform sampling}
}
Document
Learning Tree Pattern Transformations

Authors: Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, and Thomas Zeume


Abstract
Explaining why and how a tree t structurally differs from another tree t^⋆ is a question that is encountered throughout computer science, including in understanding tree-structured data such as XML or JSON data. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: suppose we are given a set {(t₁, t₁^⋆),… , (t_n, t_n^⋆)} of pairs of labelled, ordered trees; is there a small set of rules that explains the structural differences between all pairs (t_i, t_i^⋆)? This raises two research questions: (i) what is a good notion of "rule" in this context; and (ii) how can sets of rules explaining a data set be learned algorithmically? We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g. showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.

Cite as

Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, and Thomas Zeume. Learning Tree Pattern Transformations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{neider_et_al:LIPIcs.ICDT.2025.24,
  author =	{Neider, Daniel and Sabellek, Leif and Schmidt, Johannes and Vehlken, Fabian and Zeume, Thomas},
  title =	{{Learning Tree Pattern Transformations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{24:1--24:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.24},
  URN =		{urn:nbn:de:0030-drops-229652},
  doi =		{10.4230/LIPIcs.ICDT.2025.24},
  annote =	{Keywords: Tree pattern transformations, learning from positive examples, computational complexity}
}
Document
Optimal Oblivious Algorithms for Multi-Way Joins

Authors: Xiao Hu and Zhiang Wu


Abstract
In cloud databases, cloud computation over sensitive data uploaded by clients inevitably causes concern about data security and privacy. Even if cryptographic primitives and trusted computing environments are integrated into query processing to safeguard the actual contents of the data, access patterns of algorithms can still leak private information about data. Oblivious RAM (ORAM) and circuits are two generic approaches to address this issue, ensuring that access patterns of algorithms remain oblivious to the data. However, deploying these methods on insecure algorithms, particularly for multi-way join processing, is computationally expensive and inherently challenging. In this paper, we propose a novel sorting-based algorithm for multi-way join processing that operates without relying on ORAM simulations or other security assumptions. Our algorithm is a non-trivial, provably oblivious composition of basic primitives, with time complexity matching the insecure worst-case optimal join algorithm, up to a logarithmic factor. Furthermore, it is cache-agnostic, with cache complexity matching the insecure lower bound, also up to a logarithmic factor. This clean and straightforward approach has the potential to be extended to other security settings and implemented in practical database systems.

Cite as

Xiao Hu and Zhiang Wu. Optimal Oblivious Algorithms for Multi-Way Joins. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 25:1-25:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{hu_et_al:LIPIcs.ICDT.2025.25,
  author =	{Hu, Xiao and Wu, Zhiang},
  title =	{{Optimal Oblivious Algorithms for Multi-Way Joins}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{25:1--25:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.25},
  URN =		{urn:nbn:de:0030-drops-229662},
  doi =		{10.4230/LIPIcs.ICDT.2025.25},
  annote =	{Keywords: oblivious algorithms, multi-way joins, worst-case optimality}
}
Document
Dynamic Direct Access of MSO Query Evaluation over Strings

Authors: Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros


Abstract
We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given an MSO query with first-order free variables represented by an unambiguous variable-set automaton 𝒜 with state set Q and variables X and a string s, computes a data structure in time 𝒪(|Q|^ω⋅ |X|² ⋅ |s|) and, then, given an index i retrieves, using the data structure, the i-th output of the evaluation of 𝒜 over s in time 𝒪(|Q|^ω ⋅ |X|³ ⋅ log(|s|)²) where ω is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data. Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g. efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries and a novel data structure for dynamic word problems over semi-groups which yields an overall algorithm that is elegant and easy to formulate.
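
The flavor of the "dynamic word problem" ingredient can be illustrated with a toy segment tree that stores, per segment, the Boolean transition matrix of a fixed automaton, so that point updates only recompute logarithmically many products. This is our own simplification (point updates only, string length a power of two) and is far weaker than the paper's data structure.

# Toy illustration, not the paper's data structure: maintain a string under
# point updates and answer "is it accepted by a fixed automaton?" by keeping,
# in a segment tree, the Boolean transition matrix of each segment.
# For simplicity, assume len(s) is a power of two.

Q = 2                                    # states 0,1: parity of 'a' seen so far
DELTA = {                                # per-letter transition matrices M[c][p][q]
    "a": [[0, 1], [1, 0]],
    "b": [[1, 0], [0, 1]],
}
INIT, FINAL = 0, {0}                     # accept strings with an even number of 'a'

def mat_mul(A, B):
    return [[1 if any(A[p][r] and B[r][q] for r in range(Q)) else 0
             for q in range(Q)] for p in range(Q)]

class SegTree:
    def __init__(self, s):
        self.n = len(s)
        self.tree = [None] * (2 * self.n)
        for i, c in enumerate(s):
            self.tree[self.n + i] = DELTA[c]
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = mat_mul(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, c):              # replace the letter at position i by c
        i += self.n
        self.tree[i] = DELTA[c]
        i //= 2
        while i:
            self.tree[i] = mat_mul(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def accepted(self):
        M = self.tree[1]                 # transition matrix of the whole string
        return any(M[INIT][q] for q in FINAL)

t = SegTree("abab")
print(t.accepted())      # True: two 'a's
t.update(1, "a")         # string becomes "aaab"
print(t.accepted())      # False: three 'a's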

Cite as

Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros. Dynamic Direct Access of MSO Query Evaluation over Strings. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 26:1-26:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{bourhis_et_al:LIPIcs.ICDT.2025.26,
  author =	{Bourhis, Pierre and Capelli, Florent and Mengel, Stefan and Riveros, Cristian},
  title =	{{Dynamic Direct Access of MSO Query Evaluation over Strings}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{26:1--26:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.26},
  URN =		{urn:nbn:de:0030-drops-229675},
  doi =		{10.4230/LIPIcs.ICDT.2025.26},
  annote =	{Keywords: Query evaluation, direct access, MSO queries}
}
Document
Parallel Query Processing with Heterogeneous Machines

Authors: Simon Frisk and Paraschos Koutris


Abstract
We study the problem of computing a full Conjunctive Query in parallel using p heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function mapping the number of bits it receives to a cost. An optimal algorithm should minimize the maximum cost across all machines. We consider algorithms over a single communication round and give a lower bound and matching upper bound for databases where each relation has the same cardinality. We do this not only for linear cost functions, as in previous work, but also for more general cost functions. For databases with relations of different cardinalities, we also find a lower bound, and give matching upper bounds for specific queries such as the cartesian product, the join, the star query, and the triangle query. Our approach is inspired by the HyperCube algorithm, but there are additional challenges involved when machines have heterogeneous cost functions.
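
For context, the sketch below shows the classical uniform HyperCube shuffle for the triangle query on p machines; the paper generalizes this one-round scheme to heterogeneous cost functions, so the code (and its hash and data choices, which are ours) is the baseline, not the paper's algorithm.

# Sketch of the classical HyperCube shuffle for the triangle query
# Q(x,y,z) :- R(x,y), S(y,z), T(z,x) on p machines arranged in a
# p^(1/3) x p^(1/3) x p^(1/3) grid, after which each machine joins locally.

def shuffle(R, S, T, p, h=hash):
    side = round(p ** (1 / 3))
    assert side ** 3 == p, "for simplicity, assume p is a perfect cube"
    machines = {(i, j, k): {"R": set(), "S": set(), "T": set()}
                for i in range(side) for j in range(side) for k in range(side)}
    for (x, y) in R:                      # R(x,y) lacks z: replicate along the z axis
        for k in range(side):
            machines[(h(x) % side, h(y) % side, k)]["R"].add((x, y))
    for (y, z) in S:                      # replicate along the x axis
        for i in range(side):
            machines[(i, h(y) % side, h(z) % side)]["S"].add((y, z))
    for (z, x) in T:                      # replicate along the y axis
        for j in range(side):
            machines[(h(x) % side, j, h(z) % side)]["T"].add((z, x))
    return machines

def local_triangles(parts):
    return {(x, y, z)
            for (x, y) in parts["R"]
            for (y2, z) in parts["S"] if y2 == y
            if (z, x) in parts["T"]}

R = {(1, 2), (2, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 2)}
machines = shuffle(R, S, T, p=8)
print(sorted(set().union(*(local_triangles(m) for m in machines.values()))))
# [(1, 2, 3), (2, 3, 1)]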

Cite as

Simon Frisk and Paraschos Koutris. Parallel Query Processing with Heterogeneous Machines. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 27:1-27:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{frisk_et_al:LIPIcs.ICDT.2025.27,
  author =	{Frisk, Simon and Koutris, Paraschos},
  title =	{{Parallel Query Processing with Heterogeneous Machines}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{27:1--27:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.27},
  URN =		{urn:nbn:de:0030-drops-229683},
  doi =		{10.4230/LIPIcs.ICDT.2025.27},
  annote =	{Keywords: Joins, Massively Parallel Computation, Heterogeneous}
}
Document
Generalized Covers for Conjunctive Queries

Authors: Paraschos Koutris


Abstract
Covers of query results were introduced as succinct lossless representations of join query outputs. A cover is a subset of the query result from which we can efficiently enumerate the output with constant delay and linear preprocessing time. However, covers are dependent on a single tree decomposition of the query. In this work, we generalize the notion of a cover to a set of multiple tree decompositions. We show that this generalization can potentially produce asymptotically smaller covers while maintaining the properties of constant-delay enumeration and linear preprocessing time. In particular, given a set of tree decompositions, we can determine exactly the asymptotic size of a minimum cover, which is tied to the notion of entropic width of the query. We also provide a simple greedy algorithm that computes this cover efficiently. Finally, we relate covers to semiring circuits when the semiring is idempotent.

Cite as

Paraschos Koutris. Generalized Covers for Conjunctive Queries. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 28:1-28:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{koutris:LIPIcs.ICDT.2025.28,
  author =	{Koutris, Paraschos},
  title =	{{Generalized Covers for Conjunctive Queries}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{28:1--28:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.28},
  URN =		{urn:nbn:de:0030-drops-229698},
  doi =		{10.4230/LIPIcs.ICDT.2025.28},
  annote =	{Keywords: Conjunctive Query, tree decomposition, cover}
}
Document
FC-Datalog as a Framework for Efficient String Querying

Authors: Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger


Abstract
Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that when extended with constraints for regular languages can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.

Cite as

Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger. FC-Datalog as a Framework for Efficient String Querying. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
  author =	{Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
  title =	{{FC-Datalog as a Framework for Efficient String Querying}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{29:1--29:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
  URN =		{urn:nbn:de:0030-drops-229708},
  doi =		{10.4230/LIPIcs.ICDT.2025.29},
  annote =	{Keywords: Information extraction, word equations, datalog, document spanners, regex}
}
Document
An FPRAS for Model Counting for Non-Deterministic Read-Once Branching Programs

Authors: Kuldeep S. Meel and Alexis de Colnet


Abstract
Non-deterministic read-once branching programs, also known as non-deterministic free binary decision diagrams (nFBDD), are a fundamental data structure in computer science for representing Boolean functions. In this paper, we focus on #nFBDD, the problem of model counting for non-deterministic read-once branching programs. The #nFBDD problem is #P-hard, and it is known that there exists a quasi-polynomial randomized approximation scheme for #nFBDD. In this paper, we provide the first FPRAS for #nFBDD. Our result relies on the introduction of new analysis techniques that focus on bounding the dependence of samples.

Cite as

Kuldeep S. Meel and Alexis de Colnet. An FPRAS for Model Counting for Non-Deterministic Read-Once Branching Programs. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 30:1-30:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{meel_et_al:LIPIcs.ICDT.2025.30,
  author =	{Meel, Kuldeep S. and de Colnet, Alexis},
  title =	{{An FPRAS for Model Counting for Non-Deterministic Read-Once Branching Programs}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{30:1--30:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.30},
  URN =		{urn:nbn:de:0030-drops-229717},
  doi =		{10.4230/LIPIcs.ICDT.2025.30},
  annote =	{Keywords: Approximate model counting, FPRAS, Knowledge compilation, nFBDD}
}
Document
On Deciding the Data Complexity of Answering Linear Monadic Datalog Queries with LTL Operators

Authors: Alessandro Artale, Anton Gnatenko, Vladislav Ryzhikov, and Michael Zakharyaschev


Abstract
Our concern is the data complexity of answering linear monadic datalog queries whose atoms in the rule bodies can be prefixed by operators of linear temporal logic LTL. We first observe that, for data complexity, answering any connected query with operators ○/○- (at the next/previous moment) is either in AC⁰, or in ACC⁰\AC⁰, or NC¹-complete, or L-hard and in NL. Then we show that the problem of deciding L-hardness of answering such queries is PSpace-complete, while checking membership in the classes AC⁰ and ACC⁰ as well as NC¹-completeness can be done in ExpSpace. Finally, we prove that membership in AC⁰ or in ACC⁰, NC¹-completeness, and L-hardness are undecidable for queries with operators ◇/◇- (sometime in the future/past) provided that NC¹ ≠ NL and L ≠ NL.

Cite as

Alessandro Artale, Anton Gnatenko, Vladislav Ryzhikov, and Michael Zakharyaschev. On Deciding the Data Complexity of Answering Linear Monadic Datalog Queries with LTL Operators. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 31:1-31:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{artale_et_al:LIPIcs.ICDT.2025.31,
  author =	{Artale, Alessandro and Gnatenko, Anton and Ryzhikov, Vladislav and Zakharyaschev, Michael},
  title =	{{On Deciding the Data Complexity of Answering Linear Monadic Datalog Queries with LTL Operators}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{31:1--31:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.31},
  URN =		{urn:nbn:de:0030-drops-229723},
  doi =		{10.4230/LIPIcs.ICDT.2025.31},
  annote =	{Keywords: Linear monadic datalog, linear temporal logic, data complexity}
}
Document
The Free Termination Property of Queries over Time

Authors: Conor Power, Paraschos Koutris, and Joseph M. Hellerstein


Abstract
Building on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow nodes in a distributed (database) system to unilaterally terminate execution even though they may receive additional data or messages in the future? This completeness question is complementary to the soundness questions studied in the CALM literature. We also develop a new model based on semiautomata that allows us to bridge from the relational transducer model of the CALM papers to algebraic models that are popular among software engineers (e.g. CRDTs) and of increasing interest to database theory for datalog extensions and incremental view maintenance.
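
A toy example of the free-termination intuition (ours, not the paper's relational transducer or semiautomaton model): for a monotone query over a grow-only set of received values, a node may stop unilaterally as soon as the answer is determined, since later messages cannot retract it; a non-monotone query offers no such point.

def monotone_node(stream, target):
    """Q = 'does the set of received values contain target?'
    Once true, it stays true under further messages, so the node can stop."""
    seen = set()
    for value in stream:
        seen.add(value)
        if target in seen:
            return True                  # free termination: the answer is final
    return False                         # only known complete if the stream truly ended

print(monotone_node(iter([3, 7, 42, 9, 1]), target=42))   # True, after three messages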

Cite as

Conor Power, Paraschos Koutris, and Joseph M. Hellerstein. The Free Termination Property of Queries over Time. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 32:1-32:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{power_et_al:LIPIcs.ICDT.2025.32,
  author =	{Power, Conor and Koutris, Paraschos and Hellerstein, Joseph M.},
  title =	{{The Free Termination Property of Queries over Time}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{32:1--32:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.32},
  URN =		{urn:nbn:de:0030-drops-229736},
  doi =		{10.4230/LIPIcs.ICDT.2025.32},
  annote =	{Keywords: distributed systems, algebraic data models, coordination-free systems}
}
Document
Database Theory in Action
Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk)

Authors: Silviu Maniu and Pierre Senellart


Abstract
There has been a rich literature in database theory on how to model and manage the provenance of data (for instance using the semiring framework) and its uncertainty (in particular via probabilistic databases). In this article, we explain how these results have been used as the basis for practical implementations, notably in the ProvSQL system, and how these implementations need to be adapted for the efficient management of provenance and probability for real-world data.
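
As a hedged illustration of the semiring framework mentioned above (our own toy encoding, not ProvSQL's actual interface), the sketch keeps a provenance polynomial with each query answer and then evaluates it in the natural-number semiring to recover bag multiplicities.

# Provenance polynomials as bags of monomials; evaluating the same annotation
# under different semirings gives different "meanings" of the query result.

from collections import Counter
from functools import reduce
import operator

def times(a, b):        # semiring product of two polynomials
    out = Counter()
    for m1, c1 in a.items():
        for m2, c2 in b.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def plus(a, b):         # semiring sum
    return a + b

def token(t):           # a fresh provenance token as a polynomial
    return Counter({(t,): 1})

# Annotated relations: tuple -> provenance polynomial.
R = {("alice", "ny"): token("r1"), ("bob", "ny"): token("r2")}
S = {("ny", "usa"): token("s1")}

# Q(x, z) :- R(x, y), S(y, z): join on y, project away y.
Q = {}
for (x, y), pr in R.items():
    for (y2, z), ps in S.items():
        if y == y2:
            Q[(x, z)] = plus(Q.get((x, z), Counter()), times(pr, ps))

print(Q[("alice", "usa")])         # Counter({('r1', 's1'): 1})

# Evaluating the polynomial in the natural-number semiring (token -> its
# multiplicity in a bag database) recovers bag semantics:
def evaluate(poly, val):
    return sum(c * reduce(operator.mul, (val[t] for t in m), 1)
               for m, c in poly.items())

print(evaluate(Q[("alice", "usa")], {"r1": 2, "s1": 3}))   # 6 copies under bags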

Cite as

Silviu Maniu and Pierre Senellart. Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 33:1-33:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{maniu_et_al:LIPIcs.ICDT.2025.33,
  author =	{Maniu, Silviu and Senellart, Pierre},
  title =	{{Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk)}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{33:1--33:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.33},
  URN =		{urn:nbn:de:0030-drops-229746},
  doi =		{10.4230/LIPIcs.ICDT.2025.33},
  annote =	{Keywords: provenance, probabilistic data, ProvSQL}
}
Document
Database Theory in Action
Database Theory in Action: Search-Based Program Optimization

Authors: Yihong Zhang, Dan Suciu, Yisu Remy Wang, and Max Willsey


Abstract
Recent work in programming languages developed an approach to term rewriting based on equality saturation (EqSat), which, instead of applying the rewrite rules destructively, maintains all equivalent expressions in a structure called an E-graph. This paper describes two surprising connections between EqSat and databases, going both ways. On the one hand, equality saturation can be viewed as a query evaluation problem, with great benefits. On the other hand, most sophisticated SQL query optimizers are based on the Volcano/Cascades framework which, we explain, is a variant of EqSat.
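
The "equality saturation as query evaluation" view can be hinted at with a toy encoding (ours, not the paper's or any library's API): the E-graph is stored as relations, one per operator, and the left-hand side of a rewrite rule is located by a conjunctive query over them.

# E-classes are integers. Each operator gets a relation of rows
# (eclass_of_the_node, child_eclass_1, child_eclass_2, ...).
ADD   = {(3, 1, 2)}          # e-class 3 contains the node Add(c1, c2)
CONST = {(2, 0)}             # e-class 2 contains the constant 0
VAR   = {(1, "x")}           # e-class 1 contains the variable x

def match_add_zero():
    """Conjunctive query for the pattern Add(?e, ?z) where ?z is the constant 0:
    q(root, e) :- ADD(root, e, z), CONST(z, 0).
    Each answer is an opportunity to merge `root` with `e` (the rewrite x+0 -> x)."""
    return [(root, e) for (root, e, z) in ADD
                      for (z2, v) in CONST if z2 == z and v == 0]

print(match_add_zero())      # [(3, 1)]: merge e-classes 3 and 1, i.e. x+0 = x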

Cite as

Yihong Zhang, Dan Suciu, Yisu Remy Wang, and Max Willsey. Database Theory in Action: Search-Based Program Optimization. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 34:1-34:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.ICDT.2025.34,
  author =	{Zhang, Yihong and Suciu, Dan and Wang, Yisu Remy and Willsey, Max},
  title =	{{Database Theory in Action: Search-Based Program Optimization}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{34:1--34:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.34},
  URN =		{urn:nbn:de:0030-drops-229759},
  doi =		{10.4230/LIPIcs.ICDT.2025.34},
  annote =	{Keywords: Query optimization, program optimization, Cascades framework, equality saturation, Datalog}
}
Document
Database Theory in Action
Database Theory in Action: Database Visualization

Authors: Eugene Wu


Abstract
We draw a connection between data modeling and visualization, namely that a visualization specification defines a mapping from database constraints to visual representations of those constraints. We show that data modeling explains many existing visualization designs, and we introduce multi-table database visualization.

Cite as

Eugene Wu. Database Theory in Action: Database Visualization. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 35:1-35:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{wu:LIPIcs.ICDT.2025.35,
  author =	{Wu, Eugene},
  title =	{{Database Theory in Action: Database Visualization}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{35:1--35:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.35},
  URN =		{urn:nbn:de:0030-drops-229769},
  doi =		{10.4230/LIPIcs.ICDT.2025.35},
  annote =	{Keywords: Visualization Theory, Data Model, Database Visualization}
}
Document
Database Theory in Action
Database Theory in Action: Cypher, GQL, and Regular Path Queries

Authors: Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova


Abstract
Cypher has so far been the most commonly used query language for property graphs, and served as the foundation of the recently standardized graph query language GQL. In designing the features of GQL, the standards committee addressed the perceived limitations of Cypher. One such limitation is the inability of Cypher, as originally designed, to express all regular path queries (RPQs). Despite this claim having been stated many times as a folklore result, we could not find any proof of it. In this note we formalize the core of Cypher’s pattern matching and formally prove that indeed it falls short of all RPQs, justifying the inclusion of new pattern matching features in GQL.
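
For readers who have not seen RPQs, the toy sketch below evaluates one by breadth-first search over the product of a graph and an NFA. The language (ab)*b is chosen only as an example of a regular language over edge labels; nothing here is Cypher or GQL syntax, nor a claim about which specific RPQs Cypher misses.

from collections import deque

# Property-graph edges as (source, label, target).
EDGES = [(1, "a", 2), (2, "b", 3), (3, "a", 4), (4, "b", 5), (1, "b", 5)]

# NFA for (ab)*b over edge labels: state 0 is initial, state 2 is final.
DELTA = {(0, "a"): {1}, (1, "b"): {0}, (0, "b"): {2}}
START, FINALS = 0, {2}

def rpq(edges, delta, start, finals):
    """Return all (src, dst) connected by a path whose label word is accepted."""
    adj, nodes = {}, set()
    for s, l, t in edges:
        adj.setdefault(s, []).append((l, t))
        nodes.update((s, t))
    answers = set()
    for src in nodes:                    # BFS in the product graph from (src, start)
        seen = {(src, start)}
        queue = deque(seen)
        while queue:
            node, q = queue.popleft()
            if q in finals:
                answers.add((src, node))
            for l, t in adj.get(node, []):
                for q2 in delta.get((q, l), ()):
                    if (t, q2) not in seen:
                        seen.add((t, q2))
                        queue.append((t, q2))
    return answers

print(sorted(rpq(EDGES, DELTA, START, FINALS)))   # [(1, 5), (2, 3), (4, 5)]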

Cite as

Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova. Database Theory in Action: Cypher, GQL, and Regular Path Queries. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 36:1-36:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{gheerbrant_et_al:LIPIcs.ICDT.2025.36,
  author =	{Gheerbrant, Am\'{e}lie and Libkin, Leonid and Peterfreund, Liat and Rogova, Alexandra},
  title =	{{Database Theory in Action: Cypher, GQL, and Regular Path Queries}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{36:1--36:5},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.36},
  URN =		{urn:nbn:de:0030-drops-229771},
  doi =		{10.4230/LIPIcs.ICDT.2025.36},
  annote =	{Keywords: Regular path queries, Cypher, GQL, inexpressibility}
}
