DROPS

Document

DOI: 10.4230/LIPIcs.ICDT.2026.15

A Formal Query Language and Automata Model for Aggregation in Complex Event Recognition

Authors: Pierre Bourhis, Cristian Riveros, and Amaranta Salas

Published in: LIPIcs, Volume 365, 29th International Conference on Database Theory (ICDT 2026)

Abstract

Complex Event Recognition (CER) systems are used to identify complex patterns in event streams, such as those found in stock markets, sensor networks, and other similar applications. An important task in such patterns is aggregation, which involves summarizing a set of values into a single value using an algebraic function, such as the maximum, sum, or average, among others. Despite the relevance of this task, query languages in CER typically support aggregation in a restricted syntactic form, and their semantics are generally undefined. In this work, we present a first step toward formalizing a query language with aggregation for CER. We propose to extend Complex Event Logic (CEL), a formal query language for CER, with aggregation operations. This task requires revisiting the semantics of CEL, using a new semantics based on bags of tuples instead of sets of positions. Then, we present an extension of CEL, called Aggregation CEL (ACEL), which introduces an aggregation operator for any commutative monoid operation. The operator can be freely composed with previous CEL operators, allowing users to define complex queries and patterns. We showcase several queries in practice where ACEL proves to be natural for specifying them. From the computational side, we present a novel automata model, called Aggregation Complex Event Automata (ACEA), that extends the previous proposal of Complex Event Automata (CEA) with aggregation and filtering features. Moreover, we demonstrate that every query in ACEL can be expressed in ACEA, illustrating the effectiveness of our computational model. Finally, we study the expressiveness of ACEA through the lens of ACEL, showing that the automata model is more expressive than ACEL.

Cite as

Pierre Bourhis, Cristian Riveros, and Amaranta Salas. A Formal Query Language and Automata Model for Aggregation in Complex Event Recognition. In 29th International Conference on Database Theory (ICDT 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 365, pp. 15:1-15:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{bourhis_et_al:LIPIcs.ICDT.2026.15,
  author =	{Bourhis, Pierre and Riveros, Cristian and Salas, Amaranta},
  title =	{{A Formal Query Language and Automata Model for Aggregation in Complex Event Recognition}},
  booktitle =	{29th International Conference on Database Theory (ICDT 2026)},
  pages =	{15:1--15:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-413-0},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{365},
  editor =	{ten Cate, Balder and Funk, Maurice},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2026.15},
  URN =		{urn:nbn:de:0030-drops-256291},
  doi =		{10.4230/LIPIcs.ICDT.2026.15},
  annote =	{Keywords: Streams, complex event recognition, query language, aggregation}
}

Document

DOI: 10.4230/LIPIcs.STACS.2026.37

Random Models and Guarded Logic

Authors: Oskar Fiuk

Published in: LIPIcs, Volume 364, 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Abstract

Building on ideas of Gurevich and Shelah for the Gödel Class, we present a new probabilistic proof of the finite model property for the Guarded Fragment of First-Order Logic. Our proof is conceptually simple and yields the optimal doubly-exponential upper bound on the size of minimal models. We precisely analyse the obtained bound, up to constant factors in the exponents, and construct sentences that enforce models of tightly matching size. The probabilistic approach adapts naturally to the Triguarded Fragment, an extension of the Guarded Fragment that also subsumes the Two-Variable Fragment. Finally, we derandomise the probabilistic proof by providing an explicit model construction which replaces randomness with deterministic hash functions.

Cite as

Oskar Fiuk. Random Models and Guarded Logic. In 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 364, pp. 37:1-37:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{fiuk:LIPIcs.STACS.2026.37,
  author =	{Fiuk, Oskar},
  title =	{{Random Models and Guarded Logic}},
  booktitle =	{43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)},
  pages =	{37:1--37:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-412-3},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{364},
  editor =	{Mahajan, Meena and Manea, Florin and McIver, Annabelle and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2026.37},
  URN =		{urn:nbn:de:0030-drops-255269},
  doi =		{10.4230/LIPIcs.STACS.2026.37},
  annote =	{Keywords: guarded fragment, finite model property, probabilistic method}
}

Document

Invited Paper

DOI: 10.4230/OASIcs.RW.2024/2025.5

Inconsistency-Tolerant Semantics Based on (Preferred) Repairs (Invited Paper)

Authors: Camille Bourgaux

Published in: OASIcs, Volume 138, Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 & RW 2025)

Abstract

Real-world datasets are plagued by data quality issues which may render the data inconsistent w.r.t. a set of constraints, be they given by database integrity constraints or ontologies. A prominent way to handle such inconsistent data is to use inconsistency-tolerant semantics to obtain meaningful answers to queries. Most of these semantics are based on some notion of repairs, which represent ways of restoring the data consistency. The most basic kind of repairs is that of subset repairs, which are maximal consistent subsets of the dataset. However, in many scenarios, one can define preferred repairs based on some preference information. These lecture notes present inconsistency-tolerant semantics, focusing on the repair-based ones, then review different kinds of preferred repairs that have been considered in the literature. We present in particular the relationships between different kinds of preferred repairs and other notions related to inconsistency handling, the computational complexity of reasoning with (preferred) repairs, and some implementations.

Cite as

Camille Bourgaux. Inconsistency-Tolerant Semantics Based on (Preferred) Repairs (Invited Paper). In Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 & RW 2025). Open Access Series in Informatics (OASIcs), Volume 138, pp. 5:1-5:67, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bourgaux:OASIcs.RW.2024/2025.5,
  author =	{Bourgaux, Camille},
  title =	{{Inconsistency-Tolerant Semantics Based on (Preferred) Repairs}},
  booktitle =	{Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 \& RW 2025)},
  pages =	{5:1--5:67},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-405-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{138},
  editor =	{Artale, Alessandro and Bienvenu, Meghyn and Garc{\'\i}a, Yazm{\'\i}n Ib\'{a}\~{n}ez and Murlak, Filip},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.RW.2024/2025.5},
  URN =		{urn:nbn:de:0030-drops-250504},
  doi =		{10.4230/OASIcs.RW.2024/2025.5},
  annote =	{Keywords: Knowledge bases, databases, inconsistency handling, repairs, preferences}
}

Document

Invited Paper

DOI: 10.4230/OASIcs.RW.2024/2025.7

Modern Datalog: Concepts, Methods, Applications (Invited Paper)

Authors: Markus Krötzsch

Published in: OASIcs, Volume 138, Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 & RW 2025)

Abstract

Pure Datalog is arguably the most fundamental rule language, elegant and simple, but also often too limited to be useful in practice. This has motivated the introduction of many new expressive features, ranging from datatypes and related functions, over aggregates and semi-ring generalisations, to existential quantifiers and complex terms. In spite of their variety, all these approaches remain true to the nature of Datalog as a direct, pattern-based way of computing on structured data. We therefore find that a modern notion of Datalog is emerging, distinctly different from other approaches of logic programming and with its own set of related methods and applications. In this course, we introduce Datalog and its most common extensions, and explain when and how these features can be used together (which is often, but not always, safe to do). We further look at modern Datalog systems and some of their primary use cases. Hands-on work with Datalog and its extensions is done with the free Datalog engine https://knowsys.github.io/nemo-doc/. The course is accessible to all audiences and does not assume specific prior knowledge.

Cite as

Markus Krötzsch. Modern Datalog: Concepts, Methods, Applications (Invited Paper). In Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 & RW 2025). Open Access Series in Informatics (OASIcs), Volume 138, pp. 7:1-7:41, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{krotzsch:OASIcs.RW.2024/2025.7,
  author =	{Kr\"{o}tzsch, Markus},
  title =	{{Modern Datalog: Concepts, Methods, Applications}},
  booktitle =	{Joint Proceedings of the 20th and 21st Reasoning Web Summer Schools (RW 2024 \& RW 2025)},
  pages =	{7:1--7:41},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-405-5},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{138},
  editor =	{Artale, Alessandro and Bienvenu, Meghyn and Garc{\'\i}a, Yazm{\'\i}n Ib\'{a}\~{n}ez and Murlak, Filip},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.RW.2024/2025.7},
  URN =		{urn:nbn:de:0030-drops-250524},
  doi =		{10.4230/OASIcs.RW.2024/2025.7},
  annote =	{Keywords: Datalog, query language, knowlegde representation and reasoning, logic programming, Horn logic, SPARQL, datatypes and aggregation, lecture notes, tutorial}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.8

Dynamic Membership for Regular Tree Languages

Authors: Antoine Amarilli, Corentin Barloy, Louis Jachiet, and Charles Paperman

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

We study the dynamic membership problem for regular tree languages under relabeling updates: we fix an alphabet Σ and a regular tree language L over Σ (expressed, e.g., as a tree automaton), we are given a tree T with labels in Σ, and we must maintain the information of whether the tree T belongs to L while handling relabeling updates that change the labels of individual nodes in T. Our first contribution is to show that this problem admits an O(log n / log log n) algorithm for any fixed regular tree language, improving over known O(log n) algorithms. This generalizes the known O(log n / log log n) upper bound over words, and it matches the lower bound of Ω(log n / log log n) from dynamic membership to some word languages and from the existential marked ancestor problem. Our second contribution is to introduce a class of regular languages, dubbed almost-commutative tree languages, and show that dynamic membership to such languages under relabeling updates can be decided in constant time per update. Almost-commutative languages generalize both commutative languages and finite languages: they are the analogue for trees of the ZG languages enjoying constant-time dynamic membership over words. Our main technical contribution is to show that this class is conditionally optimal when we assume that the alphabet features a neutral letter, i.e., a letter that has no effect on membership to the language. More precisely, we show that any regular tree language with a neutral letter which is not almost-commutative cannot be maintained in constant time under the assumption that the prefix-U1 problem from [Antoine Amarilli et al., 2021] also does not admit a constant-time algorithm.

Cite as

Antoine Amarilli, Corentin Barloy, Louis Jachiet, and Charles Paperman. Dynamic Membership for Regular Tree Languages. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.MFCS.2025.8,
  author =	{Amarilli, Antoine and Barloy, Corentin and Jachiet, Louis and Paperman, Charles},
  title =	{{Dynamic Membership for Regular Tree Languages}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.8},
  URN =		{urn:nbn:de:0030-drops-241155},
  doi =		{10.4230/LIPIcs.MFCS.2025.8},
  annote =	{Keywords: automaton, dynamic membership, incremental maintenance, forest algebra}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.20

A Formal Language Perspective on Factorized Representations

Authors: Benny Kimelfeld, Wim Martens, and Matthias Niewerth

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Factorized representations (FRs) are a well-known tool to succinctly represent results of join queries and have been originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization; of which we offer a selection in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of path multiset representations, a formalism that was recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.

Cite as

Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A Formal Language Perspective on Factorized Representations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kimelfeld_et_al:LIPIcs.ICDT.2025.20,
  author =	{Kimelfeld, Benny and Martens, Wim and Niewerth, Matthias},
  title =	{{A Formal Language Perspective on Factorized Representations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.20},
  URN =		{urn:nbn:de:0030-drops-229614},
  doi =		{10.4230/LIPIcs.ICDT.2025.20},
  annote =	{Keywords: Databases, relational databases, graph databases, factorized databases, regular path queries, compact representations}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.18

A Framework for Extraction and Transformation of Documents

Authors: Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We present a theoretical framework for the extraction and transformation of text documents as a two-phase process: The first phase uses document spanners to extract information from the input document. The second phase transforms the extracted information into a suitable output. To support several reasonable extract-transform scenarios, we propose for the first phase an extension of document spanners from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans instead of only single spans. We focus on multispanners described by regex formulas, and we prove that these have the same desirable properties as standard regular spanners. To formalize the second phase, we consider transformations that map every pair document-tuple, where each tuple comes from the (multi)span-relation extracted in the first phase, into a new output document. The specification of the two phases is what we call an extract-transform (ET) program, which covers practically relevant extract-transform tasks. In this paper, our main technical goal is to identify a broad class of ET programs that can be evaluated efficiently. We specifically focus on the scenario of regular ET programs: the extraction phase is given by a regex multispanner and the transformation phase is given by a regular string-to-string function. We show that for any regular ET program, given an input document, we can enumerate all final output documents with output-linear delay after linear preprocessing. As a side effect, we characterize the expressive power of regular ET programs and also show that they have desirable properties, like being closed under composition.

Cite as

Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt. A Framework for Extraction and Transformation of Documents. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{riveros_et_al:LIPIcs.ICDT.2025.18,
  author =	{Riveros, Cristian and Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Framework for Extraction and Transformation of Documents}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.18},
  URN =		{urn:nbn:de:0030-drops-229593},
  doi =		{10.4230/LIPIcs.ICDT.2025.18},
  annote =	{Keywords: Information extraction, Document spanners, Transducers, Query evaluation}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.26

Dynamic Direct Access of MSO Query Evaluation over Strings

Authors: Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given an MSO query with first-order free variables represented by an unambiguous variable-set automaton 𝒜 with state set Q and variables X and a string s, computes a data structure in time 𝒪(|Q|^ω⋅ |X|² ⋅ |s|) and, then, given an index i retrieves, using the data structure, the i-th output of the evaluation of 𝒜 over s in time 𝒪(|Q|^ω ⋅ |X|³ ⋅ log(|s|)²) where ω is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data. Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g. efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries and a novel data structure for dynamic word problems over semi-groups which yields an overall algorithm that is elegant and easy to formulate.

Cite as

Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros. Dynamic Direct Access of MSO Query Evaluation over Strings. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 26:1-26:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bourhis_et_al:LIPIcs.ICDT.2025.26,
  author =	{Bourhis, Pierre and Capelli, Florent and Mengel, Stefan and Riveros, Cristian},
  title =	{{Dynamic Direct Access of MSO Query Evaluation over Strings}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{26:1--26:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.26},
  URN =		{urn:nbn:de:0030-drops-229675},
  doi =		{10.4230/LIPIcs.ICDT.2025.26},
  annote =	{Keywords: Query evaluation, direct access, MSO queries}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.8

Enumeration of Minimal Hitting Sets Parameterized by Treewidth

Authors: Batya Kenig and Dan Shlomo Mizrahi

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Enumerating the minimal hitting sets of a hypergraph is a problem which arises in many data management applications that include constraint mining, discovering unique column combinations, and enumerating database repairs. Previously, Eiter et al. [Thomas Eiter et al., 2003] showed that the minimal hitting sets of an n-vertex hypergraph, with treewidth w, can be enumerated with delay O^*(n^w) (ignoring polynomial factors), with space requirements that scale with the output size. We improve this to fixed-parameter-linear delay, following an FPT preprocessing phase. The memory consumption of our algorithm is exponential with respect to the treewidth of the hypergraph.

Cite as

Batya Kenig and Dan Shlomo Mizrahi. Enumeration of Minimal Hitting Sets Parameterized by Treewidth. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kenig_et_al:LIPIcs.ICDT.2025.8,
  author =	{Kenig, Batya and Mizrahi, Dan Shlomo},
  title =	{{Enumeration of Minimal Hitting Sets Parameterized by Treewidth}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.8},
  URN =		{urn:nbn:de:0030-drops-229498},
  doi =		{10.4230/LIPIcs.ICDT.2025.8},
  annote =	{Keywords: Enumeration, Hitting sets}
}

Document

Database Theory in Action

DOI: 10.4230/LIPIcs.ICDT.2025.33

Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk)

Authors: Silviu Maniu and Pierre Senellart

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

There has been a rich literature in database theory on how to model and manage the provenance of data (for instance using the semiring framework) and its uncertainty (in particular via probabilistic databases). In this article, we explain how these results have been used as the basis for practical implementations, notably in the ProvSQL system, and how these implementations need to be adapted for the efficient management of provenance and probability for real-world data.

Cite as

Silviu Maniu and Pierre Senellart. Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 33:1-33:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{maniu_et_al:LIPIcs.ICDT.2025.33,
  author =	{Maniu, Silviu and Senellart, Pierre},
  title =	{{Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk)}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{33:1--33:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.33},
  URN =		{urn:nbn:de:0030-drops-229746},
  doi =		{10.4230/LIPIcs.ICDT.2025.33},
  annote =	{Keywords: provenance, probabilistic data, ProvSQL}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.29

FC-Datalog as a Framework for Efficient String Querying

Authors: Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that when extended with constraints for regular languages can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.

Cite as

Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger. FC-Datalog as a Framework for Efficient String Querying. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
  author =	{Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
  title =	{{FC-Datalog as a Framework for Efficient String Querying}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{29:1--29:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
  URN =		{urn:nbn:de:0030-drops-229708},
  doi =		{10.4230/LIPIcs.ICDT.2025.29},
  annote =	{Keywords: Information extraction, word equations, datalog, document spanners, regex}
}

Document

DOI: 10.4230/DagRep.14.1.49

Representation, Provenance, and Explanations in Database Theory and Logic (Dagstuhl Seminar 24032)

Authors: Pablo Barcelo, Pierre Bourhis, Stefan Mengel, and Sudeepa Roy

Published in: Dagstuhl Reports, Volume 14, Issue 1 (2024)

Abstract

This report documents the program and the outcomes of Dagstuhl Seminar "Representation, Provenance, and Explanations in Database Theory and Logic" (24032), which was broadly in the area of database theory. Database theory formalizes the theoretical underpinnings of databases and analyzes them with mathematical tools. We focused on questions related to the fundamental problem of efficient query evaluation: compute the answers of a query on a database. This seminar focused on three key aspects of query evaluations. (1) Representation studies the tradeoff between expressivity, compactness, and efficient computation of outputs from the inputs, including circuits and knowledge compilation forms, enumeration, and direct access. (2) Provenance captures the computation process of outputs from the inputs using a compact formula, and has applications to probabilistic databases. (3) Explanations give meaningful insights to responsibilities of different inputs toward an output beyond provenance, e.g., by using Shapley Values from co-operative game theory that has been recently popular in both DB and ML.

Cite as

Pablo Barcelo, Pierre Bourhis, Stefan Mengel, and Sudeepa Roy. Representation, Provenance, and Explanations in Database Theory and Logic (Dagstuhl Seminar 24032). In Dagstuhl Reports, Volume 14, Issue 1, pp. 49-71, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Article{barcelo_et_al:DagRep.14.1.49,
  author =	{Barcelo, Pablo and Bourhis, Pierre and Mengel, Stefan and Roy, Sudeepa},
  title =	{{Representation, Provenance, and Explanations in Database Theory and Logic (Dagstuhl Seminar 24032)}},
  pages =	{49--71},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2024},
  volume =	{14},
  number =	{1},
  editor =	{Barcelo, Pablo and Bourhis, Pierre and Mengel, Stefan and Roy, Sudeepa},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.14.1.49},
  URN =		{urn:nbn:de:0030-drops-204904},
  doi =		{10.4230/DagRep.14.1.49},
  annote =	{Keywords: Circuits, database theory, factorized databases, provenance, shapley values}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.17

Containment of Regular Path Queries Under Path Constraints

Authors: Sylvain Salvati and Sophie Tison

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Data integrity is ensured by expressing constraints it should satisfy. One can also view constraints as data properties and take advantage of them for several tasks such as reasoning about data or accelerating query processing. In the context of graph databases, simple constraints can be expressed by means of path constraints while simple queries are modeled as regular path queries (RPQs). In this paper, we investigate the containment of RPQs under path constraints. We focus on word constraints that can be viewed as tuple-generating dependencies (TGDs) of the form ∀x_1,x_2, ∃y⁻, a_1(x_1,y_1) ∧ ... ∧ a_i(y_{i-1},y_i) ∧ ... ∧ a_n(y_{n-1},x_2) ⟶ ∃z⁻, b_1(x_1,z_1) ∧ ... ∧ b_i(z_{i-1},z_i) ∧ ... ∧ b_m(z_{m-1},x_2). Such a constraint means that whenever two nodes in a graph are connected by a path labeled a_1 … a_n, there is also a path labeled b_1 … b_m that connects them. Rewrite systems offer an abstract view of these TGDs: the rewrite rule a_1 … a_n → b_1 … b_m represents the previous constraint. A set of constraints 𝒞 is then represented by a rewrite system R and, when dealing with possibly infinite databases, a path query p is contained in a path query q under the constraints 𝒞 iff p rewrites to q with R. Contrary to what has been claimed in the literature we show that, when restricting to finite databases only, there are cases where a path query p is contained in a path query q under the constraints 𝒞 while p does not rewrite to q with R. More generally, we study the finite controllability of the containment of RPQs under word constraints, that is when this containment problem on unrestricted databases does coincide with the finite case. We give an exact characterisation of the cases where this equivalence holds. We then deduce the undecidability of the containment problem in the finite case even when RPQs are restricted to word queries. We prove several properties related to finite controllability, and in particular that it is undecidable. We also exhibit some classes of word constraints that ensure the finite controllability and the decidability of the containment problem.

Cite as

Sylvain Salvati and Sophie Tison. Containment of Regular Path Queries Under Path Constraints. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 17:1-17:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{salvati_et_al:LIPIcs.ICDT.2024.17,
  author =	{Salvati, Sylvain and Tison, Sophie},
  title =	{{Containment of Regular Path Queries Under Path Constraints}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{17:1--17:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.17},
  URN =		{urn:nbn:de:0030-drops-197994},
  doi =		{10.4230/LIPIcs.ICDT.2024.17},
  annote =	{Keywords: Graph databases, rational path queries, query containment, TGDs, word constraints, rewrite systems, finite controllability, decision problems}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.25

Ranked Enumeration for MSO on Trees via Knowledge Compilation

Authors: Antoine Amarilli, Pierre Bourhis, Florent Capelli, and Mikaël Monet

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

We study the problem of enumerating the satisfying assignments for certain circuit classes from knowledge compilation, where assignments are ranked in a specific order. In particular, we show how this problem can be used to efficiently perform ranked enumeration of the answers to MSO queries over trees, with the order being given by a ranking function satisfying a subset-monotonicity property. Assuming that the number of variables is constant, we show that we can enumerate the satisfying assignments in ranked order for so-called multivalued circuits that are smooth, decomposable, and in negation normal form (smooth multivalued DNNF). There is no preprocessing and the enumeration delay is linear in the size of the circuit times the number of values, plus a logarithmic term in the number of assignments produced so far. If we further assume that the circuit is deterministic (smooth multivalued d-DNNF), we can achieve linear-time preprocessing in the circuit, and the delay only features the logarithmic term.

Cite as

Antoine Amarilli, Pierre Bourhis, Florent Capelli, and Mikaël Monet. Ranked Enumeration for MSO on Trees via Knowledge Compilation. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.25,
  author =	{Amarilli, Antoine and Bourhis, Pierre and Capelli, Florent and Monet, Mika\"{e}l},
  title =	{{Ranked Enumeration for MSO on Trees via Knowledge Compilation}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{25:1--25:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.25},
  URN =		{urn:nbn:de:0030-drops-198079},
  doi =		{10.4230/LIPIcs.ICDT.2024.25},
  annote =	{Keywords: Enumeration, knowledge compilation, monadic second-order logic}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.6

On Distances Between Words with Parameters

Authors: Pierre Bourhis, Aaron Boussidan, and Philippe Gambette

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

The edit distance between parameterized words is a generalization of the classical edit distance where it is allowed to map particular letters of the first word, called parameters, to parameters of the second word before computing the distance. This problem has been introduced in particular for detection of code duplication, and the notion of words with parameters has also been used with different semantics in other fields. The complexity of several variants of edit distances between parameterized words has been studied, however, the complexity of the most natural one, the Levenshtein distance, remained open. In this paper, we solve this open question and close the exhaustive analysis of all cases of parameterized word matching and function matching, showing that these problems are np-complete. To this aim, we also provide a comparison of the different problems, exhibiting several equivalences between them. We also provide and implement a MaxSAT encoding of the problem, as well as a simple FPT algorithm in the alphabet size, and study their efficiency on real data in the context of theater play structure comparison.

Cite as

Pierre Bourhis, Aaron Boussidan, and Philippe Gambette. On Distances Between Words with Parameters. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 6:1-6:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bourhis_et_al:LIPIcs.CPM.2023.6,
  author =	{Bourhis, Pierre and Boussidan, Aaron and Gambette, Philippe},
  title =	{{On Distances Between Words with Parameters}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{6:1--6:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.6},
  URN =		{urn:nbn:de:0030-drops-179602},
  doi =		{10.4230/LIPIcs.CPM.2023.6},
  annote =	{Keywords: String matching, edit distance, Levenshtein, parameterized matching, parameterized words, parameter words, instantiable words, NP-completeness, MAX-SAT}
}

22 Search Results for "Bourhis, Pierre"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message