DROPS

Document

DOI: 10.4230/LIPIcs.MFCS.2025.9

Linear Time Subsequence and Supersequence Regex Matching

Authors: Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

It is well-known that checking whether a given string w matches a given regular expression r can be done in quadratic time O(|w|⋅ |r|) and that this cannot be improved to a truly subquadratic running time of O((|w|⋅ |r|)^{1-ε}) assuming the strong exponential time hypothesis (SETH). We study a different matching paradigm where we ask instead whether w has a subsequence that matches r, and show that regex matching in this sense can be solved in linear time O(|w| + |r|). Further, the same holds if we ask for a supersequence. We show that the quantitative variants where we want to compute a longest or shortest subsequence or supersequence of w that matches r can be solved in O(|w|⋅ |r|), i. e., asymptotically no worse than classical regex matching; and we show that O(|w| + |r|) is conditionally not possible for these problems. We also investigate these questions with respect to other natural string relations like the infix, prefix, left-extension or extension relation instead of the subsequence and supersequence relation. We further study the complexity of the universal problem where we ask if all subsequences (or supersequences, infixes, prefixes, left-extensions or extensions) of an input string satisfy a given regular expression.

Cite as

Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid. Linear Time Subsequence and Supersequence Regex Matching. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.MFCS.2025.9,
  author =	{Amarilli, Antoine and Manea, Florin and Ringleb, Tina and Schmid, Markus L.},
  title =	{{Linear Time Subsequence and Supersequence Regex Matching}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.9},
  URN =		{urn:nbn:de:0030-drops-241162},
  doi =		{10.4230/LIPIcs.MFCS.2025.9},
  annote =	{Keywords: subsequence, supersequence, regular language, regular expression, automata}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.22

On the Reachability Problem for Two-Dimensional Branching VASS

Authors: Clotilde Bizière, Thibault Hilaire, Jérôme Leroux, and Grégoire Sutre

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

Vectors addition systems with states (VASS), or equivalently Petri nets, are arguably one of the most studied formalisms for the modeling and analysis of concurrent systems. A central decision problem for VASS is reachability: whether there exists a run from an initial configuration to a final one. This problem has been known to be decidable for over forty years, and its complexity has recently been precisely characterized. Our work concerns the reachability problem for BVASS, a branching generalization of VASS. In dimension one, the exact complexity of this problem is known. In this paper, we prove that the reachability problem for 2-dimensional BVASS is decidable. In fact, we even show that the reachability set admits a computable semilinear presentation. The decidability status of the reachability problem for BVASS remains open in higher dimensions.

Cite as

Clotilde Bizière, Thibault Hilaire, Jérôme Leroux, and Grégoire Sutre. On the Reachability Problem for Two-Dimensional Branching VASS. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{biziere_et_al:LIPIcs.MFCS.2025.22,
  author =	{Bizi\`{e}re, Clotilde and Hilaire, Thibault and Leroux, J\'{e}r\^{o}me and Sutre, Gr\'{e}goire},
  title =	{{On the Reachability Problem for Two-Dimensional Branching VASS}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{22:1--22:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.22},
  URN =		{urn:nbn:de:0030-drops-241294},
  doi =		{10.4230/LIPIcs.MFCS.2025.22},
  annote =	{Keywords: Vector addition systems, Reachability problem, Semilinear sets, Verification}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.29

FC-Datalog as a Framework for Efficient String Querying

Authors: Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that when extended with constraints for regular languages can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.

Cite as

Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger. FC-Datalog as a Framework for Efficient String Querying. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
  author =	{Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
  title =	{{FC-Datalog as a Framework for Efficient String Querying}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{29:1--29:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
  URN =		{urn:nbn:de:0030-drops-229708},
  doi =		{10.4230/LIPIcs.ICDT.2025.29},
  annote =	{Keywords: Information extraction, word equations, datalog, document spanners, regex}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.12

Tractable Conjunctive Queries over Static and Dynamic Relations

Authors: Ahmet Kara, Zheng Luo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We investigate the evaluation of conjunctive queries over static and dynamic relations. While static relations are given as input and do not change, dynamic relations are subject to inserts and deletes. We characterise syntactically three classes of queries that admit constant update time and constant enumeration delay. We call such queries tractable. Depending on the class, the preprocessing time is linear, polynomial, or exponential (under data complexity, so the query size is constant). To decide whether a query is tractable, it does not suffice to analyse separately the sub-queries over the static relations and over the dynamic relations, respectively. Instead, we need to take the interaction between the static and the dynamic relations into account. Even when the sub-query over the dynamic relations is not tractable, the overall query can become tractable if the dynamic relations are sufficiently constrained by the static ones.

Cite as

Ahmet Kara, Zheng Luo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Tractable Conjunctive Queries over Static and Dynamic Relations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 12:1-12:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kara_et_al:LIPIcs.ICDT.2025.12,
  author =	{Kara, Ahmet and Luo, Zheng and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Tractable Conjunctive Queries over Static and Dynamic Relations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{12:1--12:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.12},
  URN =		{urn:nbn:de:0030-drops-229534},
  doi =		{10.4230/LIPIcs.ICDT.2025.12},
  annote =	{Keywords: fully dynamic algorithm, constant enumeration delay, constant update time}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.20

A Formal Language Perspective on Factorized Representations

Authors: Benny Kimelfeld, Wim Martens, and Matthias Niewerth

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

Factorized representations (FRs) are a well-known tool to succinctly represent results of join queries and have been originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization; of which we offer a selection in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of path multiset representations, a formalism that was recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.

Cite as

Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A Formal Language Perspective on Factorized Representations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kimelfeld_et_al:LIPIcs.ICDT.2025.20,
  author =	{Kimelfeld, Benny and Martens, Wim and Niewerth, Matthias},
  title =	{{A Formal Language Perspective on Factorized Representations}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.20},
  URN =		{urn:nbn:de:0030-drops-229614},
  doi =		{10.4230/LIPIcs.ICDT.2025.20},
  annote =	{Keywords: Databases, relational databases, graph databases, factorized databases, regular path queries, compact representations}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.18

A Framework for Extraction and Transformation of Documents

Authors: Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We present a theoretical framework for the extraction and transformation of text documents as a two-phase process: The first phase uses document spanners to extract information from the input document. The second phase transforms the extracted information into a suitable output. To support several reasonable extract-transform scenarios, we propose for the first phase an extension of document spanners from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans instead of only single spans. We focus on multispanners described by regex formulas, and we prove that these have the same desirable properties as standard regular spanners. To formalize the second phase, we consider transformations that map every pair document-tuple, where each tuple comes from the (multi)span-relation extracted in the first phase, into a new output document. The specification of the two phases is what we call an extract-transform (ET) program, which covers practically relevant extract-transform tasks. In this paper, our main technical goal is to identify a broad class of ET programs that can be evaluated efficiently. We specifically focus on the scenario of regular ET programs: the extraction phase is given by a regex multispanner and the transformation phase is given by a regular string-to-string function. We show that for any regular ET program, given an input document, we can enumerate all final output documents with output-linear delay after linear preprocessing. As a side effect, we characterize the expressive power of regular ET programs and also show that they have desirable properties, like being closed under composition.

Cite as

Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt. A Framework for Extraction and Transformation of Documents. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{riveros_et_al:LIPIcs.ICDT.2025.18,
  author =	{Riveros, Cristian and Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Framework for Extraction and Transformation of Documents}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.18},
  URN =		{urn:nbn:de:0030-drops-229593},
  doi =		{10.4230/LIPIcs.ICDT.2025.18},
  annote =	{Keywords: Information extraction, Document spanners, Transducers, Query evaluation}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.26

Dynamic Direct Access of MSO Query Evaluation over Strings

Authors: Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given an MSO query with first-order free variables represented by an unambiguous variable-set automaton 𝒜 with state set Q and variables X and a string s, computes a data structure in time 𝒪(|Q|^ω⋅ |X|² ⋅ |s|) and, then, given an index i retrieves, using the data structure, the i-th output of the evaluation of 𝒜 over s in time 𝒪(|Q|^ω ⋅ |X|³ ⋅ log(|s|)²) where ω is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data. Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g. efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries and a novel data structure for dynamic word problems over semi-groups which yields an overall algorithm that is elegant and easy to formulate.

Cite as

Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros. Dynamic Direct Access of MSO Query Evaluation over Strings. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 26:1-26:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bourhis_et_al:LIPIcs.ICDT.2025.26,
  author =	{Bourhis, Pierre and Capelli, Florent and Mengel, Stefan and Riveros, Cristian},
  title =	{{Dynamic Direct Access of MSO Query Evaluation over Strings}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{26:1--26:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.26},
  URN =		{urn:nbn:de:0030-drops-229675},
  doi =		{10.4230/LIPIcs.ICDT.2025.26},
  annote =	{Keywords: Query evaluation, direct access, MSO queries}
}

Document

DOI: 10.4230/OASIcs.Tannen.4

Annotation and More Annotation: Some Problems Posed by (and to) Val Tannen

Authors: Peter Buneman and Stijn Vansummeren

Published in: OASIcs, Volume 119, The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen (2024)

Abstract

Among the many research accomplishments of Val Tannen, his work on provenance and semirings is probably the most widely known. In this paper, we discuss questions that arise when applying this general framework to the setting of curated databases, and in particular the setting where we can have multiple annotations on the same data, as well as annotations on annotations.

Cite as

Peter Buneman and Stijn Vansummeren. Annotation and More Annotation: Some Problems Posed by (and to) Val Tannen. In The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen. Open Access Series in Informatics (OASIcs), Volume 119, pp. 4:1-4:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{buneman_et_al:OASIcs.Tannen.4,
  author =	{Buneman, Peter and Vansummeren, Stijn},
  title =	{{Annotation and More Annotation: Some Problems Posed by (and to) Val Tannen}},
  booktitle =	{The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen},
  pages =	{4:1--4:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-320-1},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{119},
  editor =	{Amarilli, Antoine and Deutsch, Alin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Tannen.4},
  URN =		{urn:nbn:de:0030-drops-201007},
  doi =		{10.4230/OASIcs.Tannen.4},
  annote =	{Keywords: Annotation, provenance, semiring, curated data}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.12

Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility

Authors: Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Due to the importance of linear algebra and matrix operations in data analytics, there is significant interest in using relational query optimization and processing techniques for evaluating (sparse) linear algebra programs. In particular, in recent years close connections have been established between linear algebra programs and relational algebra that allow transferring optimization techniques of the latter to the former. In this paper, we ask ourselves which linear algebra programs in MATLANG correspond to the free-connex and q-hierarchical fragments of conjunctive first-order logic. Both fragments have desirable query processing properties: free-connex conjunctive queries support constant-delay enumeration after a linear-time preprocessing phase, and q-hierarchical conjunctive queries further allow constant-time updates. By characterizing the corresponding fragments of MATLANG, we hence identify the fragments of linear algebra programs that one can evaluate with constant-delay enumeration after linear-time preprocessing and with constant-time updates. To derive our results, we improve and generalize previous correspondences between MATLANG and relational algebra evaluated over semiring-annotated relations. In addition, we identify properties on semirings that allow to generalize the complexity bounds for free-connex and q-hierarchical conjunctive queries from Boolean annotations to general semirings.

Cite as

Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren. Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{munozserrano_et_al:LIPIcs.ICDT.2024.12,
  author =	{Mu\~{n}oz Serrano, Thomas and Riveros, Cristian and Vansummeren, Stijn},
  title =	{{Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.12},
  URN =		{urn:nbn:de:0030-drops-197946},
  doi =		{10.4230/LIPIcs.ICDT.2024.12},
  annote =	{Keywords: Query evaluation, conjunctive queries, linear algebra, enumeration algorithms}
}

Document

Survey

DOI: 10.4230/TGDK.1.1.12

Structural Summarization of Semantic Graphs Using Quotients

Authors: Ansgar Scherp, David Richerby, Till Blume, Michael Cochez, and Jannik Rau

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Graph summarization is the process of computing a compact version of an input graph while preserving chosen features of its structure. We consider semantic graphs where the features include edge labels and label sets associated with a vertex. Graph summaries are typically much smaller than the original graph. Applications that depend on the preserved features can perform their tasks on the summary, but much faster or with less memory overhead, while producing the same outcome as if they were applied on the original graph. In this survey, we focus on structural summaries based on quotients that organize vertices in equivalence classes of shared features. Structural summaries are particularly popular for semantic graphs and have the advantage of defining a precise graph-based output. We consider approaches and algorithms for both static and temporal graphs. A common example of quotient-based structural summaries is bisimulation, and we discuss this in detail. While there exist other surveys on graph summarization, to the best of our knowledge, we are the first to bring in a focused discussion on quotients, bisimulation, and their relation. Furthermore, structural summarization naturally connects well with formal logic due to the discrete structures considered. We complete the survey with a brief description of approaches beyond structural summaries.

Cite as

Ansgar Scherp, David Richerby, Till Blume, Michael Cochez, and Jannik Rau. Structural Summarization of Semantic Graphs Using Quotients. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 12:1-12:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{scherp_et_al:TGDK.1.1.12,
  author =	{Scherp, Ansgar and Richerby, David and Blume, Till and Cochez, Michael and Rau, Jannik},
  title =	{{Structural Summarization of Semantic Graphs Using Quotients}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{12:1--12:25},
  ISSN =	{2942-7517},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.12},
  URN =		{urn:nbn:de:0030-drops-194862},
  doi =		{10.4230/TGDK.1.1.12},
  annote =	{Keywords: graph summarization, quotients, stratified bisimulation}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.TIME.2022.3

Getting to the CORE of Complex Event Recognition (Invited Talk)

Authors: Stijn Vansummeren

Published in: LIPIcs, Volume 247, 29th International Symposium on Temporal Representation and Reasoning (TIME 2022)

Abstract

In this talk, I will give an overview of our recent work on complex event recognition.

Cite as

Stijn Vansummeren. Getting to the CORE of Complex Event Recognition (Invited Talk). In 29th International Symposium on Temporal Representation and Reasoning (TIME 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 247, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{vansummeren:LIPIcs.TIME.2022.3,
  author =	{Vansummeren, Stijn},
  title =	{{Getting to the CORE of Complex Event Recognition}},
  booktitle =	{29th International Symposium on Temporal Representation and Reasoning (TIME 2022)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-262-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{247},
  editor =	{Artikis, Alexander and Posenato, Roberto and Tonetta, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.TIME.2022.3},
  URN =		{urn:nbn:de:0030-drops-172503},
  doi =		{10.4230/LIPIcs.TIME.2022.3},
  annote =	{Keywords: Complex Event Recognition, automata, enumeration-based query processing}
}

Document

DOI: 10.4230/DagRep.10.2.19

Foundations of Composite Event Recognition (Dagstuhl Seminar 20071)

Authors: Alexander Artikis, Thomas Eiter, Alessandro Margara, and Stijn Vansummeren

Published in: Dagstuhl Reports, Volume 10, Issue 2 (2020)

Abstract

Composite Event Recognition (CER) refers to the activity of detecting patterns in streams of continuously arriving "event" data over, possibly geographically, distributed sources. CER is key in Big Data applications that require the processing of such event streams to obtain timely insights and to implement reactive and proactive measures. Examples include the recognition of emerging stories and trends on the Social Web, traffic and transport incidents in smart cities, and epidemic spread. Numerous CER languages have been proposed in the literature. While these systems have a common goal, they differ in their data models, pattern languages and processing mechanisms, resulting in heterogeneous implementations with fundamentally different capabilities. Moreover, we lack a common understanding of the trade-offs between expressiveness and complexity, and a theory for comparing the fundamental capabilities of CER systems. As such, CER frameworks are difficult to understand, extend and generalise. It is unclear which of the proposed approaches better meets the requirements of a given application. Furthermore, the lack of foundations makes it hard to leverage established results - from automata theory, temporal logics, etc - thus hindering scientific and technological progress in CER. The objective of the seminar was to bring together researchers and practitioners working in Databases, Distributed Systems, Automata Theory, Logic and Stream Reasoning; disseminate the recent foundational results across these fields; establish new research collaborations among these fields; thereby start making progress towards formulating such foundations.

Cite as

Alexander Artikis, Thomas Eiter, Alessandro Margara, and Stijn Vansummeren. Foundations of Composite Event Recognition (Dagstuhl Seminar 20071). In Dagstuhl Reports, Volume 10, Issue 2, pp. 19-49, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@Article{artikis_et_al:DagRep.10.2.19,
  author =	{Artikis, Alexander and Eiter, Thomas and Margara, Alessandro and Vansummeren, Stijn},
  title =	{{Foundations of Composite Event Recognition (Dagstuhl Seminar 20071)}},
  pages =	{19--49},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2020},
  volume =	{10},
  number =	{2},
  editor =	{Artikis, Alexander and Eiter, Thomas and Margara, Alessandro and Vansummeren, Stijn},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.10.2.19},
  URN =		{urn:nbn:de:0030-drops-130587},
  doi =		{10.4230/DagRep.10.2.19},
  annote =	{Keywords: complex event processing, event algebra, pattern matching, stream reasoning, temporal reasoning}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2020.15

On the Expressiveness of Languages for Complex Event Recognition

Authors: Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)

Abstract

Complex Event Recognition (CER for short) has recently gained attention as a mechanism for detecting patterns in streams of continuously arriving event data. Numerous CER systems and languages have been proposed in the literature, commonly based on combining operations from regular expressions (sequencing, iteration, and disjunction) and relational algebra (e.g., joins and filters). While these languages are naturally first-order, meaning that variables can only bind single elements, they also provide capabilities for filtering sets of events that occur inside iterative patterns; for example requiring sequences of numbers to be increasing. Unfortunately, these type of filters usually present ad-hoc syntax and under-defined semantics, precisely because variables cannot bind sets of events. As a result, CER languages that provide filtering of sequences commonly lack rigorous semantics and their expressive power is not understood. In this paper we embark on two tasks: First, to define a denotational semantics for CER that naturally allows to bind and filter sets of events; and second, to compare the expressive power of this semantics with that of CER languages that only allow for binding single events. Concretely, we introduce Set-Oriented Complex Event Logic (SO-CEL for short), a variation of the CER language introduced in [Grez et al., 2019] in which all variables bind to sets of matched events. We then compare SO-CEL with CEL, the CER language of [Grez et al., 2019] where variables bind single events. We show that they are equivalent in expressive power when restricted to unary predicates but, surprisingly, incomparable in general. Nevertheless, we show that if we restrict to sets of binary predicates, then SO-CEL is strictly more expressive than CEL. To get a better understanding of the expressive power, computational capabilities, and limitations of SO-CEL, we also investigate the relationship between SO-CEL and Complex Event Automata (CEA), a natural computational model for CER languages. We define a property on CEA called the *-property and show that, under unary predicates, SO-CEL captures precisely the subclass of CEA that satisfy this property. Finally, we identify the operations that SO-CEL is lacking to characterize CEA and introduce a natural extension of the language that captures the complete class of CEA under unary predicates.

Cite as

Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren. On the Expressiveness of Languages for Complex Event Recognition. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{grez_et_al:LIPIcs.ICDT.2020.15,
  author =	{Grez, Alejandro and Riveros, Cristian and Ugarte, Mart{\'\i}n and Vansummeren, Stijn},
  title =	{{On the Expressiveness of Languages for Complex Event Recognition}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{15:1--15:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.15},
  URN =		{urn:nbn:de:0030-drops-119390},
  doi =		{10.4230/LIPIcs.ICDT.2020.15},
  annote =	{Keywords: Query languages, Complex Event Recognition, Logics, Automata theory}
}

Document

DOI: 10.4230/DagRep.2.2.84

Principles of Provenance (Dagstuhl Seminar 12091)

Authors: James Cheney, Anthony Finkelstein, Bertram Ludaescher, and Stijn Vansummeren

Published in: Dagstuhl Reports, Volume 2, Issue 2 (2012)

Abstract

This report documents the program and the outcomes of Dagstuhl Seminar 12091 ``Principles of Provenance''. The term ``provenance'' refers to information about the origin, context, derivation, ownership or history of some artifact. In both art and science, provenance information is crucial for establishing the value of a real-world artifact, guaranteeing for example that the artifact is an original work produced by an important artist, or that a stated scientific conclusion is reproducible. Since it is much easier to copy or alter digital information than it is to copy or alter real-world artifacts, the need for tracking and management of provenance information to testify the value and correctness of digital information has been firmly established in the last few years. As a result, provenance tracking and management has been studied in many settings, ranging from databases, scientific workflows, business process modeling, and security to social networking and the Semantic Web, but with relatively few interaction between these areas. This Dagstuhl seminar has focused on bringing together researchers from the above and other areas to identify the commonalities and differences of dealing with provenance; improve the mutual understanding of these communities; and identify main areas for further foundational provenance research.

Cite as

James Cheney, Anthony Finkelstein, Bertram Ludaescher, and Stijn Vansummeren. Principles of Provenance (Dagstuhl Seminar 12091). In Dagstuhl Reports, Volume 2, Issue 2, pp. 84-113, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)

Copy BibTex To Clipboard

@Article{cheney_et_al:DagRep.2.2.84,
  author =	{Cheney, James and Finkelstein, Anthony and Ludaescher, Bertram and Vansummeren, Stijn},
  title =	{{Principles of Provenance (Dagstuhl Seminar 12091)}},
  pages =	{84--113},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2012},
  volume =	{2},
  number =	{2},
  editor =	{Cheney, James and Finkelstein, Anthony and Ludaescher, Bertram and Vansummeren, Stijn},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.2.2.84},
  URN =		{urn:nbn:de:0030-drops-35073},
  doi =		{10.4230/DagRep.2.2.84},
  annote =	{Keywords: Provenance, Lineage, Metadata, Trust, Repeatability, Accountability}
}

14 Search Results for "Vansummeren, Stijn"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message