LIPIcs, Volume 155

23rd International Conference on Database Theory (ICDT 2020)



Thumbnail PDF

Event

ICDT 2020, March 30 to April 2, 2020, Copenhagen, Denmark

Editors

Carsten Lutz
  • University of Bremen, Germany
Jean Christoph Jung
  • University of Bremen, Germany

Publication Details

  • published at: 2020-03-11
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-139-9
  • DBLP: db/conf/icdt/icdt2020

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
LIPIcs, Vol. 155, ICDT 2020, Complete Volume

Authors: Carsten Lutz and Jean Christoph Jung


Abstract
LIPIcs, Vol. 155, ICDT 2020, Complete Volume

Cite as

23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 1-454, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@Proceedings{lutz_et_al:LIPIcs.ICDT.2020,
  title =	{{LIPIcs, Vol. 155, ICDT 2020, Complete Volume}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{1--454},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020},
  URN =		{urn:nbn:de:0030-drops-119239},
  doi =		{10.4230/LIPIcs.ICDT.2020},
  annote =	{Keywords: LIPIcs, Vol. 155, ICDT 2020, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Carsten Lutz and Jean Christoph Jung


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{lutz_et_al:LIPIcs.ICDT.2020.0,
  author =	{Lutz, Carsten and Jung, Jean Christoph},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.0},
  URN =		{urn:nbn:de:0030-drops-119244},
  doi =		{10.4230/LIPIcs.ICDT.2020.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Facets of Probabilistic Databases (Invited Talk)

Authors: Benny Kimelfeld


Abstract
Probabilistic databases are commonly known in the form of the tuple-independent model, where the validity of every tuple is an independent random event. Conceptually, the notion is more general, as a probabilistic database refers to any probability distribution over ordinary databases. A central computational problem is that of marginal inference for database queries: what is the probability that a given tuple is a query answer? In this talk, I will discuss recent developments in several research directions that, collectively, position probabilistic databases as the common and natural foundation of various challenges at the core of data analytics. Examples include reasoning about uncertain preferences from conventional distributions such as the Mallows model, data cleaning and repairing in probabilistic paradigms such as the HoloClean system, and the explanation of query answers through concepts from cooperative game theory such as the Shapley value and the Banzhaf Power Index. While these challenges manifest different facets of probabilistic databases, I will show how they interrelate and, moreover, how they relate to the basic theory of inference over tuple-independent databases.

Cite as

Benny Kimelfeld. Facets of Probabilistic Databases (Invited Talk). In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{kimelfeld:LIPIcs.ICDT.2020.1,
  author =	{Kimelfeld, Benny},
  title =	{{Facets of Probabilistic Databases}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.1},
  URN =		{urn:nbn:de:0030-drops-119258},
  doi =		{10.4230/LIPIcs.ICDT.2020.1},
  annote =	{Keywords: Probabilistic databases, data cleaning, preference models, Shapley value}
}
Document
Invited Talk
What Makes a Variant of Query Determinacy (Un)Decidable? (Invited Talk)

Authors: Jerzy Marcinkowski


Abstract
This paper was written as the companion paper of the ICDT 2020 invited tutorial. Query determinacy is a broad topic, with literally hundreds of papers published since late 1980s. This paper is not going to be a "survey" but rather a personal perspective of a person somehow involved in the recent developments in the area. First I explain how, in the last 30+ years, the question of determinacy was formalized. There are many parameters here: obviously one needs to choose the query language of the available views and the query language of the query itself. But - surprisingly - there is also some choice regarding what the word "to compute" actually means in this context. Then I concentrate on certain variants of the decision problem of determinacy (for each choice of parameters there is one such problem) and explain how I understand the mechanisms rendering such variants of determinacy decidable or undecidable. This is on a rather informal level. No really new theorems are presented, but I show some improvements of existing theorems and also simplified proofs of some of the earlier results.

Cite as

Jerzy Marcinkowski. What Makes a Variant of Query Determinacy (Un)Decidable? (Invited Talk). In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 2:1-2:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{marcinkowski:LIPIcs.ICDT.2020.2,
  author =	{Marcinkowski, Jerzy},
  title =	{{What Makes a Variant of Query Determinacy (Un)Decidable?}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{2:1--2:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.2},
  URN =		{urn:nbn:de:0030-drops-119265},
  doi =		{10.4230/LIPIcs.ICDT.2020.2},
  annote =	{Keywords: database theory, query, view, determinacy}
}
Document
Invited Talk
Current Challenges in Graph Databases (Invited Talk)

Authors: Juan L. Reutter


Abstract
As graph databases grow in popularity, decades of work in graph query languages and models are materialising in industry standards and in the construction of new graph database systems. However, this surge in graph systems has in turn opened up a series of new, interesting research problems related to graph databases. Our first set of problems has to do with more efficient ways of computing the answers of graph queries, specifically graph patterns, path queries, and combinations between them. Traditionally, researchers in graph databases have pointed out that relational systems are ill-equipped to process these types of queries, and if one looks at the performance of native graph database systems, there is clearly a lot of room for improvement. The talk focuses on two possible directions for improving the state of the art in graph query processing. The first is implementing worst-case optimal algorithms for processing graph patterns that traduce in relational queries with several joins. Some advances are already in development (see e.g. Nguyen, Dung, et al. "Join processing for graph patterns: An old dog with new tricks." GRADES'15. or Hogan, Aidan, et al. "A Worst-Case Optimal Join Algorithm for SPARQL." ISWC’19.), but we are still far from a full fledged solution: most algorithms require complex data structures, or need further support in terms of heuristics to select an order in which joins are processed. Second, we need to understand what is the best way of evaluating path queries (that is, finding all pairs of nodes connected by a path), in such a way that these results can be further integrated with other query results in a graph system pipeline. We already have complexity results regarding path computation and enumeration for different semantics of path queries (see e.g. Martens, Wim, and Tina Trautner. "Evaluation and enumeration problems for regular path queries." ICDT'18. or Bagan, Guillaume, Angela Bonifati, and Benoit Groz. "A trichotomy for regular simple path queries on graphs." PODS'13.), but still very little is known in terms of optimal processing of path queries when inside a tractable fragment. Our second set of problems is related to graph analytics, one of the current selling points of graph databases. Systems should be able to run more complex analytical queries involving tasks such as more complex path finding, centrality or clustering. It is also important to be able to run these algorithms not over native graphs, but perhaps over a certain set of nodes or edges previously selected by a graph query, and one may also want to pose further queries over the result of the analytics task. Finally, all of this should be done in an efficient way, specially in the prospect that graph databases may contain a huge amount of nodes. In this talk I will discuss possible approaches to perform these operations, covering aspects from the design of languages for graph analytics to efficient ways of processing them, and also comparing the expressive power of graph analytics solutions with other forms of graph computation.

Cite as

Juan L. Reutter. Current Challenges in Graph Databases (Invited Talk). In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{reutter:LIPIcs.ICDT.2020.3,
  author =	{Reutter, Juan L.},
  title =	{{Current Challenges in Graph Databases}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.3},
  URN =		{urn:nbn:de:0030-drops-119272},
  doi =		{10.4230/LIPIcs.ICDT.2020.3},
  annote =	{Keywords: Graph databases, Join algorithms, path queries, graph analytics}
}
Document
Executable First-Order Queries in the Logic of Information Flows

Authors: Heba Aamer, Bart Bogaerts, Dimitri Surinx, Eugenia Ternovska, and Jan Van den Bussche


Abstract
The logic of information flows (LIF) has recently been proposed as a general framework in the field of knowledge representation. In this framework, tasks of a procedural nature can still be modeled in a declarative, logic-based fashion. In this paper, we focus on the task of query processing under limited access patterns, a well-studied problem in the database literature. We show that LIF is well-suited for modeling this task. Toward this goal, we introduce a variant of LIF called "forward" LIF, in a first-order setting. We define FLIF^io, a syntactical fragment of forward LIF, and show that it corresponds exactly to the "executable" fragment of first-order logic defined by Nash and Ludäscher. The definition of FLIF^io involves a classification of the free variables of an expression into "input" and "output" variables. Our result hinges on inertia and determinacy laws for forward LIF expressions, which are interesting in their own right. These laws are formulated in terms of the input and output variables.

Cite as

Heba Aamer, Bart Bogaerts, Dimitri Surinx, Eugenia Ternovska, and Jan Van den Bussche. Executable First-Order Queries in the Logic of Information Flows. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{aamer_et_al:LIPIcs.ICDT.2020.4,
  author =	{Aamer, Heba and Bogaerts, Bart and Surinx, Dimitri and Ternovska, Eugenia and Van den Bussche, Jan},
  title =	{{Executable First-Order Queries in the Logic of Information Flows}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{4:1--4:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.4},
  URN =		{urn:nbn:de:0030-drops-119284},
  doi =		{10.4230/LIPIcs.ICDT.2020.4},
  annote =	{Keywords: Logic of Information Flows, Limited access pattern, Executable first-order logic}
}
Document
A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs

Authors: Antoine Amarilli and İsmail İlkan Ceylan


Abstract
We study the problem of probabilistic query evaluation (PQE) over probabilistic graphs, namely, tuple-independent probabilistic databases (TIDs) on signatures of arity two. Our focus is the class of queries that is closed under homomorphisms, or equivalently, the infinite unions of conjunctive queries, denoted UCQ∞. Our main result states that all unbounded queries in UCQ∞ are #P-hard for PQE. As bounded queries in UCQ∞ are already classified by the dichotomy of Dalvi and Suciu [Dalvi and Suciu, 2012], our results and theirs imply a complete dichotomy on PQE for UCQ∞ queries over probabilistic graphs. This dichotomy covers in particular all fragments in UCQ∞ such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries on arity-two signatures. Our result is shown by reducing from counting the valuations of positive partitioned 2-DNF formulae (#PP2DNF) for some queries, or from the source-to-target reliability problem in an undirected graph (#U-ST-CON) for other queries, depending on properties of minimal models.

Cite as

Antoine Amarilli and İsmail İlkan Ceylan. A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2020.5,
  author =	{Amarilli, Antoine and Ceylan, \.{I}smail \.{I}lkan},
  title =	{{A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{5:1--5:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.5},
  URN =		{urn:nbn:de:0030-drops-119295},
  doi =		{10.4230/LIPIcs.ICDT.2020.5},
  annote =	{Keywords: Tuple-independent database, #P-hardness, recursive queries, homomorphism-closed queries}
}
Document
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra

Authors: Pablo Barceló, Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux


Abstract
We study the expressive power of the Lara language - a recently proposed unified model for expressing relational and linear algebra operations - both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. We start by showing Lara to be expressive complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions which allow to transform values in tables, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity queries are enforced to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning operations. This language is also local, and thus cannot express operations such as matrix inverse that exhibit a recursive behavior. For expressing convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and turns the expressive power of the language much more difficult to understand. In particular, although under complexity assumptions the resulting language can still not express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.

Cite as

Pablo Barceló, Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{barcelo_et_al:LIPIcs.ICDT.2020.6,
  author =	{Barcel\'{o}, Pablo and Higuera, Nelson and P\'{e}rez, Jorge and Subercaseaux, Bernardo},
  title =	{{On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.6},
  URN =		{urn:nbn:de:0030-drops-119305},
  doi =		{10.4230/LIPIcs.ICDT.2020.6},
  annote =	{Keywords: languages for linear and relational algebra, expressive power, first order logic with aggregation, matrix convolution, matrix inverse, query genericity, locality of queries, safety}
}
Document
Random Sampling and Size Estimation Over Cyclic Joins

Authors: Yu Chen and Ke Yi


Abstract
Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [Surajit Chaudhuri et al., 1999] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN^ρ/OUT), where IN^ρ is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time.

Cite as

Yu Chen and Ke Yi. Random Sampling and Size Estimation Over Cyclic Joins. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICDT.2020.7,
  author =	{Chen, Yu and Yi, Ke},
  title =	{{Random Sampling and Size Estimation Over Cyclic Joins}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.7},
  URN =		{urn:nbn:de:0030-drops-119315},
  doi =		{10.4230/LIPIcs.ICDT.2020.7},
  annote =	{Keywords: Random sampling, joins, join size estimation}
}
Document
Weight Annotation in Information Extraction

Authors: Johannes Doleschal, Benny Kimelfeld, Wim Martens, and Liat Peterfreund


Abstract
The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document’s spans (intervals identified by their start and end indices). For instance, the regular spanners are the closure under the Relational Algebra (RA) of the regular expressions with capture variables, and the expressive power of the regular spanners is precisely captured by the class of vset-automata - a restricted class of transducers that mark the endpoints of selected spans. In this work, we embark on the investigation of document spanners that can annotate extractions with auxiliary information such as confidence, support, and confidentiality measures. To this end, we adopt the abstraction of provenance semirings by Green et al., where tuples of a relation are annotated with the elements of a commutative semiring, and where the annotation propagates through the (positive) RA operators via the semiring operators. Hence, the proposed spanner extension, referred to as an annotator, maps every string into an annotated relation over the spans. As a specific instantiation, we explore weighted vset-automata that, similarly to weighted automata and transducers, attach semiring elements to transitions. We investigate key aspects of expressiveness, such as the closure under the positive RA, and key aspects of computational complexity, such as the enumeration of annotated answers and their ranked enumeration in the case of numeric semirings. For a number of these problems, fundamental properties of the underlying semiring, such as positivity, are crucial for establishing tractability.

Cite as

Johannes Doleschal, Benny Kimelfeld, Wim Martens, and Liat Peterfreund. Weight Annotation in Information Extraction. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{doleschal_et_al:LIPIcs.ICDT.2020.8,
  author =	{Doleschal, Johannes and Kimelfeld, Benny and Martens, Wim and Peterfreund, Liat},
  title =	{{Weight Annotation in Information Extraction}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.8},
  URN =		{urn:nbn:de:0030-drops-119325},
  doi =		{10.4230/LIPIcs.ICDT.2020.8},
  annote =	{Keywords: Information extraction, regular document spanners, weighted automata, provenance semirings, K-relations}
}
Document
Containment of UC2RPQ: The Hard and Easy Cases

Authors: Diego Figueira


Abstract
We study the containment problem for UC2RPQ, that is, two-way Regular Path Queries, closed under conjunction, projection and union. We show a dichotomy property between PSpace-c and ExpSpace-c based on a property on the underlying graph of queries. We show that for any class C of graphs, the containment problem for queries whose underlying graph is in C is in PSpace if and only if C has bounded bridgewidth. Bridgewidth is a graph measure we introduce to this end, defined as the maximum size of a minimal edge separator of a graph.

Cite as

Diego Figueira. Containment of UC2RPQ: The Hard and Easy Cases. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 9:1-9:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{figueira:LIPIcs.ICDT.2020.9,
  author =	{Figueira, Diego},
  title =	{{Containment of UC2RPQ: The Hard and Easy Cases}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{9:1--9:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.9},
  URN =		{urn:nbn:de:0030-drops-119330},
  doi =		{10.4230/LIPIcs.ICDT.2020.9},
  annote =	{Keywords: Regular Path Queries (RPQ), 2RPQ, CRPQ, C2RPQ, UC2RPQ, graph databases, containment, inclusion, equivalence, dichotomy, graph measure, bridge-width (bridgewidth), minimal edge separator, minimal cut-set, max-cut, tree-width (treewidth)}
}
Document
On Equivalence and Cores for Incomplete Databases in Open and Closed Worlds

Authors: Henrik Forssell, Evgeny Kharlamov, and Evgenij Thorstensen


Abstract
Data exchange heavily relies on the notion of incomplete database instances. Several semantics for such instances have been proposed and include open (OWA), closed (CWA), and open-closed (OCWA) world. For all these semantics important questions are: whether one incomplete instance semantically implies another; when two are semantically equivalent; and whether a smaller or smallest semantically equivalent instance exists. For OWA and CWA these questions are fully answered. For several variants of OCWA, however, they remain open. In this work we adress these questions for Closed Powerset semantics and the OCWA semantics of [Leonid Libkin and Cristina Sirangelo, 2011]. We define a new OCWA semantics, called OCWA*, in terms of homomorphic covers that subsumes both semantics, and characterize semantic implication and equivalence in terms of such covers. This characterization yields a guess-and-check algorithm to decide equivalence, and shows that the problem is NP-complete. For the minimization problem we show that for several common notions of minimality there is in general no unique minimal equivalent instance for Closed Powerset semantics, and consequently not for the more expressive OCWA* either. However, for Closed Powerset semantics we show that one can find, for any incomplete database, a unique finite set of its subinstances which are subinstances (up to renaming of nulls) of all instances semantically equivalent to the original incomplete one. We study properties of this set, and extend the analysis to OCWA*.

Cite as

Henrik Forssell, Evgeny Kharlamov, and Evgenij Thorstensen. On Equivalence and Cores for Incomplete Databases in Open and Closed Worlds. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 10:1-10:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{forssell_et_al:LIPIcs.ICDT.2020.10,
  author =	{Forssell, Henrik and Kharlamov, Evgeny and Thorstensen, Evgenij},
  title =	{{On Equivalence and Cores for Incomplete Databases in Open and Closed Worlds}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{10:1--10:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.10},
  URN =		{urn:nbn:de:0030-drops-119343},
  doi =		{10.4230/LIPIcs.ICDT.2020.10},
  annote =	{Keywords: Incomplete Databases, Cores, Semantics, Open and Closed Worlds}
}
Document
Dynamic Complexity of Document Spanners

Authors: Dominik D. Freydenberger and Sam M. Thompson


Abstract
The present paper investigates the dynamic complexity of document spanners, a formal framework for information extraction introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (JACM 2015). We first look at the class of regular spanners and prove that any regular spanner can be maintained in the dynamic complexity class DynPROP. This result follows from work done previously on the dynamic complexity of formal languages by Gelade, Marquardt, and Schwentick (TOCL 2012). To investigate core spanners we use SpLog, a concatenation logic that exactly captures core spanners. We show that the dynamic complexity class DynCQ is more expressive than SpLog and therefore can maintain any core spanner. This result is then extended to show that DynFO can maintain any generalized core spanner and that DynFO is more powerful than SpLog with negation.

Cite as

Dominik D. Freydenberger and Sam M. Thompson. Dynamic Complexity of Document Spanners. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2020.11,
  author =	{Freydenberger, Dominik D. and Thompson, Sam M.},
  title =	{{Dynamic Complexity of Document Spanners}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{11:1--11:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.11},
  URN =		{urn:nbn:de:0030-drops-119355},
  doi =		{10.4230/LIPIcs.ICDT.2020.11},
  annote =	{Keywords: Document spanners, information extraction, dynamic complexity, descriptive complexity, word equations}
}
Document
When Can Matrix Query Languages Discern Matrices?

Authors: Floris Geerts


Abstract
We investigate when two graphs, represented by their adjacency matrices, can be distinguished by means of sentences formed in MATLANG, a matrix query language which supports a number of elementary linear algebra operators. When undirected graphs are concerned, and hence the adjacency matrices are real and symmetric, precise characterisations are in place when two graphs (i.e., their adjacency matrices) can be distinguished. Turning to directed graphs, one has to deal with asymmetric adjacency matrices. This complicates matters. Indeed, it requires to understand the more general problem of when two arbitrary matrices can be distinguished in MATLANG. We provide characterisations of the distinguishing power of MATLANG on real and complex matrices, and on adjacency matrices of directed graphs in particular. The proof techniques are a combination of insights from the symmetric matrix case and results from linear algebra and linear control theory.

Cite as

Floris Geerts. When Can Matrix Query Languages Discern Matrices?. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{geerts:LIPIcs.ICDT.2020.12,
  author =	{Geerts, Floris},
  title =	{{When Can Matrix Query Languages Discern Matrices?}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.12},
  URN =		{urn:nbn:de:0030-drops-119361},
  doi =		{10.4230/LIPIcs.ICDT.2020.12},
  annote =	{Keywords: matrix query languages, linear algebra, expressive power}
}
Document
Distribution Constraints: The Chase for Distributed Data

Authors: Gaetano Geck, Frank Neven, and Thomas Schwentick


Abstract
This paper introduces a declarative framework to specify and reason about distributions of data over computing nodes in a distributed setting. More specifically, it proposes distribution constraints which are tuple and equality generating dependencies (tgds and egds) extended with node variables ranging over computing nodes. In particular, they can express co-partitioning constraints and constraints about range-based data distributions by using comparison atoms. The main technical contribution is the study of the implication problem of distribution constraints. While implication is undecidable in general, relevant fragments of so-called data-full constraints are exhibited for which the corresponding implication problems are complete for EXPTIME, PSPACE and NP. These results yield bounds on deciding parallel-correctness for conjunctive queries in the presence of distribution constraints.

Cite as

Gaetano Geck, Frank Neven, and Thomas Schwentick. Distribution Constraints: The Chase for Distributed Data. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 13:1-13:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{geck_et_al:LIPIcs.ICDT.2020.13,
  author =	{Geck, Gaetano and Neven, Frank and Schwentick, Thomas},
  title =	{{Distribution Constraints: The Chase for Distributed Data}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{13:1--13:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.13},
  URN =		{urn:nbn:de:0030-drops-119378},
  doi =		{10.4230/LIPIcs.ICDT.2020.13},
  annote =	{Keywords: tuple-generating dependencies, chase, conjunctive queries, distributed evaluation}
}
Document
Towards Streaming Evaluation of Queries with Correlation in Complex Event Processing

Authors: Alejandro Grez and Cristian Riveros


Abstract
Complex event processing (CEP) has gained a lot of attention for evaluating complex patterns over high-throughput data streams. Recently, new algorithms for the evaluation of CEP patterns have emerged with strong guarantees of efficiency, i.e. constant update-time per tuple and constant-delay enumeration. Unfortunately, these techniques are restricted for patterns with local filters, limiting the possibility of using joins for correlating the data of events that are far apart. In this paper, we embark on the search for efficient evaluation algorithms of CEP patterns with joins. We start by formalizing the so-called partition-by operator, a standard operator in data stream management systems to correlate contiguous events on streams. Although this operator is a restricted version of a join query, we show that partition-by (without iteration) is equally expressive as hierarchical queries, the biggest class of full conjunctive queries that can be evaluated with constant update-time and constant-delay enumeration over streams. To evaluate queries with partition-by we introduce an automata model, called chain complex event automata (chain-CEA), an extension of complex event automata that can compare data values by using equalities and disequalities. We show that this model admits determinization and is expressive enough to capture queries with partition-by. More importantly, we provide an algorithm with constant update time and constant delay enumeration for evaluating any query definable by chain-CEA, showing that all CEP queries with partition-by can be evaluated with these strong guarantees of efficiency.

Cite as

Alejandro Grez and Cristian Riveros. Towards Streaming Evaluation of Queries with Correlation in Complex Event Processing. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{grez_et_al:LIPIcs.ICDT.2020.14,
  author =	{Grez, Alejandro and Riveros, Cristian},
  title =	{{Towards Streaming Evaluation of Queries with Correlation in Complex Event Processing}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.14},
  URN =		{urn:nbn:de:0030-drops-119389},
  doi =		{10.4230/LIPIcs.ICDT.2020.14},
  annote =	{Keywords: Complex event processing, Query languages, Correlation, Constant delay enumeration.}
}
Document
On the Expressiveness of Languages for Complex Event Recognition

Authors: Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren


Abstract
Complex Event Recognition (CER for short) has recently gained attention as a mechanism for detecting patterns in streams of continuously arriving event data. Numerous CER systems and languages have been proposed in the literature, commonly based on combining operations from regular expressions (sequencing, iteration, and disjunction) and relational algebra (e.g., joins and filters). While these languages are naturally first-order, meaning that variables can only bind single elements, they also provide capabilities for filtering sets of events that occur inside iterative patterns; for example requiring sequences of numbers to be increasing. Unfortunately, these type of filters usually present ad-hoc syntax and under-defined semantics, precisely because variables cannot bind sets of events. As a result, CER languages that provide filtering of sequences commonly lack rigorous semantics and their expressive power is not understood. In this paper we embark on two tasks: First, to define a denotational semantics for CER that naturally allows to bind and filter sets of events; and second, to compare the expressive power of this semantics with that of CER languages that only allow for binding single events. Concretely, we introduce Set-Oriented Complex Event Logic (SO-CEL for short), a variation of the CER language introduced in [Grez et al., 2019] in which all variables bind to sets of matched events. We then compare SO-CEL with CEL, the CER language of [Grez et al., 2019] where variables bind single events. We show that they are equivalent in expressive power when restricted to unary predicates but, surprisingly, incomparable in general. Nevertheless, we show that if we restrict to sets of binary predicates, then SO-CEL is strictly more expressive than CEL. To get a better understanding of the expressive power, computational capabilities, and limitations of SO-CEL, we also investigate the relationship between SO-CEL and Complex Event Automata (CEA), a natural computational model for CER languages. We define a property on CEA called the *-property and show that, under unary predicates, SO-CEL captures precisely the subclass of CEA that satisfy this property. Finally, we identify the operations that SO-CEL is lacking to characterize CEA and introduce a natural extension of the language that captures the complete class of CEA under unary predicates.

Cite as

Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren. On the Expressiveness of Languages for Complex Event Recognition. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{grez_et_al:LIPIcs.ICDT.2020.15,
  author =	{Grez, Alejandro and Riveros, Cristian and Ugarte, Mart{\'\i}n and Vansummeren, Stijn},
  title =	{{On the Expressiveness of Languages for Complex Event Recognition}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{15:1--15:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.15},
  URN =		{urn:nbn:de:0030-drops-119390},
  doi =		{10.4230/LIPIcs.ICDT.2020.15},
  annote =	{Keywords: Query languages, Complex Event Recognition, Logics, Automata theory}
}
Document
Infinite Probabilistic Databases

Authors: Martin Grohe and Peter Lindner


Abstract
Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.

Cite as

Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 16:1-16:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2020.16,
  author =	{Grohe, Martin and Lindner, Peter},
  title =	{{Infinite Probabilistic Databases}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{16:1--16:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.16},
  URN =		{urn:nbn:de:0030-drops-119400},
  doi =		{10.4230/LIPIcs.ICDT.2020.16},
  annote =	{Keywords: Probabilistic Databases, Possible Worlds Semantics, Query Measurability, Relational Algebra, Aggregate Queries}
}
Document
Coordination-Free Byzantine Replication with Minimal Communication Costs

Authors: Jelle Hellings and Mohammad Sadoghi


Abstract
State-of-the-art fault-tolerant and federated data management systems rely on fully-replicated designs in which all participants have equivalent roles. Consequently, these systems have only limited scalability and are ill-suited for high-performance data management. As an alternative, we propose a hierarchical design in which a Byzantine cluster manages data, while an arbitrary number of learners can reliable learn these updates and use the corresponding data. To realize our design, we propose the delayed-replication algorithm, an efficient solution to the Byzantine learner problem that is central to our design. The delayed-replication algorithm is coordination-free, scalable, and has minimal communication cost for all participants involved. In doing so, the delayed-broadcast algorithm opens the door to new high-performance fault-tolerant and federated data management systems. To illustrate this, we show that the delayed-replication algorithm is not only useful to support specialized learners, but can also be used to reduce the overall communication cost of permissioned blockchains and to improve their storage scalability.

Cite as

Jelle Hellings and Mohammad Sadoghi. Coordination-Free Byzantine Replication with Minimal Communication Costs. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{hellings_et_al:LIPIcs.ICDT.2020.17,
  author =	{Hellings, Jelle and Sadoghi, Mohammad},
  title =	{{Coordination-Free Byzantine Replication with Minimal Communication Costs}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{17:1--17:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.17},
  URN =		{urn:nbn:de:0030-drops-119418},
  doi =		{10.4230/LIPIcs.ICDT.2020.17},
  annote =	{Keywords: Byzantine learner, coordination-free checkpoint protocol, delayed-replication, information dispersal, consensus}
}
Document
Integrity Constraints Revisited: From Exact to Approximate Implication

Authors: Batya Kenig and Dan Suciu


Abstract
Integrity constraints such as functional dependencies (FD), and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.

Cite as

Batya Kenig and Dan Suciu. Integrity Constraints Revisited: From Exact to Approximate Implication. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{kenig_et_al:LIPIcs.ICDT.2020.18,
  author =	{Kenig, Batya and Suciu, Dan},
  title =	{{Integrity Constraints Revisited: From Exact to Approximate Implication}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.18},
  URN =		{urn:nbn:de:0030-drops-119420},
  doi =		{10.4230/LIPIcs.ICDT.2020.18},
  annote =	{Keywords: Integrity constraints, The implication problem}
}
Document
Datalog with Negation and Monotonicity

Authors: Bas Ketsman and Christoph Koch


Abstract
Positive Datalog has several nice properties that are lost when the language is extended with negation. One example is that fixpoints of positive Datalog programs are robust w.r.t. the order in which facts are inserted, which facilitates efficient evaluation of such programs in distributed environments. A natural question to ask, given a (stratified) Datalog program with negation, is whether an equivalent positive Datalog program exists. In this context, it is known that positive Datalog can express only a strict subset of the monotone queries, yet the exact relationship between the positive and monotone fragments of semi-positive and stratified Datalog was previously left open. In this paper, we complete the picture by showing that monotone queries expressible in semi-positive Datalog exist which are not expressible in positive Datalog. To provide additional insight into this gap, we also characterize a large class of semi-positive Datalog programs for which the dichotomy `monotone if and only if rewritable to positive Datalog' holds. Finally, we give best-effort techniques to reduce the amount of negation that is exhibited by a program, even if the program is not monotone.

Cite as

Bas Ketsman and Christoph Koch. Datalog with Negation and Monotonicity. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 19:1-19:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{ketsman_et_al:LIPIcs.ICDT.2020.19,
  author =	{Ketsman, Bas and Koch, Christoph},
  title =	{{Datalog with Negation and Monotonicity}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{19:1--19:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.19},
  URN =		{urn:nbn:de:0030-drops-119432},
  doi =		{10.4230/LIPIcs.ICDT.2020.19},
  annote =	{Keywords: Datalog, Monotonicity}
}
Document
The Shapley Value of Tuples in Query Answering

Authors: Ester Livshits, Leopoldo Bertossi, Benny Kimelfeld, and Moshe Sebag


Abstract
We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory and in many applications of game theory for assessing the contribution of a player to a coalition game. It has been established already in the 1950s, and is theoretically justified by being the very single wealth-distribution measure that satisfies some natural axioms. While this value has been investigated in several areas, it received little attention in data management. We study this measure in the context of conjunctive and aggregate queries by defining corresponding coalition games. We provide algorithmic and complexity-theoretic results on the computation of Shapley-based contributions to query answers; and for the hard cases we present approximation algorithms.

Cite as

Ester Livshits, Leopoldo Bertossi, Benny Kimelfeld, and Moshe Sebag. The Shapley Value of Tuples in Query Answering. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{livshits_et_al:LIPIcs.ICDT.2020.20,
  author =	{Livshits, Ester and Bertossi, Leopoldo and Kimelfeld, Benny and Sebag, Moshe},
  title =	{{The Shapley Value of Tuples in Query Answering}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.20},
  URN =		{urn:nbn:de:0030-drops-119442},
  doi =		{10.4230/LIPIcs.ICDT.2020.20},
  annote =	{Keywords: Shapley value, query answering, conjunctive queries, aggregate queries}
}
Document
Optimal Joins Using Compact Data Structures

Authors: Gonzalo Navarro, Juan L. Reutter, and Javiel Rojas-Ledesma


Abstract
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count with several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, the implementation of these algorithms often requires an enhanced indexing structure: to achieve optimality we either need to build completely new indexes, or we must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending an extra amount of storage space that may be non-negligible. We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need of extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal.

Cite as

Gonzalo Navarro, Juan L. Reutter, and Javiel Rojas-Ledesma. Optimal Joins Using Compact Data Structures. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 21:1-21:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{navarro_et_al:LIPIcs.ICDT.2020.21,
  author =	{Navarro, Gonzalo and Reutter, Juan L. and Rojas-Ledesma, Javiel},
  title =	{{Optimal Joins Using Compact Data Structures}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{21:1--21:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.21},
  URN =		{urn:nbn:de:0030-drops-119453},
  doi =		{10.4230/LIPIcs.ICDT.2020.21},
  annote =	{Keywords: Join algorithms, Compact data structures, Quadtrees, AGM bound}
}
Document
The Space Complexity of Inner Product Filters

Authors: Rasmus Pagh and Johan Sivertsen


Abstract
Motivated by the problem of filtering candidate pairs in inner product similarity joins we study the following inner product estimation problem: Given parameters d∈ℕ, α>β≥0 and unit vectors x,y∈ ℝ^d consider the task of distinguishing between the cases ⟨x,y⟩≤β and ⟨x,y⟩≥α where ⟨x,y⟩ = ∑_{i=1}^d x_i y_i is the inner product of vectors x and y. The goal is to distinguish these cases based on information on each vector encoded independently in a bit string of the shortest length possible. In contrast to much work on compressing vectors using randomized dimensionality reduction, we seek to solve the problem deterministically, with no probability of error. Inner product estimation can be solved in general via estimating ⟨x,y⟩ with an additive error bounded by ε = α - β. We show that d log₂ (√{1-β}/ε) ± Θ(d) bits of information about each vector is necessary and sufficient. Our upper bound is constructive and improves a known upper bound of d log₂(1/ε) + O(d) by up to a factor of 2 when β is close to 1. The lower bound holds even in a stronger model where one of the vectors is known exactly, and an arbitrary estimation function is allowed.

Cite as

Rasmus Pagh and Johan Sivertsen. The Space Complexity of Inner Product Filters. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ICDT.2020.22,
  author =	{Pagh, Rasmus and Sivertsen, Johan},
  title =	{{The Space Complexity of Inner Product Filters}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{22:1--22:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.22},
  URN =		{urn:nbn:de:0030-drops-119468},
  doi =		{10.4230/LIPIcs.ICDT.2020.22},
  annote =	{Keywords: Similarity, estimation, dot product, filtering}
}
Document
A Family of Centrality Measures for Graph Data Based on Subgraphs

Authors: Cristian Riveros and Jorge Salas


Abstract
We present the theoretical foundations of a new approach in centrality measures for graph data. The main principle of our approach is very simple: the more relevant subgraphs around a vertex, the more central it is in the network. We formalize the notion of "relevant subgraphs" by choosing a family of subgraphs that, give a graph G and a vertex v in G, it assigns a subset of connected subgraphs of G that contains v. Any of such families defines a measure of centrality by counting the number of subgraphs assigned to the vertex, i.e., a vertex will be more important for the network if it belongs to more subgraphs in the family. We show many examples of this approach and, in particular, we propose the all-subgraphs centrality, a centrality measure that takes every subgraph into account. We study fundamental properties over families of subgraphs that guarantee desirable properties over the corresponding centrality measure. Interestingly, all-subgraphs centrality satisfies all these properties, showing its robustness as a notion for centrality. Finally, we study the computational complexity of counting certain families of subgraphs and show a polynomial time algorithm to compute the all-subgraphs centrality for graphs with bounded tree width.

Cite as

Cristian Riveros and Jorge Salas. A Family of Centrality Measures for Graph Data Based on Subgraphs. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 23:1-23:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{riveros_et_al:LIPIcs.ICDT.2020.23,
  author =	{Riveros, Cristian and Salas, Jorge},
  title =	{{A Family of Centrality Measures for Graph Data Based on Subgraphs}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{23:1--23:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.23},
  URN =		{urn:nbn:de:0030-drops-119474},
  doi =		{10.4230/LIPIcs.ICDT.2020.23},
  annote =	{Keywords: Graph data, graph centrality, centrality measures}
}
Document
Reverse Prevention Sampling for Misinformation Mitigation in Social Networks

Authors: Michael Simpson, Venkatesh Srinivasan, and Alex Thomo


Abstract
In this work, we consider misinformation propagating through a social network and study the problem of its prevention. In this problem, a "bad" campaign starts propagating from a set of seed nodes in the network and we use the notion of a limiting (or "good") campaign to counteract the effect of misinformation. The goal is to identify a set of k users that need to be convinced to adopt the limiting campaign so as to minimize the number of people that adopt the "bad" campaign at the end of both propagation processes. This work presents RPS (Reverse Prevention Sampling), an algorithm that provides a scalable solution to the misinformation prevention problem. Our theoretical analysis shows that RPS runs in O((k + l)(n + m)(1/(1 - γ)) log n / ε²) expected time and returns a (1 - 1/e - ε)-approximate solution with at least 1 - n^{-l} probability (where γ is a typically small network parameter and l is a confidence parameter). The time complexity of RPS substantially improves upon the previously best-known algorithms that run in time Ω(m n k ⋅ POLY(ε^{-1})). We experimentally evaluate RPS on large datasets and show that it outperforms the state-of-the-art solution by several orders of magnitude in terms of running time. This demonstrates that misinformation prevention can be made practical while still offering strong theoretical guarantees.

Cite as

Michael Simpson, Venkatesh Srinivasan, and Alex Thomo. Reverse Prevention Sampling for Misinformation Mitigation in Social Networks. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 24:1-24:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{simpson_et_al:LIPIcs.ICDT.2020.24,
  author =	{Simpson, Michael and Srinivasan, Venkatesh and Thomo, Alex},
  title =	{{Reverse Prevention Sampling for Misinformation Mitigation in Social Networks}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{24:1--24:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.24},
  URN =		{urn:nbn:de:0030-drops-119484},
  doi =		{10.4230/LIPIcs.ICDT.2020.24},
  annote =	{Keywords: Graph Algorithms, Social Networks, Misinformation Prevention}
}
Document
A Simple Parallel Algorithm for Natural Joins on Binary Relations

Authors: Yufei Tao


Abstract
In PODS'17, Ketsman and Suciu gave an algorithm in the MPC model for computing the result of any natural join where every input relation has two attributes. Achieving an optimal load O(m/p^{1/ρ}) - where m is the total size of the input relations, p the number of machines, and ρ the fractional edge covering number of the join - their algorithm requires 7 rounds to finish. This paper presents a simpler algorithm that ensures the same load with 3 rounds (in fact, the second round incurs only a load of O(p²) to transmit certain statistics to assist machine allocation in the last round). Our algorithm is made possible by a new theorem that provides fresh insight on the structure of the problem, and brings us closer to understanding the intrinsic reason why joins on binary relations can be settled with load O(m/p^{1/ρ}).

Cite as

Yufei Tao. A Simple Parallel Algorithm for Natural Joins on Binary Relations. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{tao:LIPIcs.ICDT.2020.25,
  author =	{Tao, Yufei},
  title =	{{A Simple Parallel Algorithm for Natural Joins on Binary Relations}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{25:1--25:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.25},
  URN =		{urn:nbn:de:0030-drops-119495},
  doi =		{10.4230/LIPIcs.ICDT.2020.25},
  annote =	{Keywords: Natural Joins, Conjunctive Queries, MPC Algorithms, Parallel Computing}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail