LIPIcs, Volume 255

26th International Conference on Database Theory (ICDT 2023)



Thumbnail PDF

Event

ICDT 2023, March 28-31, 2023, Ioannina, Greece

Editors

Floris Geerts
  • University of Antwerp, Belgium
Brecht Vandevoort
  • UHasselt, Data Science Institute, ACSL, Diepenbeek, Belgium

Publication Details

  • published at: 2023-03-17
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-270-9
  • DBLP: db/conf/icdt/icdt2023

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
LIPIcs, Volume 255, ICDT 2023, Complete Volume

Authors: Floris Geerts and Brecht Vandevoort


Abstract
LIPIcs, Volume 255, ICDT 2023, Complete Volume

Cite as

26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 1-466, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@Proceedings{geerts_et_al:LIPIcs.ICDT.2023,
  title =	{{LIPIcs, Volume 255, ICDT 2023, Complete Volume}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{1--466},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023},
  URN =		{urn:nbn:de:0030-drops-177414},
  doi =		{10.4230/LIPIcs.ICDT.2023},
  annote =	{Keywords: LIPIcs, Volume 255, ICDT 2023, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Floris Geerts and Brecht Vandevoort


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{geerts_et_al:LIPIcs.ICDT.2023.0,
  author =	{Geerts, Floris and Vandevoort, Brecht},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.0},
  URN =		{urn:nbn:de:0030-drops-177424},
  doi =		{10.4230/LIPIcs.ICDT.2023.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
A Researcher’s Digest of GQL (Invited Talk)

Authors: Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč


Abstract
GQL (Graph Query Language) is being developed as a new ISO standard for graph query languages to play the same role for graph databases as SQL plays for relational. In parallel, an extension of SQL for querying property graphs, SQL/PGQ, is added to the SQL standard; it shares the graph pattern matching functionality with GQL. Both standards (not yet published) are hard-to-understand specifications of hundreds of pages. The goal of this paper is to present a digest of the language that is easy for the research community to understand, and thus to initiate research on these future standards for querying graphs. The paper concentrates on pattern matching features shared by GQL and SQL/PGQ, as well as querying facilities of GQL.

Cite as

Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč. A Researcher’s Digest of GQL (Invited Talk). In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 1:1-1:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{francis_et_al:LIPIcs.ICDT.2023.1,
  author =	{Francis, Nadime and Gheerbrant, Am\'{e}lie and Guagliardo, Paolo and Libkin, Leonid and Marsault, Victor and Martens, Wim and Murlak, Filip and Peterfreund, Liat and Rogova, Alexandra and Vrgo\v{c}, Domagoj},
  title =	{{A Researcher’s Digest of GQL}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{1:1--1:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.1},
  URN =		{urn:nbn:de:0030-drops-177434},
  doi =		{10.4230/LIPIcs.ICDT.2023.1},
  annote =	{Keywords: GQL, Property Graph, Query Language, Graph Database, Pattern matching, Multi-Graph}
}
Document
Invited Talk
Compact Data Structures Meet Databases (Invited Talk)

Authors: Gonzalo Navarro


Abstract
We describe two success stories on the application of compact data structures (cds) to solve the problem of the excessively redundant space requirements posed by worst-case-optimal (wco) algorithms for multijoins in databases, and particularly basic graph patterns on graph databases. The aim of cds is to represent the data and additional data structures on it, using total space close to that of the plain (and, sometimes, compressed) data, while efficiently simulating the data structure operations. Cds turn out to be a perfect approach for the described problem: We designed and implemented cds that effectively use space close to that of the plain or compressed data, which is orders of magnitude less than existing systems, while retaining worst-case optimality and performing competitively with those systems in query time, sometimes being even considerably faster.

Cite as

Gonzalo Navarro. Compact Data Structures Meet Databases (Invited Talk). In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 2:1-2:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{navarro:LIPIcs.ICDT.2023.2,
  author =	{Navarro, Gonzalo},
  title =	{{Compact Data Structures Meet Databases}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{2:1--2:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.2},
  URN =		{urn:nbn:de:0030-drops-177446},
  doi =		{10.4230/LIPIcs.ICDT.2023.2},
  annote =	{Keywords: succinct data structures, tries, multidimensional grids, text searching}
}
Document
Invited Talk
Some Vignettes on Subgraph Counting Using Graph Orientations (Invited Talk)

Authors: C. Seshadhri


Abstract
Subgraph counting is a fundamental problem that spans many areas in computer science: database theory, logic, network science, data mining, and complexity theory. Given a large input graph G and a small pattern graph H, we wish to count the number of occurrences of H in G. In recent times, there has been a resurgence on using an old (maybe overlooked?) technique of orienting the edges of G and H, and then using a combination of brute-force enumeration and indexing. These orientation techniques appear to give the best of both worlds. There is a rigorous theoretical explanation behind these techniques, and they also have excellent empirical behavior (on large real-world graphs). Time and again, graph orientations help solve subgraph counting problems in various computational models, be it sampling, streaming, distributed, etc. In this paper, we give some short vignettes on how the orientation technique solves a variety of algorithmic problems.

Cite as

C. Seshadhri. Some Vignettes on Subgraph Counting Using Graph Orientations (Invited Talk). In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{seshadhri:LIPIcs.ICDT.2023.3,
  author =	{Seshadhri, C.},
  title =	{{Some Vignettes on Subgraph Counting Using Graph Orientations}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{3:1--3:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.3},
  URN =		{urn:nbn:de:0030-drops-177454},
  doi =		{10.4230/LIPIcs.ICDT.2023.3},
  annote =	{Keywords: subgraph counting, graph degeneracy, homomorphism counting, graph algorithms}
}
Document
Enumerating Subgraphs of Constant Sizes in External Memory

Authors: Shiyuan Deng, Francesco Silvestri, and Yufei Tao


Abstract
We present an indivisible I/O-efficient algorithm for subgraph enumeration, where the objective is to list all the subgraphs of a massive graph G : = (V, E) that are isomorphic to a pattern graph Q having k = O(1) vertices. Our algorithm performs O((|E|^{k/2})/(M^{{k/2}-1} B) log_{M/B}(|E|/B) + (|E|^ρ)/(M^{ρ-1} B) I/Os with high probability, where ρ is the fractional edge covering number of Q (it always holds ρ ≥ k/2, regardless of Q), M is the number of words in (internal) memory, and B is the number of words in a disk block. Our solution is optimal in the class of indivisible algorithms for all pattern graphs with ρ > k/2. When ρ = k/2, our algorithm is still optimal as long as M/B ≥ (|E|/B)^ε for any constant ε > 0.

Cite as

Shiyuan Deng, Francesco Silvestri, and Yufei Tao. Enumerating Subgraphs of Constant Sizes in External Memory. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{deng_et_al:LIPIcs.ICDT.2023.4,
  author =	{Deng, Shiyuan and Silvestri, Francesco and Tao, Yufei},
  title =	{{Enumerating Subgraphs of Constant Sizes in External Memory}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.4},
  URN =		{urn:nbn:de:0030-drops-177460},
  doi =		{10.4230/LIPIcs.ICDT.2023.4},
  annote =	{Keywords: Subgraph Enumeration, Conjunctive Queries, External Memory, Algorithms}
}
Document
An Optimal Algorithm for Sliding Window Order Statistics

Authors: Pavel Raykov


Abstract
Assume there is a data stream of elements and a window of size m. Sliding window algorithms compute various statistic functions over the last m elements of the data stream seen so far. The time complexity of a sliding window algorithm is measured as the time required to output an updated statistic function value every time a new element is read. For example, it is well known that computing the sliding window maximum/minimum has time complexity O(1) while computing the sliding window median has time complexity O(log m). In this paper we close the gap between these two cases by (1) presenting an algorithm for computing the sliding window k-th smallest element in O(log k) time and (2) prove that this time complexity is optimal.

Cite as

Pavel Raykov. An Optimal Algorithm for Sliding Window Order Statistics. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{raykov:LIPIcs.ICDT.2023.5,
  author =	{Raykov, Pavel},
  title =	{{An Optimal Algorithm for Sliding Window Order Statistics}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.5},
  URN =		{urn:nbn:de:0030-drops-177479},
  doi =		{10.4230/LIPIcs.ICDT.2023.5},
  annote =	{Keywords: sliding window, order statistics, median, selection algorithms}
}
Document
Space-Query Tradeoffs in Range Subgraph Counting and Listing

Authors: Shiyuan Deng, Shangqi Lu, and Yufei Tao


Abstract
This paper initializes the study of range subgraph counting and range subgraph listing, both of which are motivated by the significant demands in practice to perform graph analytics on subgraphs pertinent to only selected, as opposed to all, vertices. In the first problem, there is an undirected graph G where each vertex carries a real-valued attribute. Given an interval q and a pattern Q, a query counts the number of occurrences of Q in the subgraph of G induced by the vertices whose attributes fall in q. The second problem has the same setup except that a query needs to enumerate (rather than count) those occurrences with a small delay. In both problems, our goal is to understand the tradeoff between space usage and query cost, or more specifically: (i) given a target on query efficiency, how much pre-computed information about G must we store? (ii) Or conversely, given a budget on space usage, what is the best query time we can hope for? We establish a suite of upper- and lower-bound results on such tradeoffs for various query patterns.

Cite as

Shiyuan Deng, Shangqi Lu, and Yufei Tao. Space-Query Tradeoffs in Range Subgraph Counting and Listing. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 6:1-6:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{deng_et_al:LIPIcs.ICDT.2023.6,
  author =	{Deng, Shiyuan and Lu, Shangqi and Tao, Yufei},
  title =	{{Space-Query Tradeoffs in Range Subgraph Counting and Listing}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{6:1--6:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.6},
  URN =		{urn:nbn:de:0030-drops-177484},
  doi =		{10.4230/LIPIcs.ICDT.2023.6},
  annote =	{Keywords: Subgraph Pattern Counting, Subgraph Pattern Listing, Conjunctive Queries}
}
Document
Constant-Delay Enumeration for SLP-Compressed Documents

Authors: Martín Muñoz and Cristian Riveros


Abstract
We study the problem of enumerating results from a query over a compressed document. The model we use for compression are straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our queries we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners as it allows arbitrarily long outputs. Our main result is an algorithm which evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase which takes linear time on the size of the SLP, and cubic time over the size of the automaton. This is an improvement over Schmid and Schweikardt’s result [Markus L. Schmid and Nicole Schweikardt, 2021], which, with the same preprocessing time, enumerates with a delay which is logarithmic on the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata which utilizes succinctly encoded annotations to save an exponential factor from previous results that dealt with constant-delay enumeration over vset automata. Lastly, we extend our results in the same fashion Schmid and Schweikardt did [Markus L. Schmid and Nicole Schweikardt, 2022] to allow complex document editing while maintaining the constant-delay guarantee.

Cite as

Martín Muñoz and Cristian Riveros. Constant-Delay Enumeration for SLP-Compressed Documents. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 7:1-7:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{munoz_et_al:LIPIcs.ICDT.2023.7,
  author =	{Mu\~{n}oz, Mart{\'\i}n and Riveros, Cristian},
  title =	{{Constant-Delay Enumeration for SLP-Compressed Documents}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{7:1--7:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.7},
  URN =		{urn:nbn:de:0030-drops-177495},
  doi =		{10.4230/LIPIcs.ICDT.2023.7},
  annote =	{Keywords: SLP compression, query evaluation, enumeration algorithms}
}
Document
Degree Sequence Bound for Join Cardinality Estimation

Authors: Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai


Abstract
Recent work has demonstrated the catastrophic effects of poor cardinality estimates on query processing time. In particular, underestimating query cardinality can result in overly optimistic query plans which take orders of magnitude longer to complete than one generated with the true cardinality. Cardinality bounding avoids this pitfall by computing an upper bound on the query’s output size using statistics about the database such as table sizes and degrees, i.e. value frequencies. In this paper, we extend this line of work by proving a novel bound called the Degree Sequence Bound which takes into account the full degree sequences and the max tuple multiplicity. This work focuses on the important class of Berge-Acyclic queries for which the Degree Sequence Bound is tight. Further, we describe how to practically compute this bound using a functional approximation of the true degree sequences and prove that even this functional form improves upon previous bounds.

Cite as

Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai. Degree Sequence Bound for Join Cardinality Estimation. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{deeds_et_al:LIPIcs.ICDT.2023.8,
  author =	{Deeds, Kyle and Suciu, Dan and Balazinska, Magda and Cai, Walter},
  title =	{{Degree Sequence Bound for Join Cardinality Estimation}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.8},
  URN =		{urn:nbn:de:0030-drops-177508},
  doi =		{10.4230/LIPIcs.ICDT.2023.8},
  annote =	{Keywords: Cardinality Estimation, Cardinality Bounding, Degree Bounds, Functional Approximation, Query Planning, Berge-Acyclic Queries}
}
Document
Absolute Expressiveness of Subgraph-Based Centrality Measures

Authors: Andreas Pieris and Jorge Salas


Abstract
In graph-based applications, a common task is to pinpoint the most important or "central" vertex in a (directed or undirected) graph, or rank the vertices of a graph according to their importance. To this end, a plethora of so-called centrality measures have been proposed in the literature. Such measures assess which vertices in a graph are the most important ones by analyzing the structure of the underlying graph. A family of centrality measures that are suited for graph databases has been recently proposed by relying on the following simple principle: the importance of a vertex in a graph is relative to the number of "relevant" connected subgraphs surrounding it; we refer to the members of this family as subgraph-based centrality measures. Although it has been shown that such measures enjoy several favourable properties, their absolute expressiveness remains largely unexplored. The goal of this work is to precisely characterize the absolute expressiveness of the family of subgraph-based centrality measures by considering both directed and undirected graphs. To this end, we characterize when an arbitrary centrality measure is a subgraph-based one, or a subgraph-based measure relative to the induced ranking. These characterizations provide us with technical tools that allow us to determine whether well-established centrality measures are subgraph-based. Such a classification, apart from being interesting in its own right, gives useful insights on the structural similarities and differences among existing centrality measures.

Cite as

Andreas Pieris and Jorge Salas. Absolute Expressiveness of Subgraph-Based Centrality Measures. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 9:1-9:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{pieris_et_al:LIPIcs.ICDT.2023.9,
  author =	{Pieris, Andreas and Salas, Jorge},
  title =	{{Absolute Expressiveness of Subgraph-Based Centrality Measures}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{9:1--9:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.9},
  URN =		{urn:nbn:de:0030-drops-177516},
  doi =		{10.4230/LIPIcs.ICDT.2023.9},
  annote =	{Keywords: Graph centrality measures, ranking, expressiveness}
}
Document
Diversity of Answers to Conjunctive Queries

Authors: Timo Camillo Merkl, Reinhard Pichler, and Sebastian Skritek


Abstract
Enumeration problems aim at outputting, without repetition, the set of solutions to a given problem instance. However, outputting the entire solution set may be prohibitively expensive if it is too big. In this case, outputting a small, sufficiently diverse subset of the solutions would be preferable. This leads to the Diverse-version of the original enumeration problem, where the goal is to achieve a certain level d of diversity by selecting k solutions. In this paper, we look at the Diverse-version of the query answering problem for Conjunctive Queries and extensions thereof. That is, we study the problem if it is possible to achieve a certain level d of diversity by selecting k answers to the given query and, in the positive case, to actually compute such k answers.

Cite as

Timo Camillo Merkl, Reinhard Pichler, and Sebastian Skritek. Diversity of Answers to Conjunctive Queries. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{merkl_et_al:LIPIcs.ICDT.2023.10,
  author =	{Merkl, Timo Camillo and Pichler, Reinhard and Skritek, Sebastian},
  title =	{{Diversity of Answers to Conjunctive Queries}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.10},
  URN =		{urn:nbn:de:0030-drops-177529},
  doi =		{10.4230/LIPIcs.ICDT.2023.10},
  annote =	{Keywords: Query Answering, Diversity of Solutions, Complexity, Algorithms}
}
Document
The Complexity of the Shapley Value for Regular Path Queries

Authors: Majd Khalil and Benny Kimelfeld


Abstract
A path query extracts vertex tuples from a labeled graph, based on the words that are formed by the paths connecting the vertices. We study the computational complexity of measuring the contribution of edges and vertices to an answer to a path query, focusing on the class of conjunctive regular path queries. To measure this contribution, we adopt the traditional Shapley value from cooperative game theory. This value has been recently proposed and studied in the context of relational database queries and has uses in a plethora of other domains. We first study the contribution of edges and show that the exact Shapley value is almost always hard to compute. Specifically, it is #P-hard to calculate the contribution of an edge whenever at least one (non-redundant) conjunct allows for a word of length three or more. In the case of regular path queries (i.e., no conjunction), the problem is tractable if the query has only words of length at most two; hence, this property fully characterizes the tractability of the problem. On the other hand, if we allow for an approximation error, then it is straightforward to obtain an efficient scheme (FPRAS) for an additive approximation. Yet, a multiplicative approximation is harder to obtain. We establish that in the case of conjunctive regular path queries, a multiplicative approximation of the Shapley value of an edge can be computed in polynomial time if and only if all query atoms are finite languages (assuming non-redundancy and conventional complexity limitations). We also study the analogous situation where we wish to determine the contribution of a vertex, rather than an edge, and establish complexity results of similar nature.

Cite as

Majd Khalil and Benny Kimelfeld. The Complexity of the Shapley Value for Regular Path Queries. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 11:1-11:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{khalil_et_al:LIPIcs.ICDT.2023.11,
  author =	{Khalil, Majd and Kimelfeld, Benny},
  title =	{{The Complexity of the Shapley Value for Regular Path Queries}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{11:1--11:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.11},
  URN =		{urn:nbn:de:0030-drops-177535},
  doi =		{10.4230/LIPIcs.ICDT.2023.11},
  annote =	{Keywords: Path queries, regular path queries, graph databases, Shapley value}
}
Document
How Do Centrality Measures Choose the Root of Trees?

Authors: Cristian Riveros, Jorge Salas, and Oskar Skibski


Abstract
Centrality measures are widely used to assign importance to graph-structured data. Recently, understanding the principles of such measures has attracted a lot of attention. Given that measures are diverse, this research has usually focused on classes of centrality measures. In this work, we provide a different approach by focusing on classes of graphs instead of classes of measures to understand the underlying principles among various measures. More precisely, we study the class of trees. We observe that even in the case of trees, there is no consensus on which node should be selected as the most central. To analyze the behavior of centrality measures on trees, we introduce a property of tree rooting that states a measure selects one or two adjacent nodes as the most important, and the importance decreases from them in all directions. This property is satisfied by closeness centrality but violated by PageRank. We show that, for several centrality measures that root trees, the comparison of adjacent nodes can be inferred by potential functions that assess the quality of trees. We use these functions to give fundamental insights on rooting and derive a characterization explaining why some measure root trees. Moreover, we provide an almost linear time algorithm to compute the root of a graph by using potential functions. Finally, using a family of potential functions, we show that many ways of tree rooting exist with desirable properties.

Cite as

Cristian Riveros, Jorge Salas, and Oskar Skibski. How Do Centrality Measures Choose the Root of Trees?. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 12:1-12:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{riveros_et_al:LIPIcs.ICDT.2023.12,
  author =	{Riveros, Cristian and Salas, Jorge and Skibski, Oskar},
  title =	{{How Do Centrality Measures Choose the Root of Trees?}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{12:1--12:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.12},
  URN =		{urn:nbn:de:0030-drops-177545},
  doi =		{10.4230/LIPIcs.ICDT.2023.12},
  annote =	{Keywords: Databases, centrality measures, data centrality, graph theory, tree structures}
}
Document
Size Bounds and Algorithms for Conjunctive Regular Path Queries

Authors: Tamara Cucumides, Juan Reutter, and Domagoj Vrgoč


Abstract
Conjunctive regular path queries (CRPQs) are one of the core classes of queries over graph databases. They are join intensive, inheriting their structure from the relational setting, but they also allow arbitrary length paths to connect points that are to be joined. However, despite their popularity, little is known about what are the best algorithms for processing CRPQs. We focus on worst-case optimal algorithms, which are algorithms that run in time bounded by the worst-case output size of queries, and have been recently deployed for simpler graph queries with very promising results. We show that the famous bound on the number of query results by Atserias, Grohe and Marx can be extended to CRPQs, but to obtain tight bounds one needs to work with slightly stronger cardinality profiles. We also discuss what algorithms follow from our analysis. If one pays the cost for fully materializing graph queries, then the techniques developed for conjunctive queries can be reused. If, on the other hand, one imposes constraint on the working memory of algorithms, then worst-case optimal algorithms must be adapted with care: the order of variables in which queries are processed can have striking implications on the running time of queries.

Cite as

Tamara Cucumides, Juan Reutter, and Domagoj Vrgoč. Size Bounds and Algorithms for Conjunctive Regular Path Queries. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{cucumides_et_al:LIPIcs.ICDT.2023.13,
  author =	{Cucumides, Tamara and Reutter, Juan and Vrgo\v{c}, Domagoj},
  title =	{{Size Bounds and Algorithms for Conjunctive Regular Path Queries}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.13},
  URN =		{urn:nbn:de:0030-drops-177552},
  doi =		{10.4230/LIPIcs.ICDT.2023.13},
  annote =	{Keywords: graph databases, regular path queries, worst-case optimal algorithms}
}
Document
Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries

Authors: Antoine Amarilli


Abstract
We study the uniform query reliability problem, which asks, for a fixed Boolean query Q, given an instance I, how many subinstances of I satisfy Q. Equivalently, this is a restricted case of Boolean query evaluation on tuple-independent probabilistic databases where all facts must have probability 1/2. We focus on graph signatures, and on queries closed under homomorphisms. We show that for any such query that is unbounded, i.e., not equivalent to a union of conjunctive queries, the uniform reliability problem is #P-hard. This recaptures the hardness, e.g., of s-t connectedness, which counts how many subgraphs of an input graph have a path between a source and a sink. This new hardness result on uniform reliability strengthens our earlier hardness result on probabilistic query evaluation for unbounded homomorphism-closed queries [Amarilli and Ceylan, 2021]. Indeed, our earlier proof crucially used facts with probability 1, so it did not apply to the unweighted case. The new proof presented in this paper avoids this; it uses our recent hardness result on uniform reliability for non-hierarchical conjunctive queries without self-joins [Antoine Amarilli and Benny Kimelfeld, 2022], along with new techniques.

Cite as

Antoine Amarilli. Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{amarilli:LIPIcs.ICDT.2023.14,
  author =	{Amarilli, Antoine},
  title =	{{Uniform Reliability for Unbounded Homomorphism-Closed Graph Queries}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.14},
  URN =		{urn:nbn:de:0030-drops-177566},
  doi =		{10.4230/LIPIcs.ICDT.2023.14},
  annote =	{Keywords: Uniform reliability, #P-hardness, probabilistic databases}
}
Document
Approximation and Semantic Tree-Width of Conjunctive Regular Path Queries

Authors: Diego Figueira and Rémi Morvan


Abstract
We show that the problem of whether a query is equivalent to a query of tree-width k is decidable, for the class of Unions of Conjunctive Regular Path Queries with two-way navigation (UC2RPQs). A previous result by Barceló, Romero, and Vardi [Pablo Barceló et al., 2016] has shown decidability for the case k = 1, and here we show that decidability in fact holds for any arbitrary k > 1. The algorithm is in 2ExpSpace, but for the restricted but practically relevant case where all regular expressions of the query are of the form a^* or (a_1 + ... + a_n) we show that the complexity of the problem drops to Π^p_2. We also investigate the related problem of approximating a UC2RPQ by queries of small tree-width. We exhibit an algorithm which, for any fixed number k, builds the maximal under-approximation of tree-width k of a UC2RPQ. The maximal under-approximation of tree-width k of a query q is a query q' of tree-width k which is contained in q in a maximal and unique way, that is, such that for every query q'' of tree-width k, if q'' is contained in q then q'' is also contained in q'.

Cite as

Diego Figueira and Rémi Morvan. Approximation and Semantic Tree-Width of Conjunctive Regular Path Queries. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{figueira_et_al:LIPIcs.ICDT.2023.15,
  author =	{Figueira, Diego and Morvan, R\'{e}mi},
  title =	{{Approximation and Semantic Tree-Width of Conjunctive Regular Path Queries}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.15},
  URN =		{urn:nbn:de:0030-drops-177575},
  doi =		{10.4230/LIPIcs.ICDT.2023.15},
  annote =	{Keywords: graph databases, conjunctive regular path queries, semantic optimization, tree-width, containment, approximation}
}
Document
Work-Efficient Query Evaluation with PRAMs

Authors: Jens Keppeler, Thomas Schwentick, and Christopher Spinrath


Abstract
The paper studies query evaluation in parallel constant time in the PRAM model. While it is well-known that all relational algebra queries can be evaluated in constant time on an appropriate CRCW-PRAM, this paper is interested in the efficiency of evaluation algorithms, that is, in the number of processors or, asymptotically equivalent, in the work. Naive evaluation in the parallel setting results in huge (polynomial) bounds on the work of such algorithms and in presentations of the result sets that can be extremely scattered in memory. The paper first discusses some obstacles for constant time PRAM query evaluation. It presents algorithms for relational operators that are considerably more efficient than the naive approaches. Further it explores three settings, in which efficient sequential query evaluation algorithms exist: acyclic queries, semi-join algebra queries, and join queries - the latter in the worst-case optimal framework. Under natural assumptions on the representation of the database, the work of the given algorithms matches the best sequential algorithms in the case of semi-join queries, and it comes close in the other two settings. An important tool is the compaction technique from Hagerup (1992).

Cite as

Jens Keppeler, Thomas Schwentick, and Christopher Spinrath. Work-Efficient Query Evaluation with PRAMs. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 16:1-16:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{keppeler_et_al:LIPIcs.ICDT.2023.16,
  author =	{Keppeler, Jens and Schwentick, Thomas and Spinrath, Christopher},
  title =	{{Work-Efficient Query Evaluation with PRAMs}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{16:1--16:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.16},
  URN =		{urn:nbn:de:0030-drops-177589},
  doi =		{10.4230/LIPIcs.ICDT.2023.16},
  annote =	{Keywords: PRAM, query evaluation, work-efficient, parallel, acyclic queries, free-connex queries}
}
Document
Conjunctive Queries with Free Access Patterns Under Updates

Authors: Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang


Abstract
We study the problem of answering conjunctive queries with free access patterns under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach for such queries. We also give a syntactic characterisation of those queries that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given an input tuple. Finally, we chart the complexity trade-off between the preprocessing time, update time and enumeration delay for such queries. For a class of queries, our approach achieves optimal, albeit non-constant, update time and delay. Their optimality is predicated on the Online Matrix-Vector Multiplication conjecture. Our results recover prior work on the dynamic evaluation of conjunctive queries without access patterns.

Cite as

Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Conjunctive Queries with Free Access Patterns Under Updates. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{kara_et_al:LIPIcs.ICDT.2023.17,
  author =	{Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Conjunctive Queries with Free Access Patterns Under Updates}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{17:1--17:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.17},
  URN =		{urn:nbn:de:0030-drops-177599},
  doi =		{10.4230/LIPIcs.ICDT.2023.17},
  annote =	{Keywords: fully dynamic algorithm, enumeration delay, complexity trade-off, dichotomy}
}
Document
Finite-Cliquewidth Sets of Existential Rules: Toward a General Criterion for Decidable yet Highly Expressive Querying

Authors: Thomas Feller, Tim S. Lyon, Piotr Ostropolski-Nalewaja, and Sebastian Rudolph


Abstract
In our pursuit of generic criteria for decidable ontology-based querying, we introduce finite-cliquewidth sets (fcs) of existential rules, a model-theoretically defined class of rule sets, inspired by the cliquewidth measure from graph theory. By a generic argument, we show that fcs ensures decidability of entailment for a sizable class of queries (dubbed DaMSOQs) subsuming conjunctive queries (CQs). The fcs class properly generalizes the class of finite-expansion sets (fes), and for signatures of arity ≤ 2, the class of bounded-treewidth sets (bts). For higher arities, bts is only indirectly subsumed by fcs by means of reification. Despite the generality of fcs, we provide a rule set with decidable CQ entailment (by virtue of first-order-rewritability) that falls outside fcs, thus demonstrating the incomparability of fcs and the class of finite-unification sets (fus). In spite of this, we show that if we restrict ourselves to single-headed rule sets over signatures of arity ≤ 2, then fcs subsumes fus.

Cite as

Thomas Feller, Tim S. Lyon, Piotr Ostropolski-Nalewaja, and Sebastian Rudolph. Finite-Cliquewidth Sets of Existential Rules: Toward a General Criterion for Decidable yet Highly Expressive Querying. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{feller_et_al:LIPIcs.ICDT.2023.18,
  author =	{Feller, Thomas and Lyon, Tim S. and Ostropolski-Nalewaja, Piotr and Rudolph, Sebastian},
  title =	{{Finite-Cliquewidth Sets of Existential Rules: Toward a General Criterion for Decidable yet Highly Expressive Querying}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.18},
  URN =		{urn:nbn:de:0030-drops-177602},
  doi =		{10.4230/LIPIcs.ICDT.2023.18},
  annote =	{Keywords: existential rules, TGDs, cliquewidth, treewidth, bounded-treewidth sets, finite-unification sets, first-order rewritability, monadic second-order logic, datalog}
}
Document
Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs

Authors: Sepehr Assadi, Nirmit Joshi, Milind Prabhu, and Vihan Shah


Abstract
Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set S of n elements in a streaming fashion and at the end, for any ϕ ∈ (0,1], return a ϕ-quantile of S up to an ε error, i.e., return a ϕ'-quantile with ϕ' = ϕ ± ε. We are particularly interested in comparison-based summaries that only compare elements of the universe under a total ordering and are otherwise completely oblivious of the universe. The best known deterministic quantile summary is the 20-year old Greenwald-Khanna (GK) summary that uses O((1/ε) log{(ε n)}) space [SIGMOD'01]. This bound was recently proved to be optimal for all deterministic comparison-based summaries by Cormode and Vesleý [PODS'20]. In this paper, we study weighted quantiles, a generalization of the quantiles problem, where each element arrives with a positive integer weight which denotes the number of copies of that element being inserted. The only known method of handling weighted inputs via GK summaries is the naive approach of breaking each weighted element into multiple unweighted items, and feeding them one by one to the summary, which results in a prohibitively large update time (proportional to the maximum weight of input elements). We give the first non-trivial extension of GK summaries for weighted inputs and show that it takes O((1/ε) log(εn)) space and O(log(1/ε)+log log(εn)) update time per element to process a stream of length n (under some quite mild assumptions on the range of weights and ε). En route to this, we also simplify the original GK summaries for unweighted quantiles.

Cite as

Sepehr Assadi, Nirmit Joshi, Milind Prabhu, and Vihan Shah. Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 19:1-19:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{assadi_et_al:LIPIcs.ICDT.2023.19,
  author =	{Assadi, Sepehr and Joshi, Nirmit and Prabhu, Milind and Shah, Vihan},
  title =	{{Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{19:1--19:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.19},
  URN =		{urn:nbn:de:0030-drops-177618},
  doi =		{10.4230/LIPIcs.ICDT.2023.19},
  annote =	{Keywords: Streaming algorithms, Quantile summaries, Rank estimation}
}
Document
Probabilistic Query Evaluation with Bag Semantics

Authors: Martin Grohe, Peter Lindner, and Christoph Standke


Abstract
We initiate the study of probabilistic query evaluation under bag semantics where tuples are allowed to be present with duplicates. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.

Cite as

Martin Grohe, Peter Lindner, and Christoph Standke. Probabilistic Query Evaluation with Bag Semantics. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2023.20,
  author =	{Grohe, Martin and Lindner, Peter and Standke, Christoph},
  title =	{{Probabilistic Query Evaluation with Bag Semantics}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.20},
  URN =		{urn:nbn:de:0030-drops-177636},
  doi =		{10.4230/LIPIcs.ICDT.2023.20},
  annote =	{Keywords: Probabilistic Query Evaluation, Probabilistic Databases, Bag Semantics}
}
Document
On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions

Authors: Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara


Abstract
d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem that has applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether there are efficient solutions to the dD-ERS problem, or to the latter database problem, have been two long-standing open problems. Both are solved in this work. Specifically, we propose a novel solution framework to dD-ERS on RVs that have Gaussian or Poisson distribution. Our dD-ERS solutions are the first ones that have polylogarithmic time complexities. Furthermore, we develop a novel k-wise independence theory that allows our dD-ERS solutions to have both high computational efficiencies and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.

Cite as

Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara. On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{meng_et_al:LIPIcs.ICDT.2023.21,
  author =	{Meng, Jingfan and Wang, Huayi and Xu, Jun and Ogihara, Mitsunori},
  title =	{{On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.21},
  URN =		{urn:nbn:de:0030-drops-177624},
  doi =		{10.4230/LIPIcs.ICDT.2023.21},
  annote =	{Keywords: fast range-summation, multidimensional data streams, Haar wavelet transform}
}
Document
The Consistency of Probabilistic Databases with Independent Cells

Authors: Amir Gilad, Aviram Imber, and Benny Kimelfeld


Abstract
A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values, column prediction, where we predict unknown attributes, and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that regard the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.

Cite as

Amir Gilad, Aviram Imber, and Benny Kimelfeld. The Consistency of Probabilistic Databases with Independent Cells. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{gilad_et_al:LIPIcs.ICDT.2023.22,
  author =	{Gilad, Amir and Imber, Aviram and Kimelfeld, Benny},
  title =	{{The Consistency of Probabilistic Databases with Independent Cells}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{22:1--22:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.22},
  URN =		{urn:nbn:de:0030-drops-177644},
  doi =		{10.4230/LIPIcs.ICDT.2023.22},
  annote =	{Keywords: Probabilistic databases, attribute-level uncertainty, functional dependencies, most probable database}
}
Document
Consistent Query Answering for Primary Keys and Conjunctive Queries with Counting

Authors: Aziz Amezian El Khalfioui and Jef Wijsen


Abstract
The problem of consistent query answering for primary keys and self-join-free conjunctive queries has been intensively studied in recent years and is by now well understood. In this paper, we study an extension of this problem with counting. The queries we consider count how many times each value occurs in a designated (possibly composite) column of an answer to a full conjunctive query. In a setting of database repairs, we adopt the semantics of [Arenas et al., ICDT 2001] which computes tight lower and upper bounds on these counts, where the bounds are taken over all repairs. Ariel Fuxman defined in his PhD thesis a syntactic class of queries, called Cforest, for which this computation can be done by executing two first-order queries (one for lower bounds, and one for upper bounds) followed by simple counting steps. We use the term "parsimonious counting" for this computation. A natural question is whether Cforest contains all self-join-free conjunctive queries that admit parsimonious counting. We answer this question negatively. We define a new syntactic class of queries, called Cparsimony, and prove that it contains all (and only) self-join-free conjunctive queries that admit parsimonious counting.

Cite as

Aziz Amezian El Khalfioui and Jef Wijsen. Consistent Query Answering for Primary Keys and Conjunctive Queries with Counting. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{amezianelkhalfioui_et_al:LIPIcs.ICDT.2023.23,
  author =	{Amezian El Khalfioui, Aziz and Wijsen, Jef},
  title =	{{Consistent Query Answering for Primary Keys and Conjunctive Queries with Counting}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.23},
  URN =		{urn:nbn:de:0030-drops-177659},
  doi =		{10.4230/LIPIcs.ICDT.2023.23},
  annote =	{Keywords: Consistent query answering, primary key, conjunctive query, aggregation, counting}
}
Document
A Simple Algorithm for Consistent Query Answering Under Primary Keys

Authors: Diego Figueira, Anantha Padmanabha, Luc Segoufin, and Cristina Sirangelo


Abstract
We consider the dichotomy conjecture for consistent query answering under primary key constraints. It states that, for every fixed Boolean conjunctive query q, testing whether q is certain (i.e. whether it evaluates to true over all repairs of a given inconsistent database) is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set Δ of subsets of database repairs with at most k facts, where k is the size of the query q. The algorithm runs in polynomial time and can be formally defined as: 1) Initialize Δ with all sets S of at most k facts such that S⊧ q. 2) Add any set S of at most k facts to Δ if there exists a block B (i.e., a maximal set of facts sharing the same key) such that for every fact a ∈ B there is a set S' ∈ Δ contained in S ∪ {a}. The algorithm answers "q is certain" iff Δ eventually contains the empty set. The algorithm correctly computes certainty when the query q falls in the polynomial time cases of the known dichotomies for self-join-free queries and path queries. For arbitrary Boolean conjunctive queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial time certain queries (with self-joins) which are not identified as such by the algorithm.

Cite as

Diego Figueira, Anantha Padmanabha, Luc Segoufin, and Cristina Sirangelo. A Simple Algorithm for Consistent Query Answering Under Primary Keys. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 24:1-24:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{figueira_et_al:LIPIcs.ICDT.2023.24,
  author =	{Figueira, Diego and Padmanabha, Anantha and Segoufin, Luc and Sirangelo, Cristina},
  title =	{{A Simple Algorithm for Consistent Query Answering Under Primary Keys}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{24:1--24:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.24},
  URN =		{urn:nbn:de:0030-drops-177663},
  doi =		{10.4230/LIPIcs.ICDT.2023.24},
  annote =	{Keywords: consistent query answering, primary keys, conjunctive queries}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail