LIPIcs, Volume 186

24th International Conference on Database Theory (ICDT 2021)




Event

ICDT 2021, March 23-26, 2021, Nicosia, Cyprus

Editors

Ke Yi
  • The Hong Kong University of Science and Technology, Hong Kong
Zhewei Wei
  • Renmin University of China, China

Publication Details

  • published at: 2021-03-11
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-179-5
  • DBLP: db/conf/icdt/icdt2021

Documents

Document
Complete Volume
LIPIcs, Volume 186, ICDT 2021, Complete Volume

Authors: Ke Yi and Zhewei Wei


Abstract
LIPIcs, Volume 186, ICDT 2021, Complete Volume

Cite as

24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 1-438, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@Proceedings{yi_et_al:LIPIcs.ICDT.2021,
  title =	{{LIPIcs, Volume 186, ICDT 2021, Complete Volume}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{1--438},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021},
  URN =		{urn:nbn:de:0030-drops-137076},
  doi =		{10.4230/LIPIcs.ICDT.2021},
  annote =	{Keywords: LIPIcs, Volume 186, ICDT 2021, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ke Yi and Zhewei Wei


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{yi_et_al:LIPIcs.ICDT.2021.0,
  author =	{Yi, Ke and Wei, Zhewei},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.0},
  URN =		{urn:nbn:de:0030-drops-137086},
  doi =		{10.4230/LIPIcs.ICDT.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk)

Authors: Pablo Barceló


Abstract
In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over an ML model in order to obtain meaningful justifications for their results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management, with the intention of suggesting new possibilities for cross-fertilization between data management and ML. The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if either of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley value of the contribution of a tuple in the database to the final result can be computed in polynomial time. The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models.
The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models.

Cite as

Pablo Barceló. Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk). In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{barcelo:LIPIcs.ICDT.2021.1,
  author =	{Barcel\'{o}, Pablo},
  title =	{{Explainability Queries for ML Models and its Connections with Data Management Problems}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.1},
  URN =		{urn:nbn:de:0030-drops-137091},
  doi =		{10.4230/LIPIcs.ICDT.2021.1},
  annote =	{Keywords: ML models, Explainability, Shapley values, decision trees, OBDDs, deterministic and decomposable Boolean circuits}
}
Document
Invited Talk
Comparing Apples and Oranges: Fairness and Diversity in Ranking (Invited Talk)

Authors: Julia Stoyanovich


Abstract
Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based; it computes a score of each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the highest scoring k candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k. In the past few years, there has been much work on incorporating fairness and diversity requirements into algorithmic rankers, with contributions coming from the data management, algorithms, information retrieval, and recommender systems communities. In my talk I will offer a broad perspective that connects formalizations and algorithmic approaches across subfields, grounding them in a common narrative around the value frameworks that motivate specific fairness- and diversity-enhancing interventions. I will discuss some recent and ongoing work, and will outline future research directions where the data management community is well-positioned to make lasting impact, especially if we attack these problems with our rich theory-meets-systems toolkit.
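
The three notions named in the abstract above (score-based ranking, top-k, and set selection) can be illustrated with a minimal Python sketch. The function names and the example scores are assumptions made for illustration only; they do not come from the talk, and no fairness or diversity intervention is modeled here.

```python
from typing import Callable, Hashable, Iterable, List, Set


def score_based_ranking(candidates: Iterable[Hashable],
                        score: Callable[[Hashable], float]) -> List[Hashable]:
    """Score each candidate independently and return them in descending score order.

    Python's sort is stable, so candidates with equal scores keep their input order.
    """
    return sorted(candidates, key=score, reverse=True)


def top_k(candidates: Iterable[Hashable],
          score: Callable[[Hashable], float],
          k: int) -> List[Hashable]:
    """Return only the k highest-scoring candidates, still as an ordered list."""
    return score_based_ranking(candidates, score)[:k]


def select_set(candidates: Iterable[Hashable],
               score: Callable[[Hashable], float],
               k: int) -> Set[Hashable]:
    """Set selection: the top-k with the relative order among them ignored."""
    return set(top_k(candidates, score, k))
```

Calling `top_k` and `select_set` with the same arguments returns the same candidates; only the first result carries an order.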

Cite as

Julia Stoyanovich. Comparing Apples and Oranges: Fairness and Diversity in Ranking (Invited Talk). In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{stoyanovich:LIPIcs.ICDT.2021.2,
  author =	{Stoyanovich, Julia},
  title =	{{Comparing Apples and Oranges: Fairness and Diversity in Ranking}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.2},
  URN =		{urn:nbn:de:0030-drops-137104},
  doi =		{10.4230/LIPIcs.ICDT.2021.2},
  annote =	{Keywords: fairness, diversity, ranking, set selection, responsible data management}
}
Document
Box Covers and Domain Orderings for Beyond Worst-Case Join Processing

Authors: Kaleb Alway, Eric Blais, and Semih Salihoglu


Abstract
Recent beyond worst-case optimal join algorithms Minesweeper and its generalization Tetris have brought the theory of indexing and join processing together by developing a geometric framework for joins. These algorithms take as input an index ℬ, referred to as a box cover, that stores output gaps that can be inferred from traditional indexes, such as B+ trees or tries, on the input relations. The performances of these algorithms highly depend on the certificate of ℬ, which is the smallest subset of gaps in ℬ whose union covers all of the gaps in the output space of a query Q. Different box covers can have different size certificates and the sizes of both the box covers and certificates highly depend on the ordering of the domain values of the attributes in Q. We study how to generate box covers that contain small size certificates to guarantee efficient runtimes for these algorithms. First, given a query Q over a set of relations of size N and a fixed set of domain orderings for the attributes, we give a Õ(N)-time algorithm called GAMB which generates a box cover for Q that is guaranteed to contain the smallest size certificate across any box cover for Q. Second, we show that finding a domain ordering to minimize the box cover size and certificate is NP-hard through a reduction from the 2 consecutive block minimization problem on boolean matrices. Our third contribution is a Õ(N)-time approximation algorithm called ADORA to compute domain orderings, under which one can compute a box cover of size Õ(K^r), where K is the minimum box cover for Q under any domain ordering and r is the maximum arity of any relation. This guarantees certificates of size Õ(K^r). We combine ADORA and GAMB with Tetris to form a new algorithm we call TetrisReordered, which provides several new beyond worst-case bounds. On infinite families of queries, TetrisReordered’s runtimes are unboundedly better than the bounds stated in prior work.

Cite as

Kaleb Alway, Eric Blais, and Semih Salihoglu. Box Covers and Domain Orderings for Beyond Worst-Case Join Processing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 3:1-3:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{alway_et_al:LIPIcs.ICDT.2021.3,
  author =	{Alway, Kaleb and Blais, Eric and Salihoglu, Semih},
  title =	{{Box Covers and Domain Orderings for Beyond Worst-Case Join Processing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{3:1--3:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.3},
  URN =		{urn:nbn:de:0030-drops-137114},
  doi =		{10.4230/LIPIcs.ICDT.2021.3},
  annote =	{Keywords: Beyond worst-case join algorithms, Tetris, Box covers, Domain orderings}
}
Document
A Purely Regular Approach to Non-Regular Core Spanners

Authors: Markus L. Schmid and Nicole Schweikardt


Abstract
The regular spanners (characterised by vset-automata) are closed under the algebraic operations of union, join and projection, and have desirable algorithmic properties. The core spanners (introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015) as a formalisation of the core functionality of the query language AQL used in IBM’s SystemT) additionally need string equality selections and it has been shown by Freydenberger and Holldack (ICDT 2016, Theory of Computing Systems 2018) that this leads to high complexity and even undecidability of the typical problems in static analysis and query evaluation. We propose an alternative approach to core spanners: by incorporating the string-equality selections directly into the regular language that represents the underlying regular spanner (instead of treating it as an algebraic operation on the table extracted by the regular spanner), we obtain a fragment of core spanners that, while having slightly weaker expressive power than the full class of core spanners, arguably still covers the intuitive applications of string equality selections for information extraction and has much better upper complexity bounds of the typical problems in static analysis and query evaluation.

Cite as

Markus L. Schmid and Nicole Schweikardt. A Purely Regular Approach to Non-Regular Core Spanners. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 4:1-4:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{schmid_et_al:LIPIcs.ICDT.2021.4,
  author =	{Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Purely Regular Approach to Non-Regular Core Spanners}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{4:1--4:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.4},
  URN =		{urn:nbn:de:0030-drops-137124},
  doi =		{10.4230/LIPIcs.ICDT.2021.4},
  annote =	{Keywords: Document spanners, regular expressions with backreferences}
}
Document
Ranked Enumeration of Conjunctive Query Results

Authors: Shaleen Deep and Paraschos Koutris


Abstract
We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce the notions of decomposable and compatible (w.r.t. a query decomposition) ranking functions, which allow for partial aggregation of tuple scores in order to efficiently enumerate the output. We complement the algorithmic results with lower bounds that justify why restrictions on the structure of ranking functions are necessary. Our results extend and improve upon a long line of work that has studied ranked enumeration from both a theoretical and practical perspective.

Cite as

Shaleen Deep and Paraschos Koutris. Ranked Enumeration of Conjunctive Query Results. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{deep_et_al:LIPIcs.ICDT.2021.5,
  author =	{Deep, Shaleen and Koutris, Paraschos},
  title =	{{Ranked Enumeration of Conjunctive Query Results}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.5},
  URN =		{urn:nbn:de:0030-drops-137139},
  doi =		{10.4230/LIPIcs.ICDT.2021.5},
  annote =	{Keywords: Query result enumeration, joins, ranking}
}
Document
Towards Optimal Dynamic Indexes for Approximate (and Exact) Triangle Counting

Authors: Shangqi Lu and Yufei Tao


Abstract
In ICDT'19, Kara, Ngo, Nikolic, Olteanu, and Zhang gave a structure which maintains the number T of triangles in an undirected graph G = (V, E) along with the edge insertions/deletions in G. Using O(m) space (m = |E|), their structure supports an update in O(√m log m) amortized time which is optimal (up to polylog factors) subject to the OMv-conjecture (Henzinger, Krinninger, Nanongkai, and Saranurak, STOC'15). Aiming to improve the update efficiency, we study:
  • the optimal tradeoff between update time and approximation quality. We require a structure to provide the (ε, Γ)-guarantee: when queried, it should return an estimate t of T that has relative error at most ε if T ≥ Γ, or an absolute error at most ε ⋅ Γ, otherwise. We prove that, under any ε ≤ 0.49 and subject to the OMv-conjecture, no structure can guarantee O(m^{0.5-δ}/Γ) expected amortized update time and O(m^{2/3-δ}) query time simultaneously for any constant δ > 0; this is true for Γ = m^c of any constant c in [0, 1/2). We match the lower bound with a structure that ensures Õ((1/ε)³ ⋅ √m/Γ) amortized update time with high probability, and O(1) query time.
  • (for exact counting) how to achieve arboricity-sensitive update time. For any 1 ≤ Γ ≤ √m, we describe a structure of O(min{α m + m log m, (m/Γ)²}) space that maintains T precisely, and supports an update in Õ(min{α + Γ, √m}) amortized time, where α is the largest arboricity of G in history (and does not need to be known). Our structure reconstructs the aforementioned ICDT'19 result up to polylog factors by setting Γ = √m, but achieves Õ(m^{0.5-δ}) update time as long as α = O(m^{0.5-δ}).
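
To make the problem statement above concrete, the following Python sketch shows a naive baseline for maintaining the triangle count T under edge insertions and deletions, together with a check of the (ε, Γ)-guarantee as defined in the abstract. This is not the structure from the paper or from the ICDT'19 result: an update here costs time proportional to the smaller neighbourhood rather than the bounds quoted above. The names `NaiveTriangleCounter` and `satisfies_guarantee` are hypothetical.

```python
from collections import defaultdict


class NaiveTriangleCounter:
    """Exact triangle count of an undirected simple graph under edge updates.

    Baseline only: each update intersects two adjacency sets.
    """

    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def insert(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbour of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        if v not in self.adj[u]:
            return  # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbour of u and v formed a triangle with the removed edge.
        self.triangles -= len(self.adj[u] & self.adj[v])


def satisfies_guarantee(t, T, eps, gamma):
    """Check the (eps, gamma)-guarantee for an estimate t of the true count T.

    Relative error at most eps if T >= gamma, absolute error at most eps * gamma otherwise.
    """
    if T >= gamma:
        return abs(t - T) <= eps * T
    return abs(t - T) <= eps * gamma
```

The guarantee check makes the role of Γ visible: below the threshold Γ an estimate may be off by up to ε ⋅ Γ in absolute terms, which is what allows faster updates for larger Γ in the bounds quoted above.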

Cite as

Shangqi Lu and Yufei Tao. Towards Optimal Dynamic Indexes for Approximate (and Exact) Triangle Counting. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 6:1-6:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{lu_et_al:LIPIcs.ICDT.2021.6,
  author =	{Lu, Shangqi and Tao, Yufei},
  title =	{{Towards Optimal Dynamic Indexes for Approximate (and Exact) Triangle Counting}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{6:1--6:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.6},
  URN =		{urn:nbn:de:0030-drops-137146},
  doi =		{10.4230/LIPIcs.ICDT.2021.6},
  annote =	{Keywords: Triangle Counting, Data Structures, Lower Bounds, Graph Algorithms}
}
Document
Grammars for Document Spanners

Authors: Liat Peterfreund


Abstract
We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called extraction grammars, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.

Cite as

Liat Peterfreund. Grammars for Document Spanners. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{peterfreund:LIPIcs.ICDT.2021.7,
  author =	{Peterfreund, Liat},
  title =	{{Grammars for Document Spanners}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.7},
  URN =		{urn:nbn:de:0030-drops-137154},
  doi =		{10.4230/LIPIcs.ICDT.2021.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Context-Free Grammars, Constant-Delay Enumeration, Regular Expressions, Pushdown Automata}
}
Document
Input-Output Disjointness for Forward Expressions in the Logic of Information Flows

Authors: Heba Aamer and Jan Van den Bussche


Abstract
Last year we introduced the logic FLIF (forward logic of information flows) as a declarative language for specifying complex compositions of information sources with limited access patterns. The key insight of this approach is to view a system of information sources as a graph, where the nodes are valuations of variables, so that accesses to information sources can be modeled as edges in the graph. This allows the use of XPath-like navigational graph query languages. Indeed, a well-behaved fragment of FLIF, called io-disjoint FLIF, was shown to be equivalent to the executable fragment of first-order logic. It remained open, however, how io-disjoint FLIF compares to general FLIF. In this paper we close this gap by showing that general FLIF expressions can always be put into io-disjoint form.

Cite as

Heba Aamer and Jan Van den Bussche. Input-Output Disjointness for Forward Expressions in the Logic of Information Flows. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{aamer_et_al:LIPIcs.ICDT.2021.8,
  author =	{Aamer, Heba and Van den Bussche, Jan},
  title =	{{Input-Output Disjointness for Forward Expressions in the Logic of Information Flows}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.8},
  URN =		{urn:nbn:de:0030-drops-137167},
  doi =		{10.4230/LIPIcs.ICDT.2021.8},
  annote =	{Keywords: Composition, expressive power, variable substitution}
}
Document
Conjunctive Queries: Unique Characterizations and Exact Learnability

Authors: Balder ten Cate and Victor Dalmau


Abstract
We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples, and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts.

Cite as

Balder ten Cate and Victor Dalmau. Conjunctive Queries: Unique Characterizations and Exact Learnability. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 9:1-9:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{tencate_et_al:LIPIcs.ICDT.2021.9,
  author =	{ten Cate, Balder and Dalmau, Victor},
  title =	{{Conjunctive Queries: Unique Characterizations and Exact Learnability}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{9:1--9:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.9},
  URN =		{urn:nbn:de:0030-drops-137172},
  doi =		{10.4230/LIPIcs.ICDT.2021.9},
  annote =	{Keywords: Conjunctive Queries, Homomorphisms, Frontiers, Unique Characterizations, Exact Learnability, Schema Mappings, Description Logic}
}
Document
The Complexity of Aggregates over Extractions by Regular Expressions

Authors: Johannes Doleschal, Noa Bratman, Benny Kimelfeld, and Wim Martens


Abstract
Regular expressions with capture variables, also known as "regex-formulas", extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the regex formulas under the Relational Algebra. We investigate the computational complexity of querying text by aggregate functions, such as sum, average, and quantile, on top of regular document spanners. To this end, we formally define aggregate functions over regular document spanners and analyze the computational complexity of exact and approximate computation. More precisely, we show that in a restricted case, all studied aggregate functions can be computed in polynomial time. In general, however, even though exact computation is intractable, some aggregates can still be approximated with fully polynomial-time randomized approximation schemes (FPRAS).

Cite as

Johannes Doleschal, Noa Bratman, Benny Kimelfeld, and Wim Martens. The Complexity of Aggregates over Extractions by Regular Expressions. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{doleschal_et_al:LIPIcs.ICDT.2021.10,
  author =	{Doleschal, Johannes and Bratman, Noa and Kimelfeld, Benny and Martens, Wim},
  title =	{{The Complexity of Aggregates over Extractions by Regular Expressions}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.10},
  URN =		{urn:nbn:de:0030-drops-137181},
  doi =		{10.4230/LIPIcs.ICDT.2021.10},
  annote =	{Keywords: Information extraction, document spanners, regular expressions, aggregation functions}
}
Document
Answer Counting Under Guarded TGDs

Authors: Cristina Feier, Carsten Lutz, and Marcin Przybyłko


Abstract
We study the complexity of answer counting for ontology-mediated queries and for querying under constraints, considering conjunctive queries and unions thereof (UCQs) as the query language and guarded TGDs as the ontology and constraint language, respectively. Our main result is a classification according to whether answer counting is fixed-parameter tractable (FPT), W[1]-equivalent, #W[1]-equivalent, #W[2]-hard, or #A[2]-equivalent, lifting a recent classification for UCQs without ontologies and constraints due to Dell et al. [Holger Dell et al., 2019]. The classification pertains to various structural measures, namely treewidth, contract treewidth, starsize, and linked matching number. Our results rest on the assumption that the arity of relation symbols is bounded by a constant and, in the case of ontology-mediated querying, that all symbols from the ontology and query can occur in the data (so-called full data schema). We also study the meta-problems for the mentioned structural measures, that is, to decide whether a given ontology-mediated query or constraint-query specification is equivalent to one for which the structural measure is bounded.

Cite as

Cristina Feier, Carsten Lutz, and Marcin Przybyłko. Answer Counting Under Guarded TGDs. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 11:1-11:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{feier_et_al:LIPIcs.ICDT.2021.11,
  author =	{Feier, Cristina and Lutz, Carsten and Przyby{\l}ko, Marcin},
  title =	{{Answer Counting Under Guarded TGDs}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{11:1--11:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.11},
  URN =		{urn:nbn:de:0030-drops-137195},
  doi =		{10.4230/LIPIcs.ICDT.2021.11},
  annote =	{Keywords: Ontology-Mediated Querying, Querying under Constraints, Answer Counting, Parameterized Complexity}
}
Document
Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Authors: Andrew McGregor, David Tench, and Hoa T. Vu


Abstract
We present algorithms for the Max Coverage and Max Unique Coverage problems in the data stream model. The input to both problems consists of m subsets of a universe of size n and a value k ∈ [m]. In Max Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max Unique Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. These problems are closely related to a range of graph problems including matching, partial vertex cover, and capacitated maximum cut. In the data stream model, we assume k is given and the sets are revealed online. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are:
  • If the sets have size at most d, there exist single-pass algorithms using O(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d.
  • If each element appears in at most r sets, we present single-pass algorithms using Õ(k² r/ε³) space that return a 1+ε approximation in the case of Max Coverage. We also present a single-pass algorithm using slightly more memory, i.e., Õ(k³ r/ε⁴) space, that 1+ε approximates Max Unique Coverage.
In contrast to the above results, when d and r are arbitrary, any constant-pass 1+ε approximation algorithm for either problem requires Ω(ε^{-2}m) space, but a single-pass O(ε^{-2}mk) space algorithm exists. In fact, any constant-pass algorithm with an approximation better than e/(e-1) and e^{1-1/k} for Max Coverage and Max Unique Coverage, respectively, requires Ω(m/k²) space when d and r are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set Cover problem.
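
The two objectives defined in the abstract above differ only in whether an element must be covered by at least one chosen set or by exactly one. The Python sketch below spells out both objectives and an exhaustive offline search over collections of at most k sets. It is an illustration of the problem definitions only, not one of the paper's streaming algorithms: it stores the whole input and takes exponential time. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coverage(chosen):
    """Number of elements covered by at least one of the chosen sets."""
    return len(set().union(*chosen))


def unique_coverage(chosen):
    """Number of elements covered by exactly one of the chosen sets."""
    counts = Counter(x for s in chosen for x in s)
    return sum(1 for c in counts.values() if c == 1)


def brute_force_best(sets, k, objective):
    """Try every collection of at most k sets; return (best value, chosen indices)."""
    best_value, best_choice = 0, ()
    for size in range(1, min(k, len(sets)) + 1):
        for choice in combinations(range(len(sets)), size):
            value = objective([sets[i] for i in choice])
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice
```

On the input `[{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2}]` with k = 2, the first two sets together cover six elements but only four of them uniquely, so the two objectives can prefer different collections.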

Cite as

Andrew McGregor, David Tench, and Hoa T. Vu. Maximum Coverage in the Data Stream Model: Parameterized and Generalized. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{mcgregor_et_al:LIPIcs.ICDT.2021.12,
  author =	{McGregor, Andrew and Tench, David and Vu, Hoa T.},
  title =	{{Maximum Coverage in the Data Stream Model: Parameterized and Generalized}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.12},
  URN =		{urn:nbn:de:0030-drops-137208},
  doi =		{10.4230/LIPIcs.ICDT.2021.12},
  annote =	{Keywords: Data streams, maximum coverage, maximum unique coverage, set cover}
}
Document
Diverse Data Selection under Fairness Constraints

Authors: Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou


Abstract
Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe 𝒰 of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified number k_i of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.
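
Illustrative note (not part of the paper): the fair Max-Min objective described above can be stated as a small exhaustive search. This is only a sketch of the problem definition under the assumption of one-dimensional points with absolute difference as the metric; it is not one of the paper's approximation algorithms, and all names are hypothetical.

```python
from itertools import combinations, product

def min_pairwise_distance(points):
    """Diversity of a subset: the smallest distance between any two members."""
    return min(abs(p - q) for p, q in combinations(points, 2))

def fair_max_min(groups, quotas):
    """Pick exactly quotas[g] points from each group g, maximizing diversity.

    Exhaustive search over all fair subsets; exponential in general.
    """
    best_value, best_subset = None, None
    per_group = [combinations(groups[g], quotas[g]) for g in groups]
    for choice in product(*per_group):
        subset = [p for part in choice for p in part]
        value = min_pairwise_distance(subset)
        if best_value is None or value > best_value:
            best_value, best_subset = value, subset
    return best_value, best_subset

# Hypothetical toy instance: two groups of points on a line.
groups = {"A": [0, 1, 10], "B": [2, 9]}
result = fair_max_min(groups, {"A": 1, "B": 1})
```

Without the fairness constraint the most diverse pair would be 0 and 10 (both from group A); requiring one element per group changes the optimum to 0 and 9.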

Cite as

Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse Data Selection under Fairness Constraints. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 13:1-13:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{moumoulidou_et_al:LIPIcs.ICDT.2021.13,
  author =	{Moumoulidou, Zafeiria and McGregor, Andrew and Meliou, Alexandra},
  title =	{{Diverse Data Selection under Fairness Constraints}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{13:1--13:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.13},
  URN =		{urn:nbn:de:0030-drops-137216},
  doi =		{10.4230/LIPIcs.ICDT.2021.13},
  annote =	{Keywords: data selection, diversity maximization, fairness constraints, approximation algorithms}
}
Document
Enumeration Algorithms for Conjunctive Queries with Projection

Authors: Shaleen Deep, Xiao Hu, and Paraschos Koutris


Abstract
We investigate the enumeration of query results for an important subset of CQs with projections, namely star and path queries. The task is to design data structures and algorithms that allow for efficient enumeration with delay guarantees after a preprocessing phase. Our main contribution is a series of results based on the idea of interleaving precomputed output with further join processing to maintain delay guarantees, which may be of independent interest. In particular, we design combinatorial algorithms that provide instance-specific delay guarantees in linear preprocessing time. These algorithms improve upon the currently best known results. Further, we show how existing results can be improved upon by using fast matrix multiplication. We also present new results involving the tradeoff between preprocessing time and delay guarantees for enumeration of path queries that contain projections. Evaluating CQs with projection where the join attribute is projected away is equivalent to Boolean matrix multiplication. Our results can therefore also be interpreted as sparse, output-sensitive matrix multiplication with delay guarantees.
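
Illustrative note (not part of the paper): the correspondence between a two-relation join with the join attribute projected away and Boolean matrix multiplication can be checked on a small instance. The sketch below assumes binary relations R(x, y) and S(y, z) stored as sets of pairs over small integer domains; the function names are hypothetical, and the code makes no attempt at delay guarantees.

```python
def project_join(R, S):
    """Distinct (x, z) such that R(x, y) and S(y, z) hold for some y."""
    by_y = {}
    for y, z in S:
        by_y.setdefault(y, set()).add(z)
    return {(x, z) for x, y in R for z in by_y.get(y, ())}

def boolean_matmul(A, B):
    """Boolean product: C[i][j] is true iff A[i][k] and B[k][j] for some k."""
    inner, cols = len(B), len(B[0])
    return [
        [any(A[i][k] and B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]

def true_entries(M):
    """Positions of the true entries of a Boolean matrix."""
    return {(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v}

# Hypothetical toy instance; A and B are the adjacency matrices of R and S.
R = {(0, 0), (1, 1)}
S = {(0, 1), (1, 0), (1, 1)}
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 1]]
joined = project_join(R, S)
```

The set of output tuples coincides with the true entries of the Boolean product of A and B.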

Cite as

Shaleen Deep, Xiao Hu, and Paraschos Koutris. Enumeration Algorithms for Conjunctive Queries with Projection. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{deep_et_al:LIPIcs.ICDT.2021.14,
  author =	{Deep, Shaleen and Hu, Xiao and Koutris, Paraschos},
  title =	{{Enumeration Algorithms for Conjunctive Queries with Projection}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.14},
  URN =		{urn:nbn:de:0030-drops-137229},
  doi =		{10.4230/LIPIcs.ICDT.2021.14},
  annote =	{Keywords: Query result enumeration, joins, ranking}
}
Document
The Shapley Value of Inconsistency Measures for Functional Dependencies

Authors: Ester Livshits and Benny Kimelfeld


Abstract
Quantifying the inconsistency of a database is motivated by various goals including reliability estimation for new datasets and progress indication in data cleaning. Another goal is to attribute to individual tuples a level of responsibility for the overall inconsistency, and thereby prioritize tuples in the explanation or inspection of dirt. Therefore, inconsistency quantification and attribution have been a subject of much research in Knowledge Representation and, more recently, in Databases. As in many other fields, a conventional responsibility sharing mechanism is the Shapley value from cooperative game theory. In this paper, we carry out a systematic investigation of the complexity of the Shapley value in common inconsistency measures for functional-dependency (FD) violations. For several measures we establish a full classification of the FD sets into tractable and intractable classes with respect to Shapley-value computation. We also study the complexity of approximation in intractable cases.

Cite as

Ester Livshits and Benny Kimelfeld. The Shapley Value of Inconsistency Measures for Functional Dependencies. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{livshits_et_al:LIPIcs.ICDT.2021.15,
  author =	{Livshits, Ester and Kimelfeld, Benny},
  title =	{{The Shapley Value of Inconsistency Measures for Functional Dependencies}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{15:1--15:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.15},
  URN =		{urn:nbn:de:0030-drops-137230},
  doi =		{10.4230/LIPIcs.ICDT.2021.15},
  annote =	{Keywords: Shapley value, inconsistent databases, functional dependencies, database repairs}
}
Document
Database Repairing with Soft Functional Dependencies

Authors: Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, and Muhammad Tibi


Abstract
A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a "cardinality repair" of an inconsistent database; in soft interpretations, this subset corresponds to a "most probable world" of a probabilistic database, a "most likely intention" of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of finding an optimal subset for the more general soft semantics. This paper makes significant progress in this direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases: a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal "almost matching" of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy.

Cite as

Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, and Muhammad Tibi. Database Repairing with Soft Functional Dependencies. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{carmeli_et_al:LIPIcs.ICDT.2021.16,
  author =	{Carmeli, Nofar and Grohe, Martin and Kimelfeld, Benny and Livshits, Ester and Tibi, Muhammad},
  title =	{{Database Repairing with Soft Functional Dependencies}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.16},
  URN =		{urn:nbn:de:0030-drops-137245},
  doi =		{10.4230/LIPIcs.ICDT.2021.16},
  annote =	{Keywords: Database inconsistency, database repairs, integrity constraints, soft constraints, functional dependencies}
}
Document
Uniform Reliability of Self-Join-Free Conjunctive Queries

Authors: Antoine Amarilli and Benny Kimelfeld


Abstract
The reliability of a Boolean Conjunctive Query (CQ) over a tuple-independent probabilistic database is the probability that the CQ is satisfied when the tuples of the database are sampled one by one, independently, with their associated probability. For queries without self-joins (repeated relation symbols), the data complexity of this problem is fully characterized in a known dichotomy: reliability can be computed in polynomial time for hierarchical queries, and is #P-hard for non-hierarchical queries. Hierarchical queries also characterize the tractability of queries for other tasks: having read-once lineage formulas, supporting insertion/deletion updates to the database in constant time, and having a tractable computation of tuples' Shapley and Banzhaf values. In this work, we investigate a fundamental counting problem for CQs without self-joins: how many sets of facts from the input database satisfy the query? This is equivalent to the uniform case of the query reliability problem, where the probability of every tuple is required to be 1/2. Of course, for hierarchical queries, uniform reliability is in polynomial time, like the reliability problem. However, it is an open question whether being hierarchical is necessary for the uniform reliability problem to be in polynomial time. In fact, the complexity of the problem has been unknown even for the simplest non-hierarchical CQs without self-joins. We solve this open question by showing that uniform reliability is #P-complete for every non-hierarchical CQ without self-joins. Hence, we establish that being hierarchical also characterizes the tractability of unweighted counting of the satisfying tuple subsets. We also consider the generalization to query reliability where all tuples of the same relation have the same probability, and give preliminary results on the complexity of this problem.

Cite as

Antoine Amarilli and Benny Kimelfeld. Uniform Reliability of Self-Join-Free Conjunctive Queries. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{amarilli_et_al:LIPIcs.ICDT.2021.17,
  author =	{Amarilli, Antoine and Kimelfeld, Benny},
  title =	{{Uniform Reliability of Self-Join-Free Conjunctive Queries}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.17},
  URN =		{urn:nbn:de:0030-drops-137252},
  doi =		{10.4230/LIPIcs.ICDT.2021.17},
  annote =	{Keywords: Hierarchical conjunctive queries, query reliability, tuple-independent database, counting problems, #P-hardness}
}
Document
Efficient Differentially Private F₀ Linear Sketching

Authors: Rasmus Pagh and Nina Mesing Stausholm


Abstract
A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work we consider how to construct sketches for weighted F₀, i.e., the summed weights of the elements in the data set, that are small, differentially private, and computationally efficient. Let a weight vector w ∈ (0,1]^u be given. For x ∈ {0,1}^u we are interested in estimating ||x∘w||₁ where ∘ is the Hadamard product (entrywise product). Building on a technique of Kushilevitz et al. (STOC 1998), we introduce a sketch (depending on w) that is linear over GF(2), mapping a vector x ∈ {0,1}^u to Hx ∈ {0,1}^τ for a matrix H sampled from a suitable distribution ℋ. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ∈ {0,1}^τ where Pr[(φ)_j = 1] = p independently for each entry j, we consider the noisy sketch Hx + φ, where the addition of noise happens over GF(2). We show that for every choice of 0 < β < 1 and ε = O(1) there exists p < 1/2 and a distribution ℋ of linear sketches of size τ = O(log²(u)ε^{-2}β^{-2}) such that: 1) For random H∼ℋ and noise vector φ, given Hx + φ we can compute an estimate of ||x∘w||₁ that is accurate within a factor 1±β, plus additive error O(log(u)ε^{-2}β^{-2}), with probability 1-u^{-1}, and 2) For every H∼ℋ, Hx + φ is ε-differentially private over the randomness in φ. The special case w = (1,…,1) is unweighted F₀. Previously, Mir et al. (PODS 2011) and Kenthapadi et al. (J. Priv. Confidentiality 2013) had described a differentially private way of sketching unweighted F₀, but the algorithms for calibrating noise to their sketches are not computationally efficient, either using quasipolynomial time in the sketch size or superlinear time in the universe size u.
For fixed ε the size of our sketch is polynomially related to the lower bound of Ω(log(u)β^{-2}) bits by Jayram & Woodruff (Trans. Algorithms 2013). The additive error is comparable to the bound of Ω(1/ε) of Hardt & Talwar (STOC 2010). An application of our sketch is that two sketches can be added to form a noisy sketch of the form H(x₁+x₂) + (φ₁+φ₂), which allows us to estimate ||(x₁+x₂)∘w||₁. Since addition is over GF(2), this is the weight of the symmetric difference of the vectors x₁ and x₂. Recent work has shown how to privately and efficiently compute an estimate for the symmetric difference size of two sets using (non-linear) sketches such as FM-sketches and Bloom Filters, but these methods have an error bound no better than O(√m̄), where m̄ is an upper bound on ||x₁||₀ and ||x₂||₀. This improves previous work when β = o(1/√m̄) and log(u)/ε = m̄^{o(1)}. In conclusion, our results both improve the efficiency of existing methods for unweighted F₀ estimation and extend to a weighted generalization. We also give a distributed streaming implementation for estimating the size of the union between two input streams.
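
Illustrative note (not part of the paper): the additivity property mentioned above, that two noisy sketches add up over GF(2) to a noisy sketch of the symmetric difference, follows from linearity and can be checked directly. In the sketch below, H is drawn uniformly at random purely as an assumption for illustration (the paper samples H from a specific distribution that depends on w), the parameter values are arbitrary, and no estimator for ||x∘w||₁ is implemented.

```python
import random

def sketch(H, x):
    """Compute Hx over GF(2): each output bit is a parity of selected bits of x."""
    return [sum(h & b for h, b in zip(row, x)) % 2 for row in H]

def noise(tau, p, rng):
    """Randomized-response noise: each bit is 1 independently with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(tau)]

def xor(a, b):
    """Entrywise addition over GF(2)."""
    return [s ^ t for s, t in zip(a, b)]

rng = random.Random(0)
u, tau, p = 16, 8, 0.25  # arbitrary illustrative sizes and flip probability

# Assumption: uniform random H, standing in for the paper's distribution.
H = [[rng.randint(0, 1) for _ in range(u)] for _ in range(tau)]
x1 = [rng.randint(0, 1) for _ in range(u)]
x2 = [rng.randint(0, 1) for _ in range(u)]
phi1, phi2 = noise(tau, p, rng), noise(tau, p, rng)

# Adding the two noisy sketches ...
combined = xor(xor(sketch(H, x1), phi1), xor(sketch(H, x2), phi2))
# ... gives the sketch of the symmetric difference with the combined noise.
expected = xor(sketch(H, xor(x1, x2)), xor(phi1, phi2))
```

The identity holds for any H, x₁, x₂, φ₁, φ₂, since every operation is linear over GF(2).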

Cite as

Rasmus Pagh and Nina Mesing Stausholm. Efficient Differentially Private F₀ Linear Sketching. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{pagh_et_al:LIPIcs.ICDT.2021.18,
  author =	{Pagh, Rasmus and Stausholm, Nina Mesing},
  title =	{{Efficient Differentially Private F₀ Linear Sketching}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.18},
  URN =		{urn:nbn:de:0030-drops-137264},
  doi =		{10.4230/LIPIcs.ICDT.2021.18},
  annote =	{Keywords: Differential Privacy, Linear Sketches, Weighted F0 Estimation}
}
Document
Fine-Grained Complexity of Regular Path Queries

Authors: Katrin Casel and Markus L. Schmid


Abstract
A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called the PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near-optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
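
Illustrative note (not part of the paper): the PG-approach described above can be sketched as a breadth-first search over pairs (graph node, NFA state), which explores the product graph implicitly. The representation below is an assumption made for illustration: edges are (source, label, target) triples, the NFA has no ε-transitions and is given as a dictionary from (state, label) to a set of successor states, and all names are hypothetical.

```python
from collections import deque

def evaluate_rpq(nodes, edges, nfa_delta, start, accepting):
    """Return all (u, v) connected by a path whose label word the NFA accepts."""
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))

    answers = set()
    for source in nodes:
        # Search the product graph from (source, start state).
        seen = {(source, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((source, node))
            for label, nxt in out.get(node, []):
                for state2 in nfa_delta.get((state, label), ()):
                    if (nxt, state2) not in seen:
                        seen.add((nxt, state2))
                        queue.append((nxt, state2))
    return answers

# Hypothetical toy instance: query a b*, as an NFA with states 0 (start) and 1.
nfa_delta = {(0, "a"): {1}, (1, "b"): {1}}
nodes = [1, 2, 3, 4]
edges = [(1, "a", 2), (2, "b", 3), (3, "b", 4), (3, "a", 1)]
pairs = evaluate_rpq(nodes, edges, nfa_delta, start=0, accepting={1})
```

This variant runs one search per source node and collects all answers at once; it says nothing about the enumeration delay that the paper analyses.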

Cite as

Katrin Casel and Markus L. Schmid. Fine-Grained Complexity of Regular Path Queries. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{casel_et_al:LIPIcs.ICDT.2021.19,
  author =	{Casel, Katrin and Schmid, Markus L.},
  title =	{{Fine-Grained Complexity of Regular Path Queries}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.19},
  URN =		{urn:nbn:de:0030-drops-137277},
  doi =		{10.4230/LIPIcs.ICDT.2021.19},
  annote =	{Keywords: Graph Databases, Regular Path Queries, Enumeration, Fine-Grained Complexity}
}
Document
Ranked Enumeration of MSO Logic on Words

Authors: Pierre Bourhis, Alejandro Grez, Louis Jachiet, and Cristian Riveros


Abstract
In recent years, enumeration algorithms with bounded delay have attracted a lot of attention for several data management tasks. Given a query and the data, the task is to preprocess the data and then enumerate all the answers to the query one by one and without repetitions. This enumeration scheme is typically useful when the solutions are treated on the fly or when we want to stop the enumeration once the pertinent solutions have been found. However, with the current schemes, there is no restriction on the order in which the solutions are given, and this order usually depends on the techniques used and not on the relevance for the user. In this paper we study the enumeration of monadic second order logic (MSO) over words when the solutions are ranked. We present a framework based on MSO cost functions that allows one to express MSO formulae on words with a cost associated with each solution. We then demonstrate the generality of our framework which subsumes, for instance, document spanners and adds ranking to them. The main technical result of the paper is an algorithm for enumerating all the solutions of formulae in increasing order of cost efficiently, namely, with a linear preprocessing phase and logarithmic delay between solutions. The novelty of this algorithm is based on using functional data structures, in particular, by extending functional Brodal queues to suit the ranked enumeration of MSO on words.

Cite as

Pierre Bourhis, Alejandro Grez, Louis Jachiet, and Cristian Riveros. Ranked Enumeration of MSO Logic on Words. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{bourhis_et_al:LIPIcs.ICDT.2021.20,
  author =	{Bourhis, Pierre and Grez, Alejandro and Jachiet, Louis and Riveros, Cristian},
  title =	{{Ranked Enumeration of MSO Logic on Words}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.20},
  URN =		{urn:nbn:de:0030-drops-137286},
  doi =		{10.4230/LIPIcs.ICDT.2021.20},
  annote =	{Keywords: Persistent data structures, Query evaluation, Enumeration algorithms}
}
Document
Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing

Authors: Samuel McCauley


Abstract
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d, to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q. Previous approaches to this problem either rely on very large (superconstant) approximation ratios c, or very small search radii r. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all n strings. In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d 3^r n^{1/c}). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by 24^r. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.

Cite as

Samuel McCauley. Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 21:1-21:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{mccauley:LIPIcs.ICDT.2021.21,
  author =	{McCauley, Samuel},
  title =	{{Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{21:1--21:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.21},
  URN =		{urn:nbn:de:0030-drops-137299},
  doi =		{10.4230/LIPIcs.ICDT.2021.21},
  annote =	{Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing}
}
Document
Locality-Aware Distribution Schemes

Authors: Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey Naughton


Abstract
One of the bottlenecks in parallel query processing is the cost of shuffling data across nodes in a cluster. Ideally, given a distribution of the data across the nodes and a query, we want to execute the query by performing only local computation and no communication: in this case, the query is called parallel-correct with respect to the data distribution. Previous work studied this problem for Conjunctive Queries in the case where the distribution scheme is oblivious, i.e., the location of each tuple depends only on the tuple and is independent of the instance. In this work, we show that oblivious schemes have a fundamental theoretical limitation, and initiate the formal study of distribution schemes that are locality-aware. In particular, we focus on a class of distribution schemes called co-hash distribution schemes, which are widely used in parallel systems. In co-hash partitioning, some tables are initially hashed, and the remaining tables are co-located so that a join condition is always satisfied. Given a co-hash distribution scheme, we formally study the complexity of deciding various desirable properties, including obliviousness and redundancy. Then, for a given Conjunctive Query and co-hash scheme, we determine the computational complexity of deciding whether the query is parallel-correct. We also explore a stronger notion of correctness, called parallel disjoint correctness, which guarantees that the query result will be disjointly partitioned across nodes, i.e., there is no duplication of results.

Cite as

Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey Naughton. Locality-Aware Distribution Schemes. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 22:1-22:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{sundarmurthy_et_al:LIPIcs.ICDT.2021.22,
  author =	{Sundarmurthy, Bruhathi and Koutris, Paraschos and Naughton, Jeffrey},
  title =	{{Locality-Aware Distribution Schemes}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{22:1--22:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.22},
  URN =		{urn:nbn:de:0030-drops-137302},
  doi =		{10.4230/LIPIcs.ICDT.2021.22},
  annote =	{Keywords: partitioning, parallel correctness, join queries}
}
