29 Search Results for "Wei, Zhewei"


Volume

LIPIcs, Volume 186

24th International Conference on Database Theory (ICDT 2021)

ICDT 2021, March 23-26, 2021, Nicosia, Cyprus

Editors: Ke Yi and Zhewei Wei

Document
Track A: Algorithms, Complexity and Games
On Range Summary Queries

Authors: Peyman Afshani, Pingan Cheng, Aniket Basu Roy, and Zhewei Wei

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)


Abstract
We study the query version of the approximate heavy hitter and quantile problems. In the former problem, the input is a parameter ε and a set P of n points in ℝ^d where each point is assigned a color from a set C, and the goal is to build a structure such that given any geometric range γ, we can efficiently find a list of approximate heavy hitters in γ∩P, i.e., colors that appear at least ε |γ∩P| times in γ∩P, as well as their frequencies with an additive error of ε |γ∩P|. In the latter problem, each point is assigned a weight from a totally ordered universe and the query must output a sequence S of 1+1/ε weights such that the i-th weight in S has approximate rank iε|γ∩P|, meaning, rank iε|γ∩P| up to an additive error of ε|γ∩P|. Previously, optimal results were only known in 1D [Wei and Yi, 2011] but a few sub-optimal methods were available in higher dimensions [Peyman Afshani and Zhewei Wei, 2017; Pankaj K. Agarwal et al., 2012]. We study the problems for two important classes of geometric ranges: 3D halfspace and 3D dominance queries. It is known that many other important queries can be reduced to these two, e.g., 1D interval stabbing or interval containment, 2D three-sided queries, 2D circular as well as 2D k-nearest neighbors queries. We consider the real RAM model of computation where integer registers of size w bits, w = Θ(log n), are also available. For dominance queries, we show optimal solutions for both heavy hitter and quantile problems: using linear space, we can answer both queries in time O(log n + 1/ε). Note that as the output size is 1/ε, after investing the initial O(log n) searching time, our structure takes on average O(1) time to find a heavy hitter or a quantile! For more general halfspace heavy hitter queries, the same optimal query time can be achieved by increasing the space by an extra log_w(1/ε) (resp. log log_w(1/ε)) factor in 3D (resp. 2D). By spending extra log^O(1)(1/ε) factors in both time and space, we can also support quantile queries. We remark that it is hopeless to achieve a similar query bound for dimensions 4 or higher unless significant advances are made in the data structure side of theory of geometric approximations.

Cite as

Peyman Afshani, Pingan Cheng, Aniket Basu Roy, and Zhewei Wei. On Range Summary Queries. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 7:1-7:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{afshani_et_al:LIPIcs.ICALP.2023.7,
  author =	{Afshani, Peyman and Cheng, Pingan and Basu Roy, Aniket and Wei, Zhewei},
  title =	{{On Range Summary Queries}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{7:1--7:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.7},
  URN =		{urn:nbn:de:0030-drops-180590},
  doi =		{10.4230/LIPIcs.ICALP.2023.7},
  annote =	{Keywords: Computational Geometry, Range Searching, Random Sampling, Geometric Approximation, Data Structures and Algorithms}
}
Document
Complete Volume
LIPIcs, Volume 186, ICDT 2021, Complete Volume

Authors: Ke Yi and Zhewei Wei

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
LIPIcs, Volume 186, ICDT 2021, Complete Volume

Cite as

Ke Yi and Zhewei Wei. LIPIcs, Volume 186, ICDT 2021, Complete Volume. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 1-438, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@Proceedings{yi_et_al:LIPIcs.ICDT.2021,
  title =	{{LIPIcs, Volume 186, ICDT 2021, Complete Volume}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{1--438},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021},
  URN =		{urn:nbn:de:0030-drops-137076},
  doi =		{10.4230/LIPIcs.ICDT.2021},
  annote =	{Keywords: LIPIcs, Volume 186, ICDT 2021, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ke Yi and Zhewei Wei

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

Ke Yi and Zhewei Wei. Front Matter, Table of Contents, Preface, Conference Organization. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 0:i-0:xvi, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{yi_et_al:LIPIcs.ICDT.2021.0,
  author =	{Yi, Ke and Wei, Zhewei},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.0},
  URN =		{urn:nbn:de:0030-drops-137086},
  doi =		{10.4230/LIPIcs.ICDT.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk)

Authors: Pablo Barceló

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over a ML model in order to obtain meaningful justifications for their results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management. This with the intention of suggesting new possibilities for cross-fertilization between the area and ML. The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley-value of the contribution of a tuple in the database to the final result can be computed in polynomial time. The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models.

Cite as

Pablo Barceló. Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk). In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, p. 1:1, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{barcelo:LIPIcs.ICDT.2021.1,
  author =	{Barcel\'{o}, Pablo},
  title =	{{Explainability Queries for ML Models and its Connections with Data Management Problems}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.1},
  URN =		{urn:nbn:de:0030-drops-137091},
  doi =		{10.4230/LIPIcs.ICDT.2021.1},
  annote =	{Keywords: ML models, Explainability, Shapley values, decision trees, OBDDs, deterministic and decomposable Boolean circuits}
}
Document
Invited Talk
Comparing Apples and Oranges: Fairness and Diversity in Ranking (Invited Talk)

Authors: Julia Stoyanovich

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based; it computes a score of each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the highest scoring k candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k. In the past few years, there has been much work on incorporating fairness and diversity requirements into algorithmic rankers, with contributions coming from the data management, algorithms, information retrieval, and recommender systems communities. In my talk I will offer a broad perspective that connects formalizations and algorithmic approaches across subfields, grounding them in a common narrative around the value frameworks that motivate specific fairness- and diversity-enhancing interventions. I will discuss some recent and ongoing work, and will outline future research directions where the data management community is well-positioned to make lasting impact, especially if we attack these problems with our rich theory-meets-systems toolkit.

Cite as

Julia Stoyanovich. Comparing Apples and Oranges: Fairness and Diversity in Ranking (Invited Talk). In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, p. 2:1, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{stoyanovich:LIPIcs.ICDT.2021.2,
  author =	{Stoyanovich, Julia},
  title =	{{Comparing Apples and Oranges: Fairness and Diversity in Ranking}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.2},
  URN =		{urn:nbn:de:0030-drops-137104},
  doi =		{10.4230/LIPIcs.ICDT.2021.2},
  annote =	{Keywords: fairness, diversity, ranking, set selection, responsible data management}
}
Document
Box Covers and Domain Orderings for Beyond Worst-Case Join Processing

Authors: Kaleb Alway, Eric Blais, and Semih Salihoglu

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Recent beyond worst-case optimal join algorithms Minesweeper and its generalization Tetris have brought the theory of indexing and join processing together by developing a geometric framework for joins. These algorithms take as input an index ℬ, referred to as a box cover, that stores output gaps that can be inferred from traditional indexes, such as B+ trees or tries, on the input relations. The performances of these algorithms highly depend on the certificate of ℬ, which is the smallest subset of gaps in ℬ whose union covers all of the gaps in the output space of a query Q. Different box covers can have different size certificates and the sizes of both the box covers and certificates highly depend on the ordering of the domain values of the attributes in Q. We study how to generate box covers that contain small size certificates to guarantee efficient runtimes for these algorithms. First, given a query Q over a set of relations of size N and a fixed set of domain orderings for the attributes, we give a Õ(N)-time algorithm called GAMB which generates a box cover for Q that is guaranteed to contain the smallest size certificate across any box cover for Q. Second, we show that finding a domain ordering to minimize the box cover size and certificate is NP-hard through a reduction from the 2 consecutive block minimization problem on boolean matrices. Our third contribution is a Õ(N)-time approximation algorithm called ADORA to compute domain orderings, under which one can compute a box cover of size Õ(K^r), where K is the minimum box cover for Q under any domain ordering and r is the maximum arity of any relation. This guarantees certificates of size Õ(K^r). We combine ADORA and GAMB with Tetris to form a new algorithm we call TetrisReordered, which provides several new beyond worst-case bounds. On infinite families of queries, TetrisReordered’s runtimes are unboundedly better than the bounds stated in prior work.

Cite as

Kaleb Alway, Eric Blais, and Semih Salihoglu. Box Covers and Domain Orderings for Beyond Worst-Case Join Processing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 3:1-3:23, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{alway_et_al:LIPIcs.ICDT.2021.3,
  author =	{Alway, Kaleb and Blais, Eric and Salihoglu, Semih},
  title =	{{Box Covers and Domain Orderings for Beyond Worst-Case Join Processing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{3:1--3:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.3},
  URN =		{urn:nbn:de:0030-drops-137114},
  doi =		{10.4230/LIPIcs.ICDT.2021.3},
  annote =	{Keywords: Beyond worst-case join algorithms, Tetris, Box covers, Domain orderings}
}
Document
A Purely Regular Approach to Non-Regular Core Spanners

Authors: Markus L. Schmid and Nicole Schweikardt

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
The regular spanners (characterised by vset-automata) are closed under the algebraic operations of union, join and projection, and have desirable algorithmic properties. The core spanners (introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015) as a formalisation of the core functionality of the query language AQL used in IBM’s SystemT) additionally need string equality selections and it has been shown by Freydenberger and Holldack (ICDT 2016, Theory of Computing Systems 2018) that this leads to high complexity and even undecidability of the typical problems in static analysis and query evaluation. We propose an alternative approach to core spanners: by incorporating the string-equality selections directly into the regular language that represents the underlying regular spanner (instead of treating it as an algebraic operation on the table extracted by the regular spanner), we obtain a fragment of core spanners that, while having slightly weaker expressive power than the full class of core spanners, arguably still covers the intuitive applications of string equality selections for information extraction and has much better upper complexity bounds of the typical problems in static analysis and query evaluation.

Cite as

Markus L. Schmid and Nicole Schweikardt. A Purely Regular Approach to Non-Regular Core Spanners. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 4:1-4:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{schmid_et_al:LIPIcs.ICDT.2021.4,
  author =	{Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Purely Regular Approach to Non-Regular Core Spanners}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{4:1--4:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.4},
  URN =		{urn:nbn:de:0030-drops-137124},
  doi =		{10.4230/LIPIcs.ICDT.2021.4},
  annote =	{Keywords: Document spanners, regular expressions with backreferences}
}
Document
Ranked Enumeration of Conjunctive Query Results

Authors: Shaleen Deep and Paraschos Koutris

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce the notions of decomposable and compatible (w.r.t. a query decomposition) ranking functions, which allow for partial aggregation of tuple scores in order to efficiently enumerate the output. We complement the algorithmic results with lower bounds that justify why restrictions on the structure of ranking functions are necessary. Our results extend and improve upon a long line of work that has studied ranked enumeration from both a theoretical and practical perspective.

Cite as

Shaleen Deep and Paraschos Koutris. Ranked Enumeration of Conjunctive Query Results. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 5:1-5:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{deep_et_al:LIPIcs.ICDT.2021.5,
  author =	{Deep, Shaleen and Koutris, Paraschos},
  title =	{{Ranked Enumeration of Conjunctive Query Results}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.5},
  URN =		{urn:nbn:de:0030-drops-137139},
  doi =		{10.4230/LIPIcs.ICDT.2021.5},
  annote =	{Keywords: Query result enumeration, joins, ranking}
}
Document
Towards Optimal Dynamic Indexes for Approximate (and Exact) Triangle Counting

Authors: Shangqi Lu and Yufei Tao

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
In ICDT'19, Kara, Ngo, Nikolic, Olteanu, and Zhang gave a structure which maintains the number T of triangles in an undirected graph G = (V, E) along with the edge insertions/deletions in G. Using O(m) space (m = |E|), their structure supports an update in O(√m log m) amortized time which is optimal (up to polylog factors) subject to the OMv-conjecture (Henzinger, Krinninger, Nanongkai, and Saranurak, STOC'15). Aiming to improve the update efficiency, we study: - the optimal tradeoff between update time and approximation quality. We require a structure to provide the (ε, Γ)-guarantee: when queried, it should return an estimate t of T that has relative error at most ε if T ≥ Γ, or an absolute error at most ε ⋅ Γ, otherwise. We prove that, under any ε ≤ 0.49 and subject to the OMv-conjecture, no structure can guarantee O(m^{0.5-δ}/Γ) expected amortized update time and O(m^{2/3-δ}) query time simultaneously for any constant δ > 0; this is true for Γ = m^c of any constant c in [0, 1/2). We match the lower bound with a structure that ensures Õ((1/ε)³ ⋅ √m/Γ) amortized update time with high probability, and O(1) query time. - (for exact counting) how to achieve arboricity-sensitive update time. For any 1 ≤ Γ ≤ √m, we describe a structure of O(min{α m + m log m, (m/Γ)²}) space that maintains T precisely, and supports an update in Õ(min{α + Γ, √m}) amortized time, where α is the largest arboricity of G in history (and does not need to be known). Our structure reconstructs the aforementioned ICDT'19 result up to polylog factors by setting Γ = √m, but achieves Õ(m^{0.5-δ}) update time as long as α = O(m^{0.5-δ}).

Cite as

Shangqi Lu and Yufei Tao. Towards Optimal Dynamic Indexes for Approximate (and Exact) Triangle Counting. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 6:1-6:23, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{lu_et_al:LIPIcs.ICDT.2021.6,
  author =	{Lu, Shangqi and Tao, Yufei},
  title =	{{Towards Optimal Dynamic Indexes for Approximate (and Exact) Triangle Counting}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{6:1--6:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.6},
  URN =		{urn:nbn:de:0030-drops-137146},
  doi =		{10.4230/LIPIcs.ICDT.2021.6},
  annote =	{Keywords: Triangle Counting, Data Structures, Lower Bounds, Graph Algorithms}
}
Document
Grammars for Document Spanners

Authors: Liat Peterfreund

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called {extraction grammars}, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.

Cite as

Liat Peterfreund. Grammars for Document Spanners. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 7:1-7:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{peterfreund:LIPIcs.ICDT.2021.7,
  author =	{Peterfreund, Liat},
  title =	{{Grammars for Document Spanners}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.7},
  URN =		{urn:nbn:de:0030-drops-137154},
  doi =		{10.4230/LIPIcs.ICDT.2021.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Context-Free Grammars, Constant-Delay Enumeration, Regular Expressions, Pushdown Automata}
}
Document
Input-Output Disjointness for Forward Expressions in the Logic of Information Flows

Authors: Heba Aamer and Jan Van den Bussche

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Last year we introduced the logic FLIF (forward logic of information flows) as a declarative language for specifying complex compositions of information sources with limited access patterns. The key insight of this approach is to view a system of information sources as a graph, where the nodes are valuations of variables, so that accesses to information sources can be modeled as edges in the graph. This allows the use of XPath-like navigational graph query languages. Indeed, a well-behaved fragment of FLIF, called io-disjoint FLIF, was shown to be equivalent to the executable fragment of first-order logic. It remained open, however, how io-disjoint FLIF compares to general FLIF . In this paper we close this gap by showing that general FLIF expressions can always be put into io-disjoint form.

Cite as

Heba Aamer and Jan Van den Bussche. Input-Output Disjointness for Forward Expressions in the Logic of Information Flows. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 8:1-8:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{aamer_et_al:LIPIcs.ICDT.2021.8,
  author =	{Aamer, Heba and Van den Bussche, Jan},
  title =	{{Input-Output Disjointness for Forward Expressions in the Logic of Information Flows}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.8},
  URN =		{urn:nbn:de:0030-drops-137167},
  doi =		{10.4230/LIPIcs.ICDT.2021.8},
  annote =	{Keywords: Composition, expressive power, variable substitution}
}
Document
Conjunctive Queries: Unique Characterizations and Exact Learnability

Authors: Balder ten Cate and Victor Dalmau

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples, and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts.

Cite as

Balder ten Cate and Victor Dalmau. Conjunctive Queries: Unique Characterizations and Exact Learnability. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 9:1-9:24, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{tencate_et_al:LIPIcs.ICDT.2021.9,
  author =	{ten Cate, Balder and Dalmau, Victor},
  title =	{{Conjunctive Queries: Unique Characterizations and Exact Learnability}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{9:1--9:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.9},
  URN =		{urn:nbn:de:0030-drops-137172},
  doi =		{10.4230/LIPIcs.ICDT.2021.9},
  annote =	{Keywords: Conjunctive Queries, Homomorphisms, Frontiers, Unique Characterizations, Exact Learnability, Schema Mappings, Description Logic}
}
Document
The Complexity of Aggregates over Extractions by Regular Expressions

Authors: Johannes Doleschal, Noa Bratman, Benny Kimelfeld, and Wim Martens

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Regular expressions with capture variables, also known as "regex-formulas", extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the regex formulas under the Relational Algebra. We investigate the computational complexity of querying text by aggregate functions, such as sum, average, and quantile, on top of regular document spanners. To this end, we formally define aggregate functions over regular document spanners and analyze the computational complexity of exact and approximate computation. More precisely, we show that in a restricted case, all studied aggregate functions can be computed in polynomial time. In general, however, even though exact computation is intractable, some aggregates can still be approximated with fully polynomial-time randomized approximation schemes (FPRAS).

Cite as

Johannes Doleschal, Noa Bratman, Benny Kimelfeld, and Wim Martens. The Complexity of Aggregates over Extractions by Regular Expressions. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 10:1-10:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{doleschal_et_al:LIPIcs.ICDT.2021.10,
  author =	{Doleschal, Johannes and Bratman, Noa and Kimelfeld, Benny and Martens, Wim},
  title =	{{The Complexity of Aggregates over Extractions by Regular Expressions}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.10},
  URN =		{urn:nbn:de:0030-drops-137181},
  doi =		{10.4230/LIPIcs.ICDT.2021.10},
  annote =	{Keywords: Information extraction, document spanners, regular expressions, aggregation functions}
}
Document
Answer Counting Under Guarded TGDs

Authors: Cristina Feier, Carsten Lutz, and Marcin Przybyłko

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We study the complexity of answer counting for ontology-mediated queries and for querying under constraints, considering conjunctive queries and unions thereof (UCQs) as the query language and guarded TGDs as the ontology and constraint language, respectively. Our main result is a classification according to whether answer counting is fixed-parameter tractable (FPT), W[1]-equivalent, #W[1]-equivalent, #W[2]-hard, or #A[2]-equivalent, lifting a recent classification for UCQs without ontologies and constraints due to Dell et al. [Holger Dell et al., 2019]. The classification pertains to various structural measures, namely treewidth, contract treewidth, starsize, and linked matching number. Our results rest on the assumption that the arity of relation symbols is bounded by a constant and, in the case of ontology-mediated querying, that all symbols from the ontology and query can occur in the data (so-called full data schema). We also study the meta-problems for the mentioned structural measures, that is, to decide whether a given ontology-mediated query or constraint-query specification is equivalent to one for which the structural measure is bounded.

Cite as

Cristina Feier, Carsten Lutz, and Marcin Przybyłko. Answer Counting Under Guarded TGDs. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 11:1-11:22, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{feier_et_al:LIPIcs.ICDT.2021.11,
  author =	{Feier, Cristina and Lutz, Carsten and Przyby{\l}ko, Marcin},
  title =	{{Answer Counting Under Guarded TGDs}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{11:1--11:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.11},
  URN =		{urn:nbn:de:0030-drops-137195},
  doi =		{10.4230/LIPIcs.ICDT.2021.11},
  annote =	{Keywords: Ontology-Mediated Querying, Querying under Constraints, Answer Counting, Parameterized Complexity}
}
  • Refine by Author
  • 4 Kimelfeld, Benny
  • 4 Wei, Zhewei
  • 3 Koutris, Paraschos
  • 2 Afshani, Peyman
  • 2 Deep, Shaleen
  • Show More...

  • Refine by Classification
  • 5 Theory of computation → Database theory
  • 4 Theory of computation → Database query languages (principles)
  • 4 Theory of computation → Database query processing and optimization (theory)
  • 4 Theory of computation → Incomplete, inconsistent, and uncertain databases
  • 3 Information systems → Data management systems
  • Show More...

  • Refine by Keyword
  • 3 ranking
  • 2 Query result enumeration
  • 2 database repairs
  • 2 functional dependencies
  • 2 joins
  • Show More...

  • Refine by Type
  • 28 document
  • 1 volume

  • Refine by Publication Year
  • 25 2021
  • 2 2015
  • 1 2017
  • 1 2023

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail