19 Search Results for "Suciu, Dan"


Document
Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability

Authors: Antoine Amarilli, Timothy van Bremen, and Kuldeep S. Meel

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Query evaluation over probabilistic databases is a notoriously intractable problem - not only in combined complexity, but for many natural queries in data complexity as well [Antoine Amarilli et al., 2017; Nilesh N. Dalvi and Dan Suciu, 2012]. This motivates the study of probabilistic query evaluation through the lens of approximation algorithms, and particularly of combined FPRASes, whose runtime is polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, which can be equivalently viewed as probabilistic graphs. We study in which cases we can devise combined FPRASes for probabilistic query evaluation in this setting. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability and (conditional) inapproximability results. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of [Marcelo Arenas et al., 2021] on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs: this was an open problem until now [Rico Zenklusen and Marco Laumanns, 2011]. We also show that one cannot extend a recent result [Timothy van Bremen and Kuldeep S. Meel, 2023] that gives a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. Finally, we complement all our inapproximability results with unconditional lower bounds, showing that DNNF provenance circuits must have at least moderately exponential size in combined complexity.

Cite as

Antoine Amarilli, Timothy van Bremen, and Kuldeep S. Meel. Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 15:1-15:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.15,
  author =	{Amarilli, Antoine and van Bremen, Timothy and Meel, Kuldeep S.},
  title =	{{Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{15:1--15:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.15},
  URN =		{urn:nbn:de:0030-drops-197978},
  doi =		{10.4230/LIPIcs.ICDT.2024.15},
  annote =	{Keywords: Probabilistic query evaluation, tuple-independent databases, approximation}
}
Document
Degree Sequence Bound for Join Cardinality Estimation

Authors: Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)


Abstract
Recent work has demonstrated the catastrophic effects of poor cardinality estimates on query processing time. In particular, underestimating query cardinality can result in overly optimistic query plans which take orders of magnitude longer to complete than one generated with the true cardinality. Cardinality bounding avoids this pitfall by computing an upper bound on the query’s output size using statistics about the database such as table sizes and degrees, i.e. value frequencies. In this paper, we extend this line of work by proving a novel bound called the Degree Sequence Bound which takes into account the full degree sequences and the max tuple multiplicity. This work focuses on the important class of Berge-Acyclic queries for which the Degree Sequence Bound is tight. Further, we describe how to practically compute this bound using a functional approximation of the true degree sequences and prove that even this functional form improves upon previous bounds.

Cite as

Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai. Degree Sequence Bound for Join Cardinality Estimation. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{deeds_et_al:LIPIcs.ICDT.2023.8,
  author =	{Deeds, Kyle and Suciu, Dan and Balazinska, Magda and Cai, Walter},
  title =	{{Degree Sequence Bound for Join Cardinality Estimation}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.8},
  URN =		{urn:nbn:de:0030-drops-177508},
  doi =		{10.4230/LIPIcs.ICDT.2023.8},
  annote =	{Keywords: Cardinality Estimation, Cardinality Bounding, Degree Bounds, Functional Approximation, Query Planning, Berge-Acyclic Queries}
}
Document
Algorithmic Aspects of Information Theory (Dagstuhl Seminar 22301)

Authors: Phokion G. Kolaitis, Andrej E. Romashchenko, Milan Studený, Dan Suciu, and Tobias A. Boege

Published in: Dagstuhl Reports, Volume 12, Issue 7 (2023)


Abstract
This report documents the program and the outcomes of Dagstuhl Seminar 22301 "Algorithmic Aspects of Information Theory". Constraints on entropies constitute the "laws of information theory". These constraints go well beyond Shannon’s basic information inequalities, as they include not only information inequalities that cannot be derived from Shannon’s basic inequalities, but also conditional inequalities and disjunctive inequalities that are valid for all entropic functions. There is an extensive body of research on constraints on entropies and their applications to different areas of mathematics and computer science. So far, however, little progress has been made on the algorithmic aspects of information theory. In fact, even fundamental questions about the decidability of information inequalities and their variants have remained open to date. Recently, research in different applications has demonstrated a clear need for algorithmic solutions to questions in information theory. These applications include: finding tight upper bounds on the answer to a query on a relational database, the homomorphism domination problem and its uses in query optimization, the conditional independence implication problem, soft constraints in databases, group-theoretic inequalities, and lower bounds on the information ratio in secret sharing. Thus far, the information-theory community has had little interaction with the communities where these applications have been studied or with the computational complexity community. The main goal of this Dagstuhl Seminar was to bring together researchers from the aforementioned communities and to develop an agenda for studying algorithmic aspects of information theory, motivated from a rich set of diverse applications. By using the algorithmic lens to examine the common problems and by transferring techniques from one community to the other, we expected that bridges would be created and some tangible progress on open questions could be made.

Cite as

Phokion G. Kolaitis, Andrej E. Romashchenko, Milan Studený, Dan Suciu, and Tobias A. Boege. Algorithmic Aspects of Information Theory (Dagstuhl Seminar 22301). In Dagstuhl Reports, Volume 12, Issue 7, pp. 180-204, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@Article{kolaitis_et_al:DagRep.12.7.180,
  author =	{Kolaitis, Phokion G. and Romashchenko, Andrej E. and Studen\'{y}, Milan and Suciu, Dan and Boege, Tobias A.},
  title =	{{Algorithmic Aspects of Information Theory (Dagstuhl Seminar 22301)}},
  pages =	{180--204},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2023},
  volume =	{12},
  number =	{7},
  editor =	{Kolaitis, Phokion G. and Romashchenko, Andrej E. and Studen\'{y}, Milan and Suciu, Dan and Boege, Tobias A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.12.7.180},
  URN =		{urn:nbn:de:0030-drops-176155},
  doi =		{10.4230/DagRep.12.7.180},
  annote =	{Keywords: Information theory, Information inequalities, Conditional independence structures, Database query evaluation and containment, Decision problems}
}
Document
Invited Talk
On an Information Theoretic Approach to Cardinality Estimation (Invited Talk)

Authors: Hung Q. Ngo

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)


Abstract
This article is a companion to an invited talk at ICDT'2022 with the same title. Cardinality estimation is among the most important problems in query optimization. It is well-documented that, when query plans go haywire, in most cases one can trace the root cause to the cardinality estimator being far off. In particular, traditional cardinality estimation based on selectivity estimation may sometimes under-estimate cardinalities by orders of magnitudes, because the independence or the uniformity assumptions do not typically hold. This talk outlines an approach to cardinality estimation that is "model-free" from a statistical stand-point. Being model-free means the approach tries to avoid making any distributional assumptions. Our approach is information-theoretic, and generalizes recent results on worst-case output size bounds of queries, allowing the estimator to take into account histogram information from the input relations. The estimator turns out to be the objective of a maximization problem subject to concave constraints, over an exponential number of variables. We then explain how the estimator can be computed in polynomial time for some fragment of these constraints. Overall, the talk introduces a new direction to address the classic problem of cardinality estimation that is designed to circumvent some of the pitfalls of selectivity-based estimation. We will also explain connections to other fundamental problems in information theory and database theory regarding information inequalities. The talk is based on (published and unpublished) joint works with Mahmoud Abo Khamis, Sungjin Im, Hossein Keshavarz, Phokion Kolaitis, Ben Moseley, XuanLong Nguyen, Kirk Pruhs, Dan Suciu, and Alireza Samadian Zakaria

Cite as

Hung Q. Ngo. On an Information Theoretic Approach to Cardinality Estimation (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1:1-1:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{ngo:LIPIcs.ICDT.2022.1,
  author =	{Ngo, Hung Q.},
  title =	{{On an Information Theoretic Approach to Cardinality Estimation}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1:1--1:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.1},
  URN =		{urn:nbn:de:0030-drops-158750},
  doi =		{10.4230/LIPIcs.ICDT.2022.1},
  annote =	{Keywords: Cardinality Estimation, Information Theory, Polymatroid Bound, Worst-case Optimal Join}
}
Document
Uniform Reliability of Self-Join-Free Conjunctive Queries

Authors: Antoine Amarilli and Benny Kimelfeld

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
The reliability of a Boolean Conjunctive Query (CQ) over a tuple-independent probabilistic database is the probability that the CQ is satisfied when the tuples of the database are sampled one by one, independently, with their associated probability. For queries without self-joins (repeated relation symbols), the data complexity of this problem is fully characterized in a known dichotomy: reliability can be computed in polynomial time for hierarchical queries, and is #P-hard for non-hierarchical queries. Hierarchical queries also characterize the tractability of queries for other tasks: having read-once lineage formulas, supporting insertion/deletion updates to the database in constant time, and having a tractable computation of tuples' Shapley and Banzhaf values. In this work, we investigate a fundamental counting problem for CQs without self-joins: how many sets of facts from the input database satisfy the query? This is equivalent to the uniform case of the query reliability problem, where the probability of every tuple is required to be 1/2. Of course, for hierarchical queries, uniform reliability is in polynomial time, like the reliability problem. However, it is an open question whether being hierarchical is necessary for the uniform reliability problem to be in polynomial time. In fact, the complexity of the problem has been unknown even for the simplest non-hierarchical CQs without self-joins. We solve this open question by showing that uniform reliability is #P-complete for every non-hierarchical CQ without self-joins. Hence, we establish that being hierarchical also characterizes the tractability of unweighted counting of the satisfying tuple subsets. We also consider the generalization to query reliability where all tuples of the same relation have the same probability, and give preliminary results on the complexity of this problem.

Cite as

Antoine Amarilli and Benny Kimelfeld. Uniform Reliability of Self-Join-Free Conjunctive Queries. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2021.17,
  author =	{Amarilli, Antoine and Kimelfeld, Benny},
  title =	{{Uniform Reliability of Self-Join-Free Conjunctive Queries}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.17},
  URN =		{urn:nbn:de:0030-drops-137252},
  doi =		{10.4230/LIPIcs.ICDT.2021.17},
  annote =	{Keywords: Hierarchical conjunctive queries, query reliability, tuple-independent database, counting problems, #P-hardness}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
Decision Problems in Information Theory

Authors: Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu

Published in: LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)


Abstract
Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy. We establish the following results on checking the validity over all almost-entropic functions: first, validity of a Boolean information constraint arising from a monotone Boolean formula is co-recursively enumerable; second, validity of "tight" conditional information constraints is in Π⁰₃. Furthermore, under some restrictions, validity of conditional information constraints "with slack" is in Σ⁰₂, and validity of information inequality constraints involving max is Turing equivalent to validity of information inequality constraints (with no max involved). We also prove that the classical implication problem for conditional independence statements is co-recursively enumerable.

Cite as

Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu. Decision Problems in Information Theory. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 106:1-106:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{abokhamis_et_al:LIPIcs.ICALP.2020.106,
  author =	{Abo Khamis, Mahmoud and Kolaitis, Phokion G. and Ngo, Hung Q. and Suciu, Dan},
  title =	{{Decision Problems in Information Theory}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{106:1--106:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.106},
  URN =		{urn:nbn:de:0030-drops-125137},
  doi =		{10.4230/LIPIcs.ICALP.2020.106},
  annote =	{Keywords: Information theory, decision problems, arithmetical hierarchy, entropic functions}
}
Document
Integrity Constraints Revisited: From Exact to Approximate Implication

Authors: Batya Kenig and Dan Suciu

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)


Abstract
Integrity constraints such as functional dependencies (FD), and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and the implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.

Cite as

Batya Kenig and Dan Suciu. Integrity Constraints Revisited: From Exact to Approximate Implication. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{kenig_et_al:LIPIcs.ICDT.2020.18,
  author =	{Kenig, Batya and Suciu, Dan},
  title =	{{Integrity Constraints Revisited: From Exact to Approximate Implication}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.18},
  URN =		{urn:nbn:de:0030-drops-119420},
  doi =		{10.4230/LIPIcs.ICDT.2020.18},
  annote =	{Keywords: Integrity constraints, The implication problem}
}
Document
Boolean Tensor Decomposition for Conjunctive Queries with Negation

Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
We propose an approach for answering conjunctive queries with negation, where the negated relations have bounded degree. Its data complexity matches that of the InsideOut and PANDA algorithms for the positive subquery of the input query and is expressed in terms of the fractional hypertree width and the submodular width respectively. Its query complexity depends on the structure of the conjunction of negated relations; in general it is exponential in the number of join variables occurring in negated relations yet it becomes polynomial for several classes of queries. This approach relies on several contributions. We show how to rewrite queries with negation on bounded-degree relations into equivalent conjunctive queries with not-all-equal (NAE) predicates, which are a multi-dimensional analog of disequality (!=). We then generalize the known color-coding technique to conjunctions of NAE predicates and explain it via a Boolean tensor decomposition of conjunctions of NAE predicates. This decomposition can be achieved via a probabilistic construction that can be derandomized efficiently.

Cite as

Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu. Boolean Tensor Decomposition for Conjunctive Queries with Negation. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 21:1-21:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{abokhamis_et_al:LIPIcs.ICDT.2019.21,
  author =	{Abo Khamis, Mahmoud and Ngo, Hung Q. and Olteanu, Dan and Suciu, Dan},
  title =	{{Boolean Tensor Decomposition for Conjunctive Queries with Negation}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{21:1--21:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.21},
  URN =		{urn:nbn:de:0030-drops-103236},
  doi =		{10.4230/LIPIcs.ICDT.2019.21},
  annote =	{Keywords: color-coding, combined complexity, negation, query evaluation}
}
Document
Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)

Authors: Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi

Published in: Dagstuhl Manifestos, Volume 7, Issue 1 (2018)


Abstract
The area of Principles of Data Management (PDM) has made crucial contributions to the development of formal frameworks for understanding and managing data and knowledge. This work has involved a rich cross-fertilization between PDM and other disciplines in mathematics and computer science, including logic, complexity theory, and knowledge representation. We anticipate on-going expansion of PDM research as the technology and applications involving data management continue to grow and evolve. In particular, the lifecycle of Big Data Analytics raises a wealth of challenge areas that PDM can help with. In this report we identify some of the most important research directions where the PDM community has the potential to make significant contributions. This is done from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.

Cite as

Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi. Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151). In Dagstuhl Manifestos, Volume 7, Issue 1, pp. 1-29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@Article{abiteboul_et_al:DagMan.7.1.1,
  author =	{Abiteboul, Serge and Arenas, Marcelo and Barcel\'{o}, Pablo and Bienvenu, Meghyn and Calvanese, Diego and David, Claire and Hull, Richard and H\"{u}llermeier, Eyke and Kimelfeld, Benny and Libkin, Leonid and Martens, Wim and Milo, Tova and Murlak, Filip and Neven, Frank and Ortiz, Magdalena and Schwentick, Thomas and Stoyanovich, Julia and Su, Jianwen and Suciu, Dan and Vianu, Victor and Yi, Ke},
  title =	{{Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)}},
  pages =	{1--29},
  journal =	{Dagstuhl Manifestos},
  ISSN =	{2193-2433},
  year =	{2018},
  volume =	{7},
  number =	{1},
  editor =	{Abiteboul, Serge and Arenas, Marcelo and Barcel\'{o}, Pablo and Bienvenu, Meghyn and Calvanese, Diego and David, Claire and Hull, Richard and H\"{u}llermeier, Eyke and Kimelfeld, Benny and Libkin, Leonid and Martens, Wim and Milo, Tova and Murlak, Filip and Neven, Frank and Ortiz, Magdalena and Schwentick, Thomas and Stoyanovich, Julia and Su, Jianwen and Suciu, Dan and Vianu, Victor and Yi, Ke},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagMan.7.1.1},
  URN =		{urn:nbn:de:0030-drops-86772},
  doi =		{10.4230/DagMan.7.1.1},
  annote =	{Keywords: database theory, principles of data management, query languages, efficient query processing, query optimization, heterogeneous data, uncertainty, knowledge-enriched data management, machine learning, workflows, human-related data, ethics}
}
Document
Recent Trends in Knowledge Compilation (Dagstuhl Seminar 17381)

Authors: Adnan Darwiche, Pierre Marquis, Dan Suciu, and Stefan Szeider

Published in: Dagstuhl Reports, Volume 7, Issue 9 (2018)


Abstract
Knowledge compilation (KC) is a research topic which aims to investigate the possibility of circumventing the computational intractability of hard tasks, by preprocessing part of the available information, common to a number of instances. Pioneered almost three decades ago, KC is nowadays a very active research field, transversal to several areas within computer science. Among others, KC intersects knowledge representation, constraint satisfaction, algorithms, complexity theory, machine learning, and databases. The results obtained so far take various forms, from theory (compilability settings, definition of target languages for KC, complexity results, succinctness results, etc.) to more practical results (development and evaluation of compilers and other preprocessors, applications to diagnosis, planning, automatic configuration, etc.). Recently, KC has been positioned as providing a systematic method for solving problems beyond NP, and also found applications in machine learning. The goal of this Dagstuhl Seminar was to advance both aspects of KC, and to pave the way for a fruitful cross-fertilization between the topics, from theory to practice. The program included a mixture of long and short presentations, with discussions. Several long talks with a tutorial flavor introduced the participants to the variety of aspects in knowledge compilation and the diversity of techniques used. System presentations as well as an open problem session were also included in the program.

Cite as

Adnan Darwiche, Pierre Marquis, Dan Suciu, and Stefan Szeider. Recent Trends in Knowledge Compilation (Dagstuhl Seminar 17381). In Dagstuhl Reports, Volume 7, Issue 9, pp. 62-85, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@Article{darwiche_et_al:DagRep.7.9.62,
  author =	{Darwiche, Adnan and Marquis, Pierre and Suciu, Dan and Szeider, Stefan},
  title =	{{Recent Trends in Knowledge Compilation (Dagstuhl Seminar 17381)}},
  pages =	{62--85},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2018},
  volume =	{7},
  number =	{9},
  editor =	{Darwiche, Adnan and Marquis, Pierre and Suciu, Dan and Szeider, Stefan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.7.9.62},
  URN =		{urn:nbn:de:0030-drops-85896},
  doi =		{10.4230/DagRep.7.9.62},
  annote =	{Keywords: Knowledge compilation, Constraints, Preprocessing, Probabilistic databases, Model counting}
}
Document
Worst-Case Optimal Algorithms for Parallel Query Processing

Authors: Paraschos Koutris, Paul Beame, and Dan Suciu

Published in: LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)


Abstract
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with p servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of (Ngo et al. 2012) for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query q when all relations have size equal to M is O(M/p^{1/psi^*}), where psi^* is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries.

Cite as

Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-Case Optimal Algorithms for Parallel Query Processing. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{koutris_et_al:LIPIcs.ICDT.2016.8,
  author =	{Koutris, Paraschos and Beame, Paul and Suciu, Dan},
  title =	{{Worst-Case Optimal Algorithms for Parallel Query Processing}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.8},
  URN =		{urn:nbn:de:0030-drops-57771},
  doi =		{10.4230/LIPIcs.ICDT.2016.8},
  annote =	{Keywords: conjunctive query, parallel computation, worst-case bounds}
}
Document
Answering Conjunctive Queries with Inequalities

Authors: Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)


Abstract
In this parer, we study the complexity of answering conjunctive queries (CQ) with inequalities. In particular, we compare the complexity of the query with and without inequalities. The main contribution of our work is a novel combinatorial technique that enables the use of any Select-Project-Join query plan for a given CQ without inequalities in answering the CQ with inequalities, with an additional factor in running time that only depends on the query. To achieve this, we define a new projection operator that keeps a small representation (independent of the size of the database) of the set of input tuples that map to each tuple in the output of the projection; this representation is used to evaluate all the inequalities in the query. Second, we generalize a result by Papadimitriou-Yannakakis [PODS'97] and give an alternative algorithm based on the color-coding technique [Alon, Yuster and Zwick, PODS'02] to evaluate a CQ with inequalities by using an algorithm for the CQ without inequalities. Third, we investigate the structure of the query graph, inequality graph, and the augmented query graph with inequalities, and show that even if the query and the inequality graphs have bounded treewidth, the augmented graph not only can have an unbounded treewidth but can also be NP-hard to evaluate. Further, we illustrate classes of queries and inequalities where the augmented graphs have unbounded treewidth, but the CQ with inequalities can be evaluated in poly-time. Finally, we give necessary properties and sufficient properties that allow a class of CQs to have poly-time combined complexity with respect to any inequality pattern.

Cite as

Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu. Answering Conjunctive Queries with Inequalities. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 76-93, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Copy BibTex To Clipboard

@InProceedings{koutris_et_al:LIPIcs.ICDT.2015.76,
  author =	{Koutris, Paraschos and Milo, Tova and Roy, Sudeepa and Suciu, Dan},
  title =	{{Answering Conjunctive Queries with Inequalities}},
  booktitle =	{18th International Conference on Database Theory (ICDT 2015)},
  pages =	{76--93},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-79-8},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{31},
  editor =	{Arenas, Marcelo and Ugarte, Mart{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2015.76},
  URN =		{urn:nbn:de:0030-drops-49781},
  doi =		{10.4230/LIPIcs.ICDT.2015.76},
  annote =	{Keywords: query evaluation, conjunctive query, inequality, treewidth}
}
Document
05061 Abstracts Collection – Foundations of Semistructured Data

Authors: Frank Neven, Thomas Schwentick, and Dan Suciu

Published in: Dagstuhl Seminar Proceedings, Volume 5061, Foundations of Semistructured Data (2005)


Abstract
From 06.02.05 to 11.02.05, the Dagstuhl Seminar 05061 ``Foundations of Semistructured Data'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Cite as

Frank Neven, Thomas Schwentick, and Dan Suciu. 05061 Abstracts Collection – Foundations of Semistructured Data. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)


Copy BibTex To Clipboard

@InProceedings{neven_et_al:DagSemProc.05061.1,
  author =	{Neven, Frank and Schwentick, Thomas and Suciu, Dan},
  title =	{{05061 Abstracts Collection – Foundations of Semistructured Data}},
  booktitle =	{Foundations of Semistructured Data},
  pages =	{1--13},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5061},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.05061.1},
  URN =		{urn:nbn:de:0030-drops-2330},
  doi =		{10.4230/DagSemProc.05061.1},
  annote =	{Keywords: Semistructured data, XML, database theory, document processing}
}
Document
05061 Summary – Foundations of Semi-structured Data

Authors: Frank Neven, Thomas Schwentick, and Dan Suciu

Published in: Dagstuhl Seminar Proceedings, Volume 5061, Foundations of Semistructured Data (2005)


Abstract
As in the first seminar on this topic, the aim o the workshop was to bring together people from the areas related to semi-structured data. However, besides the presentation of recent work, this time the main goal was to identify the main lines of a common framework for future foundational work on semi-structured data. These lines of research are summarized below. The workshop was of a very interdisciplinary nature with invitees from databases, structured documents, programming languages, information retrieval and formal language theory. Several of the lectures were presented by PhD students. We had four invited speakers and a panel on research evaluation. Due to strong connections between topics treated at this workshop, many of the participants initiated new cooperations and research projects.

Cite as

Frank Neven, Thomas Schwentick, and Dan Suciu. 05061 Summary – Foundations of Semi-structured Data. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)


Copy BibTex To Clipboard

@InProceedings{neven_et_al:DagSemProc.05061.2,
  author =	{Neven, Frank and Schwentick, Thomas and Suciu, Dan},
  title =	{{05061 Summary – Foundations of Semi-structured Data}},
  booktitle =	{Foundations of Semistructured Data},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5061},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.05061.2},
  URN =		{urn:nbn:de:0030-drops-2276},
  doi =		{10.4230/DagSemProc.05061.2},
  annote =	{Keywords: Report, summary}
}
Document
Deterministic Automata on Unranked Trees

Authors: Wolfgang Thomas, Julien Christau, and Christof Löding

Published in: Dagstuhl Seminar Proceedings, Volume 5061, Foundations of Semistructured Data (2005)


Abstract
We investigate bottom-up and top-down deterministic automata on unranked trees. We show that for an appropriate definition of bottom-up deterministic automata it is possible to minimize the number of states efficiently and to obtain a unique canonical representative of the accepted tree language. For top-down deterministic automata it is well known that they are less expressive than the non-deterministic ones. By generalizing a corresponding proof from the theory of ranked tree automata we show that it is decidable whether a given regular language of unranked trees can be recognized by a top-down deterministic automaton. The standard deterministic top-down model is slightly weaker than the model we use, where at each node the automaton can scan the sequence of the labels of its successors before deciding its next move.

Cite as

Wolfgang Thomas, Julien Christau, and Christof Löding. Deterministic Automata on Unranked Trees. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)


Copy BibTex To Clipboard

@InProceedings{thomas_et_al:DagSemProc.05061.3,
  author =	{Thomas, Wolfgang and Christau, Julien and L\"{o}ding, Christof},
  title =	{{Deterministic Automata on Unranked Trees}},
  booktitle =	{Foundations of Semistructured Data},
  pages =	{1--12},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5061},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.05061.3},
  URN =		{urn:nbn:de:0030-drops-2281},
  doi =		{10.4230/DagSemProc.05061.3},
  annote =	{Keywords: Automata, unranked trees, parikh automata}
}
  • Refine by Author
  • 12 Suciu, Dan
  • 4 Schwentick, Thomas
  • 3 Neven, Frank
  • 3 Ngo, Hung Q.
  • 2 Abo Khamis, Mahmoud
  • Show More...

  • Refine by Classification
  • 4 Theory of computation → Database query processing and optimization (theory)
  • 2 Information systems → Query optimization
  • 2 Information systems → Query planning
  • 2 Mathematics of computing → Information theory
  • 1 Information systems → Database design and models
  • Show More...

  • Refine by Keyword
  • 2 Cardinality Estimation
  • 2 Information theory
  • 2 conjunctive query
  • 2 database theory
  • 2 query evaluation
  • Show More...

  • Refine by Type
  • 19 document

  • Refine by Publication Year
  • 6 2005
  • 2 2018
  • 2 2020
  • 2 2023
  • 1 2002
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail