DROPS

Volume

LIPIcs, Volume 220

25th International Conference on Database Theory (ICDT 2022)

ICDT 2022, March 29 to April 1, 2022, Edinburgh, UK (Virtual Conference)

Editors: Dan Olteanu and Nils Vortmeier

Document

DOI: 10.4230/DagMan.10.1.1

Current and Future Challenges in Knowledge Representation and Reasoning (Dagstuhl Perspectives Workshop 22282)

Authors: James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, and Frank Wolter

Published in: Dagstuhl Manifestos, Volume 10, Issue 1 (2024)

Abstract

Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022,sser a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.

Cite as

James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, and Frank Wolter. Current and Future Challenges in Knowledge Representation and Reasoning (Dagstuhl Perspectives Workshop 22282). In Dagstuhl Manifestos, Volume 10, Issue 1, pp. 1-61, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Article{delgrande_et_al:DagMan.10.1.1,
  author =	{Delgrande, James P. and Glimm, Birte and Meyer, Thomas and Truszczynski, Miroslaw and Wolter, Frank},
  title =	{{Current and Future Challenges in Knowledge Representation and Reasoning (Dagstuhl Perspectives Workshop 22282)}},
  pages =	{1--61},
  journal =	{Dagstuhl Manifestos},
  ISSN =	{2193-2433},
  year =	{2024},
  volume =	{10},
  number =	{1},
  editor =	{Delgrande, James P. and Glimm, Birte and Meyer, Thomas and Truszczynski, Miroslaw and Wolter, Frank},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagMan.10.1.1},
  URN =		{urn:nbn:de:0030-drops-201403},
  doi =		{10.4230/DagMan.10.1.1},
  annote =	{Keywords: Knowledge representation and reasoning, Applications of logics, Declarative representations, Formal logic}
}

Document

Position

DOI: 10.4230/TGDK.2.1.2

Grounding Stream Reasoning Research

Authors: Pieter Bonte, Jean-Paul Calbimonte, Daniel de Leng, Daniele Dell'Aglio, Emanuele Della Valle, Thomas Eiter, Federico Giannini, Fredrik Heintz, Konstantin Schekotihin, Danh Le-Phuoc, Alessandra Mileo, Patrik Schneider, Riccardo Tommasini, Jacopo Urbani, and Giacomo Ziffer

Published in: TGDK, Volume 2, Issue 1 (2024): Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge, Volume 2, Issue 1

Abstract

In the last decade, there has been a growing interest in applying AI technologies to implement complex data analytics over data streams. To this end, researchers in various fields have been organising a yearly event called the "Stream Reasoning Workshop" to share perspectives, challenges, and experiences around this topic. In this paper, the previous organisers of the workshops and other community members provide a summary of the main research results that have been discussed during the first six editions of the event. These results can be categorised into four main research areas: The first is concerned with the technological challenges related to handling large data streams. The second area aims at adapting and extending existing semantic technologies to data streams. The third and fourth areas focus on how to implement reasoning techniques, either considering deductive or inductive techniques, to extract new and valuable knowledge from the data in the stream. This summary is written not only to provide a crystallisation of the field, but also to point out distinctive traits of the stream reasoning community. Moreover, it also provides a foundation for future research by enumerating a list of use cases and open challenges, to stimulate others to join this exciting research area.

Cite as

Pieter Bonte, Jean-Paul Calbimonte, Daniel de Leng, Daniele Dell'Aglio, Emanuele Della Valle, Thomas Eiter, Federico Giannini, Fredrik Heintz, Konstantin Schekotihin, Danh Le-Phuoc, Alessandra Mileo, Patrik Schneider, Riccardo Tommasini, Jacopo Urbani, and Giacomo Ziffer. Grounding Stream Reasoning Research. In Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 2:1-2:47, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Article{bonte_et_al:TGDK.2.1.2,
  author =	{Bonte, Pieter and Calbimonte, Jean-Paul and de Leng, Daniel and Dell'Aglio, Daniele and Della Valle, Emanuele and Eiter, Thomas and Giannini, Federico and Heintz, Fredrik and Schekotihin, Konstantin and Le-Phuoc, Danh and Mileo, Alessandra and Schneider, Patrik and Tommasini, Riccardo and Urbani, Jacopo and Ziffer, Giacomo},
  title =	{{Grounding Stream Reasoning Research}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:47},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.1.2},
  URN =		{urn:nbn:de:0030-drops-198597},
  doi =		{10.4230/TGDK.2.1.2},
  annote =	{Keywords: Stream Reasoning, Stream Processing, RDF streams, Streaming Linked Data, Continuous query processing, Temporal Logics, High-performance computing, Databases}
}

@Article{bonte_et_al:TGDK.2.1.2,
  author =	{Bonte, Pieter and Calbimonte, Jean-Paul and de Leng, Daniel and Dell'Aglio, Daniele and Della Valle, Emanuele and Eiter, Thomas and Giannini, Federico and Heintz, Fredrik and Schekotihin, Konstantin and Le-Phuoc, Danh and Mileo, Alessandra and Schneider, Patrik and Tommasini, Riccardo and Urbani, Jacopo and Ziffer, Giacomo},
  title =	{{Grounding Stream Reasoning Research}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:47},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.1.2},
  URN =		{urn:nbn:de:0030-drops-198597},
  doi =		{10.4230/TGDK.2.1.2},
  annote =	{Keywords: Stream Reasoning, Stream Processing, RDF streams, Streaming Linked Data, Continuous query processing, Temporal Logics, High-performance computing, Databases}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2023.17

Conjunctive Queries with Free Access Patterns Under Updates

Authors: Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)

Abstract

We study the problem of answering conjunctive queries with free access patterns under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach for such queries. We also give a syntactic characterisation of those queries that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given an input tuple. Finally, we chart the complexity trade-off between the preprocessing time, update time and enumeration delay for such queries. For a class of queries, our approach achieves optimal, albeit non-constant, update time and delay. Their optimality is predicated on the Online Matrix-Vector Multiplication conjecture. Our results recover prior work on the dynamic evaluation of conjunctive queries without access patterns.

Cite as

Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Conjunctive Queries with Free Access Patterns Under Updates. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kara_et_al:LIPIcs.ICDT.2023.17,
  author =	{Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Conjunctive Queries with Free Access Patterns Under Updates}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{17:1--17:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.17},
  URN =		{urn:nbn:de:0030-drops-177599},
  doi =		{10.4230/LIPIcs.ICDT.2023.17},
  annote =	{Keywords: fully dynamic algorithm, enumeration delay, complexity trade-off, dichotomy}
}

Document

DOI: 10.4230/LIPIcs.CSL.2023.29

Evaluation Trade-Offs for Acyclic Conjunctive Queries

Authors: Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang

Published in: LIPIcs, Volume 252, 31st EACSL Annual Conference on Computer Science Logic (CSL 2023)

Abstract

We consider the evaluation of acyclic conjunctive queries, where the evaluation time is decomposed into preprocessing time and enumeration delay. In a seminal paper at CSL'07, Bagan, Durand, and Grandjean showed that acyclic queries can be evaluated with linear preprocessing time and linear enumeration delay. If the query is free-connex, the enumeration delay becomes constant. Further prior work showed that constant enumeration delay can be achieved for arbitrary acyclic conjunctive queries at the expense of a preprocessing time that is characterised by the fractional hypertree width. We introduce an approach that exposes a trade-off between preprocessing time and enumeration delay for acyclic conjunctive queries. The aforementioned prior works represent extremes in this trade-off space. Yet our approach also allows for the enumeration delay and the preprocessing time between these extremes, in particular the delay may lie between constant and linear time. Our approach decomposes the given query into subqueries and achieves for each subquery a trade-off that depends on a parameter controlling the times for preprocessing and enumeration. The complexity of the query is given by the Pareto optimal points of a bi-objective optimisation program whose inputs are possible query decompositions and parameter values.

Cite as

Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Evaluation Trade-Offs for Acyclic Conjunctive Queries. In 31st EACSL Annual Conference on Computer Science Logic (CSL 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 252, pp. 29:1-29:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kara_et_al:LIPIcs.CSL.2023.29,
  author =	{Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Evaluation Trade-Offs for Acyclic Conjunctive Queries}},
  booktitle =	{31st EACSL Annual Conference on Computer Science Logic (CSL 2023)},
  pages =	{29:1--29:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-264-8},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{252},
  editor =	{Klin, Bartek and Pimentel, Elaine},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CSL.2023.29},
  URN =		{urn:nbn:de:0030-drops-174907},
  doi =		{10.4230/LIPIcs.CSL.2023.29},
  annote =	{Keywords: acyclic queries, query evaluation, enumeration delay}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.ICDT.2022

LIPIcs, Volume 220, ICDT 2022, Complete Volume

Authors: Dan Olteanu and Nils Vortmeier

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

LIPIcs, Volume 220, ICDT 2022, Complete Volume

Cite as

25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1-354, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@Proceedings{olteanu_et_al:LIPIcs.ICDT.2022,
  title =	{{LIPIcs, Volume 220, ICDT 2022, Complete Volume}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1--354},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022},
  URN =		{urn:nbn:de:0030-drops-158737},
  doi =		{10.4230/LIPIcs.ICDT.2022},
  annote =	{Keywords: LIPIcs, Volume 220, ICDT 2022, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.ICDT.2022.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Dan Olteanu and Nils Vortmeier

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{olteanu_et_al:LIPIcs.ICDT.2022.0,
  author =	{Olteanu, Dan and Vortmeier, Nils},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.0},
  URN =		{urn:nbn:de:0030-drops-158745},
  doi =		{10.4230/LIPIcs.ICDT.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2022.1

On an Information Theoretic Approach to Cardinality Estimation (Invited Talk)

Authors: Hung Q. Ngo

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

This article is a companion to an invited talk at ICDT'2022 with the same title. Cardinality estimation is among the most important problems in query optimization. It is well-documented that, when query plans go haywire, in most cases one can trace the root cause to the cardinality estimator being far off. In particular, traditional cardinality estimation based on selectivity estimation may sometimes under-estimate cardinalities by orders of magnitudes, because the independence or the uniformity assumptions do not typically hold. This talk outlines an approach to cardinality estimation that is "model-free" from a statistical stand-point. Being model-free means the approach tries to avoid making any distributional assumptions. Our approach is information-theoretic, and generalizes recent results on worst-case output size bounds of queries, allowing the estimator to take into account histogram information from the input relations. The estimator turns out to be the objective of a maximization problem subject to concave constraints, over an exponential number of variables. We then explain how the estimator can be computed in polynomial time for some fragment of these constraints. Overall, the talk introduces a new direction to address the classic problem of cardinality estimation that is designed to circumvent some of the pitfalls of selectivity-based estimation. We will also explain connections to other fundamental problems in information theory and database theory regarding information inequalities. The talk is based on (published and unpublished) joint works with Mahmoud Abo Khamis, Sungjin Im, Hossein Keshavarz, Phokion Kolaitis, Ben Moseley, XuanLong Nguyen, Kirk Pruhs, Dan Suciu, and Alireza Samadian Zakaria

Cite as

Hung Q. Ngo. On an Information Theoretic Approach to Cardinality Estimation (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1:1-1:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ngo:LIPIcs.ICDT.2022.1,
  author =	{Ngo, Hung Q.},
  title =	{{On an Information Theoretic Approach to Cardinality Estimation}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1:1--1:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.1},
  URN =		{urn:nbn:de:0030-drops-158750},
  doi =		{10.4230/LIPIcs.ICDT.2022.1},
  annote =	{Keywords: Cardinality Estimation, Information Theory, Polymatroid Bound, Worst-case Optimal Join}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2022.2

Counting the Solutions to a Query (Invited Talk)

Authors: Marcelo Arenas

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

In this talk, we consider the problem of counting the solutions to a query. Our first motivating scenario is the use of regular expressions to extract paths from a graph database. More specifically, given a graph database D, a regular expression r and a natural number n, consider the problem of counting the number of paths p in D such that p conforms to r and the length of p is n. This problem is known to be hard, namely #P-complete. In this talk, we show that this problem admits a fully polynomial-time randomized approximation scheme (FPRAS). Remarkably, the key idea to prove this result is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n accepted by a non-deterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete, it was open whether this problem admits an FPRAS. In this work, we solve this open problem and obtain as a welcome corollary that every function in SpanL admits an FPRAS. As a second motivating scenario, we consider the widely used class of conjunctive queries over relational databases. More specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first FPRAS for counting the answers to a query in C. In fact, our FPRAS is more general, and also applies to conjunctive queries with bounded hypertree width, as well as unions of such queries. As for the case of graph databases, the key ingredient in our proof is the resolution of a fundamental counting problem from automata theory. Specifically, we show that the problem #TA admits an FPRAS, where #TA is the problem of counting the number of trees of size n accepted by a tree automaton (TA). This talk is based on the results presented in [Marcelo Arenas et al., 2021; Marcelo Arenas et al., 2021].

Cite as

Marcelo Arenas. Counting the Solutions to a Query (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{arenas:LIPIcs.ICDT.2022.2,
  author =	{Arenas, Marcelo},
  title =	{{Counting the Solutions to a Query}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.2},
  URN =		{urn:nbn:de:0030-drops-158763},
  doi =		{10.4230/LIPIcs.ICDT.2022.2},
  annote =	{Keywords: Counting, query answering, fully polynomial-time randomized approximation scheme}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2022.3

Answering Unions of Conjunctive Queries with Ideal Time Guarantees (Invited Talk)

Authors: Nofar Carmeli

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

The holy grail we strive for is, given a query, to identify an algorithm that answers it over general databases with optimal time guarantees for the specific query. In this tutorial, we focus on what can be seen as ideal time guarantees: linear preprocessing (needed to read the input) and constant time per answer (needed to print the output). We seek to understand which queries can be solved with these (or almost these) time guarantees and how. We start with the basic building blocks of database queries: joins, and slowly increase the expressivity by introducing projections and unions until we cover positive relational algebra. We first consider the task of enumerating all query answers and then discuss related, more demanding, tasks such as ordered enumeration and direct access to query answers. We investigate the challenges in answering such queries and provide algorithms and conditional lower bounds

Cite as

Nofar Carmeli. Answering Unions of Conjunctive Queries with Ideal Time Guarantees (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{carmeli:LIPIcs.ICDT.2022.3,
  author =	{Carmeli, Nofar},
  title =	{{Answering Unions of Conjunctive Queries with Ideal Time Guarantees}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.3},
  URN =		{urn:nbn:de:0030-drops-158771},
  doi =		{10.4230/LIPIcs.ICDT.2022.3},
  annote =	{Keywords: query evaluation, enumeration, fine-grained complexity, constant delay, union of conjunctive queries}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2022.4

On the Hardness of Category Tree Construction

Authors: Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, and Slava Novgorodov

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

Category trees, or taxonomies, are rooted trees where each node, called a category, corresponds to a set of related items. The construction of taxonomies has been studied in various domains, including e-commerce, document management, and question answering. Multiple algorithms for automating construction have been proposed, employing a variety of clustering approaches and crowdsourcing. However, no formal model to capture such categorization problems has been devised, and their complexity has not been studied. To address this, we propose in this work a combinatorial model that captures many practical settings and show that the aforementioned empirical approach has been warranted, as we prove strong inapproximability bounds for various problem variants and special cases when the goal is to produce a categorization of the maximum utility. In our model, the input is a set of n weighted item sets that the tree would ideally contain as categories. Each category, rather than perfectly match the corresponding input set, is allowed to exceed a given threshold for a given similarity function. The goal is to produce a tree that maximizes the total weight of the sets for which it contains a matching category. A key parameter is an upper bound on the number of categories an item may belong to, which produces the hardness of the problem, as initially each item may be contained in an arbitrary number of input sets. For this model, we prove inapproximability bounds, of order Θ̃(√n) or Θ̃(n), for various problem variants and special cases, loosely justifying the aforementioned heuristic approach. Our work includes reductions based on parameterized randomized constructions that highlight how various problem parameters and properties of the input may affect the hardness. Moreover, for the special case where the category must be identical to the corresponding input set, we devise an algorithm whose approximation guarantee depends solely on a more granular parameter, allowing improved worst-case guarantees. Finally, we also generalize our results to DAG-based and non-hierarchical categorization.

Cite as

Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, and Slava Novgorodov. On the Hardness of Category Tree Construction. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gershtein_et_al:LIPIcs.ICDT.2022.4,
  author =	{Gershtein, Shay and Avron, Uri and Guy, Ido and Milo, Tova and Novgorodov, Slava},
  title =	{{On the Hardness of Category Tree Construction}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.4},
  URN =		{urn:nbn:de:0030-drops-158785},
  doi =		{10.4230/LIPIcs.ICDT.2022.4},
  annote =	{Keywords: maximum independent set, approximation algorithms, approximation hardness bounds, taxonomy construction, category tree construction}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2022.5

Linear Programs with Conjunctive Queries

Authors: Florent Capelli, Nicolas Crosetti, Joachim Niehren, and Jan Ramon

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

In this paper, we study the problem of optimizing a linear program whose variables are the answers to a conjunctive query. For this we propose the language LP(CQ) for specifying linear programs whose constraints and objective functions depend on the answer sets of conjunctive queries. We contribute an efficient algorithm for solving programs in a fragment of LP(CQ). The naive approach constructs a linear program having as many variables as there are elements in the answer set of the queries. Our approach constructs a linear program having the same optimal value but fewer variables. This is done by exploiting the structure of the conjunctive queries using generalized hypertree decompositions of small width to factorize elements of the answer set together. We illustrate the various applications of LP(CQ) programs on three examples: optimizing deliveries of resources, minimizing noise for differential privacy, and computing the s-measure of patterns in graphs as needed for data mining.

Cite as

Florent Capelli, Nicolas Crosetti, Joachim Niehren, and Jan Ramon. Linear Programs with Conjunctive Queries. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{capelli_et_al:LIPIcs.ICDT.2022.5,
  author =	{Capelli, Florent and Crosetti, Nicolas and Niehren, Joachim and Ramon, Jan},
  title =	{{Linear Programs with Conjunctive Queries}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.5},
  URN =		{urn:nbn:de:0030-drops-158796},
  doi =		{10.4230/LIPIcs.ICDT.2022.5},
  annote =	{Keywords: Database queries, linear programming, hypergraph decomposition}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2022.6

Certifiable Robustness for Nearest Neighbor Classifiers

Authors: Austen Z. Fan and Paraschos Koutris

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, k-Nearest Neighbors (k-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.

Cite as

Austen Z. Fan and Paraschos Koutris. Certifiable Robustness for Nearest Neighbor Classifiers. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{fan_et_al:LIPIcs.ICDT.2022.6,
  author =	{Fan, Austen Z. and Koutris, Paraschos},
  title =	{{Certifiable Robustness for Nearest Neighbor Classifiers}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.6},
  URN =		{urn:nbn:de:0030-drops-158809},
  doi =		{10.4230/LIPIcs.ICDT.2022.6},
  annote =	{Keywords: Inconsistent databases, k-NN classification, certifiable robustness}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2022.7

Improved Approximation and Scalability for Fair Max-Min Diversification

Authors: Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

Given an n-point metric space ({𝒳},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m-1 approximation. 2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m]. Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.

Cite as

Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved Approximation and Scalability for Fair Max-Min Diversification. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 7:1-7:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{addanki_et_al:LIPIcs.ICDT.2022.7,
  author =	{Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria},
  title =	{{Improved Approximation and Scalability for Fair Max-Min Diversification}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{7:1--7:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.7},
  URN =		{urn:nbn:de:0030-drops-158812},
  doi =		{10.4230/LIPIcs.ICDT.2022.7},
  annote =	{Keywords: algorithmic fairness, diversity maximization, data selection, approximation algorithms}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2022.8

Rewriting with Acyclic Queries: Mind Your Head

Authors: Gaetano Geck, Jens Keppeler, Thomas Schwentick, and Christopher Spinrath

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

The paper studies the rewriting problem, that is, the decision problem whether, for a given conjunctive query Q and a set 𝒱 of views, there is a conjunctive query Q' over 𝒱 that is equivalent to Q, for cases where the query, the views, and/or the desired rewriting are acyclic or even more restricted. It shows that, if Q itself is acyclic, an acyclic rewriting exists if there is any rewriting. An analogous statement also holds for free-connex acyclic, hierarchical, and q-hierarchical queries. Regarding the complexity of the rewriting problem, the paper identifies a border between tractable and (presumably) intractable variants of the rewriting problem: for schemas of bounded arity, the acyclic rewriting problem is NP-hard, even if both Q and the views in 𝒱 are acyclic or hierarchical. However, it becomes tractable, if the views are free-connex acyclic (i.e., in a nutshell, their body is (i) acyclic and (ii) remains acyclic if their head is added as an additional atom).

Cite as

Gaetano Geck, Jens Keppeler, Thomas Schwentick, and Christopher Spinrath. Rewriting with Acyclic Queries: Mind Your Head. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{geck_et_al:LIPIcs.ICDT.2022.8,
  author =	{Geck, Gaetano and Keppeler, Jens and Schwentick, Thomas and Spinrath, Christopher},
  title =	{{Rewriting with Acyclic Queries: Mind Your Head}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.8},
  URN =		{urn:nbn:de:0030-drops-158829},
  doi =		{10.4230/LIPIcs.ICDT.2022.8},
  annote =	{Keywords: rewriting, acyclic rewriting, acyclic conjunctive queries, free-connex queries, hierarchical queries, NP-hardness}
}

33 Search Results for "Olteanu, Dan"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as