DROPS

Volume

LIPIcs, Volume 290

27th International Conference on Database Theory (ICDT 2024)

ICDT 2024, March 25-28, 2024, Paestum, Italy

Editors: Graham Cormode and Michael Shekelyan

Document

Complete Volume

DOI: 10.4230/LIPIcs.ICDT.2024

LIPIcs, Volume 290, ICDT 2024, Complete Volume

Authors: Graham Cormode and Michael Shekelyan

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

LIPIcs, Volume 290, ICDT 2024, Complete Volume

Cite as

27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 1-484, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Proceedings{cormode_et_al:LIPIcs.ICDT.2024,
  title =	{{LIPIcs, Volume 290, ICDT 2024, Complete Volume}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{1--484},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024},
  URN =		{urn:nbn:de:0030-drops-197819},
  doi =		{10.4230/LIPIcs.ICDT.2024},
  annote =	{Keywords: LIPIcs, Volume 290, ICDT 2024, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.ICDT.2024.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Graham Cormode and Michael Shekelyan

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cormode_et_al:LIPIcs.ICDT.2024.0,
  author =	{Cormode, Graham and Shekelyan, Michael},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.0},
  URN =		{urn:nbn:de:0030-drops-197828},
  doi =		{10.4230/LIPIcs.ICDT.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2024.1

Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk)

Authors: Georgia Koutrika

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Back in 1970’s, E. F. Codd worked on a prototype of a natural language question and answer application that would sit on top of a relational database system. Soon, natural language interfaces for databases (NLIDBs) became the holy grail for the database community. Different approaches have been proposed from the database, machine learning and NLP communities. Interest in the topic has had its peaks and valleys. After a long and adventurous journey of almost 50 years, there is a rekindled interest in NLIDBs in recent years, fueled by the need for democratizing data access and by the recent advances in deep learning and natural language processing in particular. There is a surge of works on natural language interfaces for databases using neural translation, and suddenly it becomes hard to keep up with advancements in the field. Are we close to finding the holy grail of data access? What are the lurking challenges that we need to surpass and what research opportunities arise? Finally, what is the role of the database community?

Cite as

Georgia Koutrika. Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 1:1-1:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{koutrika:LIPIcs.ICDT.2024.1,
  author =	{Koutrika, Georgia},
  title =	{{Natural Language Data Interfaces: A Data Access Odyssey}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{1:1--1:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.1},
  URN =		{urn:nbn:de:0030-drops-197832},
  doi =		{10.4230/LIPIcs.ICDT.2024.1},
  annote =	{Keywords: natural language data interfaces, NLIDBs, NL-to-SQL, text-to-SQL, conversational databases}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2024.2

How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk)

Authors: Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic and may be declarative in nature, posing new challenges to novices and students. Manual teaching resources being limited and enrollment growing rapidly, automated tools that help students debug queries and explain errors are potential game-changers in database education. We present a suite of tools built on the foundations of database theory that has been used by over 1600 students in database classes at Duke University, showcasing a high-impact application of database theory in database education.

Cite as

Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang. How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 2:1-2:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{roy_et_al:LIPIcs.ICDT.2024.2,
  author =	{Roy, Sudeepa and Gilad, Amir and Hu, Yihao and Meng, Hanze and Miao, Zhengjie and Stephens-Martinez, Kristin and Yang, Jun},
  title =	{{How Database Theory Helps Teach Relational Queries in Database Education}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{2:1--2:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.2},
  URN =		{urn:nbn:de:0030-drops-197841},
  doi =		{10.4230/LIPIcs.ICDT.2024.2},
  annote =	{Keywords: Query Debugging, SQL, Relational Algebra, Relational Calculus, Database Education, Boolean Provenance}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2024.3

Rule-Based Ontologies: From Semantics to Syntax (Invited Talk)

Authors: Andreas Pieris

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

An ontology specifies an abstract model of a domain of interest via a formal language that is typically based on logic. Tuple-generating dependencies (tgds) and equality-generating dependencies (egds) originally introduced as a unifying framework for database integrity constraints, and later on used in data exchange and integration, are well suited for modeling ontologies that are intended for data-intensive tasks. The reason is that, unlike other popular formalisms such as description logics, tgds and egds can easily handle higher-arity relations that naturally occur in relational databases. In recent years, there has been an extensive study of tgd- and egd-based ontologies and of their applications to several different data-intensive tasks. In those studies, model theory plays a crucial role and it typically proceeds from syntax to semantics. In other words, the syntax of an ontology language is introduced first and then the properties of the mathematical structures that satisfy ontologies of that language are explored. There is, however, a mature and growing body of research in the reverse direction, i.e., from semantics to syntax. Here, the starting point is a collection of model-theoretic properties and the goal is to determine whether or not these properties characterize some ontology language. Such results are welcome as they pinpoint the expressive power of an ontology language in terms of insightful model-theoretic properties. The main aim of this tutorial is to present a comprehensive overview of model-theoretic characterizations of tgd- and egd-based ontology languages that are encountered in database theory and symbolic artificial intelligence.

Cite as

Andreas Pieris. Rule-Based Ontologies: From Semantics to Syntax (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{pieris:LIPIcs.ICDT.2024.3,
  author =	{Pieris, Andreas},
  title =	{{Rule-Based Ontologies: From Semantics to Syntax}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.3},
  URN =		{urn:nbn:de:0030-drops-197857},
  doi =		{10.4230/LIPIcs.ICDT.2024.3},
  annote =	{Keywords: ontologies, tuple-generating dependencies, equality-generating dependencies, model theory, model-theoretic characterizations}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.4

Direct Access for Answers to Conjunctive Queries with Aggregation

Authors: Idan Eldar, Nofar Carmeli, and Benny Kimelfeld

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For some common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. Specifically, we investigate the ability to evaluate such queries by constructing in log-linear time a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than log-linear in the size of the input, and so, the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research provides a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself is not part of the lexicographic order. On the other hand, we show infeasibility for the case of count-distinct that does not have any efficient representation as a commutative semiring. We then investigate the ability to include the aggregate and annotation outcome in the lexicographic order. Among the hardness results, standing out as tractable is the case of a semiring with an idempotent addition, such as those of min and max. Notably, this case captures also count-distinct over a logarithmic-size domain.

Cite as

Idan Eldar, Nofar Carmeli, and Benny Kimelfeld. Direct Access for Answers to Conjunctive Queries with Aggregation. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{eldar_et_al:LIPIcs.ICDT.2024.4,
  author =	{Eldar, Idan and Carmeli, Nofar and Kimelfeld, Benny},
  title =	{{Direct Access for Answers to Conjunctive Queries with Aggregation}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.4},
  URN =		{urn:nbn:de:0030-drops-197861},
  doi =		{10.4230/LIPIcs.ICDT.2024.4},
  annote =	{Keywords: aggregate queries, conjunctive queries, provenance semirings, commutative semirings, annotated databases, direct access, ranking function, answer orderings, query classification}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.5

Communication Cost of Joins over Federated Data

Authors: Tamara Cucumides and Juan Reutter

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

We study the problem of querying different data sources, which we assume out of our control and that are made available by standard web communication protocols. In this scenario, the time spent communicating data often dominates the time spent processing local queries in each server. Thus, our focus is on algorithms that minimize the communication between the query processing server and the federated servers containing data. However, any federated query can always be answered with linear communication, simply by requesting all the data to the federated sources. Further, one can show that certain queries do require this amount of communication. But sending all the data is definitely not a relevant algorithm from a practical point of view. This worst-case analysis is, therefore, not useful for our needs. There is a growing body of work in terms of designing strategies that minimize communication in query federation, but these strategies are commonly based in heuristics, and we currently miss a formal analysis providing guidelines for the design of such strategies. We focus on the communication complexity of federated joins when the problem is parameterized by a measure commonly referred to as the certificate of the instance: a framework that has been used before in the context of set intersection and local query processing. We show how to process any conjunctive query in time given by the certificate of instances. Our algorithm is an adaptation of Minesweeper, one of the algorithms devised for local query processing, into our federating setting. When certificates are of the size of the instance, this amount to sending the entire database, but our strategy provides drastic reductions in the communication needed for queries and instances with small certificates. We also show matching communication lower bounds for cases where the certificate is smaller than the size of active domain of the instances.

Cite as

Tamara Cucumides and Juan Reutter. Communication Cost of Joins over Federated Data. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cucumides_et_al:LIPIcs.ICDT.2024.5,
  author =	{Cucumides, Tamara and Reutter, Juan},
  title =	{{Communication Cost of Joins over Federated Data}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.5},
  URN =		{urn:nbn:de:0030-drops-197879},
  doi =		{10.4230/LIPIcs.ICDT.2024.5},
  annote =	{Keywords: databases, database queries, query federation, communication complexity, adaptive algorithms}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.6

Range Entropy Queries and Partitioning

Authors: Sanjay Krishnan and Stavros Sintos

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Data partitioning that maximizes or minimizes Shannon entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be accelerated if we have a data structure to find the entropy in different subsets of data when the algorithm needs to decide what block to construct. While it is generally known how to compute the entropy of a discrete distribution efficiently, we want to efficiently derive the entropy among the data items that lie in a specific area. We solve this problem in a typical setting when we deal with real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set P of n weighted and colored points in ℝ^d. The goal is to construct a low space data structure, such that given a query (hyper)rectangle R, it computes the entropy based on the colors of the points in P∩ R, in sublinear time. We show a conditional lower bound for this problem proving that we cannot hope for data structures with near-linear space and near-constant query time. Then, we propose exact data structures for d = 1 and d > 1 with o(n^{2d}) space and o(n) query time. We also provide a tune parameter t that the user can choose to bound the asymptotic space and query time of the new data structures. Next, we propose near linear space data structures for returning either an additive or a multiplicative approximation of the entropy. Finally, we show how we can use the new data structures to efficiently partition time series and histograms with respect to entropy.

Cite as

Sanjay Krishnan and Stavros Sintos. Range Entropy Queries and Partitioning. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{krishnan_et_al:LIPIcs.ICDT.2024.6,
  author =	{Krishnan, Sanjay and Sintos, Stavros},
  title =	{{Range Entropy Queries and Partitioning}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{6:1--6:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.6},
  URN =		{urn:nbn:de:0030-drops-197883},
  doi =		{10.4230/LIPIcs.ICDT.2024.6},
  annote =	{Keywords: Shannon entropy, range query, data structure, data partitioning}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.7

Skyline Operators for Document Spanners

Authors: Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, and Stefan Mengel

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.

Cite as

Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, and Stefan Mengel. Skyline Operators for Document Spanners. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.7,
  author =	{Amarilli, Antoine and Kimelfeld, Benny and Labb\'{e}, S\'{e}bastien and Mengel, Stefan},
  title =	{{Skyline Operators for Document Spanners}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.7},
  URN =		{urn:nbn:de:0030-drops-197898},
  doi =		{10.4230/LIPIcs.ICDT.2024.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Query Evaluation}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.8

When Do Homomorphism Counts Help in Query Algorithms?

Authors: Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

A query algorithm based on homomorphism counts is a procedure for determining whether a given instance satisfies a property by counting homomorphisms between the given instance and finitely many predetermined instances. In a left query algorithm, we count homomorphisms from the predetermined instances to the given instance, while in a right query algorithm we count homomorphisms from the given instance to the predetermined instances. Homomorphisms are usually counted over the semiring ℕ of non-negative integers; it is also meaningful, however, to count homomorphisms over the Boolean semiring 𝔹, in which case the homomorphism count indicates whether or not a homomorphism exists. We first characterize the properties that admit a left query algorithm over 𝔹 by showing that these are precisely the properties that are both first-order definable and closed under homomorphic equivalence. After this, we turn attention to a comparison between left query algorithms over 𝔹 and left query algorithms over ℕ. In general, there are properties that admit a left query algorithm over ℕ but not over 𝔹. The main result of this paper asserts that if a property is closed under homomorphic equivalence, then that property admits a left query algorithm over 𝔹 if and only if it admits a left query algorithm over ℕ. In other words and rather surprisingly, homomorphism counts over ℕ do not help as regards properties that are closed under homomorphic equivalence. Finally, we characterize the properties that admit both a left query algorithm over 𝔹 and a right query algorithm over 𝔹.

Cite as

Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu. When Do Homomorphism Counts Help in Query Algorithms?. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{tencate_et_al:LIPIcs.ICDT.2024.8,
  author =	{ten Cate, Balder and Dalmau, Victor and Kolaitis, Phokion G. and Wu, Wei-Lin},
  title =	{{When Do Homomorphism Counts Help in Query Algorithms?}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.8},
  URN =		{urn:nbn:de:0030-drops-197905},
  doi =		{10.4230/LIPIcs.ICDT.2024.8},
  annote =	{Keywords: query algorithms, homomorphism, homomorphism counts, conjunctive query, constraint satisfaction}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.9

Approximating Single-Source Personalized PageRank with Absolute Error Guarantees

Authors: Zhewei Wei, Ji-Rong Wen, and Mingji Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes s and t on a graph G = (V,E), the PPR value π(s,t) is defined as the probability that an α-discounted random walk from s terminates at t, where the walk terminates with probability α at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node s to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates π̂(s,t) satisfy max_{t ∈ V} |π̂(s,t)-π(s,t)| ≤ ε for a given error bound ε. We propose an algorithm that achieves this with high probability, with an expected running time of - Õ(√m/ε) for directed graphs, where m = |E|; - Õ(√{d_max}/ε) for undirected graphs, where d_max is the maximum node degree in the graph; - Õ(n^{γ-1/2}/ε) for power-law graphs, where n = |V| and γ ∈ (1/2,1) is the extent of the power law. These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring max_{t ∈ V} |π̂(s,t)/d(t)-π(s,t)/d(t)| ≤ ε_d for a given error bound ε_d, where the graph is undirected and d(t) is the degree of node t. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of Õ(√{∑_{t ∈ V} π(s,t)/d(t)}/ε_d). This improves over the previously known O(1/ε_d) complexity.

Cite as

Zhewei Wei, Ji-Rong Wen, and Mingji Yang. Approximating Single-Source Personalized PageRank with Absolute Error Guarantees. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{wei_et_al:LIPIcs.ICDT.2024.9,
  author =	{Wei, Zhewei and Wen, Ji-Rong and Yang, Mingji},
  title =	{{Approximating Single-Source Personalized PageRank with Absolute Error Guarantees}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.9},
  URN =		{urn:nbn:de:0030-drops-197911},
  doi =		{10.4230/LIPIcs.ICDT.2024.9},
  annote =	{Keywords: Graph Algorithms, Sublinear Algorithms, Personalized PageRank}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.10

Right-Adjoints for Datalog Programs

Authors: Balder ten Cate, Víctor Dalmau, and Jakub Opršal

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

A Datalog program can be viewed as a syntactic specification of a mapping from database instances over some schema to database instances over another schema. We establish a large class of Datalog programs for which this mapping admits a (generalized) right-adjoint. We employ these results to obtain new insights into the existence of, and methods for constructing, homomorphism dualities within restricted classes of instances. From this, we derive new results regarding the existence of uniquely characterizing data examples for database queries in the presence of integrity constraints.

Cite as

Balder ten Cate, Víctor Dalmau, and Jakub Opršal. Right-Adjoints for Datalog Programs. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{tencate_et_al:LIPIcs.ICDT.2024.10,
  author =	{ten Cate, Balder and Dalmau, V{\'\i}ctor and Opr\v{s}al, Jakub},
  title =	{{Right-Adjoints for Datalog Programs}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.10},
  URN =		{urn:nbn:de:0030-drops-197929},
  doi =		{10.4230/LIPIcs.ICDT.2024.10},
  annote =	{Keywords: Datalog, Adjoints, Homomorphism Dualities, Database Constraints, Conjunctive Queries, Data Examples}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.11

On the Convergence Rate of Linear Datalog ^∘ over Stable Semirings

Authors: Sungjin Im, Benjamin Moseley, Hung Ngo, and Kirk Pruhs

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Datalog^∘ is an extension of Datalog, where instead of a program being a collection of union of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog^∘ is more powerful than Datalog in that its additional algebraic structure alows for supporting recursion with aggregation. At the same time, Datalog^∘ retains the syntactic and semantic simplicity of Datalog: Datalog^∘ has declarative least fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm that repeatedly applies the immediate consequence operator until no further change is possible. It was shown in [Mahmoud Abo Khamis et al., 2022] that, when the underlying semiring is p-stable, then the naïve evaluation of any Datalog^∘ program over the semiring converges in a finite number of steps. However, the upper bounds on the rate of convergence were exponential in the number n of ground IDB atoms. This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog^∘ programs, which is quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog^∘ programs under any p-stable semiring is O(pn³). Furthermore, we show a matching lower bound by constructing a p-stable semiring and a linear Datalog^∘ program that requires Ω(pn³) iterations for the naïve iteration algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog^∘ programs. When L is the number of elements, the convergence rate is bounded by O(pn log L). This significantly improves the convergence rate for small L. We show a nearly matching lower bound as well.

Cite as

Sungjin Im, Benjamin Moseley, Hung Ngo, and Kirk Pruhs. On the Convergence Rate of Linear Datalog ^∘ over Stable Semirings. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{im_et_al:LIPIcs.ICDT.2024.11,
  author =	{Im, Sungjin and Moseley, Benjamin and Ngo, Hung and Pruhs, Kirk},
  title =	{{On the Convergence Rate of Linear Datalog ^∘ over Stable Semirings}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.11},
  URN =		{urn:nbn:de:0030-drops-197939},
  doi =		{10.4230/LIPIcs.ICDT.2024.11},
  annote =	{Keywords: Datalog, convergence rate, semiring}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.12

Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility

Authors: Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Due to the importance of linear algebra and matrix operations in data analytics, there is significant interest in using relational query optimization and processing techniques for evaluating (sparse) linear algebra programs. In particular, in recent years close connections have been established between linear algebra programs and relational algebra that allow transferring optimization techniques of the latter to the former. In this paper, we ask ourselves which linear algebra programs in MATLANG correspond to the free-connex and q-hierarchical fragments of conjunctive first-order logic. Both fragments have desirable query processing properties: free-connex conjunctive queries support constant-delay enumeration after a linear-time preprocessing phase, and q-hierarchical conjunctive queries further allow constant-time updates. By characterizing the corresponding fragments of MATLANG, we hence identify the fragments of linear algebra programs that one can evaluate with constant-delay enumeration after linear-time preprocessing and with constant-time updates. To derive our results, we improve and generalize previous correspondences between MATLANG and relational algebra evaluated over semiring-annotated relations. In addition, we identify properties on semirings that allow to generalize the complexity bounds for free-connex and q-hierarchical conjunctive queries from Boolean annotations to general semirings.

Cite as

Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren. Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{munozserrano_et_al:LIPIcs.ICDT.2024.12,
  author =	{Mu\~{n}oz Serrano, Thomas and Riveros, Cristian and Vansummeren, Stijn},
  title =	{{Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.12},
  URN =		{urn:nbn:de:0030-drops-197946},
  doi =		{10.4230/LIPIcs.ICDT.2024.12},
  annote =	{Keywords: Query evaluation, conjunctive queries, linear algebra, enumeration algorithms}
}

37 Search Results for "Cormode, Graham"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message