40 Search Results for "Cormode, Graham"


Volume

LIPIcs, Volume 290

27th International Conference on Database Theory (ICDT 2024)

ICDT 2024, March 25-28, 2024, Paestum, Italy

Editors: Graham Cormode and Michael Shekelyan

Document
Streaming Zero-Knowledge Proofs

Authors: Graham Cormode, Marcel Dall'Agnol, Tom Gur, and Chris Hickey

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)


Abstract
Streaming interactive proofs (SIPs) enable a space-bounded algorithm with one-pass access to a massive stream of data to verify a computation that requires large space, by communicating with a powerful but untrusted prover. This work initiates the study of zero-knowledge proofs for data streams. We define the notion of zero-knowledge in the streaming setting and construct zero-knowledge SIPs for the two main algorithmic building blocks in the streaming interactive proofs literature: the sumcheck and polynomial evaluation protocols. To the best of our knowledge all known streaming interactive proofs are based on either of these tools, and indeed, this allows us to obtain zero-knowledge SIPs for central streaming problems such as index, point and range queries, median, frequency moments, and inner product. Our protocols are efficient in terms of time and space, as well as communication: the verifier algorithm’s space complexity is polylog(n) and, after a non-interactive setup that uses a random string of near-linear length, the remaining parameters are n^o(1). En route, we develop an algorithmic toolkit for designing zero-knowledge data stream protocols, consisting of an algebraic streaming commitment protocol and a temporal commitment protocol. Our analyses rely on delicate algebraic and information-theoretic arguments and reductions from average-case communication complexity.

Cite as

Graham Cormode, Marcel Dall'Agnol, Tom Gur, and Chris Hickey. Streaming Zero-Knowledge Proofs. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 2:1-2:66, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{cormode_et_al:LIPIcs.CCC.2024.2,
  author =	{Cormode, Graham and Dall'Agnol, Marcel and Gur, Tom and Hickey, Chris},
  title =	{{Streaming Zero-Knowledge Proofs}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{2:1--2:66},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.2},
  URN =		{urn:nbn:de:0030-drops-203988},
  doi =		{10.4230/LIPIcs.CCC.2024.2},
  annote =	{Keywords: Zero-knowledge proofs, streaming algorithms, computational complexity}
}
Document
Top-k Frequent Patterns in Streams and Parameterized-Space LZ Compression

Authors: Patrick Dinklage, Johannes Fischer, and Nicola Prezza

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
We present novel online approximations of the Lempel-Ziv 77 (LZ77) and Lempel-Ziv 78 (LZ78) compression schemes [Lempel & Ziv, 1977/1978] with parameterizable space usage based on estimating which k patterns occur the most frequently in the streamed input for parameter k. This new approach overcomes the issue of finding only local repetitions, which is a natural limitation of algorithms that compress using a sliding window or by partitioning the input into blocks. For this, we introduce the top-k trie, a summary for maintaining online the top-k frequent consecutive patterns in a stream of characters based on a combination of the Lempel-Ziv 78 compression scheme and the Misra-Gries algorithm for frequent item estimation in streams. Using straightforward encoding, our implementations yield compression ratios (output over input size) competitive with established general-purpose LZ-based compression utilities such as gzip or xz.

Cite as

Patrick Dinklage, Johannes Fischer, and Nicola Prezza. Top-k Frequent Patterns in Streams and Parameterized-Space LZ Compression. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 9:1-9:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.SEA.2024.9,
  author =	{Dinklage, Patrick and Fischer, Johannes and Prezza, Nicola},
  title =	{{Top-k Frequent Patterns in Streams and Parameterized-Space LZ Compression}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{9:1--9:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.9},
  URN =		{urn:nbn:de:0030-drops-203748},
  doi =		{10.4230/LIPIcs.SEA.2024.9},
  annote =	{Keywords: compression, streaming, heavy hitters, algorithm engineering}
}
Document
Track A: Algorithms, Complexity and Games
Streaming Edge Coloring with Asymptotically Optimal Colors

Authors: Mohammad Saneian and Soheil Behnezhad

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
Given a graph G, an edge-coloring is an assignment of colors to edges of G such that any two edges sharing an endpoint receive different colors. By Vizing’s celebrated theorem, any graph of maximum degree Δ needs at least Δ and at most (Δ + 1) colors to be properly edge colored. In this paper, we study edge colorings in the streaming setting. The edges arrive one by one in an arbitrary order. The algorithm takes a single pass over the input and must output a solution using a much smaller space than the input size. Since the output of edge coloring is as large as its input, the assigned colors should also be reported in a streaming fashion. The streaming edge coloring problem has been studied in a series of works over the past few years. The main challenge is that the algorithm cannot "remember" all the color assignments that it returns. To ensure the validity of the solution, existing algorithms use many more colors than Vizing’s bound. Namely, in n-vertex graphs, the state-of-the-art algorithm with Õ(n s) space requires O(Δ²/s + Δ) colors. Note, in particular, that for an asymptotically optimal O(Δ) coloring, this algorithm requires Ω(nΔ) space which is as large as the input. Whether such a coloring can be achieved with sublinear space has been left open. In this paper, we answer this question in the affirmative. We present a randomized algorithm that returns an asymptotically optimal O(Δ) edge coloring using Õ(n √{Δ}) space. More generally, our algorithm returns a proper O(Δ^{1.5}/s + Δ) edge coloring with Õ(n s) space, improving prior algorithms for the whole range of s.

Cite as

Mohammad Saneian and Soheil Behnezhad. Streaming Edge Coloring with Asymptotically Optimal Colors. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 121:1-121:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{saneian_et_al:LIPIcs.ICALP.2024.121,
  author =	{Saneian, Mohammad and Behnezhad, Soheil},
  title =	{{Streaming Edge Coloring with Asymptotically Optimal Colors}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{121:1--121:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.121},
  URN =		{urn:nbn:de:0030-drops-202640},
  doi =		{10.4230/LIPIcs.ICALP.2024.121},
  annote =	{Keywords: Streaming, Edge coloring, Adversarial order}
}
Document
Complete Volume
LIPIcs, Volume 290, ICDT 2024, Complete Volume

Authors: Graham Cormode and Michael Shekelyan

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
LIPIcs, Volume 290, ICDT 2024, Complete Volume

Cite as

27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 1-484, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@Proceedings{cormode_et_al:LIPIcs.ICDT.2024,
  title =	{{LIPIcs, Volume 290, ICDT 2024, Complete Volume}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{1--484},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024},
  URN =		{urn:nbn:de:0030-drops-197819},
  doi =		{10.4230/LIPIcs.ICDT.2024},
  annote =	{Keywords: LIPIcs, Volume 290, ICDT 2024, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Graham Cormode and Michael Shekelyan

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{cormode_et_al:LIPIcs.ICDT.2024.0,
  author =	{Cormode, Graham and Shekelyan, Michael},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.0},
  URN =		{urn:nbn:de:0030-drops-197828},
  doi =		{10.4230/LIPIcs.ICDT.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk)

Authors: Georgia Koutrika

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Back in 1970’s, E. F. Codd worked on a prototype of a natural language question and answer application that would sit on top of a relational database system. Soon, natural language interfaces for databases (NLIDBs) became the holy grail for the database community. Different approaches have been proposed from the database, machine learning and NLP communities. Interest in the topic has had its peaks and valleys. After a long and adventurous journey of almost 50 years, there is a rekindled interest in NLIDBs in recent years, fueled by the need for democratizing data access and by the recent advances in deep learning and natural language processing in particular. There is a surge of works on natural language interfaces for databases using neural translation, and suddenly it becomes hard to keep up with advancements in the field. Are we close to finding the holy grail of data access? What are the lurking challenges that we need to surpass and what research opportunities arise? Finally, what is the role of the database community?

Cite as

Georgia Koutrika. Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 1:1-1:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{koutrika:LIPIcs.ICDT.2024.1,
  author =	{Koutrika, Georgia},
  title =	{{Natural Language Data Interfaces: A Data Access Odyssey}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{1:1--1:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.1},
  URN =		{urn:nbn:de:0030-drops-197832},
  doi =		{10.4230/LIPIcs.ICDT.2024.1},
  annote =	{Keywords: natural language data interfaces, NLIDBs, NL-to-SQL, text-to-SQL, conversational databases}
}
Document
Invited Talk
How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk)

Authors: Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic and may be declarative in nature, posing new challenges to novices and students. Manual teaching resources being limited and enrollment growing rapidly, automated tools that help students debug queries and explain errors are potential game-changers in database education. We present a suite of tools built on the foundations of database theory that has been used by over 1600 students in database classes at Duke University, showcasing a high-impact application of database theory in database education.

Cite as

Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang. How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 2:1-2:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{roy_et_al:LIPIcs.ICDT.2024.2,
  author =	{Roy, Sudeepa and Gilad, Amir and Hu, Yihao and Meng, Hanze and Miao, Zhengjie and Stephens-Martinez, Kristin and Yang, Jun},
  title =	{{How Database Theory Helps Teach Relational Queries in Database Education}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{2:1--2:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.2},
  URN =		{urn:nbn:de:0030-drops-197841},
  doi =		{10.4230/LIPIcs.ICDT.2024.2},
  annote =	{Keywords: Query Debugging, SQL, Relational Algebra, Relational Calculus, Database Education, Boolean Provenance}
}
Document
Invited Talk
Rule-Based Ontologies: From Semantics to Syntax (Invited Talk)

Authors: Andreas Pieris

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
An ontology specifies an abstract model of a domain of interest via a formal language that is typically based on logic. Tuple-generating dependencies (tgds) and equality-generating dependencies (egds) originally introduced as a unifying framework for database integrity constraints, and later on used in data exchange and integration, are well suited for modeling ontologies that are intended for data-intensive tasks. The reason is that, unlike other popular formalisms such as description logics, tgds and egds can easily handle higher-arity relations that naturally occur in relational databases. In recent years, there has been an extensive study of tgd- and egd-based ontologies and of their applications to several different data-intensive tasks. In those studies, model theory plays a crucial role and it typically proceeds from syntax to semantics. In other words, the syntax of an ontology language is introduced first and then the properties of the mathematical structures that satisfy ontologies of that language are explored. There is, however, a mature and growing body of research in the reverse direction, i.e., from semantics to syntax. Here, the starting point is a collection of model-theoretic properties and the goal is to determine whether or not these properties characterize some ontology language. Such results are welcome as they pinpoint the expressive power of an ontology language in terms of insightful model-theoretic properties. The main aim of this tutorial is to present a comprehensive overview of model-theoretic characterizations of tgd- and egd-based ontology languages that are encountered in database theory and symbolic artificial intelligence.

Cite as

Andreas Pieris. Rule-Based Ontologies: From Semantics to Syntax (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{pieris:LIPIcs.ICDT.2024.3,
  author =	{Pieris, Andreas},
  title =	{{Rule-Based Ontologies: From Semantics to Syntax}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.3},
  URN =		{urn:nbn:de:0030-drops-197857},
  doi =		{10.4230/LIPIcs.ICDT.2024.3},
  annote =	{Keywords: ontologies, tuple-generating dependencies, equality-generating dependencies, model theory, model-theoretic characterizations}
}
Document
Direct Access for Answers to Conjunctive Queries with Aggregation

Authors: Idan Eldar, Nofar Carmeli, and Benny Kimelfeld

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For some common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. Specifically, we investigate the ability to evaluate such queries by constructing in log-linear time a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than log-linear in the size of the input, and so, the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research provides a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself is not part of the lexicographic order. On the other hand, we show infeasibility for the case of count-distinct that does not have any efficient representation as a commutative semiring. We then investigate the ability to include the aggregate and annotation outcome in the lexicographic order. Among the hardness results, standing out as tractable is the case of a semiring with an idempotent addition, such as those of min and max. Notably, this case captures also count-distinct over a logarithmic-size domain.

Cite as

Idan Eldar, Nofar Carmeli, and Benny Kimelfeld. Direct Access for Answers to Conjunctive Queries with Aggregation. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{eldar_et_al:LIPIcs.ICDT.2024.4,
  author =	{Eldar, Idan and Carmeli, Nofar and Kimelfeld, Benny},
  title =	{{Direct Access for Answers to Conjunctive Queries with Aggregation}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.4},
  URN =		{urn:nbn:de:0030-drops-197861},
  doi =		{10.4230/LIPIcs.ICDT.2024.4},
  annote =	{Keywords: aggregate queries, conjunctive queries, provenance semirings, commutative semirings, annotated databases, direct access, ranking function, answer orderings, query classification}
}
Document
Communication Cost of Joins over Federated Data

Authors: Tamara Cucumides and Juan Reutter

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
We study the problem of querying different data sources, which we assume out of our control and that are made available by standard web communication protocols. In this scenario, the time spent communicating data often dominates the time spent processing local queries in each server. Thus, our focus is on algorithms that minimize the communication between the query processing server and the federated servers containing data. However, any federated query can always be answered with linear communication, simply by requesting all the data to the federated sources. Further, one can show that certain queries do require this amount of communication. But sending all the data is definitely not a relevant algorithm from a practical point of view. This worst-case analysis is, therefore, not useful for our needs. There is a growing body of work in terms of designing strategies that minimize communication in query federation, but these strategies are commonly based in heuristics, and we currently miss a formal analysis providing guidelines for the design of such strategies. We focus on the communication complexity of federated joins when the problem is parameterized by a measure commonly referred to as the certificate of the instance: a framework that has been used before in the context of set intersection and local query processing. We show how to process any conjunctive query in time given by the certificate of instances. Our algorithm is an adaptation of Minesweeper, one of the algorithms devised for local query processing, into our federating setting. When certificates are of the size of the instance, this amount to sending the entire database, but our strategy provides drastic reductions in the communication needed for queries and instances with small certificates. We also show matching communication lower bounds for cases where the certificate is smaller than the size of active domain of the instances.

Cite as

Tamara Cucumides and Juan Reutter. Communication Cost of Joins over Federated Data. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{cucumides_et_al:LIPIcs.ICDT.2024.5,
  author =	{Cucumides, Tamara and Reutter, Juan},
  title =	{{Communication Cost of Joins over Federated Data}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.5},
  URN =		{urn:nbn:de:0030-drops-197879},
  doi =		{10.4230/LIPIcs.ICDT.2024.5},
  annote =	{Keywords: databases, database queries, query federation, communication complexity, adaptive algorithms}
}
Document
Range Entropy Queries and Partitioning

Authors: Sanjay Krishnan and Stavros Sintos

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Data partitioning that maximizes or minimizes Shannon entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be accelerated if we have a data structure to find the entropy in different subsets of data when the algorithm needs to decide what block to construct. While it is generally known how to compute the entropy of a discrete distribution efficiently, we want to efficiently derive the entropy among the data items that lie in a specific area. We solve this problem in a typical setting when we deal with real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set P of n weighted and colored points in ℝ^d. The goal is to construct a low space data structure, such that given a query (hyper)rectangle R, it computes the entropy based on the colors of the points in P∩ R, in sublinear time. We show a conditional lower bound for this problem proving that we cannot hope for data structures with near-linear space and near-constant query time. Then, we propose exact data structures for d = 1 and d > 1 with o(n^{2d}) space and o(n) query time. We also provide a tune parameter t that the user can choose to bound the asymptotic space and query time of the new data structures. Next, we propose near linear space data structures for returning either an additive or a multiplicative approximation of the entropy. Finally, we show how we can use the new data structures to efficiently partition time series and histograms with respect to entropy.

Cite as

Sanjay Krishnan and Stavros Sintos. Range Entropy Queries and Partitioning. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{krishnan_et_al:LIPIcs.ICDT.2024.6,
  author =	{Krishnan, Sanjay and Sintos, Stavros},
  title =	{{Range Entropy Queries and Partitioning}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{6:1--6:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.6},
  URN =		{urn:nbn:de:0030-drops-197883},
  doi =		{10.4230/LIPIcs.ICDT.2024.6},
  annote =	{Keywords: Shannon entropy, range query, data structure, data partitioning}
}
Document
Skyline Operators for Document Spanners

Authors: Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, and Stefan Mengel

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.

Cite as

Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, and Stefan Mengel. Skyline Operators for Document Spanners. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.7,
  author =	{Amarilli, Antoine and Kimelfeld, Benny and Labb\'{e}, S\'{e}bastien and Mengel, Stefan},
  title =	{{Skyline Operators for Document Spanners}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.7},
  URN =		{urn:nbn:de:0030-drops-197898},
  doi =		{10.4230/LIPIcs.ICDT.2024.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Query Evaluation}
}
Document
When Do Homomorphism Counts Help in Query Algorithms?

Authors: Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
A query algorithm based on homomorphism counts is a procedure for determining whether a given instance satisfies a property by counting homomorphisms between the given instance and finitely many predetermined instances. In a left query algorithm, we count homomorphisms from the predetermined instances to the given instance, while in a right query algorithm we count homomorphisms from the given instance to the predetermined instances. Homomorphisms are usually counted over the semiring ℕ of non-negative integers; it is also meaningful, however, to count homomorphisms over the Boolean semiring 𝔹, in which case the homomorphism count indicates whether or not a homomorphism exists. We first characterize the properties that admit a left query algorithm over 𝔹 by showing that these are precisely the properties that are both first-order definable and closed under homomorphic equivalence. After this, we turn attention to a comparison between left query algorithms over 𝔹 and left query algorithms over ℕ. In general, there are properties that admit a left query algorithm over ℕ but not over 𝔹. The main result of this paper asserts that if a property is closed under homomorphic equivalence, then that property admits a left query algorithm over 𝔹 if and only if it admits a left query algorithm over ℕ. In other words and rather surprisingly, homomorphism counts over ℕ do not help as regards properties that are closed under homomorphic equivalence. Finally, we characterize the properties that admit both a left query algorithm over 𝔹 and a right query algorithm over 𝔹.

Cite as

Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu. When Do Homomorphism Counts Help in Query Algorithms?. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{tencate_et_al:LIPIcs.ICDT.2024.8,
  author =	{ten Cate, Balder and Dalmau, Victor and Kolaitis, Phokion G. and Wu, Wei-Lin},
  title =	{{When Do Homomorphism Counts Help in Query Algorithms?}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.8},
  URN =		{urn:nbn:de:0030-drops-197905},
  doi =		{10.4230/LIPIcs.ICDT.2024.8},
  annote =	{Keywords: query algorithms, homomorphism, homomorphism counts, conjunctive query, constraint satisfaction}
}
Document
Approximating Single-Source Personalized PageRank with Absolute Error Guarantees

Authors: Zhewei Wei, Ji-Rong Wen, and Mingji Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes s and t on a graph G = (V,E), the PPR value π(s,t) is defined as the probability that an α-discounted random walk from s terminates at t, where the walk terminates with probability α at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node s to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates π̂(s,t) satisfy max_{t ∈ V} |π̂(s,t)-π(s,t)| ≤ ε for a given error bound ε. We propose an algorithm that achieves this with high probability, with an expected running time of - Õ(√m/ε) for directed graphs, where m = |E|; - Õ(√{d_max}/ε) for undirected graphs, where d_max is the maximum node degree in the graph; - Õ(n^{γ-1/2}/ε) for power-law graphs, where n = |V| and γ ∈ (1/2,1) is the extent of the power law. These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring max_{t ∈ V} |π̂(s,t)/d(t)-π(s,t)/d(t)| ≤ ε_d for a given error bound ε_d, where the graph is undirected and d(t) is the degree of node t. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of Õ(√{∑_{t ∈ V} π(s,t)/d(t)}/ε_d). This improves over the previously known O(1/ε_d) complexity.

Cite as

Zhewei Wei, Ji-Rong Wen, and Mingji Yang. Approximating Single-Source Personalized PageRank with Absolute Error Guarantees. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{wei_et_al:LIPIcs.ICDT.2024.9,
  author =	{Wei, Zhewei and Wen, Ji-Rong and Yang, Mingji},
  title =	{{Approximating Single-Source Personalized PageRank with Absolute Error Guarantees}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.9},
  URN =		{urn:nbn:de:0030-drops-197911},
  doi =		{10.4230/LIPIcs.ICDT.2024.9},
  annote =	{Keywords: Graph Algorithms, Sublinear Algorithms, Personalized PageRank}
}
  • Refine by Author
  • 11 Cormode, Graham
  • 3 Amarilli, Antoine
  • 3 Kimelfeld, Benny
  • 3 Sintos, Stavros
  • 2 Capelli, Florent
  • Show More...

  • Refine by Classification
  • 7 Theory of computation → Database query processing and optimization (theory)
  • 7 Theory of computation → Database theory
  • 6 Theory of computation → Streaming, sublinear and near linear time algorithms
  • 5 Theory of computation → Database query languages (principles)
  • 5 Theory of computation → Graph algorithms analysis
  • Show More...

  • Refine by Keyword
  • 3 streaming algorithms
  • 2 Conjunctive Queries
  • 2 Datalog
  • 2 computational complexity
  • 2 conjunctive queries
  • Show More...

  • Refine by Type
  • 39 document
  • 1 volume

  • Refine by Publication Year
  • 31 2024
  • 3 2019
  • 2 2015
  • 2 2017
  • 1 2018
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail