12 Search Results for "Koutris, Paraschos"


Document
Track A: Algorithms, Complexity and Games
Fully-Scalable MPC Algorithms for Clustering in High Dimension

Authors: Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be n^σ for arbitrarily small fixed σ > 0. Importantly, the local memory may be substantially smaller than the number of clusters k, yet all our algorithms are fast, i.e., run in O(1) rounds. We first devise a fast MPC algorithm for O(1)-approximation of uniform Facility Location. This is the first fully-scalable MPC algorithm that achieves O(1)-approximation for any clustering problem in a general geometric setting; previous algorithms only provide poly(log n)-approximation or apply to restricted inputs, such as low dimension or a small number of clusters k, e.g., [Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad et al., ICML'22]. We then build on this Facility Location result and devise a fast MPC algorithm that achieves O(1)-bicriteria approximation for k-Median and for k-Means, namely, it computes (1+ε)k clusters of cost within an O(1/ε²) factor of the optimum for k clusters. A primary technical tool that we introduce, and that may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing for every data point a statistic of its approximate neighborhood, for statistics like range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension, and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].
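
Illustration (not from the paper): the geometric-aggregation primitive can be pictured on a single machine with a randomly shifted grid, a crude stand-in for consistent hashing / sparse partition. The Python sketch below hashes points to grid cells and reads off a per-point approximate range count; the function names and parameters are ours, and nothing here reflects the MPC implementation or its guarantees.

import numpy as np

def shifted_grid_cells(points, cell_width, rng):
    # Map each point to the integer ID of its cell in a randomly shifted uniform grid,
    # a much-simplified stand-in for a consistent-hashing / sparse-partition scheme.
    shift = rng.uniform(0.0, cell_width, size=points.shape[1])
    return [tuple(c) for c in np.floor((points + shift) / cell_width).astype(int)]

def approximate_range_counts(points, radius, rng):
    # For every point, count the points that fall in its grid cell; with the cell width
    # tied to the query radius this crudely approximates a range count (near neighbors
    # that land in adjacent cells are missed).
    cells = shifted_grid_cells(points, cell_width=2 * radius, rng=rng)
    histogram = {}
    for c in cells:
        histogram[c] = histogram.get(c, 0) + 1
    return [histogram[c] for c in cells]

# Example: approximate_range_counts(np.random.rand(1000, 20), 0.1, np.random.default_rng(0))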

Cite as

Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý. Fully-Scalable MPC Algorithms for Clustering in High Dimension. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 50:1-50:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{czumaj_et_al:LIPIcs.ICALP.2024.50,
  author =	{Czumaj, Artur and Gao, Guichen and Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Vesel\'{y}, Pavel},
  title =	{{Fully-Scalable MPC Algorithms for Clustering in High Dimension}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{50:1--50:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.50},
  URN =		{urn:nbn:de:0030-drops-201938},
  doi =		{10.4230/LIPIcs.ICALP.2024.50},
  annote =	{Keywords: Massively parallel computing, high dimension, facility location, k-median, k-means}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
The Fine-Grained Complexity of Boolean Conjunctive Queries and Sum-Product Problems

Authors: Austen Z. Fan, Paraschos Koutris, and Hangdong Zhao

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)


Abstract
We study the fine-grained complexity of evaluating Boolean Conjunctive Queries and their generalization to sum-of-product problems over an arbitrary semiring. For these problems, we present a general semiring-oblivious reduction from the k-clique problem to any query structure (hypergraph). Our reduction uses the notion of embedding a graph into a hypergraph, first introduced by Marx [Dániel Marx, 2013]. As a consequence of our reduction, we can show tight conditional lower bounds for many classes of hypergraphs, including cycles, Loomis-Whitney joins, some bipartite graphs, and chordal graphs. These lower bounds depend on what we call the clique embedding power of a hypergraph H, which we believe is a quantity of independent interest. We show that the clique embedding power is always less than the submodular width of the hypergraph, and present an algorithm for computing it. We conclude with many open problems for future research.
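
Illustration (not from the paper): for k = 3, the connection between cliques and Boolean conjunctive queries is the folklore observation that the triangle query q() :- R(a,b), S(b,c), T(c,a) is true over the edge relation of a graph exactly when the graph contains a triangle. The Python sketch below evaluates this one query; it is not the paper's semiring-oblivious reduction, and the names are ours.

def boolean_triangle_query(R, S, T):
    # Boolean CQ q() :- R(a, b), S(b, c), T(c, a): is there any satisfying assignment?
    # With R = S = T equal to the edge set of a graph (both orientations of each
    # undirected edge), this asks whether the graph contains a triangle.
    successors = {}
    for b, c in S:
        successors.setdefault(b, set()).add(c)
    t_set = set(T)
    for a, b in R:
        for c in successors.get(b, ()):
            if (c, a) in t_set:
                return True
    return False

# Example: edges = {(1, 2), (2, 3), (3, 1)}; boolean_triangle_query(edges, edges, edges) -> True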

Cite as

Austen Z. Fan, Paraschos Koutris, and Hangdong Zhao. The Fine-Grained Complexity of Boolean Conjunctive Queries and Sum-Product Problems. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 127:1-127:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{fan_et_al:LIPIcs.ICALP.2023.127,
  author =	{Fan, Austen Z. and Koutris, Paraschos and Zhao, Hangdong},
  title =	{{The Fine-Grained Complexity of Boolean Conjunctive Queries and Sum-Product Problems}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{127:1--127:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.127},
  URN =		{urn:nbn:de:0030-drops-181791},
  doi =		{10.4230/LIPIcs.ICALP.2023.127},
  annote =	{Keywords: Fine-grained complexity, conjunctive queries, semiring-oblivious reduction}
}
Document
Certifiable Robustness for Nearest Neighbor Classifiers

Authors: Austen Z. Fan and Paraschos Koutris

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)


Abstract
ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, k-Nearest Neighbors (k-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.
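
Illustration (not from the paper): on a tiny instance, certifiable robustness can be checked straight from the definition. Assuming the only FD is a key, tuples group into key-equal blocks, each repair keeps exactly one tuple per block, and a label is certain iff every repair's k-NN prediction agrees. The brute-force Python sketch below (our own interface, exponential in the number of blocks) only spells out that definition; it has nothing to do with the paper's dichotomy algorithms.

from itertools import product
from collections import Counter

def certain_knn_label(blocks, test_point, k=1):
    # blocks: list of blocks, each a list of (feature_vector, label) tuples that agree
    # on the key; a repair (possible world) picks exactly one tuple per block.
    # Returns the label predicted by k-NN in every repair, or None if repairs disagree.
    def predict(world):
        ranked = sorted(world, key=lambda t: sum((x - y) ** 2
                                                 for x, y in zip(t[0], test_point)))
        votes = Counter(label for _, label in ranked[:k])
        return votes.most_common(1)[0][0]    # ties broken arbitrarily in this toy
    labels = {predict(world) for world in product(*blocks)}
    return labels.pop() if len(labels) == 1 else None

# Example: blocks = [[((0.0,), "+"), ((5.0,), "-")], [((6.0,), "+")]]
#          certain_knn_label(blocks, (4.0,), k=1) -> None  (the two repairs disagree)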

Cite as

Austen Z. Fan and Paraschos Koutris. Certifiable Robustness for Nearest Neighbor Classifiers. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{fan_et_al:LIPIcs.ICDT.2022.6,
  author =	{Fan, Austen Z. and Koutris, Paraschos},
  title =	{{Certifiable Robustness for Nearest Neighbor Classifiers}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.6},
  URN =		{urn:nbn:de:0030-drops-158809},
  doi =		{10.4230/LIPIcs.ICDT.2022.6},
  annote =	{Keywords: Inconsistent databases, k-NN classification, certifiable robustness}
}
Document
Ranked Enumeration of Conjunctive Query Results

Authors: Shaleen Deep and Paraschos Koutris

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We study the problem of enumerating answers of Conjunctive Queries ranked according to a given ranking function. Our main contribution is a novel algorithm with small preprocessing time, logarithmic delay, and non-trivial space usage during execution. To allow for efficient enumeration, we exploit certain properties of ranking functions that frequently occur in practice. To this end, we introduce the notions of decomposable and compatible (w.r.t. a query decomposition) ranking functions, which allow for partial aggregation of tuple scores in order to efficiently enumerate the output. We complement the algorithmic results with lower bounds that justify why restrictions on the structure of ranking functions are necessary. Our results extend and improve upon a long line of work that has studied ranked enumeration from both a theoretical and practical perspective.
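
Illustration (not from the paper): for a single binary join with a decomposable ranking function (the score of an answer is the sum of the two tuple scores), ranked enumeration reduces to the classic frontier-in-a-heap pattern: sort each bucket once, then repeatedly pop the cheapest unseen pair and push its two successors. The Python sketch below handles only this one-join case under our own interface and is not the paper's algorithm for general conjunctive queries.

import heapq
from collections import defaultdict

def ranked_join(R, S):
    # R, S: iterables of (join_key, score, payload); rank of an answer = score_R + score_S.
    # Yields join answers in non-decreasing total score.
    left, right = defaultdict(list), defaultdict(list)
    for key, score, payload in R:
        left[key].append((score, payload))
    for key, score, payload in S:
        right[key].append((score, payload))
    heap, seen = [], set()
    for key in left.keys() & right.keys():
        left[key].sort(key=lambda t: t[0])
        right[key].sort(key=lambda t: t[0])
        heapq.heappush(heap, (left[key][0][0] + right[key][0][0], key, 0, 0))
    while heap:
        total, key, i, j = heapq.heappop(heap)
        if (key, i, j) in seen:
            continue
        seen.add((key, i, j))
        yield total, left[key][i][1], right[key][j][1]
        if i + 1 < len(left[key]):      # advance the frontier in each coordinate
            heapq.heappush(heap, (left[key][i + 1][0] + right[key][j][0], key, i + 1, j))
        if j + 1 < len(right[key]):
            heapq.heappush(heap, (left[key][i][0] + right[key][j + 1][0], key, i, j + 1))

# Example: list(ranked_join([("a", 1, "r1"), ("a", 3, "r2")], [("a", 2, "s1")]))
#          -> [(3, "r1", "s1"), (5, "r2", "s1")]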

Cite as

Shaleen Deep and Paraschos Koutris. Ranked Enumeration of Conjunctive Query Results. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{deep_et_al:LIPIcs.ICDT.2021.5,
  author =	{Deep, Shaleen and Koutris, Paraschos},
  title =	{{Ranked Enumeration of Conjunctive Query Results}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.5},
  URN =		{urn:nbn:de:0030-drops-137139},
  doi =		{10.4230/LIPIcs.ICDT.2021.5},
  annote =	{Keywords: Query result enumeration, joins, ranking}
}
Document
Enumeration Algorithms for Conjunctive Queries with Projection

Authors: Shaleen Deep, Xiao Hu, and Paraschos Koutris

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We investigate the enumeration of query results for an important subset of CQs with projections, namely star and path queries. The task is to design data structures and algorithms that allow for efficient enumeration with delay guarantees after a preprocessing phase. Our main contribution is a series of results based on the idea of interleaving precomputed output with further join processing to maintain delay guarantees, which may be of independent interest. In particular, we design combinatorial algorithms that provide instance-specific delay guarantees in linear preprocessing time. These algorithms improve upon the currently best known results. Further, we show how existing results can be improved upon by using fast matrix multiplication. We also present new results involving a tradeoff between preprocessing time and delay guarantees for enumeration of path queries that contain projections. CQs with projection where the join attribute is projected away are equivalent to Boolean matrix multiplication. Our results can therefore also be interpreted as sparse, output-sensitive matrix multiplication with delay guarantees.
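
Illustration: the equivalence stated in the last two sentences is immediate for the two-atom query q(a, c) :- R(a, b), S(b, c) with b projected away, whose output is exactly the support of the Boolean product of the adjacency matrices of R and S. A small Python rendering, with our own names:

def projected_join(R, S):
    # q(a, c) :- R(a, b), S(b, c) with the join attribute b projected away.
    by_b = {}
    for b, c in S:
        by_b.setdefault(b, set()).add(c)
    return {(a, c) for a, b in R for c in by_b.get(b, ())}

def boolean_matmul_support(A, B):
    # Support of the Boolean matrix product A * B: pairs (a, c) with A[a][b] and B[b][c] for some b.
    return {(a, c)
            for a, row in enumerate(A)
            for b, a_bit in enumerate(row) if a_bit
            for c, b_bit in enumerate(B[b]) if b_bit}

# The two sets coincide when R = {(a, b) : A[a][b]} and S = {(b, c) : B[b][c]}.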

Cite as

Shaleen Deep, Xiao Hu, and Paraschos Koutris. Enumeration Algorithms for Conjunctive Queries with Projection. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{deep_et_al:LIPIcs.ICDT.2021.14,
  author =	{Deep, Shaleen and Hu, Xiao and Koutris, Paraschos},
  title =	{{Enumeration Algorithms for Conjunctive Queries with Projection}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.14},
  URN =		{urn:nbn:de:0030-drops-137229},
  doi =		{10.4230/LIPIcs.ICDT.2021.14},
  annote =	{Keywords: Query result enumeration, joins, ranking}
}
Document
Locality-Aware Distribution Schemes

Authors: Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey Naughton

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
One of the bottlenecks in parallel query processing is the cost of shuffling data across nodes in a cluster. Ideally, given a distribution of the data across the nodes and a query, we want to execute the query by performing only local computation and no communication: in this case, the query is called parallel-correct with respect to the data distribution. Previous work studied this problem for Conjunctive Queries in the case where the distribution scheme is oblivious, i.e., the location of each tuple depends only on the tuple and is independent of the instance. In this work, we show that oblivious schemes have a fundamental theoretical limitation, and initiate the formal study of distribution schemes that are locality-aware. In particular, we focus on a class of distribution schemes called co-hash distribution schemes, which are widely used in parallel systems. In co-hash partitioning, some tables are initially hashed, and the remaining tables are co-located so that a join condition is always satisfied. Given a co-hash distribution scheme, we formally study the complexity of deciding various desirable properties, including obliviousness and redundancy. Then, for a given Conjunctive Query and co-hash scheme, we determine the computational complexity of deciding whether the query is parallel-correct. We also explore a stronger notion of correctness, called parallel disjoint correctness, which guarantees that the query result will be disjointly partitioned across nodes, i.e., there is no duplication of results.
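
Illustration (a deliberately simplified rendering, not the paper's formal model): in a co-hash scheme, one table is hash-partitioned on its key and a second table is placed on the node of the key it references, so the foreign-key join needs local computation only.

def co_hash_partition(R, S, p):
    # R: tuples whose first attribute is the key; S: tuples whose first attribute is a
    # foreign key referencing R. Hash-partition R on its key and co-locate each S tuple
    # with the matching R tuple, so the join R ⋈ S is answerable node-by-node.
    nodes = [{"R": [], "S": []} for _ in range(p)]
    for t in R:
        nodes[hash(t[0]) % p]["R"].append(t)
    for t in S:
        nodes[hash(t[0]) % p]["S"].append(t)
    return nodes

def local_join(nodes):
    # The union of the per-node joins equals the global foreign-key join under this scheme.
    out = []
    for node in nodes:
        r_by_key = {r[0]: r for r in node["R"]}
        out.extend((r_by_key[s[0]], s) for s in node["S"] if s[0] in r_by_key)
    return out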

Cite as

Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey Naughton. Locality-Aware Distribution Schemes. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 22:1-22:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{sundarmurthy_et_al:LIPIcs.ICDT.2021.22,
  author =	{Sundarmurthy, Bruhathi and Koutris, Paraschos and Naughton, Jeffrey},
  title =	{{Locality-Aware Distribution Schemes}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{22:1--22:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.22},
  URN =		{urn:nbn:de:0030-drops-137302},
  doi =		{10.4230/LIPIcs.ICDT.2021.22},
  annote =	{Keywords: partitioning, parallel correctness, join queries}
}
Document
Consistent Query Answering for Primary Keys in Logspace

Authors: Paraschos Koutris and Jef Wijsen

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
We study the complexity of consistent query answering on databases that may violate primary key constraints. A repair of such a database is any consistent database that can be obtained by deleting a minimal set of tuples. For every Boolean query q, CERTAINTY(q) is the problem that takes a database as input and asks whether q evaluates to true on every repair. In [Koutris and Wijsen, ACM TODS, 2017], the authors show that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either in P or coNP-complete, and it is decidable which of the two cases applies. In this paper, we sharpen this result by showing that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either expressible in symmetric stratified Datalog (with some aggregation operator) or coNP-complete. Since symmetric stratified Datalog is in L, we thus obtain a complexity-theoretic dichotomy between L and coNP-complete. Another new finding of practical importance is that CERTAINTY(q) is on the logspace side of the dichotomy for queries q where all join conditions express foreign-to-primary key matches, which is undoubtedly the most common type of join condition.
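
Illustration (not from the paper): on a toy instance, CERTAINTY(q) can be checked straight from the definition: group each relation into blocks of tuples agreeing on the primary key, enumerate every repair (one tuple per block), and test whether q holds in all of them. The brute-force Python sketch below is exponential and serves only as a reference point; the paper shows that, for the tractable self-join-free queries, the same question can instead be answered in symmetric stratified Datalog and hence in logspace.

from itertools import product

def certainty(blocked_db, q):
    # blocked_db: {relation_name: [block, ...]}, each block a list of tuples sharing a
    # primary-key value; a repair picks exactly one tuple per block.
    # q: a function from {relation_name: set of tuples} to bool (the Boolean query).
    # Returns True iff q evaluates to true in every repair (tiny instances only).
    names = list(blocked_db)
    per_relation_repairs = [list(product(*blocked_db[name])) for name in names]
    for choice in product(*per_relation_repairs):
        repair = {name: set(tuples) for name, tuples in zip(names, choice)}
        if not q(repair):
            return False
    return True

# Example for q() :- R(x, y), S(y):
# q = lambda db: any((y,) in db["S"] for (x, y) in db["R"])
# certainty({"R": [[("a", 1), ("a", 2)]], "S": [[(1,)]]}, q) -> False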

Cite as

Paraschos Koutris and Jef Wijsen. Consistent Query Answering for Primary Keys in Logspace. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{koutris_et_al:LIPIcs.ICDT.2019.23,
  author =	{Koutris, Paraschos and Wijsen, Jef},
  title =	{{Consistent Query Answering for Primary Keys in Logspace}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.23},
  URN =		{urn:nbn:de:0030-drops-103252},
  doi =		{10.4230/LIPIcs.ICDT.2019.23},
  annote =	{Keywords: conjunctive queries, consistent query answering, Datalog, primary keys}
}
Document
Distribution Policies for Datalog

Authors: Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris

Published in: LIPIcs, Volume 98, 21st International Conference on Database Theory (ICDT 2018)


Abstract
Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
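
Illustration (a toy rendering, not the paper's framework): the interplay of a distribution policy with multi-round recursive evaluation can be seen on transitive closure. Below, E(x, z) and T(z, y) facts are both placed on server hash(z) mod p, so the recursive join is local, and newly derived facts are reshuffled between rounds; the policy, names, and evaluation strategy are our own simplifications.

def distributed_transitive_closure(edges, p):
    # Rules:  T(x, y) :- E(x, y).     T(x, y) :- E(x, z), T(z, y).
    # Policy: E(x, z) lives on server hash(z) % p and T(z, y) on server hash(z) % p,
    # so the join on z is purely local; a derived T(x, y) ships to hash(x) % p.
    E = [set() for _ in range(p)]
    T = [set() for _ in range(p)]
    for x, z in edges:
        E[hash(z) % p].add((x, z))
        T[hash(x) % p].add((x, z))          # base rule
    changed = True
    while changed:                           # one communication round per iteration
        changed = False
        derived = set()
        for s in range(p):
            heads = {}
            for z, y in T[s]:
                heads.setdefault(z, set()).add(y)
            for x, z in E[s]:
                for y in heads.get(z, ()):
                    derived.add((x, y))
        for x, y in derived:                 # reshuffle derived facts by first attribute
            if (x, y) not in T[hash(x) % p]:
                T[hash(x) % p].add((x, y))
                changed = True
    return set().union(*T)

# Example: distributed_transitive_closure({(1, 2), (2, 3)}, 4) -> {(1, 2), (2, 3), (1, 3)}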

Cite as

Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris. Distribution Policies for Datalog. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 17:1-17:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{ketsman_et_al:LIPIcs.ICDT.2018.17,
  author =	{Ketsman, Bas and Albarghouthi, Aws and Koutris, Paraschos},
  title =	{{Distribution Policies for Datalog}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{17:1--17:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.17},
  URN =		{urn:nbn:de:0030-drops-86034},
  doi =		{10.4230/LIPIcs.ICDT.2018.17},
  annote =	{Keywords: Datalog queries, Distributed evaluation, Distribution policies}
}
Document
The Design of Arbitrage-Free Data Pricing Schemes

Authors: Shaleen Deep and Paraschos Koutris

Published in: LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)


Abstract
Motivated by a growing market that involves buying and selling data over the web, we study pricing schemes that assign value to queries issued over a database. Previous work studied pricing mechanisms that compute the price of a query by extending a data seller’s explicit prices on certain queries, or investigated the properties that a pricing function should exhibit without detailing a generic construction. In this work, we present a formal framework for pricing queries over data that allows the construction of general families of pricing functions, with the main goal of avoiding arbitrage. We consider two types of pricing schemes: instance-independent schemes, where the price depends only on the structure of the query, and answer-dependent schemes, where the price also depends on the query output. Our main result is a complete characterization of the structure of pricing functions in both settings, by relating it to properties of a function over a lattice. We use our characterization, together with information-theoretic methods, to construct a variety of arbitrage-free pricing functions. Finally, we discuss various tradeoffs in the design space and present techniques for efficient computation of the proposed pricing functions.

Cite as

Shaleen Deep and Paraschos Koutris. The Design of Arbitrage-Free Data Pricing Schemes. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{deep_et_al:LIPIcs.ICDT.2017.12,
  author =	{Deep, Shaleen and Koutris, Paraschos},
  title =	{{The Design of Arbitrage-Free Data Pricing Schemes}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Benedikt, Michael and Orsi, Giorgio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.12},
  URN =		{urn:nbn:de:0030-drops-70574},
  doi =		{10.4230/LIPIcs.ICDT.2017.12},
  annote =	{Keywords: data pricing, determinacy, arbitrage}
}
Document
m-tables: Representing Missing Data

Authors: Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, Jeffrey Naughton, and Val Tannen

Published in: LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)


Abstract
Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these could vary from missing attribute values, missing a known number of tuples, or even missing an unknown number of tuples. In this work, we propose a new representation system called m-tables, that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.

Cite as

Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, Jeffrey Naughton, and Val Tannen. m-tables: Representing Missing Data. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{sundarmurthy_et_al:LIPIcs.ICDT.2017.21,
  author =	{Sundarmurthy, Bruhathi and Koutris, Paraschos and Lang, Willis and Naughton, Jeffrey and Tannen, Val},
  title =	{{m-tables: Representing Missing Data}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Benedikt, Michael and Orsi, Giorgio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.21},
  URN =		{urn:nbn:de:0030-drops-70618},
  doi =		{10.4230/LIPIcs.ICDT.2017.21},
  annote =	{Keywords: missing values, incomplete data, c tables, representation systems}
}
Document
Worst-Case Optimal Algorithms for Parallel Query Processing

Authors: Paraschos Koutris, Paul Beame, and Dan Suciu

Published in: LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)


Abstract
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with p servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of (Ngo et al. 2012) for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query q when all relations have size equal to M is O(M/p^{1/psi^*}), where psi^* is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries.
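
Illustration (not from the paper): the one-round setting is easiest to see on the triangle query via the well-known HyperCube/Shares strategy, where each of the three variables gets a share of the p = p_side³ servers and every tuple is replicated along the coordinate it does not know. The Python sketch below is that textbook special case only, with our own names; it is not the paper's general construction and says nothing about its load analysis.

from itertools import product

def hypercube_triangle_join(R, S, T, p_side):
    # One-round HyperCube strategy for Q(a, b, c) :- R(a, b), S(b, c), T(c, a) on
    # p = p_side ** 3 servers indexed by (h(a), h(b), h(c)). Each tuple knows two of
    # the three variables and is broadcast along its missing coordinate.
    h = lambda v: hash(v) % p_side
    servers = {coord: {"R": [], "S": [], "T": []}
               for coord in product(range(p_side), repeat=3)}
    for a, b in R:
        for k in range(p_side):
            servers[(h(a), h(b), k)]["R"].append((a, b))
    for b, c in S:
        for i in range(p_side):
            servers[(i, h(b), h(c))]["S"].append((b, c))
    for c, a in T:
        for j in range(p_side):
            servers[(h(a), j, h(c))]["T"].append((c, a))
    answers = set()
    for frag in servers.values():            # each server joins its local fragments
        successors = {}
        for b, c in frag["S"]:
            successors.setdefault(b, set()).add(c)
        t_local = set(frag["T"])
        for a, b in frag["R"]:
            for c in successors.get(b, ()):
                if (c, a) in t_local:
                    answers.add((a, b, c))
    return answers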

Cite as

Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-Case Optimal Algorithms for Parallel Query Processing. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{koutris_et_al:LIPIcs.ICDT.2016.8,
  author =	{Koutris, Paraschos and Beame, Paul and Suciu, Dan},
  title =	{{Worst-Case Optimal Algorithms for Parallel Query Processing}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.8},
  URN =		{urn:nbn:de:0030-drops-57771},
  doi =		{10.4230/LIPIcs.ICDT.2016.8},
  annote =	{Keywords: conjunctive query, parallel computation, worst-case bounds}
}
Document
Answering Conjunctive Queries with Inequalities

Authors: Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)


Abstract
In this paper, we study the complexity of answering conjunctive queries (CQ) with inequalities. In particular, we compare the complexity of the query with and without inequalities. The main contribution of our work is a novel combinatorial technique that enables the use of any Select-Project-Join query plan for a given CQ without inequalities in answering the CQ with inequalities, with an additional factor in running time that only depends on the query. To achieve this, we define a new projection operator that keeps a small representation (independent of the size of the database) of the set of input tuples that map to each tuple in the output of the projection; this representation is used to evaluate all the inequalities in the query. Second, we generalize a result by Papadimitriou-Yannakakis [PODS'97] and give an alternative algorithm based on the color-coding technique [Alon, Yuster and Zwick, PODS'02] to evaluate a CQ with inequalities by using an algorithm for the CQ without inequalities. Third, we investigate the structure of the query graph, inequality graph, and the augmented query graph with inequalities, and show that even if the query and the inequality graphs have bounded treewidth, the augmented graph not only can have an unbounded treewidth but can also be NP-hard to evaluate. Further, we illustrate classes of queries and inequalities where the augmented graphs have unbounded treewidth, but the CQ with inequalities can be evaluated in poly-time. Finally, we give necessary properties and sufficient properties that allow a class of CQs to have poly-time combined complexity with respect to any inequality pattern.
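
Illustration (a baseline for contrast, not the paper's technique): a CQ with inequalities can always be answered by evaluating the CQ without the inequalities and filtering the join result, at the cost of materializing a possibly much larger intermediate relation; avoiding that blow-up is what the paper's machinery targets. A toy Python rendering for q(x, z) :- R(x, y), S(y, z), x < z, with our own names:

def cq_with_inequality_baseline(R, S):
    # Baseline join-then-filter for q(x, z) :- R(x, y), S(y, z), x < z.
    # The intermediate join R ⋈ S may be far larger than the filtered output.
    by_y = {}
    for y, z in S:
        by_y.setdefault(y, set()).add(z)
    return {(x, z) for x, y in R for z in by_y.get(y, ()) if x < z}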

Cite as

Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu. Answering Conjunctive Queries with Inequalities. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 76-93, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


BibTeX

@InProceedings{koutris_et_al:LIPIcs.ICDT.2015.76,
  author =	{Koutris, Paraschos and Milo, Tova and Roy, Sudeepa and Suciu, Dan},
  title =	{{Answering Conjunctive Queries with Inequalities}},
  booktitle =	{18th International Conference on Database Theory (ICDT 2015)},
  pages =	{76--93},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-79-8},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{31},
  editor =	{Arenas, Marcelo and Ugarte, Mart{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2015.76},
  URN =		{urn:nbn:de:0030-drops-49781},
  doi =		{10.4230/LIPIcs.ICDT.2015.76},
  annote =	{Keywords: query evaluation, conjunctive query, inequality, treewidth}
}
  • Refine by Author
  • 11 Koutris, Paraschos
  • 3 Deep, Shaleen
  • 2 Fan, Austen Z.
  • 2 Naughton, Jeffrey
  • 2 Suciu, Dan
  • Show More...

  • Refine by Classification
  • 4 Theory of computation → Database theory
  • 1 Information systems → Incomplete data
  • 1 Information systems → Inconsistent data
  • 1 Information systems → Integrity checking
  • 1 Information systems → Relational database model
  • Show More...

  • Refine by Keyword
  • 2 Query result enumeration
  • 2 conjunctive queries
  • 2 conjunctive query
  • 2 joins
  • 2 ranking
  • Show More...

  • Refine by Type
  • 12 document

  • Refine by Publication Year
  • 3 2021
  • 2 2017
  • 1 2015
  • 1 2016
  • 1 2018
  • Show More...
