DROPS

Document

DOI: 10.4230/LIPIcs.ICDT.2021.22

Locality-Aware Distribution Schemes

Authors: Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey Naughton

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

One of the bottlenecks in parallel query processing is the cost of shuffling data across nodes in a cluster. Ideally, given a distribution of the data across the nodes and a query, we want to execute the query by performing only local computation and no communication: in this case, the query is called parallel-correct with respect to the data distribution. Previous work studied this problem for Conjunctive Queries in the case where the distribution scheme is oblivious, i.e., the location of each tuple depends only on the tuple and is independent of the instance. In this work, we show that oblivious schemes have a fundamental theoretical limitation, and initiate the formal study of distribution schemes that are locality-aware. In particular, we focus on a class of distribution schemes called co-hash distribution schemes, which are widely used in parallel systems. In co-hash partitioning, some tables are initially hashed, and the remaining tables are co-located so that a join condition is always satisfied. Given a co-hash distribution scheme, we formally study the complexity of deciding various desirable properties, including obliviousness and redundancy. Then, for a given Conjunctive Query and co-hash scheme, we determine the computational complexity of deciding whether the query is parallel-correct. We also explore a stronger notion of correctness, called parallel disjoint correctness, which guarantees that the query result will be disjointly partitioned across nodes, i.e., there is no duplication of results.
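The co-hash idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's formal model: the two tables (`orders`, `lineitems`), their columns, and the use of `cust_id % num_nodes` as a stand-in hash function are all hypothetical. One table is hashed on an attribute, and the other is co-located so that the join on `order_id` can be evaluated locally. Because `lineitems` carries no `cust_id`, the node of a line item depends on which order exists in the instance, which is what makes the placement locality-aware rather than oblivious.

```python
def co_hash_partition(orders, lineitems, num_nodes):
    """Hash `orders` on cust_id, then co-locate each line item with its order.

    orders:    list of (order_id, cust_id), order_id assumed unique
    lineitems: list of (item_id, order_id)
    """
    nodes = [{"orders": [], "lineitems": []} for _ in range(num_nodes)]
    order_home = {}
    for order_id, cust_id in orders:
        node = cust_id % num_nodes  # deterministic stand-in for a hash function
        nodes[node]["orders"].append((order_id, cust_id))
        order_home[order_id] = node
    for item_id, order_id in lineitems:
        # Placement depends on where the joining order lives in this instance.
        node = order_home.get(order_id)
        if node is not None:  # in this sketch, dangling line items are not placed
            nodes[node]["lineitems"].append((item_id, order_id))
    return nodes


def local_join(nodes):
    """Join orders and line items on each node separately; no data is shuffled."""
    result = []
    for node in nodes:
        cust_of = dict(node["orders"])
        for item_id, order_id in node["lineitems"]:
            if order_id in cust_of:
                result.append((item_id, order_id, cust_of[order_id]))
    return sorted(result)


def global_join(orders, lineitems):
    """Reference answer computed over the whole, unpartitioned instance."""
    cust_of = dict(orders)
    return sorted((i, o, cust_of[o]) for i, o in lineitems if o in cust_of)
```

For this particular join, the union of the per-node results equals the answer over the whole instance, which is the informal meaning of parallel-correctness in the abstract. Each result tuple is also produced on exactly one node here, in the spirit of parallel disjoint correctness.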

Cite as

Bruhathi Sundarmurthy, Paraschos Koutris, and Jeffrey Naughton. Locality-Aware Distribution Schemes. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 22:1-22:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{sundarmurthy_et_al:LIPIcs.ICDT.2021.22,
  author =	{Sundarmurthy, Bruhathi and Koutris, Paraschos and Naughton, Jeffrey},
  title =	{{Locality-Aware Distribution Schemes}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{22:1--22:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.22},
  URN =		{urn:nbn:de:0030-drops-137302},
  doi =		{10.4230/LIPIcs.ICDT.2021.22},
  annote =	{Keywords: partitioning, parallel correctness, join queries}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2017.21

m-tables: Representing Missing Data

Authors: Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, Jeffrey Naughton, and Val Tannen

Published in: LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)

Abstract

Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values to a known number of missing tuples, or even an unknown number of missing tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
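The notions of certain and possible answers mentioned in the abstract can be sketched by enumerating possible worlds explicitly. This is only an illustration of the standard definitions, not of m-tables themselves: the function name, the example relation, and the explicit list of worlds are assumptions made for the example, whereas a representation system such as m-tables encodes the set of worlds compactly.

```python
def certain_and_possible(worlds, query):
    """Return (certain, possible) answers of `query` over a list of possible worlds.

    A tuple is certain if it is an answer in every world,
    and possible if it is an answer in at least one world.
    """
    answers = [set(query(world)) for world in worlds]
    if not answers:
        return set(), set()
    certain = set.intersection(*answers)
    possible = set.union(*answers)
    return certain, possible


# Hypothetical incomplete relation person(name, city): Bob's city is unknown,
# so each candidate value yields one possible world.
worlds = [
    {("alice", "NYC"), ("bob", "LA")},
    {("alice", "NYC"), ("bob", "SF")},
]

def names(world):
    return {name for name, _ in world}

def cities(world):
    return {city for _, city in world}
```

With these worlds, `certain_and_possible(worlds, names)` reports both names as certain, while `certain_and_possible(worlds, cities)` reports only `"NYC"` as certain and all three cities as possible.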

Cite as

Bruhathi Sundarmurthy, Paraschos Koutris, Willis Lang, Jeffrey Naughton, and Val Tannen. m-tables: Representing Missing Data. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{sundarmurthy_et_al:LIPIcs.ICDT.2017.21,
  author =	{Sundarmurthy, Bruhathi and Koutris, Paraschos and Lang, Willis and Naughton, Jeffrey and Tannen, Val},
  title =	{{m-tables: Representing Missing Data}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Benedikt, Michael and Orsi, Giorgio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.21},
  URN =		{urn:nbn:de:0030-drops-70618},
  doi =		{10.4230/LIPIcs.ICDT.2017.21},
  annote =	{Keywords: missing values, incomplete data, c tables, representation systems}
}
