Document

**Published in:** LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Recent beyond worst-case optimal join algorithms Minesweeper and its generalization Tetris have brought the theory of indexing and join processing together by developing a geometric framework for joins. These algorithms take as input an index ℬ, referred to as a box cover, that stores output gaps that can be inferred from traditional indexes, such as B+ trees or tries, on the input relations. The performances of these algorithms highly depend on the certificate of ℬ, which is the smallest subset of gaps in ℬ whose union covers all of the gaps in the output space of a query Q. Different box covers can have different size certificates and the sizes of both the box covers and certificates highly depend on the ordering of the domain values of the attributes in Q. We study how to generate box covers that contain small size certificates to guarantee efficient runtimes for these algorithms. First, given a query Q over a set of relations of size N and a fixed set of domain orderings for the attributes, we give a Õ(N)-time algorithm called GAMB which generates a box cover for Q that is guaranteed to contain the smallest size certificate across any box cover for Q. Second, we show that finding a domain ordering to minimize the box cover size and certificate is NP-hard through a reduction from the 2 consecutive block minimization problem on boolean matrices. Our third contribution is a Õ(N)-time approximation algorithm called ADORA to compute domain orderings, under which one can compute a box cover of size Õ(K^r), where K is the minimum box cover for Q under any domain ordering and r is the maximum arity of any relation. This guarantees certificates of size Õ(K^r). We combine ADORA and GAMB with Tetris to form a new algorithm we call TetrisReordered, which provides several new beyond worst-case bounds. On infinite families of queries, TetrisReordered’s runtimes are unboundedly better than the bounds stated in prior work.

Kaleb Alway, Eric Blais, and Semih Salihoglu. Box Covers and Domain Orderings for Beyond Worst-Case Join Processing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 3:1-3:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{alway_et_al:LIPIcs.ICDT.2021.3, author = {Alway, Kaleb and Blais, Eric and Salihoglu, Semih}, title = {{Box Covers and Domain Orderings for Beyond Worst-Case Join Processing}}, booktitle = {24th International Conference on Database Theory (ICDT 2021)}, pages = {3:1--3:23}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-179-5}, ISSN = {1868-8969}, year = {2021}, volume = {186}, editor = {Yi, Ke and Wei, Zhewei}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.3}, URN = {urn:nbn:de:0030-drops-137114}, doi = {10.4230/LIPIcs.ICDT.2021.3}, annote = {Keywords: Beyond worst-case join algorithms, Tetris, Box covers, Domain orderings} }

Document

**Published in:** LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)

Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for several rounds for the problem of computing the equijoin of n relations. Given any query Q with width w, intersection width iw, input size IN, output size OUT, and a cluster of machines with M=\Omega(IN \frac{1}{\epsilon}) memory available per machine, where \epsilon > 1 and w \ge 1 are constants, we show that:
1. Q can be computed in O(n) rounds with O(n(INw + OUT)2/M) communication cost with high probability.
Q can be computed in O(log(n)) rounds with O(n(INmax(w, 3iw) + OUT)2/M) communication cost with high probability.
Intersection width is a new notion we introduce for queries and generalized hypertree decompositions (GHDs) of queries that captures how connected the adjacent components of the GHDs are.
We achieve our first result by introducing a distributed and generalized version of Yannakakis's algorithm, called GYM. GYM takes as input any GHD of Q with width w and depth d, and computes Q in O(d + log(n)) rounds and O(n (INw + OUT)2/M) communication cost. We achieve our second result by showing how to construct GHDs of Q with width max(w, 3iw) and depth O(log(n)). We describe another technique to construct GHDs with longer widths and lower depths, demonstrating other tradeoffs one can make between communication and the number of rounds.

Foto N. Afrati, Manas R. Joglekar, Christopher M. Re, Semih Salihoglu, and Jeffrey D. Ullman. GYM: A Multiround Distributed Join Algorithm. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{afrati_et_al:LIPIcs.ICDT.2017.4, author = {Afrati, Foto N. and Joglekar, Manas R. and Re, Christopher M. and Salihoglu, Semih and Ullman, Jeffrey D.}, title = {{GYM: A Multiround Distributed Join Algorithm}}, booktitle = {20th International Conference on Database Theory (ICDT 2017)}, pages = {4:1--4:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-024-8}, ISSN = {1868-8969}, year = {2017}, volume = {68}, editor = {Benedikt, Michael and Orsi, Giorgio}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.4}, URN = {urn:nbn:de:0030-drops-70462}, doi = {10.4230/LIPIcs.ICDT.2017.4}, annote = {Keywords: Joins, Yannakakis, Bulk Synchronous Processing, GHDs} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail