Box Covers and Domain Orderings for Beyond Worst-Case Join Processing
The recent beyond worst-case optimal join algorithms Minesweeper and its generalization Tetris have brought the theory of indexing and join processing together by developing a geometric framework for joins. These algorithms take as input an index ℬ, referred to as a box cover, that stores output gaps that can be inferred from traditional indexes, such as B+ trees or tries, on the input relations. The performance of these algorithms depends heavily on the certificate of ℬ, which is the smallest subset of gaps in ℬ whose union covers all of the gaps in the output space of a query Q. Different box covers can have certificates of different sizes, and the sizes of both the box covers and the certificates depend heavily on the ordering of the domain values of the attributes in Q. We study how to generate box covers that contain small certificates, in order to guarantee efficient runtimes for these algorithms. First, given a query Q over a set of relations of size N and a fixed set of domain orderings for the attributes, we give an Õ(N)-time algorithm called GAMB, which generates a box cover for Q that is guaranteed to contain the smallest certificate among all box covers for Q. Second, we show that finding a domain ordering that minimizes the box cover size and the certificate size is NP-hard, through a reduction from the 2 consecutive block minimization problem on boolean matrices. Our third contribution is an Õ(N)-time approximation algorithm called ADORA that computes domain orderings under which one can compute a box cover of size Õ(K^r), where K is the minimum box cover size for Q under any domain ordering and r is the maximum arity of any relation. This guarantees certificates of size Õ(K^r). We combine ADORA and GAMB with Tetris to form a new algorithm we call TetrisReordered, which provides several new beyond worst-case bounds. On infinite families of queries, TetrisReordered’s runtimes are unboundedly better than the bounds stated in prior work.
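As a rough illustration of why orderings matter, the sketch below counts maximal runs of consecutive 1s in the rows of a small boolean matrix under different column orders, which is the quantity that consecutive block minimization asks to minimize. It is not GAMB, ADORA, or the paper's reduction; the function names `count_blocks` and `best_order_bruteforce` and the example matrix are hypothetical, and the exhaustive search is only feasible for tiny inputs.

```python
from itertools import permutations

def count_blocks(matrix, col_order):
    """Total number of maximal runs of consecutive 1s over all rows,
    after permuting the columns according to col_order."""
    total = 0
    for row in matrix:
        prev = 0
        for c in col_order:
            if row[c] == 1 and prev == 0:
                total += 1  # a new run of 1s starts at this column
            prev = row[c]
    return total

def best_order_bruteforce(matrix):
    """Try every column order and keep one with the fewest blocks.
    Exponential in the number of columns; for illustration only."""
    n_cols = len(matrix[0])
    return min(permutations(range(n_cols)),
               key=lambda order: count_blocks(matrix, order))

# Hypothetical 3x4 boolean matrix used only for this example.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
identity = (0, 1, 2, 3)
best = best_order_bruteforce(matrix)

print(count_blocks(matrix, identity))  # 6 blocks under the original order
print(count_blocks(matrix, best))      # 3 blocks under the best order found
```

Under the original column order every row splits into two runs, while placing columns 0 and 2 next to each other leaves one run per row. The abstract's hardness result concerns exactly this kind of ordering choice, which is why an exhaustive search like the one above cannot be expected to scale.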
Beyond worst-case join algorithms
Tetris
Box covers
Domain orderings
Information systems~Database query processing
Theory of computation~Database query processing and optimization (theory)
3:1-3:23
Regular Paper
https://arxiv.org/abs/1909.12102
Kaleb Alway
University of Waterloo, Canada
Eric Blais
University of Waterloo, Canada
Semih Salihoglu
University of Waterloo, Canada
10.4230/LIPIcs.ICDT.2021.3
Kaleb Alway, Eric Blais, and Semih Salihoglu
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode