Box Covers and Domain Orderings for Beyond Worst-Case Join Processing

Authors Kaleb Alway, Eric Blais, Semih Salihoglu



Author Details

Kaleb Alway
  • University of Waterloo, Canada
Eric Blais
  • University of Waterloo, Canada
Semih Salihoglu
  • University of Waterloo, Canada

Cite As

Kaleb Alway, Eric Blais, and Semih Salihoglu. Box Covers and Domain Orderings for Beyond Worst-Case Join Processing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 3:1-3:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICDT.2021.3

Abstract

The recent beyond worst-case optimal join algorithms Minesweeper and its generalization Tetris have brought the theory of indexing and join processing together by developing a geometric framework for joins. These algorithms take as input an index ℬ, referred to as a box cover, that stores output gaps that can be inferred from traditional indexes, such as B+ trees or tries, on the input relations. The performance of these algorithms depends heavily on the certificate of ℬ, which is the smallest subset of gaps in ℬ whose union covers all of the gaps in the output space of a query Q. Different box covers can have certificates of different sizes, and the sizes of both the box covers and their certificates depend heavily on the ordering of the domain values of the attributes in Q. We study how to generate box covers that contain small certificates, in order to guarantee efficient runtimes for these algorithms. First, given a query Q over a set of relations of size N and a fixed set of domain orderings for the attributes, we give an Õ(N)-time algorithm called GAMB that generates a box cover for Q guaranteed to contain the smallest certificate across all box covers for Q. Second, we show that finding a domain ordering that minimizes the box cover size and the certificate size is NP-hard, through a reduction from the 2 consecutive block minimization problem on boolean matrices. Our third contribution is an Õ(N)-time approximation algorithm called ADORA that computes domain orderings under which one can compute a box cover of size Õ(K^r), where K is the minimum box cover size for Q under any domain ordering and r is the maximum arity of any relation. This guarantees certificates of size Õ(K^r). We combine ADORA and GAMB with Tetris to form a new algorithm we call TetrisReordered, which provides several new beyond worst-case bounds. On infinite families of queries, TetrisReordered’s runtimes are unboundedly better than the bounds stated in prior work.
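
To make the dependence on domain orderings concrete, the following is a minimal sketch of how gap boxes might be read off a trie-style index on one binary relation, and how the number of boxes changes when the domain values are reordered. It is not GAMB or ADORA. The function names (`gap_intervals`, `trie_box_cover`, `covers_exactly_gaps`), the toy relation `R`, and the representation of a box as a pair of closed position intervals are all assumptions made for illustration.

```python
from itertools import product


def gap_intervals(present, domain_size):
    """Maximal runs (lo, hi) of positions in 0..domain_size-1 that are absent."""
    gaps, start = [], 0
    for p in sorted(set(present)):
        if p > start:
            gaps.append((start, p - 1))
        start = p + 1
    if start <= domain_size - 1:
        gaps.append((start, domain_size - 1))
    return gaps


def trie_box_cover(relation, order_a, order_b):
    """Gap boxes inferable from a trie on (A, B).

    order_a and order_b list the domain values in the chosen order;
    a box is ((a_lo, a_hi), (b_lo, b_hi)) in positions under those orders.
    """
    pos_a = {v: i for i, v in enumerate(order_a)}
    pos_b = {v: i for i, v in enumerate(order_b)}
    by_a = {}
    for a, b in relation:
        by_a.setdefault(pos_a[a], []).append(pos_b[b])
    full_b = (0, len(order_b) - 1)
    # Gaps on the first trie level span the whole B domain.
    boxes = [(ga, full_b) for ga in gap_intervals(by_a.keys(), len(order_a))]
    # Gaps on the second level are confined to a single A value.
    for a, bs in by_a.items():
        boxes += [((a, a), gb) for gb in gap_intervals(bs, len(order_b))]
    return boxes


def covers_exactly_gaps(boxes, relation, order_a, order_b):
    """Check that the boxes cover every non-tuple and no tuple of the relation."""
    covered = set()
    for (a_lo, a_hi), (b_lo, b_hi) in boxes:
        for i, j in product(range(a_lo, a_hi + 1), range(b_lo, b_hi + 1)):
            covered.add((order_a[i], order_b[j]))
    return covered == set(product(order_a, order_b)) - set(relation)


# Toy relation over the domain {0, 1, 2, 3} (hypothetical data).
R = [(0, 0), (0, 2), (2, 0), (2, 2)]
natural = [0, 1, 2, 3]
reordered = [0, 2, 1, 3]  # places the values that occur in R next to each other

print(len(trie_box_cover(R, natural, natural)))      # 6
print(len(trie_box_cover(R, reordered, reordered)))  # 3
```

Under the natural ordering the values that occur in R are interleaved with missing ones, so each trie level produces two separate gaps; after reordering, the missing values are contiguous and each level produces a single gap. Both box covers describe the same set of non-tuples, which is what `covers_exactly_gaps` checks.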

Subject Classification

ACM Subject Classification
  • Information systems → Database query processing
  • Theory of computation → Database query processing and optimization (theory)
Keywords
  • Beyond worst-case join algorithms
  • Tetris
  • Box covers
  • Domain orderings
