Optimal Joins Using Compact Data Structures

Authors Gonzalo Navarro, Juan L. Reutter, Javiel Rojas-Ledesma




File

LIPIcs.ICDT.2020.21.pdf
  • Filesize: 1.18 MB
  • 21 pages

Author Details

Gonzalo Navarro
  • University of Chile, Santiago, Chile
  • IMFD, Santiago, Chile
Juan L. Reutter
  • Pontificia Universidad Católica de Chile, Santiago, Chile
  • IMFD, Santiago, Chile
Javiel Rojas-Ledesma
  • University of Chile, Santiago, Chile
  • IMFD, Santiago, Chile

Cite As

Gonzalo Navarro, Juan L. Reutter, and Javiel Rojas-Ledesma. Optimal Joins Using Compact Data Structures. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 21:1-21:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.ICDT.2020.21

Abstract

Worst-case optimal join algorithms have gained a lot of attention in the database literature. There are now several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, implementing these algorithms often requires an enhanced indexing structure: to achieve optimality, we must either build completely new indexes or populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending extra storage space that may be non-negligible.
We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need for extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal.
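
To make the idea of relations as point sets in a grid concrete, the following is a minimal sketch and not the construction from the paper. It assumes a binary relation whose tuples are points in a 2^k × 2^k grid and stores it in a plain pointer-based quadtree, which is neither compact nor a qdag. It covers only the simplest special case, in which two relations over the same two attributes are joined, so the join reduces to intersecting their point sets. The names build, intersect and points_of are hypothetical and chosen only for illustration.

```python
def build(points, size, x0=0, y0=0):
    """Quadtree of the points inside the size x size square whose corner is (x0, y0).

    A node is None (empty quadrant), True (one occupied cell),
    or a 4-tuple of child quadrants. size must be a power of two.
    """
    inside = {(x, y) for (x, y) in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size}
    if not inside:
        return None
    if size == 1:
        return True
    half = size // 2
    return tuple(build(inside, half, x0 + dx, y0 + dy)
                 for dy in (0, half) for dx in (0, half))


def intersect(a, b):
    """Intersect two quadtrees built over the same grid size."""
    if a is None or b is None:
        return None          # an empty quadrant prunes the whole subtree
    if a is True:
        return True          # both nodes are occupied cells at the last level
    children = tuple(intersect(ca, cb) for ca, cb in zip(a, b))
    # Collapse to None when no child survived, so empty regions stay pruned.
    return children if any(c is not None for c in children) else None


def points_of(node, size, x0=0, y0=0):
    """Recover the set of points stored in a quadtree."""
    if node is None:
        return set()
    if node is True:
        return {(x0, y0)}
    half = size // 2
    offsets = [(dx, dy) for dy in (0, half) for dx in (0, half)]
    result = set()
    for child, (dx, dy) in zip(node, offsets):
        result |= points_of(child, half, x0 + dx, y0 + dy)
    return result


# Two relations over the same attributes (A, B), with values in [0, 4).
R = {(0, 1), (2, 3), (3, 3)}
S = {(0, 0), (2, 3), (3, 3)}
joined = points_of(intersect(build(R, 4), build(S, 4)), 4)
print(sorted(joined))
```

In this sketch the traversal abandons any quadrant that is empty in either input, which is the intuition behind using the quadtree itself as the index. Joins over different attribute sets and the sharing of subtrees are not modelled here.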

Subject Classification

ACM Subject Classification
  • Theory of computation → Database query processing and optimization (theory)
  • Theory of computation → Data structures and algorithms for data management
Keywords
  • Join algorithms
  • Compact data structures
  • Quadtrees
  • AGM bound
