Compact Data Structures Meet Databases (Invited Talk)

Author Gonzalo Navarro



File

LIPIcs.ICDT.2023.2.pdf
  • Filesize: 0.66 MB
  • 16 pages

Author Details

Gonzalo Navarro
  • Millennium Institute for Foundational Research on Data (IMFD), Santiago, Chile
  • Department of Computer Science, University of Chile, Santiago, Chile

Cite As

Gonzalo Navarro. Compact Data Structures Meet Databases (Invited Talk). In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 2:1-2:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICDT.2023.2

Abstract

We describe two success stories on the application of compact data structures (cds) to solve the problem of the excessively redundant space requirements posed by worst-case-optimal (wco) algorithms for multijoins in databases, and particularly basic graph patterns on graph databases. The aim of cds is to represent the data and additional data structures on it, using total space close to that of the plain (and, sometimes, compressed) data, while efficiently simulating the data structure operations. Cds turn out to be a perfect approach for the described problem: We designed and implemented cds that effectively use space close to that of the plain or compressed data, which is orders of magnitude less than existing systems, while retaining worst-case optimality and performing competitively with those systems in query time, sometimes being even considerably faster.

Subject Classification

ACM Subject Classification
  • Information systems → Data structures
  • Theory of computation → Data structures design and analysis

Keywords
  • succinct data structures
  • tries
  • multidimensional grids
  • text searching
