A Formal Language Perspective on Factorized Representations

Authors: Benny Kimelfeld, Wim Martens, Matthias Niewerth




Author Details

Benny Kimelfeld
  • Technion, Haifa, Israel
Wim Martens
  • University of Bayreuth, Germany
Matthias Niewerth
  • University of Bayreuth, Germany

Cite As

Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A Formal Language Perspective on Factorized Representations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.20

Abstract

Factorized representations (FRs) are a well-known tool to succinctly represent results of join queries and were originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization, of which we offer a selection in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of path multiset representations, a formalism that was recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.
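
To make the second connection concrete, here is a minimal Python sketch of how a context-free grammar whose words all have the same length can be read as a factorized relation: alternatives act as unions and concatenation acts as a Cartesian product. The grammar encoding, the names GRAMMAR, tuples and grammar_size, and the size measure are assumptions made for illustration only; they are not the paper's formal definitions.

```python
from itertools import product

# Hypothetical toy grammar (an assumption, not taken from the paper).
# Each nonterminal maps to a list of productions; any symbol that is
# not a key of the dict is treated as a terminal, i.e. a data value.
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["a1"], ["a2"]],
    "B": [["b1"], ["b2"], ["b3"]],
}

def tuples(symbol, grammar):
    """Return the set of tuples (words) derivable from `symbol`.

    Assumes the grammar is non-recursive, so the recursion terminates.
    """
    if symbol not in grammar:
        return {(symbol,)}  # a terminal yields a single 1-tuple
    result = set()
    for production in grammar[symbol]:  # alternatives: union
        parts = [tuples(s, grammar) for s in production]
        for combo in product(*parts):  # concatenation: product
            result.add(tuple(v for part in combo for v in part))
    return result

def grammar_size(grammar):
    """Count symbol occurrences on the right-hand sides."""
    return sum(len(p) for prods in grammar.values() for p in prods)

relation = tuples("S", GRAMMAR)
flat_size = sum(len(t) for t in relation)  # values in the flat relation

print(sorted(relation))
print("grammar size:", grammar_size(GRAMMAR), "flat size:", flat_size)
```

In this toy case the grammar stores 7 symbols while the flat relation stores 12 values; the gap widens as more factors are multiplied together, which is the kind of succinctness that factorized representations exploit.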

Subject Classification

ACM Subject Classification
  • Information systems → Data management systems
Keywords
  • Databases
  • relational databases
  • graph databases
  • factorized databases
  • regular path queries
  • compact representations

