Dynamic Direct Access of MSO Query Evaluation over Strings

Authors: Pierre Bourhis, Florent Capelli, Stefan Mengel, Cristian Riveros




Author Details

Pierre Bourhis
  • Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
Florent Capelli
  • Univ. Artois, CNRS, UMR 8188, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
Stefan Mengel
  • Univ. Artois, CNRS, UMR 8188, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
Cristian Riveros
  • Pontificia Universidad Católica de Chile, Santiago, Chile
  • Millennium Institute for Foundational Research on Data, Santiago, Chile

Acknowledgements

This work benefited from Dagstuhl Seminar 24032 (https://www.dagstuhl.de/seminars/seminar-calendar/seminar-details/24032), Representation, Provenance, and Explanations in Database Theory and Logic.

Cite As

Pierre Bourhis, Florent Capelli, Stefan Mengel, and Cristian Riveros. Dynamic Direct Access of MSO Query Evaluation over Strings. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 26:1-26:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.26

Abstract

We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given a string s and an MSO query with first-order free variables represented by an unambiguous variable-set automaton 𝒜 with state set Q and variables X, computes a data structure in time 𝒪(|Q|^ω ⋅ |X|² ⋅ |s|) and then, given an index i, uses the data structure to retrieve the i-th output of the evaluation of 𝒜 over s in time 𝒪(|Q|^ω ⋅ |X|³ ⋅ log(|s|)²), where ω is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data.
Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g., efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries with a novel data structure for dynamic word problems over semigroups, which yields an overall algorithm that is elegant and easy to formulate.
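
To make the combination of matrices and dynamic word problems more concrete, here is a minimal Python sketch. It is not the data structure of the paper: it uses a plain segment tree (the classical approach to dynamic word problems), supports only single-letter replacement rather than deletion, splitting, or cut-and-paste, and handles a single hypothetical toy query with one variable x, "x is a position carrying the letter a". The names ProductTree, LETTER_MATRIX, and ith_marked_position are assumptions made for this illustration. Each letter is mapped to a 2×2 matrix over the states "x not yet placed" (0) and "x placed" (1), so entry [0][1] of the product over a substring counts the outputs inside that substring, and these counts guide the descent to the i-th output.

```python
from typing import Dict, List

Matrix = List[List[int]]

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square integer matrices of equal dimension."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n: int) -> Matrix:
    return [[int(i == j) for j in range(n)] for i in range(n)]

# Hypothetical toy automaton: state 0 = x not yet placed, state 1 = x placed.
# On 'a' the run may stay in 0, move 0 -> 1 (placing x here), or stay in 1.
# On 'b' the run can only stay in its current state.
LETTER_MATRIX: Dict[str, Matrix] = {
    "a": [[1, 1], [0, 1]],
    "b": [[1, 0], [0, 1]],
}

class ProductTree:
    """Segment tree whose nodes store the matrix product of their substring."""

    def __init__(self, text: str, letter_matrix: Dict[str, Matrix]):
        self.letter_matrix = letter_matrix
        dim = len(next(iter(letter_matrix.values())))
        self.size = 1
        while self.size < max(1, len(text)):
            self.size *= 2
        # Leaves beyond the end of the text hold the identity (neutral element).
        self.node = [identity(dim) for _ in range(2 * self.size)]
        for pos, ch in enumerate(text):
            self.node[self.size + pos] = letter_matrix[ch]
        for v in range(self.size - 1, 0, -1):
            self.node[v] = mat_mul(self.node[2 * v], self.node[2 * v + 1])

    def update(self, pos: int, ch: str) -> None:
        """Replace the letter at pos and recompute the products on the path to the root."""
        v = self.size + pos
        self.node[v] = self.letter_matrix[ch]
        v //= 2
        while v >= 1:
            self.node[v] = mat_mul(self.node[2 * v], self.node[2 * v + 1])
            v //= 2

    def product(self) -> Matrix:
        return self.node[1]

def ith_marked_position(tree: ProductTree, i: int) -> int:
    """Return the i-th output (0-based, increasing positions) of the toy query."""
    if not 0 <= i < tree.product()[0][1]:
        raise IndexError("output index out of range")
    v = 1
    while v < tree.size:
        left_count = tree.node[2 * v][0][1]  # outputs inside the left half
        if i < left_count:
            v = 2 * v
        else:
            i -= left_count
            v = 2 * v + 1
    return v - tree.size

tree = ProductTree("abab", LETTER_MATRIX)
print(tree.product()[0][1], ith_marked_position(tree, 1))  # prints "2 2"
tree.update(1, "a")  # the string is now "aaab"
print(tree.product()[0][1], ith_marked_position(tree, 1))  # prints "3 1"
```

In this sketch both an update and an access touch one root-to-leaf path, so they cost a logarithmic number of matrix operations. The descent in ith_marked_position relies on the shape of the toy matrices; the general case with several variables and a user-chosen variable order is what the paper addresses.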

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • Query evaluation
  • direct access
  • MSO queries

