FC-Datalog as a Framework for Efficient String Querying

Authors: Owen M. Bell, Joel D. Day, Dominik D. Freydenberger



Author Details

Owen M. Bell
  • Loughborough University, UK
Joel D. Day
  • Loughborough University, UK
Dominik D. Freydenberger
  • Loughborough University, UK

Acknowledgements

The authors would like to thank the anonymous reviewers of the current and previous versions for their detailed and helpful feedback.

Cite As

Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger. FC-Datalog as a Framework for Efficient String Querying. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.29

Abstract

Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that, when extended with constraints for regular languages, can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization.
We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.
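
As a rough illustration of what recursion over word equations can look like, the following Python sketch naively evaluates two Datalog-style rules whose variables range over the factors (contiguous substrings) of an input word. The rules, the relation name R, and the function names are hypothetical choices made for this example. They do not reproduce the paper's formal syntax, its fragments, or its evaluation algorithms, and the naive fixpoint loop makes no attempt at the efficiency the paper is concerned with.

```python
def factors(w):
    """Return all factors (contiguous substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def anbn_factors(w):
    """Naive least-fixpoint evaluation of two illustrative rules:

        R(x) <- x = ""                 (base case)
        R(x) <- x = "a" y "b", R(y)    (recursive case)

    Both x and y range over the factors of w (an assumption of this sketch).
    """
    universe = factors(w)
    derived = set()
    changed = True
    while changed:
        changed = False
        for x in universe:
            if x in derived:
                continue
            # x is derivable if it is empty, or if it wraps an already
            # derived factor y as "a" + y + "b".
            if x == "" or any(x == "a" + y + "b" for y in derived):
                derived.add(x)
                changed = True
    return derived
```

For the input word "aabbab", this derives exactly the factors of the form a^n b^n, namely the empty word, "ab", and "aabb".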

Subject Classification

ACM Subject Classification
  • Theory of computation → Logic and databases
Keywords
  • Information extraction
  • word equations
  • datalog
  • document spanners
  • regex
