Owen M. Bell, Joel D. Day, Dominik D. Freydenberger
Creative Commons Attribution 4.0 International license
Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that, when extended with constraints for regular languages, can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.
@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
author = {Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
title = {{FC-Datalog as a Framework for Efficient String Querying}},
booktitle = {28th International Conference on Database Theory (ICDT 2025)},
pages = {29:1--29:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-364-5},
ISSN = {1868-8969},
year = {2025},
volume = {328},
editor = {Roy, Sudeepa and Kara, Ahmet},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
URN = {urn:nbn:de:0030-drops-229708},
doi = {10.4230/LIPIcs.ICDT.2025.29},
annote = {Keywords: Information extraction, word equations, datalog, document spanners, regex}
}