Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that, when extended with constraints for regular languages, can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.
@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
  author    = {Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
  title     = {{FC-Datalog as a Framework for Efficient String Querying}},
  booktitle = {28th International Conference on Database Theory (ICDT 2025)},
  pages     = {29:1--29:18},
  series    = {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN      = {978-3-95977-364-5},
  ISSN      = {1868-8969},
  year      = {2025},
  volume    = {328},
  editor    = {Roy, Sudeepa and Kara, Ahmet},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address   = {Dagstuhl, Germany},
  URL       = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
  URN       = {urn:nbn:de:0030-drops-229708},
  doi       = {10.4230/LIPIcs.ICDT.2025.29},
  annote    = {Keywords: Information extraction, word equations, datalog, document spanners, regex}
}