Splitting Spanner Atoms: A Tool for Acyclic Core Spanners

Authors Dominik D. Freydenberger , Sam M. Thompson



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2022.10.pdf
  • Filesize: 0.74 MB
  • 18 pages

Document Identifiers

Author Details

Dominik D. Freydenberger
  • Loughborough University, UK
Sam M. Thompson
  • Loughborough University, UK

Acknowledgements

The authors would like to thank Justin Brackemann, and the anonymous reviewers for all their helpful comments and suggestions.

Cite AsGet BibTex

Dominik D. Freydenberger and Sam M. Thompson. Splitting Spanner Atoms: A Tool for Acyclic Core Spanners. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ICDT.2022.10

Abstract

This paper investigates regex CQs with string equalities (SERCQs), a subclass of core spanners. As shown by Freydenberger, Kimelfeld, and Peterfreund (PODS 2018), these queries are intractable, even if restricted to acyclic queries. This previous result defines acyclicity by treating regex formulas as atoms. In contrast to this, we propose an alternative definition by converting SERCQs into FC-CQs - conjunctive queries in FC, a logic that is based on word equations. We introduce a way to decompose word equations of unbounded arity into a conjunction of binary word equations. If the result of the decomposition is acyclic, then evaluation and enumeration of results become tractable. The main result of this work is an algorithm that decides in polynomial time whether an FC-CQ can be decomposed into an acyclic FC-CQ. We also give an efficient conversion from synchronized SERCQs to FC-CQs with regular constraints. As a consequence, tractability results for acyclic relational CQs directly translate to a large class of SERCQs.

Subject Classification

ACM Subject Classification
  • Theory of computation → Complexity theory and logic
Keywords
  • Document spanners
  • information extraction
  • conjunctive queries

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of databases, volume 8. Addison-Wesley Reading, 1995. Google Scholar
  2. Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. Constant-delay enumeration for nondeterministic document spanners. ACM SIGMOD Record, 49(1):25-32, 2020. Google Scholar
  3. Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. On acyclic conjunctive queries and constant delay enumeration. In Proceedings of CSL 2007, pages 208-222, 2007. Google Scholar
  4. Joachim Bremer and Dominik D. Freydenberger. Inclusion problems for patterns with a bounded number of variables. Information and Computation, 220:15-43, 2012. Google Scholar
  5. Stefan Burkhardt, Juha Kärkkäinen, and Peter Sanders. Linear work suffix array construction. Journal of the ACM, 53(6):918-936, 2006. Google Scholar
  6. Andrzej Ehrenfreucht and Grzegorz Rozenberg. Finding a homomorphism between two words is NP-complete. Information Processing Letters, 9(2):86-88, 1979. Google Scholar
  7. Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document spanners: A formal approach to information extraction. Journal of the ACM, 62(2):12, 2015. Google Scholar
  8. Henning Fernau, Markus L Schmid, and Yngve Villanger. On the parameterised complexity of string morphism problems. Theory of Computing Systems, 59:24-51, 2016. Google Scholar
  9. Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren, and Domagoj Vrgoc. Constant delay algorithms for regular document spanners. In Proceedings of PODS 2018, pages 165-177, 2018. Google Scholar
  10. Dominik D. Freydenberger. A logic for document spanners. Theory of Computing Systems, 63(7):1679-1754, 2019. Google Scholar
  11. Dominik D. Freydenberger and Mario Holldack. Document spanners: From expressive power to decision problems. Theory of Computing Systems, 62(4):854-898, 2018. Google Scholar
  12. Dominik D. Freydenberger, Benny Kimelfeld, Markus Kröll, and Liat Peterfreund. Complexity bounds for relational algebra over document spanners. In Proceedings of PODS 2019, pages 320-334, 2019. Google Scholar
  13. Dominik D. Freydenberger, Benny Kimelfeld, and Liat Peterfreund. Joining extractions of regular expressions. In Proceedings of PODS 2018, pages 137-149, 2018. Google Scholar
  14. Dominik D. Freydenberger and Liat Peterfreund. The theory of concatenation over finite models. In Proceedings of ICALP 2021, pages 130:1-130:17, 2021. Google Scholar
  15. Georg Gottlob, Nicola Leone, and Francesco Scarcello. The complexity of acyclic conjunctive queries. Journal of the ACM, 48(3):431-498, 2001. Google Scholar
  16. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. Google Scholar
  17. Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups-Complexity-Cryptology, 4(2):241-299, 2012. Google Scholar
  18. Gloria Olive. Catalan numbers revisited. Journal of mathematical analysis and applications, 111(1):201-235, 1985. Google Scholar
  19. Liat Peterfreund. Grammars for document spanners. In Proceedings of ICDT 2021, pages 7:1-7:18, 2021. Google Scholar
  20. Steven David Prestwich. CNF encodings. Handbook of satisfiability, 185:75-97, 2009. Google Scholar
  21. Markus L. Schmid and Nicole Schweikardt. A purely regular approach to non-regular core spanners. In Proceedings of ICDT 2021, pages 4:1-4:19, 2021. Google Scholar
  22. Mihalis Yannakakis. Algorithms for acyclic database schemes. In Proceedings of VLDB 1981, pages 82-94, 1981. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail