Dynamic Complexity of Document Spanners

Authors Dominik D. Freydenberger, Sam M. Thompson



Author Details

Dominik D. Freydenberger
  • Loughborough University, Loughborough, United Kingdom
Sam M. Thompson
  • Loughborough University, Loughborough, United Kingdom

Acknowledgements

The authors would like to thank the anonymous reviewers as well as Thomas Schwentick for their helpful comments and suggestions. The authors would also like to thank Thomas Zeume for clarifying a result from his thesis.

Cite As

Dominik D. Freydenberger and Sam M. Thompson. Dynamic Complexity of Document Spanners. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ICDT.2020.11

Abstract

The present paper investigates the dynamic complexity of document spanners, a formal framework for information extraction introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (JACM 2015). We first look at the class of regular spanners and prove that any regular spanner can be maintained in the dynamic complexity class DynPROP. This result follows from work done previously on the dynamic complexity of formal languages by Gelade, Marquardt, and Schwentick (TOCL 2012). To investigate core spanners we use SpLog, a concatenation logic that exactly captures core spanners. We show that the dynamic complexity class DynCQ is more expressive than SpLog and therefore can maintain any core spanner. This result is then extended to show that DynFO can maintain any generalized core spanner and that DynFO is more powerful than SpLog with negation.
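
To make the notion of a spanner concrete, the following is a minimal sketch of the static semantics of a very simple regular spanner: given a document and a regular expression `gamma`, it returns every span [i, j⟩ (1-based, end-exclusive, following the convention of Fagin et al.) whose content matches `gamma`. The function name `extract_spans` and the brute-force enumeration are assumptions made for illustration only; this is not the construction used in the paper. It recomputes the result from scratch, whereas dynamic complexity asks how such a result can be maintained under updates to the document.

```python
import re


def extract_spans(document: str, gamma: str) -> set[tuple[int, int]]:
    """Return all spans [i, j> of `document` whose content fully matches `gamma`.

    Spans are 1-based and end-exclusive, so the span (i, j) covers
    document[i - 1 : j - 1] and 1 <= i <= j <= len(document) + 1.
    Hypothetical helper for illustration; it runs in brute-force fashion.
    """
    pattern = re.compile(gamma)
    n = len(document)
    spans = set()
    for i in range(1, n + 2):
        for j in range(i, n + 2):
            # Keep the span only if the whole substring matches gamma.
            if pattern.fullmatch(document[i - 1 : j - 1]):
                spans.add((i, j))
    return spans


if __name__ == "__main__":
    # All spans of "aab" that consist of one or more 'a' symbols.
    print(sorted(extract_spans("aab", "a+")))
```

Unlike `re.finditer`, which reports only non-overlapping matches, this sketch returns every matching span, which is closer to the relational view of spanners as functions from documents to sets of span tuples.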

Subject Classification

ACM Subject Classification
  • Theory of computation → Complexity theory and logic
  • Information systems → Information extraction
Keywords
  • Document spanners
  • information extraction
  • dynamic complexity
  • descriptive complexity
  • word equations

References

  1. Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. Constant-Delay Enumeration for Nondeterministic Document Spanners. In Proceedings of ICDT 2019, pages 22:1-22:19, 2019.
  2. Johannes Doleschal, Benny Kimelfeld, Wim Martens, Yoav Nahshon, and Frank Neven. Split-Correctness in Information Extraction. In Proceedings of PODS 2019, pages 149-163, 2019.
  3. Guozhu Dong, Jianwen Su, and Rodney Topor. Nonrecursive incremental evaluation of datalog queries. Annals of Mathematics and Artificial Intelligence, 14(2-4):187-223, 1995.
  4. Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document Spanners: A Formal Approach to Information Extraction. Journal of the ACM, 62(2):12, 2015.
  5. Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren, and Domagoj Vrgoč. Constant Delay Algorithms for Regular Document Spanners. In Proceedings of PODS 2018, pages 165-177, 2018.
  6. Dominik D. Freydenberger. A Logic for Document Spanners. Theory of Computing Systems, 63(7):1679-1754, 2019.
  7. Dominik D. Freydenberger and Mario Holldack. Document Spanners: From Expressive Power to Decision Problems. Theory of Computing Systems, 62(4):854-898, 2018.
  8. Dominik D. Freydenberger, Benny Kimelfeld, and Liat Peterfreund. Joining Extractions of Regular Expressions. In Proceedings of PODS 2018, pages 137-149, 2018.
  9. Dominik D. Freydenberger and Sam M. Thompson. Dynamic Complexity of Document Spanners, 2019. URL: http://arxiv.org/abs/1909.10869.
  10. Wouter Gelade, Marcel Marquardt, and Thomas Schwentick. The dynamic complexity of formal languages. ACM Transactions on Computational Logic, 13(3):19:1-19:36, 2012.
  11. Tao Jiang, Efim Kinber, Arto Salomaa, Kai Salomaa, and Sheng Yu. Pattern languages with and without erasing. International Journal of Computer Mathematics, 50(3-4):147-163, 1994.
  12. Katja Losemann. Foundations of Regular Languages for Processing RDF and XML. PhD thesis, University of Bayreuth, 2015. URL: https://epub.uni-bayreuth.de/2536/.
  13. Francisco Maturana, Cristian Riveros, and Domagoj Vrgoč. Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity. In Proceedings of PODS 2018, pages 125-136, 2018.
  14. Andrea Morciano, Martín Ugarte, and Stijn Vansummeren. Automata-Based Evaluation of AQL queries. Technical report, Université Libre de Bruxelles, 2016.
  15. Pablo Muñoz, Nils Vortmeier, and Thomas Zeume. Dynamic Graph Queries. In Proceedings of ICDT 2016, pages 14:1-14:18, 2016.
  16. Sushant Patnaik and Neil Immerman. Dyn-FO: A parallel, dynamic complexity class. Journal of Computer and System Sciences, 55(2):199-209, 1997.
  17. Liat Peterfreund, Dominik D. Freydenberger, Benny Kimelfeld, and Markus Kröll. Complexity Bounds for Relational Algebra over Document Spanners. In Proceedings of PODS 2019, pages 320-334, 2019.
  18. Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld. Recursive Programs for Document Spanners. In Proceedings of ICDT 2019, pages 13:1-13:18, 2019.
  19. Markus L. Schmid. Characterising REGEX Languages by Regular Languages Equipped with Factor-Referencing. Information and Computation, 249:1-17, 2016.
  20. Thomas Zeume. Small dynamic complexity classes. Springer, 2017.
  21. Thomas Zeume and Thomas Schwentick. Dynamic conjunctive queries. Journal of Computer and System Sciences, 88:3-26, 2017.