Document Spanners: From Expressive Power to Decision Problems

Authors Dominik D. Freydenberger, Mario Holldack



Cite As

Dominik D. Freydenberger and Mario Holldack. Document Spanners: From Expressive Power to Decision Problems. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ICDT.2016.17

Abstract

We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models - namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
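
As a rough illustration of the definitions above, the following Python sketch treats a regular expression with named groups as a simplified stand-in for a regex formula, and maps a document to a relation whose entries are spans. It is an approximation under stated assumptions, not the formal model: Python's `re.finditer` returns only leftmost, non-overlapping matches inside the document, whereas the paper's regex formulas have their own semantics, and `span()` yields 0-based, end-exclusive positions. The function names `extract_spans` and `select_equal` are hypothetical and chosen only for this example.

```python
import re

def extract_spans(pattern, document):
    """Map a document to a list of rows; each row assigns a span to each named group.

    Spans are Python-style (start, end) pairs: 0-based, end-exclusive.
    Simplification: only the matches found by re.finditer are reported.
    """
    regex = re.compile(pattern)
    rows = []
    for match in regex.finditer(document):
        rows.append({name: match.span(name) for name in regex.groupindex})
    return rows

def select_equal(document, rows, x, y):
    """String equality selection: keep rows whose spans for x and y cover equal substrings."""
    return [
        row for row in rows
        if document[slice(*row[x])] == document[slice(*row[y])]
    ]

document = "ab=ab cd=ce"
rows = extract_spans(r"(?P<x>\w+)=(?P<y>\w+)", document)
equal_rows = select_equal(document, rows, "x", "y")
print(rows)        # two rows, one per "left=right" pair
print(equal_rows)  # only the row where both sides spell the same string
```

Note that the selection compares the substrings the spans point to, not the spans themselves: in the first row, `x` and `y` are different spans that happen to cover the same string `ab`.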

Keywords

  • Information extraction
  • document spanners
  • regular expressions
  • regex
  • patterns
  • word equations
  • decision problems
  • descriptional complexity

References

  1. P. Barceló, L. Libkin, A. W. Lin, and P. T. Wood. Expressive languages for path queries over graph-structured data. ACM T. Database Syst., 37(4):31, 2012.
  2. P. Barceló and P. Muñoz. Graph logics with rational relations: the role of word combinatorics. In Proc. CSL-LICS 2014, 2014.
  3. J. Bremer and D. D. Freydenberger. Inclusion problems for patterns with a bounded number of variables. Inform. Comput., 220-221:15-43, 2012.
  4. C. Câmpeanu, K. Salomaa, and S. Yu. A formal study of practical regular expressions. Int. J. Found. Comput. Sci., 14:1007-1018, 2003.
  5. V. Diekert. Makanin’s Algorithm. In M. Lothaire, editor, Algebraic Combinatorics on Words, chapter 12, pages 387-442. Cambridge University Press, 2002.
  6. R. Fagin, B. Kimelfeld, F. Reiss, and S. Vansummeren. Cleaning inconsistencies in information extraction via prioritized repairs. In Proc. PODS 2014, 2014.
  7. R. Fagin, B. Kimelfeld, F. Reiss, and S. Vansummeren. Document spanners: A formal approach to information extraction. J. ACM, 62(2):12, 2015.
  8. H. Fernau, F. Manea, R. Mercas, and M. L. Schmid. Pattern matching with variables: Fast algorithms and new hardness results. In Proc. STACS 2015, 2015.
  9. H. Fernau and M. L. Schmid. Pattern matching with variables: A multivariate complexity analysis. Inf. Comput., 242:287-305, 2015.
  10. H. Fernau, M. L. Schmid, and Y. Villanger. On the parameterised complexity of string morphism problems. Theory Comput. Sys., 2015.
  11. D. D. Freydenberger. Extended regular expressions: Succinctness and decidability. Theory Comput. Sys., 53(2):159-193, 2013.
  12. D. D. Freydenberger and D. Reidenbach. Bad news on decision problems for patterns. Inform. Comput., 208(1):83-96, 2010.
  13. D. D. Freydenberger and N. Schweikardt. Expressiveness and static analysis of extended conjunctive regular path queries. J. Comput. Syst. Sci., 79(6):892-909, 2013.
  14. J. E. F. Friedl. Mastering Regular Expressions. O'Reilly Media, 3rd edition, 2006.
  15. S. Ginsburg and E. Spanier. Semigroups, Presburger formulas, and languages. Pac. J. Math., 16(2):285-296, 1966.
  16. J. Hartmanis. On Gödel speed-up and succinctness of language representations. Theor. Comput. Sci., 26(3):335-342, 1983.
  17. M. Holzer and M. Kutrib. Descriptional complexity - an introductory survey. Scientific Applications of Language Methods, 2:1-58, 2010.
  18. O. H. Ibarra, T.-C. Pong, and S. M. Sohn. A note on parsing pattern languages. Pattern Recogn. Lett., 16(2):179-182, 1995.
  19. T. Jiang, E. Kinber, A. Salomaa, K. Salomaa, and S. Yu. Pattern languages with and without erasing. Int. J. Comput. Math., 50:147-163, 1994.
  20. J. Karhumäki, F. Mignosi, and W. Plandowski. The expressibility of languages and relations by word equations. J. ACM, 47(3):483-505, 2000.
  21. M. Kutrib. The phenomenon of non-recursive trade-offs. Int. J. Found. Comput. Sci., 16(5):957-973, 2005.
  22. M. Lothaire. Combinatorics on Words. Cambridge University Press, 1997.
  23. E. Ohlebusch and E. Ukkonen. On the equivalence problem for E-pattern languages. Theor. Comput. Sci., 186:231-248, 1997.
  24. R. J. Parikh. On context-free languages. J. ACM, 13(4):570-581, 1966.
  25. D. Reidenbach and M. L. Schmid. Patterns with bounded treewidth. Inform. Comput., 239:87-99, 2014.
  26. F. Stephan, R. Yoshinaka, and T. Zeugmann. On the parameterised complexity of learning patterns. In Proc. ISCIS 2011, 2011.