Search Results

Documents authored by Holldack, Mario


Document
Document Spanners: From Expressive Power to Decision Problems

Authors: Dominik D. Freydenberger and Mario Holldack

Published in: LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)


Abstract
We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models - namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.

Cite as

Dominik D. Freydenberger and Mario Holldack. Document Spanners: From Expressive Power to Decision Problems. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2016.17,
  author =	{Freydenberger, Dominik D. and Holldack, Mario},
  title =	{{Document Spanners: From Expressive Power to Decision Problems}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.17},
  URN =		{urn:nbn:de:0030-drops-57867},
  doi =		{10.4230/LIPIcs.ICDT.2016.17},
  annote =	{Keywords: Information extraction, document spanners, regular expressions, regex, patterns, word equations, decision problems, descriptional complexity}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail