9 Search Results for "Freydenberger, Dominik D."


Document
Splitting Spanner Atoms: A Tool for Acyclic Core Spanners

Authors: Dominik D. Freydenberger and Sam M. Thompson

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)


Abstract
This paper investigates regex CQs with string equalities (SERCQs), a subclass of core spanners. As shown by Freydenberger, Kimelfeld, and Peterfreund (PODS 2018), these queries are intractable, even if restricted to acyclic queries. This previous result defines acyclicity by treating regex formulas as atoms. In contrast to this, we propose an alternative definition by converting SERCQs into FC-CQs - conjunctive queries in FC, a logic that is based on word equations. We introduce a way to decompose word equations of unbounded arity into a conjunction of binary word equations. If the result of the decomposition is acyclic, then evaluation and enumeration of results become tractable. The main result of this work is an algorithm that decides in polynomial time whether an FC-CQ can be decomposed into an acyclic FC-CQ. We also give an efficient conversion from synchronized SERCQs to FC-CQs with regular constraints. As a consequence, tractability results for acyclic relational CQs directly translate to a large class of SERCQs.

Cite as

Dominik D. Freydenberger and Sam M. Thompson. Splitting Spanner Atoms: A Tool for Acyclic Core Spanners. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2022.10,
  author =	{Freydenberger, Dominik D. and Thompson, Sam M.},
  title =	{{Splitting Spanner Atoms: A Tool for Acyclic Core Spanners}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{10:1--10:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.10},
  URN =		{urn:nbn:de:0030-drops-158843},
  doi =		{10.4230/LIPIcs.ICDT.2022.10},
  annote =	{Keywords: Document spanners, information extraction, conjunctive queries}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
The Theory of Concatenation over Finite Models

Authors: Dominik D. Freydenberger and Liat Peterfreund

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)


Abstract
We propose FC, a new logic on words that combines finite model theory with the theory of concatenation - a first-order logic that is based on word equations. Like the theory of concatenation, FC is built around word equations; in contrast to it, its semantics are defined to only allow finite models, by limiting the universe to a word and all its factors. As a consequence of this, FC has many of the desirable properties of FO on finite models, while being far more expressive than FO[<]. Most noteworthy among these desirable properties are sufficient criteria for efficient model checking, and capturing various complexity classes by adding operators for transitive closures or fixed points. Not only does FC allow us to obtain new insights and techniques for expressive power and efficient evaluation of document spanners, but it also provides a general framework for logic on words that also has potential applications in other areas.

Cite as

Dominik D. Freydenberger and Liat Peterfreund. The Theory of Concatenation over Finite Models. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 130:1-130:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.ICALP.2021.130,
  author =	{Freydenberger, Dominik D. and Peterfreund, Liat},
  title =	{{The Theory of Concatenation over Finite Models}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{130:1--130:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.130},
  URN =		{urn:nbn:de:0030-drops-141997},
  doi =		{10.4230/LIPIcs.ICALP.2021.130},
  annote =	{Keywords: finite model theory, word equations, descriptive complexity, model checking, document spanners}
}
Document
Grammars for Document Spanners

Authors: Liat Peterfreund

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We propose a new grammar-based language for defining information-extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called {extraction grammars}, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.

Cite as

Liat Peterfreund. Grammars for Document Spanners. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{peterfreund:LIPIcs.ICDT.2021.7,
  author =	{Peterfreund, Liat},
  title =	{{Grammars for Document Spanners}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.7},
  URN =		{urn:nbn:de:0030-drops-137154},
  doi =		{10.4230/LIPIcs.ICDT.2021.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Context-Free Grammars, Constant-Delay Enumeration, Regular Expressions, Pushdown Automata}
}
Document
Dynamic Complexity of Document Spanners

Authors: Dominik D. Freydenberger and Sam M. Thompson

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)


Abstract
The present paper investigates the dynamic complexity of document spanners, a formal framework for information extraction introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (JACM 2015). We first look at the class of regular spanners and prove that any regular spanner can be maintained in the dynamic complexity class DynPROP. This result follows from work done previously on the dynamic complexity of formal languages by Gelade, Marquardt, and Schwentick (TOCL 2012). To investigate core spanners we use SpLog, a concatenation logic that exactly captures core spanners. We show that the dynamic complexity class DynCQ is more expressive than SpLog and therefore can maintain any core spanner. This result is then extended to show that DynFO can maintain any generalized core spanner and that DynFO is more powerful than SpLog with negation.

Cite as

Dominik D. Freydenberger and Sam M. Thompson. Dynamic Complexity of Document Spanners. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2020.11,
  author =	{Freydenberger, Dominik D. and Thompson, Sam M.},
  title =	{{Dynamic Complexity of Document Spanners}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{11:1--11:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.11},
  URN =		{urn:nbn:de:0030-drops-119355},
  doi =		{10.4230/LIPIcs.ICDT.2020.11},
  annote =	{Keywords: Document spanners, information extraction, dynamic complexity, descriptive complexity, word equations}
}
Document
A Logic for Document Spanners

Authors: Dominik D. Freydenberger

Published in: LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)


Abstract
Document spanners are a formal framework for information extraction that was introduced by [Fagin, Kimelfeld, Reiss, and Vansummeren, J.ACM, 2015]. One of the central models in this framework are core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by [Freydenberger and Holldack, ICDT, 2016], there is a connection between core spanners and EC^{reg}, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of EC^{reg} that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between this fragment and core spanners. This even holds for variants of core spanners that are based on automata instead of regular expressions. Applications of this approach include an alternative way of defining relations for spanners, insights into the relative succinctness of various classes of spanner representations, and a pumping lemma for core spanners.

Cite as

Dominik D. Freydenberger. A Logic for Document Spanners. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{freydenberger:LIPIcs.ICDT.2017.13,
  author =	{Freydenberger, Dominik D.},
  title =	{{A Logic for Document Spanners}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Benedikt, Michael and Orsi, Giorgio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.13},
  URN =		{urn:nbn:de:0030-drops-70493},
  doi =		{10.4230/LIPIcs.ICDT.2017.13},
  annote =	{Keywords: information extraction, document spanners, word equations, regex, descriptional complexity}
}
Document
Deterministic Regular Expressions with Back-References

Authors: Dominik D. Freydenberger and Markus L. Schmid

Published in: LIPIcs, Volume 66, 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017)


Abstract
Most modern libraries for regular expression matching allow back-references (i.e. repetition operators) that substantially increase expressive power, but also lead to intractability. In order to find a better balance between expressiveness and tractability, we combine these with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model, and a generalization of the Glushkov construction.

Cite as

Dominik D. Freydenberger and Markus L. Schmid. Deterministic Regular Expressions with Back-References. In 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 66, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.STACS.2017.33,
  author =	{Freydenberger, Dominik D. and Schmid, Markus L.},
  title =	{{Deterministic Regular Expressions with Back-References}},
  booktitle =	{34th Symposium on Theoretical Aspects of Computer Science (STACS 2017)},
  pages =	{33:1--33:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-028-6},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{66},
  editor =	{Vollmer, Heribert and Vall\'{e}e, Brigitte},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2017.33},
  URN =		{urn:nbn:de:0030-drops-70004},
  doi =		{10.4230/LIPIcs.STACS.2017.33},
  annote =	{Keywords: Deterministic Regular Expression, Regex, Glushkov Automaton}
}
Document
Document Spanners: From Expressive Power to Decision Problems

Authors: Dominik D. Freydenberger and Mario Holldack

Published in: LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)


Abstract
We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models - namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.

Cite as

Dominik D. Freydenberger and Mario Holldack. Document Spanners: From Expressive Power to Decision Problems. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2016.17,
  author =	{Freydenberger, Dominik D. and Holldack, Mario},
  title =	{{Document Spanners: From Expressive Power to Decision Problems}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Martens, Wim and Zeume, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2016.17},
  URN =		{urn:nbn:de:0030-drops-57867},
  doi =		{10.4230/LIPIcs.ICDT.2016.17},
  annote =	{Keywords: Information extraction, document spanners, regular expressions, regex, patterns, word equations, decision problems, descriptional complexity}
}
Document
Weakly Unambiguous Morphisms

Authors: Dominik D. Freydenberger, Hossein Nevisi, and Daniel Reidenbach

Published in: LIPIcs, Volume 9, 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)


Abstract
A nonerasing morphism sigma is said to be weakly unambiguous with respect to a word w if sigma is the only nonerasing morphism that can map w to sigma(w), i.e., there does not exist any other nonerasing morphism tau satisfying tau(w) = sigma(w). In the present paper, we wish to characterise those words with respect to which there exists such a morphism. This question is nontrivial if we consider so-called length-increasing morphisms, which map a word to an image that is strictly longer than the word. Our main result is a compact characterisation that holds for all morphisms with ternary or larger target alphabets. We also comprehensively describe those words that have a weakly unambiguous length-increasing morphism with a unary target alphabet, but we have to leave the problem open for binary alphabets, where we can merely give some non-characteristic conditions.

Cite as

Dominik D. Freydenberger, Hossein Nevisi, and Daniel Reidenbach. Weakly Unambiguous Morphisms. In 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 9, pp. 213-224, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)


Copy BibTex To Clipboard

@InProceedings{freydenberger_et_al:LIPIcs.STACS.2011.213,
  author =	{Freydenberger, Dominik D. and Nevisi, Hossein and Reidenbach, Daniel},
  title =	{{Weakly Unambiguous Morphisms}},
  booktitle =	{28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)},
  pages =	{213--224},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-25-5},
  ISSN =	{1868-8969},
  year =	{2011},
  volume =	{9},
  editor =	{Schwentick, Thomas and D\"{u}rr, Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2011.213},
  URN =		{urn:nbn:de:0030-drops-30123},
  doi =		{10.4230/LIPIcs.STACS.2011.213},
  annote =	{Keywords: Combinatorics on words, Morphisms, Ambiguity}
}
Document
Extended Regular Expressions: Succinctness and Decidability

Authors: Dominik D. Freydenberger

Published in: LIPIcs, Volume 9, 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)


Abstract
Most modern implementations of regular expression engines allow the use of variables (also called back references). The resulting extended regular expressions (which, in the literature, are also called practical regular expressions, rewbr, or regex) are able to express non-regular languages. The present paper demonstrates that extended regular-expressions cannot be minimized effectively (neither with respect to length, nor number of variables), and that the tradeoff in size between extended and ``classical'' regular expressions is not bounded by any recursive function. In addition to this, we prove the undecidability of several decision problems (universality, equivalence, inclusion, regularity, and cofiniteness) for extended regular expressions. Furthermore, we show that all these results hold even if the extended regular expressions contain only a single variable.

Cite as

Dominik D. Freydenberger. Extended Regular Expressions: Succinctness and Decidability. In 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 9, pp. 507-518, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)


Copy BibTex To Clipboard

@InProceedings{freydenberger:LIPIcs.STACS.2011.507,
  author =	{Freydenberger, Dominik D.},
  title =	{{Extended Regular Expressions: Succinctness and Decidability}},
  booktitle =	{28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)},
  pages =	{507--518},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-25-5},
  ISSN =	{1868-8969},
  year =	{2011},
  volume =	{9},
  editor =	{Schwentick, Thomas and D\"{u}rr, Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2011.507},
  URN =		{urn:nbn:de:0030-drops-30396},
  doi =		{10.4230/LIPIcs.STACS.2011.507},
  annote =	{Keywords: extended regular expressions, regex, decidability, non-recursive tradeoffs}
}
  • Refine by Author
  • 8 Freydenberger, Dominik D.
  • 2 Peterfreund, Liat
  • 2 Thompson, Sam M.
  • 1 Holldack, Mario
  • 1 Nevisi, Hossein
  • Show More...

  • Refine by Classification
  • 2 Information systems → Information extraction
  • 2 Theory of computation → Complexity theory and logic
  • 1 Information systems → Data model extensions
  • 1 Information systems → Relational database model
  • 1 Theory of computation → Database query languages (principles)
  • Show More...

  • Refine by Keyword
  • 4 word equations
  • 3 document spanners
  • 3 information extraction
  • 3 regex
  • 2 Document spanners
  • Show More...

  • Refine by Type
  • 9 document

  • Refine by Publication Year
  • 2 2011
  • 2 2017
  • 2 2021
  • 1 2016
  • 1 2020
  • Show More...