The Theory of Concatenation over Finite Models

Authors Dominik D. Freydenberger , Liat Peterfreund

Thumbnail PDF


  • Filesize: 0.79 MB
  • 17 pages

Document Identifiers

Author Details

Dominik D. Freydenberger
  • Loughborough University, UK
Liat Peterfreund
  • DI ENS, ENS Paris, CNRS, PSL University, INRIA, France


The authors thank Joel D. Day, Manfred Kufleitner, Leonid Libkin, Sam M. Thompson, and the reviewers of the current and previous versions for their helpful comments.

Cite AsGet BibTex

Dominik D. Freydenberger and Liat Peterfreund. The Theory of Concatenation over Finite Models. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 130:1-130:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


We propose FC, a new logic on words that combines finite model theory with the theory of concatenation - a first-order logic that is based on word equations. Like the theory of concatenation, FC is built around word equations; in contrast to it, its semantics are defined to only allow finite models, by limiting the universe to a word and all its factors. As a consequence of this, FC has many of the desirable properties of FO on finite models, while being far more expressive than FO[<]. Most noteworthy among these desirable properties are sufficient criteria for efficient model checking, and capturing various complexity classes by adding operators for transitive closures or fixed points. Not only does FC allow us to obtain new insights and techniques for expressive power and efficient evaluation of document spanners, but it also provides a general framework for logic on words that also has potential applications in other areas.

Subject Classification

ACM Subject Classification
  • Theory of computation → Database query languages (principles)
  • Theory of computation → Logic and databases
  • Theory of computation → Finite Model Theory
  • finite model theory
  • word equations
  • descriptive complexity
  • model checking
  • document spanners


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL:
  2. Isolde Adler and Mark Weyer. Tree-width for first order formulae. Log. Methods Comput. Sci., 8(1), 2012. Google Scholar
  3. Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. Constant-delay enumeration for nondeterministic document spanners. In Proc. ICDT 2019, pages 22:1-22:19, 2019. Google Scholar
  4. Michael Benedikt, Leonid Libkin, Thomas Schwentick, and Luc Segoufin. Definable relations and first-order query languages over strings. J. ACM, 50(5):694-751, 2003. Google Scholar
  5. Anthony J. Bonner and Giansalvatore Mecca. Sequences, datalog, and transducers. J. Comput. Syst. Sci, 57(3):234-259, 1998. Google Scholar
  6. Pierre Boullier. Range concatenation grammars. In New developments in parsing technology, pages 269-289. Springer, 2004. Google Scholar
  7. Laura Ciobanu, Volker Diekert, and Murray Elder. Solution sets for equations over free groups are EDT0L languages. IJAC, 26(5):843-886, 2016. Google Scholar
  8. Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. Google Scholar
  9. Joel D. Day, Pamela Fleischmann, Florin Manea, Dirk Nowotka, and Markus L. Schmid. On matching generalised repetitive patterns. In Proc. DLT 2018, pages 269-281, 2018. Google Scholar
  10. Joel D. Day, Vijay Ganesh, Paul He, Florin Manea, and Dirk Nowotka. The satisfiability of word equations: Decidable and undecidable theories. In Proc. RP 2018, pages 15-29, 2018. Google Scholar
  11. Joel D. Day and Florin Manea. On the structure of solution sets to regular word equations. In Proc. ICALP 2020, pages 124:1-124:16, 2020. Google Scholar
  12. Joel D. Day, Florin Manea, and Dirk Nowotka. The hardness of solving simple word equations. In Proc. MFCS 2017, pages 18:1-18:14, 2017. Google Scholar
  13. Volker Diekert. Makanin’s Algorithm. In M. Lothaire, editor, Algebraic Combinatorics on Words, chapter 12. Cambridge University Press, 2002. Google Scholar
  14. Volker Diekert. More than 1700 years of word equations. In Proc. CAI 2015, 2015. Google Scholar
  15. Johannes Doleschal, Benny Kimelfeld, Wim Martens, Yoav Nahshon, and Frank Neven. Split-correctness in information extraction. In Proc. PODS 2019, pages 149-163, 2019. Google Scholar
  16. Johannes Doleschal, Benny Kimelfeld, Wim Martens, and Liat Peterfreund. Weight annotation in information extraction. In Proc. ICDT 2020, pages 8:1-8:18, 2020. Google Scholar
  17. V. G. Durnev. Undecidability of the positive ∀∃³-theory of a free semigroup. Siberian Mathematical Journal, 36(5):917-929, 1995. Google Scholar
  18. Heinz-Dieter Ebbinghaus and Jörg Flum. Finite Model Theory. Springer, 2nd edition, 1999. Google Scholar
  19. Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document spanners: A formal approach to information extraction. J. ACM, 62(2):12:1-12:51, 2015. Google Scholar
  20. Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Declarative cleaning of inconsistencies in information extraction. ACM Trans. Database Syst., 41(1):6:1-6:44, 2016. Google Scholar
  21. Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren, and Domagoj Vrgoc. Constant delay algorithms for regular document spanners. In Proc. PODS 2018, pages 165-177, 2018. Google Scholar
  22. Dominik D. Freydenberger. A logic for document spanners. Theory Comput. Syst., 63(7):1679-1754, 2019. Google Scholar
  23. Dominik D. Freydenberger and Mario Holldack. Document spanners: From expressive power to decision problems. Theory Comput. Syst., 62:854-898, 2018. Google Scholar
  24. Dominik D. Freydenberger, Benny Kimelfeld, and Liat Peterfreund. Joining extractions of regular expressions. In Proc. PODS 2018, pages 137-149, 2018. Google Scholar
  25. Dominik D. Freydenberger and Sam M. Thompson. Dynamic complexity of document spanners. In Proc. ICDT 2020, pages 11:1-11:21, 2020. Google Scholar
  26. Dominik D. Freydenberger and Sam M. Thompson. Splitting spanner atoms: A tool for acyclic core spanners, 2021. URL:
  27. Seymour Ginsburg and Xiaoyang Sean Wang. Regular sequence operations and their use in database queries. J. Comput. Syst. Sci, 56(1):1-26, 1998. Google Scholar
  28. Gösta Grahne, Matti Nykänen, and Esko Ukkonen. Reasoning about strings in databases. J. Comput. Syst. Sci, 59(1):116-162, 1999. Google Scholar
  29. Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1-12, 2017. Google Scholar
  30. Juris Hartmanis. On Gödel speed-up and succinctness of language representations. Theor. Comput. Sci., 26:335-342, 1983. Google Scholar
  31. Laura Kallmeyer. Parsing Beyond Context-Free Grammars. Cognitive Technologies. Springer, 2010. Google Scholar
  32. Laura Kallmeyer, Wolfgang Maier, and Yannick Parmentier. An earley parsing algorithm for range concatenation grammars. In Proc. ACL-IJCNLP 2009, pages 9-12, 2009. Google Scholar
  33. Juhani Karhumäki, Filippo Mignosi, and Wojciech Plandowski. The expressibility of languages and relations by word equations. J. ACM, 47(3):483-505, 2000. Google Scholar
  34. Phokion G. Kolaitis and Moshe Y. Vardi. Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci, 61(2):302-332, 2000. Google Scholar
  35. Martin Kutrib. The phenomenon of non-recursive trade-offs. Int. J. Found. Comput. Sci., 16(5):957-973, 2005. Google Scholar
  36. Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. Springer, 2004. Google Scholar
  37. Katja Losemann. Foundations of Regular Languages for Processing RDF and XML. PhD thesis, University of Bayreuth, 2015. URL:
  38. M. Lothaire. Combinatorics on Words. Addison-Wesley, Reading, MA, 1983. Google Scholar
  39. Johann A. Makowsky. Algorithmic uses of the Feferman-Vaught theorem. Annals of Pure and Applied Logic, 126(1-3):159-213, 2004. Google Scholar
  40. Florin Manea and Markus L. Schmid. Matching patterns with variables. In Proc. WORDS 2019, pages 1-27, 2019. Google Scholar
  41. Francisco Maturana, Cristian Riveros, and Domagoj Vrgoc. Document spanners for extracting incomplete information: Expressiveness and complexity. In Proc. PODS 2018, pages 125-136, 2018. Google Scholar
  42. Andrea Morciano, Martin Ugarte, and Stijn Vansummeren. Automata-based evaluation of AQL queries. Technical report, Université Libre de Bruxelles, 2016. Google Scholar
  43. Yoav Nahshon, Liat Peterfreund, and Stijn Vansummeren. Incorporating information extraction in the relational database model. In Proc. WebDB 2016, page 6, 2016. Google Scholar
  44. Dirk Nowotka and Aleksi Saarela. An optimal bound on the solution sets of one-variable word equations and its consequences. In Proc. ICALP 2018, pages 136:1-136:13, 2018. Google Scholar
  45. Liat Peterfreund. Grammars for document spanners. In Proc. ICDT 2021, pages 7:1-7:18, 2021. Google Scholar
  46. Liat Peterfreund, Dominik D. Freydenberger, Benny Kimelfeld, and Markus Kröll. Complexity bounds for relational algebra over document spanners. In Proc. PODS 2019, pages 320-334, 2019. Google Scholar
  47. Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld. Recursive programs for document spanners. In Proc. ICDT 2019, pages 13:1-13:18, 2019. Google Scholar
  48. W. V. Quine. Concatenation as a basis for arithmetic. J. Symb. Log., 11(4):105-114, 1946. Google Scholar
  49. Daniel Reidenbach and Markus L. Schmid. Patterns with bounded treewidth. Inf. Comput., 239:87-99, 2014. Google Scholar
  50. Aleksi Saarela. Hardness results for constant-free pattern languages and word equations. In Proc. ICALP 2020, pages 140:1-140:15, 2020. Google Scholar
  51. Markus L. Schmid. Characterising REGEX languages by regular languages equipped with factor-referencing. Inf. Comput., 249:1-17, 2016. Google Scholar
  52. Markus L. Schmid and Nicole Schweikardt. A purely regular approach to non-regular core spanners. In Proc. ICDT 2021, pages 4:1-4:19, 2021. Google Scholar
  53. Howard Straubing. Finite Automate, Formal Logic, and Circuit Complexity. Birkhäuser, 1994. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail