Deterministic Regular Expressions with Back-References

Authors: Dominik D. Freydenberger, Markus L. Schmid



Cite As

Dominik D. Freydenberger and Markus L. Schmid. Deterministic Regular Expressions with Back-References. In 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 66, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Most modern libraries for regular expression matching support back-references (i.e., repetition operators that match another copy of a previously matched substring), which substantially increase expressive power but also lead to intractability. To find a better balance between expressiveness and tractability, we combine back-references with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model and a generalization of the Glushkov construction.
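
As a brief illustration of the back-references mentioned in the abstract, the following minimal sketch uses Python's standard `re` module rather than the paper's formalism; the helper name `is_square` is hypothetical and chosen only for this example. The pattern `([ab]+)\1` matches exactly the strings of the form ww over {a, b}, a language that is not regular, which hints at the added expressive power.

```python
import re

# Group 1 captures a non-empty word w over {a, b}; the back-reference \1
# must then repeat exactly the same w, so the pattern matches strings ww.
SQUARE = re.compile(r"([ab]+)\1")


def is_square(s: str) -> bool:
    """Return True if s == w + w for some non-empty word w over {a, b}."""
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for word in ["abab", "aa", "abba", "aba"]:
        print(word, is_square(word))
```

This sketch only demonstrates the back-reference operator itself; it says nothing about the deterministic subclass or the automaton model that the paper defines.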
Keywords

  • Deterministic Regular Expression
  • Regex
  • Glushkov Automaton



