Regular Expressions with Backreferences and Lookaheads Capture NLOG

Author Yuya Uezato




File

LIPIcs.ICALP.2024.155.pdf
  • Filesize: 0.98 MB
  • 20 pages


Author Details

Yuya Uezato
  • CyberAgent, Inc., Tokyo, Japan
  • National Institute of Informatics, Tokyo, Japan

Acknowledgements

I thank anonymous reviewers for their detailed comments on a previous version of this paper. I also sincerely thank anonymous reviewers for their careful reading and invaluable comments on the current version. All comments helped me significantly improve the presentation of this paper and the clarity of proofs and constructions.

Cite As

Yuya Uezato. Regular Expressions with Backreferences and Lookaheads Capture NLOG. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 155:1-155:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICALP.2024.155

Abstract

Backreferences and lookaheads are vital features for making classical regular expressions (REGEX) practical. Although these features are widely used, the unrestricted combination of them is poorly understood. In practice, probably no implementation fully supports them. In theory, while some studies have addressed these features separately, few have dared to combine them. Those few studies showed that combining these features significantly enhances the expressiveness of REGEX. However, no satisfactory bound on the expressive power of REWBLk (REGEX with backreferences and lookaheads) has been established. We resolve this by establishing that REWBLk coincides with NLOG, the class of languages accepted by log-space nondeterministic Turing machines (NTMs). In translating REWBLk to log-space NTMs, negative lookaheads are the most challenging part, since they essentially require complementing log-space NTMs in nondeterministic log-space. To address this problem, we revisit the Immerman-Szelepcsényi theorem. In addition, we employ log-space nested-oracle NTMs to naturally handle the nested lookaheads of REWBLk. Using such oracle machines, we also present the new result that the membership problem of REWBLk is PSPACE-complete.
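To make the two features concrete, here is a small illustration that uses Python's standard `re` module as a stand-in. This is an assumption for illustration only: Python's backtracking semantics may not coincide with the formal REWBLk semantics defined in the paper (for instance, in how captures made inside lookaheads are treated). The first pattern uses a backreference to match the non-regular language of squares {ww}. The second places a backreference inside a negative lookahead, the kind of combination discussed above, so that it accepts exactly the strings in which no character occurs twice. The helper names `is_square` and `all_distinct` are hypothetical.

```python
import re

# Backreference: \1 must repeat exactly the text captured by group 1,
# so a full match means the input has the form ww (a "square").
# Note: '.' does not match newlines by default; inputs here are single-line.
SQUARE = re.compile(r"(.*)\1")

# Negative lookahead containing a backreference: the lookahead body
# ".*(.).*\1" succeeds iff some character appears again later, and
# (?! ... ) rejects the input in exactly that case.
DISTINCT = re.compile(r"(?!.*(.).*\1).*")


def is_square(s: str) -> bool:
    """Hypothetical helper: True iff s == w + w for some string w."""
    return SQUARE.fullmatch(s) is not None


def all_distinct(s: str) -> bool:
    """Hypothetical helper: True iff no character of s occurs twice."""
    return DISTINCT.fullmatch(s) is not None


if __name__ == "__main__":
    for w in ["abab", "abba", "", "aa"]:
        print(repr(w), "square:", is_square(w))
    for w in ["abc", "abca", ""]:
        print(repr(w), "all distinct:", all_distinct(w))
```

Both checks rely on a backtracking engine, so their running time can grow quickly with the input. The PSPACE-completeness result stated above concerns the membership problem of REWBLk itself, not the cost of any particular engine.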

Subject Classification

ACM Subject Classification
  • Theory of computation → Formal languages and automata theory
  • Theory of computation → Complexity classes

Keywords
  • Regular Expression
  • Automata Theory
  • Nondeterministic Log-Space


References

  1. A. V. Aho. Indexed grammars - an extension of context free grammars. J. ACM, 15(4):647-671, 1968.
  2. A. V. Aho. Algorithms for finding patterns in strings. In J. van Leeuwen, editor, Algorithms and Complexity, Handbook of Theoretical Computer Science, pages 255-300. Elsevier, 1990.
  3. S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
  4. M. Berglund and B. van der Merwe. Re-examining regular expressions with backreferences. Theoretical Computer Science, 940:66-80, 2023.
  5. C. Câmpeanu, K. Salomaa, and S. Yu. A formal study of practical regular expressions. International Journal of Foundations of Computer Science, 14(06):1007-1018, 2003.
  6. A. Chandra, D. Kozen, and L. Stockmeyer. Alternation. J. ACM, 28(1):114-133, 1981.
  7. N. Chida and T. Terauchi. On lookaheads in regular expressions with backreferences. In FSCD 2022, volume 228, pages 15:1-15:18. Schloss Dagstuhl, 2022.
  8. N. Chida and T. Terauchi. On lookaheads in regular expressions with backreferences. IEICE Transactions on Information and Systems, E106.D(5):959-975, 2023.
  9. ECMAScript community. ECMAScript 2023 language specification. URL: https://262.ecma-international.org/14.0/#sec-runtime-semantics-repeatmatcher-abstract-operation.
  10. D. D. Freydenberger and M. L. Schmid. Deterministic regular expressions with back-references. Journal of Computer and System Sciences, 105:1-39, 2019.
  11. R. H. Gilman. A shrinking lemma for indexed languages. Theoretical Computer Science, 163(1):277-281, 1996.
  12. J. Hartmanis. On non-determinancy in simple computing devices. Acta Informatica, 1(4):336-344, 1972.
  13. J. Hartmanis and S. Mahaney. Languages simultaneously complete for One-way and Two-way log-tape automata. SIAM Journal on Computing, 10(2):383-390, 1981.
  14. T. Hayashi. On derivation trees of indexed grammars - an extension of the uvwxy-theorem. Publications of the Research Institute for Mathematical Sciences, 9(1):61-92, 1973.
  15. M. Holzer, M. Kutrib, and A. Malcher. Complexity of multi-head finite automata: Origins and directions. Theoretical Computer Science, 412(1):83-96, 2011.
  16. N. Immerman. Nondeterministic space is closed under complementation. SIAM Journal on Computing, 17(5):935-938, 1988.
  17. N. Immerman. Descriptive Complexity. Springer Verlag, 1998.
  18. T. Jiang and B. Ravikumar. A note on the space complexity of some decision problems for finite automata. Information Processing Letters, 40(1):25-31, 1991.
  19. K.-J. Lange, B. Jenner, and B. Kirsig. The logarithmic alternation hierarchy collapses: AΣ^ℒ₂ = AΠ^ℒ₂. In ICALP 87, pages 531-541. Springer, 1987.
  20. T. Nogami and T. Terauchi. On the expressive power of regular expressions with backreferences. In MFCS 2023, volume 272, pages 71:1-71:15. Schloss Dagstuhl, 2023.
  21. C. H. Papadimitriou. Computational complexity. Addison-Wesley, 1994.
  22. W. C. Rounds. Complexity of recognition in intermediate level languages. In SWAT'73, pages 145-158, 1973.
  23. M. L. Schmid. Inside the class of regex languages. International Journal of Foundations of Computer Science, 24(07):1117-1134, 2013.
  24. M. L. Schmid. Characterising regex languages by regular languages equipped with factor-referencing. Information and Computation, 249:1-17, 2016.
  25. U. Schöning and K. W. Wagner. Collapsing oracle hierarchies, census functions and logarithmically many queries. In STACS 88, pages 91-97. Springer, 1988.
  26. M. Sipser. Introduction to the theory of computation. Cengage Learning, third edition, 2013.
  27. R. Szelepcsényi. The method of forced enumeration for nondeterministic automata. Acta Informatica, 26(3):279-284, 1988.