On Lookaheads in Regular Expressions with Backreferences

Authors Nariyoshi Chida , Tachio Terauchi



PDF
Thumbnail PDF

File

LIPIcs.FSCD.2022.15.pdf
  • Filesize: 0.95 MB
  • 18 pages

Document Identifiers

Author Details

Nariyoshi Chida
  • NTT Corporation, Tokyo, Japan
  • Waseda University, Tokyo, Japan
Tachio Terauchi
  • Waseda University, Tokyo, Japan

Cite As Get BibTex

Nariyoshi Chida and Tachio Terauchi. On Lookaheads in Regular Expressions with Backreferences. In 7th International Conference on Formal Structures for Computation and Deduction (FSCD 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 228, pp. 15:1-15:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.FSCD.2022.15

Abstract

Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewbl_p and the latter as rewbl_n). A consequence of our result is that neither the class of finite state automata nor that of memory automata (MFA) of Schmid [Markus L. Schmid, 2016] (which corresponds to regular expressions with backreferenes but without lookaheads) corresponds to rewbl_p or rewbl_n. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewbl_p. The key idea of PLMFA is to extend MFA with a new kind of memories, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that can be obtained independently of its original intended purpose of simulating rewbl_p.

Subject Classification

ACM Subject Classification
  • Theory of computation → Formal languages and automata theory
Keywords
  • Regular expressions
  • Lookaheads
  • Backreferences
  • Memory automata

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho. Chapter 5 - algorithms for finding patterns in strings. In Jan van Leeuwen, editor, Algorithms and Complexity, Handbook of Theoretical Computer Science, pages 255-300. Elsevier, Amsterdam, 1990. URL: https://doi.org/10.1016/B978-0-444-88071-0.50010-2.
  2. Martin Berglund and Brink van der Merwe. Regular expressions with backreferences re-examined. In Jan Holub and Jan Zdárek, editors, Proceedings of the Prague Stringology Conference 2017, Prague, Czech Republic, August 28-30, 2017, pages 30-41. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2017. URL: http://www.stringology.org/event/2017/p04.html.
  3. Martin Berglund, Brink van der Merwe, and Steyn van Litsenborgh. Regular expressions with lookahead. J. Univers. Comput. Sci., 27(1):324-340, 2021. URL: https://doi.org/10.3897/jucs.66330.
  4. Janusz A. Brzozowski and Ernst L. Leiss. On equations for regular languages, finite automata, and sequential networks. Theor. Comput. Sci., 10:19-35, 1980. URL: http://dblp.uni-trier.de/db/journals/tcs/tcs10.html#BrzozowskiL80.
  5. Cezar Câmpeanu, Kai Salomaa, and Sheng Yu. A formal study of practical regular expressions. International Journal of Foundations of Computer Science, 14(06):1007-1018, 2003. URL: https://doi.org/10.1142/S012905410300214X.
  6. Benjamin Carle and Paliath Narendran. On extended regular expressions. In Adrian-Horia Dediu, Armand-Mihai Ionescu, and Carlos Martín-Vide, editors, Language and Automata Theory and Applications, Third International Conference, LATA 2009, Tarragona, Spain, April 2-8, 2009. Proceedings, volume 5457 of Lecture Notes in Computer Science, pages 279-289. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-00982-2_24.
  7. Ashok K. Chandra and Larry J. Stockmeyer. Alternation. In 17th Annual Symposium on Foundations of Computer Science (sfcs 1976), pages 98-108, 1976. URL: https://doi.org/10.1109/SFCS.1976.4.
  8. N. Chida and T. Terauchi. Repairing dos vulnerability of real-world regexes. In Proceedings of the 43rd IEEE Symposium on Security and Privacy, SP 2022, pages 1049-1066, Los Alamitos, CA, USA, May 2022. IEEE Computer Society. URL: https://doi.org/10.1109/SP46214.2022.00061.
  9. S. C. Kleene. Representation of events in nerve nets and finite automata. In Automata Studies. 1956. Google Scholar
  10. Vladimir Komendantsky. Matching problem for regular expressions with variables. In Hans-Wolfgang Loidl and Ricardo Peña, editors, Trends in Functional Programming - 13th International Symposium, TFP 2012, St. Andrews, UK, June 12-14, 2012, Revised Selected Papers, volume 7829 of Lecture Notes in Computer Science, pages 149-166. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-40447-4_10.
  11. Blake Loring, Duncan Mitchell, and Johannes Kinder. Sound regular expression semantics for dynamic symbolic execution of javascript. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 425-438, New York, NY, USA, 2019. Association for Computing Machinery. URL: https://doi.org/10.1145/3314221.3314645.
  12. Akimasa Morihata. Translation of regular expression with lookahead into finite state automaton. Computer Software, 29(1):1_147-1_158, 2012. URL: https://doi.org/10.11309/jssst.29.1_147.
  13. Jacques Sakarovitch. Elements of Automata Theory. Cambridge University Press, USA, 2009. Google Scholar
  14. Markus L. Schmid. Characterising REGEX languages by regular languages equipped with factor-referencing. Inf. Comput., 249:1-17, 2016. URL: https://doi.org/10.1016/j.ic.2016.02.003.
  15. Ken Thompson. Programming techniques: Regular expression search algorithm. Commun. ACM, 11(6):419-422, June 1968. URL: https://doi.org/10.1145/363347.363387.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail