Search Results

Documents authored by Uezato, Yuya


Document
Track A: Algorithms, Complexity and Games
On the Complexity of the Matching Problem of Regular Expressions with Backreferences

Authors: Soh Kumabe and Yuya Uezato

Published in: LIPIcs, Volume 374, 53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026)


Abstract
Regular Expression Denial of Service (ReDoS) is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that run in linear or near-linear time in the length of the input string? For classical regular expressions (REGEX), Thompson’s construction yields a linear-time algorithm for fixed expressions. However, practical engines support powerful features such as backreferences, which allow capturing a substring and reusing it later. This feature strictly extends the expressive power of REGEX but unfortunately increases the risk of ReDoS attacks. This paper investigates the fine-grained complexity of the string matching problem for regular expressions with backreferences (REWBs). Specifically, we consider r-use k-REWBs, i.e., REWBs with k variables such that, in any computation, the total number of backreference executions is at most r. On the hardness side, we show that the string matching problem for k-REWBs cannot be solved in O(n^{2k-ε}) time for any ε > 0 under the Strong Exponential Time Hypothesis (SETH), where n is the length of the input string. We also prove that this problem is W[2]-hard when parameterized by the length of the REWB expression, strengthening the previous W[1]-hardness result. Moreover, we prove that this problem for 2-use 2-REWBs cannot be solved in n^{1+o(1)} time unless the triangle detection problem can be solved in that time. On the algorithmic side, we present an O(n log² n)-time algorithm for 1-use REWBs. In particular, we focus on the ABCBD problem, which is the REWB matching problem for the form A(B)_xC∖xD where A, B, C, and D are fixed REGEXes. We also show that every 1-use REWB can be transformed into this canonical form. Our algorithm significantly improves upon the recent O(n²)-time algorithm for the ABCBD problem by Nogami and Terauchi (MFCS, 2025). Our algorithm is highly nontrivial and employs several techniques, including suffix trees, transition monoids of REGEXes, factorization forest data structures, and periodicity of strings.

Cite as

Soh Kumabe and Yuya Uezato. On the Complexity of the Matching Problem of Regular Expressions with Backreferences. In 53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 374, pp. 135:1-135:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@InProceedings{kumabe_et_al:LIPIcs.ICALP.2026.135,
  author =	{Kumabe, Soh and Uezato, Yuya},
  title =	{{On the Complexity of the Matching Problem of Regular Expressions with Backreferences}},
  booktitle =	{53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026)},
  pages =	{135:1--135:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-428-4},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{374},
  editor =	{Bhattacharya, Sayan and Nanongkai, Danupon and Benedikt, Michael and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2026.135},
  URN =		{urn:nbn:de:0030-drops-265241},
  doi =		{10.4230/LIPIcs.ICALP.2026.135},
  annote =	{Keywords: Pattern Matching, Regular Expression, Backreference, Fine-grained Complexity}
}
Document
Matching Regular-Typed Pattern Languages: Quadratic-Time Algorithms

Authors: Yuya Uezato

Published in: LIPIcs, Volume 369, 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)


Abstract
Pattern languages (PAT) are a class of languages generated by expressions called patterns that may contain variables. In a pattern, each variable can be instantiated with an arbitrary string. Typed pattern languages extend PAT by associating a type (constraint) with each variable that restricts the domain of allowed substitutions. In this paper, we study regular-typed PAT (PATwRT), where all types are represented either by a regular expression or by an ε-NFA. We consider the PATwRT matching problem for patterns with a single repeated variable of the form P = α₁ β α₂ β ⋯ β α_K. We present simple algorithms whose running time is linear in K and quadratic in the input length N, with polynomial dependence on the sizes of the type representations. Our results extend previous quadratic-time work in two directions: (1) the quadratic-time algorithm for untyped PAT of Fernau et al. (STACS 2015), and (2) the quadratic-time algorithm for the restricted PATwRT K = 3, i.e., α₁ β α₂ β α₃ of Nogami and Terauchi (MFCS 2025).

Cite as

Yuya Uezato. Matching Regular-Typed Pattern Languages: Quadratic-Time Algorithms. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@InProceedings{uezato:LIPIcs.CPM.2026.11,
  author =	{Uezato, Yuya},
  title =	{{Matching Regular-Typed Pattern Languages: Quadratic-Time Algorithms}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.11},
  URN =		{urn:nbn:de:0030-drops-259374},
  doi =		{10.4230/LIPIcs.CPM.2026.11},
  annote =	{Keywords: Pattern languages, Regular expressions, String algorithms}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
Regular Expressions with Backreferences and Lookaheads Capture NLOG

Authors: Yuya Uezato

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
Backreferences and lookaheads are vital features to make classical regular expressions (REGEX) practical. Although these features have been widely used, understanding of the unrestricted combination of them has been limited. Practically, most likely, no implementation fully supports them. Theoretically, while some studies have addressed these features separately, few have dared to combine them. Those few studies showed that the amalgamation of these features significantly enhances the expressiveness of REGEX. However, no acceptable expressivity bound for REWBLk - REGEX with backreferences and lookaheads - has been established. We elucidate this by establishing that REWBLk coincides with NLOG, the class of languages accepted by log-space nondeterministic Turing machines (NTMs). In translating REWBLk to log-space NTMs, negative lookaheads are the most challenging part since it essentially requires complementing log-space NTMs in nondeterministic log-space. To address this problem, we revisit Immerman-Szelepcsényi theorem. In addition, we employ log-space nested-oracles NTMs to naturally handle nested lookaheads of REWBLk. Utilizing such oracle machines, we also present the new result that the membership problem of REWBLk is PSPACE-complete.

Cite as

Yuya Uezato. Regular Expressions with Backreferences and Lookaheads Capture NLOG. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 155:1-155:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{uezato:LIPIcs.ICALP.2024.155,
  author =	{Uezato, Yuya},
  title =	{{Regular Expressions with Backreferences and Lookaheads Capture NLOG}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{155:1--155:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.155},
  URN =		{urn:nbn:de:0030-drops-202984},
  doi =		{10.4230/LIPIcs.ICALP.2024.155},
  annote =	{Keywords: Regular Expression, Automata Theory, Nondeterministic Log-Space}
}
Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail