Soh Kumabe, Yuya Uezato
Creative Commons Attribution 4.0 International license
Regular Expression Denial of Service (ReDoS) is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that run in linear or near-linear time in the length of the input string? For classical regular expressions (REGEX), Thompson’s construction yields a linear-time algorithm for fixed expressions. However, practical engines support powerful features such as backreferences, which allow capturing a substring and reusing it later. This feature strictly extends the expressive power of REGEX but unfortunately increases the risk of ReDoS attacks.
This paper investigates the fine-grained complexity of the string matching problem for regular expressions with backreferences (REWBs). Specifically, we consider r-use k-REWBs, i.e., REWBs with k variables such that, in any computation, the total number of backreference executions is at most r. On the hardness side, we show that the string matching problem for k-REWBs cannot be solved in O(n^{2k-ε}) time for any ε > 0 under the Strong Exponential Time Hypothesis (SETH), where n is the length of the input string. We also prove that this problem is W[2]-hard when parameterized by the length of the REWB expression, strengthening the previous W[1]-hardness result. Moreover, we prove that this problem for 2-use 2-REWBs cannot be solved in n^{1+o(1)} time unless the triangle detection problem can be solved in that time. On the algorithmic side, we present an O(n log² n)-time algorithm for 1-use REWBs. In particular, we focus on the ABCBD problem, which is the REWB matching problem for the form A(B)_x C \x D, where A, B, C, and D are fixed REGEXes. We also show that every 1-use REWB can be transformed into this canonical form. Our algorithm significantly improves upon the recent O(n²)-time algorithm for the ABCBD problem by Nogami and Terauchi (MFCS, 2025). Our algorithm is highly nontrivial and employs several techniques, including suffix trees, transition monoids of REGEXes, factorization forest data structures, and periodicity of strings.
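To make the ABCBD problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper: it enumerates every split of the input and is far slower than the O(n log² n) bound quoted above. The function name `abcbd_match` and its parameters are hypothetical, and the four sub-patterns are assumed to be plain regular expressions (no backreferences) in Python `re` syntax.

```python
import re


def abcbd_match(a: str, b: str, c: str, d: str, w: str) -> bool:
    """Naive reference check for the form A (B)_x C \\x D.

    Returns True if w can be written as u1 + x + u2 + x + u3 where
    u1 matches a, x matches b, u2 matches c, and u3 matches d.
    Illustrative baseline only; it tries all split points.
    """
    ra, rb, rc, rd = (re.compile(p) for p in (a, b, c, d))
    n = len(w)
    for i in range(n + 1):                      # w[:i] is the A part
        if not ra.fullmatch(w[:i]):
            continue
        for j in range(i, n + 1):               # w[i:j] is the captured x
            x = w[i:j]
            if not rb.fullmatch(x):
                continue
            # w[j:k] is the C part; the second copy of x must start at k.
            for k in range(j, n - len(x) + 1):
                if (
                    w.startswith(x, k)
                    and rc.fullmatch(w[j:k])
                    and rd.fullmatch(w[k + len(x):])
                ):
                    return True
    return False
```

For example, with A = `.*`, B = `[ab]+`, C = `#`, D = `.*`, the input `xxab#abyy` is accepted (x can be `ab`), while `ab#cd` is rejected because no nonempty string over {a, b} that ends just before `#` reappears right after it.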
@InProceedings{kumabe_et_al:LIPIcs.ICALP.2026.135,
author = {Kumabe, Soh and Uezato, Yuya},
title = {{On the Complexity of the Matching Problem of Regular Expressions with Backreferences}},
booktitle = {53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026)},
pages = {135:1--135:25},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-428-4},
ISSN = {1868-8969},
year = {2026},
volume = {374},
editor = {Bhattacharya, Sayan and Nanongkai, Danupon and Benedikt, Michael and Puppis, Gabriele},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2026.135},
URN = {urn:nbn:de:0030-drops-265241},
doi = {10.4230/LIPIcs.ICALP.2026.135},
annote = {Keywords: Pattern Matching, Regular Expression, Backreference, Fine-grained Complexity}
}