,
Tomohiro I
Creative Commons Attribution 4.0 International license
A repeat is a substring that occurs at least twice in a string, and is called a maximal repeat if it cannot be extended outwards without reducing its frequency. Nishimoto and Tabei [CPM, 2021] proposed r-enum, an algorithm to enumerate various characteristic substrings, including maximal repeats, in a string T of length n in O(r) words of compressed working space, where r ≤ n is the number of runs in the Burrows-Wheeler transform (BWT) of T. Given the run-length encoded BWT (RLBWT) of T, r-enum runs in O(n log log_w (n/r)) time in addition to the time linear to the number of output strings, where w = Θ(log n) is the word size. In this paper, we first improve the O(n log log_w (n/r)) term to O(n). We next extend r-enum to compute other context-sensitive repeats such as near-supermaximal repeats (NSMRs) and supermaximal repeats, as well as the context diversity for every maximal repeat in the same complexities. Furthermore, we study net occurrences: An occurrence of a repeat is called a net occurrence if it is not covered by another repeat, and the net frequency of a repeat is the number of its net occurrences. With this terminology, an NSMR is a repeat with a positive net frequency. Given the RLBWT of T, we show how to compute the set 𝒮^{nsmr} of all NSMRs in T together with their net frequency/occurrences in O(n) time and O(r) space. We also show that an O(r)-space data structure can be built from the RLBWT to compute the net frequency/occurrences of any pattern in optimal time. The data structure is built in O(r) space and in O(n) time with high probability or deterministic O(n + |𝒮^{nsmr}| log log min(σ, |𝒮^{nsmr}|)) time, where σ ≤ r is the alphabet size of T. To achieve this, we prove that the total number of net occurrences is less than 2r. With the duality between net occurrences and minimal unique substrings (MUSs), we get a new upper bound 2r of the number of MUSs in T, which may be of independent interest.
@InProceedings{kimura_et_al:LIPIcs.CPM.2026.10,
author = {Kimura, Kotaro and I, Tomohiro},
title = {{R-Enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies}},
booktitle = {37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
pages = {10:1--10:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-420-8},
ISSN = {1868-8969},
year = {2026},
volume = {369},
editor = {Bille, Philip and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.10},
URN = {urn:nbn:de:0030-drops-259361},
doi = {10.4230/LIPIcs.CPM.2026.10},
annote = {Keywords: Supermaximal repeats, Largest maximal repeats, Net frequencies, Run-length Burrows-Wheeler transform, Compressed data mining}
}