eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2023-06-21
24:1
24:17
10.4230/LIPIcs.CPM.2023.24
article
Computing MEMs on Repetitive Text Collections
Navarro, Gonzalo
1
2
Center for Biotechnology and Bioengineering (CeBiB), Santiago, Chile
Department of Computer Science, University of Chile, Santiago, Chile
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1..m] on a large repetitive text collection T[1..n], which is represented as a (hopefully much smaller) run-length context-free grammar of size g_{rl}. We show that the problem can be solved in time O(m² log^ε n), for any constant ε > 0, on a data structure of size O(g_{rl}). Further, on a locally consistent grammar of size O(δ log(n/δ)), the time decreases to O(m log m (log m + log^ε n)). The value δ is a function of the substring complexity of T, and Ω(δ log(n/δ)) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and δ.
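To make the problem statement concrete, the following is a minimal brute-force sketch of what computing MEMs means. It is not the grammar-based method of the paper: it scans the plain text directly and makes no attempt to meet the stated time or space bounds. It assumes the standard definition, which the abstract does not spell out: a MEM is a substring of P that occurs in T and cannot be extended by one symbol to the left or to the right within P while still occurring in T. The function names and the 0-based (start, length) output format are illustrative choices, whereas the abstract uses 1-based positions.

```python
def longest_match_at(pattern: str, text: str, i: int) -> int:
    """Length of the longest prefix of pattern[i:] that occurs somewhere in text."""
    length = 0
    # Extend one symbol at a time while the extended substring still occurs.
    while i + length < len(pattern) and pattern[i:i + length + 1] in text:
        length += 1
    return length


def naive_mems(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return the MEMs of pattern in text as (start, length) pairs, 0-based.

    Assumed definition: pattern[start:start+length] occurs in text, and
    neither its one-symbol left extension nor its one-symbol right
    extension (inside pattern) occurs in text.
    """
    mems = []
    prev = 0  # longest match length at the previous pattern position
    for i in range(len(pattern)):
        cur = longest_match_at(pattern, text, i)
        # Right-maximality holds because cur is the longest match at i.
        # Left-maximality fails exactly when the match starting at i - 1
        # covers this one, that is, when prev >= cur + 1.
        if cur > 0 and (i == 0 or prev <= cur):
            mems.append((i, cur))
        prev = cur
    return mems


if __name__ == "__main__":
    print(naive_mems("abcxbcd", "abcdbcx"))
```

In this example the pattern "abcxbcd" has three MEMs in the text "abcdbcx": "abc" at offset 0, "bcx" at offset 1, and "bcd" at offset 4. The sketch only fixes the input and output of the problem; the contribution summarized in the abstract is answering the same query on a compressed representation of T.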
https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/LIPIcs.CPM.2023.24/LIPIcs.CPM.2023.24.pdf
grammar-based indices
maximal exact matches
locally consistent grammars
substring complexity