Nathaniel K. Brown, Travis Gagie, Giovanni Manzini, Gonzalo Navarro, Marinella Sciortino
Creative Commons Attribution 4.0 International license
We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O(r log(n/r) + r log σ + σ) bits such that later, given a character a and the suffix-array (SA) interval for P, we can find the SA interval for aP in O(log r_a + log log n) time, where r_a is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for aP in only O(log r_a) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors with a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.
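The basic query in the abstract, going from the SA interval for P to the SA interval for aP, can be illustrated with a minimal Python sketch. This is not the paper's data structure: it uses an uncompressed BWT and linear-time counting in place of the RLCSA's rank and select machinery, so it achieves none of the stated time or space bounds. The function names (`bwt_from_text`, `count_runs`, `backward_step`, `sa_interval`) and the `$` terminator are assumptions made for illustration only.

```python
from collections import Counter


def bwt_from_text(text):
    """Build the BWT of text + '$' by sorting rotations (quadratic; illustration only).

    Assumes '$' does not occur in text and sorts before every other character.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(bwt):
    """Number r of maximal runs of equal characters in the BWT."""
    return sum(1 for i, ch in enumerate(bwt) if i == 0 or bwt[i - 1] != ch)


def c_array(bwt):
    """Map each character to the number of characters in the text smaller than it."""
    counts = Counter(bwt)
    c, total = {}, 0
    for ch in sorted(counts):
        c[ch] = total
        total += counts[ch]
    return c


def backward_step(bwt, c, a, lo, hi):
    """Given the half-open SA interval [lo, hi) for P, return the interval for aP.

    bwt.count(a, 0, i) plays the role of rank_a(i); an RLCSA answers this
    query from the run-length encoded BWT instead of scanning.
    """
    if a not in c:
        return (0, 0)
    return (c[a] + bwt.count(a, 0, lo), c[a] + bwt.count(a, 0, hi))


def sa_interval(bwt, pattern):
    """SA interval of pattern, found by processing it right to left."""
    c = c_array(bwt)
    lo, hi = 0, len(bwt)
    for a in reversed(pattern):
        lo, hi = backward_step(bwt, c, a, lo, hi)
        if lo >= hi:
            return (0, 0)  # pattern does not occur
    return (lo, hi)
```

For example, with `bwt = bwt_from_text("banana")`, calling `sa_interval(bwt, "ana")` repeats `backward_step` once per pattern character; the paper's contribution concerns how fast each such step can be answered from a run-length compressed representation.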
@InProceedings{brown_et_al:OASIcs.Grossi.10,
author = {Brown, Nathaniel K. and Gagie, Travis and Manzini, Giovanni and Navarro, Gonzalo and Sciortino, Marinella},
title = {{Faster Run-Length Compressed Suffix Arrays}},
booktitle = {From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
pages = {10:1--10:15},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-391-1},
ISSN = {2190-6807},
year = {2025},
volume = {132},
editor = {Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.10},
URN = {urn:nbn:de:0030-drops-238095},
doi = {10.4230/OASIcs.Grossi.10},
annote = {Keywords: Run-length compressed suffix arrays, interpolative coding, two-level indexing}
}