We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O (r log (n / r) + r log σ + σ) bits such that later, given character a and the suffix-array (SA) interval for P, we can find the SA interval for a P in O (log r_a + log log n) time, where r_a is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for a P in only O (log r_a) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.
@InProceedings{brown_et_al:OASIcs.Grossi.10, author = {Brown, Nathaniel K. and Gagie, Travis and Manzini, Giovanni and Navarro, Gonzalo and Sciortino, Marinella}, title = {{Faster Run-Length Compressed Suffix Arrays}}, booktitle = {From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday}, pages = {10:1--10:15}, series = {Open Access Series in Informatics (OASIcs)}, ISBN = {978-3-95977-391-1}, ISSN = {2190-6807}, year = {2025}, volume = {132}, editor = {Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.10}, URN = {urn:nbn:de:0030-drops-238095}, doi = {10.4230/OASIcs.Grossi.10}, annote = {Keywords: Run-length compressed suffix arrays, interpolative coding, two-level indexing} }
Feedback for Dagstuhl Publishing