Move-r: Optimizing the r-index

Bertram, Nico; Fischer, Johannes; Nalbach, Lukas

doi:10.4230/LIPIcs.SEA.2024.1

Author Details

Nico Bertram

Technische Universität Dortmund, Germany

Johannes Fischer

Technische Universität Dortmund, Germany

Lukas Nalbach

Technische Universität Dortmund, Germany

Cite As

Nico Bertram, Johannes Fischer, and Lukas Nalbach. Move-r: Optimizing the r-index. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 1:1-1:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.1

Abstract

We present a static text index called Move-r, which is a highly optimized version of the r-index (Gagie et al., 2020) that incorporates recent theoretical developments of the move data structure (Nishimoto and Tabei, 2021). The r-index is the method of choice for indexing highly repetitive texts, such as different versions of a text document or DNA from the same species, as it exploits the compressibility of the underlying data. With Move-r, we can answer count and locate queries 2-35 (typically 15) times as fast as with any other r-index supporting locate queries while being 0.8-2.5 (typically 2) times as large. A Move-r index can be constructed 0.9-2 (typically 2) times as fast while using 1/3-1 (typically 1/2) times as much space.

Subject Classification

ACM Subject Classification

Theory of computation → Pattern matching

Keywords

Compressed Text Index
Burrows-Wheeler Transform

References

Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. Engineering in-place (shared-memory) sorting algorithms. ACM Transactions on Parallel Computing, 9(1):2:1-2:62, 2022.
Hideo Bannai, Travis Gagie, and Tomohiro I. Refining the r-index. Theoretical Computer Science, 812:96-108, 2020.
Djamal Belazzougui and Gonzalo Navarro. Optimal lower and upper bounds for representing sequences. ACM Transactions on Algorithms, 11(4):31:1-31:21, 2015.
Christina Boucher, Travis Gagie, Alan Kuhnle, and Giovanni Manzini. Prefix-free parsing for building big BWTs. In 18th International Workshop on Algorithms in Bioinformatics WABI, pages 2:1-2:16, 2018.
Nathaniel K. Brown, Travis Gagie, and Massimiliano Rossi. RLBWT tricks. In 20th International Symposium on Experimental Algorithms SEA, pages 16:1-16:16, 2022.
Michael Burrows and David J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
David Richard Clark. Compact Pat Trees. PhD thesis, University of Waterloo, CAN, 1998.
Diego Díaz-Domínguez, Saska Dönges, Simon J. Puglisi, and Leena Salmela. Simple runs-bounded FM-index designs are fast. In 21st International Symposium on Experimental Algorithms SEA, pages 7:1-7:16, 2023.
Diego Díaz-Domínguez and Gonzalo Navarro. Efficient construction of the BWT for repetitive text using string compression. In 33rd Annual Symposium on Combinatorial Pattern Matching CPM, pages 29:1-29:18, 2022.
Paolo Ferragina and Giovanni Manzini. Indexing compressed text. Journal of the ACM, 52(4):552-581, 2005.
Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. Journal of the ACM, 67(1):2:1-2:54, 2020.
Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In 13th International Symposium on Experimental Algorithms SEA, pages 326-337, 2014.
Torben Hagerup. Sorting and searching on the word RAM. In 15th Annual Symposium on Theoretical Aspects of Computer Science STACS, Lecture Notes in Computer Science, pages 366-398, 1998.
Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. In 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 319-327, 1990.
Taher Mun, Alan Kuhnle, Christina Boucher, Travis Gagie, Ben Langmead, and Giovanni Manzini. Matching reads to many genomes with the r-index. Journal of Computational Biology, 27(4):514-518, 2020.
Gonzalo Navarro. Compact Data Structures - A Practical Approach. Cambridge University Press, 2016.
Gonzalo Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Computing Surveys, 54(2):29:1-29:31, 2022.
Takaaki Nishimoto, Shunsuke Kanda, and Yasuo Tabei. An optimal-time RLBWT construction in BWT-runs bounded space. In 49th International Colloquium on Automata, Languages, and Programming ICALP, pages 99:1-99:20, 2022.
Takaaki Nishimoto and Yasuo Tabei. Optimal-time queries on BWT-runs compressed indexes. In 48th International Colloquium on Automata, Languages, and Programming ICALP, pages 101:1-101:15, 2021.
Daisuke Okanohara and Kunihiko Sadakane. Practical entropy-compressed rank/select dictionary. In 9th Workshop on Algorithm Engineering and Experiments ALENEX, pages 60-70, 2007.
Anna Pagh, Rasmus Pagh, and Milan Ruzic. Linear probing with 5-wise independence. SIAM Review, 53(3):547-558, 2011.
Nicola Prezza. A framework of dynamic data structures for string processing. In 16th International Symposium on Experimental Algorithms SEA, pages 11:1-11:15, 2017.
Sebastiano Vigna. Broadword implementation of rank/select queries. In 7th International Workshop on Experimental Algorithms WEA, pages 154-168, 2008.
