Relative Compressed Reverse Suffix Array

Kulekci, Muhammed Oguzhan; Parthasarathi, Mano Prakash; Shah, Rahul; Thankachan, Sharma V.

doi:10.4230/LIPIcs.STACS.2026.62

Abstract

Suffix trees and suffix arrays are two fundamental data structures in the field of string algorithms. For a string (a.k.a. text or sequence) of length n over an alphabet of size σ, these structures typically require O(n log n) bits of space. The FM-index provides a compressed representation of the suffix array in ≈ n log σ bits, allowing for efficient queries on both the suffix array and its inverse array in near-logarithmic time. In certain applications, such as approximate pattern matching (i.e., with wildcards, mismatches, edits), there is a need to access the suffix array of a text as well as the suffix array of the text's reverse. Motivated by this, we explore the possibility of encoding the suffix array of the reversed text in a compact form, assuming the availability of the FM-index for the original text. Our first solution is an O(n)-bit (relative) encoding of the suffix array of the reversed text, with the time for decoding an entry being only O(log^* n) times that of decoding an entry in the text's suffix array using the FM-index. We then demonstrate how to reduce the space to O(n/κ) bits for a parameter κ, while the multiplicative factor in time becomes approximately O(κ log^* n + κ³). We can also support inverse suffix array and longest common extension queries on the reversed text. These results are achieved through careful and non-trivial application of various succinct data structure techniques.
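
To make the objects named in the abstract concrete, the following is a minimal, uncompressed sketch of the three queries on the reversed text: suffix array access, inverse suffix array access, and longest common extension. It is not the encoding proposed in the paper. It builds everything naively in O(n log n) bits and far from optimal time, and the function names (`suffix_array`, `inverse_suffix_array`, `lce`) and the sentinel-free example string are assumptions chosen for illustration only.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """Inverse suffix array: isa[pos] is the lexicographic rank of the suffix starting at pos."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def lce(text, i, j):
    """Longest common extension: length of the longest common prefix of text[i:] and text[j:]."""
    k = 0
    while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
        k += 1
    return k


text = "banana"          # illustrative text, no sentinel appended
rev = text[::-1]         # the reversed text, "ananab"

sa_text = suffix_array(text)           # suffix array of the original text
sa_rev = suffix_array(rev)             # suffix array of the reversed text
isa_rev = inverse_suffix_array(sa_rev) # inverse suffix array of the reversed text
lce_rev = lce(rev, 0, 2)               # LCE of "ananab" and "anab" on the reversed text
```

The paper's question is how little extra space is needed to answer the `sa_rev`, `isa_rev`, and `lce` queries when only an FM-index of `text` is available, rather than storing these arrays explicitly as the sketch does.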

