Constructing Suffixient Arrays Revisited

Bonizzoni, Paola; Gao, Younan; Riccardi, Brian

doi:10.4230/LIPIcs.CPM.2026.30

Abstract

Recently, Cenzato et al. proposed a new text index, called the suffixient array, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text T[1..n] is available. They show that, given the suffix array, the longest common prefix array, and the Burrows-Wheeler transform (BWT) of the reverse of T[1..n] over an alphabet {1,…,σ}, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. For the setting in which only a single pass over the arrays is allowed, they present an alternative construction algorithm running in O(n + r log σ) time, where r is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.
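The abstract does not restate what a suffixient array is, so the following Python sketch spells out the object by brute force, following our reading of the definition by Cenzato et al. A right-extension is a string Xc that occurs in T where X is right-maximal (followed by at least two distinct characters). A set S of positions is suffixient if every right-extension is a suffix of some prefix T[1..i] with i in S, and a smallest such set takes one end position per supermaximal extension (a right-extension that is not a proper suffix of another). This is not the one-pass algorithm of the paper, nor the SA/LCP/BWT-based construction; it runs in polynomial rather than linear time. The function names are hypothetical, T is assumed to end with a unique terminator '$', the co-lexicographic ordering of the array is an assumption, and `locate` scans the array linearly where a practical index would search it.

```python
from typing import Optional


def right_extensions(T: str) -> set[str]:
    """All strings X + c where X is right-maximal in T (followed by at least two
    distinct characters) and X + c occurs in T. Brute force, roughly cubic time."""
    followers: dict[str, set[str]] = {}
    for i in range(len(T)):
        for j in range(i, len(T)):
            # the (possibly empty) substring T[i:j] is followed by character T[j]
            followers.setdefault(T[i:j], set()).add(T[j])
    return {X + c for X, chars in followers.items() if len(chars) >= 2 for c in chars}


def smallest_suffixient_set(T: str) -> set[int]:
    """One 1-based end position per supermaximal extension, i.e. per
    right-extension that is not a proper suffix of another right-extension."""
    exts = right_extensions(T)
    supermax = [e for e in exts
                if not any(len(f) > len(e) and f.endswith(e) for f in exts)]
    # T.find(e) + len(e) is the 1-based end of the leftmost occurrence of e,
    # so Python's T[:i] corresponds to the prefix T[1..i] of the abstract
    return {T.find(e) + len(e) for e in supermax}


def suffixient_array(T: str) -> list[int]:
    """Assumed ordering: co-lexicographic order of the prefixes T[1..i]."""
    return sorted(smallest_suffixient_set(T), key=lambda i: T[:i][::-1])


def locate(P: str, T: str, sa: list[int]) -> Optional[int]:
    """1-based end position of one occurrence of P in T, or None.
    Uses only sa and random access to T; scans sa linearly (a simplification)
    instead of searching it."""
    end = 0  # the empty prefix of P "ends" at position 0
    for j in range(len(P)):
        if end < len(T) and T[end] == P[j]:
            end += 1  # the current occurrence extends by one character
            continue
        # Mismatch (or the occurrence ends T). With the unique '$' terminator,
        # if P[:j+1] occurs at all then P[:j] is followed by two distinct
        # characters, so P[:j+1] is a right-extension and some T[:i], i in sa,
        # ends with it.
        target = P[: j + 1]
        end = next((i for i in sa if T[:i].endswith(target)), None)
        if end is None:
            return None  # P[:j+1], hence P, does not occur in T
    return end
```

For T = "banana$" the sketch returns the set {1, 2, 5, 7}, one end position for each supermaximal extension b, a, anan and ana$, and `locate("nan", "banana$", suffixient_array("banana$"))` returns 5, the end of the occurrence of nan in T[1..5] = banan.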

Cite As

Paola Bonizzoni, Younan Gao, and Brian Riccardi. Constructing Suffixient Arrays Revisited. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 30:1-30:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026) https://doi.org/10.4230/LIPIcs.CPM.2026.30

Author Details

Paola Bonizzoni

Department of Computer Science, University of Milano-Bicocca, Italy

Younan Gao

Department of Computer Science, University of Milano-Bicocca, Italy

Brian Riccardi

Department of Computer Science, University of Milano-Bicocca, Italy

Funding

All authors have received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement PANGAIA No. 872539, as well as from grant MIUR 2022YRB97K (PINC, Pangenome Informatics: From Theory to Applications), funded by the European Union under the NextGenerationEU programme, Mission 4.

References

Mohamed Ibrahim Abouelhoda, Stefan Kurtz, and Enno Ohlebusch. Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms, 2(1):53-86, 2004. URL: https://doi.org/10.1016/S1570-8667(03)00065-0.
Alfred V. Aho and John E. Hopcroft. The Design and Analysis of Computer Algorithms. Pearson Education India, 1974.
Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, and Taher Mun. Prefix-free parsing for building big BWTs. Algorithms Mol. Biol., 14(1):13:1-13:15, 2019. URL: https://doi.org/10.1186/S13015-019-0148-5.
Michael Burrows and David J. Wheeler. A Block-sorting Lossless Data Compression Algorithm. Technical Report SRC-TR-124, Digital Equipment Corporation, Palo Alto, CA, USA, May 1994.
Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, and Nicola Prezza. Suffixient Arrays: a New Efficient Suffix Array Compression Technique, 2025. URL: https://doi.org/10.48550/arXiv.2407.18753.
Luc Devroye, Wojciech Szpankowski, and Bonita Rais. A Note on the Height of Suffix Trees. SIAM J. Comput., 21(1):48-53, 1992. URL: https://doi.org/10.1137/0221005.
Johannes Fischer, Veli Mäkinen, and Gonzalo Navarro. Faster entropy-bounded compressed suffix trees. Theoretical Computer Science, 410(51):5354-5364, 2009. URL: https://doi.org/10.1016/J.TCS.2009.09.012.
Michael L. Fredman and Dan E. Willard. Surpassing the Information Theoretic Bound with Fusion Trees. J. Comput. Syst. Sci., 47(3):424-436, 1993. URL: https://doi.org/10.1016/0022-0000(93)90040-4.
Alan Kuhnle, Taher Mun, Christina Boucher, Travis Gagie, Ben Langmead, and Giovanni Manzini. Efficient Construction of a Complete Index for Pan-Genomics Read Alignment. J. Comput. Biol., 27(4):500-513, 2020. URL: https://doi.org/10.1089/CMB.2019.0309.
Stefan Kurtz. Reducing the space requirement of suffix trees. Softw. Pract. Exp., 29(13):1149-1171, 1999. URL: https://doi.org/10.1002/(SICI)1097-024X(199911)29:13%3C1149::AID-SPE274%3E3.0.CO;2-O.
Heng Li and Richard Durbin. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinform., 26(5):589-595, 2010. URL: https://doi.org/10.1093/BIOINFORMATICS/BTP698.
Udi Manber and Eugene W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput., 22(5):935-948, 1993. URL: https://doi.org/10.1137/0222058.
Gonzalo Navarro, Giuseppe Romana, and Cristian Urbina. Smallest Suffixient Sets as a Repetitiveness Measure. CoRR, abs/2506.05638, 2025. URL: https://doi.org/10.48550/arXiv.2506.05638.
Massimiliano Rossi, Marco Oliva, Ben Langmead, Travis Gagie, and Christina Boucher. MONI: A pangenomic index for finding maximal exact matches. J. Comput. Biol., 29(2):169-187, 2022. URL: https://doi.org/10.1089/CMB.2021.0290.
Esko Ukkonen. On-Line Construction of Suffix Trees. Algorithmica, 14(3):249-260, 1995. URL: https://doi.org/10.1007/BF01206331.
Peter Weiner. Linear Pattern Matching Algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1-11. IEEE Computer Society, 1973. URL: https://doi.org/10.1109/SWAT.1973.13.