Lyndon Words Accelerate Suffix Sorting

Authors Nico Bertram, Jonas Ellert , Johannes Fischer



PDF
Thumbnail PDF

File

LIPIcs.ESA.2021.15.pdf
  • Filesize: 1.08 MB
  • 13 pages

Document Identifiers

Author Details

Nico Bertram
  • Department of Computer Science, Technische Universität Dortmund, Germany
Jonas Ellert
  • Department of Computer Science, Technische Universität Dortmund, Germany
Johannes Fischer
  • Department of Computer Science, Technische Universität Dortmund, Germany

Cite AsGet BibTex

Nico Bertram, Jonas Ellert, and Johannes Fischer. Lyndon Words Accelerate Suffix Sorting. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ESA.2021.15

Abstract

Suffix sorting is arguably the most fundamental building block in string algorithmics, like regular sorting in the broader field of algorithms. It is thus not surprising that the literature is full of algorithms for suffix sorting, in particular focusing on their practicality. However, the advances on practical suffix sorting stalled with the emergence of the DivSufSort algorithm more than 10 years ago, which, up to date, has remained the fastest suffix sorter. This article shows how properties of Lyndon words can be exploited algorithmically to accelerate suffix sorting again. Our new algorithm is 6-19% faster than DivSufSort on real-world texts, and up to three times as fast on artificial repetitive texts. It can also be parallelized, where similar speedups can be observed. Thus, we make the first advances in practical suffix sorting after more than a decade of standstill.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Suffix array
  • suffix sorting
  • Lyndon words
  • string algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. M. Axtmann, S. Witt, D. Ferizovic, and P. Sanders. In-place parallel super scalar samplesort (IPS⁴o). In 25th European Symposium on Algorithms (ESA), pages 9:1-9:14, 2017. URL: https://arxiv.org/abs/1705.02257.
  2. Johannes Bahne, Nico Bertram, Marvin Böcker, Jonas Bode, Johannes Fischer, Hermann Foot, Florian Grieskamp, Florian Kurpicz, Marvin Löbel, Oliver Magiera, Rosa Pink, David Piper, and Christopher Poeplau. SACABench: Benchmarking suffix array construction. In Proceedings of the 26th International Symposium on String Processing and Information Retrieval (SPIRE 2019), pages 407-416, Segovia, Spain, 2019. URL: https://doi.org/10.1007/978-3-030-32686-9_29.
  3. Uwe Baier. Linear-time suffix sorting - A new approach for suffix array construction. Master’s thesis, Ulm University, 2015. Google Scholar
  4. Uwe Baier. Linear-time suffix sorting - A new approach for suffix array construction. In Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), Tel Aviv, Israel, 2016. URL: https://doi.org/10.4230/LIPIcs.CPM.2016.23.
  5. Jean Pierre Duval. Factorizing words over an ordered alphabet. Journal of Algorithms, 4(4):363-381, 1983. URL: https://doi.org/10.1016/0196-6774(83)90017-2.
  6. Johannes Fischer and Florian Kurpicz. Dismantling DivSufSort. In Proceedings of the 21st Prague Stringology Conference (PSC 2019), pages 62-76, Prague, Czech Republic, August 2017. URL: http://www.stringology.org/event/2017/p07.html.
  7. Frantisek Franek, A. S. M. Sohidull Islam, Mohammad Sohel Rahman, and William F. Smyth. Algorithms to compute the Lyndon array. In Proceedings of the 20th Prague Stringology Conference (PSC 2016), pages 172-184, Prague, Czech Republic, August 2016. URL: http://www.stringology.org/event/2016/p15.html.
  8. Frantisek Franek, Asma Paracha, and William F. Smyth. The linear equivalence of the suffix array and the partially sorted Lyndon array. In Proceedings of the 21st Prague Stringology Conference (PSC 2017), pages 77-84, Prague, Czech Republic, August 2017. URL: http://www.stringology.org/event/2017/p08.html.
  9. Keisuke Goto. Optimal time and space construction of suffix arrays and LCP arrays for integer alphabets. In Proceedings of the 23rd Prague Stringology Conference (PSC 2019), pages 111-125, Prague, Czech Republic, 2019. URL: http://www.stringology.org/event/2019/p11.html.
  10. Torben Hagerup. Sorting and searching on the word RAM. In Proceedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science (STACS 1998), pages 366-398, Paris, France, 1998. URL: https://doi.org/10.1007/BFb0028575.
  11. Florian Kurpicz. Parallel Text Index Construction. PhD thesis, Technical University of Dortmund, 2020. Google Scholar
  12. Julian Labeit, Julian Shun, and Guy E. Blelloch. Parallel lightweight wavelet tree, suffix array and FM-index construction. Journal of Discrete Algorithms, 43:2-17, 2017. URL: https://doi.org/10.1016/j.jda.2017.04.001.
  13. Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Annual Symposium on Discrete Algorithms (SODA 1990), pages 319-327, San Francisco, California, USA, 1990. URL: https://dl.acm.org/doi/10.5555/320176.320218.
  14. Simon J. Puglisi, William F. Smyth, and Andrew Turpin. A taxonomy of suffix array construction algorithms. ACM Computing Surveys, 39(2):4, 2007. URL: https://doi.org/10.1145/1242471.1242472.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail