Space Efficient Construction of Lyndon Arrays in Linear Time

Authors: Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, Eva Rotenberg




Author Details

Philip Bille
  • DTU Compute, Technical University of Denmark, Lyngby, Denmark
Jonas Ellert
  • Department of Computer Science, Technical University of Dortmund, Germany
Johannes Fischer
  • Department of Computer Science, Technical University of Dortmund, Germany
Inge Li Gørtz
  • DTU Compute, Technical University of Denmark, Lyngby, Denmark
Florian Kurpicz
  • Department of Computer Science, Technical University of Dortmund, Germany
J. Ian Munro
  • Cheriton School of Computer Science, University of Waterloo, Canada
Eva Rotenberg
  • DTU Compute, Technical University of Denmark, Lyngby, Denmark

Cite As

Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg. Space Efficient Construction of Lyndon Arrays in Linear Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ICALP.2020.14

Abstract

Given a string S of length n, its Lyndon array identifies for each suffix S[i..n] the next lexicographically smaller suffix S[j..n], i.e., the minimal index j > i with S[i..n] ≻ S[j..n]. Apart from its plain (n log₂ n)-bit array representation, the Lyndon array can also be encoded as a succinct parentheses sequence that requires only 2n bits of space. While linear-time construction algorithms for both representations exist, it was previously unknown whether the same time bound can be achieved with less than Ω(n log₂ n) bits of additional working space. We show that, in fact, o(n) additional bits suffice to compute the succinct 2n-bit version of the Lyndon array in linear time. For the plain (n log₂ n)-bit version, only 𝒪(1) additional words are needed to achieve linear time. Our space-efficient construction algorithm makes the Lyndon array more accessible as a fundamental data structure in applications like full-text indexing.
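To make the two representations in the abstract concrete, here is a small illustrative sketch. It is not the authors' linear-time, space-efficient algorithm. It is a plain stack-based "nearest smaller value" scan that compares suffixes naively, so it can take quadratic time in the worst case. The following are assumptions of this sketch: indices are 0-based; the value n means "no smaller suffix exists"; a proper prefix counts as lexicographically smaller, as if the text ended in a smallest sentinel. The function names `nss_with_parentheses` and `decode_parentheses` are hypothetical. The 2n-character parentheses string records the stack trace: one "(" per push and one ")" per pop. This is one natural 2n-bit encoding, and the paper's exact bit layout may differ.

```python
from __future__ import annotations


def nss_with_parentheses(s: str) -> tuple[list[int], str]:
    """Return (nss, bps): nss[i] = min j > i with s[j:] < s[i:] (or n if none),
    and bps = a 2n-character parentheses string from which nss can be recovered.
    Naive suffix comparisons: a sketch, not a linear-time construction."""
    n = len(s)
    nss = [n] * n          # default: no lexicographically smaller suffix to the right
    stack = []             # indices whose next smaller suffix is still unknown
    bits = []
    for i in range(n):
        # Every stacked suffix larger than s[i:] has its next smaller suffix at i.
        while stack and s[stack[-1]:] > s[i:]:
            nss[stack.pop()] = i
            bits.append(")")   # one ")" per pop
        stack.append(i)
        bits.append("(")       # one "(" per push
    bits.extend(")" * len(stack))  # remaining indices keep nss = n
    return nss, "".join(bits)


def decode_parentheses(bps: str) -> list[int]:
    """Rebuild nss by replaying the push/pop trace encoded in bps."""
    n = len(bps) // 2
    nss = [n] * n
    stack = []
    i = 0                  # index of the next push; pops before it get nss = i
    for c in bps:
        if c == "(":
            stack.append(i)
            i += 1
        else:
            nss[stack.pop()] = i
    return nss


if __name__ == "__main__":
    nss, bps = nss_with_parentheses("banana")
    lyndon = [j - i for i, j in enumerate(nss)]  # lengths of longest Lyndon words
    print(nss, lyndon, bps)
```

For "banana", the next-smaller-suffix array is [1, 3, 3, 5, 5, 6], so the longest Lyndon words starting at each position have lengths [1, 2, 1, 2, 1, 1]. Every index is pushed and popped exactly once, which is why the parentheses string has exactly 2n characters and is balanced.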

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • String algorithms
  • string suffixes
  • succinct data structures
  • Lyndon word
  • Lyndon array
  • nearest smaller values
  • nearest smaller suffixes
