Back-To-Front Online Lyndon Forest Construction

Authors Golnaz Badkobeh , Maxime Crochemore , Jonas Ellert , Cyril Nicaud



PDF
Thumbnail PDF

File

LIPIcs.CPM.2022.13.pdf
  • Filesize: 0.93 MB
  • 23 pages

Document Identifiers

Author Details

Golnaz Badkobeh
  • Goldsmiths University of London, UK
Maxime Crochemore
  • Univ. Gustave Eiffel, Champs-sur-Marne, France
  • King’s College London, UK
Jonas Ellert
  • Department of Computer Science, Technical University of Dortmund, Germany
Cyril Nicaud
  • LIGM, Univ. Gustave Eiffel, Champs-sur-Marne, France

Cite As Get BibTex

Golnaz Badkobeh, Maxime Crochemore, Jonas Ellert, and Cyril Nicaud. Back-To-Front Online Lyndon Forest Construction. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 13:1-13:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.CPM.2022.13

Abstract

A Lyndon word is a word that is lexicographically smaller than all of its non-trivial rotations (e.g. ananas is a Lyndon word; banana is not a Lyndon word due to its smaller rotation abanan). The Lyndon forest (or equivalently Lyndon table) identifies maximal Lyndon factors of a word, and is of great combinatoric interest, e.g. when finding maximal repetitions in words. While optimal linear time algorithms for computing the Lyndon forest are known, none of them work in an online manner. We present algorithms that compute the Lyndon forest of a word in a reverse online manner, processing the input word from back to front. We assume a general ordered alphabet, i.e. the only elementary operations on symbols are comparisons of the form less-equal-greater. We start with a naive algorithm and show that, despite its quadratic worst-case behaviour, it already takes expected linear time on words drawn uniformly at random. We then introduce a much more sophisticated algorithm that takes linear time in the worst case. It borrows some ideas from the offline algorithm by Bille et al. (ICALP 2020), combined with new techniques that are necessary for the reverse online setting. While the back-to-front approach for this computation is rather natural (see Franek and Liut, PSC 2019), the steps required to achieve linear time are surprisingly intricate. We envision that our algorithm will be useful for the online computation of maximal repetitions in words.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Mathematics of computing → Combinatorics on words
  • Mathematics of computing → Combinatorial algorithms
Keywords
  • Lyndon factorisation
  • Lyndon forest
  • Lyndon table
  • Lyndon array
  • right Lyndon tree
  • Cartesian tree
  • standard factorisation
  • online algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Golnaz Badkobeh and Maxime Crochemore. Left Lyndon tree construction. In Jan Holub and Jan Zdárek, editors, Prague Stringology Conference 2020, Prague, Czech Republic, August 31-September 2, 2020, pages 84-95. Czech Technical University in Prague, Faculty of Information Technology, Department of Theoretical Computer Science, 2020. https://arxiv.org/abs/2011.12742. Google Scholar
  2. Uwe Baier. Linear-time Suffix Sorting - A New Approach for Suffix Array Construction. In Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), pages 23:1-23:12, 2016. URL: https://doi.org/10.4230/LIPIcs.CPM.2016.23.
  3. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The "runs" theorem. SIAM J. Comput., 46(5):1501-1514, 2017. URL: https://doi.org/10.1137/15M1011032.
  4. Jérémy Barbay, Johannes Fischer, and Gonzalo Navarro. LRM-trees: Compressed indices, adaptive sorting, and compressed permutations. Theor. Comput. Sci., 459:26-41, 2012. URL: https://doi.org/10.1016/j.tcs.2012.08.010.
  5. Frédérique Bassino, Julien Clément, and Cyril Nicaud. The standard factorization of Lyndon words: an average point of view. Discret. Math., 290(1):1-25, 2005. URL: https://doi.org/10.1016/j.disc.2004.11.002.
  6. Nico Bertram, Jonas Ellert, and Johannes Fischer. Lyndon words accelerate suffix sorting. In Petra Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, 29th Annual European Symposium on Algorithms, ESA 2021, September 6-8, 2021, Lisbon, Portugal (Virtual Conference), volume 204 of LIPIcs, pages 15:1-15:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ESA.2021.15.
  7. Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg. Space efficient construction of Lyndon arrays in linear time. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 14:1-14:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.14.
  8. Dany Breslauer, Tao Jiang, and Zhigen Jiang. Rotations of periodic strings and short superstrings. J. Algorithms, 24(2):340-353, 1997. URL: https://doi.org/10.1006/jagm.1997.0861.
  9. Philippe Chassaing and Lucas Mercier. The height of the Lyndon tree. Discrete Mathematics & Theoretical Computer Science, 2013. Google Scholar
  10. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages. Google Scholar
  11. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Ritu Kundu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Near-optimal computation of runs over general alphabet via non-crossing LCE queries. In Shunsuke Inenaga, Kunihiko Sadakane, and Tetsuya Sakai, editors, String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan, October 18-20, 2016, Proceedings, volume 9954 of Lecture Notes in Computer Science, pages 22-34, 2016. URL: https://doi.org/10.1007/978-3-319-46049-9_3.
  12. Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. The maximal number of cubic runs in a word. J. Comput. Syst. Sci., 78(6):1828-1836, 2012. URL: https://doi.org/10.1016/j.jcss.2011.12.005.
  13. Maxime Crochemore, Thierry Lecroq, and Wojciech Rytter. 125 Problems in Text Algorithms. Cambridge University Press, 2021. 334 pages. Google Scholar
  14. Maxime Crochemore and Dominique Perrin. Two-way string-matching. J. Assoc. Comput. Mach., 38(3):651-675, 1991. Google Scholar
  15. Maxime Crochemore and Luís M. S. Russo. Cartesian and Lyndon trees. Theoretical Computer Science, 806:1-9, February 2020. Google Scholar
  16. Jean-Pierre Duval. Factorizing words over an ordered alphabet. J. Algorithms, 4(4):363-381, 1983. URL: https://doi.org/10.1016/0196-6774(83)90017-2.
  17. Jonas Ellert and Johannes Fischer. Linear time runs over general ordered alphabets. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, July 12-16, 2021, Glasgow, Scotland (Virtual Conference), volume 198 of LIPIcs, pages 63:1-63:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.63.
  18. Johannes Fischer and Volker Heun. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Moshe Lewenstein and Gabriel Valiente, editors, Combinatorial Pattern Matching, 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings, volume 4009 of Lecture Notes in Computer Science, pages 36-48. Springer, 2006. URL: https://doi.org/10.1007/11780441_5.
  19. Frantisek Franek, A. S. M. Sohidull Islam, Mohammad Sohel Rahman, and William F. Smyth. Algorithms to compute the Lyndon array. In Jan Holub and Jan Zdárek, editors, Proceedings of the Prague Stringology Conference 2016, Prague, Czech Republic, August 29-31, 2016, pages 172-184. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2016. URL: http://www.stringology.org/event/2016/p15.html.
  20. Frantisek Franek and Michael Liut. Algorithms to compute the Lyndon array revisited. In Jan Holub and Jan Zdárek, editors, Prague Stringology Conference 2019, Prague, Czech Republic, August 26-28, 2019, pages 16-28. Czech Technical University in Prague, Faculty of Information Technology, Department of Theoretical Computer Science, 2019. URL: http://www.stringology.org/event/2019/p03.html.
  21. Christophe Hohlweg and Christophe Reutenauer. Lyndon words, permutations and trees. Theor. Comput. Sci., 307(1):173-178, 2003. URL: https://doi.org/10.1016/S0304-3975(03)00099-9.
  22. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages 596-604. IEEE Computer Society, 1999. URL: https://doi.org/10.1109/SFFCS.1999.814634.
  23. Dmitry Kosolobov. Computing runs on a general alphabet. Inf. Process. Lett., 116(3):241-244, 2016. URL: https://doi.org/10.1016/j.ipl.2015.11.016.
  24. M. Lothaire. Combinatorics on Words. Addison-Wesley, 1983. Reprinted in 1997. Google Scholar
  25. Felipe A. Louza, Sabrina Mantaci, Giovanni Manzini, Marinella Sciortino, and Guilherme P. Telles. Inducing the Lyndon array. In Nieves R. Brisaboa and Simon J. Puglisi, editors, String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, Segovia, Spain, October 7-9, 2019, Proceedings, volume 11811 of Lecture Notes in Computer Science, pages 138-151. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-32686-9_10.
  26. R. C. Lyndon. On Burnside problem I. Trans. Amer. Math. Soc., 77:202-215, 1954. Google Scholar
  27. Udi Manber and Gene Myers. Suffix arrays: a new method for on-line string searches. In David S. Johnson, editor, Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, 22-24 January 1990, San Francisco, California, USA, pages 319-327. SIAM, 1990. URL: http://dl.acm.org/citation.cfm?id=320176.320218.
  28. Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. Sorting suffixes of a text via its Lyndon factorization. In Jan Holub and Jan Zdárek, editors, Proceedings of the Prague Stringology Conference 2013, Prague, Czech Republic, September 2-4, 2013, pages 119-127. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2013. URL: http://www.stringology.org/event/2013/p11.html.
  29. Joe Sawada and Frank Ruskey. Generating Lyndon brackets.: An addendum to: Fast algorithms to generate necklaces, unlabeled necklaces and irreducible polynomials over GF(2). J. Algorithms, 46(1):21-26, 2003. URL: https://doi.org/10.1016/S0196-6774(02)00286-9.
  30. Jean Vuillemin. A unifying look at data structures. Commun. ACM, 23(4):229-239, 1980. URL: https://doi.org/10.1145/358841.358852.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail