Linear Time Runs Over General Ordered Alphabets

Authors Jonas Ellert, Johannes Fischer



File

LIPIcs.ICALP.2021.63.pdf
  • Filesize: 0.95 MB
  • 16 pages

Author Details

Jonas Ellert
  • Department of Computer Science, Technical University of Dortmund, Germany
Johannes Fischer
  • Department of Computer Science, Technical University of Dortmund, Germany

Cite As

Jonas Ellert and Johannes Fischer. Linear Time Runs Over General Ordered Alphabets. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 63:1-63:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.ICALP.2021.63

Abstract

A run in a string is a maximal periodic substring. For example, the string bananatree contains the runs anana = (an)^{5/2} and ee = e^2. There are fewer than n runs in any length-n string, and computing all runs for a string over a linearly sortable alphabet takes 𝒪(n) time (Bannai et al., SIAM J. Comput. 2017). Kosolobov conjectured that there also exists a linear-time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an 𝒪(nα(n)) time algorithm (where α(n) is the extremely slowly growing inverse Ackermann function). We show how to achieve 𝒪(n) time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov’s conjecture. This also positively answers the question, open for at least 29 years, of whether square-freeness can be tested in linear time over general ordered alphabets (Breslauer, PhD thesis, Columbia University 1992).
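
To make the definition of a run concrete, the following minimal sketch enumerates all runs of a string by brute force and reproduces the bananatree example. It is not the algorithm of the paper: it takes quadratic time, does not use the Lyndon array, and uses only equality tests between characters. The function names naive_runs and is_square_free are hypothetical and chosen for illustration. A run is reported as (start, end, period) with a half-open interval [start, end).

```python
def naive_runs(s):
    """Return all runs of s as (start, end, period), with s[start:end] the run.

    Brute-force illustration only: O(n^2) time, not the linear-time method.
    """
    n = len(s)
    runs = []
    seen = set()  # intervals already reported with a smaller period
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with s[k] == s[k + p].
            k = i
            while k + p < n and s[k] == s[k + p]:
                k += 1
            start, end = i, k + p  # s[start:end] has period p and is maximal
            # Keep it only if it contains at least two full periods and
            # p is its smallest period (no smaller period reported it).
            if end - start >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
            i = k + 1
    return runs


def is_square_free(s):
    """A string contains a square exactly when it contains a run."""
    return not naive_runs(s)


if __name__ == "__main__":
    text = "bananatree"
    for start, end, p in naive_runs(text):
        print(text[start:end], "period", p, "exponent", (end - start) / p)
    print(is_square_free("abcacb"))
```

Periods are processed in increasing order, so an interval that already appeared with a smaller period is skipped. This relies on the fact that a substring of length at least 2p whose smallest period is q < p spans the same maximal interval for q as for p.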

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • String algorithms
  • Lyndon array
  • runs
  • squares
  • longest common extension
  • general ordered alphabets
  • combinatorics on words

References

  1. Arne Andersson, Torben Hagerup, Stefan Nilsson, and Rajeev Raman. Sorting in linear time? Journal of Computer and System Sciences, 57(1):74-93, 1998. URL: https://doi.org/10.1006/jcss.1998.1580.
  2. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The “runs” theorem. SIAM Journal on Computing, 46(5):1501-1514, 2017. URL: https://doi.org/10.1137/15M1011032.
  3. Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg. Space efficient construction of Lyndon arrays in linear time. In Proceedings of the 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), pages 14:1-14:18, Saarbrücken, Germany, July 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.14.
  4. Dany Breslauer. Efficient String Algorithmics. PhD thesis, Columbia University, New York, USA, 1992. URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.9146.
  5. Helen Budworth and Cynthia T. McMurray. A Brief History of Triplet Repeat Diseases, volume 1010 of Methods in Molecular Biology, pages 3-17. Springer, 2013. URL: https://doi.org/10.1007/978-1-62703-411-1_1.
  6. M. Crochemore, C.S. Iliopoulos, M. Kubica, J. Radoszewski, W. Rytter, and T. Waleń. Extracting powers and periods in a word from its runs structure. Theoretical Computer Science, 521:29-41, 2014. URL: https://doi.org/10.1016/j.tcs.2013.11.018.
  7. Maxime Crochemore and Lucian Ilie. Maximal repetitions in strings. Journal of Computer and System Sciences, 74(5):796-807, 2008. URL: https://doi.org/10.1016/j.jcss.2007.09.003.
  8. Maxime Crochemore, Lucian Ilie, and Wojciech Rytter. Repetitions in strings: Algorithms and combinatorics. Theoretical Computer Science, 410(50):5227-5235, 2009. URL: https://doi.org/10.1016/j.tcs.2009.08.024.
  9. Maxime Crochemore, Lucian Ilie, and Liviu Tinta. The “runs” conjecture. Theoretical Computer Science, 412(27):2931-2941, 2011. URL: https://doi.org/10.1016/j.tcs.2010.06.019.
  10. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Ritu Kundu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Near-optimal computation of runs over general alphabet via non-crossing LCE queries. In Proceedings of the 23rd International Symposium on String Processing and Information Retrieval (SPIRE 2016), pages 22-34, Beppu, Japan, October 2016. URL: https://doi.org/10.1007/978-3-319-46049-9_3.
  11. Johannes Fischer and Volker Heun. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006), pages 36-48, Barcelona, Spain, 2006. URL: https://doi.org/10.1007/11780441_5.
  12. Johannes Fischer, Stepan Holub, Tomohiro I, and Moshe Lewenstein. Beyond the runs theorem. In Costas S. Iliopoulos, Simon J. Puglisi, and Emine Yilmaz, editors, String Processing and Information Retrieval - 22nd International Symposium, SPIRE 2015, London, UK, September 1-4, 2015, Proceedings, volume 9309 of Lecture Notes in Computer Science, pages 277-286. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-23826-5_27.
  13. Frantisek Franek, A. S. M. Sohidull Islam, Mohammad Sohel Rahman, and William F. Smyth. Algorithms to compute the Lyndon array. In Proceedings of the Prague Stringology Conference 2016 (PSC 2016), pages 172-184, Prague, Czech Republic, 2016. URL: http://www.stringology.org/event/2016/p15.html.
  14. Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Waleń. Faster longest common extension queries in strings over general alphabets. In Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), pages 5:1-5:13, Tel Aviv, Israel, 2016. URL: https://doi.org/10.4230/LIPIcs.CPM.2016.5.
  15. Torben Hagerup. Sorting and searching on the word RAM. In Proceedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science (STACS 98), pages 366-398, Paris, France, February 1998. URL: https://doi.org/10.1007/BFb0028575.
  16. Yijie Han and M. Thorup. Integer sorting in 𝒪(n √log log n) expected time and linear space. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002), pages 135-144, Vancouver, Canada, 2002. URL: https://doi.org/10.1109/SFCS.2002.1181890.
  17. Stepan Holub. Prefix frequency of lost positions. Theoretical Computer Science, 684:43-52, 2017. URL: https://doi.org/10.1016/j.tcs.2017.01.026.
  18. R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS 1999), pages 596-604, New York, NY, USA, 1999. URL: https://doi.org/10.1109/SFFCS.1999.814634.
  19. Dmitry Kosolobov. Lempel-Ziv factorization may be harder than computing all runs. In Proceedings of the 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), pages 582-593, Munich, Germany, 2015. URL: https://doi.org/10.4230/LIPIcs.STACS.2015.582.
  20. Dmitry Kosolobov. Computing runs on a general alphabet. Information Processing Letters, 116(3):241-244, 2016. URL: https://doi.org/10.1016/j.ipl.2015.11.016.
  21. R. C. Lyndon and M. P. Schützenberger. The equation a^m = b^nc^p in a free group. Michigan Mathematical Journal, 9(4):289-298, 1962. URL: https://doi.org/10.1307/mmj/1028998766.
  22. Michael G. Main and Richard J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5(3):422-432, 1984. URL: https://doi.org/10.1016/0196-6774(84)90021-X.
  23. Wataru Matsubara, Kazuhiko Kusano, Hideo Bannai, and Ayumi Shinohara. A series of run-rich strings. In Adrian Horia Dediu, Armand Mihai Ionescu, and Carlos Martín-Vide, editors, Proceedings of the 3rd International Conference on Language and Automata Theory and Applications (LATA 2009), pages 578-587, Tarragona, Spain, 2009. URL: https://doi.org/10.1007/978-3-642-00982-2_49.
  24. Simon J. Puglisi, Jamie Simpson, and W.F. Smyth. How many runs can a string contain? Theoretical Computer Science, 401(1):165-171, 2008. URL: https://doi.org/10.1016/j.tcs.2008.04.020.
  25. Wojciech Rytter. The number of runs in a string: Improved analysis of the linear upper bound. In Proceedings of the 24th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2006), pages 184-195, Marseille, France, 2006. URL: https://doi.org/10.1007/11672142_14.
  26. Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing runs on a trie. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019), volume 128, pages 23:1-23:11, Pisa, Italy, June 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.23.