Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Authors Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2017.33.pdf
  • Filesize: 0.54 MB
  • 12 pages

Document Identifiers

Author Details

Yuta Fujishige
Yuto Nakashima
Shunsuke Inenaga
Hideo Bannai
Masayuki Takeda

Cite As Get BibTex

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 33:1-33:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ISAAC.2017.33

Abstract

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding.
Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2.
We also show an algorithm for computing all maximal repetitions in O(m \alpha(m)) time and O(m) space, where \alpha denotes the inverse Ackermann function.

Subject Classification

Keywords
  • maximal repetitions,run length encoding

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Anthony Bagnall, Chotirat "Ann" Ratanamahatana, Eamonn Keogh, Stefano Lonardi, and Gareth Janacek. A bit level representation for time series data mining with shape based similarity. Data Mining and Knowledge Discovery, 13(1):11-40, 2006. Google Scholar
  2. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. A new characterization of maximal repetitions by lyndon trees. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 562-571. SIAM, 2015. URL: http://dx.doi.org/10.1137/1.9781611973730.38.
  3. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The “runs” theorem. SIAM Journal on Computing, 46(5):1501-1514, 2017. URL: http://dx.doi.org/10.1137/15M1011032.
  4. Frédérique Bassino, Julien Clément, and Cyril Nicaud. The standard factorization of Lyndon words: an average point of view. Discrete Mathematics, 290(1):1-25, 2005. Google Scholar
  5. K. T. Chen, R. H. Fox, and R. C. Lyndon. Free differential calculus. iv. the quotient groups of the lower central series. Annals of Mathematics, 68(1):81-95, 1958. Google Scholar
  6. Kuan-Yu Chen and Kun-Mao Chao. A fully compressed algorithm for computing the edit distance of run-length encoded strings. Algorithmica, 65(2):354-370, 2013. URL: http://dx.doi.org/10.1007/s00453-011-9592-4.
  7. Kuan-Yu Chen, Ping-Hui Hsu, and Kun-Mao Chao. Efficient retrieval of approximate palindromes in a run-length encoded string. Theor. Comput. Sci., 432:28-37, 2012. URL: http://dx.doi.org/10.1016/j.tcs.2012.01.023.
  8. Maxime Crochemore and Lucian Ilie. Computing longest previous factor in linear time and applications. Inf. Process. Lett., 106(2):75-80, 2008. URL: http://dx.doi.org/10.1016/j.ipl.2007.10.006.
  9. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Ritu Kundu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Near-optimal computation of runs over general alphabet via non-crossing LCE queries. In Proc. SPIRE 2016, pages 22-34, 2016. Google Scholar
  10. Jean-Pierre Duval. Factorizing words over an ordered alphabet. J. Algorithms, 4(4):363-381, 1983. Google Scholar
  11. Johannes Fischer and Volker Heun. Theoretical and practical improvements on the rmq-problem, with applications to LCA and LCE. In Moshe Lewenstein and Gabriel Valiente, editors, Combinatorial Pattern Matching, 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings, volume 4009 of Lecture Notes in Computer Science, pages 36-48. Springer, 2006. URL: http://dx.doi.org/10.1007/11780441_5.
  12. Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster longest common extension queries in strings over general alphabets. In Proc. CPM 2016, pages 5:1-5:13, 2016. Google Scholar
  13. Sukhpal Singh Ghuman, Emanuele Giaquinta, and Jorma Tarhio. Alternative algorithms for lyndon factorization. In Jan Holub and Jan Zdárek, editors, Proceedings of the Prague Stringology Conference 2014, Prague, Czech Republic, September 1-3, 2014, pages 169-178. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2014. URL: http://www.stringology.org/event/2014/p16.html.
  14. Roman Kolpakov, Ghizlane Bana, and Gregory Kucherov. mreps: Efficient and flexible detection of tandem repeats in DNA. Nucleic acids research, 31(13):3672-3678, July 2003. Google Scholar
  15. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages 596-604. IEEE Computer Society, 1999. URL: http://dx.doi.org/10.1109/SFFCS.1999.814634.
  16. Dmitry Kosolobov. Lempel-ziv factorization may be harder than computing all runs. In Ernst W. Mayr and Nicolas Ollinger, editors, 32nd International Symposium on Theoretical Aspects of Computer Science, STACS 2015, March 4-7, 2015, Garching, Germany, volume 30 of LIPIcs, pages 582-593. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2015.582.
  17. Dmitry Kosolobov. Computing runs on a general alphabet. Inf. Process. Lett., 116(3):241-244, 2016. Google Scholar
  18. Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS computation via RLE. In Combinatorial Pattern Matching, 28th Annual Symposium, CPM 2017, Warsaw, Poland, July 4-6, 2017, Proceedings, 2017. in press. URL: http://arxiv.org/abs/1703.04954.
  19. Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2):107-144, 2007. Google Scholar
  20. Jia Jie Liu, Yue-Li Wang, and Yu-shan Chiu. Constrained longest common subsequences with run-length-encoded strings. Comput. J., 58(5):1074-1084, 2015. URL: http://dx.doi.org/10.1093/comjnl/bxu012.
  21. M. Lothaire. Combinatorics on Words. Addison-Wesley, 1983. Google Scholar
  22. Roger C. Lyndon. On Burnside’s problem. Transactions of the American Mathematical Society, 77(2):202-202, February 1954. Google Scholar
  23. Michael G. Main. Detecting leftmost maximal periodicities. Discrete Applied Mathematics, 25(1-2):145-153, 1989. URL: http://dx.doi.org/10.1016/0166-218X(89)90051-6.
  24. Michael G. Main and Richard J. Lorentz. An O(nlog n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5(3):422-432, 1984. Google Scholar
  25. Axel Thue. Über unendliche zeichenreihen. Christiana Videnskabs Selskabs Skrifter, I. Math. naturv. Klasse, 7, 1906. Google Scholar
  26. Jun-ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster compact on-line lempel-ziv factorization. In Ernst W. Mayr and Natacha Portier, editors, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), STACS 2014, March 5-8, 2014, Lyon, France, volume 25 of LIPIcs, pages 675-686. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2014.675.
  27. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337-349, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail