Lempel-Ziv Factorization May Be Harder Than Computing All Runs

Author Dmitry Kosolobov



PDF
Thumbnail PDF

File

LIPIcs.STACS.2015.582.pdf
  • Filesize: 0.61 MB
  • 12 pages

Document Identifiers

Author Details

Dmitry Kosolobov

Cite AsGet BibTex

Dmitry Kosolobov. Lempel-Ziv Factorization May Be Harder Than Computing All Runs. In 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 30, pp. 582-593, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)
https://doi.org/10.4230/LIPIcs.STACS.2015.582

Abstract

The complexity of computing the Lempel-Ziv decomposition and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over ordered alphabet. It is known that both these problems can be solved by RAM algorithms in O(n\log\sigma) time, where n is the length of the input string and \sigma is the number of distinct letters in it. We prove an \Omega(n\log\sigma) lower bound on the number of comparisons required to construct the Lempel-Ziv decomposition and thereby conclude that a popular technique of computation of runs using the Lempel-Ziv decomposition cannot achieve an o(n\log\sigma) time bound. In contrast with this, we exhibit an O(n) decision tree algorithm finding all runs in a string. Therefore, in the decision tree model the runs problem is easier than the Lempel-Ziv decomposition. Thus we support the conjecture that there is a linear RAM algorithm finding all runs.
Keywords
  • Lempel-Ziv factorization
  • runs
  • repetitions
  • decision tree
  • lower bounds

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2(1):53-86, 2004. Google Scholar
  2. G. Badkobeh, M. Crochemore, and C. Toopsuwan. Computing the maximal-exponent repeats of an overlap-free string in linear time. In String Processing and Information Retrieval, pages 61-72. Springer, 2012. Google Scholar
  3. H. Bannai, T. I, S. Inenaga, Y. Nakashima, M. Takeda, and K. Tsuruta. The "runs" theorem. arXiv preprint arXiv:1406.0263v4, 2014. Google Scholar
  4. D. Breslauer. Efficient string algorithmics. PhD thesis, Columbia University, 1992. Google Scholar
  5. G. Chen, S. J. Puglisi, and W. F. Smyth. Lempel-ziv factorization using less time & space. Mathematics in Computer Science, 1(4):605-623, 2008. Google Scholar
  6. M. Crochemore. Transducers and repetitions. Theoretical Computer Science, 45:63-86, 1986. Google Scholar
  7. M. Crochemore, L. Ilie, and W. F. Smyth. A simple algorithm for computing the lempel-ziv factorization. In Data Compression Conference (DCC'08), pages 482-488. IEEE, 2008. Google Scholar
  8. M. Crochemore, L. Ilie, and L. Tinta. The "runs" conjecture. Theoretical Computer Science, 412(27):2931-2941, 2011. Google Scholar
  9. M. Crochemore, M. Kubica, J. Radoszewski, W. Rytter, and T. Waleń. On the maximal sum of exponents of runs in a string. Journal of Discrete Algorithms, 14:29-36, 2012. Google Scholar
  10. E. R. Fiala and D. H. Greene. Data compression with finite windows. Communications of the ACM, 32(4):490-505, 1989. Google Scholar
  11. N. J. Fine and H. S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. Google Scholar
  12. J. Karkkainen, D. Kempa, and S. J. Puglisi. Lempel-ziv parsing in external memory. In Data Compression Conference (DCC'14), pages 153-162. IEEE, 2014. Google Scholar
  13. R. Kolpakov. On primary and secondary repetitions in words. Theoretical Computer Science, 418:71-81, 2012. Google Scholar
  14. R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In 40th Annual Symposium on Foundations of Computer Science, pages 596-604. IEEE, 1999. Google Scholar
  15. R. Kolpakov, M. Podolskiy, M. Posypkin, and N. Khrapov. Searching of gapped repeats and subrepetitions in a word. In Combinatorial Pattern Matching, pages 212-221. Springer, 2014. Google Scholar
  16. A. Lempel and J. Ziv. On the complexity of finite sequences. Information Theory, IEEE Transactions on, 22(1):75-81, 1976. Google Scholar
  17. M. G. Main. Detecting leftmost maximal periodicities. Discrete Applied Mathematics, 25(1):145-153, 1989. Google Scholar
  18. M. G. Main and R. J. Lorentz. Linear time recognition of squarefree strings. In Combinatorial Algorithms on Words, pages 271-278. Springer, 1985. Google Scholar
  19. D. Okanohara and K. Sadakane. An online algorithm for finding the longest previous factors. In Algorithms-ESA 2008, pages 696-707. Springer, 2008. Google Scholar
  20. M. Rodeh, V. R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM (JACM), 28(1):16-24, 1981. Google Scholar
  21. T. Starikovskaya. Computing lempel-ziv factorization online. In Mathematical Foundations of Computer Science 2012, pages 789-799. Springer, 2012. Google Scholar
  22. J. D. Ullman, A. V. Aho, and D. S. Hirschberg. Bounds on the complexity of the longest common subsequence problem. Journal of the ACM (JACM), 23(1):1-12, 1976. Google Scholar
  23. J. Yamamoto, H. Bannai, S. Inenaga, and M. Takeda. Faster compact on-line lempel-ziv factorization. arXiv preprint arXiv:1305.6095v1, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail