On the Size of Lempel-Ziv and Lyndon Factorizations

Authors Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, Arseny M. Shur



PDF
Thumbnail PDF

File

LIPIcs.STACS.2017.45.pdf
  • Filesize: 0.5 MB
  • 13 pages

Document Identifiers

Author Details

Juha Kärkkäinen
Dominik Kempa
Yuto Nakashima
Simon J. Puglisi
Arseny M. Shur

Cite As Get BibTex

Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, and Arseny M. Shur. On the Size of Lempel-Ziv and Lyndon Factorizations. In 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 66, pp. 45:1-45:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.STACS.2017.45

Abstract

Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describing a new, non-trivial family of strings) it is always less than twice the size.

Subject Classification

Keywords
  • Lempel-Ziv factorization
  • Lempel-Ziv parsing
  • LZ
  • Lyndon word
  • Lyndon factorization
  • Standard factorization

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Golnaz Badkobeh, Hideo Bannai, Keisuke Goto, Tomohiro I, Costas S. Iliopoulos, Shunsuke Inenaga, Simon J. Puglisi, and Shiho Sugimoto. Closed factorization. Discrete Appl. Math., 212:23-29, 2016. Google Scholar
  2. Hideo Bannai, Travis Gagie, Shunsuke Inenaga, Juha Kärkkäinen, Dominik Kempa, Marcin Pia̧tkowski, Simon J. Puglisi, and Shiho Sugimoto. Diverse palindromic factorization is NP-complete. In Proceedings of the 19th International Conference on Developments in Language Theory (DLT), pages 85-96. Springer, 2015. Google Scholar
  3. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The runs theorem. arXiv, abs/1406.0263, 2014. Google Scholar
  4. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. A new characterization of maximal repetitions by Lyndon trees. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 562-571. SIAM, 2015. Google Scholar
  5. Djamal Belazzougui, Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Alberto Ordóñez Pereira, Simon J. Puglisi, and Yasuo Tabei. Queries on LZ-bounded encodings. In Proceedings of the 2015 Data Compression Conference (DCC), pages 83-92. IEEE, 2015. Google Scholar
  6. Srecko Brlek, Jacques-Olivier Lachaud, Xavier Provençal, and Christophe Reutenauer. Lyndon + Christoffel = digitally convex. Pattern Recogn., 42(10):2239-2246, 2009. Google Scholar
  7. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2), 2008. Google Scholar
  8. Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Trans. Information Theory, 51(7):2554-2576, 2005. Google Scholar
  9. Marc Chemillier. Periodic musical sequences and Lyndon words. Soft Comput., 8(9):611-616, 2004. Google Scholar
  10. Kuo-Tsai Chen, Ralph H. Fox, and Roger C. Lyndon. Free differential calculus, IV. The quotient groups of the lower central series. Ann. Math., 68:81-95, 1958. Google Scholar
  11. Yoann Dieudonné and Franck Petit. Circle formation of weak robots and Lyndon words. Inf. Process. Lett., 101(4):156-162, 2007. Google Scholar
  12. Gabriele Fici, Travis Gagie, Juha Kärkkäinen, and Dominik Kempa. A subquadratic algorithm for minimum palindromic factorization. J. Discrete Algorithms, 28:41-48, 2014. Google Scholar
  13. Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. LZ77-based self-indexing with faster pattern matching. In Proceedings of the 11th Latin American Theoretical Informatics Symposium (LATIN), pages 731-742. Springer, 2014. Google Scholar
  14. Joseph Yossi Gil and David Allen Scott. A bijective string sorting transform. arXiv, abs/1201.3077, 2012. Google Scholar
  15. David Hill, George Melvin, and Damien Mondragon. Representations of quiver Hecke algebras via Lyndon bases. J. Pure Appl. Algebr., 216:1052-1079, 2012. Google Scholar
  16. Christopher Hoobin, Simon J. Puglisi, and Justin Zobel. Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections. PVLDB, 5(3):265-273, 2011. Google Scholar
  17. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS), pages 596-604. IEEE, 1999. Google Scholar
  18. Manfred Kufleitner. On bijective variants of the Burrows-Wheeler transform. In Proceedings of the 2009 Prague Stringology Conference (PSC), pages 65-79. Czech Technical University in Prague, 2009. Google Scholar
  19. Pierre Lalonde and Arun Ram. Standard Lyndon bases of Lie algebras and enveloping algebras. Transactions of the American Mathematical Society, 347:1821-1830, 1995. Google Scholar
  20. M. Lothaire. Combinatorics on Words. Cambridge University Press, 1997. Google Scholar
  21. Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. Suffix array and Lyndon factorization of a text. J. Discrete Algorithms, 28:2-8, 2014. Google Scholar
  22. Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea. Factorizing a string into squares in linear time. In Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 27:1-27:12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. Google Scholar
  23. Marcin Mucha. Lyndon words and short superstrings. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 958-972. SIAM, 2013. Google Scholar
  24. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. Google Scholar
  25. I Tomohiro, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Lyndon factorization algorithms for SLP and LZ78 compressed text. Theor. Comput. Sci., 656:215-224, 2016. Google Scholar
  26. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3):337-343, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail