Relations Between Greedy and Bit-Optimal LZ77 Encodings

Author Dmitry Kosolobov



PDF
Thumbnail PDF

File

LIPIcs.STACS.2018.46.pdf
  • Filesize: 0.53 MB
  • 14 pages

Document Identifiers

Author Details

Dmitry Kosolobov

Cite AsGet BibTex

Dmitry Kosolobov. Relations Between Greedy and Bit-Optimal LZ77 Encodings. In 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 96, pp. 46:1-46:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.STACS.2018.46

Abstract

This paper investigates the size in bits of the LZ77 encoding, which is the most popular and efficient variant of the Lempel-Ziv encodings used in data compression. We prove that, for a wide natural class of variable-length encoders for LZ77 phrases, the size of the greedily constructed LZ77 encoding on constant alphabets is within a factor O(log n / log log log n) of the optimal LZ77 encoding, where n is the length of the processed string. We describe a series of examples showing that, surprisingly, this bound is tight, thus improving both the previously known upper and lower bounds. Further, we obtain a more detailed bound O(min{z, log n / log log z}), which uses the number z of phrases in the greedy LZ77 encoding as a parameter, and construct a series of examples showing that this bound is tight even for binary alphabet. We then investigate the problem on non-constant alphabets: we show that the known O(log n) bound is tight even for alphabets of logarithmic size, and provide tight bounds for some other important cases.
Keywords
  • Lempel-Ziv
  • LZ77 encoding
  • greedy LZ77
  • bit optimal LZ77

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir, Gad M. Landau, and Esko Ukkonen. Online timestamped text indexing. Inf. Process. Lett., 82(5):253-259, 2002. URL: http://dx.doi.org/10.1016/S0020-0190(01)00275-7.
  2. Djamal Belazzougui and Simon J. Puglisi. Range predecessor and lempel-ziv parsing. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 2053-2071. SIAM, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch143.
  3. Philip Bille, Patrick Hagge Cording, Johannes Fischer, and Inge Li Gørtz. Lempel-ziv compression in a sliding window. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4-6, 2017, Warsaw, Poland, volume 78 of LIPIcs, pages 15:1-15:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2017.15.
  4. Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Trans. Information Theory, 51(7):2554-2576, 2005. URL: http://dx.doi.org/10.1109/TIT.2005.850116.
  5. Martin Cohn. Affine m-ary gray codes. Information and Control, 6(1):70-78, 1963. URL: http://dx.doi.org/10.1016/S0019-9958(63)90119-0.
  6. C. J. Colbourn and J. H. Dinitz. Handbook of combinatorial designs. CRC press, 2006. Google Scholar
  7. Maxime Crochemore, Laura Giambruno, Alessio Langiu, Filippo Mignosi, and Antonio Restivo. Dictionary-symbolwise flexible parsing. J. Discrete Algorithms, 14:74-90, 2012. URL: http://dx.doi.org/10.1016/j.jda.2011.12.021.
  8. Maxime Crochemore, Alessio Langiu, and Filippo Mignosi. The rightmost equal-cost position problem. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2013 Data Compression Conference, DCC 2013, Snowbird, UT, USA, March 20-22, 2013, pages 421-430. IEEE, 2013. URL: http://dx.doi.org/10.1109/DCC.2013.50.
  9. Maxime Crochemore, Alessio Langiu, and Filippo Mignosi. Note on the greedy parsing optimality for dictionary-based text compression. Theor. Comput. Sci., 525:55-59, 2014. URL: http://dx.doi.org/10.1016/j.tcs.2014.01.013.
  10. Peter Elias. Universal codeword sets and representations of the integers. IEEE Trans. Information Theory, 21(2):194-203, 1975. URL: http://dx.doi.org/10.1109/TIT.1975.1055349.
  11. Paolo Ferragina, Igor Nitto, and Rossano Venturini. On the bit-complexity of lempel-ziv compression. SIAM J. Comput., 42(4):1521-1541, 2013. URL: http://dx.doi.org/10.1137/120869511.
  12. Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. Lz77-based self-indexing with faster pattern matching. In Alberto Pardo and Alfredo Viola, editors, LATIN 2014: Theoretical Informatics - 11th Latin American Symposium, Montevideo, Uruguay, March 31 - April 4, 2014. Proceedings, volume 8392 of Lecture Notes in Computer Science, pages 731-742. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-642-54423-1_63.
  13. F. Gray. Pulse code communication, 1953. US Patent 2,632,058. Google Scholar
  14. S. R. Kosaraju and G. Manzini. Some entropic bounds for Lempel-Ziv algorithms. In DCC 1997, page 446. IEEE, 1997. URL: http://dx.doi.org/10.1109/DCC.1997.582106.
  15. Sebastian Kreft and Gonzalo Navarro. On compressing and indexing repetitive sequences. Theor. Comput. Sci., 483:115-133, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2012.02.006.
  16. N. Jesper Larsson. Most recent match queries in on-line suffix trees. In Alexander S. Kulikov, Sergei O. Kuznetsov, and Pavel A. Pevzner, editors, Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Moscow, Russia, June 16-18, 2014. Proceedings, volume 8486 of Lecture Notes in Computer Science, pages 252-261. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-07566-2_26.
  17. V. I. Levenshtein. On the redundancy and delay of decodable coding of natural numbers. Systems Theory Research, 20:149-155, 1968. Google Scholar
  18. Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. ACM Comput. Surv., 39(1):2, 2007. URL: http://dx.doi.org/10.1145/1216370.1216372.
  19. N. Rajpoot and C. Sahinalp. Dictionary-based data compression. in Handbook of Lossless Data Compression, pages 153-167, 2002. Google Scholar
  20. Wojciech Rytter. Application of lempel-ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00777-6.
  21. James A. Storer and Thomas G. Szymanski. Data compression via textural substitution. J. ACM, 29(4):928-951, 1982. URL: http://dx.doi.org/10.1145/322344.322346.
  22. A. D. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. Proceedings of the IEEE, 82(6):872-877, 1994. URL: http://dx.doi.org/10.1109/5.286191.
  23. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Information Theory, 23(3):337-343, 1977. URL: http://dx.doi.org/10.1109/TIT.1977.1055714.
  24. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Information Theory, 24(5):530-536, 1978. URL: http://dx.doi.org/10.1109/TIT.1978.1055934.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail