Analysis of Lempel-Ziv'78 for Markov Sources

Jacquet, Philippe; Szpankowski, Wojciech

doi:10.4230/LIPIcs.AofA.2020.15

File

LIPIcs.AofA.2020.15.pdf

Filesize: 0.68 MB
19 pages

Document Identifiers

DOI: 10.4230/LIPIcs.AofA.2020.15
URN: urn:nbn:de:0030-drops-120459

Author Details

Philippe Jacquet

INRIA, Paris, France

Wojciech Szpankowski

Center for Science of Information, Department of Computer Science, Purdue University, West Lafayette, IN, USA

Acknowledgements

We thank Guillaume Duboc for simulation of LZ78 scheme resulting in Figure 2.

Cite AsGet BibTex

Philippe Jacquet and Wojciech Szpankowski. Analysis of Lempel-Ziv'78 for Markov Sources. In 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 159, pp. 15:1-15:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.AofA.2020.15

Abstract

Lempel-Ziv'78 is one of the most popular data compression algorithms. Over the last few decades fascinating properties of LZ78 were uncovered. Among others, in 1995 we settled the Ziv conjecture by proving that for a memoryless source the number of LZ78 phrases satisfies the Central Limit Theorem (CLT). Since then the quest commenced to extend it to Markov sources. However, despite several attempts this problem is still open. The 1995 proof of the Ziv conjecture was based on two models: In the DST-model, the associated digital search tree (DST) is built over m independent strings. In the LZ-model a single string of length n is partitioned into variable length phrases such that the next phrase is not seen in the past as a phrase. The Ziv conjecture for memoryless source was settled by proving that both DST-model and the LZ-model are asymptotically equivalent. The main result of this paper shows that this is not the case for the LZ78 algorithm over Markov sources. In addition, we develop here a large deviation for the number of phrases in the LZ78 and give a precise asymptotic expression for the redundancy which is the excess of LZ78 code over the entropy of the source. We establish these findings using a combination of combinatorial and analytic tools. In particular, to handle the strong dependency between Markov phrases, we introduce and precisely analyze the so called tail symbol which is the first symbol of the next phrase in the LZ78 parsing.

Subject Classification

ACM Subject Classification

Mathematics of computing → Information theory

Keywords

Lempel-Ziv algorithm
digital search trees
depoissonization
analytic combinatorics
large deviations

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

D. Aldous and P. Shields. A diffusion limit for a class of random-growing binary trees. Probab. Th. Rel. Fields, 79:509-542, 1988.
J. Fayolle and M. Ward. Analysis of the average depth in a suffix tree under a markov model. In 2005 International Conference on Analysis of Algorithms, pages 95-104, 2005.
P. Jacquet and W. Szpankowski. Asymptotic behavior of the lempel-ziv parsing scheme and digital search trees. Theoretical Computer Science, 144:161-197, 1995.
P. Jacquet and W. Szpankowski. Anaylytical depoissonization and its applications. Theoretical Computer Science, 201:1-62, 1998.
P. Jacquet and W. Szpankowski. On the limiting distribution of lempel ziv'78 redundancy for memoryless sources. IEEE Trans. Information Theory, 60:6917-6930, 2014.
P. Jacquet and W. Szpankowski. Analytic Pattern Matching: From DNA to Twitter. Cambridge University Press, Cambridge, 2015.
P. Jacquet, W. Szpankowski, and J. Tang. Average profile of hte lempel-ziv parsing scheme for a markovian source. Algorithmica, 31:318-360, 2001.
D. E. Knuth. The Art of Computer Programming Sorting and Searching. Addison-Wesley Reading, 1998.
K. Leckey, R. Neininger, and W. Szpankowski. Towards more realistic probabilistic models for data structures: The external path length in tries under the markov model. In ACM-SIAM Symposium on Discrete Algorithms, pages 877-886, 2013.
G. Louchard and W. Szpankowski. Average profile and limiting distribution for a phrase size in the lempel-ziv parsing algorithm. IEEE Trans. Information Theory, 41:478-488, 1995.
N. Merhav. Universal coding with minimum probability of codeword length overflow. IEEE Trans. Information Theory, 37:556-563, 1991.
N. Merhav and J. Ziv. On the amount of statistical side information required for lossy data compression. IEEE Trans. Information Theory, 43:1112-1121, 1997.
R. Neininger and L. Rüschendorf. A general limit theorem for recursive algorithms and combinatorial structures. Ann. Appl. Probab., 14:378-418, 2004.
S. Savari. Redundancy of the lempel-ziv incremental parsing rule. IEEE Trans. Information Theory, 43:9-21, 1997.
W. Szpankowski. Average Case Analysis of Algorithms on Sequences. Wiley New York, New York, 2001.
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Information Theory, 24:530-536, 1978.

Analysis of Lempel-Ziv'78 for Markov Sources

Authors Philippe Jacquet , Wojciech Szpankowski

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Analysis of Lempel-Ziv'78 for Markov Sources

Authors Philippe Jacquet , Wojciech Szpankowski

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message