Lempel-Ziv Compression in a Sliding Window

Authors Philip Bille, Patrick Hagge Cording, Johannes Fischer, Inge Li Gørtz



PDF
Thumbnail PDF

File

LIPIcs.CPM.2017.15.pdf
  • Filesize: 0.68 MB
  • 11 pages

Document Identifiers

Author Details

Philip Bille
Patrick Hagge Cording
Johannes Fischer
Inge Li Gørtz

Cite As Get BibTex

Philip Bille, Patrick Hagge Cording, Johannes Fischer, and Inge Li Gørtz. Lempel-Ziv Compression in a Sliding Window. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 15:1-15:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.CPM.2017.15

Abstract

We present new algorithms for the sliding window Lempel-Ziv (LZ77) problem and the approximate rightmost LZ77 parsing problem.

Our main result is a new and surprisingly simple algorithm that computes the sliding window LZ77 parse in O(w) space and either O(n) expected time or O(n log log w+z log log s) deterministic time. Here, w is the window size, n is the size of the input string, z is the number of phrases in the parse, and s is the size of the alphabet. This matches the space and time bounds of previous results while removing constant size restrictions on the alphabet size.  

To achieve our result, we combine a simple modification and augmentation of the suffix tree with periodicity properties of sliding windows. We also apply this new technique to obtain an algorithm for the approximate rightmost LZ77 problem that uses O(n(log z + log log n)) time and O(n) space and produces a (1+e)-approximation of the rightmost parsing (any constant e>0). While this does not improve the best known time-space trade-offs for exact rightmost parsing, our algorithm is significantly simpler and exposes a direct connection between sliding window parsing and the approximate rightmost matching problem.

Subject Classification

Keywords
  • Lempel-Ziv parsing
  • sliding window
  • rightmost matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Djamal Belazzougui and Simon J. Puglisi. Range predecessor and Lempel-Ziv parsing. In Robert Krauthgamer, editor, Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2016), pages 2053-2071. SIAM, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch143.
  2. Timothy C. Bell. Better OPM/L text compression. IEEE Trans. Commun., 34(12):1176-1182, 1986. URL: http://dx.doi.org/10.1109/TCOM.1986.1096485.
  3. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Trans. Algorithms, 10(4):22:1-22:12, 2014. URL: http://dx.doi.org/10.1145/2635814.
  4. Maxime Crochemore, Lucian Ilie, and William F. Smyth. A simple algorithm for computing the Lempel Ziv factorization. In James A. Storer and Michael W. Marcellin, editors, Proceedings of the 2008 Data Compression Conference (DCC 2008), pages 482-488. IEEE Computer Society, 2008. URL: http://dx.doi.org/10.1109/DCC.2008.36.
  5. Maxime Crochemore, Alessio Langiu, and Filippo Mignosi. The rightmost equal-cost position problem. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2013 Data Compression Conference (DCC 2013), pages 421-430. IEEE, 2013. URL: http://dx.doi.org/10.1109/DCC.2013.50.
  6. Maxime Crochemore, Alessio Langiu, and Filippo Mignosi. Note on the greedy parsing optimality for dictionary-based text compression. Theor. Comput. Sci., 525:55-59, 2014. URL: http://dx.doi.org/10.1016/j.tcs.2014.01.013.
  7. Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987-1011, 2000. URL: http://dx.doi.org/10.1145/355541.355547.
  8. Paolo Ferragina, Igor Nitto, and Rossano Venturini. On the bit-complexity of Lempel-Ziv compression. SIAM J. Comput., 42(4):1521-1541, 2013. URL: http://dx.doi.org/10.1137/120869511.
  9. Edward R. Fiala and Daniel H. Greene. Data compression with finite windows. Commun. ACM, 32(4):490-505, 1989. URL: http://dx.doi.org/10.1145/63334.63341.
  10. Johannes Fischer, Travis Gagie, Paweł Gawrychowski, and Tomasz Kociumaka. Approximating LZ77 via small-space multiple-pattern matching. In Nikhil Bansal and Irene Finocchi, editors, Proc. of the 23rd Annual European Symposium on Algorithms (ESA 2015), volume 9294 of LNCS, pages 533-544. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_45.
  11. Johannes Fischer and Paweł Gawrychowski. Alphabet-dependent string searching with wexponential search trees. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Proc. of the 26th Annual Symp. on Combinatorial Pattern Matching (CPM 2015), volume 9133 of LNCS, pages 160-171. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_14.
  12. Johannes Fischer, Tomohiro I, and Dominik Köppl. Lempel-Ziv computation in small space (LZ-CISS). In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Proc. of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), volume 9133 of LNCS, pages 172-184. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_15.
  13. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538-544, 1984. URL: http://dx.doi.org/10.1145/828.1884.
  14. Keisuke Goto and Hideo Bannai. Space efficient linear time Lempel-Ziv factorization for small alphabets. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2014 Data Compression Conference (DCC 2014), pages 163-172. IEEE, 2014. URL: http://dx.doi.org/10.1109/DCC.2014.62.
  15. Yijie Han. Deterministic sorting in O(n log log n) time and linear space. In John H. Reif, editor, Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002), pages 602-608. ACM, 2002. URL: http://dx.doi.org/10.1145/509907.509993.
  16. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Lightweight Lempel-Ziv parsing. In Vincenzo Bonifaci, Camil Demetrescu, and Alberto Marchetti-Spaccamela, editors, Proc. of the 12th International Symposium on Experimental Algorithms (SEA 2013), volume 7933 of LNCS, pages 139-150. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38527-8_14.
  17. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Linear time Lempel-Ziv factorization: Simple, fast, small. In Johannes Fischer and Peter Sanders, editors, Proceedings of the 24th Annual Symposium on Combinatorial Pattern Matching (CPM 2013), volume 7922 of LNCS, pages 189-200. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38905-4_19.
  18. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Lempel-Ziv parsing in external memory. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2014 Data Compression Conference (DCC 2014), pages 153-162. IEEE, 2014. URL: http://dx.doi.org/10.1109/DCC.2014.78.
  19. Dominik Kempa and Simon J. Puglisi. Lempel-Ziv factorization: Simple, fast, practical. In Peter Sanders and Norbert Zeh, editors, Proceedings of the 15th Meeting on Algorithm Engineering and Experiments (ALENEX 2013), pages 103-112. SIAM, 2013. URL: http://dx.doi.org/10.1137/1.9781611972931.9.
  20. Dominik Köppl and Kunihiko Sadakane. Lempel-Ziv computation in compressed space (LZ-CICS). In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2016 Data Compression Conference (DCC 2016), pages 3-12. IEEE, 2016. URL: http://dx.doi.org/10.1109/DCC.2016.38.
  21. S. Rao Kosaraju and Giovanni Manzini. Compression of low entropy strings with Lempel-Ziv algorithms. SIAM J. Comput., 29(3):893-911, 1999. URL: http://dx.doi.org/10.1137/S0097539797331105.
  22. Dmitry Kosolobov. Faster lightweight Lempel-Ziv parsing. In Giuseppe F. Italiano, Giovanni Pighizzini, and Donald Sannella, editors, Proceedings of the 40th International Symposium on Mathematical Foundations of Computer Science (MFCS 2015), volume 9235 of LNCS, pages 432-444. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48054-0_36.
  23. Alessio Langiu. On parsing optimality for dictionary-based text compression - the Zip case. J. Discrete Algorithms, 20:65-70, 2013. URL: http://dx.doi.org/10.1016/j.jda.2013.04.001.
  24. N. Jesper Larsson. Extended application of suffix trees to data compression. In James A. Storer and Martin Cohn, editors, Proceedings of the 1996 Data Compression Conference (DCC 1996), pages 190-199. IEEE Computer Society, 1996. URL: http://dx.doi.org/10.1109/DCC.1996.488324.
  25. Edward M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23(2):262-272, 1976. URL: http://dx.doi.org/10.1145/321941.321946.
  26. Joong Chae Na, Alberto Apostolico, Costas S. Iliopoulos, and Kunsoo Park. Truncated suffix trees and their application to data compression. Theor. Comput. Sci., 1-3(304):87-101, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(03)00053-7.
  27. Enno Ohlebusch and Simon Gog. Lempel-Ziv factorization revisited. In Raffaele Giancarlo and Giovanni Manzini, editors, Proceedings of the 22nd Annual Symposium on Combinatorial Pattern Matching (CPM 2011), volume 6661 of LNCS, pages 15-26. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-21458-5_4.
  28. Daisuke Okanohara and Kunihiko Sadakane. An online algorithm for finding the longest previous factors. In Dan Halperin and Kurt Mehlhorn, editors, Proceedings of the 16th Annual European Symposium on Algorithms (ESA 2008), volume 5193 of LNCS, pages 696-707. Springer, 2008. URL: http://dx.doi.org/10.1007/978-3-540-87744-8_58.
  29. Alberto Policriti and Nicola Prezza. Fast online Lempel-Ziv factorization in compressed space. In Costas S. Iliopoulos, Simon J. Puglisi, and Emine Yilmaz, editors, Proceedings of the 22nd International Symposium on String Processing and Information Retrieval (SPIRE 2015), volume 9309 of LNCS, pages 13-20. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-23826-5_2.
  30. Alberto Policriti and Nicola Prezza. Computing LZ77 in run-compressed space. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2016 Data Compression Conference (DCC 2016), pages 23-32. IEEE, 2016. URL: http://dx.doi.org/10.1109/DCC.2016.30.
  31. Julian Shun and Fuyao Zhao. Practical parallel Lempel-Ziv factorization. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2013 Data Compression Conference (DCC 2013), pages 123-132. IEEE, 2013. URL: http://dx.doi.org/10.1109/DCC.2013.20.
  32. Tatiana Starikovskaya. Computing Lempel-Ziv factorization online. In Branislav Rovan, Vladimiro Sassone, and Peter Widmayer, editors, Proceedings of the 37th International Symposium on Mathematical Foundations of Computer Science (MFCS 2012), volume 7464 of LNCS, pages 789-799. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-32589-2_68.
  33. James A. Storer and Thomas G. Szymanski. Data compression via textural substitution. J. ACM, 29(4):928-951, 1982. URL: http://dx.doi.org/10.1145/322344.322346.
  34. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. URL: http://dx.doi.org/10.1007/BF01206331.
  35. Jun-ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster compact on-line Lempel-Ziv factorization. In Ernst W. Mayr and Natacha Portier, editors, Proceedings of the 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), volume 25 of LIPIcs, pages 675-686. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2014.675.
  36. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3):337-343, 1977. URL: http://dx.doi.org/10.1109/TIT.1977.1055714.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail