Optimal LZ-End Parsing Is Hard

Authors Hideo Bannai , Mitsuru Funakoshi , Kazuhiro Kurita , Yuto Nakashima , Kazuhisa Seto , Takeaki Uno



PDF
Thumbnail PDF

File

LIPIcs.CPM.2023.3.pdf
  • Filesize: 0.93 MB
  • 11 pages

Document Identifiers

Author Details

Hideo Bannai
  • M&D Data Science Center, Tokyo Medical and Dental University, Japan
Mitsuru Funakoshi
  • Department of Informatics, Kyushu University, Fukuoka, Japan
  • Japan Society for the Promotion of Science, Tokyo, Japan
Kazuhiro Kurita
  • Nagoya University, Japan
Yuto Nakashima
  • Department of Informatics, Kyushu University, Fukuoka, Japan
Kazuhisa Seto
  • Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
Takeaki Uno
  • National Institute of Informatics, Tokyo, Japan

Acknowledgements

We would like to thank Dominik Köppl for discussion. We also gratefully acknowledge the comments of anonymous reviewers for improving the manuscript.

Cite As Get BibTex

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno. Optimal LZ-End Parsing Is Hard. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.CPM.2023.3

Abstract

LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases in such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by showing a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound of the ratio which asymptotically approaches 2.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
Keywords
  • Data Compression
  • LZ-End
  • Repetitiveness measures

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto. Computing np-hard repetitiveness measures via MAX-SAT. In Shiri Chechik, Gonzalo Navarro, Eva Rotenberg, and Grzegorz Herman, editors, 30th Annual European Symposium on Algorithms, ESA 2022, September 5-9, 2022, Berlin/Potsdam, Germany, volume 244 of LIPIcs, pages 12:1-12:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.ESA.2022.12.
  2. Sergio De Agostino and James A. Storer. On-line versus off-line computation in dynamic text compression. Information Processing Letters, 59(3):169-174, 1996. URL: https://doi.org/10.1016/0020-0190(96)00068-3.
  3. Michael R Garey and David S Johnson. Computers and intractability, volume 174. freeman San Francisco, 1979. Google Scholar
  4. Takumi Ideue, Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, and Masayuki Takeda. On the approximation ratio of LZ-End to LZ77. In Thierry Lecroq and Hélène Touzet, editors, String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Lille, France, October 4-6, 2021, Proceedings, volume 12944 of Lecture Notes in Computer Science, pages 114-126. Springer, 2021. URL: https://doi.org/10.1007/978-3-030-86692-1_10.
  5. Dominik Kempa and Dmitry Kosolobov. LZ-End parsing in compressed space. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2017 Data Compression Conference, DCC 2017, Snowbird, UT, USA, April 4-7, 2017, pages 350-359. IEEE, 2017. URL: https://doi.org/10.1109/DCC.2017.73.
  6. Dominik Kempa and Dmitry Kosolobov. LZ-End parsing in linear time. In Kirk Pruhs and Christian Sohler, editors, 25th Annual European Symposium on Algorithms, ESA 2017, September 4-6, 2017, Vienna, Austria, volume 87 of LIPIcs, pages 53:1-53:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.53.
  7. Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 827-840. ACM, 2018. URL: https://doi.org/10.1145/3188745.3188814.
  8. Dominik Kempa and Barna Saha. An upper bound and linear-space queries on the LZ-End parsing. In Joseph (Seffi) Naor and Niv Buchbinder, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, pages 2847-2866. SIAM, 2022. URL: https://doi.org/10.1137/1.9781611977073.111.
  9. Sebastian Kreft and Gonzalo Navarro. LZ77-like compression with fast random access. In James A. Storer and Michael W. Marcellin, editors, 2010 Data Compression Conference (DCC 2010), 24-26 March 2010, Snowbird, UT, USA, pages 239-248. IEEE Computer Society, 2010. URL: https://doi.org/10.1109/DCC.2010.29.
  10. Sebastian Kreft and Gonzalo Navarro. On compressing and indexing repetitive sequences. Theor. Comput. Sci., 483:115-133, 2013. URL: https://doi.org/10.1016/j.tcs.2012.02.006.
  11. Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory, 22(1):75-81, 1976. URL: https://doi.org/10.1109/TIT.1976.1055501.
  12. Gonzalo Navarro. Indexing highly repetitive string collections. CoRR, abs/2004.02781, 2020. URL: https://arxiv.org/abs/2004.02781.
  13. Gonzalo Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv., 54(2):29:1-29:31, 2021. URL: https://doi.org/10.1145/3434399.
  14. Carsten Sinz. Towards an optimal CNF encoding of boolean cardinality constraints. In Peter van Beek, editor, Principles and Practice of Constraint Programming - CP 2005, 11th International Conference, CP 2005, Sitges, Spain, October 1-5, 2005, Proceedings, volume 3709 of Lecture Notes in Computer Science, pages 827-831. Springer, 2005. URL: https://doi.org/10.1007/11564751_73.
  15. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337-343, 1977. URL: https://doi.org/10.1109/TIT.1977.1055714.
  16. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530-536, 1978. URL: https://doi.org/10.1109/TIT.1978.1055934.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail