Maintaining the Size of LZ77 on Semi-Dynamic Strings

Authors: Hideo Bannai, Panagiotis Charalampopoulos, Jakub Radoszewski

File

LIPIcs.CPM.2024.3.pdf
  • Filesize: 0.89 MB
  • 20 pages

Author Details

Hideo Bannai
  • M&D Data Science Center, Tokyo Medical and Dental University (TMDU), Japan
Panagiotis Charalampopoulos
  • Birkbeck, University of London, UK
Jakub Radoszewski
  • Institute of Informatics, University of Warsaw, Poland

Cite As

Hideo Bannai, Panagiotis Charalampopoulos, and Jakub Radoszewski. Maintaining the Size of LZ77 on Semi-Dynamic Strings. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.3

Abstract

We consider the problem of maintaining the size of the LZ77 factorization of a string S of length at most n under the following operations: (a) appending a given letter to S and (b) deleting the first letter of S. Our main result is an algorithm for this problem with amortized update time Õ(√n). As a corollary, we obtain an Õ(n√n)-time algorithm for computing the most LZ77-compressible rotation of a length-n string; a naive approach for this problem would compute the LZ77 factorization of each possible rotation and would thus take quadratic time in the worst case. We also show an Ω(√n) lower bound for the additive sensitivity of LZ77 with respect to the rotation operation. Our algorithm employs dynamic trees to maintain the longest-previous-factor array information and depends on periodicity-based arguments that bound the number of the required updates and enable their efficient computation.
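The problem statement can be illustrated with a minimal Python sketch, assuming the self-referential LZ77 variant in which each phrase is either a letter with no earlier occurrence or the longest prefix of the remaining suffix that also starts at an earlier position. That length is the longest-previous-factor (LPF) value. This is not the authors' algorithm. It recomputes everything from scratch after each update, using a simple cubic-time LPF computation, so it is far slower than both the Õ(√n) amortized updates above and the quadratic naive rotation baseline. The names `lpf_array`, `lz77_size`, `NaiveSemiDynamicLZ77`, and `most_compressible_rotation` are hypothetical.

```python
from collections import deque


def lpf_array(s: str) -> list[int]:
    """lpf[i] = largest l such that s[i:i+l] also starts at some j < i.
    Occurrences may overlap position i (self-referential LZ77).
    Naive O(n^3) comparison, for illustration only."""
    n = len(s)
    lpf = [0] * n
    for i in range(n):
        best = 0
        for j in range(i):
            l = 0
            while i + l < n and s[j + l] == s[i + l]:
                l += 1
            best = max(best, l)
        lpf[i] = best
    return lpf


def lz77_size(s: str) -> int:
    """Number of phrases in the greedy LZ77 factorization (assumed variant):
    each phrase is the longest previous factor, or a single fresh letter."""
    lpf = lpf_array(s)
    i, z = 0, 0
    while i < len(s):
        i += max(1, lpf[i])
        z += 1
    return z


class NaiveSemiDynamicLZ77:
    """Hypothetical reference structure supporting the two updates:
    append a letter and delete the first letter. The size is recomputed from
    scratch on every query."""

    def __init__(self, s: str = "") -> None:
        self._chars = deque(s)

    def append(self, c: str) -> None:
        self._chars.append(c)

    def delete_first(self) -> None:
        self._chars.popleft()

    def size(self) -> int:
        return lz77_size("".join(self._chars))


def most_compressible_rotation(s: str) -> tuple[int, int]:
    """Return (k, z) where s[k:] + s[:k] has the fewest LZ77 phrases z.
    Ties go to the smallest k. Each rotation step is one append and one
    delete-first."""
    d = NaiveSemiDynamicLZ77(s)
    best_k, best_z = 0, d.size()
    for k in range(1, len(s)):
        d.append(s[k - 1])  # s[k-1:] + s[:k-1]  ->  s[k-1:] + s[:k]
        d.delete_first()    # ->  s[k:] + s[:k]
        z = d.size()
        if z < best_z:
            best_k, best_z = k, z
    return best_k, best_z
```

For example, `lz77_size("abababa")` factorizes the string as `a | b | ababa`, giving 3 phrases. The rotation loop shows why the corollary follows from the main result: moving from one rotation to the next costs one append and one delete-first. Therefore n rotations need O(n) semi-dynamic updates, each with Õ(√n) amortized time.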

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Data compression
Keywords
  • Lempel-Ziv
  • compression
  • LZ77
  • semi-dynamic algorithm
  • cyclic rotation
