Edit and Alphabet-Ordering Sensitivity of Lex-Parse

Authors: Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, Hideo Bannai



Author Details

Yuto Nakashima
  • Department of Informatics, Kyushu University, Fukuoka, Japan
Dominik Köppl
  • Department of Computer Science and Engineering, University of Yamanashi, Kofu, Japan
  • M&D Data Science Center, Tokyo Medical and Dental University, Japan
Mitsuru Funakoshi
  • NTT Communication Science Laboratories, Kyoto, Japan
Shunsuke Inenaga
  • Department of Informatics, Kyushu University, Fukuoka, Japan
Hideo Bannai
  • M&D Data Science Center, Tokyo Medical and Dental University, Japan

Cite As

Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai. Edit and Alphabet-Ordering Sensitivity of Lex-Parse. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 75:1-75:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.MFCS.2024.75

Abstract

We investigate the compression sensitivity [Akagi et al., 2023] of lex-parse [Navarro et al., 2021] for two operations: (1) single character edit and (2) modification of the alphabet ordering, and give tight upper and lower bounds for both operations (i.e., we show Θ(log n) bounds for strings of length n). For both lower bounds, we use the family of Fibonacci words. For the bounds on edit operations, our analysis makes heavy use of properties of the Lyndon factorization of Fibonacci words to characterize the structure of lex-parse.
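The quantities named in the abstract can be explored on small inputs with a naive sketch. It is not the authors' method and makes several assumptions: lex-parse is taken in the sense of [Navarro et al., 2021], where the phrase starting at position i has length max(1, ℓ) and ℓ is the longest common prefix of the suffix at i with its lexicographic predecessor among all suffixes; no end-of-string terminator is appended; Fibonacci words are indexed as F_1 = b, F_2 = a, F_k = F_{k-1} F_{k-2}; and only substitutions by characters already present in the string are tried. The names `fibonacci_word`, `lex_parse`, and `max_substitution_ratio` are hypothetical.

```python
def fibonacci_word(k):
    """k-th Fibonacci word, assuming F_1 = 'b', F_2 = 'a', F_k = F_{k-1} F_{k-2}."""
    prev, cur = "b", "a"
    if k == 1:
        return prev
    for _ in range(k - 2):
        prev, cur = cur, cur + prev
    return cur


def lex_parse(text, order=None):
    """Naive lex-parse of `text`; returns the list of phrases.

    `order` lists the alphabet from smallest to largest character
    (default: the usual character order). Suffixes are sorted by brute
    force, so this is only meant for short strings.
    """
    n = len(text)
    rank_of = {c: r for r, c in enumerate(order or sorted(set(text)))}
    keyed = [rank_of[c] for c in text]
    # Suffix array under the chosen alphabet ordering (quadratic-size keys).
    sa = sorted(range(n), key=lambda i: keyed[i:])
    isa = [0] * n
    for r, i in enumerate(sa):
        isa[i] = r

    phrases = []
    i = 0
    while i < n:
        lcp = 0
        if isa[i] > 0:
            j = sa[isa[i] - 1]  # lexicographic predecessor suffix
            while i + lcp < n and j + lcp < n and text[i + lcp] == text[j + lcp]:
                lcp += 1
        length = max(1, lcp)  # explicit single character when lcp == 0
        phrases.append(text[i:i + length])
        i += length
    return phrases


def max_substitution_ratio(text):
    """Largest ratio v(text') / v(text) over single-character substitutions.

    Assumption: only characters already occurring in `text` are substituted;
    insertions and deletions are not tried.
    """
    base = len(lex_parse(text))
    best = 0.0
    for i, old in enumerate(text):
        for c in set(text) - {old}:
            edited = text[:i] + c + text[i + 1:]
            best = max(best, len(lex_parse(edited)) / base)
    return best
```

Under these assumptions, `lex_parse("abaab")` yields the phrases `ab`, `a`, `a`, `b`. Calling `lex_parse(w, order="ba")` re-parses the same string with the alphabet ordering reversed, and `max_substitution_ratio(fibonacci_word(k))` for increasing k gives an empirical view of how the number of phrases reacts to a single substitution.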

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
Keywords
  • Compression sensitivity
  • Lex-parse
  • Fibonacci words

References

  1. Tooru Akagi, Mitsuru Funakoshi, and Shunsuke Inenaga. Sensitivity of string compressors and repetitiveness measures. Information and Computation, 291:104999, 2023. URL: https://doi.org/10.1016/j.ic.2022.104999.
  2. Jason W. Bentley, Daniel Gibney, and Sharma V. Thankachan. On the complexity of BWT-runs minimization via alphabet reordering. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms, ESA 2020, September 7-9, 2020, Pisa, Italy (Virtual Conference), volume 173 of LIPIcs, pages 15:1-15:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.15.
  3. Anselm Blumer, J. Blumer, David Haussler, Ross M. McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578-595, 1987. URL: https://doi.org/10.1145/28869.28873.
  4. Michael Burrows and David Wheeler. A block-sorting lossless data compression algorithm. Technical report, Digital SRC Research Report, 1994.
  5. Davide Cenzato, Veronica Guerrini, Zsuzsanna Lipták, and Giovanna Rosone. Computing the optimal BWT of very large string collections. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Data Compression Conference, DCC 2023, Snowbird, UT, USA, March 21-24, 2023, pages 71-80. IEEE, 2023. URL: https://doi.org/10.1109/DCC55655.2023.00015.
  6. K. T. Chen, R. H. Fox, and R. C. Lyndon. Free differential calculus. IV. The quotient groups of the lower central series. Annals of Mathematics, 68(1):81-95, 1958.
  7. Amanda Clare and Jacqueline W. Daykin. Enhanced string factoring from alphabet orderings. Inf. Process. Lett., 143:4-7, 2019. URL: https://doi.org/10.1016/J.IPL.2018.10.011.
  8. Amanda Clare, Jacqueline W. Daykin, Thomas Mills, and Christine Zarges. Evolutionary search techniques for the Lyndon factorization of biosequences. In Manuel López-Ibáñez, Anne Auger, and Thomas Stützle, editors, Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO 2019, Prague, Czech Republic, July 13-17, 2019, pages 1543-1550. ACM, 2019. URL: https://doi.org/10.1145/3319619.3326872.
  9. Maxime Crochemore and Wojciech Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13(5):405-425, 1995. URL: https://doi.org/10.1007/BF01190846.
  10. Maxime Crochemore and Renaud Vérin. Direct construction of compact directed acyclic word graphs. In Proc. CPM, volume 1264 of LNCS, pages 116-129, 1997. URL: https://doi.org/10.1007/3-540-63220-4_55.
  11. Hiroto Fujimaru, Yuto Nakashima, and Shunsuke Inenaga. On sensitivity of compact directed acyclic word graphs. In Anna E. Frid and Robert Mercas, editors, Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, volume 13899 of Lecture Notes in Computer Science, pages 168-180. Springer, 2023. URL: https://doi.org/10.1007/978-3-031-33180-0_13.
  12. Raffaele Giancarlo, Giovanni Manzini, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. A new class of string transformations for compressed text indexing. Inf. Comput., 294:105068, 2023. URL: https://doi.org/10.1016/J.IC.2023.105068.
  13. Daniel Gibney and Sharma V. Thankachan. Finding an optimal alphabet ordering for Lyndon factorization is hard. In Markus Bläser and Benjamin Monmege, editors, 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), volume 187 of LIPIcs, pages 35:1-35:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.35.
  14. Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Giuseppe Romana, Marinella Sciortino, and Cristian Urbina. Bit catastrophes for the Burrows-Wheeler transform. In Frank Drewes and Mikhail Volkov, editors, Developments in Language Theory - 27th International Conference, DLT 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, volume 13911 of Lecture Notes in Computer Science, pages 86-99. Springer, 2023. URL: https://doi.org/10.1007/978-3-031-33264-7_8.
  15. Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Lyndon factorization algorithms for SLP and LZ78 compressed text. Theor. Comput. Sci., 656:215-224, 2016. URL: https://doi.org/10.1016/j.tcs.2016.03.005.
  16. Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Factorizing strings into repetitions. Theory Comput. Syst., 66(2):484-501, 2022. URL: https://doi.org/10.1007/S00224-022-10070-3.
  17. Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In STOC 2018, pages 827-840, 2018.
  18. Dominik Köppl and Tomohiro I. Arithmetics on suffix arrays of Fibonacci words. In Florin Manea and Dirk Nowotka, editors, Combinatorics on Words - 10th International Conference, WORDS 2015, Kiel, Germany, September 14-17, 2015, Proceedings, volume 9304 of Lecture Notes in Computer Science, pages 135-146. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-23660-5_12.
  19. Guillaume Lagarde and Sylvain Perifel. Lempel-Ziv: A "one-bit catastrophe" but not a tragedy. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1478-1495. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.97.
  20. M. Lothaire. Applied combinatorics on words, volume 105. Cambridge University Press, 2005.
  21. R. C. Lyndon. On Burnside’s problem. Transactions of the American Mathematical Society, 77:202-215, 1954.
  22. Lily Major, Amanda Clare, Jacqueline W. Daykin, Benjamin Mora, Leonel Jose Peña Gamboa, and Christine Zarges. Evaluation of a permutation-based evolutionary framework for Lyndon factorizations. In Thomas Bäck, Mike Preuss, André H. Deutz, Hao Wang, Carola Doerr, Michael T. M. Emmerich, and Heike Trautmann, editors, Parallel Problem Solving from Nature - PPSN XVI - 16th International Conference, PPSN 2020, Leiden, The Netherlands, September 5-9, 2020, Proceedings, Part I, volume 12269 of Lecture Notes in Computer Science, pages 390-403. Springer, 2020. URL: https://doi.org/10.1007/978-3-030-58112-1_27.
  23. Lily Major, Amanda Clare, Jacqueline W. Daykin, Benjamin Mora, and Christine Zarges. Heuristics for the run-length encoded Burrows-Wheeler transform alphabet ordering problem. CoRR, abs/2401.16435, 2024. URL: https://doi.org/10.48550/arXiv.2401.16435.
  24. Guy Melançon. Lyndon factorization of sturmian words. Discrete Mathematics, 210(1):137-149, 2000. URL: https://doi.org/10.1016/S0012-365X(99)00123-5.
  25. Gonzalo Navarro. Indexing highly repetitive string collections. CoRR, abs/2004.02781, 2020. URL: https://arxiv.org/abs/2004.02781.
  26. Gonzalo Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv., 54(2):29:1-29:31, 2021. URL: https://doi.org/10.1145/3434399.
  27. Gonzalo Navarro, Carlos Ochoa, and Nicola Prezza. On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory, 67(2):1008-1026, 2021. URL: https://doi.org/10.1109/TIT.2020.3042746.
  28. James A. Storer and Thomas G. Szymanski. The macro model for data compression (extended abstract). In Richard J. Lipton, Walter A. Burkhard, Walter J. Savitch, Emily P. Friedman, and Alfred V. Aho, editors, Proceedings of the 10th Annual ACM Symposium on Theory of Computing, May 1-3, 1978, San Diego, California, USA, pages 30-39. ACM, 1978. URL: https://doi.org/10.1145/800133.804329.
  29. J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337-343, 1977.
  30. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 24(5):530-536, 1978.