{RePair} Grammars Are the Smallest Grammars for Fibonacci Words

Authors Takuya Mieno , Shunsuke Inenaga , Takashi Horiyama



PDF
Thumbnail PDF

File

LIPIcs.CPM.2022.26.pdf
  • Filesize: 1.12 MB
  • 17 pages

Document Identifiers

Author Details

Takuya Mieno
  • Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
Shunsuke Inenaga
  • Department of Informatics, Kyushu University, Fukuoka, Japan
  • PRESTO, Japan Science and Technology Agency, Kawaguchi, Japan
Takashi Horiyama
  • Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan

Cite AsGet BibTex

Takuya Mieno, Shunsuke Inenaga, and Takashi Horiyama. {RePair} Grammars Are the Smallest Grammars for Fibonacci Words. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 26:1-26:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.CPM.2022.26

Abstract

Grammar-based compression is a loss-less data compression scheme that represents a given string w by a context-free grammar that generates only w. While computing the smallest grammar which generates a given string w is NP-hard in general, a number of polynomial-time grammar-based compressors which work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor which recursively replaces all possible occurrences of a most frequently occurring bigrams in the string. Since there can be multiple choices of the most frequent bigrams to replace, different implementations of RePair can result in different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words F_k can be completely characterized by RePair, where F_k denotes the k-th Fibonacci word. Namely, all grammars for F_k generated by any implementation of RePair are the smallest grammars for F_k, and no other grammars can be the smallest for F_k. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Combinatorics on words
Keywords
  • grammar based compression
  • Fibonacci words
  • RePair
  • smallest grammar problem

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alberto Apostolico and Stefano Lonardi. Compression of biological sequences by greedy off-line textual substitution. In Data Compression Conference, DCC 2000, Snowbird, Utah, USA, March 28-30, 2000, pages 143-152. IEEE Computer Society, 2000. URL: https://doi.org/10.1109/DCC.2000.838154.
  2. Hideo Bannai, Momoko Hirayama, Danny Hucke, Shunsuke Inenaga, Artur Jez, Markus Lohrey, and Carl Philipp Reh. The smallest grammar problem revisited. IEEE Trans. Inf. Theory, 67(1):317-328, 2021. URL: https://doi.org/10.1109/TIT.2020.3038147.
  3. Jean Berstel. Fibonacci Words - A Survey, pages 13-27. Springer Berlin Heidelberg, Berlin, Heidelberg, 1986. URL: https://doi.org/10.1007/978-3-642-95486-3_2.
  4. Jean Berstel and Alessandra Savelli. Crochemore factorization of sturmian and other infinite words. In Rastislav Kralovic and Pawel Urzyczyn, editors, Mathematical Foundations of Computer Science 2006, 31st International Symposium, MFCS 2006, Stará Lesná, Slovakia, August 28-September 1, 2006, Proceedings, volume 4162 of Lecture Notes in Computer Science, pages 157-166. Springer, 2006. URL: https://doi.org/10.1007/11821069_14.
  5. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search in grammar-compressed strings. Theory Comput. Syst., 62(8):1715-1735, 2018. URL: https://doi.org/10.1007/s00224-017-9839-9.
  6. Philip Bille, Travis Gagie, Inge Li Gørtz, and Nicola Prezza. A separation between rlslps and LZ77. J. Discrete Algorithms, 50:36-39, 2018. URL: https://doi.org/10.1016/j.jda.2018.09.002.
  7. Philip Bille, Inge Li Gørtz, and Nicola Prezza. Space-efficient Re-Pair compression. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2017 Data Compression Conference, DCC 2017, Snowbird, UT, USA, April 4-7, 2017, pages 171-180. IEEE, 2017. URL: https://doi.org/10.1109/DCC.2017.24.
  8. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees. SIAM J. Comput., 44(3):513-539, 2015. URL: https://doi.org/10.1137/130936889.
  9. Katrin Casel, Henning Fernau, Serge Gaspers, Benjamin Gras, and Markus L. Schmid. On the complexity of the smallest grammar problem over fixed alphabets. Theory Comput. Syst., 65(2):344-409, 2021. URL: https://doi.org/10.1007/s00224-020-10013-w.
  10. Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Trans. Inf. Theory, 51(7):2554-2576, 2005. URL: https://doi.org/10.1109/TIT.2005.850116.
  11. Francisco Claude and Gonzalo Navarro. Fast and compact web graph representations. ACM Trans. Web, 4(4):16:1-16:31, 2010. URL: https://doi.org/10.1145/1841909.1841913.
  12. Maxime Crochemore. Linear searching for a square in a word. Bulletin of the European Association of Theoretical Computer Science, 24:66-72, 1984. Google Scholar
  13. Martin Farach and Mikkel Thorup. String matching in Lempel-Ziv compressed strings. Algorithmica, 20(4):388-404, 1998. URL: https://doi.org/10.1007/PL00009202.
  14. Gabriele Fici. Factorizations of the fibonacci infinite word. J. Integer Seq., 18(9):15.9.3, 2015. URL: https://cs.uwaterloo.ca/journals/JIS/VOL18/Fici/fici5.html.
  15. Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Takuya Kida. Practical grammar compression based on maximal repeats. Algorithms, 13(4):103, 2020. URL: https://doi.org/10.3390/a13040103.
  16. Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, and Yoshimasa Takabatake. Rpair: Rescaling RePair with rsync. In Nieves R. Brisaboa and Simon J. Puglisi, editors, String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, Segovia, Spain, October 7-9, 2019, Proceedings, volume 11811 of Lecture Notes in Computer Science, pages 35-44. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-32686-9_3.
  17. Moses Ganardi, Artur Jez, and Markus Lohrey. Balancing straight-line programs. J. ACM, 68(4):27:1-27:40, 2021. URL: https://doi.org/10.1145/3457389.
  18. Michal Ganczorz and Artur Jez. Improvements on Re-Pair grammar compressor. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2017 Data Compression Conference, DCC 2017, Snowbird, UT, USA, April 4-7, 2017, pages 181-190. IEEE, 2017. URL: https://doi.org/10.1109/DCC.2017.52.
  19. Leszek Gasieniec, Marek Karpinski, Wojciech Plandowski, and Wojciech Rytter. Efficient algorithms for Lempel-Ziv encoding (extended abstract). In Rolf G. Karlsson and Andrzej Lingas, editors, Algorithm Theory - SWAT '96, 5th Scandinavian Workshop on Algorithm Theory, Reykjavík, Iceland, July 3-5, 1996, Proceedings, volume 1097 of Lecture Notes in Computer Science, pages 392-403. Springer, 1996. URL: https://doi.org/10.1007/3-540-61422-2_148.
  20. Danny Hucke and Carl Philipp Reh. Approximation ratios of RePair, LongestMatch and Greedy on unary strings. Algorithms, 14(2):65, 2021. URL: https://doi.org/10.3390/a14020065.
  21. Tomohiro I. Longest common extensions with recompression. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4-6, 2017, Warsaw, Poland, volume 78 of LIPIcs, pages 18:1-18:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.CPM.2017.18.
  22. Tomohiro I, Wataru Matsubara, Kouji Shimohira, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Kazuyuki Narisawa, and Ayumi Shinohara. Detecting regularities on grammar-compressed strings. Inf. Comput., 240:74-89, 2015. URL: https://doi.org/10.1016/j.ic.2014.09.009.
  23. Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Lyndon factorization algorithms for SLP and LZ78 compressed text. Theor. Comput. Sci., 656:215-224, 2016. URL: https://doi.org/10.1016/j.tcs.2016.03.005.
  24. Artur Jez. Approximation of grammar-based compression via recompression. Theor. Comput. Sci., 592:115-134, 2015. URL: https://doi.org/10.1016/j.tcs.2015.05.027.
  25. Marek Karpinski, Wojciech Rytter, and Ayumi Shinohara. An efficient pattern-matching algorithm for strings with short descriptions. Nord. J. Comput., 4(2):172-186, 1997. Google Scholar
  26. Takuya Kida, Tetsuya Matsumoto, Yusuke Shibata, Masayuki Takeda, Ayumi Shinohara, and Setsuo Arikawa. Collage system: a unifying framework for compressed pattern matching. Theor. Comput. Sci., 298(1):253-272, 2003. URL: https://doi.org/10.1016/S0304-3975(02)00426-7.
  27. John C. Kieffer, En-Hui Yang, Gregory J. Nelson, and Pamela C. Cosman. Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory, 46(4):1227-1245, 2000. URL: https://doi.org/10.1109/18.850665.
  28. Donald Ervin Knuth. The art of computer programming, Volume II: Seminumerical Algorithms, 3rd Edition. Addison-Wesley, 1998. URL: https://www.worldcat.org/oclc/312898417.
  29. Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, and Keisuke Goto. Re-Pair in small space. Algorithms, 14(1):5, 2021. URL: https://doi.org/10.3390/a14010005.
  30. N. Jesper Larsson and Alistair Moffat. Offline dictionary-based compression. In Data Compression Conference, DCC 1999, Snowbird, Utah, USA, March 29-31, 1999, pages 296-305. IEEE Computer Society, 1999. URL: https://doi.org/10.1109/DCC.1999.755679.
  31. Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complex. Cryptol., 4(2):241-299, 2012. URL: https://doi.org/10.1515/gcc-2012-0016.
  32. Markus Lohrey, Sebastian Maneth, and Roy Mennicke. XML tree structure compression using RePair. Inf. Syst., 38(8):1150-1167, 2013. URL: https://doi.org/10.1016/j.is.2013.06.006.
  33. Takuya Masaki and Takuya Kida. Online grammar transformation based on Re-Pair algorithm. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2016 Data Compression Conference, DCC 2016, Snowbird, UT, USA, March 30 - April 1, 2016, pages 349-358. IEEE, 2016. URL: https://doi.org/10.1109/DCC.2016.69.
  34. Wataru Matsubara, Shunsuke Inenaga, Akira Ishino, Ayumi Shinohara, Tomoyuki Nakamura, and Kazuo Hashimoto. Efficient algorithms to compute compressed longest common substrings and compressed palindromes. Theor. Comput. Sci., 410(8-10):900-913, 2009. URL: https://doi.org/10.1016/j.tcs.2008.12.016.
  35. Takuya Mieno, Shunsuke Inenaga, and Takashi Horiyama. Repair grammars are the smallest grammars for Fibonacci words. CoRR, abs/2202.08447, 2022. URL: https://arxiv.org/abs/2202.08447.
  36. Gonzalo Navarro and Cristian Urbina. On stricter reachable repetitiveness measures. In Thierry Lecroq and Hélène Touzet, editors, String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Lille, France, October 4-6, 2021, Proceedings, volume 12944 of Lecture Notes in Computer Science, pages 193-206. Springer, 2021. URL: https://doi.org/10.1007/978-3-030-86692-1_16.
  37. Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. In Piotr Faliszewski, Anca Muscholl, and Rolf Niedermeier, editors, 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016, August 22-26, 2016 - Kraków, Poland, volume 58 of LIPIcs, pages 72:1-72:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.MFCS.2016.72.
  38. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. URL: https://doi.org/10.1016/S0304-3975(02)00777-6.
  39. James A. Storer and Thomas G. Szymanski. Data compression via textual substitution. J. ACM, 29(4):928-951, 1982. URL: https://doi.org/10.1145/322344.322346.
  40. Toshiya Tanaka, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing convolution on grammar-compressed text. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2013 Data Compression Conference, DCC 2013, Snowbird, UT, USA, March 20-22, 2013, pages 451-460. IEEE, 2013. URL: https://doi.org/10.1109/DCC.2013.53.
  41. En-Hui Yang and John C. Kieffer. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform - part one: Without context models. IEEE Trans. Inf. Theory, 46(3):755-777, 2000. URL: https://doi.org/10.1109/18.841161.
  42. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3):337-343, 1977. URL: https://doi.org/10.1109/TIT.1977.1055714.
  43. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 24(5):530-536, 1978. URL: https://doi.org/10.1109/TIT.1978.1055934.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail