Compression by Contracting Straight-Line Programs

Author Moses Ganardi




File

LIPIcs.ESA.2021.45.pdf
  • Filesize: 0.78 MB
  • 16 pages


Author Details

Moses Ganardi
  • Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany

Acknowledgements

The author thanks Paweł Gawrychowski, Artur Jeż, Philipp Reh, and Louisa Seelbach Benkner for helpful discussions. The author is also indebted to the anonymous referees whose comments improved the presentation of this work.

Cite As

Moses Ganardi. Compression by Contracting Straight-Line Programs. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ESA.2021.45

Abstract

In grammar-based compression, a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size g in linear time into an equivalent SLP of size 𝒪(g) so that the height of the unique derivation tree is 𝒪(log N), where N is the length of the represented string (FOCS 2019). We introduce a new class of balanced SLPs, called contracting SLPs, in which, for every rule A → β₁ … β_k, the string length of every variable β_i on the right-hand side is smaller than the string length of A by a constant factor. In particular, the derivation tree of a contracting SLP has the property that every subtree has height logarithmic in its leaf size. We show that a given SLP of size g can be transformed in linear time into an equivalent contracting SLP of size 𝒪(g) with rules of constant length. This result is complemented by a lower bound, proving that converting SLPs into so-called α-balanced SLPs or AVL-grammars can incur a size increase by a factor of Ω(log N). We present an application to the navigation problem in compressed unranked trees, represented by forest straight-line programs (FSLPs). A linear-space data structure by Reh and Sieber (2020) supports navigation steps such as going to the parent, to the left/right sibling, or to the first/last child in constant time. We extend their solution with the operation of moving to the i-th child in time 𝒪(log d), where d is the degree of the current node. Contracting SLPs are also applied to the finger search problem over SLP-compressed strings, where one wants to access positions near a prespecified finger position, ideally in 𝒪(log d) time, where d is the distance between the accessed position and the finger.
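To make the definitions above concrete, here is a minimal Python sketch, assuming an SLP is stored as a dictionary that maps each variable to its right-hand side (any symbol that is not a key is a terminal). It computes the string length of every variable bottom-up without decompressing, the height of the derivation tree below each variable, and checks the contracting condition, formalized here as "every right-hand-side variable has length at most factor times the length of A" for a caller-chosen factor below 1. The function names and the factor 1/2 in the example are illustrative assumptions, not notation or constants taken from the paper.

```python
from typing import Dict, Tuple

# Assumed representation: each variable maps to its right-hand side;
# any symbol that is not a key of the dict is a terminal (length 1).
SLP = Dict[str, Tuple[str, ...]]


def lengths(slp: SLP) -> Dict[str, int]:
    """String length derived from each variable, computed without expansion."""
    memo: Dict[str, int] = {}

    def length(sym: str) -> int:
        if sym not in slp:
            return 1
        if sym not in memo:
            memo[sym] = sum(length(s) for s in slp[sym])
        return memo[sym]

    for var in slp:
        length(var)
    return memo


def heights(slp: SLP) -> Dict[str, int]:
    """Height of the derivation tree below each variable (terminals count 0)."""
    memo: Dict[str, int] = {}

    def height(sym: str) -> int:
        if sym not in slp:
            return 0
        if sym not in memo:
            memo[sym] = 1 + max(height(s) for s in slp[sym])
        return memo[sym]

    for var in slp:
        height(var)
    return memo


def expand(slp: SLP, sym: str) -> str:
    """Fully decompress sym (exponential in general; only for tiny examples)."""
    if sym not in slp:
        return sym
    return "".join(expand(slp, s) for s in slp[sym])


def is_contracting(slp: SLP, factor: float) -> bool:
    """True if, for every rule A -> ..., each variable on the right-hand side
    has length at most factor * (length of A)."""
    lens = lengths(slp)
    return all(
        lens[b] <= factor * lens[a]
        for a, rhs in slp.items()
        for b in rhs
        if b in slp
    )


balanced = {"S": ("A", "A"), "A": ("B", "B"), "B": ("a", "b")}
comb = {"S": ("A", "c"), "A": ("B", "c"), "B": ("a", "b")}

print(expand(balanced, "S"), lengths(balanced)["S"], heights(balanced)["S"])
print(is_contracting(balanced, 0.5), is_contracting(comb, 0.5))
```

The left comb `comb` fails the check because A already carries 3 of the 4 symbols of S. If every variable shrinks by at least a factor c along each root-to-leaf path, a variable of length n has derivation height at most log base 1/c of n, plus 1, which is the logarithmic subtree-height property stated above.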
We give a linear-space solution for the dynamic variant, where one can set the finger in 𝒪(log N) time and then access symbols or move the finger in time 𝒪(log d + log^(t) N) for any constant t, where log^(t) N is the t-fold logarithm of N. This improves a previous solution by Bille, Christiansen, Cording, and Gørtz (2018) with access/move time 𝒪(log d + log log N).
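As a point of reference for these bounds, the following Python sketch (not the paper's finger search data structure) shows plain random access: descend from the start variable, using precomputed lengths to pick the child that contains position i. Its cost is proportional to the derivation-tree height, so 𝒪(log N) on a balanced SLP with constant-length rules, which matches the bound stated for setting the finger. The helper `iterated_log` evaluates log^(t) N with base-2 logarithms, an assumption made only to show how quickly the additive term shrinks as t grows. The dictionary representation and all function names are hypothetical.

```python
import math
from typing import Dict, Tuple

# Assumed representation: variable -> right-hand side; non-keys are terminals.
SLP = Dict[str, Tuple[str, ...]]


def lengths(slp: SLP) -> Dict[str, int]:
    """String length derived from each variable (terminals have length 1)."""
    memo: Dict[str, int] = {}

    def length(sym: str) -> int:
        if sym not in slp:
            return 1
        if sym not in memo:
            memo[sym] = sum(length(s) for s in slp[sym])
        return memo[sym]

    for var in slp:
        length(var)
    return memo


def access(slp: SLP, lens: Dict[str, int], start: str, i: int) -> str:
    """Symbol at 0-based position i of the string derived from start.

    Each pass of the while loop moves one level down the derivation tree,
    so the running time is proportional to height times rule length."""
    if not 0 <= i < lens.get(start, 1):
        raise IndexError("position out of range")
    sym = start
    while sym in slp:
        for child in slp[sym]:
            size = lens.get(child, 1)
            if i < size:
                sym = child  # position i lies inside this child
                break
            i -= size  # skip over this child's symbols
    return sym


def iterated_log(n: float, t: int) -> float:
    """t-fold base-2 logarithm of n; math.log2 raises ValueError if it is
    applied to a value <= 0."""
    x = float(n)
    for _ in range(t):
        x = math.log2(x)
    return x


slp = {"S": ("A", "A"), "A": ("B", "B"), "B": ("a", "b")}
lens = lengths(slp)
print("".join(access(slp, lens, "S", i) for i in range(lens["S"])))
print([iterated_log(2 ** 16, t) for t in range(1, 4)])
```

This descent ignores locality: two accesses at neighboring positions each pay the full height. A finger search structure instead aims to charge only 𝒪(log d + log^(t) N) for a position at distance d from the finger; for N = 2^16, for instance, log log N is 4 while log^(3) N is already 2.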

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • grammar-based compression
  • balancing
  • finger search


References

  1. Amir Abboud, Arturs Backurs, Karl Bringmann, and Marvin Künnemann. Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-and-Solve. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 192-203. IEEE Computer Society, 2017. URL: https://doi.org/10.1109/FOCS.2017.26.
  2. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger Search in Grammar-Compressed Strings. Theory Comput. Syst., 62(8):1715-1735, 2018. URL: https://doi.org/10.1007/s00224-017-9839-9.
  3. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random Access to Grammar-Compressed Strings and Trees. SIAM J. Comput., 44(3):513-539, 2015. URL: https://doi.org/10.1137/130936889.
  4. Gerth Stølting Brodal. Finger Search Trees. In Dinesh P. Mehta and Sartaj Sahni, editors, Handbook of Data Structures and Applications. Chapman and Hall/CRC, 2004. URL: https://doi.org/10.1201/9781420035179.ch11.
  5. Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Trans. Inf. Theory, 51(7):2554-2576, 2005. URL: https://doi.org/10.1109/TIT.2005.850116.
  6. Martin Farach and S. Muthukrishnan. Perfect Hashing for Strings: Formalization and Algorithms. In Daniel S. Hirschberg and Eugene W. Myers, editors, Combinatorial Pattern Matching, 7th Annual Symposium, CPM 96, Laguna Beach, California, USA, June 10-12, 1996, Proceedings, volume 1075 of Lecture Notes in Computer Science, pages 130-140. Springer, 1996. URL: https://doi.org/10.1007/3-540-61258-0_11.
  7. Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. A Faster Grammar-Based Self-index. In Adrian-Horia Dediu and Carlos Martín-Vide, editors, Language and Automata Theory and Applications - 6th International Conference, LATA 2012, A Coruña, Spain, March 5-9, 2012. Proceedings, volume 7183 of Lecture Notes in Computer Science, pages 240-251. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-28332-1_21.
  8. Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. LZ77-Based Self-indexing with Faster Pattern Matching. In Alberto Pardo and Alfredo Viola, editors, LATIN 2014: Theoretical Informatics - 11th Latin American Symposium, Montevideo, Uruguay, March 31 - April 4, 2014. Proceedings, volume 8392 of Lecture Notes in Computer Science, pages 731-742. Springer, 2014. URL: https://doi.org/10.1007/978-3-642-54423-1_63.
  9. Moses Ganardi. Compression by Contracting Straight-Line Programs. CoRR, abs/2107.00446, 2021. URL: https://arxiv.org/abs/2107.00446.
  10. Moses Ganardi, Artur Jeż, and Markus Lohrey. Balancing Straight-Line Programs. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 1169-1183. IEEE Computer Society, 2019. URL: https://doi.org/10.1109/FOCS.2019.00073.
  11. Adrià Gascón, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh, and Kurt Sieber. Grammar-Based Compression of Unranked Trees. Theory Comput. Syst., 64(1):141-176, 2020. URL: https://doi.org/10.1007/s00224-019-09942-y.
  12. Leszek Gasieniec, Roman M. Kolpakov, Igor Potapov, and Paul Sant. Real-Time Traversal in Grammar-Based Compressed Files. In 2005 Data Compression Conference (DCC 2005), 29-31 March 2005, Snowbird, UT, USA, page 458. IEEE Computer Society, 2005. URL: https://doi.org/10.1109/DCC.2005.78.
  13. Pawel Gawrychowski. Pattern Matching in Lempel-Ziv Compressed Strings: Fast, Simple, and Deterministic. In Camil Demetrescu and Magnús M. Halldórsson, editors, Algorithms - ESA 2011 - 19th Annual European Symposium, Saarbrücken, Germany, September 5-9, 2011. Proceedings, volume 6942 of Lecture Notes in Computer Science, pages 421-432. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-23719-5_36.
  14. Pawel Gawrychowski, Moshe Lewenstein, and Patrick K. Nicholson. Weighted Ancestors in Suffix Trees. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014 - 22nd Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, volume 8737 of Lecture Notes in Computer Science, pages 455-466. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-44777-2_38.
  15. Tsvi Kopelowitz and Moshe Lewenstein. Dynamic weighted ancestors. In Nikhil Bansal, Kirk Pruhs, and Clifford Stein, editors, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 565-574. SIAM, 2007. URL: http://dl.acm.org/citation.cfm?id=1283383.1283444.
  16. N. Jesper Larsson and Alistair Moffat. Off-line dictionary-based compression. Proceedings of the IEEE, 88(11):1722-1732, 2000.
  17. Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complex. Cryptol., 4(2):241-299, 2012. URL: https://doi.org/10.1515/gcc-2012-0016.
  18. Markus Lohrey, Sebastian Maneth, and Carl Philipp Reh. Constant-Time Tree Traversal and Subtree Equality Check for Grammar-Compressed Trees. Algorithmica, 80(7):2082-2105, 2018. URL: https://doi.org/10.1007/s00453-017-0331-3.
  19. Craig G. Nevill-Manning and Ian H. Witten. Identifying Hierarchical Structure in Sequences: A linear-time algorithm. J. Artif. Intell. Res., 7:67-82, 1997. URL: https://doi.org/10.1613/jair.374.
  20. Mihai Patrascu and Mikkel Thorup. Dynamic Integer Sets with Optimal Rank, Select, and Predecessor Search. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 166-175. IEEE Computer Society, 2014. URL: https://doi.org/10.1109/FOCS.2014.26.
  21. Carl Philipp Reh and Kurt Sieber. Navigating Forest Straight-Line Programs in Constant Time. In Christina Boucher and Sharma V. Thankachan, editors, String Processing and Information Retrieval - 27th International Symposium, SPIRE 2020, Orlando, FL, USA, October 13-15, 2020, Proceedings, volume 12303 of Lecture Notes in Computer Science, pages 11-26. Springer, 2020. URL: https://doi.org/10.1007/978-3-030-59212-7_2.
  22. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. URL: https://doi.org/10.1016/S0304-3975(02)00777-6.
  23. Robert Endre Tarjan. A Class of Algorithms which Require Nonlinear Time to Maintain Disjoint Sets. J. Comput. Syst. Sci., 18(2):110-127, 1979. URL: https://doi.org/10.1016/0022-0000(79)90042-4.
  24. Terry A. Welch. A Technique for High-Performance Data Compression. Computer Magazine of the Computer Group News of the IEEE Computer Group Society, 17(6):8-19, 1984. URL: https://doi.org/10.1109/MC.1984.1659158.
  25. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530-536, 1978.