LZ77 Factorisation of Trees

Gawrychowski, Paweł; Jeż, Artur

doi:10.4230/LIPIcs.FSTTCS.2016.35

Abstract

We generalise the fundamental concept of LZ77 factorisation from strings to trees. A tree is represented as a collection of edge-disjoint fragments, each of which either consists of one node or has already occurred earlier (in the BFS order). As for strings, such a collection uniquely determines the tree, so by minimising the number of fragments we obtain a compressed representation of the tree. We show that our generalisation shares several useful properties of the standard LZ77 factorisation: it can be computed in polynomial time, and its simpler variant in linear time; its size is not larger than the smallest grammar for the tree; and it can be transformed (in linear time) into a tree grammar of size O(rg log(n/(rg))), where n is the size of the tree, g is the size of the smallest grammar for this tree, and r is the maximal arity of the nodes in the tree. This matches a recent bound of Jeż and Lohrey [STACS 2014], but with a simpler and more modular proof.
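
For orientation, the classical string notion that the abstract generalises can be sketched as follows. This is only an illustration of the standard greedy LZ77 factorisation of a string, not the tree algorithm of the paper. The function name `lz77_factorise` is hypothetical, the variant shown assumes that a factor may overlap its earlier occurrence, and the matching is done naively rather than in linear time.

```python
def lz77_factorise(s: str) -> list[str]:
    """Greedy LZ77 factorisation of a string (naive sketch, not linear time).

    Each factor is either a single letter with no earlier occurrence, or the
    longest prefix of the remaining text that also starts at an earlier
    position (assumption: the earlier occurrence may overlap the factor).
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        longest = 0
        # Try every earlier starting position j and extend the match.
        for j in range(i):
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            longest = max(longest, k)
        # With no earlier occurrence, emit a single fresh letter.
        length = max(longest, 1)
        factors.append(s[i:i + length])
        i += length
    return factors


print(lz77_factorise("abababab"))  # ['a', 'b', 'ababab']
```

The factors concatenate back to the input, which is the string analogue of the statement that the collection of fragments uniquely determines the tree.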

Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
Tatsuya Akutsu. A bisection algorithm for grammar-based compression of ordered trees. Inf. Process. Lett., 110(18-19):815-820, 2010.
Philip Bille, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Tree compression with top trees. Information and Computation, 243:166-177, 2015. URL: http://dx.doi.org/10.1016/j.ic.2014.12.012.
Mikołaj Bojańczyk and Igor Walukiewicz. Forest algebras. In Jörg Flum, Erich Grädel, and Thomas Wilke, editors, Logic and Automata: History and Perspectives [in Honor of Wolfgang Thomas], volume 2 of Texts in Logic and Games, pages 107-132. Amsterdam University Press, 2008.
Mireille Bousquet-Mélou, Markus Lohrey, Sebastian Maneth, and Eric Nöth. XML compression via directed acyclic graphs. Theory Comput. Syst., 57(4):1322-1371, 2015. URL: http://dx.doi.org/10.1007/s00224-014-9544-x.
Giorgio Busatto, Markus Lohrey, and Sebastian Maneth. Efficient memory representation of XML document trees. Information Systems, 33(4-5):456-474, 2008.
Katrin Casel, Henning Fernau, Serge Gaspers, Benjamin Gras, and Markus L. Schmid. On the complexity of grammar-based compression over fixed alphabets. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, ICALP, volume 55 of LIPIcs, pages 122:1-122:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2016.122.
Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Transactions on Information Theory, 51(7):2554-2576, 2005. URL: http://dx.doi.org/10.1109/TIT.2005.850116.
Travis Gagie and Paweł Gawrychowski. Grammar-based compression in a streaming model. In Adrian Horia Dediu, Henning Fernau, and Carlos Martín-Vide, editors, LATA, volume 6031 of LNCS, pages 273-284. Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-13089-2_23.
Moses Ganardi, Danny Hucke, Markus Lohrey, and Eric Noeth. Tree compression using string grammars. In Evangelos Kranakis, Gonzalo Navarro, and Edgar Chávez, editors, LATIN, volume 9644 of LNCS, pages 590-604. Springer, 2016. URL: http://dx.doi.org/10.1007/978-3-662-49529-2_44.
Adria Gascón, Guillem Godoy, and Manfred Schmidt-Schauß. Context matching for compressed terms. In Proceedings of the Twenty-Third Annual IEEE Symposium on Logic in Computer Science, LICS 2008, 24-27 June 2008, Pittsburgh, PA, USA, pages 93-102. IEEE Computer Society, 2008. URL: http://dx.doi.org/10.1109/LICS.2008.17.
Adria Gascón, Guillem Godoy, and Manfred Schmidt-Schauß. Unification with singleton tree grammars. In Ralf Treinen, editor, RTA, volume 5595 of LNCS, pages 365-379. Springer, 2009. URL: http://dx.doi.org/10.1007/978-3-642-02348-4_26.
Adria Gascón, Guillem Godoy, Manfred Schmidt-Schauß, and Ashish Tiwari. Context unification with one context variable. J. Symb. Comput., 45(2):173-193, 2010. URL: http://dx.doi.org/10.1016/j.jsc.2008.10.005.
Adria Gascón, Manfred Schmidt-Schauß, and Ashish Tiwari. Two-restricted one context unification is in polynomial time. In Stephan Kreutzer, editor, CSL, volume 41 of LIPIcs, pages 405-422. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015. URL: http://dx.doi.org/10.4230/LIPIcs.CSL.2015.405.
Adria Gascón, Ashish Tiwari, and Manfred Schmidt-Schauß. One context unification problems solvable in polynomial time. In LICS, pages 499-510. IEEE, 2015. URL: http://dx.doi.org/10.1109/LICS.2015.53.
Paweł Gawrychowski. Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic. In Camil Demetrescu and Magnús M. Halldórsson, editors, ESA, volume 6942 of LNCS, pages 421-432. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-23719-5_36.
Danny Hucke, Markus Lohrey, and Eric Noeth. Constructing small tree grammars and small circuits for formulas. In Venkatesh Raman and S. P. Suresh, editors, FSTTCS, volume 29 of LIPIcs, pages 457-468. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.FSTTCS.2014.457.
Artur Jeż. Context unification is in PSPACE. In Elias Koutsoupias, Javier Esparza, and Pierre Fraigniaud, editors, ICALP, volume 8573 of LNCS, pages 244-255. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-662-43951-7_21.
Artur Jeż. A really simple approximation of smallest grammar. Theoretical Computer Science, 616:141-150, 2016. URL: http://dx.doi.org/10.1016/j.tcs.2015.12.032.
Artur Jeż and Markus Lohrey. Approximation of smallest linear tree grammar. In Ernst W. Mayr and Natacha Portier, editors, STACS, volume 25 of LIPIcs, pages 445-457. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2014.445.
Jordi Levy, Manfred Schmidt-Schauß, and Mateu Villaret. On the complexity of bounded second-order unification and stratified context unification. Logic Journal of the IGPL, 19(6):763-789, 2011. URL: http://dx.doi.org/10.1093/jigpal/jzq010.
Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241-299, 2012.
Markus Lohrey, Sebastian Maneth, and Roy Mennicke. XML tree structure compression using RePair. Inf. Syst., 38(8):1150-1167, 2013. URL: http://dx.doi.org/10.1016/j.is.2013.06.006.
Markus Lohrey, Sebastian Maneth, and Manfred Schmidt-Schauß. Parameter reduction and automata evaluation for grammar-compressed trees. J. Comput. Syst. Sci., 78(5):1651-1669, 2012. URL: http://dx.doi.org/10.1016/j.jcss.2012.03.003.
Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00777-6.
Hiroshi Sakamoto. A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms, 3(2-4):416-430, 2005. URL: http://dx.doi.org/10.1016/j.jda.2004.08.016.
Manfred Schmidt-Schauß. Linear compressed pattern matching for polynomial rewriting (extended abstract). In Rachid Echahed and Detlef Plump, editors, TERMGRAPH, volume 110 of EPTCS, pages 29-40, 2013. URL: http://dx.doi.org/10.4204/EPTCS.110.5.
Tetsuo Shibuya. Constructing the suffix tree of a tree with a large alphabet. In Algorithms and Computation, pages 225-236. Springer, 1999.
James A. Storer and Thomas G. Szymanski. The macro model for data compression. In Richard J. Lipton, Walter A. Burkhard, Walter J. Savitch, Emily P. Friedman, and Alfred V. Aho, editors, STOC, pages 30-39. ACM, 1978.
