Slowing Down Top Trees for Better Worst-Case Compression

Authors Bartlomiej Dudek, Pawel Gawrychowski

Thumbnail PDF


  • Filesize: 0.5 MB
  • 8 pages

Document Identifiers

Author Details

Bartlomiej Dudek
  • Institute of Computer Science, University of Wrocław, Poland
Pawel Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland

Cite AsGet BibTex

Bartlomiej Dudek and Pawel Gawrychowski. Slowing Down Top Trees for Better Worst-Case Compression. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 16:1-16:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


We consider the top tree compression scheme introduced by Bille et al. [ICALP 2013] and construct an infinite family of trees on n nodes labeled from an alphabet of size sigma, for which the size of the top DAG is Theta(n/log_sigma n log log_sigma n). Our construction matches a previously known upper bound and exhibits a weakness of this scheme, as the information-theoretic lower bound is Omega(n/log_sigma n}). This settles an open problem stated by Lohrey et al. [arXiv 2017], who designed a more involved version achieving the lower bound. We show that this can be also guaranteed by a very minor modification of the original scheme: informally, one only needs to ensure that different parts of the tree are not compressed too quickly. Arguably, our version is more uniform, and in particular, the compression procedure is oblivious to the value of sigma.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
  • top trees
  • compression
  • tree grammars


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Philip Bille, Finn Fernstrøm, and Inge Li Gørtz. Tight bounds for top tree compression. In SPIRE, volume 10508 of Lecture Notes in Computer Science, pages 97-102. Springer, 2017. Google Scholar
  2. Philip Bille, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Tree compression with top trees. Inf. Comput., 243:166-177, 2015. Google Scholar
  3. Peter Buneman, Martin Grohe, and Christoph Koch. Path queries on compressed XML. In VLDB, pages 141-152. Morgan Kaufmann, 2003. Google Scholar
  4. Giorgio Busatto, Markus Lohrey, and Sebastian Maneth. Efficient memory representation of XML document trees. Inf. Syst., 33(4-5):456-474, 2008. Google Scholar
  5. Peter J. Downey, Ravi Sethi, and Robert Endre Tarjan. Variations on the common subexpression problem. J. ACM, 27(4):758-771, 1980. Google Scholar
  6. Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. J. ACM, 57(1):4:1-4:33, 2009. Google Scholar
  7. Markus Frick, Martin Grohe, and Christoph Koch. Query evaluation on compressed trees (extended abstract). In LICS, page 188. IEEE Computer Society, 2003. Google Scholar
  8. Paweł Gawrychowski and Artur Jeż. LZ77 factorisation of trees. In FSTTCS, volume 65 of LIPIcs, pages 35:1-35:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. Google Scholar
  9. Lorenz Hübschle-Schneider and Rajeev Raman. Tree compression with top trees revisited. In SEA, volume 9125 of Lecture Notes in Computer Science, pages 15-27. Springer, 2015. Google Scholar
  10. Guy Jacobson. Space-efficient static trees and graphs. In FOCS, pages 549-554. IEEE Computer Society, 1989. Google Scholar
  11. Artur Jeż and Markus Lohrey. Approximation of smallest linear tree grammar. Inf. Comput., 251:215-251, 2016. Google Scholar
  12. Markus Lohrey and Sebastian Maneth. The complexity of tree automata and xpath on grammar-compressed trees. Theor. Comput. Sci., 363(2):196-210, 2006. Google Scholar
  13. Markus Lohrey, Sebastian Maneth, and Eric Noeth. XML compression via dags. In ICDT, pages 69-80. ACM, 2013. Google Scholar
  14. Markus Lohrey, Carl Philipp Reh, and Kurt Sieber. Optimal top dag compression. CoRR, abs/1712.05822, 2017. URL: