Hypersuccinct Trees - New Universal Tree Source Codes for Optimal Compressed Tree Data Structures and Range Minima

Authors J. Ian Munro , Patrick K. Nicholson, Louisa Seelbach Benkner , Sebastian Wild



PDF
Thumbnail PDF

File

LIPIcs.ESA.2021.70.pdf
  • Filesize: 0.91 MB
  • 18 pages

Document Identifiers

Author Details

J. Ian Munro
  • University of Waterloo, Canada
Patrick K. Nicholson
  • Nokia Bell Labs, Dublin, Ireland
Louisa Seelbach Benkner
  • Universität Siegen, Germany
Sebastian Wild
  • University of Liverpool, UK

Acknowledgements

We thank Conrado Martínez and Markus Lohrey for valuable discussions and feedback on earlier drafts of this paper, as well as our anonymous reviewers for their comments.

Cite AsGet BibTex

J. Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner, and Sebastian Wild. Hypersuccinct Trees - New Universal Tree Source Codes for Optimal Compressed Tree Data Structures and Range Minima. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 70:1-70:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ESA.2021.70

Abstract

We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it supports answering many navigational queries on the compressed representation in constant time on the word-RAM; this is not known to be possible for any existing tree compression method. The resulting data structures, "hypersuccinct trees", hence combine the compression achieved by the best known universal codes with the operation support of the best succinct tree data structures. We apply hypersuccinct trees to obtain a universal compressed data structure for range-minimum queries. It has constant query time and the optimal worst-case space usage of 2n+o(n) bits, but the space drops to 1.736n + o(n) bits on average for random permutations of n elements, and 2lg binom{n}{r} + o(n) for arrays with r increasing runs, respectively. Both results are optimal; the former answers an open problem of Davoodi et al. (2014) and Golin et al. (2016). Compared to prior work on succinct data structures, we do not have to tailor our data structure to specific applications; hypersuccinct trees automatically adapt to the trees at hand. We show that they simultaneously achieve the optimal space usage to within lower order terms for a wide range of distributions over tree shapes, including: binary search trees (BSTs) generated by insertions in random order / Cartesian trees of random arrays, random fringe-balanced BSTs, binary trees with a given number of binary/unary/leaf nodes, random binary tries generated from memoryless sources, full binary trees, unary paths, as well as uniformly chosen weight-balanced BSTs, AVL trees, and left-leaning red-black trees.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
  • Theory of computation → Data structures design and analysis
Keywords
  • analysis of algorithms
  • universal source code
  • compressed trees
  • succinct data structure
  • succinct trees
  • tree covering
  • random binary search trees
  • range-minimum queries

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Djamal Belazzougui, Veli Mäkinen, and Daniel Valenzuela. Compressed suffix array. In Encyclopedia of Algorithms, pages 1-6. Springer US, 2014. URL: https://doi.org/10.1007/978-3-642-27848-8_82-2.
  2. Philip Bille, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Tree compression with top trees. Information and Computation, 243:166-177, 2015. URL: https://doi.org/10.1016/j.ic.2014.12.012.
  3. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees. SIAM Journal on Computing, 44(3):513-539, January 2015. URL: https://doi.org/10.1137/130936889.
  4. Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley Interscience, 2nd edition, 2006. Google Scholar
  5. Pooya Davoodi, Gonzalo Navarro, Rajeev Raman, and Srinivasa Rao Satti. Encoding range minima and range top-2 queries. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372(2016):20130131-20130131, April 2014. URL: https://doi.org/10.1098/rsta.2013.0131.
  6. M. Effros, K. Visweswariah, S. R. Kulkarni, and S. Verdu. Universal lossless source coding with the burrows wheeler transform. IEEE Transactions on Information Theory, 48(5):1061-1081, May 2002. URL: https://doi.org/10.1109/18.995542.
  7. Arash Farzan and J. Ian Munro. A uniform paradigm to succinctly encode various families of trees. Algorithmica, 68(1):16-40, 2014. URL: https://doi.org/10.1007/s00453-012-9664-0.
  8. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Annual Symposium on Foundations of Computer Science (FOCS). IEEE Comput. Soc, 2000. URL: https://doi.org/10.1109/sfcs.2000.892127.
  9. Johannes Fischer and Volker Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing, 40(2):465-492, January 2011. URL: https://doi.org/10.1137/090779759.
  10. Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009. (available on author’s website: http://algo.inria.fr/flajolet/Publications/book.pdf).
  11. Harold N. Gabow, Jon Louis Bentley, and Robert E. Tarjan. Scaling and related techniques for geometry problems. In STOC 1984. ACM Press, 1984. URL: https://doi.org/10.1145/800057.808675.
  12. Moses Ganardi, Danny Hucke, Artur Jez, Markus Lohrey, and Eric Noeth. Constructing small tree grammars and small circuits for formulas. Journal of Computer and System Sciences, 86:136-158, June 2017. URL: https://doi.org/10.1016/j.jcss.2016.12.007.
  13. Moses Ganardi, Danny Hucke, Markus Lohrey, and Louisa Seelbach Benkner. Universal tree source coding using grammar-based compression. IEEE Transactions on Information Theory, 65(10):6399-6413, October 2019. URL: https://doi.org/10.1109/tit.2019.2919829.
  14. Moses Ganardi, Danny Hucke, Markus Lohrey, and Eric Noeth. Tree compression using string grammars. Algorithmica, 80(3):885-917, 2017. URL: https://doi.org/10.1007/s00453-017-0279-3.
  15. Michal Ganczorz. Entropy lower bounds for dictionary compression. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, June 18-20, 2019, Pisa, Italy, volume 128 of LIPIcs, pages 11:1-11:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.11.
  16. Michał Gańczorz. Using statistical encoding to achieve tree succinctness never seen before. In Symposium on Theoretical Aspects of Computer Science (STACS), LIPIcs, pages 22:1-22:29. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPICS.STACS.2020.22.
  17. Adrià Gascón, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh, and Kurt Sieber. Grammar-based compression of unranked trees. Theory of Computing Systems, 64(1):141-176, 2020. URL: https://doi.org/10.1007/s00224-019-09942-y.
  18. Pawel Gawrychowski and Artur Jez. LZ77 Factorisation of Trees. In Akash Lal, S. Akshay, Saket Saurabh, and Sandeep Sen, editors, Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016), volume 65 of LIPIcs, pages 35:1-35:15, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2016.35.
  19. Paweł Gawrychowski and Patrick K. Nicholson. Optimal encodings for range top-k, selection, and min-max. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 593-604, 2015. URL: https://doi.org/10.1007/978-3-662-47672-7_48.
  20. Richard F. Geary, Rajeev Raman, and Venkatesh Raman. Succinct ordinal trees with level-ancestor queries. ACM Transactions on Algorithms, 2(4):510-534, October 2006. URL: https://doi.org/10.1145/1198513.1198516.
  21. Mordecai Golin, John Iacono, Danny Krizanc, Rajeev Raman, Srinivasa Rao Satti, and Sunil Shende. Encoding 2d range maximum queries. Theoretical Computer Science, 609:316-327, January 2016. URL: https://doi.org/10.1016/j.tcs.2015.10.012.
  22. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation For Computer Science. Addison-Wesley, 1994. Google Scholar
  23. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract). In ACM Symposium on Theory of Computing (STOC). ACM Press, 2000. URL: https://doi.org/10.1145/335305.335351.
  24. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing, 35(2):378-407, January 2005. URL: https://doi.org/10.1137/s0097539702402354.
  25. Meng He, J. Ian Munro, and Srinivasa Satti Rao. Succinct ordinal trees based on tree covering. ACM Transactions on Algorithms, 8(4):1-32, 2012. URL: https://doi.org/10.1145/2344422.2344432.
  26. Hsien-Kuei Hwang and Ralph Neininger. Phase change of limit laws in the quicksort recurrence under varying toll functions. SIAM Journal on Computing, 31(6):1687-1722, January 2002. URL: https://doi.org/10.1137/s009753970138390x.
  27. OEIS Foundation Inc. The On-Line Encyclopedia of Integer Sequences, A001263, 2021. URL: https://oeis.org/A001263.
  28. G. Jacobson. Space-efficient static trees and graphs. In Symposium on Foundations of Computer Science (FOCS). IEEE, 1989. URL: https://doi.org/10.1109/sfcs.1989.63533.
  29. Jesper Jansson, Kunihiko Sadakane, and Wing-Kin Sung. Ultra-succinct representation of ordered trees with applications. Journal of Computer and System Sciences, 78(2):619-631, March 2012. URL: https://doi.org/10.1016/j.jcss.2011.09.002.
  30. J.C. Kieffer and En-Hui Yang. Grammar-based codes: a new class of universal lossless source codes. IEEE Transactions on Information Theory, 46(3):737-754, May 2000. URL: https://doi.org/10.1109/18.841160.
  31. John C. Kieffer, En-Hui Yang, and Wojciech Szpankowski. Structural complexity of random binary trees. In 2009 IEEE International Symposium on Information Theory. IEEE, June 2009. URL: https://doi.org/10.1109/isit.2009.5205704.
  32. Markus Lohrey. Grammar-based tree compression. In International Conference on Developments in Language Theory, pages 46-57. Springer, 2015. Google Scholar
  33. J. Ian Munro and Sebastian Wild. Entropy trees and range-minimum queries in optimal average-case space, 2019. URL: http://arxiv.org/abs/1903.02533.
  34. Gonzalo Navarro. Compact Data Structures - A practical approach. Cambridge University Press, 2016. Google Scholar
  35. Gonzalo Navarro. Indexing highly repetitive string collections, part I: Repetitiveness measures. ACM Computing Surveys, 54(2):29:1-29:36, February 2021. Google Scholar
  36. Gonzalo Navarro. Indexing highly repetitive string collections, part II: Compressed indexes. ACM Computing Surveys, 54(2):26:1-26:38, February 2021. URL: https://doi.org/10.1145/3432999.
  37. Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(1):2, April 2007. URL: https://doi.org/10.1145/1216370.1216372.
  38. Jürg Nievergelt and Edward M. Reingold. Binary search trees of bounded balance. SIAM J. Comput., 2(1):33-43, 1973. URL: https://doi.org/10.1137/0202005.
  39. Nicola Prezza. Optimal Rank and Select Queries on Dictionary-Compressed Text. In Nadia Pisanti and Solon P. Pissis, editors, Symposium on Combinatorial Pattern Matching (CPM), volume 128 of LIPIcs, pages 4:1-4:12, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.4.
  40. Louisa Seelbach Benkner and Markus Lohrey. Average Case Analysis of Leaf-Centric Binary Tree Sources. In Igor Potapov, Paul Spirakis, and James Worrell, editors, Symposium on Mathematical Foundations of Computer Science (MFCS), volume 117 of LIPIcs, pages 16:1-16:15, Dagstuhl, Germany, 2018. Schloss Dagstuhl. URL: https://doi.org/10.4230/LIPIcs.MFCS.2018.16.
  41. Dekel Tsur. Representation of ordered trees with a given degree distribution, 2018. URL: http://arxiv.org/abs/1807.00371.
  42. Sebastian Wild. Dual-Pivot Quicksort and Beyond: Analysis of Multiway Partitioning and Its Practical Potential. Dissertation (Ph. D. thesis), University of Kaiserslautern, 2016. URL: https://www.wild-inter.net/publications/wild-2016.
  43. Jie Zhang, En-Hui Yang, and John C. Kieffer. A universal grammar-based code for lossless compression of binary trees. IEEE Transactions on Information Theory, 60(3):1373-1386, March 2014. URL: https://doi.org/10.1109/tit.2013.2295392.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail