Top Tree Compression of Tries

Authors Philip Bille , Paweł Gawrychowski , Inge Li Gørtz , Gad M. Landau , Oren Weimann



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2019.4.pdf
  • Filesize: 4.51 MB
  • 18 pages

Document Identifiers

Author Details

Philip Bille
  • Technical University of Denmark, DTU Compute, Denmark
Paweł Gawrychowski
  • University of Wrocław, Poland
Inge Li Gørtz
  • Technical University of Denmark, DTU Compute, Denmark
Gad M. Landau
  • University of Haifa, Israel
Oren Weimann
  • University of Haifa, Israel

Cite AsGet BibTex

Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Top Tree Compression of Tries. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ISAAC.2019.4

Abstract

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size sigma into a compressed data structure of worst-case optimal size O(n/log_sigma n) that given a pattern string P of length m determines if P is a prefix of one of the strings in time O(min(m log sigma,m + log n)). We show that this query time is in fact optimal regardless of the size of the data structure. Existing solutions either use Omega(n) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several interesting data structures that work on a pointer machine and are of independent interest. These include an optimal data structures for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • pattern matching
  • tree compression
  • top trees
  • pointer machine

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Peyman Afshani, Lars Arge, and Kasper Green Larsen. Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In Proc. 28th SoCG, pages 323-332, 2012. Google Scholar
  2. Stephen Alstrup and Jacob Holm. Improved algorithms for finding level ancestors in dynamic trees. In Proc. 27th ICALP, pages 73-84, 2000. Google Scholar
  3. Stephen Alstrup, Jacob Holm, Kristian De Lichtenberg, and Mikkel Thorup. Maintaining Information in Fully Dynamic Trees with Top Trees. ACM Trans. Algorithms, 1(2):243-264, 2005. Google Scholar
  4. J-I Aoe. An efficient digital search algorithm by using a double-array structure. IEEE Trans. Soft. Eng., 15(9):1066-1077, 1989. Google Scholar
  5. Julian Arz and Johannes Fischer. LZ-compressed string dictionaries. In Proc. 24th DCC, pages 322-331, 2014. Google Scholar
  6. Julian Arz and Johannes Fischer. Lempel-Ziv-78 Compressed String Dictionaries. Algorithmica, pages 1-36, 2018. Google Scholar
  7. Nikolas Askitis and Ranjan Sinha. Engineering scalable, cache and space efficient tries for strings. The VLDB Journal, 19(5):633-660, 2010. Google Scholar
  8. Djamal Belazzougui, Paolo Boldi, and Sebastiano Vigna. Dynamic z-fast tries. In Proc. 17th SPIRE, pages 159-172, 2010. Google Scholar
  9. Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, and Mathieu Raffinot. Composite repetition-aware data structures. In Proc. 26th CPM, pages 26-39, 2015. Google Scholar
  10. Djamal Belazzougui, Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Alberto Ordónez, Simon J Puglisi, and Yasuo Tabei. Queries on LZ-bounded encodings. In Proc. 25th DCC, pages 83-92, 2015. Google Scholar
  11. Djamal Belazzougui, Travis Gagie, Simon Gog, Giovanni Manzini, and Jouni Sirén. Relative FM-indexes. In Proc. 21st SPIRE, pages 52-64, 2014. Google Scholar
  12. Michael A. Bender and Martin Farach-Colton. The Level Ancestor Problem simplified. Theoret. Comput. Sci., 321(1):5-12, 2004. Google Scholar
  13. David Benoit, Erik D Demaine, J Ian Munro, Rajeev Raman, Venkatesh Raman, and S Srinivasa Rao. Representing trees of higher degree. Algorithmica, 43(4):275-292, 2005. Google Scholar
  14. Samuel W. Bent, Daniel D. Sleator, and Robert E. Tarjan. Biased Search Trees. SIAM J. Comput., 14(3):545-568, 1985. Google Scholar
  15. Philip Bille, Mikko B. Ettienne, Inge Li Gørtz, and Hjalte W. Vildhøj. Time-space trade-offs for Lempel-Ziv compressed indexing. Theor. Comput. Sci., 713:66-77, 2018. Google Scholar
  16. Philip Bille, Finn Fernstrøm, and Inge Li Gørtz. Tight Bounds for Top Tree Compression. In Proc. 24th SPIRE, pages 97-102, 2017. Google Scholar
  17. Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen. Deterministic Indexing for Packed Strings. In Proc. 28th CPM, 2017. Google Scholar
  18. Philip Bille, Inge Li Gørtz, Oren Weimann, and Gad M. Landau. Tree compression with top trees. Inf. Comput., 243:166-177, 2015. Announced at ICALP 2013. Google Scholar
  19. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random Access to Grammar-Compressed Strings and Trees. SIAM J. Comput., 44(3):513-539, 2015. Announced at SODA 2011. Google Scholar
  20. Barnard Chazelle. Lower bounds for orthogonal range searching: I. The reporting case. J. ACM, 37(2):200-212, 1990. Google Scholar
  21. Bernard Chazelle and Burton Rosenberg. Simplex range reporting on a pointer machine. Comput. Geom., 5(5):237-247, 1996. Google Scholar
  22. Anders R. Christiansen and Mikko B. Ettienne. Compressed Indexing with Signature Grammars. In Proc. 13th LATIN, pages 331-345, 2018. Google Scholar
  23. Francisco Claude and Gonzalo Navarro. Self-indexed grammar-based compression. Fundamenta Informaticae, 111(3):313-337, 2011. Google Scholar
  24. Francisco Claude and Gonzalo Navarro. Improved grammar-based compressed indexes. In Proc. 19th SPIRE, pages 180-192, 2012. Google Scholar
  25. John J Darragh, John G Cleary, and Ian H Witten. Bonsai: a compact representation of trees. Softw. Pract. Exper., 23(3):277-291, 1993. Google Scholar
  26. Paul F. Dietz. Finding level-ancestors in dynamic trees. In Proc. 2nd WADS, pages 32-40, 1991. Google Scholar
  27. Peter J. Downey, Ravi Sethi, and Robert E. Tarjan. Variations on the common subexpression problem. J. ACM, 27(4):758-771, 1980. Google Scholar
  28. Bartlomiej Dudek and Paweł Gawrychowski. Slowing Down Top Trees for Better Worst-Case Compression. In Proc. 29th CPM, pages 16:1-16:8, 2018. Google Scholar
  29. Andrea Farruggia, Travis Gagie, Gonzalo Navarro, Simon J Puglisi, and Jouni Sirén. Relative suffix trees. Comput. J., 61(5):773-788, 2017. Google Scholar
  30. Edward Fredkin. Trie Memory. Commun. ACM, 3(9):490-499, 1960. Google Scholar
  31. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. A Faster Grammar-Based Self-index. In Proc. 6th LATA, pages 240-251, 2012. Google Scholar
  32. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. LZ77-based self-indexing with faster pattern matching. In Proc. 11th LATIN, pages 731-742, 2014. Google Scholar
  33. Roberto Grossi and Giuseppe Ottaviano. Fast compressed tries through path decompositions. ACM J. Exp. Alg., 19:3-4, 2015. Google Scholar
  34. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput., 35(2):378-407, 2005. Google Scholar
  35. Torben Hagerup. Sorting and Searching on the Word RAM. In Proc. 15th STACS, pages 366-398, 1998. Google Scholar
  36. Meng He, J. Ian Munro, and Gelin Zhou. Data Structures for Path Queries. ACM Trans. Algorithms, 12(4):53:1-53:32, 2016. Google Scholar
  37. Robert Hood and Robert Melville. Real-Time Queue Operation in Pure LISP. Inf. Process. Lett., 13(2):50-54, 1981. Google Scholar
  38. Lorenz Hübschle-Schneider and Rajeev Raman. Tree compression with top trees revisited. In Proc. 14th SEA, pages 15-27, 2015. Google Scholar
  39. Shunsuke Kanda, Kazuhiro Morita, and Masao Fuketa. Compressed double-array tries for string dictionaries supporting fast lookup. Knowl. Inf. Syst., 51(3):1023-1042, 2017. Google Scholar
  40. Shunsuke Kanda, Kazuhiro Morita, and Masao Fuketa. Practical implementation of space-efficient dynamic keyword dictionaries. In Proc. 24th SPIRE, pages 221-233, 2017. Google Scholar
  41. Juha Kärkkäinen and Esko Ukkonen. Lempel-Ziv parsing and sublinear-size index structures for string matching. In Proc. 3rd WSP, pages 141-155, 1996. Google Scholar
  42. Richard M. Karp and Michael O. Rabin. Efficient Randomized Pattern-Matching Algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. Google Scholar
  43. Donald E. Knuth, Jr. James H. Morris, and Vaughan R. Pratt. Fast Pattern Matching in Strings. SIAM J. Comput., 6(2):323-350, 1977. Google Scholar
  44. Donald Erwin Knuth. The Art of Computer Programming, Volume 1. Addison Wesley, 1969. Google Scholar
  45. Veli Mäkinen. Compact suffix array - a space-efficient full-text index. Fundamenta Informaticae, 56(1-2):191-210, 2003. Google Scholar
  46. Veli Mäkinen and Gonzalo Navarro. Succinct suffix arrays based on run-length encoding. Nordic J. Comput., 12(1):40-66, 2005. Google Scholar
  47. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of individual genomes. In Proc. 13th RECOMB, pages 121-137, 2009. Google Scholar
  48. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of highly repetitive sequence collections. J. Comp.Bio., 17(3):281-308, 2010. Google Scholar
  49. Gonzalo Navarro and Veli Mäkinen. Compressed Full-text Indexes. ACM Comput. Surv., 39(1), 2007. Google Scholar
  50. Gonzalo Navarro and Nicola Prezza. Universal compressed text indexing. Theor. Comput. Sci., 762:41-50, 2019. Google Scholar
  51. Takaaki Nishimoto, I Tomohiro, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic index and LZ factorization in compressed space. Disc. App. Math., 2019. Google Scholar
  52. Andreas Poyias and Rajeev Raman. Improved practical compact dynamic tries. In Proc. 22nd SPIRE, pages 324-336, 2015. Google Scholar
  53. Franco P Preparata and Se June Hong. Convex hulls of finite sets of points in two and three dimensions. Communications of the ACM, 20(2):87-93, 1977. Google Scholar
  54. Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms, 3(4):43, 2007. Google Scholar
  55. Kunihiko Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proc. 11th ISAAC, pages 410-421, 2000. Google Scholar
  56. Jouni Sirén, Niko Välimäki, Veli Mäkinen, and Gonzalo Navarro. Run-length compressed indexes are superior for highly repetitive sequence collections. In Proc. 15th SPIRE, pages 164-175, 2008. Google Scholar
  57. Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, and Hiroki Arimura. Linear-size CDAWG: new repetition-aware indexing and grammar compression. In Proc. 24th SPIRE, pages 304-316, 2017. Google Scholar
  58. Takuya Takagi, Shunsuke Inenaga, Kunihiko Sadakane, and Hiroki Arimura. Packed compact tries: A fast and efficient data structure for online string processing. IEICE Trans. on Fund. Elect., Comm. and Comp. Sci., 100(9):1785-1793, 2017. Google Scholar
  59. Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of Computer and System Sciences, 18(2):110-127, April 1979. Google Scholar
  60. Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic Packed Compact Tries Revisited. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.07467.
  61. Susumu Yata. Dictionary compression by nesting prefix/patricia tries. In Proc. 17th Meeting of the Association for Natural Language, 2011. Google Scholar
  62. Naoki Yoshinaga and Masaru Kitsuregawa. A self-adaptive classifier for efficient text-stream processing. In Proc. 25th COLING, pages 1091-1102, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail