Top Tree Compression of Tries

Bille, Philip; Gawrychowski, Paweł; Gørtz, Inge Li; Landau, Gad M.; Weimann, Oren

doi:10.4230/LIPIcs.ISAAC.2019.4

Abstract

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size sigma into a compressed data structure of worst-case optimal size O(n/log_sigma n) that given a pattern string P of length m determines if P is a prefix of one of the strings in time O(min(m log sigma,m + log n)). We show that this query time is in fact optimal regardless of the size of the data structure. Existing solutions either use Omega(n) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several interesting data structures that work on a pointer machine and are of independent interest. These include an optimal data structures for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.

Peyman Afshani, Lars Arge, and Kasper Green Larsen. Higher-dimensional orthogonal range reporting and rectangle stabbing in the pointer machine model. In Proc. 28th SoCG, pages 323-332, 2012.
Stephen Alstrup and Jacob Holm. Improved algorithms for finding level ancestors in dynamic trees. In Proc. 27th ICALP, pages 73-84, 2000.
Stephen Alstrup, Jacob Holm, Kristian De Lichtenberg, and Mikkel Thorup. Maintaining Information in Fully Dynamic Trees with Top Trees. ACM Trans. Algorithms, 1(2):243-264, 2005.
J-I Aoe. An efficient digital search algorithm by using a double-array structure. IEEE Trans. Soft. Eng., 15(9):1066-1077, 1989.
Julian Arz and Johannes Fischer. LZ-compressed string dictionaries. In Proc. 24th DCC, pages 322-331, 2014.
Julian Arz and Johannes Fischer. Lempel-Ziv-78 Compressed String Dictionaries. Algorithmica, pages 1-36, 2018.
Nikolas Askitis and Ranjan Sinha. Engineering scalable, cache and space efficient tries for strings. The VLDB Journal, 19(5):633-660, 2010.
Djamal Belazzougui, Paolo Boldi, and Sebastiano Vigna. Dynamic z-fast tries. In Proc. 17th SPIRE, pages 159-172, 2010.
Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, and Mathieu Raffinot. Composite repetition-aware data structures. In Proc. 26th CPM, pages 26-39, 2015.
Djamal Belazzougui, Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Alberto Ordónez, Simon J Puglisi, and Yasuo Tabei. Queries on LZ-bounded encodings. In Proc. 25th DCC, pages 83-92, 2015.
Djamal Belazzougui, Travis Gagie, Simon Gog, Giovanni Manzini, and Jouni Sirén. Relative FM-indexes. In Proc. 21st SPIRE, pages 52-64, 2014.
Michael A. Bender and Martin Farach-Colton. The Level Ancestor Problem simplified. Theoret. Comput. Sci., 321(1):5-12, 2004.
David Benoit, Erik D Demaine, J Ian Munro, Rajeev Raman, Venkatesh Raman, and S Srinivasa Rao. Representing trees of higher degree. Algorithmica, 43(4):275-292, 2005.
Samuel W. Bent, Daniel D. Sleator, and Robert E. Tarjan. Biased Search Trees. SIAM J. Comput., 14(3):545-568, 1985.
Philip Bille, Mikko B. Ettienne, Inge Li Gørtz, and Hjalte W. Vildhøj. Time-space trade-offs for Lempel-Ziv compressed indexing. Theor. Comput. Sci., 713:66-77, 2018.
Philip Bille, Finn Fernstrøm, and Inge Li Gørtz. Tight Bounds for Top Tree Compression. In Proc. 24th SPIRE, pages 97-102, 2017.
Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen. Deterministic Indexing for Packed Strings. In Proc. 28th CPM, 2017.
Philip Bille, Inge Li Gørtz, Oren Weimann, and Gad M. Landau. Tree compression with top trees. Inf. Comput., 243:166-177, 2015. Announced at ICALP 2013.
Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random Access to Grammar-Compressed Strings and Trees. SIAM J. Comput., 44(3):513-539, 2015. Announced at SODA 2011.
Barnard Chazelle. Lower bounds for orthogonal range searching: I. The reporting case. J. ACM, 37(2):200-212, 1990.
Bernard Chazelle and Burton Rosenberg. Simplex range reporting on a pointer machine. Comput. Geom., 5(5):237-247, 1996.
Anders R. Christiansen and Mikko B. Ettienne. Compressed Indexing with Signature Grammars. In Proc. 13th LATIN, pages 331-345, 2018.
Francisco Claude and Gonzalo Navarro. Self-indexed grammar-based compression. Fundamenta Informaticae, 111(3):313-337, 2011.
Francisco Claude and Gonzalo Navarro. Improved grammar-based compressed indexes. In Proc. 19th SPIRE, pages 180-192, 2012.
John J Darragh, John G Cleary, and Ian H Witten. Bonsai: a compact representation of trees. Softw. Pract. Exper., 23(3):277-291, 1993.
Paul F. Dietz. Finding level-ancestors in dynamic trees. In Proc. 2nd WADS, pages 32-40, 1991.
Peter J. Downey, Ravi Sethi, and Robert E. Tarjan. Variations on the common subexpression problem. J. ACM, 27(4):758-771, 1980.
Bartlomiej Dudek and Paweł Gawrychowski. Slowing Down Top Trees for Better Worst-Case Compression. In Proc. 29th CPM, pages 16:1-16:8, 2018.
Andrea Farruggia, Travis Gagie, Gonzalo Navarro, Simon J Puglisi, and Jouni Sirén. Relative suffix trees. Comput. J., 61(5):773-788, 2017.
Edward Fredkin. Trie Memory. Commun. ACM, 3(9):490-499, 1960.
Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. A Faster Grammar-Based Self-index. In Proc. 6th LATA, pages 240-251, 2012.
Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. LZ77-based self-indexing with faster pattern matching. In Proc. 11th LATIN, pages 731-742, 2014.
Roberto Grossi and Giuseppe Ottaviano. Fast compressed tries through path decompositions. ACM J. Exp. Alg., 19:3-4, 2015.
Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput., 35(2):378-407, 2005.
Torben Hagerup. Sorting and Searching on the Word RAM. In Proc. 15th STACS, pages 366-398, 1998.
Meng He, J. Ian Munro, and Gelin Zhou. Data Structures for Path Queries. ACM Trans. Algorithms, 12(4):53:1-53:32, 2016.
Robert Hood and Robert Melville. Real-Time Queue Operation in Pure LISP. Inf. Process. Lett., 13(2):50-54, 1981.
Lorenz Hübschle-Schneider and Rajeev Raman. Tree compression with top trees revisited. In Proc. 14th SEA, pages 15-27, 2015.
Shunsuke Kanda, Kazuhiro Morita, and Masao Fuketa. Compressed double-array tries for string dictionaries supporting fast lookup. Knowl. Inf. Syst., 51(3):1023-1042, 2017.
Shunsuke Kanda, Kazuhiro Morita, and Masao Fuketa. Practical implementation of space-efficient dynamic keyword dictionaries. In Proc. 24th SPIRE, pages 221-233, 2017.
Juha Kärkkäinen and Esko Ukkonen. Lempel-Ziv parsing and sublinear-size index structures for string matching. In Proc. 3rd WSP, pages 141-155, 1996.
Richard M. Karp and Michael O. Rabin. Efficient Randomized Pattern-Matching Algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987.
Donald E. Knuth, Jr. James H. Morris, and Vaughan R. Pratt. Fast Pattern Matching in Strings. SIAM J. Comput., 6(2):323-350, 1977.
Donald Erwin Knuth. The Art of Computer Programming, Volume 1. Addison Wesley, 1969.
Veli Mäkinen. Compact suffix array - a space-efficient full-text index. Fundamenta Informaticae, 56(1-2):191-210, 2003.
Veli Mäkinen and Gonzalo Navarro. Succinct suffix arrays based on run-length encoding. Nordic J. Comput., 12(1):40-66, 2005.
Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of individual genomes. In Proc. 13th RECOMB, pages 121-137, 2009.
Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of highly repetitive sequence collections. J. Comp.Bio., 17(3):281-308, 2010.
Gonzalo Navarro and Veli Mäkinen. Compressed Full-text Indexes. ACM Comput. Surv., 39(1), 2007.
Gonzalo Navarro and Nicola Prezza. Universal compressed text indexing. Theor. Comput. Sci., 762:41-50, 2019.
Takaaki Nishimoto, I Tomohiro, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic index and LZ factorization in compressed space. Disc. App. Math., 2019.
Andreas Poyias and Rajeev Raman. Improved practical compact dynamic tries. In Proc. 22nd SPIRE, pages 324-336, 2015.
Franco P Preparata and Se June Hong. Convex hulls of finite sets of points in two and three dimensions. Communications of the ACM, 20(2):87-93, 1977.
Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms, 3(4):43, 2007.
Kunihiko Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proc. 11th ISAAC, pages 410-421, 2000.
Jouni Sirén, Niko Välimäki, Veli Mäkinen, and Gonzalo Navarro. Run-length compressed indexes are superior for highly repetitive sequence collections. In Proc. 15th SPIRE, pages 164-175, 2008.
Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, and Hiroki Arimura. Linear-size CDAWG: new repetition-aware indexing and grammar compression. In Proc. 24th SPIRE, pages 304-316, 2017.
Takuya Takagi, Shunsuke Inenaga, Kunihiko Sadakane, and Hiroki Arimura. Packed compact tries: A fast and efficient data structure for online string processing. IEICE Trans. on Fund. Elect., Comm. and Comp. Sci., 100(9):1785-1793, 2017.
Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. Journal of Computer and System Sciences, 18(2):110-127, April 1979.
Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic Packed Compact Tries Revisited. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.07467.
Susumu Yata. Dictionary compression by nesting prefix/patricia tries. In Proc. 17th Meeting of the Association for Natural Language, 2011.
Naoki Yoshinaga and Masaru Kitsuregawa. A self-adaptive classifier for efficient text-stream processing. In Proc. 25th COLING, pages 1091-1102, 2014.

Top Tree Compression of Tries

Authors Philip Bille , Paweł Gawrychowski , Inge Li Gørtz , Gad M. Landau , Oren Weimann

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Top Tree Compression of Tries

Authors Philip Bille , Paweł Gawrychowski , Inge Li Gørtz , Gad M. Landau , Oren Weimann

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message