LIPIcs.ISAAC.2024.18.pdf
- Filesize: 0.83 MB
- 15 pages
Generalized suffix trees are data structures for storing and searching a set of strings. Although many string problems can be solved efficiently with them, their space usage can be large relative to the size of the input strings. For a set of strings with n characters in total, a generalized suffix tree uses O(n log n) bits of space, which is much larger than the n log σ bits occupied by the strings themselves, where σ is the alphabet size. Generalized compressed suffix trees use just O(n log σ) bits yet support the same basic operations as generalized suffix trees. However, some more sophisticated operations require auxiliary data structures of O(n log n) bits, which becomes a bottleneck for applications involving big data. In this paper, we enhance generalized compressed suffix trees while retaining their space efficiency. First, we give an auxiliary data structure of O(n) bits for generalized compressed suffix trees such that, given a suffix s of a string and another string t, we can find the suffix of t that is closest to s. Next, we give an o(n)-bit data structure for finding the ancestor of a node in a (generalized) compressed suffix tree with a given string depth. Finally, we give data structures for a generalization of the document listing problem from arrays to trees. We also show their applications to suffix-prefix matching problems.
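To make the gap between n log n and n log σ bits concrete, the following minimal sketch compares the two leading terms for a DNA-like input. It is only an illustration: the function name `leading_term_bits` and the choice n = 10^9, σ = 4 are assumptions made here, logarithms are taken base 2 and rounded up, and the constant factors hidden by the O-notation are ignored, so the figures are not actual sizes of any data structure in the paper.

```python
import math


def leading_term_bits(n: int, sigma: int) -> dict:
    """Compare leading terms only; constants hidden by O(.) are ignored.

    Hypothetical helper for illustration, not taken from the paper.
    """
    bits_per_char = math.ceil(math.log2(sigma))   # bits to encode one character
    bits_per_index = math.ceil(math.log2(n))      # bits to encode one text position
    return {
        "text": n * bits_per_char,                # n log sigma: the strings themselves
        "pointer_based": n * bits_per_index,      # n log n: one position per character
    }


# Assumed example: one billion characters over a 4-letter alphabet.
est = leading_term_bits(n=10**9, sigma=4)
ratio = est["pointer_based"] / est["text"]
print(est, ratio)
```

Under these assumptions the n log n term is 15 times the n log σ term, which illustrates why an O(n log n)-bit auxiliary structure dominates the space of an otherwise compressed index.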