LIPIcs.STACS.2020.22.pdf
- Filesize: 0.63 MB
- 29 pages
We propose new entropy measures for trees, the known ones are H_k(?), the k-th order (tree label) entropy (Ferragina at al. 2005), and tree entropy H(?) (Jansson et al. 2006), the former considers only the tree labels and the latter only tree shape. The proposed entropy measures, H_k(?|L) and H_k(L|?), exploit the relation between the labels and the tree shape. We prove that they lower bound label entropy and tree entropy, respectively, i.e. H_k(?|L) ≤ H(?) and H_k(L|?) ≤ H_k(L). Besides being theoretically superior, the new measures are significantly smaller in practice. We also propose a new succinct representation of labeled trees which represents a tree T using one of the following bounds: |T|(H(?) + H_k(L|?)) or |T|(H_k(?|L) + H_k(L)). The representation is based on a new, simple method of partitioning the tree, which preserves both tree shape and node degrees. The previous state-of-the-art method of compressing the tree achieved |T|(H(?) + H_k(L)) bits, by combining the results of Ferragina at al. 2005 and Jansson et al. 2006; so proposed representation is not worse and often superior. Moreover, our representation supports standard tree navigation in constant time as well as more complex queries. Such a structure achieving this space bounds was not known before: aforementioned solution only worked for compression alone, our structure is the first which achieves H_k(?) for k>0 and supports such queries. Lastly, our data structure is fairly simple, both conceptually and in terms of the implementation, moreover it uses known tools, which is a counter-argument to the claim that methods based on tree-partitioning are impractical.
Feedback for Dagstuhl Publishing