, Inge Li Gørtz, Máximo Pérez-López
Creative Commons Attribution 4.0 International license
We introduce a framework that uses any relative compression algorithm as a subroutine for hierarchical relative compression. Given a dataset of n sequences, the framework constructs a rooted tree on the sequences using hashing and similarity techniques, and compresses the children of each node relative to their parent. We build upon previous techniques [Bille et al., 2023] and further optimize them for computational efficiency. We test our framework with three existing relative compression algorithms on six genomic datasets. On datasets that contain heterogeneous data, hierarchical relative compression improves the compression ratio by a factor of 2 or more compared to relative compression against a single reference sequence. Beyond compression ratio, we also explore the trade-offs in compression speed, dataset decompression speed, and average sequence decompression speed. With two of the surveyed algorithms, dataset decompression becomes faster and sequence decompression remains practical. This comes at the cost of compression time, which nonetheless remains competitive on the datasets with the highest variability.
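The scheme of building a rooted tree over the sequences and compressing each child relative to its parent can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It substitutes exact k-mer Jaccard similarity and a Prim-style maximum spanning tree for the paper's hashing and similarity techniques. It also uses a naive greedy RLZ-style parse as the relative compressor. All function names (`build_tree`, `rlz_parse`, `rlz_decode`, `compress`, `decompress_one`) and the parameters `k` and `root` are hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch: k-mer Jaccard + Prim-style tree stand in for the
# paper's hashing/similarity techniques; a naive RLZ-style parse stands in
# for "any relative compression algorithm".


def kmers(s: str, k: int) -> set[str]:
    """All distinct length-k substrings of s (empty if len(s) < k)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def build_tree(seqs: list[str], k: int = 4, root: int = 0) -> list[int | None]:
    """Prim-style maximum spanning tree on k-mer Jaccard similarity.

    Returns parent[i] for every sequence; parent[root] is None.
    """
    sets = [kmers(s, k) for s in seqs]
    parent: list[int | None] = [None] * len(seqs)
    # best[j] = (highest similarity of j to a node already in the tree, that node)
    best = {j: (jaccard(sets[j], sets[root]), root) for j in range(len(seqs)) if j != root}
    while best:
        i = max(best, key=lambda j: best[j][0])  # most similar node outside the tree
        parent[i] = best.pop(i)[1]
        for j in best:  # i joined the tree: it may be a better parent for others
            sim = jaccard(sets[j], sets[i])
            if sim > best[j][0]:
                best[j] = (sim, i)
    return parent


def rlz_parse(target: str, ref: str) -> list[tuple[int, int] | str]:
    """Greedy RLZ-style parse: (start, length) copies of ref, or a literal char."""
    phrases: list[tuple[int, int] | str] = []
    i = 0
    while i < len(target):
        best_start, best_len = -1, 0
        for s in range(len(ref)):  # naive longest-match search over ref
            n = 0
            while i + n < len(target) and s + n < len(ref) and ref[s + n] == target[i + n]:
                n += 1
            if n > best_len:
                best_start, best_len = s, n
        if best_len == 0:
            phrases.append(target[i])  # character absent from ref
            i += 1
        else:
            phrases.append((best_start, best_len))
            i += best_len
    return phrases


def rlz_decode(phrases: list[tuple[int, int] | str], ref: str) -> str:
    """Invert rlz_parse given the same reference string."""
    return "".join(p if isinstance(p, str) else ref[p[0]:p[0] + p[1]] for p in phrases)


def compress(seqs: list[str], k: int = 4, root: int = 0):
    """Store the root verbatim and every other sequence relative to its parent."""
    parent = build_tree(seqs, k, root)
    encoded = [s if p is None else rlz_parse(s, seqs[p]) for s, p in zip(seqs, parent)]
    return parent, encoded


def decompress_one(i: int, parent: list[int | None], encoded: list) -> str:
    """Decode sequence i by decoding every ancestor on its path from the root."""
    path = []
    while parent[i] is not None:
        path.append(i)
        i = parent[i]
    text = encoded[i]  # the root is stored verbatim
    for j in reversed(path):  # walk back down: each node is decoded against its parent
        text = rlz_decode(encoded[j], text)
    return text


seqs = ["ACGTACGTTTGA", "ACGTACGTTTGC", "ACGTACGATTGC", "GGGCCCAAATTT", "GGGCCCAAATTA"]
parent, encoded = compress(seqs)
print(parent)  # [None, 0, 1, 0, 3]
print(encoded[4])  # [(0, 11), (6, 1)]
print(decompress_one(2, parent, encoded))  # ACGTACGATTGC
```

The relative compressor is touched only through `rlz_parse` and `rlz_decode`, so another relative compressor could be swapped in. This mirrors the framework's treatment of relative compression as a subroutine.

Decoding a single sequence requires decoding all of its ancestors, which is why average sequence decompression speed is a trade-off separate from dataset decompression. In this toy example, the last sequence parses into 2 phrases against its tree parent but into 11 phrases against the first sequence. That contrast echoes why a hierarchy helps on heterogeneous data.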
@InProceedings{bille_et_al:LIPIcs.SEA.2026.7,
author = {Bille, Philip and G{\o}rtz, Inge Li and P\'{e}rez-L\'{o}pez, M\'{a}ximo},
title = {{From Relative Compression to Hierarchical Compression}},
booktitle = {24th International Symposium on Experimental Algorithms (SEA 2026)},
pages = {7:1--7:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-422-2},
ISSN = {1868-8969},
year = {2026},
volume = {371},
editor = {Aum\"{u}ller, Martin and Finocchi, Irene},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2026.7},
URN = {urn:nbn:de:0030-drops-260117},
doi = {10.4230/LIPIcs.SEA.2026.7},
annote = {Keywords: Relative compression, RLZ, string collections, compressed representation, data structures, efficient algorithms}
}