In this paper, we study the following problem: given n subsets S₁, … , S_n of an integer universe U = {0,… , u-1}, having total cardinality N = ∑_{i = 1}ⁿ |S_i|, find a prefix-free encoding enc : U → {0,1}^+ minimizing the so-called trie measure, i.e., the total number of edges in the n binary tries T₁, … , T_n, where T_i is the trie packing the encoded integers {enc(x):x ∈ S_i}. We first observe that this problem is equivalent to that of merging u sets with the cheapest sequence of binary unions, a problem which in [Ghosh et al., ICDCS 2015] is shown to be NP-hard. Motivated by the hardness of the general problem, we focus on particular families of prefix-free encodings. We start by studying the fixed-length shifted encoding of [Gupta et al., Theoretical Computer Science 2007]. Given a parameter 0 ≤ a < u, this encoding sends each x ∈ U to (x + a) mod u, interpreted as a bit-string of log u bits. We develop the first efficient algorithms that find the value of a minimizing the trie measure when this encoding is used. Our two algorithms run in O(u + Nlog u) and O(Nlog² u) time, respectively. We proceed by studying ordered encodings (a.k.a. monotone or alphabetic), and describe an algorithm finding the optimal such encoding in O(N+u³) time. Within the same running time, we show how to compute the best shifted ordered encoding, provably no worse than both the optimal shifted and optimal ordered encodings. We provide implementations of our algorithms and discuss how these encodings perform in practice.
@InProceedings{alanko_et_al:LIPIcs.CPM.2025.19, author = {Alanko, Jarno N. and Becker, Ruben and Cenzato, Davide and Gagie, Travis and Kim, Sung-Hwan and Kodric, Bojana and Prezza, Nicola}, title = {{The Trie Measure, Revisited}}, booktitle = {36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)}, pages = {19:1--19:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-369-0}, ISSN = {1868-8969}, year = {2025}, volume = {331}, editor = {Bonizzoni, Paola and M\"{a}kinen, Veli}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.19}, URN = {urn:nbn:de:0030-drops-231135}, doi = {10.4230/LIPIcs.CPM.2025.19}, annote = {Keywords: Succinct data structures, degenerate strings, integer encoding} }
Feedback for Dagstuhl Publishing