Vojtěch Gaďurek, Pavel Veselý
Creative Commons Attribution 4.0 International license
Invertible Bloom Lookup Tables (IBLTs) provide a highly space-efficient way to reconstruct small sets resulting from a large number of insertions and deletions of elements, such as in streaming or distributed computation of the symmetric difference of similar sets. The set recovery process succeeds if the IBLT size is at least 1.22 times the size of the encoded set; otherwise, a 2-core occurs with high probability in the corresponding random hypergraph. However, sets in practice often exhibit structure that allows for performance beyond worst-case bounds. Here, we demonstrate that structured sets, such as the k-mers in the symmetric difference of two closely related genomes, can be recovered with an IBLT of significantly smaller size. We achieve this by employing structure-aware predictors to break the 2-core whenever the recovery process gets stuck. Importantly, this approach modifies only the decoding procedure, leaving the IBLT data structure unchanged. We prove that even a weak matching-based predictor enables the recovery of 27% more elements than the nominal IBLT size. Equipped with simple predictors for k-mers of genomic datasets, we demonstrate that a symmetric difference can be recovered with high probability using an IBLT whose size is only 66% of the encoded set size for k = 31, improving space efficiency by almost a factor of two. Moreover, we design an improved method for k-mers with large k that combines subsampling with nearly perfect prediction via fingerprinting and achieves a scaling property: it requires only O(M log M) bits to recover M k-mers, instead of the Θ(k⋅M) bits of a standard IBLT. Overall, our results highlight the potential for significant space-efficiency improvements for IBLTs on datasets with predictable structure.
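For orientation, the following is a minimal Python sketch of a plain IBLT with the standard peeling decoder, the structure-unaware procedure that gets stuck on a 2-core. It is not the authors' implementation and contains no predictor. Several details are assumptions made only for illustration. Keys are non-negative integers below 2^128 (for example, bit-encoded k-mers). Hashes are derived from BLAKE2b with different salts. The table is split into 3 equal subtables. The names `IBLT`, `insert`, `delete`, and `peel` are hypothetical.

```python
import hashlib


def _h(key: int, salt: int) -> int:
    # Illustrative 64-bit hash: BLAKE2b over the key, with `salt` selecting an independent function.
    digest = hashlib.blake2b(key.to_bytes(16, "little"), digest_size=8,
                             salt=salt.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


class IBLT:
    """Hypothetical minimal IBLT: each cell holds a count, an XOR of keys, and an XOR of checksums."""

    def __init__(self, num_cells: int, num_hashes: int = 3):
        self.k = num_hashes
        self.sub = max(1, num_cells // num_hashes)  # one subtable per hash, so a key's cells are distinct
        size = self.sub * self.k
        self.count, self.key_sum, self.hash_sum = [0] * size, [0] * size, [0] * size

    def _cells(self, key: int) -> list[int]:
        return [i * self.sub + _h(key, i) % self.sub for i in range(self.k)]

    def _update(self, key: int, delta: int) -> None:
        chk = _h(key, self.k)  # checksum from an extra, independent hash
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.hash_sum[c] ^= chk

    def insert(self, key: int) -> None:
        self._update(key, +1)

    def delete(self, key: int) -> None:
        self._update(key, -1)

    def _pure(self, c: int) -> bool:
        # A cell is "pure" if it appears to hold exactly one key (inserted or deleted).
        return self.count[c] in (1, -1) and self.hash_sum[c] == _h(self.key_sum[c], self.k)

    def peel(self) -> tuple[set[int], set[int], bool]:
        """Standard peeling decoder (mutates the table).

        Returns (keys with net +1, keys with net -1, success). Decoding stops
        early when no pure cell remains, i.e. the remaining keys form a 2-core.
        """
        plus, minus = set(), set()
        stack = [c for c in range(len(self.count)) if self._pure(c)]
        while stack:
            c = stack.pop()
            if not self._pure(c):  # cell may have changed since it was queued
                continue
            key, sign = self.key_sum[c], self.count[c]
            (plus if sign == 1 else minus).add(key)
            self._update(key, -sign)  # remove the recovered key from all of its cells
            stack.extend(d for d in self._cells(key) if self._pure(d))
        success = not any(self.count) and not any(self.key_sum) and not any(self.hash_sum)
        return plus, minus, success
```

Usage: insert the elements of one set, delete the elements of the other, then peel. Shared elements cancel, so only the symmetric difference is decoded. Peeling can never recover more keys than there are cells, because each cell can yield at most one key and stays empty afterwards. For example, 400 keys in 300 cells always fail. The 1.22 factor in the abstract is the stricter point at which random hypergraphs start to contain a 2-core with high probability.

```python
t = IBLT(3000)
for x in range(1000, 1150):
    t.insert(x)
for x in range(1100, 1200):
    t.delete(x)
plus, minus, ok = t.peel()  # plus: 1000..1099, minus: 1150..1199
```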
@InProceedings{gadurek_et_al:LIPIcs.SEA.2026.19,
author = {Ga\v{d}urek, Vojt\v{e}ch and Vesel\'{y}, Pavel},
title = {{Breaking 2-Cores for Invertible Bloom Lookup Tables by Structure Prediction}},
booktitle = {24th International Symposium on Experimental Algorithms (SEA 2026)},
pages = {19:1--19:24},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-422-2},
ISSN = {1868-8969},
year = {2026},
volume = {371},
editor = {Aum\"{u}ller, Martin and Finocchi, Irene},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2026.19},
URN = {urn:nbn:de:0030-drops-260237},
doi = {10.4230/LIPIcs.SEA.2026.19},
annote = {Keywords: Invertible Bloom Lookup Table, symmetric difference, k-mer sets}
}