Better Space-Time-Robustness Trade-Offs for Set Reconciliation

Authors: Djamal Belazzougui, Gregory Kucherov, Stefan Walzer




File

LIPIcs.ICALP.2024.20.pdf
  • Filesize: 0.88 MB
  • 19 pages

Author Details

Djamal Belazzougui
  • CAPA, DTISI, Centre de Recherche sur l'Information Scientifique et Technique, Algiers, Algeria
Gregory Kucherov
  • LIGM, CNRS, Université Gustave Eiffel, Marne-la-Vallée, France
Stefan Walzer
  • Karlsruhe Institute of Technology, Germany

Cite As

Djamal Belazzougui, Gregory Kucherov, and Stefan Walzer. Better Space-Time-Robustness Trade-Offs for Set Reconciliation. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICALP.2024.20

Abstract

We consider the problem of reconstructing the symmetric difference between similar sets from their representations (sketches) of size linear in the number of differences. Exact solutions to this problem are based on error-correcting coding techniques and suffer from long decoding times. Existing probabilistic solutions based on Invertible Bloom Lookup Tables (IBLTs) are time-efficient but offer insufficient success guarantees for many applications. Here we propose a tunable trade-off between the two approaches, combining the efficiency of IBLTs with an exponentially decreasing failure probability. The proof relies on a refined analysis of IBLTs proposed in (Bæk Tejs Houen et al., SOSA 2023), which is of independent interest. We also propose a modification of our algorithm that makes it possible to tell which set each element of the symmetric difference belongs to.
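
As a rough illustration of the IBLT-based approach mentioned in the abstract, the following Python sketch implements a classical count-based IBLT with peeling decoding. It is not the authors' construction and says nothing about their trade-off; the class name `IBLT`, the parameter `cells_per_part`, the choice of three hash functions, and the use of BLAKE2b as the hash are all assumptions made for this example.

```python
import hashlib

K = 3  # number of hash functions (an assumed, commonly used value)


def _h(key: int, salt: int) -> int:
    """Deterministic 64-bit hash of a 64-bit non-negative integer key."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class IBLT:
    """Hypothetical minimal IBLT: K sub-tables of cells_per_part cells each."""

    def __init__(self, cells_per_part: int):
        self.m = cells_per_part
        n = K * cells_per_part
        self.count = [0] * n    # signed number of keys hashed to the cell
        self.key_xor = [0] * n  # XOR of those keys
        self.chk_xor = [0] * n  # XOR of their checksums

    def _cells(self, key: int):
        # One cell per sub-table, so the K cells of a key are distinct.
        return [i * self.m + _h(key, i) % self.m for i in range(K)]

    def _update(self, key: int, delta: int):
        chk = _h(key, K)  # salt K is reserved for the checksum
        for c in self._cells(key):
            self.count[c] += delta
            self.key_xor[c] ^= key
            self.chk_xor[c] ^= chk

    def insert(self, key: int):
        self._update(key, +1)

    def subtract(self, other: "IBLT") -> "IBLT":
        """Cell-wise difference; keys present in both sketches cancel out."""
        out = IBLT(self.m)
        for c in range(K * self.m):
            out.count[c] = self.count[c] - other.count[c]
            out.key_xor[c] = self.key_xor[c] ^ other.key_xor[c]
            out.chk_xor[c] = self.chk_xor[c] ^ other.chk_xor[c]
        return out

    def decode(self):
        """Peel pure cells. Returns (only_in_self, only_in_other, success).

        Destructive: the sketch is emptied as keys are recovered.
        """
        only_a, only_b = set(), set()
        stack = list(range(K * self.m))
        while stack:
            c = stack.pop()
            sign = self.count[c]
            if sign not in (1, -1):
                continue
            key = self.key_xor[c]
            if self.chk_xor[c] != _h(key, K):
                continue  # cell holds several keys, not a single one
            (only_a if sign == 1 else only_b).add(key)
            cells = self._cells(key)
            self._update(key, -sign)  # remove the recovered key everywhere
            stack.extend(cells)       # its other cells may now be pure
        ok = not (any(self.count) or any(self.key_xor) or any(self.chk_xor))
        return only_a, only_b, ok


# Usage: each party sketches its own set; only the sketches are exchanged.
set_a = set(range(1000))
set_b = (set_a - {3, 500}) | {2000, 2001}
sketch_a, sketch_b = IBLT(64), IBLT(64)
for x in set_a:
    sketch_a.insert(x)
for x in set_b:
    sketch_b.insert(x)
only_a, only_b, ok = sketch_a.subtract(sketch_b).decode()
```

The sketch size depends only on the table size, not on the sizes of the two sets. Peeling can get stuck when no pure cell remains, in which case `ok` is false; this is the kind of failure event whose probability the abstract describes as insufficiently controlled in existing IBLT-based solutions.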

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Randomness, geometry and discrete structures
  • Theory of computation → Sketching and sampling
  • Theory of computation → Error-correcting codes
Keywords
  • data structures
  • hashing
  • set reconciliation
  • invertible Bloom lookup tables
  • random hypergraphs
  • BCH codes

References

  1. Jakob Bæk Tejs Houen, Rasmus Pagh, and Stefan Walzer. Simple set sketching. In Symposium on Simplicity in Algorithms (SOSA), pages 228-241. SIAM, 2023.
  2. Daniella Bar-Lev, Avi Mizrahi, Tuvi Etzion, Ori Rottenstreich, and Eitan Yaakobi. Coding for IBLTs with listing guarantees. In IEEE International Symposium on Information Theory, ISIT 2023, Taipei, Taiwan, June 25-30, 2023, pages 1657-1662. IEEE, 2023. URL: https://doi.org/10.1109/ISIT54713.2023.10206563.
  3. Mahdi Cheraghchi. Coding-theoretic methods for sparse recovery. In 49th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2011, Allerton Park & Retreat Center, Monticello, IL, USA, 28-30 September, 2011, pages 909-916. IEEE, 2011. URL: https://doi.org/10.1109/Allerton.2011.6120263.
  4. Mahdi Cheraghchi and João Ribeiro. Simple codes and sparse recovery with fast decoding. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 156-160. IEEE, 2019.
  5. Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam D. Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM J. Comput., 38(1):97-139, 2008. URL: https://doi.org/10.1137/060651380.
  6. David Eppstein and Michael T. Goodrich. Straggler identification in round-trip data streams via Newton’s identities and invertible Bloom filters. IEEE Transactions on Knowledge and Data Engineering, 23(2):297-306, 2011.
  7. David Eppstein, Michael T. Goodrich, Frank Uyeda, and George Varghese. What’s the difference? Efficient set reconciliation without prior context. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM '11, pages 218-229, New York, NY, USA, 2011. Association for Computing Machinery. URL: https://doi.org/10.1145/2018436.2018462.
  8. Nils Fleischhacker, Kasper Green Larsen, Maciej Obremski, and Mark Simkin. Invertible Bloom lookup tables with less memory and randomness. CoRR, abs/2306.07583, 2023. URL: https://doi.org/10.48550/arXiv.2306.07583.
  9. Nils Fleischhacker, Kasper Green Larsen, and Mark Simkin. Property-preserving hash functions for Hamming distance from standard assumptions. In Orr Dunkelman and Stefan Dziembowski, editors, Advances in Cryptology - EUROCRYPT 2022, pages 764-781, Cham, 2022. Springer International Publishing.
  10. Sumit Ganguly. Counting distinct items over update streams. Theoretical Computer Science, 378(3):211-222, 2007. Algorithms and Computation. URL: https://doi.org/10.1016/j.tcs.2007.02.031.
  11. Sumit Ganguly and Anirban Majumder. Deterministic k-set structure. Information Processing Letters, 109(1):27-31, 2008. URL: https://doi.org/10.1016/j.ipl.2008.08.010.
  12. Michael T. Goodrich and Michael Mitzenmacher. Invertible Bloom lookup tables. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 792-799. IEEE, 2011.
  13. David Harvey and Joris van der Hoeven. Polynomial multiplication over finite fields in time 𝒪(n log n). J. ACM, 69(2):12:1-12:40, 2022. URL: https://doi.org/10.1145/3505584.
  14. Mark G. Karpovsky, Lev B. Levitin, and Ari Trachtenberg. Data verification and reconciliation with generalized error-control codes. IEEE Transactions on Information Theory, 49(7):1788-1793, 2003.
  15. Jeong Han Kim. Poisson cloning model for random graphs. In Proc. ICM, Vol. III, pages 873-898, 2006. URL: https://www.mathunion.org/fileadmin/ICM/Proceedings/ICM2006.3/ICM2006.3.ocr.pdf.
  16. M. Leconte, M. Lelarge, and L. Massoulié. Convergence of multivariate belief propagation, with applications to cuckoo hashing and load balancing. In Proc. 24th SODA, pages 35-46, 2013. URL: http://dl.acm.org/citation.cfm?id=2627817.2627820.
  17. F.J. MacWilliams and N.J.A. Sloane. The Theory of Error-Correcting Codes. North-Holland Publishing Company, 2nd edition, 1978.
  18. Colin McDiarmid. On the method of bounded differences, pages 148-188. London Mathematical Society Lecture Note Series. Cambridge University Press, 1989. URL: https://doi.org/10.1017/CBO9781107359949.008.
  19. Minisketch: an optimized library for BCH-based set reconciliation. https://github.com/sipa/minisketch/. Accessed: 2023-03-28.
  20. Yaron Minsky, Ari Trachtenberg, and Richard Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory, 49(9):2213-2218, 2003. URL: https://doi.org/10.1109/TIT.2003.815784.
  21. Michael Mitzenmacher and Rasmus Pagh. Simple multi-party set reconciliation. Distributed Computing, 31:441-453, 2018.
  22. Michael Mitzenmacher and George Varghese. Biff (Bloom filter) codes: Fast error correction for large data sets. In 2012 IEEE International Symposium on Information Theory Proceedings, pages 483-487. IEEE, 2012.
  23. Avi Mizrahi, Daniella Bar-Lev, Eitan Yaakobi, and Ori Rottenstreich. Invertible Bloom lookup tables with listing guarantees. CoRR, abs/2212.13812, 2022. URL: https://doi.org/10.48550/arXiv.2212.13812.
  24. Michael Molloy. Cores in random hypergraphs and boolean formulas. Random Structures & Algorithms, 27(1):124-135, 2005.
  25. Thomas Morgan. An exploration of two-party reconciliation problems. PhD thesis, Harvard University, Cambridge, MA, May 2018. URL: https://dash.harvard.edu/handle/1/39947174.