Fast Succinct Retrieval and Approximate Membership Using Ribbon

Authors Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, Stefan Walzer



PDF
Thumbnail PDF

File

LIPIcs.SEA.2022.4.pdf
  • Filesize: 1.39 MB
  • 20 pages

Document Identifiers

Author Details

Peter C. Dillinger
  • Meta, Seattle, WA, USA
Lorenz Hübschle-Schneider
  • Karlsruhe Institute of Technology, Germany
Peter Sanders
  • Karlsruhe Institute of Technology, Germany
Stefan Walzer
  • Universität Köln, Germany

Cite AsGet BibTex

Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer. Fast Succinct Retrieval and Approximate Membership Using Ribbon. In 20th International Symposium on Experimental Algorithms (SEA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 233, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.SEA.2022.4

Abstract

A retrieval data structure for a static function f: S → {0,1}^r supports queries that return f(x) for any x ∈ S. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate 2^{-r}. The information-theoretic lower bound for both tasks is r|S| bits. While succinct theoretical constructions using (1+o(1))r|S| bits were known, these could not achieve very small overheads in practice because they have an unfavorable space-time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below 1% while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead ≥ 44%). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
  • Information systems → Point lookups
Keywords
  • AMQ
  • Bloom filter
  • dictionary
  • linear algebra
  • randomized algorithm
  • retrieval data structure
  • static function data structure
  • succinct data structure
  • perfect hashing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Martin Aumüller, Martin Dietzfelbinger, and Michael Rink. Experimental variations of a theoretically good retrieval data structure. In Proc. 17th ESA, pages 742-751, 2009. URL: https://doi.org/10.1007/978-3-642-04128-0_66.
  2. Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. Engineering in-place (shared-memory) sorting algorithms. CoRR, 2020. URL: http://arxiv.org/abs/2009.13569.
  3. Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna. Cache-oblivious peeling of random hypergraphs. In Proc. DCC, pages 352-361, 2014. URL: https://doi.org/10.1109/DCC.2014.48.
  4. Djamal Belazzougui and Rossano Venturini. Compressed static functions with applications. In Sanjeev Khanna, editor, Proc. 24th SODA, pages 229-240. SIAM, 2013. URL: https://doi.org/10.1137/1.9781611973105.17.
  5. Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok. Don't thrash: How to cache your hash on flash. Proc. VLDB Endow., 5(11):1627-1637, 2012. URL: https://doi.org/10.14778/2350229.2350275.
  6. Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422-426, 1970. URL: https://doi.org/10.1145/362686.362692.
  7. Fabiano Cupertino Botelho. Near-Optimal Space Perfect Hashing Algorithms. PhD thesis, Federal University of Minas Gerais, 2008. URL: http://cmph.sourceforge.net/papers/thesis.pdf.
  8. Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Simple and space-efficient minimal perfect hash functions. In Proc. 10th WADS, pages 139-150, 2007. URL: https://doi.org/10.1007/978-3-540-73951-7_13.
  9. Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Practical perfect hashing in nearly optimal space. Inf. Syst., 38(1):108-131, 2013. URL: https://doi.org/10.1016/j.is.2012.06.002.
  10. Alex D. Breslow and Nuwan Jayasena. Morton filters: fast, compressed sparse cuckoo filters. VLDB J., 29(2-3):731-754, 2020. URL: https://doi.org/10.1007/s00778-019-00561-0.
  11. Andrei Z. Broder and Anna R. Karlin. Multilevel adaptive hashing. In David S. Johnson, editor, Proc. 1st SODA, pages 43-53. SIAM, 1990. URL: http://dl.acm.org/citation.cfm?id=320176.320181.
  12. Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The Bloomier filter: An efficient data structure for static support lookup tables. In Proc. 15th SODA, pages 30-39. SIAM, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982797.
  13. Colin Cooper. On the rank of random matrices. Random Structures & Algorithms, 16(2):209-232, 2000. URL: https://doi.org/10.1002/(SICI)1098-2418(200003)16:2<209::AID-RSA6>3.0.CO;2-1.
  14. Niv Dayan, Manos Athanassoulis, and Stratos Idreos. Monkey: Optimal navigable key-value store. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 79-94, New York, NY, USA, 2017. Association for Computing Machinery. URL: https://doi.org/10.1145/3035918.3064054.
  15. Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight thresholds for cuckoo hashing via XORSAT. In Proc. 37th ICALP (1), pages 213-225, 2010. URL: https://doi.org/10.1007/978-3-642-14165-2_19.
  16. Martin Dietzfelbinger and Rasmus Pagh. Succinct data structures for retrieval and approximate membership (extended abstract). In Proc. 35th ICALP (1), pages 385-396, 2008. URL: https://doi.org/10.1007/978-3-540-70575-8_32.
  17. Martin Dietzfelbinger and Michael Rink. Applications of a splitting trick. In Proc. 36th ICALP (1), pages 354-365, 2009. URL: https://doi.org/10.1007/978-3-642-02927-1_30.
  18. Martin Dietzfelbinger and Stefan Walzer. Constant-time retrieval with O(log m) extra bits. In Proc. 36th STACS, pages 24:1-24:16, 2019. URL: https://doi.org/10.4230/LIPIcs.STACS.2019.24.
  19. Martin Dietzfelbinger and Stefan Walzer. Dense peelable random uniform hypergraphs. In Proc. 27th ESA, pages 38:1-38:16, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.38.
  20. Martin Dietzfelbinger and Stefan Walzer. Efficient Gauss elimination for near-quadratic matrices with one short random block per row, with applications. In Proc. 27th ESA, pages 39:1-39:18, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.39.
  21. Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer. Fast succinct retrieval and approximate membership using ribbon. CoRR, abs/2106.12270, 2021. URL: http://arxiv.org/abs/2109.01892.
  22. Peter C. Dillinger and Stefan Walzer. Ribbon filter: practically smaller than Bloom and Xor. CoRR, 2021. URL: http://arxiv.org/abs/2103.02515.
  23. Olivier Dubois and Jacques Mandler. The 3-XORSAT threshold. In Proc. 43rd FOCS, pages 769-778, 2002. URL: https://doi.org/10.1109/SFCS.2002.1182002.
  24. Bin Fan, David G. Andersen, and Michael Kaminsky. Cuckoo filter: Better than Bloom. ;login:, 38(4), 2013. URL: https://www.usenix.org/publications/login/august-2013-volume-38-number-4/cuckoo-filter-better-bloom.
  25. Dimitris Fotakis, Rasmus Pagh, Peter Sanders, and Paul G. Spirakis. Space efficient hash tables with worst case constant access time. Theory Comput. Syst., 38(2):229-248, 2005. URL: https://doi.org/10.1007/s00224-004-1195-x.
  26. Nikolaos Fountoulakis and Konstantinos Panagiotou. Sharp load thresholds for cuckoo hashing. Random Struct. Algorithms, 41(3):306-333, 2012. URL: https://doi.org/10.1002/rsa.20426.
  27. Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. Fast scalable construction of (minimal perfect hash) functions. In Proc. 15th SEA, pages 339-352, 2016. URL: https://doi.org/10.1007/978-3-319-38851-9_23.
  28. Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. Fast scalable construction of ([compressed] static | minimal perfect hash) functions. Information and Computation, 2020. URL: https://doi.org/10.1016/j.ic.2020.104517.
  29. Thomas Mueller Graf and Daniel Lemire. fastfilter_cpp, 2019. URL: https://github.com/FastFilter/fastfilter_cpp.
  30. Thomas Mueller Graf and Daniel Lemire. Xor filters: Faster and smaller than Bloom and cuckoo filters. ACM J. Exp. Algorithmics, 25:1-16, 2020. URL: https://doi.org/10.1145/3376122.
  31. Jóhannes B. Hreinsson, Morten Krøyer, and Rasmus Pagh. Storing a compressed function with constant time access. In Proc. 17th ESA, pages 730-741, 2009. URL: https://doi.org/10.1007/978-3-642-04128-0_65.
  32. Svante Janson and Malwina J. Luczak. A simple solution to the k-core problem. Random Struct. Algorithms, 30(1-2):50-62, 2007. URL: https://doi.org/10.1002/rsa.20147.
  33. Marc Lelarge. A new approach to the orientation of random hypergraphs. In Proc. 23rd SODA, pages 251-264. SIAM, 2012. URL: https://doi.org/10.1137/1.9781611973099.23.
  34. Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, and Daniel A. Spielman. Efficient erasure correcting codes. IEEE Trans. Inf. Theory, 47(2):569-584, 2001. URL: https://doi.org/10.1109/18.910575.
  35. Tobias Maier, Peter Sanders, and Robert Williger. Concurrent expandable AMQs on the basis of quotient filters. In Proc. 18th SEA, pages 15:1-15:13, 2020. URL: https://doi.org/10.4230/LIPIcs.SEA.2020.15.
  36. Michael Molloy. Cores in random hypergraphs and Boolean formulas. Random Struct. Algorithms, 27(1):124-135, 2005. URL: https://doi.org/10.1002/rsa.20061.
  37. Thomas Mueller Graf and Daniel Lemire. Binary fuse filters: Fast and smaller than xor filters. ACM Journal of Experimental Algorithmics, 27, March 2022. URL: https://doi.org/10.1145/3510449.
  38. Ingo Müller, Cornelius Ratsch, and Franz Färber. Adaptive string dictionary compression in in-memory column-store database systems. In Proc. 17th EDBT, pages 283-294, 2014. URL: https://doi.org/10.5441/002/edbt.2014.27.
  39. Ingo Müller, Peter Sanders, Robert Schulze, and Wei Zhou. Retrieval and perfect hashing using fingerprinting. In Proc. 14th SEA, pages 138-149, 2014. URL: https://doi.org/10.1007/978-3-319-07959-2_12.
  40. Ely Porat. An optimal Bloom filter replacement based on matrix solving. In Proc. 4th CSR, pages 263-273, 2009. URL: https://doi.org/10.1007/978-3-642-03351-3_25.
  41. Felix Putze, Peter Sanders, and Johannes Singler. Cache-, hash-, and space-efficient Bloom filters. ACM Journal of Experimental Algorithmics, 14, 2009. URL: https://doi.org/10.1145/1498698.1594230.
  42. Kapil Vaidya, Eric Knorr, Tim Kraska, and Michael Mitzenmacher. Partitioned learned bloom filter. CoRR, abs/2006.03176, 2020. URL: http://arxiv.org/abs/2006.03176.
  43. Stefan Walzer. Random Hypergraphs for Hashing-Based Data Structures. PhD thesis, Technische Universität Ilmenau, 2020. URL: https://www.db-thueringen.de/receive/dbt_mods_00047127.
  44. Stefan Walzer. Peeling close to the orientability threshold: Spatial coupling in hashing-based data structures. In Proc. 32nd SODA, pages 2194-2211. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.131.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail