Constant-Time Retrieval with O(log m) Extra Bits

Authors Martin Dietzfelbinger , Stefan Walzer

Thumbnail PDF


  • Filesize: 0.67 MB
  • 16 pages

Document Identifiers

Author Details

Martin Dietzfelbinger
  • Technische Universität Ilmenau, Germany
Stefan Walzer
  • Technische Universität Ilmenau, Germany

Cite AsGet BibTex

Martin Dietzfelbinger and Stefan Walzer. Constant-Time Retrieval with O(log m) Extra Bits. In 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 126, pp. 24:1-24:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


For a set U (the universe), retrieval is the following problem. Given a finite subset S subseteq U of size m and f : S -> {0,1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f,x) = f(x) for all x in S. For x in U setminus S the value query(D_f,x) is arbitrary in {0,1}^r. The number of bits needed for D_f should be (1+epsilon)r m with overhead epsilon = epsilon(m) >= 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well. We assume fully random hash functions on U with constant evaluation time are available. It is known that with epsilon ~= 0.09 one can achieve linear construction time and constant query time, and with overhead epsilon_k ~= e^{-k} it is possible to have O(k) query time and O(m^{1+alpha}) construction time, for arbitrary alpha>0. Furthermore, a theoretical construction with epsilon =O((log log m)/sqrt{log m}) gives constant query time and linear construction time. Known constructions avoiding all overhead, except for a seed value of size O(log log m), require logarithmic query time. In this paper, we present a method for treating the retrieval problem with overhead epsilon = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation. The price to pay is a construction time of O(m^2). We employ the usual framework for retrieval data structures, where construction is effected by solving a sparse linear system of equations over the 2-element field F_2 and a query is effected by a dot product calculation. Our main technical contribution is the design and analysis of a new and natural family of sparse random linear systems with m equations and (1+epsilon)m variables, which combines good locality properties with high probability of having full rank. Paying a larger overhead of epsilon = O((log m)/m^alpha), the construction time can be reduced to O(m^{1+alpha}) for arbitrary constant 0 < alpha < 1. In combination with an adaptation of known techniques for solving sparse linear systems of equations, our approach leads to a highly practical algorithm for retrieval. In a particular benchmark with m = 10^7 we achieve an order-of-magnitude improvement over previous techniques with epsilon = 0.24% instead of the previously best result of epsilon ~= 3%, with better query time and no significant sacrifices in construction time.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures design and analysis
  • Retrieval
  • Hashing
  • Succinct Data Structure
  • Randomised Data Structure
  • Structured Gaussian Elimination
  • Method of Four Russians


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Austin Appleby. MurmurHash3, 2012. URL:
  2. Martin Aumüller, Martin Dietzfelbinger, and Michael Rink. Experimental Variations of a Theoretically Good Retrieval Data Structure. In Proc. 17th ESA, pages 742-751, 2009. URL:
  3. Gregory V. Bard. Algebraic Cryptanalysis, chapter The Method of Four Russians, pages 133-158. Springer US, Boston, MA, 2009. URL:
  4. Djamal Belazzougui, Fabiano Cupertino Botelho, and Martin Dietzfelbinger. Hash, Displace, and Compress. In Proc. 17th ESA, pages 682-693, 2009. URL:
  5. Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive crawling for the masses. In Proc. 23rd WWW'14, pages 227-228, 2014. URL:
  6. Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Simple and Space-Efficient Minimal Perfect Hash Functions. In Proc. 10th WADS, pages 139-150, 2007. URL:
  7. Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Practical Perfect Hashing in Nearly Optimal Space. Inf. Syst., pages 108-131, 2013. URL:
  8. Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables. In Proc. 15th SODA, pages 30-39, 2004. URL:
  9. Colin Cooper. On the rank of random matrices. Random Structures &Algorithms, 16(2):209-232, 2000. URL:<209::AID-RSA6>3.0.CO;2-1.
  10. Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight Thresholds for Cuckoo Hashing via XORSAT. In Proc. 37th ICALP (1), pages 213-225, 2010. URL:
  11. Martin Dietzfelbinger and Rasmus Pagh. Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract). In Proc. 35th ICALP (1), pages 385-396, 2008. URL:
  12. Martin Dietzfelbinger and Christoph Weidling. Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci., 380(1-2):47-68, 2007. URL:
  13. Nikolaos Fountoulakis and Konstantinos Panagiotou. Sharp Load Thresholds for Cuckoo Hashing. Random Struct. Algorithms, 41(3):306-333, 2012. URL:
  14. Alan M. Frieze and Páll Melsted. Maximum Matchings in Random Bipartite Graphs and the Space Utilization of Cuckoo Hash Tables. Random Struct. Algorithms, 41(3):334-364, 2012. URL:
  15. Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. Fast Scalable Construction of (Minimal Perfect Hash) Functions. In Proc. 15th SEA, pages 339-352, 2016. URL:
  16. Torben Hagerup and Torsten Tholey. Efficient minimal perfect hashing in nearly minimal space. In Proc. 18st STACS, pages 317-326, 2001. URL:
  17. Brian A. LaMacchia and Andrew M. Odlyzko. Solving large sparse linear systems over finite fields. In CRYPTO '90, 10th Annual International Cryptology Conference, pages 109-133, 1990. URL:
  18. Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech. A Family of Perfect Hashing Methods. Comput. J., pages 547-554, 1996. URL:
  19. Michael Molloy. The pure literal rule threshold and cores in random hypergraphs. In Proc. 15th SODA, pages 672-681, 2004. URL:
  20. Boris Pittel and Gregory B. Sorkin. The Satisfiability Threshold for k-XORSAT. Combinatorics, Probability & Computing, 25(2):236-268, 2016. URL:
  21. Ely Porat. An Optimal Bloom Filter Replacement Based on Matrix Solving. In Proc. 4th CSR, pages 263-273, 2009. URL:
  22. Michael Rink. Mixed Hypergraphs for Linear-Time Construction of Denser Hashing-Based Data Structures. In Proc. 39th SOFSEM, pages 356-368, 2013. URL:
  23. Steven S. Seiden and Daniel S. Hirschberg. Finding succinct ordered minimal perfect hash functions. Inf. Process. Lett., pages 283-288, 1994. URL:
  24. Douglas H. Wiedemann. Solving Sparse Linear Equations Over Finite Fields. IEEE Transactions on Information Theory, pages 54-62, 1986. URL:
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail