Constant-Time Retrieval with O(log m) Extra Bits

Dietzfelbinger, Martin; Walzer, Stefan

doi:10.4230/LIPIcs.STACS.2019.24

File

Subject Classification

ACM Subject Classification

Theory of computation → Data structures design and analysis

Keywords

Retrieval
Hashing
Succinct Data Structure
Randomised Data Structure
Structured Gaussian Elimination
Method of Four Russians

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

For a set U (the universe), retrieval is the following problem. Given a finite subset S subseteq U of size m and f : S -> {0,1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f,x) = f(x) for all x in S. For x in U setminus S the value query(D_f,x) is arbitrary in {0,1}^r. The number of bits needed for D_f should be (1+epsilon)r m with overhead epsilon = epsilon(m) >= 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well. We assume fully random hash functions on U with constant evaluation time are available. It is known that with epsilon ~= 0.09 one can achieve linear construction time and constant query time, and with overhead epsilon_k ~= e^{-k} it is possible to have O(k) query time and O(m^{1+alpha}) construction time, for arbitrary alpha>0. Furthermore, a theoretical construction with epsilon =O((log log m)/sqrt{log m}) gives constant query time and linear construction time. Known constructions avoiding all overhead, except for a seed value of size O(log log m), require logarithmic query time. In this paper, we present a method for treating the retrieval problem with overhead epsilon = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation. The price to pay is a construction time of O(m^2). We employ the usual framework for retrieval data structures, where construction is effected by solving a sparse linear system of equations over the 2-element field F_2 and a query is effected by a dot product calculation. Our main technical contribution is the design and analysis of a new and natural family of sparse random linear systems with m equations and (1+epsilon)m variables, which combines good locality properties with high probability of having full rank. Paying a larger overhead of epsilon = O((log m)/m^alpha), the construction time can be reduced to O(m^{1+alpha}) for arbitrary constant 0 < alpha < 1. In combination with an adaptation of known techniques for solving sparse linear systems of equations, our approach leads to a highly practical algorithm for retrieval. In a particular benchmark with m = 10^7 we achieve an order-of-magnitude improvement over previous techniques with epsilon = 0.24% instead of the previously best result of epsilon ~= 3%, with better query time and no significant sacrifices in construction time.

Cite As Get BibTex

Martin Dietzfelbinger and Stefan Walzer. Constant-Time Retrieval with O(log m) Extra Bits. In 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 126, pp. 24:1-24:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.STACS.2019.24

Author Details

Martin Dietzfelbinger

Technische Universität Ilmenau, Germany

Stefan Walzer

Technische Universität Ilmenau, Germany

References

Austin Appleby. MurmurHash3, 2012. URL: https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp.
Martin Aumüller, Martin Dietzfelbinger, and Michael Rink. Experimental Variations of a Theoretically Good Retrieval Data Structure. In Proc. 17th ESA, pages 742-751, 2009. URL: http://dx.doi.org/10.1007/978-3-642-04128-0_66.
Gregory V. Bard. Algebraic Cryptanalysis, chapter The Method of Four Russians, pages 133-158. Springer US, Boston, MA, 2009. URL: http://dx.doi.org/10.1007/978-0-387-88757-9_9.
Djamal Belazzougui, Fabiano Cupertino Botelho, and Martin Dietzfelbinger. Hash, Displace, and Compress. In Proc. 17th ESA, pages 682-693, 2009. URL: http://dx.doi.org/10.1007/978-3-642-04128-0_61.
Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive crawling for the masses. In Proc. 23rd WWW'14, pages 227-228, 2014. URL: http://dx.doi.org/10.1145/2567948.2577304.
Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Simple and Space-Efficient Minimal Perfect Hash Functions. In Proc. 10th WADS, pages 139-150, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73951-7_13.
Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Practical Perfect Hashing in Nearly Optimal Space. Inf. Syst., pages 108-131, 2013. URL: http://dx.doi.org/10.1016/j.is.2012.06.002.
Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables. In Proc. 15th SODA, pages 30-39, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982797.
Colin Cooper. On the rank of random matrices. Random Structures &Algorithms, 16(2):209-232, 2000. URL: http://dx.doi.org/10.1002/(SICI)1098-2418(200003)16:2<209::AID-RSA6>3.0.CO;2-1.
Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight Thresholds for Cuckoo Hashing via XORSAT. In Proc. 37th ICALP (1), pages 213-225, 2010. URL: http://dx.doi.org/10.1007/978-3-642-14165-2_19.
Martin Dietzfelbinger and Rasmus Pagh. Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract). In Proc. 35th ICALP (1), pages 385-396, 2008. URL: http://dx.doi.org/10.1007/978-3-540-70575-8_32.
Martin Dietzfelbinger and Christoph Weidling. Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci., 380(1-2):47-68, 2007. URL: http://dx.doi.org/10.1016/j.tcs.2007.02.054.
Nikolaos Fountoulakis and Konstantinos Panagiotou. Sharp Load Thresholds for Cuckoo Hashing. Random Struct. Algorithms, 41(3):306-333, 2012. URL: http://dx.doi.org/10.1002/rsa.20426.
Alan M. Frieze and Páll Melsted. Maximum Matchings in Random Bipartite Graphs and the Space Utilization of Cuckoo Hash Tables. Random Struct. Algorithms, 41(3):334-364, 2012. URL: http://dx.doi.org/10.1002/rsa.20427.
Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. Fast Scalable Construction of (Minimal Perfect Hash) Functions. In Proc. 15th SEA, pages 339-352, 2016. URL: http://dx.doi.org/10.1007/978-3-319-38851-9_23.
Torben Hagerup and Torsten Tholey. Efficient minimal perfect hashing in nearly minimal space. In Proc. 18st STACS, pages 317-326, 2001. URL: http://dx.doi.org/10.1007/3-540-44693-1_28.
Brian A. LaMacchia and Andrew M. Odlyzko. Solving large sparse linear systems over finite fields. In CRYPTO '90, 10th Annual International Cryptology Conference, pages 109-133, 1990. URL: http://dx.doi.org/10.1007/3-540-38424-3_8.
Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech. A Family of Perfect Hashing Methods. Comput. J., pages 547-554, 1996. URL: http://dx.doi.org/10.1093/comjnl/39.6.547.
Michael Molloy. The pure literal rule threshold and cores in random hypergraphs. In Proc. 15th SODA, pages 672-681, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982896.
Boris Pittel and Gregory B. Sorkin. The Satisfiability Threshold for k-XORSAT. Combinatorics, Probability & Computing, 25(2):236-268, 2016. URL: http://dx.doi.org/10.1017/S0963548315000097.
Ely Porat. An Optimal Bloom Filter Replacement Based on Matrix Solving. In Proc. 4th CSR, pages 263-273, 2009. URL: http://dx.doi.org/10.1007/978-3-642-03351-3_25.
Michael Rink. Mixed Hypergraphs for Linear-Time Construction of Denser Hashing-Based Data Structures. In Proc. 39th SOFSEM, pages 356-368, 2013. URL: http://dx.doi.org/10.1007/978-3-642-35843-2_31.
Steven S. Seiden and Daniel S. Hirschberg. Finding succinct ordered minimal perfect hash functions. Inf. Process. Lett., pages 283-288, 1994. URL: http://dx.doi.org/10.1016/0020-0190(94)00108-1.
Douglas H. Wiedemann. Solving Sparse Linear Equations Over Finite Fields. IEEE Transactions on Information Theory, pages 54-62, 1986. URL: http://dx.doi.org/10.1109/TIT.1986.1057137.

Constant-Time Retrieval with O(log m) Extra Bits

Authors Martin Dietzfelbinger , Stefan Walzer

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message