Dense Peelable Random Uniform Hypergraphs

Dietzfelbinger, Martin; Walzer, Stefan

doi:10.4230/LIPIcs.ESA.2019.38

Abstract

We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. to admit no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled “from the outside in”. The density thresholds f_k for peelability of our hypergraphs (f_3 ≈ 0.918, f_4 ≈ 0.977, f_5 ≈ 0.992, …) are well beyond the corresponding thresholds (c_3 ≈ 0.818, c_4 ≈ 0.772, c_5 ≈ 0.702, …) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^ℤ and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods. Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph or peelability allows for simple linear time algorithms. Memory efficiency is closely tied to edge density while worst and average case query times are tied to maximum and average edge size. To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. Using k > 3 attains, at small sacrifices in running time, further improvements to memory usage.

David Aldous and J. Michael Steele. The objective method: Probabilistic combinatorial optimization and local weak convergence. In Probability on Discrete Structures. Encyclopaedia of Mathematical Sciences (Probability Theory), volume 110, pages 1-72. Springer, Berlin, Heidelberg, 2004. URL: https://doi.org/10.1007/978-3-662-09444-0_1.
Austin Appleby. MurmurHash3, 2012. URL: https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp.
Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna. Cache-oblivious peeling of random hypergraphs. In Data Compression Conference, pages 352-361, 2014. URL: https://doi.org/10.1109/DCC.2014.48.
Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 1970. URL: https://doi.org/10.1145/362686.362692.
Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive crawling for the masses. In Proc. 23rd WWW'14, pages 227-228, 2014. URL: https://doi.org/10.1145/2567948.2577304.
Fabiano Cupertino Botelho. Near-Optimal Space Perfect Hashing Algorithms. PhD thesis, Federal University of Minas Gerais, 2008. URL: http://cmph.sourceforge.net/papers/thesis.pdf.
Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Simple and space-efficient minimal perfect hash functions. In Proc. 10th WADS, pages 139-150, 2007. URL: https://doi.org/10.1007/978-3-540-73951-7_13.
Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Practical perfect hashing in nearly optimal space. Inf. Syst., pages 108-131, 2013. URL: https://doi.org/10.1016/j.is.2012.06.002.
Julie Anne Cain, Peter Sanders, and Nicholas C. Wormald. The random graph threshold for k-orientiability and a fast algorithm for optimal multiple-choice allocation. In Proc. 18th SODA, pages 469-476, 2007. URL: http://dl.acm.org/citation.cfm?id=1283383.1283433.
Denis Xavier Charles and Kumar Chellapilla. Bloomier filters: A second look. In Proc. 16th ESA, 2008. URL: https://doi.org/10.1007/978-3-540-87744-8_22.
Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight thresholds for cuckoo hashing via XORSAT. In Proc. 37th ICALP (1), pages 213-225, 2010. URL: https://doi.org/10.1007/978-3-642-14165-2_19.
Martin Dietzfelbinger and Rasmus Pagh. Succinct data structures for retrieval and approximate membership (extended abstract). In Proc. 35th ICALP (1), pages 385-396, 2008. URL: https://doi.org/10.1007/978-3-540-70575-8_32.
Martin Dietzfelbinger and Stefan Walzer. Constant-time retrieval with O(log m) extra bits. In Proc. 36th STACS, pages 24:1-24:16, 2019. URL: https://doi.org/10.4230/LIPIcs.STACS.2019.24.
Martin Dietzfelbinger and Christoph Weidling. Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci., 380(1-2):47-68, 2007. URL: https://doi.org/10.1016/j.tcs.2007.02.054.
Olivier Dubois and Jacques Mandler. The 3-XORSAT threshold. In Proc. 43rd FOCS, pages 769-778, 2002. URL: https://doi.org/10.1109/SFCS.2002.1182002.
David Eppstein and Michael T. Goodrich. Straggler identification in round-trip data streams via Newton’s identities and invertible Bloom filters. IEEE Trans. on Knowl. and Data Eng., 23(2):297-306, 2011. URL: https://doi.org/10.1109/TKDE.2010.132.
Daniel Fernholz and Vijaya Ramachandran. The k-orientability thresholds for g_n,p. In Proc. 18th SODA, pages 459-468, 2007. URL: http://dl.acm.org/citation.cfm?id=1283383.1283432.
Nikolaos Fountoulakis, Megha Khosla, and Konstantinos Panagiotou. The multiple-orientability thresholds for random hypergraphs. Combinatorics, Probability & Computing, 25(6):870-908, 2016. URL: https://doi.org/10.1017/S0963548315000334.
Nikolaos Fountoulakis and Konstantinos Panagiotou. Sharp load thresholds for cuckoo hashing. Random Struct. Algorithms, 41(3):306-333, 2012. URL: https://doi.org/10.1002/rsa.20426.
Alan M. Frieze and Páll Melsted. Maximum matchings in random bipartite graphs and the space utilization of cuckoo hash tables. Random Struct. Algorithms, 41(3):334-364, 2012. URL: https://doi.org/10.1002/rsa.20427.
Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. Fast scalable construction of (minimal perfect hash) functions. In Proc. 15th SEA, pages 339-352, 2016. URL: https://doi.org/10.1007/978-3-319-38851-9_23.
Michael T. Goodrich and Michael Mitzenmacher. Invertible Bloom lookup tables. In Proc. 49th Annual Allerton Conference on Communication, Control, and Computing, pages 792-799, 2011. URL: https://doi.org/10.1109/Allerton.2011.6120248.
Svante Janson and Malwina J. Luczak. A simple solution to the k-core problem. Random Struct. Algorithms, 30(1-2):50-62, 2007. URL: https://doi.org/10.1002/rsa.20147.
Jeong Han Kim. Poisson cloning model for random graphs. In Proc. ICM Madrid 2006 Vol. III, pages 873-898, 2006. URL: https://www.mathunion.org/fileadmin/ICM/Proceedings/ICM2006.3/ICM2006.3.ocr.pdf.
Mathieu Leconte. Double hashing thresholds via local weak convergence. In 51st Annual Allerton Conference on Communication, Control, and Computing, pages 131-137, 2013. URL: https://doi.org/10.1109/Allerton.2013.6736515.
Eric Lehman and Rina Panigrahy. 3.5-way cuckoo hashing for the price of 2-and-a-bit. In Proc. 17th ESA, pages 671-681, 2009. URL: https://doi.org/10.1007/978-3-642-04128-0_60.
Marc Lelarge. A new approach to the orientation of random hypergraphs. In Proc. 23rd SODA, pages 251-264, 2012. URL: https://doi.org/10.1137/1.9781611973099.23.
Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, and Daniel A. Spielman. Efficient erasure correcting codes. IEEE Transactions on Information Theory, 47(2):569-584, 2001. URL: https://doi.org/10.1109/18.910575.
Tomasz Luczak. Size and connectivity of the k-core of a random graph. Discrete Mathematics, 91(1):61-68, 1991. URL: https://doi.org/10.1016/0012-365X(91)90162-U.
Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech. A family of perfect hashing methods. Comput. J., pages 547-554, 1996. URL: https://doi.org/10.1093/comjnl/39.6.547.
Michael Mitzenmacher. Some open questions related to cuckoo hashing. In Proc. 17th ESA, 2009. URL: https://doi.org/10.1007/978-3-642-04128-0_1.
Michael Mitzenmacher, Konstantinos Panagiotou, and Stefan Walzer. Load thresholds for cuckoo hashing with double hashing. In 16th SWAT, pages 29:1-29:9, 2018. URL: https://doi.org/10.4230/LIPIcs.SWAT.2018.29.
Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, New York, NY, USA, 2nd edition, 2017.
Michael Mitzenmacher and George Varghese. Biff (Bloom filter) codes: Fast error correction for large data sets. In Proc. ISIT 2012, pages 483-487, 2012. URL: https://doi.org/10.1109/ISIT.2012.6284236.
Michael Molloy. Cores in random hypergraphs and boolean formulas. Random Struct. Algorithms, 27(1):124-135, 2005. URL: https://doi.org/10.1002/rsa.20061.
Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. J. Algorithms, 51(2):122-144, 2004. URL: https://doi.org/10.1016/j.jalgor.2003.12.002.
Boris Pittel and Gregory B. Sorkin. The satisfiability threshold for k-XORSAT. Combinatorics, Probability & Computing, 25(2):236-268, 2016. URL: https://doi.org/10.1017/S0963548315000097.
Ely Porat. An optimal Bloom filter replacement based on matrix solving. In Proc. 4th CSR, pages 263-273, 2009. URL: https://doi.org/10.1007/978-3-642-03351-3_25.
Michael Rink. Mixed hypergraphs for linear-time construction of denser hashing-based data structures. In Proc. 39th SOFSEM, pages 356-368, 2013. URL: https://doi.org/10.1007/978-3-642-35843-2_31.
Stefan Walzer. Load thresholds for cuckoo hashing with overlapping blocks. In Proc. 45th ICALP, pages 102:1-102:10, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.102.

Dense Peelable Random Uniform Hypergraphs

Authors Martin Dietzfelbinger , Stefan Walzer

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References