Fast and Scalable Minimal Perfect Hashing for Massive Key Sets

Authors: Antoine Limasset, Guillaume Rizk, Rayan Chikhi, Pierre Peterlongo

Cite As

Antoine Limasset, Guillaume Rizk, Rayan Chikhi, and Pierre Peterlongo. Fast and Scalable Minimal Perfect Hashing for Massive Key Sets. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 25:1-25:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.SEA.2017.25

Abstract

Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limits on the number of input elements they can process, due to high construction time or high RAM or external-memory usage. We revisit a simple algorithm and show that it is highly competitive with the state of the art, especially in terms of construction time and memory usage. We provide a parallel C++ implementation called BBhash. It can build a minimal perfect hash function for 10^10 elements in less than 7 minutes using 8 threads and 5 GB of memory, and the resulting function uses 3.7 bits/element. To the best of our knowledge, this is also the first implementation that has been successfully tested on an input of cardinality 10^12.
Source code: https://github.com/rizkg/BBHash
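The "simple algorithm" revisited here is a cascade of bit arrays. At each level, every remaining key is hashed into a bit array of size gamma × (number of remaining keys). Positions hit by exactly one key get their bit set, and keys that collided move on to the next level with a fresh hash function. A key's final value is the number of set bits before its position across all levels (a rank query). Keys that still collide after the last level go to an ordinary hash table. The following is a minimal, sequential Python sketch of that idea under stated assumptions: BLAKE2b salted with the level number stands in for the per-level hash functions, plain lists with precomputed prefix counts stand in for the compact bit arrays and rank index, and the names `CascadeMPHF`, `lookup`, `bits_per_key` and `max_levels` are hypothetical. It is not BBhash's C++ API and omits its parallel construction.

```python
import hashlib
from typing import Dict, Iterable, List


def level_hash(key: str, level: int, size: int) -> int:
    """Map key into [0, size); salting with the level gives each level its own function (assumption)."""
    digest = hashlib.blake2b(f"{level}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


class CascadeMPHF:
    """Hypothetical sequential sketch of a cascading-bit-array MPHF over distinct string keys."""

    def __init__(self, keys: Iterable[str], gamma: float = 1.0, max_levels: int = 64):
        self.bits: List[List[int]] = []    # one bit array per level
        self.prefix: List[List[int]] = []  # prefix[l][i] = set bits before position i at level l (naive rank)
        self.offsets: List[int] = []       # set bits in all earlier levels
        remaining = list(keys)
        placed = 0
        for level in range(max_levels):
            if not remaining:
                break
            size = max(1, int(gamma * len(remaining)))
            counts = [0] * size
            for key in remaining:
                counts[level_hash(key, level, size)] += 1
            # A bit is set only where exactly one key landed (no collision).
            bits = [1 if c == 1 else 0 for c in counts]
            prefix, running = [], 0
            for b in bits:
                prefix.append(running)
                running += b
            self.bits.append(bits)
            self.prefix.append(prefix)
            self.offsets.append(placed)
            placed += running
            # Colliding keys are retried at the next level.
            remaining = [k for k in remaining if counts[level_hash(k, level, size)] != 1]
        # Keys still colliding after max_levels are stored in a plain dictionary.
        self.fallback: Dict[str, int] = {k: placed + i for i, k in enumerate(remaining)}

    def lookup(self, key: str) -> int:
        """Return a distinct value in [0, n) for each key of the original set.

        For keys outside the set the result is arbitrary (or KeyError from the fallback)."""
        for level, bits in enumerate(self.bits):
            pos = level_hash(key, level, len(bits))
            if bits[pos]:
                return self.offsets[level] + self.prefix[level][pos]
        return self.fallback[key]

    def bits_per_key(self) -> float:
        """Bit-array size per key, ignoring the rank index and the fallback table."""
        n = sum(sum(b) for b in self.bits) + len(self.fallback)
        return 0.0 if n == 0 else sum(len(b) for b in self.bits) / n


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(10_000)]
    mphf = CascadeMPHF(keys, gamma=1.0)
    values = sorted(mphf.lookup(k) for k in keys)
    print("minimal and perfect:", values == list(range(len(keys))))
    print("bit-array bits/key:", round(mphf.bits_per_key(), 2))
```

With gamma = 1, a key is alone in its bucket with probability about 1/e, so the level sizes shrink geometrically and the bit arrays total roughly e ≈ 2.72 bits per key; a larger gamma uses more bits but sends fewer keys to later levels, which shortens lookups. Because this sketch ignores the rank index and other implementation details, its `bits_per_key` value should not be compared directly with the 3.7 bits/element reported for BBhash.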

Keywords
  • Minimal Perfect Hash Functions
  • Algorithms
  • Data Structures
  • Big Data
