PtrHash: Minimal Perfect Hashing at RAM Throughput

Groot Koerkamp, Ragnar

doi:10.4230/LIPIcs.SEA.2025.21

Abstract

Motivation. Given a set K of n keys, a minimal perfect hash function (MPHF) is a bijective, and hence collision-free, map H_mphf from K to {0, …, n-1}. These functions are used in databases, search engines, and bioinformatics indexing tools such as Pufferfish (using BBHash) and Piscem (using PTHash). PTHash is also used in SSHash, a data structure on k-mers that supports membership queries. Since PTHash takes only around 5% of the total space of SSHash, trading slightly more space for faster queries is beneficial. This work therefore presents a (minimal) perfect hash function that prioritizes query throughput first, while still allowing efficient construction for 10⁹ or more elements using 2.4 bits of memory per key.
Contributions. Both PTHash and PHOBIC first map all n keys to n/λ < n buckets. Then, each bucket stores a pilot that controls the final hash value of the keys mapping to it. PtrHash builds on this by using 1) fixed-width (uncompressed) 8-bit pilots and 2) a construction algorithm similar to Cuckoo hashing to find suitable pilot values. Further, it partitions the keys, so that the keys in each part map to their own set of slots. PtrHash 3) uses the same number of buckets and slots for each part, with 4) a single remap table that maps intermediate positions ≥ n to positions < n, 5) encoded using per-cacheline Elias-Fano coding. Lastly, 6) PtrHash supports streaming queries, where prefetching is used to answer a stream of multiple queries more efficiently than one-by-one processing.
Results. With default parameters, PtrHash takes 2.4 bits per key. On 300 million string keys, PtrHash is as fast as or faster to build than other MPHFs of a similar size, and at least 2.1× faster to query. When streaming multiple queries, this improves to a 3.3× speedup over the fastest alternative, while also being significantly faster to construct. When using 10⁹ integer keys instead, query times are as low as 12 ns/key when iterating in a for loop, or even down to 8 ns/key when using the streaming approach, just short of the 7.4 ns inverse throughput of random memory accesses.
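
The bucket-and-pilot scheme described under Contributions can be illustrated with a small toy sketch. This is not the PtrHash implementation: the mixer `mix`, the class `ToyMphf`, and the parameters `lam` and `alpha` are hypothetical names chosen here, the pilot search is a plain greedy scan that restarts with a new seed on failure (standing in for the Cuckoo-like construction), keys are not partitioned, and the remap table is an uncompressed list rather than Elias-Fano coded. Keys are assumed to be distinct non-negative integers below 2⁶⁴.

```python
MASK = (1 << 64) - 1

def mix(x):
    """splitmix64-style 64-bit mixer (an assumption, not PtrHash's hash)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)

def reduce(h, n):
    """Map a 64-bit hash to [0, n) by multiply-and-shift."""
    return (h * n) >> 64

class ToyMphf:
    def __init__(self, keys, lam=3.0, alpha=0.8):
        self.n = len(keys)
        self.num_buckets = max(1, int(self.n / lam))   # about n/lam buckets
        self.num_slots = int(self.n / alpha) + 1       # slightly more slots than keys
        seed = 0
        while not self._try_build(keys, seed):         # retry with a new seed on failure
            seed += 1

    def _hash(self, key):
        return mix(mix(self.seed) ^ key)

    def _slot(self, h, pilot):
        # The pilot changes where every key of the bucket lands.
        return reduce(mix(h ^ mix(pilot)), self.num_slots)

    def _try_build(self, keys, seed):
        self.seed = seed
        buckets = [[] for _ in range(self.num_buckets)]
        for key in keys:
            h = self._hash(key)
            buckets[reduce(h, self.num_buckets)].append(h)
        taken = [False] * self.num_slots
        pilots = [0] * self.num_buckets
        # Largest buckets first: they are the hardest to place.
        for b in sorted(range(self.num_buckets), key=lambda b: -len(buckets[b])):
            for pilot in range(256):                   # fixed-width 8-bit pilots
                slots = [self._slot(h, pilot) for h in buckets[b]]
                if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                    for s in slots:
                        taken[s] = True
                    pilots[b] = pilot
                    break
            else:
                return False                           # no pilot fits this bucket
        self.pilots = bytes(pilots)
        # Remap table: every taken slot >= n is sent to a free slot < n.
        free = [s for s in range(self.n) if not taken[s]]
        overflow = [s for s in range(self.n, self.num_slots) if taken[s]]
        self.remap = [0] * (self.num_slots - self.n)
        for s, f in zip(overflow, free):
            self.remap[s - self.n] = f
        return True

    def index(self, key):
        h = self._hash(key)
        pilot = self.pilots[reduce(h, self.num_buckets)]
        slot = self._slot(h, pilot)
        return slot if slot < self.n else self.remap[slot - self.n]
```

A query touches one pilot byte and, only for the few keys whose slot is ≥ n, one remap entry. The sketch leaves a generous 20% of slots unused to keep the greedy search simple; a fuller construction could run at a higher load factor.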

