DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2025.9

MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing

Authors: Stefan Hermann

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

A minimal perfect hash function (MPHF) maps a set of n keys to unique positions {1, …, n}. Representing an MPHF requires at least log₂(e)≈ 1.443 bits per key. ShockHash is a technique to construct an MPHF and requires just slightly more space. It gives each key two random candidate positions. If each key can be mapped to one of its two candidate positions such that there is exactly one key mapped to each position, then an MPHF is found. If not, ShockHash repeats the process with a new set of random candidate positions. ShockHash has to store how many repetitions were required and for each key to which of the two candidate positions it is mapped. However, when a given set of candidate positions can be used as MPHF then there is not only one but multiple ways of mapping the keys to one of their candidate positions such that the mapping results in an MPHF. This redundancy makes up for the majority of the remaining space overhead in ShockHash. In this paper, we present MorphisHash which almost completely eliminates this redundancy. Our theoretical result is that MorphisHash saves Θ(ln(n)) bits in expectation compared to ShockHash. This corresponds to a factor of 20 less space overhead in practice. Just like ShockHash, MorphisHash can be used as a building block within RecSplit to obtain MorphisHash-RS. When compared for same space consumption, MorphisHash-RS can be constructed up to 21 times faster than ShockHash-RS. The technique to accomplish this might be of a more general interest to compress data structures.

Cite as

Stefan Hermann. MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{hermann:LIPIcs.ESA.2025.9,
  author =	{Hermann, Stefan},
  title =	{{MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{9:1--9:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.9},
  URN =		{urn:nbn:de:0030-drops-244779},
  doi =		{10.4230/LIPIcs.ESA.2025.9},
  annote =	{Keywords: compressed data structure, perfect hashing, random graph, pseudoforest, component}
}

Document

DOI: 10.4230/LIPIcs.ESA.2025.99

Engineering Minimal k-Perfect Hash Functions

Authors: Stefan Hermann, Sebastian Kirmayer, Hans-Peter Lehmann, Peter Sanders, and Stefan Walzer

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

Given a set S of n keys, a k-perfect hash function (kPHF) is a data structure that maps the keys to the first m integers, where each output integer can be hit by at most k input keys. When m = ⌈n/k⌉, the resulting function is called a minimal k-perfect hash function (MkPHF). Applications of kPHFs can be found in external memory data structures or to create efficient 1-perfect hash functions, which in turn have a wide range of applications from databases to bioinformatics. Several papers from the 1980s look at external memory data structures with small internal memory indexes. However, actual k-perfect hash functions are surprisingly rare, and the area has not seen a lot of research recently. At the same time, recent research in 1-perfect hashing shows that there is a lack of efficient kPHFs. In this paper, we revive the area of k-perfect hashing, presenting four new constructions. Our implementations simultaneously dominate older approaches in space consumption, construction time, and query time. We see this paper as a possible starting point of an active line of research, similar to the area of 1-perfect hashing.

Cite as

Stefan Hermann, Sebastian Kirmayer, Hans-Peter Lehmann, Peter Sanders, and Stefan Walzer. Engineering Minimal k-Perfect Hash Functions. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 99:1-99:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{hermann_et_al:LIPIcs.ESA.2025.99,
  author =	{Hermann, Stefan and Kirmayer, Sebastian and Lehmann, Hans-Peter and Sanders, Peter and Walzer, Stefan},
  title =	{{Engineering Minimal k-Perfect Hash Functions}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{99:1--99:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.99},
  URN =		{urn:nbn:de:0030-drops-245685},
  doi =		{10.4230/LIPIcs.ESA.2025.99},
  annote =	{Keywords: Compressed Data Structures, Perfect Hashing}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.25

Blocked Bloom Filters with Choices

Authors: Johanna Elena Schmitz, Jens Zentgraf, and Sven Rahmann

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

Probabilistic filters are approximate set membership data structures that represent a set of keys in small space, and answer set membership queries without false negative answers, but with a certain allowed false positive probability. Such filters are widely used in database systems, networks, storage systems and in biological sequence analysis because of their fast query times and low space requirements. Starting with Bloom filters in the 1970s, many filter data structures have been developed, each with its own advantages and disadvantages, e.g., Blocked Bloom filters, Cuckoo filters, XOR filters, Ribbon filters, and more. We introduce Blocked Bloom filters with choices that work similarly to Blocked Bloom filters, except that for each key there are two (or more) alternative choices of blocks where the key’s information may be stored. When inserting a key, we select the block using a cost function which takes into account the current load and the additional number of bits to be set in the candidate blocks. The result is a filter that partially inherits the advantages of a Blocked Bloom filter, such as the ability to insert keys rapidly online or the ability to slightly overload the filter with only a small penalty to the false positive rate. At the same time, it avoids the major disadvantage of a Blocked Bloom filter, namely the larger space consumption. Our new data structure uses less space at the same false positive rate, or has a lower false positive rate at the same space consumption as a Blocked Bloom filter. We discuss the methodology, cost functions for block selection, engineered implementation, a detailed performance evaluation and use cases in bioinformatics of Blocked Bloom filters with choices, showing that they can be of practical value. The implementation of the evaluated filters and the workflows used are provided via Gitlab at https://gitlab.com/rahmannlab/blowchoc-filters.

Cite as

Johanna Elena Schmitz, Jens Zentgraf, and Sven Rahmann. Blocked Bloom Filters with Choices. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 25:1-25:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{schmitz_et_al:LIPIcs.SEA.2025.25,
  author =	{Schmitz, Johanna Elena and Zentgraf, Jens and Rahmann, Sven},
  title =	{{Blocked Bloom Filters with Choices}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{25:1--25:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.25},
  URN =		{urn:nbn:de:0030-drops-232631},
  doi =		{10.4230/LIPIcs.SEA.2025.25},
  annote =	{Keywords: Probabilistic filter, Bloom filter, power of two choices}
}

Document

DOI: 10.4230/LIPIcs.SEA.2022.4

Fast Succinct Retrieval and Approximate Membership Using Ribbon

Authors: Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer

Published in: LIPIcs, Volume 233, 20th International Symposium on Experimental Algorithms (SEA 2022)

Abstract

A retrieval data structure for a static function f: S → {0,1}^r supports queries that return f(x) for any x ∈ S. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate 2^{-r}. The information-theoretic lower bound for both tasks is r|S| bits. While succinct theoretical constructions using (1+o(1))r|S| bits were known, these could not achieve very small overheads in practice because they have an unfavorable space-time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below 1% while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead ≥ 44%). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.

Cite as

Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer. Fast Succinct Retrieval and Approximate Membership Using Ribbon. In 20th International Symposium on Experimental Algorithms (SEA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 233, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{dillinger_et_al:LIPIcs.SEA.2022.4,
  author =	{Dillinger, Peter C. and H\"{u}bschle-Schneider, Lorenz and Sanders, Peter and Walzer, Stefan},
  title =	{{Fast Succinct Retrieval and Approximate Membership Using Ribbon}},
  booktitle =	{20th International Symposium on Experimental Algorithms (SEA 2022)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-251-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{233},
  editor =	{Schulz, Christian and U\c{c}ar, Bora},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2022.4},
  URN =		{urn:nbn:de:0030-drops-165385},
  doi =		{10.4230/LIPIcs.SEA.2022.4},
  annote =	{Keywords: AMQ, Bloom filter, dictionary, linear algebra, randomized algorithm, retrieval data structure, static function data structure, succinct data structure, perfect hashing}
}

@InProceedings{dillinger_et_al:LIPIcs.SEA.2022.4,
  author =	{Dillinger, Peter C. and H\"{u}bschle-Schneider, Lorenz and Sanders, Peter and Walzer, Stefan},
  title =	{{Fast Succinct Retrieval and Approximate Membership Using Ribbon}},
  booktitle =	{20th International Symposium on Experimental Algorithms (SEA 2022)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-251-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{233},
  editor =	{Schulz, Christian and U\c{c}ar, Bora},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2022.4},
  URN =		{urn:nbn:de:0030-drops-165385},
  doi =		{10.4230/LIPIcs.SEA.2022.4},
  annote =	{Keywords: AMQ, Bloom filter, dictionary, linear algebra, randomized algorithm, retrieval data structure, static function data structure, succinct data structure, perfect hashing}
}

4 Search Results for "Dillinger, Peter C."

MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing

Abstract

Cite as

Engineering Minimal k-Perfect Hash Functions

Abstract

Cite as

Blocked Bloom Filters with Choices

Abstract

Cite as

Fast Succinct Retrieval and Approximate Membership Using Ribbon

Abstract

Cite as

Thanks for your feedback!

Could not send message