4 Search Results for "Bez, Dominik"


Document
Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing

Authors: Hans-Peter Lehmann, Peter Sanders, Stefan Walzer, and Jonatan Ziegler

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
Randomised algorithms often employ methods that can fail and that are retried with independent randomness until they succeed. Randomised data structures therefore often store indices of successful attempts, called seeds. If n such seeds are required (e.g., for independent substructures) the standard approach is to compute for each i ∈ [n] the smallest successful seed S_i and store S = (S_1,…,S_n). The central observation of this paper is that this is not space-optimal. We present a different algorithm that computes a sequence S' = (S_1',…,S_n') of successful seeds such that the entropy of S' undercuts the entropy of S by Ω(n) bits in most cases. To achieve a memory consumption of OPT+εn, the expected number of inspected seeds increases by a factor of 𝒪(1/ε). We demonstrate the usefulness of our findings with a novel construction for minimal perfect hash functions that, for n keys and any ε ∈ [n^{-3/7},1], has space requirement (1+ε)OPT and construction time 𝒪(n/ε). All previous approaches only support ε = ω(1/log n) or have construction times that increase exponentially with 1/ε. Our implementation beats the construction throughput of the state of the art by more than two orders of magnitude for ε ≤ 3%.

Cite as

Hans-Peter Lehmann, Peter Sanders, Stefan Walzer, and Jonatan Ziegler. Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 109:1-109:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{lehmann_et_al:LIPIcs.ESA.2025.109,
  author =	{Lehmann, Hans-Peter and Sanders, Peter and Walzer, Stefan and Ziegler, Jonatan},
  title =	{{Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{109:1--109:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.109},
  URN =		{urn:nbn:de:0030-drops-245780},
  doi =		{10.4230/LIPIcs.ESA.2025.109},
  annote =	{Keywords: Random Seed, Encoding, Bernoulli Process, Backtracking, Perfect Hashing}
}
Document
MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing

Authors: Stefan Hermann

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
A minimal perfect hash function (MPHF) maps a set of n keys to unique positions {1, …, n}. Representing an MPHF requires at least log₂(e)≈ 1.443 bits per key. ShockHash is a technique to construct an MPHF and requires just slightly more space. It gives each key two random candidate positions. If each key can be mapped to one of its two candidate positions such that there is exactly one key mapped to each position, then an MPHF is found. If not, ShockHash repeats the process with a new set of random candidate positions. ShockHash has to store how many repetitions were required and for each key to which of the two candidate positions it is mapped. However, when a given set of candidate positions can be used as MPHF then there is not only one but multiple ways of mapping the keys to one of their candidate positions such that the mapping results in an MPHF. This redundancy makes up for the majority of the remaining space overhead in ShockHash. In this paper, we present MorphisHash which almost completely eliminates this redundancy. Our theoretical result is that MorphisHash saves Θ(ln(n)) bits in expectation compared to ShockHash. This corresponds to a factor of 20 less space overhead in practice. Just like ShockHash, MorphisHash can be used as a building block within RecSplit to obtain MorphisHash-RS. When compared for same space consumption, MorphisHash-RS can be constructed up to 21 times faster than ShockHash-RS. The technique to accomplish this might be of a more general interest to compress data structures.

Cite as

Stefan Hermann. MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{hermann:LIPIcs.ESA.2025.9,
  author =	{Hermann, Stefan},
  title =	{{MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{9:1--9:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.9},
  URN =		{urn:nbn:de:0030-drops-244779},
  doi =		{10.4230/LIPIcs.ESA.2025.9},
  annote =	{Keywords: compressed data structure, perfect hashing, random graph, pseudoforest, component}
}
Document
PtrHash: Minimal Perfect Hashing at RAM Throughput

Authors: Ragnar Groot Koerkamp

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Motivation. Given a set K of n keys, a minimal perfect hash function (MPHF) is a collision-free bijective map H_mphf from K to {0, … , n-1}. These functions have uses in databases, search engines, and are used in bioinformatics indexing tools such as Pufferfish (using BBHash), and Piscem (PTHash). PTHash is also used in SSHash, a data structure on k-mers that supports membership queries. PTHash only takes around 5% of the total space of SSHash, and thus, trading slightly more space for faster queries is beneficial. Thus, this work presents a (minimal) perfect hash function that first prioritizes query throughput, while also allowing efficient construction for 10⁹ or more elements using 2.4 bits of memory per key. Contributions. Both PTHash and PHOBIC first map all n keys to n/λ < n buckets. Then, each bucket stores a pilot that controls the final hash value of the keys mapping to it. PtrHash builds on this by using 1) fixed-width (uncompressed) 8-bit pilots, 2) a construction algorithm similar to Cuckoo hashing to find suitable pilot values. Further, it partitions the keys, so that keys in each part map to their own set of slots. PtrHash 3) uses the same number of buckets and slots for each part, with 4) a single remap table to map intermediate positions ≥ n to < n, 5) encoded using per-cacheline Elias-Fano coding. Lastly, 6) PtrHash supports streaming queries, where we use prefetching to answer a stream of multiple queries more efficiently than one-by-one processing. Results. With default parameters, PtrHash takes 2.4 bits per key. On 300 million string keys, PtrHash is as fast or faster to build than other MPHFs at a similar size, and at least 2.1× faster to query. When streaming multiple queries, this improves to 3.3× speedup over the fastest alternative, while also being significantly faster to construct. When using 10⁹ integer keys instead, query times are as low as 12 ns/key when iterating in a for loop, or even down to 8 ns/key when using the streaming approach, just short of the 7.4 ns inverse throughput of random memory accesses.

Cite as

Ragnar Groot Koerkamp. PtrHash: Minimal Perfect Hashing at RAM Throughput. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 21:1-21:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{grootkoerkamp:LIPIcs.SEA.2025.21,
  author =	{Groot Koerkamp, Ragnar},
  title =	{{PtrHash: Minimal Perfect Hashing at RAM Throughput}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{21:1--21:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.21},
  URN =		{urn:nbn:de:0030-drops-232597},
  doi =		{10.4230/LIPIcs.SEA.2025.21},
  annote =	{Keywords: Minimal perfect hashing, Compressed Data Structures}
}
Document
High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

Authors: Dominik Bez, Florian Kurpicz, Hans-Peter Lehmann, and Peter Sanders

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space efficient practical minimal perfect hash function. It heavily relies on trying out hash functions in a brute force way. We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs. In combination, the resulting improvements yield speedups up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 Million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy efficient than the original implementation.

Cite as

Dominik Bez, Florian Kurpicz, Hans-Peter Lehmann, and Peter Sanders. High Performance Construction of RecSplit Based Minimal Perfect Hash Functions. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 19:1-19:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{bez_et_al:LIPIcs.ESA.2023.19,
  author =	{Bez, Dominik and Kurpicz, Florian and Lehmann, Hans-Peter and Sanders, Peter},
  title =	{{High Performance Construction of RecSplit Based Minimal Perfect Hash Functions}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{19:1--19:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.19},
  URN =		{urn:nbn:de:0030-drops-186728},
  doi =		{10.4230/LIPIcs.ESA.2023.19},
  annote =	{Keywords: compressed data structure, parallel perfect hashing, bit parallelism, GPU, SIMD, parallel computing, vector instructions}
}
  • Refine by Type
  • 4 Document/PDF
  • 3 Document/HTML

  • Refine by Publication Year
  • 3 2025
  • 1 2023

  • Refine by Author
  • 2 Lehmann, Hans-Peter
  • 2 Sanders, Peter
  • 1 Bez, Dominik
  • 1 Groot Koerkamp, Ragnar
  • 1 Hermann, Stefan
  • Show More...

  • Refine by Series/Journal
  • 4 LIPIcs

  • Refine by Classification
  • 3 Theory of computation → Data compression
  • 2 Information systems → Point lookups
  • 2 Theory of computation → Data structures design and analysis
  • 1 Information systems → Retrieval efficiency
  • 1 Mathematics of computing → Random graphs
  • Show More...

  • Refine by Keyword
  • 2 compressed data structure
  • 1 Backtracking
  • 1 Bernoulli Process
  • 1 Compressed Data Structures
  • 1 Encoding
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail