21 Search Results for "Dietzfelbinger, Martin"


Document
The mod-minimizer: A Simple and Efficient Sampling Algorithm for Long k-mers

Authors: Ragnar Groot Koerkamp and Giulio Ermanno Pibiri

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Motivation. Given a string S, a minimizer scheme is an algorithm defined by a triple (k,w,𝒪) that samples a subset of k-mers (k-long substrings) from a string S. Specifically, it samples the smallest k-mer according to the order 𝒪 from each window of w consecutive k-mers in S. Because consecutive windows can sample the same k-mer, the set of the sampled k-mers is typically much smaller than S. More generally, we consider substring sampling algorithms that respect a window guarantee: at least one k-mer must be sampled from every window of w consecutive k-mers. As a sampled k-mer is uniquely identified by its absolute position in S, we can define the density of a sampling algorithm as the fraction of distinct sampled positions. Good methods have low density which, by respecting the window guarantee, is lower bounded by 1/w. It is however difficult to design a sequence-agnostic algorithm with provably optimal density. In practice, the order 𝒪 is usually implemented using a pseudo-random hash function to obtain the so-called random minimizer. This scheme is simple to implement, very fast to compute even in streaming fashion, and easy to analyze. However, its density is almost a factor of 2 away from the lower bound for large windows. Methods. In this work we introduce mod-sampling, a two-step sampling algorithm to obtain new minimizer schemes. Given a (small) parameter t, the mod-sampling algorithm finds the position p of the smallest t-mer in a window. It then samples the k-mer at position p mod w. The lr-minimizer uses t = k-w and the mod-minimizer uses t ≡ k (mod w). Results. These new schemes have provably lower density than random minimizers and other schemes when k is large compared to w, while being as fast to compute. Importantly, the mod-minimizer achieves optimal density when k → ∞. Although the mod-minimizer is not the first method to achieve optimal density for large k, its proof of optimality is simpler than previous work. We provide pseudocode for a number of other methods and compare to them. In practice, the mod-minimizer has considerably lower density than the random minimizer and other state-of-the-art methods, like closed syncmers and miniception, when k > w. We plugged the mod-minimizer into SSHash, a k-mer dictionary based on minimizers. For default parameters (w,k) = (11,21), space usage decreases by 15% when indexing the whole human genome (GRCh38), while maintaining its fast query time.
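
The two-step mod-sampling rule described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the use of CRC32 as a stand-in for the pseudo-random order on t-mers, and the leftmost tie-breaking rule are all assumptions made here.

```python
import zlib


def tmer_rank(tmer: str) -> int:
    # Assumption: CRC32 stands in for a pseudo-random order on t-mers.
    return zlib.crc32(tmer.encode())


def mod_sample(window: str, w: int, k: int, t: int) -> int:
    """Return the offset in [0, w) of the k-mer sampled from one window.

    A window of w consecutive k-mers spans w + k - 1 characters.
    """
    assert len(window) == w + k - 1 and 1 <= t <= k
    # Step 1: position p of the smallest t-mer in the window (leftmost on ties).
    p = min(range(len(window) - t + 1),
            key=lambda i: (tmer_rank(window[i:i + t]), i))
    # Step 2: sample the k-mer at position p mod w.
    return p % w


def sampled_positions(s: str, w: int, k: int, t: int) -> list[int]:
    """Absolute positions of all k-mers sampled while sliding over s."""
    span = w + k - 1
    return sorted({i + mod_sample(s[i:i + span], w, k, t)
                   for i in range(len(s) - span + 1)})


def density(s: str, w: int, k: int, t: int) -> float:
    """Fraction of k-mer positions of s that are sampled."""
    return len(sampled_positions(s, w, k, t)) / (len(s) - k + 1)
```

With t = k a window contains exactly w t-mers, so p mod w = p and the sketch reduces to an ordinary minimizer under the same order. Choosing t with t ≡ k (mod w) corresponds to the mod-minimizer setting named in the abstract.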

Cite as

Ragnar Groot Koerkamp and Giulio Ermanno Pibiri. The mod-minimizer: A Simple and Efficient Sampling Algorithm for Long k-mers. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 11:1-11:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{grootkoerkamp_et_al:LIPIcs.WABI.2024.11,
  author =	{Groot Koerkamp, Ragnar and Pibiri, Giulio Ermanno},
  title =	{{The mod-minimizer: A Simple and Efficient Sampling Algorithm for Long k-mers}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{11:1--11:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.11},
  URN =		{urn:nbn:de:0030-drops-206552},
  doi =		{10.4230/LIPIcs.WABI.2024.11},
  annote =	{Keywords: Minimizers, Randomized algorithms, Sketching, Hashing}
}
Document
Invited Talk
Models and Counter-Models of Quantified Boolean Formulas (Invited Talk)

Authors: Martina Seidl

Published in: LIPIcs, Volume 305, 27th International Conference on Theory and Applications of Satisfiability Testing (SAT 2024)


Abstract
Because of the duality of universal and existential quantification, quantified Boolean formulas (QBF), the extension of propositional logic with quantifiers over the Boolean variables, have not only solutions in terms of models for true formulas like in SAT. Also false QBFs have solutions in terms of counter-models. Both models and counter-models can be represented as certain binary trees or as sets of Boolean functions reflecting the dependencies among the variables of a formula. Such solutions encode the answers to application problems for which QBF solvers are employed like the plan for a planning problem or the error trace of a verification problem. Therefore, models and counter-models are at the core of theory and practice of QBF solving. In this invited talk, we survey approaches that deal with models and counter-models of QBFs and identify some open challenges.

Cite as

Martina Seidl. Models and Counter-Models of Quantified Boolean Formulas (Invited Talk). In 27th International Conference on Theory and Applications of Satisfiability Testing (SAT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 305, pp. 1:1-1:7, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{seidl:LIPIcs.SAT.2024.1,
  author =	{Seidl, Martina},
  title =	{{Models and Counter-Models of Quantified Boolean Formulas}},
  booktitle =	{27th International Conference on Theory and Applications of Satisfiability Testing (SAT 2024)},
  pages =	{1:1--1:7},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-334-8},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{305},
  editor =	{Chakraborty, Supratik and Jiang, Jie-Hong Roland},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2024.1},
  URN =		{urn:nbn:de:0030-drops-205238},
  doi =		{10.4230/LIPIcs.SAT.2024.1},
  annote =	{Keywords: Quantified Boolean Formula, Solution Extraction, Solution Counting}
}
Document
Exponential Separation Between Powers of Regular and General Resolution over Parities

Authors: Sreejata Kishor Bhattacharya, Arkadev Chattopadhyay, and Pavel Dvořák

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)


Abstract
Proving super-polynomial lower bounds on the size of proofs of unsatisfiability of Boolean formulas using resolution over parities is an outstanding problem that has received a lot of attention after its introduction by Itsykson and Sokolov [Dmitry Itsykson and Dmitry Sokolov, 2014]. Very recently, Efremenko, Garlík and Itsykson [Klim Efremenko et al., 2023] proved the first exponential lower bounds on the size of ResLin proofs that were additionally restricted to be bottom-regular. We show that there are formulas for which such regular ResLin proofs of unsatisfiability continue to have exponential size even though there exist short proofs of their unsatisfiability in ordinary, non-regular resolution. This is the first super-polynomial separation between the power of general ResLin and that of regular ResLin for any natural notion of regularity. Our argument, while building upon the work of Efremenko et al. [Klim Efremenko et al., 2023], uses additional ideas from the literature on lifting theorems.

Cite as

Sreejata Kishor Bhattacharya, Arkadev Chattopadhyay, and Pavel Dvořák. Exponential Separation Between Powers of Regular and General Resolution over Parities. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 23:1-23:32, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{bhattacharya_et_al:LIPIcs.CCC.2024.23,
  author =	{Bhattacharya, Sreejata Kishor and Chattopadhyay, Arkadev and Dvo\v{r}\'{a}k, Pavel},
  title =	{{Exponential Separation Between Powers of Regular and General Resolution over Parities}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{23:1--23:32},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.23},
  URN =		{urn:nbn:de:0030-drops-204191},
  doi =		{10.4230/LIPIcs.CCC.2024.23},
  annote =	{Keywords: Proof Complexity, Regular Reslin, Branching Programs, Lifting}
}
Document
Track A: Algorithms, Complexity and Games
Towards an Analysis of Quadratic Probing

Authors: William Kuszmaul and Zoe Xi

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
Since 1968, one of the simplest open questions in the theory of hash tables has been to prove anything nontrivial about the correctness of quadratic probing. We make the first tangible progress towards this goal, showing that there exists a positive-constant load factor at which quadratic probing is a constant-expected-time hash table. Our analysis applies more generally to any fixed-offset open-addressing hash table, and extends to higher load factors in the case where the hash table examines blocks of some size B = ω(1).
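
For readers unfamiliar with the scheme under analysis, here is a minimal Python sketch of a fixed-offset open-addressing table that uses quadratic offsets. It is not taken from the paper: the class name, the offset sequence i² and the use of Python's built-in `hash` are assumptions chosen for illustration, and deletions are omitted.

```python
def probe_sequence(h: int, m: int):
    """Fixed-offset probing: the i-th probe is (h + i*i) mod m.

    The offsets i*i do not depend on the key, only the start slot h does.
    """
    return ((h + i * i) % m for i in range(m))


class QuadraticProbingSet:
    """Insert-only set; None marks an empty slot, so None cannot be a key."""

    def __init__(self, m: int):
        self.slots = [None] * m

    def insert(self, key) -> bool:
        m = len(self.slots)
        for s in probe_sequence(hash(key) % m, m):
            if self.slots[s] is None or self.slots[s] == key:
                self.slots[s] = key
                return True
        return False  # the probe sequence found no free slot

    def __contains__(self, key) -> bool:
        m = len(self.slots)
        for s in probe_sequence(hash(key) % m, m):
            if self.slots[s] is None:
                return False
            if self.slots[s] == key:
                return True
        return False
```

Note that the quadratic sequence need not visit every slot, which is why `insert` can fail even when the table is not full.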

Cite as

William Kuszmaul and Zoe Xi. Towards an Analysis of Quadratic Probing. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 103:1-103:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{kuszmaul_et_al:LIPIcs.ICALP.2024.103,
  author =	{Kuszmaul, William and Xi, Zoe},
  title =	{{Towards an Analysis of Quadratic Probing}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{103:1--103:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.103},
  URN =		{urn:nbn:de:0030-drops-202463},
  doi =		{10.4230/LIPIcs.ICALP.2024.103},
  annote =	{Keywords: quadratic probing, hashing, open addressing, witness trees}
}
Document
Track A: Algorithms, Complexity and Games
Optimal Non-Adaptive Cell Probe Dictionaries and Hashing

Authors: Kasper Green Larsen, Rasmus Pagh, Giuseppe Persiano, Toniann Pitassi, Kevin Yeo, and Or Zamir

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
We present a simple and provably optimal non-adaptive cell probe data structure for the static dictionary problem. Our data structure supports storing a set of n key-value pairs from [u]× [u] using s words of space and answering key lookup queries in t = O(lg(u/n)/lg(s/n)) non-adaptive probes. This generalizes a solution to the membership problem (i.e., where no values are associated with keys) due to Buhrman et al. We also present matching lower bounds for the non-adaptive static membership problem in the deterministic setting. Our lower bound implies that both our dictionary algorithm and the preceding membership algorithm are optimal, and in particular that there is an inherent complexity gap in these problems between no adaptivity and one round of adaptivity (with which hashing-based algorithms solve these problems in constant time). Using the ideas underlying our data structure, we also obtain the first implementation of a n-wise independent family of hash functions with optimal evaluation time in the cell probe model.

Cite as

Kasper Green Larsen, Rasmus Pagh, Giuseppe Persiano, Toniann Pitassi, Kevin Yeo, and Or Zamir. Optimal Non-Adaptive Cell Probe Dictionaries and Hashing. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 104:1-104:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{larsen_et_al:LIPIcs.ICALP.2024.104,
  author =	{Larsen, Kasper Green and Pagh, Rasmus and Persiano, Giuseppe and Pitassi, Toniann and Yeo, Kevin and Zamir, Or},
  title =	{{Optimal Non-Adaptive Cell Probe Dictionaries and Hashing}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{104:1--104:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.104},
  URN =		{urn:nbn:de:0030-drops-202471},
  doi =		{10.4230/LIPIcs.ICALP.2024.104},
  annote =	{Keywords: non-adaptive, cell probe, dictionary, hashing}
}
Document
Track A: Algorithms, Complexity and Games
Limits of Sequential Local Algorithms on the Random k-XORSAT Problem

Authors: Kingsley Yung

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
The random k-XORSAT problem is a random constraint satisfaction problem of n Boolean variables and m = rn clauses, in which a random instance can be expressed as a G𝔽(2) linear system of the form Ax = b, where A is a random m × n matrix with k ones per row, and b is a random vector. It is known that there exist two distinct thresholds r_{core}(k) < r_{sat}(k) such that as n → ∞ for r < r_{sat}(k) the random instance has solutions with high probability, while for r_{core}(k) < r < r_{sat}(k) the solution space shatters into an exponential number of clusters. Sequential local algorithms are a natural class of algorithms which assign values to variables one by one iteratively. In each iteration, the algorithm runs some heuristics, called local rules, to decide the value assigned, based on the local neighborhood of the selected variables under the factor graph representation of the instance. We prove that for any r > r_{core}(k) the sequential local algorithms with certain local rules fail to solve the random k-XORSAT with high probability. They include (1) the algorithm using the Unit Clause Propagation as local rule for k ≥ 9, and (2) the algorithms using any local rule that can calculate the exact marginal probabilities of variables in instances with factor graphs that are trees, for k ≥ 13. The well-known Belief Propagation and Survey Propagation are included in (2). Meanwhile, the best known linear-time algorithm succeeds with high probability for r < r_{core}(k). Our results support the intuition that r_{core}(k) is the sharp threshold for the existence of a linear-time algorithm for random k-XORSAT. Our approach is to apply the Overlap Gap Property (OGP) framework to the sub-instance induced by the core of the instance, instead of the whole instance. By doing so, the sequential local algorithms can be ruled out at density as low as r_{core}(k), since the sub-instance exhibits OGP at much lower clause density, compared with the whole instance.

Cite as

Kingsley Yung. Limits of Sequential Local Algorithms on the Random k-XORSAT Problem. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 123:1-123:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{yung:LIPIcs.ICALP.2024.123,
  author =	{Yung, Kingsley},
  title =	{{Limits of Sequential Local Algorithms on the Random k-XORSAT Problem}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{123:1--123:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.123},
  URN =		{urn:nbn:de:0030-drops-202666},
  doi =		{10.4230/LIPIcs.ICALP.2024.123},
  annote =	{Keywords: Random k-XORSAT, Sequential local algorithms, Average-case complexity, Phase transition, Overlap gap property}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
Identifying Tractable Quantified Temporal Constraints Within Ord-Horn

Authors: Jakub Rydval, Žaneta Semanišinová, and Michał Wrona

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
The constraint satisfaction problem, parameterized by a relational structure, provides a general framework for expressing computational decision problems. Already the restriction to the class of all finite structures forms an interesting microcosm on its own, but to express decision problems in temporal reasoning one has to take a step beyond the finite-domain realm. An important class of templates used in this context are temporal structures, i.e., structures over ℚ whose relations are first-order definable using the usual countable dense linear order without endpoints. In the standard setting, which allows only existential quantification over input variables, the complexity of finite and temporal constraints has been fully classified. In the quantified setting, i.e., when one also allows universal quantifiers, there is only a handful of partial classification results and many concrete cases of unknown complexity. This paper presents a significant progress towards understanding the complexity of the quantified constraint satisfaction problem for temporal structures. We provide a complexity dichotomy for quantified constraints over the Ord-Horn fragment, which played an important role in understanding the complexity of constraints both over temporal structures and in Allen’s interval algebra. We show that all problems under consideration are in P or coNP-hard. In particular, we determine the complexity of the quantified constraint satisfaction problem for (ℚ;x = y⇒ x ≥ z), hereby settling a question open for more than ten years.

Cite as

Jakub Rydval, Žaneta Semanišinová, and Michał Wrona. Identifying Tractable Quantified Temporal Constraints Within Ord-Horn. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 151:1-151:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{rydval_et_al:LIPIcs.ICALP.2024.151,
  author =	{Rydval, Jakub and Semani\v{s}inov\'{a}, \v{Z}aneta and Wrona, Micha{\l}},
  title =	{{Identifying Tractable Quantified Temporal Constraints Within Ord-Horn}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{151:1--151:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.151},
  URN =		{urn:nbn:de:0030-drops-202944},
  doi =		{10.4230/LIPIcs.ICALP.2024.151},
  annote =	{Keywords: constraint satisfaction problems, quantifiers, dichotomy, temporal reasoning, Ord-Horn}
}
Document
RANDOM
Subset Sum in Time 2^{n/2} / poly(n)

Authors: Xi Chen, Yaonan Jin, Tim Randolph, and Rocco A. Servedio

Published in: LIPIcs, Volume 275, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)


Abstract
A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) n-input Subset Sum problem that runs in time 2^{(1/2 - c)n} for some constant c > 0. In this paper we give a Subset Sum algorithm with worst-case running time O(2^{n/2} ⋅ n^{-γ}) for a constant γ > 0.5023 in standard word RAM or circuit RAM models. To the best of our knowledge, this is the first improvement on the classical "meet-in-the-middle" algorithm for worst-case Subset Sum, due to Horowitz and Sahni, which can be implemented in time O(2^{n/2}) in these memory models [Horowitz and Sahni, 1974]. Our algorithm combines a number of different techniques, including the "representation method" introduced by Howgrave-Graham and Joux [Howgrave-Graham and Joux, 2010] and subsequent adaptations of the method in Austrin, Kaski, Koivisto, and Nederlof [Austrin et al., 2016], and Nederlof and Węgrzycki [Jesper Nederlof and Karol Wegrzycki, 2021], and "bit-packing" techniques used in the work of Baran, Demaine, and Pǎtraşcu [Baran et al., 2005] on subquadratic algorithms for 3SUM.
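
As background for the baseline mentioned in the abstract, the classical "meet-in-the-middle" idea can be sketched as follows. This is a simplified illustration, not the paper's algorithm and not the exact Horowitz and Sahni procedure: it uses Python hash sets instead of sorted lists, so the 2^{n/2}-type bound holds only in expectation, and the function names are made up here.

```python
def subset_sums(items: list[int]) -> set[int]:
    """All sums achievable by subsets of items (at most 2^len(items) values)."""
    sums = {0}
    for x in items:
        sums |= {s + x for s in sums}
    return sums


def subset_sum_mitm(items: list[int], target: int) -> bool:
    """Decide whether some subset of items sums to target.

    Split the input in two halves, enumerate each half's subset sums,
    then look for a pair (a, b) with a + b == target.
    """
    half = len(items) // 2
    left = subset_sums(items[:half])
    right = subset_sums(items[half:])
    return any(target - a in right for a in left)
```

Each half contributes at most 2^{n/2} sums (rounding aside), which is where the running time of the classical approach comes from.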

Cite as

Xi Chen, Yaonan Jin, Tim Randolph, and Rocco A. Servedio. Subset Sum in Time 2^{n/2} / poly(n). In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 39:1-39:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{chen_et_al:LIPIcs.APPROX/RANDOM.2023.39,
  author =	{Chen, Xi and Jin, Yaonan and Randolph, Tim and Servedio, Rocco A.},
  title =	{{Subset Sum in Time $2^{n/2}$ / poly(n)}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)},
  pages =	{39:1--39:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-296-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{275},
  editor =	{Megow, Nicole and Smith, Adam},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2023.39},
  URN =		{urn:nbn:de:0030-drops-188641},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2023.39},
  annote =	{Keywords: Exact algorithms, subset sum, log shaving}
}
Document
Invited Talk
On Hashing by (Random) Equations (Invited Talk)

Authors: Martin Dietzfelbinger

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
The talk will consider aspects of the following setup: Assume for each (key) x from a set 𝒰 (the universe) a vector a_x = (a_{x,0},… ,a_{x,{m-1}}) has been chosen. Given a list Z = (z_i)_{i ∈ [m]} of vectors in {0,1}^r we obtain a mapping φ_Z: 𝒰 → {0,1}^r, x ↦ ⟨a_x,Z⟩ := ⨁_{i ∈ [m]} a_{x,i} ⋅ z_i, where ⨁ is bitwise XOR. The simplest way for creating a data structure for calculating φ_Z is to store Z in an array Z[0..m-1] and answer a query for x by returning ⟨ a_x,Z⟩. The length m of the array should be (1+ε)n for some small ε, and calculating this inner product should be fast. In the focus of the talk is the case where for all or for most of the sets S ⊆ 𝒰 of a certain size n the vectors a_x, x ∈ S, are linearly independent. Choosing Z at random will lead to hash families of various degrees of independence. We will be mostly interested in the case where a_x, x ∈ 𝒰 are chosen independently at random from {0,1}^m, according to some distribution 𝒟. We wish to construct (static) retrieval data structures, which means that S ⊂ 𝒰 and some mapping f: S → {0,1}^r are given, and the task is to find Z such that the restriction of φ_Z to S is f. For creating such a data structure it is necessary to solve the linear system (a_x)_{x ∈ S} ⋅ Z = (f(x))_{x ∈ S} for Z. Two problems are central: Under what conditions on m and 𝒟 can we expect the vectors a_x, x ∈ S to be linearly independent, and how can we arrange things so that in this case the system can be solved fast, in particular in time close to linear (in n, assuming a reasonable machine model)? Solutions to these problems, some classical and others that have emerged only in recent years, will be discussed.
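
The retrieval setup in the abstract (find Z such that φ_Z restricted to S equals f) can be illustrated with a small Python sketch. Everything beyond the abstract is an assumption made for illustration: the vectors a_x are dense and derived from SHAKE-256 with a seed, bit vectors are packed into Python integers, and plain Gaussian elimination over GF(2) is used. The sketch therefore ignores the efficiency questions (choice of 𝒟, near-linear solving) that the talk is actually about.

```python
import hashlib


def row(key: str, m: int, seed: int) -> int:
    """Pseudo-random vector a_x in {0,1}^m, packed into an int (bit i = a_{x,i})."""
    digest = hashlib.shake_256(f"{seed}:{key}".encode()).digest((m + 7) // 8)
    return int.from_bytes(digest, "big") & ((1 << m) - 1)


def build(f: dict[str, int], m: int, seed: int = 0):
    """Solve (a_x)_{x in S} . Z = (f(x))_{x in S} over GF(2).

    Returns the list Z of length m, or None if the rows are linearly dependent.
    """
    pivots = {}  # highest set bit -> (reduced row, reduced right-hand side)
    for key, val in f.items():
        a = row(key, m, seed)
        while a and (a.bit_length() - 1) in pivots:
            pa, pv = pivots[a.bit_length() - 1]
            a ^= pa
            val ^= pv
        if a == 0:
            return None  # dependent row; the caller may retry with a new seed
        pivots[a.bit_length() - 1] = (a, val)
    z = [0] * m  # positions without a pivot stay 0
    for p in sorted(pivots):  # all other bits of a pivot row lie below p
        a, val = pivots[p]
        for i in range(p):
            if (a >> i) & 1:
                val ^= z[i]
        z[p] = val
    return z


def query(key: str, z: list[int], seed: int = 0) -> int:
    """Evaluate phi_Z(x) = XOR of z_i over all i with a_{x,i} = 1."""
    a = row(key, len(z), seed)
    out = 0
    for i, zi in enumerate(z):
        if (a >> i) & 1:
            out ^= zi
    return out
```

A caller would typically try seeds 0, 1, 2, … until `build` succeeds and then store the seed alongside Z. For keys outside S, `query` returns an arbitrary value, as is usual for retrieval data structures.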

Cite as

Martin Dietzfelbinger. On Hashing by (Random) Equations (Invited Talk). In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{dietzfelbinger:LIPIcs.ESA.2023.1,
  author =	{Dietzfelbinger, Martin},
  title =	{{On Hashing by (Random) Equations}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.1},
  URN =		{urn:nbn:de:0030-drops-186545},
  doi =		{10.4230/LIPIcs.ESA.2023.1},
  annote =	{Keywords: Hashing, Retrieval, Linear equations, Randomness}
}
Document
Work-Efficient Query Evaluation with PRAMs

Authors: Jens Keppeler, Thomas Schwentick, and Christopher Spinrath

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)


Abstract
The paper studies query evaluation in parallel constant time in the PRAM model. While it is well-known that all relational algebra queries can be evaluated in constant time on an appropriate CRCW-PRAM, this paper is interested in the efficiency of evaluation algorithms, that is, in the number of processors or, asymptotically equivalent, in the work. Naive evaluation in the parallel setting results in huge (polynomial) bounds on the work of such algorithms and in presentations of the result sets that can be extremely scattered in memory. The paper first discusses some obstacles for constant time PRAM query evaluation. It presents algorithms for relational operators that are considerably more efficient than the naive approaches. Further it explores three settings, in which efficient sequential query evaluation algorithms exist: acyclic queries, semi-join algebra queries, and join queries - the latter in the worst-case optimal framework. Under natural assumptions on the representation of the database, the work of the given algorithms matches the best sequential algorithms in the case of semi-join queries, and it comes close in the other two settings. An important tool is the compaction technique from Hagerup (1992).

Cite as

Jens Keppeler, Thomas Schwentick, and Christopher Spinrath. Work-Efficient Query Evaluation with PRAMs. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 16:1-16:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{keppeler_et_al:LIPIcs.ICDT.2023.16,
  author =	{Keppeler, Jens and Schwentick, Thomas and Spinrath, Christopher},
  title =	{{Work-Efficient Query Evaluation with PRAMs}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{16:1--16:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.16},
  URN =		{urn:nbn:de:0030-drops-177589},
  doi =		{10.4230/LIPIcs.ICDT.2023.16},
  annote =	{Keywords: PRAM, query evaluation, work-efficient, parallel, acyclic queries, free-connex queries}
}
Document
Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold

Authors: Stefan Walzer

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)


Abstract
Most hash tables have an insertion time of 𝒪(1), often qualified as "expected" and/or "amortised". While insertions into cuckoo hash tables indeed seem to take 𝒪(1) expected time in practice, only polylogarithmic guarantees are proven in all but the simplest of practically relevant cases. Given the widespread use of cuckoo hashing to implement compact dictionaries and Bloom filter alternatives, closing this gap is an important open problem for theoreticians. In this paper, we show that random walk insertions into cuckoo hash tables take 𝒪(1) expected amortised time when any number k ≥ 3 of hash functions is used and the load factor is below the corresponding peeling threshold (e.g. ≈0.81 for k = 3). To our knowledge, this is the first meaningful guarantee for constant time insertion for cuckoo hashing that works for k ∈ {3,…,9}. In addition to being useful in its own right, we hope that our key-centred analysis method can be a stepping stone on the path to the true end goal: 𝒪(1) time insertions for all load factors below the load threshold (e.g. ≈0.91 for k = 3).

Cite as

Stefan Walzer. Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 87:1-87:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{walzer:LIPIcs.ESA.2022.87,
  author =	{Walzer, Stefan},
  title =	{{Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{87:1--87:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.87},
  URN =		{urn:nbn:de:0030-drops-170250},
  doi =		{10.4230/LIPIcs.ESA.2022.87},
  annote =	{Keywords: Cuckoo Hashing, Random Walk, Random Hypergraph, Peeling, Cores}
}
Document
Dense Peelable Random Uniform Hypergraphs

Authors: Martin Dietzfelbinger and Stefan Walzer

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. to admit no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled “from the outside in”. The density thresholds f_k for peelability of our hypergraphs (f_3 ≈ 0.918, f_4 ≈ 0.977, f_5 ≈ 0.992, …) are well beyond the corresponding thresholds (c_3 ≈ 0.818, c_4 ≈ 0.772, c_5 ≈ 0.702, …) of standard k-uniform random hypergraphs. To get a grip on f_k, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^ℤ and f_k can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods. Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions. Frequently, the data structures rely on peelability of the hypergraph or peelability allows for simple linear time algorithms. Memory efficiency is closely tied to edge density while worst and average case query times are tied to maximum and average edge size. To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [Fabiano Cupertino Botelho et al., 2013]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. 
Using k > 3 attains, at small sacrifices in running time, further improvements to memory usage.
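
The peelability notion used in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters and the choice of exactly one random vertex per segment are assumptions made for illustration. An edge list is peelable if repeatedly deleting an edge that contains a vertex of degree 1 eventually deletes every edge.

```python
import random
from collections import defaultdict

def segmented_edges(num_edges, num_segments, segment_size, k, rng):
    """Hypothetical generator: each edge picks one random vertex in each of
    k consecutive segments (an assumption about the construction's details)."""
    edges = []
    for _ in range(num_edges):
        start = rng.randrange(num_segments - k + 1)
        edges.append(tuple((start + j) * segment_size + rng.randrange(segment_size)
                           for j in range(k)))
    return edges

def is_peelable(edges):
    """Return True if peeling removes all edges, i.e. the 2-core is empty."""
    incident = defaultdict(set)  # vertex -> ids of edges still containing it
    for e_id, edge in enumerate(edges):
        for v in set(edge):
            incident[v].add(e_id)
    stack = [v for v in incident if len(incident[v]) == 1]
    removed = 0
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # degree changed since v was pushed
        (e_id,) = incident[v]
        removed += 1
        for u in set(edges[e_id]):
            incident[u].discard(e_id)
            if len(incident[u]) == 1:
                stack.append(u)
    return removed == len(edges)
```

For example, `is_peelable(segmented_edges(900, 100, 10, 3, random.Random(0)))` tests one random instance with 900 edges on 1000 vertices; whether a particular instance peels depends on the random choices.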

Cite as

Martin Dietzfelbinger and Stefan Walzer. Dense Peelable Random Uniform Hypergraphs. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 38:1-38:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{dietzfelbinger_et_al:LIPIcs.ESA.2019.38,
  author =	{Dietzfelbinger, Martin and Walzer, Stefan},
  title =	{{Dense Peelable Random Uniform Hypergraphs}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{38:1--38:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.38},
  URN =		{urn:nbn:de:0030-drops-111599},
  doi =		{10.4230/LIPIcs.ESA.2019.38},
  annote =	{Keywords: Random Hypergraphs, Peeling Threshold, 2-Core, Hashing, Retrieval, Succinct Data Structure}
}

Document
Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

Authors: Martin Dietzfelbinger and Stefan Walzer

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
In this paper we identify a new class of sparse near-quadratic random Boolean matrices that have full row rank over F_2 = {0,1} with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions n(1-epsilon) x n is generated as follows: In each row, identify a block of length L = O((log n)/epsilon) at a random position. The entries outside the block are 0, the entries inside the block are given by fair coin tosses. Sorting the rows according to the positions of the blocks transforms the matrix into a kind of band matrix, on which, as it turns out, Gauss elimination works very efficiently with high probability. For the proof, the effects of Gauss elimination are interpreted as a ("coin-flipping") variant of Robin Hood hashing, whose behaviour can be captured in terms of a simple Markov model from queuing theory. Bounds for expected construction time and high success probability follow from results in this area. They readily extend to larger finite fields in place of F_2. By employing hashing, this matrix family leads to a new implementation of a retrieval data structure, which represents an arbitrary function f: S -> {0,1} for some set S of m = (1-epsilon)n keys. It requires m/(1-epsilon) bits of space, construction takes O(m/epsilon^2) expected time on a word RAM, while queries take O(1/epsilon) time and access only one contiguous segment of O((log m)/epsilon) bits in the representation (O(1/epsilon) consecutive words on a word RAM). The method is readily implemented and highly practical, and it is competitive with state-of-the-art methods. In a more theoretical variant, which works only for unrealistically large S, we can even achieve construction time O(m/epsilon) and query time O(1), accessing O(1) contiguous memory words for a query. 
By well-established methods the retrieval data structure leads to efficient constructions of (static) perfect hash functions and (static) Bloom filters with almost optimal space and very local storage access patterns for queries.
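
The matrix family and the elimination step described above can be sketched as follows. This is a simplified illustration, not the paper's algorithm or its analysis: rows are stored as Python integers (bit j stands for column j), and the names `random_block_rows` and `echelon_rank` are hypothetical.

```python
import random

def random_block_rows(num_rows, num_cols, block_len, rng):
    """Each row has a block of block_len fair coin tosses at a random
    position and zeros everywhere else."""
    rows = []
    for _ in range(num_rows):
        start = rng.randrange(num_cols - block_len + 1)
        bits = rng.getrandbits(block_len)
        rows.append((start, bits << start))
    return rows

def echelon_rank(rows):
    """Gauss elimination over F_2 on rows sorted by block start.
    Returns the rank, i.e. the number of pivot columns found."""
    pivots = {}  # pivot column -> reduced row whose lowest set bit is that column
    for _, row in sorted(rows):
        while row:
            col = (row & -row).bit_length() - 1  # lowest set column of the row
            if col not in pivots:
                pivots[col] = row
                break
            row ^= pivots[col]  # clears column col, may set higher columns only
    return len(pivots)
```

A matrix generated this way has full row rank exactly when `echelon_rank(rows) == len(rows)`. The sketch does not reproduce the running-time guarantees stated in the abstract.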

Cite as

Martin Dietzfelbinger and Stefan Walzer. Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 39:1-39:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{dietzfelbinger_et_al:LIPIcs.ESA.2019.39,
  author =	{Dietzfelbinger, Martin and Walzer, Stefan},
  title =	{{Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{39:1--39:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.39},
  URN =		{urn:nbn:de:0030-drops-111602},
  doi =		{10.4230/LIPIcs.ESA.2019.39},
  annote =	{Keywords: Random Band Matrix, Gauss Elimination, Retrieval, Hashing, Succinct Data Structure, Randomised Data Structure, Robin Hood Hashing, Bloom Filter}
}

Document
Constant-Time Retrieval with O(log m) Extra Bits

Authors: Martin Dietzfelbinger and Stefan Walzer

Published in: LIPIcs, Volume 126, 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019)


Abstract
For a set U (the universe), retrieval is the following problem. Given a finite subset S ⊆ U of size m and f : S -> {0,1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f,x) = f(x) for all x in S. For x in U ∖ S the value query(D_f,x) is arbitrary in {0,1}^r. The number of bits needed for D_f should be (1+epsilon)r m with overhead epsilon = epsilon(m) >= 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well. We assume fully random hash functions on U with constant evaluation time are available. It is known that with epsilon ≈ 0.09 one can achieve linear construction time and constant query time, and with overhead epsilon_k ≈ e^{-k} it is possible to have O(k) query time and O(m^{1+alpha}) construction time, for arbitrary alpha > 0. Furthermore, a theoretical construction with epsilon = O((log log m)/sqrt(log m)) gives constant query time and linear construction time. Known constructions avoiding all overhead, except for a seed value of size O(log log m), require logarithmic query time. In this paper, we present a method for treating the retrieval problem with overhead epsilon = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation. The price to pay is a construction time of O(m^2). We employ the usual framework for retrieval data structures, where construction is effected by solving a sparse linear system of equations over the 2-element field F_2 and a query is effected by a dot product calculation. Our main technical contribution is the design and analysis of a new and natural family of sparse random linear systems with m equations and (1+epsilon)m variables, which combines good locality properties with high probability of having full rank.
Paying a larger overhead of epsilon = O((log m)/m^alpha), the construction time can be reduced to O(m^{1+alpha}) for arbitrary constant 0 < alpha < 1. In combination with an adaptation of known techniques for solving sparse linear systems of equations, our approach leads to a highly practical algorithm for retrieval. In a particular benchmark with m = 10^7 we achieve an order-of-magnitude improvement over previous techniques with epsilon = 0.24% instead of the previously best result of epsilon ≈ 3%, with better query time and no significant sacrifices in construction time.
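
The general framework mentioned in the abstract (construction by solving a linear system over F_2, query by a dot product) can be sketched for r = 1 as below. This is only an illustration of the framework under stated assumptions: the rows here have three hashed positions, which is not the paper's matrix family, and `row_for`, `build` and `query` are hypothetical names.

```python
import hashlib

def row_for(key, num_vars, seed, ones=3):
    """Hypothetical hash: map a key to a bitmask with at most `ones` set bits."""
    mask = 0
    for i in range(ones):
        digest = hashlib.blake2b(f"{seed}:{i}:{key}".encode(), digest_size=8).digest()
        mask |= 1 << (int.from_bytes(digest, "big") % num_vars)
    return mask

def build(f, num_vars, max_seeds=100):
    """Find (seed, num_vars, z) such that query(...) == f[x] for every key x."""
    for seed in range(max_seeds):
        pivots, consistent = {}, True  # pivot column -> (row, right-hand side)
        for key, value in f.items():
            row, rhs = row_for(key, num_vars, seed), value
            while row:
                col = (row & -row).bit_length() - 1
                if col not in pivots:
                    pivots[col] = (row, rhs)
                    break
                prow, prhs = pivots[col]
                row, rhs = row ^ prow, rhs ^ prhs
            if row == 0 and rhs == 1:
                consistent = False  # equation 0 = 1, retry with the next seed
                break
        if not consistent:
            continue
        z = 0
        for col in sorted(pivots, reverse=True):  # back substitution
            row, rhs = pivots[col]
            if (bin(row & z).count("1") % 2) != rhs:
                z |= 1 << col
        return seed, num_vars, z
    raise RuntimeError("no solvable system found")

def query(structure, key):
    """Dot product over F_2 of the key's row with the stored vector z."""
    seed, num_vars, z = structure
    return bin(row_for(key, num_vars, seed) & z).count("1") % 2
```

With m keys and `num_vars` = (1+epsilon)m variables, the stored vector `z` takes (1+epsilon)m bits; keys outside the set return an arbitrary bit, as in the problem definition.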

Cite as

Martin Dietzfelbinger and Stefan Walzer. Constant-Time Retrieval with O(log m) Extra Bits. In 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 126, pp. 24:1-24:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{dietzfelbinger_et_al:LIPIcs.STACS.2019.24,
  author =	{Dietzfelbinger, Martin and Walzer, Stefan},
  title =	{{Constant-Time Retrieval with O(log m) Extra Bits}},
  booktitle =	{36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019)},
  pages =	{24:1--24:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-100-9},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{126},
  editor =	{Niedermeier, Rolf and Paul, Christophe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2019.24},
  URN =		{urn:nbn:de:0030-drops-102639},
  doi =		{10.4230/LIPIcs.STACS.2019.24},
  annote =	{Keywords: Retrieval, Hashing, Succinct Data Structure, Randomised Data Structure, Structured Gaussian Elimination, Method of Four Russians}
}

Document
A Subquadratic Algorithm for 3XOR

Authors: Martin Dietzfelbinger, Philipp Schlag, and Stefan Walzer

Published in: LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)


Abstract
Given a set X of n binary words of equal length w, the 3XOR problem asks for three elements a, b, c in X such that a ⊕ b = c, where ⊕ denotes the bitwise XOR operation. The problem can be easily solved on a word RAM with word length w in time O(n^2 log n). Using Han's fast integer sorting algorithm (STOC/J. Algorithms, 2002/2004) this can be reduced to O(n^2 log log n). With randomization or a sophisticated deterministic dictionary construction, creating a hash table for X with constant lookup time leads to an algorithm with (expected) running time O(n^2). At present, seemingly no faster algorithms are known. We present a surprisingly simple deterministic, quadratic time algorithm for 3XOR. Its core is a version of the PATRICIA tree for X, which makes it possible to traverse the set a ⊕ X in ascending order for arbitrary a in {0, 1}^{w} in linear time. Furthermore, we describe a randomized algorithm for 3XOR with expected running time O(n^2 * min{log^3(w)/w, (log log n)^2/log^2 n}). The algorithm transfers techniques to our setting that were used by Baran, Demaine, and Patrascu (WADS/Algorithmica, 2005/2008) for solving the related int3SUM problem (the same problem with integer addition in place of binary XOR) in expected time o(n^2). As suggested by Jafargholi and Viola (Algorithmica, 2016), linear hash functions are employed. The latter authors also showed that assuming 3XOR needs expected running time n^(2-o(1)) one can prove conditional lower bounds for triangle enumeration just as with 3SUM. We demonstrate that 3XOR can be reduced to other problems as well, treating the examples offline SetDisjointness and offline SetIntersection, which were studied for 3SUM by Kopelowitz, Pettie, and Porat (SODA, 2016).
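
The hash-table baseline with expected running time O(n^2) mentioned in the abstract can be sketched as below. This is neither the paper's PATRICIA-tree algorithm nor its subquadratic one. The function name `three_xor` is hypothetical, and requiring three distinct elements is an assumption added here to rule out trivial solutions such as a ⊕ 0 = a.

```python
from itertools import combinations

def three_xor(words):
    """Return (a, b, c) with a ^ b == c for three distinct elements of `words`,
    or None if no such triple exists. Words are non-negative integers."""
    present = set(words)  # hash table with expected constant-time lookup
    for a, b in combinations(sorted(present), 2):
        c = a ^ b
        if c in present and c != a and c != b:
            return a, b, c
    return None
```

For instance, `three_xor([1, 2, 3])` finds the triple (1, 2, 3), while `three_xor([1, 2, 4, 8])` returns `None` because the XOR of two distinct powers of two is never a power of two.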

Cite as

Martin Dietzfelbinger, Philipp Schlag, and Stefan Walzer. A Subquadratic Algorithm for 3XOR. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 59:1-59:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


@InProceedings{dietzfelbinger_et_al:LIPIcs.MFCS.2018.59,
  author =	{Dietzfelbinger, Martin and Schlag, Philipp and Walzer, Stefan},
  title =	{{A Subquadratic Algorithm for 3XOR}},
  booktitle =	{43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)},
  pages =	{59:1--59:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-086-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{117},
  editor =	{Potapov, Igor and Spirakis, Paul and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.59},
  URN =		{urn:nbn:de:0030-drops-96417},
  doi =		{10.4230/LIPIcs.MFCS.2018.59},
  annote =	{Keywords: 3SUM, 3XOR, Randomized Algorithms, Reductions, Conditional Lower Time Bounds}
}