Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

We present a dynamic algorithm for maintaining the connected and 2-edge-connected components in an undirected graph subject to edge deletions. The algorithm is Monte-Carlo randomized and processes any sequence of edge deletions in O(m + n poly log n) total time. Interspersed with the deletions, it can answer queries whether any two given vertices currently belong to the same (2-edge-)connected component in constant time. Our result is based on a general Monte-Carlo randomized reduction from decremental c-edge-connectivity to a variant of fully-dynamic c-edge-connectivity on a sparse graph.
For non-sparse graphs with Ω(n poly log n) edges, our connectivity and 2-edge-connectivity algorithms handle all deletions in optimal linear total time, using existing algorithms for the respective fully-dynamic problems. This improves upon an O(m log (n² / m) + n poly log n)-time algorithm of Thorup [J.Alg. 1999], which runs in linear time only for graphs with Ω(n²) edges.
Our constant amortized cost for edge deletions in decremental connectivity in non-sparse graphs should be contrasted with an Ω(log n/log log n) worst-case time lower bound in the decremental setting [Alstrup, Husfeldt, and Rauhe FOCS'98] as well as an Ω(log n) amortized time lower-bound in the fully-dynamic setting [Patrascu and Demaine STOC'04].

Anders Aamand, Adam Karczmarz, Jakub Łącki, Nikos Parotsidis, Peter M. R. Rasmussen, and Mikkel Thorup. Optimal Decremental Connectivity in Non-Sparse Graphs. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 6:1-6:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{aamand_et_al:LIPIcs.ICALP.2023.6, author = {Aamand, Anders and Karczmarz, Adam and {\L}\k{a}cki, Jakub and Parotsidis, Nikos and Rasmussen, Peter M. R. and Thorup, Mikkel}, title = {{Optimal Decremental Connectivity in Non-Sparse Graphs}}, booktitle = {50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)}, pages = {6:1--6:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-278-5}, ISSN = {1868-8969}, year = {2023}, volume = {261}, editor = {Etessami, Kousha and Feige, Uriel and Puppis, Gabriele}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.6}, URN = {urn:nbn:de:0030-drops-180581}, doi = {10.4230/LIPIcs.ICALP.2023.6}, annote = {Keywords: decremental connectivity, dynamic connectivity} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

The Sparse Johnson-Lindenstrauss Transform of Kane and Nelson (SODA 2012) provides a linear dimensionality-reducing map A ∈ ℝ^{m × u} in 𝓁₂ that preserves distances up to distortion of 1 + ε with probability 1 - δ, where m = O(ε^{-2} log 1/δ) and each column of A has O(ε m) non-zero entries. The previous analyses of the Sparse Johnson-Lindenstrauss Transform all assumed access to a Ω(log 1/δ)-wise independent hash function. The main contribution of this paper is a more general analysis of the Sparse Johnson-Lindenstrauss Transform with less assumptions on the hash function. We also show that the Mixed Tabulation hash function of Dahlgaard, Knudsen, Rotenberg, and Thorup (FOCS 2015) satisfies the conditions of our analysis, thus giving us the first analysis of a Sparse Johnson-Lindenstrauss Transform that works with a practical hash function.

Jakob Bæk Tejs Houen and Mikkel Thorup. A Sparse Johnson-Lindenstrauss Transform Using Fast Hashing. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 76:1-76:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{houen_et_al:LIPIcs.ICALP.2023.76, author = {Houen, Jakob B{\ae}k Tejs and Thorup, Mikkel}, title = {{A Sparse Johnson-Lindenstrauss Transform Using Fast Hashing}}, booktitle = {50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)}, pages = {76:1--76:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-278-5}, ISSN = {1868-8969}, year = {2023}, volume = {261}, editor = {Etessami, Kousha and Feige, Uriel and Puppis, Gabriele}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.76}, URN = {urn:nbn:de:0030-drops-181281}, doi = {10.4230/LIPIcs.ICALP.2023.76}, annote = {Keywords: dimensionality reduction, hashing, concentration bounds, moment bounds} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as c characters from some alphabet Σ, we have c fully random hash functions h₀, …, h_{c - 1} : Σ → {{0, …, 2^l - 1}}, and a key x = (x₀, …, x_{c - 1}) is hashed to h(x) = h₀(x₀) ⊕ … ⊕ h_{c - 1}(x_{c - 1}) where ⊕ is the bitwise XOR operation. The previous results on tabulation hashing by Pǎtraşcu and Thorup [J.ACM'11] and by Aamand et al. [STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Thus their results cannot bound moments. The paper Dahlgaard et al. [FOCS'15] provides a bound on the moments of certain hash-based sums, but their bound only holds for constant moments, and we need logarithmic moments.
Chaoses are random variables of the form ∑ a_{i₀, …, i_{c - 1}} X_{i₀} ⋅ … ⋅ X_{i_{c - 1}} where X_i are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analysis has been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when the independent random variables have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing.
In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on c. In contrast with the previous attempts, our approach will mostly be analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al. [FOCS'15]. With simple tabulation hashing, there are certain inputs for which the concentration is much worse than with fully random hashing. However, with mixed tabulation, we get logarithmic moment bounds that are only a constant factor worse than those with fully random hashing for any possible input. This is a strong addition to other powerful probabilistic properties of mixed tabulation hashing proved by Dahlgaard et al.

Jakob Bæk Tejs Houen and Mikkel Thorup. Understanding the Moments of Tabulation Hashing via Chaoses. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 74:1-74:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{houen_et_al:LIPIcs.ICALP.2022.74, author = {Houen, Jakob B{\ae}k Tejs and Thorup, Mikkel}, title = {{Understanding the Moments of Tabulation Hashing via Chaoses}}, booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)}, pages = {74:1--74:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-235-8}, ISSN = {1868-8969}, year = {2022}, volume = {229}, editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.74}, URN = {urn:nbn:de:0030-drops-164154}, doi = {10.4230/LIPIcs.ICALP.2022.74}, annote = {Keywords: hashing, concentration bounds, moment bounds} }

Document

Invited Paper

**Published in:** LIPIcs, Volume 227, 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)

We consider the numerical taxonomy problem of fitting an S× S distance matrix D with a tree metric T. Here T is a weighted tree spanning S where the path lengths in T induce a metric on S. If there is a tree metric matching D exactly, then it is easily found. If there is no exact match, then for some k, we want to minimize the L_k norm of the errors, that is, pick T so as to minimize
‖D-T‖_k = (∑_{i,j ∈ S} |D(i,j)-T(i,j)|^k) ^{1/k}.
This problem was raised in biology in the 1960s for k = 1,2. The biological interpretation is that T represents a possible evolution behind the species in S matching some measured distances in D. Sometimes, it is required that T is an ultrametric, meaning that all species are at the same distance from the root.
An evolutionary tree induces a hierarchical classification of species and this is not just tied to biology. Medicine, ecology and linguistics are just some of the fields where this concept appears, and it is even an integral part of machine learning and data science. Fundamentally, if we can approximate distances with a tree, then they are much easier to reason about: many questions that are NP-hard for general metrics can be answered in linear time on tree metrics. In fact, humans have appreciated hierarchical classifications at least since Plato and Aristotle (350 BC).
The numerical taxonomy problem is important in practice and many heuristics have been proposed. In this talk we will review the basic algorithmic theory, results and techniques, for the problem, including the most recent result from FOCS'21 [Vincent Cohen-Addad et al., 2021]. They paint a varied landscape with big differences between different moments, and with some very nice open problems remaining.
- At STOC'93, Farach, Kannan, and Warnow [Martin Farach et al., 1995] proved that under L_∞, we can find the optimal ultrametric. Almost all other variants of the problem are APX-hard.
- At SODA'96, Agarwala, Bafna, Farach, Paterson, and Thorup [Richa Agarwala et al., 1999] showed that for any norm L_k, k ≥ 1, if the best ultrametric can be α-approximated, then the best tree metric can be 3α-approximated. In particular, this implied a 3-approximation for tree metrics under L_∞.
- At FOCS'05, Ailon and Charikar [Nir Ailon and Moses Charikar, 2011] showed that for any L_k, k ≥ 1, we can get an approximation factor of O(((log n)(log log n))^{1/k}) for both tree and ultrametrics. Their paper was focused on the L₁ norm, and they wrote "Determining whether an O(1) approximation can be obtained is a fascinating question".
- At FOCS'21, Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup [Vincent Cohen-Addad et al., 2021] showed that indeed a constant factor is possible for L₁ for both tree and ultrametrics. This uses the special structure of L₁ in relation to hierarchies.
- The status of L_k is wide open for 1 < k < ∞. All we know is that the approximation factor is between Ω(1) and O((log n)(log log n)).

Mikkel Thorup. Reconstructing the Tree of Life (Fitting Distances by Tree Metrics) (Invited Paper). In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 227, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{thorup:LIPIcs.SWAT.2022.3, author = {Thorup, Mikkel}, title = {{Reconstructing the Tree of Life (Fitting Distances by Tree Metrics)}}, booktitle = {18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)}, pages = {3:1--3:2}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-236-5}, ISSN = {1868-8969}, year = {2022}, volume = {227}, editor = {Czumaj, Artur and Xin, Qin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2022.3}, URN = {urn:nbn:de:0030-drops-161631}, doi = {10.4230/LIPIcs.SWAT.2022.3}, annote = {Keywords: Numerical taxonomy, computational phylogenetics, hierarchical clustering} }

Document

**Published in:** LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A x B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a cap b|/|a cup b| for (a,b) in A x B.
We consider the approximate version of the problem where we are given thresholds j_1 > j_2 and wish to return a pair from A x B that has Jaccard similarity higher than j_2 if there exists a pair in A x B with Jaccard similarity at least j_1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n^(2-delta)) time if j_1 >= j_2^(1-delta). In particular, for delta=Omega(1), the approximation ratio j_1/j_2 = 1/j_2^delta increases polynomially in 1/j_2.
In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n^(2-Omega(1))) time for j_1/j_2 = 1/j_2^o(1). Specifically, assuming OVC, we prove that for any delta>0 there exists an epsilon>0 such that Bichromatic Closest Pair with Jaccard similarity requires time Omega(n^(2-delta)) for any choice of thresholds j_2 < j_1 < 1-delta, that satisfy j_1 <= j_2^(1-epsilon).

Rasmus Pagh, Nina Mesing Stausholm, and Mikkel Thorup. Hardness of Bichromatic Closest Pair with Jaccard Similarity. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 74:1-74:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{pagh_et_al:LIPIcs.ESA.2019.74, author = {Pagh, Rasmus and Stausholm, Nina Mesing and Thorup, Mikkel}, title = {{Hardness of Bichromatic Closest Pair with Jaccard Similarity}}, booktitle = {27th Annual European Symposium on Algorithms (ESA 2019)}, pages = {74:1--74:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-124-5}, ISSN = {1868-8969}, year = {2019}, volume = {144}, editor = {Bender, Michael A. and Svensson, Ola and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.74}, URN = {urn:nbn:de:0030-drops-111951}, doi = {10.4230/LIPIcs.ESA.2019.74}, annote = {Keywords: fine-grained complexity, set similarity search, bichromatic closest pair, jaccard similarity} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

We consider word RAM data structures for maintaining ordered sets of integers whose select and rank operations are allowed to return approximate results, i.e., ranks, or items whose rank, differ by less than Delta from the exact answer, where Delta=Delta(n) is an error parameter. Related to approximate select and rank is approximate (one-dimensional) nearest-neighbor. A special case of approximate select queries are approximate min queries. Data structures that support approximate min operations are known as approximate heaps (priority queues). Related to approximate heaps are soft heaps, which are approximate heaps with a different notion of approximation.
We prove the optimality of all the data structures presented, either through matching cell-probe lower bounds, or through equivalences to well studied static problems. For approximate select, rank, and nearest-neighbor operations we get matching cell-probe lower bounds. We prove an equivalence between approximate min operations, i.e., approximate heaps, and the static partitioning problem. Finally, we prove an equivalence between soft heaps and the classical sorting problem, on a smaller number of items.
Our results have many interesting and unexpected consequences. It turns out that approximation greatly speeds up some of these operations, while others are almost unaffected. In particular, while select and rank have identical operation times, both in comparison-based and word RAM implementations, an interesting separation emerges between the approximate versions of these operations in the word RAM model. Approximate select is much faster than approximate rank. It also turns out that approximate min is exponentially faster than the more general approximate select. Next, we show that implementing soft heaps is harder than implementing approximate heaps. The relation between them corresponds to the relation between sorting and partitioning.
Finally, as an interesting byproduct, we observe that a combination of known techniques yields a deterministic word RAM algorithm for (exactly) sorting n items in O(n log log_w n) time, where w is the word length. Even for the easier problem of finding duplicates, the best previous deterministic bound was O(min{n log log n,n log_w n}). Our new unifying bound is an improvement when w is sufficiently large compared with n.

Mikkel Thorup, Or Zamir, and Uri Zwick. Dynamic Ordered Sets with Approximate Queries, Approximate Heaps and Soft Heaps. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 95:1-95:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{thorup_et_al:LIPIcs.ICALP.2019.95, author = {Thorup, Mikkel and Zamir, Or and Zwick, Uri}, title = {{Dynamic Ordered Sets with Approximate Queries, Approximate Heaps and Soft Heaps}}, booktitle = {46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)}, pages = {95:1--95:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-109-2}, ISSN = {1868-8969}, year = {2019}, volume = {132}, editor = {Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.95}, URN = {urn:nbn:de:0030-drops-106712}, doi = {10.4230/LIPIcs.ICALP.2019.95}, annote = {Keywords: Order queries, word RAM, lower bounds} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

We consider the classic d-choice paradigm of Azar et al. [STOC'94] in which m balls are put into n bins sequentially as follows: For each ball we are given a choice of d bins chosen according to d hash functions and the ball is placed in the least loaded of these bins, breaking ties arbitrarily. The interest is in the number of balls in the fullest bin after all balls have been placed.
In this paper we suppose that the d hash functions are simple tabulation hash functions which are easy to implement and can be evaluated in constant time. Generalising a result by Dahlgaard et al. [SODA'16] we show that for an arbitrary constant d >= 2 the expected maximum load is at most (lg lg n)/(lg d) + O(1). We further show that by using a simple tie-breaking algorithm introduced by Vöcking [J.ACM'03] the expected maximum load is reduced to (lg lg n)/(d lg phi_d) + O(1) where phi_d is the rate of growth of the d-ary Fibonacci numbers. Both of these expected bounds match those known from the fully random setting.
The analysis by Dahlgaard et al. relies on a proof by Patrascu and Thorup [J.ACM'11] concerning the use of simple tabulation for cuckoo hashing. We require a generalisation to d>2 hash functions, but the original proof is an 8-page tour de force of ad-hoc arguments that do not appear to generalise. Our main technical contribution is a shorter, simpler and more accessible proof of the result by Patrascu and Thorup, where the relevant parts generalise nicely to the analysis of d choices.

Anders Aamand, Mathias Bæk Tejs Knudsen, and Mikkel Thorup. Power of d Choices with Simple Tabulation. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{aamand_et_al:LIPIcs.ICALP.2018.5, author = {Aamand, Anders and B{\ae}k Tejs Knudsen, Mathias and Thorup, Mikkel}, title = {{Power of d Choices with Simple Tabulation}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {5:1--5:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.5}, URN = {urn:nbn:de:0030-drops-90096}, doi = {10.4230/LIPIcs.ICALP.2018.5}, annote = {Keywords: Hashing, Load Balancing, Balls and Bins, Simple Tabulation} }

Document

Invited Talk

**Published in:** LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of c characters and we have precomputed character tables h_1,...,h_q mapping characters to random hash values. A key x=(x_1,...,x_c) is hashed to h_1[x_1] xor h_2[x_2]..... xor h_c[x_c]. This schemes is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing.
Next we consider twisted tabulation where one character is "twisted" with some simple operations. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing.
Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing.
While these tabulation schemes are all easy to implement and use, their analysis is not.
This keynote talk surveys results from the papers in the reference list.

Mikkel Thorup. Fast and Powerful Hashing Using Tabulation (Invited Talk). In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 4:1-4:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{thorup:LIPIcs.ICALP.2017.4, author = {Thorup, Mikkel}, title = {{Fast and Powerful Hashing Using Tabulation}}, booktitle = {44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)}, pages = {4:1--4:2}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-041-5}, ISSN = {1868-8969}, year = {2017}, volume = {80}, editor = {Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.4}, URN = {urn:nbn:de:0030-drops-75074}, doi = {10.4230/LIPIcs.ICALP.2017.4}, annote = {Keywords: Hashing, Randomized Algorithms} }

Document

Invited Talk

**Published in:** LIPIcs, Volume 65, 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of c characters and we have precomputed character tables h_1, ... , h_q mapping characters to random hash values. A key x = (x_1 , ..., x_c) is hashed to h_1[x_1] + h_2[x_2] ... + h_c[x_c]. This scheme is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does
provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing.
Next we consider twisted tabulation where one character is "twisted" with some simple operations. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing.
Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size
proportional to that of the space consumed, double tabulation gives fully-random hashing.
While these tabulation schemes are all easy to implement and use, their analysis is not.

Mikkel Thorup. Fast and Powerful Hashing Using Tabulation (Invited Talk). In 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 65, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{thorup:LIPIcs.FSTTCS.2016.1, author = {Thorup, Mikkel}, title = {{Fast and Powerful Hashing Using Tabulation}}, booktitle = {36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)}, pages = {1:1--1:2}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-027-9}, ISSN = {1868-8969}, year = {2016}, volume = {65}, editor = {Lal, Akash and Akshay, S. and Saurabh, Saket and Sen, Sandeep}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2016.1}, URN = {urn:nbn:de:0030-drops-68869}, doi = {10.4230/LIPIcs.FSTTCS.2016.1}, annote = {Keywords: Hashing, Randomized Algorithms} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

We present a deterministic incremental algorithm for exactly maintaining the size of a minimum cut with ~O(1) amortized time per edge insertion and O(1) query time. This result partially answers an open question posed by Thorup [Combinatorica 2007]. It also stays in sharp contrast to a polynomial conditional lower-bound for the fully-dynamic weighted minimum cut problem. Our algorithm is obtained by combining a recent sparsification technique of Kawarabayashi and Thorup [STOC 2015] and an exact incremental algorithm of Henzinger [J. of Algorithm 1997].
We also study space-efficient incremental algorithms for the minimum cut problem. Concretely, we show that there exists an O(n log n/epsilon^2) space Monte-Carlo algorithm that can process a stream of edge insertions starting from an empty graph, and with high probability, the algorithm maintains a (1+epsilon)-approximation to the minimum cut. The algorithm has ~O(1) amortized update-time and constant query-time.

Gramoz Goranci, Monika Henzinger, and Mikkel Thorup. Incremental Exact Min-Cut in Poly-logarithmic Amortized Update Time. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 46:1-46:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{goranci_et_al:LIPIcs.ESA.2016.46, author = {Goranci, Gramoz and Henzinger, Monika and Thorup, Mikkel}, title = {{Incremental Exact Min-Cut in Poly-logarithmic Amortized Update Time}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {46:1--46:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.46}, URN = {urn:nbn:de:0030-drops-63584}, doi = {10.4230/LIPIcs.ESA.2016.46}, annote = {Keywords: Dynamic Graph Algorithms, Minimum Cut, Edge Connectivity} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

We present a deterministic dynamic connectivity data structure for undirected graphs with worst case update time O(sqrt{(n(log(log(n)))^2)/log(n)}) and constant query time. This improves on the previous best deterministic worst case algorithm of Frederickson (SIAM J. Comput., 1985) and Eppstein Galil, Italiano, and Nissenzweig (J. ACM, 1997), which had update time O(sqrt{n}). All other algorithms for dynamic connectivity are either randomized (Monte Carlo) or have only amortized performance guarantees.

Casper Kejlberg-Rasmussen, Tsvi Kopelowitz, Seth Pettie, and Mikkel Thorup. Faster Worst Case Deterministic Dynamic Connectivity. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 53:1-53:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{kejlbergrasmussen_et_al:LIPIcs.ESA.2016.53, author = {Kejlberg-Rasmussen, Casper and Kopelowitz, Tsvi and Pettie, Seth and Thorup, Mikkel}, title = {{Faster Worst Case Deterministic Dynamic Connectivity}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {53:1--53:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.53}, URN = {urn:nbn:de:0030-drops-63953}, doi = {10.4230/LIPIcs.ESA.2016.53}, annote = {Keywords: dynamic graph, spanning tree} }

Document

**Published in:** LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)

We describe an algorithm for solving an important geometric problem arising in computer-aided manufacturing. When machining a pocket in a solid piece of material such as steel using a rough tool in a milling machine, sharp convex corners of the pocket cannot be done properly, but have to be left for finer tools that are more expensive to use. We want to determine a tool path that maximizes the use of the rough tool. Mathematically, this boils down to the following problem. Given a simply-connected set of points P in the plane such that the boundary of P is a curvilinear polygon consisting of n line segments and circular arcs of arbitrary radii, compute the maximum subset Q of P consisting of simply-connected sets where the boundary of each set is a curve with bounded convex curvature. A closed curve has bounded convex curvature if, when traversed in counterclockwise direction, it turns to the left with curvature at most 1. There is no bound on the curvature where it turns to the right. The difference in the requirement to left- and right-curvature is a natural consequence of different conditions when machining convex and concave areas of the pocket. We devise an algorithm to compute the unique maximum such set Q. The algorithm runs in O(n log n) time and uses O(n) space.
For the correctness of our algorithm, we prove a new generalization of the Pestov-Ionin Theorem. This is needed to show that the output Q of our algorithm is indeed maximum in the sense that if Q' is any subset of P with a boundary of bounded convex curvature, then Q' is a subset of Q.

Mikkel Abrahamsen and Mikkel Thorup. Finding the Maximum Subset with Bounded Convex Curvature. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{abrahamsen_et_al:LIPIcs.SoCG.2016.4, author = {Abrahamsen, Mikkel and Thorup, Mikkel}, title = {{Finding the Maximum Subset with Bounded Convex Curvature}}, booktitle = {32nd International Symposium on Computational Geometry (SoCG 2016)}, pages = {4:1--4:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-009-5}, ISSN = {1868-8969}, year = {2016}, volume = {51}, editor = {Fekete, S\'{a}ndor and Lubiw, Anna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.4}, URN = {urn:nbn:de:0030-drops-58960}, doi = {10.4230/LIPIcs.SoCG.2016.4}, annote = {Keywords: planar computational geometry, bounded curvature, pocket machining} }

Document

**Published in:** LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)

Gabow and Tarjan showed that the Bottleneck Path (BP) problem, i.e., finding a path between a given source and a given target in a weighted directed graph whose largest edge weight is minimized, as well as the Bottleneck spanning tree (BST) problem, i.e., finding a directed spanning tree rooted at a given vertex whose largest edge weight is minimized, can both be solved deterministically in O(m * log^*(n)) time, where m is the number of edges and n is the number of vertices in the graph. We present a slightly improved randomized algorithm for these problems with an expected running time of O(m * beta(m,n)), where beta(m,n) = min{k >= 1 | log^{(k)}n <= m/n } <= log^*(n) - log^*(m/n)+1. This is the first improvement for these problems in over 25 years. In particular, if m >= n * log^{(k)} * n, for some constant k, the expected running time of the new algorithm is O(m). Our algorithm, as that of Gabow and Tarjan, work in the comparison model. We also observe that in the word-RAM model, both problems can be solved deterministically in O(m) time. Finally, we solve an open problem of Andersson et al., giving a deterministic O(m)-time comparison-based algorithm for solving deterministic 2-player turn-based zero-sum terminal payoff games, also known as Deterministic Graphical Games (DGG).

Shiri Chechik, Haim Kaplan, Mikkel Thorup, Or Zamir, and Uri Zwick. Bottleneck Paths and Trees and Deterministic Graphical Games. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 27:1-27:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{chechik_et_al:LIPIcs.STACS.2016.27, author = {Chechik, Shiri and Kaplan, Haim and Thorup, Mikkel and Zamir, Or and Zwick, Uri}, title = {{Bottleneck Paths and Trees and Deterministic Graphical Games}}, booktitle = {33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)}, pages = {27:1--27:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-001-9}, ISSN = {1868-8969}, year = {2016}, volume = {47}, editor = {Ollinger, Nicolas and Vollmer, Heribert}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.27}, URN = {urn:nbn:de:0030-drops-57283}, doi = {10.4230/LIPIcs.STACS.2016.27}, annote = {Keywords: bottleneck paths, comparison model, deterministic graphical games} }

Document

**Published in:** LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

Recognizing 3-colorable graphs is one of the most famous NP-complete problems [Garey, Johnson, and Stockmeyer, STOC'74]. The problem of coloring 3-colorable graphs in polynomial time with as few colors as possible has been intensively studied: O(n^{1/2}) colors [Wigderson, STOC'82], O(n^{2/5}) colors [Blum, STOC'89], O(n^{3/8}) colors [Blum, FOCS'90], O(n^{1/4}) colors [Karger, Motwani and Sudan, FOCS'94], O(n^{3/14})=O(n^0.2142) colors [Blum and Karger, IPL'97], O(n^{0.2111}) colors [Arora, Chlamtac, and Charikar, STOC'06], and O(n^{0.2072}) colors [Chlamtac, FOCS'07]. Recently the authors got down to O(n^{0.2049}) colors [FOCS'12]. In this paper we get down to O(n^{0.19996})=o(n^{1/5}) colors.
Since 1994, the best bounds have all been obtained balancing between combinatorial and semi-definite approaches. We present a new combinatorial recursion that only makes sense in collaboration with semi-definite programming. We specifically target the worst-case for semi-definite programming: high degrees. By focusing on the interplay, we obtained the biggest improvement in the exponent since 1997.

Ken-ichi Kawarabayashi and Mikkel Thorup. Coloring 3-colorable graphs with o(n^{1/5}) colors. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 458-469, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{kawarabayashi_et_al:LIPIcs.STACS.2014.458, author = {Kawarabayashi, Ken-ichi and Thorup, Mikkel}, title = {{Coloring 3-colorable graphs with o(n^\{1/5\}) colors}}, booktitle = {31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)}, pages = {458--469}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-65-1}, ISSN = {1868-8969}, year = {2014}, volume = {25}, editor = {Mayr, Ernst W. and Portier, Natacha}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.458}, URN = {urn:nbn:de:0030-drops-44797}, doi = {10.4230/LIPIcs.STACS.2014.458}, annote = {Keywords: Approximation Algorithms, Graph Coloring} }