DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2024.69

PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding

Authors: Stefan Hermann, Hans-Peter Lehmann, Giulio Ermanno Pibiri, Peter Sanders, and Stefan Walzer

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

A minimal perfect hash function (or MPHF) maps a set of n keys to [n] : = {1, …, n} without collisions. Such functions find widespread application e.g. in bioinformatics and databases. In this paper we revisit PTHash - a construction technique particularly designed for fast queries. PTHash distributes the input keys into small buckets and, for each bucket, it searches for a hash function seed that places its keys in the output domain without collisions. The collection of all seeds is then stored in a compressed way. Since the first buckets are easier to place, buckets are considered in non-increasing order of size. Additionally, PTHash heuristically produces an imbalanced distribution of bucket sizes by distributing 60% of the keys into 30% of the buckets. Our main contribution is to characterize, up to lower order terms, an optimal choice for the expected bucket sizes, improving construction throughput for space efficient configurations both in theory and practice. Further contributions include a new encoding scheme for seeds that works across partitions of the data structure and a GPU parallelization. Compared to PTHash, PHOBIC is 0.17 bits/key more space efficient for same query time and construction throughput. For a configuration with fast queries, our GPU implementation can construct an MPHF at 2.17 bits/key in 28 ns/key, which can be queried in 37 ns/query on the CPU.

Cite as

Stefan Hermann, Hans-Peter Lehmann, Giulio Ermanno Pibiri, Peter Sanders, and Stefan Walzer. PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 69:1-69:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{hermann_et_al:LIPIcs.ESA.2024.69,
  author =	{Hermann, Stefan and Lehmann, Hans-Peter and Pibiri, Giulio Ermanno and Sanders, Peter and Walzer, Stefan},
  title =	{{PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{69:1--69:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.69},
  URN =		{urn:nbn:de:0030-drops-211405},
  doi =		{10.4230/LIPIcs.ESA.2024.69},
  annote =	{Keywords: Compressed Data Structures, Minimal Perfect Hashing, GPU}
}

Document

DOI: 10.4230/LIPIcs.ESA.2024.97

Engineering Edge Orientation Algorithms

Authors: Henrik Reinstädtler, Christian Schulz, and Bora Uçar

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

Given an undirected graph G, the edge orientation problem asks for assigning a direction to each edge to convert G into a directed graph. The aim is to minimize the maximum out-degree of a vertex in the resulting directed graph. This problem, which is solvable in polynomial time, arises in many applications. An ongoing challenge in edge orientation algorithms is their scalability, particularly in handling large-scale networks with millions or billions of edges efficiently. We propose a novel algorithmic framework based on finding and manipulating simple paths to face this challenge. Our framework is based on an existing algorithm and allows many algorithmic choices. By carefully exploring these choices and engineering the underlying algorithms, we obtain an implementation which is more efficient and scalable than the current state-of-the-art. Our experiments demonstrate significant performance improvements compared to state-of-the-art solvers. On average our algorithm is 6.59 times faster when compared to the state-of-the-art.

Cite as

Henrik Reinstädtler, Christian Schulz, and Bora Uçar. Engineering Edge Orientation Algorithms. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 97:1-97:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{reinstadtler_et_al:LIPIcs.ESA.2024.97,
  author =	{Reinst\"{a}dtler, Henrik and Schulz, Christian and U\c{c}ar, Bora},
  title =	{{Engineering Edge Orientation Algorithms}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{97:1--97:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.97},
  URN =		{urn:nbn:de:0030-drops-211682},
  doi =		{10.4230/LIPIcs.ESA.2024.97},
  annote =	{Keywords: edge orientation, pseudoarboricity, graph algorithms}
}

Document

DOI: 10.4230/LIPIcs.SEA.2024.2

Engineering Zuffix Arrays

Authors: Paolo Boldi, Stefano Marchini, and Sebastiano Vigna

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)

Abstract

Searching patterns in long strings is a classical algorithmic problem with countless practical applications. Suffix trees and suffix arrays (and their variants) are a long-established solution that yields linear-time search (in the size of the pattern). In [Paolo Boldi and Sebastiano Vigna, 2018] it is shown that a z-map gadget can be attached to (enhanced) suffix arrays to improve their theoretical query time, obtaining a data structure called zuffix array. The main contribution of this paper is to show that a carefully engineered implementation of the z-map gadget does provide significant speedups with respect to enhanced suffix arrays on real-world datasets, albeit doubling the required space. In particular, for large alphabets we observe a sevenfold improvement in query time with respect to enhanced suffix arrays; even in the worst case (small alphabets), the query time is almost halved. Thus, zuffix arrays provide a very interesting new point in the space-time tradeoff spectrum.

Cite as

Paolo Boldi, Stefano Marchini, and Sebastiano Vigna. Engineering Zuffix Arrays. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boldi_et_al:LIPIcs.SEA.2024.2,
  author =	{Boldi, Paolo and Marchini, Stefano and Vigna, Sebastiano},
  title =	{{Engineering Zuffix Arrays}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{2:1--2:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.2},
  URN =		{urn:nbn:de:0030-drops-203677},
  doi =		{10.4230/LIPIcs.SEA.2024.2},
  annote =	{Keywords: Suffix trees, suffix arrays, z-fast tries}
}

Document

DOI: 10.4230/LIPIcs.SEA.2024.5

Buffered Streaming Edge Partitioning

Authors: Adil Chhabra, Marcelo Fonseca Faraj, Christian Schulz, and Daniel Seemaier

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)

Abstract

Addressing the challenges of processing massive graphs, which are prevalent in diverse fields such as social, biological, and technical networks, we introduce HeiStreamE and FreightE, two innovative (buffered) streaming algorithms designed for efficient edge partitioning of large-scale graphs. HeiStreamE utilizes an adapted Split-and-Connect graph model and a Fennel-based multilevel partitioning scheme, while FreightE partitions a hypergraph representation of the input graph. Besides ensuring superior solution quality, these approaches also overcome the limitations of existing algorithms by maintaining linear dependency on the graph size in both time and memory complexity with no dependence on the number of blocks of partition. Our comprehensive experimental analysis demonstrates that HeiStreamE outperforms current streaming algorithms and the re-streaming algorithm 2PS in partitioning quality (replication factor), and is more memory-efficient for real-world networks where the number of edges is far greater than the number of vertices. Further, FreightE is shown to produce fast and efficient partitions, particularly for higher numbers of partition blocks.

Cite as

Adil Chhabra, Marcelo Fonseca Faraj, Christian Schulz, and Daniel Seemaier. Buffered Streaming Edge Partitioning. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 5:1-5:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{chhabra_et_al:LIPIcs.SEA.2024.5,
  author =	{Chhabra, Adil and Fonseca Faraj, Marcelo and Schulz, Christian and Seemaier, Daniel},
  title =	{{Buffered Streaming Edge Partitioning}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{5:1--5:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.5},
  URN =		{urn:nbn:de:0030-drops-203701},
  doi =		{10.4230/LIPIcs.SEA.2024.5},
  annote =	{Keywords: graph partitioning, edge partitioning, streaming, online, buffered partitioning}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.37

Fast Approximate Counting of Cycles

Authors: Keren Censor-Hillel, Tomer Even, and Virginia Vassilevska Williams

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider the problem of approximate counting of triangles and longer fixed length cycles in directed graphs. For triangles, Tětek [ICALP'22] gave an algorithm that returns a (1±ε)-approximation in Õ(n^ω/t^{ω-2}) time, where t is the unknown number of triangles in the given n node graph and ω < 2.372 is the matrix multiplication exponent. We obtain an improved algorithm whose running time is, within polylogarithmic factors the same as that for multiplying an n× n/t matrix by an n/t × n matrix. We then extend our framework to obtain the first nontrivial (1± ε)-approximation algorithms for the number of h-cycles in a graph, for any constant h ≥ 3. Our running time is Õ(MM(n,n/t^{1/(h-2)},n)), the time to multiply n × n/(t^{1/(h-2)}) by n/(t^{1/(h-2)) × n matrices. Finally, we show that under popular fine-grained hypotheses, this running time is optimal.

Cite as

Keren Censor-Hillel, Tomer Even, and Virginia Vassilevska Williams. Fast Approximate Counting of Cycles. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 37:1-37:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{censorhillel_et_al:LIPIcs.ICALP.2024.37,
  author =	{Censor-Hillel, Keren and Even, Tomer and Vassilevska Williams, Virginia},
  title =	{{Fast Approximate Counting of Cycles}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{37:1--37:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.37},
  URN =		{urn:nbn:de:0030-drops-201809},
  doi =		{10.4230/LIPIcs.ICALP.2024.37},
  annote =	{Keywords: Approximate triangle counting, Approximate cycle counting Fast matrix multiplication, Fast rectangular matrix multiplication}
}

Document

DOI: 10.4230/LIPIcs.OPODIS.2022.28

Self-Stabilizing Clock Synchronization in Dynamic Networks

Authors: Bernadette Charron-Bost and Louis Penet de Monterno

Published in: LIPIcs, Volume 253, 26th International Conference on Principles of Distributed Systems (OPODIS 2022)

Abstract

We consider the fundamental problem of periodic clock synchronization in a synchronous multi-agent system. Each agent holds a clock with an arbitrary initial value, and clocks must eventually be congruent, modulo some positive integer P. Previous algorithms worked in static networks with drastic connectivity properties and assumed that global informations are available at each node. In this paper, we propose a finite-state algorithm for time-varying topologies that does not require any global knowledge on the network. The only assumption is the existence of some integer D such that any two nodes can communicate in each sequence of D consecutive rounds, which extends the notion of strong connectivity in static network to dynamic communication patterns. The smallest such D is called the dynamic diameter of the network. If an upper bound on the diameter is provided, then our algorithm achieves synchronization within 3D rounds, whatever the value of the upper bound. Otherwise, using an adaptive mechanism, synchronization is achieved with little performance overhead. Our algorithm is parameterized by a function g, which can be tuned to favor either time or space complexity. Then, we explore a further relaxation of the connectivity requirement: our algorithm still works if there exists a positive integer R such that the network is rooted over each sequence of R consecutive rounds, and if eventually the set of roots is stable. In particular, it works in any rooted static network.

Cite as

Bernadette Charron-Bost and Louis Penet de Monterno. Self-Stabilizing Clock Synchronization in Dynamic Networks. In 26th International Conference on Principles of Distributed Systems (OPODIS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 253, pp. 28:1-28:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{charronbost_et_al:LIPIcs.OPODIS.2022.28,
  author =	{Charron-Bost, Bernadette and Penet de Monterno, Louis},
  title =	{{Self-Stabilizing Clock Synchronization in Dynamic Networks}},
  booktitle =	{26th International Conference on Principles of Distributed Systems (OPODIS 2022)},
  pages =	{28:1--28:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-265-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{253},
  editor =	{Hillel, Eshcar and Palmieri, Roberto and Rivi\`{e}re, Etienne},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2022.28},
  URN =		{urn:nbn:de:0030-drops-176480},
  doi =		{10.4230/LIPIcs.OPODIS.2022.28},
  annote =	{Keywords: Self-stabilization, Clock synchronization, Dynamic networks}
}

Document

DOI: 10.4230/LIPIcs.FUN.2018.10

Kings, Name Days, Lazy Servants and Magic

Authors: Paolo Boldi and Sebastiano Vigna

Published in: LIPIcs, Volume 100, 9th International Conference on Fun with Algorithms (FUN 2018)

Abstract

Once upon a time, a king had a very, very long list of names of his subjects. The king was also a bit obsessed with name days: every day he would ask his servants to look the list for all persons having their name day. Reading every day the whole list was taking an enormous amount of time to the king's servants. One day, the chancellor had a magnificent idea: he wrote a book with instructions. The number of pages in the book was equal to the number of names, but following the instructions one could find all people having their name day by looking at only a few pages - in fact, as many pages as the length of the name - and just glimpsing at the list. Everybody was happy, but in time the king's servants got lazy: when the name was very long they would find excuses to avoid looking at so many pages, and some name days were skipped. Desperate, the king made a call through its reign, and a fat sorceress answered. There was a way to look at much, much fewer pages using an additional magic book. But sometimes, very rarely, it would not work (magic does not always work). The king accepted the offer, and name days parties restarted. Only, once every a few thousand years, the magic book fails, and the assistants have to go by the chancellor book. So the parties start a bit later. But they start anyway.

Cite as

Paolo Boldi and Sebastiano Vigna. Kings, Name Days, Lazy Servants and Magic. In 9th International Conference on Fun with Algorithms (FUN 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 100, pp. 10:1-10:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{boldi_et_al:LIPIcs.FUN.2018.10,
  author =	{Boldi, Paolo and Vigna, Sebastiano},
  title =	{{Kings, Name Days, Lazy Servants and Magic}},
  booktitle =	{9th International Conference on Fun with Algorithms (FUN 2018)},
  pages =	{10:1--10:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-067-5},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{100},
  editor =	{Ito, Hiro and Leonardi, Stefano and Pagli, Linda and Prencipe, Giuseppe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FUN.2018.10},
  URN =		{urn:nbn:de:0030-drops-88017},
  doi =		{10.4230/LIPIcs.FUN.2018.10},
  annote =	{Keywords: Suffix trees, suffix arrays, z-fast tries, prefix search}
}

Document

DOI: 10.4230/DagSemProc.07071.3

A Deeper Investigation of PageRank as a Function of the Damping Factor

Authors: Paolo Boldi, Massimo Santini, and Sebastiano Vigna

Published in: Dagstuhl Seminar Proceedings, Volume 7071, Web Information Retrieval and Linear Algebra Algorithms (2007)

Abstract

PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor $alpha$ that spreads uniformly part of the rank. The choice of $alpha$ is eminently empirical, and in most cases the original suggestion $alpha=0.85$ by Brin and Page is still used. In this paper, we give a mathematical analysis of PageRank when $alpha$ changes. In particular, we show that, contrarily to popular belief, for real-world graphs values of $alpha$ close to $1$ do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and by proving that the $k$-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree $k$, we show how to obtain an approximation of the derivatives. Finally, we view PageRank as a linear operator acting on the preference vector and show a tight connection between iterated computation and derivation.

Cite as

Paolo Boldi, Massimo Santini, and Sebastiano Vigna. A Deeper Investigation of PageRank as a Function of the Damping Factor. In Web Information Retrieval and Linear Algebra Algorithms. Dagstuhl Seminar Proceedings, Volume 7071, pp. 1-19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

Copy BibTex To Clipboard

@InProceedings{boldi_et_al:DagSemProc.07071.3,
  author =	{Boldi, Paolo and Santini, Massimo and Vigna, Sebastiano},
  title =	{{A Deeper Investigation of PageRank as a Function of the Damping Factor}},
  booktitle =	{Web Information Retrieval and Linear Algebra Algorithms},
  pages =	{1--19},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{7071},
  editor =	{Andreas Frommer and Michael W. Mahoney and Daniel B. Szyld},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.07071.3},
  URN =		{urn:nbn:de:0030-drops-10722},
  doi =		{10.4230/DagSemProc.07071.3},
  annote =	{Keywords: PageRank, damping factor, Markov chains}
}

8 Search Results for "Boldi, Paolo"

PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding

Abstract

Cite as

Engineering Edge Orientation Algorithms

Abstract

Cite as

Engineering Zuffix Arrays

Abstract

Cite as

Buffered Streaming Edge Partitioning

Abstract

Cite as

Fast Approximate Counting of Cycles

Abstract

Cite as

Self-Stabilizing Clock Synchronization in Dynamic Networks

Abstract

Cite as

Kings, Name Days, Lazy Servants and Magic

Abstract

Cite as

A Deeper Investigation of PageRank as a Function of the Damping Factor

Abstract

Cite as

Thanks for your feedback!

Could not send message