Document

**Published in:** LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Wheeler automata were introduced in 2017 as a tool to generalize existing indexing and compression techniques based on the Burrows-Wheeler transform. Intuitively, an automaton is said to be Wheeler if there exists a total order on its states reflecting the natural co-lexicographic order of the strings labeling the automaton’s paths; this property makes it possible to represent the automaton’s topology in a constant number of bits per transition, as well as efficiently solving pattern matching queries on its accepted regular language. After their introduction, Wheeler automata have been the subject of a prolific line of research, both from the algorithmic and language-theoretic points of view. A recurring issue faced in these studies is the lack of large datasets of Wheeler automata on which the developed algorithms and theories could be tested. One possible way to overcome this issue is to generate random Wheeler automata. Motivated by this observation of practical nature, in this paper we initiate the theoretical study of random Wheeler automata, focusing our attention on the deterministic case (Wheeler DFAs - WDFAs). We start by naturally extending the Erdős-Rényi random graph model to WDFAs, and proceed by providing an algorithm generating uniform WDFAs according to this model. Our algorithm generates a uniform WDFA with n states, m transitions, and alphabet’s cardinality σ in O(m) expected time (O(mlog m) time w.h.p.) and constant working space for all alphabets of size σ ≤ m/ln m. The output WDFA is streamed directly to the output. As a by-product, we also give formulas for the number of distinct WDFAs and obtain that nσ + (n - σ) log σ bits are necessary and sufficient to encode a WDFA with n states and alphabet of size σ, up to an additive Θ(n) term. We present an implementation of our algorithm and show that it is extremely fast in practice, with a throughput of over 8 million transitions per second.

Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, and Nicola Prezza. Random Wheeler Automata. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.CPM.2024.5, author = {Becker, Ruben and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Maso, Riccardo and Prezza, Nicola}, title = {{Random Wheeler Automata}}, booktitle = {35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)}, pages = {5:1--5:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-326-3}, ISSN = {1868-8969}, year = {2024}, volume = {296}, editor = {Inenaga, Shunsuke and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.5}, URN = {urn:nbn:de:0030-drops-201157}, doi = {10.4230/LIPIcs.CPM.2024.5}, annote = {Keywords: Wheeler automata, Burrows-Wheeler transform, random graphs} }

Document

RANDOM

**Published in:** LIPIcs, Volume 275, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)

A temporal graph is a graph whose edges appear only at certain points in time. In these graphs, reachability among nodes relies on paths that traverse edges in chronological order (temporal paths). Unlike standard paths, temporal paths may not be composable, thus the reachability relation is not transitive and connected components (i.e., sets of pairwise temporally connected nodes) do not form equivalence classes, a fact with far-reaching consequences.
Recently, Casteigts et al. [FOCS 2021] proposed a natural temporal analog of the seminal Erdős-Rényi random graph model, based on the same parameters n and p. The proposed model is obtained by randomly permuting the edges of an Erdős-Rényi random graph and interpreting this permutation as an ordering of presence times. Casteigts et al. showed that the well-known single threshold for connectivity in the Erdős-Rényi model fans out into multiple phase transitions for several distinct notions of reachability in the temporal setting.
The second most basic phenomenon studied by Erdős and Rényi in static (i.e., non-temporal) random graphs is the emergence of a giant connected component. However, the existence of a similar phase transition in the temporal model was left open until now. In this paper, we settle this question. We identify a sharp threshold at p = log n/n, where the size of the largest temporally connected component increases from o(n) to n-o(n) nodes. This transition occurs significantly later than in the static setting, where a giant component of size n-o(n) already exists for any p ∈ ω(1/n) (i.e., as soon as p is larger than a constant fraction of n). Interestingly, the threshold that we obtain holds for both open and closed connected components, i.e., components that allow, respectively forbid, their connecting paths to use external nodes - a distinction arising from the absence of transitivity.
We achieve these results by strengthening the tools from Casteigts et al. and developing new techniques that provide means to decouple dependencies between past and future events in temporal Erdős-Rényi graphs, which could be of general interest in future investigations of these objects.

Ruben Becker, Arnaud Casteigts, Pierluigi Crescenzi, Bojana Kodric, Malte Renken, Michael Raskin, and Viktor Zamaraev. Giant Components in Random Temporal Graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 29:1-29:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.APPROX/RANDOM.2023.29, author = {Becker, Ruben and Casteigts, Arnaud and Crescenzi, Pierluigi and Kodric, Bojana and Renken, Malte and Raskin, Michael and Zamaraev, Viktor}, title = {{Giant Components in Random Temporal Graphs}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)}, pages = {29:1--29:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-296-9}, ISSN = {1868-8969}, year = {2023}, volume = {275}, editor = {Megow, Nicole and Smith, Adam}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2023.29}, URN = {urn:nbn:de:0030-drops-188542}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2023.29}, annote = {Keywords: random temporal graph, Erd\H{o}s–R\'{e}nyi random graph, sharp threshold, temporal connectivity, temporal connected component, edge-ordered graph} }

Document

**Published in:** LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Wheeler nondeterministic finite automata (WNFAs) were introduced in (Gagie et al., TCS 2017) as a powerful generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages such as compression and regular expression matching. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete (Gibney and Thankachan, ESA 2019). Recently, however, Alanko et al. (Information and Computation 2021) showed how to side-step this complexity by switching to preorders: letting Q be the set of states and δ the set of transitions, they provided a O(|δ|⋅|Q|²)-time algorithm computing a totally-ordered partition (i.e. equivalence relation) of the WNFA’s states such that (1) equivalent states recognize the same regular language, and (2) the order of (the classes of) non-equivalent states is consistent with any Wheeler order, when one exists. As a result, the output is a preorder of the states as useful for pattern matching as standard Wheeler orders.
Further extensions of this line of work (Cotumaccio et al., SODA 2021 and DCC 2022) generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: in general, any NFA admits a partial preorder of its states reflecting the co-lexicographic order of their accepted strings; the smaller the width of such preorder is, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in O(|δ|² + |Q|^{5/2}) time (Cotumaccio, DCC 2022), while on DFAs the same task can be accomplished in O(min(|Q|²log|Q|, |δ|⋅|Q|)) time (Kim et al., CPM 2023).
In this paper, we provide much more efficient solutions to the co-lex order computation problem. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem of Paige and Tarjan to work with ordered partitions. More specifically, we provide a O(|δ|log|Q|)-time algorithm computing a co-lex total preorder when the input is a Wheeler NFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. In addition, we present implementations of our algorithms and show that they are very efficient also in practice.

Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, and Nicola Prezza. Sorting Finite Automata via Partition Refinement. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.ESA.2023.15, author = {Becker, Ruben and C\'{a}ceres, Manuel and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Olivares, Francisco and Prezza, Nicola}, title = {{Sorting Finite Automata via Partition Refinement}}, booktitle = {31st Annual European Symposium on Algorithms (ESA 2023)}, pages = {15:1--15:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-295-2}, ISSN = {1868-8969}, year = {2023}, volume = {274}, editor = {G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.15}, URN = {urn:nbn:de:0030-drops-186684}, doi = {10.4230/LIPIcs.ESA.2023.15}, annote = {Keywords: Wheeler automata, prefix sorting, pattern matching, graph compression, sorting, partition refinement} }

Document

**Published in:** LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)

Identifying influential nodes in a network is arguably one of the most important tasks in graph mining and network analysis. A large variety of centrality measures, all aiming at correctly quantifying a node’s importance in the network, have been formulated in the literature. One of the most cited ones is the betweenness centrality, formally introduced by Freeman (Sociometry, 1977). On the other hand, researchers have recently been very interested in capturing the dynamic nature of real-world networks by studying temporal graphs, rather than static ones. Clearly, centrality measures, including the betweenness centrality, have also been extended to temporal graphs. Buß et al. (KDD, 2020) gave algorithms to compute various notions of temporal betweenness centrality, including the perhaps most natural one - shortest temporal betweenness. Their algorithm computes centrality values of all nodes in time O(n³ T²), where n is the size of the network and T is the total number of time steps. For real-world networks, which easily contain tens of thousands of nodes, this complexity becomes prohibitive. Thus, it is reasonable to consider proxies for shortest temporal betweenness rankings that are more efficiently computed, and, therefore, allow for measuring the relative importance of nodes in very large temporal graphs. In this paper, we compare several such proxies on a diverse set of real-world networks. These proxies can be divided into global and local proxies. The considered global proxies include the exact algorithm for static betweenness (computed on the underlying graph), prefix foremost temporal betweenness of Buß et al., which is more efficiently computable than shortest temporal betweenness, and the recently introduced approximation approach of Santoro and Sarpe (WWW, 2022). As all of these global proxies are still expensive to compute on very large networks, we also turn to more efficiently computable local proxies. Here, we consider temporal versions of the ego-betweenness in the sense of Everett and Borgatti (Social Networks, 2005), standard degree notions, and a novel temporal degree notion termed the pass-through degree, that we introduce in this paper and which we consider to be one of our main contributions. We show that the pass-through degree, which measures the number of pairs of neighbors of a node that are temporally connected through it, can be computed in nearly linear time for all nodes in the network and we experimentally observe that it is surprisingly competitive as a proxy for shortest temporal betweenness.

Ruben Becker, Pierluigi Crescenzi, Antonio Cruciani, and Bojana Kodric. Proxying Betweenness Centrality Rankings in Temporal Networks. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 6:1-6:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.SEA.2023.6, author = {Becker, Ruben and Crescenzi, Pierluigi and Cruciani, Antonio and Kodric, Bojana}, title = {{Proxying Betweenness Centrality Rankings in Temporal Networks}}, booktitle = {21st International Symposium on Experimental Algorithms (SEA 2023)}, pages = {6:1--6:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-279-2}, ISSN = {1868-8969}, year = {2023}, volume = {265}, editor = {Georgiadis, Loukas}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.6}, URN = {urn:nbn:de:0030-drops-183568}, doi = {10.4230/LIPIcs.SEA.2023.6}, annote = {Keywords: node centrality, betweenness, temporal graphs, graph mining} }

Document

**Published in:** LIPIcs, Volume 151, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)

In many models for large-scale computation, decomposition of the problem is key to efficient algorithms. For distance-related graph problems, it is often crucial that such a decomposition results in clusters of small diameter, while the probability that an edge is cut by the decomposition scales linearly with the length of the edge. There is a large body of literature on low diameter graph decomposition with small edge cutting probabilities, with all existing techniques heavily building on single source shortest paths (SSSP) computations. Unfortunately, in many theoretical models for large-scale computations, the SSSP task constitutes a complexity bottleneck. Therefore, it is desirable to replace exact SSSP computations with approximate ones. However this imposes a fundamental challenge since the existing constructions of low diameter graph decomposition with small edge cutting probabilities inherently rely on the subtractive form of the triangle inequality, which fails to hold under distance approximation.
The current paper overcomes this obstacle by developing a technique termed blurry ball growing. By combining this technique with a clever algorithmic idea of Miller et al. (SPAA 2013), we obtain a construction of low diameter decompositions with small edge cutting probabilities which replaces exact SSSP computations by (a small number of) approximate ones. The utility of our approach is showcased by deriving efficient algorithms that work in the CONGEST, PRAM, and semi-streaming models of computation. As an application, we obtain metric tree embedding algorithms in the vein of Bartal (FOCS 1996) whose computational complexities in these models are optimal up to polylogarithmic factors. Our embeddings have the additional useful property that the tree can be mapped back to the original graph such that each edge is "used" only logaritmically many times, which is of interest for capacitated problems and simulating CONGEST algorithms on the tree into which the graph is embedded.

Ruben Becker, Yuval Emek, and Christoph Lenzen. Low Diameter Graph Decompositions by Approximate Distance Computation. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 50:1-50:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.ITCS.2020.50, author = {Becker, Ruben and Emek, Yuval and Lenzen, Christoph}, title = {{Low Diameter Graph Decompositions by Approximate Distance Computation}}, booktitle = {11th Innovations in Theoretical Computer Science Conference (ITCS 2020)}, pages = {50:1--50:29}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-134-4}, ISSN = {1868-8969}, year = {2020}, volume = {151}, editor = {Vidick, Thomas}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2020.50}, URN = {urn:nbn:de:0030-drops-117355}, doi = {10.4230/LIPIcs.ITCS.2020.50}, annote = {Keywords: graph decompositions, metric tree embeddings, distributed graph algorithms, parallel graph algorithms, (semi-)streaming graph algorithms} }

Document

**Published in:** LIPIcs, Volume 146, 33rd International Symposium on Distributed Computing (DISC 2019)

Given an undirected graph with integer edge lengths, we study the problem of approximating the distances in the graph by a spanning tree based on the notion of stretch. Our main contribution is a distributed algorithm in the CONGEST model of computation that constructs a random spanning tree with the guarantee that the expected stretch of every edge is O(log^{3} n), where n is the number of nodes in the graph. If the graph is unweighted, then this algorithm can be implemented to run in O(D) rounds, where D is the hop-diameter of the graph, thus being asymptotically optimal. In the weighted case, the run-time of our algorithm matches the currently best known bound for exact distance computations, i.e., O~ (min{sqrt{n D}, sqrt{n} D^{1 / 4} + n^{3 / 5} + D}). We stress that this is the first distributed construction of spanning trees leading to poly-logarithmic expected stretch with non-trivial running time.

Ruben Becker, Yuval Emek, Mohsen Ghaffari, and Christoph Lenzen. Distributed Algorithms for Low Stretch Spanning Trees. In 33rd International Symposium on Distributed Computing (DISC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 146, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.DISC.2019.4, author = {Becker, Ruben and Emek, Yuval and Ghaffari, Mohsen and Lenzen, Christoph}, title = {{Distributed Algorithms for Low Stretch Spanning Trees}}, booktitle = {33rd International Symposium on Distributed Computing (DISC 2019)}, pages = {4:1--4:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-126-9}, ISSN = {1868-8969}, year = {2019}, volume = {146}, editor = {Suomela, Jukka}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2019.4}, URN = {urn:nbn:de:0030-drops-113116}, doi = {10.4230/LIPIcs.DISC.2019.4}, annote = {Keywords: distributed graph algorithms, low-stretch spanning trees, CONGEST model, ball decomposition, star decomposition} }

Document

**Published in:** LIPIcs, Volume 91, 31st International Symposium on Distributed Computing (DISC 2017)

We present a method for solving the shortest transshipment problem - also known as uncapacitated minimum cost flow - up to a multiplicative error of (1 + epsilon) in undirected graphs with non-negative integer edge weights using a tailored gradient descent algorithm. Our gradient descent algorithm takes epsilon^(-3) polylog(n) iterations, and in each iteration it needs to solve an instance of the transshipment problem up to a multiplicative error of polylog(n), where n is the number of nodes. In particular, this allows us to perform a single iteration by computing a solution on a sparse spanner of logarithmic stretch. Using a careful white-box analysis, we can further extend the method to finding approximate solutions for the single-source shortest paths (SSSP) problem. As a consequence, we improve prior work by obtaining the following results:
(1) Broadcast CONGEST model: (1 + epsilon)-approximate SSSP using ~O((sqrt(n) + D) epsilon^(-O(1))) rounds, where D is the (hop) diameter of the network.
(2) Broadcast congested clique model: (1 + epsilon)-approximate shortest transshipment and SSSP using ~O(epsilon^(-O(1))) rounds.
(3) Multipass streaming model: (1 + epsilon)-approximate shortest transshipment and SSSP using ~O(n) space and ~O(epsilon^(-O(1))) passes.
The previously fastest SSSP algorithms for these models leverage sparse hop sets. We bypass the hop set construction; computing a spanner is sufficient with our method. The above bounds assume non-negative integer edge weights that are polynomially bounded in n; for general non-negative weights, running times scale with the logarithm of the maximum ratio between non-zero weights. In case of asymmetric costs for traversing an edge in opposite directions, running times scale with the maximum ratio between the costs of both directions over all edges.

Ruben Becker, Andreas Karrenbauer, Sebastian Krinninger, and Christoph Lenzen. Near-Optimal Approximate Shortest Paths and Transshipment in Distributed and Streaming Models. In 31st International Symposium on Distributed Computing (DISC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 91, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.DISC.2017.7, author = {Becker, Ruben and Karrenbauer, Andreas and Krinninger, Sebastian and Lenzen, Christoph}, title = {{Near-Optimal Approximate Shortest Paths and Transshipment in Distributed and Streaming Models}}, booktitle = {31st International Symposium on Distributed Computing (DISC 2017)}, pages = {7:1--7:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-053-8}, ISSN = {1868-8969}, year = {2017}, volume = {91}, editor = {Richa, Andr\'{e}a}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2017.7}, URN = {urn:nbn:de:0030-drops-80031}, doi = {10.4230/LIPIcs.DISC.2017.7}, annote = {Keywords: Shortest Paths, Shortest Transshipment, Undirected Min-cost Flow, Gradient Descent, Spanner} }