A distributed algorithm for directed minimum-weight spanning tree

In the directed minimum spanning tree problem (DMST, also called minimum-weight arborescence), we are given a directed weighted graph and a root node r. Our goal is to construct a minimum-weight directed spanning tree, rooted at r and oriented outwards. We present the first sub-quadratic DMST algorithm in the distributed CONGEST network model, where the messages exchanged between the network nodes are bounded in size. We consider three versions of the model: a network where the communication links are bidirectional, but can have different weights in the two directions; a network where communication is unidirectional; and the Congested Clique model, where all nodes can communicate directly with each other. Our DMST algorithm is based on a variant of Lovász' DMST algorithm for the PRAM model, and uses a distributed single-source shortest-path (SSSP) algorithm for directed graphs as a black box.
In the bidirectional CONGEST model, our algorithm has roughly the same running time as the SSSP algorithm that is used as a black box; using the state-of-the-art SSSP algorithm due to Chechik and Mukhtar (in: Symposium on Principles of Distributed Computing (PODC), ACM, 2020, pp 464–473), we obtain a running time of Õ(√n·D^{1/4} + D) rounds for the bidirectional communication case. For the unidirectional communication model we give an Õ(n)-round algorithm, and show that it is nearly optimal. Finally, for the Congested Clique, our algorithm again matches the best known SSSP algorithm: it runs in Õ(n^{1/3}) rounds. On the negative side, we adapt an observation of Chechik in the sequential setting to show that in all three models, the DMST problem is at least as hard as the (s, t)-shortest path problem.
Thus, in terms of round complexity, distributed DMST lies between single-source shortest path and (s, t)-shortest path.


Introduction
Finding a lightweight spanning subgraph of a network is among the most fundamental problems in distributed computing. The classical example is the minimum-weight spanning tree (MST) problem, which has received extensive attention: its round complexity in the CONGEST model has been tightly characterized in a series of papers [11–13,17,18,20,27,32,33]. Generalizations, including minimum-weight k-vertex-connected and k-edge-connected subgraph, have also been studied (e.g., [9,38]).
To date, almost all distributed algorithms for MST and related problems have been for undirected graphs, with symmetric edge weights. However, in many settings, the cost associated with an edge is not necessarily symmetric: for example, in a wireless network the energy required to send a message to a specific node can depend on contention and noise in that node's vicinity, and in peer-to-peer cellular phone mesh networks, the price of communicating across a given link in each direction could be dictated by market forces. If we have a single node that needs to repeatedly broadcast to the entire network, or to collect information from the entire network, can we quickly find a low-cost spanning tree, oriented downwards or upwards, allowing it to do so?
The directed minimum-weight spanning tree (DMST) problem asks exactly this question: we have a weighted directed graph G = (V , E, w), where edge weights are not necessarily symmetric, and a fixed root node r ∈ V . Our goal is to construct a minimum-weight directed spanning tree, rooted at r and oriented downwards (or upwards).
Although the DMST problem has been extensively studied in the sequential setting [3,10,16,26,29,30,37], to date there has not been a distributed solution for DMST that runs quickly and does not use a lot of communication. In fact, prior to our work, no non-trivial (i.e., sub-quadratic) algorithm for the CONGEST model was known. In this paper we give distributed DMST algorithms for three variants of the CONGEST model: (a) undirected communication networks with asymmetric edge weights; (b) directed communication networks; and finally, (c) the Congested Clique model, where the communication network is the complete graph, but the edges have asymmetric weights.
In terms of running time, it is known that undirected MST requires Θ(√n + D) rounds [11,36]; since undirected MST is a special case of DMST, we cannot hope to solve the DMST problem more efficiently. In some settings, such as sequential dynamic graph algorithms, it is believed that DMST is significantly harder than undirected MST (see Sect. 9), but surprisingly, we show that when the underlying communication network is bidirectional and has small diameter, distributed DMST is not significantly harder than MST. In fact, we show that in undirected networks, DMST essentially "reduces" to directed single-source shortest path (SSSP), so that up to a logarithmic factor, its round complexity is bounded from above by the running time of the best SSSP algorithm that can handle asymmetric weights (currently [5]). On the other hand, we show that DMST is no easier than (s, t)-shortest path: this is already known in the sequential setting, and we adapt the proof for the sequential setting to show that it also holds in all three variants of the CONGEST model. Therefore, DMST's round complexity is sandwiched between SSSP and (s, t)-shortest path.
Background. The best sequential algorithm for DMST is Gabow et al.'s implementation of Edmonds' algorithm [10,16]. It performs a series of contractions, where every non-root vertex v ≠ r deducts the weight of its minimum-weight incoming edge from all its incoming edges, and then each zero-weight directed cycle is contracted into a single vertex. Eventually, we are left with a zero-weight tree; the weight of the DMST is then given by the sum of all the weights deducted during the algorithm's run. Actually finding the DMST is non-trivial, and requires recursively undoing the contractions and carefully adding edges to the DMST at each step. (Counter-intuitively, it is not true that the lightest incoming edge of any given node is in the DMST; in fact, even the lightest edge in the entire graph might not be in the DMST. This is one aspect that distinguishes the directed MST problem from the undirected version.) The drawback of Edmonds' algorithm in a parallel setting is that it may require up to n − 1 contractions to terminate, and the contractions are not easy to parallelize. In [28], Lovász gave a PRAM algorithm that "speeds up" this process, and contracts the entire graph in O(log n) parallel steps. Unfortunately, Lovász' algorithm cannot be implemented efficiently as-is in the CONGEST model, for several reasons, including the fact that it uses all-pairs shortest path (APSP) as a subroutine (APSP requires near-linear time in CONGEST [2,31]), and that certain steps of the algorithm would lead to too much congestion if we tried to implement them in CONGEST. We modify Lovász' algorithm to obtain a variant that lends itself to an efficient distributed implementation, and then give implementations for the three variants of CONGEST.
The implementation overcomes several challenges that are not encountered in undirected MST, such as the fact that in each step we need to run SSSP inside many disjoint subgraphs, but each component can have a diameter that is much larger than the diameter of the network as a whole. If we called SSSP directly inside each component, our running time would depend on the largest diameter encountered during the run, which could be linear in the worst case. We show how to overcome this difficulty in Sect. 6.

Theorem 2 In the undirected broadcast CONGEST model with asymmetric weights, there is a randomized DMST algorithm that always succeeds, and requires O(√n·D^{1/4} + D) rounds in expectation.
For small diameter networks, D = O(polylog(n)), our algorithm is optimal up to polylogarithmic factors, and nearly matches the lower bound even for undirected MST [36]. For larger diameter, we can also write the running time as O(n^{2/3} + D), a slightly weaker bound than the one stated in Theorem 2 (which is easily seen by considering the two cases D > n^{2/3} and D ≤ n^{2/3}).
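For completeness, the case analysis behind the weaker bound can be written out:

```latex
% Case 1: D \le n^{2/3}. Then
\sqrt{n}\,D^{1/4} \;\le\; \sqrt{n}\,(n^{2/3})^{1/4} \;=\; n^{1/2+1/6} \;=\; n^{2/3}.
% Case 2: D > n^{2/3}. Then n < D^{3/2}, so \sqrt{n} < D^{3/4}, and
\sqrt{n}\,D^{1/4} \;<\; D^{3/4}\,D^{1/4} \;=\; D.
% In both cases,
\sqrt{n}\,D^{1/4} + D \;\le\; n^{2/3} + 2D \;=\; O(n^{2/3} + D).
```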
A similar result holds for the Congested Clique. At present, the best SSSP algorithm for that model runs in O(n^{1/3}) rounds [4], and so we obtain an O(n^{1/3})-round DMST algorithm for the Congested Clique.
For the directed communication model, we give a deterministic algorithm with running time O(n), and we show that this is tight (up to a logarithmic factor). The algorithm and the lower bound assume that the weight of each edge (u, v) is known only to its destination v, and that G is strongly connected.
As Theorem 2 shows, in the undirected CONGEST model the DMST problem is no harder than single-source shortest path. Is the converse true? For the sequential setting, this is conjectured to hold, and Chechik [6] showed that DMST is at least as hard as the (s, t)-shortest path problem. We give a reduction that allows the proof from [6] to work in the distributed setting, showing that DMST is no easier than (s, t)-shortest path in all three distributed models we consider. (To date, the best known bounds on the complexity of the (s, t)-shortest path problem and the single-source shortest path problem in CONGEST are the same.)
Organization. Since our DMST algorithm is somewhat involved, we present it in a top-down manner, with increasing level of detail. The paper is organized as follows: in Sect. 4, we present an overview of our meta-algorithm, and of Edmonds' and Lovász' algorithms, on which it is based. In Sect. 5 we give a formal description of the meta-algorithm and prove its correctness. Next, in Sect. 6, we show how to implement the meta-algorithm in the bidirectional CONGEST model efficiently in order to compute the weight of the DMST, and in Sect. 7 we show how to find the edges of the DMST as well (i.e., not just the weight). In Sect. 8 we give implementations of the meta-algorithm for the CONGEST model with directed communication, and for the Congested Clique. These implementations are simpler than the one for bidirectional CONGEST.
Next we turn our attention to lower bounds. In Sect. 9, we adapt an observation by [6] to show that DMST is at least as hard as the (s, t)-shortest path problem ((s, t)-SP).
In Sect. 10 we give a simple construction that yields a lower bound of Ω(n) for the directed CONGEST model, assuming edge weights are known only to the target node.
In Appendix A we detail some basic procedures used in the algorithm, and in Appendix B we give pseudo-code for the algorithm.

Related work
Minimum-weight spanning tree is one of the most fundamental graph problems, and it has been extensively studied in the CONGEST model, e.g., in [11–13,17,18,20,21,27,32,33]. In particular, Ghaffari et al. [18] gave a simple MST algorithm using a framework called low-congestion shortcuts. This framework formalizes several techniques that appear in many distributed MST algorithms mentioned above, and are used to handle connected components that grow too large for their nodes to communicate with each other directly. We use several of these ideas in our DMST algorithm. Our algorithm also uses a procedure very similar to one used in [11,21] to deterministically decompose a directed tree into few components with relatively small diameter.
For the directed MST problem (DMST), Humblet [23] gave a distributed O(n^2)-round algorithm with message complexity O(n^2). To our knowledge, ours is the first DMST algorithm for CONGEST that has round complexity better than the trivial O(n^2) implementation.
It is known that undirected MST requires Ω(√n + D) rounds, and this lower bound holds even if we only ask for an algorithm that approximates the weight of the MST up to a factor of poly(n) [12,35,36]. Since MST is a special case of DMST, the lower bound also applies to the DMST problem.
Being a basic and natural graph problem, DMST has received significant attention in the sequential setting. The first polynomial-time sequential algorithm for DMST was developed by Edmonds [10] (similar results were published independently by [3,26]). A faster implementation of the algorithm was given by Tarjan in [37], building on ideas from [29,30]. The most efficient known implementation of Edmonds' algorithm [10] in the sequential setting is due to [16], with running time O(m + n log n) in graphs of n nodes and m edges.
In [28], Lovász gave a PRAM algorithm for DMST, which requires O(log n) parallel rounds and poly(n) processors to process graphs of n nodes. Lovász' algorithm uses all-pairs shortest-paths (APSP) as a subroutine, but APSP (or even just computing the diameter of the graph) requires Ω(n) rounds in CONGEST [15]. We modify the algorithm, making its steps less "eager", so that potentially fewer nodes are contracted in each step; this allows us to both reduce the congestion in the network and replace the call to APSP with a call to single-source shortest path (SSSP), yielding a sublinear-round implementation in bidirectional CONGEST.
Since we are concerned with directed graphs (or undirected graphs with asymmetric edge weights), we require an SSSP algorithm that can handle such graphs. Recently, several such algorithms were developed [5,14,19]. The best known running time for both directed and undirected graphs is O(√n·D^{1/4} + D), due to Chechik et al. [5]. There are algorithms for approximating single-source shortest paths, but these are not directly relevant to us here: it is shown in [1] how to compute a (1 + ε)-approximation in time O((√n + D)/ε^3) for graphs with asymmetric non-negative weights.
In contrast to the general CONGEST model, in the Congested Clique model it is not currently known whether single-source shortest path (SSSP) can be solved faster than all-pairs shortest path (APSP). Fortunately, however, there is a sublinear-time algorithm for APSP: in [4], Censor-Hillel et al. gave an O(n^{1/3})-round APSP algorithm for directed graphs based on algebraic methods, which can handle zero-weight edges. Our DMST implementation for the Congested Clique uses this algorithm. If in the future a more efficient algorithm for APSP is discovered, it can be substituted for the algorithm of [4] as a black box. A more efficient algorithm for SSSP, rather than APSP, could also be used, similar to our implementation for general CONGEST.

Preliminaries
Let G = (V, E, w_G) be a weighted, strongly-connected directed graph, with a special root vertex r ∈ V. Our goal is to construct a directed spanning tree, rooted at r and oriented downwards. (To obtain a tree oriented upwards towards r, we can simply reverse all edge directions, except when working with a network where communication is unidirectional; in this case, we can only compute a spanning tree oriented upwards, as we describe in Sect. 8.) Let W(G) denote the weight of a minimum-weight directed spanning tree rooted at r and oriented downwards in G. When the graph G is clear from the context, we sometimes omit it from our notation, and use w and W for the weight function and the weight of the DMST, respectively.
We assume that edge weights are integers in the range [0, poly(n)]. Larger weights, or real-valued weights, do not present an inherent problem, assuming they can be handled by the SSSP algorithm that is used as a black box (if the weights require more than O(log n) bits to represent, we may need more than a constant number of rounds to send edge weights, increasing the running time). We can also easily handle negative weights: simply take the largest number k ∈ N such that −k is the weight of some edge in the graph, and add k to the weight of all the graph edges, so that the smallest weight becomes zero. This does not change the DMST itself, and it increases the DMST's weight by exactly k·(n − 1), as each non-root node has exactly one incoming edge in the DMST; the shift can be carried out in O(D) rounds.
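A minimal sketch of the shift (the function name and edge-list representation are ours; in the network, k would be computed by a min-aggregation over a BFS tree in O(D) rounds and then broadcast):

```python
# Sketch of the negative-weight shift described above. Names are ours; the
# distributed version computes k by aggregation rather than a local scan.

def shift_weights(edges):
    """Shift all weights up by k = max(0, -min weight) so the minimum is >= 0.
    `edges` is a list of directed (u, v, w) triples; returns (shifted, k)."""
    k = max(0, -min(w for (_, _, w) in edges))
    return [(u, v, w + k) for (u, v, w) in edges], k

# The DMST is unchanged as a set of edges; its weight grows by exactly
# k * (n - 1), since every non-root node has exactly one incoming DMST edge.
shifted, k = shift_weights([(0, 1, -3), (1, 2, 4), (0, 2, 0)])
assert k == 3 and min(w for (_, _, w) in shifted) == 0
```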
Given a vertex set A ⊆ V , we denote by G(A) the subgraph induced on G by A. We sometimes abuse notation by writing u ∈ H when u is a vertex of a subgraph H . A subgraph H is called weakly-connected if the underlying graph of H , which ignores the directions of the edges, is connected (in the undirected sense).
We use the convention that if (u, v) ∉ E, then w_G(u, v) = ∞. For two nodes u, v, let dist_G(u, v) be the weight of the shortest path from u to v according to the weight function w_G. Given a subgraph H of G and two nodes u, v ∈ H, we let dist_H(u, v) denote the weight of the shortest path from u to v using only vertices of H. We note that in this definition we also consider paths that may include edges of G(H) which are not in H. For a subgraph H, let In(H) = {(u, v) ∈ E : u ∉ H, v ∈ H} be the set of edges entering a vertex of H from outside H.
Finding an SSSP tree. When we call an SSSP algorithm with a source node s, it is useful to assume that the algorithm also computes an SSSP-tree: a directed tree T oriented upwards towards the source s, where each node v "points upwards" to the next node on the shortest path from v to s. Several SSSP algorithms in the literature do not explicitly compute such a tree, but it is folklore that if we are given only the distances from all nodes to s, we can find an SSSP tree. For the sake of completeness, we include a proof.

Observation 3
We can assume w.l.o.g. that in an SSSP algorithm for directed graphs with non-negative integer weights, each node also outputs its parent in an SSSP tree in addition to the distances to the source s.
Proof Intuitively, we would like to find an SSSP tree by having each node v choose as its parent some node u such that dist(s, v) = dist(s, u) + w(u, v). Unfortunately, if the graph has zero-weight edges, we might get cycles if we proceed carelessly. We therefore first modify the weight function so that there are no zero-weight edges. Let w be the weight function of the graph, and define w′ = w·n + 1 (that is, for each edge (u, v), we set w′(u, v) = w(u, v)·n + 1). We run the SSSP algorithm using w′ instead of w, and claim that from the output, we can extract both the original distances according to w, and a parent in an SSSP tree (again, according to w).
The distances according to the original weight function w are easily computed: for each path π we have w′(π) = w(π)·n + |π|, where |π| is the number of edges in π. Since a simple path π has 0 ≤ |π| ≤ n − 1, we have w(π) = ⌊w′(π)/n⌋, and therefore dist_w(s, v) = ⌊dist_{w′}(s, v)/n⌋ for every node v. Furthermore, an SSSP tree for w′ is also an SSSP tree for the original weight function w. We can find the edges of the tree by simply having each node v ≠ s choose as its parent in the tree some node u with dist_{w′}(s, v) = dist_{w′}(s, u) + w′(u, v).
There must exist at least one such node, namely the next node on a shortest path from s to v, but there may be more than one, in which case v can choose arbitrarily. The edges thus chosen form a directed tree: we choose a total of n − 1 edges, as each node except s chooses a parent; and there cannot be cycles among the edges we choose, because the distances are strictly decreasing on any directed path (recall that w′(u, v) ≥ 1 for each edge (u, v)).
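A sequential sketch of this construction, using Dijkstra as a stand-in for the black-box SSSP algorithm (the distributed algorithm is not Dijkstra; the point here is only the weight-scaling trick, and all names are ours):

```python
import heapq

def sssp_with_tree(n, edges, s):
    """Run Dijkstra under w'(u, v) = w(u, v)*n + 1, then recover the original
    distances and a parent pointer for each reachable node v != s.
    `edges` is a list of directed (u, v, w) triples, w a non-negative int."""
    INF = float("inf")
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w * n + 1))       # scaled weight: every edge >= 1
    d2 = [INF] * n                          # distances under w'
    d2[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > d2[u]:
            continue
        for v, w in adj[u]:
            if d + w < d2[v]:
                d2[v] = d + w
                heapq.heappush(pq, (d + w, v))
    # w'(pi) = w(pi)*n + |pi| and 0 <= |pi| <= n-1, so dividing by n recovers w.
    dist = [d // n if d < INF else INF for d in d2]
    # From the w'-distances alone, each node picks any "tight" parent; no
    # cycles are possible, since w'-distances strictly decrease towards s.
    parent = [None] * n
    for u, v, w in edges:
        if v != s and d2[u] < INF and d2[u] + w * n + 1 == d2[v]:
            parent[v] = u
    return dist, parent

# Zero-weight edges are handled correctly: the scaled weights break the ties.
dist, parent = sssp_with_tree(3, [(0, 1, 0), (1, 2, 0), (0, 2, 0)], 0)
assert dist == [0, 0, 0] and parent == [None, 0, 0]
```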

The meta-algorithm
In this section we give a high-level overview of our DMST algorithm, which is based on Edmonds' and Lovász' algorithms. As it runs, the algorithm performs contractions, where a set of vertices is merged into one super-vertex. Here we describe the "meta-algorithm" that runs on the graph of super-vertices, and later we will show how this metaalgorithm is implemented on the actual network (where, of course, we cannot merge nodes).
The description in this section is somewhat informal: it is intended to convey the intuition behind the algorithm. In Sect. 5, we give a formal statement of the algorithm as a sequence of graph operations called hard contractions (defined in Sect. 5), and prove that the meta-algorithm correctly computes the weight of the DMST.
We begin by describing Edmonds' DMST algorithm.
The active edges. Throughout its run, the algorithm maintains a set of zero-weight directed edges, denoted H, with the property that every (super-)vertex except the root r has in-degree 1 in H, and the root has in-degree 0 in H. To initialize H, each node v ≠ r chooses a minimum-weight incoming edge (u, v), deducts its weight from all its incoming edges, and adds (u, v) to H. (If there is more than one incoming edge with the minimum weight, then we choose arbitrarily.) The weakly-connected component of H that contains the root is called the root component. The remaining weakly-connected components of H are called active components, and denoted H_1, …, H_k. Since the in-degree in H is 1 (except for the root vertex, which has no incoming edges), each active component is a directed cycle, with trees rooted at some of the cycle's vertices and oriented outwards (see Fig. 2). We abuse notation by thinking of each H_i as both a set of edges and as a graph (the weakly-connected component). We let C(H_i) denote the (unique) directed cycle at the heart of the active component H_i.
The following property is helpful when trying to determine which vertices belong to a given active component: if we know some vertex v that lies on the cycle C(H_i), then the vertices of H_i are exactly those vertices that are reachable from v along the directed edges of H.
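This property can be sketched as follows, assuming H is stored as a map from each non-root node to the tail of its unique incoming H-edge (this representation, and all names, are ours; a toy centralized version):

```python
from collections import defaultdict, deque

def active_component(h_in, v):
    """Return the node set of the active component containing v, where v is
    known to lie on the cycle C(H_i): everything reachable from v along the
    directed edges of H. `h_in` maps each non-root node to the tail of its
    unique incoming H-edge."""
    out = defaultdict(list)                 # forward adjacency of H
    for node, tail in h_in.items():
        if tail is not None:
            out[tail].append(node)
    seen, queue = {v}, deque([v])           # BFS along the directed H-edges
    while queue:
        u = queue.popleft()
        for x in out[u]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return seen

# Cycle 1 -> 2 -> 1, with tree node 3 hanging off node 2 via H-edge (2, 3):
h_in = {1: 2, 2: 1, 3: 2}
assert active_component(h_in, 1) == {1, 2, 3}
```

Starting from a non-cycle node would miss part of the component; starting from any cycle node recovers all of it, which is exactly the property stated above.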

Edmonds' contractions.
Edmonds' algorithm makes a series of steps, where in each step:
(1) Each vertex v deducts the weight of its minimum-weight incoming edge from all its incoming edges, and remembers the weight it subtracted. This creates at least one zero-weight incoming edge for vertex v (possibly more than one).
(2) Each vertex adds one zero-weight incoming edge to H.
(3) Any newly-created zero-weight directed cycles in H are contracted into a single vertex.
Eventually, we are left with only the root component, on which the H edges induce a directed spanning tree of weight zero. The weight of the DMST is then given by the total weight subtracted by all the nodes during the run. Then, we must "undo" the contractions and compute the edges of the DMST; we defer this part until later, and focus for now on computing the weight of the DMST. The correctness of Edmonds' algorithm relies on two simple but crucial observations, which together assert that each contraction step described above modifies the weight of the DMST in a predictable way.
The first observation of Edmonds is that reducing the weight of all incoming edges of a single vertex v by some α ≥ 0 reduces the weight of the DMST of the graph by exactly α. This holds because every non-root vertex v has exactly one incoming edge (u, v) in the DMST, and so when we deduct α from all of v's incoming edges, the only change that affects the weight of the DMST is that the weight of edge (u, v) is reduced by α.
The second observation is that contracting a zero-weight strongly-connected component into a single super-vertex does not change the weight of the DMST, as long as all weights are non-negative.
Assume that all edge weights in G are non-negative, and let A ⊆ V be a set of vertices that is strongly connected using only zero-weight edges. Let G′ be the graph obtained from G by merging A into one new super-vertex a, and setting the weights as follows: w′(u, a) = min_{x∈A} w(u, x) for u ∉ A; w′(a, v) = min_{x∈A} w(x, v) for v ∉ A; and w′(u, v) = w(u, v) otherwise. Then W(G′) = W(G).
This holds because once we arrive at any node of A, we can reach all other nodes of A "for free" by taking a zero-weight path, and so we may as well think of A as "one large node" for the purpose of constructing the DMST. Taken together, the two observations imply that each step of Edmonds changes the weight of the DMST as follows: first, each vertex v ≠ r subtracts the weight α_v of its minimum-weight incoming edge from all its incoming edges, and this reduces the weight of the DMST by exactly Σ_{v≠r} α_v; then, we contract zero-weight cycles, which does not change the weight of the DMST. The algorithm halts when it obtains a zero-weight DMST of the graph (with some vertices contracted), and by the two observations, the weight of the DMST in the original graph is given by the sum of all the weight subtracted by the vertices up to that point.
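The contraction process can be sketched sequentially; the following toy centralized implementation (the classic Chu-Liu/Edmonds formulation, not the distributed algorithm of this paper) computes only the weight of the DMST, assuming an arborescence rooted at `root` exists. Function and variable names are ours:

```python
def dmst_weight(n, edges, root):
    """Weight of the minimum-weight arborescence rooted at `root`.
    `edges` is a list of directed (u, v, w) triples on vertices 0..n-1;
    assumes every vertex is reachable from `root`."""
    total = 0
    while True:
        # Step (1): every non-root vertex finds its minimum-weight incoming
        # edge; the deducted weights are charged to the DMST's weight.
        INF = float("inf")
        in_w, pre = [INF] * n, [-1] * n
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        in_w[root] = 0
        total += sum(in_w[v] for v in range(n) if v != root)
        # Step (3): detect zero-weight cycles among the chosen edges
        # (after the deduction, every chosen edge has weight zero).
        ids, vis, cnt = [-1] * n, [-1] * n, 0
        for i in range(n):
            v = i
            while v != root and vis[v] != i and ids[v] == -1:
                vis[v] = i
                v = pre[v]
            if v != root and ids[v] == -1:   # walk revisited v: a new cycle
                ids[v], u = cnt, pre[v]
                while u != v:
                    ids[u], u = cnt, pre[u]
                cnt += 1
        if cnt == 0:                         # no cycle: the chosen edges form
            return total                     # a zero-weight arborescence
        for i in range(n):                   # vertices outside every cycle
            if ids[i] == -1:                 # become singleton super-vertices
                ids[i] = cnt
                cnt += 1
        # Contract cycles; surviving edges keep their reduced weights.
        edges = [(ids[u], ids[v], w - in_w[v])
                 for u, v, w in edges if ids[u] != ids[v]]
        n, root = cnt, ids[root]
```

As the text notes, recovering the edges of the DMST (not just the weight) requires undoing the contractions, which this sketch omits.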
Each step of Edmonds' algorithm makes progress by contracting at least two vertices. However, a single step of Edmonds' does not necessarily merge each active component with another active component: we might spend many steps contracting nested cycles of inner vertices inside one active component. While each step reduces the number of vertices by at least one, we might require as many as n steps (some of which are spent contracting zero-weight cycles) to contract the entire graph. This may occur even if we attempt to "parallelize" the steps, by contracting the zero-weight cycles of all active components in parallel (see Fig. 1).
Lovász' meta-algorithm. Lovász' algorithm can be viewed, somewhat inaccurately, as a way to "jump ahead" and perform in one "mega-step" a series of Edmonds contractions that does merge each active component with at least one other component: roughly speaking, instead of spending a lot of time contracting nested cycles inside an active component, Lovász's algorithm finds the first edge coming into the component from outside that Edmonds' algorithm would eventually add to H, and jumps directly to the corresponding contraction. The "mega-steps" of Lovász's algorithm are difficult to implement in CONGEST: in each step, the algorithm computes all-pairs shortest-paths (APSP, which can be solved efficiently in PRAM), and it finds paths that may cut across many active components, leading to congestion. We give a less eager mechanism for speeding up Edmonds' algorithm, which is very similar to Lovász's algorithm but can be performed in parallel on all the active components in CONGEST; essentially, we show that the steps of Lovász' algorithm can be confined inside the active components without cutting across them, while preserving correctness and the fast running time.
Since our variant of Lovász's algorithm is truly a "sped-up" version of Edmonds (unlike Lovász's algorithm itself), we first present our new variant and explain how it relates to Edmonds; then, we detail the steps of Lovász' original algorithm, and show how they compare to our variant.
Our modified meta-algorithm. Our meta-algorithm is obtained from Edmonds by asking: "what contractions would Edmonds' algorithm make inside an active component H_i before it adds to H an incoming edge of H_i, thereby merging it with another component?" We would like to jump ahead to that point, instead of making the contractions sequentially.
Recall that Edmonds selects at each step the minimum-weight incoming edge of a node, adds it to H, and (eventually) contracts the resulting zero-weight cycle. Eventually, this process adds to H an incoming edge of H_i, and this is the point we want to find: where H_i merges with some other component.
It turns out that as it slowly consumes nodes inside H_i and eventually some node outside H_i, Edmonds implicitly finds the lightest path from a node outside H_i that immediately enters H_i, and stays inside H_i until it arrives at some node of the cycle C(H_i) (see Fig. 2); i.e., a path of the form u, v_1, …, v_k such that u ∉ H_i, v_1, …, v_k ∈ H_i, and v_k ∈ C(H_i). Let u, v_1, …, v_k be the lightest such path (or an arbitrary one, if there are multiple), and let β be the weight of the path. Then Edmonds performs contractions inside H_i, slowly merging the vertices v_k, v_{k−1}, …, v_1 into the cycle C(H_i), until eventually it contracts u ∉ H_i into the cycle as well, which causes the active component H_i to merge with another component.
As it works its way outwards from the cycle C(H_i), Edmonds progresses not only along the path v_k, …, v_1; it contracts all nodes inside H_i that can reach the cycle C(H_i) with paths of increasing weight, until the weight reaches β. To simulate this, our meta-algorithm finds the "entering edge" (u, v_1) ∈ In(H_i) and the weight β, and contracts all nodes of H_i whose distance to the cycle C(H_i) is at most β, including node v_1.
Lovász's algorithm also computes β, but it does it differently: it uses all-pairs shortest-paths to find the shortest distance from any node outside H_i to the cycle C(H_i), without insisting that the path have the form we described above. As a result, Lovász's algorithm may find paths that start at a node u ∉ H_i, wander outside H_i for a while (along zero-weight edges), enter H_i and leave it several times, and eventually enter H_i "for good" and go to the cycle C(H_i). Lovász's algorithm then makes a more aggressive contraction, merging all the nodes visited by such a path of weight at most β. This can include any number of nodes both inside H_i and outside it; in contrast, the less-eager variant merges only nodes inside H_i.
We now give a more formal description. Given an incoming edge (u, v) ∈ In(H_i) (with v ∈ H_i and u ∉ H_i), we define the "entering distance to the cycle" associated with (u, v) as β(u, v) = w(u, v) + min_{x∈C(H_i)} dist_{H_i}(v, x), the weight of the lightest path that enters H_i along (u, v) and then stays inside H_i until it reaches the cycle. Let β_i = min_{(u,v)∈In(H_i)} β(u, v) be the "minimum entering distance" associated with H_i. This is the weight that would be deducted by Edmonds' algorithm in all the contractions internal to H_i, plus the first step that connects H_i to another active component. In each step, our algorithm finds, in parallel for each active component H_i, an edge e_i = (u, v) that has β(u, v) = β_i (that is, an edge that minimizes β(u, v)). Then, we contract the zero-weight cycle C(H_i), together with all nodes inside H_i that have distance at most β_i to the cycle C(H_i). Formally, the set of nodes we contract into one super-vertex is given by U_i = C(H_i) ∪ {x ∈ H_i : min_{y∈C(H_i)} dist_{H_i}(x, y) ≤ β_i}. We represent a super-vertex as the set of all original graph vertices that were merged into it; merging super-vertices means replacing them by their union. For the new super-vertex S, we update the weights in the contracted graph (in which S is a vertex) as follows: w′(x, S) is the weight of the lightest path from x whose remaining vertices all lie in U_i and which ends at C(H_i); w′(S, y) = min_{x∈U_i} w(x, y); and w′(u, v) = w(u, v) otherwise.
It may be helpful to think of the contracted graph as follows:
- An incoming edge (x, S) "represents" all paths x, y_0, …, y_k that existed prior to the contraction, where y_k ∈ C(H_i), and y_0, …, y_k ∈ U_i have now been merged into the new super-vertex S. Therefore, edge (x, S) is assigned the weight of the lightest such path.
- An outgoing edge (S, y) "represents" all paths x_0, …, x_k, y that existed prior to the contraction, where x_0 ∈ C(H_i), and x_0, …, x_k ∈ U_i have now been merged into super-vertex S. Accordingly, edge (S, y) is assigned the weight of the lightest such path. Since every node of H_i is reachable from C(H_i) by a zero-weight path of H edges, in the lightest path, the only edge we might need to "pay" for is the last edge, (x_k, y), which may (or may not) be an outgoing non-H edge of H_i. Thus, the new edge (S, y) is assigned the weight of the lightest outgoing edge (x, y) where x ∈ S.
- Finally, we subtract from all incoming edges of S the weight of the lightest incoming edge, β_i, just as Edmonds' algorithm does in preparation for the next step.
We store the value β_i, to remember that we subtracted it from the weight of the DMST; at the end of the run, the value output by the nodes is the sum of the β_i's throughout the run. Let e_i = (u, v) ∈ In(H_i) be the edge minimizing β(u, v) that was found by our algorithm. This edge is replaced by (u, S), as node v is merged into S. After the contraction, our algorithm adds (u, S) to H, causing H_i to merge with the active component to which u belongs (see Fig. 2). Observe that (u, S) is a zero-weight edge, as its weight is given by β(u, v) − β_i = 0. Termination. Because every active component merges with another active component in each iteration, the number of components is reduced by at least half, and therefore after at most O(log n) parallel iterations, the edge set H has only one weakly-connected component-the root component. At this point, the weight of the DMST is computed by summing all the β_i's that were subtracted during the entire run, and the algorithm terminates.
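The per-component step above can be sketched sequentially (a hypothetical helper, not the paper's distributed implementation; the adjacency representation and all names are assumptions). The internal distances dist_{H_i}(v, C(H_i)) are computed by a Dijkstra search from the cycle over reversed internal edges:

```python
import heapq

def entering_distance(nodes, edges, cycle):
    """One step of the modified rule for a single active component.

    nodes : vertex set of the component H_i
    edges : dict (u, v) -> weight, over the whole graph
    cycle : vertices on the zero-weight cycle C(H_i)

    Returns (beta_i, entering_edge, U), where U is the set of vertices
    contracted into the new super-vertex.
    """
    # dist[v] = dist_{H_i}(v, C(H_i)): Dijkstra from the cycle over
    # reversed edges, confined to the component (edges scanned naively
    # on every pop, for brevity).
    dist = {v: 0 for v in cycle}
    pq = [(0, v) for v in cycle]
    heapq.heapify(pq)
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float("inf")):
            continue
        for (x, y), w in edges.items():
            if y == v and x in nodes and d + w < dist.get(x, float("inf")):
                dist[x] = d + w
                heapq.heappush(pq, (d + w, x))
    # beta(u, v) = w(u, v) + dist_{H_i}(v, C(H_i)) over entering edges
    beta_i, entering = min(
        (w + dist[v], (u, v))
        for (u, v), w in edges.items()
        if v in nodes and u not in nodes and v in dist
    )
    # contract the cycle plus all internal vertices within distance beta_i
    U = {v for v in nodes if dist.get(v, float("inf")) <= beta_i}
    return beta_i, entering, U
```

For instance, with a two-node zero-weight cycle {a, b}, an internal vertex c at distance 2 from the cycle, and entering edges (x, a) of weight 5 and (x, c) of weight 1, the rule picks β_i = 1 + 2 = 3 and contracts all of {a, b, c}.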
Comparison with Lovász's original algorithm. Recall that we defined the "minimum entering distance" into active component H_i as β_i = min_{(u,v) ∈ In(H_i)} β(u, v), where β(u, v) = w(u, v) + dist_{H_i}(v, C(H_i)) is the weight of the shortest path that crosses into H_i by taking edge (u, v), then stays inside H_i until it reaches the cycle C(H_i).
Lovász's original algorithm does not restrict the paths it considers; for an edge (u, v) ∈ In(H_i), Lovász's algorithm defines α(u, v) as the weight of the lightest path starting with the edge (u, v) and eventually arriving at C(H_i), but potentially leaving and entering H_i any number of times in between. Let α_i = min_{u ∉ H_i, v ∈ H_i} α(u, v) be the "minimum entering distance" associated with the active component H_i. In Lovász's algorithm, we compute α_i and select a corresponding incoming edge (u, v) with α(u, v) = α_i. This is done in [28] by computing all-pairs shortest paths (APSP). Next, we contract the zero-weight cycle C(H_i), together with all nodes, inside or outside H_i, that lie on entering paths of weight at most α_i. The difference between our variant and Lovász's original algorithm is subtle: we consider only paths that do not leave H_i after entering, while Lovász's original algorithm considers paths that may leave and return. However, this difference is crucial for our distributed implementation: "confining" the path inside H_i is what allows us to operate on all active components in parallel, without creating too much congestion.
In Fig. 3, we give an example where Lovász's algorithm is "more eager" than our variant, and merges more nodes in a single step, because it merges nodes both inside and outside an active component, whereas our version "stays inside" the component. Lovász's algorithm contracts all of H_1 into one super-vertex, because every v ∈ H_1 has dist(v, C(H_1)) ≤ 2. However, the path shown with dashes is not internal to H_1, so in our version we do not use it; since dist_{H_1}(x, C(H_1)) = 3 > 2, in our version we do not contract x into C(H_1).

Formal statement and correctness of the meta-algorithm
We prove that our modified variant of Lovász's algorithm is correct, by showing that we can "unpack" each step of our algorithm into several contractions of Edmonds' algorithm. Notation. At each step t of the abstract algorithm, we have a graph G_t = (S_t, E_t, w_t), where S_t is the current set of super-vertices. We represent each super-vertex by the set of original vertices (from V) merged into it. Initially, each graph vertex v ∈ V is its own super-vertex: S_1 = {{v} : v ∈ V}. For convenience, we sometimes represent a super-vertex as a set u = {u_1, . . . , u_k}, where u_1, . . . , u_k ⊆ V are the super-vertices that were merged to form u; this should be interpreted as u = ∪_i u_i. In the sequel, we omit the subscript t when the step number is clear from the context. In particular, we abuse notation by using G = (S, E, w) to denote the graph in the current step (we typically do not need to refer to the original edges and weights during the algorithm's execution).
Outline of the proof. We begin by introducing the notion of a hard contraction, where we merge all vertices inside an active component H_i within a given distance from the cycle C(H_i). The name "hard contraction" reflects the fact that we merge vertices into super-vertices, changing the vertex set of the graph; of course, we cannot do this in the distributed implementation. We essentially prove that hard contractions can be "unpacked" into a sequence of Edmonds contractions, which implies that they modify the weight of the DMST the same way Edmonds' algorithm does: after merging paths up to distance x from the cycle, the weight of the DMST is reduced by exactly x.
Since the distributed implementation cannot alter the communication network by merging vertices, we also define soft contractions, which emulate hard contractions but do not merge any vertices. We prove that hard and soft contractions are equivalent in terms of their effect on the weight of the DMST.
Hard contractions. Recall that each phase of Edmonds' algorithm first deducts the weight of the minimum-weight incoming edge of a vertex v from all incoming edges of v, and then contracts any resulting zero-weight cycles. To mirror these two steps, we define two types of hard contractions: -An open hard contraction, which is intended to capture the state of the graph after performing a sequence of Edmonds contractions, and starting another contraction, by choosing the minimum-weight incoming edge and deducting its weight from all incoming edges (but not yet merging the zero-weight cycles that are created); and -A closed hard contraction, intended to capture the state of the graph after "completing" the last Edmonds contraction, shrinking all zero-weight cycles.
In both cases, the effect is to contract all vertices inside H_i that have up to some distance x to the cycle C(H_i), but an open contraction only merges vertices with distance strictly less than x, while a closed contraction merges vertices with distance at most x (inclusive). We first define the set of vertices that would be merged by each type of hard contraction: given a threshold x ≥ 0, let U_i(x) = {v ∈ H_i : dist_{H_i}(v, C(H_i)) ≤ x} and Ů_i(x) = {v ∈ H_i : dist_{H_i}(v, C(H_i)) < x} denote the set of vertices inside H_i that have internal distance at most x (for U_i(x)) and strictly less than x (for Ů_i(x)) to the cycle C(H_i). (Here and in the sequel, when we say "internal distance", we refer to the distance dist_{H_i} that considers only paths inside H_i.) We write Φ_i^x(G) for the graph obtained from G by the closed hard contraction with threshold x, and Φ̊_i^x(G) for the graph obtained by the open hard contraction. Note that for any x ≥ 0 we have C(H_i) ⊆ U_i(x), because the cycle nodes are at distance zero from the cycle; therefore, we always have |U_i(x)| ≥ 2.

Formal statement of the meta-algorithm
We can now re-state our algorithm in terms of hard contractions, as follows: Initially, we set W(u) = 0 at each node u ∈ V. Next, each vertex u ∈ V finds its minimum-weight incoming edge (v, u), subtracts w(v, u) from all its incoming edges, and adds (v, u) to H. Node u also adds w(v, u) to W(u).
In each subsequent iteration, if the active components at the beginning of the iteration are H_1, . . . , H_ℓ, then for each H_i: we compute the set U_i(β_i), and perform a closed hard contraction with threshold β_i, replacing our graph with Φ_i^{β_i}(G). Let s be the new super-vertex formed by contracting U_i(β_i); we set W(s) = Σ_{u ∈ U_i(β_i)} W(u) + β_i. After performing all the contractions, for each active component H_i, we add the incoming edge e_i (which now has weight zero) to H. This causes H_i to merge with the active component to which the source of e_i belongs.
The algorithm halts when we have only one component left-the root component, R. At this point we return the value u∈R W (u) as the weight of the DMST.
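The initialization step can be illustrated by a short sequential sketch (hypothetical names and data layout; the treatment of the root is an assumption made explicit in the comments):

```python
def initialize(V, edges, root):
    """First step of the meta-algorithm, simulated sequentially:
    every non-root vertex subtracts the weight of its minimum-weight
    incoming edge from all its incoming edges, records that weight in
    W(u), and adds the chosen edge to H."""
    W = {u: 0 for u in V}
    H = set()
    for u in V:
        if u == root:
            continue  # the root keeps no incoming edge in the DMST
        wmin, emin = min((w, (v, t)) for (v, t), w in edges.items() if t == u)
        for (v, t) in list(edges):
            if t == u:
                edges[(v, t)] -= wmin  # deduct from all incoming edges of u
        W[u] = wmin
        H.add(emin)
    return W, H
```

For example, on the graph r→a (weight 3), r→b (weight 5), a→b (weight 1), initialization records W(a) = 3 and W(b) = 1; here the chosen edges already form the DMST, and Σ_u W(u) = 4 is its weight.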

Correctness proof
Observe that closed hard contractions do not create negative-weight edges:

Lemma 1 For an active component H_i, let w′ be the weight function of the closed hard contraction Φ_i(G) = Φ_i^{β_i}(G). If w has no negative-weight edges, then neither does w′.
Proof Let (u, v) be an edge of Φ_i(G), and let s = U_i(β_i) be the new super-vertex formed inside H_i by the contraction Φ_i(G). We consider several cases: if (u, v) is not incident to s, its weight is unchanged. If (u, v) = (u, s) is an incoming edge of s, its new weight is min_{y ∈ s} (w(u, y) + dist_{H_i}(y, C(H_i))) − β_i; by the definition of β_i and of U_i(β_i), the minimum is at least β_i, so the new weight is non-negative. Finally, if (u, v) = (s, v) is an outgoing edge of s, its weight is the minimum weight of an original outgoing edge, which is non-negative by assumption.

Edmonds' algorithm vs. ours. Our goal now is to show that the steps that Edmonds' algorithm would make inside H_i until H_i merges with another component correspond to a sequence of closed hard contractions Φ_i^{x_1}(G), . . . , Φ_i^{x_k}(G), where x_1 < · · · < x_k is a sequence of increasing thresholds, and x_k = β_i. Our algorithm "jumps ahead" and computes Φ_i^{x_k}(G) directly, so its correctness then follows from Edmonds' correctness.
Let H_i be an active component. Clearly, for any 0 ≤ x ≤ y, we have U_i(x) ⊆ U_i(y), as allowing a larger distance to the cycle can only contract more nodes. We are interested in threshold values where U_i(x) strictly grows (i.e., does not stay the same size), as these correspond to steps made by Edmonds' algorithm; we call such values meaningful. Formally, a value x is meaningful if U_i(x) ⊋ U_i(x′) for every x′ < x.
There can be at most n distinct meaningful values x_1^i < · · · < x_n^i for a given active component H_i, because for each j > 1 we have |U_i(x_{j−1}^i)| < |U_i(x_j^i)|, so after at most n meaningful values, we have reached the entire graph. Let k_i ≤ n be the number of distinct meaningful values x_1^i < · · · < x_{k_i}^i for component H_i, and for convenience, let x_0^i = 0. In the sequel, we fix an active component H_i, and omit the superscript i from our notation.
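Concretely, the meaningful values up to β_i are exactly the distinct internal distances to the cycle that actually occur; a tiny hypothetical helper (names and representation assumed) makes this explicit:

```python
def meaningful_values(dist, beta_i):
    """Given dist[v] = dist_{H_i}(v, C(H_i)) for the vertices of a
    component, the meaningful thresholds up to beta_i are exactly the
    distinct internal distances that occur, in increasing order;
    U_i(x) grows strictly at each of them and nowhere else.
    (x_0 = 0 always appears, since cycle vertices are at distance 0.)"""
    return sorted({d for d in dist.values() if d <= beta_i})
```

For instance, a component with cycle vertices at distance 0 and two further vertices at distance 2 (and β_i = 5) has meaningful values [0, 2].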
The following observation captures the "size of the jump" between two consecutive meaningful values:

Observation 6 For each 0 ≤ j < k, the minimum weight of an incoming edge of the super-vertex s = U_i(x_j) in the closed hard contraction Φ_i^{x_j}(G) is exactly x_{j+1} − x_j.

Proof By the definition of meaningful values, x_{j+1} is the smallest value such that there exists a node u ∈ U_i(x_{j+1}) \ U_i(x_j). Let u be such a node, and let u_0 = u, u_1, . . . , u_ℓ be a lightest path from u to C(H_i). The weight of this path is x_{j+1}. Assume w.l.o.g. that u_0 is the last node on the path that does not lie inside U_i(x_j), so that u_1, . . . , u_ℓ ∈ U_i(x_j) (otherwise, consider instead the suffix of the path starting at the last node that is not in U_i(x_j); the suffix still has weight x_{j+1}, because there do not exist any nodes whose distance to C(H_i) is strictly between x_j and x_{j+1}). Moreover, since we chose a lightest path from u_0 to C(H_i), the suffix u_1, . . . , u_ℓ must be a lightest path from u_1 to C(H_i), so that w(u_0, u_1) + dist_{H_i}(u_1, C(H_i)) = x_{j+1}. Recall that for any node z ∈ U_i(x_j), β(u_0, z) = w(u_0, z) + dist_{H_i}(z, C(H_i)) is the weight of the lightest path from u_0 to C(H_i) that begins with the edge (u_0, z). In our case, it must be that min_{z ∈ s} β(u_0, z) = β(u_0, u_1), otherwise we would obtain a contradiction to our choice of u_0, . . . , u_ℓ as the lightest path that immediately enters U_i(x_j) and goes to C(H_i). Therefore, the updated weight of edge (u_0, s) is min_{z ∈ s} β(u_0, z) − x_j = β(u_0, u_1) − x_j = x_{j+1} − x_j. This shows that there exists an incoming edge (u_0, s) of weight x_{j+1} − x_j. It remains to prove that there is no lighter incoming edge. Suppose for the sake of contradiction that there is such an edge, (u′, s), with weight less than x_{j+1} − x_j in Φ_i^{x_j}(G). By definition of a hard contraction, the weight of edge (u′, s) is min_{z ∈ s} (w(u′, z) + dist_{H_i}(z, C(H_i))) − x_j. Therefore there is some z ∈ s such that w(u′, z) + dist_{H_i}(z, C(H_i)) < x_{j+1}, that is, dist_{H_i}(u′, C(H_i)) < x_{j+1}; since u′ ∉ U_i(x_j), this distance is greater than x_j, so it lies strictly between x_j and x_{j+1}, contradicting their definition as consecutive meaningful values.
Next, we prove that open and closed hard contractions indeed have the relationship to one another that we described on an intuitive level above: namely, a closed hard contraction corresponds to first computing an open hard contraction, and then contracting any resulting zero-weight cycles.
Lemma 2 Fix a threshold x ∈ [0, β_i], and let λ be the largest meaningful value such that λ ≤ x. Then the following two graphs are equal: I. the closed hard contraction Φ_i^x(G); and II. the graph G′ obtained by starting from the open hard contraction Φ̊_i^λ(G), merging the super-vertex corresponding to Ů_i(λ) with all nodes z ∈ U_i(λ) \ Ů_i(λ), and finally, subtracting the weight x − λ from the weight of all edges incoming to the new vertex (i.e., the super-vertex corresponding to U_i(λ)).
Proof First, note that the vertices are the same: in Φ_i^x(G) we have a new super-vertex {v ∈ U_i(x)} obtained by merging together the vertices in U_i(x) into a super-vertex, and the other vertices remain the same as in G. On the other hand, in G′, we first contract Φ̊_i^λ(G), obtaining a super-vertex s′ = {v ∈ Ů_i(λ)}, and then we merge into the new super-vertex all the nodes z ∈ U_i(λ) \ Ů_i(λ). Since λ is the largest meaningful value that is no greater than x, we have U_i(x) = U_i(λ), and therefore we see that the new super-vertex is exactly {v ∈ U_i(x)}. Since both graphs are obtained by contractions on G and their super-vertices are the same, it follows that the edge set is also the same. Now consider the weights. In Φ_i^x(G), the weight of an incoming edge (q, s) of the super-vertex s = U_i(x) is min_{y ∈ U_i(x)} (w(q, y) + dist_{H_i}(y, C(H_i))) − x. In G′, we first compute Φ̊_i^λ(G): we let s′ = {v ∈ Ů_i(λ)}, and set the weight of an incoming edge (q, s′) to min_{y ∈ Ů_i(λ)} (w(q, y) + dist_{H_i}(y, C(H_i))) − λ. Then we merge U_i(λ) \ Ů_i(λ) with s′, obtaining the final super-vertex s, and subtract x − λ from its incoming edges; the obtained weight of (q, s) is min_{y ∈ U_i(λ)} (w(q, y) + dist_{H_i}(y, C(H_i))) − λ − (x − λ), which equals the weight in Φ_i^x(G) because U_i(x) = U_i(λ). Outgoing edges of the super-vertex receive the minimum weight of the corresponding original edges in both graphs, and all other edges are unchanged. The claim follows.
We point out that Lemma 2 admits an equivalent, more algebraic statement; however, for our purposes, the "operational" statement of the lemma is more useful.
The following lemma characterizes the "jump" between contracting up to a meaningful value λ, versus contracting up to a threshold x > λ which is smaller than the next meaningful value after λ.

Lemma 3 Fix a threshold x ∈ (0, β_i], and let λ be the maximum meaningful value such that λ < x. Then the following two graphs are equal: I. the open hard contraction Φ̊_i^x(G); and II. the graph obtained from the closed hard contraction Φ_i^λ(G) by subtracting x − λ from the weight of all edges incoming to the super-vertex U_i(λ).

Proof Since λ is the largest meaningful value such that λ < x, we have Ů_i(x) = U_i(λ), and therefore the vertex set is the same. Since both graphs are obtained by contractions on G and their super-vertices are the same, it follows that the edge set is also the same. As for the weights, let w′ be the weight function of Φ̊_i^x(G), and let w″ be the weight function of Φ_i^λ(G). An incoming edge (q, s) of the super-vertex has w′(q, s) = min_{y ∈ Ů_i(x)} (w(q, y) + dist_{H_i}(y, C(H_i))) − x, while w″(q, s) = min_{y ∈ U_i(λ)} (w(q, y) + dist_{H_i}(y, C(H_i))) − λ; since Ů_i(x) = U_i(λ), we have w′(q, s) = w″(q, s) − (x − λ). Therefore, after subtracting the weights, the graphs are equal.
We are now ready to show that each step of our meta-algorithm, where we apply a closed hard contraction, reduces the weight of the DMST by exactly the threshold up to which we contract:

Lemma 4 Let H_i be an active component, and let W(G) denote the weight of the DMST of G. Then for any threshold x ∈ [0, β_i], we have W(Φ_i^x(G)) = W(G) − x.

Proof First, we prove that this holds for all meaningful values x_1, . . . , x_k, by induction on j. Then we extend the argument to arbitrary thresholds.
In this lemma, since we are concerned only with the component H_i, we use "distance" to refer to the internal distance dist_{H_i}. Recall that the smallest meaningful value is always x_0 = 0. For the induction base, consider Φ_i^0(G): this is the graph obtained by contracting U_i(0), all super-vertices in H_i that have distance zero to C(H_i). In addition, each vertex in H_i has distance zero from C(H_i), because the vertices of H_i are exactly those vertices that can be reached from C(H_i) by following edges of H, which all have weight zero. Therefore, any two vertices in U_i(0) have zero-weight directed paths between them in both directions. By Observation 5, contracting U_i(0) does not change the weight of the DMST, so the claim holds in this case.
Next, consider the jth meaningful value x_j, j > 0, and let Δ = x_j − x_{j−1}. The induction hypothesis asserts that W(Φ_i^{x_{j−1}}(G)) = W(G) − x_{j−1}. We view the "step" from x_{j−1} to x_j as two sub-steps: first, we move from Φ_i^{x_{j−1}}(G) to the open contraction Φ̊_i^{x_j}(G), using Lemma 3 to show that the weight of the DMST is reduced by Δ; then, we "close" the contraction, moving from Φ̊_i^{x_j}(G) to Φ_i^{x_j}(G), and use Lemma 2 to assert that the weight of the DMST does not change.
To that end, consider first the move from Φ_i^{x_{j−1}}(G) to Φ̊_i^{x_j}(G). Note that by definition, x_{j−1} is the largest meaningful value such that x_{j−1} < x_j. Thus, by Lemma 3, the graph Φ̊_i^{x_j}(G) is obtained from Φ_i^{x_{j−1}}(G) by subtracting Δ = x_j − x_{j−1} from the incoming edges of the super-vertex U_i(x_{j−1}) = Ů_i(x_j). By Observation 6, the minimum weight of an incoming edge of this super-vertex is exactly Δ, so by Observation 4, subtracting Δ reduces the weight of the DMST by exactly Δ.

Next, consider the move from Φ̊_i^{x_j}(G) to Φ_i^{x_j}(G), which merges the nodes of U_i(x_j) \ Ů_i(x_j) into the super-vertex Ů_i(x_j). Note that these are all nodes whose lightest path to C(H_i) has weight exactly x_j, and in Φ̊_i^{x_j}(G) we subtract from the weight of the incoming edge (u, Ů_i(x_j)) exactly x_j, so this edge now has weight zero. Moreover, every node in H_i has (by definition of H_i) a path of H-edges from the cycle C(H_i), and since C(H_i) ⊆ Ů_i(x_j), each node u ∈ U_i(x_j) \ Ů_i(x_j) must have an incoming zero-weight edge from some node of Ů_i(x_j). Hence, after the contraction Φ̊_i^{x_j}(G), the weight of the edge (Ů_i(x_j), u) is set to 0. Together, we see that in Φ̊_i^{x_j}(G), each node u ∈ U_i(x_j) \ Ů_i(x_j) has a zero-weight cycle with the super-vertex Ů_i(x_j). By Observation 5, merging the nodes of U_i(x_j) \ Ů_i(x_j) with the super-vertex Ů_i(x_j) does not change the weight of the DMST. This completes the induction step, and proves that the claim holds for all meaningful values of H_i.
Finally, suppose that x is not a meaningful value. In this case, let λ ≤ x be the largest meaningful value below x, say λ = x_j. Then Φ_i^x(G) is obtained from Φ_i^λ(G) by subtracting x − λ from the incoming edges of the new super-vertex of Φ_i^λ(G), because by Observation 6, there do not exist any vertices with distance x − λ < x_{j+1} − x_j to the super-vertex U_i(x_j) (so no new vertices are contracted). As we already showed, since λ is a meaningful value, we have W(Φ_i^λ(G)) = W(G) − λ; subtracting the additional x − λ reduces the weight of the DMST by exactly x − λ, so W(Φ_i^x(G)) = W(G) − x.

Since we defined Φ_i(G) = Φ_i^{β_i}(G), we immediately obtain the following corollary: W(Φ_i(G)) = W(G) − β_i. The corollary implies that each step of the meta-algorithm reduces the weight of the DMST by exactly the sum of the β_i's for the active components H_i. Recall that after contracting Φ_i(G), the new super-vertex formed by the contraction adds β_i to its local weight variable, W. Therefore, we have:

Lemma 5 At every step t of the meta-algorithm, W(G) = W(G_t) + Σ_{u ∈ G_t} W_t(u).

Proof By induction on t. The base case, t = 1, follows immediately from Observation 4 of Edmonds: we initialize the algorithm by having each node u ∈ V subtract the weight of its minimum-weight incoming edge (v, u) from all its incoming edges, and store this weight in W(u). By Observation 4, for each node u ∈ V this reduces the weight of the DMST by w(v, u) = W(u), so after the initialization we indeed have W(G) = W(G_1) + Σ_{u ∈ G_1} W_1(u). Now suppose the claim holds for step t < T, and consider step t + 1. Let H_1, . . . , H_ℓ be the active components in step t. The graph G_{t+1} is obtained from G_t by performing a sequence of hard contractions, yielding a sequence of intermediate graphs G_t = G^0, G^1, . . . , G^ℓ = G_{t+1}, where G^i = Φ_i(G^{i−1}). We see that weight is never "lost": for each i = 1, . . . , ℓ we have W(G^i) = W(G^{i−1}) − β_i, while the sum of the local weight variables grows by exactly β_i; that is, the local weight updates indeed match the change in the weight of the DMST (the local weights go up by the same amount that the weight of the DMST goes down). All together, we obtain W(G_{t+1}) + Σ_{u ∈ G_{t+1}} W_{t+1}(u) = W(G_t) + Σ_{u ∈ G_t} W_t(u) = W(G), where the last step uses the induction hypothesis for step t.

The correctness of the meta-algorithm follows:

Theorem 7 Let G be a graph with non-negative weights, and let G′ be the graph after the final iteration of the meta-algorithm on G. Then W(G) = Σ_{u ∈ G′} W(u).
Proof The meta-algorithm halts when the graph induced by the edges H has only one weakly-connected component-the root component. We maintain the invariant that every (super-)vertex except the root has exactly one incoming edge in H, and the root has no incoming edges; because all (super-)vertices of G′ are in the same weakly-connected component of H as the root, they are each reachable from the root by following edges of H, which have weight zero. We therefore have W(G′) = 0. By Lemma 5, W(G) = W(G′) + Σ_{u ∈ G′} W(u) = Σ_{u ∈ G′} W(u), and the claim follows.

Soft contractions
Since we cannot contract vertices of the communication network, we replace hard contractions with soft contractions, which have the same effect but change only the weight function and the super-vertex mapping. We give a mapping between "the real network graph" and "the meta-graph of super-vertices", such that the mapping is preserved when we apply a soft contraction to the real graph and a hard contraction to the meta-graph. We refer to the nodes and edges of the "real network graph" as physical nodes and edges, respectively. Let G = (V, E, w) be a graph, and let S ⊆ P(V) be a partition of the nodes of V into super-vertices. Given u ∈ V, let s(u) ∈ S denote the super-vertex s ∈ S such that u ∈ s. In the other direction, given a set of super-vertices Z ⊆ S, let Z̄ = ∪Z denote the set of physical vertices mapped into Z.
The meta-graph induced by G and S is the graph G_S = (S, E_S, w_S), where the weight w_S(s, s′) of a meta-edge is the minimum weight w(u, u′) over all physical edges with u ∈ s and u′ ∈ s′. (Recall that we adopted the convention that edges that are not in E are assigned infinite weight, so w_S is well-defined: if super-vertices s, s′ have no edges between them, then the weight of the edge (s, s′) is set to ∞.) In the remainder of this section, we let G = (V, E, w, S) denote an annotated graph-the graph, together with a partition S ⊆ P(V) into super-vertices. We say that G is proper if it has the property that each super-vertex U ∈ S induces a connected subgraph on G, and moreover, any vertex of U can reach any other vertex of U by a zero-weight directed path inside U. Note the following property of the mapping:

Observation 8 Let G = (V, E, w, S) be proper, let U, U′ ∈ S be super-vertices, and let X ⊆ S be a set of super-vertices such that U, U′ ∈ X. Then for any physical vertices u ∈ U, u′ ∈ U′ we have dist_X̄(u, u′) = dist_X(U, U′), where the first distance is in the graph G and the second in the meta-graph G_S.
Proof Let u = u_0, . . . , u_ℓ = u′ be a lightest path inside X̄ from u to u′ in G, and let U = U_0, . . . , U_k = U′ be the corresponding path inside X in G_S, obtained by taking s(u_0), . . . , s(u_ℓ) and omitting self-loops. Then the weight of U_0, . . . , U_k is at most the weight of u_0, . . . , u_ℓ, because the weight of an edge between two super-vertices is defined to be the weight of the lightest edge between any two vertices inside them. Therefore, dist_X(U, U′) ≤ dist_X̄(u, u′).
For the other direction, we now let U = U_0, . . . , U_k = U′ be a lightest path from U to U′ inside X in G_S. Construct a path π from u to u′ inside X̄ in G, as follows: π consists of segments, π = π_1, . . . , π_k, τ, where the cost of each segment π_i is the weight in G_S of the edge (U_{i−1}, U_i), and τ has weight 0. Each segment π_i ends at a node v_i ∈ U_i; for convenience, define v_0 = u. The segments are defined inductively: for each i = 1, . . . , k, let (z_{i−1}, z_i) be a lightest edge from any node z_{i−1} ∈ U_{i−1} to any node z_i ∈ U_i. Then by definition, w(z_{i−1}, z_i) = w_{G_S}(U_{i−1}, U_i). In segment π_i, starting from the end point v_{i−1} ∈ U_{i−1} of the previous segment (or the initial point v_0 ∈ U_0, if i = 1), we move inside U_{i−1} to node z_{i−1} (such a path exists, and has cost 0, because G is proper), then cross the edge (z_{i−1}, z_i) (paying w_{G_S}(U_{i−1}, U_i)), and stop at z_i (i.e., we define v_i = z_i). Finally, the last segment τ moves inside U_k from the vertex v_k where π_k ended, to node u′; this, too, has cost 0. Put together, we see that there is a path from u to u′ with the same cost as U_0, . . . , U_k, and therefore dist_X(U, U′) ≥ dist_X̄(u, u′).
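The meta-graph construction used throughout this section can be sketched in a few lines (a hypothetical representation: a weight dictionary and a list of disjoint vertex sets; all names are assumptions):

```python
def build_meta_graph(edges, partition):
    """Meta-graph G_S induced by a partition into super-vertices:
    the weight of a meta-edge is the minimum weight of any physical
    edge between the two super-vertices. Absent meta-edges are simply
    omitted (i.e., implicitly infinite), matching the convention that
    missing edges have infinite weight."""
    sid = {u: i for i, part in enumerate(partition) for u in part}
    meta = {}
    for (u, v), w in edges.items():
        s, t = sid[u], sid[v]
        if s != t:  # self-loops inside a super-vertex are dropped
            meta[(s, t)] = min(w, meta.get((s, t), float("inf")))
    return meta, sid
```

For example, merging {a, b} into one super-vertex turns the edges a→c (weight 4) and b→c (weight 2) into a single meta-edge of weight 2.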
Intuitively, a soft contraction, which we denote σ_i(G), is the same as a hard contraction meta-step, but instead of merging vertices, it simply zeroes out the weight of the edges between them.

Observation 9 If G is proper, then the soft contraction σ_i(G) is also proper.
We prove that the effect of a soft contraction is the same as the corresponding hard contraction in the meta-graph:

Lemma 6 If F = σ_i(G), then F_S = Φ_i(G_S).

Proof Let F = σ_i(G). By definition of the soft contraction, the vertices of the meta-graph F_S are exactly those of Φ_i(G_S), and hence the edges are also the same. It remains to show that the weights are the same.
The only weights that change from G to F = σ_i(G), or from G_S to Φ_i(G_S), are those that touch vertices of U_i(β_i) (respectively, super-vertices contained in U_i(β_i)). Thus, let Z be the new super-vertex that replaces the super-vertices of U_i(β_i) in Φ_i(G_S), and consider an edge (U, V) ∈ E(F_S) such that: -U = Z (i.e., an outgoing edge of the new super-vertex): in Φ_i(G_S), the weight of this edge is set to the minimum weight of any outgoing edge from a super-vertex in U_i(β_i) to V before the contraction. In F, we do not change the weight of any edge outgoing from U_i(β_i), and therefore, in F_S, the weight of the edge (Z, V) is the minimum weight of any physical edge (z, v) where z ∈ U_i(β_i), v ∈ V. By definition of the meta-graph, the weights agree. -V = Z (i.e., an incoming edge of the new super-vertex): in Φ_i(G_S), the weight of this edge is set to the minimum, over all z ∈ Z, of the entering distance through z, minus β_i; the soft contraction assigns the corresponding physical edges exactly these adjusted weights, so the weights again agree, because the weight of a meta-edge linking two super-vertices is defined as the minimum weight of any edge between two vertices inside them.
We point out that soft contractions in different active components can be performed in parallel: if the meta-graph G_S has active components H_1, . . . , H_ℓ, the meta-algorithm performed the hard contractions Φ_1(G_S), . . . , Φ_ℓ(G_S) in sequence, because they "touched" the same edges-incoming and outgoing edges of the active components (although it is not hard to see that the operations commute, so the order does not matter). Incoming edges are modified "for a good reason": we must subtract from each incoming edge the weight of the minimum-weight entering path into H_i. Outgoing edges are only affected for a syntactic reason: when we physically merge two super-vertices U, U′ ∈ H_i, we need to also merge any outgoing edges that have the same target; e.g., two edges (U, V) and (U′, V) are replaced by a single edge whose weight is the lighter of the two.

Implementation in CONGEST
In this section we explain how we translate our meta-algorithm to the CONGEST model with bidirectional communication and asymmetric weights. We start by introducing the main ingredients that go into the implementation.
The meta-graph and the physical graph. We refer to the "real" nodes of the communication network as physical vertices or physical nodes. Super-vertices are simply sets of physical vertices, but each super-vertex has a unique identifier, which is the ID of some physical vertex in it. We often conflate a super-vertex with its ID. Let S be the set of all super-vertex IDs (as we said, these are simply IDs from V, but for clarity we use different notation). During the run of our algorithm, each physical vertex v keeps track of sId(v), the ID of the super-vertex that contains it in the meta-graph (i.e., in the graph of super-vertices). Node v also knows which of its physical edges correspond to meta-edges in H: that is, for each physical edge (v, u), node v knows whether or not (sId(v), sId(u)) ∈ H.
Given a physical network graph G = (V, E, w) and a mapping sId : V → S of physical nodes onto super-vertices, the meta-graph that corresponds to G and sId is a multigraph, where two super-vertices S_1, S_2 ∈ S are connected by all the edges that connect physical vertices (u, v) ∈ E such that u ∈ S_1, v ∈ S_2. (Although our modified algorithm above is stated for graphs rather than multigraphs, it is easy to see that its correctness translates immediately to multigraphs as well.) For each super-vertex S ∈ S, there is a single incoming meta-edge (T, S) in H (recall that all nodes have in-degree exactly 1 in H). The meta-edge (T, S) may correspond to many physical edges; the algorithm chooses one such edge, (u, v) ∈ E such that u ∈ T and v ∈ S, and defines entry(S) = v to be "the physical entry-point of S". Small and large components. As is common in many distributed MST algorithms, after we perform several meta-steps of the algorithm, some super-vertices may become so large that we cannot afford for their physical nodes to communicate with each other directly. We resolve this difficulty by handling "small" and "large" components differently, an approach which is used in the earliest MST algorithms in CONGEST [17] and in many of the following works (e.g., [11]), and which was made into a formal framework in the low-congestion shortcuts framework of [21]: super-vertices are classified into "small" super-vertices, which comprise at most √n physical nodes, and "large" super-vertices, comprising more than √n nodes. The small super-vertices are small enough that we can compute on them directly (in parallel). As for large super-vertices, there are at most √n of them, and the entire network helps them carry out their computation. For example, suppose we have large super-vertices S_1, . . . , S_k, and for each super-vertex S_i, there is some value x_i that must be learned by every physical vertex v ∈ S_i.
To accomplish this, we will propagate all the pairs (sId(S 1 ), x 1 ), . . . , (sId(S k ), x k ) throughout the entire network, and each physical vertex v ∈ S i will pick out the value x i that is paired with its super-vertex sId(v).
Like super-vertices, the active components may also grow too large, so we also use the low-congestion shortcuts framework to communicate across them. We remark that unlike undirected MST, in our case there is a distinction between a super-vertex and an active component; both are "meta-entities" that the algorithm needs to handle in parallel, but each active component consists of many super-vertices, some large and some small. This presents some complications compared to the undirected case. For example, in distributed implementations of Borůvka's undirected MST algorithm, there are super-vertices, but there is no notion of an "active component"-the algorithm simply merges more and more vertices, until there is only one super-vertex in the graph.
Three particularly useful procedures that use the large component/small component paradigm are the following. -ClassifySize: each physical node learns whether the active component to which it belongs is "large" (more than √n physical vertices) or "small" (at most √n physical vertices). -LearnMin: each active component computes the minimum of input values held by its physical nodes, and every node of the component learns this minimum. -BroadcastToNetwork: assume that each physical node v has a set of values X_v, possibly empty, such that the total number of distinct values in the network (i.e., |∪_{v∈V} X_v|) is "not too large" (in our case, O(√n) distinct values). Then BroadcastToNetwork uses a standard pipelining technique [22] to disseminate all values to all nodes in the network, so that at the end, each vertex v knows ∪_{u∈V} X_u.
The procedures LearnMin and ClassifySize are special cases of Lemma 16 from [21], and can be implemented in O(√n + D) rounds. A full description of ClassifySize, LearnMin, BroadcastToNetwork and other useful basic procedures used in our algorithm is presented in Appendix A.

Centers.
A key part of our algorithm is concerned with finding some super-vertex that lies on the cycle C(H i ) of an active component. When we begin this part of the algorithm, each super-vertex knows its incoming active edge in H , but we do not yet know which super-vertices are on the cycle; the cycle can be very long, as it can consist of any number of supervertices, themselves comprising many physical vertices.
To find a super-vertex on the cycle, we "chop up" the cycle into more manageable parts: we select a center set, a set of O(√n) super-vertices (always including the root super-vertex), with the property that for any H-path S_1, . . . , S_k of super-vertices, if the total number of physical vertices in S_1, . . . , S_k is at least √n, then at least one super-vertex S_i is a center.
Centers are often used in shortest-path computations (e.g., [14,19,31] in the CONGEST model, and many other examples in dynamic algorithms and distance oracles), but here we use them in a non-standard way: we construct a center graph representing the reachability relation between centers using edges of H , and use the center graph to find the cycle C(H i ) and determine which super-vertices are reachable from it along edges of H .

Running SSSP on many disjoint components in parallel.
One step of our algorithm is concerned with finding single-source shortest paths (SSSP) inside each active component, in parallel: to compute the "entering distances" β_i, after finding some vertex c(H_i) that lies on the cycle C(H_i) of each active component H_i, we must find the distance from each vertex of H_i to c(H_i), in parallel for all components H_i. It is tempting to try to solve this by running in parallel many instances of an SSSP algorithm, one inside each active component; these instances would not interfere with one another, because they do not use any common edges (each instance is "confined" inside an active component). However, the active components may have large diameters, as they are each an arbitrary subgraph of the original graph; this could cause the SSSP instances to take too long.
Instead, we use the following technique, which allows us to simultaneously solve all the SSSP instances at the cost of solving one SSSP instance on the original graph G: Lemma 7 Suppose we are given a partition of the vertices V into sets A 1 , . . . , A k such that each A i induces a connected subgraph of G. Let a 1 ∈ A 1 , . . . , a k ∈ A k be arbitrary vertices, one from each component. Moreover, assume there is a special node r that has a directed path to every other node in G, and its ID is known to every node in the network.

Then in O(SSSP(2n, D + 1)) rounds, every vertex v in each component A_i can learn dist_{A_i}(a_i, v).
Proof The idea is to construct a "virtual graph" G′, such that from a shortest-path tree in G′ we can extract the shortest-path tree with source a_i inside each component A_i. Then we show that the real network G can simulate the virtual graph G′.
Let A = {a_1, . . . , a_k}. We define the virtual graph G′, on which we will eventually simulate SSSP, as follows: G′ contains all vertices of G and every edge e ∈ E that has both its endpoints in the same component A_i. In addition, for each vertex v ∈ V, we add a "shadow vertex" v′, and for each original edge (u, v) ∈ E we add a "shadow edge" (u′, v′), with weight zero. Next, for each source a_i, we add a zero-weight edge (a_i′, a_i). Finally, we add a set of edges with weight nW + 1 to G′: this set includes every edge e ∈ E whose endpoints lie in two distinct components, and a bidirectional edge between every vertex v ∈ V and its "shadow vertex" v′. Intuitively, these edges behave like infinite-weight edges as far as we are concerned, because the shortest paths we are interested in will never include them (there is always a path of weight at most nW between any two vertices of a given component A_i).
Note that G' can be simulated efficiently by G, by having each vertex v simulate itself and its shadow copy v' (even in the directed communication model).
The number of vertices in G' is 2n: n real vertices, and n "shadow vertices". The diameter of G' is at most D + 1: between any pair of real vertices in G' there is a path of length at most D (the same path that exists in G), and the same for any pair of shadow vertices (a copy of the path that exists in G). And since every real vertex is connected to its shadow vertex by a bidirectional edge, there is a path of length at most D + 1 between any real vertex and any shadow vertex, or vice-versa.
It remains to show that the shortest paths from r' in G' also yield shortest paths from each source a 1 , . . . , a k : we prove that for each component A i and each vertex v ∈ A i , dist G' (r', v) = dist A i (a i , v). We can ignore edges of weight nW + 1, as these edges will never be traversed by a lightest path: we assume that G is strongly connected, so there is always some path of weight at most nW between any two nodes of G'.
Observe that in G', once a path traverses from a shadow vertex v' to a real vertex v, it can never cross back to any shadow vertex, as G' has no edges directed from real vertices to shadow vertices (other than the weight-(nW + 1) edges, which we ignore). Moreover, the active components are vertex- and edge-disjoint, and the source a i is the only vertex in each component A i with an incoming edge from a shadow vertex. Therefore, for a vertex v ∈ A i , any path from r' (which is a shadow vertex) to v (a real vertex) must enter A i through a i , and hence dist G' (r', v) ≥ dist A i (a i , v). On the other hand, observe that in G' there is a shadow path from r' to a' i using only shadow edges, which all have weight zero. (This path is the "shadow copy" of the path in G from r to a i ; we assumed that r has a directed path to all nodes in G.) Therefore, the path that first moves from r' to a' i using shadow edges, then crosses to the real node a i (at zero cost), and finally moves inside A i from a i to v, has weight dist A i (a i , v). This shows that dist G' (r', v) ≤ dist A i (a i , v), completing the proof.

Overview of the algorithm
At the beginning of the algorithm, each node v ≠ r sets W (v) to be the weight of its minimum-weight incoming edge, deducts that weight from all its incoming edges, and adds a single zero-weight incoming edge to H (which is initially empty).
Following this initialization, the algorithm runs in O(log n) iterations. At the beginning of each iteration, each physical node v ∈ V knows:
- An identifier sId(v) for its super-vertex (initially, sId(v) = v),
- Which of its edges correspond to meta-edges in H ,
- Whether or not it is part of the root component.
However, at the beginning of an iteration, physical nodes do not know which active component they currently belong to, because the previous iteration implicitly merged some active components with others, by adding edges to H (recall that the active components are the connected components of H ). The first step of each iteration assigns physical nodes to their current active components.
In each iteration, the network soft-contracts the set U i (β i ) of each active component H i , and updates the weights and super-vertex identifiers of the physical vertices accordingly. In order to compute this soft contraction, for each active component H i we must compute β i , find the set U i (β i ) of nodes with distance ≤ β i to the cycle C(H i ), and then contract this set. Then we need to find a minimum-weight incoming edge of the new super-vertex U i (β i ) that we just contracted, and add it to H , causing the active component H i to merge with another active component.
As we said above, nodes do not necessarily know which active component they belong to at any given moment; the first part of each iteration of our algorithm is concerned with finding the current active components, after some of them were merged at the end of the previous iteration. Nevertheless, it is convenient to think of the algorithm as "operating in parallel" on all the active components.
Each iteration proceeds as follows, in parallel for each active component H i (see Algorithm 3 in the appendix): (1) Finding a cycle node: we find some super-vertex c(H i ) ∈ C(H i ) that lies on the cycle of H i , and disseminate the ID of c(H i ) to all physical nodes in H i . In particular, we must determine which super-vertices belong to H i (as we said, this is not known at the beginning of the iteration). This procedure is described in Sect. 6. The subsequent steps compute β i , soft-contract the set U i (β i ), and add a minimum-weight incoming edge of the new super-vertex to H ; whenever a node subtracts weight from its incoming edges, it adds the subtracted amount to W (v), so that we remember the total weight subtracted throughout the run of the algorithm.
After O(log n) iterations, no active components remain, and we have only the root component. We now compute a BFS spanning tree of the network graph, and use it to sum the weights W (v) that we subtracted during the run. The root of the DMST returns this value as the weight of the DMST.
Next, we expand upon the first step of the algorithm (finding a node on the cycle of the active component, and identifying which physical nodes belong to each active component). For exact pseudo-code of the algorithm and its sub-procedures, we refer the reader to Appendix B.

Finding a super-vertex on the cycle, and identifying the active component
In this section we show how to find, for each active component H i , some super-vertex c(H i ) on the cycle C(H i ). We must then disseminate the identifier of c(H i ) to all physical vertices inside H i . When we begin this part of the algorithm, the physical nodes know which of their edges are in H , but they do not know which active component (i.e., which weakly-connected components of H ) they belong to. Part of our goal is to identify the boundaries of the active components, in preparation for finding a minimum-distance incoming edge of each active component; this is accomplished by disseminating c(H i ) to all nodes that can be reached from the cycle C(H i ) along paths of H -edges. Thus, c(H i ) serves as an active component ID, which all physical nodes of H i agree on.
As we said above, in order to identify long cycles and paths, we cut them into shorter pieces by choosing a set of centers. Formally, we need the following property:

Definition 4 A set of super-vertices T ⊆ S, which includes the root super-vertex, is said to be a good center set if |T | ≤ 4 √ n, and for any H -path S 1 , . . . , S k such that S 1 , . . . , S k together contain at least √ n physical vertices, T includes some super-vertex S i .
A good center set can be constructed deterministically in a similar manner to the star-decomposition of [21], or using the fragment joining of [11]. Details regarding this construction can be found in Sect. 6.3.
In the sequel we assume that we have such a set, Centers.
Recall that in H , every super-vertex has in-degree exactly 1. For a super-vertex S (not necessarily a center), we define pred(S) to be the first center we reach by starting from S and traversing backwards along reverse H -edges. (Note that a super-vertex can be its own predecessor, if it is a center and is part of a directed cycle in H that includes no other centers.) The center graph H * is the graph induced by pred(·). It has the following properties: (1) Like H , the center graph H * also has in-degree 1, except for the root (always selected as a center), which has no incoming edges. (2) If H i includes a center, then H * i is a weakly-connected component of H * .
(3) Whenever C(H i ) includes at least one center, H * i contains a non-empty cycle C(H * i ) (possibly one center with a self-loop), whose vertices are exactly the centers from C(H i ).
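The pred(·) map can be sketched centrally as follows. This is a minimal sketch, assuming a hypothetical in_edge map that gives each super-vertex its unique H -predecessor (None for the root); the distributed computation of pred is described later in this section.

```python
def pred(S, in_edge, centers):
    """pred(S): the first center reached by traversing backwards along reverse
    H-edges from S. in_edge[v] is v's unique H-predecessor (None for the root).
    Returns None if no center is reachable backwards (which cannot happen
    when the centers form a good center set)."""
    cur, seen = in_edge.get(S), set()
    while cur is not None and cur not in seen:
        if cur in centers:
            return cur
        seen.add(cur)
        cur = in_edge.get(cur)
    return None

# A directed cycle a -> b -> c -> a in H, with b the only center:
in_edge = {'a': 'c', 'b': 'a', 'c': 'b'}
print(pred('a', in_edge, {'b'}))   # prints b
print(pred('b', in_edge, {'b'}))   # prints b: a center can be its own predecessor
```

The second call illustrates property (3): a center on a cycle with no other centers gets a self-loop in the center graph.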
Finding c(H i ). After setting up the centers and the center graph, to find some super-vertex from the cycle C(H i ), we divide into three cases, depending on whether the active component H i and its cycle C(H i ) include more than √ n physical nodes or not.
I. H i is a "small component", including at most √ n physical nodes: then in particular, the physical size of the cycle C(H i ) does not exceed √ n. We can find C(H i ) by having each super-vertex start a forward-BFS along the edges of H for √ n rounds. This is handled by procedure FindSmallCycles, which is described in Sect. 6.2.1.
II. H i is "large" (more than √ n nodes), but the cycle C(H i ) is "small" (at most √ n physical nodes): in this case, procedure FindSmallCycles still selects some super-vertex c(H i ) ∈ C(H i ) just as above. However, we cannot afford to disseminate the ID of c(H i ) throughout H i by broadcasting it, because H i is too large. Instead, we add c(H i ) to the center set Centers, and handle its dissemination as described in Sect. 6.2.2.
III. C(H i ) is "large": then C(H i ) includes at least one center, and we can identify C(H i ) by examining the center graph H * and looking for the corresponding cycle there.
In cases (II) and (III), after FindSmallCycles is called, C(H i ) includes at least one center: either it was there before, or if the cycle was too small, we added some center in FindSmallCycles. Therefore, the component H * i that corresponds to H i in the center graph contains a cycle C(H * i ).

FindSmallCycles: selecting a super-vertex from a small cycle
Let H i be an active component with a small cycle, | C(H i )| ≤ √ n. Note that initially the network nodes do not know if | C(H i )| ≤ √ n or not; the procedure we describe here will succeed if it is, and otherwise it may or may not fail. We make sure to maintain consistency: either all nodes of H i set their cId to the ID of some super-vertex on the cycle, or all vertices of H i have cId = ⊥.
We start out by performing a priority BFS for √ n rounds along the edges of H : each physical vertex maintains a local variable minId, storing the smallest super-vertex ID received so far. For √ n rounds, nodes send their minId value to their neighbors in the same super-vertex, as well as across edges in H , and update minId to the smallest value they receive. This is called the propagation phase.
Let c = min C(H i ) be the smallest super-vertex on the cycle of H i . If indeed | C(H i )| ≤ √ n, then √ n rounds suffice for c's BFS token to traverse the cycle and arrive back at the entry vertex of c. Therefore, after the propagation phase, we will have minId(entry(c)) = c = sId(entry(c)), and at this point, c is selected as the active component ID. Our task now is to inform the rest of H i and have the physical nodes in H i set their cId to c.
Note that c can be selected as the active component ID even if | C(H i )| > √ n (if c is large but the rest of the cycle is small). Therefore in the sequel we do not assume that | C(H i )| ≤ √ n. The first step is to inform all physical vertices in c that c was selected; for this purpose we use procedure LearnMin with the entry vertex of each super-vertex passing in its minId value, and the other vertices passing in ∞. The edges used in this call to LearnMin are internal super-vertex edges. Following the call to LearnMin, all physical vertices of each super-vertex s ∈ H i agree on the minId of entry(s). They now know whether s was selected as the active component ID (i.e., whether minId(entry(s)) = s, matching their sId).
Next, we call ClassifySize to determine whether the physical size of H i exceeds √ n, and split into cases: if there are no more than √ n physical vertices in H i , then the physical vertices of the active component ID c broadcast the ID of c, and this is forwarded inside H i for √ n rounds. All physical vertices that receive c set their cId to c and forward c as well. Otherwise, if there are more than √ n physical vertices in H i , then instead, super-vertex c marks itself as a center. We leave the dissemination of the active component ID to the next part, HandleLargeComponents.
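The propagation phase can be sketched at super-vertex granularity as follows. This centralized round-by-round simulation (with a hypothetical succ map of outgoing H -edges) abstracts away the physical-node level and the subsequent LearnMin and ClassifySize steps.

```python
import math

def propagation_phase(succ, n):
    """For sqrt(n) rounds, each super-vertex forwards the smallest super-vertex
    ID it has seen so far along its outgoing H-edges (succ[s] = H-successors).
    A super-vertex that receives its own ID back lies on its component's cycle
    and selects itself as the active-component ID."""
    min_id = {s: s for s in succ}
    selected = set()
    for _ in range(math.isqrt(n)):
        received = {s: [] for s in succ}
        for s, outs in succ.items():
            for t in outs:
                received[t].append(min_id[s])
        for s in succ:
            if s in received[s]:
                selected.add(s)        # own ID came back around the cycle
            min_id[s] = min([min_id[s]] + received[s])
    return selected

# Cycle 2 -> 5 -> 3 -> 2 in H, with a tree edge 3 -> 7 hanging off the cycle:
succ = {2: [5], 5: [3], 3: [2, 7], 7: []}
print(propagation_phase(succ, 9))   # prints {2}: the minimum-ID super-vertex on the cycle
```

Note that only the minimum ID of the cycle survives the min-updates long enough to travel all the way around, which is why exactly one super-vertex selects itself.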

Procedure HandleLargeComponents
In this procedure we handle active components H i with physical size at least √ n. When we call HandleLargeComponents we are guaranteed that C(H i ) contains at least one center: recall that before calling HandleLargeComponents we already set up centers (SetupCenters) and identified small cycles (FindSmallCycles). If the physical size of C(H i ) is at least √ n, then this is guaranteed by the definition of a good center set (Definition 4), and otherwise some super-vertex of C(H i ) was added as a center in procedure FindSmallCycles.
Even after FindSmallCycles adds some centers, the center graph still has O( √ n) vertices, because only active components with physical size > √ n add a new center, and there are at most √ n such components. Since the center graph has in-degree at most 1, it has O( √ n) edges. Therefore we can afford to disseminate the entire center graph throughout the network.
We learn and propagate H * as follows: each center (or rather, all the physical nodes in it) sends its ID forward along H -edges, and this ID is forwarded by all non-center supervertices (again, by their physical nodes, and the ID is also forwarded along internal super-vertex edges). Center IDs are forwarded for at most √ n + 1 rounds. When the entry node v of a super-vertex s receives a center ID c along its incoming H -edge, it sets enterId(v) = c. If s is a center, then v stops the propagation of c (it does not send it on to its neighbors). If s is not a center, then v continues the propagation of c by sending it to its same-super-vertex neighbors and along outgoing H -edges.
After √ n + 1 rounds, for each super-vertex s except the root (which has no predecessor), we have enterId(entry(s)) = pred(s). We call LearnMin to have the entry vertex disseminate the value of pred(s) among all physical nodes of s: the entry vertex entry(s) passes to LearnMin the value of pred(s), and all other vertices inside s pass in ∞.
Next, we call BroadcastToNetwork to disseminate the edges (pred(c), c) of all centers c to the entire network, so that every node learns the center graph H * .

Correctness.

Lemma 8 Let C be a cycle in the center graph. Then there is an i such that C ⊆ C(H i ). Conversely, if C(H i ) is of physical size ≥ √ n, then there is exactly one cycle C containing all centers of C(H i ).
Proof Recall that IDs are propagated only on internal super-vertex edges and outgoing edges. As the super-vertices of an active component form a cycle with trees oriented away from the cycle, an ID from outside the cycle cannot reach the cycle. Moreover, since the active components are vertex-disjoint, and there is no edge in H connecting two active components, an ID from one active component cannot reach another active component. Therefore, if a cycle is formed in the center graph, it must be a subset of the super-vertices of some active component H i .
If C(H i ) is of physical size at least √ n, then there is at least one center in C(H i ); indeed, for each segment of super-vertices of physical size ≥ √ n in C(H i ) there is at least one center. As IDs are only propagated forward, a center may only receive an ID of another center in the same active component. Moreover, the predecessor center in C(H i ) is at physical distance at most √ n, and therefore in HandleLargeComponents each center receives the ID of its predecessor center and propagates it to the entire network. In this case, the centers of C(H i ) form a cycle.

Lemma 9
After the procedure terminates, for each active component H i , exactly one super-vertex selects itself to become the active component ID, and each node v ∈ H i knows that ID.
Proof Let H i be an active component. If the cycle C(H i ) is of physical size at most √ n, let v min be the super-vertex with the minimum ID in C(H i ). In sub-procedure FindSmallCycles, v min receives its own ID and selects itself. No other vertex on the cycle selects itself, as every such vertex at some point receives the ID of v min , and no vertex outside the active component's cycle can receive its own ID, as IDs are propagated only on outgoing edges of the super-vertices. If the component is of physical size ≤ √ n, then v min propagates its ID to the entire component, and otherwise it adds itself to the set of centers, with a self loop.
If the cycle is of physical size larger than √ n, then by Lemma 8 there is a cycle in the center graph which contains all centers of C(H i ), and it is the only cycle to contain centers from C(H i ).
Overall, there is exactly one cycle in the center graph for each active component whose cycle has physical size ≥ √ n. Taking the minimum ID on each such cycle as the active component ID, each node in such a component knows a predecessor center, and therefore knows which cycle contains its new active component ID.

Choosing centers deterministically
In this section we show how a good center set can be found deterministically in O(( √ n + D) log n log * n) rounds. (We note that this task can be accomplished trivially using randomization, by having every physical node mark itself with probability Θ(log n/ √ n), and having every super-vertex with a marked physical node join the center set. This ensures w.h.p. both that the center set is of size O( √ n log n), and that every H -path S 1 , . . . , S k with at least √ n physical nodes contains a center.) The idea of the deterministic algorithm is to partition H into a set of directed trees of super-vertices, where each tree is initialized to be a single super-vertex, and over O(log n) iterations, we merge some adjacent trees into larger trees. Once a tree becomes "large enough", its root super-vertex becomes a center, and the tree cannot merge with any other tree. We manage the merging process carefully, to ensure that no tree we create is too deep. The effect is therefore to "chop up" long paths in H and select at least one center on any path of physical size √ n, while not selecting too many centers in total. This procedure is similar to procedures in several other works (e.g., [11,21,25]), but the property we want is slightly non-standard: every path of super-vertices that contains more than √ n physical vertices must include a center. Thus, for the sake of completeness, we give the details of the construction.
The tree graph. Following each merging iteration, we imagine that each tree is contracted into a single node. Trees that are still "relatively small" (size < √ n/4) are active, and still participate; trees that are "large" (size ≥ √ n/4) are inactive-their root is selected as a center, and they will not be merged with other trees. The physical nodes of inactive trees participate only in forwarding the messages of the active tree nodes when required. The graph of "contracted trees" is formally defined as follows.
Definition 6 (The tree graph) Let A ⊆ S be a set of active super-vertices, and let P be a partition of A, such that the subgraph induced by each component T ∈ P on H is a tree (oriented downwards from the root). The tree graph induced by P is a directed graph, where the vertices are P, and there is an edge from T 1 ∈ P to T 2 ∈ P iff some super-vertex in T 1 has an edge in H to the root super-vertex of T 2 . The vertices of the tree graph are called the active trees.

Remark 1
The graph is called a tree graph as every component T ∈ P is a tree, but the graph itself might not be a tree.
Note that the tree graph has in-degree at most 1 (because H does).
When we begin, the tree graph is simply H , with each super-vertex in its own separate active tree. The root of the DMST is marked as a center, and for each active component H i there is a single marked center c(H i ). Each active tree T ∈ P has a tree ID, which is the ID of its root super-vertex. Let tId(s) denote the ID of the tree to which super-vertex s ∈ S belongs, and for a physical vertex v, let tId(v) = tId(sId(v)) be the tree to which v's super-vertex belongs. Finally, let r (T ) denote the root super-vertex of T .
Simulating Cole-Vishkin on the tree graph. To determine which trees are merged in the current iteration we use a variant of Cole and Vishkin's algorithm for 3-coloring oriented pseudo-forests (graphs with in-degree at most 1) [7].
We simulate the execution of Cole-Vishkin's algorithm on the tree graph (Definition 6) to obtain a 3-coloring of the active trees; this is possible because the tree graph has in-degree at most 1. Each active tree T is simulated by all the physical nodes of T : the physical nodes of T keep track of T 's state in each simulated round. We use √ n/4 "real" rounds to simulate one round of Cole-Vishkin on the tree graph.
A round is simulated as follows: consider a directed edge (T 1 , T 2 ) in the tree graph. Let s ∈ T 1 be the super-vertex that has an edge in H to the root r (T 2 ) of T 2 . Then there is a physical vertex v ∈ s that has an edge to the entry vertex entry(r (T 2 )) of r (T 2 ). Node v knows the local state of T 1 at the beginning of the round, and it now computes the message that T 1 should send to T 2 , and sends this message to entry(r (T 2 )).
Recall that each active tree has either zero or one incoming H -edges. If T has an incoming H -edge, then entry(r (T )) receives the message sent along it. Then, entry(r (T )) computes the state of T in the next round, and sends this state to all physical nodes of T . Since the physical size of T is at most √ n/4 when T is active, this requires O( √ n) rounds. (The exception is the tree containing the source.) Eventually, Cole-Vishkin terminates and outputs a color at each active tree T . This color is known to all physical nodes of T , because they know the local state of T in the simulation.
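The color-reduction core of Cole-Vishkin can be illustrated with a minimal centralized Python sketch. This sketch stops at six colors; the standard shift-down phase that reduces six colors to three, and the simulation on the tree graph itself, are omitted.

```python
def cv_reduce(parent, color):
    """One Cole-Vishkin color-reduction step on an oriented pseudo-forest:
    each node finds the lowest bit position i where its color differs from its
    parent's color, and recolors itself 2*i + (bit i of its own color).
    A node with no parent compares against a color differing in bit 0."""
    new = {}
    for v, c in color.items():
        p = parent.get(v)
        pc = color[p] if p is not None else c ^ 1
        diff = c ^ pc
        i = (diff & -diff).bit_length() - 1   # index of the lowest differing bit
        new[v] = 2 * i + ((c >> i) & 1)
    return new

def cole_vishkin(parent, ids):
    """Iterate the reduction until at most six colors remain."""
    color = dict(ids)
    while max(color.values()) > 5:
        color = cv_reduce(parent, color)
    return color

# A directed path 0 <- 1 <- ... <- 9 (parent[v] = v - 1), initial colors = IDs:
parent = {v: (v - 1 if v > 0 else None) for v in range(10)}
color = cole_vishkin(parent, {v: v for v in range(10)})
assert all(color[v] != color[parent[v]] for v in range(1, 10))  # still a proper coloring
```

Each step preserves properness: if a node and its parent produced the same new color, they would agree both on the chosen bit position and on the bit value there, contradicting the choice of a differing bit.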
Merging trees. After computing a 3-coloring of the tree graph we proceed in two steps: first, trees with color 0 merge with all their out-neighbors. Then, trees colored 1 that were not merged with a 0-colored tree in the current iteration merge with all their out-neighbors as well.
The merge is physically carried out by having each leaf s of the "parent tree" T send a message along its outgoing H -edge to the roots of its out-neighbors, informing these nodes that their tree has been merged into the parent tree, and providing them the root's ID, tId(s). The roots of the subsumed trees then inform all the physical nodes in their tree, again using O( √ n) rounds. At the end of the iteration, each tree T checks if | T | ≥ √ n/4 or not, by having all its physical nodes call ClassifySize. If | T | ≥ √ n/4 then the root of T marks itself as a center, and the tree becomes inactive.
In addition, we find every isolated edge or vertex in the active tree graph-that is, every edge (T 1 , T 2 ) such that T 1 has no incoming edges and T 2 has no outgoing edges, and every node T 1 that has no incoming or outgoing edges. (Isolated edges and nodes in the tree graph are easily found by having their root and their leaves check if they have incoming or outgoing H -edges respectively, and inform all physical nodes in the tree that they are isolated, which requires O( √ n) rounds for any active tree.) Every isolated edge is merged into one isolated tree, and if the size of the tree reaches √ n/4, the root becomes a center. Then, every isolated tree (including newly-created ones) becomes inactive.
Correctness. The easiest fact to establish is that the set of super-vertices selected as centers is not too large: except for the root super-vertex, which is always a center, we choose s as a center only if s is the root of a tree T with physical size | T | ≥ √ n/4. There can be at most 4 √ n such trees, each with a unique root. Therefore at most 4 √ n + 1 super-vertices are selected as centers.
Now let us show that the algorithm terminates in a small number of iterations, and that it satisfies the other requirements. First we show that O(log n) iterations suffice for all trees to become inactive, by showing that in each iteration, the length of the longest directed path in the tree graph shrinks by a factor of 6/7:

Lemma 10 Let π be a longest path in the tree graph, and let π' be the path obtained from π after one iteration. Then |π'| ≤ (6/7)|π|.
Proof First consider the cases where |π| ≤ 2. If |π| = 1, then π is an isolated vertex, which will become inactive. If |π| = 2, then π is an isolated edge, which the algorithm identifies and merges into one tree.
If |π| = 3, then there are two cases. Let π = T 1 , T 2 , T 3 . If T 1 is colored 0 or 1, then T 1 will merge with T 2 , as T 1 has no predecessor that might merge with it and prevent it from doing so. Therefore, |π'| ≤ 2 < (6/7) · 3. Otherwise, if T 1 is colored 2, then T 2 is colored either 0 or 1. In this case T 2 merges with T 3 (again, T 1 will not prevent it from doing so, since it is not colored 0), and again |π'| ≤ 2. Now suppose |π| ≥ 4. Divide π into consecutive sub-paths of length 4 each, and a suffix of length at most 3. We show that in each length-4 sub-path, at least two trees merge.
Let T 1 , T 2 , T 3 , T 4 be a consecutive sub-path of π. If T 1 , T 2 or T 3 has color 0, then it will merge with its successor, and we are done. Otherwise, either T 2 or T 3 is colored 1 and does not have a predecessor with color 0; therefore one of them will merge with its successor. Thus each length-4 block shrinks to at most 3 trees; writing |π| = 4q + s with s ≤ 3 (and q ≥ 1), we get |π'| ≤ 3q + s ≤ (6/7)(4q + s) = (6/7)|π|.

Corollary 2
The running time of the center-selection procedure is O(( √ n + D) log n log * n).
Proof By the previous lemma, after log 7/6 (n) < 6 log(n) iterations, all trees become inactive, and we are done. In each iteration, simulating Cole-Vishkin on the tree graph requires O( √ n log * n) rounds.

Now let us show that any sufficiently long physical path passes through a center. First, observe that inside each tree, we do not have long physical paths:

Lemma 11
If T is an active or inactive tree and π is an H -path contained entirely in T , then | π | ≤ (3/4) √ n.
Proof Consider the iteration where T became inactive, as after this point it does not change.
A tree becomes inactive for one of two reasons: either its physical size reaches √ n/4, or the size remains below √ n/4 but the tree becomes isolated. In the second case we are fine. Thus, suppose that T is formed in the current iteration by merging active trees T 1 , . . . , T k with total physical size Σ i | T i | ≥ √ n/4, which makes T become inactive. If the merge was of an isolated edge (T 1 , T 2 ), then because T 1 , T 2 were active at the time of the merge, we have | T 1 |, | T 2 | ≤ √ n/4, so any H -path that can be formed inside the merged tree has physical size at most √ n/2. Otherwise, the merge was of a 0-colored tree subsuming its out-neighbors, or a 1-colored tree doing the same, or both: in the worst case, a tree T 1 with color 0 merges with all its out-neighbors, and then the predecessor of T 1 , a tree T 2 with color 1, merges with the new tree formed by T 1 and its former out-neighbors. Thus, the longest physical path that can be formed by a merger is at most triple the longest physical path that existed at the beginning of the iteration. And since only active trees participate in merges, and active trees have physical size at most √ n/4, the longest physical path that can result from a merger is of length at most (3/4) √ n.

Lemma 12
At the end of the algorithm, for any H -path of super-vertices, π = s 1 , . . . , s k , of physical length | π | ≥ √ n, some super-vertex s i is a center.
Proof Consider the last super-vertex s k of π . Note that since H has in-degree 1, there is only one path of length k ending in s k , that is, π is the unique path obtained by traversing backwards k steps from s k along reverse H -edges.
Let T be the last tree to which s k belonged, at the moment T became inactive and stopped merging with other trees. Then T became inactive either because its physical size reached √ n/4, or because it was isolated and had no active neighbors.
If T became inactive because it became too large, then the root r (T ) of T became a center. By the previous lemma, the physical size of the backwards-H -path from s k to r (T ) is at most (3/4) √ n < √ n ≤ | π |, and therefore π passes through the center r (T ).

Let H 1 , . . . , H k i be the set of active components in iteration i.
By Observation 3, we may assume that from the SSSP algorithm that we used to contract the graph, each node also obtains a parent in an SSSP tree. We require the nodes to store these edges; specifically, each node needs to remember, for each contraction, its edges in the reverse shortest-paths tree from c(H j ) that was computed during the contraction.
Each super-vertex created during the current iteration is unpacked in parallel, and the edges taken into the DMST are the edges described above. When choosing which of their edges to add, the only computation that nodes cannot perform locally is to determine which of their edges participate in the shortest path from v * to c(H (v * )). We handle this using centers, just as we did in Sect. 6.2: if the path is short, we can find it by doing a short BFS from v * on the SSSP tree; and if the path is long, we can use the technique of Sect. 6.3 to shatter the SSSP tree inside every super-vertex into low-diameter components separated by centers, add v * and c(H (v * )) of each unpacked super-vertex as centers, and compute a center graph. In the center graph, centers on the path from v * to c(H (v * )) learn that they are on this path, and using a short BFS all vertices on the path between two such centers can be detected and marked accordingly.

Algorithms for networks with unidirectional communication and the congested clique
In this section we present algorithms for the unidirectional communication model and the Congested Clique model. The algorithms are based on the same meta-algorithm that we presented above, but their implementation is much simpler than the implementation in the bidirectional communication model. This is because in unidirectional-communication CONGEST, we allow ourselves near-linear running time (which is almost tight, as we show in Sect. 10); and in the Congested Clique, the fact that each node can communicate with every other node makes life easier as well. We give a simple algorithm (based on the meta-algorithm) that can be implemented in various distributed models, and assumes the existence of two algorithmic primitives: an SSSP algorithm which can handle directed edges and zero-weight edges, and BroadcastToNetwork, which assumes that each node v is given a set of messages M v such that Σ v∈V |M v | ≤ n, and has every node of the network learn all messages of all nodes. Then we show how to implement these primitives in the unidirectional communication and Congested Clique models.

Overview of the algorithm
For the sake of consistency with the preceding sections, the algorithm we describe below computes a downwards-oriented DMST (this is also the orientation of the DMST computed by Edmonds' original algorithm [10]). However, when working with a unidirectional communication network, we can only compute an upwards-oriented DMST; in this case, we call the algorithm below with reversed edge directions.
At the beginning of the algorithm, each node v ≠ r sets W (v) to be the weight of its minimum-weight incoming edge, deducts that weight from all its incoming edges, and adds a single zero-weight incoming edge to H (which is initially empty).
Subsequently, each iteration of the algorithm proceeds as follows: (1) At the start of each iteration, using the BroadcastToNetwork procedure, each node learns all of the edges of H , as well as the current partition of V into super-vertices and active components.
(2) SoftContract(β(H (v))) // the network updates H and the super-vertices for the next iteration
(3) Locally compute the tree of G j according to H and the SSSP trees of all iterations

O(n)-round implementation in unidirectional networks
In this model we assume that each node can only send messages on its outgoing edges, and knows only the weights of its incoming edges. Accordingly, we output a tree oriented upwards; we do so by applying Algorithm 1 (which gives a downwards tree) with reverse edge directions. As our SSSP primitive, we use Bellman-Ford, a distributed SSSP algorithm that runs in this model in O(n) rounds. As the graph is strongly connected, we can implement the BroadcastToNetwork procedure in O(n) rounds using the BroadcastValue procedure (Algorithm 2 in the appendix). Overall, the round complexity of each iteration is O(n), and therefore the total round complexity of the algorithm is O(n log n).
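The round-synchronous behavior of Bellman-Ford in this model can be sketched as follows: a centralized simulation in which, in each round, every node sends its current distance estimate only on its outgoing edges, matching the unidirectional communication restriction. This is a minimal sketch, not the distributed implementation.

```python
def bellman_ford_rounds(n, out_edges, src):
    """Round-synchronous Bellman-Ford: out_edges is a list of (u, v, w) triples,
    and in each round node u's estimate travels only along its outgoing edges.
    After n-1 rounds every node holds its correct distance from src."""
    INF = float('inf')
    dist = {v: (0 if v == src else INF) for v in range(n)}
    for _ in range(n - 1):
        new = dict(dist)
        for u, v, w in out_edges:
            if dist[u] + w < new[v]:   # v hears u's estimate on the edge (u, v)
                new[v] = dist[u] + w
        dist = new
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford_rounds(4, edges, 0))   # prints {0: 0, 1: 3, 2: 1, 3: 4}
```

Each iteration of the outer loop corresponds to one communication round, so the O(n) round bound is immediate.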

An O(n 1/3 )-round algorithm for the Congested Clique
In the Congested Clique, we use the APSP algorithm for directed graphs of [4], which runs in O(n 1/3 ) rounds.
We can implement BroadcastToNetwork using O(1) rounds in the following simple manner: first, each node u sends the number m u of messages it needs to disseminate to all the nodes in the graph. Recall that we assume u m u ≤ n. Next, the nodes assign to each message a unique ID in the range {1, . . . , n}, by assigning to node 1's messages the IDs 1, . . . , m 1 , to node 2's messages the IDs m 1 +1, . . . , m 1 +m 2 , and so on (we assume here w.l.o.g. that the IDs of the nodes are 1, . . . , n; if this is not the case, in the Congested Clique, renaming is trivial). If the messages of node u were assigned IDs i 1 , . . . , i m u , then node u sends each message i j to node i j . Finally, each node disseminates the message it received in the previous round (if any) to all nodes of the graph. The entire procedure requires three rounds.
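The ID-assignment step above is a prefix sum over the announced message counts. A minimal sketch, assuming (as in the text) that node IDs are 1, . . . , n:

```python
def assign_message_ids(msg_counts):
    """Assign global message IDs as in BroadcastToNetwork: node u's m_u messages
    receive the consecutive IDs following those of nodes 1, ..., u-1, so that
    the message with ID i can be relayed by node i in the next round."""
    ids, start = {}, 1
    for u in sorted(msg_counts):
        m = msg_counts[u]
        ids[u] = list(range(start, start + m))
        start += m
    return ids

# Three nodes announcing 2, 0 and 3 messages respectively:
print(assign_message_ids({1: 2, 2: 0, 3: 3}))   # prints {1: [1, 2], 2: [], 3: [3, 4, 5]}
```

Since Σ m u ≤ n, every message gets a distinct ID in {1, . . . , n}, so each relaying node handles at most one message.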

DMST versus (s, t)-shortest path
We have shown that the DMST problem is no harder than single-source shortest path. In this section we adapt a reduction of Chechik [6] from the sequential setting to CONGEST, showing that distributed DMST is at least as hard as (s, t)-shortest path, where we are given two vertices s, t and must find the shortest directed path from s to t. The reduction holds for all three models we consider in this paper (assuming we work with strongly-connected graphs): it simply modifies the graph on which we want to solve (s, t)-SP, so that any DMST of the modified graph reveals the shortest path from s to t. We take care that the modified graph can be simulated by the original graph without much additional communication.
Given a graph G = (V, E), we define a graph G′ as follows (see Fig. 5): G′ contains all vertices and edges of G, and in addition, for each vertex v ∈ V, we add a "shadow vertex" v′, with a zero-weight edge (v′, v). For each original edge (u, v) we add a "shadow edge" (u′, v′), again with weight zero. Finally, we add the zero-weight edge (t, t′) (where t is the target node).
Observe that all the edges we added to G are either shadow edges or edges incoming into vertices of G, except for the edge (t, t′), which is outgoing from t. Therefore, in G′, we did not create any path from s to t that was not already in G.
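The construction of G′ is mechanical, and can be sketched as follows; the function name and the `("shadow", v)` encoding of shadow vertices are ours.

```python
def shadow_graph(vertices, edges, t):
    """Build G' from G = (vertices, edges) for the DMST reduction.

    edges is a list of (u, v, w) triples.  Shadow vertices are modelled
    as pairs ("shadow", v).  All added edges have weight zero, so an
    s-rooted DMST of G' pays only for a shortest s -> t path of G.
    """
    sh = lambda v: ("shadow", v)
    new_vertices = list(vertices) + [sh(v) for v in vertices]
    new_edges = list(edges)
    new_edges += [(sh(v), v, 0) for v in vertices]          # edges (v', v)
    new_edges += [(sh(u), sh(v), 0) for u, v, _ in edges]   # shadow edges
    new_edges.append((t, sh(t), 0))                         # the edge (t, t')
    return new_vertices, new_edges
```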

Lemma 14 The weight of the DMST of G′ rooted at s equals the weight of the (s, t)-shortest path in G.
Proof Let W be the weight of the DMST of G′ rooted at s, and let d be the weight of the shortest path from s to t in G. It is easy to see that W ≥ d, because the DMST must contain some path from s to t, and in G′ we did not create any path from s to t that was not already in G.
To show that W ≤ d, consider the following spanning tree: take a shortest path π from s to t in G, and add all its edges to the tree. In addition, take the edge (t, t′), and some arbitrary directed spanning tree of the shadow vertices, rooted at t′ and consisting only of shadow edges. (Such a spanning tree exists, because we can take a directed spanning tree of G rooted at t and "copy it" onto the shadow edges.) Finally, for each v ∈ V that is not on π, add the edge (v′, v). The resulting tree is spanning and oriented outwards from s, and its weight is exactly d, because other than the edges of π, it uses only zero-weight edges.

Theorem 10
The asymptotic round complexity of DMST in CONGEST is at least that of (s, t)-shortest path.
Proof Given a DMST algorithm A and a graph G, we can solve (s, t)-shortest path on G by constructing G′ and simulating the execution of A on G′. Each vertex v of G simulates itself and its shadow vertex v′. To simulate one round of A on G′, each vertex sends to its neighbors the messages that it would send under A on its own edges, and also the messages its shadow vertex would send on its edges under A. This increases the communication by only a constant factor.

Lower bound for unidirectional networks
We prove a simple lower bound, showing that DMST requires Ω(n) rounds in networks with directed communication links, where nodes only send messages on outgoing edges, and know only the weights of their incoming edges (i.e., the assumptions under which Algorithm 1 works). This holds even when the strong diameter of the network is 3. The lower bound is proven by reduction from 2-party set disjointness.
The 2-player disjointness problem is defined as follows: we have two players, Alice and Bob, with private inputs X, Y ⊆ [n], respectively. The goal of the players is to determine whether X ∩ Y = ∅. For this purpose they may communicate with one another by sending messages back and forth. The celebrated lower bound of [24,34] shows that even for randomized communication protocols, the players must exchange Ω(n) bits to solve disjointness with constant success probability. We show that given a distributed algorithm for DMST in directed networks, we can construct a 2-party protocol for disjointness, as follows: using their inputs X, Y, Alice and Bob construct a lower bound graph G_{X,Y}, on which they simulate the execution of the DMST algorithm. We show that the simulation does not require much communication, and also that from the output of the DMST algorithm (i.e., from the weight of the DMST of G_{X,Y}) the players can deduce whether X ∩ Y = ∅.
It is convenient to think of the inputs X, Y as the characteristic vectors of the sets, so that X_i = 1 iff i ∈ X, and similarly for Y. The graph G_{X,Y} = (V, E, w) is defined as follows (see Fig. 6): the vertices are a root r, two nodes v_A and v_B, and nodes v_1, ..., v_n. The nodes v_A and v_B each have a single outgoing edge into r, of weight 1, and r has an outgoing edge into each of v_1, ..., v_n (these edges serve only to make the graph strongly connected). Finally, each node v_i has two outgoing edges: the edge (v_i, v_A) with weight X_i + 1, and the edge (v_i, v_B) with weight Y_i + 1.
Observe that G_{X,Y} has strong diameter 3 for any X, Y ⊆ [n]. Next, we show that from the weight of the DMST, we can deduce whether or not X ∩ Y = ∅:

Lemma 15 The weight of the DMST of G_{X,Y} rooted at r is W(G_{X,Y}) = n + 2 + |X ∩ Y|.
Proof Clearly the edges (v_A, r) and (v_B, r) must be taken, as these are the only outgoing edges of v_A, v_B. For the root r, we do not take any outgoing edges.
For each i ∈ {1, ..., n}, we must connect v_i to the tree by choosing either the edge (v_i, v_A), which has weight X_i + 1, or the edge (v_i, v_B), whose weight is Y_i + 1. If i ∈ X ∩ Y, then both edges have weight 2; but if i ∉ X ∩ Y, at least one edge has weight 1, and this edge will be chosen for the DMST. Therefore, the cost of connecting v_i to the DMST is 1 + |X ∩ Y ∩ {i}|.
Together, we see that we pay 2 for connecting v_A, v_B to the DMST, and a total of n + |X ∩ Y| for connecting v_1, ..., v_n.

Corollary 3
The weight of the DMST of G_{X,Y} is n + 2 if and only if X and Y are disjoint.
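The weight calculation can be checked directly: the DMST always pays 1 for each of (v_A, r) and (v_B, r), and each v_i pays the cheaper of its two edges. The following sketch (names ours) computes the weight from the characteristic vectors.

```python
def dmst_weight(X, Y):
    """Weight of the DMST of G_{X,Y} rooted at r, computed directly.

    X and Y are 0/1 characteristic vectors of length n.  The edges
    (v_A, r) and (v_B, r) cost 1 each, and node v_i pays
    min(X_i + 1, Y_i + 1), which is 2 exactly when i is in both sets.
    The total therefore equals n + 2 + |X intersect Y|.
    """
    return 2 + sum(min(x + 1, y + 1) for x, y in zip(X, Y))
```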
It remains to show that Alice and Bob can jointly simulate the execution of a DMST algorithm in G X ,Y , without requiring a lot of communication.

Lemma 16 Let A be an R-round algorithm that runs in unidirectional CONGEST networks, assuming each node can send messages only on outgoing edges, and knows the weights only of its incoming edges. Then Alice and Bob can simulate the execution of A on G_{X,Y} using O(R · log n) bits.
Proof Alice locally simulates all the nodes in V \ {v_B}, and Bob locally simulates the nodes in V \ {v_A}; if A is a randomized algorithm, the players use public randomness to agree on the random choices of each node that they both need to simulate (nodes r, v_1, ..., v_n).
Each round of A is simulated as follows: for all the nodes that Alice needs to simulate, she knows the weights of their incoming edges, as they depend only on X ; similarly for Bob and Y . Thus, given the state of a node at the end of the previous round (or the initial state, in the first round), the players can compute the messages that the node would send in A on each outgoing edge.
Next, the players compute the local state of each node they simulate at the end of the round. To do this, they need to "feed" the node the messages that it receives under A. Since she simulates all nodes except v_B, Alice knows all the messages sent in the network, except for the message sent by node v_B along the edge (v_B, r); to complete the picture, Bob sends her this message, using O(log n) bits. Similarly, Alice sends to Bob the message that node v_A sends along the edge (v_A, r), again using O(log n) bits. The total communication of the players is O(log n) bits per round of the distributed algorithm.
From the previous two lemmas, together with the lower bound of Ω(n) on n-bit disjointness, we immediately obtain the lower bound:

Corollary 4
Computing the weight of the DMST in directed, strongly-connected networks with strong diameter 3 requires Ω(n) rounds.

A Basic procedures
In this section we define useful basic procedures used in the DMST algorithm. Most procedures in this section can be seen as special cases of Lemma 16 in [21], and can be implemented in O(√n + D) rounds in a black-box manner using this lemma.

A.1 ClassifySize
Procedure ClassifySize takes as input a threshold t ∈ N, and a set of edges E′ ⊆ E. The procedure checks, at each node v ∈ V, whether or not the connected component to which v belongs in the subgraph of the network graph induced by E′ has size greater than t. If the component has size at most t, the procedure returns Small, and if the size exceeds t it returns Large.
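A centralized sketch of what ClassifySize computes (the distributed implementation goes through Lemma 17; the function name and union-find representation here are ours):

```python
def classify_size(n, edge_subset, t):
    """For each node, report whether its connected component in the
    subgraph (V, E') induced by edge_subset has more than t nodes
    ("Large") or at most t nodes ("Small")."""
    parent = list(range(n))

    def find(x):
        # Union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_subset:               # merge the endpoints' components
        parent[find(u)] = find(v)
    size = {}
    for v in range(n):                     # count nodes per component root
        r = find(v)
        size[r] = size.get(r, 0) + 1
    return ["Large" if size[find(v)] > t else "Small" for v in range(n)]
```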
This can be implemented using Lemma 17 by taking x_v = 1 and ⊕ to be addition.

A.2 BroadcastValue
Assume that each node u has a set S(u) of values, possibly empty, such that the total number of distinct values in the network is not too large. This simple sub-procedure implements a broadcast of all the values to the network, so that at the end of the procedure, each node knows all the distinct values of all the nodes. A standard pipelining argument [22] shows that if indeed D + |⋃_{u∈V} S(u)| ≤ roundLimit, then roundLimit rounds suffice to propagate all values to the entire network.
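The pipelining argument can be illustrated with a small synchronous simulation (names and the smallest-value-first forwarding rule are our assumptions): in each round, every node forwards one value it knows but has not yet forwarded, and after D + k rounds all k distinct values have reached every node.

```python
def pipelined_broadcast(adj, initial, round_limit):
    """Sketch of BroadcastValue: adj maps each node to its neighbors,
    initial maps each node to its starting set of values.  Per round,
    each node forwards at most one known-but-unforwarded value
    (smallest first, so the values pipeline behind each other).  If
    D + (#distinct values) <= round_limit, every node ends up knowing
    every value."""
    known = {v: set(vals) for v, vals in initial.items()}
    sent = {v: set() for v in initial}
    for _ in range(round_limit):
        outbox = {}
        for v in initial:
            fresh = sorted(known[v] - sent[v])
            if fresh:
                outbox[v] = fresh[0]       # pipeline: one value per round
                sent[v].add(fresh[0])
        for v, msg in outbox.items():      # synchronous delivery
            for u in adj[v]:
                known[u].add(msg)
    return known
```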

A.3 LearnMin
In the LearnMin procedure, we start with a partition P of the nodes into connected components (with respect to the network graph). Let P(v) ∈ P denote the component to which node v belongs. Each node also has an initial value value(v) ∈ D from some domain D. Finally, the procedure takes, as an optional parameter, a total order ≤ with respect to which values are compared. The default for ≤ is the lexicographic order on tuples of real numbers (or just the natural ordering on real numbers, if value is not a tuple).
At the end of the procedure, each node v outputs the minimum value in its component, min {value(u) | u ∈ P(v)}.
This can be implemented using Lemma 17 by taking x_v to be the value of v, and ⊕ to be the min operator.
We also use the following variations on LearnMin, which are essentially syntactic sugar and can be implemented by calling LearnMin appropriately:
- LearnMax: same as LearnMin, but takes the maximum instead of the minimum. It can of course be implemented by calling LearnMin with the order ≥ instead of ≤.
- LearnSingleValue: here, we assume that in each component C ∈ P, there is at most one value that some nodes of C want to disseminate to all nodes of C. Initially, some nodes of C know this value (and all agree on it), and some nodes do not. We call LearnMin: nodes that have a value to disseminate pass in their value, and the other nodes use ⊥. We use an order where any non-⊥ value x satisfies x < ⊥, and among non-⊥ values we order arbitrarily (it does not matter, since at most one non-⊥ value originates in each component).
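The ⊥-sentinel trick behind LearnSingleValue can be sketched centrally as follows (function names are ours; we model ⊥ as positive infinity, which every real value compares below):

```python
import math

BOTTOM = math.inf  # models the sentinel: any real value x satisfies x < BOTTOM


def learn_min(partition, value):
    """Centralized sketch of LearnMin: partition maps each node to its
    component ID, value maps each node to its initial value.  Every
    node learns the minimum value in its component."""
    best = {}
    for v, c in partition.items():
        best[c] = min(best.get(c, BOTTOM), value[v])
    return {v: best[c] for v, c in partition.items()}


def learn_single_value(partition, known):
    """LearnSingleValue as a call to LearnMin: nodes holding the (at
    most one) component value pass it in, the rest pass BOTTOM, which
    loses to any real value under min."""
    value = {v: known.get(v, BOTTOM) for v in partition}
    return learn_min(partition, value)
```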

B Pseudo-code for the DMST algorithm
The code for a node v sometimes uses the values of local variables at v's neighbors, with the implicit understanding that we require one additional round for nodes to send their neighbors these values.
It is assumed that nodes execute the pseudo-code line-by-line. They enter all if-else cases, but execute a case only if it applies to them, and otherwise just passively propagate messages in procedures that involve the entire network, such as ClassifySize or LearnMin.
Two procedures which do not have pseudo-code in this section are SDSPInsideComponent and ChooseCenters. ChooseCenters is the center-choosing procedure described in Sect. 6.3. SDSPInsideComponent is a procedure described in Sect. 6.1, in which each node v learns the distance from itself to a given node c(H_i) on the cycle of the active component H_i to which v belongs.