Message Reduction in the Local Model is a Free Lunch

A new \emph{spanner} construction algorithm is presented, working under the \emph{LOCAL} model with unique edge IDs. Given an $n$-node communication graph, a spanner with a constant stretch and $O (n^{1 + \varepsilon})$ edges (for an arbitrarily small constant $\varepsilon>0$) is constructed in a constant number of rounds sending $O (n^{1 + \varepsilon})$ messages whp. Consequently, we conclude that every $t$-round LOCAL algorithm can be transformed into an $O (t)$-round LOCAL algorithm that sends $O (t \cdot n^{1 + \varepsilon})$ messages whp. This improves upon all previous message-reduction schemes for LOCAL algorithms that incur a $\log^{\Omega (1)} n$ blow-up of the round complexity.


Introduction
What is the minimum number of messages that must be sent by a distributed graph algorithm for solving a certain task? Is there a tradeoff between the message and time complexities of such algorithms? How do the message complexity bounds depend on the exact model assumptions? These questions are among the most fundamental ones in distributed computing, with a vast body of literature dedicated to their resolution.
A graph theoretic concept that plays a key role in this regard is that of spanners. Introduced by Peleg and Ullman [33] (see also [32]), an α-spanner, or a spanner with stretch bound α, of a connected graph G = (V, E) is a (spanning) subgraph H = (V, S) of G where the distance between any two vertices is at most α times their distance in G. More general spanners, called (α, β)-spanners, are also considered, where the spanner distance between any two nodes is at most α times their distance in G plus an additive β-term [17].
Sparse low stretch spanners are known to provide the means to save on message complexity in the LOCAL model [27,31] without a significant increase in the round complexity. This can be done via the following classic simulation technique: Given an n-node communication graph G = (V, E) and a LOCAL algorithm A whose run A(G) on G takes t rounds, (1) construct an α-spanner H = (V, S) of G; and (2) simulate each communication round of A(G) by α communication rounds in H so that a message sent over the edge (u, v) ∈ E under A(G) is now sent over a (u, v)-path of length at most α in H. The crux of this approach is that the simulating algorithm executed in stage (2) runs for αt rounds and sends at most 2αt · |S| messages. Therefore, if α and |S| are 'small', then the simulating algorithm incurs 'good' round and message bounds. In particular, the performance of the simulating algorithm does not depend on the number |E| of edges in the underlying graph G.
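The two stages of this simulation technique can be illustrated with a short Python sketch. The function names (`bfs_dist`, `is_alpha_spanner`, `simulation_cost`) are illustrative assumptions and not part of the original construction; the sketch only checks the stretch condition edge by edge (which suffices for a multiplicative stretch bound) and evaluates the αt round bound and the 2αt · |S| message bound stated above.

```python
from collections import deque

def bfs_dist(adj, src, limit):
    """Hop distances from src in the graph given by `adj`, explored up to `limit` hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue  # do not explore beyond the hop limit
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_alpha_spanner(nodes, g_edges, h_edges, alpha):
    """True if every edge (u, v) of G has a (u, v)-path of length <= alpha in H."""
    adj = {v: set() for v in nodes}
    for u, v in h_edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(v in bfs_dist(adj, u, alpha) for u, v in g_edges)

def simulation_cost(alpha, t, num_spanner_edges):
    """Bounds for stage (2): alpha*t rounds and at most 2*alpha*t*|S| messages."""
    return alpha * t, 2 * alpha * t * num_spanner_edges
```

For instance, taking G to be the 4-cycle and H the path obtained by deleting one of its edges, H is a 3-spanner but not a 2-spanner, and neither bound returned by `simulation_cost` depends on |E|.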
What about the performance of the spanner construction in the 'preprocessing' stage (1) though? A common thread among distributed spanner construction algorithms is that they all send Ω(|E|) messages when running on graph G = (V, E). Consequently, accounting for the messages sent during this preprocessing stage, the overall message complexity of the aforementioned simulation technique includes a seemingly inherent Ω(|E|) term. The following research question, which lies at the heart of distributed message reduction schemes, is therefore left open.
Question 1.1. Given a LOCAL algorithm A whose run A(G) on G takes t rounds, is it possible to simulate A(G) in O(t) rounds while sending only O(n^{1+ε}) messages for an arbitrarily small constant ε > 0, irrespective of the number |E| of edges in G?
This question would be resolved in the affirmative if one could design a LOCAL algorithm that constructs an α-spanner H = (V, S) of G with stretch α = O(1) and |S| = O(n^{1+ε}) edges in O(1) rounds sending O(n^{1+ε}) messages. Despite the vast amount of literature on distributed spanner construction algorithms [5, 9-11, 14, 16, 18, 35], it is still unclear if such a LOCAL spanner construction algorithm exists.
Some progress towards the positive resolution of Question 1.1 has been obtained by Censor-Hillel et al. [8] and Haeupler [22] who introduced techniques for simulating LOCAL algorithms by gossip processes. Using this approach, one can transform any t-round LOCAL algorithm into a LOCAL algorithm that runs in O(t log n + log^2 n) rounds while sending n messages per round [22]. This transformation provides a dramatic message complexity improvement if one is willing to accept algorithms that run for log^{O(1)} n many rounds, e.g., if the bound t on the round complexity of the original algorithm is already in the log^{O(1)} n range. However, if t = log^{o(1)} n, then the gossip based message reduction scheme of [8,22] significantly increases the round complexity and this increase seems to be inherent to that technique.

Definitions and Results
Throughout, we consider a communication network represented by a connected unweighted undirected graph G = (V, E) and denote n = |V|. The nodes of G participate in a distributed algorithm under the (fully synchronous) LOCAL model [27,31] with the following two model assumptions: (i) the nodes know an O(1)-approximate upper bound on log n (equivalently, a poly(n)-approximate upper bound on n) at all times; and (ii) the graph admits unique edge IDs so that the ID of an edge is known to both its endpoints at all times (footnote 2). Other than that, the nodes have no a priori knowledge of G's topology. Our main technical contribution is a new algorithm for constructing sparse spanners, called Sampler, whose guarantees are cast in the following theorem.
Theorem 1.2. Fix integer parameters 1 ≤ k ≤ log log n and 0 ≤ h ≤ log n. Algorithm Sampler constructs an edge set S ⊆ E of size |S| ≤ Õ(n^{1+1/(2^{k+1}−1)}) such that H = (V, S) is an O(3^k)-spanner of G whp (footnotes 3 and 4). The round complexity of Sampler is O(3^k h) and its message complexity is Õ(n^{1+1/(2^{k+1}−1)+(1/h)}) whp.
By setting the parameters k and h so that 1/(2^{k+1} − 1) = 1/h = ε/2 for an arbitrarily small constant ε > 0 and utilizing the aforementioned spanner-based simulation technique, we obtain a message-reduction scheme that transforms any LOCAL algorithm A whose run on G takes t rounds into a (randomized) LOCAL algorithm that runs in O(t) rounds and sends Õ(t n^{1+ε}) messages whp. This resolves Question 1.1 in the affirmative provided that one is willing to tolerate a 1/poly(n) error probability. In fact, we can improve the message reduction scheme even further via the following two-stage process: first, use the α-spanner H = (V, S) constructed by Sampler to simulate the run on G of some off-the-shelf LOCAL algorithm that constructs an α'-spanner H' = (V, S') with a better tradeoff between α' and |S'|; then, use H' to simulate the run of A on G. In Section 6, we show that with the right choice of parameters, this two-stage process leads to the following theorem.
Theorem 1.3. Every distributed task solvable by a t-round LOCAL algorithm can be solved with any one of the following pairs of time and message complexities:
• Õ(t n^{1+2/(2^{γ+1}−1)}) message complexity and O(3^γ t + 6^γ) round complexity for any 1 ≤ γ ≤ log log n,
• Õ(t^2 n^{1+O(1/log t)}) message complexity and O(t) round complexity.
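The parameter choice in Theorem 1.2 can be made concrete with a small Python sketch. Since exact equality 1/(2^{k+1} − 1) = 1/h = ε/2 is not attainable for every ε, the sketch assumes the relaxed conditions 1/(2^{k+1} − 1) ≤ ε/2 and 1/h ≤ ε/2; the function names are hypothetical, and the returned quantities are the expressions inside the O(·) and Õ(·) bounds with all hidden constants and logarithmic factors dropped.

```python
import math

def choose_parameters(eps):
    """Smallest integers k, h with 1/(2^(k+1) - 1) <= eps/2 and 1/h <= eps/2."""
    h = math.ceil(2 / eps)
    k = 1
    while 1 / (2 ** (k + 1) - 1) > eps / 2:
        k += 1
    return k, h

def sampler_bounds(k, h):
    """Leading terms of Theorem 1.2 (hidden constants and log factors omitted)."""
    stretch = 3 ** k                                   # stretch is O(3^k)
    rounds = 3 ** k * h                                # round complexity is O(3^k h)
    msg_exponent = 1 + 1 / (2 ** (k + 1) - 1) + 1 / h  # messages are ~ n^msg_exponent
    return stretch, rounds, msg_exponent
```

For a constant ε both k and h are constants, so the requirements k ≤ log log n and h ≤ log n of Theorem 1.2 hold for all sufficiently large n, and the message exponent is at most 1 + ε.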

Related Work and Discussion
*Model Assumptions
The current paper considers the fully synchronous message passing LOCAL model [27,31] that ignores the message size and focuses only on locality considerations. This model has been studied extensively (at least) since the seminal paper of Linial [27], with special attention to the question of what can be computed efficiently, including some recent interesting developments, see, e.g., the survey in [20, Section 1]. The more restrictive CONGEST model [31], where message size is bounded (typically to O(log n) bits), has also been extensively studied.
Many variants of the LOCAL model have been addressed over the years, distinguished from each other by the exact model assumptions, the most common such assumptions being unique node IDs and knowledge of n. Another important distinction addresses the exact knowledge held by any node v regarding its incident edges when the execution commences. Two common choices in this regard are the KT_0 variant, where v knows only its own degree, and the KT_1 variant, where v knows the ID of e's other endpoint for each incident edge e [3]. The authors of [3] advocate KT_1, arguing that it is the more natural among the two model variants, but papers have been published about each of them.
Footnote 2: Alternatively, the algorithm can run under the rather common KT_1 model variant [3], where the nodes are associated with unique IDs and each node knows the ID of the other endpoint of each one of its incident edges; see the discussion in Section 1.2.
Footnote 3: We say that an event occurs with high probability, abbreviated by whp, if the probability that it does not occur is at most n^{−c} for an arbitrarily large constant c.
Footnote 4: The asymptotic notation Õ(·) may hide log^{O(1)} n factors.
In the current paper, it is assumed that each edge (u, v) ∈ E is equipped with a unique ID, known to both u and v. In general, this assumption lies (strictly) between the KT_0 and KT_1 model variants. Note that the unique edge IDs assumption is no longer weaker than the KT_1 assumption when the communication graph admits parallel edges. However, our algorithm and analysis apply also to such graphs (assuming that |E| ≤ n^{O(1)}) under either of the two assumptions.
*Message Complexity o(|E|)
As discussed in Section 1.1, the main conceptual contribution of this paper is that on graphs with m = |E| ≫ n edges, many distributed tasks can now be solved by sending o(m) messages while keeping the round complexity unharmed. The challenge of reducing the message complexity below O(m) has already received significant attention. In particular, it has been proved in [25] that under the CONGEST KT_0 model, intensively studied global tasks, namely, distributed tasks that require Ω(D) rounds, where D is the graph's diameter (e.g., broadcasting, leader election, etc.), cannot be solved unless Ω(m) messages are sent (in the worst case). This is no longer true under more relaxed models. For example, under the LOCAL KT_1 model, DFS and leader election can be solved by sending O(n) and O(n log n) messages, respectively [24]. This implies similar savings in the number of messages required for most global tasks (trivially, by collecting all the information to the leader). Under the CONGEST KT_1 model, it has been recently proved that a minimum spanning tree can be constructed, sending o(m) messages [19,21,23,29].
Restricted graph classes have also been addressed in this regard. In particular, the authors of [26] proved that under the CONGEST KT_0 model, the message complexity of leader election is O(√n log^{3/2} n) whp in the complete graph and more generally, O(τ √n log^{3/2} n) whp in graphs with mixing time τ(G).
*Spanners
Graph spanners have been extensively studied and papers dealing with this fundamental graph theoretic object are too numerous to cite. Beyond the role that sparse spanners play in reducing the message complexity of distributed (particularly LOCAL) algorithms as discussed in Section 1.1, spanners have many applications in various different fields; some of the more relevant ones include synchronization [1,33], routing [2,34], and distance oracles [36].
Many existing distributed spanner algorithms have a node collect the topology of the graph up to a distance of some r from itself [9,12] or employ more sophisticated bounded diameter graph decomposition techniques involving the node's neighborhood [13,16,18,30] such as the techniques presented in [6,15,28]. This approach typically requires sending messages over every edge at distance at most r from some subset of the nodes, which leads to a large number of messages. Another approach to constructing sparse spanners in a distributed manner is to recursively grow local clusters [4,5,7,10,35]. Although this approach does not require the (explicit) exploration of multi-hop neighborhoods, the existing algorithms operating this way also admit large message complexity because too many nodes have to explore their 1-hop neighborhoods. Our algorithm Sampler is inspired by the algorithm of [5] and adheres to the latter approach, but it is designed in a way that drastically reduces the message complexity; see Section 1.3.

Techniques Overview
Algorithm Sampler employs hierarchical node sampling, where a sampled node u in level j of the hierarchy forms the center of a cluster that includes (some of) its non-sampled neighbors v. An edge connecting u and v is added to the spanner. The clusters are then contracted into the nodes of the next level j + 1. Also added to the spanner are all incident edges of every non-center node that has no adjacent center.
This hierarchical node sampling is used also in the distributed spanner construction of Baswana and Sen [5] and similar recursive clustering techniques were used in other papers as well (see Section 1.2).
Common to all these papers is that the centers in each level communicate directly with all their neighbors to facilitate the cluster forming task. As this leads inherently to Ω(|E|) messages, we are forced to follow a different approach, also based on (a different) sampling process.
The first thing to notice in this regard is that it is enough for a non-center node v to find a single center u in its neighborhood, so perhaps there is no need for the center nodes to announce their status to all their neighbors? Indeed, we invoke an edge sampling process in the non-center nodes v that identifies a subset of v's incident edges over which query messages are sent. Our analysis shows that (1) the number of query messages is small whp; (2) if v does have an adjacent center, then the edge sampling process finds such a center whp; and (3) if v does not have an adjacent center, then v's degree is small; in this case, all its incident edges are queried and join the spanner whp.
However, this edge sampling idea by itself does not suffice: Note that the graph (in some level of the hierarchical construction) is constructed via cluster contraction (into a single node) in lower levels of the hierarchy. Hence this graph typically exhibits edge multiplicities (even if the original communication graph is simple). This means that some neighbors w of v may have many more (parallel) (w, v)-edges than others. Informally, this can bias the probabilities for finding additional neighbors in the edge sampling process. The key idea in resolving this issue is to run the edge sampling process (in each level) in a carefully designed iterative fashion. Intuitively, the first iterations in each level "peel off" the neighbors to which a large fraction of v's incident edges lead. This increases the probability of finding one of the remaining neighbors in later iterations. Note that once v has found a neighbor u, node v can identify all of its edges leading to u (and so "peel them off" from the next iterations), by having u report to v the IDs of all the edges touching u.

Paper Organization
The rest of the paper is organized as follows. Following some preliminary definitions provided in Section 2, Sampler is presented in Section 3 and analyzed in Section 4. For clarity of the exposition, we first present Sampler as a centralized algorithm and then, in Section 5, explain how it can be implemented under the LOCAL model with the round and message complexities promised in Theorem 1.2. Section 6 describes how algorithm Sampler is used to obtain the message-reduction schemes we mentioned in the introduction. Finally, we conclude the paper in Section 7.

Preliminaries
Consider some graph G = (V, E). When convenient, we may denote the node set V and edge set E by V(G) and E(G), respectively. Unless stated otherwise, the graphs considered throughout this paper are undirected and not necessarily simple, namely, the edge set E may include edge multiplicities (a.k.a. parallel edges). Given disjoint node subsets U, U' ⊆ V, let E(U) denote the subset of edges with (exactly) one endpoint in U and let E(U, U') = E(U) ∩ E(U') denote the subset of edges with one endpoint in U and the other in U'. If U = {u} and U' = {u'} are singletons, then we may write E(u) and E(u, u') instead of E({u}) and E({u}, {u'}), respectively (notice that E(u, u') may contain multiple edges when G is not a simple graph).
Let C = {C_1, ..., C_ℓ} be a collection of non-empty pairwise disjoint node subsets referred to as clusters of G. The cluster graph (cf. [31]) induced by C on G, denoted by G(C), is the undirected graph whose nodes are identified with the clusters in C, and the edges connecting nodes C_i and C_j, 1 ≤ i ≠ j ≤ ℓ, correspond to the edges crossing between clusters C_i and C_j in G, that is, the edges in E(C_i, C_j). Observe that G(C) may include edge multiplicities even if G is a simple graph. We denote by Ind_G(C) the subgraph of G induced by C. When convenient, the term "cluster C" applies also to Ind_G(C) (footnote 7). For u, v ∈ V, the distance between u and v in G is denoted by dist_G(u, v).
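The cluster graph definition can be sketched in Python as follows. The representation is an assumption made for illustration only: edges are triples (edge_id, u, v) so that parallel edges stay distinguishable, clusters are numbered by their position in the input list, and the function name `cluster_graph` is hypothetical.

```python
def cluster_graph(edges, clusters):
    """Edges of G(C): those edges of G whose endpoints lie in two different clusters.

    edges: list of (edge_id, u, v) triples of G.
    clusters: list of pairwise disjoint node sets; cluster i becomes node i of G(C).
    Nodes outside every cluster, and edges inside a single cluster, are dropped.
    """
    cluster_of = {v: idx for idx, cluster in enumerate(clusters) for v in cluster}
    contracted = []
    for edge_id, u, v in edges:
        cu, cv = cluster_of.get(u), cluster_of.get(v)
        if cu is None or cv is None or cu == cv:
            continue
        contracted.append((edge_id, cu, cv))  # the edge keeps its original ID
    return contracted
```

Contracting the simple 4-cycle a-b-c-d with clusters {a, b} and {c, d} yields two parallel edges between the two cluster nodes, which illustrates why G(C) may have edge multiplicities even when G is simple.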
As stated in the introduction, the assumption about global parameters is that each node knows an O(1)-approximate upper bound on log n. For the sake of simplicity, we treat the algorithm as if each node knows the exact value of log n; however, this is not essential.

Constructing an O(k )-Spanner
In the following argument, let δ = 1/(2^{k+1} − 1) and ε = 1/h for short. Denote the simple graph input to the algorithm by G_0 = (V_0, E_0). Algorithm Sampler (see Pseudocode 1) generates a sequence G_1, ..., G_k of graphs, where G_j = (V_j, E_j). Let n_j and m_j be the numbers of nodes and edges in G_j, respectively, N_j(v) be the set of neighbors in G_j of node v ∈ V_j, and E_j(v, u) be the set of edges connecting v to u in G_j. The process is executed in an iterative fashion, where in each iteration j = 0, ..., k, the algorithm constructs a collection C ⊂ 2^{V_j} of pairwise disjoint clusters and an edge set F ⊆ E_j that is added to the spanner edge set S (as an exception, in the final iteration j = k, only F is constructed but the cluster collection C is not created). The graph G_{j+1} is then defined to be the cluster graph G_j(C) induced by C on G_j. The construction of C and F is handled by procedure Cluster_j, which is described below. To avoid confusion, in what follows, we fix n = n_0 = |V_0|.
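The iteration described above can be sketched as a centralized Python driver. This is a minimal sketch under stated assumptions, not the authors' implementation: edges are triples (edge_id, u, v), the clustering procedure is passed in as a hypothetical callable `cluster_fn`, and each cluster is named after its center so that contracted nodes can be reused as node labels at the next level.

```python
def sampler(nodes, edges, k, cluster_fn):
    """Centralized driver loop over levels j = 0, ..., k.

    edges: list of (edge_id, u, v) triples of the current (multi)graph G_j.
    cluster_fn(j, nodes, edges) -> (clusters, chosen), where `clusters` maps each
    center to the set of level-j nodes merged into it (ignored when j == k) and
    `chosen` is the set F of edge IDs to be added to the spanner.
    """
    spanner = set()
    for j in range(k + 1):
        clusters, chosen = cluster_fn(j, nodes, edges)
        spanner |= chosen
        if j < k:
            # Contract every cluster into its center to obtain G_{j+1}.
            label = {v: center for center, members in clusters.items() for v in members}
            edges = [(eid, label[u], label[v]) for eid, u, v in edges
                     if u in label and v in label and label[u] != label[v]]
            nodes = list(clusters)
    return spanner
```

Edge IDs survive the contraction unchanged, so the returned set refers to edges of the original graph G_0.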
Pseudocode 1 (Algorithm Sampler), lines 5-7:
5: if j < k then
6:     G_{j+1} ← G_j(C)
7: return S

*Procedure Cluster_j
On input graph G_j (0 ≤ j ≤ k), this procedure constructs the cluster collection C ⊂ 2^{V_j} and edge set F = ∪_{v∈V_j} F_v (F_v ⊆ E_j) to be added to the spanner edges. The procedure (see Pseudocode 2) consists of two steps. In the first step, each node v tries to identify min{c · exp_n(2^j δ) log n, |N_j(v)|} neighbors by an iterative random-edge sampling process (footnote 8), where c is a sufficiently large constant to guarantee high success probability of the algorithm. For this process, v maintains a set X_v ⊆ E_j(v) of the edges that have not been explored yet. Initially, X_v is set to X_v = E_j(v); the content of X_v is then gradually eliminated by running 2/ε = 2h trials. In every trial, each node v ∈ V_j chooses c^2 · exp_n(2^j δ + ε) log^3 n edges from X_v independently and uniformly at random (possibly choosing the same edge twice or more). Each chosen edge is said to be a query edge. For a neighbor u ∈ N_j(v) such that E_j(v, u) contains a query edge, we say that u is queried by v. For each queried node u ∈ N_j(v), v adds one arbitrary query edge e ∈ E_j(u, v) to the edge set F_v, and eliminates all the edges in E_j(u, v) from X_v (Figure 1 (a)-(c)). Then the procedure advances to the next trial unless |F_v| ≥ c · exp_n(2^j δ) log n holds or X_v is emptied. Let N̂_j(v) ⊆ N_j(v) be the set of the nodes queried by v after finishing 2h trials.
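A node v is said to be light if N̂_j(v) = N_j(v), that is, if all its neighbors are queried, and heavy if |N̂_j(v)| ≥ c · exp_n(2^j δ) log n. It is proved later that every node becomes either light or heavy whp.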
In the second step, the algorithm creates (only if j < k) the vertex set V_{j+1} by clustering the nodes in V_j. The algorithm marks each node v ∈ V_j as a center w.p. p_j = exp_n(−2^j δ). Each node v having a center u contained in N̂_j(v) is merged into u (if two or more centers are contained, an arbitrary one is chosen). A merged cluster corresponds to a node in G_{j+1} (Figure 1 (d)-(f)). As a result, letting C ⊂ 2^{V_j} be the clusters inducing the cluster graph G_{j+1}, each cluster C = C(u) ∈ C contains exactly one center u ∈ V_j and some subset of N_j(u). A node that is not merged into any center is said to be an unclustered node. For technical reasons, all the nodes in G_k are defined to be unclustered. It is shown later that every heavy node is merged into some center whp, and that every node in G_k is light. Thus, every unclustered node is light.
Footnote 7: Each cluster here will be a connected component.
Footnote 8: For ease of writing long exponents, define exp_a(x) = a^x.
Pseudocode 2 (procedure Cluster_j), surviving lines:
    run (at most) 2h trials
 5: for x = 1 to c^2 · exp_n(2^j δ + ε) log^3 n do
 6:     add an edge e selected uniformly at random from X_v to F_v
 7: Remove all the edges incident to u from X_v
10: Remove all the edges incident to u other than e from F_v
11: F_v ← F_v ∪ {e}
12: i ← i + 1
13: F ← ∪_{v∈V_j} F_v
14: /* Second Step */
15: if j < k then
16:     for all v ∈ V_j do
17:         mark v as a center and create C(v) = {v} w.p. exp_n(−2^j δ)
18:     for all non-center v ∈ V_j do
19:         if ∃(v, u) ∈ F : u is a center then
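The two steps of procedure Cluster_j can be sketched in Python as follows. This is a simplified centralized sketch, not the pseudocode itself: the thresholds c · exp_n(2^j δ) log n and c^2 · exp_n(2^j δ + ε) log^3 n are passed in as the hypothetical parameters `target` and `samples_per_trial`, the incident edges of v are given as a dictionary from edge ID to neighbor, and all function names are illustrative.

```python
import random

def first_step(incident, target, samples_per_trial, max_trials, rng):
    """Iterative edge sampling at a single node v.

    incident: dict edge_id -> neighbor (parallel edges map to the same neighbor).
    Returns a dict queried_neighbor -> one query edge, i.e. the set F_v.
    """
    unexplored = dict(incident)  # X_v: edges that have not been explored yet
    chosen = {}                  # queried neighbor -> the edge kept in F_v
    for _ in range(max_trials):
        if not unexplored or len(chosen) >= target:
            break
        pool = list(unexplored)
        # Sample query edges uniformly at random from X_v, with repetition.
        for e in (rng.choice(pool) for _ in range(samples_per_trial)):
            chosen.setdefault(incident[e], e)
        # "Peel off" every edge leading to an already queried neighbor.
        unexplored = {e: u for e, u in unexplored.items() if u not in chosen}
    return chosen

def mark_centers(nodes, p, rng):
    """Each node becomes a center independently with probability p."""
    return {v for v in nodes if rng.random() < p}

def merge_into_centers(nodes, queried, centers):
    """Merge each non-center into one queried center, if it has any."""
    clusters = {u: {u} for u in centers}
    for v in nodes:
        if v in centers:
            continue
        for u in queried[v]:
            if u in centers:
                clusters[u].add(v)
                break  # an arbitrary queried center is chosen
    return clusters
```

Because X_v only contains edges to neighbors that have not been queried yet, every trial with a non-empty X_v discovers at least one new neighbor, regardless of how skewed the edge multiplicities are.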

Analysis
Throughout this section, we refine the term "whp" to mean that the probabilistic event under consideration holds with probability 1 − 1/n^{Θ(c)} for the parameter c defined in the algorithm. With a small abuse of probabilistic arguments, we treat those events as if they necessarily occur (with probability one). Since we only handle a polynomially-bounded number of probabilistic events in the proof, the standard union-bound argument ensures that any consequence of the analysis also holds whp for a sufficiently large c. We begin the analysis by bounding the number of nodes in graph G_j. Denote p̂_j = ∏_{0≤i≤j} p_i. It is easy to check that p̂_{j−1} = exp_n(−(2^j − 1)δ) for 1 ≤ j ≤ k.
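The closed form for the product of the sampling probabilities follows from a geometric sum. The short derivation below uses p_i = n^{-2^i δ} as defined for the second step of Cluster_j, and writes the product p_0 ⋯ p_{j−1} with a hat accent (the accent is a notational assumption).

```latex
\hat{p}_{j-1}
  = \prod_{i=0}^{j-1} p_i
  = \prod_{i=0}^{j-1} n^{-2^{i}\delta}
  = n^{-\delta \sum_{i=0}^{j-1} 2^{i}}
  = n^{-(2^{j}-1)\delta}.
```

In particular, the expected number of level-j nodes is of order n · n^{-(2^j − 1)δ} = n^{1 − (2^j − 1)δ}.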
Proof. The value n_j follows the binomial distribution with n_{j−1} trials and success probability p_{j−1}. Applying the Chernoff bound (conditioned on n_{j−1}), the inequality below holds for all 1 ≤ j ≤ k whp (footnote 9):
(1 − √(c log n / (n_{j−1} p_{j−1}))) · n_{j−1} p_{j−1} ≤ n_j ≤ (1 + √(c log n / (n_{j−1} p_{j−1}))) · n_{j−1} p_{j−1}.
We next prove the facts mentioned in the explanation of the procedure Cluster j .
Lemma 4.2. For any 0 ≤ j ≤ k − 1, any heavy node v ∈ V_j contains at least one center in N̂_j(v) whp.
Proof. The probability that no center is contained in N̂_j(v) is (1 − p_j)^{|N̂_j(v)|}. Since |N̂_j(v)| ≥ c · exp_n(2^j δ) log n = c log n/p_j holds for any heavy node v, the probability is at most 1/n^c.
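The last step of this proof can be spelled out using the standard inequality 1 − x ≤ e^{−x}. The derivation below assumes that log denotes the natural logarithm (any other fixed base only changes the constant in the exponent) and writes the set of neighbors queried by v with a hat accent, which is a notational assumption.

```latex
(1 - p_j)^{|\hat{N}_j(v)|}
  \le (1 - p_j)^{c \ln n / p_j}
  \le \exp\!\left(-p_j \cdot \frac{c \ln n}{p_j}\right)
  = e^{-c \ln n}
  = n^{-c}.
```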

Lemma 4.3. For any 0 ≤ j ≤ k, any node v ∈ V_j becomes light or heavy whp. Furthermore, any node v ∈ V_k becomes light whp.
Footnote 9: Chernoff bound for the binomial distribution X of m trials and success probability p is
Proof. Let α = (3c · exp_n(2^j δ) log^2 n)/h for short.
Let N_j^i(v) be the set of the nodes not queried by v at the beginning of the i-th trial, and m_i be the number of edges in X_v at the beginning of the i-th trial. For any node u ∈ N_j^i(v), the value of |E_j(v, u)| is called the volume of u. Similarly, for any X ⊆ N_j^i(v), we call the value of |E_j(v, X)| the volume of X.
Let K^i(v) be the maximum-volume class among all classes at the beginning of the i-th trial. Since the volume of K^i(v) is at least m_i/2h, there exists at least one [...]. By the definition of class K^i(v), it implies that the volume of any node u ∈ K^i(v) is at least m_i/(2h |K^i(v)| n^ε).
Let β be the non-negative integer satisfying (β − 1)α ≤ |K^i(v)| ≤ βα. Then we consider an arbitrary partition of K^i(v) into q = ⌊|K^i(v)|/β⌋ groups K_1, K_2, ..., K_q of size β. Note that βq is not necessarily equal to |K^i(v)|, but the residuals are omitted. Since any node in K^i(v) has a volume at least m_i/(2h |K^i(v)| n^ε), the volume of K_ℓ (1 ≤ ℓ ≤ q) is at least βm_i/(2h |K^i(v)| n^ε). Thus the probability that a query edge is sampled from E_j(v, K_ℓ) is at least β/(2h |K^i(v)| n^ε) ≥ 1/(2hαn^ε). Letting Z be the number of query edges in E_j(v, K_ℓ) created at the i-th trial, for any 1 ≤ ℓ ≤ q, we have [...]. Thus, in every trial, at least one node in each group K_ℓ is queried by v whp. If |K^i(v)| ≤ α holds, β = 1 holds and thus each group consists of a single node in K^i(v). Thus all nodes in K^i(v) are queried by v whp in the i-th trial (note that no node becomes a residual in the case of β = 1). Otherwise, [...] log n holds, and thus v queries at least c · exp_n(2^j δ) log n nodes in the i-th trial. Consequently, if |K^i(v)| ≤ α holds for all 1 ≤ i ≤ 2h, v queries all nodes in K^i(v) at the i-th trial, that is, queries all nodes in N_j(v) throughout the run of Cluster_j. Then v becomes light. If |K^i(v)| > α holds for some i, v queries at least c · exp_n(2^j δ) log n nodes in K^i(v), which implies v becomes heavy or light.
Finally, let us show that any node v ∈ V_k is light. Since n_k ≤ 3 exp_n(1 − (2^k − 1)δ)/2 = 3 exp_n(2^k δ)/2 ≤ (3c · exp_n(2^k δ) log^2 n)/h = α holds from Lemma 4.1, we have |N_j(v)| ≤ n_k ≤ α. Since K^i(v) is a subset of N_j(v), |K^i(v)| ≤ α holds for all i. By the argument above, v is then light.
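One way to see why each group is hit by a query edge whp, which is not necessarily the computation used in the original proof, is the following estimate. It assumes that each of the c^2 n^{2^j δ + ε} log^3 n samples of a trial lands in a fixed group with probability at least 1/(2hαn^ε), that α = (3c n^{2^j δ} log^2 n)/h, and that log is the natural logarithm; Z denotes the number of query edges landing in the group. The second display checks the exponent identity used for the nodes of G_k, with δ = 1/(2^{k+1} − 1).

```latex
\Pr[Z = 0]
  \le \left(1 - \frac{1}{2h\alpha n^{\varepsilon}}\right)^{c^{2} n^{2^{j}\delta+\varepsilon}\log^{3} n}
  \le \exp\!\left(-\frac{c^{2} n^{2^{j}\delta+\varepsilon}\log^{3} n}{6c\, n^{2^{j}\delta+\varepsilon}\log^{2} n}\right)
  = \exp\!\left(-\frac{c}{6}\log n\right)
  = n^{-\Theta(c)}.

1 - (2^{k}-1)\delta
  = \frac{(2^{k+1}-1) - (2^{k}-1)}{2^{k+1}-1}
  = \frac{2^{k}}{2^{k+1}-1}
  = 2^{k}\delta.
```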
The rest of the analysis is divided into two parts: First, in Section 4.1, we analyze the stretch of H, proving that it is at most κ = O(3^k). Section 4.2 then establishes an Õ(n^{1+δ}) upper bound on the number of edges in H.

Bounding the Stretch
The following lemma is a well-known fact.
Lemma 4.4. Let H = (V, X) be any (spanning) subgraph of G = (V, E) and C be a partition of V such that for any C ∈ C, Ind_H(C) has diameter at most ℓ. If X contains at least one edge in E(C_i, C_j) for any pair (C_i, C_j) ∈ C^2 such that E(C_i, C_j) is nonempty, then H is a (2ℓ + 1)-spanner.
[...] be the set of the nodes unclustered in the run of Cluster_j, and V = ∪_{1≤j≤k−1} V_j. We define C_j(v) ⊆ V as the set of nodes in V which are clustered into v ∈ V_j, and also define r(v) as the value j satisfying v ∈ V_j for any v ∈ V. Let C(v) = C_{r(v)}(v) for short.
Proof. We show that Ind_H(C_j(v)) contains a spanning tree of Ind_G(C_j(v)) with height at most 3^j − 1.
The proof is by induction on j. For j = 0, Ind_H(C_j(v)) = Ind_G(C_j(v)) is a graph consisting of a single node, and thus its diameter is zero. Suppose as the induction hypothesis that the lemma holds for some j, and consider the case of j + 1. Since any node v ∈ V_{j+1} corresponds to a center node v ∈ V_j and a star-based connection with its neighbors in V_j, Ind_H(C_{j+1}(v)) contains a spanning tree of Ind_G(C_{j+1}(v)) with diameter at most 3(3^j − 1) + 2 = 3^{j+1} − 1. The lemma follows.
Finally the following theorem is deduced.
Proof. Since any node in G_k is unclustered, [...] are neighboring and r(u) ≤ r(v) holds, there exists a node w ∈ V_{r(u)} such that u and w are neighboring in G_{r(u)} and C_{r(u)}(w) ⊆ C(v) holds. Since every unclustered node is light (by Lemmas 4.2 and 4.3), C(u) is light. Then at least one edge in E(C_{r(u)}(u), C_{r(u)}(w)) is added to S, which implies that at least one edge in E(C(u), C(v)) is added to S. Consequently, the edge set S constructed by Sampler satisfies the condition of Lemma 4.4 w.r.t. C. The remaining issue is to bound the diameter of C(v) ∈ C for all v ∈ V, which is shown by Lemma 4.5.

Bounding the Number of Edges
Using Lemma 4.1, we can bound the number of edges in S output by Sampler.
Proof. Each trial of the first step in the run of Cluster_j adds O(exp_n(2^j δ) log^3 n) edges to S per node in V_j, and n_j = O(exp_n(1 − (2^j − 1)δ)) holds by Lemma 4.1. Then the total number of edges added to S in Cluster_j is h · O(exp_n(2^j δ) log^3 n) · n_j = O(h n^{1+δ} log^3 n), and thus the size of S is O(kh n^{1+δ} log^3 n). Since k, h ≤ log n, we obtain the lemma by omitting all the logarithmic factors.
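The exponent arithmetic behind the per-level edge count can be written out explicitly. The derivation only uses the bound n_j = O(n^{1 − (2^j − 1)δ}) quoted from Lemma 4.1 and the per-node, per-trial count stated in the proof.

```latex
h \cdot O\!\left(n^{2^{j}\delta}\log^{3} n\right) \cdot n_j
  = O\!\left(h \, n^{2^{j}\delta} \cdot n^{1-(2^{j}-1)\delta} \log^{3} n\right)
  = O\!\left(h \, n^{1+\delta} \log^{3} n\right).
```

The bound is the same for every level j, so summing over the k + 1 levels contributes only a factor of O(k).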
The key observation in this regard is that procedures Cluster_j would have been naturally distributed if the nodes in V_0, V_1, ..., V_k could have performed local computations and exchanged messages over their incident edges in G_0, G_1, ..., G_k, respectively (recall that graphs G_1, ..., G_k are virtual, defined only for the sake of the algorithm's presentation). Indeed, the action of marking a node as a center and the action of marking an edge as a query/probe edge are completely local and do not require any communication (in G_j). The action of checking whether a query edge e leads to a center and the action of identifying all the edges parallel to a query/probe edge e are easily implemented under the LOCAL model with unique edge IDs by sending a constant number of messages over edge e in G_j.
So it remains to explain how local actions in graph G_j, 1 ≤ j ≤ k, are simulated in the actual communication graph G. Let T_j(v) be the spanning tree of Ind_G(C_j(v)) shown in the proof of Lemma 4.5. Once a cluster C_j(v) is formed, no further edge is added to the inside of the cluster. Thus T_j(v) is already contained in C_j(v) at the beginning of the run of Cluster_j. In the distributed implementation, the local actions of node v in G_j are simulated by the nodes of C_j(v) in G via a constant number of broadcast-convergecast sessions over T_j(v) rooted at v ∈ V. (This is made possible by the choice of the LOCAL model with unique edge IDs.) This process requires sending O(1) (additional) messages over each edge in T_j(v), and by Lemma 4.5, it takes at most O(3^j) rounds.
Proof. In the first step of Cluster_j, each trial takes O(1) rounds in G_j. The second step takes O(1) rounds in G_j. Hence the total running time of Cluster_j is O(h(3^j − 1)) rounds in G. Summing this up over all 0 ≤ j ≤ k, the bound on the round complexity is Σ_{j=0}^{k} O(h(3^j − 1)) = O(3^k h). For the message complexity, the simulation of one round in G_j is implemented with an additive overhead incurred by a constant number of broadcast and convergecast sessions in each cluster C(v) for v ∈ V_j, which use O(n) messages in total. Each trial of the first step in Cluster_j uses O(exp_n(2^j δ + ε) log^3 n) messages per node in V_j. Thus the total message complexity in Cluster_j is h · O(exp_n(2^j δ + ε) log^3 n) · n_j = O(h n^{1+δ+ε} log^3 n) by Lemma 4.1. Summing this up over 0 ≤ j ≤ k, we can conclude that the message complexity is O(kh n^{1+δ+ε} log^3 n) = Õ(n^{1+δ+ε}) (recall k, h ≤ log n).
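Both sums in this proof reduce to short calculations. The derivation below assumes δ = 1/(2^{k+1} − 1), writes ε for 1/h, and uses n_j = O(n^{1 − (2^j − 1)δ}) from Lemma 4.1; the first line bounds the rounds by a geometric sum and the second line gives the per-level message exponent.

```latex
\sum_{j=0}^{k} O\!\left(h \, 3^{j}\right)
  = O\!\left(h \cdot \frac{3^{k+1}-1}{2}\right)
  = O\!\left(3^{k} h\right),
\qquad
n^{2^{j}\delta+\varepsilon} \cdot n^{1-(2^{j}-1)\delta}
  = n^{1+\delta+\varepsilon}.
```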

Message-Efficient Simulation of Local Algorithms
In this section, we provide a new and versatile message-reduction scheme for LOCAL algorithms based on the new Sampler algorithm. The technical ingredient of this scheme is a message-efficient t-local broadcast algorithm built on top of the constructed spanners, an approach commonly used in past message-reduction schemes [8,22].
Consider the initial configuration where each node v ∈ V has a message M_v, and let B_{G,t}(v) = {u | dist_G(v, u) ≤ t}. The task of the t-local broadcast is that each v ∈ V delivers M_v to all the nodes u ∈ B_{G,t}(v). In any t-round LOCAL algorithm, the computation at node v ∈ V relies only on the initial knowledge (i.e., its ID, initial state, and incident edge set) of the nodes in B_{G,t}(v), and thus any t-local broadcast algorithm in the LOCAL model can simulate any t-round LOCAL algorithm. The core of the scheme is the following lemma.
Lemma 6.1. There exist two t-local broadcast algorithms respectively achieving the following time and message complexities:
• Õ(t n^{1+2/(2^{γ+1}−1)}) message complexity and O(3^γ t + 6^γ) round complexity for any 1 ≤ γ ≤ log log n and t ≥ 1,
• Õ(t^2 n^{1+O(1/log t)}) message complexity and O(t) round complexity for t ≥ 1.
Proof. Consider the realization of the first algorithm. For any v, all the nodes in B_{G,t}(v) are within αt hops from v in any α-spanner. Thus, once we have any α-spanner H = (V, S), local flooding within distance αt in H trivially implements t-local broadcast. Setting k = γ and h = 2^{γ+1} − 1 in Theorem 1.2 yields a spanner satisfying the first condition, where the additive O(6^γ) term is the time for spanner construction. For the second algorithm, we utilize the spanner-construction algorithm by Derbel et al. [11], which provides a (3, O(3^k))-spanner H with Õ(3^k n^{1+1/O(k)}) edges within O(3^k) rounds for any k ≥ 1. Consider the algorithm by Derbel et al. with parameter k = log_3 t − log log_3 t, which results in an O(t/log_3 t)-round algorithm constructing a (3, O(t))-spanner with Õ(t n^{1+1/O(log t)}) edges. We run this algorithm on top of the first simulation scheme with parameter γ = log_3 log_3 t. The simulated algorithm constructs a (3, O(t))-spanner H with Õ(t n^{1+1/O(log t)}) edges spending O(3^{log_3 log_3 t} · t/log_3 t + 6^{log_3 log_3 t}) = O(t) rounds and Õ(t n^{1+1/O(log t)}) messages. The local flooding within distance 3t + O(t) on top of H implements the t-local broadcast, which takes O(t) rounds and Õ(t^2 n^{1+1/O(log t)}) messages. The lemma is proved.
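The flooding step used in both algorithms can be sketched in Python as follows. This is a simplified synchronous simulation under stated assumptions: node identifiers stand in for the messages M_v, a node forwards only what it learned in the previous round, each transmission over one edge in one direction counts as a single message (message size is unbounded in the LOCAL model), and the function name `local_broadcast` is hypothetical.

```python
def local_broadcast(nodes, h_edges, radius):
    """Flood every node's message for `radius` rounds over the spanner H.

    Returns (known, messages): known[v] is the set of origins whose messages
    reached v, and messages is the total number of messages sent.
    """
    adj = {v: set() for v in nodes}
    for u, v in h_edges:
        adj[u].add(v)
        adj[v].add(u)
    known = {v: {v} for v in nodes}
    fresh = {v: {v} for v in nodes}  # what each node learned in the last round
    messages = 0
    for _ in range(radius):
        incoming = {v: set() for v in nodes}
        for u in nodes:
            if not fresh[u]:
                continue  # nothing new to forward
            for w in adj[u]:
                incoming[w] |= fresh[u]
                messages += 1
        fresh = {v: incoming[v] - known[v] for v in nodes}
        for v in nodes:
            known[v] |= fresh[v]
    return known, messages
```

With an α-spanner H and radius αt, every node within distance t in G is reached, and the number of messages never exceeds 2 · radius · |S|.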
As stated above, Theorem 1.3 is trivially deduced from Lemma 6.1.

Concluding Remarks
In this paper, we present an efficient spanner construction as well as two message-reduction schemes for LOCAL algorithms that preserve the asymptotic time complexity of the original algorithm. The reduced message complexity is close to linear (in n). Is this the best possible in constructing a spanner? Similarly, some open questions remain regarding the development of efficient message-reduction schemes: (1) While our scheme only sends Õ(t^2 n^{1+O(1/log t)}) messages for simulating t-round algorithms, it is not clear whether the additive O(1/log t) term in the exponent can be improved further. Can one have a message-reduction scheme with Õ(poly(t) n^{1+o(1/log t)}) message complexity and no overhead in the round complexity? (2) Algorithm Sampler inherently relies on randomized techniques for probing the neighbors in G_j using only few messages. Is it possible to obtain a deterministic message-reduction scheme with no degradation of time? Very recently, the authors received a comment on the first question, which states that utilizing the spanner construction by Elkin and Neiman [16] will improve the message complexity. Unfortunately, due to lack of time, we have not fully verified this idea, and thus the current version only states the result based on the algorithm by Derbel et al., but it is certainly a promising approach. If it actually works, the message complexity will be reduced to Õ(t^2 n^{1+O(1/t^{1/log log t})}).
Finally, we note that using an o(m) messages spanner construction algorithm that does not increase the time can be useful also for global algorithms in the LOCAL model. It implies that any function can now be computed on the graph in strictly optimal O(diameter) time and o(m) messages (for large enough m).