
Near-Optimal Resilient Labeling Schemes

Keren Censor-Hillel, Department of Computer Science, Technion, Haifa, Israel
Einav Huberman, Department of Computer Science, Technion, Haifa, Israel
Abstract

Labeling schemes are a prevalent paradigm in various computing settings. In such schemes, an oracle is given an input graph and produces a label for each of its nodes, enabling the labels to be used for various tasks. Fundamental examples in distributed settings include distance labeling schemes, proof labeling schemes, advice schemes, and more. This paper addresses the question of what happens in a labeling scheme if some labels are erased, e.g., due to communication loss with the oracle or hardware errors. We adapt the notion of resilient proof-labeling schemes of Fischer, Oshman, and Shamir [OPODIS 2021] and consider resiliency in general labeling schemes. A resilient labeling scheme consists of two parts – a transformation of any given labeling to a new one, executed by the oracle, and a distributed algorithm in which the nodes can restore their original labels given the new ones, despite some label erasures.

Our contribution is a resilient labeling scheme that can handle F such erasures. Given a labeling of ℓ bits per node, it produces new labels with multiplicative and additive overheads of O(1) and O(log F), respectively. The running time of the distributed reconstruction algorithm is O(F+(ℓF)/log n) in the Congest model.

This improves upon what can be deduced from the work of Bick, Kol, and Oshman [SODA 2022], for non-constant values of F. It is not hard to show that the running time of our distributed algorithm is optimal, making our construction near-optimal, up to the additive overhead in the label size.

Keywords and phrases:
Labeling schemes, Erasures
Funding:
Keren Censor-Hillel: The research is supported in part by the Israel Science Foundation (grant 529/23).
Copyright and License:
© Keren Censor-Hillel and Einav Huberman; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Distributed algorithms; Mathematics of computing → Graph algorithms
Acknowledgements:
We thank Alkida Balliu and Dennis Olivetti for useful discussions about the state of the art for computing ruling sets.
Editors:
Silvia Bonomi, Letterio Galletta, Etienne Rivière, and Valerio Schiavoni

1 Introduction

Assigning labels to nodes of a graph is a part of many fundamental computing paradigms. In distributed computing, cornerstone examples include distance and connectivity labeling schemes, as well as labeling schemes for nearest common ancestor [38, 41, 21, 22, 5, 1, 32, 30, 27, 39, 28, 9, 23, 2], proof labeling schemes and distributed proofs [31, 15, 35, 13, 11, 25, 26, 34, 33, 10, 8], advice schemes [17, 19, 18, 20], and more, where the above are only samples of the extensive literature on these topics. The common theme in all of the above is that an oracle assigns labels to nodes in a graph, which are then used by the nodes in a distributed algorithm. Since distributed networks are prone to various types of failures, Fischer, Oshman and Shamir [16] posed the natural question of how to cope with nodes whose labels have been erased and defined the notion of resilient proof labeling schemes.

We will use the following definition, which applies to a labeling scheme for any purpose but captures the same essence. A resilient labeling scheme consists of two parts. The first part is an oracle transformation of an input labeling φ: V → {0,1}^ℓ into an output labeling ψ: V → {0,1}^{ℓ'}, where ℓ' = Aℓ + B. The second part is a distributed 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 algorithm (in the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 model, the n nodes of a graph communicate in synchronous rounds by exchanging O(log n)-bit messages along the edges of the graph), in which each node v receives ψ(v) as its input, except for at most F nodes for which ψ(v) is erased, and each node must recover φ(v). The parameters of interest are (i) the number F of allowed label erasures, (ii) the multiplicative overhead A and the additive overhead B in the label sizes, and (iii) the round complexity of the distributed algorithm.

The aforementioned work of Fischer et al. [16] provides a resilient labeling scheme for the case of uniform labels, i.e., when the labels of all nodes are the same. For the general case, the work by Bick, Kol, and Oshman [7] implies a labeling scheme that is resilient to F failures, with a constant multiplicative overhead in the label size, within O(F³ log F+(ℓF³)/log n) rounds. In this paper, we ask how far we can reduce the overhead costs in the label size and the round complexity of the distributed algorithm.

Question.

What are the costs of resilient labeling schemes?

When F is constant, the work of Bick et al. [7] achieves constant overheads in the label size and algorithm complexity. We show that for larger values of F, one can do better. The following states our contribution.

Theorem 1.

There exists a resilient labeling scheme that tolerates F label erasures, has O(1) and O(log F) multiplicative and additive overheads to the size of the labels, respectively, and whose distributed 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 algorithm for recovering the original labels has a complexity of at most O(F+(ℓF)/log n) rounds.

Notice that Ω(F) is a straightforward lower bound for the complexity of such a distributed algorithm. To see this, suppose the network is a path P. If the labels of F consecutive nodes along the path get erased, then for the middle node v to obtain any information about its label, it must receive information from some node whose label is not erased. A message from such a node can only reach v after Ω(F) rounds. Further, this example shows that Ω((ℓF)/log n) is also a lower bound for the number of rounds, due to an information-theoretic argument: the nodes on the subpath of P consisting of its middle F/2 nodes must obtain a total of Ω(ℓF) bits of information. These bits must pass over the two single edges that connect this subpath to the rest of P. Since each of the two edges can carry at most O(log n) bits per round, we get that Ω((ℓF)/log n) rounds are required. This implies that our scheme is nearly optimal, leaving only the O(log F) additive overhead in the size of the label as an open question.

1.1 Technical Overview

To use a labeling scheme despite erasures, it must be strengthened by backing up information in case it is lost. Erasure-correcting codes are designed for this purpose, but they apply in a different setting – the entire codeword is available to the computing device, except for some bounded number of erasures. Yet, to handle label erasures, we need to encode the information and split the codeword among several nodes. Thus, when labels in a labeling scheme are erased, restoring them requires faulty nodes to obtain information from other nodes. This means that the oracle that assigns labels must encode its original labels into new, longer ones, and the nodes must communicate to restore their possibly erased labels.

Partitioning.

By the above discussion on using error-correcting codes, it is clear that if we partition the nodes of the graph into groups of size Θ(F), we can encode the labels within each such group, as follows. By ensuring that each group forms a connected subgraph, the nodes can quickly collect all non-erased labels within a group in O(F+(ℓF)/log n) rounds, and thus decode and reconstruct all original labels.

When F is large, simple methods can construct such a partition with negligible complexity overhead compared to O(F). However, when F is small, such overheads become significant. While this motivation is clear for the complexity of the distributed algorithm, it also applies to the overhead in the label size: much work is invested in providing labels of small size, e.g., the MST advice construction of Fraigniaud, Korman, and Lebhar [19].

The caveat is that if F is small, there may not exist a distributed algorithm for such a partitioning that completes within O(F) rounds. To see why, consider a simple reduction from 3-coloring a ring, which is known to have a lower bound of Ω(log* n) by Linial [36]. Given a partition into groups (paths) of size at least 3, we can color each group node-by-node in parallel, except for its two endpoints. This takes at most O(F) rounds, as this is the size of each group. In a constant number of additional rounds, we can in parallel stitch all groups by coloring every pair of adjacent endpoints in a manner consistent with their other two neighbors, which are already colored. Thus, an O(F)-round algorithm for obtaining such a partition contradicts the lower bound for small values of F.

We stress that it is essential that the partition obtained by such a partitioning algorithm is exactly the same one used by the oracle for encoding. In particular, this rules out randomized partitioning algorithms or algorithms whose obtained partition depends on the identity of nodes with label erasures.

Our approach to address this issue is to have the oracle include partitioning information in the new labeling ψ. This way, even if O(F) rounds are insufficient to construct the partition from scratch, the nodes can still reconstruct the partition using the labels within O(F) rounds, despite any F label erasures. A straightforward method would be to include all the node IDs of a group in each label. However, this would increase the size of the new labeling ψ to Ω(F log n) bits, which for small values of F is an overhead we aim to avoid. Thus, we need a way to hint at the partition without explicitly listing it.

Ruling sets.

To this end, we employ ruling sets. An (α,β)-ruling set is a set S⊆V such that the distance between every two nodes in S is at least α, and every node is within distance at most β from the set S. Given a ruling set with values of α and β that are O(F), we prove that O(F) rounds are sufficient for obtaining a good partition.

Interestingly, the properties of the partition we obtain are slightly weaker but still sufficient for our needs (and still rule out an O(F)-round algorithm for constructing them from scratch, by a similar reduction from 3-coloring a ring). The partition we arrive at is a partition into groups of Θ(F) nodes, where each group may not be connected but has low-congestion shortcuts of diameter O(F) and congestion 2. This will allow us to collect all labels within a group in O(F+(ℓF)/log n) rounds.

More concretely, we aim to construct a (2f+2,2f+1)-ruling set S, for some parameter f ≥ F. For intuition, assume f=F. We create groups by constructing a BFS tree T_v rooted at each node v∈S, capped at a distance of 2f+1 (with ties broken according to the smaller root ID). We then communicate only over tree edges to divide the nodes of T_v into groups. We choose α=2f+2 as a lower bound for the distance between two nodes in S, ensuring that each root v has at least f+1 nodes in its tree: these nodes are within distance f+1 from v, and cannot be within that distance from any other node in the ruling set due to our choice of α=2f+2. Additionally, to ensure that the round complexity does not exceed O(f), we need to bound the diameter of each tree, and any β=O(f) would suffice. However, as we will argue, computing the ruling set in the distributed algorithm of our resilient labeling scheme is too expensive. Thus, the oracle must provide information in its new labels to help the nodes reconstruct the ruling set S. We choose β=2f+1 so that a node u at distance at least 2f+2 from every other node of S can determine that it must be in S, even if its label is erased.

As promised, we note that due to the lower bound of Balliu, Brandt, Kuhn, and Olivetti [4], no distributed algorithm can compute even the simpler task of a (2,β)-ruling set in fewer than Ω(log n/(β log log n)) rounds. (To our knowledge, the fastest deterministic construction in 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 takes Õ(log³ n) rounds, due to Faour, Ghaffari, Grunau, Kuhn, and Rozhoň [12]. Therefore, the actual complexity may be even larger, which would further widen the range of F in which our construction outperforms other methods.) Thus, in an algorithm that takes O(F) rounds to construct a ruling set with β=Θ(F), F must be at least Ω((log n/log log n)^{1/2}). This leaves the range F=O((log n/log log n)^{1/2}) as an intriguing area. We emphasize that such small label sizes are addressed by substantial research on labeling schemes, e.g.: Korman, Kutten, and Peleg [35] assert that for every value of F, there exists a proof-labeling scheme with this label length. Baruch, Fraigniaud, and Patt-Shamir [6] prove that a proof-labeling scheme of a given length can be converted into a randomized version with logarithmic length. Patt-Shamir and Perry [40] provide additional proof-labeling schemes with small non-constant labels, e.g., they prove the existence of a proof-labeling scheme for the k-maximum flow problem with a label size of O(log k).

This lower bound alone could be overcome by adding a single bit to each node's new label to indicate whether the node is in the ruling set. However, another significant obstacle is that the nodes need to reconstruct the ruling set given F erasures. This turns out to be a major issue, as follows.

Suppose we have a path (w_0,…,w_{f+1},…,w_{2f+1}), with two nodes u,v connected to w_{2f+1}, and the rest of the graph connected through w_0. Suppose a (2f+2,2f+1)-ruling set S contains w_0 and one of u or v, and that ID(v)<ID(w_0)<ID(u). If the labels of u and v both get erased, then the node w_{f+1} cannot determine whether it should be part of the tree T_{w_0} or not. Its distance from v, u, and w_0 is f+1, but the ID comparisons for the pairs (w_0,u) and (w_0,v) yield different results: if u∈S, then w_{f+1} should join T_{w_0}, but if v∈S, then w_{f+1} should join T_v. Notice that all nodes with non-erased labels have the same distance to u as to v, making it impossible to distinguish between the two possibilities.

This seems to bring us to a dead-end, as the simple workaround of providing each node with the identity of its closest node in the ruling set S is too costly: it would require adding Θ(log n) bits to the new labeling ψ, which we wish to avoid.

Greedy ruling sets.

Our main technical contribution overcomes this challenge by demonstrating that a greedy ruling set does allow the nodes to recover it given F label erasures. Such a ruling set is constructed by iterating over the nodes in increasing order of IDs. When a node v is reached, it is added to the ruling set, and all nodes within distance 2f+1 from v are excluded from the ruling set and ignored in later iterations. Intuitively, in our above example, this ensures that S contains v and not u. Thus, when their labels are erased, the nodes can deduce that v should be in S because it has a smaller ID.
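For concreteness, the following is a minimal centralized sketch of this greedy construction, in Python; the function names and the adjacency-list representation are our own, and the oracle is of course free to implement the same iteration differently.

from collections import deque

def bfs_distances(adj, src, cap=None):
    """BFS distances from src in the graph adj (dict: node -> neighbors);
    nodes at distance greater than cap (if given) are omitted."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if cap is not None and dist[v] == cap:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def greedy_ruling_set(adj, f):
    """Greedy (2f+2, 2f+1)-ruling set: scan nodes by increasing ID; every node
    added to S excludes its entire (2f+1)-ball from later iterations."""
    S, covered = set(), set()
    for v in sorted(adj):              # node names double as increasing IDs
        if v not in covered:
            S.add(v)
            covered.update(bfs_distances(adj, v, cap=2 * f + 1))
    return S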

While this approach solves the previous example, it is still insufficient. Suppose there is a sequence of nodes v_1,…,v_f with decreasing IDs whose labels get erased and which are not covered by any other node in S (a node v is covered by a node in S if their distance is at most 2f+1). Assume that the distance between every two consecutive nodes in the sequence is Θ(f). This means that each node v_i knows it needs to be in the ruling set if and only if v_{i+1} is not, essentially unveiling the decisions of the greedy algorithm. However, this implies that v_1 needs to receive information about v_f, which may be at distance Θ(f²), implying more rounds than our target running time.

To cope with the latter example, the oracle will provide each node with its distance from S. This incurs an additive O(log F) bits in the label size. This information is useful for deciding, for some of the nodes with erased labels, whether they are in S or not, and, in particular, it breaks the long chain of the latter example. Yet, notice that in the former example, this information does not help us, since all nodes apart from u and v have the same distance to u as to v. The reason this helps in one example but not in the other is as follows. For any two nodes u and v with erased labels that need to decide which of them is in S, as long as we have at least f−1 additional nodes whose distance from u differs from their distance from v, we can easily choose between u and v: in total there can be at most f label erasures, so at least one of these f−1 nodes is non-faulty and reveals which node needs to be chosen.

Thus, to reconstruct a greedy (2f+2,2f+1)-ruling set S using the above information in the non-erased labels, the nodes do the following. First, a node that sees a non-erased label in S within distance 2f+1 from it can deduce that it is not in S. Second, a node for which all labels within distance 2f+1 are not erased and are not in S can deduce that it is in S. For a remaining node u with an erased label, the distance from S recorded in the non-erased labels in its (2f+1)-neighborhood helps in deciding whether u is in S or not: if there is a node within this distance whose distance from u differs from its distance from S, then u is not in S. At this point in the reconstruction algorithm, the only remaining case is a node v which is not yet known to be covered by nodes in S and has a non-empty set X(v) of nodes within distance f from it whose labels are erased. We show that such nodes cannot create distant node sequences as in the latter example, which allows v to simply check whether it has the smallest ID in X(v), indicating whether it should be in S.

Formally, we refer to such nodes as alternative nodes, defined as follows. Let dist(v,u) be the distance between nodes v and u, and let B_t(v) be the set of nodes at distance at most t from v. For a node v, a node u≠v is called an alternative node if dist(u,v) ≤ f and there are at most f−2 nodes q∈B_{f+1}(v)∪B_{f+1}(u) with dist(q,v)≠dist(q,u). We emphasize that proving that the only remaining case is that of alternative nodes is technical and non-trivial.
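To make the definition concrete, here is a direct check of the alternative-node condition, reusing bfs_distances from the sketch above; we exclude q∈{u,v} from the count, matching how the proofs later count such nodes, and this choice is our own reading of the definition.

def is_alternative(adj, u, v, f):
    """Alternative-node check (sketch): dist(u,v) <= f, and at most f-2 nodes
    q in B_{f+1}(v) ∪ B_{f+1}(u) satisfy dist(q,v) != dist(q,u)."""
    du = bfs_distances(adj, u, cap=f + 1)          # B_{f+1}(u), with distances
    dv = bfs_distances(adj, v, cap=f + 1)          # B_{f+1}(v), with distances
    if u == v or dv.get(u, f + 2) > f:             # require dist(u,v) <= f
        return False
    far = f + 2                                    # sentinel: outside the ball
    differing = sum(1 for q in set(du) | set(dv)
                    if q not in (u, v) and du.get(q, far) != dv.get(q, far))
    return differing <= f - 2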

Given any f ≥ F, we prove that providing each node with a single-bit indication of whether it is in the greedy (2f+2,2f+1)-ruling set S, and an additional O(log f) bits holding its distance from S, is sufficient for the nodes to reconstruct S in O(f) rounds, despite F erasures. Due to the codes we choose, we will need f to be slightly larger than F. However, it is sufficient to choose f=Θ(F), so to reconstruct the greedy ruling set, the label size needs an additive overhead of O(log F) and the number of rounds remains O(F).

Wrap-up.

We can now describe our resilient labeling scheme in full.

In Section 2, we use a greedy ruling set with f=Θ(F) to construct groups and apply an error-correcting code to the labels of their nodes. This provides the entire algorithm of the oracle for transforming an ℓ-labeling φ into an ℓ'-labeling ψ, such that ℓ' has a constant multiplicative overhead for coding and an additive overhead of O(log F) for restoring the greedy (2f+2,2f+1)-ruling set.

In Section 3, we provide a distributed algorithm for restoring the greedy (2f+2,2f+1)-ruling set given the labeling ψ despite F label erasures, within O(F) rounds. In Section 4, the nodes partition themselves into the same groups as the oracle, within O(F) rounds. Finally, Section 5 presents our complete resilient labeling scheme, which requires an additional O(F+(ℓF)/log n) rounds for communicating the non-erased labels within each group and decoding to obtain the original labeling φ.

1.2 Preliminaries

Throughout the paper, let G=(V,E) be the network graph, which without loss of generality is considered to be connected. Let n denote the number of nodes in G. After labels are assigned to the nodes by the oracle, at most F of the labels may be erased, where F is a given parameter. For each node v∈V, we denote by ID(v) the unique identifier of v.

We denote by dist(v,u) the distance between two nodes v and u, and by P_{v,u} some shortest path between them. The ball of radius t centered at a node v consists of all nodes within distance t from v and is denoted by B_t(v)={u∈V : dist(u,v) ≤ t}.

Throughout the paper, we use a∘b to refer to the concatenation of two strings a and b.

1.3 Related Work

As mentioned earlier, the work of Bick, Kol, and Oshman [7] implicitly suggests a resilient labeling scheme with an O(1) multiplicative overhead to the label size in O(F³ log F+(ℓF³)/log n) rounds. This is achieved by constructing a so-called (k,d,c,h)-helper assignment, which assigns k helper nodes to each node v in the graph, with paths of length at most d from each v to all of its helper nodes, such that each edge belongs to at most c paths, and a node is assigned as a helper node to at most h nodes v. The paper computes a (k,O(k),O(k²),O(k))-helper assignment in O(k log k) rounds, using messages of O(k² log n) bits. Plugging in k=Θ(F) gives the aforementioned resilient labeling scheme. For F=ω(1), the parameters we obtain improve upon the above.

Ostrovsky, Perry, and Rosenbaum [37] introduced t-proof labeling schemes, which generalize the classic definition by allowing t rounds for verification, and study the tradeoff between t and the label size. This line of work was followed up by Feuilloley, Fraigniaud, Hirvonen, Paz, and Perry [15], and Fischer, Oshman, and Shamir [16]. All of these papers provide constructions of groups of nodes that allow sharing label information to reduce the label size in this context. However, a new construction is necessary for handling: (i) general graphs, (ii) the 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 model, (iii) arbitrary labels, and (iv) our goal complexity. It is likely that our algorithm can be applied to this task as well. Error-correcting codes have already been embedded in the aforementioned approaches of Bick et al. [7] and Fischer et al. [16], and we do not consider their usage a novelty of our result.

Note that Feuilloley and Fraigniaud [14] design error-sensitive proof-labeling schemes, whose name may seem to hint at a similar task, but are unrelated to our work: these are proof-labeling schemes where a proof that is far from being correct needs to be detected by many nodes.

2 Assigning Labels

To allow the nodes in the distributed algorithm to decode their original labels despite F label erasures, we employ an error-correcting code in our resilient labeling scheme. Known codes can correct a certain number of errors at the cost of only a constant overhead in the number of bits. Therefore, we can split the nodes into groups of size Θ(F) and encode the labels within each group. We include in the new labels information about the group partitioning to allow the distributed algorithm to quickly reconstruct the same groups. This information comes in the form of a ruling set, on which we base the group partitioning. To ensure that this reconstruction can be done despite label erasures, we ensure that the oracle uses a greedy ruling set. We present the oracle’s algorithm and then prove its guarantees in Theorem 3.

Overview.

In Algorithm 1, the oracle first constructs a greedy (2f+2,2f+1)-ruling set S. For every node v, let b(v) be a single bit indicating whether v∈S, and let dist_S(v) be an O(log f)-bit value which is the distance between v and S.

The oracle then partitions the nodes into groups of size Θ(f). To later allow the distributed algorithm to reconstruct the same groups, the oracle uses the same algorithm the nodes will later use, i.e., Algorithm 3. However, for the description in this section, any partitioning with the same guarantees would give the desired result in Theorem 3 below.

Finally, the oracle applies an error-correcting code to each index of the labels of all nodes in each group. The new label it assigns to each node v is a concatenation of its b(v) and distS(v) values, along with the encoded label.

Algorithm 1 Oracle's algorithm for assigning new labels.
Input: A graph G, a parameter F, and an ℓ-labeling φ: V → {0,1}^ℓ.
Output: An ℓ'-labeling ψ: V → {0,1}^{ℓ'}.
1. Construct a greedy (2f+2,2f+1)-ruling set S. For every node v, denote by dist_S(v) its distance to S.
2. Simulate an execution of Algorithm 3 with b(v)=1 for every node v∈S and b(v)=0 for every node v∉S. At the end of the algorithm, every node v is associated with a group identifier Group(v).
3. For every group Group obtained in the previous step, let (v_1,…,v_s) be its nodes, ordered by increasing IDs. For every 1 ≤ j ≤ ℓ, concatenate the j-th bits of the labels φ(v_i) of the nodes according to this order to obtain a string r_j, and encode this string into a codeword w_j according to the code C. Split the bits of w_j into s consecutive blocks (w_1^j,…,w_s^j) of sizes ⌈|w_j|/s⌉ or ⌊|w_j|/s⌋. For every 1 ≤ i ≤ s, let w_i = w_i^1∘⋯∘w_i^ℓ and set ψ(v_i) = b(v_i)∘dist_S(v_i)∘w_i.
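The per-index encoding of Step 3 can be sketched as follows. The names encode_group and code_encode are ours, and the convention that the first blocks are the larger ones is one concrete choice consistent with the ⌈|w_j|/s⌉ and ⌊|w_j|/s⌋ sizes above.

def encode_group(labels, code_encode):
    """Sketch of Step 3 of Algorithm 1 for a single group. labels: dict mapping
    node ID -> bit string, all of the same length ell; code_encode maps a k-bit
    string to an N-bit codeword. Returns each node's share w_i (without the
    b(v) and dist_S(v) prefix)."""
    ids = sorted(labels)                                  # (v_1, ..., v_s)
    s = len(ids)
    ell = len(next(iter(labels.values())))
    shares = {vid: [] for vid in ids}
    for j in range(ell):
        r_j = "".join(labels[vid][j] for vid in ids)      # j-th bit of each label
        w_j = code_encode(r_j)                            # codeword for index j
        q, rem = divmod(len(w_j), s)
        pos = 0
        for i, vid in enumerate(ids):                     # split into s blocks
            size = q + (1 if i < rem else 0)
            shares[vid].append(w_j[pos:pos + size])
            pos += size
    return {vid: "".join(blocks) for vid, blocks in shares.items()}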

Choice of code.

Different codes C result in different overheads of ℓ' compared to ℓ. A simple repetition code takes the string info of length s and repeats it for s blocks to create the resulting codeword w=C(info), which is trivially decodable given any s−1 block erasures. Splitting w back into s pieces implies that w_i equals info for every node v_i in the group. Taking f=F, this results in ℓ' = 1+O(log f)+sℓ = O(log F)+O(Fℓ). Since the group size s is always Θ(f)=Θ(F), we get a multiplicative overhead of O(F) and an additive overhead of O(log F). To reduce the multiplicative overhead to O(1), we use a code with better parameters. Formally, a binary code C maps strings of k bits into strings of N bits, called codewords. The ratio ρ=k/N is the code's rate. The relative distance δ ensures that the Hamming distance (the number of indices where the strings differ) between any two codewords is at least δN. Such a code can correct δN−1 erasures.
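As a warm-up, the repetition code just described takes only a few lines; this is the code that the Justesen code below replaces in order to bring the multiplicative overhead down to a constant.

def rep_encode(info):
    """Repetition code: repeat the s-bit string s times, so that every one of
    the s blocks equals the original string."""
    return info * len(info)

def rep_decode(blocks):
    """blocks: the s blocks in order, with None marking an erased block.
    Decodable as long as at least one block survives."""
    return next(block for block in blocks if block is not None)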

Lemma 2 (Justesen Codes [29], phrasing adopted from [3]).

For any given rate ρ < 1/2 and for any M>0, there exists a binary code C_M: {0,1}^{k_M} → {0,1}^{N_M}, with k_M=ρ_M N_M and N_M=2M(2^M−1), such that the code C_M has rate ρ_M ≥ ρ and relative distance δ_M > (1−2ρ)·H^{−1}(1/2), where H(x)=x log(1/x)+(1−x) log(1/(1−x)) is the binary entropy function. Encoding and decoding can be done in polynomial time.

Given the bound F on the number of label erasures, we need to choose a value of f to work with. Then, given a group of size s such that f+1 ≤ s ≤ 3f+1, we need to choose values for ρ and M that determine the code C_M. We set these parameters as follows. We let f=80F, and for every such s we choose ρ=1/4. To choose M, notice that Lemma 2 gives that for any M>0, the number of bits we can encode is k_M=ρ_M N_M. We choose the smallest M such that k_M is at least s.
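Choosing M is a simple search. The sketch below treats k_M as ⌊ρ·N_M⌋, a conservative simplification of the guarantee k_M = ρ_M·N_M with ρ_M ≥ ρ from Lemma 2.

def smallest_justesen_m(s, rho=0.25):
    """Smallest M such that roughly rho * N_M >= s, with N_M = 2M(2^M - 1) as
    in Lemma 2. Returns (M, N_M, k_M)."""
    M = 1
    while rho * 2 * M * (2 ** M - 1) < s:
        M += 1
    N = 2 * M * (2 ** M - 1)
    return M, N, int(rho * N)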

The next theorem proves that the overhead for the number of bits held by each node per coded bit is constant and that, given the coded blocks, it is possible to decode the original string of bits if up to F blocks are erased. The proof is presented in Appendix A.

Theorem 3.

Let f=80F and let ρ=1/4. Let (v_1,…,v_s) be the sequence of nodes in a group of size s, such that f+1 ≤ s ≤ 3f+1, ordered by increasing IDs. Let r be a string obtained by concatenating a single bit associated with each v_i according to their order, and let w be a codeword obtained in Step 3 of Algorithm 1, by using a code C_M from Lemma 2, with a value of M that is the smallest such that k_M ≥ s. Then,

  1. it holds that |w|/s ≤ 80, and

  2. the string r is decodable given w=(w_1,…,w_s) with up to F erasures of blocks w_i.

For later use by the distributed algorithm in our resilient labeling scheme, we denote by decode(v,{(u,w(u)) : u∈Group(v)}) the function that takes as input a node v and the values w(u) for all the nodes u in Group(v) (along with their IDs, so that they can be ordered correctly) and produces the bit of v in the string r which was encoded into w, by Theorem 3.
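A matching sketch of decode(·), under the same assumptions as encode_group above: it reverses the block splitting and hands each reassembled codeword, with erased blocks marked, to the code's erasure decoder (for the repetition code, rep_decode would do).

def decode_group(shares, N, code_decode):
    """shares: dict mapping node ID -> its string w_i, or None if erased.
    N: the codeword length used for every index j (assumed N >= s).
    code_decode takes the s blocks in order (None = erased) and returns the
    s-bit string r_j. Returns the original label of every group member."""
    ids = sorted(shares)
    s = len(ids)
    q, rem = divmod(N, s)
    sizes = [q + (1 if i < rem else 0) for i in range(s)]   # as in encode_group
    per_node = {}
    for i, vid in enumerate(ids):                 # cut each share back into blocks
        if shares[vid] is not None:
            per_node[vid] = [shares[vid][k:k + sizes[i]]
                             for k in range(0, len(shares[vid]), sizes[i])]
    ell = len(next(iter(per_node.values())))      # number of label indices j
    labels = {vid: "" for vid in ids}
    for j in range(ell):
        word = [per_node[vid][j] if vid in per_node else None for vid in ids]
        r_j = code_decode(word)                   # one bit per group member
        for i, vid in enumerate(ids):
            labels[vid] += r_j[i]
    return labels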

3 Restoring a Greedy Ruling Set Despite Erasures

In this section we present a distributed algorithm that restores a greedy ruling set despite label erasures. Nodes are faulty if their labels are erased, and non-faulty otherwise. Each node v aims to restore its value b(v) so that the set {v : b(v)=1} is exactly the given ruling set S. Once a faulty node v sets its value b(v), it is no longer considered faulty for the remainder of the algorithm. The number of label erasures our algorithm can tolerate is denoted by f, and in a setting with at most F label erasures, the algorithm can be executed correctly for any value f ≥ F.

Theorem 4.

Let G=(V,E) be a graph and let S be a greedy (2f+2,2f+1)-ruling set. Suppose that, apart from at most f faulty nodes, each node v is provided with a label ζ(v)=b(v)∘dist_S(v), where b(v)=1 if v∈S and b(v)=0 otherwise, and dist_S(v)=min_{u∈S} dist(v,u). Then, there exists a deterministic 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 algorithm at the end of which every node v restores its value b(v). The algorithm runs in O(f) rounds.

Before presenting and analyzing our algorithm for Theorem 4, we recall the definition of alternative nodes.

Definition 5 (Alternative Nodes).

For a node v, a node u≠v is called an alternative node if dist(u,v) ≤ f and there are at most f−2 nodes q∈B_{f+1}(v)∪B_{f+1}(u) with dist(q,v)≠dist(q,u). Denote by M(v) the set of alternative nodes for v.

A key tool in our analysis is the proof that a greedy ruling set S excludes any node v that has an alternative node u∉S with a smaller ID. In Appendix A, we prove the following.

Lemma 6.

Let S be a greedy (2f+2,2f+1)-ruling set. Then, there exists no node v∈S for which there is an alternative node u∉S with ID(u)<ID(v).

We are now ready to prove Theorem 4. We begin by presenting the algorithm.

Overview.

In Algorithm 2, the nodes restore a greedy ruling set S as follows. Before the algorithm starts, each non-faulty node w sets its value b(w) to the first bit of its label. At the beginning of the algorithm, non-faulty nodes in S broadcast a bit "1" for 2f+1 rounds, where every node that receives a bit "1" during these rounds forwards it to all of its neighbors. A faulty node u that receives a bit during these rounds sets b(u)=0, as this indicates that u∈B_{2f+1}(v) for some such v∈S, and thus u∉S.

Next, each faulty node propagates its ID through a distance of 2f+1. A faulty node u that does not receive any other ID sets b(u)=1, as this means that no other faulty node is in B_{2f+1}(u), and thus u∈S.

Now, every remaining faulty node u propagates a message of the form ID(u)∘dist through a distance of f+1, where dist starts at 0 and is increased by 1 by each node along the way, corresponding to the length of the path. For each node w, let dist_u(w) be the smallest dist value received for u, representing the actual distance between w and u.

Each node w∈B_{f+1}(u) receives the message and checks whether it is possible that u∈S by verifying whether dist_S(w)=dist_u(w). If this equality does not hold, w propagates a message with ID(u) through a distance of f+1. Since dist(w,u) ≤ f+1, if u∈S, then dist_S(w)=dist_u(w). If w propagates ID(u), then this is not the case, so when u receives its own ID, it can conclude that it is not in S and sets b(u)=0.

Now, we are left with faulty nodes v∈S and faulty nodes u∉S. We will prove that every two remaining faulty nodes within distance f of each other are alternative nodes for each other. Thus, every remaining faulty node v∈S must have the lowest ID among the remaining faulty nodes in B_f(v), and can deduce that it is in S and set b(v)=1. Similarly, every remaining faulty node u∉S must have a remaining faulty node v within B_f(u) such that ID(v)<ID(u), allowing u to deduce that it is not in S and set b(u)=0.

Following is the formal algorithm and proof.

Algorithm 2 Restoring a greedy (2f+2,2f+1)-ruling set S despite f label erasures.
Input: A graph G with a greedy ruling set S, where each node w is provided with ζ(w)=b(w)∘dist_S(w), except at most f faulty nodes which are provided with ζ(w)=empty.
Output: Every node w outputs b(w) such that {w : b(w)=1}=S.
1. Every non-faulty node w∈V sets b(w) to the first bit in ζ(w).
2. Every node v∈S with ζ(v)≠empty broadcasts a bit "1" for 2f+1 rounds.
3. Every faulty node u that receives a bit "1" during Step 2 sets b(u)=0.
4. Every faulty node u propagates a message with ID(u) through a distance of 2f+1. Nodes that have already broadcast ID(u) do not repeat the broadcast.
5. Every faulty node u which does not receive any other ID during Step 4 sets b(u)=1.
6. Every faulty node u propagates a message of the form ID(u)∘dist through a distance of f+1, where dist starts at 0 and is increased by 1 by any node along the way. A node does not propagate a message with some ID if it has already done so with a smaller value of dist. For each node w, let dist_u(w) be the smallest dist value it receives for node u.
7. Every node w∈B_{f+1}(u) with ζ(w)≠empty that receives a message during Step 6 checks whether dist_S(w)=dist_u(w). If the equality does not hold, then w propagates a message with ID(u) through a distance of f+1. Nodes that have already broadcast ID(u) do not repeat the broadcast.
8. Every faulty node u that receives ID(u) during Step 7 sets b(u)=0.
9. Every faulty node u propagates a message with ID(u) through a distance of f.
10. Let X(u) be the set of nodes whose IDs are received by u during Step 9. If ID(u)=min_{u′∈X(u)∪{u}} ID(u′), then u sets b(u)=1. Otherwise, it sets b(u)=0.
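The decision rules of Steps 2-10 can be sanity-checked with a centralized simulation that replaces the message passing by explicit ball computations; this sketch (our own, reusing bfs_distances from the sketch in Section 1.1) is not the distributed algorithm itself, only its decisions.

def restore_ruling_set(adj, S, faulty, f):
    """Centralized sketch of Algorithm 2's decisions. S: the greedy
    (2f+2,2f+1)-ruling set; faulty: nodes with erased labels, |faulty| <= f.
    Returns the restored indicator b with {v : b[v]} == S."""
    dist_S = {}                                   # multi-source BFS from S
    for v in S:
        for w, d in bfs_distances(adj, v).items():
            if d < dist_S.get(w, float("inf")):
                dist_S[w] = d
    b = {v: (v in S) for v in adj if v not in faulty}   # Step 1
    todo = set(faulty)
    # Steps 2-3: a "1" from a non-faulty node of S within 2f+1 rules u out of S
    for u in list(todo):
        ball = bfs_distances(adj, u, cap=2 * f + 1)
        if any(w != u and w in S and w not in faulty for w in ball):
            b[u] = False
            todo.discard(u)
    # Steps 4-5: no other still-faulty node within 2f+1 means u is in S
    snapshot = set(todo)
    for u in snapshot:
        ball = bfs_distances(adj, u, cap=2 * f + 1)
        if not any(w != u and w in snapshot for w in ball):
            b[u] = True
            todo.discard(u)
    # Steps 6-8: a non-faulty witness w with dist_S(w) != dist(w,u) rules u out
    ruled_out = set()
    for u in todo:
        for w, d in bfs_distances(adj, u, cap=f + 1).items():
            if w != u and w not in faulty and dist_S[w] != d:
                ruled_out.add(u)
                break
    for u in ruled_out:
        b[u] = False
    todo -= ruled_out
    # Steps 9-10: smallest ID among the remaining faulty nodes within distance f
    for u in todo:
        X = [w for w in bfs_distances(adj, u, cap=f) if w != u and w in todo]
        b[u] = all(u < w for w in X)              # node names double as IDs
    return b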

Proof of Theorem 4.

To prove that the algorithm restores the set S, we prove that every node w∈V holds b(w) at the end of Algorithm 2 such that {w : b(w)=1}=S. By Step 1, this holds for all non-faulty nodes, so it remains to prove it for faulty nodes. In what follows, we go over each step in the algorithm where nodes may set their output (Steps 3, 5, 8, and 10), and we show that any node which sets its output during that step does so correctly.

Step 3.

In Step 2, every non-faulty node v∈S broadcasts a bit "1" for 2f+1 rounds. By the definition of a (2f+2,2f+1)-ruling set, every two nodes v′,v″∈S satisfy dist(v′,v″) ≥ 2f+2. Thus, if a faulty node u receives a bit "1" that originated at v∈S during these 2f+1 rounds, it implies that u∈B_{2f+1}(v). Hence, u∉S since v∈S. So, for any node u that sets b(u)=0 in Step 3, its output is correct.

Step 5.

Afterwards, in Step 4, each faulty node propagates a message with its ID through a distance of 2f+1. Every faulty node u that does not receive any other ID during these rounds, and did not receive a "1" during Step 2, sets b(u)=1. Indeed, this implies that there is no other faulty node in B_{2f+1}(u); combined with not receiving a "1", it means there is no node of S other than possibly u itself within distance 2f+1 from u, and hence it must hold that u∈S. So, it is correct that u sets b(u)=1 in Step 5.

Step 8.

Then, in Step 6, every node u that is still faulty propagates a message of the form ID(u)∘dist through a distance of f+1, where dist starts at 0 and is increased by 1 before the message is propagated further, corresponding to the length of the path that the message traverses. Every node w∈B_{f+1}(u) receives the above message and checks whether it is possible that u∈S by verifying whether dist_S(w)=dist_u(w), where dist_u(w) is the smallest value of dist that w receives along with ID(u). If the equality does not hold, then w propagates a message with ID(u) through a distance of f+1 in Step 7. Since dist(u,w) ≤ f+1, if u∈S, then for every other node v′∈S it holds that 2f+2 ≤ dist(u,v′) ≤ dist(u,w)+dist(w,v′) ≤ f+1+dist(w,v′). Thus, dist(w,v′) ≥ f+1, and since dist_S(w)=min_{v′∈S} dist(w,v′), we have dist_S(w)=dist(w,u)=dist_u(w). But if w propagates ID(u), then this is not the case, so u∉S. Thus, it is correct that u sets b(u)=0 in Step 8.

Step 10.

First, we prove that every node u that is still faulty at the beginning of this step and every node v∈X(u) are alternative nodes for each other. By the definition of alternative nodes, we need to prove that dist(u,v) ≤ f and that v and u have at most f−2 nodes w∈B_{f+1}(v)∪B_{f+1}(u) with dist(u,w)≠dist(v,w).

First, since v∈X(u), it must be that dist(u,v) ≤ f. Second, we prove that v and u have at most f−2 nodes t∈B_{f+1}(v)∪B_{f+1}(u) with dist(u,t)≠dist(v,t). Assume, towards a contradiction, that there are at least f−1 nodes t∈B_{f+1}(v)∪B_{f+1}(u) with dist(v,t)≠dist(u,t). Since u is still faulty in Step 10, there are at most f−2 nodes q≠u,v in B_{f+1}(u) with dist(u,q)≠dist(v,q), because otherwise u would receive a message with ID(u) from such a non-faulty node during Step 7 and would set b(u)=0 in Step 8. Therefore, there exists a node s∈B_{f+1}(v)∖B_{f+1}(u) that is still faulty in Step 10. We consider two cases, obtaining a contradiction in both, and conclude that the assumption is incorrect. Thus, v and u have at most f−2 nodes t∈B_{f+1}(v)∪B_{f+1}(u) with dist(u,t)≠dist(v,t).

Case 1: No node r≠v exists on P_{v,s}∩P_{v,u}.

An illustration of this case is presented in Figure 1.

Figure 1: Illustration for Case 1. No node r≠v exists on P_{v,s}∩P_{v,u}; therefore, P_{v,u} and P_{v,s} are disjoint except at the node v. The orange circle represents B_{f+1}(u) and the blue circle represents B_{f+1}(v). We assume that dist(u,v) ≤ f, so |P_{u,v}| ≤ f. Additionally, s∈B_{f+1}(v)∖B_{f+1}(u).

In this case, every node t≠u,v on P_{v,u} satisfies dist(u,t)≠dist(v,t), except at most one node a (such a node a exists if dist(u,v) is even and a is exactly in the middle of P_{v,u}). Additionally, every node t on P_{v,s} satisfies dist(t,v)≠dist(t,u), as otherwise dist(u,s) ≤ dist(u,t)+dist(t,s)=dist(v,t)+dist(t,s)=dist(v,s) ≤ f+1, which contradicts the fact that dist(u,s) ≥ f+2. Denote by P the part of the path P_{u,v}∪P_{v,s} at distance at most f+1 from u. There are f nodes on P that are neither u nor v. Additionally, excluding the node a, we have f−1 nodes q on P with dist(v,q)≠dist(u,q), and they are all in B_{f+1}(u). This contradicts the fact that there are at most f−2 such nodes. Therefore, this case is not possible.

Case 2: There exists a node r≠v on P_{v,s}∩P_{v,u}.

An illustration of this case is presented in Figure 2.

Figure 2: Illustration for Case 2. There exists a node r≠v on P_{v,s}∩P_{v,u}; we take r to be the closest node to u in P_{v,u}∩P_{v,s}. The orange circle represents B_{f+1}(u) and the blue circle represents B_{f+1}(v). We assume that dist(u,v) ≤ f, and since r is on P_{u,v}, we have |P_{u,r}|+|P_{r,v}| ≤ f. Additionally, s∈B_{f+1}(v)∖B_{f+1}(u). Since r is on P_{v,s}, we have |P_{v,r}|+|P_{r,s}| ≤ f+1.

We take r to be the closest node to u in P_{v,u}∩P_{v,s}. In this case, every node t≠u,v on P_{v,u} satisfies dist(u,t)≠dist(v,t), except at most one node a, as we proved in the previous case. Therefore, the same holds for P_{u,r}. Additionally, every node t on P_{r,s} satisfies dist(t,v)≠dist(t,u), as otherwise dist(u,s) ≤ dist(u,t)+dist(t,s)=dist(v,t)+dist(t,s)=dist(v,s) ≤ f+1, contradicting the fact that dist(u,s) ≥ f+2.

Denote by P the part of the path P_{u,r}∪P_{r,s} at distance at most f+1 from u. There are f+1 nodes on P that are not u. Additionally, excluding the node a, we have f nodes q on P with dist(v,q)≠dist(u,q), and they are all in B_{f+1}(u). This contradicts the fact that there are at most f−2 such nodes. Therefore, this case is not possible.
This completes the proof that every node u that is still faulty at the beginning of this step and every node vX(u) are alternative nodes.

Now, we prove that if u sets b(u)=0 in this step, then u∉S. In this case, since u sets b(u)=0, there exists a node v∈X(u) with ID(v)<ID(u), and v is an alternative node for u. If v∈S, then u∉S since dist(u,v) ≤ f < 2f+2; and if v∉S, then u∉S by Lemma 6.

Finally, we prove that every faulty node u that sets b(u)=1 satisfies u∈S. By the definition of a (2f+2,2f+1)-ruling set, there exists a node v∈S such that dist(u,v) ≤ 2f+1 (which may be u itself); if v=u, we are done, so assume v≠u. We claim that v∈B_f(u). Assume otherwise, i.e., that dist(u,v) ≥ f+1. Notice that v and u have at least |P_{v,u}|−2 nodes j with dist(v,j)≠dist(u,j) on P_{v,u}, since there are |P_{v,u}|−1 nodes j≠u,v on P_{v,u}, and at most one of them has dist(v,j)=dist(u,j) (such a node j exists if dist(u,v) is even and j is exactly in the middle of P_{v,u}). Therefore, if dist(u,v) ≥ f+1, consider the nodes on the path P_{u,v} that are at distance at most f+1 from u. There are at least f−1 such nodes j≠u,v with dist(u,j)≠dist(v,j), and so at least one of them is not faulty, which would lead to u setting b(u)=0 in Step 8. But u is still faulty at the beginning of Step 10, which leads to a contradiction.

Thus, v∈B_f(u). Additionally, v∈X(u), as follows. Since u is still faulty at the beginning of Step 10, it did not set b(u)=0 in Step 3, so v must also be faulty: otherwise, v would have broadcast a "1" in Step 2 that reaches u, since dist(u,v) ≤ 2f+1. Moreover, if v were not faulty at the beginning of Step 9, then v must have set b(v)=1 in Step 5, which is impossible: since dist(u,v) ≤ 2f+1 and u is still faulty in Step 4, v receives a message with ID(u) in Step 4 and does not set b(v)=1 in Step 5. Note that v does not set b(v)=0 in Step 3 or Step 8 either, since we proved that the outputs set in those steps are correct and v∈S. Thus, v is still faulty in Step 9, and since dist(u,v) ≤ f, node u receives a message with ID(v) in Step 9, so v∈X(u).

Thus, since u sets b(u)=1, every node w∈X(u) satisfies ID(u)<ID(w). Since, as we proved, every such w is an alternative node for u, no node of X(u) can be in S: if w∈S, then u∉S because dist(u,w) ≤ f < 2f+2, and then Lemma 6 rules out w∈S. But v∈X(u) and v∈S, which is a contradiction. Hence v=u must hold, and u∈S, as needed.

This completes the proof that, at the end of Algorithm 2, we set b(w) for every node w∈V correctly, satisfying {w : b(w)=1}=S. Finally, we bound the number of rounds the algorithm takes. In every broadcast in the algorithm, there are at most f different messages (each of which fits in O(log n) bits). Moreover, a node passes such a message only once. Thus, each broadcast takes O(f) rounds. Since there are only a constant number of steps with such broadcasts, the algorithm completes in O(f) rounds.

4 Partitioning

The nodes use the restored ruling set to partition themselves into groups, within each of which an error-correcting code allows recovering the erased labels. The algorithm they follow is the same as that of the oracle; it is deterministic and independent of which nodes are faulty. This guarantees consistent partitioning and correct decoding.

To ensure fast communication within each group, our goal is to partition the graph nodes into groups of size Θ(f). Our algorithm achieves guarantees that are slightly weaker, in the sense that a group may be a disconnected component in the graph, but its weak diameter (its diameter in the original graph) is still O(f). In general, such a guarantee may still be insufficient for fast communication within each group, as some edges may be used by multiple groups, causing congestion. However, our algorithm will make sure to provide low-congestion shortcuts. That is, it constructs groups of size Θ(f) with a weak diameter of O(f), such that each edge e∈E is associated with a constant number of groups that communicate over it. A formal definition of this concept is as follows.

Definition 7 (Low-Congestion Shortcuts [24]).

Given a graph G=(V,E) and a partition of V into disjoint subsets Q_1,…,Q_K⊆V, a set of subgraphs H_1,…,H_K⊆G is called a set of (c,d)-shortcuts, if:

  1. The subgraph G[Q_i]∪H_i is connected.

  2. For each i, the diameter of the subgraph G[Q_i]∪H_i is at most d.

  3. For each edge e∈E, the number of subgraphs G[Q_i]∪H_i containing e is at most c.
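Definition 7 is easy to verify mechanically. The following checker, a sketch with our own naming that reuses bfs_distances from the earlier sketch, tests all three conditions for given groups Q_i and shortcut edge sets H_i.

def check_shortcuts(adj, groups, shortcuts, c, d):
    """Verify Definition 7: each G[Q_i] ∪ H_i must be connected with diameter
    at most d, and each edge may appear in at most c of these subgraphs.
    groups: list of node sets Q_i; shortcuts: list of edge sets H_i, each edge
    given as a frozenset pair of endpoints."""
    usage = {}                                    # edge -> number of subgraphs
    for Q, H in zip(groups, shortcuts):
        edges = {frozenset((u, w)) for u in Q for w in adj[u] if w in Q} | set(H)
        nodes = set(Q) | {x for e in edges for x in e}
        nbrs = {x: set() for x in nodes}
        for e in edges:
            u, w = tuple(e)
            nbrs[u].add(w)
            nbrs[w].add(u)
            usage[e] = usage.get(e, 0) + 1
        for src in nodes:                         # connectivity and diameter
            dist = bfs_distances(nbrs, src)
            if len(dist) < len(nodes) or max(dist.values()) > d:
                return False
    return all(count <= c for count in usage.values())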

With the above definition, we can now state the properties of our partitioning algorithm.

Theorem 8.

Let G=(V,E) be a graph, and let S be a (2f+2,2f+1)-ruling set where each node v knows whether it is in S. There exists a deterministic 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 algorithm that partitions V into disjoint groups Q_1,…,Q_K⊆V with sizes ranging from f+1 to 3f+1, and provides a set of (2,4f)-shortcuts for these groups. The algorithm completes in Θ(f) rounds.

Overview.

Algorithm 3 constructs the required partition as follows. Initially, each node v∈S computes a BFS tree to depth 2f+1 by propagating its ID. Nodes that receive multiple root IDs choose the smallest and join that tree only. The rest of the algorithm operates within each tree independently.

Within each tree, each node u divides the nodes in its subtree into groups of size f+1 ≤ s ≤ 3f+1, with a remainder of up to f nodes. This remaining number is sent to its parent parent(u), who handles the remainders of its children similarly. The ComputeGroups(u) procedure sorts u's children by increasing IDs, appends u itself, and then forms groups by summing the remaining values of its children. Once a group reaches the desired size, a new group starts from the next child. Each child receives its group ID from u. When a node receives its group allocation, it may need to inform some of its children, to ensure that remaining nodes from its subtree join this group as well. This is handled by the RelayGroups(u) procedure, in which the node u relays the group ID. During this process, u also sets the local variables H_{group}(u), which are used for constructing low-congestion shortcuts for communication among the nodes of each group. Finally, if there is a small remainder at the root v, the root appends these nodes to one of the groups created in ComputeGroups(v) or, if no such group was constructed by v, to the group of the node closest to v that already belongs to a group. This is done by v obtaining the required group ID and relaying it to the relevant remaining nodes.

Following is the formal algorithm and proof.

Algorithm 3 Partitioning algorithm.
Input: A graph G and a (2f+2,2f+1)-ruling set S, where each node v is provided with b(v) such that S={v : b(v)=1}.
Output: Each node v holds Group(v) and H_{group}(v) for every group group.
Constructing BFS trees:
1. Every node v∈S starts a BFS to a distance of 2f+1. Every node u∉S which receives BFS messages selects the one with the lowest root ID and continues with it. Every node u∉S in a tree rooted at v designates parent(u) as one of the nodes from which it received v, and sets leader(u)=v. Every node u sends a bit "1" to parent(u). Let children(u) denote the set of nodes from which u receives such a bit.
Creating groups:
2. Every node u∈V with children(u)=∅ sends remain(u)=1 to parent(u). Every node u∈V sorts its children by increasing IDs and appends itself last, to create a sequence order(u)=(w_1,…,w_{r_u},u=w_{r_u+1}), where r_u=|children(u)|.
3. Every node u∈V that receives remain(w) from every node w∈children(u) locally computes Groups(u), Messages(u), and remain(u) using the ComputeGroups(u) procedure. For each item (child,group)∈Messages(u), u sends group to w_{child}. Additionally, if u∉S, it sends remain(u) to parent(u).
Relaying group IDs:
4. Every node u∉S that receives group from its parent computes Messages_r(u) using the RelayGroups(u,group) procedure and sets Group(u)=group. For each item (child,group)∈Messages_r(u), u sends group to its child w_{child}.
Root remainder:
5. Every node u∉S sends a group ID group to parent(u). If Group(u) is already set, then group is Group(u). Otherwise, group is the first group identifier that u receives from some child (the smallest ID if there is more than one).
6. Every node v∈S with Group(v) not set sets Group(v)=group, where group is the first group identifier that v receives from some child (the smallest ID if there is more than one). Then, v sends group to every child w with remain(w)>0 and Group(w) not set, and to the node it received group from in Step 5. For every such node w, the node v inserts the edge (v,w) into the set H_{group}(v).
7. Every node u∉S with Group(u) not set which receives group from parent(u) sets Group(u)=group and sends group to every child w with remain(w)>0 and Group(w) not set, and to the node it received group from in Step 5 (if such a node exists). For every such node w, the node u inserts the edge (u,w) into H_{group}(u).
Assigning shortcut edges:
8. Denote by Q_1,…,Q_K the groups created by all nodes, and define H_i=⋃_{u∈V} H_{ID(Q_i)}(u).

Procedure 𝙲𝚘𝚖𝚙𝚞𝚝𝚎𝙶𝚛𝚘𝚞𝚙𝚜(𝒖).

The node u∈V initializes a counter count=0, an empty sequence Groups(u), an empty set Messages(u), and a variable remain(u)=1. It then loops over 1 ≤ i ≤ r_u+1 and computes count←count+remain(w_i) until count ≥ f+1.

When this happens for some index i=j_1, the node u adds the item (1,j_1,ID(w_1)) to the sequence Groups(u), sets count←0, and continues looping over the nodes in order(u) from i=j_1+1 in the same manner. That is, it computes count←count+remain(w_i) until count ≥ f+1 for some index j_2, after which it adds the item (j_1+1,j_2,ID(w_{j_1+1})) to Groups(u), and so on. Thus, every item in Groups(u) is of the form (start,end,group), where group is the ID of the node w_{start}, which serves as the group identifier for all nodes w_i with start ≤ i ≤ end.

Let (a,b,ID(w_a)) be the last group created by u, so w_b is the last child of u whose remain(w_b) value is grouped. The node u computes remain(u)=(Σ_{i=b+1}^{r_u} remain(w_i))+1. If u∈S and Groups(u) is non-empty, u updates (a,b,ID(w_a)) to (a,r_u+1,ID(w_a)).

For each item (start,end,group) in Groups(u), u inserts a tuple (i,group) into Messages(u) for every child w_i where start ≤ i ≤ end and remain(w_i)>0. Thus, every item is of the form (child,group). For each item (child,group)∈Messages(u), the node u inserts the edge (u,w_{child}) into the set H_{group}(u).
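The packing logic of ComputeGroups(u) is essentially a single pass; the sketch below is our own simplification, which omits the Messages(u) bookkeeping and the shortcut edges, and reports everything unpacked as the leftover, a simplification of the remain(u) accounting above.

def compute_groups(order_ids, remain, f, u_in_S):
    """Sketch of ComputeGroups(u). order_ids: u's children sorted by increasing
    ID followed by u itself; remain[w]: ungrouped nodes that w accounts for
    (1 for u itself). Packs consecutive remain() values into groups of total
    size at least f+1; a group's identifier is the ID of its first member."""
    groups = []
    count, start = 0, 0
    for i in range(len(order_ids)):
        count += remain[order_ids[i]]
        if count >= f + 1:
            groups.append((start, i, order_ids[start]))
            count, start = 0, i + 1
    if u_in_S and groups:
        a, _, gid = groups[-1]                    # the root of the tree folds
        groups[-1] = (a, len(order_ids) - 1, gid) # any remainder into its last group
        count = 0
    return groups, count                          # count: leftover reported upward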

Procedure 𝚁𝚎𝚕𝚊𝚢𝙶𝚛𝚘𝚞𝚙𝚜(𝒖,𝒈𝒓𝒐𝒖𝒑).

Let (a,b,ID(w_a)) be the last group that u created during its ComputeGroups(u) procedure. The node u initializes an empty set Messages_r(u). Then, u inserts a tuple (i,group) into Messages_r(u) for every child w_i where b+1 ≤ i ≤ r_u. Thus, every item is of the form (child,group). As in ComputeGroups(u), for each item (child,group)∈Messages_r(u), the node u inserts the edge (u,w_{child}) into the set H_{group}(u).

This concludes the description of the algorithm. Appendix A contains the proof of Theorem 8, as well as the proof of the following claim derived from it, which will be useful in the subsequent section.

Claim 9.

In parallel, each node can collect the Y·f bits of information held by the nodes in its group within O(f+(Yf)/log n) rounds, where each node holds Y bits of information and the group sizes are O(f).

5 Resilient Labeling Schemes

In this section, we wrap up the previous parts to obtain our resilient labeling scheme.

We restate our main theorem, which we are now ready to prove.

Theorem 1 (restated). There exists a resilient labeling scheme that tolerates F label erasures, has O(1) and O(log F) multiplicative and additive overheads to the size of the labels, respectively, and whose distributed 𝖢𝗈𝗇𝗀𝖾𝗌𝗍 algorithm for recovering the original labels has a complexity of at most O(F+(ℓF)/log n) rounds.

Overview.

Algorithm 4 gives our full resilient labeling scheme. The oracle executes Algorithm 1 to assign new labels to each node. In the distributed algorithm, each node parses its new label, which is done correctly since the length of each item in the string is known. The node then uses this label to reconstruct the greedy (2f+2,2f+1)-ruling set S (with f=80F) using Algorithm 2 and to obtain its group in the partition according to Algorithm 3. Finally, the nodes exchange all their labels within a group and decode their original labels, as guaranteed by Theorem 3.

Algorithm 4 Resilient Labeling Scheme.
Input to Oracle: A graph G, a parameter F, and a labeling φ: V → {0,1}^ℓ.
Output of Oracle: A labeling ψ: V → {0,1}^{ℓ'}.
Input to Distributed Algorithm: Each node v gets the label ψ(v).
Output of Distributed Algorithm: Each node v outputs φ(v).
Oracle:
Run Algorithm 1 and assign ψ(v) to each node v.
The distributed algorithm:
1. For every non-faulty node v, parse ψ(v) as b(v)∘dist_S(v)∘w(v).
2. Run Algorithm 2 with ζ(v)=b(v)∘dist_S(v) for each non-faulty node v, to obtain b(v) for every node v.
3. Run Algorithm 3 with b(v) for every node v, to obtain Group(v) for every node v.
4. Every node v sends (v,w(v)) to all nodes in Group(v).
5. Every node v outputs decode(v,{(u,w(u)) : u∈Group(v)}).
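For a sanity check, the sketches from the previous sections can be tied together on a toy instance: a path, a single erased label, the whole path treated as one group (in place of the groups that Algorithm 3 would form), and the repetition code standing in for the Justesen code.

if __name__ == "__main__":
    n, f = 30, 2                                  # toy values; the paper uses f = 80F
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    S = greedy_ruling_set(adj, f)                 # the set the oracle commits to
    labels = {v: format(v, "05b") for v in adj}   # an arbitrary 5-bit labeling
    shares = encode_group(labels, rep_encode)     # one group = all nodes (toy)
    shares[7] = None                              # erase one node's label
    restored = decode_group(shares, N=n * n, code_decode=rep_decode)
    assert restored == labels                     # the original labeling is recovered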

Proof of Theorem 1.

First, note that since b(v) is a single bit and dist_S(v) has O(log F) bits where the constant is known, the parsing of ψ(v) as b(v)∘dist_S(v)∘w(v) in Step 1 is unique and matches Step 3 of Algorithm 1.

Correctness:

By the construction in Algorithm 1, the values b(v) correspond to a greedy (2f+2,2f+1)-ruling set S, meaning that {v : b(v)=1}=S. Thus, according to Theorem 4, after running Algorithm 2 in Step 2, each node v holds its value b(v). Since Algorithm 3 is deterministic and the oracle simulates the same algorithm in Step 2 of Algorithm 1, after running Algorithm 3 in Step 3, each node v has the same Group(v) as determined by the oracle. Therefore, by Theorem 3, the operation decode(v,{(u,w(u)) : u∈Group(v)}) in Step 5 yields the value of φ(v) for each node v.

Round complexity:

By Theorems 4 and 8, Algorithms 2 and 3 in Steps 2 and 3, respectively, each take O(f) rounds. Additionally, according to Claim 9, every node can collect the O(ℓf) bits of information held by the nodes in its group in parallel, within O(f+(ℓf)/log n) rounds, as each node holds O(ℓ) bits of information and each group is of size O(f). Thus, Step 4 completes in O(f+(ℓf)/log n) rounds. In total, the distributed algorithm completes in O(f+(ℓf)/log n) rounds. Since f=Θ(F), this is O(F+(ℓF)/log n) rounds.

Label size:

According to Theorem 3, each of the original ℓ bits corresponds to a constant number of bits in the coded version. Additionally, there is one bit for the ruling set indication b(v), and there are O(log F) bits for dist_S(v), making the new label size ℓ'=O(ℓ+log F).

References

  • [1] Ittai Abraham, Shiri Chechik, and Cyril Gavoille. Fully dynamic approximate distance oracles for planar graphs via forbidden-set distance labels. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1199–1218, 2012. doi:10.1145/2213977.2214084.
  • [2] Stephen Alstrup, Cyril Gavoille, Haim Kaplan, and Theis Rauhe. Nearest common ancestors: a survey and a new distributed algorithm. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pages 258–264, 2002. doi:10.1145/564870.564914.
  • [3] Yagel Ashkenazi, Ran Gelles, and Amir Leshem. Noisy beeping networks. Inf. Comput., 289(Part):104925, 2022. doi:10.1016/J.IC.2022.104925.
  • [4] Alkida Balliu, Sebastian Brandt, Fabian Kuhn, and Dennis Olivetti. Distributed Δ-coloring plays hide-and-seek. In Stefano Leonardi and Anupam Gupta, editors, STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 464–477. ACM, 2022. doi:10.1145/3519935.3520027.
  • [5] Aviv Bar-Natan, Panagiotis Charalampopoulos, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Fault-tolerant distance labeling for planar graphs. Theoretical Computer Science, 918:48–59, 2022. doi:10.1016/J.TCS.2022.03.020.
  • [6] Mor Baruch, Pierre Fraigniaud, and Boaz Patt-Shamir. Randomized proof-labeling schemes. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, pages 315–324, 2015. doi:10.1145/2767386.2767421.
  • [7] Aviv Bick, Gillat Kol, and Rotem Oshman. Distributed zero-knowledge proofs over networks. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2426–2458. SIAM, 2022. doi:10.1137/1.9781611977073.97.
  • [8] Keren Censor-Hillel, Ami Paz, and Mor Perry. Approximate proof-labeling schemes. Theor. Comput. Sci., 811:112–124, 2020. doi:10.1016/J.TCS.2018.08.020.
  • [9] Michal Dory and Merav Parter. Fault-tolerant labeling and compact routing schemes. In Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, pages 445–455, 2021. doi:10.1145/3465084.3467929.
  • [10] Yuval Emek and Yuval Gil. Twenty-two new approximate proof labeling schemes. In Hagit Attiya, editor, 34th International Symposium on Distributed Computing, DISC 2020, October 12-16, 2020, Virtual Conference, volume 179 of LIPIcs, pages 20:1–20:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.DISC.2020.20.
  • [11] Yuval Emek, Yuval Gil, and Shay Kutten. Locally Restricted Proof Labeling Schemes. In Christian Scheideler, editor, 36th International Symposium on Distributed Computing (DISC 2022), volume 246 of LIPIcs, pages 20:1–20:22, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2022.20.
  • [12] Salwa Faour, Mohsen Ghaffari, Christoph Grunau, Fabian Kuhn, and Václav Rozhoň. Local distributed rounding: Generalized to mis, matching, set cover, and beyond. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4409–4447. SIAM, 2023. doi:10.1137/1.9781611977554.CH168.
  • [13] Laurent Feuilloley. Introduction to local certification. Discrete Mathematics & Theoretical Computer Science, 23(Distributed Computing and Networking), 2021. doi:10.46298/DMTCS.6280.
  • [14] Laurent Feuilloley and Pierre Fraigniaud. Error-sensitive proof-labeling schemes. J. Parallel Distributed Comput., 166:149–165, 2022. doi:10.1016/J.JPDC.2022.04.015.
  • [15] Laurent Feuilloley, Pierre Fraigniaud, Juho Hirvonen, Ami Paz, and Mor Perry. Redundancy in distributed proofs. Distributed Computing, 34:113–132, 2021. doi:10.1007/S00446-020-00386-Z.
  • [16] Orr Fischer, Rotem Oshman, and Dana Shamir. Explicit space-time tradeoffs for proof labeling schemes in graphs with small separators. In Quentin Bramas, Vincent Gramoli, and Alessia Milani, editors, 25th International Conference on Principles of Distributed Systems (OPODIS 2021), LIPIcs, pages 21:1–21:22, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2021.21.
  • [17] Pierre Fraigniaud, Cyril Gavoille, David Ilcinkas, and Andrzej Pelc. Distributed computing with advice: information sensitivity of graph coloring. Distributed Computing, 21:395–403, 2009. doi:10.1007/S00446-008-0076-Y.
  • [18] Pierre Fraigniaud, David Ilcinkas, and Andrzej Pelc. Communication algorithms with advice. Journal of Computer and System Sciences, 76(3-4):222–232, 2010. doi:10.1016/J.JCSS.2009.07.002.
  • [19] Pierre Fraigniaud, Amos Korman, and Emmanuelle Lebhar. Local mst computation with short advice. In Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pages 154–160, 2007. doi:10.1145/1248377.1248402.
  • [20] Emanuele G. Fusco and Andrzej Pelc. Trade-offs between the size of advice and broadcasting time in trees. In Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, pages 77–84, 2008. doi:10.1145/1378533.1378545.
  • [21] Cyril Gavoille and David Peleg. Compact and localized distributed data structures. Distributed Computing, 16:111–120, 2003. doi:10.1007/S00446-002-0073-5.
  • [22] Cyril Gavoille, David Peleg, Stéphane Pérennes, and Ran Raz. Distance labeling in graphs. Journal of Algorithms, 53(1):85–112, 2004. doi:10.1016/J.JALGOR.2004.05.002.
  • [23] Paweł Gawrychowski, Fabian Kuhn, Jakub Łopuszański, Konstantinos Panagiotou, and Pascal Su. Labeling schemes for nearest common ancestors through minor-universal trees. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2604–2619. SIAM, 2018.
  • [24] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks II: Low-congestion shortcuts, MST, and min-cut. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 202–219. SIAM, 2016. doi:10.1137/1.9781611974331.CH16.
  • [25] Mika Göös and Jukka Suomela. Locally checkable proofs. In Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing, pages 159–168, 2011. doi:10.1145/1993806.1993829.
  • [26] Mika Göös and Jukka Suomela. Locally checkable proofs in distributed computing. Theory of Computing, 12(1):1–33, 2016. doi:10.4086/TOC.2016.V012A019.
  • [27] Rani Izsak and Zeev Nutov. A note on labeling schemes for graph connectivity. Information processing letters, 112(1-2):39–43, 2012. doi:10.1016/J.IPL.2011.10.001.
  • [28] Taisuke Izumi, Yuval Emek, Tadashi Wadayama, and Toshimitsu Masuzawa. Deterministic fault-tolerant connectivity labeling scheme. In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing, pages 190–199, 2023. doi:10.1145/3583668.3594584.
  • [29] Jørn Justesen. Class of constructive asymptotically good algebraic codes. IEEE Transactions on Information Theory, 18(5):652–656, 1972. doi:10.1109/TIT.1972.1054893.
  • [30] Michal Katz, Nir A. Katz, Amos Korman, and David Peleg. Labeling schemes for flow and connectivity. SIAM Journal on Computing, 34(1):23–40, 2004. doi:10.1137/S0097539703433912.
  • [31] Gillat Kol, Rotem Oshman, and Raghuvansh R. Saxena. Interactive distributed proofs. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, pages 255–264, 2018. URL: https://dl.acm.org/citation.cfm?id=3212771.
  • [32] Amos Korman. Labeling schemes for vertex connectivity. ACM Transactions on Algorithms (TALG), 6(2):1–10, 2010. doi:10.1145/1721837.1721855.
  • [33] Amos Korman and Shay Kutten. Distributed verification of minimum spanning trees. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, pages 26–34, 2006. doi:10.1145/1146381.1146389.
  • [34] Amos Korman and Shay Kutten. On distributed verification. In Distributed Computing and Networking: 8th International Conference, ICDCN 2006, Guwahati, India, December 27-30, 2006. Proceedings 8, pages 100–114. Springer, 2006. doi:10.1007/11947950_12.
  • [35] Amos Korman, Shay Kutten, and David Peleg. Proof labeling schemes. Distributed Comput., 22(4):215–233, 2010. doi:10.1007/S00446-010-0095-3.
  • [36] Nathan Linial. Locality in distributed graph algorithms. SIAM Journal on Computing, 21(1):193–201, 1992. doi:10.1137/0221015.
  • [37] Rafail Ostrovsky, Mor Perry, and Will Rosenbaum. Space-time tradeoffs for distributed verification. In Shantanu Das and Sébastien Tixeuil, editors, Structural Information and Communication Complexity - 24th International Colloquium, SIROCCO 2017, Porquerolles, France, June 19-22, 2017, Revised Selected Papers, volume 10641 of Lecture Notes in Computer Science, pages 53–70. Springer, 2017. doi:10.1007/978-3-319-72050-0_4.
  • [38] Merav Parter and Asaf Petruschka. Õptimal dual vertex failure connectivity labels. In Christian Scheideler, editor, 36th International Symposium on Distributed Computing, DISC 2022, October 25-27, 2022, Augusta, Georgia, USA, volume 246 of LIPIcs, pages 32:1–32:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.DISC.2022.32.
  • [39] Merav Parter, Asaf Petruschka, and Seth Pettie. Connectivity labeling for multiple vertex failures. arXiv preprint, 2023. doi:10.48550/arXiv.2307.06276.
  • [40] Boaz Patt-Shamir and Mor Perry. Proof-labeling schemes: Broadcast, unicast and in between. In International Symposium on Stabilization, Safety, and Security of Distributed Systems, pages 1–17. Springer, 2017. doi:10.1007/978-3-319-69084-1_1.
  • [41] David Peleg. Proximity-preserving labeling schemes. Journal of Graph Theory, 33(3):167–176, 2000. doi:10.1002/(SICI)1097-0118(200003)33:3<167::AID-JGT7>3.0.CO;2-5.

Appendix A Missing Proofs

A.1 Proof of Theorem 3

In order to prove this theorem, we need the following claim on $N_M$.

Claim 10.

For every $M>1$, it holds that $N_M \le 5N_{M-1}$.

Proof of Claim 10.

For every $M>0$, the value of $N_M$ is defined as $2M(2^M-1)$, and so we have for $M>1$:

$N_M = 2M(2^M-1)$
$\quad = 2(M-1)2^M + 2\cdot 2^M - 2(M-1) - 2$
$\quad = 2\bigl(2(M-1)2^{M-1}\bigr) - 2\bigl(2(M-1)\bigr) + 2\cdot 2^M + 2(M-1) - 2$
$\quad \le 2N_{M-1} + 2\cdot 2(M-1)2^{M-1} - 2\cdot 2(M-1) + 6(M-1) - 2$
$\quad = 4N_{M-1} + 6(M-1) - 2$
$\quad \le 5N_{M-1},$

where the third line groups terms so that, by the definition $N_{M-1} = 2(M-1)2^{M-1} - 2(M-1)$, the first two terms sum to $2N_{M-1}$; the first inequality uses $2\cdot 2^M \le 2\cdot 2(M-1)2^{M-1}$ (which holds since $M>1$) and rewrites $2(M-1)$ as $-2\cdot 2(M-1) + 6(M-1)$, so that the middle terms again sum to $2N_{M-1}$; and the last inequality follows since $M>1$ and thus $6(M-1)-2 < N_{M-1}$.

We now prove the aforementioned properties of the code that are implied by our choice of parameters, using a very crude analysis of the constants, which is sufficient for our needs.

Proof of Theorem 3.

Since $M$ is the smallest value such that $k_M \ge s$, we have that $k_{M-1} < s$, or in other words, $\rho_{M-1}N_{M-1} < s$. We prove in Claim 10 above that $N_M$ is at most $5N_{M-1}$. Combining these two with the fact that $\rho_{M-1} \ge \rho = 1/4$ gives that

$(1/4)N_{M-1} \le \rho_{M-1}N_{M-1} < s \le k_M = \rho_M N_M \le \rho_M \cdot 5N_{M-1} \le 5N_{M-1},$

where the last inequality follows since $\rho_M \le 1$. This gives that $k_M/s$ is at most $20$. To encode, we take the $s$ bits and pad them with $k_M - s$ zeros concatenated at their end. We get $N_M$ bits that are split among the $s$ nodes, so the number of bits that each node holds is at most $\lceil N_M/s \rceil = \lceil k_M/(\rho_M s) \rceil$. We know that $\rho_M \ge 1/4$ and that $k_M/s$ is at most $20$, so each node holds at most $80$ bits. This proves Item (1).
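To make the parameter arithmetic concrete, the following is a minimal Python sketch of the pad-and-split bookkeeping. It assumes, for simplicity, that $\rho_M$ equals exactly $1/4$ for every $M$ (the proof only uses $1/4 \le \rho_M \le 1$); the function names are ours, and the actual encoding by $\mathtt{C}_M$ is elided.

```python
import math

def smallest_M(s, rho=0.25):
    # Find the smallest M with k_M >= s, assuming N_M = 2M(2^M - 1) and
    # k_M = rho * N_M with rho fixed at 1/4 (a simplifying assumption).
    M = 1
    while rho * (2 * M * (2 ** M - 1)) < s:
        M += 1
    N = 2 * M * (2 ** M - 1)
    return M, N, int(rho * N)

def pad_and_split(bits):
    # Pad the s message bits with k_M - s zeros; the codeword of C_M would
    # then have N_M bits, split into s blocks of at most ceil(N_M / s) bits.
    s = len(bits)
    M, N, k = smallest_M(s)
    padded = bits + [0] * (k - s)
    per_node = math.ceil(N / s)
    assert per_node <= 80  # the bound proven in Item (1)
    return padded, per_node

# Example: 100 message bits, one per node.
padded, per_node = pad_and_split([1] * 100)
print(len(padded), per_node)
```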

To prove Item (2), note that the code $\mathtt{C}_M$ has a relative distance $\delta_M > (1-2\rho)H^{-1}(1/2) \ge (1-2(1/4)) \cdot 0.11 \ge 1/20$. The code can fix up to $\delta_M N_M - 1$ erasures, which is more than a $1/40$ fraction of $N_M$. Since $f = 80F$ and the number of blocks $s$ is at least $f+1$ (and in particular larger than $f$), any erasure of $F$ blocks erases at most a $1/80$ fraction of the blocks. Some blocks may be of size $\lceil |\mathtt{w}|/s \rceil$ and others of size $\lfloor |\mathtt{w}|/s \rfloor$, which may differ by 1, and the larger blocks may be the ones that get erased. However, even if we denote $x = \lceil |\mathtt{w}|/s \rceil$ and assume $\lfloor |\mathtt{w}|/s \rfloor = x-1$, then in the worst case the fraction of bit erasures caused by $F$ block erasures is at most $(sx/80)/(sx - 79s/80) = x/(80x-79)$, which is at most $1/40$ (since $x > 1$). Thus, this may erase at most a $1/40$ fraction of the bits of $\mathtt{w}$. Because the code can fix up to a $1/40$ fraction of erasures, we can decode $\mathtt{r}$, obtaining Item (2) as needed.
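The worst-case calculation above is easy to verify mechanically. The snippet below (names ours) recomputes the erased fraction exactly and checks that $x/(80x-79) \le 1/40$ for every block size $x \ge 2$:

```python
from fractions import Fraction

def worst_erased_fraction(s, x):
    # s/80 blocks of the larger size x are erased; the remaining 79s/80
    # blocks have the smaller size x - 1.
    erased = Fraction(s, 80) * x
    total = erased + Fraction(79 * s, 80) * (x - 1)
    return erased / total  # simplifies to x / (80x - 79)

for x in range(2, 1000):
    frac = worst_erased_fraction(s=80, x=x)
    assert frac == Fraction(x, 80 * x - 79)
    assert frac <= Fraction(1, 40)
```

The maximum is attained at $x = 2$, where the fraction is $2/81 < 1/40$.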

A.2 Proof of Lemma 6

Proof of Lemma 6.

Let $v$ and $u$ be alternative nodes, and let $w \ne v,u$. We claim that $dist(v,w) \le 2f+1$ if and only if $dist(u,w) \le 2f+1$. This implies that if $v \in S$ then $(S \setminus \{v\}) \cup \{u\}$ is a $(2f+2,2f+1)$-ruling set. In particular, this means that apart from $v$, the node $u$ does not have a node $w \in S$ within distance at most $2f+1$ from it. Therefore, $ID(u)$ cannot be smaller than $ID(v)$, since otherwise the greedy algorithm would reach $u$ before $v$ when iterating over the nodes and would have inserted $u$ into $S$.

To prove that $dist(v,w) \le 2f+1$ if and only if $dist(u,w) \le 2f+1$, we assume that $dist(v,w) \le 2f+1$ and prove that $dist(u,w) \le 2f+1$. The proof for the other direction is symmetric. Assume towards a contradiction that this is not the case, so $dist(u,w) \ge 2f+2$. Thus, using the triangle inequality, we have

$2f+2 \le dist(u,w) \le dist(u,v) + dist(v,w) \le f + dist(v,w),$

where the last inequality follows from the definition of $u,v$ as alternative nodes. Thus, $dist(v,w) \ge f+2$. Assume towards a contradiction that there is a node $t$ on $P_{v,w}$ such that $dist(v,t) = dist(u,t)$. Since $t$ is on $P_{v,w}$, we have $|P_{v,w}| = |P_{v,t}| + |P_{t,w}|$. Therefore,

$dist(u,w) \le |P_{u,t}| + |P_{t,w}|$
$\quad = |P_{u,t}| + (|P_{v,w}| - |P_{v,t}|)$
$\quad = dist(u,t) + (dist(v,w) - dist(v,t))$
$\quad = dist(v,t) + (dist(v,w) - dist(v,t))$
$\quad = dist(v,w)$
$\quad \le 2f+1.$

This contradicts the assumption that $dist(u,w) \ge 2f+2$. Thus, every node $t$ on $P_{v,w}$ satisfies $dist(u,t) \ne dist(v,t)$.

Because $dist(v,w) \ge f+2$, there are at least $f+1$ nodes $q$ on $P_{v,w}$ such that $dist(q,u) \ne dist(q,v)$, and at least $f-1$ of them are at distance at most $f+1$ from $v$. Thus, $u$ is not an alternative node for $v$, contradicting the assumption. Therefore, $dist(u,w) \le 2f+1$.
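The swap argument can be sanity-checked by brute force on small graphs. Below is a simple Python verifier for the $(2f+2,2f+1)$-ruling-set property, assuming the standard definition (nodes of $S$ pairwise at distance at least $2f+2$, and every node within distance $2f+1$ of $S$); the graph representation and names are ours. Re-running the check after replacing $v$ by an alternative node $u$ is exactly the swap that Lemma 6 justifies.

```python
from collections import deque

def bfs_dist(adj, src):
    # Single-source distances in an unweighted graph given as adjacency lists.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def is_ruling_set(adj, S, f):
    dists = {v: bfs_dist(adj, v) for v in S}
    inf = float("inf")
    # Independence: rulers are pairwise at distance at least 2f + 2.
    if any(u != v and dists[v].get(u, inf) <= 2 * f + 1 for v in S for u in S):
        return False
    # Domination: every node has a ruler within distance 2f + 1.
    return all(any(dists[v].get(w, inf) <= 2 * f + 1 for v in S) for w in adj)

# Tiny example: a path 0-1-...-6 with f = 1; {0, 5} is a (4, 3)-ruling set.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(is_ruling_set(path, {0, 5}, f=1))  # True
```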

A.3 Proof of Theorem 8

Proof of Theorem 8.

Since $S$ is a $(2f+2,2f+1)$-ruling set, every node $u \in V$ has a node $v \in S$ within distance at most $2f+1$ from it. Thus, $u$ belongs to exactly one tree: in the BFS of Step 3 of the algorithm, the node $u$ chooses exactly one node to be its leader. Let $T_v$ be the tree rooted at $v$.

The BFS tree $T_v$ of every node $v \in S$ satisfies $|T_v| \ge f+1$.

For every node $v \in S$, there are at least $f+1$ nodes within distance $f$ from it (including $v$ itself), since the graph is connected. These nodes cannot be within distance $f$ from any other node $v' \in S$, as otherwise two nodes of $S$ would be within distance $2f < 2f+2$ of each other, and thus they join the BFS tree $T_v$. This proves that $|T_v| \ge f+1$ for all trees. In particular, there exists a partition of the nodes in $T_v$ into groups of size at least $f+1$.

The algorithm obtains a partitioning of the nodes in every tree into groups with sizes ranging from $f+1$ to $3f+1$.

Fix $v \in S$ and consider only nodes in $T_v$. First, we prove that all groups created until the end of Step 3 are of size ranging from $f+1$ to $2f+1$. Let $u \in T_v$. In Step 3, the node $u$ runs the procedure $\mathtt{ComputeGroups}(u)$. During the procedure, when creating a group, the node $u$ starts with the first child in $order(u)$ and adds children according to $order(u)$ until the counter $count$ of their $remain$ values is at least $f+1$. For a group that starts with child $w_a$ and ends with child $w_b$, in Step 3 the node $u$ sends this group's identifier to every child $w_i$ with $a \le i \le b$. Then, each $w_i$ sends in $\mathtt{RelayGroups}(w_i)$ in Step 3 this group identifier to all of its children to which it has not yet assigned a group during the procedure $\mathtt{ComputeGroups}(w_i)$, and its children relay this identifier in the same manner. Thus, the value of $count$ is the size of the group, and so this size is at least $f+1$.

Notice that if $count$ is at most $f$, then when adding $remain(w) \le f$ of some child $w$, the updated $count$ is at most $f+f = 2f$ (any $remain$ value is always at most $f$, as otherwise $w$ would itself form a group from these at least $f+1$ nodes). A special case is when a root of a subtree $u$ adds itself to a group it created, in which case $count$ is at most $2f+1$. Therefore, since the size of a group is the value of $count$, the size of a group ranges from $f+1$ to $2f+1$, as illustrated in the sketch below.
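The following is a minimal sketch of just this counting logic, not of the full distributed $\mathtt{ComputeGroups}$ procedure; the names are ours, and the "size" of a group here is the accumulated $count$ of $remain$ values it covers.

```python
def greedy_groups(remains, f):
    # Scan the children's remain values in order and close a group once the
    # accumulated count reaches at least f + 1. Since every remain value is
    # in [0, f], each closed group has size in [f + 1, 2f].
    groups, current, count = [], [], 0
    for child, r in enumerate(remains):
        assert 0 <= r <= f  # remain values never exceed f
        current.append(child)
        count += r
        if count >= f + 1:
            groups.append((current, count))
            current, count = [], 0
    # Leftover children whose remain sum is < f + 1 are handled by an
    # ancestor (or, at the root, by the steps handling the root remainder).
    return groups, current

# Example with f = 3: remain values of seven children.
groups, leftover = greedy_groups([2, 3, 0, 1, 3, 2, 1], f=3)
print(groups, leftover)  # [([0, 1], 5), ([2, 3, 4], 4)] [5, 6]
```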

This proves that all groups created until the end of Step 3 are of size ranging from $f+1$ to $2f+1$, but there could still be some values that $u$ needs to handle, whose sum is $remain(u)$. For every node $u \notin S$, an ancestor of $u$ handles these values. However, the algorithm differs for the root $v \in S$, as it has no ancestors. The steps handling the root remainder provide the root with a group ID to which it adds its remainder, so the size of that group may become at most $2f+1+f = 3f+1$. Thus, all the group sizes range from $f+1$ to $3f+1$.

Notice that since every node belongs to exactly one group, all the groups are disjoint.

For each edge $e \in E$, the number of subgraphs $G[Q_i] \cup H_i$ containing $e$ is at most 2.

An edge $(u,w)$ is inserted into $H_{group}(u)$ if $u$ sends the group identifier $group$ to $w$, and this happens only if $w$ sends $remain(w) > 0$ to $u = parent(w)$ in Step 3. There are two cases. The first is that $u$ assigns $w$ a group during the steps of creating groups. In this case, $u$ does not assign $w$ another group in this or later steps, and therefore the edge $(u,w)$ is inserted into only one set $H_{group}(u)$ of $u$. The same argument applies if the edge is inserted into $H_{group}(u)$ during the steps of relaying group IDs.

The second case is that $u$ receives $group$ from its parent during the steps of handling the root remainder. Then the root $v$ satisfies $remain(v) > 0$, and according to the algorithm, it assigns $remain(v)$ additional nodes to an existing group. Therefore, $(u,w)$ can be inserted into at most one additional group. Overall, there are at most two groups for which $u$ assigns $e = (u,w)$ to $H_{group}(u)$.

The diameter of the subgraph $G[Q_i] \cup H_i$ is at most $4f$.

Consider two cases, based on whether the group contains the root v or not.
Case 1: Let $Q$ be a group which does not contain the root $v$, and let $u$ be the node that builds $Q$ during $\mathtt{ComputeGroups}(u)$. Since the group identifier is the ID of the first child in the group, there can be only a single $u$ that creates a group with this ID, and only nodes in the subtree of $u$ can belong to it. Let $t \in Q$. In Step 3, $parent(t)$ sends $ID(Q)$ to $t$ and inserts the edge $(parent(t),t)$ into $H_{ID(Q)}(parent(t))$. Additionally, if $u \ne parent(t)$, then $parent(t)$ also receives $ID(Q)$ from its parent and sets $Group(parent(t)) = Q$ in Step 3 of procedure $\mathtt{RelayGroups}(parent(t))$. This process continues until we reach a node $w$ which is a child of $u$. Since $w$ also receives $ID(Q)$ from $u$ in Step 3, the node $u$ inserts the edge $(u,w)$ into $H_{ID(Q)}(u)$, and $w$ sets $Group(w) = Q$ in Step 3 of procedure $\mathtt{RelayGroups}(w)$. Thus, all the edges on the path from $u$ to $t$ belong to $H_{ID(Q)}$, and all the nodes on the path from $w$ to $t$ belong to $Q$, including $w$ and $t$. Since this holds for every node $t \in Q$, the subgraph $G[Q] \cup H_{ID(Q)}$ is connected. Moreover, the distance between $u$ and $t$ in the tree is bounded by $f$, as otherwise $w$ would have enough nodes to create a group of size $f+1$ which includes $t$, contradicting the fact that $t$ is in the group $Q$ created by $u$. Thus, the diameter of every such group is bounded by $2f$.
Case 2: Let $Q$ be the group which contains the root $v$. There are two possibilities. If $Groups(v) \ne \emptyset$, then the remainder of $v$ is assigned by $v$ itself to one of the groups it creates, and the same argument as in the previous case applies.

Otherwise, if $Groups(v) = \emptyset$, then in the steps handling the root remainder, the node $v$ assigns its remainder of nodes to a group that one of its descendants created. We prove that the distance of $v$ from the deepest node in $Q$ is bounded by $2f$. This implies that the diameter of $G[Q] \cup H_i$ is bounded by $4f$.

Let $t$ be the node which created the group $Q$ in its $\mathtt{ComputeGroups}(t)$ procedure. If $t \in Q$ before handling the root remainder, then $Group(parent(t)) = \emptyset$ at that point, since otherwise $v$ would receive $Group(parent(t))$ during Step 3 instead of $Group(t)$, and thus $v$ would not be in the group $Q$ that $t$ created. Thus, $parent(t)$ sets $Group(parent(t)) = ID(Q)$, and the same argument applies to every ancestor of $parent(t)$. This implies that the distance from $t$ to the root $v$ is at most $f$, since all of the nodes on that path join $Q$, and if the distance were larger then there would be enough nodes to create a group that is separate from $Q$. Moreover, the distance from $t$ to every node in its subtree that is in $Q$ is at most $f$, as otherwise the child of $t$ in that subtree would have $f+1$ nodes with which to create that group. Together, this means that the distance from $v$ to any node in $Q$ is at most $f+f = 2f$.

If $t \notin Q$ before handling the root remainder, then a similar argument applies: any child $w$ of $t$ which is in $Q$ has distance at most $f-1$ to any other node in its subtree that is in $Q$. The node $t$ itself is again within distance at most $f$ from $v$. Thus, the distance from $v$ to any node in $Q$ is at most $(f-1)+1+f = 2f$.
The above proves that the set of subgraphs $H_1,\ldots,H_K$ is a set of $(2,4f)$-shortcuts.

Round complexity:

Note that for every $v \in S$ and $u \in T_v \setminus \{v\}$, we have that $dist(u,v) \le 2f+1$. Thus, downcasting and upcasting $f$ messages in $T_v$ takes at most $O(f)$ rounds by standard pipelining. Since we only downcast or upcast a constant number of times, the algorithm completes in $O(f)$ rounds.

A.4 Proof of Claim 9

Proof of Claim 9.

Every node $u \in V$ holds a set $H_{group}(u)$ for every group $group$ that needs to communicate through $u$. This set contains edges to children of $u$. By Theorem 8, for each edge $e \in E$, the number of subgraphs $G[Q_i] \cup H_i$ containing $e$ is at most 2. Thus, every node $u$ can tell its children which edges belong to which group, so that each group can communicate over its own edges, and all groups can communicate over their edges in parallel. Additionally, the size of every group $Q_i$ is $O(f)$, so over every edge we need to pass $O(f)$ messages, each of size $Y$. Moreover, according to Theorem 8, for each group $i$, the diameter of the subgraph $G[Q_i] \cup H_i$ is at most $4f = O(f)$. Hence, by standard pipelining, every node can collect the $Y \cdot f$ bits of information from all nodes in its group within $O(f + (Yf)/\log n)$ rounds.
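For concreteness, the standard pipelining accounting used here can be written as a tiny calculation (names ours, constants dropped): sending $B$ bits over a path of length $d$ with messages of $\beta$ bits per edge per round takes $d + \lceil B/\beta \rceil - 1$ rounds, since the first chunk needs $d$ rounds and each remaining chunk arrives one round behind.

```python
import math

def pipeline_rounds(diameter, total_bits, bandwidth_bits):
    # First chunk traverses the subgraph in `diameter` rounds; each further
    # chunk of `bandwidth_bits` bits follows one round behind (pipelining).
    chunks = math.ceil(total_bits / bandwidth_bits)
    return diameter + chunks - 1

# Collecting Y * f bits in a group of diameter 4f with log(n)-bit messages:
f, Y, log_n = 10, 64, 20
print(pipeline_rounds(4 * f, Y * f, log_n))  # O(f + Y*f / log n)
```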