On Time-Optimal, Fault-Tolerant Algorithms for Connected Consensus Beyond Grade Two
Abstract
A common question in the asynchronous model is whether some given notion of agreement between processes is achievable. Usually, we formalise such agreement notions in the form of agreement problems. Some of these problems also receive the name of coordination primitives. Several fault-tolerant algorithms in asynchronous systems rely upon the use of different primitives as building blocks, such as adopt-commit, crusader agreement, or graded broadcast.
Recently, the connected consensus problem – a form of agreement over a specific family of graphs parametrised by a positive integer – was introduced. This problem unifies the three mentioned primitives while extending them for multi-valued inputs. Moreover, the problem is equipped with a security condition called binding, which limits the effect of malicious processes over the decision of correct parties. While fault-tolerant connected consensus algorithms for and are known, the existence of algorithmic solutions for any positive integer parameter remained an open question.
In this work, we introduce a pair of fault-tolerant algorithms for connected consensus when the parameter is any positive integer. We introduce a crash-resilient algorithm, which is optimal with respect to the maximum number of possible faulty processes. Our second algorithm is resilient to Byzantine failures; whose failure-resilience is optimal for a specific class of algorithms. Both algorithms satisfy the binding property and match the best known time complexities achieved for the and cases, further achieving time optimality for the general case in the crash-failure setting, and asymptotic time optimality in the Byzantine scenario.
Keywords and phrases:
Approximate Agreement, Binding, Connected ConsensusCopyright and License:
2012 ACM Subject Classification:
Theory of computation Distributed algorithmsEditors:
Andrei Arusoaie, Emanuel Onica, Michael Spear, and Sara Tucci-PiergiovanniSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
A central task in theoretical distributed computing is to enquire whether it is possible to achieve some agreement notion between processes that participate in a distributed system. Commonly, this task involves solving agreement problems – coordination tasks defined using a list of properties that establish how processes must decide over a set of valid values. Typically, one must fix a specific computation model with a well-defined communication mechanism between processes to formally describe the setting of such problems. Taking this into consideration, a classical model for studying agreement problems is the asynchronous model. In this model, processes communicate point-to-point via message passing over a complete network; also, there is a guarantee of the arrival of messages, yet there is no bound on their delay. In particular, we are interested in the solvability of agreement problems in such a model when we consider processes that exhibit faulty behaviour, such as crash failures or Byzantine faults.
A cornerstone result in the area is precisely the impossibility of deterministically reaching consensus in the asynchronous model if we assume the presence of even a single faulty process [18]. In order to circumvent this impossibility, different algorithmic approaches are often explored to achieve termination for consensus protocols under particular circumstances, which combine classic mechanisms such as randomisation or failure detectors with coordination primitives, which are precisely agreement problems with weaker agreement notions than consensus. Some of such primitives include adopt-commit [20], crusader agreement [13], and graded broadcast [16].
After some consideration, it is clear that the latter primitives are similar in nature: all of them depend upon the agreement on either the same or adjacent vertices of a specific kind of graph, thus, intuition suggests an intrinsic relationship between them. Recently, Attiya and Welch [11] presented an agreement problem called connected consensus. This problem effectively unifies the resolution of the three previous primitives by reaching approximate agreement on a particular class of graphs parametrised by a positive integer which is called the refinement parameter. In [15], is also referred to as the grade of the graph; hence, an alternate name for the problem is graded consensus. Moreover, the authors further generalise such primitives by extending the problem in order to accept multi-valued inputs. Furthermore, this recent problem definition also incorporates the binding property previously defined for the binary crusader agreement problem, which results helpful at limiting the decision power of malicious processes over the protocol’s final outcome.
There exist agreement problems, such as approximate graph agreement, that generalise connected consensus. Yet their known algorithms, for example [5], though capable of tolerating Byzantine failures, do not satisfy the binding property while also requiring exponential local computation. Therefore, the connected consensus problem remains a relevant research topic when we both need agreement on special-case graphs, and adversarial decision-power limitation by means of the binding property; not to mention its usefulness in works such as [15].
It is worth mentioning that, in the paper that introduced the connected consensus problem, Attiya and Welch also present algorithms for instances where the refinement parameter is either or – one for crash faults and two for malicious failures – while preserving the binding property. Additionally, the authors prove that the crash-resilient algorithm and one Byzantine-tolerant algorithm are optimal with respect to the number of communication rounds between processes. However, it remained an open question whether it was possible to devise algorithms for connected consensus instances where , and if so, if it was possible to satisfy the binding property all the same.
Therefore, as our main contribution within this work, we present algorithms that solve connected consensus for any positive integer parameter in the presence of both crash and Byzantine failures, therefore answering the open question in the affirmative.
As a first result, we present an algorithm that is tolerant to crash failures, with being the total number of processes and the maximum number of faulty parties. This algorithm successfully solves connected consensus for any positive integer parameter. For this algorithm, we make use of an averaging approach, which allows us to quickly reduce the decision space of possible vertices that processes are able to output as a solution. We prove this algorithm optimal with respect to the maximum number of crashed processes it can tolerate; and we do so by using a fault-resilience lower bound introduced by Attiya and Welch in their seminal work, which we explain in more detail in the section dedicated to prior work. We also prove that the binding property holds in this algorithmic solution. This algorithm runs in rounds, hence matching the time complexity for of the previous algorithm for crash faults by Attiya and Welch, which was in turn, optimal according to the number of rounds.
Next, as a second and core contribution, we introduce a Byzantine-tolerant algorithm for connected consensus, which again manages to solve the problem for any instance with parameter . This algorithm tolerates up to Byzantine failures which, albeit suboptimal in the general case, is in fact optimal for a class of connected consensus algorithms that Attiya and Welch call uniform. For this class of algorithms, we refer to a proof presented by the authors of the connected consensus paper. This proof is discussed within the “previous work” section of this paper. Regarding this second algorithm, we take inspiration from an algorithm tool called witness technique; which originally Abraham, Amit, and Dolev [1] introduced to achieve a Byzantine-tolerant approximate agreement algorithm on the asynchronous setting.
By combining this approach with an averaging technique similar to the crash-tolerant algorithm, we are able to devise an algorithm that makes use of a reduced number of communication rounds. In fact, we are able to match the time complexity for of the fast Byzantine-tolerant algorithm proposed by Attiya and Welch; which in turn, is optimal for these two parameter scenarios.
| Failure model | Crash | Byzantine | |||
|---|---|---|---|---|---|
| Algorithm | Alg. 1 | [11, Alg. 1] | Alg 2 | [11, Alg. 2] | [11, Alg. 3] |
| Fault-tolerance | |||||
| Messages | |||||
| Time | 1 | 1 | 1 | 1 | 5 |
| Time | 2 | 2 | 2 | 2 | 7 |
| Time | - | - | - | ||
| Binding property | Yes | Yes | Yes | Yes | Yes |
For reference, Table 1 compares the algorithms introduces in this paper with prior work, from where it becomes clear that the algorithms introduced in this work match time-optimality from previous algorithms for , while achieving for the first time solutions for positive values of beyond grade . It is important to mention that both algorithms introduced in this paper work for the multi-valued input scenario; and that, as a key feature, preserve the binding property. In brief, our contributions are the following:
-
We introduce a connected consensus algorithm resilient to crash failures for multi-valued inputs. This first algorithm exhibits tolerance to at most faulty processes, which is optimal using a result discussed in the next section. Also, it uses one communication round for and two communication rounds for . Therefore, our Algorithm matches the time-complexity of the crash-fault algorithm introduced by Attiya and Welch for the and cases; which turns out to be optimal for these scenarios. Moreover, this algorithm achieves solutions for for the first time using a logarithmic number of rounds. We also show that this approach satisfies the binding property.
-
As a second contribution, we provide a Byzantine-resilient algorithm with fault tolerance of and logarithmic time complexity. As with the crash-tolerant algorithm, this algorithm effectively solves connected consensus for any positive parameter . In a similar fashion to the previous algorithm, our algorithm matches the time-complexity of the only known algorithm for malicious failures, also optimal for the scenarios. Regarding fault-tolerance, we rely on a preliminary result which we present in the subsequent section to argue that our approach attains the best possible resilience bound without sacrificing clarity in its implementation. Finally, we prove that the binding property holds for this second algorithm as well.
The remainder of this paper is structured as follows. In Section 4, we provide a review of the previous results known algorithms for connected results presented by Attiya and Welch, in addition, we will briefly describe the interesting witness technique, which served as inspiration for the development of the Byzantine-tolerant algorithm we present in this work. Section 2 covers the definition of the asynchronous model upon which our problem is defined. Next, Section 3 presents the formal definition of the connected consensus problem for multi-valued inputs; we define the binding property, and include several other key definitions and useful notation as well.
As the core discussion of our work, Sections 5 and 6 present the fault-tolerant connected consensus algorithms for crash and Byzantine faults respectively. Finally, we bring our work to a close in Section 8.
2 Model of computation
In this work, we consider a standard message-passing asynchronous model consisting of a set of processes. In this model, processes communicate by sending messages to one another through a complete communication network of reliable bi-directional channels. These channels are point-to-point, therefore every process is able to directly communicate with any other (including itself) within the network.
As we consider full asynchrony in our model, we assume that messages sent in the communication network have an unbounded, yet finite transit time between the instant a certain process sends a message and the moment the target process receives it.
We also consider two possible kinds of failures that processes in our model may present: crash and malicious failures. Crash failures occur when faulty parties stop taking any additional computing steps and, consequently, stop communicating with other processes in the network from a certain instant in time onwards. For this kind of failure, we assume that processes do not recover and stay silent during the remainder of the execution of any distributed algorithm. On the other hand, parties that exhibit malicious failures (also known as Byzantine failures) behave in an arbitrary manner that may be even aimed to corrupt the outcome of the algorithm for correct parties. We call any process that exhibits faults faulty; furthermore, we use the terms malicious or dishonest for a process if its faults are Byzantine. Additionally, we denote by the maximum number of parties that behave in a faulty manner in the execution of a protocol.
In our model, processes are given a particular input value at the start of their execution from a set of possible input values . Furthermore, as we are interested in the decision that correct processes reach during the execution of a distributed algorithm, we also consider a particular output set from which processes end up deciding a particular value which we refer to as the output or decision of the process. If some correct process has decided a value among the output set, we say that it has decided on a value; notice that this decision may occur at any point of the execution of the algorithm.
Finally, in order to evaluate the performance of algorithms that solve the connected consensus problem, we present both time and communication measures for complexity. Take into account that we analyse complexities for the worst-case scenario of any execution. For time complexity, we adopt the same definition for asynchronous message-passing systems that Attiya and Welch use in the previous paper [11]. It is important to note that the definition used in this paper is, in turn, adopted from [9]. While this time measure is adequate for a granular time complexity analysis, for simplicity, we analyse our algorithms in terms of “rounds”. This is a time measure used likewise in the previous paper, which is directly comparable “time complexity” measure. A proper definition of this measure appears in [2]. Throughout this paper, we use the terms rounds and communication rounds interchangeably.
For communication complexity, we simply consider the maximum number of messages sent by correct processes over all the possible executions of a connected-consensus solving algorithm.
3 The Connected Consensus Problem definition
Let be a set of well-ordered values, and consider a special value . Given a specific positive integer value which we call the refinement parameter (also grade, following [15]), we denote by the “spider” graph with central vertex , such that has exactly paths spanning from the central vertex, each with length and having vertices different than the central vertex, labelled through for . Notice that is indeed a special kind of tree, thus connected and acyclic, and that for any value , the vertex is a leaf.
Hence, is given by the vertex set
and the edge set
Moreover, consider a subset of input values and define , the minimal subtree of as the induced subtree that connects the set of leaves . It is important to note that when consists of a singleton , then the minimal subtree is a single leaf, namely . Furthermore, let be the set of leaves induced by the inputs of correct processes. We define as the correct minimal subtree of .
Finally, let be a pair of vertices of . We define the middle vertex of as if and and neither nor equal , if , and, without loss of generality, if and .
Thus, we define the connected consensus problem for ,, with each process receiving a particular input value from through the following properties:
- Termination.
-
Each correct process must decide on a vertex of .
- Validity.
-
The decision of each (correct) process must be a vertex within the correct minimal subtree of . It is important to note that, in particular, if all processes start with the same input (and thus is a singleton), then the vertex must be the decision for each (correct) process.
- Agreement.
-
The distance between the vertices labelled by the decisions of all (correct) processes is at most one.
Notice that the output set for the connected consensus problem for is exactly the set , but since there is only one input value associated with each vertex of such set, a value is uniquely determined from the decision of each process.
Alongside the previous requirements for the connected consensus problem, we may require yet another interesting safety condition called binding, as proposed before for the Crusader Agreement primitive [13]. Such property can be stated as follows:
- Binding.
-
After the first (correct) process decides a vertex , with , the decision of every other (correct) process must be on the same branch labelled by of the spider graph.
It is noteworthy that in any execution in which the first correct process decides any vertex but , then the binding condition follows from Agreement; moreover, it is not difficult to deduce that the same argument holds when .
4 Previous work
In the full version of the paper defining the connected consensus problem [10], the authors also establish lower bounds on both fault-resilience and time-complexity for connected consensus algorithms that tolerate either crash faults or malicious behaviour. Using these bounds as a reference, Attiya and Welch proposed the only multi-valued connected consensus algorithms with the binding property to date.
In this section, we first review and formally state these lower bounds, noting that that they also hold for the algorithms that we present as the main contribution of our paper. Then, we discuss the only known connected consensus algorithms, highlighting their time and message complexity, while examining both their strengths and limitations.
Finally, we provide a brief overview of the witness technique, a recently proposed method originally applied to the closely related approximate agreement problem. This discussion is included since the technique plays a central role in the Byzantine-tolerant algorithm that we propose, which solves multi-valued connected consensus with binding for any positive integer parameter .
4.1 Connected consensus lower bounds for fault-tolerance and time complexity
Following the results presented by Attiya and Welch in the full version of the connected consensus paper, we note that there is a fault-resilience lower bound for solving connected consensus in the presence of crash faults. The authors obtained this result using a standard partition argument over the set of processes of the distributed system. We formally state the result as follows:
Proposition 1 ([10, Appendix A]).
For any connected consensus algorithm for with processes, out of which at most may crash, then at least is required.
Additionally, there is a lower bound for the number of rounds required to achieve connected consensus when considering crash faults and under a particular fault-resilience assumption. For this bound, the authors employed a reduction from approximate agreement. We state this lower bound result as follows:
Proposition 2 ([10, Appendix B.1]).
For any connected consensus algorithm for with processes, out of which at most may crash, at least two communication rounds are necessary if .
Now, regarding Byzantine faults, we also find lower bounds for both the maximum number of processes that may behave maliciously and the number of communication rounds required when , obtained by means of analogous arguments to those of the crash-faults scenario. Such bounds are the following:
Proposition 3 ([10, Appendix A]).
For any connected consensus algorithm for with processes, out of which at most are malicious, then at least is required.
Proposition 4 ([10, Appendix B.1]).
For any connected consensus algorithm for with processes, out of which at most may crash, at least two communication rounds are necessary if .
Finally, we present a more subtle result obtained by Attiya and Welch related to the binding property. Consider a connected consensus algorithm such that each process obtains values, at most one, from each process, and then decides based only on the multiset of values it has received; moreover the algorithm is such that for any pair of processes that obtain values corresponding to some process , then the values received from are the same, and if is correct, then the received values correspond to ’s input. Therefore, we say this algorithm is uniform. With this in mind, the result is the following:
Proposition 5 ([10, Appendix G, Theorem 19]).
For any uniform connected consensus algorithm for with processes, out of which at most are malicious such that satisfies the binding Property, then .
4.2 Previous connected consensus algorithms
As a first algorithm, Attiya and Welch introduced a crash-tolerant algorithm that works when is either or , while requiring . The key idea behind their solution is to perform as many communication rounds as the value of R: the algorithm first executes a communication round to decide on a branch of the spider graph; next, it uses a second round to decide on a particular vertex that lies within the selected branch, beginning at the central vertex.
Using a natural generalisation of this approach, we can obtain a safe algorithm with the same resilience. The generalisation works as follows: the first communication round again fixes a branch of the spider graph. Then, in subsequent rounds, the processes start at the central vertex and progressively refine the outcome, moving one vertex at a time towards the leaf corresponding to the chosen branch. Clearly, a drawback of this extension is its time complexity, since it would require a linear number of rounds, namely, as many rounds as the parameter. We state the formal result regarding this crash-tolerant algorithm proposed by Attiya and Welch as follows:
Proposition 6 ([11, Theorem 3]).
Consider a standard asynchronous model consisting of processes, out of which at most may crash. If , then Algorithm from [11] solves connected consensus with binding for in rounds, sending messages.
Furthermore, the authors present a pair of Byzantine-tolerant connected consensus algorithms for . The following are the formal results related to these algorithms, respectively:
Proposition 7 ([11, Theorem 4]).
Consider a standard asynchronous model consisting of processes, out of which at most are Byzantine. If , then Algorithm from [11] solves connected consensus with binding for in rounds, sending messages.
Proposition 8 ([11, Theorem 5]).
Consider a standard asynchronous model consisting of processes, out of which at most are Byzantine. If , then Algorithm from [11] solves connected consensus with binding for in rounds respectively, sending messages.
From the results in Section 4.1, it is clear that Attiya and Welch’s Algorithm is both optimal in time and fault tolerance. On the other hand, Byzantine-tolerant Algorithm is fault-optimal, whereas Byzantine-tolerant Algorithm is time-optimal. Lastly, an important observation is that all three algorithms assume ; hence, algorithmic solutions for remained unattempted in the original paper. In the following two sections, we address this gap by presenting new algorithms that extend the results to the general case .
4.3 The witness technique and resilience-optimal approximate agreement
As discussed in the introduction, several agreement problems have been introduced due to the impossibility of achieving consensus in asynchronous systems. One such problem first proposed by Dolev et. al [14] is approximate agreement, which arises from the relaxation of the consensus agreement property. The goal of this problem is for a set of processes receiving inputs from a given space to eventually decide values that lie in the convex hull of the inputs of correct processes. Additionally, any algorithm that solves approximate agreement must ensure that any two decided values differ in less than a fixed parameter. This parameter serves as an upper bound for the distance between any two decided values. As stated before, we can define the approximate agreement problem for through the following list of properties. This list is analogous to consensus, except that the agreement property is relaxed:
- Termination.
-
Each nonfaulty process decides a value.
- Agreement.
-
If processes decide values , then for all nonfaulty processors , (no two correct processes decisions differ more than ).
- Validity.
-
Any decided value from honest parties must be within the convex hull of the initial values of honest processes.
A key result established by Fischer, Lynch, and Merritt [17] is that fewer than processes may exhibit malicious behaviour to solve approximate agreement in the asynchronous model.
Since the introduction of approximate agreement, several fault-tolerant algorithms have been proposed. Among these, an interesting solution by Abraham, Amit, and Dolev [1], which matches the mentioned resilience lower bound, employs a novel idea known as the witness technique. In this approach, processes exchange their inputs using reliable broadcast, which prevents malicious parties from sending different values to different processes. Additionally, each process waits to receive a set of process identifiers from each other process. Next, if at some point the set of processes of for which it has received input values so far fully contains the set of processes reported by some process , then marks as a witness.
This technique is the core mechanism behind the gather coordination primitive, alongside applications such as asynchronous key generation [3], and a consensus protocol that relies on randomisation [9]. Finally, it is noteworthy that our Byzantine-tolerant algorithm in Section 6 draws heavy inspiration from the witness technique. Within the latter algorithm, this technique plays a central role in its ability to mitigate the effects of values proposed by malicious parties.
5 Crash-Tolerant Algorithm
Consider an asynchronous system with processes, out of which may crash. Recall that, following Proposition 1, is required to solve connected consensus for any instance where . Taking this into consideration, we introduce in this section a connected consensus algorithm for processes that works for any positive integer refinement parameter using logarithmic time, more specifically, it uses communication rounds, thus matching the optimal complexity for as detailed in Proposition 2.
The central idea behind this algorithm is to note that the fault-resilience bound nicely outlines a fixed number of different scenarios in which the input values of processes may be categorised in any instance of the connected consensus problem. More precisely, it turns out that there exist only three viable scenarios for any input set (Appendix A, Corollary 17). Using this result as a starting point, we devise an algorithm whose core property is the following: at the end of each round we preserve a restricted number of scenarios for the values that processes hold as their current value. We achieve this by using an averaging approach, in which processes exchange values, and compute values that lie in the average of the set of received values. In a nutshell, we initialise the current decision as a leaf on the spider graph, and we then update it as new information regarding the topology of the network is gathered.
The algorithm proceeds as follows. At the beginning of the algorithm, each process with input starts with a vertex variable, initially pointing to the leaf vertex . It is important to notice that this approach differs from the one used in the previous crash-tolerant algorithm from [11]; since we begin at leaf vertices and then work inwards, instead of first choosing a branch and then work outwards.
Afterwards, during each round labelled by , every process sends its current vertex value in a message tagged with the Roundr flag. Next, each process waits for exactly messages for the current round, and stores the corresponding vertex values in a set. Then, the process computes a new vertex value for its vertex variable, based in the number of different values stored in the set. For any process, one out of three possible vertices is assigned to its variable. These vertices are either:
-
A vertex if is a singleton .
-
A middle vertex if consists of exactly one pair of vertices.
-
The special central vertex if consists of more than two vertices.
As the core result behind the behaviour of the algorithm, we prove an important invariant: that at the end of any round, there exist only two possible vertices that figure as values of the variable of any process. Finally, after exactly rounds, each process decides the vertex value stored in its variable during the final round. By stating the previous invariant as an extended result (Appendix A, Lemma 21); we show that, at the end of the algorithm, all decided vertices lie at distance at most one in the spider graph. Hence, the agreement property holds.
As a subtle (yet important) result, we prove that for the single-input-value scenario, all processes decide the same vertex of the spider graph (Appendix A, Lemma 19). Therefore, the validity property also holds. Moreover, since the number of rounds does not depend on the input of the algorithm, termination holds.
From the previous arguments, we now present the main result regarding Algorithm 1. The complete proof of this result appears in Appendix A.
Theorem 9.
If , then Algorithm 1 solves connected consensus for with processes, up to of which may crash; using rounds, and sending messages.
As we mentioned in the introduction, a main reason behind proposing ad-hoc algorithms for connected consensus is that achieving the binding property is not trivial in algorithms for more general families of graphs. Thus, as a fundamental result, we prove that this algorithm also satisfies such property. We state this result as follows:
Theorem 10.
Algorithm 1 satisfies the binding condition for .
The proof behind this result is also addressed in Appendix A. The intuition behind this proof is simple: we show that at the end of the first round, the variable of any process can only point to one vertex out of a fixed pair of vertices which is solely determined by the inputs. Therefore, the corresponding branch of the spider graph whose ends are such vertices is already determined at the end of the first round.
6 Byzantine-Tolerant Algorithm
In this section, we outline the description of a connected consensus algorithm resilient to Byzantine failures. We take inspiration from the Attiya and Welch Byzantine-resilient algorithm with the same fault-tolerance discussed in Section 4.2, as well from the witness technique presented by Abraham, Amit, and Dolev [1], which we already discussed in the Previous Work section.
The idea of the algorithm is similar to the one used for the crash-tolerant algorithm introduced in the previous section; with the difference that we incorporate an additional mechanism to prevent malicious processes from corrupting the decision of correct processes. For this end, we rely on the use of the witness technique in each but the second round, alongside the convenient application of the reliable broadcast primitive, for which we assume a given implementation such as the one from Bracha [12].
In order to modularise the algorithm and to make the use of the two mentioned tools clear, we first define a subroutine called WitnessCollect, parametrised with a round number .
This subroutine receives as additional parameters a value which will be sent to other processes, a multiset where values coming from other processes will be stored, and an integer set that will store process identifiers in order to maintain small message sizes. The subroutine proceeds as follows for round : Firstly, each process sends the value alongside their identifier within a message labelled with using reliable broadcast. Next, after receiving such messages (at most one per process) each process sends the set of identifiers of processes corresponding to received messages together with their identifier under the label via reliable broadcast, while it keeps receiving messages and storing their respective senders in the integer set. Afterwards, each process waits to receive messages, reporting sets with process identifiers. If, at some given moment, a set of processes received from some process is entirely contained in the set of processes of process , then marks as a witness and adds it to a set of witnesses. Finally, once a process has successfully collected at least witnesses, then it removes the largest and smallest values from and stops adding further values to it.
Once we have outlined the signature of the WitnessCollect subroutine, we describe in detail the behaviour of the main algorithm.
At the beginning of the algorithm (as in the crash-tolerant approach) processes with input are assumed to “point” to the corresponding leaf vertex through a vertex variable. Next, the number of rounds to perform is determined by the refinement parameter , each communication round being labelled with an integer . Upon the start of each round labelled with , a multiset and a set are defined. Next, the algorithm performs a different sequence of steps depending on the current round counter.
If the first round is executed (namely, if ), each process calls the WitnessCollect subroutine with the first entry of their vertex variable as the value argument. In other words, each process communicates its input value with all other processes, and waits to learn theirs. Subsequently, each process gathers input values from other processes and removes outlier values upon collecting at least witnesses. After removing outliers, each process sets its vertex variable to the leaf corresponding to value branch of the spider graph if the remaining values in the multiset are all copies of some value . Otherwise, if there is more than one distinct remaining value in , the process sets vertex to the central vertex . As we did for the crash-tolerant algorithm, we also show that there is a restricted number of different vertices that any correct process stores in the vertex variable at the end of the first round (Appendix B, Lemma 38). More precisely, those vertices are either or , with being some fixed input value coming from correct processes. In fact, this result prevents Byzantine processes from proposing invalid vertices in subsequent rounds, by fixing a branch at the end of the first round.
Next, during the second round, processes exchange the value of the branch computed during the first round: either , or . After receiving Branch messages, and storing the corresponding values, processes recompute their vertex value if necessary, according to the multiplicity of the values gathered in the multiset. Therefore, each process finishes the round with one out of three possible vertices in their vertex variable: either , , or . The reason behind this idea is to ensure that the branch values across all processes are consistent at the end of the round. Namely, all process point to vertices different than the central vertex within the same branch . Otherwise, if some process points to the central vertex, the only other vertex that any other process may point to is . This way, in subsequent rounds, processes avoid inconsistent values in their vertex variable, such as , which may appear if we use an averaging approach in each round. We use a result to show that these are the only two possible scenarios (Appendix B, Lemma 42).
Finally, in the remaining rounds we use the same approach as in the first round, with the only difference that processes aim to agree in the grade of their current vertex. As an important remark, we ensure that if the scenario with at least one process pointing to happens, then each subsequent round preserves the property of consistent vertices. Otherwise, each round contributes in reducing the distance between vertices lying in the same branch. This approach is key to making grade values converge rapidly.
Using the previous arguments, we manage to prove that in any scenario, the vertices decided by correct processes converge to distance at most one. As for proving validity and termination, we rely on analogous arguments to those of Algorithm 1.
Hence, as our central correctness result for the Byzantine-tolerant Algorithm 1 we have the following Theorem, whose complete proof is presented in Appendix B.
Theorem 11.
If , then Algorithm 2 solves connected consensus for with processes, up to of which may be Byzantine; using rounds, and sending messages.
Finally, to conclude this section; in the same manner as with Algorithm 1, we prove that our Byzantine-tolerant algorithm also satisfies the binding property. The full proof is presented in Appendix B, and the formal result is as follows:
Theorem 12.
Algorithm 2 satisfies the binding condition for .
Again, the proof follows a simple argument that we previously discussed while presenting the algorithm. Recall that at the end of the first round, the algorithm fixes a particular branch, namely, the one induced by the only two possible vertices , and . As we mentioned, this particular result prevents malicious process from proposing invalid branches for the remainder of the algorithm, following the removal of outliers within the WitnessCollect subroutine.
7 Time and message complexities
Now, we analyse both the time and message complexities of the algorithms introduced in the previous sections. First, we do it so for the crash-tolerant algorithm, and then, we do it again for the Byzantine-tolerant one.
Before we move on to the discussion of such complexities; it is worth noting that the prior work by Attiya and Welch already introduced some useful time lower-bounds for connected-consensus algorithms when in the full version of the paper. We can summarise those results through the following remark:
Remark 13 ([10, Appendix B]).
Two rounds are necessary for solving connected consensus when in the presence of crash failures if . The same holds for malicious failures if .
7.1 Complexities for Algorithm 1
Beginning with time-complexity analysis: Since at least basic communication between processes is required, observe that at least one communication round is needed; hence, our first algorithm is optimal for . Following Remark 13, observe it is also optimal for .
Now for , we recall that, according to our previous description of the algorithm, a branch of the spider graph has already been fixed by the second round. Thus, the algorithm only needs to secure the agreement property in subsequent rounds. Observe that this problem is exactly the problem of approximate agreement on a subdivided path, on a path of length .
Therefore, in order to prove round optimality, we refer to a previous result by Attiya, et. al. [8] for asynchronous approximate agreement on a subdivided path. Observe that this problem is analogous to the one we just described, if we consider a path subdivided in segments. We restate their result as follows:
Proposition 14 ([8, Proposition 17]).
For every , agreement cannot be solved by processes in less than rounds.
Therefore we get the following result, considering a discretization of the interval:
Theorem 15.
For every path of length , approximate agreement on an edge of the path cannot be solved by processes in less than rounds.
Thus, using the previous result from the reduction from approximate agreement, we show that Algorithm 1 is time-optimal for .
Now, onto the message-complexity of the algorithm, it is clear that it corresponds to , since at any round , each process sends a message labelled with to any other process.
7.2 Complexities for Algorithm 2
For our second Algorithm, observe that if we observe the description of the Algorithm alone, without considering the WitnessCollect subroutine, the same arguments hold for both message and time complexities. Thus, we argue that this algorithm is still asymptotically time-optimal in terms of rounds, given the subroutine only implies a constant round overhead. Hence, the asymptotic complexity for the general case remains just the same.
Finally, with regards to message-complexity, at any round, we only consider the overhead of using some reliable broadcast implementation. Therefore, when using an implementation such as Bracha’s, we obtain the same complexity [19].
8 Conclusions
In this work, we successfully obtained connected consensus algorithms that work for cases where the refinement parameter is any positive integer, thus filling the gap left open by the algorithms presented in the original paper for scenarios where . We achieve this both for crash and Byzantine failures, while being driven by the goal of achieving asymptotic time-optimality in terms of communication rounds. Furthermore, in all presented algorithms we achieved the much-desired binding property, which is clearly deduced by the modular and clear nature of the algorithms. Our crash-tolerant algorithm achieves both time and resilience optimality, while using an approach that improves the -round-dependant approach used by Attiya, Welch. Moreover, although this algorithm works for any positive , it is self-contained in terms of employed primitives, since it relies only on basic message deliveries.
As the main contribution of this work, we provide an affirmative result concerning the possibility of achieving connected consensus in the presence of malicious faults for via our Byzantine-tolerant algorithm, which achieves round optimality for the cases as prior algorithms did, while doing it so for . Furthermore, although it requires , we deduce following the discussed result presented by Attiya and Welch that this resilience is needed given the uniform nature of the algorithm. We consider that the tradeoff between time and resilience complexity employed by our approach is justifiable in the light of the clarity and modularity achieved by our algorithm (cf. [11, Algorithm 3], which achieves fault-optimality through a significantly more complex code than the fault-suboptimal [11, Algorithm 2]). It is worth noting that concurrent work by Attiya, Flam, and Welch, presented as a DISC Brief Announcement [7] (full version [6]) also achieves a Byzantine-tolerant connected consensus algorithm for any through a reduction from the gather primitive.
As this work addresses the question of algorithmic solutions for connected consensus in any possible parametrised scenario where time-optimality is prioritised, an interesting further research direction is to inquire whether achieving fault-optimality in the presence of malicious faults justifies the intuitive required tradeoff with time-optimality. Another development is the possibility of translating the presented algorithms to models that differ from the one we used throughout this work either in communication mechanism (e.g. shared memory), timing (e.g. eventual synchrony), connectivity, or the nature of allowed faults. Finally, a possible future work is clearly the one related to find possible applications of connected consensus in different problems that benefit from the intuitive semantic behind the parameter, this is, a parametrisation of the degree of certainty (or preference) that a given process has over a proposed value. This may well include practical applications such as voting systems that quantify the trust/distrust of the participating agents, or theoretical ones that benefits from the use of variants of the commonly used crusader agreement primitive, such as [4].
References
- [1] Ittai Abraham, Yonatan Amit, and Danny Dolev. Optimal resilience asynchronous approximate agreement. In Proceedings of the 8th International Conference on Principles of Distributed Systems, OPODIS’04, pages 229–239, Berlin, Heidelberg, 2004. Springer-Verlag. doi:10.1007/11516798_17.
- [2] Ittai Abraham, Naama Ben-David, and Sravya Yandamuri. Efficient and adaptively secure asynchronous binary agreement via binding crusader agreement. In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, PODC’22, pages 381–391, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3519270.3538426.
- [3] Ittai Abraham, Philipp Jovanovic, Mary Maller, Sarah Meiklejohn, Gilad Stern, and Alin Tomescu. Reaching consensus for asynchronous distributed key generation. In Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, PODC’21, pages 363–373, New York, NY, USA, 2021. Association for Computing Machinery. doi:10.1145/3465084.3467914.
- [4] Yehuda Afek, James Aspnes, Edo Cohen, and Danny Vainstein. Brief announcement: Object oriented consensus. In Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC ’17, pages 367–369, New York, NY, USA, 2017. Association for Computing Machinery. doi:10.1145/3087801.3087867.
- [5] Dan Alistarh, Faith Ellen, and Joel Rybicki. Wait-free approximate agreement on graphs. Theoretical Computer Science, 948:113733, 2023. doi:10.1016/j.tcs.2023.113733.
- [6] Hagit Attiya, Itay Flam, and Jennifer L. Welch. Beyond canonical rounds: Communication abstractions for optimal byzantine resilience, 2025. doi:10.48550/arXiv.2510.04310.
- [7] Hagit Attiya, Itay Flam, and Jennifer L. Welch. Brief Announcement: Communication Patterns for Optimal Resilience. In Dariusz R. Kowalski, editor, 39th International Symposium on Distributed Computing (DISC 2025), volume 356 of Leibniz International Proceedings in Informatics (LIPIcs), pages 46:1–46:7, Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2025.46.
- [8] Hagit Attiya, Pierre Fraigniaud, Ami Paz, and Sergio Rajsbaum. One step forward, one step back: Flp-style proofs and the round-reduction technique for colorless tasks. In Rotem Oshman, editor, 37th International Symposium on Distributed Computing, DISC 2023, October 10-12, 2023, L’Aquila, Italy, volume 281 of LIPIcs, pages 4:1–4:23. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.DISC.2023.4.
- [9] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics. John Wiley & Sons, Inc., Hoboken, NJ, USA, 2004.
- [10] Hagit Attiya and Jennifer L. Welch. Multi-valued connected consensus: A new perspective on crusader agreement and adopt-commit, 2023. doi:10.48550/arXiv.2308.04646.
- [11] Hagit Attiya and Jennifer L. Welch. Multi-Valued Connected Consensus: A New Perspective on Crusader Agreement and Adopt-Commit. In Alysson Bessani, Xavier Défago, Junya Nakamura, Koichi Wada, and Yukiko Yamauchi, editors, 27th International Conference on Principles of Distributed Systems (OPODIS 2023), volume 286 of Leibniz International Proceedings in Informatics (LIPIcs), pages 6:1–6:23, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2023.6.
- [12] Gabriel Bracha. Asynchronous byzantine agreement protocols. Inf. Comput., 75(2):130–143, November 1987. doi:10.1016/0890-5401(87)90054-X.
- [13] Danny Dolev. The byzantine generals strike again. Journal of Algorithms, 3(1):14–30, 1982. doi:10.1016/0196-6774(82)90004-9.
- [14] Danny Dolev, Nancy A. Lynch, Shlomit S. Pinter, Eugene W. Stark, and William E. Weihl. Reaching approximate agreement in the presence of faults. J. ACM, 33(3):499–516, May 1986. doi:10.1145/5925.5931.
- [15] Mose Mizrahi Erbes and Roger Wattenhofer. Asynchronous approximate agreement with quadratic communication, 2025. doi:10.48550/arXiv.2408.05495.
- [16] Paul Feldman and Silvio Micali. Optimal algorithms for byzantine agreement. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC ’88, pages 148–161, New York, NY, USA, 1988. Association for Computing Machinery. doi:10.1145/62212.62225.
- [17] Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. In Proceedings of the Fourth Annual ACM Symposium on Principles of Distributed Computing, PODC ’85, pages 59–70, New York, NY, USA, 1985. Association for Computing Machinery. doi:10.1145/323596.323602.
- [18] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985. doi:10.1145/3149.214121.
- [19] Thomas Locher. Byzantine Reliable Broadcast with Low Communication and Time Complexity. In Silvia Bonomi, Letterio Galletta, Etienne Rivière, and Valerio Schiavoni, editors, 28th International Conference on Principles of Distributed Systems (OPODIS 2024), volume 324 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:17, Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.OPODIS.2024.16.
- [20] Jiong Yang, Gil Neiger, and Eli Gafni. Structured derivations of consensus algorithms for failure detectors. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, PODC ’98, pages 297–306, New York, NY, USA, 1998. Association for Computing Machinery. doi:10.1145/277697.277755.
Appendix A Correctness proof for Algorithm 1
First, we prove a couple of results directly related to the resilience constraint for solving connected consensus in the asynchronous model while considering at most crash failures; then, starting from Lemma 18 we prove results directly related to the behaviour of the algorithm, leading to Theorem 9, which finally proves the correctness of the algorithm. Additionally, Theorem 10 proves that the binding property also holds for the algorithm.
Lemma 16.
If , for any input set , then there is at most one input value that appears at least times.
Proof.
Suppose a pair of values that both occur at least times. Let be the sets of processes such that have as input respectively. Since each process has a unique input value, it follows that . Given we assume that both are input for processes, then . Let be the set of all processes, and note that . Since are disjoint, it follows that , and as we assume , it follows:
Therefore:
But this cannot happen, since . Thus, necessarily .
Corollary 17.
If , the following are the only possible scenarios for any input set :
-
(a)
The input set consists of a singleton, i.e. .
-
(b)
There exists an input value that occurs at least times but less than times.
-
(c)
Any input value appears less than times.
Proof.
Let be the set of input values. If is a singleton, we get (a). Otherwise, from Lemma 16, there is at most one input value that appears at least times. If that is the case, we get (b), otherwise, we get (c).
Lemma 18.
At the end of the first round, at most two vertices of the spider graph are possible values for the vertex variable of any process.
Proof.
We analyse the three different scenarios for the input set from Corollary 17:
-
(a)
The input set consists of a singleton . Thus, in round any process receives at least messages equal to round1,). Given is the only input for any process, it follows that for all processes, thus, vertex is set to the vertex for every process.
-
(b)
There exists an input value that occurs at least times but less than times; therefore, the same number of processes necessarily begin with their vertex variable set to . Let be an arbitrary process; we have the following cases after receiving round1 messages:
-
If all the messages received by were for , then , with setting vertex to thereafter. From Corollary 1, is the only possible value that appears at least times, therefore there is a single possible vertex choice for the vertex variable.
-
If for some input values . Then, by line 9 of the algorithm, the middle vertex between is chosen as value for vertex. Since by initialization both vertices are leaves of the graph, the middle vertex has to be .
-
If contains at least three elements, it follows from line 11 that the vertex is chosen as value for vertex.
-
-
(c)
Any input value appears less than times. Therefore, the argument is analogous to the two latter cases for an arbitrary process after receiving round1 messages in scenario (b).
From the above, it follows that only one vertex is a possible value for the vertex variable in the unique-input scenario, the pair of vertices in scenario (b), and in scenario (c).
Lemma 19.
If there is a unique input value for any process, the algorithm ends with every correct process deciding the vertex .
Proof.
By induction on , we prove that every process has its vertex variable set to at the end of round , and thus, it ends deciding such vertex. Let . From the proof of the previous lemma, note that at the end of round , every correct process has its variable vertex set to .
Assume for . Therefore, in round every process sends roundr+1,). After receiving roundr+1 messages the same argument as the first round holds; hence, it is true that for every correct process , and so every process sets its vertex variable to .
Lemma 20.
At the beginning of the execution of any round of the algorithm, there is at most one vertex that appears at least times.
Proof.
Let be an arbitrary round of the algorithm and suppose a pair of vertices such that both occur at least times. Let be the sets of processes such that have in their vertex variables respectively. Since each process has a unique vertex value stored in its vertex variable, note that using a similar argument to the one on Lemma 16, necessarily . Hence, at most one vertex appears at least times at the beginning of round .
Lemma 21.
If there is more than an input value for any process, at the end of the algorithm, one of the following occurs:
-
All (correct) processes decide the same vertex .
-
All (correct) processes end deciding between one of two vertices at distance .
Proof.
Note from Corollary 17 that we are now dealing with the scenarios (b) and (c) from the same result; thus, from Lemma 18, at the end of round either the central vertex or a unique leaf vertex (with being the sole input value that appears at least times) are possible values for the vertex variable. Thus, we have the following cases at the end of round 1:
-
(a)
Every process has the leaf in its vertex variable. Since we only consider crash failures, no other message different than vertex2,) is received by any correct process in round . Thus, for every process, and is stored in the value vertex at the end of such line. Assume for ; therefore, the same argument holds for round and each process ends with the leaf in their vertex variable at the end of round , thus deciding at the end of the algorithm.
-
(b)
Every process has in its vertex variable. Note that since we consider crash failures only, the argument is analogous to the previous case, with every process deciding at the end of the rounds.
-
(c)
The values for the vertex variables of any process are either , or . Note and are the only possible vertices stored in the vertex variable of any process. Hence, we propose the following invariant:
At the end of round , either all processes have their vertex variable set to one unique vertex , or to one of two vertices at distance at most .
For , we explicitly are in the case that the vertex variables of any process are either the central vertex , or the leaf . Note that the distance between such vertices is exactly .
Assume the invariant holds for the round . For the round , if at the end of round , either all processes have their vertex variable set to one sole vertex , therefore the analysis is the same as in cases (a),(b), with each process storing the vertex at the end of round .
Otherwise, let be the two vertices that figure as the value stored in the vertex variable of any process. From Lemma 20, at the beginning of round , at most one of the mentioned vertices appears at least times. Then, we consider the following cases:
-
If both appear less than times, then in round every process sends either roundr+1,) or roundr+1,). After receiving roundr+1 messages, for lines 4,5 of the algorithm, necessarily for each process it is true that since neither , nor appear at least times. Thus, every process sets its vertex variable to the middle vertex between . Given the middle vertex is unique (if the distance between is odd) or a pair of vertices at distance at most (if the distance between i even), the invariant holds.
-
If one of the two vertices appear at least times, without loss of generality assume it corresponds to . Note that in round every process sends either roundr+1,) or roundr+1,). Consequently, each process waits afterwards for messages labelled for this round. Since appear at least times, for each process its vertex set may be either or . Therefore, the vertex variable may be assigned to either or the middle vertex between for every (correct) process. In addition, we notice that the distance between is , thence the distance between and the middle vertex between is .
Finally, since we showed the invariant to be true, after rounds, we end up with all processes either deciding the same vertex in the Spider Graph, or between one of two vertices at distance .
-
Theorem 9. [Restated, see original statement.]
If , then Algorithm 1 solves connected consensus for with processes, up to of which may crash; using rounds, and sending messages.
Proof.
We show that the three defining properties of connected consensus hold in any execution of Algorithm 1.
- Termination.
-
Since at most processes are faulty and all messages from correct processes arrive, then any round of the algorithm finishes. Note that, given is a tree, the middle vertex between any two vertices can always be determined.
- Validity.
-
Let be the set of input values for correct processes. If there is a unique input value for every process, then . It follows
From Lemma 19, at the end of the algorithm the vertex is decided for every correct process. We prove validity since .
Otherwise, there exists more than one input value for correct processes. From Lemma 21, a valid vertex of is decided as output for each correct process.
- Agreement.
Theorem 10. [Restated, see original statement.]
Algorithm 1 satisfies the binding condition for .
Proof.
From Lemma 18, at the end of round only two vertices (namely ) are possible values for the vertex variable of any process. From the proof of the mentioned Lemma, the vertex can only correspond to the leaf of the branch of a value that is repeated at least times and has the property of being the unique repeated such number of times. Therefore, there is only one branch related to a specific value with one end in and the other in over which any (correct) process can output a vertex value. Since for every it is true that , then for any execution of Algorithm 1 where , all processes reach the end of round 1, thus achieving the binding property.
Appendix B Correctness proof for Algorithm 2
Firstly, we prove some properties of the subroutine for round .
Lemma 22.
No correct process receives more than two labeled messages sent via Reliable Broadcast from the same process.
Proof.
If follows directly from the specification of the Reliable Broadcast primitive.
Corollary 23.
At most one value per process is added to in .
Proof.
From the fact that correct processes add exactly one value to per message reception.
Lemma 24.
For each correct process, before removals in line of WitnessCollect there is a one-to-one correspondence between the values in the multiset and the processes in the set.
Proof.
Notice from Corollary 23 that at most one value per process is added to in line of the subroutine. Also, observe from lines that each time a new process is added to , so is a value to ; therefore at least one value is added per process. Since neither processes nor values are removed from the corresponding structures after they added until the execution of line , it follows that for each process in , there is a corresponding value in and vice versa.
Lemma 25.
Let be a pair of correct processes receiving , messages respectively in line of . Then, .
Proof.
Let be correct processes receiving , respectively. If is Byzantine, it follows from the specification of the Reliable Broadcast primitive that , since both messages are labeled with . Otherwise, if is correct, from line of WitnessCollect trivially , since correct messages only send one message per round.
Lemma 26.
Each correct process eventually sends a message via Reliable Broadcast, with .
Proof.
From Lemma 22, each correct process receives at most one labeled message from any other process in WitnessCollect. Since there are at most faulty processes, each correct process eventually receives messages from at least different processes, thus storing the same number of processes in its set (Lemma 24); when that happens, given that messages arrive discretely, each correct process eventually sends a message labeled with containing with exactly elements (lines 5-7).
Lemma 27.
Let be a correct process that receives a message. If is correct, eventually the WitnessCollect line condition holds.
Proof.
Observe that given is correct, then . Next, note we have two scenarios: firstly, if notice that the condition of line trivially holds. On the other hand, if , there is at least one process such that . Since is correct, and it sent through a message, it means that it previously received a message from via Reliable Broadcast; therefore, from the Reliable Broadcast specification eventually will receive a message from , thus including it in its set. Since this happens inductively with any missing process from out of the processes in , eventually .
Lemma 28.
Eventually each correct process gathers at least witnesses, namely .
Proof.
Let be a correct process. Eventually all correct processes send messages via Reliable Broadcast including their sets of size (Lemma 26). It follows from the Reliable Broadcast specification that eventually messages from correct processes will arrive at . From Lemma 27, eventually the condition of line holds for each set received from a correct process , thus, each of the correct processes will eventually be included in the set of .
Corollary 29.
Eventually, each correct process executes line of .
Lemma 30.
For each correct process that passes WitnessCollect line verification, its multiset contains at least values before line execution.
Proof.
Assume, for sake of contradiction, that is a correct process that passes line verification and that . Since passes line verification, it happens that . Let be an arbitrary process, then according to the subroutine’s code was added to in line ; therefore, it must have occurred that at some point in time sent a message to via Reliable Broadcast containing , and that received it in line . Notice that if accepted ’s message, necessarily .
Observe that if was added to in line , it previously should have been the case that ; consequently, . It follows from Lemma 24 that , which is a contradiction.
Lemma 31.
For each correct process, there exist at least values in not coming from Byzantine processes after dropping outliers in line of WitnessCollect.
Proof.
Let be a correct process, therefore it eventually executes line (Corollary 29). Observe that contained at east values before dropping outliers (Lemma 30); therefore, after dropping the smallest and largest values, . Since we assume , it follows that , thus .
Corollary 32.
For each correct process, there exists at least one value in not coming from Byzantine processes after dropping outliers.
Proof.
Given there is a bijection between values in and processes in (Lemma 24).
Lemma 33.
Let be two process quorums such that , then .
Proof.
Let be such quorums. Notice that since there are processes in total, . Observe that , thus . Since , . It follows that .
Lemma 34.
Let be a pair of correct processes that gather witnesses; then, they share at least one correct witness.
Proof.
Let be as stated, and let be their respective sets of witnesses. From Lemma 33, . Given that , . Therefore, since there are at most Byzantine processes, share at least correct processes; in particular, they share at least one correct process.
Lemma 35.
Let be a pair of correct processes that gather witnesses; then, they share at least values in their multisets before any of them executes line of WitnessCollect.
Proof.
Let be a pair of correct processes with witnesses each, then they share at least one correct witness (Lemma 34). Let be such witness, and note from the subroutine description that since has as witness, it must have been added to its set only after verifies that , with being sent by .
Note that according to Line , necessarily . It follows that at least of the processes in are also in , and so values in are uniquely associated with the processes in sent by (Lemma 24). Notice the argument is analogous for process .
Given both received the same set from via Reliable Broadcast, it follows by transitivity that have at least values in common in their multisets.
Corollary 36.
After two correct processes execute line of , they have at least one common value in their multisets.
Proof.
Let be the multisets of correct processes respectively. After disarding the smallest and largest values from the multisets, at most values are removed from their intersection; given , it follows that .
Since , we have that . Thus,
Lemma 37.
Consider the multiset of a correct process after line verification and just before line execution. Let be respectively the minimum and maximum values stored in that come from correct processes. If after executing line there remains any value received from a Byzantine party, then .
Proof.
Considering that in any round at most values in come from Byzantine parties we have the following scenarios:
-
If each of the values coming from Byzantine parties are greater than any value coming from correct processes, then after the largest values are dropped from no value coming from Byzantine processes remains, so the property holds. Analogously if each of the values are less than any value added from correct processes.
-
If none of the Byzantine values lie within the range (since there are at most of such values) it follows that each one of them is deleted after outliers are dropped.
-
Finally, since the scenarios are exhaustive, if some value coming from some Byzantine party remains, then it lies within the interval.
Lemma 38.
After the first round, each correct process sets its vertex variable to one of only two possible vertices from the spider graph:
-
The cental vertex .
-
A leaf vertex , with being a value given as input to some correct process.
Proof.
From Corollary 29, eventually every correct process executes line of , removing outliers from , after which at least values remain. Therefore, notice that there exist only two scenarios concerning remaining values in :
-
If there are two or more different values, the else branch at line is executed; therefore, the vertex variable is set to .
-
Otherwise, there is only one (possibly repeated) value in ; thus, according to the Algorithm’s code, line is executed, and vertex is set to .
Now, we claim that for every pair of correct processes that set their vertex variables to and respectively, then . Suppose in contradiction processes set their vertex variables at the end of round to respectively, and that . Observe from Corollary 36 that the multisets of processes share at least one common value. Since sets it vertex variable to , it must have been the case that after removing outliers, consists only in instances of . Analogously for , its multiset consists only in instances. Now, observe from Corollary 36 that have at least one value in common in their multisets, a contradiction.
Finally, notice that since consists only in copies of , from Corollary 32 we get that must come from some correct process. Since every correct process passes its input value as argument to (line ), it follows that is an input value of some correct process.
Lemma 39.
Every correct process executing round of Algorithm 2 eventually gathers values in its multiset coming from Branch messages.
Proof.
Given there are at most Byzantine processes and each correct process receives only one message per process via Reliable Broadcast (Lemma 22).
Lemma 40.
Let be the two possible values for the vertex variable of correct processes at the end of the first round (Lemma 38). At the end of the second round, if the vertex variable of some correct process is set to , then no other correct process has its vertex variable set to .
Proof.
Let be a correct process such that at the end of the second round sets its vertex variable to . From the algorithm’s code, it follows that it executed line 22 branch, and that after gathering values in its multiset (Lemma 39) it stored fewer than times.
Since there are values in , then at least messages are for , and therefore at least of such messages come from correct processes. Now, observe there are exactly correct processes overall; therefore, since the branch variable of each process is either or , then at most correct processes have its branch variable set to .
Now, consider a correct process with branch variable set to , and notice it stores values in its multiset (Lemma 39). From such values at least come from correct processes, and note from the previous argument that at most out of them are copies of , hence at least the remaining values are for . Since we assume , if follows that , then stored at least times in , thus line is executed and sets is vertex variable to at the end of round .
Lemma 41.
Let be the two possible values for the vertex variable of correct processes at the end of the first round (Lemma 38). At the end of the second round, if the vertex variable of some correct process is set to , then the only other possible value for the vertex variable of any other correct process is either or .
Proof.
Let be a correct process such that at the end of the second round sets its vertex variable to . It follows from the Algorithm’s code that it executed line 16 branch, and that after gathering values in its multiset (Lemma 39) it stored fewer than times. Analogous to the proof of the previous Lemma, at most correct processes have its branch variable set to .
Notice that if a correct process has its branch variable set to at line , then according to the Algorithm, it sets its vertex variable to either or at the end of round . Therefore, let us consider a correct process with branch variable set to , and observe it stores values in its multiset. Using an argument analogous to that of the previous Lemma, we deduce that appears at least times in . Since we assume , it follows that ; therefore stored at least times in , thus line is executed and sets is vertex variable to at the end of round .
Lemma 42.
Let be the two possible values for the vertex variable of correct processes at the end of the first round (Lemma 38). At the end of the second round, one of the following happens:
-
The vertex variable of every correct process is set either to or to .
-
The vertex variable of every correct process is set either to or to .
Proof.
It follows from Lemma 40, Lemma 41, and the fact that are the only possible values for vertex variables at the end of the second round.
Now, having proved some important properties of the WitnessCollect subroutine, we move on to prove properties regarding Algorithm 2.
Lemma 43.
At the end of the first communication round of Algorithm 2, if there is a unique input value for every correct process, then for every correct process .
Proof.
Let be a correct process. Since we consider the first communication round, . Therefore, the code block beginning at line , and ending at line is executed. Now, notice that in line , a call to is made. Note that inside the subroutine, eventually drops outliers (Corollary 29), and given all values coming from correct processes are instances of we deduce from Lemma 37 that every remaining value coming from some Byzantine process is also . Thus, after dropping outliers the multiset only consists in instances of . Hence, after returning from the subroutine call, line is executed, after which .
Corollary 44.
At the end of the second communication round of Algorithm 2, if there is a unique input value for every correct process, then for every correct process .
Proof.
Let be a correct process. Since we consider the second communication round, . Therefore, the code block beginning at line , and ending at line is executed. Notice that at the end of the previous round ; thus, according to line , . Afterwards, each correct process broadcasts (line ). Now, from Lemma 39, eventually gathers values in its multiset. Notice that given , executes the line branch, and since every other process has its vertex variable set to at the end of the first round, then no more than messages containing are delivered to , hence, contains less than copies of , and therefore no modification is done to the vertex variable.
Lemma 45.
If there is a unique input value for every correct process, then every correct process decides when executing Algorithm 2.
Proof.
By induction on , we prove that every correct process has its vertex variable set to at the end of round , and therefore, it ends deciding such vertex.
Notice that for the property holds through Lemma 43 and Corollary 44 respectively. Assume for . According to the Algorithm’s code, the code block beginning at line is executed; therefore, (line ), and every correct makes a call to (line ). With an analogous argument to that of Lemma 43, after returning from the subroutine call, necessarily consists only in instances of . Thus, process proceeds to the line else if branch. Following that, line is executed, after which .
Lemma 46.
At the end of the first communication round, the vertex of the spider graph pointed by the vertex variable of any correct process lies within the correct minimal subtree of .
Proof.
Directly from Lemma 38, since both possible vertices lie within the correct minimal subtree of .
Corollary 47.
At the end of the second communication round, the vertex of the spider graph pointed by the vertex variable of any correct process lies within the correct minimal subtree of .
Proof.
Directly from Lemma 42.
Lemma 48.
Every correct process decides a vertex that lies within the correct minimal subtree of when executing Algorithm 2.
Proof.
By induction on , we prove that the variable of every correct process lies within at the end of round .
For notice the property holds by Lemma 46, and Corollary 47 respectively. Assume for , and let be a correct process. Therefore, we have two cases:
-
The vertex variable is set to , thus, according to the Algorithm’s code, branch is set to (line ), and therefore the vertex variable remains unmodified (lines ), therefore the property trivially holds.
-
The vertex variable is set to some vertex that lies within . Thus, branch is set to (line ). Next, calls (line ). Notice from Lemma 37 that each remaining value in must lie within , with values sent from some correct processes, which, by assumption, make their vertex variables to lie within . Thus, either if vertex is modified in line or in line , the resulting value lies within .
Lemma 49.
If , then the distance between the vertices of the spider graph labeled by the decisions of all correct processes is at most one.
Proof.
From Lemma 38, and given , notice that at the end of the first round, each correct process has its vertex variable set to either ,, with being a specific value given as input to some correct process, and observe that the distance between these two vertices is one. Given only one round is executed, every correct process decides on the value pointed by vertex, so the property holds.
Lemma 50.
If , then the distance between the vertices of the spider graph labeled by the decisions of all correct processes is at most one.
Proof.
From Lemma 42, and given ,one of the following happens for the values of the vertex variables of correct processes:
-
The vertex variable of every correct process is set either to or to .
-
The vertex variable of every correct process is set either to or to .
Observe that in any case, the distance between any two vertices is at most one. Since only two rounds are executed, then every correct process decides on the value pointed by vertex, so the property holds.
Lemma 51.
If , and at the end of the second round the vertex variable of every correct process is set either to or to ; then the distance between the vertices of the spider graph labeled by the decisions of all correct processes is at most one.
Proof.
Firstly, observe that since is fixed then are at distance one. We claim that at the end of round , the vertex variable of every correct process is set either to or to , and therefore the property holds.
According to the code of Algorithm 2, if for some correct process, its vertex variable is set to at the beginning of round , then its decision remains unchanged at the end of the same round. Therefore, assume vertex is set to ; thus, according to the algorithm code, it is left unmodified or is modified either in line or line of Algorithm 2. If vertex is modified in line it must have been the case that consisted only in copies of some value . Using Corollary 32 we deduce that this value comes from correct processes, and given (line ), it can only be equal to , therefore vertex is set again to at the end of the round. Otherwise, if vertex is modified in line , observe from line that there are different values in ; from Lemma 37, each value in lies within , therefore, vertex is set to at the end of the round.
Lemma 52.
Let be correct processes such that after calling the subroutine, end up with their multisets having only copies of values respectively. It follows that .
Proof.
Suppose otherwise, then according to Corollary 36 the multisets of both processes share at least one common value, a contradiction.
Lemma 53.
Let with . Let and be intervals with a nonempty intersection, such that , and . Define , , it follows that .
Proof.
Since and have a nonempty intersection, there exists such that and . Now, without loss of generality assume , therefore .
Now, notice that given and , it follows that ; therefore . Analogously, observe that and , thus , and .
From the previous arguments, we deduce that . Hence, , and by symmetry we have that .
Since trivially , it remains to prove that .
Lemma 54.
If , and at the end of the second round the vertex variable of every correct process is set either to or to ; then the distance between the vertices of the spider graph labeled by the decisions of all correct processes is at most one.
Proof.
In order to prove the Lemma, we first show that at the end of round , the distance between the vertices of the spider graph pointed by the vertex variables of correct processes is at most .
We proceed by induction over . Let , thus at the beginning of the round the vertex variable of every correct process is set either to or to . According to line , the branch variable of every correct process is set to , and every correct process calls the subroutine in line either passing or as the value argument, eventually gathering values and removing outliers in (Corollary 29). Now, let us consider a pair of correct processes that completed line , and let be the intervals induced by the values gathered in their multisets respectively. Since correct processes passed either or as the value argument to the subroutine, observe that the maximum and minimum values in coming from correct processes after removing outliers are and respectively, and notice that according to Lemma 37 no Byzantine value lies outside of the interval . From the previous argument, we get that and . Also, note from Corollary 36 that the intervals have a nonempty intersection. Since have their branch variables set to , and , either line or line is executed, and in both cases is computed as the ceiling of the average of the values in (this is trivially true for line when consists in only one repeated value). Therefore, from Lemma 53 we get that the distance between the values computed by is at most .
Assume for . For round , without loss of generality let be a pair of processes such that the vertices of the spider graph pointed by their vertex variables are of maximum distance, and note that by assumption, their distance is at most . Using an analogous argument as the previous one, and Lemma 53 we get that the distance between the values computed by any pair of correct processes is at most .
Since we showed the invariant to be true, after rounds, we end up with all processes deciding vertices of the Spider Graph that are at distance at most .
Lemma 55.
If , then the distance between the vertices of the spider graph labeled by the decisions of all correct processes is at most one.
Proof.
Theorem 11. [Restated, see original statement.]
If , then Algorithm 2 solves connected consensus for with processes, up to of which may be Byzantine; using rounds, and sending messages.
Proof.
We show that all of the connected consensus properties hold in any execution of Algorithm 2.
- Termination.
-
From Lemma 28, each correct process eventually passes line verification, and since each of the following lines of code can be executed independently from the state of the involved variables, each round of communication is eventually completed for every correct process. Given there are explicitly rounds of communication, it follows that Algorithm 2 terminates.
- Validity.
-
Let be the set of input values for correct processes. If there is a unique input value for every process, then . Then,
Thus, from Lemma 45, at the end of the algorithm the vertex is decided for every correct process. Then, validity holds since . Otherwise, from Lemma 48 it follows that any decided vertex lies within .
- Agreement.
Theorem 12. [Restated, see original statement.]
Algorithm 2 satisfies the binding condition for .
Proof.
From Lemma 38, at the end of the first round only two vertices () are possible values for the vertex variable of any process is the input of some correct process; with the value related to the leaf being the input of some correct process. Finally, notice that according to the proof of the same Lemma, the value is the same for all processes. Therefore, the branch induced by the vertex variables of every correct process is unique. Given no other branches are considered in subsequent rounds, the property holds.
