Mobile Byzantine Agreement in a Trusted World
Abstract
In this paper, we address the Byzantine Agreement problem in synchronous systems where Byzantine agents can move from process to process, corrupting their host. We focus on two representative models: Garay’s and Buhrman’s models. In Garay’s model, when a process has been left by the Byzantine agent, it enters a cured state, is aware of its condition, and can remain silent for a round to prevent the dissemination of incorrect information. In Buhrman’s model, a Byzantine agent moves together with the message. It has been shown that solving Byzantine Agreement requires at least processes in Garay’s model, and at least in Buhrman’s model. In this paper, we aim to increase the tolerance to mobile Byzantine agents by integrating a trusted counter abstraction into both models. This abstraction prevents nodes from equivocating. In the new models, we prove that at least , respectively processors are needed to tolerate mobile Byzantine agents. Furthermore, we propose novel Mobile Byzantine Agreement algorithms that match these new lower bounds for both Garay’s and Buhrman’s models, achieving agreement in synchronous rounds.
Keywords and phrases:
Byzantine Agreement, Mobile Faults, Trusted AbstractionsCopyright and License:
2012 ACM Subject Classification:
Theory of computation Distributed algorithmsEditors:
Andrei Arusoaie, Emanuel Onica, Michael Spear, and Sara Tucci-PiergiovanniSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Byzantine Agreement, introduced by Lamport et al. [23, 28], is a fundamental building block in distributed systems. Since its introduction, Byzantine Agreement has been studied from different angles in various models (from synchronous [23, 28, 29] to asynchronous [16, 25], from authenticated [19] to anonymous [26]). Byzantine Agreement has been addressed in both deterministic [23, 28] and probabilistic [9, 20] settings. The resiliency (i.e. the fraction of faulty nodes the protocol can tolerate) has been the core research around Byzantine Agreement for decades. Notably, Lamport et al. [23] fix this bound to one third in environments with no authentication and further Borderding [8] proves that this bound holds even for models with local authentication. In all the previous mentioned works it is assumed that the faulty behavior is not mobile. That is, once a node is affected by a Byzantine behavior it will stay affected all along the system execution.
The newly emergent area of blockchains and more generally distributed ledger technologies brought to the light new types of faults and attacks that cannot be predicted accurately and may affect the system at various moments during its execution (e.g. [12, 22, 1]). Some of these behaviors can be modeled as malicious “agent” that can move through the network and corrupt the nodes they occupy. In the literature this model is known as mobile Byzantine model.
Related works
This paper investigates the Byzantine Agreement problem in the mobile Byzantine model enhanced with trusted abstraction. The literature focusing on the mobile Byzantine model investigates two main types of mobility: constrained and unconstrained. Like the classical distributed systems, the main case study problem in mobile Byzantine faults model has been Byzantine Agreement. When faults are mobile, the problem is known as Mobile Byzantine Agreement and requires special attention to preserving agreement even after it has been reached.
In constrained mobility settings, Buhrman et al. [10] consider that malicious agents move from one node to another only when protocol messages are sent (similar to how viruses would propagate). In that model, they prove a tight bound for Mobile Byzantine Agreement (, where is the maximal number of simultaneously faulty processes) and propose a time-optimal protocol that matches this bound.
In unconstrained mobility, the mobility of malicious agents is not constrained by message exchanges [30, 27, 21, 4, 31]. In Garay [21], Banu et al. [4], and Sasaki et al. [31], the authors consider that processes execute synchronous rounds composed of three phases: send, receive, compute. Between two consecutive rounds, malicious agents can move from one host to another, hence the set of faulty processes has a bounded size, although its membership can change from one round to the next.
In Garay’s model, a process can detect its infection after the fact. More precisely, during the first round following the leave of the malicious agent, a process enters a cured state, during which it can take preventive actions to avoid sending messages based on a corrupted state. Under this assumption, Garay [21] proposes an algorithm that solves Mobile Byzantine Agreement provided that . In the same model, Banu et al. [4] propose a Mobile Byzantine Agreement algorithm for . Sasaki et al. [31] investigate the problem in a different model where processes do not have this ability to detect when malicious agents move. This work extends the tight bounds ( and ) for Byzantine Agreement of Dolev [18] in arbitrary networks with static faults. In [6] the authors prove that the Mobile Byzantine Agreement problem has no solution if the size of the network is (where is an upper bound on the number of faulty agents) and propose an algorithm that matches this bound in a variant of Sasaki’s model [31].
In order to further increase the resilience of systems to Byzantine behaviors, one recent direction is the use of a trusted environment that provides a simple and limited set of trusted services. In this line of research, several works investigated the use of trusted hardware or software. Correia et al. introduce TTCB wormhole in [14, 15] proved that BFT can support a fraction of half Byzantine nodes. Chun et al. proposed Attested Append-Only Memory (A2M) [13], a trusted system that removes from the faulty nodes the ability to equivocate (i.e. a faulty node may send different messages to different nodes). They introduce PBFT-EA, a modified PBFT [11] that uses A2M for each message exchanged; the message is appended to a log and the attestation produced is sent with the message. The use of this abstraction increases the resilience from one third to one half. Levin et al. propose TrInc [24], a trusted monotonic counter that deals with equivocation in large distributed systems by providing a primitive: once-in-a-lifetime attestations. They also prove that TrInc can implement A2M. Monotonic trusted counter is used by Veronese et al. in [32]. They propose USIG (Unique Sequential Identifier Generator) a local service available in each node that signs a message and assigns it the value of a counter.
Furthermore, in the context of blockchains in [33], the authors enhance HotStuff with small trusted environments to tolerate a minority of corruptions. Damysus (in [17]) proposes a BFT protocol that uses trusted environments to improve Hotstuff resilience. In [5], the authors proposed enhance Tendermint protocol with a specific authenticated broadcast service (called A2M-broadcast) to tolerate a minority of faulty processes. The above works do not address the tolerance to Mobile Byzantine faults and the trusted abstractions contain and make use of trusted memory (logs).
In this paper we follow the line of research recently opened by [2] where the authors introduce a simple trusted abstraction, the Trusted Monotonic Counter, defined in terms of liveness and safety. Integrating this simple trusted counter into Bracha’s algorithms increases their resilience to in the case of reliable broadcast and in the case of probabilistic consensus [3]. Differently from [2] and [3], we address Mobile Byzantine settings.
Our contribution
In this paper, we address the Mobile Byzantine Agreement problem in Garay’s model [21] and Buhrman’s model [10], both enhanced with the Trusted Monotonic Counter (TMC) abstraction [2]. For each of these models, we establish tight lower bounds on the number of processes required to solve Byzantine Agreement: in Section 3, we show that no solution exists for Garay’s model if , and for Buhrman’s model if , where is an upper bound on the number of faulty agents. Then we propose algorithms for both models that match these bounds in Section 5 and Section 6. Each algorithm guarantees agreement within synchronous rounds. It should be noted that compared to the previous algorithm proposed in [6], our algorithm shares a similar three-phase structure (proposing, collecting, deciding), but crucially differs in the way messages are authenticated and validated.
In our work, by integrating a Trusted Monotonic Counter (TMC), our design prevents equivocation: a faulty process cannot send different messages without being detected. This stronger consistency guarantee allows us to lower the validating thresholds, thus improving the fault tolerance for Garay’s model from [4] to processes, and for Buhrman’s model from [10] to processes.
2 Model and Definition
2.1 System model
Processes
We consider a synchronous message-passing system consisting of processes , , …, where denotes the set of process indices. Each process is an automaton whose state evolves following the execution of its local algorithm. All processes execute the same algorithm.
The network is fully connected: all pairs of processes are directly linked with a reliable bidirectional channel; i.e. there is no loss, duplication, or alteration of messages. The system evolves in synchronous rounds and all processes start simultaneously at round . There is a round counter accessible to the algorithm executed by each process. Each round consists of three steps; send, receive, and compute. Based on its current local state, a process (1) computes and sends a message to all processes (including itself); (2) receives messages sent by all processes (including itself); and (3) computes its new state based on its current state and the set of received messages.
Mobile malicious agents
Faults are represented by malicious mobile agents that can move from process to process between rounds. There are at most malicious agents, with , and any process can be occupied by an agent. A process is said to be faulty in a given round if it is occupied by an agent in that round. A process which is not occupied by a malicious agent, but was occupied in the previous round is called a cured process.
A process which is neither faulty nor cured is called a correct process. , , and denote respectively the set of faulty, correct, and cured processes at round . For ease of writing, we also consider the combined sets of correct/cured processes as the set of non-faulty processes .
Malicious agents are mobile. The behavior of a faulty process is controlled by the malicious agent. In particular, the agent can corrupt the local state of its host process, and force it to send arbitrary messages (potentially different messages to different processes). However, a malicious agent cannot corrupt the identity of that process (i.e., it cannot send messages using another identity), and is unable to modify the code of the algorithm (i.e., the process resumes executing the correct algorithm after the malicious agent moves away). So we assume a secure, tamper-proof read-only memory where the identity and the code are stored. While it is possible for each non-faulty process to rejuvenate its code at the beginning of each round, local variables may still be corrupted (and of course cannot be recovered on its own).
In this paper we consider two variants of mobility illustrated in Figure 1:
- Garay’s model [21]:
-
Malicious agents can move between the compute step of a round and the send step of the next round. When a process is in the cured state, it is aware of its condition and thus can remain silent for a round to prevent the dissemination of wrong information.
- Buhrman’s model [10]:
-
Malicious agents move together with the message, i.e., they move between the send step and the receive round of the same round. When a process is in the cured state it is aware of that.
Initial state corruptions.
At most processes may begin their execution in a corrupted initial state. In order to keep a consistent notation and the presentation elegant, we simply represent this case by considering an imaginary round -1 in which that process is occupied. Thus, a process starting its execution in a corrupted initial state is either faulty or, by an abuse of terminology, cured in round 0. Therefore, just like in any other subsequent round, it is possible for round 0 to have at most cured processes in addition to faulty processes.
Notation.
In the formal definitions and proofs, denotes the value of variable in process at the end of round . We also use the notation to refer to the number of occurrences of in tuple .
In this work, we assume that the round number counter is stored in a tamper-resistant register and cannot be modified by Byzantine agents. This counter ensures that messages can be uniquely tagged with the corresponding round identifier. Note that while we adopt this assumption for simplicity, prior work [6] has shown that this constraint can be relaxed. Specifically, they propose techniques to emulate round numbers securely without requiring a trusted round counter. Our protocol can be extended using such techniques to remove this assumption without compromising correctness.
2.2 Mobile Byzantine Agreement problem
Each initially-correct process has an initial value . All processes must decide111We use a terminology consistent with the classical definition of Byzantine agreement. However, the action “decide” does not in itself guarantee a permanent decision. Indeed, due to the mobility of the malicious agents, non-faulty processes must re-decide the decision at the end of each round. a value such that the following properties hold:
-
1.
BA-Termination: Eventually, all non-faulty processes during a round terminate the round with a non-bottom decided value.
-
2.
BA-Agreement: No two non-faulty processes decide different values:
-
3.
BA-Validity: If all initially-correct processes propose the same value , correct processes can decide only .
After some round, all non-faulty processes must decide the same non-bottom value at every round. Temporary undecided states (denoted by ) are permitted during the convergence phase, but not after agreement is reached.
We assume that at least one process remains correct for rounds of communication. This assumption originates from the impossibility result of [21], and is acknowledged in [6], as one way of overcoming the impossibility of solving the Mobile Byzantine Agreement problem. The assumption is expressed as for generality, our upper bounds rely only on the fact that one process remains correct for rounds.
2.3 Trusted Monotonic Counter Object
We enhance the system with the Trusted Monotonic Counter Oracle abstraction TMC-Object defined below. We adopt the specification of the Trusted Monotonic Counter presented in [3].
This abstraction allows us, in Garay’s model, to design a Mobile Byzantine Agreement algorithm that supports Byzantine failures among processes, where , which improves the bound [4]. In Buhrman’s model, we reduce the bound to , improving the bound [10].
Informally, a TMC provides each process with a tamper-proof counter that attaches a unique, strictly increasing number and an unforgeable certificate to every message. This prevents equivocation, ensuring that a process cannot send conflicting messages or reuse old counters. Each process operates its local TMC independently, so parallel broadcasts proceed without interference or the need for centralized trusted infrastructure.
TMC-Object provides each process with a read-only local variable, called trustedCounter. Whenever the TMC-Object is invoked, it returns a value for trustedCounter strictly greater than any previous value it returned. The difference between two successive counter values is exactly 1.
The TMC-Object supports the operation get_certificate(). A process invokes get_certificate(m) with a message . The object returns a certificate and the value of the counter. The certificate object certifies that the returned value was created by the tamper-proof TMC-Object object for the message . The unique identifier is essentially a reading of the monotonic counter trustedCounter, which is incremented whenever get_certificate(m) is called.
The TMC-Object object guarantees the following properties:
-
Uniqueness: The get_certificate() operation of TMC-Object will never assign the same identifier to two different messages.
-
Sequentiality: The get_certificate() operation of TMC-Object will always assign an identifier that is the successor of the previous one.
To send a message certified by TMC-Object, a process invokes the get_certificate() operation of TMC-Object. This invocation creates a certificate corresponding to the value of the trustedCounter . Then the process sends the tuple , which can be verified by any other process receiving the message via the check_certificate() operation of the local TMC-Object, which confirms whether the given certificate was produced for the given message. Each invocation of the get_certificate() operation of TMC-Object increments the value of the trustedCounter of process . We call that sequence of operations TMC-Object-Send .
When receiving a message , a process must check if the certificate for message corresponds to the value of the counter . If not, the message is considered invalid and is ignored. If they correspond, the message is said to be valid according to the TMC-Object.
Note that Trusted Monotonic Counters are already provided by real-world real implementations such as TrInc [24] and USIG [32], which realize sequential counters with minimal trusted functionality. These systems show that TMCs can be efficiently implemented in real-world distributed settings without requiring expensive trusted computing bases.
3 Lower bound
In this section, we prove that in environments where nodes are equipped with equivocation detection mechanisms, in the presence of mobile Byzantine agents, nodes are necessary to solve Mobile Byzantine Agreement problem in Garay’s model, and nodes are necessary in Buhrman’s model.
3.1 Garay’s model
Theorem 1.
If , there is no deterministic algorithm that solves the Mobile Byzantine Agreement problem in a synchronous -process system with equivocation prevention, in the presence of (1) mobile Byzantine agents following Garay’s model and (2) processes with an initially-corrupted state (even with a permanently correct process).
Proof.
Without loss of generality, it suffices to prove the case of 3-process system with . The generalization of this result for , can be done by replacing each process appearing in the proof by a group of processes of size at most . Since the adversary can control agents, it can corrupt one group per round, preserving the argument.
Given a system consisting of 3 processes , where at least one is permanently correct in one execution. The proof is by contradiction. Let us suppose that there exists a deterministic algorithm that can solve the BA problem in the presence of a single malicious mobile agent. In this algorithm, with equivocation prevention, processes send the same message to all processes.
General idea.
We consider three executions of this algorithm: , and . In executions and , all correct processes propose the same value; and respectively. The BA properties imply that, eventually, non-faulty processes respectively decide and in these two executions. The third execution, called , brings a contradiction: some processes decide while others decide .
The three executions are illustrated in Figure 2. Arrows correspond to messages exchanged. Gray boxes contain the new local state computed by each process at the end of each round, which is then used to send messages in the following round. Red (resp. green) indicates actions taken by the faulty (resp. cured) processes. Vertical dashed lines separate successive rounds.
Executions and .
In , the malicious agent alternates between processes and , and process is initially correct and proposes in . In , the malicious agent alternates between processes and , and process is initially correct and proposes in .
For non-faulty processes, the messages sent during these executions are computed by the algorithm based on the local states of processes. For correct processes (i.e., excluding cured ones), let us denote by (resp. ) the local state of process at the beginning of the round in execution (resp. ). Based on this local state, let (resp. ) denote the message computed and sent by a correct process at round in execution (resp. ).
We now define the behavior of the malicious agent. In , for the faulty process (either or ) at round of execution , we choose that sends the message (i.e., the message would have sent at the same round in ) and we choose that when is faulty, it updates its local state to at the end of the round (i.e., the same state would have computed in ). Similarly we choose that the faulty process (either or ) at round of execution sends the message , and when is faulty, it updates its state to . (The state of is irrelevant as it is never correct in all three executions.)
Execution .
In execution , the malicious agent always occupies process . The two other processes are initially (and forever) correct. As in , process propose . As in , process propose . In this execution, the faulty process always stays silent to all processes.
Indistinguishability.
An induction on the round number shows that: and are indistinguishable for (which is always correct in these two executions), and and are indistinguishable for (which is always correct in these two executions). The complete induction proof is given in Appendix A.
The BA properties imply that, eventually decides in , it also decides in . Since eventually decides in , it also decides in . This violates the BA-Agreement property in , therefore contradiction.
3.2 Buhrman’s model
For Buhrman’s model, in fact it is sufficient to show that nodes are necessary to solve even the static Byzantine Agreement, let alone any mobile model.
Lemma 2.
There is no deterministic algorithm that solves the static Byzantine Agreement problem in a synchronous -process system with equivocation prevention if .
Proof.
Similarly to the proof of Theorem 1, consider the three executions illustrated in Figure 3. and are indistinguishable for (which is always correct in these two executions), and and are indistinguishable for (which are always correct in these two executions). The BA properties imply that, eventually decide in , it also decides in . Since eventually decides in , it also decides in . This violates the BA-Agreement property in , therefore contradiction.
Theorem 3.
If , there is no deterministic algorithm that solves the Mobile Byzantine Agreement problem in a synchronous -process system with equivocation prevention, in the presence of (1) mobile Byzantine agents following Buhrman’s model and (2) processes with an initially-corrupted state (even with a permanently correct process).
4 Byzantine Reliable Broadcast with Trusted Monotonic Counter
Several approaches to reliable broadcast have been proposed. Some, such as [7], address the problem in a different model by relying on a local failure oracle to detect past faults, leading to a bound of . Our work instead builds upon the broadcast primitive introduced in [3], which uses a Trusted Monotonic Counter to eliminate equivocation by design, and requires only . This leads to a simpler and more efficient construction.
Algorithm 1 invokes get_certificate() of the TMC-Objectwith the counter initialized at . When a process (the initiator) broadcasts a value , it uses TMC-Object-Send to send together with its certificate and counter to all processes. Upon receiving a valid initial message, each process forwards it and then delivers . Validity is ensured because only counter value 1 is accepted for the first broadcast, and more generally, each new value must increase the counter by exactly one. This prevents equivocation, as even a Byzantine initiator can commit to only one value per broadcast.
In our synchronous round-based model with mobile Byzantine faults, the same guarantees hold: within each round faults are static, and all broadcasts complete in the send phase. Hence each initiator produces at most one certified message that reaches all processes, and the primitive implements Byzantine Reliable Broadcast for any .
Theorem 4 (adapted from [3]).
Let be the number of processes, and let be an upper bound on the number of Byzantine agents. In a synchronous system with , the TMC-Object-Broadcast primitive implements Byzantine Reliable Broadcast with messages and a single call to get_certificate(). This guarantee holds even in the presence of mobile Byzantine faults, provided that broadcasts are invoked during the send step of each round.
The use of TMC-Object guarantees that even a faulty process cannot equivocate – that is, it cannot send conflicting messages to different recipients without being detected. Each faulty process is restricted to committing to a single value per round. This stronger consistency guarantee allows us to lower the validation thresholds used in our Mobile Byzantine Agreement algorithms, thereby improving fault tolerance while simplifying message validation logic.
5 Mobile BA algorithm for Garay’s model with Trusted Monotonic Counter
Building on the primitives introduced above, we now present our solution for Garay’s model. Algorithm 2 uses TMC-Object and solves Mobile Byzantine Agreement under the following two conditions: (1) there are at least processes in total, and (2) some process remains uncorrupted for the first rounds. This improves the fault tolerance bound from (in [4]) to processes.
5.1 Description of the MBA algorithm
The MBA algorithm consists of four types of rounds. During the first phases, each composed of 3 rounds, it follows a repeating sequence of proposing, collecting, and deciding rounds. At the end of round , all non-faulty processes are guaranteed to hold the same value and decide on it (line 24). From round onward, it continuously executes the maintaining round.
In accordance with Garay’s model, a process that was occupied by a Byzantine agent in the previous round – i.e., a process that is cured in the current round – must remain silent during this round.
-
Proposing round: The goal is to filter out conflicting values and retain only those with strong support. Line 7 ensures that a value is accepted only if it is well supported (at least processes) and not contradicted by too many () conflicting values. These thresholds guarantee that all non-faulty processes (at least ) end the round with at most one non-bottom value . Consequently, all correct processes (at least ) in the next (collecting) round start with at most one non-bottom value .
-
Collecting round: The goal of this round is to gather and stabilize the values selected in the previous round. Each process broadcasts its current value and collects values from others into an array . This array represents the current opinions of the system and will be used to compute candidate values in the next step. For all non-faulty processes, according to the TMC-Object, only valid values are stored.
-
Deciding round: In this round processes attempt to reach agreement using the rotating coordinator paradigm. Each process broadcasts its array (gathered in the previous round) and collects the arrays from others into a matrix , where stores the value reported by process for position . Processes then analyze each column of to identify consistent values. Each such value is stored in a candidate array , where records the majority-supported value (if any) for position . If enough support for a single value is observed, it is selected. Otherwise, the coordinator (chosen as process in phase ) helps break the tie. If the coordinator of the current round is correct during the entire phase, all non-faulty processes are guaranteed to terminate the phase with the same value. By assumption, such a coordinating round is guaranteed to exist. This property persists in all subsequent phases even if the corresponding coordinators in those phases are faulty.
-
Maintaining round: This part repeats indefinitely starting from round . The goal is to allow cured processes to recover the decided value from correct ones. All processes exchange their current decided values and update their variable to the value that has been received at least times. In each of these round, all the correct processes send the same value (guaranteed by the algorithm), all non-faulty processes receive messages containing this same value, thus decide accordingly.
5.2 Proof of correctness
Lemma 5.
For any , at the end of the round , all the non-faulty processes can have at most one non-bottom value. Formally:
Proof.
We prove the lemma by contradiction. Suppose that there exists two different non-bottom value and , and two different non-faulty processes and , such that (resp. ) ends round with (resp. ). We show that this is impossible under the threshold conditions of line 7. With the use of TMC-Object, in fact the array are the same for all non-faulty processes since there is no equivocation. According to line 7 we have and . So , which is impossible since the size of is . Therefore contradiction.
Lemma 6.
There exists a phase such that all non-faulty processes end with the same non-bottom value. Formally:
Proof.
As we assume that at least one process remains non-faulty for at least rounds. Let be this process, and we consider the th phase where is the coordinator (line 22).
In round , by Lemma 5, all non-faulty processes can have at most one non-bottom value. Formally:
If any non-faulty process has a non-bottom value, denote .
In round , all correct processes in this round were non-faulty in the previous round, so these correct processes send either or (if exists). Cured ones don’t send. Only faulty ones may send other values. Thus all non-faulty processes receive at most values different from and . Therefore, the arrays of all non-faulty processes contain at most elements different from and and these elements are all located at the same indexes. Formally:
In round , all correct processes (non-faulty in round ) send an array satisfying the previous conditions, and then all non-faulty processes update such that:
As there is at most faulty processes, for all such indexes , the test at line 17 can be true only for the value :
The test line 18 can be true only for the value . Starting from the test of line 18, non-faulty processes update their variable by line 19, 22, or 23:
-
If no non-faulty process updates its by line 19: This means that all of them update their by testing line 22. As we have assumed that is correct, it sends the same array to all processes in round , then all non-faulty processes have the same line , which means that all non-faulty processes terminate the phase with the same non-bottom value, either all at line 22, or all at line 23.
-
If at least one non-faulty process updates its variable at line 19: Denote such a process. Similarly, it remains to prove that the remaining non-faulty processes which update their from the test of line 22 also update to the same value .
As updates at line 19, its contains more than times the value . Denote the set of indexes such that .
As has executed line 17, for all in , the column contains more than times the value . Since there are at most faulty processes, at least one correct process is the source of these, which means: for all in , there exists a correct in round , such that sends with . Formally:
At the point of view of (correct in round thus non-faulty in round ), this value comes from the line 10, which means in the round received from . With the equivocation prevention guaranteed by TMC, broadcasts in round , no matter whether it is faulty or not. Thus all processes of broadcast the value in the round .
Therefore, all non-faulty processes of round have received these values from processes of . It includes the process which is always correct by assumption. Then sends in round an array that contains at least times of value .
Consequently during round , each non-faulty process which has not updated its from line 19 update from line 22 (since the test of line 22 is true). It means that all non-faulty processes update their to , which concludes the proof.
Lemma 7.
If all correct processes start a phase with the same variable , then all non-faulty processes terminate the phase with this same value. Formally:
Proof.
We consider the three rounds of phase :
Round .
All correct processes (at least ) start the round with the same value . They all send this at line 4. Cured ones stay silent. Only faulty ones sends other values. Then all non-faulty processes receive at least times. and the sum of the number of occurrences of and in is at least . Then all of them update their to .
Round .
All correct processes were non-faulty in the previous round, then all of them have value at the beginning of this round and send at line 9. Then all non-faulty processes receive at least from all correct processes, thus form an array containing at least times the value at the indexes corresponding to correct processes.
Round .
All correct processes were non-faulty in the round , then all of them send an array at line 12 containing at least times the value . Therefore all non-faulty processes obtain a satisfying:
As there are more than correct processes, all non-faulty processes execute line 17 for all indexes corresponding to correct processes of the round :
Then for all non-faulty processes, the array contains at least times of , thus all non-faulty processes get positive test at line 18, which means all of them update their to , which concludes the proof.
Theorem 8.
Let be the number of processes, and be an upper bound of the Byzantine processes following Garay’s model. If (1) and (2) there is at least one process remaining correct for the first rounds, Algorithm 2 solves the Mobile Byzantine Agreement problem in a synchronous -process system.
Proof.
Lemma 6 and 7 guarantee that all non-faulty processes of round terminate the round with the same non-bottom value in variable . At line 24, all these non-faulty processes decide the value . Starting from round , since there are at least correct processes during each round, all non-faulty processes will always decide this same value . It proves the BA-Termination and BA-Agreement properties. To prove the BA-Validity, it is sufficient to apply Lemma 7 from the first round. This concludes the proof.
5.3 Round Complexity
The following complexity bound follows directly from the proof of Lemma 6, which identifies the first phase where the coordinator remains correct throughout.
Theorem 9.
In Algorithm 2, all non-faulty processes reach agreement by the end of round , where is the index of the first phase with a correct coordinator. In the worst case, , resulting in a round complexity of rounds.
This bound holds under the assumption that at least one process remains correct for rounds, as stated in the system model. In our model, up to processes may be faulty over time. Therefore, this bound of also corresponds to when compared to the early Byzantine Agreement result of Reischuk [30], where denotes the number of processes that are ever faulty. In practice, if coordinators are scheduled uniformly or the fault pattern is favorable, the index can be significantly smaller than , leading to faster agreement.
6 Mobile BA algorithm for Buhrman’s model with Trusted Monotonic Counter
We now consider Buhrman’s model. We introduce Algorithm 3 that uses TMC-Object and solves Mobile Byzantine Agreement under the following two conditions: (1) there are at least processes in total, and (2) some process remains uncorrupted for the first rounds. This improves the fault tolerance bound from (in [10]) to processes.
We recall that, in Buhrman’s model, the mobility of malicious agents is constrained: an agent may only move together with messages – that is, from the send step of one process to the receive step of another, within the same round. As a result, the set of faulty processes is determined at the time messages are received. Cured processes are never faulty in the next round’s send step, and they are aware that they are no longer infected and resume active participation immediately. Therefore, any process that is non-faulty during the receive step behaves as a correct process in that round, regardless of whether it was faulty in the previous round. As a consequence, up to processes can be trusted to send the correct decision value in each maintaining round.
The structure and correctness proof follow similar principles to the previous section, but are adapted to Buhrman’s mobility model and thresholds made possible by the use of TMC. Our algorithm adjusts thresholds across all rounds to reflect this increased reliability. In particular, a process accepts a value if it is received from at least distinct sources, rather than . These relaxed thresholds are safe due to two key properties: (1) cured processes resume participation immediately and contribute to communication, and (2) the Trusted Monotonic Counter abstraction ensures that faulty processes cannot equivocate.
Lemma 10.
For any , at the end of the round , all the non-faulty processes can have at most one non-bottom value. Formally:
Proof.
The argument follows the same structure as Lemma 5, but under the modified threshold condition of line 7. Suppose two distinct values satisfy the condition in line 7, we have and . So , which is impossible since the size of is . Therefore, at most one value can satisfy the condition.
Lemma 11.
There exists a phase such that all non-faulty processes end with the same non-bottom value. Formally:
Proof.
Lemma 12.
If all non-faulty processes at the end of phase terminate with the same variable , then all non-faulty processes at the end of phase terminate with this same value. Formally:
Proof.
The complete proof is in Appendix C.
Theorem 13.
Let be the number of processes, and be an upper bound of the Byzantine processes following Buhrman’s model. If (1) and (2) there is at least one process remaining correct for the first rounds, Algorithm 3 solves the Mobile Byzantine Agreement problem in a synchronous -process system.
Theorem 14.
In Algorithm 3, all non-faulty processes reach agreement by the end of round , where is the index of the first phase with a correct coordinator. In the worst case, , yielding a round complexity of rounds.
7 Conclusion
In this paper, we propose novel solutions for Mobile Byzantine Agreement in an environment where processes are equipped with a Trusted Monotonic Counter abstraction that provides a non-falsifiable, verifiable, unique, monotonic, and sequential counter. We investigate the problem under two different mobile Byzantine models: Garay’s and Buhrman’s. In the case of Garay’s model, our deterministic algorithm tolerates mobile Byzantine processes with while in the original model the best known bound was . Our second algorithm for Buhrman’s model tolerates Byzantine processes with while in the original model the bound was . We also prove that these two bounds are tight. Both algorithms share a common three-phase structure and achieve agreement within synchronous rounds. An interesting direction for future work is to design a generic transformer from mobile Byzantine model to mobile Byzantine with trusted components model as suggested in [5] for the static case.
References
- [1] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, and Ling Ren. Dfinity consensus, explored. IACR Cryptol. ePrint Arch., page 1153, 2018. URL: https://eprint.iacr.org/2018/1153.
- [2] Yackolley Amoussou-Guenou, Lionel Beltrando, Maurice Herlihy, and Maria Potop-Butucaru. Byzantine reliable broadcast with one trusted monotonic counter. In Toshimitsu Masuzawa, Yoshiaki Katayama, Hirotsugu Kakugawa, Junya Nakamura, and Yonghwan Kim, editors, Stabilization, Safety, and Security of Distributed Systems - 26th International Symposium, SSS 2024, Nagoya, Japan, October 20-22, 2024, Proceedings, volume 14931 of Lecture Notes in Computer Science, pages 360–374. Springer, 2024. doi:10.1007/978-3-031-74498-3_26.
- [3] Yackolley Amoussou-Guenou, Maurice Herlihy, and Maria Potop-Butucaru. Asynchronous byzantine consensus with trusted monotonic counters. In Ulrich Schmid and Roman Kuznets, editors, Structural Information and Communication Complexity - 32nd International Colloquium, SIROCCO 2025, Delphi, Greece, June 2-4, 2025, Proceedings, volume 15671 of Lecture Notes in Computer Science, pages 23–38. Springer, 2025. doi:10.1007/978-3-031-91736-3_2.
- [4] Nazreen Banu, Samia Souissi, Taisuke Izumi, and Koichi Wada. An improved byzantine agreement algorithm for synchronous systems with mobile faults. International Journal of Computer Applications, 43(22):1–7, April 2012.
- [5] Lionel Beltrando, Maria Potop-Butucaru, and José Alfaro. Tendertee: Increasing the resilience of tendermint by using trusted environments. In 24th International Conference on Distributed Computing and Networking, ICDCN 2023, Kharagpur, India, January 4-7, 2023, pages 90–99. ACM, 2023. doi:10.1145/3571306.3571394.
- [6] François Bonnet, Xavier Défago, Thanh Dang Nguyen, and Maria Potop-Butucaru. Tight bound on mobile byzantine agreement. Theor. Comput. Sci., 609:361–373, 2016. doi:10.1016/J.TCS.2015.10.019.
- [7] Silvia Bonomi, Giovanni Farina, and Sébastien Tixeuil. Reliable broadcast despite mobile byzantine faults. In Alysson Bessani, Xavier Défago, Junya Nakamura, Koichi Wada, and Yukiko Yamauchi, editors, 27th International Conference on Principles of Distributed Systems, OPODIS 2023, December 6-8, 2023, Tokyo, Japan, volume 286 of LIPIcs, pages 18:1–18:23. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.OPODIS.2023.18.
- [8] Malte Borderding. Levels of authentication in distributed agreement. In Özalp Babaoglu and Keith Marzullo, editors, Distributed Algorithms, 10th International Workshop, WDAG ’96, Bologna, Italy, October 9-11, 1996, Proceedings, volume 1151 of Lecture Notes in Computer Science, pages 40–55. Springer, 1996. doi:10.1007/3-540-61769-8_4.
- [9] Gabriel Bracha. An o(log n) expected rounds randomized byzantine generals protocol. J. ACM, 34(4):910–920, 1987. doi:10.1145/31846.42229.
- [10] Harry Buhrman, Juan A. Garay, and Jaap-Henk Hoepman. Optimal resiliency against mobile faults. In Digest of Papers: FTCS-25, The Twenty-Fifth International Symposium on Fault-Tolerant Computing, Pasadena, California, USA, June 27-30, 1995, pages 83–88. IEEE Computer Society, 1995. doi:10.1109/FTCS.1995.466995.
- [11] Miguel Castro and Barbara Liskov. Practical byzantine fault tolerance. In Margo I. Seltzer and Paul J. Leach, editors, Proceedings of the Third USENIX Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, Louisiana, USA, February 22-25, 1999, pages 173–186. USENIX Association, 1999. URL: https://dl.acm.org/citation.cfm?id=296824.
- [12] Jing Chen and Silvio Micali. Algorand: A secure and efficient distributed ledger. Theor. Comput. Sci., 777:155–183, 2019. doi:10.1016/J.TCS.2019.02.001.
- [13] Byung-Gon Chun, Petros Maniatis, Scott Shenker, and John Kubiatowicz. Attested append-only memory: making adversaries stick to their word. In Thomas C. Bressoud and M. Frans Kaashoek, editors, Proceedings of the 21st ACM Symposium on Operating Systems Principles 2007, SOSP 2007, Stevenson, Washington, USA, October 14-17, 2007, pages 189–204. ACM, 2007. doi:10.1145/1294261.1294280.
- [14] Miguel Correia, Nuno Ferreira Neves, and Paulo Veríssimo. How to tolerate half less one byzantine nodes in practical distributed systems. In 23rd International Symposium on Reliable Distributed Systems (SRDS 2004), 18-20 October 2004, Florianpolis, Brazil, pages 174–183. IEEE Computer Society, 2004. doi:10.1109/RELDIS.2004.1353018.
- [15] Miguel Correia, Nuno Ferreira Neves, and Paulo Veríssimo. BFT-TO: intrusion tolerance with less replicas. Comput. J., 56(6):693–715, 2013. doi:10.1093/COMJNL/BXS148.
- [16] Miguel Correia, Giuliana Santos Veronese, and Lau Cheuk Lung. Asynchronous byzantine consensus with 2f+1 processes. In Sung Y. Shin, Sascha Ossowski, Michael Schumacher, Mathew J. Palakal, and Chih-Cheng Hung, editors, Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010, pages 475–480. ACM, 2010. doi:10.1145/1774088.1774187.
- [17] Jérémie Decouchant, David Kozhaya, Vincent Rahli, and Jiangshan Yu. DAMYSUS: streamlined BFT consensus leveraging trusted components. In Yérom-David Bromberg, Anne-Marie Kermarrec, and Christos Kozyrakis, editors, EuroSys ’22: Seventeenth European Conference on Computer Systems, Rennes, France, April 5 - 8, 2022, pages 1–16. ACM, 2022. doi:10.1145/3492321.3519568.
- [18] Danny Dolev. The byzantine generals strike again. J. Algorithms, 3(1):14–30, 1982. doi:10.1016/0196-6774(82)90004-9.
- [19] Danny Dolev, Michael J. Fischer, Robert J. Fowler, Nancy A. Lynch, and H. Raymond Strong. An efficient algorithm for byzantine agreement without authentication. Inf. Control., 52(3):257–274, 1982. doi:10.1016/S0019-9958(82)90776-8.
- [20] Pesech Feldman and Silvio Micali. An optimal probabilistic protocol for synchronous byzantine agreement. SIAM J. Comput., 26(4):873–933, 1997. doi:10.1137/S0097539790187084.
- [21] Juan A. Garay. Reaching (and maintaining) agreement in the presence of mobile faults (extended abstract). In Gerard Tel and Paul M. B. Vitányi, editors, Distributed Algorithms, 8th International Workshop, WDAG ’94, Terschelling, The Netherlands, September 29 - October 1, 1994, Proceedings, volume 857 of Lecture Notes in Computer Science, pages 253–264. Springer, 1994. doi:10.1007/BFB0020438.
- [22] Timo Hanke, Mahnush Movahedi, and Dominic Williams. DFINITY technology overview series, consensus system. CoRR, abs/1805.04548, 2018. arXiv:1805.04548.
- [23] Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, 1982. doi:10.1145/357172.357176.
- [24] Dave Levin, John R. Douceur, Jacob R. Lorch, and Thomas Moscibroda. Trinc: Small trusted hardware for large distributed systems. In Jennifer Rexford and Emin Gün Sirer, editors, Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2009, April 22-24, 2009, Boston, MA, USA, pages 1–14. USENIX Association, 2009. URL: http://www.usenix.org/events/nsdi09/tech/full_papers/levin/levin.pdf.
- [25] Jean-Philippe Martin and Lorenzo Alvisi. Fast byzantine consensus. IEEE Trans. Dependable Secur. Comput., 3(3):202–215, 2006. doi:10.1109/TDSC.2006.35.
- [26] Michael Okun and Amnon Barak. Efficient algorithms for anonymous byzantine agreement. Theory Comput. Syst., 42(2):222–238, 2008. doi:10.1007/S00224-007-9006-9.
- [27] Rafail Ostrovsky and Moti Yung. How to withstand mobile virus attacks (extended abstract). In Luigi Logrippo, editor, Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing, Montreal, Quebec, Canada, August 19-21, 1991, pages 51–59. ACM, 1991. doi:10.1145/112600.112605.
- [28] Marshall C. Pease, Robert E. Shostak, and Leslie Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, 1980. doi:10.1145/322186.322188.
- [29] Michel Raynal. Fault-tolerant Agreement in Synchronous Message-passing Systems. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers, 2010. doi:10.2200/S00294ED1V01Y201009DCT003.
- [30] Rüdiger Reischuk. A new solution for the byzantine generals problem. Inf. Control., 64(1-3):23–42, 1985. doi:10.1016/S0019-9958(85)80042-5.
- [31] Toru Sasaki, Yukiko Yamauchi, Shuji Kijima, and Masafumi Yamashita. Mobile byzantine agreement on arbitrary network. In Roberto Baldoni, Nicolas Nisse, and Maarten van Steen, editors, Principles of Distributed Systems - 17th International Conference, OPODIS 2013, Nice, France, December 16-18, 2013. Proceedings, volume 8304 of Lecture Notes in Computer Science, pages 236–250. Springer, 2013. doi:10.1007/978-3-319-03850-6_17.
- [32] Giuliana Santos Veronese, Miguel Correia, Alysson Neves Bessani, Lau Cheuk Lung, and Paulo Veríssimo. Efficient byzantine fault-tolerance. IEEE Trans. Computers, 62(1):16–30, 2013. doi:10.1109/TC.2011.221.
- [33] Sravya Yandamuri, Ittai Abraham, Kartik Nayak, and Michael K. Reiter. Communication-efficient BFT protocols using small trusted hardware to tolerate minority corruption. IACR Cryptol. ePrint Arch., page 184, 2021. URL: https://eprint.iacr.org/2021/184.
Appendix A Induction proof for indistinguishability argument
We now provide the full inductive proof of the property , used in the proof of Theorem 1.
We define the predicate as:
Base case ()
The initial local states in and are the same for , since both propose 0, and is correct in both. Similarly, is correct and proposes 1 in both and , so:
Thus, holds.
Inductive step
Assume holds. We now show that holds as well.
-
In and , starts round in the same state and is correct in both. Hence, it sends the same message to all processes.
-
In and , starts round in the same state and is also correct in both. Hence, it sends in both.
-
For , in it is faulty and always silent. In , is either (1) faulty, in which case the malicious agent forces to send ), or (2) cured, in which case it remains silent.
-
Therefore, receives the same set of messages in round in and . Likewise for between and .
-
Since both and are correct, they apply the same algorithm to identical input messages and compute the same next state:
Hence, holds. By induction, is true for all .
Appendix B Proof for Lemma 11
We consider the th phase were remains non-faulty for at least rounds.
In round , by Lemma 10, all non-faulty processes can have at most one non-bottom value. If any non-faulty process has a non-bottom value, denote .
In round , all processes, which were non-faulty in round , send at line 9 of round either or (if exists), then all non-faulty processes of round receive at most values different from and . Therefore, the arrays of all non-faulty processes contain at most elements different from and and these elements are all located at the same indexes.
In round , all processes which were non-faulty in round , send at line 12 of round an array satisfying the previous conditions, then all non-faulty processes of round update such that:
The remaining of the proof follows the same as in the round part of Lemma 6.
Appendix C Proof for Lemma 12
Round . All non-faulty processes (at least ) of round start the round with the same value . They all send this at line 4. Then all non-faulty processes of round receive at least times. Then all of them update their to .
Round . All non-faulty processes of round send at line 9 in round . Then all non-faulty processes of round receive at least times, thus form an array containing at least times the value at the indexes corresponding to .
Round . All non-faulty processes of round send an array at line 12 in round containing at least times the value . Therefore all non-faulty processes obtain a satisfying:
As , all non-faulty processes execute line 17 for all indexes corresponding to :
Then for all non-faulty processes, the array contains at least times of , thus all non-faulty processes get positive test at line 18, which means all of them update their to , which concludes the proof.
