How Robust Are Synchronous Consensus Protocols?
Abstract
Synchronous Byzantine fault-tolerant (BFT) protocols have long been a reality in an academic setting, yet their practicality remains debated. The main concern of skeptics of synchronous systems is that the correctness of these protocols depends on the timely delivery of all messages within a predefined synchronous bound, . This dependency creates a challenging tradeoff between protocol correctness and performance, as directly impacts both. In this paper, we examine this tradeoff in detail. Specifically, we introduce BoundBFT, a new synchronous BFT consensus protocol. We analyze how BoundBFT’s correctness can be compromised and use this analysis to design and implement the most effective attack strategies that malicious processes could employ. Furthermore, we experimentally determine the synchronous bound that provides sufficient confidence in maintaining protocol correctness even in the presence of malicious replicas. Finally, we apply this discovered bound to BoundBFT, evaluate its performance, and compare it to state-of-the-art synchronous and partially synchronous protocols.
Keywords and phrases:
Synchronous Consensus, Byzantine Failures, BlockchainCopyright and License:
2012 ACM Subject Classification:
Computer systems organization Reliability ; Computer systems organization Availability ; Computer systems organization Redundancy ; Computing methodologies Distributed algorithmsFunding:
This work was partially supported by the Swiss National Science Foundation (grant 175717).Editors:
Silvia Bonomi, Letterio Galletta, Etienne Rivière, and Valerio SchiavoniSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Synchronous consensus protocols have long been a topic of debate in robust distributed systems. On the one hand, synchronous consensus protocols can tolerate Byzantine or malicious processes out of processes [17, 15, 26], an improvement over partially synchronous consensus protocols, which require [13]. On the other hand, the correctness of a synchronous protocol hinges on the timely delivery of messages within a fixed time bound, . To ensure synchrony is not violated (i.e., messages are delivered within ), existing synchronous consensus protocols set conservatively, as the 99.99-th percentile of sampled communication [30] or as a 10-time factor of average latency [2]. This reliance on presents a critical tradeoff: a more conservative minimizes the risk of synchrony violations, thus favoring correctness, but comes at the cost of reduced protocol performance, since synchronous protocols execute at the pace of .
This paper offers a new perspective on synchronous systems by starting with the observation that some synchronous consensus protocols are resilient to synchrony violations; that is, even if some messages are not delivered within , correctness is not compromised. As we now illustrate, resilience to synchrony violations happens due to communication diversity and redundancy in a protocol. In Figure 1 (left), process sends request to process and sets a timeout for the answer from . Even if ’s request violates synchrony (i.e., takes longer than to arrive at ), ’s response () makes up for the delay and arrives at within the expected . In Figure 1 (right), sends request to and and sets a timeout for their answer. Process receives timely, replies to () and relays to . Although receives from after , it receives from timely and responds to (). As a result, receives responses from and within the expected . These communication patterns are at the core of BoundBFT, a novel Byzantine fault-tolerant synchronous consensus protocol introduced in the paper.
Tolerating synchrony violations can provide substantial improvement in performance. To understand why, consider Table 1, reproduced from [30], which compares the 99.99-th and 99.999-th percentile of communication across Amazon EC2 datacenters. While a protocol that can tolerate one synchrony violation in every one hundred thousand messages exchanged between US West and US East must set to at least 82190 milliseconds, a protocol that tolerates a synchrony violation in every ten thousand transmitted messages can set to 1097 milliseconds. Since synchronous consensus protocols run at the pace of , this represents a 75 improvement in performance!
Since BoundBFT tolerates Byzantine failures, synchrony violations should not introduce vulnerabilities that could be exploited by malicious processes. In leader-based consensus protocols that tolerate Byzantine failures, such as BoundBFT, the leader is the most advantageous role for a malicious process as it can induce honest processes into inconsistent decisions, possibly with help from other malicious processes. In a synchronous protocol, the malicious leader can hope to get “additional help” from synchrony violations, as some honest processes may be delayed with respect to other processes. The attack will work as long as the deceived honest process does not find out about the trickery before deciding. But honest processes communicate with many other honest processes, so there is ample opportunity to find out about the attack even if some messages are delayed.
In this paper, we assess the robustness of BoundBFT, that is, its ability to maintain correctness under synchrony violations, both in the presence and absence of malicious processes. Leveraging BoundBFT’s leader-based execution model with signed messages, we first characterize the range of potential attacks and examine their effects when combined with synchrony violations. We then design and implement specific Byzantine attacks to rigorously evaluate BoundBFT’s robustness. Namely, we conduct experiments to determine an appropriate synchrony bound, , that provides high confidence in preserving protocol correctness, even under the attack. Finally, we apply this bound to evaluate BoundBFT’s performance, enabling meaningful comparison with partially synchronous protocols.
| US West (CA) | Europe (EU) | Tokyo (JP) | Sydney (AU) | Sao Paolo (BR) | ||||||
| 99.99% | 99.999% | 99.99% | 99.999% | 99.99% | 99.999% | 99.99% | 99.999% | 99.99% | 99.999% | |
| US East (VA) | 1097 | 82190 | 1112 | 85649 | 1226 | 81177 | 1372 | 95074 | 1214 | 85434 |
| US West (CA) | 1184 | 1974 | 1133 | 1180 | 1209 | 6354 | 1252 | 90980 | ||
| Europe (EU) | 1310 | 1397 | 1375 | 3154 | 1257 | 1382 | ||||
| Tokyo (JP) | 1149 | 1414 | 2496 | 11399 | ||||||
| Sydney (AU) | 1496 | 2134 | ||||||||
We have implemented BoundBFT and compared it to state-of-the-art synchronous (Sync HotStuff [2, 3]) and partially synchronous consensus protocols (Tendermint [5] and HotStuff-2 [32]). Our evaluation in an emulated geographically distributed system showed that BoundBFT’s synchrony bounds can be in some cases more than one order of magnitude smaller than usual conservative synchronous bounds [30]. As a result, BoundBFT achieves latency comparable to partially synchronous consensus protocols, while offering higher throughput, reliability, and availability.
The remainder of the paper is structured as follows. Section 2 defines the system model and introduces background information on blockchain. Section 3 presents BoundBFT, a new Byzantine fault-tolerant consensus algorithm designed for the synchronous system model. Section 4 analyzes BoundBFT under synchrony violations and attacks. Section 5 experimentally evaluates BoundBFT and competing approaches. Section 6 overviews related work and Section 7 concludes. The Appendix contains BoundBFT’s proof of correctness and the full data of our experimental evaluation.
2 Background
2.1 System model
We consider a message-passing geographically distributed system consisting of a set of processes (or replicas) that do not have access to a shared memory or a global clock. Each process has its own local (hardware) clock, and while these clocks are not synchronized, they all run at the same speed. Processes can be honest or faulty. An honest process follows its specification; a faulty or Byzantine process presents arbitrary behavior. There are faulty processes out of processes. Processes communicate using point-to-point reliable links: every message an honest sender sends to an honest receiver is received.
We assume a synchronous system: there exists a known bound on maximal network transmission delay in communication between honest processes. We do not assume lock-step execution (e.g., [28, 12]); instead, we assume that all honest replicas start the execution within time. We compare our proposed synchronous protocol to protocols that assume partial synchrony: the system is initially asynchronous, without communication bounds, and eventually becomes synchronous.
We use cryptographic techniques for authentication, and digest calculation. We assume that adversaries (and Byzantine processes under their control) are computationally bound so that they are unable to subvert the cryptographic techniques used. Adversaries can coordinate Byzantine processes but cannot delay honest processes.
2.2 Blockchain
A blockchain is a distributed append-only log of transactions implemented by geographically distributed processes. A blockchain (consensus) protocol forms a chain of blocks, where a block’s position in the chain is the block’s height. A block at height has the following format where denotes a proposed value (i.e., a set of transactions) and is a hash digest of the predecessor block. The first block, , has no predecessor. Every subsequent block must specify a predecessor block by including a hash of it. A block is valid if (i) its predecessor is valid or , and (ii) its proposed value meets application-level validity conditions and is consistent with its chain of ancestors (e.g., there are no double-spending transactions). If block is an ancestor of block (i.e., ), we say extends . We say blocks and equivocate each other if they do not extend one another.
We assume that a blockchain (consensus) protocol must satisfy the following properties:
-
Agreement: No two honest replicas commit different blocks at the same height.
-
Progress: All honest replicas keep committing new blocks.
-
External validity: Every committed block satisfies the predefined valid() predicate.
3 BoundBFT
3.1 The protocol
BoundBFT is a synchronous BFT consensus protocol with a rotating leader [3]. It tolerates up to Byzantine replicas. BoundBFT adopts the good-case execution of the rotating-leader version of Sync HotStuff [3, 2], achieving optimal latency and responsive leader rotations [3]. However, it introduces a different epoch synchronization mechanism that reduces the waiting time for a new leader to propose from to when the previous leader is silent.
Algorithm 1 presents BoundBFT’s pseudo-code that covers executions when leaders are honest. BoundBFT’s execution evolves as a sequence of epochs, numbered , with each replica tracking the last epoch it started, denoted as . Each epoch has a designated leader, computed using a deterministic function .
-
1:
Initialization:
-
2:
the current epoch
-
3:
has the replica voted in the current epoch?
-
4:
the most recent block certificate the replica is aware of and…
-
5:
the block certified by
-
6:
the block certificate the replica is locked on and…
-
7:
the block certified by
-
8:
an epoch can be in one of the states: active, committed, not-committed
-
2:
-
9:
when bootstrapping do the execution starts in epoch 0
-
10:
Procedure : upon starting an epoch:
-
11:
the replica resets the current epoch variables
-
12:
-
13:
-
14:
if then if the replica is the leader in the current epoch…
-
15:
it gets new transactions to include in the new block
-
16:
if then then, if it knows of a previously certified block…
-
17:
it links the new block with that block
-
17:
-
18:
broadcast lastly, it broadcasts the proposal with the new block…
and the certificate for the block it is extending,
-
15:
-
11:
-
19:
when receive where and upon receiving the valid proposal…
-
20:
and do from the leader of the current epoch:
-
21:
if if the epoch is still active, replica has not voted yet, and…
-
22:
then proposal’s is at least as recent as replica’s …
-
23:
broadcast the replica votes for a proposal, vote message contains block’s hash
-
24:
then, the replica sets so it does not vote twice, and…
-
25:
forward forwards the proposal message
-
23:
-
21:
-
26:
when receive and distinct when the replica receives a proposal and…
-
27:
where do votes from the current epoch:
-
28:
from it forms a block certificate
-
29:
if then if no misbehavior is noticed in the current epoch…
-
30:
the replica locks on this block by setting to and…
-
31:
to , and…
-
32:
start starts
-
30:
-
33:
the replica always updates its and …
-
34:
to the most recent block
-
35:
forward from lastly, the replica forwards the votes to other replicas and…
-
36:
starts the next epoch
-
28:
-
37:
when expires do when expires and…
-
38:
if then the replica did not observe any proof of misbehavior,
-
39:
the replica commits the block and…
-
40:
all its already uncommited ancestor blocks
-
39:
-
38:
At the start of an epoch, the leader broadcasts the proposal containing a new block that extends the most recently certified block it knows of, (lines 14:–18: in Algorithm 1). Along with the new block, the leader includes the certificate for , .
Upon receiving a proposal (lines 19:–25: in Algorithm 1), a replica verifies the proposal’s validity and votes for it if the leader’s block certificate is at least as recent as the replica’s . The replica votes by sending a signed vote message to all replicas. A vote contains the current epoch number and the hash of the block, .
When a replica receives a proposal and votes for it, it forms a block certificate for the proposed block. If the replica has no proof of leader misbehaving, it locks on and triggers (lines 26:–32: in Algorithm 1). The replica then updates its and variables (lines 33:–36: in Algorithm 1) and starts epoch . To ensure all honest replicas receive the proposal and its certificate, the replica forwards them (lines 25: and 35: in Algorithm 1), allowing all honest replicas to start epoch within time.
When expires and the replica has no evidence of leader misbehavior for epoch , it commits block and all blocks extends (lines 37:–40: in Algorithm 1). In other words, it directly commits block and indirectly commits all its uncommitted ancestor blocks.
The replica does not wait for to expire before starting the next epoch. Instead, it begins epoch immediately after receiving a block certificate in epoch (line 36: in Algorithm 1). This approach allows BoundBFT to change leaders without waiting for the conservative network delay when we have a sequence of honest leaders, a property known as optimistic responsiveness [32]. Additionally, BoundBFT implements pipelining [39], enabling replicas to start working on the next block before committing the previous one. Specifically, the leader in epoch will propose a new block once it receives a block certificate for a block in epoch .
Algorithm 2 presents BoundBFT’s pseudo-code responsible for handling Byzantine leaders. To detect a malicious leader, a replica starts a timer, , when it enters epoch (line 2: in Algorithm 2). If expires and is still in epoch , it indicates that did not receive a block certificate, which can only occur if the leader is Byzantine. Consequently, replica blames the leader and broadcasts a message (lines 3:–5: in Algorithm 2).
-
1:
upon starting the epoch do when a replica enters a new epoch:
-
2:
start it starts the timer used to detect a malicious leader
-
2:
-
3:
when expires do when expires…
-
4:
if then in the current epoch that is still active
-
5:
broadcast the replica blames the leader by broadcasting a blame message
-
5:
-
4:
-
6:
when receive distinct do when receiving distinct blame messages from an epoch:
-
7:
from the replica forms a blame certificate and…
-
8:
calls with the certificate and epoch as parameters
-
7:
-
9:
when receive and when replica receives two proposals…
-
10:
where and do from leader for two distinct blocks:
-
11:
from and the replica forms…
-
12:
an equivocation certificate and calls
-
11:
-
13:
Procedure : when misbehavior is detected in an epoch:
-
14:
if then if the epoch is still active
-
15:
the replica sets state to not-committed
-
16:
if then if is the first certificate for the current epoch,
-
17:
forward from the replica forwards the messages from certificate and…
-
18:
start triggers
-
17:
-
15:
-
14:
-
19:
when expires do when expires:
-
20:
if then if the replica is in epoch
-
21:
the replica starts the next epoch
-
21:
-
20:
When a replica receives blame messages for epoch from distinct replicas, it has proof that at least one honest replica blamed the leader and forms a blame certificate (lines 6:–7: in Algorithm 2).
Additionally, if an honest replica receives proposals for two distinct blocks signed by the leader in the same epoch , it has proof that the leader is misbehaving. The replica then constructs an equivocation certificate (lines 9:–11: in Algorithm 2).
Whenever a replica has proof of the leader’s misbehavior (i.e., a blame or equivocation certificate), it calls the function and forwards the certificate and epoch number to it (lines 8: and 12: in Algorithm 2). If a block is not committed in epoch , the replica marks the epoch state as not-committed (line 15: in Algorithm 2). Moreover, if is the first certificate in epoch , the replica forwards the certificate and triggers (lines 16:–18: in Algorithm 2). Forwarding the certificate ensures that all honest replicas learn that the leader is Byzantine within time. Additionally, the extra timeout allows the replica to learn if an honest replica moved to the next epoch before detecting leader misbehavior, i.e., received a block certificate in epoch , locked on it, and moved to the next epoch.
When expires and the replica is still in epoch , it moves to epoch (lines 19:–21: in Algorithm 2). Replicas wait for before moving to the next epoch only in the case of a Byzantine leader. If the leader is honest, replicas form a block certificate and move to the next epoch without waiting for any timeouts.
3.2 BoundBFT’s correctness
In this section, we provide the intuition behind BoundBFT’s correctness. The appendix contains a detailed correctness proof of BoundBFT.
3.2.1 Intuition behind epoch synchronization
The epoch synchronization mechanism guarantees that honest replicas progress through each epoch in a coordinated manner. Specifically, all honest replicas initiate each epoch within time, ensuring synchronization. Additionally, Byzantine replicas cannot disrupt or halt the protocol during any epoch.
The epoch synchronization mechanism in BoundBFT relies on certificates: to start a new epoch, a certificate (i.e., block, blame, or equivocation certificate) must be formed in the previous epoch. BoundBFT ensures that, regardless of Byzantine behavior, a certificate is created in each epoch.
The mechanism BoundBFT employs to guarantee the existence of a certificate is as follows: honest replicas initiate a upon entering a new epoch (line 2: in Algorithm 1). If the timeout expires without receiving a certificate, the replica blames the leader. This results in two possible outcomes: (i) an honest replica forms one of the certificates, or (ii) no honest replica receives a certificate before the expires. In case (ii), all honest replicas will blame the leader, resulting in the formation of a blame certificate.
Once a certificate is ensured for an epoch, synchronizing replicas becomes straightforward: each replica forwards the received certificate (lines 25: and 35: in Algorithm 1 and line 17: in Algorithm 2). Within time, all honest replicas receive the certificate and start the next epoch if they have not already done so.
3.2.2 Intuition behind agreement
BoundBFT ensures that no two honest replicas commit different blocks in the same blockchain height. Consequently, the resulting blockchain remains consistent and does not have forks.
In epochs with a Byzantine leader, multiple certificates can be created. As a result, different honest replicas may start the next epoch receiving different certificates. For instance, one honest replica may receive a block certificate, while another may receive an equivocation certificate. To account for this scenario, an honest replica commits a proposed block in epoch only if it knows that the first certificate received by all honest replicas in is a certificate for . This guarantees two properties: (i) all honest replicas vote for block , and (ii) all honest replicas lock on block in epoch . Property (i) ensures that no other block can be certified and afterward committed in epoch . Property (ii) guarantees that honest replicas vote only for blocks extending in the following epochs. As a result, only and blocks extending will be certified and committed in epochs , and the agreement property will be satisfied.
The mechanism BoundBFT uses to verify the commit condition is as follows. Upon receiving a certificate for block , , as the first certificate in epoch , forwards the certificate and triggers at time (lines 32: and 35: in Algorithm 1). Consequently, knows that all honest replicas will receive by time . If an honest replica received a different certificate before , it must have received it at time . Since also forwards its certificate, will receive it by time . Therefore, setting to ensures that receives ’s certificate on time. Ultimately, if expires and has not heard about any other certificates, can be sure that the first certificate received by all honest replicas in is . In this case, commits block .
3.2.3 Intuition behind progress
BoundBFT ensures that all honest replicas commit a new block in every epoch with an honest leader. It does so by: (i) ensuring that all honest replicas vote for the leader’s proposal, and (ii) preventing the creation of blame or equivocation certificates. Property (i) guarantees the creation of a unique block certificate, while property (ii) guarantees that all honest replicas must receive the block certificate, trigger , and, when it expires, commit the proposed block.
An honest leader of an epoch proposes a new block that extends its and sends together with the new block. Other honest replicas will vote for the new proposal only if the sent by the leader is at least as recent as their . Consequently, BoundBFT ensures that whenever an honest replica locks on a block in epoch , all honest replicas update the and to the block certified in epoch . Therefore, the s on all honest replicas are always at least as recent as s on all honest replicas.
BoundBFT ensures that and are always up to date by relying on a mechanism that uses . Namely, an honest replica cannot start the next epoch immediately if the first certificate it receives in the current epoch is a blame or equivocation certificate. Instead, it must ensure no other honest replica locks on a block in this epoch. Consequently, forwards its certificate (line 17: in Algorithm 2), knowing that in time, all honest replicas will receive it. If any honest replica locked on a block, it must have done so before receiving the forwarded certificate. As a result, upon forwarding its certificate, sets to expire in time (line 18: in Algorithm 2). Moreover, starts the next epoch only when this timeout expires, or it receives the block certificate for the current epoch. Since also forwards the certificate after locking (line 35: in Algorithm 1), knows it will receive it before expires. Notably, will not lock on a block certificate if it receives the certificate after is initiated.
Lastly, BoundBFT must ensure that no equivocation or blame certificates are possible in epochs with honest leaders. An equivocation certificate will not be formed since the honest leader will not propose two different blocks. However, ensuring that no blame certificate is possible requires that no honest replica blames the leader. In other words, every honest replica must receive the block certificate before expires. Consequently, honest replicas set to . The first accounts for epoch drift time, the second for the time it takes for the leader’s proposal to reach all honest replicas, and the last is for the reception of the votes broadcast by honest replicas. Since we already showed that all honest replicas will vote for the honest leader, the block certificate is formed on all honest replicas before the expires, and no honest replica blames the leader.
4 Debunking synchrony violations
BoundBFT relies on synchrony every time it uses one of its timeouts, expressed as a multiple of a synchrony bound . A synchrony violation may result in a scenario where the timeout expires before a replica receives an expected message from some honest replica. We refer to this phenomenon as a timeout violation. In this section, we first examine how malicious replicas may attempt to compromise BoundBFT. We then consider the consequences of timeout violations in the presence and absence of malicious replicas. We present a detailed Byzantine protocol in the appendix.
4.1 Byzantine behavior
Listing possible faulty behaviors of Byzantine replicas is unusual since, by definition, a Byzantine replica can behave arbitrarily. In the context of leader-based protocols where messages are signed, however, the scope for deviation is limited, as we now explain.
In the case of a Byzantine leader, these are the possible faulty behaviors:
-
SILENCE: The leader does not send a proposal to a subset of replicas (possibly all).
-
EQUIVOCATION: The leader proposes multiple blocks in the same epoch.
-
AMNESIA: The leader does not extend the blockchain with a new block and proposes an alternative block for one of the previously committed blocks.
In addition, non-leader Byzantine replicas may misbehave as follows.
-
MULTI-VOTE: A Byzantine replica can choose to vote for any proposal it likes. It can also vote for multiple proposals in the same epoch.
-
BLAME: A Byzantine replica can blame the leader by broadcasting a blame message at any point in the execution, even if the leader is honest.
In addition, Byzantine replicas can always remain silent or discard messages selectively.
4.2 Timeout
The is the only timeout responsible for BoundBFT’s agreement. An honest replica sets this timeout after locking on a block in an epoch (line 32: in Algorithm 1). The replica then forwards the block certificate and waits for this timeout to receive certificates from all other honest replicas.
If is violated, an honest replica may miss a certificate from some honest replicas. As a result, it may commit block in epoch thinking all honest replicas locked on , while in reality, some honest replicas received a different certificate and moved to epoch without locking on . If enough honest replicas did not lock on , the agreement might be compromised as honest replicas may vote for an alternative block , create a block certificate, and commit .
The likelihood of this situation in the absence of Byzantine replicas is low, as the following conditions must be fulfilled:
-
1.
A blame certificate must be formed in epoch , meaning a majority of replicas must have blamed the leader in epoch (i.e., was violated in all of these replicas).
-
2.
A majority of replicas did not lock on in epoch , receiving a blame certificate before receiving a block certificate for .
-
3.
The leader of epoch did not receive ’s block certificate in epoch , thus not updating its and to and ’s certificate (i.e., its was violated in epoch ).
Even though and are responsible for BoundBFT’s progress, in this scenario, they also play a role in guarding the protocol’s agreement.
Byzantine replicas can exploit violations and potentially compromise BoundBFT’s agreement through the following attacks:
-
AMNESIA-ATTACK: The Byzantine leader ignores the algorithm (line 17: in Algorithm 1) and does not propose a block that extends its validBlock. Instead, it proposes an alternative block for its validBlock (i.e., ). Byzantine replicas vote for this proposal. The agreement can be violated if an honest replica committed while some honest replicas, due to violations, did not lock on . As a result, these replicas will vote for block , and if their votes, together with Byzantine votes, form a majority, block will be certified and committed. To increase the probability of this scenario, in epochs with an honest leader, Byzantine replicas send votes for the block proposed by the honest leader to one subset of honest replicas to help them form the block certificate faster and commit a block. At the same time, they send blame messages to a different subset of honest replicas to help them form a blame certificate before receiving a block certificate.
-
EQUIVOCATION-ATTACK: The Byzantine leader proposes two distinct proposals in the same epoch, and Byzantine replicas vote for both. The first proposal and its votes are sent to one subset of honest replicas, and the second proposal and its votes to another subset. As a result, two honest replicas may vote for, receive block certificates, and commit different blocks if their s expire before they learn about the equivocated proposal. In epochs when the leader is honest, Byzantine replicas remain silent and update and to stay aware of the most recently certified block. This is important so that when they become the leader, Byzantine replicas can generate new blocks that honest replicas will vote for.
4.3 Timeout
Honest replicas initiate this timeout upon starting an epoch (line 2: in Algorithm 2). Its purpose is twofold. First, it ensures that an honest replica does not wait indefinitely for a silent malicious leader. Second, when the leader is honest and proposes a block that all honest replicas will vote for, should not expire before all honest replicas receive the proposal and the votes from all other honest replicas. In other words, before they receive a block certificate. Consequently, no honest replicas will blame an honest leader.
If is violated, an honest replica will incorrectly blame the honest leader. If a majority of honest replicas blame the leader, a blame certificate can be formed, and the decision might not be reached in the current epoch. However, the block will be committed when the next honest leader proposes a block and other honest replicas receive the block certificate on time.
The creation of a blame certificate is easier in the presence of Byzantine replicas:
-
BLAME-ATTACK: Byzantine replicas do not vote for the proposal sent by the honest leader. Instead, they broadcast blame messages upon starting the epoch with an honest leader. As a result, the blame certificate can be formed if a single honest replica blames the leader. In epochs when the leader is Byzantine, the Byzantine replicas just remain silent.
Apart from unconditionally blaming the leader and hoping that one of the honest replicas will also blame the leader, there is no other way for Byzantine replicas to prevent honest replicas from committing a block in epochs with honest leaders. violations may slow down the execution but will not lead to violations in agreement (i.e., when two honest replicas decide on different blocks).
4.4 Timeout
An honest replica triggers this timeout when it receives a blame or equivocation certificate as the first certificate in an epoch (line 18: in Algorithm 2). This timeout ensures that if another honest replica receives a block certificate as the first certificate and locks on it in the same epoch, will receive this block certificate and update its and before starting the next epoch. Consequently, starts the next epoch when it receives a block certificate from the current epoch or when expires.
If the timeoutEpochChange is violated, an honest replica will not hear about the locked block. Consequently, if the replica is the next epoch leader, it will propose a block that locked replicas will not accept. If the set of remaining honest replicas that vote for a proposed block is less than the majority, the block certificate will not be formed, and a decision will not be reached even though the epoch leader is honest. However, the new block will be committed when one of the locked honest replicas becomes a leader.
Honest replicas rely on this timeout only in epochs when an equivocation or a blame certificate is formed. In the absence of attacks, creating an equivocation certificate is impossible. As a result, an honest replica uses solely in case it receives a blame certificate. This can happen only if - is violated on a majority of honest replicas, and they blame the honest leader.
With Byzantine replicas, however, both equivocation and blame certificates are possible. Byzantine replicas can exploit violations as follows:
-
EQUIVOCATION-CERTIFICATE-ATTACK: The Byzantine leader broadcasts a proposal for block . Then, the Byzantine replicas send votes for block to one subset of honest replicas to help them create a block certificate and lock on . At the same time, the Byzantine leader sends a second proposal for block to the other subset of honest replicas. These honest replicas will form an equivocation certificate and start . As a result, they will not lock on block .
-
BLAME-CERTIFICATE-ATTACK: Similarly, Byzantine replicas impose locking on one subset of honest replicas by sending a proposal and votes for block . Instead of equivocating, the Byzantine leader remains silent and does not send any proposal to the other subset of honest replicas. Moreover, all Byzantine replicas send blame messages to these replicas. If these replicas did not receive a block certificate before the expires,111Even though the Byzantine replicas do not send the proposal and votes to these replicas, they can receive the forwarded messages from other honest replicas and form a block certificate before expires. they will blame the leader and, together with blame messages from Byzantine replicas, form a blame certificate and start -, without locking on block .
In both attacks, Byzantine replicas remain silent in epochs with an honest leader. By remaining silent, these replicas ensure that if a violation happened and the next honest leader proposes a block that does not extend block , a block certificate will not be formed and a decision will not be reached.
Similarly to , violations of can slow down the execution but cannot lead to violations in agreement.
5 Evaluation
5.1 Experimental environment and setup
We conducted our experiments in a cluster with emulated wide-area latencies between 6 AWS zones (see Table 1). Latencies between nodes were configured using the Linux Traffic Control kernel module [25]. The emulated WAN provided an affordable approximation of the AWS environment since our evaluation required hundreds of hours of experiments (see Appendix B). The cluster contains 60 nodes divided in two groups: (i) EPYC Zen 2 with two 16-Core AMD EPYC 2881 MHz and 32GB of RAM, and (ii) HP SE1102 with two Quad-Core Intel Xeon 2.5GHz and 8GB of RAM. We implemented BoundBFT, all competing protocols, and all proposed attacks (see the Appendix) in Go. The implementations use SHA256 hashes and Ed25519 64-byte digital signatures. We rely on libp2p [1] for communication between replicas.
5.2 BoundBFT’s synchrony bound
In this section, we experimentally determine the value for BoundBFT’s synchrony bound : We ran BoundBFT in the presence of malicious replicas and determined that gives enough confidence that BoundBFT’s correctness will not be compromised. Initially, we set to 1250 ms (99.99%), the synchronous bound from [30], and gradually decreased it to the point where we started to observe BoundBFT’s agreement and progress violations. The complete data for these experiments can be found in Appendix B. In the following, we comment on the main takeaways.
When there is a single () or no () Byzantine replicas in the system, we did not observe any agreement violations, even if we set as low as 50 ms – the average latency between 80% of replicas in our system is higher than 50 ms. This shows that agreement violations are highly unlikely if the number of Byzantine replicas is low, even if many messages violate synchronous bounds. When the number of Byzantine replicas is (i.e., the maximum number of Byzantine replicas partially synchronous protocols can tolerate), the agreement violations were observed only when we lowered the to 50 ms. Moreover, even with ms, the BoundBFT’s agreement was violated in less than 10% of epochs in which the attack was launched. However, to prevent agreement violations when the number of Byzantine replicas is (i.e., the maximum BoundBFT can tolerate), we needed to increase to 150ms and 300ms, for 1KB and 32KB block sizes, respectively. This makes sense since to create a block certificate an honest replica needs to receive a vote from itself, Byzantine replicas, and one honest replica. Importantly, even in this case, the resulting was 8 and 4 times lower than the initial one. Notice that when , partially synchronous protocols will halt if Byzantine replicas remain silent.
Table 2 shows the s that resulted in no agreement violations and less than 5% of progress violations for all considered setups. Namely, no two honest replicas committed on different blocks for the same height and in less than 5% of epochs with an honest leader, some honest replicas did not commit a new block. The progress violations can also be lowered to 0 %, but this would require a slight increase in the chosen (see Appendix B). We believe this is not necessary since progress violations can only lead to a slight decrease in performance and do not affect agreement.
| Attack type | 1KB | 32KB | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f=29 | f=19 | f=1 | f=29 | f=19 | f=1 | |||||||
| EQUIVOCATION | 150 | 150 | 100 | 100 | 100 | 100 | 150 | 300 | 150 | 150 | 100 | 100 |
| AMNESIA | 150 | 150 | 150 | 150 | 100 | 100 | 300 | 300 | 150 | 150 | 100 | 100 |
| EQUIVOCATION-CERTIFICATE | 150 | 150 | 150 | 150 | 100 | 100 | 300 | 300 | 150 | 150 | 100 | 100 |
| BLAME-CERTIFICATE | 150 | 150 | 150 | 150 | 100 | 100 | 150 | 150 | 150 | 150 | 100 | 100 |
| BLAME | 150 | 150 | 100 | 300 | 150 | 100 | ||||||
| NO ATTACK | 100 | 100 | ||||||||||
5.3 Performance
In this section, we compare BoundBFT to state-of-the-art partially synchronous and synchronous protocols. In the partially synchronous model, we choose Tendermint [5] and HotStuff-2 [32], the most recent protocol of the HotStuff family [39]. In the synchronous system model, we consider Sync HotStuff [2, 3]. We limited our evaluation to Byzantine consensus protocols with a rotating leader: a new leader is elected when the protocol changes epoch (or round or view) as part of the normal execution, not only when the leader fails. These protocols are preferred in blockchain systems since they provide better fairness and censorship resistance than protocols with a stable leader (e.g., PBFT) [3].
To generate a load in our experiments, we equip every replica with a built-in client that generates transactions in advance and stores them in a local pool. When a replica is a leader in an epoch, it takes transactions from the pool and forms a block. The block size defines the number of transactions taken from the pool. This design leaves the mempool (i.e., the part of a blockchain responsible for propagating client transactions across the system) out of the discussion as different systems may implement it in different ways. Consequently, the latencies we report in the paper represent consensus latencies (i.e., the time the leader of an epoch needs to commit a block). Throughput is computed as the rate of committed blocks per time unit. Every point in the graphs is an average of 3 runs. We ran each experiment for 5 minutes.
5.3.1 Latency
BoundBFT and Sync HotStuff wait for (BoundBFT’s ) before committing a block in epoch . Consequently, their latency is directly affected by the chosen synchrony bound. For BoundBFT, we adopt the synchrony bound based on the experiments from the previous section (see Table 2). Conversely, for Sync HotStuff, we use the from [30].


We measured latencies in a system with 60 replicas with 1KB and 32KB block sizes. Figure 2 (left) shows the average latency computed by epoch leaders. First, we notice the significant impact of BoundBFT’s synchrony bound on latency. Namely, BoundBFT achieves 5.4 and 3.4 lower latency than Sync HotStuff with 1KB and 32KB block sizes, respectively. Second, BoundBFT’s latency is in between the latencies of partially synchronous protocols. It is 1.3 and 1.8 lower than HotStuff-2’s latency and 1.4 and 2 higher than Tendermint’s for small and large blocks, respectively. HotStuff-2 has higher latency due to its linear communication pattern, which requires five communication steps, while Tendermint has quadratic communication and commits a block in only three communication steps. We can also see that HotStuff-2 has the most significant standard deviation; we attribute this to the lower redundancy due to its linear communication pattern.
5.3.2 Throughput
BoundBFT and Sync HotStuff [2, 3] use pipelining to limit the impact of on throughput, which allows the leader to propose a block that extends block after receiving , i.e., before committing block . This way, protocols can order multiple blocks in parallel, and the throughput is unaffected by . This technique was initially introduced in HotStuff [39], and we implemented a pipelined version of HotStuff-2. Adapting pipelining to Tendermint is more complex and out of the scope of this paper. Moreover, notice that we compare BoundBFT a pipelined partially synchronous protocol, HotStuff-2.
We evaluated throughout in a system with 60 replicas and various block sizes (see Figure 2 (right)). BoundBFT and Sync HotStuff have similar throughout, as they both start ordering the next block after receiving a certificate for the previous block. Moreover, they outperform partially synchronous protocols for all block sizes cosidered, reaching throughput more than 2 higher than Tendermint’s for all block sizes. The reason behind this is that Tendermint does not use pipelining. They also perform better, from 1.4 to 3, than HotStuff-2, a partially synchronous protocol with pipelining. This is because even though both protocols start ordering the next block after collecting a certificate for the previous block, the certificate in HotStuff-2 requires votes from a two-third majority of replicas, while in BoundBFT the votes from the majority are enough.
5.4 Summary
In this section, we summarize the main takeaways of our evaluation.
-
We did not observe any agreement violations when the number of Byzantine replicas was 0 and 1. Synchrony violations combined with a minority or one-third of colluded Byzantine replicas resulted in agreement violations.
-
BoundBFT can use a that is 4 to 8 smaller than the conservative 99.99% [30], allowing it to improve latency from 3.4 to 5.4.
-
BoundBFT’s is big enough to ensure correctness with high probability when the system is under attack.
-
BoundBFT achieves from 1.4 to 3 higher throughput and comparable latency to state-of-the-art partially synchronous protocols.
6 Related work
The synchronous system model in its purest form requires that every message sent in the system be delivered within some known synchrony bound . Some protocols have been designed for this model such as Dfinity [24] and Sync HotStuff [2, 3]. These protocols are optimal in terms of resilience [15, 26] as they can tolerate up to a minority of faulty processes (i.e., , where is the number of Byzantine processes out of ). However, deploying these protocols in a blockchain environment usually comes with conservative that should ensure with high probability that the system is synchronous. Consequently, these protocols perform poorly. BoundBFT belongs to this family of protocols and is similar to the rotating version of Sync HotStuff [3]. The two protocols share similar behavior in the common-case but they have different epoch synchronization mechanisms. In this paper, we show that we do not need to use conservative bounds when deploying BoundBFT and as a consequence it can deliver reasonably good performance.
Many deterministic consensus protocols assume the partially synchronous system model [13]. The model allows consensus protocols that need synchrony only for liveness but not for safety. More precisely, the partially synchronous system model ensures that after some point, usually referred to as GST (Global Stabilization Time), messages exchanged among honest processes will be delivered within an unknown synchrony bound. A fundamental limitation of the partially synchronous system model is that any consensus protocol designed for this model can tolerate up to one third of faulty processes (i.e., ) [13]. The first practical representative of these protocols is PBFT [7], and more recently, in the blockchain context, Tendermint [5], HotStuff [39], HotStuff-2 [32] and ICC [6], to name a few.
Guo et al. [23] introduced the “weak synchronous model” (called mobile sluggish model in [2] for consistency with other works in the literature). The idea is to distinguish between prompt processes, those that respect synchrony bounds, from sluggish processes, those for whom messages may violate synchrony bounds. Moreover, the set of sluggish processes may change over time. The mobile sluggish model weakens the synchronous model and may result in more practical protocols. However, this is true only in situations when the actual number of faulty processes in the system is less than the minority (e.g., [2, 8]).
XFT [30] is based on the observation that typical BFT consensus protocols assume a powerful adversary that fully controls malicious processes and the network between honest processes, which is unrealistic. We share this view, and consider an adversary that cannot control the network between honest processes. XFT differentiates between three types of faulty processes: crash, Byzantine, and partitioned (i.e., processes that cannot exchange messages with other honest processes within the known synchrony bound). It ensures progress as long as the total number of faulty processes in the system is lower than . In other words, XFT assumes a majority of honest replicas that can communicate timely. Since selecting a quorum of responsive replicas out of replicas requires an exponential number of attempts, the solution is practical when is small.
An alternative way to increase the resilience of consensus protocols while assuming a partially synchronous system is to rely on trusted hardware (e.g., Trusted Execution Environment). This idea was introduced in A2M [9] and explored in many works (e.g, [10, 29, 37, 29, 38, 11]). These techniques have the disadvantage of requiring blockchain servers to be equipped with special hardware, which is not the case with BoundBFT.
The consensus protocols presented above assume some synchrony to circumvent the FLP impossibility result [16]. Alternatively, there are many asynchronous consensus protocols that rely on randomization to solve consensus [4, 34, 14, 20, 36, 33, 22, 18, 21]. Asynchronous protocols are robust since they do not assume any synchrony, but they provide probabilistic guarantees and perform worse than partially synchronous and synchronous protocols. Consequently, some protocols use a simpler leader-based deterministic protocol to improve the latency in good cases [19, 27, 35, 31].
7 Conclusion
In this paper, we have shown how we can circumvent major performance drawbacks of synchronous consensus protocols by choosing protocol-specific time bounds instead of conservative model-specific bounds. Instead of ensuring that all messages are received within synchronous bounds with high probability, we have analyzed protocol semantics and potential correctness violations in case of synchrony violations and Byzantine attacks, and have shown how to select a time bound that does not hurt correctness. As a showcase, we designed BoundBFT, a new Byzantine fault-tolerant synchronous consensus protocol, and have shown experimentally that BoundBFT withstands synchrony bound violations under attack and outperforms traditional synchronous consensus protocols. Furthermore, BoundBFT achieves similar latency and better throughput than state-of-the-art partially synchronous protocols, offering higher resilience.
References
- [1] Libp2p. https://libp2p.io. [Accessed 2023-01-12].
- [2] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, and Maofan Yin. Sync HotStuff: Simple and practical synchronous state machine replication. In 2020 IEEE Symposium on Security and Privacy (SP), May 2020. doi:10.1109/sp40000.2020.00044.
- [3] Ittai Abraham, Kartik Nayak, and Nibesh Shrestha. Optimal good-case latency for rotating leader synchronous BFT. In 25th International Conference on Principles of Distributed Systems (OPODIS 2021), pages 27:1–27:19, 2022. doi:10.4230/LIPICS.OPODIS.2021.27.
- [4] Michael Ben-Or. Another advantage of free choice: Completely asynchronous agreement protocols (extended abstract). In Robert L. Probert, Nancy A. Lynch, and Nicola Santoro, editors, Proceedings of the Second Annual ACM Symposium on Principles of Distributed Computing, Montreal, Quebec, Canada, August 17-19, 1983, PODC’83, pages 27–30. ACM, August 1983. doi:10.1145/800221.806707.
- [5] Ethan Buchman, Jae Kwon, and Zarko Milosevic. The latest gossip on BFT consensus. CoRR, July 2018. doi:10.48550/arXiv.1807.04938.
- [6] Jan Camenisch, Manu Drijvers, Timo Hanke, Yvonne-Anne Pignolet, Victor Shoup, and Dominic Williams. Internet computer consensus. In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, PODC’22, pages 81–91, July 2022. doi:10.1145/3519270.3538430.
- [7] Miguel Castro and Barbara Liskov. Practical byzantine fault tolerance. In Margo I. Seltzer and Paul J. Leach, editors, Proceedings of the Third USENIX Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, Louisiana, USA, February 22-25, 1999, OSDI ’99, pages 173–186, USA, 1999. USENIX Association. URL: https://dl.acm.org/citation.cfm?id=296824.
- [8] T-H. Hubert Chan, Rafael Pass, and Elaine Shi. Pili: An extremely simple synchronous blockchain. Cryptology ePrint Archive, Paper 2018/980, 2018. URL: https://eprint.iacr.org/2018/980.
- [9] Byung-Gon Chun, Petros Maniatis, Scott Shenker, and John Kubiatowicz. Attested append-only memory: making adversaries stick to their word. In Thomas C. Bressoud and M. Frans Kaashoek, editors, Proceedings of the 21st ACM Symposium on Operating Systems Principles 2007, SOSP 2007, Stevenson, Washington, USA, October 14-17, 2007, volume 41(6), pages 189–204. ACM, October 2007. doi:10.1145/1294261.1294280.
- [10] Miguel Correia, Nuno Ferreira Neves, and Paulo Veríssimo. How to tolerate half less one byzantine nodes in practical distributed systems. In 23rd International Symposium on Reliable Distributed Systems (SRDS 2004), 18-20 October 2004, Florianpolis, Brazil, pages 174–183. IEEE Computer Society, August 2004. doi:10.1109/RELDIS.2004.1353018.
- [11] Jérémie Decouchant, David Kozhaya, Vincent Rahli, and Jiangshan Yu. DAMYSUS: streamlined bft consensus leveraging trusted components. In Proceedings of the Seventeenth European Conference on Computer Systems. ACM, March 2022. doi:10.1145/3492321.3519568.
- [12] Danny Dolev and H. Strong. Authenticated algorithms for byzantine agreement. SIAM J. Comput., 12:656–666, November 1983. doi:10.1137/0212045.
- [13] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM (JACM), 35(2):288–323, April 1988. doi:10.1145/42282.42283.
- [14] Pesech Feldman and Silvio Micali. An optimal probabilistic protocol for synchronous byzantine agreement. SIAM Journal on Computing, 26(4):873–933, August 1997. doi:10.1137/s0097539790187084.
- [15] Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Comput., 1(1):26–39, January 1986. doi:10.1007/BF01843568.
- [16] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of ACM, 32(2):374–382, April 1985. doi:10.1145/3149.214121.
- [17] Matthias Fitzi. Generalized communication and security models in Byzantine agreement. PhD thesis, ETH Zurich, Zürich, Switzerland, March 2003. Reprint as vol. 4 of ETH Series in Information Security and Cryptography, ISBN 3-89649-853-3, Hartung-Gorre Verlag, Konstanz, 2003. URL: https://d-nb.info/967397375.
- [18] Yingzi Gao, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. Dumbo-ng: Fast asynchronous BFT consensus with throughput-oblivious latency. In Heng Yin, Angelos Stavrou, Cas Cremers, and Elaine Shi, editors, Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, CCS ’22, pages 1187–1201, New York, NY, USA, 2022. ACM. doi:10.1145/3548606.3559379.
- [19] Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xiang. Jolteon and ditto: Network-adaptive efficient consensus with asynchronous fallback. CoRR, abs/2106.10362, June 2021. doi:10.48550/arXiv.2106.10362.
- [20] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28-31, 2017, SOSP ’17, pages 51–68, New York, NY, USA, 2017. ACM. doi:10.1145/3132747.3132757.
- [21] Bingyong Guo, Yuan Lu, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. Speeding dumbo: Pushing asynchronous bft closer to practice. Cryptology ePrint Archive, Paper 2022/027, 2022. URL: https://eprint.iacr.org/2022/027.
- [22] Bingyong Guo, Zhenliang Lu, Qiang Tang, Jing Xu, and Zhenfeng Zhang. Dumbo: Faster asynchronous BFT protocols. In Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna, editors, CCS ’20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020, CCS ’20, pages 803–818, New York, NY, USA, 2020. ACM. doi:10.1145/3372297.3417262.
- [23] Yue Guo, Rafael Pass, and Elaine Shi. Synchronous, with a chance of partition tolerance. In Advances in Cryptology – CRYPTO 2019, pages 499–529, August 2019. doi:10.1007/978-3-030-26948-7_18.
- [24] Timo Hanke, Mahnush Movahedi, and Dominic Williams. DFINITY technology overview series, consensus system. CoRR, abs/1805.04548, May 2018. arXiv:1805.04548, doi:10.48550/arXiv.1805.04548.
- [25] Bert Hubert, Gregory Maxwell, Martijn van Oosterhout, Remco van Mook, Paul B. Schroeder, et al. Linux advanced routing & traffic control HOWTO. https://lartc.org/lartc.html, 2002. [Accessed 2024-22-5].
- [26] Jonathan Katz and Chiu-Yuen Koo. On expected constant-round protocols for byzantine agreement. In Advances in Cryptology - CRYPTO 2006, 26th Annual International Cryptology Conference, volume 4117 of Lecture Notes in Computer Science, pages 445–462. Springer, 2006. doi:10.1007/11818175_27.
- [27] Klaus Kursawe and Victor Shoup. Optimistic asynchronous atomic broadcast. IACR Cryptol. ePrint Arch., page 22, 2001. URL: http://eprint.iacr.org/2001/022.
- [28] Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, July 1982. doi:10.1145/357172.357176.
- [29] Dave Levin, John R Douceur, Jacob R Lorch, and Thomas Moscibroda. TrInc: Small trusted hardware for large distributed systems. In NSDI, volume 9, pages 1–14, 2009. URL: http://www.usenix.org/events/nsdi09/tech/full_papers/levin/levin.pdf.
- [30] Shengyun Liu, Paolo Viotti, Christian Cachin, Vivien Quéma, and Marko Vukolic. XFT: practical fault tolerance beyond crashes. In Kimberly Keeton and Timothy Roscoe, editors, 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016, OSDI’16, pages 485–500, USA, 2016. USENIX Association. URL: https://www.usenix.org/conference/osdi16/technical-sessions/presentation/liu.
- [31] Yuan Lu, Zhenliang Lu, and Qiang Tang. Bolt-dumbo transformer: Asynchronous consensus as fast as pipelined BFT. CoRR, abs/2103.09425, 2021. arXiv:2103.09425, doi:10.48550/arXiv.2103.09425.
- [32] Dahlia Malkhi and Kartik Nayak. Extended abstract: Hotstuff-2: Optimal two-phase responsive BFT. IACR Cryptol. ePrint Arch., page 397, 2023. URL: https://eprint.iacr.org/2023/397.
- [33] Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. The honey badger of BFT protocols. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, CCS ’16, pages 31–42, New York, NY, USA, 2016. ACM. doi:10.1145/2976749.2978399.
- [34] Michael O. Rabin. Randomized byzantine generals. In 24th Annual Symposium on Foundations of Computer Science (sfcs 1983), November 1983. doi:10.1109/sfcs.1983.48.
- [35] HariGovind V. Ramasamy and Christian Cachin. Parsimonious asynchronous byzantine-fault-tolerant atomic broadcast. IACR Cryptol. ePrint Arch., page 82, 2006. URL: http://eprint.iacr.org/2006/082.
- [36] Dmitry Tanana. Avalanche blockchain protocol for distributed computing security. In 2019 IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom). IEEE, June 2019. doi:10.1109/blackseacom.2019.8812863.
- [37] Giuliana Santos Veronese, Miguel Correia, Alysson Neves Bessani, Lau Cheuk Lung, and Paulo Verissimo. Efficient byzantine fault-tolerance. IEEE Transactions on Computers, 62(1):16–30, 2013. doi:10.1109/TC.2011.221.
- [38] Sravya Yandamuri, Ittai Abraham, Kartik Nayak, and Michael K. Reiter. Communication-efficient bft protocols using small trusted hardware to tolerate minority corruption. Cryptology ePrint Archive, Paper 2021/184, 2021. URL: https://eprint.iacr.org/2021/184.
- [39] Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, and Ittai Abraham. HotStuff: BFT consensus with linearity and responsiveness. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC’19, pages 347–356, July 2019. doi:10.1145/3293611.3331591.
Appendix A Appendix: Algorithm correctness
A.1 Proof of correctness
This section presents the proof that BoundBFT satisfies all the properties of blockchain consensus protocol (Section 2.2).
Lemma 1.
Every honest replica always moves to the next epoch.
Proof.
Assume for contradiction that an honest replica remains in some epoch indefinitely. This would imply that did not generate any of the certificates , , or . However, each honest replica starts the timer, , upon entering epoch (line 2: in Algorithm 2). When this timeout expires, if an honest replica has not received any certificate, it broadcasts the blame message (lines 3:–5: in Algorithm 2). Consequently, if no certificate is formed before the expires, all honest replicas will broadcast the blame message, leading to the formation of the blame certificate . This contradicts our assumption, thus proving that every honest replica moves to the next epoch.
Lemma 2.
If an honest replica starts epoch at time , then all honest replicas start epoch by time .
Proof.
Suppose an honest replica starts epoch at time . This implies that receives and broadcasts at time (lines 35:–36: in Algorithm 1), or at time , receives and broadcasts or (lines 17: and 21: in Algorithm 2). Messages with certificates will arrive within time. Consequently, in the former case, all honest replicas receive by time and start epoch . In the latter case, all honest replicas receive or by time and within they start epoch , ensuring that all honest replicas start epoch by time .
Theorem 3.
(Epoch synchronization) All honest replicas continuously move through epochs, with each replica starting a new epoch within time of any other honest replica.
Proof.
First, from Lemma 1, we know that every honest replica always moves to the next epoch. This ensures that no honest replica remains stuck in any epoch indefinitely.
Second, from Lemma 2, we know that if an honest replica starts epoch at time , then all honest replicas start epoch by time . This guarantees that all honest replicas start each epoch within time of each other.
Combining these two results, we can conclude that all honest replicas continuously move through epochs, with each replica initiating a new epoch within time of any other honest replica.
Lemma 4.
If an honest replica directly commits block in epoch , then (i) no block different than can be certified in epoch , and (ii) every honest replica locks on block in epoch .
Proof.
Suppose an honest replica directly commits in epoch at time (line 40: in Algorithm 1). This means that at time , received , locked on it, and started (lines 26:–32: in Algorithm 1). Moreover, replica forwarded all messages representing (lines 25: and 35: in Algorithm 1) so all honest replicas received these messages in time, by time .
For part (i), assume for a contradiction that some honest replica received and voted in epoch for the block . Since every honest replica votes only once, must have received a proposal for before receiving a proposal message for , at some time . As a result, forwards the propose message for at time (line 25: in Algorithm 1). Replica will receive this message by time , that is, before . Since these two propose messages lead to a certificate, would not commit (lines 9:–12: and 15: in Algorithm 2), a contradiction. Therefore, property (i) holds since no honest replica votes for a block different than ; otherwise replica would not commit.
For part (ii), we know by (i) that if replica directly commits in epoch , there is not any possible . So, we need to prove that every honest replica receives before receiving or . For a contradiction, assume that receives or before receiving . This must happen at time as receives by time . After receiving or , broadcasts them (line 17: in Algorithm 2). So, broadcasts or at time and receives them by time . Since replica will not commit , a contradiction.
Lemma 5.
If is the only certified block in epoch and honest replicas lock on block in epoch ( and ), then in all epochs , they vote only for blocks extending , or they blame the proposer.
Proof.
The proof proceeds by induction on the epoch number.
Base step ().
Let denote the set of honest replicas. The replicas in set do not vote for proposals that do not extend blocks certified in epochs higher than or equal to their (line 22: in Algorithm 1). As a result, when expires, no block certificate will be formed since no honest replica has voted, causing honest replicas to blame the proposer by sending message. Therefore, the lemma holds for the base step since honest replicas vote only for a block if it extends .
Induction step ().
Assume that no replica in set has voted for a block not extending until epoch . We now show that the lemma holds for epoch . Since replicas from the set vote for blocks extending or blame the proposer in epochs , no block not extending can receive votes in those epochs. Therefore, for all processes in set , and , where or extends . Assume, for the sake of contradiction, that a process in set votes in epoch for a block not extending . An honest replica will not vote for a block not extending its (line 22: in Algorithm 1), leading to a contradiction. Hence, the lemma holds for epoch as well.
Lemma 6.
If an honest replica directly commits block in epoch , then any block that is certified in epoch must extend .
Proof.
The proof follows directly from Lemmas 4 and 5. More precisely, if an honest replica directly commits block in epoch , by Lemma 4, we know that honest replicas (set ) lock on block in epoch and is the only certified block in epoch . Consequently, by Lemma 5, replicas from vote only for the blocks extending block in epochs . Therefore, no block that does not extend can collect votes and thus cannot be certified in any epoch .
Theorem 7 (Agreement).
No two honest replicas commit different blocks at the same height.
Proof.
Suppose, for the sake of contradiction, that two distinct blocks and are committed for the same height . Assume that is committed as a result of being directly committed in epoch and is committed as a result of being directly committed in epoch . Without loss of generality, assume . Note that all directly committed blocks are certified. This is true because in order to start for block , a replica needs to receive (lines 26:–32: in Algorithm 1). By Lemma 6, extends . Therefore, , which contradicts the assumption that and are distinct. Hence, no two honest replicas can commit different blocks at the same height.
Lemma 8.
If an honest replica locks on a block in epoch , no honest replica starts epoch before updating to , where does not need to be equal to .
Proof.
Assume that an honest replica locks on a block in epoch at time . This implies receives at time and does not receive or before that. Since all messages representing are broadcast (lines 25: and 35: in Algorithm 1), all honest replicas receive these messages by time .
Suppose for a contradiction that some honest replica starts the epoch before receiving or some other , in other words at time . This means it had received or and broadcast messages representing them at time (line 17: in Algorithm 2). Consequently, replica receives or by time and as , it does not lock on , a contradiction.
Corollary 9.
Every honest replica starts epoch with that is at least as recent as any certificate any honest replica locks on in any epoch .
Proof.
Suppose that the last epoch in which some honest replica locks on a block is epoch . By Lemma 8, we know that all honest replicas update their to some certificate from the same epoch (), before starting epoch . From this and the fact that no honest replica, in any of the following epochs (), updates its to an older certificate (lines 33:–34: in Algorithm 1), we see that this corollary holds.
Theorem 10 (Progress).
All honest replicas keep committing new blocks.
Proof.
From Lemmas 1 and 2, we see that replicas proceed through epochs, each epoch having a dedicated leader. If the leader of an epoch is Byzantine and does not propose any block or proposes equivocating blocks, honest replicas will collect or and move to the next epoch. Due to the round-robin leader election, there will be epochs with honest leaders.
Consider an epoch with an honest leader . Let be the time when the first honest replica starts epoch . By Lemma 2, all honest replicas enter epoch by the time . Therefore, by the time at the latest, an honest leader broadcasts the proposal . All honest replicas receive proposal by time . Since by the Corollary 9, is at least as recent as any of any honest replica, all honest replicas vote for the proposal. As a result, all honest replicas receive by time . Since , no honest replica will send a message in epoch , and cannot be formed. Furthermore, considering that replica is honest, it does not equivocate, so no will be formed in epoch . Consequently, all honest replicas start , and when it expires, they commit and all its ancestors. This scenario will occur in every epoch with an honest leader, ensuring that all honest replicas consistently commit new blocks across all such epochs.
Theorem 11 (External validity).
Every committed block satisfies the predefined valid() predicate.
Proof.
This follows directly from the requirement that every committed block must first be certified (lines 26:, 32:, and 40: in Algorithm 1). This implies that at least one honest replica accepted the block, meaning that returned true for this block on at least one honest replica (line 19: in Algorithm 1).
A.2 The Byzantine protocol
-
1:
Initialization:
-
2:
the current epoch
-
3:
the most recent block certificate the replica is aware of and…
-
4:
the block certified by
-
5:
the set of honest replicas
-
6:
the number of Byzantine replicas
-
7:
the attack type the Byzantine replica launches
-
8:
the size of the two random sets of honest replicas that are under the attack
-
2:
-
9:
when bootstrapping do the execution starts in epoch 0
-
10:
Procedure : upon starting the epoch…
-
11:
the replica sets the current epoch, and…
-
12:
switch : invokes the specific attack and pass the necesarry arguments to it
-
13:
case EQUIVOCATION-ATTACK :
-
14:
-
14:
-
15:
case AMNESIA-ATTACK :
-
16:
-
16:
-
17:
case BLAME-ATTACK :
-
18:
-
18:
-
19:
case EQUIVOCATION-CERTIFICATE-ATTACK :
-
20:
-
20:
-
21:
case BLAME-CERTIFICATE-ATTACK :
-
22:
-
22:
-
13:
-
11:
-
23:
when receive and distinct when replica receives a proposal…
-
24:
where do and votes for it in the current epoch…
-
25:
from it forms a block certificate,…
-
26:
updates its and …
-
27:
to the most recent block, and…
-
28:
starts immidiately the next epoch
-
25:
-
29:
when receive and upon receiving two different proposals…
-
30:
where and and do from the leader in the current epoch:
-
31:
start the replica triggers
-
31:
-
32:
when receive distinct where do upon receiving a blame certificate in the…
-
33:
start current epoch the replica triggers
-
33:
-
34:
when expires do when expires and…
-
35:
if then the replica is stil in epoch e…
-
36:
the replica starts the next epoch
-
36:
-
35:
Algorithm 3 presents the Byzantine replica protocol. Byzantine replicas proceed through epochs in the same way as honest replicas. Namely, if they receive a block certificate, they start the next epoch immediately (lines 23:–28: in Algorithm 3), while if they receive a blame or equivocation certificate they wait for before starting the next epoch (lines 29:–36: in Algorithm 3). They do this to be synchronized with honest replicas so they can launch the attack at the moment that maximizes the attack’s effectiveness. Moreover, a Byzantine replica waits for to update its and to the most recent values. As a result, when leader in an epoch, a Byzantine replica can propose a valid block (i.e., an invalid block would be easily dismissed by honest replicas).
Algorithms 4 and 5 present the logic for attacks on BoundBFT’s agreement and progress, respectively. We empower the attacks by assuming that Byzantine replicas know each other and collude (Section 2.1): each Byzantine replica has private keys of all Byzantine replicas. Therefore, a Byzantine replica can sign and send messages on behalf of other Byzantine replicas.
Upon starting an epoch, a Byzantine replica launches a specific attack, which is a parameter of an algorithm:
-
EQUIVOCATION-ATTACK,
-
AMNESIA-ATTACK,
-
BLAME-ATTACK,
-
EQUIVOCATION-CERTIFICATE-ATTACK, or
-
BLAME-CERTIFICATE-ATTACK.
-
1:
Procedure ***EQUIVOCATION-ATTACK***
-
2:
-
3:
if then if the epoch leader is a Byzantine replica:
-
4:
the replica creates two distinct…
proposals
-
5:
then, the replica generates vote messages for , and…
-
6:
vote messages for , one for each Byzantine replica
-
7:
lastly, the replica divide honest replicas…
in two random sets of size
-
8:
send and to then, it sends the first proposal and votes for it to the first set, ,
-
9:
send and to while, sending the second proposal and its votes to the second set,
-
4:
-
3:
-
10:
Procedure ***AMNESIA-ATTACK***
-
11:
-
12:
if then if the epoch leader is a Byzantine replica:
-
13:
the replica generates…
an alternative proposal for
-
14:
then, the replica generates vote messages for ,…
one for each Byzantine replica
-
15:
send and to the replica sends the proposal and…
votes for it to the all honest replicas
-
13:
-
16:
else else, if the leader is an honest replica:
-
17:
the replica divides honest replicas…
in two random sets of size
-
18:
upon receiving then, when it receives the proposal…
-
19:
where do from epoch leader…
-
20:
it generates vote messages for received proposal and…
-
21:
blame messages for epoch , one for each Byzantine replica
-
22:
send to then, it sends votes to one subset of honest replicas, ,…
-
23:
send b to while sending blames to the second subset of honest replicas,
-
20:
-
17:
-
12:
-
1:
Procedure ***BLAME-ATTACK***
-
2:
-
3:
if then if the epoch leader is honest:
-
4:
the replica generates blame messages,…
one for each Byzantine replica, and…
-
5:
send to send them to all honest replicas
-
4:
-
3:
-
6:
Procedure ***EQUIVOCATION-CERT-ATTACK***
-
7:
-
8:
if then if the epoch leader is Byzantine:
-
9:
the replica generates…
two different proposals, and…
-
10:
vote messages only for , one for each Byzantine replica
-
11:
then, the replica divides honest replicas…
in two random sets of size
-
12:
send and to finally, it sends the first proposal and votes for it…
to the first set of honest replicas, ,…
-
13:
send and to while sending both proposals to the second set of honest replicas,
-
9:
-
8:
-
14:
Procedure ***BLAME-CERT-ATTACK***
-
15:
-
16:
if then if the epoch leader is Byzantine:
-
17:
the replica generates new proposal for epoch ,…
-
18:
vote messages for proposal, and…
-
19:
blame messages, one for each Byzantine replica
-
20:
then, replica divides honest replicas…
in two random sets of size
-
21:
send and to finally, it sends the proposal and votes for it…
to the first set of honest replicas, ,…
-
22:
send to while sending blame messages to the second set of honest replicas,
-
17:
-
16:
All Byzantine replicas launch the same attack, not only the current epoch leader. This ensures that messages arrive at their destinations as fast as possible. So, if the malicious leader is far from some honest replica, the honest replica will receive attack messages from its closest Byzantine replica. For example, if the attack is EQUIVOCATION-ATTACK, each Byzantine replica will generate two proposals and votes for these proposals and send one proposal and its votes to one subset of honest replicas and another proposal and its votes to the other subset of honest replicas.
In attacks where Byzantine replicas divide honest replicas into two subsets and send different messages to them, the subsets are picked randomly. The size of these subsets is a parameter of the algorithm, defined by . If is set to , function will return two different subsets, each containing two random elements. Byzantine replicas ensure they have the same subsets by using the current epoch number as a random number generator seed.
Appendix B Determining BoundBFT’s synchrony bound
This section presents the complete data of the experiments we used to determine BoundBFT’s synchronous bound (see Section 5.2). Namely, we implemented all proposed attacks (see Algorithm 3 and Figures 4 and 5). Then, we ran BoundBFT in our cluster while varying the number of Byzantine replicas . As a starting point, we set to 1250 ms (99.99%), the synchronous bound from [30], and gradually decreased it to the point where we started to observe BoundBFT’s agreement and progress violations. More than 300 hours worth of experiments were conducted in total. Tables 3 and 4 below show data for 1KB and 32KB block sizes, respectively.
