
Dynamic Probabilistic Reliable Broadcast

João Paulo Bezerra, LTCI, Télécom Paris, Institut Polytechnique de Paris, France · Veronika Anikina, ITMO University, St. Petersburg, Russia · Petr Kuznetsov, LTCI, Télécom Paris, Institut Polytechnique de Paris, France · Liron Schiff, Akamai Technologies, Cambridge, MA, USA · Stefan Schmid, Technische Universität Berlin, Germany
Abstract

Byzantine reliable broadcast is a fundamental primitive in distributed systems that allows a set of processes to agree on a message broadcast by a dedicated process, even when some of them are malicious (Byzantine). It guarantees that no two correct processes deliver different messages, and if a message is delivered by a correct process, every correct process eventually delivers one. Byzantine reliable broadcast protocols are known to scale poorly, as they require Ω(n²) message exchanges, where n is the number of system members. The quadratic cost can be explained by the inherent need for every process to relay a message to every other process.

In this paper, we explore ways to overcome this limitation by casting the problem to the probabilistic setting. We propose a solution in which every broadcast message is validated by a small set of witnesses, which allows us to maintain low latency and small communication complexity. In order to tolerate the slow adaptive adversary, we dynamically select the witnesses through a novel stream-local hash function: given a stream of inputs, it generates a stream of output hashed values that adapts to small deviations of the inputs.

Our performance analysis shows that the proposed solution exhibits significant scalability gains over state-of-the-art protocols.

Keywords and phrases:
Reliable broadcast, probabilistic algorithms, witness sets, stream-local hashing, cryptocurrencies, accountability
Copyright and License:
© João Paulo Bezerra, Veronika Anikina, Petr Kuznetsov, Liron Schiff, and Stefan Schmid; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Distributed algorithms
Acknowledgements:
This project was supported by TrustShare Innovation Chair. Akamai Technologies provided access to the hardware essential to our simulations. We would also like to thank Laurent Decreusefond, Emre Telatar, Nirupam Gupta and Anastasiia Kucherenko for fruitful discussions.
Editors:
Silvia Bonomi, Letterio Galletta, Etienne Rivière, and Valerio Schiavoni

1 Introduction

Modern distributed computing systems are expected to run in extremely harsh conditions. Besides communicating over weakly synchronous or even purely asynchronous communication networks, the processes performing distributed computations may be subject to failures: from hardware crashes to security attacks or malicious (Byzantine) behavior. In these environments, ensuring that a system never produces wrong outputs (safety properties) and, at the same time, makes progress by producing some outputs is extremely challenging. The distributed computing literature reveals a plethora of negative results, from theoretical lower bounds and impossibility results to empirical studies, that exhibit fundamental scalability limitations.

Efficient protocols that tolerate Byzantine failures are in high demand. Let us consider cryptocurrencies, by far the most popular decentralized application nowadays. Originally, cryptocurrency systems were designed on top of consensus-based blockchain protocols [37, 43]. However, consensus is a notoriously hard synchronization problem [19, 16, 10, 11]. It came as good news that we do not need consensus to implement a cryptocurrency [23, 24], which gave rise to asynchronous, consensus-free cryptocurrencies [14, 4, 27] that exhibit significant performance gains over the consensus-based protocols. At a high level, these implementations replace consensus with (Byzantine) reliable broadcast [6], where a designated sender broadcasts a message so that no two correct processes deliver different messages (consistency), either all correct processes deliver a message or none does (totality), and if the sender is correct, all correct processes eventually deliver the broadcast message (validity).

Starting from the classical Bracha’s algorithm [7], Byzantine reliable broadcast algorithms [32, 34, 41] are known to scale poorly, as they typically have O(n) per-process communication complexity, where n is the number of processes. This can be explained by their use of quorums [33, 42], i.e., sets of processes that are large enough (typically more than 2n/3) to ensure that any two such sets have at least one correct process in common. By relaxing the quorum-intersection requirement to only hold with high probability, the per-process communication complexity can be reduced to O(√n) [35]. Guerraoui et al. [22] describe a probabilistic broadcast protocol that replaces quorums with randomly selected samples. This gossip-based broadcast consists of three phases, where each phase involves communication with a small (of the order O(log n)) randomly selected set of processes (a sample), which would give O(log n) per-process communication cost. It can be shown that, assuming a static adversary and an underlying uniform random sampling mechanism, the protocol can be tuned to guarantee almost negligible probabilities of failing the properties of reliable broadcast.

In this paper, we take a step forward by introducing a probabilistic reliable-broadcast protocol that tolerates an adaptive adversary and incurs an even lower communication overhead. Our protocol replaces samples with witness sets: every broadcast message is assigned a dynamically selected small subset of processes that we call the witnesses of this message.

The witnesses are approached by the receivers to check that no other message has been issued by the same source and with the same sequence number. The processes select the witness set by applying a novel stream-local hash function to the current random sample. The random sample is a set of random numbers that the participants periodically generate and propagate throughout the system, and we ensure that “close” random samples induce similar witness sets. To counter a dynamic adversary manipulating the random samples and, thus, the witness sets in its favor, the random numbers are generated and committed in advance using a secret-sharing mechanism [39, 15]. The committed random numbers are revealed only after a certain number of broadcast instances, which is a parameter of the security properties of our protocol. As a result, the protocol is resistant against a slowly adaptive adversary. We model the evolution of the random samples as an ergodic Markov process: the time for the adversary to corrupt a process is assumed to be much longer than the mixing time of a carefully defined random walk in a multi-dimensional space. Intuitively, even if the adversary introduces a biased value instead of a random one, by the time the value is used, the distribution of the random sample is close to uniform and the benefits of the bias are lost.

We argue that the desired safety properties of Byzantine reliable broadcast can be achieved with small, O(log n), witness sets. When the communication is close to synchronous, which we take as the common case in our performance analysis, the divergence between the random samples evaluated by different processes, and thus between the witness sets for a given broadcast event, is likely to be very small. Thus, in the common case, our broadcast protocol maintains O(n log n) communication complexity, or O(log n) per node, similar to sample-based gossiping [22], while additionally exhibiting constant latency.

The current “amount of synchrony” may negatively affect the liveness properties of our algorithm: the less synchronous the network becomes, the less accurate may the witness set evaluation become and, thus, the longer it takes to deliver a broadcast message. Notice that we do not need the processes to perfectly agree on the witnesses for any particular event: thanks to the use of stream-local hashing, a sufficient overlap is enough. To compensate for liveness degradation when the network synchrony weakens, i.e., the variance of effective message delays increases, we propose a recovery mechanism relying on the classical (quorum-based) reliable broadcast [6].

Our comparative performance analysis shows that throughput and latency of our protocol scale better than earlier protocols [6, 22], which makes its use potentially attractive in large-scale decentralized services [14, 4].

The rest of the paper is organized as follows. In Section 2, we describe the system model and in Section 3, we formulate the problem of probabilistic reliable broadcast and present our baseline witness-based protocol. In Section 4, we extend the baseline protocol to implement probabilistic broadcast. In Section 5, we analyze the security properties of our protocol and we present the outcomes of our performance analysis. In Section 6, we sketch a recovery mechanism that can be used to complement our baseline witness-based protocol. We conclude the paper in Section 7 with a discussion of related work. Proofs and technical details of the secret-sharing and recovery schemes are provided in the appendix.

2 System Model

A system is composed of a set Π of processes. Every process is assigned an algorithm (we also say protocol). Up to f<|Π|/3 processes can be corrupted by the adversary. Corrupted processes might deviate arbitrarily from the assigned algorithm, in particular they might prematurely stop sending messages. A corrupted process is also called faulty (or Byzantine), otherwise we call it correct.

We assume a slow adaptive adversary: it decides which processes to corrupt depending on the execution, but there is a delay before the corruption takes effect. When selecting a process p to corrupt at a given moment in the execution, the adversary can access p’s private information and control its steps only after every other correct process has terminated Δ protocol instances, where Δ is a predefined parameter. (In this paper we consider broadcast protocols (Section 3), so an instance terminates for a process when it delivers a message.) In addition, we assume that messages previously sent by p cannot be altered or suppressed.

Every pair of processes communicates through authenticated reliable channels: messages are signed and the channel does not create, drop or duplicate messages. We assume that the time required to convey a message from one correct process to another is negligible compared to the time required for any individual correct process to terminate Δ instances. (The time to process Δ instances can be in the range of hours or days; see Section 5.)

We use hash functions and asymmetric cryptographic tools: a public-key/private-key pair is associated with every process in Π [9]. The private key is known only to its owner and can be used to produce a signature for a message or statement, while the public key is known by all processes and is used to verify that a signature is valid. We assume that the adversary is computationally bounded: no process can forge the signature of a benign process on a statement. In addition, signatures satisfy the uniqueness property: for each public key pk and message m, there is only one valid signature for m relative to pk. Hash functions are modeled as random oracles.

3 Probabilistic Reliable Broadcast

The Byzantine reliable broadcast abstraction [6, 9] exports operation broadcast(m), where m belongs to a message set ℳ, and produces callback deliver(m), m ∈ ℳ. Each instance of reliable broadcast has a dedicated source, i.e., a single process broadcasting a message. In any execution with a set F of Byzantine processes, the abstraction guarantees the following properties:

  • (𝑉𝑎𝑙𝑖𝑑𝑖𝑡𝑦) If the source is correct and invokes 𝑏𝑟𝑜𝑎𝑑𝑐𝑎𝑠𝑡(m), then every correct process eventually delivers m.

  • (Consistency) If p and q are correct and deliver m and m′ respectively, then m = m′.

  • (𝑇𝑜𝑡𝑎𝑙𝑖𝑡𝑦) If a correct process delivers a message, then eventually every correct process delivers a message.

  • (𝐼𝑛𝑡𝑒𝑔𝑟𝑖𝑡𝑦) If the source p is correct and a correct process delivers m, then p previously broadcast m.

In a long-lived execution of reliable broadcast, each process maintains a history of delivered messages and can invoke 𝑏𝑟𝑜𝑎𝑑𝑐𝑎𝑠𝑡 an unbounded number of times, but correct processes behave sequentially, i.e., wait for the output of a 𝑏𝑟𝑜𝑎𝑑𝑐𝑎𝑠𝑡 invocation before starting a new one. The abstraction can easily be implemented using an instance of reliable broadcast for each message, by attaching to it the source’s 𝑖𝑑 and a sequence number [6].

Instances of reliable broadcast can also be probabilistic [22], in which case there is a probability for each instance that the protocol does not satisfy some property (e.g. violates 𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦). If one uses instances of probabilistic reliable broadcast for building a long-lived abstraction, the probability of failure converges to 1 in an infinite run (assuming processes broadcast messages an infinite number of times). We therefore consider the expected failure time of a long-lived probabilistic reliable broadcast (L-PRB) as the expected number of broadcast instances by which the protocol fails.

Definition 1 (Long-lived Probabilistic Reliable Broadcast).

An ϵ-Secure L-PRB has an expected time of failure (average number of instances until some property is violated) of at least 1/ϵ instances.

3.1 Protocol Description

We first present an algorithm that implements Byzantine reliable broadcast using a distributed witness oracle ω. Intuitively, every process p_i can query its local oracle module ω_i to map each event e = (id, seq) (a pair of a process identifier and a sequence number) to a set of processes that should validate the seq-th event of process id. The oracle module ω_i exports two operations: getPotWitnesses(id, seq), which returns a set V_i of processes potentially acting as witnesses for the pair (id, seq), and getOwnWitnesses(id, seq), which returns a set W_i ⊆ V_i of witnesses particular to p_i, referred to as p_i’s witness set.

We now describe an algorithm that uses ω to implement Byzantine reliable broadcast: a variation of Bracha’s algorithm [6] in which, instead of making every process gather messages from a quorum, we delegate this task to the witnesses. (A quorum is a subset of processes that can act on behalf of the system. A Byzantine quorum [33] is composed of q = ⌊(n+f)/2⌋+1 processes, for a system with n processes of which f are Byzantine.) Each process waits for replies from a threshold k of its witnesses in W_i to advance to the next protocol phase. However, witness sets can differ, so messages are sent to V_i to guarantee that every process acting as a witness can gather enough messages from the network. The protocol maintains correctness as long as, for each correct process p_i, there are at least k correct witnesses and at most k−1 faulty witnesses in W_i. Moreover, V_i should include the witness set of every other correct process. In Section 4, we describe a method to select k and a construction of ω that satisfies these conditions with a very large expected failure time.

The pseudo-code for a single instance of Byzantine reliable broadcast (parameterized with a pair (id, seq)) is presented in Algorithm 1. Here we assume the set of participants Π (|Π| = n) to be static: the set of processes remains the same throughout the execution. f denotes the number of faulty nodes tolerated in Π (see Section 3.2).

Algorithm 1 Witness Based Broadcast.

At the source s, a single instance of Witness-Based Broadcast (WBB) parameterized with (s,𝑠𝑒𝑞) is initialized when s invokes 𝑏𝑟𝑜𝑎𝑑𝑐𝑎𝑠𝑡(m), where (s,𝑠𝑒𝑞) is attached to m. On the remaining processes, the initialization happens when first receiving a protocol message associated to (s,𝑠𝑒𝑞). Upon initialization, processes sample Vi and Wi from ωi, which are fixed for the rest of the instance.

Each action is tagged with [S], [W] or [Π], where [S] is an action performed by the source, [W] is performed by a process acting as a witness, and [Π] by every process. A process p_i can take multiple roles in the same instance and always takes actions tagged with [Π], but performs each action only once per instance. Moreover, a correct p_i acts as a witness iff p_i ∈ V_i. We assume every broadcast message m to include the source’s signature, so that every process can verify its authenticity.
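To make the thresholds concrete, the sketch below collects the per-instance counters and the conditions under which a process (in its witness role or as an ordinary member of Π) advances, as they appear in the description above and in the proof of Theorem 2. It is a minimal Go illustration, not the paper’s Algorithm 1: message dispatch, signature checks, the NOTIFY machinery and all names are our own assumptions.

```go
package wbb

// Per-instance bookkeeping for one WBB instance (id, seq); a sketch only.
type Instance struct {
	n, f, k int // system size, fault bound, witness reply threshold
	// counts of distinct senders per candidate message m
	echoes    map[string]int // ⟨ECHO, m⟩ received (witness role)
	readiesPi map[string]int // ⟨READY, m, Π⟩ received (witness role)
	readiesW  map[string]int // ⟨READY, m, W⟩ received from own witnesses W_i
	validates map[string]int // ⟨VALIDATE, m⟩ received from own witnesses W_i
	delivered bool
}

// Byzantine quorum size ⌊(n+f)/2⌋+1.
func (i *Instance) quorum() int { return (i.n+i.f)/2 + 1 }

// Witness role: send ⟨READY, m, W⟩ after a quorum of echoes or f+1 readies from Π.
func (i *Instance) witnessSendsReady(m string) bool {
	return i.echoes[m] >= i.quorum() || i.readiesPi[m] >= i.f+1
}

// Witness role: send ⟨VALIDATE, m⟩ after a quorum of ⟨READY, m, Π⟩.
func (i *Instance) witnessValidates(m string) bool { return i.readiesPi[m] >= i.quorum() }

// Every process: echo ⟨READY, m, Π⟩ back after k readies from its own witnesses,
// and deliver m after k validates from its own witnesses.
func (i *Instance) sendsOwnReady(m string) bool { return i.readiesW[m] >= i.k }
func (i *Instance) delivers(m string) bool      { return !i.delivered && i.validates[m] >= i.k }
```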

Algorithm 2 uses WBB as a building block to validate and deliver messages. Clearly, if the validation procedure satisfies the reliable broadcast properties, then Algorithm 2 implements long-lived reliable broadcast.

Algorithm 2 Long-Lived Reliable Broadcast.

3.2 Protocol Correctness

Consider an instance of WBB where f < |Π|/3 and, for every correct p_i:

  1. W_i has at least k correct witnesses and at most k−1 faulty witnesses;

  2. for every correct p_j, W_j ⊆ V_i.

Then the following theorem holds:

Theorem 2.

Algorithm 1 implements Byzantine reliable broadcast.

Proof.

Validity: Let p_i be a correct process and assume that a correct source is broadcasting m. Since W_i ⊆ V_j for every correct p_j (assumption 2), all correct witnesses in W_i receive ⌊(n+f)/2⌋+1 echoes and reply with READY. From assumption 1, W_i has at least k correct witnesses that reply to p_i, which in turn sends its own READY back. The validation phase follows similarly, and p_i delivers m after receiving k VALIDATE messages from witnesses.

Integrity: m is signed by the source and processes verify its authenticity, so a broadcast message is delivered only if the authentication succeeds. By assumption, the adversary cannot crack the private key of a correct process and forge its signatures.

Consistency: Because W_i has at most k−1 faulty processes, p_i is guaranteed to receive a message from at least one correct process in lines 10 and 14 before proceeding to a new phase. A correct witness has to receive ⌊(n+f)/2⌋+1 ⟨READY, m, Π⟩ messages in order to send ⟨VALIDATE, m⟩. Since f < |Π|/3, every pair of subsets of ⌊(n+f)/2⌋+1 processes intersects in at least one correct process; thus two correct witnesses cannot send VALIDATE for distinct messages (this would require a correct process to send READY for distinct messages).

If a correct process delivers m, then at least ⌊(n+f)/2⌋+1 processes sent ⟨READY, m, Π⟩ to a correct witness; thus, at least f+1 correct processes sent ⟨READY, m, Π⟩ to every witness. Consequently, from assumption 2, correct witnesses receive f+1 readies for the message and are able to trigger line 9 to send ⟨READY, m, W⟩. In order to send READY without hearing from f+1 processes, witnesses gather echoes from ⌊(n+f)/2⌋+1 processes. Similarly to the Consistency part of the proof, two correct witnesses cannot then send READY for distinct messages. Finally, Totality holds from the fact that every W_i has at least k correct witnesses, which send ⟨READY, m, Π⟩ to all processes.

For a single broadcast instance, the message complexity depends on the size of V_i (the set of potential witnesses). Let |Π| = n and let v be the expected size of V_i; the message complexity is then O(n·v). We assume that the parameters of the witness oracle (Section 4) are chosen such that v is Ω(log n), resulting in a complexity of O(n log n). Moreover, WBB takes 5 message delays to terminate with a correct source.

4 Witness Oracle


Figure 1: Illustration of how stream-local hashing of similar histories (S and Ŝ) results in similar witness set selections (W and Ŵ).

The witness oracle is responsible for selecting a set of witnesses for each broadcast message. This selection should be unpredictable, so that it becomes known only close to the time the broadcast message is received. We implement this service locally on each node and require that, for every broadcast message and any pair of nodes, the locally selected witness sets be similar.

In order to provide unpredictable outputs, each node uses the hash of a history of random numbers that are jointly delivered with messages. The witness set used in a particular broadcast instance is defined according to the current history, and remains the same (for that instance) after the first oracle call.

Let W_i^l(id, seq) and V_i^l(id, seq) be the outputs of ω_i.getOwnWitnesses(id, seq) and ω_i.getPotWitnesses(id, seq), respectively, at the moment p_i’s history has l elements. The witness oracle construction satisfies:

  • (Witness Inclusion) For any pair of correct nodes p_i and p_j, and any instance (id, seq):

    W_i(id, seq) ⊆ V_j(id, seq)
  • (Unpredictability) Let S be the history of a correct process p and w the average witness set size. Then, for a given ε > 0, there exists L such that, for any process q and any L′ ≥ L:

    Pr(q ∈ W^(l+L′)(·,·) | |S| = l) ≤ 1/w + ε

Witness Inclusion ensures that correct processes use close witness sets in the same instance. Unpredictability hinders processes from accurately predicting future witness sets: based on p’s history S at any point in the execution, whether a given process q will be selected as a witness L or more instances later cannot be predicted with significant advantage.

To support a high rate of broadcast messages, nodes do not wait or try to establish the same histories before hashing. Instead, we use a novel locality sensitive hashing scheme to ensure that small differences in histories will result in small differences in the selected witness set, as illustrated in Figure 1.

The history of random numbers can be regarded as a set. However, existing locality-sensitive hashing algorithms for sets (such as those based on MinHash [8]) are less sensitive to small differences when the sets are large, which does not fit the case of histories that are expected to grow indefinitely. Note that hashing just a sliding window of the history may result in distinct nodes having different hash values even if their histories are the same, due to differences in the order in which values are added.

Another disadvantage of existing locality sensitive hashing algorithms is their vulnerability to messages crafted to manipulate the hash result. For example in MinHash based algorithms, adding an item with extremely small hash value will most likely keep the hash of the entire set constant for a long time regardless of insertions of new items. In our case, when using history hash for witness selection, it could allow the adversary to make the witness selection predictable.

In short, let f : 2^({0,1}*) → ℤ_r^b be a stream-local function that hashes a set of binary strings into a vector in the b-dimensional mod-r torus. We require f to satisfy:

  • (Locality) Let S_i, S_j ⊆ {0,1}*. There exist λ > 0 and L such that:

    (|S_i ∪ S_j ∖ (S_i ∩ S_j)| ≤ L) ⟹ distance(f(S_i), f(S_j)) ≤ λ·L,

    for some pre-defined distance measure in ℤ_r^b.

Next, we show a construction of a safe stream-local hashing of histories (SLASH), a hashing scheme that guarantees a high degree of unpredictability as well as similar outputs for similar histories. Later we show how SLASH can be used for witness selection.

4.1 Constructing a safe stream local hashing of histories

We consider a history as a set of binary strings, and we assume the existence of a family of one-way functions H_c = {h_s : {0,1}* → {0,1}^c}. For example, for c = 256, we can use h_s(x) = SHA256(x ∥ s), where s is a predefined random binary string of length at least c and “∥” is the bit-concatenation operator.

We define a family of SLASH functions F_{r,b,H_c} = {f_s : 2^({0,1}*) → ℤ_r^b}, where b < 2^c. Each of these functions can hash a set of arbitrary binary strings into a vector in the b-dimensional mod-r torus and should be locality sensitive, in the sense that sets with small differences are hashed to vectors at small distance, where vector distance is defined as follows:

TorusDist_{r,b}(X, Y) := max{ min((X_j − Y_j) mod r, (Y_j − X_j) mod r) | j ∈ {0, …, b−1} }
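For concreteness, a direct Go transcription of this distance (a sketch; the slice encoding of torus points is our choice, with entries assumed to lie in [0, r)):

```go
package slash

// TorusDist returns the largest cyclic per-coordinate distance between
// two vectors X and Y of the same length with entries in [0, r).
func TorusDist(X, Y []int, r int) int {
	max := 0
	for j := range X {
		d := ((X[j]-Y[j])%r + r) % r // (X_j − Y_j) mod r, made non-negative
		if r-d < d {                  // take the shorter way around the circle
			d = r - d
		}
		if d > max {
			max = d
		}
	}
	return max
}
```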

Our construction of the SLASH family F_{r,b,H_c} can be described as follows: each evaluation of f ∈ F_{r,b,H_c} on a set S ⊆ {0,1}* defines a random walk in ℤ_r^b, where each item x in S accounts for an independent random step based on the hash of x. More specifically:

f(S) := ⟨g(S_0), g(S_1), …, g(S_{b−1})⟩,

where S_y := {x ∈ S | h_s(x) mod b = y} and g(V) = Σ_{v∈V} (−1)^(h_s(v) ÷ b) mod r.

The distance between the hashes of any two sets is not affected by their shared items, while each non-shared item increases or decreases the distance by at most 1.
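The construction can be maintained incrementally as the history grows, which is what makes it stream-local. Below is a minimal Go sketch under the assumptions above (h_s(x) = SHA256(x ∥ s), one ±1 step per item, sign taken from the parity of h_s(x) ÷ b); the function and variable names are ours.

```go
package slash

import (
	"crypto/sha256"
	"encoding/binary"
)

// hs returns the first 64 bits of SHA256(x ∥ seed), enough to pick a coordinate and a sign.
func hs(x, seed []byte) uint64 {
	h := sha256.Sum256(append(append([]byte{}, x...), seed...))
	return binary.BigEndian.Uint64(h[:8])
}

// Step folds one history item into the running hash: move coordinate h_s(x) mod b
// by (−1)^(h_s(x) ÷ b), so f(S) can be updated as items stream in.
func Step(pos []int, x, seed []byte, b, r int) {
	v := hs(x, seed)
	coord := int(v % uint64(b))
	step := 1
	if (v/uint64(b))%2 == 1 {
		step = -1
	}
	pos[coord] = ((pos[coord]+step)%r + r) % r
}

// Slash recomputes f(S) from scratch; equivalent to applying Step to every item of S.
func Slash(S [][]byte, seed []byte, b, r int) []int {
	pos := make([]int, b)
	for _, x := range S {
		Step(pos, x, seed, b, r)
	}
	return pos
}
```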

Theorem 3 (Locality).

Let S, T ⊆ {0,1}*. Then, for any f ∈ F_{r,b,H_c}:

TorusDist_{r,b}(f(S), f(T)) ≤ |S ∪ T ∖ (S ∩ T)|.

Proof.

Each element x ∈ S is mapped to a single S_y, in which it either increases or decreases g(S_y) by 1. Suppose that S ∪ T ∖ (S ∩ T) = {x} (so either S = T ∪ {x} or T = S ∪ {x}). Let h_s(x) mod b = y; then f(S) is identical to f(T) in all positions 0, …, b−1 except for y, where f(S)[y] = f(T)[y] ± 1. So:

TorusDist_{r,b}(f(S), f(T)) = 1 = |S ∪ T ∖ (S ∩ T)|.

Now assume that TorusDist_{r,b}(f(U), f(V)) ≤ |U ∪ V ∖ (U ∩ V)| whenever |U ∪ V ∖ (U ∩ V)| = L, and let S ∪ T ∖ (S ∩ T) = {x_1, …, x_{L+1}}. Assume without loss of generality that x_{L+1} ∈ S, and let S′ = S ∖ {x_{L+1}}. Then, by the assumption, for any position m in 0, …, b−1:

min((f(S′)[m] − f(T)[m]) mod r, (f(T)[m] − f(S′)[m]) mod r) ≤ L.

Now let h_s(x_{L+1}) mod b = y; then the inequalities above are satisfied by f(S) and f(T) at any position 0, …, b−1 except possibly y, where:

min((f(S)[y] − f(T)[y]) mod r, (f(T)[y] − f(S)[y]) mod r) ≤ L + 1.

Therefore TorusDist_{r,b}(f(S), f(T)) ≤ L + 1 = |S ∪ T ∖ (S ∩ T)|.

Theorem 3 implies that 𝑆𝐿𝐴𝑆𝐻 satisfies Locality with λ=1.

We consider node ids to be binary strings of length c. During the execution, each node i maintains a view of the set of active node ids Π_i ⊆ {0,1}^c and a history of delivered random numbers R_i. In addition, a SLASH function f_j is maintained for each node j in Π_i. All nodes are initialized with the same functions f_j, as well as with the one-way functions h_j ∈ H_c, so that for each particular f_j the id of node j is used as the seed for computing h_j(x), in order to make the steps computed for different f_j independent.

To select witness sets, nodes are also initialized with a distance parameter d ∈ ℤ_+. To determine the set of witnesses W_i(id, seq), node i computes y_j = f_j(R_i) and then selects all nodes j for which y_j is at distance at most d from the origin:

W_i = { j ∈ Π_i | TorusDist_{r,b}(y_j, [0]) ≤ d }.   (1)

Keeping a SLASH instance for each node does not add significant storage overhead, since we expect its size to be smaller than c bits (see Section 5.1). Moreover, the computational cost of updating SLASH should also be significantly smaller than that of verifying digital signatures.
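Putting the pieces together, witness selection per Equation (1) reduces to a distance test against the origin for every tracked node. A sketch, reusing the TorusDist and Step helpers above; the map-based bookkeeping and names are our own packaging, and getPotWitnesses is the same test with the larger radius d1:

```go
package slash

// SelectWitnesses applies Equation (1): given the current SLASH positions
// y_j = f_j(R_i), node j is a witness when y_j lies within distance d of the origin.
func SelectWitnesses(positions map[string][]int, b, r, d int) map[string]bool {
	origin := make([]int, b) // the all-zero vector [0]
	W := make(map[string]bool)
	for id, y := range positions {
		if TorusDist(y, origin, r) <= d {
			W[id] = true
		}
	}
	return W
}
```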

4.2 Secret Sharing

Unpredictability is achieved with a secret-sharing protocol, in which a random number is secretly shared by the source at the start of every instance. In the reveal phase, the number is added to the local history and used to compute a random step of SLASH.

We capitalize on the stationary distribution of a random walk on the torus, which is uniform: as the number of steps grows, the distribution of the SLASH output converges to uniform. (To guarantee a uniform limit distribution, the diameter r needs to be odd [5]; we can trivially achieve this by skipping the last point, so that the diameter of each dimension becomes r−1.) The number of steps necessary to make the distribution of the random walk close to uniform is called the mixing time. For each new random number that is shared by a node, we delay its addition to the history by δ = mixTime/th_c steps, where th_c is the fraction of the throughput generated by correct sources. (One can adjust th_c based on the broadcast rate of the n−f nodes with the smallest rate. Alternatively, one can wait until the total number of additions to the history committed by the n−f nodes with the smallest rate has reached mixTime.) This guarantees that the adversary cannot use its current state to issue carefully chosen numbers and drive the outcome of SLASH close to any specific value.
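A possible way to implement this delay is to keep, per instance, the delivery count at which its reveal phase may start, as sketched below; the scheduler type and callback are our own names, not the paper’s pseudo-code.

```go
package slash

type pendingReveal struct {
	id  string
	seq int
	due int // delivered-message count at which the reveal phase may start
}

// RevealScheduler delays the addition of each shared number to the history:
// the reveal phase of an instance starts only δ deliveries after the instance itself.
type RevealScheduler struct {
	Delta     int // δ = mixTime / th_c
	delivered int
	queue     []pendingReveal
}

func (s *RevealScheduler) OnDeliver(id string, seq int, startReveal func(id string, seq int)) {
	s.delivered++
	s.queue = append(s.queue, pendingReveal{id, seq, s.delivered + s.Delta})
	rest := s.queue[:0]
	for _, p := range s.queue {
		if s.delivered >= p.due {
			startReveal(p.id, p.seq) // kick off the secret-sharing reveal phase (Appendix A)
		} else {
			rest = append(rest, p)
		}
	}
	s.queue = rest
}
```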

To further prevent the adversary from biasing the outcome of SLASH, we require each number to be generated by a verifiable source of randomness. This is achieved using the signature scheme and a hash function. Consider a random string s known to all nodes. To generate the random number x of instance (p_i, seq), p_i signs s ∥ (p_i, seq) and assigns to x the hash of the resulting signature. The signature for s ∥ (p_i, seq) is then used as a proof that x was correctly generated. The random string s used for (p_i, seq) is the random number x generated in the previous instance, and the original seed can be any agreed-upon number.
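The following Go sketch illustrates this chain of verifiable random numbers. The model assumes a unique-signature scheme; we use Ed25519 from the standard library only to keep the sketch self-contained (Ed25519 by itself does not guarantee signature uniqueness against a dishonest signer, so a scheme such as BLS would be preferable in practice). All names and the encoding of (p_i, seq) are ours.

```go
package randomness

import (
	"bytes"
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// nextRandom computes x = H(Sign_sk(prev ∥ (id, seq))); the signature doubles as the proof.
func nextRandom(sk ed25519.PrivateKey, prev []byte, id string, seq int) (x, proof []byte) {
	msg := append(append([]byte{}, prev...), []byte(fmt.Sprintf("(%s,%d)", id, seq))...)
	proof = ed25519.Sign(sk, msg) // revealed later as evidence of correct generation
	h := sha256.Sum256(proof)
	return h[:], proof
}

// verifyRandom checks that x was derived from a valid signature on prev ∥ (id, seq).
func verifyRandom(pk ed25519.PublicKey, prev, x, proof []byte, id string, seq int) bool {
	msg := append(append([]byte{}, prev...), []byte(fmt.Sprintf("(%s,%d)", id, seq))...)
	h := sha256.Sum256(proof)
	return ed25519.Verify(pk, msg, proof) && bytes.Equal(h[:], x)
}
```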

We use Shamir’s secret sharing scheme [40]; the protocol’s integration with WBB is described in Appendix A.

4.3 Witness Oracle – Correctness

The detailed analysis for the oracle correctness can be found in Appendix B. To verify the conditions under which Witness Inclusion and Unpredictability are satisfied, we assume that:

  • The interval between the first time a correct process recovers a secret x, and the last time a correct process does so is upper bounded by γ;

  • correct processes reveal numbers (adding them to their local history) at the same rate λ;

  • at any correct process and within any δ interval of time, there is a lower bound thc on the fraction of numbers revealed that come from correct processes.

Let d_1 and d_2 be the selection radii for V_i and W_i respectively (Equation 1), i.e., a node j is selected as a witness if y_j is at distance at most d_2 from the origin in ℤ_r^b.

Theorem 4 (Witness Inclusion).

As long as d_1 − d_2 ≥ 2λγ, for any pair of correct nodes p_i and p_j, and any instance (id, seq): W_i(id, seq) ⊆ V_j(id, seq).

In Section 6 and Appendix D, we show how to avoid relying on the above condition by introducing a fallback protocol that ensures progress for correct processes even in the absence of network synchrony. Now let w be the predetermined average witness set size.

Theorem 5 (Unpredictability).

Let S_i be the history of process p_i at a particular moment in the execution. For any process p_j:

Pr(p_j ∈ W^(l+δ)(·,·) | |S_i| = l) ≤ 1/w + e^(−π²·δ·th_c/(2br²))

According to our model, it takes Δ instances for a node corruption to take effect. In our approach, we set Δ = 2δ, since the same witnesses used in WBB are involved in the reveal phase δ instances later. The parameters δ, r and b are chosen based on the expected speed of adversarial corruption and the probability that the adversary can accurately predict the processes selected as witnesses. As shown in Section 5.1, this probability is negligible given a realistic adversarial corruption speed.

5 Security and Performance

The witness set selections depend on the positions of the random walks in ℤ_r^b, and it may happen that not enough good nodes, or too many bad nodes, land in the selection zone. Thus the average number of instances until some property of WBB is violated depends on the properties of the random walks. Next, we determine the expected failure time of the complete protocol by calculating the time it takes for the random walks to reach a position where a “bad” witness set is chosen. Note that, in our approach, the probability of failure of a particular instance depends on previous instances, as the random walks depend on their previous state.

5.1 Expected Time of Failure

We say that the long-lived protocol fails when, in any WBB instance, a correct process selects a witness set containing at least k corrupted parties (consistency failure) or fewer than k correct ones (liveness failure). The adversary may attempt to corrupt processes dynamically to cause protocol failure as quickly as possible. However, as shown in Theorem 5, the probability of accurately predicting which processes to corrupt decreases with δ·th_c. We choose δ (the mixing time in Figure 2) to ensure that the total variation distance between the SLASH outcome and the uniform distribution is bounded by ε < 2^−20, giving the adversary a negligible advantage in predicting witnesses.

Given that the SLASH probability distribution becomes close to uniform between the time a value is shared and the moment it is added to the history, we consider every step to be independent of the SLASH states. In addition, since hash functions are modeled as random oracles, the distribution of the values issued by malicious nodes becomes uniform for a large number of values. We therefore assume that every value added to a local history constitutes a random step in ℤ_r^b.

Random walks on ℤ_r^b may, by chance, result in the selection of a “bad” witness set. We analyze this scenario below. Since the adversary cannot predict witness sets with significant advantage, we assume that all adversarial nodes in the execution are corrupted from the start, thereby maximizing the probability of selecting a sufficiently corrupted witness set.

We call gathering time the average number of steps until at least k nodes out of f are selected as witnesses, assuming that the initial distribution of each node is uniform and independent. We give a detailed discussion of the gathering-time calculations in Appendix C. Intuitively, we relate the problem to the well-known hitting time [18] of Markov chains, in which one calculates the first time the chain reaches a particular set of states. Let X_l be the number of corrupted nodes in the witness selection area after l steps; we consider a Markov chain {X_l | l ∈ ℕ} whose possible states are 0, …, f. The gathering time is then the average number of steps l such that X_l ≥ k. We compare the numbers obtained from our analysis with simulations of f random walks in ℤ_r^b in Appendix C (we choose f = 0.25n, based on the total number of nodes n considered in each run); the results show that our approach gives conservative gathering times (in general, our estimates are at least 100 times smaller than the simulated values).
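The simulations can be reproduced with a simple Monte-Carlo experiment: f walkers start uniformly at random in ℤ_r^b, every delivered random number moves each of them by one independent ±1 step on a uniformly chosen coordinate, and we count the steps until at least k of them lie in the selection area. The sketch below follows this model; the parameters and names are our own, and it is not the authors’ simulation code [2, 3].

```go
package sim

import "math/rand"

// inSelectionArea reports whether a torus position is within distance d of the origin.
func inSelectionArea(pos []int, r, d int) bool {
	for _, c := range pos {
		cyc := c
		if r-c < cyc {
			cyc = r - c
		}
		if cyc > d {
			return false
		}
	}
	return true
}

// GatheringTime simulates f independent walkers on Z_r^b and returns the number of
// steps until at least k of them are inside the witness selection area.
func GatheringTime(f, b, r, d, k int, rng *rand.Rand) int {
	walkers := make([][]int, f)
	for i := range walkers {
		walkers[i] = make([]int, b)
		for j := range walkers[i] {
			walkers[i][j] = rng.Intn(r) // uniform, independent initial positions
		}
	}
	for steps := 0; ; steps++ {
		inside := 0
		for _, w := range walkers {
			if inSelectionArea(w, r, d) {
				inside++
			}
		}
		if inside >= k {
			return steps
		}
		for _, w := range walkers { // one new history item: every walk takes one ±1 step
			j := rng.Intn(b)
			step := 1
			if rng.Intn(2) == 0 {
				step = -1
			}
			w[j] = ((w[j]+step)%r + r) % r
		}
	}
}
```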

The problem is the same when it comes to calculating the time until the witness set is not live. The possible states are 0, …, n−f, and we calculate the average number of steps l such that X_l < k.

The choice of parameters for the witness oracle depends on the characteristics of the application. Selecting a bigger diameter r increases the average time to failure, but also increases the mixing time, and thus the system can tolerate adversaries that take longer to corrupt nodes. We consider the example of a system with 1024 nodes and give parameters so that the mixing time is in a practical range. (The mixing time is 10^7.56 steps for b = 4 and r = 2^10. If we consider a throughput of 1000 broadcast messages per second, the system takes around 10 hours to mix.) Figure 2 shows the expected time of failure and the mixing time for b = 4, r = 2^10 and k = 0.45w, where w is the expected witness set size.


Figure 2: Average number of instances until gathering, according to the expected witness set size. The parameters used are: n = 1024, b = 4, r = 2^10, k = 0.45w. Each curve corresponds to a distinct t, the fraction of malicious nodes.

To contextualize Figure 2, we compare our protocol’s expected time of failure with two well-known probabilistic protocols: the gossip based reliable broadcast by Guerraoui et al. [22] and the Algorand protocol [20].

Guerraoui et al. [22] propose a one-shot probabilistic reliable broadcast that can be used in Algorithm 2 to implement L-PRB. In their construction, each instance has an independent probability of failure determined by the sample size, with a communication complexity of O(n · sample size), similar to WBB’s O(n·v). For a system with 1024 nodes and t = 0.15, they require a sample size of approximately 250 nodes to achieve an expected failure time of 10^12 broadcast instances. In contrast, our protocol achieves the same expected time of failure with just 100 witnesses.

Algorand [20] employs a randomly selected committee to solve Byzantine Agreement, where the committee size influences the failure probability. In a system with t = 0.2, Algorand requires a committee size of 2000 nodes to achieve an average failure time of 5·10^9, whereas our approach needs fewer than 130 witnesses.

Instead of relying on a single, evolving witness set to validate all messages, multiple independent histories can be maintained, each assigned based on the issuer’s ID and sequence number. This creates parallel dynamic witness sets, where each set validates an equal portion of the workload. The hitting time for multiple random walks decreases linearly with the number of walks [1], as does the gathering time. However, since each history receives a fraction of the random numbers proportional to the number of witness sets, the total gathering time remains roughly the same. On the downside, the overall number of delivered messages required to ensure unpredictability increases proportionally with the number of witness sets.

5.2 Scalability – A Comparative Analysis

We use simulations to make a comparative analysis between Bracha’s reliable broadcast, the probabilistic reliable broadcast from [22] (hereafter called scalable broadcast) and our witness-based reliable broadcast. We implemented all three protocols in Golang; the protocol and simulation code is available at [2, 3]. For the simulation software, we used Mininet [30] to run all processes on a single machine.

We used a Linode dedicated CPU virtual machine [31] with 64 cores, 512GB of RAM, running Linux 5.4.0-148-generic Kernel with Mininet version 2.3.0.dev6 and Open vSwitch [38] version 2.13.8.

We chose witness set and sample sizes so that both WBB and scalable broadcast have a longevity of 10^6 instances when the fraction of faulty nodes is 0.1. For the witness set sizes, we use |W| = 2·log(n) and |V| = 3·log(n); the parameters selected for the SLASH construction are the same as in Figure 2, and 8 parallel witness sets are used. For the sample size we use 5·log(n). The small sizes allow us to better compare the performance of both protocols with Bracha’s broadcast for a small number of nodes.

Since in a simulated environment we can change network parameters to modify the system’s performance, we analyze normalized values instead of absolute ones, using Bracha’s protocol as the baseline. (Simulating hundreds of nodes on a single server presents significant performance challenges. In throughput tests involving a large number of nodes, we allocate reduced resources, such as limited bandwidth and CPU time, to each node.) Processes are evenly distributed in a tree topology, with 16 leaves at the base, each group connected by a single parent node. The bandwidth is limited to 20 Mbps per link.

In the first simulation, we observe each protocol’s performance under a high volume of transmitted messages and how it relates to the number of processes. Throughout the experiment, each process initiates the broadcast of a new message once the previous one has been delivered. We then measure the achievable throughput (number of delivered messages per second) and the average latency for different system sizes, as shown in Figures 3(a) and 3(b).

(a) Throughput according to the number of nodes.
(b) Latency according to the number of nodes.
Figure 3: Normalized throughput and latency according to the number of processes.

The superior asymptotic complexity of the probabilistic protocols is illustrated in Figures 3(a) and 3(b), where both WBB and scalable broadcast show improved performance relative to Bracha’s broadcast as the number of processes increases. Additionally, WBB scales consistently better than scalable broadcast, as it requires fewer processes for message validation overall.

6 Timeout and Recovery

Based on the assumptions outlined in Section 4.3, the level of synchrony may contribute to discrepancies in local histories among correct processes. A process with a significantly divergent history from the rest of the system may attempt to validate messages using a non-responsive witness set.

In this section, we present an extension of our protocol that includes a liveness fallback: if it takes too long for a process to validate a message, a timeout mechanism and a recovery protocol are triggered to ensure progress.

First, to account for time, we modify the WBB initialization block: after sampling the witnesses, we call the method setTimeout() (Algorithm 3). This method guarantees that a timeout is triggered after a predetermined amount of time has passed without a successful validation. Once a process receives a protocol message containing a new broadcast message m, it initializes a new instance of message validation and relays m to the corresponding witnesses in a NOTIFY message.

Algorithm 3 WBB – Initialization Update.
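A minimal sketch of the timeout mechanism this update relies on (not the paper’s Algorithm 3): a timer armed at initialization fires a timeout event for the instance unless validation completes first. The channel names and packaging are our own.

```go
package wbb

import "time"

type timeoutEvent struct {
	id  string
	seq int
}

// setTimeout arms a timer for instance (id, seq): if the instance does not signal
// validation on the `validated` channel within d, a timeout event is emitted and
// the recovery protocol (Appendix D) takes over.
func setTimeout(id string, seq int, d time.Duration, validated <-chan struct{}, timeouts chan<- timeoutEvent) {
	go func() {
		select {
		case <-validated: // validated in time: no recovery needed
		case <-time.After(d): // too slow: hand the instance over to recovery
			timeouts <- timeoutEvent{id, seq}
		}
	}()
}
```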

When a timeout event is triggered, a process resorts to the recovery protocol outlined in Algorithm 7 (Appendix D). The actions of the recovery protocol should be executed by every process, and are performed only once a process triggers timeout or delivers a message in that instance. (Messages of the recovery protocol received by a process before satisfying either of these conditions are stored and treated later.)

The idea behind the algorithm is to “recover” any value that could possibly have been delivered in the witness validation procedure, and then execute Bracha’s Byzantine reliable broadcast [6]. We achieve this by making processes echo the last WBB message they acknowledged, so that if a message is delivered by any process using WBB, no distinct message can be delivered using the recovery protocol. A detailed description of the protocol, as well as correctness proofs, can be found in Appendix D.

The inclusion of the recovery protocol ensures progress when WBB is not live for a correct process, albeit with a higher communication cost, as we use Bracha’s Byzantine reliable broadcast, which has communication complexity O(n²). The reduced communication of WBB is thus achieved in optimistic runs, when network asynchrony is not too severe.

7 Related Work

Solutions designed for partially synchronous models [17] are inherently optimistic: safety properties of consensus are always preserved, but liveness is only guaranteed in sufficiently long periods of synchrony, when message delays do not exceed a pre-defined bound. Such algorithms, also known as indulgent [21], were originally intended to solve the fundamental consensus problem, which gave rise to prominent partially synchronous state-machine replication protocols [29, 10] and, more recently, partially synchronous blockchains.

Reliable broadcast protocols [6] provide a weaker form of synchronization than consensus: instead of reaching agreement on the total order of events, reliable broadcast only establishes a common order on the messages issued by any given source. This partial order turns out to be handy in implementing “consensus-free” asset transfer [23, 24], by far the most popular blockchain application, and the resulting implementations are simpler, more efficient and more robust than consensus-based solutions [4, 14, 27]. (These implementations, however, assume that no account can be concurrently debited, i.e., no conflicting transactions are ever issued by honest account owners [28].) To alleviate the O(n²) communication complexity of classical quorum-based reliable broadcast algorithms [7, 32, 34, 41], one can resort to probabilistic relaxations of their properties [35, 22]. The probabilistic broadcast protocol in [22] achieves, with a very high level of security, O(n log n) expected complexity and O(log n / log log n) latency by replacing quorums with randomly selected samples of size O(log n). In this paper, we propose an optimistic probabilistic reliable broadcast algorithm that, in a good run, exhibits even better security (due to the “quasi-deterministic” though unpredictable choice of witnesses) and, while achieving the same communication complexity, constant expected latency. For simplicity, as a fallback solution in a bad run, we propose the classical O(n²) broadcast protocol [6]. However, one can also use the protocol of [22] here, which would give lower costs with gracefully degraded security.

In the Algorand blockchain [20, 12], scalable performance is achieved by electing a small committee for each new block. To protect the committee members from a computationally bounded adversary, one can use verifiable random functions (VRF) [36]. The participants use a hash of the last block and their private keys as inputs to the VRF, which returns a proof of selection. As the proof is revealed only when a committee member proposes the next block, the protocol is protected against an adaptive adversary. The approach was later applied to asynchronous (randomized) Byzantine consensus [13], assuming trusted setup and a public-key infrastructure (PKI). Similarly to [20, 13], we use local knowledge to generate unpredictable process subsets of a fixed expected size. In contrast, our protocol does not assume trusted setup. We generate pseudo-randomness based on the local states, without relying on external sources (e.g., the blockchain state). However, we only tolerate a slow adversary (the time to corrupt a node considerably exceeds the communication delay).

References

  • [1] Noga Alon, Chen Avin, Michal Koucky, Gady Kozma, Zvi Lotker, and Mark R Tuttle. Many random walks are faster than one. In Proceedings of the twentieth annual symposium on parallelism in algorithms and architectures, pages 119–128, 2008. doi:10.1145/1378533.1378557.
  • [2] Veronika Anikina and João Paulo Bezerra. Mininet distributed system simulation, 2024. URL: https://github.com/testJota/mininet-distributed-system-simulation.
  • [3] Veronika Anikina and João Paulo Bezerra. Stochastic checking simulation, 2024. URL: https://github.com/interestIngc/stochastic-checking-simulation.
  • [4] Mathieu Baudet, George Danezis, and Alberto Sonnino. Fastpay: High-performance byzantine fault tolerant settlement. In Proceedings of the 2nd ACM Conference on Advances in Financial Technologies, pages 163–177, 2020. doi:10.1145/3419614.3423249.
  • [5] Nathanaël Berestycki. Mixing times of markov chains: Techniques and examples. Alea-Latin American Journal of Probability and Mathematical Statistics, 2016.
  • [6] Gabriel Bracha. Asynchronous Byzantine agreement protocols. Information and Computation, 75(2):130–143, 1987. doi:10.1016/0890-5401(87)90054-X.
  • [7] Gabriel Bracha and Sam Toueg. Asynchronous Consensus and Broadcast Protocols. JACM, 32(4), 1985. doi:10.1145/4221.214134.
  • [8] Andrei Z Broder. On the resemblance and containment of documents. In Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No. 97TB100171), pages 21–29. IEEE, 1997. doi:10.1109/SEQUEN.1997.666900.
  • [9] Christian Cachin, Rachid Guerraoui, and Luís Rodrigues. Introduction to reliable and secure distributed programming. Springer Science & Business Media, 2011.
  • [10] Miguel Castro and Barbara Liskov. Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems (TOCS), 20(4):398–461, 2002. doi:10.1145/571637.571640.
  • [11] Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus. J. ACM, 43(4):685–722, July 1996. doi:10.1145/234533.234549.
  • [12] Jing Chen and Silvio Micali. Algorand. arXiv preprint, 2016. arXiv:1607.01341.
  • [13] Shir Cohen, Idit Keidar, and Alexander Spiegelman. Not a coincidence: Sub-quadratic asynchronous byzantine agreement whp. arXiv preprint, 2020. arXiv:2002.06545.
  • [14] Daniel Collins, Rachid Guerraoui, Jovan Komatovic, Petr Kuznetsov, Matteo Monti, Matej Pavlovic, Yvonne Anne Pignolet, Dragos-Adrian Seredinschi, Andrei Tonkikh, and Athanasios Xygkis. Online payments by merely broadcasting messages. In 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2020, Valencia, Spain, June 29 - July 2, 2020, pages 26–38. IEEE, 2020. doi:10.1109/DSN48063.2020.00023.
  • [15] Sourav Das, Zhuolun Xiang, and Ling Ren. Asynchronous data dissemination and its applications. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 2705–2721, 2021. doi:10.1145/3460120.3484808.
  • [16] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. J. ACM, 35(2):288–323, April 1988. doi:10.1145/42282.42283.
  • [17] Cynthia Dwork, Nancy A. Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288–323, April 1988. doi:10.1145/42282.42283.
  • [18] Robert Elsässer and Thomas Sauerwald. Tight bounds for the cover time of multiple random walks. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I, ICALP ’09, pages 415–426, Berlin, Heidelberg, 2009. Springer-Verlag. doi:10.1007/978-3-642-02927-1_35.
  • [19] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. JACM, 32(2):374–382, April 1985. doi:10.1145/3149.214121.
  • [20] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th symposium on operating systems principles, pages 51–68, 2017. doi:10.1145/3132747.3132757.
  • [21] Rachid Guerraoui. Indulgent algorithms (preliminary version). In PODC, pages 289–297. ACM, 2000. doi:10.1145/343477.343630.
  • [22] Rachid Guerraoui, Petr Kuznetsov, Matteo Monti, Matej Pavlovic, and Dragos-Adrian Seredinschi. Scalable byzantine reliable broadcast. In Jukka Suomela, editor, 33rd International Symposium on Distributed Computing, DISC 2019, October 14-18, 2019, Budapest, Hungary, volume 146 of LIPIcs, pages 22:1–22:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.DISC.2019.22.
  • [23] Rachid Guerraoui, Petr Kuznetsov, Matteo Monti, Matej Pavlovic, and Dragos-Adrian Seredinschi. The consensus number of a cryptocurrency. In PODC, 2019. arXiv:1906.05574.
  • [24] Saurabh Gupta. A Non-Consensus Based Decentralized Financial Transaction Processing Model with Support for Efficient Auditing. Master’s thesis, Arizona State University, USA, 2016.
  • [25] Andreas Haeberlen and Petr Kuznetsov. The fault detection problem. In OPODIS, pages 99–114, 2009. doi:10.1007/978-3-642-10877-8_10.
  • [26] Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel. PeerReview: Practical accountability for distributed systems. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP’07), October 2007.
  • [27] Petr Kuznetsov, Yvonne-Anne Pignolet, Pavel Ponomarev, and Andrei Tonkikh. Permissionless and asynchronous asset transfer. In DISC 2021, volume 209 of LIPIcs, pages 28:1–28:19, 2021. doi:10.4230/LIPICS.DISC.2021.28.
  • [28] Petr Kuznetsov, Yvonne-Anne Pignolet, Pavel Ponomarev, and Andrei Tonkikh. Cryptoconcurrency: (almost) consensusless asset transfer with shared accounts. CoRR, abs/2212.04895, 2022. doi:10.48550/arXiv.2212.04895.
  • [29] Leslie Lamport. The Part-Time parliament. ACM Transactions on Computer Systems, 16(2):133–169, May 1998. doi:10.1145/279227.279229.
  • [30] Bob Lantz, Brandon Heller, and Nick McKeown. A network in a laptop: rapid prototyping for software-defined networks. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, pages 1–6, 2010.
  • [31] Linode (Akamai). Plan Types - Dedicated CPU Compute Instances. https://www.linode.com/docs/products/compute/compute-instances/plans/dedicated-cpu/, 2023. [Online; accessed 10-May-2023].
  • [32] Dahlia Malkhi, Michael Merritt, and Ohad Rodeh. Secure reliable multicast protocols in a WAN. In ICDCS, pages 87–94. IEEE Computer Society, 1997. doi:10.1109/ICDCS.1997.597857.
  • [33] Dahlia Malkhi and Michael Reiter. Byzantine quorum systems. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 569–578. ACM, 1997. doi:10.1145/258533.258650.
  • [34] Dahlia Malkhi and Michael K. Reiter. A high-throughput secure reliable multicast protocol. Journal of Computer Security, 5(2):113–128, 1997. doi:10.3233/JCS-1997-5203.
  • [35] Dahlia Malkhi, Michael K Reiter, Avishai Wool, and Rebecca N Wright. Probabilistic quorum systems. Inf. Comput., 170(2):184–206, November 2001. doi:10.1006/INCO.2001.3054.
  • [36] Silvio Micali, Michael Rabin, and Salil Vadhan. Verifiable random functions. In 40th annual symposium on foundations of computer science (cat. No. 99CB37039), pages 120–130. IEEE, 1999. doi:10.1109/SFFCS.1999.814584.
  • [37] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.
  • [38] Ben Pfaff, Justin Pettit, Teemu Koponen, Ethan J. Jackson, Andy Zhou, Jarno Rajahalme, Jesse Gross, Alex Wang, Jonathan Stringer, Pravin Shelar, Keith Amidon, and Martín Casado. The design and implementation of open vswitch. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15, pages 117–130, USA, 2015. USENIX Association. URL: https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/pfaff.
  • [39] Irving S Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the society for industrial and applied mathematics, 8(2):300–304, 1960.
  • [40] Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979. doi:10.1145/359168.359176.
  • [41] Sam Toueg. Randomized byzantine agreements. In Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, PODC ’84, pages 163–178, New York, NY, USA, 1984. ACM. doi:10.1145/800222.806744.
  • [42] Marko Vukolic. The origin of quorum systems. Bulletin of the EATCS, 101:125–147, 2010. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/183.
  • [43] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper, 151(2014):1–32, 2014.

Appendix A Secret Sharing Scheme

In order to prevent the adversary from biasing the outcome of SLASH, we require each number to be generated by a verifiable source of randomness. This is achieved using the signature scheme and a hash function. Consider a random string s known to all nodes. To generate the random number x of instance (p_i, seq), p_i signs s ∥ (p_i, seq) and assigns to x the hash of the resulting signature. The signature for s ∥ (p_i, seq) is then used as a proof that x was correctly generated. The random string s used for (p_i, seq) is the random number x generated in the previous instance, and the original seed can be any agreed-upon number.

On top of that, we capitalize on the stationary distribution of a random walk on the torus, which is uniform: as the number of steps grows, the distribution of the SLASH output converges to uniform. (To guarantee a uniform limit distribution, the diameter r needs to be odd [5]; we can trivially achieve this by skipping the last point, so the diameter of each dimension becomes r−1.) The number of steps necessary to make the distribution of the random walk close to uniform is called the mixing time. For each new random number that is broadcast by a node, we delay its addition to the history by δ = mixTime/th_c steps, where th_c is the fraction of the throughput generated by correct sources. (One can adjust th_c based on the broadcast rate of the n−f nodes with the smallest rate. Alternatively, one can wait until the total number of additions to the history committed by the n−f nodes with the smallest rate has reached mixTime.) This guarantees that the adversary cannot use its current state to issue carefully chosen numbers and make the outcome of SLASH close to any specific value; in other words, the adversary does not know whether the delayed step will bring the value of the hash closer to or farther from a desired value.

The unpredictability is achieved with a secret sharing protocol, in which the source’s signature for (𝑖𝑑,𝑠𝑒𝑞) is shared and only revealed after δ messages are delivered. The goal of the protocol is to guarantee that:

  • Correct processes agree on the revealed number.

  • No information about a number issued by a correct process is revealed until at least one correct process starts the reveal phase.

We use Shamir’s secret sharing scheme [40], abstracted as follows: the split method SH.Split(x, n, f+1) takes a string x and generates n shares such that any f+1 shares are sufficient to recover x. The recover method SH.Recover(X) takes a vector of f+1 shares as input and outputs a string x′ such that, if X was generated using SH.Split(x, n, f+1), then x′ = x. In addition, no information about x is revealed by f or fewer shares.
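For illustration, a compact sketch of such a threshold-out-of-n Shamir scheme over a prime field is given below; the choice of field (the secp256k1 prime), the share indices and the big-integer encoding are ours, and any standard instantiation of SH.Split/SH.Recover would do.

```go
package shamir

import (
	"crypto/rand"
	"math/big"
)

// The secp256k1 field prime; any sufficiently large prime works. The secret must be < p.
var p, _ = new(big.Int).SetString("115792089237316195423570985008687907853269984665640564039457584007908834671663", 10)

type Share struct{ X, Y *big.Int }

// Split produces n shares of the secret such that any `threshold` of them recover it.
func Split(secret *big.Int, n, threshold int) ([]Share, error) {
	coeffs := []*big.Int{new(big.Int).Mod(secret, p)} // constant term is the secret
	for i := 1; i < threshold; i++ {                  // threshold-1 random coefficients
		c, err := rand.Int(rand.Reader, p)
		if err != nil {
			return nil, err
		}
		coeffs = append(coeffs, c)
	}
	shares := make([]Share, n)
	for i := 1; i <= n; i++ {
		x := big.NewInt(int64(i))
		y := big.NewInt(0)
		for j := len(coeffs) - 1; j >= 0; j-- { // Horner evaluation of the polynomial mod p
			y.Mul(y, x).Add(y, coeffs[j]).Mod(y, p)
		}
		shares[i-1] = Share{X: x, Y: y}
	}
	return shares, nil
}

// Recover interpolates the polynomial at 0 from `threshold` distinct shares (Lagrange).
func Recover(shares []Share) *big.Int {
	secret := big.NewInt(0)
	for i, si := range shares {
		num, den := big.NewInt(1), big.NewInt(1)
		for j, sj := range shares {
			if i == j {
				continue
			}
			num.Mul(num, new(big.Int).Neg(sj.X)).Mod(num, p)
			den.Mul(den, new(big.Int).Sub(si.X, sj.X)).Mod(den, p)
		}
		term := new(big.Int).Mul(si.Y, num)
		term.Mul(term, new(big.Int).ModInverse(den, p)).Mod(term, p)
		secret.Add(secret, term).Mod(secret, p)
	}
	return secret
}
```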

The pseudo-code in Algorithms 4 and 5 describes the complete protocol. In short, the source p first generates the signature π for s ∥ (id, seq) and splits π into n shares. Next, p includes the corresponding share in each NOTIFY message. We assume that the source’s signature on the message carrying the share is also sent alongside it.

Algorithm 4 Secret Sharing Protocol - Sharing Phase.
Algorithm 5 Secret Sharing Protocol - Reveal Phase.

When delivering the message associated with (id, seq), nodes store the current number of delivered messages in the history. After delivering δ new messages, each process starts executing the reveal phase, using the same witnesses, to recover the secret. Each process sends its share to the witnesses which, after gathering enough shares, either reveal the secret or build a proof that the shares were not correctly distributed by the source. In the former case, each process verifies the revealed signature π and adds the hash of π to secret, defining a step of the SLASH random walk. In the latter case, upon receiving such a proof, a process marks the faulty source as “convicted”.

The following results assume correctness of the underlying WBB protocol.

Proposition 6.

If correct processes pi and pj add x and x′ to secret[𝑖𝑑,𝑠𝑒𝑞] respectively, then x=x′.

Proof.

Before adding x or x′ to secret, pi and pj check whether the revealed signature π is valid for the string s(𝑖𝑑,𝑠𝑒𝑞) (line 11). From the uniqueness property of the signature scheme, it follows that x=x′.

Proposition 7.

If a correct process pi adds x to secret[𝑖𝑑,𝑠𝑒𝑞], then eventually every correct process adds x to secret[𝑖𝑑,𝑠𝑒𝑞].

Proof.

After receiving ⟨𝐷𝑂𝑁𝐸,𝑖𝑑,𝑠𝑒𝑞,π⟩ from a witness, pi relays the message to every witness in Vi (line 14). Because the underlying WBB is correct, Wj ⊆ Vi for any correct pj. Thus, there is a correct witness that receives π from pi and relays it to pj, which then adds h(π) to secret (note that, since pi added x=h(π) to secret, π is a valid signature).

Proposition 8.

Let π be the signature a correct process shares at instance (𝑖𝑑,𝑠𝑒𝑞). Then any other process can only reveal π if at least one correct process delivers δ new messages after delivering ⟨·,𝑖𝑑,𝑠𝑒𝑞⟩.

Proof.

From Shamir’s construction, no information about π is revealed unless a process gathers f+1 shares [40]. A correct process does not reveal its share unless it has delivered δ new messages (line 1), and a correct source does not reveal the secret π. It follows that π can be revealed only when at least one correct process reveals its share.

Detecting when a source incorrectly distributes shares is important to prevent the adversary from controlling when the secret is revealed. A malicious process can send malformed shares that cannot be recovered by correct processes and later (at an arbitrary time) send 𝐷𝑂𝑁𝐸 with the correct secret through a corrupted witness. We employ the detection mechanism to discourage malicious sources from distributing bad shares: processes relay the proof as soon as an incorrect share is detected, leaving little time for malicious nodes to take advantage of the incorrect distribution, and correct processes can subsequently exclude the misbehaving party from the system.

One can also guarantee the recovery of a secret (independently of adversarial action) with a stronger primitive called Verifiable Secret Sharing (VSS [15]). The asynchronous VSS protocol presented in [15] allows nodes to check their shares against a commitment that must be reliably broadcast, thus preventing a malicious source from distributing bad ones. One can replace the reliable broadcast with WBB, attaching the commitment to the broadcast message to reduce communication complexity. Because the commitment can be large and the scheme computationally intensive, we use Algorithms 4 and 5 as a more efficient approach. For the remainder of the paper, we consider optimistic runs of the algorithms in which the detection mechanism successfully deters malicious participants from distributing incorrect shares.

Appendix B Witness Oracle - Correctness

The pseudo-code for the witness oracle implementation is described in Algorithm 6. Parameters d1 and d2 are the selection radii for Vi and Wi respectively, while Si is node i’s current local history of random numbers.

Algorithm 6 Local Witness Oracle - code for process i.

We now analyse the properties of the resulting protocol composed of Algorithms 2, 4, 5 and 6. To combine Algorithm 6 with Algorithm 5, we let Si be the collection of revealed values secret[𝑖𝑑][𝑠𝑒𝑞]. First, to verify the conditions under which Witness Inclusion and Unpredictability are satisfied, we assume that:

  • The interval between the first time a correct process adds x to secret and the last time a correct process does so is upper bounded by γ;

  • Correct processes add new numbers to secret at the same rate λ;

  • At any correct process and within any δ-interval of time, there is a lower bound thc on the fraction of numbers added to secret that come from correct processes.

Recall that d1 and d2 in Algorithm 6 are the selection radii for Vi and Wi respectively, i.e., a node j is selected as a witness if yj is at distance at most d2 from the origin in $\mathbb{Z}_r^b$.

Theorem 4 (Witness Inclusion).

As long as $d_1 - d_2 \ge 2\lambda\gamma$, for any pair of correct nodes pi and pj and any instance (𝑖𝑑,𝑠𝑒𝑞): $W_i(\mathit{id},\mathit{seq}) \subseteq V_j(\mathit{id},\mathit{seq})$.

Proof.

Let Ri(t) and Rj(t) be pi’s and pj’s histories at a particular time t, respectively. The condition $W_i(\mathit{id},\mathit{seq}) \subseteq V_j(\mathit{id},\mathit{seq})$ is guaranteed if $\mathrm{TorusDist}_{r,b}(f_i(R_i), f_i(R_j)) \le d_1 - d_2$.

By assumption, all numbers in $R_i(t-\gamma)$ are already in $R_j(t)$. The only numbers that might be in the symmetric difference $(R_i(t)\cup R_j(t))\setminus(R_i(t)\cap R_j(t))$ are those pi and pj have added to their histories in $(t-\gamma, t]$; therefore, because they add numbers at a maximum rate of λ:

$$|\,(R_i(t)\cup R_j(t))\setminus(R_i(t)\cap R_j(t))\,| \;\le\; 2\lambda\gamma$$

Moreover, $\mathrm{TorusDist}_{r,b}(f(S),f(T)) \le |(S\cup T)\setminus(S\cap T)|$ by the Locality property of 𝑆𝐿𝐴𝑆𝐻. Thus $W_i(\mathit{id},\mathit{seq}) \subseteq V_j(\mathit{id},\mathit{seq})$ is satisfied whenever $d_1 - d_2 \ge 2\lambda\gamma$.

We generalize an upper bound on the mixing time of a random walk on the cycle $\mathbb{Z}_n$ [5] by including a multiplicative factor of b, obtaining the mixing time of a random walk on $\mathbb{Z}_r^b$:

$$\mathit{MixingTime}(\mathbb{Z}_r^b,\epsilon) \;\le\; \frac{2\,b\,r^2\,\ln(1/\epsilon)}{\pi^2}$$

where ϵ is the upper bound on the distance between the random-walk distribution μ and the uniform distribution υ, measured in total variation distance:

$$\mathit{TotalVariationDistance}(\mu,\upsilon) \;=\; \frac{1}{2}\sum_{x\in\mathbb{Z}_r^b}|\mu(x)-\upsilon(x)| \;<\; \epsilon$$

Let w be the predetermined average witness set size, and d the corresponding selection distance chosen to achieve w.

Theorem 5 (Unpredictability).

Let Si be the history of process pi at a particular moment in the execution. For any process pj:

$$\Pr\left(p_j \in W^{l+\delta}(\cdot,\cdot) \;\middle|\; |S_i| = l\right) \;\le\; \frac{1}{w} + e^{-\frac{\pi^2\,\delta\,th_c}{2\,b\,r^2}}$$

Proof.

Let $y_j = f_j(S_i)$, let $S_i^{\delta}$ be pi’s history after adding δ new numbers to Si, and let $y_j' = f_j(S_i^{\delta})$. Calculating the probability that $p_j \in W^{l+\delta}(\cdot,\cdot)$ is equivalent to calculating the probability that $y_j'$ is within distance d from the origin.

Any of the next δ numbers that pi will reveal must already have been shared by its source, with the corresponding instance delivered by pi, so that pi can start counting the number of instances that pass before revealing the number. Any number originating from a correct process is shared regardless of the current state and thus contributes a random step in $\mathbb{Z}_r^b$ when applying $f_j$; note that there are at least $\delta\,th_c$ such numbers. On the other hand, numbers originating from malicious nodes were already shared based on information available up to Si. Thus, we can apply these numbers in any order starting from Si, since they do not depend on later states of pi’s history.

Consider the resulting hash value $y_j''$ after applying, all at once from Si, the numbers issued by malicious nodes. When we then apply the remaining steps coming from correct processes, this is equivalent to performing a random walk starting from $y_j''$, and the upper bound on the mixing time applies to this case as well.

The probability that $y_j'$ is within distance d from the origin is given by the distribution of the walk after applying all remaining steps. If the distribution were uniform, this probability would be 1/w. The maximum amount by which the probability can deviate is given by the TotalVariationDistance, which, substituting ϵ, becomes:

$$\mathit{TotalVariationDistance}(\mu,\upsilon) \;<\; e^{-\frac{\pi^2\, N_{\mathit{steps}}}{2\,b\,r^2}}$$

where $N_{\mathit{steps}}$ is the number of steps in the random walk. The result follows by replacing $N_{\mathit{steps}}$ with the minimum number of steps coming from correct processes, $\delta\,th_c$.
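For concreteness, the two bounds above can be evaluated numerically; the following sketch (all parameter values are made up for the example) computes the mixing-time bound and the selection-probability bound of Theorem 5.

```python
# Numeric illustration of the mixing-time and unpredictability bounds;
# the parameter values below are made up for the example.
import math

def mixing_time(r: int, b: int, eps: float) -> float:
    """Steps needed so the walk on Z_r^b is eps-close to uniform (upper bound)."""
    return 2 * b * r**2 * math.log(1 / eps) / math.pi**2

def selection_bound(w: float, r: int, b: int, delta: int, th_c: float) -> float:
    """Theorem 5: Pr[p_j selected] <= 1/w + exp(-pi^2 * delta * th_c / (2 b r^2))."""
    return 1 / w + math.exp(-math.pi**2 * delta * th_c / (2 * b * r**2))

r, b, w, th_c = 1024, 4, 32, 0.5
target_eps = 1e-6
delta = math.ceil(mixing_time(r, b, target_eps) / th_c)
print(delta, selection_bound(w, r, b, delta, th_c))   # bound is roughly 1/w + 1e-6
```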

Appendix C Security Against Passive Attacks

In the passive attack, the adversary waits until the history hash is such that the selected witness set for a malicious message contains many compromised nodes. The selection of each compromised node depends on what can be viewed as an independent random walk (the per-node history hash) arriving at a small subspace of $\mathbb{Z}_r^b$ (within a defined distance from the origin).

The expected ratio of compromised nodes in the witness set is their ratio in the total population. We argue that the number of random-walk steps (new numbers revealed and added to the local history) required to obtain a much higher ratio of compromised witnesses can be very high. One key indication that it takes long for multiple random walks to co-exist in the same region of $\mathbb{Z}_r^b$ is the linear speedup in the parallel cover time of $\mathbb{Z}_r^b$ [18].

C.1 The Gathering Time Problem

We are interested in the average occurrence time of two events: the time until the number of selected compromised nodes exceeds a given ratio of the expected witness set size, and the time until the number of selected correct nodes is smaller than said ratio. The former event is discussed first; we call its occurrence time the Gathering Time, since the adversary waits until enough random walks gather in a defined area.

Our problem is related to hitting-time bounds [18], which have been well studied in recent years and consider the time it takes for one (or many) random walks to arrive at a specific destination point. The Gathering Time differs from the Hitting Time in two main aspects: 1) we require the random walks to be in the destination area at the same time, and 2) we do not require all random walks to arrive at the destination; instead, we are interested in the first time that a subset of them, of a given size, is in the destination area. Next we provide an approximation of a lower bound, which we compare to simulated random walks.

C.2 A simplified Markov Chain approach

We assume that nodes are initially mapped to points in $\mathbb{Z}_r^b$ and are selected as witnesses if they are at distance at most rq/2 from the origin (with respect to the $L_\infty$ distance). The initial mapping of nodes is uniform and independent; moreover, the movement of nodes in the space can be modeled as independent random walks.

The initial probability that a node i is found at distance at most $p\,r$ from the origin is equal to the probability that in each of the b dimensions its location is within $p\,r$, i.e., $\Pr(\mathit{dist}_i \le p\,r) = (2p)^b$. Therefore, taking $p = q/2$, $q^b$ is the initial probability that a node is selected, and the expected witness set size is $n\,q^b$. We consider configurations where the witness set size is logarithmic in n, i.e., $n\,q^b = c\log_2 n$.

Next, we estimate the expected time until at least k out of f compromised nodes are selected, where $t < 1/3$ and $s > t$ are the fractions of compromised nodes in the system and in the witness set, respectively, i.e., $f = t\,n$ and $k = s\,n\,q^b = s\,c\log_2 n$. Initially, the f nodes are uniformly distributed in $\mathbb{Z}_r^b$. In one step, every node moves a single unit (+1 or −1) in one dimension.
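As a quick illustration of this parameter arithmetic (all numeric values below are made up), one can derive q, d, f and k from n, b, c, t and s:

```python
# Parameter arithmetic for the simplified analysis (values are illustrative).
import math

n, b, c = 1024, 4, 2
t, s = 0.25, 0.5                      # fractions of compromised nodes / witnesses

w = c * math.log2(n)                  # target expected witness set size
q = (w / n) ** (1 / b)                # selection probability per dimension: q^b = w/n
r = 1024                              # torus side (chosen independently)
d = math.floor(r * q / 2)             # selection radius in each dimension
f = math.floor(t * n)                 # number of compromised nodes
k = math.ceil(s * w)                  # compromised witnesses needed for an attack
print(w, q, d, f, k)
```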

Let A be the witness selection area in $\mathbb{Z}_r^b$ and $B = \mathbb{Z}_r^b \setminus A$, that is, B is the complement of A. Calculating the hitting time for k out of f nodes to reach A is an analytical and computational challenge. Suppose, for instance, that we use the transition matrix of a random walk on $\mathbb{Z}_r^b$: depending on the size of the space, the number of states makes it infeasible to compute the hitting time directly (even without considering the f simultaneous random walks).

Instead, we represent the number of nodes inside A as a Markov chain $\{X_l \mid l \in \mathbb{N}\}$, where $S = \{0,\dots,f\}$ is the set of possible states and $X_l$ is the number of nodes in the targeted area after l steps. In order to calculate $\Pr(X_{l+1} = y \mid X_l = x)$ we consider, as a simplification, only the average probability (over all possible positions) that each individual node inside A leaves the area in one step, as well as the average probability that each individual node inside B moves to A. These probabilities are assumed to be the same for every node, independently of its position inside A (or inside B).

Only nodes located on the “border” of A with B can move to B in one step. Intuitively, the border of A is composed of points that are one step away from a point in B. To formalize this notion we use the norm-1 distance, which can be thought of as the minimum number of steps needed to go from a point u to another point v.

$$\text{1-Dist}_{r,b}(u,v) := \sum_{j=0}^{b-1} \min\bigl((u_j - v_j) \bmod r,\; (v_j - u_j) \bmod r\bigr)$$

We define the distance between a point u and an area $B \subseteq \mathbb{Z}_r^b$ as the minimum distance from u to any point of B. The border of an area A comprises all the points of A that are at distance 1 from its complement B.

$$\mathit{border}(A) := \{\, u \in A \mid \text{1-Dist}_{r,b}(u,B) = 1 \,\}$$

Nodes inside A that are not located on the border cannot leave the area in a single step. Thus, the average probability that a node leaves A after one step, taken over all points of A (as shown later, the probability that a node leaves B given that it is in border(B) is the same for every point of border(B), which is not the case for all points of border(A)), is:

$$\mathit{AvgPr}(i \text{ leaves } A) = \Pr\bigl(i \text{ leaves } A \mid i \in \mathit{border}(A)\bigr)\cdot\frac{|\mathit{border}(A)|}{|A|}$$

The probability of a node leaving B is analogous.

Let $Y_l$ denote the number of nodes that leave A from step l to step l+1, and $W_l$ the number of nodes that leave B. We define $C_l = W_l - Y_l$ as the variation of the number of nodes in A; thus:

$$X_{l+1} = X_l + C_l$$
$$\Pr(X_{l+1} = y \mid X_l = x) = \Pr(C_l = y - x \mid X_l = x)$$

The witness selection region A comprises all the points at distance at most $d = rq/2$ from the origin (in the $L_\infty$ sense). We say that $u = [u_1,\dots,u_b] \in A$ iff $\forall u_i: -d \le u_i \le d$. The size of the space (number of points) is $r^b$, the size of A is $(2d+1)^b$, and thus $|B| = r^b - (2d+1)^b$.

A point v inside B that is one step away from A satisfies the following condition: there is a single $v_i$ such that $v_i = d+1$ or $v_i = -(d+1)$, and for all other $v_j$, $-d \le v_j \le d$. For a specific $v_i$, there are $(2d+1)^{b-1}$ points in border(B) with $v_i = d+1$ (and the same number with $v_i = -(d+1)$). Since there are b dimensions, the total number of points in border(B) is $2b(2d+1)^{b-1}$. Now, for any node in border(B), a random step changes the value of a single coordinate by +1 or −1, so the probability that such a node moves to A after one step is $\frac{1}{2b}$.

On the other hand, border(A) contains points that are at distance 1 from multiple points of B. A point u in border(A) satisfies: $\forall u_i, -d \le u_i \le d$ and $\exists u_j: u_j = d$ or $u_j = -d$. The probability of moving from border(A) to B depends on how many coordinates satisfy $u_j = d$ (or $-d$), and there can be at most b such coordinates. Suppose there are k of them; then the probability that a node at this point leaves to B is $\frac{1}{2(b-k+1)}$.

There are $\binom{b}{k}$ distinct ways of choosing k out of the b coordinates to be either d or −d, and for each choice of k coordinates, there are $2^k$ ways of assigning d and −d to them. Consequently, there are $2^k\binom{b}{k}(2d-1)^{b-k}$ points u in which exactly k coordinates $u_i$ have value d or −d. The total number of points in border(A) is the sum over all possible k:

$$|\mathit{border}(A)| = \sum_{j=1}^{b} 2^j \binom{b}{j} (2d-1)^{b-j}$$

We can now calculate the average probability that a node in border(A) moves to B:

$$\mathit{AvgPr}\bigl(i \text{ leaves } A \mid i \in \mathit{border}(A)\bigr) = \frac{1}{|\mathit{border}(A)|}\sum_{j=1}^{b} \frac{2^j \binom{b}{j} (2d-1)^{b-j}}{2(b-j+1)}$$
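These counting arguments translate directly into code. The sketch below computes $|\mathit{border}(A)|$ and the average leave probabilities $p_A$ and $p_B$ used next; the parameter values are illustrative, and the per-point leave probability $\frac{1}{2(b-k+1)}$ is taken as stated above.

```python
# Average one-step leave probabilities p_A and p_B, following the counting
# arguments above (r, b, d are illustrative parameters).
from math import comb

def leave_probs(r: int, b: int, d: int):
    size_A = (2 * d + 1) ** b
    size_B = r ** b - size_A
    border_B = 2 * b * (2 * d + 1) ** (b - 1)
    # Points of border(A) grouped by k = number of coordinates at +-d;
    # each such point leaves A with probability 1 / (2 (b - k + 1)).
    border_A = sum(2**k * comb(b, k) * (2*d - 1)**(b - k) for k in range(1, b + 1))
    leave_A_given_border = sum(
        2**k * comb(b, k) * (2*d - 1)**(b - k) / (2 * (b - k + 1))
        for k in range(1, b + 1)
    ) / border_A
    p_A = leave_A_given_border * border_A / size_A   # average over all of A
    p_B = (1 / (2 * b)) * border_B / size_B          # average over all of B
    return p_A, p_B

print(leave_probs(r=64, b=4, d=3))
```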

Next, we calculate each component of the transition matrix.

Let $p_A = \mathit{AvgPr}(i \text{ leaves } A)$ and $p_B = \mathit{AvgPr}(i \text{ leaves } B)$ be the individual probabilities that a node leaves its current area in one step. Suppose that there are x nodes in A and $f-x$ nodes in B. Then,

$$Y_l \sim \mathrm{Binomial}(x,\, p_A)$$
$$W_l \sim \mathrm{Binomial}(f-x,\, p_B)$$

The variation $C_l$ is computed as follows:

$$\Pr(W_l - Y_l = c \mid X_l = x) = \sum_{c_1 - c_2 = c} \Pr(W_l = c_1 \mid X_l = x)\,\Pr(Y_l = c_2 \mid X_l = x)$$

where $c_1$ ranges from 0 to $f-x$ and $c_2$ ranges from 0 to x. Let P be the transition matrix of the Markov chain $\{X_l \mid l \in \mathbb{N}\}$, and $P_{ij}$ the entry in the i-th row and j-th column; then:

$$P_{ij} = \Pr(X_{l+1} = j \mid X_l = i) = \Pr(W_l - Y_l = j - i \mid X_l = i)$$

We use P to compute the average number of steps until, starting from a particular state, the chain reaches a state $\ge k$. The average hitting time is the weighted sum of the individual hitting times, where the weight of each initial state is its probability under the initial distribution. To calculate the initial distribution, we assume that all nodes are placed uniformly at random and thus have the same selection probability $q^b$. Let $X_{\mathit{init}}$ denote the initial number of nodes inside A; then:

$$X_{\mathit{init}} \sim \mathrm{Binomial}(f,\, q^b)$$
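A sketch of the simplified chain follows: it builds the transition matrix P from the binomial variations and computes the average hitting time of the states $\ge k$, weighted by the binomial initial distribution. It assumes numpy and scipy; $p_A$ and $p_B$ may be obtained from the counting sketch above, and the numeric values here are illustrative.

```python
# Sketch of the simplified Markov chain: transition matrix from the binomial
# variations, then the expected time until >= k compromised nodes are selected.
import numpy as np
from scipy.stats import binom

def transition_matrix(f: int, pA: float, pB: float) -> np.ndarray:
    P = np.zeros((f + 1, f + 1))
    for x in range(f + 1):
        # Y_l ~ Bin(x, pA) nodes leave A; W_l ~ Bin(f - x, pB) nodes enter A.
        leave = binom.pmf(np.arange(x + 1), x, pA)
        enter = binom.pmf(np.arange(f - x + 1), f - x, pB)
        for c2, pl in enumerate(leave):
            for c1, pe in enumerate(enter):
                P[x, x + c1 - c2] += pl * pe
    return P

def avg_hitting_time(P: np.ndarray, f: int, k: int, qb: float) -> float:
    transient = np.arange(k)                         # states 0..k-1
    Q = P[np.ix_(transient, transient)]
    h = np.linalg.solve(np.eye(k) - Q, np.ones(k))   # E[steps to reach >= k]
    init = binom.pmf(np.arange(f + 1), f, qb)        # X_init ~ Bin(f, q^b)
    full_h = np.zeros(f + 1)
    full_h[:k] = h                                   # hitting time is 0 from x >= k
    return float(init @ full_h)

P = transition_matrix(f=16, pA=0.05, pB=0.002)       # illustrative values
print(avg_hitting_time(P, f=16, k=8, qb=0.06))
```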

We ran simulations for multiple random walks in rb and compared the hitting time of the simulations with the values obtained from the simplified analysis. For this purpose, we chose parameters so that the number of steps required for witness set corruption is small enough to allow the simulations to finish in a reasonable time.


Figure 4: Simulations (thick line) and simplified analysis (dashed line). Left graph: $r = 2^{10}$. Right graph: $b = 8$. In both graphs, $t = 0.25$, $c = 2$ and $s = 0.5$.

Figure 4 shows results from the simulations and from the simplified analysis. For each point in the graphs, we took the average of 1000 runs. From both graphs, we can conclude that the simplified analysis gives a conservative estimate of the hitting time.

In the counterpart of the Gathering Time problem, we want to know the average number of steps after which there are not enough correct nodes in the witness set. It can be calculated with simple modifications to the Markov chain: the possible states are now $\{0,\dots,n-f\}$, and we use the updated P to compute the average number of steps until the chain reaches a state $< k$.

Because of history differences, the paths of the local random walks may slightly diverge among processes. Consider the extreme case where each local history evolves independently of the others (so that the difference can be arbitrarily large). The speed-up property of multiple random walks guarantees a linear speed-up (in the number of walks f) of the hitting time [1]. In reality, the walks are highly correlated and are apart from each other only by a factor that is much smaller than the hitting time. Thus, the speed-up in the cases we consider is negligible relative to the hitting time.

C.3 Liveness vs Consistency graph

To optimize the overall time to failure, one has to select the threshold s so that it takes the same number of steps to violate liveness and consistency. In Figure 5, we show both curves for s=0.45 and t=0.25. (Ideally, s should be selected to optimize the failure time for each value of w and t. However, calculating the liveness results takes considerably longer than the consistency ones due to the size of the matrix P; we therefore present the values for t=0.25, where P is smaller.)


Figure 5: Liveness and Consistency failure times. The parameters used are: $n = 1024$, $b = 4$, $r = 2^{10}$, $k = 0.45w$ and $t = 0.25$.

Appendix D Recovery Protocol

A process pi first sends 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 to every process, including the latest WBB message it has sent with tag [Π]. If a process pj that has already delivered a message m receives 𝑅𝐸𝐶𝑂𝑉𝐸𝑅, it replies with m. When pi receives f+1 replies for m, it knows at least one of them comes from a correct process, and it can safely deliver m. Moreover, upon receiving f+1 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 messages, pj also sends a 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 to every process (this threshold ensures that at least one correct process has proposed to initiate a reliable broadcast instance).

When pi receives 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 from $\frac{n+f}{2}+1$ processes, if a unique message m has been received so far, it starts Bracha’s broadcast 𝐸𝐶𝐻𝑂 phase for m. Otherwise, pi waits until it receives f+1 messages ⟨𝑅𝐸𝐴𝐷𝑌, m, [Π]⟩ among the recovery messages before echoing. The traditional Byzantine reliable broadcast is then executed.
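The sketch below illustrates only the threshold bookkeeping described above; it is not Algorithm 7, and the message-handling structure (handlers, piggybacked readies, proofs) is a simplified assumption made for the example.

```python
# Illustrative sketch of the recovery thresholds described above; it abstracts
# away Algorithm 7's actual message handling and line structure.
from collections import Counter, defaultdict

class RecoveryState:
    def __init__(self, n: int, f: int):
        self.n, self.f = n, f
        self.quorum = (n + f) // 2 + 1             # more than (n+f)/2 processes
        self.replies = Counter()                    # message -> number of replies
        self.recover_from = set()                   # senders of RECOVER
        self.readies_seen = defaultdict(set)        # message -> senders of READY
        self.sent_recover = self.delivered = False

    def on_reply(self, sender, m):
        self.replies[m] += 1
        if self.replies[m] >= self.f + 1:           # one reply is from a correct process
            self.delivered = True
            return ("deliver", m)

    def on_recover(self, sender, piggybacked_ready):
        self.recover_from.add(sender)
        if piggybacked_ready is not None:
            msg, _proof = piggybacked_ready
            self.readies_seen[msg].add(sender)
        if len(self.recover_from) >= self.f + 1 and not self.sent_recover:
            self.sent_recover = True                # join the recovery
            return ("send_recover", None)
        if len(self.recover_from) >= self.quorum and len(self.readies_seen) == 1:
            (m,) = self.readies_seen                # a unique message was seen
            return ("echo", m)                      # start Bracha's ECHO phase
        # otherwise wait for f+1 READY for the same message before echoing
```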

Algorithm 7 Recovery Protocol.

For a particular instance, we assume f < |Π|/3 and that, for every correct process pi, Wi has at most k−1 faulty witnesses (note that these assumptions are weaker than those of Section 3.2: the addition of the recovery protocol and a timeout compensates for the non-responsiveness of witness sets). In addition, we assume the 𝑡𝑖𝑚𝑒𝑜𝑢𝑡 value to be much smaller than the execution time of δ broadcast instances, so that the adversary cannot change the composition of faulty processes in witness sets before processes time out.

Lemma 9.

If two correct processes p and q deliver m and m′, then m=m′.

Proof.

There are three different scenarios in which p and q may deliver messages according to the algorithm: both deliver in line 15 (Algorithm 1); one delivers in line 15 and the other in line 28 or 41 (Algorithm 7); or both deliver in line 28 or 41.

In the first case, since at most k−1 witnesses are faulty in each witness set, correct processes receive messages from at least one correct witness in lines 10 and 14, which guarantees 𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 (see proof of Theorem 2).

For the second case, if q delivers m′ in line 28, it is guaranteed to receive a reply from at least one correct process r. Since processes only take steps in Algorithm 7 after timing out or delivering a message, it must be that r delivered m′ in line 15. From the first case, m′=m.

On the other hand, if q delivers m′ in line 41, it received $\frac{n+f}{2}+1$ messages ⟨𝑅𝐸𝐴𝐷𝑌, m′⟩. Suppose that m′≠m; then $\frac{n+f}{2}+1$ processes sent ⟨𝐸𝐶𝐻𝑂, m′⟩ (sufficient to make a correct process send a ready for m′). But since p delivers m after receiving ⟨𝑉𝐴𝐿𝐼𝐷𝐴𝑇𝐸, m⟩ from at least one correct witness, it is also the case that at least $\frac{n+f}{2}+1$ processes sent ⟨𝑅𝐸𝐴𝐷𝑌, m, [Π]⟩.

Consequently, a correct process r must have sent both ⟨𝐸𝐶𝐻𝑂, m′⟩ and ⟨𝑅𝐸𝐴𝐷𝑌, m, Π⟩. Two scenarios are possible: if r sent its 𝐸𝐶𝐻𝑂 in line 34, then it had readies stored for both m and m′ (since r receives $\frac{n+f}{2}+1$ recovery messages, at least one of them contains a ⟨𝑅𝐸𝐴𝐷𝑌, m, Π⟩), a contradiction with the guard of line 33. If it was in line 36, then a correct process sent ⟨𝑅𝐸𝐴𝐷𝑌, m′, Π⟩, also a contradiction, since then two correct processes would have sent ⟨𝑅𝐸𝐴𝐷𝑌, ·, Π⟩ for distinct messages m and m′ (see proof of Theorem 2).

In the third case, suppose p delivers m in line 28 and q delivers m′ in line 41. There is a correct process r that delivers m in line 15 and sent a reply to p, which, by the second case above, implies m=m′. If both deliver m and m′ in line 41, there is at least one correct process that sent both ⟨𝑅𝐸𝐴𝐷𝑌, m⟩ and ⟨𝑅𝐸𝐴𝐷𝑌, m′⟩. Since correct processes do not send readies for distinct messages, m=m′.

Lemma 10.

If a correct process delivers a message, then every correct process eventually delivers a message.

Proof.

A correct process p can deliver a message on three possible occasions: in line 15 (Algorithm 1), or in line 28 or 41 (Algorithm 7).

If p delivers m in line 15, then, because p’s witness set has at least one correct witness that sends ⟨𝐸𝐶𝐻𝑂, m, W⟩ to everyone, every correct process either times out or delivers a message in line 15 (using witnesses). Moreover, from Lemma 9, no correct process delivers m′≠m. At least one correct witness w sent ⟨𝑉𝐴𝐿𝐼𝐷𝐴𝑇𝐸, m⟩ to p, and $\frac{n+f}{2}+1$ processes sent ⟨𝑅𝐸𝐴𝐷𝑌, m, Π⟩ to w, of which at least f+1 are correct.

If another process q times out, it can deliver m by receiving f+1 replies to its 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 message. Suppose q does not receive enough replies; then at least f+1 correct processes did not deliver m in line 15 (Algorithm 1), that is, they time out. Consequently, every correct process receives f+1 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 messages and also sends 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 (even if it has already delivered m). Process q then gathers $\frac{n+f}{2}+1$ 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 messages, and since at least one of them comes from a correct process that sent ⟨𝑅𝐸𝐴𝐷𝑌, m, Π⟩, q echoes m (if m is the only gathered message, line 33).

If q receives a distinct m′ before echoing a message, it waits for f+1 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 messages containing ⟨𝑅𝐸𝐴𝐷𝑌, m, Π⟩, which is guaranteed to happen (since at least f+1 correct processes previously sent ⟨𝑅𝐸𝐴𝐷𝑌, m, Π⟩). Thus, every correct process echoes m, gathers $\frac{n+f}{2}+1$ echoes and sends ⟨𝑅𝐸𝐴𝐷𝑌, m⟩. Any process that has not yet delivered a message then receives $\frac{n+f}{2}+1$ readies and delivers m.

In the case where p delivers m in line 28, at least one correct process sent a reply to p and delivered m in Algorithm 1, which reduces to the situation described above.

Now suppose that p delivers m in line 41 and no correct process delivers a message in Algorithm 1. Process p received $\frac{n+f}{2}+1$ messages ⟨𝑅𝐸𝐴𝐷𝑌, m⟩, of which at least f+1 come from correct processes. Moreover, because correct processes wait for $\frac{n+f}{2}+1$ echoes (or f+1 readies) before sending 𝑅𝐸𝐴𝐷𝑌, they cannot send 𝑅𝐸𝐴𝐷𝑌 for distinct messages m and m′, since that would imply that a correct process sent 𝐸𝐶𝐻𝑂 for both messages. Therefore, every correct process q is able to receive f+1 readies for m and also sends a ready for it. Then q receives $\frac{n+f}{2}+1$ readies and delivers m.

Lemma 11.

If a correct process broadcasts m, every correct process eventually delivers m.

Proof.

If any correct process delivers m, then by Lemma 10 every correct process delivers it. Suppose that p is the source and that no process delivers m before p times out. Process p then sends 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 (including m) to every process, so that even if p reached no correct witnesses, correct processes still receive a protocol message and initialize the instance. Since no correct process delivers m before timing out, they also trigger 𝑡𝑖𝑚𝑒𝑜𝑢𝑡 and send 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 with m.

Because p is correct, it sends no protocol message for any m′≠m. Every correct process then gathers enough 𝑅𝐸𝐶𝑂𝑉𝐸𝑅 messages to send 𝐸𝐶𝐻𝑂 and later 𝑅𝐸𝐴𝐷𝑌. Eventually, p gathers $\frac{n+f}{2}+1$ readies and delivers m.

Lemmas 9, 10 and 11 imply Theorem 12.

Theorem 12.

Algorithms 1–3 and 7 together satisfy 𝑉𝑎𝑙𝑖𝑑𝑖𝑡𝑦, 𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 and 𝑇𝑜𝑡𝑎𝑙𝑖𝑡𝑦.

Appendix E Applications and Ramifications

In this section, we briefly overview two potential applications of our broadcast protocol: asynchronous asset transfer and a generic accountability mechanism. We also discuss open questions inspired by these applications.

E.1 Asset Transfer

The users of an asset-transfer system (or a cryptocurrency system) exchange assets via transactions. A transaction is a tuple 𝑡𝑥=(s,r,v,𝑠𝑒𝑞), where s and r are the sender’s and receiver’s account 𝑖𝑑s respectively, v is the transferred amount, and 𝑠𝑒𝑞 is a sequence number. Each user p maintains a local set of transactions T and adds a transaction 𝑡𝑥 to T (we also say that p commits 𝑡𝑥) when p confirms that (i) all previous transactions from s are committed, (ii) s has not issued a conflicting transaction with the same sequence number, and (iii) based on the currently committed transactions, s indeed holds the assets it is about to transfer (please refer to [14] for more details).

One can build such an abstraction atop (probabilistic) reliable broadcast: to issue a transaction tx, a user invokes 𝑏𝑟𝑜𝑎𝑑𝑐𝑎𝑠𝑡(tx). When a user receives an upcall 𝑑𝑒𝑙𝑖𝑣𝑒𝑟(tx), it puts tx on hold until conditions (i)–(iii) above are met and then commits tx; a minimal sketch of this commit rule is shown below. Asset-transfer systems based on classical broadcast algorithms [6] exhibit significant practical advantages over consensus-based protocols [14, 4]. One can reduce the costs even further by using our broadcast protocol. The downside is a small probability of double spending: a malicious user may make different users deliver conflicting transactions and overspend its account by exploiting “weak” witness sets (those without enough correct members) or an over-optimistic evaluation of communication delays in the recovery protocol. We can temporarily tolerate such overspending and compensate for it with a reconfiguration mechanism that detects and evicts misbehaving users from the system, as well as adjusting the total balance. It is appealing to explore whether such a solution would be acceptable in practice.
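The following is a minimal sketch of the commit rule layered on top of the 𝑑𝑒𝑙𝑖𝑣𝑒𝑟 upcall; the bookkeeping is simplified, the class and field names are illustrative, and condition (ii) is handled implicitly by the per-sender sequence-number check (the broadcast layer prevents correct processes from delivering two different transactions with the same sequence number).

```python
# Sketch of the commit rule (i)-(iii) on top of deliver(tx); bookkeeping is
# simplified and names are illustrative.
from collections import defaultdict

class AssetTransfer:
    def __init__(self, initial_balances: dict):
        self.balance = defaultdict(int, initial_balances)
        self.next_seq = defaultdict(int)     # next expected seq per sender
        self.on_hold = []                    # delivered but not yet committed

    def deliver(self, tx):                   # upcall from the broadcast layer
        self.on_hold.append(tx)
        self._try_commit()

    def _try_commit(self):
        progress = True
        while progress:
            progress = False
            for tx in list(self.on_hold):
                s, r, v, seq = tx
                if (seq == self.next_seq[s]          # (i)+(ii): next seq from s
                        and self.balance[s] >= v):   # (iii): funds are available
                    self.balance[s] -= v
                    self.balance[r] += v
                    self.next_seq[s] += 1
                    self.on_hold.remove(tx)
                    progress = True                  # committing may unblock others

# Usage: the broadcast delivers the sender's transactions, which commit in order.
ledger = AssetTransfer({"alice": 10})
ledger.deliver(("alice", "bob", 4, 0))
ledger.deliver(("alice", "carol", 3, 1))
assert ledger.balance["bob"] == 4 and ledger.balance["carol"] == 3
```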

E.2 Accountability and beyond

In our probabilistic protocol, every broadcast event is validated by a set of witnesses. The validation consists in ensuring that the source does not attempt to broadcast different messages with the same sequence number. As a result, with high probability, all correct processes observe the same sequence of messages issued by a given source.

One can generalize this solution to implement a lightweight accountability mechanism (in the vein of [26, 25]): witnesses collectively make sure that the sequence of events generated by a process corresponds to its specification. Here a process commits not only to the messages it sends, but also to the messages it receives. This way, the witnesses can verify whether its behavior respects the protocol the process is assigned.

Notice that one can generalize this approach even further, as the verified events do not have to be attributed to any specific process. In the extreme case, we can even think of a probabilistic state-machine replication protocol [29, 10]. What if the processes try to agree on an ever-growing sequence of events generated by all of them in a decentralized way? Every next event (say, at position k) may then be associated with a dynamically determined pseudo-random set of witnesses that try to make sure that no different event is accepted at position k. Of course, we need to make sure that the probability of losing consistency and/or progress is acceptable, and a probabilistic analysis of this kind of algorithm is an appealing question for future research.