Abstract 1 Introduction 2 Preliminaries 3 The Coded-MBRB algorithm 4 Conclusion References Appendix A Correctness analysis Appendix B Communication analysis Appendix C Using Bracha’s BRB on hash values under a message adversary

Near-Optimal Communication Byzantine Reliable Broadcast Under a Message Adversary

Timothé Albouy ORCID Univ Rennes, Inria, CNRS, IRISA, 35042 Rennes-cedex, France Davide Frey ORCID Univ Rennes, Inria, CNRS, IRISA, 35042 Rennes-cedex, France Ran Gelles ORCID Bar-Ilan University, Ramat Gan, Israel Carmit Hazay ORCID Bar-Ilan University, Ramat Gan, Israel Michel Raynal ORCID Univ Rennes, Inria, CNRS, IRISA, 35042 Rennes-cedex, France Elad Michael Schiller ORCID Chalmers University of Technology, Gothenburg, Sweden François Taïani ORCID Univ Rennes, Inria, CNRS, IRISA, 35042 Rennes-cedex, France Vassilis Zikas ORCID Georgia Institute of Technology, Atlanta, GA, USA
Abstract

We address the problem of Reliable Broadcast in asynchronous message-passing systems with n nodes, of which up to t are malicious (faulty), in addition to a message adversary that can drop some of the messages sent by correct (non-faulty) nodes. We present a Message-Adversary-Tolerant Byzantine Reliable Broadcast (MBRB) algorithm that communicates 𝒪(|m|+nκ) bits per node, where |m| represents the length of the application message and κ=Ω(logn) is a security parameter. This communication complexity is optimal up to the parameter κ. This significantly improves upon the state-of-the-art MBRB solution (Albouy, Frey, Raynal, and Taïani, TCS 2023), which incurs communication of 𝒪(n|m|+n2κ) bits per node. Our solution sends at most 4n2 messages overall, which is asymptotically optimal. Reduced communication is achieved by employing coding techniques that replace the need for all nodes to (re-)broadcast the entire application message m. Instead, nodes forward authenticated fragments of the encoding of m using an erasure-correcting code. Under the cryptographic assumptions of threshold signatures and vector commitments, and assuming n>3t+2d, where the adversary drops at most d messages per broadcast, our algorithm allows at least =nt(1+ϵ)d (for any arbitrarily low ϵ>0) correct nodes to reconstruct m, despite missing fragments caused by the malicious nodes and the message adversary.

Keywords and phrases:
Asynchronous message-passing, Byzantine fault-tolerance, Message adversary, Reliable broadcast, Erasure-correction codes, Threshold signatures, Vector commitments
Copyright and License:
[Uncaptioned image] © Timothé Albouy, Davide Frey, Ran Gelles, Carmit Hazay, Michel Raynal, Elad Michael Schiller, François Taïani, and Vassilis Zikas; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Distributed algorithms
Acknowledgements:
R. Gelles would like to thank Paderborn University and CISPA – Helmholtz Center for Information Security for hosting him while part of this research was done.
Funding:
This work was partially supported by the French ANR projects ByBloS (ANR-20-CE25-0002-01) and PriCLeSS (ANR-10-LABX-07-81), devoted to the modular design of building blocks for large-scale Byzantine-tolerant multi-users applications. Research supported in part by the United States-Israel Binational Science Foundation (BSF) through Grant No. 2020277.
Editors:
Silvia Bonomi, Letterio Galletta, Etienne Rivière, and Valerio Schiavoni

1 Introduction

Reliable Broadcast allows n asynchronous nodes to agree eventually on a message sent by a designated node, the sender, despite the possible malicious behavior by some nodes and the transmission network. Reliable broadcast plays a crucial role in key applications, including consensus algorithms, replication, event notification, and distributed file systems. These systems sometimes require broadcasting large messages or files (e.g., permissioned blockchains), and thus, reducing the communication overhead to a minimum is an important aspect of achieving scalability. In that vein, this work aims at providing communication efficient solutions for the task of reliable broadcast in the presence of node and link faults.

Byzantine nodes [20, 29] are faulty nodes that are assumed to act cooperatively in an arbitrary manner to hinder the non-faulty nodes (also known as correct nodes) from reaching an agreement on the value of a sent message. These faulty nodes can manifest in various ways, such as delivering fake messages that were never sent, altering the payload of messages sent by any faulty nodes, delaying message delivery, or even omitting messages altogether. Also, a Byzantine failure can present itself differently to different nodes.

Solving reliable broadcast in the presence of Byzantine nodes (a problem known as BRB for Byzantine Reliable Broadcast [10]) has been extensively studied for at least four decades. Bracha [11, 12] in particular showed that BRB could be implemented in asynchronous networks as soon as the number t of Byzantine nodes is limited to be less than a third of the nodes. This seminal result has since been followed by hundreds of works, with a various range of optimizations and tradeoffs between different parameters such as resilience, efficiency, and communication; see [38] for an excellent book on this topic.

A significant challenge to reliable broadcast algorithms arises when the message-passing system is unreliable and possibly cooperates with the Byzantine nodes. Link faults [41, 43] give Byzantine nodes (potentially limited) control over certain network links, enabling them to omit or corrupt messages (an ability captured under the umbrella term message adversary [37]). This work focuses on a specific type of message adversary [37] that can only omit messages sent by correct nodes, but that cannot alter their content. This message adversary abstracts cases related to silent churn, where nodes may voluntarily or involuntarily disconnect from the network without explicitly notifying other nodes. During disconnection, the adversary causes correct nodes to pause the execution of their algorithm temporarily and resume upon reconnecting. In the message adversary model, correct nodes may miss messages sent over reliable communication channels by other nodes while they are disconnected, as there is no explicit notification about the message omission.

Problem overview.

We assume n nodes over an asynchronous network, where a message can be delayed for an arbitrary yet finite amount of time (unless omitted by the message adversary). We assume the existence of t Byzantine nodes and a message adversary capable of omitting up to d messages per node’s broadcast. To be more precise, a node communicates through a comm primitive (or a similar multicast/unicast primitive that targets a dynamically defined subset of nodes), which results in the transmission of n messages, with each node being sent one message, including the sender. The message adversary can omit messages in transit to a subset of at most d correct nodes. The adversary is only limited by the size of that subset. For instance, between different comm invocations, the adversary can modify the set of correct nodes to which messages are omitted. Furthermore, a designated sender node holds a message m that it wishes to broadcast to all the nodes.

An algorithm that satisfies the requirements of reliable broadcast despite Byzantine nodes and a message adversary is called a Message-adversary Byzantine Reliable Broadcast (MBRB) algorithm. The detailed version of MBRB’s requirements was formulated in [4], see Section 2. We informally summarize them here as follows. (1) For any sender invoking the broadcast algorithm, no two correct nodes deliver m and m, such that mm. (2) For any sender invoking the broadcast algorithm, either zero or at least correct nodes will deliver m. The quantity might depend on the adversary’s power, i.e., on t and d. (3) If a correct node delivers some message m from a correct sender, this correct sender has broadcast m previously and at least correct nodes will deliver it.

Background.

Albouy, Frey, Raynal, and Taïani [4] recently proposed a Message-adversary Byzantine Reliable Broadcast algorithm (denoted AFRT for short) for asynchronous networks that withstands the presence of t Byzantine nodes and a message adversary capable of omitting up to d messages per node’s broadcast. AFRT guarantees the reliable delivery of any message when n>3t+2d. Moreover, they demonstrate the necessity of this bound on the number of Byzantine nodes and omitted messages, as no reliable broadcast algorithm exists otherwise.

One caveat of AFRT regards its communication efficiency. While it achieves an optimal number of 𝒪(n2) messages, and an optimal delivery power =ntd, each node’s communication requires 𝒪(n(|m|+nκ)) bits, where |m| represents the number of bits in the broadcast message and κ is the length of the digital signatures used in their algorithm. In the current work, we design an algorithm that significantly reduces the communication cost per node while preserving the total number of messages communicated. Our solution features at most 4n messages per correct node (corresponding to 4n2 messages overall), and only 𝒪(|m|+nκ) bits per correct node. Overall, 𝒪(n|m|+n2κ) bits are communicated by correct nodes. This bound is tight (up to the size of the signature κ) for deterministic algorithms using signatures [18, 33], as every correct node must receive the message m, and as the reliable broadcast of a single bit necessitates at least Ω(n2) messages [21].

Contributions and techniques.

This paper is the first to present an MBRB algorithm able to tolerate a hybrid adversary combining t Byzantine nodes and a Message Adversary of power d, while providing optimal Byzantine resilience, near-optimal communication, and near-optimal delivery power .

Theorem 1.1 (Main, informal).

For any ε>0, there exists an efficient MBRB algorithm, such that every message m broadcast via this scheme is delivered by at least =nt(1+ε)d correct nodes under the assumption n>3t+2d. Each correct node communicates no more than 4n messages and 𝒪(|m|+nκ) bits overall, where |m| is the length of the message m.

The above asymptotic communication complexity holds assuming a sufficiently long message m. Further, ntd is a natural upper bound on the delivery power  of any MBRB algorithm. This bound arises from the power of the message adversary to isolate a subset of the correct parties of size d, and omit all messages sent to this subset. Our solution obtains a delivery power  that is as close to the limit as desired, at the cost of increasing communication (through the hidden constants in the asymptotic 𝒪() term, which depends on ε). Finally, n>3t+2d is a necessary condition to implement MBRB under asynchrony [3], thus making our solution optimal in terms of Byzantine resilience.

The starting point of our algorithm is the AFRT algorithm [4]. This algorithm achieves all the desired MBRB properties (Definition 2.3), albeit with a large communication cost of at least n2|m| bits overall. This communication cost stems from the re-emission strategy used by AFRT. In the AFRT algorithm, the sender first disseminates the message m to all nodes. To counter a possibly faulty sender, each node that receives m signs it and forwards it to the entire network, along with its own signature and any other signature observed so far for that message. This re-broadcasting step leads to n2|m| bits of communication.

In order to reduce the communication costs, we apply a coding technique, inspired by an approach by Alhaddad et al. [5]. Instead of communicating the message m directly, the sender first encodes the message using an error-correction code and “splits” the resulting codeword between the nodes, so that each node receives one fragment of size 𝒪(|m|/n) bits. Now, each node needs to broadcast only its fragment of the message rather than the entire message. This reduced per-node communication effectively reduces the overall communication for disseminating the message itself to n|m| bits.

Due to the message adversary and the actions of the Byzantine nodes, some of the fragments might not arrive at their destination. Error-correction codes have the property that the message m can be reconstructed from any sufficiently large subset of the fragments. But Byzantine nodes can do even worse, namely, they can propagate an incorrect fragment. Correct nodes cannot distinguish correct fragments from incorrect ones (at least, not until enough fragments are collected, and the message is reconstructed). Without this knowledge, correct nodes might assist the Byzantine nodes in propagating incorrect fragments, possibly harming the correctness and/or performance of the algorithm. To prevent this, the sender could sign each fragment that it sends. A node receiving a fragment could then verify that it is correctly signed by the sender and ignore it otherwise. The drawback of this solution is that only the sender can generate signatures for fragments.

In our MBRB algorithm, we rely on correct nodes that have already reconstructed the correct message to disseminate its fragments to the nodes that have not received any (say, due to the message adversary). In principle, when a node reconstructs the correct message, it can generate the codeword and obtain all the fragments, even if it did not receive some of them beforehand. However, the node cannot generate the sender’s signature for the fragments it generated by itself. Because of this, the node cannot relay these fragments to the other nodes, potentially leading to a reduced delivery power .

We avert this issue by exploiting vector commitments [15]. This cryptographic primitive generates a unique short digest C for any input vector of elements V. Additionally, it generates succinct proofs of inclusion for each element in V. In our system, the fragments of the (coded) message m form the vector V, and the inclusion proofs replace the need to sign each fragment separately. In more detail, every fragment of the codeword communicated by some node is accompanied by two pieces of information: the commitment C for the vector V containing all fragments of m, and a proof of inclusion showing that the specific fragment indeed belongs to V (see Section 2 for a formal definition of these properties). The sender signs only the commitment C. This means that Byzantine nodes cannot generate an incorrect fragment and a proof that will pass the verification, since they cannot forge the sender’s signature on C. Yet, given the message m, generating a proof of inclusion for any specific fragment can be done by any node. The vector commitment on the message m creates the same commitment C and the same proofs of inclusion generated by the sender. These could then be propagated to any other node along with the sender’s signature on C.

To complete the description of our MBRB algorithm, we mention that, similar to AFRT, our algorithm tries to form a quorum of signatures on some specific vector commitment C. In parallel, nodes collect fragments they verify as part of the message whose vector commitment is C. Once a node collects enough signatures (for some C) and at the same time obtains enough message fragments that are proven to belong to the same C, the node can reconstruct m and deliver (accept) it. At this point, the node also disseminates the quorum of signatures along with (some of) the fragments. This allows other correct nodes to reconstruct the message and verify a quorum has been reached. In fact, the dissemination of fragments, including fragments that this node did not have before reconstructing the message, is a crucial step in amplifying the number of nodes that deliver m to our stated level of =nt(1+ε)d. See the full description of the MBRB algorithm in Section 3.

Although our algorithm builds quorums on commitments, it departs substantially from the BRB algorithm proposed by Das, Xiang, and Ren [18], which avoids signatures and relies on hashes only. Their solution provides an overall communication complexity in 𝒪(n|m|+n2κ) that is optimal up to the κ parameter. Following the sender’s initial dissemination of message m, their proposal runs Bracha’s algorithm on the hash value of the broadcast message to ensure agreement. Unfortunately, when used with a message adversary, Bracha’s algorithm loses the sub-optimal Byzantine resilience n>3t+2d that AFRT and our solution provide, which is why the solution presented in this paper avoids it. (See Appendix C for a more detailed discussion of why this is so.) Due to the page limit, our algorithm’s complete analysis and proof details appear in the Appendix.

Related work.

Byzantine reliable broadcast (BRB) can be traced back to Pease, Shostak, and Lamport [34], which considered the particular case of synchronous message-passing systems. Since then, solutions for reliable broadcast, together with the related consensus problem, have been considered for many distributed systems [38]. In asynchronous systems, BRB solutions can be traced back to Bracha and Toueg [11, 12]. Recent advances [30, 27, 9, 23] in Byzantine fault-tolerant (BFT) solutions to the problem of BRB include the above-mentioned AFRT [4], which safeguards also against the message adversary. Our solution features substantially lower communication than AFRT, without harming the other properties, e.g., the number of messages or the delivery power.

Computation in networks with link faults, namely, with Byzantine links was considered by Santoro and Widmayer [41, 42], who discussed various types of Byzantine actions, e.g., omitting messages, corrupting messages, injecting messages, and combinations thereof. The works of [6, 43] focus on the case of reliable broadcast with such faults. In [35], Pelc proved that robust communication is feasible over graphs whose edge-connectivity is more than 2f, assuming the number of Byzantine links is bounded by f. This is also implied by the work of Dolev [20]. Censor-Hillel et al. [16] and Frei et al. [24] show that any computation can be performed when all links suffer arbitrary substitution faults (but no crashes), given that the network is 2-edge connected. When all links suffer corruption, but their overall amount is restricted, any computation can be reliably performed by Hoza and Schulman [28], for synchronous networks where the topology is known, or by Censor-Hillel, Gelles, and Haeupler [17], for asynchronous networks with unknown topology, see also [25].

Settings that allow Byzantine nodes in addition to faulty links were considered [7, 8, 19, 26, 36, 46]. Building on [4], the algorithm was also extended to signature-free asynchronous systems [3], albeit with lower delivery guarantees, and a weaker resilience with respect to d and t. We consider fully connected asynchronous networks, Byzantine nodes, and omission only link failures. But, our MBRB is the first, to the best of our knowledge, to offer a near-optimal communication (up to the length of the signature, κ) and delivery power .

Coding techniques are implemented to minimize the dissemination costs associated with message transmission across the network, ensuring the ability to reconstruct data in the event of node failures or adversarial compromises. In the context of Blockchains, significant contributions have been made by Cachin and Tessaro [14] as well as Cachin and Poritz in SINTRA [13], followed by its successors such as HoneyBadgersBFT [32], BEAT [22], DispersedLedger [47] and Alhaddad et al. [5]. These solutions leverage digital signatures and coding techniques to provide a balanced and reliable broadcast. Our work contributes to the advancement of the state of the art in the field of coded reliable broadcast by offering improved fault-tolerance guarantees that are stronger than the aforementioned solutions.

2 Preliminaries

General notations and conventions. For a positive integer n, let [n] denote the set {1,2,,n}. A sequence of elements (x1,,xn) is shorthanded as (xi)i[n]. We use the symbol ‘-’ to indicate any possible value. That is, (h,-) means a tuple where the second index includes any arbitrary value which we do not care about. All logarithms are base 2.

Nodes and Network.

We focus on asynchronous message-passing systems that have no guarantees of communication delay. The system consists of a set, 𝒫={p1,,pn}, of n fail-prone nodes that cannot access a clock or use timeouts. We identify node i with pi.
Communication means. Any ordered pair of nodes pi,pj𝒫 has access to a communication channel, 𝑐ℎ𝑎𝑛𝑛𝑒𝑙i,j. Each node can send messages to all nodes (possibly by sending a different message to each node). That is, any node, pi𝒫, can invoke the transmission macro, comm(m1,,mn), that communicates the message mj to pj over 𝑐ℎ𝑎𝑛𝑛𝑒𝑙i,j. The message mj can also be empty, in which case nothing will be sent to pj. However, in our algorithms, all messages sent in a single comm activation will have the same length. Furthermore, when a node sends the same message m to all nodes, we write broadcast(m)=comm(m,m,,m) for shorthand. We call each message mj transmitted by the protocol an implementation message (or simply, a message) to distinguish such messages from the application-level messages, i.e., the one the sender wishes to broadcast.
Byzantine nodes. Faulty nodes are called Byzantine, and their adversarial behavior can deviate from the proposed algorithm in any manner. For example, they may crash or send fake messages. Their ability to communicate and collude is unlimited. They might perform any arbitrary computation, and we assume their computing power is at least as strong as that of non-faulty nodes, yet not as strong as to undermine the security of the cryptographic signatures we use. We assume that at most t nodes are faulty, where t is a value known to the nodes. Non-faulty nodes are called correct nodes. The set of correct nodes contains c nodes where ntcn. The value of c is unknown.

Faulty nodes may deviate arbitrarily from the correct implementation of comm(). For instance, they may unicast messages to only a subset of the nodes in 𝒫. As mentioned, each pair of nodes can communicate using 𝑐ℎ𝑎𝑛𝑛𝑒𝑙i,j. While 𝑐ℎ𝑎𝑛𝑛𝑒𝑙i,j is assumed to be a reliable channel that is not prone to message corruption, duplication, or the creation of fake messages that were never sent by nodes; the message adversary [41, 43, 37], which we specify next, has a limited ability to cause message loss.
Message adversary. This entity can remove implementation messages from the communication channels used by correct nodes when they invoke comm(). More precisely, during each activation of comm(m1,,mn), the adversary has the discretion to choose up to d messages from the set {mi} and eliminate them from the corresponding communication channels where they were queued. We assume that the adversary has full knowledge of the contents of all messages {mi}, and thus it makes a worst-case decision as to which messages to eliminate.

The failures injected by a message adversary differ from those of classical sender and/or receiver omissions in that they are mobile. I.e., they are not pinned to a set of particular nodes but may move between any correct nodes during the same execution since they are defined per invocation of comm(). In particular, no node is immune to the message adversary.

Note that for the case of d=0, the adversary is as weak as the common settings in which all communication channels are reliable (since no message is ever lost) and t nodes are Byzantine. Assumption 2.1 limits the adversary’s power to avoid network partitions. As mentioned above, this is necessary for any MBRB algorithm [4].

Assumption 2.1 (Adversary-power-assumption).

n>3t+2d.

Since the message adversary can omit all implementation messages that are sent to a given set D𝒫:|D|=d, we know that , the number of correct nodes that are guaranteed to output the broadcast m correctly, must satisfy cd.

Such a hybrid fault model has been studied in the past in synchronous networks [8], but has remained little studied in an asynchronous setting, except for the work of Schmid and Fetzer [44] that limits itself to round-based algorithms (unlike us) and does not cover full disconnections of correct nodes (which we do). Modeling full disconnections is relevant as this captures correct nodes that remain disconnected for long periods. In addition to bounding the maximal number of outgoing messages that a correct sender might lose (as we do), the model proposed by Schmid and Fetzer [44] also bounds the maximal number of incoming messages that any correct node might miss. This adds an elegant symmetry to the model but poses significant challenges when considering asynchronous networks, as there is no obvious scope on which to limit the number of incoming messages missed by a given node. Schmid and Fetzer [44] therefore restrict the fault model to round-based algorithms. By contrast, our model allows for algorithms that do not follow this structure.

Error Correction Codes.

A central tool used in our algorithm is an error-correction code (ECC) [40]. Intuitively speaking, an ECC takes a message as input and adds redundancy to create a codeword from which the original message can be recovered even when parts of the codeword are corrupted. In this work, we focus on erasures, a corruption that replaces a symbol of the codeword with a special erasure mark .

Let 𝔽 denote a finite field whose size we set later, and let be a special symbol 𝔽. Given two strings of the same length, x,y𝔽n, their Hamming distance is the number of indices where they differ, Δ(x,y)=|{i[n]xiyi}|. Given a subset I[n], we denote by xI𝔽|I| the string x restricted to the indices in I.

To avoid confusion with global parameters, we denote the ECC-specific parameters by using a bar (e.g., x¯). An error-correction code is a function ECC:𝔽k¯𝔽n¯, with rate r¯=k¯/n¯, and distance d¯=minx,y𝔽k¯,xyΔ(ECC(x),ECC(y)). The Singleton bound determines that d¯n¯k¯+1, and when the equality holds, the code is said to be maximum distance separable (MDS). A prominent example of MDS codes is Reed-Solomon (RS) codes [39], which exist for any k¯,n¯, and |𝔽|n¯. Such codes can be efficiently encoded and decoded [40]. The erasure correction capabilities of a code depend on its distance, as given by the following fact.

Fact 2.2 (Erasure Correction Capability).

Any error-correction code of distance d¯ can recover up to d¯1 erasures. That is, for any y(𝔽{})n¯, let E={iyi=} the set of erased indices. Then, if |E|<d¯, there is at most a single x𝔽k¯ such that y[n¯]E=ECC(x)[n¯]E.

Cryptographic Primitives.

Our algorithm relies on cryptographic assumptions. We assume that the Byzantine nodes are computationally bounded with respect to the security parameter, denoted by κ. That is, all cryptographic algorithms are polynomially bounded in the input 1κ. We assume that κ=Ω(logn). We further assume the availability of Public Key Infrastructure (PKI), which is the setting that assumes each node is assigned a pair of public/private keys generated according to some standard key-generation algorithm. Further, at the start of the computation, each node holds its own private key and the public key of all other parties. This setting implies private and authenticated channels. In particular, each node has public and private keys to support the following cryptographic primitives.
Threshold signatures. In a (τ,n) threshold signature scheme [45], at least τ out of all n nodes (the threshold) produce individual signatures shares σ for the same message m, which are then aggregated into a fixed-size threshold signature Σ. Verifying Σ does not require the public keys of the signers; one needs to use a single system-wide public key, the same for all threshold signatures produced by the scheme. This system public key, known to everyone, is generated during the system setup phase and distributed through the PKI.

Formally, we define a (τ,n) threshold signature scheme as a tuple of (possibly randomized) algorithms 𝖳𝖲𝖨𝖦=(𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾,𝗍𝗌_𝗏𝖾𝗋𝗂𝖿𝗒_𝗌𝗁𝖺𝗋𝖾,𝗍𝗌_𝖼𝗈𝗆𝖻𝗂𝗇𝖾,𝗍𝗌_𝗏𝖾𝗋𝗂𝖿𝗒). The signing algorithm executed by node pi (denoted, 𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾i) takes a message m (and implicitly a private key) and produces a signature σ=𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾i(m). The share verification algorithm takes a message m, a signature share σi, and the identity i of its signer pi (and implicitly pi’s public key), and outputs a single bit, b=𝗍𝗌_𝗏𝖾𝗋𝗂𝖿𝗒_𝗌𝗁𝖺𝗋𝖾(m,σi,i){valid,invalid}, which indicates whether the signature share is valid or not. The combination algorithm takes a set 𝑠𝑖𝑔𝑠 of τ valid signature shares produced by τ out of n nodes and the associated message m (and implicitly the system public key) and outputs a threshold signature Σ=𝗍𝗌_𝖼𝗈𝗆𝖻𝗂𝗇𝖾(𝑠𝑖𝑔𝑠). The threshold signature verification algorithm takes a message m and a threshold signature Σ (and implicitly the system public key) and outputs a single bit b=𝗍𝗌_𝗏𝖾𝗋𝗂𝖿𝗒(m,Σ){valid,invalid}, indicating if the threshold signature is valid or not.

We require the conventional robustness and unforgeability properties for threshold signatures. This scheme is parameterized by a security parameter κ, and the size of signature shares and threshold signatures, |σ|=|Σ|=𝒪(κ), is independent of the size of the signed message, m. In our algorithm, we take τ=n+t2+1 (i.e., the integer right above n+t2).
Vector commitments (VC). A vector commitment (VC) is a short digest C for a vector of elements V, upon which a user can then generate a proof of inclusion π (sometimes called partial opening) of some element in V without disclosing the other elements of V to the verifier: the verifier only needs C, π, the element, and its index in the vector to verify its inclusion in V. A Merkle tree [31] is a notable example of vector commitment, although with several sub-optimal properties. For example, for a hash size of κ, a Merkle proof of inclusion is of 𝒪(κlog|V|) bits, which is significantly larger than modern schemes such as Catalano-Fiore vector commitments [15], which produce proofs of inclusion with an optimal size of 𝒪(κ) bits. In our construction, we use these optimal VC schemes (such as Catalano-Fiore’s), which provide commitments and proofs in 𝒪(κ) bits. The VC scheme provides two operations, parameterized by the security parameter κ: 𝗏𝖼_𝖼𝗈𝗆𝗆𝗂𝗍() and 𝗏𝖼_𝗏𝖾𝗋𝗂𝖿𝗒(), that work as follows. For any vector of strings V=(x1,,xn)=(xi)i[n], the function 𝗏𝖼_𝖼𝗈𝗆𝗆𝗂𝗍(V)(C,π1,,πn) returns C, the commitment, and every πi, the proof of inclusion for xi. The following hold.

  1. 1.

    Proof of inclusion (Correctness): Let (C,(πi)i[n])=𝗏𝖼_𝖼𝗈𝗆𝗆𝗂𝗍((x1,,xn)). Then for any i[n], it holds that 𝗏𝖼_𝗏𝖾𝗋𝗂𝖿𝗒(C,πi,xi,i)=valid.

  2. 2.

    Collision-resistance (Binding): For any j[n] and any randomized algorithm A taking (xi)i[n] and (C,(πi)i[n])=𝗏𝖼_𝖼𝗈𝗆𝗆𝗂𝗍(x1,,xn) as input, Pr[A outputs (xj,πj,j)𝗏𝖼_𝗏𝖾𝗋𝗂𝖿𝗒(C,πj,xj,j)=valid]<2Ω(κ). Namely, it is difficult to generate xjxj and a valid proof πj for the same commitment C.

We omit the traditional Hiding property of VC schemes (a commitment does not leak information on the vector [2]) as it is unneeded in our algorithm. We also implicitly assume that the 𝗏𝖼_𝖼𝗈𝗆𝗆𝗂𝗍 operation is deterministic: it always returns the same commitment C given the same input vector V, no matter the calling node pi. This is the case for Catalano-Fiore’s scheme [15], which does not use random salt.

Specification of the MBRB primitive.

MBRB’s objective is to guarantee a reliable broadcast, meaning it aims to ensure that a bounded minimum number of correct nodes ultimately deliver the broadcast messages to the application while upholding specific safety and liveness criteria. This assurance holds even when confronted with Byzantine faults and a message adversary capable of selectively suppressing messages.

An MBRB algorithm provides the MBRB-broadcast and MBRB-deliver operations. The following specification is presented in a single-shot, single-sender version, where ps is the sending node. A multi-sender, multi-shot version can be derived by adding the sender’s identifier and a running sequence number to messages and signatures. Nodes invoke the MBRB-deliver operation to deliver (to the application layer) messages broadcast by ps.

Table 1: Notations used by Algorithm 2.
n total number of nodes
t upper bound on the number of Byzantine nodes
d removal power of the message adversary
k reconstruction threshold of the erasure code, k out of n
ps the designated sending node (with identity s)
σi signature share by node pi
𝑠𝑖𝑔i the pair (σi,i)
𝑠𝑖𝑔𝑠,𝑠𝑖𝑔𝑠i sets of (signature share,id) pairs
Σ threshold signature (TS)
τ threshold of the TS scheme (set to τ=n+t2+1 in our algorithm)
m,m,mi application messages
C,C,Cm vector commitments
m~i ith message fragment of application message m
π~i proof of inclusion of fragment m~i

Definition 2.3 specifies the safety and liveness properties of MBRB. Safety ensures that messages are delivered correctly without spurious messages, duplication, or duplicity. Liveness guarantees that if a correct node broadcasts a message, it will eventually be delivered by at least one correct node (MBRB-Local-delivery). If a correct node delivers a message from any specific sender, that message will eventually be delivered by a sufficient number, , of correct nodes (MBRB-Global-delivery), where is a measure of the delivery power of the MBRB object. The parameter might depend on the adversary’s power, i.e., on t and d. Since the message adversary can omit all implementation messages that are sent to an unknown set D𝒫:|D|=d, we know that cd.

Definition 2.3.

An MBRB is an algorithm that satisfies the following properties.

  • MBRB-Validity.   If ps is correct and a correct node, pi, MBRB-delivers an application message m, then, node ps has MBRB-broadcast m (before that MBRB-delivery).

  • MBRB-No-duplication.   A correct node pi MBRB-delivers at most one application message m.

  • MBRB-No-duplicity.   No two different correct nodes MBRB-deliver different application messages from node ps.

  • MBRB-Local-delivery.   Suppose ps is correct and MBRB-broadcasts an application message m. At least one correct node, pj, eventually MBRB-delivers m from node ps.

  • MBRB-Global-delivery.   Suppose a correct node, pi, MBRB-delivers an application message m from ps. Then, at least correct nodes MBRB-deliver m from ps.

3 The Coded-MBRB algorithm

The proposed solution, named Coded MBRB (Algorithm 2), allows a distinguished sender ps (known to everyone) to disseminate one specific application message m. In Appendix A, we discuss how to extend this algorithm so that it implements a general MBRB algorithm, allowing any node to be the sender, as well as allowing multiple instances of the MBRB, either with the same or different senders, to run concurrently. In the algorithm, we instantiate the threshold signature scheme with the threshold value set to τ=n+t2+1 (see Section 2).

Algorithm 2 introduces the MBRB-broadcast(m) operation, which takes message m and disseminates it reliably to a minimum bound of correct nodes, denoted . That is, after executing Algorithm 2, and assuming a correct sender, at least correct nodes will have invoked the MBRB-delivery(m) procedure, while no correct node will have invoked MBRB-delivery with mm. Table 1 summarizes the notations used by Algorithm 2.

Algorithm 1 Helper functions of the Coded MBRB Algorithm (code for pi).

Algorithm description.

MBRB-broadcast(m) allows the sender ps to start disseminating the application message, m (line 17). The sender (line 18) starts by invoking computeFragVecCommit(m) (Algorithm 1). This function encodes the message m using an error-correction code, divides it into n fragments and constructs a vector commitment with an inclusion proof for each fragment. The function returns several essential values: the commitment C, and the fragment details (m~j,πj,j), which contain the fragment data itself m~j (the j-th part of the codeword ECC(m); see below for detail), a proof of inclusion πj for that part, and each fragment’s respective index j. For ease of reference, let 𝖢𝗈𝗆𝗆𝗂𝗍𝗆𝖾𝗇𝗍(m) represent the commitment C obtained from computeFragVecCommit(m). This commitment serves as a compact representation of the entire message m. The sender node ps is responsible for signing the computed commitment C and generating a signature share 𝑠𝑖𝑔s (line 19) which includes ps’s identifier. The sender then initiates m’s propagation by employing the operation comm (line 20), which sends to each node, pj, an individual message, vj. The message vj includes several components: the message type (send), the commitment C, the j-th fragment details (m~j,πj,j), and the signature share 𝑠𝑖𝑔s (line 19) for C.

The rest of the algorithm progresses in two phases, which we describe in turn. The first phase is responsible for message dissemination, which forwards message fragments received by the sender. The other role of this phase is reaching a quorum of nodes that vouch for the same message. A node vouches for a single message by signing its commitment. Nodes collect and store signature shares until it is evident that sufficiently many nodes agree on the same message. The subsequent phase focuses on disseminating the quorum of signature shares so that it is observed by at least  correct nodes, and on successfully terminating while ensuring the delivery of the reconstructed message.

Algorithm 2 Phases of the Coded MBRB Algorithm (code for pi, single-shot, single-sender, threshold for the TS scheme τ=n+t2+1).

Validating message integrity. The validity of the signatures and inclusion proofs are checked each time a new message is received (at lines 21, 26, and 42) using the function isValid (Algorithm 1). All message types (send, forward, and bundle) carry a vector commitment (C or C) and up to two message fragments with their inclusion proofs. Moreover, the send and forward types contain up to two signature shares for the provided commitment, and the bundle type contains a threshold signature for the provided commitment. The validation hinges on three key criteria. Every enclosed signature share or threshold signature must be valid and correspond to the accompanying commitment. For send or forward messages, the signature share from the designated sending node ps must be present. All message fragments must contain valid inclusion proofs for the provided commitment. Note that πi, the proof of inclusion of m~i, does not need to be signed by ps, as the commitment already is.
Phase I: Message dissemination. This phase facilitates the widespread distribution of the message fragments m~j. Recall that the sender has sent each node a different (encoded) fragment of the message m, however, no node currently holds enough information to retrieve m. The phase also sets the ground for forming a quorum on m. When a node receives a send message from the sender, it begins by validating the fragment’s authenticity (line 22), and it forwards this fragment to all other nodes by broadcasting a forward message (line 25).

Upon receiving a send,C, (m~i,πi,i),𝑠𝑖𝑔s message from ps, the recipient pi validates the incoming message (line 22). pi then determines whether it had previously broadcast a forward message at line 25 or signed a commitment C′′ from ps distinct from the currently received C, in which case the incoming message is discarded (line 23). Otherwise, pi proceeds to store the received information (line 24), encompassing the fragment m~i and the associated signature share 𝑠𝑖𝑔s, linked to the specific commitment C. We clarify that pi never stores multiple copies of the same information, i.e., all store operations are to be read as adding an item to a set. Subsequently, pi generates its own signature share 𝑠𝑖𝑔i for the commitment C, storing it for later utilization (line 24). Node pi then disseminates all the relevant information, by broadcasting the message forward,C,(m~i,πi,i),{𝑠𝑖𝑔s,𝑠𝑖𝑔i}.

The broadcast of a forward message is instrumental in disseminating information for several reasons. First, up to d nodes might not receive the sender’s send message. Second, this is the node’s way to disseminate its own fragment and signature share for that specific C.

Upon the arrival of a forward,C,𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒j,𝑠𝑖𝑔𝑠j message from pj (line 26), the recipient pi validates the incoming message using the isValid function (Algorithm 1), discarding invalid messages (line 27). As for send messages, pi checks if it already signed a message from ps with a different commitment C′′, in which case it discards the message (line 28). Subsequently, pi stores the set of signature shares 𝑠𝑖𝑔𝑠j linked to the specific commitment C (line 29) and 𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒j contained in this message, if any (line 31). Also, pi assesses whether a forward message has been previously dispatched. If it has already done so, there is no reason to re-send it, and the processing ends here. Otherwise, similar to above, pi generates its own signature share 𝑠𝑖𝑔i for the commitment C, and broadcasts the message forward,C,,𝑠𝑖𝑔s,𝑠𝑖𝑔i. Note that, in this case, pi is unaware of his own fragment (i.e., it has not received a send message, or otherwise it would have already sent a forward message in line 25); therefore it sends the sentinel value  instead.
Phase II: Reaching quorum and termination. This phase relies on the getThreshSig function described in Algorithm 1, which, given a commitment C, either returns a threshold signature for C (received beforehand or aggregating τ=n+t2+1 signature shares stored for C) if it exists, or otherwise. This phase focuses on ensuring that, once a Byzantine quorum (represented by the threshold signature returned by getThreshSig) and enough message fragments for reconstructing the original message m are gathered, at least  correct nodes deliver m and terminate. Node pi enters Phase II only when there is a commitment C for which getThreshSig returns a valid threshold signature, and pi stores at least k message fragments. As long as no application message from ps was delivered (line 35), pi reconstructs the application message mi (line 36) using the stored message fragments, and use this message as an input to computeFragVecCommit (line 37), which outputs its commitment C=𝖢𝗈𝗆𝗆𝗂𝗍𝗆𝖾𝗇𝗍(mi) along with coded message fragments and proofs of inclusion, (m~j,πj,j). Node pi then ensures that the computed commitment C matches the stored commitment C (line 38). If this condition holds true, then mi=m is the message sent by the sender111To see why this is needed, consider a Byzantine sender that disseminates fragments m~1,,m~n, that do not form a proper codeword., and in particular, pi now holds all the possible fragments for m along with their valid proof of inclusion, including fragments it has never received before! Node pi then retrieves the threshold signature ΣC of C using the getThreshSig function (line 39), and disseminates it along with the message fragments to the rest of the network. In particular, to each pj in the network, pi sends a bundle message (line 40) that includes the commitment C, fragment details (m~i,πi,i) and (m~j,πj,j), and the associated threshold signature ΣC. After these transmissions, pi can MBRB-deliver the reconstructed message mi (line 41).

The parameter k used at line 35 is the number of (valid) fragments sufficient to reconstruct the application message m by the error-correction code ECC. This parameter should be practically selected by the desired given in Lemma A.12. That is, one needs to set ε>0 for =nt(1+ε)d and then choose k1+ε1+ε(ntd). See details in Appendix A.

Upon the arrival of a bundle message, i.e., bundle,C,(m~j,πj,j),𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒i,Σ arriving from pj (line 42), the recipient pi validates the received message using the isValid function (with the 𝑖𝑠𝑇ℎ𝑟𝑒𝑠ℎ𝑆𝑖𝑔 parameter set to 𝖳𝗋𝗎𝖾 to indicate that we verify a threshold signature) and discards invalid messages (line 43). Node pi proceeds to store the arriving information regarding the message segments (m~j,πj,j) and threshold signature Σ for the specific commitment C (line 44). In the case that no bundle message was sent by pi and the received 𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒i is nonempty (so pi learns its fragment, which it stores at line 47, unless already known), pi broadcasts a bundle,C,(m~i,πi,i),,Σ message (line 48).

The use of values appears also in bundle messages (lines 40 and 48). A bundle message might contain up to two fragments: the sender’s fragment (m~i in the pseudo-code), which is always included, and the receiver’s fragment (m~j), which is included only when the sender was able to reconstruct the message m (at line 36). The sender’s fragments are collected by the receivers and allow reconstruction of the message once enough bundle messages are received. The receiver’s fragment allows the receiver to send bundle messages (with its fragment), facilitating the dissemination of both threshold signatures and fragments.

The error-correction code in use.

computeFragVecCommit (Algorithm 1) uses an error-correction code at line 2 to encode the application message m, before it is split into n fragments that will be disseminated by the parties. The code uses a fixed parameter k, that can be set later. Our algorithm requires that the ECC will be able to decode the message m from any subset of k fragments out of the n fragments generated at line 3. That is, we need an ECC that can deal with erasures, where the erased symbols are those contained in the nk missing fragments. To that end, we use a Reed-Solomon code ECC:𝔽k¯𝔽n¯ with k¯>|m|/log|𝔽|. Each fragment contains n¯/n symbols of the codeword, and to be able to recover from (nk)n¯/n erased symbols by ˜2.2, we can set the code’s distance to be d¯>(nk)n¯/n. Since a Reed-Solomon code is MDS (see Section 2), d¯n¯k¯+1, and we can set n¯>nk(k¯1). The code will have a constant rate, i.e., |ECC(m)|=𝒪(|m|), as long as m is sufficiently long, i.e., |m|=Ω(nlog|𝔽|), which implies that k¯=Ω(n), and as long as k=Ω(n). Recall also that |𝔽|n¯ is in a Reed-Solomon code.

Analysis.

The following main theorem states that our algorithm is correct.

Theorem 3.1 (main).

Assume n>3t+2d, k(nt2d) and ε>0. Algorithm 2 implements MBRB with >nt(1+ε)d. Any algorithm activation on the input message m communicates 4n2 messages, where each node communicates 𝒪(|m|+nκ) bits.

Due to the page limit, the complete proof appear in Appendices A and B. Here, we sketch the proof of the MBRB-Global-delivery property in Lemma 3.2 (assuming the other properties hold) and the communication analysis in Lemma 3.3.

Lemma 3.2 (MBRB-Global-delivery).

If a correct node pi MBRB-delivers an application message m, then at least =cd/(1((k1)/(cd))) correct nodes MBRB-deliver m.

Proof Sketch of Lemma 3.2.

Let Cm=𝖢𝗈𝗆𝗆𝗂𝗍𝗆𝖾𝗇𝗍(m). The proof counts the bundle messages disseminated by correct nodes. If a correct node disseminates a bundle message both at lines 40 and 48, we only consider the one from line 40. Let B𝗌𝖾𝗇𝖽 be the set of correct nodes that disseminate at least one bundle message, let B𝗋𝖾𝖼𝗏 be the set of correct nodes that receive at least one valid bundle message from a correct node during their execution, and let Bk,𝗋𝖾𝖼𝗏 be the set of correct nodes that receive bundle messages from at least k distinct correct nodes. The following holds.

Observation 3.2.1.

c|B𝗋𝖾𝖼𝗏|cd and c|B𝗌𝖾𝗇𝖽|cd.

Proof of Observation 3.2.1.

Since B𝗌𝖾𝗇𝖽 and B𝗋𝖾𝖼𝗏 contain only correct nodes, trivially c|B𝗌𝖾𝗇𝖽| and c|B𝗋𝖾𝖼𝗏|. |B𝗋𝖾𝖼𝗏|cd and |B𝗌𝖾𝗇𝖽|cd follow from the definition of the message adversary, and the way the algorithm chains bundle messages.

Observation 3.2.2.

|Bk,𝗋𝖾𝖼𝗏|×|B𝗌𝖾𝗇𝖽|+(k1)(|B𝗋𝖾𝖼𝗏||Bk,𝗋𝖾𝖼𝗏|)|B𝗌𝖾𝗇𝖽|(cd).

Proof of Observation 3.2.2.

The inequality follows from a counting argument on the overall number of valid bundle messages received by correct nodes from distinct correct senders. In particular, we use the fact that nodes in B𝗋𝖾𝖼𝗏Bk,𝗋𝖾𝖼𝗏 each receives at most k1 valid bundle messages from distinct correct senders.

Observation 3.2.3.

|Bk,𝗋𝖾𝖼𝗏|cd/(1((k1)/(cd))).

Proof of Observation 3.2.3.

The observation is obtained by isolating Bk,𝗋𝖾𝖼𝗏 in Observation 3.2.2, and minimizing the right-hand side to remove the dependence on |B𝗌𝖾𝗇𝖽|.

Observation 3.2.4.

All nodes in Bk,𝗋𝖾𝖼𝗏 MBRB-deliver m.

Proof of Observation 3.2.4.

The observation results from the properties of the vector commitment scheme, the unforgeability of signatures, and the ability of the ECC scheme to reconstruct m from k distinct valid fragments for m.

As all nodes in Bk,𝗋𝖾𝖼𝗏 are correct, the above observations yield the lemma. (The full proof of this lemma can be found in Appendix A, page A.12.)

Lemma 3.3.

Correct nodes collectively communicate at most 4n2 messages. Each correct node sends at most 𝒪(|m|+nκ) bits. Overall, the system sends at most 𝒪(n|m|+n2κ) bits.

Proof Sketch of Lemma 3.3.

For concision, we use in the following the verb disseminate for referring to a correct node that sends a message to all n nodes of the system, whether it be using the broadcast operation or the comm operation.

Let us first consider message complexity. If the sender ps is correct, it disseminates one send message to all nodes at line 20, receives it immediately at line 21, and disseminates one forward message at line 25 (as ps cannot not pass line 32 afterward). Every other correct node disseminates at most two forward messages at lines 25 and 34. Moreover, each correct node disseminates at most two bundle messages at lines 40 and 48. This amounts to at most 4n2 messages sent by correct nodes.

Let us now analyze the bit complexity of the algorithm. If the sender ps is correct, it disseminates a send message (line 20) including a fragment of m (𝒪(|m|/n) bits) with its proof of inclusion (𝒪(κ) bits), a commitment (𝒪(κ) bits), and a signature share (𝒪(κ) bits). Thus, ps communicates at most 𝒪(|m|+nκ) bits in send messages. Moreover, a forward message contains a commitment (𝒪(κ) bits), at most one message fragment with its proof of inclusion (𝒪(|m|/n+κ) bits), and two signature shares (𝒪(κ) bits). Hence, every correct node communicates at most 𝒪(|m|+κn) bits in forward messages (line 25 or 34). Additionally, a bundle message contains a commitment (𝒪(κ) bits), at most two message fragments with their proof of inclusion (𝒪(|m|/n+κ) bits), and one threshold signature (𝒪(κ) bits). Therefore, every correct node communicates at most 𝒪(|m|+κn) bits in bundle messages (line 25 or 34). This amounts to a total communication of 𝒪(|m|+nκ) bits sent per correct node, and 𝒪(n|m|+n2κ) bits sent overall.

Let us remark that the above analysis also holds in the presence of Byzantine nodes, even if ps is Byzantine. (The full proof of this lemma can be found in Appendix B, page B.1.)

4 Conclusion

We introduced a Coded MBRB algorithm that significantly improves the state-of-the-art solution. It achieves optimal communication (up to the size of cryptographic parameter κ) while maintaining a high delivery power, i.e., it ensures that messages are delivered by at least =nt(1+ε)d correct nodes, where ε>0 is a tunable parameter. The proposed solution is deterministic up to its use of cryptography (threshold signatures and vector commitments). Each correct node sends no more than 4n messages and communicates at most 𝒪(|m|+nκ) bits, where |m| represents the length of the input message and κ is a security parameter. We note that the algorithm’s communication efficiency holds for sufficiently long messages and approaches the natural upper bound on delivery power, ntd, which accounts for the message adversary’s ability to isolate a subset of correct nodes. The proposed approach achieves a delivery power that can be made arbitrarily close to this limit, albeit with a marginal increase in communication costs, which depends on the chosen ε. This work represents a significant advancement in Byzantine Reliable Broadcast, offering a practical solution for robust communication in asynchronous message-passing systems with malicious nodes and message adversaries. One intriguing question is whether it is possible to devise an (M)BRB algorithm without the κ parameter in its communication complexity or the ϵ parameter in its delivery power , e.g., by leveraging randomization [1] or error-freedom [5].

References

  • [1] Ittai Abraham, T.-H. Hubert Chan, Danny Dolev, Kartik Nayak, Rafael Pass, Ling Ren, and Elaine Shi. Communication complexity of byzantine agreement, revisited. Distributed Comput., 36(1):3–28, 2023. doi:10.1007/S00446-022-00428-8.
  • [2] Timothé Albouy, Emmanuelle Anceaume, Davide Frey, Mathieu Gestin, Arthur Rauch, Michel Raynal, and François Taïani. Asynchronous BFT asset transfer: quasi-anonymous, light, and consensus-free. CoRR, arXiv:2405.18072, 2024. doi:10.48550/arXiv.2405.18072.
  • [3] Timothé Albouy, Davide Frey, Michel Raynal, and François Taïani. A modular approach to construct signature-free BRB algorithms under a message adversary. In 26th International Conference on Principles of Distributed Systems (OPODIS 2022), volume 253 of LIPIcs, pages 26:1–26:23, 2023. doi:10.4230/LIPICS.OPODIS.2022.26.
  • [4] Timothé Albouy, Davide Frey, Michel Raynal, and François Taïani. Asynchronous Byzantine reliable broadcast with a message adversary. Theor. Comput. Sci., 978:114110, 2023. doi:10.1016/J.TCS.2023.114110.
  • [5] Nicolas Alhaddad, Sourav Das, Sisi Duan, Ling Ren, Mayank Varia, Zhuolun Xiang, and Haibin Zhang. Balanced Byzantine reliable broadcast with near-optimal communication and improved computation. In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, PODC’22, pages 399–417, 2022. doi:10.1145/3519270.3538475.
  • [6] Piotr Berman, Krzysztof Diks, and Andrzej Pelc. Reliable broadcasting in logarithmic time with Byzantine link failures. Journal of Algorithms, 22(2):199–211, 1997. doi:10.1006/jagm.1996.0810.
  • [7] Martin Biely. An optimal Byzantine agreement algorithm with arbitrary node and link failures. In IASTED Parallel and Distributed Computing and Systems, volume 1, pages 146–151, 2003. URL: https://www.auto.tuwien.ac.at/Projects/W2F/documents/pdcs03.ps.gz.
  • [8] Martin Biely, Ulrich Schmid, and Bettina Weiss. Synchronous consensus under hybrid process and link failures. Theoretical Computer Science, 412(40):5602–5630, 2011. doi:10.1016/j.tcs.2010.09.032.
  • [9] Silvia Bonomi, Jérémie Decouchant, Giovanni Farina, Vincent Rahli, and Sébastien Tixeuil. Practical Byzantine reliable broadcast on partially connected networks. In ICDCS, pages 506–516. IEEE, 2021. doi:10.1109/ICDCS51616.2021.00055.
  • [10] Gabriel Bracha. Asynchronous byzantine agreement protocols. Inf. Comput., 75(2):130–143, 1987. doi:10.1016/0890-5401(87)90054-X.
  • [11] Gabriel Bracha and Sam Toueg. Resilient consensus protocols. In Proceedings of the Second Annual ACM Symposium on Principles of Distributed Computing, pages 12–26. ACM, August 1983. doi:10.1145/800221.806706.
  • [12] Gabriel Bracha and Sam Toueg. Asynchronous consensus and broadcast protocols. J. ACM, 32(4):824–840, 1985. doi:10.1145/4221.214134.
  • [13] C. Cachin and J.A. Poritz. Secure INtrusion-tolerant replication on the internet. In Proceedings International Conference on Dependable Systems and Networks (DSN), pages 167–176, 2002. doi:10.1109/DSN.2002.1028897.
  • [14] C. Cachin and S. Tessaro. Asynchronous verifiable information dispersal. In 24th IEEE Symposium on Reliable Distributed Systems (SRDS’05), pages 191–201, 2005. doi:10.1109/RELDIS.2005.9.
  • [15] Dario Catalano and Dario Fiore. Vector commitments and their applications. In Proc. 16th Int’l Conference on Practice and Theory in Public-Key Cryptography (PKC’13), volume 7778 of Lecture Notes in Computer Science, pages 55–72. Springer, 2013. doi:10.1007/978-3-642-36362-7_5.
  • [16] Keren Censor-Hillel, Shir Cohen, Ran Gelles, and Gal Sela. Distributed computations in fully-defective networks. In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, PODC’22, pages 141–150, 2022. doi:10.1145/3519270.3538432.
  • [17] Keren Censor-Hillel, Ran Gelles, and Bernhard Haeupler. Making asynchronous distributed computations robust to noise. Distributed Computing, 32(5):405–421, 2019. doi:10.1007/s00446-018-0343-5.
  • [18] Sourav Das, Zhuolun Xiang, and Ling Ren. Asynchronous data dissemination and its applications. In CCS, pages 2705–2721. ACM, 2021. doi:10.1145/3460120.3484808.
  • [19] Pallab Dasgupta. Agreement under faulty interfaces. Information Processing Letters, 65(3):125–129, 1998. doi:10.1016/S0020-0190(97)00202-0.
  • [20] Danny Dolev. The Byzantine generals strike again. Journal of Algorithms, 3(1):14–30, 1982. doi:10.1016/0196-6774(82)90004-9.
  • [21] Danny Dolev and Rüdiger Reischuk. Bounds on information exchange for Byzantine agreement. J. ACM, 32(1):191–204, January 1985. doi:10.1145/2455.214112.
  • [22] Sisi Duan, Michael K. Reiter, and Haibin Zhang. BEAT: asynchronous BFT made practical. In CCS, pages 2028–2041. ACM, 2018. doi:10.1145/3243734.3243812.
  • [23] Romaric Duvignau, Michel Raynal, and Elad M. Schiller. Self-stabilizing Byzantine fault-tolerant repeated reliable broadcast. In Stabilization, Safety, and Security of Distributed Systems, pages 206–221. Springer International Publishing, 2022. doi:10.1007/978-3-031-21017-4_14.
  • [24] Fabian Frei, Ran Gelles, Ahmed Ghazy, and Alexandre Nolin. Content-oblivious leader election on rings. In 38th International Symposium on Distributed Computing (DISC 2024), volume 319, pages 24:1–24:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.DISC.2024.26.
  • [25] Ran Gelles. Coding for interactive communication: A survey. Foundations and Trends® in Theoretical Computer Science, 13(1–2):1–157, 2017. doi:10.1561/0400000079.
  • [26] Li Gong, Patrick Lincoln, and John Rushby. Byzantine agreement with authentication: Observations and applications in tolerating hybrid and link faults. In Dependable Computing and Fault Tolerant Systems, volume 10, pages 139–158. IEEE, 1995.
  • [27] Rachid Guerraoui, Jovan Komatovic, Petr Kuznetsov, Yvonne-Anne Pignolet, Dragos-Adrian Seredinschi, and Andrei Tonkikh. Dynamic Byzantine reliable broadcast. In 24th International Conference on Principles of Distributed Systems (OPODIS 2020), volume 184, pages 23:1–23:18, Dagstuhl, Germany, 2021. doi:10.4230/LIPIcs.OPODIS.2020.23.
  • [28] William M. Hoza and Leonard J. Schulman. The adversarial noise threshold for distributed protocols. In ACM-SIAM Symposium on Discrete Algorithms, pages 240–258, 2016. doi:10.1137/1.9781611974331.ch18.
  • [29] Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. The Byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, 1982. doi:10.1145/357172.357176.
  • [30] Alexandre Maurer and Sébastien Tixeuil. Self-stabilizing Byzantine broadcast. In 2014 IEEE 33rd International Symposium on Reliable Distributed Systems, pages 152–160, 2014. doi:10.1109/SRDS.2014.10.
  • [31] Ralph C. Merkle. A certified digital signature. In Gilles Brassard, editor, Advances in Cryptology — CRYPTO’ 89 Proceedings, pages 218–238, New York, NY, 1990. Springer New York. doi:10.1007/0-387-34805-0_21.
  • [32] Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. The honey badger of BFT protocols. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, pages 31–42, 2016. doi:10.1145/2976749.2978399.
  • [33] Kartik Nayak, Ling Ren, Elaine Shi, Nitin H. Vaidya, and Zhuolun Xiang. Improved extension protocols for Byzantine broadcast and agreement. In 34th International Symposium on Distributed Computing (DISC 2020), volume 179, pages 28:1–28:17, 2020. doi:10.4230/LIPIcs.DISC.2020.28.
  • [34] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, April 1980. doi:10.1145/322186.322188.
  • [35] Andrzej Pelc. Reliable communication in networks with Byzantine link failures. Networks, 22(5):441–459, 1992. doi:10.1002/net.3230220503.
  • [36] Kenneth J. Perry and Sam Toueg. Distributed agreement in the presence of processor and communication faults. IEEE Trans. Softw., SE-12(3):477–482, 1986. doi:10.1109/TSE.1986.6312888.
  • [37] Michel Raynal. Message adversaries. In Encyclopedia of Algorithms, pages 1272–1276. Springer, 2016. doi:10.1007/978-1-4939-2864-4_609.
  • [38] Michel Raynal. Fault-Tolerant Message-Passing Distributed Systems - An Algorithmic Approach. Springer, 2018. doi:10.1007/978-3-319-94141-7.
  • [39] Irving S. Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960.
  • [40] Ron M. Roth. Introduction to coding theory. Cambridge University Press, 2006.
  • [41] Nicola Santoro and Peter Widmayer. Time is not a healer. In STACS 89, pages 304–313, Berlin, Heidelberg, 1989. Springer Berlin Heidelberg. doi:10.1007/BFb0028994.
  • [42] Nicola Santoro and Peter Widmayer. Distributed function evaluation in the presence of transmission faults. In International Symposium on Algorithms, pages 358–367. Springer, 1990. doi:10.1007/3-540-52921-7_85.
  • [43] Nicola Santoro and Peter Widmayer. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci., 384(2-3):232–249, 2007. doi:10.1016/J.TCS.2007.04.036.
  • [44] Ulrich Schmid and Christof Fetzer. Randomized asynchronous consensus with imperfect communications. In SRDS, pages 361–370. IEEE Computer Society, 2003. doi:10.1109/RELDIS.2003.1238089.
  • [45] Victor Shoup. Practical threshold signatures. In Bart Preneel, editor, Proc. Int’l Conference on the Theory and Application of Cryptographic Techniques (EUROCRYPT’2000), volume 1807 of Lecture Notes in Computer Science, pages 207–220. Springer, 2000. doi:10.1007/3-540-45539-6_15.
  • [46] Hin-Sing Siu, Yeh-Hao Chin, and Wei-Pang Yang. Byzantine agreement in the presence of mixed faults on processors and links. IEEE Transactions on Parallel and Distributed Systems, 9(4):335–345, April 1998. doi:10.1109/71.667895.
  • [47] Lei Yang, Seo Jin Park, Mohammad Alizadeh, Sreeram Kannan, and David Tse. Dispersedledger: High-throughput byzantine consensus on variable bandwidth networks. In NSDI, pages 493–512. USENIX Association, 2022. URL: https://www.usenix.org/conference/nsdi22/presentation/yang.

Appendix A Correctness analysis

We now analyze Algorithm 2 and show it satisfies the MBRB properties specified in Definition 2.3. This analysis, along with the communication analysis in Lemma B.1 prove our main Theorem 3.1.

Assumption A.1 (coded-MBRB-assumption).

n>3t+2d and k(nt)2d.

Theorem A.2.

For any network that satisfies ˜A.1 and for any ε>0, Algorithm 2 implements an MBRB algorithm (Definition 2.3) with >nt(1+ε)d.

Recall that any MBRB algorithm requires n>3t+2d [4]. Our coded MBRB algorithm uses an error-correction code that can reconstruct any encoded message from k fragments of the codeword. While we have some flexibility in selecting the value of k, which affects the parameters of the ECC and thus the communication complexity, our proof requires that k will not be too large. We begin with a few technical lemmas.

Lemma A.3.

If a correct node pu stores a message fragment m~j associated to a proof of inclusion πj for some commitment C and node identity j, then πj is valid with respect to C, that is 𝗏𝖼_𝗏𝖾𝗋𝗂𝖿𝗒(C,πj,m~j,j)=valid.

Proof of Lemma A.3.

A correct node stores fragments for a commitment C at lines 24, 31, 44, and 47, when receiving send, forward, or bundle messages, respectively. The fragments stored at these lines and their proof have been received through the corresponding message, whose content is verified by a call to isValid (at lines 22, 27, and 43). isValid (described in Algorithm 1) checks that proofs of inclusion are valid for the corresponding commitment.

The following notion of valid messages will be used throughout the analysis to indicate messages containing only valid information, as the algorithm dictates.

Definition A.4 (Valid messages).

We say that a message of type send, forward, or bundle is valid if and only if isValid returns 𝖳𝗋𝗎𝖾 at line 22, 27, or 43, respectively, upon the receipt of that message.

Operatively, valid messages satisfy the following which is immediate from the definition of the isValid function (Algorithm 1).

Corollary A.5.

To be valid, a message must meet the following criteria: (i) all the signatures shares or threshold signatures it contains must be valid and correspond to the commitment included in the message; (ii) if it is of type send or forward, it must contain a signature by the designated sending node ps; and (iii) all inclusion proofs must be valid with respect to the commitment included in the message.

We now show that the correct parties always send valid messages. However, they might receive invalid messages sent by Byzantine nodes.

Lemma A.6.

All send, forward, or bundle messages sent by a correct node pu, are valid.

Proof of Lemma A.6.

The only correct node that sends send messages is ps at line 20. Indeed, when ps is correct, this message will contain a valid signature share by ps and all proofs of inclusion are valid, by lines 18 and 19.

Now consider a forward message sent either at lines 25 or 34. To reach there, pu must have passed line 22 or line 27, which guarantees pu received a valid signature for C made by ps (where C is the commitment in the received message triggering this code). Then, at line 24 or at line 29, pu stores a signature of ps for this C, and at lines 24 and 33, pu signs the same C. Thus, conditions (i) and (ii) of Corollary A.5 hold, and if the forward is sent at line 34, then condition (iii) vacuously holds as well. If the forward message is sent at line 25, it contains a fragment that was stored by pu for the same C, and by Lemma A.3, its associated proof of inclusion is valid; thus condition (iii) holds in this case as well.

Finally, consider a bundle message. First off, this type of message is not concerned by condition (ii) of Corollary A.5. For the transmission at line 40, condition (i) follows from the construction of the threshold signature ΣC at line 39. ΣC is guaranteed to be non- by the condition at line 35 of Algorithm 2, and is provided by the helper function getThreshSig() (line 16 of Algorithm 1). When executing getThreshSig(), the first possibility is that ΣC is already known by pu because it was received by pu at line 42 and stored at line 44. In this case, the validity of ΣC is ensured by the check at line 43. The second possibility is that ΣC aggregates τ=n+t2+1 signature shares received by pu at line 21 or line 26, and stored at line 24 or line 29, respectively. In this case, the validity of all these signature shares is ensured by the checks at line 22 and line 27, respectively, and thus the aggregated threshold signature ΣC is also valid. Condition (iii) follows since the proofs of inclusion were computed at line 37 by pu and match the same commitment C used in that bundle message, as enforced by line 38. For the broadcast at line 48, conditions (i) and (iii) follow since the threshold signature Σ and the fragment tuple (m~j,πj,j) come from the incoming bundle message at line 42, whose validity (w.r.t. C) has been verified at line 43.

Lemma A.7.

A correct node p signs a most a single commitment C.

Proof of Lemma A.7.

p signs a commitment either at line 19 (for ps), 24 or 33. We consider two cases, depending on whether p is ps or not.

  • Case 1: Assume pps. p can sign some commitment only at line 24 or 33. By the conditions at lines 23 and 28, lines 24 and 33 are executed only if either p has not signed any commitment yet, or has already signed the exact same commitment C.

  • Case 2: If p=ps, because valid messages must contain ps’s signature share (due to calls to isValid() at lines 22 and 27), and because we have assumed that signatures cannot be forged, line 19 is always executed before lines 23 and 32. By the same reasoning as Case 1, ps therefore never signs a different commitment at line 24 or 33.

We recall that the above lemmas, and as a consequence, the upcoming theorems, hold with high probability, assuming a computationally-bounded adversary that forges signature shares/threshold signatures or finds commitment collisions with only negligible probability. We can now prove the properties required for an MBRB algorithm, as depicted in Definition 2.3.

Lemma A.8 (MBRB-Validity).

Suppose ps is correct and a correct node, pi, MBRB-delivers an application message, m. Then, node ps has previously MBRB-broadcast m.

Proof of Lemma A.8.

Suppose pi MBRB-delivers m at line 41. Consider C the commitment that renders true the condition at line 35, and C the commitment that is computed at line 37. It holds that C=C by line 38, or otherwise pi could not have reached line 41.

Consider the threshold signature ΣC returned by the getThreshSig function at line 39. Using the same reasoning as in the proof of Lemma A.6, ΣC is valid, and must, therefore, aggregate at least τ=n+t2+1 valid signature shares for C. Let us remark that, out of all these valid signature shares, at least n+t2+1t=nt2+11 are generated by correct nodes222 Remind that, x,i:x+i=x+i. . Thus, at least one correct node pj must have produced a signature share for C, whether it be at line 24 or line 33 if pjps, or at line 19 if pj=ps. However, in all these cases, the sender ps must have necessarily produced a signature share for C: the case pj=ps is trivial, and in the case pjps, pj must have verified the presence of a valid signature share from ps in the message it received, at line 22 or line 27, respectively.

Under the assumption that the adversary cannot forge signature shares/threshold signatures (Section 2), and recalling that ps is correct, the only way in which ps could have signed C is by executing line 19 when MBRB-broadcasting some message m at line 17; see also the proof of Lemma A.7. Furthermore, recall that the commitment is collision-resistant (or binding, see Section 2), meaning that except with negligible probability, the message m that ps uses in line 17 satisfies m=m, since it holds that C=𝖢𝗈𝗆𝗆𝗂𝗍𝗆𝖾𝗇𝗍(m)=𝖢𝗈𝗆𝗆𝗂𝗍𝗆𝖾𝗇𝗍(m)=C.

Lemma A.9 (MBRB-No-duplication).

A correct node pi MBRB-delivers at most one application message, m.

Proof of Lemma A.9.

The condition at line 35 directly implies the proof.

Lemma A.10 (MBRB-No-duplicity).

No two different correct nodes MBRB-deliver different application messages.

Proof of Lemma A.10.

Suppose, towards a contradiction, that pi MBRB-delivers m and pj MBRB-delivers mm, where pi and pj are both correct nodes. Let us denote by C, resp. C, the commitment returned by computeFragVecCommit() for m, resp. for m. As commitments are assumed to be collision-resistant (Section 2), mm implies CC.

By the condition at line 35, pi gets a threshold signature Σi from the getThreshSig function that aggregates a set Qi containing τ=n+t2+1 valid signature shares for C. Similarly, pj gets a threshold signature Σj aggregating a set Qj of signature shares for C. We know that |QiQj|=|Qi|+|Qj||QiQj|. Moreover, we know that, x,k:k=x+1k>x, and hence we have Qi>n+t2<Qj. Thus, |QiQj||Qi|+|Qj|n>2n+t2n=t. In other words, Qi and Qj have at least one correct node, pu, in common that has signed both C and C. Lemma A.7, and the fact that pu has signed both C and C leads the proof to the needed contradiction. Thus, m=m, and the lemma holds.

Lemma A.11 (MBRB-Local-delivery).

Suppose ps is correct and MBRB-broadcasts m. At least one correct node, say, pj, MBRB-delivers m.

Proof of Lemma A.11.

Let us denote by Cm the commitment computed at line 4 when executing computeFragVecCommit(m). The proof of the lemma will follow from Observations A.11.1A.11.8 stated and proven below.

Observation A.11.1.

All valid send, forward, or bundle messages received by some correct node pu contain Cm.

Proof of Observation A.11.1.

Recall that ps MBRB-broadcasts m, thus we know that ps has included its own signature share, 𝑠𝑖𝑔s=(𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾s(Cm),s), when it propagates send,Cm,(m~j,πj,j),𝑠𝑖𝑔s (lines 1820). Consider a correct node pu that receives a valid send, forward, or bundle message containing a commitment Cu at lines 21, 26, or 42. If the message is of type send or forward, then, as it is valid, it must contain ps’s signature on Cu. If the message is of type bundle, then, similarly to Lemma A.8, its valid threshold signature for Cu aggregates a set of valid signature shares for Cu that contains at least one share produced by a correct node px. But for px to produce this share, ps must also have produced a valid signature share for Cu, either because px must have checked its existence at line 22 or line 27 (prior to signing, at line 24 or line 33, respectively), or because px is the sender. Hence, in any case, ps produces a signature share for Cu. Since ps is correct, by Lemma A.7, it does not sign another commitment CCm. Under the assumption that signatures cannot be forged, the above implies that Cu=Cm.

Observation A.11.2.

A correct node pu only signs valid signature shares for Cm.

Proof of Observation A.11.2.

If pu=ps, it MBRB-broadcasts a single message and executes line 19 only once, signing Cm. Besides line 19, a correct node pu only signs signature shares after receiving a valid send or forward message (at lines 24 and 33), and when it does, pu only ever signs the commitment received in the message. By Observation A.11.1, this implies pu never signs any CCm.

Observation A.11.3.

If a correct node pu broadcasts a forward message, this message is of the form forward,Cm,-,{𝑠𝑖𝑔s,𝑠𝑖𝑔u}, where 𝑠𝑖𝑔s,𝑠𝑖𝑔u are ps’s and pu’s valid signature shares for Cm.

Proof of Observation A.11.3.

Consider a correct node pu that broadcasts a message forward,C,-,𝑠𝑖𝑔𝑠 either at lines 25 or 34. By Observation A.11.1, C=Cm. The observation then follows from Lemma A.6.

We now define F𝗋𝖾𝖼𝗏 to be the set of correct nodes that receive a valid message forward,Cm,-,𝑠𝑖𝑔𝑠 at line 26, where 𝑠𝑖𝑔𝑠 contains ps’s valid signature for Cm. We analyze its size and the behavior of such nodes in the following observations.

Observation A.11.4.

F𝗋𝖾𝖼𝗏 contains at least one correct node, i.e., F𝗋𝖾𝖼𝗏.

Proof of Observation A.11.4.

If ps is correct and MBRB-broadcasts m, it executes line 20 and disseminates messages of the form send,Cm,(m~j,πj),𝑠𝑖𝑔s to all nodes, where 𝑠𝑖𝑔s is ps’s signature share of Cm. By definition of the message adversary, these send messages are received by at least cd correct nodes.

By ˜A.1, n>3t+2d, and therefore cdntd>0. At least one correct node px, therefore, receives one of the send messages disseminated by ps at line 20. As ps is correct, by Lemma A.6, this message is valid, and is handled by px at lines 2125.

By Observation A.11.2, px only signs signature shares for Cm, and thus passes the test at line 23, and reaches line 25, where it disseminates a forward message. By Observation A.11.3, this message is of the form forward,Cm,-,{𝑠𝑖𝑔s,𝑠𝑖𝑔x}, and is valid. As above, by definition of the message adversary, this forward message is received by at least cd>0 correct nodes. By definition these nodes belong to F𝗋𝖾𝖼𝗏, which yield |F𝗋𝖾𝖼𝗏|>0 and F𝗋𝖾𝖼𝗏.

Observation A.11.5.

Any puF𝗋𝖾𝖼𝗏 broadcasts a forward,Cm,-, {𝑠𝑖𝑔s,𝑠𝑖𝑔u} message, where 𝑠𝑖𝑔s and 𝑠𝑖𝑔u are ps and pu’s valid signature shares for Cm, respectively.

Proof of Observation A.11.5.

Let puF𝗋𝖾𝖼𝗏 upon receiving a valid forward message forward,Cm,-,𝑠𝑖𝑔𝑠 at line 26. By the condition of line 32, pu has either previously sent a forward message at line 25 or it will send such a message at line 34. In both cases, Observation A.11.3 applies and guarantees that this message contains Cm and both pu’s and ps’s valid signature shares.

Note that F𝗋𝖾𝖼𝗏 is defined over an entire execution of Algorithm 2. Observation A.11.5 therefore states that any correct node pu that receives a valid forward message at some point of its execution also broadcasts a matching forward message at some point of its execution. The two events (receiving and sending a forward message) might, however, occur in any order. For instance, pu might first receive a send message from ps at line 21, disseminate a forward message as a result at line 25, and later on possibly receive a forward message from some other node at line 26. Alternatively, pu might first receive a forward message at line 26, and disseminate its own forward message at line 34 as a result. In this second case, pu might also eventually receive a send message from ps (at line 21). If this happens, pu will disseminate a second forward message at line 25. A correct node, however, never disseminates more than two forward messages (at lines 25 and 34).

Observation A.11.6.

Any broadcast of forward,Cm,𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒,{𝑠𝑖𝑔s,𝑠𝑖𝑔u} by a correct puF𝗋𝖾𝖼𝗏 arrives to at least cd correct nodes that are each, eventually, in F𝗋𝖾𝖼𝗏.

Proof of Observation A.11.6.

Each broadcast by a correct pu of a forward message is eventually received by at least cd correct nodes by definition of the message adversary. By Observation A.11.3 the forward message contains Cm, by Lemma A.6 it is valid. Thus, each of its at least cd correct recipients belong in F𝗋𝖾𝖼𝗏, by definition.

Because forward messages are disseminated at line 34, the reception and sending of forward messages by correct nodes will induce a “chain reaction” until a correct node is reached that has already disseminated a forward message. This “chain reaction” mechanism is the intuitive reason why some correct node will eventually receive enough distinct forward messages to trigger an MBRB-delivery, as captured by the following observation.

Observation A.11.7.

There exists a correct node pj that receives messages forward,Cm,-,𝑠𝑖𝑔𝑠u={𝑠𝑖𝑔s,𝑠𝑖𝑔u} from at least (cd) distinct correct nodes pu, where 𝑠𝑖𝑔s=(𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾s(Cm),s) and 𝑠𝑖𝑔u=(𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾u(Cm),u) are ps and pu’s valid signature shares for Cm, respectively, and the forward message is the last message sent by pu.

Proof of Observation A.11.7.

By Observation A.11.5, any nodes puF𝗋𝖾𝖼𝗏 broadcasts at least one message forward,Cm,-,𝑠𝑖𝑔𝑠u={𝑠𝑖𝑔s,𝑠𝑖𝑔u}, that includes pu’s valid signature share for Cm, 𝑠𝑖𝑔u= (𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾u(Cm),u). Consider all the forward messages sent by nodes in F𝗋𝖾𝖼𝗏 during the last time they perform such a broadcast. By Observation A.11.6, there are |F𝗋𝖾𝖼𝗏| senders, puF𝗋𝖾𝖼𝗏, such that each of pu’s last broadcast of a forward message is guaranteed to be delivered to at least cd correct nodes px, such that eventually pxF𝗋𝖾𝖼𝗏. Thus, at least |F𝗋𝖾𝖼𝗏|(cd) such messages are received by nodes in F𝗋𝖾𝖼𝗏, overall. By Observation A.11.4, F𝗋𝖾𝖼𝗏 contains at least one node. We can, therefore, apply the pigeonhole principle, where F𝗋𝖾𝖼𝗏 are the holes and the above |F𝗋𝖾𝖼𝗏|(cd) messages are the pigeons, and observe that there exists a node pjF𝗋𝖾𝖼𝗏 that will receive at least |F𝗋𝖾𝖼𝗏|(cd)/|F𝗋𝖾𝖼𝗏| such messages. Since we limit the discussion to a single, i.e., the last, broadcast performed by each node in F𝗋𝖾𝖼𝗏, no node in F𝗋𝖾𝖼𝗏 receives two of the above messages that were originated by the same node in F𝗋𝖾𝖼𝗏. Therefore, we deduce that pj has received messages of the form forward,Cm,-,𝑠𝑖𝑔𝑠u from at least (cd) distinct correct nodes pu and the forward message is the last message sent by pu.

Observation A.11.8.

At least one correct node MBRB-delivers m from ps.

Proof of Observation A.11.8.

By Observation A.11.7, there is a correct node pj that receives messages of the form forward,Cm,-,𝑠𝑖𝑔𝑠u from at least (cd) distinct correct nodes pu, such that these forward messages are the last message sent by each pu. Let us denote by U the set of such nodes pu, hence, |U|cd.

Still by Observation A.11.7, pj receives a valid signature share 𝑠𝑖𝑔u=(𝗍𝗌_𝗌𝗂𝗀𝗇_𝗌𝗁𝖺𝗋𝖾u(Cm),u) from each node puU. It thus receives at least (cd) distinct signature shares for Cm. ˜A.1 says 3t+2d<n, and thus, n+3t+2d<2n and n+t<2n2t2d. Since ntc, we have (n+t)/2<ntdcd. Thus, pj receives more than (n+t)/2 valid distinct signature shares for Cm.

Let us now consider the set of correct nodes S that receive the initial send messages disseminated by ps at line 20. Any node pxS receives through the send message its allocated fragment (m~x,πx,x) from ps. By definition of the message adversary, the send messages disseminated at line 20 are received by at least cd correct nodes, therefore |S|cd. Furthermore, all nodes in S broadcast a forward message at line 25, and this will be their last forward message, due to the condition of line 32. By the above reasoning, this forward message will contain their message fragment, that is, it will be of the form forward,Cm,(m~x,πx,x),𝑠𝑖𝑔𝑠u. By Lemma A.6, they are all valid.

By definition of S and U, both these sets contain only correct nodes, thus, |SU|c. As a result, |SU|=|S|+|U||SU|2×(cd)c=c2d. The last forward messages broadcast by nodes in SU are received by pj by the definition of U. As argued above about nodes in S (and thus applying to nodes in SU), forward messages sent by a node in SU contain their valid message fragment and proof of inclusion (m~x,πx,x). It follows that pj accumulates at least c2d distinct such message fragments with their (valid) proof of inclusion. By ˜A.1, c2dk.

To conclude the proof, note that we have shown that pj eventually receives more than (n+t)/2 valid distinct signature shares for Cm, and additionally, that pj accumulates at least k valid message fragments with their proof of inclusion. At this point, the condition of line 35 becomes 𝖳𝗋𝗎𝖾 for pj. Because the commitment is collision-resistant (Section 2), once Cm is fixed, we can assume that, except with negligible probability, all the message fragments that pj has received correspond to the fragments computed by ps at line 18. By the parameters of the ECC we use, it can recover the message m from any k or more (correct) fragments generated by ps, where missing fragments are considered as erasures. Therefore, the message mj reconstructed at line 36 by pj is the message initially MBRB-broadcast by ps. As a result mj=m, and pj MBRB-delivers m at line 41.

Lemma A.12 is the detailed version of Lemma 3.2.

Lemma A.12 (MBRB-Global-delivery).

Suppose a correct node, pi, MBRB-delivers an application message m. At least =cd(11(k1cd)) correct nodes MBRB-deliver m.

Proof of Lemma A.12.

Suppose a correct node pi MBRB-delivers m (line 41). Let us denote by Cm the commitment returned by computeFragVecCommit(m). The proof follows a counting argument on the bundle messages disseminated by correct nodes at lines 40 and 48. In the following, if a correct node disseminates a bundle message both at lines 40 and 48, we only consider the one from line 40.

Observation A.12.1.

All valid bundle messages exchanged during the execution of Algorithm 2 contain Cm, the commitment of the message m, where m is the message MBRB-delivered by pi.

Proof of Observation A.12.1.

Consider a valid message bundle,C,𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒j,𝑓𝑟𝑎𝑔𝑡𝑢𝑝𝑙𝑒i,Σ. By definition of a valid bundle message, Σ aggregates a set 𝑠𝑖𝑔𝑠 of τ=n+t2+1 valid signature shares for C. Similarly, when pi MBRB-delivers m at line 41, it has a threshold signature Σm which aggregates a set 𝑠𝑖𝑔𝑠m of τ=n+t2+1 valid signature shares for Cm. By a reasoning identical to that of Lemma A.10, these two inequalities imply that 𝑠𝑖𝑔𝑠𝑠𝑖𝑔𝑠m contains the signature shares from at least one common correct node, pu. As signature shares cannot be forged, pu has issued signature shares for both C and Cm, and by Lemma A.7, C=Cm. To complete the proof, note that by the definition of a valid bundle message, the threshold signature it contains is valid with respect to the commitment it carries. Hence, all valid bundle messages must contain the commitment Cm of the application message m that matches the threshold signature Σ they contain.

Let B𝗌𝖾𝗇𝖽 be the set of correct nodes that disseminate at least one bundle message during their execution. Similarly, let B𝗋𝖾𝖼𝗏 be the set of correct nodes that receive at least one valid bundle message from a correct node during their execution. The following holds.

Observation A.12.2 is a detailed version of Observation 3.2.1 in Lemma 3.2.

Observation A.12.2.

c|B𝗋𝖾𝖼𝗏|cd and c|B𝗌𝖾𝗇𝖽|cd.

Proof of Observation A.12.2.

Since B𝗌𝖾𝗇𝖽 and B𝗋𝖾𝖼𝗏 contain only correct nodes, trivially c|B𝗌𝖾𝗇𝖽| and c|B𝗋𝖾𝖼𝗏|. Since pi MBRB-delivers m at line 41, it must have disseminated bundle messages of the form bundle,Cm,(m~i,πi,i),(m~j,πj,j),ΣC at line 40. The bundle messages sent by pi eventually reach at least cd correct nodes, as the message adversary can remove at most d of these bundle messages. By Lemma A.6, these bundle messages are valid. Hence, B𝗋𝖾𝖼𝗏cd>0 proves the lemma’s first part.

The nodes in B𝗋𝖾𝖼𝗏 (which are correct) execute lines 42 and 43, and reach line 45. Because pi has included a non- second fragment in all its bundle message, any of the (cd) nodes of B𝗋𝖾𝖼𝗏 that receive one of pi’s bundle messages and has not already sent a bundle message passes the condition at line 45. Each such node then disseminates a (valid) bundle message at line 48. This behavior yields |B𝗌𝖾𝗇𝖽|cd.

Let Bk,𝗋𝖾𝖼𝗏 be the set of correct nodes that receive bundle messages from at least k distinct correct nodes.

Observation A.12.3 is a detailed version of Observation 3.2.2 in Lemma 3.2.

Observation A.12.3.

|Bk,𝗋𝖾𝖼𝗏|×|B𝗌𝖾𝗇𝖽|+(k1)(|B𝗋𝖾𝖼𝗏||Bk,𝗋𝖾𝖼𝗏|)|B𝗌𝖾𝗇𝖽|(cd).

Figure 1: Distribution of distinct bundle messages received by correct nodes. The proof of Observation A.12.4 shows that |B𝗌𝖾𝗇𝖽|>k1.

Proof of Observation A.12.3.

Let us denote by #bundle the overall number of valid bundle messages received by correct nodes from distinct correct senders. More specifically, in the case when a correct node disseminates bundle messages both at lines 40 and 48, we only consider the last bundle message, i.e., the one of line 40. We know that each pB𝗌𝖾𝗇𝖽 sends a bundle message, which by Lemma A.6 is valid. As the message adversary may drop up to d out of the n messages of this comm, we are guaranteed that at least cd correct nodes receive p’s bundle message. This immediately implies that

#bundle|B𝗌𝖾𝗇𝖽|(cd). (1)

As illustrated in Figure 1, the nodes in Bk,𝗋𝖾𝖼𝗏 may receive up to |B𝗌𝖾𝗇𝖽| valid bundle messages from distinct correct senders (one from each sender in B𝗌𝖾𝗇𝖽), for a maximum of |Bk,𝗋𝖾𝖼𝗏|×|B𝗌𝖾𝗇𝖽| bundle messages overall. The remaining nodes of B𝗋𝖾𝖼𝗏Bk,𝗋𝖾𝖼𝗏 may each receive up to k1 valid bundle messages from distinct correct senders, by definition of Bk,𝗋𝖾𝖼𝗏. As Bk,𝗋𝖾𝖼𝗏B𝗋𝖾𝖼𝗏 by definition, |B𝗋𝖾𝖼𝗏Bk,𝗋𝖾𝖼𝗏|=|B𝗋𝖾𝖼𝗏||Bk,𝗋𝖾𝖼𝗏|, and the nodes of B𝗋𝖾𝖼𝗏Bk,𝗋𝖾𝖼𝗏 accounts for up to (k1)(|B𝗋𝖾𝖼𝗏||Bk,𝗋𝖾𝖼𝗏|) valid bundle messages overall. As the bundle messages counted by #bundle are received either by correct nodes in Bk,𝗋𝖾𝖼𝗏 or in Bk,𝗋𝖾𝖼𝗏B𝗋𝖾𝖼𝗏, these observations lead to

|Bk,𝗋𝖾𝖼𝗏|×|B𝗌𝖾𝗇𝖽|+(k1)(|B𝗋𝖾𝖼𝗏||Bk,𝗋𝖾𝖼𝗏|)#bundle. (2)

Combining Equations 1 and 2 yields the desired inequality.

Observation A.12.4 is a detailed version of Observation 3.2.3 in Lemma 3.2.

Observation A.12.4.

|Bk,𝗋𝖾𝖼𝗏|cd(11(k1cd)).

Proof of Observation A.12.4.

Rearranging the terms of Observation A.12.3, and recalling that |B𝗋𝖾𝖼𝗏|c and k1, we get

|Bk,𝗋𝖾𝖼𝗏|×(|B𝗌𝖾𝗇𝖽|k+1)|B𝗌𝖾𝗇𝖽|(cd)|B𝗋𝖾𝖼𝗏|(k1)|B𝗌𝖾𝗇𝖽|(cd)c(k1). (3)

By Observation A.12.2 and ˜A.1, |B𝗌𝖾𝗇𝖽|cdc2dk, therefore |B𝗌𝖾𝗇𝖽|k+1>0, and the previous equation can be transformed in

|Bk,𝗋𝖾𝖼𝗏||B𝗌𝖾𝗇𝖽|(cd)c(k1)|B𝗌𝖾𝗇𝖽|k+1. (4)

Note that the right-hand side of Equation 4 is a monotone increasing function in |B𝗌𝖾𝗇𝖽| when |B𝗌𝖾𝗇𝖽|>k1, as its derivative, d(k+1)(|B𝗌𝖾𝗇𝖽|k+1)2, is positive. By Observation A.12.2, |B𝗌𝖾𝗇𝖽|[cd,c][k,c]. The minimum of the right-hand side of Equation 4 is therefore obtained for B𝗌𝖾𝗇𝖽=cd, yielding

|Bk,𝗋𝖾𝖼𝗏|(cd)2c(k1)(cd)k+1=cd(11(k1cd)). (5)

Observation A.12.5 is a detailed version of Observation 3.2.4 in Lemma 3.2.

Observation A.12.5.

All nodes in Bk,𝗋𝖾𝖼𝗏 MBRB-deliver m.

Proof of Observation A.12.5.

Consider puBk,𝗋𝖾𝖼𝗏. By the definition of Bk,𝗋𝖾𝖼𝗏, the node pu receives k valid bundle messages from k distinct correct nodes. Let us denote by bundle,Cx,(m~x,πx,x),-,Σx these k messages with x[k]. By Observation A.12.1, for all x[k], Cx=Cm. In addition, pu stores each received threshold signature Σx, which is valid for Cm.

Because the messages are valid, so are the proofs of inclusions πx, and as we have assumed that the commitments are collision-resistant, Cx=Cm implies that the received fragments m~x all belong to the set of fragments computed by ps at line 18 for m. As the bundle messages were received from k distinct correct nodes, the node pu receives at least k distinct valid fragments for m during its execution. If pu has not MBRB-delivered any message yet, the condition at line 35 eventually becomes true for Cm, and pu reconstructs m at line 36, since it possesses at least k (correct) message fragments, which are sufficient for the correct recovery of m by our choice of ECC. Then, pu MBRB-delivers m at line 41. On the other hand, if pu has already MBRB-delivered some message m, then Lemma A.10 (MBRB-No-duplicity) implies m=m, since pi is known to have MBRB-delivered m. Therefore, in all possible cases, pu MBRB-delivers m.

Lemma A.12 follows from Observations A.12.4 and A.12.5 and the fact that all nodes in Bk,𝗋𝖾𝖼𝗏 are correct.

Discussion: Selection of 𝒌.

In the above analysis, we set k to be a parameter that controls the number of fragments that allow decoding the ECC. To obtain the communication depicted in Section 3, we assumed k=Ω(n). Furthermore, this parameter affects the delivery power of the MBRB algorithm, as seen in Lemma A.12, namely =cd(1(k1cd))1.

Let us assume that we wish to design an MBRB algorithm with a specified delivery power of =c(1+ε)d, for some ε>0. Plugging in Lemma A.12, we need the delivery power  provided by the Algorithm 2 to surpass c(1+ε)d, thus

c(1+ε)dcd(11(k1cd))

leading to kε1+ε(cd)+1. That is, choosing any integer kε1+ε(ntd)+1, satisfies the above. Recall that the blowup of the ECC is given by n¯/k¯n/k (Section 3), which implies that for any application message m, we have |ECC(m)|nk|m|=1+εεnntd|m|.

Together with ˜A.1, we conclude that the constraints on k that support delivery power of nt(1+ε)d, are

kmin{nt2d,ε1+ε(ntd)+1}.

Supporting multiple instances and multiple senders.

We remark that the above analysis fits the single-shot broadcast algorithm with a fixed sender. As mentioned above, a multi-shot multi-sender algorithm can be achieved by communicating the identity of the sender and a sequence number along with any piece of information communicated or processed during this algorithm. This added information uniquely identifies any piece of information with the respective instance. Additionally, signature shares, threshold signatures, commitments, and proofs of inclusion should be performed on the application message m augmented with the sender’s identifier and the sequence number. This will prevent Byzantine nodes from using valid signature shares/threshold signatures from one instance in a different instance. As a result, an additive factor of 𝒪(logn) bits has to be added to each communicated message, which yields additive communication of 𝒪(n2logn) and has no effect on the asymptotic communication, as we explained in the proof of Lemma B.1. Other changes, such as augmenting the application message m with the sender’s identifier and sequence number do not affect the length of signature shares, threshold signatures, commitments, and proofs of inclusion.

Appendix B Communication analysis

This appendix section analyzes the communication cost of Algorithm 2. Lemma B.1 is the detailed version of Lemma 3.3.

Lemma B.1.

Correct nodes collectively communicate at most 4n2 messages. Each correct node sends at most 𝒪(|m|+nκ) bits. Overall, the system sends at most 𝒪(n|m|+n2κ) bits.

Proof of Lemma B.1.

Let us count the messages communicated by counting comm and broadcast invocations. The sender, ps, sends send messages at line 20. In Phase I, each correct node that has received a send message broadcasts a forward message once (lines 25 and 23). However, if it receives a forward before the send arrives, it performs one additional forward broadcast (line 34). This yields at most 2 comm and broadcast invocations per correct node until the end of Phase I. We can safely assume that a correct sender always sends a single forward (i.e., it immediately and internally receives the send message sent to self). Thus, ps is also limited to at most 2 invocations up to this point. In Phase II, each correct node that MBRB-delivers a message at line 41 transmits bundle messages at line 40. This can only happen once due to the condition at line 35. Additionally, it may transmit bundle messages also at line 48, upon the reception of a bundle. However, this second bundle transmission can happen at most once, due to the if-statement at line 45. This leads to at most 2 additional comm and broadcast invocations per correct node. Thus, as the number of correct nodes is bounded by n, the two phases incur in total at most 4n invocations of comm and broadcast overall. Since each invocation communicates exactly n messages, the total message cost for correct nodes when executing one instance of Algorithm 2 is upper bounded by 4n2. Note that the above analysis holds for correct nodes also in the presence of Byzantine participants, including when ps is dishonest.

We now bound the number of bits communicated by correct nodes throughout a single instance of Algorithm 2. Consider computeFragVecCommit. Let m be a specific application message. We have |m~|=𝒪(|m|) since we use a code with a constant rate. Thus, any specific fragment m~i has length |m~i|=𝒪(|m|/n). Recall that the sizes of a signature share σ, a threshold signature Σ, a commitment C, and an inclusion proof π all have 𝒪(κ) bits (Section 2). Along a signature share pair 𝑠𝑖𝑔, the identifier of the signing node is included, which takes additional 𝒪(logn) bits. However, since κ=Ω(logn), the inclusion of this field does not affect asymptotic communication costs.

We now trace all the comm and broadcast instances in Algorithm 2 and analyze the number of bits communicated in each. The send comm (line 20) communicates n messages, where each message includes a fragment of m (𝒪(|m|/n) bits) with its proof of inclusion (𝒪(κ) bits), a commitment (𝒪(κ) bits), and a signature share (𝒪(κ) bits). Thus, this operation allows the sender to communicate at most 𝒪(|m|+nκ) bits. Each forward broadcast in lines 25 and 34 sends n copies of a message containing a commitment (𝒪(κ) bits), at most one message fragment with its proof of inclusion (𝒪(|m|/n+κ) bits), and two signature shares (𝒪(κ) bits). Hence, each one of lines 25 and 34 communicates a total of 𝒪(|m|+κn) bits. The bundle communication (lines 40 or 48) sends n messages, where each contains a commitment (𝒪(κ) bits), at most two message fragments with their proof of inclusion (𝒪(|m|/n+κ) bits), and one threshold signature (𝒪(κ) bits). Hence, each line communicates at most 𝒪(|m|+nκ) bits. As analyzed above, the sending node (ps, when correct) performs at most one comm of send messages, while each correct node performs at most two broadcast of forward messages, and at most two comm/broadcast of bundle messages. Thus, each node communicates at most 𝒪(|m|+nκ) bits. Overall, the total bit communication by correct nodes during Algorithm 2’s execution is 𝒪(n|m|+n2κ). As mentioned above, the analysis holds in the presence of Byzantine nodes, even if ps is dishonest.

Appendix C Using Bracha’s BRB on hash values under a message adversary

Das, Xiang, and Ren [18] have proposed a communication optimal BRB algorithm that does not use signatures and relies on Bracha instead to reliably broadcast a hash value of the initial sender’s message. One might legitimately ask whether this approach could not be easily adapted to withstand a message adversary, possibly resulting in an MBRB algorithm exhibiting optimal communication complexity (up to the size of hashes κ), optimal Byzantine resilience (n>3t+2d), and optimal delivery power ntd (or at least some close-to-optimal delivery power , up to some factor ϵ).

Unfortunately, under a message adversary, Bracha’s BRB leads to a sub-optimal Byzantine resilience, and degraded delivery power . In particular, Albouy, Frey, Raynal and Taiani [3] have shown that Bracha can be used to implement a MBRB algorithm, but their solution requires a sub-optimal resilience bound (n>3t+2d+2td) and yields a reduced delivery power 𝐵𝑅𝐵=nt(ntn3td)d. Disappointingly, these less-than-optimal properties would in turn be passed on to any MBRB algorithm using Bracha’s BRB along the lines of Das, Xiang, and Ren’s solution.333Taking into account the initial dissemination of the message m by the broadcaster, which is also impacted by the message adversary, such an algorithm could in fact at most reach a delivery power of max(0,(ntd)+𝐵𝑅𝐵(nt))=max(0,𝐵𝑅𝐵d)=max(0,nt(ntn3td+1)d). By contrast, the algorithm we propose is optimal in terms of communication cost (up to κ) and Byzantine resilience, and close to optimal in terms of delivery power (up to some parameter ϵ that can be chosen arbitrarily small).

To provide a hint of why Bracha’s BRB leads to degraded resilience and delivery power when confronted with a message adversary (MA), consider the classical echo phase of Bracha’s BRB [10]. At least one correct node must receive (n+t)/2 echo messages to ensure the first ready message by a correct node can be emitted. To ensure Local-delivery, the threshold (n+t)/2 must remain lower than the worst number of echo messages a correct node can expect to receive when the sender is correct. Without an MA this constraint leads to (n+t)/2<nt, which is verified by assuming n>3t. With an MA, the analysis is more complex. Applying a similar argument to that of the proof of Lemma 3.2, one can show that in the worst case the adversary can ensure that no correct node receives more than (ntd)2/(nt) echo messages. Ensuring that at least one correct node reaches the Byzantine quorum threshold (n+t)/2 requires therefore that

(n+t)/2<(ntd)2/(nt).

This leads to a quadratic inequality involving n, t and d, which results in the following constraint on n:

n>2t+2d+(d+t)2+d23t+3d.

In their analysis [3], Albouy, Frey, Raynal, and Taiani improve on this resilience bound by systematically optimizing the various retransmission and phase thresholds used by Bracha’s BRB algorithm, but their solution still falls short of the optimal lower bound n>3t+2d, which the solution presented in this paper provides.