Distributed Download from an External Data Source in Asynchronous Faulty Settings
Abstract
The distributed Data Retrieval (DR) model consists of peers connected by a complete peer-to-peer communication network, and a trusted external data source that stores an array X of bits (). Up to of the peers might fail in any execution (for ). Peers can obtain the information either by inexpensive messages passed among themselves or through expensive queries to the source array X. In the DR model, we focus on designing protocols that minimize the number of queries performed by any nonfaulty peer (a measure referred to as the query complexity) while maximizing the resiliency parameter .
The Download problem requires each nonfaulty peer to correctly learn the entire array X. Earlier work on this problem focused on synchronous communication networks and established several deterministic and randomized upper and lower bounds. Our work is the first to extend the study of distributed data retrieval to asynchronous communication networks. We address the Download problem under both the Byzantine and crash failure models. We present query-optimal deterministic solutions in an asynchronous model that can tolerate any fixed fraction of crash faults. In the Byzantine failure model, it is known that deterministic protocols incur a query complexity of per peer, even under synchrony. We extend this lower bound to randomized protocols in the asynchronous model for , and further show that for , a randomized protocol exists with near-optimal query complexity.
Keywords and phrases:
Byzantine Fault Tolerance, Blockchain Oracle, Data Retrieval Model, Distributed Download, asynchronyFunding:
John Augustine: Supported by the Centre for Cybersecurity, Trust and Reliability (CyStar) Centre, IIT Madras.Copyright and License:
David Peleg; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Distributed algorithmsEditors:
Andrei Arusoaie, Emanuel Onica, Michael Spear, and Sara Tucci-PiergiovanniSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
1.1 Background and motivation
The Data Retrieval Model (DR) was first introduced in [3] to abstract the fundamental process of a group learning from a reliable external data source, where the data source is too large or it is too expensive to be learned individually (i.e., requires members of the group to collaborate), and some members of the group might crash during execution or act in other ways to deliberately sabotage the learning process. One key example of systems where this process takes place is blockchain oracles [12, 16]. We address this Oracle data delivery process in detail later and present a method for improving its performance using the DR model and the protocols designed in this work.
The DR model contains two entities: (i) a peer-to-peer network and (ii) an external data source in the form of an bit array X. There are peers, up to fraction of which may be faulty (and at least fraction of which are nonfaulty). Each peer has access to the content of the array through queries. The general class of retrieval problems consists of problems requiring every peer to output for some computable function of the input array X. In this work, we focus on the most fundamental retrieval problem, where , referred to hereafter as the Download problem111It is fundamental since every retrieval problem can be solved by first performing download and then locally computing ., where every peer needs to learn the entire input X.
In the absence of failures, the problem can be easily solved in a query-balanced manner. Even with failures, the problem can be trivially solved at the cost of a large number of queries, as the non-faulty peers can directly query all the bits. This solution is prohibitively expensive; thus, we focus on minimizing the number of queries made by each non-faulty peer. For synchronous systems with Byzantine faults, a lower bound of on query complexity for deterministic Download is shown in [3] for every , followed by a matching upper bound when . This implies that in the presence of Byzantine faults, one cannot attain the ideal query complexity of without using randomization.
In this work, we consider Download protocols in the asynchronous setting for both the crash and Byzantine fault models. In the asynchronous Byzantine fault setting, we prove that, unlike the synchronous setting, where randomization can overcome the deterministic lower bound, queries per peer are required when , even for randomized protocols. We complement this lower bound with a protocol for the regime that achieves a query complexity of , which, for a constant , is within log factors of the generic lower bound of .
Turning to the more benign setting of crash faults (i.e., where all peers are honest but some fraction may stop functioning), the picture is brighter. For this model, it turns out that even in the asynchronous setting, one can get efficient deterministic Download protocols that achieve the optimal query complexity of , for any fraction of crashes.
1.2 The Model
In the Data Retrieval (DR) model, the system consists of two components. The first is a collection of peers, each equipped with a unique ID from the range , connected by a complete communication network (or clique). The network provides peer-to-peer message passing, namely, every peer can send at time a (possibly different) message of size at most bits to each other peer.
The second component of the DR model is an external data source. The source stores an -bit input array . It provides the peers with read-only access, allowing each peer to retrieve the data through queries of the form , for . The answer returned by the source would then be , the element in the array. This type of communication is referred to as source-to-peer communication.
We consider asynchronous communication, where any communication (both among peer-to-peer and source-to-peer) can be delayed by any finite amount of time. For randomized protocols, we use the following notion of cycles.
Cycles.
In the asynchronous model, there is no global notion of rounds, as each peer operates at a different pace. Nevertheless, to describe our protocols and analyze their performance, it is convenient to divide the local execution of each peer into (varying time) cycles. Each such local cycle consists of the following stages.
-
Sending (0 or more) queries and getting answers.
-
Sending (0 or more) messages.
-
Waiting to receive messages.
We assume that local computation takes 0 time and can be performed at any point in a cycle. Moreover, when waiting for messages, after every message is received, the peer can adaptively decide whether to keep waiting for an additional message or continue to the next cycle. Note that the local cycle of peer might coincide with a different local cycle of another peer .
In the absence of global time units, it is convenient to break the time axis into “virtual blocks” by defining , for integer , as the first time any peer started its local cycle .
Every message is of size at most bits, where is a system parameter. Note that throughout the paper, we either set to a specific value or leave it as a parameter, in which case increasing the message size parameter would result in faster protocols.
The adversary.
Our analysis uses the notion of an adversary, representing the adverse conditions in which the system operates, including the asynchronous communication and the possibility of failures.
The adversary has two types of operations. First, it can fail up to peers, under the restriction that it can only fail a peer between its cycles (or before the first cycle), meaning that a peer can make random decisions in its current cycle without the adversary being able to react until the end of the cycle. Second, it can set the time it takes a message sent by peer in its local cycle to reach peer , under the restriction that it must set the time for every pair of peers , before time . In other words, the adversary must set the latency of each message sent during a cycle before any peer starts cycle . The adversary can also decide when every peer starts its execution (i.e., we do not assume a simultaneous start). Note that in the case of deterministic protocols, the notion of cycles is irrelevant, and we consider a standard adversary that can fail a peer at any point of the execution and can delay messages for any finite amount of time.
The adversary selects the input data and determines the failure pattern of the peers. In the crash failure model, the adversary’s power is limited to crashing some of the peers in every execution of the protocol. Once a peer crashes, it stops its local execution of the protocol arbitrarily and permanently. This could happen in the middle of operation, e.g., after the peer has already sent some, but perhaps not all, of the messages it was instructed by the protocol to send out at a given point in time. In contrast, in the Byzantine failure model, a failed peer can deviate from the protocol in arbitrary ways. We assume that the adversary can fail at most peers, for some given222We do not assume to be a fixed constant (unless mentioned otherwise). . We let , so there is (at least) a fraction of nonfaulty peers in every execution. Denote the set of faulty (respectively, nonfaulty) peers in the execution by . (resp., ).
We assume that the adversary knows the protocol and hence can simulate it (up to random coins).
We concentrate on the following complexity measures.
- Query Complexity ():
-
the maximum number of bits queried by a nonfaulty peer during the execution.
- Time Complexity ():
-
the time it takes for the protocol to terminate.
- Message Complexity ():
-
the total number of messages sent by nonfaulty peers during the execution.
We assume that queries to the source are the more expensive component in the system, so we focus mainly on optimizing the query complexity . Measuring the maximum cost per peer (rather than the total cost) gives priority to a balanced load of queries over the nonfaulty peers.
Let us now formally define the Download problem. Consider a DR network with peers, where at most can be faulty, and a source that stores a bit array . Each peer is required to learn X. Formally, each nonfaulty peer outputs a bit array , and it is required that, upon termination, for every and .
In the absence of failures, this problem can be solved by sharing the task of querying all bits evenly among the peers, yielding . The message complexity is , assuming small messages of size , and the time complexity is since bits need to be sent along each communication link when the workload is shared.
1.3 Related Work
The vast literature on fault-tolerant distributed computing includes extensive work on both crash faults and Byzantine faults. A foundational result by Fischer, Lynch, and Paterson (FLP)[20] demonstrated that in asynchronous networks, even a single crash fault renders many fundamental problems – such as consensus and reliable broadcast – impossible to solve deterministically. Specifically, they showed that the adversary can indefinitely delay progress, violating the termination property. To circumvent this impossibility, many subsequent works in asynchronous settings have adopted randomized techniques[7, 22] or relaxed the termination requirement [9].
Given the practical relevance of the asynchronous model, it has been widely adopted for studying fault-tolerant protocols under crash faults [19, 24, 17, 6] and Byzantine faults [1, 8, 10, 13, 14, 15, 18, 21, 23, 28]. Fundamental problems such as agreement and reliable broadcast have often served as building blocks in distributed protocol design, typically assuming that all input data is already locally available to the peers.
In contrast, our work addresses the Data Retrieval (DR) model, where each peer must actively fetch data from a trusted external source and disseminate it to the rest of the peers while minimizing the cost associated with querying the source. We focus on the Download problem, which requires every nonfaulty peer to correctly learn the entire data. Unlike classical problems, we show that Download – despite requiring termination – can be solved deterministically in asynchronous networks. This highlights that employing reliable broadcast or agreement as foundational components is not necessary to solve the Download problem. Moreover, this can be done with optimal query complexity for any fraction of crash-faults. In the more adversarial Byzantine setting, we also design randomized protocols that solve the problem even when a majority of the peers are Byzantine.
To the best of our knowledge, this work is the first to study retrieval problems in the Data Retrieval (DR) model under asynchronous communication. The DR model has previously been explored in synchronous networks, most notably in [3, 5]. In particular,[3] introduced the Download problem, motivated by practical applications such as Distributed Oracle Networks (DONs), which form a crucial component of blockchain systems and employ protocols like OCR and DORA[12, 16].
The primary focus of [3] was on minimizing query complexity in the presence of Byzantine faults in synchronous settings. They proved that any deterministic protocol must incur a query complexity of at least and matched this with an upper bound when . On the randomized front, they proposed two protocols. The first tolerates any constant fraction of Byzantine faults but has suboptimal query complexity, achieving with high probability. The second protocol improves upon this by achieving near-optimal query complexity 333We use the notation to hide factors and polylogarithmic terms in and . with high probability, but it can only tolerate up to a fraction of Byzantine faults.
The paper [5] builds upon the foundational work in [3] by closing several gaps in the randomized setting. It presents a randomized Download protocol with query complexity , time complexity , and message complexity for any Byzantine-fault fraction . Additionally, it establishes a lower bound showing that in any single-round randomized protocol, each peer must essentially query the entire input, indicating the inherent limitations of extremely fast protocols.
Moving to two-round protocols, [5] proposes a randomized solution that achieves query complexity with high probability, improving upon the time complexity of [3] under the same fault threshold . Notably, this protocol operates under a stronger adversarial model – referred to as Dynamic Byzantine – in which the set of Byzantine peers may change from one round to another. Under this dynamic fault model, the authors further develop a protocol that achieves expected query complexity (within logarithmic factors), at the expense of a higher time complexity of .
In contrast to prior work, this paper explores the Download problem in asynchronous networks, under both crash and Byzantine faults. Our results are summarized in the next section; a concise comparison to prior synchronous protocols is provided in Table 1.
| Synchrony | Query | Fault Model | Resilience | Type | Reference |
|---|---|---|---|---|---|
| Prior Work | |||||
| Synchronous | Byzantine | Randomized | [3] | ||
| Synchronous | Byzantine | Randomized | [3] | ||
| Synchronous | Byzantine | Randomized | [5] | ||
| This Paper | |||||
| Asynchronous | Crash | Deterministic | Thm 8 | ||
| Asynchronous | Byzantine | Randomized | Thm 10 | ||
| Asynchronous | Byzantine | Randomized | Thm 12 | ||
1.4 Contributions
We present the Download problem in asynchronous communication networks, under both crash and Byzantine failures settings. In the crash-fault setting, our deterministic results are optimal w.r.t. to the resilience (for any ) and query complexity. Notice that this optimality also holds for randomized algorithms. In the Byzantine failure setting, we provide deterministic and randomized lower bounds as well as upper bounds. The main results are:
-
(1)
Deterministic Download in Crash-Fault: We present a deterministic protocol for solving Download problem in the asynchronous setting with at most crash faults () with , and where is the message size. Our result achieves the optimal query complexity for any fraction of crash fault, .
-
(2)
Deterministic Lower Bound in Byzantine Fault: We show that for , every deterministic asynchronous Download protocol that is resilient to Byzantine faults requires .
-
(3)
Deterministic Download in Byzantine Fault: We show that for , there exists a deterministic asynchronous protocol that solves Download with , and .
-
(4)
Randomized Lower Bound in Byzantine Fault: We show that for any randomized asynchronous Download protocol where , there does not exist any execution in which every peer queries less than or equal bits.
-
(5)
2-cycle Randomized Download in Byzantine Fault: We present a 2-cycle asynchronous randomized protocol for Download with and where , and the message size is .
-
(6)
Randomized Download in Byzantine Fault: We present a -cycle protocol that computes Download whp in the point-to-point model having expected query complexity and where , and the message size is .
2 Deterministic Download in the Asynchronous Model with Crash Faults
In this section, we present deterministic protocols that solve the Download problem in an asynchronous setting. In the full version of the paper, we show a deterministic protocol that handles a single crash failure. In the current version, we only present an extended solution that tolerates crashes for , in section 2.1.
2.1 Tolerating any Number of Crashes
In this subsection, we present a protocol that can tolerate up to crashes for any (for a more basic algorithm for the case of see [4]). The main difficulty in achieving tolerance with up to crashes is that in the presence of asynchrony, one cannot distinguish between a slow peer and a crashed peer, making it difficult to coordinate.
Algorithm 1 executes in phases, each consisting of three stages. Each peer stores the following local variables. (We omit the superscript when it is clear from the context.)
-
: ’s current phase.
-
: ’s current stage within the phase.
-
: the correct set of for phase , i.e., the set of peers heard from during phase .
-
: the assignment function of for phase , which assigns the responsibility for querying each bit to some peer .
-
: the output array.
In the first stage of phase , each peer queries bits according to its local assignment and sends a request (asking for bit values according to , namely ) to every other peer and then continues to stage 2. Upon receiving a request, waits until it is at least in stage 2 of phase and returns the requested bit values that it knows.
In stage 2 of phase , waits until it hears from at least peers (again, waiting for the remaining peers risks deadlock). Then, it sends a request containing the set of peers’ IDs (namely, all the peers it didn’t hear from during phase ) and continues to stage 3. Upon receiving a request, waits until it is at least in stage 3 of phase , and replies to every peer as follows. For every , it sends ’s bits if and “me neither” otherwise.
In stage 3 of phase , waits for responses. Then, for every , if it received only “me neither” messages, it reassigns ’s bits evenly between peers . Otherwise, it updates in the appropriate indices. Finally, it continues to stage 1 of phase . Upon receiving a response, updates in the appropriate index and updates for every bit value in the message. The pseudocode is provided in Algorithm 1.
Before diving into the analysis, we overview the following intuitive flow of the protocol’s execution. At the beginning of phase 1, the assignment function is the same for every peer. Every peer is assigned bits, which it queries and sends to every other peer. Every peer hears from at least peers, meaning that it has at most unknown bits after phase 1. In the following phases, every peer reassigns its unknown bits uniformly among all the peers, such that the bits assigned to every peer are either known to it from a previous phase or is about to query them in the current phase (i.e., assigned itself the same bits). Hence, after every phase, the number of unknown bits diminishes by a factor of . After sufficiently many phases, the number of unknown bits will be small enough to be directly queried by every peer.
We start the analysis by showing some properties of the relations between local variables.
Observation 1.
For every nonfaulty peer , if for some phase then
Proof.
Let be such that , and consider . There exists some such that . Since , has heard from , so , and overall .
Denote by the local value of for peer at the beginning of phase . Denote by the local value of for peer after stage 1 of phase .
Claim 2.
For every phase , two nonfaulty peers , and bit , one of the following holds.
-
, i.e., both and assign the task of querying to the same peer, or
-
or .
Proof.
By induction on . For the basis, , the claim is trivially true because of the initialization values (specifically, property () holds).
For . By the induction hypothesis, either or holds. Suppose first that holds, i.e., or . Without loss of generality, assume that . Then, since values are never overwritten, , so holds as well.
Now suppose that holds, i.e., . Let be an index such that . If both and didn’t hear from during phase , then both peers will assign the same peer to in stage 3 of phase (see Line 22), so holds. If one of the peers heard from , w.l.o.g assume did, then . Hence, holds.
Claim 2 yields the following corollary.
Corollary 3.
Every phase stage request received by a nonfaulty peer is answered with the correct bit values.
Next, we show that the protocol never deadlocks, i.e., whenever a nonfaulty peer waits in stages 2 and 3 (see Lines 14 and 18), it will eventually continue.
Claim 4.
If one nonfaulty peer has terminated, then every nonfaulty peer will eventually terminate.
Proof.
Let be a nonfaulty peer that has terminated. Prior to terminating, queried all the remaining unknown bits and sent all the bits to every other peer. Since is nonfaulty every other nonfaulty peer will eventually receive the message sent by and will set , resulting in by Observation 1. Subsequently, will terminate as well.
Claim 5.
While no nonfaulty peer has terminated, a nonfaulty peer will not wait infinitely for responses.
Proof.
Let be the least advanced nonfaulty peer, i.e, for every nonfaulty peer either , or with . Note that there are at least nonfaulty peers. As none of these peers terminate before receiving a request from (premise of the claim), they will send back a response. Hence, will receive responses from at least different peers, and will not wait infinitely.
The combination of Claims 5 and 4 implies that eventually, every nonfaulty peer satisfies the termination condition (see Line 41) and subsequently terminates correctly (since it queries all unknown bits beforehand). That is because by Claim 5 some nonfaulty peer will get to phase , or set prior to that, and terminate, which will lead to the termination of every nonfaulty peer by Claim 4.
Claim 6.
At the start of phase , every nonfaulty peer has at most unknown bits.
Proof.
By induction on . Consider nonfaulty peer . For the base step the claim holds trivially by the initialization values.
Now consider . By the induction hypothesis on , has at most unknown bits at the start of phase . Since unknown bits are assigned evenly in stage (see Line 22), each peer is assigned unknown bits (to be queried during phase ). During stage 2 of phase , waits until , meaning that did not receive the assigned bits from at most peers. Hence, at most bits are unknown after stage 2 of phase . The claim follows.
From the above discussion, we have the following lemma.
Lemma 7.
Algorithm 1 solves Download in the asynchronous setting with at most crash faults after phases with and
Proof.
By Claim 6 and since unknown bits are distributed evenly among , every nonfaulty peer queries at most in phase and at most additional bits when terminating (By Observation 1). Hence, the worst case query complexity (per peer) is bounded by
We next turn to time analysis. Consider a peer . For every phase , after time, every response by a nonfaulty peer is heard by (even slow ones), and stage 2 starts. After that, it takes at most time units for every response to be heard by , allowing it to move to stage 3. Hence, it takes at most time for phase to finish once started it. Finally, upon termination, sends which takes time. Let be the phase such that and be the phase such that . Hence, and . Overall the time complexity is
The lemma follows.
Finally, a modification of the protocol (see [4]) yields an improved time complexity, resulting in the following theorem.
Theorem 8.
There is a deterministic protocol for solving Download in the asynchronous setting with at most crash faults (for any ) with , and .
3 Download in the Asynchronous Model with Byzantine Faults
In this section, we consider the asynchronous model with Byzantine faults, rather than crashes. In this setting, the Download problem is still solvable by the naive protocol where each honest peer queries all bits, but it is unclear whether one can do better. It turns out that this depends on whether or not. The rest of the section handles these two cases.
3.1 Majority Byzantine Failures ()
When , any asynchronous Download protocol that is resilient to Byzantine faults requires . Moreover, any deterministic asynchronous Download protocol resilient to Byzantine faults requires , namely, the only such protocol is the naive one.
Theorem 9.
When , every deterministic asynchronous Download protocol that is resilient to Byzantine faults has .
We defer the proof to [4] since we next establish a similar result for randomized protocols (with a slightly weaker bound of instead of ).
One subtle point that makes the randomized lower bound more complicated has to do with the limitations imposed on the adversary due to the fact that honest peers are aware of the minimal number of honest peers in every execution, and are therefore entitled to wait for these many messages. This point deserves further scrutiny. In the asynchronous model, each peer operates in an event-driven mode. This means that its typical cycle consists of (a) performing some local computation, (b) sending some messages, and then (c) entering a waiting period, until “something happens.” Typically, this “something” is the arrival of a new message. However, the algorithm may instruct the peer to continue waiting until it receives new messages from distinct peers. Since it is guaranteed that at least this many honest peers exist in the execution, this instruction is legitimate (in the sense that it cannot cause the peer to deadlock). Moreover, in case has already identified and “blacklisted” a set of peers as failed peers, it is entitled to wait until it receives new messages from distinct peers in . This observation restricts the ability of the adversary to delay messages sent by honest peers indefinitely. In particular, at some point during the execution, it may happen that all honest peers are waiting for new messages from other honest peers and will not take any additional actions until they do. In such a situation, sometimes referred to in the literature as reaching quiescence, the adversary may not continue delaying messages indefinitely and is compelled to “release” some of the delayed messages and let them reach their destination. Throughout, we assume that the adversary abides by this restriction.
Theorem 10.
For any asynchronous Download protocol where , there are executions in which some peer queries more than bits.
Proof.
Assume towards contradiction that there exists a randomized protocol such that in every execution of every peer queries at most bits. Consider the following (types of) executions.
- Execution .
-
The input is all 0’s. The adversary corrupts the peers and delays the peers until terminates. (If quiescence is reached before terminates, then the adversary forwards all delayed messages and abandons its attempt to fail the protocol). The adversary sets the random string used by the peers to ( should be picked in a way that makes the probability of quiescence negligible; we explain later how this is done). The corrupted peers act as they would in an honest execution (except they use set by the adversary).
- Execution .
-
The input is all 0’s, except for one index ( should be picked according to the random distribution used by such that the probability that gets queried by is less than ; we explain later how this is done). The adversary corrupts the peers and delays the peers until terminates (same as in execution ). The adversary sets the random string used by the peers to the same as in execution . The corrupted peers act as if they are in execution .
Denote by (respectively ) an execution of type (respectively ) where uses as its random string and use . We denote by (respectively, ) the execution (resp., ) where is chosen by the adversary.
Note that from the point of view of , executions and are indistinguishable if does not query bit . Also note that the adversary’s strategy is only valid if does not reach a quiescent state, namely, one in which it will not terminate before receiving a message from at least one of the peers . We now show that and can be chosen such that the probability (over the possible random choices of ) of reaching quiescence is negligible and the fraction of pairs in which queries bit is at most .
W.l.o.g, we assume that picks exactly bits to query. First, for every set of bits, the adversary knows the probability that will query . For every , denote the probability that bit gets queried by by . Note that (since every set contributes its probability to the sum times ). The adversary picks bit with probability (note that ).
Let and be random variables indicating the random selection of a set by and an index by the adversary, respectively. Then
where the second equality follows since and are independent and the inequality is derived by the Cauchy–Schwarz inequality.
Next, we show that there is a choice of for which the probability (over the possible random choices of ) of reaching quiescence in is negligible. Denote by the event of reaching quiescence in (where uses the random string ). Denote by the event of reaching quiescence in (with the random strings , ). Assume towards contradiction that for every value of , the probability of , over the possible random choices of , is . Consider an execution , where the input is all 0’s, the adversary crashes peers , and the random strings used by the peers and are and , respectively. Note that if quiescence is reached in , then quiescence is also reached in . Hence, does not terminate. Noting that
we get that the probability that the algorithm fails (does not terminate) in is at least , a contradiction. It follows that there must be a choice of such that the probability of over the possible random choices of satisfies . The theorem follows.
3.2 Minority Byzantine Failures ()
For , we consider an input vector X that is divided into contiguous segments, each of length approximately . The th segment is denoted by . Peers issue queries for specific segments and obtain the corresponding bit strings as responses. These responses are then broadcast to all other peers in the form of messages , where denotes the segment index and the returned bit string. Two messages are said to be overlapping if they refer to the same segment and consistent if, in addition, they carry identical bit strings. A set of consistent responses from at least peers is referred to as a -frequent string, denoted by the function applied to a multiset of overlapping responses.
To handle inconsistencies caused by Byzantine peers, we employ a decision-tree construction for each set of overlapping responses corresponding to a particular segment. If contains only one candidate, the tree consists of a single leaf labeled by that bit string. Otherwise, two non-consistent bit strings are chosen, and the first index at which they differ is identified as a separating index. An internal node is created at this index, and the set is split into two subsets according to the value of the separating bit. The process is applied recursively until the leaves correspond to distinct candidate bit strings. By querying X at the separating indices, inconsistencies can be resolved, and the correct bit string is determined as long as it appears among the leaves of the decision tree.
Next, the input is first partitioned into segments, and each peer independently selects one segment uniformly at random, queries it, and broadcasts the result to all other peers. Subsequently, each peer constructs decision trees for every segment using the overlapping responses it received. A segment is determined by forming a decision tree from the responses that appeared at least times, ensuring that the correct bit string for each segment is eventually recovered. From the above discussion, we have the following results (due to space constraints, details are deferred to [4].
Theorem 11.
There is a 2-cycle asynchronous randomized protocol for Download with .
To extend the above protocol (-cycle) into a multi-cycle protocol that improves the expected query complexity. The first cycle matches the first cycle of the -cycle protocol, but in each later cycle , the input is divided into segments of size . Each -segment consists of two -segments. Every peer selects an -segment uniformly at random, reconstructs its value from decision trees built in the previous cycle, and then broadcasts the result. Since segment size doubles each cycle, after cycles, each peer has the entire input and outputs correctly, therefore, leading to the following results (details are deferred to [4].
Theorem 12.
There is a -cycle protocol which w.h.p. computes Download in the point-to-point model with expected query complexity and message size .
3.3 Deterministic Download protocol
We now present a deterministic asynchronous Download protocol for Byzantine faults where . Consider the deterministic synchronous protocol presented in [3], where a committee of size is formed for every , and every peer queries the bit and broadcasts the message to all other peers. In order to adapt that protocol to the asynchronous model, we modify the final part of the protocol, requiring each non-committee peer to wait until it gets messages with identical value from at least peers , and then decide . We get the following result (see [4]).
Theorem 13.
When , there exists a deterministic asynchronous protocol that solves Download with , and .
4 Application: Efficient Blockchain Oracles
Blockchain systems [25] have seen a rise in popularity due to their ability to provide both transparency and strong cryptographic guarantees of agreement on the order of transactions, without the need for trusted third-party entities. More general computational abilities have also been well sought out for blockchains. Smart contracts [27] fulfill that need by providing users of the blockchain a way to run programs on the blockchain that ensures reliable and deterministic execution while providing transparency and immutability of both the program code and its state(s). Note that since the execution is required to be deterministic, i.e., every node must produce the same result, smart contracts are restricted to accessing (agreed upon) on-chain data, as off-chain data may introduce non-determinism to the execution.
Blockchain oracles [2, 11, 26] are components of blockchain systems that provide multiple services that support and extend the functionality of smart contracts (and other on-chain entities). The most important and fundamental service a blockchain oracle provides is bridging between the on-chain network and off-chain resources [12, 16], providing smart contracts access to external data without introducing non-determinism into their execution. We focus on this service and artificially consider it to be the sole responsibility of a blockchain oracle. In the remainder of this section, we explain in detail a possible application of the DR model and the Download problem for improved query efficiency within the context of blockchain oracles.
Blockchain oracles general structure.
Blockchain oracles consist of an on-chain component and an off-chain component. The off-chain component encompasses the different data sources that store the required external information (e.g., stock prices, weather predictions) and the network of nodes in charge of retrieving that information and transmitting it to the on-chain component. The on-chain component can be thought to be (but is not necessarily) a smart contract that is responsible for verifying the validity of the report, making the information public on the blockchain it is hosted on, and using it for its execution.
Formally, the off-chain component consists of two parts: an asynchronous444The network is sometimes assumed to be partially synchronous in blockchain oracles. oracle network, with peers (nodes) , , capable of exchanging direct messages among themselves, and data sources , , each storing an array of variables in which the on-chain component is interested. Each peer can read the -th cell from the -th data source by invoking . A fraction of up to of the peers may be Byzantine, and a fraction of up to of the data sources may be Byzantine. Denote the on-chain component by .
The goal of blockchain oracles, as mentioned above, is to pull information from external sources and push a final value on-chain. There are a few difficulties that may arise when trying to develop such a system. First, it might be the case that different data sources report slightly different values (e.g., prices of a specific stock), even if all of them act honestly. Moreover, corrupted data sources might provide false and even inconsistent values (providing some nodes with value and another with ). The system needs to pick a final value in a way that (1) represents the range of honest values pulled from honest data sources and (2) malicious players (both data sources and peers) cannot force the system to pick a final value that does not represent the range of honest values.
The Oracle Data Delivery (ODD) problem.
Denote by the set of honest data sources. Let and . The honest range of is the range . The ODD problem requires the on-chain to publish an array of values to the target blockchain such that , for every .
A blockchain oracle protocol can generally be split into three distinct steps: (1) collecting data, (2) reaching an agreement on the collected data, and (3) deriving and publishing a final value. Note that this abstraction is the minimum required abstraction to capture the operation of blockchain oracle protocols such as OCR [12] and DORA [16]555These protocols have many additional technical aspects, different structures, and different ways of handling steps (2) and (3). As our focus is on improving step (1), we may w.l.o.g. assume the abstract structure..
We now show how our Download protocols can be used to significantly reduce the cost of the Oracle Data Collection (ODC), i.e, step (1) of blockchain oracles.
Improving ODC by blockchain oracles via Download.
Current protocols perform the data collection step by the following ODC process:
For every node:
-
Pick data sources into a set .
-
Perform , for every and .
-
Calculate the median and proceed to step (2).
Theorem 14 ([12, 16]).
The ODC process guarantees that for every and has total query cost and worst case individual query cost .
Instead, we propose utilizing the guarantees of Download protocols, namely, that for an honest data source , the output of each peer is exactly , to construct the following modification of the ODC steps.
-
For every node, pick data sources into a set .
-
For every data source , run a Download protocol (denote the result for cell from data source by ).
-
Calculate the median and proceed to step (2).
It is easy to verify that this modified construction yields the following.
Theorem 15.
The Download-based ODC process guarantees for every and takes total queries and w.h.p.
Note that the Download protocol presented in this paper assumes a binary input array, but this can be extended to numbers via a relatively simple extension. However, it is important to note that our solution relies on the following restrictive assumption. For two honest peers , if both and issue the query , then they get the same result, for every and honest data source (i.e., the data does not change if queried at different times). Getting rid of this assumption and solving the problem efficiently for dynamic data is left as an open problem for future study.
References
- [1] Ittai Abraham, Dahlia Malkhi, and Alexander Spiegelman. Asymptotically optimal validated asynchronous byzantine agreement. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, pages 337–346, 2019. doi:10.1145/3293611.3331612.
- [2] John Adler, Ryan Berryhill, Andreas Veneris, Zissis Poulos, Neil Veira, and Anastasia Kastania. Astraea: A decentralized blockchain oracle. In 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pages 1145–1152, 2018. doi:10.1109/Cybermatics_2018.2018.00207.
- [3] John Augustine, Jeffin Biju, Shachar Meir, David Peleg, Srikkanth Ramachandran, and Aishwarya Thiruvengadam. Byzantine Resilient Distributed Computing on External Data. In Dan Alistarh, editor, 38th International Symposium on Distributed Computing (DISC 2024), volume 319 of Leibniz International Proceedings in Informatics (LIPIcs), pages 3:1–3:23, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2024.3.
- [4] John Augustine, Soumyottam Chatterjee, Valerie King, Manish Kumar, Shachar Meir, and David Peleg. Distributed download from an external data source in asynchronous faulty settings. CoRR, abs/2509.03755, 2025. doi:10.48550/arXiv.2509.03755.
- [5] John Augustine, Soumyottam Chatterjee, Valerie King, Manish Kumar, Shachar Meir, and David Peleg. Distributed Download from an External Data Source in Byzantine Majority Settings. In Dariusz R. Kowalski, editor, 39th International Symposium on Distributed Computing (DISC 2025), volume 356 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1–9:22, Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2025.9.
- [6] Adam D Barwell, Ping Hou, Nobuko Yoshida, and Fangyi Zhou. Crash-stop failures in asynchronous multiparty session types. Logical Methods in Computer Science, 21, 2025. doi:10.46298/LMCS-21(2:5)2025.
- [7] Michael Ben-Or. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. In Proceedings of the Second Annual ACM Symposium on Principles of Distributed Computing, PODC ’83, pages 27–30, New York, NY, USA, 1983. Association for Computing Machinery. doi:10.1145/800221.806707.
- [8] Michael Ben-Or. Another advantage of free choice (extended abstract) completely asynchronous agreement protocols. In Proceedings of the second annual ACM symposium on Principles of distributed computing, pages 27–30, 1983.
- [9] Gabriel Bracha. Asynchronous byzantine agreement protocols. Information & Computation, 75:130–143, 1987. doi:10.1016/0890-5401(87)90054-X.
- [10] Gabriel Bracha. Asynchronous byzantine agreement protocols. Information and Computation, 75(2):130–143, 1987. doi:10.1016/0890-5401(87)90054-X.
- [11] Lorenz Breidenbach, Christian Cachin, Benedict Chan, Alex Coventry, Steve Ellis, Ari Juels, Farinaz Koushanfar, Andrew Miller, Brendan Magauran, Daniel Moroz, Sergey Nazarov, Alexandru Topliceanu, Florian Tram‘er, and Fan Zhang. Chainlink 2.0: Next steps in the evolution of decentralized oracle networks. Technical report, Chainlink Labs, 2021.
- [12] Lorenz Breidenbach, Christian Cachin, Alex Coventry, Ari Juels, and Andrew Miller. Chainlink off-chain reporting protocol. Technical report, Chainlink Labs, 2021.
- [13] Christian Cachin, Klaus Kursawe, Frank Petzold, and Victor Shoup. Secure and efficient asynchronous broadcast protocols. In Annual International Cryptology Conference, pages 524–541. Springer, 2001. doi:10.1007/3-540-44647-8_31.
- [14] Christian Cachin, Klaus Kursawe, and Victor Shoup. Random oracles in constantipole: practical asynchronous byzantine agreement using cryptography. In Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, pages 123–132, 2000.
- [15] Ran Canetti and Tal Rabin. Fast asynchronous byzantine agreement with optimal resilience. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 42–51, 1993. doi:10.1145/167088.167105.
- [16] Prasanth Chakka, Saurabh Joshi, Aniket Kate, Joshua Tobkin, and David Yang. DORA: distributed oracle agreement with simple majority. CoRR, abs/2305.03903, 2023. doi:10.48550/arXiv.2305.03903.
- [17] Brian A. Coan. A compiler that increases the fault tolerance of asynchronous protocols. IEEE Transactions on Computers, 37(12):1541–1553, 1988. doi:10.1109/12.9732.
- [18] Sisi Duan, Michael K Reiter, and Haibin Zhang. Beat: Asynchronous bft made practical. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 2028–2041, 2018. doi:10.1145/3243734.3243812.
- [19] Alan David Fekete. Asynchronous approximate agreement. In Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, pages 64–76, 1987. doi:10.1145/41840.41846.
- [20] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985. doi:10.1145/3149.214121.
- [21] Bruce M Kapron, David Kempe, Valerie King, Jared Saia, and Vishal Sanwalani. Fast asynchronous byzantine agreement and leader election with full information. ACM Transactions on Algorithms (TALG), 6(4):1–28, 2010. doi:10.1145/1824777.1824788.
- [22] Valerie King and Jared Saia. Breaking the o(n2) bit barrier: Scalable byzantine agreement with an adaptive adversary. J. ACM, 58(4), July 2011. doi:10.1145/1989727.1989732.
- [23] Julian Loss and Tal Moran. Combining asynchronous and synchronous byzantine agreement: The best of both worlds. Cryptology ePrint Archive, 2018.
- [24] Nancy Lynch and Srikanth Sastry. Consensus using asynchronous failure detectors. arXiv preprint arXiv:1502.02538, 2015. arXiv:1502.02538.
- [25] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. whitepaper, May 2009.
- [26] Supra Research. Supra’s blockchain infrastructure stack. Whitepaper, Supra Labs, November 2024. URL: https://supra.com/documents/SupraTech-Whitepaper.pdf.
- [27] Nick Szabo. Formalizing and securing relationships on public networks. First Monday, 2, 1997. doi:10.5210/FM.V2I9.548.
- [28] Lewis Tseng and Nitin H Vaidya. Asynchronous convex hull consensus in the presence of crash faults. In Proceedings of the 2014 ACM symposium on Principles of distributed computing, pages 396–405, 2014. doi:10.1145/2611462.2611470.
