Abstract 1 Introduction 2 Preliminaries 3 Computing a Circuit in the presence of crashes References Appendix A Chernoff Bounds Appendix B Missing Proofs from Section 3

Two for One, One for All:​ Deterministic LDC–Based Robust Computation in Congested Clique

Keren Censor-Hillel ORCID Technion, Haifa, Israel Orr Fischer ORCID Bar-Ilan University, Ramat Gan, Israel Ran Gelles ORCID Bar-Ilan University, Ramat Gan, Israel Pedro Soto ORCID Virginia Tech, Blacksburg, VA, USA
Abstract

We design a deterministic compiler that makes any computation in the Congested Clique model robust to a constant fraction α<1 of adversarial crash faults. In particular, we show how a network of n nodes can compute any circuit of depth d, width ω, and gate total fan Δ, in dωn2+Δn2O(lognloglogn) rounds in such a faulty model. As a corollary, any T-round Congested Clique algorithm can be compiled into an algorithm that completes in T2no(1) rounds in this model.

Our compiler obtains resilience to node crashes by coding information across the network, and its main underlying observation is that we can leverage locally-decodable codes (LDCs) to maintain a low complexity overhead, as these allow recovering the information needed at each computational step by querying only small parts of the codeword, instead of retrieving the entire coded message, which is inherent when using block codes.

The main technical contribution is that because erasures occur in known locations, which correspond to crashed nodes, we can derandomize classical LDC constructions by deterministically selecting query sets that avoid sufficiently many erasures. Moreover, when decoding multiple codewords in parallel, our derandomization load-balances the queries per-node, thereby preventing congestion and maintaining a low round complexity.

Deterministic decoding of LDCs presents a new challenge: the adversary can target precisely the (few) nodes that are queried for decoding a certain codeword. We overcome this issue via an adaptive doubling strategy: if a decoding attempt for a codeword fails, the node doubles the number of its decoding attempts. We employ a similar doubling technique when the adversary crashes the decoding node itself, replacing it dynamically with two other non-crashed nodes. By carefully combining these two doubling processes, we overcome the challenges posed by the combination of a deterministic LDC with a worst case pattern of crashes.

Keywords and phrases:
Congested Clique, Fault Tolerance, Error Correction Codes
Funding:
Keren Censor-Hillel: is supported in part by the Israel Science Foundation, grant 529/23.
Orr Fischer: is supported in part by the Israel Science Foundation, grant No. 1042/22 and 800/22.
Ran Gelles: supported in part by the United States – Israel Binational Science Foundation (BSF), grant No. 2020277.
Copyright and License:
[Uncaptioned image] © Keren Censor-Hillel, Orr Fischer, Ran Gelles, and Pedro Soto; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Distributed algorithms
Related Version:
Full Version: https://doi.org/10.48550/arXiv.2508.08740 [8]
Acknowledgements:
We would like to thank Merav Parter and Noga Ron-Zewi for helpful discussions.
Editor:
Dariusz R. Kowalski

1 Introduction

Robustness is a crucial component in the design of distributed algorithms, as faults lie at the heart of distributed computing environments. Thus, addressing various types of failures has been heavily studied, see, e.g., seminal results covered in classic books on distributed computing [2, 37, 44]. In this paper, we focus on node crashes in the Congested Clique model (introduced by Lotker et al. [36]), in which the computing devices use bandwidth-restricted point-to-point communication over a complete network graph. So far, despite abundant research in this model (see related work in Section 1.2), relatively less attention was given to fault tolerance questions in this useful model [3, 34, 33, 15, 24].

In some settings, such as the Congest model or the work in Congested Clique of [34, 33], if node crashes are allowed then the requirement is that the output of the computation corresponds only to the inputs of non-crashed nodes. In contrast, Censor-Hillel and Soto [13] show how to avoid losing any information in the Congested Clique despite node crashes. They do this by having each node encode its input and the results of any subsequent local computations using erasure correction codes, split the codewords to pieces and distribute them to the nodes of the network. Upon a node crash, the other nodes collect the pieces of the relevant codeword and decode it in order to continue the computation. This approach implies that any task over a network of n nodes (with the typical setup of O(nlogn) bits of input per node which we will also work with here) can be computed in O(n) rounds despite crashes: after each node distributes its inputs in an encoded way, all other nodes gather all the pieces and reconstruct the entire O(n2logn)-bit input, from which they can locally compute the output. The work [13] beats this bound for certain tasks whose circuit representation has good properties. While their work can achieve fast resilient algorithms for well-behaved circuits, e.g., retain the O~(n1/3)-round complexity for matrix multiplication [12] even with faults, for a general circuit, their resilient algorithm may incur a multiplicative overhead of n rounds, which is worse than the solution that simply learns the entire encoded input.

In this paper, we provide a compiler that makes Congested Clique algorithms robust to a constant fraction of node crashes, by computing general circuits faster. This results in a compiler with a complexity of T2no(1) rounds for a Congested Clique algorithm of T rounds, and thus it beats the O(n)-round solution when T=o(n1/2o(1)). In the crash model we consider, node crashes occur at the start of a round, and the adversary may crash up to αn nodes in total throughout the algorithm, for a fault parameter α[0,1).

Theorem 1.

Let 𝖠𝖫𝖦 be any Congested Clique algorithm, such that each node’s input and randomness string is of size at most O~(n) bits, and that completes in T rounds. Then, for any α[0,1), there is an equivalent algorithm 𝖠𝖫𝖦 that is resilient to an α fraction of crashes and completes in T2no(1) rounds.

A concrete example for an application of Theorem 1 is computing exact single-source-shortest-paths (SSSP) in weighted undirected graphs. The O~(n1/6)-round algorithm of [6] translates by our compiler to an algorithm that completes in n1/3+o(1) rounds, even if any αn nodes may crash during its execution, for any α<1.

At the heart of our compiler is a faster deterministic algorithm for computing general circuits in the faulty model. The following is our main technical contribution.

Theorem 2.

Let C be circuit of depth d, max total-fan Δ, and width ω. Then, for any α[0,1) there exists a deterministic Congested Clique algorithm for computing the output of C in the presence of αn crashes, whose round complexity is dωn2+Δn2O(lognloglogn).

While our transformation also goes through computing circuits, our algorithm differs from that of [13] in two main aspects, which we discuss next.

The first regards the initialization phase of the resilient algorithm. Note that if the adversary fails even a single node before the start of the computation, then this node’s input is lost forever. The solution taken by [13] is promising 1/(1α) quiet rounds where no crashes can happen. These quiet rounds can be used to encode and distribute the nodes inputs. We take a different approach: we consider the inputs at the start of the computation as encoded versions of the inputs to the Congested Clique model (similar to Spielman’s coded computation [45]). Since our robust algorithm is such that the outputs are also encoded in this manner, we get composability for free – we can simply start another computation after completing a former one.

Second, instead of using block error correction codes, we use locally decodable codes (LDCs) [31, 46], which allow a node to query only a small number of other nodes in order to decode only the information it needs for its next local computation. This yields a dramatic improvement in the round complexity of computing a circuit because of its huge effect on congestion. Working with LDCs gives rise to new challenges, because decoding an LDC codeword can be easily targeted by the adversary which can crash so many nodes. The crux of the proof of Theorem 2 shows how to overcome this, and that in fact it can be implemented in a deterministic manner.

Theorem 1 is immediately established by combining our robust computation of circuits in Theorem 2 with the natural conversion of any Congested Clique algorithm to a circuit (see a formal proof in, e.g., [13]).

Lemma 3.

Let 𝖠𝖫𝖦 be a Congested Clique algorithm that computes a function f in T rounds, where the node’s input and randomness strings are of size O~(n). Then, there is a circuit with depth d=2T+1, width ω=Θ~(Tn2), and maximal gate total fan of Δ=O~(Tn), whose inputs are all the O~(n2) input bits of the nodes and whose outputs are the outputs of the nodes after running 𝖠𝖫𝖦.

1.1 Technical Overview

Our main theorem states that we can robustly compute a circuit C of depth d and arbitrary gates, despite a possible (worst-case) crash of an α fraction of the nodes. Towards this end, the network computes the gates of C, layer by layer, where each node is assigned some of the gates in each layer in a dynamic manner that adapts to node failures.

As mentioned above, when a node crashes, it can no longer send messages, and thus any information that was privately held in that node’s memory is lost forever. To be resilient to crashes and avoid losing important information, we need to store information “in the network” so that it can be retrieved despite a constant fraction of node crashes. Indeed, [13] used block error correction codes (ECCs) to encode data and split the resulting codewords across the network. This way, each piece of information can be retrieved by querying all the nodes for their part of the codeword; the decoding succeeds even if a constant fraction of the nodes crash, where the constant depends on the strength of the ECC in use. The drawback is that even if a single bit of information is needed from a coded string, its entire codeword needs to be decoded.

Our starting point is that we replace block codes with Locally Decodable Codes (LDCs). Informally, such codes allow decoding specific parts of the message, rather than decoding the entire message. Furthermore, decoding does not require obtaining the entire (corrupted) codeword, but rather queries relatively small parts of it (a subpolynomial number of symbols), while still guaranteeing correct decoding of the desired symbol with a high probability.

Hence, switching to LDCs benefits our algorithm in the sense that nodes make less queries in order to retrieve exactly the information they need for the computation. The advantage of LDCs over standard (i.e., block-codes) ECCs becomes clear when considering the case where some node is assigned to compute a gate with n inputs, each of which is stored in a different codeword. With standard ECCs, this means that the node must access all n codewords and retrieve all information bits, i.e., n2 bits, assuming n-bit codewords, which causes high congestion.

Algorithm Overview.

The high-level idea of our robust circuit computation algorithm is as follows. Consider a circuit C, whose inputs are distributed over the network in an encoded manner using some LDC code. The network computes C layer by layer. That is, let 𝗀𝖺𝗍𝖾𝗌(1) be the first layer of gates in C, i.e., all the gates whose inputs are the inputs of C. We first distribute the tasks of computing these gates across the (non-crashed) nodes so that each node is assigned roughly the same number of gates to compute. Each node then attempts to compute all the gates allocated to it. To do this, the node first obtains the inputs for each gate assigned to it by decoding the corresponding inputs of C, which are stored in the network using codewords of an LDC. If this information retrieval is successful, the node computes the outputs of the gates allocated to it, and then “stores” them in the network by encoding them with an LDC and distributing the codewords to the nodes in the network. Once the first layer is computed and stored in the network, the nodes continue to compute the second layer of gates in C, denoted 𝗀𝖺𝗍𝖾𝗌(2), which includes all gates whose inputs are either the inputs of C or the outputs of the gates in the first layer, 𝗀𝖺𝗍𝖾𝗌(1). This continues until all the outputs of C are computed and stored in the network. Naturally, crashes that occur during the algorithm may prevent the network from completing the computation of a specific layer and progressing to the next layer, which we discuss next.

Overcoming Crashes I.

In our robust circuit computation algorithm, each gate is assigned to a dedicated node responsible for its computation. If that node crashes, the gates assigned to it remain uncomputed. A trivial solution is to reassign any uncomputed gate in the current layer of C (whose original node is crashed) to a new node that is not crashed. However, the adversary could then crash this new node and eventually cause a delay of αn rounds, which is extremely expensive. Our strategy is different: when a node crashes, we reassign its gate(s) to two fresh nodes. If both of those nodes crash, we again double the redundancy, forcing the adversary to double its effort to keep the gate uncomputed. After at most logn such iterations, every gate is guaranteed to be computed by at least one live node. This progressive doubling remains feasible without causing excessive congestion because of two factors: First, the nodes that do not crash successfully compute and store their assigned gates. These nodes are now available to take over the gates of the crashed nodes. Second, the adversary cannot (effectively) corrupt too many nodes in the same round: If the adversary crashes too many nodes during a short period of time, we call this step overwhelmingly faulty, and simply restart the computation of this layer with the remaining nodes. While this translates to no progress, it reduces the adversary’s budget of crashes and hence cannot occur too many times.

Deterministic LDCs and Congestion.

The decoding algorithm of LDCs is inherently random. Indeed, if a fixed (small) number of codeword symbols are queried during a decoding attempt and these symbols are corrupted, then decoding is certainly impossible. If so, the code cannot correct a constant fraction of corruptions, as would be normally expected. In particular, given a budget of αn node crashes, an all-knowledgeable adversary may be able to crash a subset of nodes in a way that prevents any meaningful progress of the deterministic computation.

Despite the above conundrum, our robust algorithm is fully deterministic. In particular, we derandomize the LDC decodings performed throughout the computation while maintaining resilience to αn node crashes. Our derandomization relies on two important properties, specific to our model. First, when a node crashes, all other nodes are aware of this event because the crashed node does not send any messages from the round in which it crashed. This allows the remaining nodes to maintain a consistent view of the crashed and alive set of nodes. Second, crashed nodes that are queried for their respective parts of the LDC codeword do not reply, and thus the LDC decoding algorithm is missing some parts of the queried codeword, known as erasure corruptions. These are easier to correct than when the codeword contains incorrect information. The combination of erasure corruptions and knowledge of which nodes have crashed in each round allows a decoder to predict whether a specific set of queries will result in successful decoding. Thus the decoder can pick a set of queries that is guaranteed to succeed if no further nodes crash.

While the above idea derandomizes the (inherently random) LDC decoding algorithm, it creates a new challenge regarding the resulting congestion. To illustrate this challenge, suppose that a node performs the above deterministic selection of queries separately for each piece of information it wants to retrieve. Then, it may end up querying the same subset of nodes over and over again, thus causing a large congestion. Randomized decoding averts this issue by querying a set of nodes in a near-uniform distribution. However, even if we could deterministically replicate this querying distribution, we face again the issue mentioned above, where many of these queries are erased and do not lead to a correct decoding.

Nevertheless, our analysis, which is based on the probabilistic method, shows that it is possible to select query sets for multiple (independent) LDC-decoding instances in the presence of a constant fraction of erasures in positions known to the decoding algorithm, so that the following hold simultaneously: (i) each decoding instance successfully decodes the correct information, and (ii) the congestion per queried node is small, i.e., the queries are well-distributed over the network. To show the latter, we analyze an equivalent bins-into-balls experiment, showing that the event that too many balls (queries) aggregate in one specific bin (node) happens with small probability. Union-bounding over all nodes keeps the probability of the bad event below 1, thus proving the existence of good query sets that avoid congestion.

Overcoming Crashes II.

The above discussion implies that the adversary cannot select a set of αn nodes to crash for invalidating many of the LDC decoding attempts throughout the computation, as long as these indices are known to the nodes. However, the adversary may decide to crash nodes after a decoder fixes its selection of nodes to query in a given round, as this selection depends only on nodes that are crashed prior to that round. To overcome this problem, the nodes dynamically increase the number of times they attempt to LDC decode each piece of information, according to corruptions made so far. Namely, if the decoding of some LDC-encoded information fails due to new corruptions of the queried nodes, then the decoding node performs two independent decoding attempts. These new queries depend on all the crashes so far, and in particular, on the “new” crashes that invalidated the original decoding attempt. If these two attempts fail as well (due to new crashes that occur after the nodes queried by these two attempts are decided), the decoding node doubles its number of attempts again, and so on. Overall, after a logarithmic number of doublings, this approach potentially causes a large, near-linear number of LDC decoding attempts, and the adversary can only fail a constant fraction of them without exceeding its budget. Note that it only takes one successful attempt to move on, so the adversary must fail all attempts of a single codeword to prevent progress.

Recap.

We can now summarize the overview of our robust circuit computation. For a given circuit C, the computation goes layer by layer, where computing a layer of C means: (1) assigning uncomputed gates of the layer to non-crashed nodes; (2) retrieving the inputs to the gates of this layer, that are stored in the network via an LDC during the computation of previous layers; (3) computing the gates; (4) storing the outputs of the gates via an LDC. Technically speaking, this computation of each layer is done in two nested loops: The external loop doubles, in each iteration, the number of nodes responsible for computing some uncomputed gate (we call this loop the 𝖭𝗈𝖽𝖾𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀 loop). The internal loop doubles, in each iteration, the number of independent decoding attempts each node makes for each uncomputed gate assigned to it (we call this loop the 𝖠𝗍𝗍𝖾𝗆𝗉𝗍𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀 loop). Section 3 fully details our algorithm. The algorithm’s analysis and the derandomization of the LDC in our setup appear in the full version of the paper [8].

1.2 Additional Related Work

Since its introduction for faster MST computation [36], the Congested Clique model has been extensively explored during recent decades for various tasks. The MST complexity was eventually shown to be constant [40] following a beautiful line of work [28, 26, 32, 30]. Additional examples include routing [35], coloring [42, 43, 14, 4, 17, 16], subgraph finding [19, 29, 41, 21, 9, 10, 7], and many more. Hardness of obtaining lower bounds in this model is established in [20].

Fault-tolerance in the Congested Clique model was explored by [34, 33] for graph realization problems under crash-faults, by [3, 15] for recognizing connectivity and hereditary properties under Byzantine faults, and by [23, 24] for general computations under edge faults. In particular, [24] employs LDCs as means to concentrate information from many nodes into few.

Coding theory, in various forms, has been extensively used in many other areas of distributed computing, including: distributed zero-knowledge [27, 5], proof labeling schemes [22, 11], the beeping model [18, 1, 25], and distributed interactive proofs [39].

2 Preliminaries

For an integer n1 we denote [n]={1,2,,n}. All logarithms are taken to base 2 unless otherwise mentioned. We say that an event occurs with high probability (in n, which is usually implicit) if its probability is at least 11/n10. For a string x and for any i[|x|], let x[i] denote the i-th symbol of x.

2.1 Computation Model

Suppose a Congested Clique network, where n nodes, v1,,vn, communicate in synchronous rounds by exchanging blogn-bit messages in an all-to-all fashion, for some constant b. Throughout the computation, an adversary may choose to crash up to αn nodes, where the constant α[0,1) is a parameter of the model. A crashed node does not send any messages starting from the round in which it is crashed. The corruption is worst case: an all-knowledgeable adversary bases its decision on which nodes to fail on all its available information, including the algorithm that the nodes execute, their inputs, and their local randomness (if any). Note that all nodes know which nodes are non-faulty at the end of each round, denoted as the set 𝒜 (to indicate that they are alive).

The compiler of Algorithm 5 is completely deterministic, in the sense that it does not add any new randomness. In particular, The robust algorithm 𝖠𝖫𝖦 remains deterministic if 𝖠𝖫𝖦 was deterministic, and similarly, it is randomized if 𝖠𝖫𝖦 was so. In the latter case, we assume that the randomness of ALG is given to the representing circuit C as input.

Coded inputs and outputs.

In the Congested Clique model, it is common to refer to problems in which each node holds a private input x of O(nlogn) bits. As explained in the introduction, in our faulty model, the inputs must be coded in a way that prevents them from being lost if some node crashes before the first round of communication. We thus consider inputs as encoded via an LDC code (see Definition 4 in Section 2.3 below), and the respective codeword is distributed across the nodes of the network. We employ a (q,δ,ϵ)-LDC code with block length n, whose parameters will be specified later. Such a code guarantees that every bit of the input x (of each node) can be retrieved by querying q nodes with probability 1ϵ over a uniform choice of the randomness string, even if δn of the nodes have crashed. It will be the case that α<δ.

We require the output to be stored in the network in a similar way: the outputs should be encoded via an LDC code whose resulting codewords are split among the nodes, so that it is possible to retrieve each bit of the output despite δn crashes. This choice allows the composition of computations, where the outputs of one computation are the inputs of the next.

2.2 Layered Circuits

We identify a circuit C with a directed acyclic graph C=(VC,EC) in which every gate is associated with a node, and every wire connecting gates is associated with an edge. Each bit-input to the circuit (an input gate) is associated with a leaf node and each output of the circuit (an output gate) with a root node. Other nodes are associated with the computational gates of the circuit. We say that a node gVC depends on a node gVC if there is a directed path from g to g. The notion of gate dependencies induces layers in the circuit, where all input gates are in layer 0, and a gate g is placed in layer i if i is the minimal integer such that g depends only on nodes in layers at most i1. For a gate gVC, we denote by 𝗅𝖺𝗒𝖾𝗋(g) the layer of g, and let 𝗀𝖺𝗍𝖾𝗌(i)={gVC𝗅𝖺𝗒𝖾𝗋(g)=i} be the set of all the gates in layer i. The depth of C, denoted d=d(C), is defined to be maxgVC𝗅𝖺𝗒𝖾𝗋(g). We denote by 𝗐𝗂𝗋𝖾𝗌(i)={(u,v)EC𝗅𝖺𝗒𝖾𝗋(u)=i} the set of all wires that go out of the gates in layer i. Note that 𝗐𝗂𝗋𝖾𝗌(0) are the inputs to the circuit. For a gate g, denote by 𝖿𝖺𝗇𝗂𝗇(g) its in-degree and by 𝖿𝖺𝗇𝗈𝗎𝗍(g) its out-degree; let 𝖿𝖺𝗇(g)=𝖿𝖺𝗇𝗂𝗇(g)+𝖿𝖺𝗇𝗈𝗎𝗍(g) be its total fan. The width of the circuit, ω=ω(C), is defined as the maximal number of outgoing wires of any layer, ω=maxi(𝗐𝗂𝗋𝖾𝗌(i)). We assume throughout that all parameters of the circuit are polynomial in the size of the network, i.e. d,ω,Δ=O(poly(n)). This fits the case where C represents a Congested Clique algorithm with O(nlogn) bits of input per node. We note, however, that the statement of Theorem 2 holds, up to logarithmic terms, for any parameters d, ω, and Δ.

Figure 1: An example of a circuit C of depth d=3 and width ω=6. We have 𝗀𝖺𝗍𝖾𝗌(1)={g1,g2} and 𝗀𝖺𝗍𝖾𝗌(2)={g3}. The gate g2 has 𝖿𝖺𝗇𝗂𝗇=4 and 𝖿𝖺𝗇𝗈𝗎𝗍=1 giving 𝖿𝖺𝗇=5, while the gate in1 has 𝖿𝖺𝗇𝗈𝗎𝗍=2 and 𝖿𝖺𝗇=2. The set 𝗐𝗂𝗋𝖾𝗌(0) and 𝗐𝗂𝗋𝖾𝗌(1) are indicated on the figure.

2.3 Error Correcting Codes and Locally Decodable Codes

For an alphabet Σ, the Hamming distance of two strings x,y(Σ) of the same length, i.e., |x|=|y|, is the number of indices for which x and y differ and is denoted by Hamm(x,y)=|{ix[i]y[i]}|. For two strings xΣ, y(Σ{}) and value c(0,1), we say that y can be obtained by a c-fraction of erasures from x if |x|=|y|, and for all i[|x|], it holds that either x[i]=y[i] or y[i]=, where the latter case happens at most c|x| times. An index i in which y[i]= is called an erasure.

For a prime power p, we denote by 𝔽p the finite field of size p. An error correcting code is a mapping 𝖤𝗇𝖼:𝔽pK𝔽pN that takes K symbols of the alphabet 𝔽p into N symbols of the alphabet 𝔽p.111We can map [p] and 𝔽p with a fixed isomorphism, so that 𝖤𝗇𝖼:[p]K[p]N. The value N is called the block length of the code. The ratio K/N is called the rate of the code. The relative distance of a code is the normalized Hamming distance between any two codewords, denoted δ=minmm1NHamm(𝖤𝗇𝖼(m),𝖤𝗇𝖼(m)).

Next, we formally define the notion of locally decodable codes. For the purposes of our work, we only consider the erasure setting, in which we are given access to a possibly corrupted codeword y, obtained by erasing at most δ-fraction of some 𝖤𝗇𝖼(x) for some x𝔽pK, and an index i[K], and our goal is to find the i-th symbol of x.

Definition 4 (Locally Decodable Codes (LDCs) for erasures).

An error correcting code 𝖤𝗇𝖼:𝔽pK𝔽pN is said to be a (q,δ,ϵ)-LDC if there exists a randomized decoding algorithm 𝖣𝖾𝖼 that receives as input a string y(𝔽p{})N and an index i[K], performs at most q queries to y, and outputs a value with the following guarantee: if there exists x𝔽pK such that y can be obtained by at most a δ-fraction of erasures from 𝖤𝗇𝖼(x), then Pr(𝖣𝖾𝖼(y,i)=x[i])1ϵ.

Definition 5 (Non-adaptiveness).

An LDC is called non-adaptive if for every call to its decoding algorithm 𝖣𝖾𝖼, the set of queries it performs given input index i is only a function of the randomness and the index i. In particular, a query does not depend on the outcome of previous queries.

We can think of the decoding algorithm of a non-adaptive LDC code 𝖣𝖾𝖼(,i) as an algorithm with oracle access to the codeword, that first generates q indices to query, and then, once provided these (possibly corrupt) q symbols, returns the decoded message symbol.

The following smoothness property of LDCs means that decoding an index requires querying the codeword in a “smooth” (near-uniform) way. This property is important in order to avoid congestion when a node decodes multiple values.

Definition 6 (Smoothness).

An LDC is called s-smooth if there exists a decoding algorithm 𝖣𝖾𝖼, such that during any call to 𝖣𝖾𝖼, any entry j[N] of the codeword is queried with probability at most s.

The following theorem suggests that smoothness is an inherent property of LDCs, since any decoding algorithm can be transformed into a smooth one.

Theorem 7 ([31, Theorem 1]).

Every (q,δ,ϵ)-LDC of block length N is q/δN-smooth.

3 Computing a Circuit in the presence of crashes

We show how to efficiently and deterministically compute a specified circuit C in the Congested Clique model, in the presence of up to αn crashes. We start, in Section 3.1, by describing the procedures 𝖲𝗍𝗈𝗋𝖾, 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾, and 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 used to store and retrieve information in our algorithm. In section Section 3.2 we describe another procedure, 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾, that assigns gates to nodes in a balanced-manner. Finally, in Section 3.3, we describe our circuit computation algorithm based on these procedures.

3.1 The 𝗦𝘁𝗼𝗿𝗲, 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗲, and 𝗕𝘂𝗹𝗸𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗲 Procedures

As mentioned above, any information that may get lost due to node crash is stored in the network via an LDC. We first describe the LDC we use and then detail the store and retrieve procedures.

The LDC instantiation

We assume a fixed LDC code, whose exact details appear in the full version of the paper [8]. Specifically, for some power of prime q=2O(logn) and ρ such that ρ1=2O(lognloglogn), our LDC has an encoding function 𝖤𝗇𝖼:[q]ρn[q]n and a decoding function 𝖣𝖾𝖼 which, given an input index i[n], smoothly queries q indices of the codeword and decodes correctly even if up to δ-fraction of the q queried symbols are erased, for some predetermined constant distance α<δ<1. We assume that n=qr for some integer r (specified in the detailed construction); this assumption can be lifted using standard methods. Note that Definition 4 says that 𝖣𝖾𝖼 is randomized, but our goal is to compute the circuit deterministically. We thus treat any randomness used by 𝖣𝖾𝖼 as originating from some randomness string, but our implementation of obtaining such randomness strings will be deterministic rather than randomized which will render our implementation of 𝖣𝖾𝖼, and hence our circuit computation algorithm, deterministic.

The 𝗦𝘁𝗼𝗿𝗲 Procedure

The 𝖲𝗍𝗈𝗋𝖾 procedure “saves” information in the network in a robust way, by encoding it with an LDC and distributing the codeword among the nodes of the network.

Each node vj begins the 𝖲𝗍𝗈𝗋𝖾 procedure with a bit-string Uj it wishes to store in the network. It first splits Uj into parts of size ρnlogq bits each (so that each can be represented by a string of ρn symbols over the alphabet [q]), padding with zeros as necessary. Set 𝗅𝖺𝗌𝗍j=|Uj|/(ρnlogq) and denote these parts Uj1,,Uj𝗅𝖺𝗌𝗍j. The node vj then encodes each Uji using 𝖤𝗇𝖼 to obtain a codeword 𝖤𝗇𝖼(Uji)=Lji of size |Lji|=n symbols.

Next, vj distributes the codewords to the network nodes. Specifically, in round i=1,,𝗅𝖺𝗌𝗍j, it sends the symbols of Lji – one symbol to each node in the network. This takes a single round of communication because each symbol in the LDC codeword comes from the alphabet [q] with q=2O(logn), hence it can be encoded in logq=O(logn) bits. The formal description is depicted in Algorithm 1.

Algorithm 1 𝖲𝗍𝗈𝗋𝖾 (for node vj).

We say that vj stored Uj in the network if vj completed the 𝖲𝗍𝗈𝗋𝖾 procedure without crashing. The following is straightforward from Algorithm 1.

Observation 8.

Storing a string Uj takes O(|Uj|/(ρnlogq)) rounds of communication.

The 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗲 and 𝗕𝘂𝗹𝗸𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗲 Procedures

In the 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 procedure, a node vj is given as input an index w of some string U that was previously stored in the network using the 𝖲𝗍𝗈𝗋𝖾 procedure. Node vj is additionally given a string R called the randomness string. One can think of this procedure as randomized, with R as its randomness, but in all our invocations of 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾, the string R is set deterministically, as will be explained later.

The goal of the 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 procedure is to retrieve the value of the w-th bit of the previously stored U. To that end, vj first identifies the codeword that contains the bit w: recall that the 𝖲𝗍𝗈𝗋𝖾 procedure splits U into parts of size ρnlogq bits. Denote by Ui the respective part and by i the index of the symbol in Ui that contains the bit-value U[w] in which we are interested. In the following, we say “decode U[w]” to actually mean decoding the respective index i of the possibly corrupted codeword 𝖤𝗇𝖼(Ui) that contains the respective value.

To retrieve the value of U[w] from the (stored) codeword 𝖤𝗇𝖼(Ui), the node vj executes 𝖣𝖾𝖼(.) using the randomness string R and obtains indices of random q symbols of 𝖤𝗇𝖼(Ui) needed for the decoding. It is possible to learn these q indices in advance because the LDC is non-adaptive (Definition 5). The node vj then queries the respective nodes for their stored symbols and provides 𝖣𝖾𝖼(.) with their replies. Note that crashed nodes do not reply, which translates to erasures given to 𝖣𝖾𝖼(.). Further, in hindsight, the randomness string R will be derandomized, which has the effect that all nodes know R. It will follow that vj does not actually need to send any message in order to query any node; the q nodes will know that they are the nodes that should give information back to vj, since they will know the identity of i,Ui and the value of R to begin with. See Lemma 14.

Under some circumstances, we allow the retrieval to fail, in which case the output is . This could happen in two cases: (i) when there are too many erasures and 𝖣𝖾𝖼(.) returns , or (ii) if one of the nodes queried during this 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 invocation crashes during the execution of this 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 invocation. For the former, our derandomization will guarantee that this event cannot happen if at most αn nodes have crashed. For the latter, in this case we set the output to be even if the decoding 𝖣𝖾𝖼 successfully retrieves the symbol. This decision does not affect the correctness of the algorithm, but rather simplifies its analysis.

Algorithm 2 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 (for node vj).

The 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 procedure generalizes the above to allow retrieving multiple previously stored bits. Now, vj is given as input a collection of indices Wj, where each index wWj refers to some (predetermined) U(w) that was previously stored in the network via a 𝖲𝗍𝗈𝗋𝖾. The strings U(w) may be different for different values of w, or they may be the same. Additionally, the procedure gets a multiplicity parameter . The goal is to output the value of the bit in index w of U(w), for each index wWj.

Towards this goal, all nodes in the network first deterministically compute a set of randomness strings (vj)={Rvj,w,ii[2],wWj} for each vjV, with good properties, which we define and discuss in detail later in the section (see Definition 9). The deterministic generation of these strings is given by Lemma 10. After this step, all nodes vV know (vj) for every vj.

Next, vj performs, for each index wWj, a batch of 2 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 procedures, where the i-th invocation uses randomness string Rvj,w,i. Similar to the case of 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾, we allow some retrieves to fail and output . If at least one of the 2 invocations of 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾(w,Rvj,w,i) succeeds, its output becomes the output of 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 for the index w; otherwise, the respective output is .

All the |Wj|2 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 invocations are executed in parallel. However, in order to avoid congestion, vj pipelines requests targeted to the same node. That is, it sends at most one query to any node in any given round. Similar to above, the set of strings {U(w)}wWj is predetermined and known to all nodes, and the identities of the q nodes that are queried in a specific 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾(w,Rvj,w,i) are generated using Rvj,w,i, which is also known to all nodes. Hence, each queried node can infer the respective U(w) for each query, without the need for vj to communicate this data. The formal procedure is depicted in Algorithm 3.

Algorithm 3 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 (for node vj).

Consider a specific instance of 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾(Wj,). The selection of randomness strings (vj) that vj uses has a tremendous effect on the induced congestion. Indeed, assume that (vj) is such that all 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾s query some vi, implying 2|Wj| rounds of communication where in each round a single LDC symbol is communicated. This should be contrasted with the fully randomized case, where each node is queried in a near-uniform distribution (implied by the smoothness of LDC codes, Theorem 7), implying that each vi is queried 2|Wj|q/n times, in expectation. Standard tail bounds show that the number of queries of the maximal node (and hence the round complexity) is bounded by 2|Wj|q/nO(logn). This gives that there exists a way to select the randomness strings (vj) while maintaining the same round complexity.

However, while the above gives uniform query locations for the goal of controlling congestion, it does not address the problem that many locations might be erased. To illustrate this point, assume that out of the 2 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 instances, all but one are querying mostly erased symbols (crashed nodes), and only one 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 correctly decodes the value. The output of 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 would be correct in this case, but in this scenario the adversary needs to crash only a single additional node in order to fail it.

Indeed, what we show is even stronger than mimicking a fully randomized case by some naïve load balancing. The following Lemma 10 shows that we can find a set of randomness strings that maintains a similar round complexity even if each codeword has αn symbols erased, and furthermore, each individual 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 succeeds decoding the respective value, as long as no new crashes happened during that 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾. In other words, we can de-randomize the random sampling of codeword symbols to query while (i) maintaining complexity (by controlling congestion) and (ii) performing only “useful” queries, hence maintaining our resilience to an all-knowledgeable adversary. Our choice of randomness strings guarantees that the adversary must waste 2 of its crashing budget in order to fail the 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾, which is crucial for the correctness proof.

We now define the notion of good randomness strings, namely, strings that provide the above properties for the 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 procedure.

Definition 9 (Good randomness strings).

Fix a node vj, parameters Wj,, and the set of non-crashed nodes 𝒜. A collection of randomness strings (vj)={Rvj,w,i}wWj,i[2] is called good for an instance of 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾(Wj,), if the following holds:

  1. (1)

    Each node is queried at most O(2|Wj|qnlogn) times in total by vj, and

  2. (2)

    For all wWj and i[2], the invocation of 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾(w,Rvj,w,i) succeeds (given no further changes in 𝒜).

In the full version of the paper [8], we prove the following.

Lemma 10.

There is a zero-round deterministic algorithm which, given the set of non-crashed nodes 𝒜, a collection of indices Wj, and a multiplicity parameter , computes a good collection of randomness strings (vj) for vj.

With the above, the following is immediate.

Lemma 11.

𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾(Wj,) takes O(2|Wj|qnlogn) rounds.

3.2 The 𝗔𝗹𝗹𝗼𝗰𝗮𝘁𝗲 Procedure

The purpose of the 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾 procedure is to assign a set of given gates that need to be computed to non-crashed nodes. The procedure is deterministic and runs locally on each node, without any communication. However, all nodes reach the same allocation, since they all have the same knowledge regarding crashed nodes and regarding failed LDC queries, where the latter is due to the randomness strings being generated deterministically by all nodes and known to all (due to Lemma 10 above and the upcoming Lemma 14 which essentially says that the nodes are able to keep a consistent view of all of these variables by careful bookkeeping).

In more detail, the procedure is given as input the current set of non-crashed nodes 𝒜, a set of gates GVC (of the circuit C=(VC,EC), known to all), and a multiplicity parameter 1. For each gate gG, 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾 assigns g to a set of min(21,|𝒜|) nodes from 𝒜 using the following sequential “greedy” process: Sort the gates in G by their total fan (denoted 𝖿𝖺𝗇), in descending order. Assign the gates one by one to a set of min(21,|𝒜|) distinct nodes in 𝒜 whose loads are minimal (break symmetry by node IDs). The load of a node v, denoted λ(v), is defined as the sum of the total 𝖿𝖺𝗇 of all gates assigned to it so far during this 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾 instance. See Algorithm 4.

Algorithm 4 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾 (for node vj).

The assignment can be computed locally in a consistent manner across all non-crashed nodes without any communication, since all relevant information, namely G,1, and 𝒜, is known to all nodes in 𝒜. Note that since this is a local computation procedure, we can assume that set 𝒜 does not change throughout the computation, and that all nodes use the same set 𝒜 representing the non-crashed nodes at the beginning of that round.

The following lemma bounds the load assigned to each node, and is proven in Appendix B.

Lemma 12.

Let L=min(21,|𝒜|), P=gG𝖿𝖺𝗇(g), and assume maxgG𝖿𝖺𝗇(g)Δ. Then, 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾(G,1) puts a maximal load of max(4PL/|𝒜|,Δ) on each node.

3.3 The Circuit Computation Algorithm

We can now complete the description of our circuit computation algorithm, presented in Algorithm 5. The algorithm takes as input a circuit C whose inputs (the wires 𝗐𝗂𝗋𝖾𝗌(0)) are already stored in the network.

The algorithm computes the gates of C layer by layer, in a sequence of d steps referred to as 𝗅𝖺𝗒𝖾𝗋-steps. For 𝗅𝖺𝗒𝖾𝗋-step i=1,,d, we assume that 𝗐𝗂𝗋𝖾𝗌(0),,𝗐𝗂𝗋𝖾𝗌(i1) have already been stored by previous iterations, and the goal is to compute and store 𝗐𝗂𝗋𝖾𝗌(i). To this end, the nodes execute 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾, which assigns to each non-crashed node vj a set of gates Gj𝗀𝖺𝗍𝖾𝗌(i) to compute and store their output wires (line 5).

Then, each node vj tries to retrieve the input wires of the gates Gj assigned to it via the 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 procedure (line 8). If successful, the node computes the gates assigned to it (line 12) and obtains the values of all output wires Uj of the gates Gj. The node then stores these wires in the network (line 13).

However, crashes that occur during this computation may hinder the computation of some wires. To overcome this issue, the computation of layer i consists of two nested loops. The outer loop, which we call the 𝖭𝗈𝖽𝖾𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀-loop, iterates over 1=1,,logn and doubles the number of nodes that try to compute a given gate. The inner loop, called the 𝖠𝗍𝗍𝖾𝗆𝗉𝗍𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀-loop, iterates over 2=1,,logΛ for some parameter Λ (fixed in the analysis), and doubles the number of retrieval attempts a given node performs for each input wire assigned to it.

If during the computation of 𝗅𝖺𝗒𝖾𝗋 i, more than cfn/(qlogn) new crashes have occurred, for some sufficiently small constant cf>0 determined later, we re-start the computation of that layer with the remaining nodes. Namely, we maintain a counter fi,rep of newly crashed nodes in the 𝗅𝖺𝗒𝖾𝗋-step, which is initialized at the start of the 𝗅𝖺𝗒𝖾𝗋-step to be 0. Once it passes cfn/(qlogn), we reset the counter to 0, reset the multiplicity parameters 1,2 to 1, and retry to compute and store all remaining unstored wires in 𝗐𝗂𝗋𝖾𝗌(i). This action is captured in lines 1620, assisted by the variable rep, that counts the number of repetition attempts of computing layer i. We call such a repetition of a 𝗅𝖺𝗒𝖾𝗋-step overwhelmingly faulty:

Definition 13.

Let cf>0 be a sufficiently small constant. A repetition of a 𝗅𝖺𝗒𝖾𝗋-step is called overwhelmingly faulty if cfn/(qlogn) new crashes occur during this step.

Algorithm 5 Robust Circuit Computation (for node vj).

The next lemma captures the following observation: the nodes are capable of “bookkeeping” the progress of the computation at any given round of Algorithm 5. This bookkeeping information includes gates that were computed and stored, gates that still need to be computed, 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 calls that succeeded and those that failed, etc. In particular, when a node needs to access some wire w, that was previously stored, the node knows exactly which LDC codeword contains it, and which index of that codeword it should decode in order to retrieve w.

Lemma 14 (Bookkeeping).

Any non-crashed node knows, at the start of any round, the following information: (1) the set S of gates whose outputs were stored in the network, and (2) for any gS and any output w of g, the LDC codeword that contains w (namely, the node vj that stored it and the round in which it was stored) and the index of w in the string Uj that vj stored.

Proof.

Recall that, by definition, since a crashed node does not send any messages starting from the round in which it crashes, all nodes know the set 𝒜 of non-crashed nodes at the beginning of every round.

We prove Items (1) and (2) by induction on the round number. At initialization (the beginning of the first round, r=1), the inputs to the circuit C are assumed to be stored, hence (1) and (2) hold for the input gates 𝗀𝖺𝗍𝖾𝗌(0).

Next, we assume the statement holds at the beginning of some 𝖭𝗈𝖽𝖾𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀-step, and we show it holds in every round until the end of this 𝖭𝗈𝖽𝖾𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀-step. Note that all nodes execute Algorithm 5 in synchrony and, specifically, they all perform 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 or 𝖲𝗍𝗈𝗋𝖾 at the same rounds (other actions do not involve communication as they are purely computational and thus take zero rounds).

If round r is not the final round of a 𝖲𝗍𝗈𝗋𝖾 procedure, then the set of stored wires is unchanged. Otherwise, each node knows the set S of stored gates at the beginning of the 𝖲𝗍𝗈𝗋𝖾, by the induction hypothesis, and thus it also knows the set G=𝗀𝖺𝗍𝖾𝗌(i)S. Since 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾 is deterministic and depends only on C, 1, G, and S, then all nodes learn the same output of 𝖠𝗅𝗅𝗈𝖼𝖺𝗍𝖾(G,1). In particular, they all learn the gates Gj that each vj𝒜 is assigned to compute in this 𝖭𝗈𝖽𝖾𝖣𝗈𝗎𝖻𝗅𝗂𝗇𝗀-step.

With this knowledge, all nodes can (locally) generate the good randomness strings that are used by some vj for each of its 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 invocations (Lemma 10). Further, all nodes have the same knowledge about 𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾 calls that failed in previous rounds due to new crashes. They can thus infer which nodes vj have successfully retrieved all input wires of Gj (i.e., those for which Wj=) and satisfy the condition of in line 10. Only these nodes perform the 𝖲𝗍𝗈𝗋𝖾 that completes in that round r.

Out of the nodes that perform 𝖲𝗍𝗈𝗋𝖾, any node vj that does not crash before round r, succeeds in storing all the wires in Uj (i.e., all the output wires of Gj).

Therefore, at the start of round r+1, all the nodes in 𝒜 learn the set of nodes that performed a successful 𝖲𝗍𝗈𝗋𝖾. They also know the set Uj of each vj that completed a 𝖲𝗍𝗈𝗋𝖾, in particular, which wires it contains and their internal order.222While Uj is a set of wire-values, the 𝖲𝗍𝗈𝗋𝖾 procedure expects a bitstring, hence we induce some standard order on the set Uj that maps it into a bitstring, and this ordering is known by all nodes. This implies Item (2). It also follows that all nodes in 𝒜 can update their set S of stored gates in a consistent manner (line 15), which implies Item (1).

As a result of this careful bookkeeping, vj does not need to send any “metadata” information to the nodes it needs to query – they already have all the needed information (in the notations of 𝖡𝗎𝗅𝗄𝖱𝖾𝗍𝗋𝗂𝖾𝗏𝖾, they know w and U(w), the randomness strings (vj), and the specific round(s) in which vj queries them (i.e., is expecting a symbol from them).

This concludes the description of the algorithm. The algorithm’s analysis, the description of the LDC we construct for the purpose of deterministic local decoding, and the proof of Lemma 10 are deferred to the full version of the paper [8].

References

  • [1] Yagel Ashkenazi, Ran Gelles, and Amir Leshem. Noisy beeping networks. Information and Computation, 289:104925, 2022. doi:10.1016/j.ic.2022.104925.
  • [2] Hagit Attiya and Jennifer L. Welch. Distributed computing - fundamentals, simulations, and advanced topics (2. ed.). Wiley series on parallel and distributed computing. Wiley, 2004.
  • [3] John Augustine, Anisur Rahaman Molla, Gopal Pandurangan, and Yadu Vasudev. Byzantine connectivity testing in the congested clique. In 36th International Symposium on Distributed Computing (DISC), volume 246, pages 7:1–7:21, 2022. doi:10.4230/LIPIcs.DISC.2022.7.
  • [4] Philipp Bamberger, Fabian Kuhn, and Yannic Maus. Efficient deterministic distributed coloring with small bandwidth. In ACM Symposium on Principles of Distributed Computing (PODC), pages 243–252, 2020. doi:10.1145/3382734.3404504.
  • [5] Aviv Bick, Gillat Kol, and Rotem Oshman. Distributed zero-knowledge proofs over networks. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2426–2458, 2022. doi:10.1137/1.9781611977073.97.
  • [6] Keren Censor-Hillel, Michal Dory, Janne H. Korhonen, and Dean Leitersdorf. Fast approximate shortest paths in the congested clique. Distributed Computing, 34(6):463–487, 2021. doi:10.1007/s00446-020-00380-5.
  • [7] Keren Censor-Hillel, Orr Fischer, François Le Gall, Dean Leitersdorf, and Rotem Oshman. Quantum distributed algorithms for detection of cliques. In 13th Innovations in Theoretical Computer Science Conference (ITCS), volume 215, pages 35:1–35:25, 2022. doi:10.4230/LIPIcs.ITCS.2022.35.
  • [8] Keren Censor-Hillel, Orr Fischer, Ran Gelles, and Pedro Soto. Two for one, one for all: Deterministic ldc-based robust computation in congested clique, 2025. doi:10.48550/arXiv.2508.08740.
  • [9] Keren Censor-Hillel, Orr Fischer, Tzlil Gonen, François Le Gall, Dean Leitersdorf, and Rotem Oshman. Fast distributed algorithms for girth, cycles and small subgraphs. In 34th International Symposium on Distributed Computing (DISC), volume 179, pages 33:1–33:17, 2020. doi:10.4230/LIPIcs.DISC.2020.33.
  • [10] Keren Censor-Hillel, François Le Gall, and Dean Leitersdorf. On distributed listing of cliques. In Symposium on Principles of Distributed Computing (PODC), pages 474–482, 2020. doi:10.1145/3382734.3405742.
  • [11] Keren Censor-Hillel and Einav Huberman. Near-optimal resilient labeling schemes. In 28th International Conference on Principles of Distributed Systems (OPODIS), volume 324, pages 35:1–35:22, 2024. doi:10.4230/LIPIcs.OPODIS.2024.35.
  • [12] Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. Algebraic methods in the congested clique. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing (PODC), pages 143–152, 2015. doi:10.1145/2767386.2767414.
  • [13] Keren Censor-Hillel and Pedro Soto. Computing in a faulty congested clique. CoRR, abs/2505.11430, 2025. doi:10.48550/arXiv.2505.11430.
  • [14] Yi-Jun Chang, Manuela Fischer, Mohsen Ghaffari, Jara Uitto, and Yufan Zheng. The complexity of (Δ+1) coloring in congested clique, massively parallel computation, and centralized local computation. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 471–480, 2019. doi:10.1145/3293611.3331607.
  • [15] David Cifuentes-Núñez, Pedro Montealegre, and Ivan Rapaport. Recognizing hereditary properties in the presence of byzantine nodes. CoRR, abs/2312.07747, 2023. doi:10.48550/arXiv.2312.07747.
  • [16] Sam Coy, Artur Czumaj, Peter Davies, and Gopinath Mishra. Optimal (degree+1)-coloring in congested clique. In 50th International Colloquium on Automata, Languages, and Programming (ICALP), volume 261, pages 46:1–46:20, 2023. doi:10.4230/LIPIcs.ICALP.2023.46.
  • [17] Artur Czumaj, Peter Davies, and Merav Parter. Simple, deterministic, constant-round coloring in congested clique and MPC. SIAM J. on Computing, 50(5):1603–1626, 2021. doi:10.1137/20M1366502.
  • [18] Peter Davies. Optimal message-passing with noisy beeps. In Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing (PODC), pages 300–309, 2023. doi:10.1145/3583668.3594594.
  • [19] Danny Dolev, Christoph Lenzen, and Shir Peled. “tri, tri again”: Finding triangles and small subgraphs in a distributed setting. In Distributed Computing, volume 7611, pages 195–209. Springer, 2012. doi:10.1007/978-3-642-33651-5_14.
  • [20] Andrew Drucker, Fabian Kuhn, and Rotem Oshman. On the power of the congested clique model. In ACM Symposium on Principles of Distributed Computing (PODC), pages 367–376, 2014. doi:10.1145/2611462.2611493.
  • [21] Orr Fischer, Tzlil Gonen, Fabian Kuhn, and Rotem Oshman. Possibilities and impossibilities for distributed subgraph detection. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 153–162, 2018. doi:10.1145/3210377.3210401.
  • [22] Orr Fischer, Rotem Oshman, and Dana Shamir. Explicit space-time tradeoffs for proof labeling schemes in graphs with small separators. In 25th International Conference on Principles of Distributed Systems (OPODIS 2021), volume 217, pages 21:1–21:22, 2021. doi:10.4230/LIPIcs.OPODIS.2021.21.
  • [23] Orr Fischer and Merav Parter. Distributed CONGEST algorithms against mobile adversaries. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 262–273, 2023. doi:10.1145/3583668.3594578.
  • [24] Orr Fischer and Merav Parter. All-to-all communication with mobile edge adversary: Almost linearly more faults, for free. CoRR, abs/2505.05735, 2025. doi:10.48550/arXiv.2505.05735.
  • [25] Pawel Garncarek, Dariusz R. Kowalski, Shay Kutten, and Miguel A. Mosteiro. Beeping deterministic congest algorithms in graphs. CoRR, abs/2502.13424, 2025. doi:10.48550/arXiv.2502.13424.
  • [26] Mohsen Ghaffari and Merav Parter. MST in log-star rounds of congested clique. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC), pages 19–28, 2016. doi:10.1145/2933057.2933103.
  • [27] Alex B. Grilo, Ami Paz, and Mor Perry. Distributed non-interactive zero-knowledge proofs. CoRR, abs/2502.07594, 2025. doi:10.48550/arXiv.2502.07594.
  • [28] James W. Hegeman, Gopal Pandurangan, Sriram V. Pemmaraju, Vivek B. Sardeshmukh, and Michele Scquizzato. Toward optimal bounds in the congested clique: Graph connectivity and MST. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 91–100, 2015. doi:10.1145/2767386.2767434.
  • [29] Taisuke Izumi and François Le Gall. Triangle finding and listing in CONGEST networks. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 381–389, 2017. doi:10.1145/3087801.3087811.
  • [30] Tomasz Jurdzinski and Krzysztof Nowicki. MST in O(1) rounds of congested clique. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2620–2632, 2018. doi:10.1137/1.9781611975031.167.
  • [31] Jonathan Katz and Luca Trevisan. On the efficiency of local decoding procedures for error-correcting codes. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing (STOC), pages 80–86, 2000. doi:10.1145/335305.335315.
  • [32] Janne H. Korhonen. Deterministic MST sparsification in the congested clique. CoRR, abs/1605.02022, 2016. doi:10.48550/arXiv.1605.02022.
  • [33] Manish Kumar. Fault-tolerant graph realizations in the congested clique, revisited. In Distributed Computing and Intelligent Technology, volume 13776, pages 84–97. Springer, 2023. doi:10.1007/978-3-031-24848-1_6.
  • [34] Manish Kumar, Anisur Rahaman Molla, and Sumathi Sivasubramaniam. Fault-tolerant graph realizations in the congested clique. In Algorithmics of Wireless Networks, volume 13707, pages 108–122, 2022. doi:10.1007/978-3-031-22050-0_8.
  • [35] Christoph Lenzen. Optimal deterministic routing and sorting on the congested clique. In Proceedings of the 2013 ACM Symposium on Principles of Distributed Computing (PODC), pages 42–50, 2013. doi:10.1145/2484239.2501983.
  • [36] Zvi Lotker, Boaz Patt-Shamir, Elan Pavlov, and David Peleg. Minimum-weight spanning tree construction in O(log log n) communication rounds. SIAM J. on Computing, 35(1):120–131, 2005. doi:10.1137/S0097539704441848.
  • [37] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
  • [38] Michael Mitzenmacher and Eli Upfal. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017.
  • [39] Moni Naor, Merav Parter, and Eylon Yogev. The power of distributed verifiers in interactive proofs. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1096–115, 2020. doi:10.1137/1.9781611975994.67.
  • [40] Krzysztof Nowicki. A deterministic algorithm for the MST problem in constant rounds of congested clique. In 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1154–1165, 2021. doi:10.1145/3406325.3451136.
  • [41] Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. On the distributed complexity of large-scale graph computations. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 405–414, 2018. doi:10.1145/3210377.3210409.
  • [42] Merav Parter. (delta+1) coloring in the congested clique model. In 45th International Colloquium on Automata, Languages, and Programming (ICALP), volume 107, pages 160:1–160:14, 2018. doi:10.4230/LIPIcs.ICALP.2018.160.
  • [43] Merav Parter and Hsin-Hao Su. Randomized (Delta+1)-coloring in O(log* Delta) congested clique rounds. In 32nd International Symposium on Distributed Computing (DISC), volume 121, pages 39:1–39:18, 2018. doi:10.4230/LIPIcs.DISC.2018.39.
  • [44] David Peleg. Distributed computing: a locality-sensitive approach. SIAM, 2000.
  • [45] Daniel A. Spielman. Highly fault-tolerant parallel computation. In Proceedings of 37th Conference on Foundations of Computer Science (FOCS), pages 154–163, 1996. doi:10.1109/SFCS.1996.548474.
  • [46] Sergey Yekhanin. Locally decodable codes. Foundations and Trends® in Theoretical Computer Science, 6(3):139–255, 2012. doi:10.1561/0400000030.

Appendix A Chernoff Bounds

Theorem 15 (Chernoff inequality for independent Bernoulli variables).

Let X1,,Xn be mutually independent 0–1 random variables with Pr(Xi=1)=pi. Let X=i=1nXi and set μ=E[X]. The following holds,

  1. 1.

    for any δ>0, Pr(X(1+δ)μ)(eδ(1+δ)(1+δ))μ

  2. 2.

    for 0<δ1, Pr(X(1+δ)μ)eμδ2/3

  3. 3.

    for R6μ, Pr(XR)2R

For proof, see Theorem 4.4 in [38].

Appendix B Missing Proofs from Section 3

Proof of Lemma 12.

Set r=|G|. let g1,,gr be the gates in G sorted in a decreasing order of their 𝖿𝖺𝗇. For any i[r], set di=𝖿𝖺𝗇(gi), then,

drdr1d1Δ.

First, we analyze the case where |𝒜|/2L|𝒜|. We notice that the load of each vertex v is trivially bounded by λ(v)i=1rdi (which is obtained if all gates are allocated to v). On the other hand, by assumption that |𝒜|/2L it follows that

λ(v)i=1rdi=P2PL/|𝒜|,

and the claim for the case of |𝒜|/2L|𝒜| follows.

We assume in the remainder of the proof that L<|𝒜|/2. First, we make the very simple observation that at all times during the procedure, it holds that

vVλ(v)i=1rdiL=PL. (1)

We split the analysis into two phases of the greedy allocation procedure: we say that the procedure is in its first phase while di2PL/|𝒜| and we say that it is in its second phase while di<2PL/|𝒜|. In each phase, we show that following a gate being allocated to batch of L vertices, all vertices have load at most max(4PL/|𝒜|,Δ).

While di2PL/|𝒜|, we must have at least |𝒜|/2 vertices with no load. Assume otherwise, then we notice that since the gates g1,,gr are sorted in descending order according to d1,,dr, then each vertex with an allocated gate has load at least di2PL/|𝒜|. Summing the loads of all vertices, we conclude that the total load is at least 2PL|𝒜|(|𝒜|2+1)>PL, contradicting Equation 1. Hence, after any allocation where di2PL/|𝒜|, we only pick vertices with λ(v)=0, and the load of any such vertex v is therefore at most Δ.

Next, we consider the phase where di2PL/|𝒜|. By an averaging argument on Equation 1, at any point of time of the procedure we have at least |𝒜|/2 vertices with load at most 2PL/|𝒜|. Since L<|𝒜|/2, the new load of a vertex v that is assigned a new gate is at most

λ(v)di+2PL|𝒜|4PL|𝒜|,

and the claim follows.