Stabilizing Consensus Is Impossible in Lossy Iterated Immediate Snapshot Models
Abstract
A substantial portion of distributed computing research is dedicated to terminating problems like consensus and similar agreement problems. However, non-terminating problems have been intensively studied in the context of self-stabilizing distributed algorithms, where processes may start from arbitrary initial states and can tolerate arbitrary transient faults. In between lie stabilizing problems, where the processes start from a well-defined initial state, but do not need to decide irrevocably and are allowed to change their decision finitely often until a stable decision is eventually reached.
Stabilizing consensus has been studied within the context of synchronous message adversaries. In particular, Charron-Bost and Moran showed that a necessary condition for stabilizing consensus is the existence of at least one process that reaches all others infinitely often (a perpetual broadcaster). However, it was left open whether this is also a sufficient condition for solving stabilizing consensus.
In this paper, we introduce the novel Delayed Lossy-Link (DLL) model, and the Lossy Iterated Immediate Snapshot Model (LIIS), for which we show stabilizing consensus to be impossible. The DLL model is introduced as a variant of the well-known Lossy-Link model, which admits silence periods of arbitrary but finite length. The LIIS model is a variant of the Iterated Immediate Snapshot (IIS), model which admits finite length periods of at most omission faults per layer. In particular, we show that stabilizing consensus is impossible even when .
Our results show that even in a model with very strong connectivity, namely, the Iterated Immediate Snapshot (IIS) model, a single omission fault per layer effectively disables stabilizing consensus. Furthermore, since the DLL model always has a perpetual broadcaster, the mere existence of a perpetual broadcaster, even in a crash-free setting, is not sufficient for solving stabilizing consensus, negatively answering the open question posed by Charron-Bost and Moran.
Keywords and phrases:
distributed systems, dynamic networks, dynamic graphs, message adversaries, stabilizing consensus, asynchronous message passingFunding:
Hugo Rincon Galeana: supported by German Research Foundation (DFG), Schwerpunktprogramm (SPP 2378), project ReNO, 2023-2027, and the financial support of Najla Amira Ochoa Leonor.Copyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Computer systems organization Fault-tolerant network topologies ; Networks Network properties ; Theory of computation Distributed algorithms ; Theory of computation Randomness, geometry and discrete structuresAcknowledgements:
We would like to thank Ulrich Schmid and Kyrill Winkler for their helpful remarks, which undoubtedly improved this paper.Funding:
This work was supported by the DMAC project (P32431).Editors:
Silvia Bonomi, Letterio Galletta, Etienne Rivière, and Valerio SchiavoniSeries and Publisher:

1 Introduction
Agreement tasks, and in particular consensus, have always been a focal point of distributed computing research, not only because of their practical applicability, but also because consensus tasks characterize very precisely the limits of a distributed computing model. Understanding such limitations not only facilitates the quick assessment of problem solvability in a given model, but also sheds light on the impact that certain properties, such as synchrony, impose on the system. For instance, the celebrated FLP impossibility result [15] reveals the devastating effect of both asynchrony and process crashes on deterministic consensus solvability. In a similar vein, it has been shown that process crashes impair general task solvability [21] even in asynchronous shared memory systems, and that byzantine faults [22] and/or message loss [25, 26, 11, 29, 30] impair consensus solvability even in the context of synchronous message passing systems.
Although distributed systems research has focused extensively on terminating tasks, there exist interesting applications and problems that are inherently non-terminating. Apart from the specific class of self-stabilizing distributed algorithms [12, 14], which can even tolerate massive transient faults, asymptotic consensus [4, 5], stabilizing consensus [2, 9, 27], and approximate consensus [8, 13] are examples of non-terminating distributed tasks: processes are not required to terminate after having computed some final value, but rather eventually converge to some stable configuration. Such tasks are not only of theoretical interest, but are also essential for implementing practical distributed problems such as clock synchronization [23, 28] and sensor fusion [3].
In this paper, we focus on stabilizing consensus, which can be viewed as the non-terminating variant of consensus. As in the case of consensus, all processes start with their own input values and must eventually agree on a common value. The fundamental difference to terminating consensus is that every process may start arbitrarily late, and does not need to decide on a value irrevocably and exactly once, but can change its decision value finitely often. Nevertheless, all processes must eventually stabilize on the same decision value.
It should be noted that stabilizing consensus is also loosely related to asymptotic consensus [16], in the sense that, in neither problem, processes are required to terminate. A fundamental difference, however, is that asymptotic consensus allows processes to decide from a real-valued range, in contrast to the validity condition for stabilizing consensus, which only allows decisions from the pool of input values. Stabilizing consensus is a stronger problem, since any protocol that solves stabilizing consensus also solves asymptotic consensus. Indeed, we provide a simple novel model introduced below (the DLL), where stabilizing consensus is impossible (see Theorem 3.10), yet asymptotic consensus can be solved. In this light, asymptotic consensus is more closely related to approximate consensus [8], while stabilizing consensus is more closely related to consensus.
1.1 Related work
We study stabilizing consensus in the Synchronous Message Passing setting where processes communicate through uni-directional links in a round by round fashion and an adversary suppresses certain links in every round. More specifically, the communication in every round is modeled as a directed graph sequence, which is under the control of a message adversary [1]. Charron-Bost and Moran [9] provided a class of algorithms in this setting, called the MinMax Algorithms, solving stabilizing consensus under any message adversary that generates graph sequences adhering to two constraints: (i) there exists a process (a perpetual broadcaster) that is able to reach all other process (possibly via multiple hops) infinitely often, and (ii) the broadcasting time (in terms of rounds) for doing so is bounded by an unknown constant. The authors showed that (i) is necessary for solving stabilizing consensus, but leave the question open whether or not (i) is also sufficient.
In this paper, we extend the Lossy-Link model, introduced by Santoro and Widmayer [25], where two processes are connected by a pair of links that may drop at most one (directional) message per round, to the novel Delayed Lossy-Link (DLL) model, where both messages may be lost but (some) messages are guaranteed to be transmitted infinitely often. Whereas stabilizing consensus can be solved in the Lossy-Link model, we show that stabilizing consensus is impossible in the DLL model, thus negatively answering Charron-Bost and Moran’s open question in Theorem 4.8. Our proof vaguely resembles the bivalency proof used for showing the impossibility of terminating consensus in the Lossy-Link model [25], in the sense that we construct a forever conflicted run, i.e., a run where processes always decide on invalid output configurations, by extending conflicted prefixes. Whereas a bivalent prefix leads to more than one decision value eventually, a conflicted prefix already yields an output configuration that violates agreement. A bivalency argument is not enough to prove stabilizing consensus impossible, however.
Afek and Gafni [1] showed that asynchronous wait-free shared memory models such as the Iterated Immediate Snapshot (IIS) model can also be represented by a synchronous message adversary. We extend the message adversary formulation of the IIS model by adding up to read-omission faults per round to the snapshot operations. We call this new model the Lossy Iterated Immediate Snapshot model . In particular, in the case of only 2 processes, corresponds to the DLL model. The model is loosely related to the -solo models studied by Herlihy, Rajsbaum, Raynal and Stainer [20], as the presence of read omissions may lead to -solo rounds, i.e., rounds where up to processes do not receive any information from the rest of the processes.
1.2 Contributions and paper organization
In Section 2, we provide the preliminary definitions and the overall framework that are essential for our results. In Section 3, we present the Lossy-Link model, and introduce the Delayed Lossy-Link model for which we prove stabilizing consensus to be impossible, namely Theorem 3.10. This is our first main result and constitutes the foundation for the rest of the paper. In Section 4, we present the IIS model and our novel model, and show that stabilizing consensus is not solvable in for any , via Theorem 4.8. Finally, in Section 5 we discuss the implications of our result and provide some interesting directions for future work.
2 The System Model
We consider a finite set of process that communicate through message passing via directional point-to-point links in lock-step synchronous rounds. We assume that rounds are communication-closed, i.e., messages sent at a round will also arrive at round or will not be delivered. Note that in contrast to [9] we assume synchronous process starts (i.e., starting round for all ) as this special case already facilitates our impossibility result. Obviously, our result then carries over to the more general model presented in the original paper.
Each process is given an initial input value from a finite set of inputs . We say that an initial global configuration is an -tuple of process-input pairs. We denote the set of possible input configurations as . Similarly, an output configuration is an -tuple . We denote the set of possible output configurations by .
We model distributed problems as input/output relations called tasks that map a set of input configurations to a set of valid output configurations. Formally, a task is a triple where is a validity map that determines a subset of valid output configurations from a single input configuration. Tasks are widely used in the literature to capture problems in distributed setting, see for example [24, 17, 18].
The communication is governed by a message adversary that determines which processes are able to communicate at a given round. We use a directed graph on , which we call the communication graph, to represent the message exchange at round . We denote by the in-neighborhood of in graph , which always includes . Note that represents the set of processes from which receives a message at round . We represent the communication in an execution by a communication graph sequence , called a communication pattern, where . We denote by the concatenation of with and by the sequence of repeated times. If is a communication pattern, we say that is the -prefix of , denoted by . Conversely, denotes the suffix starting from round . We define to be a sub-sequence of , denoted by , iff there is a strictly increasing sequence such that . Note that the sub-sequence relation is a partial order on sequences with the same domain and co-domain. A message adversary is just a set of communication patterns .
Since our focus is on solvability, we assume for convenience that the processes execute a full-information flooding protocol. That is, at any given round , each process will send a copy of its local state, also called view to all possible recipients. For terminating protocols and terminating tasks, we consider strong termination, i.e., a process stops its execution whenever a decision is made by the protocol. Thus, processes will stop sending messages and updating their views once the protocol has output a decision value. In contrast, for stabilizing tasks and protocols, we consider that the protocol never terminates and processes keep communicating and updating their views forever.
We define the view of a process at the end of a round under and initial configuration as: , where . We define , where is the input value of process in .
We also employ a subview relation, denoted by , which can be defined inductively: iff and ; or for some
. Intuitively, this subview relation captures all the previous local states that a given process is aware of at a given round .
We denote by the set of inputs has heard of by round . Whenever is clear from the context or not relevant, we will omit it in favor of notational simplicity.
We define the kernel of a sequence as the set of processes that reach all other processes infinitely often, i.e., processes in are able to disseminate their local state at any point in the run. Formally, if and only if .
We define the global configuration at the end of round under a communication pattern as an -tuple .
As we only consider deterministic protocols, global configurations are fully determined by the input configuration, and the communication pattern . Therefore, in the context of this paper, we will consider that a run consists of the input configuration and communication pattern. Furthermore, whenever it is clear from the context, or the input configuration is not relevant, we will use communication pattern and run interchangeably.
We use views to determine indistinguishability between global configurations. More precisely, a global configuration is indistinguishable from a global configuration for a process iff , where are input configurations, and are communication patterns.
Note that any protocol can be split into a full-information flooding part that generates views and sends messages, and a decision map that produces an output value. If the output values form a valid configuration, then we say that the protocol solves a problem.
Formally, we define a decision map of a protocol as a function that maps process views generated by the full information flooding protocol to output values. We will denote a decision map as , where is the set of possible views induced by the message adversary under the protocol , and is the set of possible output values. Note that for terminating protocols, an additional undecided value may be used for representing that the protocol is not yet ready to decide. However, since we are focusing only on stabilizing protocols, we require that a decision is made at every view.
We define the stabilization property for tasks, which can be thought of as a relaxation of termination. Whereas termination requires a process to irrevocably choose its output value once and possibly even finalize its execution, stabilization allows the protocol to run indefinitely and correct its output finitely often.
Nevertheless processes must eventually stabilize on a common constant output value. We call the round on which a process stabilizes, i.e., never changes its output again, the stabilization round . It should be noted that stabilization may be reached agnostically. This means that there might be protocols where processes are guaranteed to stabilize, yet they are possibly never aware that they have already reached a stable state.
We model stabilizing problems as tasks, with the following definition of solvability:
Definition 2.1 (Stabilizing Task Solvability).
A protocol solves a stabilizing task iff for any run , there exists an output configuration and a stabilization round for every process , such that:
A protocol that solves a stabilizing task is called a stabilizing protocol.
2.1 Consensus and stabilizing consensus
Consensus is traditionally defined via the following three properties.
- 1. Termination:
-
Any process eventually irrevocably decides on an output value at a round .
- 2. Validity:
-
If a process decides , then was the input value of some process , i.e .
- 3. Agreement:
-
If process decides on an output value , and process decides , then .
Note that we assume that is finite and that any combination of input configurations is possible. Therefore, the above conditions do not define a single unique task, but rather a family of tasks. For instance, binary consensus (where ), multi-valued consensus (where ) correspond to different tasks, while both are instances of consensus.
Stabilizing consensus is defined by replacing the termination property 1. by the weaker condition 1’.. Since stabilizing protocols are allowed to decide at each round, the agreement condition 3. also needs to be adjusted to 3.’.
- 1’. Stabilization:
-
Any process eventually stabilizes on some valid output value after some round .
- 3’. Stable Agreement:
-
If process stabilizes on an output value , and another process stabilizes on , then .
Note that a task does not consider any termination or stabilization condition. Therefore, the stabilizing consensus problem, as well as terminating consensus, is determined by a consensus task and a stabilizing (respectively terminating) condition.
Definition 2.2 (Consensus Task).
We define the consensus task with respect to an input configuration set as .
Possible output configurations are vectors satisfying agreement , where denotes the set of input values induced by the input configurations , and the task map preserves validity .
3 The Lossy-Link, and Delayed Lossy-Link Message Adversaries
We will first introduce two particular message adversaries defined for only two processes, namely Lossy-Link (denoted by LL) and Delayed Lossy-Link (denoted by DLL). Lossy-Link was introduced by Santoro and Widmayer [25] and revisited in [26, 11], where it was shown that consensus is impossible even if at most a single message may be lost in every round. We will show that stabilizing consensus is solvable in the LL but impossible in the DLL.
Lossy-Link consists of 2 processes that communicate through a bi-directional link that may lose at most one message per round. For readability purposes, throughout this section we will denote the set of processes as .
Definition 3.1 (Lossy-Link message adversary).
We define the LL as111Throughout this paper, denotes the first infinite ordinal, it is very convenient for expressing regular infinite sequences in a compact way.:
Although the Lossy-Link message adversary prohibits solving terminating consensus [25], it admits a simple stabilizing protocol that solves stabilizing consensus as it satisfies both that any LL pattern has a non empty kernel, and any LL pattern has a trivially bounded broadcast time. Hence, we provide a 2-process instance of the MinMax Algorithm, introduced in [9], for stabilizing consensus. Since we adapt it to our full-information and flooding model, it suffices to provide a decision map ,
We omit a proof (for a complete formal treatment see [9]), but provide a sketch. The case where both have identical inputs is trivial, so assume w.l.o.g. that . We distinguish three cases, either (a) never hears from , or conversely (b) never hears from , or (c) both hear from each other. In (c) both eventually know both values after some round . They then always choose , as its the smallest value they know and as its the smallest value in any message they will receive, starting from round .
Case (a) is only possible in , so always hears from and chooses , as its the maximum heard of in the last round ( also chooses by validity). Similarly, case (b) is only possible in , where after round the minimum has heard of is , and will never receive a larger value, thus always chooses ( chooses the same by validity).
Now consider a different setting, where the communication link between and is also allowed to drop both messages, but only for at most consecutive rounds for some fixed . We call this model the Bounded-Delay Lossy-Link, abbreviated as . Following our previous notation, denotes the silent graph.
Definition 3.2 (Bounded-Delay Lossy-Link message adversary).
We define the -Bounded-Delay Lossy-Link message adversary, denoted by , as:
More generally, we define the Bounded-Delay Lossy-Link message adversary, as
Note that, as some communication is always guaranteed, has a non-empty kernel for any , satisfying (i) and trivially (ii). This also applies to the BDLL, where any sequence is an instance of some .
Nevertheless, just already breaks , as it is not guaranteed anymore that the maximum in the system reaches all other processes in every round: Consider the graph sequence with , where always alternates its decision value: it will choose if the last communication graph was , and if it was , as it only considers messages from the last round. In order to circumvent this alternation, we could adapt to look back two rounds instead of only looking at the last one. In fact, we can generalize this idea, by adapting to consider the past rounds and thus making it suitable for : , where is the set of views has received in the last rounds (when ).
A similar correctness argument as the one for can be used to show that solves stabilizing consensus in . However, any will fail in . Charron-Bost and Moran cleverly fixed this in [9] and provided an algorithm capable of solving stabilizing consensus in for any fixed , i.e., in BDLL. Intuitively their safe MinMax algorithm works as follows: Any sequence is also a member of , for some . Of course, one cannot choose some that is always larger than any , but, as the sequence is chosen beforehand and therefore fixed, one can gradually increase such that it eventually surpasses any fixed . And since also correctly runs for any , once surpassed it will run correctly forever. Note that we will formalize this in Section 4 for processes.
Nevertheless, we will prove that no stabilizing protocol can exist if there is no fixed bound on the length of consecutive silence periods. We call this model the Delayed Lossy-Link message adversary, which can be seen as the limit of the BDLL.
Definition 3.3 (Delayed Lossy-Link Message Adversary).
We define the Delayed Lossy-Link Message Adversary, denoted as DLL as:
where is the Kleene star of , i.e.,
Note that the DLL again has a non-empty kernel, as some communication always happens. It hence satisfies (i), does not satisfy (ii), as there is no bound on how many consecutive silent rounds may happen.
Also note that any communication pattern is essentially a sequence in the LL with arbitrarily long but finite periods of no communication in-between. Consequently, any communication pattern has a unique corresponding silence-free communication pattern where , which we call the silence-free core of . Reciprocally, we say that is a delayed pattern of .
3.1 Stabilizing consensus is impossible in the DLL
In this sub section, we prove one of our main results, namely the impossibility of stabilizing consensus in the DLL.
The proof strategy is the following: we assume by contradiction that there is a stabilizing protocol that can solve consensus in the DLL. Then we show that it is possible to construct a run that infinitely often decides for a conflicting configuration, i.e., with different decision values for and . The crucial argument for constructing a conflicting run is Lemma 3.9, which proves that any stabilizing protocol in the DLL can be translated into a patient protocol. A patient protocol allows processes to change their decision value only in rounds where they receive a message from another process.
We start by defining conflicted prefixes, which are somewhat similar (but different) to bivalent prefixes: whereas a bivalent prefix is a prefix that may eventually lead to mutually incompatible output configurations, a conflicted prefix is one that currently outputs an invalid configuration. For instance, the communication pattern under with yields a bivalent prefix for any round . Nevertheless, never outputs a conflicted prefix after round in this setting.
Definition 3.4 (Conflicted prefix).
Let be a stabilizing protocol, a task, and a communication pattern. We say that a prefix is conflicted with input configuration , iff .
Respectively, we say that a communication pattern is conflicted infinitely often iff there is a strictly increasing sequence such that , the -prefix of , is conflicted. Clearly, a stabilizing protocol solves a task under a message adversary iff there are no communication patterns in that are conflicted infinitely often.
Note that the possibility of arbitrary long silence periods adds a new layer of uncertainty, particularly in the context of stabilizing protocols. In fact, this removes any possibility of acquiring any useful information through silence. For instance, in LL, even if process does not receive a message in round , it is at least sure that will receive a message in round . The effect of this uncertainty on stabilizing protocols is reflected by the following definitions, Lemma 3.6, and Lemma 3.9.
Let be a prefix of length . For notational simplicity, we will use to denote , whenever the protocol is either unique or not immediately relevant to the context; similarly denotes the output configuration . Furthermore, if is an infinite communication pattern and is a stabilizing protocol, we denote by the stable decision by under with initial configuration .
Definition 3.5 (Patient prefix).
Let be a stabilizing protocol, and a stabilizing task in the DLL model. We say that a prefix is a patient prefix for input configuration , iff for every ;
Lemma 3.6 (Patience Lemma).
Let be a stabilizing protocol for a task , , and any prefix in the DLL model. There exists such that is a patient prefix.
Proof.
Note that is an admissible execution in the DLL message adversary. Since is a stabilizing algorithm, must eventually stabilize on a decision . Hence there is some minimal such that for any ,
Reciprocally, there is some minimal such that for any ,
Note that is indistinguishable from to . Symmetrically is indistinguishable from to .
Let , and consider . For any , is indistinguishable from to , and is indistinguishable from to .
Moreover, since , it follows that , and . Consequently, is indeed a patient prefix.
Note that Lemma 3.6 allows us to define the patience of a prefix , for a particular stabilizing protocol in the DLL and a particular input configuration .
Definition 3.7 (Patience).
Let be a stabilizing protocol, be an input configuration, and a prefix in the DLL. We say that is the patience of for and , if is minimal such that is a patient prefix.
Definition 3.8 (Patient Protocol).
Let be a stabilizing protocol in the DLL. We say that is patient if the patience of any prefix is .
Lemma 3.9 (Patient Protocol Reduction).
Let be a protocol that solves a stabilizing task
in the
DLL. Then there exists a patient protocol that solves .
Proof.
Consider a communication pattern . We define inductively a prefix sequence in the following way: , where We define a communication pattern , as the limit of .
Consider a protocol with decision map defined by: . We will show that is a patient stabilizing protocol that solves .
By construction, each is a patient prefix of , hence is patient. Note that is an infinite sub-sequence of that does not include an infinite silence suffix , does not include an infinite silent suffix either. Thus, .
Since is a stabilizing protocol that solves in DLL, in particular, provides a stabilizing solution for communication pattern . Since , it follows that is a stabilizing protocol that solves for communication pattern .
Theorem 3.10 (Stabilizing consensus Impossibility in DLL).
Let be an arbitrary stabilizing protocol in DLL. There exists a valid communication pattern such that does not solve stabilizing consensus for .
Proof.
Let us assume for a contradiction that there is a stabilizing protocol that solves stabilizing consensus in DLL. By Lemma 3.9, there exists a patient protocol that solves stabilizing consensus in DLL. We will provide an inductive construction of a run that is conflicted. More precisely, we define a sequence of conflicted prefixes such that , and .
Base case (Empty prefix ).
For the base case, consider an initial input configuration with different values . Due to the validity condition, each process must decide on its own value. Since , is a conflicted prefix.
Induction step (Prefix ).
Assume that is a conflicted prefix, i.e.,
We assert that at least one of , , is conflicted. Assuming that none is conflicted, we derive a contradiction (as illustrated in Figure 1).
Recall that is patient and is indistinguishable from for . Therefore, .
Since we assumed that is not conflicted, then
.
Note that is indistinguishable from for , and therefore .
Similarly, since is not conflicted by assumption,
.
Note that is indistinguishable from for
and thus,
.
Since is not conflicted by assumption, .
Finally, since is indistinguishable from for and is patient, .
Therefore , which contradicts the induction hypothesis, namely that is conflicted. Thus, either one of , , or is a conflicted prefix.
Choosing such that is conflicted, we define . Setting , note that by construction and each is a conflicted prefix: Therefore is indeed conflicted infinitely often, and does not solve stabilizing consensus.
4 The Iterated Immediate Snapshot and the Lossy Iterated Immediate Snapshot Models
Previously, we introduced the Lossy-Link and Delayed Lossy-Link message adversary as a starting point for our stabilizing consensus impossibility result. This is for two main reasons: first, the simplicity of the Lossy-Link highlights the source of the impossibility result without adding distracting details to the proofs, and second, the DLL impossibility translates transparently to other message adversaries that generalize DLL.
One such message adversaries is the Iterated Immediate Snapshot message adversary, which is computationally equivalent to the asynchronous Iterated-Immediate Snapshot model defined as an -process asynchronous shared memory model where processes communicate via atomic snapshots. The first approach towards this model was introduced by Borowski and Gafni [6], who presented the Immediate-atomic-snapshot model as a generalization of the classical FLP model [15]. They also re-used it as a convenient model for characterizing wait-free asynchronous shared memory systems [7].
In the IIS model, one considers a set of processes that communicate through a shared memory object . Each process may write into its own register and is able to read the whole memory in an atomic snapshot operation. A process that executes an atomic snapshot writes into its register and reads all of instantaneously. Since snapshot operations executed by different processes may occur concurrently, it is assumed that every process in a concurrent set is capable of reading each others output. Although atomic snapshot objects appear to be very strong, it has been shown that they are equivalent to single-reader single-writer wait-free shared registers [19].
Perhaps one of its most useful features of the IIS model is that it allows us to express an asynchronous shared memory model in terms of a synchronous message adversary [1], an asynchronous snapshot by all processes, can hence be represented by a communication graph. If a process and a process execute a snapshot operation, there are possibilities, either executed its snapshot before , before , or both snapshots are concurrent. If executes its snapshot before , then cannot read what wrote, but will be able to read the value from . Reciprocally, if executes its snapshot before , will see ’s value but not vice versa. If and have concurrent snapshots, then both are able to read each other’s value. Since we consider time to be linearly ordered, and snapshots occur within the same time frame (round), then this means that for any pair of processes and , either reads or reads , or both. When translated to communication graphs, this implies that each communication graph in the IIS model is semi-complete 222A directed graph is semi-complete if for any pair of vertices , either or .. Since the time at which each process executed its snapshot determines completely the memory registers that each process is able to read, every communication graph in the IIS model is also transitive 333A directed graph is transitive iff ..
The converse can also be shown, such that for every communication pattern of semi-complete and transitive graphs, there is an IIS schedule that matches . Thus, for the rest of this paper, we will simply consider that , where is the set of semi-complete and transitive graphs on . Note carefully that LL satisfies both transitivity and semi-completeness.
4.1 The Lossy Iterated Immediate Snapshot Model
Having introduced the IIS model as an -process generalization of the Lossy-Link model, we will now add read-omission faults to the immediate snapshot operations. This means that, whenever processes execute a snapshot in a round, there may be at most total read-omission failures of other processes’ registers.
Note that the empty graph , can be viewed as a read-omission in either or . However, for , faulty snapshots are slightly more complex than simply considering empty graphs. Consequently, we define the set of lossy snapshot graphs, as the set of directed graphs from (consisting of semi-complete and transitive directed graphs on ) with at least one missing edge and at most missing edges. More precisely, .
Similarly to Section 3, we start by limiting the number of consecutive iterations where a snapshot might be faulty. We denote the Bounded Lossy Iterated Immediate Snapshot model by , where is the maximum number of omission faults per iteration, and is the maximum number of consecutive rounds an immediate snapshot might be faulty. We then relax this restriction by requiring that correct snapshot iterations must only happen infinitely often (similar to the DLL). We call this model the Lossy Iterated Immediate Snapshot model, denoted by . Note that for processes, corresponds to , while corresponds to DLL.
These message adversaries can be formally defined as follows:
Definition 4.1 (IIS, BLIIS, LIIS Message Adversaries).
Let be the set of semi-complete and transitive graphs on , and the set of faulty snapshot graphs for .
-
-
-
-
, where is the Kleene star of .
First, we prove that all presented message adversaries have a non-empty kernel, and hence satisfy (i).
Lemma 4.2.
IIS, , and all have a non empty kernel, i.e., for any , we have .
Proof.
As every communication graph in is semi-complete and transitive, contains at least one process that directly reaches all other processes444It is a well known result that transitive tournaments are equivalent to strict total order graphs, and hence finite transitive tournaments always contain a dominating vertex. Thus, any finite semi-complete and transitive graph also includes a dominating vertex. See for example [10] for more results on tournaments.. As there are only finitely many processes, the pigeonhole principle guarantees that in the infinite sequence at least one process reaches all others infinitely often, implying .
Now assume to be an admissible run in either , or . By construction, in any of these models, there is an infinite number of instances of graphs in , i.e., and . But this is a sub-sequence with a non-empty kernel as proven above, therefore holds.
We will now argue that IIS, and allow solving stabilizing consensus. We start with the following technical lemma:
Lemma 4.3.
Let be any communication pattern. There exists a round such that no process learns any new input values anymore. Formally,
. We call this set ’s stable value set.
Proof.
As the set of processes is finite and input values are never forgotten in a run, the set can be increased only finitely often.
Lemma 4.4.
Let be any communication pattern. After round , ’s stable value set contains the stable value set of all processes in . Formally, , .
Proof.
By contradiction. Assuming the contrary implies that never receives a message from that reveals some input value known to . This contradicts the assumption that is in the kernel.
Corollary 4.5.
For any communication pattern with , processes in the kernel have identical stable value sets.
Theorem 4.6.
For any ,
solves stabilizing consensus in . Note that the number of past rounds considered (i.e, ) now depends on the current round .
Proof.
We set , i.e., the round after which all processes have arrived at their stable value sets. From Lemma 4.4, it follows that for all rounds ,
. From Corollary 4.5, we infer that processes in the kernel have identical minima.
-
Assume , and . As any is semi-complete and transitive, all processes not in the kernel receive the view of a process . By the above reasoning, ’s minimum is larger than all other minima, therefore the decision function chooses correctly from round on.
-
Assume , and . As is semi-complete and transitive for at least one graph among the ones for , we can repeat the previous reasoning but applied to views received in the previous rounds, . Eventually, i.e., after round , we are sure that the interval to contains a view with the stable value set of a process in the kernel. Note that as , this also solves stabilizing consensus on the IIS.
-
Assume , and . As does not have a predefined bound , we cannot resort to the previous reasoning. However, we know there is a bound, as implies for some . As any solves stabilizing consensus also on any for by inclusion, we can dynamically consider the previous rounds. This ensures that (a) we eventually surpass the fixed bound , and thus eventually always consider the view of a process in the kernel, and (b) eventually also exceeds and we only consider stable value sets of . Together with the fact that, ’s stable value set contains the largest minimum among all processes, we conclude that solves stabilizing consensus on .
As the main result of this section, we will now prove that stabilizing consensus is impossible in , i.e., in the presence of just one faulty register per round finitely consecutive rounds. We extend the proof of Theorem 3.10 from DLL to by identifying four admissible graphs in that “replicate” the DLL impossibility result in the LIIS model.
Note that , thus if stabilizing consensus is impossible in , then it is also impossible in for any .
For the remainder of this paper, we will consider a system with at least processes, and focus on two distinct processes and . For convenience, we assume that , i.e., as the choice of index is free, we rename in DLL to in the context of the , and symmetrically . We use the following convenient notations: ; , is the complete graph on ; represents the graph defined by , and , i.e., adding all possible edges from to but not the other way round, and also extend it to runs .
We define the following graphs: ; ; ; and . Note that , and are semi-complete and transitive and therefore snapshot graphs in the IIS. Likewise, , since it can be obtained by removing from or by removing from .
We show that any sequence in the DLL can be extended to a sequence in the where and have identical views, implying that a protocol solving stabilizing consensus in the necessarily solves it in the DLL, deriving a contradiction.
Lemma 4.7.
Any can be extended to a , s.t., .
Proof.
Let and consider for . By construction we know that
and as does not contain an infinite silent suffix (i.e., for all ), also has no infinite suffix where no communication between and happens. Thus, is an admissible sequence.
By construction, in , and only receive messages from each other, moreover, they only do so in rounds where they also receive a message in . Therefore their views are identical .
Theorem 4.8 (Stabilizing consensus Impossibility in ).
Let be an arbitrary stabilizing protocol in the model. does not solve stabilizing consensus.
Proof.
Assume for a contradiction that solves stabilizing consensus in the model. Take any , from Lemma 4.7, it follows that there exists a with identical views for , . The decision map therefore solves stabilizing consensus on directly contradicting Theorem 3.10.
5 Conclusion
In this paper we provided the first stabilizing consensus impossibility result where the existence a non-empty kernel is guaranteed (i) but the number or rounds where everybody hears from a member in the kernel (ii) is not . While it has been shown that a non-empty kernel is necessary for solving stabilizing consensus in [9], we prove that it is not sufficient.
At the core of the stabilizing consensus impossibility lies the fact that padding the communication with arbitrarily long but finite silence periods greatly impairs the decision power of stabilizing protocols, essentially forcing them to fix a value during silence periods. This limitation enables us to find a valid prefix that has conflicting decision values, and thus prevents stabilization. This result highlights the importance of communication liveness within a system, since any bounded variants of the DLL model are capable of solving stabilizing consensus.
Furthermore, we extended this impossibility result to the Iterated Immediate Snapshot model, where the possibility of a single read-faulty snapshot makes stabilizing consensus impossible. This result sheds new light on the impact of omission faults on consensus, even when the termination condition is relaxed to stabilization, and the communication assumptions are as strong as a shared-memory model.
References
- [1] Yehuda Afek and Eli Gafni. Asynchrony from synchrony. In Davide Frey, Michel Raynal, Saswati Sarkar, RudrapatnaK. Shyamasundar, and Prasun Sinha, editors, 14th International Conference on Distributed Computing and Networking (ICDCN), volume 7730 of LNCS, pages 225–239. Springer Berlin Heidelberg, 2013. doi:10.1007/978-3-642-35668-1_16.
- [2] Dana Angluin, Michael J. Fischer, and Hong Jiang. Stabilizing consensus in mobile networks. In Phillip B. Gibbons, Tarek Abdelzaher, James Aspnes, and Ramesh Rao, editors, Distributed Computing in Sensor Systems, pages 37–50, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. doi:10.1007/11776178_3.
- [3] Jón Atli Benediktsson and Philip H. Swain. Consensus theoretic classification methods. IEEE Trans. Syst. Man Cybern., 22(4):688–704, 1992. doi:10.1109/21.156582.
- [4] Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, Belmont, MA, 1989.
- [5] V.D. Blondel, J.M. Hendrickx, A. Olshevsky, and J.N. Tsitsiklis. Convergence in multiagent coordination, consensus, and flocking. In Proceedings of the 44th IEEE Conference on Decision and Control, pages 2996–3000, 2005. doi:10.1109/CDC.2005.1582620.
- [6] Elizabeth Borowsky and Eli Gafni. Generalized flp impossibility result for t-resilient asynchronous computations. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, STOC ’93, pages 91–100, New York, NY, USA, 1993. Association for Computing Machinery. doi:10.1145/167088.167119.
- [7] Elizabeth Borowsky and Eli Gafni. A simple algorithmically reasoned characterization of wait-free computation. In Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing (PODC), pages 189–198, 1997.
- [8] Bernadette Charron-Bost, Matthias Függer, and Thomas Nowak. Approximate consensus in highly dynamic networks: The role of averaging algorithms. In Magnús M. Halldòrsson, Kazuo Iwama, Naoki Kobayashi, and Bettina Speckmann, editors, International Colloquium on Automata, Languages, and Programming (ICALP), volume 9135 of Lecture Notes in Computer Science, pages 528–539. Springer Berlin Heidelberg, 2015. doi:10.1007/978-3-662-47666-6_42.
- [9] Bernadette Charron-Bost and Shlomo Moran. Minmax algorithms for stabilizing consensus. Distributed Comput., 34(3):195–206, 2021. doi:10.1007/s00446-021-00392-9.
- [10] Gary Chartrand, Linda Lesniak, and Ping Zhang. Graphs & Digraphs. Chapman & Hall/CRC, 6th edition, 2015.
- [11] Étienne Coulouma, Emmanuel Godard, and Joseph G. Peters. A characterization of oblivious message adversaries for which consensus is solvable. Theoretical Compututer Science, 584:80–90, 2015. doi:10.1016/j.tcs.2015.01.024.
- [12] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11):643–644, 1974. doi:10.1145/361179.361202.
- [13] Danny Dolev, Nancy Lynch, Shlomit S. Pinter, Eugene W. Stark, and William E. Weihl. Reaching approximate agreement in the presence of faults. J. ACM, 33(3):499–516, July 1986. doi:10.1145/5925.5931.
- [14] Shlomi Dolev. Self-Stabilization. MIT Press, 2000.
- [15] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985. doi:10.1145/3149.214121.
- [16] Matthias Függer, Thomas Nowak, and Manfred Schwarz. Tight bounds for asymptotic and approximate consensus. J. ACM, 68(6), October 2021. doi:10.1145/3485242.
- [17] Eli Gafni. Distributed computing: a Glimmer of a theory. In Algorithms and theory of computation handbook: general concepts and techniques, page 29. Chapman & Hall/CRC, 2 edition, February 2010.
- [18] Hugo Rincon Galeana, Sergio Rajsbaum, and Ulrich Schmid. Continuous Tasks and the Asynchronous Computability Theorem. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022), volume 215 of Leibniz International Proceedings in Informatics (LIPIcs), pages 73:1–73:27, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2022.73.
- [19] Maurice Herlihy, Dmitry Kozlov, and Sergio Rajsbaum, editors. Distributed Computing Through Combinatorial Topology. Morgan Kaufmann, Boston, 2014. doi:10.1016/B978-0-12-404578-1.00017-6.
- [20] Maurice Herlihy, Sergio Rajsbaum, Michel Raynal, and Julien Stainer. From wait-free to arbitrary concurrent solo executions in colorless distributed computing. Theoretical Computer Science, 683:1–21, 2017. doi:10.1016/j.tcs.2017.04.007.
- [21] Maurice Herlihy and Nir Shavit. The asynchronous computability theorem for t-resilient tasks. Conference Proceedings of the Annual ACM Symposium on Theory of Computing, August 1994. doi:10.1145/167088.167125.
- [22] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, July 1982. doi:10.1145/357172.357176.
- [23] Jennifer Lundelius and Nancy Lynch. A new fault-tolerant algorithm for clock synchronization. In Proc. 3rd ACM Symposium on Principles of Distributed Computing (PODC), pages 75–88, August 1984. doi:10.1145/800222.806738.
- [24] Michel Raynal and Julien Stainer. Synchrony weakened by message adversaries vs asynchrony restricted by failure detectors. In ACM Symposium on Principles of Distributed Computing (PODC), pages 166–175, Montrèal, Quèbec, Canada, 2013. ACM Press. doi:10.1145/2484239.2484249.
- [25] Nicola Santoro and Peter Widmayer. Time is not a healer. In B. Monien and R. Cori, editors, 6th Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 304–313, Berlin, Heidelberg, 1989. Springer Berlin Heidelberg. doi:10.1007/BFB0028994.
- [26] Ulrich Schmid, Bettina Weiss, and Idit Keidar. Impossibility results and lower bounds for consensus under link failures. SIAM Journal on Computing, 38(5):1912–1951, 2009. doi:10.1137/S009753970443999X.
- [27] Manfred Schwarz and Ulrich Schmid. Round-oblivious stabilizing consensus in dynamic networks. In Colette Johnen, Elad Michael Schiller, and Stefan Schmid, editors, Stabilization, Safety, and Security of Distributed Systems - 23rd International Symposium, SSS 2021, Virtual Event, November 17-2 0, 2021, Proceedings, volume 13046 of Lecture Notes in Computer Science, pages 154–172. Springer, 2021. doi:10.1007/978-3-030-91081-5_11.
- [28] Josef Widder and Ulrich Schmid. Booting clock synchronization in partially synchronous systems with hybrid process and link failures. Distributed Computing, 20(2):115–140, 2007. doi:10.1007/s00446-007-0026-0.
- [29] Kyrill Winkler, Ami Paz, Hugo Rincon Galeana, Stefan Schmid, and Ulrich Schmid. The Time Complexity of Consensus Under Oblivious Message Adversaries. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 100:1–100:28, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2023.100.
- [30] Kyrill Winkler, Ulrich Schmid, and Yoram Moses. A Characterization of Consensus Solvability for Closed Message Adversaries. In 23rd International Conference on Principles of Distributed Systems (OPODIS 2019). Schloss Dagstuhl – Leibniz Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.OPODIS.2019.17.