Abstract 1 Introduction 2 Preliminaries 3 RSC systems with interferences 4 𝒌-Multiparty Compatibility with interferences 5 Crash-stop failures 6 Session types with crash-stop failures 7 Experimental evaluation 8 Conclusion and further related work References Appendix A Proofs from §3

Unreliability in Practical Subclasses of Communicating Systems

Amrita Suresh ORCID University of Oxford, UK Nobuko Yoshida ORCID University of Oxford, UK
Abstract

Systems of communicating automata are prominent models for peer-to-peer message-passing over unbounded channels, but in the general scenario, most verification properties are undecidable. To address this issue, two decidable subclasses, Realisable with Synchronous Communication (rsc) and k-Multiparty Compatibility (k-mc), were proposed in the literature, with corresponding verification tools developed and applied in practice. Unfortunately, both rsc and k-mc are not resilient under failures: (1) their decidability relies on the assumption of perfect channels and (2) most standard protocols do not satisfy rsc or k-mc under failures. To address these limitations, this paper studies the resilience of rsc and k-mc under two distinct failure models: interference and crash-stop failures. For interference, we relax the conditions of rsc and k-mc and prove that the inclusions of these relaxed properties remain decidable under interference, preserving their known complexity bounds. We then propose a novel crash-handling communicating system that captures wider behaviours than existing multiparty session types (MPST) with crash-stop failures. We study a translation of MPST with crash-stop failures into this system integrating rsc and k-mc properties, and establish their decidability results. Finally, by verifying representative protocols from the literature using rsc and k-mc tools extended to interferences, we evaluate the relaxed systems and demonstrate their resilience.

Keywords and phrases:
Communicating automata, lossy channel, corruption, out of order, session types, crash-stop failure
Copyright and License:
[Uncaptioned image] © Amrita Suresh and Nobuko Yoshida; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Concurrency
; Theory of computation Distributed computing models ; Theory of computation Automata extensions
Related Version:
Full Version: http://arxiv.org/abs/2510.03941
Acknowledgements:
We thank Martin Vassor for his comments on an early version of this paper.
Funding:
This work is supported by EPSRC EP/T006544/2, EP/T014709/2, EP/Y005244/1, EP/V000462/1, EP/X015955/1, EP/Z0005801/1, EU Horizon TARDIS 101093006 (UKRI number 10066667) and ARIA.
Supplementary Material:
Software: https://github.com/NobukoYoshida/Interference-Tool
Editors:
C. Aiswarya, Ruta Mehta, and Subhajit Roy

1 Introduction

Asynchronous processes that communicate using First In First Out (FIFO) channels [11], henceforth referred to as FIFO systems, are widely used to model distributed protocols, but their verification is known to be computationally challenging. The model is Turing-powerful for even just two processes communicating via two unidirectional FIFO channels [11].

To address this challenge, several efforts have focused on identifying practical yet decidable subclasses – those expressive enough to model a wide range of distributed protocols, while ensuring that verification problems such as reachability and model checking remain decidable. Most FIFO systems assume perfect channels, which is too restrictive to model the real-world distributed phenomena where system failures often happen. This paper investigates whether two practical decidable subclasses of communicating systems, Realisable with Synchronous Communication (rsc) [18] and k-Multiparty Compatibility (k-mc) [35], are resilient when integrated with two different kinds of failures. These failure models were originally introduced in the contexts of contracts [37] and session types [3, 6]. We say a system is resilient under a given failure model if (i) the inclusion remains decidable, and (ii) the verification properties of interest remain decidable under that failure model.

Failures in communications.

A widely studied failure model in FIFO systems is lossy channels. Finkel [21] showed that the termination problem is decidable for the class of completely specified protocols, a model which strictly includes FIFO systems with lossy channels. Abdulla and Jonsson [1] developed algorithms for verifying termination, safety, and eventuality properties for protocols on lossy channels, by showing that they belong to the class of well-structured transition systems.

Another type of failure, studied in a more practical setting, occurs when one or more processes crash. In the most general case, Fekete et al. [20] proved that if an underlying process crashes, no fault-tolerant reliable communication protocol can be implemented. To address this, they consider faultless models which attempt to capture the behaviour of crashes by broadcasting crash messages. Such approaches have been explored in the context of runtime verification techniques [31] and session types [3, 6, 5]. In this work, we closely study a failure model proposed by [3, 5].

Figure 1: Classes of communication systems (since the -marked definitions are introduced in the context of CSA (Def 20), we restrict them accordingly).

Restricting the channel behaviour.

To define decidable subclasses, many works study how to restrict read and write access to channels. For two-process (binary) FIFO systems, the notion of half-duplex communication was introduced in [14], where at most one direction of communication is active at any time. For such systems, reachability is decidable in polynomial time. However, generalising this idea to the multiparty setting often yields subclasses that are either too restrictive or lose decidability.

Di Giusto et al. [17, 18] extended this idea to multiparty systems while preserving decidability, resulting in the notion of systems realisable with synchronous communication (rsc). They showed that this definition overlaps with that in [14] for mailbox communication. However, in the case of peer-to-peer communication where the two definitions differ, peer-to-peer rsc behaviour was proved to be decidable. rsc systems are related to synchronisable systems [10, 19, 8], in which FIFO behaviours must admit a synchronisable execution. The tool ReSCu applies this idea to verify real-world distributed protocols [16].

Another approach to restricting channel behaviours is to bound the length of the channel. Lohrey [36] introduced existentially bounded systems (see also [24, 23]) where all executions that reach a final state with empty channels can be re-ordered into a bounded execution. Although many verification problems are decidable for this class of systems, checking if a system is existentially k-bounded is undecidable, even if k is given as part of the input.

A decidable bounded approach, k-multiparty compatibility (k-mc), was introduced in [35]. This property is defined by two conditions, exhaustivity and safety. Exhaustivity implies existential boundedness and characterises systems where each automaton behaves the same way under bounds of a certain length. Checking k-mc is decidable, and the tool k-mc-checker is implemented and applied to verify Rust [12, 32] and OCaml [30] programs.

Combining the two approaches.

As far as we are aware, the intersection between expressive, decidable subclasses and communication failures is less explored. Lozes and Villard [37] studied reliability in binary half-duplex systems and showed that many communicating contracts can be verified with this model. Inspired by this, we investigate whether practical multiparty subclasses, rsc and k-mc, remain robust in the presence of communication errors.

Although failure models such as lossy channels are well studied, the complexity of verification in their presence is often very high – for instance, reachability in lossy systems is non-primitive recursive [21]. Our goal is not only to show that rsc and k-mc systems are resilient, but also that their inclusion remains decidable under failure models, with complexity maintained from the failure-free case.

This paper extends rsc [17, 18] and k-mc [35] by integrating two distinct failure models. rsc and k-mc systems are incomparable to each other (rsc is not a subset of k-mc and vice-versa), but both are closely related to existentially bounded systems. Figure 1 illustrates their relationship with other models.

For failures, first, we consider interferences from the environments by modelling lossiness, corruption (a message is altered to a different message) and out-of-ordering (two messages in a queue are swapped) of channels studied in the context of FIFO systems. Secondly, we consider potential crashes of processes introduced in the setting of session types [3, 6].

Figure 2: The above system 𝒮 is half-duplex in the absence of errors. However, in case of (any or multiple) errors, it is no longer half-duplex.

Let us consider the following simple half-duplex protocol as an example.

Example 1.

The system in Figure 2 is half-duplex under the assumption of perfect channels [14]. It consists of two processes, a 𝗌ender (𝗌) and a (dual) 𝗋eceiver (𝗋), communicating via unbounded FIFO channels. A transition 𝗌𝗋!m denotes that the 𝗌ender puts (asynchronously) a message m on channel 𝗌𝗋, and similarly, 𝗌𝗋?m denotes that message m is consumed by the 𝗋eceiver from channel 𝗌𝗋. Since the channel 𝗋𝗌 only contains messages after the 𝗋eceiver receives the end message and has emptied 𝗌𝗋, the system satisfies the half-duplex condition. Moreover, the 𝗌ender never sends data without having first sent start so the error loop is never triggered.

Now suppose the channels are prone to corruption. A message data could be altered to end after being sent. This allows the 𝗋eceiver to react prematurely by sending ack, while the 𝗌ender continues sending data. As a result, both channels may become non-empty, violating the half-duplex property. Similarly in the presence of other forms of interference, as shown in Example 6, this system no longer satisfies the half-duplex condition.

Contributions and outline.

The main objective of this paper is to investigate whether multiparty adaptations of half-duplex systems (rsc and k-mc) retain both their expressiveness for modelling real-world protocols and the decidability of their inclusions, with preserved complexity, under two distinct kinds of communication failures: interferences and crash-stops. § 2 introduces preliminary notions, notably FIFO systems and interference models; § 3 studies rsc under interference, and shows that relaxing certain conditions on matching send and receive actions retains both expressiveness and decidability (Theorem 19). § 4 examines k-mc with interferences, and proposes a relaxed version, k-wmc (weak k-mc), by weakening the safety condition. We prove that checking the k-wmc property remains decidable under interferences (Theorem 26). § 5 introduces the FIFO systems with crash-stop failures (called crash-handling systems), and shows that checking rsc and k-wmc under crash-stop failures is decidable (Theorems 32 and 33); § 6 defines a translation from (local) multiparty session types (MPST) to crash-handling systems and proves that this translation preserves trace semantics. This implies the decidability of rsc and k-mc within the asynchronous MPST system extended to crash-stop failures (Theorem 40); § 7 evaluates protocols from the literature extending the existing tools with support for interferences; and § 8 concludes with further related and future work. Proofs are provided in the appendix and [45]. The tools and benchmarks are available from https://github.com/NobukoYoshida/Interference-Tool.

2 Preliminaries

For a finite set Σ, we denote by Σ the set of finite words over Σ, and the empty word with ε. We use |w| to denote the length of the word w, and w1w2 indicates the concatenation of two words w1,w2Σ. Given a (non-deterministic) finite-state automaton 𝒜, we denote by (𝒜) the language accepted by 𝒜. Consider a finite set of processes (ranged over by 𝗉,𝗊,𝗋, or occasionally by 𝗋𝟣,𝗋𝟤,) and a set of messages Σ. In this paper, we consider the peer-to-peer communication model; i.e., there is a pair of unidirectional FIFO channels between each pair of processes, one for each direction of communication. In our model, processes act either by point-to-point communication or by internal actions (actions local to a single process). Moreover, in this setting, we consider messages to be atomic, akin to letters of an alphabet.

Let 𝐶ℎ ={𝗉𝗊𝗉𝗊 and 𝗉,𝗊} be a set of channels that stand for point-to-point links. Since we are considering the peer-to-peer model of communication, there is a unique process that can send a message to (or dually, receive a message from) a particular channel. An action a=(𝗉𝗊,!,m)𝖠𝖼𝗍 indicates that a process 𝗉 sends a message m on the channel 𝗉𝗊. Similarly, a=(𝗊𝗉,?,m)𝖠𝖼𝗍 indicates that 𝗉 receives a message m on the channel 𝗊𝗉. We henceforth denote an action a=(𝗉𝗊,,m)𝖠𝖼𝗍, where {!,?}, in a simplified form as 𝗉𝗊m. An internal action c𝗉 means that process 𝗉 performs the action c. We define a finite set of actions as 𝖠𝖼𝗍(𝐶ℎ×{!,?}×Σ)𝖠𝖼𝗍τ where 𝖠𝖼𝗍τ is the set of all internal actions.

Definition 2 (FIFO automaton).

A FIFO automaton 𝒜𝗉, associated with 𝗉, is defined as 𝒜𝗉=(Q𝗉,δ𝗉,q0𝗉) where: Q𝗉 is the finite set of control-states, δ𝗉Q𝗉×𝖠𝖼𝗍×Q𝗉 is the transition relation, and q0𝗉Q𝗉 is the initial control-state.

Note that in this model, there are no final or accepting states.

The set of outgoing channels of process 𝗉 is represented by 𝐶ℎo,𝗉={𝗉𝗊𝗊𝗉}. Similarly, 𝐶ℎi,𝗉={𝗊𝗉𝗊𝗉} is the set of incoming channels of process 𝗉.

Given an action a, an active process, denoted by 𝗉𝗋𝗈𝖼(a), is defined as: 𝗉𝗋𝗈𝖼(𝗉𝗊!m)=𝗉 and 𝗉𝗋𝗈𝖼(𝗉𝗊?m)=𝗊. Similarly, 𝖼𝗁(𝗉𝗊!m)=𝖼𝗁(𝗉𝗊?m)=𝗉𝗊.

We say a control state qQ𝗉 is a sending state (resp. receiving state) if all its outgoing transitions are labelled by send (resp. receive) actions. If a control state is neither a sending nor receiving state, i.e., it either has both send and receive actions or neither, then we call it a mixed state. We say a sending (resp. receiving) state is directed if all the send (resp. receive) actions from that control state are to the same process. Like for finite-state automata, we say that a FIFO automaton 𝒜𝗉=(Q𝗉,δ𝗉,q0𝗉) is deterministic if for all transitions (q,a,q),(q,a,q′′)δ𝗉, a=aq=q′′. We write q1a1anqn+1 for (q1,a1,q2)(qn,an,qn+1)δ𝗉. Unless specified otherwise, we consider non-deterministic automata, allowing mixed states, and all states do not have to be directed.

Definition 3 (FIFO system).

A FIFO system 𝒮=(𝒜𝗉)𝗉 is a set of communicating FIFO automata. A configuration of 𝒮 is a pair γ=(q;w) where q=(q𝗉)𝗉 is called the global state with q𝗉Q𝗉 being one of the local control-states of 𝒜𝗉, and where w=(w𝗉𝗊)𝗉𝗊𝐶ℎ with w𝗉𝗊Σ.

The initial configuration of 𝒮 is γ0=(q0;ε) where q0=(q0𝗉)𝗉 and we write ε for the |𝐶ℎ|-tuple (ε,,ε). We let 𝒜𝗉=(Q𝗉,δ𝗉,q0𝗉) be a FIFO automaton. Let 𝒮=(𝒜𝗉)𝗉 be the system whose initial configuration is γ0. The FIFO automaton 𝗉𝗋𝗈𝖽𝗎𝖼𝗍(𝒮) associated with 𝒮 is the standard asynchronous product automaton: 𝗉𝗋𝗈𝖽𝗎𝖼𝗍(𝒮)=(Q,δ,q0) where Q=𝗉Q𝗉, q0=(q0𝗉)𝗉, and δ is the set of triples (q1,a,q2) for which there exists 𝗉 such that (q1𝗉,a,q2𝗉)δ𝗉 and q1𝗋=q2𝗋 for all 𝗋{𝗉}. An execution e=a1a2an𝖠𝖼𝗍 is an arbitrary finite sequence of actions. We write 𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) for {e𝖠𝖼𝗍γ0𝑒γ for some configuration γ}. Given e=a1a2an, we write 𝖠𝖼𝗍(e)={a1,a2,,an}. Moreover, we say two systems are trace-equivalent if they produce the same set of executions, i.e. 𝒮𝒮 is as follows: ϕ,ϕ𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮)ϕ𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮).

Interferences.

In this paper, we do not restrict the study to perfect channels, and instead consider that they may subject to interferences from the environment. Interferences are modelled as potential evolution of channel contents without a change in the global state of the system. As in [37], we model interferences by a preorder over words Σ×Σ, which will parametrise the semantics of dialogue systems. We denote by ww if w and w are the contents of the buffer respectively before and after the interferences.

Definition 4 (Interference model).

(from [37]) An interference model is a binary relation Σ×Σ satisfying the following axioms:

Reflexivity Transitivity Additivity Integrity Non-expansion
aΣaa wwww′′ww′′ w1w1w2w2w1w2w1w2 εww=ε ww|w||w|

Axiom Additivity defines that failures can happen at any part of the words; axiom Integrity says ε is the least word; and axiom Non-expansion says that preserves the size of words. Based on interferences, we define three failures as follows:

  • Lossiness: Possible leaks of messages during transmission are modelled by adding the axiom aε.

  • Corruption: Possible transformation of a message a into a message b is modelled by adding the axiom ab.

  • Out-of-order: Out-of-order communications are modelled by adding axioms abba for all a,bΣ.

We now define successor configurations for FIFO systems with interferences.

Definition 5 (Successor configuration under interference).

Let 𝒮 be a FIFO system. A configuration γ=(q;w) is a successor of another configuration γ=(q;w), by firing the transition (q𝗉,a,q𝗉)δ𝗉, written γγ or γ𝑎γ, if either: (1) a=𝗉𝗊!m and (a) q𝗋=q𝗋 for all 𝗋𝗉; and (b) w𝗉𝗊w𝗉𝗊m and w𝗋𝗌w𝗋𝗌 for all 𝗋𝗌𝗉𝗊; or (2) a=𝗊𝗉?m and (a) q𝗋=q𝗋 for all 𝗋𝗉; and (b) mw𝗊𝗉w𝗊𝗉 and w𝗋𝗌w𝗋𝗌 for all 𝗋𝗌𝗊𝗉.

The condition (1-b) puts the content to a channel 𝗉𝗊, while (2-b) gets the content from a channel 𝗉𝗊. The reflexive and transitive closure of is . We write γ1a1a2amγm+1 for γ1a1γ2γmamγm+1. Moreover, we write (γ1,a1a2am,γm+1)δ to denote {(γ1,a1,γ2),,(γm,am,γm+1)}δ. A configuration γ is reachable if γ0γ and we define 𝑅𝑆(𝒮)={γ|γ0γ}.

A configuration γ=(q;w) is said to be k-bounded if for all 𝗉𝗊𝐶ℎ, |w𝗉𝗊|k. We say that an execution e=e1e2en is k-bounded from γ1 if γ1e1γ2γnenγn+1 and for all 1in+1, γi is k-bounded; we denote this as γ1𝑒kγn+1.

We define the k-reachability set of 𝒮 to be the largest subset 𝑅𝑆k(𝒮) of 𝑅𝑆(𝒮) within which each configuration γ can be reached by a k-bounded execution from γ0. Note that, given a FIFO system 𝒮, for every integer k, the set 𝑅𝑆k(𝒮) is finite and computable.

Example 6.

Let us revisit the system in Figure 2 and explore each of the interferences with the following executions (we denote by red the messages subject to interference):

  • Corruption: Let us consider execution ec=𝗌𝗋!start.𝗌𝗋?start.𝗌𝗋!data. 𝗌𝗋?end.𝗋𝗌!ack.𝗌𝗋!data. Here, the message data has been corrupted to end. Hence, process 𝗋 incorrectly receives the message end, and assumes that process 𝗌 has stopped sending data, while process 𝗌 continues to send it.

  • Lossiness: Consider the execution e=𝗌𝗋!start.𝗌𝗋?start.𝗌𝗋!data.𝗌𝗋?data. 𝗌𝗋!end. Here, the message end has been lost, which means process 𝗋 will be stuck waiting for process 𝗌 to either send data or end.

  • Out-of-order: Let eo=𝗌𝗋!start.𝗌𝗋?start.𝗌𝗋!data.𝗌𝗋!end.𝗌𝗋?end.𝗋𝗌!ack.𝗋𝗌?ack.𝗌𝗋?data. 𝗋𝗌!err.𝗋𝗌?err. In this case, the order of data and end has been swapped in the queue, which leads to a configuration where the error message is sent.

As shown in [1, 21], for communicating automata with lossiness, the reachability set is recognisable, and the reachability problem is decidable. In the case of out-of-order scheduling, it is easy to see that the problem reduces to reachability in Petri nets. It is less clear, but it can also be reduced to Petri net reachability problem in case of corruption. However, the complexity of reachability for these systems is very high – it is non-primitive recursive for lossy systems [44], and Ackermann-hard for corruption and out-of-order [13]. Hence, it is still worth exploring subclasses in the presence of errors.

3 RSC systems with interferences

We first extend the definitions of synchrony in systems from [18] to consider possible interferences. The main extension relates to the definition of matching pairs. Intuitively, matching pairs refer to a send action and the corresponding receive action in a given execution. In the presence of interferences, it is not necessary that the same message that is sent is received (due to corruption), or that the k-th send action corresponds to the k-th receive action (due to lossiness or out-of-order). Hence we extend the definition of matching pairs.

Definition 7 (Matching pair with interference).

Given an execution e=a1an, if there exists a channel 𝗉𝗊, messages m,mΣ and j,j,k,k{1,,n} where j<j, and the following four conditions:

(1) aj=𝗉𝗊!m; (2) aj=𝗉𝗊?m; (3) aj is the k-th send action to 𝗉𝗊 in e; and (4) aj is the k-th receive action on 𝗉𝗊 in e, then we say that {j,j}{1,,n} is a matching pair with interference, or i-matching pair.

Note that if m=m and k=k, we are back to the original definition of matching pairs in [17, Section 2], which we shall refer to henceforth as perfect matching pairs. When we refer to a matching pair, we mean either a perfect or i-matching pair. Moreover, our formalism allows for a single message to have more than one kind of interference, e.g. the same message can be corrupted and received out-of-order.

Example 8.

Consider the following execution e=a1a5=𝗉𝗊!a𝗊𝗉!b𝗊𝗉?b𝗉𝗊!c𝗉𝗊?c. For the channel 𝗊𝗉, we have a perfect matching pair {2,3} which corresponds to the actions 𝗊𝗉!b and 𝗊𝗉?b, the 1st send and receive action along 𝗊𝗉. For the channel 𝗉𝗊, we see that the 1st receive action is not 𝗉𝗊?a, and hence, there is no perfect matching pair corresponding to 𝗉𝗊!a. However, in case of interferences, we can have the following cases:

  • If the message a is lost, i.e., 𝗉𝗊!a would be a lost action, 𝗉𝗊!c𝗉𝗊?c would be a matched send-receive pair, and therefore, {4,5} would be the corresponding i-matching pair.

  • If the message a was corrupted to c, then, 𝗉𝗊?c would be the receive action corresponding to 𝗉𝗊!a, and we would have {1,5} as an i-matching pair.

  • If the trace with an appended action as follows: e=𝗉𝗊!a𝗊𝗉!b𝗊𝗉?b𝗉𝗊!c𝗉𝗊?c𝗉𝗊?a, then it could be that messages a and c were scheduled out-of-order in the channel 𝗉𝗊. Then we have i-matching pairs {1,6} and {4,5}.

We now modify the definition of interactions from [17] as follows.

Definition 9 (Interaction).

An interaction of e is either a (perfect or i-) matching pair, or a singleton {j} such that aj is a send action and j does not belong to any matching pair (such an interaction is called unmatched send).

Given e=a1an, a set of interactions ν is called a valid communication of e if for every index j{1,,n}, there exists exactly one interaction χν such that jχ. I.e., we need to ensure that every action in e belongs to exactly one interaction in the valid communication. We denote by Comm(e) the set of all valid communications associated to e.

Example 10.

Revisiting Example 8, given the execution e=a1a5=𝗉𝗊!a𝗊𝗉!b𝗊𝗉?b𝗉𝗊!c𝗉𝗊?c, there are two valid communications, ν1={{1,5},{2,3},{4}} and ν2={{1},{2,3},{4,5}}, and Comm(e)={ν1,ν2}.

For the rest of this section, when we refer to an execution, we are referring to a tuple (e,ν) such that νComm(e). We say that two actions a1, a2 commute if 𝗉𝗋𝗈𝖼(a1)𝗉𝗋𝗈𝖼(a2) and they do not form a matching pair.

Given an execution (e,ν) such that e=a1an and νComm(e), we say that je,νj if (1) j<j and (2) aj, aj do not commute in ν. We now graphically characterise causally equivalent executions, using the notion of a conflict graph. This allows us to establish equivalences between different executions in which actions can be interchanged.

Definition 11 (Conflict graph).

Given an execution (e,ν), the conflict graph 𝖼𝗀𝗋𝖺𝗉𝗁(e,ν) is the directed graph (ν,e,ν) where for all interactions χ1,χ2ν, χ1e,νχ2 if there is j1χ1 and j2χ2 such that j1e,νj2.

The conflict graph corresponding to Example 10, 𝖼𝗀𝗋𝖺𝗉𝗁(e,ν1), is:

Two executions (e,ν) and (e,ν) are causally equivalent, denoted by (e,ν)(e,ν), if their conflict graphs are isomorphic.

We are now ready to define i-rsc systems, which is the extension of rsc to include interferences. Intuitively, i-rsc executions can be reordered to mimic rendezvous (or synchronous) communication. In the case with interference, we enforce that every valid communication is equivalent to a RSC execution.

Definition 12 (i-RSC system).

An execution (e,ν) is i-rsc if all matching pairs in ν are of the form {j,j+1}. A system 𝒮 is i-rsc if for all tuples (e,ν) such that e𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) and νComm(e), we have 𝖼𝗀𝗋𝖺𝗉𝗁(e,ν)=𝖼𝗀𝗋𝖺𝗉𝗁(e,ν) where (e,ν) is an i-rsc execution.

Example 13.

From Ex. 10, (e,ν2) is an i-rsc execution, but (e,ν1) is neither an i-rsc execution nor equivalent to one, as message a has to be sent before message b is received by process 𝗉 while message b has to be sent before the corresponding message (which is now c due to corruption) is received by process 𝗊.

This is the strictest version, however, this can be adapted to include only one communication by assuming a single instance of ν instead of all. We formalise our observation about non-i-rsc behaviours from Example 13, and show that i-rsc still maintains the good properties of the conflict graph as in [17].

Lemma 14.

An execution (e,ν) is causally equivalent to an i-rsc execution iff the associated conflict graph 𝖼𝗀𝗋𝖺𝗉𝗁(e,ν) is acyclic.

A borderline violation for interferences defined below is a key concept for the decidability of rsc systems. Intuitively, it provides a “minimal counter-example” for non-rsc behaviour.

Definition 15 (Borderline violation).

An execution (e,ν) is a borderline violation if (1) (e,ν) is not causally equivalent to an i-rsc execution, (2) e=e𝖼?m for some execution e such that (a) for all νComm(e), (e,ν) is equivalent to an i-rsc execution and (b) there exists ν1Comm(e) such that (e,ν1) is an i-rsc execution.

Lemma 16.

𝒮 is i-rsc if and only if for all e𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) and νComm(e), (e,ν) is not a borderline violation.

Following the same approach as in [17], we show that inclusion into the i-rsc class is decidable. For simplicity, we construct the following sets: 𝖠𝖼𝗍nr={𝖼!?m𝖼!m𝖠𝖼𝗍,𝖼?m𝖠𝖼𝗍}{𝖼!m𝖼!m𝖠𝖼𝗍} and 𝖠𝖼𝗍?={𝖼?m𝖼?m𝖠𝖼𝗍}. Note that 𝖼!?m could include a send-receive pair where the message sent is different from the one received. This ensures inclusion of matching pairs due to corruption. An i-rsc execution can be represented by a word in 𝖠𝖼𝗍nr and a borderline violation by a word in 𝖠𝖼𝗍nr.𝖠𝖼𝗍?. We first show that the set of borderline violations is regular.

Lemma 17.

Let 𝒮 with 𝗉𝗋𝗈𝖽𝗎𝖼𝗍(𝒮)=(Q,Σ,𝐶ℎ,𝖠𝖼𝗍,δ,qo). There is a non-deterministic finite state automaton 𝒜𝑏𝑣 computable in time 𝒪(|𝐶ℎ|3|Σ|2) such that (𝒜𝑏𝑣)={e𝖠𝖼𝗍nr.𝖠𝖼𝗍?νComm(e) such that (e,ν) is a borderline violation}.

Next we show that the subset of executions 𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) that begin with an i-rsc prefix and terminate with a reception is regular. The construction of the automaton 𝒜𝑟𝑠𝑐 recognising such a language mimics the i-rsc executions of the original system 𝒮, storing only the information on non-empty buffers, guessing which is the send message that will be matched by the final reception.

For the following result, we let the size |𝒜| of an automaton 𝒜=(Q,δ,q0) be |Q|+|δ|. Moreover, the size |𝒮| of a system 𝒮=(𝒜𝗉)𝗉 = 𝗉|𝒜𝗉|.

Lemma 18.

Let 𝒮 be a FIFO system. There exists a non-deterministic finite state automaton 𝒜𝑟𝑠𝑐 over 𝖠𝖼𝗍nr𝖠𝖼𝗍? such that (𝒜𝑟𝑠𝑐)={e𝗉𝗊?m𝖠𝖼𝗍nr.𝖠𝖼𝗍?e𝗉𝗊?m𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) and νComm(e) such that (e,ν) is an i-rsc execution}, which can be constructed in time 𝒪(n||+2|𝐶ℎ|2×2|𝐶ℎ|), where n is the size of 𝒮.

Using the above lemmas, we derive the following main theorem in this section, which states that the inclusion of an i-rsc system is decidable; and the complexity is comparable to that of checking inclusion to rsc[17, Theorem 12].

Theorem 19.

Given a system 𝒮 of size n, deciding whether it is an i-rsc system can be done in time 𝒪(n||+2|𝐶ℎ|5×2|𝐶ℎ|×|Σ|2).

4 𝒌-Multiparty Compatibility with interferences

We extend our analyses to consider k-multiparty compatibility (k-mc) which was introduced in [35] for a subset of FIFO systems, called communicating session automata (CSA). CSA strictly include systems corresponding to asynchronous multiparty session types [15].

Definition 20 (Communicating session automata).

A deterministic FIFO automaton which has no mixed states is defined as a session automaton. FIFO systems comprising session automata are referred to as communicating session automata (CSA).

In this section, we only consider communicating session automata. We begin by recalling the definition of k-mc which is composed of two properties, k-safety and k-exhaustivity.

Definition 21 (k-Safety, Definition 4 in [35]).

A communicating system 𝒮 is k-safe if the following holds for all (q;w)𝑅𝑆k(𝒮):

(k-er)

𝗉𝗊𝐶ℎ, if w𝗉𝗊=m.u then (q;w)k𝗉𝗊?mk.

(k-pg)

if q𝗉 is receiving, then (q;w)k𝗉𝗊?m for some mΣ.

The k-safety condition is composed of two properties, the first being eventual reception (k-er) which ensures that every message sent to a channel is eventually received. The other property is progress (k-pg) where the system is not “stuck” at any receiving state.

A system is k-exhaustive if for all k-reachable configurations, whenever a send action is enabled, it can be fired within a k-bounded execution.

Definition 22 (k-Exhaustivity, Definition 8 in [35]).

A communicating system 𝒮 is k-exhaustive if for all (q;w)𝑅𝑆k(𝒮) and 𝗉𝗊𝐶ℎ, if q𝗉 is a sending state, then for all (q𝗉,𝗉𝗊!m,q𝗉)δ𝗉, there exists (q;w)k𝗉𝗊!mk.

Figure 3: The above system 𝒮 is k-MC but has a non-regular reachability set.

Example 23 shows that the reachability set of k-mc is not necessarily regular, unlike the binary half-duplex systems [14].

Example 23.

The system 𝒮 depicted in Figure 3 is an example of a k-mc system for which the reachability set is not regular. It consists of four participants, sending messages amongst themselves. The first participant 𝗌 can send equal number of a, b, c letters in a loop to participants 𝗉, 𝗊, and 𝗋 respectively. The participants 𝗉, 𝗊, and 𝗋 behave similarly, so let us take the example of 𝗉. It consumes one letter from the channel 𝗌𝗉, then as a way of synchronisation sends a message α to 𝗊 and waits to receive a message α from 𝗋. This ensures that participants 𝗉, 𝗊, and 𝗋 consume equal number of letters from their respective channels with 𝗌. Hence, the reachability set for initial configuration (s0,p0,q0,r0) is an#bn#cn which is context-sensitive, hence non-regular.

We prove that k-mc in the absence of errors, for a large class of systems the k-safety property subsumes k-exhaustivity.

Theorem 24.

If a directed CSA 𝒮 is k-safe, then 𝒮 is k-exhaustive.

𝒌-MC with interferences.

Theorem 24 shows that the k-safety is a too strong condition in the presence of interferences. For instance, in case of lossiness, progress cannot be guaranteed. This is because there is always the potential of losing messages and being in a receiving state forever. We are now ready to define k-weak multiparty compatibility.

Definition 25 (k-Weak Multiparty Compatibility).

A communicating system 𝒮 is weakly k-mc, or k-wmc, if it satisfies k-er and is k-exhaustive.

This notion covers a larger class of systems than k-mc systems, and it is more natural in the presence of errors. Moreover, we still retain the decidability of k-wmc in the presence of errors. We briefly discuss weaker refinements to these properties in § 8. We conclude with the following theorem which states that the k-wmc property is decidable.

Theorem 26.

Given a system 𝒮 with lossiness (resp. corruption, resp. out-of-order) errors, checking the k-wmc property is decidable and pspace-complete.

Proof.

To check whether 𝒮 is not k-exhaustive, i.e., for each sending state q𝗉 and send action from q𝗉, we check whether there is a reachable configuration from which this send action cannot be fired. Hence, we need to search 𝑅𝑆k(𝒮), which has an exponential number of configurations (wrt. k). Note that due to interferences, each of these configurations can now have modified channel contents. We need to store at most ||n|𝐶ℎ||Σ|k configurations, where n is the maximum number of local states of a FIFO automata, following ideas from [35] and [9]. Hence, the problem can be decided in polynomial space when k is given in unary.

Next, to show that k-wer is decidable, we check for every such reachable configuration, that there exists a receive action from the same channel (note that we do not need to ensure it is the same message).

5 Crash-stop failures

Session types [46, 27, 28] are a type discipline to ensure communication safety for message passing systems. Most session types assume a scenario where participants operate reliably, i.e. communication happens without failures. To model systems closer to the real world, Barwell et al. [6, 3] introduced session types with crash-stop failures. In this section, we consider the same notion for communicating systems which we define as crash-handling.

5.1 Crash-handling FIFO systems

We extend this framework to FIFO systems. As in [6, 3], we declare a (potentially empty) set of reliable processes, which we denote as . If a process is assumed reliable, the other processes can interact with it without needing to handle its crashes. Hence if =, there is no additional crash-handling behaviour for the system. In this way, we can model a mixture of reliable and unreliable processes. For simplicity in the construction, we enforce an additional constraint that in the crash-handling branches, there is no receive action from the crashed process.

We use a shorthand for the broadcast of a message mΣ by process 𝗉 along all outgoing channels: (q,broadcast𝗉(m),q) if q𝗉𝗋𝟣!m.𝗉𝗋𝟤!m𝗉𝗋𝗇!mq such that n=|𝐶ℎo,𝗉| and 𝗋𝗂𝗋𝗃 for all 𝗂𝗃. We denote by crash-broadcast𝗉(m) the concatenation crash𝗉broadcast𝗉(m) where crash𝗉𝖠𝖼𝗍τ is an internal action reserved for when process 𝗉 crashes.

Let 𝒮=(𝒜𝗉)𝗉 be a FIFO system over Σ{}. Let the set of reliable processes be . For each 𝗉:

  • We divide the state set as follows : Q𝗉=Q𝗉,1Q𝗉,2Q𝗉,3.

  • Let 𝖠𝖼𝗍(𝐶ℎ×{!,?}×(Σ{}))𝖠𝖼𝗍τ be the set of actions.

  • We split δ𝗉=δ𝗉,1δ𝗉,2 such that:

    • δ𝗉,1Q𝗉,1×(𝐶ℎ×{!,?}×Σ)×(Q𝗉,1Q𝗉,2), and

    • δ𝗉,2Q𝗉×[(𝐶ℎ×{!,?}×{})𝖠𝖼𝗍τ]×Q𝗉.

We say that a process 𝗉 has crash-handling behaviour in 𝒮 if δ𝗉,2 is the smallest set of transitions such that:

  1. 1.

    Crash handling (ch): For all (q,𝗋𝗉?a,q)δ𝗉,1 such that γ0𝑒γ=(q;w) and q𝗉=q and 𝗋, there exists q′′(Q𝗉,1Q𝗉,2) such that (q,𝗋𝗉?,q′′)δ𝗉,2.

  2. 2.

    Crash broadcast (cb): If 𝗉, then for all qQ𝗉,1, there exists a crash-broadcast (q,crash-broadcast𝗉(),qstop)δ𝗉,2, for some qstopQ𝗉,2 and all intermediate states belonging to Q𝗉,3.

  3. 3.

    Crash redundancy (cr): Finally, we have the condition that any dangling crash messages are cleaned up. For all qQ𝗉,2, (q,𝗋𝗉?,q)δ𝗉,2.

Condition (ch) enforces that every state in the system which receives from an unreliable process has a crash-handling branch, so that the receiving process is not deadlocked waiting for a message from a process that has crashed. Condition (cb) ensures that every unreliable process can non-deterministically take the internal action crash𝗉 when it crashes and broadcast this information to all the other participants. Condition (cr) ensures that from all states in Q𝗉,2, any dangling crash messages are cleaned up from an (otherwise empty) channel.

Definition 27 (Crash-handling systems).

We say that a system 𝒮 is crash-handling if every process 𝗉 has crash-handling behaviour in 𝒮.

Figure 4: (a) The system 𝒮 (right) is crash-handling. (b) FIFO automata (right) of the type in Example 36.

Consider the following example, which models a simple send-receive protocol between a sender and a receiver.

Example 28.

Figure 4(a) shows a crash-handling system. It consists of two processes, a 𝗌erver (𝗌) and a 𝖼lient (𝖼). We assume that the 𝗌erver is reliable, i.e. does not crash, while the 𝖼lient is unreliable, i.e could crash. Hence the 𝖼lient can crash in any control state, while the 𝗌erver is always ready to handle a crash when it is waiting for a message from the 𝖼lient.

In this construct, it is still possible to send messages to a crashed process. This is because from the perspective of the sending process, the crash of the receiving process is not necessarily known. Therefore, in this model, while processes can continue to send messages to crashed processes, the crashed processes would not be able to receive any messages.

Note that these properties are local to each individual automaton, hence the verification of these properties is decidable.

Lemma 29.

It is decidable to check whether a system is crash-handling.

We see that this behaviour can be appended to any FIFO system, but it does not affect the underlying verification properties of the automata. We demonstrate with the example of boundedness (i.e. checking if every execution is k-bounded for some k), but a similar argument can be used for reachability or deadlock.

Lemma 30.

The boundedness problem is undecidable for crash-handling systems.

5.2 Crash-handling subsystems

Next we investigate inclusion of crash-handling systems in the aforementioned classes.

Crash-handling RSC systems.

Checking that the rsc property is decidable for crash-handling systems amounts to verifying if the proofs hold for communicating automata with internal actions. Let us first look at an example.

Example 31.

The system in Figure 4 is a crash-handling system that is also rsc. We see that the behaviour of the system in the absence of crashes is rsc, and in the presence of crashes, there is no additional non-rsc behaviour. Moreover, even if the 𝑟𝑒𝑞 is sent, followed by the crash broadcast – since the crash message is never received, the behaviour of the system is still rsc. However, this need not be the case for other examples.

Next we show that the proofs from [17] can be adapted to automata with internal actions.

Theorem 32.

Given a crash-handling system 𝒮, it is decidable to check inclusion to the rsc class.

Crash-handling 𝒌-WMC systems.

We now show that checking k-wmc is decidable for crash-handling systems generated from a collection of local types. The reason for considering k-wmc instead of k-mc in [35, Definition 9] is that for crash-handling systems generated from local types, the end states are receiving states (as opposed to final states). This result is adapted from [35] with the inclusion of internal actions.

Theorem 33.

Given a crash-handling system 𝒮 generated from a collection of communicating session automata, it is decidable to check k-wmc, and can be done in pspace.

6 Session types with crash-stop failures

This section shows that the crash-handling system strictly subsumes the crash-stop systems in [3, 5], preserving the semantics. We recall the crash-stop semantics for local types defined in [3] where the major additions are (1) a special local type 𝗌𝗍𝗈𝗉 to denote crashed processes; and (2) a crash-handling branch (catch) in one of branches of an external choice.

The syntax of the local types (S,T,) are given as:

S,T 𝗉?{mi.Ti}iI𝗉!{mi.Ti}iI (external choice, internal choice)
μ𝐭.T𝐭𝖾𝗇𝖽𝗌𝗍𝗈𝗉 (recursion, type variable, end, crash)

An external choice (branching) (resp., an internal choice (selection)), denoted by 𝗉?{mi.Ti}iI (resp., 𝗉!{mi.Ti}iI) indicates that the current role is to receive from (or send to) the process 𝗉. We require pairwise-distinct, non-empty labels and the crash-handling label (catch) not appear in internal choices; and that singleton crash-handling labels not permitted in external choices. The type 𝖾𝗇𝖽 indicates a successful termination (omitted where unambiguous), and recursive types are assumed guarded, i.e., μ𝐭.𝐭 is not allowed, and recursive variables are unique. A runtime type 𝗌𝗍𝗈𝗉 denotes crashes.

We point out here that while this is a bottom-up view of the crash-handling behaviour introduced in [3], we have taken a purely type-based approach here. For a calculus based approach, we refer the reader to [4].

We define the LTS over local types and extend the notions to communicating systems. We use the same labels as the ones for communicating systems.

Definition 34 (LTS over local types).

The relation T𝑎T for the local type of role 𝗉 is defined as:

[LR1]𝗊{mi.Ti}iI 𝗉𝗊mk Tk,where {!,?} and mkcatch.
[LR2]T[μ𝐭.T/𝐭]𝑎T μ𝐭.T𝑎T
[LR3]𝗊{mi.Ti}iI crash-broadcast𝗉() 𝗌𝗍𝗈𝗉, where {!,?}.
[LR4]𝗊?{mi.Ti}iI 𝗊𝗉? Tk, if mk=catch.
[LR5]T 𝗊𝗉? T,𝗊{𝗉} for T{𝗌𝗍𝗈𝗉,𝖾𝗇𝖽}.

Rules [LR1] and [LR2] are standard output/input and recursion rules, respectively; rule [LR3] accommodates for the crash of a process; rule [LR4] is the main rule for crash-handling where the reception of crash information leads the process to a crash-handling branch; and rule [LR5] allows any dangling crash information messages to be read in the sink states.

The LTS over a set of local types is defined as in Definition 3, where a configuration γ=(T;w) of a system is a pair with T={T𝗉}𝗉 and w=(w𝗉𝗊)𝗉𝗊𝐶ℎ with w𝗉𝗊Σ.

Next we algorithmically translate from local types to FIFO automata preserving the trace semantics. Below we write μ𝐭.T for μ𝐭1.μ𝐭2μ𝐭n.T with n0.

In order to construct the FIFO automata, we first need to define the set of states. Intuitively, this is the set of types which result from any continuation of the initial local type. Below we define a type occurring in another type (based on the definition in [47]).

Definition 35 (Type occurring in type, [47]).

We say a type T occurs in T (denoted by TT) if and only if at least one of the following conditions holds: (1) if T is 𝗉?{mi.Ti}iI, there exists iI such that TTi; (2) if T is 𝗉!{mi.Ti}iI, there exists iI such that TTi; (3) if T is μ𝐭.Tμ then TTμ; or (4) T=T, where = denotes the syntactic equality.

Example 36.

Let ={𝖠,𝖡,𝖢} and ={𝖡,𝖢}. Consider a local type of 𝖢: T=μ𝐭.𝖡?{𝑠𝑖𝑔.𝖠?{𝑐𝑜𝑚𝑚𝑖𝑡.𝐭,catch.𝖾𝗇𝖽},𝑠𝑎𝑣𝑒.𝖠?{𝑓𝑖𝑛𝑖𝑠ℎ.𝖾𝗇𝖽,catch.𝖾𝗇𝖽}}.

Then, the set of all TT is {T,𝖡?{𝑠𝑖𝑔.𝖠?{𝑐𝑜𝑚𝑚𝑖𝑡.𝐭,catch.𝖾𝗇𝖽}},𝖠?{𝑐𝑜𝑚𝑚𝑖𝑡.𝐭},𝖡?{𝑠𝑎𝑣𝑒.𝖠?{𝑓𝑖𝑛𝑖𝑠ℎ.𝖾𝗇𝖽,catch.𝖾𝗇𝖽}},𝖠?{catch.𝖾𝗇𝖽},𝖠?{𝑓𝑖𝑛𝑖𝑠ℎ.𝖾𝗇𝖽},𝖾𝗇𝖽,𝐭}.

And now, we are ready to define the FIFO automata.

Definition 37 (From local types to FIFO automata).

Let T0 be the local type of participant 𝗉. The automaton corresponding to T0 is 𝒜(T0)=(Q,δ,q0) where:

  1. 1.

    Q={TTT0,T𝐭,Tμ𝐭.T}{qcrash}{qsend,𝗋𝗋{𝗉}}

  2. 2.

    q0=strip(T0);

  3. 3.

    δ is the smallest set of transitions such that TQ:

    1. (a)

      If T=𝗊{mi.Ti}iI and kI, mkcatch, and {!,?}
      {(T,𝗉𝗊mk,strip(Tk))δif Tk𝐭(T,𝗉𝗊mk,strip(T))δif Tk=𝐭 with μ𝐭.TT0.

    2. (b)

      If T=𝗊?{mi.Ti}iI with kI, mk=catch
      {(T,𝗊𝗉?,strip(Tk))δif Tk𝐭(T,𝗊𝗉?,strip(T))δif Tk=𝐭 with μ𝐭.TT0.

    3. (c)

      If T{𝗌𝗍𝗈𝗉,𝖾𝗇𝖽}, then (T,crash-broadcast𝗉(),𝗌𝗍𝗈𝗉)δ where

      1. i.

        (T,crash,qcrash)δ

      2. ii.

        (qcrash,𝗉𝗋𝟣!,qsend,𝗋𝟣)δ

      3. iii.

        (qsend,𝗋𝗂,𝗉𝗋𝗂+𝟣!,qsend,𝗋𝗂+𝟣)δ i{1,,n2}, where n=|𝐶ℎo,𝗉|

      4. iv.

        (qsend,𝗋𝗇𝟣,crash,𝗌𝗍𝗈𝗉)δ

    4. (d)

      If T{𝗌𝗍𝗈𝗉,𝖾𝗇𝖽}, then (T,𝗊𝗉?,T)δ for all 𝗊{𝗉}.

where strip(T)=defstrip(T) if T=μ𝐭.T; otherwise strip(T)=defT.

Example 38.

The FIFO automata constructed from the type in Example 36 is shown in Figure 4. We see that for all receiving actions from process 𝖠, which is not in the reliable set of processes, there is a crash-handling branch, where:

T0 =𝖡?{𝑠𝑖𝑔.𝖠?{𝑐𝑜𝑚𝑚𝑖𝑡.𝐭,catch.𝖾𝗇𝖽},𝑠𝑎𝑣𝑒.𝖠?{𝑓𝑖𝑛𝑖𝑠ℎ.𝖾𝗇𝖽,catch.𝖾𝗇𝖽}}
T1 =𝖠?{𝑐𝑜𝑚𝑚𝑖𝑡.𝐭,catch.𝖾𝗇𝖽}T2=𝖠?{𝑓𝑖𝑛𝑖𝑠ℎ.𝖾𝗇𝖽,catch.𝖾𝗇𝖽}T3=𝖾𝗇𝖽

We now prove that the automata generated from a local type can be composed into communicating session automata, and this translation preserves the semantics.

Lemma 39.

Assume T𝗉 is a local type. Then 𝒜(T𝗉) is deterministic, directed and has no mixed states. Moreover, T𝗉𝒜(T𝗉), i.e. ϕ,ϕ𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(T𝗉)ϕ𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒜(T𝗉)).

By Lemma 39, we derive that the resulting systems belong to the class of crash-handling systems, and the problem of checking rsc and k-wmc is decidable for this class.

Theorem 40.

The FIFO system generated from the translation of crash-stop session types is a crash-handling system. Moreover, it is decidable to check inclusion to the rsc and k-wmc classes.

7 Experimental evaluation

Table 1: Experimental evaluation of benchmarks in the ReSCu and kmc tool, under the presence of no errors, out-of-order, lossiness and corruption errors. Note that the systems were checked for k10. to denotes timeout after 5 minutes. The *-marked examples originate from the ReSCu tool [16], having been translated from other papers, as detailed in the original publication.
Protocol No errors Out of order Lossiness Corruption
k-mc rsc k-mc rsc k-exh k-ER k-PG rsc k-exh k-ER k-PG rsc
Alternating Bit [41] yes yes yes yes yes yes no yes yes yes no yes
Alternating Bit [7] yes no yes yes yes yes no yes yes yes no yes
Bargain [34] yes yes yes yes yes yes no yes yes no no yes
Client-Server-Logger [35] yes no yes no yes yes no yes yes yes no yes
Cloud System v4 [25] yes yes yes no no no no yes no no no no
Commit protocol [10] yes yes yes yes yes no no yes yes no no yes
Dev System [40] yes yes yes yes yes no no yes yes no no yes
Elevator [10] yes no yes no yes yes no no no to no no
Elevator-dashed [10] yes no yes no no no no no no to no no
Elevator-directed [10] yes no yes no no no no no no to no no
Filter Collaboration [49] yes yes yes yes yes yes no yes yes no no yes
Four Player Game [34] yes yes yes no no yes no yes no yes no yes
Health System [35] yes yes yes yes yes no no yes yes no no yes
Logistic [39] yes yes yes yes yes yes no yes no no no yes
Sanitary Agency (mod) [42] yes yes yes yes yes no no yes yes to no yes
TPM Contract [26] yes yes yes no yes yes no yes no no no no
2-Paxos 2P3A ([45]) yes yes yes yes yes yes yes yes yes no no yes
Promela I* [16] yes no yes no yes yes no yes yes yes yes yes
Web Services* [16] yes yes yes yes yes yes no yes yes no no yes
Trade System* [16] yes yes yes yes yes yes no yes yes no no yes
Online Stock Broker* [16] no no no no no no no yes no no no yes
FTP* [16] yes yes yes yes yes yes no yes no no no yes
Client-server* [16] yes yes yes yes yes yes no yes yes no no yes
Mars Explosion* [16] yes yes yes yes yes no no yes no no no yes
Online Computer Sale* [16] no yes no yes yes yes no yes no no no yes
e-Museum* [16] yes yes yes no yes no no yes yes no no yes
Vending Machine* [16] yes yes yes yes yes yes no yes yes no no yes
Bug Report* [16] yes yes yes no yes yes no yes no no no yes
Sanitary Agency* [16] no yes no yes yes yes no yes yes no no yes
SSH* [16] no yes no yes yes yes no yes yes yes no yes
Booking System* [16] no yes no yes yes yes no yes yes no no yes
Hand-crafted Example* [16] no yes no yes yes no no yes yes no no yes

We verify protocols in the literature under interferences in order to compare how inclusion to the rsc and k-wmc classes change; and which of rsc and k-mc is more resilient under failures. We used the tools, rsc-checker ReSCu [16] (implemented in OCaml), and k-mc-checker kmc [35] (implemented in Haskell). The ReSCu tool implements a version of the out-of-order scheduling with an option, so we use this available option to take our benchmark.

The kmc tool implements the options to check (1) the k-exhaustivity (k-exh, Def. 22); (2) the k-eventual reception (k-er, Def 21); and (3) the progress (k-pg, Def 21). The k-weak multiparty compatibility condition (k-wmc, Def 25) no longer checks for k-pg but checks k-exh and k-er. Hence checking (1,2,3) gives us the justification whether k-wmc is more resilient than k-mc. For the out-of-order, we implemented the out-of-order scheduling in Haskell mirroring the implementation as in ReSCu. To model lossiness, we add reception self-loops as defined in completely specified protocols [21], and for corruption, we allow the sending of arbitrary messages; both of these are implemented in Python.

Table 1 shows the evaluation results. From the benchmarks in [16], we selected all the relevant benchmarks (CSA and peer-to-peer) in order to evaluate them by kmc. Interestingly, in the case of out-of-order errors, all of the protocols which satisfy k-mc without errors still satisfy k-mc. However rsc does not satisfy some k-mc protocols. This would imply that in most real-world examples, the flexibility in behaviour introduced by relaxing the FIFO condition does not affect the inclusion to k-mc. On the other hand, under lossiness and corruption, most examples no longer belong to k-mc. More specifically, k-pg fails for most cases. The k-er also fails for many cases, especially in the presence of corruption. This justifies a relevance of our definition of k-wmc under those two failures.

We implement a k-bounded version of the Paxos protocol [33], a consensus algorithm that ensures agreement in a distributed system despite failures like lossiness and reordering, using a process of proposing and accepting values (c.f. [45] for the details). This version limits retry attempts to k. Our implementation (for 2 retries, 2 proposers and 3 acceptors) shows it is k-mc and rsc both without errors, and k-mc under lossiness. Since Paxos does not assume corruption, it is unsurprising that it is no longer k-mc under corruption.

8 Conclusion and further related work

In this paper, we derived decidability and complexity results for two subclasses, rsc and k-mc, under two types of communication failures: interferences and crash-stop failures. In the absence of errors, rsc systems and k-mc systems are incomparable, even if we restrict the analyses to 1-mc systems. For example, [35, Example 4] is 1-mc but not rsc. Conversely, [17, Example 4] is rsc but does not satisfy the progress condition, and hence is not k-mc for any k. Despite these distinctions, both classes aim to generalise the concept of half-duplex communication to multiparty systems. This serves as our primary motivation for examining failures in a uniform way across both rsc and k-mc systems.

In the interference model, we introduced i-rsc systems, which relax the matching-pair conditions in rsc; and k-wmc which omits the progress condition to accommodate a model with no final states. We proved that the inclusion problem for these relaxed properties remain decidable within the same complexity class as their error-free counterparts. The evaluation results in § 7 confirm that relaxed systems are more resilient than the original ones.

As the second failure model, we investigated crash-stop failures. We defined crash-handling communicating systems which strictly include the class of local types with crash-stop failures. We also proved that both rsc and k-mc properties are decidable for this class. Note that multiparty session types with crash-stop failures studied in [6] are limited to synchronous communications. Meanwhile, the asynchronous setting in [3] restricts expressiveness to a set of local types projected from global types (which is known to be less expressive than those not using global types [43]). Therefore, both of these systems are strictly subsumed by our crash-handling system as proven in Theorem 40.

Integrating the k-mc-checker and the ReSCu tool (with support for crash-stop failures) into the Scala toolchain of [3] is a promising direction for future work, potentially enabling the verification of a broader class of programs than those considered in [43, 3].

Various failure-handling systems have been studied in the session types literature, e.g., affine session types [38, 22, 29], link-failure [2] and event-driven failures [48]. Interpreting their failures into our framework would offer a uniform analysis of failure processes.

References

  • [1] P. Abdulla and B. Jonsson. Verifying programs with unreliable channels. In [1993] Proceedings Eighth Annual IEEE Symposium on Logic in Computer Science, pages 160–170, June 1993. doi:10.1109/LICS.1993.287591.
  • [2] Manuel Adameit, Kirstin Peters, and Uwe Nestmann. Session Types for Link Failures. In Ahmed Bouajjani and Alexandra Silva, editors, Formal Techniques for Distributed Objects, Components, and Systems, pages 1–16, Cham, 2017. Springer International Publishing. doi:10.1007/978-3-319-60225-7_1.
  • [3] Adam D. Barwell, Ping Hou, Nobuko Yoshida, and Fangyi Zhou. Designing Asynchronous Multiparty Protocols With Crash Stop Failures. In European Conference on Object Oriented Programming, volume 263, pages 1:1–1:30, 2023. doi:10.4230/LIPIcs.ECOOP.2023.1.
  • [4] Adam D. Barwell, Ping Hou, Nobuko Yoshida, and Fangyi Zhou. Designing Asynchronous Multiparty Protocols with Crash-Stop Failures, May 2023. arXiv:2305.06238 [cs]. doi:10.48550/arXiv.2305.06238.
  • [5] Adam D. Barwell, Ping Hou, Nobuko Yoshida, and Fangyi Zhou. Crash-Stop Failures in Asynchronous Multiparty Session Types. Logical Methods in Computer Science, 2025. URL: https://arxiv.org/abs/2311.11851.
  • [6] Adam D. Barwell, Alceste Scalas, Nobuko Yoshida, and Fangyi Zhou. Generalised Multiparty Session Types with Crash-Stop Failures. In Bartek Klin, S\lawomir Lasota, and Anca Muscholl, editors, 33rd International Conference on Concurrency Theory (CONCUR 2022), volume 243 of Leibniz International Proceedings in Informatics (LIPIcs), pages 35:1–35:25, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. ISSN: 1868-8969. doi:10.4230/LIPIcs.CONCUR.2022.35.
  • [7] Bernard Boigelot and Patrice Godefroid. Symbolic verification of communication protocols with infinite state spaces using QDDs. In Rajeev Alur and Thomas A. Henzinger, editors, Computer Aided Verification, Lecture Notes in Computer Science, pages 1–12, Berlin, Heidelberg, 1996. Springer. doi:10.1007/3-540-61474-5_53.
  • [8] Benedikt Bollig, Cinzia Di Giusto, Alain Finkel, Laetitia Laversa, Etienne Lozes, and Amrita Suresh. A Unifying Framework for Deciding Synchronizability. In Serge Haddad and Daniele Varacca, editors, 32nd International Conference on Concurrency Theory (CONCUR 2021), volume 203 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:18, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. ISSN: 1868-8969. doi:10.4230/LIPIcs.CONCUR.2021.14.
  • [9] Benedikt Bollig, Dietrich Kuske, and Ingmar Meinecke. Propositional Dynamic Logic for Message-Passing Systems. Logical Methods in Computer Science, Volume 6, Issue 3, September 2010. doi:10.2168/LMCS-6(3:16)2010.
  • [10] Ahmed Bouajjani, Constantin Enea, Kailiang Ji, and Shaz Qadeer. On the Completeness of Verifying Message Passing Programs Under Bounded Asynchrony. In Hana Chockler and Georg Weissenbacher, editors, Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part II, Lecture Notes in Computer Science, pages 372–391, Cham, 2018. Springer International Publishing. doi:10.1007/978-3-319-96142-2_23.
  • [11] Daniel Brand and Pitro Zafiropulo. On Communicating Finite-State Machines. J. ACM, 30(2):323–342, April 1983. doi:10.1145/322374.322380.
  • [12] Zak Cutner, Nobuko Yoshida, and Martin Vassor. Deadlock-free asynchronous message reordering in Rust with multiparty session types. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’22, pages 246–261, New York, NY, USA, March 2022. Association for Computing Machinery. doi:10.1145/3503221.3508404.
  • [13] Wojciech Czerwinski, Slawomir Lasota, Ranko Lazic, Jérôme Leroux, and Filip Mazowiecki. The Reachability Problem for Petri Nets Is Not Elementary. J. ACM, 68(1):7:1–7:28, 2021. doi:10.1145/3422822.
  • [14] Gérard Cécé and Alain Finkel. Verification of programs with half-duplex communication. Information and Computation, 202(2):166–190, November 2005. doi:10.1016/j.ic.2005.05.006.
  • [15] Pierre-Malo Deniélou and Nobuko Yoshida. Multiparty Compatibility in Communicating Automata: Characterisation and Synthesis of Global Session Types. In Fedor V. Fomin, Rūsiņš Freivalds, Marta Kwiatkowska, and David Peleg, editors, Automata, Languages, and Programming, Lecture Notes in Computer Science, pages 174–186, Berlin, Heidelberg, 2013. Springer. doi:10.1007/978-3-642-39212-2_18.
  • [16] Loïc Desgeorges and Loïc Germerie Guizouarn. RSC to the ReSCu: Automated Verification of Systems of Communicating Automata. In Sung-Shik Jongmans and Antónia Lopes, editors, Coordination Models and Languages, Lecture Notes in Computer Science, pages 135–143, Cham, 2023. Springer Nature Switzerland. doi:10.1007/978-3-031-35361-1_7.
  • [17] Cinzia Di Giusto, Loïc Germerie Guizouarn, and Etienne Lozes. Multiparty half-duplex systems and synchronous communications. Journal of Logical and Algebraic Methods in Programming, 131:100843, February 2023. doi:10.1016/j.jlamp.2022.100843.
  • [18] Cinzia Di Giusto, Loïc Germerie Guizouarn, and Etienne Lozes. Towards Generalised Half-Duplex Systems. Electron. Proc. Theor. Comput. Sci., 347:22–37, October 2021. arXiv:2110.00145 [cs]. doi:10.4204/EPTCS.347.2.
  • [19] Cinzia Di Giusto, Laetitia Laversa, and Etienne Lozes. On the k-synchronizability of Systems. In Jean Goubault-Larrecq and Barbara König, editors, Foundations of Software Science and Computation Structures - 23rd International Conference, FOSSACS 2020, Proceedings, Lecture Notes in Computer Science, pages 157–176, Cham, 2020. Springer International Publishing. doi:10.1007/978-3-030-45231-5_9.
  • [20] Alan Fekete, Nancy Lynch, Yishay Mansour, and John Spinelli. The impossibility of implementing reliable communication in the face of crashes. J. ACM, 40(5):1087–1107, November 1993. doi:10.1145/174147.169676.
  • [21] Alain Finkel. Decidability of the termination problem for completely specified protocols. Distrib Comput, 7(3):129–135, March 1994. doi:10.1007/BF02277857.
  • [22] Simon Fowler, Sam Lindley, J. Garrett Morris, and Sára Decova. Exceptional Asynchronous Session Types: Session Types without Tiers. Proc. ACM Program. Lang., 3(POPL):28:1–28:29, 2019. doi:10.1145/3290341.
  • [23] Blaise Genest, Dietrich Kuske, and Anca Muscholl. On Communicating Automata with Bounded Channels. Fundamenta Informaticae, 80(1-3):147–167, January 2007. Publisher: IOS Press. URL: https://content.iospress.com/articles/fundamenta-informaticae/fi80-1-3-09.
  • [24] Blaise Genest, Anca Muscholl, and Dietrich Kuske. A Kleene Theorem for a Class of Communicating Automata with Effective Algorithms. In Cristian Calude, Elena Calude, and Michael J. Dinneen, editors, Developments in Language Theory, 8th International Conference, DLT 2004, Auckland, New Zealand, December 13-17, 2004, Proceedings, volume 3340 of Lecture Notes in Computer Science, pages 30–48. Springer, 2004. doi:10.1007/978-3-540-30550-7_4.
  • [25] Matthias Güdemann, Gwen Salaün, and Meriem Ouederni. Counterexample guided synthesis of monitors for realizability enforcement. In ATVA 2012, pages 238–253, 2012. doi:10.1007/978-3-642-33386-6_20.
  • [26] Sylvain Hallé and Tevfik Bultan. Realizability analysis for message-based interactions using shared-state projections. In SIGSOFT 2010, pages 27–36, 2010. doi:10.1145/1882291.1882298.
  • [27] Kohei Honda, Vasco T. Vasconcelos, and Makoto Kubo. Language primitives and type disciplines for structured communication-based programming. In Proceedings of ESOP 1998, volume 1381 of Lecture Notes in Computer Science, pages 22–138. Springer, 1998.
  • [28] Kohei Honda, Nobuko Yoshida, and Marco Carbone. Multiparty asynchronous session types. In Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’08, pages 273–284, New York, NY, USA, January 2008. Association for Computing Machinery. doi:10.1145/1328438.1328472.
  • [29] Ping Hou, Nicolas Lagaillardie, and Nobuko Yoshida. Fearless Asynchronous Communications with Timed Multiparty Session Protocols. In ECOOP 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ECOOP.2024.19.
  • [30] Keigo Imai, Julien Lange, and Rumyana Neykova. Kmclib: Automated inference and verification of session types from OCaml programs. In Dana Fisman and Grigore Rosu, editors, Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I, volume 13243 of Lecture Notes in Computer Science, pages 379–386. Springer, 2022. doi:10.1007/978-3-030-99524-9_20.
  • [31] Shokoufeh Kazemlou and Borzoo Bonakdarpour. Crash-Resilient Decentralized Synchronous Runtime Verification. In 2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS), pages 207–212, October 2018. ISSN: 2575-8462. doi:10.1109/SRDS.2018.00032.
  • [32] Nicolas Lagaillardie, Rumyana Neykova, and Nobuko Yoshida. Stay Safe under Panic: Affine Rust Programming with Multiparty Session Types. In 36th European Conference on Object-Oriented Programming, volume 222 of LIPIcs, pages 4:1–4:29. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.ECOOP.2022.4.
  • [33] Leslie Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16(2):133–169, 1998. doi:10.1145/279227.279229.
  • [34] Julien Lange, Emilio Tuosto, and Nobuko Yoshida. From communicating machines to graphical choreographies. In POPL 2015, pages 221–232, 2015. doi:10.1145/2676726.2676964.
  • [35] Julien Lange and Nobuko Yoshida. Verifying Asynchronous Interactions via Communicating Session Automata. In Isil Dillig and Serdar Tasiran, editors, Computer Aided Verification, Lecture Notes in Computer Science, pages 97–117, Cham, 2019. Springer International Publishing. doi:10.1007/978-3-030-25540-4_6.
  • [36] Markus Lohrey and Anca Muscholl. Bounded MSC Communication. In Mogens Nielsen and Uffe Engberg, editors, Foundations of Software Science and Computation Structures, 5th International Conference, FOSSACS 2002. Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2002 Grenoble, France, April 8-12, 2002, Proceedings, volume 2303 of Lecture Notes in Computer Science, pages 295–309. Springer, 2002. doi:10.1007/3-540-45931-6_21.
  • [37] Etienne Lozes and Jules Villard. Reliable Contracts for Unreliable Half-Duplex Communications. In Marco Carbone and Jean-Marc Petit, editors, Web Services and Formal Methods, Lecture Notes in Computer Science, pages 2–16, Berlin, Heidelberg, 2012. Springer. doi:10.1007/978-3-642-29834-9_2.
  • [38] Dimitris Mostrous and Vasco Thudichum Vasconcelos. Affine sessions. In International Conference on Coordination Languages and Models, pages 115–130, Berlin, Heidelberg, 2014. Springer. doi:10.1007/978-3-662-43376-8_8.
  • [39] OMG. Business Process Model and Notation, 2018. https://www.omg.org/spec/BPMN/2.0/.
  • [40] Roly Perera, Julien Lange, and Simon J. Gay. Multiparty compatibility for concurrent objects. In PLACES 2016, pages 73–82, 2016. doi:10.4204/EPTCS.211.8.
  • [41] Introduction to protocol engineering. Available at http://cs.uccs.edu/˜cs522/pe/pe.htm, 2006.
  • [42] Gwen Salaün, Lucas Bordeaux, and Marco Schaerf. Describing and reasoning on web services using process algebra. IJBPIM, 1(2):116–128, 2006. doi:10.1504/IJBPIM.2006.010025.
  • [43] Alceste Scalas and Nobuko Yoshida. Less is more: multiparty session types revisited. Proc. ACM Program. Lang., 3(POPL):30:1–30:29, January 2019. doi:10.1145/3290343.
  • [44] Philippe Schnoebelen. Verifying lossy channel systems has nonprimitive recursive complexity. Inf. Process. Lett., 83(5):251–261, 2002. doi:10.1016/S0020-0190(01)00337-4.
  • [45] Amrita Suresh and Nobuko Yoshida. The full version of this paper, 2025.
  • [46] Kaku Takeuchi, Kohei Honda, and Makoto Kubo. An Interaction-based Language and its Typing System. In PARLE’94, volume 817 of Lecture Notes in Computer Science, pages 398–413, 1994. doi:10.1007/3-540-58184-7_118.
  • [47] Martin Vassor and Nobuko Yoshida. Refinements for multiparty message-passing protocols: Specification-agnostic theory and implementation. In Jonathan Aldrich and Guido Salvaneschi, editors, 38th European Conference on Object-Oriented Programming, ECOOP 2024, September 16-20, 2024, Vienna, Austria, volume 313 of LIPIcs, pages 41:1–41:29. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ECOOP.2024.41.
  • [48] Malte Viering, Raymond Hu, Patrick Eugster, and Lukasz Ziarek. A Multiparty Session Typing Discipline for Fault-Tolerant Event-Driven Distributed Programming. Proc. ACM Program. Lang., 5(OOPSLA), October 2021. doi:10.1145/3485501.
  • [49] Daniel M. Yellin and Robert E. Strom. Protocol specifications and component adaptors. ACM Trans. Program. Lang. Syst., 19(2):292–333, 1997. doi:10.1145/244795.244801.

Appendix A Proofs from §3

Lemma 14. [Restated, see original statement.]

An execution (e,ν) is causally equivalent to an i-rsc execution iff the associated conflict graph 𝖼𝗀𝗋𝖺𝗉𝗁(e,ν) is acyclic.

Proof.

The left to right implication follows from two observations: first, two causally equivalent executions have isomorphic conflict graphs. Secondly, the conflict graph of an RSC execution is acyclic, because for an RSC execution and vertices χ1,χ2 in the conflict graph, χ1e,νχ2 if there is j1χ1 and j2χ2 such that j1e,νj2. Moreover, if there is more than action in either χ1 or χ2, for i-rsc executions by definition, min(χ1)e,νmin(χ2). Therefore, if there is a cycle in the conflict graph, then this would imply min(χ2)e,νmin(χ1), which would be a contradiction.

For the converse direction, let us assume that a conflict graph associated to e=a1a2an is acyclic. Let us consider the associated communication set ν. Let χ1χn be a topological order on ν. Let e=χ1χn be the corresponding RSC execution, and ν the communication set associated to e that is RSC.

Let σ be the permutation such that e=aσ(1)aσ(n). Following the proof idea in [17], we show that e is causally equivalent to e. Let j,j be two indices of e, and let us show that jj iff σ(j)σ(j).

We have that {j,j} is a matching pair in e, iff, by construction, {σ(j),σ(j)} is a matching pair in e. If {j,j} is not a matching pair of e, then let χ and χ be the interactions containing j and j respectively. Since jj, there is an arrow between χ and χ in the conflict graph, and moreover aj and aj cannot commute. Note that there is an arrow in the conflict graph of e as well. Since the conflict graph is acyclic, we have σ(j)σ(j).

Lemma 16. [Restated, see original statement.]

𝒮 is i-rsc if and only if for all e𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) and νComm(e), (e,ν) is not a borderline violation.

Proof.

By definition, if there exists an execution (e,ν) in system 𝒮 such that it is a borderline violation, then 𝒮 is not i-rsc. Conversely, if 𝒮 is not i-rsc, let (e,ν) be (one of) the shortest execution that is not causally equivalent to an i-rsc execution. Then, e=ea such that for all νComm(e), we have (e,ν) is equivalent to an i-rsc execution. Let ν′′ be the communication set of e such that it is a subset of the communication set ν. Let (e′′,ν′′) be the i-rsc execution that is causally equivalent to (e,ν′′). Then, there exists an execution e^ such that (e^,ν) is an execution of 𝒮. Moreover, if a is a send action, then (e^,ν) is i-rsc which is a contradiction. Therefore, (e,ν) is a borderline violation.

Lemma 17. [Restated, see original statement.]

Let 𝒮 with 𝗉𝗋𝗈𝖽𝗎𝖼𝗍(𝒮)=(Q,Σ,𝐶ℎ,𝖠𝖼𝗍,δ,qo). There is a non-deterministic finite state automaton 𝒜𝑏𝑣 computable in time 𝒪(|𝐶ℎ|3|Σ|2) such that (𝒜𝑏𝑣)={e𝖠𝖼𝗍nr.𝖠𝖼𝗍?νComm(e) such that (e,ν) is a borderline violation}.

Proof.

Let 𝒜𝑏𝑣=(Q𝑏𝑣,δ𝑏𝑣,q0,𝑏𝑣,{qf}), with Q𝑏𝑣={q0,𝑏𝑣,qf}(𝐶ℎ×𝖠𝖼𝗍×{0,1}), and for all a,a𝖠𝖼𝗍nr, for all c𝐶ℎ, m,mΣ:

  1. 1.

    (q0,𝑏𝑣,a,q0,𝑏𝑣): this is a loop on the initial state that reads all interactions until the chosen send message

  2. 2.

    (q0,𝑏𝑣,𝗉𝗊!m,(𝗉𝗊,𝗉𝗊!m,0)): this is the transition where we non-deterministically select the send message that is matched to the final reception to be borderline.

  3. 3.

    In case we do not consider out-of-order errors, we add the following step: ((𝖼,a,0),a,(𝖼,a,0)), if 𝖼𝗁(a)𝖼: again loop for every communication but we do not accept any further communication on the channel 𝖼 in order to stay borderline. Note that this step is skipped if we consider the general case with out-of-order errors as we can have matched pairs between a matched send and receive action.

  4. 4.

    ((𝖼,a,0),a,(𝖼,a,1)), if 𝗉𝗋𝗈𝖼(a)𝗉𝗋𝗈𝖼(a): here, the second interaction that will take part in the conflict graph cycle is guessed. We ensure there is a process in common with a for there to be an edge between them.

  5. 5.

    ((𝖼,a,1),a,(𝖼,a,1)), if 𝖼𝗁(a)𝖼: once again a loop for every interaction.

  6. 6.

    ((𝖼,a,1),a,(𝖼,a,1)), if 𝗉𝗋𝗈𝖼(a)𝗉𝗋𝗈𝖼(a): the next vertex (or vertices) (if any) of the conflict graph is guessed.

  7. 7.

    ((𝖼,a,1),𝗉𝗊?m,qf), if 𝗉𝗋𝗈𝖼(a)𝗉𝗋𝗈𝖼(𝗉𝗊?m): finally, an execution is accepted if it closes the cycle.

Moreover, each transition of 𝒜𝑏𝑣 can be constructed in constant time, so 𝒜𝑏𝑣 can be constructed in time 𝒪(|𝐶ℎ|3|Σ|2).

Lemma 18. [Restated, see original statement.]

Let 𝒮 be a FIFO system. There exists a non-deterministic finite state automaton 𝒜𝑟𝑠𝑐 over 𝖠𝖼𝗍nr𝖠𝖼𝗍? such that (𝒜𝑟𝑠𝑐)={e𝗉𝗊?m𝖠𝖼𝗍nr.𝖠𝖼𝗍?e𝗉𝗊?m𝖾𝗑𝖾𝖼𝗎𝗍𝗂𝗈𝗇𝗌(𝒮) and νComm(e) such that (e,ν) is an i-rsc execution}, which can be constructed in time 𝒪(n||+2|𝐶ℎ|2×2|𝐶ℎ|), where n is the size of 𝒮.

Proof.

Let 𝒜𝑟𝑠𝑐=(Qrsc,δrsc,q0,rsc,{qf}) be the non-deterministic automata, with Qrsc=Q×({ε}𝐶ℎ)×2𝐶ℎ{qf}. We define the transitions as follows:

  1. 1.

    First, while performing the action a𝖠𝖼𝗍nr, (q,χ,S)𝑎(q,χ,S) if

    • (q,v)(q,v) in the underlying transition system, for some buffer values v,v and for all 𝖼𝐶ℎ, vc iff 𝖼S and v𝖼 iff 𝖼S, and

    • this condition is added in the absence of out-of-order errors: if a = 𝗉𝗊!?m, then 𝗉𝗊S, and

    • either χ=χ, or a=𝖼!m and χ=𝖼

  2. 2.

    Second, while performing the action a=𝗉𝗊?m, we have (q,χ,S)𝑎qf if χ=𝗉𝗊 and (q,a,q)δ𝒮 for some q.

Each transition of 𝒜𝑟𝑠𝑐 can be constructed in constant time. An upper bound on the number of transitions can be computed as follows: if (q,χ,S)𝑎(q,χ,S) is a transition, then q and q only differ on at most two machines (the one that executed the send, and the one that executed the receive), so there are at most n2 different possibilities for q once q and a are fixed. There are at most two possibilities for χ once χ and a are fixed, and S is fully determined by S and a. Finally, there are n||(1+|𝐶ℎ|)×2|𝐶ℎ|×2×|𝐶ℎ| possibilities for a choice of the pair ((q,χ,S),a).

Theorem 19. [Restated, see original statement.]

Given a system 𝒮 of size n, deciding whether it is an i-rsc system can be done in time 𝒪(n||+2|𝐶ℎ|5×2|𝐶ℎ|×|Σ|2).

Proof.

The set of borderline violations of a system 𝒮 can be expressed as (𝒜𝑟𝑠𝑐)𝖠𝖼𝗍nr(𝒜𝑏𝑣). Therefore, checking for inclusion in i-rsc reduces to checking the emptiness of this intersection, which can be done in time 𝒪(n).