Partial-Order Reduction Is Hard
Abstract
The goal of partial-order methods is to accelerate the exploration of concurrent systems by examining only a representative subset of all possible runs. The stateful approach builds a transition system with representative runs, while the stateless method simply enumerates them. The stateless approach may be preferable if the transition system is tree-like; otherwise, the stateful method is more effective.
In the last decade, optimality has been a guiding principle for developing stateless partial-order reduction algorithms, and it has undoubtedly contributed to significant progress in the field. In this paper we ask whether a similar principle can exist for the stateful approach. We show that in stateful exploration, a partial-order algorithm that is polynomially close to optimal cannot exist unless P=NP. The result holds even for acyclic programs with just await instructions. This lower bound justifies a systematic study of heuristics for stateful partial-order reduction. We propose the notion of an IFS oracle as a useful abstraction. The oracle yields a very simple optimal stateless algorithm, which can then be adapted to a non-optimal stateful algorithm. While the oracle problem is NP-hard in general, we exhibit a simple case where it can be solved in linear time.
Keywords and phrases:
Formal verification, Concurrent systems, Partial-order reduction, Complexity
Funding:
Frédéric Herbreteau: Supported by the French government in the framework of the France 2030 programme IdEx université de Bordeaux / RRI ROBSYS.
2012 ACM Subject Classification:
Theory of computation → Logic and verification
Editors:
Patricia Bouyer and Jaco van de Pol
Series and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The goal of partial-order methods is to speed up explicit-state exploration of concurrent systems. The state space of such systems grows exponentially with the number of processes. Fortunately, many runs of a concurrent system can usually be considered equivalent, so it is enough to explore only one run in each equivalence class. For example, if one process writes to one variable and another process writes to a different variable, then the order of execution of these two operations is usually irrelevant; the two interleavings are equivalent, and it is enough to explore only one of the two. This reduces the number of visited states as well as the exploration time. In some cases, the reductions are very substantial.
In recent years, we have seen novel applications of partial-order methods. One is proving the properties of concurrent programs [10], where equivalence between runs is not only used to reduce the number of proof objectives, but also to simplify proof objectives by choosing particular linearizations. Similarly, in symbolic executions [9] or in testing [28], partial-order can be used to limit the exploration space while still being exhaustive. Another application is verification of timed systems using recently developed local-time zones [15]. All these applications rely on explicit state enumeration, as opposed to symbolic methods such as SAT or BDDs.
The first partial-order methods were proposed about 35 years ago under the names of stubborn sets, persistent sets, or ample sets [34, 14, 20]. In 2005, Flanagan and Godefroid introduced stateless dynamic partial-order reductions [12], making stateless methods a focal point of subsequent research. In 2014, Abdulla et al. proposed a notion of trace-optimality [1] (we use the term trace-optimality instead of simply optimality, as in op. cit., to differentiate it from the state-optimality we consider in this paper), and a race reversal technique, initiating a new cycle of work on stateless partial-order methods [6, 5, 18, 22, 23, 3, 21, 24, 4].
In this paper, we focus on stateful partial-order reduction methods, which have seen relatively little progress over the last two decades [29, 37, 8, 7]. A partial-order algorithm produces a reduced transition system containing a representative for each equivalence class of runs of the system. A stateless approach produces a tree of runs, but saves memory by storing only one run at a time. In contrast, a stateful approach keeps all visited states in memory so that an exploration can be stopped if a state is revisited. Each of the two approaches has its advantages. When the objective is to verify existing code by running an instrumentation of it, the states are too complex to keep in memory, so stateless exploration is the only solution. When verifying pseudocode, say of a mutual exclusion protocol, the number of non-equivalent runs may be several orders of magnitude bigger than the number of states, rendering the stateless approach infeasible. The stateful approach is then necessary.
Our first question is whether we can find a guiding principle for the stateful approach similar to the trace-optimality for the stateless approach. The example in Figure 1 shows why we cannot just settle on trace optimality. The two transition systems in the figure are both trace-optimal, since no two maximal paths are equivalent. But one is of linear size and the other is of exponential size. The obvious parameter for optimality of the stateful approach is the size of the transition system. So our question is whether a state-optimal, or close to optimal, partial-order algorithm can exist.
It is not difficult to see that getting a reduced system of exactly the minimal size is NP-hard. Our main result says something much stronger: even approximating the minimal size within a polynomial factor remains NP-hard (Theorem 5). This holds even for acyclic concurrent programs using only await and write instructions. Arguably, this looks like a minimal sensible set of instructions. If we allowed cycles then we could simulate await with active wait, reducing the set of instructions to just reads and writes.
This negative result justifies starting a systematic study of heuristics for stateful partial-order reduction. We propose an approach based on a notion of IFS oracle. This oracle expresses an important sub-problem encountered in partial-order reduction algorithms, namely avoiding so called sleep-blocked executions. Our proposal is to concentrate on the heuristics for this oracle to improve the efficiency of stateful methods. We justify this approach by showing that in the stateless setting, the IFS oracle gives a very simple trace-optimal algorithm. We show how to adapt this algorithm to the stateful approach. Finally, we narrow our study down to a special case, generalizing some settings considered in the stateless literature [5, 22], where the IFS oracle can be computed in linear time.
Before starting, we would like to briefly discuss the models of concurrent systems we use. In recent years it has become common to consider a model of processes with shared variables, possibly extended with lock instructions. This is the model we use for our lower bound. However, we also consider a stronger model of concurrent systems synchronizing on shared actions. This model can encode variables, as well as most of the synchronization mechanisms used in practice. For this reason we prefer this model for our positive results. Moreover, since the model is strong, our lower bound result is easier to prove for it, so it serves as a good intermediate step in our proof.
In this work we assume that programs are acyclic. Dealing with cycles in partial-order reduction is a separate problem from the reduction itself, and complicates the algorithms substantially. Finding a clean way of dealing with cycles in partial-order reduction is beyond the scope of this paper.
To summarize, our three principal contributions are:
-
A strong lower bound showing that, assuming P ≠ NP, no deterministic algorithm can construct a reduced transition system for a concurrent program of size polynomial in the minimal size of a reduced system for that program, while working in time polynomial in the size of its input and its output (Theorem 5). This holds even if every process in the program is acyclic and uses only await and write instructions.
-
A notion of IFS oracle and a simple trace-optimal stateless algorithm using this oracle. We adapt this algorithm to the stateful approach.
-
A special subcase of linear fully non-blocking systems, where the IFS oracle can be computed in linear time. This case covers some situations considered in the stateless partial-order reduction literature.
1.1 Related Work
The literature on partial-order methods has expanded rapidly in the last decade. In this brief discussion, we focus only on results that are closely related to our work, primarily citing more recent papers.
Partial-order methods were introduced around 1990 [34, 13, 20]. Applications requiring stateful partial-order methods [11] are still based on the same basic principles. Another significant concept for us here is the use of lexicographic orderings to identify representative runs [36, 19, 35]. We also highlight a Petri net unfolding-based approach [32, 8] that distinguishes itself by using prime event structures instead of transition systems.
There are already two NP-hardness results in the literature on partial-order methods. The first is the NP-hardness of computing optimal ample sets [30]. The second is the NP-hardness of computing so-called alternatives in the context of unfolding-based methods [8]. These results can be compared to our observation (Proposition 22) that the IFS problem is NP-complete. Our main negative result is much stronger. It does not address a particular method of doing partial-order reduction, but shows that any method must be NP-hard, even when instead of optimality we just want to be polynomially close to optimal.
Stateless partial-order reduction, initiated by [12], gained momentum thanks to [1]. This approach has been the focus of extensive research in recent years, culminating in truly stateless algorithms using only polynomial-sized memory [22, 3]. Extensions of stateless techniques to systems admitting blocking, like some synchronization mechanisms, have been proposed very recently [18, 23]. The await instructions we use in our lower bound are blocking. We consider non-blocking systems in our positive result in Proposition 24.
The race reversal technique introduced in [1] has been partially adapted for use in stateful methods [37, 7]. Apart from that, the stateful methods still rely on the original stubborn/persistent set approach. The unfolding-based partial-order methods have seen developments like the k-partial alternatives [8]. Although the IFS oracle is built from standard concepts and problems in partial-order methods, we hope this notion will help to advance the stateful approach.
2 Modeling concurrent systems
For our lower bounds we consider a very simple and standard model of concurrent programs with variables and locks. Variables range over finite domains, and locks are binary. The only operations on variables are write and await. The latter operation blocks until the value of a variable is in a specified set of values. We do not need a read operation in our constructions, so we omit it to simplify the presentation.
A process is a directed acyclic graph with edges labeled by instructions. The instructions come from the set:

  x := v,   await(x ∈ S),   acquire(t),   release(t)

Here, x := v sets the value of variable x to v, await(x ∈ S) blocks until the value of x is in the set of values S, acquire(t) acquires lock t, and release(t) releases it. In the case of acyclic processes, it is natural to have the blocking await instruction. For example, all classical mutual exclusion algorithms rely on it. If we had cycles, await could be simulated with an active wait.
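As an illustration, here is a minimal Python sketch of how these four instructions act on a shared state. The class names and the state representation are ours, chosen for the example; they are not notation from the paper.

```python
from dataclasses import dataclass

# Illustrative encodings of the four instructions above.
@dataclass(frozen=True)
class Write:        # x := v
    var: str
    val: int

@dataclass(frozen=True)
class Await:        # await(x in S): blocks until the value of x is in S
    var: str
    vals: frozenset

@dataclass(frozen=True)
class Acquire:      # acquire(t)
    lock: str

@dataclass(frozen=True)
class Release:      # release(t)
    lock: str

def enabled(instr, store, locks):
    """Can this instruction fire in the given shared state?"""
    if isinstance(instr, Write):
        return True                       # writes never block
    if isinstance(instr, Await):
        return store[instr.var] in instr.vals
    if isinstance(instr, Acquire):
        return not locks[instr.lock]      # a binary lock must be free
    return True                           # release never blocks

def execute(instr, store, locks):
    """Apply an enabled instruction (mutates the shared state)."""
    if isinstance(instr, Write):
        store[instr.var] = instr.val
    elif isinstance(instr, Acquire):
        locks[instr.lock] = True
    elif isinstance(instr, Release):
        locks[instr.lock] = False
    # an enabled Await has no effect on the state
```

Note that only await and acquire can block, which is what makes them the interesting instructions for the reductions in this paper.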
A concurrent program is a finite set of processes over finite sets of variables and locks. Each variable has a finite domain. All variables and locks are global, meaning that they are shared by all processes. A conditional instruction is modeled by branching in the processes. The semantics of a program P is given by a transition system TS(P). Its states consist of a local state for each process, and a value for each variable and lock. In the initial state, each process is in its initial local state, and each variable and lock has value 0. The transitions in TS(P) are defined according to the usual semantics of the instructions above. Observe that TS(P) is acyclic since each process is acyclic.
We will also consider a much more flexible and expressive model, that we call concurrent system. It consists of a collection of processes synchronizing on common actions. Synchronization on actions can directly encode variables, locks, and many other synchronization primitives. In the context of this paper it will be easier to show the lower bound in this model than in the model with variables and locks. But, since the model is strong the lower bound result for this model is relatively weak. This is why we will eventually refine the lower bound construction to concurrent programs without locks.
A concurrent system is determined by an alphabet of abstract actions, and a set of processes that are graphs with edges labeled by actions. A process must be action-deterministic: for every state s and action a there is at most one state s′ with an a-transition from s to s′. The term process is used both in the case of concurrent systems and of concurrent programs, but it will always be clear from the context which model we are using.
Every action a has its domain, dom(a), that is the set of processes using it. We write s →a if there is an outgoing a-transition from state s of a process p. For a sequence of actions u we write s →u s′ if there is a path labeled u from s to s′ in process p, and s →u if such a path exists from s.
The semantics of a concurrent system is a transition system whose states are tuples of states of the process transition systems; the initial state is the tuple consisting of the initial states of each process. Every action synchronizes the processes involved in it: there is an a-transition from a global state s to s′ if s_p →a s′_p for every process p ∈ dom(a), and s′_p = s_p for every p ∉ dom(a). We write en(s) for the set of actions labeling transitions outgoing from the global state s.
Finally, we impose an acyclicity condition: for every action , at least one of the processes in must be acyclic. This guarantees that the global transition system is acyclic.
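To make the product semantics concrete, here is a small Python sketch of a synchronized step, under an illustrative encoding of processes as transition maps; the encoding and names are ours, not the paper's.

```python
# A process p is encoded as a dict delta[p] mapping (local_state, action) to
# the unique successor state (action determinism). dom[a] is the set of
# process indices whose alphabet contains a.

def global_enabled(state, delta, dom, action):
    """An action is enabled iff every process in its domain can take it."""
    return all((state[p], action) in delta[p] for p in dom[action])

def global_step(state, delta, dom, action):
    """Synchronize: every process in dom(action) moves, the others stay put."""
    return tuple(
        delta[p][(state[p], action)] if p in dom[action] else state[p]
        for p in range(len(state))
    )
```

For instance, with two processes that each do a local action and then synchronize on a common action c, c only becomes enabled once both processes have reached the state offering it.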
We will need to talk about runs in a transition system, be it for a concurrent program or for a concurrent system. A run is a path in the transition system, not necessarily from the initial state. We write s →u s′ if there is a run labeled with a sequence of actions u from s to s′. Sometimes we write just s →u when the target state is not relevant. A maximal run is a run reaching a terminal state – a state with no outgoing transitions. A full run is a maximal run starting in the initial state.
Figure 2 on the left shows Peterson’s mutual exclusion algorithm for two processes. The middle-left picture shows its modeling as a concurrent program. It has three variables – the two flags and the turn variable – that range over {0, 1} as in the pseudocode. The middle-right picture shows the same algorithm modeled as a concurrent system. Two processes implement the code, while the other processes model the three variables. A write action sets the value of a variable, and a test action checks whether the variable has a given value; observe that a test action is enabled in the variable state holding the tested value, but not in the other state. Processes synchronize on common actions: each write or test action is shared between a code process and the corresponding variable process. The transition system is shown on the right. It represents both the program and the system, although we have chosen to show the action labels of the latter. Each full run corresponds to an execution of Peterson’s algorithm.
3 Partial-order reduction
The initial motivation for partial-order reduction comes from the observation that some sequences of actions may be considered equivalent, and only one of the equivalent sequences needs to be explored. The equivalence relation on sequences is defined in terms of an independence relation on actions: two sequences are por-equivalent if one can be obtained from the other by repeatedly permuting adjacent independent actions. We write u ∼ w to denote that sequences u and w are por-equivalent. As we will see, our arguments do not rely on a particular independence relation, as long as two actions using different processes and different resources are independent.
For concurrent programs with variables, the classical independence relation says that two actions of different processes are independent if either they use different variables, or they use the same variable and both of them are awaits. In particular, operations on the same lock are dependent.
For concurrent systems, the most common independence relation is the one coming from Mazurkiewicz trace theory [26]. Two actions a and b are independent if they have disjoint domains: dom(a) ∩ dom(b) = ∅.

It will also be useful to use the dual concept of dependent actions. Two actions are dependent, written a D b, if they are not independent; for example, when dom(a) ∩ dom(b) ≠ ∅ in the case of Mazurkiewicz independence. These notions are extended to sequences: an action is independent of a sequence u if it is independent of all actions in u, and dependent on u if it is dependent on some action from u. We write D(u) for the set of actions dependent on u.
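For Mazurkiewicz independence, por-equivalence of two sequences admits a classical characterization: the sequences must agree on their projections onto every pair of mutually dependent actions. A small Python sketch of this check follows; the encoding (sequences as strings, domains as sets of process indices) is ours.

```python
def projections_equal(u, w, dep):
    """Trace equivalence test: u ~ w iff u and w have the same projection
    onto every pair of dependent actions. `dep(a, b)` must be reflexive
    and symmetric (every action depends on itself)."""
    letters = set(u) | set(w)
    for a in letters:
        for b in letters:
            if dep(a, b):
                pu = [c for c in u if c in (a, b)]
                pw = [c for c in w if c in (a, b)]
                if pu != pw:
                    return False
    return True

def dep_from_dom(dom):
    """Dependence induced by domains: overlapping domains => dependent."""
    return lambda a, b: bool(dom[a] & dom[b])
```

Since dependence is reflexive, comparing the projection onto the pair (a, a) also checks that both sequences contain the same number of occurrences of a.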
The goal of partial-order reduction is to construct, for a given concurrent system or concurrent program, a reduced transition system representing all full runs of the original transition system (recall that these are maximal runs from the initial state).
Definition 1.
We say that a transition system R is a reduced transition system for TS if it is:
-
sound: every full run of R is a full run of TS, and
-
complete: for each full run u of TS there is a full run w of R such that u ∼ w.
A general approach for constructing a reduced transition system is to determine, for each state, a covering source set [1]: a subset of the enabled actions that it is sufficient to explore. For example, if every sequence of actions starting from a state s is equivalent to a sequence starting with some fixed action a, then we may choose to include only a in the source set at s. The notion of the first actions of a sequence, modulo an equivalence relation on sequences, is central to partial-order reduction.

We want a source set at a state to be large enough to contain a first action of at least one representative from every equivalence class of maximal runs from that state. This can be formulated as follows: a source set at s should intersect the set of first actions of runs equivalent to u, for every maximal run u from s. We formalize this below.
Definition 2.
For a state s of TS we define FirstSets(s) as the set of first sets of all maximal runs from s:

  FirstSets(s) = { first∼(u) : u a maximal run from s },  where first∼(u) = { a : u ∼ a·u′ for some sequence u′ }.

A set of actions B is a covering source set at s if B ∩ F ≠ ∅ for every F ∈ FirstSets(s).
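For a small acyclic transition system, first sets and the covering condition can be checked by brute-force enumeration of maximal runs. The following Python sketch uses our own encoding of a transition system as a map from states to lists of (action, successor) pairs.

```python
def maximal_runs(ts, s):
    """All maximal runs (as action sequences) from s in an acyclic
    transition system ts: dict state -> list of (action, successor)."""
    if not ts.get(s):
        return [[]]
    runs = []
    for a, t in ts[s]:
        runs += [[a] + r for r in maximal_runs(ts, t)]
    return runs

def first_set(run, indep):
    """First actions of a run modulo equivalence: an action is first iff it
    occurs before any action dependent on it (dependence is reflexive)."""
    firsts, seen = set(), []
    for a in run:
        if all(indep(a, b) for b in seen):
            firsts.add(a)
        seen.append(a)
    return firsts

def is_covering(source, ts, s, indep):
    """source is a covering source set at s iff it meets the first set of
    every maximal run from s."""
    return all(source & first_set(r, indep) for r in maximal_runs(ts, s))
```

On the diamond formed by two independent actions a and b, either singleton {a} or {b} is covering; if a and b are dependent, only a set containing both is.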
In particular, the set of all actions enabled at s is a covering source set at s. Intuitively, smaller covering source sets should give smaller reduced transition systems. This is not always true: a bigger covering source set, incomparable with respect to set inclusion, may sometimes give a better reduction.
Observe that the notion of a covering source set depends on the por-equivalence relation ∼, as the definition of the first sets depends on it.
Suppose we have an assignment of a set of actions source(s) to every state s of TS. We can use it to restrict the transition relation to the transitions allowed by the source sets: define s ⇒a s′ when s →a s′ and a ∈ source(s). If every source(s) is a covering source set at s, then this restricted transition relation is sufficient.
Proposition 3.
Let TS be a finite acyclic transition system with transition relation →, and let ⇒ be the restricted transition relation derived from a covering source set assignment. For every state s of TS and every maximal run from s labeled u, there is a run from s in the restricted relation labeled w with u ∼ w.
This proposition allows us to construct a reduced transition system by keeping only the transitions allowed by the source sets, and the states reachable from the initial state through them. We must underline, though, that source sets are not the only mechanism needed in partial-order reduction. Sleep sets are another important ingredient (cf. Remark 21).
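The construction described in this paragraph can be sketched as follows, again with a transition system encoded as a map from states to lists of (action, successor) pairs (our encoding, not the paper's):

```python
def reduce_ts(ts, init, source):
    """Keep only the transitions allowed by the source-set assignment, and
    only the states reachable from `init` through them."""
    reduced, stack, seen = {}, [init], {init}
    while stack:
        s = stack.pop()
        # restrict the outgoing transitions of s to its source set
        reduced[s] = [(a, t) for a, t in ts.get(s, []) if a in source(s)]
        for _, t in reduced[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return reduced
```

On the diamond of two independent actions, choosing the singleton source set {a} at the initial state prunes one of the two equivalent interleavings and the state reachable only through it.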
4 Partial-order reduction is NP-hard
Recall that the goal of partial-order reduction is to construct, for a given program P or a given system S, a reduced system that is sound and complete. Among such systems we would ideally like to construct one with the smallest number of states. We write minsize(P) for the smallest number of states of a sound and complete reduced transition system for P; similarly, we write minsize(S) for a system S. There may be several non-isomorphic transition systems with a minimal number of states.
Stateless algorithms use the concept of trace-optimality. This means that there are no two equivalent runs in the reduced transition system. As we have noted in the introduction, this concept is not very useful for our purposes. Indeed, the example from Figure 1 shows two trace-optimal transition systems, one of linear size and the other of exponential size. So in stateful partial-order reduction we should aim for a reduced transition system of small size. Even more so as there are also examples where the smallest reduced transition system is necessarily not trace-optimal.
In this section, we show that there is no polynomial-time algorithm for constructing a reduced transition system that is polynomially close to optimal. The next definition makes this precise.
Definition 4.
We say that an algorithm A is an excellent POR algorithm if there are polynomials p and q such that, given a concurrent program P, the algorithm constructs a sound and complete transition system for P of size bounded by p(minsize(P)), in time q(|P| + p(minsize(P))).
We use the same definition for concurrent systems S.
The main result of this section is
Theorem 5.
If P ≠ NP then there is no excellent POR algorithm, even for concurrent programs using only write and await operations.
We will prove this theorem in three steps. First we will consider concurrent systems. The flexibility of synchronization operations makes the constructions used in the proof much simpler, and the arguments can be supported by figures of a reasonable size. In the next step we adapt the construction to concurrent programs with locks and await. In the last step we eliminate locks. The proof is structured in such a way that the same statements need to be proved in each of the three steps.
4.1 Synchronizations
The proof of Theorem 5 uses an encoding of the 3-SAT problem into a concurrent system S_φ. If φ is not satisfiable then S_φ has a small reduced transition system. If it is satisfiable then all reduced transition systems for S_φ are big. With such an encoding we show that an excellent POR algorithm would give a polynomial-time algorithm for deciding 3-SAT.

Consider a 3-CNF formula φ over variables x_1, …, x_n, with clauses C_1, …, C_m:

  φ = C_1 ∧ ⋯ ∧ C_m,  where C_j = ℓ_j,1 ∨ ℓ_j,2 ∨ ℓ_j,3

and each ℓ_j,k is a literal, that is, a variable or its negation.
The concurrent system S_φ is presented in Figure 3. It has two processes for each propositional variable x_i, two special processes, and two auxiliary processes. The first process for variable x_i has two states: top and bottom. There are two transitions from the top to the bottom state; intuitively, one says that x_i should be true, and the other that it should be false. This choice is not encoded in the reached state though, as the two transitions go to the same bottom state, where a further synchronization is possible; interactions with the special processes will make the difference. The second process for x_i is similar, but with its own set of actions.

Apart from the variable processes, we have the two special processes; only the second of them depends on the formula φ. The first special process synchronizes with the first kind of variable processes, letting each of them choose a value for its variable. The second special process starts with a choice, while its second part corresponds to the clauses of the formula φ, and it finishes with a final action. Finally, there are the two auxiliary processes. The first has a single action that is enabled in the initial state. The second can execute its action only after a preceding synchronization, and this in turn can happen only when the second special process terminates.
Lemma 6.
If φ is not SAT then all full runs of S_φ start with the action of the first auxiliary process. In other words, this action is the only first action, modulo ∼, of every full run of S_φ.
Proof.
Suppose u is a full run that does not start with the action of the first auxiliary process. Since this action stays enabled until it is executed, it must appear later on u, preceded by the action of the second auxiliary process, which itself requires the second special process to terminate. So the first part of u consists of synchronizations of the variable processes with the first special process, lasting until the second special process completes its run with its final action. From the form of the first special process it follows that this first part chooses, for each variable, either true or false; this defines a valuation. At this stage, the variable processes corresponding to literals true in the valuation are in their bottom states, where the actions on these literals are possible. The other variable processes are in their top states, where the actions on the corresponding literals are impossible. So the second special process can reach its final action iff the valuation satisfies φ. As φ is not SAT, this is impossible. Hence, such a run u cannot exist.
From the above proof we can actually see that the action of the second auxiliary process appears on some run iff φ is satisfiable. We do not need this fact in the rest of the argument, but it may be helpful to fix the intuitions.
Lemma 7.
If φ is not SAT then the transition system from Figure 4 is a sound and complete transition system for S_φ.
Proof.
First observe that in the state just before the clause actions, all variable processes are in their bottom states, so all actions on their literals are enabled. Thus, all local paths of the second special process through the clauses are feasible. Then the final action is possible, while the initial choice of the second special process is already disabled. Thus, the transition system from Figure 4 is sound: every full path of it is a run of S_φ.

It remains to verify that the transition system is complete. By Lemma 6, the action of the first auxiliary process is a first action of all full runs of S_φ. Take such a run u. Looking at S_φ we see that after this first action there are four possible actions. We consider only the case of the first of them, as the argument for the other possibilities is analogous. The chosen action must appear on u, as after the first action no action can disable it; moreover, all actions preceding it on u are synchronizations of other processes, hence independent of it. This gives us that u is trace-equivalent to a run where the chosen action comes directly after the first one. Repeating this reasoning we obtain that u is trace-equivalent to a run in the transition system from Figure 4.
Lemma 8.
If φ is SAT then there are runs containing the action of the second auxiliary process. In every sound and complete reduced transition system for S_φ there are at least as many states as there are satisfying valuations of φ.
Proof.
For a valuation v satisfying φ we consider a run u_v that, for each variable x_i, takes the transition for true if v(x_i) holds, and the transition for false otherwise; then, in each clause C_j, it takes a literal that holds under v. Since v is a satisfying valuation, the second special process can reach its final action.

Observe that there is no concurrency in the run u_v: the choice actions all synchronize with the first special process, and the clause actions are mutually dependent, as they happen on the same process.

Consider two different valuations v and v′ satisfying φ. Let s be the state of the system S_φ reached after the choice part of the run u_v, and similarly s′ for v′. In state s, the variable processes that are in their bottom states are exactly those corresponding to literals true in v, and similarly for s′. Since v and v′ are distinct valuations, the states s and s′ are distinct. So any sound and complete transition system for S_φ must have at least as many states as there are satisfying valuations of φ.
We write |φ| for the length of φ. Observe that this is an upper bound both on the number of variables and on the number of clauses in φ, namely n, m ≤ |φ|.
Corollary 9.
If φ is not SAT then minsize(S_φ) is O(|φ|). If φ is SAT then minsize(S_φ) is bigger than the number of satisfying valuations of φ.
Now, we prove Theorem 5 for concurrent systems. Suppose to the contrary that A is an excellent POR algorithm. We use it to solve SAT in deterministic polynomial time.

Let p and q be the polynomials associated with A. Namely, working in time q(|S| + p(minsize(S))), the algorithm A produces a sound and complete reduced transition system of size at most p(minsize(S)).
Given a formula φ, consider an integer k and the formula

  ψ = φ ∧ (y_1 ∨ ¬y_1) ∧ ⋯ ∧ (y_k ∨ ¬y_k)

where y_1, …, y_k are new variables. Clearly ψ is satisfiable iff φ is. If ψ is satisfiable then it has at least 2^k satisfying valuations.

Now we construct our system S_ψ and run A on it for the time prescribed by the polynomials above. If φ is not SAT then, by Corollary 9, minsize(S_ψ) is O(|ψ|), so the algorithm stops within this time and produces a sound and complete transition system. If φ is SAT then, by Corollary 9, the algorithm cannot stop in this time, as the smallest sound and complete transition system for S_ψ has at least 2^k states, and we can choose k big enough so that 2^k exceeds the time bound, which is polynomial in |ψ|.
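The padding step can be illustrated with a small brute-force sketch in Python. We use a DIMACS-style clause encoding (a clause is a list of nonzero integers, i for x_i and -i for ¬x_i); the function names are ours.

```python
from itertools import product

def count_sat(clauses, n):
    """Brute-force count of satisfying valuations over n variables."""
    count = 0
    for val in product([False, True], repeat=n):
        if all(any(val[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

def pad(clauses, n, k):
    """Conjoin k tautological clauses (y or not y) over k fresh variables.
    Satisfiability is unchanged, but every satisfying valuation of the
    original formula splits into 2**k satisfying valuations."""
    extra = [[n + i + 1, -(n + i + 1)] for i in range(k)]
    return clauses + extra, n + k
```

This is exactly the gap exploited in the reduction: an unsatisfiable formula keeps zero satisfying valuations after padding, while a satisfiable one gets exponentially many.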
4.2 Await and locks
In this section we show that the same phenomenon appears in concurrent programs. Here, instead of direct synchronizations, we have variables and locks. The only operations on variables are write and await.
The argument follows the same lines as the one for direct synchronizations. Observe that the proof of Theorem 5 above uses only Lemmas 6, 7 and 8. So it is enough to show that the three lemmas hold for concurrent programs with locks and variables in order to prove Theorem 5.
Given a propositional formula φ, we build a concurrent program P_φ. As in the previous construction, we have two processes for each propositional variable in φ, and two special processes: one for lambdas and one for thetas. However, we no longer need the two auxiliary processes. Initially, all variables in P_φ are set to 0.
Figure 5 shows the two processes for propositional variable x_i, as well as the two special processes. The two variable processes share two locks. After acquiring one of the two locks, each process sets the corresponding shared variable to 1 in order to communicate its choice to the two special processes. Next, it awaits an acknowledgment from a special process, in the form of another shared variable becoming 1. Finally, it sets a third variable to 1.
The two special processes are quite similar to those of the previous construction; only the second one depends on the formula we are encoding. The first special process tests, for each propositional variable, which of the two choice variables has been set, awaits the corresponding acknowledgment variable, and then sets it; the same pattern repeats for all propositional variables. The second special process proceeds similarly; after that, it tests whether the formula is true under the chosen valuation, and finally sets its final variable to 1.
As in the previous section, we claim that if is not satisfiable, then there is a sound and complete transition system of a small size. This transition system has a similar structure to the one in the previous section (Figure 4), but is longer due to additional actions await and write. So our argument, while similar, becomes more complicated too.
Lemma 10.
If φ is not satisfiable, then the final variable can be set to 1 only after the second special process executes its first instruction.
Proof.
Suppose not. Then there is a run setting the final variable to 1 before the second special process moves. On this run, the acknowledgment variables must be set by the variable processes and the first special process. For each propositional variable, exactly one of its two processes can proceed to setting its choice variable, as the other needs to acquire the second lock and then wait for an acknowledgment that never comes. Hence, when the second special process reaches its clause-testing part, only one choice per propositional variable is set. This defines a valuation, and the final variable can be set to 1 if and only if this valuation satisfies φ. Hence, if φ is not satisfiable, then the final variable cannot be set to 1.
Lemma 11.
If φ is not satisfiable, then there is a sound and complete transition system of size O(|φ|).
We describe the transition system. By the previous lemma, if φ is not satisfiable, the second special process must perform its first instruction before the final variable is set to 1. Thus, every run is equivalent to a run starting with this instruction. After the first transition, the run is a sequence of blocks as shown in Figure 6; the upper line and the lower line of a block depend on which process takes which lock. After these blocks, we have the part of the second special process, including the last transition that sets the final variable to 1. So the size of the transition system is linear in the size of φ.
The proof of the third lemma is the same as for the previous construction.
Lemma 12.
If φ is SAT then there are runs setting the final variable to 1 before the second special process moves. In every sound and complete reduced transition system for P_φ there are at least as many states as there are satisfying valuations of φ.
The three lemmas give us the proof of Theorem 5 for concurrent programs with locks and variables.
4.3 Just await
The previous construction uses locks and await instructions on variables. There are a number of classical algorithms for implementing locks with await, such as Peterson's, the Bakery, or Szymanski's algorithm. So it is natural to ask if we could eliminate locks from our construction. This is the goal of this section.
In the previous construction we used locks, but each lock was shared between only two processes, and taken only once. We cannot use one of the standard lock implementations out of the box, because our construction is based on two processes competing for two locks. Moreover, we need to be careful with the number of states of the resulting transition system. So while there are no important new ideas in the encoding below, we face a challenge somewhat resembling the classical problem of writing a program that prints itself.
Our solution is inspired by Peterson’s mutual exclusion algorithm for 2 processes that uses a shared variable ( in Figure 2) to order the processes when they both request access to the critical section. Figure 7 (left) shows the two processes and that only use instructions await and write. Following the principle of Peterson’s algorithm, the processes and share a variable that replaces the lock operations, and that ensures that one of and will set while the other will set . The await instructions on variable are followed by the same sequences of actions on or as previously. The two special processes and are as in Figure 5. For notational convenience, we assume that the variables are initialized to , whereas is initialized to . The differences between and are highlighted in Figure 7.
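The ordering role of the shared variable can be illustrated with a small simulation. The model below is a toy abstraction of our own, not the paper's exact construction: both processes write their own identifier to a shared variable `t` (the last write wins), and each process then awaits the *other* process's identifier, so in every interleaving exactly one of the two gets past its await.

```python
from itertools import permutations

def passed_awaits(write_order):
    """Execute both writes to the shared variable t in the given
    interleaving; the last write wins, as in shared memory."""
    t = 0  # initial value, overwritten by both processes
    for pid in write_order:
        t = pid
    # process 1 awaits t == 2, process 2 awaits t == 1: only the
    # process that wrote first ever sees the other's identifier
    return {1: t == 2, 2: t == 1}

# In every interleaving exactly one process passes its await, so
# exactly one of the two guarded assignments is ever performed.
for order in permutations([1, 2]):
    assert sum(passed_awaits(order).values()) == 1
```

The point of the sketch is only that the write-then-await pattern deterministically selects a winner without any lock, mirroring the role of the variable in Peterson's algorithm.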
The proof takes the same three steps as the previous one.
Lemma 13.
If is not satisfiable, then can be set to only after process performs .
Proof.
Suppose there is a run where is not a first action. This can only happen when is executed before process starts. Hence, all must be set to by variable processes. This means that exactly one of the processes or can complete its operations and set or to ; while the other will not pass . If is not satisfiable then this is not enough for process to reach the point where it can set to .
Lemma 14.
If is not satisfiable, then there is a sound and complete transition system of size .
The argument is similar to the one for the construction with locks. has a structure similar to the previous one, with blocks as in Figure 7 (right).
Lemma 15.
If is SAT then there are runs containing . In every sound and complete reduced transition system for there are at least as many states as there are satisfying valuations for .
The third lemma is the same as in the previous construction, and its proof is also the same. The three lemmas give us the proof of Theorem 5 for concurrent programs using only await operations.
5 An approach to partial-order reduction
In the light of the negative results from the previous section, it is clear that practical approaches to stateful partial-order reduction should be based on heuristics, and cannot aim at optimality, or even at approximate optimality up to a polynomial factor. In this section we propose a concrete problem that, when solved with a satisfactory heuristic, should advance the quality of our partial-order methods. This approach also allows us to make a link with stateless partial-order methods. It leads to a simple trace-optimal algorithm for the stateless case for a restricted class of concurrent systems.
We start with a simple algorithm for constructing trace-optimal reduced transition systems. It uses three ingredients that we describe in more detail later.
-
Lexicographic order on sequences allowing us to determine a representative run for each class of a por-equivalence relation .
-
Sleep sets giving enough information about the exploration context.
-
An oracle “includes first sets”, denoted , permitting us to avoid sleep-blocked runs.
Lexicographic ordering has already been used in the context of partial-order reduction [19]. Sleep sets are one of the classical concepts for partial-order methods [14]. The oracle is a direct way of avoiding sleep-blocked executions, a well-known challenge in partial-order reduction [1]. Our contribution here is to make the problem explicit. Unsurprisingly at this stage, the problem is NP-hard, but we show one case where it is polynomial.
Let us fix a concurrent system . Let us also assume that we have some linear ordering on the actions of . This determines a lexicographic ordering on sequences of actions. Given some equivalence relation on sequences, we can use the lexicographic order to define representatives for the equivalence classes of the relation.
Definition 16.
We say that is a lex-sequence if it is lexicographically smallest among all sequences equivalent to it, that is, the smallest sequence in . A lex-run of is a run whose labels form a lex-sequence.
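For intuition, on small examples a lex-sequence can be recognized by brute force: generate the whole trace-equivalence class by swapping adjacent independent actions, and compare against the lexicographic minimum. The encoding of actions and the dependence relation below are our own illustrative choices, not the paper's notation.

```python
def dependent(a, b):
    # toy dependence: same process or same shared variable
    return a[0] == b[0] or a[1] == b[1]

def trace_class(u):
    """Close the sequence u under swaps of adjacent independent actions."""
    seen, stack = {u}, [u]
    while stack:
        v = stack.pop()
        for i in range(len(v) - 1):
            if not dependent(v[i], v[i + 1]):
                w = v[:i] + (v[i + 1], v[i]) + v[i + 2:]
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen

def is_lex_sequence(u):
    # u is a lex-sequence iff it is the minimum of its equivalence class
    return u == min(trace_class(u))
```

This enumeration is of course exponential in general; it is only meant to make the definition concrete.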
We are going to present an algorithm enumerating the full lex-runs of . For this we will use the “includes first set (IFS)” oracle. The idea is as follows. Suppose the algorithm has reached a state and produced a sleep set containing actions that need not be explored from . We would like to check if there is something left to be explored. We need to check if the exploration is not sleep-blocked, namely, if there is a maximal run from with . If there is such a run, we need to explore one of the actions in from .
Definition 17 (IFS).
Let be a state of and a subset of actions. We say that includes a first-set in if there is a maximal run from with . We write when there exists such a maximal run .
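On small instances the oracle can be implemented naively by enumerating maximal runs. The sketch below uses a toy model of our own devising (processes as sequences of actions, dependence meaning same process or same variable); it is exponential, consistent with the hardness result of Section 5.1.

```python
def dependent(a, b):
    # toy dependence relation: same process or same shared variable
    return a[0] == b[0] or a[1] == b[1]

def maximal_runs(procs, pcs=None):
    """Enumerate all maximal interleavings of the processes."""
    pcs = pcs or tuple(0 for _ in procs)
    enabled = [i for i in range(len(procs)) if pcs[i] < len(procs[i])]
    if not enabled:
        yield ()
        return
    for i in enabled:
        nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
        for rest in maximal_runs(procs, nxt):
            yield (procs[i][pcs[i]],) + rest

def first_set(run):
    # actions that can be scheduled first in some equivalent run
    return {a for k, a in enumerate(run)
            if all(not dependent(b, a) for b in run[:k])}

def ifs(procs, allowed):
    """IFS(s, A): some maximal run from s has its first set inside A."""
    return any(first_set(u) <= allowed for u in maximal_runs(procs))
```

For instance, with two processes writing the same variable, a run starting with either write exists, so the oracle answers positively even when only one of the two writes is allowed.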
Listing 1 presents a very simple algorithm enumerating all full lex-runs of . The algorithm also gives us an opportunity to explain sleep sets in detail. Each node of the tree constructed by the algorithm is a pair consisting of a state of , denoted , and a set of actions . For readability, we write instead of , for the set of outgoing actions from .
Sleep sets are a very elegant mechanism to gather some information about the exploration context. They can be computed top-down when constructing an exploration graph. At the root, the sleep set is empty. For a node and a transition we have , where are labels of transitions from created before the -transition; and the final term means that we remove all the actions dependent on . The intuition behind this formula is as follows. Assume that we keep an invariant:
(sleep-invariant): after exploration of a node for every maximal run from in such that we have a path from with .
Then the formula for says that we need not explore from a run starting, say, with if . This is because when looking from such a run has in its first set, and it has already been explored from ’s successor of .
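The update formula itself fits in a few lines. In the sketch below (our notation, not the paper's), `sleep` is the parent's sleep set, `earlier` holds the labels of the sibling transitions already created, and `dep` is the dependence relation.

```python
def successor_sleep(sleep, earlier, a, dep):
    """Sleep set of the node reached by the a-transition: inherit the
    parent's sleep set and the already-created siblings, then drop
    every action dependent on a."""
    return {b for b in (sleep | earlier) if not dep(a, b)}
```

An action independent of `a` thus survives into the successor's sleep set, while a dependent one is woken up, exactly as the invariant requires.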
The algorithm from Listing 1 examines all enabled transitions in a node in our fixed order on actions. For every enabled transition it uses the oracle in Line 10 to decide whether it is necessary to explore it. If the answer is positive, then it creates a node with an appropriate sleep set. If not, then it skips the transition. This guarantees that the algorithm is trace-optimal.
Lemma 18.
The algorithm in Listing 1 constructs a tree such that: (i) every full run in the tree is a full lex-run of , and (ii) for every full run of there is a unique full run in the tree with .
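Since Listing 1 is not reproduced here, the following self-contained sketch shows the shape of the algorithm on a toy model (processes as sequences of actions, dependence meaning same process or same variable, and a naive exponential IFS oracle); every name in it is our own, not the paper's.

```python
def dependent(a, b):
    return a[0] == b[0] or a[1] == b[1]

def runs_from(procs, pcs):
    """All maximal runs from the global state given by the counters pcs."""
    en = [i for i in range(len(procs)) if pcs[i] < len(procs[i])]
    if not en:
        yield ()
        return
    for i in en:
        nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
        for rest in runs_from(procs, nxt):
            yield (procs[i][pcs[i]],) + rest

def first_set(run):
    return {a for k, a in enumerate(run)
            if all(not dependent(b, a) for b in run[:k])}

def ifs(procs, pcs, sleep):
    # naive oracle: some maximal run has no sleep action in its first set
    return any(not (first_set(u) & sleep) for u in runs_from(procs, pcs))

def explore(procs, pcs=None, sleep=frozenset()):
    """Return the full runs of the tree built with sleep sets + IFS."""
    pcs = pcs or tuple(0 for _ in procs)
    en = sorted((procs[i][pcs[i]], i)
                for i in range(len(procs)) if pcs[i] < len(procs[i]))
    if not en:
        return [()]
    runs, earlier = [], set()
    for a, i in en:                       # enabled actions in lex order
        nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
        nsleep = frozenset(b for b in (sleep | earlier)
                           if not dependent(a, b))
        if a not in sleep and ifs(procs, nxt, nsleep):
            runs += [(a,) + r for r in explore(procs, nxt, nsleep)]
            earlier.add(a)                # a sibling transition was created
    return runs
```

On two independent writes the tree has a single full run (the lex-run); on two dependent writes it has both interleavings, matching trace-optimality.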
We could use this algorithm in stateless model-checking if we had an implementation of . In a stateful version, the algorithm is not very interesting as trees produced by this algorithm can be, and often are, orders of magnitude bigger than , the transition system of without any reduction. But we can modify the algorithm to produce a graph instead of a tree. For this we need to introduce a subsumption relation between nodes.
Definition 19.
We say that subsumes , in symbols , if the states in and are the same: , and there are fewer sleep-blocked actions from than from : .
Observe that if subsumes , all runs that are not sleep-blocked from are also not sleep-blocked from . Thus, replacing by still yields a sound and complete reduced transition system. We therefore add subsumption to Listing 1 as shown in Listing 2. We first check if the successor of is subsumed by an existing node in line 11. If yes, then, instead of creating a new node, we add an edge to the subsuming node (line 12). Otherwise, we proceed as in Listing 1. Observe that the algorithm in Listing 2 builds a directed acyclic graph (DAG) instead of a tree, hence avoiding visiting a node more than once.
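The subsumption check itself is a one-liner; the node representation below (a state paired with a sleep set) is our own illustrative choice.

```python
from collections import namedtuple

Node = namedtuple('Node', ['state', 'sleep'])

def subsumes(n, m):
    """n subsumes m: same state, and fewer actions are blocked from n,
    i.e. n's sleep set is included in m's."""
    return n.state == m.state and n.sleep <= m.sleep
```

Redirecting an edge to a subsuming node is safe precisely because the subsuming node's smaller sleep set blocks no run that the replaced node would have explored.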
The new algorithm does not satisfy a statement as in Lemma 18 because edges added due to the subsumption relation may create paths that are not lex-runs. We can get a variant of this optimality property when looking at states. Let us call a state lex-useful if it appears on a full lex-run.
Lemma 20.
The algorithm with subsumption in Listing 2 builds a reduced transition system such that: (i) every full run in is a full lex-run of , and (ii) for every full run of there is a full run in with . Moreover, for every node , its state is lex-useful.
Remark 21.
Sleep sets are needed for the optimality result, in the sense that some information about the exploration context is needed. Consider the system of four processes in Figure 8 (left) and its runs (right).
The algorithm first explores the run , and then in the root state asks if there is a run where is not a first action. The answer is positive because of the run . The algorithm then decides to explore as it is the smallest enabled action. At this point it arrives at . Without a sleep set it has no choice but to explore both and from . But so this is not trace-optimal. With sleep sets we get which is exactly what is needed to block unnecessary exploration. So an idealized view suggested by Section 3 that partial-order reduction boils down to finding good covering source sets is not the whole story. Fortunately, the additional required information is easy to compute from the context.
5.1 The IFS oracle
The use of queries gives a very clean approach to partial-order reduction; unfortunately, the problem is NP-hard.
Proposition 22.
The following problem is NP-hard: given a concurrent system , its global state , and a set of actions , does hold?
Proof.
Suppose we are given a 3CNF formula consisting of clauses over variables . Say the -th clause is of the form where each is either some variable or its negation . We will construct a concurrent program such that is satisfiable if and only if holds for the initial state and some set of actions we make precise below.
All actions in will implement taking or releasing a lock. There is one lock per propositional variable and its negation, ; one per clause, ; one additional lock for every variable in , as well as a special lock .
System is presented in Figure 9. For every propositional variable there are two processes and . For every clause there are processes . Additionally, there are two processes and .
Consider the state where every process is in its initial state, and where the locks are taken whereas the other locks are available. At this point the only processes enabled are , and the processes corresponding to propositional variables, namely . Process can take the lock . For each , either process or process takes the lock which amounts to choosing a value for variable , as either the lock or the lock is released, respectively. This may allow some process corresponding to a clause to move. For example, suppose was released, and , the first literal of the -th clause, is . Then process can take and release thus testing if was available. If the lock was available, then it can release . Releasing intuitively means that the -th clause is satisfied. Thus, all locks can be released if and only if the formula is satisfied by the valuation determined by locks from that have been released. If all are released then process can take all these locks and then take . So if is satisfiable then there is a maximal run of the system where is not a first action. If is not satisfiable then there is no way to release all the locks , so cannot take . In this case is in the first set of every maximal run because process can always take . This shows that holds iff is satisfiable (where is the set of all actions in ).
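The heart of the reduction, that a choice of released literal locks frees all clause locks iff it satisfies the formula, can be mirrored in a few lines. Everything below (the literal encoding and all function names) is our illustrative reconstruction, not the paper's notation: a literal is a pair (variable index, polarity), and a valuation fixes which of the two literal locks per variable is released.

```python
from itertools import product

def clause_lock_releasable(clause, released):
    # the clause process can release its lock iff some literal lock is free
    return any(lit in released for lit in clause)

def blocking_run_exists(cnf, nvars):
    """Mirrors the reduction: some choice of released literal locks frees
    every clause lock iff the formula is satisfiable, i.e. iff there is
    a maximal run on which the watched action is not a first action."""
    for valuation in product([False, True], repeat=nvars):
        released = {(i, v) for i, v in enumerate(valuation)}
        if all(clause_lock_releasable(c, released) for c in cnf):
            return True
    return False
```

The brute-force loop over valuations stands in for the nondeterministic choices of the variable processes; the lock machinery of Figure 9 enforces exactly this correspondence inside the concurrent system.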
In our opinion it is worthwhile to study heuristics for the problem. One possibility is to use existing static [14, 30, 34] or dynamic [12, 1, 18, 22] partial-order techniques to compute, for each node, a set of actions to visit to ensure completeness. Another intriguing approach consists in encoding as a SAT problem, and then using a solver. This makes sense since the problem is NP-hard. Designing heuristics that are efficient in practice is beyond the scope of this paper. Still, we show that the problem can actually be solved efficiently in a special case of non-blocking systems.
There is a simple setting where the test can be done in polynomial time. Here the only operations are reads and writes, and programs are straight-line, in particular there are no conditionals. This case is indeed considered in the stateless model-checking literature [5, 22]. We generalize this situation to the case where we have variables but no operation on a variable can ever block. So instead of just reads and writes we can also have operations like test-and-set, or fetch-and-add.
Definition 23.
An extended concurrent program is a concurrent program with arbitrary instructions on variables. It is fully non-blocking if in every state every first action of every process is enabled. It is linear if every process is a sequence of instructions, meaning there is no branching or conditionals.
We use the term fully non-blocking to differentiate it from the notion of non-blocking in [1], which is rather a forward non-blocking condition.
Proposition 24.
If is a linear, fully non-blocking extended concurrent program then the test can be done in polynomial time.
Recall that means that there is a maximal run from with . Let denote the set of actions not in . For a sequence of actions we denote by the set , namely the set of actions in that are not dependent on actions in . Then, we can check if by constructing a run as follows. Suppose is an already constructed run from . Extend with the smallest action enabled in the state reached after that is not in . When there is no such action, return true if is maximal, and false otherwise.
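The greedy construction can be written down directly. The sketch below uses a toy encoding of our own (processes as sequences of actions, dependence meaning same process or same variable): a forbidden B-action stops being forbidden once some executed action depends on it, since it can then no longer enter the first set.

```python
def dependent(a, b):
    # toy dependence: same process or same shared variable
    return a[0] == b[0] or a[1] == b[1]

def ifs_linear(procs, allowed):
    """Greedy IFS test for linear, fully non-blocking programs:
    repeatedly fire the lex-smallest enabled action that is not a
    still-forbidden B-action, and report whether the run is maximal."""
    B = {a for p in procs for a in p} - allowed
    pcs = [0] * len(procs)
    run = []
    while True:
        forbidden = {b for b in B
                     if all(not dependent(a, b) for a in run)}
        heads = {procs[i][pcs[i]]: i for i in range(len(procs))
                 if pcs[i] < len(procs[i])}
        cand = sorted(a for a in heads if a not in forbidden)
        if not cand:
            # maximal iff every process has finished
            return all(pcs[i] == len(procs[i]) for i in range(len(procs)))
        a = cand[0]
        pcs[heads[a]] += 1
        run.append(a)
```

Full non-blocking guarantees that the only way to get stuck is to have finished, and linearity means no branching can invalidate the greedy choice; dropping either assumption breaks the argument, in line with Proposition 24.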
Thus, in the case covered by the above proposition the algorithm from Listing 1 is optimal and works in polynomial time w.r.t. the size of the constructed tree. It works differently than race-reversal algorithms [1, 3].
The test is NP-hard when one of the two conditions of Proposition 24 is lifted. This does not preclude that there is some case between fully non-blocking and blocking for which IFS is in Ptime. However, it is quite hard to imagine how to weaken the linearity condition.
6 Conclusions
Our main result is a negative one, indicating that we cannot hope for too much in terms of theoretically good partial-order algorithms. It has been known before that certain existing partial-order methods are NP-hard [30], but here we show that no method can be polynomially close to optimal unless P=NP.
As we have explained in the introduction, we believe that stateful partial-order methods have applications, and that there are even more applications to come. In this context our negative result gives us an argument for looking at well-motivated heuristics rather than for a theoretically grounded result. We propose a new schema of a partial-order reduction algorithm based on the IFS test. While in general the test is NP-complete, we hope that good heuristics for this test can translate into progress in partial-order methods. This is a direction we are actively investigating.
There is a large variety of concurrent systems, in particular depending on the communication operations between processes. We have exhibited one case where the IFS test can be done in Ptime. Intuitively, it captures “fully non-blocking” situations. An interesting question is what happens in the forward non-blocking case [1], or with some other formalization of non-blocking in the spirit of wait-free operations [16] or MPI [27].
References
- [1] Parosh Aziz Abdulla, Stavros Aronis, Bengt Jonsson, and Konstantinos Sagonas. Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction. Journal of the ACM, 64(4):1–49, 2017. doi:10.1145/3073408.
- [2] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Frederik Meyer Bønneland, Sarbojit Das, Bengt Jonsson, Magnus Lang, and Konstantinos Sagonas. Tailoring Stateless Model Checking for Event-Driven Multi-threaded Programs. In Étienne André and Jun Sun, editors, Automated Technology for Verification and Analysis, pages 176–198. Springer Nature Switzerland, 2023. doi:10.1007/978-3-031-45332-8_9.
- [3] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Sarbojit Das, Bengt Jonsson, and Konstantinos Sagonas. Parsimonious optimal dynamic partial order reduction. In Arie Gurfinkel and Vijay Ganesh, editors, Computer Aided Verification - 36th International Conference, CAV 2024, Montreal, QC, Canada, July 24-27, 2024, Proceedings, Part II, volume 14682 of Lecture Notes in Computer Science, pages 19–43. Springer, 2024. doi:10.1007/978-3-031-65630-9_2.
- [4] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Sarbojit Das, Bengt Jonsson, and Konstantinos Sagonas. Trading space for simplicity in stateless model checking. In Susanne Graf, Paul Pettersson, and Bernhard Steffen, editors, Real Time and Such - Essays Dedicated to Wang Yi to Celebrate His Scientific Career, volume 15230 of Lecture Notes in Computer Science, pages 79–97. Springer, 2025. doi:10.1007/978-3-031-73751-0_8.
- [5] Pratyush Agarwal, Krishnendu Chatterjee, Shreya Pathak, Andreas Pavlogiannis, and Viktor Toman. Stateless Model Checking Under a Reads-Value-From Equivalence. In Alexandra Silva and K. Rustan M. Leino, editors, Computer Aided Verification, volume 12759, pages 341–366. Springer International Publishing, 2021. doi:10.1007/978-3-030-81685-8_16.
- [6] Marek Chalupa, Krishnendu Chatterjee, Andreas Pavlogiannis, Nishant Sinha, and Kapil Vaidya. Data-centric dynamic partial order reduction. Proceedings of the ACM on Programming Languages, 2(POPL):1–30, 2018. doi:10.1145/3158119.
- [7] Berk Cirisci, Constantin Enea, Azadeh Farzan, and Suha Orhun Mutluergil. A Pragmatic Approach to Stateful Partial Order Reduction. In Cezara Dragoi, Michael Emmi, and Jingbo Wang, editors, Verification, Model Checking, and Abstract Interpretation, volume 13881, pages 129–154. Springer Nature Switzerland, 2023. doi:10.1007/978-3-031-24950-1_7.
- [8] Camille Coti, Laure Petrucci, César Rodríguez, and Marcelo Sousa. Quasi-optimal partial order reduction. Formal Methods in System Design, 57(1):3–33, 2021. doi:10.1007/s10703-020-00350-4.
- [9] Frank S. De Boer, Marcello Bonsangue, Einar Broch Johnsen, Violet Ka I Pun, S. Lizeth Tapia Tarifa, and Lars Tveito. SymPaths: Symbolic Execution Meets Partial Order Reduction. In Wolfgang Ahrendt, Bernhard Beckert, Richard Bubel, Reiner Hähnle, and Mattias Ulbrich, editors, Deductive Software Verification: Future Perspectives, volume 12345, pages 313–338. Springer International Publishing, 2020. doi:10.1007/978-3-030-64354-6_13.
- [10] Azadeh Farzan. Commutativity in Automated Verification. In 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–7. IEEE, 2023. doi:10.1109/LICS56636.2023.10175734.
- [11] Azadeh Farzan, Dominik Klumpp, and Andreas Podelski. Sound sequentialization for concurrent program verification. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 506–521. ACM, 2022. doi:10.1145/3519939.3523727.
- [12] Cormac Flanagan and Patrice Godefroid. Dynamic Partial-Order Reduction for Model Checking Software. In POPL’05, 2005.
- [13] Patrice Godefroid. Using partial orders to improve automatic verification methods. In Edmund M. Clarke and Robert P. Kurshan, editors, Computer-Aided Verification, pages 176–185. Springer, 1991. doi:10.1007/BFb0023731.
- [14] Patrice Godefroid. Partial-Order Methods for the Verification of Concurrent Systems An Approach to the State-Explosion Problem. PhD thesis, Université de Liège, 1994.
- [15] R. Govind, Frédéric Herbreteau, Srivathsan, and Igor Walukiewicz. Abstractions for the local-time semantics of timed automata: A foundation for partial-order methods. In Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science, pages 1–14. ACM, 2022. doi:10.1145/3531130.3533343.
- [16] Maurice Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1):124–149, 1991. doi:10.1145/114005.102808.
- [17] Casper S. Jensen, Anders Møller, Veselin Raychev, Dimitar Dimitrov, and Martin Vechev. Stateless model checking of event-driven applications. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 57–73. Association for Computing Machinery, 2015. doi:10.1145/2814270.2814282.
- [18] Bengt Jonsson, Magnus Lang, and Konstantinos Sagonas. Awaiting for Godot: Stateless Model Checking that Avoids Executions where Nothing Happens. In 22nd Formal Methods in Computer-Aided Design, FMCAD 2022. IEEE, 2022.
- [19] Vineet Kahlon, Chao Wang, and Aarti Gupta. Monotonic Partial Order Reduction: An Optimal Symbolic Partial Order Reduction Technique. In Ahmed Bouajjani and Oded Maler, editors, Computer Aided Verification, volume 5643, pages 398–413. Springer Berlin Heidelberg, 2009. doi:10.1007/978-3-642-02658-4_31.
- [20] Shmuel Katz and Doron Peled. Verification of distributed programs using representative interleaving sequences. Distributed Computing, 6(2):107–120, 1992. doi:10.1007/BF02252682.
- [21] Michalis Kokologiannakis, Rupak Majumdar, and Viktor Vafeiadis. Enhancing GenMC’s Usability and Performance. In Bernd Finkbeiner and Laura Kovács, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 66–84. Springer Nature Switzerland, 2024. doi:10.1007/978-3-031-57249-4_4.
- [22] Michalis Kokologiannakis, Iason Marmanis, Vladimir Gladstein, and Viktor Vafeiadis. Truly stateless, optimal dynamic partial order reduction. Proceedings of the ACM on Programming Languages, 6(POPL):1–28, 2022. doi:10.1145/3498711.
- [23] Michalis Kokologiannakis, Iason Marmanis, and Viktor Vafeiadis. Unblocking Dynamic Partial Order Reduction. In Constantin Enea and Akash Lal, editors, Computer Aided Verification, volume 13964, pages 230–250. Springer Nature Switzerland, 2023. doi:10.1007/978-3-031-37706-8_12.
- [24] Michalis Kokologiannakis, Iason Marmanis, and Viktor Vafeiadis. SPORE: combining symmetry and partial order reduction. Proc. ACM Program. Lang., 8(PLDI):1781–1803, 2024. doi:10.1145/3656449.
- [25] Pallavi Maiya, Rahul Gupta, Aditya Kanade, and Rupak Majumdar. Partial Order Reduction for Event-Driven Multi-threaded Programs. In Marsha Chechik and Jean-François Raskin, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 680–697. Springer, 2016. doi:10.1007/978-3-662-49674-9_44.
- [26] Antoni W. Mazurkiewicz. Introduction to trace theory. In Volker Diekert and Grzegorz Rozenberg, editors, The Book of Traces, pages 3–41. World Scientific, 1995. doi:10.1142/9789814261456_0001.
- [27] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard Version 4.1, November 2023. URL: https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf.
- [28] Burcu Kulahcioglu Ozkan, Rupak Majumdar, and Simin Oraee. Trace aware random testing for distributed systems. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–29, 2019. doi:10.1145/3360606.
- [29] Doron Peled. Partial-order reduction. In Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem, editors, Handbook of Model Checking, pages 173–190. Springer, 2018. doi:10.1007/978-3-319-10575-8_6.
- [30] Doron A. Peled. All from one, one for all: on model checking using representatives. In Costas Courcoubetis, editor, Computer Aided Verification, 5th International Conference, CAV ’93, Elounda, Greece, June 28 - July 1, 1993, Proceedings, volume 697 of Lecture Notes in Computer Science, pages 409–423. Springer, 1993. doi:10.1007/3-540-56922-7_34.
- [31] The Anh Pham. Efficient State-Space Exploration for Asynchronous Distributed Programs: Adapting Unfolding-Based Dynamic Partial Order Reduction to MPI Programs. These de doctorat, Rennes, Ecole normale superieure, 2019.
- [32] César Rodríguez, Marcelo Sousa, Subodh Sharma, and Daniel Kroening. Unfolding-based Partial Order Reduction. LIPIcs, Volume 42, CONCUR 2015, 42:456–469, 2015. doi:10.4230/LIPICS.CONCUR.2015.456.
- [33] Samira Tasharofi, Rajesh K. Karmani, Steven Lauterburg, Axel Legay, Darko Marinov, and Gul Agha. TransDPOR: A Novel Dynamic Partial-Order Reduction Technique for Testing Actor Programs. In Holger Giese and Grigore Rosu, editors, Formal Techniques for Distributed Systems, pages 219–234. Springer, 2012. doi:10.1007/978-3-642-30793-5_14.
- [34] Antti Valmari. Stubborn sets for reduced state space generation. In G. Goos, J. Hartmanis, D. Barstow, W. Brauer, P. Brinch Hansen, D. Gries, D. Luckham, C. Moler, A. Pnueli, G. Seegmüller, J. Stoer, N. Wirth, and Grzegorz Rozenberg, editors, Advances in Petri Nets 1990, volume 483, pages 491–515. Springer Berlin Heidelberg, 1991. doi:10.1007/3-540-53863-1_36.
- [35] Björn Wachter, Daniel Kroening, and Joel Ouaknine. Verifying multi-threaded software with impact. In 2013 Formal Methods in Computer-Aided Design, pages 210–217. IEEE, 2013. doi:10.1109/FMCAD.2013.6679412.
- [36] Chao Wang, Zijiang Yang, Vineet Kahlon, and Aarti Gupta. Peephole Partial Order Reduction. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 4963, pages 382–396. Springer Berlin Heidelberg, 2008. doi:10.1007/978-3-540-78800-3_29.
- [37] Yu Yang, Xiaofang Chen, Ganesh Gopalakrishnan, and Robert M. Kirby. Efficient Stateful Dynamic Partial Order Reduction. In Klaus Havelund, Rupak Majumdar, and Jens Palsberg, editors, Model Checking Software, pages 288–305. Springer, 2008. doi:10.1007/978-3-540-85114-1_20.
