
Omega-Regular Verification and Control for Distributional Specifications in MDPs

S. Akshay, Dept of CSE, Indian Institute of Technology Bombay, Mumbai, India
Ouldouz Neysari, Singapore Management University, Singapore and University of Tehran, Iran
Đorđe Žikelić, Singapore Management University, Singapore
Abstract

A classical approach to studying Markov decision processes (MDPs) is to view them as state transformers. However, MDPs can also be viewed as distribution transformers, where an MDP under a strategy generates a sequence of probability distributions over MDP states. This view arises in several applications, even though the probabilistic model checking problem becomes much harder than its classical state transformer counterpart. It is known that even distributional reachability and safety problems become computationally intractable (Skolem- and Positivity-hard). To address this challenge, recent works focused on sound but possibly incomplete methods for verification and control of MDPs under the distributional view. However, existing automated methods are applicable only to distributional reachability, safety and reach-avoidance specifications.

In this work, we present the first automated method for verification and control of MDPs with respect to distributional omega-regular specifications. To achieve this, we propose a novel notion of distributional certificates, which are sound and complete proof rules for proving that an MDP under a distributionally memoryless strategy satisfies some distributional omega-regular specification. We then use our distributional certificates to design the first fully automated algorithms for verification and control of MDPs with respect to distributional omega-regular specifications. Our algorithms follow a template-based synthesis approach and provide soundness and relative completeness guarantees, while running in PSPACE. Our prototype implementation demonstrates practical applicability of our algorithms to challenging examples collected from the literature.

Keywords and phrases:
MDPs, Distributional objectives, ω-regularity, Certificates
Copyright and License:
© S. Akshay, Ouldouz Neysari, and Đorđe Žikelić; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Verification by model checking
Funding:
This work was supported in part by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant (Project ID:22-SIS-SMU-100), Google Research Award 2023 and the SBI Foundation Hub for Data Science & Analytics, IIT Bombay.
Editors:
Patricia Bouyer and Jaco van de Pol

1 Introduction

Markov decision processes (MDPs) are a standard model for reasoning and sequential decision making in the presence of uncertainty. The verification community has long studied MDPs as state transformers, where their semantics are interpreted over cylinder sets of paths (see e.g. [8]). As a result, quantitative verification questions focus on state-based properties, such as the eventual reachability of a state with maximum probability over all MDP strategies. There is a rich body of literature on efficient algorithms for reasoning about state-based properties in MDPs, including model checking over expressive logics such as PCTL* [30].

An orthogonal class of objectives, which forms our focus in this paper, considers properties that are defined over the space of probability distributions over MDP states, rather than over the state space of the MDP. This allows one to reason about the movement of the probability mass; for instance, one can require that, always in the future, the probability mass is equally divided between two bi-stable states. Such objectives, which we call distributional objectives, are more naturally reasoned about under an alternative semantics that views MDPs as distribution transformers. In this view, starting from some initial distribution over MDP states, the MDP under a strategy induces a sequence of distributions over MDP states, generating a new distribution at each time step. One can then specify properties with respect to this sequence of distributions, such as distributional reachability or safety. This view naturally arises in several applications, including multi-agent systems [9, 5] or biochemical reaction networks [29, 28]. However, it turns out that even the simplest distributional properties such as distributional reachability and safety cannot be expressed in PCTL* [10], rendering classical probabilistic model checking algorithms inapplicable to reasoning about distributional specifications. This means that reasoning about distributional properties in MDPs requires new methods. The past decade has seen a rich line of theoretical work on analyzing Markov chains and MDPs as distribution transformers [31, 29, 11, 6, 4, 5, 2, 26]. However, existing automated methods are restricted to distributional reachability, safety, or reach-avoidance specifications.

In this paper, we present the first automated method for strategy verification and synthesis in MDPs with respect to distributional ω-regular specifications, significantly extending the class of distributional objectives for which automated methods are available. In doing so, we focus on the verification problem for a given MDP strategy, as well as the control problem which asks to synthesize an MDP strategy which ensures that a distributional ω-regular specification is satisfied. Our strategy verification and synthesis methods are based on the novel notion of distributional certificates which we introduce in this work. Distributional certificates provide a sound and complete proof rule for proving that an MDP under a distributionally memoryless strategy satisfies the distributional ω-regular specification of interest. We restrict our attention to distributionally memoryless strategies, which make moves based on the current distribution rather than the state, and which are known to be sufficient for reasoning about distributional reachability and safety [4, 5]. Our distributional certificates build on ideas from program verification and certificates such as ranking functions [22], invariants [21] or Büchi ranking functions [17], and bring these ideas to the setting of reasoning about distributional ω-regular specifications in MDPs.

We then present our automated algorithms for strategy verification and synthesis in MDPs with respect to distributional ω-regular specifications. In the setting of distributional objectives, it is known that safety and reachability are already hard, or more precisely, Positivity- and Skolem-hard [3]; the decidability of both underlying problems is a long-standing open question for linear dynamical systems [32]. As a result, rather than aiming for sound and complete algorithms that would inherently be computationally expensive or outright infeasible, we adopt a template-based synthesis approach and instead design algorithms that efficiently search for distributional certificates and distributionally memoryless strategies that can be expressed in affine arithmetic. By fixing symbolic affine templates for the certificate and the strategy and by using existing methods for solving quantified formulas over real arithmetic, we reduce the strategy verification and synthesis problems to satisfiability checking in the existential theory of the reals, thereby obtaining sound algorithms that run in PSPACE. Furthermore, our algorithms also provide relative completeness guarantees for computing an affine distributional certificate and a memoryless strategy, whenever they exist.

We implement our approach and consider standard benchmarks and examples from the literature, while focusing on several distributional ω-regular specifications for our evaluation. Our results show the practicality of the approach and the potential for future applications.

Related work.

In addition to the work already mentioned, we discuss a non-exhaustive selection of further related works. Our distributional certificates draw insights from classical certificates for program verification and from template-based synthesis algorithms for their computation. Notable examples include ranking functions for proving termination in programs [22, 13] and invariants for proving safety in programs [21, 14]. In particular, our distributional certificates draw on the Büchi ranking functions of [17] for proving LTL properties in programs. However, there are several important differences. First, we leverage and lift the idea of Büchi ranking to the setting of probability distributions over MDP states. Second, our distributional certificates and the algorithms for their computation also need to reason about strategies in MDPs. This is reflected in the following key difference. Our certificates provide soundness and completeness guarantees for all distributional ω-regular specifications and distributionally memoryless strategies, and proving this requires reasoning about reachability under a strategy (see the proof of Theorem 9). The certificate of [17], on the other hand, is complete only for specifications that can be represented via deterministic Büchi automata whenever the specification needs to be satisfied from a set of initial states (see Corollary 2 in [17]).

In the distributional setting, the probabilistic logics defined in [10, 11, 2] are all orthogonal to the classical semantics, and the model checking techniques developed are not template-based. To the best of our knowledge, none of these works have been automated. The works [4, 5] propose certificates for distributional reachability, safety and reach-avoidance and design template-based synthesis algorithms for their computation. Our paper follows this line of work and introduces distributional certificates and template-based algorithms for distributional ω-regular specifications, hence significantly generalizing the class of distributional properties that we can automatically reason about.

Certificates were also used for reasoning about infinite-state probabilistic models such as probabilistic programs under the state-based view. In particular, supermartingale certificates were proposed for qualitative reachability [12, 15], quantitative reachability, safety and reach-avoidance [33, 19, 34, 18, 35], and most recently for qualitative ω-regular specifications [1]. However, these certificates are not, a priori, applicable to the distributional setting.

2 Preliminaries

In this section, we recall the basics of probabilistic systems and Markov decision processes. A probability distribution over a countable set X is a map μ: X → [0,1] such that ∑_{x∈X} μ(x) = 1. The support of μ is defined via supp(μ) = {x ∈ X ∣ μ(x) > 0}. We use Δ(X) to denote the set of all probability distributions over X.

MDPs.

A Markov decision process (MDP) is a tuple ℳ = (S, Act, P). We use S to denote a finite set of states and Act to denote a finite set of actions. Slightly overloading the notation, for each state s ∈ S, we write Act(s) ⊆ Act to denote the set of available actions at s. Finally, P: S × Act → Δ(S) is a transition function, assigning to each state s and available action a ∈ Act(s) a probability distribution over the successor states. When |Act(s)| = 1 for each state s, we say that ℳ is a Markov chain.

An infinite path in an MDP is a sequence ρ = s_1, a_1, s_2, a_2, … ∈ (S × Act)^ω such that a_i ∈ Act(s_i) and P(s_i, a_i)(s_{i+1}) > 0 for all i. A finite path ϱ in an MDP is a finite prefix of an infinite path that ends in a state. We use ρ_i and ϱ_i to denote the i-th state along an (in)finite path. We use IPaths_ℳ and FPaths_ℳ to denote the sets of all infinite and finite paths in the MDP ℳ, respectively.

Semantics of MDPs.

The semantics of MDPs are formalized in terms of strategies. A strategy (or policy) in an MDP is a function π: FPaths_ℳ → Δ(Act) which to each finite path (called a history) assigns a probability distribution over the action to be taken next. We require that, if a finite path ϱ ∈ FPaths_ℳ ends in a state s, then supp(π(ϱ)) ⊆ Act(s). A strategy is said to be memoryless if the probability distribution over actions depends only on the last state of the finite path and not on the whole history, i.e. if π(ϱ) = π(ϱ′) whenever ϱ and ϱ′ end in the same state. For every initial state distribution μ_0 ∈ Δ(S), an MDP ℳ and a strategy π together give rise to a probability space over the set of all infinite paths in the MDP [8]. We denote by ℙ^π_{μ_0} the probability measure and by 𝔼^π_{μ_0} the expectation operator in this probability space, omitting the MDP ℳ from the notation when clear from the context.

MDPs as distribution transformers.

MDPs are typically regarded as random generators of infinite paths, giving rise to a probability space over the set of all infinite paths in the MDP. Classical probabilistic model checking problems then explore the expected behaviour of these randomly generated infinite paths, giving rise to path properties [8]. However, one can also view MDPs as (deterministic) transformers of distributions.

Consider an MDP ℳ, a strategy π, and an initial state distribution μ_0 ∈ Δ(S). For each i and state s, define μ_i(s) = ℙ^π_{μ_0}[ρ ∈ IPaths_ℳ ∣ ρ_i = s], i.e. the probability that the i-th state of a randomly generated infinite path is s. We write μ_i = ℳπ(μ_0, i) for the induced probability distribution of the i-th state of a randomly generated infinite path. Hence, the MDP ℳ, a strategy π, and an initial state distribution μ_0 ∈ Δ(S) together give rise to a sequence of probability distributions over the MDP states

μ_0, μ_1 = ℳπ(μ_0, 1), μ_2 = ℳπ(μ_0, 2), μ_3 = ℳπ(μ_0, 3), …

One can then study properties of this sequence of distributions. Some examples are distributional reachability and distributional safety, which ask if the induced sequence of distributions contains or does not contain an element of some specified set of distributions [4, 5].
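To make the distribution transformer view concrete, the following minimal Python sketch (not the paper's implementation; the two-state MDP, its probabilities and the memoryless strategy are illustrative only) applies the single-step operator repeatedly and prints the induced distribution sequence.

```python
# A sketch of the distribution transformer view under a memoryless strategy.
# P[s][a] is the successor distribution of taking action a in state s;
# pi[s][a] is the probability of playing a in s. All values are illustrative.
import numpy as np

P = {
    0: {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])},
    1: {0: np.array([0.5, 0.5])},
}
pi = {0: {0: 0.3, 1: 0.7}, 1: {0: 1.0}}

def step(mu):
    """One application of the single-step distribution transformer."""
    nxt = np.zeros_like(mu)
    for s, actions in P.items():
        for a, succ in actions.items():
            nxt += mu[s] * pi[s][a] * succ  # mass of s, split by pi, pushed by P
    return nxt

mu = np.array([1.0, 0.0])
for i in range(5):
    print(f"mu_{i} =", mu)
    mu = step(mu)
```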

𝝎-regular specifications.

In this work we will consider ω-regular specifications, which subsume a broad class of specifications such as those belonging to linear temporal logic (LTL) or computation tree logic (CTL) [8]. An ω-regular specification over a finite set AP of atomic propositions is defined by a non-deterministic Büchi automaton (NBA) N = (Q, Σ, δ, q_0, F), where Q is a finite set of states, Σ = 2^AP is a finite set of letters, δ: Q × Σ → 2^Q is a (non-deterministic) transition function, q_0 ∈ Q is the initial state and F ⊆ Q is the set of accepting states. An infinite word of letters σ_1, σ_2, … ∈ Σ^ω is said to be accepting if it gives rise to at least one accepting run in N, i.e. if there exists a run q_0, q_1, q_2, … such that q_{i+1} ∈ δ(q_i, σ_i) for each i and such that q_i ∈ F for infinitely many i. Note that a given LTL formula φ can be converted to an equivalent NBA N_φ in exponential time (see e.g. [8]). In what follows, we will often write examples and benchmarks in LTL as this is easier and often more intuitive. But for our analysis, we will convert them to equivalent NBAs and reason only about these NBAs as the ω-regular specification.
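Since our pipeline (Section 6) obtains the NBA from an LTL formula via Spot, here is a small sketch of that conversion, assuming Spot's Python bindings are installed; the proposition name p is a hypothetical stand-in for a distributional predicate.

```python
# A sketch of LTL-to-NBA translation, assuming Spot's Python bindings;
# "p" stands in for a distributional atomic proposition.
import spot

aut = spot.translate('G F p', 'BA')  # state-based Buchi automaton for GF p
print(aut.num_states(), 'states; initial state:', aut.get_init_state_number())
print(aut.to_str('hoa'))             # inspect the automaton in HOA format
```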

Transition systems.

In order to reason about the synchronous evolution of a sequence of distributions over the MDP states and a run in the NBA, we will later introduce the notion of product distributional transition systems in Section 4. Hence, we here recall the notion of transition systems, which are commonly used to model imperative numerical programs.

An (infinite-state) transition system is a tuple 𝒯 = (L, V, l_init, θ_init, ↪), where

  • L is a finite set of locations,

  • V is a finite set of real-valued variables,

  • l_init ∈ L is the initial location,

  • θ_init ⊆ ℝ^{|V|} is the set of initial variable valuations, and

  • ↪ is a finite set of transitions of the form τ = (l_τ, l′_τ, G_τ, U_τ), with l_τ a source location, l′_τ a target location, G_τ a guard, which is a Boolean predicate over the variables in V, and U_τ: ℝ^{|V|} → ℝ^{|V|} an update function.

A state in the transition system is a tuple (l, x) ∈ L × ℝ^{|V|} consisting of a location in L and a valuation of the variables in V. A transition τ = (l_τ, l′_τ, G_τ, U_τ) is said to be enabled at a state (l, x) if l = l_τ and x ⊨ G_τ. An infinite path (or a run) in the transition system is a sequence of states (l_0, x_0), (l_1, x_1), … with l_0 = l_init, x_0 ∈ θ_init, and where for each i ≥ 0 there exists a transition τ_i = (l_i, l_{i+1}, G_{τ_i}, U_{τ_i}) enabled at (l_i, x_i) such that x_{i+1} = U_{τ_i}(x_i). A state (l, x) is said to be reachable if there exists an infinite path that contains (l, x).

3 Problem Statement

We now formally define the problems that we consider in this work. Our goal is to design fully automated algorithms for formal verification and control in MDPs with respect to distributional ω-regular specifications. Hence, we first need to formalize the notion of distributional ω-regular specifications. In what follows, let ℳ = (S, Act, P) be an MDP.

Distributional 𝝎-regular specifications.

Similarly to the classical ω-regular specification setting, we first need to specify a finite set of atomic propositions AP. We are interested in reasoning about a sequence of distributions induced by an MDP under a strategy. Hence, we let the set AP consist of finitely many logical formulas of the form exp(μ(s_1), …, μ(s_{|S|})) ≥ 0. Here, exp: ℝ^{|S|} → ℝ is an arithmetic expression over the probabilities μ(s_1), …, μ(s_{|S|}) of being in each state of the MDP, where s_1, …, s_{|S|} is an arbitrary (but throughout fixed) enumeration of MDP states. In practice, we let AP contain exactly those atomic propositions that appear in the property that we want to reason about. A distributional ω-regular specification φ is then defined by an NBA N_φ = (Q, Σ, δ, q_0, F) with Σ = 2^AP.

We now define the semantics of distributional ω-regular specifications. Consider a finite set of atomic propositions AP, a distributional ω-regular specification φ, a strategy π and an initial distribution μ_0 ∈ Δ(S) in the MDP ℳ. The MDP under strategy π from the initial distribution μ_0 induces an infinite word σ_0, σ_1, σ_2, … over the alphabet 2^AP as follows. As defined in Section 2, an MDP under strategy π from the initial distribution μ_0 induces a sequence μ_0, μ_1, μ_2, … of distributions over MDP states. Then, for each i ≥ 0, we define the letter σ_i as the set of all atomic propositions in AP that are satisfied at the distribution μ_i, i.e. σ_i = {p ∈ AP ∣ μ_i ⊨ p}, where we use ⊨ to denote proposition satisfaction. We say that the MDP ℳ satisfies the distributional ω-regular specification φ under strategy π from initial distribution μ_0 ∈ Δ(S) if this infinite word σ_0, σ_1, σ_2, … is accepted by the NBA N_φ.
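To illustrate this induced-word semantics, the following sketch (the proposition and threshold are taken from the running example introduced below; the helper itself is ours, not the paper's) computes the letter σ_i from a distribution μ_i.

```python
# A sketch of computing sigma_i = { p in AP | mu_i |= p } for one distribution.
AP = {
    'p': lambda mu: mu['B'] >= 0.249,  # the AP "mu(B) >= 0.249"
}

def letter(mu):
    """All atomic propositions satisfied at distribution mu."""
    return frozenset(name for name, pred in AP.items() if pred(mu))

print(letter({'A': 1/3, 'B': 1/3, 'C': 1/3}))  # frozenset({'p'})
```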

Distributionally memoryless strategies.

We restrict our attention to a class of strategies called distributionally memoryless strategies. A strategy π: FPaths_ℳ → Δ(Act) is said to be distributionally memoryless if the probability distribution over actions prescribed by the strategy depends only on the current distribution over the MDP states and not on the whole history. Formally, we require that for any initial distribution μ_0 ∈ Init and for any two finite runs ρ = s_0, a_0, s_1, a_1, …, s_n and ρ′ = s′_0, a′_0, s′_1, a′_1, …, s′_n that induce sequences of probability distributions μ_0, μ_1, …, μ_n and μ_0, μ′_1, …, μ′_n with μ_n = μ′_n, we have π(ρ) = π(ρ′). When the strategy π is distributionally memoryless, we write ℳπ(μ) = ℳπ(μ, 1) to denote an application of the single-step distribution transformer operator.

It was shown in [4, 5] that distributionally memoryless strategies are sufficient for reasoning about distributional reachability, safety and reach-avoidance specifications. That is, for each of these distributional specifications, there exists a strategy in the MDP under which the specification is satisfied if and only if there exists a distributionally memoryless strategy in the MDP under which the specification is satisfied. While this result need not hold for all distributional ω-regular specifications, the restriction to distributionally memoryless strategies is needed to enable automated verification and synthesis, as such strategies can be represented in a compact form.

Problem statement.

We are now ready to define our strategy verification and synthesis problems. Consider an MDP ℳ, a set of initial distributions Init ⊆ Δ(S), and a distributional ω-regular specification φ:

  1. Strategy verification problem. Given a distributionally memoryless strategy π, verify that the MDP ℳ satisfies φ under π from every initial distribution μ_0 ∈ Init.

  2. Strategy synthesis problem. Compute a distributionally memoryless strategy π such that the MDP ℳ satisfies φ under π from every initial distribution μ_0 ∈ Init.

Figure 1: An MDP ℳ which will serve as our running example. The MDP contains three states S = {A, B, C} and two actions Act = {a, b} with Act(A) = {a, b}, Act(B) = {a}, Act(C) = {a}, and its transition function is defined via P(A, a)(A) = 1, P(A, b)(B) = 1, P(B, a)(C) = 1, P(C, a)(C) = P(C, a)(A) = 0.5. We consider a singleton initial distribution set Init = {(A: 1/3, B: 1/3, C: 1/3)}.
Example 1 (Running example).

The MDP ℳ shown in Fig. 1 was considered in [4] for studying distributional safety specifications and it will serve as our running example. We consider the strategy synthesis and verification problems with respect to the distributional ω-regular specification φ = GF(μ(B) ≥ 0.249). For readability, we specify φ as an LTL formula over the set of atomic propositions AP = {(μ(B) ≥ 0.249)}. This is an example of a distributional persistence specification, which specifies that the sequence of distributions μ_0, μ_1, … should contain infinitely many distributions μ_i with μ_i(B) ≥ 0.249. The goal of strategy synthesis is to compute a strategy under which φ is satisfied. An example of such a strategy is a (distributionally) memoryless strategy π which in state A takes action b with probability 1. The goal of strategy verification is to verify this claim.
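The following short simulation sketch (transition probabilities taken from Fig. 1; the code itself is ours) illustrates why π works: under π the distribution converges towards μ(B) = 0.25, so μ_i(B) ≥ 0.249 holds from some point on, and in particular infinitely often.

```python
# A sketch simulating the running example under the strategy that always
# plays b in A: A -> B, B -> C, C -> {C: 0.5, A: 0.5}.
def step(mu):
    a, b, c = mu
    return (0.5 * c, a, b + 0.5 * c)

mu = (1/3, 1/3, 1/3)
for i in range(15):
    print(f"mu_{i}(B) = {mu[1]:.4f}  (>= 0.249: {mu[1] >= 0.249})")
    mu = step(mu)
```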

 Remark 2 (Problem hardness).

The problem of determining whether an MDP ℳ under a distributionally memoryless strategy π satisfies a distributional ω-regular specification φ is computationally hard. It was shown to be Skolem-hard already in the very restricted setting where ℳ is a Markov chain (so the strategy π is trivial) and φ is a distributional reachability specification for an affine set of goal distributions {μ ∈ Δ(S) ∣ μ(s_1) = 0.25} [3].

 Remark 3 (Universal and existential satisfaction problems).

In the terminology of [5], which considered distributional reachability and safety specifications, our problem corresponds to the universal satisfaction setting, where the specification needs to be satisfied from every initial distribution μ_0 ∈ Init. Dually, one can also consider the existential satisfaction setting, where the specification needs to be satisfied from at least one initial distribution μ_0 ∈ Init. While we focus on the universal satisfaction setting for ease of presentation, we also show that all our results straightforwardly extend to the existential satisfaction setting.

 Remark 4 (Memoryless vs distributionally memoryless strategies).

Note that distributionally memoryless strategies are not necessarily memoryless (in the “classical” state-based sense). This fact was already shown in [4] for distributional safety specifications, where one may require infinite memory as well as randomized strategies in order to satisfy the specification. This is in stark contrast with the state-based view, where deterministic memoryless strategies are sufficient for specifications such as reachability and safety [8].

4 Certificate for Distributional 𝝎-regular Specifications

We now present our sound and complete certificate for proving that an MDP under a distributionally memoryless strategy satisfies some distributional ω-regular specification, which is the main theoretical contribution of this work. In Section 5, we will present our algorithms for automated synthesis and verification of strategies in MDPs with respect to distributional ω-regular specifications, where the certificate will play a central role.

In what follows, we fix an MDP ℳ = (S, Act, P), a set of initial distributions Init, a distributionally memoryless strategy π, and a distributional ω-regular specification φ defined over atomic propositions AP with an NBA N_φ = (Q, Σ, δ, q_0, F), where Σ = 2^AP.

4.1 Product Distributional Transition System

Recall from Section 2 that, for each initial distribution in Init, the MDP ℳ and the strategy π induce a sequence of distributions over the MDP states. This sequence gives rise to an infinite word over the alphabet 2^AP and a run in the NBA N_φ. In what follows, we introduce product distributional transition systems (PDTS), which will allow us to synchronously reason about the distribution sequence and the NBA run.

Definition 5 (Product distributional transition system).

Let ℳ = (S, Act, P) be an MDP, Init be a set of initial distributions, π be a distributionally memoryless strategy, and N_φ = (Q, 2^AP, δ, q_0, F) be an NBA for some distributional ω-regular specification φ defined over atomic propositions AP. A product distributional transition system (PDTS) is a transition system 𝒯^× = (L^×, V^×, l^×_init, θ^×_init, ↪^×), where

  • L^× = Q is the set of states of N_φ;

  • V^× = {μ_1, …, μ_{|S|}} is a finite set of real-valued variables, with each variable μ_i corresponding to the probability of being in the MDP state s_i;

  • l^×_init = q_0 is the initial state of N_φ;

  • θ^×_init = Init is the set of initial distributions in ℳ; and

  • ↪^× = {(q, q′, G(σ), ℳπ) ∣ q, q′ ∈ Q, σ ∈ 2^AP, q′ ∈ δ(q, σ)}, where G(σ) = (⋀_{p ∈ σ} p) ∧ (⋀_{p ∈ AP∖σ} ¬p) is the predicate defined by the atomic propositions contained in σ, and ℳπ is the linear function defined by the single-step distribution transformer operator of ℳ and π.

Example 6.

Fig. 2 (left) shows the NBA for the distributional specification φ = GF(μ(B) ≥ 0.249) considered in Example 1. Fig. 2 (right) then shows the PDTS of our running example MDP in Fig. 1 and this NBA. The PDTS has the same set of locations L^× = {q_0, q_1} as the NBA and the set of variables V^× = {μ_1, μ_2, μ_3} corresponding to the probabilities of being in the MDP states A, B, C. The initial location is l^×_init = q_0 and the set of initial distributions is Init = {(A: 1/3, B: 1/3, C: 1/3)}. Finally, the three PDTS transitions are shown in Fig. 2.

Note that a PDTS indeed models a synchronous execution of a sequence of distributions over MDP states and a run in the NBA. Each infinite path (q_0, μ_0), (q_1, μ_1), … in the PDTS starts from a state (q_0, μ_0) ∈ {q_0} × Init. Then, for each state (q_i, μ_i) along the infinite path, the next state is obtained by applying some enabled transition (q_i, q_{i+1}, G(σ_i), ℳπ). For the transition to be enabled, σ_i = {p ∈ AP ∣ μ_i ⊨ p} must be the unique letter defined by all atomic propositions satisfied in μ_i. The PDTS then moves to a state (q_{i+1}, μ_{i+1}), where μ_{i+1} = ℳπ(μ_i) is indeed the next distribution in the sequence and q_{i+1} ∈ δ(q_i, σ_i) is indeed a successor state in the NBA.
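The following sketch (ours, with an illustrative deterministic Büchi automaton for GF p; the actual NBA of Fig. 2 may be presented differently) computes the PDTS successors of a state (q, μ) exactly as described above: first the unique letter σ, then one NBA successor per enabled transition, with the distribution updated by ℳπ.

```python
# A sketch of one synchronous PDTS step for the running example.
AP = {'p': lambda mu: mu['B'] >= 0.249}

def m_pi(mu):  # the single-step operator from Fig. 2
    return {'A': 0.5 * mu['C'], 'B': mu['A'], 'C': mu['B'] + 0.5 * mu['C']}

delta = {  # illustrative NBA transitions for GF p: (q, sigma) -> successors
    ('q0', frozenset({'p'})): {'q1'}, ('q0', frozenset()): {'q0'},
    ('q1', frozenset({'p'})): {'q1'}, ('q1', frozenset()): {'q0'},
}

def pdts_successors(q, mu):
    sigma = frozenset(p for p, pred in AP.items() if pred(mu))  # unique letter
    return [(q2, m_pi(mu)) for q2 in delta.get((q, sigma), set())]

print(pdts_successors('q0', {'A': 1/3, 'B': 1/3, 'C': 1/3}))
```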

Figure 2: The figure on the left shows the NBA for the distributional specification φ = GF(μ(B) ≥ 0.249). The figure on the right then shows the PDTS of the MDP in Fig. 1 with the strategy that in state A takes action b with probability 1, and the NBA on the left. Each PDTS transition is labeled with its guard (top line) and its update function (bottom line). We write μ′ = ℳπ(μ) as shorthand for μ′ = {A: 0.5·μ(C), B: μ(A), C: μ(B) + 0.5·μ(C)}.

An infinite path (q_0, μ_0), (q_1, μ_1), … in the PDTS is accepting if (q_i, μ_i) ∈ F × Δ(S) for infinitely many i ≥ 0, i.e. if it visits states with locations belonging to the accepting set of the NBA infinitely often. The following proposition formalizes the relationship between the satisfaction of a distributional ω-regular specification in the MDP and the existence of an accepting infinite path in the PDTS. We state the proposition for a single initial distribution μ_0 ∈ Init, so that it is applicable both in the universal and the existential satisfaction problem settings (see Remark 3).

Proposition 7.

An MDP ℳ with an initial distribution μ_0 ∈ Δ(S) satisfies a distributional ω-regular specification φ under a distributionally memoryless strategy π if and only if there exists an accepting infinite path in the PDTS 𝒯^× = (L^×, V^×, l^×_init, θ^×_init, ↪^×).

Proof.

Suppose that the MDP ℳ with initial distribution μ_0 satisfies the distributional ω-regular specification φ under the distributionally memoryless strategy π. Let μ_0, μ_1, … be the sequence of distributions induced by the MDP under strategy π, and let σ_0, σ_1, … be the induced infinite word from the initial distribution μ_0. By the definition of satisfaction of distributional ω-regular specifications in Section 3, the infinite word σ_0, σ_1, … is accepted by the NBA N_φ. Hence, there exists a run q_0, q_1, … in N_φ such that q_{i+1} ∈ δ(q_i, σ_i) for each i and such that accepting states are visited infinitely often. But this means that (q_0, μ_0), (q_1, μ_1), … is an accepting infinite path in the PDTS 𝒯^× = (L^×, V^×, l^×_init, θ^×_init, ↪^×), which proves one direction of the proposition.

Conversely, suppose that there exists an accepting infinite path (q_0, μ_0), (q_1, μ_1), … in the PDTS 𝒯^× = (L^×, V^×, l^×_init, θ^×_init, ↪^×). Then, by the definition of the transition update functions in the PDTS, we know that μ_0, μ_1, … is the sequence of distributions induced by the MDP under strategy π. Moreover, by the definition of the transition guards in the PDTS, we know that q_{i+1} ∈ δ(q_i, σ_i) for each i, with σ_i being the unique letter defined by the atomic propositions in AP satisfied in μ_i. Hence, q_0, q_1, … is an infinite run in the NBA N_φ induced by the infinite word σ_0, σ_1, …. But from the fact that (q_0, μ_0), (q_1, μ_1), … is an accepting path in the PDTS, it follows that q_i ∈ F for infinitely many i, and so the infinite word σ_0, σ_1, … is accepted by the NBA N_φ. By the definition of satisfaction of distributional ω-regular specifications in Section 3, this implies that the MDP ℳ with initial distribution μ_0 satisfies specification φ under strategy π, which concludes our proof.

4.2 Distributional Certificates

We are now ready to define our notion of distributional certificates. A distributional certificate is a pair (𝒞, I) that consists of two components: a distributional Büchi ranking function 𝒞 and a distributional invariant I. The distributional Büchi ranking function 𝒞: Q × Δ(S) → ℝ is a function that to each state of the PDTS assigns a real value and that is required to satisfy two conditions. First, the Initial condition requires the value of 𝒞 to be non-negative at all initial states of the PDTS. Second, the Büchi ranking condition requires that, for every reachable state of the PDTS at which the value of 𝒞 is non-negative, there exists at least one successor state at which non-negativity is preserved; furthermore, the value of 𝒞 must decrease by at least 1 if the state is not contained in the accepting set of the PDTS. We prove in Theorem 9 below that these two conditions are necessary and sufficient to ensure that, for every initial state (q_0, μ_0) of the PDTS, there exists an accepting infinite path in the PDTS.

Note that the Büchi ranking condition needs to be satisfied only at reachable states of the PDTS. However, computing the exact set of reachable states is infeasible in general. Hence, with later automation in mind, we augment our certificate with a distributional invariant I ⊆ Q × Δ(S), which is a set that over-approximates the set of reachable states of the PDTS. This is ensured by extending the Initial condition to require that all initial states of the PDTS are contained in the invariant I, and by extending the Büchi ranking condition to require that the successor state described above is also contained in the invariant I.

The following definition formalizes this intuition. In what follows, for each letter σ ∈ 2^AP and distribution μ ∈ Δ(S), we write μ ⊨ G(σ) as a shorthand for μ ⊨ (⋀_{p ∈ σ} p) ∧ (⋀_{p ∈ AP∖σ} ¬p).

Definition 8 (Distributional certificate).

A distributional certificate for an MDP ℳ with a set of initial distributions Init, a distributionally memoryless strategy π, and a distributional ω-regular specification φ with NBA N_φ, is a tuple (𝒞, I) consisting of a function 𝒞: Q × Δ(S) → ℝ and a set I ⊆ Q × Δ(S) such that the following conditions hold:

  • Initial condition. For all μ_0 ∈ Init, we have 𝒞(q_0, μ_0) ≥ 0 and (q_0, μ_0) ∈ I.

  • Büchi ranking condition. We have the following:

    • Non-negativity at accepting states. For all NBA states q ∈ F and letters σ ∈ 2^AP,

      ∀μ ∈ ℝ^{|S|}. ( μ ∈ Δ(S) ∧ μ ⊨ G(σ) ∧ 𝒞(q, μ) ≥ 0 ∧ (q, μ) ∈ I ) ⟹ ⋁_{q′ ∈ δ(q, σ)} ( 𝒞(q′, ℳπ(μ)) ≥ 0 ∧ (q′, ℳπ(μ)) ∈ I ). (1)
    • Strict decrease and non-negativity at non-accepting states. For all NBA states q ∉ F and letters σ ∈ 2^AP,

      ∀μ ∈ ℝ^{|S|}. ( μ ∈ Δ(S) ∧ μ ⊨ G(σ) ∧ 𝒞(q, μ) ≥ 0 ∧ (q, μ) ∈ I ) ⟹ ⋁_{q′ ∈ δ(q, σ)} ( 𝒞(q, μ) − 1 ≥ 𝒞(q′, ℳπ(μ)) ≥ 0 ∧ (q′, ℳπ(μ)) ∈ I ). (2)
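To build intuition for conditions (1) and (2), the following sketch (ours, not part of the paper's algorithm) falsification-tests them at a single sampled state; 𝒞, the invariant, ℳπ, δ, AP and F are assumed to be given as Python objects, and random sampling can refute but never prove the conditions.

```python
# A sketch that spot-checks the Buchi ranking condition at one PDTS state.
import random

def sample_distribution(n):
    """Uniform sample from the n-dimensional probability simplex."""
    cuts = [0.0] + sorted(random.random() for _ in range(n - 1)) + [1.0]
    return [cuts[i + 1] - cuts[i] for i in range(n)]

def check_once(C, inv, m_pi, delta, AP, F, q, mu):
    sigma = frozenset(p for p, pred in AP.items() if pred(mu))
    if C(q, mu) < 0 or not inv(q, mu):
        return True  # premise of (1)/(2) fails; nothing to check at (q, mu)
    nxt = m_pi(mu)
    for q2 in delta.get((q, sigma), set()):
        ok = C(q2, nxt) >= 0 and inv(q2, nxt)
        if q not in F:  # condition (2) additionally demands strict decrease
            ok = ok and C(q, mu) - 1 >= C(q2, nxt)
        if ok:
            return True  # some successor witnesses the condition
    return False  # counterexample to (1)/(2) found at (q, mu)

print(sample_distribution(3))
```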

The following theorem establishes that distributional certificates provide a sound and complete proof rule for proving that an MDP under a distributionally memoryless strategy satisfies a distributional ω-regular specification.

Theorem 9 (Soundness and completeness).

An MDP ℳ with a set of initial distributions Init under a distributionally memoryless strategy π satisfies a distributional ω-regular specification φ if and only if there exists a distributional certificate for ℳ, Init, π and φ.

Proof.

Soundness.

Suppose that there exists a distributional certificate (𝒞, I) for ℳ, Init, π and φ. To show that φ is satisfied, by Proposition 7 it suffices to show that the PDTS 𝒯^× admits an accepting infinite path from every initial state in {q_0} × Init.

Fix an initial state (q_0, μ_0) ∈ {q_0} × Init. By the Initial condition in Definition 8, we know that 𝒞(q_0, μ_0) ≥ 0 and (q_0, μ_0) ∈ I. Hence, by the Büchi ranking condition in Definition 8, we can repeatedly select successor states in order to obtain an infinite path (q_0, μ_0), (q_1, μ_1), … in 𝒯^× such that, for each i ≥ 0, we have

  • 𝒞(q_i, μ_i) ≥ 0 and (q_i, μ_i) ∈ I, and

  • whenever q_i ∉ F is not an accepting state in N_φ, we have 𝒞(q_i, μ_i) − 1 ≥ 𝒞(q_{i+1}, μ_{i+1}).

We claim that (q_0, μ_0), (q_1, μ_1), … is an accepting infinite path in 𝒯^×. To prove this, note that from every (q_i, μ_i) with q_i ∉ F, the value of 𝒞 keeps decreasing by at least 1 in each subsequent step while also remaining non-negative. Hence, in at most 𝒞(q_i, μ_i) steps, the path must again reach an accepting state. Thus, the infinite path (q_0, μ_0), (q_1, μ_1), … reaches accepting states in F × Δ(S) infinitely many times and is an accepting infinite path. Since the initial state (q_0, μ_0) ∈ {q_0} × Init was arbitrary, this concludes the proof.

Completeness.

Conversely, suppose that ℳ with a set of initial distributions Init under the distributionally memoryless strategy π satisfies the distributional ω-regular specification φ. We construct an instance (𝒞, I) of a distributional certificate for ℳ, Init, π and φ as follows.

Consider an arbitrary but throughout fixed enumeration q_1, …, q_{|Q|} of the NBA states. We define an operator Next: Q × Δ(S) → Q × Δ(S) via

  • Next(q, μ) = (q_i, ℳπ(μ)), with i the smallest index such that (q, μ), (q_i, ℳπ(μ)) are consecutive states along some accepting infinite path in the PDTS, if such an accepting infinite path exists, or

  • Next(q,μ)=(q,μ), otherwise.

In other words, Next(q,μ) fixes a successor state of (q,μ) along some accepting infinite path in the PDTS if such a path exists, or halts the sequence at the state (q,μ) otherwise. Therefore, the transitive closure of the operator Next(q,μ) from the set of initial PDTS states {q0}×𝖨𝗇𝗂𝗍 allows us to consistently fix a unique accepting infinite path for each state (q,μ) that is contained along some accepting infinite path.

We can now define our distributional certificate (𝒞, I). Let the distributional invariant I ⊆ Q × Δ(S) be the set of all states in the PDTS that are reachable from {q_0} × Init under the transitive closure of the Next operator. Moreover, for each PDTS state (q, μ) ∈ I, let d_accept(q, μ) denote the number of applications of the Next operator until an accepting state in F × Δ(S) is reached, with d_accept(q, μ) = 0 if (q, μ) ∈ F × Δ(S) is itself an accepting state. Finally, let the distributional Büchi ranking function 𝒞 be defined via

𝒞(q, μ) = d_accept(q, μ) if (q, μ) ∈ I, and 𝒞(q, μ) = −1 otherwise.

We claim that (𝒞, I) is an instance of a distributional certificate. The Initial condition in Definition 8 is satisfied since the MDP under strategy π satisfies φ; therefore, every initial state (q_0, μ) ∈ {q_0} × Init belongs to some accepting infinite path in the PDTS, and so (q_0, μ) ∈ I and d_accept(q_0, μ) ≥ 0. On the other hand, by the definition of the Next operator and of I, for each (q, μ) ∈ I we also have Next(q, μ) ∈ I. Moreover, 𝒞(Next(q, μ)) = d_accept(q, μ) − 1 ≥ 0 if q ∉ F, and 𝒞(Next(q, μ)) ≥ 0 if q ∈ F. Hence, the Büchi ranking condition in Definition 8 is also satisfied, and (𝒞, I) is a distributional certificate.

Example 10.

Consider again the MDP ℳ in Fig. 1. As in Example 1, consider the strategy π which in state A takes action b with probability 1, and the distributional specification φ = GF(μ(B) ≥ 0.249). An NBA for φ and the resulting PDTS are shown in Fig. 2. The following is an example of a distributional certificate (𝒞, I) for ℳ, Init, π and φ:

𝒞(q, μ) = 1 + 250·μ(A) + 750·μ(C) if q = q_0, and 𝒞(q, μ) = 1.25 − 1.25·μ(C) if q = q_1,

and I = {(q_0, μ) ∣ 1.25 + μ(A) + 1·μ(B) − μ(C) ≥ 0} ∪ {(q_1, μ) ∣ μ(A) + 0.25 ≥ μ(B)}.

One can verify by inspection that the Initial condition and the Büchi ranking condition in Definition 8 are satisfied. We note that the above distributional certificate (𝒞,I) is the certificate computed by our prototype implementation in Section 6.

 Remark 11 (Distributional certificates for the existential problem).

As discussed in Remark 3, our distributional certificate in Definition 8 and our soundness and completeness result in Theorem 9 consider the universal satisfaction setting. However, their extension to the existential setting is immediate. The only required change in the definition of distributional certificates is to require the Initial condition in Definition 8 to hold for some initial distribution μ_0 ∈ Init. The soundness and completeness proof then proceeds analogously to the proof of Theorem 9, with the difference in the completeness proof being that the distributional invariant I is defined as the transitive closure of the Next operator from the singleton initial set {(q_0, μ_0)}, rather than from the initial set {q_0} × Init.

5 Template-based Strategy Verification and Synthesis with Certificates

We now present our algorithms for the strategy verification and synthesis problems for distributional ω-regular specifications. The core of the verification algorithm is to synthesize an affine distributional certificate in the PDTS of the input MDP and the specification, which proves that the specification is satisfied. When we move from strategy verification to strategy synthesis, we also synthesize an affine distributionally memoryless strategy.

The restriction to affine distributional certificates and affine distributionally memoryless strategies is needed to ensure tractability. While Theorem 9 establishes soundness and completeness of our distributional certificates, in combination with Remark 2 it also implies that giving a sound and complete algorithm for synthesizing distributional certificates is Skolem-hard. Hence, in this section, we instead focus on designing sound and relatively complete algorithms for synthesizing an affine distributional certificate together with an affine distributionally memoryless strategy (the latter for the synthesis problem), when they exist.

Affine distributional certificates and strategies.

We first formalize the notions of affine distributional certificates and affine distributionally memoryless strategies:

  • Affine distributional certificates. A distributional certificate (𝒞, I) is said to be affine if both the distributional Büchi ranking function 𝒞 and the distributional invariant I can be expressed in terms of affine expressions and inequalities over the space Δ(S) of probability distributions over MDP states. We require 𝒞 to be of the form

    𝒞(q, μ) = ∑_{i=1}^{|S|} a_i^q·μ(s_i) + b^q, (3)

    where a_1^q, …, a_{|S|}^q, b^q are real-valued coefficients for each NBA state q ∈ Q. That is, for each NBA state q, the function 𝒞(q, ·) is an affine function over the probabilities of being in each MDP state, with μ(s_1), …, μ(s_{|S|}) being the variables that capture the probabilities of being in each MDP state and a_1^q, …, a_{|S|}^q, b^q being the coefficients of the affine function. Similarly, we require I to be a set defined by a conjunction of N_I affine inequalities over the probabilities of being in each MDP state, i.e. to be of the form

    I = {(μ, q) ∈ Δ(S) × Q ∣ ⋀_{k=1}^{N_I} I_k(q, μ) ≥ 0}, (4)

    where each I_k(q, μ) = ∑_{i=1}^{|S|} c_i^{k,q}·μ(s_i) + d^{k,q} and c_1^{k,q}, …, c_{|S|}^{k,q}, d^{k,q} are real-valued coefficients for each NBA state q ∈ Q and each k ∈ {1, …, N_I}. The number N_I is referred to as the size of the invariant and will be an algorithm parameter.

  • Affine distributionally memoryless strategies. A distributionally memoryless strategy π: Δ(S) → Δ(Act) is said to be affine if, for each state s ∈ S, action a ∈ Act and state distribution μ ∈ Δ(S), the probability of taking action a in state s given the current distribution μ over states is of the form

    π(s, a)(μ) = ( ∑_{i=1}^{|S|} e_{i,s,a}^π·μ(s_i) + f_{s,a}^π ) / ( ∑_{i=1}^{|S|} g_{i,s}^π·μ(s_i) + h_s^π ), (5)

    where e_{1,s,a}^π, …, e_{|S|,s,a}^π, f_{s,a}^π and g_{1,s}^π, …, g_{|S|,s}^π, h_s^π are real-valued constants. The denominator is used to normalize the probabilities so that the probabilities of all actions taken at a state s sum to 1.
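As an illustration of how such templates look symbolically, the following sketch (ours; the state names, N_I and symbol names are illustrative) builds the templates of eq. (3) and eq. (4) with sympy, leaving the coefficients as the unknowns to be solved for later.

```python
# A sketch of symbolic affine templates for C (eq. (3)) and I (eq. (4)).
import sympy as sp

Q, S, N_I = ['q0', 'q1'], ['A', 'B', 'C'], 1
mu = {s: sp.Symbol(f'mu_{s}') for s in S}

C_template, I_template, unknowns = {}, {}, []
for q in Q:
    a = [sp.Symbol(f'a_{q}_{s}') for s in S]
    b = sp.Symbol(f'b_{q}')
    C_template[q] = sum(ai * mu[s] for ai, s in zip(a, S)) + b  # eq. (3)
    unknowns += a + [b]
    rows = []
    for k in range(N_I):
        c = [sp.Symbol(f'c_{q}_{k}_{s}') for s in S]
        d = sp.Symbol(f'd_{q}_{k}')
        rows.append(sum(ci * mu[s] for ci, s in zip(c, S)) + d)  # row of (4)
        unknowns += c + [d]
    I_template[q] = rows  # invariant at q: conjunction of rows >= 0

print(C_template['q0'])
```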

Algorithm input.

Both our verification and synthesis algorithms take as input an MDP ℳ = (S, Act, P), a set of initial distributions Init, and a distributional ω-regular specification φ defined over atomic propositions AP. We assume that the distributional ω-regular specification is provided via an NBA N_φ = (Q, Σ, δ, q_0, F) with letters Σ = 2^AP, which accepts the same set of infinite words over 2^AP as φ. Finally, the algorithms also take as input the size N_I of the invariant that needs to be synthesized. The verification algorithm in addition takes as input an affine distributionally memoryless strategy π.

Algorithm overview.

Both verification and synthesis algorithms follow a template-based synthesis approach and proceed in four steps. In the first step, the PDTS of the input MDP and the distributional specification is constructed. In the second step, the algorithms fix a symbolic template for the affine distributional certificate, i.e. symbolic variables for each real-valued coefficient in the affine expressions that define the certificate. The synthesis algorithm also fixes a symbolic template for the affine distributionally memoryless strategy. In the third step, the algorithms collect a system of constraints over the symbolic template variables that together encode all defining conditions of distributional certificates in Definition 8, as well as conditions for the strategy template to define a valid distributionally memoryless strategy (the latter for the synthesis algorithm). Finally, in the fourth step, the collected system of constraints is solved using an SMT solver, to get a concrete valuation of the symbolic template variables, which in turn gives rise to a distributional certificate and a distributionally memoryless strategy. In what follows, we detail each of these four steps.
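The following schematic driver (ours; every helper is a hypothetical stub, so the sketch stays self-contained) shows how the four steps fit together before each is detailed below.

```python
# A schematic four-step driver; all helpers are stubs standing in for the
# components described in Steps 1-4 below.
def build_pdts(mdp, nba):                   # Step 1: product construction
    return {'mdp': mdp, 'nba': nba}

def fix_templates(mdp, nba, n_inv):         # Step 2: symbolic C, I (and pi)
    return {'C': None, 'I': None, 'pi': None, 'N_I': n_inv}

def collect_constraints(pdts, tmpl, init):  # Step 3: Phi_init, Phi_Buchi, Phi_pi
    return []

def solve(constraints):                     # Step 4: quantifier elim. + SMT
    return None                             # a model for the template variables

def synthesize(mdp, init, nba, n_inv):
    pdts = build_pdts(mdp, nba)
    tmpl = fix_templates(mdp, nba, n_inv)
    return solve(collect_constraints(pdts, tmpl, init))
```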

Step 1: Constructing the PDTS.

In this step, the PDTS 𝒯^× = (L^×, V^×, l^×_init, θ^×_init, ↪^×) is constructed from the given MDP ℳ and the NBA N_φ, as explained in Section 4.1.

Step 2: Fixing templates.

The algorithms fix a template for the affine distributional certificate (𝒞,I), while the synthesis algorithm also fixes a template for the affine distributionally memoryless strategy π. The novelty, compared to prior work on verification and synthesis for distributional reachability and safety specifications [4, 5], lies in a more complex template design for the distributional certificate, which is now defined with respect to the PDTS:

  • Template for 𝒞. Recall that an affine distributional Büchi ranking function is of the form 𝒞(q, μ) = ∑_{i=1}^{|S|} a_i^q·μ(s_i) + b^q, as in eq. (3). Hence, the template for 𝒞 is defined by introducing a set of symbolic template variables a_1^q, …, a_{|S|}^q, b^q for each NBA state q ∈ Q.

  • Template for I. Similarly, the template for an affine distributional invariant I is of the form in eq. (4). Hence, the template for I is defined by introducing a set of symbolic template variables c_1^{k,q}, …, c_{|S|}^{k,q}, d^{k,q} for each NBA state q ∈ Q and each k ∈ {1, …, N_I}, where N_I is the algorithm parameter that specifies the size of the invariant.

  • Template for π (synthesis algorithm). The template for an affine distributionally memoryless strategy π is of the form in eq. (5), hence it is defined by introducing symbolic template variables e_{1,s,a}^π, …, e_{|S|,s,a}^π, f_{s,a}^π and g_{1,s}^π, …, g_{|S|,s}^π, h_s^π for each state s ∈ S and action a ∈ Act. Note that, in the special case when we are interested in synthesizing memoryless strategies instead of distributionally memoryless strategies, the strategy template becomes simpler: instead of the template in eq. (5), we introduce a single symbolic template variable p_{s,a}^π for each state-action pair s ∈ S and a ∈ Act, encoding the probability of taking action a in state s.

Step 3: Collecting constraints.

In this step, the algorithms collect a system of constraints over the symbolic template variables that together encode that 𝒞 and I indeed define a valid distributional certificate. For the synthesis algorithm, we also collect a system of constraints encoding that π defines a valid distributionally memoryless strategy. In each of the following constraints, each appearance of 𝒞, I and π is replaced by the symbolic template introduced in Step 2, in the form of eq. (3), (4) and (5), respectively. Moreover, we write μ ∈ Δ(S) for the conjunction of affine inequalities ⋀_{i=1}^{|S|} (μ_i ≥ 0) ∧ (μ_1 + ⋯ + μ_{|S|} = 1).

  • Initial condition. We define

    Φ_init ≡ ∀μ ∈ ℝ^{|S|}. ( μ ∈ Δ(S) ∧ μ ∈ Init ) ⟹ ( 𝒞(q_0, μ) ≥ 0 ∧ ⋀_{k=1}^{N_I} I_k(q_0, μ) ≥ 0 ).
  • Büchi ranking condition for accepting states. For each accepting state q ∈ F and letter σ ∈ 2^AP of the NBA, we define

    Φ_{Büchi,q,σ} ≡ ∀μ ∈ ℝ^{|S|}. ( μ ∈ Δ(S) ∧ μ ⊨ G(σ) ∧ 𝒞(q, μ) ≥ 0 ∧ ⋀_{k=1}^{N_I} I_k(q, μ) ≥ 0 ) ⟹ ⋁_{q′ ∈ δ(q, σ)} ( 𝒞(q′, ℳπ(μ)) ≥ 0 ∧ ⋀_{k=1}^{N_I} I_k(q′, ℳπ(μ)) ≥ 0 ).
  • Büchi ranking condition for non-accepting states. For each non-accepting state q ∈ Q∖F and letter σ ∈ 2^AP of the NBA, we define

    Φ_{Büchi,q,σ} ≡ ∀μ ∈ ℝ^{|S|}. ( μ ∈ Δ(S) ∧ μ ⊨ G(σ) ∧ 𝒞(q, μ) ≥ 0 ∧ ⋀_{k=1}^{N_I} I_k(q, μ) ≥ 0 ) ⟹ ⋁_{q′ ∈ δ(q, σ)} ( 𝒞(q, μ) − 1 ≥ 𝒞(q′, ℳπ(μ)) ≥ 0 ∧ ⋀_{k=1}^{N_I} I_k(q′, ℳπ(μ)) ≥ 0 ).
  • Strategy conditions (synthesis algorithm). For the strategy template to indeed define a valid affine distributionally memoryless strategy, we require that:

    Φ_π ≡ ⋀_{s ∈ S} ( ∑_{a ∈ Act} π(s, a)(μ) = 1 ∧ ⋀_{a ∈ Act} ( π(s, a)(μ) ≥ 0 ) ).

In the above definitions, note that ℳπ(μ) is the one-step successor distribution of μ when the strategy π is applied in the MDP ℳ, computed as ℳπ(μ) = ∑_{s ∈ S} ∑_{a ∈ Act(s)} μ(s)·π(s, a)(μ)·P(s, a).
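For concreteness, here is a sketch (ours; the 2-state MDP and all coefficient values are illustrative, chosen so that the action probabilities at each state sum to 1 for every μ) of evaluating eq. (5) and the one-step operator ℳπ(μ) written out above.

```python
# A sketch of eq. (5) and the one-step operator M_pi(mu).
def strategy_prob(e, f, g, h, mu):
    """pi(s,a)(mu) as a quotient of affine expressions over mu, eq. (5)."""
    return (sum(ei * m for ei, m in zip(e, mu)) + f) / \
           (sum(gi * m for gi, m in zip(g, mu)) + h)

P = {0: {0: [1.0, 0.0], 1: [0.0, 1.0]}, 1: {0: [0.5, 0.5]}}  # P[s][a](s')
num = {(0, 0): ([0.5, 0.0], 0.0),  # numerator coefficients (e, f) per (s, a)
       (0, 1): ([0.5, 1.0], 0.0),
       (1, 0): ([1.0, 1.0], 0.0)}
den = {0: ([1.0, 1.0], 0.0), 1: ([1.0, 1.0], 0.0)}  # shared (g, h) per state s

def one_step(mu):
    """M_pi(mu)(s') = sum_s sum_a mu(s) * pi(s,a)(mu) * P(s,a)(s')."""
    nxt = [0.0] * len(mu)
    for s in P:
        for a, succ in P[s].items():
            (e, f), (g, h) = num[(s, a)], den[s]
            p_a = strategy_prob(e, f, g, h, mu)
            for s2, pr in enumerate(succ):
                nxt[s2] += mu[s] * p_a * pr
    return nxt

print(one_step([0.4, 0.6]))  # [0.38, 0.62]
```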

Step 4: Constraint solving.

The strategy condition constraint Φ_π is a purely existentially quantified Boolean combination of affine inequalities over the symbolic template variables. However, the constraints Φ_init and Φ_{Büchi,q,σ} are all of the form

∀μ ∈ ℝ^{|S|}. ⋀_{i=1}^{m} ⋁_{j=1}^{n} ( aff-expr_{i,j}(t, μ) ≥ 0 ) ⟹ ⋁_{l=1}^{k} ( poly-expr_l(t, μ) ≥ 0 ),

where t is the vector of all symbolic template variables, the aff-expr_{i,j}(t, μ) are affine functions and the poly-expr_l are polynomial functions over the variables in t and μ. Polynomial expressions on the right-hand side arise due to the quotients of affine expressions that define affine distributionally memoryless strategies, see eq. (5): multiplying both sides of an inequality by the affine expressions appearing in the denominators results in polynomial expressions over the variables in t and μ.

The problem of synthesizing affine distributional certificates and affine distributionally memoryless strategies then reduces to solving a system of constraints with the quantifier alternation ∃t. ∀μ. Such quantifier alternation over real-valued variables is generally hard to handle directly and can lead to inefficiency in solvers. In order to allow for more efficient constraint solving, before passing our system of constraints to an SMT solver, we first apply Handelman's theorem [27] to translate Φ_init and Φ_{Büchi,q,σ} into a purely existentially quantified system of polynomial constraints over the symbolic template variables in t and auxiliary variables introduced by the translation, whose satisfiability implies satisfiability of the original constraints. This translation is common in template-based program analysis, see [7] for details. This step allows for more efficient constraint solving as well as a better bound on the algorithm complexity. Finally, the resulting purely existentially quantified system of polynomial constraints over real-valued variables is solved via an SMT solver.

In the special case when we are interested in synthesizing memoryless strategies rather than distributionally memoryless strategies, we may use Farkas' lemma [25] rather than Handelman's theorem. This yields a sound and complete translation into an equisatisfiable, purely existentially quantified system of constraints.
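To make the Farkas translation concrete, here is a sketch (ours) of encoding one constraint of the form ∀μ. Aμ ≤ b ⟹ cᵀμ ≤ d, which, whenever Aμ ≤ b is feasible, holds iff there exist multipliers λ ≥ 0 with Aᵀλ = c and bᵀλ ≤ d; in the actual algorithm, A, b, c, d contain template variables, making the resulting constraints polynomial.

```python
# A sketch of the Farkas'-lemma translation of one universally quantified
# implication into an existential constraint system over multipliers lam.
from z3 import Real, Solver, Sum

def farkas_encode(A, b, c, d, name):
    lam = [Real(f'{name}_lam_{i}') for i in range(len(A))]
    cons = [l >= 0 for l in lam]
    for j in range(len(c)):  # A^T * lam = c, column by column
        cons.append(Sum([A[i][j] * lam[i] for i in range(len(A))]) == c[j])
    cons.append(Sum([b[i] * lam[i] for i in range(len(b))]) <= d)
    return cons

# Example: forall x, y. (x <= 1 and y <= 1)  =>  x + y <= 2.
s = Solver()
s.add(farkas_encode([[1, 0], [0, 1]], [1, 1], [1, 1], 2, 'phi'))
print(s.check())  # sat, witnessed e.g. by lam = (1, 1)
```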

Soundness, relative completeness, complexity.

Soundness of our algorithms follows from the soundness of all four steps above, including soundness of the transformations via Handelman’s theorem and Farkas’ lemma [7]. Since the Farkas’ lemma transformation leads to an equisatisfiable system of constraints, it also follows that our algorithm is relatively complete – it is guaranteed to synthesize an affine distributional certificate and memoryless strategy whenever they exist. Finally, our algorithms provide a PSPACE complexity upper bound as they reduce the synthesis and verification problems to solving a sentence in the existential first-order theory of the reals. The following theorem summarizes these results.

Theorem 12.
Soundness:

If the algorithm returns an affine distributional certificate (𝒞, I) and an affine distributionally memoryless strategy π (for the synthesis algorithm), then the MDP ℳ with initial distributions Init under strategy π satisfies specification φ.

Relative completeness:

If there exist an affine distributional certificate (𝒞, I) and a memoryless strategy π, then there exists an invariant size N_I such that (𝒞, I) and π are computed by the algorithm.

Complexity:

The runtime of the algorithm is in PSPACE in the size of the encoding of the MDP ℳ, the NBA N_φ, the strategy π (for the verification algorithm) and the invariant size N_I.

6 Experimental Evaluation

We implemented a prototype of our method in Python 3 and experimentally evaluated it on a number of challenging verification and synthesis tasks collected from the literature on distributional specifications in MDPs. Our prototype takes as input an MDP (in the Prism [30] input format) and an LTL specification. The LTL specification is then translated into an NBA via Spot [24]. For the constraint solving step in our algorithms, we use PolyQEnt [16], which provides tooling support for quantifier elimination via Farkas' lemma and Handelman's theorem. PolyQEnt uses Z3 [23] and MathSAT5 [20] as backend SMT solvers for the final system of purely existentially quantified constraints. We set the invariant size parameter to N_I = 1, which was sufficient for all our experiments. Our experiments were conducted on consumer-grade hardware (AMD Ryzen 5 5625U CPU, 8GB RAM).

Benchmarks.

We evaluated our method on several examples collected from the literature:

  • GridWorld (synthesis). Motivated by [5], these benchmarks model robot swarms in gridworld environments. Initially, all robots are placed in the top-left corner of the gridworld environment. Some of the cells are covered by walls, whereas some are slippery and may, with some probability, lead to movement in an undesired direction. Hence, each environment induces an MDP. As shown in [5], the evolution of a robot swarm can be analyzed by taking the distribution transformer view of MDPs and considering how the robots are distributed across the gridworld cells at each time step. In Table 1, we report 5 gridworld benchmarks of varying sizes with two distributional specifications: (1) at least 90% of the robots should be in some slippery target cell infinitely often, and (2) in addition, at most 50% of the robots should occupy some narrow passage at any point in time.

  • PageRank (verification). We consider a Markov chain representation of the PageRank algorithm taken from [2]. In this context, we consider various verification tasks of the form: always, if the probability mass at some vertex/page is above a threshold, then eventually it must be above a threshold at another page.

  • Pharmacokinetics (verification). We also consider a 6-state Markov chain from [2] which is adapted from a pharmacokinetics example in [11]. We use the two queries that were listed in [2] as the motivating examples to obtain our specifications.

  • Benchmarks from [4] (verification and synthesis). Finally, we collect 3 pairs of verification and synthesis tasks from [4]. In the verification task a strategy is fixed, whereas in the synthesis task one also needs to compute the strategy. While [4] considered distributional safety specifications, we design more complex ω-regular specifications.

Results.

Our experimental results are shown in Table 1. They demonstrate that our prototype is able to solve a number of challenging verification and synthesis tasks for distributional ω-regular specifications in MDPs that were beyond the reach of all existing methods. This is achieved at runtimes that are comparable to or even lower than those reported by earlier methods for distributional reachability and safety specifications in [4, 5] on benchmarks of similar size. Hence, even though we consider a significantly more general class of distributional ω-regular specifications, our algorithms do not incur a significant computational overhead. Moreover, for all our strategy synthesis tasks, our prototype was able to compute memoryless strategies that lead to distributional specification satisfaction. This demonstrates the generality of the relative completeness guarantees provided by our algorithms.

We also make some observations. First, as can be seen from runtimes reported in Table 1, the final SMT-solving step is computationally the most expensive step of our algorithms. Constraint generation took at most a few seconds in all cases. Second, we observe that strategy synthesis tasks are generally computationally more expensive, which is expected given that they require synthesizing both the strategy and the distributional certificate. However, in some cases the synthesis problems can also be solved more efficiently. This is demonstrated by the last 6 experiments (CAV’23 in Table 1), where synthesis is achieved at lower runtimes due to our prototype being able to compute a simple memoryless strategy that was easier to verify, compared to the strategy considered in the verification task.

Finally, regarding the invariant size parameter, we used N_I = 1 because it was sufficient for all our benchmarks. We also ran our prototype tool with N_I = 2 on 12 of the benchmarks, and 8 of them were solved within the timeout of 5 minutes. The timeouts are likely due to the larger size of the final system of constraints. Indeed, given more time, we expect that our method can be effectively applied with larger template sizes as well.

Table 1: For each experiment we report, from left to right, the benchmark, specification, task (verification or synthesis), the number of coefficients, the number of constraints, the number of coefficients in the PolyQEnt-generated file (i.e. after application of Farkas' lemma), the number of constraints in the PolyQEnt-generated file, the SMT-solving time, and the total runtime.
Model Specification Task Coeff # Const # PQ Coeff # Query # SMT time Total time
GW (3*3) G F “V5>=0.9” Synth 62 41 171 34 <2s 6s
GW (3*3) G F “V5>=0.9” & G “V4<=0.5” Synth 80 45 277 49 <5s <5s
GW (4*4) G F “V11>=0.9” Synth 121 79 286 81 <10s 10s
GW (4*4) G F “V11>=0.9” & G “V9<=0.5” Synth 153 83 448 87 12s 13s
GW (5*5) G F “V19>=0.9” Synth 198 129 435 131 302s 303s
PageRank F G “V2>0.2” Verify 60 13 400 35 63s 64s
PageRank G ( “0.2<=V0” → F “0.2<=V2”) Verify 48 13 253 22 17s 18s
PageRank G ( “0.2<=V2” → F “0.2<=V2”) Verify 48 13 253 22 8s 8s
PageRank G ( “0.2<=V3” → F “0.2<=V2”) Verify 48 13 253 22 44s 45s
PageRank G ( “0.2<=V4” → F “0.2<=V2”) Verify 48 13 253 22 5s 6s
PageRank G “V0>=0.2” | “V1>=0.2” | “V2>=0.2” | “V3>=0.2” | “V4>=0.2” → F “V2=1” Verify 60 19 569 44 136s 137s
PageRank F “V2=1” → G “V1<=0.2” Verify 144 35 630 46 6s 6s
Pharmacokinetics F “V4=1” Verify 42 9 176 14 <1s <1s
Pharmacokinetics G (“0.13<=V3<=0.2” | “0<=V3<0.13”) Verify 56 11 365 28 102s 102s
CAV23 [4] G F “V1>=0.249” Verify 16 7 85 13 <2s <2s
CAV23 [4] G F “V1>=0.249” Synth 20 14 108 16 7s <2s
CAV23 [4] “V1>= 0.249” U “V2 >= 0.25” Verify 32 13 185 22 <5s 7s
CAV23 [4] “V1>= 0.249” U “V2 >= 0.25” Synth 36 20 189 25 7s 5s
CAV23 [4] “0.334>=V1>=0.332” U “V0=0.25” Verify 32 17 229 24 <4s 4s
CAV23 [4] “0.334>=V1>=0.332” U “V0=0.25” Synth 36 20 233 28 <1s <1s

7 Conclusion

In this paper, we considered distributional ω-regular specifications in MDPs and addressed the problems of strategy verification and synthesis. We developed the new notion of product distributional transition systems between an MDP and an NBA. We then introduced distributional certificates, which we used to design template-based algorithms for strategy verification and synthesis. Our experiments demonstrate the benefits and promise of our approach. As future work, we would like to go beyond MDPs and consider partially observable MDPs. Moreover, it would be interesting to lift the objectives from NBAs to Rabin automata, where even the notion of distributional certificates is unclear.

References

  • [1] Alessandro Abate, Mirco Giacobbe, and Diptarko Roy. Stochastic omega-regular verification and control with supermartingales. In Arie Gurfinkel and Vijay Ganesh, editors, Computer Aided Verification - 36th International Conference, CAV 2024, Montreal, QC, Canada, July 24-27, 2024, Proceedings, Part III, volume 14683 of Lecture Notes in Computer Science, pages 395–419. Springer, 2024. doi:10.1007/978-3-031-65633-0_18.
  • [2] Manindra Agrawal, S. Akshay, Blaise Genest, and P. S. Thiagarajan. Approximate verification of the symbolic dynamics of Markov chains. J. ACM, 62(1):2:1–2:34, 2015. doi:10.1145/2629417.
  • [3] S. Akshay, Timos Antonopoulos, Joël Ouaknine, and James Worrell. Reachability problems for Markov chains. Inf. Process. Lett., 115(2):155–158, 2015. doi:10.1016/J.IPL.2014.08.013.
  • [4] S. Akshay, Krishnendu Chatterjee, Tobias Meggendorfer, and Đorđe Žikelić. MDPs as distribution transformers: Affine invariant synthesis for safety objectives. In Constantin Enea and Akash Lal, editors, Computer Aided Verification - 35th International Conference, CAV 2023, Paris, France, July 17-22, 2023, Proceedings, Part III, volume 13966 of Lecture Notes in Computer Science, pages 86–112. Springer, 2023. doi:10.1007/978-3-031-37709-9_5.
  • [5] S. Akshay, Krishnendu Chatterjee, Tobias Meggendorfer, and Đorđe Žikelić. Certified policy verification and synthesis for MDPs under distributional reach-avoidance properties. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 3–12. ijcai.org, 2024. URL: https://www.ijcai.org/proceedings/2024/1.
  • [6] S. Akshay, Blaise Genest, and Nikhil Vyas. Distribution-based objectives for Markov decision processes. In Anuj Dawar and Erich Grädel, editors, Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, July 09-12, 2018, pages 36–45. ACM, 2018. doi:10.1145/3209108.3209185.
  • [7] Ali Asadi, Krishnendu Chatterjee, Hongfei Fu, Amir Kafshdar Goharshady, and Mohammad Mahdavi. Polynomial reachability witnesses via Stellensätze. In Stephen N. Freund and Eran Yahav, editors, PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021, pages 772–787. ACM, 2021. doi:10.1145/3453483.3454076.
  • [8] Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT Press, 2008.
  • [9] Roberto Baldoni, François Bonnet, Alessia Milani, and Michel Raynal. On the solvability of anonymous partial grids exploration by mobile robots. In Theodore P. Baker, Alain Bui, and Sébastien Tixeuil, editors, Principles of Distributed Systems, 12th International Conference, OPODIS 2008, Luxor, Egypt, December 15-18, 2008. Proceedings, volume 5401 of Lecture Notes in Computer Science, pages 428–445. Springer, 2008. doi:10.1007/978-3-540-92221-6_27.
  • [10] Danièle Beauquier, Alexander Moshe Rabinovich, and Anatol Slissenko. A logic of probability with decidable model checking. J. Log. Comput., 16(4):461–487, 2006. doi:10.1093/logcom/exl004.
  • [11] Rohit Chadha, Vijay Anand Korthikanti, Mahesh Viswanathan, Gul Agha, and YoungMin Kwon. Model checking MDPs with a unique compact invariant set of distributions. In Eighth International Conference on Quantitative Evaluation of Systems, QEST 2011, Aachen, Germany, 5-8 September, 2011, pages 121–130. IEEE Computer Society, 2011. doi:10.1109/QEST.2011.22.
  • [12] Aleksandar Chakarov and Sriram Sankaranarayanan. Probabilistic program analysis with martingales. In Natasha Sharygina and Helmut Veith, editors, Computer Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings, volume 8044 of Lecture Notes in Computer Science, pages 511–526. Springer, 2013. doi:10.1007/978-3-642-39799-8_34.
  • [13] Krishnendu Chatterjee, Hongfei Fu, and Amir Kafshdar Goharshady. Termination analysis of probabilistic programs through Positivstellensatz’s. In CAV (1), volume 9779 of Lecture Notes in Computer Science, pages 3–22. Springer, 2016. doi:10.1007/978-3-319-41528-4_1.
  • [14] Krishnendu Chatterjee, Hongfei Fu, Amir Kafshdar Goharshady, and Ehsan Kafshdar Goharshady. Polynomial invariant generation for non-deterministic recursive programs. In Alastair F. Donaldson and Emina Torlak, editors, Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, pages 672–687. ACM, 2020. doi:10.1145/3385412.3385969.
  • [15] Krishnendu Chatterjee, Hongfei Fu, Petr Novotný, and Rouzbeh Hasheminezhad. Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs. TOPLAS, 40(2):7:1–7:45, 2018. doi:10.1145/3174800.
  • [16] Krishnendu Chatterjee, Amir Kafshdar Goharshady, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Milad Saadat, Maximilian Seeliger, and Đorđe Žikelić. PolyQEnt: A polynomial quantified entailment solver, 2025. arXiv:2408.03796.
  • [17] Krishnendu Chatterjee, Amir Kafshdar Goharshady, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, and Đorđe Žikelić. Sound and complete witnesses for template-based verification of LTL properties on polynomial programs. In André Platzer, Kristin Yvonne Rozier, Matteo Pradella, and Matteo Rossi, editors, Formal Methods - 26th International Symposium, FM 2024, Milan, Italy, September 9-13, 2024, Proceedings, Part I, volume 14933 of Lecture Notes in Computer Science, pages 600–619. Springer, 2024. doi:10.1007/978-3-031-71162-6_31.
  • [18] Krishnendu Chatterjee, Amir Kafshdar Goharshady, Tobias Meggendorfer, and Đorđe Žikelić. Sound and complete certificates for quantitative termination analysis of probabilistic programs. In CAV (1), volume 13371 of Lecture Notes in Computer Science, pages 55–78. Springer, 2022. doi:10.1007/978-3-031-13185-1_4.
  • [19] Krishnendu Chatterjee, Petr Novotný, and Đorđe Žikelić. Stochastic invariants for probabilistic termination. In POPL, pages 145–160, 2017. doi:10.1145/3009837.3009873.
  • [20] Alessandro Cimatti, Alberto Griggio, Bastiaan Schaafsma, and Roberto Sebastiani. The MathSAT5 SMT Solver. In Nir Piterman and Scott Smolka, editors, Proceedings of TACAS, volume 7795 of LNCS. Springer, 2013. doi:10.1007/978-3-642-36742-7_7.
  • [21] Michael Colón, Sriram Sankaranarayanan, and Henny Sipma. Linear invariant generation using non-linear constraint solving. In Warren A. Hunt Jr. and Fabio Somenzi, editors, Computer Aided Verification, 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003, Proceedings, volume 2725 of Lecture Notes in Computer Science, pages 420–432. Springer, 2003. doi:10.1007/978-3-540-45069-6_39.
  • [22] Michael Colón and Henny Sipma. Synthesis of linear ranking functions. In Tiziana Margaria and Wang Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems, 7th International Conference, TACAS 2001 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2001 Genova, Italy, April 2-6, 2001, Proceedings, volume 2031 of Lecture Notes in Computer Science, pages 67–81. Springer, 2001. doi:10.1007/3-540-45319-9_6.
  • [23] Leonardo Mendonça de Moura and Nikolaj S. Bjørner. Z3: an efficient SMT solver. In TACAS, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008. doi:10.1007/978-3-540-78800-3_24.
  • [24] Alexandre Duret-Lutz, Etienne Renault, Maximilien Colange, Florian Renkin, Alexandre Gbaguidi Aisse, Philipp Schlehuber-Caissier, Thomas Medioni, Antoine Martin, Jérôme Dubois, Clément Gillard, et al. From Spot 2.0 to Spot 2.10: What’s new? In International Conference on Computer Aided Verification, pages 174–187. Springer, 2022.
  • [25] Julius Farkas. Theorie der einfachen Ungleichungen. Journal für die reine und angewandte Mathematik (Crelles Journal), 1902(124):1–27, 1902.
  • [26] Yulong Gao, Alessandro Abate, Lihua Xie, and Karl Henrik Johansson. Distributional reachability for Markov decision processes: Theory and applications. IEEE Trans. Autom. Control., 69(7):4598–4613, 2024. doi:10.1109/TAC.2023.3341282.
  • [27] David Handelman. Representing polynomials by positive linear functions on compact convex polyhedra. Pacific Journal of Mathematics, 132(1):35–62, 1988.
  • [28] Thomas A. Henzinger, Maria Mateescu, and Verena Wolf. Sliding window abstraction for infinite Markov chains. In Ahmed Bouajjani and Oded Maler, editors, Computer Aided Verification, 21st International Conference, CAV 2009, Grenoble, France, June 26 - July 2, 2009. Proceedings, volume 5643 of Lecture Notes in Computer Science, pages 337–352. Springer, 2009. doi:10.1007/978-3-642-02658-4_27.
  • [29] Vijay Anand Korthikanti, Mahesh Viswanathan, Gul Agha, and YoungMin Kwon. Reasoning about MDPs as transformers of probability distributions. In QEST 2010, Seventh International Conference on the Quantitative Evaluation of Systems, Williamsburg, Virginia, USA, 15-18 September 2010, pages 199–208. IEEE Computer Society, 2010. doi:10.1109/QEST.2010.35.
  • [30] Marta Z. Kwiatkowska, Gethin Norman, and David Parker. PRISM 4.0: Verification of probabilistic real-time systems. In CAV, volume 6806 of Lecture Notes in Computer Science, pages 585–591. Springer, 2011. doi:10.1007/978-3-642-22110-1_47.
  • [31] YoungMin Kwon and Gul A. Agha. Verifying the evolution of probability distributions governed by a DTMC. IEEE Trans. Software Eng., 37(1):126–141, 2011. doi:10.1109/TSE.2010.80.
  • [32] Joël Ouaknine and James Worrell. Decision problems for linear recurrence sequences. In Alain Finkel, Jérôme Leroux, and Igor Potapov, editors, Reachability Problems - 6th International Workshop, RP 2012, Bordeaux, France, September 17-19, 2012. Proceedings, volume 7550 of Lecture Notes in Computer Science, pages 21–28. Springer, 2012. doi:10.1007/978-3-642-33512-9_3.
  • [33] Stephen Prajna, Ali Jadbabaie, and George J. Pappas. A framework for worst-case and stochastic safety verification using barrier certificates. IEEE Trans. Autom. Control., 52(8):1415–1428, 2007. doi:10.1109/TAC.2007.902736.
  • [34] Toru Takisaka, Yuichiro Oyabu, Natsuki Urabe, and Ichiro Hasuo. Ranking and repulsing supermartingales for reachability in randomized programs. ACM Trans. Program. Lang. Syst., 43(2):5:1–5:46, 2021. doi:10.1145/3450967.
  • [35] Đorđe Žikelić, Mathias Lechner, Thomas A. Henzinger, and Krishnendu Chatterjee. Learning control policies for stochastic systems with reach-avoid guarantees. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, pages 11926–11935. AAAI Press, 2023. doi:10.1609/AAAI.V37I10.26407.