
A Direct Reduction from Stochastic Parity Games to Simple Stochastic Games

Raphaël Berthon, RWTH Aachen University, Germany; Joost-Pieter Katoen, RWTH Aachen University, Germany; Zihan Zhou, National University of Singapore, Singapore
Abstract

Significant progress has recently been achieved in developing efficient solutions for simple stochastic games (SSGs), focusing on reachability objectives. While reductions from stochastic parity games (SPGs) to SSGs have been presented in the literature through the use of multiple intermediate game models, a direct and simple reduction has been notably absent. This paper introduces a novel and direct polynomial-time reduction from quantitative SPGs to quantitative SSGs. By leveraging a gadget-based transformation that effectively removes the priority function, we construct an SSG that simulates the behavior of a given SPG. We formally establish the correctness of our direct reduction. Furthermore, we demonstrate that under binary encoding this reduction is polynomial, thereby directly corroborating the known NP ∩ coNP complexity of SPGs and providing new insight into the relationship between parity and reachability objectives in turn-based stochastic games.

Keywords and phrases:
stochastic games, parity, reduction
Funding:
Raphaël Berthon: Funded by DFG Project POMPOM (KA 1462/6-1).
Copyright and License:
© Raphaël Berthon, Joost-Pieter Katoen, and Zihan Zhou; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Probabilistic computation
Related Version:
Full Version: https://arxiv.org/abs/2506.06223 [6]
Editors:
Patricia Bouyer and Jaco van de Pol

1 Introduction

Stochastic games (SGs) are a broadly used framework for decision-making under uncertainty with a well-defined objective. They model two important aspects under which sequential decisions must be made: turn-based interaction with an adversarial environment, and dealing with randomness. Stochastic games were introduced long ago by Shapley [40], and many variations have been shown to be inter-reducible and in NP ∩ coNP [18, 2, 9]. SGs find applications in various fields, including artificial intelligence [33], economics [1], operations research [20], and graph theory [41]. Moreover, Markov decision processes (MDPs) [38], a foundational framework for modeling decision-making in stochastic environments, are special cases of SGs in which one of the players controls no states.

While we make use of simple stochastic games (SSGs) with reachability objectives, we focus on the specific case of stochastic parity games (SPGs), which are zero-sum and whose set of winning runs is ω-regular. Solving such a game consists in finding an optimal strategy and determining its winning probability.

Take, for instance, the case of a market competition, where two firms Alpha and Beta try to expand their market share. From the standpoint of each firm, the other acts as a direct competitor, and we therefore assume the players are adversarial. States represent the relative valuation of the firms over a sustained period of time, and business decisions (made by either firm Alpha or Beta) as well as the fluctuation of market shares, policy changes, and external forces (modeled as randomness) lead to transitions between game states. Firm Alpha is interested in keeping its share above a key threshold, say 50%. We distinguish between three priorities, leading to a zero-sum game:

  • Priority 0: Alpha’s market share is significantly above 50%.

  • Priority 1: Alpha’s market share is significantly below 50%.

  • Priority 2: Alpha’s market share fluctuates around 50%. This is sustainable on its own, but it is not sustainable if the only other recurring fluctuation is Alpha’s share regularly dropping below 50% (priority 1).

If the minimum priority visited infinitely often is 0, Alpha can manage in the long term to regularly dominate the market, recovering any loss that occurred in the meantime. If the minimum priority visited infinitely often is 1, despite any temporary success, Alpha’s market share will stay near or below 50% in the long run, which is not sustainable for the firm. Finally, if the minimum priority visited infinitely often is 2, Alpha and Beta will eventually find an even balance point.

A variety of algorithms have been considered for (reachability) SSGs [17, 21, 3, 37], which we present in the related work section below. In Markov chains (MCs), ω-regular objectives can be reduced directly to reachability objectives [4]. A similar reduction exists from MDPs with ω-regular objectives to reachability objectives and has been used extensively, for example in [22]. Solvers therefore often focus specifically on optimizing the computation of reachability probabilities. Such a direct reduction has been lacking for SPGs. To reduce quantitative SPGs to SSGs, intermediate steps are necessary, via a reduction to stochastic mean-payoff and stochastic discounted-payoff games [11, 2] (see the lower part of Figure 1), making this approach less appealing. For qualitative solutions, a translation via deterministic parity games (i.e., parity games with no random states) exists [13, 14, 9], see the upper part of Figure 1.

Figure 1: Reducing SPGs to SSGs.
Outline and Contribution

In this paper, we propose a direct reduction from SPGs to SSGs with a reachability objective (see the solid arrow in Figure 1). To that end, we leverage a gadget whose structure comes from [9], where it was used to reduce deterministic parity games to quantitative SSGs, but with new probability values. Given an SPG $G$, where a parity condition is satisfied if the minimum priority seen infinitely often is even, we show in Section 3.1 how to use the gadget to transform $G$ into an SSG $\widetilde{G}$. This introduces two new sink states, one winning and the other losing for the reachability objective. Priorities are removed, and every transition going to a state that used to have even (respectively odd) priority now has a small chance of going to the winning (resp. losing) sink. We scale the probabilities such that lower priorities yield a higher chance of reaching a sink. Theorem 10 ensures that any optimal strategy in $\widetilde{G}$ is also optimal in $G$ (the converse may not hold). We can then compute an optimal memoryless strategy in $\widetilde{G}$, and compute its value in $G$.

We show in Theorem 13 that under binary encoding, our reduction is polynomial. We thus reobtain, in a direct way, the classical result that solving quantitative SPGs is in the complexity class NP ∩ coNP [2, 9]. While the complexity remains the same as for existing algorithms, and the values used in the reduction make it unlikely to be very efficient in practice, this new approach implies that any efficient SSG solver can be used for SPGs. The existence of a direct reduction was already conjectured by Chatterjee and Fijalkow in [9], but, as expected, proving its correctness is challenging and involves computing very precise probability bounds. Despite the inspiration drawn from a known gadget, the technical depth of this paper resides in the intricate and novel proofs of the correctness of our reduction. In addition, our direct reduction gives new insights into the relationship between SSGs and SPGs.

In Section 2, we present the necessary background knowledge. In Section 3, we define the gadget and present some related results, which we use in Section 4 to define the reduction properly, and to show its correctness and its complexity. We give some concluding remarks in Section 5.

Related Work

Parity games, mean-payoff games and discounted-payoff games can all be reduced to SSGs [26, 43], and this also applies to their stochastic extensions, namely stochastic parity games [9], stochastic mean-payoff games and stochastic discounted-payoff games [2]. SSGs also find applications in the analysis of MDPs, serving as abstractions for large MDPs [28]. The amount of memory required to solve stochastic parity games has been studied in [7].

Various extensions have been considered within this family of inter-reducible stochastic games. Introducing more than two players allows for the analysis of Nash equilibria [15, 42]. Using continuous states can provide tools to represent timed systems [34]. Multi-objective approaches have been employed to synthesize systems that balance average expected outcomes with worst-case guarantees [16]. Parity objectives are significant in many of these scenarios where long-run behavior is relevant, but the classical reduction to SSGs cannot be directly applied.

Common approaches to solving SSGs, as presented in [17], include value iteration (VI), strategy iteration (SI), and quadratic programming, all of which are worst-case exponential in the size of the SSG. These approaches have been widely studied on MDPs, where recent advancements have been made to apply VI with guarantees, using interval VI [5], sound VI [39], and optimistic VI [24]. Interestingly, optimistic VI does not require an a priori computation of starting vectors to approximate from above. Similar ideas have been lifted to SSGs: Eisentraut et al. [21] introduce a VI algorithm for under- and over-approximation sequences, as well as the first practical stopping criterion for VI on SSGs. Optimistic VI has been adapted to SSGs [3], and a novel bounded VI based on the concept of widest paths has been introduced in [37]. A comparative analysis [29] suggests that VI and SI are the more efficient approaches. Storm [25] and PRISM [31] are two popular model checkers incorporating different variants of VI and SI, and both employ VI as the default algorithm for solving MDPs. PRISM-games [30] exploits VI for solving SSGs.

For SPGs, we distinguish three main approaches. Chatterjee et al. [10] use a strategy improvement algorithm requiring randomized sub-exponential time: with $n$ game states and $d$ priorities, the expected running time is in $2^{O(\sqrt{d\,n\log(n)})}$. The probabilistic game solver GIST [12] reduces qualitative SPGs to deterministic parity games (DPGs), and benefits from several quasi-polynomial algorithms for DPGs [27, 36, 32] since the breakthrough made by Calude et al. [8], but this approach is unlikely to achieve polynomial running time [19]. Hahn et al. [23] reduce SPGs to SSGs, allowing the use of reinforcement learning to approximate the values without knowing the game’s probabilistic transition structure. Their reduction is only proven correct in the limit.

2 Preliminaries

Our notations on Markov chains and stochastic games on graphs mainly come from [4].

2.1 Discrete-Time Markov Chains

A discrete distribution over a countable set $\mathcal{A}$ is a function $\mu:\mathcal{A}\to[0,1]$ with $\sum_{a\in\mathcal{A}}\mu(a)=1$. The support of the discrete distribution $\mu$ is defined as $\mathrm{supp}(\mu):=\{a\in\mathcal{A}\mid\mu(a)>0\}$. We denote the set of all discrete distributions over $\mathcal{A}$ by $\mathbb{D}(\mathcal{A})$.

A discrete-time Markov chain (MC) is a tuple $\mathcal{M}=(V,\delta,v_I)$ where $V$ is a finite set of states, $\delta:V\to\mathbb{D}(V)$ is a probabilistic transition function, and $v_I\in V$ is the initial state. Given $\delta(v)=\mu$ with $\mu(v')=p$, we write $\delta(v,v')=p$. For $S\subseteq V$ and $v\in V$, let $\delta(v,S)=\sum_{s\in S}\delta(v,s)$.

An infinite sequence $\pi=v_0v_1\cdots\in V^{\omega}$ is an infinite path through MC $\mathcal{M}$ if $\delta(v_i,v_{i+1})>0$ for all $i\in\mathbb{N}$. We denote the set of all infinite paths that start from state $v\in V$ by $\mathit{Paths}(v)$. The prefixes of an infinite path $\pi=v_0v_1\cdots\in V^{\omega}$ are $\{v_0\cdots v_i\mid i\in\mathbb{N}\}$; they are finite paths. We denote the set of all finite paths that start from state $v\in V$ by $\mathit{Paths}_{\mathit{fin}}(v)$. The set of states visited infinitely often in $\pi=v_0v_1\cdots\in V^{\omega}$ is defined as $\mathit{inf}(\pi)=\{v\in V\mid\forall n\in\mathbb{N}.\,\exists k\in\mathbb{N}.\ v_{n+k}=v\}$.

The probability $\Pr$ of a finite path $\pi=v_0v_1\cdots v_n\in V^{*}$ is given by $\prod_{i\in[0,n-1]}\delta(v_i,v_{i+1})$. The set of infinite paths that start with a given finite path is called a cylinder, and as in [4], we extend the probability of cylinders in a unique way to all measurable sets of $V^{\omega}$.

Reachability Probabilities

Let $\mathcal{M}=(V,\delta,v_I)$ be an MC. For target states $T\subseteq V$ and starting state $v_0\in V$, the event of reaching $T$ is defined as $\mathit{Reach}(T)=\{v_0v_1\cdots\in V^{\omega}\mid\exists i\in\mathbb{N}.\ v_i\in T\}$. The probability of reaching $T$ from $v_0$ is defined as $\Pr_{v_0}(\mathit{Reach}(T))=\Pr(\{\hat{\pi}\mid\hat{\pi}\in\mathit{Paths}_{\mathit{fin}}(v_0)\cap(V\setminus T)^{*}T\})$.

Let the variable $x_v$ denote the probability of reaching $T$ from any $v\in V$. Whether $T$ is reachable from a given state $v$ can be determined using standard graph analysis. Let $\mathit{Pre}(T)$ denote the set of states from which $T$ is reachable. If $v\notin\mathit{Pre}(T)$, then $x_v=0$. If $v\in T$, then $x_v=1$. Otherwise, $x_v=\sum_{u\in V\setminus T}\delta(v,u)\,x_u+\sum_{w\in T}\delta(v,w)$. This is equivalent to a linear equation system, formalized as follows:

Theorem 1 (Reachability Probability of Markov Chains [4]).

Given MC $\mathcal{M}=(V,\delta,v_I)$ and target states $T\subseteq V$, let $V_Q=\mathit{Pre}(T)\setminus T$, $\mathbf{A}=(\delta(v,v'))_{v,v'\in V_Q}$ and $\mathbf{b}=(b_v)_{v\in V_Q}=(\delta(v,T))_{v\in V_Q}$. Then, the vector $\mathbf{x}=(x_v)_{v\in V_Q}$ with $x_v=\Pr_v(\mathit{Reach}(T))$ is the unique solution of the linear equation system $\mathbf{x}=\mathbf{A}\mathbf{x}+\mathbf{b}$.
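To make the equation system concrete, the following is a minimal Python sketch (ours, not an artifact of the paper): it computes $\mathit{Pre}(T)$ by a fixed-point computation on the support graph and then solves $(\mathbf{I}-\mathbf{A})\mathbf{x}=\mathbf{b}$ with NumPy. The dictionary-based MC encoding and the example random walk are illustrative assumptions.

```python
# Minimal sketch of Theorem 1: reachability probabilities of a finite MC.
import numpy as np

def reachability_probabilities(delta, targets):
    """delta: dict state -> {successor: probability}; targets: set of states."""
    states = list(delta)
    pre = set(targets)                        # Pre(T): states that can reach T
    changed = True
    while changed:
        changed = False
        for u in states:
            if u not in pre and any(p > 0 and v in pre
                                    for v, p in delta[u].items()):
                pre.add(u)
                changed = True
    vq = [s for s in states if s in pre and s not in targets]  # V_Q = Pre(T)\T
    idx = {s: i for i, s in enumerate(vq)}
    a = np.zeros((len(vq), len(vq)))
    b = np.zeros(len(vq))
    for u in vq:
        for v, p in delta[u].items():
            if v in idx:
                a[idx[u], idx[v]] += p        # A = (delta(v, v'))
            elif v in targets:
                b[idx[u]] += p                # b = (delta(v, T))
    x = np.linalg.solve(np.eye(len(vq)) - a, b)   # unique solution of x = Ax + b
    probs = {s: 0.0 for s in states}          # states outside Pre(T) get 0
    probs.update({t: 1.0 for t in targets})
    probs.update({s: x[idx[s]] for s in vq})
    return probs

# Example: a symmetric random walk on {0, 1, 2, 3} with absorbing endpoints.
walk = {0: {0: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {3: 1.0}}
print(reachability_probabilities(walk, {3}))  # values 0, 1/3, 2/3, 1
```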

Limit Behavior

Let $\mathcal{M}=(V,\delta,v_I)$ be an MC. A set $L\subseteq V$ is strongly connected if for all pairs of states $v,v'\in L$, $v$ and $v'$ are mutually reachable. Hence a singleton $\{v\}$ is strongly connected if $\delta(v,v)>0$. A set $L\subseteq V$ is a strongly connected component (SCC) if it is maximally strongly connected, i.e., there is no other set $L'\subseteq V$ with $L\subsetneq L'$ such that $L'$ is strongly connected. $L\subseteq V$ is a bottom SCC (BSCC) if $L$ is an SCC and there is no transition leaving $L$, i.e., there are no $v\in L$, $v'\in V\setminus L$ such that $\delta(v,v')>0$. We denote the set of BSCCs of MC $\mathcal{M}$ by $\mathrm{BSCC}(\mathcal{M})$.

The limit behavior of an MC regarding the infinitely often visited states is captured by the following theorem.

Theorem 2 (Limit behavior of Markov Chains [4]).

For MC $\mathcal{M}=(V,\delta,v_I)$, it holds that $\Pr\{\pi\in\mathit{Paths}(v_I)\mid\mathit{inf}(\pi)\in\mathrm{BSCC}(\mathcal{M})\}=1$.
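The BSCCs themselves are easy to compute on small MCs. The sketch below (ours; it uses a quadratic mutual-reachability test for brevity rather than a linear-time SCC algorithm) reuses the dictionary encoding from the previous sketch. It treats every singleton as trivially strongly connected, which is harmless for BSCC computation: in a non-blocking MC, a singleton BSCC necessarily carries a self-loop, and a transient singleton is discarded by the bottom test.

```python
# Minimal sketch: computing BSCC(M) of a finite MC given as delta.
def reachable_from(delta, s):
    """States reachable from s in the support graph of delta."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v, p in delta[u].items():
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bsccs(delta):
    reach = {s: reachable_from(delta, s) for s in delta}
    sccs, assigned = [], set()
    for s in delta:
        if s in assigned:
            continue
        scc = {t for t in reach[s] if s in reach[t]}   # mutual reachability
        sccs.append(scc)
        assigned |= scc
    # bottom SCCs: no positive-probability transition leaves the component
    return [c for c in sccs
            if all(v in c for u in c for v, p in delta[u].items() if p > 0)]

print(bsccs(walk))  # [{0}, {3}] for the random walk above
```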

2.2 Stochastic Games

A stochastic arena $G$ is a tuple $G=((V,E),(V_E,V_A,V_R),\Delta)$, where $(V,E)$ is a directed graph, with a finite set of vertices $V$, partitioned as $V_E\uplus V_A\uplus V_R=V$, and a set of edges $E\subseteq V\times V$. The probabilistic transition function $\Delta$ is such that for all $v_r\in V_R$, $\Delta(v_r)$ is a distribution over $V$, and for all $v_r\in V_R$ and $v\in V$, we have $(v_r,v)\in E$ if and only if $v\in\mathrm{supp}(\Delta(v_r))$. We usually uncurry $\Delta(v_r)(v)$ and write $\Delta(v_r,v)$.

Without loss of generality, we assume each vertex has at least one successor. This property is called non-blocking. The finite set $V$ of vertices is partitioned into three sets: $V_E$ – vertices where Eve chooses the successor, $V_A$ – vertices where Adam chooses the successor, and $V_R$ – the random vertices. A stochastic arena is a Markov decision process (MDP) if either $V_E=\emptyset$ or $V_A=\emptyset$, and an MC if both $V_E=\emptyset$ and $V_A=\emptyset$.

Figure 2 illustrates a stochastic arena $G$. The square-shaped vertex $v_3$ is a vertex in $V_E$ where Eve chooses the successor, the pentagon-shaped vertex $v_4$ is in $V_A$ where Adam chooses the successor, and the circular vertices $V_R=\{v_0,v_1,v_2,v_5\}$ are random. Edges from random vertices are annotated with probabilities from $\Delta$.

Figure 2: An example stochastic arena G.
Strategies

Let $G=((V,E),(V_E,V_A,V_R),\Delta)$ be a stochastic arena. A strategy $\sigma$ of Eve is a function $\sigma:V^{*}V_E\to\mathbb{D}(V)$, such that for all $v_0v_1\cdots v_n\in V^{*}V_E$, we have $\sigma(v_0v_1\cdots v_n)(v_{n+1})>0$ implies $(v_n,v_{n+1})\in E$. A strategy $\gamma$ of Adam is defined analogously. We denote the sets of all strategies of Eve and Adam by $\Sigma_E$ and $\Sigma_A$ respectively.

A strategy $\sigma$ of Eve is a pure memoryless strategy if for all $w,w'\in V^{*}$ and $v\in V_E$, $\sigma(wv)=\sigma(w'v)$ and the support of this distribution is a singleton. A pure memoryless strategy $\gamma$ of Adam is defined analogously. We denote the sets of pure memoryless strategies of Eve and Adam by $\Sigma^{PM}_E$ and $\Sigma^{PM}_A$ respectively.

In a stochastic arena $G$, when Eve and Adam follow pure memoryless strategies $\sigma\in\Sigma^{PM}_E$ and $\gamma\in\Sigma^{PM}_A$ respectively, the arena $G_{\sigma,\gamma}=((V,E'),(V_E,V_A,V_R),\Delta)$ results. Here, the new edge set $E'$ is such that for all $u\in V_E$, $(u,v)\in E'$ if and only if $\sigma(u)=v$, and for all $u\in V_A$, $(u,v)\in E'$ if and only if $\gamma(u)=v$. We refer to such arenas obtained by fixing pure memoryless strategies as sub-arenas. In fact, given a fixed starting vertex $v_I\in V$, we often view the sub-arena $G_{\sigma,\gamma}$ as an MC $\mathcal{M}_{\sigma,\gamma}=(V,\delta,v_I)$, where the state space is the vertex set $V$ of $G$, and the transition function $\delta$ combines the deterministic moves dictated by the strategies $\sigma$ and $\gamma$ with the transition function $\Delta$ defined on random vertices:

$$\delta(u,v)=\begin{cases}\Delta(u,v) & \text{if }u\in V_R\\ 1 & \text{if }u\in V_E,\ \sigma(u)=v\ \text{ or }\ u\in V_A,\ \gamma(u)=v\\ 0 & \text{otherwise}\end{cases}$$
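As a small illustration (ours), fixing pure memoryless strategies, encoded as maps from vertices to chosen successors, turns the arena into an MC in the dictionary encoding used in Section 2.1:

```python
# Minimal sketch: the MC M_{sigma,gamma} induced by pure memoryless strategies.
def induced_mc(v_eve, v_adam, delta_random, sigma, gamma, vertices):
    """sigma, gamma: dicts vertex -> chosen successor;
    delta_random: dict random vertex -> {successor: probability}."""
    mc = {}
    for u in vertices:
        if u in v_eve:
            mc[u] = {sigma[u]: 1.0}        # Eve's choice, taken with probability 1
        elif u in v_adam:
            mc[u] = {gamma[u]: 1.0}        # Adam's choice, taken with probability 1
        else:
            mc[u] = dict(delta_random[u])  # random vertices keep Delta
    return mc
```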

We continue with the stochastic arena $G$ from Figure 2. Fixing the strategy $\sigma=[v_3\mapsto v_5]$ for Eve and $\gamma=[v_4\mapsto v_5]$ for Adam induces the sub-arena $G_{\sigma,\gamma}$, as shown in Figure 3.

Figure 3: Sub-arena $G_{\sigma,\gamma}$ induced by the strategies $\sigma=[v_3\mapsto v_5]$ and $\gamma=[v_4\mapsto v_5]$.
Winning Objectives

A play of $G$ is an infinite sequence of vertices $\pi=v_0v_1\cdots\in V^{\omega}$ where $(v_i,v_{i+1})\in E$ for all $i\in\mathbb{N}$. We denote the set of all plays of $G$ by $\Pi_G$, or in short $\Pi$ when $G$ is clear from the context.

Let $G$ be a stochastic arena. A winning objective for Eve is defined as a set of plays $\mathcal{O}\subseteq\Pi$. As we study zero-sum games, the winning objectives of the two players are complementary; the winning objective for Adam is thus $\Pi\setminus\mathcal{O}$. A play $\pi$ satisfies an objective $\mathcal{O}$ if $\pi\in\mathcal{O}$, and is then a winning play of Eve. A winning play $\pi$ of Adam satisfies $\Pi\setminus\mathcal{O}$.

A reachability objective asserts that the play in $G$ has to reach target vertices $T\subseteq V$, formally given by $\mathit{RE}(T)=\{v_0v_1\cdots\in\Pi\mid\exists k\in\mathbb{N}.\ v_k\in T\}$. If $T=\{v\}$ for some vertex $v$, we simply write $\mathit{RE}(v)$.

Let $p:V\to\mathbb{N}$ be a priority function which assigns a priority $p(v)$ to each vertex $v\in V$. For $T\subseteq V$, let $p(T)=\{p(t)\mid t\in T\}$. A parity objective asserts that the minimum priority visited infinitely often along an infinite path is even: $\mathit{PA}(p)=\{\pi=v_0v_1\cdots\in\Pi\mid\min(p(\mathit{inf}(\pi)))\text{ is even}\}$.

We formally define stochastic games as follows:

Definition 3 (Stochastic Games).

Let $G=((V,E),(V_E,V_A,V_R),\Delta)$ be a stochastic arena. A stochastic game (SG) with winning objective $\Phi\subseteq\Pi$ is defined as $(G,\Phi)$. If $\Phi$ is a reachability or parity objective, $(G,\Phi)$ is a stochastic reachability game (SRG) or a stochastic parity game (SPG) respectively. SRGs are also referred to as simple stochastic games (SSGs). When the winning objective is clear from the context, we refer to $G$ as a stochastic game.

Solving stochastic games

Let $(G,\Phi)$ be an SG, and let Eve and Adam follow strategies $\sigma\in\Sigma_E$ and $\gamma\in\Sigma_A$. Given a starting vertex $v\in V$, the probability that a play $\pi$ satisfies $\Phi$, i.e., the probability for Eve to win, is denoted $\mathbb{P}^{v}_{\sigma,\gamma}(\Phi)$. The probability for Adam to win is $\mathbb{P}^{v}_{\sigma,\gamma}(\Pi\setminus\Phi)$.

Let the value of a vertex $v$ be the maximal probability of generating a play from $v$ that satisfies $\Phi$, formally defined using the value function $\mathit{val}_E(\Phi)(v)=\sup_{\sigma\in\Sigma_E}\inf_{\gamma\in\Sigma_A}\mathbb{P}^{v}_{\sigma,\gamma}(\Phi)$ for Eve, and $\mathit{val}_A(\Pi\setminus\Phi)(v)=\sup_{\gamma\in\Sigma_A}\inf_{\sigma\in\Sigma_E}\mathbb{P}^{v}_{\sigma,\gamma}(\Pi\setminus\Phi)$ for Adam. A strategy $\sigma$ of Eve is optimal from vertex $v$ if $\inf_{\gamma\in\Sigma_A}\mathbb{P}^{v}_{\sigma,\gamma}(\Phi)=\mathit{val}_E(\Phi)(v)$. Optimal strategies of Adam are defined analogously.

We divide solving stochastic games into two distinct tasks. Given an SG, solving the SG quantitatively amounts to computing the values of all vertices in the arena. Solving the SG strategically amounts to computing an optimal strategy of Eve (or Adam) for the game.

Since, for both SSGs and SPGs, solving quantitatively and solving strategically are polynomially equivalent [2], we simply say “solving” in what follows. We mainly consider quantitative solving, but Theorem 10 applies to both quantitative and strategic solving.

Determinacy

Determinacy refers to the property of an SG where both players, Eve and Adam, have optimal strategies, meaning each can guarantee the value of the game regardless of the strategy employed by the other player. Pure memoryless determinacy means that both players have pure memoryless optimal strategies.

Theorem 4 (Pure Memoryless Determinacy [35]).

Let $(G,\Phi)$ be an SG, where $\Phi$ is a reachability or parity objective. For all $v\in V$, it holds that $\mathit{val}_E(\Phi)(v)+\mathit{val}_A(\Pi\setminus\Phi)(v)=1$. Pure memoryless optimal strategies exist for both players from all vertices.

When Eve and Adam follow pure memoryless strategies $\sigma\in\Sigma^{PM}_E$ and $\gamma\in\Sigma^{PM}_A$ respectively, we obtain the sub-arena $G_{\sigma,\gamma}$, which can be seen as the MC $\mathcal{M}_{\sigma,\gamma}$. We can reduce the winning probabilities $\mathbb{P}^{v_I}_{\sigma,\gamma}$ to reachability probabilities in $G_{\sigma,\gamma}$ as follows. Given a reachability objective $\mathit{RE}(T)$, $\mathbb{P}^{v_I}_{\sigma,\gamma}(\mathit{RE}(T))=\Pr^{v_I}_{\sigma,\gamma}(\mathit{Reach}(T))$. Given a parity objective $\mathit{PA}(p)$, i.e. the SPG $(G,\mathit{PA}(p))$, we call $B\in\mathrm{BSCC}(\mathcal{M}_{\sigma,\gamma})$ an even BSCC if $\min(p(B))$ is even, meaning intuitively that the smallest priority of its vertices is even. Odd BSCCs are defined analogously. Then $\mathbb{P}^{v_I}_{\sigma,\gamma}(\mathit{PA}(p))=\Pr^{v_I}_{\sigma,\gamma}(\mathit{Reach}(B_E))$, where $B_E=\bigcup_{B\in\mathrm{BSCC}(\mathcal{M}_{\sigma,\gamma}),\ \min(p(B))\text{ even}}B$.

Corollary 5 (Sufficiency of Pure Memoryless Strategies [11]).

Let $G=((V,E),(V_E,V_A,V_R),\Delta)$ be a stochastic arena, $\mathit{RE}(T)$ a reachability objective, and $\mathit{PA}(p)$ a parity objective. For all vertices $v\in V$, it holds:

  • $\mathit{val}_E(\mathit{RE}(T))(v)=\sup_{\sigma\in\Sigma_E}\inf_{\gamma\in\Sigma_A}\mathbb{P}^{v}_{\sigma,\gamma}(\mathit{RE}(T))=\sup_{\sigma\in\Sigma^{PM}_E}\inf_{\gamma\in\Sigma^{PM}_A}\Pr^{v}_{\sigma,\gamma}(\mathit{Reach}(T))$

  • $\mathit{val}_E(\mathit{PA}(p))(v)=\sup_{\sigma\in\Sigma_E}\inf_{\gamma\in\Sigma_A}\mathbb{P}^{v}_{\sigma,\gamma}(\mathit{PA}(p))=\sup_{\sigma\in\Sigma^{PM}_E}\inf_{\gamma\in\Sigma^{PM}_A}\Pr^{v}_{\sigma,\gamma}(\mathit{Reach}(B_E))$,
    where $B_E=\bigcup_{B\in\mathrm{BSCC}(\mathcal{M}_{\sigma,\gamma}),\ \min(p(B))\text{ even}}B$.

Therefore we consider only pure memoryless strategies in the sequel, unless stated otherwise.
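Combining the sketches of Section 2.1, the reduction stated in Corollary 5 takes only a few lines once a strategy pair is fixed (our sketch, reusing `bsccs` and `reachability_probabilities` from above):

```python
# Minimal sketch: the parity value of the MC M_{sigma,gamma} equals the
# probability of reaching B_E, the union of its even BSCCs.
def parity_value(mc, priority, v0):
    b_e = set()
    for c in bsccs(mc):
        if min(priority[v] for v in c) % 2 == 0:   # even BSCC
            b_e |= c
    if not b_e:
        return 0.0                                  # no even BSCC is reachable
    return reachability_probabilities(mc, b_e)[v0]
```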

3 A Gadget for Transforming SPGs into SSGs

The aim of this paper is to reduce an SPG $(G,\mathit{PA}(p))$ to an SSG $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$ such that the probability of reaching the target vertex $v_{\mathit{win}}$ in this SSG is related to the probability of winning in the SPG $(G,\mathit{PA}(p))$. As an important step toward this goal, we introduce in this section a gadget that expands each transition of $G$ while removing the priority function.

Let $G=((V,E),(V_E,V_A,V_R),\Delta)$ be a stochastic arena, $p:V\to\mathbb{N}$ a priority function, and $(G,\mathit{PA}(p))$ an SPG. Section 3.1 presents the gadget enabling the reduction from the SPG to the SSG $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$. We then analyze how probabilistic events in $\widetilde{G}$ are related to those in $G$. Section 3.2 presents a bound on the probability of reaching BSCCs in $\widetilde{G}$. Section 3.3 provides a bound on the winning probability once a BSCC in $\widetilde{G}$ is reached, while Section 3.4 gives interval bounds on the winning probabilities in $\widetilde{G}$ with regard to those in $G$.

3.1 Gadget Construction

To reduce the parity objective to a reachability objective, we transform the SPG $(G,\mathit{PA}(p))$ into the SSG $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$ by means of a gadget, whose structure was defined by Chatterjee and Fijalkow in [9] to reduce deterministic parity games (DPGs) to SSGs.

When both players’ strategies are fixed in a DPG, every play forms a lasso, i.e., a finite simple path that ends in a simple cycle. Chatterjee and Fijalkow’s analysis proceeds by examining lasso plays to choose the transition probabilities of the gadget. The specific probabilities they introduce do not extend to a reduction from SPGs to SSGs, because in our stochastic setting a lasso lifts to a more complicated structure, namely an induced Markov chain: the simple path becomes the transient segment, and each cycle becomes a BSCC. Hence, our approach makes use of smaller probability values, requiring more representation space.

The structure of the gadget remains applicable, and its intuition is as follows: whenever a play visits a vertex with even priority in $G$, give it a small but positive chance of reaching the winning sink in $\widetilde{G}$. Vertices with odd priority yield a small chance of reaching the losing sink. Finally, to represent that smaller priorities take precedence over larger ones, the probability of reaching a sink from a vertex depends on its priority. We introduce a monotonically decreasing function $\alpha$ for this purpose.

Figure 4: The gadgets for reducing the SPG $G$ (left) to the SSG $\widetilde{G}$ (right).

We obtain the stochastic arena $\widetilde{G}$ by modifying $G$ as indicated in Figure 4. Each vertex $v$ of $G$ is duplicated in $\widetilde{G}$, yielding vertices $\hat{v}$ and $\bar{v}$. A transition $\Delta(u,v)$ of $G$ is replaced by first moving to $\hat{v}$, which can then either evolve to a sink with probability $\alpha_{p(v)}$, or to the copy $\bar{v}$ with the complementary probability. Depending on whether $p(v)$ is even or odd, the sink is $v_{\mathit{win}}$ or $v_{\mathit{lose}}$.

Formally, for $U\subseteq V$, let $\bar{U}=\{\bar{v}\mid v\in U\}$, $\hat{U}=\{\hat{v}\mid v\in U\}$ and $\widetilde{U}=\bar{U}\cup\hat{U}$. We define the arena $\widetilde{G}=((\widetilde{V}\cup\{v_{\mathit{win}},v_{\mathit{lose}}\},\widetilde{E}),(\bar{V}_E,\bar{V}_A,\bar{V}_R\cup\hat{V}\cup\{v_{\mathit{win}},v_{\mathit{lose}}\}),\widetilde{\Delta})$, where the new edge set $\widetilde{E}$ is as follows: $\widetilde{E}=\{(\bar{u},\hat{v})\mid(u,v)\in E\}\cup\{(\hat{v},\bar{v}),(\hat{v},v_{\mathit{win}})\mid v\in V,\ p(v)\text{ is even}\}\cup\{(\hat{v},\bar{v}),(\hat{v},v_{\mathit{lose}})\mid v\in V,\ p(v)\text{ is odd}\}\cup\{(v_{\mathit{win}},v_{\mathit{win}}),(v_{\mathit{lose}},v_{\mathit{lose}})\}$.

To define the new transition function $\widetilde{\Delta}$, let $\alpha:\mathbb{N}\to[0,1]$, where $\alpha_i$ represents the probability of entering the winning (resp. losing) sink upon visiting a vertex with even (resp. odd) priority $i$. We give suitable values for $\alpha$ later, in Lemma 11. Now, we define $\widetilde{\Delta}:\widetilde{V}\times\widetilde{V}\to[0,1]$ as follows:

$$\widetilde{\Delta}(\tilde{u},\tilde{w})=\begin{cases}\Delta(u,w) & \text{if }\tilde{u}=\bar{u}\in\bar{V}_R,\ \tilde{w}=\hat{w}\in\hat{V}\\ 1-\alpha_{p(u)} & \text{if }\tilde{u}=\hat{u}\in\hat{V},\ \tilde{w}=\bar{w}\in\bar{V},\ u=w\\ \alpha_{p(u)} & \text{if }\tilde{u}=\hat{u}\in\hat{V},\ p(u)\text{ is even},\ \tilde{w}=v_{\mathit{win}}\\ \alpha_{p(u)} & \text{if }\tilde{u}=\hat{u}\in\hat{V},\ p(u)\text{ is odd},\ \tilde{w}=v_{\mathit{lose}}\\ 1 & \text{if }\tilde{u}=\tilde{w}=v_{\mathit{win}}\text{ or }\tilde{u}=\tilde{w}=v_{\mathit{lose}}\\ 0 & \text{otherwise}\end{cases}$$
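A minimal sketch (ours, not the paper’s artifact) of this construction in the dictionary encoding of Section 2: every vertex $v$ is split into a hat-copy and a bar-copy, and the hat-copy diverts probability mass $\alpha_{p(v)}$ to the appropriate sink. The list `alpha` is an assumed input, e.g. precomputed following Lemma 11.

```python
# Minimal sketch of the gadget: building the SSG arena ~G from the SPG arena G.
def build_ssg(edges, v_eve, v_adam, delta_random, priority, alpha):
    """edges: dict vertex -> list of successors (for all vertices);
    delta_random: dict random vertex -> {successor: probability}."""
    WIN, LOSE = 'v_win', 'v_lose'
    t_edges, t_delta = {}, {}
    for u in edges:
        # the bar-copy keeps u's owner; its moves point at hat-copies
        t_edges[(u, 'bar')] = [(v, 'hat') for v in edges[u]]
        if u not in v_eve and u not in v_adam:             # u is random
            t_delta[(u, 'bar')] = {(v, 'hat'): p
                                   for v, p in delta_random[u].items()}
        sink = WIN if priority[u] % 2 == 0 else LOSE       # even -> win sink
        t_edges[(u, 'hat')] = [(u, 'bar'), sink]
        t_delta[(u, 'hat')] = {sink: alpha[priority[u]],
                               (u, 'bar'): 1 - alpha[priority[u]]}
    for s in (WIN, LOSE):                                  # absorbing sinks
        t_edges[s], t_delta[s] = [s], {s: 1.0}
    return (t_edges,
            {(u, 'bar') for u in v_eve},                   # Eve's bar-copies
            {(u, 'bar') for u in v_adam},                  # Adam's bar-copies
            t_delta)
```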

When the context is clear, we refer to the SPG $(G,\mathit{PA}(p))$ and the SSG $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$ simply as $G$ and $\widetilde{G}$ respectively.

Since all new vertices $\hat{V}\cup\{v_{\mathit{win}},v_{\mathit{lose}}\}$ are random vertices, a strategy of either player in the SPG $G$ is a strategy in the SSG $\widetilde{G}$ and vice versa. That is, there is a one-to-one correspondence between strategies in $G$ and $\widetilde{G}$. Hence, to keep notation simple, we do not distinguish between strategies in $G$ and $\widetilde{G}$.

A pair of strategies $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ for Eve and Adam in $G$ induces the sub-arena $G_{\sigma,\gamma}$. Similarly, we obtain $\widetilde{G}_{\sigma,\gamma}$. If $U$ is an even or odd BSCC in the SPG $G_{\sigma,\gamma}$, we denote by $\widetilde{U}$ what we call the associated even pBSCC or odd pBSCC in the SSG $\widetilde{G}_{\sigma,\gamma}$ respectively. While these sets are not BSCCs, they correspond to the BSCCs of the associated parity game, and we never consider the only true BSCCs of $\widetilde{G}_{\sigma,\gamma}$, i.e., $\{v_{\mathit{win}}\}$ and $\{v_{\mathit{lose}}\}$.

Figure 5: The sub-arena $\widetilde{G}_{\sigma,\gamma}$ induced by the gadget.

We continue with the example in Figure 2 and Figure 3. To the vertices $v_0$, $v_1$, and $v_2$ we assign priority 0, and to each remaining vertex $v_i$, $i\in\{3,4,5\}$, we assign priority $i$. An illustration of the corresponding sub-arena $\widetilde{G}_{\sigma,\gamma}$, induced by our gadget construction, is provided in Figure 5. Since we do not need to distinguish between different types of vertices, we use circles for all vertices. In $G_{\sigma,\gamma}$, the set $\{v_3,v_4,v_5\}$ forms an odd BSCC. We refer to the set $\{\hat{v}_3,\bar{v}_3,\hat{v}_4,\bar{v}_4,\hat{v}_5,\bar{v}_5\}$ in $\widetilde{G}_{\sigma,\gamma}$ as the associated odd pBSCC, with $\hat{v}_3$, $\hat{v}_4$, and $\hat{v}_5$ having outgoing transitions to either $v_{\mathit{win}}$ or $v_{\mathit{lose}}$.

3.2 Before Entering a pBSCC in the SSG $\widetilde{G}$

Recall that when Eve and Adam follow pure memoryless strategies $\sigma\in\Sigma^{PM}_E$ and $\gamma\in\Sigma^{PM}_A$, the resulting sub-arenas $G_{\sigma,\gamma}$ and $\widetilde{G}_{\sigma,\gamma}$ can be viewed as finite Markov chains. We first focus on what happens before a play reaches a pBSCC in $\widetilde{G}_{\sigma,\gamma}$. Specifically, we give a lower bound on the probability of reaching an entry state of a pBSCC without entering the winning or losing sink. Later, in Lemma 11, we use this bound to determine a suitable value for $\alpha_0$.

Intuitively, we show that the probability of entry is minimized in a classical worst-case scenario extending the sub-arena $\{v_0,v_1,v_2\}$ in Figure 3, and $\{\hat{v}_0,\bar{v}_0,\hat{v}_1,\bar{v}_1,\hat{v}_2,\bar{v}_2\}$ in Figure 5. More precisely, we consider an original sub-arena $K_{\sigma,\gamma}$, depicted in Figure 6(a), with $n$ states arranged in a sequence (as $\{v_0,v_1,v_2\}$ in Figure 3) before reaching a BSCC. Each state has the minimal probability $\delta_{\min}$ of progressing to the next state and the maximal probability $1-\delta_{\min}$ of returning to the initial state. All these states have priority 0. Upon applying our gadget construction to obtain $\widetilde{K}_{\sigma,\gamma}$ in Figure 6(b), we introduce random states ($\{\hat{v}_0,\hat{v}_1,\hat{v}_2\}$ in Figure 5) that have the highest probability $\alpha_0$ of going to the winning sink $v_{\mathit{win}}$.

(a) The original sub-arena $K_{\sigma,\gamma}$.
(b) The sub-arena $\widetilde{K}_{\sigma,\gamma}$ induced by the gadget, where $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})$ is minimized.
Figure 6: Example sub-arenas where the probability of $\mathit{crossPath}$ is minimized.

In the following, let $(G,\mathit{PA}(p))$ be an SPG, and $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$ be its associated SSG. For all strategy pairs $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$, let $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})$ denote the probability for a play starting from $\bar{v}\in\bar{V}$ to reach a pBSCC in $\widetilde{G}$. Note that we never consider $\hat{v}\in\hat{V}$ as the starting vertex.

Lemma 6.

For all strategy pairs $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and for all $v\in V$, it holds:

$$\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})\;\ge\;\frac{(1-x_0)\,x_0^{\,n}}{(1-x_0)-(1-x_0^{\,n})\,x_1}$$

where $n=|V|$, $x_0=\delta_{\min}(1-\alpha_0)$, $x_1=(1-\delta_{\min})(1-\alpha_0)$, and $\delta_{\min}=\min\{\Delta(u,v)\mid u,v\in V,\ \Delta(u,v)>0\}$.

Sketch of Proof.

We fix an arbitrary strategy pair $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$, and analyze the corresponding MC $\widetilde{G}_{\sigma,\gamma}$. We simplify the MC while either preserving or under-approximating the probability of reaching a pBSCC in $\widetilde{G}_{\sigma,\gamma}$. These steps merge all pBSCCs into a sink $v_b$, eliminate auxiliary states to simplify the MC, increase all values of $\alpha$, and restructure transitions so that only one designated vertex can reach the sink $v_b$ directly. We denote the resulting MC by $\widetilde{G}_4$. We then derive a lower bound on the probability of reaching $v_b$ in $\widetilde{G}_4$, which provides a reachability lower bound in a template MC with absorbing sinks and bounded transition probabilities. As the reachability probabilities of $v_b$ in $\widetilde{G}_4$ under-approximate those in $\widetilde{G}_{\sigma,\gamma}$, this yields the desired lower bound on $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})$.
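To see where the shape of this bound comes from, consider the idealized restart chain of Figure 6(b); the following back-of-the-envelope derivation is ours and replaces neither the proof sketch above nor the full proof in [6]. In one attempt to cross the $n$ states, each step advances with probability $x_0$, restarts the attempt with probability $x_1$, and otherwise falls into a sink. An attempt therefore crosses with probability $x_0^{\,n}$ and restarts with probability $\sum_{i=0}^{n-1}x_0^{\,i}x_1=\frac{1-x_0^{\,n}}{1-x_0}\,x_1$. Summing over the number of restarts gives

$$\Pr(\mathit{cross})=\frac{x_0^{\,n}}{1-\frac{1-x_0^{\,n}}{1-x_0}\,x_1}=\frac{(1-x_0)\,x_0^{\,n}}{(1-x_0)-(1-x_0^{\,n})\,x_1},$$

which is exactly the right-hand side of Lemma 6.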

3.3 Inside a pBSCC in the SSG $\widetilde{G}$

We now focus on what happens after a play reaches a pBSCC in the sub-arena $\widetilde{G}_{\sigma,\gamma}$. Specifically, we consider MCs and give a lower bound on the probability of reaching the winning sink after reaching an even pBSCC, and dually an upper bound on the probability of reaching the winning sink after reaching an odd pBSCC.

The lower bound is attained in the MC shown in Figure 7, where $k$ is an even priority. There are $2n+1$ states in a line, plus the winning and losing sinks. Each white state has the maximal probability $1-\delta_{\min}$ of returning to the initial state, and otherwise proceeds to the next blue state. Each blue state, except $\hat{v}$, goes with probability $\alpha_{k+1}$ to the losing sink, and otherwise proceeds to the next white state. The special state $\hat{v}$ goes with probability $\alpha_k$ to $v_{\mathit{win}}$, and otherwise proceeds to $\bar{v}$. Unlike the case of $\{\hat{v}_0,\bar{v}_0,\hat{v}_1,\bar{v}_1,\hat{v}_2,\bar{v}_2\}$ in Figure 5, this MC cannot be obtained by applying our gadget to some sub-arena $G_{\sigma,\gamma}$, and hence this bound is not guaranteed to be tight. More precisely, the outgoing transitions of one copy of $\hat{v}$ indicate that $v$ has an odd priority, while the special copy $\hat{v}$ suggests otherwise. The upper bound is obtained by considering the same MC, where $k$ is an odd priority. Later, in Lemma 11, we use these two bounds to find suitable values for all $\alpha_k$ with $k\in\mathbb{N}$.

Figure 7: The MC in which the lower bound on the minimum $\mathit{winEven}$ probability is attained.

Let $U$ be an even BSCC with smallest priority $k$ in $G_{\sigma,\gamma}$ and $\widetilde{U}$ its associated pBSCC in $\widetilde{G}_{\sigma,\gamma}$. Let $\widetilde{\Pr}^{k,\widetilde{U}}_{\sigma,\gamma}(\mathit{winEven})$ denote the minimum probability of reaching the winning sink after reaching $\widetilde{U}$. That is, $\widetilde{\Pr}^{k,\widetilde{U}}_{\sigma,\gamma}(\mathit{winEven})=\min\{\widetilde{\Pr}^{\tilde{v}}_{\sigma,\gamma}(\mathit{Reach}(v_{\mathit{win}}))\mid\tilde{v}\in\widetilde{U}\}$. We write it $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})$ when $\widetilde{U}$ is clear from the context. Analogously, given an odd BSCC $U$ with smallest priority $k$ in $G_{\sigma,\gamma}$, we use $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winOdd})$ to denote the maximum probability of reaching the winning sink after reaching $\widetilde{U}$.

Lemma 7.

For all strategy pairs $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and for all even $k$, it holds:

$$\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})\;\ge\;\frac{(1-\alpha_{k+1})(1-x_2)\,x_2^{\,n-1}\,x_4}{1-(x_2+x_3)+x_5x_2^{\,n}+x_3x_2^{\,n-1}-x_5x_2^{\,n-1}}$$

and for all odd $k$, it holds:

$$\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winOdd})\;\le\;1-\frac{(1-\alpha_{k+1})(1-x_2)\,x_2^{\,n-1}\,x_4}{1-(x_2+x_3)+x_5x_2^{\,n}+x_3x_2^{\,n-1}-x_5x_2^{\,n-1}}$$

where $n=|V|$, $x_2=\delta_{\min}(1-\alpha_{k+1})$, $x_3=(1-\delta_{\min})(1-\alpha_{k+1})$, $x_4=\delta_{\min}\alpha_k$, $x_5=\delta_{\min}(1-\alpha_k)+(1-\delta_{\min})(1-\alpha_{k+1})$, and $\delta_{\min}$ is as before.

Sketch of Proof.

The proof follows the same structure as that of Lemma 6, applied to a BSCC of $\mathcal{M}_{\sigma,\gamma}$. We apply a similar four-step transformation and obtain a simplified MC on which we can directly derive an upper bound; the lower bound comes as its dual.

3.4 Range of Winning Probabilities in the SSG

We now relate the winning probabilities in the constructed SSG $\widetilde{G}$ to those in the original SPG $G$. Intuitively, for a fixed strategy pair $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$, the value $\widetilde{\mathbb{P}}_{\sigma,\gamma}$ of the SSG $\widetilde{G}$ falls into a range around the value $\mathbb{P}_{\sigma,\gamma}$ of the SPG $G$, and the size of this range depends on the probabilities of $\mathit{crossPath}$, $\mathit{winEven}$ and $\mathit{winOdd}$.

Lemma 8.

Let $x,y\in(0,1)$ be such that $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})>x$, $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})\ge y$ for all even $k$, and $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winOdd})\le 1-y$ for all odd $k$. Then it holds:

$$y\,\mathbb{P}^{v}_{\sigma,\gamma}-y+xy\;\le\;\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma}\;\le\;\mathbb{P}^{v}_{\sigma,\gamma}+1-xy$$

where $\mathbb{P}^{v}_{\sigma,\gamma}$ abbreviates $\mathbb{P}^{v}_{\sigma,\gamma}(\mathit{PA}(p))$ in the SPG $G$ and $\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma}$ abbreviates $\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma}(\mathit{RE}(v_{\mathit{win}}))$ in the SSG $\widetilde{G}$.

4 Reducing SPGs to SSGs

We now present the direct reduction from SPGs to SSGs. Let $G=((V,E),(V_E,V_A,V_R),\Delta)$ be a stochastic arena, $p:V\to\mathbb{N}$ a priority function, and $(G,\mathit{PA}(p))$ an SPG. We construct the SSG $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$ using the gadget presented in Section 3.1. Section 4.1 presents a lower bound on the difference between winning probabilities associated with different strategy pairs in $G$. Section 4.2 presents the main theorem establishing the reduction, while Section 4.3 gives complexity bounds.

4.1 A Lower Bound on Different Strategies

We consider two strategy pairs $(\sigma,\gamma),(\sigma',\gamma')\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and show a general result on all such pairs: if they yield different values in $G$, then there exists a lower bound on the difference between these values.

In the following, we assume for all $u\in V_R$, $v\in V$ that $\Delta(u,v)$ is a rational number $\frac{a_{u,v}}{b_{u,v}}$, where $a_{u,v},b_{u,v}\in\mathbb{N}^{+}$ and $a_{u,v}\le b_{u,v}$. Let $M=\max_{(u,v)\in E}\{b_{u,v}\}$ and $n=|V|$.

Lemma 9.

For all $(\sigma,\gamma),(\sigma',\gamma')\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and for all $v\in V$, the following holds:

$$\mathbb{P}^{v}_{\sigma,\gamma}\neq\mathbb{P}^{v}_{\sigma',\gamma'}\;\implies\;\bigl|\mathbb{P}^{v}_{\sigma,\gamma}-\mathbb{P}^{v}_{\sigma',\gamma'}\bigr|>\frac{1}{(n!)^2M^{2n^2}}=\epsilon$$
Proof.

Let $\Pr^{v}_{\sigma,\gamma}(\mathit{enterEven})$ be the probability for a play starting from $v\in V$ to reach an even BSCC. It follows from Corollary 5 that for all $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and $v\in V$, we have $\mathbb{P}^{v}_{\sigma,\gamma}=\Pr^{v}_{\sigma,\gamma}(\mathit{enterEven})$. We can obtain $\Pr^{v}_{\sigma,\gamma}(\mathit{enterEven})$ by setting all vertices belonging to at least one even BSCC as the target set, and calculating the reachability probability. Calculating $\mathbb{P}^{v}_{\sigma,\gamma}$ is thus reduced to solving a linear equation system $\mathbf{x}=\mathbf{A}\mathbf{x}+\mathbf{b}$ according to Theorem 1. We omit the details of $\mathbf{A}$ and $\mathbf{b}$. Every non-zero entry of $\mathbf{A}$ and $\mathbf{b}$ is either $1$, or $\frac{a_{u,v}}{b_{u,v}}$ for some $(u,v)\in V_R\times V$.

We use the following notations:

  • Let $s=|\mathbf{b}|$. It follows that $s<n$ since there is at least one vertex in a BSCC.

  • Let $\mathbf{Q}=\mathbf{I}-\mathbf{A}$. For $i\in\{1,2,\dots,s\}$ we denote the $i$-th row of $\mathbf{Q}$ by $\mathbf{Q}[i]$, and the entry of $\mathbf{Q}$ in the $i$-th row and $j$-th column by $\mathbf{Q}[i,j]$. It can be written as $\mathbf{Q}[i,j]=\frac{c_{i,j}}{d_{i,j}}$, where $|c_{i,j}|$ and $|d_{i,j}|$ are natural numbers bounded by $M$ with $|c_{i,j}|\le|d_{i,j}|$.

  • We denote the $i$-th entry of $\mathbf{b}$ by $\mathbf{b}[i]$. It can be written as $\mathbf{b}[i]=\frac{c_{i,s+1}}{d_{i,s+1}}$, where $|c_{i,s+1}|$ and $|d_{i,s+1}|$ are natural numbers bounded by $M$ with $|c_{i,s+1}|\le|d_{i,s+1}|$.

The equation system can be written as:

$$\mathbf{Q}\,\mathbf{x}=\mathbf{b}$$

We take an arbitrary row $i$, and write the $i$-th equation $\mathbf{Q}[i]\,\mathbf{x}=\mathbf{b}[i]$ as follows:

$$\Bigl[\tfrac{c_{i,1}}{d_{i,1}}\ \ \tfrac{c_{i,2}}{d_{i,2}}\ \cdots\ \tfrac{c_{i,s}}{d_{i,s}}\Bigr]\,\mathbf{x}=\frac{c_{i,s+1}}{d_{i,s+1}}\qquad(1)$$

We multiply equation (1) by $\prod_{t=1}^{s+1}d_{i,t}$ to obtain:

  • For all $j=1,\dots,s$, $\mathbf{Q}'[i,j]$ equals $\bigl(\prod_{t=1}^{s+1}d_{i,t}\bigr)\frac{c_{i,j}}{d_{i,j}}$, an integer with absolute value bounded by $M^{s+1}$.

  • $\mathbf{b}'[i]$ equals $\bigl(\prod_{t=1}^{s}d_{i,t}\bigr)c_{i,s+1}$, an integer with absolute value bounded by $M^{s+1}$.

We apply this transformation to each row of the equation system, and write the new equation system as:

$$\mathbf{Q}'\mathbf{x}=\mathbf{b}'$$

By Cramer’s rule, for all $i=1,2,\dots,s$, we obtain:

$$\mathbf{x}[i]=\frac{\det(\mathbf{Q}'_i)}{\det(\mathbf{Q}')}$$

where $\mathbf{Q}'_i$ is the matrix obtained by replacing the $i$-th column of $\mathbf{Q}'$ with the column vector $\mathbf{b}'$. It follows that all entries of $\mathbf{Q}'_i$ are also integers with absolute values bounded by $M^{s+1}$.

Since $\mathbf{x}[i]$ is a reachability probability, we have $\mathbf{x}[i]\le 1$. Following from the calculation of determinants, we obtain:

$$|\det(\mathbf{Q}'_i)|\;\le\;|\det(\mathbf{Q}')|\;\le\;s!\,(M^{s+1})^{s}\;<\;n!\,M^{n^2}$$

Therefore, if the equation system resulting from $(\sigma',\gamma')$ yields $\mathbf{x}'[i]>\mathbf{x}[i]$, then, since both values are fractions whose denominators are strictly bounded by $n!\,M^{n^2}$, their difference is at least the reciprocal of the product of these denominators:

$$\mathbf{x}'[i]-\mathbf{x}[i]>\frac{1}{\bigl(n!\,M^{n^2}\bigr)^2}=\frac{1}{(n!)^2M^{2n^2}}$$
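To illustrate Lemma 9 (our sketch, not the paper’s artifact), one can solve the reachability system with exact rational arithmetic (Gauss-Jordan elimination over `fractions.Fraction`, restricted to $V_Q=\mathit{Pre}(T)\setminus T$ via `reachable_from` from the BSCC sketch of Section 2.1) and check the denominator bound $n!\,M^{n^2}$ directly. The example MC is an illustrative assumption with $M=3$.

```python
# Minimal sketch: exact reachability values are ratios with bounded denominators.
from fractions import Fraction
from math import factorial

def exact_reachability(delta, targets, v0):
    if v0 in targets:
        return Fraction(1)
    vq = [s for s in delta
          if s not in targets and targets & reachable_from(delta, s)]
    if v0 not in vq:
        return Fraction(0)                    # v0 cannot reach T at all
    idx = {s: i for i, s in enumerate(vq)}
    m = [[Fraction(0)] * (len(vq) + 1) for _ in vq]   # rows of (Q | b)
    for u in vq:
        m[idx[u]][idx[u]] += 1                # Q = I - A
        for v, p in delta[u].items():
            if v in idx:
                m[idx[u]][idx[v]] -= p
            elif v in targets:
                m[idx[u]][-1] += p
    for i in range(len(vq)):                  # Gauss-Jordan with pivoting
        piv = next(r for r in range(i, len(vq)) if m[r][i] != 0)
        m[i], m[piv] = m[piv], m[i]
        for r in range(len(vq)):
            if r != i and m[r][i] != 0:
                f = m[r][i] / m[i][i]
                m[r] = [a - f * b for a, b in zip(m[r], m[i])]
    return m[idx[v0]][-1] / m[idx[v0]][idx[v0]]

mc = {0: {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)},
      1: {1: Fraction(1)}, 2: {0: Fraction(1, 2), 1: Fraction(1, 2)},
      3: {3: Fraction(1)}}
val = exact_reachability(mc, {1}, 0)          # 3/5
n, M = 4, 3
print(val, val.denominator <= factorial(n) * M ** (n * n))  # 3/5 True
```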

4.2 Direct Reduction

We now establish the direct reduction from SPGs to SSGs.

Theorem 10 (Reducing SPGs to SSGs).

If for all $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and $v\in V$, the following conditions hold:

  1. $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})>\frac{4-\epsilon}{4}$

  2. $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})\ge\frac{4}{4+\epsilon}$ for all even $k$, and $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winOdd})\le 1-\frac{4}{4+\epsilon}$ for all odd $k$

where $\epsilon=\frac{1}{(n!)^2M^{2n^2}}$, then every optimal strategy $\sigma\in\Sigma^{PM}_E$ of Eve in the SSG $(\widetilde{G},\mathit{RE}(v_{\mathit{win}}))$ is also optimal in the SPG $(G,\mathit{PA}(p))$. The same holds for Adam.

Proof.

We assume conditions 1 and 2 hold. We show that every optimal strategy $\sigma\in\Sigma^{PM}_E$ in the SSG $\widetilde{G}$ is also optimal in the SPG $G$. We prove this by contradiction.

We take $\Sigma'_E\subseteq\Sigma^{PM}_E$ and $\Sigma'_A\subseteq\Sigma^{PM}_A$ to be the sets of optimal strategies of Eve and Adam in $\widetilde{G}$. We obtain by Lemma 8 that for all $v\in V$ and all $(\sigma,\gamma)\in\Sigma'_E\times\Sigma'_A$, the following holds:

$$y\,\mathbb{P}^{v}_{\sigma,\gamma}-y+xy\;\le\;\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma}\;\le\;\mathbb{P}^{v}_{\sigma,\gamma}+1-xy$$

Since conditions 1 and 2 hold, we substitute $x=\frac{4-\epsilon}{4}$ and $y=\frac{4}{4+\epsilon}$; using $1-xy=1-\frac{4-\epsilon}{4+\epsilon}=\frac{2\epsilon}{4+\epsilon}$, we obtain:

$$\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma}\;\le\;\mathbb{P}^{v}_{\sigma,\gamma}+\frac{2\epsilon}{4+\epsilon}\qquad(2)$$

If $\sigma$ is not optimal in $G$, then there exist another strategy $\sigma'\in\Sigma^{PM}_E$ and a vertex $v\in V$ such that $\mathbb{P}^{v}_{\sigma',\gamma}>\mathbb{P}^{v}_{\sigma,\gamma}$. It follows again from Lemma 8 that:

$$y\,\mathbb{P}^{v}_{\sigma',\gamma}-y+xy\;\le\;\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma',\gamma}\;\le\;\mathbb{P}^{v}_{\sigma',\gamma}+1-xy\qquad(3)$$

Furthermore, Lemma 9 yields:

$$\mathbb{P}^{v}_{\sigma',\gamma}>\mathbb{P}^{v}_{\sigma,\gamma}+\epsilon\qquad(4)$$

As a result, we obtain the following:

$$\begin{aligned}
\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma',\gamma}
&\ge y\,\mathbb{P}^{v}_{\sigma',\gamma}-y+xy && \text{by (3)}\\
&> y\,(\mathbb{P}^{v}_{\sigma,\gamma}+\epsilon)-y+xy && \text{by (4)}\\
&= \frac{4}{4+\epsilon}\,\bigl(\mathbb{P}^{v}_{\sigma,\gamma}+\epsilon\bigr)-\frac{4}{4+\epsilon}\cdot\frac{\epsilon}{4}\\
&= \mathbb{P}^{v}_{\sigma,\gamma}-\frac{\epsilon}{4+\epsilon}\,\mathbb{P}^{v}_{\sigma,\gamma}+\frac{3\epsilon}{4+\epsilon}\\
&\ge \mathbb{P}^{v}_{\sigma,\gamma}+\frac{2\epsilon}{4+\epsilon} && \text{since }\mathbb{P}^{v}_{\sigma,\gamma}\le 1\\
&\ge \widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma} && \text{by (2)}
\end{aligned}$$

This shows that $\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma',\gamma}>\widetilde{\mathbb{P}}^{\bar{v}}_{\sigma,\gamma}$, which contradicts the assumption that $\sigma\in\Sigma'_E$.

Until now, we have used the function $\alpha$ in Theorem 10, obtaining inequalities relating priorities in the SPG $G$ to transition probabilities in the SSG $\widetilde{G}$. We now give requirements on $\alpha$ that satisfy all these inequalities.

Lemma 11.

When the values of $\alpha$ are arranged as follows, the conditions of Theorem 10 are satisfied:

  1. If $\alpha_0\le\frac{\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}}$, then condition 1 is satisfied.

  2. If $\frac{\alpha_{k+1}}{\alpha_k}\le\frac{\delta_{\min}^{\,n}(1-\delta_{\min})}{8(n!)^2M^{2n^2}+1}$ holds for all $k\in\mathbb{N}$, then condition 2 is satisfied.
Sketch of Proof.

Both cases follow a similar structure. For $\alpha_0$ (respectively $\alpha_{k+1}/\alpha_k$), we derive from the bound given by Lemma 6 (resp. Lemma 7) a corollary stating a bound that explicitly involves $\alpha$. We then directly obtain the two cases of Lemma 11 from these two bounds. The full proof of this lemma, detailing how to compute the function $\alpha$, can be found in Appendix A.1.

4.3 Complexity Considerations

To introduce the complexity results, we first define the size of a stochastic game $G$ as $|G|=|V|+|E|+|\Delta|$, where $|\Delta|$ is the space needed to store the transition function (which may be stored in unary or binary). A now longstanding result shows that most stochastic game settings are polynomially reducible to one another. In particular:

Theorem 12 (From Theorem 1 in [2]).

Solving stochastic parity games and solving simple stochastic games are polynomial-time equivalent. This holds under both unary and binary encoding.

We show that the reduction we have introduced in this paper is polynomial under binary encoding. We recall that $M=\max_{(u,v)\in E}\{b_{u,v}\}$ and $n=|V|$.

Theorem 13.

Given an SPG $G$, there exist values for the function $\alpha$, of size polynomial in $G$, that satisfy Theorem 10, such that the SSG $\widetilde{G}$ is of size $\mathcal{O}(n^5\log M)$ in binary.

Proof.

Since $\delta_{\min}\ge\frac{1}{M}$ and $1-\delta_{\min}\ge\frac{1}{2}$, the following is a valid instance of $\alpha$, polynomial in $G$ (and polynomial in the transition probabilities appearing in $G$) under binary encoding:

$$\forall k\in\mathbb{N},\qquad \alpha_k=\left(\frac{1}{16\,(n!)^2\,M^{2n^2+n+1}}\right)^{k+1}$$

Then the size of the SSG $\widetilde{G}$ is:

$$|\widetilde{G}|=|\widetilde{V}\cup\{v_{\mathit{win}},v_{\mathit{lose}}\}|+|\widetilde{E}|+|\widetilde{\Delta}|=\mathcal{O}(n)+\mathcal{O}(n^2)+\mathcal{O}(n^2)\cdot\mathcal{O}\bigl(n\,(n\log n+n^2\log M)\bigr)=\mathcal{O}(n^5\log M)$$
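As a sanity check (ours, not part of the proof), one can measure the binary representation of these $\alpha_k$ directly: the bit length of the denominator grows linearly in $k$ and polynomially in $n$ and $\log M$, matching the $\mathcal{O}(n^5\log M)$ estimate once multiplied over the $\mathcal{O}(n^2)$ transitions.

```python
# Minimal sketch: bit sizes of the alpha_k instance from the proof of Theorem 13.
from fractions import Fraction
from math import factorial

def alpha(k, n, M):
    base = Fraction(1, 16 * factorial(n) ** 2 * M ** (2 * n * n + n + 1))
    return base ** (k + 1)

n, M = 5, 7
for k in range(n + 1):   # illustrative range of priorities
    print(k, alpha(k, n, M).denominator.bit_length())  # grows linearly in k
```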

According to [2], quantitative SSGs under unary and binary encoding are in the same complexity class, and thus in NP ∩ coNP [18]. We therefore obtain that our reduction yields an NP ∩ coNP algorithm for solving SPGs.

5 Epilogue

We have given a polynomial reduction from quantitative SPGs to quantitative SSGs, taking inspiration from a gadget used in [9] to obtain a reduction from deterministic PGs to quantitative SSGs. After fixing a pair of strategies, the values of both the SPG and the SSG are determined, but the construction of the SSG makes it difficult to establish coinciding values. Using these fixed strategies, we showed that the value of the SSG falls into a range around the value of the SPG, where this range depends on the probability to reach a pBSCC of the SSG and the minimum probability to reach a winning sink in pBSCCs of the SSG. When considering all possible strategy pairs, we obtained a lower bound ϵ on their value differences in the SPG, by restricting transition probabilities to rational numbers and analyzing reachability equation systems of Markov chains. We then showed that by arranging transition probabilities of the SSG properly in terms of the size of the SPG, its smallest probability, and ϵ, the value ranges of different strategy pairs can be narrowed so that they do not overlap. In this case, a reduction from SPGs to SSGs is achieved.

Although under unary encoding exponential numbers can be introduced into the probability function $\alpha$ of the newly constructed SSGs, both reductions are polynomial. Hence, our construction yields an NP ∩ coNP algorithm for both qualitative and quantitative SPGs under unary and binary encoding, substantiating the complexity results from previous works [18, 2, 9].

Our result enables solving SPGs by first reducing them to SSGs and then applying algorithms for SSGs. However, its implementability is in question, due to the possibly huge representation of $\alpha$. Our reduction also captures the transformation of an MDP with a parity objective into an SSG. As we assume the minimum transition probability $\delta_{\min}\in(0,\frac{1}{2}]$ in the original SPG, we cannot capture the subcase of reducing DPGs to quantitative SSGs. Although our reduction is unlikely to be leveraged to effectively solve SPGs in practice, some improvements are possible. First, we have not formally examined the optimal arrangement of $\alpha$. It is possible to find the weakest requirements on $\alpha$ for which the reduction remains correct, thus optimizing possible implementations. Second, some specific cases lead to very small values of $\alpha$. These cases are similar to the ones that can challenge classical MDP solvers using VI, and so we can benefit from any family of arena structures where these cases are avoided, leading to implementable valuations of $\alpha$.

References

  • [1] Rabah Amir. Stochastic games in economics and related fields: an overview. Stochastic Games and Applications, pages 455–470, 2003.
  • [2] Daniel Andersson and Peter Bro Miltersen. The complexity of solving stochastic games on graphs. In International Symposium on Algorithms and Computation, pages 112–121. Springer, 2009. doi:10.1007/978-3-642-10631-6_13.
  • [3] Muqsit Azeem, Alexandros Evangelidis, Jan Křetínskỳ, Alexander Slivinskiy, and Maximilian Weininger. Optimistic and topological value iteration for simple stochastic games. In International Symposium on Automated Technology for Verification and Analysis, pages 285–302. Springer, 2022.
  • [4] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT press, 2008.
  • [5] Christel Baier, Joachim Klein, Linda Leuschner, David Parker, and Sascha Wunderlich. Ensuring the reliability of your model checker: Interval iteration for Markov decision processes. In International Conference on Computer Aided Verification, volume 10426, pages 160–180. Springer, 2017. doi:10.1007/978-3-319-63387-9_8.
  • [6] Raphaël Berthon, Joost-Pieter Katoen, and Zihan Zhou. A direct reduction from stochastic parity games to simple stochastic games, 2025. arXiv:2506.06223.
  • [7] Patricia Bouyer, Youssouf Oualhadj, Mickael Randour, and Pierre Vandenhove. Arena-independent finite-memory determinacy in stochastic games. Log. Methods Comput. Sci., 19(4), 2023. doi:10.46298/LMCS-19(4:18)2023.
  • [8] Cristian S Calude, Sanjay Jain, Bakhadyr Khoussainov, Wei Li, and Frank Stephan. Deciding parity games in quasipolynomial time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 252–263, 2017. doi:10.1145/3055399.3055409.
  • [9] Krishnendu Chatterjee and Nathanaël Fijalkow. A reduction from parity games to simple stochastic games. Electronic Proceedings in Theoretical Computer Science, 54:74–86, 2011. doi:10.4204/eptcs.54.6.
  • [10] Krishnendu Chatterjee and Thomas A Henzinger. Strategy improvement and randomized subexponential algorithms for stochastic parity games. In Annual Symposium on Theoretical Aspects of Computer Science, pages 512–523. Springer, 2006. doi:10.1007/11672142_42.
  • [11] Krishnendu Chatterjee and Thomas A Henzinger. Reduction of stochastic parity to stochastic mean-payoff games. Information Processing Letters, 106(1):1–7, 2008. doi:10.1016/J.IPL.2007.08.035.
  • [12] Krishnendu Chatterjee, Thomas A Henzinger, Barbara Jobstmann, and Arjun Radhakrishna. Gist: A solver for probabilistic games. In Computer Aided Verification: 22nd International Conference, volume 6174, pages 665–669. Springer, 2010. doi:10.1007/978-3-642-14295-6_57.
  • [13] Krishnendu Chatterjee, Marcin Jurdziński, and Thomas A. Henzinger. Simple stochastic parity games. In Matthias Baaz and Johann A. Makowsky, editors, Computer Science Logic, volume 2803, pages 100–113, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg. doi:10.1007/978-3-540-45220-1_11.
  • [14] Krishnendu Chatterjee, Marcin Jurdzinski, and Thomas A Henzinger. Quantitative stochastic parity games. In SODA, volume 4, pages 121–130, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982808.
  • [15] Krishnendu Chatterjee, Rupak Majumdar, and Marcin Jurdzinski. On Nash equilibria in stochastic games. In Computer Science Logic, volume 3210 of Lecture Notes in Computer Science, pages 26–40. Springer, 2004. doi:10.1007/978-3-540-30124-0_6.
  • [16] Krishnendu Chatterjee and Nir Piterman. Combinations of qualitative winning for stochastic parity games. In 30th International Conference on Concurrency Theory, volume 140 of LIPIcs, pages 6:1–6:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.CONCUR.2019.6.
  • [17] Anne Condon. On algorithms for simple stochastic games. Advances in Computational Complexity Theory, 13:51–72, 1990. doi:10.1090/DIMACS/013/04.
  • [18] Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992. doi:10.1016/0890-5401(92)90048-K.
  • [19] Wojciech Czerwiński, Laure Daviaud, Nathanaël Fijalkow, Marcin Jurdziński, Ranko Lazić, and Paweł Parys. Universal trees grow inside separating automata: Quasi-polynomial lower bounds for parity games. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2333–2349. SIAM, 2019.
  • [20] Matthew Darlington, Kevin D. Glazebrook, David S. Leslie, Rob Shone, and Roberto Szechtman. A stochastic game framework for patrolling a border. Eur. J. Oper. Res., 311(3):1146–1158, 2023. doi:10.1016/J.EJOR.2023.06.011.
  • [21] Julia Eisentraut, Edon Kelmendi, Jan Křetínskỳ, and Maximilian Weininger. Value iteration for simple stochastic games: Stopping criterion and learning algorithm. Information and Computation, 285:104886, 2022. doi:10.1016/J.IC.2022.104886.
  • [22] Kousha Etessami, Marta Z. Kwiatkowska, Moshe Y. Vardi, and Mihalis Yannakakis. Multi-objective model checking of Markov decision processes. Log. Methods Comput. Sci., 4(4), 2008. doi:10.2168/LMCS-4(4:8)2008.
  • [23] Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Model-free reinforcement learning for stochastic parity games. In 31st International Conference on Concurrency Theory, volume 171 of LIPIcs, pages 21:1–21:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.CONCUR.2020.21.
  • [24] Arnd Hartmanns and Benjamin Lucien Kaminski. Optimistic value iteration. In Computer Aided Verification - 32nd International Conference, volume 12225 of Lecture Notes in Computer Science, pages 488–511. Springer, 2020. doi:10.1007/978-3-030-53291-8_26.
  • [25] Christian Hensel, Sebastian Junges, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. The probabilistic model checker storm. International Journal on Software Tools for Technology Transfer, pages 1–22, 2021.
  • [26] Marcin Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Information Processing Letters, 68(3):119–124, 1998.
  • [27] Marcin Jurdziński and Ranko Lazić. Succinct progress measures for solving parity games. In 2017 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–9. IEEE, 2017.
  • [28] Mark Kattenbelt, Marta Kwiatkowska, Gethin Norman, and David Parker. A game-based abstraction-refinement framework for Markov decision processes. Formal Methods in System Design, 36:246–280, 2010. doi:10.1007/S10703-010-0097-6.
  • [29] Jan Křetínskỳ, Emanuel Ramneantu, Alexander Slivinskiy, and Maximilian Weininger. Comparison of algorithms for simple stochastic games. Information and Computation, 289:104885, 2022. doi:10.1016/J.IC.2022.104885.
  • [30] Marta Kwiatkowska, Gethin Norman, David Parker, and Gabriel Santos. PRISM-games 3.0: Stochastic game verification with concurrency, equilibria and time. In Computer Aided Verification - 32nd International Conference, volume 12225 of Lecture Notes in Computer Science, pages 475–487. Springer, 2020. doi:10.1007/978-3-030-53291-8_25.
  • [31] Marta Z. Kwiatkowska, Gethin Norman, and David Parker. PRISM 4.0: Verification of probabilistic real-time systems. In Computer Aided Verification - 23rd International Conference, volume 6806 of Lecture Notes in Computer Science, pages 585–591. Springer, 2011. doi:10.1007/978-3-642-22110-1_47.
  • [32] Karoliina Lehtinen. A modal μ perspective on solving parity games in quasi-polynomial time. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, pages 639–648, 2018. doi:10.1145/3209108.3209115.
  • [33] David S Leslie, Steven Perkins, and Zibo Xu. Best-response dynamics in zero-sum stochastic games. Journal of Economic Theory, 189:105095, 2020. doi:10.1016/J.JET.2020.105095.
  • [34] Rupak Majumdar, Kaushik Mallik, Anne-Kathrin Schmuck, and Sadegh Soudjani. Symbolic control for stochastic systems via finite parity games. Nonlinear Analysis: Hybrid Systems, 51:101430, 2024. doi:10.1016/j.nahs.2023.101430.
  • [35] Donald A Martin. The determinacy of Blackwell games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998. doi:10.2307/2586667.
  • [36] Pawel Parys. Parity games: Zielonka’s algorithm in quasi-polynomial time. In 44th International Symposium on Mathematical Foundations of Computer Science, volume 138 of LIPIcs, pages 10:1–10:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.MFCS.2019.10.
  • [37] Kittiphon Phalakarn, Toru Takisaka, Thomas Haas, and Ichiro Hasuo. Widest paths and global propagation in bounded value iteration for stochastic games. In Computer Aided Verification: 32nd International Conference, pages 349–371. Springer, 2020. doi:10.1007/978-3-030-53291-8_19.
  • [38] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
  • [39] Tim Quatmann and Joost-Pieter Katoen. Sound value iteration. In Computer Aided Verification - 30th International Conference, volume 10981 of Lecture Notes in Computer Science, pages 643–661. Springer, 2018. doi:10.1007/978-3-319-96145-3_37.
  • [40] Lloyd S Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
  • [41] Frédéric Simard, Josée Desharnais, and François Laviolette. General cops and robbers games with randomness. Theor. Comput. Sci., 887:30–50, 2021. doi:10.1016/J.TCS.2021.06.043.
  • [42] Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. Log. Methods Comput. Sci., 7(3), 2011. doi:10.2168/LMCS-7(3:20)2011.
  • [43] Uri Zwick and Mike Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1-2):343–359, 1996. doi:10.1016/0304-3975(95)00188-3.

Appendix A Appendix

We present how we compute values for the function $\alpha$. We include the details in the hope that these techniques may later be adapted to yield better values on specific classes of games.

A.1 Proof of Lemma 11

Lemma 11. [Restated, see original statement.]

When the values of $\alpha$ are arranged as follows, the conditions of Theorem 10 are satisfied:

  1. If $\alpha_0\le\frac{\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}}$, then condition 1 is satisfied.

  2. If $\frac{\alpha_{k+1}}{\alpha_k}\le\frac{\delta_{\min}^{\,n}(1-\delta_{\min})}{8(n!)^2M^{2n^2}+1}$ holds for all $k\in\mathbb{N}$, then condition 2 is satisfied.

A.1.1 Arranging $\alpha_0$

We start by giving a bound on $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})$ that involves $\alpha_0$. To do so, we make use of Lemma 6.

Corollary 14 (Another Lower Bound on the $\mathit{crossPath}$ Probability).

For all strategy pairs $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and for all $\bar{v}\in\bar{V}$, the following holds:

$$\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})>\frac{\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}{2\alpha_0+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}$$
Proof.

We scale down $\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})$ as follows:

$$\begin{aligned}
\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})
&\ge \frac{(1-x_0)\,x_0^{\,n}}{(1-x_0)-(1-x_0^{\,n})\,x_1} && \text{Lemma 6}\\
&= \frac{\bigl(1-\delta_{\min}(1-\alpha_0)\bigr)\,\delta_{\min}^{\,n}(1-\alpha_0)^{n}}{\alpha_0+(1-\delta_{\min})\,\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}\\
&> \frac{(1-\delta_{\min})(1-\alpha_0)\,\delta_{\min}^{\,n}(1-\alpha_0)^{n}}{\alpha_0+(1-\delta_{\min})\,\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}} && \text{since }1-\delta_{\min}(1-\alpha_0)>(1-\delta_{\min})(1-\alpha_0)\\
&= \frac{\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}{\frac{\alpha_0}{1-\delta_{\min}}+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}\\
&\ge \frac{\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}{2\alpha_0+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}} && \text{since }1-\delta_{\min}\ge\tfrac{1}{2}
\end{aligned}$$

We can now arrange $\alpha_0$. It follows from Corollary 14 that:

$$\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})>\frac{\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}{2\alpha_0+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}$$

Therefore, to show:

$$\widetilde{\Pr}^{\bar{v}}_{\sigma,\gamma}(\mathit{crossPath})>\frac{4-\epsilon}{4}\qquad(5)$$

where $\epsilon=\frac{1}{(n!)^2M^{2n^2}}$, it suffices to show that:

$$\frac{\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}{2\alpha_0+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}\ge\frac{4-\epsilon}{4}\qquad(6)$$

which can be further simplified to:

$$\epsilon\ge\frac{8\alpha_0}{2\alpha_0+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}\qquad(7)$$

We show that when $\alpha_0\le\frac{\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}}$, inequality (7) holds. We start with the right-hand side:

$$\begin{aligned}
&\frac{8\alpha_0}{2\alpha_0+\delta_{\min}^{\,n}(1-\alpha_0)^{n+1}}\\
&\le \frac{8\alpha_0}{2\alpha_0+\delta_{\min}^{\,n}\bigl(1-(n+1)\alpha_0\bigr)} && \text{by Bernoulli's inequality}\\
&= \frac{8\alpha_0}{\alpha_0+\delta_{\min}^{\,n}+\alpha_0\bigl(1-(n+1)\delta_{\min}^{\,n}\bigr)}\\
&\le \frac{8\alpha_0}{\alpha_0+\delta_{\min}^{\,n}} && \text{since }1-(n+1)\delta_{\min}^{\,n}>0\\
&\le \frac{\frac{\delta_{\min}^{\,n}}{(n!)^2M^{2n^2}}}{\frac{\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}}+\delta_{\min}^{\,n}}\\
&= \frac{1}{\frac{1}{8}+(n!)^2M^{2n^2}}\\
&\le \frac{1}{(n!)^2M^{2n^2}}=\epsilon
\end{aligned}$$

Therefore, we obtain that when $\alpha_0\le\frac{\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}}$, inequality (5) holds, and thus condition 1 is satisfied.
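A small exact-arithmetic check (ours) that this arrangement works for concrete parameters: instantiating $\alpha_0$ at its maximal allowed value and evaluating inequality (7) over `fractions.Fraction` confirms that it holds.

```python
# Minimal sketch: checking inequality (7) for the alpha_0 arrangement.
from fractions import Fraction
from math import factorial

n, M = 3, 4
delta_min = Fraction(1, M)                    # smallest transition probability
eps = Fraction(1, factorial(n) ** 2 * M ** (2 * n * n))
alpha0 = delta_min ** n / (8 * factorial(n) ** 2 * M ** (2 * n * n))
rhs = 8 * alpha0 / (2 * alpha0 + delta_min ** n * (1 - alpha0) ** (n + 1))
print(eps >= rhs)                             # True
```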

A.1.2 Arranging $\alpha_{k+1}/\alpha_k$

We start by deriving a lower bound on $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})$ for even $k$, which involves $\alpha_{k+1}$ and $\alpha_k$. The reasoning for odd $k$ is symmetric. To do so, we use Lemma 7.

Corollary 15 (Another Lower Bound on the $\mathit{winEven}$ Probability).

For all strategy pairs $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and for all even $k$, the following holds:

$$\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})>\frac{\delta_{\min}^{\,n}(1-\delta_{\min})-\frac{\alpha_{k+1}}{\alpha_k}}{\delta_{\min}^{\,n}(1-\delta_{\min})+\frac{\alpha_{k+1}}{\alpha_k}}$$
Proof.

We scale down $\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})$ as follows:

$$\begin{aligned}
\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})
&\ge \frac{(1-\alpha_{k+1})(1-x_2)\,x_2^{\,n-1}\,x_4}{1-(x_2+x_3)+x_5x_2^{\,n}+x_3x_2^{\,n-1}-x_5x_2^{\,n-1}} && \text{Lemma 7}\\
&= \frac{(1-\delta_{\min}+\delta_{\min}\alpha_{k+1})\,\delta_{\min}^{\,n-1}(1-\alpha_{k+1})^{n}\,\delta_{\min}\alpha_k}{\alpha_{k+1}+\delta_{\min}^{\,n-1}(1-\alpha_{k+1})^{n-1}\bigl(\delta_{\min}^2(-\alpha_k+\alpha_{k+1}+\alpha_k\alpha_{k+1}-\alpha_{k+1}^2)+\delta_{\min}(\alpha_k-2\alpha_{k+1}+\alpha_{k+1}^2)\bigr)}\\
&> \frac{(1-\delta_{\min})\,\delta_{\min}^{\,n}(1-\alpha_{k+1})^{n+1}\,\alpha_k}{\alpha_{k+1}+\delta_{\min}^{\,n}\bigl(\delta_{\min}(-\alpha_k+\alpha_{k+1}+\alpha_k\alpha_{k+1}-\alpha_{k+1}^2)+\alpha_k-2\alpha_{k+1}+\alpha_{k+1}^2\bigr)}\\
&\qquad \text{since }1-\delta_{\min}+\delta_{\min}\alpha_{k+1}>(1-\delta_{\min})(1-\alpha_{k+1})\text{ and }(1-\alpha_{k+1})^{n-1}<1\\
&\ge \frac{(1-\delta_{\min})\,\delta_{\min}^{\,n}\,\alpha_k\,\bigl(1-(n+1)\alpha_{k+1}\bigr)}{\alpha_{k+1}+\delta_{\min}^{\,n}\bigl(\alpha_k(1-\delta_{\min})-\alpha_{k+1}(2-\alpha_{k+1}-\delta_{\min}(1+\alpha_k-\alpha_{k+1}))\bigr)} && \text{by Bernoulli's inequality}\\
&> \frac{(1-\delta_{\min})\,\delta_{\min}^{\,n}\,\alpha_k-(n+1)\,\delta_{\min}^{\,n}(1-\delta_{\min})\,\alpha_k\alpha_{k+1}}{\alpha_{k+1}+(1-\delta_{\min})\,\delta_{\min}^{\,n}\,\alpha_k} && \text{since }\alpha_{k+1}\bigl(2-\alpha_{k+1}-\delta_{\min}(1+\alpha_k-\alpha_{k+1})\bigr)>0\\
&> \frac{\delta_{\min}^{\,n}\alpha_k(1-\delta_{\min})-\alpha_{k+1}}{\delta_{\min}^{\,n}\alpha_k(1-\delta_{\min})+\alpha_{k+1}} && \text{since }(n+1)\,\delta_{\min}^{\,n}(1-\delta_{\min})\,\alpha_k<1\\
&= \frac{\delta_{\min}^{\,n}(1-\delta_{\min})-\frac{\alpha_{k+1}}{\alpha_k}}{\delta_{\min}^{\,n}(1-\delta_{\min})+\frac{\alpha_{k+1}}{\alpha_k}}
\end{aligned}$$

We now arrange $\alpha_{k+1}/\alpha_k$. It follows from Corollary 15 that for all strategy pairs $(\sigma,\gamma)\in\Sigma^{PM}_E\times\Sigma^{PM}_A$ and for all even $k$, the following holds:

$$\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})>\frac{\delta_{\min}^{\,n}(1-\delta_{\min})-\frac{\alpha_{k+1}}{\alpha_k}}{\delta_{\min}^{\,n}(1-\delta_{\min})+\frac{\alpha_{k+1}}{\alpha_k}}\qquad(8)$$

Therefore, to show:

$$\widetilde{\Pr}^{k}_{\sigma,\gamma}(\mathit{winEven})\ge\frac{4}{4+\epsilon}\qquad(9)$$

it suffices to show that for all $k\in\mathbb{N}$:

$$\frac{\delta_{\min}^{\,n}(1-\delta_{\min})-\frac{\alpha_{k+1}}{\alpha_k}}{\delta_{\min}^{\,n}(1-\delta_{\min})+\frac{\alpha_{k+1}}{\alpha_k}}\ge\frac{4}{4+\epsilon}\qquad(10)$$

We write $r$ for $\frac{\alpha_{k+1}}{\alpha_k}$ in the following calculation. Inequality (10) can be further simplified to:

$$\frac{\epsilon}{4+\epsilon}\ge\frac{2r}{(1-\delta_{\min})\,\delta_{\min}^{\,n}+r}\qquad(11)$$

and can be finally simplified to:

$$r\le\frac{(1-\delta_{\min})\,\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}+1}\qquad(12)$$

Therefore, we obtain that if for all $k\in\mathbb{N}$:

$$\frac{\alpha_{k+1}}{\alpha_k}\le\frac{(1-\delta_{\min})\,\delta_{\min}^{\,n}}{8(n!)^2M^{2n^2}+1}$$

then inequality (9) holds, and thus condition 2 is satisfied.