A Direct Reduction from Stochastic Parity Games to Simple Stochastic Games
Abstract
Significant progress has recently been achieved in developing efficient solutions for simple stochastic games (SSGs), focusing on reachability objectives. While reductions from stochastic parity games (SPGs) to SSGs have been presented in the literature through the use of multiple intermediate game models, a direct and simple reduction has been notably absent. This paper introduces a novel and direct polynomial-time reduction from quantitative SPGs to quantitative SSGs. By leveraging a gadget-based transformation that effectively removes the priority function, we construct an SSG that simulates the behavior of a given SPG. We formally establish the correctness of our direct reduction. Furthermore, we demonstrate that under binary encoding this reduction is polynomial, thereby directly corroborating the known complexity of SPGs and providing new understanding of the relationship between parity and reachability objectives in turn-based stochastic games.
Keywords and phrases: stochastic games, parity, reduction
Funding: Raphaël Berthon: Funded by DFG Project POMPOM (KA 1462/6-1).
2012 ACM Subject Classification: Theory of computation → Probabilistic computation
Editors: Patricia Bouyer and Jaco van de Pol
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Stochastic games (SGs) are a broadly used framework for decision-making under uncertainty with a well-defined objective. They aim at modeling two important aspects under which sequential decisions must be made: turn-based interaction with an adversarial environment, and dealing with randomness. Stochastic games were introduced long ago by Shapley [40], and many variations have been shown to be inter-reducible and in $\mathsf{NP} \cap \mathsf{coNP}$ [18, 2, 9]. SGs find their applications in various fields, including artificial intelligence [33], economics [1], operations research [20], and graph theory [41]. Moreover, Markov decision processes (MDPs) [38], a foundational framework for modeling decision-making in stochastic environments, are special cases of SGs, where one of the players has no states under control.
While we make use of simple stochastic games (SSGs) with reachability objectives, we focus on the specific case of stochastic parity games (SPGs), which are zero-sum, and where the set of winning runs is $\omega$-regular. Solving such a game consists in finding an optimal strategy and determining its winning probability.
Take, for instance, the case of a market competition, where two firms, Alpha and Beta, try to expand their market share. From the standpoint of each firm, the other acts as a direct competitor, and therefore we assume the players are adversarial. States represent the relative valuation of the firms over a sustained period of time, and business decisions (made by either firm Alpha or Beta) and the fluctuation of market shares, policy changes, and external forces (modeled as randomness) lead to transitions between game states. Firm Alpha is interested in keeping its share above a key threshold, say 50%. We distinguish between three priorities, leading to a zero-sum game:
- Priority 0: Alpha’s market share is significantly above 50%.
- Priority 1: Alpha’s market share is significantly below 50%.
- Priority 2: Alpha’s market share fluctuates around 50%. While this is sustainable without major fluctuation, it is not sustainable if the only other fluctuation is Alpha’s share regularly dropping below 50% (priority 1).
If the minimum priority visited infinitely often is 0, Alpha can manage in the long term to regularly dominate the market, recovering any loss that occurred in the meantime. If the minimum priority visited infinitely often is 1, despite any temporary success, Alpha’s market share will stay near or below 50% in the long run, which is not sustainable for the firm. Finally, if the minimum priority visited infinitely often is 2, Alpha and Beta will eventually find an even balance point.
A variety of algorithms have been considered for (reachability) SSGs [17, 21, 3, 37], which we present in our related work section below. In Markov chains (MCs), $\omega$-regular objectives can be reduced directly to reachability objectives [4]. A similar reduction exists from MDPs with $\omega$-regular objectives to reachability objectives and has been used extensively, for example in [22]. This means that solvers often focus on optimizing specifically the computation of reachability probabilities. Such a direct reduction is lacking for SPGs. To reduce quantitative SPGs to SSGs, some intermediate steps are necessary, via a reduction to stochastic mean-payoff and stochastic discounted-payoff games [11, 2] (see the lower part of Figure 1), making this approach less appealing. For qualitative solutions, a translation via deterministic parity games (i.e. with no random states) exists [13, 14, 9], see the upper part of Figure 1.
Outline and Contribution
In this paper, we propose a direct reduction from SPGs to SSGs with a reachability objective (see the solid arrow in Figure 1). To that end, we leverage a gadget whose structure comes from [9], where it was used to reduce deterministic parity games to quantitative SSGs, but where we use new probability values. Given an SPG $G$, where a parity condition is satisfied if the minimum priority seen infinitely often is even, we show in Section 3.1 how to use the gadget to transform $G$ into an SSG $G'$. This introduces two new sink states, one winning and the other losing for the reachability objective. Parity values are removed, and every transition going to a state that used to be even (respectively odd) now has a small chance to go to the winning (resp. losing) sink. We scale the probabilities, with lower parity values yielding a higher chance to go to a sink. Theorem 10 ensures that any optimal strategy in $G'$ is also optimal in $G$ (the reciprocal may not be true). We can then compute an optimal memoryless strategy in $G'$, and compute its value in $G$.
We show in Theorem 13 that under binary encoding, our reduction is polynomial. We thus reobtain in a direct way the classical result that solving quantitative SPGs is in the complexity class $\mathsf{NP} \cap \mathsf{coNP}$ [2, 9]. While the complexity remains the same as for existing algorithms, and the values used in the reduction make it unlikely to be very efficient in practice, this new approach implies that any efficient SSG solver can be used for SPGs. The direct reduction was already conjectured to exist by Chatterjee and Fijalkow in [9], but as expected, proving its correctness is challenging, and involves the computation of very precise probability bounds. Despite the inspiration drawn from a known gadget, the technical depth of this paper resides in the intricate and novel proofs for the correctness of our reduction. In addition, our direct reduction gives new insights into the relationship between SSGs and SPGs.
Related Work
Stochastic parity games, mean-payoff games and discounted payoff games can all be reduced to SSGs [26, 43], and this also applies to their stochastic extensions, namely stochastic parity games [9], stochastic mean payoff games and stochastic discounted payoff games [2]. SSGs also find their applications in the analysis of MDPs, serving as abstractions for large MDPs [28]. The amount of memory required to solve stochastic parity games has been studied in [7].
Various extensions have been considered within this family of inter-reducible stochastic games. Introducing more than two players allows for the analysis of Nash equilibria [15, 42]. Using continuous states can provide tools to represent timed systems [34]. Multi-objective approaches have been employed to synthesize systems that balance average expected outcomes with worst-case guarantees [16]. Parity objectives are significant in many of these scenarios where long-run behavior is relevant, but the classical reduction to SSGs cannot be directly applied.
Common approaches to solving SSGs, as presented in [17], include value iteration (VI), strategy iteration (SI), and quadratic programming, but are all exponential in the size of the SSG. These approaches have been widely studied on MDPs, where recent advancements have been made to apply VI with guarantees, using interval VI [5], sound VI [39], and optimistic VI [24]. Interestingly, optimistic VI does not require an a priori computation of starting vectors to approximate from above. Similar ideas have been lifted to SSGs: Eisentraut et al. [21] introduce a VI algorithm for under- and over-approximation sequences, as well as the first practical stopping criterion for VI on SSGs. Optimistic VI has been adapted to SSGs [3], and a novel bounded VI with the concept of widest path has been introduced in [37]. A comparative analysis [29] suggests VI and SI are more efficient. Storm [25] and PRISM [31] are two popular model checkers incorporating different variants of VI and SI, and both employ VI as the default algorithm for solving MDPs. PRISM-games [30] exploits VI for solving SSGs.
For SPGs, we distinguish three main approaches. Chatterjee et al. [10] use a strategy improvement algorithm requiring randomized sub-exponential expected time in the number of game states. The probabilistic game solver GIST [12] reduces qualitative SPGs to deterministic parity games (DPGs), and benefits from several quasi-polynomial algorithms for DPGs [27, 36, 32] since the breakthrough made by Calude et al. [8], but this approach is unlikely to achieve polynomial running time [19]. Hahn et al. [23] reduce SPGs to SSGs, allowing the use of reinforcement learning to approximate the values without knowing the game’s probabilistic transition structure. Their reduction is only proven correct in the limit.
2 Preliminaries
Our notations on Markov chains and stochastic games on graphs mainly come from [4].
2.1 Discrete-Time Markov Chains
A discrete distribution over a countable set $S$ is a function $\mu : S \to [0,1]$ with $\sum_{s \in S} \mu(s) = 1$. The support of the discrete distribution $\mu$ is defined as $\mathrm{supp}(\mu) = \{ s \in S \mid \mu(s) > 0 \}$. We denote the set of all discrete distributions over $S$ with $\mathrm{Distr}(S)$.
A discrete-time Markov chain (MC) is a tuple $M = (S, P, s_0)$ where $S$ is a finite set of states, $P : S \times S \to [0,1]$ with $\sum_{t \in S} P(s,t) = 1$ for all $s \in S$ is a probabilistic transition function, and $s_0 \in S$ is the initial state. Given $s, t \in S$ with $P(s,t) > 0$, we write $s \to t$. For $s \in S$ and $T \subseteq S$, let $P(s, T) = \sum_{t \in T} P(s,t)$.
An infinite sequence $\pi = s_0 s_1 s_2 \ldots$ is an infinite path through MC $M$ if $P(s_i, s_{i+1}) > 0$ for all $i \geq 0$. We denote all infinite paths that start from state $s$ with $\mathrm{Paths}(s)$. Prefixes $\hat\pi = s_0 s_1 \ldots s_n$ of infinite path $\pi$ are finite paths. We denote all finite paths that start from state $s$ with $\mathrm{Paths}_{\mathrm{fin}}(s)$. The set of infinitely often visited states in $\pi$ is defined as $\mathrm{inf}(\pi) = \{ s \in S \mid s_i = s \text{ for infinitely many } i \}$.
The probability of a finite path $\hat\pi = s_0 s_1 \ldots s_n$ is given by $P(\hat\pi) = \prod_{i=0}^{n-1} P(s_i, s_{i+1})$. The set of infinite paths that start with a given finite path is called a cylinder, and as in [4], we extend the probability of cylinders in a unique way to all measurable sets of infinite paths.
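As a concrete illustration, consider the following minimal sketch in Python (the dict-of-dicts encoding of $P$ and the state names are our own, not notation from the paper): the probability of a finite path is the product of its one-step probabilities, which is exactly the measure of the cylinder it spans.

```python
# A small MC with states s0, s1: P[s][t] is the transition probability.
P = {
    "s0": {"s0": 0.5, "s1": 0.5},
    "s1": {"s1": 1.0},
}

def path_probability(P, path):
    """Product of one-step probabilities along a finite path; this equals
    the measure of the cylinder of all infinite paths extending it."""
    prob = 1.0
    for s, t in zip(path, path[1:]):
        prob *= P[s].get(t, 0.0)
    return prob

print(path_probability(P, ["s0", "s0", "s1"]))  # 0.5 * 0.5 = 0.25
```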
Reachability Probabilities
Let $M = (S, P, s_0)$ be an MC. For target states $T \subseteq S$ and starting state $s$, the event of reaching $T$ is defined as $\Diamond T = \{ \pi \in \mathrm{Paths}(s) \mid \exists i \geq 0.\ s_i \in T \}$. The probability to reach $T$ from $s$ is defined as $\Pr(s \models \Diamond T)$.
Let variable $x_s$ denote the probability of reaching $T$ from any $s \in S$. Whether $T$ is reachable from a given state can be determined using standard graph analysis. Let $\mathrm{Pre}^*(T)$ denote the set of states from which $T$ is reachable. If $s \in T$, then $x_s = 1$. If $s \notin \mathrm{Pre}^*(T)$, then $x_s = 0$. Otherwise, $x_s = \sum_{t \in S} P(s,t) \cdot x_t$. This is equivalent to a linear equation system, formalized as follows:
Theorem 1 (Reachability Probability of Markov Chains [4]).
Given MC $M = (S, P, s_0)$ and target states $T \subseteq S$, let $S_? = \mathrm{Pre}^*(T) \setminus T$, $A = (P(s,t))_{s,t \in S_?}$, and $b = (P(s,T))_{s \in S_?}$. Then, the vector $x = (x_s)_{s \in S_?}$ with $x_s = \Pr(s \models \Diamond T)$ is the unique solution of the linear equation system $x = A x + b$.
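To illustrate Theorem 1, the following sketch first determines $\mathrm{Pre}^*(T)$ by a backward fixpoint and then solves $(I - A)x = b$ with numpy. It reuses the dict-of-dicts MC encoding from above; the function names are our own illustration, not part of the paper.

```python
import numpy as np

def reachability_probabilities(P, target):
    """Compute Pr(s |= eventually T) for every state s of the MC P:
    x_s = 1 on T, x_s = 0 outside Pre*(T), and (I - A) x = b on S_?."""
    states = list(P)
    can_reach = set(target)          # backward fixpoint computing Pre*(T)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in can_reach and any(
                    p > 0 and t in can_reach for t, p in P[s].items()):
                can_reach.add(s)
                changed = True
    s_q = [s for s in states if s in can_reach and s not in target]
    idx = {s: i for i, s in enumerate(s_q)}
    A = np.zeros((len(s_q), len(s_q)))
    b = np.zeros(len(s_q))
    for s in s_q:
        for t, p in P[s].items():
            if t in idx:
                A[idx[s], idx[t]] += p   # transitions staying within S_?
            elif t in target:
                b[idx[s]] += p           # one-step probability into T
    x = np.linalg.solve(np.eye(len(s_q)) - A, b) if s_q else []
    result = {s: 0.0 for s in states}
    result.update({s: 1.0 for s in target})
    result.update({s: float(x[idx[s]]) for s in s_q})
    return result
```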
Limit Behavior
Let MC $M = (S, P, s_0)$. A set $C \subseteq S$ is strongly connected if for all pairs of states $s, t \in C$, $s$ and $t$ are mutually reachable. Hence a singleton $\{s\}$ is strongly connected if $P(s,s) > 0$. Set $C$ is a strongly connected component (SCC) if it is maximally strongly connected, i.e., there does not exist another strongly connected set $C'$ with $C \subsetneq C'$. $C$ is a bottom SCC (BSCC) if $C$ is an SCC and there is no transition leaving $C$, i.e., there does not exist $s \in C$ and $t \notin C$ such that $P(s,t) > 0$. We denote the set of BSCCs in MC $M$ with $\mathrm{BSCC}(M)$.
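BSCCs can be computed by standard graph analysis. The following minimal sketch (Kosaraju's two-pass SCC algorithm on the same dict-of-dicts encoding; all names are ours) keeps exactly those SCCs that no positive-probability transition leaves.

```python
def bsccs(P):
    """Return the bottom SCCs of the MC P as a list of sets of states."""
    succ = {s: [t for t, p in P[s].items() if p > 0] for s in P}
    pred = {s: [] for s in P}
    for s in P:
        for t in succ[s]:
            pred[t].append(s)
    order, seen = [], set()          # first pass: DFS completion order
    for root in P:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            s, it = stack[-1]
            nxt = next((t for t in it if t not in seen), None)
            if nxt is None:
                order.append(s)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))
    assigned, sccs = set(), []       # second pass: DFS on reversed edges
    for s in reversed(order):
        if s in assigned:
            continue
        comp, todo = set(), [s]
        while todo:
            u = todo.pop()
            if u in assigned:
                continue
            assigned.add(u)
            comp.add(u)
            todo.extend(pred[u])
        sccs.append(comp)
    # An SCC is a BSCC iff no transition leaves it.
    return [c for c in sccs if all(t in c for s in c for t in succ[s])]
```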
The limit behavior of an MC regarding the infinitely often visited states is captured by the following theorem.
Theorem 2 (Limit behavior of Markov Chains [4]).
For MC $M = (S, P, s_0)$, it holds that $\Pr(\{ \pi \in \mathrm{Paths}(s_0) \mid \mathrm{inf}(\pi) \in \mathrm{BSCC}(M) \}) = 1$.
2.2 Stochastic Games
A stochastic arena is a tuple $A = ((V, E), (V_E, V_A, V_R), \delta)$, where $(V, E)$ is a directed graph, with a finite set of vertices $V$, partitioned as $V = V_E \uplus V_A \uplus V_R$, and a set of edges $E \subseteq V \times V$. The probabilistic transition function $\delta : V_R \to \mathrm{Distr}(V)$ is such that for all $v \in V_R$, $\delta(v)$ is a distribution over $V$, and for $v \in V_R$, $u \in V$, we have $(v, u) \in E$ if and only if $\delta(v)(u) > 0$. We usually uncurry $\delta$ and write $\delta(v, u)$.
Without loss of generality, we assume each vertex has at least one successor. This property is called non-blocking. The finite set of vertices is partitioned into three sets: $V_E$ – vertices where Eve chooses the successor, $V_A$ – vertices where Adam chooses the successor, and $V_R$ – the random vertices. A stochastic arena is a Markov decision process (MDP) if either $V_E = \emptyset$ or $V_A = \emptyset$, and an MC if both $V_E = \emptyset$ and $V_A = \emptyset$.
Figure 2 illustrates a stochastic arena. Square-shaped vertices are vertices in $V_E$ where Eve chooses the successor, pentagon-shaped vertices are in $V_A$ where Adam chooses the successor, and the circular vertices are random. Edges from random vertices are annotated with probabilities from $\delta$.
Strategies
Let $A$ be a stochastic arena. A strategy of Eve is a function $\sigma : V^* V_E \to \mathrm{Distr}(V)$, such that for all $w \in V^*$ and $v \in V_E$, we have $u \in \mathrm{supp}(\sigma(wv))$ implies $(v, u) \in E$. A strategy $\tau$ of Adam is defined analogously. We denote the sets of all strategies of Eve and Adam with $\Sigma$ and $\mathrm{T}$ respectively.
A strategy $\sigma$ of Eve is a pure memoryless strategy if $\sigma(wv) = \sigma(w'v)$ for all $w, w' \in V^*$ and $v \in V_E$, and the support of this distribution is a singleton. A pure memoryless strategy of Adam is defined analogously. We denote the sets of pure memoryless strategies of Eve and Adam with $\Sigma^{PM}$ and $\mathrm{T}^{PM}$ respectively.
In a stochastic arena $A$, when Eve and Adam follow pure memoryless strategies $\sigma$ and $\tau$ respectively, the arena $A_{\sigma,\tau}$ results. Here, the new edge set $E_{\sigma,\tau}$ is such that for all $v \in V_E$, $(v, u) \in E_{\sigma,\tau}$ if and only if $\sigma(v) = u$, and for all $v \in V_A$, $(v, u) \in E_{\sigma,\tau}$ if and only if $\tau(v) = u$. We refer to such arenas obtained by fixing pure memoryless strategies as sub-arenas. In fact, given a fixed starting vertex $v_0$, we often view the sub-arena $A_{\sigma,\tau}$ as an MC $M_{\sigma,\tau} = (V, P_{\sigma,\tau}, v_0)$, where the state space is the vertex set $V$ in $A$, and the transition function $P_{\sigma,\tau}$ combines the deterministic moves indicated by strategies $\sigma$ and $\tau$, and the transition function $\delta$ defined on random vertices:

$$P_{\sigma,\tau}(v, u) = \begin{cases} 1 & \text{if } v \in V_E \text{ and } \sigma(v) = u, \text{ or } v \in V_A \text{ and } \tau(v) = u, \\ \delta(v, u) & \text{if } v \in V_R, \\ 0 & \text{otherwise.} \end{cases}$$
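A minimal sketch of this construction, under our own illustrative encoding (vertex sets, $\delta$ as a dict of distributions, and pure memoryless strategies as dicts mapping each vertex to its chosen successor):

```python
def induced_mc(v_eve, v_adam, v_rand, delta, sigma, tau):
    """Build the transition function P_{sigma,tau} of the induced MC:
    the players' pure memoryless choices become deterministic transitions,
    while random vertices keep their distribution delta."""
    P = {}
    for v in v_eve:
        P[v] = {sigma[v]: 1.0}
    for v in v_adam:
        P[v] = {tau[v]: 1.0}
    for v in v_rand:
        P[v] = dict(delta[v])
    return P
```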
We continue with the stochastic arena from Figure 2. Fixing a pure memoryless strategy $\sigma$ for Eve and $\tau$ for Adam induces the sub-arena $A_{\sigma,\tau}$, as shown in Figure 3.
Winning Objectives
A play of $A$ is an infinite sequence of vertices $\rho = v_0 v_1 v_2 \ldots$ where for all $i \geq 0$, $(v_i, v_{i+1}) \in E$. We denote the set of all plays of $A$ with $\mathrm{Plays}(A)$, or in short $\mathrm{Plays}$ when $A$ is clear from the context.
Let $A$ be a stochastic arena. A winning objective for Eve is defined as a set of plays $\Phi \subseteq \mathrm{Plays}$. As we study zero-sum games, the winning objectives of the two players are complementary. The winning objective for Adam is thus $\mathrm{Plays} \setminus \Phi$. A play $\rho$ satisfies an objective $\Phi$ if $\rho \in \Phi$, and is a winning play of Eve. A winning play of Adam satisfies $\mathrm{Plays} \setminus \Phi$.
A reachability objective asserts that the play in $A$ has to reach target vertices $T \subseteq V$, formally given by $\mathrm{Reach}(T) = \{ v_0 v_1 v_2 \ldots \in \mathrm{Plays} \mid \exists i \geq 0.\ v_i \in T \}$. If $T = \{t\}$ for some vertex $t$, we simply write $\mathrm{Reach}(t)$.
Let $p : V \to \mathbb{N}$ be a priority function which assigns a priority $p(v)$ to each vertex $v \in V$. For $\rho = v_0 v_1 v_2 \ldots \in \mathrm{Plays}$, let $\mathrm{inf}(\rho) = \{ v \in V \mid v_i = v \text{ for infinitely many } i \}$. A parity objective asserts that the minimum priority visited infinitely often along a play is even: $\mathrm{Parity}(p) = \{ \rho \in \mathrm{Plays} \mid \min \{ p(v) \mid v \in \mathrm{inf}(\rho) \} \text{ is even} \}$.
We formally define stochastic games as follows:
Definition 3 (Stochastic Games).
Let $A$ be a stochastic arena. A stochastic game (SG) with winning objective $\Phi$ is defined as $G = (A, \Phi)$. If $\Phi$ is a reachability or parity objective, $G$ is a stochastic reachability game (SRG) or stochastic parity game (SPG) respectively. SRGs are also referred to as simple stochastic games (SSGs). When the winning objective is clear from the context, we refer to $G$ as a stochastic game.
Solving stochastic games
Let $G = (A, \Phi)$ be an SG, and let Eve and Adam follow strategies $\sigma$ and $\tau$. Given a starting vertex $v$, the probability for a play to satisfy $\Phi$ – the probability for Eve to win – is denoted $\Pr^v_{\sigma,\tau}(\Phi)$. The probability for Adam to win is $1 - \Pr^v_{\sigma,\tau}(\Phi)$.
Let the value of a vertex $v$ be the maximal probability of generating a play from $v$ that satisfies $\Phi$, formally defined using a value function $\mathrm{val}(v) = \sup_{\sigma \in \Sigma} \inf_{\tau \in \mathrm{T}} \Pr^v_{\sigma,\tau}(\Phi)$ for Eve, and $\inf_{\tau \in \mathrm{T}} \sup_{\sigma \in \Sigma} \Pr^v_{\sigma,\tau}(\Phi)$ for Adam. A strategy $\sigma$ for Eve is optimal from vertex $v$ if $\inf_{\tau \in \mathrm{T}} \Pr^v_{\sigma,\tau}(\Phi) = \mathrm{val}(v)$. Optimal strategies for Adam are defined analogously.
We divide solving stochastic games into three distinct tasks. Given an SG, solving the SG qualitatively amounts to deciding from which vertices Eve wins almost surely. Solving the SG quantitatively amounts to computing the values of all vertices in the arena. Solving the SG strategically amounts to computing an optimal strategy of Eve (or Adam) for the game.
Determinacy
Determinacy refers to the property of an SG where both players, Eve and Adam, have optimal strategies, meaning they can guarantee to achieve the values of the game, regardless of the strategies employed by the other player. Pure memoryless determinacy means that both players have pure memoryless optimal strategies.
Theorem 4 (Pure Memoryless Determinacy [35]).
Let $G = (A, \Phi)$ be an SG, where $\Phi$ is a reachability or parity objective. For all $v \in V$, it holds that $\sup_{\sigma \in \Sigma} \inf_{\tau \in \mathrm{T}} \Pr^v_{\sigma,\tau}(\Phi) = \inf_{\tau \in \mathrm{T}} \sup_{\sigma \in \Sigma} \Pr^v_{\sigma,\tau}(\Phi)$. Pure memoryless optimal strategies exist for both players from all vertices.
When Eve and Adam follow pure memoryless strategies $\sigma$ and $\tau$ respectively, we obtain sub-arena $A_{\sigma,\tau}$, which can be seen as the MC $M_{\sigma,\tau}$. We can reduce the winning probabilities to reachability probabilities in $M_{\sigma,\tau}$ as follows. Given a reachability objective $\mathrm{Reach}(T)$, $\Pr^v_{\sigma,\tau}(\mathrm{Reach}(T)) = \Pr(v \models \Diamond T)$. Given a parity objective $\mathrm{Parity}(p)$ and $B \in \mathrm{BSCC}(M_{\sigma,\tau})$, we call $B$ an even BSCC if $\min \{ p(v) \mid v \in B \}$ is even, meaning intuitively the smallest priority of its vertices is even. Odd BSCCs are defined analogously. Then $\Pr^v_{\sigma,\tau}(\mathrm{Parity}(p)) = \Pr(v \models \Diamond U)$, where $U = \bigcup \{ B \in \mathrm{BSCC}(M_{\sigma,\tau}) \mid B \text{ is even} \}$.
Corollary 5 (Sufficiency of Pure Memoryless Strategies [11]).
Let $A$ be a stochastic arena, $\mathrm{Reach}(T)$ a reachability objective, and $\mathrm{Parity}(p)$ a parity objective. For all vertices $v \in V$, it holds:
- $\sup_{\sigma \in \Sigma} \inf_{\tau \in \mathrm{T}} \Pr^v_{\sigma,\tau}(\mathrm{Reach}(T)) = \max_{\sigma \in \Sigma^{PM}} \min_{\tau \in \mathrm{T}^{PM}} \Pr^v_{\sigma,\tau}(\mathrm{Reach}(T))$
- $\sup_{\sigma \in \Sigma} \inf_{\tau \in \mathrm{T}} \Pr^v_{\sigma,\tau}(\mathrm{Parity}(p)) = \max_{\sigma \in \Sigma^{PM}} \min_{\tau \in \mathrm{T}^{PM}} \Pr^v_{\sigma,\tau}(\mathrm{Parity}(p))$
Therefore we consider only pure memoryless strategies in the sequel, unless stated otherwise.
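Putting the preliminaries together: once pure memoryless strategies are fixed, the parity value in the induced MC is the probability of reaching the union of its even BSCCs. Below is a sketch reusing the `induced_mc`, `bsccs`, and `reachability_probabilities` helpers from above (`priority` maps each vertex to its priority; all names are ours).

```python
def parity_value(P, priority, start):
    """Pr(start |= Parity(p)) in the MC P: by Theorem 2, the play settles
    in some BSCC, so it suffices to reach the union of the even BSCCs."""
    even_union = set()
    for B in bsccs(P):
        if min(priority[v] for v in B) % 2 == 0:   # even BSCC
            even_union |= B
    return reachability_probabilities(P, even_union)[start]
```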
3 A Gadget for Transforming SPGs into SSGs
The aim of this paper is to reduce an SPG $G$ to an SSG $G'$ such that the probability of reaching the target vertex $\top$ in this SSG is related to the probability of winning in the SPG $G$. As an important step toward this goal, we introduce in this section a gadget that expands each transition of $G$ while removing the priority function.
Let $A$ be a stochastic arena, $p$ be a priority function, and $G = (A, \mathrm{Parity}(p))$ be an SPG. Section 3.1 presents the gadget enabling the reduction from SPG $G$ to SSG $G'$. We then analyze how probabilistic events in $G'$ are related to those in $G$. Section 3.2 presents a bound on the probability of reaching pBSCCs in $G'$. Section 3.3 provides a bound on the winning probability once a pBSCC in $G'$ is reached, while Section 3.4 gives interval bounds on the winning probabilities in $G'$ with regard to those in $G$.
3.1 Gadget Construction
To reduce the parity objective to a reachability objective, we transform the SPG $G$ into the SSG $G'$ by means of a gadget, whose structure was defined by Chatterjee and Fijalkow in [9] to reduce deterministic parity games (DPGs) to SSGs.
When both players’ strategies are fixed in a DPG, every play forms a lasso, i.e. a finite simple path that ends in a simple cycle. Chatterjee and Fijalkow’s analysis proceeds by examining lasso plays to choose transition probabilities of the gadget. The specific probabilities they introduce do not extend to a reduction from SPGs to SSGs, because in our stochastic setting, a lasso lifts to a more complicated structure, namely an induced Markov chain: the simple path becomes the transient segment, and each cycle becomes a BSCC. Hence, our approach makes use of smaller probability values, requiring more representation space.
The structure of the gadget remains applicable, and its intuition is as follows: whenever a play visits a vertex with even priority in $G$, give a small but positive chance to reach a winning sink in $G'$. Vertices with odd priority yield a small chance to reach a losing sink. Finally, to represent that smaller priorities have precedence over larger ones, the probability of reaching a sink from a vertex depends on the priority it is associated to. We introduce a monotonically decreasing function $f$ for this purpose.
We obtain the stochastic arena $A'$ by modifying $A$ as indicated in Figure 4. Each vertex $v$ in $A$ is duplicated in $A'$, yielding vertices $v$ and $\bar v$. A transition $(u, v)$ in $A$ is replaced by first moving to $\bar v$, which can then either evolve to a sink with probability $f(p(v))$, or to the copy $v$ with the complementary probability $1 - f(p(v))$. Depending on $p(v)$ being even or odd, the sink is $\top$ or $\bot$.
Formally, let $\bar V = \{ \bar v \mid v \in V \}$, and $V' = V \uplus \bar V \uplus \{\top, \bot\}$. We define the arena $A' = ((V', E'), (V_E, V_A, V_R \uplus \bar V \uplus \{\top, \bot\}), \delta')$, where the new edge set $E'$ is as follows: $E' = \{ (u, \bar v) \mid (u, v) \in E \} \cup \{ (\bar v, v) \mid v \in V \} \cup \{ (\bar v, \top) \mid p(v) \text{ even} \} \cup \{ (\bar v, \bot) \mid p(v) \text{ odd} \} \cup \{ (\top, \top), (\bot, \bot) \}$.
To define the new transition function $\delta'$, let $f : \mathbb{N} \to (0,1)$, where $f(n)$ represents the probability of entering the winning (resp. losing) sink before visiting a vertex with even (resp. odd) priority $n$. We give suitable values for $f$ later, in Lemma 11 on page 11. Now, we define $\delta'$ as follows:

$$\delta'(\bar v, \top) = f(p(v)) \text{ if } p(v) \text{ is even}, \qquad \delta'(\bar v, \bot) = f(p(v)) \text{ if } p(v) \text{ is odd}, \qquad \delta'(\bar v, v) = 1 - f(p(v)),$$
$$\delta'(u, \bar v) = \delta(u, v) \text{ for } u \in V_R \text{ with } (u, v) \in E, \qquad \delta'(\top, \top) = \delta'(\bot, \bot) = 1.$$
When the context is clear, we also address the SPG and the SSG with $G$ and $G'$ respectively.
Since all new vertices are random vertices, a strategy of either player in SPG $G$ is a strategy in SSG $G'$ and vice versa. That is, there is a one-to-one correspondence between strategies in $G$ and $G'$. Hence, to keep notation simple, we do not distinguish between strategies in $G$ and $G'$.
A pair of strategies $(\sigma, \tau)$ for Eve and Adam in $G$ induces the sub-arena $A_{\sigma,\tau}$. Similarly, we obtain $A'_{\sigma,\tau}$. If $B$ is an even or odd BSCC in SPG $G$, we denote with $B'$ what we call the associated even pBSCC or odd pBSCC in SSG $G'$ respectively. While those are not BSCCs, they correspond to the BSCCs of the associated parity game, and we never consider the only true BSCCs of $A'_{\sigma,\tau}$, i.e. $\{\top\}$ and $\{\bot\}$.
We continue with the example in Figure 2 and Figure 3. We assign an odd priority to three of the vertices, and an even priority to each remaining vertex. An illustration of the corresponding sub-arena $A'_{\sigma,\tau}$, induced by our gadget construction, is provided in Figure 5. Since we do not need to distinguish between different types of vertices, we use circles for all vertices. In $A_{\sigma,\tau}$, the set of the three odd-priority vertices forms an odd BSCC. We refer to the corresponding set in $A'_{\sigma,\tau}$ as the associated odd pBSCC, with the duplicated vertices having outgoing transitions to either $\bot$ or their original copies.
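A minimal sketch of the gadget construction, under the same illustrative encoding as before. The argument `f` maps priorities to probabilities in $(0,1)$; suitable values are only fixed later, in Lemma 11, so any monotonically decreasing `f` serves as a placeholder here. The sink names and the `("bar", v)` encoding of the duplicated vertices are ours.

```python
WIN, LOSE = "top", "bot"   # the winning and losing sinks

def gadget(v_eve, v_adam, v_rand, edges, delta, priority, f):
    """Transform an SPG arena into an SSG arena: every edge (u, v) is
    redirected through the fresh random vertex bar(v), which moves to the
    winning (even priority) or losing (odd priority) sink with probability
    f(p(v)), and to the original copy v otherwise."""
    bar = lambda v: ("bar", v)
    new_edges = set()
    new_delta = {WIN: {WIN: 1.0}, LOSE: {LOSE: 1.0}}  # absorbing sinks
    for v, pr in priority.items():
        sink = WIN if pr % 2 == 0 else LOSE
        new_delta[bar(v)] = {sink: f(pr), v: 1.0 - f(pr)}
        new_edges |= {(bar(v), sink), (bar(v), v)}
    for (u, v) in edges:
        new_edges.add((u, bar(v)))
        if u in v_rand:   # random vertices keep delta, re-targeted to bar(v)
            new_delta.setdefault(u, {})[bar(v)] = delta[u][v]
    new_edges |= {(WIN, WIN), (LOSE, LOSE)}
    new_v_rand = set(v_rand) | {bar(v) for v in priority} | {WIN, LOSE}
    return v_eve, v_adam, new_v_rand, new_edges, new_delta
```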
3.2 Before Entering a pBSCC in SSG $G'$
Recall that when Eve and Adam follow pure memoryless strategies $\sigma$ and $\tau$, the resulting sub-arenas $A_{\sigma,\tau}$ and $A'_{\sigma,\tau}$ can be viewed as finite Markov chains. We first focus on what happens before a play reaches a pBSCC in $A'_{\sigma,\tau}$. Specifically, we give a lower bound on the probability of reaching an entry state of a pBSCC without entering a winning or losing sink. Later, in Lemma 11, we use this bound to determine a suitable value for $f$.
Intuitively, we show that the probability of entry is minimized in a classical worst-case scenario extending the sub-arena in Figure 3, and in Figure 5. More precisely, we consider an original sub-arena $A_{\sigma,\tau}$, depicted in Figure 6(a), with states arranged in a sequence (as in Figure 3) before reaching a BSCC. Each state has the minimal probability of progressing to the next state and a maximal probability of returning to the initial state. All these states have parity value $0$. Upon applying our gadget construction to obtain $A'_{\sigma,\tau}$ in Figure 6(b), we introduce random states (as in Figure 5) that have the highest probability $f(0)$ of going to the winning sink $\top$.
In the following, let $G$ be an SPG, and $G'$ be its associated SSG. For all strategy pairs $(\sigma, \tau)$, let $R^{\sigma,\tau}_v$ denote the probability for a play starting from $v$ to reach a pBSCC in $A'_{\sigma,\tau}$. Note that we never consider the sinks $\top$ and $\bot$ as starting vertices.
Lemma 6.
For all strategy pairs $(\sigma, \tau)$, for all $v \in V$, it holds:
where , , , and .
Sketch of Proof.
We fix an arbitrary strategy pair $(\sigma, \tau)$, and analyze the corresponding MC $M'_{\sigma,\tau}$. We simplify the MC while either preserving or under-approximating the probability of reaching a pBSCC in $A'_{\sigma,\tau}$. These steps merge all pBSCCs into a single sink, eliminate auxiliary states to simplify the MC, increase all values of $f$, and restructure transitions so that only one designated vertex can reach the sink directly. We denote the resulting MC with $\hat M$. We then derive a lower bound on the probability of reaching the sink in $\hat M$, which provides a reachability lower bound in a template MC with absorbing sinks and bounded transition probabilities. As the reachability probabilities in $\hat M$ under-approximate those in $M'_{\sigma,\tau}$, this yields the desired lower bound on $R^{\sigma,\tau}_v$.
3.3 Inside a pBSCC in SSG $G'$
We now focus on what happens after a play reaches a pBSCC in sub-arena $A'_{\sigma,\tau}$. Specifically, we consider MCs $M'_{\sigma,\tau}$ and give a lower bound on the probability of reaching the winning sink $\top$ after reaching an even pBSCC, and dually an upper bound on the probability of reaching the winning sink after reaching an odd pBSCC.
The lower bound is attained in the MC shown in Figure 7, where $n$ is an even parity value. There are states in a line, and winning and losing sinks. Each white state has maximal probability to return to the initial state, and otherwise proceeds to the next blue state. Each blue state, except one special state, can go with some probability to the losing sink, and otherwise proceeds to the next white state. The special state goes with probability $f(n)$ to $\top$, and otherwise proceeds like the others. Unlike the case in Figure 5, this MC cannot be obtained by applying our gadget on some sub-arena $A_{\sigma,\tau}$, and hence this bound is not guaranteed to be tight. More precisely, the outgoing transitions of the special state indicate that it has an odd parity value, while its transition to $\top$ suggests otherwise. The upper bound is obtained by considering the same MC, where $n$ is an odd parity value. Later, in Lemma 11, we use these two bounds to find suitable values for $f$.
Let $B$ be an even BSCC with smallest priority $n$ in $A_{\sigma,\tau}$ and $B'$ its associated pBSCC in $A'_{\sigma,\tau}$. Let $W^{B'}_{\min}$ denote the minimum probability of reaching the winning sink $\top$ after reaching $B'$. We denote it $W_{\min}$ when $B'$ is clear from context. Analogously, given an odd BSCC $B$ with smallest priority $n$ in $A_{\sigma,\tau}$, we use $W^{B'}_{\max}$ to denote the maximum probability of reaching the winning sink after reaching $B'$.
Lemma 7.
For all strategy pairs $(\sigma, \tau)$, for all even $n$, it holds:
and for all odd $n$, it holds:
where , , , , , and is as before.
Sketch of Proof.
The proof follows the same structure as the one for Lemma 6, applied to a BSCC in $A'_{\sigma,\tau}$. We apply a similar four-step transformation and obtain a simplified MC from which we can directly derive an upper bound. The lower bound comes as the dual.
3.4 Range of Winning Probabilities in the SSG
We now relate the winning probabilities in the constructed SSG $G'$ to the original SPG $G$. Intuitively, with a fixed strategy pair $(\sigma, \tau)$, the value of the SSG $G'$ falls into a range around the value of the SPG $G$, and the range size depends on the probabilities $R^{\sigma,\tau}_v$, $W_{\min}$, and $W_{\max}$.
Lemma 8.
Let such that for all even , and , and for all odd , , then it holds:
4 Reducing SPGs to SSGs
We now present the direct reduction from SPGs to SSGs. Let $A$ be a stochastic arena, $p$ be a priority function, and $G = (A, \mathrm{Parity}(p))$ be an SPG. We construct the SSG $G'$ using the gadget presented in Section 3.1. Section 4.1 presents a lower bound on the difference between winning probabilities associated to different strategy pairs in $G$. Section 4.2 presents the main theorem establishing the reduction, while Section 4.3 gives complexity bounds.
4.1 A Lower Bound on Different Strategies
We consider two strategy pairs and show a general result on all such pairs: if they yield different values in $G$, then there exists a lower bound on the difference between these values.
In the following, we assume for all $v \in V_R$ and $u \in V$ that $\delta(v, u)$ is a rational number $\frac{a_{v,u}}{b_{v,u}}$, where $a_{v,u}, b_{v,u} \in \mathbb{N}$ and $b_{v,u} \neq 0$. Let $q$ be the largest of the $a_{v,u}$ and $b_{v,u}$, and $\delta_{\min}$ be the smallest positive transition probability in $A$.
Lemma 9.
For all pairs of pure memoryless strategy pairs $(\sigma, \tau)$ and $(\sigma', \tau')$, for all $v \in V$, the following holds:
Proof.
Let $x_v$ be the probability for a play starting from $v$ to reach an even BSCC. It follows from Corollary 5 that for all $\sigma \in \Sigma^{PM}$ and $\tau \in \mathrm{T}^{PM}$, we have $\Pr^v_{\sigma,\tau}(\mathrm{Parity}(p)) = x_v$. We can obtain $x_v$ by setting all vertices belonging to at least one even BSCC as the target set, and calculating the reachability probability. Calculating $x_v$ is thus reduced to solving a linear equation system according to Theorem 1. We omit the details of the matrix $A$ and the vector $b$. Every non-zero entry of $A$ and $b$ is either $1$, or $\delta(v, u)$ for some $(v, u) \in E$.
We use the following notations:
- Let $S_?$ be the set of vertices for which the system is solved. It follows that $|S_?| \leq |V| - 1$, since there is at least one vertex in a BSCC.
- Let $x = (x_v)_{v \in S_?}$. For $v \in S_?$, we denote the $v$-th row of $A$ with $A_v$, and the entry of $A$ at the $v$-th row and $u$-th column with $A_{v,u}$. It can be written as $A_{v,u} = \frac{a_{v,u}}{b_{v,u}}$, where $a_{v,u}$ and $b_{v,u}$ are natural numbers bounded by $q$ with $b_{v,u} \neq 0$.
- We denote the $v$-th entry of $b$ with $b_v$. It can be written as $b_v = \frac{c_v}{d_v}$, where $c_v$ and $d_v$ are natural numbers with $d_v \neq 0$.
The equation system can be written as:

$$x = A x + b$$
We take an arbitrary row $v \in S_?$, and write the $v$-th equation as follows:

$$x_v = \sum_{u \in S_?} A_{v,u} \, x_u + b_v \qquad (1)$$
We multiply equation (1) with the product of all denominators occurring in it to obtain:
- For all $u \in S_?$, the coefficient of $x_u$ equals an integer with absolute value bounded by a power of $q$.
- The constant term equals an integer with absolute value bounded by a power of $q$.
We apply this transformation to each row of the equation system, and write the new equation system as:

$$\hat A \, x = \hat b$$
By Cramer’s rule, for all $v \in S_?$, we obtain:

$$x_v = \frac{\det(\hat A_v)}{\det(\hat A)}$$

where $\hat A_v$ is the matrix obtained by replacing the $v$-th column of $\hat A$ with the column vector $\hat b$. It follows that all entries of $\hat A_v$ are also integers with bounded absolute values.
Since $x_v$ is a reachability probability, we have $0 \leq x_v \leq 1$. Following from the Leibniz expansion of determinants, $|\det(\hat A)|$ is at most $|S_?|!$ times the $|S_?|$-th power of the largest absolute value of an entry.
Therefore, if the equation system resulting from a different strategy pair yields a value $x'_v \neq x_v$, the difference $|x_v - x'_v|$ is a non-zero rational whose denominator divides the product of the two determinants, and is hence bounded from below by the reciprocal of the determinant bound above.
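The denominator-clearing argument of this proof can be made concrete: solving the reachability system with exact rational arithmetic exhibits every value as a quotient of bounded integers, so two distinct values cannot lie arbitrarily close together. Below is a sketch, using our own Gaussian elimination over Python's `Fraction`; the two-state instance is purely illustrative.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Solve (I - A) x = b over the rationals by Gaussian elimination.
    By Cramer's rule, each x_v is a quotient of integer determinants once
    denominators are cleared, which bounds nonzero value differences."""
    n = len(b)
    M = [[Fraction(int(i == j)) - A[i][j] for j in range(n)] + [b[i]]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Two transient states reaching the target with different probabilities.
A = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(0), Fraction(0)]]
b = [Fraction(0), Fraction(1, 2)]
print(solve_exact(A, b))  # [Fraction(1, 4), Fraction(1, 2)]
```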
4.2 Direct Reduction
We now establish the direct reduction from SPGs to SSGs.
Theorem 10 (Reducing SPGs to SSGs).
If for all $v \in V$ and all $n \in \mathbb{N}$, the following conditions hold:
1.
2. for all even $n$, and for all odd $n$,
where , then every optimal strategy of Eve in the SSG $G'$ is also optimal in the SPG $G$. The same holds for Adam.
Proof.
We assume conditions 1 and 2 hold. We show that every optimal strategy in SSG $G'$ is also optimal in SPG $G$. We prove this by contraposition.
We take $\Sigma^*$ and $\mathrm{T}^*$ as the sets of optimal strategies of Eve and Adam in $G'$. We obtain by Lemma 8 that for all $\sigma \in \Sigma^*$ and all $\tau \in \mathrm{T}^*$, the following holds:
Since conditions 1 and 2 hold, we substitute the corresponding bounds to obtain:
(2)
If $\sigma$ is not optimal in $G$, then there exist another strategy $\sigma'$ and a vertex $v$ from which $\sigma'$ achieves a strictly greater value. It follows again from Lemma 8 that:
(3)
Furthermore, Lemma 9 yields:
(4)
As a result, we obtain a chain of inequalities, justified by (3), (4), and (2) respectively.
This indicates that $\sigma$ is not optimal in $G'$, which contradicts the assumption that $\sigma$ is optimal in $G'$.
Until now, we have used the function $f$ in Theorem 10, obtaining inequalities relating parity values in SPG $G$ to transition probabilities in SSG $G'$. We now give requirements for $f$ that satisfy all these inequalities.
Sketch of Proof.
Both cases follow a similar structure. For condition 1 (respectively condition 2), we derive from the bound given by Lemma 6 (resp. Lemma 7) a corollary giving a bound that explicitly makes use of $f$. We then directly obtain the two cases of Lemma 11 from these two bounds. The full proof of this lemma, detailing how to compute function $f$, can be found in Appendix A.1.
4.3 Complexity Considerations
To introduce complexity results, we first define the size of a stochastic game as $|G| = |V| + |E| + |\delta|$, where $|\delta|$ is the space needed to store the transition function (which may be stored in unary or binary). A now longstanding result shows that most stochastic game settings are polynomially reducible to one another. In particular:
Theorem 12 (From Theorem 1 in [2]).
Solving stochastic parity games and solving simple stochastic games are polynomial-time equivalent. This holds using either unary or binary encoding.
We show that the reduction we have introduced in this paper is polynomial with binary encoding. We recall that $q$ bounds the numerators and denominators of the transition probabilities in $A$, and $\delta_{\min}$ is the smallest positive transition probability.
Theorem 13.
Given an SPG $G$, there exist polynomial values for function $f$ that satisfy Theorem 10, such that the SSG $G'$ is of size polynomial in $|G|$ in binary.
Proof.
The following is a valid instance of $f$, polynomial in $|G|$ (and polynomial in the transition probabilities appearing in $\delta$) under binary encoding:
Then the size of the SSG $G'$ is:
According to [2], quantitative SSGs under unary and binary encoding are in the same complexity class, and so in $\mathsf{NP} \cap \mathsf{coNP}$ [18]. We thus obtain that our reduction yields an algorithm for solving SPGs.
5 Epilogue
We have given a polynomial reduction from quantitative SPGs to quantitative SSGs, taking inspiration from a gadget used in [9] to obtain a reduction from deterministic PGs to quantitative SSGs. After fixing a pair of strategies, the values of both the SPG and the SSG are determined, but the construction of the SSG makes it difficult to establish coinciding values. Using these fixed strategies, we showed that the value of the SSG falls into a range around the value of the SPG, where this range depends on the probability to reach a pBSCC of the SSG and the minimum probability to reach a winning sink in pBSCCs of the SSG. When considering all possible strategy pairs, we obtained a lower bound on their value differences in the SPG, by restricting transition probabilities to rational numbers and analyzing reachability equation systems of Markov chains. We then showed that by arranging the transition probabilities of the SSG properly in terms of the size of the SPG, its smallest probability, and the priorities, the value ranges of different strategy pairs can be narrowed so that they do not overlap. In this case, a reduction from SPGs to SSGs is achieved.
Although under unary encoding, exponential numbers can be introduced into the probability function of the newly constructed SSGs, both reductions are polynomial. Hence, our construction yields an algorithm for both qualitative and quantitative SPGs under unary and binary encoding, substantiating the complexity results from previous works [18, 2, 9].
Our result enables solving SPGs by first reducing them to SSGs and then applying algorithms for SSGs. However, its implementability is in question, due to the possibly huge representation of $f$. Our reduction also captures the transformation from an MDP with a parity objective into an SSG. As we assume a positive minimum transition probability in the original SPG, we cannot capture the subcase of reducing DPGs to quantitative SSGs. Although our reduction is unlikely to be leveraged to effectively solve SPGs in practice, some improvements are possible. First, we have not formally examined the optimal arrangement of $f$. It is possible to find the weakest requirements on $f$ so that the reductions are correct, thus optimizing possible implementations. Second, some specific cases lead to very small values of $f$. These cases are similar to the ones that can challenge classical MDP solvers using VI, and so we can benefit from any family of arena structures where these cases are avoided, leading to implementable valuations of $f$.
References
- [1] Rabah Amir. Stochastic games in economics and related fields: an overview. Stochastic Games and Applications, pages 455–470, 2003.
- [2] Daniel Andersson and Peter Bro Miltersen. The complexity of solving stochastic games on graphs. In International Symposium on Algorithms and Computation, pages 112–121. Springer, 2009. doi:10.1007/978-3-642-10631-6_13.
- [3] Muqsit Azeem, Alexandros Evangelidis, Jan Křetínskỳ, Alexander Slivinskiy, and Maximilian Weininger. Optimistic and topological value iteration for simple stochastic games. In International Symposium on Automated Technology for Verification and Analysis, pages 285–302. Springer, 2022.
- [4] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT press, 2008.
- [5] Christel Baier, Joachim Klein, Linda Leuschner, David Parker, and Sascha Wunderlich. Ensuring the reliability of your model checker: Interval iteration for Markov decision processes. In International Conference on Computer Aided Verification, volume 10426, pages 160–180. Springer, 2017. doi:10.1007/978-3-319-63387-9_8.
- [6] Raphaël Berthon, Joost-Pieter Katoen, and Zihan Zhou. A direct reduction from stochastic parity games to simple stochastic games, 2025. arXiv:2506.06223.
- [7] Patricia Bouyer, Youssouf Oualhadj, Mickael Randour, and Pierre Vandenhove. Arena-independent finite-memory determinacy in stochastic games. Log. Methods Comput. Sci., 19(4), 2023. doi:10.46298/LMCS-19(4:18)2023.
- [8] Cristian S Calude, Sanjay Jain, Bakhadyr Khoussainov, Wei Li, and Frank Stephan. Deciding parity games in quasipolynomial time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 252–263, 2017. doi:10.1145/3055399.3055409.
- [9] Krishnendu Chatterjee and Nathanaël Fijalkow. A reduction from parity games to simple stochastic games. Electronic Proceedings in Theoretical Computer Science, 54:74–86, 2011. doi:10.4204/eptcs.54.6.
- [10] Krishnendu Chatterjee and Thomas A Henzinger. Strategy improvement and randomized subexponential algorithms for stochastic parity games. In Annual Symposium on Theoretical Aspects of Computer Science, pages 512–523. Springer, 2006. doi:10.1007/11672142_42.
- [11] Krishnendu Chatterjee and Thomas A Henzinger. Reduction of stochastic parity to stochastic mean-payoff games. Information Processing Letters, 106(1):1–7, 2008. doi:10.1016/J.IPL.2007.08.035.
- [12] Krishnendu Chatterjee, Thomas A Henzinger, Barbara Jobstmann, and Arjun Radhakrishna. Gist: A solver for probabilistic games. In Computer Aided Verification: 22nd International Conference, volume 6174, pages 665–669. Springer, 2010. doi:10.1007/978-3-642-14295-6_57.
- [13] Krishnendu Chatterjee, Marcin Jurdziński, and Thomas A. Henzinger. Simple stochastic parity games. In Matthias Baaz and Johann A. Makowsky, editors, Computer Science Logic, volume 2803, pages 100–113, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg. doi:10.1007/978-3-540-45220-1_11.
- [14] Krishnendu Chatterjee, Marcin Jurdzinski, and Thomas A Henzinger. Quantitative stochastic parity games. In SODA, volume 4, pages 121–130, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982808.
- [15] Krishnendu Chatterjee, Rupak Majumdar, and Marcin Jurdzinski. On Nash equilibria in stochastic games. In Computer Science Logic, volume 3210 of Lecture Notes in Computer Science, pages 26–40. Springer, 2004. doi:10.1007/978-3-540-30124-0_6.
- [16] Krishnendu Chatterjee and Nir Piterman. Combinations of qualitative winning for stochastic parity games. In 30th International Conference on Concurrency Theory, volume 140 of LIPIcs, pages 6:1–6:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.CONCUR.2019.6.
- [17] Anne Condon. On algorithms for simple stochastic games. Advances in Computational Complexity Theory, 13:51–72, 1990. doi:10.1090/DIMACS/013/04.
- [18] Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992. doi:10.1016/0890-5401(92)90048-K.
- [19] Wojciech Czerwiński, Laure Daviaud, Nathanaël Fijalkow, Marcin Jurdziński, Ranko Lazić, and Paweł Parys. Universal trees grow inside separating automata: Quasi-polynomial lower bounds for parity games. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2333–2349. SIAM, 2019.
- [20] Matthew Darlington, Kevin D. Glazebrook, David S. Leslie, Rob Shone, and Roberto Szechtman. A stochastic game framework for patrolling a border. Eur. J. Oper. Res., 311(3):1146–1158, 2023. doi:10.1016/J.EJOR.2023.06.011.
- [21] Julia Eisentraut, Edon Kelmendi, Jan Křetínskỳ, and Maximilian Weininger. Value iteration for simple stochastic games: Stopping criterion and learning algorithm. Information and Computation, 285:104886, 2022. doi:10.1016/J.IC.2022.104886.
- [22] Kousha Etessami, Marta Z. Kwiatkowska, Moshe Y. Vardi, and Mihalis Yannakakis. Multi-objective model checking of Markov decision processes. Log. Methods Comput. Sci., 4(4), 2008. doi:10.2168/LMCS-4(4:8)2008.
- [23] Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Model-free reinforcement learning for stochastic parity games. In 31st International Conference on Concurrency Theory, volume 171 of LIPIcs, pages 21:1–21:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.CONCUR.2020.21.
- [24] Arnd Hartmanns and Benjamin Lucien Kaminski. Optimistic value iteration. In Computer Aided Verification - 32nd International Conference, volume 12225 of Lecture Notes in Computer Science, pages 488–511. Springer, 2020. doi:10.1007/978-3-030-53291-8_26.
- [25] Christian Hensel, Sebastian Junges, Joost-Pieter Katoen, Tim Quatmann, and Matthias Volk. The probabilistic model checker storm. International Journal on Software Tools for Technology Transfer, pages 1–22, 2021.
- [26] Marcin Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Information Processing Letters, 68(3):119–124, 1998.
- [27] Marcin Jurdziński and Ranko Lazić. Succinct progress measures for solving parity games. In 2017 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–9. IEEE, 2017.
- [28] Mark Kattenbelt, Marta Kwiatkowska, Gethin Norman, and David Parker. A game-based abstraction-refinement framework for Markov decision processes. Formal Methods in System Design, 36:246–280, 2010. doi:10.1007/S10703-010-0097-6.
- [29] Jan Křetínskỳ, Emanuel Ramneantu, Alexander Slivinskiy, and Maximilian Weininger. Comparison of algorithms for simple stochastic games. Information and Computation, 289:104885, 2022. doi:10.1016/J.IC.2022.104885.
- [30] Marta Kwiatkowska, Gethin Norman, David Parker, and Gabriel Santos. PRISM-games 3.0: Stochastic game verification with concurrency, equilibria and time. In Computer Aided Verification - 32nd International Conference, volume 12225 of Lecture Notes in Computer Science, pages 475–487. Springer, 2020. doi:10.1007/978-3-030-53291-8_25.
- [31] Marta Z. Kwiatkowska, Gethin Norman, and David Parker. PRISM 4.0: Verification of probabilistic real-time systems. In Computer Aided Verification - 23rd International Conference, volume 6806 of Lecture Notes in Computer Science, pages 585–591. Springer, 2011. doi:10.1007/978-3-642-22110-1_47.
- [32] Karoliina Lehtinen. A modal perspective on solving parity games in quasi-polynomial time. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, pages 639–648, 2018. doi:10.1145/3209108.3209115.
- [33] David S Leslie, Steven Perkins, and Zibo Xu. Best-response dynamics in zero-sum stochastic games. Journal of Economic Theory, 189:105095, 2020. doi:10.1016/J.JET.2020.105095.
- [34] Rupak Majumdar, Kaushik Mallik, Anne-Kathrin Schmuck, and Sadegh Soudjani. Symbolic control for stochastic systems via finite parity games. Nonlinear Analysis: Hybrid Systems, 51:101430, 2024. doi:10.1016/j.nahs.2023.101430.
- [35] Donald A Martin. The determinacy of Blackwell games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998. doi:10.2307/2586667.
- [36] Pawel Parys. Parity games: Zielonka’s algorithm in quasi-polynomial time. In 44th International Symposium on Mathematical Foundations of Computer Science, volume 138 of LIPIcs, pages 10:1–10:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.MFCS.2019.10.
- [37] Kittiphon Phalakarn, Toru Takisaka, Thomas Haas, and Ichiro Hasuo. Widest paths and global propagation in bounded value iteration for stochastic games. In Computer Aided Verification: 32nd International Conference, pages 349–371. Springer, 2020. doi:10.1007/978-3-030-53291-8_19.
- [38] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
- [39] Tim Quatmann and Joost-Pieter Katoen. Sound value iteration. In Computer Aided Verification - 30th International Conference, volume 10981 of Lecture Notes in Computer Science, pages 643–661. Springer, 2018. doi:10.1007/978-3-319-96145-3_37.
- [40] Lloyd S Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
- [41] Frédéric Simard, Josée Desharnais, and François Laviolette. General cops and robbers games with randomness. Theor. Comput. Sci., 887:30–50, 2021. doi:10.1016/J.TCS.2021.06.043.
- [42] Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. Log. Methods Comput. Sci., 7(3), 2011. doi:10.2168/LMCS-7(3:20)2011.
- [43] Uri Zwick and Mike Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1-2):343–359, 1996. doi:10.1016/0304-3975(95)00188-3.
Appendix A Appendix
We present how we compute values for function $f$. We include the details, hoping these techniques may later be modified to yield better values on specific classes of games.
A.1 Proof of Lemma 11
Lemma 11. [Restated, see original statement.]
A.1.1 Arranging $f$ for Condition (1)
We start by giving a bound on $R^{\sigma,\tau}_v$ that involves $f$. To do so, we make use of Lemma 6.
Corollary 14 (Another Lower Bound of Probability).
For all strategy pairs $(\sigma, \tau)$, for all $v \in V$, the following holds:
Proof.
We scale down the bound of Lemma 6 step by step, each step following from an elementary inequality.
We can now arrange $f$. It follows from Corollary 14 that:
Therefore to show:
(5)
where , it suffices to show that:
(6)
which can be further simplified as:
(7)
We show that when , inequality (7) holds. We start with the right side:
The chain of estimates follows from Bernoulli’s inequality and the assumed bound on $f$.
Therefore we obtain that when , inequality (5) holds, and thus condition (1) is satisfied.
A.1.2 Arranging $f$ for Condition (2)
We start by getting a lower bound on $W_{\min}$ for even priorities, which makes use of $f$. The reasoning for odd priorities is symmetric. To do so, we use Lemma 7.
Corollary 15 (Another Lower Bound of Probability).
For all strategy pairs $(\sigma, \tau)$, for all even $n$, the following holds:
Proof.
We scale down the right side of the bound of Lemma 7 step by step, using Bernoulli’s inequality and the assumed bounds on $f$.
We now arrange $f$. It follows from Corollary 15 that for all strategy pairs $(\sigma, \tau)$, for all even $n$, the following holds:
(8)
Therefore to show:
(9)
it suffices to show that for all $n$:
(10)
We introduce a shorthand in the following calculation. Inequality (10) can be further simplified to:
(11)
and can be finally simplified to:
(12)
Therefore we obtain that if for all $n$, the following holds:
then inequality (9) holds, and thus condition (2) is satisfied.
