Expectation in Stochastic Games with Prefix-Independent Objectives
Abstract
Stochastic two-player games model systems with an environment that is both adversarial and stochastic. In this paper, we study the expected value of bounded quantitative prefix-independent objectives in the context of stochastic games. We show a generic reduction from the expectation problem to linearly many instances of the almost-sure satisfaction problem for threshold Boolean objectives. The result follows from partitioning the vertices of the game into so-called value classes where each class consists of vertices of the same value. Our procedure further entails that the memory required by both players to play optimally for the expectation problem is no more than the memory required by the players to play optimally for the almost-sure satisfaction problem for a corresponding threshold Boolean objective.
We show the applicability of the framework to compute the expected window mean-payoff measure in stochastic games. The window mean-payoff measure strengthens the classical mean-payoff measure by computing the mean payoff over windows of bounded length that slide along an infinite path. We show that the decision problem to check if the expected window mean-payoff value is at least a given threshold is in when the window length is given in unary.
Keywords and phrases: Stochastic games, finitary objectives, mean payoff, reactive synthesis
2012 ACM Subject Classification: Mathematics of computing → Stochastic processes; Theory of computation → Probabilistic computation
1 Introduction
Reactive systems typically have an infinite execution where the controller continually reacts to the environment. Given a specification, the reactive controller synthesis problem [19] concerns synthesising a policy for the controller such that the specification is satisfied by the system for all behaviours of the environment. This problem is modelled using two-player turn-based games on graphs, where the two players are the controller () and the environment (), the vertices and the edges of the game graph represent the states and transitions of the system, and the objective of is to satisfy the specification. An execution of the system is then an infinite path in the game graph. The reactive controller synthesis problem corresponds to determining if there exists a strategy of such that for all strategies of , the outcome satisfies the objective. If such a winning strategy exists, then we would also like to synthesise it. The environment is considered an adversarial player to ensure that the specification is met even in the worst-case scenario.
Objectives are either Boolean or quantitative. Each execution either satisfies a Boolean objective or does not satisfy . The set of executions that satisfy forms a language over infinite words with the alphabet being the set of vertices in the graph. On the other hand, a quantitative objective evaluates the performance of the execution by a numerical metric, which aims to maximise and aims to minimise. A quantitative objective can be viewed as a real-valued function over infinite paths in the graph.
In the presence of uncertainty or probabilistic behaviour, the game graph becomes stochastic. Fixing the strategies of the two players gives a distribution over infinite paths in the game graph. For Boolean objectives , the goal of is to maximise the probability that an outcome satisfies . For quantitative objectives , there are two possible views:
- Satisfaction: Given a threshold , to maximise the probability that the -value of the outcome is greater than the threshold;
- Expectation: To maximise the -value of the outcome in expectation.
Either view may be desirable depending on the context [8, 7, 6, 5]. The satisfaction view can be seen as a Boolean objective: the -value of the outcome is either greater than the threshold or it is not. The expectation view is more nuanced, and is the subject of study in this paper.
In this paper, we look at the expectation problem for quantitative prefix-independent objectives (also referred to as tail objectives). These are objectives that do not depend on finite prefixes of the plays, but only on the long-run behaviour of the system. In many systems, we are willing to allow undesirable behaviour in the short term if the long-run behaviour is desirable. Prefix-independent objectives model such requirements and thus are of interest to study [9]. Prefix-independent objectives also have the benefit that they satisfy the Bellman equations [34], which simplifies their analysis. The expectation problem for such objectives arises naturally in many scenarios. For example:
- (i) An algorithmic trading system is designed to generate profit by executing trades based on real-time market data. Following an initial phase of learning and unstable behaviour due to parameter tuning, the average profit over a bounded time window must always exceed a threshold, and decisions need to be made within short, well-defined intervals for them to be effective.
- (ii) A power plant may have different strategies to produce power (such as coal, solar, nuclear, wind) and must allocate resources among these strategies so as to maximise the power produced in expectation.
Contributions.
All of our contributions are with regard to quantitative prefix-independent objectives that are bounded (i.e., the image of is bounded between integers and ) and such that a bound on the denominators of the optimal expected -values of vertices in the game is known. The bound on the image ensures determinacy [35], that is, the players have optimal strategies, and the bound on the denominator of optimal values of vertices discretises the search space. These bounds often exist and are easily derivable for common objectives of interest such as mean payoff.
Our primary contribution is a reduction of the expectation problem for such an objective to linearly many instances of the almost-sure satisfaction problem for threshold Boolean objectives for thresholds . Deciding the almost-sure satisfaction of is conceptually simpler than computing the expected value of : in the former, we only need to check whether the measure of the paths that satisfy the objective is equal to one, whereas in the latter, one must take a weighted average of path values, each weighted by the measure of the corresponding set of paths. Our technique is generic in the sense that when an algorithm for the almost-sure satisfaction problem for is known, we directly obtain the complexity and a way to solve the expectation problem for .
Our reduction builds on the technique introduced in [16] for Boolean prefix-independent objectives and non-trivially extends it to quantitative prefix-independent objectives for which the bounds and are known. The expected -values of vertices are nondeterministically guessed, and we present a characterisation (Theorem 7, similar to [16, Lemma 8] but with important and subtle differences) to verify the guess. We also explicitly construct strategies for both players that are optimal for the expectation of , in terms of almost-sure winning strategies for (proof of Lemma 9). The memory requirement for the constructed optimal strategies is the same as that of the almost-sure winning strategies (Corollary 10).
Our framework gives an alternative approach to solve the expectation problem for well-studied objectives such as expected mean payoff and gives new results for not-as-well-studied objectives such as the window mean-payoff objectives introduced in [12]. As our secondary contribution, we illustrate our technique by applying it to two variants of window mean-payoff objectives: fixed () and bounded (BWMP) window mean-payoff. Using our reduction, we are able to show that for both of these objectives, the expectation problem is in (Theorem 18 and Theorem 22), a result that was not known before. The upper bound for window mean-payoff objectives matches that of the special case of simple stochastic games [20, 13], and thus would require a major breakthrough to be improved. The lower bounds on the memory requirements for these objectives carry over from the special case of non-stochastic games [12, 22]. We summarise the complexity results and bounds on the memory requirements for the window mean-payoff objectives in Table 1.
Related work.
Stochastic games were introduced by Shapley [38], who studied them under expectation semantics for discounted-sum objectives. In [14], it was shown that solving stochastic parity games reduces to solving stochastic mean-payoff games. Further, solving stochastic parity games, stochastic mean-payoff games, and simple stochastic games (i.e., stochastic games with reachability objective) are all polynomial-time equivalent [30, 1], and thus, are all in [13]. A sub-exponential (or even quasi-polynomial) time deterministic algorithm for simple stochastic games on graphs with poly-logarithmic treewidth was proposed in [18]. In [29], sufficient conditions on the objective were shown such that optimal deterministic memoryless strategies exist for the players. In [34], value iteration to solve the expectation problem in stochastic games with reachability, safety, total-payoff, and mean-payoff objectives was studied.
Mean-payoff objectives were studied initially in two-player games, without stochasticity [24, 39], and with stochasticity in [28]. Finitary versions were introduced as window mean-payoff objectives [12]. For finitary mean-payoff objectives, the satisfaction problem [6] and the expectation problem [5] were studied in the special case of Markov decision processes (MDPs), which correspond to stochastic games with a trivial adversary. Expected mean payoff, expected discounted payoff, expected total payoff, etc., are widely studied for MDPs [36, 4]. Both the expectation problem [5] and the satisfaction problem [6] for the objective are in , while they are in for the BWMP objective. Ensuring the satisfaction and expectation semantics simultaneously was studied in MDPs for the mean-payoff objective in [17] and for the window mean-payoff objectives in [26]. In both cases, the complexity was shown to be no greater than that of only expectation optimisation.
The satisfaction problem for window mean-payoff objectives has been studied for two-player stochastic games in [22]. While positive and almost-sure satisfaction of are in , it follows from [22] that the problem is in for quantitative satisfaction i.e., with threshold probabilities . Furthermore, the satisfaction problem of BWMP is in and thus has the same complexity as that of the special case of MDPs [6].
Due to lack of space, some of the proofs and details have been omitted. A full version of the paper can be found in [21].
2 Preliminaries
Probability distributions.
A probability distribution over a finite non-empty set is a function such that . We denote by the set of all probability distributions over . For the algorithmic and complexity results, we assume that probabilities are given as rational numbers.
Stochastic games.
We consider two-player turn-based zero-sum stochastic games (or simply, stochastic games in the sequel). The two players are referred to as (she/her) and (he/him). A stochastic game is given by , where:
-
is a directed graph with a finite set of vertices and a set of directed edges such that for all vertices , the set of out-neighbours of is non-empty, i.e., (no deadlocks).
-
is a partition of . The vertices in belong to , the vertices in belong to , and the vertices in are called probabilistic vertices;
-
is a probability function that describes the behaviour of probabilistic vertices in the game. It maps each probabilistic vertex to a probability distribution over the set of out-neighbours of such that for all (i.e., all out-neighbours have non-zero probability);
-
is a payoff function assigning a rational payoff to every edge in the game.
Stochastic games are played in rounds. The game starts by initially placing a token on some vertex. At the beginning of a round, if the token is on a vertex , and for , then chooses an out-neighbour ; otherwise , and an out-neighbour is chosen with probability . receives from the amount given by the payoff function, and the token moves to for the next round. This continues ad infinitum resulting in an infinite sequence such that for all , called a play. For , we denote by the infix of . Its length is , the number of edges. We denote by the finite prefix of , and by the infinite suffix of . We denote by and the set of all plays and the set of all finite prefixes in respectively. We denote by the last vertex of the prefix . We denote by () the set of all prefixes such that .
A stochastic game with is a non-stochastic two-player game, a stochastic game with is a Markov decision process (MDP), and a stochastic game with is a Markov chain. Figure 1 shows an example of a stochastic game; vertices are shown as circles, vertices as boxes, and probabilistic vertices as diamonds.
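To make the components of the definition concrete, the following is a minimal Python sketch of one way such a game could be represented; the field names (owner, edges, prob, payoff) and the three-vertex toy instance are our own illustrative choices and are not taken from Figure 1.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class StochasticGame:
    # owner[v] is "max", "min", or "prob" (probabilistic vertices)
    owner: dict
    # edges[v] is the non-empty list of out-neighbours of v (no deadlocks)
    edges: dict
    # prob[v][u] is the probability of moving from probabilistic vertex v to u;
    # it must be positive for every out-neighbour and sum to 1 over edges[v]
    prob: dict
    # payoff[(v, u)] is the rational payoff of edge (v, u)
    payoff: dict

    def check(self):
        for v, succs in self.edges.items():
            assert succs, f"vertex {v} has no out-neighbour"
            if self.owner[v] == "prob":
                dist = self.prob[v]
                assert set(dist) == set(succs) and all(p > 0 for p in dist.values())
                assert sum(dist.values()) == 1

# A toy instance (not the game of Figure 1): a Max vertex, a Min vertex, and a
# probabilistic vertex moving to either of them with probability 1/2.
toy = StochasticGame(
    owner={"a": "max", "b": "min", "c": "prob"},
    edges={"a": ["c"], "b": ["c"], "c": ["a", "b"]},
    prob={"c": {"a": Fraction(1, 2), "b": Fraction(1, 2)}},
    payoff={("a", "c"): 1, ("b", "c"): -1, ("c", "a"): 0, ("c", "b"): 0},
)
toy.check()
```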
Subgames and traps.
Given a stochastic game , a subset of vertices induces a subgame if every vertex has an outgoing edge in , that is , and every probabilistic vertex has all outgoing edges in , that is . The induced subgame is , where , and and are restrictions of and respectively to . If is such that for all , if then and if then , then induces a subgame, and the subgame is a trap for in , since can ensure that if the token reaches , then it never escapes.
Boolean objectives.
A Boolean objective is a Borel-measurable subset of [35]. A play satisfies an objective if . In a stochastic game with objective , the objective of is , and since is a zero-sum game, the objective of is the complement set . An example of a Boolean objective is reachability, denoted , the set of all plays that visit a vertex in the target set . The reachability objective is formally defined, and more examples of Boolean objectives are given, in [21].
Quantitative objectives.
A quantitative objective is a Borel-measurable function of the form . In a stochastic game with objective , the objective of is and the objective of is , the negative of . Let be a play. Some common examples of quantitative objectives include the mean-payoff objective , and the liminf objective . In this work, we also consider the window mean-payoff objective, which is defined in Section 4. Corresponding to a quantitative objective , we define threshold objectives which are Boolean objectives of the form for thresholds and for . We denote this threshold objective succinctly as .
Prefix independence.
An objective is said to be prefix-independent if it only depends on the suffix of a play. Formally, a Boolean objective is prefix-independent if for all plays and with a common suffix (that is, can be obtained from by removing and adding a finite prefix), we have that if and only if . Similarly, a quantitative objective is prefix-independent if for all plays and with a common suffix, we have that . Mean payoff and liminf are examples of prefix-independent objectives, whereas reachability and discounted payoff [2] are not.
Strategies.
A (deterministic or pure) strategy for Player in a game is a function that maps prefixes ending in a vertex to a successor of . (We only consider the satisfaction and expectation of Borel-measurable objectives, and deterministic strategies suffice for such objectives [10]; satisfying two goals simultaneously requires randomisation and is not allowed by our definition.) Strategies can be realised as the output of a (possibly infinite-state) Mealy machine [33]. The memory size of a strategy is the smallest number of states a Mealy machine defining can have. A strategy is memoryless if only depends on the last element of the prefix , that is, for all prefixes if , then .
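As an illustration of how a finite-memory strategy can be realised by a Mealy machine, here is a small Python sketch; the class and method names are ours, and the two-state example (alternating between two successors) is purely hypothetical.

```python
class MealyStrategy:
    """A finite-memory strategy: a set of memory states, an update function on
    observed vertices, and an output function choosing the next vertex."""

    def __init__(self, initial_state, update, output):
        self.state = initial_state
        self.update = update    # (memory_state, current_vertex) -> memory_state
        self.output = output    # (memory_state, current_vertex) -> chosen successor

    def next_move(self, vertex):
        choice = self.output(self.state, vertex)
        self.state = self.update(self.state, vertex)
        return choice

# A hypothetical 2-state strategy that alternates between successors "u1" and
# "u2" on every visit to vertex "v"; its memory size is 2.
alternate = MealyStrategy(
    initial_state=0,
    update=lambda m, v: 1 - m if v == "v" else m,
    output=lambda m, v: ("u1" if m == 0 else "u2") if v == "v" else None,
)
```

A memoryless strategy corresponds to the special case of a Mealy machine with a single memory state.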
A strategy profile is a pair of strategies and of and respectively. A play is consistent with a strategy of () if for all with , we have . A play is an outcome of a profile if it is consistent with both and . For a Boolean objective , we denote by the probability that an outcome of the profile in with initial vertex satisfies .
Satisfaction probability of Boolean objectives.
Let be a Boolean objective. A strategy of is winning with probability from a vertex in for objective if for all strategies of . A strategy of is positive winning (resp., almost-sure winning) from for in with objective if (resp., ) for all strategies of . In the above, if such a strategy exists, then the vertex is said to be positive winning (resp., almost-sure winning) for . If a vertex is positive winning (resp., almost-sure winning) for , then is said to play optimally from if she follows a positive (resp., almost-sure) winning strategy from . We omit analogous definitions for .
Expected value of quantitative objectives.
Let be a quantitative objective. Given a strategy profile and an initial vertex , let denote the expected -value of the outcome of the strategy profile from , that is, the expectation of over all plays with initial vertex under the probability measure . We only consider objectives that are Borel-measurable and whose image is bounded. Thus, by determinacy of Blackwell games [35], we have that stochastic games with objective are determined. That is, we have . We call this quantity the expected -value of the vertex and denote it by . We say that plays optimally from a vertex if she follows a strategy such that for all strategies of , the expected -value of the outcome is at least . Similarly, plays optimally if he follows a strategy such that for all strategies of , the expected -value of the outcome is at most . If is a prefix-independent objective, then we have the following relation between the expected -value of a vertex and the expected -values of its out-neighbours.
Proposition 1 (Bellman equations).
If is a prefix-independent objective, then the following equations hold for all .
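In a standard formulation (written here with generic notation: $\mathrm{val}(v)$ for the expected value of vertex $v$, $V_{\max}$, $V_{\min}$, $V_P$ for the three parts of the vertex partition, and $P$ for the probability function; the paper's own symbols may differ), the Bellman equations read:

```latex
\mathrm{val}(v) =
\begin{cases}
  \max_{(v,u) \in E} \mathrm{val}(u) & \text{if } v \in V_{\max},\\[2pt]
  \min_{(v,u) \in E} \mathrm{val}(u) & \text{if } v \in V_{\min},\\[2pt]
  \sum_{(v,u) \in E} P(v)(u) \cdot \mathrm{val}(u) & \text{if } v \in V_{P}.
\end{cases}
```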
In this paper, we consider the expectation problem for prefix-independent objectives. Our solution in turn uses the almost-sure satisfaction problem. The decision problems are defined as follows.
Decision problems.
Given a stochastic game , a quantitative objective , a vertex , and a threshold , the following decision problems are relevant:
- Almost-sure satisfaction problem: Is the vertex almost-sure winning for for a threshold objective ?
- Expectation problem: Is the expected -value of the vertex at least the threshold ?
3 Reducing expectation to almost-sure satisfaction
In this section, we show a reduction (Theorem 7) of the expectation problem for bounded quantitative prefix-independent objectives to the almost-sure satisfaction problem for the corresponding threshold objectives . The reduction involves guessing a value for every vertex in the game, and then verifying if the guessed values are equal to the expected -values of the vertices. Theorem 7 generalises [16, Lemma 8] which studies the satisfaction problem for prefix-independent Boolean objectives, as Boolean objectives can be viewed as a special case of quantitative objectives by restricting the range to . We further discuss the difference in approaches between Theorem 7 and [16, Lemma 8] in Section 5.
Given a game and a bounded prefix-independent quantitative objective , our reduction requires the existence of an integer bound on the denominators of expected -values of vertices in . Since is bounded, there exists an integer such that for every play in . Thus, for every vertex in , one can write as , where and are integers such that and . The bounds and may depend on the objective and the structure of the graph, i.e., number of vertices, edge payoffs, probability distributions in the game, etc. These bounds effectively discretise the set of possible expected -values of the vertices, as there are at most distinct possible values. This directly gives a bound on the granularity of the possible expected -values of vertices, that is, the minimum difference between two possible values of vertices, and we represent this quantity by . Observe that given two rational numbers with denominators at most , the difference between them is at least , and thus, we let be .
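For concreteness, writing $Q$ for the assumed bound on denominators and $\varepsilon_Q$ for the granularity (these symbol names are ours), the last observation is the following short computation: two distinct rationals with denominators at most $Q$ satisfy

```latex
\left|\frac{a}{b} - \frac{c}{d}\right| \;=\; \frac{|ad - bc|}{bd} \;\geq\; \frac{1}{bd} \;\geq\; \frac{1}{Q^2},
\qquad 1 \le b, d \le Q,\quad \frac{a}{b} \neq \frac{c}{d},
```

so one may take $\varepsilon_Q = 1/Q^2$.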
3.1 Value vectors and value classes
We first define and give notations for value vectors, which are useful in describing the reduction, and then look at some of their interesting and useful properties.
Definitions and notations.
A vector of reals indexed by vertices in induces a partition of such that all vertices with the same value in belong to the same part in the partition. Let denote the number of parts in the partition, and let us denote the parts by . We call each part of the partition an -class, or simply, class if is clear from the context. For all , let denote the -value of the class . Given two vectors , we write if for all , we have , and we write if we have that and there exists such that . For a constant , we denote by the vector obtained by adding to each component of .
For all , a vertex is a boundary vertex if is a probabilistic vertex and has an out-neighbour not in , i.e., if and . Let denote the set of boundary vertices in the class . For all , let denote the restriction of to vertices in with all vertices in changed to absorbing vertices with a self-loop. The edge payoffs of these self loops are not important (we assume them to be ) as we restrict our attention to a subgame of that does not contain boundary vertices.
Example 2.
For the game in Figure 1, let be a value vector for vertices respectively. Since has five distinct values, we have , and the five -classes are , , , , and with values , , , , and respectively. Out of the five probabilistic vertices , , , and , we see that , , and are boundary vertices while and are not. Thus, , , and . We show the restrictions in Figure 2.
Let be a bounded prefix-independent quantitative objective. Analogous to the notation of a general value vector , we describe notations for the expected -value vector consisting of the expected -values of vertices in . For all vertices , let denote , the expected -value of vertex , and let denote the expected -value vector. Let denote the -class and let denote the -value of .
Given a vector , it follows from Proposition 1 that the following is a necessary (but not sufficient) condition for to be the expected -value vector .
-
Bellman condition: for every vertex , the following Bellman equations hold
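A minimal sketch of how the Bellman condition can be verified for a candidate value vector, reusing the game representation sketched in Section 2; the function name and the use of exact Fraction arithmetic are our own choices.

```python
from fractions import Fraction

def satisfies_bellman(game, x):
    """Check the Bellman condition for the candidate value vector x
    (a dict mapping each vertex to a Fraction)."""
    for v, succs in game.edges.items():
        values = [x[u] for u in succs]
        if game.owner[v] == "max":        # Max picks a best successor
            expected = max(values)
        elif game.owner[v] == "min":      # Min picks a worst successor
            expected = min(values)
        else:                             # probabilistic vertex: weighted average
            expected = sum(game.prob[v][u] * x[u] for u in succs)
        if x[v] != expected:
            return False
    return True
```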
Consequences of the Bellman condition.
We now see some properties of value vectors that satisfy the Bellman condition. Since boundary vertices are probabilistic vertices, the following is immediate.
Proposition 3.
Let be a value vector satisfying the Bellman condition. Then for all , for all , there exists an out-neighbour of with -value less than and there exists an out-neighbour of with -value greater than . Formally, there exist such that and and .
A corollary of Proposition 3 is that the -classes with the smallest and the biggest -values have no boundary vertices. Note that there may also exist -classes other than these that do not contain boundary vertices (see in Example 2). Next, we see that the Bellman condition entails that each restriction is a stochastic game.
Proposition 4.
If is a value vector that satisfies the Bellman condition, then for all , we have that is a stochastic game.
In Proposition 5, we make a crucial observation about long-run behaviours of plays in , which is that either player can ensure with probability that the token eventually reaches an -class from which it does not exit. This follows from the Borel-Cantelli lemma [23] due to the fact that there is a positive probability of reaching an -class without boundary vertices after following a finite number of edges out of boundary vertices.
Proposition 5.
Let be a value vector satisfying the Bellman condition. Suppose the strategy of () is such that each time the token reaches a vertex , (s)he moves the token to a vertex in the same -class as . Then, with probability 1, the token eventually reaches a class for some from which it never exits.
Finally, we define the notion of trap subgames of which will be used in the subsequent discussion. We denote by the positive attractor set of , i.e., the set of vertices in that are positive winning for for the objective. The complement is a trap for in , and with abuse of notation, we use the same symbol to denote the subset of as well as the trap subgame. We note that if does not have boundary vertices, that is, if , then it holds that and . We can analogously define and for . Given , these sets can be computed in polynomial time using attractor computations [25].
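The positive attractor used here can be computed by the standard backward fixpoint. Below is a sketch for one player (symmetric for the other), again over the representation from Section 2; the function name is our own choice.

```python
def positive_attractor(game, player, target):
    """Vertices from which `player` can reach `target` with positive probability.
    A vertex joins the attractor if it is in the target, if it belongs to
    `player` or is probabilistic and has some successor already in the
    attractor, or if it belongs to the opponent and all successors are in it."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v, succs in game.edges.items():
            if v in attr:
                continue
            if game.owner[v] in (player, "prob"):
                ok = any(u in attr for u in succs)
            else:   # opponent vertex: every choice must lead into the attractor
                ok = all(u in attr for u in succs)
            if ok:
                attr.add(v)
                changed = True
    return attr

# The complement of the attractor is a trap for `player`:
# trap = set(game.edges) - positive_attractor(game, "max", target)
```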
Example 6.
We compute these sets for the restrictions shown in Figure 2. For , since is empty, we have that and . For , we have that , and . For , we have that , .
3.2 Characterisation of the value vector
We describe in Theorem 7 a necessary and sufficient set of conditions for a given vector to be equal to the expected -value vector . In addition to Bellman, Theorem 7 makes use of two more conditions, which we define before stating the theorem.
- lower-bound condition: for all , wins almost surely in the trap subgame from all vertices in .
- upper-bound condition: for all , wins almost surely in the trap subgame from all vertices in .
Theorem 7.
The only vector , whose every component has denominator at most , that satisfies Bellman, lower-bound, and upper-bound is the expected -value vector .
Proof.
We show in Lemma 8 that satisfies the three conditions. We show in Lemma 9 that if is a vector that satisfies the three conditions, then is less than distance away from , that is, . In particular, if each component of can be written as , where , are both integers and is at most , then it follows that is equal to .
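Putting the three conditions together, the verification step of the reduction can be sketched as follows. The helpers value_classes and boundary_vertices are small functions written here, satisfies_bellman and positive_attractor are the sketches above, and restriction and almost_sure_wins are assumed oracles (the subgame with boundary vertices made absorbing, and a decision procedure for almost-sure satisfaction of a threshold objective); the sketch only illustrates the shape of the procedure, not the paper's formal construction.

```python
def value_classes(x):
    """Group vertices by their value in the candidate vector x."""
    classes = {}
    for v, val in x.items():
        classes.setdefault(val, set()).add(v)
    return classes

def boundary_vertices(game, x, cls):
    """Probabilistic vertices of the class with an out-neighbour of a different value."""
    return {v for v in cls
            if game.owner[v] == "prob" and any(x[u] != x[v] for u in game.edges[v])}

def is_expected_value_vector(game, x, restriction, almost_sure_wins):
    """Check the characterisation of Theorem 7 for a candidate vector x.
    `almost_sure_wins(subgame, player, threshold, vertices)` is assumed to
    return True iff `player` wins the threshold objective (at least the
    threshold for "max", at most the threshold for "min") almost surely from
    every vertex in `vertices`."""
    if not satisfies_bellman(game, x):                 # Bellman condition
        return False
    for cls_value, cls in value_classes(x).items():
        boundary = boundary_vertices(game, x, cls)
        sub = restriction(game, cls, boundary)         # boundary made absorbing
        # lower-bound condition: Max wins almost surely in her trap subgame
        trap_max = cls - positive_attractor(sub, "max", boundary)
        if trap_max and not almost_sure_wins(sub, "max", cls_value, trap_max):
            return False
        # upper-bound condition: Min wins almost surely in his trap subgame
        trap_min = cls - positive_attractor(sub, "min", boundary)
        if trap_min and not almost_sure_wins(sub, "min", cls_value, trap_min):
            return False
    return True
```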
Lemma 8.
The expected -value vector satisfies the three conditions in Theorem 7.
Proof.
The fact that satisfies Bellman follows directly from Proposition 1. We show that lower-bound holds for . The proof for upper-bound is analogous.
Suppose for the sake of contradiction that lower-bound does not hold, that is, there exists and a vertex in such that has a positive winning strategy from for the objective in . Since is a prefix-independent objective, from [9, Theorem 1] we have that there exists another vertex in such that has an almost-sure winning strategy from for the same objective in . If follows this strategy from in the original game , then one of the following two cases holds:
- always moves the token to a vertex in . Since is in the trap for in , can force the token to remain in forever, and follow the almost-sure winning strategy to ensure that with probability 1, the outcome satisfies the objective .
- eventually moves the token to a vertex out of . Since satisfies Bellman, the token moves to an -class with a smaller value than .
In both cases, the expected -value of the outcome is less than . This is a contradiction since , and the expected -value of every vertex in is equal to .
Lemma 9.
If a vector satisfies the three conditions in Theorem 7, then . In particular, we have the following:
- If satisfies the Bellman and lower-bound conditions, then .
- If satisfies the Bellman and upper-bound conditions, then .
Proof sketch.
We prove the first case. The proof for the second case follows by symmetry, that is, we essentially replace by , and by . We describe an optimal strategy of and give a sketch of its optimality.
Since satisfies the lower-bound condition, we have that for all , has an almost-sure winning strategy in the trap subgame to win the objective in almost surely from all vertices in . From the definition of , has a positive winning strategy in the restricted game from vertices in for the objective. By following , the token either reaches with positive probability, or ends up in from where can ensure that the token never leaves . Using these strategies of in , we construct a strategy of that is optimal for expected -value in the original game : As long as the token is in the class in , the strategy mimics if the token is in and mimics if the token is in .
Note that whenever the token is on a vertex in , the strategy always moves the token to a vertex in the same -class as (i.e., the token only exits an -class from a vertex or from a boundary vertex), and thus, Proposition 5 holds. Whenever the token exits a class to reach a different class , then as long as the token remains in , the strategy follows if the token is in , and follows if the token is in .
Since Proposition 5 holds, we have that for all strategies of , with probability , the token eventually reaches an -class from which it never exits. Moreover, the strategy ensures that with probability , the token eventually reaches in from which it never leaves. If this were not the case, then the token would visit vertices in infinitely often, each time with a fixed positive probability of reaching because of . Thus, with probability , the token would eventually reach from where it could escape to a different -class, which contradicts the fact that the token stays in forever.
Since is prefix-independent, the -value of a play only depends on the trap it ends up in. If the game begins from a vertex , then for , let denote the probability that the token ends up in the trap subgame from which it never exits. Since satisfies Bellman, we have that . Since satisfies lower-bound, has an almost-sure winning strategy for in . Thus, for all strategies of , the expected value of an outcome of from is greater than , which is . This holds for all vertices in , giving us the desired result .
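The strategy constructed in this proof can be pictured as the following sketch, which switches between an almost-sure winning strategy inside the current trap and a positive reachability strategy towards it. The dictionaries trap, as_strategy, and reach_strategy, indexed by value classes, are assumed to come from the lower-bound condition and the attractor computation; the strategy objects follow the MealyStrategy interface sketched in Section 2.

```python
def optimal_max_move(vertex, current_class, trap, as_strategy, reach_strategy):
    """One move of the composed strategy for Max inside the value class
    `current_class`: follow the almost-sure winning strategy while the token
    is in the trap, and the positive reachability strategy (towards the trap)
    otherwise. Either way, the chosen successor stays in the same value class,
    so Proposition 5 applies."""
    if vertex in trap[current_class]:
        return as_strategy[current_class].next_move(vertex)
    return reach_strategy[current_class].next_move(vertex)
```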
We also note that the optimal strategy always either follows an almost-sure winning strategy for the threshold objective or a positive winning strategy for a Reach objective. Since there exist memoryless positive winning strategies for the Reach objective [20], we have the following bound on the memory requirement of .
Corollary 10.
The memory requirement of is at most the maximum over all of the memory requirement of an almost-sure winning strategy for the threshold objective . Moreover, if is a deterministic strategy, then so is .
3.3 Bounding the denominators in the value vector
In this section, we discuss the problem of obtaining an upper bound for the denominators of the expected -values of vertices for a bounded prefix-independent objective in a game . In [16], the technique of value classes is used to compute the values of vertices for Boolean prefix-independent objectives. There, it is stated without proof that the probability of satisfaction of a parity or a Streett objective [2] from each vertex can be written as where and is the maximum denominator over all edge probabilities in the game. As such, we were not able to directly generalise this bound to the expectation of quantitative prefix-independent objectives. Instead, we make the following observations:
-
Let be an -class without boundary vertices. If the token is in at some point in the play, then since satisfies the Bellman condition, neither player has an incentive to move the token out of . Since there are no boundary vertices in , the token does not exit from a probabilistic vertex either, and remains in forever. Thus, the value of depends only on the internal structure of . We denote by an upper bound on the denominators of values of -classes without boundary vertices. It is a simpler problem to find than to find , as each class without boundary vertices can be treated as a subgame in which each vertex has the same expected -value, or equivalently, the subgame consists of exactly one -class.
-
On the other hand, suppose is an -class containing at least one boundary vertex, and let be a boundary vertex in . Then, since satisfies the Bellman condition, we have , which is also the value of . Thus, can be written in terms of the values of classes reachable from in one step and the probabilities with which those classes are reached. In fact, we construct in the proof of Theorem 11 a system of linear equations to show that the value of each -class with boundary vertices can be expressed solely in terms of transition probabilities of the outgoing edges from boundary vertices and values of -classes without boundary vertices.
The method to calculate depends on the specific objective; we illustrate as an example in Section 4 a way to obtain for a particular kind of objective called the window mean-payoff objective. Once we know for an objective , we can use Theorem 11 to obtain in terms of .
Theorem 11.
The denominator of the value of each -class in is at most .
We note that this theorem implies that the number of bits required to write is polynomial in the number of vertices in the game and in the number of bits required to write . We devote the rest of this section to the proof of Theorem 11. For ease of notation, we denote the number of -classes in the game by instead of for the rest of this section. If every -class has no boundary vertices, then we have equal to and we are done. So we assume there exists at least one class that contains boundary vertices. Let denote the number of -classes with boundary vertices, and therefore, there are -classes without boundary vertices. Since there always exists at least one -class without boundary vertices (Proposition 3), we have that . Let and . We index the -classes such that each class with boundary vertices has its index in and each class without boundary vertices has its index in . Furthermore, in the sets and , the classes are indexed in increasing order of their values. That is, for both in or both in , we have if and only if . We show bounds on the denominators of -values of classes with boundary vertices, i.e., in terms of -values of classes without boundary vertices, i.e., .
For all , pick an arbitrary boundary vertex from and call this the representative vertex of . For all and , let denote the probability of reaching the class from in one step. Since satisfies the Bellman condition, we have that . It is helpful to split this sum based on whether or , i.e., whether or . We rewrite the sums as for all , and we represent this system of equations below using matrices.
This system of equations is of the form . Rearranging terms gives us where is the identity matrix. It follows from Proposition 12 that the equation has a unique solution.
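In a generic notation (ours, chosen only for illustration: $r_i$ for the value of the $i$-th class with boundary vertices, $w_j$ for the value of the $j$-th class without boundary vertices, and $p_{ij}$ for the one-step probability of moving from the representative vertex of class $i$ to class $j$), the system described above has the shape

```latex
r_i \;=\; \sum_{j \in I} p_{ij}\, r_j \;+\; \sum_{j \in J} p_{ij}\, w_j
\qquad \text{for all } i \in I,
```

that is, $r = P\,r + c$ in matrix form with $P = (p_{ij})_{i,j \in I}$ and $c_i = \sum_{j \in J} p_{ij}\, w_j$, which rearranges to $(\mathrm{Id} - P)\, r = c$.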
Proposition 12.
The matrix is invertible.
Let denote the least common multiple (lcm) of the denominators of for and . We have , where is the maximum denominator over all edge probabilities in . We multiply both sides of the equation by to get and note that all the elements of and are integers. Let be the determinant of the matrix , and for , let be the determinant of the matrix obtained by replacing the column of with the column vector . Since is invertible, by Cramer’s rule [32], we have that . Proposition 13 shows that is an integer and is at most and Proposition 14 shows that has denominator at most .
Proposition 13.
The absolute value of the determinant of , i.e., , is an integer and is at most , which is at most .
Proposition 14.
The denominator of is at most , which is at most .
Since the denominator of is at most times the denominator of , we obtain the bound stated in Theorem 11. ∎
4 Expectation of window mean-payoff objectives
In this section, we apply the results from the previous section to two types of window mean-payoff objectives introduced in [12]:
- (i) the fixed window mean-payoff objective (), in which a window length is given, and
- (ii) the bounded window mean-payoff objective (BWMP), in which, for every play, we need a bound on window lengths.
We define these objectives below.
For a play in a stochastic game , the mean payoff of an infix is the average of the payoffs of the edges in the infix and is defined as . Given a window length and a threshold , a play in satisfies the fixed window mean-payoff objective if from every position after some point, it is possible to start an infix of length at most with mean payoff at least .
We omit the subscript when it is clear from the context. We extend the definition of windows from [12] to arbitrary thresholds. Given a threshold , a play , and , we say that the -window is open if the mean payoff of is less than for all . Otherwise, the -window is closed. A play satisfies if and only if from some point on, every -window in closes within at most steps. Note that for as a smaller window length is a stronger constraint.
We also consider another window mean-payoff objective called the bounded window mean-payoff objective . A play satisfies the objective if there exists a window length such that the play satisfies .
Note that both and are Boolean prefix-independent objectives.
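To illustrate the window mechanism on a finite prefix, the following sketch checks whether every window opened along a finite sequence of edge payoffs closes within a given number of steps for a given threshold; the function names are ours, and the fragment only mirrors the definition on a finite prefix rather than deciding the (infinite-horizon) objective.

```python
from fractions import Fraction

def window_closes(payoffs, start, max_len, threshold):
    """Does the window opened at position `start` close within `max_len` steps,
    i.e., is there a prefix of length j <= max_len whose mean payoff is at
    least `threshold`?"""
    total = Fraction(0)
    for j in range(1, max_len + 1):
        if start + j > len(payoffs):
            break
        total += payoffs[start + j - 1]
        if total >= threshold * j:      # mean of the first j payoffs >= threshold
            return True
    return False

def all_windows_close(payoffs, max_len, threshold):
    """Check the window condition at every position from which a full window
    of length `max_len` fits inside the finite payoff sequence."""
    return all(window_closes(payoffs, i, max_len, threshold)
               for i in range(len(payoffs) - max_len + 1))

# Example: with window length 2 and threshold 0, every window of
# [1, -1, 2, -1, 1] closes, but with threshold 1 the window at position 1 stays open.
assert all_windows_close([1, -1, 2, -1, 1], 2, 0)
assert not all_windows_close([1, -1, 2, -1, 1], 2, 1)
```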
Expected window mean-payoff values.
Corresponding to the Boolean objectives and , we define quantitative versions of these objectives. Given a play in a stochastic game and a window length , the -value of is , the supremum threshold such that the play satisfies . Using notations from Section 2, we denote the expected -value of a vertex by . We define , the expected -value of a vertex analogously. If W is an integer such that the payoff of each edge in satisfies , then for all plays in , we have that and lie between and W. Thus, and are bounded objectives.
Decision problems.
Given a stochastic game , a vertex , and a threshold , we have the following expectation problems for the window mean-payoff objectives:
- Expected -value problem: Given a window length , is ?
- Expected -value problem: Is ?
As considered in previous works [12, 5, 6], the window length is usually small (), and hence we assume that is given in unary (while the edge-payoffs are given in binary).
4.1 Expected fixed window mean-payoff value
We give tight complexity bounds for the expected -value problem. We use the characterisation from Theorem 7 to present our main result that this problem is in (Theorem 18). We show in [21] that simple stochastic games [20], which are in [13], reduce to the expected -value problem, giving a tight lower bound.
In order to use the characterisation, we show the existence of the bound for the objective. We show in Lemma 15 that the expected -value of a class without boundary vertices takes a special form, that is, is the mean-payoff of a sequence of at most edges in . We use the fact that the -value of every play is the largest such that, eventually, every -window in closes in at most steps, and that is the mean payoff of a sequence of at most edges in . To complete the argument, we show that if both players play optimally, then, with probability , the -value of the outcome is equal to and thus, is also of this form.
Lemma 15.
The expected -value of vertices in a class without boundary vertices is equal to the mean payoff of some sequence of or fewer edges in . That is, is of the form for some and edges .
This observation gives us the bound on the denominators of the values of -classes without boundary vertices. To see this, let be the maximum denominator over all edge-payoffs in . Since , and each is a rational number with denominator at most , the denominator of the sum is at most if , and at most if . In both cases, this is at most .
Corollary 16.
The expected -value of vertices in -classes without boundary vertices can be written as where and are integers and .
From Theorem 11, we get that the denominator of for each class in is at most , which is at most .
Lemma 17.
The expected -value of each vertex in can be written as a fraction , where are integers, and , and .
We now state the main result of this section for the expected -value problem.
Theorem 18.
The expected -value problem is in when is given in unary. Memory of size suffices for , while memory of size suffices for .
Proof.
To show membership of the expected -value problem in , we first guess the expected -value vector , that is, the expected -value of every vertex in the game. From Lemma 17, it follows that the number of bits required to write for every vertex is polynomial in the size of the input. Thus, the vector can be guessed in polynomial time.
Then, to verify the guess, it is sufficient to verify the Bellman, lower-bound, and upper-bound conditions for . It is easy to see that the Bellman condition can be checked in polynomial time. Checking the lower-bound and upper-bound conditions, i.e., checking the almost-sure satisfaction of the threshold Boolean objective for appropriate thresholds in trap subgames in each -class can be done in polynomial time [22]. Thus, the decision problem of is in , and moreover, since there is exactly one value vector that satisfies the conditions in Theorem 7, the decision problem is, in fact, in . Analogously, the complement decision problem of is also in . Hence, the expected -value problem is in .
From the description of the optimal strategy in Lemma 9, it follows from Corollary 10 that the memory requirement for the expected objective is no greater than the memory requirement for the almost-sure satisfaction of the corresponding threshold objectives, which are and for and respectively [22].
4.2 Expected bounded window mean-payoff value
We would like to apply the characterisation in Theorem 7 to to show that the expected -value problem is in , and thus, we show the existence of the bound for the objective. We show in Lemma 19 that the expected -value of a class without boundary vertices is the mean payoff of a simple cycle in .
Lemma 19.
The expected -value of vertices in a class without boundary vertices is equal to the mean-payoff value of a simple cycle in . That is, is of the form for some and edges of a simple cycle.
While Lemma 19 is analogous to Lemma 15 for , the proof of Lemma 19 is more involved since the objective requires one to consider windows of arbitrary lengths. In the proof, we make use of the fact that memoryless strategies suffice for to play optimally for the almost-sure satisfaction of the BWMP objective [22]. In the resulting MDP (which has the same set of vertices as the game ), we carefully analyse the resulting plays when plays optimally. The following corollary of Lemma 19 states the bound for the objective.
Corollary 20.
The expected -value of vertices in -classes without boundary vertices can be written as where and are integers and .
From Theorem 11, we get that the denominator of for each class in is at most , which is at most .
Lemma 21.
The expected -value of each vertex in can be written as , where are integers, and , and .
We now state the main result of this section for the expected -value problem.
Theorem 22.
The expected -value problem is in . Memoryless strategies suffice for . requires infinite memory in general.
Proof sketch.
This proof follows a similar structure to the proof of Theorem 18. As before, the Bellman condition can be checked in polynomial time. Checking the lower-bound and upper-bound conditions involves checking almost-sure satisfaction of the Boolean objective BWMP for appropriate thresholds, which reduces to checking the satisfaction of BWMP in non-stochastic games [22], which in turn reduces to total supremum payoff [12], which is in [27]. Both of these reductions are polynomial-time reductions, and hence, the expected -value problem is in .
Memoryless strategies suffice for for almost-sure satisfaction of [22]. requires infinite memory in general for the objective even in non-stochastic games [12], which are a special case of stochastic games. Deterministic strategies suffice for both players. Hence, we get the memory requirements of an optimal strategy for the expected -value problem using Corollary 10.
5 Discussion
We discuss some concluding remarks about the relation of our work to previous work [16], which deals with the satisfaction of Boolean prefix-independent objectives. We also discuss practical implementations for window mean-payoff objectives and applicability of our technique to other prefix-independent objectives.
Comparison with [16].
In [16], it suffices to check the almost-sure satisfaction of the same Boolean objective in all value classes. In contrast, for quantitative objectives, the threshold Boolean objective for which we check the almost-sure satisfaction depends on the guessed value of the value class (“Can satisfy with probability ?”). Another key difference is that for Boolean objectives, the value classes without boundary vertices are precisely the extremal value classes, that is classes with values and . In the case of quantitative objectives, there may be multiple intermediate value classes without boundary vertices, making reasoning about the correctness of the reduction more difficult.
We note that if we apply our approach to Boolean prefix-independent objectives (such as Büchi, coBüchi, parity) by viewing them as quantitative objectives mapping each play to or , then we recover the algorithm given in [16].
Applicability to other prefix-independent objectives.
Recall that in order to be able to apply our characterisation to a prefix-independent objective , we require bounds on the image of and on the denominators of the optimal expected -values of vertices in the game. These bounds can easily be derived individually for the following quantitative prefix-independent objectives, and thus, our technique applies to these objectives.
- Mean-payoff objective.
-
The mean-payoff value of any play is bounded between the minimum and maximum edge payoffs in the game, directly giving bounds on the image of the mean-payoff objective. Since deterministic memoryless optimal strategies exist for both players [24], the expected mean-payoff value can be computed from the stationary distribution of the Markov chain obtained by fixing memoryless strategies of both players. This gives denominator bounds for the expected mean-payoff values of vertices in the game.
- Limsup and liminf objectives.
-
Since the liminf objective is equivalent to objective with window length [21], and the limsup objective is the dual of the liminf objective, our analysis with window mean-payoff objectives generalises limsup and liminf objectives.
- Positive frequency payoff objective [29].
-
Here, each vertex has a payoff, and this objective returns the maximum payoff among all vertices that are visited with positive frequency in an infinite play. This objective is prefix-independent, as the frequency of a vertex is independent of finite prefixes. The image of the objective is bounded between the minimum and maximum vertex payoffs. We observe that the value of vertices in a value class without boundary vertices is equal to the payoff of a vertex in the class (giving a denominator bound for these vertices), and Theorem 11 uses this to give a denominator bound for vertices in value classes with boundary vertices.
Practical implementation.
We discuss approaches to solve the expected -value problem for the window mean-payoff objectives in practice.
A trivial algorithm that works for both and objectives is to iterate over all possible value vectors, as sketched below. For each value vector, we check if the conditions in Theorem 7 are satisfied, which can be done in polynomial time. Since there are exponentially many possible value vectors, this algorithm has an exponential running time in the worst case.
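A direct rendering of this brute-force procedure, under the assumption that the expected values lie in $[-B, B]$ and have denominators at most $Q$ (symbol names as used in Section 3), and reusing is_expected_value_vector as sketched in Section 3.2; the function names are ours.

```python
from fractions import Fraction
from itertools import product

def candidate_values(B, Q):
    """All rationals in [-B, B] with denominator at most Q."""
    return sorted({Fraction(p, q)
                   for q in range(1, Q + 1)
                   for p in range(-B * q, B * q + 1)})

def solve_by_enumeration(game, B, Q, check_vector):
    """Iterate over all candidate value vectors (exponentially many in the
    number of vertices) and return the unique one passing the check of
    Theorem 7, here supplied as the callable `check_vector`."""
    vertices = list(game.edges)
    for combo in product(candidate_values(B, Q), repeat=len(vertices)):
        x = dict(zip(vertices, combo))
        if check_vector(game, x):
            return x
    return None
```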
Another technique is value iteration [15], which has been shown to be an anytime algorithm for the standard mean-payoff objective [34]. An anytime algorithm gives better precision the longer it is run, and can be interrupted any time. Given a game with vertices, the expected -value problem on reduces to the expected liminf-value problem on a game with vertices (that is, on an exponentially larger game graph). The liminf objective is a well-studied objective in the context of value iteration [15, 11]. We describe the reduction in [21], which also gives the expected -values of vertices in .
Since the size of the graph is much bigger than that of , we would like to work with on-the-fly rather than explicitly constructing the entire graph. In [34], the authors show bounded value iteration for objectives such as reachability and mean-payoff. They also discuss that the algorithm can be extended to be asynchronous and use partial exploration. As future work, we would like to look at the practicality of on-demand asynchronous value iteration for the liminf objective, or even the window mean-payoff objectives and directly. An interesting aspect of it would be to investigate heuristics and optimisations such as sound value iteration [37], optimistic value iteration [31], and topological value iteration [3] to speed up the practical running time.
References
- [1] D. Andersson and P. B. Miltersen. The Complexity of Solving Stochastic Games on Graphs. In ISAAC, volume 5878 of Lecture Notes in Computer Science, pages 112–121. Springer, 2009. doi:10.1007/978-3-642-10631-6_13.
- [2] K. R. Apt and E. Grädel. Lectures in Game Theory for Computer Scientists. Cambridge University Press, 2011.
- [3] M. Azeem, A. Evangelidis, J. Křetínský, A. Slivinskiy, and M. Weininger. Optimistic and Topological Value Iteration for Simple Stochastic Games. In ATVA, pages 285–302. Springer, 2022. doi:10.1007/978-3-031-19992-9_18.
- [4] C. Baier and J-P. Katoen. Principles of model checking. MIT Press, 2008.
- [5] B. Bordais, S. Guha, and J-F. Raskin. Expected Window Mean-Payoff. In FSTTCS, volume 150 of LIPIcs, pages 32:1–32:15, 2019. doi:10.4230/LIPIcs.FSTTCS.2019.32.
- [6] T. Brihaye, F. Delgrange, Y. Oualhadj, and M. Randour. Life is Random, Time is Not: Markov Decision Processes with Window Objectives. Logical Methods in Computer Science, Volume 16, Issue 4, December 2020. doi:10.23638/LMCS-16(4:13)2020.
- [7] V. Bruyère, E. Filiot, M. Randour, and J-F. Raskin. Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. Information and Computation, 254:259–295, 2017. doi:10.1016/j.ic.2016.10.011.
- [8] T. Brázdil, V. Brožek, K. Chatterjee, V. Forejt, and A. Kučera. Markov Decision Processes with Multiple Long-run Average Objectives. Logical Methods in Computer Science, Volume 10, Issue 1, Feb 2014. doi:10.2168/LMCS-10(1:13)2014.
- [9] K. Chatterjee. Concurrent games with tail objectives. Theoretical Computer Science, 388(1):181–198, 2007. doi:10.1016/j.tcs.2007.07.047.
- [10] K. Chatterjee, L. Doyen, H. Gimbert, and T. A. Henzinger. Randomness for free. Information and Computation, 245:3–16, 2015. doi:10.1016/j.ic.2015.06.003.
- [11] K. Chatterjee, L. Doyen, and T. A. Henzinger. A Survey of Stochastic Games with Limsup and Liminf Objectives. In ICALP, pages 1–15. Springer, 2009. doi:10.1007/978-3-642-02930-1_1.
- [12] K. Chatterjee, L. Doyen, M. Randour, and J-F. Raskin. Looking at mean-payoff and total-payoff through windows. Information and Computation, 242:25–52, 2015. doi:10.1016/j.ic.2015.03.010.
- [13] K. Chatterjee and N. Fijalkow. A reduction from parity games to simple stochastic games. EPTCS, 54:74–86, 2011. doi:10.4204/eptcs.54.6.
- [14] K. Chatterjee and T. A. Henzinger. Reduction of stochastic parity to stochastic mean-payoff games. Information Processing Letters, 106(1):1–7, 2008. doi:10.1016/j.ipl.2007.08.035.
- [15] K. Chatterjee and T. A. Henzinger. Value Iteration. In 25 Years of Model Checking - History, Achievements, Perspectives, LNCS 5000, pages 107–138. Springer, 2008. doi:10.1007/978-3-540-69850-0_7.
- [16] K. Chatterjee, T. A. Henzinger, and F. Horn. Stochastic Games with Finitary Objectives. In MFCS, pages 34–54. Springer, 2009. doi:10.1007/978-3-642-03816-7_4.
- [17] K. Chatterjee, Z. Křetínská, and J. Křetínský. Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes. Logical Methods in Computer Science, Volume 13, Issue 2, July 2017. doi:10.23638/LMCS-13(2:15)2017.
- [18] K. Chatterjee, T. Meggendorfer, R. Saona, and J. Svoboda. Faster Algorithm for Turn-based Stochastic Games with Bounded Treewidth. In SODA, pages 4590–4605, 2023. doi:10.1137/1.9781611977554.ch173.
- [19] A. Church. Application of Recursive Arithmetic to the Problem of Circuit Synthesis. Journal of Symbolic Logic, 28(4):289–290, 1963. doi:10.2307/2271310.
- [20] A. Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992. doi:10.1016/0890-5401(92)90048-K.
- [21] L. Doyen, P. Gaba, and S. Guha. Expectation in Stochastic Games with Prefix-independent Objectives, 2024. doi:10.48550/arXiv.2405.18048.
- [22] L. Doyen, P. Gaba, and S. Guha. Stochastic Window Mean-payoff Games. In FoSSaCS Part I, volume 14574 of LNCS, pages 34–54. Springer, 2024. doi:10.1007/978-3-031-57228-9_3.
- [23] R. Durrett. Probability: Theory and Examples. Cambridge University Press, 2010.
- [24] A. Ehrenfeucht and J. Mycielski. Positional Strategies for Mean Payoff Games. Int. Journal of Game Theory, 8(2):109–113, 1979. doi:10.1007/BF01768705.
- [25] N. Fijalkow, N. Bertrand, P. Bouyer, R. Brenguier, A. Carayol, J. Fearnley, H. Gimbert, F. Horn, R. Ibsen-Jensen, N. Markey, B. Monmege, P. Novotny, M. Randour, O. Sankur, S. Schmitz, O. Serre, and M. Skomra. Games on Graphs. Online, 2024. doi:10.48550/arXiv.2305.10546.
- [26] P. Gaba and S. Guha. Optimising expectation with guarantees for window mean payoff in Markov decision processes. In AAMAS, pages 820–828. International Foundation for Autonomous Agents and Multiagent Systems / ACM, 2025. doi:10.5555/3709347.3743600.
- [27] T. M. Gawlitza and H. Seidl. Games through Nested Fixpoints. In CAV, pages 291–305. Springer, 2009. doi:10.1007/978-3-642-02658-4_24.
- [28] D. Gillette. Stochastic games with zero stop probabilities, pages 179–188. Princeton University Press, December 1958.
- [29] H. Gimbert and E. Kelmendi. Submixing and Shift-Invariant Stochastic Games. International Journal of Game Theory, 52(4):1179–1214, 2023. doi:10.1007/s00182-023-00860-5.
- [30] V. Gurvich and P. B. Miltersen. On the Computational Complexity of Solving Stochastic Mean-Payoff Games. CoRR, abs/0812.0486, 2008. arXiv:0812.0486.
- [31] A. Hartmanns and B. L. Kaminski. Optimistic Value Iteration. In CAV, pages 488–511. Springer, 2020. doi:10.1007/978-3-030-53291-8_26.
- [32] K. Hoffman and R. Kunze. Linear Algebra. Prentice-Hall, Inc., 1971.
- [33] J. Hopcroft and J. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Co., Inc., 1979.
- [34] J. Křetínský, T. Meggendorfer, and M. Weininger. Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives. In LICS, pages 1–14, 2023. doi:10.1109/LICS56636.2023.10175771.
- [35] D. A. Martin. The Determinacy of Blackwell Games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998. doi:10.2307/2586667.
- [36] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 1994.
- [37] T. Quatmann and J-P. Katoen. Sound Value Iteration. In CAV, pages 643–661. Springer, 2018. doi:10.1007/978-3-319-96145-3_37.
- [38] L. S. Shapley. Stochastic Games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, October 1953. doi:10.1073/pnas.39.10.1095.
- [39] U. Zwick and M. Paterson. The Complexity of Mean Payoff Games on Graphs. Theoretical Computer Science, 158(1&2):343–359, 1996. doi:10.1016/0304-3975(95)00188-3.
