Permissive Equilibria in Multiplayer Reachability Games

Goeminne, Aline; Monmege, Benjamin

doi:10.4230/LIPIcs.CSL.2025.23

Aline Goeminne, F.R.S.-FNRS & UMONS – Université de Mons, Belgium

Benjamin Monmege, Aix-Marseille Univ, CNRS, LIS, Marseille, France

Abstract

We study multi-strategies in multiplayer reachability games played on finite graphs. A multi-strategy prescribes a set of possible actions, instead of a single action as usual strategies do: it represents the set of all strategies that are consistent with it. We aim for profiles of multi-strategies (a multi-strategy per player), where each profile of consistent strategies is a Nash equilibrium, or a subgame perfect equilibrium. The permissiveness of two multi-strategies can be compared with penalties, as already used in the two-player zero-sum setting by Bouyer, Duflot, Markey and Renault [3]. We show that we can decide in PSPACE the existence of a multi-strategy profile that is a Nash equilibrium or a subgame perfect equilibrium while satisfying some upper-bound constraints on the penalties, if the upper-bound penalties are given in unary. The same holds when we search for multi-strategies where certain players are asked to win in at least one play or in all plays.

Keywords and phrases:

multiplayer reachability games, penalties, permissive equilibria

Funding:

Aline Goeminne: Postdoctoral Researcher of the Fonds de la Recherche Scientifique – FNRS.

Benjamin Monmege: This author was partially funded by ANR JCJC Quasy ANR-23-CE48-0008.

2012 ACM Subject Classification:

Software and its engineering $\rightarrow$ Formal methods; Theory of computation $\rightarrow$ Logic and verification; Theory of computation $\rightarrow$ Solution concepts in game theory

Editors:

Jörg Endrullis and Sylvain Schmitz

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

Nowadays, computer systems are ubiquitous and increasingly complex. Errors in such systems can have dramatic consequences. This is why model checking provides a formal tool to ensure these systems are correct and meet certain specifications. Synthesis, on the other hand, allows for the construction of a correct-by-construction system model: concepts from game theory can be used for this purpose.

Two-player zero-sum games are commonly used to model a system interacting with its environment. In this model, the system aims to achieve a goal while the environment acts antagonistically to prevent it. This situation can be abstracted as a game played on a graph involving two players (the system and the environment). The graph represents the different possible configurations of the system, and an infinite path in this graph is a sequence of interactions between the system and the environment. In this model, building a correct system amounts to synthesizing a winning strategy, that is, a way for the system to play that ensures its goal is met regardless of the environment’s behavior.

Unlike the purely antagonistic view of two-player zero-sum games, multiplayer games allow for modeling situations where the environment may have its own goals, or where the system consists of different interacting components, each with its own specification. In this context, the notion of a winning strategy is no longer appropriate, hence notions of equilibria are studied: Nash equilibria or subgame perfect equilibria, which more adequately account for the sequential aspect of games played on graphs (avoiding non-credible threats). Intuitively, an equilibrium can be seen as a contract among players such that no player has an incentive to unilaterally change his strategy.

It is well known that different equilibria can coexist in the same game. In particular, a game may include an equilibrium where no player achieves his goal and an equilibrium where all players achieve their goals. The latter equilibrium is more relevant than the former. Therefore, it seems appropriate to focus on the existence and synthesis of relevant equilibria (according to certain relevance criteria).

Even if the synthesis process provides an equilibrium, its implementation may fail. This can be due to the occurrence of errors; for example, the action prescribed by the equilibrium may be unavailable. Synthesizing robust equilibria against such perturbations is therefore essential. To address these robustness issues, the classic notion of a player’s strategy can be replaced by the notion of a multi-strategy: unlike a classic strategy that provides a single action at each decision point, a multi-strategy provides a subset of possible actions (see, for example, [2, 3]).

Intuitively, a multi-strategy is more permissive than another if the first allows more behaviors than the second. There are different ways to express this permissiveness. A qualitative view of permissiveness is studied in [2], where a multi-strategy is more permissive than another if the set of resulting plays includes those of the second multi-strategy. A quantitative view is addressed in [3] via the notion of penalty of multi-strategies, where a cost is associated with each edge not chosen by the multi-strategy. Thus, the penalty of a multi-strategy is the highest total cost of blocked edges along a play consistent with the multi-strategy.

Related works.

In [2], permissiveness in parity games (a highly expressive winning condition) is studied: considering the qualitative view of permissiveness, there does not necessarily exist a most permissive strategy. However, one exists when restricted to memoryless strategies (which always make the same decision in any given vertex of the game). By reducing to safety games, the authors show that it is possible to compute the most permissive strategy. In [3], the above-mentioned quantitative view of permissiveness is implemented. Several penalty measures and games are used, and the complexity of computing the most permissive strategies in this context is given. More general parity objectives are then studied in [5]. Recently, other methods have explored permissiveness in two-player games using templates to concisely represent multiple strategies in graph games [1]. This approach is also used in multiplayer games for the synthesis of secure equilibria [16].

Independently, different equilibria (Nash or subgame perfect) have been studied in multiplayer games to ensure a strategy profile where no player has an incentive to deviate. Several works have characterized such equilibria and studied the complexity of decision problems related to the existence of relevant equilibria. Notably, these works have focused on the study and characterization of ( $i$ ) Nash equilibria in games where players have classical $\omega$ -regular objectives [17, 11], ( $i i$ ) weak subgame perfect equilibria (a variant of subgame perfect equilibria) where players have classical $\omega$ -regular objectives [9] (this work also characterizes subgame perfect equilibria when the studied objectives are either qualitative reachability or safety objectives); ( $i i i$ ) subgame perfect equilibria for games with quantitative reachability objectives [10]; ( $i v$ ) subgame perfect equilibria for games with parity objectives [6] or ( $v$ ) mean-payoff objectives [8, 7].

Contribution.

Our goal is to combine these two research directions by studying permissiveness in strategy profiles that describe equilibria (Nash or subgame perfect). In this first work, we focus on reachability games only. We study permissive strategy profiles such that all the fully described strategy profiles they contain are equilibria. The motivation is to allow greater latitude and robustness of equilibrium profiles without losing quality in the final goal of secure synthesis. With the qualitative view, as in the two-player game framework [2], it is not difficult to show that there do not necessarily exist most permissive profiles that are Nash equilibria (or subgame perfect equilibria). We will thus consider a quantitative view of permissiveness similar to the penalty measures introduced in [3] for two-player games. We obtain a characterization based on trees, and decision algorithms with penalties bounded by a given threshold in polynomial space with respect to the size of the game and the maximal penalty bound (if this is encoded in unary). We also solve the problem of synthesis of robust and relevant equilibria, where the relevance is the constraint that all derived equilibria ensure that all players in a fixed subset satisfy their objective (strongly winning), or that at least one derived equilibrium ensures this guarantee (weakly winning).

All missing proofs can be found in the long version of this article [14].

2 Multiplayer reachability games

A (multiplayer) reachability game is a tuple $(\operatorname{N},V,(V_{i})_{i\in\operatorname{N}},E,(\operatorname{F}_{i})_{i\in\operatorname{N}},v_{0})$, that we denote $(\operatorname{\mathcal{G}},v_{0})$ to emphasize the $v_{0}$ component, where $\operatorname{N}=\{1,\ldots,n\}$ is a finite set of $n$ players, $(V,E)$ is a finite directed graph without deadlocks (for all $v\in V$, there exists $v^{\prime}\in V$ such that $(v,v^{\prime})\in E$), $(V_{i})_{i\in\operatorname{N}}$ is a partition of $V$ between the players, $\operatorname{F}_{i}\subseteq V$ is the set of target vertices, called target set, of player $i\in\operatorname{N}$, and $v_{0}$ is an initial vertex. Given a vertex $v\in V$, we let $\operatorname{Succ}(v)=\{v^{\prime}\in V\mid(v,v^{\prime})\in E\}$ be the set of all successors of $v$.

A play in $\operatorname{\mathcal{G}}$ is an infinite sequence of vertices consistent with the graph structure, i.e., if $\rho=\rho_{0}\rho_{1}\cdots$ is a play, then for all $k\in\mathbb{N}$ , $\rho_{k}\in V$ and $(\rho_{k},\rho_{k+1})\in E$ . The set of plays is denoted by $\operatorname{Plays}$ , while $\operatorname{Plays}(v)$ denotes the set of plays beginning in $v$ . Given a play $\rho=\rho_{0}\rho_{1}\cdots$ and $k\in\mathbb{N}$ , $\rho_{\geq k}$ is the suffix $\rho_{k}\rho_{k+1}\cdots$ of $\rho$ .

For each player $i\in\operatorname{N}$ , we let $\operatorname{Gain}_{i}$ be the gain function that associates with each play the value $1$ if the play is winning for player $i$ , $0$ if it is losing. For a reachability game as above, we have $\operatorname{Gain}_{i}(\rho)=1$ iff player $i$ reaches his target set in $\rho$ , i.e., $\rho=\rho_{0}\rho_{1}\cdots$ and there exists $k\in\mathbb{N}$ with $\rho_{k}\in\operatorname{F}_{i}$ . In the rest of this article, $(\operatorname{\mathcal{G}},v_{0})$ will always denote a reachability game associated with these gain functions.
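The definitions above can be made concrete with a small Python sketch. It is not taken from the article: the class `Game`, the helper `gain`, and the toy two-player game are hypothetical, and since a play is infinite the sketch assumes it is given in lasso form (a finite prefix followed by a cycle repeated forever).

```python
from dataclasses import dataclass


@dataclass
class Game:
    """A reachability game; all field names are illustrative assumptions."""
    owner: dict    # vertex -> player owning it (encodes the partition (V_i))
    succ: dict     # vertex -> set of successors (encodes the edge set E)
    targets: dict  # player -> target set F_i
    v0: str        # initial vertex

    def has_no_deadlock(self):
        # Every vertex needs at least one successor, and successors must be vertices.
        return all(s and s <= set(self.succ) for s in self.succ.values())

    def is_history(self, vertices):
        # Consecutive vertices must be linked by an edge of the graph.
        return all(v in self.succ[u] for u, v in zip(vertices, vertices[1:]))


def gain(game, player, prefix, cycle):
    """Gain_i of the play prefix.cycle^omega: 1 iff some visited vertex is in F_i."""
    return int(any(v in game.targets[player] for v in prefix + cycle))


# Hypothetical toy game: player 1 owns a and c, player 2 owns b.
toy = Game(
    owner={"a": 1, "b": 2, "c": 1},
    succ={"a": {"b", "c"}, "b": {"a", "c"}, "c": {"c"}},
    targets={1: {"c"}, 2: {"b"}},
    v0="a",
)
```

For instance, `gain(toy, 1, ["a"], ["c"])` evaluates the play $a\,c^{\omega}$ for player 1, while `gain(toy, 2, ["a", "b"], ["a", "b"])` evaluates the play $(ab)^{\omega}$ for player 2.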

A history is a finite sequence of vertices $h=h_{0}h_{1}\cdots h_{k}$ with $k\in\mathbb{N}$ defined similarly. The set of histories is denoted by $\operatorname{Hist}$, while $\operatorname{Hist}(v)$ denotes the set of histories beginning in $v$. For all $i\in\operatorname{N}$, we write $\operatorname{Hist}_{i}$ to denote the set of histories ending in a vertex owned by player $i$. If $h=h_{0}\cdots h_{k}$ with $k\in\mathbb{N}$ is a history, $\operatorname{Last}(h)$ denotes the last vertex $h_{k}$, while $|h|$ denotes its length $k$. Given a history $h=h_{0}\cdots h_{k}$, $\operatorname{Visit}(h)=\{i\in\operatorname{N}\mid\exists 1\leq\ell\leq k\;h_{\ell}\in\operatorname{F}_{i}\}$ is the set of players who visit their target set along $h$.

A strategy of player $i$ is a function $\sigma_{i}\colon\operatorname{Hist}_{i}(v_{0})\rightarrow V$ that assigns to each history $hv\in\operatorname{Hist}_{i}(v_{0})$ a vertex $v^{\prime}$ such that $(v,v^{\prime})\in E$ . A play $\rho=\rho_{0}\rho_{1}\cdots$ is consistent with a strategy $\sigma_{i}$ if for all $\rho_{k}\in V_{i}$ , $\sigma_{i}(\rho_{0}\cdots\rho_{k})=\rho_{k+1}$ . A strategy profile is a tuple $\sigma=(\sigma_{i})_{i\in\operatorname{N}}$ of strategies, one per player: there is a unique play from $v_{0}$ which is consistent with each strategy $\sigma_{i}$ , and we call this play the outcome of $\sigma$ , denoted by $\langle\sigma\rangle_{v_{0}}$ . To highlight the role of player $i$ , we sometimes write $\sigma=(\sigma_{i},\sigma_{-i})$ where $\sigma_{-i}$ denotes the strategy profile of the players other than player $i$ .

The strategy profile $\sigma$ is a Nash equilibrium (NE) in $(\operatorname{\mathcal{G}},v_{0})$ if no player has an incentive to deviate unilaterally from his strategy to increase his gain, i.e., if for all players $i\in\operatorname{N}$ and all strategies $\sigma^{\prime}_{i}$ of player $i$, $\operatorname{Gain}_{i}(\langle\sigma\rangle_{v_{0}})\geq\operatorname{Gain}_{i}(\langle\sigma^{\prime}_{i},\sigma_{-i}\rangle_{v_{0}})$.
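As an illustration, the NE condition can be checked mechanically in a restricted setting. The sketch below is not the article's algorithm: it assumes that every strategy of the profile is memoryless (a map from vertices to successors), and the function names and the toy game are hypothetical. Under this assumption, a losing player $i$ has a profitable deviation exactly when $\operatorname{F}_{i}$ is reachable from $v_{0}$ in the graph where the other players follow the profile and player $i$ moves freely.

```python
def outcome_vertices(sigma, v0):
    """Vertices visited by the unique outcome of a memoryless profile."""
    seen, v = set(), v0
    while v not in seen:
        seen.add(v)
        v = sigma[v]
    return seen


def is_nash_equilibrium(owner, succ, targets, sigma, v0):
    visited = outcome_vertices(sigma, v0)
    for player, target in targets.items():
        if visited & target:
            continue  # gain is already 1, so no deviation can improve it
        # Explore the graph where only `player` may leave the profile.
        reach, stack = {v0}, [v0]
        while stack:
            v = stack.pop()
            options = succ[v] if owner[v] == player else {sigma[v]}
            for u in options - reach:
                reach.add(u)
                stack.append(u)
        if reach & target:
            return False  # a profitable unilateral deviation exists
    return True


# Hypothetical toy game: both players want to reach w.
owner = {"s": 1, "t": 2, "w": 1, "l": 1}
succ = {"s": {"t", "l"}, "t": {"w", "l"}, "w": {"w"}, "l": {"l"}}
targets = {1: {"w"}, 2: {"w"}}

threat = {"s": "l", "t": "l", "w": "w", "l": "l"}       # nobody wins
cooperative = {"s": "t", "t": "w", "w": "w", "l": "l"}  # both players win
unstable = {"s": "l", "t": "w", "w": "w", "l": "l"}     # player 1 should deviate
```

In this toy game, `threat` and `cooperative` are both NEs although no player wins in the former: player 2 announces the move to `l` from `t`, which would not be in his interest if `t` were actually reached.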

The concept of subgame perfect equilibrium (SPE) better accounts for the sequential nature of games played on graphs by avoiding non-credible threats, a well-known weakness of NEs in this setting. Informally, a strategy profile is an SPE if it is an NE in all subgames. Given a history $hv\in\operatorname{Hist}(v_{0})$, the subgame $(\operatorname{\mathcal{G}}_{\restriction h},v)$ is obtained from $\operatorname{\mathcal{G}}$ by changing the initial vertex to $v$, and by considering the gain functions $(\operatorname{Gain}_{i\restriction h})_{i\in\operatorname{N}}$ taking into account the players that have won in history $h$: we thus write, for each $i\in\operatorname{N}$, $\operatorname{Gain}_{i\restriction h}(\rho)=\operatorname{Gain}_{i}(h\rho)$ for all $\rho\in\operatorname{Plays}(v)$. Moreover, if $\sigma_{i}$ is a strategy of player $i$ in $\operatorname{\mathcal{G}}$, then $\sigma_{i\restriction h}$ is the strategy of player $i$ in the subgame $(\operatorname{\mathcal{G}}_{\restriction h},v)$ such that for all $h^{\prime}\in\operatorname{Hist}_{i}(v)$, $\sigma_{i\restriction h}(h^{\prime})=\sigma_{i}(hh^{\prime})$. In the same way, from a strategy profile $\sigma$ in $\operatorname{\mathcal{G}}$, we can derive a strategy profile $\sigma_{\restriction h}$ in $(\operatorname{\mathcal{G}}_{\restriction h},v)$. We now define formally the concept of SPEs: a strategy profile $\sigma$ is an SPE in $\operatorname{\mathcal{G}}$ if for all $i\in\operatorname{N}$, for all $hv\in\operatorname{Hist}_{i}(v_{0})$, $\sigma_{\restriction h}$ is an NE in $(\operatorname{\mathcal{G}}_{\restriction h},v)$. Notice that an SPE is an NE and that there always exists an SPE (and thus an NE) in a reachability game [17].

3 Permissiveness in strategies

Our goal is to allow for some permissiveness in strategies of all players, i.e., being able to underspecify the strategies of the players, while maintaining that they describe an NE or an SPE.

A multi-strategy of player $i$ is a function $\operatorname{\Theta}_{i}\colon\operatorname{Hist}_{i}(v_{0})\rightarrow 2^{V}\setminus\{\emptyset\}$ that assigns to each history $hv\in\operatorname{Hist}_{i}(v_{0})$ a non-empty set of vertices $\operatorname{A}\subseteq V$ such that for all $v^{\prime}\in\operatorname{A}$, $(v,v^{\prime})\in E$. Notice that a strategy $\sigma_{i}$ can be seen as a multi-strategy $\operatorname{\Theta}_{i}$ where, for all $hv\in\operatorname{Hist}_{i}(v_{0})$, $\operatorname{\Theta}_{i}(hv)$ is the singleton $\{\sigma_{i}(hv)\}$. A multi-strategy profile $\operatorname{\Theta}=(\operatorname{\Theta}_{i})_{i\in\operatorname{N}}$ is a tuple of multi-strategies, one per player.

Unlike strategies, when we fix a game $\operatorname{\mathcal{G}}$ and a multi-strategy profile $\operatorname{\Theta}$ , there exist several plays beginning in $v_{0}$ that are consistent with all the multi-strategies $\operatorname{\Theta}_{i}$ . To describe them, we say that a strategy $\sigma_{i}$ is consistent with a multi-strategy $\operatorname{\Theta}_{i}$ , written $\sigma_{i}\lesssim\operatorname{\Theta}_{i}$ if for all $hv\in\operatorname{Hist}_{i}(v_{0})$ , $\sigma_{i}(hv)\in\operatorname{\Theta}_{i}(hv)$ . We extend this notation to profiles of strategies, as expected. Then, we let $\langle\operatorname{\Theta}\rangle_{v_{0}}$ be the set of plays $\langle\sigma\rangle_{v_{0}}$ for all profiles $\sigma$ of strategies consistent with the multi-strategy $\operatorname{\Theta}$ . We call this set the outcomes of $\operatorname{\Theta}$ . We also let $\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}$ be the set of histories consistent with the multi-strategy $\operatorname{\Theta}$ , i.e., the finite prefixes of plays in $\langle\operatorname{\Theta}\rangle_{v_{0}}$ .

Our goal is to compute profiles of multi-strategies such that all profiles of consistent strategies are NEs or SPEs: such profiles of multi-strategies are called permissive NEs or permissive SPEs. By the existence of NEs and SPEs in reachability games, we straightforwardly obtain the existence of permissive NEs and permissive SPEs. We thus want to study most permissive NEs or SPEs, i.e., profiles of multi-strategies that are permissive NEs or SPEs, and such that no “more permissive” multi-strategies are still permissive NEs or SPEs.

The natural first attempt would be to look for a notion of “more permissive” that is set-theoretic, with respect to a given solution concept. We would thus say that a profile of multi-strategies $\operatorname{\Theta}$ is at least as permissive as a profile of multi-strategies $\operatorname{\Theta}^{\prime}$ if for all $i\in\operatorname{N}$, for all histories $h\in\operatorname{Hist}_{i}(v_{0})$, $\operatorname{\Theta}_{i}(h)\supseteq\operatorname{\Theta}^{\prime}_{i}(h)$. Then, $\operatorname{\Theta}$ would be more permissive than $\operatorname{\Theta}^{\prime}$ if it is at least as permissive, while being different (for at least one history). Finally, $\operatorname{\Theta}$ would be a most permissive NE or SPE if it is a permissive NE or SPE, respectively, and no permissive NE or SPE, respectively, is more permissive than $\operatorname{\Theta}$.

This natural definition is very problematic in the realm of reachability games (as already noticed in the context of winning strategies in parity games by [2]), where a most permissive NE or SPE may fail to exist, as demonstrated by the game in Figure 1.

Figure 1: In this game, player $1$ owns all vertices and wants to reach $v_{1}$. For all $k\in\mathbb{N}$, we define the multi-strategy $\operatorname{\Theta}^{k}_{1}$ such that for all $h\in\operatorname{Hist}(v_{0})$, $\operatorname{\Theta}^{k}_{1}(h)=\{v_{0},v_{1}\}$ if $\operatorname{Last}(h)=v_{0}$ and $|\{n\in\mathbb{N}\mid h_{n}=v_{0}\}|\leq k$, and $\operatorname{\Theta}^{k}_{1}(h)=\{v_{1}\}$ otherwise. We have that for all $k\in\mathbb{N}$, for all $\sigma_{1}\lesssim\operatorname{\Theta}^{k}_{1}$, $\operatorname{Gain}_{1}(\langle\sigma_{1}\rangle_{v_{0}})=1$ (and thus $\operatorname{\Theta}^{k}_{1}$ is a permissive SPE), but for all $k\in\mathbb{N}$, $\langle\operatorname{\Theta}^{k}_{1}\rangle_{v_{0}}\subseteq\langle\operatorname{\Theta}^{k+1}_{1}\rangle_{v_{0}}$.
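The chain of multi-strategies of Figure 1 can be explored with a short Python sketch. The figure itself is not reproduced here, so the sketch assumes the simplest graph matching the caption: $v_{0}$ has a self-loop and an edge to $v_{1}$, and $v_{1}$ only has a self-loop. The function names are hypothetical, and only finite prefixes of the outcomes are enumerated.

```python
def theta(k, h):
    """Theta^k_1 from the caption of Figure 1; h is a tuple of vertices."""
    if h[-1] == "v0" and h.count("v0") <= k:
        return {"v0", "v1"}
    return {"v1"}


def histories(k, depth):
    """Histories from v0 with at most `depth` edges that are consistent with Theta^k."""
    level = [("v0",)]
    result = set(level)
    for _ in range(depth):
        # Extend every history of the current length by each allowed successor.
        level = [h + (u,) for h in level for u in theta(k, h)]
        result.update(level)
    return result


def strictly_more_histories(k, depth):
    """True if Theta^(k+1) allows strictly more histories than Theta^k."""
    return histories(k, depth) < histories(k + 1, depth)
```

With a depth larger than $k+1$, `strictly_more_histories(k, depth)` holds for each tested $k$, which matches the absence of a most permissive multi-strategy in this chain: each $\operatorname{\Theta}^{k+1}_{1}$ allows one more round in $v_{0}$ than $\operatorname{\Theta}^{k}_{1}$, while every sufficiently long consistent history still reaches $v_{1}$.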

We thus propose another way to measure the permissiveness of a multi-strategy, inspired by the definition of penalty used in [3] to describe permissive winning strategies in two-player games. To define the notion of penalty in our context, we equip the game with a function $w\colon E\to\mathbb{N}$ assigning a non-negative weight to each edge: if unspecified, we will consider that every edge has weight $1$ . The player who owns the vertex at the source of an edge $e$ will pay the penalty $w(e)$ if he decides to not include the edge $e$ in his multi-strategy. All penalties are then counted additively. Formally, for a multi-strategy profile $\operatorname{\Theta}$ , we first define for each player $i\in\operatorname{N}$ the penalty of player $i$ w.r.t. $\operatorname{\Theta}$ in a play $\rho=\rho_{0}\rho_{1}\cdots$ by induction on the length of its prefixes:

$\blacksquare$ $\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(\varepsilon)=0$ where $\varepsilon$ denotes the empty prefix;

$\blacksquare$ for $h=\rho_{0}\cdots\rho_{k}$, $\displaystyle\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(hv)=\begin{cases}\displaystyle\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(h)+\sum_{v^{\prime}\in\operatorname{Succ}(v)\setminus\operatorname{\Theta}_{i}(hv)}w(v,v^{\prime})&\text{ if }v\in V_{i}\\ \operatorname{Penalty}_{i}^{\operatorname{\Theta}}(h)&\text{ otherwise}\end{cases}$;

$\blacksquare$ $\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(\rho)=\lim_{k\rightarrow+\infty}\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(\rho_{0}\cdots\rho_{k})$: this limit exists (it may be equal to $+\infty$) since $(\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(\rho_{0}\cdots\rho_{k}))_{k}$ is a non-decreasing sequence of natural numbers.
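The inductive definition above translates directly into a loop over the prefixes of a history. The following Python sketch is illustrative only: the function `penalty`, the representation of a multi-strategy as a function from histories (tuples) to sets of allowed successors, and the one-player graph (the graph assumed for Figure 1, with a self-loop on $v_{0}$, an edge from $v_{0}$ to $v_{1}$ and a self-loop on $v_{1}$) are assumptions.

```python
OWNER = {"v0": 1, "v1": 1}
SUCC = {"v0": {"v0", "v1"}, "v1": {"v1"}}


def theta(k, h):
    """Multi-strategy Theta^k_1 of Figure 1 (h is a tuple of vertices)."""
    if h[-1] == "v0" and h.count("v0") <= k:
        return {"v0", "v1"}
    return {"v1"}


def penalty(player, history, multi_strategy, weight=None):
    """Penalty of `player` along a finite history, summed prefix by prefix."""
    weight = weight or {}  # unspecified edges have weight 1, as in the text
    total = 0  # penalty of the empty prefix
    for k in range(len(history)):
        prefix = history[: k + 1]
        v = prefix[-1]
        if OWNER[v] == player:
            # Edges leaving v that the multi-strategy does not allow after `prefix`.
            blocked = SUCC[v] - multi_strategy(prefix)
            total += sum(weight.get((v, u), 1) for u in blocked)
    return total
```

For example, `penalty(1, ("v0", "v0", "v1"), lambda h: theta(1, h))` counts the single edge $(v_{0},v_{0})$ that is blocked once $v_{0}$ has been visited twice.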

There are several ways to associate a penalty with a multi-strategy profile $\operatorname{\Theta}$ , depending on how we take into account the non-determinism offered in the multi-strategies. A first choice consists in considering a worst-case scenario in the outcomes (without considering the possible deviations). A second choice consists in considering only the deviations of one player, i.e., to consider that the retaliation of other players with respect to the deviation of a player will count in the final penalty. It is then possible to combine both types of penalties, though we will treat them separately in the rest of this article.

Definition 1 (Penalties).

Let $\operatorname{\Theta}$ be a multi-strategy profile in $(\operatorname{\mathcal{G}},v_{0})$ . The main penalty and retaliation penalty of player $i$ with respect to $\operatorname{\Theta}$ are defined respectively as

$\displaystyle\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})=\sup_{\rho\in\langle\operatorname{\Theta}\rangle_{v_{0}}}\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(\rho)$

$\displaystyle\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})=\sup_{hv\in\operatorname{Hist}_{i}(v_{0})\setminus\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}}\sup\{\operatorname{Penalty}_{i}^{\operatorname{\Theta}}(\rho)\mid\rho\in\langle\operatorname{\Theta}_{\restriction hv}\rangle_{v}\}$

If there are no histories $hv$ in $\operatorname{Hist}_{i}(v_{0})\setminus\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}$, we let $\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})=0$.

The existence of a multi-strategy profile which satisfies some upper bounds on penalties does not provide any certainty about the satisfaction of the reachability objectives of the players. For this reason, we also consider multi-strategy profiles that satisfy some properties on the set of players who satisfy their objective. Let $\operatorname{Win}$ be a subset of players and $\operatorname{\Theta}$ be a multi-strategy profile. Then, $\operatorname{\Theta}$ is said to be weakly winning if there exists a strategy profile $\sigma$ which is consistent with $\operatorname{\Theta}$ and such that its outcome is winning for all players in $\operatorname{Win}$. Similarly, $\operatorname{\Theta}$ is said to be strongly winning if for each strategy profile $\sigma$ which is consistent with $\operatorname{\Theta}$, its outcome is winning for all players in $\operatorname{Win}$.

Definition 2 (Weakly and strongly winning).

Given a subset of players $\operatorname{Win}\subseteq\operatorname{N}$ and a multi-strategy profile $\operatorname{\Theta}$,

$\blacksquare$ $\operatorname{\Theta}$ is said to be weakly winning with respect to $\operatorname{Win}$ if there exists a strategy profile $\sigma$ such that $\sigma\lesssim\operatorname{\Theta}$ and for all $i\in\operatorname{Win}$, $\operatorname{Gain}_{i}(\langle\sigma\rangle_{v_{0}})=1$.

$\blacksquare$ $\operatorname{\Theta}$ is said to be strongly winning with respect to $\operatorname{Win}$ if for all strategy profiles $\sigma$ such that $\sigma\lesssim\operatorname{\Theta}$, we have that for all $i\in\operatorname{Win}$, $\operatorname{Gain}_{i}(\langle\sigma\rangle_{v_{0}})=1$.
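Both notions can be tested by simple graph searches in a restricted setting. The Python sketch below is an illustration under a strong assumption that the article does not make in general: the multi-strategy profile is memoryless, so it is given as a map `allowed` from each vertex to its non-empty set of permitted successors, and its outcomes are exactly the infinite paths of the restricted graph. The function names and the toy data are hypothetical, and the sketch only checks the winning conditions, not whether the profile is a permissive equilibrium.

```python
def all_paths_reach(allowed, target):
    """Vertices from which every path of the restricted graph reaches `target`."""
    sure = set(target)
    changed = True
    while changed:
        changed = False
        for v, nxt in allowed.items():
            if v not in sure and nxt <= sure:
                sure.add(v)
                changed = True
    return sure


def strongly_winning(allowed, targets, win, v0):
    # Every outcome must visit F_i, for each player i in Win.
    return all(v0 in all_paths_reach(allowed, targets[i]) for i in win)


def weakly_winning(allowed, targets, win, v0):
    # Search for one path visiting F_i for all i in Win, tracking who has won so far.
    def update(done, v):
        return done | frozenset(i for i in win if v in targets[i])

    start = (v0, update(frozenset(), v0))
    visited, stack = {start}, [start]
    while stack:
        v, done = stack.pop()
        if done == frozenset(win):
            return True  # any such history extends to a play (no deadlocks)
        for u in allowed[v]:
            state = (u, update(done, u))
            if state not in visited:
                visited.add(state)
                stack.append(state)
    return False


# Hypothetical restricted graph: player 1 wants t, player 2 wants w.
allowed = {"s": {"t"}, "t": {"w", "l"}, "w": {"w"}, "l": {"l"}}
targets = {1: {"t"}, 2: {"w"}}
```

On this toy data the profile is strongly winning w.r.t. $\{1\}$ and weakly, but not strongly, winning w.r.t. $\{1,2\}$. The weak check explores pairs (vertex, set of players who already won), so its cost may grow exponentially with the size of $\operatorname{Win}$.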

Figure 2: An example of a reachability game where player 1 (resp. player 2) owns circle (resp. rectangle) vertices. The initial vertex is $v_{0}$. Target vertices $F_{1}=\{v_{3},v_{6},v_{8},v_{9}\}$ of player $1$ and $F_{2}=\{v_{4},v_{6}\}$ of player $2$ are drawn with gray vertices and double-bordered vertices respectively.

Figure 3: Examples of permissive equilibria: (a) a permissive NE and (b) a permissive SPE.

Example 3.

An example of a reachability game with two players is depicted in Figure 2. The edge labelled with $10$ corresponds to the penalty if player 2 decides not to allow this edge: all other penalties are set to $1$ by default. A multi-strategy is represented with red edges (black dotted edges are thus the ones that are not selected in the multi-strategy) in Figure 3(a). (Notice that, in this example, the set of successors prescribed by multi-strategies only depends on the current vertex and not on the past history.) All strategy profiles that are consistent with this multi-strategy depend on the choice of successor for $v_{0}$ among $\{v_{1},v_{2}\}$. It is indeed a permissive NE since the consistent strategies are NEs: player 1 has no interest in deviating from either $v_{1}$ or $v_{2}$ in $v_{0}$, since all strategies lead to plays where he visits his target set, while going to $v_{5}$ makes him lose. It has a main penalty of $2$ for player $1$ and $0$ for player $2$. Player $1$ can do slightly better by allowing the edge $(v_{1},v_{4})$ in the multi-strategy: this remains a permissive NE (now player $2$ wins in certain plays, but he is left with no real choices to make), and player 1 now gets a main penalty of $1$. This modified permissive NE is strongly winning w.r.t. $\{1\}$, and weakly winning w.r.t. $\{1,2\}$. It is not a permissive SPE since player $2$ has a profitable deviation from $v_{5}$ by going to $v_{6}$ where he wins. A permissive SPE is depicted in Figure 3(b), that is strongly winning w.r.t. $\{1\}$, but only weakly winning w.r.t. $\{1,2\}$. Player $2$ has a main penalty of $11$ (because he cuts edges $(v_{5},v_{5})$ and $(v_{5},v_{7})$), while player $1$ has a retaliation penalty of $1$ (because he cuts edge $(v_{7},v_{9})$). If we want a permissive SPE that is strongly winning w.r.t. $\{1,2\}$, we need to increase the main penalty of player 1 to 2 by removing edges $(v_{1},v_{3})$ and $(v_{0},v_{2})$. However, we may decrease to 0 the retaliation penalty of player $1$ by adding the edge $(v_{7},v_{9})$ (since it is equally good to him anyway).

We now define the problems we study in the rest of the article, where we use the word “equilibrium” to either mean NE or SPE, depending on the solution concept we want to check. In all these problems, we give different penalty bounds for the main penalty and the retaliation penalty. Notice though that the bounds can be set to $+\infty$ , relaxing the constraints in this case.

Problem 1 (Constrained penalty problem).

Given a reachability game $(\operatorname{\mathcal{G}},v_{0})$ , $m\in(\mathbb{N}\cup\{\infty\})^{n}$ and $r\in(\mathbb{N}\cup\{\infty\})^{n}$ , does there exist a permissive equilibrium $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})\leq r_{i}$ ?

Problem 2 (Weakly winning with constrained penalty problem).

Given a reachability game $(\operatorname{\mathcal{G}},v_{0})$ , $m\in(\mathbb{N}\cup\{\infty\})^{n}$ , $r\in(\mathbb{N}\cup\{\infty\})^{n}$ and $\operatorname{Win}\subseteq\operatorname{N}$ , does there exist a permissive equilibrium $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that (i) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})\leq r_{i}$ and (ii) $\operatorname{\Theta}$ is weakly winning w.r.t. $\operatorname{Win}$ ?

Problem 3 (Strongly winning with constrained penalty problem).

Given a reachability game $(\operatorname{\mathcal{G}},v_{0})$ , $m\in(\mathbb{N}\cup\{\infty\})^{n}$ , $r\in(\mathbb{N}\cup\{\infty\})^{n}$ and $\operatorname{Win}\subseteq\operatorname{N}$ , does there exist a permissive equilibrium $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that (i) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})\leq r_{i}$ and (ii) $\operatorname{\Theta}$ is strongly winning w.r.t. $\operatorname{Win}$ ?

We show in the rest of this article that all these problems, for NEs and SPEs, are decidable in PSPACE, if the upper-bound penalties are encoded in unary. To do so, we characterize the permissive equilibria in the various problems in Section 4. In Section 5, we then show that tree-like witnesses can be found whenever the corresponding permissive equilibria exist. These witnesses have a height bounded by a polynomial depending on the size of the game and the largest upper bound on penalties. We use these witnesses to obtain the PSPACE decision procedures.

4 Characterizations of permissive equilibria

We now characterize permissive equilibria of the reachability game $(\operatorname{\mathcal{G}},v_{0})$. This is a first step towards their computation in the next section. We provide a characterization for permissive NEs in Section 4.1 and one for permissive SPEs in Section 4.2. These characterizations are inspired by existing ones for classical NEs (resp. SPEs) [11, 9]. The latter rely on properties that a play (resp. a set of plays) must satisfy in order to be the outcome of an NE (resp. the set of subgame outcomes of an SPE). However, the outcomes of a permissive equilibrium form a set of plays rather than a single play. For that reason, the characterizations of permissive equilibria employ trees that we first formally define.

Trees.

We call tree over $\operatorname{\mathcal{G}}$ rooted at $v$ (for some $v\in V$ ) any subset $\operatorname{\mathcal{T}}$ of non-empty histories of $\operatorname{\mathcal{G}}$ that contains $v$ and such that if $hu\in\operatorname{\mathcal{T}}$ then $h\in\operatorname{\mathcal{T}}$ . All $h\in\operatorname{\mathcal{T}}$ are called nodes of the tree, the particular node $v$ is called the root of the tree, and for all $hu\in\operatorname{\mathcal{T}}$ , $h$ is called the parent of $h u$ , and $h u$ a child of $h$ .

As for histories in an arena, for all $hu\in\operatorname{\mathcal{T}}$, we let $\operatorname{Last}(hu)=u$. The depth of a node $h\in\operatorname{\mathcal{T}}$, written $\operatorname{depth}(h)$, is equal to $|h|$ and its height, denoted by $\operatorname{height}(h)$, is given by $\sup\{|\operatorname{Last}(h)h^{\prime}|\mid h^{\prime}\in\operatorname{Hist}\text{ and }hh^{\prime}\in\operatorname{\mathcal{T}}\}$. The height of the tree corresponds to the height of its root. A node $h\in\operatorname{\mathcal{T}}$ is called a leaf if $\operatorname{height}(h)=0$.

We denote by $\operatorname{\mathcal{T}}_{\restriction hu}$ , the subtree of $\operatorname{\mathcal{T}}$ rooted at $u$ for some $hu\in\operatorname{\mathcal{T}}$ , that is the set of non-empty histories $h^{\prime}\in\operatorname{Hist}(u)$ such that $hh^{\prime}\in\operatorname{\mathcal{T}}$ .
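For finite trees, these notions are easy to experiment with. The Python sketch below is a hedged illustration, not part of the article: a tree is represented as a finite set of tuples of vertices, all function names are hypothetical, and the check `is_tree` only tests the root and parent-closure conditions (it does not verify that nodes follow edges of the game graph). The trees of the article may be infinite, which this representation does not cover.

```python
def is_tree(nodes, root):
    """Root is present and every other node has its parent in the set."""
    if (root,) not in nodes:
        return False
    return all(h == (root,) or (len(h) > 1 and h[:-1] in nodes) for h in nodes)


def depth(h):
    """Depth of a node: its length |h|, i.e. its number of edges."""
    return len(h) - 1


def height(nodes, h):
    """Length of the longest extension of h inside the tree (0 for a leaf)."""
    return max(len(d) - len(h) for d in nodes if d[: len(h)] == h)


def subtree(nodes, hu):
    """Subtree rooted at Last(hu): histories h' from u such that h h' is a node."""
    return {d[len(hu) - 1:] for d in nodes if d[: len(hu)] == hu}


# Hypothetical finite tree rooted at v0.
TREE = {
    ("v0",),
    ("v0", "v0"), ("v0", "v1"),
    ("v0", "v0", "v1"), ("v0", "v1", "v1"),
}
```

Here `height(TREE, ("v0",))` is the height of the tree, and `subtree(TREE, ("v0", "v0"))` is again a tree rooted at `v0`.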

A (finite or infinite) branch of the tree is a maximal (finite or infinite) sequence of nodes $h_{0}h_{1}\cdots$ such that for all $k\in\mathbb{N}$ , $h_{k}$ is the parent of $h_{k+1}$ . Finally, we denote by $\operatorname{\operatorname{\mathcal{T}}^{\infty}}$ the set of plays in $\operatorname{\mathcal{G}}$ represented by infinite branches in $\operatorname{\mathcal{T}}$ , i.e.,

$\displaystyle\operatorname{\operatorname{\mathcal{T}}^{\infty}}=\{\rho_{0}\rho_{1}\cdots\in\operatorname{Plays}\mid\text{there exists a branch }h_{0}h_{1}\cdots\in\operatorname{\mathcal{T}}\text{ s.t. }\forall k\in\mathbb{N},\;\rho_{k}=\operatorname{Last}(h_{k})\}$

In what follows, we consider outcomes of multi-strategies as trees. Indeed, given a multi-strategy $\operatorname{\Theta}$ , $\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}$ can be seen as a tree $\operatorname{\mathcal{T}}$ over $\operatorname{\mathcal{G}}$ rooted at $v_{0}$ and $\langle\operatorname{\Theta}\rangle_{v_{0}}$ corresponds to $\operatorname{\operatorname{\mathcal{T}}^{\infty}}$ . In particular, penalties can also be defined on trees, mimicking the definition for profiles of multi-strategies. The penalty of a tree $\operatorname{\mathcal{T}}$ for a player $i$ , denoted by $\operatorname{Penalty}_{i}(\operatorname{\mathcal{T}})$ , is the maximal penalty of a branch of $\operatorname{\mathcal{T}}$ , the penalty of a branch being equal to the penalty of the associated play $\rho$ w.r.t. any profile of multi-strategies that is consistent with the choices appearing in $\operatorname{\mathcal{T}}$ along the play $\rho$ . Formally, let $\operatorname{\mathcal{T}}$ be a tree and $i\in\operatorname{N}$ be a player. For each $hv=v_{1}\cdots v_{k}v\in\operatorname{\mathcal{T}}$ , we define $\operatorname{Blocked}(h)=\{u\in\operatorname{Succ}(v_{k})\mid hu\not\in% \operatorname{\mathcal{T}}\}$ as the set of blocked successors of $h$ in $\operatorname{\mathcal{T}}$ and

\operatorname{Penalty}_{i}(hv)=\begin{cases}0&\text{ if }h=\varepsilon\\ \operatorname{Penalty}_{i}(h)+\sum_{u\in\operatorname{Blocked}(h)}w(v_{k},u)&\text{ if }v_{k}\in V_{i}\\ \operatorname{Penalty}_{i}(h)&\text{ otherwise.}\end{cases}

Moreover, for all plays $\rho=\rho_{0}\rho_{1}\cdots\in\operatorname{\operatorname{\mathcal{T}}^{\infty}}$ , we let $\operatorname{Penalty}_{i}(\rho)=\lim_{k\rightarrow+\infty}\operatorname{% Penalty}_{i}(\rho_{0}\cdots\rho_{k})$ . Thus, the penalty of a tree $\operatorname{\mathcal{T}}$ for a player $i\in\operatorname{N}$ is naturally defined as:

\operatorname{Penalty}_{i}(\operatorname{\mathcal{T}})=\sup\{\operatorname{Penalty}_{i}(\rho)\mid\rho\in\operatorname{\mathcal{T}}^{\infty}\}.
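To make the recursive definition of $\operatorname{Penalty}_{i}$ concrete, the following Python sketch computes the penalty of a history in a finite, prefix-closed set of histories. It is only an illustration under stated assumptions: the encoding of histories as tuples, the dictionaries `succ`, `owner` and `weight`, and the three-vertex arena are all hypothetical and do not come from the paper. Since a finite tree has no infinite branch, `finite_tree_penalty` takes the maximum over nodes, which is a finite-prefix analogue of the supremum over plays rather than the definition itself.

```python
def history_penalty(history, tree, succ, owner, weight, player):
    """Penalty_i(hv), obtained by unrolling the recursive definition.

    For every proper non-empty prefix h of the history whose last vertex
    belongs to `player`, add the weights of the edges to blocked successors.
    """
    total = 0
    for k in range(1, len(history)):
        prefix = history[:k]
        last = prefix[-1]
        if owner[last] != player:
            continue
        for u in succ[last]:
            if prefix + (u,) not in tree:  # u is in Blocked(prefix)
                total += weight[(last, u)]
    return total


def finite_tree_penalty(tree, succ, owner, weight, player):
    """Maximum penalty over the nodes of a finite tree (assumed non-empty)."""
    return max(
        history_penalty(h, tree, succ, owner, weight, player) for h in tree
    )


# Hypothetical arena: vertex a belongs to player 1, vertices b and c to player 2.
succ = {"a": ["b", "c"], "b": ["a"], "c": ["c"]}
owner = {"a": 1, "b": 2, "c": 2}
weight = {("a", "b"): 1, ("a", "c"): 2, ("b", "a"): 1, ("c", "c"): 1}

# Finite tree in which player 1 always blocks the edge (a, c).
tree = {("a",), ("a", "b"), ("a", "b", "a"), ("a", "b", "a", "b")}

penalty_1 = finite_tree_penalty(tree, succ, owner, weight, 1)
penalty_2 = finite_tree_penalty(tree, succ, owner, weight, 2)
```

In this example, player 1 blocks the edge of weight 2 at each of the two visits to `a` that have a child in the tree, so `penalty_1` is 4, while player 2 never blocks anything and `penalty_2` is 0.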

4.1 Characterization of permissive Nash equilibria

In order to characterize permissive Nash equilibria, we start by defining good trees, by checking two conditions. The first one, called resistance to internal deviations, means that at any node $h$ of the tree such that $\operatorname{Last}(h)$ belongs to player $i$ , if $h$ has at least two children, the plays starting with $h$ are either all losing, or all winning, for player $i$ . The second one, called resistance to external deviations, means that at any node $h u$ of the tree with $u$ belonging to player $i$ , if player $i$ has the possibility to play to a successor $u^{\prime}$ not in the tree from which he has a winning strategy, then all plays in the subtree from $h u$ must be winning for player $i$ .

Definition 4.

Let $\operatorname{\mathcal{T}}$ be a tree over $(\operatorname{\mathcal{G}},v_{0})$ .

1.

Given a subset of players $\operatorname{D}\subseteq\operatorname{N}$ , the tree $\operatorname{\mathcal{T}}$ is $\operatorname{D}$ -resistant to internal deviations if for all $i\in\operatorname{D}$ and for all $hv\in\operatorname{\mathcal{T}}$ such that $v\in V_{i}$ and $|\{hvv^{\prime}\in\operatorname{\mathcal{T}}\mid v^{\prime}\in V\}|\geq 2$ , we have that for all $\rho,\rho^{\prime}\in\operatorname{\mathcal{T}}^{\infty}_{\restriction hv}$ , $\operatorname{Gain}_{i}(h\rho)=\operatorname{Gain}_{i}(h\rho^{\prime})$ . If $\operatorname{D}=\operatorname{N}$ , we simply say that $\operatorname{\mathcal{T}}$ is resistant to internal deviations.
2.

The tree $\operatorname{\mathcal{T}}$ is resistant to external deviations if for all $hu\in\operatorname{\mathcal{T}}$ with $u\in V_{i}$ and $i\not\in\operatorname{Visit}(hu)$ , if there exists $u^{\prime}\in\operatorname{Succ}(u)$ such that $huu^{\prime}\notin\operatorname{\mathcal{T}}$ and player $i$ has a winning strategy from $u^{\prime}$ (against the coalition of the other players), then for all plays $\rho\in\operatorname{\mathcal{T}}^{\infty}_{\restriction hu}$ , $\operatorname{Gain}_{i}(\rho)=1$ .
3.

The tree $\operatorname{\mathcal{T}}$ is good if it is resistant to internal and external deviations.

The resistance to internal and external deviations leads to the characterization of outcomes of permissive NEs (Theorem 5): given a tree $\operatorname{\mathcal{T}}$ , there exists a permissive NE whose outcomes are the plays corresponding to the infinite branches of $\operatorname{\mathcal{T}}$ iff $\operatorname{\mathcal{T}}$ is good.

Theorem 5.

Let $\operatorname{\mathcal{T}}$ be a tree over $(\operatorname{\mathcal{G}},v_{0})$ rooted at $v_{0}$ . The following assertions are equivalent:

1.

There exists a permissive NE $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that $\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}=\operatorname{% \mathcal{T}}$ ;
2.

The tree $\operatorname{\mathcal{T}}$ is good.

$\blacktriangleright$ Remark 6.

For all multi-strategies $\operatorname{\Theta}$ , and all players $i\in\operatorname{N}$ , the penalty $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})$ is equal to the penalty of player $i$ in the good tree $\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}$ , i.e., $\operatorname{Penalty}_{i}(\langle\operatorname{\Theta}\rangle_{v_{0}}^{% \mathrm{H}})$ . The construction of Theorem 5 thus also preserves the main penalties.

Proof sketch.

For $(1\Rightarrow 2)$ , let us assume that $\operatorname{\Theta}$ is a permissive NE and that $\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}=\operatorname{\mathcal{T}}$ . We have to prove that $\operatorname{\mathcal{T}}$ is good. If $\operatorname{\mathcal{T}}$ is not resistant to internal deviations, then from some vertex $v$ owned by player $i$ there exist two plays $\rho$ , crossing $u$ , and $\rho^{\prime}$ , crossing $u^{\prime}\neq u$ , such that $\rho$ is winning for player $i$ and $\rho^{\prime}$ is losing for player $i$ , see Figure 4(a). In particular, we can build a strategy profile $\sigma$ consistent with $\operatorname{\Theta}$ such that $\langle\sigma_{\restriction h}\rangle_{v}=\rho^{\prime}$ and $\langle\sigma_{\restriction hv}\rangle_{u}=\rho_{\geq 1}$ . Player $i$ then has an incentive to deviate by choosing $u$ instead of $u^{\prime}$ from $v$ , so $\sigma$ is not an NE and $\operatorname{\Theta}$ is not a permissive NE. If $\operatorname{\mathcal{T}}$ is not resistant to external deviations, then from some vertex $u$ of player $i$ there exist a play $\rho$ such that $\operatorname{Gain}_{i}(\rho)=0$ and a successor $u^{\prime}$ of $u$ outside $\operatorname{\mathcal{T}}$ from which player $i$ can win, see Figure 4(b). Thus we can build a strategy profile $\sigma$ consistent with $\operatorname{\Theta}$ such that $\langle\sigma\rangle_{v_{0}}=h\rho$ . Player $i$ then has an incentive to go to $u^{\prime}$ and follow a winning strategy from there, so $\sigma$ is not an NE and $\operatorname{\Theta}$ is not a permissive NE.

For $(2\Rightarrow 1)$ , let us assume that $\operatorname{\mathcal{T}}$ is a good tree. We build a permissive NE $\operatorname{\Theta}$ such that its outcomes are the plays corresponding to the infinite branches of $\operatorname{\mathcal{T}}$ . Additionally, if a player $i$ deviates from $\operatorname{\mathcal{T}}$ , the coalition of the other players plays its retaliation strategy to prevent player $i$ from deviating. (This retaliation strategy corresponds to the winning strategy of player $2$ in a two-player zero-sum reachability game in which player $1$ is player $i$ and wants to reach $\operatorname{F}_{i}$ and player $2$ is the coalition of the other players and wants to avoid visiting $\operatorname{F}_{i}$ [15, Chapter 2].) $\hfill\blacktriangleleft$

Figure 4: Examples of trees that do not respect: (a) the resistance to internal deviations since $\operatorname{Gain}_{i}(\rho^{\prime})=0$ but $\operatorname{Gain}_{i}(\rho)=1$ ; (b) the resistance to external deviations since $\operatorname{Gain}_{i}(\rho)=0$ but player $i$ can win from $u^{\prime}$ .
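Both the external-deviation condition of Definition 4 and the retaliation strategy used in the proof of Theorem 5 rely on knowing from which vertices player $i$ can win against the coalition of the other players. For reachability objectives this set can be obtained with the classical attractor computation for two-player zero-sum reachability games. The Python sketch below illustrates that standard technique; it is not taken from the paper, and the arena encoding (`succ`, `owner`) and the example arena are hypothetical. The sketch assumes that every vertex has at least one successor.

```python
def attractor(vertices, succ, owner, player, target):
    """Vertices from which `player` can force a visit to `target`,
    whatever the coalition of the other players does."""
    win = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in win:
                continue
            if owner[v] == player:
                # The player needs one successor that is already winning.
                ok = any(u in win for u in succ[v])
            else:
                # The coalition must be unable to escape the winning set.
                ok = bool(succ[v]) and all(u in win for u in succ[v])
            if ok:
                win.add(v)
                changed = True
    return win


# Hypothetical arena: a and d belong to player 1, b and c to player 2.
vertices = ["a", "b", "c", "d"]
succ = {"a": ["b", "c"], "b": ["a", "d"], "c": ["c"], "d": ["d"]}
owner = {"a": 1, "b": 2, "c": 2, "d": 1}

# Winning region of player 1 for the (hypothetical) target set {c}.
winning_1 = attractor(vertices, succ, owner, 1, {"c"})
# From the remaining vertices the coalition can keep player 1 away from c.
retaliation_region = set(vertices) - winning_1
```

Here `winning_1` is `{"a", "c"}`: player 1 can move from `a` to `c`, whereas from `b` the coalition can escape to `d`. On `retaliation_region` the coalition has a strategy that never leaves the region, which is the kind of strategy used to punish a deviator.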

4.2 Characterization of permissive subgame perfect equilibria

Permissive subgame perfect equilibria are intrinsically more complex than permissive Nash equilibria. Thus their characterization cannot only rely on the outcomes from the initial vertex, it should also take into account the outcomes in all subgames. This is the reason why, in order to deal with a compact representation of outcomes of a permissive SPE and its subgames, we introduce the notion of forest. Then, we generalize the definition of good trees to define good forests needed to characterize SPEs instead of NEs.

Forests and penalties of forests.

Trees of the forest are indexed by tuples $(i,v,\operatorname{I})\in\operatorname{N}\times V\times 2^{\operatorname{N}}$ . More precisely, we let

\mathcal{I}=\{(0,v_{0},\operatorname{I}_{0})\}\cup\{(i,v,\operatorname{I})\in\operatorname{N}\times V\times 2^{\operatorname{N}}\mid\exists hv^{\prime}\in\operatorname{Hist}_{i}(v_{0})\text{ s.t. }v\in\operatorname{Succ}(v^{\prime})\,\wedge\,\operatorname{I}=\operatorname{Visit}(hv^{\prime}v)\}

where $\operatorname{I}_{0}=\{i\in\operatorname{N}\mid v_{0}\in F_{i}\}$ . Apart from the special tuple $(0,v_{0},\operatorname{I}_{0})$ , a tuple $(i,v,\operatorname{I})$ represents the fact that $v$ is a vertex played by player $i$ and reachable from $v_{0}$ , and that all players in $\operatorname{I}$ have already seen their target when $v$ is reached. A forest in $(\operatorname{\mathcal{G}},v_{0})$ is thus a set of trees $\operatorname{\mathcal{F}}=\{\operatorname{\mathcal{T}}_{i,v,I}\mid(i,v,% \operatorname{I})\in\mathcal{I}\}$ such that $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ is a tree without leaves over $\operatorname{\mathcal{G}}$ rooted at $v$ . The intuition behind this object is that the tree $\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}$ represents the outcomes of a multi-strategy $\operatorname{\Theta}$ and the other trees $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ represent the outcomes of $\operatorname{\Theta}_{\restriction hv^{\prime}}$ in the subgames $(\operatorname{\mathcal{G}}_{\restriction hv^{\prime}},v)$ for all $hv^{\prime}\in\operatorname{Hist}_{i}(v_{0})$ such that $\operatorname{Visit}(hv^{\prime}v)=\operatorname{I}$ .
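Although histories are unboundedly long, the index set $\mathcal{I}$ is finite and can be enumerated by exploring pairs made of a vertex and the set of players that have already visited their target. The Python sketch below illustrates this under stated assumptions: it reads $\operatorname{Hist}_{i}(v_{0})$ as the histories starting in $v_{0}$ and ending in a vertex of player $i$, and the arena encoding (`succ`, `owner`, `targets`) and the example are hypothetical.

```python
from collections import deque


def index_set(v0, succ, owner, targets):
    """Enumerate the tuples (i, v, I) indexing the trees of a forest.

    `targets` maps each player to its target set F_i.
    """
    def hit(v):
        # Players whose target set contains v.
        return frozenset(i for i, F in targets.items() if v in F)

    I0 = hit(v0)
    indices = {(0, v0, I0)}      # special index of the main tree
    seen = {(v0, I0)}            # pairs (Last(h), Visit(h)) already explored
    queue = deque(seen)
    while queue:
        v_prime, visited = queue.popleft()
        i = owner[v_prime]       # player choosing the successor of v_prime
        for v in succ[v_prime]:
            new_visited = visited | hit(v)
            indices.add((i, v, new_visited))
            if (v, new_visited) not in seen:
                seen.add((v, new_visited))
                queue.append((v, new_visited))
    return indices


# Hypothetical arena: a belongs to player 1, b and c to player 2.
succ = {"a": ["b", "c"], "b": ["a"], "c": ["c"]}
owner = {"a": 1, "b": 2, "c": 2}
targets = {1: {"c"}, 2: {"b"}}

indices = index_set("a", succ, owner, targets)
```

The exploration visits at most $|V|\cdot 2^{|\operatorname{N}|}$ pairs, which matches the bound on the number of trees in a forest suggested by the definition of $\mathcal{I}$. In the example, `indices` contains seven tuples, among them the special index `(0, "a", frozenset())`.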

Moreover, the main and retaliation penalties of a forest $\operatorname{\mathcal{F}}$ for a player $i\in\operatorname{N}$ are respectively given by

\operatorname{MPenalty}_{i}(\operatorname{\mathcal{F}})=\operatorname{Penalty}_{i}(\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}})\quad\text{and}\quad\operatorname{RPenalty}_{i}(\operatorname{\mathcal{F}})=\sup_{\begin{subarray}{c}\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}\in\operatorname{\mathcal{F}}\\ (i,v,\operatorname{I})\in\operatorname{Out}\end{subarray}}\operatorname{Penalty}_{i}(\operatorname{\mathcal{T}}_{i,v,\operatorname{I}})

where $\operatorname{Out}=\{(i,v,\operatorname{I})\in\mathcal{I}\setminus\{(0,v_{0},\operatorname{I}_{0})\}\mid\exists hv\in\operatorname{Hist}(v_{0})\text{ s.t. }hv\not\in\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}\,\wedge\,\operatorname{Last}(h)\in V_{i}\,\wedge\,\operatorname{I}=\operatorname{Visit}(hv)\}$ describes the indices of trees in the forest that are deviations from the main tree $\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}$ . If $\operatorname{Out}$ is empty, we let $\operatorname{RPenalty}_{i}(\operatorname{\mathcal{F}})=0$ .

Characterization.

Following the same philosophy as for permissive NEs, a forest is good if each tree $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ of the forest satisfies two properties. The first one is that $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ has to be $(\operatorname{N}\setminus\operatorname{I})$ -resistant to internal deviations, exactly as for permissive NEs except that we take into account players who have already visited their target set, i.e., players in $\operatorname{I}$ . The second one, called resistance to constrained external deviations, means that at any node $h u$ of the tree such that $u$ belongs to player $j$ , if player $j$ has the possibility to jump to another tree $\operatorname{\mathcal{T}}_{j,u^{\prime},\operatorname{I}^{\prime}}$ by playing to a successor $u^{\prime}$ not in the tree and if there exists a play in this latter tree which is winning for player $j$ , then all plays after $h u$ in $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ have to be winning for player $j$ .

Definition 7 (Good forest).

Let $\operatorname{\mathcal{F}}$ be a forest in $(\operatorname{\mathcal{G}},v_{0})$ .

1.

A tree $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}\in\operatorname{\mathcal{F}}$ is resistant to constrained external deviations if it satisfies the following property: for all $hu\in\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ and $j\in\operatorname{N}$ such that we have that (i) $u\in V_{j}$ and $j\not\in I\cup\operatorname{Visit}(hu)$ and (ii) there exists $u^{\prime}\in\operatorname{Succ}(u)$ such that $huu^{\prime}\not\in\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ , if there exists $\rho^{\prime}\in\operatorname{\mathcal{T}}^{\infty}_{j,u^{\prime},% \operatorname{I}^{\prime}}$ , where $\operatorname{I}^{\prime}=\operatorname{I}\cup\operatorname{Visit}(huu^{\prime})$ , such that $\operatorname{Gain}_{j}(\rho^{\prime})=1$ , then for all $\rho\in\operatorname{\mathcal{T}}^{\infty}_{i,v,\operatorname{I}\restriction hu}$ , $\operatorname{Gain}_{j}(\rho)=1$ .
2.

The forest $\operatorname{\mathcal{F}}$ is good if each tree $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}\in\operatorname{\mathcal{F}}$ is $(\operatorname{N}\setminus\operatorname{I})$ -resistant to internal deviations (see (1) in Definition 4) and resistant to constrained external deviations.

Thanks to good forests, we are able to characterize the outcomes of permissive SPEs: given a tree $\operatorname{\mathcal{T}}^{*}$ , there exists a permissive SPE such that its outcomes correspond to $\operatorname{\mathcal{T}}^{*}$ iff there exists a good forest whose “main” tree is $\operatorname{\mathcal{T}}^{*}$ , i.e., $\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}=\operatorname{\mathcal{T}}^{*}$ . With some additional constraints, this characterization also preserves the strongly (resp. weakly) winning property and the penalty bounds.

Theorem 8.

Let $m\in(\mathbb{N}\cup\{\infty\})^{n}$ and $r\in(\mathbb{N}\cup\{\infty\})^{n}$ be upper thresholds. Let $\operatorname{\mathcal{T}}^{*}$ be a tree rooted at $v_{0}$ and $\operatorname{Win}\subseteq\operatorname{N}$ be a set of players. The following assertions are equivalent:

1.

There exists a permissive SPE $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that:

(a) $\langle\operatorname{\Theta}\rangle_{v_{0}}^{\mathrm{H}}=\operatorname{\mathcal{T}}^{*}$ ;

(b) $\operatorname{\Theta}$ is strongly winning w.r.t. $\operatorname{Win}$ ;

(c) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})\leq r_{i}$ .
2.

There exists a good forest $\operatorname{\mathcal{F}}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that:

(a) $\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}=\operatorname{\mathcal{T}}^{*}$ ;

(b) for all $\rho\in\operatorname{\mathcal{T}}^{\infty}_{0,v_{0},\operatorname{I}_{0}}$ , for all $i\in\operatorname{Win}$ , $\operatorname{Gain}_{i}(\rho)=1$ ;

(c) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\mathcal{F}})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\operatorname{\mathcal{F}})\leq r_{i}$ .

These assertions are still equivalent by replacing 1b by “ $\operatorname{\Theta}$ is weakly winning w.r.t. $\operatorname{Win}$ ” and 2b by “there exists $\rho\in\operatorname{\mathcal{T}}^{\infty}_{0,v_{0},\operatorname{I}_{0}}$ such that for all $i\in\operatorname{Win}$ , $\operatorname{Gain}_{i}(\rho)=1$ ”.

Proof sketch.

For $(1\Rightarrow 2)$ , let us assume that $\operatorname{\Theta}$ is a permissive SPE. We build a good forest $\operatorname{\mathcal{F}}$ such that $\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}$ is the outcomes of $\operatorname{\Theta}$ , i.e., $\operatorname{\mathcal{T}}_{0,v_{0},\operatorname{I}_{0}}=\langle\operatorname% {\Theta}\rangle_{v_{0}}^{\mathrm{H}}$ , and a tree $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ is a representative of the outcomes of $\operatorname{\Theta}_{\restriction hv^{\prime}}$ in some subgame $(\operatorname{\mathcal{G}}_{\restriction hv^{\prime}},v)$ such that $v^{\prime}\in V_{i}$ and $\operatorname{Visit}(hv^{\prime}v)=\operatorname{I}$ . In order to obtain a good forest and since several $hv^{\prime}v$ could satisfy those properties, each representative $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ has to be chosen in a proper way: it has to minimize the maximal gain of player $i$ for plays in $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ . More formally, for each $(i,v,\operatorname{I})\in\mathcal{I}$ , we let $\mathcal{O}(i,v,\operatorname{I})=\{\langle\operatorname{\Theta}_{\restriction hv% ^{\prime}}\rangle_{v}^{\mathrm{H}}\mid hv^{\prime}v\in\operatorname{Hist}(v_{0% })\,\wedge\,v^{\prime}\in V_{i}\,\wedge\,\operatorname{I}=\operatorname{Visit}% (hv^{\prime}v)\}$ and we choose $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}\in\mathcal{O}(i,v,% \operatorname{I})$ such that $\max\{\operatorname{Gain}_{i}(\rho)\mid\rho\in\operatorname{\mathcal{T}}^{% \infty}_{i,v,\operatorname{I}}\}=\min_{\operatorname{\mathcal{T}}\in\mathcal{O% }(i,v,\operatorname{I})}\max\{\operatorname{Gain}_{i}(\rho)\mid\rho\in% \operatorname{\operatorname{\mathcal{T}}^{\infty}}\}$ .

Thanks to this latter property, $\operatorname{\mathcal{F}}$ is good. Indeed, let $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ be a tree of $\operatorname{\mathcal{F}}$ . Exactly as for permissive NEs, if $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ is not $(\operatorname{N}\setminus\operatorname{I})$ -resistant to internal deviations, we can build a strategy profile $\sigma$ consistent with $\operatorname{\Theta}$ such that the restriction of $\sigma$ is not an NE in a subgame corresponding to $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ . If $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ is not resistant to constrained external deviations that means that from some node $u$ , owned by player $j$ , there exists a play $\rho$ losing for player $j$ and player $j$ could choose to play outside $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ by jumping to a tree $\operatorname{\mathcal{T}}_{j,u^{\prime},\operatorname{I}^{\prime}}$ in which there exists a play $\rho^{\prime}$ winning for him, see Figure 5. Let $g$ be the history such that $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ represents the outcomes of $\operatorname{\Theta}_{\restriction g}$ in $(\operatorname{\mathcal{G}}_{\restriction g},v)$ . Notice that $\langle\operatorname{\Theta}_{\restriction gh}\rangle_{u^{\prime}}$ may be different from $\operatorname{\mathcal{T}}_{j,u^{\prime},I^{\prime}}$ . However thanks to the way in which this representative is chosen, we have that there exists a play $\rho^{\prime\prime}$ in $\langle\operatorname{\Theta}_{\restriction gh}\rangle_{u^{\prime}}$ with $\operatorname{Gain}_{j}(\rho^{\prime\prime})=1$ . Thus, we can build a strategy $\sigma$ consistent with $\operatorname{\Theta}$ such that $\langle\sigma_{\restriction gh}\rangle_{u}=\rho$ and $\langle\sigma_{\restriction ghu}\rangle_{u^{\prime}}=\rho^{\prime\prime}$ . 
This means that player $j$ could deviate by choosing $u^{\prime}$ instead of $u$ from $v$ in the subgame $(\operatorname{\mathcal{G}}_{\restriction g},v)$ , thus $\sigma$ would not be an SPE and $\operatorname{\Theta}$ not a permissive SPE.

For $(2\Rightarrow 1)$ , from a good forest $\operatorname{\mathcal{F}}$ , a multi-strategy is built such that its subgame outcomes are the trees of $\operatorname{\mathcal{F}}$ . This forms a permissive SPE because $\operatorname{\mathcal{F}}$ is good. $\hfill\blacktriangleleft$

Figure 5: Example of a forest that does not respect the resistance to constrained external deviations since $\operatorname{Gain}_{i}(\rho)=0$ but $\operatorname{Gain}_{i}(\rho^{\prime})=1$ .

For now, good trees and trees in good forests are infinite, but Section 5 will show that we can represent some trees using a finite representation (intuitively, by supposing that every branch ends with a lasso in the game). It is this finite representation of good trees and good forests that will be used to decide the constrained penalty problems for permissive NEs and permissive SPEs, thanks to the characterizations of Theorems 5 and 8.

5 Computation of permissive equilibria

Theorems 5 and 8 characterize permissive NEs and SPEs with respect to infinite tree-shaped objects. In this section, we use these characterizations in order to decide the various penalty problems defined in Section 3: we check the existence of the good infinite tree-shaped objects by checking the existence of finite symbolic representations of such objects. We start by describing this symbolic representation for a single tree, and show that there exists such a representation of polynomial size (when the penalty upper bounds are encoded in unary).

5.1 Symbolic trees and forests

Definition 9.

A symbolic tree is a pair $\mathcal{U}=(\operatorname{\mathcal{T}},f)$ with $\operatorname{\mathcal{T}}$ a finite tree (i.e., a finite subset of non-empty histories of $\operatorname{\mathcal{G}}$ ), and $f$ a function mapping each leaf $h$ of $\mathcal{U}$ to a non-empty set of successor nodes $h^{\prime}$ that are ancestors of $h$ in $\mathcal{U}$ such that $(\operatorname{Last}(h),\operatorname{Last}(h^{\prime}))\in E$ .

A symbolic tree can be unfolded into an infinite tree by repeatedly expanding the leaves of $\mathcal{U}$ using as successors the choice prescribed by $f$ . We denote by $\widetilde{\mathcal{U}}$ the infinite tree obtained by unfolding the symbolic tree $\mathcal{U}$ . Similarly, the notions of symbolic forest $\operatorname{\mathcal{F}}$ , where every tree in it is a symbolic tree, and unfolding of symbolic forest $\widetilde{\operatorname{\mathcal{F}}}$ can be defined.
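The unfolding can be illustrated by the following Python sketch, which enumerates the nodes of $\widetilde{\mathcal{U}}$ up to a given depth. It is a minimal sketch under stated assumptions, not the authors' construction: histories are encoded as tuples, `f` is a dictionary from leaves to sets of ancestors, and the ancestors in each `f[h]` are assumed to end in pairwise distinct vertices, so that an unfolded history determines its position in the symbolic tree. All names and the example are hypothetical.

```python
def unfold(tree, f, depth):
    """Histories with at most `depth` vertices in the unfolding of (tree, f)."""
    root = min(tree, key=len)            # the root is the shortest history
    children = {}
    for h in tree:
        if h != root:
            children.setdefault(h[:-1], []).append(h)

    result = set()
    # Each entry pairs an unfolded history with its position in the symbolic tree.
    frontier = [(root, root)]
    while frontier:
        hist, node = frontier.pop()
        result.add(hist)
        if len(hist) >= depth:
            continue
        # Internal nodes continue with their children; leaves jump back
        # to the ancestors prescribed by f.
        successors = children.get(node) or f[node]
        for nxt in successors:
            frontier.append((hist + (nxt[-1],), nxt))
    return result


# Hypothetical symbolic tree: the leaf (a, b) loops back to the root (a,),
# which presupposes an edge from b to a in the arena.
tree = {("a",), ("a", "b")}
f = {("a", "b"): {("a",)}}

unfolded = unfold(tree, f, 4)
```

With these inputs, `unfolded` contains the four histories `("a",)`, `("a", "b")`, `("a", "b", "a")` and `("a", "b", "a", "b")`, i.e., the prefixes of the lasso that alternates between `a` and `b`.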

In order to treat NEs and SPEs simultaneously, we introduce a new definition generalizing the resistance to external deviations and to constrained external deviations. For a vector $\gamma\in\{0,1\}^{\operatorname{N}\times V\times 2^{\operatorname{N}}}$ of gains, and a subset $D\subseteq\operatorname{N}$ of players (representing the players that have not already won at the beginning of the tree), we say that a tree $\operatorname{\mathcal{T}}$ is $(\gamma,D)$ -resistant if for all $hu\in\operatorname{\mathcal{T}}$ with $u\in V_{i}$ and $i\not\in(\operatorname{N}\setminus D)\cup\operatorname{Visit}(hu)$ , and for all $u^{\prime}\in\operatorname{Succ}(u)$ with $huu^{\prime}\notin\operatorname{\mathcal{T}}$ and $\gamma_{i,u^{\prime},(\operatorname{N}\setminus D)\cup\operatorname{Visit}(huu^{\prime})}=1$ , all plays $\rho\in\operatorname{\mathcal{T}}^{\infty}_{\restriction hu}$ satisfy $\operatorname{Gain}_{i}(\rho)=1$ .

$\blacktriangleright$ Remark 10.

The notion of $(\gamma,D)$ -resistance is close to the resistance to external deviations and constrained external deviations, so that we directly obtain from Theorems 5 and 8:

$\blacksquare$

Let $\gamma^{\operatorname{\mathcal{G}}}$ be defined as follows: for all $(i,u,\operatorname{I})$ , we let $\gamma^{\operatorname{\mathcal{G}}}_{i,u,\operatorname{I}}$ equal $1$ iff player $i$ belongs to $\operatorname{I}$ or can win from $u$ against the coalition of the other players in $\operatorname{\mathcal{G}}$ . Let $\operatorname{\mathcal{T}}$ be a tree. Then, $\operatorname{\mathcal{T}}$ is a good tree iff $\operatorname{\mathcal{T}}$ is resistant to internal deviations and $(\gamma^{\operatorname{\mathcal{G}}},\operatorname{N})$ -resistant.
$\blacksquare$

Let $\operatorname{\mathcal{F}}$ be a forest and let $\gamma^{\operatorname{\mathcal{F}}}$ be defined as follows: for all $(j,u,\operatorname{J})$ , we let $\gamma^{\operatorname{\mathcal{F}}}_{j,u,\operatorname{J}}$ equal $1$ iff player $j$ belongs to $\operatorname{J}$ or the tree $\operatorname{\mathcal{T}}_{j,u,\operatorname{J}}$ contains at least one branch with a vertex of $F_{j}$ . Then, $\operatorname{\mathcal{F}}$ is a good forest iff each tree $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ of $\operatorname{\mathcal{F}}$ is $(\operatorname{N}\setminus\operatorname{I})$ -resistant to internal deviations and $(\gamma^{\operatorname{\mathcal{F}}},\operatorname{N}\setminus\operatorname{I})$ -resistant.

The challenge in turning this remark into a decision procedure is to make the trees and forests finitely representable. We treat each tree independently of the others, and thus explain in the following proposition how to symbolically represent a single tree:

Proposition 11.

Let $\operatorname{\mathcal{T}}$ be a tree that is $D$ -resistant to internal deviations, with $D\subseteq\operatorname{N}$ . We let $\gamma\in\{0,1\}^{\operatorname{N}\times V\times 2^{\operatorname{N}}}$ be a vector of gains such that $\operatorname{\mathcal{T}}$ is $(\gamma,D)$ -resistant, and $(P_{i})_{i\in\operatorname{N}^{\prime}}$ be finite constraints on penalties for a subset $\operatorname{N}^{\prime}\subseteq\operatorname{N}$ of players. There exists a symbolic tree $\mathcal{U}$ , that is a subtree of $\operatorname{\mathcal{T}}$ , of height polynomial in the number of players and vertices of $\operatorname{\mathcal{G}}$ , and in the largest bound on penalty $P_{i}$ , such that the infinite tree $\widetilde{\mathcal{U}}$ satisfies the following properties:

1.

$\widetilde{\mathcal{U}}$ is $D$ -resistant to internal deviations;
2.

in $\widetilde{\mathcal{U}}$ , every player $i\in\operatorname{N}^{\prime}$ has a penalty at most $P_{i}$ ;
3.

$\widetilde{\mathcal{U}}$ is $(\gamma,D)$ -resistant.

Moreover, for a subset $\operatorname{Win}$ of players, if we start with $\operatorname{\mathcal{T}}$ that is strongly (respectively, weakly) winning w.r.t. $\operatorname{Win}$ , then we can make the above construction so that moreover $\widetilde{\mathcal{U}}$ is strongly (respectively, weakly) winning w.r.t. $\operatorname{Win}$ .

Figure 6: Construction of the symbolic tree.

The proof of this result proceeds in several steps, which we briefly sketch here only in the case where $\operatorname{\mathcal{T}}$ is strongly winning w.r.t. $\operatorname{Win}$ . Figure 6 depicts the notions used in the construction of the symbolic tree. First, we consider the smallest subtree of $\operatorname{\mathcal{T}}$ whose leaves are such that all players of $\operatorname{Win}$ have visited their target set: this subtree is finite by König’s lemma, since every branch of $\operatorname{\mathcal{T}}$ has such a node where all players of $\operatorname{Win}$ have won, and the tree is finitely branching. This subtree is called the core. We then consider the parts of $\operatorname{\mathcal{T}}$ outside the core, in order to complete the branches so that two conditions are fulfilled: the $D$ -resistance to internal deviations (if a player has won in a certain branch of a subtree, he must win in all of them), and the $(\gamma,D)$ -resistance (if $\gamma$ gives a constraint in the current node for a player $i$ , all the branches of this subtree should visit a target vertex of $i$ ). This extension of the core is cut into two parts: the expanded core, which ends in places where all the new players that must visit their target because of $D$ -resistance to internal deviations and $(\gamma,D)$ -resistance have indeed won; and the completion of branches, whose purpose is to find leaves of the symbolic tree where all successors can be replaced (with function $f$ ) by similar nodes in the same branch, in such a way that the lassos thus formed do not increase the penalty of players that have a finite penalty threshold. We show that these completions of branches can be chosen of polynomial length. We then compress the core and expanded core so that they also have polynomial height.

The symbolic tree $\mathcal{U}$ thus built is a subtree of $\operatorname{\mathcal{T}}$ (even if its unfolding $\widetilde{\mathcal{U}}$ is not): as a corollary, if a player $j$ has no winning play in $\operatorname{\mathcal{T}}$ , he does not have a winning play in $\mathcal{U}$ either. Hence, when we apply this proposition independently to all the trees of a forest $\operatorname{\mathcal{F}}$ to obtain a symbolic forest $\mathcal{H}$ , this remark allows us to check that the new vector $\gamma^{\widetilde{\mathcal{H}}}$ has all its components not above the corresponding ones in $\gamma^{\operatorname{\mathcal{F}}}$ (if $\gamma^{\operatorname{\mathcal{F}}}_{i,v,\operatorname{I}}=0$ then $\gamma^{\widetilde{\mathcal{H}}}_{i,v,\operatorname{I}}=0$ ). Consequently, if the tree $\operatorname{\mathcal{T}}_{i,v,\operatorname{I}}$ of the forest $\operatorname{\mathcal{F}}$ is $(\gamma^{\operatorname{\mathcal{F}}},\operatorname{N}\setminus\operatorname{I})$ -resistant, then the tree $\widetilde{\mathcal{U}}_{i,v,\operatorname{I}}$ of the symbolic forest $\mathcal{H}$ is $(\gamma^{\widetilde{\mathcal{H}}},\operatorname{N}\setminus\operatorname{I})$ -resistant.

Finally, by combining this result with Remark 10, we obtain the following corollaries that allow us to obtain the PSPACE decision procedures:

Corollary 12.

Let $m\in(\mathbb{N}\cup\{\infty\})^{n}$ be upper thresholds, and $M$ be the largest such upper threshold. The following assertions are equivalent:

1.

There exists a permissive NE $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that:

(a) $\operatorname{\Theta}$ is strongly winning w.r.t. $\operatorname{Win}$ ; (b) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})\leq m_{i}$ .
2.

There exists a symbolic tree $\widetilde{\operatorname{\mathcal{T}}}$ in $(\operatorname{\mathcal{G}},v_{0})$ of height polynomial in the number of players and vertices of $\operatorname{\mathcal{G}}$ and in $M$ , such that

(a) $\widetilde{\operatorname{\mathcal{T}}}$ is resistant to internal deviations, and $(\gamma^{\operatorname{\mathcal{G}}},\operatorname{N})$ -resistant, with $\gamma^{\operatorname{\mathcal{G}}}$ defined in Remark 10; and (b) for all $\rho\in\widetilde{\operatorname{\mathcal{T}}}^{\infty}$ and $i\in\operatorname{Win}$ , $\operatorname{Gain}_{i}(\rho)=1$ ; and (c) for all $i\in\operatorname{N}$ , $\operatorname{Penalty}_{i}(\widetilde{\operatorname{\mathcal{T}}})\leq m_{i}$ .

These assertions are still equivalent by replacing 1(a) by “ $\operatorname{\Theta}$ is weakly winning w.r.t. $\operatorname{Win}$ ” and 2(b) by “there exists $\rho\in\widetilde{\operatorname{\mathcal{T}}}^{\infty}$ such that for all $i\in\operatorname{Win}$ , $\operatorname{Gain}_{i}(\rho)=1$ ”.

Corollary 13.

Let $m\in(\mathbb{N}\cup\{\infty\})^{n}$ and $r\in(\mathbb{N}\cup\{\infty\})^{n}$ be upper thresholds, and $M$ be the largest such upper threshold. The following assertions are equivalent:

1.

There exists a permissive SPE $\operatorname{\Theta}$ in $(\operatorname{\mathcal{G}},v_{0})$ such that:

(a) $\operatorname{\Theta}$ is strongly winning w.r.t. $\operatorname{Win}$ ; and (b) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\operatorname{\Theta},v_{0})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\operatorname{\Theta},v_{0})\leq r_{i}$ .
2.

There exists a symbolic forest $\operatorname{\mathcal{F}}$ in $(\operatorname{\mathcal{G}},v_{0})$ , where each symbolic tree has a height polynomial in the number of players and vertices of $\operatorname{\mathcal{G}}$ and in $M$ , such that (a) each tree $\widetilde{\operatorname{\mathcal{T}}}_{i,v,\operatorname{I}}$ is $(\operatorname{N}\setminus\operatorname{I})$ -resistant to internal deviations, and $(\gamma^{\operatorname{\mathcal{F}}},\operatorname{N}\setminus\operatorname{I})$ -resistant, with $\gamma^{\operatorname{\mathcal{F}}}$ defined in Remark 10; (b) for all $\rho\in\widetilde{\operatorname{\mathcal{T}}}_{0,v_{0},\operatorname{I}_{0}}^{\infty}$ and $i\in\operatorname{Win}$ , $\operatorname{Gain}_{i}(\rho)=1$ ; and (c) for all $i\in\operatorname{N}$ , $\operatorname{MPenalty}_{i}(\widetilde{\operatorname{\mathcal{F}}})\leq m_{i}$ and $\operatorname{RPenalty}_{i}(\widetilde{\operatorname{\mathcal{F}}})\leq r_{i}$ .

These assertions are still equivalent by replacing 1(a) by “ $\operatorname{\Theta}$ is weakly winning w.r.t. $\operatorname{Win}$ ” and 2(b) by “there exists $\rho\in\widetilde{\operatorname{\mathcal{T}}}^{\infty}_{0,v_{0},\operatorname{I}_{0}}$ such that for all $i\in\operatorname{Win}$ , $\operatorname{Gain}_{i}(\rho)=1$ ”.

5.2 Decision problems over permissive Nash equilibria

For permissive NEs, it makes little sense to take into consideration the retaliation penalties, since the punishment after a deviation should definitely make the deviator lose whatever the penalty from now on. We thus obtain the following decision result:

Theorem 14.

The constrained penalty problem, the weakly winning with constrained penalty problem and the strongly winning with constrained penalty problem, all with infinite (and thus no) constraints on retaliation penalties and for NEs are decidable in $\mathrm{PSPACE}$ (when the penalty bounds are encoded in unary).

Proof.

We build upon Corollary 12, looking for a finite symbolic tree with the corresponding properties. We first explain how to solve the constrained penalty problem, and explain afterwards the adaptation for the two other problems. The idea is to use an alternating polynomial-time Turing machine (since $\mathrm{AP}=\mathrm{PSPACE}$ [12]) to guess a symbolic tree and to check the various constraints over it branch by branch. We describe the construction by supposing that the states of the Turing machine are split between existential states (where the machine accepts if at least one execution accepts) and universal states (where the machine accepts if all the executions accept). Existential states thus allow us to non-deterministically guess the finite symbolic tree node after node. We use a polynomial counter to keep track of the polynomially bounded height of the tree: if the counter goes over the polynomial bound, the execution of the alternating machine fails. At each node, existential states guess non-deterministically the set of successors on the working tape.

Universal states allow us to check several pieces of information on the guessed symbolic tree: the resistance to internal deviations, the constraint on the penalty for each player, and the $(\gamma^{\operatorname{\mathcal{G}}},\operatorname{N})$ -resistance, with $\gamma^{\operatorname{\mathcal{G}}}$ as in Remark 10. Notice that this vector has exponential size, but the index $\operatorname{I}$ in a triple $(i,v,\operatorname{I})$ is useless (apart from knowing if $i\in\operatorname{I}$ ), and can thus be ignored: moreover, this set $\operatorname{I}$ will be maintained along the execution of the algorithm. This vector can thus be precomputed in (deterministic) polynomial time by determining, for each player, their set of winning vertices (against the coalition of the other players) [15].
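The winning vertices mentioned above can be obtained with the classical attractor computation for two-player reachability games. The following Python function is a minimal sketch under our own assumptions: the names `winning_region`, `owner` and `targets`, and the encoding of the arena as an edge list, are illustrative and do not come from the paper. How the vector $\gamma^{\operatorname{\mathcal{G}}}$ is then read off these sets follows Remark 10 and is not reproduced here.

```python
from collections import deque

def winning_region(vertices, edges, owner, player, targets):
    """Vertices from which `player` can force a visit to `targets`
    against the coalition of all other players (classical attractor).

    Assumed encoding (illustrative): `edges` is a list of pairs (u, v),
    `owner[v]` is the player controlling vertex v.
    """
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    # For coalition vertices: number of successors not yet known to be winning.
    remaining = {v: len(succ[v]) for v in vertices}
    win = set(targets)
    queue = deque(win)
    while queue:
        v = queue.popleft()
        for u in pred[v]:
            if u in win:
                continue
            if owner[u] == player:
                # The player only needs one winning successor.
                win.add(u)
                queue.append(u)
            else:
                # The coalition must have no escape left.
                remaining[u] -= 1
                if remaining[u] == 0:
                    win.add(u)
                    queue.append(u)
    return win
```

Each edge is processed at most once, so one call runs in time linear in the size of the graph, and calling it once per player stays polynomial.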

The various checks can be performed branch by branch by keeping some pieces of information in memory, not only for the current node of the symbolic tree, but also for the whole current branch (this remains in polynomial space). Universal states are thus used to perform the checks on all the branches of the guessed tree.
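To illustrate the interplay between existential guesses and universal checks, here is a deterministic simulation of that pattern, written as a hedged Python sketch. It is not the machine of the proof: the function names, the abstract `state` carried along a branch, and the callbacks `step` and `accept` are assumptions made for the example. The recursion depth is bounded by `depth_left`, which mirrors the polynomial counter, so the simulation uses space proportional to the height bound although its running time may be exponential.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All non-empty subsets of `items`, as frozensets."""
    items = list(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def exists_tree(v, state, succ, step, accept, depth_left):
    """True iff some finite tree rooted at v, of height at most depth_left,
    passes the checks on every branch.

    step(state, v, kept) returns the updated branch state, or None if a
    check fails; accept(state, v) tells whether the branch may stop at v.
    Both callbacks are hypothetical placeholders for the checks of the proof.
    """
    if accept(state, v):
        return True                      # existential choice: stop the branch
    if depth_left == 0:
        return False                     # the counter ran out
    for kept in nonempty_subsets(succ[v]):           # existential guess
        nxt = step(state, v, kept)
        if nxt is None:
            continue
        if all(exists_tree(u, nxt, succ, step, accept, depth_left - 1)
               for u in kept):                       # universal check
            return True
    return False
```

For instance, taking the state to be a remaining penalty budget and letting `step` subtract the number of discarded successors gives a small instance of the penalty check described below.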

$\blacksquare$ Checking the penalty for player $i$ . If we have to check that the main penalty of player $i$ is bounded by a threshold $m_{i}$ (i.e., that the penalty of player $i$ over each branch is bounded by $m_{i}$ ), we keep the current penalty in memory and reject as soon as it exceeds $m_{i}$ .

$\blacksquare$ Checking the resistance to internal deviations and $(\gamma^{\operatorname{\mathcal{G}}},\operatorname{N})$ -resistance. At each node of the guessed tree, if the existential states guessed at least two successors, or depending on the vector $\gamma^{\operatorname{\mathcal{G}}}$ (for a vertex $v$ where $\gamma^{\operatorname{\mathcal{G}}}$ has value $1$ , and that has not been chosen among the set of successors), we must remember constraints on the successors: either (a) all plays in their subtrees must be winning for a certain player $i$ , or (b) none. We may add neither constraint (a) nor (b) for a certain player (if only one successor has been chosen, and the $\gamma^{\operatorname{\mathcal{G}}}$ value of all the other successors is 0). In the case where only the resistance to internal deviations applies (if at least two successors have been chosen, and the $\gamma^{\operatorname{\mathcal{G}}}$ value of all the other successors is 0), the choice of constraint (a) or (b) is guessed non-deterministically. These constraints are kept all along the guessed branch, except that constraint (a) is released as soon as a vertex of the target set of player $i$ is visited. Moreover, constraint (b) for a player $i$ forbids selecting, later on the branch, any successor that belongs to the target set of player $i$ .

$\blacksquare$ The end of the branches. The existential states decide when to stop the branch of the symbolic tree (before the counter exceeds the polynomial bound). Notice that the branch cannot stop if one of the type (a) constraints is not released. Then existential states provide the set of successors taken in the ancestors so that for players that have a finite upper threshold on their penalty, ancestors must have the same current penalty as the leaf (to ensure that their penalty does not rise to $+\infty$ in the long run).

For the strongly winning variants, universal states also check the constraint that every player of $\operatorname{Win}$ must win at the end of each branch. For the weakly winning variant, the existential states are also used to propose a branch where all players of $\operatorname{Win}$ will win. The universal states moreover check whether this condition is fulfilled for this particular branch. $\hfill\blacktriangleleft$
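The bookkeeping performed along a single branch can be sketched as follows. This Python fragment is only an illustration under explicit assumptions: we assume that the penalty charged to the owner of a vertex is the number of outgoing edges that are not retained there (the paper's Definition 1 is authoritative), and the names `Step`, `check_branch`, `add_a` and `add_b` are ours. The condition relating the penalty of a leaf to that of its ancestors is not modelled.

```python
import math
from dataclasses import dataclass

@dataclass
class Step:
    vertex: str
    kept: frozenset                  # successors retained at this node
    add_a: frozenset = frozenset()   # players that must win on all plays below
    add_b: frozenset = frozenset()   # players that must win on no play below

def check_branch(steps, owner, succ, targets, bound):
    """Check one branch: penalty bounds and constraints of type (a) and (b).

    Assumption (illustrative): each discarded edge costs 1 to the owner of
    the vertex. `bound[p]` may be math.inf for an unconstrained player.
    """
    penalty = {p: 0 for p in bound}
    pending_a, active_b, reached = set(), set(), set()
    for s in steps:
        v = s.vertex
        here = {p for p, t in targets.items() if v in t}
        if here & active_b:
            return False             # a type (b) constraint is violated
        reached |= here
        pending_a -= here            # type (a) constraints are released
        if not s.kept <= set(succ[v]):
            return False             # kept successors must be real successors
        p = owner[v]
        penalty[p] += len(succ[v]) - len(s.kept)
        if penalty[p] > bound[p]:
            return False             # the threshold of player p is exceeded
        if s.add_b & reached:
            return False             # cannot forbid a win already obtained
        active_b |= s.add_b
        pending_a |= s.add_a - reached
    # The branch may stop only if no type (a) constraint is still pending.
    return not pending_a
```

In the alternating machine, one such check is run on each branch selected by the universal states, so only the data of the current branch has to be stored.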

5.3 Decision problems over permissive subgame perfect equilibria

Theorem 15.

The constrained penalty problem, the weakly winning with constrained penalty problem and the strongly winning with constrained penalty problem for SPEs are decidable in $\mathrm{PSPACE}$ (when the penalty bounds are encoded in unary).

Proof.

The proof is the same as for NEs, except that we use Corollary 13, with a vector $\gamma^{\operatorname{\mathcal{F}}}$ that is partially guessed non-deterministically when it is needed. When the existential states extend a branch of the tree $\operatorname{\mathcal{T}}$ from a vertex of player $i$ , the universal states do not only explore the chosen successors (with constraints (a) or (b) as in the previous proof), but now also explore the other vertices $u$ by starting a fresh exploration of another tree $\operatorname{\mathcal{T}}_{i,u,\operatorname{I}}$ of the forest. Existential states also non-deterministically guess whether player $i$ is weakly winning in $\operatorname{\mathcal{T}}_{i,u,\operatorname{I}}$ . If so, this gives new constraints (a) in the tree $\operatorname{\mathcal{T}}$ . The guessed weakly winning constraints are then checked in the fresh exploration: if player $i$ must be weakly winning, this is a constraint of the same type as a weakly winning constraint in the “main” tree; if player $i$ must not be weakly winning, this is a constraint of type (b) (none of the plays must be winning for player $i$ ) that we deal with as before.

Main penalties are checked as before. For the retaliation penalties, for each player, we check that the total penalty of all new symbolic trees $\operatorname{\mathcal{T}}_{i,u,\operatorname{I}}$ is below the given upper threshold. To ensure polynomial-time termination, we maintain a polynomial counter, and the set of trees (more precisely, the set of triples $(i,u,\operatorname{I})$ used to index the trees of the forest) we jumped in so far. The polynomial counter again takes care of the depth of the branch we explore in the current tree (we reset this counter when we jump from a tree to another one). The set of trees we jumped in so far is maintained to forbid several explorations of the same tree of the forest. As for NEs, the exploration is losing if the depth of the current branch is longer than the polynomial bound. The cardinality of the set of triples $(i,u,\operatorname{I})$ we must maintain is also polynomial (bounded by $|\operatorname{N}|\times|V|\times|\operatorname{N}|$ , even though there are exponentially many trees in a forest), since the subset $\operatorname{I}$ of winning players does not decrease along the jumps from a tree to the next one. This also implies that the total length of the executions of the Turing machine is indeed polynomial.

Notice that weakly and strongly winning conditions have only to be checked on the “main” tree as for permissive NEs. $\hfill\blacktriangleleft$

6 Conclusion

We studied permissiveness in Nash equilibria and subgame perfect equilibria of multiplayer reachability games. We showed that several associated problems are decidable in PSPACE: they ask for the existence of such equilibria with various constraints, both on the set of players who reach their target set, and on the penalties that allow us to compare the permissiveness of two equilibria. The polynomial space depends on the size of the game, and the largest upper threshold on the penalties. We were not able to decrease the space dependency to be only polynomial in the logarithm of the penalty thresholds: we leave it to future work to investigate whether this is possible, or whether there is a matching lower bound on complexity.

As other ideas for future work, we would like to extend our study to objectives other than reachability, like more general $\omega$ -regular objectives (e.g., parity games), but also weighted games like mean-payoff games, discounted-payoff games, or shortest-path games (where the reachability objective is combined with an objective to reach the target with the smallest possible total weight). An even more challenging problem is to extend this study to the setting of timed games, where the permissiveness is not only on the choice of edges, but also on the choice of delays spent in a given vertex. Work along these lines has been carried out on timed automata and two-player timed games [4, 13].

References

[1] Ashwani Anand, Satya Prakash Nayak, and Anne-Kathrin Schmuck. Synthesizing permissive winning strategy templates for parity games. In CAV 2023, volume 13964 of LNCS, pages 436–458. Springer, 2023. doi:10.1007/978-3-031-37706-8_22.
[2] Julien Bernet, David Janin, and Igor Walukiewicz. Permissive strategies: from parity games to safety games. RAIRO Theor. Informatics Appl., 36(3):261–275, 2002. doi:10.1051/ITA:2002013.
[3] Patricia Bouyer, Marie Duflot, Nicolas Markey, and Gabriel Renault. Measuring permissivity in finite games. In CONCUR 2009, volume 5710 of LNCS, pages 196–210. Springer, 2009. doi:10.1007/978-3-642-04081-8_14.
[4] Patricia Bouyer, Erwin Fang, and Nicolas Markey. Permissive strategies in timed automata and games. Electron. Commun. Eur. Assoc. Softw. Sci. Technol., 72, 2015. doi:10.14279/TUJ.ECEASST.72.1015.
[5] Patricia Bouyer, Nicolas Markey, Jörg Olschewski, and Michael Ummels. Measuring permissiveness in parity games: Mean-payoff parity games revisited. In ATVA 2011, volume 6996 of LNCS, pages 135–149. Springer, 2011. doi:10.1007/978-3-642-24372-1_11.
[6] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. On the Complexity of SPEs in Parity Games. In CSL 2022, volume 216 of LIPIcs, pages 10:1–10:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.CSL.2022.10.
[7] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. The Complexity of SPEs in Mean-Payoff Games. In ICALP 2022, volume 229 of LIPIcs, pages 116:1–116:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.ICALP.2022.116.
[8] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. Subgame-perfect equilibria in mean-payoff games. Logical Methods in Computer Science, 19, 2023. doi:10.46298/LMCS-19(4:6)2023.
[9] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, and Jean-François Raskin. Constrained existence problem for weak subgame perfect equilibria with $\omega$ -regular boolean objectives. In GandALF 2018, volume 277 of EPTCS, pages 16–29, 2018. doi:10.4204/EPTCS.277.2.
[10] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, Jean-François Raskin, and Marie van den Bogaard. The complexity of subgame perfect equilibria in quantitative reachability games. In CONCUR 2019, volume 140 of LIPIcs, pages 13:1–13:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.CONCUR.2019.13.
[11] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, and Nathan Thomasset. On relevant equilibria in reachability games. In RP 2019, volume 11674 of LNCS, pages 48–62. Springer, 2019. doi:10.1007/978-3-030-30806-3_5.
[12] Ashok K. Chandra, Dexter C. Kozen, and Larry J. Stockmeyer. Alternation. Journal of the ACM, 28(1):114–133, 1981. doi:10.1145/322234.322243.
[13] Emily Clement, Thierry Jéron, Nicolas Markey, and David Mentré. Computing maximally-permissive strategies in acyclic timed automata. In FORMATS 2020, volume 12288 of LNCS, pages 111–126. Springer, 2020. doi:10.1007/978-3-030-57628-8_7.
[14] Aline Goeminne and Benjamin Monmege. Permissive equilibria in multiplayer reachability games. Technical Report 2411.13296, arXiv, 2024. doi:10.48550/arXiv.2411.13296.
[15] Erich Grädel, Wolfgang Thomas, and Thomas Wilke, editors. Automata, Logics, and Infinite Games: A Guide to Current Research [outcome of a Dagstuhl seminar, February 2001], volume 2500 of LNCS. Springer, 2002. doi:10.1007/3-540-36387-4.
[16] Satya Prakash Nayak and Anne-Kathrin Schmuck. Most general winning secure equilibria synthesis in graph games. In TACAS 2024, volume 14572 of LNCS, pages 173–193. Springer, 2024. doi:10.1007/978-3-031-57256-2_9.
[17] Michael Ummels. Rational behaviour and strategy construction in infinite multiplayer games. In FSTTCS 2006, volume 4337 of LNCS, pages 212–223. Springer, 2006. doi:10.1007/11944836_21.

