
The Non-Cooperative Rational Synthesis Problem for SPEs and ω-Regular Objectives

Véronique Bruyère, Université de Mons, UMONS, Belgium; Jean-François Raskin, Université libre de Bruxelles, ULB, Belgium; Alexis Reynouard, Université libre de Bruxelles, ULB, Belgium; Marie Van Den Bogaard, Université Gustave Eiffel, CNRS, LIGM, Marne-la-Vallée, France
Abstract

This paper studies the rational synthesis problem for multi-player games played on graphs when rational players follow subgame perfect equilibria. In these games, one player, the system, declares his strategy upfront, and the other players, composing the environment, then rationally respond by playing strategies forming a subgame perfect equilibrium. We study the complexity of the rational synthesis problem when the players have ω-regular objectives encoded as parity objectives. Our algorithm is based on an encoding into a three-player game with imperfect information, showing that the problem is in 2ExpTime. When the number of environment players is fixed, the problem is in ExpTime and is NP- and coNP-hard. Moreover, for a fixed number of players and reachability objectives, we obtain a polynomial-time algorithm.

Keywords and phrases:
non-zero-sum games, subgame perfect equilibria, rational synthesis
Funding:
Véronique Bruyère: supported by FNRS under PDR Grant T.0023.22 (Rational).
Jean-François Raskin: supported by Fondation ULB (https://www.fondationulb.be/en/), the Thelam fund 2024-F2150080-0021312 (Open Problems on the Decidability and Computational Complexity of Infinite Duration Games), and by FNRS under PDR Grant T.0023.22 (Rational).
Copyright and License:
© Véronique Bruyère, Jean-François Raskin, Alexis Reynouard, and Marie Van Den Bogaard; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation
Related Version:
Full Version: https://arxiv.org/abs/2412.08547 [19]
Editors:
Patricia Bouyer and Jaco van de Pol

1 Introduction

Studying infinite-duration non-zero-sum games played on graphs with multiple players [4, 13] poses both theoretical and algorithmic challenges. This paper primarily addresses the (non-cooperative) rational synthesis problem for n-player non-zero-sum games featuring ω-regular objectives. In this context, the goal is to algorithmically determine the existence of a strategy σ₀ for the system (also called player 0) to enforce his objective against any rational response from the environment (players 1, …, n). Rational synthesis thus supports the automatic synthesis of systems whose environment consists of multiple agents, each having their own objective. These agents are assumed to act rationally towards their own objective rather than being strictly antagonistic to the system. This approach contrasts with the simpler scenario of zero-sum two-player game graphs, the fully antagonistic setting, a framework extensively explored in earlier reactive synthesis research; see [2] and the numerous references therein. While the computational complexity of rational synthesis, where rationality is defined by the concept of Nash equilibrium (NE), has been explored in [26], this paper revisits the rational synthesis problem using the more encompassing notion of subgame perfect equilibrium (SPE) to formalize rationality. NEs have a known limitation in sequential games, including the infinite-duration games on graphs that we consider here: they are prone to non-credible threats. Such threats involve decisions within subgames (potentially reached after a deviation from the equilibrium) that, while not rational, are intended to coerce other players into specific behaviors. To address this limitation, the concept of SPE was introduced, as discussed in [39]. An SPE is a profile of strategies that forms an NE in every subgame, thereby preventing non-rational, and thus non-credible, threats.
Although SPEs align more intuitively with sequential games, their algorithmic treatment in the context of infinite-duration graph games remains underdeveloped. This gap persists primarily because SPEs require more complex algorithmic techniques than NEs. Moreover, the standard backward induction method used in finite-duration sequential games [33] cannot be directly applied to infinite-duration games due to the non-terminating nature of these interactions.

Kupferman et al. introduced rational synthesis in two distinct forms. The first approach, dubbed cooperative rational synthesis [29], considers an environment working collaboratively with the system: the goal is to determine whether there exists an SPE that ensures the specification of the system is met. So, in this model, the agents of the environment engage in an SPE that guarantees a win for player 0, provided such an equilibrium exists. In contrast, the second approach, termed non-cooperative rational synthesis (NCRS) [34], grants the environment greater flexibility. Here, the system first selects a fixed strategy σ₀, and then the environment agents respond by selecting any strategies that form an SPE with the fixed strategy σ₀ of the system. The central algorithmic challenge is determining whether there exists a strategy σ₀ for the system such that every resulting σ₀-fixed SPE ensures that the specification of the system is upheld (the SPE-NCRS problem). The computational complexity of the cooperative synthesis problem with SPEs is now well understood: while the decidability of this problem was first established in [43], its exact complexity was resolved in [6], where the problem was shown to be NP-complete for parity objectives.

In contrast, the computational complexity of the SPE-NCRS problem remains less thoroughly investigated. Although the decidability of this problem can be established through an encoding into Strategy Logic [38], such an encoding provides no clear insight into the effective construction of the strategy σ₀ and is suboptimal from an algorithmic standpoint. For example, for rational environment behaviors modeled by NEs instead of SPEs, the NE-NCRS problem can be solved in 3ExpTime for LTL objectives using a Strategy Logic encoding [34], while it can be solved in 2ExpTime through the use of tree automata [35]. This shows that a reduction to Strategy Logic does not deliver optimal worst-case complexity. Addressing this gap, our contribution is twofold. First, we introduce a new algorithm that transforms the SPE-NCRS problem into a three-player imperfect-information game. Such games were studied in [22], where computationally optimal algorithms for their analysis were presented. Second, our reduction offers a clear advantage over the Strategy Logic encoding, providing improved complexity (double-exponential time as opposed to triple-exponential time for LTL objectives in [34]). It also enables a precise analysis of the algorithmic complexity when the number of players is fixed, a consideration that is practically relevant in cases where the environment consists of a limited number of players.

Figure 1: Structure of the article.

Technical contributions and sketch of our solution.

The summary of our technical contributions is depicted in Figure 1. Our main result shows how to transform the SPE-NCRS problem into a two-player zero-sum parity game with perfect information (Theorem 11), a well-studied class of games for which algorithms are available. (Thus, our algorithm supports LTL specifications via their translation into deterministic parity automata, yielding parity conditions on the game graph). The transformation is structured in several non-trivial steps.

First, to solve the SPE-NCRS problem, we propose to use the Prover-Challenger framework, initially introduced for the development of algorithms capable of determining the presence of a simulation relation between transition systems (for more details and extensions of this concept, see e.g. [1, 32, 37]). However, in our context, we need two Provers P1 and P2 together with the Challenger C: P1 aims to demonstrate the existence of a solution to the SPE-NCRS problem, i.e., a strategy σ₀ for the system, while C seeks to counter this assertion, i.e., to exhibit a subgame perfect response σ̄₋₀ that results in an unfavorable outcome for player 0. Then, P2 endeavors to demonstrate that the combined profile (σ₀, σ̄₋₀) is either not a 0-fixed SPE or has an outcome that favors player 0. To ensure that σ₀ is fixed and cannot be modified in subgames, we prevent P1 from adjusting his strategy based on the interactions between C and P2 by imposing imperfect information on him. More intuition and the formal definition of this P1CP2 game are given in Section 3, together with a proof of correctness (Theorem 20).

Second, we detail our method for solving the P1CP2 game. Given that this game involves three players with imperfect information, specialized techniques are essential for its resolution, as multi-player games with imperfect information are undecidable in general (see e.g. [40]). We employ a solution specifically adapted to our context, derived from a transformation introduced in [22], which was originally proposed for addressing similar types of three-player games with imperfect information. Due to the intricate winning condition present in our Prover-Challenger reduction, we first translate it into an explicit Rabin objective. After that transformation of the winning condition, the aforementioned techniques for three-player games with imperfect information can be adapted, and we obtain a more conventional two-player zero-sum Rabin game with imperfect information.

Third, techniques to remove imperfect information, see e.g. [41, 42], can then be used to obtain the desired two-player zero-sum parity game with perfect information. Therefore, solving the SPE-NCRS problem reduces to solving this two-player zero-sum parity game for which algorithms are well-known.

In Section 4, we provide a complexity analysis of each step of our construction. This analysis enables us to derive detailed complexity results: solving the SPE-NCRS problem for parity objectives is exponential in the size of the graph and the number of priorities used by the parity objectives, and double-exponential in the number of players (while a PSpace lower bound can be deduced from [26]). Consequently, when the number of players is fixed, we achieve membership in ExpTime. Furthermore, by adapting proofs from the NE-NCRS problem [27], we establish NP- and co-NP-hardness (the NE-NCRS problem is in PSpace and both NP- and co-NP-hard, even for a fixed number of players). Finally, for the specific case of reachability objectives, we obtain a polynomial-time algorithm when the number of players is fixed, as is the case for the NE-NCRS problem [27].

To help the reader navigate the paper, we provide in Figure 1 a visual representation of its structure. Furthermore, for additional details on concepts and proofs, we refer to the full version [19] and to Appendices A and B, which present a few helpful examples.

Related work.

Recent literature, such as the surveys [4, 13, 14], underscores a growing interest in non-zero-sum games for the synthesis and design of multi-agent systems. Algorithms have been developed for reasoning about NEs in graph games, both for ω-regular [44] and quantitative objectives [12], and even in a concurrent setting [3]. The concept of secure equilibrium, a refinement of NE, was introduced in [25], and its potential for synthesis was demonstrated in later studies [24]. Similarly, doomsday equilibria, an expansion of secure equilibria to n-player games, are elaborated upon in [23]. All those results are related to the notion of NE. Algorithms to reason about SPEs are more recent and usually more intricate. For that reason, progress on weaker notions like weak SPE [8, 10, 18] was needed before tight complexity bounds for SPEs could be established. For SPEs, the exact complexity of the constrained existence problem was established for reachability games in [9], for parity games in [6], and for mean-payoff games in [5]. All those works introduce new algorithms that are substantially more sophisticated than those needed to study the notion of NE. As mentioned previously, the SPE-NCRS problem with LTL objectives can be addressed by reducing it to the model-checking problem of Strategy Logic, as shown in [34]. However, this encoding results in a triple-exponential complexity, even when the number of players is fixed. Consequently, this approach does not allow for a fine-grained complexity analysis when the number of players is treated as a fixed parameter, and so it can yield neither the ExpTime complexity we achieve in this case, nor the PTime complexity for a fixed number of players with reachability objectives. Cooperative rational synthesis was first introduced in [29]; the adversarial version was later introduced in [34]. In both cases, as mentioned above, the decidability results were obtained through a reduction to Strategy Logic [38].
A more detailed analysis of the complexity for a large variety of ω-regular objectives and for rationality expressed as NEs was given in [26] for turn-based games and in [27] for concurrent games, for LTL objectives in [35], and for two players and for mean-payoff and discounted-sum objectives in [28]. Those results do not cover SPEs as we do in this paper. Rational verification (instead of rational synthesis) studies the problem of verifying that a given strategy σ0 for the system is a solution to the NE/SPE-NCRS problem. The complexity of this simpler problem has been studied for several kinds of objectives in [7, 30, 31]. Another notion of rational environment behavior treats the environment as a single agent but with multiple, sometimes conflicting, goals, aiming for behaviors that achieve a Pareto-optimal balance among these objectives. Both rational synthesis and verification have been very recently studied for this concept of rationality [11, 15, 16, 17].

2 Preliminaries

In this section, we recall the necessary notions and concepts underpinning this work: First, we specify the model of games on graphs that we study, along with the corresponding definitions of plays, (winning) strategies, and objectives. Second, we present the solution concepts relevant to our non-zero-sum model and specific problem, ranging from Nash equilibria to 0-fixed subgame-perfect equilibria. Finally, we provide a precise statement of the SPE-NCRS problem (see Definition 10) and our main results (see Theorem 11).

Definition 1 (Game structure).

A game structure is a tuple G = (V, A, Π, δ, v₀), where:

  • Π = {0, …, n} is a finite set of players,

  • A = ⋃_{i=0}^n Aᵢ is the set of actions of the players, where Aᵢ is the action set of player i, for i ∈ Π, and A₀ ∩ (⋃_{i≠0} Aᵢ) = ∅,

  • V = ⋃_{i=0}^n Vᵢ is the set of states, where Vᵢ is the state set of player i, for i ∈ Π, and Vᵢ ∩ Vⱼ = ∅ for all i ≠ j,

  • v₀ ∈ V is the initial state,

  • δ : ⋃_{i=0}^n (Vᵢ × Aᵢ) → V is a partial function called the transition function, such that:

    1. G is deadlock-free: for every state v ∈ Vᵢ, there exists an action a ∈ Aᵢ of player i such that δ(v, a) is defined,

    2. G is action-unique: for every state v ∈ Vᵢ and all actions a, b ∈ Aᵢ of player i, we have δ(v, a) = δ(v, b) ⟹ a = b.

The size of a game structure is given by the numbers |V|, |A|, and |Π| of its states, actions, and players, respectively.

We say that a state v ∈ Vᵢ is controlled, or owned, by player i. Note that the condition A₀ ∩ (⋃_{i≠0} Aᵢ) = ∅ means that one knows from the actions alone when player 0 is the one playing. This property of the action set of player 0 is neither classical nor necessary, but it will reveal itself useful in the remainder of the paper: in the setting we study here, player 0 has a distinguished role compared to the other players, see Section 3. Condition 1 on the transition function ensures that in every state there is always a possible action to play. Condition 2 requires that a transition between two states can only be achieved via a unique action.
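To make Definition 1 concrete, here is a minimal Python sketch of a game structure, with checks for the deadlock-free and action-unique conditions. The encoding (dicts for the owner map and the partial transition function) is our own illustration, not a construction from the paper.

```python
from collections import defaultdict

class GameStructure:
    """Illustrative encoding of Definition 1 (hypothetical, simplified)."""

    def __init__(self, owner, delta, v0):
        self.owner = owner   # state -> player that controls it
        self.delta = delta   # (state, action) -> successor (partial function)
        self.v0 = v0         # initial state

    def actions(self, v):
        # Actions enabled at state v, i.e., those for which delta is defined.
        return [a for (u, a) in self.delta if u == v]

    def check(self):
        # Deadlock-free: every state has at least one enabled action.
        assert all(self.actions(v) for v in self.owner), "deadlock"
        # Action-unique: delta(v, a) = delta(v, b) implies a = b.
        succ = defaultdict(set)
        for (v, a), w in self.delta.items():
            assert w not in succ[v], "two actions reach the same successor"
            succ[v].add(w)

# A two-state example: player 0 owns u, player 1 owns w.
G = GameStructure(
    owner={"u": 0, "w": 1},
    delta={("u", "a"): "w", ("w", "b"): "u", ("w", "c"): "w"},
    v0="u",
)
G.check()  # both conditions hold
```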

A play in a game structure G is an infinite sequence ρ ∈ (VA)^ω of states and actions of the form ρ = v₀a₀v₁a₁⋯, with v₀ the initial state, where for every k we have δ(vₖ, aₖ) = vₖ₊₁. The set of all plays in G is denoted 𝖯𝗅𝖺𝗒𝗌_G. A history is a finite prefix h ∈ (VA)*V of a play ending in a state of G, written h = v₀a₀v₁a₁⋯vₖ. The set of all histories in G is denoted 𝖧𝗂𝗌𝗍_G, while, for i ∈ Π, the set of histories ending in a state controlled by player i is denoted 𝖧𝗂𝗌𝗍_G^i. We write h ≤ ρ if the history h is a prefix of the play ρ. We also use the notations ≤ and < for two histories.

A strategy for player i ∈ Π is a function σ : 𝖧𝗂𝗌𝗍_G^i → Aᵢ that prescribes an action σ(h) for player i to choose at every history where it is his turn to play. The set of all strategies of player i is denoted by Σᵢ. A collection σ̄ = (σᵢ)_{i∈Π} of strategies, one for each player, is called a profile of strategies. A play ρ = v₀a₀v₁a₁⋯ is compatible with a strategy σᵢ of player i if for every k such that vₖ ∈ Vᵢ, we have aₖ = σᵢ(v₀a₀⋯vₖ). Histories compatible with a strategy are defined similarly. A strategy σ is said to be memoryless when the prescribed action only depends on the last visited state, that is, for every hav, h′a′v ∈ 𝖧𝗂𝗌𝗍_G^i, we have σ(hav) = σ(h′a′v). Given a profile of strategies σ̄ = (σᵢ)_{i∈Π}, there is a unique play starting in v₀ that is compatible with every strategy of the profile; we call it the outcome of σ̄ and denote it by ⟨σ̄⟩_{v₀}. Given a strict subset of players Π′ ⊊ Π, we write σ̄_{-Π′} to refer to a partial profile of strategies that contains a strategy for each player except the ones in Π′. In particular, we will often focus on the strategy of one player, say Π′ = {i}, and the profile of strategies of the rest of the players, and use the notation (σᵢ, σ̄₋ᵢ) to denote the complete profile of strategies.
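As a side illustration (not from the paper), when all strategies in a profile are memoryless, the unique outcome on a finite game structure is an ultimately periodic play, which can be computed as a lasso:

```python
def outcome_lasso(owner, delta, v0, profile):
    """profile[i] maps each state owned by player i to an action.
    Returns (prefix, cycle): the states of the unique compatible play."""
    seen, path, v = {}, [], v0
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = delta[(v, profile[owner[v]][v])]   # follow the prescribed action
    k = seen[v]                                # first state of the loop
    return path[:k], path[k:]

# Illustrative two-state structure: player 0 owns u, player 1 owns w.
owner = {"u": 0, "w": 1}
delta = {("u", "a"): "w", ("w", "b"): "u", ("w", "c"): "w"}
profile = {0: {"u": "a"}, 1: {"w": "c"}}
prefix, cycle = outcome_lasso(owner, delta, "u", profile)
# prefix == ["u"] and cycle == ["w"]: the play is u a (w c)^ω
```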

Definition 2 (Objective and game).

A winning condition, or objective, for player i is a subset Wᵢ ⊆ 𝖯𝗅𝖺𝗒𝗌_G of plays in the game structure G. We say that a play ρ is winning for player i, or satisfies his objective, if ρ ∈ Wᵢ. Otherwise, we say that ρ is losing for player i. A game is a pair 𝒢 = (G, (Wᵢ)_{i∈Π}) consisting of a game structure G together with a profile of objectives for the players. When the context is clear, we often only write 𝒢 to designate a game. Given a strategy profile σ̄, its gain profile (or simply gain) is the Boolean vector ḡ such that for all i ∈ Π, we have gᵢ = 1 if ⟨σ̄⟩_{v₀} ∈ Wᵢ, and gᵢ = 0 if ⟨σ̄⟩_{v₀} ∉ Wᵢ. We also say that ḡ is the gain profile of the outcome of σ̄.

In this paper, we consider the concept of parity objective (which is expressively complete for ω-regular objectives), as well as the particular case of reachability objective. Both concepts are defined in the following way:

  • For each i ∈ Π, let Tᵢ ⊆ V be a target set. The reachability objective for player i is then Wᵢ = {ρ = v₀a₀v₁a₁⋯ ∈ 𝖯𝗅𝖺𝗒𝗌_G | ∃k, vₖ ∈ Tᵢ}, that we also denote by Reach(Tᵢ). Given such a profile of target sets, we say that (G, (Reach(Tᵢ))_{i∈Π}) is a reachability game.

  • For each i ∈ Π, let Cᵢ = {0, 1, …, dᵢ} be a finite set of priorities and αᵢ : V → Cᵢ be a priority function, that is, a function that assigns a priority to each state of the game. The parity objective for player i is then Wᵢ = {ρ ∈ 𝖯𝗅𝖺𝗒𝗌_G | min_{v ∈ Inf(ρ)} αᵢ(v) is even}, that we also denote by Parity(αᵢ). (For a play ρ = v₀a₀v₁a₁⋯, the notation Inf(ρ) means {v ∈ V | vₖ = v for infinitely many k}.) Given such a profile of priority functions, we say that (G, (Parity(αᵢ))_{i∈Π}) is a parity game. The size of each parity objective, denoted by |αᵢ|, is the maximum priority dᵢ.

 Remark 3.

Note that the objectives we consider in this work only put constraints on the states of the plays (and not on the actions).
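Both objectives can be evaluated on an ultimately periodic play. The sketch below (our own illustration, assuming the play is given by a finite prefix and a repeated cycle of states) mirrors the two definitions: reachability looks at all visited states, parity at the minimal priority among the states seen infinitely often:

```python
def wins_reachability(prefix, cycle, target):
    # Reach(T_i): some visited state lies in the target set T_i.
    return any(v in target for v in prefix + cycle)

def wins_parity(cycle, priority):
    # Parity(alpha_i): the minimal priority among the states visited
    # infinitely often (exactly the states on the cycle) is even.
    return min(priority[v] for v in cycle) % 2 == 0

prefix, cycle = ["u"], ["w", "x"]
print(wins_reachability(prefix, cycle, {"x"}))       # True: x is visited
print(wins_parity(cycle, {"u": 0, "w": 1, "x": 2}))  # False: min priority 1 is odd
```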

Definition 4 (Winning strategy).

Let G be a game structure and Wᵢ be an objective for player i. A strategy σᵢ of player i is winning for Wᵢ if, for every profile σ̄₋ᵢ, the outcome ρ of the profile (σᵢ, σ̄₋ᵢ) is winning for player i, that is, ρ ∈ Wᵢ.

This definition focuses on Wᵢ only. A winning strategy σᵢ ensures that player i satisfies his objective Wᵢ against any strategy profile σ̄₋ᵢ of the other players. In particular, it ensures that player i wins even if the other players are strictly antagonistic, or adversarial, that is, their objective is to make player i lose. This context corresponds to the classical zero-sum setting, and we use the notation 𝒢 = (G, Wᵢ) for the game, since the objective Wⱼ, for every player j ≠ i, is implicitly defined as 𝖯𝗅𝖺𝗒𝗌_G ∖ Wᵢ. An example of a reachability game and a discussion of the winning strategies of both players are given in Appendix A.
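In the zero-sum reachability setting, the states from which player i has a winning strategy can be computed with the classical attractor fixpoint. The sketch below is the textbook construction (a standard technique, not one of this paper's contributions), with actions abstracted into an edge relation and illustrative state names:

```python
def attractor(states, owner, edges, target, player):
    """Least set containing target and closed under one-step forcing:
    the player can move into it, or the opponent cannot avoid it.
    edges maps each state to its set of successors."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in states - attr:
            succ = edges[v]
            if (owner[v] == player and succ & attr) or \
               (owner[v] != player and succ <= attr):
                attr.add(v)
                changed = True
    return attr

# Illustrative graph: player 0 wants to reach t, player 1 opposes.
states = {"a", "b", "c", "t"}
owner = {"a": 0, "b": 1, "c": 1, "t": 0}
edges = {"a": {"b", "t"}, "b": {"a"}, "c": {"c"}, "t": {"t"}}
print(sorted(attractor(states, owner, edges, {"t"}, player=0)))
# ['a', 'b', 't']: from a, player 0 moves to t; from b, player 1 must go to a.
```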

Within the classical zero-sum context, the synthesis problem asks if there exists, for a particular player, a winning strategy (see [2] for an introduction). As recalled above (cf. Definition 4), such a strategy ensures that this player wins against all possible strategies of the other players. However, when we depart from this adversarial hypothesis and consider the richer setting of non-zero-sum games, that is, where each player has his own objective, which may overlap with the objectives of the others, the solution concept of winning strategy shows its limits. Hence the call to solution concepts such as Nash equilibrium and subgame-perfect equilibrium, on which we focus in this work.

Definition 5 (Nash equilibrium [39]).

Let 𝒢 = (G, (Wᵢ)_{i∈Π}) be a game and σ̄ = (σᵢ)_{i∈Π} be a strategy profile. We say that σ̄ is a Nash equilibrium (NE for short) if for every player i ∈ Π and every strategy σᵢ′ ∈ Σᵢ, we have ⟨σᵢ′, σ̄₋ᵢ⟩_{v₀} ∈ Wᵢ ⟹ ⟨σᵢ, σ̄₋ᵢ⟩_{v₀} ∈ Wᵢ.

In other words, no player has an incentive to deviate unilaterally from his fixed strategy σᵢ in the profile σ̄: if he does so, the resulting outcome will not satisfy his objective if the outcome of σ̄ was not already doing so. Notice, however, that there is no constraint on what players are allowed to do once a deviation has already occurred in the outcome of the NE. In fact, players may change their initial behaviours for completely adversarial choices. That is, NEs allow players to forget about their own objectives once the equilibrium outcome has been left, which can result in non-credible threats: promises of behavior that is irrational with respect to their own objectives. These non-credible threats are one important limitation of NEs as a solution concept capturing the rationality of the players in sequential games. In order to enforce rationality in every scenario, even the ones that stem from deviations, one needs to monitor what the strategy profile prescribes after every history. This is exactly what subgame perfect equilibria (defined below) do [39]. Let 𝒢 = (G, (Wᵢ)_{i∈Π}) be a game. To each history hv ∈ 𝖧𝗂𝗌𝗍_G, with v ∈ V, corresponds a subgame 𝒢_{hv} that is the game 𝒢 starting after the history hv: its plays ρ′ start at the initial state v and, for all i ∈ Π, ρ′ is winning for player i if, and only if, hρ′ ∈ Wᵢ. Given a strategy σᵢ for player i, this strategy in 𝒢_{hv} is defined as (σᵢ)_{hv}(h′) = σᵢ(hh′) for all histories h′ ∈ 𝖧𝗂𝗌𝗍_G starting with the initial vertex v. Given a strategy profile σ̄, we denote by σ̄_{hv} the profile ((σᵢ)_{hv})_{i∈Π} (note that its outcome starts in v).

Definition 6 (Subgame perfect equilibrium).

Let 𝒢 = (G, (Wᵢ)_{i∈Π}) be a game. A subgame perfect equilibrium (SPE for short) is a profile σ̄ of strategies such that σ̄_{hv} is an NE in each subgame 𝒢_{hv} of 𝒢.

We illustrate both concepts of NE and SPE on an example in Appendix A.
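To make the NE condition of Definition 5 concrete, the following sketch brute-forces it for memoryless profiles in a small reachability game (all state, action, and function names are illustrative). Against fixed memoryless opponents, a player who can profit from deviating in a reachability game can do so with a memoryless deviation, so enumerating memoryless deviations suffices in this restricted setting:

```python
from itertools import product

def outcome_states(owner, delta, v0, profile):
    # States visited by the unique (lasso-shaped) compatible play.
    seen, v = set(), v0
    while v not in seen:
        seen.add(v)
        v = delta[(v, profile[owner[v]][v])]
    return seen

def is_nash(owner, delta, v0, targets, profile):
    base = outcome_states(owner, delta, v0, profile)
    for i in targets:
        if base & targets[i]:
            continue                   # player i already wins: no incentive
        my_states = [v for v in owner if owner[v] == i]
        acts = [[a for (s, a) in delta if s == v] for v in my_states]
        for choice in product(*acts):  # all memoryless deviations of i
            dev = dict(profile)
            dev[i] = dict(zip(my_states, choice))
            if outcome_states(owner, delta, v0, dev) & targets[i]:
                return False           # profitable unilateral deviation
    return True

owner = {"u": 0, "w": 1}
delta = {("u", "a"): "w", ("u", "b"): "u", ("w", "c"): "u", ("w", "d"): "w"}
targets = {0: {"w"}, 1: {"u"}}
print(is_nash(owner, delta, "u", targets, {0: {"u": "b"}, 1: {"w": "d"}}))
# False: player 0 profits by switching to action a at u
print(is_nash(owner, delta, "u", targets, {0: {"u": "a"}, 1: {"w": "d"}}))
# True: the outcome u a (w d)^ω satisfies both objectives
```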

In this work, we focus on synthesizing a strategy for a specific player (see Definition 10), referred to as the system (or player 0), which we assume to be trustworthy or whose strategy is fixed by design. For instance, this assumption is justified in scenarios where the system is implemented as a program that is subsequently used by its environment (the other players). Since the program is fixed, the system cannot deviate from its prescribed strategy. As stated at the beginning of this section, we consider the non zero-sum setting, where each player has their own objective, which may partially overlap with the objectives of other players. Consequently, we seek a solution concept that addresses two key aspects: (1) the specific player (the system) will not deviate from the prescribed strategy, and (2) the other players may exhibit adversarial behavior toward the specific player, as long as such behavior does not compromise their own objectives, given that they are assumed to be rational [34]. We next define 0-fixed equilibria, which meet these requirements.

Definition 7 (0-Fixed NE).

Let 𝒢 be a game. A strategy profile σ̄ = (σ₀, σ̄₋₀) is a 0-fixed NE if, for every player i ≠ 0 and every strategy σᵢ′ ∈ Σᵢ, we have ⟨σᵢ′, σ̄₋ᵢ⟩_{v₀} ∈ Wᵢ ⟹ ⟨σᵢ, σ̄₋ᵢ⟩_{v₀} ∈ Wᵢ.

In other words, if the strategy σ0 of player 0 is fixed, that is, if we assume player 0 will stick to his strategy, no other player has an incentive to deviate unilaterally. We also write that a strategy profile is a σ0-fixed NE to insist on the fixed strategy σ0 of player 0.

In this paper, we focus on 0-fixed SPEs: profiles σ̄ of strategies that are 0-fixed NEs in each subgame of 𝒢 compatible with the fixed strategy of player 0. Formally:

Definition 8 (0-Fixed SPE).

Let 𝒢 be a game. A strategy profile σ̄ = (σ₀, σ̄₋₀) is a 0-fixed SPE if for every history hv compatible with σ₀, the profile σ̄_{hv} is a σ₀-fixed NE in the subgame 𝒢_{hv}.

As for NEs, we use the terminology of σ₀-fixed SPE. Furthermore, given a strategy σ₀, we say that the strategy profile σ̄₋₀ is a subgame-perfect response to σ₀ if the complete profile (σ₀, σ̄₋₀) is a 0-fixed SPE. The next theorem guarantees the existence of an SPE in every reachability or parity game. This result also holds for 0-fixed SPEs [43].

Theorem 9 (Existence of (0-fixed) SPEs).

Given a game 𝒢 = (G, (Wᵢ)_{i∈Π}) that is a parity game or a reachability game, there always exists an SPE in 𝒢 and, for every strategy σ₀ of player 0, there always exists a σ₀-fixed SPE in 𝒢.

In this paper, we study the following synthesis problem. Intuitively, the non-cooperative rational synthesis problem asks, for a distinguished player, if there exists a strategy that is as close as possible to a winning strategy, taking into account the fact that the others are behaving rationally. The solutions sought in this problem are indeed strategies that ensure winning for every possible way to complete the profile in a rational manner (i.e., other players do not cooperate but take care of their own objectives first).

Definition 10 (Non-cooperative rational synthesis problem).

The non-cooperative rational synthesis problem (SPE-NCRS problem for short) asks, given a game 𝒢 = (G, (Wᵢ)_{i∈Π}), whether there exists a strategy σ₀ of player 0 such that, for every subgame-perfect response σ̄₋₀, the resulting outcome ⟨σ₀, σ̄₋₀⟩_{v₀} is winning for player 0.

The previous definition is illustrated with an example in Appendix A.

We now state our main results.

Theorem 11 (Complexity of the SPE-NCRS problem).

The SPE-NCRS problem for parity games is in 2ExpTime and PSpace-hard. Given 𝒢 = (G, (Parity(αᵢ))_{i∈Π}), this problem is solvable in time exponential in |V| and each |αᵢ|, i ∈ Π, and double-exponential in |Π|.

Furthermore, if the number of players is fixed, the SPE-NCRS problem is in ExpTime, and NP-hard and co-NP-hard for parity games; and it is solvable in time polynomial in |V| for reachability games.

3 SPE-NCRS Problem and 𝑷𝟏𝑪𝑷𝟐 Game

3.1 Overview of our solution

One may notice the particular role played by player 0 in the setting of the SPE-NCRS problem: one needs to find, if it exists, a strategy σ₀ of player 0 that ensures he wins for all possible partial profiles σ̄₋₀ such that (σ₀, σ̄₋₀) is a 0-fixed SPE. Our approach is to construct, in the spirit of [36], a zero-sum two-player Prover-Challenger game that, once solved, delivers the answer. The key idea in this approach is that Prover sets out to prove that there exists a solution σ₀ to the SPE-NCRS problem, while Challenger wants to disprove this claim. However, the special role of player 0 in the original game 𝒢 must be taken into account. To this end, we need to split Prover into a coalition of two Provers, Prover 1 and Prover 2 (we later show, in Section 4, how to come back to the two-player model). This game is called the P1CP2 game and is denoted by P1CP2(𝒢).

Let us give some intuition. The three players P1, C, and P2 proceed to construct, step by step and with some additional interactions, a play in P1CP2(𝒢) that simulates a play in 𝒢 (see Definition 15 below). Each player has a specific part to play in the construction of this simulating play. As mentioned previously, player P1 has the particular task of exhibiting a candidate solution strategy σ₀ to the SPE-NCRS problem for 𝒢. This is done by simulating player 0 playing σ₀ and not deviating from it. Thus, along the play in P1CP2(𝒢) being constructed, whenever the corresponding state in the simulated play of 𝒢 belongs to player 0, it is up to P1 to choose the next state. One of the tasks of player C is to try to complete σ₀ with a subgame perfect response σ̄₋₀ of the other players in 𝒢. Thus, whenever the corresponding state in the simulated play of 𝒢 belongs to a player i ≠ 0, it is player C's turn to play in P1CP2(𝒢): he proposes to play an action according to some strategy σᵢ of player i in 𝒢 and, in addition, predicts the gain profile of what will be the outcome of the profile (σ₀, σ̄₋₀), that is, of the simulated play being constructed. However, since players other than player 0 in 𝒢 can deviate, whenever C has made a proposal for a next state, he has to let player P2 have a say: P2 can either accept this proposal or refuse it and deviate on behalf of player i by choosing another state as the next one in the simulating play. This phase is called the decision phase. If this phase results in a deviation, the game proceeds to the adjusting phase: since the current subgame has changed, Challenger has to predict the gain profile of (σ₀, σ̄₋₀) in this subgame, showing that the deviation of player i was not a profitable one. Then, the play in P1CP2(𝒢) returns to the construction phase.

How do we ensure that σ₀ is fixed? In other words, what restricts P1 from adapting his strategy to the way C and P2 interact, rather than only to the simulated history of G constructed so far? The solution we choose is to make use of imperfect information: in the P1CP2 game, P1 cannot, in fact, distinguish between all the states. To model the partial view P1 has of the states of the game, we speak of observations. Informally, to each state (and action) is associated an observation, and P1 has to act upon sequences of observations only (which we call observed histories). In particular, strategies are assumed to respect the observations, in the sense that two different histories yielding the same sequence of observations must trigger the same action of P1. Thus, we only consider these observation-based strategies as possible behaviors of P1. We now proceed to the formal definition of the P1CP2 game. In Section 3.4, we show that solving the SPE-NCRS problem is equivalent to solving this game, in a sense stated in Theorem 20.

3.2 Observation Functions

We begin by introducing the concepts of observation and observation-based strategy. Let G = (V, A, Π, δ, v₀) be a game structure and let player i be some player in G. Let 𝒪 be a partition of V; it defines an observation function 𝒪bs : V → 𝒪 for player i, that is, for every state v, 𝒪bs(v) is the unique o ∈ 𝒪 containing v. Let then 𝒪̄ be a partition of A, which extends the function 𝒪bs to actions: for every action a, 𝒪bs(a) is the unique ō ∈ 𝒪̄ containing a. The function 𝒪bs extends to histories and plays in the straightforward way. We say that player i observes G through 𝒪bs if he cannot distinguish between states (resp. actions, histories, plays) that yield the same observation via the function 𝒪bs. Those states or actions that are the unique element of their observation set are said to be visible. We say that 𝒪bs is the identity function if 𝒪bs(v) = {v} for all v ∈ V and 𝒪bs(a) = {a} for all a ∈ A. Whenever player i observes the game structure through the identity function, we say that player i has perfect information; if it is through any other function, we say that he has imperfect information. We will always assume that a player sees his own actions perfectly (that is, his observation function restricted to his own action set corresponds to the identity function). Furthermore, we assume all players have perfect recall: they remember the full sequence of observations they have witnessed from the start of the play. In the sequel, we only consider player-stable observation functions, for which the last states of similarly observed histories all belong to the same player.

Definition 12 (Player-stable observation function).

An observation function 𝒪bs for player i is player-stable if for every two histories h = v0 a0 v1 a1 … vk and g = u0 b0 u1 b1 … uk such that 𝒪bs(h) = 𝒪bs(g), the states vℓ and uℓ are controlled by the same player, for all ℓ ∈ {0,…,k}.

The identity observation function is trivially player-stable. A game 𝒢 with objectives Wi for all its players i ∈ Π and a player-stable observation function 𝒪bs for one of its players is denoted 𝒢 = (G, (Wi)_{i∈Π}, 𝒪bs). When 𝒪bs is the identity, we omit it from this notation. Notice that with a player-stable observation function for player i, given a history h ∈ 𝖧𝗂𝗌𝗍_G^i, all histories h′ with the same observation as h also belong to 𝖧𝗂𝗌𝗍_G^i. It is therefore natural to require that a strategy σi of player i be observation-based, that is, that the same action be played by player i after all similarly observed histories.

Definition 13 (Observation-based strategy).

Given a player-stable observation function 𝒪bs, a strategy σi of player i is observation-based if for every pair of histories h, h′ such that h ∈ 𝖧𝗂𝗌𝗍_G^i and 𝒪bs(h) = 𝒪bs(h′), we have σi(h) = σi(h′).

Both concepts of observation and observation-based strategy are illustrated in an example in Appendix B.1.
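As a complement, the following Python sketch (ours, purely illustrative; the encoding of histories as tuples and of observation functions as dictionaries is an assumption, not part of the formal development) checks Definition 13 on a toy game where player i sees all actions but cannot distinguish the two states u1 and u2:

```python
def observe(history, obs):
    """Map a history (sequence of states and actions) to its observed
    history, element by element."""
    return tuple(obs[x] for x in history)

def is_observation_based(strategy, histories, obs):
    """Check Definition 13: two histories with the same observed history
    must be mapped to the same action."""
    chosen = {}
    for h in histories:
        o = observe(h, obs)
        if o in chosen and chosen[o] != strategy(h):
            return False
        chosen[o] = strategy(h)
    return True

# Player i sees all actions but cannot distinguish states u1 and u2:
obs = {'v0': 'v0', 'u1': 'u', 'u2': 'u', 'a': 'a', 'b': 'b'}
histories = [('v0', 'a', 'u1'), ('v0', 'a', 'u2')]

constant = lambda h: 'a'                           # observation-based
peeking = lambda h: 'a' if h[-1] == 'u1' else 'b'  # uses hidden information
```

Here `constant` is observation-based, while `peeking` is not, since it reacts to the hidden state component.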

3.3 Definition of the 𝑷𝟏𝑪𝑷𝟐 Game

Figure 2: General structure of a play in P1CP2(𝒢).

In this section, let us fix a game 𝒢 = (G, (Wi)_{i∈Π}) with a game structure G = (V,A,Π,δ,v0) and objectives (Wi)_{i∈Π}. Later, this game will be a reachability game or a parity game. We here formally define the corresponding P1CP2 game, informally presented in Section 3.1. Recall that the role of player P1 is to simulate a strategy σ0 of player 0 in 𝒢, while player C has to propose a subgame perfect response to σ0 such that (σ0, σ̄−0) is losing for player 0 in 𝒢, a response to which player P2 can react by choosing some deviations (decision phase). The interactions between the three players result in a play in P1CP2(𝒢) that simulates a play in 𝒢. Challenger has to predict a gain profile for this simulated play, which he can update upon deviations by P2 (adjusting phase). We begin by defining the game structure of the new game P1CP2(𝒢). (Note that, to avoid confusion with the players of G, and without loss of generality, the three players of the P1CP2 game have been given explicit names rather than numbers.) We then define the observation function of P1 in Definition 19, and we finally define and describe the objectives of the three players. Figure 2 should help the reader understand these definitions.

The states of the P1CP2 game structure (except its initial state v0) have several components. All of them have a state of G as their first component, which we call the G-component. Similarly, all of them have a gain-component, which consists of a gain profile ḡ ∈ {0,1}^|Π| for the players of G, where in each component gi of ḡ, 0 symbolizes a loss and 1 a win with respect to the objective Wi. The states that have only these two components are called G-states and belong either to P1 or to C. A projection onto the G-components of these states determines the simulated play in G (see Definition 15). As already stated in Section 3.1, whenever the current state of the simulated play belongs to player 0 in G, the next state of the simulated play is chosen directly by P1. When it is not the case, C has to make a proposal, which is then validated or changed by P2 (decision phase). Thus, to record this proposal while keeping it hidden from P1, the successor states of G-states belonging to C have an extra component, called the action-component, which records the action of G proposed by C. From such states, called action-states, P2 reacts by confirming or changing the action proposed by C. This is modeled by a third kind of states, called player-states. Such states have a player-component, which consists either of a player of G different from player 0, to signal to C that this player has indeed deviated, or of the empty set ∅, to signal acceptance by Prover 2 of Challenger's proposal.

Definition 14 (States of the P1CP2 game).

Given a game 𝒢, the game structure of P1CP2(𝒢) is a game structure G′ = (V′, A′, {P1, C, P2}, δ′, v0), where P1, C, and P2 are the three players and the set of states V′ = VP1 ∪ VC ∪ VP2 is as follows:

  • VP1 = {(v, ḡ) ∣ v ∈ V0, ḡ ∈ {0,1}^|Π|},

  • VC = {v0} ∪ {(v, ḡ) ∣ v ∈ V∖V0, ḡ ∈ {0,1}^|Π|} ∪ {(v, i, ḡ) ∣ v ∈ V, i ∈ (Π∖{0}) ∪ {∅}, ḡ ∈ {0,1}^|Π|},

  • VP2 = {(v, a, ḡ) ∣ v ∈ V∖V0, a ∈ A∖A0 such that δ(v,a) is defined, ḡ ∈ {0,1}^|Π|},

  • v0 is the initial state.

Among these states, the states (v, ḡ) are G-states, the states (v, a, ḡ) are action-states, and the states (v, i, ḡ) are player-states. Moreover, v is a G-component, ḡ a gain-component, a an action-component, and i a player-component. Notice that the sets VP1, VC, VP2 are pairwise disjoint.
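For concreteness, the following Python sketch (our own toy instantiation, not part of the paper's formal development; the game below, the label `'v0_init'` standing for the initial state v0, and `'accept'` standing for the ∅ player-component are all illustrative assumptions) enumerates the three state sets of Definition 14:

```python
from itertools import product

# Toy game structure (illustrative): states, player-0 states, actions,
# player-0 actions, players, and a partial transition function.
V, V0 = ['v0', 'v1', 'v2'], ['v0']
A, A0 = ['a', 'b'], ['a']
Pi = [0, 1, 2]
delta = {('v1', 'b'): 'v2', ('v2', 'b'): 'v2'}

gains = list(product([0, 1], repeat=len(Pi)))   # gain profiles in {0,1}^|Pi|

# State sets of Definition 14 ('v0_init' plays the role of the initial
# state, 'accept' plays the role of the empty-set player-component):
V_P1 = [(v, g) for v in V0 for g in gains]
V_C = (['v0_init']
       + [(v, g) for v in V if v not in V0 for g in gains]
       + [(v, i, g) for v in V for i in [1, 2, 'accept'] for g in gains])
V_P2 = [(v, a, g) for v in V if v not in V0
        for a in A if a not in A0 and (v, a) in delta
        for g in gains]
```

The three sets are pairwise disjoint, and the 2^|Π| gain profiles are the source of the exponential blowup in |Π| stated in Lemma 18.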

Let us now describe the transitions of the P1CP2 game and explain further the actions of the players (see also Figure 2). The goal of Challenger is to prove the existence of a σ0-fixed SPE losing for player 0 in 𝒢. To be able to verify this claim in P1CP2(𝒢), Challenger has to predict the gain of the simulated play that is being constructed. To do so, the gain-components of the states of P1CP2(𝒢) are used as follows. First, Challenger owns the initial state v0 and has to choose a gain profile ḡ losing for player 0 (as an outcome of a subgame perfect response to σ0 losing for player 0 is exactly what Challenger wants to exhibit, for any σ0 simulated by P1), to start the construction of the simulated play in the G-state (v0, ḡ). Then, whenever P2 chooses to make some player i deviate, before reaching a G-state, Challenger has to respond by choosing an adjusted gain profile with a lower or equal gain for this player i (adjusting phase). This new gain shows to P2 the absence of any profitable deviation for player i. Note that two modeling decisions have been made for the P1CP2 game structure to ease the subsequent developments. First, to ensure a certain regularity of the shape of plays in the P1CP2 game, and so that P1 cannot infer whether P2 actually made a deviation, the adjusting phase is played regardless of whether a deviation occurred: if there was no deviation, Challenger has no choice but to repeat the same gain profile. Second, to avoid confusion about who, among Challenger and Prover 2, is currently playing some action a ∈ A∖A0, this action is of the form (a, i) for C and a for P2; that is, when Challenger is playing, we additionally record the player i of G performing the action a.

As already mentioned in Section 3.1, given a play in P1CP2(𝒢), there exists a unique corresponding play in G being simulated by the interactions of the Provers and Challenger.

Definition 15 (Simulated play and gain in G).

Let ρ = v0 a0 v1 a1 v2 a2 … be a play in the game structure of P1CP2(𝒢). The simulated play of ρ is the play 𝗌𝗂𝗆(ρ) ∈ 𝖯𝗅𝖺𝗒𝗌_G obtained by projecting ρ onto the G-components of its G-states (which belong either to P1 or to C) and onto the actions of P1 and P2 (this projection thus discards the action- and player-states and the actions of Challenger). The definition extends naturally to histories. The simulated gain of ρ is the Boolean vector 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ) ∈ {0,1}^|Π| such that 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ)_i = 0 if 𝗌𝗂𝗆(ρ) is losing for player i, and 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ)_i = 1 if 𝗌𝗂𝗆(ρ) is winning for player i.

Let us give an example:

Example 16.

Consider the fictional history h = v0 ḡ (v0, ḡ) a0 (v1, ḡ) (a1, i) (v1, a1, ḡ) a1′ (v2, i, ḡ) ḡ′ (v2, ḡ′) in the P1CP2 game of some game 𝒢. It starts in the initial state v0, where C chooses a gain profile ḡ with g0 = 0. Thus the second state of the history is the G-state (v0, ḡ). Looking at the form of the third state (v1, ḡ) in h, we can deduce that (v0, ḡ) belongs to Prover 1, that is, v0 belongs to player 0 in G, and moreover δ(v0, a0) = v1. Then, the fourth state (v1, a1, ḡ) in h is an action-state, which means that the previous state (v1, ḡ) belongs to Challenger, that v1 belongs to some player i ≠ 0 in G, and that a1 is the action of G proposed by C in this scenario. One can see that P2 then chooses the action a1′, which is different from a1, since the fifth state (v2, i, ḡ) of h is a state whose player-component i is not equal to ∅. This indicates that P2 makes player i deviate in G and that δ(v1, a1′) = v2. Now, Challenger has to choose a new gain profile ḡ′, yielding the last state (v2, ḡ′) of h, which is a G-state. By projecting h onto the G-components of its G-states and onto the actions of the Provers, one can check that the simulated history of h in G is 𝗌𝗂𝗆(h) = v0 a0 v1 a1′ v2. Consider now the fictional play ρ = h ((a2, j) (v2, a2, ḡ′) a2 (v2, ∅, ḡ′) ḡ′ (v2, ḡ′))^ω in the P1CP2 game. Looking at ρ, one can deduce that from (v2, ḡ′), the last state of h, Challenger proposes the action a2 of G for some player j ≠ 0 and that P2 accepts this proposal. Furthermore, Challenger cannot adjust the gain profile, leading back to the state (v2, ḡ′). This behavior repeats indefinitely, thus the corresponding simulated play is 𝗌𝗂𝗆(ρ) = 𝗌𝗂𝗆(h) (a2 v2)^ω. The simulated gain 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ) of ρ is the Boolean vector deduced from 𝗌𝗂𝗆(ρ), such that its i-th component equals 1 if, and only if, 𝗌𝗂𝗆(ρ) is winning for player i.
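The projection of Definition 15, applied to Example 16's history, can be sketched as follows (illustrative Python encoding of plays as tagged sequences, ours and not the paper's; `'a1p'` stands for the action a1′ played by P2 instead of Challenger's proposal a1, and `'gbarp'` for the adjusted gain profile ḡ′):

```python
def sim(play):
    """Project a P1CP2 play onto its simulated play in G (Definition 15):
    keep the G-components of G-states and the actions of the Provers."""
    out = []
    for kind, *payload in play:
        if kind == 'G-state':        # (v, gain): keep the G-component v
            out.append(payload[0])
        elif kind == 'P-action':     # action of P1 or P2: keep it
            out.append(payload[0])
        # 'init', 'C-action', 'action-state' and 'player-state' items
        # are projected away
    return out

# Example 16's history h in this toy encoding:
h = [('init', 'v0'), ('C-action', 'gbar'), ('G-state', 'v0', 'gbar'),
     ('P-action', 'a0'), ('G-state', 'v1', 'gbar'),
     ('C-action', ('a1', 'i')), ('action-state', 'v1', 'a1', 'gbar'),
     ('P-action', 'a1p'), ('player-state', 'v2', 'i', 'gbar'),
     ('C-action', 'gbarp'), ('G-state', 'v2', 'gbarp')]
```

Running `sim(h)` recovers the simulated history v0 a0 v1 a1′ v2 of the example.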

 Remark 17.

Note that while there exists a unique simulated play 𝗌𝗂𝗆(ρ) in 𝒢 for every play ρ in P1CP2(𝒢), the converse does not hold. Indeed, several different sequences of interactions between P2 and C yield the same simulated play in 𝒢. For instance, Challenger can propose different actions that P2 can refuse.

We have seen that each state and action of G appears in several contexts in the game structure G′ of P1CP2(𝒢). Let us now state how the size of G′ depends on the size of G. The proof of this lemma directly follows from the definition of the P1CP2 game.

Lemma 18.

Given the sizes |V|, |A|, and |Π| of the game structure of 𝒢, the game structure of P1CP2(𝒢) has 3 players, a state set of size |V′| linear in |V| and |A| and exponential in |Π|, and an action set of size |A′| linear in |A| and exponential in |Π|.

In the P1CP2 game, recall that C and P2 have perfect information, whereas P1 has imperfect information to ensure that he cannot adapt his strategy by observing the interactions between C and P2. The observation function 𝒪bs of player P1 is defined below on V′ and A′. For each state of V′, he is only able to observe its G-component. Concerning the actions of A′, those of AP1 ∪ AP2 are all visible, whereas those of AC are not visible at all. In the next definition, there is an abuse of notation in 𝒪 and 𝒪̄.

Definition 19 (Information in the P1CP2 game).

Given the game structure G′ of P1CP2(𝒢), players C and P2 have perfect information, and the observation function 𝒪bs of player P1 is defined as follows. We have 𝒪bs: V′ → 𝒪 = {v0} ∪ {v ∣ v ∈ V} such that 𝒪bs(v, ḡ) = v for all (v, ḡ) ∈ VP1, 𝒪bs(v0) = v0, 𝒪bs(v, ḡ) = v for all (v, ḡ) ∈ VC, 𝒪bs(v, i, ḡ) = v for all (v, i, ḡ) ∈ VC, and 𝒪bs(v, a, ḡ) = v for all (v, a, ḡ) ∈ VP2. Furthermore, we have 𝒪bs: A′ → 𝒪̄ = {∅} ∪ {a ∣ a ∈ A} such that 𝒪bs(a) = a for all a ∈ AP1, 𝒪bs(a) = ∅ for all a ∈ AC, and 𝒪bs(a) = a for all a ∈ AP2.
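Definition 19 can be sketched as follows (illustrative Python, ours; states are encoded as tuples whose first entry is the G-component, and `None` stands for the invisible observation of Challenger's actions, an encoding assumption):

```python
def obs_state(s):
    """P1 observes only the G-component of a state (first tuple entry);
    the initial state v0 is visible as itself."""
    return s if s == 'v0' else s[0]

def obs_action(owner, act):
    """Actions of the Provers are visible; Challenger's are not
    (None encodes the invisible observation)."""
    return act if owner in ('P1', 'P2') else None
```

In particular, a G-state, a player-state, and an action-state sharing the same G-component all look identical to P1.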

To finalize the definition of the P1CP2 game, it remains to define the objectives of the three players. Recall that the objective of Challenger is to show that, for each strategy σ0 of player 0 in 𝒢, there exists a subgame perfect response σ̄−0 such that the outcome of (σ0, σ̄−0) is losing for player 0. Let us give some intuition on what makes a play ρ in the P1CP2 game winning for C. Three winning situations may occur:

(iC)

Eventually, that is, after reaching some subgame, P2 always accepts the action proposals of C and the gain predicted by C in the subgame is correct (it is equal to the simulated gain in this subgame), or

(iiC)

eventually, P2 keeps making one unique player i deviate in the decision phase, but C is able to adjust the gain to show that this deviation is not profitable for player i, or finally

(iiiC)

P2 keeps making at least two different players deviate, essentially conceding the play, as the only deviations that can be considered within the scope of (0-fixed) SPEs are unilateral deviations.

In the P1CP2 game, the two Provers share the same objective WP, which is opposed to the objective WC of Challenger. Indeed, recall that their objective is to exhibit a strategy σ0 of player 0 in G such that, for every subgame-perfect response σ̄−0 to σ0, the outcome of the resulting profile (σ0, σ̄−0) is winning for player 0. Let us give some intuition on what is a winning play for the Provers. Two winning situations may occur for the Provers along a play ρ in the P1CP2 game:

(iP)

Eventually, after reaching some subgame, P2 always accepts the action proposals of C and the gain predicted by C is incorrect, or

(iiP)

eventually, P2 keeps making one unique player i deviate in the decision phase, and C is not able to adjust the gain to show that this deviation is not profitable for player i.

A full formal definition of the players’ objectives is provided in Appendix B.2.
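The case analysis above can be summarized by a small hypothetical Python abstraction (ours, not the formal definition of Appendix B.2): a play is described only by the set of players that P2 makes deviate infinitely often and by a single flag stating whether Challenger's eventual (adjusted) gain prediction is correct, i.e., matches the simulated gain or shows the repeated deviation unprofitable.

```python
def challenger_wins(inf_deviators, prediction_correct):
    """Hypothetical summary of a play's limit behaviour:
    inf_deviators      -- players of G that P2 makes deviate infinitely often
    prediction_correct -- whether C's eventual (adjusted) gain prediction is
                          correct / shows the repeated deviation unprofitable."""
    if len(inf_deviators) >= 2:
        return True   # (iiiC): deviations by two different players
    # no eventual deviator: (iC) vs (iP); one eventual deviator: (iiC) vs (iiP)
    return prediction_correct
```

This abstraction only restates the intuition; the actual objectives are ω-regular conditions on plays of the P1CP2 game.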

Given a game 𝒢, its P1CP2 game is P1CP2(𝒢) = (G′, WP, 𝒪bs), where WP is the objective of the Provers.

3.4 Equivalence Between SPE-NCRS Problem and 𝑷𝟏𝑪𝑷𝟐 Game

In this section, we show that solving the SPE-NCRS problem for a game 𝒢 is equivalent to solving its P1CP2 game, as stated in the following theorem. To keep things short, we give here a sketch of the proof, and refer the reader to [19] for a full and detailed account.

Theorem 20 (Equivalence theorem between 𝒢 and P1CP2(𝒢)).

Let 𝒢 = (G, (Wi)_{i∈Π}) be a game, and let P1CP2(𝒢) = (G′, WP, 𝒪bs) be its P1CP2 game. There exists in 𝒢 a strategy σ0 of player 0 such that, for every subgame-perfect response σ̄−0, the play ⟨σ0, σ̄−0⟩_v0 is winning for player 0 if, and only if, there exists in P1CP2(𝒢) an observation-based strategy τP1 of P1 such that, for all strategies τC of C, there exists a strategy τP2 of P2 such that the play ⟨τP1, τC, τP2⟩_v0 belongs to WP.

Proof sketch.

Let σ0 be a solution to the SPE-NCRS problem for 𝒢. Let τP1 be the observation-based strategy of P1 that simulates σ0: it is naturally defined as the strategy that, at all histories of P1CP2(𝒢) observed as h by P1, makes the same choice of action as σ0 at h in 𝒢. Let τC be a strategy of C. Let σ̄−0 be the strategy profile obtained from τC by extracting the actions chosen by C at histories that indeed simulate histories of 𝒢. There are two possibilities: either (σ0, σ̄−0) is a 0-fixed SPE in 𝒢, or it is not.

  1.

    Suppose (σ0, σ̄−0) is a 0-fixed SPE in 𝒢. Let τP2 be τacc, the accepting strategy of P2, with which he accepts every action proposal of C and never deviates. By definition of τP2, the simulated play of ⟨τP1, τC, τP2⟩_v0 is ⟨σ0, σ̄−0⟩_v0. In that case, since σ0 is a solution to the SPE-NCRS problem of 𝒢, we know that ⟨σ0, σ̄−0⟩_v0 ∈ W0. However, by construction of P1CP2(𝒢) and the fact that τP2 is the accepting strategy of P2, Challenger has predicted a loss for player 0 from the start and has never had to adjust his prediction. Thus, we are in case (iP) of the winning condition of the Provers.

  2.

    Suppose now that (σ0, σ̄−0) is not a 0-fixed SPE in 𝒢. This means that it is not a 0-fixed NE in some subgame of 𝒢. Let h be a history compatible with σ0 such that (σ0, σ̄−0) is not an NE in the subgame 𝒢_h starting after h, and let player i ≠ 0 be a player of 𝒢 that has a profitable deviation σi′ in 𝒢_h. In P1CP2(𝒢), we can show that there exists a unique history h′ ending in a G-state, compatible with τC and also with τP1, such that 𝗌𝗂𝗆(h′) = h. Depending on Challenger's prediction for player i at h′, the strategy τP2 has to differ: if Challenger predicted a loss, we let τP2 bring the play to h′ and then simulate the profitable deviation of player i. If Challenger predicted a win, we let τP2 bring the play to h′ and then switch to τacc: since player i has a profitable deviation from h, it is necessarily the case that the resulting play in 𝒢 is losing for player i.

Turning to the other direction, assume that there is no solution to the SPE-NCRS problem in 𝒢. Let τP1 be a strategy of P1 in P1CP2(𝒢), and let σ0 be the strategy of player 0 in 𝒢 of which τP1 is the simulation. By Theorem 9, there exists a σ0-fixed SPE in 𝒢. Among all these σ0-fixed SPEs, one must have an outcome losing for player 0 since, by assumption, σ0 is not a solution to the SPE-NCRS problem in 𝒢. Let σ̄ = (σ0, σ̄−0) be such a σ0-fixed SPE. From it, we can define a strategy τC for C that precisely enacts σ̄−0 on the relevant histories and correctly predicts the gain profile as that of σ̄ in every subgame. In particular, at the initial state, τC predicts the gain profile ḡ of σ̄, with g0 = 0 by choice of the 0-fixed SPE.

Let now τP2 be a strategy of P2, and consider the play ρ = ⟨τP1, τC, τP2⟩_v0. If infinitely many deviations by two different players of 𝒢 occur in the play ρ, then ρ satisfies the winning condition (iiiC) of Challenger. Assume now that there is at most one player that P2 makes deviate. There are two cases:

  1.

    Either τP2 prescribes a finite (possibly zero) number of deviations along ρ and then switches to the accepting strategy τacc. Eventually, after reaching some subgame, P2 always accepts the action proposals of C, and the gain predicted by C in this subgame is correct. Hence ρ satisfies the winning condition (iC) of Challenger.

  2.

    Or τP2 prescribes an infinite number of deviations by the same player i along ρ. As there are only two values (0 and 1) for the gain-components and the i-component of the predicted gain can never increase, we eventually get a stable gain-component gi for player i. If gi = 1, we clearly have gi = 1 ≥ 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ)_i, showing that (iiC) is satisfied. If gi = 0, recall that σ̄ is a 0-fixed SPE, thus it is an NE in the subgame from which gi was stabilized. The outcome of σ̄ in this subgame is losing for player i, as Challenger correctly predicts a gain of 0 with τC. Any deviation from this outcome is thus also losing for player i, including 𝗌𝗂𝗆(ρ). Thus, (iiC) is again satisfied.

4 Solving the 𝑷𝟏𝑪𝑷𝟐 Game

This section is the last piece needed to complete the proof of Theorem 11. We present here the key ideas behind the multi-step process of solving the P1CP2 game; the full technical constructions and details can be found in [19]. Thanks to Theorem 20, the existence of a solution to the SPE-NCRS problem for a game is equivalent to the Provers being able to win the associated P1CP2 game, in the following precise sense: P1 has a strategy such that, for every strategy of Challenger, P2 has a strategy making the Provers win. The remaining question is how to determine whether this is the case, that is, how to solve, in this particular sense, a three-player game with imperfect information. As shown in Figure 1, this involves several steps. We start with a first technical step to obtain a P1CP2 game with a Rabin objective for the Provers. Then we get rid of the three-player setting by eliminating one Prover, obtaining a two-player PC game with imperfect information. Once we have such a PC game, we remove the imperfect information and work with a parity objective instead of a Rabin one. This yields an equivalent two-player zero-sum parity game with perfect information, which is effectively solvable. The techniques used are similar to those found in the literature [21, 22, 41, 42]. However, their settings each differ slightly from one another and from ours; thus, to obtain finer complexity measures, we adapt these techniques to our setting.

The 𝑷𝟏𝑪𝑷𝟐 Game as a Rabin Game.

Given a parity game 𝒢, we show that its corresponding P1CP2 game can be seen as a three-player game with the objective WP for the two Provers translated into a Rabin objective. The approach is to use a deterministic automaton 𝒪 that observes the states of P1CP2(𝒢). Then the synchronized product of the game structure of P1CP2(𝒢) with this observer automaton 𝒪 is equipped with a Rabin objective translating WP and thus leads to the announced three-player Rabin game.

From two Provers to one Prover.

The next step is to merge P1 and P2 into a unique Prover, using a technique inspired by [22]. The main idea is to use imperfect information to ensure that merging the two Provers does not grant too much knowledge to the new single Prover: if the new Prover had perfect information, he could not simulate Prover 1 truthfully. Thus, we let the new Prover have the same level of information as Prover 1 in the P1CP2 game. However, in order to give the new Prover as many available actions as Prover 2 while staying observation-based, we modify the action set to include all functions from the states of P2 to the actions of P2. This way, the “merged” Prover preselects an action for each possible Challenger move, effectively encoding both Provers' strategies via a state-to-action function. This yields an exponential blowup in the number of actions. The Rabin objective remains the same. We obtain the corresponding PC game, denoted PC(𝒢).
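The exponential action blowup can be sketched as follows (toy Python sets, purely illustrative): each action of the merged Prover is a function preselecting one P2-action per P2-state.

```python
from itertools import product

# Toy P2-states and P2-actions (illustrative):
P2_states = ['s1', 's2', 's3']
P2_actions = ['a', 'b']

# Each action of the merged Prover preselects one P2-action per P2-state,
# i.e., it is a function from P2-states to P2-actions:
merged_actions = [dict(zip(P2_states, choice))
                  for choice in product(P2_actions, repeat=len(P2_states))]
```

There are |AP2|^|VP2| such functions, hence the exponential blowup in the number of actions.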

Eliminating Imperfect Information.

To solve the two-player game PC(𝒢), we get rid of the imperfect information and then apply standard game-theoretic techniques. In the literature [21, 41, 42], this is usually done in two steps: first, make the objective visible, that is, such that any two similarly observed plays agree on the winning condition; second, apply the subset construction, which records the set of possibly visited states and lets this set be observed. This can be done in a single step, by modifying the game structure so that it simultaneously entails the subset construction on the states of the PC game and the product with an automaton that monitors the winning condition along the plays. This allows us to limit the exponential blowups to a single one, in the size of the state space and of the Rabin condition. Essentially, the monitoring automaton is a deterministic parity automaton obtained by complementing a non-deterministic Streett automaton that detects the sets of similarly observable plays containing a play losing for the now unique Prover.
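One step of the classical knowledge-subset construction can be sketched as follows (toy Python, ours; the transition function `delta` and observation map `obs` are illustrative assumptions): the new, perfectly observable state is the set of states consistent with the observed history.

```python
def knowledge_step(knowledge, action, observation, delta, obs):
    """One step of the subset construction: from a set of states the player
    considers possible, keep every successor under `action` whose
    observation matches what the player actually sees."""
    return frozenset(delta[(u, action)] for u in knowledge
                     if (u, action) in delta
                     and obs[delta[(u, action)]] == observation)

# Toy transition function and observation map (illustrative); states q1 and
# q2 carry the same observation 'o':
delta = {('p', 'a'): 'q1', ('r', 'a'): 'q2', ('p', 'b'): 'q2'}
obs = {'q1': 'o', 'q2': 'o', 'p': 'p', 'r': 'r'}
```

Since the knowledge sets range over subsets of the state space, this construction is responsible for the exponential blowup mentioned above.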

Complexity.

Recall that we set out from a parity game 𝒢 = (G, (Wi)_{i∈Π}), where |V|, |A|, and |Π| give the size of the game structure G, and |αi| is the size of each parity condition Wi. By the construction mentioned above, we have an equivalence with a Rabin P1CP2 game whose state set is of size linear in |V| and |A| and exponential in |Π|, whose action set is linear in |A| and exponential in |Π|, and whose Rabin objective has a size linear in |Π| and in the |αi|. Reducing to only one Prover, we obtain a number of actions that is exponential compared to the two-Prover version, thus exponential in |V| and |A| and doubly exponential in |Π|. With the removal of imperfect information, we then obtain a game whose state set is exponential in |V|, |A|, and the |αi|, and doubly exponential in |Π|, while the new parity condition is polynomial in |V|, |A|, and the |αi|, and exponential in |Π|. Finally, by [20], solving the ultimate parity game, or equivalently the SPE-NCRS problem, takes time exponential in |V|, |A|, and the |αi|, and doubly exponential in |Π|. As the game structure of 𝒢 is action-unique (see Definition 1), it follows that |A| ≤ |V|², thus leading to an algorithm in time exponential in |V| and each |αi|, and doubly exponential in |Π|. This completes the proof of the complexity upper bound of Theorem 11. For the PSpace lower bound, the QBF-reduction of [26] (Theorem 7) applies here, as all NE responses to σ0 in that proof are also SPEs. For the complexity lower bound of Theorem 11 when the number of players is fixed, we adapt the reduction proposed for the NE-NCRS problem in [26] so as to deal with SPEs instead of NEs.

The Case of Reachability.

A careful analysis of the simpler case of reachability games, when the number of players is fixed, leads to a more fine-tuned solution and a better complexity result: Theorem 11 states a polynomial complexity instead of the exponential complexity of the parity case. This approach follows the same steps as presented above for parity games, with a few adjustments. The key idea is that monitoring reachability objectives is simpler than monitoring parity ones, which means that the synchronized product with an observer automaton 𝒪 can omit some information from the original game structure. Indeed, when monitoring a play in the P1CP2 game that stems from an original reachability game, one still needs to keep track of the actions of both Provers and Challenger, but the checking of the gain component is simpler than for parity: only whether each player has already visited his target set needs to be remembered. This can be exploited to avoid the exponential blowup of the state space (with respect to the original state set V) in the subset construction phase used to get rid of imperfect information, and thus to obtain a better complexity.
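The simplified gain bookkeeping for reachability can be sketched as follows (illustrative Python, ours; the target sets below are toy data): one bit per player suffices, recording whether his target set has already been visited.

```python
def visited_mask(play, targets):
    """One bit per player: has player i's target set been visited so far?
    targets[i] is the target set of player i."""
    mask = [0] * len(targets)
    for v in play:
        for i, target in enumerate(targets):
            if v in target:
                mask[i] = 1
    return tuple(mask)

# Toy target sets for three players (illustrative):
targets = [{'t0'}, {'t1'}, {'t0', 't2'}]
```

These bits are monotone along a play, in contrast with parity objectives, whose satisfaction depends on the priorities seen infinitely often.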

 Remark 21.

Suppose that our algorithm establishes the existence of a solution σ0 to the SPE-NCRS problem for parity games. As σ0 is obtained from a memoryless winning strategy in the final zero-sum parity game, it is a finite-memory solution whose memory size is exponential when the number of players is fixed, and doubly exponential otherwise. Concerning a lower bound on the required memory, it is fairly straightforward to show that exponential memory is necessary, by reducing a zero-sum Streett game to our problem.

5 Conclusion

In this work, we introduce a novel algorithm to solve the SPE-NCRS problem for parity objectives. Unlike previous methods, which convert the problem into a model-checking problem for Strategy Logic, our algorithm reduces the SPE-NCRS problem to a three-player zero-sum game with imperfect information, framed as a Prover-Challenger game. This new angle yields improved complexity upper bounds: exponential time in the number of vertices of the game structure and the number of priorities of the parity objectives, and doubly exponential time in the number of players. In particular, our algorithm runs in exponential time for a fixed number of players, which is particularly relevant since the number of players is typically small in practical scenarios. Moreover, we establish a lower bound indicating that the SPE-NCRS problem cannot be solved in polynomial time unless 𝖯 = 𝖭𝖯, even for a fixed number of players. For the particular case of reachability objectives with a fixed number of players, we prove polynomial complexity, as for the NE-NCRS problem [26]. We believe that the Prover-Challenger framework, based on a three-player model with imperfect information, may be applicable to other synthesis challenges beyond our current application.

References

  • [1] Rajeev Alur, Thomas A. Henzinger, Orna Kupferman, and Moshe Y. Vardi. Alternating refinement relations. In Davide Sangiorgi and Robert de Simone, editors, CONCUR ’98: Concurrency Theory, 9th International Conference, Nice, France, September 8-11, 1998, Proceedings, volume 1466 of Lecture Notes in Computer Science, pages 163–178. Springer, 1998. doi:10.1007/BFB0055622.
  • [2] Roderick Bloem, Krishnendu Chatterjee, and Barbara Jobstmann. Graph games and reactive synthesis. In Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem, editors, Handbook of Model Checking, pages 921–962. Springer, 2018. doi:10.1007/978-3-319-10575-8_27.
  • [3] Patricia Bouyer, Romain Brenguier, Nicolas Markey, and Michael Ummels. Pure Nash equilibria in concurrent deterministic games. Log. Methods Comput. Sci., 11(2), 2015. doi:10.2168/LMCS-11(2:9)2015.
  • [4] Romain Brenguier, Lorenzo Clemente, Paul Hunter, Guillermo A. Pérez, Mickael Randour, Jean-François Raskin, Ocan Sankur, and Mathieu Sassolas. Non-zero sum games for reactive synthesis. In Adrian-Horia Dediu, Jan Janousek, Carlos Martín-Vide, and Bianca Truthe, editors, Language and Automata Theory and Applications - 10th International Conference, LATA 2016, Prague, Czech Republic, March 14-18, 2016, Proceedings, volume 9618 of Lecture Notes in Computer Science, pages 3–23. Springer, 2016. doi:10.1007/978-3-319-30000-9_1.
  • [5] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. The complexity of SPEs in mean-payoff games. In Mikolaj Bojanczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4-8, 2022, Paris, France, volume 229 of LIPIcs, pages 116:1–116:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ICALP.2022.116.
  • [6] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. On the complexity of SPEs in parity games. In Florin Manea and Alex Simpson, editors, 30th EACSL Annual Conference on Computer Science Logic, CSL 2022, February 14-19, 2022, Göttingen, Germany (Virtual Conference), volume 216 of LIPIcs, pages 10:1–10:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CSL.2022.10.
  • [7] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. Rational Verification for Nash and Subgame-Perfect Equilibria in Graph Games. In Jérôme Leroux, Sylvain Lombardy, and David Peleg, editors, 48th International Symposium on Mathematical Foundations of Computer Science, MFCS 2023, August 28 to September 1, 2023, Bordeaux, France, volume 272 of LIPIcs, pages 26:1–26:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.MFCS.2023.26.
  • [8] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, and Jean-François Raskin. Constrained existence problem for weak subgame perfect equilibria with ω-regular Boolean objectives. Inf. Comput., 278:104594, 2021. doi:10.1016/J.IC.2020.104594.
  • [9] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, Jean-François Raskin, and Marie van den Bogaard. The complexity of subgame perfect equilibria in quantitative reachability games. Log. Methods Comput. Sci., 16(4), 2020. URL: https://lmcs.episciences.org/6883.
  • [10] Thomas Brihaye, Véronique Bruyère, Noémie Meunier, and Jean-François Raskin. Weak subgame perfect equilibria and their application to quantitative reachability. In Stephan Kreutzer, editor, 24th EACSL Annual Conference on Computer Science Logic, CSL 2015, September 7-10, 2015, Berlin, Germany, volume 41 of LIPIcs, pages 504–518. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015. doi:10.4230/LIPICS.CSL.2015.504.
  • [11] Thomas Brihaye, Véronique Bruyère, and Gaspard Reghem. Quantitative Reachability Stackelberg-Pareto Synthesis is NEXPTIME-Complete. In Olivier Bournez, Enrico Formenti, and Igor Potapov, editors, Reachability Problems - 17th International Conference, RP 2023, Nice, France, October 11-13, 2023, Proceedings, volume 14235 of Lecture Notes in Computer Science, pages 70–84. Springer, 2023. doi:10.1007/978-3-031-45286-4_6.
  • [12] Thomas Brihaye, Julie De Pril, and Sven Schewe. Multiplayer cost games with simple Nash equilibria. In Sergei N. Artëmov and Anil Nerode, editors, Logical Foundations of Computer Science, International Symposium, LFCS 2013, San Diego, CA, USA, January 6-8, 2013. Proceedings, volume 7734 of Lecture Notes in Computer Science, pages 59–73. Springer, 2013. doi:10.1007/978-3-642-35722-0_5.
  • [13] Véronique Bruyère. Computer aided synthesis: A game-theoretic approach. In Émilie Charlier, Julien Leroy, and Michel Rigo, editors, Developments in Language Theory - 21st International Conference, DLT 2017, Liège, Belgium, August 7-11, 2017, Proceedings, volume 10396 of Lecture Notes in Computer Science, pages 3–35. Springer, 2017. doi:10.1007/978-3-319-62809-7_1.
  • [14] Véronique Bruyère. Synthesis of equilibria in infinite-duration games on graphs. ACM SIGLOG News, 8(2):4–29, 2021. doi:10.1145/3467001.3467003.
  • [15] Véronique Bruyère, Baptiste Fievet, Jean-François Raskin, and Clément Tamines. Stackelberg-Pareto synthesis. ACM Trans. Comput. Log., 25(2):14:1–14:49, 2024. doi:10.1145/3651162.
  • [16] Véronique Bruyère, Christophe Grandmont, and Jean-François Raskin. As soon as possible but rationally. In Rupak Majumdar and Alexandra Silva, editors, 35th International Conference on Concurrency Theory, CONCUR 2024, September 9-13, 2024, Calgary, Canada, volume 311 of LIPIcs, pages 14:1–14:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.CONCUR.2024.14.
  • [17] Véronique Bruyère, Jean-François Raskin, and Clément Tamines. Pareto-Rational Verification. In Bartek Klin, Slawomir Lasota, and Anca Muscholl, editors, 33rd International Conference on Concurrency Theory, CONCUR 2022, September 12-16, 2022, Warsaw, Poland, volume 243 of LIPIcs, pages 33:1–33:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.CONCUR.2022.33.
  • [18] Véronique Bruyère, Stéphane Le Roux, Arno Pauly, and Jean-François Raskin. On the existence of weak subgame perfect equilibria. Inf. Comput., 276:104553, 2021. doi:10.1016/J.IC.2020.104553.
  • [19] Véronique Bruyère, Jean-François Raskin, Alexis Reynouard, and Marie Van Den Bogaard. The non-cooperative rational synthesis problem for subgame perfect equilibria and omega-regular objectives, 2024. doi:10.48550/arXiv.2412.08547.
  • [20] Cristian S. Calude, Sanjay Jain, Bakhadyr Khoussainov, Wei Li, and Frank Stephan. Deciding parity games in quasi-polynomial time. SIAM J. Comput., 51(2):STOC17-152–STOC17-188, 2022. doi:10.1137/17M1145288.
  • [21] Krishnendu Chatterjee and Laurent Doyen. The complexity of partial-observation parity games. In Christian G. Fermüller and Andrei Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning - 17th International Conference, LPAR-17, Yogyakarta, Indonesia, October 10-15, 2010. Proceedings, volume 6397 of Lecture Notes in Computer Science, pages 1–14. Springer, 2010. doi:10.1007/978-3-642-16242-8_1.
  • [22] Krishnendu Chatterjee and Laurent Doyen. Games with a weak adversary. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part II, volume 8573 of Lecture Notes in Computer Science, pages 110–121. Springer, 2014. doi:10.1007/978-3-662-43951-7_10.
  • [23] Krishnendu Chatterjee, Laurent Doyen, Emmanuel Filiot, and Jean-François Raskin. Doomsday equilibria for omega-regular games. Inf. Comput., 254:296–315, 2017. doi:10.1016/J.IC.2016.10.012.
  • [24] Krishnendu Chatterjee and Thomas A. Henzinger. Assume-guarantee synthesis. In Orna Grumberg and Michael Huth, editors, Tools and Algorithms for the Construction and Analysis of Systems, 13th International Conference, TACAS 2007, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007 Braga, Portugal, March 24 - April 1, 2007, Proceedings, volume 4424 of Lecture Notes in Computer Science, pages 261–275. Springer, 2007. doi:10.1007/978-3-540-71209-1_21.
  • [25] Krishnendu Chatterjee, Thomas A. Henzinger, and Marcin Jurdzinski. Games with secure equilibria. Theor. Comput. Sci., 365(1-2):67–82, 2006. doi:10.1016/J.TCS.2006.07.032.
  • [26] Rodica Condurache, Emmanuel Filiot, Raffaella Gentilini, and Jean-François Raskin. The complexity of rational synthesis. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, volume 55 of LIPIcs, pages 121:1–121:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.ICALP.2016.121.
  • [27] Rodica Condurache, Youssouf Oualhadj, and Nicolas Troquard. The complexity of rational synthesis for concurrent games. In Sven Schewe and Lijun Zhang, editors, 29th International Conference on Concurrency Theory, CONCUR 2018, September 4-7, 2018, Beijing, China, volume 118 of LIPIcs, pages 38:1–38:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.CONCUR.2018.38.
  • [28] Emmanuel Filiot, Raffaella Gentilini, and Jean-François Raskin. The Adversarial Stackelberg Value in Quantitative Games. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), volume 168 of Leibniz International Proceedings in Informatics (LIPIcs), pages 127:1–127:18, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2020.127.
  • [29] Dana Fisman, Orna Kupferman, and Yoad Lustig. Rational synthesis. In Javier Esparza and Rupak Majumdar, editors, Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings, volume 6015 of Lecture Notes in Computer Science, pages 190–204. Springer, 2010. doi:10.1007/978-3-642-12002-2_16.
  • [30] Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, and Michael J. Wooldridge. Automated temporal equilibrium analysis: Verification and synthesis of multi-player games. Artif. Intell., 287:103353, 2020. doi:10.1016/J.ARTINT.2020.103353.
  • [31] Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, and Michael J. Wooldridge. On the complexity of rational verification. Ann. Math. Artif. Intell., 91(4):409–430, 2023. doi:10.1007/S10472-022-09804-3.
  • [32] Thomas A. Henzinger, Orna Kupferman, and Sriram K. Rajamani. Fair simulation. In Antoni W. Mazurkiewicz and Józef Winkowski, editors, CONCUR ’97: Concurrency Theory, 8th International Conference, Warsaw, Poland, July 1-4, 1997, Proceedings, volume 1243 of Lecture Notes in Computer Science, pages 273–287. Springer, 1997. doi:10.1007/3-540-63141-0_19.
  • [33] Harold W. Kuhn. Extensive games and the problem of information. Classics in Game Theory, pages 46–68, 1953.
  • [34] Orna Kupferman, Giuseppe Perelli, and Moshe Y. Vardi. Synthesis with rational environments. Ann. Math. Artif. Intell., 78(1):3–20, 2016. doi:10.1007/S10472-016-9508-8.
  • [35] Orna Kupferman and Noam Shenwald. The complexity of LTL rational synthesis. In Dana Fisman and Grigore Rosu, editors, Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I, volume 13243 of Lecture Notes in Computer Science, pages 25–45. Springer, 2022. doi:10.1007/978-3-030-99524-9_2.
  • [36] Noémie Meunier. Multi-Player Quantitative Games: Equilibria and Algorithms. PhD thesis, Université de Mons, July 2016.
  • [37] Robin Milner. An algebraic definition of simulation between programs. In D. C. Cooper, editor, Proceedings of the 2nd International Joint Conference on Artificial Intelligence. London, UK, September 1-3, 1971, pages 481–489. William Kaufmann, 1971. URL: http://ijcai.org/Proceedings/71/Papers/044.pdf.
  • [38] Fabio Mogavero, Aniello Murano, Giuseppe Perelli, and Moshe Y. Vardi. What makes Atl* decidable? A decidable fragment of strategy logic. In Maciej Koutny and Irek Ulidowski, editors, CONCUR 2012 - Concurrency Theory - 23rd International Conference, CONCUR 2012, Newcastle upon Tyne, UK, September 4-7, 2012. Proceedings, volume 7454 of Lecture Notes in Computer Science, pages 193–208. Springer, 2012. doi:10.1007/978-3-642-32940-1_15.
  • [39] Martin J. Osborne and Ariel Rubinstein. A course in Game Theory. MIT Press, Cambridge, MA, 1994.
  • [40] Gary L. Peterson and John H. Reif. Multiple-person alternation. In 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 29-31 October 1979, pages 348–363. IEEE Computer Society, 1979. doi:10.1109/SFCS.1979.25.
  • [41] Jean-François Raskin, Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Algorithms for omega-regular games with imperfect information. Log. Methods Comput. Sci., 3(3), 2007. doi:10.2168/LMCS-3(3:4)2007.
  • [42] John H. Reif. The complexity of two-player games of incomplete information. J. Comput. Syst. Sci., 29(2):274–301, 1984. doi:10.1016/0022-0000(84)90034-5.
  • [43] Michael Ummels. Rational behaviour and strategy construction in infinite multiplayer games. In S. Arun-Kumar and Naveen Garg, editors, FSTTCS 2006: Foundations of Software Technology and Theoretical Computer Science, 26th International Conference, Kolkata, India, December 13-15, 2006, Proceedings, volume 4337 of Lecture Notes in Computer Science, pages 212–223. Springer, 2006. doi:10.1007/11944836_21.
  • [44] Michael Ummels. The complexity of Nash equilibria in infinite multiplayer games. In Roberto M. Amadio, editor, Foundations of Software Science and Computational Structures, 11th International Conference, FOSSACS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29 - April 6, 2008. Proceedings, volume 4962 of Lecture Notes in Computer Science, pages 20–34. Springer, 2008. doi:10.1007/978-3-540-78499-9_3.

Appendix A Helpful Examples for Games on Graphs

In this appendix, we present additional examples and explanations for notions referred to in Section 2.

Figure 3: Two NEs and one SPE.
Example 22 (A simple reachability game).

Consider the game structure G pictured in Figure 3. Its initial state is v0 and there are two players: player 0, who owns the circle states, and player 1, who owns the square state. Transitions are represented by the arrows between states: there is an arrow from a state v to a state v′ if there exists an action a (either ℓ or r here) such that δ(v,a)=v′. The set of actions A is thus partitioned into A0={ℓ,r} and A1={ℓ,r}.

Both players have the same reachability objective, with T0=T1={v3}, indicated in the figure by the double circle around state v3. Neither player has a winning strategy: player 0 can prevent player 1 from ever reaching state v3 by choosing to go to v4, v5, or v6, depending on the first action of player 1, while player 1 can prevent player 0 from winning by going to state v2.
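This impossibility claim can be checked mechanically. The following Python sketch is ours, not the paper's: it hard-codes a plausible reading of Figure 3 (v0 goes to v1 on ℓ and to v2 on r, v1 to v3 or v4, v2 to v5 or v6, with v3,…,v6 absorbing) and computes classical attractors for the target {v3}.

```python
# Hypothetical encoding of Figure 3, reconstructed from the text.
# v0 belongs to player 1, v1 and v2 to player 0; v3..v6 are absorbing.
DELTA = {("v0", "l"): "v1", ("v0", "r"): "v2",
         ("v1", "l"): "v3", ("v1", "r"): "v4",
         ("v2", "l"): "v5", ("v2", "r"): "v6"}
OWNER = {"v0": 1, "v1": 0, "v2": 0}   # absorbing states need no owner

def succ(v):
    """Successors of v; absorbing states loop on themselves."""
    out = [w for (u, a), w in DELTA.items() if u == v]
    return out or [v]

def attractor(player, target):
    """Classical attractor: states from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in OWNER:
            if v in attr:
                continue
            succs = succ(v)
            # the owner needs one good successor, the opponent must have only good ones
            ok = any(w in attr for w in succs) if OWNER[v] == player \
                 else all(w in attr for w in succs)
            if ok:
                attr.add(v)
                changed = True
    return attr
```

Under this encoding, v0 lies in neither `attractor(0, {"v3"})` nor `attractor(1, {"v3"})`, matching the example.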

Example 23 (A game with two NEs).

Let us come back to the reachability game of Figure 3. Two strategy profiles are represented in Figure 3: the first one, σ¯, by the red transitions between states, and the second one, σ¯′, by the blue ones. Both profiles are NEs. Indeed, for the red profile with outcome v0rv2r(v6)ω, if player 0 deviates from σ0, the resulting play v0rv2ℓ(v5)ω is still losing for him, since player 1 chooses to go to state v2. On the other hand, if player 1 deviates from σ1, the resulting play is v0ℓv1r(v4)ω, since player 0 chooses to go to state v4, and this play is losing for player 1. Thus, neither player has a profitable deviation from σ¯. Notice that the gain profile of the red profile equals (0,0). Similarly, one can easily check that the blue profile σ¯′ is also an NE, with a gain profile equal to (1,1).

In the red profile σ¯ of the previous example, the possible choice of going from v1 to v4 for player 0 is irrational: after all, his objective is to reach state v3. Notice, however, that this behavior is part of the NE σ¯.
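The two NE claims of Example 23 can be verified by brute force over memoryless profiles, which suffice on this small graph. The encoding below (state names, action letters, transitions) is our reconstruction of Figure 3, not taken from the paper.

```python
from itertools import product

# Transitions of Figure 3 as reconstructed from the text; v3..v6 are absorbing.
DELTA = {("v0", "l"): "v1", ("v0", "r"): "v2",
         ("v1", "l"): "v3", ("v1", "r"): "v4",
         ("v2", "l"): "v5", ("v2", "r"): "v6"}
OWNER = {"v0": 1, "v1": 0, "v2": 0}   # owner of each non-absorbing state

def outcome(profile):
    """Play out a memoryless profile from v0 until an absorbing state."""
    v, play = "v0", ["v0"]
    while v in OWNER:
        v = DELTA[(v, profile[v])]
        play.append(v)
    return play

def gains(profile):
    # Both players have the reachability objective T0 = T1 = {v3}.
    win = int("v3" in outcome(profile))
    return (win, win)

def is_ne(profile):
    """No player improves his own gain by a unilateral (memoryless) deviation."""
    for player in (0, 1):
        owned = [v for v in OWNER if OWNER[v] == player]
        for actions in product("lr", repeat=len(owned)):
            dev = dict(profile, **dict(zip(owned, actions)))
            if gains(dev)[player] > gains(profile)[player]:
                return False
    return True

red  = {"v0": "r", "v1": "r", "v2": "r"}   # outcome v0 r v2 r (v6)^w
blue = {"v0": "l", "v1": "l", "v2": "l"}   # outcome v0 l v1 l (v3)^w
```

Both `is_ne(red)` and `is_ne(blue)` hold, with gain profiles (0,0) and (1,1) respectively.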

Example 24 (A game with an SPE).

Consider again the game in Figure 3. In the subgame of 𝒢 starting after the history v0ℓv1, the red profile is not an NE. Indeed, choosing to go to state v4 is a non-credible threat from player 0, as his target set {v3} is accessible from state v1, which he owns. The strategy of player 0 that goes to v3 from v1 is thus a profitable deviation in this subgame. Therefore, the red profile is not an SPE in 𝒢. On the other hand, one can check that the blue profile is an SPE in 𝒢, as it is an NE in each of its subgames.
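Likewise, the SPE claims can be checked by testing the NE property in every subgame; since the profiles here are memoryless, each subgame is determined by the state it starts in. Again, the encoding of Figure 3 below is our reconstruction, not the paper's.

```python
from itertools import product

# Transitions of Figure 3 as reconstructed from the text; v3..v6 are absorbing.
DELTA = {("v0", "l"): "v1", ("v0", "r"): "v2",
         ("v1", "l"): "v3", ("v1", "r"): "v4",
         ("v2", "l"): "v5", ("v2", "r"): "v6"}
OWNER = {"v0": 1, "v1": 0, "v2": 0}

def gain(profile, start):
    """Common gain of both players (T0 = T1 = {v3}) from `start`."""
    v = start
    while v in OWNER:
        v = DELTA[(v, profile[v])]
    return int(v == "v3")   # v3 is absorbing, so this tests whether it is visited

def is_ne_from(profile, start):
    """NE in the subgame rooted at `start` (memoryless deviations suffice here)."""
    for player in (0, 1):
        owned = [v for v in OWNER if OWNER[v] == player]
        for actions in product("lr", repeat=len(owned)):
            dev = dict(profile, **dict(zip(owned, actions)))
            # objectives coincide, so the common gain is each player's gain
            if gain(dev, start) > gain(profile, start):
                return False
    return True

def is_spe(profile):
    # For memoryless profiles, each subgame is determined by its initial state.
    return all(is_ne_from(profile, v) for v in OWNER)

red  = {"v0": "r", "v1": "r", "v2": "r"}
blue = {"v0": "l", "v1": "l", "v2": "l"}
```

The red profile fails the test in the subgame rooted at v1 (the non-credible threat), while the blue profile passes everywhere.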

Example 25 (A solution to the SPE-NCRS problem).

The answer to the SPE-NCRS problem for the game in Figure 3 is positive. Indeed, consider the strategy σ0 of player 0 that chooses action ℓ in both states v1 and v2. The unique subgame-perfect response σ¯−0 of player 1 is to choose action ℓ in v0. The resulting strategy profile is thus the blue one, whose outcome is winning for player 0.
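The argument of Example 25 amounts to a one-step best-response computation for player 1 once σ0 is fixed. The sketch below uses the same reconstructed encoding of Figure 3 as above (state names and action letters are our reading of the figure).

```python
# Hypothetical encoding of Figure 3, reconstructed from the text.
DELTA = {("v0", "l"): "v1", ("v0", "r"): "v2",
         ("v1", "l"): "v3", ("v1", "r"): "v4",
         ("v2", "l"): "v5", ("v2", "r"): "v6"}
SIGMA0 = {"v1": "l", "v2": "l"}          # player 0's declared strategy

def gain1(action_at_v0):
    """Player 1's gain when he plays `action_at_v0` and player 0 follows SIGMA0."""
    v = DELTA[("v0", action_at_v0)]      # player 1 moves first, at v0
    while v in SIGMA0:                   # then player 0's strategy takes over
        v = DELTA[(v, SIGMA0[v])]
    return int(v == "v3")                # player 1's target set is {v3}

# player 1's best responses at v0
best = [a for a in "lr" if gain1(a) == max(gain1(b) for b in "lr")]
```

Only the action leading to v1 is a best response, so player 1's subgame-perfect response is unique and yields the blue profile.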

Appendix B SPE-NCRS Problem and 𝑷𝟏𝑪𝑷𝟐 Game

In this appendix, we provide additional material for Section 3.

B.1 Imperfect Information

Example 26 (Observation).

In Figure 4, a two-player game structure is pictured, together with the observations of player 0, who owns the circle states (we suppose that player 1 has perfect information). Each state is divided into two sections: on the left, with a white background, the name of the state is displayed, while on the right, with a gray background, the observation of player 0 is displayed. Similarly, the actions ℓ and r of player 1, who owns the square state, are each accompanied by a # on their right side, which is an abuse of notation to mean that # stands for {ℓ,r}, that is, player 0 cannot distinguish between the two possible actions of player 1. Notice that from states owned by player 0, the actions do not have an observation attached to them, as player 0 knows his own actions by hypothesis.

Let us now look at which histories player 0 can or cannot distinguish. From v0, player 0 cannot distinguish between the actions of player 1, and both states v1 and v2 have the same observation set {v1,v2}. Therefore, the two histories v0ℓv1 and v0rv2 are indistinguishable for player 0, i.e., 𝒪bs(v0ℓv1)=𝒪bs(v0rv2). Similarly, we have 𝒪bs(v0ℓv1ℓv3)=𝒪bs(v0rv2ℓv5).

Figure 4: Imperfect information in a reachability game.
Example 27 (Observation-based strategy).

Let us come back to the game structure of Figure 4. Suppose that the objective of player 0 is to reach the target set {v3}. One can verify that player 0 has no observation-based winning strategy. Indeed, as noted in Example 26, player 0 cannot distinguish between the histories h1=v0ℓv1 and h2=v0rv2. By definition, any observation-based strategy must prescribe the same action after both h1 and h2. If this action is r, then both resulting plays are losing for player 0. If it is ℓ, then one of the resulting plays is winning, while the other is losing. Thus, no observation-based strategy ensures that player 0 reaches {v3}.
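The case analysis of Example 27 can be replayed by exhaustive search: an observation-based strategy of player 0 reduces here to a single action, since his two histories of length one are indistinguishable. The encoding of Figure 4 below is our reconstruction from the text.

```python
# Hypothetical encoding of Figure 4: player 1 first plays a at v0,
# then player 0 plays b from v1 or v2; the target set of player 0 is {v3}.
DELTA = {("v0", "l"): "v1", ("v0", "r"): "v2",
         ("v1", "l"): "v3", ("v1", "r"): "v4",
         ("v2", "l"): "v5", ("v2", "r"): "v6"}

def player0_wins(a, b):
    """Does the play on actions (a, b) reach the target state v3?"""
    return DELTA[(DELTA[("v0", a)], b)] == "v3"

# Since Obs(v0 l v1) = Obs(v0 r v2), an observation-based strategy of player 0
# is a single action b, and it is winning iff it beats BOTH actions of player 1.
winning = [b for b in "lr" if all(player0_wins(a, b) for a in "lr")]
```

The search returns no winning observation-based strategy, as argued in the example: b = ℓ loses against r, and b = r loses against everything.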

B.2 Details on the 𝑷𝟏𝑪𝑷𝟐 Game Structure

Definition 28 (Objectives of the P1CP2 game).

Let ρ = v0a0v1a1v2a2⋯ be a play in the game structure of P1CP2(𝒢). Let 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ) be its simulated gain.

  • The play ρ is winning for Challenger if one of the following conditions is satisfied:

    (iC) there exist n ∈ ℕ and ḡ ∈ {0,1}^|Π|, such that for every state vk with k > n, if vk is a player-state, then it is of the form (v′k, ik, ḡk) such that its player-component ik equals ⊥ and its gain-component satisfies

      ḡk = ḡ = 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ),

    (iiC) there exists a player i ≠ 0 such that

      • for every n ∈ ℕ, there exists k > n for which vk is a player-state with its player-component being equal to i,

      • there exists n ∈ ℕ such that for every state vk with k > n, if vk = (v′k, ik, ḡk) is a player-state with ik ≠ ⊥, then its player-component ik equals i and the i-th component ḡk,i of its gain-component ḡk satisfies

        ḡk,i ≥ 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ)i,

    (iiiC) there exist two distinct players i, j ≠ 0 such that for every n ∈ ℕ, there exist k, ℓ > n for which vk and vℓ are player-states, with their player-components being respectively i and j.

    The set of plays satisfying one of these conditions is denoted WC.

  • The play ρ is winning for Prover 1 and Prover 2 if one of the following conditions is satisfied:

    (iP) there exist n ∈ ℕ and ḡ ∈ {0,1}^|Π|, such that for every state vk with k > n, if vk is a player-state, then it is of the form (v′k, ik, ḡk) such that its player-component ik equals ⊥ and its gain-component satisfies

      ḡk = ḡ ≠ 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ),

    (iiP) there exists a player i ≠ 0 such that

      • for every n ∈ ℕ, there exists k > n for which vk is a player-state with its player-component being equal to i,

      • there exists n ∈ ℕ such that for every state vk with k > n, if vk = (v′k, ik, ḡk) is a player-state with ik ≠ ⊥, then its player-component ik equals i and the i-th component ḡk,i of its gain-component ḡk satisfies

        ḡk,i < 𝗌𝗂𝗆𝖦𝖺𝗂𝗇(ρ)i.

    The set of plays satisfying one of these conditions is denoted WP.

 Remark 29.

We have WP1 = WP2 = WP and WP = 𝖯𝗅𝖺𝗒𝗌G ∖ WC.
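For intuition, the case analysis of Definition 28 can be rendered as a small decision procedure on ultimately periodic plays. The sketch below is entirely ours: it abstracts a play by the annotations (ik, ḡk) of the player-states occurring in its repeated cycle (with None standing for ⊥) together with its simulated gain, and it hard-codes one consistent reading of the (partly garbled) comparison symbols of the definition; by Remark 29, the Provers win exactly the plays that Challenger does not.

```python
def challenger_wins(cycle, sim_gain):
    """Decide whether an ultimately periodic play belongs to W_C.

    `cycle`: list of annotations (i, g) of the player-states in the repeated
    cycle, where i is an environment player index or None (standing for ⊥)
    and g is the gain-component, a tuple of 0/1 gains.
    `sim_gain`: the simulated gain of the play, a tuple of 0/1 gains.
    """
    deviators = {i for (i, g) in cycle if i is not None}
    if not deviators:
        # (iC): every player-state eventually has component ⊥ and its
        # gain-component equals the simulated gain of the play.
        return all(g == sim_gain for (_, g) in cycle)
    if len(deviators) >= 2:
        # (iiiC): two distinct players deviate infinitely often.
        return True
    (i,) = deviators
    # (iiC): a unique deviating player i whose recorded gain is at least
    # his simulated gain, i.e., the tested deviation is not profitable.
    return all(g[i] >= sim_gain[i] for (j, g) in cycle if j is not None)
```

For instance, a cycle with a single deviating player 1 whose recorded gain is 0 while his simulated gain is 1 describes a profitable deviation, so the play falls outside W_C and is won by the Provers.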