The Non-Cooperative Rational Synthesis Problem for SPEs and ω-Regular Objectives
Abstract
This paper studies the rational synthesis problem for multi-player games played on graphs when rational players follow subgame perfect equilibria. In these games, one player, the system, declares his strategy upfront, and the other players, composing the environment, then rationally respond by playing strategies forming a subgame perfect equilibrium. We study the complexity of the rational synthesis problem when the players have ω-regular objectives encoded as parity objectives. Our algorithm is based on an encoding into a three-player game with imperfect information, showing that the problem is in 2ExpTime. When the number of environment players is fixed, the problem is in ExpTime and is NP- and coNP-hard. Moreover, for a fixed number of players and reachability objectives, we get a polynomial algorithm.
Keywords and phrases: non-zero-sum games, subgame perfect equilibria, rational synthesis
Funding: Véronique Bruyère: supported by FNRS under PDR Grant T.0023.22 (Rational).
2012 ACM Subject Classification: Theory of computation
Editors: Patricia Bouyer and Jaco van de Pol
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Studying non zero-sum games of infinite duration played on graphs with multiple players [4, 13] poses both theoretical and algorithmic challenges. This paper primarily addresses the (non-cooperative) rational synthesis problem for n-player non zero-sum games featuring ω-regular objectives. In this context, the goal is to algorithmically determine the existence of a strategy for the system (also called player 0) to enforce his objective against any rational response from the environment (players 1 to n). So, rational synthesis supports the automatic synthesis of systems wherein the environment consists of multiple agents, each having their own objective. These agents are assumed to act rationally towards their own objective rather than being strictly antagonistic (to the system). This approach contrasts with the simpler scenario of zero-sum two-player game graphs, the fully antagonistic setting, a framework extensively explored in earlier reactive synthesis research, see [2] and the numerous references therein. While the computational complexity of rational synthesis, where rationality is defined by the concept of Nash equilibrium (NE), has been explored in [26], this paper revisits the rational synthesis problem using the more encompassing definition of subgame perfect equilibrium (SPE) to formalize rationality. NEs have a known limitation in sequential games, including the infinite-duration games on graphs that we consider here: they are prone to non-credible threats. Such threats involve decisions within subgames (potentially reached after a deviation from the equilibrium) that, while not rational, are intended to coerce other players into specific behaviors. SPEs address this limitation, as discussed in [39]: an SPE is a profile of strategies that forms an NE in every subgame, thereby preventing non-rational and thus non-credible threats.
Although SPEs align more intuitively with sequential games, their algorithmic treatment in the context of infinite-duration graph games remains underdeveloped. This gap persists primarily because SPEs require more complex algorithmic techniques than NEs. Moreover, the standard backward induction method used in finite-duration sequential games [33] cannot be directly applied to infinite-duration games due to the non-terminating nature of these interactions.
Kupferman et al. introduced rational synthesis in two distinct forms. The first approach, dubbed cooperative rational synthesis [29], considers an environment working collaboratively with the system, and asks whether there exists an SPE that ensures the specification of the system is met. So, in this model, the agents of the environment engage in an SPE that guarantees a win for player 0, provided such an equilibrium exists. In contrast, the second approach, termed non-cooperative rational synthesis (NCRS) [34], grants the environment greater flexibility. Here, the system first selects a fixed strategy $\sigma_0$ and then the environmental agents respond by selecting any strategies that form an SPE with the fixed strategy of the system. The central algorithmic challenge is determining whether there exists a strategy $\sigma_0$ for the system such that every resulting $\sigma_0$-fixed SPE ensures the specification of the system is upheld (SPE-NCRS problem). The computational complexity of the cooperative synthesis problem with SPEs is now well understood. While the decidability of this problem was first established in [43], its exact complexity was resolved in [6], where the problem was shown to be NP-complete for parity objectives.
In contrast, the computational complexity of the SPE-NCRS problem remains less thoroughly investigated. Although the decidability of this problem can be established through an encoding into Strategy Logic [38], such an encoding does not provide clear insights into the effective construction of the strategy and is suboptimal from an algorithmic standpoint. For example, for rational environment behaviors modeled by NEs instead of SPEs, the NE-NCRS problem can be solved in 3ExpTime for LTL objectives using a Strategy Logic encoding [34], while it can be solved in 2ExpTime through the use of tree automata [35]. This shows that a reduction to Strategy Logic does not deliver optimal worst-case complexity. Addressing this gap, our contribution is twofold. First, we introduce a new algorithm that transforms the SPE-NCRS problem into a three-player imperfect information game. Such games were analyzed in [22], where computationally optimal algorithms were presented for them. Second, our reduction offers a clear advantage over the Strategy Logic encoding, providing improved complexity (double-exponential time as opposed to triple-exponential time for LTL objectives in [34]). It also enables a precise analysis of the algorithmic complexity when the number of players is fixed, a consideration that is practically relevant in cases where the environment consists of a limited number of players.
Technical contributions and sketch of our solution.
The summary of our technical contributions is depicted in Figure 1. Our main result shows how to transform the SPE-NCRS problem into a two-player zero-sum parity game with perfect information (Theorem 11), a well-studied class of games for which algorithms are available. (Thus, our algorithm supports LTL specifications via their translation into deterministic parity automata, yielding parity conditions on the game graph). The transformation is structured in several non-trivial steps.
First, to solve the SPE-NCRS problem, we propose to use the Prover-Challenger framework initially introduced for the development of algorithms capable of determining the presence of a simulation relation between transition systems (for more details and extensions of this concept, see e.g. [1, 32, 37]). However, in our context, we need to use two Provers, $P_1$ and $P_2$, with the Challenger $C$: $P_1$ aims to demonstrate the existence of a solution, i.e., a strategy $\sigma_0$ for the system, to the SPE-NCRS problem, while $C$ seeks to counter this assertion, i.e., with a subgame perfect response that results in an unfavorable outcome for player 0. Then, $P_2$ endeavors to demonstrate that the combined profile is either not a 0-fixed SPE or its outcome favors player 0. To ensure that $\sigma_0$ is fixed and cannot be modified in subgames, we prevent $P_1$ from adjusting his strategy based on the interactions between $P_2$ and $C$ by imposing on him imperfect information of the game. More intuition and the formal definition of this game are given in Section 3 together with a proof of correctness (Theorem 20).
Second, we detail our method for solving this game. Given that it involves three players with imperfect information, specialized techniques are essential for its resolution, as multi-player games with imperfect information are undecidable in general (see e.g. [40]). We employ a solution specifically adapted to our context, derived from a transformation introduced in [22], which was originally proposed for addressing similar types of three-player games with imperfect information. Due to the intricate winning condition present in our Prover-Challenger reduction, we first translate it into an explicit Rabin objective. After this translation, the aforementioned techniques for three-player games with imperfect information can be adapted, and we obtain a more conventional two-player zero-sum Rabin game with imperfect information.
Third, techniques to remove imperfect information, see e.g. [41, 42], can then be used to obtain the desired two-player zero-sum parity game with perfect information. Therefore, solving the SPE-NCRS problem reduces to solving this two-player zero-sum parity game for which algorithms are well-known.
In Section 4, we provide a complexity analysis of each step of our construction. This analysis enables us to derive detailed complexity results: solving the SPE-NCRS problem for parity objectives is exponential in the size of the graph and the number of priorities used by the parity objectives, and double-exponential in the number of players (while a PSpace lower bound can be deduced from [26]). Consequently, when the number of players is fixed, we achieve membership in ExpTime. Furthermore, by adapting proofs from the NE-NCRS problem [27], we establish NP- and co-NP-hardness (the NE-NCRS problem is in PSpace and both NP- and co-NP-hard, even for a fixed number of players). Finally, for the specific case of reachability objectives, we obtain a polynomial-time algorithm when the number of players is fixed, as is the case for the NE-NCRS problem [27].
Related work.
Recent literature, such as surveys in [4, 13, 14], underscores a growing interest in non zero-sum games for the synthesis and design of multi-agent systems. Algorithms have been developed for reasoning about NEs in graph games, both for ω-regular [44] and quantitative objectives [12], and even in a concurrent setting [3]. The concept of secure equilibrium, a refinement of NE, was introduced in [25] and its potential for synthesis was demonstrated in later studies [24]. Similarly, doomsday equilibria, an expansion of secure equilibria to n-player games, are elaborated upon in [23]. All those results are related to the notion of NE. Algorithms to reason on SPEs are more recent and usually more intricate. For that reason, progress on weaker notions like weak SPE [8, 10, 18] was needed before establishing tight complexity bounds for SPEs. For SPEs, the exact complexity of the constrained existence problem was established for reachability games in [9], for parity games in [6], and for mean-payoff games in [5]. All those works introduce new algorithms that are substantially more sophisticated than those needed to study the notion of NE. As mentioned previously, the SPE-NCRS problem with LTL objectives can be addressed by reducing it to the model-checking problem of Strategy Logic, as shown in [34]. However, this encoding results in a triple-exponential complexity, even when the number of players is fixed. Consequently, this approach does not allow for a fine-grained complexity analysis when the number of players is treated as a fixed parameter, and so it can yield neither the ExpTime complexity we achieve in this case, nor the PTime complexity for a fixed number of players with reachability objectives. Cooperative rational synthesis was first introduced in [29]; the adversarial version was later introduced in [34]. In both cases, as mentioned above, the decidability results were obtained through a reduction to Strategy Logic [38].
A more detailed analysis of the complexity for a large variety of ω-regular objectives and for rationality expressed as NEs was given in [26] for turn-based games and in [27] for concurrent games, for LTL objectives in [35], and for two players with mean-payoff and discounted-sum objectives in [28]. Those results do not cover SPEs as we do in this paper. Rational verification (instead of rational synthesis) studies the problem of verifying that a given strategy for the system is a solution to the NE/SPE-NCRS problem. The complexity of this simpler problem has been studied for several kinds of objectives in [7, 30, 31]. Another notion of rational environment behavior treats the environment as a single agent with multiple, sometimes conflicting, goals, aiming for behaviors that achieve a Pareto-optimal balance among these objectives. Both rational synthesis and verification have recently been studied for this concept of rationality [11, 15, 16, 17].
2 Preliminaries
In this section, we recall the necessary notions and concepts underpinning this work. First, we specify the model of games on graphs that we study, along with the corresponding definitions of plays, (winning) strategies, and objectives. Second, we present the solution concepts relevant to our non-zero-sum model and specific problem, ranging from Nash equilibria to 0-fixed subgame-perfect equilibria. Finally, we provide a precise statement of the SPE-NCRS problem (see Definition 10) and our main results (see Theorem 11).
Definition 1 (Game structure).
A game structure is a tuple $G = (\Pi, A, S, s_{init}, \delta)$, where:
- $\Pi = \{0, 1, \ldots, n\}$ is a finite set of players,
- $A = \bigcup_{i \in \Pi} A_i$ is the set of actions of the players, such that $A_i$ is the action set of player $i$, $A_i \neq \emptyset$, and $A_0 \cap A_i = \emptyset$ for all $i \neq 0$,
- $S = \bigcup_{i \in \Pi} S_i$ is the set of states, such that $S_i$ is the state set of player $i$, $S_i \neq \emptyset$, and $S_i \cap S_j = \emptyset$ for all $i \neq j$,
- $s_{init} \in S$ is the initial state,
- $\delta : S \times A \to S$ is a partial function called the transition function, such that:
1. $\delta$ is deadlock-free: for every state $s \in S_i$, there exists an action $a \in A_i$ of player $i$ such that $\delta(s, a)$ is defined,
2. $\delta$ is action-unique: for every state $s \in S_i$ and all distinct actions $a, a' \in A_i$ such that $\delta(s, a)$ and $\delta(s, a')$ are defined, we have $\delta(s, a) \neq \delta(s, a')$.
The size of a game structure is given by the numbers $|S|$, $|A|$, and $|\Pi|$ of its states, actions, and players respectively.
We say that a state $s \in S_i$ is controlled or owned by player $i$. Note that the condition $A_0 \cap A_i = \emptyset$ for all $i \neq 0$ means that one knows from the actions when player 0 is the one that is playing. This property of the action set of player 0 is neither classical nor necessary, but it will reveal itself useful in the remainder of the paper. Indeed, in the setting we study here, player 0 has a distinguished role compared to the other players, see Section 3. Condition 1 on the transition function ensures that in every state, there is always a possible action to play. Condition 2 requires that a transition between two states can only be achieved via a unique action.
A play in a game structure $G$ is an infinite sequence of states and actions: it is of the form $\rho = s_0 a_0 s_1 a_1 s_2 \ldots$ with $s_0 = s_{init}$ the initial state and where for every $j \geq 0$, we have $s_{j+1} = \delta(s_j, a_j)$. The set of all plays in $G$ is denoted $\mathsf{Plays}$. A history is a finite prefix of a play ending in a state of $S$, that we describe as $h = s_0 a_0 \ldots a_{k-1} s_k$. The set of all histories in $G$ is denoted $\mathsf{Hist}$, while for $i \in \Pi$, the set of histories ending in a state controlled by player $i$ is denoted $\mathsf{Hist}_i$. We write $h \leq \rho$ if the history $h$ is a prefix of the play $\rho$. We also use both notations $h \leq h'$ and $h < h'$ for two histories.
A strategy for player $i$ is a function $\sigma_i : \mathsf{Hist}_i \to A_i$ that prescribes an action for player $i$ to choose for every history where it is his turn to play. The set of all strategies for player $i$ is denoted by $\Sigma_i$. A collection $\bar\sigma = (\sigma_i)_{i \in \Pi}$ of strategies, one for each player, is called a profile of strategies. A play $\rho = s_0 a_0 s_1 a_1 \ldots$ is compatible with a strategy $\sigma_i$ of player $i$ if for every $j$ such that $s_j \in S_i$, we have $a_j = \sigma_i(s_0 a_0 \ldots s_j)$. Histories compatible with a strategy are defined similarly. A strategy $\sigma_i$ is said to be memoryless when the prescribed action only depends on the last visited state, that is, for all histories $h, h' \in \mathsf{Hist}_i$ ending in the same state, we have $\sigma_i(h) = \sigma_i(h')$. Given a profile of strategies $\bar\sigma$, there is a unique play starting in $s_{init}$ that is compatible with every strategy of the profile, that we call the outcome of $\bar\sigma$ and denote by $\mathsf{out}(\bar\sigma)$. Given a strict subset of players $P \subsetneq \Pi$, we write $\bar\sigma_{-P}$ to refer to a partial profile of strategies that contains a strategy for each player except the ones in $P$. In particular, we will often focus on the strategy $\sigma_i$ of one player $i$ and the profile $\bar\sigma_{-i}$ of strategies of the rest of the players, and use the notation $(\sigma_i, \bar\sigma_{-i})$ to denote the complete profile of strategies.
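To make the notions of profile and outcome concrete, here is a small illustrative sketch; the graph, owners, and strategies below are hypothetical examples, not taken from the paper. It follows the unique play compatible with a memoryless profile on a tiny turn-based game structure.

```python
# delta[state][action] -> successor state (partial transition function)
delta = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"c": "s0"},
    "s2": {"d": "s2"},
}
owner = {"s0": 0, "s1": 1, "s2": 1}  # which player controls each state

# memoryless strategies: one action per controlled state
sigma = {0: {"s0": "a"}, 1: {"s1": "c", "s2": "d"}}

def outcome_prefix(start, steps):
    """Follow the unique play compatible with the profile for `steps` moves."""
    play, s = [start], start
    for _ in range(steps):
        s = delta[s][sigma[owner[s]][s]]
        play.append(s)
    return play
```

Here `outcome_prefix("s0", 4)` returns the prefix `["s0", "s1", "s0", "s1", "s0"]` of the (infinite) outcome, since players 0 and 1 alternate between `s0` and `s1` forever.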
Definition 2 (Objective and game).
A winning condition, or objective, for player $i$ is a subset $\Omega_i \subseteq \mathsf{Plays}$ of plays in the game structure $G$. We say that a play $\rho$ is winning for player $i$, or satisfies his objective, if $\rho \in \Omega_i$. Otherwise, we say that $\rho$ is losing for player $i$. A game $(G, \bar\Omega)$ is a pair consisting in a game structure $G$ together with a profile $\bar\Omega = (\Omega_i)_{i \in \Pi}$ of objectives for the players. When the context is clear, we often only write $G$ to designate a game. Given a strategy profile $\bar\sigma$, its gain profile (or simply gain) is the Boolean vector $\mathsf{gain}(\bar\sigma) = (\mathsf{gain}_i(\bar\sigma))_{i \in \Pi}$ such that for all $i \in \Pi$, we have $\mathsf{gain}_i(\bar\sigma) = 1$ if $\mathsf{out}(\bar\sigma) \in \Omega_i$, and $\mathsf{gain}_i(\bar\sigma) = 0$ if $\mathsf{out}(\bar\sigma) \notin \Omega_i$. We also say that $\mathsf{gain}(\bar\sigma)$ is the gain profile of the outcome of $\bar\sigma$.
In this paper, we consider the concept of parity objective (which is expressively complete for ω-regular objectives), as well as the particular case of reachability objective. Both concepts are defined in the following way:
- For each $i \in \Pi$, let $T_i \subseteq S$ be a target set. The reachability objective for player $i$ is then $\{\rho = s_0 a_0 s_1 a_1 \ldots \in \mathsf{Plays} \mid \exists j \geq 0,\ s_j \in T_i\}$, that we also denote by $\mathsf{Reach}(T_i)$. Given such a profile $(T_i)_{i \in \Pi}$ of target sets, we say that $(G, \bar\Omega)$ is a reachability game.
- For each $i \in \Pi$, let $C_i \subseteq \mathbb{N}$ be a finite set of priorities and $c_i : S \to C_i$ be a priority function, that is, a function that assigns a priority to each state of the game. The parity objective for player $i$ is then the set of plays $\rho$ such that the maximum priority occurring infinitely often in $c_i(\rho)$ is even (for a play $\rho = s_0 a_0 s_1 a_1 \ldots$, notation $c_i(\rho)$ means the sequence $c_i(s_0) c_i(s_1) \ldots$), that we also denote by $\mathsf{Parity}(c_i)$. Given such a profile $(c_i)_{i \in \Pi}$ of priority functions, we say that $(G, \bar\Omega)$ is a parity game. The size of each parity objective, denoted by $|c_i|$, is the maximum priority in $C_i$.
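As an illustration of the parity condition on an ultimately periodic play $u \cdot v^\omega$, the following hedged sketch (hypothetical state names and priorities, using the max-even convention for parity objectives) decides satisfaction by inspecting the repeated cycle $v$ only, since exactly the priorities on $v$ occur infinitely often:

```python
def satisfies_parity(cycle, priority):
    """cycle = the repeated part v of an ultimately periodic play u·v^ω.
    The parity objective holds iff the maximum priority seen infinitely
    often -- i.e. the maximum priority on the cycle -- is even."""
    return max(priority[s] for s in cycle) % 2 == 0

# illustrative priority function
priority = {"s0": 1, "s1": 2, "s2": 3}
```

For instance, a play eventually looping on `s0 s1` satisfies the objective (maximum recurring priority 2, even), while a play eventually looping on `s2` does not (priority 3, odd).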
Remark 3.
Note that the objectives we consider in this work only put constraints on the states of the plays (and not on the actions).
Definition 4 (Winning strategy).
Let $G$ be a game structure and $\Omega_i$ be an objective for player $i$. A strategy $\sigma_i$ of player $i$ is winning for $\Omega_i$ if for every partial profile $\bar\sigma_{-i}$, the outcome of the profile $(\sigma_i, \bar\sigma_{-i})$ is winning for player $i$, that is, $\mathsf{out}(\sigma_i, \bar\sigma_{-i}) \in \Omega_i$.
This definition focuses on $\Omega_i$ only. A winning strategy ensures that player $i$ satisfies his objective against any strategy profile of the other players. In particular, it ensures that player $i$ wins even if the other players are strictly antagonistic, or adversarial, that is, their objective is to make player $i$ lose. This context corresponds to the classical zero-sum setting, and we use the notation $(G, \Omega_i)$ for the game, since the objective $\Omega_j$ of every player $j \neq i$ is implicitly defined as $\mathsf{Plays} \setminus \Omega_i$. An example of a reachability game and a discussion of the winning strategies of both players are given in Appendix A.
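For the zero-sum reachability setting, the classical attractor construction computes the set of states from which a player has a winning strategy. The sketch below is illustrative (graph, owners, and target are hypothetical examples, not from the paper):

```python
def attractor(states, owner, succ, target, player):
    """States from which `player` can force a visit to `target`."""
    win = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in win:
                continue
            if owner[s] == player:
                ok = any(t in win for t in succ[s])  # player picks one good edge
            else:
                ok = all(t in win for t in succ[s])  # opponent cannot escape
            if ok:
                win.add(s)
                changed = True
    return win

# illustrative instance: player 0 owns s0 and wants to reach s1
states = {"s0", "s1", "s2"}
owner = {"s0": 0, "s1": 1, "s2": 1}
succ = {"s0": ["s1", "s2"], "s1": ["s0"], "s2": ["s2"]}

win = attractor(states, owner, succ, target={"s1"}, player=0)
```

Here `win` is `{"s0", "s1"}`: from `s0`, player 0 can choose the edge to `s1`, while `s2` is a losing sink.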
Within the classical zero-sum context, the synthesis problem asks if there exists, for a particular player, a winning strategy (see [2] for an introduction). As recalled above (cf. Definition 4), such a strategy ensures that this player wins against all possible strategies of the other players. However, when we depart from this adversarial hypothesis and consider the richer setting of non zero-sum games, that is, games where each player has his own objective, which may overlap with the objectives of the others, the solution concept of winning strategy shows its limits. Hence the call for solution concepts such as Nash equilibrium and subgame-perfect equilibrium, on which we focus in this work.
Definition 5 (Nash equilibrium [39]).
Let $(G, \bar\Omega)$ be a game and $\bar\sigma$ be a strategy profile. We say that $\bar\sigma$ is a Nash equilibrium (NE for short) if for every player $i \in \Pi$ and every strategy $\sigma'_i \in \Sigma_i$, we have: if $\mathsf{out}(\sigma'_i, \bar\sigma_{-i}) \in \Omega_i$, then $\mathsf{out}(\bar\sigma) \in \Omega_i$.
In other words, no player has an incentive to deviate unilaterally from his fixed strategy in the profile $\bar\sigma$: if he does so, the resulting outcome will not satisfy his objective if the outcome of $\bar\sigma$ was not already doing so. Notice that, however, there is no constraint on what players are allowed to do once a deviation has already occurred in the outcome of the NE. In fact, players may change their initial behaviors for completely adversarial choices. That is, NEs allow players to forget about their own objectives once the equilibrium outcome has been left, which can result in non-credible threats: promises of behavior that is irrational with respect to their own objectives. These non-credible threats are one important limitation of NEs as a solution concept capturing rationality of the players in sequential games. In order to enforce rationality in every scenario, even the ones that stem from deviations, one needs to monitor what the strategy profile prescribes after every history. This is exactly what subgame perfect equilibria (defined below) do [39]. Let $(G, \bar\Omega)$ be a game. To each history $h \in \mathsf{Hist}$, ending in a state $s$, corresponds a subgame $(G, \bar\Omega)_{\restriction h}$ that is the game starting after the history $h$: its plays start at the initial state $s$ and for all $i \in \Pi$, a play $\rho$ is winning for player $i$ if, and only if, the play of $G$ obtained by concatenating $h$ and $\rho$ (without repeating $s$) belongs to $\Omega_i$. Given a strategy $\sigma_i$ for player $i$, this strategy in the subgame, denoted $\sigma_{i \restriction h}$, is defined as $\sigma_{i \restriction h}(h') = \sigma_i(h h')$ for all histories $h'$ of the subgame starting with the initial state $s$. Given a strategy profile $\bar\sigma$, we denote by $\bar\sigma_{\restriction h}$ the profile $(\sigma_{i \restriction h})_{i \in \Pi}$ (note that its outcome starts in $s$).
Definition 6 (Subgame perfect equilibrium).
Let $(G, \bar\Omega)$ be a game. A subgame perfect equilibrium (SPE for short) is a profile of strategies $\bar\sigma$ such that, for every history $h$, the profile $\bar\sigma_{\restriction h}$ is an NE in the subgame $(G, \bar\Omega)_{\restriction h}$.
We illustrate both concepts of NE and SPE on an example in Appendix A.
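To illustrate the unilateral-deviation test of Definition 5 on a finite instance, the following sketch (all names hypothetical, reachability objectives, memoryless profiles) checks whether a profile is an NE. Since the other players' strategies are memoryless, a deviating player faces a one-player graph problem, so checking only memoryless deviations suffices here:

```python
from itertools import product

def play_reaches(delta, owner, sigma, start, target, limit=20):
    """Does the outcome of the memoryless profile visit the target set?"""
    s = start
    for _ in range(limit):
        if s in target:
            return True
        s = delta[s][sigma[owner[s]][s]]
    return s in target

def is_nash(delta, owner, states_of, sigma, start, targets):
    for i, target in targets.items():
        if play_reaches(delta, owner, sigma, start, target):
            continue  # player i already wins: no profitable deviation
        # enumerate all memoryless deviations of player i
        choices = [list(delta[s].keys()) for s in states_of[i]]
        for pick in product(*choices):
            dev = dict(sigma)
            dev[i] = dict(zip(states_of[i], pick))
            if play_reaches(delta, owner, dev, start, target):
                return False  # profitable deviation found
    return True

# illustrative instance
delta = {"s0": {"a": "s1", "b": "s2"}, "s1": {"c": "s1"}, "s2": {"d": "s2"}}
owner = {"s0": 0, "s1": 1, "s2": 1}
states_of = {0: ["s0"], 1: ["s1", "s2"]}
targets = {0: {"s1"}, 1: {"s2"}}  # reachability target of each player
```

On this instance, the profile where player 0 plays `a` is an NE (player 1 loses but cannot change the play), while the profile where player 0 plays `b` is not, since player 0 could profitably deviate to `a`.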
In this work, we focus on synthesizing a strategy for a specific player (see Definition 10), referred to as the system (or player 0), which we assume to be trustworthy or whose strategy is fixed by design. For instance, this assumption is justified in scenarios where the system is implemented as a program that is subsequently used by its environment (the other players). Since the program is fixed, the system cannot deviate from its prescribed strategy. As stated at the beginning of this section, we consider the non zero-sum setting, where each player has their own objective, which may partially overlap with the objectives of other players. Consequently, we seek a solution concept that addresses two key aspects: (1) the specific player (the system) will not deviate from the prescribed strategy, and (2) the other players may exhibit adversarial behavior toward the specific player, as long as such behavior does not compromise their own objectives, given that they are assumed to be rational [34]. We next define -fixed equilibria, which meet these requirements.
Definition 7 (0-Fixed NE).
Let $(G, \bar\Omega)$ be a game. A strategy profile $\bar\sigma = (\sigma_0, \bar\sigma_{-0})$ is a 0-fixed NE if for every player $i \in \Pi$, where $i \neq 0$, and every strategy $\sigma'_i \in \Sigma_i$, we have: if $\mathsf{out}(\sigma'_i, \bar\sigma_{-i}) \in \Omega_i$, then $\mathsf{out}(\bar\sigma) \in \Omega_i$.
In other words, if the strategy $\sigma_0$ of player 0 is fixed, that is, if we assume player 0 will stick to his strategy, no other player has an incentive to deviate unilaterally. We also write that a strategy profile $(\sigma_0, \bar\sigma_{-0})$ is a $\sigma_0$-fixed NE to insist on the fixed strategy $\sigma_0$ of player 0.
In this paper, we focus on 0-fixed SPEs: profiles of strategies that are 0-fixed NEs in each subgame of $(G, \bar\Omega)$ compatible with the fixed strategy of player 0. Formally:
Definition 8 (0-Fixed SPE).
Let $(G, \bar\Omega)$ be a game. A strategy profile $\bar\sigma = (\sigma_0, \bar\sigma_{-0})$ is a 0-fixed SPE if for every history $h$ compatible with $\sigma_0$, the profile $\bar\sigma_{\restriction h}$ is a 0-fixed NE in the subgame $(G, \bar\Omega)_{\restriction h}$.
As for NEs, we also use the terminology of $\sigma_0$-fixed SPE. Furthermore, given a strategy $\sigma_0$, we say that the partial strategy profile $\bar\sigma_{-0}$ is a subgame-perfect response to $\sigma_0$ if the complete profile $(\sigma_0, \bar\sigma_{-0})$ is a 0-fixed SPE. The next theorem guarantees the existence of an SPE in every reachability or parity game. This result also holds for 0-fixed SPEs [43].
Theorem 9 (Existence of (0-fixed) SPEs).
Given a game $(G, \bar\Omega)$ that is a parity game or a reachability game, there always exists an SPE in $(G, \bar\Omega)$, and, for every strategy $\sigma_0$ of player 0, there always exists a $\sigma_0$-fixed SPE in $(G, \bar\Omega)$.
In this paper, we study the following synthesis problem. Intuitively, the non-cooperative rational synthesis problem asks, for a distinguished player, if there exists a strategy that is as close as possible to a winning strategy, taking into account the fact that the others behave rationally. The solutions sought in this problem are indeed strategies that ensure winning for every possible way to complete the profile in a rational manner (i.e., the other players do not cooperate but take care of their own objectives first).
Definition 10 (Non-cooperative rational synthesis problem).
The non-cooperative rational synthesis problem (SPE-NCRS problem for short) asks, given a game $(G, \bar\Omega)$, if there exists a strategy $\sigma_0$ of player 0 such that, for every subgame-perfect response $\bar\sigma_{-0}$, the resulting outcome $\mathsf{out}(\sigma_0, \bar\sigma_{-0})$ is winning for player 0.
The previous definition is illustrated with an example in Appendix A.
We now state our main results.
Theorem 11 (Complexity of the SPE-NCRS problem).
The SPE-NCRS problem for parity games is in 2ExpTime and PSpace-hard. Given $(G, \bar\Omega)$, this problem is solvable in time exponential in the size of the game structure and in the size $|c_i|$ of each parity objective, $i \in \Pi$, and double-exponential in the number $|\Pi|$ of players.
Furthermore, if the number of players is fixed, the SPE-NCRS problem is in ExpTime, and NP-hard and co-NP-hard, for parity games; and it is solvable in time polynomial in the size of the game for reachability games.
3 The SPE-NCRS Problem and the Prover-Challenger Game
3.1 Overview of our solution
One may notice the particular role player 0 has in the setting of the SPE-NCRS problem: one needs to find, if it exists, a strategy $\sigma_0$ of player 0 that ensures him to win for all possible partial profiles $\bar\sigma_{-0}$ such that $(\sigma_0, \bar\sigma_{-0})$ is a $\sigma_0$-fixed SPE. Our approach is to construct, in the spirit of [36], a zero-sum two-player Prover-Challenger game that will, once solved, deliver the answer. The key idea in this approach is that Prover is set to prove that there exists a solution to the SPE-NCRS problem, while Challenger wants to disprove this claim. However, the special role of player 0 in the original game must be taken into account. To this end, we need to split Prover into a coalition of two Provers, Prover $P_1$ and Prover $P_2$ (we later show, in Section 4, how to come back to the two-player model). We call the resulting three-player game the Prover-Challenger game.
Let us give some intuition. The three players $P_1$, $P_2$, and $C$ (the Challenger) proceed to construct, step by step and with some additional interactions, a play in the Prover-Challenger game that simulates a play in $G$ (see Definition 15 below). Each player has a specific part to play in the construction of this simulating play. As mentioned previously, player $P_1$ has the particular task to exhibit a candidate solution strategy $\sigma_0$ to the SPE-NCRS problem for $(G, \bar\Omega)$. This is done by simulating player 0 playing $\sigma_0$ and not deviating from it. Thus, along the play being constructed, whenever the corresponding state in the simulated play of $G$ belongs to player 0, it is then up to $P_1$ to choose the next state. One of the tasks of player $C$ is to try to complete $\sigma_0$ with a subgame perfect response from the other players in $G$. Thus, whenever the corresponding state in the simulated play of $G$ belongs to a player $i \neq 0$, it is player $C$'s turn to play: he proposes to play an action according to some strategy $\sigma_i$ of player $i$ in $G$ and in addition predicts the gain profile of what will be the outcome of the profile $(\sigma_0, \bar\sigma_{-0})$, that is, of the simulated play being constructed. However, since players other than player 0 in $G$ can deviate, whenever $C$ has made a proposal for a next state, he has to let player $P_2$ have a say: $P_2$ can either accept this proposal or refuse it and deviate on behalf of player $i$ by choosing another state as the next one in the simulating play. This phase is called the decision phase. If this phase results in a deviation, the game proceeds to the adjusting phase: since the current subgame has changed, Challenger has to predict a new gain profile in this subgame, one that shows that the deviation of player $i$ was not a profitable one. Then, the play in the Prover-Challenger game resumes to the construction phase again.
How do we ensure that $\sigma_0$ is fixed? In other words, what restricts $P_1$ from adapting his strategy to the way $P_2$ and $C$ interact, rather than only to the so far constructed simulated history in $G$? The solution we choose is to make use of imperfect information: in the Prover-Challenger game, $P_1$ cannot, in fact, distinguish between all the states. To model the partial view $P_1$ has on the states of the game, we speak of observations. Informally, to each state (and action) is associated an observation, and $P_1$ has to take action upon sequences of observations only (that we call observed histories). In particular, it is assumed that the strategies of $P_1$ should respect the observations, in the sense that two different histories yielding the same sequence of observations should trigger the same action of $P_1$. Thus, we only consider these observation-based strategies as possible behaviors of $P_1$. We now proceed to the formal definition of the Prover-Challenger game. In Section 3.4, we show that solving the SPE-NCRS problem is equivalent to solving this game in a sense stated in Theorem 20.
3.2 Observation Functions
We begin by introducing the concepts of observation and of observation-based strategy. Let $G$ be a game structure and let $i \in \Pi$ be some player. A partition of $S$ defines an observation function $\mathrm{Obs}_i$ for player $i$, that is, for every state $s \in S$, $\mathrm{Obs}_i(s)$ is the block of the partition that contains $s$. A partition of $A$ then extends the previous function to actions, that is, for every action $a \in A$, $\mathrm{Obs}_i(a)$ is the block that contains $a$. The function $\mathrm{Obs}_i$ extends to histories and plays in the straightforward way. We say that player $i$ observes $G$ through $\mathrm{Obs}_i$ if he cannot distinguish between states (resp. actions, histories, plays) that yield the same observation via the function $\mathrm{Obs}_i$. For those states or actions that are the unique element of their observation set, we say that they are visible. We say that $\mathrm{Obs}_i$ is the identity function if $\mathrm{Obs}_i(s) = \{s\}$ for all $s \in S$ and $\mathrm{Obs}_i(a) = \{a\}$ for all $a \in A$. Whenever player $i$ observes the game structure through the identity function, we say that player $i$ has perfect information. If it is through any other function, we say that he has imperfect information. We will always assume that a player sees his own actions perfectly (that is, his observation function restricted to the domain of his action set corresponds to the identity function). Furthermore, we assume all players have perfect recall: they remember the full sequence of observations they witnessed from the start of the play. In the sequel, we only consider player-stable observation functions, such that the last states of histories observed similarly all belong to the same player.
Definition 12 (Player-stable observation function).
An observation function $\mathrm{Obs}_i$ for player $i$ is player-stable if for every two histories $h = s_0 a_0 \ldots s_k$ and $h' = s'_0 a'_0 \ldots s'_k$ such that $\mathrm{Obs}_i(h) = \mathrm{Obs}_i(h')$, the states $s_j$ and $s'_j$ are controlled by the same player, for all $0 \leq j \leq k$.
The identity observation function is trivially player-stable. A game with objectives $\bar\Omega$ for all its players and a player-stable observation function $\mathrm{Obs}_i$ for one of its players is denoted $(G, \bar\Omega, \mathrm{Obs}_i)$. When $\mathrm{Obs}_i$ is the identity, we do not mention it in this notation. Notice that with a player-stable observation function for player $i$, we have that, given a history $h \in \mathsf{Hist}_i$, all histories with the same observation as $h$ also belong to $\mathsf{Hist}_i$. It is therefore natural to ask that a strategy for player $i$ be observation-based, that is, that the same action be played by player $i$ after all histories observed similarly.
Definition 13 (Observation-based strategy).
Given a player-stable observation function $\mathrm{Obs}_i$, a strategy $\sigma_i$ of player $i$ is observation-based if for every pair of histories $h, h' \in \mathsf{Hist}_i$, if $\mathrm{Obs}_i(h) = \mathrm{Obs}_i(h')$, then $\sigma_i(h) = \sigma_i(h')$.
Both concepts of observation and observation-based strategy are illustrated in an example in Appendix B.1.
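The two notions can also be made concrete with a small sketch (the observation classes below are hypothetical): states `s1` and `s2` yield the same observation, so an observation-based strategy must react identically to any two histories that differ only in these states.

```python
# each state is mapped to its observation class; s1 and s2 are indistinguishable
obs = {"s0": "o0", "s1": "o1", "s2": "o1"}

def observe(history):
    """Observation sequence of a history (states only, for simplicity)."""
    return tuple(obs[s] for s in history)

def is_observation_based(strategy, histories):
    """Check the defining property on a finite set of histories."""
    seen = {}
    for h in histories:
        o = observe(h)
        if o in seen and seen[o] != strategy(h):
            return False  # same observations, different actions
        seen[o] = strategy(h)
    return True

histories = [("s0", "s1"), ("s0", "s2")]
```

A constant strategy is observation-based on these two histories, whereas a strategy choosing its action according to whether the last state is `s1` or `s2` is not, since both histories are observed as `("o0", "o1")`.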
3.3 Definition of the Prover-Challenger Game
In this section, let us fix a game $(G, \bar\Omega)$ with a game structure $G$ and objectives $\bar\Omega$. Later, this game will be a reachability game or a parity game. We here formally define the corresponding Prover-Challenger game, which was informally presented in Section 3.1. Recall that the role of player $P_1$ is to simulate a strategy $\sigma_0$ of player 0 in $G$, while player $C$ has to propose a subgame perfect response $\bar\sigma_{-0}$ to $\sigma_0$ such that $(\sigma_0, \bar\sigma_{-0})$ is losing for player 0 in $G$, a response to which player $P_2$ can react by choosing some deviations (decision phase). The interactions between the three players result in a play that simulates a play in $G$. Challenger has to predict a gain profile for this simulated play, which he can update upon deviations from $P_2$ (adjusting phase). We begin by defining the game structure of the new game. (Note that, to avoid confusion with the players of $G$, and without loss of generality, the three players of the Prover-Challenger game have been given explicit names rather than numbers.) We then define the observation function of $P_1$ in Definition 19; and we finally define and describe the objectives of the three players. Figure 2 should help the reader understand those definitions. The states (except the initial state) of the game structure have several components. All of them have a state of $G$ as their first component, that we call the $G$-component. Similarly, all of them have a gain-component, which consists of a gain profile for the players of $G$, where in each component of the profile, $0$ symbolizes a loss and $1$ a win with respect to the corresponding objective. The states that have only these two components are called $G$-states and belong either to $P_1$ or to $C$. A projection on the $G$-components of these states determines the simulated play in $G$ (see Definition 15). As already stated in Section 3.1, whenever the current state of the simulated play belongs to player 0 in $G$, the next state of the simulated play is chosen directly by $P_1$. However, when it is not the case, $C$ has to make a proposition, which should then be validated or changed by $P_2$ (decision phase).
Thus, to reflect this proposal while keeping it hidden from , the successor states of the -states belonging to have an extra component, called the action-component, which records the action of proposed by . From such states, called action-states, has to react by confirming or changing the action that was proposed by . This is modeled by a third kind of state, called -states. Such states have a player-component, which consists of a player from different from player , to signal that this player has indeed deviated (or the empty set, to signal acceptance by Prover of Challenger's proposal).
Definition 14 (States of the game).
Given a game , the game structure of is a game structure , where , , and are the three players and the set of states is as follows:
-
,
-
,
-
,
-
is the initial state.
Among those states, are -states, are action-states, and are player-states. Moreover, is a -component, is a gain-component, is an action-component, and is a player-component. Notice that the sets are pairwise disjoint.
Let us now describe the transitions of the game and explain further the actions of the players, see also Figure 2 . The goal of Challenger is to prove the existence of a -fixed SPE losing for player in . To be able to verify this claim in , Challenger has to predict the gain of the simulated play that is being constructed. To do so, the gain-components of the states of are used as follows: First, Challenger owns the initial state , and has to choose a gain profile , losing for player (as an outcome of a subgame perfect response to losing for player is exactly what Challenger wants to exhibit, for any simulated by ), to start the construction of the simulated play in the -state . Then, whenever chooses to make some player deviate before reaching a -state, Challenger has to respond by choosing an adjusted gain profile with a lower or equal gain for this player (adjusting phase). This new gain shows the absence of any profitable deviation for player . Note that, for the game structure, two modeling decisions have been made to help handle the subsequent developments. First, to ensure a certain regularity of the shape of the plays in the game, and that cannot infer whether actually made a deviation, the adjusting phase is played regardless of whether a deviation occurred: if there was no deviation, Challenger has no choice but to play the same gain profile. Second, to avoid confusion on who, among Challenger and Prover , is currently playing some action , this action is of the form for and for , that is, we additionally recall the player from performing the action when Challenger is playing.
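As a concrete reading of the adjusting phase, Challenger's rule can be sketched as a small check. The encoding below is hypothetical (gain profiles as tuples over {0, 1} indexed by player, `deviator=None` meaning no deviation occurred); the text above only constrains the deviating player's component.

```python
def valid_adjustment(current, proposed, deviator):
    """Challenger's adjusting-phase rule (hypothetical encoding).
    - If no deviation occurred (deviator is None), Challenger has no
      choice: the gain profile must be repeated unchanged.
    - Otherwise, the deviating player's component may only stay equal
      or decrease, witnessing that the deviation is not profitable."""
    if deviator is None:
        return proposed == current
    return proposed[deviator] <= current[deviator]
```

For instance, a predicted win (1) for the deviating player may be adjusted down to a loss (0), but never the other way around.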
As already mentioned in Section 3.1, given a play in , there exists a unique corresponding play in being simulated by the interactions of the Provers and Challenger.
Definition 15 (Simulated play and gain in ).
Let be a play in the game structure of . The simulated play of is the play obtained by projecting on the -component of the -states (which belong either to or ) and on the actions of and . This projection does not take into account the action- and player-states, nor the actions of Challenger. The definition extends naturally to histories. The simulated gain of is a Boolean vector , such that if is losing for player , and if is winning for player .
Let us give an example:
Example 16.
Consider the fictional history in the game of some game . It starts in the initial state , where chooses a gain profile where . Thus the second state of the history is the -state . Looking at the form of the third state in , we can deduce that belongs to Prover , that is, belongs to player in . Moreover . Then, the fourth state in is an action-state, which means that the previous state belongs to Challenger, such that belongs to player in , and is the action from proposed by in this scenario. One can see that then chooses the action , which is different from , since the fifth state of is a state whose player-component is not equal to . This indicates that makes player deviate in and that . At this point, Challenger has to choose a new gain profile , yielding the last state of , which is a -state. By projecting on the -components of its -states and on the actions of the Provers, one can check that the simulated history of in is . Consider now the fictional play in the game. Looking at , one can deduce that from , the last state of , Challenger proposes action from for player and that accepts this proposal. Furthermore, Challenger cannot adjust the gain profile, leading to state . One can see that this behavior repeats indefinitely, thus the corresponding simulated play is . The simulated gain of is a Boolean vector deduced from , such that its -th component is equal to if, and only if, is winning for player .
Remark 17.
Note that while there exists a unique simulated play in for every play in , the converse does not hold. Indeed, several different sequences of interactions between and yield the same simulated play in . For instance, Challenger can propose different actions that can refuse.
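The projection of Definition 15 can be sketched at the level of states (actions omitted). The encoding is hypothetical: each state of the new game carries a `kind` tag and its G-component under the key `g`.

```python
def simulated_play(play):
    """Project a play of the new game onto the play it simulates in
    the original game (hypothetical encoding: states as dicts with a
    'kind' tag). Only the G-component of G-states survives; action-
    and player-states, and Challenger's moves, are dropped."""
    return [state["g"] for state in play if state["kind"] == "g"]
```

Several distinct plays of the new game project onto the same list, which is exactly the non-injectivity observed in Remark 17.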
We have seen that each state and action of appear in several contexts in the game structure of . Let us now state how the size of depends on the size of . The proof of this lemma directly follows from the definition of the game.
Lemma 18.
Given the sizes , , and of the game structure of , the game structure of has 3 players, a set of states linear in and and exponential in , and a set of actions linear in and exponential in .
In the game, recall that and have perfect information whereas has imperfect information to ensure that cannot adapt his strategy from observing the interactions between and . The observation function for player is defined below on and . For each state of , he is only able to observe the -component of . Concerning the actions of , those of are all visible whereas those of are not visible at all. In the next definition, there is an abuse of notation in and .
Definition 19 (Information in the game).
Given the game structure of , players and have perfect information and the observation function of player is defined as follows: We have such that for all , , for all , for all , and for all . Furthermore, we have such that for all , for all , and for all .
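The observation function can be read as follows: two states that differ only in their hidden components are indistinguishable. A minimal sketch, under the same hypothetical state encoding as before (G-component under key `g`, hidden gain-, action- and player-components under other keys):

```python
def observe(state):
    """Observation of the imperfectly informed Prover (hypothetical
    encoding): only the G-component of a state is visible; the gain-,
    action- and player-components are hidden, so states differing only
    in those components yield the same observation."""
    return state["g"]
```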
To finalize the definition of the game, it remains to define the objectives of the three players. Recall that the objective of Challenger is to show that for each strategy of player in , there exists a subgame perfect response such that the outcome of is losing for player . Let us give some intuition on when a play in the game is winning for . Three winning situations may occur:
-
Eventually, that is, after reaching some subgame, always accepts the action proposals of and the gain predicted by in the subgame is correct (it is equal to the simulated gain in this subgame), or
-
eventually, keeps making one unique player deviate in the decision phase, but is able to adjust the gain to show that this deviation is not profitable for player , or finally
-
keeps making at least two different players deviate, essentially conceding the play, as the only deviations that can be considered within the scope of (-fixed) SPEs are unilateral deviations.
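The third situation is decidable on the cycle of an eventually-periodic (lasso-shaped) play, since exactly the events repeated in the cycle occur infinitely often. A sketch under a hypothetical encoding: deviation events are player indices, with `None` recording a round where the proposal was accepted.

```python
def two_deviators_forever(cycle_events):
    """Third winning situation for Challenger on a lasso play
    (hypothetical encoding): cycle_events lists the events of one
    period of the cycle, each a deviating player's index or None when
    the proposal was accepted. Challenger wins if at least two
    distinct players are made to deviate infinitely often."""
    deviators = {event for event in cycle_events if event is not None}
    return len(deviators) >= 2
```

This reflects that only unilateral deviations matter for (0-fixed) SPEs: making two different players deviate forever concedes the play.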
In the game, the two Provers share the same objective, which is opposed to the objective of Challenger. Indeed, recall that their objective is to exhibit a strategy for player in such that, for every subgame-perfect response to , the outcome of the resulting profile is winning for player . Let us give some intuition on when a play in the game is winning for the Provers. Two winning situations may occur for the Provers along a play in the game:
-
Eventually, after reaching some subgame, always accepts the action proposals of and the gain predicted by is incorrect, or
-
eventually, keeps making one unique player deviate in the decision phase, and is not able to adjust the gain to show that this deviation is not profitable for player .
A full formal definition of the players’ objectives is provided in Appendix B.2.
Given a game , we have its game, where is the objective of the Provers.
3.4 Equivalence Between SPE-NCRS Problem and Game
In this section, we show that solving the SPE-NCRS problem for a game is equivalent to solving its game, as stated in the following theorem. To keep things short, we give here a sketch of the proof, and refer the reader to [19] for a full and detailed account.
Theorem 20 (Equivalence theorem between and ).
Let be a game, and be its game. There exists in a strategy of player , such that, for every subgame-perfect response , the play is winning for player if, and only if, there exists in an observation-based strategy of such that for all strategies of , there exists a strategy of such that the play belongs to .
Proof sketch.
Let be a solution to the SPE-NCRS problem of . Let be the observation-based strategy of that simulates : it is naturally defined as the strategy that makes the same choice of action as at in , at all the histories that are observed as for in . Let be a strategy of . Let be the strategy profile obtained from by extracting the actions chosen by at histories that indeed simulate histories in . There are two possibilities: either is a 0-fixed SPE in , or it is not.
-
1.
Suppose is a 0-fixed SPE in . Let be , the accepting strategy of , where he accepts every action proposal of and never deviates. By definition of , the simulated play of is . In that case, since is a solution to the SPE-NCRS problem of , we know that . However, by construction of and the fact that is the accepting strategy of , we know that has predicted a loss for player from the start, and has never had to adjust his prediction. Thus, we are in the case of the winning condition for the Provers.
-
2.
Suppose now is not a 0-fixed SPE in . This means that it is not a 0-fixed NE in some subgame of . Let be a history compatible with such that is not an NE. Let player be a player in that has a profitable deviation in the subgame starting after . In , we can show that there exists a unique history ending in a -state, compatible with , such that and that is also compatible with . Depending on Challenger's prediction for player at , the strategy has to be different: if Challenger predicted a loss, we let bring the play in , then simulate the profitable deviation of player . If Challenger predicted a win, we let bring the play in , then switch to : since player has a profitable deviation from , it is necessarily the case that the resulting play in is losing for player .
Turning to the other direction, assume that there is no solution to the SPE-NCRS problem in . Let be a strategy of in . Let be the strategy of player in such that is its simulation. By Theorem 9, there exists a -fixed SPE in . Among all those -fixed SPEs, one must have an outcome losing for player as, by assumption, is not a solution of the SPE-NCRS problem in . Let be such a -fixed SPE. From these, we can define a strategy for that will precisely enact on relevant histories, and predict correctly the gain profile as the one in every subgame. In particular, at the initial state, it predicts the gain profile of , such that by choice of the -fixed SPE.
Let now be a strategy of . Consider the play . If there are infinitely many deviations by two different players of in the play , then satisfies the winning condition of Challenger. Assume now that there is at most one player that makes deviate. There are two cases:
-
1.
Either prescribes a finite (possibly null) number of deviations along , and then switches to the accepting strategy . Eventually, after reaching some subgame, always accepts the action proposals of and the gain predicted by in the subgame is correct. Hence satisfies the winning condition of Challenger.
-
2.
Or prescribes an infinite number of deviations for the same player along . As there are only two values (0 and 1) for the gain-components and there is no opportunity for the -component of the predicted gain to increase, we eventually get a stable gain-component for player . If , we clearly have , showing that is satisfied. If , recall that is a -fixed SPE, thus it is an NE in the subgame from which was stabilized. The outcome of in this subgame is losing for player as Challenger correctly predicts a gain of with . Any deviation from this outcome is thus also losing for player , including . Thus, is again satisfied.
4 Solving the Game
This section is the last puzzle piece needed to complete the proof of Theorem 11. We present here the key ideas behind the multi-step process of solving the game, while the full technical constructions and details can be found in [19]. Thanks to Theorem 20, we have an equivalence between the existence of a solution of the SPE-NCRS problem for a game and the fact that the Provers are able to win the associated game. More precisely, this is the situation where has a strategy such that, for every strategy of Challenger, has a strategy to make the Provers win. The remaining question is how to determine whether this is the case, that is, how to solve, in this particular sense, this three-player game with imperfect information. As shown in Figure 1 , this involves several steps. We start with a first technical step to obtain a game with a Rabin objective for the Provers. Then, we get rid of the three-player setting by eliminating one Prover, to obtain a two-player game with imperfect information. Once we have such a game, we focus on getting rid of the imperfect information and on working with a parity objective instead of a Rabin one. This allows us to obtain an equivalent two-player zero-sum parity game with perfect information, which is effectively solvable. The techniques used are similar to those found in the literature ([21, 22, 41, 42]). However, their settings each differ slightly from one another and from ours; thus, to obtain finer complexity measures, we adapt these techniques to our setting.
The Game as a Rabin Game.
Given a parity game , we show that its corresponding game can be seen as a three-player game with the objective for the two Provers translated into a Rabin objective. The approach is to use a deterministic automaton that observes the states of . Then the synchronized product of the game structure of with this observer automaton is equipped with a Rabin objective translating and thus leads to the announced three-player Rabin game.
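The synchronized product with a deterministic observer automaton can be sketched generically: product states are pairs of a game state and an automaton state, and the automaton advances deterministically on each game state visited. All names below are illustrative, not the paper's notation.

```python
from collections import deque

def observer_product(trans, init, delta, q0):
    """Reachable part of the synchronized product of a game structure
    with a deterministic observer automaton (generic sketch).
    trans: game state -> list of successor states;
    delta: (automaton state, game state) -> automaton state, applied
    to every game state as it is visited, starting from q0."""
    start = (init, delta[(q0, init)])
    product, todo, seen = {}, deque([start]), {start}
    while todo:
        s, q = todo.popleft()
        succs = []
        for s2 in trans.get(s, []):
            pair = (s2, delta[(q, s2)])  # automaton reads the new state
            succs.append(pair)
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
        product[(s, q)] = succs
    return product
```

The Rabin condition of the resulting game is then stated on the automaton component of the product states.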
From two Provers to one Prover.
The next step is to merge and into a unique Prover by using a technique inspired by [22]. The main idea is to use imperfect information to ensure that merging the two Provers does not grant too much knowledge to the new single Prover. Indeed, if the new Prover had perfect information, he could not simulate Prover truthfully. Thus we let the new Prover have the same level of information as Prover in the game. However, in order to let the new Prover have as many actions available as Prover while staying observation-based, we modify the action set to include all functions from states of to actions of . This way, the “merged” Prover preselects actions for each possible Challenger move, effectively encoding both Provers' strategies via a state-to-action function. This yields an exponential blowup in the number of actions. The Rabin objective remains the same. We obtain the corresponding game, denoted .
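The blowup in the merged Prover's action set can be made explicit: one merged action is a function preselecting a reply to every move Challenger might make, without observing which one is actually played. A sketch with hypothetical names:

```python
from itertools import product

def merged_actions(c_moves, p1_actions):
    """Action set of the merged Prover (sketch): every function from
    Challenger's possible moves to P1's actions, represented as a
    dict. There are len(p1_actions) ** len(c_moves) such functions --
    the exponential blowup mentioned above."""
    return [dict(zip(c_moves, choice))
            for choice in product(p1_actions, repeat=len(c_moves))]
```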
Eliminating Imperfect Information.
To solve the two-player game , we get rid of the imperfect information and then apply standard game-theoretic techniques. In the literature [21, 41, 42], this is usually done in two steps: first, make the objective visible, that is, such that any two similarly observed plays agree on the winning condition; second, apply the subset construction to record the set of possibly visited states, letting these sets be observed. This can be done in a single step, by modifying the game structure to simultaneously entail both the subset construction on the states of the game and the product with an automaton that monitors the winning condition along the plays. This allows us to limit the exponential blowups to a single one, in the size of the state space and of the Rabin condition. Essentially, the monitoring automaton is a deterministic parity automaton obtained by complementing a non-deterministic Streett automaton that detects sets of similarly observable plays containing a play that is losing for the now unique Prover.
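The core of the subset construction is a single knowledge update: from the set of states the imperfectly informed player considers possible, compute the successor knowledge set consistent with each observation he may receive next. A generic sketch (names illustrative):

```python
def knowledge_step(trans, obs, knowledge):
    """One step of the knowledge-subset construction (generic sketch).
    trans: state -> list of successors; obs: state -> observation;
    knowledge: set of states currently considered possible.
    Returns, for each observation that may come next, the refined
    knowledge set: successors carrying that observation."""
    successors = {s2 for s in knowledge for s2 in trans.get(s, [])}
    refined = {}
    for s2 in successors:
        refined.setdefault(obs[s2], set()).add(s2)
    return {o: frozenset(states) for o, states in refined.items()}
```

Iterating this update from the initial knowledge set yields the (exponentially larger) perfect-information game over knowledge sets.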
Complexity.
Recall that we set out from a parity game with , , and denoting the size of the game structure , and denoting the size of each parity condition . By the construction mentioned above, we have an equivalence with a Rabin game with a state set of size linear in and and exponential in , and an action set linear in and exponential in , while its Rabin objective has a size linear in and the . Reducing to only one Prover, we obtain an exponential number of actions compared to the two-Prover version, thus exponential in and double-exponential in . As mentioned above, with the removal of imperfect information, we obtain a game whose state set is now exponential in , , the and double-exponential in , while the new parity condition is polynomial in and exponential in . Finally, by [20], solving the ultimate parity game, or equivalently the SPE-NCRS problem, is in time exponential in and double-exponential in . As the game structure of is action-unique (see Definition 1), it follows that , thus leading to an algorithm in time exponential in and each , and double-exponential in . This completes the proof of the complexity upper bound of Theorem 11. For the PSpace lower bound, the QBF reduction of [26] (Theorem 7) applies here; all NE responses to in that proof are also SPEs. For the complexity lower bound of Theorem 11 when the number of players is fixed, we adapt the reduction proposed for the NE-NCRS problem in [26] so as to deal with SPEs instead of NEs.
The Case of Reachability.
A careful analysis of the simpler case of reachability games, when the number of players is fixed, leads to a more fine-tuned solution and a better complexity result: Theorem 11 states a polynomial complexity instead of the exponential complexity of the parity case. This approach follows the same steps as presented above for parity games, but with a few adjustments. The key idea is that monitoring reachability objectives is simpler than monitoring parity ones, which means that the synchronized product with an observer automaton can omit some information from the original game structure. Indeed, when monitoring a play in the game that stems from an original reachability game, one still needs to keep track of the actions of both Provers and Challenger, but the checking of the gain component is simplified compared to parity: only whether each player has already visited his target set needs to be remembered. This can be exploited to avoid the exponential blowup of the state space in the subset construction phase (with respect to the original state set ) used to get rid of imperfect information, and thus to obtain a better complexity.
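The simplification for reachability can be made concrete: the gain information along a play collapses to one monotone bit per player, set once the player's target set is first visited. A sketch with hypothetical names:

```python
def reachability_gains(play, targets):
    """Gain tracking for reachability objectives (sketch): per player,
    a single bit suffices -- whether the play has already visited that
    player's target set. targets: one set of states per player
    (hypothetical encoding). The bits only ever go from False to True,
    which is what keeps the monitoring small."""
    seen = [False] * len(targets)
    for state in play:
        for i, target in enumerate(targets):
            if state in target:
                seen[i] = True
    return seen
```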
Remark 21.
Suppose that our algorithm establishes the existence of a solution to the SPE-NCRS problem for parity games. As this solution is obtained from a memoryless winning strategy in the final zero-sum parity game, we get a finite-memory solution whose memory size is exponential when the number of players is fixed, and doubly exponential otherwise. For a lower bound on the required memory, it is fairly straightforward to show that exponential memory is necessary, by reducing a zero-sum Streett game to our problem.
5 Conclusion
In this work, we introduce a novel algorithm to solve the SPE-NCRS problem for parity objectives. Unlike previous methods that converted the problem into a model-checking problem for Strategy Logic, our algorithm reduces the SPE-NCRS problem to a three-player zero-sum game with imperfect information, framed as a Prover-Challenger game. This new angle yields improved complexity upper bounds: exponential time in the number of vertices of the game structure and the number of priorities of the parity objectives, and doubly exponential time in the number of players. In particular, our algorithm runs in exponential time for a fixed number of players, which is particularly relevant since the number of players is typically small in practical scenarios. Moreover, we establish a lower bound indicating that the SPE-NCRS problem cannot be solved in polynomial time, even for a fixed number of players, unless standard complexity-theoretic assumptions fail. For the particular case of reachability objectives, when the number of players is fixed, we prove polynomial complexity, as for the NE-NCRS problem [26]. We believe that the Prover-Challenger framework, based on a three-player model with imperfect information, may be applicable to other synthesis challenges beyond our current application.
References
- [1] Rajeev Alur, Thomas A. Henzinger, Orna Kupferman, and Moshe Y. Vardi. Alternating refinement relations. In Davide Sangiorgi and Robert de Simone, editors, CONCUR ’98: Concurrency Theory, 9th International Conference, Nice, France, September 8-11, 1998, Proceedings, volume 1466 of Lecture Notes in Computer Science, pages 163–178. Springer, 1998. doi:10.1007/BFB0055622.
- [2] Roderick Bloem, Krishnendu Chatterjee, and Barbara Jobstmann. Graph games and reactive synthesis. In Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem, editors, Handbook of Model Checking, pages 921–962. Springer, 2018. doi:10.1007/978-3-319-10575-8_27.
- [3] Patricia Bouyer, Romain Brenguier, Nicolas Markey, and Michael Ummels. Pure Nash equilibria in concurrent deterministic games. Log. Methods Comput. Sci., 11(2), 2015. doi:10.2168/LMCS-11(2:9)2015.
- [4] Romain Brenguier, Lorenzo Clemente, Paul Hunter, Guillermo A. Pérez, Mickael Randour, Jean-François Raskin, Ocan Sankur, and Mathieu Sassolas. Non-zero sum games for reactive synthesis. In Adrian-Horia Dediu, Jan Janousek, Carlos Martín-Vide, and Bianca Truthe, editors, Language and Automata Theory and Applications - 10th International Conference, LATA 2016, Prague, Czech Republic, March 14-18, 2016, Proceedings, volume 9618 of Lecture Notes in Computer Science, pages 3–23. Springer, 2016. doi:10.1007/978-3-319-30000-9_1.
- [5] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. The complexity of SPEs in mean-payoff games. In Mikolaj Bojanczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4-8, 2022, Paris, France, volume 229 of LIPIcs, pages 116:1–116:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ICALP.2022.116.
- [6] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. On the complexity of SPEs in parity games. In Florin Manea and Alex Simpson, editors, 30th EACSL Annual Conference on Computer Science Logic, CSL 2022, February 14-19, 2022, Göttingen, Germany (Virtual Conference), volume 216 of LIPIcs, pages 10:1–10:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CSL.2022.10.
- [7] Léonard Brice, Jean-François Raskin, and Marie van den Bogaard. Rational Verification for Nash and Subgame-Perfect Equilibria in Graph Games. In Jérôme Leroux, Sylvain Lombardy, and David Peleg, editors, 48th International Symposium on Mathematical Foundations of Computer Science, MFCS 2023, August 28 to September 1, 2023, Bordeaux, France, volume 272 of LIPIcs, pages 26:1–26:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.MFCS.2023.26.
- [8] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, and Jean-François Raskin. Constrained existence problem for weak subgame perfect equilibria with -regular Boolean objectives. Inf. Comput., 278:104594, 2021. doi:10.1016/J.IC.2020.104594.
- [9] Thomas Brihaye, Véronique Bruyère, Aline Goeminne, Jean-François Raskin, and Marie van den Bogaard. The complexity of subgame perfect equilibria in quantitative reachability games. Log. Methods Comput. Sci., 16(4), 2020. URL: https://lmcs.episciences.org/6883.
- [10] Thomas Brihaye, Véronique Bruyère, Noémie Meunier, and Jean-François Raskin. Weak subgame perfect equilibria and their application to quantitative reachability. In Stephan Kreutzer, editor, 24th EACSL Annual Conference on Computer Science Logic, CSL 2015, September 7-10, 2015, Berlin, Germany, volume 41 of LIPIcs, pages 504–518. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015. doi:10.4230/LIPICS.CSL.2015.504.
- [11] Thomas Brihaye, Véronique Bruyère, and Gaspard Reghem. Quantitative Reachability Stackelberg-Pareto Synthesis is NEXPTIME-Complete. In Olivier Bournez, Enrico Formenti, and Igor Potapov, editors, Reachability Problems - 17th International Conference, RP 2023, Nice, France, October 11-13, 2023, Proceedings, volume 14235 of Lecture Notes in Computer Science, pages 70–84. Springer, 2023. doi:10.1007/978-3-031-45286-4_6.
- [12] Thomas Brihaye, Julie De Pril, and Sven Schewe. Multiplayer cost games with simple Nash equilibria. In Sergei N. Artëmov and Anil Nerode, editors, Logical Foundations of Computer Science, International Symposium, LFCS 2013, San Diego, CA, USA, January 6-8, 2013. Proceedings, volume 7734 of Lecture Notes in Computer Science, pages 59–73. Springer, 2013. doi:10.1007/978-3-642-35722-0_5.
- [13] Véronique Bruyère. Computer aided synthesis: A game-theoretic approach. In Émilie Charlier, Julien Leroy, and Michel Rigo, editors, Developments in Language Theory - 21st International Conference, DLT 2017, Liège, Belgium, August 7-11, 2017, Proceedings, volume 10396 of Lecture Notes in Computer Science, pages 3–35. Springer, 2017. doi:10.1007/978-3-319-62809-7_1.
- [14] Véronique Bruyère. Synthesis of equilibria in infinite-duration games on graphs. ACM SIGLOG News, 8(2):4–29, 2021. doi:10.1145/3467001.3467003.
- [15] Véronique Bruyère, Baptiste Fievet, Jean-François Raskin, and Clément Tamines. Stackelberg-Pareto synthesis. ACM Trans. Comput. Log., 25(2):14:1–14:49, 2024. doi:10.1145/3651162.
- [16] Véronique Bruyère, Christophe Grandmont, and Jean-François Raskin. As soon as possible but rationally. In Rupak Majumdar and Alexandra Silva, editors, 35th International Conference on Concurrency Theory, CONCUR 2024, September 9-13, 2024, Calgary, Canada, volume 311 of LIPIcs, pages 14:1–14:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.CONCUR.2024.14.
- [17] Véronique Bruyère, Jean-François Raskin, and Clément Tamines. Pareto-Rational Verification. In Bartek Klin, Slawomir Lasota, and Anca Muscholl, editors, 33rd International Conference on Concurrency Theory, CONCUR 2022, September 12-16, 2022, Warsaw, Poland, volume 243 of LIPIcs, pages 33:1–33:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.CONCUR.2022.33.
- [18] Véronique Bruyère, Stéphane Le Roux, Arno Pauly, and Jean-François Raskin. On the existence of weak subgame perfect equilibria. Inf. Comput., 276:104553, 2021. doi:10.1016/J.IC.2020.104553.
- [19] Véronique Bruyère, Jean-François Raskin, Alexis Reynouard, and Marie Van Den Bogaard. The non-cooperative rational synthesis problem for subgame perfect equilibria and omega-regular objectives, 2024. doi:10.48550/arXiv.2412.08547.
- [20] Cristian S. Calude, Sanjay Jain, Bakhadyr Khoussainov, Wei Li, and Frank Stephan. Deciding parity games in quasi-polynomial time. SIAM J. Comput., 51(2):17–152, 2022. doi:10.1137/17M1145288.
- [21] Krishnendu Chatterjee and Laurent Doyen. The complexity of partial-observation parity games. In Christian G. Fermüller and Andrei Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning - 17th International Conference, LPAR-17, Yogyakarta, Indonesia, October 10-15, 2010. Proceedings, volume 6397 of Lecture Notes in Computer Science, pages 1–14. Springer, 2010. doi:10.1007/978-3-642-16242-8_1.
- [22] Krishnendu Chatterjee and Laurent Doyen. Games with a weak adversary. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part II, volume 8573 of Lecture Notes in Computer Science, pages 110–121. Springer, 2014. doi:10.1007/978-3-662-43951-7_10.
- [23] Krishnendu Chatterjee, Laurent Doyen, Emmanuel Filiot, and Jean-François Raskin. Doomsday equilibria for omega-regular games. Inf. Comput., 254:296–315, 2017. doi:10.1016/J.IC.2016.10.012.
- [24] Krishnendu Chatterjee and Thomas A. Henzinger. Assume-guarantee synthesis. In Orna Grumberg and Michael Huth, editors, Tools and Algorithms for the Construction and Analysis of Systems, 13th International Conference, TACAS 2007, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007 Braga, Portugal, March 24 - April 1, 2007, Proceedings, volume 4424 of Lecture Notes in Computer Science, pages 261–275. Springer, 2007. doi:10.1007/978-3-540-71209-1_21.
- [25] Krishnendu Chatterjee, Thomas A. Henzinger, and Marcin Jurdzinski. Games with secure equilibria. Theor. Comput. Sci., 365(1-2):67–82, 2006. doi:10.1016/J.TCS.2006.07.032.
- [26] Rodica Condurache, Emmanuel Filiot, Raffaella Gentilini, and Jean-François Raskin. The complexity of rational synthesis. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, volume 55 of LIPIcs, pages 121:1–121:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.ICALP.2016.121.
- [27] Rodica Condurache, Youssouf Oualhadj, and Nicolas Troquard. The complexity of rational synthesis for concurrent games. In Sven Schewe and Lijun Zhang, editors, 29th International Conference on Concurrency Theory, CONCUR 2018, September 4-7, 2018, Beijing, China, volume 118 of LIPIcs, pages 38:1–38:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.CONCUR.2018.38.
- [28] Emmanuel Filiot, Raffaella Gentilini, and Jean-François Raskin. The Adversarial Stackelberg Value in Quantitative Games. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), volume 168 of Leibniz International Proceedings in Informatics (LIPIcs), pages 127:1–127:18, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2020.127.
- [29] Dana Fisman, Orna Kupferman, and Yoad Lustig. Rational synthesis. In Javier Esparza and Rupak Majumdar, editors, Tools and Algorithms for the Construction and Analysis of Systems, 16th International Conference, TACAS 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings, volume 6015 of Lecture Notes in Computer Science, pages 190–204. Springer, 2010. doi:10.1007/978-3-642-12002-2_16.
- [30] Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, and Michael J. Wooldridge. Automated temporal equilibrium analysis: Verification and synthesis of multi-player games. Artif. Intell., 287:103353, 2020. doi:10.1016/J.ARTINT.2020.103353.
- [31] Julian Gutierrez, Muhammad Najib, Giuseppe Perelli, and Michael J. Wooldridge. On the complexity of rational verification. Ann. Math. Artif. Intell., 91(4):409–430, 2023. doi:10.1007/S10472-022-09804-3.
- [32] Thomas A. Henzinger, Orna Kupferman, and Sriram K. Rajamani. Fair simulation. In Antoni W. Mazurkiewicz and Józef Winkowski, editors, CONCUR ’97: Concurrency Theory, 8th International Conference, Warsaw, Poland, July 1-4, 1997, Proceedings, volume 1243 of Lecture Notes in Computer Science, pages 273–287. Springer, 1997. doi:10.1007/3-540-63141-0_19.
- [33] Harold W. Kuhn. Extensive games and the problem of information. Classics in Game Theory, pages 46–68, 1953.
- [34] Orna Kupferman, Giuseppe Perelli, and Moshe Y. Vardi. Synthesis with rational environments. Ann. Math. Artif. Intell., 78(1):3–20, 2016. doi:10.1007/S10472-016-9508-8.
- [35] Orna Kupferman and Noam Shenwald. The complexity of LTL rational synthesis. In Dana Fisman and Grigore Rosu, editors, Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I, volume 13243 of Lecture Notes in Computer Science, pages 25–45. Springer, 2022. doi:10.1007/978-3-030-99524-9_2.
- [36] Noémie Meunier. Multi-Player Quantitative Games: Equilibria and Algorithms. PhD thesis, Université de Mons, July 2016.
- [37] Robin Milner. An algebraic definition of simulation between programs. In D. C. Cooper, editor, Proceedings of the 2nd International Joint Conference on Artificial Intelligence. London, UK, September 1-3, 1971, pages 481–489. William Kaufmann, 1971. URL: http://ijcai.org/Proceedings/71/Papers/044.pdf.
- [38] Fabio Mogavero, Aniello Murano, Giuseppe Perelli, and Moshe Y. Vardi. What makes Atl* decidable? A decidable fragment of strategy logic. In Maciej Koutny and Irek Ulidowski, editors, CONCUR 2012 - Concurrency Theory - 23rd International Conference, CONCUR 2012, Newcastle upon Tyne, UK, September 4-7, 2012. Proceedings, volume 7454 of Lecture Notes in Computer Science, pages 193–208. Springer, 2012. doi:10.1007/978-3-642-32940-1_15.
- [39] Martin J. Osborne and Ariel Rubinstein. A course in Game Theory. MIT Press, Cambridge, MA, 1994.
- [40] Gary L. Peterson and John H. Reif. Multiple-person alternation. In 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 29-31 October 1979, pages 348–363. IEEE Computer Society, 1979. doi:10.1109/SFCS.1979.25.
- [41] Jean-François Raskin, Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Algorithms for omega-regular games with imperfect information. Log. Methods Comput. Sci., 3(3), 2007. doi:10.2168/LMCS-3(3:4)2007.
- [42] John H. Reif. The complexity of two-player games of incomplete information. J. Comput. Syst. Sci., 29(2):274–301, 1984. doi:10.1016/0022-0000(84)90034-5.
- [43] Michael Ummels. Rational behaviour and strategy construction in infinite multiplayer games. In S. Arun-Kumar and Naveen Garg, editors, FSTTCS 2006: Foundations of Software Technology and Theoretical Computer Science, 26th International Conference, Kolkata, India, December 13-15, 2006, Proceedings, volume 4337 of Lecture Notes in Computer Science, pages 212–223. Springer, 2006. doi:10.1007/11944836_21.
- [44] Michael Ummels. The complexity of Nash equilibria in infinite multiplayer games. In Roberto M. Amadio, editor, Foundations of Software Science and Computational Structures, 11th International Conference, FOSSACS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29 - April 6, 2008. Proceedings, volume 4962 of Lecture Notes in Computer Science, pages 20–34. Springer, 2008. doi:10.1007/978-3-540-78499-9_3.
Appendix A Helpful Examples for Games on Graphs
In this appendix, we present additional examples and explanations for notions referred to in Section 2.
Example 22 (A simple reachability game).
Consider the game structure pictured in Figure 3. Its initial state is , and there are two players: player , who owns the circle states, and player , who owns the square state. Transitions are represented by arrows between states: there is an arrow from one state to another if there exists an action (either left or right here) such that . The set of actions is thus partitioned into and .
Both players have the same reachability objective, with , indicated in the figure by double-circling state . Neither of the two players has a winning strategy: player can prevent player from ever reaching state by choosing to go to , or , depending on the first action of player , while player can prevent player from winning by going to state .
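The winning-strategy analysis in this example can be carried out mechanically with a classical attractor computation. The sketch below uses a small hypothetical two-player reachability game; the exact graph of Figure 3 is not reproduced here, so the states `v0`–`v3`, `t` and their edges are illustrative assumptions, not the paper's actual example.

```python
# Hypothetical game loosely inspired by the example above (not Figure 3):
# player 1 owns the circle states v0, v2, v3, t; player 2 owns the square
# state v1; t is the shared target and v2 is a losing sink.
OWNER = {"v0": 1, "v1": 2, "v2": 1, "v3": 1, "t": 1}
EDGES = {
    ("v0", "left"): "v1", ("v0", "right"): "v2",
    ("v1", "left"): "v2", ("v1", "right"): "v3",
    ("v2", "left"): "v2", ("v2", "right"): "v2",
    ("v3", "left"): "t",  ("v3", "right"): "v2",
    ("t", "left"): "t",   ("t", "right"): "t",
}

def attractor(target, player):
    """Classical attractor: the set of states from which `player` can
    force a visit to `target`, computed as a least fixed point."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for s in OWNER:
            if s in attr:
                continue
            succs = [EDGES[(s, a)] for a in ("left", "right")]
            if OWNER[s] == player:
                ok = any(v in attr for v in succs)  # the player picks the move
            else:
                ok = all(v in attr for v in succs)  # the opponent picks the move
            if ok:
                attr.add(s)
                changed = True
    return attr

# Player 1 can force reaching t exactly from the states in this set.
win1 = attractor({"t"}, 1)
```

In this illustrative graph, `win1` contains `t` and `v3` but not the initial state `v0`, matching the situation described above where neither player can enforce the objective from the start.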
Example 23 (A game with two NEs).
Let us come back to the reachability game of Figure 3. Two strategy profiles are represented in Figure 3: the first one, , by the red transitions between states, and the second one, , by the blue ones. Both profiles are NEs. Indeed, for the red profile with outcome , if player deviates from , the resulting play is still losing for him, since player chooses to go to state . On the other hand, if player deviates from , the resulting play is , since player chooses to go to state , and this play is losing for player . Thus, neither player has a profitable deviation from . Notice that the gain profile of the red profile is equal to . Similarly, one can easily check that the blue profile is also an NE, with a gain profile equal to .
In the red profile of the previous example, the possible choice of going from to for player is irrational: after all, his objective is to reach state . Notice, however, that this behavior is part of the NE .
Example 24 (A game with an SPE).
Consider again the game in Figure 3. In the subgame , starting after the history , the red profile is not an NE. Indeed, choosing to go to state is a non-credible threat from player , as his target set is accessible from state , which he owns. The strategy of player that goes to from is thus a profitable deviation in the subgame . Therefore, the red profile is not an SPE in . On the other hand, one can check that the blue profile is an SPE in , as it is an NE in every one of its subgames.
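The NE-versus-SPE distinction in Examples 23 and 24 can be tested programmatically for memoryless profiles: a profile is an NE if no player has a profitable unilateral deviation from the initial state, and, since each subgame of a memoryless profile is determined by its current state, it is an SPE if that condition holds from every state. The game below is a hypothetical stand-in for Figure 3 with a shared target `t`; the state names, edges, and the two concrete profiles are illustrative assumptions.

```python
from itertools import product

# Illustrative graph (not the exact Figure 3): circle states v0, v2, v3, t
# belong to player 1, the square state v1 to player 2; both players want
# to reach t, and v2 is a losing sink.
OWNER = {"v0": 1, "v1": 2, "v2": 1, "v3": 1, "t": 1}
EDGES = {
    ("v0", "left"): "v1", ("v0", "right"): "v2",
    ("v1", "left"): "v2", ("v1", "right"): "v3",
    ("v2", "left"): "v2", ("v2", "right"): "v2",
    ("v3", "left"): "t",  ("v3", "right"): "v2",
    ("t", "left"): "t",   ("t", "right"): "t",
}

def wins(profile, start):
    """True iff the unique outcome of the memoryless profile from `start`
    visits the shared target t."""
    s = start
    for _ in range(2 * len(OWNER)):  # long enough to settle reachability
        if s == "t":
            return True
        s = EDGES[(s, profile[s])]
    return s == "t"

def is_nash(profile, start="v0"):
    """No player can improve by a unilateral deviation (against a fixed
    memoryless profile, memoryless deviations suffice for reachability)."""
    if wins(profile, start):
        return True  # everyone already wins the shared objective
    for player in (1, 2):
        own = [s for s in OWNER if OWNER[s] == player]
        for alt in product(("left", "right"), repeat=len(own)):
            dev = {**profile, **dict(zip(own, alt))}
            if wins(dev, start):
                return False  # profitable deviation for `player`
    return True

def is_spe(profile):
    """For memoryless profiles, require the Nash condition in every
    subgame, i.e. from every state of the graph."""
    return all(is_nash(profile, start=s) for s in OWNER)

# A "red-like" profile: an NE sustained by a non-credible threat at v1 ...
red = {"v0": "right", "v1": "left", "v2": "left", "v3": "left", "t": "left"}
# ... and a "blue-like" profile that is an SPE.
blue = {"v0": "left", "v1": "right", "v2": "left", "v3": "left", "t": "left"}
```

Here `is_nash(red)` and `is_nash(blue)` both hold, but only `is_spe(blue)` does: in the subgame starting at `v1`, deviating to `right` is profitable for player 2 under `red`, which is exactly the non-credible-threat phenomenon discussed above.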
Example 25 (A solution to the SPE-NCRS problem).
The answer to the SPE-NCRS problem for the game in Figure 3 is positive. Indeed, consider the strategy of player that chooses action in both states and . The unique subgame-perfect response of player is to choose action in . The resulting strategy profile is thus the blue one, whose outcome is winning for player .
Appendix B SPE-NCRS Problem and Game
In this appendix, we provide additional material for Section 3.
B.1 Imperfect Information
Example 26 (Observation).
In Figure 4, a two-player game structure is pictured, together with the observations of player , who owns the circle states (we suppose that player has perfect information). Each state is divided into two sections: on the left, with a white background, the name of the state is displayed, while on the right, with a gray background, the observation of player is displayed. Similarly, the actions and of player , who owns the square state, are each accompanied by a on their right side, which is an abuse of notation meaning , that is, player cannot distinguish between the two possible actions of player . Notice that from states owned by player , the actions do not have an observation attached to them, as player knows his own actions by hypothesis.
Let us now look at which histories player can or cannot distinguish. From , player cannot distinguish between the actions of player , and both states and have the same observation set . Therefore, the two histories and are indistinguishable for player , i.e., . Similarly, we have .
Example 27 (Observation-based strategy).
Let us come back to the game structure of Figure 4. Suppose that the objective of player is to reach the target set . One can verify that player has no observation-based winning strategy. Indeed, as noted in Example 26, player cannot distinguish between the histories and . By definition, any observation-based strategy must prescribe the same action from both and . If this action is , then both resulting plays are losing for player . If it is , then one of the resulting plays is winning, while the other is losing. Thus, no observation-based strategy ensures that player reaches .
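The impossibility argument of Examples 26 and 27 can be made concrete on a small hypothetical structure; the exact game of Figure 4 is not reproduced here, and the states `s1`, `s2` carrying the same observation are illustrative assumptions. An observation-based strategy is a function of the observed history only, so it must play the same action after two indistinguishable histories, and in this sketch no single action wins from both.

```python
# Hypothetical imperfect-information structure (not the exact Figure 4):
# player 1 sees only state observations and cannot observe player 2's
# action at s0, so the histories s0->s1 and s0->s2 look identical to him.
OBS = {"s0": "o0", "s1": "o1", "s2": "o1", "w": "ow", "l": "ol"}
EDGES = {
    ("s0", "a"): "s1", ("s0", "b"): "s2",       # player 2's hidden choice
    ("s1", "left"): "w", ("s1", "right"): "l",  # from s1, only left wins
    ("s2", "left"): "l", ("s2", "right"): "w",  # from s2, only right wins
}

def obs_history(history):
    """The observation sequence player 1 receives along a history
    (player 2's actions are invisible, so only states contribute)."""
    return tuple(OBS[s] for s in history)

# The two histories are indistinguishable for player 1:
indistinguishable = obs_history(("s0", "s1")) == obs_history(("s0", "s2"))

def surely_reaches_w(action):
    """An observation-based strategy plays the same `action` after both
    histories; check whether that action wins from s1 and s2 alike."""
    return all(EDGES[(s, action)] == "w" for s in ("s1", "s2"))

# Neither uniform choice reaches the target w after both histories, so no
# observation-based strategy is winning for player 1 in this sketch.
no_winning_strategy = not any(surely_reaches_w(a) for a in ("left", "right"))
```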
B.2 Details on the Game Structure
Definition 28 (Objectives of the game).
Let be a play in the game structure of . Let be its simulated gain.
- The play is winning for Challenger if one of the following conditions is satisfied:
  - there exist and , such that for every state with , if is a player-state, then it is of the form such that its player-component equals and its gain-component satisfies ;
  - there exists a player such that
    – for every , there exists for which is a player-state with its player-component being equal to ,
    – there exists such that for every state with , if is a player-state with , then its player-component equals and the -th component of its gain-component satisfies ;
  - there exist two distinct players such that for every , there exist for which and are player-states, with their player-components being respectively and .

  The set of plays satisfying one of these conditions is denoted .
- The play is winning for Prover and Prover if one of the following conditions is satisfied:
  - there exist and , such that for every state with , if is a player-state, then it is of the form such that its player-component equals and its gain-component satisfies ;
  - there exists a player such that
    – for every , there exists for which is a player-state with its player-component being equal to ,
    – there exists such that for every state with , if is a player-state with , then its player-component equals and the -th component of its gain-component satisfies .

  The set of plays satisfying one of these conditions is denoted .
Remark 29.
We have and .
