Memory Requirements in Non-Zero-Sum Games
Abstract
The interaction between a system and the components modeling its environment is traditionally modeled by a multi-player game played on a finite graph. In zero-sum games, the players have conflicting objectives, and it is clear that increasing the memory of the environment players can only make it harder for the system to win. In non-zero-sum games, the objectives of the players may overlap. There, typical questions concern the stability of the game and the equilibria the players may reach. In particular, in rational synthesis (RS), the goal is to find an equilibrium that satisfies the objective of the system.
We study how the memory of the environment players may affect the existence of an RS solution. As we show, the picture is diverse, even when the objectives of all players are memoryless. On the one hand, when stability amounts to a Nash equilibrium (NE), then increasing the memory of the environment may only help the system to suggest an RS solution. On the other hand, when the notion of stability involves deviations by coalitions of environment players, for example in a strong Nash equilibrium (SNE), then increasing their memory may sometimes enable and sometimes prevent the existence of an RS solution. We study memory bounds for the players, showing that the memory required may be polynomial in an NE-RS solution and exponential in an SNE-RS solution. We also solve the SNE-RS problem, show that it is PSPACE-complete, and relate the differences between NE and SNE with the differences between cooperative and non-cooperative RS.
Keywords and phrases:
Non-Zero-Sum Games, Synthesis, MemoryCopyright and License:
2012 ACM Subject Classification:
Theory of computation Formal languages and automata theory ; Theory of computation Semantics and reasoningEditors:
Stefano Guerrini and Barbara KönigSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Synthesis is the automated construction of systems from their specifications [28]. Modern systems often consist of interacting components. The interaction is modeled by a multi-player game played on a finite graph. In the turn-based setting, the vertices of the game graph are partitioned among the players. A token is placed on an initial vertex, and in each turn, the player that owns the vertex with the token moves it to a successor vertex. Each player has a strategy that directs her how to move the token when it reaches vertices she owns. A profile is a vector of strategies, one for each player. The outcome of a profile is a play – an infinite path in the game graph, obtained when the players follow their strategies. The goal of each player is to direct the game into a play that satisfies her objective. Each objective defines a subset of [25], where is the set of vertices of the game graph. For example, in games with Büchi objectives, is a subset of , and a play satisfies if it visits vertices in infinitely often.
In zero-sum games, the players compete with each other on the satisfaction of contradicting objectives. Zero-sum games with -regular objectives are determined: in every game, exactly one player has a winning strategy – one that achieves her objective against all strategies of the other players [25]. Deciding a zero-sum game amounts to finding this player. In contrast, in non-zero-sum games, the objectives of the players may overlap [12, 31]. There, typical questions concern the stability of the game and the equilibria the players may reach [32]. The most common notion of stability is Nash equilibrium (NE) [26]. A profile of strategies is an NE if no (single) player can benefit from unilaterally changing her strategy.
Two-player zero-sum games model the interaction between a system and its environment. The system aims to satisfy its specification in all environments. Accordingly, the environment is assumed to be hostile, as if its objective is to violate the specification [4]. Often, however, the components composing the environment have objectives of their own, and they act to achieve their objectives. For example, clients interacting with a server typically have objectives other than to fail the server. Accordingly, the setting induces a non-zero-sum game, where the objectives of the players are induced from their specifications. In rational synthesis, we let the system exploit the rationality of the environment. In particular, in cooperative rational synthesis (CRS) [18], the desired output is an NE profile whose outcome satisfies the objective of the system. Thus, in CRS, we assume that we can suggest strategies to the environment players, and if they have no incentive to deviate from these strategies, they follow them. Rational synthesis has been extensively studied in various settings and variants [33, 13, 1, 22, 9, 10]. Recall that strategies for the players direct them how to move the token in vertices they own. A strategy may depend on the history of the game so far. Thus, in different visits of the token to the same vertex , a strategy may direct the owner of to move the token to different successors. Extensive research has concerned the memory requirements for strategies in zero-sum games with -regular objectives [30, 14, 5, 11, 8]. For example, it is well known that a winning strategy for a Büchi objective can be memoryless, thus it may depend only on the current vertex. On the other hand, a winning strategy for a conjunction of Büchi objectives may require memory [14]. Researchers have also studied games in which the memory of the players is bounded [29, 15, 17, 20, 23]. Clearly, increasing the memory of the system or reducing the memory of the environment in a zero-sum game can only help the system to win a game.
For non-zero-sum games, the situation is less clear. First, as we show in Section 3, even when its objective is memoryless, the system may need memory for its strategy in a CRS solution. Moreover, unlike the situation in zero-sum games, an increase of memory to the environment players may be helpful for the system. That is, even when the objectives are memoryless, the only possible CRS solutions may require the environment players to have memory. Intuitively, increasing the memory of the players in non-zero-sum games enables them to satisfy multiple objectives, which may be essential for achieving both stability and the satisfaction of the system’s objective [31]. In fact, we show that when the objectives of the environment players are memoryless, then adding memory to the environment players may only help the system to win.
Our basic observations above raise several interesting questions about memory bounds in non-zero-sum games. We first prove that a CRS solution in a -player non-zero-sum game with sink objectives (that is, ones that can be specified with all reachability or -regular objectives) may require each of the players to have memory , matching the known upper bound [24, 31]. Further questions concern richer settings, detailed below.
The notion of an NE corresponds to deviations of single players. In some applications, a coalition of players may deviate together. For example, protocols for voting, mechanisms for exchange of messages, allocation and construction of shared resources – all should take into account the possibility of players that deviate together. The different applications induce different notions of stability. The first such definition is of strong Nash equilibrium (SNE). A profile is an SNE if no subset of players can deviate in a way that benefits all its members [3]. Then, for , a profile is a -robust equilibrium if no coalition of size can deviate in a way that benefits at least one of its members without harming the other members, and no coalition of size can deviate in a way that harms the other players [6]. Finally, a profile is a strong secure equilibria (SSE) if every deviation of a coalition of players that harms some player, also harms a player in the coalition [7].
We study the CRS problem when stability is defined with respect to deviations by a coalition of players. For example, in the SNE-CRS problem, we seek a profile of strategies that satisfies the objective of the system and is an SNE. The contributions in [6, 7] include a study of the complexity of the CRS problem when the solution concepts are robust-equilibrium and SSE. PSPACE upper bounds for the problems involve a reduction to a two-player game, termed the deviator game [6], which we easily extend to SSE-CRS. For the lower-bound, our contribution is more interesting and involves relating non-cooperation in rational synthesis with deviations of coalitions in SNEs: Recall that in cooperative rational synthesis, the desired output is an NE that satisfies the system objective. In non-cooperative rational synthesis (NRS) [21], the desired output is a strategy for the system player such that the objective of the system is satisfied in the outcome of all NE profiles that include this strategy. Thus, in NRS, the environment players are rational, but we cannot suggest them a strategy. As shown in [1], the cooperative and non-cooperative approaches are related to the two stability-inefficiency measures of price of stability [2] and price of anarchy [19, 27]. We relate NE-NRS (that is, NRS when stability amounts to being an NE) with SNE-CRS, showing that the challenge of coping with deviations of a coalition is similar to the challenge of coping with non-cooperation. Intuitively, in both cases, all the environment players may deviate simultaneously, as long as these deviations are beneficial for them. The relation implies that the PSPACE-hardness of the NE-NRS problem [13] applies also to the SNE-CRS problem.
Back to the study of memory requirements, we examine how the transition to solution concepts that involve deviations by a coalition of players affects these requirements. Since players in a coalition care for the satisfaction of the objectives of all the players in the coalition, their strategies have to satisfy multiple objectives. Since the latter typically requires memory, the study of the memory requirements in non-zero-sum games with deviations by coalitions is more interesting and involved. We start with some observations about the effect of increasing the memory to the environment players and show that, unlike the case of NE-CRS, here an increase to the memory of the environment players may prevent the existence of an SNE-CRS solution even when the objectives are memoryless.
On the other hand, for some games, the existence of an SNE-CRS solution requires the environment players to have memory, and we examine the memory requirements for them. For the upper bound, it is not hard to extend the analysis in [6] and describe an exponential upper bound for the required memory. Our main technical contribution is a matching lower bound. We show that even in -player games with sink objectives, an SNE-CRS solution may require players to have memory . Moreover, our bounds apply also to the solution concepts of -robust-equilibrium and SSE, completing the picture to all known solution concepts with deviations of a coalition of players.
2 Preliminaries
Games.
For , let . A -player (turn-based) game graph is a tuple , where are disjoint sets of vertices. For every , the vertices in are owned by Player , and we let . Then, is an initial vertex, and is a total edge relation, thus for every , there is such that . For , we denote by the player such that . The size of , denoted , is , namely the number of edges in it.
A game is a tuple , where is a -player game graph, and , for , is a winning condition (a.k.a. objective) for Player . In the beginning of a play in the game, a token is placed on . Then, in each turn, the player that owns the vertex that hosts the token chooses a successor vertex and moves the token to it. Together, the players generate a play in , namely an infinite path that starts in and respects : for all , we have that .
Each winning condition defines a subset of . The objective of Player is to cause the interaction to generate a play that satisfies . We describe some types of winning conditions below. For a play , we denote by the set of vertices visited at least once along , and by the set of vertices visited infinitely often along . That is, and there are infinitely many such that . A reachability objective is given by a set of vertices , and it requires some vertex in to be visited at least once; thus a play satisfies iff . A Büchi objective is given by a set of vertices , and it requires some vertex in to be visited infinitely often; thus satisfies iff . The objectives dual to reachability and Büchi are avoid (also known as safety) and co-Büchi, respectively. Formally, a play satisfies an avoid objective iff , and satisfies a co-Büchi objective iff . A generalized Büchi objective is a set of Büchi objectives. A play satisfies if it satisfies all the objectives in ; thus if for all , we have that . Generalized reachability objectives are defined similarly, requiring all underlying reachability objectives to be satisfied.
Strategies, profiles, and equilibria.
For , a strategy for Player is a function that maps prefixes of plays that end in a vertex owned by Player to possible extensions in a way that respects . That is, for every history and , we have that . Intuitively, a strategy for Player directs her how to move the token, and the direction may depend on the history of the play so far.
A profile is a tuple of strategies, one for each player. The outcome of a profile is the play obtained when the players follow their strategies. Formally, is such that for all , we have that . Consider a game and a profile . The set of winners in when the players follow , denoted , is the set of players whose objectives are satisfied in . Formally, iff satisfies . The set of losers in , denoted , is then , namely the set of players whose objectives are not satisfied in . When is known from the context we write and respectively.
For a subset of players, an -profile is a set of strategies, one for each player in . We say that a profile extends an -profile if the players in use in their strategies in . For a profile , a non-empty subset , and a -profile , we denote by the profile in which the players in follow their strategies in and the players in follow their strategies in . Formally, , where for every , we have that , if , and , otherwise. When is a singleton, for some , we simplify the notation and use rather than .
A profile is a Nash Equilibrium (NE, for short) [26] if no single player has an incentive to deviate from . Formally, is an NE if for every , if , then for every strategy for Player , we have that . The notion of an NE assumes deviation by single players. In some applications, a coalition of players may deviate together. The different applications induce different definitions of stability. We consider three definitions. First, a profile is a Strong Nash Equilibrium (SNE, for short) [3] if no coalition of players can jointly deviate in a way that strictly benefits all its members. Formally, is an SNE if for every non-empty subset , and every -profile , there exists such that . Note that an NE is a special case of an SNE in which only deviations of coalitions of size are possible. Then, for , a profile is a -robust equilibrium if no coalition of size can deviate in a way that benefits at least one of its members without harming at least one of the other members, and no coalition of size can deviate in a way that harms at least one of the other players [6].111The setting in [6] considers weighted objectives, where rather than winning or losing, each profile induces a payoff for each player, which enables also a quantitative definition of “harm” and “benefit”. Here we consider -regular Boolean objectives, inducing a Boolean interpretations for “harm” and “benefit”. Formally, is -robust if is -resilient: for every subset of size at most , and every -profile , if , then , and is -immune: for every subset of size at most , and every -profile , we have that . Finally, a profile is a strong secure equilibria (SSE) if every deviation of a coalition of players that harms some player, also harms a player in the coalition [7]. Formally, is an SSE if for every subset , and every -profile , if , then .
For a subset of players, we say that is a -NE if is an NE with , and similarly for the other solution concepts.
Rational Synthesis.
We consider a setting in which the players model a controllable system and its rational environment. Technically, we assume that Player models the system (a system may be composed from several components, but since the system is controllable, we can merge them to a single player), and the other players model the components of the environment. Let . The basic problem we consider is the existence and finding stable profiles that satisfy the objective of the system.
We refine the notions of NE and SNE to take into account our ability to control the system. For a profile , we say that is a -fixed NE if no player in can benefit from unilaterally changing her strategy. Likewise, we say that is a -fixed SNE if no coalition of players in can jointly deviate in a way that strictly benefits all its members.
Consider a -player game . The problem of NE (SNE) cooperative rational synthesis, denoted NE-CRS (resp., SNE-CRS) is to return a -fixed NE (resp., SNE) in in which Player wins. As in traditional synthesis, one can also define the corresponding decision problems, of rational realizability, where we only need to decide whether the desired strategies exist. In order to avoid additional notations, we sometimes refer to NE-CRS and SNE-CRS also as decision problems.
Finite-Memory Strategies.
A strategy is finite-memory if it is possible to replace the unbounded histories in by finitely many memories. Formally, a memory structure for a game with is , consisting of a finite set of memory states, an initial memory state , and an update function . A memory structure is similar to an automaton with alphabet , which is executed in parallel to the game: it starts in and reads the edges traversed by the token. Then, a strategy for Player that relies on replaces the dependency on the history of the play by dependency on the current memory state of . Thus, the strategy is given by a function , such that for all and , we have that . When the current memory state is and the token is in vertex , Player moves the token to and moves to state . A strategy is memoryless if it only depends on the current vertex. That is, if for every two histories and vertex , we have that . Note that a memoryless strategy can be viewed as a function , and corresponds to the case . An objective type is called memoryless if in every zero-sum game with an objective of type , if Player wins, then she has a memoryless winning strategy. Reachability, avoid, Büchi, and co-Büchi objectives are all memoryless [30].
For , we say that a strategy uses memory if a memory structure that generates needs states. Given a profile , we say that Player uses memory (in ) if uses memory .
3 Is Memory for the Environment Helpful?
In zero-sum games, increasing the memory of the environment may only decrease the ability of the system to satisfy the specification. Formally, for every zero-sum game and for every bound , if Player wins against Player that uses memory , then for every , Player also wins against Player that uses memory , and possibly there is such that Player loses against Player that uses memory .
In this section we show that the picture in non-zero-sum games is different and more involved. First, the system may need memory in order to have a CRS solution in a game with memoryless objectives for all players. In addition, memory for the environment is required for the existence of a CRS solution in some cases yet prevents the existence of a CRS solution in other cases. Intuitively, memory for the environment enables the system to suggest to the environment richer strategies, but also enables the environment to have richer deviations.
We first describe cases in which increasing the memory of the system and the environment is required for a CRS solution, even in games with memoryless objectives for all players. Our examples are with two-player games, and thus apply to both NE-CRS and SNE-CRS.
Theorem 1.
There are two-player Büchi (or reachability) games and such that the following hold.
-
1.
There is a CRS solution for in which Player uses a memory of size and there is no CRS solution for in which Player is memoryless.
-
2.
There is a CRS solution for in which Player uses a memory of size and there is no CRS solution for in which Player is memoryless.
Proof.
We describe and with Büchi objectives. The same games and considerations apply when the games have reachability objectives. Consider the Büchi game in Figure 1 (left). Let and . Drawing two-player games we use circles and boxes to describe the vertices in and , respectively.
Consider the strategy for Player that, in , alternates between and . That is and . Note that in order to implement the alternation, the strategy requires a memory of size . Consider the strategy for Player that, in , takes the token down to . Note that the profile is a CRS solution. Indeed, its outcome is , which visits both and infinitely often. Thus, both objectives are satisfied, and the profile is a CRS solution.
On the other hand, consider a memoryless strategy for Player . If, in , Player moves the token to , then the play reaches and stays forever in no matter what the strategy of Player is, and is not satisfied. If, in , Player moves the token to , then either, in , Player always moves the token to , in which case the play never visits , and so is not satisfied, or Player always moves the token to , in which case is not satisfied, causing Player to deviate in . Thus, there is no CRS solution in which Player uses a memoryless strategy.
Consider now the Büchi game in Figure 1 (right). Let and . Note that all the vertices with more than one successor belong to Player , and so there is a single strategy for Player in the game. Consider the strategy for Player that, in , takes the token down to , and, in , alternates between and . That is and . Note that in order to implement the alternation, the strategy requires a memory of size . It is easy to see that the profile is a CRS solution. Indeed, its outcome is , which satisfies both objectives.
On the other hand, consider a memoryless strategy for Player . If, in , Player moves the token to , then the play reaches and stays forever in and is not satisfied. If, in , Player moves the token to , then either, in , Player always moves the token to , in which case is not satisfied, or Player always moves the token to , in which case is not satisfied, causing Player to deviate in either or . Thus, there is no CRS solution in which Player uses a memoryless strategy.
The example of in Theorem 1 shows that increasing the memory of the environment may help the system to achieve a CRS solution. We continue and examine whether this is always the case. We first need some notations. Consider a non-zero-sum game . For , we say that a profile is an -bounded -fixed NE if for every , the strategy uses memory at most , and if , then for every strategy for Player that uses memory at most , we have that . Thus, environment players are restricted to strategies that use memory at most , in both and their deviations. Likewise, is an -bounded -fixed SNE if all the strategies of the environment players in use memory at most and no coalition of environment players can jointly deviate to strategies that use memory at most in a way that strictly benefits all its members. Then, we say that is an -bounded NE-CRS (SNE-CRS) solution if is an -bounded -fixed NE (SNE, respectively) that satisfies . Note that the usual CRS problem coincides with the case .
We first show that, unsurprisingly, when the objectives of the environment players require memory, in particular when they consist of a conjunction of objectives, then a system may have a CRS solution only thanks to bounds on the memory of the environment players.
Theorem 2.
There is a two-player generalized-Büchi (or generalized reachability) game such that there is no CRS solution for , yet there is a -bounded CRS solution for .
Proof.
We prove the theorem for the Büchi case. The same game and considerations apply for generalized-reachability objectives.222The application to reachability is less straightforward here. In particular, for Büchi, one could give up the vertex and let have three successors. For reachability, we need the decision of Player about not visiting in her first transition to be un-recoverable. Consider the generalized Büchi game played on the graph of (Figure 1, right), now with and . Note that Player has a Büchi objective.
Recall that all the vertices with more than one successor belong to Player , and so there is a single strategy for Player in the game. Consider the strategy for Player that, in , takes the token to , where it loops forever. Clearly, the induced play satisfies . When Player is restricted to memoryless strategies, it cannot satisfy her objectives, making this profile a -bounded CRS solution. On the other hand, when Player is not memoryless, she can take the token down to and then alternate between and . Hence, when Player has memory or more, she would deviate from every profile that leads to , and so there is no CRS solution in .
The example of in the proof of Theorem 2 heavily relies on Player having an objective that requires memory. We now show that for SNE-CRSs, when a deviation of a player may need to be beneficial to several players, a system may have an SNE-CRS solution only thanks to bounds on the memory of the environment players, even when all players have memoryless objectives.
Theorem 3.
There is a -player Büchi game such that there is no SNE-CRS solution for , yet there is a -bounded SNE-CRS solution for .
Proof.
We prove the theorem for the Büchi case. The same game and considerations apply for reachability objectives. Consider the -players Büchi game in Figure 2. We use diamonds to denote vertices controlled by Player . Let , , and .
Note that all the vertices with more than one successor belong to Player or Player , and so there is a single strategy for Player in the game. We first describe a -bounded SNE-CRS solution for . Let be the memoryless strategy for Player that, in , takes the token to and, in , take the token to . Let be the memoryless strategy for Player that loops in . Clearly, , and so . We prove that Player and Player cannot deviate to memoryless strategies in a way that causes their objectives to be satisfied. Clearly, a deviation by Player alone cannot affect the outcome of the game. Also, a deviation by Player alone may not cause the outcome to reach or . Consider now a joint deviation of Player and Player . When Player uses a memoryless strategy, it cannot cause the outcome of the profile to visit both and infinitely often. Thus, the deviation is not beneficial to either Player and Player . Hence, is a -bounded SNE-CRS solution.
On the other hand, when Player has memory , then for every profile whose outcome reaches , Player and Player would deviate to an outcome that satisfies their both objectives, and so no SNE-CRS solution exists for .
We now complete the picture and show that when the objective of the environment players are memoryless and deviations are allowed only for single players, then adding memory to the environment may only help the system. Thus, for NE-CRS with memoryless objectives, we cannot have examples as those in Theorems 2 and 3.
Theorem 4.
For every Büchi (or reachability) game , if there is a -bounded NE-CRS solution in , then there is also an NE-CRS solution in .
Proof.
Consider a Büchi (or reachability) game . Let be a -bounded NE-CRS solution. We prove that there exists a strategy for Player such that the profile is an NE-CRS solution.333In Appendix A.1, we show that the transition to is essential, thus need not be an NE-CRS solution. In fact, the example is stronger, showing that a profile may be a -bounded NE-CRS solution and still no profile in which the system follows its strategy in is an NE-CRS solution. We also show that the strategy of Player in a -bounded NE-CRS solution may require memory of size .
For every , let be the graph obtained from by removing edges that leave vertices in that do not agree with the memoryless strategy , for all . Consider the zero-sum two-player game between Player and Player . Let be the winning region of Player in , and let be a memoryless strategy for Player for the vertices in . That is, iff for every strategy for Player , the outcome in of and from satisfies . Likewise, let be a memoryless strategy for Player in that is winning in all vertices not in . That is, iff for every strategy for Player , the outcome in of and from does not satisfy . Note that since is a Büchi objectives, memoryless strategies and exist.
Let . We first argue that for all and vertices , we have that . To see this, assume by way of contradiction that there exists and a vertex such that . Let be the first such vertex. That is, are all not in and . Consider the strategy for Player that agrees with in the vertices and agrees with in all other vertices. Consider the profile . Note that is memoryless, has a prefix , and, as is winning in , the outcome continues in a way that satisfies . Thus, , contradicting the fact that is a -bounded -fixed NE.
We can now define the strategy , as follows. As long as the generated play follows , then agrees with . If for some and vertex , Player deviates and moves the token to a successor of that is different from , then follows the strategy for the rest of the game. Note that need not be memoryless (even when is memoryless).
We argue that the profile is an NE-CRS solution. First, since agrees with as long as the environment players follow their strategies in , then , and so . Thus, . It is left to show that is a -fixed NE. Consider a player and a strategy for Player . Let be the longest prefix of that agrees with . Thus, , and so, by its definition, the strategy starts to follow the strategy after the history . Since , then . Therefore, the strategy is winning in when the game starts in . Hence, does not satisfy , and so is not a beneficial deviation for Player . Thus, is an NE-CRS solution, and we are done.
4 Memory Requirements for NE-CRS
In this section we consider the memory requirements for NE-CRS, namely when the solution concept is an NE. Consider a Büchi game . In [31], Ummels shows444The study in [31] considers Streett objectives, and includes also the parameter of the number of pairs in the objectives. It also combines the memory of the strategy with the size of the state space. The presentation here includes a straightforward adjustment to our setting. that for a desired set of winners, if there exists a -NE in , then there exists a -NE in which the memory of all players is of size . Since an NE-CRS solution corresponds to a -fixed -NE with , and once , then a profile is a -fixed -NE iff it is a -NE, the result provides an upper bound also to our problem. Below we prove a matching lower bound. The proof is not too complicated, and mainly serves as a warm-up to the study of SNE-CRS.
The lower bound holds already the class of sink games, defined below. Consider a graph . A vertex in is a sink if it has only one outgoing edge, which is a self-loop. Then, a -player sink game is a game in which the only cycles in are sinks, and for every , the objective is a set of sinks. Note that since once a play reaches a sink it stays there forever, the objective in sink games can be described using reachability, avoid, Büchi, or co-Büchi objectives.
Theorem 5.
For every , we can construct a -player sink game such that has an NE-CRS solution, and every CRS solution for requires all the environment players to have memory .
Proof.
We define as follows (see an illustration for the case in Figure 3).
A play in starts at the initial vertex . For each , Player controls the vertex and decides whether to move the token to or to a vertex for some . For each , Player also controls the vertex . From , Player chooses a successor for some . The vertex is a sink that satisfies the objectives of all players in . The vertex is a sink that satisfies the objective of Player .
Note that the only play in which Player achieves her objective is , in which all players in , when controlling their vertices, choose to move the token towards . If for some , Player instead chooses to move the token to a vertex for some , then Player can move the token from to , making the deviation non-beneficial for Player . Thus, an NE-CRS solution for consists of strategies that direct the token to and “punish” players that do not follow such a direction. Consider . In order to punish Player by moving the token from to , Player has to remember the vertex from which the token has reached . Since there are such possible vertices, has a NE-CRS solution where the strategy of each player uses memory. Also, if for some , the strategy of Player has a memory smaller than , then there exists a player that can move from to without being punished, which implies that the profile is not an NE-CRS solution.
In Appendix A.2, we describe and prove the bounds formally.
5 On the SNE-CRS Problem
In this section we analyze the complexity of the SNE-CRS problem and the memory required for the players in an SNE-CRS solution. For both problems, the upper bounds follow easily from the study of robust equilibrium and SSE. Our main contributions are the lower bounds. For the complexity of the SNE-CRS problem, we relate the collaboration of players in SNE with the concept of non-cooperation in rational synthesis. For the results on the memory, lower bounds are open also for the other solution concepts, and we show that our contribution applies for them too.
5.1 Solving SNE-CRS
In this section we prove that the SNE-CRS problem is PSPACE-complete. The upper bound is similar to the one presented for other types of equilibria with respect to deviations by a coalition of players and is based on adjusting the objectives in the deviator game of Brenguier [6] to the solution concept of SNE. The adjustment is quite straightforward, and we describe it in the full version. The PSPACE algorithm holds for every objective that can be translated in polynomial space to an Emerson-Lei objective555An Emerson-Lei objective is given by a Boolean assertion over subsets of . A play induces an assignments , where for every set , we have that iff . Then, satisfies iff satisfies . For example, the Emerson-Lei objective is equivalent to the generalized Büchi objective , and the objective is satisfied by plays that visit vertices in infinitely often and visit vertices in only finitely often. [16]. This clearly includes (but is not limited to) Büchi and co-Büchi objectives.
Our lower bound involves an interesting relation between the collaboration of players in an SNE and the concept of non-cooperation in rational synthesis. We start with a definition of the latter. Recall that in CRS, we assume that the environment players are collaborative, in the sense they would follow a suggested equilibrium. In non-cooperative rational synthesis (NRS), we cannot suggest a strategy to the environment players and only know they would reach an equilibrium [21]. Accordingly, for NE-NRS, the goal is to return a strategy for Player such that Player wins in every -fixed NE . In other words, Player follows , and no matter how the environment players behave, then as long as they are rational, and so resulting profile is a -fixed NE, the objective of Player is satisfied.
Note that in NE-NRS, the system has to cope only with deviations of single players, yet the players are non-cooperative. On the other hand, in SNE-CRS, the system has to cope with deviations of coalitions of players, yet the players are cooperative, and would follow a suggested -fixed SNE. In this section we relate NE-NRS with SNE-CRS, showing that the challenge of coping with deviations of coalitions is similar to the challenge of coping with non-cooperation. Intuitively, in both cases, all the environment players may deviate simultaneously, as long as these deviations are beneficial for them.
We formalize the connection by describing a class of games such that for every game in the class , there is a game such that there is an NE-NRS in iff there is an SNE-CRS in . Note that and are defined with respect to the same game graph, and only the objectives are different. Moreover, the class is strong, in the sense that the PSPACE-hardness of NE-NRS applies already to a game in [13]. Accordingly, the results in this section, beyond the interesting connection between non-cooperation and coalitions, also imply PSPACE-hardness for the problem of SNE-CRS.
In the rest of this section we define the class and describe the reduction from to . Recall that in a sink game, the only cycles are sinks, and objectives are set of sinks. The class consists of special sink games, defined below, which restricts sink games further. Nevertheless, the class of special sink games captures the PSPACE-hardness of NE-NRS for all types of objectives, and we are going to relate NE-NRS solutions in special sink games to SNE-CRS solutions in sink games.
Definition 6.
A -player sink game , with , is special if the following holds:
-
1.
.
-
2.
.
-
3.
For every and vertex , there is a vertex such that .
Note that the third condition implies that is not empty and that whenever a token reaches a vertex of Player , for , she can move the token to and cause all players to win.
Theorem 7.
Consider a -player special sink game . The game has an NE-NRS solution iff the game has an SNE-CRS solution.
Proof.
Let , and let be the set of environment players.
Given a play in , let denote the set of environment players that own at least one vertex visited along . For a profile , we use to denote . Note that when we consider deviations in a -NE , only deviations of players in are of interest. Indeed, for every , if Player changes her strategy, the outcome of the profile is not changed. We say that a play is a trap for Player in if it reaches a sink such that for every , we have that . Given a strategy for Player , and a play in , we say that agrees with (and that agrees with ) if for every , if , then .
In Appendix A.3, we prove that the following three claims are equal. The theorem follows from the equivalence between and .
-
The game has an NE-NRS solution.
-
The game has an SNE-CRS solution.
-
There exists a strategy for Player such that there does not exist a trap in that agrees with .
Intuitively, both NE-NRS and SNE-CRS solutions consider stable profiles in which Player uses a strategy as described in . In NE-NRS, stability amounts to an NE, and the universal quantification on all stable profiles is explicit in the definition of NRS. In SNE-CRS, we do start with a single profile, but deviations of sets of players capture the many profiles that have to be considered in NRS. Then, as specified in , in the setting of special sink games, the relevant deviations correspond to traps, and the two notions coincide.
Recall that Büchi and co-Büchi objectives are special cases of Emerson-Lei objectives, for which the SNE-CRS problem can be solved in PSPACE. Also, sink objectives are special case of both Büchi and co-Büchi objectives. Since the NE-NRS problem is PSPACE-hard already for special sink games [13], Theorem 7 enables us to conclude with the following.
Theorem 8.
The SNE-CRS problem for Büchi and co-Büchi games is PSPACE-complete.
5.2 Memory Requirements for SNE-CRS
In this section we consider the memory requirements for SNE-CRS. The upper bound is similar to the one known for resilient-CRS and is based on an analysis of the memory required for the players in the corresponding deviator game. Essentially (see details in the full version), the players have to maintain in their memory the set of players who have deviated, as well as memory required for the satisfaction of the objectives of the winning players.
Theorem 9.
Consider a -player Büchi game . If there exists a SNE-CRS solution in , then there also exists an SNE-CRS solution in which the strategy of each player uses memory at most .
Our main contribution is a lower bound, which was not studied before, and applies also to other solution concepts.
Theorem 10.
For every , and , there is a sink game such that has an SNE-CRS solution, and every SNE-CRS solution requires environment players to have memory .
Proof.
We define as follows (see in Figure 4).
Let denote the set of all subsets of of size . That is, . For every set of players and , let be the -th element in when the elements are ordered by the usual order on .
The game begins at the initial vertex , which is owned by Player . From , Player chooses a subset and moves the token to the vertex . For each subset and , Player controls the vertex . If , then the successors of are the vertices and . If , then the successors of are the vertices and , for . For each , Player controls the vertex . From , Player chooses a subset and moves the token to the vertex , where p is a symbol, indicating the game is in the “punishment” layer. For every and , the vertex is owned by Player , who chooses an index and moves the token to the vertex . For every , the vertex is a sink that satisfies the objectives of all players in . The vertex is a sink that satisfies the objective of Player .
The idea behind is as follows. Consider a profile . Note that if Player wins in , then its outcome reaches the vertex , in which case all the other players lose in . Also, Player wins in iff its the outcome of reaches a sink of the form , in which case Player loses. Thus, the objectives of Player and Player complement each other. Assume that Player wins in , thus its outcome reaches . In order for to be an SNE, every deviation of players in should result in a profile in which at least one of the players that deviate does not satisfy its objective. Thus, when the coalition of deviators is , then there should be such that the outcome of the new profile either continues to reach or, in case , reaches a sink or for some . Before we explain why this implies the existence and an SNE-CRS solution, and why strategies for players in any SNE-CRS solution require exponential memory, let us define formally.
We define , as follows.
-
1.
The set of vertices and its partition to owners is as follows.
-
.
-
.
-
For every , we define .
-
-
2.
The set contains edges of the following types:
-
For every , there is an edge from to .
-
For every and , there is an edge from to and to .
-
For every , there is an edge from to and to , for all .
-
For every and , there is an edge from to .
-
For every , , and , there is an edge from to .
-
The vertices and , for all , are sinks.
-
-
3.
The objectives of the players are defined as follows.
-
.
-
For every , we have .
-
We first describe an SNE-CRS solution in . Since Player only owns sinks, her strategy is straightforward. Moreover, the notions of SNE and -fixed SNE coincide in . For Player , the strategy is an arbitrary memoryless strategy. As for , the strategy directs Player to move the token to from vertices in that she owns and to move the token to from . It is easy to see that , and so . We prove that is a -fixed SNE. Consider a coalition and a deviation profile . Since Player is losing in , and win in every outcome that does not reach , a deviation for Player that does not reach is always beneficial, and so, we can assume that . Let . Let be the set of players that Player chooses from in ; thus . If there is a player that in moves the token from the vertex with to the sink , then the outcome of reaches . Thus, , and the deviation of the players in is not beneficial. If all players in do not move the token to in their strategies in (in particular, this means that ), then let be the vertex to which Player moves the token from in . Since all the sinks that are reachable from are losing for Player , the deviation is not beneficial for Player , implying that . Hence, Player follows , which directs her to move the token from to . Since every successor of is losing for some player in , the deviation is not beneficial to all the players in , and so is an SNE in which Player wins.
We continue and prove that every SNE-CRS solution in requires each of the players to have memory . Assume by contradiction that there exists an SNE profile in which Player wins and there exists such that the strategy has a memory structure with fewer than states. Since there are fewer than states in , and there are subsets of of size , there exists a set such that and never (that is, no matter what the history along which has been reached) directs Player to move the token from to .
Consider the coalition , and the deviation profile , where for every , the strategy agrees with , except that from , with , the strategy directs Player to move the token to , in case , and to , in case . Finally, the strategy agrees with , except that from , the strategy directs Player to move the token to , and for every such that , the strategy directs Player to move the token from to . Note that since both and are of size , the set is not empty, and thus exists and is in . Let . By the definition of the strategies in , there exists such that and . Thus, . Since , and are all not in , and thus also not in , it follows that . Hence, the deviation to is beneficial to all the players in , contradicting the assumption that is an SNE.
6 Memory Requirements for Additional Solution Concepts
Recall that different applications have initiated the study of different solution concepts for settings in which a coalition of players may deviate together. While upper bounds on the concepts of robust equilibria and SSE serve as a basis to our upper bounds here, no lower bounds are known on the memory requirements for CRS solutions with respect to these concepts. In this section we show that the construction in Theorem 10 can be modified to show a lower bound on the memory required to the environment players in CRS solution when the solution concepts are resilient (and hence, robust) equilibria and SSE.
For and , consider the -player sink game obtained from by changing the objectives of the players so that now the sink is winning for all players in (rather than for Player only in ). Thus, as in , the objectives of Player and Player still complement each other, and now, Players win also in profiles that reach the vertex . Also, as in , all the vertices owned by Player have only one successor, and the notions of -fixed equilibrium and (usual) equilibrium coincide.
We start with resilient equilibria, and show that has a resilient-CRS solution, and that every resilient-CRS solution requires environment players to have memory . Recall that a profile is a resilient-equilibrium if for every subset , every deviation of the players in that benefits a player in also harms a player in . The proof is similar to the proof of Theorem 10. In particular, the profile described in the proof is a resilient-CRS solution in . To see this, recall that reaches , and is thus winning in for all the players in . Thus, only Player may benefit from a deviation. When Player chooses a set , all the players in have to deviate on order for to be avoided, and so, by the definition of resilient equilibrium, all of them have to win when the game reach the punishment layer. This, however, is impossible in deviations from , as for all , the only way for Player to make sure she does not lose is to remember the set and proceed as in , from to the vertex . Moreover, using considerations similar to these in the proof of Theorem 10 (see details in Appendix A.4), if there is such that Player has no memory to keep track of the set chosen by Player , then there is no resilient-CRS. Essentially, in such a case Player can convince a coalition of players to join the deviation by letting the outcome of the new profile reach , where a player not in would be punished.
For the solution concept of SSE, we show that has an SSE-CRS solution, and that every SSE-CRS solution requires environment players to have memory . Recall that a profile is an SSE if for every coalition , every deviation that harms a player not in also harms a player in . In , a deviation that does not reach the vertex causes Player to lose. Thus, a profile is an SSE iff for every coalition and every deviation that causes the game to reach the punishment layer, at least one player in the coalition looses. Accordingly, the profile described in the proof of Theorem 10 is an SSE-CRS solution in and in every SSE-CRS solution, every player in has to remember the set chosen by Player from . Indeed (see details in Appendix A.5), otherwise, a coalition of players in can deviate by letting the outcome of the new profile reach the vertex , where a player not in would be punished.
References
- [1] S. Almagor, O. Kupferman, and G. Perelli. Synthesis of controllable Nash equilibria in quantitative objective game. In Proc. 27th Int. Joint Conf. on Artificial Intelligence, pages 35–41, 2018.
- [2] E. Anshelevich, A. Dasgupta, J. Kleinberg, E. Tardos, T. Wexler, and T. Roughgarden. The price of stability for network design with fair cost allocation. In Proc. 45th IEEE Symp. on Foundations of Computer Science, pages 295–304. IEEE Computer Society, 2004. doi:10.1109/FOCS.2004.68.
- [3] R. Aumann. Acceptable Points in General Cooperative -Person Games. In Contributions to the Theory of Games, volume 4, 1959.
- [4] R. Bloem, K. Chatterjee, and B. Jobstmann. Graph games and reactive synthesis. In Handbook of Model Checking., pages 921–962. Springer, 2018. doi:10.1007/978-3-319-10575-8_27.
- [5] P. Bouyer, N. Fijalkow, M. Randour, and P. Vandenhove. How to play optimally for regular objectives? In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023), volume 261 of Leibniz International Proceedings in Informatics (LIPIcs), pages 118:1–118:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ICALP.2023.118.
- [6] R. Brenguier. Robust equilibria in mean-payoff games. In Proc. 19th Int. Conf. on Foundations of Software Science and Computation Structures, volume 9634 of Lecture Notes in Computer Science, pages 217–233. Springer, 2016. doi:10.1007/978-3-662-49630-5_13.
- [7] L. Brice, J-F. Raskin, M. Sassolas, G. Scerri, and M. Bogaard. Pessimism of the will, optimism of the intellect: Fair protocols with malicious but rational agents. In IEEE 38th Computer Security Foundations Symposium, pages 33–47, 2025.
- [8] T. Brihaye, A. Goeminne, J.C.A. Main, and M. Randour. Reachability games and friends: A journey through the lens of memory and complexity. In 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023), volume 284 of Leibniz International Proceedings in Informatics (LIPIcs), pages 1:1–1:26. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.FSTTCS.2023.1.
- [9] V. Bruyère, C. Grandmont, and J-F.Raskin. As soon as possible but rationally. In 35th International Conference on Concurrency Theory (CONCUR 2024), volume 311 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CONCUR.2024.14.
- [10] V. Bruyère, J-F. Raskin, A. Reynouard, and M. Bogaard. The non-cooperative rational synthesis problem for spes and omega-regular objectives. In 36th International Conference on Concurrency Theory (CONCUR 2025), volume 348 of Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:23. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPIcs.CONCUR.2025.12.
- [11] A. Casares. On the minimisation of transition-based rabin automata and the chromatic memory requirements of muller conditions. In 30th EACSL Annual Conference on Computer Science Logic (CSL 2022), volume 216 of Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.CSL.2022.12.
- [12] K. Chatterjee, R. Majumdar, and M. Jurdzinski. On Nash equilibria in stochastic games. In Proc. 13th Annual Conf. of the European Association for Computer Science Logic, volume 3210 of Lecture Notes in Computer Science, pages 26–40. Springer, 2004. doi:10.1007/978-3-540-30124-0_6.
- [13] R. Condurache, E. Filiot, R. Gentilini, and J.-F. Raskin. The complexity of rational synthesis. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 121:1–121:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ICALP.2016.121.
- [14] S. Dziembowski, M. Jurdzinski, and I. Walukiewicz. How much memory is needed to win infinite games. In Proc. 12th ACM/IEEE Symp. on Logic in Computer Science, pages 99–110, 1997.
- [15] R. Ehlers. Symbolic bounded synthesis. In Proc. 22nd Int. Conf. on Computer Aided Verification, volume 6174 of Lecture Notes in Computer Science, pages 365–379. Springer, 2010. doi:10.1007/978-3-642-14295-6_33.
- [16] E.A. Emerson and C.-L. Lei. Modalities for model checking: Branching time logic strikes back. Science of Computer Programming, 8:275–306, 1987. doi:10.1016/0167-6423(87)90036-0.
- [17] E. Filiot, N. Jin, and J.-F. Raskin. An antichain algorithm for LTL realizability. In Proc. 21st Int. Conf. on Computer Aided Verification, volume 5643, pages 263–277, 2009. doi:10.1007/978-3-642-02658-4_22.
- [18] D. Fisman, O. Kupferman, and Y. Lustig. Rational synthesis. In Proc. 16th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems, volume 6015 of Lecture Notes in Computer Science, pages 190–204. Springer, 2010. doi:10.1007/978-3-642-12002-2_16.
- [19] E. Koutsoupias and C. Papadimitriou. Worst-case equilibria. Computer Science Review, 3(2):65–69, 2009. doi:10.1016/J.COSREV.2009.04.003.
- [20] O. Kupferman, Y. Lustig, M.Y. Vardi, and M. Yannakakis. Temporal synthesis for bounded systems and environments. In Proc. 28th Symp. on Theoretical Aspects of Computer Science, pages 615–626, 2011.
- [21] O. Kupferman, G. Perelli, and M.Y. Vardi. Synthesis with rational environments. Annals of Mathematics and Artificial Intelligence, 78(1):3–20, 2016. doi:10.1007/S10472-016-9508-8.
- [22] O. Kupferman and N. Shenwald. Games with trading of control. In 34th International Conference on Concurrency Theory (CONCUR 2023), volume 279 of Leibniz International Proceedings in Informatics (LIPIcs), pages 19:1–19:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.CONCUR.2023.19.
- [23] O. Kupferman and N. Shenwald. Positional-Player Games. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025), volume 345 of Leibniz International Proceedings in Informatics (LIPIcs), pages 64:1–64:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPIcs.MFCS.2025.64.
- [24] J. C. A. Main. Arena-independent memory bounds for nash equilibria in reachability games. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024), volume 289 of Leibniz International Proceedings in Informatics (LIPIcs), pages 50:1–50:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.STACS.2024.50.
- [25] D.A. Martin. Borel determinacy. Annals of Mathematics, 65:363–371, 1975.
- [26] J.F. Nash. Equilibrium points in -person games. In Proceedings of the National Academy of Sciences of the United States of America, 1950.
- [27] C. H. Papadimitriou. Algorithms, games, and the internet. In Proc. 33rd ACM Symp. on Theory of Computing, pages 749–753, 2001.
- [28] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. 16th ACM Symp. on Principles of Programming Languages, pages 179–190, 1989.
- [29] S. Schewe and B. Finkbeiner. Bounded synthesis. In 5th Int. Symp. on Automated Technology for Verification and Analysis, volume 4762 of Lecture Notes in Computer Science, pages 474–488. Springer, 2007. doi:10.1007/978-3-540-75596-8_33.
- [30] W. Thomas. On the synthesis of strategies in infinite games. In Proc. 12th Symp. on Theoretical Aspects of Computer Science, volume 900 of Lecture Notes in Computer Science, pages 1–13. Springer, 1995. doi:10.1007/3-540-59042-0_57.
- [31] M. Ummels. The complexity of Nash equilibria in infinite multiplayer games. In Proc. 11th Int. Conf. on Foundations of Software Science and Computation Structures, pages 20–34, 2008.
- [32] J. von Neumann and O. Morgenstern. Theory of games and economic behavior. Princeton University Press, 1953.
- [33] M. Wooldridge, J. Gutierrez, P. Harrenstein, E. Marchioni, G. Perelli, and A. Toumi. Rational verification: From model checking to equilibrium checking. In Proc. of 30th Conf. on Artificial Intelligence, pages 4184–4190, 2016.
Appendix A Missing Proofs and Examples
A.1 On strategies for Player 1 in a -bounded NE-CRS solution
We describe two interesting examples. The first is a game and a CRS solution such that there is no -bounded CRS solution for a memoryless . The second is a game such that every -bounded CRS solution requires to have memory of size at least .
Consider the Büchi game in Figure 5 (left). Let and . Consider the following strategy of of Player :
-
.
-
.
Thus, if the token reaches without visiting before, Player always direct it to , where only Player wins, and if the token visits for one time before reaching , then Player always direct the token to , where both players win. Let be a strategy for Player that proceeds to once and then to . It is easy to see that while is a CRS solution, there is no -bounded CRS solution for a memoryless .
We continue and show that the strategy of Player in a -bounded CRS solution may require memory of size . Consider the Büchi game in Figure 5 (right). Let and .
Consider a profile for a memoryless strategy for Player . We claim that is not a -bounded CRS solution. Indeed, if , then is not satisfied, and if , then Player can deviate to a memoryless strategy , where only is satisfied.
Nevertheless, the game does have a -bounded CRS solution , for that is not memoryless. To see this, consider a strategy of Player that behaves as follows:
-
.
-
.
Thus, when the game starts and as long as Player moves the token from to , Player moves the token from down to . If Player moves the token from to , then on the next visit of the token in , Player moves it to , where no player wins. Note that when Player uses a memoryless strategy, then this “next” visit must be the second one, and so the definition of above covers all possible histories.
It is not hard to see that , for the memoryless strategy with is a -bounded CRS solution
A.2 Missing Details in the proof of Theorem 5
The game graph and the objectives are defined as follows:
-
1.
The vertex sets of the players are defined as follows:
-
Player controls the set
-
For every , Player controls the set .
-
-
2.
The set contains edges of the following types:
-
For every , edges from to .
-
For every , there is an edge from to where .
-
For every , and , there is an edge from to .
-
The vertices in are sink vertices.
-
-
3.
The objectives are defined as follows:
-
The objective of Player 1 is .
-
For every , the objective of Player is .
-
We begin by describing an NE profile in which Player wins. For every , Player decides to always move from towards . If there exists such that Player deviated and moved the token from to for some , then Player , move the token to . The outcome of the profile is , thus, . If for some , Player deviates and move the token from to for some , then Player moves the token to and Player loses. Meaning that deviation is not beneficial and that is an NE. Overall is an NE in which Player wins. Note that the strategy for each players in can be implemented with a finite memory structure of size . Indeed. each player only needs to remember if a deviation has happened, and if so, which player (besides themselves and Player ) has deviated.
We now show that in every NE in which Player wins, the strategy of every player uses memory at least . Assume by contradiction that there exists an NE profile where Player wins and there exists such that the strategy has a memory structure with less than states. Then, by the pigeon hole principle, since there are less than states in but has successors, there exists a successors to , which we mark with , such that never chooses as a successor from . Since , we have that and may deviate. Let be a strategy for Player that agrees with , except that from , the strategy always move the token to . Then, since never chooses as a successor from , the profile visits a sink where . Thus, Player wins in and so is not stable.
A.3 Missing Details in the proof of Theorem 7
We prove that the following three claims are equal:
-
The game has an NE-NRS solution.
-
The game has an SNE-CRS solution.
-
There exists a strategy for Player such that there does not exist a trap in that agrees with .
We first prove that iff . In fact we prove a stronger claim, namely that for every strategy for Player , we have that is an NE-NRS solution in iff there does not exist a trap in that agrees with .
Consider a strategy for Player . Assume first that is an NE-NRS solution in , and assume by way of contradiction that there exists a trap that agrees with . Thus, reaches a sink , and for every , we have that . Consider a profile , such that for every , the strategy agrees with . Since , Player loses in . Since for every , we have that , then . Thus, no player in has an incentive to deviate in . Hence, is a -fixed NE in in which Player loses, contradicting the fact is an NE-NRS solution for .
For the second direction, assume that there does not exist a trap in that agrees with , and assume by way of contradiction that is not an NE-NRS solution for . That is, there is a -NE profile such that . Consider the play . Since , the play reaches a sink . Also, since is not a trap, there exists such that , which implies that . By the third condition on the structure of special sink games, Player can deviate and move the token from the first vertex she owns in to a vertex in . Since , such a deviation causes Player to win in , contradicting the fact is a -fixed NE.
We continue and prove that iff . Here too, we prove a stronger claim, namely that for every strategy for Player , we have that there is an SNE-CRS solution for iff there does not exist a trap in that agrees with .
Assume first that is such that there is an SNE-CRS solution for . Recall that for every , we have that . Thus, as , it must be that .
Assume by way of contradiction there exists a trap that agrees with . Let . Since , then . For every , let be a strategy for Player that agrees with . Let , and consider the profile . Since agrees with and for all , the strategy agrees with , we have that . Since is a trap, we have that . Thus, the deviation strictly benefits all the players in and so is not a -fixed SNE.
Assume now that is such that there does not exist a trap that agrees with . Consider the profile , where is some memoryless strategy, and for every the strategy is a memoryless strategy that moves the token from every vertex to . Note that since is a special sink game, such strategies exist. We prove that is an SNE-CRS solution for .
Let . First, we show that reaches , which implies that . Clearly, if , then, by the definition of the strategies , the play reaches . Otherwise, . Then, if does not reach , then, by the definition of , we have that reaches , implying that is a trap that agrees with and contradicting the fact that no such trap exists.
Second, we show that is a -fixed SNE in . Assume by way of contradiction that is not a -fixed SNE in . Then, there exists a coalition and a strategy profile such that for the profile , we have that . By the definition of , the latter implies that reaches a sink . Let . By the definition of the strategies , it must be that . Indeed, otherwise would have reached . Thus, as , it follows that for every , we have that . In addition, as , we also have that . Thus, the play reaches a sink , and for every , we have that . Thus, is a trap that agrees with , contradicting the assumption that no such trap exists.
A.4 Resilient-CRS memory lower bound
We first show that is a resilient-CRS. Consider a coalition and a deviation profile . Let . Let be the set of players that Player choose from in ; thus . If there is a player that in moves the token from the vertex with to the sink , then the outcome of reaches . Thus, , and the deviation of the players in is not strictly beneficial to a player in the coalition. If all players in do not move the token to in their strategies in (in particular, this means that ), then let be the vertex to which Player moves the token from in . Since all the sinks that are reachable from are losing for Player , the deviation is harmful for Player , implying that . Hence, Player follows , which directs her to move the token from to . Since every successor of is losing for some player in , the deviation harms a player in , and so is resilient in which Player wins.
We continue and prove that every resilient-CRS solution in requires each of the players to have memory . Assume by contradiction that there exists a resilient profile in which Player wins and there exists such that the strategy has a memory structure with fewer than states. Since there are fewer than states in , and there are subsets of of size , there exists a set such that and never (that is, no matter what the history along which has been reached) directs Player to move the token from to .
Consider the coalition , and the deviation profile , where for every , the strategy agrees with , except that from , with , the strategy directs Player to move the token to , in case , and to , in case . Finally, the strategy agrees with , except that from , the strategy directs Player to move the token to , and for every such that , the strategy directs Player to move the token from to . Note that since both and are of size , the set is not empty, and thus exists and is in . Let . By the definition of the strategies in , there exists such that and . Thus, . Since , and are all not in , and thus also not in , it follows that .
Hence, the deviation to is strictly beneficial for Player , and does not harm any member of , contradicting the assumption that is resilient.
A.5 SSE-CRS memory lower bound
We first show that is an SSE-CRS. Consider a coalition and a deviation profile . Let . Let be the set of players that Player choose from in ; thus . If there is a player that in moves the token from the vertex with to the sink , then the outcome of reaches . Thus, , and the deviation of the players in does not harm a player outside . If all players in do not move the token to in their strategies in (in particular, this means that ), then let be the vertex to which Player moves the token from in . Since all the sinks that are reachable from are losing for Player , the deviation harms Player , implying that . Hence, Player follows , which directs her to move the token from to . Since every successor of is losing for some player in , the deviation harms a player in , and so is SSE in which Player wins.
We continue and prove that every SSE-CRS solution in requires each of the players to have memory . Assume by contradiction that there exists a SSE profile in which Player wins and there exists such that the strategy has a memory structure with fewer than states. Since there are fewer than states in , and there are subsets of of size , there exists a set such that and never (that is, no matter what the history along which has been reached) directs Player to move the token from to .
Consider the coalition , and the deviation profile , where for every , the strategy agrees with , except that from , with , the strategy directs Player to move the token to , in case , and to , in case . Finally, the strategy agrees with , except that from , the strategy directs Player to move the token to , and for every such that , the strategy directs Player to move the token from to . Note that since both and are of size , the set is not empty, and thus exists and is in . Let . By the definition of the strategies in , there exists such that and . Thus, . Since , and are all not in , and thus also not in , it follows that .
Hence, the deviation to does not harm any member of while harming Player , contradicting the assumption that is SSE.
