Abstract 1 Introduction 2 Preliminaries 3 Is Memory for the Environment Helpful? 4 Memory Requirements for NE-CRS 5 On the SNE-CRS Problem 6 Memory Requirements for Additional Solution Concepts References Appendix A Missing Proofs and Examples

Memory Requirements in Non-Zero-Sum Games

Yoav Feinstein ORCID School of Engineering and Computer Science, Hebrew University, Jerusalem, Israel Orna Kupferman ORCID School of Engineering and Computer Science, Hebrew University, Jerusalem, Israel
Abstract

The interaction between a system and the components modeling its environment is traditionally modeled by a multi-player game played on a finite graph. In zero-sum games, the players have conflicting objectives, and it is clear that increasing the memory of the environment players can only make it harder for the system to win. In non-zero-sum games, the objectives of the players may overlap. There, typical questions concern the stability of the game and the equilibria the players may reach. In particular, in rational synthesis (RS), the goal is to find an equilibrium that satisfies the objective of the system.

We study how the memory of the environment players may affect the existence of an RS solution. As we show, the picture is diverse, even when the objectives of all players are memoryless. On the one hand, when stability amounts to a Nash equilibrium (NE), then increasing the memory of the environment may only help the system to suggest an RS solution. On the other hand, when the notion of stability involves deviations by coalitions of environment players, for example in a strong Nash equilibrium (SNE), then increasing their memory may sometimes enable and sometimes prevent the existence of an RS solution. We study memory bounds for the players, showing that the memory required may be polynomial in an NE-RS solution and exponential in an SNE-RS solution. We also solve the SNE-RS problem, show that it is PSPACE-complete, and relate the differences between NE and SNE with the differences between cooperative and non-cooperative RS.

Keywords and phrases:
Non-Zero-Sum Games, Synthesis, Memory
Copyright and License:
[Uncaptioned image] © Yoav Feinstein and Orna Kupferman; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Formal languages and automata theory
; Theory of computation Semantics and reasoning
Editors:
Stefano Guerrini and Barbara König

1 Introduction

Synthesis is the automated construction of systems from their specifications [28]. Modern systems often consist of interacting components. The interaction is modeled by a multi-player game played on a finite graph. In the turn-based setting, the vertices of the game graph are partitioned among the players. A token is placed on an initial vertex, and in each turn, the player that owns the vertex with the token moves it to a successor vertex. Each player has a strategy that directs her how to move the token when it reaches vertices she owns. A profile is a vector of strategies, one for each player. The outcome of a profile is a play – an infinite path in the game graph, obtained when the players follow their strategies. The goal of each player is to direct the game into a play that satisfies her objective. Each objective α defines a subset of Vω [25], where V is the set of vertices of the game graph. For example, in games with Büchi objectives, α is a subset of V, and a play satisfies α if it visits vertices in α infinitely often.

In zero-sum games, the players compete with each other on the satisfaction of contradicting objectives. Zero-sum games with ω-regular objectives are determined: in every game, exactly one player has a winning strategy – one that achieves her objective against all strategies of the other players [25]. Deciding a zero-sum game amounts to finding this player. In contrast, in non-zero-sum games, the objectives of the players may overlap [12, 31]. There, typical questions concern the stability of the game and the equilibria the players may reach [32]. The most common notion of stability is Nash equilibrium (NE) [26]. A profile of strategies is an NE if no (single) player can benefit from unilaterally changing her strategy.

Two-player zero-sum games model the interaction between a system and its environment. The system aims to satisfy its specification in all environments. Accordingly, the environment is assumed to be hostile, as if its objective is to violate the specification [4]. Often, however, the components composing the environment have objectives of their own, and they act to achieve their objectives. For example, clients interacting with a server typically have objectives other than to fail the server. Accordingly, the setting induces a non-zero-sum game, where the objectives of the players are induced from their specifications. In rational synthesis, we let the system exploit the rationality of the environment. In particular, in cooperative rational synthesis (CRS) [18], the desired output is an NE profile whose outcome satisfies the objective of the system. Thus, in CRS, we assume that we can suggest strategies to the environment players, and if they have no incentive to deviate from these strategies, they follow them. Rational synthesis has been extensively studied in various settings and variants [33, 13, 1, 22, 9, 10]. Recall that strategies for the players direct them how to move the token in vertices they own. A strategy may depend on the history of the game so far. Thus, in different visits of the token to the same vertex v, a strategy may direct the owner of v to move the token to different successors. Extensive research has concerned the memory requirements for strategies in zero-sum games with ω-regular objectives [30, 14, 5, 11, 8]. For example, it is well known that a winning strategy for a Büchi objective can be memoryless, thus it may depend only on the current vertex. On the other hand, a winning strategy for a conjunction of k Büchi objectives may require memory k [14]. Researchers have also studied games in which the memory of the players is bounded [29, 15, 17, 20, 23]. Clearly, increasing the memory of the system or reducing the memory of the environment in a zero-sum game can only help the system to win a game.

For non-zero-sum games, the situation is less clear. First, as we show in Section 3, even when its objective is memoryless, the system may need memory for its strategy in a CRS solution. Moreover, unlike the situation in zero-sum games, an increase of memory to the environment players may be helpful for the system. That is, even when the objectives are memoryless, the only possible CRS solutions may require the environment players to have memory. Intuitively, increasing the memory of the players in non-zero-sum games enables them to satisfy multiple objectives, which may be essential for achieving both stability and the satisfaction of the system’s objective [31]. In fact, we show that when the objectives of the environment players are memoryless, then adding memory to the environment players may only help the system to win.

Our basic observations above raise several interesting questions about memory bounds in non-zero-sum games. We first prove that a CRS solution in a k-player non-zero-sum game with sink objectives (that is, ones that can be specified with all reachability or ω-regular objectives) may require each of the players to have memory O(k), matching the known upper bound [24, 31]. Further questions concern richer settings, detailed below.

The notion of an NE corresponds to deviations of single players. In some applications, a coalition of players may deviate together. For example, protocols for voting, mechanisms for exchange of messages, allocation and construction of shared resources – all should take into account the possibility of players that deviate together. The different applications induce different notions of stability. The first such definition is of strong Nash equilibrium (SNE). A profile is an SNE if no subset of players can deviate in a way that benefits all its members [3]. Then, for b1,b20, a profile is a (b1,b2)-robust equilibrium if no coalition of size b1 can deviate in a way that benefits at least one of its members without harming the other members, and no coalition of size b2 can deviate in a way that harms the other players [6]. Finally, a profile is a strong secure equilibria (SSE) if every deviation of a coalition of players that harms some player, also harms a player in the coalition [7].

We study the CRS problem when stability is defined with respect to deviations by a coalition of players. For example, in the SNE-CRS problem, we seek a profile of strategies that satisfies the objective of the system and is an SNE. The contributions in [6, 7] include a study of the complexity of the CRS problem when the solution concepts are robust-equilibrium and SSE. PSPACE upper bounds for the problems involve a reduction to a two-player game, termed the deviator game [6], which we easily extend to SSE-CRS. For the lower-bound, our contribution is more interesting and involves relating non-cooperation in rational synthesis with deviations of coalitions in SNEs: Recall that in cooperative rational synthesis, the desired output is an NE that satisfies the system objective. In non-cooperative rational synthesis (NRS) [21], the desired output is a strategy for the system player such that the objective of the system is satisfied in the outcome of all NE profiles that include this strategy. Thus, in NRS, the environment players are rational, but we cannot suggest them a strategy. As shown in [1], the cooperative and non-cooperative approaches are related to the two stability-inefficiency measures of price of stability [2] and price of anarchy [19, 27]. We relate NE-NRS (that is, NRS when stability amounts to being an NE) with SNE-CRS, showing that the challenge of coping with deviations of a coalition is similar to the challenge of coping with non-cooperation. Intuitively, in both cases, all the environment players may deviate simultaneously, as long as these deviations are beneficial for them. The relation implies that the PSPACE-hardness of the NE-NRS problem [13] applies also to the SNE-CRS problem.

Back to the study of memory requirements, we examine how the transition to solution concepts that involve deviations by a coalition of players affects these requirements. Since players in a coalition care for the satisfaction of the objectives of all the players in the coalition, their strategies have to satisfy multiple objectives. Since the latter typically requires memory, the study of the memory requirements in non-zero-sum games with deviations by coalitions is more interesting and involved. We start with some observations about the effect of increasing the memory to the environment players and show that, unlike the case of NE-CRS, here an increase to the memory of the environment players may prevent the existence of an SNE-CRS solution even when the objectives are memoryless.

On the other hand, for some games, the existence of an SNE-CRS solution requires the environment players to have memory, and we examine the memory requirements for them. For the upper bound, it is not hard to extend the analysis in [6] and describe an exponential upper bound for the required memory. Our main technical contribution is a matching lower bound. We show that even in k-player games with sink objectives, an SNE-CRS solution may require O(k) players to have memory 2Θ(k). Moreover, our bounds apply also to the solution concepts of (k,0)-robust-equilibrium and SSE, completing the picture to all known solution concepts with deviations of a coalition of players.

2 Preliminaries

Games.

For k1, let [k]={1,,k}. A k-player (turn-based) game graph is a tuple G={Vi}i[k],v0,E, where V1,,Vk are disjoint sets of vertices. For every i[k], the vertices in Vi are owned by Player i, and we let V=i[k]Vi. Then, v0V is an initial vertex, and EV×V is a total edge relation, thus for every vV, there is uV such that v,uE. For vV, we denote by 𝗈𝗐𝗇𝖾𝗋(v) the player i[k] such that vVi. The size of G, denoted |G|, is |E|, namely the number of edges in it.

A game is a tuple 𝒢=G,{αi}i[k], where G is a k-player game graph, and αi, for i[k], is a winning condition (a.k.a. objective) for Player i. In the beginning of a play in the game, a token is placed on v0. Then, in each turn, the player that owns the vertex that hosts the token chooses a successor vertex and moves the token to it. Together, the players generate a play ρ=v0,v1,v2Vω in 𝒢, namely an infinite path that starts in v0 and respects E: for all i0, we have that vi,vi+1E.

Each winning condition αi defines a subset of Vω. The objective of Player i is to cause the interaction to generate a play that satisfies αi. We describe some types of winning conditions below. For a play ρ=v0,v1, we denote by 𝗋𝖾𝖺𝖼𝗁(ρ) the set of vertices visited at least once along ρ, and by 𝗂𝗇𝖿(ρ) the set of vertices visited infinitely often along ρ. That is, 𝗋𝖾𝖺𝖼𝗁(ρ)={vV:there exists some i0 such that vi=v} and 𝗂𝗇𝖿(ρ)={vV: there are infinitely many i0 such that vi=v}. A reachability objective is given by a set of vertices αV, and it requires some vertex in α to be visited at least once; thus a play ρVω satisfies α iff 𝗋𝖾𝖺𝖼𝗁(ρ)α. A Büchi objective is given by a set of vertices αV, and it requires some vertex in α to be visited infinitely often; thus ρ satisfies α iff 𝗂𝗇𝖿(ρ)α. The objectives dual to reachability and Büchi are avoid (also known as safety) and co-Büchi, respectively. Formally, a play ρ satisfies an avoid objective αV iff 𝗋𝖾𝖺𝖼𝗁(ρ)α=, and satisfies a co-Büchi objective αV iff inf(ρ)α=. A generalized Büchi objective is a set α={α1,,αm} of Büchi objectives. A play ρVω satisfies α if it satisfies all the objectives in α; thus if for all j[m], we have that 𝗂𝗇𝖿(ρ)αj. Generalized reachability objectives are defined similarly, requiring all underlying reachability objectives to be satisfied.

Strategies, profiles, and equilibria.

For i[k], a strategy for Player i is a function fi:VViV that maps prefixes of plays that end in a vertex owned by Player i to possible extensions in a way that respects E. That is, for every history hV and vVi, we have that v,fi(hv)E. Intuitively, a strategy for Player i directs her how to move the token, and the direction may depend on the history of the play so far.

A profile is a tuple π=f1,,fk of strategies, one for each player. The outcome of a profile π=f1,,fk is the play obtained when the players follow their strategies. Formally, 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=v0,v1,v2,Vω is such that for all j0, we have that vj+1=f𝗈𝗐𝗇𝖾𝗋(vj)(v0vj). Consider a game 𝒢 and a profile π. The set of winners in 𝒢 when the players follow π, denoted 𝖶𝗂𝗇(𝒢,π), is the set of players whose objectives are satisfied in 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π). Formally, i𝖶𝗂𝗇(π,𝒢) iff 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π) satisfies αi. The set of losers in π, denoted 𝖫𝗈𝗌𝖾(𝒢,π), is then [k]𝖶𝗂𝗇(π), namely the set of players whose objectives are not satisfied in 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π). When 𝒢 is known from the context we write 𝖶𝗂𝗇(π) and 𝖫𝗈𝗌𝖾(π) respectively.

For a subset S[k] of players, an S-profile is a set of strategies, one for each player in S. We say that a profile π extends an S-profile π if the players in S use in π their strategies in π. For a profile π=f1,,fk, a non-empty subset C[k], and a C-profile πC=iC{fi}, we denote by π[CπC] the profile in which the players in C follow their strategies in πC and the players in [k]C follow their strategies in π. Formally, π[CπC]=g1,,gk, where for every i[k], we have that gi=fi, if iC, and gi=fi, otherwise. When C={i} is a singleton, for some i[k], we simplify the notation and use π[ifi] rather than π[{i}{fi}].

A profile π=f1,,fk is a Nash Equilibrium (NE, for short) [26] if no single player has an incentive to deviate from π. Formally, π is an NE if for every i[k], if i𝖫𝗈𝗌𝖾(π), then for every strategy fi for Player i, we have that i𝖫𝗈𝗌𝖾(π[ifi]). The notion of an NE assumes deviation by single players. In some applications, a coalition of players may deviate together. The different applications induce different definitions of stability. We consider three definitions. First, a profile π is a Strong Nash Equilibrium (SNE, for short) [3] if no coalition of players can jointly deviate in a way that strictly benefits all its members. Formally, π is an SNE if for every non-empty subset C𝖫𝗈𝗌𝖾(π), and every C-profile πC, there exists jC such that j𝖫𝗈𝗌𝖾(π[CπC]). Note that an NE is a special case of an SNE in which only deviations of coalitions of size 1 are possible. Then, for b1,b20, a profile is a (b1,b2)-robust equilibrium if no coalition of size b1 can deviate in a way that benefits at least one of its members without harming at least one of the other members, and no coalition of size b2 can deviate in a way that harms at least one of the other players [6].111The setting in [6] considers weighted objectives, where rather than winning or losing, each profile induces a payoff for each player, which enables also a quantitative definition of “harm” and “benefit”. Here we consider ω-regular Boolean objectives, inducing a Boolean interpretations for “harm” and “benefit”. Formally, π is (b1,b2)-robust if π is b1-resilient: for every subset C[k] of size at most b1, and every C-profile πC, if 𝖫𝗈𝗌𝖾(π)𝖶𝗂𝗇(π[CπC]), then 𝖶𝗂𝗇(π)𝖫𝗈𝗌𝖾(π[CπC]), and π is b2-immune: for every subset C[k] of size at most b2, and every C-profile πC, we have that 𝖶𝗂𝗇(π)𝖫𝗈𝗌𝖾(π[CπC])([k]C). Finally, a profile is a strong secure equilibria (SSE) if every deviation of a coalition of players that harms some player, also harms a player in the coalition [7]. Formally, π is an SSE if for every subset C[k], and every C-profile πC, if 𝖶𝗂𝗇(π)𝖫𝗈𝗌𝖾(π[CπC])([k]C), then 𝖶𝗂𝗇(π)𝖫𝗈𝗌𝖾(π[CπC])C.

For a subset W[k] of players, we say that π is a W-NE if π is an NE with W=𝖶𝗂𝗇(π), and similarly for the other solution concepts.

Rational Synthesis.

We consider a setting in which the players model a controllable system and its rational environment. Technically, we assume that Player 1 models the system (a system may be composed from several components, but since the system is controllable, we can merge them to a single player), and the other players model the components of the environment. Let 𝖤𝗇𝗏={2,,k}. The basic problem we consider is the existence and finding stable profiles that satisfy the objective of the system.

We refine the notions of NE and SNE to take into account our ability to control the system. For a profile π, we say that π is a 1-fixed NE if no player in 𝖤𝗇𝗏 can benefit from unilaterally changing her strategy. Likewise, we say that π is a 1-fixed SNE if no coalition of players in 𝖤𝗇𝗏 can jointly deviate in a way that strictly benefits all its members.

Consider a k-player game 𝒢. The problem of NE (SNE) cooperative rational synthesis, denoted NE-CRS (resp., SNE-CRS) is to return a 1-fixed NE (resp., SNE) in 𝒢 in which Player 1 wins. As in traditional synthesis, one can also define the corresponding decision problems, of rational realizability, where we only need to decide whether the desired strategies exist. In order to avoid additional notations, we sometimes refer to NE-CRS and SNE-CRS also as decision problems.

Finite-Memory Strategies.

A strategy fi:VV is finite-memory if it is possible to replace the unbounded histories in V by finitely many memories. Formally, a memory structure for a game 𝒢=G,{αi}i[k] with G={Vi}i[k],v0,E is =M,μ0,δ, consisting of a finite set M of memory states, an initial memory state μ0M, and an update function δ:M×E M. A memory structure is similar to an automaton with alphabet E, which is executed in parallel to the game: it starts in μ0 and reads the edges traversed by the token. Then, a strategy for Player i that relies on replaces the dependency on the history of the play by dependency on the current memory state of . Thus, the strategy is given by a function fi:M×ViV, such that for all μM and vVi, we have that v,fi(μ,v)E. When the current memory state is μ and the token is in vertex vVi, Player i moves the token to fi(μ,v) and moves to state δ(μ,v,fi(μ,v)). A strategy fi is memoryless if it only depends on the current vertex. That is, if for every two histories h,hV and vertex vVi, we have that fi(hv)=fi(hv). Note that a memoryless strategy can be viewed as a function fi:ViV, and corresponds to the case |M|=1. An objective type γ is called memoryless if in every zero-sum game with an objective of type γ, if Player 1 wins, then she has a memoryless winning strategy. Reachability, avoid, Büchi, and co-Büchi objectives are all memoryless [30].

For l1, we say that a strategy fi:VViV uses memory l if a memory structure that generates fi needs l states. Given a profile π=f1,,fk, we say that Player i uses memory l (in π) if fi uses memory l.

3 Is Memory for the Environment Helpful?

In zero-sum games, increasing the memory of the environment may only decrease the ability of the system to satisfy the specification. Formally, for every zero-sum game 𝒢 and for every bound m1, if Player 1 wins against Player 2 that uses memory m, then for every mm, Player 1 also wins against Player 2 that uses memory m, and possibly there is m′′>m such that Player 1 loses against Player 2 that uses memory m′′.

In this section we show that the picture in non-zero-sum games is different and more involved. First, the system may need memory in order to have a CRS solution in a game with memoryless objectives for all players. In addition, memory for the environment is required for the existence of a CRS solution in some cases yet prevents the existence of a CRS solution in other cases. Intuitively, memory for the environment enables the system to suggest to the environment richer strategies, but also enables the environment to have richer deviations.

We first describe cases in which increasing the memory of the system and the environment is required for a CRS solution, even in games with memoryless objectives for all players. Our examples are with two-player games, and thus apply to both NE-CRS and SNE-CRS.

Theorem 1.

There are two-player Büchi (or reachability) games 𝒢1 and 𝒢2 such that the following hold.

  1. 1.

    There is a CRS solution for 𝒢1 in which Player 1 uses a memory of size 2 and there is no CRS solution for 𝒢1 in which Player 1 is memoryless.

  2. 2.

    There is a CRS solution for 𝒢2 in which Player 2 uses a memory of size 2 and there is no CRS solution for 𝒢2 in which Player 2 is memoryless.

Proof.

We describe 𝒢1 and 𝒢2 with Büchi objectives. The same games and considerations apply when the games have reachability objectives. Consider the Büchi game 𝒢1 in Figure 1 (left). Let α1={v3} and α2={v1,v4}. Drawing two-player games we use circles and boxes to describe the vertices in V1 and V2, respectively.

Figure 1: The games 𝒢1 and 𝒢2.

Consider the strategy f1 for Player 1 that, in v2, alternates between v3 and v4. That is f1(v0,v2,(v3,v2,v4,v2))=v3 and f1(v0,v2,(v3,v2,v4,v2),v3,v2)=v4. Note that in order to implement the alternation, the strategy f1 requires a memory of size 2. Consider the strategy f2 for Player 2 that, in v0, takes the token down to v2. Note that the profile f1,f2 is a CRS solution. Indeed, its outcome is v0,(v2,v3,v2,v4)ω, which visits both v3 and v4 infinitely often. Thus, both objectives are satisfied, and the profile is a CRS solution.

On the other hand, consider a memoryless strategy for Player 1. If, in v0, Player 2 moves the token to v1, then the play reaches and stays forever in v1 no matter what the strategy of Player 1 is, and α1 is not satisfied. If, in v0, Player 2 moves the token to v2, then either, in v2, Player 1 always moves the token to v4, in which case the play never visits v3, and so α1 is not satisfied, or Player 1 always moves the token to v3, in which case α2 is not satisfied, causing Player 2 to deviate in v0. Thus, there is no CRS solution in which Player 1 uses a memoryless strategy.

Consider now the Büchi game 𝒢2 in Figure 1 (right). Let α1={v3} and α2={v1,v4}. Note that all the vertices with more than one successor belong to Player 2, and so there is a single strategy f1 for Player 1 in the game. Consider the strategy f2 for Player 2 that, in v0, takes the token down to v2, and, in v2, alternates between v3 and v4. That is f2(v0,v2,(v3,v2,v4,v2))=v3 and f2(v0,v2,(v3,v2,v4,v2),v3,v2)=v4. Note that in order to implement the alternation, the strategy f2 requires a memory of size 2. It is easy to see that the profile f1,f2 is a CRS solution. Indeed, its outcome is v0,(v2,v3,v2,v4)ω, which satisfies both objectives.

On the other hand, consider a memoryless strategy for Player 2. If, in v0, Player 2 moves the token to v1, then the play reaches and stays forever in v1 and α1 is not satisfied. If, in v0, Player 2 moves the token to v2, then either, in v2, Player 2 always moves the token to v4, in which case α1 is not satisfied, or Player 2 always moves the token to v3, in which case α2 is not satisfied, causing Player 2 to deviate in either v0 or v2. Thus, there is no CRS solution in which Player 2 uses a memoryless strategy.

The example of 𝒢2 in Theorem 1 shows that increasing the memory of the environment may help the system to achieve a CRS solution. We continue and examine whether this is always the case. We first need some notations. Consider a non-zero-sum game 𝒢=G,{αi}i[k]. For m1, we say that a profile π=f1,f2,,fk is an m-bounded 1-fixed NE if for every 2ik, the strategy fi uses memory at most m, and if i𝖫𝗈𝗌𝖾(π), then for every strategy fi for Player i that uses memory at most m, we have that i𝖫𝗈𝗌𝖾(π[ifi]). Thus, environment players are restricted to strategies that use memory at most m, in both π and their deviations. Likewise, π is an m-bounded 1-fixed SNE if all the strategies of the environment players in π use memory at most m and no coalition of environment players can jointly deviate to strategies that use memory at most m in a way that strictly benefits all its members. Then, we say that π is an m-bounded NE-CRS (SNE-CRS) solution if π is an m-bounded 1-fixed NE (SNE, respectively) that satisfies α1. Note that the usual CRS problem coincides with the case m=.

We first show that, unsurprisingly, when the objectives of the environment players require memory, in particular when they consist of a conjunction of objectives, then a system may have a CRS solution only thanks to bounds on the memory of the environment players.

Theorem 2.

There is a two-player generalized-Büchi (or generalized reachability) game 𝒢3 such that there is no CRS solution for 𝒢3, yet there is a 1-bounded CRS solution for 𝒢3.

Proof.

We prove the theorem for the Büchi case. The same game and considerations apply for generalized-reachability objectives.222The application to reachability is less straightforward here. In particular, for Büchi, one could give up the vertex v2 and let v0 have three successors. For reachability, we need the decision of Player 2 about not visiting v1 in her first transition to be un-recoverable. Consider the generalized Büchi game 𝒢3 played on the graph of 𝒢2 (Figure 1, right), now with α1={{v1}} and α2={{v3},{v4}}. Note that Player 1 has a Büchi objective.

Recall that all the vertices with more than one successor belong to Player 2, and so there is a single strategy f1 for Player 1 in the game. Consider the strategy f2 for Player 2 that, in v0, takes the token to v1, where it loops forever. Clearly, the induced play satisfies α1. When Player 2 is restricted to memoryless strategies, it cannot satisfy her objectives, making this profile a 1-bounded CRS solution. On the other hand, when Player 2 is not memoryless, she can take the token down to v2 and then alternate between v3 and v4. Hence, when Player 2 has memory 2 or more, she would deviate from every profile that leads to v0, and so there is no CRS solution in 𝒢3.

The example of 𝒢3 in the proof of Theorem 2 heavily relies on Player 2 having an objective that requires memory. We now show that for SNE-CRSs, when a deviation of a player may need to be beneficial to several players, a system may have an SNE-CRS solution only thanks to bounds on the memory of the environment players, even when all players have memoryless objectives.

Theorem 3.

There is a 3-player Büchi game 𝒢4 such that there is no SNE-CRS solution for 𝒢4, yet there is a 1-bounded SNE-CRS solution for 𝒢4.

Proof.

We prove the theorem for the Büchi case. The same game and considerations apply for reachability objectives. Consider the 3-players Büchi game 𝒢4 in Figure 2. We use diamonds to denote vertices controlled by Player 3. Let α1={v1}, α2={v3}, and α3={v4}.

Figure 2: The game 𝒢4.

Note that all the vertices with more than one successor belong to Player 2 or Player 3, and so there is a single strategy f1 for Player 1 in the game. We first describe a 1-bounded SNE-CRS solution for 𝒢4. Let f2 be the memoryless strategy for Player 2 that, in v0, takes the token to v1 and, in v2, take the token to v3. Let f3 be the memoryless strategy for Player 3 that loops in v5. Clearly, 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=v0,v1ω, and so 𝖶𝗂𝗇(π)={1}. We prove that Player 2 and Player 3 cannot deviate to memoryless strategies in a way that causes their objectives to be satisfied. Clearly, a deviation by Player 3 alone cannot affect the outcome of the game. Also, a deviation by Player 2 alone may not cause the outcome to reach v3 or v4. Consider now a joint deviation of Player 2 and Player 3. When Player 2 uses a memoryless strategy, it cannot cause the outcome of the profile to visit both v3 and v4 infinitely often. Thus, the deviation is not beneficial to either Player 2 and Player 3. Hence, π is a 1-bounded SNE-CRS solution.

On the other hand, when Player 2 has memory m2, then for every profile whose outcome reaches v1, Player 2 and Player 3 would deviate to an outcome that satisfies their both objectives, and so no SNE-CRS solution exists for 𝒢4.

We now complete the picture and show that when the objective of the environment players are memoryless and deviations are allowed only for single players, then adding memory to the environment may only help the system. Thus, for NE-CRS with memoryless objectives, we cannot have examples as those in Theorems 2 and 3.

Theorem 4.

For every Büchi (or reachability) game 𝒢, if there is a 1-bounded NE-CRS solution in 𝒢, then there is also an NE-CRS solution in 𝒢.

Proof.

Consider a Büchi (or reachability) game 𝒢=G,{αi}i[k]. Let π={f1,f2,,fk} be a 1-bounded NE-CRS solution. We prove that there exists a strategy g1 for Player 1 such that the profile π=π[1g1] is an NE-CRS solution.333In Appendix A.1, we show that the transition to g1 is essential, thus π need not be an NE-CRS solution. In fact, the example is stronger, showing that a profile π may be a 1-bounded NE-CRS solution and still no profile in which the system follows its strategy in π is an NE-CRS solution. We also show that the strategy of Player 1 in a 1-bounded NE-CRS solution may require memory of size 2.

For every i𝖤𝗇𝗏, let Gπ,i be the graph obtained from G by removing edges that leave vertices in Vj that do not agree with the memoryless strategy fj, for all j[k]{1,i}. Consider the zero-sum two-player game 𝒢π,i=Gπ,i,αi between Player i and Player 1. Let WiV be the winning region of Player i in 𝒢π,i, and let fi be a memoryless strategy for Player i for the vertices in Wi. That is, vWi iff for every strategy f1 for Player 1, the outcome in 𝒢π,i of fi and f1 from v satisfies αi. Likewise, let f1i be a memoryless strategy for Player 1 in 𝒢π,i that is winning in all vertices not in Wi. That is, vWi iff for every strategy gi for Player i, the outcome in 𝒢π,i of gi and f1i from v does not satisfy αi. Note that since αi is a Büchi objectives, memoryless strategies fi and f1i exist.

Let ρ=v0,v1,v2,=𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π). We first argue that for all i𝖫𝗈𝗌𝖾(π) and vertices v𝗋𝖾𝖺𝖼𝗁(ρ)Vi, we have that vWi. To see this, assume by way of contradiction that there exists i𝖫𝗈𝗌𝖾(π) and a vertex v𝗋𝖾𝖺𝖼𝗁(ρ)Vi such that vWi. Let v=vj be the first such vertex. That is, v0,,vj1 are all not in Wi and vjWi. Consider the strategy gi for Player i that agrees with fi in the vertices v0,,vj1 and agrees with fi in all other vertices. Consider the profile πi=π[igi]). Note that gi is memoryless, 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(πi) has a prefix v0,,vj, and, as fi is winning in 𝒢π,i, the outcome continues in a way that satisfies αi. Thus, i𝖶𝗂𝗇(π[igi]), contradicting the fact that π is a 1-bounded 1-fixed NE.

We can now define the strategy g1, as follows. As long as the generated play follows ρ, then g1 agrees with f1. If for some i𝖤𝗇𝗏 and vertex v𝗋𝖾𝖺𝖼𝗁(ρ)Vi, Player i deviates and moves the token to a successor of v that is different from fi(v), then g1 follows the strategy f1i for the rest of the game. Note that g1 need not be memoryless (even when f1 is memoryless).

We argue that the profile π=π[1g1] is an NE-CRS solution. First, since g1 agrees with f1 as long as the environment players follow their strategies in π, then 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π), and so 𝖶𝗂𝗇(π)=𝖶𝗂𝗇(π). Thus, 1𝖶𝗂𝗇(π). It is left to show that π is a 1-fixed NE. Consider a player i𝖫𝗈𝗌𝖾(π) and a strategy fi for Player i. Let hvVVi be the longest prefix of 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π[ifi]) that agrees with ρ. Thus, fi(hv)fi(v), and so, by its definition, the strategy g1 starts to follow the strategy f1i after the history hv. Since v𝗋𝖾𝖺𝖼𝗁(ρ)Vi, then vWi. Therefore, the strategy f1i is winning in 𝒢π,i when the game starts in v. Hence, 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π[ifi]) does not satisfy αi, and so fi is not a beneficial deviation for Player i. Thus, π is an NE-CRS solution, and we are done.

4 Memory Requirements for NE-CRS

In this section we consider the memory requirements for NE-CRS, namely when the solution concept is an NE. Consider a Büchi game 𝒢=G,{αi}i[k]. In [31], Ummels shows444The study in [31] considers Streett objectives, and includes also the parameter of the number of pairs in the objectives. It also combines the memory of the strategy with the size of the state space. The presentation here includes a straightforward adjustment to our setting. that for a desired set W of winners, if there exists a W-NE in 𝒢, then there exists a W-NE in which the memory of all players is of size O(k). Since an NE-CRS solution corresponds to a 1-fixed W-NE with 1W, and once 1W, then a profile is a 1-fixed W-NE iff it is a W-NE, the result provides an upper bound also to our problem. Below we prove a matching lower bound. The proof is not too complicated, and mainly serves as a warm-up to the study of SNE-CRS.

The lower bound holds already the class of sink games, defined below. Consider a graph G. A vertex in G is a sink if it has only one outgoing edge, which is a self-loop. Then, a k-player sink game 𝒢=G,{αi}i[k] is a game in which the only cycles in G are sinks, and for every i[k], the objective αiV is a set of sinks. Note that since once a play reaches a sink it stays there forever, the objective in sink games can be described using reachability, avoid, Büchi, or co-Büchi objectives.

Theorem 5.

For every k>2, we can construct a k-player sink game 𝒢k such that Gk has an NE-CRS solution, and every CRS solution for 𝒢k requires all the environment players to have memory k2.

Proof.

We define 𝒢k as follows (see an illustration for the case k=4 in Figure 3).

Figure 3: The game G4. Each vertex is labeled by its owner (top). The colors correspond to owners.

A play in Gk starts at the initial vertex d2. For each i𝖤𝗇𝗏, Player i controls the vertex di and decides whether to move the token to di+1 or to a vertex cj for some j𝖤𝗇𝗏{i}. For each j𝖤𝗇𝗏, Player j also controls the vertex cj. From cj, Player j chooses a successor si for some i𝖤𝗇𝗏. The vertex si is a sink that satisfies the objectives of all players in 𝖤𝗇𝗏{i}. The vertex dk+1 is a sink that satisfies the objective of Player 1.

Note that the only play in which Player 1 achieves her objective is d2,d3,,dk,(dk+1)ω, in which all players in 𝖤𝗇𝗏, when controlling their di vertices, choose to move the token towards dk+1. If for some i𝖤𝗇𝗏, Player i instead chooses to move the token to a vertex cj for some j𝖤𝗇𝗏{i}, then Player j can move the token from cj to si, making the deviation non-beneficial for Player i. Thus, an NE-CRS solution for 𝒢k consists of strategies that direct the token to dk+1 and “punish” players that do not follow such a direction. Consider j𝖤𝗇𝗏. In order to punish Player i by moving the token from cj to si, Player j has to remember the vertex from which the token has reached cj. Since there are k2 such possible vertices, 𝒢k has a NE-CRS solution where the strategy of each player uses k2 memory. Also, if for some i[k], the strategy of Player i has a memory smaller than k2, then there exists a player j𝖤𝗇𝗏{i} that can move from dj to ci without being punished, which implies that the profile is not an NE-CRS solution.

In Appendix A.2, we describe 𝒢k and prove the bounds formally.

5 On the SNE-CRS Problem

In this section we analyze the complexity of the SNE-CRS problem and the memory required for the players in an SNE-CRS solution. For both problems, the upper bounds follow easily from the study of robust equilibrium and SSE. Our main contributions are the lower bounds. For the complexity of the SNE-CRS problem, we relate the collaboration of players in SNE with the concept of non-cooperation in rational synthesis. For the results on the memory, lower bounds are open also for the other solution concepts, and we show that our contribution applies for them too.

5.1 Solving SNE-CRS

In this section we prove that the SNE-CRS problem is PSPACE-complete. The upper bound is similar to the one presented for other types of equilibria with respect to deviations by a coalition of players and is based on adjusting the objectives in the deviator game of Brenguier [6] to the solution concept of SNE. The adjustment is quite straightforward, and we describe it in the full version. The PSPACE algorithm holds for every objective that can be translated in polynomial space to an Emerson-Lei objective555An Emerson-Lei objective is given by a Boolean assertion θ over subsets of V. A play ρVω induces an assignments fρ:2V{F,T}, where for every set S2V, we have that fρ(S)=T iff 𝗂𝗇𝖿(ρ)S. Then, ρ satisfies θ iff fρ satisfies θ. For example, the Emerson-Lei objective α1α2αk is equivalent to the generalized Büchi objective {α1,α2,,αk}, and the objective α1¬α2 is satisfied by plays that visit vertices in α1 infinitely often and visit vertices in α2 only finitely often. [16]. This clearly includes (but is not limited to) Büchi and co-Büchi objectives.

Our lower bound involves an interesting relation between the collaboration of players in an SNE and the concept of non-cooperation in rational synthesis. We start with a definition of the latter. Recall that in CRS, we assume that the environment players are collaborative, in the sense they would follow a suggested equilibrium. In non-cooperative rational synthesis (NRS), we cannot suggest a strategy to the environment players and only know they would reach an equilibrium [21]. Accordingly, for NE-NRS, the goal is to return a strategy f1 for Player 1 such that Player 1 wins in every 1-fixed NE f1,f2,,fk. In other words, Player 1 follows f1, and no matter how the environment players behave, then as long as they are rational, and so resulting profile is a 1-fixed NE, the objective of Player 1 is satisfied.

Note that in NE-NRS, the system has to cope only with deviations of single players, yet the players are non-cooperative. On the other hand, in SNE-CRS, the system has to cope with deviations of coalitions of players, yet the players are cooperative, and would follow a suggested 1-fixed SNE. In this section we relate NE-NRS with SNE-CRS, showing that the challenge of coping with deviations of coalitions is similar to the challenge of coping with non-cooperation. Intuitively, in both cases, all the environment players may deviate simultaneously, as long as these deviations are beneficial for them.

We formalize the connection by describing a class 𝒞 of games such that for every game 𝒢=G,{αi}i[k] in the class 𝒞, there is a game 𝒢=G,{αi}i[k] such that there is an NE-NRS in 𝒢 iff there is an SNE-CRS in 𝒢. Note that 𝒢 and 𝒢 are defined with respect to the same game graph, and only the objectives are different. Moreover, the class 𝒞 is strong, in the sense that the PSPACE-hardness of NE-NRS applies already to a game in 𝒞 [13]. Accordingly, the results in this section, beyond the interesting connection between non-cooperation and coalitions, also imply PSPACE-hardness for the problem of SNE-CRS.

In the rest of this section we define the class 𝒞 and describe the reduction from 𝒢 to 𝒢. Recall that in a sink game, the only cycles are sinks, and objectives are set of sinks. The class 𝒞 consists of special sink games, defined below, which restricts sink games further. Nevertheless, the class of special sink games captures the PSPACE-hardness of NE-NRS for all types of objectives, and we are going to relate NE-NRS solutions in special sink games to SNE-CRS solutions in sink games.

Definition 6.

A k-player sink game 𝒢=G,{αi}i[k], with G={Vi}i[k],v0,E, is special if the following holds:

  1. 1.

    α1=i{3,,k}αi.

  2. 2.

    α2=V.

  3. 3.

    For every i{3,,k} and vertex vVi, there is a vertex uα1 such that v,uE.

Note that the third condition implies that α1 is not empty and that whenever a token reaches a vertex of Player i, for i{3,,k}, she can move the token to α1 and cause all players to win.

Theorem 7.

Consider a k-player special sink game 𝒢=G,{α1,α2,,αk}. The game 𝒢 has an NE-NRS solution iff the game 𝒢=G,{α1,α2α1,α3α1,,αkα1} has an SNE-CRS solution.

Proof.

Let G={Vi}i[k],v0,E, and let 𝖤𝗇𝗏={2,,k} be the set of environment players.

Given a play pVω in G, let 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p)𝖤𝗇𝗏 denote the set of environment players that own at least one vertex visited along p. For a profile π, we use 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(π) to denote 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)). Note that when we consider deviations in a 1-NE π, only deviations of players in 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(π) are of interest. Indeed, for every i𝖤𝗇𝗏𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(π), if Player i changes her strategy, the outcome of the profile is not changed. We say that a play pVω is a trap for Player 1 in 𝒢 if it reaches a sink sα1 such that for every i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p), we have that sαi. Given a strategy fi for Player i, and a play p=v0,v1,v2 in G, we say that p agrees with fi (and that fi agrees with p) if for every j0, if vjVi, then fi(v0,v1,,vj)=vj+1.

In Appendix A.3, we prove that the following three claims are equal. The theorem follows from the equivalence between (𝐂1) and (𝐂2).

(𝐂1)

The game 𝒢 has an NE-NRS solution.

(𝐂2)

The game 𝒢 has an SNE-CRS solution.

(𝐂3)

There exists a strategy f1 for Player 1 such that there does not exist a trap in 𝒢 that agrees with f1.

Intuitively, both NE-NRS and SNE-CRS solutions consider stable profiles in which Player 1 uses a strategy f1 as described in (𝐂3). In NE-NRS, stability amounts to an NE, and the universal quantification on all stable profiles is explicit in the definition of NRS. In SNE-CRS, we do start with a single profile, but deviations of sets of players capture the many profiles that have to be considered in NRS. Then, as specified in (𝐂3), in the setting of special sink games, the relevant deviations correspond to traps, and the two notions coincide.

Recall that Büchi and co-Büchi objectives are special cases of Emerson-Lei objectives, for which the SNE-CRS problem can be solved in PSPACE. Also, sink objectives are special case of both Büchi and co-Büchi objectives. Since the NE-NRS problem is PSPACE-hard already for special sink games [13], Theorem 7 enables us to conclude with the following.

Theorem 8.

The SNE-CRS problem for Büchi and co-Büchi games is PSPACE-complete.

5.2 Memory Requirements for SNE-CRS

In this section we consider the memory requirements for SNE-CRS. The upper bound is similar to the one known for resilient-CRS and is based on an analysis of the memory required for the players in the corresponding deviator game. Essentially (see details in the full version), the players have to maintain in their memory the set of players who have deviated, as well as memory required for the satisfaction of the objectives of the winning players.

Theorem 9.

Consider a k-player Büchi game 𝒢. If there exists a SNE-CRS solution in 𝒢, then there also exists an SNE-CRS solution in which the strategy of each player uses memory at most 2O(k).

Our main contribution is a lower bound, which was not studied before, and applies also to other solution concepts.

Theorem 10.

For every k4, and mk2, there is a sink game 𝒢k,m=Gk,{αi}i[k] such that 𝒢k,m has an SNE-CRS solution, and every SNE-CRS solution requires O(k) environment players to have memory 2Θ(k).

Proof.

We define 𝒢k,m as follows (see G6,2 in Figure 4).

Figure 4: The sink game G6,2. Each vertex is labeled by its owner (top). The colors correspond to owners. Some edges are omitted for clarity. In particular, all the vertices in 𝒮2×{1,2} have an edge to the vertex .

Let 𝒮m denote the set of all subsets of {3,,k} of size m. That is, 𝒮m={A{3,,k}:|A|=m}. For every set of players A𝒮m and l[m], let A[l] be the l-th element in A when the elements are ordered by the usual order on {3,,k}.

The game begins at the initial vertex q0, which is owned by Player 2. From q0, Player 2 chooses a subset A𝒮m and moves the token to the vertex A,1. For each subset A𝒮m and l[m], Player A[l] controls the vertex A,l. If lm, then the successors of A,l are the vertices and A,l+1. If l=m, then the successors of A,l are the vertices and cj, for j{3,,k}. For each i{3,,k}, Player i controls the vertex ci. From ci, Player i chooses a subset A𝒮m and moves the token to the vertex A,i,p, where p is a symbol, indicating the game is in the “punishment” layer. For every A𝒮m and i{3,,k}, the vertex A,i,p is owned by Player 2, who chooses an index jA and moves the token to the vertex si,j. For every i,j{3,,k}, the vertex si,j is a sink that satisfies the objectives of all players in [k]{1,i,j}. The vertex is a sink that satisfies the objective of Player 1.

The idea behind 𝒢k,m is as follows. Consider a profile π=f1,,fk. Note that if Player 1 wins in π, then its outcome reaches the vertex , in which case all the other players lose in π. Also, Player 2 wins in π iff its the outcome of π reaches a sink of the form si,j, in which case Player 1 loses. Thus, the objectives of Player 1 and Player 2 complement each other. Assume that Player 1 wins in π, thus its outcome reaches . In order for π to be an SNE, every deviation of players in {2,,k} should result in a profile in which at least one of the players that deviate does not satisfy its objective. Thus, when the coalition of deviators is C{2,,k}, then there should be iC such that the outcome of the new profile either continues to reach or, in case i2, reaches a sink si,j or sj,i for some j{3,,k}. Before we explain why this implies the existence and an SNE-CRS solution, and why strategies for players 3,,k in any SNE-CRS solution require exponential memory, let us define 𝒢k,m formally.

We define Gk,m={Vi}i[k],q0,E,{αi}i[k], as follows.

  1. 1.

    The set of vertices and its partition to owners is as follows.

    • V1={}{si,j:i,j{3,,k}}.

    • V2={q0}{A,i,p:A𝒮m and i{3,,k}}.

    • For every i{3,,k}, we define Vi={ci}{A,l:A𝒮m,l[m], and i=A[l]}.

  2. 2.

    The set E contains edges of the following types:

    • For every A𝒮m, there is an edge from q0 to A,1.

    • For every A𝒮m and l[m1], there is an edge from A,l to and to A,l+1.

    • For every A𝒮m, there is an edge from A,m to and to cj, for all j{3,,k}.

    • For every A𝒮m and i{3,,k}, there is an edge from ci to A,i,p.

    • For every A𝒮m, i{3,,k}, and jA, there is an edge from A,i,p to si,j.

    • The vertices and si,j, for all i,j{3,,k}, are sinks.

  3. 3.

    The objectives of the players are defined as follows.

    • α1={}.

    • For every i{2,,k}, we have αi={sl,j:l,j{3,,k}{i}}.

We first describe an SNE-CRS solution π=f1,,fk in 𝒢k,m. Since Player 1 only owns sinks, her strategy is straightforward. Moreover, the notions of SNE and 1-fixed SNE coincide in 𝒢k,m. For Player 2, the strategy f2 is an arbitrary memoryless strategy. As for i{3,,k}, the strategy fi directs Player i to move the token to from vertices in 𝒮m×[m] that she owns and to move the token to A,i,p from ci. It is easy to see that 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=q0,f2(q0),()ω, and so 𝖶𝗂𝗇(π)={1}. We prove that π is a 1-fixed SNE. Consider a coalition C{2,,k} and a deviation profile {fi}iC. Since Player 2 is losing in π, and win in every outcome that does not reach , a deviation for Player 2 that does not reach is always beneficial, and so, we can assume that 2C. Let π=π[C{fi}iC]. Let A𝒮m be the set of players that Player 2 chooses from q0 in π; thus f2(q0)=A,1. If there is a player iA that in π moves the token from the vertex A,l with A[l]=i to the sink , then the outcome of π reaches . Thus, 𝖶𝗂𝗇(π)={1}, and the deviation of the players in C is not beneficial. If all players in A do not move the token to in their strategies in π (in particular, this means that AC), then let ci be the vertex to which Player A[m] moves the token from A,m in π. Since all the sinks si,l that are reachable from ci are losing for Player i, the deviation is not beneficial for Player i, implying that iC. Hence, Player i follows fi, which directs her to move the token from ci to A,i,p. Since every successor of A,i,p is losing for some player in A, the deviation is not beneficial to all the players in C, and so π is an SNE in which Player 1 wins.

We continue and prove that every SNE-CRS solution in 𝒢n,k requires each of the players 3,,k to have memory (k3m). Assume by contradiction that there exists an SNE profile π=f1,,fk in which Player 1 wins and there exists i{3,,k} such that the strategy fi has a memory structure i with fewer than (k3m) states. Since there are fewer than (k3m) states in i, and there are (k3m) subsets of {3,,k}{i} of size m, there exists a set A𝒮m such that iA and fi never (that is, no matter what the history along which ci has been reached) directs Player i to move the token from ci to A,i,p.

Consider the coalition C=A{2}, and the deviation profile fC={{fj}jC}, where for every jA, the strategy fj agrees with fj, except that from A,l, with A[l]=j, the strategy fj directs Player j to move the token to A,l+1, in case l<m, and to ci, in case l=m. Finally, the strategy f2 agrees with f2, except that from q0, the strategy f2 directs Player 2 to move the token to A,1, and for every A𝒮m such that AA, the strategy f2 directs Player 2 to move the token from A,i,p to si,minAA. Note that since both A and A are of size m, the set AA is not empty, and thus minAA exists and is in {3,,k}. Let π=π[CfC]. By the definition of the strategies in π, there exists A𝒮m such that AA and 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=q0,A,1,A,2,,A,m,ci,A,i,p,(si,minAA)ω. Thus, 𝖶𝗂𝗇(π)=[k]{1,i,minAA}. Since 1,i, and minAA are all not in A, and thus also not in C, it follows that C𝖶𝗂𝗇(π). Hence, the deviation to fC is beneficial to all the players in C, contradicting the assumption that π is an SNE.

6 Memory Requirements for Additional Solution Concepts

Recall that different applications have initiated the study of different solution concepts for settings in which a coalition of players may deviate together. While upper bounds on the concepts of robust equilibria and SSE serve as a basis to our upper bounds here, no lower bounds are known on the memory requirements for CRS solutions with respect to these concepts. In this section we show that the construction in Theorem 10 can be modified to show a lower bound on the memory required to the environment players in CRS solution when the solution concepts are resilient (and hence, robust) equilibria and SSE.

For k4 and m[k2], consider the k-player sink game 𝒢k,m obtained from 𝒢k,m by changing the objectives of the players so that now the sink is winning for all players in [k]{2} (rather than for Player 1 only in 𝒢k,m). Thus, as in 𝒢k,m, the objectives of Player 1 and Player 2 still complement each other, and now, Players 3,,k win also in profiles that reach the vertex . Also, as in 𝒢k,m, all the vertices owned by Player 1 have only one successor, and the notions of 1-fixed equilibrium and (usual) equilibrium coincide.

We start with resilient equilibria, and show that 𝒢k,m has a resilient-CRS solution, and that every resilient-CRS solution requires O(k) environment players to have memory 2Θ(k). Recall that a profile is a resilient-equilibrium if for every subset C𝖤𝗇𝗏, every deviation of the players in C that benefits a player in C also harms a player in C. The proof is similar to the proof of Theorem 10. In particular, the profile π=f1,,fk described in the proof is a resilient-CRS solution in 𝒢k,m. To see this, recall that 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π) reaches , and is thus winning in 𝒢k,m for all the players in [k]{2}. Thus, only Player 2 may benefit from a deviation. When Player 2 chooses a set A𝒮m, all the players in A have to deviate on order for to be avoided, and so, by the definition of resilient equilibrium, all of them have to win when the game reach the punishment layer. This, however, is impossible in deviations from π, as for all i{3,,k}, the only way for Player i to make sure she does not lose is to remember the set A and proceed as in π, from ci to the vertex A,i,p. Moreover, using considerations similar to these in the proof of Theorem 10 (see details in Appendix A.4), if there is i{3,,k} such that Player i has no memory to keep track of the set A chosen by Player 2, then there is no resilient-CRS. Essentially, in such a case Player 2 can convince a coalition A of players to join the deviation by letting the outcome of the new profile reach ci, where a player not in A would be punished.

For the solution concept of SSE, we show that 𝒢k,m has an SSE-CRS solution, and that every SSE-CRS solution requires O(k) environment players to have memory 2Θ(k). Recall that a profile is an SSE if for every coalition C𝖤𝗇𝗏, every deviation that harms a player not in C also harms a player in C. In 𝒢k,m, a deviation that does not reach the vertex causes Player 1 to lose. Thus, a profile is an SSE iff for every coalition C𝖤𝗇𝗏 and every deviation that causes the game to reach the punishment layer, at least one player in the coalition looses. Accordingly, the profile π described in the proof of Theorem 10 is an SSE-CRS solution in 𝒢k,m and in every SSE-CRS solution, every player in {3,,k} has to remember the set A chosen by Player 2 from q0. Indeed (see details in Appendix A.5), otherwise, a coalition of players in A can deviate by letting the outcome of the new profile reach the vertex ci, where a player not in A would be punished.

References

  • [1] S. Almagor, O. Kupferman, and G. Perelli. Synthesis of controllable Nash equilibria in quantitative objective game. In Proc. 27th Int. Joint Conf. on Artificial Intelligence, pages 35–41, 2018.
  • [2] E. Anshelevich, A. Dasgupta, J. Kleinberg, E. Tardos, T. Wexler, and T. Roughgarden. The price of stability for network design with fair cost allocation. In Proc. 45th IEEE Symp. on Foundations of Computer Science, pages 295–304. IEEE Computer Society, 2004. doi:10.1109/FOCS.2004.68.
  • [3] R. Aumann. Acceptable Points in General Cooperative n-Person Games. In Contributions to the Theory of Games, volume 4, 1959.
  • [4] R. Bloem, K. Chatterjee, and B. Jobstmann. Graph games and reactive synthesis. In Handbook of Model Checking., pages 921–962. Springer, 2018. doi:10.1007/978-3-319-10575-8_27.
  • [5] P. Bouyer, N. Fijalkow, M. Randour, and P. Vandenhove. How to play optimally for regular objectives? In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023), volume 261 of Leibniz International Proceedings in Informatics (LIPIcs), pages 118:1–118:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ICALP.2023.118.
  • [6] R. Brenguier. Robust equilibria in mean-payoff games. In Proc. 19th Int. Conf. on Foundations of Software Science and Computation Structures, volume 9634 of Lecture Notes in Computer Science, pages 217–233. Springer, 2016. doi:10.1007/978-3-662-49630-5_13.
  • [7] L. Brice, J-F. Raskin, M. Sassolas, G. Scerri, and M. Bogaard. Pessimism of the will, optimism of the intellect: Fair protocols with malicious but rational agents. In IEEE 38th Computer Security Foundations Symposium, pages 33–47, 2025.
  • [8] T. Brihaye, A. Goeminne, J.C.A. Main, and M. Randour. Reachability games and friends: A journey through the lens of memory and complexity. In 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023), volume 284 of Leibniz International Proceedings in Informatics (LIPIcs), pages 1:1–1:26. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.FSTTCS.2023.1.
  • [9] V. Bruyère, C. Grandmont, and J-F.Raskin. As soon as possible but rationally. In 35th International Conference on Concurrency Theory (CONCUR 2024), volume 311 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CONCUR.2024.14.
  • [10] V. Bruyère, J-F. Raskin, A. Reynouard, and M. Bogaard. The non-cooperative rational synthesis problem for spes and omega-regular objectives. In 36th International Conference on Concurrency Theory (CONCUR 2025), volume 348 of Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:23. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPIcs.CONCUR.2025.12.
  • [11] A. Casares. On the minimisation of transition-based rabin automata and the chromatic memory requirements of muller conditions. In 30th EACSL Annual Conference on Computer Science Logic (CSL 2022), volume 216 of Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.CSL.2022.12.
  • [12] K. Chatterjee, R. Majumdar, and M. Jurdzinski. On Nash equilibria in stochastic games. In Proc. 13th Annual Conf. of the European Association for Computer Science Logic, volume 3210 of Lecture Notes in Computer Science, pages 26–40. Springer, 2004. doi:10.1007/978-3-540-30124-0_6.
  • [13] R. Condurache, E. Filiot, R. Gentilini, and J.-F. Raskin. The complexity of rational synthesis. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 121:1–121:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ICALP.2016.121.
  • [14] S. Dziembowski, M. Jurdzinski, and I. Walukiewicz. How much memory is needed to win infinite games. In Proc. 12th ACM/IEEE Symp. on Logic in Computer Science, pages 99–110, 1997.
  • [15] R. Ehlers. Symbolic bounded synthesis. In Proc. 22nd Int. Conf. on Computer Aided Verification, volume 6174 of Lecture Notes in Computer Science, pages 365–379. Springer, 2010. doi:10.1007/978-3-642-14295-6_33.
  • [16] E.A. Emerson and C.-L. Lei. Modalities for model checking: Branching time logic strikes back. Science of Computer Programming, 8:275–306, 1987. doi:10.1016/0167-6423(87)90036-0.
  • [17] E. Filiot, N. Jin, and J.-F. Raskin. An antichain algorithm for LTL realizability. In Proc. 21st Int. Conf. on Computer Aided Verification, volume 5643, pages 263–277, 2009. doi:10.1007/978-3-642-02658-4_22.
  • [18] D. Fisman, O. Kupferman, and Y. Lustig. Rational synthesis. In Proc. 16th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems, volume 6015 of Lecture Notes in Computer Science, pages 190–204. Springer, 2010. doi:10.1007/978-3-642-12002-2_16.
  • [19] E. Koutsoupias and C. Papadimitriou. Worst-case equilibria. Computer Science Review, 3(2):65–69, 2009. doi:10.1016/J.COSREV.2009.04.003.
  • [20] O. Kupferman, Y. Lustig, M.Y. Vardi, and M. Yannakakis. Temporal synthesis for bounded systems and environments. In Proc. 28th Symp. on Theoretical Aspects of Computer Science, pages 615–626, 2011.
  • [21] O. Kupferman, G. Perelli, and M.Y. Vardi. Synthesis with rational environments. Annals of Mathematics and Artificial Intelligence, 78(1):3–20, 2016. doi:10.1007/S10472-016-9508-8.
  • [22] O. Kupferman and N. Shenwald. Games with trading of control. In 34th International Conference on Concurrency Theory (CONCUR 2023), volume 279 of Leibniz International Proceedings in Informatics (LIPIcs), pages 19:1–19:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.CONCUR.2023.19.
  • [23] O. Kupferman and N. Shenwald. Positional-Player Games. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025), volume 345 of Leibniz International Proceedings in Informatics (LIPIcs), pages 64:1–64:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPIcs.MFCS.2025.64.
  • [24] J. C. A. Main. Arena-independent memory bounds for nash equilibria in reachability games. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024), volume 289 of Leibniz International Proceedings in Informatics (LIPIcs), pages 50:1–50:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.STACS.2024.50.
  • [25] D.A. Martin. Borel determinacy. Annals of Mathematics, 65:363–371, 1975.
  • [26] J.F. Nash. Equilibrium points in n-person games. In Proceedings of the National Academy of Sciences of the United States of America, 1950.
  • [27] C. H. Papadimitriou. Algorithms, games, and the internet. In Proc. 33rd ACM Symp. on Theory of Computing, pages 749–753, 2001.
  • [28] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. 16th ACM Symp. on Principles of Programming Languages, pages 179–190, 1989.
  • [29] S. Schewe and B. Finkbeiner. Bounded synthesis. In 5th Int. Symp. on Automated Technology for Verification and Analysis, volume 4762 of Lecture Notes in Computer Science, pages 474–488. Springer, 2007. doi:10.1007/978-3-540-75596-8_33.
  • [30] W. Thomas. On the synthesis of strategies in infinite games. In Proc. 12th Symp. on Theoretical Aspects of Computer Science, volume 900 of Lecture Notes in Computer Science, pages 1–13. Springer, 1995. doi:10.1007/3-540-59042-0_57.
  • [31] M. Ummels. The complexity of Nash equilibria in infinite multiplayer games. In Proc. 11th Int. Conf. on Foundations of Software Science and Computation Structures, pages 20–34, 2008.
  • [32] J. von Neumann and O. Morgenstern. Theory of games and economic behavior. Princeton University Press, 1953.
  • [33] M. Wooldridge, J. Gutierrez, P. Harrenstein, E. Marchioni, G. Perelli, and A. Toumi. Rational verification: From model checking to equilibrium checking. In Proc. of 30th Conf. on Artificial Intelligence, pages 4184–4190, 2016.

Appendix A Missing Proofs and Examples

A.1 On strategies for Player 1 in a 𝟏-bounded NE-CRS solution

We describe two interesting examples. The first is a game 𝒢5 and a CRS solution f1,f2 such that there is no 1-bounded CRS solution f1,f2 for a memoryless f2. The second is a game 𝒢6 such that every 1-bounded CRS solution f1,f2 requires f1 to have memory of size at least 2.

Consider the Büchi game 𝒢5 in Figure 5 (left). Let α1={v3,v4} and α2={v4}. Consider the following strategy of f1 of Player 1:

  • f1(v0,v2,(v3,v2))=v3.

  • f1(v0,v1,v0,v2,(v4,v2))=v4.

Thus, if the token reaches v2 without visiting v1 before, Player 1 always direct it to v3, where only Player 1 wins, and if the token visits v1 for one time before reaching v2, then Player 1 always direct the token to v4, where both players win. Let f2 be a strategy for Player 2 that proceeds to v1 once and then to v2. It is easy to see that while f1,f2 is a CRS solution, there is no 1-bounded CRS solution f1,f2 for a memoryless f2.

Figure 5: The games 𝒢5 (left) and 𝒢6 (right).

We continue and show that the strategy of Player 1 in a 1-bounded CRS solution may require memory of size 2. Consider the Büchi game 𝒢6 in Figure 5 (right). Let α1={v3} and α2={v4}.

Consider a profile π=f1,f2 for a memoryless strategy f1 for Player 1. We claim that π is not a 1-bounded CRS solution. Indeed, if f1(v0)=v1, then α1 is not satisfied, and if f1(v0)=v2, then Player 2 can deviate to a memoryless strategy f2(v2)=v4, where only α2 is satisfied.

Nevertheless, the game 𝒢6 does have a 1-bounded CRS solution π=f1,f2, for f1 that is not memoryless. To see this, consider a strategy f1 of Player 1 that behaves as follows:

  • f1((v0,v2,v3),v0)=v2.

  • f1(v0,v2,v4,v0)=v1.

Thus, when the game starts and as long as Player 2 moves the token from v2 to v3, Player 1 moves the token from v0 down to v2. If Player 2 moves the token from v2 to v4, then on the next visit of the token in v0, Player 1 moves it to v1, where no player wins. Note that when Player 2 uses a memoryless strategy, then this “next” visit must be the second one, and so the definition of f1 above covers all possible histories.

It is not hard to see that π=f1,f2, for the memoryless strategy f2 with f2(v2)=v3 is a 1-bounded CRS solution

A.2 Missing Details in the proof of Theorem 5

The game graph Gk={Vi}i[k],d2,E and the objectives {αi}i[k] are defined as follows:

  1. 1.

    The vertex sets of the players are defined as follows:

    • Player 1 controls the set V1={dk+1,c1}{s2,,sk}

    • For every i{2,,k}, Player i controls the set Vi={ci,di}.

  2. 2.

    The set E contains edges of the following types:

    • For every i{2,,k}, edges from di to di+1.

    • For every i{2,,k}, there is an edge from di to cj where j[k]{i}.

    • For every i[k], and j{2,,k}, there is an edge from ci to sj.

    • The vertices in {dk+1}{s2,,sk} are sink vertices.

  3. 3.

    The objectives are defined as follows:

    • The objective of Player 1 is α1={dk+1}.

    • For every i{2,,k}, the objective of Player i is αi={sj:j[k]{i}}.

We begin by describing an NE profile π=f1,,fk in which Player 1 wins. For every i{2,,k}, Player i decides to always move from di towards dk+1. If there exists i[k]{1} such that Player i deviated and moved the token from di to cj for some j[k]{i}, then Player j, move the token to si. The outcome of the profile π is d2,d3,,dk,dk+1ω, thus, 𝖶𝗂𝗇(π)={1}. If for some l{2,,k}, Player l deviates and move the token from dl to cj for some j[k]{l}, then Player j moves the token to sl and Player l loses. Meaning that deviation is not beneficial and that π is an NE. Overall π is an NE in which Player 1 wins. Note that the strategy for each players in π can be implemented with a finite memory structure of size k2. Indeed. each player only needs to remember if a deviation has happened, and if so, which player (besides themselves and Player 1) has deviated.

We now show that in every NE in which Player 1 wins, the strategy of every player uses memory at least k2. Assume by contradiction that there exists an NE profile π=f1,,fk where Player 1 wins and there exists i[k] such that the strategy fi has a memory structure i with less than k2 states. Then, by the pigeon hole principle, since there are less than k2 states in i but ci has k1 successors, there exists a successors to ci, which we mark with sj, such that fi never chooses sj as a successor from ci. Since j>2, we have that j𝖫𝗈𝗌𝖾(π) and may deviate. Let fj be a strategy for Player j that agrees with fj, except that from dj, the strategy fj always move the token to ci. Then, since fi never chooses sj as a successor from ci, the profile π=π[ifi] visits a sink sm where m[k]{1,j}. Thus, Player j wins in π and so π is not stable.

A.3 Missing Details in the proof of Theorem 7

We prove that the following three claims are equal:

(𝐂1)

The game 𝒢 has an NE-NRS solution.

(𝐂2)

The game 𝒢 has an SNE-CRS solution.

(𝐂3)

There exists a strategy f1 for Player 1 such that there does not exist a trap in 𝒢 that agrees with f1.

We first prove that (𝐂1) iff (𝐂3). In fact we prove a stronger claim, namely that for every strategy f1 for Player 1, we have that f1 is an NE-NRS solution in 𝒢 iff there does not exist a trap in 𝒢 that agrees with f1.

Consider a strategy f1 for Player 1. Assume first that f1 is an NE-NRS solution in 𝒢, and assume by way of contradiction that there exists a trap p that agrees with f1. Thus, p reaches a sink sα1, and for every i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p), we have that sαi. Consider a profile π=f1,,fk, such that for every i𝖤𝗇𝗏, the strategy fi agrees with p. Since sα1, Player 1 loses in π. Since for every i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p), we have that sαi, then 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(π)𝖶𝗂𝗇(𝒢,π). Thus, no player in 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(π) has an incentive to deviate in 𝒢. Hence, π is a 1-fixed NE in 𝒢 in which Player 1 loses, contradicting the fact f1 is an NE-NRS solution for 𝒢.

For the second direction, assume that there does not exist a trap in 𝒢 that agrees with f1, and assume by way of contradiction that f1 is not an NE-NRS solution for 𝒢. That is, there is a 1-NE profile π=f1,,fk such that 1𝖶𝗂𝗇(𝒢,π). Consider the play p=𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π). Since 1𝖶𝗂𝗇(𝒢,π), the play p reaches a sink sα1. Also, since p is not a trap, there exists i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p) such that sαi, which implies that i𝖶𝗂𝗇(𝒢,π). By the third condition on the structure of special sink games, Player i can deviate and move the token from the first vertex she owns in p to a vertex in α1. Since αiα1, such a deviation causes Player i to win in 𝒢, contradicting the fact π is a 1-fixed NE.

We continue and prove that (𝐂2) iff (𝐂3). Here too, we prove a stronger claim, namely that for every strategy f1 for Player 1, we have that there is an SNE-CRS solution f1,f2,,fk for 𝒢 iff there does not exist a trap in 𝒢 that agrees with f1.

Assume first that f1 is such that there is an SNE-CRS solution π=f1,f2,,fk for 𝒢. Recall that for every i𝖤𝗇𝗏, we have that αi=αiα1. Thus, as 1𝖶𝗂𝗇(𝒢,π), it must be that 𝖤𝗇𝗏𝖫𝗈𝗌𝖾(𝒢,π).

Assume by way of contradiction there exists a trap p that agrees with f1. Let C=𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p). Since C𝖤𝗇𝗏, then C𝖫𝗈𝗌𝖾(𝒢,π). For every iC, let gi be a strategy for Player i that agrees with p. Let gC={gi:iC}, and consider the profile π=π[CgC]. Since f1 agrees with p and for all i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p), the strategy gi agrees with p, we have that 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=p. Since p is a trap, we have that C𝖶𝗂𝗇(𝒢,π). Thus, the deviation strictly benefits all the players in C and so π is not a 1-fixed SNE.

Assume now that f1 is such that there does not exist a trap p that agrees with f1. Consider the profile π=f1,f2,,fk, where f2 is some memoryless strategy, and for every i{3,,k} the strategy fi is a memoryless strategy that moves the token from every vertex to α1. Note that since 𝒢 is a special sink game, such strategies f3,,fk exist. We prove that π is an SNE-CRS solution for 𝒢.

Let p=𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π). First, we show that p reaches α1, which implies that 1𝖶𝗂𝗇(𝒢,π). Clearly, if 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p){3,,k}, then, by the definition of the strategies f3,,fk, the play p reaches α1. Otherwise, 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p){1,2}. Then, if p does not reach α1, then, by the definition of α2=Vα1, we have that p reaches α2, implying that p is a trap that agrees with f1 and contradicting the fact that no such trap exists.

Second, we show that π is a 1-fixed SNE in 𝒢. Assume by way of contradiction that π is not a 1-fixed SNE in 𝒢. Then, there exists a coalition C𝖫𝗈𝗌𝖾(𝒢,π) and a strategy profile gC such that for the profile π=π[CgC], we have that C𝖶𝗂𝗇(𝒢,π). By the definition of α2,,αk, the latter implies that 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π) reaches a sink sα1. Let p=𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π). By the definition of the strategies fi, it must be that 𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p){3,,k}C. Indeed, otherwise p would have reached α1. Thus, as C𝖶𝗂𝗇(𝒢,π), it follows that for every i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(p){3,,k}, we have that sαi. In addition, as α2=V, we also have that sα2. Thus, the play p reaches a sink sα1, and for every i𝗋𝖾𝗅𝖾𝗏𝖺𝗇𝗍(π), we have that sαi. Thus, p is a trap that agrees with f1, contradicting the assumption that no such trap exists.

A.4 Resilient-CRS memory lower bound

We first show that π=f1,,fk is a resilient-CRS. Consider a coalition C{2,,k} and a deviation profile {fi}iC. Let π=π[C{fi}iC]. Let A𝒮m be the set of players that Player 2 choose from q0 in π; thus f2(q0)=A,1. If there is a player iA that in π moves the token from the vertex A,l with A[l]=i to the sink , then the outcome of π reaches . Thus, 𝖶𝗂𝗇(π)=[k]{2}, and the deviation of the players in C is not strictly beneficial to a player in the coalition. If all players in A do not move the token to in their strategies in π (in particular, this means that AC), then let ci be the vertex to which Player A[m] moves the token from A,m in π. Since all the sinks si,l that are reachable from ci are losing for Player i, the deviation is harmful for Player i, implying that iC. Hence, Player j follows fi, which directs her to move the token from ci to A,i,p. Since every successor of A,i,p is losing for some player in A, the deviation harms a player in C, and so π is resilient in which Player 1 wins.

We continue and prove that every resilient-CRS solution in 𝒢k,m requires each of the players 3,,k to have memory (k3m). Assume by contradiction that there exists a resilient profile π=f1,,fk in which Player 1 wins and there exists i{3,,k} such that the strategy fi has a memory structure i with fewer than (k3m) states. Since there are fewer than (k3m) states in i, and there are (k3m) subsets of {3,,k}{i} of size m, there exists a set A𝒮m such that iA and fi never (that is, no matter what the history along which ci has been reached) directs Player i to move the token from ci to A,i,p.

Consider the coalition C=A{2}, and the deviation profile fC={{fj}jC}, where for every jA, the strategy fj agrees with fj, except that from A,l, with A[l]=j, the strategy fj directs Player j to move the token to A,l+1, in case l<m, and to ci, in case l=m. Finally, the strategy f2 agrees with f2, except that from q0, the strategy f2 directs Player 2 to move the token to A,1, and for every A𝒮m such that AA, the strategy f2 directs Player 2 to move the token from A,i,p to si,minAA. Note that since both A and A are of size m, the set AA is not empty, and thus minAA exists and is in {3,,k}. Let π=π[CfC]. By the definition of the strategies in π, there exists A𝒮m such that AA and 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=q0,A,1,A,2,,A,m,ci,A,i,p,(si,minAA)ω. Thus, 𝖶𝗂𝗇(π)=[k]{1,i,minAA}. Since 1,i, and minAA are all not in A, and thus also not in C, it follows that C𝖶𝗂𝗇(π).

Hence, the deviation to fC is strictly beneficial for Player 2, and does not harm any member of A, contradicting the assumption that π is resilient.

A.5 SSE-CRS memory lower bound

We first show that π=f1,,fk is an SSE-CRS. Consider a coalition C{2,,k} and a deviation profile {fi}iC. Let π=π[C{fi}iC]. Let A𝒮m be the set of players that Player 2 choose from q0 in π; thus f2(q0)=A,1. If there is a player iA that in π moves the token from the vertex A,l with A[l]=i to the sink , then the outcome of π reaches . Thus, 𝖶𝗂𝗇(π)=[k]{2}, and the deviation of the players in C does not harm a player outside C. If all players in A do not move the token to in their strategies in π (in particular, this means that AC), then let ci be the vertex to which Player A[m] moves the token from A,m in π. Since all the sinks si,l that are reachable from ci are losing for Player i, the deviation harms Player i, implying that iC. Hence, Player j follows fi, which directs her to move the token from ci to A,i,p. Since every successor of A,i,p is losing for some player in A, the deviation harms a player in C, and so π is SSE in which Player 1 wins.

We continue and prove that every SSE-CRS solution in 𝒢n,k requires each of the players 3,,k to have memory (k3m). Assume by contradiction that there exists a SSE profile π=f1,,fk in which Player 1 wins and there exists i{3,,k} such that the strategy fi has a memory structure i with fewer than (k3m) states. Since there are fewer than (k3m) states in i, and there are (k3m) subsets of {3,,k}{i} of size m, there exists a set A𝒮m such that iA and fi never (that is, no matter what the history along which ci has been reached) directs Player i to move the token from ci to A,i,p.

Consider the coalition C=A{2}, and the deviation profile fC={{fj}jC}, where for every jA, the strategy fj agrees with fj, except that from A,l, with A[l]=j, the strategy fj directs Player j to move the token to A,l+1, in case l<m, and to ci, in case l=m. Finally, the strategy f2 agrees with f2, except that from q0, the strategy f2 directs Player 2 to move the token to A,1, and for every A𝒮m such that AA, the strategy f2 directs Player 2 to move the token from A,i,p to si,minAA. Note that since both A and A are of size m, the set AA is not empty, and thus minAA exists and is in {3,,k}. Let π=π[CfC]. By the definition of the strategies in π, there exists A𝒮m such that AA and 𝗈𝗎𝗍𝖼𝗈𝗆𝖾(π)=q0,A,1,A,2,,A,m,ci,A,i,p,(si,minAA)ω. Thus, 𝖶𝗂𝗇(π)=[k]{1,i,minAA}. Since 1,i, and minAA are all not in A, and thus also not in C, it follows that C𝖶𝗂𝗇(π).

Hence, the deviation to fC does not harm any member of C while harming Player 1, contradicting the assumption that π is SSE.