Amplifiers for the Moran Process

The Moran process, as studied by Lieberman, Hauert, and Nowak, is a randomised algorithm modelling the spread of genetic mutations in populations. The algorithm runs on an underlying graph where individuals correspond to vertices. Initially, one vertex (chosen u.a.r.) possesses a mutation, with fitness r > 1. All other individuals have fitness 1. During each step of the algorithm, an individual is chosen with probability proportional to its fitness, and its state (mutant or non-mutant) is passed on to an out-neighbour which is chosen u.a.r. If the underlying graph is strongly connected then the algorithm will eventually reach fixation, in which all individuals are mutants, or extinction, in which no individuals are mutants. An infinite family of directed graphs is said to be strongly amplifying if, for every r > 1, the extinction probability tends to 0 as the number of vertices increases. Lieberman et al. proposed two potentially strongly amplifying families - superstars and metafunnels. Heuristic arguments have been published, arguing that there are infinite families of superstars that are strongly amplifying. The same has been claimed for metafunnels. In this paper, we give the first rigorous proof that there is an infinite family of directed graphs that is strongly amplifying. We call the graphs in the family "megastars". When the algorithm is run on an n-vertex graph in this family, starting with a uniformly-chosen mutant, the extinction probability is roughly $n^{-1/2}$ (up to logarithmic factors). We prove that all infinite families of superstars and metafunnels have larger extinction probabilities (as a function of n). Finally, we prove that our analysis of megastars is fairly tight - there is no infinite family of megastars such that the Moran algorithm gives a smaller extinction probability (up to logarithmic factors).


INTRODUCTION
This article is about a randomised algorithm called the Moran process. This algorithm was introduced in biology [Moran 1958; Lieberman et al. 2005] to model the spread of genetic mutations in populations. Similar algorithms have been used to model the spread of epidemic diseases, the behaviour of voters, the spread of ideas in social networks, strategic interaction in evolutionary game theory, the emergence of monopolies, and cascading failures in power grids and transport networks [Asavathiratham et al. 2001; Berge 2001; Gintis 2000; Kempe et al. 2003; Liggett 1999].
There has been past work about analysing the expected convergence time of the algorithm [Díaz et al. 2014, 2016]. In fact, the fast-convergence result of Díaz et al. [2014] implies that when the algorithm is run on an undirected graph, and the "fitness" of the initial mutation is some constant r > 1, there is a Fully Polynomial Randomised Approximation Scheme (FPRAS) for the "fixation probability," which is the probability that a randomly introduced initial mutation spreads throughout the whole graph.
This article answers an even more basic question, originally raised in Lieberman et al. [2005], about the long-term behaviour of the algorithm when it is run on directed graphs. In particular, the question is whether there even exists an infinite family of (directed) graphs such that, when the algorithm is run on an n-vertex graph in this family, the fixation probability is 1 − o(1), as a function of n. A heuristic argument that this is the case was given in Lieberman et al. [2005], but a counterexample to the argument (and to the hypothesized bound on the fixation probability) was given in Díaz et al. [2013]. A further heuristic argument (with a revised bound) was given in Jamieson-Lane and Hauert [2015]. Here, we give the first rigorous proof that there is indeed a family of "amplifiers" with fixation probability 1 − o(1). Before describing this, and the other results of this article, we describe the model.
The Moran algorithm has a parameter r which is the fitness of "mutants." All nonmutants have fitness 1. The algorithm runs on a directed graph. In the initial state, one vertex is chosen uniformly at random to become a mutant. After this, the algorithm runs in discrete steps as follows. At each step, a vertex is selected at random, with probability proportional to its fitness. Suppose that this is vertex v. Next, an out-neighbour w of v is selected uniformly at random. Finally, the state of vertex v (mutant or nonmutant) is copied to vertex w.
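To make the dynamics concrete, here is a minimal simulation sketch of a single run; the function name moran_fixation and the adjacency-list representation are illustrative choices, not part of the article's formal development.

```python
import random

def moran_fixation(out_neighbours, r, start=None, rng=random):
    """One run of the discrete-time Moran process; True iff it fixates.

    out_neighbours: dict mapping each vertex to a list of its out-neighbours
    (the graph should be strongly connected so that the run absorbs).
    """
    vertices = list(out_neighbours)
    n = len(vertices)
    mutants = {start if start is not None else rng.choice(vertices)}
    while 0 < len(mutants) < n:
        weights = [r if v in mutants else 1.0 for v in vertices]
        v = rng.choices(vertices, weights=weights)[0]  # reproduce prop. to fitness
        w = rng.choice(out_neighbours[v])              # uniform out-neighbour
        if v in mutants:
            mutants.add(w)
        else:
            mutants.discard(w)
    return len(mutants) == n
```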
If the graph is finite and strongly connected, then with probability 1, the process will either reach the state where there are only mutants (known as fixation) or it will reach the state where there are only nonmutants (extinction). In this article, we are interested in the probability that fixation is reached, as a function of the mutant fitness r, given the topology of the underlying graph. If r < 1, then the single initial mutant has lower fitness than the nonmutants that occupy every other vertex in the initial configuration, so the mutation is overwhelmingly likely to go extinct. If r = 1, an easy symmetry argument shows that the fixation probability is 1/n in any strongly connected graph on n vertices [Díaz et al. 2014, Lemma 1]. Because of this, we restrict attention to the case r > 1. Perhaps surprisingly, a single advantageous mutant can have a very high probability of reaching fixation, despite being heavily outnumbered in the initial configuration.
A directed graph is said to be regular if there is some positive integer d so that the in-degree and out-degree of every vertex is d. In a strongly connected regular graph on n vertices, the fixation probability of a mutant with fitness r > 1 when the Moran algorithm is run is given by

$$\rho_{\mathrm{reg}}(r,n) = \frac{1 - r^{-1}}{1 - r^{-n}}, \qquad (1)$$

so the extinction probability of such a mutant is given by

$$1 - \rho_{\mathrm{reg}}(r,n) = \frac{r^{-1} - r^{-n}}{1 - r^{-n}}. \qquad (2)$$

Thus, in the limit, as n tends to ∞, the extinction probability tends to 1/r. To see why Equations (1) and (2) hold, note that, for every configuration of mutants, the number of edges from mutants to nonmutants is the same as the number of edges from nonmutants to mutants. Suppose that the sum of the individuals' fitnesses is W and consider an edge (u, v). If u is a mutant in the current state, it is selected to reproduce with probability r/W, and if this happens, the offspring is placed at v with probability 1/d. Similarly, if u is not a mutant, reproduction happens along (u, v) with probability 1/(dW). So, in any state, the number of mutants is r times as likely to increase at the next step of the process as it is to decrease. If we observe the number of mutants every time it changes, the resulting stochastic process is a random walk on the integers that starts at 1, absorbs at 0 and n, increases with probability r/(r+1), and decreases with probability 1/(r+1). It is well known that this walk absorbs at n with probability (1) and at 0 with probability (2). In particular, the undirected n-vertex complete graph is regular. Thus, by Equation (2), its extinction probability tends to 1/r.
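For completeness, the standard gambler's-ruin calculation behind Equations (1) and (2) can be spelled out as follows (a routine derivation, not reproduced verbatim from the article):

```latex
% With up-probability p = r/(r+1) and down-probability q = 1/(r+1),
% the ratio q/p equals 1/r, so the biased walk started at 1 absorbs at n
% with probability
\[
  \frac{1-(q/p)^{1}}{1-(q/p)^{n}}
  \;=\; \frac{1-r^{-1}}{1-r^{-n}}
  \;=\; \rho_{\mathrm{reg}}(r,n),
\]
% and absorbs at 0 with the complementary probability
\[
  1-\rho_{\mathrm{reg}}(r,n)
  \;=\; \frac{r^{-1}-r^{-n}}{1-r^{-n}}
  \;\xrightarrow{\;n\to\infty\;}\; \frac{1}{r}.
\]
```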
When the Moran process is run on nonregular graphs, the extinction probability may be quite a bit lower than 1/r. Consider the undirected (n + 1)-vertex "star" graph, which consists of a single centre vertex that is connected by edges to each of n leaves. In the limit as n → ∞, the n-leaf star has extinction probability 1/r^2 [Lieberman et al. 2005; Broom and Rychtár 2008]. Informally, the reason that the extinction probability is so small is that the initial mutant is likely to be placed in a leaf and, at each step, a mutation at a leaf is relatively unlikely to be overwritten. Lieberman et al. [2005] refer to graphs which have a smaller extinction probability than Equation (2) (and therefore have a larger fixation probability than Equation (1)) as amplifiers. The terminology comes from the fact that the selective advantage of the mutant is being "amplified" in such graphs.
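A quick Monte Carlo experiment, using moran_fixation from the sketch above, illustrates the gap between the complete graph and the star (the sizes and run counts below are arbitrary, and the stated values are only approached in the limit as n grows):

```python
import random

def star(num_leaves):
    """Undirected star as a digraph: centre 0, edges in both directions."""
    g = {0: list(range(1, num_leaves + 1))}
    for leaf in range(1, num_leaves + 1):
        g[leaf] = [0]
    return g

def extinction_estimate(graph, r, runs=500, rng=random):
    return sum(not moran_fixation(graph, r, rng=rng) for _ in range(runs)) / runs

r = 2.0
complete = {v: [w for w in range(30) if w != v] for v in range(30)}
print(extinction_estimate(complete, r))   # roughly 1/r   = 0.5
print(extinction_estimate(star(100), r))  # roughly 1/r^2 = 0.25 (slow to converge)
```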
The purpose of this article is to explore the long-term behaviour of the Moran process by quantifying how good amplifiers can be. For this, it helps to have some more formal definitions.
Definition 1.1. Consider a function ζ(r, n) : R_{>1} × Z_{≥1} → R_{≥0}. An infinite family ϒ of directed graphs is said to be up-to-ζ fixating if, for every r > 1, there is an n_0 (depending on r) so that, for every graph G ∈ ϒ with n ≥ n_0 vertices, the following is true: When the Moran process is run on G, starting from a uniformly random initial mutant, the extinction probability is at most ζ(r, n).
The infinite family of graphs containing all undirected stars (which can be viewed as directed graphs with edges in both directions) is up-to-ζ fixating for a function ζ(r, n) satisfying lim_{n→∞} ζ(r, n) = 1/r^2, so this family of graphs is amplifying. Lieberman et al. [2005] were interested in infinite families of digraphs for which the extinction probability tends to 0, prompting the following definition.

Definition 1.3. An infinite family of directed graphs is strongly amplifying if it is up-to-ζ fixating for a function ζ(r, n) which, for every r > 1, satisfies lim_{n→∞} ζ(r, n) = 0.
Note that the infinite family of undirected stars is not strongly amplifying since the extinction probability of stars tends to 1/r^2 rather than to 0.
Prior to this article, there was no (rigorous) proof that a strongly amplifying family of digraphs exists (though there were heuristic arguments, as we explain later). Proving rigorously that there is an infinite family of directed graphs that is strongly amplifying for the Moran algorithm is one of our main contributions. Lieberman et al. [2005] produced good intuition about strong amplification and defined two infinite families of graphs-superstars and metafunnels-from which it turns out that strongly amplifying families can be constructed. It is extremely difficult to analyse the Moran process on these families, due mostly to the complexity of the graphs, and the difficulty of dealing with issues of dependence and concentration. Thus, all previous arguments have been heuristic. For completeness, we discuss these heuristic arguments in Section 1.4.
In this article, we define a new family of digraphs called megastars. The definition of megastars is heavily influenced by the superstars of Lieberman et al. Our main theorem is the following. THEOREM 1.4. There exists an infinite family of megastars that is strongly amplifying.
Megastars are not easier to analyse than superstars or metafunnels. The reason for our focus on this class of graphs is that it turns out to be provably better at amplifying than any of the previously proposed families. We will present several theorems along these lines. Before doing so, we define the classes of graphs.

Metafunnels, Superstars, and Megastars
1.1.1. Metafunnels. We start by defining the metafunnels of Lieberman et al. [2005]. Let k, ℓ, and m be positive integers. The (k, ℓ, m)-metafunnel is the directed graph G_{k,ℓ,m} defined as follows (see Figure 1).
The vertex set V(G_{k,ℓ,m}) is the union of k + 1 disjoint sets V_0, . . . , V_k. The set V_0 contains the single vertex v*, which is called the centre vertex. For i ∈ [k], V_i is the union of ℓ disjoint sets V_{i,1}, . . . , V_{i,ℓ}, each of which has size m^i. The edge set of G_{k,ℓ,m} consists of the following edges:

-an edge from v* to every vertex in V_{k,1} ∪ · · · ∪ V_{k,ℓ},
-for each i ∈ {2, . . . , k} and each j ∈ [ℓ], an edge from every vertex in V_{i,j} to every vertex in V_{i−1,j},
-an edge from every vertex in V_{1,1} ∪ · · · ∪ V_{1,ℓ} to v*.

Fig. 1. The metafunnel G_{3,4,2}. All edges are directed downwards in the diagram and the centre vertex v* is shown twice, once at the top and once at the bottom of the diagram. There are ℓ = 4 copies of the basic unit, each of which consists of k = 3 levels V_{1,j}, V_{2,j}, and V_{3,j}, with |V_{i,j}| = m^i = 2^i.

1.1.2. Superstars. Next, we recall the superstars of Lieberman et al. [2005]. Let k, ℓ, and m be positive integers. The (k, ℓ, m)-superstar is the directed graph S_{k,ℓ,m} defined as follows (see Figure 2). The vertex set of S_{k,ℓ,m} is the disjoint union of sets R_1, . . . , R_ℓ of size m, called reservoirs; path vertices v_{i,1}, . . . , v_{i,k} for each i ∈ [ℓ]; and a single centre vertex v*. The edge set of S_{k,ℓ,m} consists of an edge from v* to every vertex in R_1 ∪ · · · ∪ R_ℓ; for each i ∈ [ℓ], an edge from every vertex in R_i to v_{i,1}; for each i ∈ [ℓ] and each j ∈ [k − 1], an edge from v_{i,j} to v_{i,j+1}; and, for each i ∈ [ℓ], an edge from v_{i,k} to v*.

Fig. 2. The superstar S_{4,3,5}, with ℓ = 3 reservoirs R_1, R_2, and R_3, each of size m = 5, connected by a path with k = 4 vertices to v*. The centre vertex v* is shown twice, at both the top and bottom of the diagram.
1.1.3. Megastars. Finally, we define the new class of megastars, which turn out to be provably better amplifiers than either metafunnels or superstars. The intuition behind the design of this class of graphs is that the path v_{i,1} v_{i,2} . . . v_{i,k} linking the i'th reservoir R_i of a superstar to the centre vertex v* is good for amplifying but that a clique is even better.
Let k, ℓ, and m be positive integers. The (k, ℓ, m)-megastar is the directed graph M_{k,ℓ,m} defined as follows (see Figure 3). The vertex set V(M_{k,ℓ,m}) of M_{k,ℓ,m} is the disjoint union of sets R_1, . . . , R_ℓ of size m, called reservoirs; sets K_1, . . . , K_ℓ of size k, called cliques; "feeder vertices" a_1, . . . , a_ℓ; and a single centre vertex v*. The edge set of M_{k,ℓ,m} consists of the following edges:

-an edge from v* to every vertex in R_1 ∪ · · · ∪ R_ℓ,
-for each i ∈ [ℓ], an edge from each vertex in R_i to a_i,
-for each i ∈ [ℓ], an edge from a_i to each vertex in K_i,
-for each i ∈ [ℓ], edges in both directions between every pair of distinct vertices in K_i,
-an edge from every vertex in K_1 ∪ · · · ∪ K_ℓ to v*.

Fig. 3. The megastar M_{3,2,4}, with ℓ = 2 reservoirs R_1 and R_2, each of size m = 4. Each reservoir R_i is attached, via the feeder vertex a_i, to a clique of size k = 3. The centre vertex v* is shown twice, once at the top and once at the bottom of the diagram. The edges within the cliques K_1 and K_2 are bidirectional.
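As a sketch of the definition (the vertex labels and the function name megastar are illustrative), the out-neighbour lists of M_{k,ℓ,m} can be built as follows:

```python
def megastar(k, ell, m):
    """Out-neighbour lists of the (k, ell, m)-megastar M_{k,ell,m}."""
    out = {'v*': []}
    for i in range(1, ell + 1):
        reservoir = [('R', i, j) for j in range(m)]
        clique = [('K', i, j) for j in range(k)]
        out['v*'] += reservoir                    # v* -> every reservoir vertex
        for u in reservoir:
            out[u] = [('a', i)]                   # each reservoir vertex -> a_i
        out[('a', i)] = list(clique)              # a_i -> every vertex of K_i
        for u in clique:                          # edges within K_i (both ways),
            out[u] = [w for w in clique if w != u] + ['v*']  # plus K_i -> v*
    return out

# The (3, 2, 4)-megastar of Figure 3 has 2*(4 + 3 + 1) + 1 = 17 vertices.
assert len(megastar(3, 2, 4)) == 17
```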

Our Results
Our main result is that there is an infinite family of megastars that is strongly amplifying, so we start by defining this family. Although megastars are parameterised by three parameters, k, ℓ, and m, the megastars in the family ϒ_M that we consider have a single parameter ℓ, so we define k and m to be functions of ℓ. Our main result can then be stated as follows. THEOREM 1.6. Let ζ_M(r, n) = (log n)^{23} n^{−1/2}. The family ϒ_M is up-to-ζ_M fixating.
COROLLARY 1.7. The family ϒ M is strongly amplifying.
The proof of Theorem 1.6 requires a complicated analysis, accounting for dependencies and concentration. The theorem, as stated here, follows directly from Theorem 6.1, which is proved in Section 6.
The reason that we studied megastars rather than the previously introduced superstars and metafunnels is that megastars turn out to be provably better amplifiers than any of the previously proposed families. To demonstrate this, we prove the following theorem about superstars. THEOREM 1.8. Let ζ(r, n) be any function such that, for any r > 1, lim_{n→∞} ζ(r, n)(n log n)^{1/3} = 0.
Then there is no infinite family of superstars that is up-to-ζ fixating.
The function ζ_M from Theorem 1.6 certainly satisfies lim_{n→∞} ζ_M(r, n)(n log n)^{1/3} = 0, so Theorem 1.8 shows that there is no infinite family of superstars that is up-to-ζ_M fixating. More mundanely, it shows, for example, that if ζ(r, n) = n^{−1/3}(log n)^{−1}, then no infinite family of superstars is up-to-ζ fixating. Theorem 1.8 is a direct consequence of Theorem 4.1, which is proved in Section 4. It turns out that analysing superstars is a little bit easier than analysing megastars or metafunnels, so this is the first proof that we present.
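To spell out the first claim, a two-line computation suffices:

```latex
\[
  \zeta_M(r,n)\,(n \log n)^{1/3}
  = (\log n)^{23}\, n^{-1/2}\cdot n^{1/3}(\log n)^{1/3}
  = (\log n)^{70/3}\, n^{-1/6}
  \;\longrightarrow\; 0
  \quad\text{as } n \to \infty .
\]
```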
Taken together, Theorems 1.6 and 1.8 show that superstars are worse amplifiers than megastars. We next show that metafunnels are substantially worse. We start with the following simple-to-state theorem. THEOREM 1.9. Fix any δ > 0 and let ζ(r, n) = n^{−δ}. Then there is no infinite family of metafunnels that is up-to-ζ fixating.
In fact, Theorem 1.9 can be strengthened by an exponential amount. THEOREM 1.10. Fix any ε < 1/2 and let ζ(r, n) = n^{−1/(log n)^ε}. Then there is no infinite family of metafunnels that is up-to-ζ fixating. Theorems 1.9 and 1.10 are a direct consequence of Theorem 5.1, which is proved in Section 5. In fact, Theorem 5.1 provides even tighter bounds, though these are more difficult to state.
The theorems that we have already described (Theorem 1.6, Theorem 1.8, and Theorem 1.10) are the main contributions of the article. Together, they show that there is a family of megastars that is strongly amplifying, and that there are no families of superstars or metafunnels that amplify as well. For completeness, we present a theorem showing that the analysis of Theorem 1.6 is fairly tight, in the sense that there are no infinite families of megastars that amplify substantially better than ϒ_M; in particular, our bound on the extinction probability can only be improved by factors of log n. It cannot be improved more substantially. THEOREM 1.11. Let ζ(r, n) = n^{−1/2}/(52r^2). There is no infinite family of megastars that is up-to-ζ fixating.
Theorem 1.11 follows from Theorem 7.3, which is straightforward, and is proved in Section 7. We conclude the article with a digression which perhaps clarifies the literature. It is stated, and seems to be commonly believed, that an evolutionary graph (a weighted version of the Moran process - see Section 8 for details) is "isothermal" if and only if the fixation probability of a mutant placed uniformly at random is ρ_reg(r, n). This belief seems to have come from an informal statement of the "isothermal theorem" in the main body of Lieberman et al. [2005] (the formal statement in the supplementary material of Lieberman et al. [2005] is correct, however) and it has spread, for example, as Theorem 1 of Shakarian et al. [2012]. In the final section of our article, we clear this up by proving the following proposition, which says that there is a counterexample. PROPOSITION 1.12. There is an evolutionary graph that is not isothermal, but has fixation probability ρ_reg(r, n).
The definitions needed to prove Proposition 1.12 are deferred to Section 8.

Proof Techniques
As we have seen, it is easy to study the Moran process on a d-regular graph by considering the transition matrix of the corresponding Markov chain (which looks like a one-dimensional random walk). Highly symmetric graphs such as undirected stars can also be handled in a straightforward manner, by directly analysing the transition matrix. Superstars, metafunnels, and megastars are more complicated, and the number of mutant configurations is exponential, so instead we resort to dividing the process into phases, as is typical in the study of randomised algorithms and stochastic processes.
An essential and common trick in the area of stochastic processes (e.g., in work on the voter model) is moving to continuous time. Instead of directly studying the discrete-time Moran process, one could consider the following natural continuous-time model which was studied in Díaz et al. [2016]: Given a set of mutants at time t, each vertex waits an amount of time before reproducing. For each vertex, the period of time is chosen according to the exponential distribution with parameter equal to the vertex's fitness, independently of the other vertices. If the first vertex to reproduce is v at time t + τ, then, as in the standard, discrete-time version of the process, one of its out-neighbours w is chosen uniformly at random, the individual at w is replaced by a copy of the one at v, and the time at which w will next reproduce is exponentially distributed with parameter given by its new fitness. The discrete-time process is recovered by taking the sequence of configurations each time a vertex reproduces. Thus, the fixation probability of the discrete-time process is exactly the same as the fixation probability of the continuous-time process. So moving to the continuous-time model causes no harm. As Díaz et al. [2016] explain, analysis can be easier in the continuous-time model because certain natural stochastic domination techniques apply in the continuous-time setting but not in the discrete-time setting.
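One reproduction event of this continuous-time model can be sketched as follows (the function name is illustrative; resampling every waiting time at each event is justified by the memorylessness of the exponential distribution):

```python
import random

def continuous_moran_step(out_neighbours, mutants, r, rng=random):
    """One reproduction event of the continuous-time Moran process."""
    # Each vertex waits an exponential time with parameter equal to its
    # fitness; by memorylessness we may resample all waits at every event.
    waits = {v: rng.expovariate(r if v in mutants else 1.0)
             for v in out_neighbours}
    v = min(waits, key=waits.get)        # the first vertex to reproduce
    w = rng.choice(out_neighbours[v])    # uniformly random out-neighbour
    new_mutants = mutants | {w} if v in mutants else mutants - {w}
    return waits[v], new_mutants
```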
It turns out that moving to the model of Díaz et al. [2016] does not suffice for our purposes. A major problem in our proofs is dealing with dependencies. In order to make this feasible, we instead study a continuous-time model (see "the clock process" in Section 3.1) in which every edge of the underlying graph G is equipped with two Poisson processes, one of which is called a mutant clock and the other of which is called a nonmutant clock. The clock process is a stochastic process in which all of these clocks run independently. The continuous-time Moran process (Definition 3.2) can be recovered as a function of the times at which these clocks trigger.
Having all of these clocks available still does not give us the flexibility that we need. We say that a vertex u "spawns a mutant" in the Moran process if, at some point in time, u is a mutant, and it is selected for reproduction. We wish to be able to discuss events such as the event that the vertex u does not spawn a mutant until it has already been a mutant for some particular amount of time. In order to express such events in a clean way, making all conditioning explicit, we define additional stochastic processes called "star-clocks" (see Section 3.3). All of the star-clocks run independently in the star-clock process.
In Section 3.4 we provide a coupling of the star-clock process with the Moran process. The coupling is valid in the sense that the two projections are correct-the projection onto the Moran process runs according to the correct distribution and so does the projection onto the star-clock process. The point of the coupling is that the different star-clocks can be viewed as having their own "local" times. In particular, there is a star-clock M * (u,v) which controls reproductions from vertex u onto vertex v during the time that u is a mutant. The coupling enables us to focus on relevant parts of the stochastic process, making all conditioning explicit.
The processes that we have described so far are all that we need to derive our upper bound on the fixation probability of superstars (Section 4). This is the easiest of our main results.
Analysing the Moran process on metafunnels is more difficult. By design, the initial mutant x_0 is likely to be placed in the "top of a funnel" (in the set V_k). In the analysis, it is useful to be able to create independence by considering a "strain" of mutants which contains all of the descendants of a particular mutant spawned by x_0. Like the Moran process itself, a strain can be viewed as a stochastic process depending on the triggering of the clocks. In order to facilitate the proof, we define a general notion of "mutant process" (Section 3.2), so the Moran process is one example of a mutant process, and a strain is another. The analysis of the Moran process on metafunnels involves both of these and also a third mutant process which is essentially the bottom level of a strain (called its head). Strains and heads-of-strains share some common properties, and they are analysed together as "colonies" in Section 5.4.1. The analysis of the metafunnel is technically the most difficult of our results.
Fortunately, the analysis of the megastar in Section 6 does not require three different types of mutant processes-it only requires one. The process that is considered is not the Moran process itself. Instead, it is a modification of the Moran process called the megastar process. The megastar process is similar to the Moran process except that the feeder vertices are forced to be nonmutants, except when their corresponding cliques are completely full or completely empty. It is easy to show (see the proof of Theorem 6.1) that the fixation probability of the Moran process is at least as high as the fixation probability of the megastar process. However, the megastar process is somewhat easier to analyse because the cliques evolve somewhat independently. The proof of the key lemma (Lemma 6.3) is fairly long but it is not conceptually difficult. The point is to prove that, with high probability, the cliques fill up and cause fixation.

Comparison with Previous Work
The Moran process is similar to a discrete version of directed percolation known as the contact process. There is a vast literature (e.g., Liggett [1999], Durrett [2010], Shah [2009], and Durrett and Steif [1993]) on the contact process and other related infection processes such as the voter model and Susceptible-Infected-Susceptible (SIS) epidemic models. Often, the questions that are studied in these models are different from the question that we study here. For example, in voter systems [Durrett and Steif 1993] the two states (mutant/nonmutant) are often symmetric (similar to our r = 1 case) and the models are often studied on infinite graphs where the question is whether the process absorbs or not (both kinds of absorption, fixation and extinction, are therefore called "fixation" in some of this work). The particular details of the Moran process are very important for us because the details of the algorithm determine the long-term behaviour. For example, unlike the Moran process, in the contact process [Bezuidenhout and Grimmett 1990], the rate at which a node becomes a nonmutant is typically taken to be 1, whereas the rate at which a node becomes a mutant is proportional to the number of mutant neighbours. In the discrete-time versions of many commonly studied models, a node is chosen randomly at each step for replacement, rather than (as in the Moran process) for reproduction. In any case, the important point for us is that the details of the algorithm are important-results do not carry over from one algorithm to the other. Therefore, we concentrate in this section on previous work about calculating the fixation probability of the Moran process itself.

Lieberman et al. [2005] studied the fixation probability of the Moran process and introduced superstars and metafunnels. Intuitively, a superstar is a good amplifier because (as long as m is sufficiently large) the initial mutation is likely to be placed in a reservoir and (as long as ℓ is sufficiently large) this is unlikely to be killed quickly by the centre vertex. Moreover, the paths of a superstar are good for amplifying the selective advantage of mutants because, after the infection spreads from a reservoir vertex to the beginning of a path, it is likely to "pick up momentum" as it travels down the path, arriving at the centre vertex as a chain of Θ(k) mutants (which, taken together, are more likely to cause the centre to spread the infection than a single mutant arriving at the centre would be). As we have seen (Theorems 1.6 and 1.8) megastars are provably better for amplification than superstars. The reason for this is that a clique is substantially better than a path at doing this "amplification." Nevertheless, the amplifying properties of superstars strongly influenced our decision to study megastars. Lieberman et al. [2005, Equation (2)] claimed that for sufficiently large n, the fixation probability of a superstar with parameter k tends to 1 − r^{−(k+2)}, and that "similar results hold for the funnel and metafunnel." They provided a heuristic sketch proof for the superstar, but not for the funnel or metafunnel. Hauert [2008, Equation (5)] claims specifically that the fixation probability of funnels tends to 1 − r^{−(k+1)}. As far as we know, no heuristic arguments have been given for funnels or metafunnels.
In any event, Díaz et al. [2013] showed that the 1 − r^{−(k+2)} claim for superstars is incorrect for the case k = 3. In particular, for this case they showed that the fixation probability is at most 1 − (r + 1)/(2r^5 + r + 1), which is less than the originally claimed value of 1 − r^{−5} for all r ≥ 1.42.
Subsequently, Jamieson-Lane and Hauert [2015, Equation (5)] made a more detailed but still heuristic analysis of the fixation probability of superstars. They claim that for superstars with parameter k and with ℓ = m, the fixation probability ρ_k has the following bounds for fixed r > 1, where the o(1) terms tend to 0 as ℓ → ∞. They claim that their bounds are a good approximation as long as k = m ∼ √n. It is not clear exactly what "∼" means in this context. Certainly there are parameter regimes where k = o(ℓ) and ℓ = m ∼ √n but nevertheless the extinction probability is much larger than the proposed upper bound 1/(r^4(k − 1)(1 − 1/r)^2) from Equation (3). For example, suppose that ℓ = m = k^{3/2}. In this case (see Lemma 4.2), the extinction probability is at least k/(2r(m + k)) = 1/(2r(k^{1/2} + 1)), which is larger than 1/(r^4(k − 1)(1 − 1/r)^2) for all sufficiently large k. Nevertheless, the bounds proposed by Jamieson-Lane and Hauert (Equation (3)) seem to be close to the truth when k is very small compared to ℓ and m. Our Corollary 4.6 identifies a wide class of parameters for which the extinction probability is provably at least 1/(1470r^4 k). This is weaker than the suggested bound of Jamieson-Lane and Hauert by a factor of 1,470. This constant factor is explained by the fact that our rigorous proof needs to show concentration of all random variables. We use lots of Chernoff bounds and other bounds on probabilities. In writing the proof, we optimised readability rather than optimising our constants, so our constants can presumably be improved.
There is recent work on other related aspects of the Moran process. For example, there are results giving fixation probability bounds on connected undirected graphs. Adlam et al. [2015] study amplification with respect to adversarial or "temperature-based" placement of the initial mutation, in which the "temperature" of a vertex is proportional to the sum of all incoming edge weights. There is also work considering the extent to which the number of "good starts" for fixation can be bounded.

Outline of the Article
Section 2 defines some notation and states some well-known probabilistic bounds (Chernoff bounds and an analysis of gambler's ruin) which will be used in the proofs.

Section 3 defines several stochastic processes which we use to study the Moran process. This section is important. It is impossible to read any of the proofs without understanding these processes.

Section 4 gives an upper bound on the fixation probability of superstars. The main result of the section is Theorem 4.1, which immediately implies Theorem 1.8. This is the technically easiest of our main proofs, so we present it first.

Section 5 gives a stronger upper bound on the fixation probability of metafunnels (and hence of funnels). The main result of the section is Theorem 5.1, which immediately implies Theorems 1.9 and 1.10. The proof of Theorem 5.1 has high-level similarity to the proof of Theorem 4.1, but it is much more difficult. Dependencies cause complications, and we must analyse several mutant processes to deal with these.

Section 6 establishes the existence of an infinite family of megastars which is strongly amplifying. The main theorem is Theorem 6.1, which immediately implies Theorem 1.6 and hence Theorem 1.4. In order to deal with dependencies, we study a mutant process called a "megastar process." We show in the proof of Theorem 6.1 that this process is dominated by the Moran process. Thus, the main work of the section is to prove the key lemma, Lemma 6.3, which analyses the megastar process.

Section 7 gives an upper bound showing that the analysis in Section 6 is fairly tight. The main theorem, Theorem 7.3, is straightforward and it immediately implies Theorem 1.11.

Section 8 gives a simple example of an evolutionary graph that is not isothermal but has fixation probability ρ_reg(r, n) (Proposition 1.12), clearing up a misconception in the literature.

Section 9 discusses earlier heuristic analyses of superstars.

Notation
We use N − (v) to refer to the set of in-neighbours of a vertex v and N + (v) to refer to the set of out-neighbours of v.
We refer to the Lebesgue measure of a (measurable) subset S ⊆ R as the measure of that set, and denote it by len(S).
We use base-e for logarithms unless the base is given explicitly.
If b < a, we consider the interval [a, b] to be well defined but empty. Likewise, if b ≤ a, we consider the intervals (a, b), (a, b], and [a, b) to be well defined but empty. We define empty sums, products, unions, etc., to be the identities of the corresponding operations. For example, $\prod_{i=1}^{0} i = 1$ and $\bigcup_{i=1}^{0} A_i = \emptyset$. Throughout the article, we use lowercase t's to denote fixed times and uppercase T's to denote stopping times.

Chernoff Bounds
We often use the following simple bound, which applies to any real number x ∈ [0, 1]:

$$1 - x \le e^{-x} \le 1 - x/2. \qquad (4)$$
We will require the following well-known Chernoff bounds. The first appears as Theorem 5.4 of Mitzenmacher and Upfal [2005].
LEMMA 2.1. Let Y be a Poisson random variable with parameter ρ ≥ 0. If y > ρ and z < ρ, then

$$P(Y \ge y) \le e^{-\rho}(e\rho/y)^y \quad\text{and}\quad P(Y \le z) \le e^{-\rho}(e\rho/z)^z.$$

COROLLARY 2.2. Let Y be a Poisson random variable with parameter ρ ≥ 0. Then P(Y ≥ 2ρ) ≤ e^{−ρ/4} and P(Y ≤ 2ρ/3) ≤ e^{−ρ/16}.

PROOF. Lemma 2.1 applied with y = 2ρ and z = 2ρ/3 implies that P(Y ≥ 2ρ) ≤ e^{−ρ}(e/2)^{2ρ} = (e/4)^ρ ≤ e^{−ρ/4} and P(Y ≤ 2ρ/3) ≤ e^{−ρ}(3e/2)^{2ρ/3} ≤ e^{−ρ/16}.

COROLLARY 2.3. Let Y be a Poisson random variable with parameter ρ ≥ 0 and let y > e^2 ρ. Then P(Y ≥ y) ≤ e^{−y}.

PROOF. Note that y > e^2 ρ. Thus, by Lemma 2.1, we have P(Y ≥ y) ≤ e^{−ρ}(eρ/y)^y ≤ e^{−ρ}e^{−y} ≤ e^{−y}.

COROLLARY 2.4. Let s be a positive integer and let Y be the sum of s independent and identically distributed (i.i.d.) exponential random variables, each with parameter λ. Then, for any j ≥ 3s/(2λ), P(Y < j) ≥ 1 − e^{−λj/16}.

PROOF. First, note that P(Y < j) = P(Y ≤ j) since P(Y = j) = 0. Then P(Y ≤ j) is equal to the probability that a Poisson process with parameter λ triggers at least s times in the interval [0, j]. This is the same as the probability that a Poisson random variable with parameter λj is at least s. Since s ≤ 2λj/3, we can now use Corollary 2.2.
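The following Monte Carlo sanity check of Corollary 2.4 can be run directly; the parameters and the function name are arbitrary choices, and the experiment illustrates the bound rather than proving it:

```python
import math
import random

def check_corollary_2_4(s=10, lam=2.0, trials=100_000, rng=random):
    j = 3 * s / (2 * lam)                 # smallest j covered by the corollary
    hits = sum(sum(rng.expovariate(lam) for _ in range(s)) < j
               for _ in range(trials))
    empirical = hits / trials
    bound = 1 - math.exp(-lam * j / 16)
    print(f"P(Y < {j:.2f}) is about {empirical:.4f}; claimed lower bound {bound:.4f}")

check_corollary_2_4()   # the empirical probability should exceed the bound
```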
The following is Corollary 2.4 of Janson et al. [2000]. LEMMA 2.5. Let X be a binomial random variable and suppose t ≥ 7 E[X]. Then P(X ≥ t) ≤ e^{−t}.
We define the geometric distribution as follows. Given a biased coin which comes up heads with probability p > 0, imagine tossing it until it comes up heads. Then the total number of tosses which came up tails follows the geometric distribution with parameter p.
LEMMA 2.6. Let Y_1, . . . , Y_t be a sequence of i.i.d. geometric variables with parameter p ≥ 13/14. Then P(Y_1 + · · · + Y_t ≥ 14t(1 − p)) ≤ e^{−14t(1−p)}.

PROOF. Consider a series of independent coin tosses, each with probability p of coming up heads. Then the probability that Y_1 + · · · + Y_t ≥ 14t(1 − p) is exactly the probability that at least 14t(1 − p) of the first t + 14t(1 − p) − 1 coin tosses come up tails. By Lemma 2.5, the probability that at least 14t(1 − p) of the first 2t coin tosses come up tails is at most e^{−14t(1−p)}, and 2t ≥ t + 14t(1 − p) − 1, so the result follows.

Gambler's Ruin
The following analysis of the classical gambler's ruin problem is well known. See, for example, Feller [1968, Chapter XIV].
LEMMA 2.7 (GAMBLER'S RUIN). Consider a random walk on Z_{≥0} that absorbs at 0 and a (for some positive integer a), starts at z ∈ [0, a], and from each state in {1, . . . , a − 1} has probability p ≠ 1/2 of increasing (by 1) and probability q = 1 − p of decreasing (by 1).

(i) The probability of reaching state a is (1 − (q/p)^z)/(1 − (q/p)^a).
(ii) The expected number of transitions until absorption is z/(q − p) − (a/(q − p)) · (1 − (q/p)^z)/(1 − (q/p)^a).

COROLLARY 2.8 (GAMBLER'S RUIN INEQUALITIES). Consider a random walk on Z_{≥0} that absorbs at 0 and a (for some positive integer a), starts at z ∈ [0, a], and from each state in {1, . . . , a − 1} has probability p ≠ 1/2 of increasing (by 1) and probability q = 1 − p of decreasing (by 1).
(i) If p > q, then the probability of reaching state a is at least 1 − (q/p)^z.
(ii) If q > p, then the expected number of transitions until absorption is at most z/(q − p).
(iii) If p > q, then the expected number of transitions until absorption is at most a/(p − q).

PROOF. Items (i) and (ii) are immediate. To see (iii) for p > q, rewrite the expected number of transitions as (a/(p − q)) · (1 − (q/p)^z)/(1 − (q/p)^a) − z/(p − q) ≤ a/(p − q).

We also consider a variant of the gambler's ruin in which the probability of upwards transitions depends on the current state.

LEMMA 2.9. Let a, b, c, and d be integers satisfying a < b < c − 1 and c + 1 < d. Consider p_1 ∈ (1/2, 1). Consider the discrete Markov chain on states {a, . . . , d} with a transition matrix in which p_{a,a} = 1 and p_{d,d} = 1, and in which every intermediate state moves up or down by 1 with an upwards probability that depends on the current state.

PROOF. We first prove (i). Starting from the immediate identity for the absorption probabilities and rearranging gives Equation (5). From Lemma 2.7(i) and Corollary 2.8(i) we obtain bounds on the quantities appearing there; plugging these bounds into Equation (5) gives (i). We prove (ii) similarly, plugging in the corresponding bounds.
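The closed form in Lemma 2.7(i) is easy to compute; the snippet below (the function name is ours) also connects it back to Equations (1) and (2):

```python
def ruin_probability(p, z, a):
    """P(reach a before 0) for the walk of Lemma 2.7, started at z."""
    q = 1.0 - p
    if p == q:
        return z / a                      # unbiased case, for completeness
    ratio = q / p
    return (1.0 - ratio ** z) / (1.0 - ratio ** a)

# The Moran walk on a regular graph increases with probability p = r/(r+1),
# so q/p = 1/r and the fixation probability from a single mutant is
# (1 - 1/r)/(1 - 1/r^n), as in Equation (1):
r, n = 2.0, 100
print(ruin_probability(r / (r + 1), 1, n))   # about 0.5 for r = 2
```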

STOCHASTIC PROCESSES
We will be concerned with the discrete-time Moran process [Moran 1958], as adapted by Lieberman et al. [2005] and described in Section 1. This is a discrete model of evolution on an underlying directed graph G where the reproduction rate of mutants is a parameter r > 0 called the "fitness." In this article, we consider the situation r > 1, which corresponds to the situation in which a mutation is advantageous. The fitness r is a parameter of all of our processes. Our results apply to any fixed r > 1. Since the value of r is fixed, we simplify the presentation by not including it in the explicit notation and terminology. Thus, from now on, we say "Moran process" to signify "Moran process with fitness r." Following Díaz et al. [2016], we will simplify our proofs by studying a continuous-time version of the Moran process. The continuous-time version is also parameterised by G and r and it has the same fixation probability as the discrete-time version, so our results will carry over immediately to the discrete process.
In order to deal with conditioning in the proofs we will in fact define several general stochastic processes, all of which depend on G and r-one of these will be equivalent to the continuous-time Moran process and others will be useful for dominations.
All of the processes that we study evolve over time. For any process P, we use F(P) to denote the filtration of P so F t (P) captures the history of the process P up to and including time t.

The Clock Process
For each edge e = (u, v) of G we define two Poisson processes: a Poisson process M_e with parameter r/d^+(u) and a Poisson process N_e with parameter 1/d^+(u). We refer to these processes as clocks, and when an event occurs in one of them, we say that the relevant clock triggers. We refer to M_e as a mutant clock with source u and target v and N_e as a nonmutant clock with source u and target v.
We use C(G) to denote the set of all clocks, so C(G) = ⋃_{e∈E(G)} {M_e, N_e}. We use P(G) to denote the Cartesian product of all processes in C(G). P(G) is the stochastic process in which all clocks in C(G) evolve simultaneously and independently, starting at time 0.
With probability 1, the clocks trigger a countably infinite number of times and these can be indexed by an increasing sequence τ 1 , τ 2 , . . . . Also, no clocks trigger simultaneously and the clocks trigger for an infinitely long period-that is, for every clock and every t, the clock triggers at some τ i > t. For convenience, we take τ 0 = 0. We will use the random variables τ 0 , τ 1 , . . . (which depend on the process P(G)) in our arguments.

Mutant Processes
A mutant process μ has an underlying graph G(μ) and initial state μ_0. At every time t, the state μ_t is a subset of V(G(μ)), which we sometimes refer to as the "set of mutants" at time t. Every mutant process satisfies the following two constraints.

(1) For all t ≥ 0, the state μ_t is determined by the initial state μ_0 and the filtration F_t(P(G(μ))).
(2) For all t, t′ ≥ 0, if there is a nonnegative integer i so that t and t′ are both in the range [τ_i, τ_{i+1}), then μ_t = μ_{t′}.
We define some terminology associated with the mutant process μ.
-If the clock M_{(u,v)} triggers at time t and u ∈ μ_t, we say that u spawns a mutant onto v in μ at time t and that μ spawns a mutant onto v at time t.
-If the clock N_{(u,v)} triggers at time t and u ∉ μ_t, we say that u spawns a nonmutant onto v in μ at time t and that μ spawns a nonmutant onto v at time t.
When the mutant process is absolutely clear from the context, we sometimes drop the phrase "in μ." Note that v does not necessarily become a mutant at time τ i when some u spawns a mutant onto v at time τ i since v may already be a mutant at that time.
For convenience, we include the filtration F t (P(G)) in the filtration F t (μ) of the mutant process so the sequence of trigger times τ 0 , τ 1 , . . . up to time t can be determined from F t (μ).
Remark 3.1. Sometimes we will consider a mutant process μ in which the initial state μ 0 is a randomly chosen subset of V (G(μ)). When we do this, we assume that the choice of the initial state μ 0 is independent of the triggering of the clocks in C(G(μ)).
We will define several mutant processes in the course of our proofs, but the most fundamental is the Moran process itself, which is a particular mutant process.
Definition 3.2 (the Moran Process). The (continuous-time) Moran process on graph G with initial mutant x_0 ∈ V(G) is a mutant process X with G(X) = G and X_0 = {x_0} defined as follows. Recall that, for every positive integer i, a clock C ∈ C(G) triggers at τ_i. For t ∈ (τ_{i−1}, τ_i), we set X_t = X_{τ_{i−1}}. Then we define X_{τ_i} as follows.

(i) If C = M_{(u,v)} for some edge (u, v) with u ∈ X_{τ_{i−1}}, then X_{τ_i} = X_{τ_{i−1}} ∪ {v}.
(ii) If C = N_{(u,v)} for some edge (u, v) with u ∉ X_{τ_{i−1}}, then X_{τ_i} = X_{τ_{i−1}} \ {v}.
(iii) Otherwise, we set X_{τ_i} = X_{τ_{i−1}}.

Considering the positive integers i in order, this completes the definition of the Moran process X_t.
Remark 3.3. It is clear from Definition 3.2 that the Moran process X_t is a mutant process. In Definition 3.2, say that τ_i is a "relevant trigger time" if (i) or (ii) occurs rather than (iii). The discrete-time Moran process [Moran 1958], as adapted by Lieberman et al. [2005], is the Markov chain X_{τ_0}, X_{τ_{i_1}}, X_{τ_{i_2}}, . . . , where τ_{i_1}, τ_{i_2}, . . . is the increasing sequence of relevant trigger times. Note that the fixation probability of the discrete-time Moran process is the same as the fixation probability of the continuous-time process X_t, so we will study the process X_t in this article.
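A compact way to realise Definition 3.2 in code is to exploit the fact that the first trigger among independent Poisson clocks is an exponential race, won by each clock with probability proportional to its rate. The sketch below (the function name and data layout are illustrative) runs the process to absorption:

```python
import random

def clock_moran(out_neighbours, x0, r, rng=random):
    """Run the continuous-time Moran process of Definition 3.2 to absorption."""
    edges = [(u, v) for u, nbrs in out_neighbours.items() for v in nbrs]
    # Each edge (u, v) carries a mutant clock M_(u,v) of rate r/d+(u) and a
    # nonmutant clock N_(u,v) of rate 1/d+(u); these rates never change.
    rates = []
    for (u, v) in edges:
        d = len(out_neighbours[u])
        rates.extend([r / d, 1.0 / d])   # [M_(u,v), N_(u,v)]
    total = sum(rates)
    n = len(out_neighbours)
    mutants, t = {x0}, 0.0
    while 0 < len(mutants) < n:
        t += rng.expovariate(total)      # waiting time until the next trigger
        i = rng.choices(range(len(rates)), weights=rates)[0]
        u, v = edges[i // 2]
        if i % 2 == 0 and u in mutants:          # case (i): u spawns a mutant
            mutants.add(v)
        elif i % 2 == 1 and u not in mutants:    # case (ii): u spawns a nonmutant
            mutants.discard(v)
        # case (iii): the trigger changes nothing
    return t, len(mutants) == n
```

Per Remark 3.3, recording the configuration only at the triggers that fall into cases (i) and (ii) recovers the discrete-time chain.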
Definition 3.4. We say that a mutant process is extinct by time t if, for all t′ ≥ t, μ_{t′} = ∅. We say that it fixates by time t if, for all t′ ≥ t, μ_{t′} = V(G(μ)). We say that it absorbs by time t if it is extinct by time t or it fixates by time t. The fixation probability is the probability that, for some t, it fixates by time t. The extinction probability is the probability that, for some t, it is extinct by time t.
Remark 3.5. The Moran process X t is extinct by time t if X t = ∅ and fixates by time t if X t = V (G(X)). If G is strongly connected, then the fixation probability and the extinction probability sum to 1.
Definition 3.6. For any mutant process μ, any vertex u ∈ V(G(μ)), and any t ≥ 0, we define i_m(μ, u, t) to be the measure of the set {t′ ≤ t | u ∈ μ_{t′}}. Similarly, we define i_n(μ, u, t) to be the measure of the set {t′ ≤ t | u ∉ μ_{t′}}. The subscript "m" stands for "mutant" since i_m(μ, u, t) is the amount of time that u is a mutant in μ, up until time t. Similarly, the subscript "n" stands for nonmutant. The random variables i_m(μ, u, t) and i_n(μ, u, t) are determined by F_t(μ). Also, i_m(μ, u, t) + i_n(μ, u, t) = t.

The Star-Clock Process
Consider a mutant process μ. We wish to be able to discuss events such as the event that a vertex u does not spawn a mutant until it has been a mutant for time t. In order to express such events in a clean way, making all conditioning explicit, we define additional stochastic processes.
For each edge e = (u, v) of G we define four further Poisson processes: Poisson processes M*_e and M̄*_e, each with parameter r/d^+(u), and Poisson processes N*_e and N̄*_e, each with parameter 1/d^+(u). We refer to these processes as star-clocks. We identify sources and targets of star-clocks in the same way that we did for clocks. For example, the star-clock M*_{(u,v)} has source u and target v. We write C*_mut(G) = ⋃_{e∈E(G)} {M*_e, N̄*_e} for the set of star-clocks that will drive clocks while their source is a mutant, and C*_nmut(G) = ⋃_{e∈E(G)} {M̄*_e, N*_e} for those that will drive clocks while their source is a nonmutant.
The star-clock process P * (G) is the stochastic process where all star-clocks in C * mut (G) ∪ C * nmut (G) evolve simultaneously and independently, starting at time 0.

A Coupled Process
Given a mutant process μ, let G = G(μ). We will now define a stochastic process Λ(μ) which is a coupling of μ (which includes the clock process P(G)) with the newly defined star-clock process P*(G). Intuitively, the idea of the coupling is that each clock M_{(u,v)} in P(G) will evolve following M*_{(u,v)} when u is a mutant and following M̄*_{(u,v)} when u is a nonmutant. Similarly, N_{(u,v)} will evolve following N*_{(u,v)} when u is a nonmutant and N̄*_{(u,v)} when u is a mutant. In the coupling, we pause the star-clocks in C*_mut(G) ∪ C*_nmut(G) while they are not being used to drive clocks in C(G), so that, for example, the "local time" of a star-clock M*_{(u,v)} at global time t is i_m(μ, u, t). We will be able to deduce both F_t(μ) and F_t(P*(G)) from the filtration F_T(Λ(μ)) of the coupled process at an appropriate stopping time T; the details are given in the following. The fact that the coupling is valid (which we will show in the following) will ensure that both of the marginal processes, μ and P*(G), evolve according to their correct distributions.
To construct the coupling we start with a copy of the star-clock process P * (G) and with the initial state μ 0 of the mutant process μ. We define τ 0 = 0 (so we have implicitly defined F τ 0 (μ)).
Suppose that, for some nonnegative integer j, we have defined F τ j (μ). Given this and the evolution of the star-clock process P * (G), we will show how to define τ j+1 and F τ j+1 (P(G)) which determine F τ j+1 (μ). To do this, let t j be the minimum t > 0 such that one of the following occurs.
-For some u ∈ μ_{τ_j}, a star-clock in C*_mut(G) with source u triggers at time i_m(μ, u, τ_j) + t, or
-for some u ∉ μ_{τ_j}, a star-clock in C*_nmut(G) with source u triggers at time i_n(μ, u, τ_j) + t.

We define τ_{j+1} = τ_j + t_j. No clocks in C(G) trigger in the interval (τ_j, τ_{j+1}). We now determine which clock from C(G) triggers at time τ_{j+1} by reconsidering each case: if the star-clock that triggered is M*_{(u,v)} or M̄*_{(u,v)}, then the clock M_{(u,v)} triggers at time τ_{j+1}; if it is N*_{(u,v)} or N̄*_{(u,v)}, then N_{(u,v)} triggers at time τ_{j+1}.
This fully defines F_{τ_{j+1}}(P(G)) and hence F_{τ_{j+1}}(μ). So we have fully defined the coupling and therefore the process Λ(μ).
Before showing that the coupling is valid, it will be helpful to state exactly what information is contained in F_t(Λ(μ)). Certainly this includes F_t(μ), which itself includes F_t(P(G)). Also, F_t(P(G)) defines a nonnegative integer j so that t ∈ [τ_j, τ_{j+1}). We will use j to state the information that F_t(Λ(μ)) contains about the evolution of P*(G).
-For each vertex u and each star-clock C ∈ C*_mut(G) with source u, F_t(Λ(μ)) includes a list of the times in [0, i_m(μ, u, τ_j)] when C triggers.
-For each vertex u and each star-clock C ∈ C*_nmut(G) with source u, F_t(Λ(μ)) includes a list of the times in [0, i_n(μ, u, τ_j)] when C triggers.
To show that the coupling is valid we must show that both of the marginal processes, μ and P * (G), evolve according to their correct distributions. The fact that P * (G) does so is by construction. To show that μ does so, it suffices to prove that for all j ∈ Z ≥0 and all possible values f j of F τ j (μ), the distribution of F τ j+1 (μ) conditioned on F τ j (μ) = f j is correct. Note that the only information contained in F τ j+1 (μ) but not F τ j (μ) is the value of τ j+1 and the identity of the clock in C(G) that triggers at time τ j+1 .
Let f*_j be an arbitrary possible value of F_{τ_j}(Λ(μ)) consistent with the event F_{τ_j}(μ) = f_j, in the sense that the intersection of the events F_{τ_j}(Λ(μ)) = f*_j and F_{τ_j}(μ) = f_j is nonempty. Recall from the definition of Λ(μ) that, conditioned on F_{τ_j}(Λ(μ)) = f*_j, F_{τ_{j+1}}(μ) depends only on particular star-clocks in particular intervals, as follows.
-For each u ∈ μ τ j , it depends on the evolution of each star-clock in C * mut (G) with source u only during the interval (i m (μ, u, τ j ), ∞). It does not depend on the evolution of star-clocks in C * nmut (G) with source u. -For each u / ∈ μ τ j , it depends on the evolution of each star-clock in C * nmut (G) with source u only during the interval (i n (μ, u, τ j ), ∞). It does not depend on the evolution of star-clocks in C * mut (G) with source u.
For each star-clock, these intervals are disjoint from the intervals exposed in f*_j, and the start of each interval is determined by f*_j. Moreover, in the interval (τ_j, τ_{j+1}], each clock in C(G) is triggered by a unique star-clock in C*_mut(G) ∪ C*_nmut(G) with the same rate. Thus, all clocks in C(G) trigger with the correct rates in this period and they are independent of each other (since all of the star-clocks in P*(G) evolve independently). We conclude that F_{τ_{j+1}}(P(G)), and hence F_{τ_{j+1}}(μ), has the appropriate distribution. The coupling is therefore valid.
By construction, we have the following observation.
Observation 3.7. Let μ be a mutant process and consider Λ(μ). Let (u, v) be an edge of G(μ). Given t > 0, let j be the maximum integer such that τ_j < t. Then the following are true.
-M_{(u,v)} triggers at time t if and only if either u ∈ μ_{τ_j} and M*_{(u,v)} triggers at time i_m(μ, u, t), or u ∉ μ_{τ_j} and M̄*_{(u,v)} triggers at time i_n(μ, u, t).
-N_{(u,v)} triggers at time t if and only if either u ∉ μ_{τ_j} and N*_{(u,v)} triggers at time i_n(μ, u, t), or u ∈ μ_{τ_j} and N̄*_{(u,v)} triggers at time i_m(μ, u, t).
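The coupling can also be sketched in code: each edge gets four lazily sampled Poisson star-clock streams, each source vertex carries the local times i_m and i_n, and at every step the next trigger is the active star-clock whose pending local trigger is nearest (the names and the encoding of the four star-clocks below are illustrative):

```python
import random

def coupled_moran(out_neighbours, x0, r, rng=random):
    """Sketch of the coupled process of Section 3.4, run to absorption."""
    def stream(rate):                      # a lazily sampled Poisson process
        t = 0.0
        while True:
            t += rng.expovariate(rate)
            yield t

    n = len(out_neighbours)
    edges = [(u, v) for u, nbrs in out_neighbours.items() for v in nbrs]
    # Four star-clocks per edge; 'M'/'N' names the driven clock, and
    # '+'/'-' says whether the star-clock runs on i_m or on i_n.
    star, nxt = {}, {}
    for (u, v) in edges:
        d = len(out_neighbours[u])
        for kind, rate in (('M+', r / d), ('M-', r / d),
                           ('N+', 1.0 / d), ('N-', 1.0 / d)):
            s = stream(rate)
            star[(u, v), kind] = s
            nxt[(u, v), kind] = next(s)    # pending local trigger time
    i_m = {u: 0.0 for u in out_neighbours}
    i_n = {u: 0.0 for u in out_neighbours}
    mutants, t = {x0}, 0.0
    while 0 < len(mutants) < n:
        def wait(key):                     # global wait until key triggers
            (u, _), kind = key
            local = i_m[u] if kind.endswith('+') else i_n[u]
            return nxt[key] - local
        # A star-clock is active iff its '+'/'-' matches its source's state.
        active = [key for key in nxt
                  if (key[0][0] in mutants) == key[1].endswith('+')]
        key = min(active, key=wait)
        dt = wait(key)
        t += dt
        for w in out_neighbours:           # local times advance at rate 1
            if w in mutants:
                i_m[w] += dt
            else:
                i_n[w] += dt
        nxt[key] = next(star[key])         # consume the trigger
        (u, v), kind = key
        if kind[0] == 'M' and u in mutants:        # M_(u,v): spawn a mutant
            mutants.add(v)
        elif kind[0] == 'N' and u not in mutants:  # N_(u,v): spawn a nonmutant
            mutants.discard(v)
    return t, len(mutants) == n
```

Pausing a star-clock is implicit here: a '+' star-clock's reference time i_m(u) only advances while u is a mutant, exactly as in Observation 3.7.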

AN UPPER BOUND ON THE FIXATION PROBABILITY OF SUPERSTARS
Recall the definition of a (k, ℓ, m)-superstar from Section 1.1.2. We use n = ℓ(k + m) + 1 to denote the number of vertices of a (k, ℓ, m)-superstar.
Given any i ∈ [ℓ], we say that v_{i,1} v_{i,2} . . . v_{i,k} is the path associated with the reservoir R_i. We will often consider the case that the initial mutant x_0 is in a reservoir. When it is possible, we simplify the notation by dropping the index i of the reservoir. Thus, we write R for the reservoir containing x_0 and we write v_1 . . . v_k for the path associated with R. So if R = R_i, then for each j ∈ [k], we write v_j as a synonym for v_{i,j}. The main result of this section is the following upper bound on the fixation probability of the superstar. THEOREM 4.1. Let r > 1. Then there exists a constant c_r > 0 (depending on r) such that the following holds for all positive integers k, ℓ, and m. Choose x_0 uniformly at random from V(S_{k,ℓ,m}). Let X be the Moran process (with fitness r) with G(X) = S_{k,ℓ,m} and X_0 = {x_0}. Then, the probability that X goes extinct is at least 1/(c_r(n log n)^{1/3}).

Proof Sketch
In this section, we give an informal sketch of the proof of Theorem 4.1. The presentation of the proof itself does not depend upon the sketch so the reader may prefer to skip directly to the proof. In all of our proof sketches, we use the word "likely" to mean "sufficiently likely." We leave the details of "how likely" to the actual proofs.
If m is small relative to k (in particular, if m < k(n log n)^{1/3}), then the initial mutant x_0 is likely to be placed in a path, rather than in a reservoir. If this happens, then it is likely to go extinct. This easy case is dealt with in Lemma 4.2 and corresponds to Case 2 in the proof of Theorem 4.1. (Case 1 is the trivial case where n < n_0.) Another easy case arises if ℓ is sufficiently small relative to n (in particular, if ℓ = O((n log n)^{1/3})). This case is dealt with in Lemma 4.3 and corresponds to Case 3 in the proof of Theorem 4.1. In this case, even when x_0 is placed in a reservoir R, it is still likely that x_0 dies before v_2 ever becomes a mutant. This is because it takes roughly Θ(m) time for the mutation to spread from v_1 to v_2, since a mutant at v_1 has only probability Θ(1/m) of spawning a mutant before it dies. On the other hand, since ℓ is small, x_0 is sufficiently likely to die in Θ(m) time. For details, see the proof of Lemma 4.3.
The remaining case, Case 4 in the proof of Theorem 4.1, is deemed the "difficult regime" and is dealt with in Section 4.4. In this case, it is easy to show that ℓ = Ω(κ log n) and m = Ω(κ), where κ = max{3k, 70r^4 log n}.
It is likely that the initial mutant x 0 is placed in a reservoir R, and the key lemma, showing that it is sufficiently likely to go extinct, is Lemma 4.5.
At a very high level, the argument proceeds as follows. Suppose that v * does not spawn a mutant before x 0 dies. Then it is very easy to see that, after x 0 dies, the path of reservoir R is likely to go extinct quickly.
Thus, the crux of the argument is to show that x_0 is likely to die before v* spawns a mutant. Each time v* becomes a mutant it has an O(1/ℓ) chance of spawning a mutant before dying, so roughly our goal is to show that x_0 is sufficiently likely to die before v* becomes a mutant Ω(ℓ) times.
Very roughly, our high-level approach is to partition time into intervals of length κ = O(m). In each block of O(m/κ) such intervals, v 2 is likely to become a mutant O(1) times. Each time this happens, it is likely that R's path will again fill with nonmutants within O(κ) time, so it is likely that v k is a mutant for at most O(κ) time during the block and it is likely that v * becomes a mutant at most O(κ) times during the block.
Combining O(ℓ/κ) blocks, it is likely that v* becomes a mutant at most O(ℓ) times by time ℓm/κ. Since N_{(v*,x_0)} has rate 1/(ℓm), it is also likely that x_0 dies by time ℓm/κ.
In more detail, the proof of Lemma 4.5 shows that x 0 dies before v * spawns a mutant as long as certain events called P 1 -P 5 occur. These events are defined in the statement of Lemma 4.11. They formalise the high-level approach that we have just described. It is important that most of these events are defined in terms of clock-triggers so that we can get good upper bounds on the probability that they fail and thus prove (in Lemma 4.11) that they are likely to occur simultaneously.
The proof of Lemma 4.5 tracks a quantity σ(t), which is the number of times that v_k (the end of the path of the reservoir containing x_0) spawns a mutant onto the centre vertex v* by time t. The proof uses P_1-P_5 to show that σ(t) stays O(ℓ) up to a fixed time t_{x_0} = O(ℓm/κ). As we noted, the analysis divides the period up to time t_{x_0} into intervals of size κ. Event P_5 ensures that during most such intervals, nonmutant clocks with target v_1 and mutant clocks with targets v_1 and v_2 behave appropriately so that, if x_0 is the only mutant in R during the interval, then v_2 does not become a mutant during the interval. The fact that x_0 is indeed the only mutant in R follows from event P_1, which ensures that v* does not spawn a mutant while σ(t) is small. Then, since v_2 does not become a mutant during the interval, event P_3 ensures that the clocks along the path trigger in such a way that (unless v_1 or v* spawn a mutant) the only mutants remaining at the end of the interval are in {x_0, v_1}. This ensures that σ(t) stays small through another interval. Event P_5 only ensures the preceding during "most such intervals," but event P_4 ensures that the mutant clock with source v_k does not trigger too often, so the remaining intervals are not too problematic. Thus, events P_1, P_3, P_4, and P_5, taken together, ensure that σ(t_{x_0}) is O(ℓ). Given that σ(t_{x_0}) is O(ℓ), it is easy to show that the initial mutant goes extinct during the next two intervals (beyond time t_{x_0}). Event P_1 ensures that v* does not spawn any mutants. Event P_2 ensures that the initial mutant x_0 has already died by time t_{x_0}. Finally, event P_3 ensures that any remaining mutants die in the path during the next two intervals.
The difficult part of the proof is defining events P_1-P_5 in such a way that we can show (in Lemma 4.11) that they are likely to occur simultaneously. It turns out (Lemma 4.15, Corollary 4.17, and Lemma 4.18) that events P_3-P_5 are so unlikely to fail that we bound this probability with a simple union bound, avoiding any complicating conditioning. (Of course, for this it was necessary to express these events in terms of clocks rather than in terms of the underlying Moran process.) In order to simplify the presentation, we deal with P_1 and P_2 together, in Lemma 4.14. Roughly, they correspond to the event that, as long as σ(t) = O(ℓ), v* does not spawn a mutant at time t and, for t = t_{x_0}, x_0 dies by time t. This event is implied by the conjunction of three further events.
-E_3 corresponds informally to the event that v* is a mutant for a period of time shorter than 1/r during the first O(ℓ) times that it becomes a mutant (though the formal definition is expressed in terms of clocks, and is a little more complicated). Note the intention, though, which is to ensure that v* is a mutant for a period of time shorter than 1/r, which makes E_1 relevant.
Lemma 4.13 shows that E 3 is very likely to hold. In the proof of Lemma 4.14, it is observed that E 1 and E 2 are independent (by the definition of the star-clocks) and that P(E 1 ) = 1/e. The proof demonstrates that E 2 is sufficiently likely, giving the desired bound.

The Easy Regimes
LEMMA 4.2. Choose x_0 uniformly at random from V(S_{k,ℓ,m}). Let X be the Moran process with G(X) = S_{k,ℓ,m} and X_0 = {x_0}. The extinction probability of X is at least k/(2r(m + k)).
PROOF. Let R be the reservoir containing x_0, and let v_1 . . . v_k be the path associated with R. Let ξ = ℓm/(2r), t* = ℓm/(4r^2), and J = [0, t*]. Let E_1, E_2, and E_3 be events defined as follows: E_1 is the event that N_{(v*,x_0)} triggers at some time in J; E_2 is the event that M_{(x_0,v_1)} triggers at most ξ times in J; and E_3 is the event that, for each of the first ξ times at which v_1 becomes a mutant, v_1 dies before spawning a mutant.
Suppose that events E_1, E_2, and E_3 occur. We will show that X goes extinct. Let ξ′ be the number of times that v_1 becomes a mutant in J. By E_2, ξ′ ≤ ξ. By E_3, for each of the first ξ′ times that v_1 becomes a mutant, it dies before spawning a mutant. Thus, for all t ∈ J, X_t ⊆ {x_0, v_1}. Also, by E_1, x_0 dies in J. As soon as x_0 dies, and v_1 dies for the (ξ′)'th time, X is extinct.
Since N_{(v*,x_0)} has rate 1/(ℓm), the probability of E_1 is 1 − e^{−t*/(ℓm)} = 1 − e^{−1/(4r^2)} ≥ 1/(8r^2); here the inequality follows by Equation (4). Moreover, since M_{(x_0,v_1)} has rate r, by Corollary 2.2 we can bound the probability of E_2. For any t ∈ J, let f be a possible value of F_t(X). Let Φ be the random variable containing the list of times in J at which N_{(v*,x_0)} and M_{(x_0,v_1)} trigger. Let ϕ be a possible value of Φ which is consistent with the events F_t(X) = f and E_1 ∩ E_2. Note that ϕ determines E_1 ∩ E_2. By memorylessness and independence of clocks in C(S_{k,ℓ,m}), we can bound the conditional probability of E_3. Since E_1 and E_2 depend entirely on distinct clocks in C(S_{k,ℓ,m}) in fixed intervals, the two events are independent. Thus, by Equations (7)-(9), we obtain a lower bound on the probability of E_1 ∩ E_2 ∩ E_3, and the result follows.
LEMMA 4.5. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose that $\ell \ge Kr^4\kappa\log n$, $m \ge 6r^2\kappa$, and $n \ge n_0$. Fix $x_0 \in R_1 \cup \cdots \cup R_\ell$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then the extinction probability of $X$ is at least $1/(7Kr^4\kappa)$.
The following corollary, which applies to the regime in which $\kappa = 3k$, is immediate. COROLLARY 4.6. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose that $k \ge (K/3)r^4\log n$, $\ell \ge 3Kr^4 k\log n$, $m \ge 18r^2 k$, and $n \ge n_0$. Fix $x_0 \in R_1 \cup \cdots \cup R_\ell$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then the extinction probability of $X$ is at least $1/(21Kr^4 k)$.
The crux of our proof of Lemma 4.5 is Lemma 4.11. In order to state this lemma, we require the following additional definitions.
The reason that we give $t_{x_0}$ its name is that we will be most concerned with the case in which $x_0$ dies in the interval $[0, t_{x_0}]$. Definition 4.8. Let $I \subseteq [0, \infty)$ be an interval, let $R \in \{R_1, \ldots, R_\ell\}$, suppose $x_0 \in R$, and let $v_1 \ldots v_k$ be the path associated with $R$. We say that $v_1$ clears before spawning a mutant within $I$ if at least one of the following statements holds. We say that $v_2$ is protected in $I_i$ if both of the following properties hold.
(i) $v_1$ clears before spawning a mutant within $I_i$.
(ii) For all $t \in I_i$ such that $M_{(x_0,v_1)}$ triggers at time $t$, $v_1$ clears before spawning a mutant within $(t, (i+1)\kappa]$.
In particular, suppose that v 2 is protected in I i and that x 0 is the only mutant in R for the duration of I i . Then, as we will see in the proof of Lemma 4.5, v 2 does not become a mutant in I i .
The purpose of this definition is the following. Suppose that X iκ ⊆ {x 0 , v 1 , . . . , v k , v * }, that neither v 1 nor v * spawns a mutant within I i , and that v 1 . . . v k clears within I i . Then, as we will see in the proof of Lemma 4.5, we will have X (i+1)κ ⊆ {x 0 , v 1 }.
Our main task will be to prove the following lemma.
LEMMA 4.11. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose $\ell \ge Kr^4\kappa\log n$, $m \ge 6r^2\kappa$, and $n \ge n_0$. Let $R \in \{R_1, \ldots, R_\ell\}$, and let $v_1 \ldots v_k$ be the path associated with $R$. Fix $x_0 \in R$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, with probability at least $1/(7Kr^4\kappa)$, the events $P_1$–$P_5$ occur simultaneously.
Note that the definition of P 5 considers i up to t x 0 /κ, because it corresponds to at most t x 0 /κ intervals of length κ. The definitions of P 3 and P 4 consider larger values of i. In fact, it is only necessary to take i up to t x 0 /κ + 2 in P 3 and P 4 but we state the lemma as we did to avoid clutter. As a first step towards proving Lemma 4.11, we prove Lemmas 4.13 and 4.14 which give a lower bound on the probability that P 1 and P 2 hold.
Definition 4.12. Let $R \in \{R_1, \ldots, R_\ell\}$, let $v_1 \ldots v_k$ be the path associated with $R$, and suppose $x_0 \in R$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. We give two mutual recurrences to define stopping times $T^h_{\mathrm{n}}$ for all $h \in \mathbb{Z}_{\ge 0}$ and $T^h_{\mathrm{m}}$ for all $h \in \mathbb{Z}_{\ge 1}$. The subscript "n" stands for "nonmutant" and the subscript "m" stands for "mutant".
LEMMA 4.13. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose $\ell \ge Kr^4\kappa\log n$ and $n \ge n_0$. Let $R \in \{R_1, \ldots, R_\ell\}$ and let $v_1 \ldots v_k$ be the path associated with $R$. Let $x_0 \in R$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$.
PROOF. Let $n_0$ be an integer that is sufficiently large with respect to $r$. We claim that $Y_1, \ldots, Y_{\lceil \ell/(2r)\rceil}$ are stochastically dominated above by $\lceil \ell/(2r)\rceil$ independent exponential variables, each with parameter $\ell - 1$. To see this claim, fix $i \in \mathbb{Z}_{\ge 1}$, $t \ge 0$, and possible values $y_1, \ldots, y_{i-1}$. The relevant clocks have total rate $\ell - 1$, and so the claim follows by memorylessness. Since Equations (10) and (11) apply to every value of $i$, the result follows, as required.
We are now in a position to prove that P 1 and P 2 occur with reasonable probability.
PROOF. Let n 0 be an integer that is sufficiently large with respect to r. Consider the process (X). Define the following three events.
We first bound $P(E_1 \cap E_2 \cap E_3)$ below. The bound on $P(E_1)$ comes from summing the parameters of the relevant star-clocks. We have $t_{x_0} = \ell m/(Kr^4\kappa) \ge m\log n \ge 2\log n_0$ by hypothesis so, by choice of $n_0$, we may assume $t_{x_0} \ge 25$; this yields the bound on the parameter of the relevant star-clock $N^*$, and hence on $P(E_2)$. Note that $E_1$ and $E_2$ are independent of each other by the definition of the star-clock process $P^*(S_{k,\ell,m})$ and the fact that the intervals in the definitions of $E_1$ and $E_2$ are fixed: $t_{x_0} = \ell m/(Kr^4\kappa)$ does not depend on the evolution of (X). So we have $P(E_1 \cap E_2) \ge 12/(25eKr^4\kappa)$. Finally, the bound on $P(E_3)$ follows by Lemma 4.13 together with the fact that $\kappa \le n$. We next show that $E_1$ and $E_3$ together imply that $P_1$ occurs. If $v^*$ does not spawn a mutant before time $t_{\max}$, then $P_1$ occurs, so suppose instead that $v^*$ spawns a mutant for the first time at some time $t_{\mathrm{sp}} \le t_{\max}$. (The "sp" subscript in $t_{\mathrm{sp}}$ stands for "spawn".) We must show that $\sigma(t_{\mathrm{sp}}) \ge \ell/(2r) + 1$. This will ensure that $P_1$ occurs since $\sigma(t)$ is monotonically increasing.
Since $v^*$ spawns no mutants before time $t_{\mathrm{sp}}$, we have $X_t \subseteq \{x_0, v_1, \ldots, v_k, v^*\}$ for all $t < t_{\mathrm{sp}}$, and so Equation (13) holds (recalling Definition 3.6). Since $E_3$ occurs, Equation (14) holds as well. However, since $E_1$ occurs and $v^*$ spawns a mutant at $t_{\mathrm{sp}}$, we have $i_{\mathrm{m}}(X, v^*, t_{\mathrm{sp}}) \ge 1/r$. Therefore, by Equations (13) and (14), $\sigma(t_{\mathrm{sp}}) \ge \ell/(2r) + 1$, and $P_1$ occurs. Finally, we show that $P_1$, $E_2$, and $E_3$ together imply that $P_2$ occurs. Suppose $\sigma(t_{x_0}) \le \ell/(2r)$. We have $t_{x_0} \le \ell m \le t_{\max}$, so by $P_1$, $v^*$ spawns no mutants in $[0, t_{x_0}]$. Hence, as in Equation (13), we have $x_0 \notin X_{t_{x_0}}$, and $P_2$ occurs. Thus, $E_1 \cap E_2 \cap E_3$ implies that $P_1$ and $P_2$ occur, and so the result follows from Equation (12).
Lower bounds for the probabilities that properties P 3 -P 5 hold follow from Chernoff bounds without too much difficulty.
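The Chernoff bounds in question include the standard Poisson tail inequality $P(\mathrm{Poisson}(\lambda) \ge 2\lambda) \le e^{-\lambda/3}$ (via Corollary 2.2). A quick numerical sanity check (ours, in pure Python):

```python
import math

def poisson_upper_tail(lam, k):
    """Exact P(Poisson(lam) >= k) via the complementary partial sum."""
    head = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return 1.0 - head

for lam in (5, 10, 20):
    exact = poisson_upper_tail(lam, 2 * lam)
    chernoff = math.exp(-lam / 3)  # the e^(-lam/3) bound used in the text
    print(f"lam={lam:2d}  P(X >= 2*lam) = {exact:.3e}  <=  {chernoff:.3e}")
```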
LEMMA 4.15. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose $\ell \ge Kr^4\kappa\log n$, $m \ge 2$, and $n \ge n_0$. Let $R \in \{R_1, \ldots, R_\ell\}$ and let $v_1 \ldots v_k$ be the path associated with $R$. Let $x_0 \in R$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, $P(P_3) \ge 1 - 1/n$. PROOF. Let $n_0$ be an integer that is sufficiently large with respect to $r$. Fix $i \in \mathbb{Z}_{\ge 0}$; we will bound the probability that $v_1 \ldots v_k$ clears within $I_i$ (as in Definition 4.10).
To see this, note that the $1 - e^{-(r+1)(i\kappa + h + 1 - t_1)}$ factor corresponds to the probability that either $N_{(v_1,v_2)}$ or $M_{(x_0,v_1)}$ triggers in the relevant interval (together, they correspond to a Poisson process with rate $r+1$). The $1/(r+1)$ factor corresponds to the probability that it is actually $N_{(v_1,v_2)}$ rather than $M_{(x_0,v_1)}$ that triggers first. Since the relevant interval has length at least $1/2$, the product of these two factors is at least $(1 - e^{-(r+1)/2})/(r+1)$, which gives Equation (15). Moreover, the events $\{E_h \mid h \in \mathbb{Z}_{\ge 0}\}$ are mutually independent, as they depend only on the behaviour of clocks in $C(S_{k,\ell,m})$ in fixed disjoint intervals; this gives Equation (16). Now, defining $t_j$ appropriately for all integers $j$ with $3 \le j \le k+1$, we see that $t_1, \ldots, t_{k+1}$ satisfy the requirements of Definition 4.10 and so $v_1 \ldots v_k$ clears within $I_i$. It therefore follows by Equations (15) and (16), and a union bound, that $P(P_3) \ge 1 - 1/n$.
LEMMA 4.16. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$. Fix $x_0 \in R_1 \cup \cdots \cup R_\ell$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. With probability at least $1 - 1/n^2$, for all integers $i$ with $0 \le i \le t_{x_0}$, every clock in $C(S_{k,\ell,m})$ triggers at most $2r\kappa$ times in $I_i$.
PROOF. Let $n_0$ be an integer that is sufficiently large with respect to $r$. Fix a clock $C \in C(S_{k,\ell,m})$, fix $i \in \mathbb{Z}_{\ge 0}$ with $i \le t_{x_0}$, and write $a \le r$ for the rate of $C$. The number of times that $C$ triggers in $I_i$ follows the Poisson distribution with parameter $a\kappa$. Since the number of triggers is an integer and $2r\kappa \ge 2a\kappa$, by Corollary 2.2 the probability that $C$ triggers more than $2r\kappa$ times in $I_i$ is at most $e^{-a\kappa/3}$. There are at most $2n^2$ clocks in $C(S_{k,\ell,m})$ and at most $t_{x_0} + 1 \le 2n^2$ choices of $i$. Thus, by a union bound, with probability at least $1 - 4n^4 e^{-Kr^4\log n/3} \ge 1 - 1/n^2$, no single clock in $C(S_{k,\ell,m})$ triggers more than $2r\kappa$ times in any such interval $I_i$. The following corollary follows immediately from Lemma 4.16. Of course, the probability bound in the corollary can be strengthened to $1 - 1/n^2$, but we state what we will later use.
The following lemma gives a lower bound on the probability that P 5 occurs. In this lemma, we require that m ≥ 6r 2 κ, rather than m ≥ 2, which we have so far been assuming.
LEMMA 4.18. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose $\ell \ge Kr^4\kappa\log n$, $m \ge 6r^2\kappa$, and $n \ge n_0$. Let $R \in \{R_1, \ldots, R_\ell\}$ and let $v_1 \ldots v_k$ be the path associated with $R$. Fix $x_0 \in R$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, $P(P_5) \ge 1 - 1/n$. PROOF. Let $n_0$ be an integer that is sufficiently large with respect to $r$. Fix $i \in \mathbb{Z}_{\ge 0}$. For all $t \in I_i$, define events $E^1_t$ and $E^2_i$ as follows.
then $v_2$ is protected in $I_i$. Now consider any $t \in I_i$ and let $f_t$ be a possible value of $F_t(X)$. By memorylessness, the conditional probability of $E^1_t$ given $F_t(X) = f_t$ is bounded appropriately. By Lemma 4.16, we have $P(E^2_i) \ge 1 - 1/n^2$, so it follows by a union bound that $v_2$ is protected in $I_i$ with probability at least $1 - 3r^2\kappa/m - 1/n^2$. Since $I_0, I_1, \ldots$ are disjoint intervals, the events that $v_2$ is or is not protected in these intervals are independent by memorylessness. Thus, the number of intervals $I_i$ with $0 \le i \le t_{x_0}/\kappa$ in which $v_2$ is not protected is stochastically dominated above by a binomial distribution consisting of $t_{x_0}/\kappa + 1$ Bernoulli trials, each with success probability $3r^2\kappa/m + 1/n^2$. Bounding the expectation of this distribution and applying Lemma 2.5 yields $P(P_5) \ge 1 - 1/n$; the penultimate inequality in that calculation follows since $\ell \ge Kr^4\kappa\log n$ by hypothesis. The result therefore follows. Now that we have proved lower bounds on the probability that each of $P_1$–$P_5$ occurs, Lemma 4.11 follows easily.
LEMMA 4.11. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose $\ell \ge Kr^4\kappa\log n$, $m \ge 6r^2\kappa$, and $n \ge n_0$. Let $R \in \{R_1, \ldots, R_\ell\}$, and let $v_1 \ldots v_k$ be the path associated with $R$. Fix $x_0 \in R$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, with probability at least $1/(7Kr^4\kappa)$, the events $P_1$–$P_5$ occur simultaneously.
PROOF. We have $P(P_1 \cap \cdots \cap P_5) \ge P(P_1 \cap P_2) - P(\overline{P_3}) - P(\overline{P_4}) - P(\overline{P_5})$. Let $n_0$ be an integer that is sufficiently large with respect to $r$. We bound each term on the right-hand side by applying (in order) Lemma 4.14, Lemma 4.15, Corollary 4.17, and Lemma 4.18, obtaining $P(P_1 \cap \cdots \cap P_5) \ge 1/(7Kr^4\kappa)$, as required. The final inequality follows since, by hypothesis, $\kappa \le \ell/(Kr^4\log n) \le n/\log n$.
We are now at last in a position to prove Lemma 4.5, which we will then use to prove Theorem 4.1.
LEMMA 4.5. Consider any $r > 1$. There is an $n_0$, depending on $r$, such that the following holds. Suppose that $\ell \ge Kr^4\kappa\log n$, $m \ge 6r^2\kappa$, and $n \ge n_0$. Fix $x_0 \in R_1 \cup \cdots \cup R_\ell$. Let $X$ be the Moran process with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then the extinction probability of $X$ is at least $1/(7Kr^4\kappa)$.
PROOF. Let n 0 be an integer that is sufficiently large with respect to r. Let R be a reservoir in {R 1 , . . . , R } and let v 1 . . . v k be the path associated with R. Suppose x 0 ∈ R. By Lemma 4.11, it suffices to assume that P 1 -P 5 occur and to prove that X goes extinct.
Recall the definition of $\sigma(t)$ from Definition 4.7. Note that $\sigma(0) = 0$ and $\sigma(t)$ is monotonically increasing in $t$. We will first bound $\sigma(t_{x_0})$ from above (assuming that $P_1$–$P_5$ occur). Consider an interval $I_i$ with $0 \le i \le t_{x_0}$ (technically, we need only consider $0 \le i \le t_{x_0}/\kappa$, but the extra generality does no harm and we will later need to consider slightly larger $i$). We will derive an upper bound on $\sigma((i+1)\kappa)$ by splitting into cases.
Case 1: $i > 0$ and $v_2$ is protected in $I_{i-1}$ and $I_i$. First note that since $P_1$ occurs and $\sigma(i\kappa) \le \ell/(2r)$, $v^*$ does not spawn a mutant over the course of $[0, i\kappa]$, and so $X_{i\kappa} \subseteq \{x_0, v_1, \ldots, v_k, v^*\}$. Now, suppose for a contradiction that $v_2$ becomes a mutant at some time $t_2 \in I_{i-1}$. Then $v_1$ must have become a mutant beforehand. Let $t_1$ be the latest time in $[0, t_2]$ at which this occurs, and note that $M_{(x_0,v_1)}$ must have triggered at time $t_1$. Since $v_2$ is protected in $I_{i-1}$, if it were the case that $t_1 \in I_{i-1}$, then $v_1$ would clear before spawning within $(t_1, i\kappa]$ and so $v_1$ would die in $(t_1, t_2)$. This is impossible since $v_1$ spawns a mutant at time $t_2$ and $v_1$ does not become a mutant in $(t_1, t_2]$ by the definition of $t_1$. We therefore have $t_1 \notin I_{i-1}$, so $t_1 \le (i-1)\kappa$. Since $v_2$ is protected in $I_{i-1}$, $v_1$ clears before spawning a mutant within $I_{i-1}$, so $v_1$ dies in $((i-1)\kappa, t_2)$, again contradicting the fact that $v_1$ spawns a mutant at time $t_2$. Thus, we can conclude that $v_2$ does not become a mutant in $I_{i-1}$. By the same argument, the fact that $v_2$ is protected in $I_i$ implies that $v_2$ does not become a mutant in $I_i$. Hence, $X_t \subseteq \{x_0, v_1\}$ for all $t \in I_i$ and, in particular, $v_k$ does not spawn a mutant onto $v^*$ in $I_i$. Thus, $\sigma((i+1)\kappa) = \sigma(i\kappa)$. This gives the desired upper bound on $\sigma((i+1)\kappa)$.
Case 2: Case 1 does not hold. In this case, $P_4$ bounds the number of times the mutant clock with source $v_k$ can trigger in the interval, which bounds the growth of $\sigma$.
Combining Cases 1 and 2, we obtain Equation (18), which bounds the growth of $\sigma$ over each interval. Since $P_5$ occurs and $\ell \ge Kr^4\kappa\log n$, the number of intervals $I_i$ such that $0 \le i \le t_{x_0}/\kappa$ and Case 1 does not hold is suitably small. Moreover, again using the fact that $\ell \ge Kr^4\kappa\log n$, and since $\sigma(0) = 0$, it follows by repeated application of Equation (18) that $\sigma(t_{x_0}) \le \ell/(2r)$, which is Equation (19). Now consider the behaviour of the process in the interval beyond $t_{x_0}$. Since $P_2$ occurs and Equation (19) holds, $x_0$ dies by time $t_{x_0}$. Let the sequence of times $t_1, \ldots, t_{k+1} \in I_{\lceil t_{x_0}/\kappa\rceil + 1}$ be as in Definition 4.10.
In particular, $X_{(\lceil t_{x_0}/\kappa\rceil + 2)\kappa} = \emptyset$, so $X$ goes extinct and the result holds.
We now have everything we need to prove Theorem 4.1, which follows relatively easily from Lemmas 4.2, 4.3, and 4.5.
THEOREM 4.1. Let $r > 1$. Then there exists a constant $c_r > 0$ (depending on $r$) such that the following holds for all positive integers $k$, $\ell$, and $m$. Choose $x_0$ uniformly at random from $V(S_{k,\ell,m})$. Let $X$ be the Moran process (with fitness $r$) with $G(X) = S_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, the probability that $X$ goes extinct is at least $1/(c_r(n\log n)^{1/3})$.
PROOF. Fix $r > 1$ as in the statement of the theorem. Recall from Definitions 4.4 and 4.7 that $K = 70$ and $\kappa = \max\{3k, Kr^4\log n\}$. Let $n_0$ be the smallest integer such that, for $n \ge n_0$, Lemmas 4.5 and 4.14 apply, and also
$$(n\log n)^{1/3} \ge n^{1/3} \ge \max\{18r^2,\ 6Kr^6\log n,\ Kr^4(\log n)^2\}. \qquad (20)$$
We split into cases depending on the values of $k$, $\ell$, $m$, and $n$. We show that in each case, the statement of the theorem holds, provided $c_r \ge \max\{2rn_0, 156r^6K\}$.
Case 1: $n < n_0$. We show that, with probability at least $1/(2rn_0)$, $x_0$ dies before spawning a single mutant. Indeed, at the start of the process $x_0$ spawns a mutant with rate $r$, and every choice of $x_0 \in V(S_{k,\ell,m})$ has an in-neighbour, so $x_0$ dies with rate at least $1/n$. Thus, $X$ goes extinct with probability at least $(1/n)/(r + 1/n) \ge 1/(2rn) \ge 1/(2rn_0)$, so the result follows since $c_r \ge 2rn_0$.
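The bound in Case 1 is a race between two exponential clocks: $x_0$ spawns at rate $r$ and dies at rate at least $1/n$, and the death clock wins with probability $(1/n)/(r + 1/n)$. A small simulation of this competing-exponentials fact (ours; the parameters are arbitrary):

```python
import random

def death_wins(r, death_rate, trials=100_000, rng=random):
    """Estimate the probability that a rate-`death_rate` clock beats a rate-r clock."""
    wins = 0
    for _ in range(trials):
        if rng.expovariate(death_rate) < rng.expovariate(r):
            wins += 1
    return wins / trials

r, n = 1.5, 50
print("simulated:", death_wins(r, 1 / n))
print("exact    :", (1 / n) / (r + 1 / n))   # competing-exponentials formula
```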

AN UPPER BOUND ON THE FIXATION PROBABILITY OF METAFUNNELS
The $(k,\ell,m)$-metafunnel is defined in Section 1.1.1. We use $n = 1 + \ell\sum_{i=1}^{k} m^i$ to denote the number of vertices.
The main result of this section is the following upper bound on the fixation probability of the metafunnel.
THEOREM 5.1. Let $r > 1$. Then there is a constant $c_r > 0$, depending on $r$, such that the following holds for all $k, \ell, m \in \mathbb{Z}_{\ge 1}$ such that the $(k,\ell,m)$-metafunnel $G_{k,\ell,m}$ has $n \ge 3$ vertices. Suppose that the initial state $X_0$ of the Moran process with fitness $r$ is chosen uniformly at random from all singleton subsets of $V(G_{k,\ell,m})$. The probability that the Moran process goes extinct is at least $e^{-\sqrt{\log r \cdot \log n}}(\log n)^{-c_r}$.

Proof Sketch
If $k = 1$, then $G_{k,\ell,m}$ is a star and has extinction probability roughly $1/r^2$, so Theorem 5.1 follows easily. So for most of the proof (and the rest of this sketch) we assume $k \ge 2$. To prove the theorem, we divide the parameter space into two regimes.
In the first regime, $m < r^{\sqrt{\log_r n}}$. Since $m$ is small, $V_k$ is not too large compared to $V_0 \cup \cdots \cup V_{k-1}$. Thus, it is fairly likely that $x_0$ is born outside $V_k$, and dies before it can spawn a single mutant. This straightforward analysis is contained in the short Section 5.3.
Most of the proof (Section 5.4) focusses on the second regime, where $m \ge r^{\sqrt{\log_r n}}$ which, since $n \ge m^k$, implies $k \le \sqrt{\log_r n}$. In this regime it is likely that a uniformly chosen initial mutant $x_0$ is born in $V_k$ (Lemma 5.3), so we assume that this is the case in most of the proof (and the rest of this sketch). The key lemma is Lemma 5.28, which shows that, in this case, it is (sufficiently) likely that $x_0$ dies before $v^*$ spawns a mutant. In more detail, Definition 5.5 defines a stopping time $T_{\mathrm{pa}}$, which is the first time $t$ that one of the following occurs.
(A1) $X_t = \emptyset$, or (A2) $|X_t|$ exceeds a given threshold $m^*$, which is a polynomial in $\log n$, or (A3) by time $t$, $v^*$ has already become a mutant in $X$ more than $b^*$ times, where $b^*$ is about half as large as its number $\ell m$ of in-neighbours, or (A4) $t$ exceeds some threshold $t_{\max}$ which is (very) exponentially large in $n$.
The subscript "pa" is for "pseudoabsorption time" because (A1) implies that the Moran process absorbs by going extinct and (A2) is a prerequisite for absorbing by fixating. The proof of Lemma 5.28 shows that, with sufficiently high probability, (A2)-(A4) do not hold, and so the Moran process X must go extinct by T pa .
Conditioning makes it difficult to prove that (A2)-(A4) fail. To alleviate this, we divide the mutants into groups called "strains," which are easier to analyse. In particular, a strain contains all of the descendants of a particular mutant spawned by x 0 . Formally, for each positive integer i, the i'th strain S i is defined as a mutant process in Definition 5.6. Informally, S i is "born" at the i'th time at which x 0 spawns a mutant in X. It "dies" when all of the descendants of this spawn have died. It is "dangerous" if one (or more) of these descendants spawns a mutant onto v * before T pa .
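To make the strain bookkeeping concrete, here is a simplified Python sketch (our illustration, not the paper's construction: it keeps a single strain tag per vertex, whereas Definition 5.6 allows strains to intersect, and it continues to treat $x_0$ as the root even if $x_0$ dies and is later re-infected).

```python
import random

def count_strains(adj, r, x0, rng=random):
    """Discrete-event Moran run that tags each mutant (other than x0)
    with the strain it descends from; returns the number of strains born."""
    mutants = {x0}
    strain_of = {}      # mutant vertex -> index of the strain it belongs to
    born = 0
    while 0 < len(mutants) < len(adj):
        weights = [r if v in mutants else 1.0 for v in adj]
        src = rng.choices(list(adj), weights=weights, k=1)[0]
        tgt = rng.choice(adj[src])
        if src in mutants:
            if src == x0:
                born += 1                        # x0 spawns: a strain is born
                if tgt != x0:
                    strain_of[tgt] = born
            elif tgt != x0:
                strain_of[tgt] = strain_of[src]  # descendants inherit the tag
            mutants.add(tgt)
        else:                                    # a nonmutant spawn: the target dies
            mutants.discard(tgt)
            strain_of.pop(tgt, None)
        # Invariant mirroring the text: every mutant except x0 lies in a strain.
        assert all(v in strain_of for v in mutants if v != x0)
    return born

adj = {0: [1], 1: [2], 2: [0]}                   # a toy directed 3-cycle
print("strains born:", count_strains(adj, r=1.2, x0=0))
```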
Lemma 5.7 defines eight events, P 1 -P 8 . These are defined in such a way that we can show (in the proof of Lemma 5.28) that if P 1 -P 8 simultaneously occur, then (A2)-(A4) do not hold. The definitions are engineered in such a way that we can also show that it is fairly likely that they do hold simultaneously-this takes up most of the proof. Informally, the events are defined as follows.
$P_1$: no mutant star-clock with source $v^*$ triggers early enough to matter, so that (given $P_3$) $v^*$ spawns no mutant in $X$ before $T_{\mathrm{pa}}$. $P_2$: $x_0$ dies by time $t_{x_0}$ (unless $T_{\mathrm{pa}}$ occurs first). $P_3$: $v^*$ is a mutant for at most one unit of time up to time $T_{\mathrm{pa}}$. $P_4$: The Moran process absorbs (either fixates or goes extinct) by time $t_{\max}/2$. $P_5$: Break $[0, t_{x_0}]$ into intervals of length $(\log n)^2$. During each interval, $x_0$ spawns at most $2r(\log n)^2$ mutants in $X$. $P_6$: Define $s$ to be around $3rt_{x_0}$. Each of the strains $S_1, \ldots, S_s$ spawns at most $\log n$ mutants before $T_{\mathrm{pa}}$. $P_7$: Each of the strains $S_1, \ldots, S_s$ dies within $(\log n)^2$ steps. $P_8$: At most $b^*/\log n$ of $S_1, \ldots, S_s$ are dangerous.
The rough sketch of Lemma 5.28 is as follows. P 1 and P 3 guarantee that v * does not spawn a mutant in X until after T pa . This together with P 2 and P 3 guarantees that the only mutants in the process before time T pa are part of strains that are born before t x 0 . By P 5 , there are at most s such strains. By P 6 , each of these strains only has about log n mutants. Together with P 7 , this implies that (A2) does not hold at t = T pa . P 8 and P 6 imply that (A3) does not hold at t = T pa . Finally, P 4 implies that (A4) does not hold at t = T pa .
The bulk of the proof involves showing (Lemma 5.7) that P 1 -P 8 are sufficiently likely to simultaneously occur. Of these, P 3 -P 7 are all so likely to occur that the probability that they do not occur can be subtracted off using a union bound (so conditioning on the other P i 's is not an issue). The majority of the failure probability comes from the probability that P 2 does not occur. This is handled in the straightforward Lemma 5.8, which gives a lower bound on the probability that P 1 and P 2 both occur. The remaining event, P 8 , is sufficiently unlikely to occur that careful conditioning is required. This is (eventually) handled in Lemma 5.27, which shows that it is fairly likely to occur, conditioned on the fact that both P 1 and P 2 occur.
In order to get a good estimate of the probability that a strain is dangerous (in P 8 ), we need to consider the number of mutants spawned from the "layer" of the strain closest to the centre vertex v * . In order to do this, we define a new mutant process called the "head" of a strain. Strains and heads-of-strains share some common properties, and they are analysed together as "colonies" in Section 5.4.1. Informally (see Definition 5.12) a "colony" is a mutant process Z whose mutants are in V 1 ∪ · · · ∪ V k−1 (and not in V 0 or V k ). Once a colony becomes empty, it stays empty. Since a colony is a mutant process but not necessarily a Moran process, vertices may enter and/or leave whenever a clock triggers but we say that the colony is hit when a vertex leaves a colony specifically because a nonmutant is spawned onto it in the underlying Moran process. We define the "spawning chain" Y Z of a colony and show that it increases whenever the colony spawns a mutant and that it only decreases when the colony is hit. By analysing the jump chain of a spawning chain (see Definition 5.15), we are able to obtain the desired bounds on the probability that P 6 , P 7 , and P 8 fail to occur.

The Small m Case
We first show that if m is small, then x 0 is likely to be born outside V k and die before spawning a mutant. This is relatively easy.
LEMMA 5.2. Suppose $k \ge 2$. Choose $x_0$ uniformly at random from $V(G_{k,\ell,m})$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. The extinction probability of $X$ is at least $1/(2(m+r))$.
PROOF. Note that $P(x_0 \in V_k) = \ell m^k/n$. First suppose $m = 1$ and $k \ge 2$. We have $P(x_0 \notin V_k) \ge 1/2$. Moreover, if $x_0 \notin V_k$, then $x_0$ has an in-neighbour of out-degree 1. In this case, with probability at least $1/(1+r)$, $x_0$ dies before spawning a mutant. It follows that $X$ goes extinct with probability at least $1/(2(1+r))$, as required. Now suppose $m, k \ge 2$. If $x_0 \in V_i$ for some $i < k$, then $x_0$ has at least $m^{i+1}$ in-neighbours with outdegree $m^i$, so with probability at least $m/(m+r)$, $x_0$ dies before spawning a mutant.
Hence, $X$ has extinction probability at least $P(x_0 \notin V_k)\cdot m/(m+r) \ge \frac{1}{2m}\cdot\frac{m}{m+r} = \frac{1}{2(m+r)}$.

The Large m Case
We now consider the case where m is large.
For the remainder of Section 5, we will fix an arbitrary vertex x 0 ∈ V k and let X be the Moran process with G(X) = G k, ,m and X 0 = {x 0 }. We first define some constants. Then, we define a "pseudoabsorption time" T pa which, by (A1) and (A2) in Definition 5.5, is at most the absorption time of the Moran process X.
Definition 5.4 (Constants). We will use the following definitions for the rest of Section 5.
Definition 5.5 (The Stopping Time $T_{\mathrm{pa}}$). We define the stopping time $T_{\mathrm{pa}}$ to be the first time $t$ at which one of (A1)–(A4), as listed in Section 5.2, occurs. The definition of $T_{\mathrm{pa}}$ is motivated as follows. Certainly (A1) must hold when the process $X$ goes extinct, and (A2) must hold before $X$ fixates. If (A3) holds, we expect $v^*$ to spawn a mutant in $X$, which makes the process significantly harder to analyse (so we will stop the analysis before this). Actually, this is also why we stop at $m^*$ mutants in (A2): if the process contains too many mutants, then it becomes harder to analyse. Finally, (A4) ensures that $T_{\mathrm{pa}} < \infty$.
We will prove that, with sufficiently high probability, (A2)-(A4) do not hold, and so the Moran process X must go extinct by T pa . To do this, we will group the descendants of each mutant spawned by x 0 in X together and analyse each group as a separate mutant process.
Definition 5.6 (Strains). Consider the Moran process $X$ with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$ for some $x_0 \in V_k$. For each positive integer $i$, we define a mutant process $S_i$ (called the $i$'th strain) with $G(S_i) = G_{k,\ell,m}$. Let $T^i_{\mathrm{b}}$ be the $i$'th time at which $x_0$ spawns a mutant in $X$, or $\infty$ if $x_0$ spawns fewer than $i$ mutants. The subscript "b" stands for the "birth" of the strain. Clearly, $T^i_{\mathrm{b}}$ is a function of the evolution of the process $X$. We let $S_{i,t} = \emptyset$ for all $t < T^i_{\mathrm{b}}$. Now suppose $T^i_{\mathrm{b}} = \tau_j < \infty$; let $u_i$ be the vertex onto which the mutant is spawned in $X$, and let $S_{i,\tau_j} = \{u_i\}$. The process $S_i$ now evolves discretely as follows. Suppose we are given $S_{i,\tau_a}$ for some $a \ge j$. We define $S_{i,t} = S_{i,\tau_a}$ for all $t \in (\tau_a, \tau_{a+1})$. We then define $S_{i,\tau_{a+1}}$ by dividing into cases. Case 1: some vertex $u \in S_{i,\tau_a}$ spawns a mutant onto some vertex $v$ in $X$ at time $\tau_{a+1}$. (The remaining cases, and the death time $T^i_{\mathrm{d}}$ of the strain, are defined analogously; the subscript "d" stands for the "death" of the strain.) Note that the definition maintains the invariant that $S_{i,t} \subseteq X_t$. Finally, we define the notion of a dangerous strain. The strain $S_i$ is said to be dangerous if it spawns a mutant onto $v^*$ during the interval $[0, T_{\mathrm{pa}}]$.
Note that we allow $S_{i,t}$ and $S_{i',t}$ to intersect for $i \ne i'$. Intuitively, $S_{i,t}$ is the set of all living descendants at time $t$ (within $V_1 \cup \cdots \cup V_{k-1}$) of the $i$'th mutant spawned by $x_0$ in $X$.
We now set out a list of events P 1 , . . . , P 8 which, as we will see in the proof of Lemma 5.28, together imply extinction. We state these events and claim they hold with reasonable probability in Lemma 5.7.
LEMMA 5.7. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge r^{\sqrt{\log_r n}}$, and $2 \le k \le \sqrt{\log_r n}$. Suppose $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. With probability at least $r^{-k}/(\log n)^{C_r+7}$, all of the events $P_1$–$P_8$ occur in (X).
$P_5$: for all $j \le t_{x_0}/(\log n)^2$, $x_0$ spawns at most $2r(\log n)^2$ mutants in $I_j$ in $X$. The majority of the failure probability in Lemma 5.7 comes from $P_2$. In addition, $P_1$ and $P_8$ may fail with reasonably high probability, so we will need to be careful with conditioning for these events. The remaining events each occur with high enough probability that we can apply a union bound.
We first show that P 1 ∩ P 2 occurs with reasonable probability.
LEMMA 5.8. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge r^{\sqrt{\log_r n}}$, and $2 \le k \le \sqrt{\log_r n}$. Suppose $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, in the process (X), $P(P_1 \cap P_2) \ge r^{-k}/(\log n)^{C_r+6}$. PROOF. Let $n_0$ be a large integer relative to $r$. Note that $P_1$ and $P_2$ depend only on the star-clock process $P^*(G)$, and so they are independent by the definition of $P^*(G)$. The sum of the parameters of the star-clocks in $\{M^*_{(v^*,u)} \mid u \in V_k\}$ is $r$, so the definition of Poisson processes ensures that $P(P_1) = e^{-r}$.
The assumptions in the statement of the lemma guarantee that $t_{x_0} \ge 4$. The bound on $P(P_2)$ comes from the parameter of the relevant star-clock $N^*$; using Equation (4), we get the required bound. Since $P_1$ and $P_2$ are independent, the result follows.
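The identity $P(P_1) = e^{-r}$ used in the proof above is just the probability that a Poisson process of total rate $r$ has no arrival in a unit interval, which can be checked numerically (our check):

```python
import math
import random

r, trials = 1.5, 200_000
# A rate-r Poisson process has no arrival in [0, 1] iff its first
# arrival time, an Exponential(r) variable, exceeds 1.
none = sum(random.expovariate(r) > 1 for _ in range(trials))
print("simulated:", none / trials, " exact:", math.exp(-r))
```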
LEMMA 5.9. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge r^{\sqrt{\log_r n}}$, and $k \ge 2$. Suppose $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then $P(P_3) \ge 1 - 1/n$.
Now consider any $i \in [b^*]$ and any $t_0 \ge 0$. Consider any possible value $f$ for $F_{t_0}(X)$ that is consistent with $T^i_{\mathrm{m}} = t_0$. We will show that Equation (24) holds. The event $T^i_{\mathrm{m}} = t_0$ is determined by $f$. Since $T^i_{\mathrm{m}} \le T_{\mathrm{pa}}$, we have $t_0 \le T_{\mathrm{pa}}$. The value $f$ also determines whether or not $t_0 = T_{\mathrm{pa}}$. If so, then Equation (24) is trivial since $T^i_{\mathrm{n}} = T_{\mathrm{pa}} = t_0$, so $P(T^i_{\mathrm{n}} - t_0 \le y \mid F_{t_0}(X) = f) = 1$. From now on, we assume that $f$ implies that $T_{\mathrm{pa}} > t_0$. Since $T^i_{\mathrm{m}} = t_0$, this implies that $v^* \in X_{t_0}$. Let $B$ be the set of all nonmutant clocks with target $v^*$ and let $\Xi$ encapsulate the behaviour of every clock in $C(G_{k,\ell,m}) \setminus B$ over the interval $(t_0, t_{\max}]$. (That is, let $\Xi$ contain a list, for each of these clocks, of all times that it triggered during this interval.) Consider any value $\xi$ of $\Xi$ which is consistent with $F_{t_0}(X) = f$. We now define a time $t_{\mathrm{pa}}$ which depends only on the values $f$ and $\xi$. To do so, consider the situation in which $F_{t_0}(X) = f$, $\Xi = \xi$, and no clock in $B$ triggers in $(t_0, t_{\max}]$, so that the evolution of $X$ in this interval is entirely determined by $f$ and $\xi$. Let $t_{\mathrm{pa}}$ be the time at which $T_{\mathrm{pa}}$ would occur in this situation. It is easy to see that $T^i_{\mathrm{n}} \le t_{\mathrm{pa}}$: if a nonmutant is spawned onto $v^*$ in $X$ at some time $t \in (t_0, t_{\mathrm{pa}}]$, then $T^i_{\mathrm{n}} \le t \le t_{\mathrm{pa}}$; otherwise, the evolution of $X$ in $(t_0, t_{\mathrm{pa}}]$ is exactly the same as it would be if no clocks in $B$ triggered, so $T_{\mathrm{pa}} = t_{\mathrm{pa}} = T^i_{\mathrm{n}}$. We will now prove Equation (24). First, if $y \ge t_{\mathrm{pa}} - t_0$, then since $T^i_{\mathrm{n}} \le t_{\mathrm{pa}}$, the bound is immediate. So suppose $y < t_{\mathrm{pa}} - t_0$. Let $t_1 < \cdots < t_z$ be the times in $(t_0, t_0 + y]$ at which clocks in $C(G_{k,\ell,m}) \setminus B$ trigger, and let $t_{z+1} = t_0 + y$. Thus, $t_0 < \cdots < t_z \le t_{z+1} < t_{\mathrm{pa}}$. For all $h \in \{0, \ldots, z\}$, let $\chi(h)$ be the value that $X_{t_h}$ would take in the situation where no clock in $B$ triggers in $(t_0, t_h]$, $F_{t_0}(X) = f$, and $\Xi = \xi$. Thus, $t_{\mathrm{pa}}$, $z$, $t_0, \ldots, t_{z+1}$, and $\chi(0), \ldots, \chi(z)$ are all uniquely determined by $f$ and $\xi$.
For each $h \in [z+1]$, let $E_h$ be the event that a nonmutant is spawned onto $v^*$ in the interval $(t_{h-1}, t_h)$. Note that, with probability 1, no nonmutant is spawned onto $v^*$ at any of the times $t_h$ themselves; thus, the event $\{T^i_{\mathrm{n}} - t_0 > y\}$ coincides with $\overline{E_1} \cap \cdots \cap \overline{E_{z+1}}$. Now fix $h \in [z+1]$, and consider any possible value $f_{h-1}$ of $F_{t_{h-1}}(X)$ which implies that $F_{t_0}(X) = f$ and $\overline{E_1} \cap \cdots \cap \overline{E_{h-1}}$ and is consistent with $\Xi = \xi$. Consider the evolution of $X$ given $F_{t_{h-1}}(X) = f_{h-1}$ and $\Xi = \xi$. Since $\overline{E_1} \cap \cdots \cap \overline{E_{h-1}}$ occurs, no nonmutant is spawned onto $v^*$ in the interval $(t_0, t_{h-1}]$, and so $X_{t_{h-1}} = \chi(h-1)$. Moreover, $X$ remains constant in $[t_{h-1}, t_h)$ unless a nonmutant is spawned onto $v^*$. Thus, given that $F_{t_{h-1}}(X) = f_{h-1}$ and $\Xi = \xi$, $E_h$ occurs if and only if a nonmutant clock whose source is in $V_1 \setminus \chi(h-1)$ triggers in the interval $(t_{h-1}, t_h)$. Since $t_{h-1} < t_{\mathrm{pa}}$, property (A2) in the definition of $T_{\mathrm{pa}}$ (Definition 5.5) ensures that $|\chi(h-1)| \le m^*$.

Combining this with Equation (26) (by multiplying over all $h \in [z+1]$) yields Equation (27).
Equation (24) follows from Equations (25) and (27). Equation (24) shows that $\sum_{i=1}^{b^*}(T^i_{\mathrm{n}} - T^i_{\mathrm{m}})$ is dominated from above by a sum $S$ of $b^*$ i.i.d. exponential random variables with rate $\ell m'$, where $m' = m - m^*$. The hypothesis of the lemma guarantees that $m \ge 4m^*$, so $b^* = \lceil \ell m/2\rceil \le 2\ell(m - m^*)/3 = 2\ell m'/3$. Therefore, by Corollary 2.4, $P(S < 1) \ge 1 - e^{-\ell m'/16} \ge 1 - 1/n$, where the last inequality holds since $\log n \le \ell(m - m^*)/16$ by the hypothesis of the lemma.
LEMMA 5.10. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$ and $k \ge 2$. Suppose $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then $P(P_4) \ge 1 - 1/n$.
PROOF. Let $n_0$ be a large constant. For all $i \in \mathbb{Z}_{\ge 1}$, let $e(i,1), \ldots, e(i,n-1)$ be the sequence of edges returned by a breadth-first search of $G_{k,\ell,m}$ starting from $v_{j(i)}$. Then $E_i$ holds if and only if clocks in $C(G_{k,\ell,m})$ trigger at least $n-1$ times in $J_i$, and the first $n-1$ such trigger events correspond to $M_{e(i,1)}, \ldots, M_{e(i,n-1)}$, in that order. Note that if $E_i$ holds for some $i$, then the Moran process reaches absorption no later than the end of $J_i$. Now let $f_i$ be any possible value for the filtration of (X) immediately before $J_i$; this event contains all information about $E_1, \ldots, E_{i-1}$ and $j(i)$. We will show that Equation (28) holds. By Corollary 2.2, the probability that clocks in $C(G_{k,\ell,m})$ trigger at least $n$ times in $J_i$ is at least $1 - e^{-n^2/16}$. Thus, by a union bound, we have established Equation (28).
LEMMA 5.11. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$ and $k \ge 2$. Suppose $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then $P(P_5) \ge 1 - 1/n$. PROOF. Let $n_0$ be large relative to $r$. For each $j \in \mathbb{Z}_{\ge 1}$, the number of times in $I_j$ that $x_0$ spawns a mutant in $X$ is dominated from above by the number of times in $I_j$ that mutant clocks with source $x_0$ trigger, which follows a Poisson distribution with parameter $r \cdot \mathrm{len}(I_j) = r(\log n)^2$. By Corollary 2.2, we have $P(x_0$ spawns at least $2r(\log n)^2$ mutants in $I_j$ in $X) \le e^{-r(\log n)^2/3} \le e^{-(\log n)^2/3}$.
We have $t_{x_0}/(\log n)^2 \le m^k \le n$, so the result follows by a union bound over the $I_j$'s with $j \le t_{x_0}/(\log n)^2$.

Colonies and Spawning Chains.
To deal with P 6 , P 7 , and P 8 , we will need to analyse mutant spawns and deaths in strains and in the "bottom layers" of strains. We will use similar ideas for both cases, so to avoid redundancy we introduce the following general definitions.
Definition 5.12 (Colonies). Fix $G_{k,\ell,m}$ and fix $x_0 \in V_k$. Consider the Moran process $X$ with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. A colony is a mutant process $Z$ with $G(Z) = G_{k,\ell,m}$ satisfying the following conditions.
-If, for some $t < t'$, $Z_t$ is nonempty and $Z_{t'} = \emptyset$, then $Z_{t''} = \emptyset$ for all $t'' \ge t'$.
We define the start and end times of any colony Z as follows. The subscript "s" stands for "start" and the subscript "e" stands for "end." Further, we say that a colony Z in a Moran process X is hit at time t if t = τ j for some j ≥ 1 and there is a vertex v ∈ Z τ j−1 such that some vertex u spawns a nonmutant onto v in X at time τ j .
Since a colony Z is a mutant process but not necessarily a Moran process, vertices may enter and/or leave Z at any time τ j , as long as the conditions of Definition 5.12 are respected. A colony being hit at a particular time means that a vertex left the colony at that time specifically because a nonmutant was spawned onto it in the underlying Moran process.
Note that a strain (Definition 5.6) is a colony. Now consider the Moran process $X$ with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$, and let $Z$ be a colony. The spawning chain $Y^Z$ of the colony $Z$ is a continuous-time stochastic process with states in $\mathbb{Z}$ which evolves as follows. First, for all $t \in [0, T_{\mathrm{s}}(Z)]$, we define $Y^Z_t = 1$. We next define $Y^Z_t$ for all $t \in (T_{\mathrm{s}}(Z), T_{\mathrm{e}}(Z)]$. If $T_{\mathrm{s}}(Z) \ge T_{\mathrm{e}}(Z)$, there is nothing to define. Suppose instead that $T_{\mathrm{s}}(Z) < T_{\mathrm{e}}(Z)$, so that $T_{\mathrm{s}}(Z) = \tau_i$ for some $i$. Now for any $j \ge i$ with $\tau_j < T_{\mathrm{e}}(Z)$, suppose that we are given $Y^Z_{\tau_j}$. If $\tau_{j+1} > T_{\mathrm{e}}(Z)$, then for all $t \in (\tau_j, T_{\mathrm{e}}(Z)]$, we set $Y^Z_t = Y^Z_{\tau_j}$. Otherwise, for all $t \in (\tau_j, \tau_{j+1})$, we set $Y^Z_t = Y^Z_{\tau_j}$, and we define $Y^Z_{\tau_{j+1}}$ according to the following cases.
Case 1: $Z$ spawns a mutant at time $\tau_{j+1}$. We set $Y^Z_{\tau_{j+1}} = Y^Z_{\tau_j} + 1$. Case 2: $Z$ is hit at time $\tau_{j+1}$. With the probability given by Equation (29) (independently of all other events), we set $Y^Z_{\tau_{j+1}} = Y^Z_{\tau_j} - 1$; with the remaining probability, we set $Y^Z_{\tau_{j+1}} = Y^Z_{\tau_j}$. We will show later that the probability in Equation (29) is well defined.
Case 3: Neither Case 1 nor Case 2 holds. We set $Y^Z_{\tau_{j+1}} = Y^Z_{\tau_j}$. We have now defined $Y^Z_t$ for all $t \le T_{\mathrm{e}}(Z)$. Finally, for $t > T_{\mathrm{e}}(Z)$, the spawning chain $Y^Z_t$ evolves independently of (X) as a continuous-time Markov chain on $\mathbb{Z}$ with start state $Y^Z_{T_{\mathrm{e}}(Z)}$ and the transition rate matrix $R$ given below.
Definition 5.14 (The Process (X, Z) and its Filtration). Fix G k, ,m with m > m * and fix x 0 ∈ V k . Consider the Moran process X with G(X) = G k, ,m and X 0 = {x 0 }. Consider a colony Z. Let (X, Z) be the stochastic process consisting of (X) together with the spawning chain Y Z whose evolution is coupled with that of (X) in the manner described previously. The filtration F t ( (X, Z)) of (X, Z) consists of F t ( (X)) together with all information about the transitions of Y Z up to time t.
Definition 5.15 (The Jump Chain Y Z ). Fix G k, ,m with m > m * and fix x 0 ∈ V k . Consider the Moran process X with G(X) = G k, ,m and X 0 = {x 0 }. Consider a colony Z.
We now show that the probability in Equation (29) is well defined. First note that since $m > m^*$, the numerator of Equation (29) is positive. Moreover, since $Z$ is hit at time $\tau_{j+1}$, there must be a vertex $v \in Z_{\tau_j}$ and a vertex $u \in N^-(v) \setminus X_{\tau_j}$ that spawned onto $v$ at time $\tau_{j+1}$. So, certainly, the denominator of Equation (29) is nonzero. Now, let $v \in Z_{\tau_j}$ be arbitrary, and let $i$ be the integer such that $v \in V_i$. It follows that the denominator of Equation (29) is at least its numerator, so Equation (29) is, as claimed, a probability. Note that either $T_{\mathrm{e}}(Z) = T_{\mathrm{pa}}$ or $Z_t = \emptyset$ for all $t \in [T_{\mathrm{e}}(Z), T_{\mathrm{pa}}]$. In either case, we see that spawning chains satisfy two important properties: (Z1) up to time $T_{\mathrm{e}}(Z)$, $Y^Z$ increases exactly when $Z$ spawns a mutant; and (Z2) up to time $T_{\mathrm{e}}(Z)$, $Y^Z$ decreases only at times when $Z$ is hit.
Intuitively, we expect that for all $t \in (T_{\mathrm{s}}(Z), T_{\mathrm{e}}(Z)]$, the spawning chain $Y^Z_t$ should behave similarly to a continuous-time Markov chain on $\mathbb{Z}$ which increments with rate $r|Z_t|$ and decrements with rate $m'|Z_t|$. For technical convenience we will not prove this. Instead, we will prove that the jump chain $Y^Z(\cdot)$ evolves as a random walk on $\mathbb{Z}$ with appropriate probabilities.
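Concretely, the jump chain of the next lemma is a biased random walk: from any positive state it steps up with probability $p = r/(r+m')$ and down otherwise, so the chance that it takes many up-steps before hitting 0 decays geometrically. The sketch below (ours; $p$ is a free parameter standing in for $r/(r+m')$) estimates this and compares it against the $(4p)^y$ decay rate that appears later, in the proof of Lemma 5.21:

```python
import random

def upsteps_before_zero(p, rng=random):
    """Walk from 1: up w.p. p, down w.p. 1-p; count up-steps before hitting 0."""
    state, ups = 1, 0
    while state > 0:
        if rng.random() < p:
            state += 1
            ups += 1
        else:
            state -= 1
    return ups

p, y, trials = 0.1, 3, 200_000
hits = sum(upsteps_before_zero(p) >= y for _ in range(trials))
print(f"P(at least {y} up-steps) ~= {hits / trials:.4f};"
      f" geometric rate (4p)^y = {(4 * p) ** y:.4f}")
```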
Definition 5.16. Let $\Phi$ contain, for each star-clock $C \in A^*$, a list of the times at which $C$ triggers in $[0, t_{\max}]$.
The star-clocks in $A^*$ are part of the star-clock process $P^*(G_{k,\ell,m})$, so of course the times in $\Phi$ are "local" and do not necessarily correspond to the times that clocks in $A$ trigger. However, these are related by Observation 3.7.
In order to prove Lemma 5.7, we will need to show that $P_8$ is reasonably likely to occur, conditioned on $P_1 \cap P_2$. The following lemma (among others) will be used for this purpose, so we prove it conditioned on $\Phi$.
LEMMA 5.17. Let $k \ge 2$ and $m \ge 5m^*$. Fix $G_{k,\ell,m}$ and fix $x_0 \in V_k$. Consider the Moran process $X$ with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Let $Z$ be a colony. Let $t_0$ be a nonnegative real number, and let $f_0$ be a possible value of the filtration $F_{t_0}((X, Z))$. Let $\varphi$ be a possible value of $\Phi$. If the three events $T_{\mathrm{s}}(Z) = t_0$, $F_{t_0}((X, Z)) = f_0$, and $\Phi = \varphi$ are consistent, then conditioned on these three events, the jump chain $Y^Z(\cdot)$ evolves as a random walk on $\mathbb{Z}$ with initial state 1 which, from any positive state, increases by 1 with probability $r/(r+m')$ and decreases by 1 with probability $m'/(r+m')$.
PROOF. The definition of the jump chain Y Z implies that Y Z (0) = 1.
Now consider an $i \in \mathbb{Z}_{\ge 0}$ and a nonnegative real number $t_i$. (If $i = 0$, then $t_0$ is already defined in the statement of the lemma. Otherwise, consider any $t_i \ge t_0$.) Suppose that $f_i$ is a possible value for the filtration $F_{t_i}((X, Z))$ and that the events $T_{\mathrm{s}}(Z) = t_0$, $F_{t_0}((X, Z)) = f_0$, $\Phi = \varphi$, $F_{t_i}((X, Z)) = f_i$, and $T_i = t_i$ are consistent. Note that all of these events are determined by $F_{t_i}((X, Z)) = f_i$ and $\Phi = \varphi$, which also determine $Y^Z(0), \ldots, Y^Z(i)$. We therefore wish to show that Equation (30) holds. Let $t_{i+1} > t_i$ be arbitrary. Let $\Xi$ contain, for each clock $C \in C(G_{k,\ell,m})$, a list of the times at which $C$ triggers in $(t_i, t_{i+1})$. Suppose that $\xi$ is a possible value of $\Xi$ such that the event $\Xi = \xi$ is consistent with the events $F_{t_i}((X, Z)) = f_i$, $T_{i+1} = t_{i+1}$, and $\Phi = \varphi$. Let $F$ be the intersection of these four events. Let $E_1$ be the event that $Y^Z_{t_{i+1}} = Y^Z_{t_i} + 1$ and let $E_2$ be the event that $Y^Z_{t_{i+1}} = Y^Z_{t_i} - 1$. By integrating over all choices of $t_{i+1}$ and $\xi$, Equation (30) will follow from Equation (31). Since $F$ implies $T_i = t_i$ and $T_{i+1} = t_{i+1}$, it implies $E_1 \cup E_2$. Also, $E_1 \cap E_2$ is empty. $F$ determines the evolution of $(X, Z)$ throughout $[0, t_{i+1})$. In particular, it determines whether the event $T_{\mathrm{e}}(Z) < t_{i+1}$ occurs. We split into cases accordingly.
Case 1: F implies that t i+1 > T e (Z). In this case, conditioned on F, the behaviour of Y Z t at t i+1 is governed entirely by the transition rate matrix R and is therefore independent of and (X). The definition of the spawning chain Y Z gives Equation (31).
Case 2: $F$ implies that $t_{i+1} \le T_{\mathrm{e}}(Z)$. Let $T^- = \max\{\tau_j \mid \tau_j < t_{i+1}\}$, and let $t^-$ be the unique value of $T^-$ consistent with $F$. Let $\chi_{t^-}$ be the unique value of $X_{t^-}$ consistent with $F$, and let $\zeta_{t^-}$ be the unique value of $Z_{t^-}$ consistent with $F$. Define $B_1$ to be the set of clocks whose triggering at time $t_{i+1}$ would cause $Z$ to spawn a mutant, define $B_2$ to be the set of nonmutant clocks whose triggering at time $t_{i+1}$ would cause $Z$ to be hit, and let $S$ be the total rate of the clocks in $B_2$. Consider the following events.
-$E_1'$ is the event that a clock in $B_1$ triggers at $t_{i+1}$, and
-$E_2'$ is the event that a clock in $B_2$ triggers at $t_{i+1}$, and an (independent) coin toss (part of the spawning chain), with probability $m'|\zeta_{t^-}|/S$ of coming up heads, comes up heads.
Note that, conditioned on $F$, event $E_1'$ coincides with $E_1$ and $E_2'$ coincides with $E_2$. It is easy to see, using the definition of a Poisson process, that the conditional probabilities of $E_1'$ and $E_2'$ given $E_1' \cup E_2'$ are $r/(r+m')$ and $m'/(r+m')$, respectively. In order to establish Equation (31), we would like to show that conditioning on $F$ is equivalent to conditioning on $E_1' \cup E_2'$. This is straightforward, apart from the event $\Phi = \varphi$, which is part of $F$. Unfortunately, we need the result of the lemma to be conditioned on $\Phi = \varphi$ and not merely on the rest of $F$, so the rest of this proof is merely technical, and is to deal with this.
To proceed, we consider the four events making up F.
-First, consider the event $F_{t_i}((X, Z)) = f_i$. Let $\hat f_i$ be the induced value of $F_{t_i}((X))$. The value $f_i$ consists of $\hat f_i$, together with the extra information about the transitions of $Y^Z$ up to time $t_i$ (giving the outcomes of the independent coin tosses that are part of the spawning chain $Y^Z$). The value $\hat f_i$ contains, for each clock $C \in C(G_{k,\ell,m})$, a list of the times at which $C$ triggers in $[0, t_i]$. It also contains information about the times that the star-clocks trigger, according to the coupling in Section 3.4. Using Observation 3.7, we could translate $\hat f_i$ into a (unique) equivalent event which is a list of times at which certain star-clocks trigger. We will therefore write $F^*_{t_i} = f^*_i$ to denote the event $F_{t_i}((X, Z)) = f_i$, expressed entirely in terms of star-clock triggers and outcomes of spawning-chain coin tosses.
-Similarly, given $F^*_{t_i} = f^*_i$, we can uniquely express $\Xi = \xi$ as an event which is a list of times at which certain star-clocks trigger. We will denote this event as $\Xi^* = \xi^*$. The definitions of $t^-$, $\chi_{t^-}$, and $\zeta_{t^-}$ can be deduced from $f^*_i$ and $\xi^*$.
Note that for every star-clock with source $u$ in $B^*_1 \cup B^*_2$, the quantities $i_{\mathrm{m}}(u, X, t_{i+1})$ and $i_{\mathrm{n}}(u, X, t_{i+1})$ can be deduced from $f^*_i$ and $\xi^*$. Let $E^*_1$ be the event that a star-clock with source $u$ in $B^*_1$ triggers at $i_{\mathrm{m}}(u, X, t_{i+1})$. Note that by Observation 3.7, $F^*_{t_i} = f^*_i$ and $\Xi^* = \xi^*$ imply that $E^*_1$ is equivalent to $E_1'$. Similarly, let $E^*_2$ be the event that a star-clock with source $u$ in $B^*_2$ triggers at $i_{\mathrm{n}}(u, X, t_{i+1})$ and the appropriate coin toss comes up heads, so that $F^*_{t_i} = f^*_i$ and $\Xi^* = \xi^*$ imply that $E^*_2$ is equivalent to $E_2'$.

We now have the events $F^*_{t_i} = f^*_i$, $\Xi^* = \xi^*$, and $E^*_1 \cup E^*_2$ expressed entirely in terms of star-clock triggers and coin tosses. Finally, since $B^*_1 \cup B^*_2$ is disjoint from $A^*$, we conclude that, conditioned on $F^*_{t_i} = f^*_i$, $\Xi^* = \xi^*$, and $E^*_1 \cup E^*_2$, the event $E^*_1$ is independent of $\Phi$ (by independence of star-clocks in the star-clock process). Thus, we obtain Equation (31) by translating the events back to their original formulation.
We next prove that, with high probability, Y Z transitions many times shortly after T s (Z).
LEMMA 5.18. There exists n 0 > 0 such that the following holds for all n ≥ n 0 , m ≥ 5m * , and k ≥ 2. Fix G k, ,m with n vertices and fix x 0 ∈ V k . Consider the Moran process X with G(X) = G k, ,m and X 0 = {x 0 }. Let Z be a colony. Let t 0 be a nonnegative real number, and let f 0 be a possible value of the filtration F t 0 ( (X, Z)). If the events T s (Z) = t 0 and F t 0 ( (X, Z)) = f 0 are consistent, then conditioned on these events, with probability at least 1 − e −(log n) 2 /16 , Y Z increases at least 2 log n times in the interval [t 0 , t 0 + (log n) 2 ] .
PROOF. Let P be a Poisson process with rate r. Conditioned on T s (Z) = t 0 and F t 0 ( (X, Z)) = f 0 , we will couple the evolution of (X, Z) from time t 0 with that of P. The coupling will have the property that every time P triggers, Y Z increases.
Given the coupling, we can conclude that the probability that Y Z increases at least 2 log n times in the interval [t 0 , t 0 + (log n) 2 ] is at least the probability that P triggers at least 2 log n times in an interval of length (log n) 2 . This is the probability that a Poisson random variable W with parameter ρ = r(log n) 2 has value at least 2 log n. We have 2 log n < 2ρ/3, so by Corollary 2.2, P(W ≤ 2 log n) ≤ exp(−ρ/16) ≤ exp(−(log n) 2 /16).
It remains to give the details of the coupling. Roughly, the coupling will be constructed using the sequence $\tau_1, \tau_2, \ldots$. However, one technical detail arises, since $T_{\mathrm{e}}(Z)$ might not occur at one of the instants $\tau_1, \tau_2, \ldots$. So, for the purposes of the proof, let $\hat\tau_1, \hat\tau_2, \ldots$ be the increasing sequence containing $T_{\mathrm{e}}(Z)$ and all $\tau_1, \tau_2, \ldots$ from (X) (and nothing else). This random sequence is a function of the evolution of $(X, Z)$. Now, conditioned on $T_{\mathrm{s}}(Z) = t_0$ and $F_{t_0}((X, Z)) = f_0$, there is a nonnegative integer $j$ such that $\hat\tau_j = t_0$. We will define the coupling from each $\hat\tau_i$ for $i \ge j$.
So consider $i \ge j$ and suppose that, for some time $t_i$ and some filtration value $f_i$, we have $\hat\tau_i = t_i$ and $F_{t_i}((X, Z)) = f_i$. To continue the coupling in the open interval from $t_i$, there are two cases. It is easy to determine which case applies, using $f_i$.
If T e (Z) ≤ t i , then the evolution of Y Z after time t i is a continuous-time Markov chain with transition matrix R, evolving independently of (X). The rate of an upwards transition in R is r so use the triggering of P to dictate these upwards transitions.
If T e (Z) > t i , then Z t i is nonempty, so choose some u ∈ Z t i . Now use the triggering of P to dictate the triggering of the mutant clocks with source u (which together have rate r). Thus, every time P triggers, a mutant clock in C(G k, ,m ) with source u is chosen uniformly at random to trigger.
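The Poisson lower-tail bound used in the proof above, $P(W \le 2\log n) \le e^{-\rho/16}$ for $W$ with mean $\rho = r(\log n)^2$, can be verified numerically by exact summation (our check, in pure Python):

```python
import math

def poisson_lower_tail(rho, k):
    """Exact P(Poisson(rho) <= k)."""
    return sum(math.exp(-rho) * rho**j / math.factorial(j) for j in range(k + 1))

r = 1.5
for n in (50, 200, 1000):
    rho = r * math.log(n) ** 2
    k = int(2 * math.log(n))
    print(f"n={n:4d}  P(W <= 2 log n) = {poisson_lower_tail(rho, k):.3e}"
          f"  <=  {math.exp(-rho / 16):.3e}")
```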

5.4.2. Continuing the Proof of Lemma 5.7. We are now in a position to bound the probabilities with which events $P_6$–$P_8$ occur, using Lemmas 5.17 and 5.18. We start by showing that, up until $T_{\mathrm{e}}(S_i)$, the state of the spawning chain $Y^{S_i}$ is at least as large as the size of the strain $S_i$ (which is a colony).
LEMMA 5.19. Let $k \ge 2$ and $m \ge 5m^*$. Fix $x_0 \in V_k$. Consider the Moran process $X$ with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Let $i$ be a positive integer. Then $|S_{i,\tau_j}| \le Y^{S_i}_{\tau_j}$ for all $j$ with $\tau_j \le T_{\mathrm{e}}(S_i)$.
PROOF. We will prove the claim by induction on $j$. If $T_{\mathrm{s}}(S_i) = T_{\mathrm{e}}(S_i) = T_{\mathrm{pa}}$, then there is nothing to prove, so suppose $T_{\mathrm{s}}(S_i) = \tau_x$ for some $x$. We have already seen that the claim holds for $j = x$. Suppose that the claim holds for some $j \ge x$, and that $\tau_{j+1} < T_{\mathrm{e}}(S_i)$. We will now prove the claim for $j+1$ by dividing into cases.

Case 1: $|S_{i,\tau_{j+1}}| > |S_{i,\tau_j}|$. This case may arise only if $S_i$ spawns a mutant at time $\tau_{j+1}$. Thus, by (Z1), $Y^{S_i}$ also increases at time $\tau_{j+1}$, and the claim is maintained. Case 2: $Y^{S_i}_{\tau_{j+1}} < Y^{S_i}_{\tau_j}$. This case may arise only if the spawning chain $Y^{S_i}$ is decremented at time $\tau_{j+1}$, which by (Z2) may happen only if $S_i$ is hit at time $\tau_{j+1}$. Thus we have $|S_{i,\tau_{j+1}}| = |S_{i,\tau_j}| - 1$, and the claim is again maintained.
It now follows by (Z1) and Corollary 5.20 that $P_6$ occurs if, for each $i \in [s]$, $Y^{S_i}$ increases at most $\log n$ times before reaching zero. We will then be able to apply Lemma 5.17 to reduce this to a simple question about random walks. We state the following lemma somewhat more generally, since we will use it again to deal with $P_8$. LEMMA 5.21. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge 5m^*$, and $k \ge 2$. Fix $x_0 \in V_k$. Consider the Moran process $X$ with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Suppose that $i$ and $y$ are positive integers and that $\varphi$ is a possible value for $\Phi$. Then, conditioned on $\Phi = \varphi$, the probability in (X) that $S_i$ spawns at least $y$ mutants in $(0, T_{\mathrm{pa}}]$ is at most $(20r/m)^y$.
PROOF. By (Z1), the number of mutants $S_i$ spawns in $(0, T_{\mathrm{pa}}]$ is equal to the number of times $Y^{S_i}$ increases in $(T_{\mathrm{s}}(S_i), T_{\mathrm{e}}(S_i)]$. By Corollary 5.20, $Y^{S_i}$ cannot reach zero until $T_{\mathrm{e}}(S_i)$, and so it suffices to prove that
$$P(Y^{S_i} \text{ increases at least } y \text{ times before reaching } 0) \le (20r/m)^y. \qquad (32)$$
Let $t_0$ be a nonnegative real number and let $f_0$ be a possible value of $F_{t_0}((X, S_i))$ consistent with $T_{\mathrm{s}}(S_i) = t_0$ and $\Phi = \varphi$. Recall from Lemma 5.17 the transition probabilities of $Y^{S_i}$, conditioned on $T_{\mathrm{s}}(S_i) = t_0$, $F_{t_0}((X, S_i)) = f_0$, and $\Phi = \varphi$.
For all $j \in \mathbb{Z}_{\ge 1}$, let $E_j$ be the event that $Y^{S_i}$ takes exactly $j$ forward steps from 1 before reaching 0. Let $E_{\ge y} = \bigcup_{j \ge y} E_j$. Since $m \ge 5m^* > r$, $Y^{S_i}$ reaches 0 with probability 1, so we have $P(E_{\ge y}) = \sum_{j \ge y} P(E_j)$. For $Y^{S_i}$ to reach 0 after exactly $j$ forward steps, $Y^{S_i}$ must decrease exactly $j+1$ times, for a total of $2j+1$ steps. Thus, writing $p = r/(r+m')$,
$$P(E_j) \le \binom{2j+1}{j} p^j (1-p)^{j+1} \le (4p)^j.$$
Since $4r/(r+m') \le 1/2$, it follows that
$$P(E_{\ge y}) \le \sum_{j \ge y} (4p)^j \le 2(4p)^y \le 2\Big(\frac{5r}{m}\Big)^y \le \Big(\frac{20r}{m}\Big)^y.$$
Here the penultimate inequality follows since $m \ge 5m^*$, so $m' \ge 4m/5$. Thus, Equation (32) holds, and the result follows.
COROLLARY 5.22. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge 5m^*$, and $k \ge 2$. Fix $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then $P(P_6) \ge 1 - 1/n$. PROOF. By a union bound over $i \in [s]$, using Lemma 5.21, the probability that each of $S_1, \ldots, S_s$ spawns at most $\log n$ mutants in $(0, T_{\mathrm{pa}}]$ is at least $1 - 1/n$, as required. Now, $P_7$ is implied by $P_6$ and Lemma 5.18. LEMMA 5.23. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $k \ge 2$, and $m \ge 5m^*$. Fix $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then $P(P_7) \ge 1 - 2/n$. PROOF. By applying a union bound to Corollary 5.22 and Lemma 5.18, with probability at least $1 - 2/n$, $P_6$ occurs and, for all $i \in [s]$, $Y^{S_i}$ increases at least $2\log n$ times in $[T_{\mathrm{s}}(S_i), T_{\mathrm{s}}(S_i) + (\log n)^2]$. By (Z1) and the occurrence of $P_6$, $Y^{S_i}$ can increase at most $\log n$ times in $[T_{\mathrm{s}}(S_i), T_{\mathrm{e}}(S_i))$, so we must have $T_{\mathrm{e}}(S_i) \le T_{\mathrm{s}}(S_i) + (\log n)^2$. Thus, $P_7$ occurs by the definitions of $T_{\mathrm{s}}(S_i)$ and $T_{\mathrm{e}}(S_i)$.
It remains only to bound the probability of P 8 . Note that while Lemma 5.21 bounds the number of mutants spawned by any vertex in a strain, to tightly bound the probability that the strain is dangerous we need to look at the number of mutants spawned from the "layer" of the strain closest to the centre vertex.
The following lemma relates the behaviour of the spawning chain Y H i to the question of whether or not S i is dangerous.
LEMMA 5.25. Let k ≥ 2 and m ≥ 5m * . Fix x 0 ∈ V k . Let X be the Moran process with G(X) = G k, ,m and X 0 = {x 0 }. Let i be a positive integer. Suppose that S i is dangerous, spawning a mutant onto v * for the first time at some time t sp ≤ T pa . Then t sp ≤ T e (S i ), and the following two statements hold in (X, H i ).
PROOF. First note that since $t_{\mathrm{sp}} \le T_{\mathrm{pa}}$ and $S_i$ is nonempty at $t_{\mathrm{sp}}$, it is immediate that $T_{\mathrm{s}}(S_i) < t_{\mathrm{sp}} \le T_{\mathrm{e}}(S_i)$. We now define some notation. Clearly, $t_{\mathrm{sp}}$ is in the sequence $\tau_1, \tau_2, \ldots$; say $t_{\mathrm{sp}} = \tau_j$. Let $a = |\{t \le t_{\mathrm{sp}} \mid H_i$ spawns a mutant at $t\}|$. We first bound $a$ below. Recall that $h_i(T_{\mathrm{s}}(S_i)) = k-1$, and note that $h_i(\tau_{j-1}) = 1$ since $S_i$ spawns a mutant onto $v^*$ at time $\tau_j$. Moreover, every time $h_i$ decreases, it only decreases by 1 and $H_i$ spawns a mutant. Also, $h_i$ increases (by at least 1) at time $\tau_x$ whenever $H_i$ is hit at $\tau_x$ and $|H_{i,\tau_{x-1}}| = 1$. Finally, note that $H_i$ spawns a mutant at time $\tau_j$. This yields the claimed lower bound on $a$ (the equality in the calculation follows since no two clocks trigger at the same time). By (Z1) this implies part (i) of the result. (In fact, the bound is substantially stronger, and we will use this extra strength later in the proof.) We now bound $a$ above. We say that a layer $V_y$ is empty at time $t$ if $S_{i,t} \cap V_y = \emptyset$, and nonempty otherwise. Since $t_{\mathrm{sp}}$ is the first time that $H_i$ spawns a mutant onto $v^*$, every time $H_i$ spawns a mutant in $[0, t_{\mathrm{sp}})$, a layer must become nonempty in $S_i$. Since $H_i$ spawns no mutants in $[0, T_{\mathrm{s}}(S_i)]$, it follows that $a \le |\{t < t_{\mathrm{sp}} \mid$ for some $y \in [k-1]$, $V_y$ becomes nonempty at $t\}| + 1$.
Since V k−1 becomes nonempty at T s (S i ), it follows that a ≤ |{t < t sp | for some y ∈ [k − 1],V y becomes nonempty at t}|.
Every time a layer becomes nonempty in $[0, t_{\mathrm{sp}}]$, either it subsequently becomes empty again in $[0, t_{\mathrm{sp}}]$ or it contains at least one mutant at time $t_{\mathrm{sp}}$. Thus, $a \le |\{t \le t_{\mathrm{sp}} \mid$ for some $y \in [k-1]$, $V_y$ becomes empty at $t\}| + |S_{i,t_{\mathrm{sp}}}|$. If a layer $V_y$ becomes empty at time $\tau_x$, then $S_i$ is hit at time $\tau_x$ and $|S_{i,\tau_{x-1}} \cap V_y| = 1$; thus, the number of such times is bounded by the corresponding number of hits. Every time a vertex becomes a mutant in $S_i$ during $[0, t_{\mathrm{sp}}]$, that mutant must either die in $[0, t_{\mathrm{sp}}]$ (at which point $S_i$ is hit) or still be alive at $t_{\mathrm{sp}}$. Thus, $a \le |\{t \le t_{\mathrm{sp}} \mid$ a vertex becomes a mutant in $S_i$ at $t\}| - |\{x \le j \mid H_i$ is hit at $\tau_x$ and $|H_{i,\tau_{x-1}}| > 1\}|$. Since the only time a vertex becomes a mutant in $S_i$ without $S_i$ spawning a mutant is $T_{\mathrm{s}}(S_i)$, and $S_i$ spawns a mutant onto $v^*$ at $t_{\mathrm{sp}}$, it follows that $a \le |\{t \le t_{\mathrm{sp}} \mid S_i$ spawns a mutant at $t\}| - |\{x \le j \mid H_i$ is hit at $\tau_x$ and $|H_{i,\tau_{x-1}}| > 1\}|$. It now follows from Equation (33) that the claimed bound holds; part (ii) of the result then follows immediately from (Z2).
We next prove that Y H i is unlikely to increase k − 1 times before decreasing C r + 2 times. This, combined with Lemma 5.25, will allow us to show that S i is dangerous with probability at most roughly (r/m) k−1 .
LEMMA 5.26. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge m^*(\log n)^3$, and $2 \le k \le \sqrt{\log_r n}$. Fix $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Let $\varphi$ be a possible value of $\Phi$ and let $i \in \mathbb{Z}_{\ge 1}$. Let $E_i$ be the event in $(X, H_i)$ that $Y^{H_i}$ increases $k-1$ times before decreasing $C_r + 2$ times. Then $P(E_i \mid \Phi = \varphi) \le (r/m)^{k-1}(\log n)^{C_r+3}$. PROOF. Let $n_0$ be a large integer relative to $r$. Consider any $t_0$ and filtration value $f_0$ such that the events $T_{\mathrm{s}}(H_i) = t_0$, $F_{t_0}((X, H_i)) = f_0$, and $\Phi = \varphi$ are consistent. Recall from Lemma 5.17 that, at each step and conditioned on these events, $Y^{H_i}$ increases with probability $r/(r+m')$ and decreases with probability $m'/(r+m')$.
For $0 \le K \le C_r + 1$, let $E_{i,K}$ be the event that $Y^{H_i}$ increases precisely $k-1$ times within its first $k-1+K$ transitions. Thus, $E_i = \bigcup_{K=0}^{C_r+1} E_{i,K}$. The number of backward steps among the first $k-1+K$ transitions of $Y^{H_i}$ follows the binomial distribution consisting of $k-1+K$ Bernoulli trials, each with success probability $m'/(r+m')$, and $E_{i,K}$ holds if and only if this quantity is equal to $K$. Hence, $P(E_{i,K})$ can be bounded above, where the final inequality in the calculation holds since $k \le \sqrt{\log_r n}$ and $K \le C_r + 1$. It now follows by a union bound over $K$ that the claimed bound on $P(E_i \mid \Phi = \varphi)$ holds. We are now finally in a position to deal with $P_8$.
LEMMA 5.27. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge r^{\sqrt{\log_r n}}$, and $2 \le k \le \sqrt{\log_r n}$. Fix $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then, in the process (X), $P(P_8 \mid P_1 \cap P_2) \ge 1/2$. PROOF. Let $n_0$ be a large integer relative to $r$. Let $\varphi$ be a possible value for $\Phi$ such that the event $\Phi = \varphi$ is consistent with $P_1 \cap P_2$. Note that $P_1$ and $P_2$ are determined by $\Phi$.
For each $i \in [s]$, consider the process $(X, H_i)$ on $G_{k,\ell,m}$. Let $E_i$ be the event that $S_i$ spawns at most $k + C_r$ mutants in $(0, T_{\mathrm{pa}}]$, and let $E_i'$ be the event that $Y^{H_i}$ increases fewer than $k-1$ times before decreasing $C_r + 2$ times. We first claim that $E_i \cap E_i'$ implies that $S_i$ is not dangerous. Indeed, suppose $E_i'$ holds and $S_i$ is dangerous, spawning a mutant onto $v^*$ for the first time at some time $t_{\mathrm{sp}} \le T_{\mathrm{pa}}$. By Lemma 5.25(i), $Y^{H_i}$ must increase at least $k-1$ times in $[0, t_{\mathrm{sp}}]$, and so, since $E_i'$ holds, $Y^{H_i}$ must decrease at least $C_r + 2$ times in $[0, t_{\mathrm{sp}}]$. By Lemma 5.25(ii), it follows that $S_i$ must spawn at least $k + C_r + 1$ mutants in $[0, t_{\mathrm{sp}}]$. Thus, $E_i$ cannot hold, and so $E_i \cap E_i'$ implies that $S_i$ is not dangerous, as claimed.
It therefore suffices to prove, conditioned on $\Phi = \varphi$, that with probability at least $1/2$, $E_i \cap E_i'$ holds for all but $b^*/\log n$ of the $i \in [s]$. By Lemma 5.21, $P(\overline{E_i} \mid \Phi = \varphi)$ is suitably small. (For the final inequality in that calculation, we use the fact that $C_r = 2\lceil\log_r 20\rceil$.) Moreover, by Lemma 5.26 we have the corresponding bound on $P(\overline{E_i'} \mid \Phi = \varphi)$. From a union bound and the fact that $\Phi$ determines $P_1$ and $P_2$, it follows that the probability that a given strain is dangerous, conditioned on $\Phi = \varphi$, is at most $2(r/m)^{k-1}(\log n)^{C_r+3}$. We now simply apply Markov's inequality. By linearity of expectation, the expected number of dangerous strains is at most $2s(r/m)^{k-1}(\log n)^{C_r+3}$. Hence, by Markov's inequality, with probability at least $1/2$, at most $4s(r/m)^{k-1}(\log n)^{C_r+3}$ strains are dangerous. Since $t_{x_0} = m^k/(r^k(\log n)^{C_r+5})$, $s = \lceil 3rt_{x_0}\rceil \le 4rt_{x_0}$, and $b^* = \lceil \ell m/2\rceil$, we have
$$4s(r/m)^{k-1}(\log n)^{C_r+3} \le 16\,m/(\log n)^2 \le b^*/\log n,$$
so this implies that $P_8$ occurs and the result follows.
Lemma 5.7 now follows from everything we have done so far.
LEMMA 5.7. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge r^{\sqrt{\log_r n}}$, and $2 \le k \le \sqrt{\log_r n}$. Suppose $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. With probability at least $r^{-k}/(\log n)^{C_r+7}$, all of the events $P_1$–$P_8$ occur in (X).
PROOF. Write $\mathcal{P} = P_1 \cap \cdots \cap P_8$. We bound each term in the natural decomposition of $P(\mathcal{P})$ by applying (in order) Lemmas 5.8 and 5.27, Lemmas 5.9, 5.10, and 5.11, Corollary 5.22, and Lemma 5.23. This yields
$$P(\mathcal{P}) \ge \frac{1}{r^k(\log n)^{C_r+6}}\cdot\frac{1}{2} - \frac{6}{n} \ge \frac{1}{r^k(\log n)^{C_r+7}},$$
as required. The final inequality follows since $r^k \le r^{\sqrt{\log_r n}} \le \sqrt{n}$.
5.4.3. Applying Lemma 5.7. We now prove that P 1 ∩ · · · ∩ P 8 implies extinction for the Moran process X, which together with Lemma 5.7 implies our lower bound on extinction probability.
LEMMA 5.28. There exists $n_0 > 0$, depending on $r$, such that the following holds. Suppose $n \ge n_0$, $m \ge r^{\sqrt{\log_r n}}$, and $2 \le k \le \sqrt{\log_r n}$. Fix $x_0 \in V_k$. Let $X$ be the Moran process with $G(X) = G_{k,\ell,m}$ and $X_0 = \{x_0\}$. Then $P(X$ goes extinct$) \ge \frac{1}{r^k(\log n)^{C_r+7}}$.
PROOF. Let n_0 be a large integer relative to r. By Lemma 5.7, it suffices to prove that P_1 ∩ ⋯ ∩ P_8 implies extinction. We first show that we may restrict our attention to S_1, …, S_s. Note that by Observation 3.7, P_1 ∩ P_3 implies that v* does not spawn a mutant in X in [0, T_pa]. Thus, the definition of the strains ensures that, for all t ≤ T_pa, every mutant in X_t \ {x_0, v*} belongs to S_i^t for some i ∈ [s] (34). Let s′ = |{i | T_b^i ≤ min{t_{x_0}, T_pa}}|. Again by Observation 3.7, P_2 ∩ P_3 implies that, in the interval [0, t_{x_0}], either x_0 dies in X or T_pa occurs, or both. Thus, no strains are born in (t_{x_0}, T_pa], so by Equation (34) it follows that every mutant in X_t \ {x_0, v*} with t ≤ T_pa belongs to S_i^t for some i ∈ [s′] (35). Moreover, by P_5 and the definition of each interval I_j in Definition 5.4, x_0 spawns at most 2r(log n)^2 · ⌈t_{x_0}/(log n)^2⌉ ≤ 3r·t_{x_0} ≤ s mutants in [0, t_{x_0}] in X, and so s′ ≤ s. We now show that |X_{T_pa}| < m*, and so (A2) does not hold with t = T_pa. By Equation (35), each mutant in X_{T_pa} \ {x_0, v*} belongs to S_i^{T_pa} for some i ∈ [s′]. By P_6, each such S_i contains at most log n + 1 mutants in total. By P_7 and the definition of s′, each such S_i was born in an interval ending at T_pa of length at most (log n)^2. This interval spans at most two I_j's with j ≤ ⌈t_{x_0}/(log n)^2⌉, so by P_5 there are at most 4r(log n)^2 such S_i's. Hence |X_{T_pa}| ≤ 2 + 4r(log n)^2(log n + 1) < m*, and so (A2) does not hold with t = T_pa.
By P_8, P_6, and Equation (35), v* becomes a mutant in X at most (b*/log n) · log n = b* times in [0, T_pa]. Thus, (A3) does not hold with t = T_pa.
Finally, since P_4 occurs, X_{t_max/2} ∈ {∅, V(G_{k,ℓ,m})}. In either case, T_pa < t_max by (A1) and (A2), and so (A4) does not hold with t = T_pa. Since none of (A2)–(A4) holds at T_pa, (A1) must hold at T_pa by definition, and so the Moran process X goes extinct as required.

Proof of the Main Theorem (Theorem 5.1)
Theorem 5.1 now follows easily from Lemmas 5.2 and 5.28.
THEOREM 5.1. Let r > 1. Then there is a constant c_r > 0, depending on r, such that the following holds for all k, ℓ, m ∈ Z_{≥1} such that the (k, ℓ, m)-metafunnel G_{k,ℓ,m} has n ≥ 3 vertices. Suppose that the initial state X_0 of the Moran process with fitness r is chosen uniformly at random from all singleton subsets of V(G_{k,ℓ,m}). The probability that the Moran process goes extinct is at least e^{−√(log r · log n)}(log n)^{−c_r}.
PROOF. Let n_0 be the maximum of the value n_0 in Lemma 5.28 and e^{2(r+1)}. Recall that C_r = ⌈2 log_r 20⌉ and take c_r ≥ C_r + 8 large enough that the result holds whenever n < n_0 and whenever k = 1. (Note that if k = 1, then G_{k,ℓ,m} is a star, so as n tends to infinity, the extinction probability tends to 1/r^2 [Lieberman et al. 2005; Broom and Rychtár 2008].) We now consider the case where n ≥ n_0 and k ≥ 2. Consider the coupled process (X), with x_0 taken uniformly at random from V(G_{k,ℓ,m}).
If m ≤ e^{√(log r · log n)}, then Lemma 5.2 gives the required lower bound on the extinction probability and the result follows. Suppose instead that m ≥ e^{√(log r · log n)} = r^{√(log_r n)}. Then we have n ≥ m^k, so k ≤ √(log_r n). By Lemma 5.3, we have P(x_0 ∈ V_k) ≥ 1/2. It follows from Lemma 5.28 that

P(X goes extinct) ≥ (1/2) · r^{−k}(log n)^{−(C_r+7)} ≥ e^{−√(log r · log n)}(log n)^{−c_r},

and again the result follows.

A LOWER BOUND ON THE FIXATION PROBABILITY OF Υ_M
The definition of the (k, ℓ, m)-megastar M_{k,ℓ,m} is given in Section 1.1.3. Note that each of the cliques K_1, …, K_ℓ is the vertex set of a complete graph on k vertices (contrary to the fact that some authors use the notation K_i to denote a complete graph on i vertices). An infinite family Υ_M of megastars is identified in Definition 1.5. Recall that Υ_M = {M_{k(ℓ),ℓ,m(ℓ)} : ℓ ∈ Z_{≥1}}, where m(ℓ) = ℓ and k(ℓ) = ⌈(log ℓ)^{23}⌉. For convenience, we drop the argument ℓ in the functions m(ℓ) and k(ℓ) and simply write m and k. Also, we use M_ℓ to denote the megastar M_{k(ℓ),ℓ,m(ℓ)}. We use n = 1 + ℓ(m + 1 + k) to denote the number of vertices of M_ℓ. Note that √n/2 ≤ ℓ, m ≤ √n when ℓ is sufficiently large. Our main theorem is the following.

THEOREM 6.1. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Consider the Moran process X with G(X) = M_ℓ where the initial mutant x_0 is chosen uniformly at random from V(M_ℓ). The fixation probability of X is at least 1 − (log n)^{23}/n^{1/2}.

Below, we show how to use Lemma 6.3 to prove Theorem 6.1, which we restate here for convenience.

THEOREM 6.1. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Consider the Moran process X with G(X) = M_ℓ where the initial mutant x_0 is chosen uniformly at random from V(M_ℓ). The fixation probability of X is at least 1 − (log n)^{23}/n^{1/2}.

PROOF. Let X̃ be the megastar process on M_ℓ with X̃_0 = X_0. Recall that both X and X̃ are defined in terms of the same clock process C(M_ℓ). It is therefore immediate that, for all t ≥ 0, X̃_t ⊆ X_t. Thus, if X̃ fixates at time t, then X̃_t = V(M_ℓ) and hence X_t = V(M_ℓ), so X must also fixate at or before time t. Thus, P(X fixates) ≥ P(X̃ fixates).
Let R be the event that the initial mutant x_0 is in a reservoir. Clearly, P(R) = ℓm/n. By Lemma 6.3, conditioned on R there is a time t with P(X̃_t = V(M_ℓ) | R) ≥ 1 − 42(log n)^2/n^{1/2}. Therefore,

P(X fixates) ≥ P(X̃ fixates) ≥ P(R)(1 − 42(log n)^2/n^{1/2}) ≥ 1 − (log n)^{23}/n^{1/2},

and the result follows.
The rest of Section 6 is devoted to the proof of Lemma 6.3.

Sketch of the Proof of the Key Lemma (Lemma 6.3)
In this section, we give an informal sketch of the proof of Lemma 6.3. The presentation of the proof itself does not depend upon the sketch, so the reader may prefer to skip directly to the proof. Throughout, we assume that n is "large" relative to r, leaving the details of how large to the actual proof. At a very high level, the argument proceeds as follows. We set out some preliminary results concerning cliques in Section 6.5. With our choice of parameters, x_0 is very likely to spawn inside a reservoir, say R_1. Let ζ = Θ((log n)^3) (see the Glossary for the precise definition), and note that ζ is much smaller than k. In Section 6.6.1, we prove that K_1 is likely to fill with mutants before x_0 dies, and likely to contain at most ζ nonmutants at time n. In Section 6.6.2, we prove that K_1, …, K_ℓ are all likely to contain at most ζ nonmutants at time n^8. Finally, in Section 6.7, we prove that the process is likely to fixate by time 2n^8.
We now discuss each part of the argument in more detail. We say that a clique is active if it contains both mutants and nonmutants. The key idea of Section 6.5 is that, since we are working with the megastar process rather than the Moran process, the behaviour of an active clique is governed by a simple random walk on {0, …, k} (see Lemma 6.8). This walk is forward-biased for almost its entire length, so we dominate it below by two back-to-back gambler's ruins (see Definition 6.9 and Lemma 6.10). This, together with the fact that any clique containing both mutants and nonmutants changes state with rate at least r, allows us to prove several key properties of cliques in the megastar process, which we state here in simplified form (see Corollary 6.18 and Lemma 6.19):

(C1) If a clique contains at least one mutant, then with at least constant probability it fills with mutants within time k log n.
(C2) If a clique contains at most ζ nonmutants, then with very high probability it fills with mutants within time (log n)^7.
(C3) Let I be an interval with (log n)^7 ≤ len(I) ≤ e^{(log n)^2}. Then if a clique contains at most ζ nonmutants at the start of I, with very high probability it contains at most ζ nonmutants at the end of I and contains at most 2ζ nonmutants at any time in I.
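The gambler's-ruin comparisons behind (C1)–(C3) rest on the classical absorption probability for a walk with a constant forward bias. The sketch below computes that probability for a walk that steps up with probability r/(r+1), and checks it by Monte Carlo; it is a toy model of the forward-biased regime under that simplifying assumption, not an implementation of Lemma 6.10.

```python
import random

def ruin_prob_up(r, y, k):
    """P(walk started at y hits k before 0) for up-step probability
    r/(r+1); the classical gambler's-ruin formula with ratio 1/r."""
    q = 1.0 / r
    return (1 - q**y) / (1 - q**k)

def simulate(r, y, k, trials=100_000):
    p_up = r / (r + 1)
    wins = 0
    for _ in range(trials):
        pos = y
        while 0 < pos < k:
            pos += 1 if random.random() < p_up else -1
        wins += (pos == k)
    return wins / trials

r, y, k = 1.5, 1, 20
print(ruin_prob_up(r, y, k), simulate(r, y, k))  # the two agree closely
```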
Finally, we use (C2) and (C3) together with a careful domination to prove upper bounds on the likelihood of nonmutants being spawned onto v * from an active clique (see Lemma 6.20).
We now discuss Section 6.6.1. Heuristically, the argument is quite simple. Consider the interval J = [0, √n(log n)^3]. With probability at least 1 − O((log n)^3/√n), N_{(v*,x_0)} does not trigger in J, and so x_0 remains a mutant throughout. Conditioned on this event, by Chernoff bounds x_0 is very likely to spawn Θ(√n(log n)^3) mutants onto a_1 in J. Each time a mutant is spawned onto a_1, either K_1 already contains a mutant or there is a Θ(1/m) = Θ(1/√n) chance of a_1 spawning a mutant into K_1 before dying. Whenever K_1 contains a mutant, by (C1) there is a Θ(1) chance that K_1 will fill with mutants, so in expectation K_1 will fill with mutants Θ((log n)^3) times over the course of J. Finally, when K_1 has filled with mutants, by (C3) it is likely to contain at most ζ nonmutants at time n (see Lemma 6.22). Unfortunately, these events are not independent - for example, a mutant may be spawned onto a_1 while K_1 is already active from a previous spawn - so concentration is not guaranteed. To make the argument rigorous, we therefore divide J into subintervals and apply domination. Section 6.6.2 is now relatively easy. (C3) tells us that K_1 is very likely to remain almost full of mutants for a superpolynomial length of time. While we could fill each subsequent clique with a similar argument to that used in Section 6.6.1, we have enough wiggle room that we can instead use a substantially simpler argument to prove that K_1, …, K_ℓ each contain at most ζ nonmutants by time n^8 (see Lemma 6.25). A side effect of this is that our bound on t in the statement of Lemma 6.3 is very loose.
The meat of the proof is in Section 6.7. Suppose that K_1, …, K_ℓ each contain at most ζ nonmutants. Since ζ is much smaller than k, it is tempting to simply dominate the number of mutants in reservoirs below by a random walk on {0, …, ℓm}. We could argue that, by (C3), for superpolynomial time most of v*'s in-neighbours will be mutants, and so v* will spawn far more mutants than nonmutants in this interval. While this is true, it will only take us so far - even if each clique only contained one nonmutant, we should still expect v* to be a nonmutant for a Θ(1/k) proportion of the time, leaving us with Θ(m/k) nonmutants in each reservoir. However, all is not lost. Intuitively, when there are many mutants in a reservoir, the corresponding feeder vertex is more likely to be a mutant, and so frequently its clique will contain no nonmutants at all. Developing this idea yields Lemma 6.27, the main result of the section.
For all i ∈ Z_{≥0}, let I_i be the corresponding interval (see Definition 6.26), and let α_i and β_i be the corresponding targets for the number of nonmutants in each reservoir and across all reservoirs, respectively. We say that the filtration at time I_i^− is good if events P_1(i)–P_4(i) occur; in particular, P_4(i) requires that, whenever fewer than ζ nonmutants remain in reservoirs, all but a bounded number of branches are completely full of mutants.

Lemma 6.27 implies that if the filtration at I_i^− is good, then with very high probability so is the filtration at I_{i+1}^−. Thus the number of nonmutants in each reservoir drops by a factor of at least (2 log n)^2 to a minimum of (2 log n)^2, as does the total number of nonmutants across all reservoirs. Moreover, if i is sufficiently large, then Lemma 6.27 also implies that the process fixates by time I_{i+1}^− with probability at least 1/2. Note that the filtration at I_0^− is good; indeed, α_0 = m and β_0 = ℓm, so P_1(0), P_2(0) and P_4(0) trivially occur, and P_3(0) is very likely to occur by Lemma 6.25. It is therefore relatively easy to prove Lemma 6.3 using Lemmas 6.25 and 6.27 (see Section 6.7.1).
The linchpin of the proof of Lemma 6.27 is a stopping time T_end^i, defined to be the first time t ≥ I_i^− at which one of conditions (D1)–(D4) holds (these are set out in Section 6.7).
Note that the definition of T_end^i guarantees, without any need for conditioning, that throughout (I_i^−, T_end^i] our cliques remain almost full of mutants and not too many nonmutants are spawned onto reservoirs. We will therefore work in (I_i^−, T_end^i] for most of the proof to facilitate dominations, with the eventual goal of proving that (I_i^−, T_end^i] = I_i. In Section 6.7.2, we prove an upper bound on the number of times cliques are likely to become active over the interval (I_i^−, T_end^i] (see Lemma 6.34). In Section 6.7.3, we apply this together with Lemma 6.20 to prove an upper bound on the length of time for which v* is likely to be a nonmutant over the interval (I_i^−, T_end^i] (see Lemma 6.37). Unfortunately, due to the use of T_end^i, these proofs require a fairly technical series of dominations. More details can be found in the relevant sections.
In Section 6.7.4, we put all of this together to prove Lemma 6.27 and hence Lemma 6.3. The key observation is that Lemma 6.37 combines with Chernoff bounds on star-clocks to give strong upper bounds on the number of nonmutants spawned by v* in (I_i^−, T_end^i]. These bounds, together with (C3), imply that none of (D2)–(D4) is likely to hold at T_end^i - in which case (I_i^−, T_end^i] = I_i. Additional Chernoff bounds on star-clocks then imply that v* is likely to spawn a mutant onto every reservoir vertex over the first half of I_i, which implies that P_1(i + 1) and P_2(i + 1) are likely to occur. Then P_3(i + 1) is likely to occur by (C3), and P_4(i + 1) is likely to occur by (C2) combined with a relatively simple argument. This implies that the filtration at I_{i+1}^− is likely to be good, as required by Lemma 6.27. This part of the argument is mostly contained in Lemmas 6.41 and 6.42.
It remains only to prove that if i is sufficiently large, X̃ fixates with probability at least 1/2. In this case, we use arguments similar to the preceding ones to show that, with probability at least 5/6, v* spawns no nonmutants at all over the course of I_i. Given this, arguments similar to those used to deal with P_4(i + 1) show that X̃ is likely to fixate, as required.
The out-degree of a_j and of every vertex in K_j is k. There are y_h(k − y_h) mutant clocks whose sources are in K_j ∩ X̃_t and whose targets are in K_j \ X̃_t. Similarly, there are (k − y_h + 1)y_h nonmutant clocks whose sources are in (K_j ∪ {a_j}) \ X̃_t and whose targets are in K_j ∩ X̃_t. Thus, after time t, the number of mutants in K_j increases with rate u := ry_h(k − y_h)/k and decreases with rate d := (k − y_h + 1)y_h/k. It follows that, at each transition, the number of mutants in K_j increases with probability u/(u + d) and decreases with probability d/(u + d).
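For concreteness, here is a small Python sketch of the jump probabilities of the embedded walk determined by the rates u and d above; the parameter values are illustrative.

```python
def clique_rates(r, k, y):
    """Rates from the text: u = rate at which the number of mutants in an
    active clique increases, d = rate at which it decreases."""
    u = r * y * (k - y) / k
    d = (k - y + 1) * y / k
    return u, d

def up_probability(r, k, y):
    """Probability that the embedded jump chain steps up rather than down."""
    u, d = clique_rates(r, k, y)
    return u / (u + d)

r, k = 1.5, 50
for y in (1, k // 2, k - 2, k - 1):
    print(y, round(up_probability(r, k, y), 4))
# The walk is forward-biased (probability > 1/2) exactly when
# (k - y)(r - 1) > 1, i.e. everywhere except within O(1/(r-1)) of the top.
```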
PROOF. Recall the notation p_{x→y;z} from Lemma 2.9. The Markov chain Z is the same as the chain in Lemma 2.9 with p_1 = r/(r + 1), a = 0, c = k − c_r, and d = k. We start with simple lower bounds concerning Z.
- From any state i ∈ {k − c_r, …, k}, Z must reach either state k or state k − c_r before reaching state k − ζ. By Lemma 6.12(i), p_{k−c_r→k;k−ζ} ≥ 1 − e^{−(log n)^3}, and we obtain the corresponding bound (36).
- Consider i ∈ {k − ζ, …, k − c_r − 1}. By Corollary 2.8(i), we obtain a bound (37) in which the last inequality holds for all n sufficiently large with respect to r, using the fact that c_r(r − 1)/(r + 1) > 1. It follows from Lemma 6.12(i) that the corresponding bound holds from these states too.

The result now follows easily. By Lemma 6.10, Y^{t_0,j} is dominated below by Z with initial state y_0, conditioned on F_{t_0}(X̃) = f. Thus Item (i) follows from Equation (36) and Item (ii) follows from Equations (36) and (37).
Note that in the process of proving Lemma 6.13, we proved the following (see Equations (36) and (37)).

COROLLARY 6.14. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let y_0 be an integer satisfying k − ζ ≤ y_0 ≤ k. The probability that Z, when started from y_0, reaches state k without passing through state k − 2ζ is at least 1 − 2e^{−(log n)^3}.
We now use Lemma 6.12 and Corollary 6.14 to give lower bounds on the probability that Z reaches k in a relatively short time.
LEMMA 6.15. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Consider the Markov chain Z, starting from y_0 ∈ [k − 1].
For every integer y satisfying k − 2ζ < y ≤ k, we bound the expected number of transitions Z takes to reach k from y; this bound is Equation (39). To see Equation (39), there are three cases to consider.
Case 1: y = k − c_r. By Lemma 6.12(ii), we have the (tighter) bound (40).
Case 2: k − 2ζ < y < k − c_r. By Corollary 2.8(iii), we obtain a bound on the expected number of transitions for Z to reach k − c_r or k; combining this with Equation (40) handles this case.
Case 3: k − c_r < y ≤ k. From such y, Z reaches either k or k − c_r before k − 2ζ, so the bound follows from Case 1.
These three cases establish Equation (39). Applying Markov's inequality gives Equation (41).
Now Equation (38) follows by subdividing the set [η] (indexing η transitions from the initial state y_0) into ζ(log n)^3 disjoint sets of contiguous indices, each of size 5ec_r, then applying Equation (41) to each subset. We next establish Item (ii), so suppose 1 ≤ y_0 ≤ k − ζ. By Corollary 2.8(iii), for all y_0 ∈ {1, …, k − c_r}, we obtain a bound on the expected number of transitions η′ for Z to reach k − c_r. Applying Markov's inequality, we obtain the corresponding probability bound. Now, using the fact that, starting from Z_0 = k − c_r, we have P(Z_η = k) ≥ 1 − 3e^{−(log n)^3}, which we proved in the derivation of (i), we have, starting from Z_0 = y_0, the claimed bound, which completes the proof, since η + η′ ≤ 5c_r^2 kζ (which we will now justify). To see that η + η′ ≤ 5c_r^2 kζ, recall from the very start of Section 6 (just before the statement of Theorem 6.1) that √n/2 ≤ ℓ ≤ √n, so n ≤ 4ℓ^2. Furthermore, n ≥ ℓ^2, so the condition ℓ ≥ ℓ_0 in the statement of the lemma also guarantees that n is sufficiently large.
From the definition at the beginning of the proof, η = ⌈5ec_r ζ(log n)^3⌉, so plugging in ζ = ⌈c_r(log n)^3⌉ (from the glossary in Section 6.1) and using the fact that n is sufficiently large, we get η ≤ 6ec_r^2(log n)^6. Since c_r is a constant (depending on r, but not on n) and n is sufficiently large, this gives η ≤ (log n)^7. Now, since n ≤ 4ℓ^2 and ℓ is sufficiently large, we have log n ≤ 3 log ℓ, so η ≤ 3^7(log ℓ)^7 ≤ (log ℓ)^8 = (log ℓ)^{23}/(log ℓ)^{15}. Finally, plugging in k = ⌈(log ℓ)^{23}⌉ (from the glossary in Section 6.1), η ≤ k/(log ℓ)^{15}. Since ℓ is sufficiently large, this is (easily) smaller than 5c_r^2 kζ − η′ = 5c_r^2 kζ − 4c_r^2 kζ = c_r^2 kζ. The following corollary follows immediately from Lemma 6.10 and Lemma 6.15.

COROLLARY 6.16. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let j ∈ [ℓ], let t_0 ≥ 0, and let y_0 ∈ [k − 1]. Let f be a possible value of F_{t_0}(X̃) which implies that Y^{t_0,j}(0) = y_0. Then the following statements hold.
To translate Corollary 6.16 into a bound on the time it takes K_j to fill with mutants, we will require the following lemma.

LEMMA 6.17. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let j ∈ [ℓ] and t_0 > 0. Let f be a possible value of F_{t_0}(X̃) which implies that K_j is active at t_0. Let t* ≥ 16(log n)^3. Then, conditioned on the event F_{t_0}(X̃) = f, with probability at least 1 − e^{−(log n)^3}, T_in^{t_0,j} ≤ t_0 + t*.
- Let E_1 be the event that T_a(ξ) = I^+.
- For h ∈ Z_{≥0}, let E_2(h) be the event that T_in(h) ≤ T_a(h) + κ.
- The event F_{I^−}(X̃) = f determines whether K_j is active at time I^−.
- If so, let E_3(0) be the event that, for all t ∈ [T_a(0), T_in(0)], K_j contains at most 2ζ nonmutants at time t; define E_3(h) for h ≥ 1 analogously on [T_a(h), T_in(h)].

Note that, for all h ≥ 0, E_3(h) implies E_4(h). This is easy to see as long as K_j is inactive at T_in(h). In this case, E_3(h) implies that K_j ⊆ X̃_{T_in(h)}: since the number of nonmutants is at most 2ζ, but it is either 0 or k, it must be 0. On the other hand, if K_j is active at T_in(h), then T_in(h) = I^+, so T_a(h + 1) = I^+, so the interval in E_4(h) is empty.
We next observe that F_{I^−}(X̃) = f implies that E_4(−1) occurs. From the statement of the lemma, F_{I^−}(X̃) = f implies that |K_j \ X̃_{I^−}| ≤ ζ. If K_j is inactive at I^−, then this implies that K_j ⊆ X̃_{I^−}, which implies E_4(−1). If, instead, K_j is active at I^−, then the interval in E_4(−1) is empty, so E_4(−1) occurs vacuously.
For any integer q, let E_3^q = ∩_{h=0}^{q} E_3(h) and E_4^q = ∩_{h=−1}^{q} E_4(h), and write E_3 and E_4 for the intersections over all h. We first show that if F_{I^−}(X̃) = f and E_1, E_2, and E_3 all occur, then statements (i), (ii), and (iii) hold. As we have just observed, F_{I^−}(X̃) = f and E_3 imply that E_4 also occurs. Then E_3, E_4, and E_1 imply (i). They also imply (ii), except in the case where K_j is active at I^− and remains active for all of I. This case is ruled out by E_2(0), since len(I) > κ. We now turn to statement (iii). Consider any t_0 ∈ [I^−, I^+ − κ). Suppose first that K_j is inactive at time t_0. Since (i) holds, K_j ⊆ X̃_{t_0}, so it suffices to take t_1 = t_0. Suppose, instead, that K_j is active at time t_0. Then, for some h ≥ 0, t_0 ∈ [T_a(h), T_in(h)]. By E_1, h ≤ ξ. By E_2(h), we may assume T_in(h) ≤ t_0 + κ, so we can choose t_1 = T_in(h). Since t_0 + κ < I^+, t_1 < I^+, so t_1 ∈ [T_in(h), T_a(h + 1)) and E_4(h) guarantees that K_j ⊆ X̃_{t_1}.
During the remainder of the proof, we will show that E_1, E_2, and E_3 are all sufficiently likely; this is Equation (44). It is clear that E_1 occurs if clocks with source a_j trigger (in total) fewer than ξ times in I. These clocks have total rate 1 + r ≤ 2r, so by Corollary 2.2 we obtain Equation (45). Now consider any h ∈ {0, …, ξ} and any t_h ∈ [I^−, I^+] such that F_{I^−}(X̃) = f, T_a(h) = t_h, E_2^{h−1}, E_3^{h−1}, and E_4^{h−1} are consistent; if so, let f_h be any value of F_{t_h}(X̃) such that F_{t_h}(X̃) = f_h implies all of these events. Suppose that F_{t_h}(X̃) = f_h and consider how many nonmutants K_j can have at time t_h.

Case 1. If h = 0 and K_j is active at I^−, then t_h = I^−, and it follows from the assumption in the statement of the lemma that K_j has at most ζ nonmutants at time t_h.
Case 2. Otherwise, if T_in(h − 1) < I^+, then [T_in(h − 1), T_a(h)) is nonempty, so it follows from E_4(h − 1) that K_j has at most one nonmutant at time t_h.
Case 3. Otherwise, T_in(h − 1) = t_h = I^+, so it follows from E_3(h − 1) that K_j has at most 2ζ nonmutants at T_in(h − 1) = t_h.
In any of these three cases, Corollary 6.18(i) gives Equation (46). In Case 1, Lemma 6.13(ii) yields a bound on the probability that E_3(h) fails; in Case 2, Lemma 6.13(i) yields the corresponding bound; Equation (47) gives the worst bound of the three cases. Combining this with Equation (46) using a union bound, and combining the result with Equation (45), gives Equation (44). The final result of Section 6.5 shows that if an active clique is almost full of mutants, then with high probability it spawns no nonmutants at all onto v* before becoming inactive, and with even higher probability it does not spawn too many nonmutants onto v* before becoming inactive.
LEMMA 6.20. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let j ∈ [ℓ] and t_0 ≥ 0. Let f be a possible value of F_{t_0}(X̃) which implies that k − ζ ≤ |X̃_{t_0} ∩ K_j| ≤ k − 1. Let S be the total number of nonmutants spawned onto v* by vertices in K_j within (t_0, T_in^{t_0,j}].

PROOF. Subdivide (t_0, t_0 + 30c_r^2(log n)^6] at times t_0 < t_1 < ⋯ < t_y. Let A be the set of all clocks which have both source and target in K_j ∪ {a_j}. Let Φ contain, for each clock C ∈ A, a list of the times at which C triggers in (t_0, T_in^{t_0,j}]. Note that, by the definition of the megastar process (Definition 6.2), for all t ∈ [t_0, T_in^{t_0,j}], a_j ∉ X̃_t. It follows that, for all t ∈ [t_0, T_in^{t_0,j}], Φ and F_{t_0}(X̃) together uniquely determine K_j ∩ X̃_t. They therefore determine T_c^{t_0,j}(h) and Y^{t_0,j}(h) for all h ≥ 0, and hence they determine whether E_1 and E_2 occur.
S_h is independent of Φ by the definition of C(M_ℓ): no clocks with target v* are contained in A, (t_{h−1}, t_h) is a fixed time interval, and χ_{h−1} is a fixed set. Moreover, S_h is independent of F_{t_{h−1}}(X̃) by memorylessness. Thus, conditioned on F_{t_{h−1}}(X̃) = f_{h−1} and Φ = ϕ, S_h is simply a Poisson variable with parameter (t_h − t_{h−1})(k − |χ_{h−1}|)/k, which is at most λ_h by Equation (48). It therefore follows that, for all a ≥ 0, P(S_h ≥ a | F_{t_{h−1}}(X̃) = f_{h−1}, Φ = ϕ) ≤ P(S̄_h ≥ a), and hence Equation (49) holds. Thus, conditioned on F_{t_0}(X̃) = f and Φ = ϕ, S_1, …, S_y are dominated above by S̄_1, …, S̄_y, as claimed.
Note that, with probability 1, no nonmutants are spawned onto v* at the times t_1, …, t_y themselves, and so S = S_1 + ⋯ + S_y. It follows from Equation (49) that, conditioned on F_{t_0}(X̃) = f and Φ = ϕ, S is dominated above by a Poisson variable S̄ with parameter λ_1 + ⋯ + λ_y (50). By a union bound applied to Lemma 6.13(ii) and Corollary 6.18(i), with high probability T_in^{t_0,j} ≤ t_0 + 30c_r^2(log n)^6. It is immediate from Equation (50), using Equation (4), that P(S = 0) is large enough, and so part (i) of the result holds. Moreover, by Equation (50) combined with Corollary 2.3, we obtain the required tail bound on S, and so part (ii) of the result holds.
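The domination above uses a standard Poisson fact: a sum of independent Poisson variables with parameters λ_1, …, λ_y is Poisson with parameter λ_1 + ⋯ + λ_y, so the dominating variable satisfies P(S̄ = 0) = e^{−λ}. A minimal numerical check, with illustrative λ values:

```python
import math, random

lambdas = [0.01, 0.02, 0.005]   # illustrative per-interval parameters
lam = sum(lambdas)

print(math.exp(-lam))           # P(S-bar = 0) for the dominating variable

def poisson(mu):
    """Sample a Poisson variable (Knuth's method; fine for small mu)."""
    threshold, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

trials = 200_000
zeros = sum(all(poisson(mu) == 0 for mu in lambdas) for _ in range(trials))
print(zeros / trials)           # close to exp(-lam): the sum is Poisson(lam)
```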

Filling Cliques
Recall from Definition 6.2 that X̃_0 is the set containing a single initial mutant, and write X̃_0 = {x_0}. Because of the megastar's symmetry, without loss of generality we may assume that x_0 ∈ R_1 ∪ K_1 ∪ {a_1, v*}. In Section 6.6, we will further restrict our attention to the case where x_0 is in a reservoir, that is, x_0 ∈ R_1.
6.6.1. The First Clique Fills with Mutants. In Section 6.6.1, we will show that if the initial mutant of X̃ lies in the reservoir R_1, then with high probability K_1 is almost full of mutants at time n (see Lemma 6.22). We first prove an ancillary lemma.

LEMMA 6.21. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Suppose X̃_0 ⊆ R_1, and write X̃_0 = {x_0}. Let E_1 be the event that N_{(v*,x_0)} does not trigger in [0, 17m(log n)^2]. Let t_0 ∈ [0, 17m((log n)^2 − 1)], and let f be a possible value of F_{t_0}(X̃) which is consistent with E_1.

For h ∈ {0, …, 8m − 1}, let T_h be the first time after t_0 + 2h at which M_{(x_0,a_1)} triggers, and let E_2(h) be the event that T_h ≤ t_0 + 2h + 1; this yields Equation (51). Let E_2′(h) be the event that some clock in C_{a_1} triggers in (T_h, T_h + 1] and that the first clock in C_{a_1} to trigger in (T_h, ∞) is a mutant clock with source a_1. Note that E_2′(h) is independent of E_2(h). The probability that some clock in C_{a_1} triggers in (T_h, T_h + 1] is 1 − e^{−r−m}, and the probability that the first clock in C_{a_1} to trigger in (T_h, ∞) is a mutant clock with source a_1 is r/(r + m). Hence, by Equation (51) and a union bound, we obtain Equation (52). Note that, for all h, the event E_2(h) ∩ E_2′(h) depends only on fixed clocks in C_{a_1} ∪ {M_{(x_0,a_1)}} over the fixed interval (t_0 + 2h, t_0 + 2h + 2]. As such, the events E_2(h) ∩ E_2′(h) are independent from each other, from E_1, and from F_{t_0}(X̃). It follows from Equation (52) that Equation (53) holds.

Let E_2 be the event that there exists t ∈ (t_0, t_0 + 16m] such that |K_1 ∩ X̃_t| ≥ 1. We now show that if E_1 ∩ E_2(h) ∩ E_2′(h) occurs for some h ∈ {0, …, 8m − 1}, then E_2 occurs. Since E_1 occurs and T_h ≤ 17m(log n)^2, we have x_0 ∈ X̃_{T_h}. Since E_2(h) occurs, by the definition of X̃, either a_1 is a mutant at time T_h or K_1 is active at time T_h ≤ t_0 + 16m (and hence E_2 occurs). If a_1 is a mutant at time T_h, then, since E_2′(h) occurs, a_1 spawns a mutant at some time in (T_h, T_h + 1] ⊆ (t_0, t_0 + 16m], and so E_2 occurs. Thus, in all cases, if E_1 ∩ E_2(h) ∩ E_2′(h) occurs, then E_2 occurs. Hence, by Equation (53), we obtain Equation (54).

Let T be the first time at which K_1 contains a mutant, and let E_3 be the event that T_in^{T,1} ≤ T + m and K_1 is full of mutants at time T_in^{T,1}. Let A be the set of all clocks which have both source and target in K_1 ∪ {a_1}. For every t ≥ 0, let Φ_t be a random variable which contains, for each clock C ∈ A, a list of the times at which C triggers in (t, t + m]. Now consider any t ≥ t_0. Let f′ be any possible value of F_t(X̃) such that the following events are consistent: F_{t_0}(X̃) = f, F_t(X̃) = f′, E_2, T = t, and E_1. Note that the first four of these events are determined by F_t(X̃) = f′. Conditioned on F_t(X̃) = f′, E_1 and Φ_t are independent. Also, conditioned on F_t(X̃) = f′, E_3 is determined by Φ_t (since the definition of the megastar process ensures that a_1 is a nonmutant throughout [t, T_in^{t,1}]), so, conditioned on F_t(X̃) = f′, E_3 is independent of E_1. Now, applying Corollary 6.18, we obtain Equation (55). It therefore follows from Equations (54) and (55) that the desired bound holds, and so the result follows.
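The competing-clocks step in the proof above (the first clock in C_{a_1} to trigger is the mutant clock with probability r/(r + m)) is the standard minimum-of-exponentials fact. A minimal Monte Carlo check, with illustrative rates:

```python
import random

def first_is_mutant_prob(r, m, trials=200_000):
    """Estimate P(a rate-r exponential clock fires before independent
    clocks of total rate m): should be r / (r + m)."""
    wins = 0
    for _ in range(trials):
        t_mutant = random.expovariate(r)  # mutant clock with source a_1
        t_rest = random.expovariate(m)    # the remaining clocks, merged
        wins += t_mutant < t_rest
    return wins / trials

r, m = 1.5, 40.0
print(first_is_mutant_prob(r, m), r / (r + m))  # the two agree closely
```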
We are now able to prove Lemma 6.22.
LEMMA 6.22. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Suppose X̃_0 ⊆ R_1. Then with probability at least 1 − 19(log n)^2/ℓ, K_1 contains at most ζ nonmutants at time n.
Since f_i determines E_2(0), …, E_2(i − 1), it follows that Equation (56) holds. Moreover, by Equation (4), the corresponding probability bound holds, and it therefore follows, by Equation (56), that Equation (57) holds. Now let E_3 be the event that |K_1 \ X̃_n| ≤ ζ, and let T be the first time at which K_1 contains a mutant. Consider any t ∈ [0, 17m(log n)^2] and any possible value f of F_t(X̃) which is consistent with E_2 and T = t. Then, by Lemma 6.19(ii) applied to the interval (t, n], with probability at least 1 − e^{−(1/2)(log n)^3} conditioned on F_t(X̃) = f, E_3 occurs. Thus, using Equation (57), we obtain the claimed bound, as required.
6.6.2. The Other Cliques Become Almost Full. In Section 6.6.2, we will show that if the initial mutant of X̃ lies in a reservoir, without loss of generality R_1, then with high probability K_1, …, K_ℓ are all almost full of mutants at time n^8 (see Lemma 6.25). The following lemma will be the linchpin of the proof.

LEMMA 6.23. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let t_0 ≥ 0, and let j̄ ∈ [ℓ − 1]. Let f be a possible value of F_{t_0}(X̃) which implies that, for all j ∈ [j̄], K_j contains at most ζ non-mutants at time t_0.

PROOF. The first part of the result follows immediately from Lemma 6.19(ii) and a union bound over all j ∈ [j̄], taking I = (t_0, t_0 + 20c_r^2 kζ]. We now define some stopping times. Let T_1 be the first time in (t_0, ∞) at which a mutant is spawned onto a_{j̄+1}, capped at t_0 + 30c_r^2(log n)^6. Let T_2 be the fourth time after T_1 at which a clock in C(M_ℓ) triggers. Let T_3 be the first time after T_2 at which K_{j̄+1} is full of mutants, capped at T_2 + 10c_r^2 kζ. In addition, we define the following events.
Now consider any t_2 > t_0 and any possible value f_2 of F_{t_2}(X̃) which implies that F_{t_0}(X̃) = f, that T_2 = t_2, and that E_1 ∩ E_2 ∩ E_2′ occurs. Note that, if F_{t_2}(X̃) = f_2, then since E_1 ∩ E_2 ∩ E_2′ occurs, we must have |K_{j̄+1} ∩ X̃_{t_2}| ≥ 1. It follows from Corollary 6.18 that Equation (61) holds, and therefore that Equation (62) holds. Finally, consider any t_3 ≥ t_0 and any possible value f_3 of F_{t_3}(X̃) which implies that F_{t_0}(X̃) = f, that T_3 = t_3, and that E_1 ∩ E_2 ∩ E_2′ ∩ E_3 occurs. Since F_{t_3}(X̃) = f_3 implies that E_2 occurs, we have t_3 ≤ t_0 + 30c_r^2(log n)^6 + 1 + 10c_r^2 kζ ≤ t_0 + 20c_r^2 kζ − (log n)^7. If F_{t_3}(X̃) = f_3, then E_3 occurs, so K_{j̄+1} ⊆ X̃_{t_3}, which obviously implies |K_{j̄+1} \ X̃_{t_3}| ≤ ζ. It therefore follows from Lemma 6.19(ii), applied to (t_3, t_0 + 20c_r^2 kζ], that K_{j̄+1} is likely to contain at most ζ nonmutants at time t_0 + 20c_r^2 kζ, and therefore, by Equation (62), the second part of the result follows.
We now apply Lemma 6.23 repeatedly to prove the following, which shows that, with high probability, once j̄ of the cliques are almost full of mutants (each having at most ζ nonmutants), a little while later this is true for at least j̄ + 1 of the cliques; iterating, we (nearly) fill all of the cliques.

LEMMA 6.24. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let t_0 ≥ 0, and let j̄ ∈ [ℓ − 1]. Let f be a possible value of F_{t_0}(X̃) which implies that, for all j ∈ [j̄], K_j contains at most ζ nonmutants at time t_0. Then

P(for all j ∈ [j̄ + 1], |K_j \ X̃_{t_0 + 20c_r^2 n^7 kζ}| ≤ ζ | F_{t_0}(X̃) = f) ≥ 1 − n^8 e^{−(log n)^2}.

PROOF. For all i ∈ Z_{≥1}, let t_i = t_0 + 20c_r^2 kζ i. Let E_1(i) be the event that, for all j ∈ [j̄], |K_j \ X̃_{t_i}| ≤ ζ. Let E_2(i) be the event that |K_{j̄+1} \ X̃_{t_i}| ≤ ζ. For convenience, let F be the event that F_{t_0}(X̃) = f. By a union bound, we have Equation (63). Moreover, since for all i the event ∩_{s=1}^{i−1}(E_2(s) ∩ E_1(s)) ∩ F is determined by F_{t_{i−1}}(X̃), by Lemma 6.23 it follows that Equation (64) holds. We also have that, since for all i the event ∩_{s=1}^{i−1} E_1(s) ∩ F is determined by F_{t_{i−1}}(X̃), it follows from Lemma 6.23 that Equation (65) holds. Hence, by Equations (63), (64), and (65), we obtain Equation (66). Now, let i′ ∈ [n^7 − 1]. Since E_2(i′) ∩ F is determined by F_{t_{i′}}(X̃), it follows, by Lemma 6.19(ii) applied to the interval (t_{i′}, t_{n^7}], that E_2 persists to time t_{n^7} with very high probability. Hence, by Equation (66), we have P(E_2(n^7) | F) ≥ 1 − 3n^7 e^{−(log n)^2}. In addition, by Equation (65), we have P(E_1(n^7) | F) ≥ 1 − n^7 e^{−(log n)^2}. By a union bound, it follows that P(E_1(n^7) ∩ E_2(n^7) | F) ≥ 1 − n^8 e^{−(log n)^2}, and so the result follows.
The proof of Lemma 6.25, the goal of Section 6.6, now follows easily from Lemma 6.24.
LEMMA 6.25. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Suppose X̃_0 ⊆ R_1. Then with probability at least 1 − 20(log n)^2/ℓ, for all j ∈ [ℓ], K_j contains at most ζ nonmutants at time n^8.
PROOF. For each positive integer i, let t_i = n + (i − 1) · 20c_r^2 n^7 kζ. Let E_i be the event that at time t_i, for all j ∈ [i], we have |K_j \ X̃_{t_i}| ≤ ζ. By Lemma 6.22, we have P(E_1) ≥ 1 − 19(log n)^2/ℓ. For all i ∈ {2, …, ℓ}, by Lemma 6.24 (applied with j̄ = i − 1, starting at t_{i−1}), we have P(E_i | E_{i−1}) ≥ 1 − n^8 e^{−(log n)^2}. It follows that P(E_ℓ) ≥ 1 − 19(log n)^2/ℓ − ℓn^8 e^{−(log n)^2}. It therefore follows, by Lemma 6.19(ii) applied to the interval (t_ℓ, n^8] combined with a union bound, that with probability at least 1 − 19(log n)^2/ℓ − ℓn^8 e^{−(log n)^2} − ℓe^{−(1/2)(log n)^3} ≥ 1 − 20(log n)^2/ℓ, for all j ∈ [ℓ], K_j contains at most ζ nonmutants at time n^8, as required.

Filling Reservoirs from Cliques
6.7.1. Setting up an Iteration Scheme - Proof of Lemma 6.3. In this section, we outline an iterative argument which, together with Lemma 6.25, will allow us to prove our key lemma, Lemma 6.3.
Definition 6.26. For all i ∈ Z_{≥0}, let I_i be the corresponding interval, with left endpoint I_i^− and right endpoint I_i^+. Consider any t ≥ n^8. Let i be the integer such that t ∈ [I_i^−, I_i^+). Let f be a possible value of F_t(X̃). We say that f is good if the event F_t(X̃) = f implies that the events P_1(i)–P_4(i) occur.
Each interval I_i corresponds to a phase of our iterative argument, which we state in Lemma 6.27. At the end of each interval, the number of nonmutants in each reservoir should drop by a factor of at least (2 log n)^2, to a minimum of (2 log n)^2, as should the total number of nonmutants across all reservoirs. In addition, every clique should remain almost full of mutants, and if there are fewer than ζ nonmutants left in reservoirs, then many branches should be completely full of mutants.

LEMMA 6.27. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let i be a nonnegative integer, and let f be a good possible value of F_{I_i^−}(X̃).

Assuming Lemma 6.27 for the moment, we give the proof of our key lemma, Lemma 6.3, which we restate here for convenience.

LEMMA 6.3. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Suppose that x_0 ∈ R_1 ∪ ⋯ ∪ R_ℓ. Then there exists a t ≥ 0 such that P(X̃_t = V(M_ℓ)) ≥ 1 − 42(log n)^2/n^{1/2}.

PROOF. By the symmetry of the megastar, we may assume that x_0 ∈ R_1. For a nonnegative integer i, let E_i be the event that F_{I_i^−}(X̃) is good, and let E_i′ be the event that X̃_{I_i^+} = V(M_ℓ). By a union bound, and since for all i ∈ [2n], F_{I_i^−}(X̃) determines E_0 ∩ ⋯ ∩ E_{i−1}, Lemma 6.27 gives Equation (69). Also, since x_0 ∈ R_1, for i = 0 we have, by Lemma 6.25, Equation (70), since P_1(0), P_2(0), and P_4(0) hold trivially (from α_0 = m and β_0 = ℓm). Combining Equations (69) and (70), we obtain Equation (71). For i ≥ n, it follows from Lemma 6.27, for all i ∈ {n + 1, …, 2n}, that the process is likely to have fixated, and hence we obtain the claimed bound.

Now fix an integer h ≥ 0. Consider any time t_n^{i,a_j}(h) ≥ I_i^−. Consider any integers w_0, …, w_{h−1} and y ≥ 0, and any times t_0, …, t_y satisfying t_n^{i,a_j}(h) ≤ t_0 ≤ ⋯ ≤ t_y. Suppose that f is a value of F_{t_y}(X̃) which implies these values. The event F_{t_y}(X̃) = f determines whether or not t_y = T_m^{i,a_j}(h). We split the analysis into two cases.
Case 2. Suppose that F_{t_y}(X̃) = f implies t_y < T_m^{i,a_j}(h). Let E be the event that, in the interval (t_y, ∞), some mutant clock with source in R_j ∩ X̃_{t_y} triggers before any nonmutant clock with source in {a_j, v*} triggers. If F_{t_y}(X̃) = f, then t_y < T_m^{i,a_j}(h) ≤ T_end^i, so, by (D3) and the fact that i ≥ 1, it follows that R_j contains at least m − α_i − α_{i+1} ≥ m − 2α_1 ≥ m/2 mutants at time t_y. Hence, Equation (74) holds. We will now prove that if F_{t_y}(X̃) = f and E occurs, then W_a^{i,j}(h) = y. Let T be the earliest time in the interval (t_y, ∞) at which some mutant clock with source in R_j ∩ X̃_{t_y}, say M_{(v,a_j)}, triggers.
If t_y < T_m^{i,a_j}(h), then t_y ≤ T_end^i, so (D4) implies that K_j has at most 2ζ nonmutants at t_y. On the other hand, T_h(y) = t_y implies that K_j is inactive at t_y. Hence, it must be full of mutants at t_y. This means that K_j remains inactive after t_y until a_j spawns a nonmutant onto a vertex in K_j. If E occurs, then no nonmutant clock with source a_j triggers in (t_y, T], so K_j is inactive throughout (t_y, T]. Recall that v ∈ X̃_{t_y}. If E occurs, then no nonmutant clock with source v* triggers in (t_y, T], so v ∈ X̃_T also. Hence, by definition, v spawns a mutant onto a_j at time T. Moreover, K_j is inactive at time T, and so a_j becomes a mutant at time T. Thus W_a^{i,j}(h) = y, as claimed. By Equation (74), it follows that if f implies that t_y < T_m^{i,a_j}(h), then Equation (75) holds. Combining Cases 1 and 2 by considering all possible f, t_n^{i,a_j}(h), and t_0, …, t_y, and combining Equations (73) and (75), the result follows.
The following definition is related to Definition 6.30.
Definition 6.33. For each i ∈ Z_{≥0} and j ∈ [ℓ], let W_a^{i,j} be the number of times in (I_i^−, T_end^i] at which K_j becomes active. We now combine Lemma 6.31 and Lemma 6.32 to show that, with high probability, K_j does not become active too many times in (I_i^−, T_end^i].

LEMMA 6.34. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let i be a nonnegative integer, and let f be a good possible value of F_{I_i^−}(X̃). Let j ∈ [ℓ]. Then P(W_a^{i,j} ≤ α_i√n(log n)^4 | F_{I_i^−}(X̃) = f) ≥ 1 − 2e^{−(log n)^3}.

PROOF. First, note that K_j can become active either when K_j has no mutants and a mutant is spawned into it, or when K_j is full of mutants and a_j spawns a nonmutant into it. By (D4), K_j has at most 2ζ nonmutants throughout the interval (I_i^−, T_end^i]. We next consider cases on the value of i, namely, whether i = 0 or not. First suppose that i = 0, so α_i = m. By Corollary 2.2 (see also the proof of Lemma 6.31), the probability that the nonmutant clocks with source a_j trigger more than 2len(I_i) times in I_i is at most e^{−n}. It follows from Equation (76) that, with probability at least 1 − e^{−n}, W_a^{i,j} ≤ 2len(I_i) ≤ 4m√n(log n)^3 = 4α_i√n(log n)^3, as required.
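The appeals to Corollary 2.2 in this section are Chernoff-type bounds for Poisson counts: the number of clock triggers in an interval is Poisson distributed, and the probability that a Poisson variable exceeds twice its mean μ is at most (e/4)^μ. The sketch below compares this generic bound with the exact tail; Corollary 2.2 itself is not reproduced here, so this illustrates only the standard fact.

```python
import math

def poisson_tail_chernoff(mu):
    """Chernoff bound: P(Poisson(mu) >= 2*mu) <= (e/4)**mu."""
    return (math.e / 4) ** mu

def poisson_tail_exact(mu, cutoff):
    """P(Poisson(mu) >= cutoff), by summing the pmf below the cutoff."""
    below = sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(cutoff))
    return 1.0 - below

mu = 30
print(poisson_tail_exact(mu, 2 * mu))   # about 1e-6
print(poisson_tail_chernoff(mu))        # about 9e-6, a valid upper bound
```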
For each h ∈ [y + 1], let E_h be the event that a mutant is spawned onto v* in the interval (t_{h−1}, t_h). Note that, with probability 1, no mutant is spawned onto v* at any time t_h. Thus, it suffices to bound the conditional probabilities of the events E_h. Now fix h ∈ [y + 1], and consider any possible value f_{h−1} of F_{t_{h−1}}(X̃) which implies that F_{t_0}(X̃) = f and that Ē_1 ∩ ⋯ ∩ Ē_{h−1} occurs, and which is consistent with Φ = ϕ. Consider the evolution of X̃ given F_{t_{h−1}}(X̃) = f_{h−1} and Φ = ϕ. Since Ē_1 ∩ ⋯ ∩ Ē_{h−1} occurs, no mutant is spawned onto v* in the interval (t_0, t_{h−1}], and so X̃_{t_{h−1}} = χ(h − 1). Moreover, X̃ remains constant in [t_{h−1}, t_h) unless a mutant is spawned onto v*. Thus, given that F_{t_{h−1}}(X̃) = f_{h−1} and Φ = ϕ, E_h occurs if and only if a mutant clock with source in χ(h − 1) and target v* triggers in the interval (t_{h−1}, t_h). Now, since Ē_1 ∩ ⋯ ∩ Ē_{h−1} occurs and t_end > t_{h−1}, by (D4) we have |χ(h − 1) ∩ (K_1 ∪ ⋯ ∪ K_ℓ)| ≥ kℓ/2. Hence, we obtain the required bound on the probability of E_h; it therefore follows from Equation (81) that the result holds.
Definition 6.38. For all i ∈ Z_{≥0}, let γ_i = β_i(log n)^{20}/k.

LEMMA 6.39. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let i be a nonnegative integer, and let f be a good possible value of F_{I_i^−}(X̃).

PROOF. For all h ∈ Z_{≥0}, write S_h = T_m^{i,v*}(h) − T_n^{i,v*}(h). Consider any s_0, …, s_{h−1} ≥ 0, t_h ≥ I_i^−, and any possible value f_h of F_{t_h}(X̃) which implies that F_{I_i^−}(X̃) = f, S_0 = s_0, …, S_{h−1} = s_{h−1}, and T_n^{i,v*}(h) = t_h. Then, by Lemma 6.37, for all t ≥ 0 we have Equation (82). By Corollary 6.36, we have Equation (83). The result therefore follows by Equations (82) and (83) and a union bound.
6.7.4. Proving Lemma 6.27. Recall that, by Lemma 6.39, v* is very unlikely to spend more than γ_i time as a nonmutant over the course of (I_i^−, T_end^i]. This motivates the following definition.

Definition 6.40. Recall the definition of Ψ(X̃) (Section 3.3). For all i ∈ Z_{≥0}, let U_i be the (random) set of vertices, determined by Ψ(X̃), with the following property: if v* spends at most γ_i time as a nonmutant in (I_i^−, T_end^i], then it only spawns nonmutants onto vertices in U_i.

LEMMA 6.41. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let i be a nonnegative integer, and let f* be a possible value of F_{I_i^−}(Ψ(X̃)). Then, with probability at least 1 − 3e^{−(log n)^2} conditioned on F_{I_i^−}(Ψ(X̃)) = f*, the following statements all hold.
The following lemma will be the heart of the proof of Lemma 6.27.
LEMMA 6.42. Consider any r > 1. There is an ℓ_0, depending on r, such that the following holds for any ℓ ≥ ℓ_0. Let i be a nonnegative integer, and let f* be a possible value of F_{I_i^−}(Ψ(X̃)) such that the induced value f of F_{I_i^−}(X̃) is good. Then, with probability at least 1 − 10e^{−(log n)^2} conditioned on F_{I_i^−}(Ψ(X̃)) = f*, the following statements all hold.

(i) (R_1 ∪ ⋯ ∪ R_ℓ) \ X̃_{I_i^+} ⊆ U_i.
(ii) If i ≥ 6, then for all j ∈ [ℓ] such that R_j ∩ U_i = ∅ and all t ∈ [I_i^+ − len(I_i)/4, I_i^+], we have R_j ∪ {a_j} ∪ K_j ⊆ X̃_t.
PROOF. We first define events as follows. For all h ∈ Z_{≥0}, let J_h^− = I_i^− + len(I_i)/2 + h(log n)^7. Note that, by Observation 3.7, F_{I_i^−}(Ψ(X̃)) is uniquely determined by F_{I_i^−}(X̃) and vice versa. Thus, the value f from the statement of the lemma is the unique value such that F_{I_i^−}(X̃) = f if and only if F_{I_i^−}(Ψ(X̃)) = f*, allowing us to apply results like Lemma 6.39 to X̃ even though we are conditioning on F_{I_i^−}(Ψ(X̃)) rather than F_{I_i^−}(X̃).

Given a directed graph G, we may assign to each edge (u, v) a weight w_{uv}, taking w_{uv} = 0 if there is no such edge. Without loss of generality, we require that, for each vertex u, Σ_{v∈V(G)} w_{uv} = 1, so the weights are probabilities. When a vertex u is chosen to reproduce in a weighted graph, its offspring goes to vertex v with probability w_{uv}; the process is, otherwise, identical to the process on unweighted graphs. Note that the unweighted process is recovered by assigning w_{uv} = 1/d^+(u) for every edge (u, v). A weighted graph G whose weights are probabilities is known as an evolutionary graph, and is said to be isothermal [Lieberman et al. 2005; Shakarian et al. 2012] if, for all vertices u, Σ_{v∈V(G)} w_{vu} = 1. This corresponds to the condition that the weighted adjacency matrix of G is doubly stochastic. Broom and Rychtár [2008] show that an undirected graph, when considered as a weighted graph with edge weights w_{uv} = 1/d^+(u), is isothermal if and only if it is regular.
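Since the isothermal condition is exactly double stochasticity of the weight matrix, it is easy to test mechanically. The following sketch checks both conditions for two small example matrices; the examples are arbitrary illustrations, not the graph of Figure 4.

```python
import numpy as np

def is_evolutionary(W, tol=1e-9):
    """Each row sums to 1: outgoing weights are probabilities."""
    return bool(np.allclose(W.sum(axis=1), 1.0, atol=tol))

def is_isothermal(W, tol=1e-9):
    """Rows and columns all sum to 1: the matrix is doubly stochastic."""
    return is_evolutionary(W, tol) and bool(np.allclose(W.sum(axis=0), 1.0, atol=tol))

# A directed 4-cycle with w_uv = 1/outdeg(u) = 1 is regular, hence isothermal.
cycle = np.roll(np.eye(4), 1, axis=1)
print(is_evolutionary(cycle), is_isothermal(cycle))    # True True

# Row-stochastic but not column-stochastic: evolutionary, not isothermal.
skewed = np.array([[0.0, 0.7, 0.3],
                   [0.5, 0.0, 0.5],
                   [0.5, 0.5, 0.0]])
print(is_evolutionary(skewed), is_isothermal(skewed))  # True False
```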
In the supplementary material to Lieberman et al. [2005], Lieberman et al. state and prove the "isothermal theorem," which states that an evolutionary graph is "ρ-equivalent to the Moran process" if and only if it is isothermal. Being ρ-equivalent to the Moran process means that, for all sets X ⊆ V(G), the probability of reaching fixation from the state in which the set of mutants is X is (1 − 1/r^{|X|})/(1 − 1/r^n). In particular, this condition implies that the fixation probability given a single initial mutant placed uniformly at random is ρ_reg(r, n) = (1 − 1/r)/(1 − 1/r^n).
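The fixation probabilities appearing in the isothermal theorem are direct to evaluate; the helper below implements the two formulas just stated.

```python
def fixation_probability(r, x_size, n):
    """(1 - 1/r^{|X|}) / (1 - 1/r^n): fixation probability from a mutant
    set of size |X| in a graph rho-equivalent to the Moran process."""
    return (1 - r ** (-x_size)) / (1 - r ** (-n))

def rho_reg(r, n):
    """Fixation probability from a single uniformly placed initial mutant."""
    return fixation_probability(r, 1, n)

print(rho_reg(1.1, 100))   # tends to 1 - 1/r as n grows
```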
The isothermal theorem has been incorrectly described in the literature. This may stem from the ambiguity of the informal statement of the theorem in the main text of Lieberman et al. [2005, p. 313]. Shakarian et al. [2012] state the theorem in a form (Proposition 8.1) which is very similar to the informal statement by Lieberman et al. It is true that all n-vertex connected isothermal graphs do have fixation probability ρ_reg(r, n). However, the converse direction of the proposition does not hold. We prove the following.

PROPOSITION 1.12. There is an evolutionary graph that is not isothermal but has fixation probability ρ_reg(r, n).
A counterexample to Proposition 8.1, proving Proposition 1.12, is the graph shown in Figure 4. This is an evolutionary graph: the total weight of the outgoing edges from each vertex is 1.

Jamieson-Lane and Hauert [2015] dominate the number of mutants in the reservoirs of the superstar below by a Markov chain that is forward-biased when there are few mutants and unbiased when there are more. More specifically, the dominating chain Q(h) is a simple random walk on {0, …, ℓm}. If h is below some threshold δ, then Q increases by 1 during each (discrete) step with probability γ/(1 + γ) (for some γ > 1) and decreases by 1 with probability 1/(1 + γ). If h is above δ, then it increases or decreases (by 1) with probability 1/2.
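For concreteness, here is a simulation sketch of the dominating chain Q as just described; the threshold δ, bias γ, and state-space size are illustrative placeholders rather than values derived from the heuristic.

```python
import random

def run_Q(start, delta, gamma, top):
    """Simulate the heuristic dominating chain Q on {0, ..., top}:
    up-probability gamma/(1+gamma) below the threshold delta, 1/2 above."""
    h = start
    while 0 < h < top:
        p_up = gamma / (1 + gamma) if h < delta else 0.5
        h += 1 if random.random() < p_up else -1
    return h

# Illustrative placeholder parameters (gamma > 1 gives the upward bias).
trials = 500
wins = sum(run_Q(1, delta=10, gamma=3.0, top=100) == 100 for _ in range(trials))
print(wins / trials)   # fraction of runs in which the chain reaches the top
```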
There are several problems with this domination. First, the domination is invalid because there are actually configurations in which the number of mutants is more likely to decrease than to increase. One such configuration is that in which each reservoir contains m/2 mutants and m/2 nonmutants and the centre vertex v* and all path vertices are nonmutants. It is easy to see that the number of mutants is more likely to decrease than to increase from this configuration, and even that the number of mutants is likely to decrease at least k/(16r) times before it ever increases. Here is the idea. Before the mutant population of the reservoirs can possibly increase, v* must become a mutant. This takes at least k reproductions: a mutant in a reservoir must reproduce, and a chain of k reproductions must move the mutant down the corresponding path to the centre. This is very likely to take at least nk/(2r) steps of the process. But during these nk/(2r) steps, v* is very likely to be chosen for reproduction at least k/(4r) times. Since v* is a nonmutant throughout this period, it must send a nonmutant into some reservoir each time it reproduces. Since half the reservoir vertices are mutants, it is very likely that these k/(4r) reproductions of the centre will cause the mutant population of the reservoirs to decrease, not just once, but at least k/(16r) times before it can even go up at all. Therefore, the assumption that the mutant population of the reservoirs is as likely to increase as to decrease does not hold for all configurations of mutants. A rigorous proof needs to cover all such possibilities.
Even in the early evolution of the process, when there are few mutants in reservoirs, there are still problems with making the domination rigorous. Jamieson-Lane and Hauert [2015, Section 3.5] say (translating their variable names to ours and adding a little notation for future reference): "At any given time step, the probability of losing the initial mutant in the reservoir is p_1 := 1/(F_t m). Based on the dynamics in the path, we derive the per time step probability that a second mutant is generated in any reservoir as the product of the probability that a 'train' is generated and the probability that the train succeeds in producing a second mutant, which yields approximately p_2 := r^4 T/(F_t m)."
Here, F_t is taken to be the overall fitness (the sum of individual fitnesses) in the configuration X_t, and T is the expected length of a "train," which is a chain of mutants at the end of the path. The dominating Markov chain Q is applied with γ ∼ p_2/p_1 ∼ r^4 T. Since Q is a Markov chain, the domination is only valid if it applies step-by-step to every configuration. It applies if, from every fixed configuration, the probability that the number of mutants next goes down is proportional to p_1 and the probability that it next goes up is proportional to p_2. But this is not proved. First, note that the event that the number of reservoir mutants increases does not occur with probability p_2 at any particular step (conditioned on the configuration prior to the step). Instead, the expression given for p_2 is a heuristic aggregated probability which may, roughly, apply at some step or block of steps in the future. In order to rigorously dominate the number of reservoir mutants using the Markov chain Q, it is necessary to split the process into discrete pieces (whose length may be a random variable) so that the number of reservoir mutants decreases by at most one in each piece. It is important that, conditioned on any configuration at the start of any piece, the probability that the number of reservoir mutants goes up must be at least p_2/p_1 times the probability that it goes down. The article does not provide such a domination. Nevertheless, we do believe that