Successive minimum spanning trees

In a complete graph $K_n$ with edge weights drawn independently from a uniform distribution $U(0,1)$ (or alternatively an exponential distribution $\operatorname{Exp}(1)$), let $T_1$ be the MST (the spanning tree of minimum weight) and let $T_k$ be the MST after deletion of the edges of all previous trees $T_i$, $i<k$. We show that each tree's weight $w(T_k)$ converges in probability to a constant $\gamma_k$ with $2k-2\sqrt k<\gamma_k<2k+2\sqrt k$, and we conjecture that $\gamma_k = 2k-1+o(1)$. The problem is distinct from that of Frieze and Johansson (2018), finding $k$ MSTs of combined minimum weight, and for $k=2$ ours has strictly larger cost. Our results also hold (and mostly are derived) in a multigraph model where edge weights for each vertex pair follow a Poisson process; here we additionally have $\mathbb E(w(T_k)) \to \gamma_k$. Thinking of an edge of weight $w$ as arriving at time $t=n w$, Kruskal's algorithm defines forests $F_k(t)$, each initially empty and eventually equal to $T_k$, with each arriving edge added to the first $F_k(t)$ where it does not create a cycle. Using tools of inhomogeneous random graphs we obtain structural results including that $C_1(F_k(t))/n$, the fraction of vertices in the largest component of $F_k(t)$, converges in probability to a function $\rho_k(t)$, uniformly for all $t$, and that a giant component appears in $F_k(t)$ at a time $t=\sigma_k$. We conjecture that the functions $\rho_k$ tend to time translations of a single function, $\rho_k(2k+x)\to\rho_\infty(x)$ as $k \to \infty$, uniformly in $x\in \mathbb R$. Simulations and numerical computations give estimated values of $\gamma_k$ for small $k$, and support the conjectures just stated.


Introduction
1.1. Problem definition and main results. Consider the complete graph K n with edge costs that are i.i.d. random variables, with a uniform distribution U (0, 1) or, alternatively, an exponential distribution Exp (1). A well-known problem is to find the minimum (cost) spanning tree T 1 , and its cost or "weight" w(T 1 ). A famous result by Frieze [10] shows that as n → ∞, w(T 1 ) converges in probability to ζ(3), in both the uniform and exponential cases.
Suppose now that we want a second spanning tree T 2 , edge-disjoint from the first, and that we do this in a greedy fashion by first finding the minimum spanning tree T 1 , and then the minimum spanning tree T 2 using only the remaining edges. (I.e., the minimum spanning tree in K n \ T 1 , meaning the graph with edge set E(K n ) \ E(T 1 ).) We then continue and define T 3 as the minimum spanning tree in K n \ (T 1 ∪ T 2 ), and so on. The main purpose of the present paper is to show that the costs w(T 2 ), w(T 3 ), . . . also converge in probability to some constants.
Theorem 1.1. For each $k \ge 1$, there exists a constant $\gamma_k$ such that, as $n \to \infty$, $w(T_k) \overset{p}{\longrightarrow} \gamma_k$ (for both uniform and exponential cost distributions).
The result extends easily to other distributions of the edge costs, see Remark 7.1, but we consider in this paper only the uniform and exponential cases.
A minor technical problem is that T 2 and subsequent trees do not always exist; it may happen that T 1 is a star and then K n \ T 1 is disconnected. This happens only with a small probability, and w.h.p. (with high probability, i.e., with probability 1 − o(1) as n → ∞) T k is defined for every fixed k, see Section 7. However, in the main part of the paper we avoid this problem completely by modifying the model: we assume that we have a multigraph, which we denote by K ∞ n , with an infinite number of copies of each edge in K n , and that the costs of the copies of each edge are given by the points in a Poisson process with intensity 1 on [0, ∞). (The Poisson processes for different edges are, of course, independent.) Note that when finding T 1 , we only care about the cheapest copy of each edge, and its cost has an Exp(1) distribution, so the problem for T 1 is the same as the original one. However, on K ∞ n we never run out of edges and we can define T k for all integers k = 1, 2, 3, . . . . Asymptotically, the three models are equivalent, as shown in Section 7, and Theorem 1.1 holds for any of the models. In particular:
Theorem 1.2. For each $k \ge 1$, as $n \to \infty$, $w(T_k) \overset{p}{\longrightarrow} \gamma_k$ also for the multigraph model with Poisson process costs.
Frieze [10] also proved that the expectation E w(T 1 ) converges to ζ(3). For the multigraph model just described, this too extends.
Theorem 1.3. For the Poisson multigraph model, $\mathbb E\, w(T_k) \to \gamma_k$ for each $k \ge 1$ as $n \to \infty$.
It is well known that the minimum spanning tree (with any given costs, obtained randomly or deterministically) can be found by Kruskal's algorithm [21], which processes the edges in order of increasing cost and keeps those that join two different components in the forest obtained so far. (I.e., it keeps each edge that does not form a cycle together with previously chosen edges.) As in many other previous papers on the random minimum spanning tree problem, from [10] on, our proofs are based on analyzing the behavior of this algorithm.
Rescale to think of an edge of weight w as arriving at time t = nw. Kruskal's algorithm allows us to construct all trees T k simultaneously by growing forests F k (t), with F k (0) empty and F k (∞) = T k : taking the edges of K n (or K ∞ n ) in order of time arrival (increasing cost), an edge is added to the first forest F k where it does not create a cycle. We will also consider a sequence of graphs G k (t) ⊇ F k (t), where when we add an edge to F k we also add it to all the graphs G 1 , . . . , G k ; see Section 2.1 for details.
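The simultaneous construction just described can be sketched in a few lines. This is a minimal illustration for the simple graph K n with U(0, 1) costs, assuming one union-find structure per forest; the names `DisjointSet` and `successive_forests` are illustrative and not taken from the paper.

```python
import random


class DisjointSet:
    """Union-find over vertices 0..n-1, tracking the components of one forest."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def successive_forests(n, k, rng):
    """Grow F_1, ..., F_k together: each edge, taken in order of increasing
    cost, joins the first forest in which it does not close a cycle."""
    edges = sorted((rng.random(), i, j) for i in range(n) for j in range(i + 1, n))
    forests = [DisjointSet(n) for _ in range(k)]
    weights = [0.0] * k
    counts = [0] * k
    for cost, i, j in edges:
        for f in range(k):
            if forests[f].union(i, j):
                weights[f] += cost
                counts[f] += 1
                break  # the edge is used by at most one forest
    return weights, counts


if __name__ == "__main__":
    w, c = successive_forests(200, 3, random.Random(1))
    print(c, [round(x, 3) for x in w])
```

A forest that ends with fewer than n − 1 edges corresponds to the case, discussed above, where the tree does not exist in the simple graph.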
The proof of Theorem 1.1 is based on a detailed structural characterization of the graphs G k (t), given by Theorem 2.1 (too detailed to set forth here in full), relying heavily on the theory of inhomogeneous random graphs from [4] and related works. Where C 1 (G k (t)) denotes the number of vertices in the largest component of G k (t) (or equivalently of F k (t), as by construction they have the same components) Theorem 2.1 shows that C 1 (G k (t))/n converges in probability to some function ρ k (t), uniformly for all times t. Moreover, each G k has its own giant-component threshold: ρ k (t) is 0 until some time σ k , and strictly positive thereafter.
The functions ρ k (t) are of central interest. For one thing, an edge is rejected from F k , making it a candidate for F k+1 , precisely if its two endpoints are within the same component of F k , and it is shown (see Corollary 5.9) that this is essentially equivalent to the two endpoints both being within the largest component. This line of reasoning yields the constants γ k explicitly, see (6.23), albeit not in a form that is easily evaluated. We are able, at least, to re-prove (in Example 6.5) that γ 1 = ζ(3), as first shown in [10].
The functions ρ k also appear to have a beautiful structure, tending to time-translated copies of a single universal function:
Conjecture 1.4. There exists a continuous increasing function $\rho_\infty(x)$ such that $\rho_k(2k+x) \to \rho_\infty(x)$ as $k \to \infty$, uniformly in $x \in \mathbb R$.
This suggests, though does not immediately imply, another conjecture.
Conjecture 1.5. For some $\delta$, as $k \to \infty$, $\gamma_k = 2k + \delta + o(1)$.
Although we cannot prove these conjectures, some bounds on γ k are obtained in Section 3 by a more elementary analysis of the sequence of forests F k . In particular, Theorem 3.1 and Corollary 3.2 lead to the following, implying that γ k ∼ 2k as k → ∞.
Corollary 1.7. For every $k \ge 1$,
$$2k - 2k^{1/2} < \gamma_k < 2k + 2k^{1/2}. \tag{1.1}$$
See also the related Conjectures 3.5, 10.1 and 11.1.
Remark 1.8. For the simple graph K n with, say, exponential costs, there is as said above a small but positive probability that T k does not exist for k ≥ 2. Hence, either E w(T k ) is undefined for k ≥ 2, or we define w(T k ) = ∞ when T k does not exist, and then E w(T k ) = ∞ for k ≥ 2 and every n. This is no problem for the convergence in probability in Theorem 1.1, but it implies that Theorem 1.3 does not hold for simple graphs, and the multigraph model is essential for studying the expectation.
Remark 1.9. For the minimum spanning tree T 1 , various further results are known, including refined estimates for the expectation of the cost w(T 1 ) [8], a normal limit law [15], and asymptotics for the variance [15; 20; 30]. It seems challenging to show corresponding results for T 2 or later trees.

1.2. Motivations. Frieze and Johansson [11] recently considered a related problem, where instead of choosing spanning trees T 1 , T 2 , . . . greedily one by one, they choose k edge-disjoint spanning trees with minimum total cost. It is easy to see, by small examples, that selecting k spanning trees greedily one by one does not always give a set of k edge-disjoint spanning trees with minimum cost, so the problems are different. We show in Theorem 9.3 that, at least for k = 2, the two problems also asymptotically have different answers, in the sense that the limiting values of the minimum cost (which exist for both problems) are different.
(Also, as discussed in Section 3.1, we improve on the upper bound from [11,Section 3] on the cost of the net cheapest k trees, since our upper bound (3.1) on the cost of the first k trees is smaller.) Both our question and that of Frieze and Johansson [11] are natural, both seem generally relevant to questions of robust network design, and both have mathematically interesting answers.
Another motivation for our question comes from Talwar's "frugality ratio" characterizing algorithmic mechanisms (auction procedures) [29]. The frugality ratio is the cost paid by a mechanism for a cheapest structure, divided by the nominal cost of the second-cheapest structure (in the sense of our T 2 ). Talwar showed that for any matroidal structure (such as our spanning trees), in the worst case over all cost assignments, the frugality ratio of the famous Vickrey-Clarke-Groves (VCG) auction is 1: the VCG cost lies between the nominal costs of T 1 and T 2 . It is natural to wonder, in our randomized setting, how these three costs compare.
Chebolu, Frieze, Melsted and Sorkin [7] show that in the present setting (MST in K n with i.i.d. U (0, 1) edge costs), the VCG cost is on average exactly 2 times the nominal cost of T 1 , and Janson and Sorkin [19] show that the VCG cost converges in probability to a limit, namely 2 times the limit ζ(3) of the cost of T 1 (with further results given for all graphs, and all matroids). Frieze and Johansson [11] show that the combined cost of the cheapest pair of trees converges in expectation to a constant which is numerically about 4.1704288, and Theorem 9.3 shows that the cost of T 1 +T 2 converges in probability to a value that is strictly larger. It follows that in this average-case setting, the frugality ratio converges in probability to some value smaller than (2ζ(3))/(4.1704288−ζ(3)), about 0.80991. So where Talwar found that the (worst-case) frugality ratio was at most 1 for matroids and could be larger in other cases, in the present setting it is considerably less than 1.
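The arithmetic behind the quoted bound on the frugality ratio can be checked directly. The sketch below assumes only the two numbers given in the text, the limit ζ(3) for T 1 and the constant 4.1704288 quoted from [11]; the function names are illustrative.

```python
def zeta3(terms=200_000):
    """Partial sum of zeta(3) = sum_{i>=1} 1/i^3.
    The neglected tail is smaller than 1/(2*terms**2)."""
    return sum(1.0 / (i * i * i) for i in range(1, terms + 1))


# Limiting combined cost of the cheapest pair of trees, as quoted from [11].
PAIR_COST_LIMIT = 4.1704288


def frugality_bound():
    """Upper bound (2*zeta(3)) / (4.1704288 - zeta(3)) on the limiting ratio."""
    z = zeta3()
    return 2.0 * z / (PAIR_COST_LIMIT - z)


if __name__ == "__main__":
    print(f"zeta(3) ~ {zeta3():.7f}, bound ~ {frugality_bound():.5f}")
```

The bound uses that the limit of w(T 2 ) exceeds 4.1704288 − ζ(3), which is what Theorem 9.3 provides.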
We use := as defining its left-hand side, and $\overset{\mathrm{def}}{=}$ as a reminder that equality of the two sides is by definition. We write $\doteq$ for numerical approximate equality, and ≈ for approximate equality in an asymptotic sense (details given where used).
We use "increasing" and "decreasing" in their weak senses; for example, a function f is increasing if f (x) ≤ f (y) whenever x ≤ y.
Unspecified limits are as n → ∞. As said above, w.h.p. means with probability 1 − o(1). Convergence in probability is denoted $\overset{p}{\longrightarrow}$. Furthermore, if $X_n$ are random variables and $a_n$ are positive constants, $X_n = o_p(a_n)$ means, as usual, $X_n/a_n \overset{p}{\longrightarrow} 0$; this is also equivalent to: for every ε > 0, w.h.p. $|X_n| < \varepsilon a_n$.
Graph means, in general, multigraph. (It is usually clear from the context whether we consider a multigraph or simple graph.) If G is a multigraph, then Ġ denotes the simple graph obtained by merging parallel edges and deleting loops. (Loops do not appear in the present paper.) The number of vertices in a graph G is denoted by |G|, and the number of edges by e(G).
For a graph $G$, let $\mathcal C_1(G), \mathcal C_2(G), \dots$ be the largest component, the second largest component, and so on, using any rule to break ties. (If there are fewer than $k$ components, we define $\mathcal C_k(G) = \emptyset$.) Furthermore, let $C_i(G) := |\mathcal C_i(G)|$; thus $C_1(G)$ is the number of vertices in the largest component, and so on. We generally regard components of a graph $G$ as sets of vertices.

Model.
We elaborate on the multigraph model described in the introduction.
We consider (random) (multi)graphs on the vertex set [n] := {1, . . . , n}; we usually omit n from the notation. The graphs will depend on time, and are denoted by G k (t) and F k (t), where k = 1, 2, 3, . . . and t ∈ [0, ∞]; they all start as empty at time t = 0 and grow as time increases. We will have G k (t) ⊇ G k+1 (t) and F k (t) ⊆ G k (t) for all k and t. Furthermore, F k (t) will be a forest. As t → ∞, F k (t) will eventually become a spanning tree, F k (∞), which is the kth spanning tree T k produced by the greedy algorithm in the introduction, operating on the multigraph G 1 (∞).
Since the vertex set is fixed, we may when convenient identify the multigraphs with sets of edges. We begin by defining G 1 (t) by letting edges arrive as independent Poisson processes with rate 1/n for each pair {i, j} of vertices; G 1 (t) consists of all edges that have arrived at or before time t. (This scaling of time turns out to be natural and useful. In essence this is because what is relevant is the cheapest edges on each vertex, and these have expected cost Θ(1/n) and thus appear at expected time Θ(1).) We define the cost of an edge arriving at time t to be t/n, and note that in G 1 (∞), the costs of the edges joining two vertices form a Poisson process with rate 1. Hence, G 1 (∞) is the multigraph model defined in Section 1.
Thus, for any fixed t ≥ 0, G 1 (t) is a multigraph where the number of edges between any two fixed vertices is Po(t/n), and these numbers are independent for different pairs of vertices. This is a natural multigraph version of the Erdős-Rényi graph G(n, t). (The process G 1 (t), t ≥ 0, is a continuous-time version of the multigraph process in e.g. [3] and [16, Section 1], ignoring loops.) Note that Ġ 1 (t), i.e., G 1 (t) with multiple edges merged, is simply the random graph G(n, p) with $p = 1 - e^{-t/n}$.
Next, we let F 1 (t) be the subgraph of G 1 (t) consisting of every edge that has arrived at some time s ≤ t and at that time joined two different components of G 1 (s). Thus, this is a subforest of G 1 (t), as stated above, and it is precisely the forest constructed by Kruskal's algorithm (recalled in the introduction) operating on G 1 (∞), at the time all edges with cost ≤ t/n have been considered. Hence, F 1 (∞) is the minimum spanning tree T 1 of G 1 (∞). Next, let G 2 (t) := G 1 (t) \ F 1 (t), i.e., the subgraph of G 1 (t) consisting of all edges rejected from F 1 (t); in other words G 2 (t) consists of the edges that, when they arrive to G 1 (t), have their endpoints in the same component.
We continue recursively. F k (t) is the subforest of G k (t) consisting of all edges in G k (t) that, when they arrived at some time s ≤ t, joined two different components in G k (s). And G k+1 (t) := G k (t) \ F k (t), consisting of the edges rejected from F k (t).
Hence, the kth spanning tree T k produced by Kruskal's algorithm equals F k (∞), as asserted above.
Note that F k (t) is a spanning subforest of G k (t), in other words, the components of F k (t) (regarded as vertex sets) are the same as the components of G k (t); this will be used frequently below. Moreover, each edge in G k+1 (t) has endpoints in the same component of G k (t); hence, each component of G k+1 (t) is a subset of a component of G k (t). It follows that an edge arriving to G 1 (t) will be passed through G 2 (t), . . . , G k (t) and to G k+1 (t) (and possibly further) if and only if its endpoints belong to the same component of G k (t), and thus if and only if its endpoints belong to the same component of F k (t).
2.2. More notation. We say that a component C of a graph G is the unique giant of G if |C| > |C′| for every other component C′; if there is no such component (i.e., if the maximum size is tied), then we define the unique giant to be ∅.
We say that a component C of F k (t) is the permanent giant of F k (t) (or of G k (t)) if it is the unique giant of F k (t) and, furthermore, it is a subset of the unique giant of F k (u) for every u > t; if there is no such component then the permanent giant is defined to be ∅.
Let C * k (t) denote the permanent giant of F k (t). Note that the permanent giant either is empty or the largest component; thus |C * k (t)| is either 0 or C 1 (F k (t)) = C 1 (G k (t)). Note also that the permanent giant C * k (t) is an increasing function of t: C * k (t) ⊆ C * k (u) whenever t ≤ u.

2.3. A structure theorem. The basis of our proof of Theorems 1.1 and 1.2 is the following theorem on the structure of the components of G k (t).
Recall that F k (t) has the same components as G k (t), so the theorem applies as well to F k (t). The proof is given in Section 5. For k = 1, the theorem collects various known results for G(n, p). Our proof includes this case too, making the proof more self-contained.
Theorem 2.1. With the definitions above, the following hold for every fixed k ≥ 1 as n → ∞.
(i) There exists a continuous increasing function We note also a formula for the number of edges in G k (t), and two simple inequalities relating different k.
3. Bounds on the expected cost
3.1. Total cost of the first k trees. The following theorem gives lower and upper bounds on the total cost of the first k spanning trees.
Theorem 3.1. Letting $W_k := \sum_{i=1}^{k} w(T_i)$ be the total cost of the first $k$ spanning trees, for every $k \ge 1$,
$$k^2\,\frac{n-1}{n} \le \mathbb E\, W_k \le k(k+1)\,\frac{n-1}{n}. \tag{3.1}$$
Comparing with Frieze and Johansson [11, Section 3], our upper bound is smaller than their $k^2 + 3k^{5/3}$ despite the fact that they considered a more relaxed minimization problem (see Section 9); as such ours is a strict improvement. In both cases the lower bound is simply the expected total cost of the cheapest $k(n-1)$ edges in $G$, with (3.2) matching [11, (3.1)].
Proof. The minimum possible cost of the $k$ spanning trees is the cost of the cheapest $k(n-1)$ edges. Since each edge's costs (plural, in our model) are given by a Poisson process of rate 1, the set of all edge costs is given by a Poisson process of rate $\binom n2$. Recall that in a Poisson process of rate $\lambda$, the interarrival times are independent exponential random variables with mean $1/\lambda$, so that the $i$th arrival, at time $Z_i$, has $\mathbb E Z_i = i/\lambda$. It follows in this case that
$$\mathbb E\, W_k \ge \sum_{i=1}^{k(n-1)} \mathbb E Z_i = \frac{k(n-1)\,\bigl(k(n-1)+1\bigr)}{2\binom n2} = \frac{k\,\bigl(k(n-1)+1\bigr)}{n} \ge k^2\,\frac{n-1}{n}. \tag{3.2}$$
We now prove the upper bound. An arriving edge is rejected from $F_i$ iff it lies within the "forbidden" set $B_i$ of edges, namely those edges with both endpoints in one component of $F_i$. The nesting property of the components means that $B_1 \supseteq B_2 \supseteq \cdots$. An arriving edge $e$ joins $F_k$ if it is rejected from all previous forests, i.e., $e \in B_{k-1}$ (in which case by the nesting property, $e$ also belongs to all earlier $B$s) but can be accepted into $F_k$, i.e., $e \notin B_k$. The idea of the proof is to show that the first $k$ forests fill reasonably quickly with $n-1$ edges each, and we will do this by coupling the forest-creation process (Kruskal's algorithm) to a simpler, easily analyzable random process.
Let $s(\tau) = \{s_k(\tau)\}_{k=0}^{\infty}$ denote the vector of the sizes (number of edges) of each forest after arrival of the $\tau$'th edge; we may drop the argument $\tau$ when convenient. Let $p_k = |B_k| / \binom n2$, the rejection probability for $F_k$. For any $\tau$, by the nesting property of the components and in turn of the $B_k$,
$$s_1 \ge s_2 \ge \cdots \quad\text{and}\quad p_1 \ge p_2 \ge \cdots. \tag{3.3}$$
The MST process can be simulated by using a sequence of i.i.d. random variables $\alpha(\tau) \sim U(0,1)$, incrementing $s_k(\tau)$ if both $\alpha(\tau) \le p_{k-1}(\tau)$ (so that $e$ is rejected from $F_{k-1}$ and thus from all previous forests too) and $\alpha(\tau) > p_k(\tau)$ (so that $e$ is accepted into $F_k$). We take the convention that $p_0(\tau) = 1$ for all $\tau$. For intuition, note that when $s_k = 0$ an edge is never rejected from $F_k$ ($p_k = 0$, so $\alpha \sim U(0,1)$ is never smaller); when $s_k = 1$ it is rejected with probability $p_k = 1/\binom n2$; and when $s_k = n-1$ it is always rejected ($|B_k|$ must be $\binom n2$, so $p_k = 1$). Given the size $s_k$, $|B_k|$ is maximized (thus so is $p_k$) when all the edges are in one component, i.e.,
$$p_k \le \binom{s_k+1}{2} \Big/ \binom n2 = \frac{s_k(s_k+1)}{n(n-1)} \tag{3.4}$$
$$\le \frac{s_k}{n-1} =: \bar p_k. \tag{3.5}$$
The size vector $s(\tau)$ thus determines the values $\bar p_k(\tau)$ for all $k$. Let $r(\tau)$ denote a vector analogous to $s(\tau)$, but with $r_k(\tau)$ incremented if $\hat p_{k-1}(\tau) \ge \alpha(\tau) > \hat p_k(\tau)$, where $\hat p_0 := 1$ and
$$\hat p_k := \frac{r_k}{n-1}. \tag{3.6}$$
By construction, $r_1 \ge r_2 \ge \cdots$ and $\hat p_1 \ge \hat p_2 \ge \cdots$. For intuition, here note that when $r_k = 0$ an arrival is never rejected from $r_k$ ($\hat p_k = 0$); when $r_k = 1$ it is rejected with probability $\hat p_k = 1/(n-1)$, larger than the $p_k = 1/\binom n2$ of a forest with $s_k = 1$; and when $r_k = n-1$ it is always rejected ($\hat p_k = 1$).
Figure 1. Coupling of the forests' sizes $s(\tau)$ to a simply analyzable random process $r(\tau)$, showing the structure of the inductive proof (on $\tau$) that $s(\tau)$ majorizes $r(\tau)$.
Taking each $F_i(0)$ to be an empty forest ($n$ isolated vertices, no edges) and accordingly $s(0)$ to be an infinite-dimensional 0 vector, and taking $r(0)$ to be the same 0 vector, we claim that for all $\tau$, $s(\tau)$ majorizes $r(\tau)$, which we will write as $s(\tau) \succeq r(\tau)$. That is, the prefix sums of $s$ dominate those of $r$: for all $\tau$ and $k$,
$$\sum_{i=1}^{k} s_i(\tau) \ge \sum_{i=1}^{k} r_i(\tau).$$
We first prove this; then use it to argue that edge arrivals to the first $k$ forests, i.e., to $s$, can only precede arrivals to the first $k$ elements of $r$; and finally analyze the arrival times of all $k(n-1)$ elements to the latter to arrive at an upper bound on the total cost of the first $k$ trees.
We prove $s(\tau) \succeq r(\tau)$ by induction on $\tau$, the base case with $\tau = 0$ being trivial. Figure 1 may be helpful in illustrating the structure of this inductive proof. Suppose the claim holds for $\tau$. The probabilities $p_k(\tau)$ are used to determine the forests $F_k(\tau+1)$ and in turn the size vector $s(\tau+1)$. Consider an intermediate object $s'(\tau+1)$, the size vector that would be given by incrementing $s(\tau)$ using the upper-bound values $\bar p_k(\tau)$ taken from $s(\tau)$ by (3.5). Then, $s_i(\tau+1)$ receives the increment if $p_{i-1} \ge \alpha > p_i$, and $s'_j(\tau+1)$ receives the increment if $\bar p_{j-1} \ge \alpha > \bar p_j$; hence, from $\bar p_{i-1} \ge p_{i-1} \ge \alpha$ it is immediate that $i \le j$ and thus $s(\tau+1) \succeq s'(\tau+1)$.
It suffices then to show that $s'(\tau+1) \succeq r(\tau+1)$. These two vectors are obtained respectively from $s(\tau)$ and $r(\tau)$, with $s(\tau) \succeq r(\tau)$ by the inductive hypothesis, using probability thresholds $\bar p_k(\tau) = f(s_k(\tau))$ and $\hat p_k(\tau) = f(r_k(\tau))$ respectively, applied to the common random variable $\alpha$, where $f(s) = s/(n-1)$ (but all that is important is that $f$ is a monotone function of $s$). Suppose that $\bar p_{i-1} \ge \alpha > \bar p_i$ and $\hat p_{j-1} \ge \alpha > \hat p_j$, so that elements $i$ in $s'$ and $j$ in $r$ are incremented. If $i \le j$, we are done. (Prefix sums of $s(\tau)$ dominated those of $r(\tau)$, and the element incremented in $s'(\tau+1)$ is no later than the one incremented in $r(\tau+1)$, thus prefix sums of $s'(\tau+1)$ dominate those of $r(\tau+1)$.) Consider then the case that $i > j$. In both processes the increment falls between indices $j$ and $i$, so the $k$-prefix sum inequality continues to hold for $k < j$ and $k \ge i$. Thus, for $j \le k < i$, we have $f(s_k(\tau)) = \bar p_k \ge \alpha > \hat p_k = f(r_k(\tau))$, hence $s_k(\tau) \ge r_k(\tau) + 1$ by monotonicity of $f$, and so
$$\sum_{l=1}^{k} s'_l(\tau+1) = \sum_{l=1}^{k-1} s_l(\tau) + s_k(\tau) \ge \sum_{l=1}^{k-1} r_l(\tau) + r_k(\tau) + 1 = \sum_{l=1}^{k} r_l(\tau+1),$$
from which it follows that $s'(\tau+1) \succeq r(\tau+1)$, completing the inductive proof that $s(\tau) \succeq r(\tau)$.
Having shown that the vector $s(\tau)$ of forest sizes majorizes $r(\tau)$, it suffices to analyze the latter. Until this point we could have used (3.4) rather than (3.5) to define $\bar p_k$, $\hat p_k$, and the function $f$, but now we take advantage of the particularly simple nature of the process governing $r(\tau)$. Recall that a new edge increments $r_i$ for the first $i$ for which the $U(0,1)$ "coin toss" $\alpha(\tau)$ has $\alpha(\tau) > \hat p_i \overset{\mathrm{def}}{=} r_i/(n-1)$. Equivalently, consider an array of cells $n-1$ rows high and infinitely many columns wide, generate an "arrival" at a random row or "height" $X(\tau)$ uniform on $1, \dots, n-1$, and let this arrival occupy the first unoccupied cell $i$ at this height, thus incrementing the occupancy $r_i$ of column $i$. This is equivalent because if $r_i$ of the $n-1$ cells in column $i$ are occupied, the chance that the arrival is rejected from column $i$ (that $X(\tau)$ falls into this set and thus the arrival moves along to test the next column $i+1$) is $r_i/(n-1)$, matching (3.6).
Recalling that the cost of an edge arriving at time $t$ is $t/n$ in the original graph problem, the combined cost $W_k$ of the first $k$ spanning trees is $1/n$ times the sum of the arrival times of their $k(n-1)$ edges. The majorization means that the $\ell$'th arrival to the first $k$ forests comes no later than the $\ell$'th arrival to the first $k$ columns of the cell array. Thus, the cost $W_k$ of the first $k$ trees is at most $1/n$ times the sum of the times of the $k(n-1)$ arrivals to the array's first $k$ columns.
The continuous-time edge arrivals are a Poisson process with intensity $1/n$ on each of the $\binom n2$ edges, thus intensity $(n-1)/2$ in all; it is at the Poisson arrival times that the discrete time $\tau$ is incremented and $X(\tau)$ is generated. Subdivide the "$X$" process into the $n-1$ possible values that $X$ may take on, so that arrivals at each value (row in the cell array) are a Poisson process of intensity $\lambda = \tfrac12$. The sum of the first $k$ arrival times in a row is the sum of the first $k$ arrival times in its Poisson process. The $i$th such arrival time is the sum of $i$ exponential random variables, and has expectation $i/\lambda$. The expected sum of $k$ arrival times of a row is thus $\binom{k+1}{2}/\lambda = k(k+1)$, and (remembering that cost is time divided by $n$), the expected total cost of all $n-1$ rows is $\frac{n-1}{n}\,k(k+1)$, yielding the upper bound in (3.1) and completing the proof of the theorem.
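The cell-array process is simple enough to simulate directly, which gives a quick sanity check on the expected cost (n − 1)k(k + 1)/n. The following is a minimal Monte Carlo sketch under the description above (each of the n − 1 rows receives arrivals as a Poisson process of rate 1/2, and the first k arrivals in a row fill columns 1, ..., k); the function names are illustrative.

```python
import random


def cell_array_cost(n, k, rng):
    """One sample of (1/n) * (sum of arrival times to the first k columns)."""
    total_time = 0.0
    for _ in range(n - 1):          # one independent Poisson process per row
        t = 0.0
        for _ in range(k):          # first k arrivals fill columns 1..k
            t += rng.expovariate(0.5)   # rate 1/2, i.e. mean gap 2
            total_time += t
    return total_time / n


def expected_cost(n, k):
    """Closed form from the proof: (n - 1)/n * k(k + 1)."""
    return (n - 1) / n * k * (k + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, trials = 50, 3, 400
    mean = sum(cell_array_cost(n, k, rng) for _ in range(trials)) / trials
    print(f"simulated {mean:.3f} vs formula {expected_cost(n, k):.3f}")
```

The simulated mean only approximates the formula, with an error that shrinks as the number of trials grows.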
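Corollary 3.2. Let $\Gamma_k := \sum_{i=1}^{k} \gamma_i$. Then, for every $k \ge 1$,
$$k^2 \le \Gamma_k \le k^2 + k. \tag{3.11}$$
Proof. Immediate from Theorems 3.1 and 1.3.
Proof of Corollary 1.7. For the upper bound, we note that obviously $\gamma_1 \le \gamma_2 \le \cdots$, and thus, for any $\ell \ge 1$, using both the upper and lower bound in (3.11) (with $\Gamma_0 := 0$),
$$\ell\,\gamma_k \le \sum_{i=k}^{k+\ell-1} \gamma_i = \Gamma_{k+\ell-1} - \Gamma_{k-1} \le (k+\ell-1)^2 + (k+\ell-1) - (k-1)^2 = 2k\ell + \ell^2 - \ell + k - 1,$$
and hence
$$\gamma_k \le 2k + \ell - 1 + \frac{k-1}{\ell}.$$
Choosing $\ell = \lceil \sqrt k\, \rceil$ gives the upper bound in (1.1). For the lower bound, similarly, for $1 \le \ell \le k$,
$$\ell\,\gamma_k \ge \sum_{i=k-\ell+1}^{k} \gamma_i = \Gamma_k - \Gamma_{k-\ell} \ge k^2 - (k-\ell)^2 - (k-\ell) = 2k\ell - \ell^2 - k + \ell,$$
and hence
$$\gamma_k \ge 2k - \ell + 1 - \frac{k}{\ell}.$$
Choosing, again, $\ell = \lceil \sqrt k\, \rceil$ gives the lower bound in (1.1).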
Besides these rigorous results, taking increments of the left and right-hand sides of (3.11) also suggests the following conjecture.
3.3. Improved upper bounds. The upper bounds in Theorem 3.1 and Corollary 3.2 were proved using the bound (3.5). A stronger, but less explicit, bound can be proved by using instead the sharper (3.4). That is, we consider the random vectors $r(\tau)$ defined as above but with (3.6) replaced by
$$\hat p_k := \binom{r_k+1}{2} \Big/ \binom n2.$$
As remarked before (3.4), this approximation comes from imagining all edges in each $F_k$ to be in a single component; this overestimates the probability that an arriving edge is rejected from $F_k$ and, as developed in the previous subsection, gives $s(\tau) \succeq r(\tau)$ just as when $\hat p_k$ was defined via (3.5).
Using for consistency our usual time scaling in which edges arrive at rate $(n-1)/2$, and writing $r_k(t)$ for the value of $r_k$ at time $t$, by a standard martingale argument one can show that, for each $k \ge 1$,
$$\frac{r_k(t)}{n} \overset{p}{\longrightarrow} g_k(t), \quad \text{uniformly for } t \ge 0, \tag{3.17}$$
for some continuously differentiable functions $g_k(t)$ satisfying the differential equations, with $g_0(t) := 1$,
$$g_k'(t) = \tfrac12 \bigl( g_{k-1}(t)^2 - g_k(t)^2 \bigr), \qquad g_k(0) = 0. \tag{3.18}$$
Moreover, using $s(\tau) \succeq r(\tau)$ and taking limits, it can be shown that
$$\Gamma_k \le \tfrac12 \int_0^\infty t\,\bigl( 1 - g_k(t)^2 \bigr)\,dt. \tag{3.19}$$
We omit the details, but roughly, in time $dt$, $\tfrac12 n\,dt$ edges arrive, all costing about $t/n$, and a $g_k(t)^2$ fraction of them pass beyond the first $k$ graphs (to the degree that we are now modeling graphs). Compare (3.19) with (6.19), with reference to (6.3).
For $k = 1$, (3.18) has the solution $g_1(t) = \tanh(t/2)$, and (3.19) yields the bound $\Gamma_1 = \gamma_1 \le 2\ln 2 \doteq 1.386$. This is better than the bound 2 given by (3.11), but still far from precise since $\gamma_1 = \zeta(3) \doteq 1.202$. For $k \ge 2$ we do not know any exact solution to (3.18), but numerical solution of (3.18) and calculation of (3.19) (see Section 11.3) suggests that $\Gamma_k < k^2 + 1$. We leave the proof of this as an open problem. If proved, this would be a marked improvement on $\Gamma_k \le k^2 + k$, which was the exact expectation of the random process given by (3.5) (that part of the analysis was tight). In particular, it would establish that $2k - 2 \le \gamma_k \le 2k$; see Conjecture 11.1.
For $k = 2$, the numerical calculations in Section 11.3 give $\gamma_1 + \gamma_2 \le 4.5542\ldots$ (see Table 2) and thus $\gamma_2 \le 3.3521\ldots$. The same value was also obtained using Maple's numerical differential equation solver, with Maple giving greater precision but the two methods agreeing in the digits shown here.
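A numerical solution of this kind can be sketched with a hand-written Runge-Kutta integrator. The sketch assumes that (3.18) reads $g_k' = \tfrac12(g_{k-1}^2 - g_k^2)$ with $g_k(0) = 0$ and $g_0 := 1$, and that the bound (3.19) is $\tfrac12\int_0^\infty t\,(1 - g_k(t)^2)\,dt$; these forms are an assumption here, chosen because they reproduce $g_1(t) = \tanh(t/2)$ and the value $2\ln 2$ quoted for $k = 1$. The function name, the cutoff `t_max` and the step count are illustrative, and this is not the method of Section 11.3.

```python
import math


def gamma_sum_bound(k, t_max=60.0, steps=30000):
    """Integrate the assumed system with classical RK4 and return
    (1/2) * int_0^t_max t * (1 - g_k(t)^2) dt.
    State y = [g_1, ..., g_k, running value of the integral]."""
    h = t_max / steps

    def deriv(t, y):
        out = []
        g_prev = 1.0                      # g_0(t) := 1
        for i in range(k):
            out.append(0.5 * (g_prev * g_prev - y[i] * y[i]))
            g_prev = y[i]
        out.append(0.5 * t * (1.0 - y[k - 1] * y[k - 1]))
        return out

    def shifted(y, d, scale):
        return [a + scale * b for a, b in zip(y, d)]

    y = [0.0] * (k + 1)
    t = 0.0
    for _ in range(steps):
        d1 = deriv(t, y)
        d2 = deriv(t + h / 2, shifted(y, d1, h / 2))
        d3 = deriv(t + h / 2, shifted(y, d2, h / 2))
        d4 = deriv(t + h, shifted(y, d3, h))
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, d1, d2, d3, d4)]
        t += h
    return y[k]


if __name__ == "__main__":
    print("k=1:", gamma_sum_bound(1), "vs 2 ln 2 =", 2 * math.log(2))
    print("k=2:", gamma_sum_bound(2))
```

Truncating the integral at `t_max` is harmless here because $1 - g_k(t)^2$ decays exponentially; the $k = 2$ output can be compared with the value reported from Section 11.3.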
4. Preliminaries and more notation
4.1. Some random graphs. For a symmetric array $(p_{ij})_{i,j=1}^{n}$ of probabilities in $[0,1]$, let $G(n,(p_{ij}))$ be the random (simple) graph on the vertex set $[n] := \{1, \dots, n\}$ where the edge $ij$ appears with probability $p_{ij}$, for $i < j$, and these $\binom n2$ events are independent. We extend this (in a trivial way) by defining $G(n,A) = G(n,(a_{ij})) := G(n,(a_{ij} \wedge 1)_{i,j=1}^{n})$ for any symmetric non-negative $n \times n$ matrix $A = (a_{ij})_{i,j=1}^{n}$. Moreover, the matrix $A$ can be random, in which case $G(n,A)$ is defined by first conditioning on $A$. (Hence, edges appear conditionally independently, given $A$.) Note that we do not allow loops, so the diagonal entries $p_{ii}$ or $a_{ii}$ are ignored, and may be assumed to be 0 without loss of generality.

4.2. Susceptibility. The susceptibility $\chi(G)$ of a (deterministic or random) graph $G$ of order $n = |G|$ is defined by
$$\chi(G) := \frac1n \sum_i C_i(G)^2; \tag{4.1}$$
$\chi(G)$ can be interpreted as the mean size of the component containing a random vertex, see [18]. We also exclude the first term in the sum and define
$$\hat\chi(G) := \frac1n \sum_{i \ge 2} C_i(G)^2. \tag{4.2}$$
(This is particularly interesting for a graph $G$ with a single giant component of order $\Theta(n)$, when the sum in (4.1) is dominated by the first term.) Viewing each term in the sums (4.1)-(4.2) as $(C_i(G)/n)\,C_i(G)$, since $\sum_i C_i(G)/n = 1$ each sum can be viewed as a weighted sum of the sizes $C_i(G)$ with the weights summing to at most 1, so
$$\chi(G) \le C_1(G), \qquad \hat\chi(G) \le C_2(G).$$
Let $\pi(G)$ be the probability that two randomly chosen distinct vertices in $G$ belong to the same component. Then
$$\pi(G) = \frac{\sum_i C_i(G)\,\bigl(C_i(G) - 1\bigr)}{n(n-1)} = \frac{\chi(G) - 1}{n - 1}.$$
4.3. Kernels and an integral operator. A kernel, or graphon, is a non-negative symmetric measurable function $\kappa : S^2 \to [0, \infty)$, where (in a common abuse of notation) $S = (S, \mathcal F, \mu)$ is a probability space. Given a kernel $\kappa$ on a probability space $S$, let $T_\kappa$ be the integral operator defined by
$$T_\kappa f(x) := \int_S \kappa(x,y)\,f(y)\,d\mu(y)$$
(for suitable functions $f$ on $S$), and let $\Phi_\kappa$ be the non-linear operator
$$\Phi_\kappa f := 1 - e^{-T_\kappa f}.$$
In our cases, the kernel $\kappa$ is bounded, and then $T_\kappa$ is a compact (in fact, Hilbert-Schmidt) operator on $L^2(S,\mu)$. Since furthermore $\kappa \ge 0$, it follows that there exists an eigenfunction $\psi \ge 0$ on $S$ with eigenvalue $\|T_\kappa\|$, see [4, Lemma 5.15], where $\|T_\kappa\|$ denotes the operator norm of $T_\kappa$ as an operator on $L^2(S,\mu)$.
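The susceptibility quantities of Section 4.2 depend only on the list of component sizes, so they are easy to compute exactly. The sketch below assumes the standard definitions $\chi(G) = \frac1n\sum_i C_i(G)^2$, the same sum without the largest component, and $\pi(G)$ as the fraction of vertex pairs lying in a common component; the function names are illustrative.

```python
from fractions import Fraction


def susceptibility(sizes):
    """chi(G) = (1/n) * sum of squared component sizes."""
    n = sum(sizes)
    return Fraction(sum(c * c for c in sizes), n)


def susceptibility_without_largest(sizes):
    """Same sum with the largest component left out."""
    n = sum(sizes)
    rest = sorted(sizes, reverse=True)[1:]
    return Fraction(sum(c * c for c in rest), n)


def same_component_probability(sizes):
    """pi(G): chance that two distinct random vertices share a component."""
    n = sum(sizes)
    return Fraction(sum(c * (c - 1) for c in sizes), n * (n - 1))


if __name__ == "__main__":
    sizes = [5, 3, 1, 1]                      # a graph on n = 10 vertices
    chi = susceptibility(sizes)
    print(chi, susceptibility_without_largest(sizes),
          same_component_probability(sizes))
    # The identity pi(G) = (chi(G) - 1) / (n - 1) holds exactly.
    assert same_component_probability(sizes) == (chi - 1) / (sum(sizes) - 1)
```

Exact rational arithmetic makes the identity between $\pi(G)$ and $\chi(G)$ checkable without rounding error.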

4.4. Branching processes. Given a kernel κ on a probability space (S, F, µ), as in [4] let X κ (x) be the multi-type Galton-Watson branching process with type space S, starting with a single particle of type x ∈ S, and where in each generation a particle of type y is replaced by its children, consisting of a set of particles distributed as a Poisson process on S with intensity κ(y, z) dµ(z). Let further X κ be the same branching process started with a particle of random type, distributed as µ.
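For intuition, consider the simplest case of a single type with constant kernel κ ≡ c, so that every particle has Po(c) children. It is a standard fact (not taken from the text above) that the survival probability ρ of this process is the largest solution of ρ = 1 − e^{−cρ}, and that iterating the map from the starting value 1 converges to it. A minimal sketch, with an illustrative function name and iteration count:

```python
import math


def survival_probability(c, iterations=2000):
    """Largest solution of rho = 1 - exp(-c * rho), found by iterating
    the map rho -> 1 - exp(-c * rho) downward from rho = 1."""
    rho = 1.0
    for _ in range(iterations):
        rho = 1.0 - math.exp(-c * rho)
    return rho


if __name__ == "__main__":
    for c in (0.5, 2.0, 3.0):
        print(c, survival_probability(c))
```

For c below 1 the iteration collapses to 0 (extinction is certain), while for c above 1 it settles at a strictly positive value; convergence is slow near the critical value c = 1.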
(i) The function $\rho_\kappa$ is a fixed point of $\Phi_\kappa$, i.e., it satisfies the equation $\rho_\kappa = 1 - e^{-T_\kappa \rho_\kappa}$.
The proof of Theorem 2.1 is based on induction; we assume throughout this section that, for $k \ge 1$, Theorem 2.1 holds for $k - 1$ and show that it holds for $k$.
For convenience, we define F 0 (t) := G 0 (t) := K n for every t ≥ 0; this enables us to consider G 1 (t) together with G k (t) for k > 1. (Alternatively, we could refer to known results for the random graph process G 1 (t).) Note that Theorem 2.1 then trivially holds for k = 0, with ρ 0 (t) = 1 for all t and σ 0 = 0, except that (iii) has to be modified (since ρ 0 is constant). There are some trivial modifications below in the case k = 1 (and also some, more or less important, simplifications); we leave these to the reader.
Thus, fix $k \ge 1$, assume that Theorem 2.1 holds for $k - 1$ and consider the evolution of $G_k(t)$. Essentially everything in this proof depends on $k$, but we often omit it from the notation. (Recall that we also usually omit $n$.) We condition on the entire process $(F_{k-1}(s))_{s \ge 0}$. For two distinct vertices $i, j \in [n]$, let $\tau(i,j) = \tau_{k-1}(i,j)$ be the time that $i$ and $j$ become members of the same component in $F_{k-1}(t)$. This is the time when edges $ij$ start to be passed to $G_k(t)$, and it follows that, conditionally on $(F_{k-1}(s))_{s \ge 0}$, the process $G_k(t)$, $t \ge 0$, can be described as $G_1(t)$ above (Section 2), except that for each pair $\{i,j\}$ of vertices, edges appear according to a Poisson process on $(\tau(i,j), \infty)$. In particular, for a fixed time $t$ (a value, independent of $n$) and conditioned on $(F_{k-1}(s))_{s \ge 0}$, in the multigraph $G_k(t)$, the number of edges $ij$ is $\mathrm{Po}\bigl((t - \tau(i,j))_+/n\bigr)$, and these numbers are (conditionally) independent for different pairs $\{i,j\}$. Hence, if we merge multiple edges and obtain the simple graph $\dot G_k(t)$, we see that $\dot G_k(t) = G(n,(p_{ij}))$, the random graph defined in Section 4.1 with
$$p_{ij} := 1 - \exp\bigl(-(t - \tau(i,j))_+/n\bigr)$$
when $i \ne j$, and (for completeness) $p_{ii} = 0$. Note that the probabilities $p_{ij}$ depend on $(F_{k-1}(s))_{s \ge 0}$ and thus are random, and recall that therefore $G(n,(p_{ij}))$ is defined by first conditioning on $(p_{ij})$.
For each vertex $i$, define $\tau(i) := \inf\{t \ge 0 : i \in \mathcal C^*_{k-1}(t)\}$, i.e., the first time that vertex $i$ belongs to the permanent giant of $F_{k-1}$. Two vertices in the permanent giant belong to the same component, so
$$\tau(i,j) \le \tau(i) \vee \tau(j), \tag{5.4}$$
but strict inequality is possible since $i$ and $j$ may both belong to a component of $F_{k-1}(t)$ that is not the permanent giant. We shall see that this does not happen very often, and one of the ideas in the proof is that we may regard the inequality (5.4) as an approximate equality. This is formalized in the following lemma, and leads to a more tractable graph defined in (5.22) and compared with $\dot G_k(t)$ in Lemma 5.3.
Proof. Fix ε > 0, let L := t/ε and let t : Note that, using (5.4), for any pair (i, j), and for a good pair (i.e., a pair that is not bad), By the induction hypothesis Theorem 2.1(v), w.h.p. G k−1 (t 1 ) has a permanent giant, so we may assume that this holds. (Failures contribute o p (n 2 ) to the right-hand side of (5.5).) In the first case, i and j belong to the same component in G k−1 (σ k−1 − ε), and in the second case they belong to the same component in G k−1 (t ), but not to the largest one, since that is assumed to be the permanent giant. Hence, for any t, the number of bad pairs (i, j) is at most, using the definitions (4.1)-(4.2), (5.9) By (4.3) and the induction hypothesis (i) and (iii), 10) and similarly for every , by (4.4) and the induction hypothesis (ii), By (5.9)-(5.11), the number of bad pairs is o p (n 2 ). Hence, using (5.7) and 12) and the result follows since ε is arbitrary.
We use the machinery and notation in Bollobás, Janson and Riordan [4, in particular Section 2], and make the following definitions: • µ k−1 is the probability measure on S with distribution function • ν n is a probability measure given by where δ x is the point mass (Dirac delta) at x. (In other words, ν n is the empirical distribution of {x 1 , . . . , x n }. Put yet another way, for any set A ⊂ S, ν n (A) := 1 n {i : x i ∈ A} .) Note that x n and ν n are random, and determined by (F k−1 (s)) s 0 .
Proof. The claim is equivalent to In the terminology of [4, Section 2], (S, µ k−1 ) is a ground space and, by Lemma 5.2, is a vertex space, meaning that the number of vertices x i appearing by time t is governed by µ k−1 , as made precise by (5.18). We define also, for every t 0, the kernel Note that, for fixed t, the kernel κ t is bounded and continuous; hence κ t is a graphical kernel [4, Definition 2.7, Remark 2.8 and Lemma 8.1]. Furthermore, κ t is strictly positive, and thus irreducible, on [0, t)×[0, t), and 0 on the As detailed in [4, Section 2], specifically near its (2.3), these ingredients define a random graph G V (n, κ t ). (5.21) Recall that in our case the kernel κ t is given by (5.20) while the vertex space V is given by (5.19), in turn with S and µ k−1 given by (5.13) and (5.14), and x n given by (5.15). In general, G V (n, κ t ) denotes a random graph with vertices arriving at random times x n , vertices i and j joined with probability κ t (x i , x j ), and [4] describes the behavior of such an inhomogeneous random graph. It suffices to think of G V (n, κ t ) in terms of S, µ k−1 , and κ t , because as shown in [4] the particulars of x n are irrelevant as long as x n is consistent with µ k−1 in the sense of (5.18) and (5.16), this consistency following from the fact that V is a vertex space (see (5.19) and the line following it).
Here, $G^{\mathcal V}(n, \kappa_t)$ is the random graph alluded to after (5.4), a proxy for $G_k(t)$ with the difference that it is based on the times $\tau(i)$ of vertices joining the permanent giant of $F_{k-1}$, rather than the more complicated two-variable times $\tau(i,j)$ of two vertices first belonging to a common component. Concretely, $G^{\mathcal V}(n, \kappa_t) = G(n, (p^-_{ij}))$ with $p^-_{ij} := \kappa_t(x_i, x_j)/n$ (5.22) when $i \ne j$, and (for completeness) $p^-_{ii} = 0$. We assume throughout that $n \ge t$, so that $p^-_{ij} \in [0, 1]$; this is not an issue since $t$ is fixed while $n \to \infty$.
Note that by (5.2) and (5.5), Recall that both p ij and p − ij depend on (F k−1 (s)) s 0 , and thus are random. By (5.1) and (5.22),Ġ k (t) = G(n, (p ij )) and G V (n, κ t ) = G(n, (p − ij )), so by making the obvious maximal coupling of G(n, (p ij )) and G(n, (p − ij )) conditionally on (F k−1 (s)) s 0 , we obtain a coupling ofĠ k (t) and is the number of edges that are present in one of the graphs but not in the other, then Then, using Markov's inequality, for any δ > 0, P X n > εn P(Y n > δn) + E P(X n > εn | Y n δn) P(Y n > δn) + δn εn .

5.2. Towards part (i). The following lemma establishes (2.1) of Theorem 2.1(i) for any fixed $t \ge 0$; doing so uniformly for all $t \ge 0$, as the theorem states, follows later. Here we rely on [4, Theorem 3.1], which, roughly speaking, relates the size of the largest component of a random graph $G^{\mathcal V}(n, \kappa_t)$ to the survival probability of the branching process defined by the same kernel $\kappa_t$ and the measure (here $\mu_{k-1}$) comprised by the vertex space $\mathcal V$. By Lemma 5.3, the graph $\dot G_k(t)$ of interest differs from $G^{\mathcal V}(n, \kappa_t)$ in only $o_p(n)$ edges, and the stability theorem [4, Theorem 3.9] shows that the size of the largest component of $\dot G_k(t)$ is about the same as that of $G^{\mathcal V}(n, \kappa_t)$.
Let $X_t = X_{t,k} := X_{\kappa_t}$ be the branching process defined in Section 4.4 for the kernel $\kappa_t$ and the measure $\mu_{k-1}$, and (recalling (4.8)) let $\rho(\kappa_t) := \rho(\kappa_t; \mu_{k-1})$ be its survival probability.
the survival probability of the branching process X t .
Hence we may in the rest of the proof assume t > σ k−1 and thus µ k−1 (t) = ρ k−1 (t) > 0. As noted after (5.20) above, the kernel κ t then is quasiirreducible. Hence, it follows from [4, Theorem 3.1] that We have shown in Lemma 5.3 thatĠ k (t) differs from G V (n, κ t ) by only o p (n) edges, and we appeal to the stability theorem [4, Theorem 3.9] to show that largest components of these two graphs have essentially the same size.
(Alternatively, we could use [5, Theorem 1.1].) A minor technical problem is that this theorem is stated for irreducible kernels, while $\kappa_t$ is only quasi-irreducible. We can extend the theorem (in a standard way) by considering only the vertices $i$ with $x_i = \tau_{k-1}(i) \le t$, i.e., the vertices $i$ in the permanent giant $\mathcal C^*_{k-1}(t)$ of $G_{k-1}(t)$, see (5.3). This defines a generalized vertex space [4, Section 2] $\mathcal V' = (S', \mu'_{k-1}, (\mathbf x'_n)_{n \ge 1})$, where $S' := [0, t]$, $\mu'_{k-1}$ is the restriction of $\mu_{k-1}$ to $S'$, and $\mathbf x'_n$ is the subsequence of $\mathbf x_n = (x_1, \dots, x_n)$ consisting of all $x_i \in S'$. The kernel $\kappa_t$ is strictly positive a.e. on $S' \times S'$, and is thus irreducible.
Thus, we may take G n in [4, Theorem 3.9] to be G n := G V (n, κ t ), (5.30) which may be thought of as the restriction of G V (n, κ t ) to C * k−1 (t). Take the theorem's G n to be the restriction ofĠ k (t) to C * k−1 (t). For any δ > 0, from Lemma 5.3, w.h.p.
δn. Restricting each of these graphs to C * k−1 (t), it follows that w.h.p. e(G n G n ) δn. (5.32) Thus, G n and G n fulfill the theorem's hypotheses. For any ε > 0, we may choose δ > 0 per the theorem's hypotheses, and it follows from the theorem and (5.29) that w.h.p.
Our aim is to establish (2.2), which is (5.33) with $C_1(G_k(t))$ in lieu of $C_1(G'_n)$. Each component $\mathcal C$ of $G_k(t)$ (or equivalently of $\dot G_k(t)$) is a subset of some component of $G_{k-1}(t)$, either $\mathcal C_1(G_{k-1}(t))$ or some other component. Since $t > \sigma_{k-1}$ and by the induction hypothesis (v) of Theorem 2.1, w.h.p. $\mathcal C^*_{k-1}(t) \ne \emptyset$ and thus $\mathcal C_1(G_{k-1}(t)) = \mathcal C^*_{k-1}(t)$. Thus, components of $G_k(t)$ contained in $\mathcal C_1(G_{k-1}(t))$ are also contained in $G'_n$, and the largest such component is governed by (5.33). Components of $G_k(t)$ contained in a smaller component of $G_{k-1}(t)$ have size at most $C_2(G_{k-1}(t))$, which by the induction hypothesis (ii) is w.h.p. smaller than any constant times $n$, and thus smaller than the component described by (5.33). Consequently, w.h.p. $C_1(G_k(t)) = C_1(G'_n)$, and thus (5.33) implies (2.2) w.h.p., for every $\varepsilon > 0$, which is equivalent to (2.1).

5.3. Towards part (ii). The next lemma establishes something like Theorem 2.1(ii), but only for any fixed $t \ge 0$; extending this to the supremum follows later.
Proof. We use the notation of the proof of Lemma 5.4, specifically (5.30) and (5.31). Let G † n be the graph G n with a single edge added such that the two largest components C 1 (G n ) and C 2 (G n ) are joined and let ε > 0. (If C 2 (G n ) = ∅, let G † n := G n .) Since w.h.p. the analog of (5.32) holds also for G † n , [4, Theorem 3.9] applies also to G n and G † n and shows that w.h.p.
Furthermore, as shown in the proof of Lemma 5.4, w.h.p. every component of $G_k(t)$ that is not part of $G'_n$ has size at most $C_2(G_{k-1}(t))$, which w.h.p. is $\le \varepsilon n$ by the induction hypothesis. Consequently, w.h.p.
which completes the proof.
Let $T_t = T_{t,k} := T_{\kappa_t}$ be the integral operator defined by (4.6) with the measure $\mu_{k-1}$. We regard $T_t$ as an operator on $L^2(S, \mu_{k-1})$, and recall that (since $\kappa_t$ is bounded) $T_t$ is a bounded and compact operator for every $t \ge 0$.

Proofs of parts (iii) and (iv).
Proof of Theorem 2.1(iii). By Lemma 5.6, there exists a unique σ k > 0 such that where the last equivalence follows from [4, Theorem 3.1], establishing T t > 1 as a necessary and sufficient condition for the existence of a giant component, and providing its size. In order to see that ρ k is strictly increasing on [σ k , ∞), let σ k < t < u. Since κ u (x, y) κ t (x, y) for all x, y ∈ S, we may couple the branching processes X t = X κt and X u = X κu such that X u is obtained from X t by adding extra children to some individuals. (Each individual of type x gets extra children of type y distributed as a Poisson process with intensity (κ u (x, y)−κ t (x, y)) dµ k−1 (y), independent of everything else.) Then clearly X u survives if X t does, so ρ k (u) := ρ(κ u ) ρ(κ t ) = ρ k (t). (See [4, Lemma 6.3].) Moreover, there is a positive probability that X t dies out but X u survives, for example because the initial particle has no children in X t but at least one in X u , and this child starts a surviving branching process. Hence ρ k (u) > ρ k (t).
We next prove Theorem 2.1(iv). A simple lemma will be useful here and subsequently.
Consider the process defined in Section 2 of all graphs $G_j(t)$, $j \ge 1$ and $t \ge 0$, under some edge-arrival process; consider also a similar set of graphs $G'_j(t)$ coming from a second arrival process thicker than the first (i.e., containing the same arrivals and possibly others).
Lemma 5.7. The thicker process yields larger graphs, i.e., $G_j(t) \subseteq G'_j(t)$ for all $j$ and $t$. Also, any edge $e$ present in both arrival processes, if contained in $F'_1(t) \cup \dots \cup F'_j(t)$, is also contained in $F_1(t) \cup \dots \cup F_j(t)$.
Proof. It is easy to see that adding edges can only make $G_1$ larger, i.e., that $G'_1(t) \supseteq G_1(t)$. Thus any edge originally passed on to $G_2(t)$ will still be passed on, plus perhaps some others; by induction on $j$, any $G_j(t)$ can only increase, i.e., $G'_j(t) \supseteq G_j(t)$. This proves the first assertion. The second assertion follows from the first. If $e$ is not contained in $F_1(t) \cup \dots \cup F_j(t)$ then it is passed on to $G_{j+1}(t)$, and hence, as just shown, it belongs also to $G'_{j+1}(t)$ and therefore not to $F'_1(t) \cup \dots \cup F'_j(t)$.
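The edge-passing rule behind Lemma 5.7 can be simulated directly. The following is a minimal Python sketch, not the authors' code: `DisjointSet`, `successive_forests` and the `level` encoding are names introduced here for illustration only. Each arriving edge is offered to $F_1, F_2, \dots$ in turn and joins the first forest in which it closes no cycle. An edge of level $L$ lies in $G_j$ exactly for $j \le L$, so the first assertion of the lemma says that levels can only increase when the arrival process is thickened.

```python
import random


class DisjointSet:
    """Union-find over vertices 0..n-1, tracking the components of one forest."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def successive_forests(n, arrivals, k):
    """arrivals: list of (edge_id, u, v) in arrival order (hypothetical format).

    Returns {edge_id: level}, where level j <= k means the edge was accepted
    by forest F_j, and level k+1 means it was rejected by all k forests.
    """
    forests = [DisjointSet(n) for _ in range(k)]
    level = {}
    for edge_id, u, v in arrivals:
        level[edge_id] = k + 1
        for j, forest in enumerate(forests, start=1):
            if forest.union(u, v):  # no cycle in F_j: the edge stays here
                level[edge_id] = j
                break
    return level
```

For instance, if `thin` is a sub-list of `thick` (in the same order), then every common edge `e` should satisfy `successive_forests(n, thick, k)[e] >= successive_forests(n, thin, k)[e]`, which is the first assertion of the lemma restated in terms of levels.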
Let q n (t) be the probability that two fixed, distinct, vertices in G k (t) belong to the same component. By symmetry, this is the same for any pair of vertices, and thus also for a random pair of distinct vertices. Hence, recalling (4.5), q n (t) = E π(G k (t)). (5.43) Lemma 5.8. There exist constants b k , B k > 0 such that, for every n 2, Proof. Fix some t 0 > σ k ; thus ρ k (t 0 ) > 0 by (iii). Then, cf. (4.5), writing . lim Let q := ρ k (t 0 ) 2 /2, say. Then (5.46) shows that if n is large enough, q n (t 0 ) > q. By reducing q, if necessary, we may assume that this holds for every n, since obviously q n (t 0 ) > 0 for every fixed n 2.
For an integer m 0, consider the process defined in Section 2 of all graphs G j (t), j 1 and t 0, but erase all edges and restart at mt 0 ; denote the resulting random graphs by G    = G k (t 0 ); furthermore, the random graphs G k,m , m = 0, 1, . . . , are independent, since they depend on edges arriving in disjoint time intervals.
Consider the process at times i t 0 for integers i. By Lemma 5.7, G k (i t 0 ) dominates what it would have been had no edges arrived by (i − 1)t 0 , which, for i 2, is simply an independent copy of G k (t 0 ) (that is, independent of G k (t 0 ) but identically distributed). Consequently, for any integer M , vertices x and y can be in different components of G k (M t 0 ) only if they are in different components in each of the M copies of G k (t 0 ). Thinking of all values i 1 at once, these copies of G k (t 0 ) are all independent, as they depend on edge arrivals in disjoint time intervals (i − 1)t 0 , i t 0 . Thus, (5.47) Thus, for any t 0, taking M := t/t 0 , which shows (5.44). (In fact, we get B k = e q < e; we can take q arbitrarily small and thus B k arbitrarily close to 1, at the expense of decreasing b k .) Proof of Theorem 2.1(iv). Let C 1 (t) be the component of G k (t) that contains vertex 1. Then, by Lemma 5.8, Furthermore, by Lemma 5.4 and dominated convergence, E C 1 (G k (t))/n → ρ k (t) as n → ∞. Hence, (5.49) implies ρ k (t) 1 − B k e −b k t , which is (2.3).
for every j N . (The case j = N is trivial, since G k (∞) a.s. is connected.) Then, for every j = 1, . . . , N and every t ∈ [t j−1 , t j ], 52) which together with a similar lower bound shows that w.h.p. |C 1 G k (t) /n− ρ k (t)| 2ε for all t 0. Since ε is arbitrary, this shows (2.2).
Assume (5.51) and (5.53) for every j N , and also that C 2 G k (t) > 3εn for some t 0. Choose j with 1 j N such that t ∈ [t j−1 , t j ]. If C 2 (G k (t)) has not merged with C 1 (G k (t)) by time t j , then which contradicts (5.53). If on the other hand these two components have merged, then, using (5.51) and (from (5.50)) that ρ k (t j−1 ) ρ k (t j ) − ε, which contradicts (5.51). Consequently, w.h.p. sup t C 2 G k (t) 3εn.
Proof of Theorem 2.1(v). If $t > \sigma_k$, then $\rho_k(t) > 0$ by Theorem 2.1(iii). Let $\delta = \rho_k(t)/2$. Then, by (i) and (ii), w.h.p. $C_1(G_k(t)) > \delta n$, and, simultaneously for every $u \ge 0$, $C_2(G_k(u)) < \delta n$. Assume that these inequalities hold. Then, in particular, the largest component of $G_k(t)$ is a unique giant. (Recall the definition from Section 2.2.) Moreover, for every $u \ge t$, the component $\mathcal C$ of $G_k(u)$ that contains the largest component of $G_k(t)$ then satisfies $|\mathcal C| \ge C_1(G_k(t)) > \delta n > C_2(G_k(u))$, (5.56) showing that $\mathcal C$ is the unique giant of $G_k(u)$. Hence, the largest component of $G_k(t)$ is w.h.p. a permanent giant. Consequently, if $t > \sigma_k$, then w.h.p. $|\mathcal C^*_k(t)| = C_1(G_k(t))$ and (2.4) follows from (2.1). On the other hand, if $t \le \sigma_k$, then (2.1) and (iii) yield This completes the proof of Theorem 2.1.

5.6.
A corollary. We note the following corollary.  , it is not difficult to prove the much stronger results that if t < σ k is fixed, then there exists a finite constant χ k (t) such that χ(G k (t)) p −→ χ k (t), and if t = σ k is fixed, then there exists a finite constant χ k (t) such that χ(G k (t)) p −→ χ k (t). Furthermore, these limits can be calculated from the branching process X t = X κt on (S, µ k−1 ): if we let |X t | be the total population of the branching process, then χ k (t) = E(|X t |) and χ k (t) = E |X t | 1{|X t | < ∞} . We omit the details.
Proof of Theorem 2.3. Since κ t (x, y) = 0 for every y when x t, a particle of type x t will not get any children at all in the branching process X t,k = X κt , hence has survival probability ρ κ (x) = 0. Thus, recalling (4.9) and (5.14), the survival probability Moreover, even if x < t, there is a positive probability that x has no children in X t,k , and thus there is strict inequality in (5.61) whenever ρ k−1 (t) > 0.
An alternative view of the last part is that, asymptotically, no edges arrive in G k (t) until t = σ k−1 , and even if all edges were passed on to G k (t) from that instant, G k (t) would thenceforth evolve as a simple Erdős-Rényi random graph, developing a giant component only 1 unit of time later, at t = σ k−1 + 1.

Proof of Theorem 1.2
For a and b with 0 a < b ∞, let N k (a, b) be the number of edges that arrive to G 1 (t) during the interval (a, b] and are not passed on to G k+1 (t); furthermore, let W k (a, b) be their total cost. In other words, we consider the edges, arriving in (a, b], that end up in one of T 1 = F 1 (∞), . . . , T k = F k (∞). In particular, for 0 t ∞, and thus Since an edge arriving at time t has cost t/n, we have Lemma 6.1. Let 0 a < b ∞ and k 1. For any ε > 0, w.h.p.
Let F t be the σ-field generated by everything that has happened up to time t. At time t, the fraction of edges arriving to G 1 (t) that are rejected by all of F 1 (t), . . . , F k (t) is simply the fraction lying within a component of F k (t), namely π(G k (t)) (see (4.5)). Since edges arrive to G 1 (t) at a total rate 1 n n 2 = n−1 2 , conditioned on F t , edges are added to F 1 (t) ∪ · · · ∪ F k (t) at a rate, using (4.5), (6.6) By Corollary 5.9, for every fixed t, Condition on the event r k (a) 1 − ρ k (a) 2 + ε n/2, which by (6.7) occurs w.h.p. Then, since r k (t) is a decreasing function of t, the process of edges that are added to F 1 (t) ∪ · · · ∪ F k (t) can for t a be coupled with a Poisson process with constant intensity 1−ρ k (a) 2 +ε n/2 that is thicker (in the sense defined just before Lemma 5.7). Thus, letting Z be the number arriving in the latter process in (a, b], we have w.h.p.
Furthermore, by the law of large numbers, w.h.p.
For the lower bound, we stop the entire process as soon as $r_k(t) < \tfrac12 \bigl(1 - \rho_k(b)^2 - \varepsilon\bigr) n$. Since $r_k(t)$ is decreasing, if the stopping condition does not hold at time $t = b$ then it also does not hold at any earlier time, so by (6.7), w.h.p. we do not stop before $b$. As long as we have not stopped, we can couple with a Poisson process with constant intensity $\bigl(1 - \rho_k(b)^2 - \varepsilon\bigr) n/2$ that is thinner (i.e., the opposite of thicker), and we obtain the lower bound in (6.5) analogously to the upper bound.
Proof. Let N 1 and define t j := jb/N . By (6.4) and Lemma 6.1, for every j ∈ [N ], w.h.p. (6.11) where we define the piecewise-constant function f N by Consequently, w.h.p., (6.14) We obtain a corresponding lower bound similarly, using the lower bounds in (6.4) and (6.5).
, which is (6.10). We want to extend Lemma 6.2 to b = ∞. This will be Lemma 6.4, but to prove it we need the following lemma. Lemma 6.3. For any k 1 there exist constants b k , B k > 0 such that, for all t 0, Proof. For any t, recalling that N k counts edges arriving at rate r k (t) and that r k (t) is a decreasing function, we obtain by (6.6), (5.43), and Lemma 5.8, Thus, by (6.4), for b k := b k /2 and some B k < ∞, Hence, for some B k < ∞ and all t 0, 2B k e −b k t by Theorem 2.1(iv), establishing that the integral converges.
Proof of Theorem 1.2. By (6.3) and Lemma 6.4 (with W 0 (0, ∞) = 0), (6.23) Example 6.5. The limit γ k in Theorem 1.2 is thus given by the integral in (6.23). Unfortunately, we do not know how to calculate this, even numerically, for k 2. However, we can illustrate the result with the case k = 1. In this case, ρ 1 (t) is the asymptotic relative size of the giant component in G(n, t/n), and as is well-known, and follows from (5.28) and (4.10) noting that κ t (x, y) = t, σ 1 = 1 and for t > 1, ρ 1 (t) = 1 − e −tρ 1 (t) . The latter function has the inverse t(ρ) = − log(1 − ρ)/ρ, ρ ∈ (0, 1). Hence, by an integration by parts and two changes of variables, with ρ = 1 − e −x , where the final integral can be evaluated using a series expansion. Hence we recover the limit ζ(3) found by Frieze [10].
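The closing computation of Example 6.5 can be checked numerically. The sketch below rests on an assumption: starting from $\gamma_1 = \tfrac12\int_0^\infty t\,(1-\rho_1(t)^2)\,dt$ (the rate $(1-\rho_k(t)^2)n/2$ from the proof of Lemma 6.1 times the cost $t/n$), the integration by parts and the substitutions $t(\rho) = -\log(1-\rho)/\rho$, $\rho = 1 - e^{-x}$ lead to $\tfrac12\int_0^\infty x^2/(e^x-1)\,dx$, which is what we take the final integral in (6.24) to be. The function names and the truncation and step parameters are illustrative choices, not part of the paper.

```python
import math


def gamma1_via_x_integral(upper=60.0, steps=60000):
    """Composite Simpson estimate of (1/2) * int_0^upper x^2 / (e^x - 1) dx.

    The integrand extends continuously to 0 at x = 0, and the tail beyond
    `upper` is of order upper^2 * exp(-upper), negligible for upper = 60.
    `steps` must be even.
    """
    def integrand(x):
        return 0.0 if x == 0.0 else x * x / math.expm1(x)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 0.5 * total * h / 3


def zeta3_partial(terms=200000):
    """Partial sum of the series sum_{j>=1} j^(-3) for zeta(3)."""
    return sum(1.0 / j**3 for j in range(1, terms + 1))
```

Both `gamma1_via_x_integral()` and `zeta3_partial()` should agree with $\zeta(3) \approx 1.2020569$ to many digits, consistent with recovering Frieze's limit.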
Remark 6.6. An argument similar to the proofs of Lemmas 6.2 and 6.4 shows that However, since T k has n − 1 edges, we trivially have N k (0, ∞) = k(n − 1) a.s. Hence, for any k 1, (This is easily verified for the case k = 1, by calculations similar to (6.24).) Equivalently, for any k 1 (since (6.26) holds trivially for k = 0 too), Proof of Theorem 1.3. It follows from (6.1) that N k (0, t) k(n − 1) and thus, using also (6.4), W k (0, b) kb. Consequently, Lemma 6.2 and dominated convergence yield, for every b < ∞, (6.28) Hence, (6.28) holds for b = ∞ too by the following routine three-epsilon argument: We have (6.29) where, for any ε > 0, we can make all three terms on the right-hand side less than ε (in absolute value) by choosing first b and then n large enough. The result follows since w(T k ) = W k (0, ∞) − W k−1 (0, ∞), cf. (6.23).
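The edge-count normalization of Remark 6.6 can likewise be verified numerically for $k = 1$. The sketch assumes that the remark amounts to $\tfrac12\int_0^\infty (1-\rho_1(t)^2)\,dt = 1$, i.e. the rate $(1-\rho_1(t)^2)n/2$ from the proof of Lemma 6.1 integrated over time yields the $n-1 \sim n$ edges of $T_1$. Here $\rho_1$ is computed by bisection from $\rho = 1 - e^{-t\rho}$ (Example 6.5); `rho1`, `edge_fraction_k1` and their parameters are hypothetical names chosen for this illustration.

```python
import math


def rho1(t, iterations=100):
    """Largest root in [0, 1] of rho = 1 - exp(-t * rho), found by bisection.

    For t > 1 the function g(r) = 1 - exp(-t r) - r is positive below the
    positive root and negative above it; for t <= 1 it is negative on (0, 1],
    so the bisection collapses towards 0.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if -math.expm1(-t * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def edge_fraction_k1(upper=40.0, steps=4000):
    """Estimate (1/2) * int_0^upper (1 - rho1(t)^2) dt.

    On [0, 1] the integrand equals 1, contributing exactly 1 to the integral;
    on [1, upper] composite Simpson is used (`steps` must be even).
    """
    def integrand(t):
        return 1.0 - rho1(t) ** 2

    h = (upper - 1.0) / steps
    total = integrand(1.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(1.0 + i * h)
    return 0.5 * (1.0 + total * h / 3)
```

The value of `edge_fraction_k1()` should be close to 1, and `-math.log1p(-rho1(2.0)) / rho1(2.0)` should return approximately 2, matching the inverse function $t(\rho) = -\log(1-\rho)/\rho$.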
We can now prove Theorem 2.2.
Proof of Theorem 2.2. Let 0 a < b ∞, and let N (a, b) be the total number of edges arriving to G 1 (s) in the interval s ∈ (a, b]. Then N (a, b) ∼ Po n 2 1 n (b − a) , and by the law of large numbers, for any ε > 0, w.h.p.
The number of edges passed to 1 (a, b), and thus it follows from (6.30) and (6.5) that for any ε > 0, w.h.p. In the Poisson (process) model studied so far, we have a multigraph with an infinite number of parallel edges (with increasing costs) between each pair of vertices.
It is also of interest to consider the simple graph K n with a single edge (with random cost) between each pair of vertices, with the costs i.i.d. random variables. We consider two cases, the exponential model with costs Exp(1) and the uniform model with costs U (0, 1). When necessary, we distinguish the three models by superscripts P, E, and U.
We use the standard coupling of the exponential and uniform models: if $X^E_{ij} \sim \operatorname{Exp}(1)$ is the cost of edge $ij$ in the exponential model, then the costs $X^U_{ij} := 1 - \exp(-X^E_{ij})$ (7.1) are i.i.d. and $U(0, 1)$, and thus yield the uniform model. Since the mapping $X^E_{ij} \mapsto X^U_{ij}$ is monotone, the Kruskal algorithm selects the same set of edges for both models, and thus the trees $T_1, T_2, \dots$ (as long as they exist) are the same for both models; the edge costs are different, but since we select edges with small costs, $X^U_{ij} \approx X^E_{ij}$ for all edges in $T_k$ and thus $w(T^U_k) \approx w(T^E_k)$; see Lemma 7.4 for a precise statement. Remark 7.1. We can in the same way couple the exponential (or uniform) model with a model with i.i.d. edge costs with any given distribution. It is easily seen, by the proof below and arguments as in Frieze [10] or Steele [28] for $T_1$, that Theorem 1.1 extends to any edge costs $X_{ij}$ that have a continuous distribution on $[0, \infty)$ with the distribution function $F(x)$ having a right derivative $F'(0+) = 1$ (for example, an absolutely continuous distribution with a density function $f(x)$ that is right-continuous at 0 with $f(0+) = 1$); if $F'(0+) = a > 0$, we obtain instead $w(T_k) \overset{p}{\longrightarrow} \gamma_k/a$. This involves no new arguments, so we confine ourselves to the important models above as an illustration, and leave the general case to the reader.
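The coupling of the exponential and uniform models is easy to illustrate. The sketch below assumes that the map in (7.1) is the usual inverse-CDF transform $x \mapsto 1 - e^{-x}$; the helper name `exp_to_uniform`, the sample size and the seed are arbitrary choices made here. It checks the two properties used in the text: the map preserves the order of the costs (so Kruskal's algorithm processes edges in the same order in both models), and the two costs differ by at most $x^2/2$, so they nearly agree on cheap edges.

```python
import math
import random


def exp_to_uniform(x):
    """Map an Exp(1) cost x to 1 - exp(-x), which is U(0,1) distributed."""
    return -math.expm1(-x)


rng = random.Random(0)
exp_costs = [rng.expovariate(1.0) for _ in range(1000)]
uni_costs = [exp_to_uniform(x) for x in exp_costs]

# Edge indices sorted by exponential cost; Kruskal scans edges in this order.
order_exp = sorted(range(len(exp_costs)), key=exp_costs.__getitem__)

# Monotonicity: the uniform costs are nondecreasing along the same order.
same_order = all(
    uni_costs[a] <= uni_costs[b] for a, b in zip(order_exp, order_exp[1:])
)

# Closeness: 0 <= x - (1 - e^{-x}) <= x^2 / 2, up to rounding error.
gaps_ok = all(
    -1e-15 <= x - u <= x * x / 2 + 1e-15 for x, u in zip(exp_costs, uni_costs)
)
```

Both `same_order` and `gaps_ok` should evaluate to `True`; the second inequality is the elementary bound $e^{-x} \le 1 - x + x^2/2$ for $x \ge 0$.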
Moreover, we obtain the exponential model from the Poisson model by keeping only the first (cheapest) edge for each pair of vertices. We assume throughout the section this coupling of the two models. We regard also the exponential model as evolving in time, and define G E k (t) and F E k (t) recursively as we did G k (t) and F k (t) in Section 2, starting with G E 1 (t) := G 1 (t), the simple subgraph of K n obtained by merging parallel edges and giving the merged edge the smallest cost of the edges (which is the same as keeping just the first edge between each pair of vertices).
Recall from the introduction that while in the Poisson model every $T_k$ exists a.s., in the exponential and uniform models there is a positive probability that $T_k$ does not exist, for any $k \ge 2$ and any $n \ge 2$. (In this case we define $w(T_k) := \infty$.) The next lemma shows, in particular, that this probability is $o(1)$ as $n \to \infty$. (The estimates in this and the following lemma are not best possible and can easily be improved.) Lemma 7.2. In any of the three models and for any fixed $k \ge 1$, w.h.p. $T_k$ exists and, moreover, uses only edges of costs $\le 2k \log n/n$.
Proof. Consider the exponential model; the result for the other two models is an immediate consequence by the couplings above (or by trivial modifications of the proof). The result then says that w.h.p. G E k (2k log n) is connected.
By induction, we may for k 1 assume that the result holds for k − 1. Thus, w.h.p., G E k−1 2(k − 1) log n is connected, and then all later edges are passed to G E k (t). Consider now the edges arriving in (2(k − 1) log n, 2k log n]. They form a random graph G(n, p) with p = e −2(k−1) log n/n − e −2k log n/n = e −2k log n/n e 2 log n/n − 1 As is well-known since the beginning of random graph theory [9], see e.g. [2], such a random graph G(n, p) is w.h.p. connected. We have also seen that w.h.p. this graph G(n, p) is a subgraph of G E k (2k log n). Hence, G E k (2k log n) is w.h.p. connected, which completes the induction. Proof. Since the exponential model is obtained from the Poisson model by deleting some edges, we have by Lemma 5.7 that every edge contained in both processes and contained in F 1 (t)∪· · ·∪F k (t) is also contained in F E 1 (t)∪ · · · ∪ F E k (t); the only edges "missing" from the latter are those that were repeat edges in the Poisson model.
At time t k := 2k log n, how many repeat edges are there? For two given vertices i and j, the number of parallel edges is Po(t k /n), so the probability that it is two or more is p 2 (t k ) := P(Po(t k /n) 2) (t k /n) 2 /2. (We use that the kth factorial moment of Po(λ) is λ k , and Markov's inequality.) Hence, the number of pairs {i, j} with more than one edge is Bi( n 2 , p 2 (t k )), which is stochastically smaller than Bi(n 2 , (t k /n) 2 ), which by Chebyshev's inequality w.h.p. is 2t 2 k = 8k 2 log 2 n. Similarly, the probability that i and j have three or more parallel edges is (t k /n) 3 /6 and thus w.h.p. there are no triple edges in G 1 (t k ). By Lemma 7.2, w.h.p. T P 1 ∪· · ·∪T P k = F 1 (t k )∪· · ·∪F k (t k ), and we have just established that w.h.p. all but at most 2t 2 k of the edges in Since each spanning tree has exactly n − 1 edges, the missing edges are replaced by the same number of other edges, which by Lemma 7.2 w.h.p. also have cost t k /n each, thus total cost at most 2t 3 k = 16k 3 log 3 n/n. Consequently, w.h.p., Having additional edges can never hurt (in this matroidal context), so This yields the first inequality in (7.3), while the second follows from (7.4) together with (7.5) for j k − 1.
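The Poisson tail bounds used above, $P(\operatorname{Po}(\lambda) \ge m) \le \lambda^m/m!$ for $m = 2, 3$, follow from the factorial moment and Markov's inequality, and can be sanity-checked numerically. The following Python sketch is illustrative only; `poisson_tail`, `factorial_moment_bound` and `expected_repeat_pairs` are names introduced here.

```python
import math


def poisson_tail(lam, m, extra_terms=80):
    """P(Po(lam) >= m), summed term by term starting from j = m."""
    term = math.exp(-lam) * lam**m / math.factorial(m)
    total = 0.0
    j = m
    for _ in range(extra_terms):
        total += term
        j += 1
        term *= lam / j  # next Poisson probability P(Po(lam) = j)
    return total


def factorial_moment_bound(lam, m):
    """Markov bound lam^m / m! obtained from the m-th factorial moment."""
    return lam**m / math.factorial(m)


def expected_repeat_pairs(n, k):
    """Upper bound binom(n,2) * (t_k/n)^2 / 2 on the expected number of
    vertex pairs with at least two parallel edges at time t_k = 2k log n."""
    t_k = 2 * k * math.log(n)
    return math.comb(n, 2) * factorial_moment_bound(t_k / n, 2)
```

For example, `expected_repeat_pairs(10**4, 2)` is below $t_k^2/4$, comfortably within the w.h.p. bound $2t_k^2$ quoted in the text.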
Lemma 7.4. For each fixed k, Proof. As said above, T E k and T U k consist of the same edges, with edge costs related by (7.1). Since (7.1) implies 0 2k log n n 2 .

(7.7)
Proof of Theorem 1.1. It follows from Theorem 1.2 and Lemma 7.3 that for each fixed k, w(T E k ) p −→ γ k , and then from Lemma 7.4 that w(T U k ) p −→ γ k , which is Theorem 1.1.
Recall that the corresponding statement for the expectation is false, as $E\,w(T^E_k) = E\,w(T^U_k) = \infty$ for $k \ge 2$; see Remark 1.8.

The second threshold
As noted in Example 6.5 we do not know how to calculate the limit $\gamma_2$. However, we can find the threshold $\sigma_2$. In principle, the method works for $\sigma_k$ for any $k \ge 2$, provided we know $\rho_{k-1}$, so we will explain the method for general $k$. However, we will assume the following: This is, we think, not a serious restriction, for the following reasons. First, (8.1) is easily verified for $k = 2$, since we know $\rho_1$ explicitly (see Example 6.5), so the calculation of $\sigma_2$ is rigorous. Second, we conjecture that (8.1) holds for all $k \ge 2$, although we have not proved this. (Cf. what we have proved in Theorem 2.1.) Third, even if this conjecture is wrong and (8.1) does not hold for some $k$, we believe that the result below is true, and can be shown by suitable modifications of the argument and perhaps replacing $\rho_{k-1}$ by suitable approximations.

A related problem by Frieze and Johansson
As said in the introduction, Frieze and Johansson [11] recently considered the problem of finding the minimum total cost of $k$ edge-disjoint spanning trees in $K_n$, for a fixed integer $k \ge 2$. (They used random costs with the uniform model, see Section 7; we may consider all three models used above.) We denote this minimum cost by $\mathrm{mst}_k$, following [11] (which uses $\mathrm{mst}_k(K_n, X)$ for the random variable, where $X$ is the vector of random edge costs, and uses $\mathrm{mst}_k(K_n)$ for its expectation). Trivially, $\mathrm{mst}_k \le \sum_{i=1}^k w(T_i)$, (9.1) and as said in the introduction, it is easy to see that strict inequality may hold when $k \ge 2$, i.e., that our greedy procedure of choosing $T_1, T_2, \dots$ successively does not yield the minimum cost set of $k$ disjoint spanning trees.
We assume in this section that $n \ge 2k$; then $k$ edge-disjoint spanning trees exist and thus $\mathrm{mst}_k < \infty$. (Indeed, $K_{2k}$ can be decomposed into $k$ Hamilton paths, as shown in 1892 by Lucas [24, pp. 162-164] using a construction he attributes to Walecki.$^1$) Remark 9.1. As observed by Frieze and Johansson [11], the problem is equivalent to finding the minimum cost of a basis in the matroid $M_k$, defined as the union matroid of $k$ copies of the cycle matroid of $K_n$. This means that the elements of $M_k$ are the edges in $K_n$, and a set of edges is independent in $M_k$ if and only if it can be written as the union of $k$ forests, see e.g. [32, Chapter 8.3]. (Hence, the bases, i.e., the maximal independent sets, are precisely the unions of $k$ edge-disjoint spanning trees. For the multigraph version in the Poisson model, of course we use instead the union matroid of $k$ copies of the cycle matroid of $K_n^\infty$; we use the same notation $M_k$.) We write $r_k$ for rank in this matroid.
Kruskal's algorithm, recapitulated in the introduction, is valid for finding a minimum cost basis in any matroid; see e.g. [32,Chapter 19.1]. In the present case it means that we process the edges in order of increasing cost and keep the ones that are not dependent (in M k ) on the ones already selected; equivalently, we keep the next edge e if r k (S ∪ {e}) > r k (S), where r k is the rank function in M k and S is the set of edges already selected.
Remark 9.2. It follows that the largest individual edge cost for the optimal set of $k$ edge-disjoint spanning trees is at most the largest edge cost for any given set of $k$ edge-disjoint spanning trees. Hence, it follows from Lemma 7.2 that for the random models studied here, the optimal $k$ spanning trees w.h.p. use only edges of cost $\le 2k \log n/n$. It follows, with only minor modifications of the proofs, that analogues of Lemmas 7.3 and 7.4 hold for $\mathrm{mst}_k$ for the three different models. Hence, for limits in probability, the three models are equivalent for $\mathrm{mst}_k$ too.
Moreover, one can similarly show that for any $b > 0$ there is a constant $B$ such that with probability at least $1 - n^{-b}$, the optimal $k$ spanning trees use only edges of cost $\le Bk \log n/n$. One can then argue as for the minimum spanning tree, see e.g. [12], [25, Section 4.2.3] or [6, Example 3.15], and obtain strong concentration of $\mathrm{mst}_k$ for any of the three models; in particular $\operatorname{Var}(\mathrm{mst}_k) = o(1)$, and thus convergence of the expectation $E(\mathrm{mst}_k)$ is equivalent to convergence in probability of $\mathrm{mst}_k$.
Frieze and Johansson [11] stated their results for the expectation E mst k (for the uniform model), but the results thus hold also for convergence in probability (and for any of the three models).
For k = 2, Frieze and Johansson [11] show that the expectation This is strictly smaller than our estimate for the total cost of two edgedisjoint spanning trees chosen successively, γ 1 + γ 2 . = 1.202 . . . + 3.09 . . . > 1 Lucas introduces the problem as one of "Les Jeux de Demoiselles", namely "Les Rondes Enfantines", a game of children holding hands in a circle repeatedly, never repeating a partner. The conversion between the Hamilton cycles of the game and the Hamilton paths serving as our spanning trees is simple, and Walecki's construction is more naturally viewed in terms of Hamilton paths. For much stronger recent results on Hamilton decompositions, see for example [22]. Tables 1 and 3. This would show that choosing minimum spanning trees one by one is not optimal, even asymptotically, except that our estimates are not rigorous. The following theorem is less precise but establishes rigorously (subject to the numerical solution to (8.24) giving σ 2 as in (8.26)) that the values are indeed different.
With µ 2 defined by the limit in (9.2), this can be restated in the following equivalent form. The proof of the theorem is based on the fact that many edges are rejected from T 1 and T 2 after time σ 2 , but none is rejected from the union matroid until a time c 3 , and c 3 (which we will show is the threshold for appearance of a 3-core in a random graph) is later than σ 2 .
We begin with three elementary lemmas that are deterministic, and do not assume any particular distribution of edge costs; nevertheless, we use the same scaling of time as before, and say that an edge with cost $w$ is born at time $nw$. (Lemma 9.5 has been used in several works, including [11], in the study of minimum spanning trees.)
Lemma 9.5. Suppose that we select $N$ edges $e_1, \dots, e_N$, by any procedure, and that $e_i$ has cost $w_i$. Let $N(t) := |\{i : w_i \le t/n\}|$, the number of selected edges born at or before time $t$. Then the total cost is $\sum_{i=1}^N w_i = \frac1n \int_0^\infty \bigl(N - N(t)\bigr)\,dt$. (Indeed, $w_i = \frac1n \int_0^\infty \mathbf 1\{t < n w_i\}\,dt$, and summing over $i$ gives the claim.)
For the next lemma, recall from Remark 9.1 that $r_k$ is rank in the union matroid $M_k$. We consider several (multi)graphs with the same vertex set $[n]$, and we define the intersection $G \cap H$ of two such graphs by $E(G \cap H) := E(G) \cap E(H)$. (We regard the multigraphs as having labelled edges, so parallel edges are distinguishable.) Note too that the trees $T_i$ in the lemma are arbitrary, not necessarily the trees $T_i$ defined in Section 2.1.
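The counting identity of Lemma 9.5 can be checked on arbitrary data. The sketch assumes the identity reads $\sum_i w_i = \frac1n\int_0^\infty (N - N(t))\,dt$ with $N(t) = |\{i : n w_i \le t\}|$, which is the form consistent with its use in Lemma 9.6; the function name `total_cost_via_counts` is hypothetical. Since $N(t)$ is a step function, the integral is computed exactly as a finite sum over the sorted birth times.

```python
import math
import random


def total_cost_via_counts(costs, n):
    """Evaluate (1/n) * int_0^inf (N - N(t)) dt for the given edge costs.

    Birth times are n * w_i.  Between consecutive sorted birth times the
    count N(t) is constant, so the integral is a sum of rectangle areas.
    """
    births = sorted(n * w for w in costs)
    count = len(births)
    integral, prev = 0.0, 0.0
    for born_so_far, b in enumerate(births):
        # On [prev, b) exactly `born_so_far` of the selected edges are born.
        integral += (b - prev) * (count - born_so_far)
        prev = b
    return integral / n
```

With, say, `costs = [0.3, 0.1, 0.25]` and `n = 5`, the function should return the plain sum of the costs (0.65) up to floating-point rounding.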
Lemma 9.6. Consider $K_n^\infty$ with any costs $w_e \ge 0$. Suppose that $T_1, \dots, T_k$ are any $k$ edge-disjoint spanning trees. For $t \ge 0$, let $G(t)$ be the graph with edge set $\{e \in E(K_n^\infty) : w_e \le t/n\}$, and let $N(t) := e\bigl(G(t) \cap (T_1 \cup \dots \cup T_k)\bigr)$. Then $N(t) \le r_k(G(t))$ for every $t$, and, if $\hat T_1, \dots, \hat T_k$ are $k$ edge-disjoint spanning trees of minimum total cost,
$$\sum_{i=1}^{k} w(T_i) - \sum_{i=1}^{k} w(\hat T_i) = \frac{1}{n} \int_0^\infty \bigl(r_k(G(t)) - N(t)\bigr)\,dt. \tag{9.4}$$

Proof. First, $N(t)$ is by definition the number of edges in $E\bigl(G(t) \cap (T_1 \cup \dots \cup T_k)\bigr)$, an independent (with respect to $M_k$) subset of $E(G(t))$, and thus $N(t) \le r_k(G(t))$, as asserted. Now apply Lemma 9.5, taking $N = k(n-1)$, taking the edges $e_1, \dots, e_N$ to be the $N$ edges in $T_1 \cup \dots \cup T_k$, and noting that the definition of $N(t)$ in Lemma 9.5 matches that here. This yields
$$\sum_{i=1}^{k} w(T_i) = \frac{1}{n} \int_0^\infty \bigl(k(n-1) - N(t)\bigr)\,dt. \tag{9.5}$$

Next, as a special case, consider a collection of $k$ spanning trees $\hat T_1, \dots, \hat T_k$ with minimum total cost. (Since we are in a deterministic setting, such a collection may not be unique.) We may assume that they are found by Kruskal's algorithm, and thus, for every $t$, the set of edges in $G(t) \cap (\hat T_1 \cup \dots \cup \hat T_k)$ is a maximal set of independent edges in $G(t)$ (independent with respect to $M_k$), hence the number of these edges is $r_k(G(t))$. Consequently, Lemma 9.5 yields
$$\sum_{i=1}^{k} w(\hat T_i) = \frac{1}{n} \int_0^\infty \bigl(k(n-1) - r_k(G(t))\bigr)\,dt. \tag{9.6}$$
The result (9.4) follows by subtracting (9.6) from (9.5).
Lemma 9.7. Let the multigraph $G$ be a subgraph of $K_n^\infty$ and assume that the $(k+1)$-core of $G$ is empty for some $k \ge 1$. Then the edge set $E(G)$ is a union of $k$ disjoint forests. In other words, $r_k(G) = e(G)$.
The properties in the last two sentences are equivalent by Remark 9.1.
Proof. We use induction on |G|; the base case |G| = 1 is trivial.
If $|G| > 1$ and $G$ has an empty $(k+1)$-core, then there exists a vertex $v$ in $G$ of degree $d(v) \le k$. Let $G'$ be $G$ with $v$ and its incident edges deleted; its $(k+1)$-core is again empty. By the induction hypothesis, $E(G')$ is the union of $k$ edge-disjoint forests $F_1, \dots, F_k$. These forests do not contain any edge with $v$ as an endpoint, so we may simply add the first edge of $v$ to $F_1$, the second to $F_2$, and so on, to obtain the desired decomposition of $E(G)$.
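The induction above is constructive. The following Python sketch illustrates it under stated assumptions: the function names `forest_decomposition` and `is_forest`, the edge-list input format, and the use of a `ValueError` for a non-empty $(k+1)$-core are all choices made here for illustration, not part of the paper. Vertices of degree at most $k$ are peeled off one by one, and then re-inserted in reverse order, with the $i$th edge of each re-inserted vertex going to forest $i$.

```python
def forest_decomposition(n, edges, k):
    """Split a loop-free multigraph with empty (k+1)-core into k forests.

    edges is a list of (u, v) pairs on vertices 0..n-1 (parallel edges allowed).
    Returns k lists of edge indices; raises ValueError if the core is non-empty.
    """
    incident = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    alive = [True] * len(edges)
    degree = [len(incident[v]) for v in range(n)]
    removed = [False] * n
    stack = [v for v in range(n) if degree[v] <= k]
    peeled = []  # for each peeled vertex, the (at most k) edges deleted with it
    while stack:
        v = stack.pop()
        if removed[v]:
            continue
        removed[v] = True
        mine = [e for e in incident[v] if alive[e]]
        for e in mine:
            alive[e] = False
            a, b = edges[e]
            other = b if a == v else a
            degree[other] -= 1
            if degree[other] <= k and not removed[other]:
                stack.append(other)
        peeled.append(mine)
    if any(alive):
        raise ValueError("the (k+1)-core is not empty")
    forests = [[] for _ in range(k)]
    # Re-insert vertices in reverse order: each re-inserted vertex is isolated
    # in every forest so far, so adding one edge per forest creates no cycle.
    for mine in reversed(peeled):
        for i, e in enumerate(mine):
            forests[i].append(e)
    return forests


def is_forest(n, edge_list):
    """Check acyclicity of a list of (u, v) pairs with a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_list:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True
```

For example, with `edges = [(0, 1), (0, 1), (1, 2), (0, 2), (2, 3)]` and `k = 2`, the 3-core is empty and the five edges are split into two forests, whereas the complete graph on four vertices has a non-empty 3-core and is rejected.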
Alternatively, the lemma follows easily from a multigraph version of a theorem of Nash-Williams [26], appearing also as [32, Theorem 8.4.4]; specifically, the matroidal proof in [32] extends to multigraphs. This theorem hypothesizes that $G$ is "sparse", meaning that for every vertex subset $A$, $e(G[A]) \le k(|A| - 1)$, but this follows from our hypothesis. If $G$ has empty $(k+1)$-core, so does $G[A]$, thus $G[A]$ has a vertex $v$ of degree at most $k$, whose deletion leaves another such vertex, and so on until there are no edges, showing that $e(G[A]) \le k(|A| - 1)$.
Proof of Theorem 9.3. By Lemmas 7.3-7.4 and Remark 9.2, the choice of model does not matter; for convenience we again take the Poisson model.
10. Conjectured asymptotics of $\rho_k(t)$

As discussed in Section 3, $\gamma_k \sim 2k$ for large $k$, see for example Corollary 1.7. Moreover, simulations (see Section 11) suggest that the functions $\rho_k(t)$ converge, after suitable translations. If so, and assuming suitable tail bounds, (6.26) implies that the translations should be by $2k$, up to an arbitrary constant plus $o(1)$; this is formalized in Conjecture 1.4.
It is easy to see that this, together with suitable tail bounds justifying dominated convergence, by (6.23) and (6.27) would imply Recall that ρ k (t) is given by Lemma 5.4 as the survival probability of the branching process X t defined in Section 4.4 with kernel κ t (x, y) on the probability space (R + , µ k−1 ) where µ k−1 has the distribution function ρ k−1 (t). More generally, we could start with any distribution function F (t) on R + and the corresponding probability measure µ and define a new distribution function Ψ(F )(t) as the survival probability ρ(κ t ; µ). This defines a map from the set of distribution functions (or probability measures) on [0, ∞) into itself, and we have ρ k = Ψ(ρ k−1 ). If one could show that Ψ is a contraction for some complete metric (perhaps on some suitable subset of distribution functions), then Banach's fixed point theorem would imply the existence of a unique fixed point ρ ∞ , and convergence of ρ k to it. However, the mapping Ψ is quite complicated, and we leave the possible construction of such a metric as an open problem.
Recall also that t = σ k is where ρ k (t) becomes non-zero, see Theorem 2.1(iii). Hence, Conjecture 1.4 suggests also the following, related conjecture.
Conjecture 10.1. There exists a real constant σ ∞ such that as k → ∞, In particular,

Computational results
11.1. Naive simulations. For intuition and as a sanity check on all calculations, we first directly simulate the problem described in the introduction's Poisson edge-weight model. Specifically, we take a graph with $n$ vertices and random edge weights (i.i.d. exponential random variables with mean 1), find the MST, add fresh exponentials to the weights of the MST edges, and repeat to get the second and subsequent MSTs. For each MST, we plot each edge's rank within the MST, divided by $n$ (so, $1/n$ for the first edge, up to $(n-1)/n$ for the last) on the vertical axis, against the edge's weight (multiplied by $n$ in accordance with our time scaling) on the horizontal axis.
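The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the figures: the function names, the dictionary representation of the weights, and the use of Kruskal's algorithm with a simple union-find are assumptions made here.

```python
import random


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal(n, weights):
    """Return the MST edges (u, v, w) of the graph given by weights[(u, v)]."""
    parent = list(range(n))
    tree = []
    for (u, v), w in sorted(weights.items(), key=lambda item: item[1]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
            if len(tree) == n - 1:
                break
    return tree


def successive_mst_weights(n, k, rng):
    """Weights of the first k successive MSTs in the Poisson multigraph model."""
    weights = {(u, v): rng.expovariate(1.0)
               for u in range(n) for v in range(u + 1, n)}
    totals = []
    for _ in range(k):
        tree = kruskal(n, weights)
        totals.append(sum(w for _, _, w in tree))
        # A used edge is replaced by the next parallel edge of the same pair,
        # whose weight is the old weight plus a fresh Exp(1) variable.
        for u, v, _ in tree:
            weights[(u, v)] += rng.expovariate(1.0)
    return totals
```

A call such as `successive_mst_weights(200, 3, random.Random(1))` returns three tree weights. Because every update only increases edge weights, the returned weights are non-decreasing in the tree index for any random seed.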
The results are shown in Figure 2. The corresponding estimates of $\gamma_k$, for $k$ up to 5, are 1.197, 3.055, 5.035, 7.086, 9.100. This was done for just a single graph with $n = 4\,000$, not averaged over several graphs. For a sense of the limited accuracy of the estimates, remember that $\gamma_1 = \zeta(3) = 1.2020\ldots$.

11.2. Better simulations.
Better simulations can be done with reference to the model introduced in Section 2.1 and used throughout. We begin with $k$ empty graphs of order $n$. At each step we introduce a random edge $e$ and, in the first graph $G_i$ for which $e$ does not lie within a component, we merge the two components given by its endpoints. (If this does not occur within the $k$ graphs under consideration, we do nothing, just move on to the next edge.) For each graph we simulate only the components (i.e., the sets of vertices comprised by each component); there is no need for any more detailed structure. The edge arrivals should be regarded as occurring as a Poisson process of intensity $(n-1)/2$ but instead we simply treat them as arriving at times $2/n$, $4/n$, etc. Figure 3 depicts the result of a single such simulation with $n = 1\,000\,000$, showing for each $k$ from 1 to 5 the size of the largest component of $G_k$ (as a fraction of $n$) against time.
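A minimal Python sketch of this component-only simulation follows. The class and function names are hypothetical, and the deterministic arrival times $2/n, 4/n, \dots$ are modelled simply by running $t_{\max} n / 2$ steps; this is an illustration under those assumptions, not the authors' implementation.

```python
import random


class Components:
    """Union-find over n vertices that also tracks the largest component."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def merge(self, u, v):
        """Merge the components of u and v; return False if already joined."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.largest = max(self.largest, self.size[ru])
        return True


def simulate_forests(n, k, t_max, rng):
    """Run edge arrivals up to time t_max; return largest-component fractions
    and the number of edges accepted by each of the k graphs."""
    graphs = [Components(n) for _ in range(k)]
    accepted = [0] * k
    for _ in range(int(t_max * n / 2)):  # one edge per time increment 2/n
        u = rng.randrange(n)
        v = rng.randrange(n - 1)
        if v >= u:
            v += 1  # uniform over vertices other than u
        for i, g in enumerate(graphs):
            if g.merge(u, v):  # first graph where the edge joins two components
                accepted[i] += 1
                break
    return [g.largest / n for g in graphs], accepted
```

Since an edge reaches graph $i+1$ only when its endpoints already share a component of graph $i$, the components of each graph refine those of the previous one, so the returned fractions are non-increasing in the graph index.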
A larger experiment, using 10 simulations each with $n = 10$M, and up to time $t = 40$ (i.e., 200M steps), supports Conjecture 1.6 that $\gamma_k = 2k - 1 + o(1)$; see Figure 4.

11.3. Estimates of the improved upper bound. The differential equation system (3.18), giving the improved upper bound of Section 3.3, is easy to solve numerically. We did so as a discrete-time approximation, setting
$$g_k(t + \Delta t) = g_k(t) + \tfrac{1}{2}\,\Delta t\,\bigl(g_{k-1}(t)^2 - g_k(t)^2\bigr),$$
using $\Delta t = 0.000\,01$ and considering $k$ up to 50. Figure 5 shows the results up to time $t = 10$. Because $g_k(t)$ pertains to a model in which all edges of $F_k$ are imagined to be in a single component, this plot is comparable both to that in Figure 2 (which counts all edges) and to those in Figure 3 (which counts edges in the largest component) and Figure 7 (the theoretical giant-component size).
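The discrete-time scheme can be sketched as follows. The boundary condition $g_0 \equiv 1$ and the initial values $g_k(0) = 0$ are assumptions made here (the system (3.18) itself is not reproduced in this section), and the function name `solve_g` is hypothetical. Under these assumptions $g_1$ solves $g' = \tfrac12(1 - g^2)$, that is, $g_1(t) = \tanh(t/2)$, which gives a convenient check.

```python
import math


def solve_g(k_max, t_max, dt):
    """Forward-Euler solution of g_k' = (g_{k-1}^2 - g_k^2) / 2 for k = 1..k_max.

    Assumes g_0 = 1 identically and g_k(0) = 0 for k >= 1 (labelled assumptions).
    Returns a list of states; state[j][k] approximates g_k(j * dt).
    """
    steps = int(round(t_max / dt))
    g = [1.0] + [0.0] * k_max  # g[0] is the assumed constant boundary g_0
    history = [g]
    for _ in range(steps):
        new_g = g[:]
        for k in range(1, k_max + 1):
            new_g[k] = g[k] + 0.5 * dt * (g[k - 1] ** 2 - g[k] ** 2)
        g = new_g
        history.append(g)
    return history


if __name__ == "__main__":
    final = solve_g(3, 4.0, 0.001)[-1]
    print(final[1], math.tanh(2.0))  # the two values should nearly agree
```

A coarser step than the paper's $\Delta t = 0.000\,01$ is used in the usage line only to keep the example fast.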

Figure 5. Values $g_k(t)$ plotted against $t$. The function $g_7(t)$ is just rising from 0 within the plot range; the values of $g_k(t)$ for larger $k$ are too close to 0 to be seen.

Table 2 and Figure 6 show the corresponding upper bounds on $\Gamma_k$. Specifically, the bound on $\Gamma_k$ from (3.19), call it $\bar\Gamma_k$, is estimated as
$$\bar\Gamma_k \doteq \tfrac{1}{2}\,\Delta t \sum_{t \in T} t\,\bigl(1 - g_k(t)^2\bigr),$$
where $T = \{0, \Delta t, 2\Delta t, \dots\}$. Since we cannot sum to infinity, we terminate when the final $g_k$ under consideration is judged sufficiently close to 1, specifically within $0.000\,000\,1$ of 1. It appears experimentally that the gap $1 - g_k(t)$ decreases exponentially fast (very plausible in light of (2.3)) so termination should not be a large concern; see also (6.15).

Table 2. Upper bounds on $\Gamma_k$ obtained from numerical solution of (3.19).

Figure 6 suggests that the gaps $\bar\Gamma_k - k^2$ level off at about 0.743. (Beyond about $k = 25$ the gaps decrease, but using $\Delta t = 0.000\,1$ they continued to increase, and in either case the degree of change is comparable with $\Delta t$ and thus numerically unreliable.) This suggests the following conjecture. (Recall from (3.11) that $\Gamma_k \ge k^2$.)

Conjecture 11.1. For every $k \ge 1$, $\Gamma_k \le \bar\Gamma_k \le k^2 + \delta$ for some constant $\delta$.

We established in Section 3.3 that $\Gamma_k \le \bar\Gamma_k$, so only $\bar\Gamma_k \le k^2 + \delta$ is conjectural. If the conjecture holds, then it follows, using also (3.11), that
$$\gamma_k = \Gamma_k - \Gamma_{k-1} \ge k^2 - \bigl((k-1)^2 + \delta\bigr) = 2k - 1 - \delta$$
and
$$\gamma_k = \Gamma_k - \Gamma_{k-1} \le (k^2 + \delta) - (k-1)^2 = 2k - 1 + \delta.$$
Hence, the conjecture would imply
$$2k - 1 - \delta \le \gamma_k \le 2k - 1 + \delta. \tag{11.1}$$
In particular, if Conjecture 11.1 holds with $\delta \le 1$ as it appears, then $2k - 2 \le \gamma_k \le 2k$.
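The truncated sum used for the upper bounds can be sketched in Python as below. As before, $g_0 \equiv 1$ and $g_k(0) = 0$ are assumptions made for this illustration, as are the function name `upper_bounds`, its parameters, and the safety cap on the time horizon. Under these assumptions the first bound has the closed form $\tfrac12 \int_0^\infty t\,(1 - \tanh^2(t/2))\,dt = 2\ln 2$, which the sketch should reproduce approximately.

```python
import math


def upper_bounds(k_max, dt, tol=1e-7, t_cap=500.0):
    """Estimate (1/2) * dt * sum_t t * (1 - g_k(t)^2) for k = 1..k_max.

    The g_k follow the forward-Euler scheme for g_k' = (g_{k-1}^2 - g_k^2) / 2,
    with the assumed boundary g_0 = 1 and initial values g_k(0) = 0.
    The sum stops once the last function, g_{k_max}, is within tol of 1
    (or at the safety cap t_cap, a parameter introduced for this sketch).
    """
    g = [1.0] + [0.0] * k_max
    sums = [0.0] * (k_max + 1)
    step = 0
    t = 0.0
    while 1.0 - g[k_max] > tol and t < t_cap:
        for k in range(1, k_max + 1):
            sums[k] += 0.5 * dt * t * (1.0 - g[k] ** 2)
        new_g = g[:]
        for k in range(1, k_max + 1):
            new_g[k] = g[k] + 0.5 * dt * (g[k - 1] ** 2 - g[k] ** 2)
        g = new_g
        step += 1
        t = step * dt
    return sums[1:]


if __name__ == "__main__":
    bounds = upper_bounds(2, 0.001)
    print(bounds[0], 2 * math.log(2))  # first estimate versus the closed form
```

The step size here is much coarser than the values quoted in the text, so the output is only indicative of how the termination rule and the sum fit together.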

11.4. Estimates of the fixed-point distributions $\rho_k$. We also numerically estimated the distributions $\rho_k$; recall from Theorem 2.1 that $C_1(G_k(t))/n \overset{p}{\longrightarrow} \rho_k(t)$. We may begin with either $\rho_0(t)$, which is 0 for $t < 0$ and 1 for $t \ge 0$, or with $\rho_1(t)$, which as described in Example 6.5 is the inverse function of $-\log(1-\rho)/\rho$. (Both choices gave similar results, the latter being slightly preferable numerically.) We use $\rho_{k-1}$ to obtain $\rho_k$, following the branching process described in Section 4.4.

The survival probability $\rho_t(x)$ at time $t$ of a particle born at time $x$ in the branching process equivalent of $G_k$ is given by the function $\rho_t = \rho_\kappa$ which (see (4.10)) is the largest fixed point of
$$f = \Phi_\kappa f := 1 - e^{-T_\kappa f} \tag{11.2}$$
(the time $t$ is implicit in the kernel $\kappa = \kappa_t$, thus in the operators $\Phi_\kappa$ and $T_\kappa$), where (see (4.6)) $T_\kappa$ is given by $T_\kappa f(x) = \int_S \kappa(x,y) f(y)\,d\mu(y)$. With reference to the kernel $\kappa$ defined in (5.20), $T_\kappa f(x) = 0$ for $x > t$, while otherwise, with $\mu = \rho_{k-1}$ as in (5.14),
$$T_\kappa f(x) = \int_0^t (t - x \vee y)\, f(y)\,d\rho_{k-1}(y). \tag{11.3}$$

Given $t$, to find $\rho_t$ numerically we iterate (11.2), starting with some $f$ known to be larger than $\rho_t$ and repeatedly setting $f(x)$ equal to $\Phi_\kappa f(x)$; this gives a sequence of functions that converges to the desired largest fixed point $\rho_t$, cf. [4, Lemma 5.6]. We will estimate $\rho_t$ for times $i\,\Delta t$, for $\Delta t$ some small constant and $i = 0, \dots, I$, with $I \Delta t$ judged to be sufficient time to observe all relevant behavior. We initialize with $f \equiv 1$ to find $\rho_{I\Delta t}$, then iteratively initialize with $f = \rho_{i\Delta t}$ to find $\rho_{(i-1)\Delta t}$. Since the branching process is monotone in $t$ (each vertex can only have more children by a later time $t$), so is the survival probability; thus $\rho_{i\Delta t}$ is larger than $\rho_{(i-1)\Delta t}$ and is therefore a suitable starting estimate. In practice we find that the process converges in 20 iterations or so even for $\rho_{I\Delta t}$, and in fewer for subsequent functions $\rho_{i\Delta t}$, with convergence defined as two iterates differing by at most $10^{-8}$ for any $x$.
For each $k$ in turn, we do the above for all times $t$, whereupon the desired function $\rho_k(t) \overset{\mathrm{def}}{=} \rho(\kappa) = \rho(\kappa_t)$ is given by (see (4.9) and (5.28))
$$\rho_k(t) = \int_0^\infty \rho_t(x)\,d\rho_{k-1}(x). \tag{11.4}$$
Do not confuse $\rho_t$ of (11.2) and $\rho_k$ of (11.4), respectively the $\rho_\kappa$ and $\rho$ of (4.8); see also (5.28) and the comment following it.

All the calculations were performed with time ($t$, $x$, and $y$) discretized to multiples of $\Delta t = 0.01$ and restricted to the interval $[0, 10]$. For a fixed $t$, the calculation in (11.3) can be done efficiently for all $x$. The derivative of (11.3) with respect to $x$ is $-\int_0^x f(y)\,d\rho_{k-1}(y)$ (cf. (8.3) and (8.4)). So, given the value of (11.3) for some $x$, that at the next discrete $x$ is the discrete sum corresponding to this integral, and in one pass we can compute these integrals (discretized to summations) for all $x$. Each computed $\rho_k(t)$ is translated by $2k$ to keep the functions' interesting regimes within the time range $[0, 10]$, before doing the computations for $k + 1$, but these translations are reversed before interpreting the results.
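The fixed-point iteration and the final integration can be sketched as follows. This illustration rests on two assumptions that are not spelled out in this section: that the operator in (11.2) has the standard form $\Phi_\kappa f = 1 - e^{-T_\kappa f}$, and that the kernel of (5.20) is $\kappa_t(x,y) = (t - x \vee y)_+$, which is consistent with the derivative quoted above. The function names, the grid representation of $\rho_{k-1}$ as point masses, and the naive quadratic-time evaluation of $T_\kappa f$ (rather than the one-pass scheme) are choices made here for brevity.

```python
import math


def survival_profile(t, xs, mass, tol=1e-10, max_iter=10_000):
    """Largest fixed point of f = 1 - exp(-T f) on the grid xs at time t.

    (T f)(x) = sum_j max(t - max(x, xs[j]), 0) * f(xs[j]) * mass[j],
    where mass[j] is the probability that rho_{k-1} puts at xs[j].
    The kernel (t - max(x, y))_+ is an assumption of this sketch.
    """
    f = [1.0] * len(xs)  # start above the fixed point and iterate downwards
    for _ in range(max_iter):
        new_f = []
        for x in xs:
            if x > t:
                new_f.append(0.0)  # particles born after time t do not count
                continue
            tf = sum(max(t - max(x, y), 0.0) * fy * m
                     for y, fy, m in zip(xs, f, mass))
            new_f.append(1.0 - math.exp(-tf))
        diff = max(abs(a - b) for a, b in zip(f, new_f))
        f = new_f
        if diff <= tol:
            break
    return f


def next_rho(t, xs, mass):
    """Value of rho_k(t): the survival profile integrated against rho_{k-1}."""
    profile = survival_profile(t, xs, mass)
    return sum(fx * m for fx, m in zip(profile, mass))
```

As a sanity check under these assumptions, taking $\rho_0$ to be a unit mass at 0 reduces the fixed-point equation to $\rho = 1 - e^{-t\rho}$, so `next_rho` then returns the inverse function of $-\log(1-\rho)/\rho$ described above for $\rho_1$.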
The first observation is that the estimates of $\rho_k$ are consistent with Conjecture 1.4. As shown in Figure 7, even the first few functions $\rho_k$ have visually very similar forms.
To make a more precise comparison, we time-shift each function $\rho_k$ so that it reaches the value $1 - e^{-1}$ at time $t = 4$ (arbitrarily chosen). Figure 8 shows the thus-superposed curves for $\rho_1$, $\rho_2$, and $\rho_{1000}$; the curve $\rho_5$ (not shown) is already visually indistinguishable from $\rho_{1000}$.
Estimates for $\gamma_k$, obtained from those for $\rho_k$ via (6.23), are shown in Table 3. Estimates of $\gamma_k$ for large $k$ were deemed numerically unreliable for two reasons. First, discretization of time to intervals of size $\Delta t = 0.01$ is problematic: the timing of $\rho_1$ is uncertain to this order, that of $\rho_2$ additionally uncertain by the same amount, and so on, and translation of $\rho_k$ directly affects the corresponding estimate of $\gamma_k$. Second, the time range $t \in [0, 10]$ (translated as appropriate) used in the computations proved to be too narrow, in that for large $k$ the maximum value of $\rho_k$ observed was only about 0.9975, and the gap between this and 1 may be enough to throw off the estimates of $\gamma_k$ perceptibly.

Table 3. Estimates of $\gamma_k$ from (6.23).

Open questions
We would be delighted to confirm the various conjectures above, in particular Conjectures 1.4-1.6, and get a better understanding of (and ideally a closed form for) ρ ∞ (provided it exists).
It is also of natural interest to ask this kth-minimum question for structures other than spanning trees. Subsequent to this work, the length of the kth shortest s-t path in a complete graph with random edge weights has been studied in [13]. The behavior is quite different: the first few paths cost nearly identical amounts, while [13] gives results for all k from 1 to n − 1.
The "random assignment problem" is to determine the cost of a minimum-cost perfect matching in a complete bipartite graph with random edge weights, and a great deal is known about it, by a variety of methods; for one relatively recent work, with references to others, see [31]. It would be interesting to understand the $k$th cheapest matching.
It could also be interesting to consider other variants of all these questions. One, in the vein of [11], is to consider the k disjoint structures which together have the smallest possible total cost. Another is to consider a second structure not disjoint from the first, but differing in at least one element, either of our choice or as specified by an adversary.