A Modification of the Random Cutting Model

We propose a modification to the random destruction of graphs: Given a finite network with a distinguished set of sources and targets, remove (cut) vertices at random, discarding components that do not contain a source node. We investigate the number of cuts required until all targets are removed, and the size of the remaining graph. This model interpolates between the random cutting model going back to Meir and Moon and site percolation. We prove several general results, including that the size of the remaining graph is a tight family of random variables for compatible sequences of expander-type graphs, and determine limiting distributions for binary caterpillar trees and complete binary trees.


1
Introduction and structure of the paper
The behaviour of trees under random vertex removal was first investigated by Meir and Moon in [28]. The process starts with a rooted tree; at every step a uniformly chosen vertex is deleted, and all remaining components that do not include the root vertex are discarded. Since the process naturally stops once the root node has been cut, the question of interest is the random number of cuts needed to reach this state. In [28], the expected value and the variance of this random variable are determined for random labelled trees. Since this initial paper, the model has been considered for different types of random and deterministic graphs, such as in the works of Panholzer [29,30], Janson [20,21] (see also [2] for an alternative approach by Addario-Berry, Broutin and Holmgren), or Holmgren [18,19].
The random cutting model relates to both the record problem (as was observed in [21]) and to fragmentation processes and their Cut-trees, see e.g. [4], [5], or [8]. Moreover, it connects to Union-Find algorithms, see [31] and the further work in [13,15,22,23].
In recent years, modifications of the original cutting model have been studied: Kuba and Panholzer considered isolating a leaf, a general node, or multiple nodes simultaneously (instead of isolating the root) in [24,25,26], and Cai, Holmgren et al. proposed and investigated the k-cut model in [6,10,11], where a node is only removed after it has been cut for the k-th time.
In this paper, we introduce a different modification of the cutting model where, in addition to one or several root vertices (which we will call sources), a second set of vertices (targets) is given. This allows us to define a stopping time for the cutting procedure as the first moment when all of the targets have been removed (i.e. the sources have been separated from the targets); see Section 2 for the detailed definitions. We can then ask several natural questions, for instance about the number of cuts necessary to separate the two sets and about the size of the remaining graph. Section 3 contains several basic estimates and formalises the imprecise notion that separation interpolates between the cutting model and site percolation (Propositions 1 and 3). The latter requires a suitable notion of approximating an infinite graph by finite graphs while respecting the choice of sources and targets, see Definition 2.
In Section 4 we obtain the probability for a fixed subgraph to be the remaining graph at the time of separation. This leads to Theorem 9, which could be regarded as the main result of the paper and gives sufficient conditions for the size of the graph at separation to be a tight sequence of random variables when the graph approximates a locally finite infinite graph in the sense of Definition 2.
In the final section, we restrict our attention to rooted trees, since their recursive nature can be used to simplify many of the arguments and calculations. The arguments here relate to earlier work by Devroye et al. [12,14]. We consider the separation sizes and separation times for complete binary trees as an illustrating example, and finish by indicating potential future research directions.
In order to comply with the requirements for this extended abstract, proofs of auxiliary statements have been omitted, but can be found in the full version of this paper, [9]. Moreover, the full version contains several examples that were not included here.

2
Cutting procedures

Some notation
We will always use G = (V(G), E(G)) to denote a graph, consisting of its vertex and edge set, but will shorten the notation to V = V(G) and E = E(G) if there is no ambiguity from the context. Since most subgraphs we will consider are induced and therefore uniquely determined by their vertex set, we will not distinguish between an induced subgraph and its vertex (sub-)set. If two vertices v, w are neighbours, we write v ∼ w. More generally, we write dist(v, w) for the graph distance between vertices v, w. For subsets A, B ⊆ V(G), dist(A, B) is to be understood as min{dist(v, w) : v ∈ A, w ∈ B}.
Given any set A ⊆ V(G) and a fixed set S of source nodes, we define the closure of A to be clos_S(A) := A ∪ S ∪ {v ∈ V(G) : v ∼ w for some w ∈ A}. The (exterior) boundary of A is defined as ∂_S A := clos_S(A) \ A. In other words, the vertices in ∂_S A are precisely the vertices not in A that are in S or neighbour some vertex in A. Note that this implies, e.g., ∂_S ∅ = S.
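Both operators are plain set operations. As a minimal sketch, assuming a graph is stored as a Python dictionary mapping each vertex to the set of its neighbours (a representation chosen here for illustration only; the function names are likewise hypothetical), closure and boundary can be computed as follows.

```python
def closure(adj, A, S):
    """clos_S(A): A together with all sources and all neighbours of A."""
    A = set(A)
    return A | set(S) | {v for w in A for v in adj[w]}


def boundary(adj, A, S):
    """Exterior boundary: the vertices of clos_S(A) that are not in A."""
    return closure(adj, A, S) - set(A)


# Illustrative example: the path 0 - 1 - 2 - 3 with the single source 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sorted(boundary(path, set(), {0})))   # [0]: the boundary of the empty set is S
print(sorted(boundary(path, {0, 1}, {0})))  # [2]
print(sorted(boundary(path, {2}, {0})))     # [0, 1, 3]: source 0 counts although not adjacent
```

The last line shows why sources are added explicitly: a source that is neither in A nor adjacent to A still belongs to ∂_S A.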

Cutting and separation
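Consider a finite simple connected graph G = (V, E) with a distinguished subset S ⊆ V whose vertices are referred to as sources. Now, proceed as follows:
1. Choose a vertex v of the current graph uniformly at random, and remove it, together with all edges incident to v, from the graph. This will potentially split the graph into connected components, in which case we only keep the components containing sources, regarding them jointly as a new (potentially disconnected) graph.
2. Iterate step 1, where the randomness in choosing the node is assumed to be independent of everything that happened previously.
We denote the resulting sequence of graphs by G_0 = G, G_1, G_2, .... The cutting number C(G) is the number of cuts needed until the remaining graph is empty. Given a second distinguished subset T ⊆ V of targets, the separation number S(G) is the number of cuts needed until no target remains, i.e. S(G) = min{i ≥ 0 : G_i ∩ T = ∅}, and we write G_S := G_{S(G)} for the graph at separation.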

Figure 1
The cutting procedure on a graph G from top left to bottom right, with source nodes in black and target nodes in white. The symbol * in a vertex indicates that this is the vertex about to be cut in the next step. In this example, S(G) = 5, C(G) = 6 (after the last source node has been cut) and G_S is a path on three vertices.
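To make the procedure concrete, here is a minimal Python sketch of the discrete process, assuming the graph is stored as a dictionary of neighbour sets. It reads off C(G) as the number of cuts until the graph is empty, S(G) as the first index at which no target remains, and G_S as the vertex set at that index. The helper names and the example graph (a 6-cycle) are illustrative assumptions, not part of the paper.

```python
import random


def source_components(adj, alive, sources):
    """Vertices of `alive` that are connected to a source inside `alive`."""
    stack = [s for s in sources if s in alive]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def cut_process(adj, sources, targets, rng):
    """Run the discrete cutting procedure once and return (S, C, G_S)."""
    graph = source_components(adj, set(adj), sources)    # G_0
    history = [frozenset(graph)]
    while graph:
        v = rng.choice(sorted(graph))                     # uniform vertex of G_i
        graph = source_components(adj, graph - {v}, sources)
        history.append(frozenset(graph))
    C = len(history) - 1                                  # cuts until empty
    S = next(i for i, g in enumerate(history) if not (g & targets))
    return S, C, set(history[S])


# Hypothetical example: a 6-cycle with source 0 and target 3.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(cut_process(cycle, {0}, {3}, random.Random(1)))
```

Keeping the whole history makes both stopping indices easy to read off; for large graphs one would track S(G) on the fly instead.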

The continuous-time model
As observed in [21], and used to good effect since, the above cutting model is equivalent to a model where each node v ∈ V is equipped with a random alarm clock that rings at time X_v. Whenever an alarm rings and the corresponding node is still in the graph at this time, that node is removed, together with any new components that do not contain source nodes. To ensure equivalence to the discrete-time cutting model above, we assume that (X_v)_{v∈V} is an i.i.d. family of Exp(1)-distributed random variables; while any i.i.d. family of continuous random variables would suffice, the memoryless property will be useful later.

This once again yields a monotone stochastic process of subgraphs of G, but now parametrised by continuous time, (G^c_t)_{t∈[0,∞)}. We will denote this process by Cut^c(G). Since G is finite, G_t attains only finitely many different graphs, and we still denote those graphs by G_0, G_1, G_2, ... in order of occurrence, as before. We denote by G_{t−} the graph attained by Cut^c(G) immediately before time t; so G_{t−} = G_t if and only if no cut happened at time t.
Note that there are two ways of generalising the random variables C and S to the continuous-time setting: by default, C and S denote the numbers of cuts exactly as before, while C^c and S^c denote the times at which the graph becomes empty and at which separation occurs, respectively.
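In code, the continuous-time model only changes how the next vertex is chosen: instead of drawing uniformly, alarms are processed in increasing order of their times, and an alarm of an already discarded vertex has no effect. The Python sketch below uses the same hypothetical dictionary representation as before and returns S^c and C^c as the clock values at which separation and emptiness occur. Since the ordering of i.i.d. continuous clocks is a uniformly random permutation, the next vertex actually cut is uniform among those still present, which is the equivalence with the discrete model.

```python
import random


def source_components(adj, alive, sources):
    """Vertices of `alive` connected to a source through `alive`."""
    stack = [s for s in sources if s in alive]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def continuous_cut(adj, sources, targets, clocks):
    """Return (S_c, C_c): the separation time and the time the graph becomes empty."""
    graph = source_components(adj, set(adj), sources)
    S_c = 0.0 if not (graph & targets) else None
    C_c = 0.0
    for v in sorted(adj, key=clocks.__getitem__):       # alarms in time order
        if v not in graph:
            continue                                     # already discarded: no effect
        graph = source_components(adj, graph - {v}, sources)
        if S_c is None and not (graph & targets):
            S_c = clocks[v]
        if not graph:
            C_c = clocks[v]
            break
    return S_c, C_c


rng = random.Random(3)
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}   # hypothetical example
clocks = {v: rng.expovariate(1.0) for v in cycle}           # i.i.d. Exp(1) alarms
print(continuous_cut(cycle, {0}, {3}, clocks))
```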

Cutting and site percolation
The following proposition asserts that the additional freedom of choosing target nodes for the separation number can be used to obtain the original cutting number. In other words, S(G) can be understood as a generalisation of C(G).

▶ Proposition 1. Let G = (V, E) be a finite connected graph, and let S, T ⊆ V be the sets of source and target nodes, respectively. Then, we have
$$S(G) \le C(G), \qquad\text{and}\qquad S(G) = C(G) \text{ deterministically} \iff S \subseteq T.$$
Moreover, all of those statements also hold true for S^c(G) and C^c(G) in the continuous-time model.
We remark that, since S(G) ≤ C(G) always holds, S(G) = C(G) holds in distribution if and only if it holds deterministically.
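Proof. At time C(G), the remaining graph is empty, so separation must have occurred already. Thus S(G) ≤ C(G). If S ⊆ T, then separation will occur as soon as the last source node has been removed, at which time the remaining graph will be empty. Thus S(G) = C(G). Conversely, if there exists v ∈ S \ T, then G_S contains v with some positive probability p_0 (for instance, if all targets are cut before v). If this happens, the graph is not yet empty at separation, so at least one further cut is needed and S(G) < C(G); hence S(G) < C(G) holds with probability at least p_0 > 0. For the continuous-time model, only the last argument requires modification: once again, v ∈ G_S with probability p_0, and in this case the alarm of v rings strictly after separation, so that S^c(G) < X_v ≤ C^c(G).
We now show that in a certain sense, the continuous-time separation model on an infinite graph G with infinite distance dist(S, T) contains the site percolation model on G.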
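Proposition 1 can be checked exhaustively on a tiny graph. Cutting vertices in the order of a uniformly random permutation, while skipping vertices that are already gone, reproduces the discrete cutting procedure, so enumerating all permutations enumerates all equally likely clock orderings. The Python sketch below uses a path 0 − 1 − 2 − 3 with source 0 as an illustrative example: with T = {0, 3} (so S ⊆ T) it finds S(G) = C(G) for every order, and with T = {3} it finds orders where S(G) < C(G).

```python
from itertools import permutations


def source_components(adj, alive, sources):
    """Vertices of `alive` connected to a source through `alive`."""
    stack = [s for s in sources if s in alive]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def numbers_for_order(adj, sources, targets, order):
    """Return (S, C) when vertices are cut in `order`, skipping discarded ones."""
    graph = source_components(adj, set(adj), sources)
    cuts = 0
    S = 0 if not (graph & targets) else None
    for v in order:
        if v not in graph:
            continue                                  # already discarded: no cut happens
        cuts += 1
        graph = source_components(adj, graph - {v}, sources)
        if S is None and not (graph & targets):
            S = cuts
    return S, cuts


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}         # illustrative example, source 0
orders = list(permutations(path))
with_source_target = [numbers_for_order(path, {0}, {0, 3}, o) for o in orders]
without = [numbers_for_order(path, {0}, {3}, o) for o in orders]
print(all(S == C for S, C in with_source_target))     # True, since S ⊆ T
print(all(S <= C for S, C in without))                # True
print(sum(S < C for S, C in without), len(orders))    # number of orders with S < C
```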
More precisely, recall that for Bernoulli site percolation on an infinite graph G, every node is independently kept with some probability p ∈ [0, 1] and otherwise rejected, thus giving a random subgraph of G. We denote by perc_S(p) the probability that Ber(p)-site percolation on G exhibits an infinite cluster containing at least one vertex of S.
▶ Definition 2. Let G be a locally finite, infinite connected graph, containing two subsets S, T ⊆ V(G). We say that the sequence (G^(n))_{n≥1} of finite induced subgraphs of G exhausts G if the following conditions are satisfied:
(i) The G^(n) are connected subgraphs satisfying
$$G^{(1)} \subseteq G^{(2)} \subseteq \cdots \qquad\text{and}\qquad \bigcup_{n\ge 1} G^{(n)} = G.$$
(ii) The set S is entirely contained in G^(n) for all n (and understood to be the set of source nodes of G^(n)), and each subgraph G^(n) is endowed with the target set
$$T^{(n)} := \big(T \cap V(G^{(n)})\big) \cup \big\{v \in V(G^{(n)}) : v \sim w \text{ for some } w \in V(G) \setminus V(G^{(n)})\big\}.$$
Observe that the target nodes T^(n) are indeed a subset of V(G^(n)) and will be non-empty even if T = ∅, since G is infinite and connected while G^(n) is finite. Moreover, condition (ii) necessitates that S is finite.
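▶ Proposition 3. Let G be a locally finite, infinite graph with a finite set of source nodes, S, and let T = ∅. Assume that (G^(n))_{n≥1} exhausts G. Then,
$$\lim_{n\to\infty} \mathbb{P}\big[S^c(G^{(n)}) \ge x\big] = \mathrm{perc}_S\big(e^{-x}\big) \qquad \text{for all } x \in [0,\infty). \tag{1}$$
Proof. Note first that if T = ∅, then (using the notation of Definition 2) dist(S, T^(n)) → ∞ as n → ∞: since S is finite and G is locally finite, every ball of finite radius around S is finite and hence, together with all its neighbours, eventually contained in G^(n), while every vertex of T^(n) has a neighbour outside G^(n). Recall next that independently removing each vertex v of G at a random time X_v ∼ Exp(1) gives rise to the standard monotone coupling of Bernoulli site percolation for all parameters p ∈ [0, 1] (cf. [27, p.138]). Indeed, at time x ∈ [0, ∞], the graph we observe in this way is a sample of Ber(e^{−x})-site percolation on G. We can couple the process obtained in this way to the continuous-time cutting model by restricting our attention to the intersection of G^(n) with those percolation clusters that intersect S.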
To show ≥ in (1), assume that Ber(e^{−x})-site percolation exhibits an infinite cluster which intersects S. Such a cluster necessarily intersects T^(n) as well and hence, for each n, contains a path connecting S with T^(n). By the coupling indicated above, this path must then also be present in the sample of the continuous-time cutting model on G^(n) at time x. Therefore, perc_S(e^{−x}) ≤ P[S^c(G^(n)) ≥ x], and letting n tend to ∞ yields
$$\mathrm{perc}_S(e^{-x}) \le \liminf_{n\to\infty} \mathbb{P}\big[S^c(G^{(n)}) \ge x\big].$$
For the other inequality, fix k ∈ ℕ and denote by N_x the total number of vertices in the clusters of Ber(e^{−x})-site percolation that intersect S. Since S is finite, N_x = ∞ if and only if one of these clusters is infinite, so that P[N_x = ∞] = perc_S(e^{−x}). Since dist(S, T^(n)) → ∞, we have dist(S, T^(n)) > k for all but finitely many n. For such n, on the event {N_x ≤ k}, the clusters intersecting S cannot intersect T^(n), which, for the coupled cutting procedure, means that separation in G^(n) must have occurred before time x. So, for all but finitely many n,
$$\mathbb{P}\big[S^c(G^{(n)}) \ge x\big] \le \mathbb{P}[N_x > k] = \mathrm{perc}_S(e^{-x}) + \mathbb{P}[k < N_x < \infty].$$
Taking the limit superior for n → ∞ yields
$$\limsup_{n\to\infty} \mathbb{P}\big[S^c(G^{(n)}) \ge x\big] \le \mathrm{perc}_S(e^{-x}) + \mathbb{P}[k < N_x < \infty],$$
which implies the existence of the limit and ≤ in (1) after passing to the limit for k → ∞ as well.
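On a finite graph, the coupling used in this proof gives an exact identity, which we state in our own formulation: with the same clocks (X_v), the separation time S^c(G) exceeds x precisely when the vertices whose alarms ring after time x contain a path from S to T. Since each vertex survives past x with probability e^{−x}, this is Ber(e^{−x})-site percolation. The Python sketch below checks the identity sample by sample on a 3 × 3 grid chosen purely for illustration; all function names are hypothetical.

```python
import random


def reachable(adj, allowed, start):
    """All vertices reachable from `start` using only vertices in `allowed`."""
    stack = [s for s in start if s in allowed]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in allowed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def separation_time(adj, sources, targets, clocks):
    """Continuous-time separation time S^c, processing alarms in time order."""
    graph = reachable(adj, set(adj), sources)
    if not (graph & targets):
        return 0.0
    for v in sorted(adj, key=clocks.__getitem__):
        if v in graph:
            graph = reachable(adj, graph - {v}, sources)
            if not (graph & targets):
                return clocks[v]
    raise RuntimeError("unreachable: the graph always becomes empty")


def percolates(adj, sources, targets, clocks, x):
    """Open S-T path when v is open iff X_v > x, i.e. with probability e^{-x}."""
    open_vertices = {v for v in adj if clocks[v] > x}
    return bool(reachable(adj, open_vertices, sources) & targets)


def grid_graph(k):
    """k x k grid as a dict of neighbour sets (illustrative example)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(i, j): {(i + a, j + b) for a, b in steps if 0 <= i + a < k and 0 <= j + b < k}
            for i in range(k) for j in range(k)}


grid = grid_graph(3)
rng = random.Random(0)
clocks = {v: rng.expovariate(1.0) for v in grid}
Sc = separation_time(grid, {(0, 0)}, {(2, 2)}, clocks)
print(Sc, percolates(grid, {(0, 0)}, {(2, 2)}, clocks, Sc / 2))   # second value is True
```

The final call evaluates the percolation event at a level strictly below S^c, where the identity forces an open path.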

Visiting probability of subgraphs and size of the separation graph
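Consider a finite simple connected graph G with S, T ⊆ V as usual. The aim of this section is to determine the probability that at some time i ≥ 1, the cutting procedure Cut(G) will produce a specific subgraph G*.
▶ Lemma 4. Fix an induced subgraph G* of G with every component of G* containing at least one source node. Then, for all times t ≥ 0 in the continuous-time cutting model, we have
$$\mathbb{P}\big[G^c_t = G^*\big] = e^{-t|G^*|}\big(1 - e^{-t}\big)^{|\partial_S G^*|}. \tag{2}$$
Moreover, consider v* ∈ ∂_S G*. Then
$$\mathbb{P}\big[G^c_t = G^* \,\big|\, X_{v^*} = t\big] = e^{-t|G^*|}\big(1 - e^{-t}\big)^{|\partial_S G^*| - 1}. \tag{3}$$
▶ Corollary 5. Fix an induced subgraph G* of G with every component of G* containing at least one source node. Let v* ∈ ∂_S G*. Denote the i-th graph obtained in the cutting process by G_i and the i-th cut node by v_i. Then
$$\mathbb{P}\big[G_i = G^* \text{ and } v_i = v^* \text{ for some } i\big] = \frac{|G^*|!\,\big(|\partial_S G^*| - 1\big)!}{\big(|G^*| + |\partial_S G^*|\big)!}, \tag{4}$$
and therefore
$$\mathbb{P}\big[G_i = G^* \text{ for some } i\big] = \frac{|G^*|!\,|\partial_S G^*|!}{\big(|G^*| + |\partial_S G^*|\big)!}. \tag{5}$$
Assuming that T ≠ ∅, it can be shown (Lemma 12 in [9]) that G* ⊆ G is admissible if and only if G* contains no target nodes and every component of G* contains at least one source node.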
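Visiting probabilities can be sanity-checked by exhaustive enumeration on a small tree. In the clock picture, G* is visited exactly when every vertex of ∂_S G* is cut before any vertex of G* (a boundary vertex cannot be discarded earlier, since it stays attached to a source through G*), and counting orderings then suggests the value |G*|! |∂_S G*|! / (|G*| + |∂_S G*|)!. The Python sketch below compares this count, which is our own derivation, with a brute-force enumeration of all cutting orders; the example tree and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def source_components(adj, alive, sources):
    """Vertices of `alive` connected to a source through `alive`."""
    stack = [s for s in sources if s in alive]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return frozenset(seen)


def boundary(adj, A, sources):
    """Exterior boundary: sources and neighbours of A that are not in A."""
    A = set(A)
    return (A | set(sources) | {v for w in A for v in adj[w]}) - A


def visit_probability(adj, sources, g_star):
    """Exact P[G_i = g_star for some i], enumerating all equally likely cut orders."""
    g_star = frozenset(g_star)
    orders = list(permutations(adj))
    hits = 0
    for order in orders:
        graph = source_components(adj, set(adj), sources)
        visited = graph == g_star
        for v in order:
            if v in graph:                            # skip already discarded vertices
                graph = source_components(adj, graph - {v}, sources)
                visited = visited or graph == g_star
        hits += visited
    return Fraction(hits, len(orders))


def predicted(adj, sources, g_star):
    """|G*|! |boundary|! / (|G*| + |boundary|)!, from counting clock orderings."""
    k, b = len(g_star), len(boundary(adj, g_star, sources))
    return Fraction(factorial(k) * factorial(b), factorial(k + b))


# Hypothetical example: root 0 with children 1, 2; vertex 1 has children 3, 4.
tree = {0: {1, 2}, 1: {0, 3, 4}, 2: {0}, 3: {1}, 4: {1}}
for g_star in (set(), {0}, {0, 1}, {0, 1, 3}, {0, 2}):
    print(sorted(g_star), visit_probability(tree, {0}, g_star), predicted(tree, {0}, g_star))
```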
Relying on the preceding results, we can establish the following connection between the graph G_S at separation and the continuous-time separation number S^c:
▶ Proposition 7. Let G* be an admissible subgraph of G. Then,
Proof. Fix a vertex v* ∈ ∂_S G*, and assume that this is the last node to be removed for separation to occur. We observe first that, by definition of the separation number, every graph obtained by Cut(G) before separation must have contained a path from v* to T. In particular, the last graph before separation occurred contained such a path, which additionally was not where conditional independence holds true because In light of (3) from Lemma 4, we can now rewrite equation (7) as Finally, observe that, with µ_X denoting the distribution of X_{v*}, so that, after plugging in the expression from (8) and using X_{v*} ∼ Exp(1), we obtain which only differs from (6)
Moreover, in the first three formulations, the strict inequality ">" is impossible, so one could just as well write "=" there. Additionally, since S^c(H[v*]) ≤ X_{v*}, we have the estimate
Recall that a family of real-valued random variables X_i, i ∈ I, is tight if for all ε > 0 there exists a constant M such that P[|X_i| ≥ M] < ε for all i ∈ I, cf. (b+1) b+1 and satisfies Then the sizes of the separation graphs, G
Observe that, since c : , condition (ii) requires that the radius of convergence r of the power series f is at least c. In case r > c, this condition is trivially satisfied as the integrand is bounded. However, in case of equality r = c, f has a singularity at c by virtue of Pringsheim's theorem, cf. [16, Theorem IV.6], and (10) is a non-trivial requirement. We also remark that condition (i) above is a variant of the notion of expander graphs, which play a crucial role in the theory of percolation, see e.g. [3].
Proof. We first show that for fixed m, the sequence (a_{m,n})_n is nondecreasing and eventually constant. Recall from Definition 2 that each set T^(n) consists of two parts, namely T ∩ V(G^(n)) and the vertices of G^(n) that are incident to edges leaving G^(n). Repeating the corresponding argument from the proof of Proposition 3, we see again that dist(S, T^(n) \ T) → ∞ as n → ∞. Hence, for n sufficiently large, the closed neighbourhood of S of radius m becomes independent of n, and so does a_{m,n}. Moreover, since the G^(n) are monotonically increasing, any admissible subgraph in G^(n) is also admissible in G^(n+1), and the number of its boundary vertices does not decrease when passing from G^(n) to G^(n+1). Hence a_{m,n} is nondecreasing in n. In particular, the limit a_m exists, is finite, and is an upper bound for a_{m,n} for all n.
Let M ≥ L. We now apply Proposition 7, where for brevity we set p_{v*}(x) := P[S^c(H[v*]) ≥ −ln x] (note that this still depends on G_S as well!). Summing over all m ≥ M and all A ∈ A_m(G^(n)) yields where we applied the estimate (9). Using assumption (i), we get Then, by the above argument on the monotonicity of a_{m,n}, we obtain

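By monotone convergence, we moreover obtain for all M ≥ L a bound that is uniform in n and finite by assumption (ii). Hence the tail of the series on the left-hand side converges to zero, and therefore P[|G^(n)_S| ≥ M] → 0 uniformly in n as M → ∞, that is, the sizes |G^(n)_S| form a tight sequence.
▶ Remark 10. Tightness of |G^(n)_S| is a helpful property in order to translate limit laws from the cutting times C(G^(n)) to the separation times S(G^(n)): Assume that there exist sequences α_n and β_n > 0 such that
$$\frac{C(G^{(n)}) - \alpha_n}{\beta_n} \xrightarrow{\;d\;} X$$
for a random variable X with positive variance. If β_n → ∞ and the sizes |G^(n)_S| are tight, then the same limit law holds with S(G^(n)) in place of C(G^(n)).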
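One way to see why tightness transfers limit laws: after separation the process continues on G_S, and every further cut removes at least one vertex, so 0 ≤ C(G) − S(G) ≤ |G_S| holds deterministically, with C(G) = S(G) exactly when G_S is empty. Dividing by β_n → ∞ then makes the difference negligible in probability whenever |G^(n)_S| is tight. The Python sketch below checks the deterministic bound by simulation on a small complete binary tree (root as source, leaves as targets); the helper names are ours.

```python
import random


def source_components(adj, alive, sources):
    """Vertices of `alive` connected to a source through `alive`."""
    stack = [s for s in sources if s in alive]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def run(adj, sources, targets, rng):
    """One run of the discrete cutting procedure, returning (S, C, |G_S|)."""
    graph = source_components(adj, set(adj), sources)
    cuts, S, size_at_sep = 0, None, None
    while True:
        if S is None and not (graph & targets):
            S, size_at_sep = cuts, len(graph)        # separation happens now
        if not graph:
            return S, cuts, size_at_sep
        v = rng.choice(sorted(graph))
        graph = source_components(adj, graph - {v}, sources)
        cuts += 1


# Complete binary tree on 7 vertices in heap order (children of i: 2i+1, 2i+2).
n = 7
tree = {i: set() for i in range(n)}
for i in range(1, n):
    tree[i].add((i - 1) // 2)
    tree[(i - 1) // 2].add(i)
leaves = {i for i in range(n) if 2 * i + 1 >= n}
rng = random.Random(5)
samples = [run(tree, {0}, leaves, rng) for _ in range(1000)]
print(all(0 <= C - S <= size for S, C, size in samples))   # True
```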

Separating trees
In this section, let G = T be a rooted tree, where we will always interpret the root node as the (unique) source vertex, and the leaves as targets.
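To each node w ∈ V(T), we assign a polynomial p[w] ∈ ℤ[x] recursively as follows: If w is a leaf, define p[w](x) = x. Otherwise, denote the children of w by w_1, ..., w_r for r ≥ 1. Then, define
$$p[w](x) := x\Big(1 - \prod_{i=1}^{r} \big(1 - p[w_i](x)\big)\Big).$$
Observe that in the case where w only has a single child w_1, this simplifies to p[w](x) = x · p[w_1](x). Writing p[T] for the polynomial assigned to the root of T, we obtain the following.
▶ Proposition 11. For every rooted tree T,
$$\mathbb{P}\big[S^c(T) \ge x\big] = p[T]\big(e^{-x}\big)$$
for all x ≥ 0. Equivalently, one can interpret p[T](q) for q ∈ [0, 1] as the probability that Ber(q)-site percolation on T contains a path from the root to a leaf.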
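The recursion and its percolation interpretation can be compared directly on a small tree. The Python sketch below assumes the recursion p[leaf](x) = x and p[w](x) = x(1 − ∏_i(1 − p[w_i](x))), which is what the percolation interpretation dictates (w must be open, and at least one child subtree must contain an open path to a leaf). It evaluates p[T](q) with exact rational arithmetic and compares it with a brute-force sum over all open/closed configurations; the tree and function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def p_value(children, w, q):
    """Evaluate p[w](q) recursively: x at a leaf, x(1 - prod(1 - p[child])) otherwise."""
    if not children[w]:
        return q
    none_reach = 1                                   # P[no child subtree has an open path]
    for c in children[w]:
        none_reach *= 1 - p_value(children, c, q)
    return q * (1 - none_reach)


def has_open_path(children, w, is_open):
    """True if an open path leads from w down to a leaf of its subtree."""
    if not is_open[w]:
        return False
    return not children[w] or any(has_open_path(children, c, is_open) for c in children[w])


def percolation_probability(children, root, q):
    """Exact P[Ber(q) site percolation has an open root-to-leaf path], by enumeration."""
    vertices = sorted(children)
    total = Fraction(0)
    for state in product((False, True), repeat=len(vertices)):
        is_open = dict(zip(vertices, state))
        if has_open_path(children, root, is_open):
            weight = Fraction(1)
            for v in vertices:
                weight *= q if is_open[v] else 1 - q
            total += weight
    return total


# Hypothetical example tree: root 0 with children 1, 2; vertex 1 has children 3, 4.
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
q = Fraction(1, 3)
print(p_value(children, 0, q), percolation_probability(children, 0, q))  # 37/243 37/243
```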

Figure 2 A faithful subtree of an underlying rooted tree T. The root is shown in black, with vertices belonging to the subtree being shaded grey. Dotted edges and white vertices belong to T, but not to the subgraph.
▶ Corollary 12. We have Moreover, we have for any subtree T* ⊆ T containing the root node but none of the leaves:
A transversal in a rooted tree T is defined to be a subset of vertices that intersects every path from the root to a leaf. It then follows from the proof of Proposition 11 that 1 − p[T](1 − q) is the probability that a random set of vertices, containing each vertex independently with probability q, is a transversal of T. It is this expression that was investigated in [14,12].
▶ Example 16. Denote by CBT_n the full complete rooted binary tree on 2^n − 1 vertices, this being the rooted binary tree having 2^h vertices at every height h = 0, ..., n − 1. Observe that CBT_1 is the tree consisting of only the root node, and that CBT_{n+1} splits into two copies of CBT_n upon removing the root node. Thus, the associated polynomials satisfy the recurrence relation
$$p[\mathrm{CBT}_1](x) = x, \qquad p[\mathrm{CBT}_{n+1}](x) = x\Big(1 - \big(1 - p[\mathrm{CBT}_n](x)\big)^2\Big).$$
It can be shown that, for x ∈ [0, 1], p[CBT_n](x) converges monotonically decreasing as n → ∞ to
$$\varphi(x) = \begin{cases} 0, & 0 \le x \le \tfrac12, \\[2pt] 2 - \tfrac1x, & \tfrac12 < x \le 1. \end{cases}$$
In light of Proposition 3, this result should not be surprising: The sequence of rooted trees CBT_n satisfies all the conditions, and the function φ(x) indeed equals the probability that the root node is contained in an infinite cluster of Ber(x)-site percolation, as can be verified independently, e.g. from [17, p.256].
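The recurrence is easy to iterate numerically. The Python sketch below (function names are ours) evaluates p[CBT_n](x) via p[CBT_1](x) = x and p[CBT_{n+1}](x) = x(1 − (1 − p[CBT_n](x))²), and compares the result with the fixed point φ(x) = max{0, 2 − 1/x} of this recurrence, which is the limit stated above. Away from the critical value x = 1/2 the convergence is fast; at x = 1/2 it is slow, so that value is only printed, not tested.

```python
def p_cbt(n, x):
    """p[CBT_n](x) via p_1 = x and p_{k+1} = x * (1 - (1 - p_k)**2)."""
    p = x
    for _ in range(n - 1):
        p = x * (1 - (1 - p) ** 2)
    return p


def phi(x):
    """Candidate limit: 0 on [0, 1/2] and 2 - 1/x on (1/2, 1]."""
    return 0.0 if x <= 0.5 else 2.0 - 1.0 / x


for x in (0.3, 0.5, 0.75, 1.0):
    print(x, p_cbt(5, x), p_cbt(200, x), phi(x))
```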
Continuing with our analysis, the probability of the remaining tree at separation being empty now follows handily from Corollary 12:
In a similar fashion, we can continue to determine the limiting probability of separation graphs of any size m ≥ 0: since there are C_m = (2m choose m)/(m + 1) many¹ subtrees of the infinite rooted binary tree on m vertices containing the root, and each of those has m + 1 boundary vertices, it can be shown using the dominated convergence theorem that
Observe that, in the notation of Theorem 9, the sequence a_m = (2m choose m) has generating function
$$\sum_{m \ge 0} \binom{2m}{m} x^m = \frac{1}{\sqrt{1 - 4x}}.$$
Note also that a random variable X having the probability distribution defined by (18) does not have a finite first moment: Imitating the approach of (19) leads to where the integral on the right-hand side diverges.
¹ These are, of course, the Catalan numbers, [1, OEIS A000108].
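The two combinatorial facts used here, namely that there are C_m subtrees on m vertices containing the root and that each has m + 1 boundary vertices, can be checked by brute force; note that (m + 1) · C_m = (2m choose m). The Python sketch below encodes vertices of the infinite binary tree as 0/1 strings describing the path from the root (an encoding chosen for illustration) and takes the root as the only source, so the boundary of a non-empty subtree consists of its children outside it.

```python
from math import comb


def rooted_subtrees(m, root=""):
    """All subtrees with m vertices of the infinite binary tree that contain `root`.

    m = 0 yields the empty subtree; vertices are 0/1 strings below the overall root.
    """
    if m == 0:
        return [frozenset()]
    result = []
    for left_size in range(m):                      # split the m - 1 non-root vertices
        for left in rooted_subtrees(left_size, root + "0"):
            for right in rooted_subtrees(m - 1 - left_size, root + "1"):
                result.append(frozenset({root}) | left | right)
    return result


def boundary(A, root=""):
    """Exterior boundary for S = {root}: {root} if A is empty, else children outside A."""
    if not A:
        return {root}
    return {v + c for v in A for c in "01"} - A


for m in range(8):
    trees = rooted_subtrees(m)
    print(m, len(trees), sum(len(boundary(A)) for A in trees), comb(2 * m, m))
```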
A more detailed investigation of how p[T] depends on the tree T reveals that two trees that only differ in a fringe tree rooted far away from the root node have approximately the same polynomial function p[ · ] on [0, 1], provided a technical condition holds, see [9, Theorem 25]. Using this, it is possible to verify that the same limiting distribution as in (18) also holds for the sequence of complete binary trees on n vertices (of which the full complete binary trees are merely a subsequence): Denote by T_n the complete binary tree on n vertices, this being the binary tree having 2^k vertices at height k for 0 ≤ k < ⌊lg n⌋ =: m, with the remaining n − 2^m + 1 vertices at height m in their left-most positions.
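A convenient concrete encoding of T_n is heap order, where vertex i has children 2i + 1 and 2i + 2 whenever these indices are below n; this fills the last level from the left, as in the definition of T_n. The Python sketch below, using this encoding as an assumption, evaluates p[T_n](x) bottom-up with the same recursion as for p[w] and checks that n − 2^m + 1 vertices sit at height m = ⌊lg n⌋.

```python
def p_complete(n, x):
    """p[T_n](x) for the complete binary tree on n vertices, in heap order.

    Vertex i has children 2i+1 and 2i+2 when these are < n (an assumption of
    this sketch), so children always have larger indices than their parent.
    """
    p = [0.0] * n
    for i in reversed(range(n)):
        kids = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
        if not kids:
            p[i] = x                                  # leaf: p[w](x) = x
        else:
            none_reach = 1.0
            for c in kids:
                none_reach *= 1.0 - p[c]
            p[i] = x * (1.0 - none_reach)
    return p[0]


def last_level_size(n):
    """Number of heap-ordered vertices at height m = floor(lg n)."""
    m = n.bit_length() - 1
    return sum(1 for i in range(n) if (i + 1).bit_length() - 1 == m)


for n in (1, 2, 3, 7, 10):
    print(n, last_level_size(n), p_complete(n, 0.75))
```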
Consequently, the limiting distribution of |T_{n,S}| coincides with that of |CBT_{n,S}| and is given by equation (18).
Observe that by this proposition and by (19), the random variables |T_{n,S}| converge in distribution and are therefore tight. By Remark 10, this means that the limit law obtained by Janson in [20, Theorem 1.1] for C(T_n) also holds for S(T_n). More explicitly, if we denote by {x} := x − ⌊x⌋ the fractional part of x ∈ ℝ, then we obtain the following

Further questions
We use this final section to present several deliberately broad questions and remarks that could lead to interesting future research.
1. Determine the asymptotic distributions of S(G^(n)) and G^(n)_S for other families of deterministic and random trees. The author hopes to answer this for conditioned Galton-Watson trees in a follow-up paper.
2. What happens if the roles of S and T are exchanged? For which graphs and which choices of S, T are the random variables S(G; S, T) and S(G; T, S) equal in distribution?
3. How can the asymptotic distribution of S be evaluated directly, without relying on previous knowledge of C as in Corollary 18?
4. It is easy to see that the edge-cutting process on a graph G is exactly the vertex-cutting process on the line graph of G. This raises the question: how is the separation time on G related to the separation time on the line graph of G?
5. For which sequences of graphs G^(n) exhausting a locally finite infinite G (with fixed sources and targets) are the random variables G