Inversions in split trees and conditional Galton--Watson trees

We study $I(T)$, the number of inversions in a tree $T$ with its vertices labeled uniformly at random, which is a generalization of inversions in permutations. We first show that the cumulants of $I(T)$ have explicit formulas involving the $k$-total common ancestors of $T$ (an extension of the total path length). Then we consider $X_n$, the normalized version of $I(T_n)$, for a sequence of trees $T_n$. For fixed $T_{n}$'s, we prove a sufficient condition for $X_n$ to converge in distribution. As an application, we identify the limit of $X_n$ for complete $b$-ary trees. For $T_n$ being split trees, we show that $X_n$ converges to the unique solution of a distributional equation. Finally, when $T_n$'s are conditional Galton--Watson trees, we show that $X_n$ converges to a random variable defined in terms of Brownian excursions. By exploiting the connection between inversions and the total path length, we are able to give results that are stronger and much broader compared to previous work by Panholzer and Seitz.


Inversions in a fixed tree
Let σ_1, . . . , σ_n be a permutation of {1, . . . , n}. If i < j and σ_i > σ_j, then the pair (σ_i, σ_j) is called an inversion. The concept of inversions was introduced by Cramer [14] (1750) due to its connection with solving linear equations. More recently, the study of inversions has been motivated by its applications in the analysis of sorting algorithms, see, e.g., [37, Section 5.1]. Many authors, including Feller [21, p. 256], Sachkov [52, p. 29], and Bender [7], have shown that the number of inversions in a uniform random permutation satisfies a central limit theorem. More recently, Margolius [42] and Louchard and Prodinger [39] studied permutations containing a fixed number of inversions.
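For concreteness, the number of inversions of a permutation can be counted in O(n log n) time by a merge-sort variant; a minimal sketch (the function name is ours):

```python
def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j], via merge sort."""
    def sort(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = sort(a[:mid])
        right, inv_r = sort(a[mid:])
        merged, inv = [], inv_l + inv_r
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                # every remaining element of left exceeds right[j]:
                # each such pair is an inversion
                inv += len(left) - i
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv
    return sort(list(perm))[1]
```

For example, the permutation (3, 1, 2) has the two inversions (3, 1) and (3, 2).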
The concept of inversions can be generalized as follows. Consider an unlabeled rooted tree T on node set V. Let ρ denote the root. Write u < v if u is a proper ancestor of v, i.e., the unique path from ρ to v passes through u and u ≠ v. Write u ≤ v if u is an ancestor of v, i.e., either u < v or u = v. Given a bijection λ : V → {1, . . . , |V|} (a node labeling), define the number of inversions
$$I(T, \lambda) \stackrel{\mathrm{def}}{=} \sum_{u < v} \mathbf{1}_{\lambda(u) > \lambda(v)}.$$
Note that if T is a path, then I(T, λ) is nothing but the number of inversions in a permutation. Our main object of study is the random variable I(T ), defined by I(T ) = I(T, λ) where λ is chosen uniformly at random from the set of bijections from V to {1, . . . , |V |}.
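The definition of I(T, λ) can be computed directly from this description; a minimal sketch, assuming a parent-array encoding of T (encoding and names are ours):

```python
def tree_inversions(parent, label):
    """I(T, λ): pairs (u, v) with u a proper ancestor of v and label[u] > label[v].

    parent[v] is the parent of node v, with parent[root] = None.
    label[v] is the value λ(v) of a bijection V → {1, ..., |V|}.
    """
    inv = 0
    for v in range(len(parent)):
        u = parent[v]
        while u is not None:            # walk up the ancestor path of v
            if label[u] > label[v]:
                inv += 1
            u = parent[u]
    return inv
```

On a path this reduces to the permutation count; on a star whose root carries the largest label, every root-leaf pair is inverted.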
The enumeration of trees with a fixed number of inversions has been studied by Mallows and Riordan [41] and Gessel et al. [25] using the so-called inversions polynomial. While analyzing linear probing hashing, Flajolet et al. [23] noticed that the number of inversions in a Cayley tree with uniform random labeling converges to an Airy distribution. Panholzer and Seitz [46] showed that this is true for conditional Galton-Watson trees, which encompass the case of Cayley trees.
For a node v, let z_v denote the size of the subtree rooted at v. The following representation of I(T), proved in Section 2, is the basis of most of our results.

Lemma 1.1. Let T be a fixed tree. Then
$$I(T) \overset{d}{=} \sum_{v \in V} Z_v, \qquad (1.1)$$
where {Z_v}_{v∈V} are independent random variables and Z_v ∼ Unif{0, 1, . . . , z_v − 1}.

In particular,
$$\mathbb{E}[I(T)] = \sum_{v \in V} \frac{z_v - 1}{2} = \frac{1}{2}\,\Upsilon(T), \qquad (1.2)$$
where Υ(T) is called the total path length (or internal path length) of T.

Let κ_k = κ_k(X) denote the k-th cumulant of a random variable X (provided it exists); thus κ_1(X) = E[X] and κ_2(X) = Var(X) (see [27, Theorem 4.6.4]). We now define Υ_k(T), the k-total common ancestors of T, which allows us to generalize (1.2) to higher cumulants of I(T). For k nodes v_1, . . . , v_k (not necessarily distinct), let c(v_1, . . . , v_k) be the number of ancestors that they share, i.e.,
$$c(v_1, \ldots, v_k) = \big|\{u \in V : u \le v_i \text{ for all } i\}\big|.$$
We define
$$\Upsilon_k(T) \stackrel{\mathrm{def}}{=} \sum_{v_1, \ldots, v_k \in V} c(v_1, \ldots, v_k),$$
where the sum is over all ordered k-tuples of nodes in the tree. For a single node v, the depth satisfies h(v) = c(v) − 1, since v itself is counted in c(v). So Υ(T) = Υ_1(T) − |V|; i.e., we recover the usual notion of the total path length.

Remark 1.5. An inversion is a special case of a pattern in a permutation. Thus, just as we can study inversions in trees, we can also study other patterns in trees. A recent paper by Albert et al. [2] generalizes Theorem 1.2 from inversions to arbitrary fixed patterns.
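By Lemma 1.1, E[I(T)] = Σ_v (z_v − 1)/2, and since Σ_v (z_v − 1) equals the sum of all depths, this matches E[I(T)] = Υ(T)/2. The identity is easy to check numerically; a sketch assuming a parent-array encoding with the root listed first (names ours):

```python
def subtree_sizes(parent):
    """z_v for every v; assumes parent[v] < v, with the root at index 0."""
    z = [1] * len(parent)
    for v in range(len(parent) - 1, 0, -1):  # bottom-up sweep
        z[parent[v]] += z[v]
    return z

def total_path_length(parent):
    """Υ(T): the sum of the depths h(v) over all nodes v."""
    h = [0] * len(parent)
    for v in range(1, len(parent)):
        h[v] = h[parent[v]] + 1
    return sum(h)

parent = [None, 0, 0, 1, 1, 2]       # a small example tree
z = subtree_sizes(parent)
# mean of I(T) per Lemma 1.1 equals Υ(T)/2
assert sum(zv - 1 for zv in z) / 2 == total_path_length(parent) / 2
```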

Inversions in sequences of trees
The total path length Υ(T) has been studied for random trees such as split trees [9] and conditional Galton-Watson trees [4, Corollary 9]. This leads us to focus on the deviation of I(T_n) from its mean under some appropriate scaling s(n), for a sequence of (random or fixed) trees T_n, where T_n has size n.
Fixed trees

Theorem 1.6. Let T_n be a sequence of fixed trees on n nodes. Let
$$X_n = \frac{I(T_n) - \mathbb{E}[I(T_n)]}{\Upsilon_2(T_n)^{1/2}}.$$
Assume that, for all k ≥ 1,
$$\frac{\Upsilon_{2k}(T_n)}{\Upsilon_2(T_n)^{k}} \to \zeta_{2k}$$
for some sequence (ζ_2k). Then there exists a unique distribution X with
$$\kappa_{2k}(X) = \frac{B_{2k}}{2k}\,\zeta_{2k} \quad (k \ge 1), \qquad \kappa_{2k+1}(X) = 0 \quad (k \ge 0),$$
such that X_n →d X and, moreover, E e^{tX_n} → E e^{tX} < ∞ for every t ∈ R.
Here ψ_n(t) := E e^{tX_n} and ψ(t) := E e^{tX} are called the moment generating functions of X_n and X, respectively. The convergence ψ_n(t) → ψ(t) < ∞ for every t implies both convergence in distribution and convergence of all moments. See, e.g., [27, Theorem 5.9.5].
As simple examples, we consider two extreme cases.
Example 1.9. When P_n is a path of n nodes, we have for fixed k ≥ 1
$$\Upsilon_k(P_n) = \sum_{j=1}^{n} j^k \sim \frac{n^{k+1}}{k+1}.$$
Thus Υ_2k(P_n)/Υ_2(P_n)^k → ζ_2k = 0 for k ≥ 2. So by Theorem 1.6, X_n converges to a normal distribution, and we recover the central limit law for inversions in permutations. Also, the vertices have subtree sizes 1, . . . , n, and so we also recover from Theorem 1.2 the moment generating function $\prod_{j=1}^{n} \frac{e^{jt}-1}{j(e^{t}-1)}$ [42, 52].

Example 1.10. Let T_n = S_{n−1}, a star with n − 1 leaves, and denote the root by o. We have z_o = n and z_v = 1 for v ≠ o. Hence, by Lemma 1.1, or directly, I(S_{n−1}) ∼ Unif{0, . . . , n − 1}, and consequently X_n →d Unif[−1/2, 1/2]. This follows also by Theorem 1.6, since Υ_k(S_{n−1}) ∼ n^k for k ≥ 2 (e.g., by Lemma 2.3 below).
It is straightforward to compute the k-total common ancestors for b-ary trees. Thus our next result follows immediately from Theorem 1.6.

Theorem 1.11. Let b ≥ 2 and let T_n be the complete b-ary tree of height m with n = (b^{m+1} − 1)/(b − 1) nodes. Let
$$X_n = \frac{I(T_n) - \mathbb{E}[I(T_n)]}{n}, \qquad X = \sum_{d \ge 0} \sum_{j=1}^{b^{d}} b^{-d}\, U_{d,j},$$
where (U_{d,j})_{d≥0,j≥1} are independent Unif[−1/2, 1/2]. Then X_n →d X and E e^{tX_n} → E e^{tX} < ∞, for every t ∈ R. Moreover, X is the unique random variable with
$$\kappa_k(X) = \frac{B_k}{k} \cdot \frac{b^{k-1}}{b^{k-1}-1} \ \text{ for even } k, \qquad \kappa_k(X) = 0 \ \text{ for odd } k. \qquad (1.8)$$

Random trees
We move on to random trees. We consider generating a random tree T_n and, conditioning on T_n, labeling its nodes uniformly at random. The relation (1.2) is maintained for random trees: E[I(T_n) | T_n] = Υ(T_n)/2. The deviation of I(T_n) from its mean can be taken to mean two different things. Consider, for some scaling function s(n),
$$X_n = \frac{I(T_n) - \mathbb{E}[I(T_n)]}{s(n)}, \qquad Y_n = \frac{I(T_n) - \Upsilon(T_n)/2}{s(n)}, \qquad W_n = \frac{\Upsilon(T_n)/2 - \mathbb{E}[I(T_n)]}{s(n)}.$$
Then X_n and Y_n each measure the deviation of I(T_n), unconditionally and conditionally on T_n. They are related by the identity
$$X_n = Y_n + W_n. \qquad (1.9)$$
In the case of fixed trees, W_n = 0 and X_n = Y_n, but for random trees we consider the sequences separately.
We consider two classes of random trees: split trees and conditional Galton-Watson trees.
A split tree can be constructed as follows. Consider a rooted infinite b-ary tree where each node is a bucket of finite capacity s. We place n balls at the root, and the balls individually trickle down the tree in a random fashion until no bucket is above capacity. Each node draws a split vector V = (V_1, . . . , V_b) from a common distribution, where V_i is the probability that a ball passing through the node continues to the i-th child. The trickle-down procedure is defined precisely in Section 4. Any node u such that the subtree rooted at u contains no balls is then removed, and we consider the resulting tree T_n.
In the context of split trees we differentiate between I(T_n) (the number of inversions on nodes) and Î(T_n) (the number of inversions on balls). In the former case, the nodes (buckets) are given labels, while in the latter the individual balls are given labels. For balls β_1, β_2, write β_1 < β_2 if the node containing β_1 is a proper ancestor of the node containing β_2; if β_1, β_2 are contained in the same node we do not compare their labels. Define
$$\hat I(T_n) \stackrel{\mathrm{def}}{=} \sum_{\beta_1 < \beta_2} \mathbf{1}_{\lambda(\beta_1) > \lambda(\beta_2)}.$$
Similarly define Υ̂(T_n) as the total path length on balls, i.e., the sum of the depths of all balls, and let (X̂_n, Ŷ_n, Ŵ_n) denote the analogues of (X_n, Y_n, W_n), with I and Υ replaced by Î and Υ̂. (1.10) Here s_0 is a fixed integer denoting the number of balls in any internal node, and we have X̂_n = Ŷ_n + s_0 Ŵ_n/2 (formally justified in Section 4). The following theorem gives the limiting distribution of the random vector (X̂_n, Ŷ_n, Ŵ_n). In Section 4.4 we state a similar result for (X_n, Y_n, W_n) under stronger assumptions. Note that the two concepts are identical for any class of split trees where each node holds exactly one ball, such as binary search trees, quad trees, digital search trees and random simplex trees.
Let d_2 denote the Mallows metric, also called the minimal L_2 metric (defined in Section 4). Let M^d_{0,2} be the set of probability measures on R^d with zero mean and finite second moment.
Theorem 1.12. Let T_n be a split tree and let V = (V_1, . . . , V_b) be its split vector. Define D(V) as in (4.6) below. Assume that P{∃i : V_i = 1} < 1 and s_0 > 0. Let (X̂, Ŷ, Ŵ) be the unique solution in M^3_{0,2} of the system of fixed-point equations (1.11), restated as (4.7) below. Then the sequence (X̂_n, Ŷ_n, Ŵ_n) defined in (1.10) converges to (X̂, Ŷ, Ŵ) in d_2 and in moment generating function within a neighborhood of the origin.
The proof of Theorem 1.12 uses the contraction method, introduced by Rösler [49] for finding the total path length of binary search trees. The technique has been applied to d-dimensional quad trees by Neininger and Rüschendorf [44] and to split trees in general by Broutin and Holmgren [9]. The contraction method also has many other applications in the analysis of recursive algorithms, see, e.g., [45,50,51]. Remark 1.13. We assume that s 0 > 0, for otherwise we trivially haveX n = 0 and Theorem 1.12 reduces to Theorem 2.1 in [9]. Remark 1.14. In a recent paper, Janson [34] showed that preferential attachment trees and random recursive trees can be viewed as split trees with infinite-dimensional split vectors. Thus we conjecture that the contraction method should also be applicable for these models and give results similar to Theorem 1.12.
Remark 1.15. Assume that the constant split vector V = (1/b, . . . , 1/b) is used and each node holds exactly one ball (a special case of digital search trees, see [15,Example 7]). Then D(V) = −1 and (1.11) has the unique solution (X,Ŷ ,Ŵ ) = (X, X, 0), where X has the limiting distribution for inversions in complete b-ary trees (see Theorem 1.11). This is as expected, as the shape of a split tree with these parameters is likely to be very similar to a complete b-ary tree.

Conditional Galton-Watson trees
Finally, we consider conditional Galton-Watson trees (or, equivalently, simply generated trees), which were introduced by Bienaymé [8] and Watson and Galton [55] to model the evolution of populations. A Galton-Watson tree starts with a root node. Then, recursively, each node in the tree is given a random number of child nodes. The numbers of children are drawn independently from the same distribution ξ, called the offspring distribution.
A conditional Galton-Watson tree T n is a Galton-Watson tree conditioned on having n nodes. It generalizes many uniform random tree models, e.g., Cayley trees, Catalan trees, binary trees, b-ary trees, and Motzkin trees. For a comprehensive survey, see Janson [32]. For recent developments, see [10,17,33,38].
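A conditional Galton-Watson tree can be simulated naively by rejection: grow independent Galton-Watson trees until one has exactly n nodes. A hedged sketch (function names ours), using the illustrative offspring law ξ ∼ Geometric(1/2), which has mean 1 and for which T_n is a uniformly random plane tree:

```python
import random

def geometric():
    """P{ξ = k} = 2^(-k-1) for k ≥ 0: mean 1, variance 2."""
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def conditioned_gw_tree(n, offspring, max_tries=100_000):
    """Sample a Galton-Watson tree conditioned to have exactly n nodes,
    by rejection: regrow until the total progeny equals n.
    Returns a parent array with the root at index 0."""
    for _ in range(max_tries):
        parent, queue = [None], [0]
        while queue and len(parent) <= n:
            u = queue.pop()
            for _ in range(offspring()):      # give u a random number of children
                parent.append(u)
                queue.append(len(parent) - 1)
        if not queue and len(parent) == n:    # tree completed with exactly n nodes
            return parent
    raise RuntimeError("no tree of the requested size was generated")
```

Rejection is wasteful (acceptance probability decays like n^{-3/2}) but suffices for small experiments.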
In a series of three seminal papers, Aldous showed that T n converges under re-scaling to a continuum random tree, which is a tree-like object constructed from a Brownian excursion [3,4,5]. Therefore, many asymptotic properties of conditional Galton-Watson trees, such as the height and the total path length, can be derived from properties of Brownian excursions [4]. Our analysis of inversions follows a similar route. In particular, we relate I(T n ) to the Brownian snake studied by e.g., Janson and Marckert [36].
In the context of Galton-Watson trees, Aldous [4, Corollary 9] showed that n^{−3/2} Υ(T_n) converges to an Airy distribution. We will see that the standard deviation of I(T_n) − Υ(T_n)/2 is of order n^{5/4} = o(n^{3/2}), which by the decomposition (1.9) implies that n^{−3/2} I(T_n) converges to the same Airy distribution, recovering one of the main results of Panholzer and Seitz [46, Theorem 5.3]. Our contribution for conditional Galton-Watson trees is a detailed analysis of Y_n under the scaling function s(n) = n^{5/4}. Let e(s), s ∈ [0, 1], be the random path of a standard Brownian excursion, and define
$$C(s,t) \stackrel{\mathrm{def}}{=} \min_{s \wedge t \le u \le s \vee t} e(u).$$
We define a random variable, see [31],
$$\eta \stackrel{\mathrm{def}}{=} \Big(\int_0^1\!\!\int_0^1 C(s,t)\,\mathrm{d}s\,\mathrm{d}t\Big)^{1/2}. \qquad (1.12)$$

Theorem 1.16. Suppose T_n is a conditional Galton-Watson tree with offspring distribution ξ such that E[ξ] = 1, Var(ξ) = σ² ∈ (0, ∞), and E e^{αξ} < ∞ for some α > 0, and define
$$Y_n = \frac{I(T_n) - \Upsilon(T_n)/2}{n^{5/4}}.$$
Then
$$Y_n \overset{d}{\longrightarrow} Y \stackrel{\mathrm{def}}{=} \frac{\eta N}{\sqrt{6\sigma}}, \qquad (1.13)$$
where N is a standard normal random variable, independent of the random variable η defined in (1.12). The moments of η and Y are known [35]; see Section 5.
The rest of the paper is organized as follows. In Section 2, we prove Lemma 1.1 and Theorem 1.2. The results for fixed trees (Theorems 1.6, 1.11) are presented in Section 3. Split trees and conditional Galton-Watson trees are considered in Sections 4 and 5 respectively. Sections 4 and 5 are essentially self-contained, and the interested reader may skip ahead.

A fixed tree
In this section we study a fixed, non-random tree T. We begin by proving Lemma 1.1, which shows that I(T) is a sum of independent uniform random variables.
Proof of Lemma 1.1. We define Z_u = Σ_{v : v > u} 1_{λ(u) > λ(v)} and note that
$$I(T, \lambda) = \sum_{u < v} \mathbf{1}_{\lambda(u) > \lambda(v)} = \sum_{u \in V} Z_u,$$
showing (1.1). Let T_u ⊆ T denote the subtree rooted at u. It is clear that, conditioned on the set λ(T_u), λ restricted to T_u is a uniformly random labeling of T_u into λ(T_u). Recall that z_u denotes the size of T_u. If the elements of λ(T_u) are ℓ_1 < · · · < ℓ_{z_u} and λ(u) = ℓ_i, then Z_u = i − 1. As λ(u) is uniformly distributed over λ(T_u), Z_u is uniformly distributed on {0, 1, . . . , z_u − 1}.
We prove independence of the Z_v by induction on |V|. The base case |V| = 1 is trivial. Let T_1, . . . , T_d be the subtrees rooted at the children of the root ρ, and condition on the sets λ(T_1), . . . , λ(T_d). Given these sets, λ restricted to T_i is a uniformly random labeling of T_i using the given labels λ(T_i), and these labelings are independent for different i. Hence, conditioning on λ(T_1), . . . , λ(T_d), the d families (Z_v)_{v∈T_i} are independent, and each is distributed as the corresponding family for the tree T_i.

Consequently, by induction, still conditioned on λ(T_1), . . . , λ(T_d), the variables (Z_v)_{v≠ρ} are independent, with Z_v ∼ Unif{0, 1, . . . , z_v − 1}; in particular, this conditional distribution does not depend on the sets λ(T_1), . . . , λ(T_d). Hence the family (Z_v)_{v≠ρ} of independent random variables is also independent of Z_ρ, and thus (Z_v)_{v∈V} are independent. This completes the induction, and thus the proof.
Our first use of the representation in Lemma 1.1 is to prove Theorem 1.2, which gives both a formula for the moment generating function and explicit formulas for the cumulants of I(T ) for a fixed T . The proof begins with a simple lemma giving the cumulants and the moment generating function of Z v in Lemma 1.1, from which Theorem 1.2 will follow immediately.
Proof. This is presumably well-known, but we include a proof for completeness. The moment generating function of Z_N is
$$\mathbb{E}\, e^{tZ_N} = \frac{1}{N}\sum_{j=0}^{N-1} e^{jt} = \frac{e^{Nt}-1}{N(e^{t}-1)}.$$
The function (e^t − 1)/t is analytic and non-zero in the disc |t| < 2π, and thus has there a well-defined analytic logarithm f with f(0) = 0. By (2.4) and (2.5), the cumulant generating function of Z_N can be written as
$$\log \mathbb{E}\, e^{tZ_N} = f(Nt) - f(t),$$
and thus, using (2.1),
$$\kappa_k(Z_N) = \frac{B_k}{k}\,\big(N^{k} - 1\big), \qquad k \ge 2.$$

Recall that in the introduction we defined the k-total common ancestors
$$\Upsilon_k(T) = \sum_{v_1, \ldots, v_k \in V} c(v_1, \ldots, v_k).$$

Lemma 2.3. For every k ≥ 1, Υ_k(T) = Σ_{v∈V} z_v^k.

Proof. It is easily seen that a node u is a common ancestor of v_1, . . . , v_k if and only if all of v_1, . . . , v_k lie in the subtree rooted at u. Hence, exchanging the order of summation,
$$\Upsilon_k(T) = \sum_{u \in V} \big|\{(v_1, \ldots, v_k) : u \le v_i \ \forall i\}\big| = \sum_{u \in V} z_u^{k}.$$

Remark 2.4. Observe that all common ancestors of the k vertices must lie on a path, stretching from the last common ancestor to the root. Define a related parameter Υ̃_k(T) to be the sum over all k-tuples of the length of this path (rather than the number of vertices in the path). We call this the k-common path length. Now Υ̃_1(T) = Υ(T), and Υ̃_2(T) has appeared in various contexts, see for example [31] (where it is denoted Q(T)). Let v_1 ∧ v_2 denote the last common ancestor of the vertices v_1 and v_2. It is easy to see that, with n = |T|,
$$\tilde\Upsilon_2(T) = \sum_{v_1, v_2 \in V} h(v_1 \wedge v_2) = \Upsilon_2(T) - n^{2}.$$

Remark 2.5. Let S_k be a star with k leaves ℓ_1, . . . , ℓ_k and root o.
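Lemma 2.3 rests on the observation that a node u is a common ancestor of v_1, . . . , v_k exactly when every v_i lies in the subtree rooted at u, whence Υ_k(T) = Σ_u z_u^k. This can be verified by brute force on a small tree; a sketch (encoding and names are ours):

```python
from itertools import product

def ancestors(parent, v):
    """Ancestors of v including v itself, so |ancestors(v)| = c(v) = h(v) + 1."""
    a = {v}
    while parent[v] is not None:
        v = parent[v]
        a.add(v)
    return a

def upsilon_k(parent, k):
    """Υ_k(T): sum over ordered k-tuples of their number of common ancestors."""
    anc = [ancestors(parent, v) for v in range(len(parent))]
    return sum(len(set.intersection(*(anc[v] for v in tup)))
               for tup in product(range(len(parent)), repeat=k))

def subtree_sizes(parent):
    z = [1] * len(parent)
    for v in range(len(parent) - 1, 0, -1):   # assumes parent[v] < v
        z[parent[v]] += z[v]
    return z

# complete binary tree of height 2 (7 nodes): Υ_k equals Σ_v z_v^k
parent = [None, 0, 0, 1, 1, 2, 2]
for k in (1, 2, 3):
    assert upsilon_k(parent, k) == sum(z ** k for z in subtree_sizes(parent))
```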
Proof of Theorem 1.2. Since cumulants are additive for sums of independent random variables, an immediate consequence of Lemmas 1.1 and 2.1 is that
$$\kappa_k\big(I(T)\big) = \sum_{v \in V} \kappa_k(Z_v) = \frac{B_k}{k} \sum_{v \in V} \big(z_v^{k} - 1\big) = \frac{B_k}{k}\big(\Upsilon_k(T) - |V|\big), \qquad k \ge 2,$$
where the last equality follows from Lemma 2.3. The fact that E[I(T)] = Υ(T)/2 was noted already in (1.2).
For the estimate (1.7), note first, e.g. by Taylor expansion, that cosh x ≤ e^{x²/2} for every real x. It follows that if U is any symmetric random variable with |U| ≤ a, then
$$\mathbb{E}\, e^{tU} = \mathbb{E}\cosh(tU) \le e^{a^{2}t^{2}/2}, \qquad t \in \mathbb{R}.$$
(See [29, (4.16)] for a more general result.) Lemma 1.1 thus implies, applying this to the symmetric variables Z_v − (z_v − 1)/2 with a = z_v/2,
$$\mathbb{E}\, e^{t(I(T) - \mathbb{E}[I(T)])} \le \prod_{v \in V} e^{z_v^{2} t^{2}/8} = e^{t^{2}\,\Upsilon_2(T)/8},$$
which yields (1.7), using also Lemma 2.3.

A sequence of fixed trees
In this section, we study
$$X_n = \frac{I(T_n) - \mathbb{E}[I(T_n)]}{s(n)},$$
where T_n is a sequence of fixed trees and s(n) is an appropriate normalization factor. We start by proving Theorem 1.6, a sufficient condition for X_n to converge in distribution when s(n) = Υ_2(T_n)^{1/2}.
Proof of Theorem 1.6. First, κ_1(X_n) = E[X_n] = 0. For k ≥ 2, note that shifting a random variable does not change its k-th cumulant, so by Theorem 1.2,
$$\kappa_k(X_n) = \frac{\kappa_k\big(I(T_n)\big)}{\Upsilon_2(T_n)^{k/2}} = \frac{B_k}{k} \cdot \frac{\Upsilon_k(T_n) - n}{\Upsilon_2(T_n)^{k/2}}.$$
Also note that n = o(Υ_2(T_n)^{k/2}), since Υ_2(T_n) ≥ z_ρ² = n². Recall that all odd Bernoulli numbers except B_1 are zero. Thus, letting ζ_k = 0 for all odd k, the assumption that Υ_2k(T_n)/Υ_2(T_n)^k → ζ_2k for all k ≥ 1 implies that
$$\kappa_k(X_n) \to \frac{B_k}{k}\,\zeta_k, \qquad k \ge 2.$$
Since every moment can be expressed as a polynomial in cumulants, it follows that every moment E[X_n^k] converges, k ≥ 1. Thus, to show that there exists an X such that X_n →d X, it suffices to show that the moment generating function E e^{tX_n} stays bounded for all small fixed t; we shall show that this holds for all real t. In fact, (1.7) yields
$$\mathbb{E}\, e^{tX_n} \le \exp\!\Big(\frac{t^{2}\,\Upsilon_2(T_n)}{8\,\Upsilon_2(T_n)}\Big) = e^{t^{2}/8}.$$
This and the moment convergence imply the claims in the theorem.

The complete b-ary tree
We prove Theorem 1.11, which asserts that for complete b-ary trees the limiting variable of X_n is the unique X for which
$$\kappa_k(X) = \frac{B_k}{k} \cdot \frac{b^{k-1}}{b^{k-1}-1}$$
for even k ≥ 2, and zero for odd k. Fix b ≥ 2. In the complete b-ary tree of height m, each node v at depth d ∈ {0, 1, . . . , m} has subtree size
$$z_v = \frac{b^{m-d+1}-1}{b-1}.$$
Thus, by Lemma 1.1, I(T_n) − E[I(T_n)] is a sum of centered uniform variables whose ranges are roughly proportional to n b^{−d}, which suggests the series representation of X in Theorem 1.11. It is not difficult to show this rigorously by truncating the sums. Also, it is not difficult to prove Theorem 1.11 by showing that E e^{tX_n} → E e^{tX} for all t ∈ R and checking the cumulants of X, using Remark 2.2. But instead we choose the route of computing the k-total common ancestors of b-ary trees and then applying Theorem 1.6.
Lemma 3.1. Fix b ≥ 2 and let T_n be as in Theorem 1.11. Then, for any fixed k ≥ 2,
$$\Upsilon_k(T_n) \sim \frac{b^{k-1}}{b^{k-1}-1}\, n^{k}.$$

Proof. The height of T_n is m ∼ log_b n. It follows from Lemma 2.3 that
$$\Upsilon_2(T_n) = \sum_{d=0}^{m} b^{d} \Big(\frac{b^{m-d+1}-1}{b-1}\Big)^{2} \sim \frac{b}{b-1}\, n^{2}.$$
Similarly, for any fixed k ≥ 2,
$$\Upsilon_k(T_n) = \sum_{d=0}^{m} b^{d} \Big(\frac{b^{m-d+1}-1}{b-1}\Big)^{k} \sim \frac{b^{k-1}}{b^{k-1}-1}\, n^{k}.$$

Proof of Theorem 1.11. Let X'_n = (I(T_n) − E[I(T_n)])/Υ_2(T_n)^{1/2}. By Lemma 3.1, for fixed k ≥ 1,
$$\frac{\Upsilon_{2k}(T_n)}{\Upsilon_2(T_n)^{k}} \to \frac{b^{2k-1}}{b^{2k-1}-1}\Big(\frac{b-1}{b}\Big)^{k}.$$
By Theorem 1.6, there exists a unique distribution X' such that X'_n →d X'; moreover, E e^{tX'_n} → E e^{tX'} < ∞ for every t. Recall that, using Lemma 3.1 again, Υ_2(T_n)^{1/2}/n → (b/(b−1))^{1/2}. Let X = (b/(b−1))^{1/2} X'; then E e^{tX_n} → E e^{tX} for every real t, and X has cumulants as in (1.8). It is not difficult to show that X has the same distribution as the variable defined in (3.1), by checking the cumulants of X using Remark 2.2.

Balanced b-ary trees
We call a b-ary tree balanced if every level except possibly the last is full, and the vertices in the last level occupy the leftmost positions. A simple example of a balanced binary tree is T_n in which both the left and right subtrees are complete binary trees, but the left subtree has one more level than the right subtree. Since the left subtree has size about 2n/3 and the right subtree has size about n/3, Theorem 1.11 and Lemma 1.1 imply that
$$X_n \overset{d}{\longrightarrow} U + \frac{2}{3}X' + \frac{1}{3}X'',$$
where U ∼ Unif[−1/2, 1/2] and X', X'' are independent copies of X. The three terms in the limit correspond to inversions involving the root, inversions in the left subtree, and inversions in the right subtree.
The above example shows that the limit distribution of X n in a balanced b-ary tree in which each subtree of the root is complete should be U plus a linear combination of independent copies of X. We formalize this observation in the following corollary.
Corollary 3.2. Let T_n be a balanced b-ary tree. Let X_n and X be as in Theorem 1.11. Let {x} denote the fractional part of x, and assume condition (3.2), where i ∈ {0, . . . , b} is a constant. We have X_n →d X(b, i); moreover, E e^{tX_n} → E e^{tX(b,i)} for all t ∈ R.

Remark 3.3. Condition (3.2) is equivalent to saying that all b subtrees of the root of T_n except one (either the i-th or the (i + 1)-th) are complete b-ary trees, and that the exceptional subtree differs in size from a complete b-ary tree by at most o(n/log n).
We will now define split trees, introduced by Devroye [16]. The random split tree T_n has parameters b, s, s_0, s_1, V and n. The integers b, s, s_0, s_1 are required to satisfy the inequalities
$$b \ge 2, \qquad 0 < s, \qquad 0 \le s_0 \le s, \qquad 0 \le b\, s_1 \le s + 1 - s_0. \qquad (4.1)$$
We define T_n algorithmically. Consider the infinite b-ary tree U, and view each node as a bucket with capacity s. Each node u is assigned an independent copy V_u of the random split vector V. Let C(u) denote the number of balls in node u, initially setting C(u) = 0 for all u. Say that u is a leaf if C(u) > 0 and C(v) = 0 for all children v of u, and internal if C(v) > 0 for some proper descendant v of u, i.e., some v with u < v. We add n balls labeled {1, . . . , n} to U one by one. The j-th ball is added by the following "trickle-down" procedure.
1. Add j to the root.
2. While j is at an internal node u, choose child i with probability V u,i , where (V u,1 , . . . , V u,b ) is the split vector at u, and move j to child i.
3. If j is at a leaf u with C(u) < s, then j stays at u and we set C(u) ← C(u) + 1.
If j is at a leaf with C(u) = s, then the balls at u are distributed among u and its children as follows. We select s 0 ≤ s of the balls uniformly at random to stay at u. Among the remaining s + 1 − s 0 balls, we uniformly at random distribute s 1 balls to each of the b children of u.
Each of the remaining s + 1 − s 0 − bs 1 balls is placed at a child node chosen independently at random according to the split vector assigned to u. This splitting process is repeated for any child which receives more than s balls.
For example, if we let b = 2, s = s 0 = 1, s 1 = 0 and V have the distribution of (U, 1 − U ) where U ∼ Unif[0, 1], then we get the well-known binary search tree.
Once all n balls have been placed in U, we obtain T_n by deleting all nodes u such that the subtree rooted at u contains no balls. Note that an internal node of T_n contains exactly s_0 balls, while a leaf contains a random number of balls in {1, . . . , s}. We assume, as previous authors have, that P{∃i : V_i = 1} < 1. We can assume without loss of generality that V has a symmetric (permutation-invariant) distribution, since a uniform random permutation of the subtree order does not change the number of inversions.
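In the special case s = s_0 = 1, s_1 = 0, no bucket ever overflows, and the trickle-down procedure reduces to inserting each ball at the first empty position found. A hedged sketch of this special case only (function names ours); with b = 2 and V = (U, 1 − U) it produces the shape of a binary search tree, as in the example above:

```python
import random

def random_split_tree(n, b, draw_split):
    """Grow a split tree with s = s0 = 1, s1 = 0: each ball trickles down,
    guided by the split vectors, until it lands in an empty node."""
    root = None
    for _ in range(n):
        if root is None:
            root = {"split": draw_split(), "kids": [None] * b}
            continue
        u = root
        while True:
            # choose child i with probability V_{u,i}
            i = random.choices(range(b), weights=u["split"])[0]
            if u["kids"][i] is None:                      # empty position found
                u["kids"][i] = {"split": draw_split(), "kids": [None] * b}
                break
            u = u["kids"][i]
    return root

def size(u):
    return 0 if u is None else 1 + sum(size(c) for c in u["kids"])

def bst_split():
    """V = (U, 1 - U) with U ~ Unif[0, 1]: the binary-search-tree case."""
    u = random.random()
    return (u, 1 - u)
```

Each ball occupies exactly one node here, so inversions on nodes and on balls coincide, as noted after (1.10).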
An equivalent definition of split trees is as follows. Consider an infinite b-ary tree U. The split tree T_n is constructed by distributing n balls (pieces of information) among the nodes of U. For a node u, let n_u be the number of balls stored in the subtree rooted at u. Once the n_u are all decided, we take T_n to be the largest subtree of U such that n_u > 0 for all u ∈ T_n. Let the split vector V ∈ [0, 1]^b be as before. Let V_u = (V_{u,1}, . . . , V_{u,b}) be the independent copy of V assigned to u. Let u_1, . . . , u_b be the child nodes of u. Conditioning on n_u and V_u, if n_u ≤ s, then n_{u_i} = 0 for all i; if n_u > s, then
$$(n_{u_1}, \ldots, n_{u_b}) = (s_1, \ldots, s_1) + \mathrm{Mult}\big(n_u - s_0 - b s_1;\ V_{u,1}, \ldots, V_{u,b}\big),$$
where Mult denotes the multinomial distribution, and b, s, s_0, s_1 are integers satisfying (4.1). Note that Σ_{i=1}^{b} n_{u_i} ≤ n_u (hence the "splitting"). Naturally, for the root ρ, n_ρ = n. Thus the distribution of (n_u, V_u)_{u∈V(U)} is completely defined.

Outline
In this section we outline how one can apply the contraction method to prove Theorem 1.12 but leave the detailed proof to Section 4.2 and Section 4.3. In Section 4.4 we state and outline the proof of the corresponding theorem for inversions on nodes under stronger assumptions.
Let n = (n_1, . . . , n_b) denote the vector of the (random) numbers of balls in the b subtrees of the root. Broutin and Holmgren [9] showed that, conditioning on n, Ŵ_n satisfies a distributional recursion over the b subtrees of the root. We derive similar recursions for X̂_n and Ŷ_n. Conditioning on n, Î(T_n) satisfies a recursion over the subtrees with toll term Ẑ_ρ, where Ẑ_ρ denotes the number of inversions involving balls contained in the root ρ. In Lemma 4.3 below, we show that Ẑ_ρ/n converges in the second moment to Σ_{j=1}^{s_0} U_j, where U_1, . . . , U_{s_0} are independent and uniformly distributed in [0, 1]. Since (n_1/n, . . . , n_b/n) converges a.s. to (V_1, . . . , V_b) (by the law of large numbers), we arrive at the fixed-point equations (4.7), already presented as (1.11) in Theorem 1.12. For a random vector X ∈ R^d, let ∥X∥ denote the Euclidean norm of X.
Recall that M^d_{0,2} denotes the set of probability measures on R^d with zero mean and finite second moment. The Mallows metric d_2 on M^d_{0,2} is defined by
$$d_2(\mu, \nu) \stackrel{\mathrm{def}}{=} \min\Big\{ \big(\mathbb{E}\,\|X - Y\|^{2}\big)^{1/2} : X \sim \mu,\ Y \sim \nu \Big\},$$
where the minimum is taken over all couplings (X, Y) of μ and ν. Using the contraction method, Broutin and Holmgren [9] proved that Ŵ_n converges in d_2 to Ŵ, the unique solution of the first equation of (4.7) in M^1_{0,2}.
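In one dimension, the minimum in the definition of d_2 is attained by the monotone coupling, so for two empirical samples of equal size it is computed by sorting. A small illustrative sketch (a sample-based proxy only; the function name is ours):

```python
def empirical_d2(xs, ys):
    """Empirical minimal-L2 (Mallows) distance between two equal-size samples
    in one dimension: the optimal coupling pairs sorted order statistics."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return (sum((x - y) ** 2 for x, y in pairs) / len(xs)) ** 0.5
```

For instance, empirical_d2([0, 1], [2, 3]) pairs 0 with 2 and 1 with 3, giving distance 2.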
We can apply the same contraction method to show that the vector (X̂_n, Ŷ_n, Ŵ_n) converges in d_2 to (X̂, Ŷ, Ŵ), the unique solution of (4.7) in M^3_{0,2}. But we only outline the argument here, since we will actually use a result by Neininger [43] which gives us a shortcut. Assume that the independent vectors (X̂^{(i)}, Ŷ^{(i)}, Ŵ^{(i)}), i = 1, . . . , b, share some common distribution μ ∈ M^3_{0,2}. Let F(μ) ∈ M^3_{0,2} be the distribution of the random vector given by the right-hand side of (4.7). Using a coupling argument, we can show that for all ν, λ ∈ M^3_{0,2},
$$d_2\big(F(\nu), F(\lambda)\big) \le c\, d_2(\nu, \lambda),$$
where c ∈ (0, 1) is a constant. Thus F is a contraction, and by Banach's fixed-point theorem, (4.7) must have a unique solution (X̂, Ŷ, Ŵ) ∈ M^3_{0,2}. Finally, we can use a similar coupling argument to show that (X̂_n, Ŷ_n, Ŵ_n) converges to this solution in d_2.
We will apply Theorem 4.1 of Neininger [43], which summarizes sufficient conditions for the contraction method outlined in the previous section to work. Since the statement of the theorem is rather lengthy, we do not repeat it here and refer the reader to the original paper.
In particular, the limit function is constant if ln V_1 is non-lattice, meaning that d = 0.
The convergence of the toll function can now be deduced from the same result on the total path length from [9], but we include the short argument for completeness. Condition on the split vector of the root, (V_1, . . . , V_b), and note that (n_1/n, . . . , n_b/n) converges a.s.; here we use that the function involved is continuous and has the same period as ln V_i. For Lemma 4.3, observe that
$$\hat Z_\rho = \sum_{i=1}^{s_0} (\lambda_i - i),$$
where λ_1 < λ_2 < · · · < λ_{s_0} are the labels of the balls in the root, chosen uniformly at random from [n] without replacement. Indeed, the ball with label λ_i forms an inversion with exactly the balls with labels {λ : λ < λ_i, λ ≠ λ_j ∀ j < i}, a set of size λ_i − i. Under the natural coupling, |λ_i/n − U_i| ≤ 1/n, so it is clear that Σ_{i=1}^{s_0} (λ_i − i)/n converges in the second moment to Σ_{j=1}^{s_0} U_j. By the triangle inequality, this is also true for Ẑ_ρ/n.

Convergence in moment generating function
To finish the proof of Theorem 1.12, it remains to show the following lemma.

Lemma 4.4. There exists a constant L ∈ (0, ∞] such that for all fixed t ∈ R^3 with ∥t∥ < L,
$$\mathbb{E}\, e^{t \cdot (\hat X_n, \hat Y_n, \hat W_n)} \to \mathbb{E}\, e^{t \cdot (\hat X, \hat Y, \hat W)} < \infty,$$
where · denotes the inner product. If we further assume that P{∃i : V_i = 1} = 0, then L = ∞.

Remark 4.5. The condition P{∃i : V_i = 1} = 0 is necessary for L = ∞. Assume the opposite. By (4.7), X̂ can be bounded from below in terms of the toll term involving independent variables U_i ∼ Unif[0, 1]. This implies that E e^{tX̂} = ∞ if we choose t large enough.

The proofs of the next two lemmas are similar to Lemma 4.1 of Rösler [49], which deals with the total path length of binary search trees. However, we have extended the result to cover general split trees. Moreover, Lemma 4.7 can be applied not only to inversions and the total path length, but also to any property of split trees that satisfies its assumptions.
Lemma 4.6. Let C_1 > 0 be a constant. There exists a constant L > 0 such that for all t ∈ (−L, L), there exists K_t ≥ 0 such that (4.12) holds. If we further assume that P{∃i : V_i = 1} = 0, then L = ∞.
Together with U_n ≤ 0, the above inequality implies that, for all n ≥ n_0, t ∈ (−L, L), and K_t ∈ R, (4.13) holds if L is small enough. On the other hand, we may assume that t ≠ 0, and then (4.14) holds if K_t is large enough. Together, (4.13) and (4.14) imply (4.12). Note that if p = 0, then L can be arbitrarily large.
Lemma 4.7. Let (J_n)_{n≥1} be a sequence of d-dimensional random vectors. Let (J^{(i)}_n)_{n≥1}, for i = 1, . . . , b, be independent copies of (J_n). Let A^{(i)}_n be the diagonal matrix with n_i/n on its diagonal. Let (B_n)_{n≥1} be a sequence of random functions N^b → R^d. Assume that, conditioning on n,
$$J_n \overset{d}{=} \sum_{i=1}^{b} A^{(i)}_n J^{(i)}_{n_i} + B_n(\mathbf{n}).$$
Further assume that sup_{n≥1} ∥B_n(n)∥ < C_1 and ∥J_1∥ < C_2 deterministically, for some constants C_1, C_2, and that s_0 > 0. Then there exists a constant L ∈ (0, ∞] such that for all t ∈ R^d with ∥t∥ < L, there exists K_t ≥ 0 such that
$$\sup_{n \ge 1}\, \mathbb{E}\, e^{t \cdot J_n} \le K_t. \qquad (4.15)$$
If we further assume that P{∃i : V_i = 1} = 0, then L = ∞.
Proof. It follows from Lemma 4.6 that there exists an L ∈ (0, ∞] such that for all t with ∥t∥ < L, there exists K_t ≥ 0 as required. Now we use induction on n. Since ∥J_1∥ < C_2, we can increase K_t so that (4.15) holds for n = 1.
Proof of Lemma 4.4. Let J_n = (X̂_n, Ŷ_n, Ŵ_n). Then (4.3), (4.5), (4.2) can be written in the form required by Lemma 4.7, with A^{(i)}_n the diagonal matrix with n_i/n on its diagonal and B_n(n) the corresponding vector of toll terms. By Lemma 4.1, J_n converges in distribution to (X̂, Ŷ, Ŵ). Note that B_n(n) is bounded. Therefore, Lemma 4.7 implies that there exists an L ∈ (0, ∞] such that for all t ∈ R^3 with ∥t∥ < L, E e^{t·J_n} → E e^{t·(X̂, Ŷ, Ŵ)} < ∞.

Split tree inversions on nodes
We turn to node inversions in a split tree. The main challenge in this context is that the number N of nodes is in general random. Thus we will limit our analysis to split trees satisfying the following two assumptions:
$$N/n \to \alpha \ \text{ in } L^2, \qquad (4.18)$$
and
$$\mathbb{E}[\Upsilon(T_n)] = \alpha\mu^{-1}\, n \ln n + n\,\varphi(\ln n) + o(n), \qquad (4.19)$$
for some constant α ∈ (0, 1] and some continuous periodic function φ with period d = sup{a ≥ 0 : P(ln V_1 ∈ aZ) = 1} (interpreted as d = 0 in the non-lattice case). These two conditions are satisfied for many types of split trees. Holmgren [30] showed that if ln V_1 is non-lattice, i.e., d = 0, then E[N]/n = α + o(1), and furthermore (4.18) holds. However, in the lattice case, Régnier and Jacquet [48] showed that, for tries (split trees with s_0 = 0 and s = 1) with the fixed split vector (1/b, . . . , 1/b), E[N]/n does not converge. Thus (4.18) cannot be true for these trees.
Condition (4.19) has been shown to hold for many types of split trees, including m-ary search trees [6,11,19,40]. More specifically, Broutin and Holmgren [9] showed that in the non-lattice case, if E[N]/n = α + O(ln^{−1−ε} n) for some ε > 0, then (4.19) is satisfied. However, Flajolet et al. [24] showed that, even in the non-lattice case, there exist tries with some very special parameter values for which E[N]/n − α tends to zero arbitrarily slowly.
We have the following theorem that is similar to Theorem 1.12.
Assume that T_n satisfies (4.18) and (4.19), and that P{∃i : V_i = 1} < 1. Let D(V) be as in (4.6). Let (X, Y, W) be the unique solution in M^3_{0,2} of the corresponding system of fixed-point equations. Then (X_n, Y_n, W_n) converges to (X, Y, W) in d_2. If s_0 > 0, then (X_n, Y_n, W_n) also converges to (X, Y, W) in moment generating function within a neighborhood of the origin.
The convergence in the Mallows metric again follows from Neininger [43, Theorem 4.1]. We leave the details to the reader, as the argument is rather similar to that for inversions on balls. However, we emphasize that the assumption (4.19) is needed to establish the convergence of the corresponding toll function. For convergence in moment generating function, note that s_0 > 0 implies N ≤ n and Z_ρ/n ≤ 1. Therefore, we can again apply Lemma 4.7 as in Section 4.3.

A sequence of conditional Galton-Watson trees
Let ξ be a random variable with E[ξ] = 1, Var(ξ) = σ² < ∞, and E e^{αξ} < ∞ for some α > 0. (The last condition is used in the proof below, but is presumably not necessary.) Let G^ξ be a (possibly infinite) Galton-Watson tree with offspring distribution ξ. The conditional Galton-Watson tree T^ξ_n on n nodes is given by
$$\mathbb{P}\{T^{\xi}_n = T\} = \mathbb{P}\{G^{\xi} = T \mid G^{\xi} \text{ has } n \text{ nodes}\}$$
for any rooted tree T on n nodes. The assumption E[ξ] = 1 is justified by noting that if ζ is such that P{ξ = i} = cθ^i P{ζ = i} for all i ≥ 0, then T^ξ_n and T^ζ_n are identically distributed; hence it is typically possible to replace an offspring distribution ζ by an equivalent one with mean 1, see [32, Sec. 4].
We fix some ξ and drop it from the notation, writing T n = T ξ n .
In a fixed tree T with root ρ and n total nodes, for each node v ≠ ρ let Q_v ∼ Unif(−1/2, 1/2), all independent, and let Q_ρ = 0. For each node v define
$$\Phi_v \stackrel{\mathrm{def}}{=} \sum_{u \le v} Q_u;$$
in other words, Φ_v is the sum of Q_u over all u on the path from the root to v. For each v ≠ ρ also define Z_v = ⌊(Q_v + 1/2)z_v⌋, where z_v denotes the size of the subtree rooted at v. Then Z_v is uniform in {0, 1, . . . , z_v − 1}, and by Lemma 1.1, the quantity
$$I^{*}(T) \stackrel{\mathrm{def}}{=} \sum_{v \ne \rho} \Big(Z_v - \frac{z_v - 1}{2}\Big)$$
is equal in distribution to the centered number of inversions in the tree T, ignoring inversions involving ρ. Define
$$J(T) \stackrel{\mathrm{def}}{=} \sum_{v \ne \rho} \Phi_v = \sum_{v \ne \rho} Q_v z_v.$$
The main part (1.13) of Theorem 1.16 will follow from arguing that, for a conditional Galton-Watson tree T_n,
$$\frac{J(T_n)}{n^{5/4}} \overset{d}{\longrightarrow} Y. \qquad (5.1)$$
Indeed, under the coupling of Q_v and Z_v above, J(T_n) < I^{*}(T_n) + n, and similarly J(T_n) > I^{*}(T_n) − n. As ρ contributes at most n inversions to I(T_n), it follows from the triangle inequality that |J(T_n) − (I(T_n) − Υ(T_n)/2)| ≤ 2n = o(n^{5/4}). Thus (5.1), once proved, will imply that Y_n →d Y. The quantity J(T_n) and the limiting distribution (5.1) have been considered by several authors. In the interest of keeping this section self-contained, we will now outline the proof of (5.1), which relies on the concept of a discrete snake, a random curve which under proper rescaling converges to a Brownian snake, a curve related to a standard Brownian excursion. This convergence was shown by Gittenberger [26], and later in more generality by Janson and Marckert [36], whose notation we use.
Define $f : \{0, \dots, 2(n-1)\} \to V$ by saying that $f(i)$ is the location of a depth-first search (under some fixed ordering of the nodes) at stage $i$, with $f(0) = f(2(n-1)) = \rho$. Also define $V_n(i) = d(\rho, f(i))$, where $d$ denotes graph distance. The process $V_n(i)$ is called the depth-first walk, the Harris walk, or the tour of $T_n$. For non-integer values $t$, $V_n(t)$ is given by linearly interpolating between adjacent values. See Figure 1.
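The depth-first walk is easy to compute from a children-list representation of the tree; a short sketch (representation and function name are illustrative):

```python
def depth_first_walk(children):
    """Return V_n(0), ..., V_n(2(n-1)): the depths visited by the tour."""
    walk = [0]
    def tour(v, depth):
        for c in children[v]:
            walk.append(depth + 1)   # step down the edge to c
            tour(c, depth + 1)
            walk.append(depth)       # step back up to v
    tour(0, 0)
    return walk

# Tour of a small tree: root 0 with children 1 and 2; node 1 has child 3.
children = [[1, 2], [3], [], []]
V = depth_first_walk(children)
assert V[0] == 0 and V[-1] == 0
assert len(V) == 2 * (len(children) - 1) + 1
assert all(abs(V[i] - V[i - 1]) == 1 for i in range(1, len(V)))
```

The assertions record the basic properties used in the text: the walk starts and ends at the root, takes $2(n-1)$ unit steps, and each step changes the depth by exactly $1$.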
Define
\[
R_n(t) \overset{\mathrm{def}}{=}
\begin{cases}
\Phi_{f(\lfloor t \rfloor)} & \text{if } V_n(\lfloor t \rfloor) \ge V_n(\lceil t \rceil), \\
\Phi_{f(\lceil t \rceil)} & \text{otherwise.}
\end{cases}
\]
In other words, $R_n(t)$ takes the value of node $f(\lfloor t \rfloor)$ or $f(\lceil t \rceil)$, whichever is further from the root. We can recover $J(T_n)$ from $R_n(t)$ via
\[
J(T_n) = \frac{1}{2} \int_0^{2(n-1)} R_n(t)\,\mathrm{d}t.
\]
Indeed, for each non-root node $v$ there are precisely two unit intervals during which $R_n(t)$ draws its value from $v$, namely the two unit intervals during which the parent edge of $v$ is being traversed. Now, since $Q_v \sim \mathrm{Unif}(-1/2, 1/2)$ we have $|R_n(i) - R_n(i-1)| \le 1/2$ for all $i > 0$, and
\[
\frac{J(T_n)}{n^{5/4}} = \frac{n-1}{n} \int_0^1 r_n(s)\,\mathrm{d}s,
\]
where $r_n(s) \overset{\mathrm{def}}{=} n^{-1/4} R_n(2(n-1)s)$. Also normalize $v_n(s) \overset{\mathrm{def}}{=} n^{-1/2} V_n(2(n-1)s)$. Theorem 2 of [36] (see also [26]) states that $(r_n, v_n) \xrightarrow{d} (r, v)$ in $C[0,1] \times C[0,1]$, with $r, v$ to be defined shortly.
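Since $R_n$ takes the value $\Phi_v$ of the deeper endpoint on each unit interval of the tour, and each non-root $v$ accounts for exactly two such intervals, half the integral of $R_n$ recovers $J(T_n) = \sum_{v \ne \rho} \Phi_v$. This can be checked exactly on a small tree (sketch; same illustrative children-list representation as above):

```python
import random

children = [[1, 2], [3], [], []]   # node 0 is the root
n = len(children)

# Depths and the Euler tour f(0), ..., f(2(n-1)) of node visits.
depth = [0] * n
f = [0]
def tour(v):
    for c in children[v]:
        depth[c] = depth[v] + 1
        f.append(c)
        tour(c)
        f.append(v)
tour(0)

rng = random.Random(1)
Q = [0.0] + [rng.random() - 0.5 for _ in range(n - 1)]
# Phi_v: sum of Q along the path from the root to v.
Phi = [0.0] * n
def fill_phi(v):
    for c in children[v]:
        Phi[c] = Phi[v] + Q[c]
        fill_phi(c)
fill_phi(0)

# On the unit interval (i, i+1), R_n equals Phi of the deeper of f(i), f(i+1),
# so the integral of R_n over [0, 2(n-1)] is a finite sum over tour steps.
integral = sum(
    Phi[f[i] if depth[f[i]] > depth[f[i + 1]] else f[i + 1]]
    for i in range(2 * (n - 1))
)
J = sum(Phi[v] for v in range(1, n))
assert abs(integral - 2 * J) < 1e-9   # J = (1/2) * integral of R_n
```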
Before defining $r$ and $v$, we briefly motivate what they ought to be. Firstly, as the offspring distribution $\xi$ of $T_n$ satisfies $E[\xi] = 1$, we expect the tour $V_n$ to be roughly a random walk with zero-mean increments, conditioned to be non-negative and to return to the origin at time $2(n-1)$; the limiting law $v$ ought therefore to be a Brownian excursion (up to a constant scale factor). Secondly, consider a node $u$ and the path $\rho = u_0, u_1, \dots, u_d = u$, where $d$ is the depth of $u$. We can define a random walk $\Phi_u(t)$ for $t = 0, \dots, d$ by $\Phi_u(0) = 0$ and $\Phi_u(t) = \sum_{i=1}^t Q_{u_i}$ for $t > 0$, noting that $\Phi_u = \Phi_u(d)$. Under rescaling, the random walk $\Phi_u(t)$ will behave like Brownian motion. For any two nodes $u_1, u_2$ with last common ancestor at depth $m$, the processes $\Phi_{u_1}, \Phi_{u_2}$ agree for $t = 0, \dots, m$, while any subsequent increments are independent; hence $\operatorname{Cov}(\Phi_{u_1}, \Phi_{u_2}) = cm$ for some constant $c > 0$ (in fact $c = \operatorname{Var} Q_v = 1/12$). Now, for any $i, j \in \{0, \dots, 2(n-1)\}$, the nodes $f(i), f(j)$ at depths $V_n(i), V_n(j)$ have last common ancestor $f(k)$, where $k$ is such that $V_n(k)$ is minimal in the range $i \le k \le j$. Hence $r(s)$ should be normally distributed with variance proportional to $v(s)$, and the covariance of $r(s), r(t)$ proportional to $\min_{s \le u \le t} v(u)$.
Accordingly, define $v \overset{\mathrm{def}}{=} (2/\sigma)\, \mathbf{e}$, where $\mathbf{e}$ is a standard Brownian excursion, and, conditionally on $v$, let $r$ be a centered Gaussian process with
\[
\operatorname{Cov}(r(s), r(t)) = \frac{1}{12} \min_{s \le u \le t} v(u).
\]
The constant $1/12$ appears as the variance of the random increments $Q_v$. Again, Theorem 2 of [36] states that $(r_n, v_n) \xrightarrow{d} (r, v)$ in $C[0,1]^2$. We conclude that
\[
\frac{J(T_n)}{n^{5/4}} \xrightarrow{d} Y \overset{\mathrm{def}}{=} \int_0^1 r(s)\,\mathrm{d}s.
\]
This integral is the object of study in [35], wherein it is shown that
\[
Y \overset{d}{=} \sqrt{\frac{\eta}{12\sigma}}\, N,
\]
where $N$ is a standard normal variable independent of $\eta$, and $\eta$ is given by
\[
\eta \overset{\mathrm{def}}{=} 2 \int_0^1 \!\! \int_0^1 \min_{s \wedge t \le u \le s \vee t} \mathbf{e}(u)\,\mathrm{d}s\,\mathrm{d}t;
\]
[35] also gives an asymptotic formula for the $k$-th moment of $\eta$ as $k \to \infty$, involving the constant $\beta = 0.981038\dots$. Further analysis of the moments of $\eta$ and $Y$, including the moment generating function and tail estimates, can be found in [35].
Remark 5.1. Conditionally on the value of $\eta$, the random variable $Y$ has variance $\eta/(12\sigma)$. The random variable $\eta$ can be seen as a scaled limit of $\Upsilon_2(T_n)$, the $2$-total common ancestors, which appeared in our earlier discussion of cumulants. Indeed, recall that $\Upsilon_2(T_n) \overset{\mathrm{def}}{=} \sum_{u,v \in T_n} c(u,v)$, where $c(u,v)$ denotes the number of common ancestors of $u$ and $v$.
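Exchanging the order of summation shows $\Upsilon_2(T) = \sum_{u,v} c(u,v) = \sum_w z_w^2$ when ancestors are counted weakly (each node is an ancestor of itself, as in the convention $u \le v$), which is how $\eta$ connects to the subtree sizes. A brute-force check on a small hard-coded tree:

```python
children = [[1, 2], [3, 4], [], [], []]   # node 0 is the root
n = len(children)

parent = [None] * n
for v in range(n):
    for c in children[v]:
        parent[c] = v

def ancestors(u):
    """Weak ancestors of u: the nodes on the path from the root to u."""
    anc = {u}
    while parent[u] is not None:
        u = parent[u]
        anc.add(u)
    return anc

# Upsilon_2 = sum over ordered pairs (u, v) of #(common ancestors of u, v).
upsilon2 = sum(len(ancestors(u) & ancestors(v))
               for u in range(n) for v in range(n))

# Subtree sizes z_w, computed bottom-up.
z = [1] * n
for v in range(n - 1, -1, -1):
    for c in children[v]:
        z[v] += z[c]

assert upsilon2 == sum(s * s for s in z)   # Upsilon_2(T) = sum of z_w^2
```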

Convergence of the moment generating function
The last part of Theorem 1.16 which remains to be proved is that $E[e^{tY_n}] \to E[e^{tY}]$ for all fixed $t \in \mathbb{R}$. Since we have already shown $Y_n \xrightarrow{d} Y$, we can apply the Vitali convergence theorem once we have shown that, for every fixed $t$, the sequence $e^{tY_n}$ is uniformly integrable. This follows from the following lemma.

Lemma 5.2. For every fixed $t \in \mathbb{R}$, we have $\sup_n E[e^{tY_n}] < \infty$.

Conditionally on $T_n$, the variable $Y_n$ is a normalized sum of independent bounded terms, so Hoeffding's inequality yields
\[
E\bigl[e^{tY_n} \mid T_n\bigr] \le \exp\Bigl(\frac{H_n}{\sqrt{n}}\, t^2\Bigr),
\]
where $H_n$ denotes the height of $T_n$. It follows that
\[
E\bigl[e^{tY_n}\bigr] \le E\Bigl[\exp\Bigl(\frac{H_n}{\sqrt{n}}\, t^2\Bigr)\Bigr].
\]
The random variable $H_n$ has been well studied. In particular, Addario-Berry et al. [1] showed that there exist positive constants $C_2$ and $c_2$ such that
\[
P\{H_n > x\} \le C_2 \exp\Bigl(-c_2 \frac{x^2}{n}\Bigr) \qquad \text{for all } n \in \mathbb{N} \text{ and } x \ge 0.
\]
Therefore, we have
\[
E\Bigl[\exp\Bigl(\frac{H_n}{\sqrt{n}}\, t^2\Bigr)\Bigr]
= 1 + \int_0^\infty e^x\, P\Bigl\{\frac{H_n}{\sqrt{n}}\, t^2 > x\Bigr\}\,\mathrm{d}x
\le 1 + \int_0^\infty e^x\, C_2 \exp\Bigl(-c_2 \frac{x^2}{t^4}\Bigr)\,\mathrm{d}x
\le 1 + C_1 t^2 e^{c_3 t^4}
\]
for some positive constants $c_3$ and $C_1$. (For the equality in the above computation, see [20, pp. 56].) Thus the lemma follows.
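The equality cited from [20, pp. 56] is the standard tail-integration identity $E[e^W] = 1 + \int_0^\infty e^x\, P\{W > x\}\,\mathrm{d}x$ for $W \ge 0$. As a sanity check, for $W \sim \mathrm{Exp}(2)$ both sides equal $2$, since $E[e^W] = 2/(2-1)$; a sketch using a simple Riemann sum:

```python
import math

# W ~ Exp(rate 2): P{W > x} = exp(-2x), and E[exp(W)] = 2/(2 - 1) = 2.
def tail(x):
    return math.exp(-2.0 * x)

# Right-hand side 1 + integral of e^x P{W > x} dx, via a left Riemann sum.
dx = 1e-4
rhs = 1.0 + sum(math.exp(x) * tail(x) * dx
                for x in (i * dx for i in range(200000)))  # up to x = 20

assert abs(rhs - 2.0) < 1e-3   # matches E[exp(W)] = 2
```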