Abstract 1 Introduction 2 Grandchildren-balanced trees 3 Bottom-up algorithm 4 Top-down algorithm 5 Complexity analysis References Appendix A Missing proofs of Sections 2 and 3

Grandchildren-Weight-Balanced Binary Search Trees

Vincent Jugé ORCID LIGM, Univ Gustave Eiffel & CNRS, Marne-la-Vallée, France
IRIF, Université Paris Cité & CNRS, France
Abstract

We revisit weight-balanced trees, also known as trees of bounded balance. Invented by Nievergelt and Reingold in 1972, these trees are obtained by assigning a weight to each node and requesting that the weight of each node should be quite larger than the weights of its children, the precise meaning of “quite larger” depending on a real-valued parameter γ. Blum and Mehlhorn then showed how to maintain them in a recursive (bottom-up) fashion when 2/11γ11/2, their algorithm requiring only an amortised constant number of tree rebalancing operations per update (insertion or deletion). Later, in 1993, Lai and Wood proposed a top-down procedure for updating these trees when 2/11γ1/4.

Our contribution is two-fold. First, we strengthen the requirements of Nievergelt and Reingold, by also requesting that each node should have a substantially larger weight than its grandchildren, thereby obtaining what we call grandchildren-balanced trees. Grandchildren-balanced trees are not harder to maintain than weight-balanced trees, but enjoy a smaller node depth, both in the worst case (with a 6 % decrease) and on average (with a 1.6 % decrease). In particular, unlike standard weight-balanced trees, all grandchildren-balanced trees with n nodes are of height less than 2log2(n).

Second, we adapt the algorithm of Lai and Wood to all weight-balanced trees, i.e., to all parameter values γ such that 2/11γ11/2. More precisely, we adapt it to all grandchildren-balanced trees for which 1/4<γ11/2. Finally, we show that, except in limit cases (where, for instance, γ=11/2), all these algorithms result in making a constant amortised number of tree rebalancing operations per tree update.

Keywords and phrases:
Data structures, Balanced binary trees
Copyright and License:
[Uncaptioned image] © Vincent Jugé; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Sorting and searching
Related Version:
Full Version: https://arxiv.org/abs/2410.08825 [12]
Editors:
Pat Morin and Eunjin Oh

1 Introduction

Among the most fundamental data structures are search trees. Such trees are aimed at representing sets of pairwise comparable elements of size n while allowing basic operations such as membership test, insertion and deletion, which typically require a time linear in the depth of the tree. That is why various data structures such as height-balanced (AVL) trees [1], weight-balanced trees [17], B-trees [4], red-black trees [9], splay trees [19], relaxed k-trees [8] or weak AVL trees [10] were invented since the 1960s. All of them provide worst-case or amortized 𝒪(log(n)) complexities for membership test, insertion and deletion, and may use local modifications of the tree shape called rotations.

The most desirable features of such trees include their height, their internal and external path lengths, and the amortised number of rotations triggered by an insertion or deletion. Such statistics are presented in Table 1 below for families of binary trees.

Note that, although relaxed k-trees can be made arbitrarily short (given that each tree should be of height at least log2(n)), achieving these excellent guarantees on the height of the trees requires performing significantly more rotations, and thus considering other data structures may still be relevant. This is, for instance, why Haeupler, Tarjan and Sen proposed weak AVL trees in 2009: these are a variant of AVL trees, isomorphic to red-black trees, whose height can be worse than that of AVL trees (but never worse than that of red-black trees), but which require only a constant number of rotations per update, whereas standard AVL trees may require up to log(n) rotations per update.

Table 1: Asymptotic approximate worst-case height, internal/external path length and amortised number of rotations per update in binary trees with n nodes. Next to each worst-case bound are indicated references where this bound is proved.
Tree family Worst-case
height path length am. rotations/update
AVL 1.4404log2(n) [1] 1.4404nlog2(n) [13] Θ(log(n)) [2]
Weight-balanced 2log2(n) [17] 1.1462nlog2(n) [17] Θ(1) [5]
Red-black 2log2(n) [9] 2nlog2(n) [7] Θ(1) [9]
Splay n [19] n2/2 [19] Θ(1) [19]
Relaxed k- (1+ε)log2(n) [8] (1+ε)nlog2(n) [8] 𝒪(1/ε) [8]
Weak AVL 2log2(n) [9] 2nlog2(n) [7] Θ(1) [10]
Grandchildren-balanced 1.8798log2(n) 1.1271nlog2(n) Θ(1)

In 1972, Nievergelt and Reingold invented weight-balanced trees, or trees of bounded balance [17]. This family of trees depends on a real parameter γ, and is denoted by 𝖡𝖡[γ]. It is based on the notion of weight of a node n, which is the integer |n| defined as one plus the number of descendants of a node n; alternatively, |n| is the number of empty sub-trees descending from n. Although they might be less efficient than other families of balanced trees in general, they are a reference data structure for order-statistic trees: they are relevant in contexts where we need to store the rank of elements in a dynamic set [6] (i.e., finding efficiently the kth smallest element, or counting elements smaller than a given bound). They are also widely used in libraries of mainstream languages, e.g., Haskell set or map implementations [15, 16].

The family 𝖡𝖡[γ] consists of those binary search trees in which, for all nodes u and v, we have |u|γ|v| whenever u is a child of v. Nievergelt and Reingold also gave an algorithm for dynamically maintaining 𝖡𝖡[γ]-trees of size n whenever γ21, which required only 𝒪(log(n)) element comparisons and pointer moves per modification.

Blum and Mehlhorn then found that this algorithm worked only when 2/11γ21. They also proved that, in that case, it required only 𝒪(1) amortised pointer moves per modification [5]. Lai and Wood subsequently proposed an algorithm for maintaining such trees through a single top-down pass per modification whenever 2/11γ1/4, or whenever updates were guaranteed to be non-redundant [14]. This limitation on the domain of γ when tackling possibly redundant updates is somewhat unfortunate, given that those values of γ for which 𝖡𝖡[γ] have the best upper bounds on their height and path length are those for which γ approaches 21.

Top-down maintenance algorithms have two main advantages over bottom-up algorithms. First, bottom-up algorithms require maintaining a link from each node to its parent or storing the list of nodes we visited along a given branch on a stack, which makes them typically slower than top-down algorithms [3]. Second, they are also a strong bottleneck in concurrent or parallel settings: going down to a leaf and then back to the root must be an atomic operation (or requires maintaining children-to-parent links), since it might finish by changing the root [9, 10].

Thus, our goal here is two-fold. First, by making small changes to a long-known data structure, we improve the guarantees it offers in terms of tree height. Second, we aim at circumventing the limitations of the algorithm from Lai and Wood, by proposing a top-down updating algorithm that will be valid in all cases, and which we also extend to our enhanced data structure.

Contributions

We propose a new variant of weight-bounded trees, which we call grandchildren-balanced trees. This family of trees depends on two real parameters α and β, and is denoted by 𝖦𝖢𝖡[α,β]. It consists of those binary search trees in which, for all nodes u and v, we have |u|α|v| whenever u is a child of v, and |u|β|v| whenever u is a grandchild of v. Grandchildren-balanced trees generalise weight-bounded trees, because 𝖦𝖢𝖡[α,α2]=𝖡𝖡[1α].

We prove that, for well-chosen values of α and β, the height and internal and external path lengths of grandchildren-balanced trees are smaller than those of weight-bounded trees; in particular, we break the 2log2(n) lower bound on the worst-case height of weight-balanced trees. These results are achieved by using Algorithm 11.

We also extend the algorithm of Lai and Wood, and propose an algorithm for maintaining grandchildren-balanced trees in a single top-down pass per modification, which requires only 𝒪(1) amortised pointer moves per modification; this is Algorithm 16. Our algorithm is valid for all relevant values of α and β, including those for which 𝖦𝖢𝖡[α,β]-trees enjoy the best upper bounds on their height and path length.

Most proofs are omitted in the body of this article. Those of Sections 2 and 3 can be found in the appendix; the other ones can be found in the complete version of this article [12].

2 Grandchildren-balanced trees

In this section, we describe a new family of binary search trees, called grandchildren-balanced trees, which depends on two real parameters α and β. We also describe the domain in which the pair (α,β) is meaningful and, for such pairs, we provide upper bounds on the height and internal and external path lengths of grandchildren-balanced trees.

Let us first recall the definition given in Section 1.

Definition 1.

The weight of a binary tree 𝒯, denoted by |𝒯|, is defined as the number of empty sub-trees of 𝒯; alternatively, |𝒯|1 is the number of nodes of 𝒯. The weight of a node n of 𝒯, also denoted by |n|, is defined as the weight of the sub-tree of 𝒯 rooted at n.

In what follows, it may be convenient to see each empty tree (of weight 1) as if it had two empty children of weight 1/2 each, themselves having two empty children of weight 1/4 each, and so on. In other words, we may see each empty tree as an infinite complete binary tree whose nodes of depth d have weight 2d.

Let n be a node of 𝒯, let n1 and n2 be its children, and let n11n12n21 and n22 be its grandchildren. The child-balance and the grandchild-balance (also called C-balance and GC-balance) of n in 𝒯 are defined as the real numbers

𝖻𝖺𝗅𝖢(𝒯,n)=max{|n1|,|n2|}|n| and 𝖻𝖺𝗅𝖦𝖢(𝒯,n)=max{|n11|,|n12|,|n21|,|n22|}|n|.

When the context is clear, we may omit referring to 𝒯, thereby simply writing 𝖻𝖺𝗅𝖢(n) and 𝖻𝖺𝗅𝖦𝖢(n); alternatively, when n is the root of 𝒯, its C-balance and the GC-balance may also be directly denoted by 𝖻𝖺𝗅𝖢(𝒯) and 𝖻𝖺𝗅𝖦𝖢(𝒯).

Given real numbers α and β, we say that a node n of 𝒯 is α-child balanced (or α-C-balanced) when 𝖻𝖺𝗅𝖢(n)α; β-grandchild-balanced (or β-GC-balanced) when 𝖻𝖺𝗅𝖦𝖢(n)β; and (α,β)-balanced when n is both α-C-balanced and β-GC-balanced.

Finally, we say that 𝒯 is a 𝖦𝖢𝖡x,y[α,β]-tree when its root is (x,y)-balanced and its other nodes are (α,β)-balanced. For the sake of concision, we simply say that 𝒯 is a 𝖦𝖢𝖡x[α]-tree, a 𝖦𝖢𝖡[α,β]-tree or a 𝖦𝖢𝖡[α]-tree when (y,β)=(1,1)(x,y)=(α,β) or (x,y,β)=(α,1,1), respectively.

Below, we will be mostly interested in the family of 𝖦𝖢𝖡[α,β]-trees. Although it generalises weight-bounded trees invented by Nievergelt and Reingold, we had to shift away from their notation. Indeed, weight-bounded trees were parametrised with a real number γ, by requesting that |u|γ|v| whenever u is a child of v; however, being β-GC-balanced cannot be expressed by using constraints such as “|u|γ|v| whenever u is a grandchild of v”. In particular, in [5, 17], the root-balance of the node n is simply defined as the real number ρ(n)=|n1|/|n|; here, we set 𝖻𝖺𝗅𝖢(n)=max{ρ(n),1ρ(n)}.

The family of 𝖦𝖢𝖡[α,β]-trees coincides with

  • both families of 𝖦𝖢𝖡[α]-trees and 𝖡𝖡[1α]-trees when α2β;

  • the empty set when α<1/2;

  • the family of 𝖦𝖢𝖡[1,β]-trees when 1α;

  • the family of 𝖦𝖢𝖡[2β,β]-trees when β<α/2;

  • the family of all binary trees when α=β=1.

Thus, we assume below that the pair (α,β) belongs to the domain

𝒟={(α,β):α/2βα2 and 1/2α1}{(1,1)},

represented in Figure 1 along with two other domains 𝒟 and 𝒟′′ at the end of Section 2.

Figure 1: Two-dimensional domains 𝒟𝒟′′𝒟, and one-dimensional (thick, gray) curve 𝒞. Whereas 𝖡𝖡[1α]-trees correspond to choosing (α,β) on the curve 𝒞, our 𝖦𝖢𝖡[α,β]-trees can be created on the entire domain 𝒟.

We first obtain the following upper bound on the height of 𝖦𝖢𝖡[α,β]-trees.

Theorem 2.

When (α,β)𝒟, each 𝖦𝖢𝖡[α,β]-tree with 𝐍 nodes is of height

h2logβ(𝐍+1).

Proof.

Let 𝒯 be a 𝖦𝖢𝖡[α,β]-tree with 𝐍 nodes. Let h be its height, and let n be a node of 𝒯 at depth h. An induction on k proves that |n(k)|2βk/2α whenever 0kh, where n(k) denotes the kth ancestor of n, and is defined by n if k=0, or as the parent of n(k1) if k1. It follows that 𝐍+1|n(h)|2βh/2αβh/2, i.e., that h2logβ(𝐍+1).

We also focus on the internal and external path lengths of 𝖦𝖢𝖡[α,β]-trees. The internal path length of a tree 𝒯 is the sum of the lengths of the paths from the root of 𝒯 to the nodes of 𝒯; the external path length of 𝒯 is the sum of the lengths of the paths from the root of 𝒯 to the empty sub-trees of 𝒯. These lengths are closely related to the average number of queries required to check membership in the set of labels of 𝒯.

Indeed, let 𝒯 be a binary search tree whose 𝐍 nodes are labelled by elements of a linearly ordered set , and let λ𝗂𝗇𝗍 and λ𝖾𝗑𝗍 be its internal and external path lengths. The labels of the nodes of 𝒯 form a linearly ordered set 𝒮 of size 𝐍, which splits  into 𝐍+1 intervals. Then, the average number of queries used to find an element x is:

  • λ𝗂𝗇𝗍/𝐍 when x is chosen uniformly at random in 𝒮;

  • λ𝖾𝗑𝗍/(𝐍+1) when x is chosen in 𝒮, and the interval of 𝒮 to which x belongs is chosen uniformly at random.

By construction, each node n increases the lengths λ𝗂𝗇𝗍 and λ𝖾𝗑𝗍 by |n|2 and |n|, respectively. This means that λ𝖾𝗑𝗍 is the sum of the weights of the tree nodes, and that λ𝗂𝗇𝗍=λ𝖾𝗑𝗍2𝐍.

By adapting the proof from [18], we obtain the following upper bound on the external path lengths of 𝖦𝖢𝖡[α,β]-trees.

Theorem 3.

When (α,β)𝒟, each 𝖦𝖢𝖡[α,β]-tree with 𝐍 nodes is of external path length

λ(𝐍+1)log2(𝐍+1)/Δ,

where Δ=(𝖧2(α)+α𝖧2(β/α))/(1+α), and 𝖧2(x)=xlog2(x)(1x)log2(1x) is Shannon’s binary entropy.

In what follows, we will investigate more closely the family of 𝖦𝖢𝖡[α,β]-trees when (α,β) belongs to the more restricted domain

𝒟={(α,β):1/2α<3/4 and (α)βα2}, where (α)=1+4α12.

In the limit cases where α=1/20.7071 and β=(α)0.4783, Theorems 2 and 3 translate into the inequalities

h1.8798log2(𝐍+1) and λ1.1271(𝐍+1)log2(𝐍+1),

thereby improving the inequalities

h2log2(𝐍+1) and λ1.1462(𝐍+1)log2(𝐍+1)

obtained in the best case (i.e., in the limit case where α=1/2 and β=1/2) for 𝖡𝖡[1α]-trees.

On the domain 𝒟, or even on the larger domain 𝒟′′={(α,β):2/3β/αα<3/4}, the upper bounds presented in Theorem 2 and 3 are (unsurprisingly) quite tight.

Proposition 4.

Let (α,β) be a pair belonging to 𝒟′′, and let 𝐍0 be an integer. There exists a 𝖦𝖢𝖡[α,β]-tree with 𝐍 nodes, of height h2log2(𝐍+1)/log2(β)7 and external path length λ(𝐍+1)log2(𝐍+1)/Δ4𝐍.

3 Bottom-up algorithm

In this section, we propose a simple algorithm for rebalancing a 𝖦𝖢𝖡[α,β] tree 𝒯 after having added a leaf to 𝒯 or deleted a node from 𝒯, when the pair (α,β) belongs to the domain 𝒟.

Rebalancing 𝖦𝖢𝖡[α,β] trees might also be possible when 3/4α1 and (α)βα2 by using similar ideas, but doing so may require more complicated algorithms, in particular for dealing with base cases, which is not necessarily worth the effort since such parameters provide us with trees of larger height and path length.

In case of a deletion, as illustrated in the three cases of Figure 2, we can safely assume that the node to be deleted is a leaf s, by using Hibbard’s deletion technique [11]. Indeed, if the targeted node a has just one child, since a is α-C-balanced and α<3/4, we know that |a|3, which means that this child is a leaf; thus, we can just focus on deleting that child and, a posteriori, replace the key of a with the key of this just-deleted child. Similarly, if the targeted node a has two children, we can first identify the successor of a as the deepest node b of the left branch stemming from the right child of a; b has no left child, which means that we can just focus on deleting it and, a posteriori, replace the key of a with the key of this just-deleted node b.

In both cases, let r be the root of 𝒯, and let s be the node that must be inserted or deleted. The weights of the nodes from r to s need to be incremented by 1 in case of a non-redundant insertion (since s just replaced an empty tree, we can pretend that its weight was 1 before it was inserted), or decremented by 1 in case of a non-redundant deletion (since s was a leaf and is replaced by an empty tree, we can pretend that its new weight is 1). However, if the update is redundant, i.e., if we tried to insert a key that was already present in 𝒯, or to delete an absent key, these weights will not be changed.

In practice, each tree node will contain two fields for storing explicitly its label and its weight. Moreover, in the description below, we simply say “ if s lies in a sub-tree 𝒯 ” as a place-holder for “ determine, based on the key x that you want to insert or delete and on the key y stored in the root of the current tree, which sub-tree 𝒯 should contain s ”.

Figure 2: Inserting or deleting a key x: either an empty sub-tree grows into a leaf or a leaf shrinks to an empty sub-tree. When we wish to delete an internal node, we just focus on deleting a leaf descending from that node, and then swap node keys.

Our algorithm is based on two algorithmic building blocks, called 𝖢-balancing (Algorithm 5) and 𝖦𝖢-balancing (Algorithm 6). They consist in locally performing rotations in order to make a given node better balanced when needed, without damaging the balance of its parent and children too much.

In a nutshell, the idea of our bottom-up updating algorithm, which will be made more precise in Algorithm 11, is as follows:

  1. 1.

    We recursively update the sub-tree that needs to be updated.

  2. 2.

    Using the 𝖢-balancing algorithm ensures our root and its children are suitably child-balanced.

  3. 3.

    Using the 𝖦𝖢-balancing algorithm then ensures our root is suitably child- and grandchild-balanced.

This is the same general idea as the original algorithm from Nievergelt and Reingold [5, 17], adapted to take grandchild balances into account: one should first make our tree child-balanced, and only then can one make it grandchild-balanced too.

One difficulty is that, when we perform a rotation to make our root child- or grandchild-balanced, this rotation should be worth its cost: we expect that, for the next few updates (a quantity that can be made precise, and will be studied in Section 5), no update will be required at all. This is what will make the amortised number of rotations per update a constant.

In what follows, 𝒯 denotes a binary tree that may need to be rebalanced. Its root is denoted by n and, for every node x, the left and right children of x are denoted by x1 and x2, respectively. Furthermore, we consider that nodes move when rotations are performed. For instance, when either rotation represented in Figure 3 is performed, the node n, which used to be the root of the tree, becomes the right child of the root of the tree resulting from the rotation.

Below, we also say that a node whose children list changed during the rotation was affected by the rotation. We will always make sure that, when a simple or double rotation is performed on a tree 𝒯, the resulting tree 𝒯 satisfies the inequality 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯). That way, if 𝒯 was in fact rooted at some child of a node x, neither the C-balance nor the GC-balance of x, or of any unaffected node, will increase as a result of the rotation.

Figure 3: Performing a simple or double rotation. Affected nodes are coloured in grey.
Algorithm 5 (𝖢-balancing).

Given real numbers uvw and x, the 𝖢(u,v,w,x)-balancing algorithm operates as follows on the binary tree 𝒯 it receives as input:

  1. 1.

    if |n1|>u|n|+w and |n11|(1v)|n|x, perform the simple rotation of Figure 3;

  2. 2.

    if |n2|>u|n|+w and |n22|(1v)|n|x, perform the mirror image of that rotation;

  3. 3.

    if |n1|>u|n|+w and |n11|<(1v)|n|x, perform the double rotation of Figure 3;

  4. 4.

    if |n2|>u|n|+w and |n22|<(1v)|n|x, perform the mirror image of that rotation;

  5. 5.

    if max{|n1|,|n2|}u|n|+w, do not modify 𝒯.

Algorithm 6 (𝖦𝖢-balancing).

Given real numbers y and z, the 𝖦𝖢(y,z)-balancing algorithm operates as follows on the binary tree 𝒯 it receives as input:

  1. 1.

    if |n11|>y|n|+z, perform the simple rotation shown in Figure 3;

  2. 2.

    if |n22|>y|n|+z, perform the mirror image of that rotation;

  3. 3.

    if |n12|>y|n|+z, perform the double rotation shown in Figure 3;

  4. 4.

    if |n21|>y|n|+z, perform the mirror image of that rotation;

  5. 5.

    if max{|n11|,|n12|,|n21|,|n22|}y|n|+z, do not modify 𝒯.

Parameters wx and z can be seen as terms governing some error margin, that we will set equal to zero in Section 3 and will be non-zero in Section 4. Indeed, in the former case, we develop bottom-up updating algorithms, and thus have a perfect knowledge of 𝒯. By contrast, in the latter case, we develop top-down algorithms and thus cannot yet know where the leaf s should be inserted or deleted: depending on the answer, the tree 𝒯 may have different shapes, and we cannot anticipate which will be chosen.

The idea of the 𝖢-balancing algorithm is to either check that our tree root is u-C-balanced or to transform it into a v-C-balanced node; in the latter case, each affected node will also be made v-C-balanced, which will improve the C-balance of our tree root by at least uv without damaging the balances of other nodes too much. If we perform a simple rotation, n11 will be a new root child, and it will definitely not be too large, but its new sibling, of weight |n||n11|, might prevent our new root from being v-C-balanced; this undesirable case should occur when |n11|<(1v)|n|, which is why, in that case, we perform a double rotation instead of a simple one. More precisely, here is a result that can be stated about the 𝖢-balancing algorithm.

Lemma 7.

Let α be a real number such that 1/2α<3/4. Then, let α=19/24 and

α^=1+α(1α)(5α)2(2α1).

We have 1/2α^α. Moreover, when α^uα, the five cases in which the 𝖢(u,α^,0,0)-balancing algorithm consists are pairwise incompatible, and those rotations they trigger can always be performed; the algorithm itself transforms each 𝖦𝖢𝖡α[α]-tree 𝒯 whose root is not u-C-balanced into a 𝖦𝖢𝖡[α]-tree 𝒯 such that 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯) and whose affected nodes are α^-C-balanced.

Proof.

Below, we will use the following (in)equalities, which are all easy to check with any computer algebra system (whereas some are quite tedious to check by hand) whenever 1/2α<3/4. Except the first two inequalities, each of them is labelled and later reused to prove another inequality with the same label; equality (2.1 + 2.5) is used to prove both subsequent inequalities (2.1) and (2.5).

1/2α^;
αα<α^; (1.1)
(1α^)2<α^2(1α); (1.3)
(1α^)α2<α^(1α); (2.2)
(1α^)α2α<α^(1α); (2.4)
α^α;
(1α^)(α1+α^)<α^(1α); (1.2)
(1α^)2=α^(1α)(2α^1); (2.1 + 2.5)
(1α^)(1α)+αα<α^; (2.3)
1+(α21)α^<α^. (2.6)

This already proves the inequality 1/2α^α of Lemma 7.

Then, let |m| denote the weight of a node m in the tree 𝒯, and let |m| denote its weight in 𝒯. When these weights are equal, we will prefer using the notation |m| even when considering the weight of m in 𝒯. The root of 𝒯 is denoted by n, its left and right children are denoted by n1 and n2, and so on. Hence, when uα^, we have 2u|n|2|n||n1|+|n2|, which makes the inequalities |n1|>u|n| and |n2|>u|n| incompatible.

Proving that the desired rotations can indeed be performed amounts to showing that n and n1 are actual tree nodes (instead of spurious empty nodes of weight 1/2 or less) in case 1, and that n12 is also a tree node in case 3; cases 2 and 4 will be treated symmetrically. In cases 1 and 3, since |n1|>u|n|>|n|/2, the node n is an actual tree node, and cannot be a leaf, which means, as desired, that n and n1 are tree nodes; furthermore, |n1|>u(|n1|+1)>(|n1|+1)/2, i.e., |n1|>1+2>2, and therefore |n1|3 and |n|4. Then, in case 3, we also have |n12|=|n1||n11|>u|n|(1α^)|n|(21)|n|>1, which means that n12 is also a tree node.

It remains to prove that each affected node is α^-C-balanced: if the root of 𝒯 was not u-C-balanced, we will have 𝖻𝖺𝗅𝖢(𝒯)α^u𝖻𝖺𝗅𝖢(𝒯).

When α|n||n1|>u|n|α^|n| and |n11|(1α^)|n|, a simple rotation is performed, and

|n11| α|n1|αα|n|α^|n|=α^|n1|; (1.1)
(1α^)|n12| =(1α^)(|n1||n11|)(1α^)(α|n|(1α^)|n|)
α^(1α)|n|α^(|n||n1|)=α^|n2|; (1.2)
(1α^)|n2| =(1α^)(|n||n1|)(1α^)(1α^)|n|
α^(1α)α^|n|α^(1α)|n1|α^(|n1||n11|)=α^|n12|; (1.3)
|n| =|n||n11||n|(1α^)|n|=α^|n|=α^|n1|, (1.4)

which means precisely that n and n1 are α^-C-balanced in 𝒯.

Similarly, when α|n||n1|>u|n|α^|n| and |n11|<(1α^)|n|, a double rotation is performed, and

(1α^)|n11| (1α^)2|n|=α^(1α)(α^|n|(1α^)|n|)
α^(1α)(|n1||n11|)=α^(1α)|n12|
α^(|n12||n122|)=α^|n121|; (2.1)
(1α^)|n121| (1α^)α|n12|(1α^)α2|n1|
α^(1α)|n1|α^(|n1||n12|)=α^|n11|; (2.2)
|n1| =|n11|+|n121||n11|+α|n12|=(1α)|n11|+α|n1|
(1α)(1α^)|n|+αα|n|α^|n|=α^|n12|; (2.3)
(1α^)|n122| (1α^)α|n12|(1α^)α2|n1|(1α^)α2α|n|
α^(1α)|n|=α^|n2|; (2.4)
(1α^)|n2| =(1α^)(|n||n1|)(1α^)2|n|=α^(1α)(α^|n|(1α^)|n|)
α^(1α)(|n1||n11|)=α^(1α)|n12|
α^(|n12||n121|)=α^|n122|; (2.5)
|n| =|n||n1|+|n122||n||n1|+α|n12|
|n|+(α21)|n1||n|+(α21)α^|n|α^|n|=α^|n12|, (2.6)

which means precisely that nn1 and n12 are α^-C-balanced in 𝒯.

Finally, the cases where |n2|>u|n| are symmetrical, and therefore the conclusion of Lemma 7 is valid in these cases too.

Similarly, the 𝖦𝖢-balancing algorithm aims at making our tree root y-GC-balanced. If our tree is too unbalanced, we will either promote the responsible grandchild to being a child (if this grandchild was n11 or n22) with a simple rotation, or split it in two (if this grandchild was n12 or n21) with a double rotation. For adequate values of the parameters uv and y, each node affected by the 𝖦𝖢-balancing algorithm will be (v,y^)-balanced for some real number y^y (that does not need to be given as parameter), thus avoiding any damage to the GC-balance of affected nodes, and improving that of our tree root by at least yy^. Hence, here is a result that can be stated about the 𝖦𝖢-balancing algorithm; its proof is similar to that of Lemma 7, and can be found in appendix.

Lemma 8.

Let α and β be real numbers such that 1/2α<3/4 and (α)βα2. Then, let

α^=12α+6α2(1α)(5α)5(2α1)β^=1+α(1α)2+4α(α2β)2(1α2+β)α,

as well as δ=min{α^,β^/α}.

We have 1/4<β^β. Moreover, when α^uα and β^yβ, if we apply the 𝖦𝖢(y,0)-balancing algorithm on a 𝖦𝖢𝖡α^,1[α,β]-tree 𝒯, cases 1 to 5 are pairwise incompatible, and those rotations they trigger can always be performed; if the root of 𝒯 is not y-GC-balanced, the algorithm transforms 𝒯 into a 𝖦𝖢𝖡u,y[α,β]-tree 𝒯 such that 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯) and whose affected nodes are (δ,β^)-balanced.

Finally, these building blocks can be combined to ensure that a 𝖦𝖢𝖡[α,β]-tree whose root is suddenly slightly out of line will be rebalanced.

Algorithm 9 (𝖢𝖦𝖢-balancing).

Given real numbers uvy and y^, the 𝖢𝖦𝖢(u,v,y,y^)-balancing algorithm executes the following operations on the tree 𝒯 it receives as input:

  1. 1.

    the 𝖢(u,v,0,0)-balancing algorithm transforms 𝒯 into a tree 𝒯(1);

  2. 2.

    if 𝒯𝒯(1),

    • the 𝖦𝖢(y^,0)-balancing algorithm is applied (in parallel) to the left and right sub-trees of 𝒯(1), which transforms 𝒯(1) into a tree 𝒯(2);

    • then, the 𝖦𝖢(y^,0)-balancing algorithm is applied to 𝒯(2) itself;

  3. 3.

    otherwise, the 𝖦𝖢(y,0)-balancing algorithm is applied to 𝒯.

Proposition 10.

Let α and β be real numbers such that 1/2α<3/4 and (α)βα2; we recall that (α)=(1+4α1)/2. Then, let α=19/24,

α^=12α+6α2(1α)(5α)5(2α1) and β^=1+α(1α)2+4α(α2β)2(1α2+β)α.

When α^uα and β^yβ, the 𝖢𝖦𝖢(u,α^,y,β^)-balancing algorithm, when applied to a 𝖦𝖢𝖡α,1[α,β]-tree 𝒯 whose root is not (u,y)-balanced, transforms 𝒯 into a 𝖦𝖢𝖡α^,β^[α,β]-tree 𝒯 such that 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯) and whose affected nodes are all (α^,β^)-balanced.

Proof.

If the root of 𝒯 is (u,y)-balanced, the 𝖢𝖦𝖢-balancing algorithm does nothing. If this root is u-C-balanced but not y-GC-balanced, the 𝖦𝖢(y,0)-balancing algorithm is called; Lemma 8 ensures that every affected node will be (α^,β^)-balanced and that 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯).

Finally, if the root of 𝒯 it is not u-C-balanced, we will call the 𝖢(u,α^,0,0)-algorithm, obtaining a 𝖦𝖢𝖡[α,β]-tree 𝒯(1), whose root will be denoted by r. Nodes affected by this call are r and one or two of its children r1 and r2. The 𝖦𝖢(β^,0)-balancing algorithm is then applied to the 𝖦𝖢𝖡α,1[α,β]-trees rooted at r1 and r2; as a result, 𝒯(2) is a 𝖦𝖢𝖡α,1[α,β]-tree rooted at r, to which the 𝖦𝖢(β^,0)-balancing algorithm is applied once more. No tree to which the 𝖢- and 𝖦𝖢-balancing algorithms were applied saw its C-balance increase, and thus 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯).

Moreover, each node m affected by the 𝖢𝖦𝖢-balancing algorithm is either affected by some call to the 𝖦𝖢-balancing algorithm or the root of some tree that the 𝖦𝖢-balancing algorithm did not modify; the latter case may only concern nodes rr1 and r2. In both cases, m ends up being (α^,β^)-balanced, which completes the proof.

Now is the time when we actually describe our update algorithm, which we will apply to a 𝖦𝖢𝖡[α,β]-tree in which a leaf s is to be inserted, or from which s is to be deleted. We have two regimes: the first one concerns trees with 11 nodes or less, which we can just restructure in order to make them as balanced as possible, and the second regime concerns trees with 12 nodes or more.

Algorithm 11 (Bottom-up update).

Let αα^ and β be real numbers given in Proposition 10. Our updating algorithm executes the following operations on the tree 𝒯 it receives as input:

  • if 𝒯 contains 11 nodes or less, insert or delete s (if possible) and make the resulting tree a perfectly balanced tree, i.e., a tree in which the weights of two siblings differ by at most 1;

  • if 𝒯 contains 12 nodes or more,

    1. 1.

      recursively rebalance the (left or right) sub-tree in which s lies, and then

    2. 2.

      if the update turned out to be non-redundant, update the weight of the root of 𝒯, and then apply the 𝖢𝖦𝖢(α,α^,β,β^)-balancing algorithm to 𝒯.

Due to the recursive flavour of this algorithm, the rebalancing operations are performed bottom-up. We will see in Section 4 how to perform these operations top-down.

Theorem 12.

Let α and β be real numbers such that 1/2α<3/4 and (α)βα2. The tree obtained by using Algorithm 11 to insert a leaf in a 𝖦𝖢𝖡[α,β]-tree 𝒯 or delete a node from 𝒯 is also a 𝖦𝖢𝖡[α,β]-tree.

Proof.

Our proof is a variant of the proof of [5].

Let 𝒯 and 𝒯′′ be the trees obtained just after step 1 and step 2 of Algorithm 11, respectively. We will prove by induction on |𝒯| that 𝒯′′ is a 𝖦𝖢𝖡[α,β]-tree. The result being correct by construction when |𝒯|12, we assume that |𝒯|13.

Let 𝒯1 and 𝒯2 be the left and right sub-trees of 𝒯, and let t=|𝒯|t1=|𝒯1| and t2=|𝒯2|. Similarly, let 𝒯1 and 𝒯2 be the left and right sub-trees of 𝒯, and let t=|𝒯| t1=|𝒯1| and t2=|𝒯2|. Without loss of generality, we assume that the update was non-redundant, and that the tree 𝒯 was altered by adding s to 𝒯1 or by deleting s from 𝒯2. This means that either (t,t1,t2)=(t+1,t1+1,t2) or (t1,t2)=(t1,t1,t21). In both cases, the induction hypothesis ensures that 𝒯1 and 𝒯2 are 𝖦𝖢𝖡[α,β]-trees.

We prove now that the root of 𝒯 is α-C-balanced, thereby allowing us to use Proposition 10 and completing the induction. Indeed, let t=t1+t2 and t=t1+t2. Since 4t14αt<3t, we know that 4t13t1.

Thus, in case of an insertion, t1=t1+1(3t+3)/4=3t/4αt and t2=t2αtαt.

Similarly, in case of a deletion, t1=t1(3t1)/4=(3t+2)/4=αt(t12)/24αt, whereas t2=t21αt1αtαt.

4 Top-down algorithm

In this section, we propose a top-down algorithm for inserting an element into (or deleting an element from) a weight-balanced tree. This algorithm is valid whenever 1/2α<3/4 and (α)βα2, thereby completing the algorithm of Lai and Wood [14], which works only when 3/4α9/11 and β=α2. This new algorithm is inspired by theirs: we wish to perform a top-down restructuring pass while adjusting weight information. If the update is redundant, a second top-down pass will be needed to update this information, but no further restructuring will be needed.

More precisely, we aim at having a top-down algorithm that requires considering only a constant number of tree nodes at each step; this number may not depend on the parameters such as α and β. Moreover, we still wish, by using this algorithm, to perform only a constant number of rotations per update; this number may depend on αβ and other parameters. Consequently, the idea of the algorithm, which will be made more precise in Algorithm 16, is as follows:

  1. 1.

    If the tree is large enough, we may rebalance it to make sure it will remain balanced even if we recursively update one of its children. Like in Section 3, rotations performed in this phase should be so efficient that only an amortised constant number of rotations per update will be useful.

  2. 2.

    If the tree is small enough, we make it as balanced as possible, the notion of being top-down being void in this case.

A key object towards defining and proving the correctness of our algorithm is the notion of robust grandchildren-balanced trees.

Definition 13.

Let n be a node of a binary tree 𝒯, let n1 and n2 be its children, and let n11n12n21 and n22 be its grandchildren. Given real numbers α and β, we say that n is robustly α-C-balanced when

max{|n1|,|n2|}+1α(|n|+1) and max{|n1|,|n2|}α(|n|1);

that n is robustly β-GC-balanced when

max{|n11|,|n12|,|n21|,|n22|}+1β(|n|+1) and max{|n11|,|n12|,|n21|,|n22|}β(|n|1);

and that n is robustly (α,β)-balanced when n is robustly α-C-balanced and robustly β-GC-balanced.

Finally, we say that a tree 𝒯 is a 𝖱𝖦𝖢𝖡x,y[α,β]-tree when its root is robustly (x,y)-balanced and its other nodes are (α,β)-balanced. For the sake of concision, we simply say that 𝒯 is a 𝖱𝖦𝖢𝖡[α,β]-tree or a 𝖱𝖦𝖢𝖡[α]-tree when (x,y)=(α,β) or (x,y,β)=(α,1,1), respectively.

Inequalities |ni|+1α(|n|+1) and |ni|α(|n|1) ensure that, even if a leaf is inserted into or deleted from 𝒯, the node n will remain α-C-balanced. Inequalities |nij|+1β(|n|+1) and |nij|β(|n|1) serve the same purpose, but for ensuring that n remains β-GC-balanced. Finally, each tree node is robustly (1,1)-balanced.

Setting 𝖱(t)=min{t,t1} and observing that min{t(|n|+1)1,t(|n|1)}=t|n|+𝖱(t) for all real numbers t provides us with a more succinct characterisation of (α,β)-balanced nodes, which will require giving non-zero values to the parameters wx and z: the node n is robustly (α,β)-balanced if and only if max{|n1|,|n2|}α|n|+R(α) and max{|n11|,|n12|,|n21|,|n22|}β|n|+𝖱(β).

We can now group several calls to the 𝖢- and 𝖦𝖢-balancing algorithms into a so-called 𝖱𝖢𝖦𝖢-balancing algorithm, whose aim is to make a tree root robustly balanced.

Algorithm 14 (𝖱𝖢𝖦𝖢-balancing).

Given real numbers uvy and y^, the 𝖱𝖢𝖦𝖢(u,v,y,y^)-balancing algorithm executes the following operations, in this order, on the tree 𝒯 it receives as input, thus transforming it into a tree 𝒯:

  1. 1.

    the 𝖢(u,v,𝖱(u),𝖱(v))-balancing algorithm transforms 𝒯 into a tree 𝒯(1);

  2. 2.

    if 𝒯𝒯(1),

    • the 𝖢𝖦𝖢(v,v,y^,y^)-balancing algorithm is applied (in parallel) to the left and right sub-trees of 𝒯(1), which transforms 𝒯(1) into a tree 𝒯(2);

    • the 𝖦𝖢(y^,𝖱(y^))-balancing algorithm transforms 𝒯(2) into a tree 𝒯(3);

    • the 𝖢𝖦𝖢(v,v,y^,y^)-balancing algorithm is applied (in parallel) to the left and right sub-trees of 𝒯(3), which yields the tree 𝒯;

  3. 3.

    otherwise, the 𝖦𝖢(y,𝖱(y))-balancing algorithm transforms 𝒯 into a tree 𝒯(3), and then,

    • if 𝒯𝒯(3), the 𝖢𝖦𝖢(v,v,y^,y^)-balancing algorithm is applied (in parallel) to the left and right sub-trees of 𝒯(3), which yields the tree 𝒯;

    • otherwise, 𝒯=𝒯.

The idea is similar to that of the plain 𝖢- and 𝖦𝖢-balancing algorithm, but we wish to transform a tree whose root is not robustly (u,y)-balanced into a tree 𝒯 whose root will be robustly (v,y^)-balanced. Calling the 𝖢(u,v,𝖱(u),𝖱(v))-, 𝖦𝖢(y^,𝖱(y^))- and 𝖦𝖢(y,𝖱(y))-balancing algorithms achieves this goal, while possibly damaging the balance of the children of the root of 𝒯; we solve this issue by the 𝖢𝖦𝖢-balancing algorithm to the sub-trees rooted at these children.

Proposition 15.

Let αα^β and β^ be real numbers given in Proposition 10. The 𝖱𝖢𝖦𝖢(α,α^,β,β^)-balancing algorithm transforms each 𝖦𝖢𝖡α,1[α,β]-tree 𝒯 with 30 nodes or more and whose root is not robustly (α,β)-balanced into a 𝖱𝖦𝖢𝖡[α,β]-tree 𝒯 such that 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯) and whose affected nodes are all (α^,β^)-balanced.

We finally describe our top-down updating algorithm, which we will apply to a 𝖦𝖢𝖡[α,β]-tree in which a leaf s is to be inserted, or from which s is to be deleted. Like the algorithm of Lai and Wood [14], this algorithm is not purely top-down, because (i) if we wanted to delete an internal node a, as illustrated in Figure 2, it requires maintaining a pointer to a, whose key will later be replaced by another node key, and (ii) if the update turned out to be redundant, a second top-down pass will be required to cancel every weight update we performed. Alternative representations of weight-balanced trees or 𝖦𝖢𝖡[α,β]-trees allow omitting this second pass: instead of storing the weight |n| at each node n, we should store one of the weights |n1| or |n2| as well as a bit indicating which weight we stored.

Algorithm 16 (Top-down update).

Let αα^β and β^ be real numbers given in Proposition 10. Our updating algorithm executes the following operations on the tree 𝒯 it receives as input:

  • if 𝒯 contains 29 nodes or less, insert or delete s (if possible) and make the resulting tree a perfectly balanced tree, i.e., a tree in which the weights of two siblings differ by at most 1;

  • if 𝒯 contains 30 nodes or more,

    1. 1.

      apply the 𝖱𝖢𝖦𝖢(α,α^,β,β^)-balancing algorithm to 𝒯, then

    2. 2.

      update the weight of the root of 𝒯, and finally

    3. 3.

      recursively update the (left or right) sub-tree of 𝒯 that contains s.

Theorem 17.

Let α and β be real numbers such that 1/2α<3/4 and (α)βα2. The tree obtained by using Algorithm 16 to insert a leaf in a 𝖦𝖢𝖡[α,β]-tree 𝒯 or delete a node from 𝒯 is also a 𝖦𝖢𝖡[α,β]-tree.

5 Complexity analysis

In this final section, we prove that Algorithms 11 and 16 proposed above perform an amortised constant number of rotations per attempted insertion or deletion.

The idea consists in evaluating the sums of child- and grandchild-balances of all tree nodes. Indeed, inserting or deleting a leaf s will slightly damage the child- and grandchild balances of the ancestors of s, which may increase the sum of these balances by no more than a constant. In the opposite direction, whenever the 𝖢𝖦𝖢- or 𝖱𝖢𝖦𝖢-balancing algorithm changes a sub-tree 𝒯, some node will see its child-balance decrease by approximately αα^, or its grandchild-balance decrease by approximately ββ^; other node balances might be damaged in the process, but not to the point of exceeding (α^,β^). Thus, not too many changes may be performed.

These ideas lead to the following result.

Theorem 18.

Let α and β be real numbers such that 1/2<α<3/4 and (α)<βα2. Let η and ε be given by η=α1/2 and ε=min{β(α),α2β} if (α)<β<α2, or ε=1 if β=α2.

Then, let 𝒯0,𝒯1,,𝒯k be 𝖦𝖢𝖡[α,β]-trees defined as follows: 𝒯0 is the empty tree and, for all 0, the tree 𝒯+1 is obtained by inserting a leaf in 𝒯 or deleting a leaf from 𝒯, using either Algorithm 11 or 16 to do so. Gradually transforming 𝒯0 into 𝒯k via such steps requires only 𝒪(k/η+k/ε) rotations.

Proof outline.

Each node n is given a real-valued counter 𝖼(n) that receives the value 0 when n is created (i.e., inserted in a tree) or affected by a rotation, and is increased by 2/|n| when a descendant of n is about to be created or deleted.

Finally, let 𝖢 be the sum of all these counters 𝖼(n), and let δ=min{αα^,ββ^,1/62}/2. One can prove that the sum 𝖢 increases by no more than 16 when a leaf is inserted or deleted, and decreases by at least δ when the 𝖢𝖦𝖢- or 𝖱𝖢𝖦𝖢-balancing algorithms trigger a rotation and the tree contains at least 32 nodes; if the tree contains fewer than 32 nodes, rotations do not make 𝖢 increase anyway.

Since the sum 𝖢 is initially zero, and terminates with a non-negative value, no more than 𝒪(k/δ) rotations are performed on trees with 32 nodes or more, and no more than 𝒪(k) rotations can be performed on trees with fewer than 32 nodes.

In particular, let us focus on some approach to the critical point (α𝖼,β𝖼)=(1/2,(1/2)). When setting α=α𝖼+x and β=β𝖼+x, and provided that 0<x<1/10, the inequalities 1/2<α<3/4 and (α)<β<α2 are valid. Moreover,

  • Theorem 2 states that 𝖦𝖢𝖡[α,β]-trees with 𝐍 nodes are of height

    h2log2(𝐍+1)/log2(β𝖼)+7log2(𝐍+1)x;
  • Theorem 3 states that 𝖦𝖢𝖡[α,β]-trees with 𝐍 nodes are of external path length

    λ(𝐍+1)log2(𝐍+1)/Δ𝖼+2(𝐍+1)log2(𝐍+1)x,

    where Δ𝖼=(𝖧2(α𝖼)+α𝖼𝖧2(β𝖼/α𝖼))/(1+α𝖼);

  • Theorem 18 states that inserting or deleting k leaves in 𝖦𝖢𝖡[α,β]-trees requires 𝒪(k/x) rotations. More precisely, digging in the constants hidden in the 𝒪 notation yields a (crude) upper bound of less than 30(1+28/x)k rotations.

References

  • [1] Georgii Maksimovich Adel’son-Velskii and Evgenii Mikhailovich Landis. An algorithm for organization of information. Doklady Akademii Nauk, 146(2):263–266, 1962.
  • [2] Mahdi Amani, Kevin Lai, and Robert Tarjan. Amortized rotation cost in AVL trees. Information Processing Letters, 116(5):327–330, 2016. doi:10.1016/j.ipl.2015.12.009.
  • [3] Lukas Barth and Dorothea Wagner. Engineering top-down weight-balanced trees. In 22nd Workshop on Algorithm Engineering and Experiments (ALENEX), pages 161–174. SIAM, 2020. doi:10.1137/1.9781611976007.13.
  • [4] Rudolf Bayer and Edward McCreight. Organization and maintenance of large ordered indices. Acta Informatica, 1:173–189, 1972. doi:10.1007/BF00288683.
  • [5] Norbert Blum and Kurt Mehlhorn. On the average number of rebalancing operations in weight-balanced trees. Theoretical Computer Science, 11:303–320, 1980. doi:10.1016/0304-3975(80)90018-3.
  • [6] Gerth Stølting Brodal and Allan Grønlund Jørgensen. Data structures for range median queries. In International Symposium on Algorithms and Computation, pages 822–831. Springer, 2009. doi:10.1007/978-3-642-10631-6_83.
  • [7] Helen Cameron and Derick Wood. A note on the path length of red-black trees. Information Processing Letters, 42(5):287–292, 1992. doi:10.1016/0020-0190(92)90038-W.
  • [8] Rolf Fagerberg, Rune Jensen, and Kim Larsen. Search trees with relaxed balance and near-optimal height. In 7th International Workshop on Algorithms and Data Structures (WADS), volume 2125 of Lecture Notes in Computer Science, pages 414–425. Springer, 2001. doi:10.1007/3-540-44634-6_38.
  • [9] Leonidas Guibas and Robert Sedgewick. A dichromatic framework for balanced trees. In 19th Annual Symposium on Foundations of Computer Science (FOCS), pages 8–21. IEEE Computer Society, 1978. doi:10.1109/SFCS.1978.3.
  • [10] Bernhard Haeupler, Siddhartha Sen, and Robert Tarjan. Rank-balanced trees. ACM Transactions on Algorithms (TALG), 11(4):1–26, 2015. doi:10.1145/2689412.
  • [11] Thomas Hibbard. Some combinatorial properties of certain trees with applications to searching and sorting. Journal of the ACM, 9(1):13–28, 1962. doi:10.1145/321105.321108.
  • [12] Vincent Jugé. Grand-children weight-balanced binary search trees, 2024. doi:10.48550/arXiv.2410.08825.
  • [13] Rolf Klein and Derick Wood. A tight upper bound for the path length of AVL trees. Theoretical Computer Science, 72(2&3):251–264, 1990. doi:10.1016/0304-3975(90)90037-I.
  • [14] Tony Lai and Derick Wood. A top-down updating algorithm for weight-balanced trees. International Journal of Foundations of Computer Science, 4(4):309–324, 1993. doi:10.1142/S0129054193000201.
  • [15] Daan Leijen. Set implementation in Haskell 2010.
    hackage.haskell.org/package/containers-0.7/docs/Data-Set.html, 2002.
  • [16] Daan Leijen and Andriy Palamarchuk. Map implementation in Haskell 2010.
    hackage.haskell.org/package/containers-0.7/docs/Data-Map.html, 2008.
  • [17] Jürg Nievergelt and Edward Reingold. Binary search trees of bounded balance. In 4th Annual ACM Symposium on Theory of Computing (STOC), pages 137–142. ACM, 1972. doi:10.1145/800152.804906.
  • [18] Jürg Nievergelt and C. K. Wong. Upper bounds for the total path length of binary trees. Journal of the ACM, 20(1):1–6, 1973. doi:10.1145/321738.321739.
  • [19] Daniel Sleator and Robert Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32(3):652–686, 1985. doi:10.1145/3828.3835.

Appendix A Missing proofs of Sections 2 and 3

Theorem 3. [Restated, see original statement.]

When (α,β)𝒟, each 𝖦𝖢𝖡[α,β]-tree with 𝐍 nodes is of external path length

λ(𝐍+1)log2(𝐍+1)/Δ,

where Δ=(𝖧2(α)+α𝖧2(β/α))/(1+α), and 𝖧2(x)=xlog2(x)(1x)log2(1x) is Shannon’s binary entropy.

Proof.

Let λ(𝒯) be the sum of the weights of the nodes of a tree 𝒯: we prove by induction on |𝒯| that λ(𝒯)|𝒯|log2(|𝒯|)/Δ.

When |𝒯|=1, this inequality rewrites as 00. When |𝒯|=2, it rewrites as Δ1, which follows from the fact that 𝖧2(x)1 whenever x[0,1]. Therefore, we assume that |𝒯|3.

Let n be the root of 𝒯, and let 𝒯1 and 𝒯2 be its left and right sub-trees. Without loss of generality, we assume that |𝒯1||𝒯2|. Thus, |𝒯1||𝒯|/2>1, and 𝒯1 is non-empty. Let n1 be its root, and let 𝒯11 and 𝒯12 be the left and right sub-trees of n1. Once again, without loss of generality, we assume that |𝒯11||T12|.

Let t=|𝒯|x=|𝒯1|/|𝒯| and y=|𝒯11|/|𝒯1|. The induction hypothesis states now that

Δλ(𝒯)/t =Δ(x+1)+Δλ(𝒯11)/t+Δλ(𝒯12)/t+Δλ(𝒯2)/t
Δ(x+1)+xylog2(txy)+x(1y)log2(tx(1y))+(1x)log2(t(1x))
Δ(x+1)+log2(t)x𝖧2(y)𝖧2(x),

and it remains to prove that the quantity 𝖥α,β(x,y)=x𝖧2(y)+𝖧2(x)Δ(x+1) is non-negative.

First, let γ=β/α. If γxα, since 𝖧2 is decreasing on the interval [1/2,1] and 1/2yβ/x, we know that 𝖥α,β(x,y)𝖥α,β(x,β/x). Moreover, the function 𝖦:x𝖥α,β(x,β/x) is concave on [γ,α], because its second derivative is

𝖦′′(x)=1β(xβ)(1x)ln(2)<0

whenever βγ<x<α1. Observing that 𝖦(α)=𝖥α,β(α,γ)=0 and that

𝖦(γ) =𝖥α,β(γ,α)=(β𝖧2(α)+𝖧2(γ))(γ+1)(𝖧2(α)+α𝖧2(γ))/(α+1)
=(1β)(𝖧2(γ)𝖧2(α))/(α+1)0,

we conclude that 𝖥α,β(x,y)𝖦(x)min{𝖦(α),𝖦(γ)}0 whenever γxα. Finally, if 1/2xγ, and since 1/2yα, we have 𝖥α,β(x,y)𝖥α,x(x,y)𝖥α,x(x,α)0.

Proposition 4. [Restated, see original statement.]

Let (α,β) be a pair belonging to 𝒟′′, and let 𝐍0 be an integer. There exists a 𝖦𝖢𝖡[α,β]-tree with 𝐍 nodes, of height h2log2(𝐍+1)/log2(β)7 and external path length λ(𝐍+1)log2(𝐍+1)/Δ4𝐍.

Proof.

Let γ=β/α. For each integer s1, we define inductively a tree 𝒯(s) of weight s as follows:

  • if s2𝒯(s) is the only tree of weight s;

  • if s3𝒯(s) is the tree whose left child is 𝒯(s1) and whose right child’s children are 𝒯(s21) and 𝒯(s22), where s1=sαss21=γαs and s22=ss1s21.

We present in Figure 4 the trees 𝒯(s) obtained when 1s6. Since 2/3<γα<3/4, these trees do not depend on the values of α and γ (or β).

Figure 4: Trees 𝒯(1) to 𝒯(6). Nodes are labelled by their weight; empty trees are denoted by .

We first prove by induction on s that 𝒯(s) is a 𝖦𝖢𝖡[α,β]-tree. This is visibly true when s4, hence we assume that s5. Let s2=αs be the weight of the right child of 𝒯(s): it suffices to prove that s1γss2αss21γs2 and s22γs2. The inequalities s2αs and s21γs2 are immediate, and s22s/33, thereby proving that:

  • s1(1α+1/s)s(12/3+1/5)s<2s/3<γs;

  • s22(1γ+1/s2)s2(12/3+1/3)s2=2s2/3<γs2.

Now, let h(s) be the height of 𝒯(s) and let λ(s) be the sum of the weights of the nodes of 𝒯(s). We will prove inductively that h(s)h+(s+κ) and λ(s)λ+(s+κ)+v for all s1, where we set h+(x)=2logβ(x)7λ+(x)=xlog2(x)/Δ4xκ=(γ+1)/(1β) and v=(1+(α+1)κ)/2.

First, we observe, when (α,β)𝒟, that 4/5𝖧2(3/4)Δ𝖧2(2/3)log2(5e)/4, 3κ4 and 3v4. We will use these inequalities several times below.

For example, κ+15(4/3)6β3 and κ+26(4/3)7β7/2, so that h(s)=s2h+(s+κ) when s=1 or s=2.

Then, if s3, observing that s21+κβsγ1=β(s+κ) and using the induction hypothesis proves that h(s)h(s21)+2h+(s21+κ)+222logβ(β(s+κ))7=h+(s+κ).

Second, let λ(s) be the smallest possible sum of the weights of the nodes of a binary tree with weight s. By construction, λ(s)λ(s), and λ shines as the non-decreasing convex function defined by λ(1)=0 and λ(s)=s+λ(s/2)+λ(s/2) for all s2. Consequently, observing that

λ+(s+κ)+v5(s+κ)log2(s+κ)/44(s+κ)+v5(s+4)log2(s+4)/44(s+3)+4

whenever s1 suffices to check (by computer) that λ(s)λ(s)slog2(s)λ+(s+κ) for all integers s26.

Then, if s27, observe that α(1α)(s+κ)2/3×1/4×305. It follows that:

  • s1+κ(1α)s+κ(1α)(s+κ)5;

  • s21+καγs+κγ1=αγ(s+κ)5;

  • s22+κα(1γ)s+κ+γ1α(1γ)(s+κ)5.

Moreover, the function λ+ is increasing on the interval (16Δ/e,+), and 16Δ/e5. In addition, the equality λ+(x)=(α+1)x+λ+((1α)x)+λ+(αγx)+λ+(α(1γ)x is valid for all x1. Thus, the induction hypothesis proves that

λ(s) s+s2+λ(s1)+λ(s21)+λ(s22)
s+s2+λ+(s1+κ)+λ+(s21+κ)+λ+(s22+κ)+3v
(α+1)s+λ+((1α)(s+κ))+λ+(αγ(s+κ))+λ+(α(1γ)(s+κ))+(3v1)
λ+(s+κ)(α+1)κ+(3v1)=λ+(s+κ)+v.

Finally, we check by hand that λ(s)slog2(s)5slog2(s)/44s+4slog2(s)/Δ4s+4 when 1s8, whereas, since λ+ is increasing on [5,+),

λ(s) λ+(s+κ)+vλ+(s+3)+3=(s+3)log2(s+3)/Δ4s9
(slog2(s)+3log2(s+3))/Δ4s9
slog2(s)/Δ+15log2(12)/44s9slog2(s)/Δ4s+4

when s9.

Lemma 8. [Restated, see original statement.]

Let α and β be real numbers such that 1/2α<3/4 and (α)βα2. Then, let

α^=12α+6α2(1α)(5α)5(2α1)β^=1+α(1α)2+4α(α2β)2(1α2+β)α,

as well as δ=min{α^,β^/α}.

We have 1/4<β^β. Moreover, when α^uα and β^yβ, if we apply the 𝖦𝖢(y,0)-balancing algorithm on a 𝖦𝖢𝖡α^,1[α,β]-tree 𝒯, cases 1 to 5 are pairwise incompatible, and those rotations they trigger can always be performed; if the root of 𝒯 is not y-GC-balanced, the algorithm transforms 𝒯 into a 𝖦𝖢𝖡u,y[α,β]-tree 𝒯 such that 𝖻𝖺𝗅𝖢(𝒯)𝖻𝖺𝗅𝖢(𝒯) and whose affected nodes are (δ,β^)-balanced.

Proof.

Below, we will use the following inequalities, which are all easy to check with any computer algebra system (whereas some look too frightening to check by hand) whenever 1/2α<3/4 and (α)βα2. Except the first three inequalities, each of them is labelled and later reused to prove another inequality with the same label; the first two inequalities already prove that 1/4<β^β, as stated in Lemma 8.

1/4<β^;
α<2β^;
(1δ)(αβ^)<δ(1α); (3.2)
1β^<δ; (3.4)
(1δ)β<δ(1α); (4.2)
(1δ)αβ<δ(1α); (4.4)
1δ+(δ+βδ1)αδβ^; (4.5b)
β^β;
α2<δ; (3.1)
(1δ)(1γ^)δγ^(1α); (3.3)
(1δ)(αβ^)<δ(β^αβ); (4.1)
αβ^+αβ<δ; (4.3)
1δ(1+β); (4.5a)
1+(β1)γ^<δ. (4.6)

Then, let |m| denote the weight of a node u in the tree 𝒯, and let |m| denote its weight in 𝒯. When these weights are equal, we will prefer using the notation |m| even when considering the weight of m in 𝒯. The root of 𝒯 is denoted by n, its left and right children are denoted by n1 and n2, and so on. Finally, we set γ^=β^/α, so that δ=min{α^,γ^}.

Since 2βα and n is α-C-balanced, observing that 2β|n|α|n||n1|=|n11|+|n12| proves that the inequalities |n11|>β|n| and |n12|>β|n| are incompatible. Similarly, the nodes n1 and n2 are α-C-balanced, and thus observing that 2β|n|α|n|=α|n1|+α|n2||n11|+|n21| proves that the inequalities |n11|>β|n| and |n21|>β|n| are incompatible. Finally, for symmetry reasons, all four inequalities |nij|>β|n| are incompatible: this makes our description of the algorithm unambiguous, as announced.

Then, proving that the desired rotations can indeed be performed simply requires showing that, if some case 1 to 4 happens, the grandchild nij be a grandchild of n whose weight |nij| is maximal is an actual tree node instead of an empty node of weight 1/2 or less. If this were the case, we would have |ni|=2|nij|1, and ni would be either an empty node or a leaf; in both cases, we would have |n|2|ni|, i.e., |n|4|nij|, contradicting the inequality |nij|>y|n|β^|n|>|n|/4.

It remains to prove that each affected node m is δ-C-balanced; its children being α-C-balanced, it will then be β^-GC-balanced. Furthermore, if a rotation was triggered, n was not β-GC-balanced but its children were α-C-balanced, which proves that 𝖻𝖺𝗅𝖢(𝒯)>β/αδ^𝖻𝖺𝗅𝖢(𝒯).

When |n11|>w|n|β^|n|, a simple rotation is performed, and

|n11| α|n1|α2|n|δ|n|=δ|n1|; (3.1)
(1δ)|n12| =(1δ)(|n1||n11|)(1δ)(α|n|β^|n|)
δ(1α)|n|δ(|n||n1|)=δ|n2|; (3.2)
(1δ)|n2| =(1δ)(|n||n1|)(1δ)(|n||n11|/α)(1δ)(1γ^)|n|
δγ^(1α)|n|δ(1/α1)|n11|<δ(|n1||n11|)=δ|n12|; (3.3)
|n| =|n||n11|(1β^)|n|δ|n|=δ|n1|, (3.4)

which means precisely that n and n1 are δ-C-balanced in 𝒯.

Similarly, when |n12|>w|n|β^|n|, a double rotation is performed, and

(1δ)|n11| =(1δ)(|n1||n12|)(1δ)(α|n|β^|n|)
δ(β^|n|αβ|n|)δ(|n12|β|n1|)δ(|n12||n122|)=δ|n121|; (4.1)
(1δ)|n121| (1δ)β|n1|δ(|n1|α|n1|)δ(|n1||n12|)=δ|n11|; (4.2)
|n1| =|n1||n12|+|n121||n1||n12|+β|n1|
(αβ^+αβ)|n|δ|n|=δ|n12|; (4.3)
(1δ)|n122| (1δ)β|n1|(1δ)αβ|n|δ(1α)|n|δ(|n||n1|)=δ|n2|; (4.4)
(1δ)|n2| (1δ)|n2|+βδ|n1|δ|n121|=(1δ)|n|+(δ+βδ1)|n1|δ|n121|
(1δ)|n|+(δ+βδ1)α|n|δ|n121| (4.5a)
δβ^|n|δ|n121|δ|n12|δ|n121|=δ|n122|; (4.5b)
|n| =|n||n1|+|n122||n|+(β1)|n1|
|n|+(β1)|n12|/α|n|+(β1)γ^|n|δ|n|=δ|n12|, (4.6)

which means precisely that nn1 and n12 are δ-C-balanced in 𝒯.

Finally, the cases where |n21|>w|n| or |n22|>w|n| are symmetrical, and therefore the conclusion of Lemma 8 is valid in these cases too.