Abstract 1 Introduction 2 Related Work 3 Preliminaries 4 Properties of Binary Trees 5 Enumeration of Full Binary Trees 6 Enumeration of Binary Trees 7 Extension to 𝜶-ary trees 8 Conclusion References Appendix

Enumeration of Ordered Trees with Leaf Restrictions

Yasuaki Kobayashi ORCID Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan Dominik Köppl111Corresponding author ORCID Department of Informatics, Yamanashi University, Japan Yasuko Matsui ORCID Department of Mathematical Sciences, Tokai University, Japan Hirotaka Ono111Corresponding author ORCID Department of Mathematical Informatics, Nagoya University, Japan Toshiki Saitoh ORCID School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Fukuoka, Japan Yushi Uno Graduate School of Informatics, Osaka Metropolitan University, Japan
Abstract

An α-ary tree for a constant α2 is a rooted tree in which each node has at most α children. A node having no children is called a leaf. For a given rooted tree and a node v, the number of edges from the root to v is called the depth of v. We call a vector 𝒘=(w1,w2,,wd) of nonnegative integers an (α-ary) distribution if there is an α-ary tree T such that the number of leaves at each depth i[1..d] in T is wi. Although not every vector of nonnegative integers is a distribution, a distribution can be associated with many α-ary trees. In this paper, we present an algorithm to enumerate all α-ary trees for a given distribution. Our algorithm reports the first tree in O(d+i=1dwi) time, and then each subsequent α-ary tree in O(maxi=1dwi) time by representing each tree as the difference from the previous one. The algorithm can be restricted to computing all trees that are full, i.e., trees whose nodes have exactly α or no children.

Keywords and phrases:
binary trees, ordered trees, rooted trees, enumeration algorithm, constant-time delay
Category:
Research
Funding:
Yasuaki Kobayashi: JSPS KAKENHI Grant Numbers JP23K28034, JP24H00686, and JP24H00697.
Dominik Köppl: JSPS KAKENHI Grant Numbers JP23H04378 and JP25K21150.
Yasuko Matsui: JSPS KAKENHI Grant Numbers JP20H05964 and JP20K04973.
Hirotaka Ono: JSPS KAKENHI Grant Numbers JP20H05967, JP22H00513, JP24K02898, and JP25K03077, and JST CRONOS Grant Number JPMJCS24K2.
Toshiki Saitoh: JSPS KAKENHI Grant Number JP21H05857.
Yushi Uno: JSPS KAKENHI Grant Numbers JP20H05964 and JP21K11757.
Copyright and License:
[Uncaptioned image] © Yasuaki Kobayashi, Dominik Köppl, Yasuko Matsui, Hirotaka Ono, Toshiki Saitoh, and Yushi Uno; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Enumeration
; Mathematics of computing Trees ; Theory of computation
Editors:
Alessio Conte, Andrea Marino, Giovanna Rosone, and Jeffrey Scott Vitter

1 Introduction

It is our pleasure to dedicate this work to Roberto Grossi, whose broad enthusiasm for algorithms and data structure has significantly shaped the landscape of enumeration algorithms [9]. Over the past decades, his contributions made significant impact on structural and algorithmic aspects of trees, strings, and graphs, inspiring both theoretical advances and practical applications.

In this spirit, we turn our attention to the enumeration of ordered trees, a topic rooted in combinatorics with strong connections to data structures, formal languages, and algorithmic analysis. Ordered trees play a fundamental role in various domains, hierarchical data representations, and the analysis of recursive algorithms are major examples. Enumerating ordered trees uncovers rich combinatorial patterns, with which we hope to offer a tiny tribute to Roberto’s enduring influence, exploring a topic in line with his passion for enumerating combinatorial structures of mathematical beauty.

We here tackle a problem whose restricted version on binary trees has been posed in 2023 as an open problem presented at IWOCA 2023: The question is to efficiently enumerate all rooted full binary trees whose number of leaves at each depth is given as an input. This problem spawned from the research of prefix-free codes in coding theory.

Here, we tackle this enumeration problem in even more general settings: binary (not necessarily full) trees and α-ary trees of a fixed depth d for a constant α2. For any of these tree types, our algorithm outputs the first solution in O(d+n) time. Subsequently, it outputs the difference between two successive solutions in O(m) time with constant delay, where n is the number of leaves and m is the maximum number of leaves at any depth. We here define delay as the time between the end of the previous output and the start of the subsequent output.

2 Related Work

With respect to the enumeration of rooted ordered trees, we are aware of an algorithm with constant delay to output all trees [3], or with restrictions on a specific diameter [12] or on a fixed number of internal nodes and leaves [17]. Given the number m of internal nodes, Zaks [18] gave an enumeration algorithm for all α-ary trees having m internal nodes. This algorithm reports the trees in a lexicographical order based on a bit encoding of the trees. Recently, an algorithm [5] enumerating AVL trees has been proposed. In addition, closed formulas have been studied for counting various properties in rooted ordered trees of unbounded degree [6, 7].

For the enumeration of full binary trees with n leaves, we propose a 2n-bit encoding of a full binary tree. We use this encoding in the same way as Ruskey and Proskurowski [13] for the enumeration of binary trees by using Gray codes for enumerating bit encodings. Another bit encoding variant has been proposed by Baba et al. [1], who however used a depth-first encoding based on DFUDS [2], while our encoding is level-wise in the spirit of LOUDS [10].

3 Preliminaries

Given a fixed integer α2, an α-ary tree is an ordered rooted tree whose nodes each have at most α children, which are ordered. In this paper, we only consider trees that have at least two nodes. Except for the root, the parent of each node is uniquely determined. There is a left-to-right ordering of the child nodes with respect to the parent. A node is called a leaf if it has no child, otherwise it is called an internal node. For a given rooted tree and a node v, the number of edges from the root to v is called the depth of v, which we denote by depth(v). In particular, the depth of the root is zero. The height of the tree is the depth of the leaf with the largest depth. A full α-ary tree is an α-ary tree whose nodes are either leaves or internal nodes with α children. A special case is α=2, for which we name a 2-ary tree a binary tree.

In what follows, we start with the enumeration of full binary trees, which is the easiest case. Then we extend our techniques to arbitrary binary trees, and finally consider α-ary trees.

4 Properties of Binary Trees

Our input is a so-called binary (tree leaf) distribution, a term coined by Buro [4, Lemma 2.3]. It is defined as follows. We call an integer vector 𝒘=(w1,,wd) a binary distribution if there is a binary tree T with depth d such that T has wi leaves with depth i for every i[1..d]. We say that the binary (tree leaf) distribution of T is 𝒘 – there can be many trees that have the same binary distribution. A binary distribution is also called a full-binary distribution if there is a full binary tree with the same property. Buro [4, Thm. 2.4] showed that the full-binary distribution of a full binary tree with maximum path length is unique. Here we study the connection of arbitrary binary distributions and their corresponding trees. In detail, the first problem we want to tackle in this paper is as follows.

Problem 1.

Given a (full-)binary distribution 𝐰, enumerate all (full) binary trees whose (full-)binary distribution is 𝐰.

We first characterize the necessary conditions for an integer vector to be a (full) binary distribution. For that, we make use of Kraft’s inequality [11].

Lemma 2 (Kraft’s inequality).

For a binary tree with leaves v1,,vn, it holds that i=1n1/2depth(vi)1.

Let f(𝒘)=i=1dwi/2i be the weight of a vector 𝒘 of nonnegative integers. Then f(𝒘)1 is a necessary condition according to Kraft’s inequality. We show that it is also sufficient.

Lemma 3.

A vector of nonnegative integers 𝐰 is a binary distribution if and only if f(𝐰)1.

Proof by induction on the depth 𝒅..

For d=1, the binary tree can have only one or two leaves, i.e., 𝒘=(1) or 𝒘=(2), and in any case w12 holds. For the induction hypothesis, assume that our claim holds for a depth d. Given a vector of nonnegative integers 𝒘=(w1,,wd+1) with f(𝒘)1, we show that there exists a binary tree whose binary distribution is 𝒘. For that, we define a vector 𝒘=(w1,,wd1,wd) and distinguish the following two cases.

  • If wd+1 is even, we set wd=wd+wd+1/2, and thus f(𝒘)=f(𝒘). By the induction hypothesis, there is a binary tree T whose binary distribution is 𝒘. If we expand wd+1/2 leaves with depth d in T to internal nodes with two leaves, we obtain a binary tree whose binary distribution in 𝒘.

  • If wd+1 is odd, we rewrite the inequality 1f(𝒘)=i=1d+1wi2i as 2d+1i=1d+1wi2d+1i. Observing that the right summation only consists of even terms except for wd+1, the left-hand side is even while the right-hand side is odd. This allows us to increment the right-hand side by one while still retaining validity, i.e., as 2d+1i=1dwi2d+1i+wd+1+1. We therefore can apply the even case analysis for the vector (w1,,wd,wd+1+1).

Lemma 4.

A vector 𝐰=(w1,,wd) with nonnegative integers and wd>0 is the full-binary distribution of a full binary tree if and only if f(𝐰)=1.

Proof.

We prove both directions by induction, for which we use d=1 as the starting point. For depth 1, on the one hand, there is only one full binary tree with the root having two leaves as children. On the other hand, there is only one vector 𝒘=(w1)=(2) with f(𝒘)=1, and therefore the claim holds. For the induction hypothesis, let us assume that the claim holds for d.

Direction .

Consider a full binary tree T whose full-binary distribution is 𝒘=(w1,,wd+1). We can group the wd+1 leaves into wd+1/2 pairs, each pair of leaves sharing the same parent at depth wd. We therefore can contract these pairs with their parents to a contracted tree T having wd+wd+1/2 leaves at depth wd. Since T is a full binary tree and wd+wd+1/2>0, according to the induction hypothesis, the full-binary distribution 𝒘=(w1,,wd+wd+1/2) of T satisfies f(𝒘)=1. This yields f(𝒘)=f(𝒘) because

1=i=1d1wi2i+wd+wd+1/22d=i=1d+1wi2i.
Direction .

Given a vector 𝒘=(w1,,wd+1) with nonnegative integers such that f(𝒘)=1, i.e., i=1d+1wi2i=1. Multiplying this equation on both sides by 2d+1 gives 2dw1+2d1w2++2wd+wd+1=2d+1, and hence wd+1 must be even since all other terms are even. Therefore, wd=wd+wd+1/2 is integer and 𝒘=(w1,,wd1,wd) is a vector with f(𝒘)=1 because

i=1d1wi2i+wd2d=i=1d1wi2i+wd2d+wd+12d+1=1.

By the induction hypothesis, 𝒘 is a full-binary distribution of a full binary tree T. We now expand wd+1/2 leaves of T at depth d to internal nodes, each having two leaf children. This gives us a full binary tree whose full-binary distribution is 𝒘.

Let g(i,𝒘)=j=idwj/2ji for i[1..d].

Corollary 5.

Given a full-binary tree distribution 𝐰=(w1,,wd), then g(i,𝐰) is the number of nodes with depth j, which is even, for every i[1..d].

Proof.

Following the proof of Lemma 4, we iteratively merge the leaves on depth j with their parent node for j from d to i+1. At each step, we obtain a full binary tree such that the number of leaves with the highest depth is always even.

Theorem 6.

The number of full binary trees having a full-binary distribution 𝐰=(w1,,wd) with wd>0 and f(𝐰)=1 is

i=1d1(g(i,𝒘)wi).
Proof.

We show the claim by induction on d. The base case is d=1, for which we have seen that 𝒘=(2) is uniquely defined. For the induction hypothesis, let us assume that the claim holds for d.

Consider a full-binary distribution 𝒘=(w1,,wd+1) with wd+1>0 and f(𝒘)=1. Then g(i,𝒘) is even for all i according to Corollary 5. In particular, the vector 𝒘:=(w1,,wd) defined by wi=wi for i[1..d1] and wd=wd+wd+1/2 is an integer vector with f(𝒘)=1. With the induction hypothesis for 𝒘 we obtain

i=1d1(j=idwj2jiwi)=i=1d1(wi+wi+12++wd12(d1)i+wd+wd+1/22diwi)=i=1d1(j=id+1wj2jiwi).

Each of these trees has wd+wd+1/2 leaves with depth d. Fix one of them, which we modify to obtain a tree whose full-binary distribution is 𝒘. For that, we select wd+1/2 of those leaves, for which we have (wd+wd+1/2wd+1/2)=(wd+wd+1/2wd) possibilities. By modifying each of the trees whose full-binary distribution is 𝒘 in all possible ways, we obtain

i=1d1(j=id+1wj2jiwi)(wd+wd+1/2wd)=i=1d(j=id+1wj2jiwi).

By construction, all modifications are distinct by using only expansions. If two expansions are equal, they must have originated from the same tree we expanded.

Example 7.

For d=2, there are two full-binary distributions 𝐰1=(0,4) and 𝐰2=(1,2). For 𝐰1, (0+420)=1, and there is only the perfect full binary tree having 𝐰1 as full-binary distribution. For 𝐰2, (1+221)=2, there are two full binary trees, as shown in Figure 1.

Figure 1: Illustration of Example 7, where the left tree has the full-binary distribution (0,4) and the other trees (1,2). Black nodes are leaves.

5 Enumeration of Full Binary Trees

In what follows, we present an enumeration algorithm for Problem 1 in the case of full binary trees. Our main idea is to process the trees in a linear order, which we define by a binary encoding of the trees.

For that, we recall the result of Corollary 5 that g(i,𝒘) gives the number of nodes with depth i[1..d], which is common to all full binary trees having the full-binary distribution 𝒘. In short, we define the vector 𝜼=(η1,,ηd) with ηi:=g(i,𝒘) for i[1..d]. Then i=1dηi is the number of all nodes. In particular, we can use an ηi-length bit vector to specify which nodes on depth i are internal nodes or leaves. The list of bit vectors on all depths is sufficient to represent a full binary tree. Then i=1dηi=i=1dj=idwj2ji2i=1dwi. For n=i=1dwi, this means that we can represent each full binary tree whose full-binary distribution is 𝒘 in 2n bits. We use this bit representation to enumerate these trees. For that, we build on the algorithm of [14].

Lemma 8 ([14]).

There is an algorithm that enumerates all subsets with m[1..n] integers of the integer range [1..n] in ascending lexicographic order with constant delay.

Figure 2: Enumerating all full binary trees whose distribution is 𝒘=(1,1,1,2) with d=4. Both trees on the left and right depict the same enumeration tree of 𝒘 explained in Section 5, but with a different representation. A node on the left depicts a full binary tree T whose encoding 𝑩T is given by a node on the right at the same position in the respective tree. We partition each 𝑩T by the depths [1..d1] (depth d contains only zeros). For instance, the root node of the right enumeration tree E specifies with 10 that there is a left leaf and a right internal node on each depth in [1..3], while the deepest leaf of E specifies a tree with opposite characteristics.
Figure 3: An enumeration tree E on an extended example compared to Figure 2, with distribution 𝒘=(0,2,3,2). We here provide a combined view of both trees as illustrated in Figure 2. While the root node shows its full bit vector, we only depict the bit vector at the change level of each other node.

The connection to our problem is that a bit vector of length n with m “1”s exactly represents such a subset. Since we can impose the natural lexicographic order on the 2n-length bit vectors, the missing step is to find, given a 2n-length bit vector representing a full binary tree, the lexicographic succeeding bit vector that represents a tree we want to output.

To facilitate our starting, we assume that wi1 for every i[1..d]. Suppose that we have a full-binary tree T whose distribution is 𝒘. We deduce from T a bit vector 𝑩T representing T. For explanation, we assume that the least significant bits are on the left. We start with a bit vector 𝑩T of length i=1dηi2n to represent the order of the internal nodes and leaves on each depth. To do this, we partition 𝑩T by depths 𝑩T=(𝒃(1)𝒃(d)) such that 𝒃(i) is a bit vector of length ηi with bj(i)=1 if and only if the j-th node with depth i is a leaf. In particular, 𝒃(i) has wi zeros. As an example, the values of 𝑩 for the trees depicted in Figure 1 are 001111 (left), 1011 (middle), and 0111 (right). (Recall that 𝑩T does not encode the root node, for which we assume it is always an internal node.) As a practical optimization, we can omit the highest depth d since it always contains leaves, i.e., 𝒃(d) contains only zeros. Since we can reconstruct T from 𝑩T uniquely, there is a one-to-one mapping from full binary trees with at least two nodes and a subset of bit strings defined by the above tree encoding. Consequently, each full binary tree has exactly one specific bit vector representation.

In what follows, we represent each full binary tree T having the distribution 𝒘 with the bit vector 𝑩T, which is stored as a node in a so-called enumeration tree E. The root of E represents the full binary tree T obtained by grouping all internal nodes to the left side, which we expressed by the lexicographic least bit vector, obtained by shifting all “1”s in every 𝒃(i) of 𝑩T to the left. An E-node is augmented by a bit vector 𝑩T representing a full binary tree T and an integer, which we call the change level. The change level of an E-node v states the depth at which the bit vector of v differs from its parent; as an exception, we stipulate that the change level of the root node is 0. The change level is an invariant we impose on E in the sense that, when traversing from a node u to of its children v, the tree represented by u and by v can and must differ only at the change level of v. A consequence is that E-nodes with change level d are leaves.

We organize the children of an E-node as follows. Fix one E-node v with bit vector 𝑩T=(𝒃(1)𝒃(d)) and change level δ[0..d1]. Then the children of v correspond to all possible rearrangements of the bit patterns in one of the vectors 𝒃(δ+1),,𝒃(d). We start with the first children of v that enumerate all possible bit patterns of 𝒃(d) (resulting by permutations). So they have change level d with a bit vector that differs from v’s bit vector only at depth d. (In particular, these nodes are leaves by the definition of the change level.) Next, we continue with the group of children for 𝒃(d1) and change level d1, and so on, up until 𝒃(δ+1) and change level δ+1. In particular, we arrange each group of children with the same change level i in lexicographic order of their corresponding bit vector 𝒃(i). This order of the children and the order of their groups allows us to enumerate all full binary trees by a pre-order traversal of E in lexicographic order. (In particular, the lexicographic order imposes the invariant that we never enumerate a full binary tree twice.) This structure of E warrants the following – see Figure 2 for an example and Figure 3 for a more elaborated example involving a non-binary choice at one level:

  • Any subtree rooted at a node of E is an enumeration tree itself. That is because an E-node with change level δ[1..d] representing a tree T is the root node of an enumeration tree that considers only the suffix 𝒃(δ+1)𝒃(d) of 𝑩T for the enumeration. In detail, for an E-node with change level δ it holds that 𝒃(i)=(11wi00ηiwi) for every i[δ+1..d1] by the construction E – recall that a node with change level δ only makes a change on 𝒃(δ+1) compared to its parent node, and reading the change levels upwards to the root gives a monotone decreasing sequence of change levels (so there is no way that any deeper level has changed with respect to the initial bit vector of the root.) In particular, an E-node is the lexicographically least among all its descendants.

  • Since deeper recursion levels only touch shorter suffixes of the bit vectors, the rightmost leaf of an E-node v is lexicographically smaller than v’s right sibling (if it exists). Therefore, all E-nodes represent different vectors, and the sum of all E-nodes is the size of the output we want to compute.

If we now run the algorithm of Lemma 8 on d nested loops we can traverse the constructed enumeration tree E in a pre-order traversal and output each node upon visiting it. The maximum delay happens when moving to the next value in the lowest d1 loops. This delay is O(d), but O(1) amortized because to reach a depth of d we have already output d nodes.

It is left to argue that we can transform the sequence of outputted E-nodes into the full binary trees we want to output. For that, we initially build the full binary tree represented by the root of E from scratch, taking O(n) time. Subsequently, for each E-node visit, it suffices to change the order of the nodes of the previously computed full binary tree at a specific depth, costing O(maxig(i,𝒘)) time.

Lemma 9.

Given a full-binary distribution 𝐰 with wi1 for all i[1..d], we can enumerate all full binary trees whose full-binary distribution is 𝐰 in lexicographic order with constant amortized delay or O(d) worst-case delay. The precomputation takes O(n) time.

We can generalize this approach to full-binary distributions having zero values. The obstacle is that we cannot guarantee constant delay since we might traverse long paths with zero weights. As a remedy, we want to skip each depth i with wi=0. To do that, let us denote by I={i[1..d]wi>0} all depths with leaves. Then it suffices to restrict the values of a change level of a node to values of I{0}. By construction, the root node has change level 0, and a node with change level δ has children whose augmented value is the successor of δ in I.

Lemma 10.

Given a full-binary distribution 𝐰 with wd1, we can enumerate all full binary trees whose full-binary distribution is 𝐰 in lexicographic order with constant amortized delay or O(d) worst-case delay. The precomputation takes O(n) time.

To obtain worst-case constant delay time, we can drop the requirement to output in lexicographic order by using a folklore technique, such as the alternating output technique [15]. The idea is to output nodes on odd depths when visiting them the first time, but postpone the output of nodes on even depths until we backtrack and move outside their respective subtrees.

Theorem 11.

Given a full-binary distribution 𝐰=(w1,,wd), the corresponding full binary tree can be output in O(d+n) time for the first tree and then output all trees in the form of differences with a size of O(m) with a delay of O(1) in O(m) time, where n=i=1dwi and m=max{wii[1..d]}.

6 Enumeration of Binary Trees

In what follows, we show how to generalize the enumeration of full binary trees to general binary trees, where internal nodes are allowed to have one or two children.

From Lemmas 3 and 4, the binary distribution 𝒘=(w1,,wd) of a non-full binary tree also satisfies f(𝒘)=i=1dwi/2i<1. We make the connection to full binary trees by adding dummy leaves to all internal nodes that have only one child. To this end, we denote the number of dummy leaves added at depth i by xi, and construct a vector 𝒙=(x1,,xd). The sum 𝒙+𝒘 gives a vector that is the binary distribution of a full binary tree because i=1d(wi+xi)/2i=1. In what follows, we call 𝒙 the complement vector of 𝒘, and 𝒘+𝒙 the completion of 𝒘. For such 𝒙, a nonnegative integer vector 𝒚 that satisfies 𝒚𝒙 is called the subcomplement vector of 𝒘.

Lemma 12.

For a complement vector 𝐱 of 𝐰 we have that xig(i,𝐰).

Proof.

We can only attach a dummy leaf at depth i to internal nodes that have exactly one child on depth i. By the pigeonhole principle, there are at most g(i,𝒘) many such children on depth i.

From Lemmas 3 and 4, the following immediately holds.

Lemma 13.

If the binary distribution 𝐰 satisfies f(𝐰)<1, then there exists a complement vector for 𝐰.

However, the complement vector is not unique, in general. For example, 𝒘:=(0,2,2) is not a full-binary distribution since 2/4+2/8=3/4, while (0,1,0) and (0,0,2) are both complement vectors of 𝒘. See Figure 4 for an example.

Figure 4: Examples for Lemma 13 with the binary distribution 𝒘=(0,2,2), where two binary trees exist, which can be complemented by adding dummy leaves. The left tree and the right tree can be complemented by the vectors (0,0,2) and (0,1,0) to full binary trees, respectively. Black nodes are leaves, and dummy nodes are rectangles.

Also, even if 𝒙 satisfies f(𝒘+𝒙)=1, it does not necessarily mean that 𝒙 is the complement vector of 𝒘. For example, 𝒘:=(0,2,0,2) has (0,0,2,2) as a complement vector, but 𝒙=(0,0,0,6) is not a complement vector: A tree having the binary distribution 𝒘 cannot have six dummy leaves on depth 4. However, a necessary condition is given below.

Lemma 14.

A nonnegative integer vector 𝐱=(0,,0,yd) is a subcomplement vector of a complement vector 𝐲=(y1,,yd) of a binary distribution 𝐰=(w1,,wd) if and only if ydwd(mod2), yd[0,wd], and f(𝐰)+yd/2d1.

Proof.

Let 𝒙=(0,,0,yd).

Direction .

In order for 𝒙 to be a subcomplement vector of 𝒘, a binary tree with yd+wd leaves, including dummy leaves, must exist. Therefore, it follows from Corollary 5 that yd+wd is even, so ydwd(mod2). Also, since dummy leaves can only be attached to internal nodes that have exactly one child (in this case, a leaf), ydwd by Lemma 12. Furthermore, in order for 𝒙+𝒘 to be a binary tree (with dummy leaves), f(𝒘+𝒙)=i=1d(wi+xi)/2i=i=1dwi/2i+xd/2d1 must hold.

Direction .

We show that there exists a complement vector 𝒚 for 𝒘 such that 𝒙𝒚.

For depth d=1, 𝒘 must be either (1) or (2). In the case of 𝒘=(2), 𝒘 represents a full binary tree, so the complement vector of 𝒘 is 𝒚=(0), and thus 𝒙=(0). In the case of 𝒘=(1), the only complement vector is 𝒚=(1), and hence 𝒙=(1) is the unique subcomplement vector of 𝒚=(1) satisfying xd=yd. Thus, in both cases the subcomplement vectors are uniquely defined by the right-hand side conditions.

In what follows, we assume that d2. Given a binary distribution 𝒘 of a binary tree T, by assumption wd+yd and wdyd are even numbers. Hence, we can apply the contraction trick like in previous proofs. For that, consider the vector 𝒘:=(w1,,wd2,wd1+(wd+yd)/2). The vector 𝒘 is well-defined: Since wdydmod2 and ydwd, (wd+yd)/2=(wdyd)/2+yd is a positive integer. Since

f(𝒘)=i=1d2wi/2i+(wd1+(wd+yd)/2)/2d1=f(𝒘)+yd/2d1,

by Lemmas 3 and 13, 𝒘 is a binary distribution of a binary tree T. The normal leaves (i.e., excluding dummy leaves) of depth d1 in T are wd1+(wd+yd)/2 in total, of which

  • wd1 of them are former leaves of T unaffected by the contraction,

  • (wd+yd)/2 internal nodes of T are changed into leaves on depth d1 by contracting them with their two child leaves (where one may be a dummy leaf).

The resulting binary tree T therefore has wd1+(wd+yd)/2 leaves on depth d1 in total. The contraction of T to T by removing depth d does not affect the number of leaves at depth d2 or less in T. By definition, there is a complement vector 𝒛=(z1,,zd1) of 𝒘 that can turn T into a full binary tree. Finally, we turn yd leaves of T on depth d into internal nodes, each having one child. This restores T and induces the complement vector 𝒛=(z1,,zd1,yd) of 𝒘.

For every binary tree, there is only one way to add dummy leaves to internal nodes to make it a full binary tree. Hence, the complement vector for each binary tree, whose completion is the binary distribution of the full binary tree, is uniquely determined. Conversely, given a full binary tree and a configuration classifying leaves in dummy leaves and non-dummy leaves, we can obtain the original binary tree by removing dummy leaves according to the configuration to obtain the original binary tree. Based on this observation, we first find the completion, i.e., the corresponding full binary tree, for 𝒘, and then enumerate the binary trees by considering all ways of adding dummy leaves while enumerating the corresponding full binary completions. That is, by enumerating all possible complement vectors for 𝒘, and then considering the process of enumerating the different ways of adding dummy leaves while considering the corresponding full binary completion, we can enumerate all binary trees. In summarizing the above, the enumeration algorithm is designed in a three-layer structure:

  1. 1.

    enumeration of complement vectors (Section 6.1),

  2. 2.

    enumeration of the way of adding dummy leaves to each complement vector (Section 6.2), and

  3. 3.

    enumeration of binary trees for each way of adding dummy leaves (Section 6.3).

Figure 5: Enumeration tree of complement vectors for the binary distribution 𝒘=(0,1,1,4,2,4), where leaves correspond to the complement vectors. A node on depth i determines the value x[di+1] of the complement vector. The details are described in Section 6.1.
Algorithm 1 Subroutine for enumerating complement vectors for 𝒘, cf. Section 6.1.

6.1 Enumeration of Complement Vectors

We enumerate the complement vectors with the insights of Lemmas 13 and 14. In detail, Lemma 14 gives us a strategy to enumerate the complement vectors by enumerating subcomplement vectors. We show the algorithmic steps in Algorithm 1, which is a recursive algorithm that determines the values of xi in descending order xd,xd1, in a depth-first manner. We initially call EnumCompVector with the arguments (f(𝒘),𝟎,d,0), for the weight f(𝒘) of the binary distribution 𝒘, 𝟎 the d-dimensional vector with 0 entries, the depth d and the number of conceptional leaves on depth d. The last argument needs a definition: This algorithm recursively contracts leaves and dummy leaves at depth i+1 to the so-called conceptional leaves on depth i. If we have xi+1 dummy leaves and wi+1 ordinary leaves on depth i+1, then the number of conceptional leaves at depth i is ci=(xi+1+wi+1)/2, and we consider from then on the binary distribution (w1,w2,,wi1,wi+ci) for a tree of depth i. Now, in the function EnumCompVector, we enumerate all xi satisfying the conditions of Lemma 14 at Line 5. We proceed with each such xi at Line 6 with a subcomplement vector 𝒚=𝒙+(0,,0,xi). At Line 6, we check if we have obtained a completion with 𝒚 at that point. If 𝒚 is a complement vector of 𝒘, we output 𝒚 at Line 7 and return. If we can still increase a value in 𝒚 to add dummy leaves (Line 10), we recursively call the procedure for the preceding depth i1. By Lemma 14, if we recurse, we either continue recursing or turn a subcomplement vector into a complement vector. We can write each recursive call in the form of a tree, where each node is a call of EnumCompVector: the root is the initial call, and each leaf is a call that returns a complement vector. See Figure 5 for an example. Finally, since we can evaluate the checks in the pseudocode in O(1) time, the delay between the output of two complement vectors is O(d) time.

6.2 Enumeration of Dummy Leaves

Given the complement vector 𝒙 for 𝒘, there can be multiple ways to add the dummy leaves specified by 𝒙 to a binary tree whose binary distribution is 𝒘. Our task is to enumerate all these ways. For that, we follow the strategy of Section 5, where we enumerate the leaf/internal node configurations at depth i for full binary trees. However, unlike normal leaves, the sibling of a dummy leaf must be a normal leaf or an internal node. To facilitate checking this condition, we enumerate the internal node one level above the dummy leaf itself, rather than at the configuration of the dummy leaf. We call such an internal node the dummy parent of a dummy leaf. For each dummy parent, it suffices to enumerate the two possibilities of having its dummy leaf as its left or right child.

Figure 6: Enumeration of dummy parent configurations and Gray code, cf. Section 6.2. We start with 𝒘 and 𝒙 defined in Figure 5. The third row in the left upper box is a vector 𝒛 with zi=j=i+1d(wj+xj)/2ji such that νi=wi+xi+zi. We only have dummy leaves on the depths 3,4,5 (those depths i for which xi>0 marked by a red rectangle around the entries of 𝒙). Therefore, our dummy parent configuration considers only the depths 2,3,4, for which we generate bit vectors for each node in the depicted enumeration tree. The lengths of the bit vectors are determined by νx by the entries i with xi+1>0, which are 4,5,7 in our case (blue rectangle in the upper left box). For each node in the enumeration tree we visit, we enumerate all Gray codes on the upper right box by visiting linearly all depicted nodes in this linked list. The Gray code has length i=1dxi and assigns a dummy parent the role whether it has a left or a right dummy leaf.

First, as in Section 5, suppose that we have a full-binary distribution 𝒘+𝒙. From the proof of Corollary 5, the number of nodes at depth i is νi:=g(i,𝒘+𝒙)=j=id(wj+xj)/2ji.

Suppose δ is the smallest depth at which a dummy leaf appears for the first time. At depth δ, there are xδ dummy leaves. By definition, there are xδ dummy parents at depth δ1. We specify these dummy parents by a bit vector of length νδ1 with “1”s at xδ positions.

In doing so, at depth δ1, we have already determined the dummy parents of the dummy leaves at depth δ, regardless of whether they have a dummy leaf as their left or right child, so the number of nodes that we can freely choose at depth δ is νδxδ. We again specify the dummy parents at depth δ and recurse. Therefore, we can express our selection of dummy parents by a (νδxδ)-length bit vector with “1”s in xδ+1 positions. By recursing on deeper levels, we can apply this technique on all depths, generalizing from δ to any i[δ..d]. In what follows, we want to express the configuration of dummy parents on each level by a bit vector, for which we want to bound the size of νi.

Lemma 15.

Each νi is at most 2j=idwj.

Proof.

From Lemma 14, νd=wd+xd2wd. So there is an i[0..d1] such that the induction hypothesis νdk2j=dkdwj holds for every k[0..i]. Then we obtain

νd(i+1)=wd(i+1)+νdi/2+xd(i+1)2wd(i+1)+νdi2j=d(i+1)dwj,

where we used Lemma 12 for xd(i+1)νdi/2+wd(i+1).

Therefore, we can specify dummy parents with an O(j=idwj)-length bit vector at each depth. We denote the sequence of bit vectors for specifying the dummy parent configuration by 𝑫.

Furthermore, we express whether a dummy leaf at depth i is the left or right child of its parent by a bit vector of length xi (for each configuration of the dummy parents at depth i1, there are 2xi ways for the xi dummy leaves at depth i to be attached). In total, we can express the dummy leaf configuration by a (i=1dxi)-length bit vector. Since xiνi2j=idwj by the above analysis, this bit vector has length O(di=1dwi).

An enumeration of bit vectors are Gray codes [8], which have O(1) delay because two subsequently output bit vectors differ by one bit. Here we use Gray codes to enumerate the dummy leaf configuration for each dummy parent configuration with constant delay.

To conclude, we can enumerate the configuration of dummy leaves in the following manner:

  • Enumerate the configuration of the dummy parents with constant time delay, and output this configuration in O(i=1dwi).

  • Generate a Gray code of i=1dxi=O(di=1dwi) bits for each configuration of dummy parents, and enumerate the configuration of dummy leaves.

Figure 6 shows an example of a dummy parent enumeration tree and a Gray code for a binary distribution 𝒘=(0,1,1,4,2,4) and its complement vector 𝒙=(0,0,1,1,2,0).

Figure 7: Enumeration of all leaf configurations, cf. Section 6.3. We follow the example of Figure 6. Here, 𝒙 denotes the vector xi=xi+1. The number of leaves and internal nodes having no dummy leaves at depth i is νixixi+1=νixixi. Since all nodes on depth 1 are internal nodes (w1=0) and all nodes on depth d are leaves (νd=wi), the leaf selection vector only addresses the depths 2,3,4 (red rectangles in the left upper box).

6.3 Enumeration of the Leaf Configuration

Based on the previous discussions, at this point, we have determined the configuration of dummy parents 𝑫 with xi dummy leaves and xi+1 dummy parents of νi nodes at each depth i of the binary tree. Here, we assign roles to the remaining νixixi+1 nodes as leaves and internal nodes having no dummy leaves as children. As before, we can specify the roles in the form of a bit vector of length νixixi+1 with wi “1”s. We call such a vector sequence a leaf configuration vector sequence and denote it by 𝑳. Fixing one configuration of dummy leaves, we enumerate i=1d(νixixi+1wi) combinations in the same way as in Section 5 to obtain all binary trees having that configuration of dummy leaves. We proceed in this order here despite that once 𝒙 is determined, the number of dummy parents and leaves at each depth is fixed, so we can determine the leaf configuration vector independently of the dummy parent configuration. Figure 7 gives an example of a leaf configuration enumeration tree for a binary distribution 𝒘=(0,1,1,4,2,4) and its complement vector 𝒙=(0,0,1,1,2,0).

6.4 Overview of the Algorithm

We combine the above to enumerate binary trees using enumeration trees and Gray codes. For the sake of explanation, we enumerate the configuration of dummy leaves not directly after enumerating the configuration of dummy parents – we can take care of the dummy leaves at any time after we have determined the dummy parents. This also allows us to postpone the computation of the configuration of dummy leaves to the end.

We visualize our pipeline that ends at the generation of Gray codes following pointers from three nested enumeration tree traversals in Figure 8. There, we start with a top-down traversal of the leftmost tree T. Each leaf of T determines a complement vector 𝒙. We perform the actual traversal by executing EnumCompVector. Fixing 𝒙, we move to the neighboring trees TD(𝒙) and TL(𝒙) to enumerate all dummy parent configurations and all leaf configurations 𝑫 and 𝑳, respectively. First, we enter the enumeration tree TD(𝒙). Each time we visit a new node in TD(𝒙), we jump into TL(𝒙) and traverse this tree. For each TL(𝒙) node we visit, we have determined (𝑫,𝑳). Given |x|=i=1dxi, for each Gray code 𝒈{0,1}|x| we enumerate, we finally output (𝑫,𝑳,𝒈). From the discussion in Section 5, we can determine and output the pair (𝑫,𝑳) with constant delay. Since we can also determine 𝒈 with constant delay, we can output (𝑫,𝑳,𝒈) for a fixed 𝒙 with constant delay. However, the delay in outputting 𝒙 itself is O(d), but this time bound is dominated by the time bound to output the first (𝑫,𝑳,𝒈) for 𝒙. This leads to the following complexities.

Figure 8: Layout of the algorithm enumerating all binary trees for a given binary distribution, cf. Section 6.4. An example of the leftmost tree T is given by Figure 5, and for the subsequent trees TD and TL by Figure 6 and Figure 7, respectively.
Theorem 16.

Given a binary distribution 𝐰=(w1,,wd), the corresponding binary tree can be output in O(d+n) time for the first tree and then output all trees in the form of differences with a size of O(i=1dwi) with a delay of O(1).

While the complexities stated in Theorem 16 are in terms of 𝒘, we create a full-binary tree whose binary distribution is 𝒘+𝒙 for each completion vector 𝒙. Nevertheless, xiwi for all i[1..d], and therefore the number of nodes of this full-binary tree is asymptotically equal to the number of leaves of the tree we output.

Finally, an example of the binary tree corresponding our running example is given in Figure 9.

Figure 9: Binary tree corresponding to a vector representation returned by the algorithm whose complexities are stated in Theorem 16. The left box depicts a node of TD(𝒙) (red, cf. Figure 6) and TL(𝒙) (blue, cf. Figure 7). Informally, the red node specifies with a “1” the placement of a dummy parent, and the blue node with a “1” the placement of a (normal) leaf. Finally, a Gray code determines for each dummy parent whether its normal leaf is a left child or a right child.

7 Extension to 𝜶-ary trees

In this section, we extend the results of the previous section to general α-ary trees for a constant α2. Like for binary trees (α=2), we first define the notion of an α-ary distribution.

7.1 Properties of full 𝜶-ary trees and their enumeration

We call a vector of integers 𝒘=(w1,,wd) a (full) α-ary distribution if there is a (full) α-ary tree that has wi leaves on depth i for every i[1..d]. For a nonnegative integer vector 𝒘=(w1,,wd), we define the function fα(𝒘)=i=1dwi/αi, which is a generalization of f with f2=f. The problem we want to tackle in this section can be stated as follows.

Problem 17.

Given a (full) α-ary distribution 𝐰, enumerate all (full) α-ary trees whose α-ary distribution is 𝐰.

Like in the case of binary trees, the following holds.

Lemma 18 (Kraft’s inequality).

Let α2. A nonnegative integer vector 𝐰=(w1,,wd) that satisfies wd>0 is a α-ary distribution if and only if fα(𝐰)1.

Corollary 19.

Let 𝐰=(w1,,wd) be a nonnegative integer vector satisfying wd>0 and is a full α-ary distribution. Then, gα(j,𝐰):=i=jdwi/αij is a multiple of α, and represents the number of nodes at depth j of the α-ary tree having 𝐰 as its α-ary distribution.

Theorem 20.

Let α2. Given a full α-ary distribution 𝐰=(w1,,wd) satisfying wd>0, the number of all full α-ary trees that have 𝐰 as their α-ary distribution is

i=1d1(j=idwjαjiwi).

From these properties, the enumeration of full binary trees can be extended to the enumeration of full α-ary trees. For that, we observe that the approach for enumerating full binary trees analyzed in Section 5 is independent of the degree of the tree, since the representation only considers the configuration of internal nodes and leaves per depth. Therefore, we can generalize this approach in a straightforward manner by replacing f and g with fα and gα, respectively, and updating the values of ηj:=gα(j,𝒘). We obtain the following result based on an extension of Theorem 11:

Theorem 21.

Given a full α-ary distribution 𝐰=(w1,,wd), the corresponding full α-ary tree can be output in O(n) time for the first tree and then output all trees in the form of differences with a size of O(m) with a delay of O(1) in O(m) time, where n=i=1dwi and m=max{wii[1..d]}.

7.2 Enumeration of 𝜶-ary trees

We can carry out the enumeration of α-ary trees in the same way as binary trees, for which we had three levels of enumeration: for the complement vectors, the dummy leaves, and the actual leaves. Here, we proceed in the same way.

7.2.1 Enumeration of full 𝜶-ary tree complement vectors

Suppose we have an α-ary distribution 𝒘 with fα(𝒘)<1. We fix one α-ary tree corresponding to 𝒘 and add dummy leaves to all internal nodes with fewer than α children to make it a full α-ary tree. In this case, let the number of dummy leaves added at depth i be xi, and consider the vector 𝒙=(x1,,xd). By construction, we obtain that i=1d(wi+xi)/αi=1. We call the vector 𝒙 constructed in this way the complement vector of 𝒘, and we call 𝒘+𝒙 the full α-ary tree complement of 𝒘. For such 𝒙, a nonnegative integer vector 𝒙 is called a subcomplement vector of 𝒘 if 𝒙𝒙. Similarly to binary trees in Lemma 13, the following holds.

Lemma 22.

Given an α-ary distribution 𝐰 with fα(𝐰)<1, there exists a complement vector of 𝐰.

Generalizing Lemma 14, the following is true.

Lemma 23.

A nonnegative integer vector (0,,0,yd) is a subcomplement vector of the complement vector 𝐲=(y1,,yd) of an α-ary distribution 𝐰=(w1,,wd) if and only if ydwd(modα), yd[0..(α1)wd], and fα(𝐰)+yd/αd1.

Using these lemmas, we can generalize Algorithm 1 to Algorithm 2 to enumerate the complement vector of an α-ary distribution.

Algorithm 2 Subroutine for enumerating full α-ary tree complement vectors for 𝒘.

7.2.2 Enumeration of the configuration of dummy leaves

Suppose that we have determined the complement vector 𝒙 of 𝒘. The next step is to determine the configuration of dummy leaves, which is (like for binary trees) not unique, in general. However, in the case of α-ary trees, among all siblings of a dummy leaf there must be at least one normal node, i.e., an internal node or a leaf that is not dummy. Therefore, we cannot infer from a dummy parent its number of dummy leaves, and thus specifying the configuration of dummy nodes with a Gray code like for binary trees seems not possible. Instead, we enumerate the number of dummy leaves that each dummy parent has from 0 to α1, and then enumerate the configuration of dummy leaves corresponding to that number. This strategy is slightly more complicated than in the case of binary trees because of this additional intermediate step, but the calculation time is asymptotically absorbed by the time spent for determining one complement vector. Having computed the complement vector, the following steps are analogous to the case of binary trees, for which we obtain constant time delay per output.

We enumerate all possible ways to place the dummy leaves at each depth δ[1..d] by choosing a combination of the configuration of the dummy leaves at depth δ. For that, we first specify how many dummy leaves each parent has at depth δ1. According to Corollary 19, in the complemented tree corresponding to the binary distribution 𝒘+𝒙 with the added dummy leaves, there are ηδ=gα(δ,𝒘) nodes at depth δ (i.e., there are ηδ/α parents at depth δ1). We assign each parent a rank from [1..ηδ/α] in order from left to right, and we say that pi(δ) is the number of dummy leaves that the i-th parent has. Since each parent must contain at least one normal node, pi(δ)[0..α1], and the number of dummy leaves at depth δ is xδ, so 𝒑(δ)=(p1(δ),,pηδ/α(δ)) satisfies

i=1ηδ/αpi(δ)=xi. (1)

We use a known result for enumerating all distinct integer tuples (p1(δ),,pηδ/α(δ)) with pi(δ)[0..α1] satisfying Equation 1 with constant delay [16]. For each such integer tuple, we enumerate the configuration of dummy leaves corresponding to each 𝑷. After that, all α-ary trees can be enumerated with constant time delay as in the case of binary trees.

8 Conclusion

We have addressed the problem to efficiently enumerate trees whose numbers of leaves on each depth are given by a query vector. Specifically, for any leaf distribution 𝒘=(w1,,wd) of (full) binary or (full) α-ary trees, we obtain the following time complexities. First, we can output the first tree in O(d+n) time. We can output every subsequent tree, after constant delay, in O(m) time by encoding this tree as a difference to a previous one using O(m) words. Here, n=i=1dwi is the total number of nodes and m=max{wii[1..d]} is the maximal number of nodes at any depth. These results hold for both full binary trees and full α-ary trees with α2, and extend to general binary trees with analogous enumeration guarantees.

As future work, we want to determine the exact number of full binary trees and full α-ary trees with distribution 𝒘 by a closed-form product formulas involving binomial coefficients. We further strive to give practical implementations of our proposed enumeration algorithms.

References

  • [1] Masahiro Baba, Hirotaka Ono, Kunihiko Sadakane, and Masafumi Yamashita. A succinct representation of a full binary tree. Technical Report 0, Record of Joint Conference of Electrical and Electronics Engineers in Kyushu, 2009. doi:10.11527/jceeek.2009.0.59.0.
  • [2] David Benoit, Erik D. Demaine, J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Representing trees of higher degree. Algorithmica, 43(4):275–292, 2005. doi:10.1007/s00453-004-1146-6.
  • [3] Terry Beyer and Sandra Mitchell Hedetniemi. Constant time generation of rooted trees. SIAM J. Comput., 9(4):706–712, 1980. doi:10.1137/0209055.
  • [4] Michael Buro. On the maximum length of Huffman codes. Inf. Process. Lett., 45(5):219–223, 1993. doi:10.1016/0020-0190(93)90207-P.
  • [5] Jeremy Chizewer, Stephen Melczer, J. Ian Munro, and Ava Pun. Enumeration and succinct encoding of AVL trees. In Proc. AofA, volume 302 of LIPIcs, pages 2:1–2:12, 2024. doi:10.4230/LIPICS.AOFA.2024.2.
  • [6] Nachum Dershowitz and Shmuel Zaks. Enumerations of ordered trees. Discret. Math., 31(1):9–28, 1980. doi:10.1016/0012-365X(80)90168-5.
  • [7] Sen-Peng Eu, Seunghyun Seo, and Heesung Shin. Enumerations of vertices among all rooted ordered trees with levels and degrees. Discret. Math., 340(9):2123–2129, 2017. doi:10.1016/J.DISC.2017.04.007.
  • [8] Frank Gray. Pulse code communication. US Patent 2632058, 1953.
  • [9] Roberto Grossi. Enumeration of paths, cycles, and spanning trees. In Encyclopedia of Algorithms, pages 640–645. Springer, 2016. doi:10.1007/978-1-4939-2864-4_728.
  • [10] Guy Jacobson. Space-efficient static trees and graphs. In Proc. FOCS, pages 549–554, 1989. doi:10.1109/SFCS.1989.63533.
  • [11] Leon Gordon Kraft. A device for quantizing, grouping, and coding amplitude-modulated pulses. Master’s thesis, Massachusetts Institute of Technology, 1949.
  • [12] Shin-Ichi Nakano and Takeaki Uno. Constant time generation of trees with specified diameter. In Proc. WG, volume 3353 of LNCS, pages 33–45, 2004. doi:10.1007/978-3-540-30559-0_3.
  • [13] Frank Ruskey and Andrzej Proskurowski. Generating binary trees by transpositions. J. Algorithms, 11(1):68–84, 1990. doi:10.1016/0196-6774(90)90030-I.
  • [14] Ivan Stojmenovic. A simple systolic algorithm for generating combinations in lexicographic order. Computers & Mathematics with Applications, 24(4):61–64, 1992.
  • [15] Takeaki Uno. Two general methods to reduce delay and change of enumeration algorithms. Technical Report NII-2003-004E, NII Technical Report, 2003. doi:10.20736/0000000385.
  • [16] Timothy R. Walsh. Loop-free sequencing of bounded integer compositions. Journal of Combinatorial Mathematics and Combinatorial Computing, 33:323–345, 2000.
  • [17] Katsuhisa Yamanaka, Yota Otachi, and Shin-Ichi Nakano. Efficient enumeration of ordered trees with k leaves. Theor. Comput. Sci., 442:22–27, 2012. doi:10.1016/j.tcs.2011.01.017.
  • [18] Shmuel Zaks. Lexicographic generation of ordered trees. Theor. Comput. Sci., 10:63–82, 1980. doi:10.1016/0304-3975(80)90073-0.

Appendix

Table 1: Symbols used in this article.
symbol meaning
𝒘 distribution
d maximal depth and dimension of 𝒘
n number of leaves
m maximum value in 𝒘
𝒙 complement vector
𝒚 subcomplement vector
f(𝒘) i=1dwi/2i weight function
𝜼 ηi:=g(i,𝒘)
νi g(i,𝒘+𝒙)
g(i,𝒘) j=idwi/2ji for i[1..d]