Avoiding the Global Sort: A Faster Contour Tree Algorithm

We revisit the classical problem of computing the contour tree of a scalar field f : M → R, where M is a triangulation of a ball in R^d. The contour tree is a fundamental topological structure that tracks the evolution of level sets of f and has numerous applications in data analysis and visualization. All existing algorithms begin with a global sort of at least all critical values of f, which can require (roughly) Ω(n log n) time, where n is the number of vertices of the mesh. Existing lower bounds show that there are pathological instances where this sort is required. We present the first algorithm whose time complexity depends on the contour tree structure, and avoids the global sort for non-pathological inputs. (We assume that the Morse degree of the function f at any point is at most 3.)
If C denotes the set of critical points in M, the running time is roughly O(N + Σ_{v ∈ C} log ℓ_v), where ℓ_v is the depth of v in the contour tree and N is the total complexity of M. This matches all existing upper bounds, but is a significant asymptotic improvement when the contour tree is short and fat. Specifically, our approach ensures that any comparison made is between nodes that are either adjacent in M or on the same descending path in the contour tree, allowing us to argue strong optimality properties of our algorithm.
Our algorithm requires several novel ideas: partitioning M into well-behaved portions, a local growing procedure to iteratively build contour trees, and the use of heavy path decompositions for the time complexity analysis.


Introduction
Geometric data is often represented as a function f : R^d → R. Typically, a finite representation is given by considering f to be piecewise linear over some triangulated mesh (i.e. simplicial complex) M in R^d. Contour trees are a topological structure used to represent and visualize the function f. It is convenient to think of f as a simplicial complex sitting in R^{d+1}, with the last coordinate (i.e. height) given by f. Imagine sweeping the hyperplane x_{d+1} = h with h going from +∞ to −∞. At every instant, the intersection of this plane with f gives a set of connected components, the contours at height h. As the sweep proceeds, various events occur: new contours are created or destroyed, and contours merge into each other or split into new components. The contour tree is a concise representation of these events [7,26].
If f is smooth, all points where the gradient of f is zero are critical points. These points are the "events" where the contour topology changes. The contour tree tracks a specific subset of events, called "joins" and "splits". We provide formal definitions later. An edge of the contour tree connects two such critical points if one event immediately "follows" the other as the sweep plane makes its pass. Figures 1 and 2 show examples of simplicial complexes, with heights and their contour trees. Think of the contour tree edges as pointing downwards. Leaves are either maxima or minima, and internal nodes are either "joins" or "splits".
Consider f : M → R, where M is the triangulation of a d-dimensional ball with n vertices, N faces in total, and t ≤ n critical points. (We assume that f : M → R is a linear interpolant over distinct-valued vertices, with Morse degree at most 3; see Definition 2.3. The degree assumption can be achieved via vertex unfolding, as discussed later, and is standard in this literature [26].) A fundamental result in this area is the algorithm of Carr, Snoeyink, and Axen to compute contour trees, which runs in O(n log n + Nα(N)) time [7] (where α(·) denotes the inverse Ackermann function). In practical applications, N is typically Θ(n) (certainly true for d = 2). The most expensive operation is an initial sort of all the vertex heights. Chiang et al. build on this approach to get a faster algorithm that only sorts the critical vertices, yielding a running time of O(t log t + N) [8]. Common applications for contour trees involve turbulent combustion or noisy data, where the number of critical points is likely to be Θ(n). There is a worst-case lower bound of Ω(t log t) by Chiang et al. [8], based on a construction of Bajaj et al. [1].
All previous algorithms begin by sorting (at least) the critical points. Can we beat this sorting bound for certain instances, and can we characterize which inputs are hard? Intuitively, points that are incomparable in the contour tree do not need to be compared. Look at Fig. 1 to see such an example. All previous algorithms waste time sorting all the maxima. Also consider the surface of Fig. 2 (its caption: on the left, a surface with a balanced contour tree, but whose join and split trees have long tails; on the right, from left to right, the contour, join, and split trees). The final contour tree is basically two binary trees joined at their roots, and we do not need the entire sorted order of critical points to construct the contour tree.
Our main result gives an affirmative answer. Remember that we can consider the contour tree as directed from top to bottom. For any node v in the tree, let ℓ_v denote the length of the longest directed path passing through v. Essentially, the "run time per critical point" is determined by the height/depth of the point in the contour tree. Our bound of O(N + Σ_{v ∈ C} log ℓ_v) immediately yields a running time of O(t log D + tα(t) + N), where D is the directed diameter of the contour tree. This is a significant asymptotic improvement for short and fat contour trees. For example, if the tree is balanced, then we get a bound of O(t log log t). Even if T contains a long path of length O(t/log t), but is otherwise short, we get the improved bound of O(t log log t).
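To see where the O(t log log t) claim for balanced trees comes from: every directed path in a balanced contour tree on t nodes has length O(log t), so ℓ_v = O(log t) for every node, and

```latex
\sum_{v \in C} \log \ell_v \;\le\; t \cdot \log\bigl(O(\log t)\bigr) \;=\; O(t \log \log t).
```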

A Refined Bound with Optimality Properties
Theorem 1.1 is a direct corollary of a stronger but more cumbersome theorem.

Definition 1.2
For a contour tree T, a leaf path is any path in T containing a leaf, which is also monotonic in the height values of its vertices. Then a path decomposition, P(T), is a partition of the vertices of T into a set of vertex-disjoint leaf paths.

Contour Tree Basics
We detail the basic definitions about contour trees, following the terminology of Chapter 6 of Carr's thesis [6]. All our assumptions and definitions are standard for results in this area, though there is some variability in notation. The input is a continuous piecewise-linear function f : M → R, where M is a fully triangulated simplicial complex of a topological ball in R^d, except for specially designated boundary facets. We assume that all pairs of vertices from M have distinct function values, except for pairs from the same boundary facet, and that f is only explicitly defined on the vertices, with all other values obtained by linear interpolation.
We assume that the boundary values satisfy a special property. This is mainly for convenience in presentation.

Definition 2.1
The function f is boundary critical if the following holds. Consider a boundary facet F. All vertices of F have the same function value. Furthermore, the neighbors of vertices in F that are not themselves in F either all have function values strictly greater than, or all have function values strictly less than, the function value at F.

Boundary facets allow us to capture the resulting surface pieces after our algorithm makes a horizontal cut. The above definition is convenient, as any point within a given facet, including the boundary facets, has a well-defined height.
We think of the dimension d as constant, and assume that M is represented in a data structure that allows constant-time access to neighboring simplices in M (e.g. incidence graphs [12]). (This is analogous to a doubly connected edge list, but in higher dimensions.) Observe that f : M → R can be thought of as a d-dimensional simplicial complex living in R^{d+1}, where f(x) is the "height" of a point x ∈ M, which is encoded in the representation of M. Specifically, rather than writing our input as (M, f), we abuse notation and typically just write M to denote the lifted complex. Note that a contour that does not contain a boundary is itself a simplicial complex of one dimension lower, and is represented (in our algorithms) as such. We let δ and ε denote infinitesimals. Let B_ε(x) denote the ball of radius ε around x, and let f|B_ε(x) be the restriction of f to B_ε(x). The following definition of a critical point is not standard, but is convenient for our presentation.
Definition 2.3
For a point x ∈ M, the Morse up-degree of x is the number of (f(x) + δ)-contours of f|B_ε(x); the Morse down-degree is the number of (f(x) − δ)-contours of f|B_ε(x).

A regular point has both Morse up-degree and down-degree 1. A maximum has Morse up-degree 0, while a minimum has Morse down-degree 0. A Morse Join has Morse up-degree strictly greater than 1, while a Morse Split has Morse down-degree strictly greater than 1. Non-regular points are called critical.
The set of critical points is denoted by V(f). Because f is piecewise-linear, all critical points are vertices in M. A value h is called critical if f(v) = h for some v ∈ V(f). A contour is called critical if it contains a critical point, and regular otherwise.
The critical points are where certain topological properties of the level sets change. By assuming that our simplicial complex is boundary critical, the vertices on a given boundary are either collectively all maxima or all minima. We abuse notation and refer to this entire set of vertices as a maximum or minimum.

Definition 2.4
Two regular contours ψ and ψ′ are equivalent if there exists an f-monotone path p connecting a point in ψ to a point in ψ′, such that no x ∈ p belongs to a critical contour.
This equivalence relation gives a set of contour classes. Every such class maps to an interval of the form (f(x_i), f(x_j)), where x_i, x_j are critical points. Such a class is said to be created at x_i and destroyed at x_j.

Definition 2.5
The contour tree is the graph on vertex set V = V(f), where edges are formed as follows. For every contour class that is created at v_i and destroyed at v_j, there is an edge (v_i, v_j). (Conventionally, edges are directed from higher to lower function value.) We denote the contour tree of M by C(M). The corresponding node and edge sets are denoted as V(·) and E(·). It is not immediately obvious that this graph is a tree, but alternative definitions of the contour tree in [7] imply this is a tree. Since this tree has height values associated with the vertices, we can talk about up-degrees and down-degrees in C(M).
We assume that the Morse up-degree (and down-degree) of any point in M is at most 2, and that the total Morse degree is at most 3. This immediately implies analogous bounds on the up- and down-degrees in C(M). This assumption is standard in the literature, though it can be challenging to enforce in practice. Theoretically, a manifold with points of higher Morse degree can be converted to one with the degree condition through a process of vertex unfolding (refer to Sect. 6.3 in [13]). This effectively converts a multi-saddle to a set of ordinary saddles. For 2D manifolds, this conversion does not increase the surface complexity. In higher dimensions, the unfolding can increase complexity if M has vertices of non-constant degree.

Some Remarks
(1) Note that if one intersects M with a given ball B, then a single contour in M might be split into more than one contour in the intersection. In particular, two (f(x) + δ)-contours of f|B_ε(x), given by Definition 2.3, might actually be the same contour in M. Alternatively, one can define the up-degree (as opposed to the Morse up-degree) as the number of (f(x) + δ)-contours (in the full M) that intersect B_ε(x), a potentially smaller number. This up-degree is exactly the up-degree of x in C(M). (Analogously for down-degree.) When the Morse up-degree is 2 but the up-degree is 1, the topology of the level set changes, but not the number of connected components. When d = 2, this distinction is not necessary, since any point with Morse degree strictly greater than 1 will have degree strictly greater than 1 in C(M).
(2) As Carr points out in Chapter 6 of his thesis, the term contour tree can be used for a family of related structures. Every regular vertex in M is associated with an edge in C(M), and sometimes the vertex is explicitly placed in C(M) (by subdividing the respective edge). This is referred to as augmenting the contour tree, and it is common to augment C(M) with all vertices. Alternatively, one can smooth out all vertices of up-degree and down-degree 1 to get the unaugmented contour tree. (For d = 2, there are no such vertices in C(M).) The contour tree of Definition 2.5 is the typical definition in all results on output-sensitive contour trees. Theorem 1.3 is applicable for any augmentation of C(M) with a predefined set of vertices, though we will not delve into these aspects in this paper.

A Tour of the New Contour Tree Algorithm
We provide a high-level description of the main ideas of subsequent sections.
Do not globally sort: The starting point for this work is Fig. 1. We have two terrains with exactly the same contour tree, but different orderings of (heights of) the critical points. Turning it around, we cannot deduce the full height ordering of critical points from the contour tree, and so sorting all critical points is computationally unnecessary. In Fig. 2, the contour tree consists of two balanced binary trees, one of the joins, another of the splits. Again, it is not necessary to know the relative ordering between the mounds on the left (or among the depressions on the right) to compute the contour tree. Yet some ordering information is necessary: on the left, the little valleys are higher than the big central valley, and this is reflected in the contour tree. Leaf paths in the contour tree have points in sorted order, but incomparable points in the tree are unconstrained. How do we sort exactly what is required, without knowing the contour tree in advance?
Join and split trees: The standard contour tree algorithm of Carr et al. [7] builds two different trees, called the join and split trees, and then merges them together to get the contour tree. Consider sweeping down the hyperplane x_{d+1} = h and taking the superlevel sets. These are the connected components of the portion of M above height h. For a terrain, the superlevel sets are a collection of "mounds". As we sweep downwards, these mounds keep joining each other, until finally, we end up with all of M. The join tree, J(M), tracks exactly these events. Formally, let M^+_v denote the simplicial complex induced on the subset of vertices higher than v. Refer to Fig. 2 for the join tree of a terrain. Note that nothing happens at splits, but these are still put as vertices in the join tree. The split tree is obtained by simply inverting this procedure, sweeping upwards and tracking sublevel sets.
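For contrast with what follows, the classical join-tree sweep can be sketched in a few lines. This is a hedged simplification operating only on the 1-skeleton, not the authors' exact procedure; `verts`, `height`, and `adj` are illustrative names.

```python
# Sketch of the classical join-tree sweep (in the spirit of Carr et al.):
# process vertices from highest to lowest, maintain superlevel-set
# components with union-find, and record an edge whenever components join.
class DSU:
    def __init__(self, items):
        self.parent = {x: x for x in items}
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def join_tree(verts, height, adj):
    order = sorted(verts, key=lambda v: -height[v])  # the global sort this paper avoids
    dsu, processed, edges = DSU(verts), set(), []
    lowest = {v: v for v in verts}                   # lowest swept vertex per component
    for v in order:
        for u in adj[v]:
            if u in processed and dsu.find(u) != dsu.find(v):
                edges.append((lowest[dsu.find(u)], v))  # component joins at v
                dsu.union(u, v)
                lowest[dsu.find(v)] = v
        processed.add(v)
    return edges  # join-tree edges, directed from higher child to lower parent
```

The sort on the first line of `join_tree` is exactly the Ω(t log t) bottleneck the rest of the paper works to avoid.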
A major insight of [7] is an ingeniously simple linear time procedure to construct the contour tree from the join and split trees. So the bottleneck is computing these trees. Observe in Fig. 2 that the split vertices form a long path in the join tree (and vice versa). Therefore, constructing these trees forces a global sort of the splits, an unnecessary computation for the contour tree. Unfortunately, in general (i.e. unlike Fig. 2) the heights of joins and splits may be interleaved in a complex manner, and hence the final merging of [7] to get the contour tree requires having the split vertices in the join tree. Without this, it is not clear how to get a consistent view of both joins and splits, required for the contour tree.
Our aim is to break M into smaller pieces, where this unnecessary computation can be avoided.
Cutting M into extremum dominant pieces: We define a simplicial complex endowed with a height to be extremum dominant if there exists only a single minimum, or only a single maximum. (When formally defined in Sect. 6, extremum dominant complexes will allow additional trivial minima or maxima which are a small complication resulting from the following cutting procedure.) We first cut M into disjoint extremum dominant pieces in linear time. Take an arbitrary maximum x and imagine torrential rain at the maximum. The water flows down, wetting any point that has a non-ascending path from x. We end up with two portions, the wet part of M and the dry part. This is similar to watershed algorithms used for image segmentation [21]. The wet part is obviously connected and can be shown to be extremum dominant. We can cut along the interface of the wet and dry, remove the wet part, and recurse on the dry parts. This is carefully done to ensure that the total complexity of the pieces is linear (see Sect. 6 for details).
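The wetting step can be sketched as a plain graph traversal. This is a minimal sketch under the simplifying assumption that we only track vertices of the 1-skeleton (the real procedure also wets higher-dimensional faces and then cuts along the wet/dry interface); `adj` and `height` are illustrative names.

```python
from collections import deque

# Compute the "wet" vertex set reachable from a chosen maximum along
# descending paths: water poured at source_max flows only downwards.
def wet_region(source_max, adj, height):
    wet, queue = {source_max}, deque([source_max])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in wet and height[u] < height[v]:  # water only flows down
                wet.add(u)
                queue.append(u)
    return wet
```

The dry part is simply the complement; the algorithm cuts along the interface and recurses on the dry side.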
We now have carved M up into a collection of extremum dominant pieces. In Sect. 5 we prove a simple contour surgery theorem allowing us to build the contour tree of M from the contour trees of these various pieces. Specifically, we argue our cutting in M was done in such a way that each cut corresponds to cutting only a single edge in the contour tree. Informally this means gluing M back together along these cuts corresponds to gluing the contour tree together.
Ultimately, the reason for cutting M into extremum dominant pieces is that such pieces have simpler contour tree structure. Specifically, using ideas from [7], in Sect. 7 we prove that the contour tree of an extremum dominant complex is (basically) just the join tree. Thus there is no longer a disconnect between the amount of sorting required for the contour tree versus the join tree, and so the problem of computing contour trees efficiently reduces to that for join trees.
Join trees from painted mountaintops: Our main result is a faster algorithm for join trees. The key idea is paint spilling. Start with each maximum having a large can of paint, with distinct colors for each maximum. In arbitrary order, spill paint from each maximum, wait till it flows down, then spill from the next, etc. Paint is viscous, and only flows down edges. It does not paint the interior of faces with dimension strictly larger than 1. Furthermore, paints do not mix, so each edge receives a unique color, decided by the first paint to reach it. Note that the raining procedure, used above to carve the input into extremum dominant pieces, cannot be used here as it wets interiors of higher dimensional faces and so raining from each maximum may significantly increase the input complexity (see Fig. 4). On the other hand, the interface between two different colors is not a contour, and so the join trees of each color class cannot just be simply glued together.
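A minimal sketch of the painting procedure, assuming illustrative names `maxima`, `adj`, `height` for the 1-skeleton: each maximum spills a distinct color in arbitrary order, paint flows down edges only, and an edge keeps the first color that reaches it (a descending BFS that never re-explores a colored edge).

```python
from collections import deque

def paint(maxima, adj, height):
    color = {}  # frozenset({u, v}) -> color id of edge {u, v}
    for c, m in enumerate(maxima):
        queue = deque([m])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                e = frozenset((v, u))
                # paint flows down an edge only if no earlier color claimed it
                if height[u] < height[v] and e not in color:
                    color[e] = c
                    queue.append(u)
    return color
```

Since each edge is explored at most once per endpoint, the whole painting takes linear time in the size of the 1-skeleton.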
Our algorithm incrementally builds J(M) from the leaves (maxima) to the root (dominant minimum), by repeatedly merging colors and components together. Consider the left part of Fig. 3: there are "mounds" corresponding to leaves ℓ_1 and ℓ_2 that merge at a valley v. Suppose this was the entire input, and ℓ_1 was colored blue and ℓ_2 was colored red. Both mounds are colored completely blue or red, while v is touched by both colors. So this indicates that v joins the blue maximum and red maximum in J(M). This is precisely how we hope to exploit the information in the painting. We prove later that when some join v has exactly two colors among all incident edges with v as the lower endpoint, the corresponding maxima (of those colors) are exactly the children of v in J(M). To proceed further, we "merge" the colors red and blue into a new color, purple. In other words, we replace all red and blue edges by purple edges. This indicates that the red and blue maxima have been handled. Imagine flattening the red and blue mounds until reaching v, so that the former join v is now a new maximum, from which purple paint is poured. In terms of J(M), this is equivalent to removing leaves ℓ_1 and ℓ_2, and making v a new leaf. Alternatively, J(M) has been constructed up to v, and it remains to determine v's parent. The merging of the colors is not explicitly performed, as that would be too expensive; instead we maintain a union-find data structure for it. So the red and blue mounds are not actually recolored purple; instead, say, red now points to blue in the union-find data structure.
Of course, things are more complicated when there are other mounds. There may be a yellow mound, corresponding to ℓ_3, that joins with the blue mound higher up at some vertex u (see the right part of Fig. 3). In J(M), ℓ_1 and ℓ_3 are sibling leaves, and ℓ_2 is a sibling of some ancestor of these leaves. So we cannot merge red and blue until yellow and blue merge. Naturally, we use priority queues to handle this. We know u must also be touched by blue, so all critical vertices touched by blue are put in a priority queue keyed by height, and vertices are handled in that order.
What happens when finally blue and red join at v? We merge the two colors, but now have blue and red queues of critical vertices, which also need to be merged to get a consistent painting. This necessitates using a priority queue with efficient merges. Specifically, we use binomial heaps [27], as they provide logarithmic time merges and deletes (though other similar heaps work). We stress that the feasibility of the entire approach hinges on the use of such an efficient heap structure.
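The paper uses binomial heaps; as a stand-in illustration of a mergeable max-heap with logarithmic merge and delete-max, here is a tiny leftist heap (a different structure than the authors' choice, shown only because it fits in a few lines).

```python
# Leftist max-heap: merge is the primitive; delete-max is one merge.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.s = key, None, None, 1

def merge(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a.key < b.key:                        # keep the larger key on top (max-heap)
        a, b = b, a
    a.right = merge(a.right, b)
    if a.left is None or a.left.s < a.right.s:
        a.left, a.right = a.right, a.left    # restore the leftist property
    a.s = (a.right.s if a.right else 0) + 1  # null-path length
    return a

def delete_max(h):
    return h.key, merge(h.left, h.right)     # pop the root, merge its children
```

Both operations run in O(log n) time, which is all the contour tree algorithm needs from its heaps.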
In this discussion, we ignored an annoying problem. Vertices may actually be touched by numerous colors, not just one or two as assumed above. A simple solution would be to insert vertices into heaps corresponding to all colors touching it. But there could be super-constant numbers of copies of a vertex, and handling all these copies would lead to extra overhead. We show that it suffices to simply put each vertex v into at most two heaps, one for each "side" of a possible join. We are guaranteed that when v needs to be processed, all edges incident from above have at most two colors, because of all the color merges that previously occurred.
Running time intuition: All non-heap operations can be easily bounded by O(tα(t) + N ). One can argue any heap always contains a subset of a single leaf to root path. Bounding each heap by the size of this path suffices to prove the bound of Theorem 1.1, stated for join trees. However, this may grossly overestimate heap sizes, as each vertex on a path can have at most two colors (the ones getting merged). Thus as we get closer to the root the competition for which two colors a vertex gets assigned to grows, since the number of descendant leaf colors grows. This means for some vertices the size of the heap will be significantly smaller during an associated heap operation.
The intuition is that the paint spilling from the maxima in the simplicial complex, corresponds to paint spilling from the leaves in the join tree, which decomposes the join tree into a set of colored paths. Unfortunately, the situation is more complex since while a given color class is confined to a single leaf to root path, it may not appear contiguously on this path, as the right part of Fig. 3 shows. Specifically, in this figure the far left saddle (labeled i) is hit by blue paint. However, there is another saddle on the far right (labeled j) which is not hit by blue paint. Since this far right saddle is slightly higher than the far left one, the merge event involving the component containing the blue mound (and also the yellow and red mounds) and corresponding to the far right saddle will occur before the one corresponding to the far left saddle. Hence, the vertices initially touched by blue are not contiguous in the join tree. This non-contiguous complication along with the fact that heap sizes keep changing as color classes merge, causes the analysis to be technically challenging. We employ a variant of heavy path decompositions, first used by Sleator and Tarjan for analyzing link/cut trees [23]. The final analysis charges expensive heap operations to long paths in the decomposition, resulting in the bound stated in Theorem 1.3.
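For reference, a standard heavy path decomposition (in the Sleator-Tarjan sense) can be computed as below. The paper's analysis uses a variant, so this is only an illustrative sketch with assumed names (`root`, `children`).

```python
# Decompose a rooted tree into vertex-disjoint paths: each node's "heavy"
# child is the one with the largest subtree; heavy edges chain into paths,
# and every light edge starts a new path.
def heavy_paths(root, children):
    size = {}
    def compute(v):                               # subtree sizes
        size[v] = 1 + sum(compute(c) for c in children.get(v, []))
        return size[v]
    compute(root)
    paths, stack = [], [(root, [root])]
    while stack:
        v, path = stack.pop()
        kids = children.get(v, [])
        if not kids:
            paths.append(path)                    # path ends at a leaf
            continue
        heavy = max(kids, key=lambda c: size[c])  # largest subtree = heavy child
        for c in kids:
            if c == heavy:
                stack.append((c, path + [c]))     # heavy edge extends the path
            else:
                stack.append((c, [c]))            # light edge starts a new path
    return paths
```

The key property charged against in the analysis: any root-to-leaf walk crosses O(log n) light edges, since each light edge at least halves the subtree size.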

Painting to Compute Join Trees
We now present the main algorithmic contribution, which is a new algorithm for computing join trees of any triangulated simplicial complex M. In subsequent sections we then show how to reduce computing the contour tree to computing the join tree by using contour surgery.
We now formally define the join and split trees as given by Carr et al. [7]. Conventionally, all edges are directed from higher to lower function value. In the following, for a vertex v, we use M^+_v to denote the simplicial complex obtained by only keeping vertices u such that f(u) > f(v). Analogously, define M^-_v. Note that M^+_v may contain numerous connected components. Some basic facts about these trees: all outdegrees in J(M) are at most 1, all indegree-2 vertices are joins, all leaves are maxima, and the global minimum is the root. All indegrees in S(M) are at most 1, all outdegree-2 vertices are splits, all leaves are minima, and the global maximum is the root. As these trees are rooted, we can use ancestor-descendant terminology. Specifically, for two adjacent vertices u and v, u is the parent of v if u is closer to the root (i.e. each node has at most one parent, but can have two children).

Painting
The central tool is a notion of painting M. Initially associate a color with each maximum. Imagine there being a large can of paint of a distinct color at each maximum x. We will spill different paint from each maximum and watch it flow down. So paint only flows down edges, and it does not color the interior of higher dimensional faces. Furthermore, paints do not mix, so every edge of M gets a unique color. This process (and indeed the entire algorithm) works purely on the 1-skeleton of M, which is just a graph.
A painting of M is a map χ that assigns a color to each maximum and to each edge; for such an object z, the value χ(z) is referred to as the color of z, with the following property. Consider an edge e. There exists a descending path from some maximum x to e, consisting of edges in E, such that all edges along this path have the same color as x.
An initial painting also requires that the restriction χ : X → [|X|] is a bijection.
• For any maximum x ∈ X, we say that x is both touched and fully touched by χ(x).

The Algorithm
We first give an informal, idealized description of the algorithm. That is, in the actual algorithm the details differ slightly and additional data structures are used to improve the running time.
1. Construct an initial painting of M.
2. For each color, construct a (binomial) max-heap of the critical points touched by that color, keyed by the heights of the critical points.
3. Initialize the join tree to consist of a set of isolated vertices, one for each color class.
(We now iteratively fill in the rest of the join tree from the leaves towards the root.)
4. Repeat until all heaps are empty:
(a) Find a critical point w that is at the top of all heaps containing w.
(b) Add w to the join tree by connecting it to the lowest vertex of each component of the join tree that contains a maximum whose color matches one of the heaps containing w.
(c) Delete w from all these heaps, and merge them.
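The idealized loop can be rendered as a toy program. This sketch deliberately cheats on the data structures (plain binary heaps rebuilt on merge instead of binomial heaps, so the asymptotics are wrong), but it follows the same event order; all names are illustrative.

```python
import heapq

def join_tree_from_painting(maxima, touch, height):
    # maxima: color -> its maximum vertex
    # touch: critical point -> set of initial colors touching it
    parent = {c: c for c in maxima}                # tiny union-find over colors
    def rep(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c
    heaps = {c: [] for c in maxima}                # max-heaps via negated heights
    for v, cols in touch.items():
        for c in cols:
            heapq.heappush(heaps[c], (-height[v], v))
    attach = dict(maxima)                          # att(c): lowest built vertex of c
    edges, active = [], set(maxima)
    while active:
        ripe, ripe_reps = None, None
        for r in list(active):
            if not heaps[r]:
                active.discard(r)                  # this color is exhausted
                continue
            w = heaps[r][0][1]
            reps = {rep(c) for c in touch[w]}
            if all(heaps[x] and heaps[x][0][1] == w for x in reps):
                ripe, ripe_reps = w, reps          # w tops every heap containing it
                break
        if ripe is None:
            break
        merged = []
        for x in ripe_reps:
            edges.append((attach[x], ripe))        # connect each side below the join
            while heaps[x] and heaps[x][0][1] == ripe:
                heapq.heappop(heaps[x])            # delete w (and any duplicate)
            merged += heaps[x]
            active.discard(x)
        r0 = min(ripe_reps)                        # surviving representative color
        for x in ripe_reps:
            parent[x] = r0
        heaps[r0] = merged
        heapq.heapify(heaps[r0])                   # "merge" the heaps
        attach[r0] = ripe
        active.add(r0)
    return edges
```

On the two-mound example from the tour, this produces edges from both maxima to the saddle, and from the saddle to the minimum.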
This motivates the definition of ripe vertices.
To efficiently implement the above, we need a number of data structures. Namely: binomial heaps for efficient merges; union-find to avoid explicit recoloring; and a stack to hold unripe (though ripening) vertices, which is used to amortize the cost of the otherwise expensive operation of finding ripe vertices. Note that, as we show later, it suffices to assign each vertex to only a single color from each adjacent up-star, rather than the colors of all the edges that touch it.

The Data Structures
The binomial heaps T(c): For each color c, T(c) is a subset of vertices touched by c. This is stored as a binomial max-heap keyed by vertex heights. Abusing notation, T(c) refers both to the set and the data structure used to store it.
The union-find data structure on colors: We will repeatedly perform unions of classes of colors, and this will be maintained as a standard union-find data structure. For any color c, rep(c) denotes the representative of its class.
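A minimal stand-in for this structure, with `rep(c)` named as in the text (path compression plus union by size give the inverse-Ackermann bounds cited earlier; the class name is illustrative):

```python
class ColorUF:
    def __init__(self):
        self.parent, self.size = {}, {}
    def add(self, c):                      # register a fresh color
        self.parent[c], self.size[c] = c, 1
    def rep(self, c):                      # representative, with path compression
        root = c
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[c] != root:
            self.parent[c], c = root, self.parent[c]
        return root
    def union(self, a, b):                 # merge two color classes, by size
        ra, rb = self.rep(a), self.rep(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra
```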
Color list col(v): For each vertex v, we maintain col(v) as a simple list. In addition, we will maintain another (sub)list of colors L such that ∀c ∈ L, v is not the highest vertex in T (rep(c)). Note that given c ∈ col(v) and rep(c), this property can be checked in constant time. If c is ever removed from this sublist, it can never enter it again.
For notational convenience, we will not explicitly maintain this sublist. We simply assume that, in constant time, one can determine (if it exists) an arbitrary color c ∈ col(v) such that v is not the highest vertex in T (rep(c)).
The stack K : This consists of non-extremal critical points, with monotonically increasing heights as we go from the base to the head.
Attachment vertex att(c): For each color c, we maintain a critical point att(c) of this color. We will maintain the guarantee that the portion of the join tree above (and including) att(c) has already been constructed.

The Formal Algorithm
We now give a detailed description of the main algorithm. The primary challenge is to use the data structures to rapidly find ripe vertices. A simple example of the algorithm execution is given in Appendix A.

init(M)
1. Construct an initial painting of M using a descending BFS from maxima that does not explore previously colored edges. Insert v into T(c).
5. Initialize rep(c) = c and set att(c) to be the unique maximum colored c.
6. Initialize K to be an empty stack.

Determine all critical points in
(e) Take union of cur(h) and denote resulting representative color by c. (f) Set att( c) = h and mark h as processed.

and update head h.
A few simple facts:
• At all times, the colors form a valid painting.
• Each vertex is present in at most two heaps, because the Morse up-degree of every point is at most 2. After a vertex is processed, it is removed from all heaps.
• After v is processed, all edges incident to v from above have the same (representative) color.
• Vertices on the stack are in increasing height order.

Observation 4.5
Each unprocessed vertex is always in exactly one queue per adjacent up-star. Specifically, for a given up-star of a vertex v, init(M) puts v into the queue of exactly one of the up-star's colors, say c. As time goes on this queue may merge with other queues, but while v remains unprocessed it is only ever (and always) in the queue of rep(c), since v is never added to a new queue and is not removed until it is processed. In particular, finding the queues of a vertex in update(K) requires at most two union-find operations (assuming each vertex records its two colors from init(M)).

Proving Correctness
Our main workhorse is the following lemma. The current color of an edge e refers to the value of rep(χ(e)), where χ(e) is the color of e from the initial painting.

Lemma 4.6 Let v be an unprocessed vertex, let P be a connected component of M⁺_v, and let c be the current color of some edge e in P. Then either all edges in P are currently colored c, or there is a critical vertex w ∈ P that is currently fully touched by c and touched by some other color.

Proof Since e has current color c, there must exist vertices in P currently touched by c. Consider the highest vertex w in P that is currently touched by c and some other color. If no such vertex exists, then all edges incident to a vertex currently touched by c are currently colored c, and by walking through P we deduce that all edges are currently colored c.

So assume w exists. Take the (f(w) + δ)-contour φ that intersects B_ε(w) and intersects some currently c-colored edge incident to w. Note that all edges intersecting φ are also currently colored c, since w is the highest vertex currently touched by c and some other color. (Take the path of currently c-colored edges from the maximum to w; for any point on this path, the contour passing through it must be currently colored c.) Hence c currently fully touches w. But w is currently touched by another color, and the corresponding edge cannot intersect φ. So w must have up-degree 2 and is critical.

Corollary 4.7 As long as unprocessed vertices remain, update(K) terminates with a ripe vertex at the head of the stack.

Proof update(K) is only called if there are unprocessed vertices remaining, and so by the time we reach step 3 of update(K), the stack has some unprocessed vertex h on it. If h is ripe, then we are done, so suppose otherwise.

Let P be one of the components of M⁺_h. By construction, h was put in the heap of some initially adjacent color c, and therefore h must be in the current heap of rep(c) (see Observation 4.5). Now by Lemma 4.6, either all edges in P are currently colored rep(c), or there is some vertex w fully touched by rep(c) and touched by some other color. The former case implies that if there are any unprocessed vertices in P then they are all in T(rep(c)); then h is not the highest vertex of this heap, and a new, higher unprocessed vertex will be put on the stack for the next iteration of the while loop. Otherwise, all the vertices in P have been processed. However, it cannot be that all vertices in all components of M⁺_h have been processed, since this would imply that h is ripe, and so the same argument applies to a component that is not fully processed.

Now consider the latter case, where we have a non-monochromatic vertex w. Then w cannot have been processed (after being processed, a vertex is touched by only one color), and so it must be in T(rep(c)), since it must be in some heap of a color in each of its up-stars, and one of its up-stars is entirely colored rep(c). As w lies above h in M, this implies that h is not at the top of this heap.

Claim 4.8 Suppose v is ripe. Then for each up-star of v, all edges of that up-star and of the corresponding component of M⁺_v have the same current color.

Proof Let c be the color of some edge in this up-star. By ripeness, v is the highest vertex in T(rep(c)). Denote the corresponding component of M⁺_v by P. By Lemma 4.6, either all edges in P are currently colored rep(c), or there exists a critical vertex w ∈ P fully touched by rep(c) and touched by another color. In the latter case, w has not been processed, so w ∈ T(rep(c)), contradicting the ripeness of v. Therefore, all edges in P are currently colored rep(c).

Claim 4.9
The partial output on the processed vertices is exactly the restriction of J(M) to these vertices.

Proof More generally, we prove the following: every output on a processed vertex is an edge of J(M), and for any representative color c, att(c) is the lowest processed vertex of that representative color. We prove this by induction on the processing order. The base case is trivially true: initially, the processed vertices and the attachments of the color classes are exactly the set of maxima. For the induction step, consider the situation when v is being processed.

Since v is being processed, we know by Corollary 4.7 that it is ripe. Take any up-star of v, and the corresponding component P of M⁺_v that it connects to. By Claim 4.8, all edges in P and in the up-star have the same current color, say c. If some critical vertex in P were not processed, it would be in T(c), violating the ripeness of v. Thus all critical vertices in P have been processed, and so by the induction hypothesis, the restriction of J(M) to P has been correctly computed. Additionally, since all critical vertices in P have been processed, they all have the same representative color c, and by the strengthened induction hypothesis, the lowest critical vertex in P is att(c).

If there is another component of M⁺_v, the same argument implies that the lowest critical vertex in this component is att(c′) (where c′ is the current color of the edges in that component). Now, by the definition of J(M), the critical vertex v connects to the lowest critical vertex in each component of M⁺_v, and so by the above, v should connect to att(c) and att(c′), which is precisely what v is connected to by build(M). Moreover, build merges the representative colors c and c′ and correctly sets v to be the attachment, as v is the lowest processed vertex of the merged color (by induction, att(c) and att(c′) were the lowest vertices before the colors merged).

Theorem 4.10 Given an input complex M, build(M) terminates and outputs J(M).

Proof First observe that each vertex can be processed at most once by build(M). By Corollary 4.7, as long as there is an unprocessed vertex, update(K) will be called and will terminate with a ripe vertex ready to be processed. Therefore, eventually all vertices are processed, and so by Claim 4.9 the algorithm terminates having computed J(M).

Running Time
We now bound the running time of the algorithm of Sect. 4.2. In subsequent sections, through a sophisticated charging argument, this bound is then related to matching upper and lower bounds in terms of path decompositions. Therefore, it will be useful to set up some terminology that can be used consistently in both places. Specifically, the path decomposition bounds will be purely combinatorial statements on colored rooted trees, and so the terminology is of this form.
Any tree T considered in the following will be a rooted binary tree, where the height of a vertex is its distance from the root r (i.e., conceptually T will be a join tree with r at the bottom). As such, the children of a vertex v ∈ T are the adjacent vertices of larger height (and v is the parent of such vertices). The subtree rooted at v, denoted T_v, consists of the graph induced on all vertices that are descendants of v (including v itself). For two vertices v and w in T, let d(v, w) denote the length of the path between v and w. We use A(v) to denote the set of ancestors of v, and for a set of nodes U, A(U) = ⋃_{u∈U} A(u).

Definition 4.11
A leaf assignment χ of a tree T assigns two distinct leaves to each internal vertex v: one from the subtree of v's left child and one from the subtree of its right child (naturally, if v has only one child it is assigned only one leaf). The following lemma should justify these definitions.
Proof First we consider the initialization procedure init(M), which runs in O(N) time. Indeed, the painting procedure consists of several BFSs, but as each vertex is explored by only one of the BFSs, it takes linear time overall. Determining the critical points is a local computation on the neighborhood of each vertex, and so is linear (i.e., each edge is viewed at most twice). Finally, each vertex is inserted into at most two heaps, so initializing the heaps takes time linear in the number of vertices.

Now consider the union-find operations performed by build and update. Initially, the union-find data structure has a singleton component for each leaf (and no new components are ever created), so each union-find operation takes O(α(t)) amortized time. For update, by Observation 4.5, each iteration of the while loop requires a constant number of finds (and no unions). Specifically, if a vertex is found to be ripe (and hence processed next), these finds can be charged to that vertex; if a vertex is not ripe, they can be charged to the vertex put on the stack. As each vertex is put on the stack or processed at most once, update performs O(t) finds overall. Finally, build(M) performs one union and at most two finds for each vertex. Therefore, the total number of union-find operations is O(t).
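The rep(c) operations charged above can be sketched with a standard union-find using path compression and union by size (the class name `ColorClasses` is our illustration, not the paper's):

```python
# Standard union-find over colors: rep(c) with path compression and
# union by size, giving the O(alpha(t)) amortized cost per operation
# used in the analysis.

class ColorClasses:
    def __init__(self, colors):
        self.parent = {c: c for c in colors}
        self.size = {c: 1 for c in colors}

    def rep(self, c):
        """Return the representative of c's class, compressing the path."""
        root = c
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[c] != root:       # second pass: compress
            self.parent[c], c = root, self.parent[c]
        return root

    def union(self, a, b):
        """Merge the classes of a and b; return the new representative."""
        a, b = self.rep(a), self.rep(b)
        if a == b:
            return a
        if self.size[a] < self.size[b]:     # union by size
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        return a
```
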
For the remaining operations, observe that for every iteration of the loop in update, a vertex is pushed onto the stack and each vertex can only be pushed onto the stack once (since the only way it leaves the stack is by being processed). Therefore the total running time due to update is linear (ignoring the find operations).
What remains is the time it takes to process a vertex v in build(M). In order to process a vertex there are a few constant time operations, union-find operations, and queue operations. Therefore the only thing left to bound are the queue operations.
Let v be a vertex in J(M), and let c_1 and c_2 be its children (the same argument holds if v has only one child). At the time v is processed, the colors and queues of all vertices in a given component of M⁺_v have merged together. In particular, when v is processed we know it is ripe, and so all vertices above v in each component of M⁺_v have been processed, implying that these merged queues are the queues of the current colors of c_1 and c_2. Again since v is ripe, it must be at the top of these queues, and so the only vertices left in them are those in H_{c_1} and H_{c_2}. Now when v is handled, three queue operations are performed: v is removed from the queues of c_1 and c_2, and then the queues are merged together. By the above arguments, the sizes of the queues for these operations are |H_{c_1}|, |H_{c_2}|, and |H_v|, respectively. As merging and deleting take logarithmic time in the heap size for binomial heaps, the claim follows.
For any critical point v, let ℓ_v denote the length of the longest directed path passing through v in the join tree. As |H_v| only counts vertices on a v-to-root path, trivially |H_v| ≤ ℓ_v, immediately implying the analog of Theorem 1.1 for join trees. Note, however, that there is a fair amount of slack in this argument, as |H_v| may be significantly smaller than ℓ_v. This slack allows for the more refined upper and lower bounds mentioned in Sect. 1.1. Quantifying this slack, however, is quite challenging, and requires a significantly more sophisticated analysis involving path decompositions, which is the subject of Sect. 8.
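As a hypothetical numerical illustration of why a short and fat tree beats the global sort (the function `bounds` and the balanced-tree instance below are ours, not the paper's): in a complete binary join tree of depth d, every directed (monotone) path has at most d + 1 vertices, so ℓ_v ≤ d + 1 for every vertex.

```python
import math

# Compare the global-sort bound n log n against the tree-sensitive bound
# sum_v log(ell_v) on a complete binary join tree of the given depth,
# where ell_v <= depth + 1 for every vertex ("short and fat" instance).

def bounds(depth):
    n = 2 ** (depth + 1) - 1                # vertices in the tree
    global_sort = n * math.log2(n)
    tree_sensitive = n * math.log2(depth + 1)
    return global_sort, tree_sensitive

gs, ts = bounds(19)                         # about a million vertices
print(gs / ts)                              # gap grows like log n / log log n
```
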

Divide and Conquer Through Contour Surgery
Now that we have an efficient join tree algorithm, we describe how to reduce computing the contour tree to computing the join tree, as discussed in Sect. 3. Specifically, in this section we first describe our basic cutting tool, which we call contour surgery. In Sect. 6 we then describe how we use this tool to cut the surface into so-called extremum dominant pieces, on which computing the contour tree reduces to computing the join tree; this equivalence is proven in Sect. 7. The pieces we create are technically not simplicial complexes, in that each piece may have facets that are not simplices, that is, facets of larger size than the dimension. These can easily be completed into simplicial complexes; as we do not need this, for ease of exposition we do not explicitly perform this step in our algorithm.
The cutting operation: We define a "cut" operation on f : M → R that cuts along a regular contour to create a new simplicial complex with an added boundary. An h-contour is a connected component of the intersection of M with the hyperplane x_{d+1} = h (think of the (d+1)-st dimension as height). Given some point x on an h-contour φ, we can walk along M from x to determine φ. Given a contour φ, roughly speaking, the cut constructs the simplicial complex M∖φ. We will always enforce the condition that φ never passes through a vertex of M. Again, we use ε for an infinitesimally small value, and we denote by φ⁺ (resp. φ⁻) the contour at height h + ε (resp. h − ε). We can "cut" along φ to get a new (possibly disconnected) simplicial complex M′. This is achieved by splitting every face F that φ intersects into an "upper" face and a "lower" face: algorithmically, we cut F with φ⁺ and take everything above φ⁺ in F to make the upper face, and analogously cut with φ⁻ to get the lower face. The faces are then triangulated to ensure that they are all simplices. This creates the two new boundaries φ⁺ and φ⁻, and we maintain the property of constant f-value on each boundary.
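The face-splitting step can be sketched in two dimensions as follows (a hypothetical illustration with height as the y-coordinate; `clip` and `split_face` are our names, not the paper's): each intersected triangular face is clipped against the levels h + ε and h − ε to produce the upper and lower faces.

```python
# 2D sketch of splitting one triangular face along a level h: clip the
# face against h + eps (keeping what is above) and against h - eps
# (keeping what is below), discarding the thin middle strip.

def clip(poly, h, keep_above):
    """Sutherland-Hodgman clip of a convex polygon against height h."""
    out = []
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        in1 = (y1 >= h) if keep_above else (y1 <= h)
        in2 = (y2 >= h) if keep_above else (y2 <= h)
        if in1:
            out.append((x1, y1))
        if in1 != in2:                      # edge crosses the level
            t = (h - y1) / (y2 - y1)
            out.append((x1 + t * (x2 - x1), h))
    return out

def split_face(tri, h, eps=1e-9):
    """Return the (upper, lower) pieces of a triangle cut at level h."""
    upper = clip(tri, h + eps, keep_above=True)
    lower = clip(tri, h - eps, keep_above=False)
    return upper, lower
```

The resulting polygons would then be triangulated, and the new boundary points all share the same (infinitesimally perturbed) height, matching the constant f-value boundary property.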
Note that by assumption φ cannot cut a boundary face, and moreover all non-boundary faces have constant size. Therefore, this process takes time linear in |φ|, the number of faces φ intersects. The resulting simplicial complex is denoted cut(φ, M). We now describe a high-level approach to construct C(M) using this cutting procedure. To prove Theorem 5.1, we require a theorem from [6] (Theorem 6.6) that maps paths in C(M) to M: for every monotone path P in M, there exists a monotone path Q in the contour tree to which P maps, and vice versa.
Theorem 5.1 is a direct consequence of the following lemma. Consider the contour class corresponding to an edge (u, v) of C(M). There is a natural ordering of the contours by function value, ranging from f(u) to f(v). All contours in this class "above" φ form a maximal contour class in M′, represented by the edge (u, φ⁺). Analogously, there is another contour class represented by the edge (φ⁻, v). We have now accounted for all contours in C(M′), completing the proof.
A useful corollary of this lemma shows that a contour actually splits the simplicial complex into two disconnected complexes.

Definition 6.1 A simplicial complex with a height function is minimum dominant if there exists a minimum x such that every non-minimal vertex in the complex has a non-ascending path to x. Maximum dominant is defined analogously.
The first aspect of the partitioning is "raining". Start at some point x ∈ M and imagine rain at x. The water flows downwards along non-ascending paths and "wets" all the points encountered. Note that this procedure considers all points of the complex, not just vertices.

Definition 6.2 Fix x ∈ M. The set of points y ∈ M such that there is a non-ascending path from x to y is denoted wet(x, M) (which in turn is represented as a simplicial complex). A point z is at the interface of wet(x, M) if every neighborhood of z has non-trivial intersection with wet(x, M) (i.e., the intersection is neither empty nor the entire neighborhood).
The following claim gives a description of the interface.

Claim 6.3 For any x, each non-empty component of the interface of wet(x, M) contains a join vertex.
Proof If p ∈ wet(x, M), all the points in any contour containing p are also in wet(x, M). (Follow the non-ascending path from x to p and then walk along the contour.) The converse is also true, so wet(x, M) contains entire contours.
Let ε, δ be sufficiently small, as usual. Fix some y at the interface. Note that y ∈ wet(x, M) (otherwise, B_ε(y) ∩ M would be dry, contradicting that y is at the interface). The points in B_ε(y) ∩ M that lie below y have a descending path from y and hence must be wet. There must also be a dry point in B_ε(y) ∩ M above y, and hence there exists a dry, regular (f(y) + δ)-contour φ intersecting B_ε(y).
Let γ be the contour containing y. Suppose for contradiction that every p ∈ γ has up-degree 1 (see Definition 2.3). Consider the non-ascending path from x to y, and let z be the first point of γ encountered. There exists a wet, regular (f(y) + δ)-contour ψ intersecting B_ε(z). Now walk from z to y along γ. If all points w on this walk have up-degree 1, then ψ is the unique (f(y) + δ)-contour intersecting B_ε(w). This would imply that φ = ψ, contradicting the fact that ψ is wet and φ is dry.
Note that wet(x, M) (and its interface) can be computed in time linear in the size of the wet simplicial complex. We perform a non-ascending search from x. Any face F of M encountered is partially (if not entirely) in wet(x, M). The wet portion is determined by cutting F along the interface. Since each component of the interface is a contour, this is equivalent to locally cutting F by a hyperplane. All these operations can be performed to output wet(x, M) in time linear in |wet(x, M)|.
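A vertex-level simplification of this search can be sketched as follows (the paper's procedure additionally cuts faces along the interface; `graph` and `height` are assumed adjacency and height maps, not structures from the paper):

```python
# Simplified sketch of computing wet(x, M) at the vertex level: a BFS
# from x that only follows non-ascending edges and collects the wet
# vertices. Runs in time linear in the size of the wet region explored.

from collections import deque

def wet_vertices(x, graph, height):
    wet = {x}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            # water flows downward: only traverse non-ascending edges
            if height[v] <= height[u] and v not in wet:
                wet.add(v)
                queue.append(v)
    return wet
```
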
We define a simple lift operation on the interface components. Consider such a component φ containing a join vertex y. Take any dry increasing edge incident to y, and pick the point z on this edge at height f(y) + δ (where δ is infinitesimal, but larger than the value ε used in the definition of cut). Let lift(φ) be the unique contour through the regular point z. Note that lift(φ) is dry. The following claim follows directly from Theorem 5.4.

Claim 6.4 The component of cut(M, lift(φ)) that does not contain x is dry.

Fig. 4 On the left, downward rain spilling only (each shade of gray represents a piece created by a different spilling), producing a grid; here raining was done in sorted order of the maxima (lowest to highest). On the right, flipping the direction of rain spilling.

Proof By Theorem 5.4, cut(M, lift(φ)) results in two disjoint simplicial complexes. Let N be the complex containing the point x (the argument in wet(x, M)), and let N′ be the other complex. Any path from x to N′ must intersect lift(φ), which is dry. Hence N′ is dry.
We now describe the main partitioning procedure, which cuts a simplicial complex N into extremum dominant complexes, shown below as rain(x, N). It takes as additional input a maximum x. To initialize, we begin with N set to M and x an arbitrary maximum. When we start, rain flows downwards; in each recursive call, the direction of rain is switched to the opposite direction. This is crucial to ensure a linear running time, and Fig. 4 shows how failing to switch directions can lead to a blow-up in complexity. The switching is easily implemented by inverting a complex N, achieved by negating the height values; we can then let rain flow downwards, as it usually does in our world.

Lemma 6.5 Each output M_i is extremum dominant.
Proof Consider a call to rain(x, N). If the interface is empty, then all of N is in wet(x, N), so N is trivially extremum dominant. So suppose the interface is non-empty and consists of φ_1, …, φ_k (as denoted in the procedure). By repeated applications of Claim 6.4, N_{k+1} contains wet(x, M). Consider wet(x, N_{k+1}); its interface is exactly φ_1, …, φ_k. So the only dry vertices of N_{k+1} are those on the boundaries created by calling cut for each φ_i, but such boundary vertices are all maxima.
As rain(x, M) proceeds, new faces/simplices are created by the repeated cutting. The key to the running time of rain(x, M) is bounding the number of newly created faces, for which we have the following lemma.

Lemma 6.6 A face F ∈ M is cut at most once during rain(x, M).
Proof Notation here follows the pseudocode of rain. First, by Theorem 5.4, all the pieces on which rain is invoked are disjoint. Second, all recursive calls are made on dry complexes.
Consider the first time that F is cut, say during the call to rain(x, N), and specifically when cut(N_i, φ_i) is constructed. cut(N_i, φ_i) cuts F with two horizontal cutting planes, one ε above φ_i and one ε below φ_i. This breaks F into lower and upper portions, which are then triangulated (there is also a discarded middle portion). The lower portion, which is adjacent to φ_i, is included in N_{k+1}, the complex containing the wet points, and hence does not participate in any later recursive calls. The upper portion (call it U) is in L_i. Note that the lower boundary of U is in the boundary B_i. Since a recursive call is made to rain(B_i, L_i) (and L_i is inverted), U becomes wet. Hence U, and correspondingly F, will not be subsequently cut.
The following are direct consequences of Lemma 6.6 and the surgery procedure.

Theorem 6.7 The total running time of rain(x, M) is O(|M|).
Proof The only non-trivial operations performed are wet and cut. Since cut is a linear-time procedure, Lemma 6.6 implies that the total time for all calls to cut is O(|M|). As for the wet procedure, observe that Lemma 6.6 additionally implies that only O(|M|) new faces are created by rain. Therefore, since wet is also a linear-time procedure and no face is ever wet twice, the total time for all calls to wet is O(|M|).

Proof Consider the tree of recursive calls in rain(x, M), with each node labeled with some M_i. Walk through this tree in a leaf-first ordering. Each time we visit a node, we connect its contour tree to the contour trees of its children in the tree using the surgery procedure. Each surgery call takes constant time, and the total time is the size of the recursion tree.

Contour Trees of Extremum Dominant Complexes
The previous section allows us to restrict attention to extremum dominant complexes. We will orient so that the extremum in question is always a minimum, and fix such a simplicial complex M with dominant minimum m*.
Recall the definitions of join and split trees from Sect. 4. The main theorem of this section asserts that contour trees of minimum dominant complexes have a simple description in terms of J(M). First we make the key observation that S(M) is trivial for a minimum dominant M: each split vertex has one child that is simply a leaf, namely a non-dominant minimum.

Proof It suffices to prove that each split v has one child that is just a leaf, which is a non-dominant minimum. Specifically, any minimum is a leaf in S(M) and is thereby attached to a split, which implies that if we removed all non-dominant minima, we would be left with a path, as asserted above. Consider a split v. For sufficiently small ε, δ, there are exactly two (f(v) − δ)-contours φ and ψ intersecting B_ε(v); both are regular contours. There must be a non-ascending path from v to the dominant minimum m*. Consider the first edge (necessarily decreasing from v) on this path. It must intersect one of the (f(v) − δ)-contours, say φ. By Theorem 5.4, cut(M, φ) has two connected components, one of which (call it L) has φ⁻ as a boundary maximum. This component contains m*, as the non-ascending path intersects φ only once. Let the other component be called M′.
Consider cut(M′, ψ) with connected component N having ψ⁻ as a boundary. N does not contain m*, so any path from the interior of N to m* must intersect the boundary ψ⁻. But the latter is a maximum in N, so there can be no non-ascending path from the interior to m*. Since M is overall minimum dominant, the interior of N can only contain a single vertex w, a non-dominant minimum.
The split v has two children in S(M), one in N and one in L. The child in N can only be the non-dominant minimum w, which is a leaf.
It is convenient to denote the non-dominant minima by m_1, m_2, …, m_k and the corresponding splits (as given by the lemma above) by s_1, s_2, …, s_k.
Using the above lemma we can now prove that computing the contour tree for a minimum dominant complex amounts to computing its join tree. Specifically, to prove our main theorem, we rely on the correctness of the merging procedure from [7] that constructs the contour tree from the join and split trees. It actually first constructs the augmented contour tree A(M), which is obtained by replacing each edge in the contour tree with a path of all regular vertices (sorted by height) whose corresponding contour belongs to the equivalence class of that edge.
Consider a tree T with a vertex v of in- and out-degree at most 1. Erasing v from T is the following operation: if v is a leaf, just delete v; otherwise, delete v and connect its two neighbors by an edge (i.e., smooth v out). The resulting tree is denoted T ∖ v. The merging procedure of [7] is shown below as merge(J(M), S(M)). This procedure builds A(M) by iteratively adding edges adjacent to leaves of either J(M) or S(M), erasing the corresponding vertex from both J(M) and S(M), and repeating.

C(M) is then produced by erasing all regular vertices in A(M).
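The peeling loop can be sketched as follows (a hedged illustration in the spirit of [7], not the authors' implementation; each tree is assumed to be given as a parent map, with the join-tree parent of a vertex being its neighbor below and the split-tree parent its neighbor above):

```python
# Sketch of merge(J, S): repeatedly peel a vertex that is an upper leaf
# (a leaf of the join tree and a non-branching vertex of the split tree)
# or, symmetrically, a lower leaf; record the corresponding edge of A(M)
# and erase the vertex from both trees.

def merge_trees(J, S):
    verts = set(J)
    up_deg = {v: 0 for v in verts}      # children in J (vertices above)
    down_deg = {v: 0 for v in verts}    # children in S (vertices below)
    for v in verts:
        if J[v] is not None:
            up_deg[J[v]] += 1
        if S[v] is not None:
            down_deg[S[v]] += 1

    def erase(T, deg, v):
        p = T.pop(v)
        had_child = False
        for w in T:
            if T[w] == v:
                T[w] = p                # smooth v out
                had_child = True
        if p is not None and not had_child:
            deg[p] -= 1                 # v was a leaf of T

    edges, remaining = [], set(verts)
    while len(remaining) > 1:
        v = next(v for v in sorted(remaining)
                 if (up_deg[v] == 0 and down_deg[v] == 1)
                 or (down_deg[v] == 0 and up_deg[v] == 1))
        u = J[v] if up_deg[v] == 0 else S[v]   # neighbor v attaches to
        edges.append(tuple(sorted((u, v))))
        erase(J, up_deg, v)
        erase(S, down_deg, v)
        remaining.discard(v)
    return edges
```

For a one-dimensional field with heights (0, 5, 2, 6, 1) along a path, the join and split trees merge back into the path itself, as expected, since contours in one dimension are single points.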
We use the fact that this procedure correctly produces the contour tree (regardless of the order in which leaves are processed), and refer the interested reader to [7] for the proof of this fact, as it is not needed for our purposes.

Remark 7.4 The above theorem, combined with the previous sections, implies that in order to get an efficient contour tree algorithm, it suffices to have an efficient algorithm for computing J_C(M). Note, however, that for minimum dominant complexes, converting between J_C and J is trivial, as J is just J_C with each non-dominant minimum m_i augmented along the edge leaving s_i. (Indeed, J_C(M) was only defined to make the above theorem statement clearer.) Thus, to get an efficient contour tree algorithm it suffices to have an efficient algorithm for computing J(M).

Putting it All Together
We now have all the pieces required to prove Theorem 1.1.
Proof of Theorem 1.1 Consider a critical point v of the initial input complex. By Theorem 5.4, this vertex appears in exactly one of the pieces output by rain. As in the statement of Theorem 1.1, let ℓ_v denote the length of the longest directed path passing through v in the contour tree of the input complex, and let ℓ′_v denote the length of the longest directed path passing through v in the join tree of the piece containing v. By Theorem 5.1, ignoring non-dominant extrema introduced by cutting (whose cost can be charged to a corresponding saddle), the join tree of each piece output by rain is isomorphic to a connected subgraph of the contour tree of the input complex, and hence ℓ′_v ≤ ℓ_v. Moreover, |H_v| only counts vertices on a v-to-root path, so trivially |H_v| ≤ ℓ′_v, implying Theorem 1.1.
Ultimately, our goal is to prove the more refined upper and lower bounds mentioned in Sect. 1.1. To achieve this, in the next section we give a careful analysis of how heap sizes evolve over the course of our join tree algorithm, rather than using the blunt bound |H_v| ≤ ℓ_v, as done in the above proof.

Leaf Assignments and Path Decompositions
In this section, we set up a framework to analyze the time taken to compute a join tree J (M) (see Definition 4.1), and consequently for contour trees by the above discussion. We adopt all notation already defined in Sect. 4.4. From here forward we will often assume binary trees are full binary trees (this assumption simplifies the presentation but is not necessary).
Let χ be some fixed leaf assignment to a rooted binary tree T, which in turn fixes all the heaps H_v. We partition the edges of T into a path decomposition, where each set of the partition (a path) is a subset of edges of T in which each vertex has degree at most 2. The following rule naturally gives such a path decomposition: for each internal vertex v ∈ T, add the edge from v to arg max_{u ∈ {v_l, v_r}} |H_u|, where v_l and v_r are the children of v (if |H_{v_l}| = |H_{v_r}|, pick one arbitrarily). This is called the maximum path decomposition, denoted P_max(T).
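The construction of P_max(T) can be sketched as follows (a simplified illustration assuming the heap sizes |H_v| are given as a map; paths are returned as vertex sequences, with the edge connecting a path to its parent path left implicit):

```python
# Sketch of the maximum path decomposition P_max(T): every internal
# vertex extends its path toward the child with the larger heap; each
# other child starts a new path. `children` maps a vertex to its (up to
# two) children, and `H` gives the heap sizes |H_v|.

def max_path_decomposition(root, children, H):
    paths = []

    def descend(v):
        path = [v]
        while children.get(v):
            # follow the child maximizing |H_child|; recurse on the rest
            kids = sorted(children[v], key=lambda c: H[c], reverse=True)
            for other in kids[1:]:
                descend(other)
            v = kids[0]
            path.append(v)
        paths.append(path)

    descend(root)
    return paths
```
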
Our main goal in this section is to prove the following theorem. We use |p| to denote the number of vertices in a path p.
We conclude this section in Sect. 8.6 by showing that proving this theorem implies our main result Theorem 1.3.

Tall and Short Paths
The paths in P_max(T) naturally define a tree of their own. Specifically, in the original tree T, contract each path down to its root. Call the resulting tree the shrub of T corresponding to the path decomposition P_max(T). Abusing notation, we simply use P_max(T) to denote the shrub as well. As a result, we use terms like 'parent', 'child', 'sibling', etc. for paths too. The shrub gives a handle on the heaps of a path. We use b(p) to denote the base of the path p, which is the vertex of p closest to the root of T, and ℓ(p) to denote the leaf of p. We use H_p to denote H_{b(p)}.
We wish to prove ∑_{v∈T} log |H_v| = O(∑_{p∈P_max} |p| log |p|). The simplest approach would be to prove that ∀p ∈ P_max, ∑_{v∈p} log |H_v| = O(|p| log |p|). This is unfortunately not true, which is why we divide paths into two categories.

Definition 8.2
For p ∈ P_max(T), p is short if |p| < |H_p|/100, and tall otherwise.
The following lemma demonstrates that tall paths can "pay" for themselves.

Lemma 8.3 If p is tall, ∑_{v∈p} log |H_v| = O(|p| log |p|).
There are some short paths that we can also "pay" for. Consider any short path p in the shrub. We refer to the tall support chain of p as the set of tall ancestors of p in the shrub that have a path to p not using any short path (i.e., a chain of tall paths adjacent to p).

Definition 8.4
A short path p is supported if at least |H_p|/100 of the vertices in H_p lie on paths in the tall support chain of p.
Let L be the set of short paths, L′ the set of supported short paths, and H the set of tall paths given by P_max(T) (notation used for the remainder of this section).
Proof Pick p ∈ L′. As we traverse the tall support chain of p, there are at least |H_p|/100 vertices of H_p that lie on these paths, encountered in a fixed order.

Consider any v ∈ H_p. Let S be the set of paths q ∈ L′ such that v ∈ H_q. We now show |S| ≤ 2 (i.e., S contains at most one path other than p). First observe that any two paths in S must be unrelated (i.e., S is an anti-chain), since paths with an ancestor-descendant relationship have disjoint tall support chains. However, any vertex v receives exactly one color from each of its two subtrees (in T), and therefore |S| ≤ 2, since any two paths that share descendant leaves in T (i.e., whose heaps track the same color) must have an ancestor-descendant relationship.

In other words, any log |H_v| appears at most twice in the above triple summation. Hence, we can bound it by O(∑_{q∈H} ∑_{v∈q} log |H_v|).

Corollary 8.6
∑_{p∈L′} ∑_{v∈p} log |H_v| = O(∑_{p∈P_max} |p| log |p|).

Proof The bound follows by applying Lemma 8.3, Claim 8.5, and then Lemma 8.3 again.

Unsupported Shrubs
So far we know that tall paths and supported short paths can be "paid for." We will also be able to pay for unsupported paths, though doing so is much trickier.
Recall that L denotes the set of short paths, L′ the set of supported short paths, and H the set of tall paths given by P_max(T). We now describe a partitioning of the unsupported paths into a forest of shrubs. Consider p ∈ L ∖ L′, and traverse the chain of ancestors from p. Eventually, we must reach another short path q. (If not, we reach the root r of P_max(T), and hence p is supported.) Insert an edge from p to q, so that q is the parent of p. This construction yields the shrub forest of L ∖ L′, denoted F, where all the roots are supported short paths and the remaining nodes are the unsupported short paths.
Let U be a shrub in F. For any node q in the shrub U, let TSC(q) denote the set of all vertices of T from all paths in the tall support chain of q. For q in U, we then define the loss to be λ(q) = TSC(q) ∩ H_q. Intuitively, if c is a child of a parent p in U, then λ(c) consists of the vertices "lost" from the heap of c to its tall support chain by the time one reaches p.
The following is a key property of unsupported paths which we will use multiple times. It is worth noting that the proof of this lemma is where the construction of P max (T ) enters the picture.

Lemma 8.7
Let U denote a connected component of F, and let q be the child of some node p in U. Then |H_p| ≥ (3/2)|H_q|.

Proof Let h(q) denote the tall path that is a child of p in P_max(T) and an ancestor of q. If no such tall path exists, then by construction p is the parent of q in P_max(T), and the following argument goes through by setting h(q) = q.
The chain of ancestors from q to h(q) consists only of tall paths. Since q is unsupported, these paths contain at most |H_q|/100 vertices of H_q. Thus, |H_{h(q)}| ≥ 99|H_q|/100.
Consider the base of h(q), which is a node w in T. Let v denote the sibling of w in T, and let u be their parent. Note that both u and v are nodes in the path p. Now, the decomposition P_max(T) puts u and v in the same path p, which implies that |H_v| ≥ |H_w|. Since p is a short path, |p| < |H_p|/100. Applying this bound, we get |H_p| ≥ (2 − δ)|H_w| (for a small constant δ > 0). Since w is the base of h(q), H_w = H_{h(q)}. We apply the bound |H_{h(q)}| ≥ 99|H_q|/100 to get |H_p| ≥ 197|H_q|/100, implying the lemma.
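Written as a single chain, the proof's inequalities combine as follows (for a sufficiently small constant δ, say δ ≤ 1/100):

```latex
|H_p| \;\ge\; (2-\delta)\,|H_w|
      \;=\; (2-\delta)\,|H_{h(q)}|
      \;\ge\; (2-\delta)\cdot\frac{99}{100}\,|H_q|
      \;\ge\; \frac{197}{100}\,|H_q|
      \;>\; \frac{3}{2}\,|H_q| .
```

Indeed, with δ = 1/100 the middle product is (199/100)(99/100) = 1.9701 ≥ 197/100.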

Corollary 8.8 Let U denote a connected component of F. Let p and d be any two nodes in U such that d is a descendant of p, and p is not the root of U. Then for any v ∈ λ(p), we have |H_v| = Ω(|H_d|).
Proof By the previous lemma, |H_d| = O(|H_p|). Note that there are at most |H_p|/100 nodes in λ(p), since these are all in the tall support chain of p (which is unsupported). Thus, for any v ∈ λ(p), |H_v| ≥ 99|H_p|/100, completing the proof.
To handle the unsupported paths, for every shrub U in F we first prune away what we call buffered sub-shrubs. In the following subsection, we show how to charge away the cost of these sub-shrubs to their tall support chains. In Sect. 8.4 we then show how to pay for the remaining pruned shrubs, which is the most difficult part of the argument. Afterwards in Sect. 8.5, it is then a straightforward task to put everything together to prove Theorem 8.1.

Pruning Away Buffered Sub-shrubs
Ultimately our goal in this subsection is to prune away subtrees from U which can be paid for by charging to their tall support chains (similarly to supported short paths). The remaining portion of the shrub will then be handled in a later subsection. We now formally define this pruning process, which is quite intuitive if one thinks of U as an actual physical shrub.

Definition 8.9
Let U denote a shrub in F. A pruning U′ of U is any subgraph obtained by some number of rounds of the following iterative process. Initially set U′ = U. In each iteration, select some subtree U_r rooted at a node r ∈ U′ and set U′ = U′\{U_r}.
The above definition does not specify which subtrees are pruned, which we now formally define.

Definition 8.10
Let U′ be a pruning of a shrub U of F. Then a rooted subtree B of U′ is called a buffered sub-shrub if it does not contain the root of U′ (i.e. it is a proper subtree), and the following conditions hold: (i) $\sum_{q \in B} |\lambda(q)| \ge \frac{1}{100} \sum_{q \in B} |q|$, and (ii) no proper rooted subtree of B satisfies condition (i) (i.e. B is minimal).
The following is quite straightforward: by pruning from the leaves towards the root, we can remove all buffered sub-shrubs from a given U.

Lemma 8.11 Any shrub U can be partitioned into {U′, B_1, B_2, ..., B_r}, where each B_i is buffered. Furthermore, no subtree of U′ is buffered.
Proof Take any reverse topological order of the nodes in U (so descendants occur before ancestors). Initialize U′ to U. We process the nodes in order. For node p, let U′_p denote the rooted subtree of p in U′. We simply check whether $\sum_{q \in U'_p} |\lambda(q)| \ge \sum_{q \in U'_p} |q|/100$. If so, we prune off U′_p, unless p is the root. (Note that U′ is redefined to be the remainder.) At the end of this process, we have the desired decomposition. Since we process in reverse topological order, the minimality condition of Definition 8.10 automatically holds. Thus, each pruned sub-shrub is buffered, and the remaining U′ has the desired property.
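The pruning procedure of Lemma 8.11 can be sketched as follows. This is our own illustration, assuming each shrub node carries precomputed values |q| and |λ(q)| (the `size` and `loss` dictionaries; names hypothetical):

```python
def prune_buffered(nodes, children, root, size, loss):
    """Decompose a shrub into buffered sub-shrubs plus a pruned remainder.

    nodes    -- the shrub's nodes in reverse topological order
                (every node appears before its ancestors)
    children -- dict: node -> list of child nodes
    size     -- dict: node -> |q|
    loss     -- dict: node -> |lambda(q)|
    Returns (remainder_nodes, buffered_subshrubs).
    """
    removed = set()
    buffered = []

    def subtree(p):
        # collect the current (unpruned) subtree rooted at p
        out, stack = [], [p]
        while stack:
            q = stack.pop()
            if q in removed:
                continue
            out.append(q)
            stack.extend(children.get(q, []))
        return out

    for p in nodes:
        if p in removed or p is root:
            continue
        sub = subtree(p)
        # the buffered test: total loss is at least 1/100 of total size
        if sum(loss[q] for q in sub) >= sum(size[q] for q in sub) / 100:
            buffered.append(sub)
            removed.update(sub)

    remainder = [q for q in nodes if q not in removed]
    return remainder, buffered
```

Processing in reverse topological order ensures minimality: when a subtree passes the test, none of its proper subtrees did, matching Definition 8.10.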
We now prove an important lemma: the cost of the heap operations corresponding to a buffered sub-shrub can be charged to its tall support chains.

Lemma 8.12 Let B be a buffered sub-shrub. Then $\sum_{p \in B} \sum_{v \in p} \log |H_v| = O(\sum_{p \in B} \sum_{v \in \lambda(p)} \log |H_v|)$.
Proof The proof is a charging argument. For every p ∈ B and v ∈ λ(p), we create 100 tokens. Each of these tokens is labeled log |H_v|, and this is called the "value" of the token. Observe that the sum of all token values is, up to a constant factor, the bound we wish to show. The set of tokens created for all v ∈ λ(p) is denoted T_p.
There is a grand total of $100 \sum_{p \in B} |\lambda(p)|$ tokens, which (by the buffered property of B) is strictly more than $\sum_{p \in B} |p|$. We will place all these tokens on the nodes in B, using the following process. We reverse topologically sort (descendants before ancestors) the nodes in B. For each p in order, we place the tokens in T_p on the nodes of the subtree B_p. Take an arbitrary token in T_p, and find the smallest (in the ordering) descendant q of p that currently has strictly fewer than |q| tokens. Place the token on q, and repeat. This continues until all descendants q have |q| tokens, or all tokens in T_p are used up. When a node q has |q| tokens, we say "q is full". We now make a useful claim: for non-root p, all tokens of T_p are used up. Suppose not. Consider the first such p in the ordering, and look at the situation when we finish processing T_p. Every descendant q of p must be full. Furthermore, all the tokens on any q ∈ B_p must have come from some T_{p′}, where p′ ∈ B_p. Thus, $\sum_{q \in B_p} |q| < |\bigcup_{p' \in B_p} T_{p'}| = 100 \sum_{p' \in B_p} |\lambda(p')|$. This violates the second condition of buffered sub-shrubs.
Let r be the root of B. We argue that T_r is not used up. Suppose the contrary. By the previous paragraph, all tokens are then used up. But this implies that $\sum_{p \in B} |p| \ge |\bigcup_{p \in B} T_p| = 100 \sum_{p \in B} |\lambda(p)|$, violating the buffered condition. Since T_r is not used up, it must be that every p is full, as r is the root and so tokens of T_r can be placed on any path that is not full. Consider any p, and let K_p be the set of tokens placed at p. Each token is labeled log |H_v|, for some v ∈ λ(p̂), where p̂ is an ancestor of p or p itself (as descendants do not place tokens on ancestors).
For any fixed vertex v on a tall path, there are at most two unsupported short paths such that v is contained in λ(·). Specifically, v is assigned exactly two colors, one from each of its child subtrees in T. Pick one of these two colors, which corresponds to some leaf in T. Any two unsupported short paths tracking the color of this leaf must be in an ancestor-descendant relationship in P_max; however, such paths have disjoint tall support chains, and hence only one such path can count v in λ(·). (Note this is the same argument as in Claim 8.5.) Thus, summing over all tall support chains, we have the following.

Paying for Pruned Shrubs: The Root Is Everything
We focus attention on the remaining shrub left by Lemma 8.11: this is a shrub U′ such that no subtree of U′ is buffered. We refer to such a shrub as buffer-pruned.
The goal of this section is to prove the following lemma.

Lemma 8.14 Let U′ denote a buffer-pruned sub-shrub of a shrub in F, and let r be its root. Then (i) for every vertex v appearing in U′, |H_v| = O(|H_r|), and (ii) $\sum_{p \in U'} |p| = O(|H_r|)$.
Lemma 8.14 asserts that the root r essentially dominates all path sizes and heap sizes in U′. Observe that part (i) of Lemma 8.14 is immediate, as it follows directly from Lemma 8.7. Proving part (ii) requires much more work.
We need the following technical claim. Let N(U′) denote the set of nodes in U′.

Claim 8.15
There exists a function α : N(U′) → R⁺ with the following properties. (We use α_p to denote the function value at p.) Let $A_p = \sum_{q \in U'_p} \alpha_q$. Then (i) $|H_p| \le A_p$, and (ii) $A_p \le |H_p| + 3\sum_{q \in U'_p} |q|$.

Proof We define α inductively, from the leaves upwards. For a leaf q in U′, set α_q = |H_q|. Consider an internal node p with children q_1, q_2, ..., q_k. The heap H_p is formed by taking the union of the H_{q_i} heaps, deleting certain vertices from them, and potentially adding some new vertices. The vertices in p and those in the λ(q_i) are deleted. Define α_p to be the number of new vertices added to H_p.
Thus, we have $|H_p| = \sum_{i \le k} |H_{q_i}| + \alpha_p - (\text{number of vertices deleted from the } H_{q_i}\text{'s})$. The number of vertices deleted is at most $2(|p| + \sum_{i \le k} |\lambda(q_i)|)$. There is a factor of two because each vertex appears in at most two heaps.
It is evident that $|H_p| \le \sum_{q \in U'_p} \alpha_q = A_p$. Unrolling the recurrence, A_p exceeds |H_p| by exactly the total number of deletions in the subtree U′_p, which is at most $2\sum_{q \in U'_p} |q| + 2\sum_{q \in U'_p, q \neq p} |\lambda(q)|$. Crucially, since U′ is buffer-pruned, $\sum_{q \in U'_p, q \neq p} |\lambda(q)| \le \sum_{q \in U'_p, q \neq p} |q|/100 \le \sum_{q \in U'_p} |q|/100$. This completes the proof.
We now come to the main charging claim. It assumes that A_p increases exponentially as we go up the tree; that property is proven afterwards by induction, using the following claim for sub-shrubs.

Claim 8.16
Fix any path p ∈ U′\{r}. Suppose that for all q, q′ ∈ U′_p where q is the parent of q′ in U′_p, we have A_q ≥ (5/4)A_{q′}. Then $\sum_{q \in U'_p} |q| \le A_p/20$.
Proof Since each q is an unsupported short path, $|q| < |H_q|/100 \le A_q/99$. We prove that $\sum_{q \in U'_p} A_q/99 \le A_p/20$ by a charge redistribution scheme. Assume that each q ∈ U′_p starts with A_q/99 units of charge. We redistribute this charge over all nodes in U′_q, and then calculate the total charge. For q ∈ U′_p, spread its charge to all nodes in U′_q proportionally to the α values. In other words, give $(A_q/99) \cdot (\alpha_{q'}/A_q)$ units of charge to each q′ ∈ U′_q.
After the redistribution, let us compute the charge deposited at a node q. Every ancestor of q in U′_p, denoted a_0, a_1, a_2, ..., a_k (with a_0 = q), contributes to the charge at q. The charge is expressed in the following equation. We use the assumption that $A_{a_i} \ge (5/4)A_{a_{i-1}}$, and hence $A_{a_i} \ge (5/4)^i A_{a_0} \ge (5/4)^i$, as a_0 is an unsupported short path and hence $A_{a_0} \ge 1$.
The total charge is at most $\sum_{q \in U'_p} \alpha_q/20 = A_p/20$.
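The displayed equation referred to above was lost in transcription, and we do not reconstruct it exactly. The geometric decay it relies on, however, is elementary: since $A_{a_i} \ge (5/4)^i$, any sum of terms proportional to $1/A_{a_i}$ is dominated by a geometric series:

```latex
\sum_{i=0}^{k} \frac{1}{A_{a_i}}
  \;\le\; \sum_{i=0}^{\infty} \Bigl(\frac{4}{5}\Bigr)^{i}
  \;=\; \frac{1}{1 - 4/5}
  \;=\; 5 .
```

This is why each node ends up holding only a constant multiple of its own α-share of the charge.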

Claim 8.17
Fix any path p ∈ U′\{r}. Let q be any path in U′_p. Then for any child path q′ of q, it holds that A_q ≥ (5/4)A_{q′}.
Proof We prove this by induction over the depth of q. When q is a leaf, the claim is vacuously true. Consider an internal node q with children q_1, q_2, ..., q_k. Applying the induction hypothesis to the sub-shrubs rooted at the q_i, we can invoke Claim 8.16 on each of them, and moreover A_q ≥ |H_q|. By Lemma 8.7, $|H_q| \ge (3/2)|H_{q_i}|$. We combine these bounds to complete the proof.
The proof of Lemma 8.14 simply combines all these claims, essentially repeating the argument of the previous claim. We apply Claims 8.17 and 8.16 to every child of the root r. Denoting the children as q_1, q_2, ..., we get that $\sum_{p \in U'_{q_i}} |p| \le A_{q_i}/20$.
Thus, $|H_r| = \Omega(A_r)$. We combine these bounds to show that $\sum_{p \in U'} |p| = O(|H_r|)$.
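Spelling out the final combination (our reconstruction; it uses that r is a short path, so |r| ≤ |H_r|/100, that $\sum_i A_{q_i} \le A_r$ by the definition of A, and the bound $|H_r| = \Omega(A_r)$ above):

```latex
\sum_{p \in U'} |p|
  \;=\; |r| + \sum_{i}\,\sum_{p \in U'_{q_i}} |p|
  \;\le\; \frac{|H_r|}{100} + \sum_{i} \frac{A_{q_i}}{20}
  \;\le\; \frac{|H_r|}{100} + \frac{A_r}{20}
  \;=\; O(|H_r|) .
```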

Proving Theorem 8.1
We split the summation into tall, supported short, and unsupported short paths.
By Corollary 8.6 and Lemma 8.3, the second and last terms can be bounded by $O(\sum_{p \in P_{\max}(T)} |p| \log |p|)$. The following claim applies the results of the previous two subsections to give the same bound on the first term, thus completing the proof of Theorem 8.1.
Proof Apply Lemma 8.11 to U to get the family of buffered sub-shrubs B(U) and the buffer-pruned remainder U′.
where the last two lines follow from Corollary 8.13 and Lemma 8.3, respectively. What remains is to bound the first term. By Lemma 8.14, |H_v| = O(|H_r|), where r is the root of U′, and furthermore $\sum_{p \in U'} |p| = O(|H_r|)$. Note that the roots in the shrub forest are supported short paths, and hence when we sum this over all U in the shrub forest we get a total of $O(\sum_{p \in P_{\max}(T)} |p| \log |p|)$, where the last equality follows by Claim 8.5 (via the same argument as in Corollary 8.6).

Our Main Result
We now show that Theorem 8.1 allows us to upper bound the running time of our join tree and contour tree algorithms in terms of path decompositions. This result for join trees easily implies our main result, Theorem 1.3, which we now restate and prove. At this point we can see what the path decomposition referenced in the theorem statement should be: it is just the union of the maximum path decompositions across the extremum dominant pieces, $P_{\max}(C(M)) = \bigcup_{i=1}^{k} P_{\max}(J(M_i))$. Since all procedures besides computing the join trees take time linear in the size of the input complex, we can therefore compute the contour tree in time $O(N + \sum_{p \in P_{\max}(C(M))} |p| \log |p|)$.

Lower Bound by Path Decomposition
We first prove a lower bound for join trees, and then generalize to contour trees. Note that the form of the theorem statements in this section differs from Theorem 1.4, as they are stated directly in terms of path decompositions. Theorem 1.4 is an immediate corollary of the final theorem of this section, Theorem 9.7.

Join Trees
We focus on terrains, so d = 2. Consider any path decomposition P of a valid join tree (i.e. of any rooted binary tree). When we say "compute the join tree", we require the join tree to be labeled with the corresponding vertices of the terrain.

Lemma 9.1 Fix any path decomposition P. There is a family of terrains, F_P, all with the same triangulation, such that $|F_P| = \prod_{p_i \in P} (|p_i| - 1)!$, and no two terrains in F_P define the same labeled join tree.
Proof We describe the basic building block of these terrains, which corresponds to a fixed path p ∈ P. Informally, a tent is an upside-down cone with m triangular faces (see Fig. 5). Construct a slightly tilted cycle of length m with its two antipodal points at heights 1 and 0, called the anchor and the trap of the tent, respectively. The remaining m − 2 vertices are evenly spread around the cycle, and heights decrease monotonically when going from the anchor to the trap. Next, create an apex vertex at some appropriately large height, and add an edge from it to each vertex in the cycle. Note that, in order to obtain a terrain, a cycle here means the vertices lie on a circle but are connected by straight line segments rather than by the curved edges of the circle. Now we describe how to attach two different tents. In this process, we glue the base of a scaled-down "child" tent onto a triangular cone face of the larger "parent" tent (see Fig. 5). Specifically, the anchor of the child tent is attached directly to a face of the parent tent at some height h. The remainder of the base of the child cone is then extended down (at a slight angle) until it hits the face of the parent. Note that this gluing process introduces some non-triangular faces, which must be triangulated in order to maintain a proper triangulated terrain.
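As a sanity check on this height structure, here is a minimal sketch of our own (it records vertex heights along one arc of the cycle only, and ignores the planar embedding entirely):

```python
def tent_heights(m, apex_height=10.0):
    """Heights of a tent's vertices along one arc of its base cycle:
    the anchor at height 1, the trap at height 0, and the vertices in
    between strictly decreasing, plus a tall apex. The geometric
    embedding (tilted circle, straight chords) is omitted here."""
    # anchor, then monotonically decreasing arc vertices, then trap
    arc = [1.0 - i / (m - 1) for i in range(m)]
    return {"apex": apex_height, "arc": arc}
```

For example, `tent_heights(5)["arc"]` gives five strictly decreasing heights from 1.0 down to 0.0, matching the monotone descent from anchor to trap; the other arc of the cycle descends symmetrically.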
The full terrain is obtained by repeatedly gluing tents. For each path p_i ∈ P, we create a tent of size |p_i| + 1. The two faces adjacent to the anchor are always empty, and the remaining faces are for gluing on other tents. (Note that tents have size |p_i| + 1 since |p_i| − 1 faces represent the joins of p_i, the apex represents the leaf, and we need two empty faces next to the anchor.) Now we glue together tents of different paths in the same way the paths are connected in the shrub P_S (see Sect. 8.1). Specifically, for two paths p, q ∈ P where p is the parent of q in P_S, we glue q onto a face of the tent for p as described above. (Naturally, for this construction to work, tents for a given path are scaled down relative to the size of the tent of their parent.) By varying the heights of the gluing, we get the family of terrains.

Fig. 5 Left: angled view of a tent. Right: a parent and child tent put together.
Observe now that the only saddle points in this construction are the anchor points. Moreover, the only maxima are the apexes of the tents. We create a global boundary minimum by setting the vertices at the base of the tent representing the root of P S all to the same height (and there are no other minima). Therefore, the saddles on a given tent will appear contiguously on a root to leaf path in the join tree of the terrain, where the leaf corresponds to the maximum of the tent (since all these saddles have a direct line of sight to this apex). In particular, this implies that, regardless of the heights assigned to the anchors, the join tree has a path decomposition whose corresponding shrub is equivalent to P S .
There is a valid instance of the described construction for any relative ordering of the heights of the saddles on a given tent. In particular, there are (|p_i| − 1)! possible orderings of the heights of the saddles on the tent for p_i, and hence $\prod_{p_i \in P} (|p_i| - 1)!$ possible terrains we can build. Each one of these functions results in a different (labeled) join tree: all saddles on a given tent appear in sorted order in the join tree, so any permutation of the heights on a given tent corresponds to a permutation of the vertices along a path in P.

Lemma 9.2 For any terrain in F_P, the number of heap operations made by our algorithm is $O(\sum_{p \in P} |p| \log |p|)$.

Proof The primary "non-determinism" of the algorithm is the initial painting constructed by init(M). We show that regardless of how paint spilling is done, the number of heap operations is bounded as above.
Consider an arbitrary order of the initial paint spilling over the surface. Consider any join on a face of some tent, which is the anchor point of some connecting child tent. The join has two up-stars, each of which has exactly one edge. Each edge connects to a maximum and must be colored by that maximum. Hence, the two colors touching this join (according to Definition 4.12) are the colors of the apexes of the child and parent tent.
Take any join v, with two children w 1 and w 2 . Suppose w 1 and v belong to the same path in the decomposition. The key is that any color from a maximum in the subtree at w 2 cannot touch any ancestor of v. This subtree is exactly the join tree of the child tent attached at v. The base of this tent is completely contained in a face of the parent tent. So all colors from the child "drain off" to the base of the parent, and do not touch any joins on the parent tent.
Hence, |H_v| is at most the size of the path in P containing v. By Lemma 4.13, the total number of heap operations is at most $\sum_v \log |H_v| = O(\sum_{p \in P} |p| \log |p|)$, completing the proof.
The following is the analogue of Theorem 1.4 for join trees, and immediately follows from the previous lemmas. Note that the phrasings of the two theorems differ, but they can be connected by observing that for any value of γ from Theorem 1.4, there is a path decomposition P of some tree T such that $\gamma = \Theta(\sum_{p \in P} |p| \log |p|)$.

Theorem 9.3 Consider a rooted tree T and an arbitrary path decomposition P of T. There is a family F_P of terrains such that any algebraic decision tree computing the join tree (on F_P) requires $\Omega(\sum_{p \in P} |p| \log |p|)$ time. Furthermore, our algorithm makes $O(\sum_{p \in P} |p| \log |p|)$ comparisons on all these instances.
Proof The proof is a basic entropy argument. Any algebraic decision tree that is correct on all of F_P must distinguish all inputs in this family. By Stirling's approximation, the depth of this tree is $\Omega(\sum_{p_i \in P} |p_i| \log |p_i|)$. Lemma 9.2 completes the proof.
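Spelled out, the entropy calculation is just Stirling's approximation applied to the family size from Lemma 9.1: a correct decision tree has at least |F_P| leaves, so its depth is at least

```latex
\log_2 |F_P|
  \;=\; \sum_{p_i \in P} \log_2\bigl((|p_i| - 1)!\bigr)
  \;=\; \Omega\Bigl(\sum_{p_i \in P} |p_i| \log |p_i|\Bigr) ,
```

where the last step uses $\log(k!) = \Theta(k \log k)$ for $k \ge 2$; paths of constant size contribute only a linear term overall.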

Contour Trees
We first generalize previous terms to the case of contour trees. In this section T will denote an arbitrary contour tree with every internal vertex of degree 3.
For simplicity, we now restrict our attention to path decompositions consistent with the raining procedure described in Sect. 6 (more general decompositions can work, but this is not needed for our purposes).

Definition 9.4 A path decomposition P(T) is called rain consistent if its paths can be obtained as follows. Perform a downward BFS from an arbitrary maximum v in T, and mark all vertices encountered. Now recursively run a directional BFS from each vertex adjacent to the current marked set. Specifically, for each BFS run, make it a downward BFS if it is at an odd height in the recursion tree and an upward BFS otherwise.
This procedure partitions the vertex set into disjoint rooted subtrees of T, based on which BFS marked each vertex. For each such subtree, now take any partition of the vertices into leaf paths.

The following is analogous to Lemma 9.1, and in particular uses it as a subroutine.

Lemma 9.5 Let P be any rain consistent path decomposition of some contour tree. There is a family of terrains, F_P, all with the same triangulation, such that $|F_P| = \prod_{p_i \in P} (|p_i| - 1)!$, and no two terrains in F_P define the same contour tree.
Proof As P is rain consistent, the paths can be partitioned into sets P_1, ..., P_k, where P_i is the set of all paths with vertices from a given BFS, as described in Definition 9.4. Specifically, let T_i be the subtree of T corresponding to P_i, and let r_i be the root vertex of this subtree. Note that the P_i sets naturally define a tree, where P_i is the parent of P_j if r_j (i.e. the root of T_j) is adjacent to a vertex in P_i.
As the set P_i is a path decomposition of a rooted binary tree T_i, the terrain construction of Lemma 9.1 for P_i is well defined. The only difference is that here the rooted tree is not a full binary tree, and so some of the (non-anchor-adjacent) faces of the constructed tents will be blank. Specifically, these blank faces correspond to the adjacent children of P_i, and they tell us how to connect the terrains of the different P_i's.
So for each P_i, construct a terrain as described in Lemma 9.1. Now each T_i is (roughly speaking) a join or a split tree, depending on whether the BFS which produced it was an upward or a downward BFS, respectively. As the construction in Lemma 9.1 was for join trees, each terrain constructed for a P_i which came from a split tree must be flipped upside down. Now we describe how to glue the terrains together.
By construction, the tents corresponding to the paths in P_i are connected into a tree structure (i.e. corresponding to the shrub of P_i). Therefore the bottoms of all these tents are covered, except for the one corresponding to the path containing the root r_i. If r_i corresponds to the initial maximum from which the rain consistent path decomposition was defined, then this bottom is flat and corresponds to the global outer face. Otherwise, P_i has some parent P_j, in which case we connect the bottom of the tent for r_i to a free face of a tent in the construction for P_j; specifically, the face corresponding to the vertex in T to which r_i is adjacent. This gluing is done in the same manner as in Lemma 9.1, attaching the anchor for the root of P_i directly to the corresponding face of P_j, except that now P_i and P_j have opposite orientations. See Fig. 6.
Just as in Lemma 9.1, we now have one fixed terrain structure such that each relative ordering of the heights of the join and split vertices on each tent produces a surface with a distinct contour tree. The specific bound on the size of F_P follows by applying the bound from Lemma 9.1 to each P_i.
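Recalling Definition 9.4, the alternating directional BFS that underlies these decompositions can be sketched as follows. This is our own illustration on an abstract tree with a height per vertex; a downward BFS only crosses edges to lower neighbors, an upward BFS only to higher ones, and each blocked neighbor seeds a run in the opposite direction:

```python
from collections import deque

def rain_partition(adj, height, start):
    """Partition the vertices of a tree T into groups, one per
    directional BFS run, alternating directions as in the
    rain-consistent decomposition (a sketch; names are ours).

    adj    -- dict: vertex -> list of neighbors in T
    height -- dict: vertex -> function value f(vertex)
    start  -- the initial maximum, seeding a downward BFS
    Returns dict: vertex -> id of the BFS run that marked it.
    """
    marked = {start: 0}
    runs = deque([(start, True, 0)])  # (seed, is_downward, run id)
    next_id = 1
    while runs:
        seed, down, rid = runs.popleft()
        frontier = deque([seed])
        while frontier:
            v = frontier.popleft()
            for w in adj[v]:
                if w in marked:
                    continue
                if (height[w] < height[v]) == down:
                    # w is reachable in this run's direction
                    marked[w] = rid
                    frontier.append(w)
                else:
                    # w seeds a new run in the opposite direction
                    marked[w] = next_id
                    runs.append((w, not down, next_id))
                    next_id += 1
    return marked
```

The groups returned correspond to the rooted subtrees T_i of Definition 9.4, with each seed playing the role of the root r_i.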

Lemma 9.6
For all M ∈ F_P, the number of heap operations is $O(\sum_{p \in P} |p| \log |p|)$.
Proof This lemma follows immediately from Lemma 9.2. The heap operations can be partitioned into the operations performed in each P i . Apply Lemma 9.2 to each of the P i separately and take the sum.
We now restate Theorem 1.4, which follows immediately from an entropy argument analogous to Theorem 9.3. Again, as discussed above for Theorem 9.3, the phrasing differs from Theorem 1.4.

Theorem 9.7 Consider any rain consistent path decomposition P. There exists a family F_P of terrains (d = 2) with the following properties. Any contour tree algorithm makes $\Omega(\sum_{p \in P} |p| \log |p|)$ comparisons in the worst case over F_P. Furthermore, for any terrain in F_P, our algorithm makes $O(\sum_{p \in P} |p| \log |p|)$ comparisons.

Remark 9.8 Note that for the terrains described in this section, the number of critical points is within a constant factor of the total number of vertices. In particular, for this family of terrains, all previous algorithms required $\Omega(n \log n)$ time.

The table assumes that in the stack update, s_1 is the first arbitrary unprocessed critical point selected. Note that in Table 1, a "stage" refers to any time at which one of the data structures is modified. For simplicity, the trivial stages where the maxima are removed from their corresponding heaps are not shown. (Alternatively, the algorithm in Sect. 4.2 could be modified to exclude the maxima from the heaps, but we wanted to keep the algorithm description simple.)