Abstract 1 Introduction 2 Preliminaries 3 The family of support networks and its two subfamilies 4 Counting the support networks of three types 5 Finding a support network with the fewest reticulations 6 Finding a support network with the minimum level 7 Conclusion and future directions References Appendix A Appendix

Which Phylogenetic Networks Are Level-k Networks with Additional Arcs? Structure and Algorithms

Takatora Suzuki ORCID Department of Pure and Applied Mathematics, Graduate School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan Momoko Hayamizu111Corresponding author ORCID Department of Applied Mathematics, Faculty of Science and Engineering, Waseda University, Tokyo, Japan
Abstract

Reticulate evolution gives rise to complex phylogenetic networks, making their interpretation challenging. A typical approach is to extract trees within such networks. Since Francis and Steel’s seminal paper, “Which Phylogenetic Networks are Merely Trees with Additional Arcs?” (2015), tree-based phylogenetic networks and their support trees (spanning trees with the same root and leaf-set as a given network) have been extensively studied. However, not all phylogenetic networks are tree-based, and for the study of reticulate evolution, it is often more biologically relevant to identify support networks rather than trees. This study generalizes Hayamizu’s structure theorem, which yielded optimal algorithms for various computational problems on support trees of rooted almost-binary phylogenetic networks, to extend the theoretical framework for support trees to support networks. This allows us to obtain a direct-product characterization of each of three sets: all, minimal, and minimum support networks, for a given network. Each characterization yields optimal algorithms for counting and generating the support networks of each type. Applications include a linear-time algorithm for finding a support network with the fewest reticulations (i.e., the minimum tier). We also provide exact and heuristic algorithms for finding a support network with the minimum level, both running in exponential time but practical across a reasonably wide range of reticulation numbers.

Keywords and phrases:
Phylogenetic networks, Support networks, Level-k networks, Tier-k networks, Structure theorem, Enumeration, Optimization
Funding:
Takatora Suzuki: JST SPRING Program Grant Number JPMJSP2128, Japan.
Momoko Hayamizu: JST FOREST Program Grant Number JPMJFR2135, Japan.
Copyright and License:
[Uncaptioned image] © Takatora Suzuki and Momoko Hayamizu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Applied computing Biological networks
Related Version:
Previous Version (Preprint): https://doi.org/10.48550/arXiv.2505.11947
Supplementary Material:
Software  (Source Code, Datasets and Detailed Experimental Results): https://github.com/hayamizu-lab/structure-support-networks
  archived at Software Heritage Logo swh:1:dir:a40004b85200c9d46525feb46dc740a1f83ab3eb
Acknowledgements:
The authors thank Tsuyoshi Urata, Haruki Miyaji, and Yukihiro Murakami for their discussions, Manato Yokoyama for help with the GitHub repository, and anonymous reviewers for their careful reading and useful comments.
Editors:
Broňa Brejová and Rob Patro

1 Introduction

Evolutionary histories involving reticulate events, such as horizontal gene transfer, hybridization, and recombination, are better represented by networks than trees. However, networks can be far more complex than trees, making their interpretation challenging. A typical approach is to find meaningful subgraphs, particularly spanning trees, of these networks.

Since Francis and Steel’s seminal paper “Which Phylogenetic Networks are Merely Trees with Additional Arcs?” [8], tree-based networks and their support trees (spanning trees with the same root and leaf-set as the network, also known as subdivision trees) have been extensively studied (e.g., [9, 10, 16]). Hayamizu [9] developed optimal (linear-time or linear-delay) algorithms for various computational problems, including counting, listing, and optimization of support trees, by establishing a theoretical foundation called a structure theorem for rooted almost-binary phylogenetic networks, a class encompassing binary ones. This theorem provides a canonical way to decompose any such network into its unique maximal zig-zag trails, and characterizes the family of edge-sets of support trees of the network.

While tree-based phylogenetic networks encompass many well-studied subclasses such as tree-child [1], stack-free [14], and orchard networks [4, 12], networks inferred from biological data are not necessarily tree-based. This has led to increased interest in non-tree-based networks (e.g., [2, 6, 7, 15]). Indeed, for the study of reticulate evolution, it is often more biologically relevant to consider networks within networks rather than trees. In such contexts, finding a most concise subgraph, such as one with the fewest reticulations (minimum tier) or with the minimum level, is particularly meaningful. These metrics can be interpreted as measures of deviation from being tree-based [7, 6], as they equal zero for tree-based networks.

In this paper, we generalize Hayamizu’s structure theorem to build a theoretical framework for support networks of rooted almost-binary phylogenetic networks. Building on the maximal zig-zag trail decomposition established in [9], we derive direct-product characterizations of three families of support networks for a given network N: all support networks 𝒜N, minimal support networks N, and minimum support networks 𝒞N. These characterizations yield closed-form product formulas for their cardinalities, revealing unexpected connections to Fibonacci, Lucas, Padovan, and Perrin numbers. We present a linear-time algorithm for counting each of |𝒜N|, |N|, and |𝒞N|, as well as a linear-delay algorithm to list the support networks in each family.

Our theoretical results lead to practical algorithms for two key optimization problems: Problem 1 (Reticulation Minimization) and Problem 2 (Level Minimization). Problem 1 asks for a support network with the minimum reticulation number (minimum tier). We present a linear-time optimal algorithm for solving Problem 1, which can be achieved by selecting any element from 𝒞N. Problem 2 asks for a support network with the minimum level. We conjecture that this problem is NP-hard and present exact and heuristic algorithms for solving Problem 2 (Algorithms 1 and 2). Although they are both exponential-time algorithms, they are practical for networks with a reasonably wide range of reticulation numbers. The heuristic is particularly scalable and accurate in most cases.

The remainder of the paper is organized as follows. In Section 2, we define graph theoretical terminology and review the relevant materials on support trees (Section 2.1) and the structure theorem (Section 2.2). In Section 3, we define support networks, their families 𝒜N, N, and 𝒞N, and relevant concepts. Section 4 provides characterizations and counting formulas for each family. Numerical results are also presented in Section 4.4. In Section 5, we give a linear-time algorithm for solving Problem 1. Section 6 presents the exact and heuristic algorithms for Problem 2, along with performance evaluations using synthetic data. Section 7 concludes with a summary of our contributions and directions for future research.

2 Preliminaries

The graphs in this paper are finite, simple (i.e. having neither loops nor multiple edges), acyclic directed graphs, unless otherwise stated. For a graph G, V(G) and E(G) denote the sets of vertices and edges of G, respectively. For two graphs G and H, G is a subgraph of H if both V(G)V(H) and E(G)E(H) hold, in which case we write GH. Two graphs G and H are isomorphic, denoted by G=H, if there exists a bijection φ:V(G)V(H) such that (u,v)E(G) if and only if (φ(u),φ(v))E(H) for all u,vV(G). A subgraph G of H is proper if GH. A subgraph G of H is a spanning subgraph of H if V(G)=V(H).

Given a graph G and a non-empty subset SE(G), the edge-set S is said to induce the subgraph G[S] of G, that is, the one whose edge-set is S and whose vertex-set is the set of the ends of all edges in S. For a graph G with |E(G)|1 and a partition {E1,,Ed} of E(G), the collection {G[E1],,G[Ed]} is a decomposition of G, and G is said to be decomposed into G[E1],,G[Ed]. Here we recall that a partition of a set is a collection of pairwise disjoint non-empty subsets whose union is the entire set.

For an edge e=(u,v) of a graph G, u and v are denoted by tail(e) and head(e), respectively. For a vertex v of a graph G, the in-degree of v in G, denoted by indegG(v), is the cardinality of {eE(G)ℎ𝑒𝑎𝑑(e)=v}. The out-degree of v in G, denoted by outdegG(v), is defined in a similar way. For any graph G, a vertex vV(G) with outdegG(v)=0 is called a leaf of G. Subdividing an edge (u,v) means replacing it with a directed path from u to v of length at least two. Smoothing a vertex v where indegG(v)=outdegG(v)=1 means suppressing v from G, namely, the reverse operation of edge subdivision.

An undirected graph is connected if there is a path between every pair of vertices. For a connected simple undirected graph G, a cut vertex (resp. cut edge) of G is a vertex (resp. edge) whose removal disconnects G, and a block of G is a maximal connected subgraph of G that contains no cut vertex. In this paper, a block of a directed graph refers to a block of its underlying undirected graph.

2.1 Phylogenetic networks and support trees

Suppose X represents a non-empty finite set of present-day species. A rooted almost-binary phylogenetic network (on a leaf-set X) is defined to be a finite simple directed acyclic graph N with the following properties (P1)–(P3). The vertex ρ is called the root of N, and any vertex v with indegN(v)>1 is called a reticulation of N.

(P1)

There exists a unique vertex ρ of N with indegN(ρ)=0 and outdegN(ρ){1,2};

(P2)

The set of leaves of N is identical to X;

(P3)

For any vV(N)(X{ρ}), indegN(v){1,2} and outdegN(v){1,2};

The above definition allows N to contain a vertex v with indegN(v)=outdegN(v)=2 or indegN(v)=outdegN(v)=1. In this sense, it slightly generalizes the notion of rooted binary phylogenetic networks N on X, which is defined by (P1), (P2) and (P4). When N has the properties (P1), (P2) and (P5), N is particularly called a rooted binary phylogenetic tree on X.

(P4)

For any vV(N)(X{ρ}), {indegN(v),outdegN(v)}={1,2}.

(P5)

For any vV(N)(X{ρ}), (indegN(v),outdegN(v))=(1,2).

While the concept of support trees was originally defined for rooted binary phylogenetic networks in [8], the theoretical framework developed in [9], which includes the structure theorem and associated algorithms and forms the foundation of the present work, applies equally to rooted almost-binary networks. Therefore, in this paper, following the approach taken in [15], we will consider support trees of almost-binary (including binary) networks, unless stated otherwise. For brevity, we omit “rooted almost-binary” when no confusion is likely to arise.

Definition 1.

Let N be a rooted almost-binary phylogenetic network on X. If there exists a rooted binary phylogenetic tree T~ on X and a spanning tree T of N such that T~ is obtained by smoothing all vV(T) with indegT(v)=outdegT(v)=1, then N is called a tree-based network on X. In this case, T is called a support tree of N and T~ a base tree of N.

As a support tree T is a spanning tree of N=(V,E), each T is specified by its edge-set. This leads to the natural question: Which subsets S of E yield a support tree N[S]=(V,S) of N? Francis and Steel [8] proved that such “admissible” subsets S of E are characterized by the following conditions (C1)–(C3). As in [9], we slightly generalize the original definition in [8] so that we can consider admissible edge selections for any subgraph of N.

Definition 2.

Let N be a rooted almost-binary phylogenetic network and let Z be any subgraph of N. A subset S of E(Z) is admissible if it satisfies the following conditions:

(C1)

If (u,v) is an edge of Z with outdegN(u)=1 or indegN(v)=1, then S contains (u,v).

(C2)

If e1 and e2 are distinct edges of Z with tail(e1)=tail(e2), then S contains at least one of {e1,e2}.

(C3)

If e1 and e2 are distinct edges of Z with head(e1)=head(e2), then S contains exactly one of {e1,e2}.

Theorem 3 (almost-binary version of Theorem 1(a) in [8]).

Let N be a rooted almost-binary phylogenetic network and let SE(N). Then, the subgraph N[S] of N induced by S is a support tree of N if and only if S is an admissible subset of E(N). Moreover, there exists a one-to-one correspondence between the family of admissible subsets S of E(N) and the family of support trees of N.

 Remark 4.

In the setting where the internal vertices of N (those except the root and leaves) are unlabeled and thus indistinguishable from one another, two different admissible subsets S1S2 of E(N) can induce isomorphic support trees N[S1]=N[S2]. Due to this subtlety, multiple variations of the support tree counting problem have been explicitly formulated (see Fig. 1 and Section 7.2 in [9]). Therefore, a more precise interpretation of Theorem 3 may be that it establishes a one-to-one correspondence between the family of admissible subsets S of E(N) and the family of support trees of N, under the assumption that all vertices (or edges) of N are distinguishable. In this paper, we adopt this assumption to obtain generalizations of Theorem 3.

Theorem 3 and Remark 4 allow us to identify each support tree T of N with its edge-set E(T). In other words, counting or listing support trees of N refers to counting or listing admissible subsets of E(N).

2.2 Structure theorem for rooted almost-binary phylogenetic networks

We recall the relevant materials from [9, 15]. Given a rooted almost-binary phylogenetic network N, a connected subgraph Z of N with m:=|E(Z)|1 is a zig-zag trail (in N) if the edges of Z can be permuted as (e1,,em) such that either head(ei)=head(ei+1) or tail(ei)=tail(ei+1) holds for each i[1,m1]. A zig-zag trail is represented by an alternating sequence of (not necessarily distinct) vertices and distinct edges, e.g., (v0,(v0,v1),v1,(v2,v1),v2,,(vm,vm1),vm), but this can be more concisely written as v0>v1<v2>>vm1<vm or its reverse, where each edge (vi,vi+1) is represented by vi>vi+1. A zig-zag trail Z in N is maximal if N contains no zig-zag trail Z such that Z is a proper subgraph of Z. Each maximal zig-zag trail in N falls into one of the following four types. A crown is a maximal zig-zag trail Z that has even m:=|E(Z)|4 and can be written in the cyclic form v0<v1>v2<v3>>vm2<vm1>vm=v0. A fence is any maximal zig-zag trail that is not a crown. An N-fence is a fence Z with odd m:=|E(Z)|1, expressed as v0>v1<v2><vm1>vm. A fence Z with even m:=|E(Z)|2 is an M-fence if expressed as v0<v1>v2<<vm1>vm, and it is a W-fence if expressed as v0>v1<v2>>vm1<vm.

As established in [9] (and will be restated in Theorem 8), any rooted almost-binary phylogenetic network N admits a unique decomposition 𝒵={Z1,,Zd} into its maximal zig-zag trails (d1), called the zig-zag trail decomposition of N (see Figure 1 for an illustration). To understand how this decomposition arises, observe that every edge e of N belongs to an obvious zig-zag trail tail(e)>head(e) in N. If tail(e) has out-degree two in N, or head(e) has in-degree two in N, then this trail is not maximal. In such cases, by repeatedly extending the zig-zag trail, one can eventually obtain a maximal one. By construction, if each vertex of N has in-degree and out-degree at most two, then a maximal zig-zag trail containing e is unique for each e. In other words, if N is almost-binary, then no two distinct maximal zig-zag trails in N have a common edge. It follows that {E(Z1),,E(Zd)} is a partition of E(N), and thus 𝒵 is indeed a decomposition of N.

Figure 1: (a) A rooted binary phylogenetic network N. (b) The maximal zig-zag trail decomposition 𝒵={Z1,,Z7} of N (different arrow styles are used to distinguish the individual trails), where Z1, Z2 and Z3 are M-fences, Z4 is a crown, Z5 is a W-fence, and Z6 and Z7 are N-fences.
 Remark 5.

If Z is a maximal zig-zag trail in N, then, the following is equivalent to condition (C1) in Definition 2: If (u,v) is an edge of Z with outdegZ(u)=1 or indegZ(v)=1, then S contains (u,v). In fact, by the maximality of Z, we have outdegZ(u)=1 if and only if outdegN(u)=1 holds. Similarly, indegZ(v)=1 and indegN(v)=1 are equivalent.

Proposition 6 ([9]).

The maximal zig-zag trail decomposition 𝒵 of N can be computed in Θ(|E(N)|) time.

As in the approach taken in [9, 10], we often consider a maximal zig-zag trail Z as a sequence (e1,,e|E(Z)|) of edges, ordered according to their appearance in the trail, where e1 and e|E(Z)| are called the terminal edges of Z when Z is a fence. For any maximal zig-zag trail Z=(e1,,e|E(Z)|) in N, any subset S of E(Z) is specified by a 0-1 sequence b1b2b|E(Z)|, where bi=1 if eiS and bi=0 otherwise. For example, given a crown Z=(e1,e2,e3,e4), a subset {e1,e3} of E(Z) can be encoded as 1 0 1 0. We often write (10)2 to avoid a repetition such as 1 0 1 0.

For the reader’s benefit, we now illustrate how to find admissible subsets using small examples. Let 𝒵={Z1,,Zd} be the maximal zig-zag trail decomposition of N. Suppose Z1=(e1,e2,e3,e4,e5) is an N-fence with head(e4)=head(e5). Then, {e1,e3,e5} is the only admissible subset of E(Z1) because (C1) requires inclusion of both terminal edges e1 and e5, (C3) excludes selecting both e4 and e5, and (C2) then requires e3, which in turn excludes e2 due to (C3). Thus, the family 𝒮(Z1) of admissible subsets of E(Z1) consists of a single element {e1,e3,e5}, and so 𝒮(Z1)={1 0 1 0 1}. Next, suppose Z2=(e1,e2,e3,e4) is a crown. Then, {e1,e3} and {e2,e4} are the only two admissible subsets of E(Z2), because they are the only subsets that satisfy both (C2) and (C3), and crowns have no edge subject to (C1). Thus, 𝒮(Z2)={1 0 1 0,0 1 0 1}.

 Remark 7.

The choice of an admissible subset from 𝒮(Zi) is independent of and does not influence the selection concerning any other 𝒮(Zj). Indeed, Remark 5 implies that condition (C1) merely requires both terminal edges of each fence in 𝒵 be always selected (i.e. be marked by 1). In addition, (C2) and (C3) are relevant only to a fence or crown that has a pair of edges expressed as vk1<vk>vk+1 and vk1>vk<vk+1, respectively. Recalling that the maximal zig-zag trails are edge-disjoint, whenever such a pair of edges exists, it is contained in a unique maximal zig-zag trail in N. Thus, one can check whether SE(Zi) is admissible or not even without looking at the entire network N.

Remark 7 provides a key idea behind the direct product decomposition in Theorem 8. After obtaining 𝒵={Z1,Zd} for N, one can construct an admissible subset S of E(N) simply by computing S=S1Sd, where Si is any admissible subset of E(Zi). The family 𝒯 of support trees of N – the family of admissible subsets of E(N) – by listing all such combinations (S1,,Sd) with each SiE(Zi).

Using the definitions, notation, and key ideas outlined above, we now state the structure theorem for rooted almost-binary phylogenetic networks. It provides a way to canonically decompose such networks N and characterizes the family of admissible subsets S of E(N), namely, the family of edge-sets of support trees of N.

Theorem 8 ([9]).

For any rooted almost-binary phylogenetic network N, the following hold:

  1. 1.

    The maximal zig-zag trail decomposition 𝒵={Z1,,Zd} of N is unique to N.

  2. 2.

    Given SE(N), N[S] is a support tree of N if and only if for each Zi𝒵, SE(Zi) is an admissible subset of E(Zi). Here, the family 𝒮(Zi) of admissible subsets of E(Zi) is given by Equation (1).

  3. 3.

    The family 𝒯 of support trees of N is non-empty (i.e., N is tree-based) if and only if each 𝒮(Zi) is non-empty (i.e., no element of 𝒵 is a W-fence). When 𝒯 is non-empty, it is characterized by 𝒯=i=1d𝒮(Zi), where (Z1,,Zd) is an arbitrary ordering for 𝒵.

𝒮(Zi)={if Zi is a W-fence{1(01)(|E(Zi)|1)/2}if Zi is an N-fence{(10)|E(Zi)|/2,(01)|E(Zi)|/2}if Zi is a crown{1(01)p(10)q1p,q0,p+q=(|E(Zi)|2)/2}if Zi is an M-fence (1)

Theorem 8 and Proposition 6 have furnished optimal algorithms for various computational problems on support trees [9, 10]. For example, the number |𝒯| of support trees of N is given by |𝒯|=i=1d|𝒮(Zi)|, and using (2), it can be computed in Θ(|E(N)|) time.

|𝒮(Zi)|={0if Zi is an W-fence1if Zi is an N-fence2if Zi is a crown|E(Zi)|/2if Zi is an M-fence (2)

3 The family of support networks and its two subfamilies

For a rooted (not-necessarily-binary) phylogenetic network N=(V,E), r(N) denotes the reticulation number of N, i.e., the number of reticulations in V, and level(N) denotes the level of N, i.e., the maximum number of reticulations of N contained in a block of N, and the tier is |E||V|+1. By definition, level(N)r(N) holds. In general, r(N)|E||V|+1 holds. If N is almost-binary, then r(N)=|E||V|+1 holds by the hand-shaking lemma for directed graphs, which states that the sum of the in-degrees over all vertices equals the number of edges gives. Since N is assumed to be almost-binary in this paper, we will use the terms “tier” and “reticulation number” interchangeably.

Let N be a rooted almost-binary phylogenetic network on X and let G be a spanning subgraph of N. If G is also a rooted almost-binary phylogenetic network on X, then G is a support network of N. The base network G~ of N is obtained from G by smoothing all vertices vV(G) with indegG(v)=outdegG(v)=1. Unlike support and base trees, support and base networks always exist since N itself is a support network of N. The base tier of N, denoted by r(N), is the minimum value of r(G):=|E(G)||V(G)|+1, which equals r(G~), over all support networks G of N. Similarly, the base level of N, denoted by level(N), is the minimum value of level(G), which equals level(G~), over all support networks G of N. If k is the base tier (resp. base level) of N, then N is tier-k-based (resp. level-k-based).

For unrooted phylogenetic networks, Fischer and Francis [6] has introduced the concept of support networks to discuss tier-k-based and level-k-based networks. While the definitions are analogous, the mathematical properties of these concepts differ between rooted and unrooted networks. Indeed, for unrooted N, it was shown in [6] that r(N)=level(N) holds if N has at most one “non-trivial blob” – a maximal connected subgraph without a cut edge and with at least two vertices. For rooted N, however, this equality does not hold in general (see Figure 2 for a counterexample).

Figure 2: (a) A rooted binary phylogenetic network N with r(N)=2 and level(N)=1. (b) A support network G of N that attains the optimal values r(G)=2 and level(G)=1. The edges of G are shown by solid arrows. The reticulations in each graph are colored red.

The focus of this paper is on the computation of r(N) and level(N) for rooted N, with particular emphasis on the latter which presents greater computational challenges. We approach these problems using the families of minimal and minimum support networks of N, which are defined as follows:

Definition 9.

For a rooted almost-binary phylogenetic network N,

  • 𝒜N is the set of all support networks of N;

  • N is the set of minimal support networks of N (i.e. those with a minimal edge-set);

  • 𝒞N is the set of minimum support networks of N (i.e. those with the fewest edges).

Definition 9 implies ()𝒜NN𝒞N. As Figure 3 shows, these families can be all distinct. We also note that if N is tree-based, then 𝒞N is nothing but the set 𝒯 of support trees of N characterized in Theorems 3 and 8.

Figure 3: (a): A rooted binary phylogenetic network N. (b): A support network in 𝒜NN. (c): A support network in N𝒞N. (d): A support network in 𝒞N. The reticulations in each graph are colored red.

4 Counting the support networks of three types

Examining the proper subfamilies N and 𝒞N rather than the entire family 𝒜N of support networks could reduce the search space for computing r(N) or level(N). A natural question is how substantial this reduction might be. In this section, we generalize Theorem 8 to derive analogous direct-product characterizations of 𝒜N, N, and 𝒞N, and determine their cardinalities. To achieve this, we relax condition (C3) in Definition 2 to introduce different notions of admissibility as follows.

Definition 10.

Let N be a rooted almost-binary phylogenetic network and let Z be any subgraph of N. A subset S of E(Z) is 𝒜-admissible if it satisfies the following conditions:

(C1)

If (u,v) is an edge of Z with outdegN(u)=1 or indegN(v)=1, then S contains (u,v);

(C2)

If e1 and e2 are distinct edges of Z with tail(e1)=tail(e2) or head(e1)=head(e2), then S contains at least one of {e1,e2}.

In particular, given an 𝒜-admissible subset S of E(Z), S is -admissible if it is minimal, i.e. no proper subset of S is an 𝒜-admissible subset of E(Z), and S is 𝒞-admissible if it is smallest among all 𝒜-admissible subsets of E(Z).

The definition of 𝒜-admissibility differs from the previous admissibility (Definition 2) in a single aspect: it relaxes the original condition (C3). In fact, (C1) is identical to the original condition (C1). Similarly to Remark 5, when Z is a maximal zig-zag trail in N, we may replace outdegN(v) and indegN(v) in condition (C1) with outdegZ(v) and indegZ(v), respectively. The new condition (C2) is an amalgamation of the original condition (C2) and a relaxed version of (C3). In other words, (C2) retains the original requirement for edges sharing the same tail, while it replaces the “exactly one” requirement for edges with the same head by a weaker “at least one”. This modification of (C3) allows an 𝒜, , and 𝒞-admissible subsets of E(N) to contain both incoming edges at a reticulation vertex and to induce a non-tree subgraph of N with a reticulation.

Lemma 11 generalizes Theorem 3 and the second statement of Theorem 8. The proof is given in Appendix A.1.

Lemma 11.

Let N be a rooted almost-binary phylogenetic network, and let 𝒳{𝒜,,𝒞}. Then, there is a one-to-one correspondence between the family 𝒳N of support networks and the family 𝒮𝒳 of 𝒳-admissible subsets of E(N). Moreover, the subgraph N[S] of N induced by SE(N) is a support network in 𝒳N if and only if SE(Zi) is an 𝒳-admissible subset of E(Zi) for each maximal zig-zag trail Zi in N.

4.1 The number |𝓐𝑵| of all support networks

By Lemma 11, one can find an 𝒜-admissible subset of E(N) in essentially the same manner as for an admissible subset of E(N): first, compute the zig-zag trail decomposition 𝒵={Z1,,Zd} of N, and then select edges of Zi according to the conditions (C1) and (C2) in Definition 10. As explained in Remark 7, when Zi=(e1,,e|E(Zi)|) is a fence, (C1) requires that the terminal edges e1 and e|E(Zi)| be marked 1. In contrast, new condition (C2) prohibits unselected consecutive edges of Zi – consecutive 0’s in the 0-1 sequence – regardless of whether Zi is a crown or a fence.

Thus, the family 𝒮𝒜(Zi) of 𝒜-admissible subsets of E(Zi) is expressed by (3) for each Zi𝒵. We note that, if Zi is a crown, different choices of e1 result in different sequences representing the same subset of E(Zi). For example, both 1 0 0 1 and 0 1 1 0 represent the same subset that does not satisfy condition (C2).

𝒮𝒜(Zi)={{b1b|E(Zi)|b1=b|E(Zi)|=1, and no 00 occurs}if Zi is a fence{b1b|E(Zi)|no 00 occurs in any circular ordering}if Zi is a crown (3)

Moreover, by Lemma 11, 𝒜N is characterized by 𝒜N=i=1d𝒮𝒜(Zi), similarly to 𝒯 in Theorem 8. Therefore, |𝒜N|=i=1d|𝒮𝒜(Zi)|. As in (3), each number |𝒮𝒜(Zi)| depends on whether Zi is a fence or not. This number can be more explicitly represented using Fibonacci and Lucas numbers (OEIS A000045 and OEIS A000032, respectively) as follows. The proof of Theorem 12 is given in Appendix A.2.

Theorem 12.

Let N be a rooted almost-binary phylogenetic network and 𝒵={Z1,,Zd} be the maximal zig-zag trail decomposition of N. Let {Fn} be the Fibonacci sequence defined by F1=F2=1 and Fn=Fn1+Fn2 (n3), and {Ln} be the Lucas sequence defined by L1=1,L2=3 and Ln=Ln1+Ln2 (n3). Then, the number of support networks |𝒜N| is given by (4).

|𝒜N|=Zi:fenceF|E(Zi)|Zi:crownL|E(Zi)| (4)

Moreover, |𝒜N|=Θ(ϕ|E(N)|) holds, where ϕ=(1+5)/2=1.6180 is the golden ratio.

Theorem 12 yields an obvious algorithm for counting |𝒜N|. It first computes the maximal zig-zag trail decomposition 𝒵={Z1,,Zd} of N, and then calculates |𝒜N| using (4).

Proposition 13.

|𝒜N| can be computed by in Θ(|E(N)|) time.

Proof.

By Proposition 6, 𝒵 can be computed in Θ(|E(N)|) time. For each Zi𝒵, deciding whether Zi is a crown or not takes O(|E(Zi)|) time, and F|E(Zi)| and L|E(Zi)| can be computed in O(|E(Zi)|) time. Since |𝒵|=O(|E(N)|), one can count |𝒜N| using (4) in O(|E(N)|) time. Loading N takes Ω(|E(N)|) time, so the overall complexity is Θ(|E(N)|).

Similarly, we can develop an algorithm to generate all (or a desired number of) elements of 𝒜N by sequentially outputting each element of i=1d𝒮𝒜(Zi), achieving Θ(|E(N)|) delay. We refer the reader to [9] or [10] for the basics on the complexity analysis of listing algorithms.

4.2 The number |𝓑𝑵| of minimal support networks

We can see that 𝒜-admissible subset S of E(Zi) is -admissible if and only if S contains no three consecutive edges in the edge sequence (e1,,e|E(Zi)|) of Zi. Thus, the family 𝒮(Zi) of -admissible subsets of E(Zi) is expressed as in (5) for each Zi𝒵.

𝒮(Zi)={{b1b|E(Zi)|b1=b|E(Zi)|=1, and no 00 or 111 occurs}if Zi is a fence{b1b|E(Zi)|no 00 or 111 occurs in any circular ordering}if Zi is a crown (5)

By Lemma 11, N is characterized by N=i=1d𝒮(Zi). Therefore, |N|=i=1d|𝒮(Zi)|. This number is expressed using Padovan numbers (OEIS A000931) and Perrin numbers (OEIS A001608) as follows. The proof of Theorem 14 is in Appendix A.3.

Theorem 14.

Let N be a rooted almost-binary phylogenetic network and 𝒵={Z1,,Zd} be the maximal zig-zag trail decomposition of N. Let {Pn} be the Padovan sequence defined by (P1,P2,P3)=(1,1,1) and Pn=Pn2+Pn3 (n4), and {Qn} be the Perrin sequence defined by (Q1,Q2,Q3)=(0,2,3) and Qn=Qn2+Qn3 (n4). Then, the number of minimal support networks |N| is given by (6).

|N|=Zi:fenceP|E(Zi)|Zi:crownQ|E(Zi)| (6)

Moreover, |N|=Θ(ψ|E(N)|) holds, where ψ=(9+69)/183+(969)/183=1.3247 is the plastic number.

Proposition 15.

|N| can be computed in Θ(|E(N)|) time.

Proof.

The only difference between (4) and (6) is that F|E(Zi)| and L|E(Zi)| are replaced by P|E(Zi)| and Q|E(Zi)|, respectively. For each Zi𝒵, both P|E(Zi)| and Q|E(Zi)| can also be computed in O(|E(Zi)|) time. The remainder is the same as the proof of Proposition 13.

As noted in Section 4.1, the elements of N can also be listed with Θ(|E(N)|) delay.

4.3 The number |𝓒𝑵| of minimum support networks

The construction of 𝒞-admissible subsets of each E(Zi) is the same as that of admissible subsets for support trees, with the only difference that 𝒞-admissible subsets allow W-fences.

𝒮𝒞(Zi)={{(10)|E(Zi)|/2,(01)|E(Zi)|/2}if Zi is a crown{1(01)(|E(Zi)|1)/2}if Zi is an N-fence{1(01)p(10)q1p,q0,p+q=(|E(Zi)|2)/2}if Zi is an M-fence or a W-fence (7)

By Lemma 11, 𝒞N is characterized by 𝒞N=i=1d𝒮𝒞(Zi). Therefore, |𝒞N|=i=1d|𝒮𝒞(Zi)|. From (7), we have |𝒮𝒞(Zi)|=2 for any crown Zi, |𝒮𝒞(Zi)|=1 for any N-fence Zi, and |𝒮𝒞(Zi)|=|E(Zi)|/2 for any other Zi. Thus, we obtain Theorem 16 and Proposition 17.

Theorem 16.

Let N be a rooted almost-binary phylogenetic network and 𝒵={Z1,,Zd} be the maximal zig-zag trail decomposition of N. Then, |𝒞N|=2cZi𝒵MW(|E(Zi)|/2) holds, where c is the number of crowns in 𝒵, and 𝒵MW:={Zi𝒵Zi is an M-fence or W-fence}.

Proposition 17.

|𝒞N| can be computed in Θ(|E(N)|) time.

As noted in Section 4.1, the elements of 𝒞N can also be listed with Θ(|E(N)|) delay.

4.4 Case study: Comparison of the numbers |𝓐𝑵|, |𝓑𝑵|, and |𝓒𝑵|

One can solve any optimization problem on support networks by listing all elements of 𝒜N, but this approach is generally infeasible due to its enormous size. Although all of |𝒜N|, |N|, and |𝒞N| grow exponentially with the size of N, we present a comparison of their practical sizes to motivate our approach to Problem 2 described in Section 6. The full implementation including the code used to generate the networks, along with the generated samples and detailed count data, is available in our GitHub repository.

For this case study, we generated rooted binary phylogenetic networks with n leaves and r=2(n1) reticulations. Such a network has 2n2+3r=8(n1) edges, and thus 50% of its edges are reticulation edges, representing a high level of reticulation. For each 3n10, we created 100 such networks by the following procedure: starting from a rooted binary phylogenetic tree N0, the procedure repeatedly adds a new edge between two arbitrary edges of Ni, directing it so as to preserve the acyclicity of Ni+1, and returns Nr. Note that networks generated in this manner may or may not be tree-based, since every new edge is added not between edges of the initial tree N0 but between edges of the current network Ni.

For each generated network N, we computed |𝒜N|, |N|, and |𝒞N| using the linear-time algorithms described in Sections 4.14.3. The minimum, maximum, and median counts across the 100 samples of each size are summarized in Table 1.

Table 1 shows that |N| is consistently much smaller than |𝒜N|, and |𝒞N| is even smaller. This indicates that, even though |𝒜N| can be prohibitively large, both |N| and |𝒞N| are typically small enough to allow exhaustive enumeration. We also see that the number of support networks varies widely even among networks of the same size, suggesting that the computational cost of exhaustive search within N or 𝒞N may vary substantially depending on the structure of N. For example, the number |𝒞N| of minimum support networks of N with r=18 was 1728 for a certain sample but was only 4 for another sample.

Table 1: Summary of the case study results: minimum, maximum, and median of |𝒜N|, |N|, and |𝒞N| for 100 rooted binary phylogenetic networks with n leaves and r=2(n1) reticulations.
n r |𝒜N| |N| |𝒞N|
Min Max Median Min Max Median Min Max Median
3 4 15 64 30 2 10 4 1 9 3
4 6 55 336 160 4 28 9 1 18 4
5 8 288 2,640 900 8 72 24 1 36 8
6 10 900 15,840 4,478 6 256 48 1 108 12
7 12 6,435 134,784 24,720 24 768 142 2 192 24
8 14 34,272 571,914 134,400 40 1,680 270 2 576 38
9 16 93,600 4,186,080 699,920 90 2,880 576 2 1,152 68
10 18 449,280 21,715,200 3,861,936 144 7,056 1,372 4 1,728 124

5 Finding a support network with the fewest reticulations

As defined in Section 3, the base tier r(N) of N is the minimum value of r(G) over all support networks G of N. This leads us to Problem 1.

Problem 1 (Reticulation Minimization).

Given a rooted almost-binary phylogenetic network N, compute the base tier r(N), and find a support network G of N with r(G)=r(N).

Since r(G):=|E(G)||V(G)|+1 and G is a spanning subgraph of N, the support networks G of N minimize r(G) if and only if it has the fewest edges among all support networks, i.e., it is a minimum support network. Therefore, an optimal solution of Problem 1 can be obtained by simply selecting an arbitrary element from 𝒞N=i=1d𝒮𝒞(Zi). This is done by computing the maximal zig-zag trail decomposition {Z1,,Zd} of N, and then selecting an arbitrary element of 𝒮𝒞(Zi) expressed in (7) for each Zi.

For example, the network N in Figure 1 is decomposed into the maximal zig-zag trails Z1 through Z7. According to (7), the sequence 11 is 𝒞-admissible for E(Z1), E(Z2), and E(Z5); 1010 is 𝒞-admissible for E(Z3) and E(Z4); and 1 is 𝒞-admissible for E(Z6) and E(Z7). Piecing these 𝒞-admissible subsets together yields the edge-set of a support network G of N with the minimum reticulation number, that is, r(G)=1.

Theorem 18.

The above algorithm solves Problem 1 in Θ(|E(N)|) time.

When N is tree-based, r(N)=0 holds and a support network G with r(G)=0 is simply its support tree. In general, r(G) equals the number of W-fences in the maximal trail decomposition of N, as easily shown by induction on the number of W-fences.

6 Finding a support network with the minimum level

In Section 3, we have also defined level(N), i.e., the base level of N. In contrast to Problem 1, we conjecture that Problem 2 is NP-hard. Here, we provide exact and heuristic exponential algorithms for solving Problem 2.

Problem 2 (Level Minimization).

Given a rooted almost-binary phylogenetic network N, compute the base level, level(N), and find a support network G with level(G)=level(N).

6.1 Exact algorithm

An obvious exact algorithm for Problem 2 computes level(G) of each element G in 𝒜N to obtain the minimum value, but this method is clearly impractical. Algorithm 1 performs an exhaustive search in N instead of 𝒜N. Recalling the numerical results in Table 1, we know that this search space reduction is significant. We can prove that N contains a correct G, yielding Theorem 19. The proof is given in Appendix A.4.

Algorithm 1 Exact method for solving Level Minimization (Problem 2).
Theorem 19.

Algorithm 1 returns a correct solution of Problem 2. It runs in O(|N||E(N)|)=O(ψ|E(N)||E(N)|) time, where ψ is as in Theorem 14.

6.2 Heuristic algorithm

Algorithm 2 performs an exhaustive search in an even smaller search space 𝒞N. It repeats the same procedure as Algorithm 1, except its iteration number is reduced from |N| to |𝒞N|. By Theorem 19, Algorithm 2 therefore runs in O(|𝒞N||E(N)|) time.

Algorithm 2 does not always output a correct solution of Problem 2, because minimizing the number of reticulations does not imply minimizing the level of support networks. Consider a level-1-based network N in Figure 4(a), which has the property that any G𝒞N satisfies r(G)=2 and level(G)=2. Then, Algorithm 2 outputs a level-2 support network as illustrated in Figure 4(b). An optimal support network G with level(G)=1 exists in N𝒞N while it is not optimal in terms of reticulation numbers, as shown in Figure 4(c).

Algorithm 2 Heuristic method for solving Level Minimization (Problem 2).
Figure 4: (a) An instance of Problem 2 for which Algorithm 2 cannot find a correct solution. (b) An example of a support network output by Algorithm 2 with two reticulations (red) and level 2. (c) An optimal support network that has three reticulations (red) yet achieves level 1.

6.3 Performance evaluation of the exact and heuristic algorithms

We evaluated the performance of our exact algorithm (Algorithm 1) and heuristic algorithm (Algorithm 2) using rooted binary phylogenetic networks with n=8 leaves and various reticulation numbers r. Although our experiment focuses on only binary networks, this simplification does not undermine the validity of our evaluation, as the search space sizes |N| for Algorithm 1 and |𝒞N| for Algorithm 2 are not affected by whether N is binary or almost-binary.

For each r ranging from 1 to 50, we generated 25 instances using the procedure described in Section 4.4. The experiments were conducted on a MacBook Pro equipped with an Apple M3 Max CPU (4.05 GHz) and 36 GB of memory, with a timeout limit of 30 minutes per instance. The Python code and experimental results, including the outputs of both algorithms for all instances, are available in our GitHub repository.

We first assessed computational efficiency of Algorithms 1 and 2 by measuring runtime on each instance. Figure 5 shows average runtime on a log scale with minimum and maximum values across 25 instances for each r. Both algorithms exhibit expected exponential runtime growth as r increases. Algorithm 1 completed up to r=36, beyond which timeouts occurred, while Algorithm 2 successfully handled all instances up to r=49. The performance gap is particularly pronounced for larger networks. For example, at r=35, Algorithm 1 approaches its limits while Algorithm 2 completes within seconds on average.

Notably, both algorithms exhibit substantial runtime variability even for the same r, as shown by the wide error bars in Figure 5. This variation reflects runtime dependence on network structure beyond just reticulation number, consistent with our observations in Section 4.4 regarding the significant variability of |BN| and |CN| among instances with identical r values.

Figure 5: Runtimes (log scale) of Algorithms 1 and 2 on rooted binary phylogenetic networks with 8 leaves and various reticulation numbers r (1r36 for exact, 1r49 for heuristic). Runtime is averaged over 25 instances per r, with error bars showing minimum and maximum.

While Algorithm 1 guarantees optimal solutions (Theorem 19), Algorithm 2 provides no such theoretical guarantee. To evaluate the practical quality of the heuristic solutions, we compared the outputs of Algorithm 2 against the optimal values level(N) computed by Algorithm 1 for all instances with r36 where both algorithms completed successfully within the time limit. Figure 6 provides a breakdown of the output differences Δ for each r, where Δ is defined as the difference between the value returned by Algorithm 2 and the optimal value level(N) computed by Algorithm 1. The bars show the proportion of instances (out of 25) with Δ=0 (exact match), Δ=1, Δ=2, and Δ3.

Although the accuracy gradually declines as r increases, Algorithm 2 maintains practical effectiveness across the entire range of tested networks. In fact, up to r17, Algorithm 2 achieved exact solutions (Δ=0) for all instances. Even at r=30, it still returned optimal values for approximately 80% of instances. Large deviations (Δ3) remain rare throughout the tested range, occurring in less than 10% of instances even for the largest networks.

Figure 6: Accuracy of Algorithm 2 on randomly generated rooted binary phylogenetic networks with 8 leaves and r{1,,36} (25 instances per r). Bars show the percentage of instances with difference Δ between Algorithm 2’s output and optimal level(N) found by Algorithm 1: Δ=0 (blue), Δ=1 (green), Δ=2 (orange), Δ3 (red).

7 Conclusion and future directions

In this paper, we have extended Hayamizu’s structure theorem to develop a theoretical foundation for support networks of rooted almost-binary phylogenetic networks. The extension from support trees allows us to analyze not only tree-based networks but also general, non-tree-based rooted phylogenetic networks. Our main contributions are threefold:

First, we established direct-product characterizations of three families of support networks for a given network N: all support networks 𝒜N, minimal support networks N, and minimum support networks 𝒞N. These characterizations yielded closed-form counting formulas that can be computed in linear time, revealing interesting connections to well-known integer sequences such as Fibonacci, Lucas, Padovan, and Perrin numbers. We also described and implemented a linear-delay algorithm for listing the support networks in each family.

Second, we developed a linear-time algorithm for Reticulation Minimization, which finds a support network with the minimum reticulation number (minimum tier). We proved that an optimal solution can be found simply by picking any element from the set 𝒞N.

Third, we proposed both exact and heuristic algorithms for Level Minimization. While both algorithms have exponential time complexity, our experimental evaluations demonstrated their practicality across a wide range of reticulation levels. Notably, the heuristic method found optimal solutions in most cases, and when it did not, it still produced good approximations.

There are some promising avenues for future research. In this paper, we have discussed support networks of almost-binary networks, which form a subclass of non-binary networks. However, extending this framework to fully non-binary networks is challenging because the uniqueness of the maximal zig-zag trail decomposition of N, a key component of Hayamizu’s structure theorem that underpins our work, is not guaranteed beyond the almost-binary case. It is also important to explore whether support networks with the minimum tier or level can gain meaningful insights into reticulate evolutionary processes from biological data. As our heuristic algorithm empirically produces good approximate solutions, theoretical analysis of its approximation ratio and the design of improved approximation algorithms also remain valuable pursuits.

References

  • [1] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. Comparison of Tree-Child Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(4):552–569, 2009. doi:10.1109/TCBB.2007.70270.
  • [2] Nathan Davidov, Amanda Hernandez, Justin Jian, Patrick McKenna, K.A. Medlin, Roadra Mojumder, Megan Owen, Andrew Quijano, Amanda Rodriguez, Katherine St. John, Katherine Thai, and Meliza Uraga. Maximum Covering Subtrees for Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18(6):2823–2827, November 2021. doi:10.1109/TCBB.2020.3040910.
  • [3] Tomislav Došlić and Ivana Zubac. Counting maximal matchings in linear polymers. Ars Mathematica Contemporanea, 11:255–276, January 2016. doi:10.26493/1855-3974.851.167.
  • [4] Péter L Erdős, Charles Semple, and Mike Steel. A class of phylogenetic networks reconstructable from ancestral profiles. Mathematical Biosciences, 313:33–40, 2019. doi:10.1016/j.mbs.2019.04.009.
  • [5] E. J. Farrell. On the Occurrences of Fibonacci Sequences in the Counting of Matchings in Linear Polygonal Chains. The Fibonacci Quarterly, 24(3):238–246, 1986.
  • [6] Mareike Fischer and Andrew Francis. How tree-based is my network? Proximity measures for unrooted phylogenetic networks. Discrete Applied Mathematics, 283:98–114, September 2020. doi:10.1016/j.dam.2019.12.019.
  • [7] Andrew Francis, Charles Semple, and Mike Steel. New characterisations of tree-based networks and proximity measures. Advances in Applied Mathematics, 93:93–107, February 2018. doi:10.1016/j.aam.2017.08.003.
  • [8] Andrew Francis and Mike Steel. Which Phylogenetic Networks are Merely Trees with Additional Arcs? Systematic Biology, 64(5):768–777, September 2015. doi:10.1093/sysbio/syv037.
  • [9] Momoko Hayamizu. A Structure Theorem for Rooted Binary Phylogenetic Networks and Its Implications for Tree-Based Networks. SIAM Journal on Discrete Mathematics, 35(4):2490–2516, January 2021. doi:10.1137/19M1297403.
  • [10] Momoko Hayamizu and Kazuhisa Makino. Ranking Top-k Trees in Tree-Based Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 20(3):2349–2355, May 2023. doi:10.1109/TCBB.2022.3229827.
  • [11] John Hopcroft and Robert Tarjan. Algorithm 447: Efficient Algorithms for Graph Manipulation. Communications of the ACM, 16(6):372–378, June 1973. doi:10.1145/362248.362272.
  • [12] Remie Janssen and Yukihiro Murakami. On cherry-picking and network containment. Theoretical Computer Science, 856:121–150, 2021. doi:10.1016/j.tcs.2020.12.031.
  • [13] Thomas Koshy. Fibonacci and Lucas Numbers with Applications. John Wiley & Sons, September 2001. doi:10.1002/9781118033067.
  • [14] Charles Semple and Jack Simpson. When is a Phylogenetic Network Simply an Amalgamation of Two Trees? Bulletin of Mathematical Biology, 80(9):2338–2348, 2018. doi:10.1007/s11538-018-0463-x.
  • [15] Takatora Suzuki, Han Guo, and Momoko Hayamizu. Bridging Between Deviation Indices for Non-Tree-Based Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21(6):2226–2234, 2024. doi:10.1109/TCBB.2024.3456575.
  • [16] Louxin Zhang. On Tree-Based Phylogenetic Networks. Journal of Computational Biology, 23(7):553–565, July 2016. doi:10.1089/cmb.2015.0228.

Appendix A Appendix

In what follows, N=(V,E) is a rooted almost-binary phylogenetic network on the leaf-set X, 𝒵={Z1,,Zd} is the maximal zig-zag trail decomposition of N, and 𝒵=(Z1,,Zd) is any ordered set of the elements of 𝒵. In addition, for each Zi𝒵, we let mi:=|E(Zi)| and write Zi=(e1,,emi).

A.1 Proof of Lemma 11

We begin by proving the first statement. To show the one-to-one correspondence between 𝒳N and 𝒮𝒳 in the case of 𝒳=𝒜, we first prove that N[S] is a support network of N if and only if S is an 𝒜-admissible subset of E. Assume that G is a support network of N. Then, V(G)=V by definition. We will now verify that S~:=E(G) satisfies conditions (C1) and (C2). Recall that Z in Definition 10 is any subgraph of N, so we may let Z be N. For any (u,v)E with outdegN(u)=1, we have (u,v)S~ since otherwise outdegG(u)=0 would hold and thus G would have a leaf u not in the leaf-set X of N. Also, for any (u,v)E with indegN(v)=1, we have (u,v)S~ since otherwise indegG(v)=0 would hold and thus G would have a root v other than the unique root of N. Hence, S~ satisfies (C1). Next, for any distinct e1,e2E with either tail(e1)=tail(e2) or head(e1)=head(e2), at least one of {e1,e2} is in S~ since otherwise, by the assumption that N is almost-binary, tail(e1)X would be a leaf of G or head(e1) would be an extra root of G. Hence, S~ satisfies (C2).

To prove the converse, assume that S is an 𝒜-admissible subset of E. Then, S satisfies (C1) and (C2). This implies that N[S] is a spanning subgraph of N. Indeed, if there were vV with v{head(e),tail(e)} for any eS, then (C1) or (C2) (or both) would be violated, as easily verified for each possible case of indegN(v){0,1,2}. Let G~:=(V,S) be the spanning subgraph of N. Since S satisfies both (C1) and (C2), outdegN(v)=0 if and only if outdegG~(v)=0, so the root of N remains the unique root of G~. Similarly, outdegN(v)=0 if and only if outdegG~(v)=0, so the leaf-set of G~ remains X. Hence, G~=(V,S) is a support network of N.

By the assumption on N described in Remark 4, if S1 and S2 are different 𝒜-admissible subsets of E, then N[S1]=(V,S1) and N[S2]=(V,S2) are different support networks of N, and the converse also holds. Thus, the map μ:𝒮𝒜𝒜N, defined by μ(S)=N[S], is a bijection.

The above arguments 𝒳=𝒜 apply for similarly to 𝒳{,𝒞}. Indeed, by Definitions 9 and 10, an 𝒜-admissible subset S𝒮𝒜 belongs to 𝒮 if and only if the support network N[S]𝒜N belongs to N. Thus, the restriction of μ to 𝒮 gives a bijection between 𝒮 and N. The same holds for 𝒳=𝒞. This completes the proof of the first statement.

To prove the second statement for 𝒳=𝒜, we show that a subset SE satisfies conditions (C1) and (C2) with Z:=N if and only if SE(Zi) satisfies (C1) and (C2) with Z:=Zi for each i[1,d]. First, observe that if (u,v)E satisfies outdegN(u)=1 or indegN(v)=1, then there exists a unique Zi𝒵 such that (u,v)E(Zi). Since {E(Z1),,E(Zd)} is a partition of E, the condition (C1) for S with Z:=N holds if and only if each SE(Zi) satisfies the condition (C1) with Z:=Zi. Similarly, if two distinct edges e1,e2E share the same head or the same tail, then both edges must belong to a unique Zi𝒵, i.e., {e1,e2}E(Zi). Then, the requirement that S contains e1 or e2 is equivalent to requiring that SE(Zi) contains e1 or e2. Thus, S satisfies (C2) with Z:=N if and only if each SE(Zi) satisfies (C2) with Z:=Zi. Hence, S is an 𝒜-admissible subset of E (i.e., N[S] is a support network of N) if and only if each Si:=SE(Zi) is an 𝒜-admissible subset of E(Zi).

The result just proved for 𝒳=𝒜 implies that any 𝒜-admissible subset SE can be obtained by independently selecting, for each Zi𝒵, an 𝒜-admissible subset SiE(Zi), and then taking their union: S=S1Sd. Then, similarly to Remark 7, the admissible subset S is minimal (resp. minimum) if and only if each Si is minimal (resp. minimum). This completes the proof.

A.2 Proof of Theorem 12

We prove that |𝒮𝒜(Zi)|=Fmi holds if Zi is a fence and that |𝒮𝒜(Zi)|=Lmi holds if Zi is a crown. Consider a fence Zi=(e1,,emi)𝒵. As the case of mi2 is trivial, we may assume mi3. Let Zi=(e1,,emi) be the undirected graph obtained from Zi by ignoring all edge orientations, and let Zi′′=(e2,,emi1) be its subgraph induced by the non-terminal edges of Zi. We can assume without loss of generality that Zi′′ is a path. Indeed, when N is binary, this holds trivially because the fence Zi visits each internal vertex exactly once. If N is almost-binary but not binary, Zi′′ may fail to be a path due to the presence of a vertex v with both indegZi(v)=2 and outdegZi(v)=2. However, we can transform N by a preprocessing step that splits each such vertex v into two vertices connected by a new edge, which ensures that Zi′′ is a path. This operation preserves |𝒜N| and does not affect the set 𝒮𝒜(Zi).

Let S be a subset of E(Zi) and let S¯:=E(Zi)S. By Lemma 11 and Equation (3), S is 𝒜-admissible if and only if e1,e2S¯ and for any k[1,mi1], {ek,ek+1}S¯. Then, there exists a bijection between 𝒮𝒜(Zi) and the family of matchings in the path Zi′′. For example, when mi=5, an 𝒜-admissible subset S={e1,e3,e5} of E(Zi) corresponds to a matching S¯={e2,e4} in Zi′′=(e2,e3,e4). Theorem 1 in [5] says that the number of (possibly empty) matchings in a path with 1 edges equals F+2. Since |E(Zi′′)|=mi2, the number of matchings in Zi′′ equals Fmi. Hence, |𝒮𝒜(Zi)|=Fmi holds.

Consider a crown Zi=(e1,,emi). Since a crown has no terminal edges, SE(Zi) is 𝒜-admissible if and only if for any k[1,mi1], {ek,ek+1}S¯. Let Zi be the undirected graph obtained from Zi by ignoring all edge orientations. Using the pre-processing described above, we can treat Zi as a cycle graph without loss of generality. Similarly to above, there is a bijection between 𝒮𝒜(Zi) and the family of matchings in the cycle Zi. Theorem 2 in [5] says that the number of (possibly empty) matchings in a cycle with 3 edges equals L. Hence, |𝒮𝒜(Zi)|=Lmi holds.

Theorems 5.6 and 5.8 in [13] say that Fn and Ln are expressed by the closed-forms Fn=(ϕn(ϕ)n)/5 and Ln=ϕn+(1ϕ)n, respectively. Then, Fn=Θ(ϕn) and Ln=Θ(ϕn) hold. This implies Θ(|𝒜N|)=Θ(ϕ|E(Z1)|××ϕ|E(Zd)|)=Θ(ϕ|E(N)|) holds. This completes the proof.

A.3 Proof of Theorem 14

We prove that |𝒮(Zi)|=Pmi holds if Zi is a fence and |𝒮(Zi)|=Qmi holds if Zi is a crown. Consider a fence Zi=(e1,,emi). As the case of mi2 is trivial, we may assume mi3. Let Zi′′ be the undirected path as in the proof of Theorem 12. An 𝒜-admissible subset S of E(Zi) is minimal (namely, -admissible) if and only if a matching in Zi′′ corresponding to S¯:=E(Zi)S is maximal. Then, there exists a bijection between 𝒮(Zi) and the family of maximal matchings in Zi′′. For example, when mi=5, a -admissible subset S={e1,e2,e4,e5} of E(Zi) corresponds to a maximal matching S¯={e3} in Zi′′=(e2,e3,e4). Proposition 3.1 in [3] say that the number of maximal matchings in a path with edges equals P+2. Since |E(Zi′′)|=mi2, the number of maximal matchings in Zi′′ equals Pmi. Hence, |𝒮(Zi)|=Pmi holds.

Consider a crown Zi=(e1,,emi). Let Zi be the undirected cycle as in the proof of Theorem 12. Then, there exists a bijection between 𝒮 and the family of maximal matchings in the cycle Zi. By the statement after Proposition 3.9 in [3], the number of maximal matchings in a cycle with 3 edges equals Q. Thus, |𝒮(Zi)|=Qmi.

By [3], Pn=Θ(ψn) and Qn=Θ(ψn) hold. Hence, Θ(|N|)=Θ(ψ|E(Z1)|××ψ|E(Zd)|)=Θ(ψ|E(N)|) holds. This completes the proof.

A.4 Proof of Theorem 19

As Algorithm 1 checks the level of each support network in N, our goal is to show that N contains a level-k support network of N. To obtain a contradiction, assume that level(G)>k holds for any GN. This implies that any level-k support network G of N is in 𝒜NN. Since G is not minimal, there exists a minimal one GN with GG. Then, level(G)level(G)=k since each block of G is a subgraph of its corresponding block of G, which is a contradiction. This completes the proof of the correctness of Algorithm 1.

Since N is a binary network, we have O(|V(N)|)=O(|E(N)|). Also, as any minimal support network GN is a subgraph of N, we have O(|E(G)|)=O(|E(N)|). For each GN, it takes O(|V(G)|+|E(G)|)=O(|E(N)|) time to compute its block decomposition {G1,,Gk} using depth-first search [11]. For any block Gi, one can compute ri=|E(Gi)||V(Gi)|+1 in O(|E(Gi)|) time. Then, for each GN, it takes O(|E(G1)|++|E(Gk)|)=O(|E(G)|)=O(|E(N)|) time to calculate level(G), namely the maximum ri across all blocks Gi of G. Overall, it takes O(|N||E(N)|) time to obtain the minimum value of level(G) across all GN. By Theorem 14, O(|N||E(N)|)=O(ψ|E(N)||E(N)|) holds.