Abstract 1 Introduction 2 Preliminaries 3 Parameter 𝑫¯+𝐬𝐰𝓕 4 Parameter 𝒌+𝐬𝐰𝓕+𝜹+𝒉 5 Discussion References A Appendix – Omitted proofs

Parameterized Algorithms for Diversity of Networks with Ecological Dependencies

Mark Jones ORCID TU Delft, The Netherlands Jannik Schestag ORCID TU Delft, The Netherlands
Abstract

For a phylogenetic tree, the phylogenetic diversity of a set A of taxa is the total weight of edges on paths to A. Finding small sets of maximal diversity is crucial for conservation planning, as it indicates where limited resources can be invested most efficiently. In recent years, efficient algorithms have been developed to find sets of taxa that maximize phylogenetic diversity either in a phylogenetic network or in a phylogenetic tree subject to ecological constraints, such as a food web. However, these aspects have mostly been studied independently. Since both factors are biologically important, it seems natural to consider them together.

In this paper, we introduce decision problems where, given a phylogenetic network, a food web, and integers k, and D, the task is to find a set of k taxa with phylogenetic diversity of at least D under the maximize all paths measure, while also satisfying viability conditions within the food web. Here, we consider different definitions of viability, which all demand that a “sufficient” number of prey species survive to support surviving predators.

We investigate the parameterized complexity of these problems and present several fixed-parameter tractable (FPT) algorithms. Specifically, we provide a complete complexity dichotomy characterizing which combinations of parameters – out of the size constraint k, the acceptable diversity loss D¯, the scanwidth of the food web sw, the maximum in-degree δ in the network, and the network height h – lead to W[1]-hardness and which admit FPT algorithms.

Our primary methodological contribution is a novel algorithmic framework for solving phylogenetic diversity problems in networks where dependencies (such as those from a food web) impose an order, using a color coding approach.

Keywords and phrases:
Phylogenetic Diversity, Fixed-Parameter Tractability, Phylogenetic Networks, Food Webs, Color Coding
Funding:
Mark Jones: Partially funded by the Dutch Organisation for Scientific Research (NWO) grant OCENW.KLEIN.125 and OCENW.M.21.306.
Jannik Schestag: Funded by the Dutch Research Council (NWO), project “Optimization for and with Machine Learning (OPTIMAL)” OCENW.GROOT.2019.015.
Copyright and License:
[Uncaptioned image] © Mark Jones and Jannik Schestag; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Applied computing Computational biology
; Theory of computation Fixed parameter tractability
Related Version:
Full Version: http://arxiv.org/abs/2510.09512 [18]
Editors:
Akanksha Agrawal and Erik Jan van Leeuwen

1 Introduction

The Sixth Mass Extinction eliminates species and their genera at an unprecedented rate [5], even exceeding rates in previous mass extinction events [27]. The situation is severe enough that approximately a quarter of Earth’s existing species are at threat [10]. Inaction now means jeopardizing further parts of the animal tree of life.

Because resources are not sufficiently available to preserve all species, scientists developed the phylogenetic diversity measure [8] to cover the necessity of making an educated decision on which set of species the limited resources should be invested in. Given a phylogenetic tree with edge weights in which leaves represent present-day species, the phylogenetic diversity of a set of species A is the total weight of edges on paths from the root to species in A. Intuitively, a set of species with larger phylogenetic diversity captures a larger variety of genetic material, and is therefore expected to have larger biodiversity. Phylogenetic diversity became the most popular measure of the significance of a set of species [33].

Apart from its biological relevance, phylogenetic diversity probably also found favor in the eyes of many scientists for its highly desirable trait of being easy to compute [8]. Indeed, an optimal solution for maximizing phylogenetic diversity can be found with a greedy algorithm [25, 30] and therefore even large instances can be solved within seconds [22].

To model further relevant aspects of conservation planning, more problems were defined, in which a special focus was placed upon capturing varying costs of saving species [20, 26], finding optimal conservation areas [3, 23], considering species’ extinction times [17], or preserving viable sets of species [23]. In the latter problem a food web, modeling predator-prey relationships, is given in addition to the phylogenetic tree. It is asked to find a set A of species that maximizes the phylogenetic diversity and is viable in the sense that each species in A is either a source in the ecosystem, or finds prey within A. Due to concerns that one prey could be insufficient, further definitions of viability have been defined [28].

The notion of phylogenetic trees has been generalized. Ancestry of species is in recent years more often modeled with phylogenetic networks, which in contrast to phylogenetic trees also allow hybridization events or horizontal gene transfer [15]. Consequently, generalizations of phylogenetic diversity on networks have been proposed [4, 16, 31, 32, 34].

As phylogenetic networks represent the connection between species a lot better than phylogenetic trees and considering viability constraints are vital, it is natural to combine these two aspects. Yet, to the best of our knowledge, this has not been done so far. Therefore, we take the step and define problems for the maximization of phylogenetic diversity on networks with different viability definitions. As it was expected that “a combination of these concepts [would] result in very hard problems.” [28], we turn to the toolbox of parameterized complexity to break intractability.

Parameterized complexity is one method to cope with NP-hardness. In this, we consider a problem Π and a parameter p of size κ and say that (Π,p) is FPT if Π can be solved in 𝒪(f(κ)) time, for some computable function f. If Π is W[1]-hard with respect to p or even NP-hard if κ is a constant, then the existence of an FPT-algorithm is unlikely [6, 7].

We consider three definitions of viability, based on whether a non-source species in the food web needs to have one or all of its prey available, or whether a weighted sum of the available prey must reach a certain threshold. We provide a complete complexity dichotomy, for the latter two of these definitions, in the sense that for every combination of the following parameters, we show whether the defined problems are W[1]-hard, or can be solved with an FPT-result: the size constraint k; the acceptable diversity loss D¯; the scanwidth of the food web sw; the maximum in-degree δ in the network; and the network height h. For the other definition of viability, we have a near-complete complexity dichotomy that omits only one of the above combinations. In particular, we provide FPT algorithms for the most general version of the problem we define for the parameters D¯+sw and k+sw+δ+h. For the former, we depend on a notion of anchors already used in [17], but we improve on their algorithmic idea, in two ways. First, we use a color coding technique which only requires one color per edge. Secondly, we consider a non-linear order. By this, we are able to even provide algorithms for the smaller parameter k¯+sw for the version of the problem on trees.

2 Preliminaries

2.1 Definitions

We use the 𝒪-notation which omits factors polynomial in the input size. For a positive integer a, by [a] we denote the set {1,2,,a}, and by [a]0 the set {0}[a]. For functions f:AB, where B is a family of sets, we define f(A):=aAf(a), and if B, we define fΣ(A):=aAf(a), for subsets AA. We write {u,v} for an undirected edge between u and v and uv for a directed edge from u to v. For any graph G, we write V(G) and E(G) for the set of vertices or edges, respectively. For a set of edges E, we write V(E) for the vertices with at least one endpoint in an edge of E. For a vertex set VV(G), we let G[V]:=(V,{eE(G)V({e})V}) denote the subgraph of G induced by V. We generalize to edge sets EE(G) by G[E]:=G[V(E)].

Phylogenetic Networks and Phylogenetic Diversity.

For a given set X, a phylogenetic X-network 𝒩=(V,E,ω) is a directed, acyclic graph (V,E) with edge-weight ω:E>0, in which there is a single vertex with an in-degree of 0, the root ρ, and X is the set of vertices with an in-degree of 1 and an out-degree of 0, called the leaves. All other vertices split up into tree vertices, which have an in-degree of 1 and an out-degree of at least 2, and reticulations, which have an out-degree of 1 and an in-degree of at least 2. Edges incoming at reticulations (tree vertices) are reticulation edges (tree edges). The set of reticulations, tree vertices, reticulation and tree edges are denoted with VR(𝒩), VT(𝒩), ER(𝒩), and ET(𝒩), respectively. A phylogenetic X-tree 𝒯=(V,E,ω) is a phylogenetic X-network without reticulations. The set X is a set of taxa (species). We interchangeably use the words taxon and leaf. In biological applications, the set X is a set of taxa, and the other vertices of 𝒩 correspond to biological ancestors of these taxa. An edge e=uv represents direct biological inheritance from u to v. The weight ω(e) describes the phylogenetic distance between the endpoints of e. As these endpoints correspond to distinct, (possibly extinct) species, we may assume this distance is greater than zero. Reticulations correspond to species that have direct inheritance from multiple ancestors, such as hybrid species.

For a vertex vV, the offspring off(v) of v is the set of leaves xX for which there is a path from v to x. Given a set of taxa AX, let E(A) denote the set of edges uvE with off(v)A. The phylogenetic diversity PD𝒩(A) of A is defined by

PD𝒩(A):=eE(A)ω(e). (1)

In other words, the phylogenetic diversity PD𝒩(A) of a set A of taxa is the sum of the weights of edges that are on a path to a taxon in A.

For phylogenetic trees, this measure is well established [8]. For phylogenetic networks the search for the most relevant measure is still ongoing, but so far the above definition, also called All-Paths-PD, is the measure that is “the simplest” [4, 16, 31, 32, 34] and is the only measure of phylogenetic diversity considered in this paper.

Food Webs.

A food web =(X,E,γ) on X for a set of taxa X, is a directed acyclic graph (X,E) with an edge weight function γ:E[0,1].

For an edge xyE, we say that x is prey of y and y is a predator of x. Thus, edges in are directed from prey to predator. The set of prey and predators of x is denoted with prey(x) and pred(x), respectively. The set prey(E)(x) is the set of edges incoming at v and pred(E)(x) is the set of edges outgoing of v in . Taxa without prey are sources. For the problems considered in this paper, instances in which the food web has several sources can in quadratic time be transformed into one where there is only one source [21, Observation 2.3]. Therefore, throughout the paper, we assume that only has a single source s.

For a given food web , a set of taxa AX is γ-viable if γΣ({uxuprey(x)A})1 for each non-source xA. That is, the total weight of edges incoming from taxa in A is at least 1 [28]. Given also a set EE() of edges, a set of taxa AX is E-part-γ-viable if γΣ({uxprey(E)(x)uA or uxE})1 for each non-source xA. That is, the total weight of edges incoming from taxa in A together with incoming edges in E is at least 1.

We consider two important special cases of viability. If γ(e)=1 for all eE, then we say a γ-viable set AX is ε-viable. That is equivalent to saying that A is ε-viable, if each non-source in A can find prey in A. If γ(ux)=1/|prey(x)| for every edge in prey(E)(x) for each xX, then we say a γ-viable set AX is 1-viable. That is, A is 1-viable, if all prey of each taxon in A are in A.

Problem Definitions.

In the classical Maximize Phylogenetic Diversity (Max-PD) problem, we are given a phylogenetic tree 𝒯, and integers k, and D, and it is asked whether a set of k taxa with a phylogenetic diversity of at least D exists [8].

The problems Map-PD [4], ε-PDD [23], and 1-PDD [28] are generalizations of Max-PD in the following sense. In Map-PD, we are given a phylogenetic network instead of a tree and the question stays the same – just with the more general phylogenetic diversity measure. In ε-PDD, 1-PDD and Weighted-PDD, we are, in addition to the input of Max-PD, given a food web and the solution set is required to be ε-viable, 1-viable or γ-viable, respectively.

In this paper, we consider the following generalizations of these problems.

Map-Weighted-PDD

Input: A phylogenetic X-network 𝒩, a food web on X, and integers k,D.

Question: Is there an γ-viable set SX such that |S|k, and PD𝒩(S)D?

We call the set S a solution of the instance. The problems Map-ε-PDD and Map-1-PDD are defined analogously, where the set S is required to be ε-viable and 1-viable, respectively.

We note that in ε-PDD, 1-PDD, and Weighted-PDD, as originally defined, the phylogenetic network is required to be a tree.

Throughout the paper, we adopt the convention that n is the number of taxa |X|=|V()| and that m is the number of edges of the food web |E()|.

Scanwidth.

For a directed, acyclic graph G=(V,E), a rooted, directed tree T=(V,E) is a tree extension of G if for each edge uvE there is a path from u to v in T. We say that an edge uvE passes over an edge eE if the (only) path from u to v in T contains e. For an edge uvE, the set of edges that pass over uv is GW(v) and T(v) is the set of vertices that can be reached from v in T. The scanwidth of a tree extension T of G is the maximum number of edges of G that pass over an edge of T. The scanwidth of G is the minimal scanwidth of any tree extension of G [2]. Computing the tree extension and scanwidth of a directed, acyclic graph is NP-hard [2] and, when considered for phylogenetic networks, FPT when parameterized by the level of the network [11, 13]. In this paper, we will use the parameter scanwidth sw of the food web, and assume we are given a tree extension of of minimum scanwidth.

Other Main Parameters.

Here, we define the main parameters which are used in this paper.

For an instance =(𝒩,,k,D) of Map-Weighted-PDD, we define k¯:=|X|k and D¯:=PD𝒩(X)D=eEω(e)D. Observe that if a set S of k taxa with diversity D is preserved, then k¯ taxa are not in S and a diversity of D¯ is lost. We therefore speak of D¯ as the acceptable loss of diversity.

We define δ to be the maximum in-degree of a reticulation of 𝒩. We define hR and hT to be the maximum number of reticulations and tree vertices, respectively, which are in a path in 𝒩. Further, we define h:=hR+hT to be the height of the network.

Color Coding.

In this paper, we use color coding methods. For an in-depth treatment of color coding, we refer the reader to [6, Chapter 5] and [1].

Definition 2.1.

For integers n and k, an (n,k)-perfect hash family is a family of mappings f:[n][k] such that for every subset Z of [n] of size at most k, there is an f which is injective when restricted to Z.

Each mapping in an (n,k)-perfect hash family can be seen as coloring of a set of n items into k colors. If we are interested in finding a set of at most k items satisfying some property, then we may assume that each element in the set will have a different color under some coloring. This assumption can make solving a problem easier, and we will use it in the algorithms developed in Sections 3 and 4.

Proposition 2.2 ([6, 24]).

For any integers n,k1, a (n,k)-perfect hash family of size ekk𝒪(logk)logn can be constructed in time ekk𝒪(logk)nlogn.

2.2 Related work

ε-PDD, defined in [23], was conjectured to be NP-hard [29]. A formal prove appeared in [9].

Theorem 2.3 ([9, Theorem 5.1]).

ε-PDDis NP-hard even if the phylogenetic tree has a height of 2 and the food web is an out-tree.

ε-PDDhas been analyzed within parameterized complexity [21] and it has been shown that ε-PDD is FPT when parameterized with D, but W[1]-hard with respect to D¯ [21]. In [28], further definitions for viability were given, among them γ-viable and 1-viable. It has been shown that 1-PDD is W[1]-hard with respect to k, D, k¯, and D¯ [12]. Further, ε-PDD [21] and Weighted-PDD [28] were analyzed with respect to parameters categorizing the structure of the food web.

Theorem 2.4.
  1. (a)

    ε-PDD is W[1]-hard when parameterized with k¯ or D¯, even if the phylogenetic tree is a star [21, Proposition 5.1].

  2. (b)

    1-PDD is W[1]-hard when parameterized with k¯ or D¯, even if the phylogenetic tree is a star [12, Theorem 3.3].

  3. (c)

    1-PDD is W[1]-hard when parameterized with k or D, even if the phylogenetic tree is a star [12, Theorem 3.2].

In recent years, the question of how to model phylogenetic diversity in networks best has drawn some attention [4, 31, 32, 34]. The measure All-Paths-PD, as defined in [4], is hereby the easiest to understand and also computationally slightly less challenging than other definitions. Map-PD is FPT when parameterized with D, and can be solved in 𝒪(2ret𝒩) time for ret𝒩 being the number of reticulations, in 𝒪(2tw𝒩) for tw𝒩 the treewidth of the network [16], and in 𝒪(2sw𝒩) time for sw𝒩 the scanwidth of the network [14].

Theorem 2.5 ([4, 16]).

Map-PDis W[2]-hard, when parameterized with the solution size k, even if every path from the root to a leaf contains exactly one tree vertex and one reticulation.

Recently, Map-PD has been considered in semidirected networks. In semi directed networks, Map-PD can be solved in 𝒪(2) time, where is the level of the network [14].

Our Contribution.

We analyze Map-ε-PDD and Map-1-PDD with respect to the parameters k, D¯, sw, δ, and h and show, for any combination of these parameters, whether Map-1-PDD is W[1]-hard, or admits an FPT algorithm. For Map-ε-PDD, we only leave one case as a conjecture and prove all others. All algorithms we prove for the generalization Map-Weighted-PDD. Parameterized results for Map-ε-PDD and Map-1-PDD are displayed in Table 1.

In Section 3, we prove that Map-Weighted-PDD is FPT with respect to D¯+sw. In Section 4, we show that Map-Weighted-PDD is FPT with respect to k+sw+h+δ. If any of these parameters is dropped, Map-1-PDD is W[1]-hard.

The algorithm presented in Section 3 is our primary methodological contribution. In this novel approach, only a single color per edge is used in a color coding algorithm. This can be used to prove that ε-PDD and 1-PDD are FPT with respect to k¯+sw, see Corollary 3.2b. We expect this to be applicable for similar problems parameterized with k¯ as well.

Due to space restrictions, proofs of theorems and lemmas marked with are partly or fully deferred to the appendix.

Table 1: This table shows parameterized complexity results. Except for the marked cell, Map-1-PDD and Map-ε-PDD have the same complexity result. All FPT results are new. For any set of taxa A, in polynomial time PD𝒩(A) and whether A is γ-viable can be computed. Consequently, by iterating over all subsets of X, Map-Weighted-PDD is trivially FPT for n=k+k¯<k+D¯.
Entry *: Map-1-PDD is W[1]-hard when parameterized with k+δ+h or even D+δ+h (Thm. 2.4c) while we conjecture Map-ε-PDD to be FPT when parameterized with k+δ+h (Con. 4.9).
sw δ h parameter alone k + parameter D¯ + parameter
no parameter W[2]-h Thm. 2.5,[4, 16] W[1]-h Thm. 2.4a&b,[21, 12]
p-NP-h Thm. 2.3,[9] W[2]-h Thm. 2.5,[4, 16] W[1]-h Thm. 2.4a&b,[21, 12]
p-NP-h Thm. 2.3,[9] W[2]-h Cor. 2.6 W[1]-h Thm. 2.4a&b,[21, 12]
p-NP-h Thm. 2.3,[9] * see caption W[1]-h Thm. 2.4a&b,[21, 12]
p-NP-h Thm. 2.3,[9] W[2]-h Thm. 2.5,[4, 16] FPT Cor. 3.2a
p-NP-h Thm. 2.3,[9] W[2]-h Thm. 2.5,[4, 16] FPT Cor. 3.2a
p-NP-h Thm. 2.3,[9] W[2]-h Cor. 2.6 FPT Cor. 3.2a
p-NP-h Thm. 2.3,[9] FPT Cor. 4.3 FPT Cor. 3.2a

2.3 Preliminary Observations

By Theorem 2.5 Map-PD is W[2]-hard with respect to k even for a network of a small height [16]. In this hardness reduction, however, the maximal in-degree of reticulations is big. It is an easy observation, that, in polynomial time, one can replace vertices of a large degree with a stack of vertices, where the newly created edges have negligible weight compared to the original edges of the network. This works for reticulations and tree vertices alike. We obtain the following result.

Corollary 2.6.

Map-PDis W[2]-hard with respect to k even if the network is binary.

If the food web is an out-tree, each taxon xs has exactly one prey. We conclude that the result of Theorem 2.3 also holds for 1-PDD.

Corollary 2.7.

1-PDDis NP-hard even if the phylogenetic tree has a height of 2 and the food web is an out-tree.

3 Parameter 𝑫¯+𝐬𝐰𝓕

By Theorem 2.4a and b, ε-PDD and 1-PDD are W[1]-hard when parameterized by D¯. In this section, we prove that Map-ε-PDD and Map-1-PDD are FPT when parameterized by D¯+sw. Further, we show that when the phylogenetic network is a tree, even an FPT-running time with respect to the smaller parameter k¯+sw is possible. To prove these results, we present our main methodological contribution. This approach uses anchors, as introduced in [17]. Yet, it is new to use this approach either on phylogenetic networks or for the smaller parameter k¯.

Theorem 3.1 ().

Let a tree extension T of the food web with scanwidth sw be given.

  1. (a)

    Map-Weighted-PDD can be solved in 𝒪(27.530D¯+sw+𝒪(log2(D¯))n|E(𝒩)|log|E(𝒩)|) time.

  2. (b)

    Weighted-PDD can be solved in 𝒪(215.059k¯+sw+𝒪(log2(k¯))n|E(𝒩)|log|E(𝒩)|) time.

As a consequence of this theorem, we obtain these results.

Corollary 3.2.
  1. (a)

    Map-ε-PDD and Map-1-PDD are FPT with respect to D¯+sw.

  2. (b)

    ε-PDD and 1-PDD are FPT with respect to k¯+sw.

For the remainder of this section, fix a tree extension T of the food web. Let x1,xn be an ordering of the taxa such that if xi is a parent of xj in T then i<j, and if xi and xj share a parent in T with i<j, then every vertex in the subtree of T rooted xi appears before every vertex in the subtree of T rooted xj in the ordering. Such an ordering can be found by taking a depth-first traversal of T. For a set of edges F in a directed acyclic graph G, we say an edge e=uvF is a highest edge in F if no incoming edge of u is in F, and similarly we say uvF is a lowest edge in F if no outgoing edge of v is in F. For a vertex u with outgoing edges uv and uv, we say uv is a sibling edge of uv.

The main ideas of our proof is as follows. Our approach uses color coding techniques, wherein we generate a number of colorings on the edges of 𝒩, and seek a solution under the assumption that a certain set fulfills that each edge in the set has a different color.

Our dynamic programming algorithm will search for a structure that we define below as perfect triples. Roughly speaking, a perfect triple (A,χ1,χ2) consists of a set of taxa A such that XA is a solution for our instance of Map-Weighted-PDD, and colorings χ1,χ2 assigning each leaf in A to a set of colors. Suppose the leaves of A are ’killed’ one at a time, in order determined by the depth-first traversal of T. Then as each leaf is deleted, a certain set of edges will be ’lost’ in the sense that they no longer have any offspring that have not been killed. For each xA, the set χ1(x) corresponds to the colors of the edges that are lost when x is deleted. Furthermore, for each highest edge e that is lost when x is killed, there must be a corresponding sibling edge e that has not yet been lost (otherwise the parent edge(s) of e would also be lost. The colors of these sibling edges are represented by the set χ2(x). Note that some of these sibling edges may themselves be lost when a later leaf from A is killed. We may assume, via standard color-coding techniques, that all the lost edges and sibling edges together are multicolored.

Our dynamic programming algorithm will keep track of the existence of perfect triples satisfying certain properties. In particular, we store the minimum possible total weight of the set of lost edges for a perfect triple.

We now give some formal definitions.

Definition 3.3.

Fix an integer N and a mapping c:ET(𝒩)[N]. For a taxon xX and a set of colors C[N], a set of edges FET(𝒩) is (x,C)-respecting, if

  • F contains the edge incoming at x,

  • c(e)C for each eF,

  • for each uvF there is a directed path from v to x within 𝒩[F], and

  • there is a set F of edges, which we call anchors of F, such that

    • c(e)C for each edge eF,

    • c(e1)c(e2) for each pair e1,e2FF, and

    • for each edge uvF, exactly one of the following is true

      1. (1)

        F contains all edges incoming at u and c(e)C for each edge e outgoing of u.

      2. (2)

        F does not contain any edge incoming at u and there is an edge e outgoing of u with c(e)C and eF.

When (x,C) is clear from the context we say an (x,C)-respecting set is simply respecting.

In Figure 1 an example of a respecting set of edges is given and some example networks for which x under a certain coloring has no respecting set of edges.

Figure 1: Top Left: An example network. The set Fx,C is highlighted in Refer to caption and one possible set Fx,C in Refer to caption. For the sake of readability, colors of edges in Fx,C are omitted. Bottom Left: The bipartite graph which is constructed in Lemma 3.5. A perfect matching is highlighted.
Right: Four examples of networks, where Fx,C is not a respecting set of edges.

We continue with proving some essential properties about respecting sets.

Lemma 3.4.

For each taxon xX and each set C[N] of colors, at most one set of edges is (x,C)-respecting.

Proof.

Towards a contradiction, assume that F1 and F2 are (x,C)-respecting and that there is uvF1 such that uvF2. Fix a set F1 of anchors of F, and a set F2 of anchors of F2. Choose any path from v to x in 𝒩[F1] and let uv be the first edge on this path that occurs in F2, also. Such an edge exists as the edge incoming at x is in any respecting set of edges. As F2 does not contain all edges incoming at u, there is an edge uwF2 with c(uw)C. But then F1 is not respecting, as F1 contains an edge incoming at u.

As a consequence of Lemma 3.4, we write Fx,C for the unique set of (x,C)-respecting edges, if existent. If the set Fx,C exists, we write c(Fx,C) for the colors on edges in Fx,C. By definition, c(Fx,C) and C are disjoint. If the set Fx,C does not exist, we define ω(Fx,C)=. We note that the set of anchors is not necessarily unique.

Lemma 3.5.

For a taxon xX and a set of colors C[N], in 𝒪(|E(𝒩)|N) time, we can compute Fx,C, or conclude that it does not exist.

Proof.

We construct a unique set of edges F such that either F is respecting for x and C, or such a set does not exist, as follows. Initially let F contain the unique incoming edge e at x. Now, for each edge uv in F in turn, if u has no outgoing edge uv with c(uv)C, then add all incoming edges of u to F. Repeat this process exhaustively to complete the construction of F. If c(e)C for some eF or if two edges have the same color, no respecting set exists. By using appropriate data structures, this can be implemented in 𝒪(|E(𝒩)|) time.

It remains to decide whether there exists a valid set of anchors F for F. In particular, for each highest edge uvF, we need to choose one outgoing edge uv with c(uv)C to add to F such that c(e1)c(e2) for each pair e1,e2F. This can be done by reducing to an instance of Perfect Matching, as follows. Construct a bipartite graph G with vertex set FhC, where Fh corresponds to the set of highest edges in F. For each eFh and cC, add an edge {e,c} to G if e has a sibling edge e with c(e)=c. Now, F is an (x,C)-respecting set if and only if G has a perfect matching covering Fh. Perfect Matching can be solved in 𝒪(|E(G)||V(G)|) time with the famous Hopcroft-Karp Algorithm [19]. For each edge in 𝒩, there is at most one edge in G. Thus, the overall running time is 𝒪(|E(𝒩)|N).

With respecting sets defined, we now formally define perfect triples. A triple (A,χ1,χ2), consisting of a set of taxa AX, and mappings χ1,χ2:A2[N], is perfect, if

  • the sets χi(x) and χi(y) are pairwise disjoint for x,yA and i{1,2},

  • the sets χ1(xi) and χ2(xj) are pairwise disjoint for xi,xjA and ij, and

  • for each xA there is a set of respecting edges Fx,χ2(x)E(𝒩) with χ1(x)=c(Fx,χ2(x)).

To give some intuition behind the notion of a perfect triple, we observe that the existence of a perfect triple (A,χ1,χ2) provides a lower bound on the phylogenetic diversity of (XA). The key idea is that as we remove the elements of A from X, one at a time, the set of edges lost with the removal of each xA is a subset of the edges in Fx,χ2(x).

Lemma 3.6 ().

PD𝒩(XA)ω(E(𝒩))xAω(Fx,χ2(x)) for every perfecttriple (A,χ1,χ2).

Having these definitions, we now define a colored problem, which we use as an auxiliary problem for solving Theorem 3.1. In ex-N-colored-Map-W-PDD, besides the usual input of Map-Weighted-PDD (𝒩,,k,D) – where γ is a weighting of – we are given a mapping c:E(𝒩)[N] of a color per edge. We ask whether there exists a set of taxa AX of size at least k¯ and mappings χ1,χ2:A2[N] such that XA is γ-viable, the triple (A,χ1,χ2) is perfect, and xAω(Fx,χ2(x))D¯.

Note that the existence of a perfect triple (A,χ1,χ2) does not imply that XA is γ-viable.

Lemma 3.7 ().

Given a tree-extension T of the food web, we can solve instances of ex-N-colored-Map-W-PDD in 𝒪(5N2swn(Nk¯+|E(𝒩)|)) time.

The intuition behind Lemma 3.7 is as follows: We consider the tree extension of the food web with a dynamic program bottom up. At each vertex v, we determine whether there exists a perfect triple (A,χ1,χ2) satisfying certain conditions, where A is a subset of T(v), the set of vertices descended from v in T. We want that XA is γ-viable; to help determine this we keep track of a subset of edges ΦGW(v), and require that T(v)A is Φ-part-γ-viable.

It remains to show how to reduce instances of Map-Weighted-PDD to instances of ex-N-colored-Map-W-PDD. This is done using standard color-coding techniques.

Proof of Theorem 3.1.

Reduction.

Let =(𝒩,,k,D) be an instance of Map-Weighted-PDD. If 𝒩 is a tree, then set N to 4k¯2 and otherwise to 2D¯.

Arbitrarily order the edges e1,,eq of 𝒩. We may assume q>N, as otherwise, we can consider a single instance of ex-q-colored-Map-W-PDD. Let be a (q,N)-perfect hash family. For every f we define a coloring cf by cf(ej)=f(j) for j[q] and let f=(𝒩,,k,D,cf) be the corresponding instance of ex-N-colored-Map-W-PDD. Solve every instance f, and return yes if and only if f is a yes-instance for some f.

Correctness.

The proof of the correctness and running time is deferred to the appendix.

4 Parameter 𝒌+𝐬𝐰𝓕+𝜹+𝒉

By Theorem 2.5, Map-ε-PDD and Map-1-PDD are W[2]-hard with respect to k+h, even if the food web does not contain edges. In the following, we therefore add the maximum in-degree of a reticulation δ as a parameter and show that even the more general problem Map-Weighted-PDD is FPT with respect to k+sw+h+δ.

To do this, we consider a parameter H generalizing the height of a tree. H is the maximum number of tree edges that, in 𝒩, are on a path from the root to any taxon. We observe that in a phylogenetic tree, H is the height of the tree minus one. Next, we prove bounds on the value of H in a phylogenetic network.

Lemma 4.1 ().

htH and Hδhrhtδh.

In this section we prove the following.

Theorem 4.2.

Given a tree extension T of the food web with scanwidth sw, Map-Weighted-PDD can be solved in 𝒪(22.443kH+sw+𝒪(log2(kH))swn|E(𝒩)|2log|ET(𝒩)|) time.

As Map-ε-PDD and Map-1-PDD are special cases of Map-Weighted-PDD, this result transfers to them.

Corollary 4.3.

Map-ε-PDDand Map-1-PDD are FPT with respect to k+sw+h+δ.

First we define some objects necessary in this chapter. For a a set A of taxa and a mapping χ:X2[kH], we say that tuple (A,χ) is colorful if χ(x)χ(y) for x,yA implies x=y.

Definition 4.4.

Fix a mapping c:ET(𝒩)[kH]. For a taxon xX and a mapping χ:X2[kH], A set of edges FE(𝒩) is (x,χ)-suitable, if

  • c(e)χ(x) for each edge eFET(𝒩),

  • c(e1)c(e2) for each pair e1,e2FET(𝒩), and

  • for each edge uvF, there is a path from v to x in 𝒩[F].

When (x,χ) is clear from the context we say an (x,χ)-respecting set is respecting.

Figure 2 shows an example. Note that a respecting set may contain reticulation edges.

Figure 2: A hypothetical network on 3 colors {,,}. For each taxon the (x,χ)-suitable set Fx,χ is indicated in , , or . In this network, H takes the value of 3 – there are three tree edges on paths to x3 or x4.

To prove Theorem 4.2, we first show how to solve a colored variant of Map-Weighted-PDD. In kH-colored-Map-W-PDD, besides the usual input (𝒩,,k,D), we are given a mapping c:ET(𝒩)[kH] of a color per tree edge. We ask whether a γ-viable set of taxa SX of size at most k and a mapping χ:S2[kH] exist such that (S,χ) is colorful and for each xS there is a set of suitable edges Fx,χ and xSω(Fx,χ)D.

Lemma 4.5.

Given an instance of kH-colored-Map-W-PDD, a leaf x, and a mapping χ, a suitable edge set F with maximal value ω(F) can be computed in 𝒪(2H|E(𝒩)|(kH+|E(𝒩)|)) time.

Because of this lemma, in the rest of the section, we simply will write ω(Fx,χ) when referring to the weight of a suitable edge set for x, which has maximal weight.

Proof.

Algorithm.

Let Ex be the set of tree edges uv with xoff(v). For a taxon x and a mapping χ, iterate over all subsets F of Ex. If uvF and u is a reticulation, then add all incoming edges of u to F, for each uvF. Add the edge incoming at x to F. Return the biggest value of a so computed F which is suitable.

Correctness.

If the algorithm returns value d, then there is a suitable set F with d=ω(F).

Let F be a suitable set. The set Fx:=FEx appears in the iteration. The edge incoming at x is in F, as it is suitable. Reticulation edges have no color. Thus, adding reticulation edges leading to F keeps the set suitable. Consequently, a set F with FF has been considered by the algorithm.

Running Time.

The set Ex has size at most H, by definition. All sets F are computed in 𝒪(2H|E(𝒩)|) time. Checking whether a set F is colorful can be done in 𝒪(kH+|E(𝒩)|) time.

Lemma 4.6.

If (S,χ) is colorful, then any (x,χ)-suitable set Fx,χ and any (y,χ)-suitable set Fy,χ are pairwise disjoint for x,yS, xy.

Proof.

Fix vertices x,yS and an edge uvFx,χ. If uv is a tree edge, then c(uv)χ(x) and because χ(x) and χ(y) are disjoint, we conclude uvFy,χ.

Now let uv be a reticulation edge. Let w be the unique first tree vertex on a path from v to x – or any other leaf. Then c(ew)χ(x) for the edge ew incoming at w. We conclude that ewFy,χ and there is no path from v to y in 𝒩[Fy,χ]. Consequently uvFy,χ.

If v=x, or the only taxon reachable from v is x, then uvFy,χ.

We write FS,χ for the union xSFx,χ for a set S. For a colorful set S we conclude with Lemma 4.6 that PD𝒩(S)ω(FS,χ)D.

We are now ready to prove how to solve instances of kH-colored-Map-W-PDD.

Lemma 4.7 ().

Given an instance of kH-colored-Map-W-PDD and a tree-extension T of the food web, we can solve in 𝒪(2kH+swk4H2swn|E(𝒩)|2) time.

The intuition behind Lemma 4.7 is as follows: We consider the tree extension of the food web bottom up in a dynamic programming algorithm. We track the existence of colorful tuples whose taxa are all below a given vertex of the tree extension. We index colorful tuples by the sets of colors used, as well as by the set of edges Φ for which the set of taxa is Φ-part-γ-viable, and we store the total weight of the suitable edge sets. We use the fact that the sets χ(x) and χ(y) have to be pairwise disjoint. Therefore, if a taxon x is saved and a set of colors has been assigned, these colors can be removed. We define χ(x) when selecting x.

It remains to show how to reduce instances of Map-Weighted-PDD to an instances of kH-colored-Map-W-PDD. For this, we use perfect hash families. (Definition 2.1)

Proof of Theorem 4.2.

Reduction.

Let =(𝒩,,k,D) be an instance of Map-Weighted-PDD.

Arbitrarily order the tree edges e1,,eq of 𝒩. We may assume q>kH. Let be a (q,kH)-perfect hash family. For every f we define a coloring cf by cf(ej)=f(j) for j[q] and let f=(𝒩,,k,D,cf) be the corresponding instance of kH-colored-Map-W-PDD. Now, solve instance f, and return yes if and only if f is a yes-instance for some f.

Correctness.

For any set EET(𝒩) of edges with a size of at most kH, there is a function f such that cf(E) contains each color at most once.

Now, let be a yes-instance of Map-Weighted-PDD with solution S={x1,,xk}X. Further, let ET(1) be the set of tree edges on paths from the root to x1 in 𝒩, and for i[k1] let ET(i+1) be the set of tree edges on paths to xi+1 which are not in ET(i). We define ET(S) as the union of these sets. By definition of H, each set ET(i) has a size of at most H. By definition of perfect hash families, there is some f, such that cf is injective on ET(S). Taking χ(xi)=cf(ET(i)), we conclude that (S,χ) is colorful and ω(FS,χ)=PD𝒩(S)D. Thus, (S,χ) is a solution of the yes-instance f of kH-colored-Map-W-PDD.

Conversely, whenever (S,χ) is a solution for instance f, then S is also a solution for .

Running Time.

The construction of takes ekH(kH)𝒪(logkH)qlogq time, and for each f the construction of instance f of kH-colored-Map-W-PDD takes time linear in ||. By Lemma 4.7, instances of kH-colored-Map-W-PDD can be solved in 𝒪(2kH+swk4H2swn|E(𝒩)|2) time, and the number of instances is ||=ekH(kH)𝒪(logkH)logq.

Thus, the total running time is 𝒪(ekH(kH)𝒪(logkH)logq(q+2kH+swk4H2swn|E(𝒩)|2)). This simplifies to 𝒪((2e)kH2sw+𝒪(log2(kH))swn|E(𝒩)|2log|ET(𝒩)|).

By Theorem 2.4c, Map-Weighted-PDD is not FPT with respect to k+h. But this differs when we add sw as a parameter. In a phylogenetic tree, the value of H is h.

Corollary 4.8.

Weighted-PDDis FPT with respect to k+h+sw and can be solvedin 𝒪(22.443kh+sw+𝒪(log2(kh))swn|E(𝒩)|2log|ET(𝒩)|) time.

Map-ε-PDD with respect to 𝒌+𝜹+𝒉

By Theorem 4.2, Map-Weighted-PDD is FPT with respect to k+sw+δ+h and therefore also Map-ε-PDD and Map-1-PDD. If any of the four parameters is dropped then Map-1-PDD is W[1]-hard, as pointed out in Table 2.

Table 2: Map-1-PDD is W[1]-hard if one of the parameters of k+sw+δ+h is dropped.
sw+δ+h k+sw+δ k+sw+h k+δ+h
Theorem 2.3,[9] Corollary 2.6 Theorem 2.5,[4, 16] Theorem 2.4c,[28]

While the first three hardness results hold for both problems, the last explicitly only shows hardness for Map-1-PDD. We expect this hardness not to hold for Map-ε-PDD, but believe that with an approach similar to the one presented in [21] to show that ε-PDD is FPT with respect to k+h, one can also show that Map-ε-PDD is FPT when parameterized with k+δ+h. Unfortunately, the proof given in [21] has an incorrect lemma. An example of the error is explained in greater detail in this paper’s Arxiv version [18]. We still believe both of the claims, the one [21] and following, to hold.

Conjecture 4.9.

Map-ε-PDDis FPT when parameterized with k+δ+h.

5 Discussion

In this paper, we made the approach to combine the natural problems of maximizing phylogenetic diversity in a network with viability constraints given through a food web. We defined the problem Map-Weighted-PDD and its special cases Map-ε-PDD and Map-1-PDD. We provided several FPT algorithms for these problems and even presented a complete complexity dichotomy by showing for which combination of parameters of k, D¯, sw, δ, and h, the three problems are in FPT and for which they are W[1]-hard.

Still, several questions remain open. The most obvious is whether ˜4.9 holds.

Further, in Section 3, we showed that Map-Weighted-PDD is FPT with respect to D¯+sw. We showed that this approach is sufficient to show that Weighted-PDD is FPT with respect to the smaller parameter k¯+sw. Yet, it is unclear whether Map-Weighted-PDD admits an FPT algorithm for this parameter.

Another major open question, already pointed out in [21], is whether ε-PDD and 1-PDD are FPT when parameterized with k. This question even remains open if the food web is restricted to an out-tree – or if any vertex in has a degree of at most 1.

In this paper, we observed the All-Paths-PD measure in phylogenetic networks as a measure of phylogenetic diversity. This measure is computationally the least challenging, but probably not the best measure in capturing phylogenetic diversity. In recent years, several other measures have been proposed [4, 31, 32, 34]. We wonder if there are also efficient algorithms for these problems that respect the viability of the selected set of taxa.

References

  • [1] Noga Alon, Raphael Yuster, and Uri Zwick. Color-Coding. Journal of the Association for Computing Machinery (JACM), 42(4):844–856, 1995. doi:10.1145/210332.210337.
  • [2] Vincent Berry, Celine Scornavacca, and Mathias Weller. Scanning phylogenetic networks is NP-hard. In Proceedings of the 46th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2020), pages 519–530. Springer, 2020.
  • [3] Magnus Bordewich and Charles Semple. Nature Reserve Selection Problem: A Tight Approximation Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(2):275–280, 2008. doi:10.1109/TCBB.2007.70252.
  • [4] Magnus Bordewich, Charles Semple, and Kristina Wicke. On the complexity of optimising variants of phylogenetic diversity on phylogenetic networks. Theoretical Computer Science, 917:66–80, 2022. doi:10.1016/J.TCS.2022.03.012.
  • [5] Gerardo Ceballos and Paul R. Ehrlich. Mutilation of the tree of life via mass extinction of animal genera. Proceedings of the National Academy of Sciences, 120(39):e2306987120, 2023.
  • [6] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. doi:10.1007/978-3-319-21275-3.
  • [7] Rodney G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity. Texts in Computer Science. Springer, 2013. doi:10.1007/978-1-4471-5559-1.
  • [8] Daniel P. Faith. Conservation evaluation and Phylogenetic Diversity. Biological Conservation, 61(1):1–10, 1992.
  • [9] Beáta Faller, Charles Semple, and Dominic Welsh. Optimizing Phylogenetic Diversity with Ecological Constraints. Annals of Combinatorics, 15(2):255–266, 2011.
  • [10] Mariana Napolitano Ferreira. Conservation priorities mapping – A first step toward building area-based strategies. Frontiers in Science, 2:1440501, 2024.
  • [11] Niels Holtgrefe. Computing the Scanwidth of Directed Acyclic Graphs. Master’s thesis, Delft University of Technology, 2023.
  • [12] Niels Holtgrefe, Jannik Schestag, and Norbert Zeh. Limits of Kernelization and Parametrization for Phylogenetic Diversity with Dependencies. Manuscript in preparation, 2025.
  • [13] Niels Holtgrefe, Leo van Iersel, and Mark Jones. Exact and Heuristic Computation of the Scanwidth of Directed Acyclic Graphs. arXiv preprint arXiv:2403.12734, 2024. doi:10.48550/arXiv.2403.12734.
  • [14] Niels Holtgrefe, Leo van Iersel, Ruben Meuwese, Yuki Murakami, and Jannik Schestag. PANDA: Maximizing All-Path Phylogenetic Diversity in Networks. Manuscript in preparation, 2025.
  • [15] Daniel H. Huson and David Bryant. Application of Phylogenetic Networks in Evolutionary Studies. Molecular biology and evolution, 23(2):254–267, 2006.
  • [16] Mark Jones and Jannik Schestag. How Can We Maximize Phylogenetic Diversity? Parameterized Approaches for Networks. In Proceedings of the 18th International Symposium on Parameterized and Exact Computation (IPEC 2023). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.IPEC.2023.30.
  • [17] Mark Jones and Jannik Schestag. Maximizing Phylogenetic Diversity under Time Pressure: Planning with Extinctions Ahead. arXiv preprint arXiv:2403.14217, 2024. doi:10.48550/arXiv.2403.14217.
  • [18] Mark Jones and Jannik Schestag. Parameterized Algorithms for Diversity of Networks with Ecological Dependencies. arXiv preprint, 2025. arXiv:2510.09512.
  • [19] Marek Karpiński and Wojciech Rytter. Fast Parallel Algorithms for Graph Matching Problems. Oxford University Press, 1998.
  • [20] Christian Komusiewicz and Jannik Schestag. A Multivariate Complexity Analysis of the Generalized Noah’s Ark Problem. In Proceedings of the 19th Cologne-Twente Workshop on Graphs and Combinatorial Optimization, pages 109–121. Springer, 2023.
  • [21] Christian Komusiewicz and Jannik Schestag. Maximizing Phylogenetic Diversity under Ecological Constraints: A Parameterized Complexity Study. In Proceedings of the 44th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.FSTTCS.2024.28.
  • [22] Bui Quang Minh, Steffen Klaere, and Arndt von Haeseler. Phylogenetic Diversity within Seconds. Systematic Biology, 55(5):769–773, October 2006. doi:10.1080/10635150600981604.
  • [23] Vincent Moulton, Charles Semple, and Mike Steel. Optimizing phylogenetic diversity under constraints. Journal of Theoretical Biology, 246(1):186–194, 2007.
  • [24] Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. Splitters and near-optimal Derandomization. Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 182–191, 1995. doi:10.1109/SFCS.1995.492475.
  • [25] Fabio Pardi and Nick Goldman. Species Choice for Comparative Genomics: Being Greedy Works. PLoS Genetics, 1(6):e71, 2005.
  • [26] Fabio Pardi and Nick Goldman. Resource-Aware Taxon Selection for Maximizing Phylogenetic Diversity. Systematic Biology, 56(3):431–444, 2007.
  • [27] Stuart L. Pimm, Clinton N. Jenkins, Robin Abell, Thomas M. Brooks, John L. Gittleman, Lucas N. Joppa, Peter H. Raven, Callum M. Roberts, and Joseph O. Sexton. The biodiversity of species and their rates of extinction, distribution, and protection. Science, 344(6187):1246752, 2014.
  • [28] Jannik Schestag. Weighted Food Webs Make Computing Phylogenetic Diversity So Much Harder. arXiv preprint, 2025. arXiv:2510.05911.
  • [29] Andreas Spillner, Binh T. Nguyen, and Vincent Moulton. Computing Phylogenetic Diversity for Split Systems. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(2):235–244, 2008. doi:10.1109/TCBB.2007.70260.
  • [30] Mike Steel. Phylogenetic Diversity and the Greedy Algorithm. Systematic Biology, 54(4):527–529, 2005.
  • [31] Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, and Mathias Weller. Average-Tree Phylogenetic Diversity of Networks. In 25th International Workshop on Algorithms in Bioinformatics (WABI 2025). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPIcs.WABI.2025.15.
  • [32] Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, and Mathias Weller. Phylogenetic Network Diversity Parameterized by Reticulation Number and Beyond. In Proceedings of the 23rd RECOMB International Workshop on Comparative Genomics (RECOMB-CG 2025), Seoul, Republic of Korea, 2025.
  • [33] Mark Vellend, William K. Cornwell, Karen Magnuson-Ford, and Arne Ø. Mooers. Measuring phylogenetic biodiversity, 2011.
  • [34] Kristina Wicke and Mareike Fischer. Phylogenetic diversity and biodiversity indices on phylogenetic networks. Mathematical Biosciences, 298:80–90, 2018.

A Appendix – Omitted proofs

Proof of Theorem 3.1.

Reduction.

Let =(𝒩,,k,D) be an instance of Map-Weighted-PDD. If 𝒩 is a tree, then set N to 4k¯2 and otherwise to 2D¯.

Arbitrarily order the edges e1,,eq of 𝒩. We may assume q>N, as otherwise, we can consider a single instance of ex-q-colored-Map-W-PDD. Let be a (q,N)-perfect hash family. For every f we define a coloring cf by cf(ej)=f(j) for j[q] and let f=(𝒩,,k,D,cf) be the corresponding instance of ex-N-colored-Map-W-PDD. Solve every instance f, and return yes if and only if f is a yes-instance for some f.

Correctness.

For any subset of edges E with |E|N, there is some f such that cf is injective on E by Definition 2.1.

Let f be a yes-instance of ex-N-colored-Map-W-PDD. Thus, there is a perfect triple (A,χ1,χ2) such that XA is γ-viable, A has a size of at least k¯ and for each xA there is a set of respecting edges Fx,χ2(x)E(𝒩) with χ1(x)=c(Fx,χ2(x)) and xAω(Fx,χ2(x))D¯. We show that S:=XA is a solution for instance . By definition, S is γ-viable and has a size of |X||A||X|k¯=k. As (A,χ1,χ2) is perfect, Fx,χ2(x) have pairwise disjoint colors for x,yA and are therefore disjoint. We conclude with Lemma 3.6 that PD𝒩(S)ω(E(𝒩))xAω(Fx,χ2(x))ω(E(𝒩))D¯=D.

Now, let be a yes-instance of Map-Weighted-PDD with solution SX. Let A:=XS and since we can assume that S has a size of k, the size of A is k¯. Let EAE(𝒩) be the set of edges uv where off(v)A. As DPD𝒩(S)=ω(E(𝒩)EA), we conclude that ω(EA)D¯.

Now define Fx1 to be the edges uv in EA with x1off(v), and define Fxi to be the edges uv in EAj=1i1Fxj with xioff(v). We define sets Fxi as follows. For each edge uvFxi, for which no incoming edge of u is in Fxi, we add an edge uw outgoing of u to Fxi, where uw is from Zi:=((E(𝒩)EA)j=1i1Fxi)j=1i1Fxi. If Fxi contains more than one edge outgoing of u, adding one edge is sufficient.

Claim A.1.

The set Zi contains an edge outgoing of u.

Proof.

Assume first that j=1i1Fxi contains an edge uv. Without loss of generality, let uvFxt and j=t+1i1Fxi does not contain edges outgoing of u. Then there is an edge uvtFxt and by construction uvt is not in Fxj for any j[i]. Consequently, the set Zi contains uvt.

Now, assume that j=1i1Fxi does not contain edges outgoing of u. If E(𝒩)EA contains an edge e outgoing of u, then eZi. Otherwise, all edges incoming at u are in EA. As Fxi contains no edge incoming at u, these edges and at least one edges e outgoing of u have to be in Fxt for some t[i1]. We conclude eZi and the set Zi contains an edge outgoing of u.

The sets Fxi are constructed to be the respecting sets and Fxi are the auxiliary sets from the definition of respecting sets. We observe that EA is the union of all Fxi. Thus, if 𝒩 is a tree, then |EA|2k¯1, as a forest with k¯ leaves has at most 2k¯1 edges. If 𝒩 is a network, then |EA|ω(EA)=ω(E(𝒩))PD𝒩(S)D¯. As we added at most one edge to Fxi per edge of Fxi, we conclude that the union U of all Fxi and Fxi contains at most 4k¯2 if 𝒩 is a tree, and 2D¯ edges if 𝒩 is a network.

Consequently, there is some f, such that cf is injective on U. We set χ1(xi)=cf(Fxi) and χ2(xi)=cf(Fxi). By the construction we conclude that (A,χ1,χ2) is perfect, S is γ-viable, and i=1k¯ω(Fxi,χ2(xi))=i=1k¯ω(Fxi)=ω(EA)k¯. Thus, f is a yes-instance of ex-N-colored-Map-W-PDD.

Running Time.

The construction of takes eNN𝒪(logN)qlogq time (Proposition 2.2), and for each f the construction of instance f of ex-N-colored-Map-W-PDD takes time linear in ||. By Lemma 4.7, solving instances of ex-N-colored-Map-W-PDD takes 𝒪(5N2swn(N2k¯2+|E(𝒩)|)) time, and the number of instances is ||=eNN𝒪(logN)logq.

Thus, the total running time is 𝒪(eNN𝒪(logN)logq(q+5N2swn(N2k¯2+E(𝒩)))). This simplifies to 𝒪((5e)N2sw+𝒪(log2(N))n|E(𝒩)|log|E(𝒩)|), as k¯N.

Inserting 2D¯ or 4k¯2 into N, respectively, gives the desired running times.

Proof of Lemma 3.6.

Recall that x1,,xn is the ordering of X given by a depth-first traversal of T. For the sake of notational convenience, assume that A={x1,,x|A|}.

The intuitive idea behind our proof is to show that, as we kill each of the taxa x1,,x|A| in order, the amount of diversity we lose by killing xi is at most ω(Fxi,χ2(xi)). To this end, define Ext𝒩(A):=E(𝒩)E(XA) for any AX. That is, Ext𝒩(A) is the set of edges in 𝒩 with all offspring in A.

We prove the following claim by induction. Ext𝒩({x1,,xs})i=1sω(Fxi,χ2(xi)), for each s[|A|]. Note that by letting s=|A|, this implies PD𝒩(XA)=ωΣ(E(𝒩)Ext𝒩(A))=ωΣ(E(𝒩))ωΣ(Ext𝒩(A))ωΣ(E(𝒩)xAω(Fx,χ2(x)), as required.

For the base case, observe that Ext𝒩({x1}) consists of the edges uv for which v has a path to x, and either v=x or v is a reticulation. But the definition of a respecting set implies that all such edges are also in Fx1,χ2(x1), and so Ext𝒩({x1})Fx1,χ2(x1).

For the inductive step, assume that Ext𝒩({x1,,xs})i=1sω(Fxi,χ2(xi)) for some s<|A|, and now assume for a contradiction that Ext𝒩({x1,,xs+1}) is not a subset of i=1s+1ω(Fxi,χ2(xi)). Then consider a lowest edge e=uv in Ext𝒩({x1,,xs+1})i=1s+1ω(Fxi,χ2(xi)). By definition edges outgoing of v are also in Ext𝒩({x1,,xs+1}), and as uv is a lowest edge, all edges outgoing of v are also in i=1s+1ω(Fxi,χ2(xi)). At least one child edge vw is not in Ext𝒩({x1,,xs}) (otherwise uvExt𝒩({x1,,xs})i=1sω(Fxi,χ2(xi)), and this edge vw must be in Fxs+1,χ2(xs+1). Then as v has an edge incoming which is not in Fxs+1,χ2(xs+1), it follows by definition of a respecting set that vw has a sibling edge vw with c(vw)χ2(s+1).

Note however that vwi=1s+1ω(Fxi,χ2(xi)), as c(vw)χ2(xs+1), which is disjoint from χ1(vi) for each is, and χ1(e)χ1(vi) for each eFxi,χ2(xi). Therefore, vw is also not in Ext𝒩({x1,,xs+1}), as otherwise uv was not a lowest edge in Ext𝒩({x1,,xs+1})i=1s+1ω(Fxi,χ2(xi)). It follows that w, and therefore v, has a path to a leaf which is not in {x1,,xs+1}, contradicting the assumption that uv was not a lowest edge in Ext𝒩({x1,,xs+1}).

Proof of Lemma 3.7.

Intuition.

We consider the tree extension of the food web with a dynamic program bottom up. At each vertex v, we determine whether there exists a perfect triple (A,χ1,χ2) satisfying certain conditions, where A is a subset of T(v), the set of vertices descended from v in T. We want that XA is γ-viable; to help determine this we keep track of a subset of edges ΦGW(v), and require that T(v)A is Φ-part-γ-viable.

Table Definition.

For a set of taxa QX, sets of colors C1,C2[N], a set of edges ΦE() between Q and XQ, and an integer [k¯]0, let 𝒮(Q,C1,C2,Φ,) be the set of perfect triples (A,χ1,χ2) of a set AQ of size at least and mappings χ1,χ2:A2C with χ1(A)=C1 and χ2(A)χ1(A)C2, for which QA is Φ-part-γ-viable.

We define a dynamic programming algorithm with table DP. For a vertex vV(T), sets of colors C1,C2[N], a set of edges ΦGW(v), and an integer [k]0, we store in DP[v,C1,C2,Φ,] the minimum value of ω(FA,χ2):=xAω(Fx,χ2(x)) for any perfect triple (A,χ1,χ2)𝒮(T(v),C1,C2,Φ,). If v an internal vertex of T with children w1,,wt, then we define further auxiliary tables DPi[v,C1,C2,Φ,Ψ,] analogously, with (S,χ1,χ2) considered from 𝒮(Qi,C1,C2,Pi,) with Qi:=j=1iT(wj) and Pi:=(ΦΨ)(j=1iGW(wj)); that is – only the first i children of v are considered. We require that the set of edges Ψ is either empty or pred(E)(v).

Algorithm.

We define the table in a bottom-up fashion. Let v be a leaf of T. (That is a taxon without predators). Fix disjoint color sets C1,C2[N]. If >1, we store in DP[v,C1,C2,Φ,]. If =0 and γΣ(Φ)1, we store 0 in DP[v,C1,C2,Φ,=0]. Otherwise – if =1 or γΣ(Φ)<1 – we store min{ω(Fv,C2)C2C2,c(Fv,C2)=C1} in DP[v,C1,C2,Φ,=1].

Now, let v be an internal vertex of T with children w1,,wt such that wi comes before wi+1 in the depth-first traversal ordering of X, for i[t1]. We define DP1[v,C1,C2,Φ,Ψ,] to be DP[w1,C1,C2,(ΦΨ)GW(w1),]. To compute further values for i[t1], we use the following recurrence in which we define DPi+1[v,C1,C2,Φ,Ψ,] to be

min DPi[v,C1,C2(C1C1),Φ,Ψ,] (3)
+DP[wi+1,C1C1,C2C2,(ΦΨ)GW(wi+1),].

Here, we take the minimum over all C1C1,C2C2 and []0.

Finally, if v=s or γΣ(Φprey(E)(v))1, we set DP[v,C1,C2,Φ,] to be

min{DPt[v,C1,C2,Φ,pred(E)(v),];DPt}. (4)

Here, DPt is minC2C2DPt[v,C1c(Fv,C2),C2C2,Φ,,1]+ω(Fv,C2). Otherwise, we set DP[v,C1,C2,Φ,] to be DPt. Intuitively, DPt corresponds to the case that v is one of the taxa to be going extinct, while DPt[v,C1,C2,Φ,pred(E)(v),] corresponds to the case that DPt[v,C1,C2,Φ,pred(E)(v),] corresponds to the case that v is saved.

We return yes if DP[s,C1,C2,,k¯]D¯, for some C1,C2[N] which are disjoint. Otherwise, we return no.

Correctness.

Let us quickly consider the basic case. For a leaf v in T, the set T(v) only contains v and so 𝒮(T(v),C1,C2,Φ,) for >1 is empty. If =0, the only possible triples (A,χ1,χ2) in 𝒮(T(v),C1,C2,Φ,) have A{v}, i.e. we must let v survive or go extinct. We can only let v survive if {v} is Φ-part-γ-viable. Thus we can store 0 if γΣ(Φ)1 and otherwise the minimal value of ω(FA,χ2) is just ω(Fv,χ2(x)), and so we store the minimum weight of a set of respecting edges Fv,C such that c(Fv,C)=C1 and CC2.

To show the correctness of Recurrence (3) and Recurrence (4), we assume that each previous table entries are correct and we prove that if DPi[v,C1,C2,Φ,Ψ,]=d, respectively DP[v,C1,C2,Φ,]=d, then there is a triple (A,χ1,χ2) with ω(FA,χ2)=d from 𝒮(Qi,C1,C2,Pi,), respectively 𝒮(T(v),C1,C2,Φ,). Afterward, we show that for each such triple, it holds that DPi[v,C1,C2,Φ,Ψ,]ω(FA,χ2), respectively DP[v,C1,C2,Φ,]ω(FA,χ2).

We first show the correctness of Recurrence (3). Assume that DPi+1[v,C1,C2,Φ,Ψ,]=d. Then, by Recurrence (3), there is a d1[d]0 and C1C1,C2C2,[]0 such that DPi[v,C1,C2,Φ,Ψ,]=d1 and DP[wi+1,C1C1,(C2C2)C1,(ΦΨ)GW(wi+1),]=dd1=:d2. Consequently, there is (A1,χ11,χ21) in 𝒮(Qi,C1,C2(C1C1),Pi,) such that ω(FA1,χ21)=d1 and there is (A2,χ12,χ22)𝒮(T(wi+1),C1C1,(C2C2),(ΦΨ)GW(wi+1),) such that ω(FA2,χ22)=d2. As Q and T(wi+1) are disjoint, so also A1 and A2 are disjoint. We therefore define a set A:=A1A2, and mappings χi with χi(x)=χij(x) for each taxon xAj, i,j{1,2}. Then ω(FA,χ2)=ω(FA1,χ21)+ω(FA2,χ22)=d1+d2=d. It remains to show that (A,χ1,χ2) is in 𝒮(Qi+1,C1,C2,Pi+1,). Most of the conditions follow because A1 and A2 are disjoint and the axioms hold for the individual sets. The difficult part is to show that χ1(xi) and χ2(xj) are disjoint for ij, in the case that xiA1 and xjA2. For this, we observe that χ11(A1)C1C1 and χ22(A2)C2C2C2, and these are disjoint.

Now assume (A,χ1,χ2) is in 𝒮(Qi+1,C1,C2,Pi+1,). Let A1:=AQi and let χi1 be the restriction of χi to A1, for i{1,2}. Similarly let A2:=AQi, and let χi2 be the restriction of χi to A2, for i{1,2}. It is straightforward to check that (A1,χ11,χ21) is in 𝒮(Qi,C1,C2(C1C1),Pi,) and (A2,χ12,χ22)𝒮(T(wi+1),C1C1,(C2C2),(ΦΨ)GW(wi+1),), where C1=χ1(A1),C2=χ2(A1)C1, and =|A1|.

We next show the correctness of Recurrence (4). Assume that DP[v,C1,C2,Φ,]=d. By Recurrence (4), we have DPt[v,C1,C2,Φ,pred(E)(v),]=d or minC2C2DPt[v,C1c(Fv,C2),C2C2,Φ,,1]+ω(Fv,C2)=d. In the former case, there is (A,χ1,χ2) in 𝒮(Qt,C1,C2,Pt,) with ω(FA,χ2)=d. We observe that (A,χ1,χ2) is also in in 𝒮(T(v),C1,C2,Φ,) which is sufficient for this case. in the latter case, fix C2 such that DPt[v,C1c(Fv,C2),(C2C2)c(Fv,C2),Φ,Ψ=,1]+ω(Fv,C2)=d. Consequently, there is a triple (A,χ1,χ2) in 𝒮(Qt,C1c(Fv,C2),(C2C2)c(Fv,C2),Pt,). Consider (A,χ1χ2) with A:=A{v}, χi is χi on A with χ2(v)=C2 and χ1(v):=c(Fv,C2). Clearly ω(FA,χ2)=ω(FA,χ2)+ω(Fv,C2)=d. As (T(v)A=QtA and QtA is Pt-part-γ-viable, we have also that T(v)A is ϕ-part-γ-viable (note that the only edges in Ptϕ are those incoming at v, which are not needed for a set not containing v.) It remains to show that (A,χ1,χ2) is perfect, assuming that (A,χ1,χ2) is perfect.

Note that v appears before any vertex in Qt in the depth-first traversal of T, and so we may assume v is the first element of A. It is clear that χ1(v)=c(Fv,C2) is disjoint from C1c(Fv,C2), and so χ1(x) and χ1(y) are disjoint for all x,yA. As χ2(y)C2C2(C1c(Fv,C2)) for all yA and C1,C2 are disjoint, we have that χ2(v)=C2 and χ2(y) are disjoint. Therefore χ2(x) and χ2(y) are disjoint for all x,yA. Similarly as χ1(v)=c(Fv,C2) and χ2(y)C2C2(C1c(Fv,C2) for yA, we have that χ1(xi) and χ2(xj) are pairwise disjoint for all xi,xjA with i<j. The existence of Fx,C2 follows from DPt+ω(Fv,C2)=d.

On the converse, if (A,χ1,χ2) is in 𝒮(T(v),C1,C2,Ψ,), then we can show by a case distinction and with arguments similar to the previous paragraph that DP[v,C1,C2,Φ,]ω(FA,χ2).

Running Time.

By Lemma 3.5, we can compute Fx,C or conclude that it does not exist for all xX and C[N] in 𝒪(2NNn|E(𝒩)|) time.

We observe that because C1 and C2 are disjoint, T is a tree and therefore any vertex has at most one parent, and the field Ψ can only take two values for a fixed vertex v.

In the basic cases, in Recurrence (3), and in Recurrence (4), we iterate over C2C2 and in Recurrence (3) we additionally iterate over C1C1. Any color c[N] can therefore be in [N](C1C2),C2C2,C2,C1C1 or in the case of Recurrence (3) also in C1.

Thus, all table entries can be computed within 𝒪(5N2swn(Nk¯+|E(𝒩)|)) time.

Proof of Lemma 4.1.

Recall hr (and ht) is the maximum number of reticulations (respectively tree vertices) on a path from the root to a leaf. For each path P from the root to a leaf we observe |E(P)ET(𝒩)|H and as there is a path of containing ht tree edges, we conclude htH.

To see that Hδhrhtδh, let 𝒩 be a network maximizing the value of H for values of hr,ht,δ, and let xX be a leaf for which the maximum number of tree vertices with a path to x is maximized. We show that there is no tree vertex below a reticulation on any path to x. Assume for a contradiction that there is an edge rv1E(𝒩) with v1VT(𝒩) and rVR(𝒩) and a path from v1 to x. Let u1,,us be the parents of r and w a child of v1 such that w has a path to x. Remove the edges uir for i[s], rv1, and v1w, add vertices v2,,vs with attached leaves, and add edges uivi for i[s], vir for i[s] and rw. Observe that after this transformation, the values of hr, ht, and δ have not changed, but there are s1 further tree vertices with a path to x. As s>1, this contradicts the maximality.

We may therefore assume that no tree vertices with a path to x are below a reticulation. We conclude that from the root at most δhr different paths of length ht of tree vertices can lead to a leaf. Figure 3 shows this scenario.

Figure 3: An example for Lemma 4.1 where H is maximized. This is the case if above the leaf an upside-down pyramid of reticulations is followed by δhr paths of tree vertices of length ht.

Proof of Lemma 4.7.

Intuition.

We consider the tree extension of the food web bottom up in a dynamic programming algorithm. We track the existence of colorful tuples whose taxa are all below a given vertex of the tree extension. We index colorful tuples by the sets of colors used, as well as by the set of edges Φ for which the set of taxa is Φ-part-γ-viable, and we store the total weight of the suitable edge sets. We use the fact that the sets χ(x) and χ(y) have to be pairwise disjoint. Therefore, if a taxon x is saved and a set of colors has been assigned, these colors can be removed. We define χ(x) when selecting x.

Table Definition.

For a set of taxa QX, a set of colors C[kH], a set of edges ΦE() between Q and XQ, and an integer [k]0, let 𝒮(Q,C,Φ,) be the set of colorful tuples (S,χ) of a set SQ of size at most and a mapping χ:S2C with χ(S)=C such that S is Φ-part-γ-viable. We define a dynamic programming algorithm with table DP. For a vertex vV(T), a set of colors C[kH], a set of edges ΦGW(v), and an integer [k]0, we store in DP[v,C,Φ,] the maximum value ω(FS,χ) of any tuple (S,χ)𝒮(T(v),C,Φ,). The value of an optimal set can then be found in DP[s,[kH],,k]. Is v an internal vertex of T with children w1,,wt, then we define further auxiliary tables DPi[v,C,Φ,Ψ,] analogously, where tuples (S,χ) are considered from 𝒮(Q,C,P,) with Q:=j=1iT(wj) and P:=(ΦΨ)(j=1iGW(wj)); that is – only the first i children of v are considered. We require that the set of edges Ψ is either empty or pred(E)(v).

Algorithm.

We define the table in a bottom-up fashion. Let v be a leaf of T. (That is a taxon without predators). In DP[v,C,Φ,], we store ω(Fv,χ) with χ(v)=C if γΣ(Φ)=1 and 1. Otherwise, we store 0.

Now let v be an internal vertex of T with children w1,,wt. We define DP1[v,C,Φ,Ψ,] to be DP[w1,C,(ΦΨ)GW(w1),]. To compute further values for i[t1], we use the following recurrence in which we define DPi+1[v,C,Φ,Ψ,] to be

maxCC,[]0DPi[v,C,Φ,Ψ,]+DP[wi+1,CC,(ΦΨ)GW(wi+1),]. (5)

Finally, if 1 and (v=s or γΣ(Φprey(E)(v))1), we set DP[v,C,Φ,] to be

max{DPt[v,C,Φ,,];maxCCDPt[v,CC,Φ,pred(E)(v),1]+ω(Fv,χ)}. (6)

Here, we use χ(v):=C. Otherwise, we set DP[v,C,Φ,] to be DPt[v,C,Φ,,].

We return yes if DP[s,[kH],,k]D. Otherwise, we return no.

Correctness.

We only show the correctness of Recurrence (6) and omit to show the similar case in Recurrence (5) as well as the basic cases. Assume that DP stores the correct value for all children of v in T and DPt stores the correct value for v. We show first that if DP[v,C,Φ,]=d, then there is a tuple (S,χ)𝒮(T(v),C,Φ,) and ω(FS,χ)=d. Afterward, we show that DP[v,C,Φ,]ω(FS,χ) for each tuple (S,χ)𝒮(T(v),C,Φ,).

So assume that DP[v,C,Φ,]=d. Consequently, we have DPt[v,C,Φ,,]=d, or there is some CC such that DPt[v,CC,Φ,pred(E)(v),1]=dω(Fv,χ) for χ(v)=C. In the former case, there is a colorful tuple (S,χ) with a set Si=1tT(wi)=T(v){v} of size and a mapping χ such that ω(FS,χ)=d and S is (Φ)-part-γ-viable. Consequently, S is also Φ-part-γ-viable in T(v). In the latter case, 1 and there is a colorful tuple (S,χ) with a set ST(v){v} of size 1 and a mapping χ with χ(S)=CC, such that χ(v)=C and ω(FS,χ)=dω(Fv,χ). Thus, adding v to S yields the desired set.

Conversely, let (S,χ)𝒮(T(v),C,Φ,) be given. Assume first that v is not in S. Because S is Φ-part-γ-viable, we conclude DP[v,C,Φ,]=DPt[v,C,Φ,,]ω(FS,χ). Now, let v be in S. As S is Φ-part-γ-viable and prey(E)(x)GW(v)=prey(E)(x)Φ for each xS, the set S{v} is (Φpred(E)(v))-part-γ-viable. We conclude (S{v},χ)𝒮(T(v){v},C,Φ,1) and therefore DPt[v,C,Φ,pred(E)(v),1]ω(FS,χ)ω(Fv,χ) and further DP[v,C,Φ,h]ω(FS,χ).

Running Time.

Observe that for any vertex v, only two values are possible for Ψ. Therefore, all tables have 𝒪(2kH+swnk) entries, together. By Recurrence (5), in time 𝒪(2kH2swk2sw(kH)n) all entries of DPi+1 can be computed, using convolutions. By Recurrence (6), we can compute DP[v,C,Φ,] in 𝒪(kH) time and ω(Fv,χ) needs to be computed once per vertex, which by Lemma 4.1 can be done in 𝒪(2H|E(𝒩)|(kH+|E(𝒩)|)) time. This leads to an overall running time of 𝒪(2kH+swk4H2swn|E(𝒩)|2).