Abstract 1 Introduction 2 Approximations for Constant 𝒏𝒅 3 Approximation Algorithms and Hardness for Graphical-GP2P with ±𝟏 Charges 4 A Polynomial-Time Logarithmic Approximation References Appendix A Standard Reductions Appendix B Restricted Instances of GP2P Appendix C Adapting k-MST Approximations

Approximation Algorithms for the Generalized Point-To-Point Problem

Zachary Friggstad ORCID Department of Computing Science, University of Alberta, Edmonton, Canada Mohammad R. Salavatipour ORCID Department of Computing Science, University of Alberta, Edmonton, Canada Hao Sun ORCID Department of Computer Science, University of Houston, TX, USA
Abstract

We consider the Generalized Point-to-Point (GP2P) problem in which we have an edge-weighted graph G with (possibly negative) node charges ϕ(v). The goal is to find a minimum-cost set of edges such that each component has nonnegative total charge. Viewing the positive charges as specifying supply and negative charges as demand quantities at various nodes, the problem is equivalent to build the cheapest network so that it is possible to satisfy all demands by routing supplies across the network.

This problem is a significant generalization of other network design problems such as the well-studied Steiner Forest problem. Even the special case of only having one single demand point (having charge k and all the other nodes having charge +1) is capturing the k-Minimum Spanning Tree problem. Earlier work by Hajiaghayi et al. (2016) [10] gave an O(logn) approximation in pseudo-polynomial time with further improved guarantees if the total supply is not much larger than the total demand, and also a 2-approximation if the total supply equals the total demand.

Our contributions are four-fold: (a) we show how known k-Minimum Spanning Tree approximations can be extended to GP2P approximations while losing only a ϵ-factor if the number of demand points in the instance is bounded by a constant, (b) we improve the running time to be Fixed-Parameter Tractable (FPT) in the number of demand points in constant-dimensional Euclidean metrics, (c) we give a 2-approximation in instances where edge costs are all 1 and ϕ(v)=±1 for each node v and show such instances are APX-hard, and (d) we show how the logarithmic approximations in earlier work can be modified to run in truly polynomial time.

Keywords and phrases:
Point-to-Point Network design, Approximation, Steiner Forest, k-MST
Funding:
Zachary Friggstad: Supported by NSERC.
Mohammad R. Salavatipour: Supported by NSERC.
Copyright and License:
[Uncaptioned image] © Zachary Friggstad, Mohammad R. Salavatipour, and Hao Sun; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Graph algorithms analysis
; Theory of computation Approximation algorithms analysis
Editors:
Pat Morin and Eunjin Oh

1 Introduction

Consider the following setting in a network: some locations have a finite supply of goods and some locations have a given demand for goods. The goal is, naturally, to route supplies to the demand locations in order to satisfy all demands. For example, in the Minimum-Cost Transportation the goal is to do this in a way that minimizes the total travel cost of all goods. However, it is natural to consider variations of this problem. A network-design version would be to install a minimum-cost set of connections (edges between locations) so that it is possible to find a routing satisfying demands such that each unit of supply only travels along installed connections. That is, we only pay one for each link. Such a setting is captured by the following problem.

Definition 1.

In the Generalized Point to Point Connection problem (GP2P), we are given a tuple (G=(V,E),c,ϕ) where G is an undirected graph with edge costs c(u,v)0, for all eE and (possibly negative) node charges ϕ(v), for all vV. The goal is to find a minimum-cost FE such that vCϕ(v)0 for each connected component CV of (V,F).

A convenient view is that ϕ(v)<0 corresponds to a demand point, ϕ(v)>0 corresponds to a supply point, and ϕ(v)=0 is a location that is a Steiner point. For brevity, throughout this paper when the instance is clear from the context we let Φ=vVmax{1,|ϕ(v)|}. This way, Φ represents the total charge in absolute value but also ensures to count the zero-charge nodes.

It seems the first study of GP2P was actually for a directed variant of the problem and was conducted by Di Gaspero et al.[5] in the course of studying a practical problem related to scheduling shifts. The undirected version itself was formally proposed by Even, Kortsarz, and Slany [6] under the name Infinite Capacity Minimum Edge Cost Flow problem and an O(logΦ)-approximation was given. The most recent work on GP2P is by Hajiaghayi et al. [10] where they give a 2-approximation when ϕ(V)=0 and an O(log(min{n,ϕ(V)+2}))-approximation in general time that is polynomial in Φ. This is achieved by an adaptation of the primal-dual algorithm of [9] for the classic Steiner Forest problem and its generalization (e.g. when we have a proper function). As pointed out in [6], many special cases of GP2P were already well-studied. For example:

  • Steiner Forest: We are given terminal pairs (s1,t1),,(sk,tk) in an edge-weighted graph and the goal is to buy the cheapest set of edges so each pair has both endpoints in the same component. This is modelled by GP2P by letting ϕ(si)=2i and ϕ(ti)=2i (and ϕ(v)=0 for all non-terminals). Goemans and Williamson give a 2-approximation for Steiner Forest [9].

  • k-Minimum Spanning Tree (k-MST): We are given a particular root node rV and an integer k0. The goal is to find the cheapest tree including r and at least k other nodes. This is modelled by GP2P by letting ϕ(r)=k and ϕ(v)=1 for all other vV{r}. Note, this instance of GP2P has only a single demand point. The history of approximating k-MST is storied, currently the best approximation is a 2-approximation [8, 4].

  • Knapsack Covering: We are given n items with sizes s1,,sn and costs c1,,cn plus a demand value D. The goal is to find a minimum-cost set of items with total size at least D. This modelled by GP2P by using a star with center r and leaf nodes v1,,vn. The cost of rvi is ci and the charges are ϕ(r)=D and ϕ(vi)=si. An FPTAS for Knapsack Covering is known, e.g. by slightly modifying the standard dynamic programming based FPTAS for standard for standard Knapsack.

Notably, all these cases have constant-factor approximations or better. An open problem is to determine if GP2P has a constant-factor approximation even if we allow pseudo-polynomial running time, i.e. running time that is polynomial in Φ; note for the purposes of getting an O(1)-approximation we can assume without loss of generality that the edge costs are bounded by a polynomial in n (see Lemma 19 in Appendix A). Even special cases of GP2P that generalize some of the classic problems stated above are open. For instance, for generalization of k-MST to the situation where we have more than one root node, say r1,r2,,rd, and the goal is to find a minimum cost subgraph where the connected component of each ri has at least k nodes, it is not known if one can get an O(1)-approximation or not. This is one of the cases we study in this paper.

1.1 Our Results

For brevity, we say an instance (G,c,ϕ) of GP2P is metric (Metric-GP2P) if G is a complete graph and edge costs satisfy the triangle inequality c(u,v)c(u,w)+c(w,v) for any distinct u,v,wV. As is usual with single-connectivity network design problems, e.g. Steiner Forest, assuming the graph is metric can be done without loss of generality. The standard proof is found in Lemma 18 in Appendix A. We also say an instance (G,c,ϕ) of GP2P is graphical (Graphical-GP2P) if all edge costs are 1 but G is not necessarily complete.

Our results need approximation algorithms for slight generalizations of k-MST.

Definition 2.

In k-MST with required nodes (k-MST-R), instead of a single root node rV we are given a subset RV. The goal is to find the cheapest tree spanning R and at least k other nodes in VR. In weighted k-MST-R (W-k-MST-R), each vertex vVR is given an integer weight wv0 and the goal is to find the cheapest tree spanning R plus a subset of nodes in VR with total weight at least k.

We note all k-MST approximations we consider in this paper can be readily adapted to the generalization k-MST-R. One can also extend them to W-k-MST-R that would run in pseudo-polynomial time (the simple reduction can be found in Appendix A).

Observation 3.

If there is a polynomial-time α-approximation for k-MST-R then there is an α-approximation for W-k-MST-R whose running time is polynomial in the number of nodes and the total weight of all nodes.

Let nd be the number of demand points in a given instance, i.e. |{vV:ϕ(v)<0}|. Recall from above that k-MST is a special case of GP2P when there is only one demand point, i.e. nd=1. We consider the case of GP2P with nd1. This is akin to generalizing k-MST to multiple roots, each with its own size requirement, but there is one key difference. One might imagine a multiple-root k-MST generalization would want to keep the trees disjoint but in GP2P we can have multiple demand points in a single component as long as the total supply in the component is at least the total demand in the component.

Our main result shows how an α-approximation for W-k-MST-R can be used to give a (1+ϵ)α-approximation for GP2P.

Theorem 4.

For any ϵ>0, given an α-approximation algorithm 𝒜 for W-k-MST-R, there is a (1+ϵ)α-approximation for GP2P. This algorithm makes nO(nd/ϵ) calls to 𝒜. For D-dimensional Euclidean metrics the running time of the GP2P approximation can be improved to O((16nd/ϵ)Dnd) calls to 𝒜.

We also show how existing algorithms for k-MST (the (2+ϵ)-approximation for k-MST in [3] for general metrics and the PTAS for k-MST in constant-dimensional Euclidean metrics [2, 11]) can be suitably adapted to given the same approximation guarantees for W-k-MST-R, which combined with the above theorem imply the following:

Corollary 5.

For any ϵ>0, there is a (2+ϵ)-approximation for GP2P that runs in time is nO(nd/ϵ). For D-dimensional Euclidean metrics there is a (1+ϵ)-approximation for GP2P with running time O((16nd/ϵ)DndΦO(D/ϵ))

Note that the (2+ϵ)-approximation for W-k-MST-R and for GP2P runs in truly polynomial time for any fixed nd but the (1+ϵ)-approximation for the Euclidean metrics is polynomial in Φ, hence only pseudo-polynomial, as it is not clear how to get a PTAS for W-k-MST-R in the Euclidean metrics.

We also mention that this would also extend to doubling metrics using the quasi-polynomial time (1+ϵ)-approximation for k-MST in metrics with constant doubling dimension by Talwar [12]. The running time would be quasipolynomial in Φ but still FPT in nd.

Our next collection of results is for Graphical-GP2P. First, we observe that particular restrictions of GP2P remain essentially as hard to approximate as the general problem.

Observation 6.

If there is a polynomial-time α-approximation for instances of GP2P with ϕ(v){1,+1} for each vV, then there is a 2α-approximation for general instances of GP2P with running time being polynomial in Φ.

Observation 7.

If there is a polynomial-time α-approximation for instances of Graphical-GP2P with ϕ(v){1,0,+1} for each vV, then for any ϵ>0 there is a (1+ϵ)α-approximation for general instances of GP2P with running time being polynomial in 1/ϵ and Φ.

These simple observations are proven in Appendix B. Our main contribution here is that the intersection of these two restrictions of GP2P is still APX-hard but does at least admit a simple approximation.

Theorem 8.

The restriction of Graphical-GP2P with ϕ(v)={1,+1} for each vV is APX-hard and admits a polynomial-time 2-approximation.

Finally, we improve the running time of the logarithmic approximations in [10] by making them run in truly polynomial time (i.e. removing the dependence on Φ).

Theorem 9.

There is an O(log(min{n,ϕ(V)+2}))-approximation for GP2P running in polynomial time.

1.2 Notation

For a graph G=(V,E) and a subset of edges FE, we let V(F) denote the set of all nodes in V that appear as the endpoint of at least one edge in F and say that F spans V(F). We refer to a component of G as a subset of vertices corresponding to a connected component of G. For a given instance (G=(V,E),c,ϕ) of GP2P, let Vs={vV:ϕ(v)>0} be the supply points, Vd={vV:ϕ(v)<0} be the demand points, and V0={vV:ϕ(v)=0} be the Steiner points. Correspondingly, let ns,nd, and n0 denote their sizes with n:=|V|=ns+nd+n0. We also use the convention c(F):=eFc(e) for FE and, similarly, ϕ(C):=vCϕ(v) for CV. Note ϕ(V) differs from Φ:=vVmax{1,|ϕ(v)|}, the former measures the excess of positive charge in the input and the latter measures the absolute value of all charges while ensuring Steiner points are still counted.

1.3 Organization

The algorithms proving Theorem 4 appear in Section 2. Theorem 8 is proven in Section 3 and Section 4 then concludes the paper with the proof of Theorem 9. The proofs of some supporting results as well as Observations 6 and 7 appear in the appendix.

2 Approximations for Constant 𝒏𝒅

The algorithms proving Theorem 4 are obtained by reducing to W-k-MST-R. Recall that in W-k-MST-R, we are given a set of required nodes R along with (integer) node weights wv0 and the goal is to find a minimum cost tree spanning R such that the total weight of the nodes in the tree is at least a given integer k. The special case of |R|=1 and wv=1 for all vr is the traditional k-MST problem. Despite its generality, W-k-MST-R can be approximated as well (or nearly as well) as the simpler k-MST problem by adapting the known algorithms.

Theorem 10.

The k-MST algorithm described [3] can be adapted to give a polynomial-time (2+ϵ)-approximation for W-k-MST-R for any constant ϵ>0. Similarly, the PTAS for k-MST in D-dimensional Euclidean metrics described in [11] with running time O(nO(D/ϵ)) can be adapted to give a PTAS for k-MST-R with the same asymptotic running time and a (1+ϵ)-approximation for W-k-MST-R with running time O((nΦ)O(D/ϵ)).

These adaptations are discussed in Appendix C. We emphasize both algorithms run in polynomial time (for constant ϵ) even if the number of required nodes R is not bounded by a constant. It is possible that the 2-approximations for k-MST in [8] or [4] could be extended to W-k-MST-R and that the PTAS for Euclidean k-MST in [2] could be similarly extended. We focus on these particular algorithms due to the ease in describing how to extend them.

We present the proof of Theorem 4 in two parts. First for the general metrics and then the improved FPT time for D-dimensional Euclidean metrics.

2.1 Theorem 4: General metrics

Throughout, let OPT denote the cost of an optimum solution. Fix ϵ>0 to be a sufficiently small constant, ϵ<1/5 suffices. From Lemma 18 (Appendix A), we may assume G is a complete graph with edge costs satisfying the triangle inequality. For any r0 and any vV we let B(v,r)={uV:c(u,v)r} be the ball of radius r around v.

The intuition for our algorithm is the following. If the optimum solution consisted of only a single tree containing all nodes with negative charge, we could treat it as a W-k-MST-R instance where R=Vd and k=ϕ(R) while setting wv=0 for vR and wv=ϕ(v) for vR. However, this may not be the case. To cope, we show how to decompose the instance into disjoint subproblems that have single-tree optimal solutions whose total costs are close to OPT.

To perform the decomposition, we first show that a near-optimum solution to the original instance (G,c,ϕ) exists such that any two trees in this solution notably far apart. Then we guess a small “net” of points in each tree, the union of balls with a particular common radius around each net point covers all trees, but two balls from different trees are disjoint. This allows us to partition the instance into disjoint instances, one per tree in the near optimum solution.

The first step is to show some near-optimum solution has its components being bounded away from each other.

Lemma 11.

There is a solution FE with c(F)(1+ϵ)OPT such that for any two components C1,C2 of (V,F) with C1Vd and C2Vd we have c(u,v)ϵndOPT for any uC1,vC2.

Proof.

Let F initially be an optimum solution. While there are two components C1,C2 of (V,F) with C1Vd and C2Vd such that c(u,v)<ϵndOPT for some uC1 and vC2, add uv to F. Each such addition increases the cost of F by at most ϵndOPT and this procedure will be executed fewer than nd times since there are initially at most nd components containing a node in Vd.

Of course, we may also assume, by discarding edges, that F is a minimal solution meaning F{e} is not feasible for any eF. So F consists of mnd vertex-disjoint trees, say T1,,TmF that collectively span all of Vd plus some other nodes in V0Vs. Any other node v not in a tree has ϕ(v)0 and forms an isolated component of (V,F).

Next, we identify a net of nodes in the trees T1,,Tm that will be small enough for our algorithm to guess.

Lemma 12.

For any tree T and any r>0 there is a set NV(T) with |N|1+c(T)/r such that c(v,N)r for each vV(T).

Similar constructions have been considered many times before, one example is in the (2+ϵ)-approximation for k-MST [3]. We include a proof for completeness.

Proof.

We prove this by induction on |V(T)|. Root T at an arbitrary node w. The base case is when c(w,v)r for each vV(T). If so, we simply let N={w}.

Otherwise, for two u,vV(T) let cT(u,v) be the cost of the unique uv path in T. Pick any vV(T) that has maximum value cT(w,v). Note cT(w,v)c(w,v)>r. Now let u be the furthest node along the vw path such that cT(v,u)r. Since c(v,u)cT(v,u)r, then uw. Let p(u) be the parent of u in T. By our choice of u we also have cT(v,p(u))>r.

Let Tu denote the subtree of T rooted at u. For any vV(Tu) we have c(v,u)cT(v,u)cT(v,u)r. Let T be the tree obtained by removing the subtree rooted at u from T. This removes all edges of the vp(u) path, so c(T)c(T)r. By induction, we can find a set NV(T) with |N|1+c(T)/rc(T)/r such that c(w,N)r for each wV(N).

Finally, set N={u}N and note |N|=1+|N|1+c(T)/r. Every uV(T) is now within distance r from some node in N, as required.

The entire algorithm is summarized in Algorithm 1. The high-level idea is that it guesses a value ν very close to OPT and then guesses the “nets” N1,,Nm from Lemma 12 applied to each tree of the near-optimum solution F using r=ϵ4ndν. Below, we show for the proper guess of N1,,Nm that the sets Vi obtained by the union of balls B(v,r) for vVi are disjoint. This instance of GP2P then naturally decomposes into disjoint instances of W-k-MST-R. Supporting results demonstrating the performance of our algorithm are found below.

Algorithm 1 GP2P approximation.

We first discuss the running time. The number of iterations of the outer loop is logarithmic in the ratio c(E)/minec(e), which is polynomial in the number of bits used to represent the costs in the instance. There are clearly only nd possible values for m and the number of m-tuples satisfying the stated bounds is at most nO(5nd/ϵ). So when nd is regarded as a constant, the total number of iterations is polynomial in the input size and, thus, the entire algorithm makes a polynomial number of calls to a k-MST-R approximation on instances with at most n nodes and, otherwise, runs in polynomial time.

Towards the performance guarantee, we show for the “correct” guess of values in the loops the algorithm will perform well.

Lemma 13.

Let ν[OPT,(1+ϵ)OPT] and T1,,Tm be the trees in the near-optimum solution F from Lemma 11. For each 1im, let Ni be the set identified by Lemma 12 when applied to Ti using r=ϵ4ndν. Also let Vi=vNiB(v,r) for each 1im.

Then (a) i=1m|Ni|5nd/ϵ, (b) ViVj= for distinct i,j, and (c) Vdi=1mVi.

Proof.

For (a), we have

i|Ni|m+c(F)rnd+(1+ϵ)OPTOPT4ndϵ5ϵnd.

For (b), if, say, wViVj for some distinct i,j then there would be some uNi and vNj such that

c(u,v)c(u,w)+c(w,v)r+r=ϵ2ndνϵ(1+ϵ)ndOPTϵndOPT.

But this contradicts Lemma 11, which showed c(u,v)>ϵndOPT. Finally, (c) follows because the balls B(v,r) for vNi collectively cover all of V(Ti) and each node of Vd lies on some Ti.

To finish the analysis, consider the iteration of the algorithm for the particular setting of ν and T1,,Tm described in Lemma 13. With these values, the algorithm proceeds to run the k-MST-R approximations. In the instance corresponding to Vi, we know Ti itself is a feasible solution so the returned tree Ti satisfies c(Ti)αc(Ti). Thus, the candidate GP2P solution found in this iteration has cost i=1mc(Ti)αc(F)α(1+ϵ)OPT.

2.2 Theorem 4: Euclidean metrics

We simply describe how to modify Algorithm 1. Clearly, we can use a (1+ϵ)-approximation for W-k-MST-R in Euclidean metrics to make Algorithm 1 a (1+ϵ)-approximation in Euclidean metrics. The pseudo-polynomial running time in the statement of Theorem 4 comes from the fact that we only know how to adapt k-MST PTASes to k-MST-R and then rely on Observation 3 to get a pseudo-polynomial time W-k-MST-R (1+ϵ)-approximation. One small comment is that even though the distances are not necessarily rational numbers, the number of iterations of the outer loop is still polynomial in the number of bits used to describe the locations of the points in V.

To improve the running time to be FPT in nd, we change how the nets are guessed. Let D be the dimension of the metric, recall that D is assumed to be a constant. For brevity, let δ:=ϵ/(16nd). Note 4δν is the value r from Algorithm 1. The idea of the improvement is the following. For simplicy let’s consider the case of D=2. If one considers a square of side length L, the number of disjoint balls of radius ϵL/nd that can be placed inside that square is O((nd/ϵ)2). This simple packing argument can be used to bound the number of guessed points for the nets N to be bounded by O((nd/ϵ)D).

For each guess ν in the outer loop, we first let N be any δν-net of V. That is, every vV has c(v,N)δν yet c(u,v)>δν for any u,vN. Such a set N can be constructed by greedily adding points while maintaining the property that c(u,v)>δν until no more points can be added. For each vV, let τ(v) be its closest point in N. So τ(v)=v for vN and, otherwise, we at least know c(v,τ(v))δν.

Now let N1,,Nm be the sets identified by applying Lemma 12 using r=δν and let Ni={τ(v):vNi}. For uNi and vNj for ij we still have that B(u,δ)B(v,δ)=. That is, suppose otherwise and let uNi be such that τ(u)=u and vNj be such that τ(v)=v. Then when ν[OPT,(1+ϵ)OPT] we have

c(u,v)c(u,u)+c(u,v)+c(v,u)δν+2δν+δνϵ4ndτϵndOPT

which contradicts Lemma 11 and the fact that u and v lie in different trees in F.

So it suffices to guess N1,,Nm in the algorithm. But now we leverage packing property of Euclidean metrics to help reduce the number of guesses to a constant depending on D,nd and ϵ.

Lemma 14.

For each vV, |B(v,ν)N| is bounded by O((4/δ)D).

Proof.

The Euclidean balls of radius δ/2ν about points in B(v,ν)N are disjoint by construction of N and are completely contained in the Euclidean ball of radius (1+δ/2)ν2ν about v. The volume a D-dimensional ball with radius r is within an absolute constant factor of f(R):=1D(2πeD)D/2RD. Therefore, |B(v,ν)N| is at most a constant factor times f(2ν)/f(δ/2ν)=(4/δ)D.

The steps for guessing N1,,Nm are to first try all ways to partition Vd into m nonempty groups, a coarse upper bound on the number of such choices is ndnd. For each such partition, let v1,,vmVd be any particular representatives from the m parts. We try all tuples N1,,Nm where each NiB(vi,ν)N such that i|Ni|17/ϵnd (as opposed to 5/ϵnd as in Lemma 13 since the radius δ is smaller). For each such tuple that passes the other requirements of Lemma 13, we partition the instance into disjoint Euclidean k-MST-R instances and approximate these with a PTAS. A coarse upper bound on the number of such tuples N1,,Nm is O((4/δ)Dnd).

3 Approximation Algorithms and Hardness for Graphical-GP2P with ±𝟏 Charges

Observations 6 and 7 show GP2P is not much easier to approximate if we assume either that ϕ(v){1,+1} for each vV or that the instance is graphical and ϕ(v){1,0,+1} for each vV. We begin this section by showing the common intersection of these two special cases does admit a 2-approximation. After this, we complete the proof of Theorem 8 by showing such GP2P instances remain APX-hard.

3.1 Graphical Instances with Unit Charges

Let (G=(V,E),c,ϕ) be an instance of Graphical-GP2P where ϕ(v){1,+1} for each vV. As usual, let OPT denote the optimum solution cost which, in this case, is just measuring the minimum size feasible solution FE.

Theorem 15.

Let FE be any minimal feasible solution (i.e. F{e} is not feasible for any eF). Then |F|2OPT.

A 2-approximation is then straightforward as one could simply start with F being any spanning tree and then iteratively try to drop an edge from F while retaining feasibility until no such drop is possible.

Proof.

Recall Vd={vV:ϕ(v)=1} and nd=|Vd|. So Vs=VVd as no charge is zero in this case. We claim ndOPT and that any minimal solution has size at most 2nd. Thus, any minimal solution F satisfies |F|2nd2OPT, as required.

For the first bound, let F be an optimum solution and C any connected component in (V,F) that contains at least one node in Vd. If C has, say, ndC1 nodes in Vd then C must contain at least ndC nodes in Vs as well. That is, |C|2ndC so F contains at least 2ndC1ndC edges in component C. Summing over all components that contain at least one node in Vd, we see |F|nd.

Now let F be any minimal feasible solution. Let C be any connected component of (G,F) and let FC be the edges of F in component C. If CVd= then minimality of F means C is a single node v with ϕ(v)=+1 (i.e. it has no edges). If CVd, we claim ϕ(C)=0. If so, then |CVd|=|CVs| so |FC|<2|CVd| as FC is a tree (by minimality). Summing over all components would complete the claim that |F|<2nd.

To see ϕ(C)=0, again recall FC is a tree. Further, for each eFC we have that deleting e would produce a component with negative charge but it could not be that both components have negative charge since ϕ(C)0. Consider the orientation of edges that directs each edge e toward the side that would have negative charge if e was deleted. Since (C,FC) is a tree, the orientation of edges produces a directed acyclic graph. Let r be any source node in this DAG.

View the tree (C,FC) as being rooted at r and say r has m children. Let C1,,CmC be such that Ci are the nodes in the subtree under the i’th child of r. Since CVD and ϕ(C)0, we know C has at least two nodes so m1.

By the orientation of edges, we have ϕ(Ci)1 for each 1im. Therefore

ϕ(C)=ϕ(r)+i=1mϕ(Ci)ϕ(r)mϕ(r)10

which, by feasibility of F, means ϕ(C)=0.

It may be possible to get a better-than-2 approximation through a more involved approach. For example, notice in our proof ndOPT is only tight if all components C of the optimum solution F with CVd had |C|=2 (i.e. F is a matching). So if the optimal solution F has at least, say, 0.01nd nodes of Vd in components of size greater than 2 then any minimal solution would be a 1.99-approximation. Otherwise, nearly all nodes of Vd are in components that consist of a single edge. In this case, a maximum-size matching between Vd and Vs would then create components of charge 0 that collectively include most nodes of Vd. One can even show there is a way to “alternate” the matching so that a greedy algorithm yields a good approximation (i.e. by alternating so our matching edges largely agree with the optimum matching edges), but efficiently finding such an alternation seems challenging.

One would reasonably wonder if instances of GP2P with unit-cost edges and ±1 charges are actually hard. Indeed, we conclude this section by showing APX-hardness.

Theorem 16.

The restriction of Graphical-GP2P to instances with unit edge costs and ±1 charges is APX-hard.

Proof.

We reduce from the Minimum Vertex Cover Problem in simple, cubic graphs which is known to be APX-hard [1]. That is, for some constants 0β<α1 it is NP-hard to distinguish if a cubic graph on n vertices has a vertex cover of size at most βn or if all vertex covers have size exceeding αn.

So let G=(V,E) be an n-vertex cubic graph with m=3n/2 edges. We obtain a graph H=(V,E) and set the charges of vV as follows.

Initially let H=G and set ϕ(v)=+1 for every vV. Then subdivide each edge e with a single vertex ue with ϕ(ue)=1. Then for each v we add new vertices v+,v,v0,v1,v2,v3 and edges {v0v,vv+,v+v,v+v1,v+v2,v+v3}. Here, v,v+,v1,v2,v3 all have positive charge and v0,v have negative charge. See Figure 1 for an illustration of this process.

Figure 1: An illustration of the reduction to a vertex v and two if its neighbours. In the right picture, positively-charged nodes are shown in grey and negatively-charged nodes are shown in white. Intuitively, selecting vv+ corresponds to using v in the vertex cover because it now permits using the positive charge nodes v1,v2,v3 to satisfy the demands of some negative charge nodes ue.

We claim G has a vertex cover of size k if and only the Graphical-GP2P instance given by H and ϕ has a solution of size 2n+2m+k=5n+k. So if G has a vertex cover of size at most βn then there is a GP2P solution of size at most (5+β)n and if all vertex covers in G have more than αn then all solutions to this GP2P instance have size more than (5+α)n. This shows there is no 5+α5+β-approximation for GP2P with unit edge costs and charges ±1 unless P = NP.

To see the claim, first let CV be a vertex cover of G with size k. To get a corresponding Graphical-GP2P solution, buy the following edges:

  • For each vC, purchase vv+.

  • For each vV, purchase v0v and v+v.

  • For each eE, let v be any endpoint of e in C. Purchase uev and any edge of the form viv+ that was not purchased by another edge this way. Since G is cubic, this is always possible.

In total, this purchases k+2n+2m=k+5n edges. This can be seen to be a feasible solution by matching each negative-charged vertex with a positively-charged vertex in its component in a one-to-one fashion. Such a mapping is immediate from the construction: each v0 can be matched with v, each v can be matched with v+, and each ue can be matched with the corresponding vi purchased in the description above.

For the converse, we first claim there is a well-structured optimal solution.

Claim 17.

There is an optimal solution F such that: (a) for each vV both v0v and v+v are in F, (b) F has exactly m edges of the form viv+ where vV and i=1,2,3, (c) each eE has vv+F for at least one endpoint v of e, and (d) for each edge e=vw exactly one of uevF or uewF.

If so, then C={v:vv+F} is a vertex cover of G and |F|=|C|+2n+2m=|C|+5n, as required to complete the proof.

Proof of Claim 17.

Let F be an optimal solution. First observe we must have v0vF and v+vF for every vV otherwise some negatively-charged node would be isolated. So in each component we must have the number of nodes of the form ue for various eE is at most the number of nodes of the form vi for various vV,i=1,2,3. By optimality of F and the fact each vi is a pendant node, we have that these counts are in fact equal in each component.

Let τ be any minimum-cost pairing of nodes ue,eE with nodes of the form vi lying in the same component as ue. Here, the cost of pairing two nodes is the length of their shortest path using only edges in F. For each eE, let Pe denote the corresponding shortest path from ue to τ(ue). Finally, let v(e) be the first vertex after ue along Pe, notice v(e) corresponds to an endpoint of e in the original cubic graph G.

Now consider an alternative pairing of the ue vertices with the vi vertices (which do not necessarily need to lie in the same component as ue, for now). For each ue, pair it with any vertex of the form v(e)i for some i=1,2,3 that has not been paired before. Such a pairing is possible since G is cubic. Let F be obtained by modifying F in the following way. From F, first delete all edges of the form v+viF then add all edges of the form v+vi where vi is paired under the new pairing.

This does not change the size of F. To ensure it is feasible, do the following. For each eE such that P was not initially paired with one of v(e)1,v(e)2,v(e)3, it must have been that v(e)ue was the second edge along P for some ee and that v(e)v(e) (otherwise Pe and Pe both use v(e)ue but in opposite directions, meaning we can uncross the paths to get a cheaper pairing than τ). So we remove v(e)ue and add v(e)v(e)+ (if it was not already there). Doing so for all eE will ensure it is now feasible since each ue can now reach the vertex it is paired with while not increasing the size of F because an edge of the form v(e)v(e)+ is added only after an edge of the form v(e)ue is removed. This also ensures all properties required by the claim now hold.

4 A Polynomial-Time Logarithmic Approximation

We show that a slight variation of algorithm of [10] yields an O(logn)-approximation in truly polynomial time. The algorithm in [10] begins by observing, using standard metric embeddings [7], that it suffices to given an O(1)-approximation if the graph is a tree which, after rooting at some vertex, has exactly two children per node. Then they present an exact algorithm using dynamic programming where one index of the DP table considers values up to Φ. Roughly speaking, for every vertex v and every ΦpΦ they consider the subtree Tv rooted at v and compute the cheapest subset of edges in the tree such that the component with v has charge p and every other component in the subtree has nonnegative charge.

We simply point out that we can flip the roles of charges and costs in the DP table. First, we use the reduction in Lemma 19 (using, say, ϵ=1/2) to instances where each edge has its cost being bounded by a polynomial in n. It is easy to verify this reduction produces a tree if the input graph was a tree. Let (T,c,ϕ) be the resulting instance of GP2P, i.e. T is a tree and each edge cost is a positive integer bounded by a polynomial in n.

Root T at an arbitrary node r and for each node v let Tv be the subtree under v. Our dynamic programming table is the following: for each node v and each 0ceEce let f[v,c] be the maximum p such that in the subtree Tv, it is possible to purchase edges with total cost at most c such that the component with v has charge at least p and every other component has nonnegative charge. Given these values, the optimum solution cost is then the minimum c such that f[r,c]0.

To compute the f[v,c] values:

  • If v is a leaf node then f[v,c]=ϕ(v).

  • Otherwise, say u,w are the two children of v. Intuitively, we try all subsets of {uv,wv} to delete and try all ways to split the remaining budget between the subproblems and keep the best solution found overall. That is, we try all ways to purchase a subset S{uv,wv} such that c(S)c and all 0cu,cw such that cu+cw+c(S)=c and such that f[u,cu]0 if uvS and f[w,cw]0 if uwS (i.e. if we do not purchase the corresponding parent edge, then the budget in the subproblem better be able to buy a feasible solution in the subtree).

    Then f[v,c] is the maximum of the following expression over such S,cu,cw:

    ϕ(v)+𝕀[uvS]f[u,cu]+𝕀[uwS]f[u,cw]

    where 𝕀[] is the {0,1}-indicator for the logical expression enclosed by the brackets.

The number of distinct subproblems is polynomial in n since the edge costs are integers at most n and there are at most 4(c+1) different ways to select (S,cu,cv) in a subproblem, the algorithm runs in polynomial time. In particular, if C denotes the total edge cost in the tree then the number of distinct subproblems is O(Cn) and processing each entry f[v,c] takes O(c) time (including O(c) recursive calls) so the total running time is O(C2n). Finally, if one only permits recursive calls to subproblems f[v,c] where c is at most the total edge cost in the subtree Tv and the loops over the split cu+cv=c only iterate over values where cu and cv are at most the total edge cost of their respective subtrees, the running time is improved to O(C2).

This polynomial-time approximation for GP2P in trees can be used in a black-box fashion to improve the running time of the O(log(min{n,ϕ(V)+2})) in [10] to run in true polynomial time.

References

  • [1] Paola Alimonti and Viggo Kann. Some apx-completeness results for cubic graphs. Theoretical Computer Science, 237(1):123–134, 2000. doi:10.1016/S0304-3975(98)00158-3.
  • [2] Sanjeev Arora. Polynomial time approximation schemes for euclidean traveling salesman and other geometric problems. J. ACM, 45(5):753–782, September 1998. doi:10.1145/290179.290180.
  • [3] Sanjeev Arora and George Karakostas. A 2 + ϵ approximation algorithm for the k-mst problem. Math. Program., 107(3):491–504, July 2006.
  • [4] Emmett Breen, Renee Mirka, Zichen Wang, and David P. Williamson. Revisiting garg’s 2-approximation algorithm for the k-mst problem in graphs. In 2023 Symposium on Simplicity in Algorithms, SOSA 2023, pages 56–68. SIAM, 2023. doi:10.1137/1.9781611977585.CH6.
  • [5] Luca Di Gaspero, Johannes Gärtner, Guy Kortsarz, Nysret Musliu, Andrea Schaerf, and Wolfgang Slany. The minimum shift design problem. Annals of operations research, 155:79–105, 2007. doi:10.1007/S10479-007-0221-1.
  • [6] Guy Even, Guy Kortsarz, and Wolfgang Slany. On network design problems: fixed cost flows and the covering steiner problem. ACM Trans. Algorithms, 1(1):74–101, July 2005. doi:10.1145/1077464.1077470.
  • [7] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. Journal of Computer and System Sciences, 69(3):485–497, 2004. Special Issue on STOC 2003. doi:10.1016/j.jcss.2004.04.011.
  • [8] Naveen Garg. Saving an epsilon: a 2-approximation for the k-mst problem in graphs. In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’05, pages 396–402, 2005. doi:10.1145/1060590.1060650.
  • [9] Michel X. Goemans and David P. Williamson. A general approximation technique for constrained forest problems. SIAM Journal on Computing, 24(2):296–317, 1995. doi:10.1137/S0097539793242618.
  • [10] Mohammadtaghi Hajiaghayi, Rohit Khandekar, Guy Kortsarz, and Zeev Nutov. On fixed cost k-flow problems. Theor. Comp. Sys., 58(1):4–18, January 2016. doi:10.1007/s00224-014-9572-6.
  • [11] Joseph S. B. Mitchell. Guillotine subdivisions approximate polygonal subdivisions: A simple polynomial-time approximation scheme for geometric tsp, k-mst, and related problems. SIAM Journal on Computing, 28(4):1298–1309, 1999. doi:10.1137/S0097539796309764.
  • [12] Kunal Talwar. Bypassing the embedding: algorithms for low dimensional metrics. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pages 281–290. ACM, 2004. doi:10.1145/1007352.1007399.

Appendix A Standard Reductions

Proof of Observation 3.

Each vVR with wv1 is replaced with (n+1)wv colocated copies and each vVR with wv=0 is left as well. In the new k-MST-R instance, use k:=(n+1)k. Consider any tree T (say, one obtained using a k-MST-R approximation) in the new instance that spans R and at least k other nodes. By adding 0-cost edges if necessary, we may assume that T contains all colocated copies of any node it spans.

Since at most n nodes have wv=0, then T spans at least (n+1)kn>(n+1)(k1) nodes from groups of colocated copies of original nodes. Since each group has a size that is a multiple of n+1, then T spans at least (n+1)k such nodes, i.e. viewing T as a tree in the original graph after contracting colocated copies of nodes it spans yields a feasible solution.

Lemma 18.

Let (G=(V,E),c,ϕ) be an instance of GP2P and let G=(V,E) be the complete graph over V with metric edge costs c(uv) given by the minimum-cost uv path. Any feasible solution to the GP2P instance (G,c,ϕ) can be mapped to a feasible solution to the Metric-GP2P instance (G,c,ϕ) with no greater cost and vice-versa.

Proof.

For each eE we have c(e)c(e) since e is one possible path between its endpoints. So for any feasible solution F to (G,c,ϕ) we have c(F)c(F), as required. Conversely, for any feasible solution FE to (G,c,ϕ) if we let F be the union of all shortest uv paths in G for each uvF then F then c(F)c(F). Also, two nodes that were in the same connected component in (V,F) still lie in the same connected component of (V,F). That is, each connected component C of (V,F) is the union of one or more connected components {C1,,Ca} in (V,F) so ϕ(C)=i=1aϕ(C)0.

Lemma 19.

For any constant ϵ>0, if there is an α-approximation for instances of GP2P where every edge cost c(e) is an integer at most n2/ϵ+1, then there is a (1+ϵ)α-approximation for general instances of GP2P.

Proof.

Let (G=(V,E),c,ϕ) be a general instance of GP2P with optimum solution FE with cost OPT. By contracting 0-cost edges (which does not change the optimal solution value), we assume c(e)>0 for each eE.

First, compute smallest value ν such that every connected component C in the graph Gν with edges {eE:c(e)ν} has ϕ(C)0. We claim νOPTnν: the first bound is because any feasible solution must use at least one edge of cost ν by our choice of ν and the other is because a spanning forest of Gν is a feasible solution using fewer than n edges each of cost at most ν.

Let E={eE:c(e)nν}; we have FE since OPTnν. Define new edge costs c(e):=nc(e)/(ϵν) for each eE. Notice c(e) is a positive integer (since c(e)>0) and c(e)nc(e)/(ϵν)+1n2/ϵ+1 by construction of E.

Let OPT denote the optimum solution cost for the GP2P instance ((V,E),c,ϕ). We have OPTc(F)eF(nc(e)/(ϵν)+1)=nOPT/(ϵν)+n. Therefore, running an α-approximation on this new instance finds a set of edges FE with c(F)α(nOPT/(ϵν)+n). Since c(e)ϵνc(e)/n for every eE,

c(F)ϵνc(F)/nα(OPT+ϵν)(1+ϵ)αOPT.

Appendix B Restricted Instances of GP2P

Proof of Observation 6.

Let (G=(V,E),c,ϕ) be an instance of GP2P. As noted in the introduction, we may assume this is an instance of Metric-GP2P. Finally, let V=VV0 (i.e. the nodes with non-zero ϕ(v)) and G be the subgraph of G induced by V. Notice the restriction of (G,c,ϕ) to H is an instance of Metric-GP2P.

Let FE be an optimal solution for the original Metric-GP2P instance. For each tree T in the forest F, let CT be the tour obtained by doubling the edges of T and shortcutting the resulting Eulerian tour past nodes in V0. In this way, CT spans all nodes in VV0 that are spanned by T and c(CT)2c(T). Thus, the optimal solution cost in the restriction to H is at most twice the optimal solution cost for (G,c,ϕ).

To complete the reduction, for each vertex v of H if ϕ(v)1 then replace v with ϕ(v) collocated copies each having charge 1 and if ϕ(v)1 then replace v with ϕ(v) collocated copies each having charge 1. Note this steps takes pseudopolynomial time.

Proof of Observation 7.

Consider any constant 0<ϵ1. Let (G=(V,E),c,ϕ) be an instance of GP2P. First we consider some preprocessing. If the 0-cost edges form a feasible solution, then it must be optimal so there is nothing more to do. Otherwise, apply Lemma 19 with ϵ:=ϵ/3 and let (G=(V,E),c,ϕ) be the resulting graph with positive integer edges costs being bounded by a polynomial in n.

Let K be a positive integer to be specified later. Form (G′′=(V′′,E′′),c′′,ϕ′′) by performing the following operations to G.

  • Subdivide each eE into a path of length Kc(e) of unit cost edges. Each new vertex v in the subdivision has ϕ′′(v)=0.

  • For each vV with ϕ(v)1, append a path Pv to v using ϕ(v)1 new vertices and edges: each edge has cost 1 and each vertex on Pv, including v itself, has ϕ′′(v)=1. Note, the other endpoint of Pv is a pendant.

  • Similarly each vV with ϕ(v)1, append a path Pv to v using ϕ(v)1 new vertices and edges: each edge has cost -1 and each vertex on Pv, including v itself, has ϕ′′(v)=1.

  • Each vV with ϕ(v)=0 also has ϕ′′(v)=0.

  • Finally, each edge e of this new graph G′′ has c′′(e)=1.

Observe (G′′,c′′,ϕ′′) is an instance of Graphical-GP2P with ϕ(v){1,0,+1} for each vV′′.

Note a solution FE naturally maps to a solution in the final instance with cost at most Φ+Kc(F) by including all pendant paths Pv and all subdivided paths corresponding to edges in F. Conversely, consider any solution F′′E′′. Let FE be the set of edges of G such that their entire subdivision is included in F′′. It is easy to verify that F is a feasible solution with cost at most c′′(F′′)/K.

Let OPT be the optimal solution value for instance (G,c,ϕ) and let F′′ be the result of using α-approximation on (G′′,c′′,ϕ′′). By the preceding discussion, this yields a feasible solution F to (G,c,ϕ) with

c(F)c′′(F′′)/Kα(Φ+KOPT)/K=αOPT+αΦ/K.

By setting K=3Φ/ϵ and noting that 1OPT as edge costs are positive integers in (G,c,ϕ), this is at most (1+ϵ/3)αOPT.

Finally, by accounting for the application of Lemma 19 at the start of this proof we see that we would have an approximation for the original instance (G,c,ϕ) with guarantee (1+ϵ/3)2α(1+ϵ)α.

Appendix C Adapting k-MST Approximations

We only sketch how the algorithms can be adapted. We refer the reader to the respective papers for their details.

Lemma 20 (Slight adaptation of Arora and Karakostas [3]).

There is a polynomial-time (2+ϵ)-Approximation for W-k-MST-R.

Proof.

The algorithm in [3] explicitly guesses a subset of vertices that appear in the optimum solution and builds that into the LP relaxation they write. So it can already handle the situation where we have a larger set of required nodes R. The only thing to mention is how it can be extended to handle node weights wv0 for vR. We can assume each node with weight wv is implicitly a collection of wv many nodes connected using 0-cost edges in a star fashion. The algorithm of [3] first guesses an additional set of size O(1/ϵ) of vertices of OPT to be required. The next step of the algorithm is to run a “primal dual” like algorithm to find a tree T. This step works without modification in our setting.

The final step in [3] is to modify T appropriately. That is, the manner in which T was constructed actually provides us with two options: include some subset of nodes or not. One choice would result in fewer than k non-required nodes being spanned and the other would result in at least k non-required nodes being spanned. In [3], it is mentioned how to pick the correct number of nodes contiguously from this “optional” portion so that grafting them in to the remaining portion of T yields a feasible solution with the required number of nodes. The grafting only costs O(ϵOPT) due to the guess of the net at the start of the algorithm. This can also be done in polynomial time if one implicitly maintains a 0-cost tree spanning each group of colocated points.

Lemma 21 (Slight adaptation of Arora [2]).

There is a PTAS for k-MST-R

We first comment that a similar adaptation could be made to Mitchell’s PTAS [11]. We chose this one because it was slightly easier to describe. We also emphasize that this adaptation is only for unweighted k-MST-R. Combining this with Observation 3 yields a pseudo-polynomial time approximation scheme for W-k-MST-R.

Proof.

The PTAS for k-MST in constant-dimensional Euclidean plane uses a dynamic programming routine through a quadtree dissection of the plane (an in higher dimensions a 2D-tree dissection in D-dimensional spaces). The DP table entries for each square, roughly speaking, describe the interface of the optimal solution across the boundary of that square through “portals” and also include the guess for how many nodes should be covered within the square. If there are required nodes R, we can use the same DP table and simply insist that the subproblem’s solution also span any nodes of R in the square. The base cases are trivially extended to this setting and the combination of subproblems (i.e. the recurrence) for a non-base case is identical to before.