A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics

We present a unified polynomial-time approximation scheme (PTAS) for the prize collecting traveling salesman problem (PCTSP) and the prize collecting Steiner tree problem (PCSTP) in doubling metrics. Given a metric space and a penalty function on a subset of points known as terminals, a solution is a subgraph on points in the metric space, whose cost is the weight of its edges plus the penalty due to terminals not covered by the subgraph. Under our unified framework, the solution subgraph needs to be Eulerian for PCTSP, while it needs to be connected for PCSTP. Before our work, even a QPTAS for these problems in doubling metrics was not known. Our unified PTAS is based on the previous dynamic programming frameworks proposed in [Talwar STOC 2004] and [Bartal, Gottlieb, Krauthgamer STOC 2012]. However, since it is unknown which part of the optimal cost is due to edge lengths and which part is due to penalties of uncovered terminals, we need to develop new techniques to apply previous divide-and-conquer strategies and sparse instance decompositions.

1 Introduction
Although doubling metrics are more general than Euclidean spaces, recent results show that many optimization problems have similar approximation guarantees for both spaces: there exist PTAS's for the TSP [BGK16], a certain version of the TSP with neighborhoods [CJ16], and the Steiner forest problem [CHJ16], in doubling metrics.
Our Contributions. In this paper, we extend this line of research and give a unified PTAS framework for both PC TSP and PC STP. We use PC X when the description applies to either problem. Our main result is Theorem 1.1.
Theorem 1.1. For any $0 < \epsilon < 1$, there exists an algorithm that, for any PC X instance with $n$ terminal points in a metric space with doubling dimension at most $k$, runs in time … and returns a solution that is a $(1 + \epsilon)$-approximation with constant probability.
Technical Difficulty. As a first attempt, one might try to adapt the sparsity framework used in previous PTAS's for the TSP and Steiner forest problems [BGK16,CJ16,CHJ16] in doubling metrics. The framework typically uses a polynomial-time estimator H on any ball B, which gives a constant approximation for PC X on some appropriately defined sub-instance around B. Intuitively, the estimator works because the local behavior of a (nearly) optimal solution can be well estimated by looking at the sub-instance locally. In particular, the following properties are needed in this framework:
• If H(B) is large, then the optimal solution for the sub-instance induced on B is large; moreover, any (nearly) optimal solution for the global instance would have a large part of its cost due to B.
• If H(B) is small, then for any (nearly) optimal solution F for the global instance, the cost of F contributed by the sub-instance due to B should be small.
While the first property is somewhat straightforward, the following example shows that the second property is non-trivial to achieve in PC X.
Example Instance: Figure 1. The example is defined on the real line. The terminals are grouped into two clusters. The left cluster contains 2m terminals, and the right cluster contains m terminals. Within each cluster, the distance between adjacent terminals is 1. The two clusters are at distance l apart. The penalty for each terminal is t. The parameters are chosen such that l ≫ mt and t ≫ m. Observe that for PC X, the optimal solution is to visit all the terminals in the left cluster with total edge weight O(m) and incur the penalty mt for the terminals in the right cluster. The reason is that it is too costly to add an edge connecting terminals from different clusters, and it is better to visit the cluster with more terminals and suffer the penalty for the cluster with fewer terminals.
Local Estimator Fails on the Example Instance. Suppose the estimator is applied around a ball B centered at some terminal in the right cluster with radius r. Then, any constant-approximate solution for the sub-instance needs to connect all Θ(r) terminals in the ball, since the penalty for any single terminal is too large. This costs Θ(r). However, in the optimal solution, no terminal in the right cluster is visited and all penalties are taken, which has cost Ω(tr). Hence, the estimator fails to serve as an upper bound for the contribution by ball B to the cost of an optimal solution.
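To make the example instance concrete, the following minimal sketch compares four natural candidate solutions for the Steiner tree variant on the two-cluster line instance. The function name `example_costs`, the parameter name `gap` (standing for the distance l), and the numeric values are illustrative assumptions, not taken from the paper.

```python
def example_costs(m: int, t: float, gap: float) -> dict:
    """Costs (edge weight + penalty) of four candidate trees on the example.

    Left cluster: 2m unit-spaced terminals; right cluster: m unit-spaced
    terminals; clusters are `gap` apart; every terminal has penalty t.
    """
    left_n, right_n = 2 * m, m
    left_w = left_n - 1    # path through the left cluster
    right_w = right_n - 1  # path through the right cluster
    return {
        "left only": left_w + right_n * t,
        "right only": right_w + left_n * t,
        "both clusters": left_w + gap + right_w,
        "single terminal": (left_n + right_n - 1) * t,  # self-loop solution
    }


# Parameters chosen so that gap >> m*t and t >> m, as in the example.
costs = example_costs(m=10, t=1_000, gap=1_000_000)
best = min(costs, key=costs.get)
print(best, costs[best])
```

With these values the cheapest candidate visits only the left cluster, matching the discussion above: the right cluster contributes only penalty, even though a local solution around the right cluster would connect its terminals.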
The conclusion is that the optimal solution of a local sub-instance can differ a lot from how an optimal global solution behaves for that sub-instance.
Our Insight: Trading between Weight and Penalty. Our example in Figure 1 shows that the points a locally optimal solution visits in a sub-instance can be very different from the points in the sub-instance visited by a globally optimal solution. Our intuition is that the optimal cost of a sub-instance should reflect part of the cost in a globally optimal solution due to the sub-instance. In other words, if a sub-instance has large optimal cost, then any global solution either (1) has a large weight within the sub-instance, or (2) suffers a large penalty due to unvisited terminals in the sub-instance. This insight leads to the following key ingredients to tackle the aforementioned technical difficulty.
1. Inferring Local Behavior from Estimator. In Lemma 3.1, we show that the value returned by the local estimator (which consists of both the weight and the penalty) on a ball $B$ gives an upper bound on the weight $w(F|_B)$ of any (near) optimal tour $F$ inside ball $B$. In the example in Figure 1, a global optimal solution does not visit the right cluster at all, and hence, the local estimator on the right cluster does give an upper bound on the weight part of the global solution due to the right cluster. This turns out to be sufficient because the sparsity of a solution is defined with respect to only the weight part (and not the penalty part). Hence, the local estimator can be used in the sparsity decomposition framework [BGK16,CJ16,CHJ16] to identify a critical instance $W_1$ (i.e., the local estimator reaches some threshold, but is still not too large) around some ball $B$. Since the instance $W_1$ is sparse enough, an approximate solution $F_1$ can be obtained by the dynamic programming framework. Then, one can recursively solve for an approximate solution $F_2$ for the remaining instance $W_2$. However, we need to carefully define $W_2$ and combine the solutions $F_1$ and $F_2$, because, as we remarked before, even if the approximation algorithm returns $F_1$ for the instance $W_1$, a near-optimal global solution might not visit any terminals in $W_1$.
2. Adaptive Recursion. In all previous applications of the sparsity decomposition framework, after a critical ball $B$ around some center $u$ is identified, the original instance is decomposed into sub-instances $W_1$ and $W_2$ that can be solved independently. An issue in applying this framework is that after obtaining solutions $F_1$ and $F_2$ for the sub-instances, in the case that $F_1$ and $F_2$ are far away from each other as in our example in Figure 1, it is not immediately clear which of $F_1$ and $F_2$ should be the weight part of the global solution and which would become the penalty part.
We use a novel idea of adaptive recursion, in which $W_2$ depends on the solution $F_1$ returned for $W_1$. The high-level idea is that in defining the instance $W_2$, we add an extra terminal point at $u$, which becomes a representative for solution $F_1$. The penalty of $u$ in $W_2$ is the sum of the penalties of terminals in $W_1$ minus the cost $c(F_1)$ of solution $F_1$. After a solution $F_2$ for $W_2$ is returned, if $F_2$ does not visit the terminal $u$, then the edges in $F_1$ are discarded; otherwise the edges in $F_1$ and $F_2$ are combined to return a global solution.
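The bookkeeping behind adaptive recursion can be illustrated with a small sketch. The helper names `representative_penalty` and `combine`, the edge representation (frozensets of point labels), and the toy numbers are assumptions made for illustration; the paper's formal definitions appear in Section 4.

```python
def representative_penalty(pi_W1: dict, cost_F1: float) -> float:
    """Penalty assigned to the representative u in W2.

    Assumed form: (sum of penalties of terminals in W1) - c(F1).
    """
    return sum(pi_W1.values()) - cost_F1


def combine(F1: set, F2: set, u) -> set:
    """Keep the edges of F1 only if F2 visits the representative u."""
    visited_by_F2 = {x for e in F2 for x in e}
    return (F1 | F2) if u in visited_by_F2 else set(F2)


# Toy data: W1 has terminals a and b; F1 visits a (edge weight 2), not b.
pi_W1 = {"a": 5.0, "b": 7.0}
F1 = {frozenset({"u", "a"})}
c1_F1 = 2.0 + pi_W1["b"]                      # weight + missed penalty = 9
pen_u = representative_penalty(pi_W1, c1_F1)  # 12 - 9 = 3

F2_skip = {frozenset({"x", "y"})}   # does not visit u: F1 is discarded
F2_visit = {frozenset({"x", "u"})}  # visits u: F1 and F2 are merged

print(combine(F1, F2_skip, "u") == F2_skip)
print(combine(F1, F2_visit, "u") == F1 | F2_visit)
# When u is skipped, c(F1) plus the penalty of u equals all penalties in W1.
print(c1_F1 + pen_u == sum(pi_W1.values()))
```

The last line checks the accounting in the skipped case: adding $c(F_1)$ back to the penalty paid for $u$ recovers exactly the original penalties of the terminals in $W_1$.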
We can see that in either case, the sum $c(F_1) + c(F_2)$ of the costs of the two solutions reflects the cost of the global solution. In the first case, $F_2$ does not visit $u$ and hence, $c(F_2)$ contains the penalty due to $u$, which is the penalties of unvisited terminals in $W_1$ minus $c(F_1)$. Therefore, when $c(F_1)$ is added back, the sum simply contains the original penalties of unvisited terminals in $W_1$. In the second case, $F_2$ does visit $u$ and does not incur a penalty due to $u$. Therefore, $c(F_1) + c(F_2)$ does reflect the cost of the global solution after combining $F_1$ and $F_2$.
Revisiting the Sparsity Structural Lemma. All PTAS's in the literature for TSP-like problems in doubling metrics rely on the sparsity structural lemma [BGK16, Lemma 3.1]. Intuitively, it says that if a solution is sparse, then there exists a structurally light solution that is $(1+\epsilon)$-approximate. Hence, one can restrict the search space to structurally light solutions, which can be explored by a dynamic programming algorithm. Because of the significance of this lemma, we believe that it is worthwhile to give it a more formal inspection, and in particular, resolve some significant technical issues as follows.
• Issue with Conditioning on the Randomness of Hierarchical Decomposition. Given a hierarchical decomposition and a solution T , the first step is to reroute the solution such that every cluster is only visited through some designated points known as portals. The randomness in the hierarchical decomposition is used to argue that the expected increase in cost to make the solution portal-respecting is small.
However, typically the randomness in the hierarchical decomposition is still needed in subsequent arguments. Hence, if one analyzes the portal-respecting procedure as a conceptually separate step, then subsequent uses of the randomness of the hierarchical decomposition need to condition on the event that the portal-respecting step does not increase the cost too much. Moreover, edges added in the portal-respecting step are actually random objects depending on the hierarchical decomposition, and hence, will in fact cross some clusters with probability 1. Unfortunately, even in the original paper by Talwar [Tal04] on the QPTAS for TSP in doubling metrics, these issues were not addressed properly.
• Issues with Patching Procedure. A patching procedure is typically used to reduce the number of times a cluster is crossed. In the literature, after reducing the number of crossings, the triangle inequality is used to implicitly add some shortcutting edges outside the cluster. However, it is never argued whether these new shortcutting edges are still portal-respecting. It is plausible that making them portal-respecting might introduce new crossings.
From the above discussion, it is evident that one should consider the portal-respecting step and the patching procedure together, because they both rely on the randomness of the hierarchical decomposition. To make our arguments formal, we need a more precise notation to describe portals, and in Section 5, we actually revisit the whole randomized hierarchical decomposition to make all relevant definitions precise. In Theorem 5.1, we analyze the portal-respecting step and the patching procedure together through a sophisticated accounting argument so that the patching cost is eventually charged back to the original solution (as opposed to stopping at the transformed portal-respecting solution).
Moreover, we give a unified patching lemma that works for both PC TSP and PC STP. Even though our proofs use ideas similar to those of previous works, the charging argument is significantly different. Specifically, our argument does not rely on the small MST lemma [Tal04, Lemma 6], which was also used in [BGK16].
Paper Organization. Section 2 gives the formal notation and describes the outline of the sparsity decomposition framework to solve PC X. Section 3 gives the properties of the local sparsity estimator. Section 4 gives the technical details of the sparsity decomposition and shows that approximate solutions in sub-instances can be combined to give a good approximation to the global instance. Section 5 revisits the hierarchical decomposition and sparse instance frameworks for TSP-like problems in doubling metrics; the notation is more involved than in previous works, and readers who are already familiar with the literature might choose to skip it on a first read. Section 6 gives the details of the dynamic program for sparse instances and the analysis of its running time.

2 Preliminaries
We consider a metric space $M = (X, d)$ (see [DL97,Mat02] for more details on metric spaces), where we refer to an element $x \in X$ as a point or a vertex. For $x \in X$ and $\rho \ge 0$, a ball $B(x, \rho)$ is the set $\{y \in X \mid d(x, y) \le \rho\}$. The diameter $\mathrm{Diam}(Z)$ of a set $Z \subset X$ is the maximum distance between points in $Z$. A set $S \subset X$ is a $\rho$-packing if any two distinct points in $S$ are at distance more than $\rho$ from each other. A set $S$ is a $\rho$-cover for $Z \subseteq X$ if for any $z \in Z$, there exists $x \in S$ such that $d(x, z) \le \rho$. A set $S$ is a $\rho$-net for $Z$ if $S$ is a $\rho$-packing and a $\rho$-cover for $Z$. We assume access to an oracle that takes a collection of balls $\{B_i\}_i$, each identified by its center and radius, and returns a point $x \in X$ such that $x \notin B_i$ for all $i$ (footnote 2). A greedy algorithm can construct a $\rho$-net efficiently given access to this oracle.
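For a finite point set, the greedy net construction mentioned above can be sketched as follows. This is an illustrative sketch only: the function name `greedy_net` and the explicit list of points (in place of the oracle) are assumptions.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")


def greedy_net(points: Iterable[P], dist: Callable[[P, P], float],
               rho: float) -> List[P]:
    """Greedily pick points that are farther than rho from all chosen ones.

    The result is a rho-packing (chosen points are pairwise > rho apart)
    and a rho-cover (every skipped point is within rho of a chosen point).
    """
    net: List[P] = []
    for p in points:
        if all(dist(p, q) > rho for q in net):
            net.append(p)
    return net


# Example on the real line with the usual distance.
line_dist = lambda a, b: abs(a - b)
pts = list(range(10))
net = greedy_net(pts, line_dist, rho=2)
print(net)
```

Each point is compared against the current net only, so the sketch uses at most |X| · |net| distance evaluations.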
We consider metric spaces with doubling dimension [Ass83,GKL03] at most $k$; this means that for all $x \in X$, for all $\rho > 0$, every ball $B(x, 2\rho)$ can be covered by the union of at most $2^k$ balls of the form $B(z, \rho)$, where $z \in X$. The following fact captures a standard property of doubling metrics.
Fact 2.1 (Packing in Doubling Metrics [GKL03]). Suppose in a metric space with doubling dimension at most $k$, a $\rho$-packing $S$ has diameter at most $R$. Then, $|S| \le (\frac{2R}{\rho})^k$.
Edges. An edge $e$ is an unordered pair $e = \{x, y\} \in \binom{X}{2}$ (footnote 3) whose weight $w(e) = d(x, y)$ is induced by the metric space $(X, d)$. Given a set $F$ of edges, its vertex set $V(F) := \cup_{e \in F} e \subset X$ is the set of vertices covered (or visited) by the edges in $F$. If $T \subset X$ is a set of vertices, we use the shorthand $T \setminus F := T \setminus V(F)$ to denote the vertices in $T$ that are not covered by $F$.
Problem Definition. We give a unifying framework for the prize collecting traveling salesman problem (PC TSP) and the prize collecting Steiner tree problem (PC STP), and we use PC X when the description applies to both problems. An instance $W = (T, \pi)$ of PC X consists of a set $T \subset X$ of terminals (where $|W| := |T| = n$) and a penalty function $\pi : T \to \mathbb{R}_+$. The goal is to find a (multi-)set $F \subset \binom{X}{2}$ of edges with minimum cost $c_W(F) := w(F) + \pi(T \setminus F)$, such that the following additional conditions are satisfied for each specific problem:
• For PC TSP, the edges in the multi-set $F$ form a circuit on $V(F)$; for $|V(F)| = 1$, $F$ contains only a single self-loop (with zero weight).
• For PC STP, the edges in $F$ form a connected graph on $V(F)$, where we also allow the degenerate case when $F$ is a singleton containing a self-loop. The vertices in $V(F) \setminus T$ are known as Steiner points.
Footnote 2. Such an oracle is trivial to construct for finite metric spaces. It may also be efficiently constructed for many special infinite metric spaces, such as bounded-dimensional Euclidean spaces.
Footnote 3. To have a complete description, we also need the notion of a self-loop, which is simply a singleton $\{x\}$.
Simplifying Assumptions and Rescaling Instance. Fix some constant $\epsilon > 0$. Since we consider asymptotic running time to obtain a $(1 + \epsilon)$-approximation for PC X, we consider sufficiently large $n > \frac{1}{\epsilon}$. Since $F$ can contain a self-loop, an optimal solution covers at least one terminal $u$. Moreover, there is some terminal $v$ (which could be the same as $u$) such that the solution covers $v$ and does not cover any terminal $v'$ with $d(u, v') > d(u, v)$. Since we aim for polynomial-time algorithms, we can afford to enumerate the $O(n^2)$ choices for $u$ and $v$.
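The cost function $c_W(F) = w(F) + \pi(T \setminus F)$ can be evaluated directly. The sketch below assumes edges are stored as a list of frozensets (a singleton frozenset models a self-loop); the names `edge_weight` and `pc_cost` and the toy instance are illustrative and not from the paper. Feasibility (circuit or connectivity) is not checked here.

```python
def edge_weight(e: frozenset, dist) -> float:
    """Weight of an edge; a self-loop {x} has weight 0."""
    pts = tuple(e)
    return 0.0 if len(pts) == 1 else dist(pts[0], pts[1])


def pc_cost(F: list, penalty: dict, dist) -> float:
    """c_W(F) = total edge weight + penalties of terminals not covered by F."""
    covered = {x for e in F for x in e}
    weight = sum(edge_weight(e, dist) for e in F)
    missed = sum(p for term, p in penalty.items() if term not in covered)
    return weight + missed


# Toy instance on the real line: terminals 0, 1, 5 with their penalties.
line_dist = lambda a, b: abs(a - b)
penalty = {0: 3.0, 1: 3.0, 5: 2.0}

tree = [frozenset({0, 1})]  # covers 0 and 1, pays the penalty of 5
loop = [frozenset({0})]     # self-loop at 0, pays penalties of 1 and 5

print(pc_cost(tree, penalty, line_dist))
print(pc_cost(loop, penalty, line_dist))
```

In this toy instance the tree on terminals 0 and 1 is cheaper than the degenerate self-loop solution.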
For some choice of $u$ and $v$, suppose $R := d(u, v)$. Then, $R$ is a lower bound on the cost of an optimal solution. Moreover, the optimal solution $F$ has weight $w(F)$ at most $nR$, and hence, we do not need to consider points at distances larger than $nR$ from $u$. Since $F$ contains at most $2n$ edges (because of Steiner points in PC STP), if we consider an $\frac{\epsilon R}{32n^2}$-net $S$ for $X$ and replace every point in $F$ with its closest net-point in $S$, the cost increases by at most $\epsilon \cdot \mathrm{OPT}$. Hence, after rescaling, we can assume that the inter-point distance is at least 1 and we consider distances up to $O(\frac{n^3}{\epsilon}) = \mathrm{poly}(n)$. By the packing property of doubling dimension (Fact 2.1), we can hence assume that $|X| \le O(\frac{n}{\epsilon})^{O(k)}$.
Hierarchical Nets. As in [BGK16], we consider some parameter $s = (\log n)^{c/k} \ge 4$, where $0 < c < 1$ is a universal constant that is sufficiently small (as required in Lemma 6.2). Set $L := O(\log_s n) = O(\frac{k \log n}{\log \log n})$. A greedy algorithm can construct $N_L \subseteq N_{L-1} \subseteq \cdots \subseteq N_1 \subseteq N_0 = N_{-1} = \cdots = X$ such that for each $i$, $N_i$ is an $s^i$-net for $N_{i-1}$, where we say distance scale $s^i$ is of height $i$.
Net-Respecting Solution. As defined in [BGK16], a graph $F$ is net-respecting with respect to $\{N_i\}_{i \in [L]}$ and $\epsilon > 0$ if for every edge $\{x, y\}$ in $F$, both $x$ and $y$ belong to $N_i$, where $s^i \le \epsilon \cdot d(x, y) < s^{i+1}$.
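The nested nets and the net-respecting condition can be sketched for a finite point set as follows. The function names, the list-of-lists representation of the hierarchy, and the choice to cap the height at $L$ for very long edges are assumptions made for this illustration.

```python
def greedy_net(points, dist, rho):
    """Greedy rho-net of a finite point list."""
    net = []
    for p in points:
        if all(dist(p, q) > rho for q in net):
            net.append(p)
    return net


def hierarchical_nets(points, dist, s, L):
    """Return [N_0, N_1, ..., N_L], where N_i is an s^i-net for N_{i-1}."""
    nets = [list(points)]  # N_0 = X
    for i in range(1, L + 1):
        nets.append(greedy_net(nets[i - 1], dist, s ** i))
    return nets


def is_net_respecting(F, nets, dist, s, eps):
    """Check that each edge {x, y} has both endpoints in N_i,
    where s^i <= eps * d(x, y) < s^(i+1).

    Edges with eps * d(x, y) < 1 fall at height 0 (N_0 = X); heights
    above L are capped at L (an assumption of this sketch).
    """
    for x, y in F:
        scaled = eps * dist(x, y)
        i = 0
        while i + 1 < len(nets) and s ** (i + 1) <= scaled:
            i += 1
        if x not in nets[i] or y not in nets[i]:
            return False
    return True


line_dist = lambda a, b: abs(a - b)
nets = hierarchical_nets(range(17), line_dist, s=4, L=2)
print(nets[1], nets[2])
print(is_net_respecting([(0, 10), (3, 4)], nets, line_dist, s=4, eps=0.5))
print(is_net_respecting([(0, 9)], nets, line_dist, s=4, eps=0.5))
```

In the example, the edge (0, 9) fails the check because its scaled length 4.5 places it at height 1, while the point 9 is not in $N_1$.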
Given an instance $W$ of a problem, let $\mathrm{OPT}(W)$ be an optimal solution; when the context is clear, we also use $\mathrm{OPT}(W)$ to denote the cost $c(\mathrm{OPT}(W))$, which includes both its weight and the incurred penalty; similarly, $\mathrm{OPT}^{nr}(W)$ refers to an optimal net-respecting solution.

2.1 Overview
We achieve a PTAS for PC X by a unified framework, which is based on the framework of sparse instance decomposition as in [BGK16,CJ16,CHJ16].
Sparse Solution [BGK16]. Given an edge set $F$ and a subset $S \subseteq X$, $F|_S := \{e \in F : e \subseteq S\}$ is the set of edges in $F$ totally contained in $S$. An edge set $F$ is called $q$-sparse if for all $i \in [L]$ and all $u \in N_i$, $w(F|_{B(u, 3s^i)}) \le q \cdot s^i$.
Sparsity Structural Property. (Revisited in Theorem 5.1) An important technical lemma [BGK16, Lemma 3.1] in this framework states that if a (net-respecting) solution $F$ is sparse, then with constant probability, there is some $(1+\epsilon)$-approximate solution that is structurally light with respect to some randomized hierarchical decomposition (see Section 5.1). Then, a bottom-up dynamic program (given in Section 6) based on the hierarchical decomposition searches for the best solution with the lightness structural property in polynomial time.
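The definition of $q$-sparsity translates into a direct check. The sketch below assumes the hierarchy is given as a list `nets` with `nets[i]` playing the role of $N_i$ and takes $[L] = \{1, \dots, L\}$; the function name `is_q_sparse` and the toy data are illustrative.

```python
def is_q_sparse(F, nets, dist, s, q):
    """Check w(F restricted to B(u, 3 s^i)) <= q * s^i for all heights i
    in 1..L and all net points u in N_i. Edges are pairs (x, y)."""
    for i in range(1, len(nets)):
        scale = s ** i
        radius = 3 * scale
        for u in nets[i]:
            inside = sum(
                dist(x, y)
                for x, y in F
                if dist(u, x) <= radius and dist(u, y) <= radius
            )
            if inside > q * scale:
                return False
    return True


# Toy data: a unit-step path on {0, ..., 16} with a small fixed hierarchy.
line_dist = lambda a, b: abs(a - b)
nets = [list(range(17)), [0, 5, 10, 15], [0]]
path = [(j, j + 1) for j in range(16)]  # total weight 16

print(is_q_sparse(path, nets, line_dist, s=4, q=4))
print(is_q_sparse(path, nets, line_dist, s=4, q=3))
```

Here the ball of radius 12 around the net point 5 contains the whole path of weight 16, so the path is 4-sparse but not 3-sparse at height 1.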
Remark 2.1. We observe that this technical lemma is used crucially in all previous works on PTAS's for TSP variants in doubling metrics. Hence, we believe that its proof should be verified rigorously. In Section 1, we outlined the technical issue in the original proof [BGK16], and this issue actually appeared as far back as the first paper on TSP for doubling metrics [Tal04]. In Section 5, we give a detailed description to complete the proof of this important lemma.
Sparsity Heuristic. As in [BGK16,CJ16,CHJ16], we estimate the local sparsity of an optimal net-respecting solution with a heuristic. For $i \in [L]$ and $u \in N_i$, given an instance $W$, the heuristic $H_u^{(i)}(W)$ is supposed to estimate the sparsity of an optimal net-respecting solution in the ball $B' := B(u, O(s^i))$. We shall see in Section 3 that the heuristic actually gives a constant approximation to some appropriately defined sub-instance $W'$ in the ball $B'$.
Divide and Conquer. Once we have a sparsity estimator, the original instance can be decomposed into sparse sub-instances, whose approximate solutions can be found efficiently. As we shall see, the partial solutions are combined with the following extension operator. The algorithm outline is described in Figure 2.
Definition 2.1 (Solution Extension). Given two partial solutions $F$ and $F'$ of edges, we define the …
Analysis of Approximation Ratio. We follow the inductive proof as in [BGK16] to show that with constant probability (where the randomness comes from DP), ALG($W$) in Figure 2 returns a solution with expected length at most $\frac{1+\epsilon}{1-\epsilon} \cdot \mathrm{OPT}^{nr}(W)$, where the expectation is over the randomness of the decomposition into sparse instances in Step 4.
As we shall see, in ALG($W$), the subroutine DP is called at most $\mathrm{poly}(n)$ times (either explicitly in the recursion or in the heuristic $H^{(i)}$). Hence, with constant probability, all solutions returned by all instances of DP have appropriate approximation guarantees.
Suppose $F_1$ and $F_2$ are solutions returned by DP($W_1$) and ALG($W_2$), respectively. We use $c_i$ as a shorthand for $c_{W_i}$, for $i = 1, 2$, and $c$ as a shorthand for $c_W$. Since we assume that $W_1$ is sparse enough and DP behaves correctly, … achieving the desired ratio.
Analysis of Running Time. As mentioned above, if $H_u^{(i)}(W)$ is found to be critical, then in the decomposed sub-instances $W_1$ and $W_2$, $H_u^{(i)}(W_2)$ should be small. Hence, it follows that there will be at most $|X| \cdot L = \mathrm{poly}(n)$ recursive calls to ALG. Therefore, as far as obtaining a polynomial running time is concerned, it suffices to analyze the running time of the dynamic program DP. The details are in Section 6.
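For the approximation-ratio analysis above, one way the inductive step might be assembled is sketched below. This is a hedged reconstruction, not the paper's displayed derivation: it assumes that property (1) reads $c(F) \le c_1(F_1) + c_2(F_2)$, that DP returns a $(1+\epsilon)$-approximation on the sparse instance $W_1$, that the induction hypothesis applies to $W_2$, and that $\widehat{F}_1$, $\widehat{F}_2$ are the comparison solutions whose existence is stated in Lemma 4.2. Conditioning subtleties (since $W_2$ depends on $F_1$) are glossed over.

```latex
\begin{align*}
\mathbb{E}[c(F)]
  &\le \mathbb{E}[c_1(F_1)] + \mathbb{E}[c_2(F_2)] \\
  &\le (1+\epsilon)\,\mathbb{E}[c_1(\widehat{F}_1)]
     + \frac{1+\epsilon}{1-\epsilon}\,\mathbb{E}[c_2(\widehat{F}_2)] \\
  &= \frac{1+\epsilon}{1-\epsilon}
     \Bigl( (1-\epsilon)\,\mathbb{E}[c_1(\widehat{F}_1)]
            + \mathbb{E}[c_2(\widehat{F}_2)] \Bigr) \\
  &\le \frac{1+\epsilon}{1-\epsilon} \cdot \mathrm{OPT}^{nr}(W).
\end{align*}
```

The factor $(1-\epsilon)$ attached only to the $W_1$ term is what lets the DP guarantee and the inductive guarantee be merged into a single ratio.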

3 Sparsity Estimator for PC X
Recall that in the framework outlined in Section 2, given an instance $W$ of PC X, we wish to estimate the weight of $\mathrm{OPT}^{nr}(W)|_{B(u, 3s^i)}$ with some heuristic $H_u^{(i)}(W)$. We consider a more general sub-instance associated with the ball $B(u, ts^i)$ for $t \ge 1$.
Generic Algorithm. We describe a generic framework that applies to PC X. A similar framework is also used in [BGK16, CJ16, CHJ16] to obtain PTAS's for TSP-related problems. Given an instance $W$, we describe the recursive algorithm ALG($W$) as follows. This description is mostly the same as that in [CHJ16], except that the decomposition in Step 4 is more involved.
1. Base Case. If $|W| = n$ is smaller than some constant threshold, solve the problem by brute force, recalling that … is at most $q_0 \cdot s^i$, for some appropriate threshold $q_0$, call the subroutine DP($W$) to return a solution, and terminate.
3. Identify Critical Instance. Otherwise, let $i$ be the smallest height such that there exists $u \in N_i$ with $H_u^{(i)}(W) > q_0 \cdot s^i$; in this case, $u \in N_i$ is chosen to maximize $H_u^{(i)}(W)$.
4. Divide and Conquer. Define a sub-instance $W_1$ around the critical instance (possibly using randomness). Loosely speaking, $W_1$ is a sparse enough sub-instance induced in the region around $u$ at distance scale $s^i$. Since it is sparse enough, we apply the dynamic programming algorithm on $W_1$ and get solution $F_1$. We define an appropriate sub-instance $W_2$ with the information of $F_1$. Intuitively, $W_2$ captures the remaining sub-problem not included in $W_1$. We emphasize that, as opposed to previous work [BGK16,CJ16,CHJ16], $W_2$ can depend on $F_1$ (through the choice of the penalty function). Moreover, we ensure that any solution $F_2$ of $W_2$ can be extended to $F_2 \uplus F_1$ (the extension of Definition 2.1) as a solution for $W$, and the following holds: … (1)
We solve $W_2$ recursively and suppose the solution is $F_2$. We note that $H_u^{(i)}(W_2) \le q_0 \cdot s^i$, and hence the recursion will terminate. Moreover, the following property holds: … (2), where the expectation is over the randomness of the decomposition. We return $F := F_2 \uplus F_1$ as a solution to $W$.
The sub-instance $W_u^{(i,t)}$ is characterized by terminal set $W \cap B(u, ts^i)$, equipped with penalties given by the same $\pi$. Using the classical (deterministic) 2-approximation algorithms by Goemans and Williamson for PC X [GW95], we obtain a 2-approximation and then make it net-respecting to produce solution $F_u^{(i,t)}$.
Defining the Heuristic. The heuristic is defined as $H$ …
In order to show that the heuristic gives a good upper bound on the local sparsity of an optimal net-respecting solution, we need the following structural result in Proposition 3.1 [CHJ16, Lemma 3.2] on the existence of a long chain among well-separated terminals in a Steiner tree. As we shall see, the corresponding argument for the case PC TSP is trivial.
Given an edge set $F$, a chain in $F$ is specified by a sequence of points $(p_1, p_2, \ldots, p_l)$ such that there is an edge $\{p_i, p_{i+1}\}$ in $F$ between adjacent points, and the degree of an internal point $p_i$ (where $2 \le i \le l - 1$) in $F$ is exactly 2.
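The chain condition can be checked mechanically. The sketch below assumes edges are given as pairs without self-loops; the function name `is_chain` and the small tree are illustrative.

```python
from collections import Counter


def is_chain(seq, F):
    """Return True if seq = (p_1, ..., p_l) is a chain in the edge list F:
    consecutive points are joined by an edge of F, and every internal
    point has degree exactly 2 in F."""
    edges = {frozenset(e) for e in F}
    degree = Counter(x for e in F for x in e)
    adjacent = all(frozenset((a, b)) in edges for a, b in zip(seq, seq[1:]))
    internal_ok = all(degree[p] == 2 for p in seq[1:-1])
    return adjacent and internal_ok


# A small tree: path 1-2-3-4 with an extra branch 3-5.
F = [(1, 2), (2, 3), (3, 4), (3, 5)]
print(is_chain((1, 2, 3), F))  # internal point 2 has degree 2
print(is_chain((2, 3, 4), F))  # internal point 3 has degree 3
```

The branch at point 3 is what breaks the second sequence: a chain may end at a branching point but may not pass through one.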
Proposition 3.1 (Well-Separated Terminals Contain a Long Chain). Suppose $S$ and $T$ are sets in a metric space with doubling dimension at most $k$ such that $\mathrm{Diam}(S \cup T) \le D$, and $d(S, T) \ge \tau D$, where $0 < \tau < 1$. Suppose $F$ is an optimal net-respecting Steiner tree covering the terminals in $S \cup T$. Then, there is a chain in $F$ with weight at least $\frac{\tau^2}{4096 k^2} \cdot D$ such that any internal point in the chain is a Steiner point.
Lemma 3.1 (Local Sparsity Estimator). Let $F$ be an optimal net-respecting solution for an instance $W$ of PC X. Then, for any $i \in [L]$, $u \in N_i$ and $t \ge 1$, we have …
Proof. We follow the proof strategy in [CHJ16, Lemma 3.2], except that now a feasible solution need not visit all terminals and can incur penalties instead. We denote $B := B(u, ts^i)$ and … Given an optimal net-respecting solution $F$ for instance $W$ of PC X, we shall construct another net-respecting solution in the following steps.
… Convert each added edge into a net-respecting path if necessary. Observe that the weight of edges added in this step is $O(\frac{stk}{\epsilon})^{O(k)} \cdot s^i$.
4. So far we have accounted for every terminal inside $B$, which is either visited or charged with its penalty according to $c(F_u^{(i,t+1)})$. We will give a more detailed description to ensure that the terminals outside $B$ that are covered by $F$ will still be covered by the new solution.
For PC TSP, we will show that this step can be achieved by increasing the weight by at most $O(\frac{stk}{\epsilon})^{O(k)} \cdot s^i$; for PC STP, this can be achieved by replacing some edges without increasing the weight. Hence, after the claim in Step 4 is proved, the optimality of $F$ implies the result.
Ensuring Terminals Outside $B$ are Accounted For. We achieve this by considering the following steps.
Recall that the goal is to make sure that all terminals outside B that are visited by C will also be visited in the new solution.
2. Pick some $x$ in $C \cap B$. If no such $x$ exists, this implies that we have the trivial situation $F|_B = \emptyset$. Let $C' \subseteq C$ be the maximal connected component containing $x$ that is contained within $B$. Define $S := C' \cap B$ (which contains $x$) and $T := \{y \in C' \cap B : \exists v \notin B, \{y, v\} \in F\}$, which is the set of vertices in $C'$ that are directly connected by $F$ to some point outside $B$. Again, the case that $T = \emptyset$ is trivial.
Case (a): There exists $y \in T$ such that $d(u, y) \le (t + \frac{1}{2})s^i$. In this case, there is some $v \notin B$ such that $\{y, v\} \in F$ and $d(y, v) \ge \frac{s^i}{2}$. Since $F$ is net-respecting, this implies that $y \in N_j$ and hence, the component $C'$ (and also $C$) is already connected to $H$.
Case (b): For all $y \in T$, $d(u, y) > (t + \frac{1}{2})s^i$. We next show that there is a long chain contained in $C'$. For PC TSP, this is trivial, because we know that $T$ contains only $y$, and $C'$ is a chain from $a = x$ to $b = y$ of length at least $d(x, y) \ge \frac{s^i}{2}$. For PC STP, by the optimality of $F$, it follows that $C'$ is an optimal net-respecting Steiner tree covering vertices in $S \cup T$. Hence, using Proposition 3.1, $C'$ contains some chain from $a$ to $b$ with length at least $4\eta s^i$ (where the constant in the Theta in the definition of $\eta$ is chosen such that this holds).
Once we have found this chain from $a$ to $b$, we remove the edges in this chain. Hence, we can use this extra weight to connect $a$ and $b$ to their corresponding closest points in $N_j$ via a net-respecting path; observe that for PC TSP, it suffices to connect only $b = y$ to its closest point in $N_j$.
Finally, observe that for PC TSP, it is possible to carry out the above procedures such that all vertices with odd degrees are in the minimum spanning tree $H$. Therefore, extra edges are added so that the degree of every vertex is even, which ensures the existence of an Euler circuit. This has extra cost at most … This completes the proof.
Corollary 3.1 (Threshold for Critical Instance). Suppose $F$ is an optimal net-respecting solution for an instance $W$ of PC X, and $q \ge \Theta(\frac{sk}{\epsilon})^{\Theta(k)}$. If for all $i \in [L]$ and $u \in N_i$, $H$ …

4 Decomposition into Sparse Instances
In Section 3, we defined a heuristic $H_u^{(i)}(W)$ to detect a critical instance around some point $u \in N_i$ at distance scale $s^i$. We next describe how the instance $W$ of PC X can be decomposed into $W_1$ and $W_2$ such that equations (1) and (2) in Section 2.1 are satisfied.
Decomposing a Critical Instance. We define a threshold $q_0 := \Theta(\frac{sk}{\epsilon})^{\Theta(k)}$ according to Corollary 3.1. As stated in Section 2.1, a critical instance is detected by the heuristic when a smallest $i \in [L]$ is found for which there exists some $u \in N_i$ such that $H_u^{(i)}(W) > q_0 s^i$. Moreover, in this case, $u \in N_i$ is chosen to maximize $H_u^{(i)}(W)$. To achieve a running time with an $\exp(O(1)^{k \log k})$ dependence on the doubling dimension $k$, we also apply the technique in [CJ16] to choose the cutting radius carefully.
Claim 4.1. … Then, there exists $0 \le \lambda < k$ such that $T(\lambda + 1) \le 30k \cdot T(\lambda)$.

Proof. A similar proof is found in [CHJ16], and we adapt the proof to include penalties of unvisited terminals. Suppose the contrary is true. Then, it follows that $T(k) > (30k)^k \cdot T(0)$. We shall obtain a contradiction by showing that there is a solution for the instance $W_u^{(i,4+2k)}$ with small weight. Define $N'_i$ to be the set of points in $N_i$ that cover $B(u, (2k + 5)s^i)$.
We construct an edge set $F$ that is a solution to the instance $W_u^{(i,4+2k)}$. For each $v \in N'_i$, we include the edges in the solution $F_v^{(i,4)}$, whose cost includes the edge weights and the penalties of unvisited terminals. By the choice of $u$, the sum of the costs of these partial solutions is at most $|N'_i| \cdot T(0)$.
We next stitch these solutions together by adding extra edges of total weight at most $2 \cdot 2(2k + 5) \cdot |N'_i| \cdot s^i$; for PC TSP, we make sure that the degree of every vertex is even to form an Euler tour.
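The choice of $\lambda$ in Claim 4.1 amounts to scanning for the first index at which the sequence $T$ grows by a factor of at most $30k$. The sketch below is illustrative: the function name `choose_lambda` and the sample values are assumptions, and the values $T(0), \ldots, T(k)$ are taken as given positive numbers.

```python
def choose_lambda(T, k):
    """Return the smallest 0 <= lam < k with T[lam + 1] <= 30 * k * T[lam].

    T is a list of k + 1 positive values T(0), ..., T(k). If no index
    qualifies, every step grows by more than 30k, which forces
    T[k] > (30k)^k * T[0]; None is returned in that case.
    """
    for lam in range(k):
        if T[lam + 1] <= 30 * k * T[lam]:
            return lam
    return None


k = 2
print(choose_lambda([1, 100, 150], k))      # first step grows too fast
print(choose_lambda([1, 61, 61 * 61], k))   # every step grows by > 60
```

In the second call every ratio exceeds $30k = 60$, and indeed $T(2) = 3721 > 60^2 \cdot T(0)$, which is the situation the proof rules out.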
Cutting Ball and Sub-Instances. Suppose $\lambda \ge 0$ is picked as in Claim 4.1, and sample $h \in [0, \frac{1}{2}]$ uniformly at random. Define $B := B(u, (4 + 2\lambda + h)s^i)$. The original instance $W = (T, \pi)$ is decomposed into instances $W_1$ and $W_2$ as follows:
• For $W_1 = (T_1, \pi_1)$, the terminal set is $T_1 := (B \cap T) \cup \{u\}$, where $\pi_1(v) := \pi(v)$ for $v \neq u$ and $\pi_1(u) := +\infty$. We denote the cost function associated with $W_1$ by $c_1$.
• Suppose $F_1$ is the (random) solution for instance $W_1$ (that covers $u$) returned by the dynamic program for sparse instances in Section 6. Then, instance $W_2 = (T_2, \pi_2)$ is defined with respect to $F_1$. The terminal set is … Observe that the instance $W_2$ depends on $F_1$ through the choice of the penalty for $u$.
Lemma 4.1 (Combining Solutions of Sub-Instances). Suppose instance $W_1$ is defined with cost function $c_1$ and instance $W_2$ is defined with respect to the solution $F_1$ of $W_1$. Furthermore, suppose $F_2$ is a solution to instance $W_2$, whose cost function is denoted as $c_2$. Then, we have the following.
(i) Suppose $\widehat{F}_1$ is any solution to $W_1$ that contains $u$, and let $F :=$ … This implies (1) in Section 2.1.
(ii) The sub-instance $W_2$ does not have a critical instance with height less than $i$, and $H_u^{(i)}(W_2) = 0$.
(iii) …
Proof. For the first statement, Definition 2.1 ensures that $F$ is connected; for PC TSP, it suffices to observe that the union of two intersecting tours is also a tour. Hence, $F$ is a feasible solution for the instance $W$ of PC X.
We next give an upper bound on $c(F)$, by pessimistically considering the case that $F_2$ does not cover any terminal in $B \cap T$.
For the case that $F_2$ covers $u$, we have $F = F_2 \cup \widehat{F}_1$ and $c(F) = w(\widehat{F}_1) + {}$ …
For the case that $F_2$ does not cover $u$, we have $F = F_2$, and $c(F) = {}$ …
The second statement follows from the choice of $i$. Moreover, $H_u^{(i)}(W_2) = 0$ because in instance $W_2$ the only terminal in $B$ is $u$, which can be covered by a self-loop of weight 0.
For the third statement, we use the fact that there is no critical instance at height $i - 1$ to show that there is a solution to $W_1$ with small cost.
Moreover, we consider the solutions corresponding to $H$ … In order to stitch these partial solutions together, we add extra edges with weights at most $|N_{i-1}| \cdot O(s^i)$. Hence, the total cost for the solution to (any sub-instance of) $W_1$ is at most $O(s)^{O(k)} \cdot q_0 \cdot s^i$.

Lemma 4.2 (Combining Costs of Sub-Instances). Suppose $F$ is an optimal net-respecting solution for instance $W$ of PC X. Then, for any realization of the decomposed sub-instances $W_1$ and $W_2$ as described above, there exist a (not necessarily net-respecting) solution $\widehat{F}_1$ for $W_1$ and a net-respecting solution $\widehat{F}_2$ for $W_2$ such that $(1 - \epsilon) \cdot \mathbb{E}[c_1(\widehat{F}_1)] + \mathbb{E}[c_2(\widehat{F}_2)] \le c_W(F)$, where the expectation is over the randomness used to generate $W_1$ and $W_2$. Recall that the randomness to generate $W_1$ and $W_2$ involves the random ball $B$ and the randomness used in the dynamic program to generate $F_1$, which produces instance $W_2$ and its cost function $c_2$.
Proof. Recall that the random ball B = B(u, (4 + 2λ + h) · s i ) for random h ∈ [0, 1 2 ], and denote B := B(u, (4 + 2λ + 1) · s i ), which is deterministic. For the trivial case V (F ) ∩ B = ∅, we choose F 1 := F 1 (which is the solution used to define W 2 and c 2 ) and F 2 := F . In this case, we have For the rest of the proof, we can assume that V (F ) ∩ B is non-empty. Moreover, the solution F 2 we are going to construct will always include u.
Denote V 1 := {x ∈ B | ∃y / ∈ B, {x, y} ∈ F }. We start by including F | B in F 1 , and including the remaining edges of F in F 2 . Then, we will add extra edges such that (i) in each of F 1 and F 2 , the vertices covered form a connected component and include V 1 , (ii) F 2 visits u, (iii) for PC TSP , every vertex has even degree.
Hence, all the terminals in V (F ) ∩ B are visited by F 1 and all terminals in V (F ) \ B are visited by F 2 . If we can show that these extra edges have expected total weight at most ε · E[c 1 ( F 1 )], then the lemma follows.
Define N to be the subset of N j that cover the points in B, where s j < δs i ≤ s j+1 and δ = Θ( k ). We include edges of a minimum spanning tree H of N in each of F 1 and F 2 , and make it net-respecting; for PC TSP , each edge in H can be included a constant number of times to ensure that the degree of every vertex is even. Furthermore, since V (F ) ∩ B is non-empty, even when V (F ) ∩ B is empty, it just takes one net-respecting path of length at most 2δs i to connect F 2 to H. The sum of the weights of edges added from H is at most To ensure the connectivity of both F 1 and F 2 , we add extra edges to ensure that in each of F 1 and F 2 , each point in V 1 is connected to some point in N , which is connected by edges in H.
Note that if such a point x ∈ V 1 is incident to some edge in F with weight at least s_i/4, then the net-respecting property of F implies that x is already in N . Otherwise, we need to connect x to some point in N with a net-respecting path of length at most 2δs_i; observe that this happens because some edge {x, y} in F is cut by B, which happens with probability at most O(d(x, y)/s_i). Hence, each edge {x, y} ∈ F | B has an expected contribution of δs_i · O(d(x, y)/s_i) = O(δ) · d(x, y).
Charging the Extra Costs to F 1 . Apart from using edges in F , the extra edges come from a constant number of copies of the minimum spanning tree H, and other edges with cost O(δ) · w(F | B ). We charge these extra costs to c 1 ( F 1 ).
Observe that the heuristic c(F 8 · s i , by choosing large enough q 0 . Therefore, the sum of weights of edges from H is at most O( ks ) O(k) · s i ≤ 2 · c 1 (F 1 ).
We next give an upper bound on w(F | B ), which is at most c(F Hence, by choosing small enough δ = Θ(ε/k), we can conclude that the extra edges have expected weight at most O(δ) · w(F | B ) ≤ (ε/2) · c 1 ( F 1 ). Therefore, we have shown that E[c 1 ( F 1 )] + E[c 2 ( F 2 )] ≤ c(F ) + ε · c 1 ( F 1 ), where the right hand side is a random variable. Taking expectation on both sides and rearranging gives the required result.

Revisiting Hierarchical Decomposition and Sparse Instance Frameworks for TSP-like Problems
In this section, we revisit the randomized hierarchical framework that is used in all known PTAS's (and QPTAS's) for TSP-like problems in doubling metrics [Tal04, BGK16, CJ16, CHJ16]. As mentioned in Section 1 and Remark 2.1, in the original paper [Tal04], the randomness in the underlying hierarchical decomposition is first used to bound the increase in the cost of a solution to achieve some portal-respecting property. However, once we condition on the portal-respecting property, more careful arguments are needed to handle the conditional randomness of the hierarchical decomposition.
Since this random hierarchical framework is widely used in subsequent works, we think it is worthwhile to revisit the framework and resolve any previous technical issues. In particular, in Section 5.1 we give a more precise definition and notation for cluster portals in a hierarchical decomposition. As a result of the modified definition of portals, in Section 5.2, we also revisit the analysis of the sparsity framework [BGK16] that was used to achieve the first PTAS for TSP on doubling metrics. Even though we use similar concepts in the modified framework, some arguments are quite different from previous proofs. In particular, below are highlights of the changes made in the modified framework:
• We make use of a net tree to define portals with respect to a hierarchical decomposition. As a result, we also need to modify the notion of (m, r)-lightness for a solution.
• As opposed to previous approaches [Tal04,BGK16], when a solution uses too many active portals for a cluster, our patching argument does not rely on the small MST lemma [Tal04, Lemma 6].
• After a given solution is modified to observe the portal-respecting property, any newly added edges actually depend on the randomness of the hierarchical decomposition. Hence, in order to use the randomness of the decomposition again, we give a new charging argument that, loosely speaking, maps a newly added edge back to an original edge that created it.

Randomized Hierarchical Decomposition Framework
Net Tree. Recall that given a metric space, we consider a sequence of hierarchical nets {N i } i as defined in Section 2. We define a net tree with respect to the hierarchical nets {N i } i in a way similar to [CLNS15]; for notational convenience, we assume that for all i ≤ 0, N i = X. For each height i and each u ∈ N i , there is some node (u, i) in the net tree; for notational convenience, for i < 0, (u, i) := (u, 0). The metric d can naturally be extended to nodes. Denote N i := {(u, i) | u ∈ N i }, and the tree has node set X := ∪ i N i . Notice that we use point to refer to an element in X and node to refer to an element in X. Observe that N L contains only one point r ∈ X, and the corresponding node (r, L) is the root of the net tree. The edges of the tree are defined by a parent function Par, mapping a non-root node to its parent. For i < L and u ∈ N i , define Par(u, i) := (v, i + 1), where v ∈ N i+1 is the closest point in N i+1 to u (breaking ties arbitrarily). For a point u ∈ X, define Anc j (u) ∈ N j to be the height-j ancestor of (u, 0). In this section, we assume an underlying net tree is constructed.
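The net tree described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the nets are built greedily (the construction of Section 2 may differ), the distance scale at height i is s**i, and the names `greedy_net`, `build_net_tree` and `ancestor` are hypothetical.

```python
def greedy_net(candidates, dist, radius):
    """Greedily pick points that are pairwise more than `radius` apart.

    Every candidate ends up within `radius` of some picked point.
    (Assumption: a greedy net; the paper's Section 2 may define nets differently.)
    """
    net = []
    for p in candidates:
        if all(dist(p, q) > radius for q in net):
            net.append(p)
    return net


def build_net_tree(points, dist, s, L):
    """Return (nets, par): nets[i] plays the role of N_i, par maps (u, i) to Par(u, i)."""
    nets = [list(points)]                      # N_0 = X
    for i in range(1, L + 1):
        # Nested nets: N_i is chosen among the points of N_{i-1}.
        nets.append(greedy_net(nets[i - 1], dist, s ** i))
    par = {}
    for i in range(L):
        for u in nets[i]:
            # Closest point of N_{i+1}; ties are broken by list order.
            v = min(nets[i + 1], key=lambda w: dist(u, w))
            par[(u, i)] = (v, i + 1)
    return nets, par


def ancestor(par, u, j):
    """Height-j ancestor of the node (u, 0), i.e. Anc_j(u)."""
    node = (u, 0)
    while node[1] < j:
        node = par[node]
    return node


# Example: eight points on a line with s = 2 and L = 3.
line = list(range(8))
nets, par = build_net_tree(line, lambda a, b: abs(a - b), 2, 3)
```

In this toy instance the top net contains a single point, so the node at height L acts as the root, matching the description of the net tree.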

Subgraph on Nodes.
Observe that a multi-graph G with vertex set in X naturally induces a multi-graph G with vertex set in X.
Recall that we consider multi-graphs because the solution for PC TSP needs to be Eulerian. We use the following decompositions as mentioned in [BGK16,CJ16,CHJ16].
Definition 5.1 (Single-Scale Decomposition [ABN06]). At height i, an arbitrary ordering π_i is imposed on the net N_i. Each net-point u ∈ N_i corresponds to a cluster center and samples random h_u from a truncated exponential distribution Exp_i having density function t ↦ (χ/(χ − 1)) · (ln χ / s_i) · e^{−t ln χ / s_i} for t ∈ [0, s_i], where χ = O(1)^k. Then, the cluster at u has random radius r_u := s_i + h_u. The clusters induced by N_i and the random radii form a decomposition Π_i, where a point p ∈ V belongs to the cluster with center u ∈ N_i such that u is the first point in π_i to satisfy p ∈ B(u, r_u). We say that the partition Π_i cuts a set P if P is not totally contained within a single cluster.
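As an illustration of Definition 5.1, the following Python sketch samples the truncated exponential offset by inverting its cumulative distribution function and then assigns each point to the first center (in the given order) whose ball contains it. The function names and the toy instance are hypothetical, and the net is assumed to be listed in the order π_i and to cover every point within distance s_i.

```python
import math
import random


def sample_radius_offset(s_i, chi, rng):
    """Inverse-CDF sample of h_u on [0, s_i].

    The density chi/(chi-1) * (ln chi / s_i) * exp(-t ln chi / s_i) has
    CDF chi/(chi-1) * (1 - chi**(-t/s_i)), which is inverted below.
    """
    u = rng.random()                           # uniform in [0, 1)
    return -s_i * math.log(1.0 - u * (chi - 1.0) / chi) / math.log(chi)


def single_scale_decomposition(points, net, dist, s_i, chi, rng):
    """Return a dict mapping each center to its cluster (a list of points).

    `net` must be given in the order pi_i (assumption of this sketch).
    """
    radius = {u: s_i + sample_radius_offset(s_i, chi, rng) for u in net}
    clusters = {u: [] for u in net}
    for p in points:
        for u in net:                          # first center whose ball contains p
            if dist(p, u) <= radius[u]:
                clusters[u].append(p)
                break
        else:
            raise ValueError("the net does not cover point %r" % (p,))
    return clusters


rng = random.Random(0)
offsets = [sample_radius_offset(2.0, 4.0, rng) for _ in range(1000)]
clusters = single_scale_decomposition(
    list(range(8)), [0, 3, 6], lambda a, b: abs(a - b), 2.0, 4.0, rng
)
```

Because every radius is at least s_i and the net covers all points at that distance, each point lands in exactly one cluster, so the clusters form a partition.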
The results in [ABN06] imply that the probability that a set P is cut by Π i is at most β·Diam(P )

Definition 5.2 (Hierarchical Decomposition). Given a configuration of random radii for
are induced as in Definition 5.1. At the top height L − 1, the whole space is partitioned by Π_{L−1} to form height-(L − 1) clusters. Inductively, each cluster at height i + 1 is partitioned by Π_i to form height-i clusters, until height 0 is reached. Observe that a cluster has K := O(s)^k child clusters. Hence, a set P is cut at height i iff the set P is cut by some partition Π_j such that j ≥ i; this happens with probability at most j≥i
Portals. We define portals with respect to some hierarchical decomposition. For a height-i cluster C, define its portals as {Anc_j(u) | u ∈ C}, where j satisfies s_j ≤ Θ(ε/(kL)) · s_i < s_{j+1}. Observe that the same node in X could be a portal for several clusters of the same height. However, we emphasize that a portal p is naturally associated with the cluster to which it is assigned. Hence, whenever we talk about a portal p, we implicitly mean that "p is a portal of some cluster C of height i", and say that p is a height-i portal for short. We use P_i to denote the set of height-i portals, and denote P := ∪_i P_i. Since a height-i cluster has diameter O(s_i), by the packing property, each cluster has at most m := O(kLs/ε)^k portals.
Solutions on Portals. Observe that a multi-graph G with vertex set P naturally induces a multi-graph with vertex set X (and a multi-graph with vertex set X) in the natural way. For a terminal t ∈ X, a multi-graph solution G on P visits t iff G covers (t, i) for some i.
Portal-Respecting Solution. Our algorithm works with solutions with vertex set P . A multi-graph F with vertex set in P is called portal-respecting with respect to some hierarchical decomposition, if for any edge e = {u, v} in F , where u is a portal of a height-j cluster C and v is a portal of a height-j' cluster C' with j ≥ j', it holds that
• If j = j', then C and C' have the same parent cluster;
• If j > j', then j = j' + 1 and C' is a child cluster of C.
Active Portals. Suppose F is portal-respecting (with respect to some hierarchical decomposition). Consider a portal p of a height-i cluster C that is visited by F . We say that p is an active portal if p is connected (in F ) to a height-i portal of a sibling of C, or a height-(i + 1) portal of a parent cluster of C.
(m, r)-Light Solution. A (multi-)graph F is called (m, r)-light, if it is portal-respecting for a hierarchical decomposition in which each cluster has at most m portals, and each cluster has at most r active portals.
Remark 5.1. Almost all previous works consider a solution as a subgraph with vertex set in the original metric space X. However, such solutions in the previous frameworks are implicitly induced by ones with portals P as the vertex set.
We have a unified notion of (m, r)-lightness that is the same for both PC TSP and PC STP . We next describe additional properties for a PC TSP solution that justify why our lightness notion does not count the number of times a tour visits a cluster.
Additional Structure for PC TSP . We consider additional properties of a portal-respecting solution, which is Eulerian.
Definition 5.3 (Crossing Portal Pair). A portal-respecting Eulerian tour crosses a cluster C through the (ordered) portal pair (p, q) (where p and q can be the same) if the node immediately preceding p and the node immediately succeeding q are portals of the parent or a sibling of C, and all the nodes (if any) visited from p to q are portals of (not necessarily proper) descendant clusters of C.
A portal-respecting Eulerian tour is economical if for every cluster C and every ordered portal pair (p, q), the tour crosses C through (p, q) at most once.
Definition 5.4 (Scratch and Removal). A portal-respecting Eulerian tour scratches a cluster C at portal p if the two nodes x and y that immediately go before and after p are both portals of the parent or a sibling of C. Hence, a scratch is a special case of crossing cluster C through (p, p).
Observe that the edge {x, y} is portal-respecting. Hence, if the portal p is visited in another part of the tour, the scratch at portal p of cluster C can be removed by using the shortcut {x, y}, without increasing the length of the tour.
Lemma 5.1 (Economical Tour). A portal-respecting Eulerian tour can be modified to be economical without increasing its length and still visit the same set of terminals.
Proof. Suppose the tour crosses some cluster C through the ordered pair (p, q) at least twice: E 1 , p, P 1 , q, E 2 , p, P 2 , q, E 3 , where the E's and P 's represent sequences of visited edges. Moreover, the nodes visited by the P 's are all portals of the descendant clusters of C.
Consider an alternative tour: E 1 , p, P 1 , q, rev(P 2 ), p, rev(E 2 ), q, E 3 , where rev(S) denotes the reverse of an edge sequence S. Then, observe that E 1 , p, P 1 , q, rev(P 2 ), p, rev(E 2 ) induces only one crossing of C through (p, p). Moreover, the scratch of C at q induced by rev(E 2 ), q, E 3 can be removed as in Definition 5.4 since q is visited in E 1 , p, P 1 , q, rev(P 2 ), p, rev(E 2 ).
Hence, we have replaced two crossings of C at (p, q) by one crossing of C at (p, p). Notice that the above argument holds even in the case where p = q. Moreover, the number of edges in the tour is reduced by one due to the removal of the scratch of C at q so the procedure can only be carried out a finite number of times. Using this argument repeatedly gives the result of the lemma.
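The rearrangement in the proof of Lemma 5.1 amounts to reversing the stretch of the tour strictly between the two visits to q. The Python sketch below illustrates this on a closed tour stored as a list of nodes, which is a simplification of the edge-sequence notation used in the proof; the function names and the node labels (x1, x2, y1, y2 outside C; a, b inside C) are hypothetical.

```python
from collections import Counter


def undirected_edges(tour):
    """Multiset of undirected edges of a closed tour given as a node list."""
    n = len(tour)
    return Counter(frozenset((tour[k], tour[(k + 1) % n])) for k in range(n))


def merge_two_crossings(tour, first_q, second_q):
    """Reverse the stretch strictly between two visits to the same portal q."""
    assert first_q < second_q and tour[first_q] == tour[second_q]
    middle = tour[first_q + 1:second_q]
    return tour[:first_q + 1] + middle[::-1] + tour[second_q:]


def remove_scratch(tour, index):
    """Drop the visit at `index`; its two neighbours become adjacent (the shortcut)."""
    return tour[:index] + tour[index + 1:]


# Two crossings of C through (p, q): x1 p a q | y1 y2 | p b q x2.
tour = ["x1", "p", "a", "q", "y1", "y2", "p", "b", "q", "x2"]
merged = merge_two_crossings(tour, 3, 8)
shorter = remove_scratch(merged, 8)   # the second visit to q is now a scratch
```

The reversal keeps the multiset of undirected edges unchanged, so the tour length is preserved; removing the scratch replaces two edges by one shortcut, which does not increase the length in a metric by the triangle inequality.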

Sparsity Structural Lemma
We revisit the sparsity structural lemma in [BGK16, Lemma 3.1]. On a high level, the lemma says that given a net-respecting q-sparse solution T and an appropriate hierarchical decomposition, there exists an (m, r)-light solution with appropriate parameters m and r, whose length does not increase too much.
Property of Hierarchical Decomposition. Recall that in the hierarchical decomposition, for each i and u ∈ N i , a random radius h (i) u is sampled from a truncated exponential distribution, and we define a random ball B(u, r  u ] ≥ 1 2 .
Theorem 5.1 (Sparse Structural Property). Suppose T is an optimal net-respecting solution with points in X that is q-sparse. Given any hierarchical decomposition, there is a way to transform T into an (m, r)-light solution T on portals P that visits the same terminals as T , with m := O(kLs/ε)^k and r := q · Θ(1)^k + Θ(s/ε)^k, such that where the randomness comes from the hierarchical decomposition. Furthermore, if T is Eulerian, then so is T .
Proof. Suppose some hierarchical decomposition is fixed. in T , where j satisfies s j ≤ Θ( kL ) · s i < s j+1 . Observe that every node in this path is an active portal, and we say these portals are activated by e. It is immediate that T is portal-respecting. Moreover, if T is Eulerian, then T is Eulerian as well.
For an active portal p, let f (p) be any edge in T that activates p. Observe that for any l, d(u, Anc l (u)) ≤ O(s l ). So, the weight of the path from u to Anc j (u) is at most O( kL ) · s i , and so is that from v to Anc j (v). By triangle inequality, d(Anc j (u), Therefore, the additional cost This cost occurs only if height i is the largest height at which e is cut, and it is of probability at most O(k) · w(e) s i , by Proposition 5.1. Summing this over all i, this cost is at most in expectation, conditioning on A. This completes the proof of Lemma 5.2.
Part II: (m, r)-light T . We shall define T from T , so that T is (m, r)-light. Examine each cluster C from higher height to lower height. Let r be the number of active portals of C.
• If r ≤ r, then we do nothing and proceed to the next cluster.
• Otherwise r > r. Apply the following patching procedure in Lemma 5.3 to C.
Lemma 5.3 (Patching Lemma). As defined above, T is a portal-respecting solution. Suppose C is a height-i cluster with active portal set R. Recall that by definition, an active portal is connected by an edge to a portal of the parent or a sibling cluster of C; let R be the set of such portals that portals in R connect to. Let E(R, R) be the edge set between R and R. Then, T can be modified such that the following holds.
1. The modified solution is still portal-respecting.
2. The number of active portals for any cluster is not increased.
3. There is at most one active portal of C.

4. The resulting solution has cost increased by at most O(w(E(R, R))) + O(|R|) · s i .
Proof. Let R be the active portal set of C. Observe that all portals in R are height-i or height-(i+1) portals.
2. Remove edges between R\{u} and R.
3. Patching inside. Consider the subgraph G of the original T induced on the portals of C and C's descendant clusters. This graph may also be viewed as the "inside" C part of T , after removing edges from R to R. Then, there exist at most O(|R|) edges each of length O(s i ), adding which makes G a connected (also Eulerian in the case of PC TSP ) graph. Include these edges to T .
4. Patching outside. For each active portal a ∈ R, consider the set of points R a ⊆ R that are connected to a before step 2 (where the edges are removed). Denote the removed edges between R a and a as E a . We add a minimum spanning tree on R a and then connect it to u; observe that the edges E a together with the edge {a, u} is a connected subgraph covering R a and u. Hence, the additional cost is at most O(E a + s i ).
If the original graph T is Eulerian, we can add some edges to make the resulting graph Eulerian as well. Notice that in either case, the additional cost for patching the crossing edges of a is bounded by O(E a + s i ).
We note that the resultant solution is still portal-respecting. This is because we are only deleting and adding edges between portals of sibling clusters or those of child and parent clusters which are, by definition, still portal-respecting. Hence, item 1 follows. Items 2 to 4 follow from the above procedure as well. This completes the proof of Lemma 5.3.
Lemma 5.4. Suppose T is the solution obtained after applying the patching procedure to all appropriate clusters. Then, E[w(T ) − w(T ) | A] ≤ ε · w(T ).
Proof. Observe that the weight increase of T from T is due to the patching. Consider a height-i cluster C to which the patching is applied. Then, just before the patching, C has r > r active portals. By Lemma 5.3, the increase of weight is at most O(w(E(R, R))) + O(r ) · s i . We charge this cost to the active portals. For each active portal a, let R a ⊆ R be the portals in R that are connected to a, and E a be the edges between R a and a. Then, the portal a is charged with cost O(w(E a )) + O(s i ).
We first give an upper bound for w(E a ). Observe that each node in R a is a height i or i + 1 portal and by packing property, there are at most O(s) k · m such portals. Since each edge in E a is of length at most O(s i+1 ), it follows that this part of the cost is at most O(s) k · m · s i+1 for all clusters with a being an active portal. The bound also dominates the second term O(s i ).
Hence, a height-i portal takes charge at most O(s) k · O(m) · s i+1 .
Charging Argument. We shall ultimately charge the cost to the original solution T . Observe that a height-i portal is charged only if it belongs to some cluster that is patched, so it is sufficient to charge to T whenever some cluster is patched.
Suppose C is a cluster of height-i that is patched, and R is the set of active portals before patching, where |R| > r. We shall somehow charge the cost received by its portals to T . By Lemma 5.3, the patching procedure does not introduce new active portals. Hence, all active portals come from T (actually all nodes in T are active by construction). Let R j := {p ∈ R | h(f (p)) = j}, recalling that h(e) is defined to be the largest height at which some edge e in T is cut, and f (p) is some edge in T that caused the portal p to be added in Part I to produce T .
Then, {R j } j is a partition of R. Also, it is immediate that |R j | = 0 if j < i.
and R (short) j := {p ∈ R j | w(f (p)) ≤ s j }.
Portals activated by long edges. Consider an edge e ∈ E j such that w(e) > s j . Because T is net-respecting, both endpoints of e are in N j for s j ≤ · s j < s j +1 . This implies that all active portals that e activates correspond to some points in N j .
Observing that j ≥ i, by the packing property, there are at most O( s ) k such portals in the height-i cluster C. Therefore, |R (long) j | ≤ O( s ) k .
Portals activated by short edges. Consider an edge e in E j such that w(e) ≤ s j . By definition, e is cut at height j but is not cut at height j + 1. So each such cut must be contributed by height-j clusters. Notice that at least one endpoint of e is within distance 2s i of cluster C. By the definition of short edges, the other endpoint of e is also within a distance of 2s i + s j from C. Since j ≥ i, it follows that each cluster that cuts some short edges in R (short) j is within distance 4s j of the center of C. By the packing property, there are at most O(1) k such clusters. By event A, each such cluster contributes at most O(q · k) cut edges. Since each edge can only activate at most one portal in one cluster (in Part I), we have Combining the two cases completes the proof of Lemma 5.5.
Hence, each portal in R ≥i+j still takes O(s) k ·O(m)·s i+1 , with a slightly larger hidden constant. Then, we further charge this cost for each p ∈ R ≥i+j to f (p).
Expected Charge. Finally, we use the randomness of the hierarchical decomposition to bound the expected cost.
Consider some e in T . Observe that e is charged only from a portal of height at most h(e) − j . By definition, the number of portals from each height that are activated by e is at most 2.
Therefore, e takes cost However, by definition, e takes this charge only if it is cut at height h(e), and this is with probability at most O(k) · w(e) s h(e) . Therefore, by summing the contribution for all L possible values of h(e), the expected charge that e takes conditioned on A is at most This completes the proof of Lemma 5.4.

Constant Probability Bound. Combining Lemma 5.2 and Lemma 5.4, we have E[w(T ) | A] ≤ (1 + ε) · w(T ). Observe that the optimality of T implies that w(T ) ≥ (1/(1 + ε)) · w(T ). Let B be the event that w(T ) ≤ w(T ).
If Pr[B | A] ≥ 1/2, then we are done. Otherwise, we have By Markov's Inequality, we have Then, we can bound the following probability as where the last inequality follows because Pr[B | A] ≤ 1 2 . This completes the proof of Theorem 5.1.
Observe that Theorem 5.1 assumes that some good event A related to the hierarchical decomposition happens. However, the event A is defined with respect to the unknown optimal net-respecting solution. By Proposition 5.2, it is sufficient to sample a collection of O(log n) random radii for each ball in the hierarchical decomposition. Then, with constant probability, there is a way for each ball to choose its radius from its collection to satisfy event A.
The dynamic program searches for (m, r)-light solutions with respect to all possible hierarchical decompositions obtained by choosing the radius for each ball from its collection of sampled radii. We show that this does not blow up the search space too much by first describing the information needed to identify each cluster at each height.
Information to Identify a Cluster.
1. Height i and cluster center u ∈ N_i. This has L · O(n^k) combinations, recalling that |N_i| ≤ O(n^k).
2. For each j ≥ i and each v ∈ N_j such that d(u, v) ≤ O(s_j), the random radius chosen by (v, j). Observe that the space around B(u, O(s_i)) can be cut by net-points in the same or higher heights that are nearby with respect to their distance scales. As argued in [BGK16], the number of configurations that are relevant to (u, i) is at most O(log n)^{L·O(1)^k} = n^{O(1)^k}, where L = O(log_s n) and s = (log n)^{c/k}, for some sufficiently small universal constant 0 < c < 1.
3. For each j > i, which cluster at height j (specified by the cluster center v_j ∈ N_j) contains the current cluster at height i. This has O(1)^{kL} = n^{O(k^2 / log log n)} combinations.
Since it is always possible to assign a direction to each edge of a tour, our algorithm for PC TSP works on a tour in which every edge is assigned a direction.
Let m and r be defined as in Theorem 5.1.
Entries of DP. A DP entry is identified as (C, R, P ), where each field is explained as follows.
• C denotes a cluster.
• R denotes the subset of active portals (as defined in Section 5.1) of C with |R| ≤ r.

• In PC TSP , P is a set of distinct ordered pairs (p, q) of R (where we allow p = q), such that each portal in R appears in at least one pair in P . An ordered pair (p, q) in P means that the solution tour crosses cluster C through (p, q), in the sense of Definition 5.3.
• In PC STP , P is a partition of R, where each part U in P corresponds to a connected component in the solution restricted to C that connects to portals outside the cluster via the portals in U .
Lemma 6.1 (Number of Entries). The number of entries (C, R, P ) is at most n^{O(1)^k} · O(m · 2^r)^r.
Proof. As discussed earlier, the number of clusters C is at most n^{O(1)^k}. With C fixed, R is chosen from at most m portals, with |R| ≤ r. Hence, the number of possibilities for R is at most Σ_{j ≤ r} (m choose j) ≤ O(m)^r. Finally, P has at most 2^{r^2} possibilities, in either PC X problem.
Therefore, the number of entries is at most n^{O(1)^k} · O(m · 2^r)^r.
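The counting in the proof of Lemma 6.1 can be sanity-checked numerically. The Python sketch below is only an illustration with hypothetical function names: it counts the subsets R of size at most r, the sets of ordered pairs over r portals (an upper bound on the PC TSP choices for P), and the partitions of r portals (the Bell number, relevant to PC STP).

```python
from math import comb


def count_active_sets(m, r):
    """Number of subsets of m portals with size at most r."""
    return sum(comb(m, j) for j in range(min(m, r) + 1))


def count_pair_sets(r):
    """Number of sets of ordered pairs (p, q) over r portals, p = q allowed."""
    return 2 ** (r * r)


def bell(r):
    """Number of partitions of an r-element set, via the Bell triangle."""
    row = [1]
    for _ in range(r):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]


m, r = 10, 3
num_R = count_active_sets(m, r)          # 1 + 10 + 45 + 120
num_P_tsp = count_pair_sets(r)           # 2^(r^2)
num_P_stp = bell(r)                      # partitions of 3 portals
```

For these small values the subset count stays below (m + 1)^r and the number of partitions stays below 2^{r^2}, in line with the bound O(m)^r · 2^{r^2} = O(m · 2^r)^r used in the lemma.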
Invariant: Value of an Entry. The value of an entry (C, R, P ), defined as v(C, R, P ), is the cost of the minimum cost graph F defined on portals of cluster C and its descendants, whose penalty is with respect to the terminals in C and (R, P ) gives the connectivity requirements as follows.
• If R = ∅, then the solution satisfies the same connectivity requirements as PC X restricted to the sub-instance induced by C.
• Otherwise, we have:
- For PC TSP , F can be partitioned into directed paths, such that each pair (p, q) ∈ P corresponds to a directed path from p to q in F . The special case p = q corresponds to a directed cycle containing p, where a degenerate cycle just containing p with no edges is allowed.
- For PC STP , F is a forest, where two portals in R are in the same connected component iff they are in the same part in P .
Evaluating a Subproblem. Suppose E := (C, R, P ) is an entry to be evaluated. If C is a height-0 cluster containing only one point, then it is the base case, and its value is easily computed. Otherwise, enumerate all possible configurations for C's child clusters I := {(C i , R i , P i )} C i ∈Children(C) . Then, enumerate all graphs G between the portals in R and the R i 's such that each edge connects either (i) a node from R to one in some R i , (ii) two nodes from different R i 's, or (iii) two nodes from R, where edges are directed for PC TSP . Additional edges are added among nodes within each R i to form an augmented graph G in the following way.
• For PC TSP , for each i and each pair (p, q) ∈ P i such that p ≠ q, add a directed edge (p, q) to G.
• For PC STP , for each i, for each part U in P i , edges from an arbitrary spanning tree on U are added to G.
The following procedure checks whether the graph G is consistent with (R, P ).
Consistency Checking. The graph G is consistent if all of the following are true.
1. If R ≠ ∅, then for every i, every portal in R i is connected to some portal in R in G. Otherwise, all portals in ∪ i R i are connected in G.
2. The following problem-specific condition holds.
- For PC TSP . If R = ∅, then the directed graph G is Eulerian, i.e., the in-degree of every node equals its out-degree. Otherwise, we check that G can be partitioned into directed paths specified by the pairs in P , where each pair (p, q) corresponds to a directed path from p to q in G. For p = q, this is a cycle containing p, where a degenerate cycle with an isolated p is allowed. A brute force way is to consider all permutations of the edges in G and interleave the permutation with the pairs in P .
- For PC STP . If R ≠ ∅, then G is a forest such that two portals in R are in the same connected component iff they are in the same part in P .
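Two of the checks in the consistency procedure are simple enough to sketch in Python. The code below is a partial illustration under stated assumptions, not the paper's procedure: `is_balanced` tests the degree condition (in-degree equals out-degree at every node) used for PC TSP, and `matches_partition` tests the component condition for PC STP with a union-find structure. It does not check acyclicity, the connectivity requirement of item 1, or the path decomposition for PC TSP, and all names are hypothetical.

```python
from collections import defaultdict


def is_balanced(directed_edges):
    """True iff every node has equal in-degree and out-degree."""
    balance = defaultdict(int)
    for a, b in directed_edges:
        balance[a] += 1      # edge leaves a
        balance[b] -= 1      # edge enters b
    return all(v == 0 for v in balance.values())


def component_labels(nodes, edges):
    """Map each node to a representative of its connected component (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return {v: find(v) for v in nodes}


def matches_partition(nodes, edges, R, P):
    """True iff two portals of R share a component exactly when they share a part of P."""
    comp = component_labels(nodes, edges)
    part_of = {p: k for k, part in enumerate(P) for p in part}
    return all(
        (comp[a] == comp[b]) == (part_of[a] == part_of[b])
        for a in R for b in R
    )


nodes = ["p", "q", "r", "x"]
edges = [("p", "x"), ("x", "q")]
ok = matches_partition(nodes, edges, ["p", "q", "r"], [{"p", "q"}, {"r"}])
bad = matches_partition(nodes, edges, ["p", "q", "r"], [{"p"}, {"q"}, {"r"}])
```

Both checks run in time nearly linear in the size of the graph, so they do not dominate the brute-force path-decomposition test mentioned for PC TSP.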
If it is consistent, then the configuration (I, G) is a candidate configuration for entry E = (C, R, P ). The value of a candidate configuration is defined below, and this value is a candidate value for E. The final value for E is the minimum over all candidate values.
Evaluating a Candidate Value:
• If R = ∅, the candidate value is min{ w(G) + Σ_{i: R_i ≠ ∅} v(C_i, R_i, P_i) + Σ_{i: R_i = ∅} π(C_i), min_{i: R_i = ∅} { v(C_i, R_i, P_i) + π(C \ C_i) } }, where w(G) is the weight of edges in G.
• Otherwise, the candidate value is w(G) + Σ_{i: R_i ≠ ∅} v(C_i, R_i, P_i) + Σ_{i: R_i = ∅} π(C_i).
Final Solution. The final solution corresponds to the entry (C L , ∅, ∅), where C L is the only height-L cluster. It is easy to check that the value defined in this way satisfies the invariant. Moreover, a solution may be constructed from the values of entries.
Lemma 6.2 (Running Time). The time complexity of the DP is n^{O(1)^k} · exp(√(log n) · O(k/ε)^{O(k)}).
Proof. Recall that the algorithm first enumerates an entry E := (C, R, P ), and then enumerate possible configurations of child entries I := {(C i , R i , P i )}, and a graph G on R and the R i 's. As in Lemma 6.1, the number of entries E is at most n O(1) k · O(mr) r . Suppose E is fixed, and suppose C is of height-i. We shall upper bound the number of child configurations I. Observe that there are at most O(s) k child clusters. As noted in [BGK16], the child clusters have to be consistent with C on all heights at least i. Therefore, only radii on height-(i−1) are not fixed, and this implies the number Since G has at most r · O(s) k vertices, and G is a simple directed graph, the number of G is at most 2 O(r)·O(s) k .
For PC TSP , a brute force way to check consistency between G and (R, P ) takes time at most In conclusion, the time complexity is n O(1) k · O(mr) r 2 ·O(s) O(k) . Plugging in m and r defined in Theorem 5.1 and the value of q 0 , the time complexity is where the inequality is by choosing a small enough c in s := (log n) c k .