Approximation algorithms for covering vertices by long paths

Given a graph, the general problem of covering the maximum number of vertices by a collection of vertex-disjoint long paths has seemingly escaped the literature. A path containing at least $k$ vertices is considered long. When $k \le 3$, the problem is polynomial-time solvable; when $k$ is the total number of vertices, the problem reduces to the Hamiltonian path problem, which is NP-complete. For a fixed $k \ge 4$, the problem is NP-hard, and the best known approximation algorithm for the weighted set packing problem implies a $k$-approximation algorithm. To the best of our knowledge, there is no approximation algorithm directly designed for the general problem; when $k = 4$, the problem admits a $4$-approximation algorithm, which was presented recently. We propose the first $(0.4394 k + O(1))$-approximation algorithm for the general problem and an improved $2$-approximation algorithm when $k = 4$. Both algorithms are based on local improvement; their theoretical performance is analyzed via amortization, and their practical performance is examined through simulation studies.


Introduction
Path Cover (PC) is one of the most well-known NP-hard optimization problems in algorithmic graph theory [11]: given a simple undirected graph G = (V, E), one wishes to find a minimum collection of vertex-disjoint paths that cover all the vertices, that is, every vertex of V is on one of the paths. It has numerous real-life applications, such as in transportation networks, communication networks and networking security. In particular, it includes the Hamiltonian path problem [11] as a special case, which asks for the existence of a single path covering all the vertices. The Hamiltonian path problem is NP-complete; therefore, the PC problem cannot be approximated within ratio 2 if P ≠ NP. In fact, to the best of our knowledge, there is no o(|V|)-approximation algorithm for the PC problem. In the literature, several alternative objective functions have been proposed and studied [3,1,18,2,19,4,12,5]. For example, Berman and Karpinski [3] tried to maximize the number of edges on the paths in a path cover, which is equal to |V| minus the number of paths, and proposed a 7/6-approximation algorithm. Chen et al. [4,5] showed that finding a path cover with the minimum total number of order-1 and order-2 paths (i.e., singletons and edges, where the order of a path is the number of vertices on the path) can be done in polynomial time, but it is NP-hard to find a path cover with the minimum number of paths of order at most ℓ when ℓ ≥ 3.
Recently, Kobayashi et al. [15] generalized the problem studied by Chen et al. [4,5] by assigning to each order-ℓ path a weight f(ℓ) representing its profit or cost, with the goal of finding a path cover of the maximum weight or the minimum weight, respectively. For instance, when f(ℓ) = 1 for any ℓ ≤ k and f(ℓ) = 0 for any ℓ ≥ k + 1, where k is a fixed integer, the minimization problem reduces to the problem studied by Chen et al. [4,5]; when f(ℓ) = 1 for any ℓ ≤ k and f(ℓ) = +∞ for any ℓ ≥ k + 1, the minimization problem is the so-called k-path partition problem [20,16,8,7,9,6]; when f(ℓ) = 0 for any ℓ ≤ |V| − 1 but f(|V|) ≠ 0, the maximization problem reduces to the Hamiltonian path problem.
Given an integer k ≥ 4, in the special case where f(ℓ) = 0 for any ℓ < k but f(ℓ) = ℓ for any ℓ ≥ k, the maximization problem can be rephrased as finding a set of vertex-disjoint paths of order at least k that covers the most vertices, denoted as MaxP k+ PC. The MaxP 4+ PC problem (i.e., k = 4) is complementary to finding a path cover with the minimum number of paths of order at most 3 [4,5], and thus it is NP-hard. Kobayashi et al. [15] presented a 4-approximation algorithm for MaxP 4+ PC that greedily adds an order-4 path or extends an existing path to the longest possible.
For a fixed integer k ≥ 4, the MaxP k+ PC problem is NP-hard too [15]; to the best of our knowledge, there is no approximation algorithm designed directly for it. Nevertheless, the MaxP k+ PC problem can be cast as a special case of the Maximum Weighted (2k − 1)-Set Packing problem [11], by constructing a set of ℓ vertices whenever they are traceable (that is, they can be formed into a path) in the given graph and assigning it the weight ℓ, for every ℓ = k, k + 1, . . . , 2k − 1. (This upper bound 2k − 1 will become clear in the next section.) The Maximum Weighted (2k − 1)-Set Packing problem is APX-complete [13] and the best known approximation guarantee is k − 1/63,700,992 + ϵ for any ϵ > 0 [17]. In this paper, we study the MaxP k+ PC problem from the approximation algorithm perspective. The problem and its close variants have many motivating real-life applications in the design of various (communication, routing, transportation, optical, etc.) networks [14]. For example, when a local government plans to upgrade its subway infrastructure, the given map of rail tracks is to be decomposed into multiple disjoint lines of stations, each of which will be taken care of by a team of workers. Besides being disjoint, so that while some lines are under construction the other lines can function properly, each line is expected to be long enough for the team to work on continuously during a shift without wasting time and effort moving themselves and materials from one point to another. Viewing the map as a graph, the goal of planning is to find a collection of vertex-disjoint long paths to cover the most vertices (and of course, possibly under some other real traffic constraints).
We contribute two approximation algorithms for the MaxP k+ PC problem, the first of which is a (0.4394k + O(1))-approximation algorithm for any fixed integer k ≥ 4, denoted as Approx1. We note that Approx1 is the first approximation algorithm directly designed for the MaxP k+ PC problem. It is a local improvement algorithm that iteratively applies one of three operations, addition, replacement and double-replacement, each of which takes O(|V|^k) time and covers at least one more vertex. While the addition and the replacement operations have appeared in the 4-approximation algorithm for the MaxP 4+ PC problem [15], the double-replacement operation is novel and it replaces one existing path in the current path collection with two new paths. At termination, that is, when none of the local improvement operations is applicable, we show by an amortization scheme that each path P in the computed solution is attributed with at most ρ(k)n(P) vertices covered in an optimal solution, where n(P) denotes the order of the path P and ρ(k) ≤ 0.4394k + 0.6576 for any k ≥ 4.
The second algorithm, denoted as Approx2, runs in O(n^8) time and is for the MaxP 4+ PC problem. Besides the three operations in Approx1, we design two additional operations, re-cover and look-ahead. The re-cover operation aims to increase the number of 4-paths in the solution, and the look-ahead operation covers at least one more vertex by trying multiple paths equivalent to an existing path in the current solution in order to execute a replacement operation. With these two additional local improvement operations, we design a refined amortization scheme to show that, on average, each vertex covered in the computed solution is attributed with at most two vertices covered in an optimal solution. That is, Approx2 is a 2-approximation algorithm for the MaxP 4+ PC problem. We also show a lower bound of 16/9 on the worst-case performance ratio of Approx2.
The rest of the paper is organized as follows. In Section 2, we introduce the basic notations and definitions. Section 3 is devoted to the MaxP k+ PC problem, where we present the Approx1 algorithm and its performance analysis. In Section 4, we present the Approx2 algorithm for the MaxP 4+ PC problem, and outline the performance analysis. Due to the space limit, we provide most of the performance analyses for both algorithms here, and leave some technical details to the full version of the paper. We conclude the paper in the last section with some possible future work.

Preliminaries
For a fixed integer k ≥ 4, in the MaxP k+ PC problem, we are given a simple undirected graph and want to find a collection of vertex-disjoint paths of order at least k to cover the maximum number of vertices. We consider simple undirected graphs in this paper and we fix a graph G for discussion. Let V(G) and E(G) denote the vertex set and the edge set of the graph G, respectively. We simplify V(G) and E(G) as V and E, respectively, when the underlying graph is clear from the context. We use n(G) to denote the order of G, that is, n = |V| is the number of vertices in the graph. A subgraph S of G is a graph such that V(S) ⊆ V(G) and E(S) ⊆ E(G); and likewise, n(S) = |V(S)| denotes its order. Given a subset of vertices R ⊆ V, the subgraph of G induced on R is denoted as G[R], of which the vertex set is R and the edge set contains all the edges of E each connecting two vertices of R. A (simple) path P in G is a subgraph of which the vertices can be ordered into (v_1, v_2, . . . , v_{n(P)}) such that E(P) = {{v_i, v_{i+1}} : i = 1, 2, . . . , n(P) − 1}. A path of order ℓ is called an ℓ-path (also often called a length-(ℓ − 1) path in the literature).
In this paper we are most interested in paths of order at least 4. In the sequel, given an ℓ-path P with ℓ ≥ 4, we let u_j denote the vertex of P at distance j from one ending vertex of P, for 0 ≤ j ≤ ⌈ℓ/2⌉ − 1, and v_j denote the vertex of P at distance j from the other ending vertex, for 0 ≤ j ≤ ⌊ℓ/2⌋ − 1. When ℓ is odd, the center vertex of the path is u_{(ℓ−1)/2}.
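The labeling convention above can be made concrete with a small sketch. The function name `label_path` and the list-based path representation are illustrative assumptions, not part of the paper.

```python
def label_path(vertices):
    """Split an ordered path into head labels u_0, u_1, ... and tail labels v_0, v_1, ...

    u_j is the vertex at distance j from the first endpoint, 0 <= j <= ceil(l/2) - 1;
    v_j is the vertex at distance j from the last endpoint, 0 <= j <= floor(l/2) - 1.
    """
    l = len(vertices)
    u = vertices[: (l + 1) // 2]      # ceil(l / 2) head labels
    v = vertices[::-1][: l // 2]      # floor(l / 2) tail labels
    return u, v


# On a 5-path the head side has three labels, and u_2 is the center vertex.
u, v = label_path(["a", "b", "c", "d", "e"])
print(u, v)
```

Every vertex receives exactly one label, so the two lists together always partition the path.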

Given another path Q, let Q − P denote the subgraph of Q obtained by removing those vertices in V(P), and the edges of E(Q) incident at them, from Q. Clearly, if V(P) ∩ V(Q) = ∅, then Q − P = Q; otherwise, Q − P is a collection of sub-paths of Q, each of which has at least one endpoint that is adjacent to some vertex on P through an edge of E(Q). A collection P of vertex-disjoint paths is also a subgraph of G, with its vertex set V(P) = ∪_{P ∈ P} V(P) and edge set E(P) = ∪_{P ∈ P} E(P). We similarly define Q − P to be the collection of sub-paths of Q after removing those vertices in V(P) from V(Q), together with the edges of E(Q) incident at them. Furthermore, for another collection Q of vertex-disjoint paths, we can define Q − P analogously, that is, Q − P is the collection of sub-paths of the paths in Q after removing those vertices in V(P) from V(Q), together with the edges of E(Q) incident at them.
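As a minimal sketch of the Q − P construction for a single path Q, assuming a path is stored as an ordered list of vertices and V(P) as a set (the name `subtract` is hypothetical):

```python
def subtract(q_path, covered):
    """Return the sub-paths of q_path that remain after deleting the vertices in covered."""
    pieces, current = [], []
    for x in q_path:
        if x in covered:
            # A removed vertex closes the sub-path collected so far.
            if current:
                pieces.append(current)
            current = []
        else:
            current.append(x)
    if current:
        pieces.append(current)
    return pieces


# Removing vertices 3 and 6 from the path 1-2-...-7 leaves three sub-paths.
print(subtract([1, 2, 3, 4, 5, 6, 7], {3, 6}))
```

When no vertex of the path is removed, the function returns the path itself as the only piece, matching Q − P = Q.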

▶ Definition 1 (Associatedness). Given two collections
One sees that a path S of Q − P can be associated with zero to two vertices in V (P) ∩ V (Q), and conversely, a vertex of V (P) ∩ V (Q) can be associated with zero to two paths in Q − P.
If the paths of the collection P all have order at least k, then the vertices of V(P) are said to be covered by the paths of P, or simply by P.
Note that there could be many extensions at v, and we use n(v) = max_{e(v)} n(e(v)) to denote the order of the longest extensions at the vertex v.
▶ Lemma 2. Given two collections P and Q of vertex-disjoint paths in the graph G, for every vertex v ∈ V (P), any path of Q − P associated with v has order at most n(v).
Proof. The lemma holds since Q − P is a subgraph of the induced subgraph G[V − V(P)], that is, every path of Q − P is a path in G[V − V(P)], and the associatedness (see Definition 1) is a special adjacency through an edge of E(Q). ◀
Our goal is to compute a collection of vertex-disjoint paths of order at least k, such that it covers the most vertices. In our local improvement algorithms below, we start with the empty collection P = ∅ to iteratively expand V(P) through one of a few operations, to be defined later. Notice that for an ℓ-path with ℓ ≥ 2k, one can break it into a k-path and an (ℓ − k)-path by deleting an edge. Since they cover the same vertices, we assume without loss of generality hereafter that any collection P inside our algorithms contains vertex-disjoint paths of order in between k and 2k − 1, inclusive.
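The normalization to orders between k and 2k − 1 can be sketched as repeated edge deletion. The function name `split_long_path` and the list representation of a path are assumptions made for illustration.

```python
def split_long_path(path, k):
    """Break a path of order >= k into pieces whose orders lie in [k, 2k - 1]."""
    if len(path) < k:
        raise ValueError("the path must have order at least k")
    pieces = []
    # While the remainder has order >= 2k, cut off a k-path by deleting one edge.
    while len(path) >= 2 * k:
        pieces.append(path[:k])
        path = path[k:]
    pieces.append(path)
    return pieces


# A 13-path with k = 4 becomes two 4-paths and one 5-path.
print([len(p) for p in split_long_path(list(range(13)), 4)])
```

The pieces cover exactly the vertices of the input path, so the number of covered vertices is unchanged.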

A (0.4394k + 0.6576)-approximation algorithm for MaxP k+ PC
For a given integer k ≥ 4, the best known approximation algorithm for the Maximum Weighted (2k − 1)-Set Packing problem leads to an O(n^{2k−1})-time (k − 1/63,700,992 + ϵ)-approximation algorithm for the MaxP k+ PC problem, for any ϵ > 0 [17]. In this section, we define three local improvement operations for our algorithm for the MaxP k+ PC problem, denoted as Approx1. We show later that its time complexity is O(n^{k+1}) and its approximation ratio is at most 0.4394k + 0.6576.
For the current path collection P, if there is a path covering k vertices outside of V(P), then the following operation adds the k-path into P.
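To get a feel for the gap between the two guarantees, the following sketch evaluates both bounds for a few values of k. The function names are hypothetical, and the arbitrarily small ϵ of the set packing bound is dropped.

```python
def approx1_ratio_bound(k):
    """Upper bound 0.4394k + 0.6576 on the ratio of Approx1, as stated in the text."""
    return 0.4394 * k + 0.6576


def set_packing_ratio(k):
    """Ratio k - 1/63,700,992 implied by weighted set packing (epsilon omitted)."""
    return k - 1 / 63_700_992


for k in (4, 5, 8, 16):
    print(k, round(approx1_ratio_bound(k), 4), round(set_packing_ratio(k), 4))
```

For every k ≥ 4 the first bound is the smaller one, since 0.4394k + 0.6576 < k whenever k ≥ 2.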
▶ Operation 3. For a k-path P in the induced subgraph G[V − V (P)], the Add(P ) operation adds P to P.
Since finding a k-path in the induced subgraph G[V − V(P)], for any P, can be done in O(n^k) time, determining whether or not an addition operation is applicable, and if so then applying it, can be done in O(n^k) time too. Such an operation increases |V(P)| by k.
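A brute-force search in the spirit of the O(n^k) bound can be sketched as follows. The adjacency-dictionary representation and the name `find_k_path` are assumptions; the paper does not prescribe a particular search procedure.

```python
def find_k_path(adj, covered, k):
    """Return k uncovered vertices forming a path in G[V - covered], or None."""

    def extend(path, used):
        if len(path) == k:
            return path
        # Try every uncovered, unused neighbor of the current endpoint.
        for w in adj[path[-1]]:
            if w not in covered and w not in used:
                found = extend(path + [w], used | {w})
                if found:
                    return found
        return None

    for start in adj:
        if start not in covered:
            found = extend([start], {start})
            if found:
                return found
    return None


# The path graph 0-1-2-3-4 contains a 4-path, but not once vertex 2 is covered.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(find_k_path(adj, set(), 4) is not None, find_k_path(adj, {2}, 4))
```

An Add operation would append the returned path to the current collection and mark its k vertices as covered.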
Recall that a path P ∈ P is represented as u_0-u_1-· · ·-v_1-v_0. Though it is undirected, we may regard u_0 as the head vertex of the path and v_0 as the tail vertex for convenience. The next operation seeks to extend a path of P by replacing a prefix (or a suffix) with a longer one.
▶ Operation 4. For a path P ∈ P such that there is an index t and an extension e(u_t) with n(e(u_t)) ≥ t + 1, the Rep(P) operation replaces P by the path e(u_t)-u_t-· · ·-v_1-v_0.
Similarly, one sees that finding an extension e(u_t) (of order at most k − 1, or otherwise an Add operation is applicable) in the induced subgraph G[V − V(P)], for any vertex u_t ∈ P ∈ P, can be done in O(n^{k−1}) time. Therefore, determining whether or not a prefix or a suffix replacement operation is applicable, and if so then applying it, can be done in O(n^k) time. Note that such an operation increases |V(P)| by at least 1.
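The prefix replacement can be sketched as follows, under the assumption that an extension at a vertex v is a path of uncovered vertices with one endpoint adjacent to v. The names `find_extension` and `rep_prefix`, and the adjacency-dictionary representation, are hypothetical.

```python
def find_extension(adj, covered, v, min_order):
    """Return an uncovered path of order >= min_order whose first vertex is adjacent to v."""

    def grow(path, used):
        if len(path) >= min_order:
            return path
        for w in adj[path[-1]]:
            if w not in covered and w not in used:
                found = grow(path + [w], used | {w})
                if found:
                    return found
        return None

    for first in adj[v]:
        if first not in covered:
            found = grow([first], {first})
            if found:
                return found
    return None


def rep_prefix(adj, covered, path, t):
    """Swap the t-vertex prefix before path[t] for an extension of order >= t + 1."""
    ext = find_extension(adj, covered, path[t], t + 1)
    if ext is None:
        return None
    # Reverse the extension so that it ends at the vertex adjacent to path[t].
    return ext[::-1] + path[t:]


# The path 0-1-2-3 loses the prefix {0} and gains the extension 4-5 attached at vertex 1.
adj = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1, 5}, 5: {4}}
print(rep_prefix(adj, {0, 1, 2, 3}, [0, 1, 2, 3], 1))
```

Here `covered` must contain all vertices of the current collection, including those of the path being modified; the suffix case is symmetric after reversing the path.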
The third operation tries to use a prefix and a non-overlapping suffix of a path in P to grow them into two separate paths of order at least k.
▶ Operation 5. For a path P ∈ P such that (i) there are two indices t and j with j ≥ t + 1 and two vertex-disjoint extensions e(u_t) and e(u_j) with n(e(u_t)) ≥ k − (t + 1) and n(e(u_j)) ≥ k − (n(P) − j), or (ii) there are two indices t and j and two vertex-disjoint extensions e(u_t) and e(v_j) with n(e(u_t)) ≥ k − (t + 1) and n(e(v_j)) ≥ k − (j + 1), the DoubleRep(P) operation replaces P by two new paths P_1 and P_2, which extend the prefix ending at u_t and the suffix starting at u_j (or v_j, respectively) by the corresponding extensions.
Note that finding an extension e(u_t) (of order at most t, or otherwise a Rep operation is applicable) can be limited to those indices t ≥ (k − 1)/2. Furthermore, we only need to find an extension e(u_t) of order at most (k − 1)/2 (equal to (k − 1)/2 only if t = (k − 1)/2). For the same reason, we only need to find an extension e(v_j) of order at most (k − 1)/2 for those indices j ≥ (k − 1)/2 (equal to (k − 1)/2 only if j = (k − 1)/2). Since k ≤ n(P) ≤ 2k − 1 for any P ∈ P, we only need to find an extension e(u_j) of order at most (k − 1)/2 too (equal to (k − 1)/2 only if j = (n(P) − 1)/2 and n(P) = k). In summary, finding the two vertex-disjoint extensions e(u_t) and e(u_j), or e(u_t) and e(v_j), in the induced subgraph G[V − V(P)] can be done in O(n^{k−1}) time (in Θ(n^{k−1}) for at most 2 pairs of t and j). It follows that determining whether or not a double replacement operation is applicable, and if so then applying it, can be done in O(n^k) time. Also, such an operation increases |V(P)| by at least 1 as the total number of vertices covered by the two new paths P_1 and P_2 is at least 2k. We summarize the above observations on the three operations into the following lemma.
▶ Lemma 6. Given a collection P of vertex-disjoint paths of order in between k and 2k − 1, determining whether or not one of the three operations Add, Rep and DoubleRep is applicable, and if so then applying it, can be done in O(n^k) time. Each operation increases |V(P)| by at least 1.
Given a graph G = (V, E), our approximation algorithm for the MaxP k+ PC problem, denoted as Approx1, is iterative. It starts with the empty collection P = ∅; in each iteration, it determines whether any one of the three operations Add, Rep and DoubleRep is applicable, and if so then it applies the operation to update P. During the entire process, P is maintained to be a collection of vertex-disjoint paths of order in between k and 2k − 1. The algorithm terminates if none of the three operations is applicable for the current P, and returns it as the solution. A simple high level description of the algorithm Approx1 is given in Algorithm 1. From Lemma 6, we see that each operation improves the collection P to cover at least one more vertex, so there are at most n iterations, each taking O(n^k) time. Therefore, the overall running time of Approx1 is in O(n^{k+1}).

Algorithm 1 Approx1 (high level description).
Input: A graph G = (V, E);
1. initialize P = ∅;
2. while (one of the operations Add, Rep, DoubleRep is applicable)
   2.1 apply the operation to update P;
   2.2 break any path of order 2k or above into two paths, one of which is a k-path;
3. return the final P.
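The control flow of Algorithm 1 can be sketched as a generic driver that takes the operations as callables. Only a brute-force Add operation is supplied here; Rep and DoubleRep are deliberately omitted, so this is a simplified illustration under stated assumptions rather than Approx1 itself. All names are hypothetical.

```python
def split_path(path, k):
    """Step 2.2: keep every path order between k and 2k - 1."""
    pieces = []
    while len(path) >= 2 * k:
        pieces.append(path[:k])
        path = path[k:]
    pieces.append(path)
    return pieces


def make_add(adj, k):
    """Build an Add operation: it returns the updated collection, or None if inapplicable."""

    def find(path, blocked):
        if len(path) == k:
            return path
        for w in adj[path[-1]]:
            if w not in blocked:
                found = find(path + [w], blocked | {w})
                if found:
                    return found
        return None

    def add(paths):
        covered = {x for p in paths for x in p}
        for start in adj:
            if start not in covered:
                found = find([start], covered | {start})
                if found:
                    return paths + [found]
        return None

    return add


def local_improvement(k, operations):
    """Apply the first applicable operation until none applies (steps 1 to 3)."""
    paths = []
    while True:
        for op in operations:
            updated = op(paths)
            if updated is not None:
                paths = [piece for p in updated for piece in split_path(p, k)]
                break
        else:
            return paths


# Path graph on 9 vertices with k = 4, using Add only.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 8} for i in range(9)}
solution = local_improvement(4, [make_add(adj, 4)])
print(solution)
```

The loop terminates as long as every supplied operation strictly increases the number of covered vertices, which is the property the three operations of the text are shown to have.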
Below we fix P to denote the collection of paths returned by our algorithm Approx1. The next three lemmas summarize the structural properties of P, which are useful in the performance analysis.
Proof. The lemma holds due to the termination condition of the algorithm Approx1, since otherwise an Add operation is applicable. ◀
▶ Lemma 8. For any path P ∈ P, n(u_j) ≤ j and n(v_j) ≤ j for any index j.
Proof. The lemma holds due to the termination condition of the algorithm Approx1, since otherwise a Rep operation is applicable. ◀
▶ Lemma 9. Suppose there is a vertex u_t on a path P ∈ P and an extension e(u_t) with n(e(u_t)) ≥ k − t − 1. Then, (i) for any vertex u_j with t + 1 ≤ j ≤ k − 2, every extension e(u_j) vertex-disjoint to e(u_t) has order n(e(u_j)) ≤ k − j − 2; (ii) for any index j, every extension e(v_j) vertex-disjoint to e(u_t) has order n(e(v_j)) ≤ k − j − 2.
Proof. The lemma holds due to the termination condition of the algorithm Approx1. First, for any vertex u_j with j ≥ t + 1, an extension e(u_j) vertex-disjoint to e(u_t) has order n(e(u_j)) ≤ k − (n(P) − j) − 1 since otherwise a DoubleRep operation is applicable. Using the fact that n(P) ≥ 2j + 1, we obtain n(e(u_j)) ≤ k − j − 2. Next, similarly, for any index j, a vertex-disjoint extension e(v_j) to e(u_t) has order n(e(v_j)) ≤ k − j − 2 since otherwise a DoubleRep operation is applicable. ◀
We next examine the performance of the algorithm Approx1. We fix Q to denote an optimal collection of vertex-disjoint paths of order at least k that covers the most vertices. We apply an amortization scheme to assign the vertices of V(Q) to the vertices of V(P) ∩ V(Q). We will show that, using the structural properties of P in Lemmas 7-9, the average number of vertices received by a vertex of V(P) is upper bounded by ρ(k), which is the approximation ratio of Approx1.
In the amortization scheme, we assign the vertices of V(Q) to the vertices of V(P) ∩ V(Q) as follows. Firstly, assign each vertex of V(Q) ∩ V(P) to itself. Next, recall that Q − P is the collection of sub-paths of the paths of Q after removing those vertices in V(Q) ∩ V(P). By Lemma 7, each path S of Q − P is associated with one or two vertices in V(Q) ∩ V(P). If the path S is associated with only one vertex v of V(Q) ∩ V(P), then all the vertices on S are assigned to the vertex v. If the path S is associated with two vertices v_1 and v_2 of V(Q) ∩ V(P), then a half of the vertices on S are assigned to each of the two vertices v_1 and v_2. One sees that in the amortization scheme, all the vertices of V(Q) are assigned to the vertices of V(P) ∩ V(Q); conversely, each vertex of V(P) ∩ V(Q) receives itself, plus some fraction of or all the vertices on one or two paths of Q − P. (We remark that the vertices of V(P) − V(Q), if any, receive nothing.)
▶ Lemma 10. For any vertex u_j on a path P ∈ P with j ≤ k − 2, if n(u_j) ≤ k − j − 2, then u_j receives at most (3/2) min{j, k − j − 2} + 1 vertices.
Proof. By Lemma 8, we have n(u_j) ≤ j. Therefore, n(u_j) ≤ min{j, k − j − 2}. By Lemma 2, any path of Q − P associated with u_j contains at most min{j, k − j − 2} vertices.
If there is at most one path of Q − P associated with u_j, then the lemma is proved. Consider the remaining case where there are two paths of Q − P associated with u_j. Since 2 min{j, k − j − 2} + 1 ≤ j + (k − j − 2) + 1 = k − 1, while the path Q ∈ Q containing the vertex u_j has order at least k, we conclude that one of these two paths of Q − P is associated with another vertex of V(Q) ∩ V(P). It follows from the amortization scheme that u_j receives at most (3/2) min{j, k − j − 2} + 1 vertices: itself, all the vertices of one path, and a half of the vertices of the other. ◀
where mod is the modulo operation.
Proof. The formula can be directly validated by distinguishing the two cases where k is even or odd, and using the fact that min{j, k − j − 2} = j if and only if j ≤ ⌊k/2⌋ − 1.
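The amortization scheme defined earlier in this section can be sketched in a few lines. Paths are assumed to be ordered vertex lists, and the name `amortize` is hypothetical; exact halves are kept with `fractions.Fraction`.

```python
from fractions import Fraction


def amortize(p_paths, q_paths):
    """Map each vertex of V(P) ∩ V(Q) to the number of vertices of V(Q) it receives."""
    covered = {x for p in p_paths for x in p}
    received = {}
    for q in q_paths:
        i = 0
        while i < len(q):
            if q[i] in covered:
                # A vertex of V(P) ∩ V(Q) is assigned to itself.
                received[q[i]] = received.get(q[i], Fraction(0)) + 1
                i += 1
                continue
            j = i
            while j < len(q) and q[j] not in covered:
                j += 1
            # q[i:j] is a path S of Q - P; its neighbors on q are the associated vertices.
            anchors = []
            if i > 0:
                anchors.append(q[i - 1])
            if j < len(q):
                anchors.append(q[j])
            # One anchor receives all of S, two anchors receive a half each.
            # (No anchor can occur only if Q misses V(P) entirely; nothing is assigned then.)
            for a in anchors:
                received[a] = received.get(a, Fraction(0)) + Fraction(j - i, len(anchors))
            i = j
    return received


# P is the single 4-path 1-2-3-4; Q is one 7-path meeting it at vertices 2 and 3.
shares = amortize([[1, 2, 3, 4]], [[10, 11, 2, 12, 13, 3, 14]])
print(shares)
```

In this toy instance the shares add up to the seven vertices of Q, while vertices 1 and 4, which lie in V(P) − V(Q), receive nothing.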
Proof. Recall that all the vertices of V(Q) are assigned to the vertices of V(Q) ∩ V(P), through our amortization scheme. Below we estimate for any path P ∈ P the total number of vertices received by the vertices of V(P) ∩ V(Q), denoted as r(P), and we will show that r(P) ≤ ρ(k) n(P).
We fix a path P ∈ P for discussion. If it exists, we let t denote the smallest index j such that the vertex u_j on the path P is associated with a path e(u_j) of Q − P with order n(e(u_j)) ≥ k − j − 1. Note that if necessary we may rename the vertices on P, so that the non-existence of t implies any path of Q − P associated with the vertex u_j or v_j has order at most k − j − 2, for any index j. Furthermore, by Lemma 9, if t exists, then any path of Q − P, except e(u_t), associated with the vertex u_j or v_j has order at most k − j − 2, for any index j ≠ t. We remark that e(u_t) could be associated with another vertex on the path P. When n(P) = 2k − 1, t exists and t = k − 1.
We distinguish three cases for n(P) based on its parity and on whether it reaches the maximum value 2k − 1. Due to the space limit, below we only discuss the first case in detail.
Case 1. n(P) = 2s + 1
If the index t does not exist, that is, any path of Q − P associated with the vertex u_j or v_j has order at most k − j − 2, for any index j, then by Lemma 10 each of the vertices u_j and v_j receives at most (3/2) min{j, k − j − 2} + 1 vertices. Hence we have
If the index t exists, then by Lemmas 8 and 2, n(e(u_t)) ≤ t and thus the vertex u_t receives at most 2t + 1 vertices. Note that if e(u_t) is associated with another vertex on the path P, then we count all the vertices of e(u_t) towards u_t but none to the other vertex (that is, we could overestimate r(P)). It follows from Lemma 10, the above Eq. (1), t ≤ s, and s ≥ (k − 1)/2
Combining Eqs. (1,2), and by Lemma 11, in Case 1 we always have
Therefore, using n(P) = 2s + 1 we have
where the upper bound ρ(k) is achieved when n(P) ≈ √2 · k.

A 2-approximation algorithm for MaxP 4+ PC
One sees that the algorithm Approx1 is an O(n^5)-time 2.4150-approximation algorithm for the MaxP 4+ PC problem, which improves the previous best O(n^5)-time 4-approximation algorithm proposed in [15] and the O(n^7)-time (4 − 1/63,700,992 + ϵ)-approximation algorithm implied by [17]. In the amortized analysis for Approx1, when a path of Q − P is associated with two vertices of V(Q) ∩ V(P), one half of the vertices on this path is assigned to each of these two vertices. We will show that when k = 4, that is, for the MaxP 4+ PC problem, the assignment can be done slightly better by observing where these two vertices are on the paths of P and then assigning vertices accordingly. For this purpose, we will need to refine the algorithm using two more local improvement operations, besides the three Add, Rep and DoubleRep operations. We denote our algorithm for the MaxP 4+ PC problem as Approx2.
We again use P to denote the path collection computed by Approx2. Since Approx2 employs the three Add, Rep and DoubleRep operations, none of which is applicable at termination, the structural properties stated in Lemmas 7-9 continue to hold, and we summarize them specifically using k = 4. We also fix Q to denote an optimal collection of vertex-disjoint paths of order at least 4 that covers the most vertices. Using Lemma 13, we may attach a longest possible extension in Q − P to every vertex of a path of P, giving rise to the worst cases illustrated in Figure 1, with respect to the order of the path. Since a vertex of a path P ∈ P can be associated with up to two paths in Q − P, the worst-case performance ratio of the algorithm Approx2 is at most max{17/7, 14/6, 13/5, 8/4} = 2.6. Our next two local improvement operations are designed to deal with three of the four worst cases where the path orders are 5, 6 and 7. Afterwards, we will show by an amortization scheme that the average number of vertices assigned to a vertex of V(P) is at most 2.
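The maximum of the four worst-case ratios quoted above can be checked with exact arithmetic. Keying the ratios by path order (the denominators 7, 6, 5 and 4) is our reading of the text, and the variable names are illustrative.

```python
from fractions import Fraction

# Worst-case number of received vertices divided by the path order.
ratios = {
    7: Fraction(17, 7),
    6: Fraction(14, 6),
    5: Fraction(13, 5),
    4: Fraction(8, 4),
}

worst_order = max(ratios, key=ratios.get)
worst = ratios[worst_order]
print(worst_order, worst, float(worst))
```

Only the 4-paths already meet the target ratio 2, which is consistent with the two new operations targeting the orders 5, 6 and 7.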
Since the average number of vertices assigned to a vertex on a 4-path in P is already at most 2, the first operation is employed to construct more 4-paths in V(P), whenever possible.
▶ Operation 14. For any two paths P and P′ in P of order at least 5 such that their vertices are covered exactly by a set of paths in G of order at least 4 and of which at least one path has order 4, the Re-cover(P, P′) operation replaces P and P′ by this set of paths.
In other words, the Re-cover(P, P′) operation removes the two paths P and P′ from P, and then uses the set of paths to re-cover the same vertices. Since there are O(n^2) possible pairs of paths P and P′ in P, and by |V(P) ∪ V(P′)| ≤ 14 the existence of a set of paths re-covering V(P) ∪ V(P′) can be checked in O(1) time, we conclude that determining whether or not a Re-cover operation is applicable, and if so then applying it, can be done in O(n^2) time. Note that such an operation does not change |V(P)|, but it increases the number of 4-paths by at least 1. For example, if n(P) = 5 and n(P′) = 7, and we use u′_j's and v′_j's to label the vertices on P′, then a Re-cover operation is applicable when u_0 is adjacent to any one of u′_0, u′_2, v′_2, v′_0 (resulting in three 4-paths); or if n(P) = n(P′) = 6, then a Re-cover operation is applicable when u_0 is adjacent to any one of
Figure 1 The worst case of a path P ∈ P with respect to its order, where a vertex on the path P is attached with a longest possible extension in Q − P, with its edges dashed. The edges on the path P are solid and the vertices are shown filled, while the vertices on the extensions are unfilled.
▶ Operation 15. For a path P ∈ P such that there is an index t ∈ {2, 3} and an extension e(u_t) with n(e(u_t)) = t, (i) if replacing P by the path e(u_t)-u_t-· · ·-v_1-v_0 enables a Rep operation, then the Look-ahead(P) operation first replaces P by e(u_t)-u_t-· · ·-v_1-v_0 and next executes the Rep operation; (ii) if n(P) = 6 and one of v_0 and v_2 is adjacent to a vertex w, such that w is on e(u_2) or on u_0-u_1 or on another path of P but at distance at most 1 from one end, then the Look-ahead(P) operation first replaces P by the path u_0-u_1-u_2-e(u_2) and next uses v_0-v_1-v_2 as an extension e(w) to execute a Rep operation.
In some sense, the Look-ahead(P) operation looks one step ahead to see whether or not using the extension e(u_t) in various ways would help cover more vertices. Recall that we can rename the vertices on a path of P, if necessary, and thus the above definition of a Look-ahead operation applies to the vertex v_2 symmetrically, if there is an extension e(v_2) with n(e(v_2)) = 2. Also, when u_t is the center vertex of the path P (i.e., n(P) = 5, 7), one should also examine replacing P by the path u_0-u_1-· · ·-u_t-e(u_t).
When the first case of a Look-ahead operation applies, its internal Rep operation must involve at least two of the t vertices u_0, u_1, . . . , u_{t−1} because otherwise an Add or a Rep operation would be applicable before this Look-ahead operation. Since there are O(n^3) possible extensions at the vertex u_t, it follows from Lemma 6 that determining whether or not the first case of a Look-ahead operation is applicable, and if so then applying it, can be done in O(n^6) time. Note that such an operation increases |V(P)| by at least 1.
When the second case of a Look-ahead operation applies, we see that after the replacement (which reduces |V(P)| by 1), the vertex w is always on a path of P at distance at most 1 from one end; and thus the succeeding Rep operation increases |V(P)| by at least 2. The net effect is that such a Look-ahead operation increases |V(P)| by at least 1. Since there are O(n^2) possible extensions at the vertex u_2, determining whether or not the second case of a Look-ahead operation is applicable, and if so then applying it, can be done in O(n^4) time.
We summarize the above observations on the two new operations into the following lemma.
▶ Lemma 16. Given a collection P of vertex-disjoint paths of order in between 4 and 7, determining whether or not one of the two operations Re-cover and Look-ahead is applicable, and if so then applying it, can be done in O(n^6) time. Each operation either increases |V(P)| by at least 1, or keeps |V(P)| unchanged and increases the number of 4-paths by at least 1.
We are now ready to present the algorithm Approx2 for the MaxP 4+ PC problem, which in fact is very similar to Approx1. It starts with the empty collection P = ∅; in each iteration, it determines in order whether any one of the five operations Add, Rep, DoubleRep, Re-cover, and Look-ahead is applicable, and if so then it applies the operation to update P. During the entire process, P is maintained to be a collection of vertex-disjoint paths of order in between 4 and 7. The algorithm terminates if none of the five operations is applicable for the current P, and returns it as the solution. A simple high level description of the algorithm Approx2 is given in Algorithm 2. From Lemmas 6 and 16, each iteration takes O(n^6) time; since |V(P)| can increase at most n times, and between two such increases the number of 4-paths can increase at most n/4 times, there are O(n^2) iterations and the overall running time of Approx2 is in O(n^8). Next we show that the worst-case performance ratio of Approx2 is at most 2, and thus its higher running time pays off. The performance analysis is done through a similar but more careful amortization scheme.
Algorithm 2 Approx2 (high level description).
Input: A graph G = (V, E);
1. initialize P = ∅;
2. while (one of Add, Rep, DoubleRep, Re-cover and Look-ahead is applicable)
   2.1 apply the operation to update P;
3. return the final P.
The amortization scheme assigns the vertices of V (Q) to the vertices of V (P) ∩ V (Q). Firstly, each vertex of V (Q) ∩ V (P) is assigned to itself. Next, recall that Q − P is the collection of sub-paths of the paths of Q after removing those vertices in V (Q) ∩ V (P). By Lemma 7, each path S of Q − P is associated with one or two vertices in V (Q) ∩ V (P).
When the path S of Q − P is associated with two vertices v and v′ of V(Q) ∩ V(P), a half of all the vertices on S are assigned to each of v and v′, except in the first special case below. In this first special case, n(S) = 1, v = u_1 (or v = v_1, respectively) and v′ = u′_3 on some paths P, P′ ∈ P with n(P) ≥ 5 and n(P′) = 7, respectively; then the whole vertex on S is assigned to u′_3 (that is, none is assigned to u_1, or v_1, respectively). When the path S of Q − P is associated with only one vertex v of V(Q) ∩ V(P), all the vertices on S are assigned to the vertex v, except in the second and the third special cases below where n(S) = 1 and v = u_1 (or v = v_1, respectively) on a path P ∈ P with n(P) ≥ 5. In the second special case, S-u_1-[r]-u′_3 is a subpath of some path Q ∈ Q, where u′_3 is the center vertex of some 7-path P′ ∈ P, and [r] means the vertex r might not exist but if it exists then it is not the center vertex of any 7-path in P; in this case, a half of the vertex on S
One might wonder whether the performance analysis can be done better. Though we are not able to show the tightness of the performance ratio 2, we give below a graph to show that 16/9 is a lower bound.

Figure 2
A graph of order 32 to show that the performance ratio of the algorithm Approx2 is lower bounded by 16/9. All the edges in the graph are shown, either solid or dashed. The 18 filled vertices are covered by the collection of two 5-paths and two 4-paths computed by Approx2, and the edges on these paths are shown solid; the edges on an optimal collection of paths, which covers all the vertices, are shown dashed.

Conclusion
In this paper, we studied the general vertex covering problem MaxP k+ PC, where k ≥ 4, to find a collection of vertex-disjoint paths of order at least k to cover the most vertices in the input graph. The problem has seemingly escaped the literature, but it admits a k-approximation algorithm by reducing to the weighted (2k − 1)-set packing problem [17]. We proposed the first direct (0.4394k + O(1))-approximation algorithm and an improved 2-approximation algorithm when k = 4. Both algorithms are based on local improvement with a few operations, and we proved their approximation ratios via amortized analyses. We suspect our amortized analyses are tight, and it would be interesting to either show the tightness or improve the analyses. For designing improved approximation algorithms, one can look into whether the two new operations in Approx2 for k = 4 can be helpful for k ≥ 5; other ideas might also work, for example, one can investigate whether or not a maximum path-cycle cover [10] can be taken advantage of.
On the other hand, we have not addressed whether or not the MaxP k+ PC problem, for a fixed k ≥ 4, is APX-hard; if it is, then it is worthwhile to show some non-trivial lower bounds on the approximation ratio, even only for k = 4.