On Kernels for d-Path Vertex Cover

In this paper we study the kernelization of the $d$-Path Vertex Cover ($d$-PVC) problem. Given a graph $G$, the problem asks whether there exists a set of at most $k$ vertices whose removal from $G$ results in a graph that does not contain a path (not necessarily induced) on $d$ vertices. It is known that $d$-PVC is NP-complete for $d\geq 2$. Since the problem is a special case of $d$-Hitting Set, it is known to admit a kernel with $\mathcal{O}(dk^d)$ edges. We improve on this by giving better kernels. Specifically, we give kernels with $\mathcal{O}(k^2)$ vertices and edges for the cases when $d=4$ and $d=5$. Further, we give a kernel with $\mathcal{O}(k^4d^{2d+9})$ vertices and edges for general $d$.


Introduction
Vertex deletion problems have been studied extensively in graph theory. These problems require finding a subset of vertices whose deletion results in a graph that belongs to some desired class of graphs. One such problem is path covering. Given a graph G = (V, E), the d-Path Vertex Cover problem (d-PVC) asks to compute a subset S ⊆ V of vertices such that the graph resulting from the removal of S does not contain a path on d vertices. Here the path need not be induced. The problem was first introduced by Brešar et al. [1]. It is known to be NP-complete for any d ≥ 2 due to the meta-theorem of Lewis and Yannakakis [21]. The 2-PVC problem is the same as the well-known Vertex Cover problem. The 3-PVC problem is also known as Maximum Dissociation Set or Bounded Degree-One Deletion. The d-PVC problem is motivated by the design of secure wireless communication protocols [22] and by route planning and the speeding up of shortest path queries [18].

Preliminaries
We use the notations related to parameterized complexity as described by Cygan et al. [8].
We consider simple and undirected graphs unless otherwise stated. For a graph $G$, we use $V(G)$ to denote the vertex set of $G$ and $E(G)$ to denote the edge set of $G$. A $d$-path (also denoted by $P_d$), written as an ordered $d$-tuple $(p_1, p_2, \ldots, p_d)$, is a path on $d$ vertices $\{p_1, p_2, \ldots, p_d\}$. A $d$-path free graph is a graph that does not contain a $d$-path as a subgraph (the $d$-path need not be induced). The length of a path $P$ is the number of edges in $P$; in particular, the length of a $d$-path $P_d$ is $d - 1$.
The $d$-Path Vertex Cover problem is formally defined as follows:

$d$-Path Vertex Cover ($d$-PVC)
Input: A graph $G = (V, E)$ and a non-negative integer $k$.
Output: A set $S \subseteq V$ such that $|S| \leq k$ and $G \setminus S$ is a $P_d$-free graph.
A $d$-path packing $\mathcal{P}$ of size $l$ in a graph $G$ is a collection of $l$ vertex-disjoint $d$-paths in $G$. We use $V(\mathcal{P})$ to denote the union of the vertex sets of the $d$-paths in the packing $\mathcal{P}$. For the remaining graph-theoretic notation we refer to Diestel [11].
For a positive integer i, we will use [i] to denote the set {1, 2, . . . , i}.
▶ Proposition 1 (⋆). For a given graph $G$ and an integer $k$, there is an algorithm which either correctly answers whether $G$ has a $d$-path vertex cover of size at most $k$, or finds an inclusion-wise maximal $d$-path packing $\mathcal{P}$ of size at most $k$, in $\mathcal{O}(2.619^d\, dkn \log n)$ time.
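To illustrate the dichotomy in Proposition 1, the following is a minimal sketch, not the algorithm behind the stated running time. It greedily collects vertex-disjoint $d$-paths using an exhaustive (exponential-time) search; the names `find_d_path` and `maximal_packing_or_no` and the adjacency-set representation are assumptions made for this illustration only. If more than $k$ disjoint $d$-paths are found, every solution needs more than $k$ vertices, so the answer is NO; otherwise the greedy packing is inclusion-wise maximal.

```python
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets (assumed encoding)


def find_d_path(graph: Graph, d: int, forbidden: Set[Hashable]) -> Optional[List[Hashable]]:
    """Return some path on d vertices avoiding `forbidden`, or None.

    Plain exhaustive DFS; this is only a stand-in for a fast d-path detection routine.
    """
    def extend(path: List[Hashable], used: Set[Hashable]) -> Optional[List[Hashable]]:
        if len(path) == d:
            return list(path)
        for w in graph[path[-1]]:
            if w not in used and w not in forbidden:
                path.append(w)
                used.add(w)
                found = extend(path, used)
                if found is not None:
                    return found
                path.pop()
                used.remove(w)
        return None

    for start in graph:
        if start not in forbidden:
            found = extend([start], {start})
            if found is not None:
                return found
    return None


def maximal_packing_or_no(graph: Graph, d: int, k: int) -> Optional[List[List[Hashable]]]:
    """Return None for a NO-instance, else an inclusion-wise maximal d-path packing."""
    packing: List[List[Hashable]] = []
    covered: Set[Hashable] = set()
    while True:
        path = find_d_path(graph, d, covered)
        if path is None:
            return packing          # no further disjoint d-path exists: packing is maximal
        packing.append(path)
        covered.update(path)
        if len(packing) > k:
            return None             # k + 1 disjoint d-paths need k + 1 deletions
```

On a path with eight vertices and $d = 4$, this sketch returns a packing of two 4-paths for $k = 2$ and reports a NO-instance for $k = 1$.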

General Reduction Rules
Let us start with reduction rules that apply to d-PVC for most values of d. Assume that we are working with an instance (G = (V, E), k) of d-PVC for some d ≥ 4. We start with a reduction rule whose correctness is immediate.
▶ Reduction Rule 1. If there is a connected component $C$ in $G$ which does not contain a $P_d$, then remove $C$.
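Reduction Rule 1 can be sketched as follows. This is an illustration under assumptions: the graph is stored as a dictionary of adjacency sets, the helper names are hypothetical, and the $d$-path test is a brute-force search rather than an efficient detection algorithm.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets (assumed encoding)


def connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Vertex sets of the connected components, found by iterative DFS."""
    seen: Set[Hashable] = set()
    result: List[Set[Hashable]] = []
    for s in graph:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in graph[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        result.append(comp)
    return result


def contains_d_path(graph: Graph, component: Set[Hashable], d: int) -> bool:
    """Brute-force test whether the component contains a path on d vertices."""
    def extend(u: Hashable, used: Set[Hashable]) -> bool:
        if len(used) == d:
            return True
        for w in graph[u]:
            if w not in used:
                used.add(w)
                if extend(w, used):
                    return True
                used.remove(w)
        return False

    return any(extend(s, {s}) for s in component)


def apply_reduction_rule_1(graph: Graph, d: int) -> Graph:
    """Drop every connected component that contains no P_d."""
    keep: Set[Hashable] = set()
    for comp in connected_components(graph):
        if contains_d_path(graph, comp, d):
            keep |= comp
    # Edges never cross components, so adjacency sets of kept vertices stay valid.
    return {u: set(graph[u]) for u in graph if u in keep}
```

The parameter $k$ is unchanged by the rule, since no vertex of a removed component is needed in a solution.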
The next rule allows us to get rid of multiple degree-one vertices adjacent to a single vertex.


High Degree Reduction Rule for 4-PVC and 5-PVC
In this section, we are going to introduce the reduction rules which are applicable to both 4-PVC and 5-PVC instances. We assume that we are working with a d-PVC instance (G = (V, E), k) for d ∈ {4, 5} which is reduced by exhaustively employing Reduction Rule 2.
Our aim is to show that the degree of each vertex can be reduced to linear in the parameter. First assume that there is a large matching in the neighborhood of some vertex $v$. We call a matching $\mathcal{M}$ in $G$ adjacent to vertex $v$ if it is a matching in $G \setminus v$ and for each edge $\{a_i, b_i\} \in \mathcal{M}$ at least one of its vertices, say $a_i$, is adjacent to $v$ in $G$.

▶ Reduction Rule 3 (⋆). If $v$ is a vertex and $\mathcal{M}$ a matching adjacent to $v$ of size $|\mathcal{M}| \geq k + 2$, then delete $v$ and decrease $k$ by 1.
To exhaustively apply Reduction Rule 3, we need to find for each $v \in V$ a largest matching adjacent to $v$. This can be done as follows. Let $B = V \setminus (N(v) \cup \{v\})$ be the set of vertices not adjacent to $v$, and let $G_v$ be the graph obtained from $G \setminus v$ by removing edges with both endpoints in $B$. It is easy to observe that each matching adjacent to $v$ is also a matching in $G_v$ and vice versa. Hence, it suffices to find a largest matching in $G_v$, which can be done in polynomial time [12].
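The test for Reduction Rule 3 can be sketched as follows. By the definition of a matching adjacent to $v$, only edges of $G \setminus v$ with at least one endpoint in $N(v)$ may be used, so the sketch first collects exactly these edges and then computes a largest matching among them. The function names are hypothetical, and the matching routine is a deliberately simple exhaustive search for small examples; a polynomial-time maximum matching algorithm would be used in practice.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets (assumed encoding)
Edge = FrozenSet[Hashable]


def edges_usable_for(graph: Graph, v: Hashable) -> Set[Edge]:
    """Edges of G - v that have at least one endpoint adjacent to v."""
    nbrs = graph[v]
    edges: Set[Edge] = set()
    for a in graph:
        if a == v:
            continue
        for b in graph[a]:
            if b != v and (a in nbrs or b in nbrs):
                edges.add(frozenset((a, b)))
    return edges


def largest_matching(edges: Set[Edge]) -> List[Edge]:
    """Exhaustive search for a maximum matching (exponential, illustration only)."""
    edge_list = list(edges)

    def solve(i: int, used: FrozenSet[Hashable]) -> List[Edge]:
        if i == len(edge_list):
            return []
        best = solve(i + 1, used)                    # skip edge i
        a, b = tuple(edge_list[i])
        if a not in used and b not in used:          # or take edge i
            cand = [edge_list[i]] + solve(i + 1, used | {a, b})
            if len(cand) > len(best):
                best = cand
        return best

    return solve(0, frozenset())


def rule_3_applies(graph: Graph, v: Hashable, k: int) -> bool:
    """True if some matching adjacent to v has size at least k + 2."""
    return len(largest_matching(edges_usable_for(graph, v))) >= k + 2
```

When `rule_3_applies` returns true, the rule deletes $v$ and decreases $k$ by 1.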
Therefore, we further assume that the instance is reduced with respect to Reduction Rule 3. We fix a vertex $v$ and find a largest matching $\mathcal{M}$ adjacent to it by the above algorithm. Let $M$ be the set of vertices covered by the matching $\mathcal{M}$ and let $m = |\mathcal{M}|$. Since the instance is reduced, we know that $m \leq k + 1$. Let $X = N(v) \setminus M$. We refer the reader to Figure 1 for an overview of our setting.
We now partition the set $X$ into three sets. Let $X_2$ be the set of vertices such that for each $x \in X_2$ we have some edge . Let $X_0$ be the vertices such that for each $x \in X_0$ we have that $N(x) = \{v\}$. Note that $X_0$ contains at most one vertex due to Reduction Rule 2 being exhaustively applied. Lastly, let $X_1 = X \setminus (X_2 \cup X_0)$ be the rest of the vertices in $X$. See Figure 1 for an illustration of the sets $X_2$, $X_0$, and $X_1$.
▶ Observation 5 (⋆). If the vertex $v$ has degree at least $(d + 2)(k + 1) + 1$, then $|X_1| \geq (d - 1)(k + 1)$.

Now we focus on the edges between $X_1$ and $M$. By Observation 3, for each edge $\{a_i, b_i\}$ of the matching we have that the vertices in $X_1$ may be adjacent to at most one vertex of such an edge.

We are now ready to employ the Expansion Lemma. We use the version of Fomin et al. [16], which is a generalization of the original results by Prieto [

Figure 2: A graphical interpretation of applying the Expansion Lemma to our setting.
We refer the reader to Figure 2 for a graphical interpretation of the situation guaranteed by Observation 8. Now, let us focus on the sets M ′ and X ′ and the way they are connected with vertex v. We are going to show that some edge between v and X ′ is now redundant.
▶ Reduction Rule 4. Let $v$ be a vertex of degree at least $(d + 2)(k + 1) + 1$. Let $\mathcal{M}$ be a largest matching adjacent to $v$ and $M$ be the set of vertices covered by $\mathcal{M}$. Let


Proof of Correctness. Let $(G = (V, E), k)$ be the original instance and $(G', k)$ be the reduced instance, obtained by deleting the edge $\{x, v\}$. If $S$ is a solution for $G$, then $S$ is also a solution for $G'$. Hence we will concentrate on the other direction. Suppose that $S'$ is a solution for the reduced instance. If it is also a solution for the original one, then we are done. Suppose it is not, i.e., there is a $P_d$ in $G \setminus S'$. This $P_d$ contains the edge $\{x, v\}$, otherwise it would also be present in $G' \setminus S'$. Therefore $v \notin S'$ and $x \notin S'$. There are three ways in which the solution $S'$ can interact with $M'$ and $X'$ that we need to address.

Firstly, suppose that $M' \not\subseteq S'$ and let $\widehat{M} = M' \setminus S'$. See Figure 3 for an illustration. We have that for each vertex $m \in \widehat{M}$ it must be that $|Q_m \cap S'| \geq 2$. Indeed, if this is not the case, there would be a $P_d$ in $G' \setminus S'$ which uses the $(d - 2)$ vertices of $Q_m$ not in $S'$ and the vertices $m$ and $v$. Consider the set $S = (S' \cup \widehat{M} \cup \{v\}) \setminus \bigcup_{m \in \widehat{M}} Q_m$. Observe that any $P_d$ which uses some vertex from $X'$ must contain at least one of the vertices in $M'$ or $v$, because $N(X') \subseteq M' \cup \{v\}$. Therefore the set $S$ is a solution for the reduced graph $G'$ as it contains both $M'$ and $v$. The set $S$ is also a solution for $G$ as it contains $v$ and therefore covers any $P_d$ which might use the deleted edge $\{x, v\}$. Finally, $|S| \leq |S'|$, because for each $m$ that we add into $S$, we remove at least two vertices of $Q_m$ from $S$ and therefore we have $|S| \leq |S'| + |\widehat{M}| + 1 - 2|\widehat{M}| \leq |S'|$, as $\widehat{M}$ is non-empty.

Secondly, assume that $M' \subseteq S'$ and there is some $x' \in X'$ such that $x' \in S'$. We construct the set $S = (S' \setminus X') \cup \{v\}$. Again, observe that $S$ is a solution for $G'$ because any $P_d$ which uses some vertex from $X'$ must contain at least one of the vertices in $M'$ or $v$, and both are fully contained in $S$. We also have that $S$ is a solution for $G$, again, as it contains $v$ and therefore covers any $P_d$ which might use the deleted edge $\{x, v\}$. Finally, $|S| \leq |S'|$, because we add only the vertex $v$ and, by assumption, remove at least one vertex $x' \in X' \cap S'$.

Lastly, assume that $M' \subseteq S'$ and $X' \cap S' = \emptyset$. In this case, the $P_d$ that we found in $G \setminus S'$ must be of the form $P = (x, v, u_3, \ldots, u_d)$ and $u_3$, .
To sum up, we have shown that when we delete the edge $\{x, v\}$ from $G$, then for any solution $S'$ for $G'$ which is not also a solution for $G$, we can always find a new solution $S$ with $|S| \leq |S'|$ which is a solution for both $G'$ and $G$. ◀

As the application of the rule only requires finding a largest matching adjacent to $v$, classifying the vertices of $N(v)$, and finding a $(d-1)$-expansion, and these tasks can be done in polynomial time, the rule can be applied in polynomial time.

4-PVC Kernel with Quadratic Number of Edges
Let $(G = (V, E), k)$ be an instance reduced by exhaustively employing Reduction Rules 1-4. Then the maximum degree in $G$ is at most $(d + 2)(k + 1) = 6k + 6$. Furthermore, assume that the algorithm of Proposition 1 actually returned an inclusion-wise maximal packing $\mathcal{P}$ in $G$ with at most $k$ 4-paths instead of answering immediately. Let $P = V(\mathcal{P})$, and let $A = V \setminus P$. Then $G[A]$ is a 4-path free graph, otherwise we would be able to increase the size of the packing $\mathcal{P}$. Since the instance is reduced with respect to Reduction Rule 1, each connected component in $G[A]$ is connected to $P$ by at least one edge.
To show that an instance reduced with respect to all the above rules has a quadratic number of edges, it suffices to count separately the number of edges incident on $P$ and the number of edges in $G[A]$. Since the maximum degree in $G$ is at most $6k + 6$ and there are at most $4k$ vertices in $P$, there are at most $4k \cdot (6k + 6) = 24k^2 + 24k$ edges incident on $P$.
To count the edges in $G[A]$, we first observe that a connected 4-path free graph is either a triangle, or a star (possibly degenerate, i.e., with at most 3 vertices). Here a $q$-star is a graph with vertices $\{c, l_1, \ldots, l_q\}$, $q \geq 0$, and edges $\{\{c, l_i\} \mid i \in \{1, \ldots, q\}\}$. Vertex $c$ is called a center, vertices $\{l_1, \ldots, l_q\}$ are called leaves. The term star will be used for a $q$-star with an arbitrary number of leaves. Note that a graph with a single vertex is a 0-star, a graph with two vertices and a single edge is a 1-star, and a 3-path is a 2-star. A triangle is a cycle on three vertices.
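The characterization of connected 4-path free graphs can be checked by brute force on small graphs. The following sketch is only a sanity check, not a proof: it enumerates all labelled graphs on at most five vertices and verifies that a connected graph has no path on four vertices exactly when it is a star or a triangle. All function names are chosen for this illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def adjacency(n: int, edges: List[Edge]) -> List[Set[int]]:
    adj: List[Set[int]] = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_connected(n: int, edges: List[Edge]) -> bool:
    adj = adjacency(n, edges)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def has_path_on(n: int, edges: List[Edge], d: int) -> bool:
    """Does the graph contain a (not necessarily induced) path on d vertices?"""
    adj = adjacency(n, edges)

    def extend(u: int, used: frozenset) -> bool:
        if len(used) == d:
            return True
        return any(extend(w, used | {w}) for w in adj[u] if w not in used)

    return any(extend(s, frozenset({s})) for s in range(n))


def is_star(n: int, edges: List[Edge]) -> bool:
    # For a connected graph: a star iff some vertex lies on every edge.
    return any(all(c in e for e in edges) for c in range(n))


def is_triangle(n: int, edges: List[Edge]) -> bool:
    return n == 3 and len(edges) == 3


def characterisation_holds(max_n: int = 5) -> bool:
    for n in range(1, max_n + 1):
        all_edges = list(combinations(range(n), 2))
        for mask in range(1 << len(all_edges)):
            edges = [e for i, e in enumerate(all_edges) if (mask >> i) & 1]
            if not is_connected(n, edges):
                continue
            p4_free = not has_path_on(n, edges, 4)
            if p4_free != (is_star(n, edges) or is_triangle(n, edges)):
                return False
    return True
```

Calling `characterisation_holds(5)` inspects at most 1024 edge subsets per vertex count, so it finishes quickly.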
Secondly, as the instance is reduced with respect to Reduction Rule 1, each connected component in $G[A]$ is connected to $P$ by at least one edge. Therefore, there are at most $24k^2 + 24k$ connected components in $G[A]$, as there are only that many edges going from $P$ to $A$. Next, we provide an observation about stars in $G[A]$.

5-PVC Kernel with Quadratic Number of Edges
The idea is completely analogous to the previous section. We employ the following characterization.
A star with a triangle is formed by connecting two leaves of a star with an edge. A bi-star is formed by connecting the centers of two stars with an edge.

In this section, we give a kernelization algorithm for $d$-PVC with $d \geq 6$.
An intuition behind the approach. The kernelization algorithm marks some vertices and edges, which it wants to keep, and throws away the rest. Essentially, the kernelization creates a subgraph $\widehat{G}$ of the input graph $G$. For correctness of the algorithm, we want to show that if there is a $d$-path $P$ in $G$ which misses some set of vertices $S$ (a prospective solution), then we will also find some $d$-path $P'$ in $\widehat{G}$, which also misses the set $S$.
We begin by finding a maximal packing $\mathcal{M}$ in $G$ and we keep in the reduced graph $\widehat{G}$ all vertices $M$ of the packing and all edges between them. Now, on the one hand, if the path $P$ were completely contained in $M$, then trivially the path would also appear in $\widehat{G}$. On the other hand, the path $P$ cannot be completely outside of $M$. Thus, the path $P$ crosses between $M$ and outside of $M$ at least once. This corresponds to vertices of $M$ being connected by a path of prescribed length outside of $M$. We later formalize this as a "request".
To get more structure, we leverage the behavior of DFS trees of the connected components outside of M . With the DFS trees we identify vertices, which are "crucial" for the requests, and we further split the requests into "sub-requests" according to the "crucial" vertices.
The algorithm is inspired by Dell and Marx [9]. However, while the considered problems have similarities, many ideas are not translatable. In particular, they could afford to consider all "sub-requests" and keep $\Omega(k)$ vertices for each without affecting their bound (cf. [9, p. 23]). We had to be more careful in which "sub-requests" we consider and we need to employ Lemma 15 (below) to only keep $d^{O(d)}$ vertices and edges for each such "sub-request" to achieve our precise bound. Also, to achieve the edge bound, we need to keep track of the purpose for which the individual vertices were marked, which makes it hard to split the algorithm into small self-contained steps.
For a rooted forest $F$ and its vertex $v \in V(F)$, $\mathrm{sub}(v)$ denotes the set of vertices of a maximal subtree of $F$ rooted at $v$ and $\mathrm{anc}(v)$ the set of ancestors of $v$ in $F$.

We provide the following observations regarding the DFS trees $T_i$ and the forest $F$.

In the next paragraphs we cover the notion of the "crucial" vertices mentioned in the earlier intuition. Roughly speaking, a vertex of $F$ is "crucial" if the set of its descendants (vertices of its subtree) satisfies some request.

Recall that in the intuition we examined some $d$-path $P$ in $G$. The resolved request basically ensures that there are at least $k + d + 1$ disjoint paths in $G$ which satisfy said request. The idea is that the prospective solution may compromise at most $k$ of these paths and the other parts of $P$ may compromise at most $d$ of these paths. As we will keep exactly $k + d + 1$ of these disjoint paths in the reduced graph $\widehat{G}$, we can be sure that at least one of them will always be usable to reroute some part of $P$, which will help us to find the desired path $P'$ in $\widehat{G}$.

Now, we focus on the unresolved requests. Let $R^*$ be the set of all requests $\varrho(f, l)$ which are not resolved and let $Y = \bigcup_{\varrho(f,l) \in R^*} Y_{f,l}$.
We are now getting to the notion of sub-requests. These have either one endpoint in M and the other in Y, or both endpoints in Y, or only one prescribed endpoint, which is in Y. Note that if the two endpoints are in Y, then, by Observation 14, either one of the endpoints is an ancestor of the other, or there is no path connecting them outside (M ∪ Y).
An (M ∪ Y)-request (g, j, M ∪ Y) will be simply denoted as σ(g, j) and called a sub-request if there exists y ∈ Y such that y ∈ g and g ⊆ M ∪ anc(y). In particular, either g = {y} or one of the vertices in g is y and the other vertex is in M ∪ anc(y).
Even though we will not formally define a resolved sub-request, later we will actually show that the sub-request is "resolved" if there are at least 2d paths satisfying it.
Description of the algorithm. We are now ready to describe the kernelization algorithm. As we mentioned earlier, the algorithm first marks some vertices and edges, which it wants to keep, and it deletes the rest of the graph. Therefore, the core of the algorithm is the marking procedure. In our case, the main procedure is called Mark which in turn uses a procedure called Mark 2. These procedures are described in Algorithm 1 and Algorithm 2, respectively.
Let us now give an insight into how the procedures Mark and Mark 2 were constructed. We will start with Mark.

Algorithm 1 (Mark), surviving fragments of the pseudocode:
Mark the vertices and edges of $P$.
foreach sub-request $\sigma(g, j)$ such that $g \subseteq M \cup \mathrm{anc}(y)$ and $g \not\subseteq M$ do
10 Let $C_1, C_2, \ldots, C_{q'}$ be the vertex sets of the connected components of
Pick an arbitrary path $P \in \mathcal{P}^{C_i}_{g,j}$.
14 Mark the vertices and edges of $P$.
Mark all the edges and vertices of $P$.
The lines 3-7 deal with the resolved requests. Essentially, by preserving k + d + 1 corresponding paths for the request ϱ(f, l), we retain all the necessary structure such that we do not create any new solutions in the reduced instance.
In the two following for-loops, we first pick a vertex y of Y. This fixes the set of ancestors anc(y), i.e., it fixes the set of vertices on the path from y to the root of its tree in F. And, for this particular y, we then pick a sub-request σ(g, j) which lives on this fixed set anc(y) and M . This allows us to look only at some components of G \ (M ∪ Y) and actually makes it possible for us to bound their number. The bounding happens on lines 11-14 and we can also say that the sub-request σ(g, j) is resolved when the number of components is at least 2d. The bound 2d follows from Lemma 15, which will be stated later. For a resolved sub-request we proceed similarly to a resolved request. Namely, we preserve one corresponding path in each of some 2d of the components.
If the number of components is not large, the lines 15-17 run the second marking procedure Mark 2 on each of these components and the aim is to bound their size. Now, recall again, that in the intuition we examined some d-path P in G and some prospective solution S. The purpose of the marking procedure Mark 2 is to brute-force all the possible ways of how the path P and solution S may compromise the paths which satisfy the sub-request σ(g, j) and which are contained in the component C i . The procedure works recursively, starts with the empty set of "compromising" vertices W and it always picks a path which was not yet compromised, marks it (so that it remains in G), and tries to compromise its vertices one by one. By doing it like this, we ensure, that all the important parts of C i remain in G no matter what parts of C i were compromised.
The main trick is that we can stop the recursion of Mark 2 once the number of "compromising" vertices reaches $2d$. This number $2d$ again follows from Lemma 15.

Now, the kernelization can be formally summarized as follows. Run the marking procedure Mark on the instance $(G, k)$. The marking results in two subsets $\widehat{V} \subseteq V(G)$ and $\widehat{E} \subseteq E(G)$ corresponding to the vertices and edges marked by Mark. Reduce the instance $(G, k)$ to the instance $(\widehat{G}, k)$ where $\widehat{G} = (\widehat{V}, \widehat{E})$.
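The recursive idea behind Mark 2, as described in the intuition above, can be sketched as follows. This is an illustration under several assumptions and not a transcription of Algorithm 2: the oracle `find_path` (returning some path that satisfies the sub-request inside the component and avoids a given vertex set, or `None`) is hypothetical, `limit` plays the role of the bound $2d$, and the sketch stops before marking anything once the set $W$ of "compromising" vertices has reached the limit.

```python
from typing import Callable, FrozenSet, Hashable, Optional, Sequence, Set

Vertex = Hashable
# Hypothetical oracle: a path avoiding the given vertices, or None if there is none.
PathFinder = Callable[[FrozenSet[Vertex]], Optional[Sequence[Vertex]]]


def mark2(find_path: PathFinder,
          limit: int,
          marked_vertices: Set[Vertex],
          marked_edges: Set[FrozenSet[Vertex]],
          compromised: FrozenSet[Vertex] = frozenset()) -> None:
    """Branch over all ways to compromise the paths found so far."""
    if len(compromised) >= limit:       # recursion depth bound (2d in the text)
        return
    path = find_path(compromised)
    if path is None:                    # nothing usable survives this set W
        return
    # Keep the surviving path in the reduced graph.
    marked_vertices.update(path)
    marked_edges.update(frozenset(e) for e in zip(path, path[1:]))
    # Try to compromise the path at each of its vertices in turn.
    for v in path:
        mark2(find_path, limit, marked_vertices, marked_edges, compromised | {v})


def first_disjoint(candidates: Sequence[Sequence[Vertex]]) -> PathFinder:
    """Toy oracle for experiments: the first candidate path avoiding `avoid`."""
    def find_path(avoid: FrozenSet[Vertex]) -> Optional[Sequence[Vertex]]:
        for p in candidates:
            if not avoid.intersection(p):
                return p
        return None
    return find_path
```

Since each call branches over the vertices of one path and the depth is at most `limit`, the number of marked paths is bounded by a function of the path length and the limit only, which is the kind of bound the intuition above relies on.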
With that we conclude the intuition and we continue with the formal proof of correctness.

Proof. If $S$ is a solution for $(\widehat{G}, k)$, we are done. Suppose to the contrary that it is not. Then there is a $d$-path $P$ in $\widehat{G} \setminus S$. Assume that $P$ is selected such that it contains the least number of vertices which are in $S'$, i.e., $|V(P) \cap S'|$ is minimized among all $d$-paths in $\widehat{G} \setminus S$. As $S' \setminus S \subseteq \bigcup_{i \in [c]} C_i$, the path $P$ must contain at least one vertex from at least one set $C_i \cap S'$, because otherwise $P$ would also be in $\widehat{G} \setminus S'$, which is a contradiction with $S'$ being a solution for $(\widehat{G}, k)$. Further, since $N(C_i) \subseteq (M \cup Y)$, $N(C_i) \cap Y \subseteq \mathrm{anc}(y)$, and $\mathrm{anc}(y) \subseteq S$ by assumption, we have $N(C_i) \setminus S \subseteq M$. Therefore $P \cap M \neq \emptyset$, as otherwise the path $P$ would be contained in $C_i$, which contradicts the maximality of the packing of $d$-paths.
We split the path $P$ into segments according to the vertices of $M$, i.e., a segment of $P$ is a sub-path $(v_1, v_2, \ldots, v_s)$ of $P$ such that $v_2, v_3, \ldots, v_{s-1} \in V(G) \setminus M$ and either $\{v_1, v_s\} \subseteq M$ (an inner segment), or one of $v_1, v_s$ is in $M$, while the other is an endpoint of $P$ (an outer segment). The argument is the same in both cases.
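The splitting of $P$ into segments can be illustrated by a short sketch. The function name and the list-based path representation are assumptions made for this example; the sketch reports an outer segment only when the corresponding endpoint of $P$ lies outside $M$.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Vertex = Hashable


def split_into_segments(path: Sequence[Vertex],
                        M: Set[Vertex]) -> Tuple[List[List[Vertex]], List[List[Vertex]]]:
    """Return (inner_segments, outer_segments) of `path` with respect to M."""
    hits = [i for i, v in enumerate(path) if v in M]   # positions of M-vertices on the path
    inner: List[List[Vertex]] = []
    outer: List[List[Vertex]] = []
    if not hits:
        return inner, outer                            # the path avoids M: no segments
    if hits[0] > 0:                                    # start of P up to the first M-vertex
        outer.append(list(path[:hits[0] + 1]))
    for i, j in zip(hits, hits[1:]):                   # between consecutive M-vertices
        inner.append(list(path[i:j + 1]))
    if hits[-1] < len(path) - 1:                       # last M-vertex to the end of P
        outer.append(list(path[hits[-1]:]))
    return inner, outer
```

For the path $(a, b, c, d, e, f)$ with $M = \{c, e\}$, the only inner segment is $(c, d, e)$ and the outer segments are $(a, b, c)$ and $(e, f)$.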
We also know that $V(P') \cap Y = \emptyset$, because $N(C_i) \setminus S \subseteq M$. With that we argue that the request $\varrho(f, l)$ must be resolved. Indeed, suppose it is not. By Observation 14(c) there is a vertex $v$ in $V(P') \cap C_i$ such that $V(P') \cap C_i \subseteq \mathrm{sub}(v)$. But that implies that $\mathrm{sub}(v)$ satisfies the request $\varrho(f, l)$ and, therefore, vertex $v$ should have been included in $Y_{f,l}$ and, consequently, in $Y$, which is a contradiction with $V(P') \cap Y = \emptyset$.

Now, as the request $\varrho(f, l)$ is resolved, the marking procedure Mark picked $k + d + 1$ leaves $h_1, h_2, \ldots, h_{k+d+1}$ from $F[Y_{f,l}]$ and for each such leaf $h_i$ it marked the vertices and edges of some path $P_i \in \mathcal{P}^{\mathrm{sub}(h_i)}_{f,l}$. Therefore, these paths $P_1, P_2, \ldots, P_{k+d+1}$ remained in $\widehat{G}$. Further, at least one of these paths is untouched by the vertices of $S'$ and the vertices of $P$, as $|S'| \leq k$ and $|V(P)| \leq d$, respectively. Let this one untouched path be $P_i$. Observe that we can swap the segment $P'$ with the path $P_i$ in $P$ to obtain a $d$-path $P^*$. But then the path $P^*$ contains strictly fewer vertices which are in $S'$ than $P$, which is a contradiction with the choice of $P$. ◀

Now we can prove the correctness of the algorithm.

Proof sketch. For the "if" direction we pick a solution $S'$ to $\widehat{G}$ such that any application of Lemma 15 would increase its size. If $S'$ was not a solution to $G$, then as in Lemma 15, we pick a special $d$-path $P$ witnessing that, but this time with the least number of unmarked edges in $G$. Then we again split $P$ into segments according to $M$ and, in case the corresponding request was not resolved, further into sub-segments according to $Y$. We always pick a (sub-)segment with at least one unmarked edge and we show that we can swap the (sub-)segment with some other suitable fully marked sub-path to obtain a contradiction with the choice of $P$. ◀

The following lemma shows the bound on the size of the kernel. We summarize the result in the following theorem.

Conclusion
We presented kernels with $O(k^2)$ edges for 4-PVC and 5-PVC and with $O(k^4 d^{2d+9})$ edges for $d$-PVC for any $d \geq 6$. An obvious open question is whether there is a kernel with $O(k^2)$ edges for every $d \geq 6$. Furthermore, the size of our kernel depends on $d$ by a factor of $d^{O(d)}$. We believe that this could be improved to $2^{O(d)}$ with the use of representative sets. However, improving this to a factor polynomial in $d$ would imply coNP ⊆ NP/poly. As observed by Dell and Marx [9], running such a kernel with $k = 0$ would give a polynomial kernel for the $d$-Path problem, which would have the above-mentioned implications.
Next, for 2-PVC and 3-PVC, there are kernels with a linear number of vertices [19,32]. Hence, another open question is whether such a kernel can also be obtained for, say, 4-PVC. Further interesting open questions can be found in the recent survey of Tu [29].