Kernelization in Almost Linear Time for Clustering into Bounded Vertex Cover Components
Abstract
Motivated by the growing interest in graph clustering and the framework proposed during the Dagstuhl Seminar 23331, we consider a natural specialization of this general approach (as also suggested during the seminar). The seminar introduced a broad perspective on clustering, where the goal is to partition a graph into connected components (or “clusters”) that satisfy simple structural integrity constraints – not necessarily limited to cliques.
In our work, we focus on the case where each cluster is required to have bounded vertex cover number. Specifically, a connected component satisfies this condition if there exists a set with such that is an independent set. We study this within the framework of the Vertex Deletion to -Vertex Cover Components (Vertex Deletion to -VCC) problem: given a graph and an integer , the task is to determine whether there exists a vertex set of size at most such that every connected component of has vertex cover number at most . We also examine the edge-deletion variant, Edge Deletion to -Vertex Cover Components (Edge Deletion to -VCC), where the goal is to delete at most edges so that each connected component of the resulting graph has vertex cover number at most . We obtain following results.
-
1.
Vertex Deletion to -VCC admits a kernel with vertices and edges.
-
2.
Edge Deletion to -VCC, admits a kernel with vertices and edges.
Both of our kernelization algorithms run in time . It is important to note that, unless the Exponential Time Hypothesis (ETH) fails, the dependence on cannot be improved to , as the case reduces to solving the classical Vertex Cover problem, which is known to require time under ETH.
A key ingredient in our kernelization algorithms is a structural result about the hereditary graph class , consisting of graphs in which every connected component has vertex cover number at most . We show that admits a finite obstruction set (with respect to the induced subgraph relation) of size , where each obstruction graph has at most vertices. This combinatorial result may be of independent interest.
Keywords and phrases:
Parameterized complexity, Polynomial Kernels, Vertex Cover, Finite Forbidden CharacterizationCopyright and License:
2012 ACM Subject Classification:
Theory of computation Parameterized complexity and exact algorithms ; Mathematics of computing Graph algorithmsEditors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał SkrzypczakSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Graph modification problems, especially vertex and edge deletions to achieve a desired structural property, are central to algorithmic graph theory [4, 6, 15, 30, 35]. Formally, given a fixed graph class , the goal is to delete at most vertices or edges from an input graph so that the resulting graph belongs to . This general framework captures a wide variety of well-studied problems, including Vertex Cover [16], Feedback Vertex Set [16], Planar Deletion [26, 28], Multiway Cut [31], Interval Deletion [9], Chordal Deletion [10], and Claw-Free Deletion [15]. A key motivation for studying such problems is that many computationally hard problems become tractable when restricted to graphs in [1, 7, 15, 20]. Among these, the Vertex Cover problem is particularly fundamental. Given a graph and an integer , the Vertex Cover problem asks whether contains a vertex set , also called vertex cover of , of size at most such that does not contain an edge.
One of the classical graph modification problems also viewed as a graph clustering problem asks whether a given graph can be transformed into a cluster graph (i.e., a graph in which every connected component is a clique) by deleting at most vertices or edges [23, 29]. This blends the perspectives of graph editing and clustering. Building on this line of work, and inspired by the framework proposed during the Dagstuhl Seminar 23331 [27], we consider a natural specialization of this general approach (also suggested during the seminar). The seminar introduced a broad perspective on clustering, where the goal is to partition a graph into connected components (or “clusters”) that satisfy simple structural integrity constraints, not necessarily limited to cliques.
One such question proposed during the seminar is the Vertex Decomposition into Small Dominator Clusters problem. Given a graph and parameters , the task is to determine whether at most vertices can be deleted from to obtain a graph such that each connected component of has domination number at most [27]. The seminar posed the open question of whether this problem is fixed-parameter tractable (FPT) when parameterized by the combined parameters and . Since the problem is known to be W[2]-complete when [16], it remains computationally hard even in the absence of deletions. Nonetheless, it was natural to ask whether an algorithm exists when the parameter is allowed to appear in the exponent of . This question was recently answered in the affirmative by Bentert et al. [5], who presented an algorithm with running time .
In this work, we investigate two natural variants of this emerging family of vertex-deletion and edge-deletion clustering problems. Specifically, we study the Vertex Deletion to -Vertex Cover Components and Edge Deletion to -Vertex Cover Components problems, where the objective is to delete at most vertices or edges, respectively, so that each connected component of the resulting graph has vertex cover of size at most .
In the same seminar [27], Karypis et al. also proposed the study of the Vertex Decomposition into Small Vertex Cover Clusters problem, referred to as the Layered Vertex Cover problem. In this problem, the goal is to delete a small number of vertices so that every connected component of the resulting graph can be further reduced by deleting at most vertices, resulting in a graph where each connected component has a vertex cover of size at most . This formulation is closely related to our investigation of the Vertex Deletion to -VCC and Edge Deletion to -VCC problems.
While in the work of Bentert et al. [5], the classical Dominating Set problem played a central role in characterizing the connected components after the deletion of a small number of vertices, in our work, the Vertex Cover problem serves an analogous foundational role.
The Vertex Cover is one of the most extensively studied problems in parameterized complexity, and has served as a cornerstone in the development of both theoretical frameworks and algorithms especially in the context of kernelization, which is a central theme of this paper [16, 19].
Kernelization is a core technique in parameterized complexity. It focuses on designing polynomial-time preprocessing algorithms that reduce a given instance to an equivalent one of size bounded by a function of the parameter, while preserving correctness guarantees (see Definition 9 for a formal definition). The Vertex Cover problem has inspired a rich toolbox of such techniques, including crown rule reductions, the Nemhauser–Trotter theorem, and LP-based kernelization methods [16, 19, 13, 32, 33, 18, 3, 2, 14, 8, 25].
1.1 Our Results and Methods
From a parameterized complexity perspective, both Vertex Deletion to -VCC and Edge Deletion to -VCC exhibit rich algorithmic structure. Our main result is an almost-linear-time kernelization for both problems.
Theorem 1.
Vertex Deletion to -VCC admits a kernel with at most vertices and edges. Furthermore, it can be computed in time .
Theorem 2.
Edge Deletion to -VCC admits a kernel with at most vertices and edges. Furthermore, it can be computed in time .
In the same report [27] by Dagstuhl, the importance of developing “truly linear-time” FPT algorithms was emphasized – those with running time – to handle large-scale network data. While recent advances have made progress in this direction [11, 12], the area remains relatively underexplored. We believe that our kernelization results also contribute meaningfully to this growing body of work.
A central ingredient in our kernelization framework is a structural result concerning the hereditary graph class , defined as the class of graphs in which every connected component has vertex cover number at most . We show that this class admits a finite forbidden induced subgraph characterization: every minimal obstruction has at most vertices, and the total number of such obstructions is bounded by . This structural result may be of independent interest and could find applications in other graph modification problems where the goal is to decompose a graph into connected components with bounded vertex cover number.
Let denote the set of minimal forbidden induced subgraphs for . Observe that, since each graph has at most vertices, one can design a fixed-parameter tractable (FPT) algorithm for both Vertex Deletion to -VCC and Edge Deletion to -VCC with running time . The idea is to enumerate all subsets of of size at most and check whether any such subset induces a graph in . If so, the algorithm branches on how to intersect the obstruction using at most deletions. A natural question that arises is whether we can identify an obstruction graph more efficiently than by brute-force enumeration. This leads us to the following algorithmic problem.
We design the following algorithm for the Membership or an Obstruction to problem.
Theorem 3.
Let be a graph on vertices. Then, in time , one can either verify that , or output an induced subgraph such that and .
It is important to note that, while the algorithm of Bentert et al. [5] necessarily incurs an dependence in its running time, our results achieve an almost-linear dependence on the input size. At the same time, unless the Exponential Time Hypothesis (ETH) fails, the dependence on cannot be improved to in our setting either. This is because the special case reduces to the classical Vertex Cover problem, which is known to require time under ETH [24].
Finally, we turn to the Edge Deletion to -VCC problem. Observe that for , the problem is solvable in polynomial time: in this case, the task reduces to deleting all edges from the input graph, yielding an edgeless graph, which trivially satisfies the vertex cover condition. This stands in contrast to the classical Vertex Cover problem. However, this tractability does not extend beyond . In fact, one can show that Edge Deletion to -VCC becomes NP-complete already for . Specifically, we can prove that deciding whether one can delete edges to obtain a graph in which every connected component is a star is NP-complete, via a reduction from the Dominating Set problem.
As a final remark, we observe that the above formulation captures a hierarchy of increasingly general problems. When , the vertex-deletion variant reduces to the classical Vertex Cover problem. For , each connected component becomes a star. As increases, the allowed components grow in structural complexity, yet remain bounded by a core of size at most that governs their internal interactions. This hierarchy offers a smooth trade-off between expressiveness and algorithmic tractability.
2 Preliminaries
Graphs.
We use standard graph-theoretic notation, and we refer the reader to Diestel [17] for any undefined terms. We consider simple unweighted undirected graphs. For a positive integer , We use to denote the set . For a graph , we use and to denote the set of vertices and edges of , respectively. For vertices , we write to indicate that . For a graph and a vertex set , we define to be the graph obtained after deleting the set of vertices from . If is singleton, that is , then we use instead of . Similarly, for a subset of edges , we define to be the graph obtained after deleting the set of edges from . If is a singleton, that is , then we use instead of .
For a vertex , we say that is a neighbor of if is an edge in . Further, we denote neighborhood of a vertex in graph as to be the set of vertices that are neighbor to , i.e., we have . For a set of vertices , the neighborhood of is denoted by and defined as . Similarly, for a set of vertices , the closed neighborhood of is denoted by and defined as . We omit the subscript when the graph is clear from the context. For a graph and a vertex , we denote by the set of edges incident to , that is, .
We say that a graph is a subgraph of if and . Further we say that a graph is an induced subgraph of if and . We define the neighborhood of as and denote it by . We say that an edge is an internal edge of if . For a pair of vertices , a path is a sequence of vertices such that for all . The vertices are called the internal vertices of the path. The length of a path is defined as the number of vertices it contains.
The distance between two vertices and in a graph , denoted by , is defined as the length of any shortest path between them in . A subgraph of is called a tree if it is connected and acyclic. We say that is a rooted tree if there is a designated root, and we denote the root of as .
We denote to be the size of a minimum vertex cover of , and alternately we call it as vertex cover number of and to be a minimum vertex cover set of . Further, we denote by the set of connected components in . Note that every connected component of a graph is an induced subgraph of . For a connected component , we define to be the size of a minimum vertex cover of the connected component of .
Further, we define the boundary of a set of vertices in as follows:
Definition 4 (Boundary of a set of vertices [19]).
For a graph and a vertex subset , we define the boundary of as
We refer to as the boundary of . When the graph is clear from context, we omit the subscript and write .
Now we state the well-known Hall’s theorem which we need in Section 3.
Theorem 5 (Hall’s theorem [17]).
Let be a bipartite graph with bipartition and . Then, there exists a matching that matches every vertex in to a distinct vertex in if and only if for every subset , it holds that .
Theorem 6 (Hopcroft-Karp algorithm [22, 16]).
Let be an undirected bipartite graph with bipartition and , on vertices and edges. Then we can find a maximum matching as well as a minimum vertex cover of in time . Furthermore, in time , either we can find a matching saturating , or an inclusion-wise minimal set such that .
Definition 7 (Tree Decomposition [16]).
A tree decomposition of a graph is a pair , where is a tree and each node of is assigned a set , called a bag, such that:
-
1.
For every vertex , there exists at least one bag such that .
-
2.
For every edge , there exists a bag such that .
-
3.
For every vertex , the set of nodes induces a connected subtree of .
Definition 8 (Tree Width [16]).
The width of a tree decomposition is the size of its largest bag minus one, i.e., . The treewidth of , denoted , is the minimum width over all possible tree decompositions of .
It is easy to note that for a graph , we have , where is the size of a minimum vertex cover of [16].
Let denote the maximum over vertex cover number of all the connected components of , that is,
We say that a graph satisfies if every connected component of has vertex cover number at most .
Due to space constraints, the proofs of the results marked will be presented in the full version of the paper.
Parameterized algorithms and Kernelization.
In parameterized complexity, computational problems are analyzed based on two inputs: the main input size and an additional parameter , which typically captures some structural aspect of the instance.
Kernelization is a formal framework for polynomial-time preprocessing in parameterized complexity.
Definition 9 (Kernelization).
Given an instance of a parameterized problem, a kernelization algorithm transforms it in polynomial time into a new instance – called the kernel – such that:
-
the size of is bounded by a function (independent of the original input size), and
-
the new instance is equivalent to the original one; that is, is a YES-instance if and only if is.
If is a polynomial function, then the kernel is referred to as a polynomial kernel.
3 Bounding the Size of Minimal Obstructions and an Algorithm to Compute it
For a non-negative integer , let denote the class of graphs in which every connected component has vertex cover number at most . That is,
Note that is a hereditary graph class – meaning that if , then every induced subgraph of also belongs to .
Our problems, Vertex Deletion to -VCC and Edge Deletion to -VCC, correspond to deleting at most vertices or edges, respectively, to obtain a graph in . A standard approach for designing FPT algorithms or polynomial kernels is to identify a finite forbidden subgraph characterization of the base class . In this section, we pursue this approach. For a fixed integer , we also develop a recognition algorithm that, given a graph , runs in near-linear time and either confirms that , or returns a small induced subgraph that certifies (see Theorem 3).
3.1 Finite Forbidden Characterization
To show that admits a finite forbidden characterization under the induced subgraph relation, we begin by defining the notion of minimal obstructions for hereditary graph classes. A graph is called an obstruction for a graph class if . We now formally define the family of minimal such obstructions.
Definition 10 (Family of Minimal Obstructions).
Let be a hereditary graph class. A graph is a minimal obstruction for under the induced subgraph relation if , but every proper induced subgraph of belongs to . The set of all such minimal obstructions is denoted by , and is defined as:
Here, indicates that is a proper induced subgraph of .
The following lemma establishes that all minimal obstructions for the class are necessarily connected graphs.
Lemma 11.
Let be the graph class defined above. Then every graph is connected.
Proof.
Suppose, for the sake of contradiction, that there exists a graph that is disconnected and has at least two connected components. Since , at least one connected component, say , must have vertex cover number greater than . But then is a proper induced subgraph of and also violates the condition , implying . This contradicts the minimality of , as not all its proper induced subgraphs belong to . Hence, every minimal obstruction must be connected.
Next, we establish several structural properties of the minimal obstructions for . These results will later be combined in Lemma 16 to show that the set of minimal obstructions for is finite and its size is bounded by . Moreover, we show that every such obstruction has a number of vertices linear in . As a first step, we prove that every minimal obstruction for has vertex cover number exactly .
Lemma 12.
Let be defined as above. Let then .
Proof.
By Definition 10, since , we know that . This implies that . To complete the proof, we show that . Suppose, for contradiction, that .
Let be a spanning tree of , and let be a leaf of . Then the graph is connected, since removing a leaf from a spanning tree does not disconnect it. Moreover, . Observe that removing a vertex from a graph can decrease the size of a minimum vertex cover by at most one. Therefore,
This implies that , contradicting the fact that is a minimal obstruction – since all proper induced subgraphs of must belong to .
We conclude that our assumption is false. Hence, , as claimed.
To establish a bound on the size of each graph , we make use of the following well-known result, which relates the vertex cover number and the connected vertex cover number of a connected graph. While the size bound is well known, we are not aware of any existing implementation that computes the corresponding set efficiently in linear time.
Proofs of the lemmata marked with are provided in the full version of the paper.
Lemma 13 ().
Let be a connected graph, and let be a vertex cover of . Then there exists a set with such that the induced subgraph is connected. Moreover, such a set can be computed in time.
Next, our goal is to establish an upper bound on the size of each minimal obstruction in . To achieve this, we first prove the following lemma, which we use in the subsequent lemma to derive the desired size bound on minimal obstructions.
Lemma 14 ().
Let be a connected graph with . Suppose we are given sets , where is a vertex cover of size and is connected. Let , and suppose with . Then there exists a vertex such that remains connected and . Moreover, such a vertex can be found in time .
We are now ready to prove the upper bound on the size of an obstruction.
Lemma 15.
Let be as defined above. Let then .
Proof.
Suppose for contradiction that there exists a graph such that . By the definition of , we have that is connected and .
Let be a minimum vertex cover of , so . The complement is an independent set, and we have:
Since is connected and is a vertex cover, we can apply Lemma 13 to find a subset of size at most such that the subgraph is connected. Let , and observe that is connected and .
Now define . Then
Now, let be an arbitrary subset of size exactly where . Thus, all conditions of Lemma 14 are satisfied: is connected, , is a vertex cover of size , with connected, and with .
By Lemma 14, there exists a vertex such that the graph is connected and . But then is a proper induced subgraph of that is still connected and has vertex cover number . This contradicts the assumption that , since would not be minimal.
Thus, our assumption was false and we must have , completing the proof.
We combine the results obtained above in this section to prove the following lemma.
Lemma 16.
Let be as defined above. Then the obstruction set contains at most graphs. Moreover, every graph has at most vertices.
Proof.
By Lemma 15, every graph has at most vertices and vertex cover number . Therefore, the total number of such obstructions is bounded by the number of connected graphs on at most vertices. This number is at most , yielding the claimed bound on the size of the obstruction set.
3.2 Membership or an Obstruction to
In this section, we present an algorithm for Membership or an Obstruction to parameterized by . In particular, we prove the following result.
Theorem 3. [Restated, see original statement.]
Let be a graph on vertices. Then, in time , one can either verify that , or output an induced subgraph such that and .
For our algorithm, we require the following result, which combines a linear-time kernel for the classical Vertex Cover problem [34] with the currently known fastest parameterized algorithm for Vertex Cover [21].
Proposition 17.
Vertex Cover can be solved in time or on an instance , where .
We use this result as a subroutine in our algorithm for Membership or an Obstruction to , where we need to repeatedly solve Vertex Cover on various induced subgraphs of the input graph. Since the parameter in Membership or an Obstruction to bounds the vertex cover number of each connected component in the target graph, we can apply the proposition with , ensuring that each call to Vertex Cover runs efficiently with respect to the parameter .
Our proof proceeds in the following steps:
- Step 0.
-
We begin by applying a modified breadth-first search (BFS) that either determines that the total number of edges in is at most , or returns a subgraph with .
Observe that if , then the number of edges in is upper bounded by . Indeed, if a connected component has vertex cover number at most , then the number of edges in is at most . Summing this bound over all connected components yields the claimed global bound of on the total number of edges.
In the latter case, when the algorithm returns a subgraph with edges, we can immediately proceed to Step 2. This entire step can be executed in time .
- Step 1.
-
Given a graph , we first use Proposition 17 to check whether . If this test fails, then there exists a connected component such that . From this point onward, we focus on the subgraph , and aim to output an obstruction with . Without loss of generality, we refer to as itself, as we may assume is connected.
- Step 2.
-
In the second step, we identify a connected induced subgraph such that .
- Step 3.
-
Next, we obtain a subgraph by making the argument of Lemma 15 constructive. This guarantees that . However, may not yet belong to .
- Step 4.
-
Finally, by removing additional vertices from , we obtain a graph .
This completes the description of the algorithm for Membership or an Obstruction to . Next, we show the implementation of some of the non-trivial steps below. The implementation of Steps 2, 3 and 4 will be provided in the full version of the paper.
Running time.
The initial graph has at most vertices. In each iteration, we go over all vertices and compute the vertex cover number of . This takes vertex cover computations per step as the number of connected component in is . Since one vertex is removed in each iteration, the total number of steps is at most . Hence, the total running time to implement this step is bounded by .
4 Cubic Vertex Kernel for Vertex Deletion to -VCC
This section presents a polynomial kernel for Vertex Deletion to -VCC parameterized by . As a first step, we compute an -approximate solution for Vertex Deletion to -VCC, which serves as the basis for further reduction.
4.1 Approximation Algorithm for Vertex Deletion to -VCC
Lemma 18.
Let be an instance of Vertex Deletion to -VCC. Then, in time , we can either find a set of size at most such that , or correctly conclude that is a No-instance of Vertex Deletion to -VCC.
Proof.
We iteratively construct a solution of size at most by repeatedly identifying a minimal obstruction using Theorem 3. We also maintain a family of induced subgraphs and a counter . The procedure proceeds as follows:
-
1.
While contains a minimal obstruction and , do:
-
(a)
Add to the solution: ,
-
(b)
Add to the family: ,
-
(c)
Increment the counter: ,
-
(d)
Remove from .
-
(a)
-
2.
If , return that is a No-instance of Vertex Deletion to -VCC.
-
3.
Otherwise, return the set .
Correctness follows from the fact that the obstructions added to are pairwise vertex-disjoint. Since each obstruction is a minimal forbidden induced subgraph for , any feasible solution must delete at least one vertex from each obstruction to ensure that the resulting graph belongs to . Thus, if we identify such disjoint obstructions, any solution must contain at least vertices, implying that is a No-instance.
In the other case, we remove at most disjoint obstructions. By Lemma 15, each obstruction contains at most vertices. Hence, the total number of vertices added to is at most . Moreover, since contains no induced obstruction, we have , and therefore .
Finally, since we invoke Theorem 3 at most times, the overall running time is bounded accordingly. This concludes the proof.
4.2 Marking and Irrelevant Vertices
From this point onward, we assume that we are given a set of size at most such that . We now examine the connected components of the graph . Let denote the set of connected components of . We partition into two subsets based on the size of the components: We now partition the set of connected components based on their size:
-
Let denote the set of singleton components (i.e., isolated vertices).
-
Let denote the set of components with more than one vertex.
We first introduce the following simple reduction rule, which removes all connected components of such that . The safeness of this rule is immediate.
Reduction Rule 1.
Let be a connected component such that every vertex satisfies . In this case, remove from the graph. The resulting instance is .
Next, we aim to bound the number of connected components in that contain more than one vertex, i.e., the components in the set . To achieve this, we make use of the Expansion Lemma.
Definition 19 (-Expansion).
Let be a bipartite graph with vertex bipartition , and let , . We say that has a -expansion into if there exists a collection of edges from to such that:
-
Each vertex in is incident to exactly of these edges, and
-
The endpoints of these edges in are distinct (i.e., no vertex in is incident to more than one of the selected edges).
Equivalently, there exists a collection of pairwise vertex-disjoint matchings from into .
Lemma 20 (Expansion Lemma [19]).
Let be a positive integer, and let be a bipartite graph with vertex bipartition such that:
-
(i)
, and
-
(ii)
There are no isolated vertices in .
Then, there exist nonempty vertex sets and such that:
-
The set has a -expansion into , and
-
No vertex in has a neighbor outside , that is, .
Furthermore, the sets and can be found in time .
To bound the number of non-trivial connected components in , we apply the -Expansion Lemma. Specifically, we aim to upper-bound the number of components in . To this end, we construct a bipartite graph that captures the adjacency between the approximate deletion set and the components in . Formally, the bipartition of is defined as follows:
-
The left side of the bipartition corresponds to the set , and
-
The right side contains one representative vertex for each connected component .
We add an edge between a vertex and a component if and only if is adjacent (in ) to at least one vertex in ; that is, if . We slightly abuse notation by referring to both a connected component and its corresponding vertex in using the same symbol.
Note that by Reduction Rule 1, every component in has at least one neighbor in ; hence, there are no isolated vertices on the right side of the bipartite graph . We now apply the Expansion Lemma (Lemma 20) to this bipartite graph to argue that if is too large relative to , then the instance must be a No-instance. This leads to the following reduction rule:
Reduction Rule 2.
If , then apply the algorithm guaranteed by the Expansion Lemma to compute sets and such that has a -expansion into and . Remove the vertices in from and reduce the parameter accordingly. The resulting instance is .
To establish the correctness of Reduction Rule 2, we prove the following lemma.
Lemma 21 ().
Reduction Rule 2 is safe; that is, the instance of Vertex Deletion to -VCC is a Yes-instance if and only if the reduced instance is a Yes-instance.
After exhaustively applying Reduction Rules 1 and 2, we obtain that the number of non-trivial components in satisfies the bound . However, if we remove these connected components one by one as suggested, the overall running time may no longer remain bounded by .
To address this, we proceed as follows. Let be the subset of vertices in that have at least neighbors on the right side of the bipartite graph , i.e., into . We claim that every solution of size at most must contain all vertices in .
Indeed, suppose for contradiction that there exists a vertex . Since has at least neighbors in , and contains at most vertices, there are at least components in that are completely disjoint from . But then, in the graph , the vertex , along with these components, induces a subgraph that requires a vertex cover of size at least , contradicting the assumption that is a valid solution (i.e., ). This proves that all vertices in must belong to any valid solution of size at most .
Reduction Rule 3.
Let be the subset of vertices in that have at least neighbors on the right side of the bipartite graph , i.e., into . The new instance is .
Since the number of edges in the bipartite graph is upper bounded by the number of edges in , we can compute the set in time . After removing , let denote the set of components that become isolated on the right side of . These correspond to all connected components such that . By applying Reduction Rule 1, we can remove all such components in simultaneously. This step also takes linear time. Following this reduction, the number of remaining non-trivial components – i.e., the size of , which corresponds to the non-isolated vertices on the right side of the updated bipartite graph – is upper bounded by . We now apply Reduction Rule 2 exhaustively on this reduced bipartite graph to further shrink to at most components. The number of times Reduction Rule 2 needs to be applied is at most , and each invocation takes time . Since the number of edges in is at most and the number of vertices is at most , the total time required for this step is bounded by . We summarize our results in the following lemma.
Lemma 22.
Given an instance of Vertex Deletion to -VCC and a set of size at most such that , we can compute, in time , an equivalent instance of Vertex Deletion to -VCC, where is an induced subgraph of , satisfying the following properties:
-
The graph has no connected component such that , and
-
The number of non-trivial connected components (i.e., ) in is at most .
Let be the equivalent instance of Vertex Deletion to -VCC obtained from by application of Lemma 22. For clarity of presentation, we rename back to . We now enhance the set to a new set such that is an independent set; in other words, is a vertex cover of . To this end, observe that each connected component in has a vertex cover of size at most , and by previous reductions, the number of such components is at most . In particular, we get the following.
Lemma 23 ().
Let be an instance of Vertex Deletion to -VCC where has no connected component with vertex cover at most , and let be a set of size at most such that . Then, in time, we can compute a set such that:
-
is a vertex cover of , and
-
.
Next, we apply a marking scheme to bound the size of the independent set obtained after augmenting .
Marking Scheme 1.
The marking procedure is defined as follows:
- (1)
-
For each , mark neighbors of in .
- (2)
-
For each pair , mark common neighbors of and in .
Let be the set of marked vertices. After marking the vertices of we give the following reduction rule.
Reduction Rule 4.
Let be the set of unmarked vertices of . The new instance is .
To prove the safeness of above Reduction 4, we proof the following lemma.
Lemma 24 ().
The instance of Vertex Deletion to -VCC is a Yes-instance if and only if is a Yes-instance of Vertex Deletion to -VCC.
We now describe an implementation of Reduction Rule 4.
For every pair of vertices , we maintain a set , where may be equal to . Along with each such pair, we keep a counter that tracks the size of . Initially, for all , we set and .
Next, we iterate over each vertex and perform the following operations:
-
For every neighbor , if , then add to and increment the counter: .
-
For every unordered pair with , if , then add to and increment the counter: .
Note that since is an independent set, each vertex has all of its neighbors in . By earlier bounds, we know that , and hence . Therefore, for each vertex , we perform at most operations.
As a result, the entire marking procedure (Marking Scheme 1) and application of Reduction Rule 4 can be carried out in total time .
Lemma 25.
Reduction Rule 4 can be applied in time .
The size of the kernel is upper bounded by the number of vertices in and those in the marked sets. Specifically, the total number of vertices is bounded by:
Furthermore, if the number of edges in the reduced graph exceeds , we can safely return a trivial No-instance, as such a graph cannot admit a solution of size at most within the class . The running time of the overall algorithm is computed as follows:
-
Approximate solution: Compute the set , an approximate solution, using Lemma 18, in time .
Hence, the overall running time of the algorithm is
This gives us the following result.
Theorem 1. [Restated, see original statement.]
Vertex Deletion to -VCC admits a kernel with at most vertices and edges. Furthermore, it can be computed in time .
5 Linear Vertex Kernel for Edge Deletion to -Vertex Cover Components
In this section, we design a polynomial kernel for Edge Deletion to -VCC. Our approach follows the same high-level outline as the one used in Section 4. As a first step, we compute an -approximate solution for Edge Deletion to -VCC.
5.1 Approximation Algorithm for Edge Deletion to -VCC
Lemma 26.
Let be an instance of Edge Deletion to -VCC. Then, in time , we can either find a set of size such that , or correctly conclude that is a No-instance of Edge Deletion to -VCC.
Proof.
We iteratively construct a solution of size by repeatedly enumerating edge-disjoint obstructions of bounded size (in number of edges). This is accomplished by identifying a minimal obstruction for induced subgraph relation using Theorem 3. Note that this may not be minimal obstruction with respect to subgraph relation. From Lemma 15, the number of vertices in the obstruction is upper bounded by , hence the number of edges by . We also maintain a family of induced subgraphs and a counter . The procedure proceeds as follows:
-
1.
While contains a minimal obstruction and , do:
-
(a)
Add to the solution: ,
-
(b)
Add to the family: ,
-
(c)
Increment the counter: ,
-
(d)
Remove from .
-
(a)
-
2.
If , return that is a No-instance of Edge Deletion to -VCC.
-
3.
Otherwise, return the set .
Correctness follows from the fact that the obstructions added to are pairwise edge-disjoint. Since each obstruction is a minimal forbidden induced subgraph for , any feasible solution must delete at least one edge from each obstruction to ensure that the resulting graph belongs to . Thus, if we identify such disjoint obstructions, any solution must contain at least edges, implying that is a No-instance.
In the other case, we remove at most disjoint obstructions. By Lemma 15, each obstruction contains at most vertices. Hence, the total number of edges added to is at most . This is indeed an approximate solution for Edge Deletion to -VCC. Moreover, since contains no induced obstruction, we have , and therefore .
Finally, since we invoke Theorem 3 at most times, the overall running time is bounded accordingly. This concludes the proof.
5.2 Marking and Irrelevant Vertices
From this point onward, we assume that we are given a set of size such that . We now examine the connected components of the graph . Let denote the set of connected components of . Note that for each , where , we have . For each , we compute a minimum vertex cover using the best-known FPT algorithm for Vertex Cover parameterized by the solution size , which runs in time (see Proposition 17). We define the set
Let denote the number of edges in that are incident to at least one vertex in . Then is at most , since each edge in can be charged to at most two distinct components of in the summation, and no edge outside contributes to . We now describe a marking scheme that will be used to identify irrelevant vertices, enabling their removal from in order to obtain a kernel for the instance .
Marking Scheme 2.
The marking scheme is defined as follows:
- (1)
-
For all , we do the following. For each pair with , mark common neighbors of and in .
- (2)
-
For all and , mark neighbors of in .
Let be the set of marked vertices obtained from the above marking scheme.
Reduction Rule 5.
Let be the set of unmarked vertices in . The new instance is .
For the correctness of Reduction 5, we state and prove the following lemmata.
Lemma 27 ().
Let be a Yes-instance of Edge Deletion to -VCC, and let be a solution such that . Let be the connected components of . Then there exists a solution of size at most such that:
where denotes the number of edges in that are incident to at least one vertex in .
Finally, we present a lemma that establishes the safeness of the reduction rule.
Lemma 28 ().
is a Yes-instance of Edge Deletion to -VCC if and only if is a Yes-instance of Edge Deletion to -VCC.
Similar to implementation of Marking 1 and Reduction 4, we can implement Marking 2 and Reduction 5 in time .
Lemma 29.
Reduction Rule 5 can be applied in time .
Theorem 2. [Restated, see original statement.]
Edge Deletion to -VCC admits a kernel with at most vertices and edges. Furthermore, it can be computed in time .
Proof.
Let be an instance of Edge Deletion to -VCC. We begin by exhaustively removing all connected components of such that . These components already satisfy the target property and require no further edge deletions. After this preprocessing step, every remaining connected component must require at least one edge deletion. This can be performed in (see Proposition 17).
Next, we apply Reduction 5 to further simplify the instance in time by Lemma 29. We now analyze the structure of the remaining graph and proceed to bound its size.
-
a -sized vertex cover of , for all
-
the vertices of those are marked by Marking 2, for all .
-
.
Finally, we are ready to bound the number of vertices in the reduced graph. For a fixed , Marking 2 marks vertices, i.e., vertices of . So, the total number of marked vertices in one connected component of is at most . Hence for each , there are vertices of in the reduced graph. Summing over all the connected components in , we get that the number of vertices in the reduced graph is
We now proceed to bound the number of edges. Let be a connected component of . There are at most edges in the vertex cover of of size at most . In , is an independent set and hence there are no edges in . Because of the reduction rule, the number of vertices in is at most , and therefore the number of edges between and is at most . So the total number of edges in component is at most . Hence, the total number of edges in the graph is , which is (since ). Hence, the number of edges is bounded by . This concludes the theorem.
6 Conclusion
In this paper, we designed an almost-linear-time kernelization algorithm for clustering into components with bounded vertex cover number. An interesting direction for future work is to extend this framework to other notions of structural simplicity in clusters – such as bounded feedback vertex number or bounded odd cycle transversal.
Another natural question is whether our algorithm can be further optimized to achieve fully linear-time preprocessing. Additionally, designing a kernel for the vertex-deletion variant whose size is linear in the number of vertices remains an intriguing open problem.
References
- [1] Tara Abrishami, Maria Chudnovsky, Cemil Dibek, and Pawel Rzazewski. Polynomial-time algorithm for maximum independent set in bounded-degree graphs with no long induced claws. In Joseph (Seffi) Naor and Niv Buchbinder, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, pages 1448–1470. SIAM, 2022. doi:10.1137/1.9781611977073.61.
- [2] Faisal N. Abu-Khzam. An improved kernelization algorithm for r-set packing. Inf. Process. Lett., 110(16):621–624, 2010. doi:10.1016/J.IPL.2010.04.020.
- [3] Faisal N. Abu-Khzam. A kernelization algorithm for d-hitting set. J. Comput. Syst. Sci., 76(7):524–531, 2010. doi:10.1016/J.JCSS.2009.09.002.
- [4] Noga Alon, Asaf Shapira, and Benny Sudakov. Additive approximation for edge-deletion problems. Annals of Mathematics, 170(1):371–411, 2009. URL: http://www.jstor.org/stable/40345467.
- [5] Matthias Bentert, Michael R. Fellows, Petr A. Golovach, Frances A. Rosamond, and Saket Saurabh. Breaking a graph into connected components with small dominating sets. In Rastislav Královic and Antonín Kucera, editors, 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024, August 26-30, 2024, Bratislava, Slovakia, volume 306 of LIPIcs, pages 24:1–24:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.MFCS.2024.24.
- [6] Hans L. Bodlaender, Pinar Heggernes, and Daniel Lokshtanov. Graph modification problems (dagstuhl seminar 14071). Dagstuhl Reports, 4(2):38–59, 2014. doi:10.4230/DAGREP.4.2.38.
- [7] Andreas Brandstädt, Van Bang Le, and Jeremy P. Spinrad. Graph Classes: A Survey. Society for Industrial and Applied Mathematics, 1999. doi:10.1137/1.9780898719796.
- [8] Jonathan F. Buss and Judy Goldsmith. Nondeterminism within P. SIAM J. Comput., 22(3):560–572, 1993. doi:10.1137/0222038.
- [9] Yixin Cao and Dániel Marx. Interval deletion is fixed-parameter tractable. ACM Trans. Algorithms, 11(3):21:1–21:35, 2015. doi:10.1145/2629595.
- [10] Yixin Cao and Dániel Marx. Chordal editing is fixed-parameter tractable. Algorithmica, 75(1):118–137, 2016. doi:10.1007/S00453-015-0014-X.
- [11] Jianer Chen, Qin Huang, Iyad Kanj, and Ge Xia. Near-optimal algorithms for point-line covering problems. In Petra Berenbrink and Benjamin Monmege, editors, 39th International Symposium on Theoretical Aspects of Computer Science, STACS 2022, March 15-18, 2022, Marseille, France (Virtual Conference), volume 219 of LIPIcs, pages 21:1–21:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.STACS.2022.21.
- [12] Jianer Chen, Qin Huang, Iyad Kanj, and Ge Xia. Nearly time-optimal kernelization algorithms for the line-cover problem with big data. Algorithmica, 86(8):2448–2478, 2024. doi:10.1007/S00453-024-01231-6.
- [13] Jianer Chen, Iyad A. Kanj, and Weijia Jia. Vertex cover: Further observations and further improvements. J. Algorithms, 41(2):280–301, 2001. doi:10.1006/JAGM.2001.1186.
- [14] Benny Chor, Mike Fellows, and David W. Juedes. Linear kernels in linear time, or how to save k colors in o(n) steps. In Juraj Hromkovic, Manfred Nagl, and Bernhard Westfechtel, editors, Graph-Theoretic Concepts in Computer Science, 30th International Workshop,WG 2004, Bad Honnef, Germany, June 21-23, 2004, Revised Papers, volume 3353 of Lecture Notes in Computer Science, pages 257–269. Springer, 2004. doi:10.1007/978-3-540-30559-0_22.
- [15] Christophe Crespelle, Pål Grønås Drange, Fedor V. Fomin, and Petr A. Golovach. A survey of parameterized algorithms and the complexity of edge modification. Comput. Sci. Rev., 48:100556, 2023. doi:10.1016/J.COSREV.2023.100556.
- [16] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. doi:10.1007/978-3-319-21275-3.
- [17] Reinhard Diestel. Graph Theory, 5th Edition, volume 173 of Graduate texts in mathematics. Springer, 2017. doi:10.1007/978-3-662-53622-3.
- [18] Michael R. Fellows. Blow-ups, win/win’s, and crown rules: Some new directions in FPT. In Hans L. Bodlaender, editor, Graph-Theoretic Concepts in Computer Science, 29th International Workshop, WG 2003, Elspeet, The Netherlands, June 19-21, 2003, Revised Papers, volume 2880 of Lecture Notes in Computer Science, pages 1–12. Springer, 2003. doi:10.1007/978-3-540-39890-5_1.
- [19] F.V. Fomin, D. Lokshtanov, S. Saurabh, and M. Zehavi. Kernelization: Theory of Parameterized Preprocessing. Cambridge University Press, 2019.
- [20] Martin Charles Golumbic. Algorithmic graph theory and perfect graphs. Elsevier, 2004.
- [21] David G. Harris and N. S. Narayanaswamy. A faster algorithm for vertex cover parameterized by solution size. In Olaf Beyersdorff, Mamadou Moustapha Kanté, Orna Kupferman, and Daniel Lokshtanov, editors, 41st International Symposium on Theoretical Aspects of Computer Science, STACS 2024, March 12-14, 2024, Clermont-Ferrand, France, volume 289 of LIPIcs, pages 40:1–40:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.STACS.2024.40.
- [22] John E. Hopcroft and Richard M. Karp. An n algorithm for maximum matchings in bipartite graphs. SIAM J. Comput., 2(4):225–231, 1973. doi:10.1137/0202019.
- [23] Falk Hüffner, Christian Komusiewicz, Hannes Moser, and Rolf Niedermeier. Fixed-parameter algorithms for cluster vertex deletion. Theory Comput. Syst., 47(1):196–217, 2010. doi:10.1007/S00224-008-9150-X.
- [24] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001. doi:10.1006/JCSS.2001.1774.
- [25] Bart M. P. Jansen and Hans L. Bodlaender. Vertex cover kernelization revisited - upper and lower bounds for a refined parameter. Theory Comput. Syst., 53(2):263–299, 2013. doi:10.1007/S00224-012-9393-4.
- [26] Bart M. P. Jansen, Daniel Lokshtanov, and Saket Saurabh. A near-optimal planarization algorithm. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1802–1811. SIAM, 2014. doi:10.1137/1.9781611973402.130.
- [27] George Karypis, Christian Schulz, Darren Strash, Deepak Ajwani, Rob H. Bisseling, Katrin Casel, Ümit V. Çatalyürek, Cédric Chevalier, Florian Chudigiewitsch, Marcelo Fonseca Faraj, Michael R. Fellows, Lars Gottesbüren, Tobias Heuer, Kamer Kaya, Jakub Lacki, Johannes Langguth, Xiaoye Sherry Li, Ruben Mayer, Johannes Meintrup, Yosuke Mizutani, François Pellegrini, Fabrizio Petrini, Frances A. Rosamond, Ilya Safro, Sebastian Schlag, Roohani Sharma, Blair D. Sullivan, Bora Uçar, and Albert-Jan Yzelman. Recent trends in graph decomposition (dagstuhl seminar 23331). Dagstuhl Reports, 13(8):1–45, 2023. doi:10.4230/DAGREP.13.8.1.
- [28] Ken-ichi Kawarabayashi and Bruce A. Reed. Computing crossing number in linear time. In David S. Johnson and Uriel Feige, editors, Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pages 382–390. ACM, 2007. doi:10.1145/1250790.1250848.
- [29] Christian Komusiewicz and Johannes Uhlmann. Cluster editing with locally bounded modifications. Discret. Appl. Math., 160(15):2259–2270, 2012. doi:10.1016/J.DAM.2012.05.019.
- [30] John M. Lewis and Mihalis Yannakakis. The node-deletion problem for hereditary properties is np-complete. J. Comput. Syst. Sci., 20(2):219–230, 1980. doi:10.1016/0022-0000(80)90060-4.
- [31] Dániel Marx. Parameterized graph separation problems. Theor. Comput. Sci., 351(3):394–406, 2006. doi:10.1016/j.tcs.2005.10.007.
- [32] N. S. Narayanaswamy, Venkatesh Raman, M. S. Ramanujan, and Saket Saurabh. LP can be a cure for parameterized problems. In Christoph Dürr and Thomas Wilke, editors, 29th International Symposium on Theoretical Aspects of Computer Science, STACS 2012, February 29th - March 3rd, 2012, Paris, France, volume 14 of LIPIcs, pages 338–349. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPICS.STACS.2012.338.
- [33] George L. Nemhauser and Leslie E. Trotter Jr. Properties of vertex packing and independence system polyhedra. Math. Program., 6(1):48–61, 1974. doi:10.1007/BF01580222.
- [34] René van Bevern. Towards optimal and expressive kernelization for d-hitting set. Algorithmica, 70(1):129–147, 2014. doi:10.1007/S00453-013-9774-3.
- [35] Mihalis Yannakakis. Edge-deletion problems. SIAM J. Comput., 10(2):297–309, 1981. doi:10.1137/0210021.
