A randomized polynomial kernel for Subset Feedback Vertex Set

The Subset Feedback Vertex Set problem generalizes the classical Feedback Vertex Set problem and asks, for a given undirected graph $G=(V,E)$, a set $S \subseteq V$, and an integer $k$, whether there exists a set $X$ of at most $k$ vertices such that no cycle in $G-X$ contains a vertex of $S$. It was independently shown by Cygan et al. (ICALP '11, SIDMA '13) and Kawarabayashi and Kobayashi (JCTB '12) that Subset Feedback Vertex Set is fixed-parameter tractable for parameter $k$. Cygan et al. asked whether the problem also admits a polynomial kernelization. We answer the question of Cygan et al. positively by giving a randomized polynomial kernelization for the equivalent version where $S$ is a set of edges. In a first step we show that Edge Subset Feedback Vertex Set has a randomized polynomial kernel parameterized by $|S|+k$ with $O(|S|^2k)$ vertices. For this we use the matroid-based tools of Kratsch and Wahlstr\"om (FOCS '12) that for example were used to obtain a polynomial kernel for $s$-Multiway Cut. Next we present a preprocessing that reduces the given instance $(G,S,k)$ to an equivalent instance $(G',S',k')$ where the size of $S'$ is bounded by $O(k^4)$. These two results lead to a polynomial kernel for Subset Feedback Vertex Set with $O(k^9)$ vertices.


Introduction
In the subset feedback vertex set (subset fvs) problem we are given an undirected graph $G = (V, E)$, a set of vertices $S \subseteq V$, and an integer $k$, and have to determine whether there is a set $X$ of at most $k$ vertices that intersects all cycles that contain at least one vertex of $S$. Clearly, because we can choose $S = V$, this is a generalization of the well-studied feedback vertex set (fvs) problem where, given $G$ and $k$, we have to determine whether some set $X$ of at most $k$ vertices intersects all cycles in $G$. Feedback vertex set has been extensively studied in parameterized complexity: It is known to be fixed-parameter tractable (FPT) with parameter $k$, i.e., solvable in time $f(k) \cdot |V|^c$, and after a series of improvements the fastest known algorithms take deterministic time $O^*(3.619^k)$ [10] and randomized time $O^*(3^k)$ [2]. It is also known to admit a polynomial kernelization [1], i.e., there is an efficient algorithm that reduces any instance $(G, k)$ of fvs to an equivalent instance of size polynomial in $k$; the best known kernelization creates an equivalent instance with $O(k^2)$ vertices [19].
In 2011, Cygan et al. [3] and Kawarabayashi and Kobayashi [9] independently showed that subset fvs is FPT. The algorithm of Cygan et al. runs in time $2^{O(k \log k)} n^{O(1)}$, while the one of Kawarabayashi and Kobayashi runs in time $O(f(k) \cdot n^2 m)$. Wahlström [21] then gave the first single-exponential algorithm with running time $4^k \cdot n^{O(1)}$; an algorithm with subexponential dependence on $k$ is ruled out under the Exponential-Time Hypothesis (e.g., because subset fvs generalizes vertex cover). More recently, Lokshtanov et al. [12] gave algorithms with deterministic time $2^{O(k \log k)} \cdot (n + m)$ and randomized time $O(25.6^k \cdot (n + m))$.
Cygan et al. [3] ask whether the subset fvs problem also admits a polynomial kernelization and suggest that the matroid-based tools of Kratsch and Wahlström [11] could be applicable. The latter work uses representative sets of independent sets in matroids to obtain, amongst others, polynomial kernels for $s$-multiway cut and deletable terminal multiway cut (dtmwc) with $O(k^{s+1})$ and $O(k^3)$ vertices, respectively. In multiway cut we are given a graph $G = (V, E)$, a set $T \subseteq V$ of terminals, and an integer $k$ and have to determine whether deletion of at most $k$ non-terminal vertices separates all terminals. In $s$-multiway cut the terminal set has size at most $s$, and in dtmwc we are also allowed to delete terminals (which is essentially the same as restricting terminals to be degree one).
Interestingly, Cygan et al. [3] also provide a polynomial-time reduction from multiway cut to subset fvs that does not change the parameter value and, hence, is known to imply that subset fvs is at least as hard as multiway cut regarding existence of polynomial kernels. Accordingly, multiway cut would be the natural next target problem for attempting to find a polynomial kernelization (after s-multiway cut and deletable terminal multiway cut). It appears, however, that the reduction of Cygan et al. is from deletable terminal multiway cut rather than from the more general multiway cut, and it is not obvious whether similar ideas could yield a reduction from multiway cut to subset fvs.
Our work. We apply the matroid-based tools of Kratsch and Wahlström [11] and develop a randomized polynomial kernelization that reduces instances $(G, S, k)$ of subset fvs to equivalent instances with at most $O(k^9)$ vertices; this is our main result. Similarly to Cygan et al. [3] we also work on edge subset fvs where $S$ is a set of edges of $G$ and $X$ needs to intersect all cycles that contain at least one edge of $S$; edge subset fvs and subset fvs are equivalent [3]. The result is obtained in two parts.
In the first part (Section 3) we establish a randomized polynomial kernelization for edge subset fvs parameterized by $|S| + k$ that reduces to equivalent instances with at most $O(|S|^2 k)$ vertices. Note that nontrivial instances have $k < |S|$ since one could otherwise remove $S$ by deleting one endpoint of each edge in $S$. Thus, parameterization by $|S|$ suffices, but $O(|S|^2 k)$ gives a tighter overall bound than $O(|S|^3)$.
At high level, this part is similar to the polynomial kernelization for deletable terminal multiway cut. We show that certain solutions X, later called dominant solutions, allow particular path packings in the underlying graph G. For deletable terminal multiway cut this is achieved by a fairly simple replacement argument for solutions X that are not sufficiently well connected to connected components of G − X. For edge subset fvs the endpoints T = V (S) of edges in S can be regarded as terminals, but this gives a different separation property: Solutions X need not generate many connected components in G − X since only S-cycles need to be prevented, and components may contain many vertices of T . Rather, in G − X there must be a tree-like (or forest-like) structure with components without S-edges playing the role of nodes and with edges given by S. Nevertheless, using the tree-like structure, a replacement argument can be found, implying that dominant solutions must create many components in (G − X) − S containing vertices of T and be well connected to them. This allows to set up a gammoid on G − S with sources T and apply, as in [11], a result of Lovász [13] (made algorithmic by Marx [15]) on representative sets in (linear) matroids that is then guaranteed to generate a superset of X. Randomization is only needed to generate a matrix representation for the gammoid.
In the second part (Section 4) we give a (deterministic) polynomial-time preprocessing that, given an instance $(G, S, k)$ of edge subset fvs, returns an equivalent instance $(G', S', k')$ with $k' \le k$ and $|S'| \in O(k^4)$. Together with the randomized kernelization from the first part, which yields $O(|S'|^2 k') = O(k^8 \cdot k)$ vertices, this implies the claimed randomized kernelization to $O(k^9)$ vertices.
A reduction of the number of $S$-edges is also a crucial ingredient in the FPT algorithm for edge subset fvs by Cygan et al. [3]. They achieve $|S| \in O(k^3)$, but in a slightly more favorable setting: Using iterative compression, it suffices to solve the task of finding a solution $X'$ of size $k$ when given a solution $X$ of size $k + 1$. (This is well known in parameterized complexity, and we prefer not to repeat it here.) Considering some unknown solution $X'$ of size $k$, one can guess the intersection $D$ of $X'$ with $X$ by trying all $O(2^{k+1})$ possibilities. For the correct guess $D = X' \cap X$, the remaining problem is to find for $(G - D, S \setminus D, k - |D|)$ a solution $Z'$ of size at most $k - |D|$ that is disjoint from $Z = X \setminus D$, since $Z' = X' \setminus D$ would be such a solution; here $S \setminus D$ denotes the set of edges in $S$ with no endpoint in $D$. Cygan et al. make the nice observation that the guessing also allows us to assume that there is no other solution $X'$ with an even larger intersection with $X$.
In contrast, we cannot afford to run iterative compression for a kernelization to get a starting solution of size $k + 1$ and, as is common, we have to start with an approximate solution $Z$, which can be assumed to be of size at most $8k$ using an 8-approximation algorithm of Even et al. [6]. The idea of guessing the intersection of an optimal solution with $Z$ is infeasible regarding both time and the number of created instances. Thus, while several structures like $z$-flowers or disjoint $x,y$-paths containing $S$-edges appear in both approaches, many things have to be handled differently. For example, having $k + 2$ disjoint $x,y$-paths containing $S$-edges for $x, y \in Z$ implies that one of $x$ and $y$ must be in every solution of size $k$; Cygan et al. can stop here because the solution would not be disjoint from $Z$; we need to instead store the information about $x$ and $y$ to later detect $S$-edges that can be safely removed. Like Cygan et al., we also use Gallai's A-path Theorem but we avoid the 2-expansion lemma by using the properties of a blocking set of size at most $2k$ differently. (Such a blocking set can be found if certain flowers of order $k + 1$ do not exist, using Gallai's A-path Theorem.) Cygan et al. compute a blocking set $B$ of size at most $3k$ to find an $F$-flower of order $|X|$ (with $F \subseteq V$ outer-abundant; see [3, Definition 3.4]) under the assumption that certain $F$-flowers of order $k + 1$ do not exist and show that there exists a solution that contains $X$ (under the assumption that there exists a solution that is disjoint from $F$). We cannot assume that our solution is disjoint from $F$ and have to take another approach. Moreover, we observe that $z$-flowers can be found via matroid parity on an appropriate gammoid.

Preliminaries
We use standard graph notation, mostly following Diestel [5]. All graphs are undirected and may contain multi-edges and loops; accordingly, they may contain cycles of length one and two (formed by loops and multi-edges, respectively). An edge $e \in E$ is called a bridge if $(V, E \setminus \{e\})$ has more connected components than $G$. For a set $X \subseteq V$, let $G[X]$ denote the subgraph of $G$ induced by $X$ and let $N_G(X)$ denote the neighborhood of $X$ in $G$, i.e., $N_G(X) = \{v \in V \setminus X \mid \exists u \in X : \{u, v\} \in E\}$. Given two sets $X, Y \subseteq V$, by $E(X, Y)$ we denote the set of edges that have one endpoint in $X$ and one endpoint in $Y$. For a set $E' \subseteq E$ of edges let $V(E')$ be the set of vertices that are incident with at least one edge in $E'$. For $X \subseteq V$ and $F \subseteq E$ let $G - X$ denote the graph $G[V \setminus X]$ and let $G - F$ denote the graph $(V, E \setminus F)$. Note that the graph $(G - X) - F$ is the same graph as the graph $(G - F) - X$ and we will drop the parentheses.
For A ⊆ V a path with endpoints in A and internal vertices not in A is called an A-path. The following theorem about A-paths was already used by Cygan et al. [3] for subset fvs and in the quadratic kernelization for feedback vertex set by Thomassé [19].
Theorem 1 (Gallai [8]). Let $A \subseteq V$ and $k \in \mathbb{N}$. If the maximum number of vertex-disjoint A-paths is strictly less than $k + 1$, then there exists a set $B \subseteq V$ of at most $2k$ vertices that intersects every A-path.
In particular it is possible to find in polynomial time either $k + 1$ vertex-disjoint A-paths or a set $B$ of at most $2k$ vertices that intersects all A-paths. This follows from Schrijver's proof of Gallai's theorem [18].
Let $(G, S, k)$ be an instance of the edge subset fvs problem. We call a cycle $C$ an $S$-cycle if at least one edge of $S$ is contained in $C$. Let $x$ be a vertex of $V$. A set $\{C_1, C_2, \ldots, C_t\}$ of $S$-cycles that contain $x$ is called an $x$-flower of order $t$ if the sets of vertices $C_i \setminus \{x\}$ are pairwise disjoint. Note that if there exists an $x$-flower of order at least $k + 1$, then the vertex $x$ must be in every solution for $(G, S, k)$, if one exists. A set $B \subseteq V \setminus \{x\}$ of size $t$ is called an $x$-blocker of size $t$ if each $S$-cycle through $x$ also contains at least one vertex of $B$.
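To make the definition of a solution concrete, the following is a minimal sketch (not taken from the paper) of a checker for edge subset fvs. It relies on the observation that an edge of $S$ lies on a cycle of $G - X$ exactly when its endpoints stay connected in $G - X$ after removing that one edge. The edge-list encoding and the names `reachable` and `hits_all_s_cycles` are assumptions made for illustration.

```python
def reachable(edges, banned_vertices, skip_edge, source, target):
    """Search the multigraph `edges` (list of (u, v) pairs), ignoring all
    vertices in `banned_vertices` and the single edge with index `skip_edge`."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        if idx == skip_edge or u in banned_vertices or v in banned_vertices:
            continue
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [source], {source}
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def hits_all_s_cycles(edges, s_edge_indices, X):
    """Return True if no cycle of G - X contains an edge of S.
    `s_edge_indices` lists the positions of the S-edges in `edges`."""
    X = set(X)
    for idx in s_edge_indices:
        u, v = edges[idx]
        if u in X or v in X:
            continue  # the S-edge itself is deleted together with its endpoint
        # A loop (u == v) is a cycle of length one; a parallel edge gives length two.
        if reachable(edges, X, idx, u, v):
            return False
    return True
```

Edges are addressed by index so that parallel edges and loops, which the paper allows, are handled as distinct edges.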
Parameterized complexity. A parameterized problem is a language $Q \subseteq \Sigma^* \times \mathbb{N}$, where $\Sigma$ is any finite set. The second component of an instance $(x, k)$ is called the parameter. We say that a parameterized problem $Q$ is fixed-parameter tractable (FPT) if there exists a computable function $f : \mathbb{N} \to \mathbb{N}$ and an algorithm $A$ that on input of $(x, k) \in \Sigma^* \times \mathbb{N}$ takes time at most $f(k) \cdot |x|^{O(1)}$ and correctly decides whether $(x, k) \in Q$. A kernelization of a parameterized problem $Q$ is an algorithm $K$ that on input of $(x, k) \in \Sigma^* \times \mathbb{N}$ takes time polynomial in $|x| + k$ and returns an equivalent instance $(x', k')$ with $|x'|, k' \le h(k)$, where $h$ is a computable function. The function $h$ is called the size of the kernel. We say that $K$ is a polynomial kernelization if $h(k) \in O(k^c)$ for some constant $c$. The polynomial kernelization obtained in this paper is randomized, which means that there is a small chance for the reduced instance to not be equivalent to the input. The error probability can be made exponentially small in the input size without increasing the size of the kernelization. Similarly to previous work [11], the only source for error is the need to compute a matrix representation for a particular matroid (preliminaries on matroids follow below).
Matroids, gammoids, and representative sets. A matroid $M = (U, \mathcal{I})$ consists of a finite set $U$ and a family $\mathcal{I}$ of subsets of $U$, called independent sets, fulfilling the following properties: (i) $\emptyset \in \mathcal{I}$; (ii) if $X \subseteq Y$ and $Y \in \mathcal{I}$ then also $X \in \mathcal{I}$; and (iii) if $X, Y \in \mathcal{I}$ with $|X| < |Y|$ then there exists $y \in Y \setminus X$ such that $X \cup \{y\} \in \mathcal{I}$. The rank of a matroid $M$, denoted by $r(M)$, is the size of the largest independent set of the matroid $M$.
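As an illustration of the three matroid properties, the following brute-force sketch checks them for an explicitly listed family of sets. It is only meant for tiny examples; the function names `is_matroid` and `matroid_rank` are hypothetical and not part of the paper.

```python
from itertools import combinations


def is_matroid(independent_sets):
    """Check properties (i)-(iii) for an explicitly given family of sets."""
    family = {frozenset(s) for s in independent_sets}
    # (i) the empty set is independent
    if frozenset() not in family:
        return False
    # (ii) every subset of an independent set is independent
    for Y in family:
        for r in range(len(Y)):
            if any(frozenset(X) not in family for X in combinations(Y, r)):
                return False
    # (iii) exchange property: a smaller set can be extended from a larger one
    for X in family:
        for Y in family:
            if len(X) < len(Y) and not any(X | {y} in family for y in Y - X):
                return False
    return True


def matroid_rank(independent_sets):
    """Size of a largest independent set."""
    return max(len(s) for s in independent_sets)
```

For example, all subsets of size at most two of a four-element set form a (uniform) matroid of rank two, whereas the family consisting of the empty set, three singletons, and a single pair violates property (iii).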
Let $A$ be a matrix over an arbitrary field $F$. Let $U$ be the set of columns of $A$ and let $\mathcal{I}$ be the family of all sets $X \subseteq U$ of columns that are linearly independent over $F$. Then $M = (U, \mathcal{I})$ is a matroid, called the linear matroid or vector matroid of $A$, and we say that $A$ represents $M$. If $M = (U, \mathcal{I})$ is representable over some field, then it is also representable by an $r(M) \times |U|$ matrix; by Gaussian elimination we can always reduce a representing matrix for $M$ to one with $r(M)$ many rows (cf. [15]). Let $M_1 = (U_1, \mathcal{I}_1)$ and $M_2 = (U_2, \mathcal{I}_2)$ be two matroids with $U_1 \cap U_2 = \emptyset$. Their direct sum $M_1 \oplus M_2$ is the matroid $(U_1 \cup U_2, \mathcal{I})$ with $\mathcal{I} = \{X_1 \cup X_2 \mid X_1 \in \mathcal{I}_1, X_2 \in \mathcal{I}_2\}$. If $A_1$ and $A_2$ represent the two matroids over the same field $F$, then the block-diagonal matrix $\mathrm{diag}(A_1, A_2)$ represents $M_1 \oplus M_2$.
Let $G = (V, E)$ be a graph that may have both directed and undirected edges and let $S \subseteq V$. A set $T \subseteq V$ is linked to $S$ if there exist $|T|$ vertex-disjoint paths from $S$ to $T$. Thus every vertex in $T$ is the endpoint of a different path from $S$. It holds that $M = (U, \mathcal{I})$, where $U \subseteq V$ and $\mathcal{I}$ contains all sets $T \subseteq U$ that are linked to $S$ in $G$, is a matroid [17]. The matroid $M$ is also called the gammoid on $G$ with sources $S$ and ground set $U$; if $U = V$ then $M$ is also called a strict gammoid. Marx [15] gave a randomized polynomial-time procedure for finding a matrix representation of a strict gammoid. The error probability can be made exponentially small in the size of the graph. (This is the only source of randomness and error in our kernelization.) A matrix representation for a gammoid for graph $G = (V, E)$ with ground set $U \subsetneq V$ and sources $S$ can be obtained from one for the strict gammoid for $G$ and $S$ by simply deleting columns corresponding to elements of $V \setminus U$.
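Independence in a gammoid can be tested directly from the definition of linkedness. The sketch below does this with a standard unit-capacity maximum flow on a vertex-split network. This is a textbook technique chosen for illustration, not the matrix-based representation of Marx [15] used in the paper. Undirected edges are assumed to be given as two opposite arcs, and the names `max_vertex_disjoint_paths` and `is_linked` are hypothetical.

```python
from collections import defaultdict, deque


def max_vertex_disjoint_paths(arcs, sources, targets):
    """Maximum number of vertex-disjoint paths that start in `sources` and end
    in distinct vertices of `targets`; `arcs` is a list of directed pairs."""
    cap = defaultdict(lambda: defaultdict(int))

    def add(u, v):
        cap[u][v] += 1
        cap[v][u] += 0  # make sure the residual arc exists

    vertices = set(sources) | set(targets)
    for u, v in arcs:
        vertices.update((u, v))
    for v in vertices:
        add(("in", v), ("out", v))  # every vertex may be used by one path only
    for u, v in arcs:
        add(("out", u), ("in", v))
    for s in set(sources):
        add("SRC", ("in", s))
    for t in set(targets):
        add(("out", t), "SNK")

    flow = 0
    while True:
        # breadth-first search for an augmenting path in the residual network
        parent = {"SRC": None}
        queue = deque(["SRC"])
        while queue and "SNK" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "SNK" not in parent:
            return flow
        v = "SNK"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def is_linked(arcs, sources, targets):
    """True if `targets` is linked to `sources`, i.e., independent in the gammoid."""
    return max_vertex_disjoint_paths(arcs, sources, targets) == len(set(targets))
```

A vertex that is both a source and a target is linked via a path of length zero, matching the trivial paths used later in the text.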
Let $A, B$ be independent sets in a matroid. We say that $A$ extends $B$ if $A \cap B = \emptyset$ and $A \cup B$ is again an independent set. Note that the independence of $A \cup B$ implies the independence of $A$ and $B$ due to the second matroid property.
Definition 1. Let $M = (U, \mathcal{I})$ be a matroid, let $\mathcal{A} \subseteq \mathcal{I}$, and let $q \in \mathbb{N}$. A set $\mathcal{A}' \subseteq \mathcal{A}$ is $q$-representative for $\mathcal{A}$ if for every independent set $B$ of size at most $q$ there is a set $A \in \mathcal{A}$ that extends $B$ if and only if there is also a set $A' \in \mathcal{A}'$ that extends $B$.
Observe that if $\mathcal{A}'$ is $q$-representative for $\mathcal{A}$ and there exists a set $A \in \mathcal{A}$ that uniquely extends some given independent set $I$ of size at most $q$, i.e., $A$ is the only set in $\mathcal{A}$ that extends $I$, then this implies that $A \in \mathcal{A}'$.
The following theorem of Lovász [13] proves that for any linear matroid there exist small representative sets. It was made algorithmic by Marx [15] and, thus, permits to find representative sets in polynomial time when given a matrix representation of the matroid. A faster algorithm for this task was developed recently by Fomin et al. [7].
Lemma 1 (Lovász [13], Marx [15]). Let $M$ be a linear matroid of rank $q + p$, and let $\mathcal{T} = \{I_1, I_2, \ldots, I_t\}$ be a collection of independent sets, each of size $p$. If $|\mathcal{T}| > \binom{q+p}{p}$, then there is a set $I \in \mathcal{T}$ such that $\mathcal{T} \setminus \{I\}$ is $q$-representative for $\mathcal{T}$. Furthermore, given a representation $A$ of $M$, we can find such a set $I$ in polynomial time.
Given a gammoid $M$ we can compute in randomized polynomial time a representation of the gammoid. Together with Lemma 1 it follows that given a gammoid $M$ and a collection $\mathcal{T} = \{I_1, \ldots, I_t\}$ of independent sets, each of size $p$, we can find in randomized polynomial time a set $\mathcal{T}' \subseteq \mathcal{T}$ of size at most $\binom{q+p}{p}$ that is $q$-representative for $\mathcal{T}$.
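For a feeling of how the size bound of Lemma 1 behaves, the binomial coefficient can simply be evaluated. The helper name `representative_set_bound` is hypothetical; for constant $p$ the bound is a polynomial of degree $p$ in the rank $q + p$.

```python
from math import comb


def representative_set_bound(q, p):
    """Upper bound binom(q + p, p) on the size of a q-representative family
    of independent sets of size p in a linear matroid of rank q + p."""
    return comb(q + p, p)
```

For instance, for sets of size $p = 3$ in a matroid of rank 10 the bound evaluates to $\binom{10}{3} = 120$.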

Randomized polynomial kernelization for parameter |S| + k
In this section we present a randomized polynomial kernelization for edge subset fvs parameterized by |S| + k. Because deletion of one endpoint of each edge in S always constitutes a feasible solution, nontrivial instances have |S| > k. Thus, our kernelization also works for parameter |S| alone. However, to achieve a better bound for edge subset fvs parameterized by k only it is beneficial to give the kernel size in terms of |S| and k rather than |S| alone.
We use representative sets of independent sets of matroids to obtain a kernel of size $O(|S|^2 k)$. Our approach is similar to the kernelization of deletable terminal multiway cut($k$) [11]. As in that paper we construct path packings such that certain vertices can be shown to be in a representative set. Note that, unlike for multiway cut-type problems, a solution $X \subseteq V$ will not necessarily create many connected components. Rather, as used also in the FPT algorithm of Cygan et al. [3], it creates a particular tree-like structure in $G - X$. Nevertheless, endpoints of edges in $S$, denoted $T := V(S)$, will play the role of terminals that need to be separated in a certain way; hence a vertex $x$ in $T$ is called a terminal. We will focus on the graph $G - S$, i.e., with edges of $S$ deleted, in which a solution $X$ creates a grouping of (not deleted) terminals into connected components. The structure of these components will be crucial for a replacement argument (Lemma 3) that leads to the required path packing; this constitutes one of the key arguments for our result.
The kernelization consists of four steps. In the first step we show that if an instance is YES then there exists a solution $X$ with a certain path packing from $T$ to $X$. Then we define an appropriate gammoid to find in a next step a representative set of size $O(|S|^2 k)$ which is (essentially) a superset of $X$ using Lemma 1. Finally we explain how to reduce the graph $G$, using the superset of the last step, to obtain an equivalent instance of edge subset fvs.
Analyzing solutions. Let $(G, S, k)$ be a yes-instance of edge subset fvs($k + |S|$). We say that a solution $X$ for $(G, S, k)$ is dominant if it has minimum size and contains a maximum number of vertices from $T$ among solutions of minimum size. The vertices in $X \cap T$ correspond to endpoints of edges in $S$ that we delete and the vertices in $X_0 = X \setminus T$ block all $x$-$y$ paths with $\{x, y\} \in S_0 = \{e \in S \mid e \cap X = \emptyset\}$, except the one that consists of the edge $\{x, y\}$. We show that $X$ is linked to $T$ in a strong sense, with vertices of $X_0$ playing a special role.
Lemma 2. Let $X$ be a dominant solution for $(G, S, k)$ and $x$ any vertex in the set $X_0 = X \setminus T$. There exist $|X| + 2$ paths from $T$ to $X$ in $G - S$ that are vertex-disjoint except for three paths ending in vertex $x$. Moreover, the paths can be chosen in such a way that each connected component of $G - X - S$ is intersected by at most one path.
We use Hall's Theorem and the lemma below to prove this. For this purpose we use the two graphs $G - X$ and $G - X - S$, which simplify the analysis of a dominant solution. We call a connected component $K$ of $G - X - S$ interesting if it contains a terminal, i.e., if $T \cap V(K) = (T \setminus X) \cap V(K) \neq \emptyset$, and we say that $x \in X_0$ sees an interesting component $K$ if $x$ is adjacent to a vertex of $K$ in $G$. We extend this definition by saying that $Y \subseteq X_0$ sees an interesting component $K$ if at least one vertex $y \in Y$ sees $K$.
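The notions of interesting components and of a set seeing a component translate directly into a small routine. The following is a sketch under the assumption that the graph is given as an edge list with the $S$-edges identified by their indices; the function name `interesting_components_seen` is hypothetical.

```python
def interesting_components_seen(edges, s_edge_indices, X, Y):
    """Return the interesting components of G - X - S (as frozensets of
    vertices) that are seen by the set Y, where Y is a subset of X \\ T."""
    X, s_idx = set(X), set(s_edge_indices)
    terminals = {v for i in s_idx for v in edges[i]}  # T = V(S)
    vertices = {v for e in edges for v in e} - X

    # adjacency lists of G - X - S
    adj = {v: set() for v in vertices}
    for i, (u, v) in enumerate(edges):
        if i not in s_idx and u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)

    # connected components of G - X - S
    comp_of, comps = {}, []
    for start in vertices:
        if start in comp_of:
            continue
        comp_of[start] = len(comps)
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp_of:
                    comp_of[w] = len(comps)
                    comp.add(w)
                    stack.append(w)
        comps.append(frozenset(comp))

    # y sees K if y is adjacent in G to a vertex of K; K is interesting
    # if it contains a terminal. Vertices of Y are not incident with S-edges.
    seen = set()
    for u, v in edges:
        for y, w in ((u, v), (v, u)):
            if y in Y and w in comp_of:
                K = comps[comp_of[w]]
                if K & terminals:
                    seen.add(K)
    return seen
```

On the tree with edges 0-1, 0-3, 0-5, 1-2, 3-4 and $S = \{\{1,2\},\{3,4\}\}$, the set $Y = X = \{0\}$ sees only the two interesting components $\{1\}$ and $\{3\}$; this fits the lemma below, since the graph has no cycles, so $X = \{0\}$ is not a minimum solution and hence not dominant.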
Lemma 3. If $X$ is a dominant solution then every nonempty set $Y \subseteq X_0$ sees at least $|Y| + 2$ interesting components of $G - X - S$.
Proof. Assume for contradiction that there exists a nonempty set $Y \subseteq X_0$ that sees at most $|Y| + 1$ interesting components of $G - X - S$. Let $\mathcal{C}_i$ denote the set of interesting components of $G - X - S$ seen by $Y$, and let $\mathcal{C}_o$ denote the other components seen by $Y$. We will show that there is an alternative solution $X' = (X \setminus Y) \cup Y'$ that is smaller than $X$ or that contains more vertices of $T$, contradicting the choice of $X$ as a dominant solution. To this end, let us consider the graphs $G - X$ and $G - (X \setminus Y)$ (in part repeating things that have been said earlier to get a self-contained proof).
In $G - X$ the components of $G - X - S$ may be connected by edges of $S$ and form a tree-structure with components playing the role of vertices and edges of $S$ whose endpoints are not deleted being the edges of the tree. (We say tree-structure, but a forest of components, connected by $S$-edges, is also fine.) There can be no cycles in this tree-structure because they would give rise to $S$-cycles in $G - X$. Moreover, any other set $X'$ of size at most $k$ such that $G - X'$ consists of components without $S$-edges that are connected in a tree-like manner by $S$-edges is also a valid solution. Note that non-interesting components of $G - X - S$ are isolated in $G - X$ because they do not contain vertices of $T$, i.e., no endpoints of $S$-edges, so they cannot be incident with $S$-edges in $G - X$.
In $G - (X \setminus Y) - S$ the components in $\mathcal{C}_i$ and $\mathcal{C}_o$ may form larger combined components because we do not delete the vertices in $Y$; let $\mathcal{C}'$ denote the set of these components. Crucially, because $Y \subseteq X_0 = X \setminus T$, there are no additional vertices of $T$, i.e., every vertex of $T$ in a component of $\mathcal{C}'$ already lies in a component of $\mathcal{C}_i$. Note that, in general, $G - (X \setminus Y)$ will not have the tree-structure: In comparison to $G - X$ we are not deleting vertices of $Y$, which corresponds to merging some components in $\mathcal{C}_i \cup \mathcal{C}_o$. This may lead to components in $\mathcal{C}'$ that are incident with both endpoints of some $S$-edges (the equivalent of loops) and it may also create other (longer) cycles. We will see that deleting at most $|Y|$ edges of $S$, i.e., deleting a set $Y'$ of at most $|Y|$ endpoints of $S$-edges, will suffice to get the tree-structure, making $(X \setminus Y) \cup Y'$ a valid solution.
Consider a component $C' \in \mathcal{C}'$ of $G - (X \setminus Y) - S$ that fully contains all vertices of some components $C_i^1, \ldots, C_i^a \in \mathcal{C}_i$ and $C_o^1, \ldots, C_o^b \in \mathcal{C}_o$; additionally it may contain vertices of $Y$. (The fact that we must have full containment follows directly by comparing deletion of $X$ from $G - S$ with deletion of $X \setminus Y$.) In $G - (X \setminus Y)$ the component $C'$ may be connected to other components by $S$-edges and thus be part of a larger component $C^+$; we want to see that deleting (one endpoint each of) at most $a - 1$ $S$-edges from $C^+$ suffices to get the tree-structure.
In $G - X$ instead of component $C^+$ we may have several separate components because we additionally delete the vertices of $Y$. Since $Y$ sees only components in $\mathcal{C}_i \cup \mathcal{C}_o$ there are at most $a + b$ separate components "created" from $C^+$ by deleting $Y$ since these are all components contained in $C^+$ that are seen by $Y$. Recall that components in $\mathcal{C}_o$ are isolated in $G - X$ and contain no vertices of $T$ and, thus, they do not contribute any $S$-edges to $C^+$. It remains to consider the components $C_i^1, \ldots, C_i^a$ that are contained in $C^+$. Assume first that all components $C_i^1, \ldots, C_i^a$ are part of a single connected component in $G - X$. (Recall that they are connected components of $G - X - S$ but may be connected by $S$-edges in $G - X$.) Thus, they are part of a single tree of components (connected by $S$-edges) and not deleting $Y$ corresponds to merging $a$ vertices in this tree into a single one. If the tree had $c$ components and, thus, $c - 1$ $S$-edges then we obtain $c - a + 1$ components that are connected by $c - 1$ $S$-edges; it suffices to delete $a - 1$ of these $S$-edges, i.e., to delete one endpoint of each of $a - 1$ $S$-edges, to obtain the tree-structure. (Not any $a - 1$ edges are ok but we can keep any $c - a$ $S$-edges spanning the $c - a + 1$ components and delete the $(c - 1) - (c - a) = a - 1$ others.)
In general, the components $C_i^1, \ldots, C_i^a$ may be part of several different connected components in $G - X$. Nevertheless, this still means that we have a cycle-free structure of components (seen as vertices) connected by $S$-edges. If overall the cycle-free structure has $c$ components then, being cycle-free, it has at most $c - 1$ $S$-edges. Thus, merging yields $c - a + 1$ components connected by at most $c - 1$ $S$-edges and removing at most $a - 1$ $S$-edges suffices.
Overall, we get that a component $C' \in \mathcal{C}'$ that fully contains $a$ interesting components from $\mathcal{C}_i$ requires at most $a - 1$ vertex deletions of endpoints of $S$-edges to obtain the tree-structure. Since $Y$ sees at most $|Y| + 1$ such components, the worst case is achieved by a single component $C'$ containing all $|Y| + 1$ interesting components in $\mathcal{C}_i$; this still costs at most $(|Y| + 1) - 1 = |Y|$ vertex deletions, as claimed.
Let $Y'$ contain all the endpoints of $S$-edges that we delete to get the tree-structure. We know that $|Y'| \le |Y|$ and thus $|(X \setminus Y) \cup Y'| \le |X|$. Moreover, by the initial considerations, we know that $X' = (X \setminus Y) \cup Y'$ is a solution. If $|Y'| < |Y|$ then $|X'| < |X|$, and this contradicts optimality of $X$ (required for being a dominant solution). If $|Y'| = |Y|$ then $Y' \neq \emptyset$ and $X'$ is an optimal solution that contains more vertices of $T \supseteq Y'$, contradicting the choice of $X$ as a dominant solution. Thus, every nonempty set $Y$ must see at least $|Y| + 2$ interesting components, as claimed.
Now we are ready to give the proof of Lemma 2. The argument relies on Hall's Theorem and is similar to the one for deletable terminal multiway cut [11].
Proof of Lemma 2. We know that every nonempty set $Y \subseteq X_0$ sees at least $|Y| + 2$ interesting components. To prove existence of the required path packing we construct a bipartite graph where one side consists of the interesting components and the other side consists of the set $X_0$ and two copies $x'$, $x''$ of the vertex $x \in X_0$. We connect $v \in X_0$ with an interesting component $K$ if $v$ sees $K$ and we connect $x'$ and $x''$ with the same interesting components as $x$. For this bipartite graph it holds that for all sets $Y \subseteq X_0 \cup \{x', x''\}$, the size of $N(Y)$ is at least $|Y|$: This holds trivially for $Y = \emptyset$; for a nonempty set $Y$, let $Y_0 \subseteq X_0$ be obtained from $Y$ by replacing any copies $x'$, $x''$ by $x$. Then $Y_0$ is nonempty, $N(Y) = N(Y_0)$, and $|Y_0| \ge |Y| - 2$, so Lemma 3 gives $|N(Y)| \ge |Y_0| + 2 \ge |Y|$. Since Hall's condition is satisfied there exists a matching $M$ that covers $X_0 \cup \{x', x''\}$. This matching gives rise to a path packing from $T$ to $X$ where exactly three paths end in $x$ and no other vertices occur in more than one path: For each $v \in X \cap T$ pick the path of length zero that consists only of $v$. For each matching edge between an interesting component $K$ and a vertex $v \in X_0 \cup \{x', x''\}$ (reading $x'$ and $x''$ as $x$) pick a path that starts in a terminal of $K$, stays inside $K$, and ends with an edge from $K$ to $v$; such a path exists because $v$ sees $K$. Since $K$ is a connected component of $G - X - S$, the path contains no other vertices of $X$. Similarly, the path cannot contain $S$-edges between vertices of $K$, and its final edge to $v$ cannot be in $S$ because $v \in X_0 = X \setminus T$, i.e., because $v$ is not an endpoint of any $S$-edge. Moreover, since each interesting component is matched to a single vertex $v \in X_0 \cup \{x', x''\}$, all the paths are vertex-disjoint except for the three paths that share their endpoint $x$. This path packing, including the trivial paths from $X \cap T$ to $X \cap T$, contains $|X| + 2$ paths from $T$ to $X$ in $G - S$ that are vertex-disjoint except for the three paths sharing endpoint $x$. By construction, there is at most one path to any vertex of $X_0$ starting in any interesting component $K$ of $G - X - S$, because the components are used according to the matching $M$. All further paths are of length zero, consisting of only a vertex in $X \cap T$ and are, thus, not contained in components of $G - X - S$.
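The matching step of this proof can be sketched with a standard augmenting-path bipartite matching (Kuhn's algorithm); this is a generic technique and not code from the paper. The input format, a dictionary `sees` mapping each vertex of $X_0$ to the set of interesting components it sees, and the function name `hall_matching` are assumptions for illustration.

```python
def hall_matching(sees, x):
    """Match every vertex of X_0, plus two extra copies of x, to a distinct
    interesting component it sees. Returns a dict vertex -> component, or
    None if no such matching exists (i.e., Hall's condition fails)."""
    left = {v: set(components) for v, components in sees.items()}
    left[(x, "copy1")] = set(sees[x])  # plays the role of x'
    left[(x, "copy2")] = set(sees[x])  # plays the role of x''
    match_of_component = {}

    def try_augment(v, visited):
        for K in left[v]:
            if K in visited:
                continue
            visited.add(K)
            # K is free, or its current partner can be rematched elsewhere
            if K not in match_of_component or try_augment(match_of_component[K], visited):
                match_of_component[K] = v
                return True
        return False

    for v in left:
        if not try_augment(v, set()):
            return None
    return {v: K for K, v in match_of_component.items()}
```

In a successful matching the three components assigned to $x$ and its two copies are the starting points of the three paths that end in $x$.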
Setting up the gammoid. The gammoid $M$ that we use is the direct sum of two gammoids $M_1$ and $M_2$. To construct gammoid $M_1$ we define a graph $G_1 = (V_1, E_1)$ that is obtained from $G - S$ by adding two so-called sink-only copies $v'$ and $v''$ for every vertex $v \in V$. A sink-only copy of a vertex $v$ is a vertex $v'$ (or $v''$) that has a directed edge $(u, v')$ for each edge $\{u, v\}$; these were already used in previous work [11]. Note that adding sink-only copies of vertices does not affect the possible path packings to other vertices since they can only be endpoints of paths; however, they are convenient to capture multiple vertex-disjoint paths that, intuitively, end in the same vertex. The matroid $M_1$ is defined to be the gammoid on $G_1$ with sources $T = V(S)$ and ground set $V_1$; note that the sink-only copies of vertices in $T$ are not sources of $M_1$. The rank of matroid $M_1$ is $|T|$, because the set of all trivial paths is independent and at most $|T|$ vertices can be linked to $T$.
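The construction of $G_1$ can be sketched as follows. The encoding is an assumption for illustration: the input is the edge list of $G - S$, each undirected edge is turned into two opposite arcs, and the sink-only copies of a vertex `v` are represented by the hypothetical pairs `(v, "'")` and `(v, "''")`.

```python
def add_sink_only_copies(undirected_edges):
    """Return the arc list of G_1: both orientations of every edge of G - S,
    plus arcs into the two sink-only copies of each endpoint."""
    arcs = []
    for u, v in undirected_edges:
        arcs.append((u, v))
        arcs.append((v, u))
        for tag in ("'", "''"):
            arcs.append((u, (v, tag)))  # arc into a sink-only copy of v
            arcs.append((v, (u, tag)))  # arc into a sink-only copy of u
    return arcs
```

No arc leaves a sink-only copy, so a copy can only be the last vertex of a path. Copies of vertices that are isolated in $G - S$ have no incoming arcs and therefore do not appear in the arc list.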
Matroid $M_2$ is the gammoid on the directed graph $G_2 = K_{k,n} = (S_2 \,\dot\cup\, \hat{V}, E_2)$ with sources $S_2$ and ground set $\hat{V} = \{\hat{v} \mid v \in V\}$; the edges in $E_2$ are directed from $S_2$ to $\hat{V}$. In other words, gammoid $M_2$ is simply a uniform matroid and a (deterministic) matrix representation could also be obtained by using a Vandermonde matrix. The rank of $M_2$ is $k = |S_2|$ because no more than $|S_2|$ vertices can be linked to $S_2$ and every set of at most $k$ vertices of $\hat{V}$ is linked to $S_2$.
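The remark about the Vandermonde matrix can be checked on a small example. The sketch below builds a $k \times n$ Vandermonde matrix over the rationals with evaluation points $1, \ldots, n$ and verifies by exact Gaussian elimination that every set of at most $k$ columns is independent. Working over the rationals and the function names are assumptions for illustration; the paper does not fix these details here.

```python
from fractions import Fraction
from itertools import combinations


def vandermonde(k, n):
    """k x n matrix whose column j is (1, a, a^2, ..., a^(k-1)) with a = j + 1."""
    return [[Fraction(j + 1) ** i for j in range(n)] for i in range(k)]


def column_rank(matrix, columns):
    """Rank of the submatrix formed by the given columns (exact arithmetic)."""
    rows = [[row[c] for c in columns] for row in matrix]
    rank = 0
    for col in range(len(columns)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank
```

Every set of $k$ columns forms a square Vandermonde matrix with distinct evaluation points and is therefore nonsingular, which is exactly the independence structure of the uniform matroid of rank $k$.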
For the application of Lemma 1 we will use the matroid $M = M_1 \oplus M_2$, which has rank $|T| + k$. (Matroid $M$ can also be seen as a gammoid on the graph $G_1 \,\dot\cup\, G_2$ with appropriate sources and ground set but we prefer the explicit direct sum and the implied block-diagonal representation obtained below.) Representations $A_1$ and $A_2$ for both $M_1$ and $M_2$ can be computed by a randomized polynomial-time algorithm with exponentially small error chance [15]; hence we get a representation for $M$ by $\mathrm{diag}(A_1, A_2)$, i.e., the block-diagonal matrix with blocks $A_1$ and $A_2$. We may assume that $A_1$ has $|T|$ rows and $A_2$ has $k$ rows since this could be achieved by Gaussian elimination (cf. [15]).
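The block-diagonal representation of a direct sum is straightforward to assemble. A minimal sketch, assuming matrices are given as lists of rows over the same field (the name `block_diag` is hypothetical):

```python
def block_diag(A1, A2):
    """Return diag(A1, A2): A1 in the top-left block, A2 in the bottom-right
    block, and zeros elsewhere."""
    width1 = len(A1[0]) if A1 else 0
    width2 = len(A2[0]) if A2 else 0
    top = [list(row) + [0] * width2 for row in A1]
    bottom = [[0] * width1 + list(row) for row in A2]
    return top + bottom
```

A set of columns of the result is linearly independent exactly when its columns from the first block and its columns from the second block are each independent, which matches the definition of the direct sum.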
Applying the representative set lemma. Let $\mathcal{T} := \{\{v', v'', \hat{v}\} \mid v \in V\}$. For clarity, by the above notation, this means that $v', v'' \in V_1$ and $\hat{v} \in \hat{V}$ for each $v \in V$. Using Lemma 1 we will prove that we can compute in randomized polynomial time a $(|T| + k - 3)$-representative set $\mathcal{T}' \subseteq \mathcal{T}$ of size at most $\binom{|T|+k}{3}$, since we can compute a matrix representation of $M$ in randomized polynomial time as described above. We will see later that we can find a $(|T| + k - 3)$-representative set of size $O(|S|^2 k)$ by a careful look at the proof of Lemma 1, using the fact that $M$ is the direct sum of two gammoids and that all sets $\{v', v'', \hat{v}\}$ in $\mathcal{T}$ have two elements from the first and one element from the second gammoid; a similar argument for getting a smaller representative set was already used by Kratsch and Wahlström [11].
To ensure that all sets $\{x', x'', \hat{x}\}$ with $x \in X_0$ are in $\mathcal{T}'$ we have to show that for each such set $\{x', x'', \hat{x}\}$ there exists an independent set $I$ of size at most $|T| + k - 3$ such that $\{x', x'', \hat{x}\}$ uniquely extends $I$ among triplets in $\mathcal{T}$. This directly implies that $\{x', x'', \hat{x}\}$ must be in every $(|T| + k - 3)$-representative subset of $\mathcal{T}$.
Proof. Let $x$ be an arbitrary vertex of $X_0$. In a first step we define an independent set $I$ and show in a second step that $\{x', x'', \hat{x}\}$ uniquely extends $I$. Applying Lemma 2 implies the existence of a path packing $\mathcal{P}$ of $|X| + 2$ paths from $T$ to $X$ in $G - S$ that are vertex-disjoint except for three paths ending in $x$ and such that each connected component of $G - X - S$ is intersected by at most one path of $\mathcal{P}$. This directly implies a path packing $\mathcal{P}_1$ of $|X| + 2$ vertex-disjoint paths from $T$ to $X \cup \{x', x''\}$ in $G_1$, obtained by redirecting two of the three paths ending in $x$ to the sink-only copies $x'$ and $x''$. We retain the property that at most one path intersects the vertex set of any component of $G - X - S$, but note that we do not get exactly the same property for $G_1 - X$ because of the still present sink-only copies of vertices in $X$. (The latter point will be no problem and should mainly explain why we need to talk about $G - X - S$ and not only $G_1$. Note that $G - S$ and $G_1$ by construction share the vertex set $V$ to be able to refer to connected components of $G - X - S$ and the graph $G_1$ underlying the gammoid $M_1$.) While we do not know the paths in $\mathcal{P}_1$ entirely, we know for sure that no vertex of $X \cup \{x', x''\}$ can be an internal vertex of any path in $\mathcal{P}_1$ because there is a path ending in each of those vertices. Similarly, we may assume that no vertex of $T$ is internal to any path of $\mathcal{P}_1$: If not then any path $P \in \mathcal{P}_1$ with an internal vertex from $T$ can be shortened to start in that vertex; this argument cannot be repeated indefinitely (as the paths get shorter each time). There is still at most one path intersecting the vertex set of any component of $G - X - S$. Now, define $T' \subseteq T$ as those vertices of $T$ in which no path of $\mathcal{P}_1$ starts; there must be exactly $|T| - |\mathcal{P}| = |T| - (|X| + 2)$ of them since no vertex of $T$ is internal.
Moreover, for each component K of G − X − S, the set T ′ contains all but at most one vertex of T ∩ V (K): At most one path of P 1 can start in T ∩ V (K) and no vertex can be internal. This will be important for proving the claim below.
Clearly, the set T′ ∪ X ∪ {x′, x″} is independent in M_1 because an appropriate path packing 𝒫′ can be obtained from 𝒫_1 by adding length-zero paths for each v ∈ T′. The set X̂ = {x̂ | x ∈ X} ⊆ V̂ is clearly independent in M_2 since it has size at most k. Thus, the set I := T′ ∪ X ∪ (X̂ \ {x̂}) is independent in M. The size of I is at most (|T| − |X| − 2) + |X| + (|X| − 1) = |T| + |X| − 3 ≤ |T| + k − 3. Clearly, {x′, x″, x̂} extends I, as I′ = {x′, x″, x̂} ∪ I is independent and both are disjoint by choice of I. We now show that no other {v′, v″, v̂} ∈ 𝒯 extends I.
Assume, for contradiction, that v ∈ V \ X, i.e., that v ≠ x. We know that {v′, v″, v̂} ∪ I is independent in M, so {v′, v″} ∪ I_1 with I_1 := I ∩ V_1 must be independent in M_1. Thus, there exists a collection 𝒫″ of |I_1| + 2 vertex-disjoint paths from T to I_1 ∪ {v′, v″} in G_1. Because X ⊆ I_1, the paths, say P_{v′} and P_{v″}, from T to {v′, v″} cannot have internal vertices from the set X. Furthermore, they cannot have other sink-only copies as internal vertices. Since v ∈ V \ X, this implies that P_{v′} and P_{v″} are entirely contained in some component K_1 of G_1 − X. (The component K_1 corresponds to a component K of G − X − S but also has sink-only copies of each vertex.) Recall now that in T′ we have all but at most one vertex of T ∩ V(K) for each connected component K of G − X − S and this is also true for T ∩ V(K_1) as V(K_1) ∩ V = V(K). Thus, in 𝒫″ there is a path of length zero for each vertex of T′ ∩ V(K_1), leaving at most one vertex of T ∩ V(K_1) to start paths to {v′, v″}. This is a contradiction because P_{v′} and P_{v″} are entirely contained in K_1 and fully vertex-disjoint.
Thus, if v ∈ V \ X then {v′, v″} ∪ I_1 is not independent in M_1 and, hence, {v′, v″, v̂} does not extend I in M. Together with the first paragraph this implies that v = x, as claimed.
The set I fulfills the required properties which completes the proof.
We know now that for every vertex x ∈ V \ T that is a vertex in a dominant solution the set {x′, x″, x̂} is contained in 𝒯′.
Shrinking the input graph to O(|V(𝒯′) ∪ T|) vertices. In the previous parts we have shown that if there exists a solution for (G, S, k), then there exists a solution that is completely contained in W := V(𝒯′) ∪ T. Using this we can make all vertices in V \ W undeletable. We achieve this by applying the so-called torso operation to vertex set W in G; let G′ = torso(G, W). By definition of torso(G, W), the resulting graph G′ has vertex set W and is derived from G[W] by making each pair {u, v} ⊆ W adjacent if there is a u-v path in G with internal vertices from V \ W. Note that we do not create double edges or loops in G′ and that all edges of S are preserved in G′ because T ⊆ W. (The same can be achieved by iteratively selecting a vertex v ∈ V \ W, making its neighbors a clique, and deleting v from the graph.) It follows from Lemma 5 that (G′, S, k) is an equivalent instance and the graph of this instance contains at most |W| vertices. This completes the kernelization. The correctness of Lemma 5 follows from the fact that the torso operation preserves the separators that are contained in W (cf. [16]). For completeness we give a short proof of the lemma.
Proof of Lemma 5. Let X be a solution for (G ′ , S, k). We prove that X is also a solution for (G, S, k) by contradiction. Assume that X is not a solution for (G, S, k). Then there exists an S-cycle C = v 1 v 2 . . . v l in G − X. Note that S ⊆ E(G ′ ), because T = V (S) ⊆ W and therefore at least two vertices of C are contained in W . Now we modify C to obtain an S-cycle C ′ in G ′ .
Let v_i, v_j ∈ W ∩ C be two vertices of the cycle with i < j such that {v_{i+1}, …, v_{j−1}} ⊆ V \ W. By definition there exists an edge {v_i, v_j} in torso(G, W) and using these edges we obtain the cycle C′. Note that C′ contains no vertex of X and contains the same edges from S that C contains. Thus C′ is an S-cycle in G′ − X, which contradicts the assumption that X is a solution of (G′, S, k).
For the other direction we assume that (G, S, k) has a solution. Then there also exists a dominant solution X for (G, S, k) and we know that X ⊆ W. Again we prove that X is also a solution for (G′, S, k) by contradiction. Assume that X is not a solution for (G′, S, k). Then there exists a path P′ between the endpoints of an edge e = {x, y} ∈ S in G′ − X that does not use the edge e. We modify P′ to obtain a path P in G that does not contain the edge e. If P′ uses an edge {u, v} that is not contained in G, then there exists a u-v path in G whose internal vertices all lie in V \ W. Crucially, V \ W is disjoint from X, so this replacement still yields a walk that avoids X. Overall we get a walk from x to y in G that does not contain e as an edge and that avoids X. This walk contains a path P from x to y and this path together with the edge e is an S-cycle in G − X, which is a contradiction to the assumption that X is a solution for (G, S, k).
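The torso operation used in this step can be sketched in a few lines. The following is only an illustration under stated assumptions: the graph is simple and given as a dictionary `adj` mapping each vertex to its set of neighbors, and the function name `torso` as well as this representation are chosen for the example rather than taken from the paper. The sketch uses the observation that two vertices of W are joined by a path with internal vertices in V \ W exactly when both are adjacent to the same connected component of G − W.

```python
from collections import deque

def torso(adj, W):
    """Return the adjacency dict of torso(G, W) for a simple undirected graph.

    adj: dict mapping each vertex to the set of its neighbours (assumed symmetric).
    W:   iterable of vertices to keep.
    """
    W = set(W)
    # Start from the induced subgraph G[W].
    new_adj = {w: {u for u in adj[w] if u in W} for w in W}
    visited = set()
    for start in adj:
        if start in W or start in visited:
            continue
        # Explore one connected component of G - W and collect its neighbours in W.
        component = {start}
        boundary = set()
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u in W:
                    boundary.add(u)
                elif u not in component:
                    component.add(u)
                    queue.append(u)
        visited |= component
        # All boundary vertices become pairwise adjacent (no loops, no double edges).
        for a in boundary:
            new_adj[a] |= boundary - {a}
    return new_adj
```

For example, on the path a, x, y, b with an extra vertex c adjacent to a and W = {a, b, c}, the sketch adds the edge {a, b}, keeps {a, c}, and leaves b and c non-adjacent.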
So far we have a kernelization that creates an equivalent instance (G′, S, k) such that G′ has |W| vertices. As mentioned above, Lemma 1 guarantees that |W| ∈ O(|S|^3) and this implies a polynomial kernel for edge subset fvs parameterized by |S|. If we use the fact that the gammoid M is the direct sum of two gammoids M_1 and M_2, and that all sets {v′, v″, v̂} ∈ 𝒯 contain exactly two elements of M_1 and one element of M_2, then we can guarantee that |W| ∈ O(|S|^2 k), which is an improvement for all nontrivial instances with k < |S|.
Lemma 6. Let M = M_1 ⊕ M_2 be the gammoid of rank |T| + k as defined above and let 𝒯 = {I_1, I_2, …, I_t} be the set of independent sets of M that we use for the kernelization. Let M be represented by A = diag(A_1, A_2) as above. If $|\mathcal{T}| > \binom{|T|}{2} \cdot \binom{k}{1}$, then there exists a set I ∈ 𝒯 such that 𝒯 \ {I} is (|T| + k − 3)-representative for 𝒯.
The proof of Lemma 6 is similar to Marx [15, Lemma 4.2]. We additionally use the fact that M is the direct sum of two gammoids to obtain that the vectors in the exterior algebra which represent the sets in 𝒯 span a space of smaller dimension.
Proof of Lemma 6. Let U be the ground set of the matroid M, which equals the set of columns of A. For each e ∈ U, let x_e be the corresponding (|T| + k)-dimensional column vector of A and let $w_i = \bigwedge_{e \in I_i} x_e$ be a vector in the exterior algebra of the linear space F^{|T|+k}. Every w_i is the wedge product of three vectors where exactly two are columns of the block (A_1; 0) and one is a column of the block (0; A_2). The wedge products of two columns of (A_1; 0) can only span a space of dimension $\binom{|T|}{2}$ and the columns of (0; A_2) can only span a space of dimension $\binom{k}{1} = k$. Thus, the w_i's span a space of dimension at most $\binom{|T|}{2} \cdot \binom{k}{1}$. If $|\mathcal{T}| > \binom{|T|}{2} \cdot \binom{k}{1}$, then the w_i's are not independent and there exists some vector w_l that can be expressed as a linear combination of the other vectors.
One can show analogously to Marx [15, Lemma 4.2] that 𝒯 \ {I_l} is (|T| + k − 3)-representative for 𝒯. We replicate this proof for the convenience of the reader. Assume, for contradiction, that there exists a set Y of size at most |T| + k − 3 such that I_l extends Y and no other set I_i, i ≠ l, extends Y. Let $y = \bigwedge_{e \in Y} x_e$. One property of the wedge product is that the product of some vectors in F^{|T|+k} is zero if and only if they are not independent. Therefore it holds that w_l ∧ y ≠ 0 and w_i ∧ y = 0 for every i ≠ l. But w_l is a linear combination of the other w_i's and by the multilinearity of the wedge product we get that w_l ∧ y ≠ 0 is a linear combination of the values w_i ∧ y = 0 for i ≠ l, which is a contradiction.
As mentioned above, Marx showed in [15] that one can find in randomized polynomial time a matrix with r(M) rows that represents a given gammoid M. We can make this proof algorithmic in the same way Marx did [15, Lemma 4.2]. Combined with Lemma 6 it follows directly that we can find a (|T| + k − 3)-representative subset 𝒯′ of 𝒯 whose size is at most $\binom{|T|}{2} \cdot k \in O(|S|^2 k)$.

Reducing the size of S
We have seen that edge subset fvs parameterized by |S| and k has a polynomial kernel. Now the goal is to reduce the size of the set S until |S| is polynomially bounded in k. This will lead to a polynomial kernel of edge subset fvs parameterized by k.
To begin, we do some initial modifications to ensure that we can always find a solution of size at most k that contains no vertex of the set V(S), if one exists. For this we first delete all vertices v ∈ V with the property that e = {v, v} ∈ S is a loop in G; since each such vertex v must be in any solution, we decrease the value k by one for each of them. Next we delete all remaining loops, because these loops are not in S and cannot be contained in any S-cycle. We also reduce the number of edges between two vertices v, w ∈ V(G). If no edge between v and w is contained in the set S, then we delete all except one edge. On the other hand, if at least one edge between v and w is contained in S, then we delete all except two edges, one of which is contained in S and the other not. In the next step we add for every edge e = {v, w} ∈ S two new vertices v_e, w_e to the graph, subdivide the edge e into three edges {v, v_e}, {v_e, w_e}, {w_e, w}, and edit S by replacing edge e by the edge {v_e, w_e}. If a solution X of edge subset fvs contains a vertex x_e ∈ V(S), then we can instead add the vertex x to X and delete x_e from X, because every cycle that contains vertex x_e also contains vertex x; hence we can always find an optimal solution that is disjoint from V(S).
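The subdivision step of this preprocessing can be illustrated as follows. This is a simplified sketch and not the paper's implementation: it assumes that loops and parallel edges have already been removed, so that the graph can be stored as a dictionary `adj` of neighbor sets and S as a set of two-element frozensets. The function name `subdivide_subset_edges` and the tuple labels `("sub", v, w)` for the new vertices v_e and w_e are hypothetical choices made for the example.

```python
def subdivide_subset_edges(adj, S):
    """Subdivide every edge {v, w} in S into {v, v_e}, {v_e, w_e}, {w_e, w}.

    adj: dict mapping each vertex to the set of its neighbours (simple graph,
         assumed to be free of loops and parallel edges).
    S:   set of frozensets, each holding the two endpoints of an S-edge.
    Returns the new adjacency dict and the new edge set S'.
    """
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    new_S = set()
    for edge in S:
        v, w = tuple(edge)
        v_e, w_e = ("sub", v, w), ("sub", w, v)  # hypothetical labels for new vertices
        # Remove the original edge and route it through the two new vertices.
        adj[v].discard(w)
        adj[w].discard(v)
        adj[v].add(v_e)
        adj[w].add(w_e)
        adj[v_e] = {v, w_e}
        adj[w_e] = {v_e, w}
        # Only the middle edge stays in S, so V(S') consists of new vertices only.
        new_S.add(frozenset((v_e, w_e)))
    return adj, new_S
```

After this step every vertex of V(S′) has exactly one neighbor outside its own S-edge, which is what makes the exchange argument above work.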
Let (G, S, k) be an instance of edge subset fvs, such that G is a graph with the above properties. Analogous to the paper of Cygan et al. [3] we consider a solution Z of the edge subset fvs, with the difference that our solution is an 8-approximation of the problem, to reduce the size of S. Even et al. [6] show that there exists an 8-approximation algorithm for subset fvs. Since subset fvs and edge subset fvs are equivalent (cf. [3]), we can compute in polynomial time an 8-approximation for edge subset fvs and we can assume that Z ∩ V (S) = ∅. If |Z| > 8k, then we can stop immediately because no solution of size at most k can exist. On the other hand, if |Z| ≤ k, then Z is a solution for the problem and we are done.
The set Z is a feasible solution to edge subset fvs on (G, S, |Z|). This implies that every edge e ∈ S is a bridge in G − Z. In a next step we also remove all edges in S from G − Z. Every connected component in G − Z − S contains no edge from S and, following Cygan et al. [3], we call such a component a bubble. We denote the set of bubbles by D_Z and define a graph H_Z = (D_Z, E_{D_Z}) whose vertices are bubbles and with bubbles I and J being adjacent, i.e., {I, J} ∈ E_{D_Z}, if and only if the components I and J are connected by an edge from S. The graph H_Z is a forest, because Z is a solution for (G, S, |Z|) and a cycle in H_Z would give rise to an S-cycle in G − Z. Similarly, no two bubbles can be connected by more than one edge of S. By V_I we denote the vertices that are contained in bubble I. Since |E(V_I, V_J) ∩ S| ≤ 1 for all I, J ∈ D_Z and equality holds if and only if {I, J} ∈ E_{D_Z}, we can associate an edge e = {I, J} ∈ E_{D_Z} with the one edge e_S = {v_I, v_J} in E(V_I, V_J) ∩ S. If we add the vertex set Z and all edges {z, I} with the property that z ∈ Z, I ∈ D_Z and E(z, V_I) ≠ ∅ to the graph H_Z we obtain a graph H^+_Z that contains S-cycles. Note that every S-cycle must contain a vertex of the set Z. We partition the set of bubbles according to the number of bubbles they are connected with: D^s_Z contains the bubbles with no neighbor in H_Z, the leaf bubbles D^l_Z have exactly one neighbor in H_Z, and the inner bubbles D^i_Z have at least two neighbors in H_Z. Let X ⊆ V \ V(S) be a superset of Z. We define the graphs H_X, H^+_X as well as the sets D_X, E_{D_X} analogously to the graphs H_Z, H^+_Z and the sets D_Z, E_{D_Z}. Observe that the number of edges in S is at most |D_Z \ D^s_Z|, because H_Z is a forest, any two bubbles are connected by at most one S-edge, and V(S) ∩ Z = ∅.
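The bubble decomposition just described can be computed by a plain graph search. The sketch below is illustrative only and rests on assumptions not fixed by the text: a simple graph stored as a dictionary `adj` of neighbor sets, S given as a set of two-element frozensets, and a hypothetical function name `bubble_forest`. It labels the connected components of G − Z − S and then builds the edge set of H_Z from the S-edges.

```python
def bubble_forest(adj, S, Z):
    """Compute the bubbles of G - Z - S and the edges of the bubble graph H_Z.

    adj: dict mapping each vertex to the set of its neighbours (simple graph).
    S:   set of frozensets, each holding the two endpoints of an S-edge.
    Z:   set of vertices (an approximate solution, assumed disjoint from V(S)).
    Returns (bubble_of, H) where bubble_of maps each vertex of G - Z to a
    bubble id and H is a set of frozensets of bubble ids.
    """
    Z = set(Z)
    bubble_of = {}
    next_id = 0
    for start in adj:
        if start in Z or start in bubble_of:
            continue
        # Depth-first search that ignores vertices of Z and edges of S.
        bubble_of[start] = next_id
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u in Z or u in bubble_of or frozenset((u, v)) in S:
                    continue
                bubble_of[u] = next_id
                stack.append(u)
        next_id += 1
    # Two bubbles are adjacent in H_Z iff some S-edge joins them.
    H = set()
    for edge in S:
        u, v = tuple(edge)
        if u in Z or v in Z:
            continue  # defensive: cannot happen when Z is disjoint from V(S)
        H.add(frozenset((bubble_of[u], bubble_of[v])))
    return bubble_of, H
```

If Z is indeed a solution, every element of `H` has two distinct bubble ids and `H` is the edge set of a forest; a one-element frozenset in `H` would indicate an S-edge inside a single bubble, i.e., that Z is not a solution.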
So far our setup is essentially the same as the one used by Cygan et al. [3]. However, instead of an 8-approximate solution they use the framework of iterative compression, which provides a solution Z of size k + 1 and leaves them with the task of reducing the number of S-edges for the problem of finding a solution Z * that is disjoint from Z. Moreover, it suffices for them to consider the case that every feasible solution (if one exists) is disjoint from Z. In this setting they are able to reduce to an equivalent instance (or find that some assumption was violated) with only O(k 3 ) edges in S.
Thus, while many relevant structures like z-flowers or parallel x-y paths containing S-edges are the same, many things have to be handled differently. In particular, if we find that at least one out of two vertices x, y ∈ Z must be in the solution then we cannot stop (using the maximality condition) but need to continue and use this information in a more direct way.
During the reduction we detect certain pairs {x, y} of different vertices with the property that each solution of size at most k must contain at least one of the vertices (if one exists). We store this fact as a pair-constraint. We keep and enforce this information in the final instance, unless we decide earlier to delete x or y. By P we denote the set of pair-constraints that we have found so far. We can interpret this set as a set of edges and by V (P) we denote all vertices that are contained in a pair-constraint. Note that vertices from the set V (S) are never contained in a pair-constraint from P, because there always exists a solution that is disjoint from V (S). We need the set P to detect edges in S that may be safely deleted. To this end, we generalize the edge subset fvs problem by adding a set of pair-constraints P to the input; we call this problem pair-constrained edge subset fvs.

pair-constrained edge subset feedback vertex set
Parameter: k Input: An undirected graph G, a set S ⊆ E of edges, a set P of pair-constraints and an integer k. Question: Does there exist a set X ⊆ V of size at most k such that G − X contains no S-cycle and such that for each pair-constraint {x, y} ∈ P we have x ∈ X or y ∈ X?
Clearly, instances (G, S, k) of edge subset fvs and (G, S, ∅, k) of pair-constrained edge subset fvs are equivalent. Our goal is to reduce the size of S by detecting S-edges that we can delete from S without changing the outcome. This leads to the following definition: Note that if two different S-edges e and e ′ are irrelevant in (G, S, P, k), then e ′ is not necessarily irrelevant in (G, S \ {e}, P, k). In addition we do not expect to find all irrelevant edges or pair-constraints.
The reduction rules. We now present our reduction rules. Throughout we assume that always the lowest numbered applicable rule is applied first. Correctness and efficiency of the overall reduction process will be proved later.
Let (G, S, P = ∅, k) be an instance of pair-constrained edge subset fvs and let Z be an 8-approximation for this problem with k < |Z| ≤ 8k that is disjoint from V(S). In the following the graphs G − Z, G − Z − S, H_Z, and H^+_Z are always defined with respect to the current instance (G, S, P, k) of pair-constrained edge subset fvs. Note that Z ⊆ V and we delete vertices from Z if we delete the corresponding vertex in V.
Rules 2 and 3 ensure that each bubble I ∈ D_Z is adjacent to a vertex in Z in the graph H^+_Z, i.e., for all I ∈ D_Z we have E_{H^+_Z}(V_I, Z) ≠ ∅: Since Rule 2 is not applicable, every bubble I ∈ D_Z must be adjacent to a bubble J ∈ D_Z \ {I} or to a vertex in Z; otherwise G[V_I] would be a connected component of G that does not contain any edge from S, and such a component is deleted by Rule 2. From Rule 3 it follows that a bubble I ∈ D_Z must be adjacent to a vertex in Z; otherwise the edge e ∈ N(V_I) ∩ S would be a bridge in (V, E \ (S \ {e})).

Rule 4:
If there exists a vertex v in the set V(P) that is contained in at least k + 1 pair-constraints of P, then we reduce to G′ = G − v and k′ = k − 1.

Rule 5:
If |P| > k^2 (and Rule 4 is not applicable), then reduce (G, S, P, k) to some trivial false instance.
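Rules 4 and 5 treat the pair-constraints like the edges of a vertex cover instance. The following sketch shows only that bookkeeping, under the assumption that P is given as a set of two-element frozensets; the function name `apply_pair_constraint_rules` is hypothetical, and the deletion of the forced vertices from G, S and Z is left to the caller.

```python
from collections import Counter

def apply_pair_constraint_rules(pairs, k):
    """Exhaustively apply Rules 4 and 5 to the pair-constraints only.

    pairs: set of frozensets {x, y}, the current pair-constraints P.
    k:     current budget.
    Returns (forced, remaining_pairs, new_k), or None if the instance is
    recognised as a no-instance.
    """
    pairs = set(pairs)
    forced = []
    while True:
        if k < 0:
            return None  # budget exhausted by forced vertices
        degree = Counter(v for pair in pairs for v in pair)
        high = [v for v, d in degree.items() if d >= k + 1]
        if high:
            # Rule 4: a vertex in at least k + 1 pair-constraints must be deleted.
            v = high[0]
            forced.append(v)
            pairs = {pair for pair in pairs if v not in pair}
            k -= 1
            continue
        if len(pairs) > k * k:
            # Rule 5: every vertex covers at most k pairs, so k vertices cannot cover P.
            return None
        return forced, pairs, k
```

When the function returns a triple, every remaining vertex lies in at most `new_k` pair-constraints and at most `new_k ** 2` pair-constraints remain.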

Rule 6:
If there exists a z-flower of order k + 1 in G for a vertex z ∈ Z, then we reduce to G ′ := G − z and k ′ := k − 1.
For the next rules we need a maximal matching M in H_Z that covers all inner bubbles D^i_Z in H_Z. Note that two adjacent leaf bubbles I_1, I_2 are not adjacent to an inner bubble and form a K_2 in H_Z, hence the edge {I_1, I_2} ∈ E_{D_Z} is contained in every maximal matching in H_Z. We use this matching to detect pair-constraints in Z. To this end we introduce the following definition: Let e = {I, J} be an edge in the matching M. We say e sees the pair {x, y} of different vertices x, y ∈ Z if {I, x}, {J, y} ∈ E(H^+_Z), and e sees the single vertex z ∈ Z if {I, z}, {J, z} ∈ E(H^+_Z).
The matching M is always recomputed if, through application of rules, it no longer covers every inner bubble or is no longer maximal when testing whether Rules 7 or 8 apply (i.e., if the preceding rules do not apply). If M does cover all inner bubbles but neither Rule 7 nor Rule 8 applies then, as we will prove later, this implies |M| ∈ O(k^3) and, hence, that there are at most 2|M| ∈ O(k^3) inner bubbles.
Let L = D l Z \ V (M ) be the set of leaf bubbles that are not covered by M . Because the matching covers at least all inner bubbles, we know that |S| ≤ 2|M | + |L|. Therefore we have to find a reduction rule that reduces the number of leaf bubbles in L. Every leaf bubble in L is adjacent to an inner bubble in H Z , because M covers all leaf bubbles that are not adjacent to an inner bubble. To bound the number of leaf bubbles in L we define for each z ∈ Z a graph G z with the help of the following two sets. The first one, consists of all vertices that are contained in an inner bubble that is adjacent to a leaf bubble in In the graph G z each leaf bubble I ∈ L z is a single vertex. We are not interested in the internal structure of leaf bubbles in L z , whereas we are interested in the structure of the inner bubbles that are adjacent to the leaf bubbles in L z . Thus we add the connected component that corresponds to an inner bubble which is adjacent to a bubble in L z to G z . In order to apply the concept of flowers and blocking sets in G z , an edge e ∈ E(G z ) is an S-edge in G z if e = {I, w} with I ∈ L z and w ∈ V i z . Note that e is an edge in G z , because there exists an S-edge e ′ = {v, w} in G with v ∈ V I . Lemma 7. If there exists no z-flower of order k + 1 in G z for a vertex z ∈ Z, then we can find a z-blocker B z ⊆ V i z \ V (S) of size at most 2k in G z .
The lemma follows from Theorem 1 and the preprocessing as well as the construction of G z .
Proof of Lemma 7. The number of vertex-disjoint L z -paths in G z − z is at most k, otherwise the L z -paths together with vertex z would correspond to a z-flower of order k + 1 in G z ; this contradicts the assumption. From Theorem 1 it follows that there exists a set B z ⊆ V (G z −z) = L z ∪ V i z of size at most 2k intersecting every L z -path. Since every S-cycle through z in G z must contain an L z -path, B z is a z-blocker of size at most 2k in G z .
It remains to show that there exists a z-blocker B_z ⊆ V^i_z \ V(S). First we assume that there exists a vertex I ∈ B_z ∩ L_z. From the construction of G_z it follows that every leaf bubble I ∈ L_z has degree one in G_z − z. Thus instead of I we can choose the vertex in N_{G_z}(I) for the z-blocker B_z to obtain that B_z ⊆ V^i_z. In the next step we take care that B_z is also disjoint from V(S). Assume that B_z contains a vertex v_e ∈ V(S) ∩ V^i_z. From the preprocessing it follows that we can add v ∈ V^i_z \ V(S) to B_z and delete v_e from B_z, because every cycle that contains v_e also contains v.
Note that we delete at least as many vertices from B z as we add to B z , hence B z is still of size at most 2k.
Since no previous rule is applicable and a z-flower of order k + 1 in G_z gives rise to a z-flower of order k + 1 in G, we find a z-blocker of size at most 2k for every vertex z ∈ Z. Let B = ⋃_{z∈Z} B_z be the union of all these z-blockers B_z of size at most 2k. Note that the set L is the union of all sets L_z with z ∈ Z, because every leaf bubble is adjacent to a vertex in Z due to Rule 2, hence L = ⋃_{z∈Z} L_z.
The following lemma provides three nice properties of the graph H_{Z∪B} = (D_{Z∪B}, E_{D_{Z∪B}}) which help us to bound the number of leaf bubbles in L ⊆ D^l_Z. Recall that D_{Z∪B} is the set of bubbles in G − (Z ∪ B) − S and that two bubbles I, J are adjacent in H_{Z∪B} if and only if E(V_I, V_J) ∩ S ≠ ∅.
Lemma 8. The graph H_{Z∪B} has the following properties:
1. For each bubble I ∈ D_{Z∪B} there exists a bubble J ∈ D_Z such that V_I ⊆ V_J.
2. For each leaf bubble J ∈ D Z there exists a leaf bubble I ∈ D Z∪B , such that V I = V J .
Proof. Property 1 holds because the set B only splits bubbles of G − Z − S further (because we are now looking at deleting Z ∪ B from G − S) and does not merge any two bubbles. Property 2 follows from the fact that the set B is disjoint from the set of leaf bubbles. Next we show Property 3 by contradiction. We assume that some z ∈ Z is in N G (V I ) and in N G (V J ). Then I and J are both vertices of the graph G z and hence both are contained in the set L z . The consequence is that there exists an L z path from bubble I over bubble K to bubble J in H Z∪B which can be extended to a L z path in G z not containing any vertex in B; this contradicts the fact that B z ⊆ B blocks all L z -paths in G z .
From Lemma 8 it follows that L ⊆ D^l_{Z∪B}; thus we can use H_{Z∪B} to bound the number of leaf bubbles in L. Let ℐ = {J ∈ D^i_{Z∪B} | E(L, J) ≠ ∅} be the set of inner bubbles in H_{Z∪B} that are adjacent to a leaf bubble in L. Clearly the number of edges between ℐ and L in H_{Z∪B} equals |L|. Instead of again using a matching to reduce this number we consider more carefully the properties of these edges. For this we define the property of seeing a pair in a slightly different way. Let e = {I, J} be an edge with I ∈ ℐ and J ∈ L. We say that e sees the pair {x, y} of different vertices x ∈ Z ∪ B and y ∈ Z if {I, x}, {J, y} ∈ E(H^+_{Z∪B}). Observe that a bubble in L is never adjacent to a vertex in B in the graph H^+_{Z∪B}, because B ⊆ ⋃_{z∈Z} V^i_z \ V(S).

Rule 9:
If at least (k+2) edges {I 1 , J 1 }, {I 2 , J 2 }, . . . {I l , J l } with l ≥ k+2, I i ∈ I and J i ∈ L for 1 ≤ i ≤ l see a pair {x, y} of different vertices, such that x ∈ Z ∪ B is adjacent to I i , y ∈ Z is adjacent to J i for all i ∈ {1, 2, . . . , l}, then we add {x, y} to the set of pair-constraints P.
At first sight Rules 7 and 9 may seem somewhat similar, but on closer inspection one can observe a decisive difference. In Rule 9 we consider only edges between two disjoint sets of bubbles, whereas the edges in M can be between two inner bubbles, an inner bubble and a leaf bubble, or between two leaf bubbles. For this reason we can require in Rule 9 that all bubbles in ℐ are adjacent to x and all bubbles in L are adjacent to y; this is not possible in Rule 7. We will see later that we need the definite assignment of the bubbles to the vertices in Z ∪ B by applying Rule 9.
Rule 10: If there exists an edge e = {I, J} with I ∈ I and J ∈ L such that e sees no single vertex z ∈ B ∪ Z and for every pair {x, y} seen by e the pair {x, y} is a pair-constraint in P, then remove e S from S, delete J from L and replace I by I ∪ J in I.
If we delete an edge e = {I, J} from S by applying Rule 10, then the consequence is that bubbles I and J are now merged into a single bubble. Anyhow, it is sufficient to continue with Rule 9, because M is still a matching that covers all inner bubbles in the current graph H Z and B still has the properties of Lemma 8 with respect to the current graph H Z∪B . That the edge set M is still a matching in H Z holds because we never delete an edge in M or an endpoint of an edge in M ; we only merge an endpoint of an edge in M with an unmatched leaf bubble in L. The first two properties of Lemma 8 obviously hold with respect to the current graph H Z . That Property 3 also holds follows from the fact that the leaf bubbles that are still in L are the same as before and adjacent to the same inner bubbles as before.
The reduction rules are safe. First we show that our reduction rules are safe, i.e., that there exists a solution for (G, S, P, k) if and only if there exists a solution for (G′, S′, P′, k′). Note that Rules 1, 2, and 6 are obviously safe and Rule 3 is safe because for every S-cycle through an edge e ∈ S that is a bridge in (V, E \ (S \ {e})) there is another S-edge e′ on the cycle. Let us consider the set P of pair-constraints to see that Rules 4 and 5 are safe. The set P naturally leads to the graph (V(P), P) and has the property that we have to pick at least one vertex of each pair-constraint for a solution for (G, S, P, k). Hence any solution for (G, S, P, k) must contain a vertex cover of this graph. Thus, Rules 4 and 5 are direct analogues of classical reduction rules for the vertex cover problem, and hence safe. To show that the other rules are safe, we first show a technical lemma about a property of edges in H_{Z∪B}.
Lemma 9. If two different edges {I_1, J_1} and {I_2, J_2} in H_{Z∪B} with I_1, I_2 ∈ ℐ and J_1, J_2 ∈ L see a vertex z ∈ Z, respectively a pair {x, y} with x ∈ Z ∪ B and y ∈ Z such that {x, I_1}, {x, I_2}, {y, J_1}, {y, J_2} ∈ E(H^+_{Z∪B}), then they are disjoint, i.e., I_1 ≠ I_2 and J_1 ≠ J_2.
Proof. We first assume that I_1 = I_2. This implies that J_1 and J_2 are leaf bubbles in L which are adjacent to the same inner bubble I = I_1 = I_2 in H_{Z∪B}. For J_1 and J_2 it must hold that z ∈ N_G(V_{J_i}), respectively y ∈ N_G(V_{J_i}), for i = 1, 2. But this is a contradiction to Property 3 of Lemma 8. On the other hand, if J_1 = J_2, then I_1 = I_2 because every leaf bubble in L is adjacent to only one other bubble, contradicting that the two edges are different.
To show that Rules 7 and 9 are safe, we have to prove that we only add a pair {x, y} of vertices to the set P of pair-constraints if either x or y must be in each solution of size at most k. The (k + 2) edges that see a pair {x, y} are pairwise disjoint, because M is a matching and Lemma 9 holds. Hence we have at least (k + 2) disjoint x-y paths in H + Z respectively H + Z∪B which we can extend to at least (k + 2) disjoint x-y paths in G. This is the reason why at least one of x and y must be in any solution and it is safe to add {x, y} to P as a pair-constraint. It remains to show that Rules 8 and 10 are safe. For this we prove that the edges that we delete in these rules are irrelevant. First we prove the following lemma. Since every solution X for (G, S, P, k) is also a solution for (G, S \ {e S }, P, k), we only have to show the other direction.
Let X be a solution for (G, S \ {e_S}, P, k). We assume that there exists an S-cycle C in G − X. This S-cycle C can only contain the S-edge e_S; otherwise C would be an (S \ {e_S})-cycle, which contradicts the fact that X is a solution for (G, S \ {e_S}, P, k).

Claim 2.
If an S-cycle C in G only contains the S-edge e S , then there exists either a vertex y ∈ Y such that e sees the single vertex y and y is contained in cycle C or two different vertices x, y ∈ Y such that e sees the pair {x, y} and cycle C contains x and y.
Proof. Let C be an S-cycle with the properties of the claim. Thus C must exit bubble I and bubble J by edges that end in Y , because this is the only way to obtain a path from v I to v J that uses no edge from S. If these two edges share their endpoint y in Y , then e sees the single vertex y and y is contained in C. On the other hand if these two edges have different endpoints x, y in Y , then e sees the pair {x, y} and the vertices x, y are contained in C.
Based on Claim 2, it follows that edge e = {I, J} must see a single vertex y ∈ Y that is contained in C or a pair {x, y} with x, y ∈ Y such that x, y are contained in C. From the properties of edge e follows that e sees no single vertex and every pair {x, y} that is seen by e must be contained in a pair-constraint. Let {x, y} be the pair that is seen by e such that x, y are vertices of cycle C (Claim 2). But at least one vertex of the pair {x, y} must be in the solution X for (G, S \ {e S }, P, k), since e sees only pairs that are contained in the set P of pair-constraints; hence C is no cycle in G − X.
From Lemma 10 follows that we only delete an edge e S in Rule 8 and 10 when e S is irrelevant for instance (G, S, P, k); this holds because Y = Z respectively Y = Z ∪ B is a superset of Z.
Applying the Rules. First we show that if none of the rules can be applied, then the size of S is bounded by O(k^4). For this we prove two lemmas. One bounds the size of M, which helps us to bound the number of inner bubbles, and the other bounds the number of leaf bubbles in L.
Lemma 11. If we cannot apply Rules 1 through 8 then |M| ∈ O(k^3).
Proof. Each edge in M sees either a pair of vertices in Z or a single vertex in Z. The number of pairs in Z is at most $\binom{|Z|}{2} \le |Z|^2$. Therefore the number of pairs in Z that are not in the set P of pair-constraints is at most |Z|^2. Because we cannot apply Rule 7, at most (k + 1) edges in M see any pair that is not in the set of pair-constraints. Thus at most (k + 1)|Z|^2 edges of M can see a pair of vertices in Z that is not in P. The number of edges in M that see a single vertex in Z is at most k|Z|; otherwise we can apply Rule 6, because at least one single vertex z in Z is seen by at least k + 1 edges from M and these edges together with z are a z-flower of order k + 1 in H^+_Z which we can expand to a z-flower of order k + 1 in G. Since we cannot apply Rules 6, 7 or 8, this leads to at most (k + 1)|Z|^2 + k|Z| ∈ O(k^3) edges in M, because |Z| ≤ 8k.
From the lemma it follows that the number of inner bubbles in H Z is at most 2|M | ∈ O(k 3 ).
Lemma 12. If we cannot apply Rules 1 through 10 then the size of L is bounded by O(k 4 ).
Proof. We claim that the number of edges between bubbles in I and bubbles in L is at most (k + 1)|Z|(|B| + |Z|) + k|Z|, if no rule is applicable. This implies that there are at most O(k 4 ) leaf bubbles in L.
Each edge {I, J} between a bubble I ∈ ℐ and a bubble J ∈ L sees a pair {x, y} such that {x, I}, {y, J} ∈ E(H^+_{Z∪B}), where x ∈ Z ∪ B is adjacent to I and y ∈ Z is adjacent to J, or it sees a vertex z in Z; hence the number of pairs is at most |Z|(|Z| + |B|). Rule 9 adds {x, y} to P if at least (k + 2) edges {I_1, J_1}, {I_2, J_2}, …, {I_l, J_l} with l ≥ k + 2, I_i ∈ ℐ and J_i ∈ L for 1 ≤ i ≤ l see the pair {x, y} such that x ∈ Z ∪ B is adjacent to I_i and y ∈ Z is adjacent to J_i for 1 ≤ i ≤ l. This bounds the number of edges between ℐ and L which see a pair that is not in the set P of pair-constraints by (k + 1)|Z|(|Z| + |B|). The number of edges between ℐ and L that see a certain vertex z is at most k; otherwise the at least k + 1 edges between ℐ and L that see vertex z together with vertex z form a z-flower of order k + 1 in H^+_{Z∪B}, because Lemma 9 ensures that the edges are disjoint. But then we can apply Rule 6 and delete vertex z. Hence at most k|Z| edges between ℐ and L can see a vertex in Z. This leads to at most (k + 1)|Z|(|B| + |Z|) + k|Z| edges between ℐ and L, because we cannot apply Rules 6, 9 or 10; this implies that |L| ∈ O(k^4), because |Z| ≤ 8k and |B| ≤ 2k|Z| ≤ 16k^2.
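For concreteness, the bound can be expanded by substituting the two estimates |Z| ≤ 8k and |B| ≤ 16k^2 stated in the proof. The explicit constants below are only a worked instance of that substitution; they are not claimed in this form in the text and are not optimized.

```latex
\begin{align*}
(k+1)\,|Z|\,(|B|+|Z|) + k\,|Z|
  &\le (k+1)\cdot 8k \cdot (16k^2 + 8k) + 8k^2 \\
  &= (k+1)(128k^3 + 64k^2) + 8k^2 \\
  &= 128k^4 + 192k^3 + 72k^2 \;\in\; O(k^4).
\end{align*}
```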
If we combine these two results, we know that |D i Z | + |D l Z | ∈ O(k 4 ). As mentioned above this is an upper bound for the number of edges in S, because H Z is a forest, because there is at most one edge of S between any two bubbles, and because V (S) ∩ Z = ∅.
Finally we have to prove that we can perform the reduction in polynomial time. First we prove that each rule is applied a polynomial number of times and second that every single rule application can be performed in polynomial time.
Lemma 13. Each reduction rule is applied at most a number of times that is polynomially bounded in the input size.
Proof. Note that we reduce in each rule, except Rules 7 and 9, the size of at least one of the sets V , E, S, the value k or decide that no solution of size at most k exists. In Rules 7 and 9 we add pair constraints to P, but if P contains more than k 2 pair constraints, we either find a vertex z ∈ V (P) that we delete in Rule 4 and reduce k by one or we decide in Rule 5 that no solution of size at most k exists. This bounds the number of pair constraints that we add to P during the reduction by k 3 because we can decrease k at most k times. Thus, each rule is applied at most a number of times that is polynomial in the input size.
Next we show that each single rule application can be performed in polynomial time. It is obvious that we can apply Rules 1 through 5 in polynomial time. The following lemma addresses Rule 6 by solving a matroid parity problem on an appropriate gammoid.
Lemma 14. Let G = (V, E), z ∈ V , and S ⊆ E. A z-flower of maximum order, i.e., a maximum number of S-cycles that intersect only in z, can be found in (deterministic) polynomial time.
Proof. For simplicity, we assume that there are no edges of S incident with z. If this is not the case, then it can be checked that for each neighbor v ∈ N (z) with {v, z} ∈ S removing {v, z} from S and adding instead all other edges incident with v to S gives the desired result. Furthermore, we assume that no two edges of S are incident with the same vertex of G; this can be achieved by appropriate subdivision operations, without changing the maximum order of z-flowers.
Let {C 1 , . . . , C t } be a z-flower of order t. Each C i gives rise to a path P i between two different neighbors u and v of z; all these paths are fully vertex-disjoint. By our above assumption, there are no S-edges incident with z; hence, each P i must contain two consecutive vertices, say s i and t i , with {s i , t i } ∈ S. In this way, each path P i can be split into two paths, P i,s and P i,t , from N (z) to {s i , t i }; all these 2t paths are pairwise vertex-disjoint and do not contain the vertex z. Thus, from any z-flower of order t we get 2t vertex-disjoint paths in G − z from N (z) to T ⊆ V (S), i.e., endpoints of S-edges, such that T can be partitioned into t two-sets of vertices that are also edges in S. In the language of gammoids this means that T is an independent set in the gammoid on graph G − z, with sources N (z), and ground set V (S).
Conversely, any independent set T in the mentioned gammoid implies the existence of |T | vertex-disjoint paths in G − z from N (z) to T . If, as above, T can be partitioned into edges of S, then this gives rise to a z-flower of order t = |T |/2: Clearly, |T | must be even to allow for the partition into sets of size two. Moreover, the paths are vertex-disjoint and, thus, two paths from N (z) ending in {s i , t i } ∈ S can be combined, using that {s i , t i } must be an edge of G, into a single path, say P i , from N (z) to N (z) that contains at least one edge of S. Note that, because s i and t i are ends of two paths in the packing, they cannot occur in any other paths, so this combination still yields vertex-disjoint paths in G − z. Finally, adding the vertex z, the paths P 1 , . . . , P t can be combined into t S-cycles that intersect only in z.
Thus, the task of finding a z-flower of maximum order reduces to that of solving a matroid parity problem on a gammoid: The underlying graph is G − z, the source set is N G (z), the ground set is V (S), and the pairs are given by S. Recall that pairs in S are vertex-disjoint. Using the algorithm due to Lovász [14], one may find a maximum independent set composed of pairs in S in polynomial time, when provided with a matrix representation for the gammoid. A small caveat is that one would need a randomized algorithm for finding said representation. Conveniently, specialized deterministic algorithms exist for subclasses of linear matroids; we can use a deterministic algorithm due to Tong et al. [20] that solves the problem by reduction to weighted matching on graphs. (Note that given a maximum independent set T composed of pairs, the cycles of the z-flower can be found by a simple disjoint-paths computation from N (z) to T in G − z.)
It remains to show that we can apply Rules 7 through 10 in polynomial time.
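The independence test underlying Lemma 14 can be illustrated with a minimal sketch: a set T is independent in the gammoid exactly if |T| vertex-disjoint paths link N(z) to T in G − z, which a unit-capacity flow with split vertices decides. This is not the matroid parity algorithm of Lovász or Tong et al.; it only checks a given candidate T. The function names and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict, deque

def max_vertex_disjoint_paths(edges, sources, targets, removed=()):
    """Maximum number of pairwise vertex-disjoint paths from `sources` to
    `targets` in the undirected graph `edges` after deleting `removed`."""
    removed = set(removed)
    cap = defaultdict(int)   # residual capacities of arcs
    adj = defaultdict(set)   # adjacency of the residual graph

    def add_arc(a, b):
        cap[(a, b)] += 1
        adj[a].add(b)
        adj[b].add(a)        # reverse arc, initially with capacity 0

    vertices = {v for e in edges for v in e} | set(sources) | set(targets)
    vertices -= removed
    for v in vertices:
        add_arc(("in", v), ("out", v))       # each vertex may be used once
    for u, v in edges:
        if u in vertices and v in vertices and u != v:
            add_arc(("out", u), ("in", v))
            add_arc(("out", v), ("in", u))
    for s in set(sources) & vertices:
        add_arc("SRC", ("in", s))
    for t in set(targets) & vertices:
        add_arc(("out", t), "SNK")

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {"SRC": None}
        queue = deque(["SRC"])
        while queue and "SNK" not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if "SNK" not in parent:
            return flow
        b = "SNK"
        while parent[b] is not None:         # push one unit along the path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1

def is_independent_in_gammoid(edges, z, T):
    """True iff |T| vertex-disjoint paths link N(z) to T in G - z."""
    neighbors = {v for e in edges if z in e for v in e if v != z}
    T = set(T)
    return max_vertex_disjoint_paths(edges, neighbors, T, removed={z}) == len(T)

# Toy instance (hypothetical): two S-cycles through z that meet only in z.
edges = [("z", "a"), ("z", "b"), ("z", "c"), ("z", "d"),
         ("a", "s1"), ("s1", "t1"), ("t1", "b"),
         ("c", "s2"), ("s2", "t2"), ("t2", "d")]
flower_ok = is_independent_in_gammoid(edges, "z", {"s1", "t1", "s2", "t2"})
```

Here `flower_ok` is true, matching the z-flower of order 2 formed by the cycles through the S-edges {s1, t1} and {s2, t2}. A full implementation of Rule 6 would still have to search over unions of S-pairs, which is the matroid parity step.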
Lemma 15. We can apply Rules 7 and 8 in polynomial time.
Proof. First of all we store for each edge e = {I, J} ∈ M all vertices z ∈ Z seen by edge e and all pairs {x, y} with x, y ∈ Z seen by edge e. For each edge we need at most O(|Z| 2 ) time; we only have to test for each vertex z ∈ Z respectively each pair {x, y} with x, y ∈ Z whether {I, z}, {J, z} ∈ E(H Z ) respectively {I, x}, {J, y} ∈ E(H Z ) or {I, y}, {J, x} ∈ E(H Z ). Next we count how many edges see a pair {x, y} with x, y ∈ Z and denote this value by c {x,y} . It takes at most O(|E||Z| 2 ) time to compute all values; we only have to count for how many edges we store a certain pair. If a counter c {x,y} has value at least k + 2, then we add the pair {x, y} to the set P of pair-constraints. We can check this for all counters in O(|Z| 2 ) time. The above computation corresponds to the computation we need for Rule 7. To apply Rule 8 we only have to look at all vertices and pairs that we stored for an edge e ∈ M . If we have stored no single vertex and only pairs that are pair-constraints in P, then e fulfills the conditions of an edge that we delete in Rule 8. To check this for one edge takes at most O(|Z| 2 ) time.
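The bookkeeping in this proof can be sketched as follows. The sketch assumes that the bubble graph is given as a dictionary `nbrs` mapping each bubble to the set of Z-vertices adjacent to it in H Z, and that M is a list of bubble pairs; these names and the data layout are hypothetical, and the second function only tests the condition described in the proof rather than restating Rule 8 itself.

```python
from collections import Counter

def seen_vertices(I, J, nbrs):
    """Vertices z in Z with {I, z} and {J, z} both present."""
    return nbrs[I] & nbrs[J]

def seen_pairs(I, J, nbrs):
    """Unordered pairs {x, y}, x != y, with x adjacent to one endpoint of
    the edge {I, J} and y adjacent to the other endpoint."""
    return {frozenset((x, y)) for x in nbrs[I] for y in nbrs[J] if x != y}

def add_pair_constraints(M, nbrs, P, k):
    """Rule 7 bookkeeping: add every pair seen by at least k + 2 edges of M."""
    counts = Counter()
    for I, J in M:
        counts.update(seen_pairs(I, J, nbrs))
    return set(P) | {pair for pair, c in counts.items() if c >= k + 2}

def rule_8_candidates(M, nbrs, P):
    """Edges that see no single vertex and only pairs already in P."""
    return [(I, J) for I, J in M
            if not seen_vertices(I, J, nbrs) and seen_pairs(I, J, nbrs) <= P]

# Toy data (hypothetical): three edges see {x, y}; the fourth sees vertex w.
nbrs = {"I1": {"x"}, "J1": {"y"}, "I2": {"x"}, "J2": {"y"},
        "I3": {"y"}, "J3": {"x"}, "I4": {"x", "w"}, "J4": {"w"}}
M = [("I1", "J1"), ("I2", "J2"), ("I3", "J3"), ("I4", "J4")]
P = add_pair_constraints(M, nbrs, set(), k=1)
candidates = rule_8_candidates(M, nbrs, P)
```

Each edge is processed in O(|Z| 2 ) set operations, in line with the running time stated in the proof.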
The proof that Rules 9 and 10 can be applied in polynomial time is analogous to the proof for Rules 7 and 8. We only have to remember which endpoint is adjacent to which vertex in a pair.
Lemma 16. We can apply Rules 9 and 10 in polynomial time.
Proof. First of all we store for each edge e = {I, J} with I ∈ I, J ∈ L all vertices z ∈ Z seen by edge e and all pairs (x, y) with x ∈ Z ∪ B adjacent to I, y ∈ Z adjacent to J such that e sees the pair {x, y}. For each edge e = {I, J} with I ∈ I, J ∈ L we need at most O(|Z ∪ B||Z|) time; we only have to test for each vertex z ∈ Z respectively each pair (x, y) with x ∈ Z ∪ B, y ∈ Z whether {I, z}, {J, z} ∈ E(H Z∪B ) respectively {I, x}, {J, y} ∈ E(H Z∪B ). Next we count for how many edges we stored the pair (x, y) with x ∈ Z ∪ B, y ∈ Z and denote this value by c (x,y) . It takes at most O(|E||Z ∪ B||Z|) time to compute all values; we only have to count for how many edges we store a certain pair. If a counter c (x,y) has value at least k + 2, then we add the pair {x, y} to the set P of pair-constraints. We can check this for all counters in […]
[…] adding an edge {x, y} between x and y that is also contained in S ′ ; hence there are two edges between x and y with {x, y} ∈ P in graph G ′ and we add exactly one edge between x and y to S ′ . Because we cannot apply Rule 4 or 5 to (G, S, P, k), we know that |P| ≤ k 2 . This leads to a bound of |S| + |P| ∈ O(k 4 ) edges in S ′ for the edge subset fvs problem after the reduction.
Finally, we combine the results of Section 3 and Section 4 to obtain a polynomial kernel for edge subset fvs parameterized by k. Let us first make some comments about the reduction of the size of S and the kernelization: For the reduction of the size of S we use the fact that we can always find a solution that is disjoint from T . This only holds because we modified the graph accordingly. But since this is a correct reduction, an input instance (G, S, k) of edge subset fvs has a solution if and only if the output instance (G ′ , S ′ , k ′ ) of the reduction in Section 4 has a solution. Thus it is not a problem that we consider dominant solutions for the kernelization in Section 3 and that the kernelization only guarantees the preservation of dominant solutions. Every instance (G ′ , S ′ , k ′ ) has a dominant solution of size at most k ′ when a solution of size at most k ′ exists; remember that X is a dominant solution for (G ′ , S ′ , k ′ ) if it has minimum size and contains a maximal number of vertices from T ′ among solutions of minimum size. Hence, if (G ′ , S ′ , k ′ ) has a solution, then it has a dominant solution X, and X is a dominant solution for (G ′ , S ′ , k ′ ) if and only if X is a dominant solution for (G ′′ , S ′ , k ′ ), the output instance of the kernelization in Section 3.
In summary, the reduction of the number of edges in S to O(k 4 ) edges, together with the kernelization to O(|S| 2 k) vertices for edge subset fvs parameterized by |S| and k, results in a kernelized instance with O(k 9 ) vertices for edge subset fvs parameterized by k.
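The exponent follows by substituting the first bound into the second. The short calculation below assumes that the reduced parameter satisfies k ′ ≤ k, which is taken here as an assumption because the reduction rules only ever decrease k.

```latex
\[
|S'| \in O(k^{4}), \quad k' \le k
\;\Longrightarrow\;
O\bigl(|S'|^{2}\,k'\bigr) = O\bigl((k^{4})^{2}\cdot k\bigr) = O(k^{9}).
\]
```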

Conclusions
We have shown that the subset fvs problem has a randomized polynomial kernelization using the matroid-based tools of Kratsch and Wahlström [11], positively answering the question of Cygan et al. [3]. As in previous work [11], the error probability can be made exponentially small without increasing the kernel size. Nevertheless, it would of course be very interesting to know whether the use of randomization and/or matroids can be avoided. Furthermore, there is quite a gap between the upper bound of O(k 9 ) vertices and the lower bound inherited from vertex cover [4], which rules out kernels of size O(k 2−ε ) unless the polynomial hierarchy collapses.
Other open problems regarding existence of polynomial kernels, possibly amenable to the matroid tools, are multiway cut and directed feedback vertex set (dfvs). There is also a directed version of subset fvs, called directed subset feedback vertex set, but it generalizes dfvs, whose kernel status has remained open for quite some time now.