Hardness of approximation for H-free edge modification problems

The $H$-Free Edge Deletion problem asks, for a given graph $G$ and an integer $k$, whether it is possible to delete at most $k$ edges from $G$ to make it $H$-free, that is, not containing $H$ as an induced subgraph. The $H$-Free Edge Completion problem is defined similarly, but we add edges instead of deleting them. These two problem families have recently been studied intensively from the point of view of parameterized complexity and kernelization. In particular, it was shown that the problems do not admit polynomial kernels (under plausible complexity assumptions) for almost all graphs $H$, with several important exceptions occurring when the class of $H$-free graphs exhibits some structural properties. In this work we complement the parameterized study of edge modification problems to $H$-free graphs by considering their approximability. We prove that whenever $H$ is $3$-connected and has at least two non-edges, then both $H$-Free Edge Deletion and $H$-Free Edge Completion are very hard to approximate: they do not admit $\mathrm{poly}(\mathsf{OPT})$-approximation in polynomial time, unless $\mathrm{P}=\mathrm{NP}$, or even in time subexponential in $\mathsf{OPT}$, unless the Exponential Time Hypothesis fails. The assumption of the existence of two non-edges appears to be important: we show that whenever $H$ is a complete graph without one edge, then $H$-Free Edge Deletion is tightly connected to the Min Horn problem, whose approximability is still open. Finally, in an attempt to extend our hardness results beyond $3$-connected graphs, we consider the cases of $H$ being a path or a cycle, and we achieve an almost complete dichotomy there.


Introduction
(Completion) problems whenever H is 3-connected and has at least 2 non-edges. Nontrivial positive cases include e.g. H being a path on 4 vertices [9] (that is, Cograph Edge Deletion (Completion)), and H being $K_4$ minus one edge [5] (that is, Diamond-free Edge Deletion). One of the most prominent open cases left is the kernelization complexity of Claw-Free Edge Deletion [4,6].
Our motivation and results. The starting point of our work is the realization that the propagational character of H-free Edge Deletion (Completion), which is the basic explanation of its apparent kernelization hardness, also makes the greedy approach to approximation incorrect. One cannot greedily remove all the edges of any copy of H in the graph, because removing an edge does not necessarily help: it may create new copies of H in the instance. Hence, the approximation complexity of H-free Edge Deletion (Completion) is also highly unclear. On the other hand, the links between approximation and kernelization are well-known in parameterized complexity: it is often the case that a polynomial kernel for a problem can be turned into a poly(OPT)-approximation algorithm (i.e. an algorithm that returns a solution of cost bounded by some polynomial function of the optimum), by greedily taking the kernel and reverting the reduction rules. While this intuitive link is far from being formal, and there are examples of problems behaving differently [8], the combinatorial insight given by kernelization algorithms may be very useful in the approximation setting.
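The propagation effect described above can be observed on a tiny example. The following Python sketch is only an illustration and not part of the constructions of this paper: the helper name has_induced_copy is hypothetical, and the check is a brute force over all vertex mappings. It takes H to be $K_4$ minus one edge and shows that $K_4$ itself is H-free, while deleting a single edge of $K_4$ creates an induced copy of H.

```python
from itertools import combinations, permutations

def has_induced_copy(g_vertices, g_edges, h_vertices, h_edges):
    """Brute-force test whether G contains H as an induced subgraph."""
    g = {frozenset(e) for e in g_edges}
    h = {frozenset(e) for e in h_edges}
    h_vertices = list(h_vertices)
    for subset in combinations(g_vertices, len(h_vertices)):
        for image in permutations(subset):
            phi = dict(zip(h_vertices, image))
            # Induced copy: adjacency must agree on every pair of vertices.
            if all((frozenset((phi[a], phi[b])) in g) == (frozenset((a, b)) in h)
                   for a, b in combinations(h_vertices, 2)):
                return True
    return False

# H = K_4 minus one edge; the missing edge is (0, 1).
H_V = [0, 1, 2, 3]
H_E = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# G = K_4: every four vertices induce a clique, so G is H-free.
G_V = ["a", "b", "c", "d"]
G_E = list(combinations(G_V, 2))

before = has_induced_copy(G_V, G_E, H_V, H_E)
# Deleting the edge ab turns G into a copy of H.
after = has_induced_copy(G_V, [e for e in G_E if e != ("a", "b")], H_V, H_E)
print(before, after)
```

The first value reports that no obstacle is present before the deletion, the second that one appears afterwards, which is why deleting edges greedily is not safe.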
Therefore, we propose to study the approximability of H-free Edge Deletion (Completion) as well, alongside with the best possible running times of fixed-parameter algorithms and the existence of polynomial kernels. This work is the first step in this direction.
We prove that the H-free Edge Deletion (Completion) problems are very hard to approximate for a vast majority of graphs H, which mirrors the kernelization hardness results of Cai and Cai [4,5]. The following theorem explains our main result formally. Theorem 1 makes two structural assumptions about graph H: that it is 3-connected, and has at least two non-edges. The first one is a crucial technical ingredient in the reductions, because it enables us to argue that for any vertex cut of size 2, every copy of H in the graph is completely contained on one side of the cut. Relaxing this assumption is a major issue addressed by Cai and Cai [4,5] in their work. In an attempt to lift this assumption in our setting as well, we try to resolve the case of H being a path or a cycle first; this reflects the development of the story of kernelization hardness for the considered problems [4,5,9,12]. The following theorem summarizes our results in this direction. Together with some easy cases and known positive results [14], this gives an almost complete dichotomy for paths and cycles. The only missing case is Cograph Edge Deletion (for $H = P_4$), for which we expect a positive answer due to the existence of a polynomial kernel [9]. However, our preliminary attempt at lifting the kernel of Guillemot et al. [9] showed that the approach does not directly work for approximation, and new insight seems to be necessary.
Finally, somewhat surprisingly, we show that the assumption that H has at least two non-edges appears to be important. Suppose $H = K_n \setminus e$ is a complete graph on n ≥ 5 vertices with one edge removed. While H-free Edge Completion is trivially polynomial-time solvable, due to each obstacle having only one way to be destroyed, the complexity of H-free Edge Deletion turns out to be much more interesting. Namely, we show that it is tightly connected to the complexity of Min Horn Deletion, which apparently is one of the remaining open cases in the classification of the approximation complexity of CSP problems of Khanna et al. [11]. Hence, the following theorem shows that the case of H being a complete graph without an edge may be an interesting outlier in the whole complexity picture.
Theorem 3. For any n ≥ 5, the $K_n \setminus e$-free Edge Deletion problem is Min Horn Deletion-complete with respect to A-reductions.
The exact meaning of Min Horn Deletion-completeness, A-reductions and other definitions related to the hardness of approximation for CSP problems are explained in Section 4. A direct consequence of Theorem 3 and the work of Khanna et al. [11] is that $K_n \setminus e$-free Edge Deletion does not admit a $2^{O(\log^{1-\epsilon} m)}$-approximation algorithm working in polynomial time, for any $\epsilon > 0$. Moreover, Theorem 3 implies that $K_n \setminus e$-free Edge Deletion is poly-APX-hard if and only if each Min Horn Deletion-complete problem is poly-APX-hard, the latter being an intriguing open problem left by Khanna et al. [11] in their study of approximability of CSPs.
While there is no direct connection between the existence of a poly(OPT) approximation and poly-APX-hardness, we still believe that our reduction corroborates the hardness of resolving the approximation question for $K_n \setminus e$-free Edge Deletion in terms of the optimum value. Intuitively, showing poly-APX-hardness should be easier than refuting poly(OPT) approximation. Below we state formally what our reduction actually implies.
Corollary 4. Let n ≥ 5. Then it is NP-hard to approximate the $K_n \setminus e$-free Edge Deletion problem within factor $2^{O(\log^{1-\epsilon} m)}$ for any $\epsilon > 0$, where m is the number of edges in a given graph.
Corollary 5. Let n ≥ 5. Then the $K_n \setminus e$-free Edge Deletion problem admits an $n^{\delta}$-approximation for all $\delta > 0$, if and only if each Min Horn Deletion-complete problem admits an $n^{\delta_1}$-approximation for all $\delta_1 > 0$.
Our techniques. To prove our main result, Theorem 1, we employ the following strategy. We first consider the sandwich problem defined as follows: in Sandwich H-Free Edge Deletion we are given a graph G together with a subset D of undeletable edges, and the question is whether there exists a subset F ⊆ E(G) \ D of deletable edges for which G − F is H-free. Note that the sandwich problem differs from the standard H-free Edge Deletion problem in two aspects: first, some edges are forbidden to be deleted, and, second, it is a decision problem about the existence of any solution; we do not impose any constraint on its size. For completion, the sandwich problem is defined similarly: we have unfillable non-edges, i.e., non-edges that are forbidden to be added in the solution.
The crux of the approach is to prove that Sandwich H-Free Edge Deletion is actually NP-hard under the given assumptions on H. The next step is to reduce from the sandwich problem to the standard optimization variant. This is done by adding gadgets that emulate undeletable edges by introducing a large approximation gap, as follows. For each undeletable edge e, attach a large number of copies of H to e, so that each copy becomes an induced H-subgraph if e gets deleted. Then any solution that deletes the undeletable edge e must have a very large cost, due to all the disjoint copies of H that appear after the removal of e. The assumption that H is 3-connected is very useful for showing that the constructions do not introduce any additional, unwanted copies of H in the graph.
The approach for completion problems is similar. To prove Theorem 2 that concerns paths and cycles, we give problem-specific constructions using the same approach. Some of them are based on previous ETH-hardness proofs for the problems, given by Drange et al. [7].
As far as Theorem 3 is concerned, we employ a similar reduction strategy, but instead of starting from 3SAT, we start from a carefully selected MinOnes(F) problem: the problem of optimizing the number of ones in a satisfying assignment to a boolean formula that uses only constraints from some fixed family F. In particular, the constraint family F needs to be rich enough to be Min Horn Deletion-hard, while at the same time it needs to be restrictive enough so that it can be expressed in the language of $K_n \setminus e$-free Edge Deletion.
Our constructions are inspired by the rich toolbox of hardness proofs for kernelization and fixed-parameter algorithms for edge modification problems [1,4,5,7,9,12]. In particular, the idea of considering sandwich problems can be traced back to the work of Cai and Cai [4,5], who use the term quarantine for the optimization variants of edge modification problems with undeletable edges and non-fillable non-edges. Quarantined problems serve a technical, auxiliary role in the work of Cai and Cai [4,5]: one first proves hardness of the quarantined problem, and then lifts the quarantine by attaching gadgets, similarly as we do.
However, we would like to point out the new challenges that appear in the approximation setting. Most importantly, the vast majority of previous reductions heavily use budget constraints (i.e. the fact that the solution is stipulated to be of size at most k) to argue the correctness; this includes the general results of Cai and Cai [4,5]. In our setting, we cannot use arguments about the tightness of the budget, because we need to introduce a large approximation gap at the end of the construction. The usage of the sandwich problems without any budget constraints is precisely the way we overcome this difficulty. Thus, most of the old reductions do not work directly in our setting, but of course some technical constructions and ideas can be salvaged.
Outline. In Section 2 we introduce terminology and recall the most important facts from the previous works. Section 3 is devoted to the proof of our main result, Theorem 1. However, as the proof for H-free Edge Completion is similar to the proof for H-free Edge Deletion, in Section 3 we present only the proof for H-free Edge Deletion, while the proof for H-free Edge Completion is postponed to Section A.1. In Section 4 we discuss the proof of Theorem 3. Section 5 contains the discussion of Theorem 2, which is largely deferred to the appendix. Concluding remarks and prospects on future work are in Section 6.

Basic graph definitions
We use standard graph notation. For a graph G, by V(G) and E(G) we denote the set of vertices and edges of G, respectively. Throughout the paper we consider simple graphs only, i.e., there are no self-loops nor parallel edges. We use $K_n$ to denote the complete graph on n vertices. By $P_\ell$ ($C_\ell$) we denote the path (cycle) with exactly $\ell$ vertices. By $\overline{G}$ we denote the complement of G, i.e., a graph on the same vertex set, where two distinct vertices are adjacent if and only if they were not adjacent in G. We say that a graph G is H-free if G does not contain H as an induced subgraph.
We define a graph G to be 3-vertex-connected if G has at least 3 vertices, and removing any set of at most two vertices causes G to stay connected. For brevity, we call such graphs 3-connected.
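The definition of 3-connectedness above can be tested directly on small graphs. The following Python sketch is a brute-force illustration under the definition as stated here (at least 3 vertices, and every removal of at most two vertices leaves a connected graph); the function names is_connected and is_3_connected are hypothetical.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Depth-first search connectivity test on the subgraph induced by `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices

def is_3_connected(vertices, edges):
    """Check the definition literally: try every set of at most two removed vertices."""
    vertices = list(vertices)
    if len(vertices) < 3:
        return False
    for r in range(3):
        for removed in combinations(vertices, r):
            if not is_connected(set(vertices) - set(removed), edges):
                return False
    return True

# K_{3,3} is 3-connected, while the cycle on 5 vertices is not.
k33 = [(a, b) for a in (0, 1, 2) for b in (3, 4, 5)]
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(is_3_connected(range(6), k33), is_3_connected(range(5), c5))
```

The check takes time quadratic in the number of vertices times the cost of one search, which is sufficient for the constant-size pattern graphs H considered here.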

Problems and approximation algorithms
In the decision version of the H-free Edge Deletion (Completion) problem, for a given graph G and an integer k, one is to decide whether it is possible to delete (add) at most k edges from (to) G to make it H-free. In particular, we consider the $\overline{P_5}$-Free Deletion (Completion) problem, and call it House-Free Deletion (Completion). However, in the optimization variant of H-free Edge Deletion (Completion) the value of k is not given and the goal is to find a minimum size solution. It will be clear from the context whether we refer to a decision or optimization variant.
In the Sandwich H-Free Edge Deletion (Completion) problem we are given a graph G together with a subset D of undeletable edges. The question is whether there exists a subset F ⊆ E(G) \ D of deletable edges for which G − F is H-free. Note that it is a decision problem, where we ask about existence of any solution, i.e., we do not impose any constraint on the solution size.
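For very small instances, the deletion variant of the sandwich problem can be decided by exhaustive search, which may help in checking gadgets by hand. The following Python sketch is an illustration only: the names has_induced_copy and solve_sandwich are hypothetical, and the running time is exponential in the number of deletable edges.

```python
from itertools import combinations, permutations

def has_induced_copy(g_vertices, g_edges, h_vertices, h_edges):
    """Brute-force test whether G contains H as an induced subgraph."""
    g = {frozenset(e) for e in g_edges}
    h = {frozenset(e) for e in h_edges}
    h_vertices = list(h_vertices)
    for subset in combinations(g_vertices, len(h_vertices)):
        for image in permutations(subset):
            phi = dict(zip(h_vertices, image))
            if all((frozenset((phi[a], phi[b])) in g) == (frozenset((a, b)) in h)
                   for a, b in combinations(h_vertices, 2)):
                return True
    return False

def solve_sandwich(g_vertices, g_edges, undeletable, h_vertices, h_edges):
    """Return some set F of deletable edges with G - F being H-free, or None."""
    undel = {frozenset(e) for e in undeletable}
    deletable = [e for e in g_edges if frozenset(e) not in undel]
    for r in range(len(deletable) + 1):
        for F in combinations(deletable, r):
            removed = {frozenset(e) for e in F}
            rest = [e for e in g_edges if frozenset(e) not in removed]
            if not has_induced_copy(g_vertices, rest, h_vertices, h_edges):
                return list(F)
    return None

# H = K_4 minus one edge, and G is a copy of H on vertices a, b, c, d.
H_V, H_E = [0, 1, 2, 3], [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
G_V = ["a", "b", "c", "d"]
G_E = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]

# Only cd is deletable: deleting it leaves a 4-cycle, which is H-free.
print(solve_sandwich(G_V, G_E, G_E[:4], H_V, H_E))
# Every edge is undeletable: no solution exists.
print(solve_sandwich(G_V, G_E, G_E, H_V, H_E))
```

Note that the search stops at the first feasible set it finds; the sandwich problem asks only for existence, so the size of the returned set carries no meaning.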
Let f be a fixed non-decreasing function on positive integers. An f(OPT)-factor approximation algorithm for a minimization problem X is an algorithm that finds a solution of size at most f(OPT) · OPT, where OPT is the size of an optimal solution for a given instance of X.

Satisfiability and Exponential Time Hypothesis
We employ the standard notation related to satisfiability problems. A 3CNF formula is a conjunction of clauses, where a clause is a disjunction of at most three literals. The 3SAT problem asks, for a given formula ϕ, whether there is a satisfying assignment to ϕ.
The Exponential Time Hypothesis (ETH), introduced by Impagliazzo, Paturi and Zane [10], is now an established tool used for proving conditional lower bounds in the parameterized complexity area (see [13] for a survey on ETH-based lower bounds).
Hypothesis 6 (Exponential Time Hypothesis (ETH) [10]). There is no $2^{o(n)}$ time algorithm for 3SAT, where n is the number of variables of the input formula.
The main consequence of the Sparsification Lemma of [10] is the following theorem: there is no subexponential algorithm for 3SAT even in terms of the number of clauses of the formula.

Hardness for 3-connected H
In this section we present the proof of Theorem 1 for H-free Edge Deletion, while a similar proof for H-free Edge Completion is deferred to Section A.1.

Deletion problems
We start with proving hardness of the sandwich problem.

Lemma 8. Let H be a 3-connected graph with at least 2 non-edges. There is a polynomial-time reduction, which given an instance of 3SAT with n variables and m clauses, creates an equivalent instance of Sandwich H-free Edge Deletion with O(n + m) edges. Consequently, Sandwich H-free Edge Deletion is NP-hard for such graphs H.
Proof. Let ϕ be the given formula in 3CNF, and let vars and cls be the sets of variables and clauses of ϕ. By standard modifications of the formula, we may assume that each clause contains exactly three literals of pairwise different variables. We construct an instance G of Sandwich H-free Edge Deletion as follows. The graph G is created from three types of gadgets: a clause gadget, a variable gadget, and a connector gadget. They are depicted in Figure 1, where presented edges are deletable, and all others are undeletable. We first explain constructions of the gadgets, and then discuss connections between them.

For each variable x ∈ vars, we create a variable gadget $G_x$, which is the graph H with two added edges $e_x$ and $e_{\neg x}$ in place of any two non-edges of H. In the graph $G_x$, all edges are marked as undeletable except $e_x$ and $e_{\neg x}$. Intuitively, deletion of the edge $e_x$ or $e_{\neg x}$ mimics an assignment of the corresponding literal to true. The variable gadget forbids simultaneous assignments of both literals to true: if we delete both edges $e_x$ and $e_{\neg x}$, we get an induced copy of H in which we cannot delete any edge.
Each clause $c = \ell_1 \vee \ell_2 \vee \ell_3 \in$ cls has the corresponding clause gadget $H_c$, which is a copy of the graph H. As $H_c$ is 3-connected, it has at least 3 edges. We pick arbitrarily three edges of $H_c$ and label them by $e_{\ell_1}, e_{\ell_2}, e_{\ell_3}$. We mark all other edges as undeletable. In order to make the clause gadget H-free, we have to delete at least one edge from $e_{\ell_1}, e_{\ell_2}, e_{\ell_3}$ (note that some of the three distinguished edges might potentially share an endpoint). Intuitively, deletion of the edge labeled by $e_{\ell}$ corresponds to assigning value true to literal $\ell$.
The third type of gadgets is the connector gadget. The connector gadget C is a copy of the graph H, with one added edge in place of any non-edge of H. We label this edge as $e_{in}$. In C, there also exists another edge that does not share any of its endpoints with $e_{in}$. To see this, for the sake of contradiction suppose that every edge of C is incident to one of the endpoints of $e_{in}$. If C has at least two vertices other than these endpoints, then the endpoints of $e_{in}$ form a vertex cut of size 2 separating them, a contradiction with 3-connectedness of H. Otherwise C has only one vertex other than the endpoints of $e_{in}$, so H has at most 3 vertices; again, a contradiction with the 3-connectedness of H, as we assume H to have at least 2 non-edges. We select any edge in C that does not share endpoints with $e_{in}$, and we label it as $e_{out}$. Edge $e_{out}$ is made deletable, and all other edges of C are made undeletable. Note that deletion of the edge $e_{in}$ creates an induced copy of H, and then we have to delete $e_{out}$ in order to destroy this copy.
Knowing the structure of all gadgets, we can proceed with the main construction of our reduction. Given a formula ϕ, for each clause c ∈ cls and variable x ∈ vars, we create the clause gadget $H_c$ and the variable gadget $G_x$, respectively. Moreover, for each literal $\ell$ belonging to the clause c ∈ cls, we create a chain $C^{\ell,c}_1, C^{\ell,c}_2, \ldots, C^{\ell,c}_{p+2}$ consisting of p + 2 copies of the connector gadget, where p = |V(H)|. This chain is constructed in the following way: the edge $e_{out}$ of $C^{\ell,c}_i$ is identified with the edge $e_{in}$ of $C^{\ell,c}_{i+1}$, for i = 1, . . . , p + 1. We also identify the edge $e_{out}$ in the subgraph $C^{\ell,c}_{p+2}$ with the edge $e_{\ell}$ in the variable gadget of the variable of $\ell$. Moreover, the edge $e_{in}$ in the subgraph $C^{\ell,c}_1$ is identified with the edge $e_{\ell}$ from the clause gadget $H_c$. We use those chains to prevent a copy of H from being shared by any two gadgets, as we prove in the claim below.
Clearly, the constructed graph G has at most O(n + m) edges.
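The linear bound can be made explicit by counting edges gadget by gadget. The following Python sketch is a rough illustration: the function name edge_upper_bound is hypothetical, and the count ignores the fact that identified edges are counted twice, so it only gives an upper bound. For a fixed graph H the value is linear in n + m.

```python
def edge_upper_bound(n_vars, n_clauses, h_vertices, h_edges):
    """Upper bound on |E(G)| for a formula with n_vars variables and n_clauses clauses.

    h_vertices and h_edges are the numbers of vertices and edges of H.
    """
    # Variable gadget: a copy of H with two added edges.
    variable_gadgets = n_vars * (h_edges + 2)
    # Clause gadget: a copy of H.
    clause_gadgets = n_clauses * h_edges
    # One chain per literal occurrence, each with |V(H)| + 2 connector gadgets.
    chains = 3 * n_clauses
    connectors_per_chain = h_vertices + 2
    # Connector gadget: a copy of H with one added edge.
    connector_gadgets = chains * connectors_per_chain * (h_edges + 1)
    return variable_gadgets + clause_gadgets + connector_gadgets

# Example with H = K_{3,3}, which has 6 vertices and 9 edges.
print(edge_upper_bound(10, 20, 6, 9))
```

Doubling both the number of variables and the number of clauses doubles the bound, in line with the O(n + m) estimate.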
Claim 9. If G is a YES instance, then ϕ is satisfiable.
Proof. Take any solution to the instance G. Note that in each clause gadget we must delete at least one edge. We set the literals corresponding to the deleted edges to true, thus satisfying every clause.
We prove now that for each variable x we have not set both literals x and ¬x to true, so that we can find a true/false assignment to the variables that sets the literals accordingly. Deletion of an edge in the clause gadget propagates deletions up to the variable gadget via the chain of connector gadgets. This happens because the deletion of $e_{in}$ in $C^{\ell,c}_1$ forces us to delete $e_{out}$ in $C^{\ell,c}_1$, which is $e_{in}$ in $C^{\ell,c}_2$, so we are forced to delete $e_{out}$ in $C^{\ell,c}_2$, and so on. Following the chain of connector gadgets, it is easy to see that the edge $e_{\ell}$ must be deleted in the corresponding variable gadget. As the solution to the instance G cannot delete both edges $e_x$ and $e_{\neg x}$ in any variable gadget at the same time, we obtain that there are no variables with both of its literals set to true.
Claim 10. If ϕ is satisfiable, then G is a YES instance.
Proof. Consider a true/false assignment that satisfies the formula ϕ and delete all edges in all clause gadgets that correspond to literals taking value true. Propagate deletions to all the connector and variable gadgets, as in the proof of Claim 9. It remains to prove that the obtained graph is indeed an H-free graph. By counting the number of edges in each gadget, it follows that after the deletions, no gadget is isomorphic to H: in every variable gadget, we deleted exactly one edge, in every clause gadget, we deleted at least one edge, and in each connector gadget we deleted zero or two edges. So if the obtained graph contains an induced copy of H, then this copy is distributed across several gadgets. However, this is also not possible for the following reason.
For the sake of contradiction, suppose that after the deletions there is an induced copy H' of the graph H. Since H' is connected and is distributed among more than one gadget, there have to be two different gadgets $G_1$, $G_2$ that share a vertex, for which H' contains both some vertex $u \in V(G_1) \setminus V(G_2)$, and some vertex $v \in V(G_2) \setminus V(G_1)$. Since H' is 3-connected, there are 3 internally vertex-disjoint paths in H' that lead from u to v. But every two gadgets share at most two common vertices, so at least one of these paths, say P, avoids $V(G_1) \cap V(G_2)$. Since the path P avoids $V(G_1) \cap V(G_2)$, from the construction of G it easily follows that P contains at least one vertex of some variable gadget and at least one vertex of some clause gadget. However, the distance between $e_{in}$ and $e_{out}$ in each connector gadget is at least 1, so the distance between any variable gadget and any clause gadget is at least |V(H)|. But the path P is entirely contained in H', thus its length is at most |V(H)| − 1, a contradiction.

Claims 9 and 10 ensure that the output instance G is equivalent to the input instance ϕ of 3SAT, so we are done.

Now, we show how to reduce Sandwich H-free Edge Deletion to the optimization variant of H-free Edge Deletion. Note that we only require H to have at least one non-edge; this is because we will reuse this lemma in the next section.
Lemma 11. Let H be a 3-connected graph with at least one non-edge, and p(·) be a polynomial with $p(\ell) \ge \ell$ for all positive $\ell$. Then there is a polynomial-time reduction which, given an instance G of Sandwich H-free Edge Deletion, creates an instance (G', k) of H-free Edge Deletion, such that:
• k is the number of deletable edges in G;

Proof. We create G' in the following way. For each undeletable edge uv, we add p(k) copies $H^{uv}_i$ of the graph H, i = 1, . . . , p(k). In each copy, we choose any non-edge $u_i v_i$ and identify the vertex $u_i$ with u, and $v_i$ with v. The construction is presented in Figure 2.
Note that if we delete the edge uv in G', we also must delete at least one edge in every $H^{uv}_i$. Hence, at least p(k) + 1 edges will be deleted in such a situation. With this observation in mind, we proceed to the proof of the correctness.
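The construction of G' can be sketched in a few lines of Python. This is an illustration under the description above, not a definitive implementation: the function name lift_sandwich_instance, the tuple-based naming of fresh vertices, and the choice of the first non-edge of H in lexicographic order are all assumptions of the sketch.

```python
from itertools import combinations

def lift_sandwich_instance(g_edges, undeletable, h_vertices, h_edges, p):
    """Return (edges of G', k), where k is the number of deletable edges of G."""
    h_edge_set = {frozenset(e) for e in h_edges}
    # Any non-edge of H works; the sketch takes the first one found.
    non_edge = next((a, b) for a, b in combinations(h_vertices, 2)
                    if frozenset((a, b)) not in h_edge_set)
    undel = {frozenset(e) for e in undeletable}
    k = sum(1 for e in g_edges if frozenset(e) not in undel)
    new_edges = list(g_edges)
    for (u, v) in undeletable:
        for i in range(1, p(k) + 1):
            # Fresh vertices for the i-th copy attached to uv ...
            rename = {w: ("copy", u, v, i, w) for w in h_vertices}
            # ... except that the endpoints of the chosen non-edge become u and v.
            rename[non_edge[0]] = u
            rename[non_edge[1]] = v
            new_edges.extend((rename[a], rename[b]) for a, b in h_edges)
    return new_edges, k

# H = K_5 minus the edge (0, 1); G is a triangle in which only ab is undeletable.
H_V = [0, 1, 2, 3, 4]
H_E = [e for e in combinations(H_V, 2) if e != (0, 1)]
G_E = [("a", "b"), ("b", "c"), ("a", "c")]
lifted_edges, k = lift_sandwich_instance(G_E, [("a", "b")], H_V, H_E, lambda l: l * l + 1)
print(k, len(lifted_edges))
```

In this example k = 2 and p(k) = 5, so five copies of H are attached to the edge ab; each copy contributes 9 edges and 3 new vertices.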
Proof. Let F be a subset of deletable edges, such that G − F is H-free. Obviously |F| ≤ k, because there are k deletable edges in G in total. We will prove that G' − F is also H-free, which implies that (G', k) is a YES instance.
Let us assume otherwise, that there is an induced copy H' of H in G' − F. Since G − F is H-free, we have that H' has to contain at least one vertex of $V(G') \setminus V(G)$. Say that H' contains some vertex $x \in V(H^{uv}_i) \setminus \{u, v\}$, for some undeletable edge uv and some index i. The edge uv is undeletable in G, so it is not included in F. Consequently, the subgraph of G' − F induced by $V(H^{uv}_i)$ contains one more edge than H, so it is not isomorphic to H. We conclude that H' must contain some vertex y that lies outside of $V(H^{uv}_i)$. Since H' is 3-connected, there are 3 internally vertex-disjoint paths between x and y in H'. However, {u, v} is a vertex cut of size 2 that separates x and y. This is a contradiction, so G' − F is indeed H-free.
Proof. For the sake of contradiction, suppose there is a set F of at most p(k) edges of G', such that G' − F is H-free. Note that F has to contain at least one undeletable edge uv, as otherwise F ∩ E(G) would be a solution to G. But then F has to contain at least p(k) more edges inside gadgets $H^{uv}_i$, for i = 1, 2, . . . , p(k), which is a contradiction with |F| ≤ p(k).
Claims 12 and 13 ensure the correctness of the reduction, and hence we are done.
By composing the reductions of Lemmas 8 and 11, we can deduce the part of Theorem 1 concerning deletion problems. Indeed, suppose H-free Edge Deletion admitted a polynomial-time q(OPT)-factor approximation algorithm, for some polynomial q. Take any instance of 3SAT, and apply first the reduction of Lemma 8, and then the reduction of Lemma 11 for polynomial $p(\ell) = q(\ell) \cdot \ell + 1$. Finally, observe that the application of the hypothetical approximation algorithm for H-free Edge Deletion to the resulting instance would resolve whether the optimum value is at most k or at least p(k), which, by Lemma 11, resolves whether the input instance of 3SAT is satisfiable. The subexponential hardness of approximation under ETH follows from the same reasoning and the observation that the value of k in the output instance is bounded linearly in the size of the input formula.
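The gap argument above amounts to a single threshold comparison. The following Python sketch spells it out under the assumption that q is non-decreasing; the names gap_threshold and sandwich_is_yes are hypothetical, and returned_size stands for the size of the solution output by the hypothetical approximation algorithm.

```python
def gap_threshold(q, k):
    """The value p(k) = q(k) * k + 1 used in the composed reduction."""
    return q(k) * k + 1

def sandwich_is_yes(returned_size, q, k):
    """Decide the sandwich instance from the size of the approximate solution.

    If OPT <= k, a q(OPT)-factor approximation returns at most q(k) * k < p(k) edges.
    Otherwise every solution of the lifted instance has more than p(k) edges.
    """
    return returned_size < gap_threshold(q, k)

q = lambda l: l ** 2   # an example polynomial approximation factor
k = 3
# In the YES case the returned size never reaches the threshold p(k) = 28.
worst_yes = max(q(opt) * opt for opt in range(k + 1))
print(worst_yes, gap_threshold(q, k))
```

Since the two cases are separated by the threshold, a polynomial-time approximation algorithm with factor q(OPT) would decide the sandwich problem, and hence 3SAT, in polynomial time.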

Connections with Min Horn Deletion
In this section we prove Theorem 3. First, we need to introduce some definitions and notation regarding Min Horn Deletion hardness and completeness.
Khanna et al. [11] attempted to establish a full classification of approximability of boolean constraint satisfaction problems. In particular, many problems have been classified as APX-complete or poly-APX-complete. Even though some cases remained unresolved, Khanna et al. [11] grouped them into classes, such that all problems from the same class are equivalent (with respect to appropriately defined reductions) to a particular representative problem. One such representative problem is Min Horn Deletion, defined as follows: Given is a boolean formula ϕ in CNF that contains only unary clauses, and clauses with three literals out of which exactly one is negative. The problem asks for minimizing the number of ones in a satisfying assignment for ϕ.
We are not going to operate on instances of Min Horn Deletion directly, so the definition above is given only in order to complete the picture for the reader. Instead, we will rely on the approximation hardness results exhibited by Khanna et al. [11], which relate the approximability of various boolean CSPs to Min Horn Deletion. In particular, it is known that Min Horn Deletion does not admit a $2^{O(\log^{1-\epsilon} n_{vars})}$-approximation algorithm for any $\epsilon > 0$, unless P = NP, where $n_{vars}$ is the number of variables in the instance. On the other hand, it is an open problem whether any Min Horn Deletion-complete problem (under A-reductions, defined below) is actually poly-APX-complete.
Definition 14 (A-reducibility, Definition 2.6 of [11]). A combinatorial optimization problem is said to be an NPO problem if instances and solutions can be recognized in polynomial time, solutions are polynomially-bounded in the input size, and the objective function can be computed in polynomial time from an instance and a solution.
An NPO problem P is said to be A-reducible to an NPO problem Q, denoted $P \le_A Q$, if there are two polynomial-time computable functions F and G and a constant α, such that:
1. For any instance I of P, F(I) is an instance of Q.
2. For any instance I of P and any feasible solution S' for F(I), G(I, S') is a feasible solution for I.
3. For any instance I of P and any r ≥ 1, if S' is an r-approximate solution for F(I), then G(I, S') is an (αr)-approximate solution for I.
Intuitively, A-reductions preserve the approximability of problems up to a constant factor (or higher). As a source of Min Horn Deletion-hardness we will use the MinOnes(F) problem, defined below, for a particular choice of the family of constraints F.
In the MinOnes(F) problem, we are given a ground set of boolean variables X together with a set of boolean constraints. Each constraint f is taken from a specified family F, and f is applied to some tuple of variables from X. The goal of the problem is to find an assignment satisfying all the constraints, while minimizing the number of variables set to one. Note that the family F is considered a part of the problem definition, not part of the input. In order to use known results for the MinOnes(F) problem we need to define some properties of boolean constraints.
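To make the definition concrete, the following Python sketch solves tiny MinOnes instances by exhaustive search. It is only an illustration: the function name min_ones, the encoding of a constraint as a pair (predicate, tuple of variable names), and the two example constraints unit and implies are assumptions of the sketch and not the family used later in this section.

```python
from itertools import product

def min_ones(variables, constraints):
    """Return a satisfying assignment with the fewest ones, or None if none exists."""
    best = None
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        # Each constraint is a predicate applied to a tuple of variables.
        if all(f(*(assignment[v] for v in scope)) for f, scope in constraints):
            if best is None or sum(bits) < sum(best.values()):
                best = assignment
    return best

# Hypothetical example constraints: a unit constraint and an implication.
unit = lambda x: x == 1
implies = lambda x, y: (not x) or bool(y)

solution = min_ones(["x", "y", "z"], [(unit, ("x",)), (implies, ("x", "y"))])
print(solution)
```

Here x is forced to one by the unit constraint and y by the implication, while z stays at zero, so the optimum has two ones.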
• A boolean constraint f is called weakly positive if it can be expressed using a CNF formula that has at most one negated variable in each clause.
• A boolean constraint f is 0-valid if the all-zeroes assignment satisfies it.
• A boolean constraint f is IHS-B + if it can be expressed using a CNF formula in which the clauses are all of one of the following types: x 1 ∨ · · · ∨ x k for some positive integer k ≤ B, or ¬x 1 ∨ x 2 , or ¬x 1 . IHS-B − constraints are defined analogously, with every literal being replaced by its complement.
The definition can be naturally extended to families of constraints, e.g., a family of constraints is weakly positive if all its constraints are weakly positive. We say that a family of constraints is IHS-B if it is either IHS-B + or IHS-B − (or both). The following result was proved by Khanna et al. [11].
Theorem 15 (Lemmas 8.7 and 8.14 from [11]). If a family of constraints F is weakly positive, but it is neither 0-valid nor IHS-B for any constant B, then the problem MinOnes(F) is Min Horn Deletion-complete under A-reductions; that is, there is an A-reduction from Min Horn Deletion to MinOnes(F) and an A-reduction from MinOnes(F) to Min Horn Deletion. Consequently, it is NP-hard to approximate MinOnes(F) within factor $2^{O(\log^{1-\epsilon} n_{vars})}$ for any $\epsilon > 0$, where $n_{vars}$ is the number of variables in the given instance.
Our strategy for the proof of Theorem 3 is as follows. In Section 4.1 we show a reduction from MinOnes(F) to a properly defined quarantined version of K n \ e-free Edge Deletion. Next, in Section 4.2 we show a reduction which removes the quarantine. Finally, in Section 4.3 we conclude the proof of Theorem 3 and show the completeness with respect to A-reductions.
Note that having Theorem 3, we can immediately infer Corollaries 4 and 5 using Theorem 15 and the definition of an A-reduction.

From MinOnes(F) to Quarantined H-free Edge Deletion
In the Quarantined H-free Edge Deletion problem we are given a graph G, some edges of which are marked as undeletable. Quarantined H-free Edge Deletion is an optimization problem, where the goal is to obtain an H-free graph by removing the minimum number of deletable edges.
Next, we define the family of constraints that will be used in the MinOnes(F) problem.
Definition 16. We define the following constraints:
• a constraint $f_1(x_1, x_2, x_3)$, which is equal to zero if and only if exactly one of the variables $x_1, x_2, x_3$ is set to 1;
• a constraint $f_2(x) = x$.
The family of constraints F' is defined as $F' = \{f_1, f_2\}$. A direct check, presented below, verifies that F' has the properties needed to claim, using Theorem 15, that MinOnes(F') is Min Horn Deletion-hard.

Lemma 17. The family F' is weakly positive, but it is neither 0-valid nor IHS-B for any constant B.

Proof. Note that $f_1$ is weakly positive since
$$f_1(x_1, x_2, x_3) = (\neg x_1 \vee x_2 \vee x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (x_1 \vee x_2 \vee \neg x_3),$$
and $f_2$ is a single positive literal. Moreover, $f_2$ is not satisfied by the assignment x = 0, so F' is not 0-valid.

We prove now that $f_1$ is not IHS-B for any B. First, observe that any CNF formula expressing $f_1$ cannot contain a clause with only positive literals, as such a clause would not be satisfied by the assignment $x_1 = x_2 = x_3 = 0$, which in turn satisfies $f_1$. Similarly, no clause can have only negative literals. Due to the definition of IHS-B, the only remaining case is a 2-clause with one positive and one negative literal. Without loss of generality, consider a clause $x_1 \vee \neg x_2$. Observe that it is not satisfied by the assignment $x_1 = 0$, $x_2 = x_3 = 1$, which however satisfies $f_1$. Therefore $f_1$, and consequently F', is not IHS-B for any B.
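The properties used in this proof can be verified mechanically over all eight assignments. The following Python sketch is an illustrative check: the function candidate_cnf encodes one CNF formula with exactly one negated variable per clause, which is an assumption of the sketch, and the remaining names are hypothetical as well.

```python
from itertools import product

def f1(x1, x2, x3):
    """Equal to zero if and only if exactly one variable is set to 1."""
    return (x1 + x2 + x3) != 1

def f2(x):
    return x == 1

def candidate_cnf(x1, x2, x3):
    """A CNF formula with one negated variable per clause (assumed form)."""
    return bool(((not x1) or x2 or x3)
                and (x1 or (not x2) or x3)
                and (x1 or x2 or (not x3)))

# The candidate CNF agrees with f1 on every assignment, so f1 is weakly positive.
cnf_matches = all(f1(*a) == candidate_cnf(*a) for a in product((0, 1), repeat=3))

# f1 is satisfied by the all-zeroes assignment, while f2 is not.
f1_zero_valid = f1(0, 0, 0)
f2_zero_valid = f2(0)

# The assignment (0, 1, 1) satisfies f1 but violates the clause x1 or (not x2).
witness = f1(0, 1, 1) and not (0 or (not 1))
print(cnf_matches, f1_zero_valid, f2_zero_valid, witness)
```

The last value confirms the witness used against 2-clauses with one positive and one negative literal; by symmetry of $f_1$, the same assignment pattern works for every pair of variables.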
Consequently, Theorem 15 and Lemma 17 together imply that MinOnes(F ) is Min Horn Deletion-hard under A-reductions. We now give our main reduction, from MinOnes(F ) to Quarantined K n \ e-free Edge Deletion.
Lemma 18. Let $n \ge 5$. There is a polynomial-time computable transformation T which, given an instance I of the MinOnes(F) problem, outputs an instance T(I) of the Quarantined $K_n \setminus e$-free Edge Deletion problem, such that:
• if I admits a satisfying assignment with $k$ ones, then there is a solution of cost $\Delta \cdot k$ for the instance T(I),
• if T(I) admits a solution of cost $k'$, then there is a satisfying assignment with $k'/\Delta$ ones for the instance I,
where $\Delta = 9n_{\mathrm{vars}}^2 + 2$ and $n_{\mathrm{vars}}$ is the number of variables in I.
Proof. First, we show how to transform an instance I (with a formula ϕ) of MinOnes(F) into an instance T(I) (with a graph G) of Quarantined $K_n \setminus e$-free Edge Deletion. Given an instance I, for any constraint $f_1(x, y, z)$ we create a separate clique $K_n$, which will be called the constraint clique. We arbitrarily choose three edges in the clique and label them x, y, z. We mark all edges as undeletable except the edges labelled by x, y, z. Moreover, for each variable x we additionally create a clique $K_n$ (called further the variable clique), and mark all edges in the clique as undeletable except two edges, which we label by $x_{\mathrm{in}}$ and $x_{\mathrm{out}}$. The edges $x_{\mathrm{in}}$ and $x_{\mathrm{out}}$ are selected arbitrarily; however, we require that they do not share common endpoints.

Now we connect the variable cliques with the constraint cliques. For each variable x and a constraint $f_1$ of the instance I which contains x among its arguments, we add three cliques, as shown in Figure 3, such that the following properties are satisfied:
• The first added clique shares with the variable clique of x only the edge $x_{\mathrm{out}}$.
• The second added clique shares one deletable edge with the first clique and a different deletable edge with the third clique. Label both these deletable edges by x.
• The third added clique shares with the clique corresponding to the constraint only the edge labelled (in the constraint clique) by x.
All the other edges of the introduced cliques, not mentioned above, are marked as undeletable. Note that each of the introduced cliques shares two edges with two different cliques. We may perform this construction so that these two edges never share endpoints (as depicted in Figure 3), and hence we will assume this property.

Denote by $\delta(x)$ the number of occurrences of the variable x in all $f_1$-type constraints. Note that, by removing superfluous copies of the same constraint, we can assume that all $f_1$-type constraints are pairwise different, so in particular there are at most $n_{\mathrm{vars}}^3$ of them. Also, for any variable x we have $\delta(x) \le 3n_{\mathrm{vars}}^2$. Next, for each variable x we add $3 \cdot (3n_{\mathrm{vars}}^2 - \delta(x))$ or $3 \cdot (3n_{\mathrm{vars}}^2 - \delta(x)) + 1$ cliques that share the deletable edge $x_{\mathrm{in}}$ from the variable clique of x, and are otherwise disjoint. Moreover, in each such clique we make one more edge deletable; we label it by x. We add $3 \cdot (3n_{\mathrm{vars}}^2 - \delta(x))$ cliques if the formula does contain the clause $f_2(x) = x$, and $3 \cdot (3n_{\mathrm{vars}}^2 - \delta(x)) + 1$ cliques otherwise. Finally, if there is a clause $f_2(x) = x$ in the instance I, then we delete the edge labelled by $x_{\mathrm{in}}$ in the corresponding variable clique.
Observe that in the constructed instance of Quarantined $K_n \setminus e$-free Edge Deletion, among all the $9n_{\mathrm{vars}}^2 + 2$ edges labelled by $x$, $x_{\mathrm{in}}$, $x_{\mathrm{out}}$, where x is any variable, we have to delete either none, or all of them. This is because the deletion of any of them forces the deletion of all the others due to the appearance of induced copies of $K_n \setminus e$ in the graph. Moreover, if the edge $x_{\mathrm{in}}$ is not present due to the existence of constraint $f_2(x) = x$ in I, then all of them have to be deleted.
Claim 19. If there is a satisfying assignment with $k$ ones for the instance I, then it is possible to delete $(9n_{\mathrm{vars}}^2 + 2) \cdot k$ edges in T(I) in order to make it a $K_n \setminus e$-free graph.
Proof. It is enough to delete all edges labelled by $x$, $x_{\mathrm{in}}$, $x_{\mathrm{out}}$ for all variables x that are set to 1 in the satisfying assignment; the number of such edges is exactly $(9n_{\mathrm{vars}}^2 + 2) \cdot k$. Let us verify that this works. Suppose the obtained graph is not $K_n \setminus e$-free. Let H be an induced subgraph isomorphic to $K_n \setminus e$. Note that for $n \ge 5$ the graph $K_n \setminus e$ is 3-connected. Moreover, even after deletion of two arbitrary vertices in $K_n \setminus e$, there are no two vertices at distance larger than two. Consequently, a direct check shows that the assumed subgraph H must be contained entirely in one of the cliques corresponding to a constraint or to a variable, or in one of the cliques connecting a variable clique with a constraint clique.
Obviously, H cannot be contained in a variable clique or a connection clique, as in such cliques either all edges are present, or two edges are missing. This means that H must be contained in a constraint clique, so exactly one of the edges of this constraint clique is deleted. However, this is equivalent to the corresponding constraint not being satisfied under the considered assignment; this is a contradiction.
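The two structural facts about $K_n \setminus e$ used in the proof of Claim 19 (3-connectedness, and distances at most two after deleting two vertices) can be checked by brute force for small $n$. The sketch below is only a sanity check for $n \le 8$, not a proof; the function names are hypothetical, and it additionally checks the cases of deleting zero or one vertex.

```python
from itertools import combinations
from collections import deque

def kn_minus_edge(n):
    """Adjacency sets of K_n with the single edge {0, 1} removed."""
    adj = {v: set(range(n)) - {v} for v in range(n)}
    adj[0].discard(1)
    adj[1].discard(0)
    return adj

def distances_from(adj, src, removed):
    """BFS distances from src after deleting the vertices in `removed`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in removed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def robust(n):
    """True if deleting any at most two vertices leaves a connected
    graph in which all distances are at most two."""
    adj = kn_minus_edge(n)
    for k in range(3):
        for removed in combinations(range(n), k):
            rest = [v for v in range(n) if v not in removed]
            for src in rest:
                dist = distances_from(adj, src, set(removed))
                if len(dist) != len(rest) or max(dist.values()) > 2:
                    return False
    return True

results = {n: robust(n) for n in range(4, 9)}
```

For $n = 4$ the check fails (deleting the two vertices of full degree disconnects the remaining pair), which is consistent with the requirement $n \ge 5$ in Lemma 18.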
Claim 20. If T(I) admits a solution of cost $k'$, then there is a satisfying assignment for the instance I with $k'/(9n_{\mathrm{vars}}^2 + 2)$ ones.
Proof. Take any solution for the output instance T(I). As mentioned earlier, in any solution for T(I), for any variable x either all edges labelled by $x$, $x_{\mathrm{in}}$, $x_{\mathrm{out}}$ are deleted or none of them is deleted. The number of such edges for one variable x is equal to $9n_{\mathrm{vars}}^2 + 2$. We set a variable to 1 if and only if the corresponding edges are deleted in the considered solution for T(I). All clauses of the form $f_2(x)$ will be satisfied, since in the construction of T(I) we delete $x_{\mathrm{in}}$ if the clause $f_2(x) = x$ is present in I. All $f_1$-type constraints will be satisfied as well, as otherwise in the clique corresponding to an unsatisfied constraint only one edge would be deleted and, hence, the graph would not be $K_n \setminus e$-free.
The correctness of the transformation follows from Claims 19 and 20; hence the proof of Lemma 18 is complete.

Lifting the quarantine
In the following lemma we show how to reduce an instance of the quarantined problem to its regular version, using the same approach as in the proof of Lemma 11.
Lemma 21. Let $n \ge 5$. There is a polynomial-time reduction which, given an instance G of Quarantined $K_n \setminus e$-free Edge Deletion with m edges, outputs an instance $G'$ of $K_n \setminus e$-free Edge Deletion such that:
• $G'$ has $O(m^3)$ vertices and edges.
• If there is a solution of size $k$ for the instance G, then there is a solution of size $k$ for the instance $G'$.
• If there is a solution of size $k \le m^2$ for the instance $G'$, then there is a solution of size $k$ for the instance G.
Proof. We apply the reduction described in the proof of Lemma 11 for $p(m) = m^2$ and $H = K_n \setminus e$. Now we verify that $G'$ has the claimed properties. The bound on the size of $G'$ follows directly from the size bound given by Lemma 11. Suppose first that G has some solution of size $k$. In the proof of Lemma 11 we argued that the same solution also works for the instance $G'$ (see the proof of Claim 12). Hence, $G'$ also has a solution of size $k$.
Suppose now that $G'$ has a solution $F'$ of some size $k \le m^2$. In the proof of Claim 13 we argued that $F'$ does not delete any of the undeletable edges of G, because this would require deleting at least $m^2$ more edges in the attached gadgets. Hence, $F' \cap E(G)$ is a set of size at most $k$ whose deletion turns G into an H-free graph, because the resulting graph is an induced subgraph of $G' - F'$. Hence, G has some solution of size at most $k$.
The composition of the reductions of Lemmas 18 and 21 gives an A-reduction (for α = 1) from a Min Horn Deletion-hard problem MinOnes(F), yielding the hardness part of Theorem 3. Indeed, given an instance I of MinOnes(F) we can transform it into an instance G of Quarantined $K_n \setminus e$-free Edge Deletion using Lemma 18, which in turn we can further transform into an instance $G'$ of $K_n \setminus e$-free Edge Deletion using Lemma 21. Given any feasible solution $F'$ for $G'$ we check whether $|F'| \le |E(G)|^2$. If this is the case, we translate the solution $F'$ back into a solution F for G (using Lemma 21) and then into a solution for the initial instance I (using Lemma 18). On the other hand, if $|F'| > |E(G)|^2$, then for the initial instance I we may just take a trivial solution being an assignment setting all the variables to one.
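The back-translation of solutions described in this paragraph can be summarised by the following minimal sketch. The name `translate_back` and the callable `to_assignment` (standing in for the back-translation of Lemma 18) are hypothetical; edges are modelled as frozensets, and the step of Lemma 21 is modelled as intersecting the solution with $E(G)$.

```python
def translate_back(f_prime, edges_g, n_vars, to_assignment):
    """Map a feasible solution of the unquarantined instance to an
    assignment for the MinOnes instance.

    f_prime       -- set of deleted edges in the unquarantined instance
    edges_g       -- edge set of the quarantined instance G
    n_vars        -- number of variables of the MinOnes instance
    to_assignment -- hypothetical callable implementing Lemma 18 backwards
    """
    m = len(edges_g)
    if len(f_prime) > m * m:
        # Solution too large: fall back to the trivial all-ones assignment.
        return [1] * n_vars
    # Lemma 21: restrict the solution to the edges of G.
    f = f_prime & edges_g
    return to_assignment(f)
```

The threshold $|E(G)|^2$ is what makes the fallback harmless for the approximation ratio: a solution above it is already more expensive than any solution the all-ones assignment could correspond to.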

Completeness
To finish the proof of Theorem 3 it remains to show a reduction in the other direction: from K n \ e-free Edge Deletion to Min Horn Deletion. We achieve this goal by presenting an A-reduction from the K n \ e-free Edge Deletion problem to another variant of MinOnes(F), which is Min Horn Deletion-complete.
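Definition 22. Let $n \ge 5$, and let $t = n(n-1)/2$. We define the family of constraints $F_n = \{f_n, g_n\}$ as follows:
• the constraint $f_n(x_1, \ldots, x_t)$ is satisfied if and only if the number of variables among $x_1, \ldots, x_t$ that are set to 1 is different from one;
• the constraint $g_n(x_1, \ldots, x_{t-1})$ is satisfied if and only if at least one of the variables $x_1, \ldots, x_{t-1}$ is set to 1.

The proof of the following lemma is a technical check that is essentially the same as the proof of Lemma 17. Hence, we leave it to the reader.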
Lemma 23. For each n ≥ 5, the set of constraints F n = {f n , g n } is weakly positive, and at the same time it is neither 0-valid, nor IHS-B for any B.
Therefore, by Theorem 15 we know that MinOnes(F n ) is Min Horn Deletion-complete and it suffices to present an A-reduction from K n \ e-free Edge Deletion to MinOnes(F n ).
Lemma 24. There is a polynomial-time algorithm, which given an instance G of K n \ e-free Edge Deletion produces an instance I of MinOnes(F n ), such that it is possible to remove exactly k edges in G to make it K n \ e-free if and only if one can find a satisfying assignment for I that sets exactly k variables to 1.
Proof. Consider an instance G of the $K_n \setminus e$-free Edge Deletion problem. We enumerate all the edges in the graph G as $e_1, e_2, \ldots, e_m$, and to each edge $e_i$ we assign a fresh boolean variable $x_i$. For any induced subgraph H isomorphic to $K_n \setminus e$ we list all its edges $e_{i_1}, e_{i_2}, \ldots, e_{i_{t-1}}$ and create a corresponding constraint $g_n(x_{i_1}, x_{i_2}, \ldots, x_{i_{t-1}})$. For any induced clique K containing n vertices and edges $e_{i_1}, e_{i_2}, \ldots, e_{i_t}$, we create a constraint $f_n(x_{i_1}, x_{i_2}, \ldots, x_{i_t})$. The output instance I of MinOnes($F_n$) is obtained by taking $\{x_1, \ldots, x_m\}$ as the variable set, and including all the constraints constructed above.
Note that if we delete some edges in the graph G, then an induced copy of the graph K n \ e can be obtained only on vertices that originally were inducing K n \ e or K n . The constraints in the constructed instance guarantee that in each induced K n \ e subgraph at least one edge from the subgraph must be deleted, and in each induced subgraph K n either at least two edges should be deleted, or none of the edges should be deleted. So, for any S ⊆ {1, 2, . . . , |E(G)|}, the graph G − F , where F = {e i : i ∈ S}, is K n \ e-free if and only if the assignment {x i = 1 iff i ∈ S} satisfies I. This equivalence of solution sets immediately proves the lemma.
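The reduction of Lemma 24 is short enough to prototype. The sketch below is an illustration under stated assumptions: it labels constraints `"f"` (on an induced $K_n$: the number of deleted edges must differ from one) and `"g"` (on an induced $K_n \setminus e$: at least one edge must be deleted), following the semantics described in the paragraph above, and it verifies the claimed equivalence exhaustively on one small graph ($K_6$ minus an edge, with $n = 5$). All function names are hypothetical.

```python
from itertools import combinations

def pairs_of(subset):
    """All unordered vertex pairs of `subset`, as frozensets."""
    return [frozenset(p) for p in combinations(subset, 2)]

def build_instance(vertices, edges, n):
    """One constraint per n-vertex set inducing K_n ('f') or K_n minus an edge ('g')."""
    t = n * (n - 1) // 2
    constraints = []
    for subset in combinations(vertices, n):
        scope = [p for p in pairs_of(subset) if p in edges]
        if len(scope) == t:
            constraints.append(("f", scope))
        elif len(scope) == t - 1:
            constraints.append(("g", scope))
    return constraints

def satisfies(constraints, deleted):
    """Evaluate the assignment that sets x_i = 1 exactly for deleted edges."""
    for kind, scope in constraints:
        ones = sum(1 for e in scope if e in deleted)
        if (kind == "g" and ones == 0) or (kind == "f" and ones == 1):
            return False
    return True

def is_free(vertices, edges, n):
    """True if no n vertices induce K_n minus an edge."""
    t = n * (n - 1) // 2
    for subset in combinations(vertices, n):
        if sum(1 for p in pairs_of(subset) if p in edges) == t - 1:
            return False
    return True

# Exhaustive check on K_6 minus the edge {0, 1}, with n = 5.
vertices = range(6)
edges = set(pairs_of(vertices)) - {frozenset((0, 1))}
constraints = build_instance(vertices, edges, 5)
edge_list = sorted(edges, key=sorted)
agree = all(
    is_free(vertices, edges - set(F), 5) == satisfies(constraints, set(F))
    for k in range(len(edge_list) + 1)
    for F in combinations(edge_list, k)
)
```

Enumerating all $n$-vertex subsets takes $O(|V(G)|^n)$ time, which is polynomial because $n$ is a fixed constant of the problem.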
As discussed earlier, Lemma 24 gives an A-reduction from $K_n \setminus e$-free Edge Deletion to MinOnes($F_n$), which is Min Horn Deletion-complete, thereby proving that $K_n \setminus e$-free Edge Deletion is A-reducible to Min Horn Deletion. This concludes the proof of Theorem 3.

Specific constructions for short paths and cycles
In this section we extend the general results yielded by Theorems 11 and 42 in the direction of obtaining a full picture of the approximation complexity for H being a path or a cycle. It can be easily seen that the complements of $P_\ell$ and $C_\ell$ for $\ell \ge 6$ satisfy the preconditions of Theorems 11 and 42. Hence, by complementation, we have already established hardness of approximation for these cases. We are left with considering the edge modification problems for $H = P_\ell$ and $H = C_\ell$ for $\ell \le 5$. Therefore, to complete the proof of Theorem 2, it remains to prove the following.
Lemma 25. Let H be equal to $C_4$, $C_5$, or $P_5$. Then neither H-free Edge Deletion nor H-free Edge Completion admits a poly(OPT)-factor approximation algorithm working in polynomial time, unless P = NP. Moreover, unless ETH fails, there is even no poly(OPT)-factor approximation algorithm working in time $2^{o(\mathsf{OPT})} \cdot n^{O(1)}$, for any of these problems.
Before we proceed to the proof of the missing cases (Lemma 25), let us check that we indeed obtain a full classification for cycles, and an almost full classification for paths, as promised in the introduction. The problem $C_3$-Free Edge Deletion, also known as Triangle-Free Edge Deletion, admits a trivial greedy 3-approximation algorithm, whereas $P_3$-Free Edge Deletion, also known as Cluster Edge Deletion, admits a constant-factor approximation algorithm given by Natanzon [14]. The problem $C_3$-Free Edge Completion is not meaningful, since adding edges cannot destroy a triangle, and $P_3$-Free Edge Completion is polynomial-time solvable because there is only one way to destroy every obstacle. The only missing case is $P_4$-Free Edge Deletion, which is equivalent to $P_4$-Free Edge Completion by complementation.
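For concreteness, one natural reading of the greedy 3-approximation for Triangle-Free Edge Deletion is sketched below; the text does not spell the algorithm out, so this particular variant is an assumption. It repeatedly picks a triangle and deletes all three of its edges. The picked triangles are edge-disjoint, any solution must delete at least one edge from each of them, and hence the output is at most three times the optimum.

```python
from itertools import combinations

def greedy_triangle_free_deletion(vertices, edges):
    """Return a set of edges whose deletion leaves a triangle-free graph.

    While a triangle exists, delete all three of its edges.
    """
    remaining = {frozenset(e) for e in edges}
    deleted = set()
    found = True
    while found:
        found = False
        for a, b, c in combinations(sorted(vertices), 3):
            tri = {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))}
            if tri <= remaining:
                remaining -= tri
                deleted |= tri
                found = True
                break
    return deleted

# Small usage example on K_4.
k4_vertices = [0, 1, 2, 3]
k4_edges = [frozenset(p) for p in combinations(k4_vertices, 2)]
k4_deleted = greedy_triangle_free_deletion(k4_vertices, k4_edges)
```

The rescan after every deletion keeps the sketch short at the cost of efficiency; it is meant only to illustrate the approximation argument.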
The rest of this section is devoted to the proof of Lemma 25. For this, we implement the same strategy as in Theorems 11 and 42: we first prove hardness of sandwich problems by giving linear reductions from 3SAT, and then we reduce to the standard optimization variant by introducing the approximation gap. For convenience, instead of working with P 5 -Free Edge Deletion and P 5 -Free Edge Completion, we respectively consider House-Free Edge Completion and House-Free Edge Deletion, where house is the complement of P 5 : a 4-cycle with a triangle built on one of the edges (see Figure 4). These problems are equivalent to the ones concerning P 5 -s by complementation of the instance. Also, observe that C 5 -Free Edge Deletion and C 5 -Free Edge Completion are equivalent by complementation, and hence we consider only the former.

Sandwich deletion problems
We start with the hardness proof for Sandwich $C_4$-Free Edge Deletion, which will serve as a template for other reductions. The structural property of the instance, described in the statement, will turn out to be useful in some further arguments.

Lemma 26. There is a polynomial-time reduction which, given an instance of 3SAT with n variables and m clauses, constructs an equivalent instance G of Sandwich $C_4$-Free Edge Deletion with O(n + m) vertices and edges. Moreover, G has the following additional property: every (not necessarily induced) $C_4$ subgraph of G contains at least one undeletable edge. Consequently, Sandwich $C_4$-Free Edge Deletion is NP-hard, even on such instances.

Proof. Let ϕ be the given formula in 3CNF, and let vars and cls be the sets of variables and clauses of ϕ. By standard modifications of the formula, we may assume that each clause contains exactly three literals of pairwise different variables. We introduce gadgets for variables, for clauses, and for connections between variable and clause gadgets. They are depicted in Figure 5, where thick edges are undeletable and dashed edges are deletable. The variable gadget $G_{\mathrm{vars}}$, depicted on the first panel, has four named vertices $u_\top$, $v_\top$, $u_\bot$ and $v_\bot$, which will be used to connect the copies of this gadget to the rest of the construction. The properties of the variable gadget are described in the following claim. Its proof follows by a direct check, and hence is omitted.
Claim 27. There are exactly two solutions to the Sandwich $C_4$-Free Edge Deletion instance $G_{\mathrm{vars}}$. One of them, denoted $F_\top$, contains $u_\top v_\top$ and does not contain $u_\bot v_\bot$, and the second, denoted $F_\bot$, contains $u_\bot v_\bot$ and does not contain $u_\top v_\top$.
Next, we describe the clause gadget H cls , depicted on the second panel of Figure 5. It consists of a clique on 6 vertices {s 1 , t 1 , s 2 , t 2 , s 3 , t 3 }, where the cycle s 1 − t 1 − s 2 − t 2 − s 3 − t 3 − s 1 has deletable edges, and all the other edges are undeletable. Again, the properties of the clause gadget are described in the following claim, whose proof is omitted due to being straightforward.
Claim 28. In the Sandwich C 4 -Free Edge Deletion instance H cls there is no solution that simultaneously contains all three edges s 1 t 1 , s 2 t 2 and s 3 t 3 . However, for each i = 1, 2, 3, there is a solution F i that does not contain s i t i , but contains both the other edges from this triple.
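Since the clause gadget is fully specified in the text (a clique on six vertices whose deletable edges form the cycle $s_1 - t_1 - s_2 - t_2 - s_3 - t_3 - s_1$), the omitted check of Claim 28 can be carried out by brute force over all $2^6$ subsets of deletable edges. The following sketch does exactly that; the helper names are hypothetical.

```python
from itertools import combinations

names = ["s1", "t1", "s2", "t2", "s3", "t3"]
clique = {frozenset(p) for p in combinations(names, 2)}
# Deletable edges: the 6-cycle s1 - t1 - s2 - t2 - s3 - t3 - s1.
cycle = [frozenset((names[i], names[(i + 1) % 6])) for i in range(6)]
literal_edges = [frozenset(("s1", "t1")),
                 frozenset(("s2", "t2")),
                 frozenset(("s3", "t3"))]

def has_induced_c4(vertices, edges):
    """Four vertices induce a C4 iff they span 4 edges and all degrees are 2."""
    for quad in combinations(vertices, 4):
        inside = [p for p in map(frozenset, combinations(quad, 2)) if p in edges]
        if len(inside) == 4 and all(sum(v in p for p in inside) == 2 for v in quad):
            return True
    return False

# A solution is a set of deletable edges whose removal leaves a C4-free graph.
solutions = [
    set(F)
    for k in range(7)
    for F in combinations(cycle, k)
    if not has_induced_c4(names, clique - set(F))
]

none_has_all_three = not any(
    all(e in F for e in literal_edges) for F in solutions
)
each_i_has_solution = all(
    any(literal_edges[i] not in F
        and all(literal_edges[j] in F for j in range(3) if j != i)
        for F in solutions)
    for i in range(3)
)
```

For instance, deleting the path $s_2 t_2$, $t_2 s_3$, $s_3 t_3$ is a solution that avoids $s_1 t_1$, which is a witness for the case $i = 1$.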
For every variable x ∈ vars we create a copy $G^x$ of the variable gadget $G_{\mathrm{vars}}$. The copies of vertices $u_\top$, $v_\top$, $u_\bot$ and $v_\bot$ in $G^x$ are respectively renamed to $u^x_\top$, $v^x_\top$, $u^x_\bot$ and $v^x_\bot$. For every clause c ∈ cls we create a copy $H^c$ of the clause gadget $H_{\mathrm{cls}}$. The copies of vertices $s_1, t_1, s_2, t_2, s_3, t_3$ in $H^c$ are respectively renamed to $s^c_1, t^c_1, s^c_2, t^c_2, s^c_3, t^c_3$.
Finally, we wire the variable gadgets and clause gadgets using connector gadgets, which are just $C_4$-s (depicted on the third panel of Figure 5). More precisely, whenever x appears in the i-th literal of clause c, we connect $s^c_i$ with $u^x_p$ and $t^c_i$ with $v^x_p$ using undeletable edges, where $p = \top$ if the appearance of x in c is positive, and $p = \bot$ if it is negative. Note that the deletable edges uv and st depicted in Figure 5 are always present in the respective variable or clause gadgets. This concludes the construction; the constructed graph will be denoted by G. Obviously G has O(n + m) vertices and edges. It is straightforward to see that the asserted structural property of G is satisfied: the subgraph spanned by deletable edges consists of disjoint paths and cycles on 6 vertices, hence every $C_4$ subgraph must contain at least one undeletable edge.
We now need to verify that the obtained instance G of Sandwich $C_4$-Free Edge Deletion has a solution if and only if the input formula ϕ is satisfiable. For this, the following claim will be useful; its proof is a straightforward check following from the fact that each vertex of a clause gadget is incident with at most one edge leading to a variable gadget, and hence we omit the proof.
Claim 29. Every (not necessarily induced) C 4 in G is entirely contained in one variable gadget, in one clause gadget, or forms one connector gadget.
Suppose first that $\alpha : \mathrm{vars} \to \{\bot, \top\}$ is a variable assignment that satisfies ϕ. Construct a subset F of deletable edges in G as follows:
• For each variable x ∈ vars, add to F the solution $F_{\alpha(x)}$ in the variable gadget $G^x$, given by Claim 27.
• For each clause c ∈ cls, arbitrarily choose an index $i_c \in \{1, 2, 3\}$ of one of its literals that satisfies it under α; such a literal exists due to α being a satisfying assignment. Then add to F the solution $F_{i_c}$ in the clause gadget $H^c$, given by Claim 28.
By Claim 29, to verify that G − F is $C_4$-free, it suffices to show that there is no induced $C_4$ within any variable gadget or within any clause gadget, and that one of the edges in each connector gadget is removed. The first two checks follow immediately from Claims 27 and 28. For the last check, fix some clause c and variable x appearing in it; we examine the connector gadget between $G^x$ and $H^c$.
Suppose that x appears in the i-th literal of c, and assume w.l.o.g. that this appearance is positive; the second case is symmetric. If $\alpha(x) = \top$, then the edge $u^x_\top v^x_\top$ is deleted in $G^x$, and hence the $C_4$ in the connector gadget is destroyed. Otherwise $\alpha(x) = \bot$, and hence the literal containing x cannot satisfy the clause c under assignment α. From the construction of F it follows that the edge $s^c_i t^c_i$ is deleted in the gadget $H^c$, and hence the $C_4$ in the connector gadget is also destroyed.

For the other direction, suppose that there is a subset F of deletable edges in G such that G − F is $C_4$-free. By Claim 27, the intersection of F with the edge set of each variable gadget $G^x$ must be equal either to solution $F_\top$ or to solution $F_\bot$. Define assignment $\alpha : \mathrm{vars} \to \{\bot, \top\}$ as follows: $\alpha(x) = \top$ if this intersection is $F_\top$, and $\alpha(x) = \bot$ if it is $F_\bot$. In particular, edge $u^x_\top v^x_\top$ belongs to F if and only if $\alpha(x) = \top$, and the symmetric claim holds also for $u^x_\bot v^x_\bot$. We verify that α is a satisfying assignment for ϕ. Take any clause c ∈ cls, and for the sake of contradiction suppose it is not satisfied under α. By the construction of α, this means that in all three connector gadgets connecting $H^c$ with variable gadgets of variables appearing in c, the deletable edges from the variable gadgets are not included in F. Since each connector gadget induces a $C_4$ with only two edges deletable, it follows that all three edges $s^c_1 t^c_1$, $s^c_2 t^c_2$, and $s^c_3 t^c_3$ have to be included in F. However, Claim 28 asserts that there is no solution within the clause gadget $H^c$ that simultaneously contains all these three edges. This is a contradiction, and hence we conclude that assignment α satisfies formula ϕ.

We now move to the proof for Sandwich $C_5$-Free Edge Deletion, which is a minor modification of the construction for Sandwich $C_4$-Free Edge Deletion. For this reason, we only sketch how the construction needs to be modified, and argue that the correctness proof follows the same steps.
Lemma 30. There is a polynomial-time reduction which, given an instance of 3SAT with n variables and m clauses, constructs an equivalent instance G of Sandwich C 5 -Free Edge Deletion with O(n + m) edges. Consequently, Sandwich C 5 -Free Edge Deletion is NP-hard.
Proof. We perform essentially the same construction as in the proof of Lemma 26, but we replace the variable, clause and connector gadgets with C 5 -specific constructions depicted in Figure 6.
The variable gadget $G_{\mathrm{vars}}$ is depicted on the first panel of Figure 6. As before, it has four named vertices: $u_\top$, $v_\top$, $u_\bot$, and $v_\bot$. Again, a direct check, whose proof is omitted, yields the following.
Claim 31. There are exactly two solutions to the Sandwich $C_5$-Free Edge Deletion instance $G_{\mathrm{vars}}$. One of them, denoted $F_\top$, contains $u_\top v_\top$ and does not contain $u_\bot v_\bot$, and the second, denoted $F_\bot$, contains $u_\bot v_\bot$ and does not contain $u_\top v_\top$.
The clause gadget H cls is depicted on the second panel of Figure 6. It has five vertices, but in order to keep the description same as in Lemma 26, one of them is named both s 1 and s 2 . Thus, the gadget has three deletable edges s 1 t 1 , s 2 t 2 , and s 3 t 3 . Again, a direct check, whose proof is omitted, yields the following.
Claim 32. In the Sandwich C 5 -Free Edge Deletion instance H cls there is no solution that simultaneously contains all three edges s 1 t 1 , s 2 t 2 and s 3 t 3 . However, for each i = 1, 2, 3, there is a solution F i that does not contain s i t i , but contains both the other edges from this triple.
As in the proof of Lemma 26, we create one variable gadget G x for each variable x, and one clause gadget H c for each clause c. We follow the same renaming convention, where the variable/clause corresponding to the gadget is in the superscript of each vertex of this gadget. The variable and clause gadgets are connected to each other via connector gadgets exactly as in Lemma 26, which this time are simply C 5 -s (see the third panel of Figure 6): the appropriate vertex s is connected to the appropriate vertex u via a path of length 2, and the appropriate vertex t is connected to the appropriate vertex v via a single edge; all these edges are undeletable. Similarly as in the proof of Lemma 26, a direct check yields the following.
Claim 33. Every (not necessarily induced) C 5 in G is entirely contained in one variable gadget, in one clause gadget, or forms one connector gadget.
We remark that for the check of Claim 33 it is important that the vertex s 1 = s 2 in the clause gadget that is shared between two deletable edges, is always the endpoint of the path of length 2, not 1, in the corresponding connector gadgets connecting it to variable gadgets.
Having Claims 31, 32, and 33 in place, the proof of the correctness is exactly the same as in the proof of Lemma 26. We leave the easy verification to the reader.
For now, we postpone the argumentation for the remaining deletion problem, namely House-Free Edge Deletion. We will deal with this case later, using a different reasoning.

Sandwich completion problems
We now proceed with proving the hardness of sandwich variants of the relevant completion problems: C 4 -Free Edge Completion and House-Free Edge Completion.
Lemma 34. There is a polynomial-time reduction which, given an instance of 3SAT with n variables and m clauses, constructs an equivalent instance G of Sandwich C 4 -Free Edge Completion with O(n + m) vertices, edges, and fillable non-edges. Moreover, G has the following additional property: the graph spanned by fillable non-edges does not contain any (not necessarily induced) C 4 . Consequently, Sandwich C 4 -Free Edge Completion is NP-hard, even on such instances.
Proof. We modify slightly the reduction of Drange et al. [7], which shows that (the minimization variant of) C 4 -Free Edge Completion has no subexponential-time algorithm, under the assumption of ETH. Unfortunately, while this construction happens to basically work "as is" in our setting, the proof of its correctness, contained in [7], uses budget constraints for convenience. For this reason, we now recall the whole construction, perform slight modifications to adjust it to the sandwich setting, and argue its correctness.
Let ϕ be the given formula in 3CNF, and let vars and cls be the sets of variables and clauses of ϕ. By standard modifications of the formula we may assume that each clause contains exactly three literals of pairwise different variables. For a variable x, by p x we denote the number of occurrences of x in ϕ. By copying the whole formula several times, we may assume that p x ≥ 2 for each x ∈ vars.
For each variable x, we construct a variable gadget $G^x$ depicted in Figure 7; this gadget is exactly the same as in [7], and in particular the figures depicting it are taken verbatim from [7] with the consent of the authors. The gadget consists of two cycles of length $4p_x$: $t^x_0 - t^x_1 - \ldots - t^x_{4p_x - 1} - t^x_0$ and $b^x_0 - b^x_1 - \ldots - b^x_{4p_x - 1} - b^x_0$, connected into a cyclic "ladder" by adding edges $t^x_i b^x_i$, for all $i = 0, 1, \ldots, p_x - 1$. Moreover, for each $i = 0, 1, \ldots, p_x - 1$ we introduce vertices $u^x_i$ and $d^x_i$. We make $u^x_i$ adjacent to $t^x_{i-1}$, $t^x_i$, and $t^x_{i+1}$ (the indices behave cyclically modulo $4p_x$), whereas $b^x_i$ is made adjacent to $d^x_{i-1}$, $d^x_i$, and $d^x_{i+1}$. In the constructed sandwich instance, within the gadget $G^x$ we declare only the diagonals of the $C_4$-s to be fillable, i.e., edges $t^x_i b^x_{i+1}$ and $t^x_{i+1} b^x_i$ for $i = 0, 1, \ldots, p_x - 1$. All the other non-edges cannot be filled. The following claim verifies that the constructed gadget has exactly two solutions. We remark that the proof of Drange et al. [7] used at this point the budget constraints.
Claim 35. The Sandwich $C_4$-Free Edge Completion instance $G^x$ has exactly two solutions, depicted on the second and third panel of Figure 8. The solution that takes all edges of the form $t^x_i b^x_{i+1}$ will be denoted by $F^x_\top$, whereas the solution that takes all edges of the form $t^x_{i+1} b^x_i$ will be denoted by $F^x_\bot$.

Figure 7: Variable gadget $G^x$; the light grey lines represent fillable edges. The figure is taken almost verbatim from Drange et al. [7], with the consent of the authors.

Proof. For $i = 0, 1, \ldots, p_x - 1$, let $W_i = \{t^x_i, t^x_{i+1}, b^x_i, b^x_{i+1}\}$. Fix any solution F in the instance $G^x$. Let A be the set of those indices i for which $t^x_i b^x_{i+1} \in F$, and let B be the set of those indices i for which $t^x_{i+1} b^x_i \in F$. Each set $W_i$ induces a $C_4$, and hence one of the edges $t^x_i b^x_{i+1}$ or $t^x_{i+1} b^x_i$ needs to be filled in F. Therefore $A \cup B = \{0, 1, \ldots, p_x - 1\}$. We claim that if $i \in A$, then $i + 1 \notin B$. Indeed, otherwise we would obtain an induced $C_4$ with both diagonals non-fillable, which is a contradiction. Hence, in particular $i \in A$ implies $i + 1 \in A$, so A is either empty or equal to $\{0, 1, \ldots, p_x - 1\}$. Since $i \in A$ implies $i + 1 \notin B$, in the latter case we have that B is empty. We conclude that either $A = \emptyset$ and $B = \{0, 1, \ldots, p_x - 1\}$, or $A = \{0, 1, \ldots, p_x - 1\}$ and $B = \emptyset$; this corresponds to the two solutions described in the statement.
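The combinatorial core of this proof is the propagation argument on the index sets A and B. The sketch below checks only that deduction, not the gadget itself: for a cyclic index range of size N (a free parameter here, standing for the number of rungs of the ladder), it enumerates all pairs (A, B) with $A \cup B$ equal to the full range and with $i \in A \Rightarrow i + 1 \notin B$, and collects the survivors. The function name is hypothetical.

```python
from itertools import product

def consistent_pairs(N):
    """All (A, B) over Z_N with A ∪ B = Z_N and, cyclically,
    i in A implies i + 1 not in B."""
    result = []
    for a_bits, b_bits in product(product([0, 1], repeat=N), repeat=2):
        covers = all(a_bits[i] or b_bits[i] for i in range(N))
        no_clash = all(
            not (a_bits[i] and b_bits[(i + 1) % N]) for i in range(N)
        )
        if covers and no_clash:
            A = {i for i in range(N) if a_bits[i]}
            B = {i for i in range(N) if b_bits[i]}
            result.append((A, B))
    return result

pairs_by_size = {N: consistent_pairs(N) for N in range(3, 8)}
```

For each tested N, only the two pairs (everything, empty) and (empty, everything) survive, matching the two solutions of the claim.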
We now move on to the description of the clause gadget H c , constructed for every clause c ∈ cls. Again, we use almost exactly the same construction as Drange et al. [7]. The construction is depicted in Figure 9, which is again taken almost verbatim from Drange et al. [7], by the consent of the authors.
The gadget consists of 8 vertices: v c t and u c t , for t = 1, 2, 3, 4. There are five fillable non-edges: All the other non-edges are declared to be not fillable; note that in particular u c 4 v c 4 is not fillable. The following claim, which can be verified by a direct check, explains the properties of the clause gadget.  Finally, we connect clause gadgets with variable gadgets using connector gadgets, which are just C 4 -s. More precisely, if the i-th literal of a clause c is x, and the occurrence of x in c is the (j + 1)-st occurrence of x in formula ϕ, then we add edges: , provided x appears in c positively. This concludes the construction of the graph G. The only non-edges that we allow to fill are the ones declared fillable in variable and clause gadgets: 8p x diagonal non-edges in each variable gadget G x , and 5 non-edges in each clause gadget H c . All the other non-edges are declared to be non-fillable. Obviously, G has O(n + m) vertices, edges, and fillable non-edges. To see that the constructed instance has the structural property asserted in the lemma statement, observe that the graph spanned by fillable non-edges consists of paths of length 2 or 3 and cycles of length 8 or more, and hence it has no C 4 subgraph. We are left with verifying the correctness of the reduction.
First, suppose the input formula ϕ has a satisfying assignment α. Construct solution F as follows:
• For each variable x ∈ vars, add to F the solution $F^x_{\alpha(x)}$ in the variable gadget $G^x$, given by Claim 35.
• For each clause c ∈ cls, arbitrarily choose an index $i_c \in \{1, 2, 3\}$ of one of its literals that satisfies it under α; such a literal exists due to α being a satisfying assignment. Then add to F the solution $F^c_{i_c}$ in the clause gadget $H^c$, given by Claim 36.
It can be easily verified, using the fact that assignment α satisfies ϕ, that G + F is $C_4$-free and hence F is a solution. This check is also contained in Drange et al. [7] (see the proof of Lemma 5.8 therein), and hence we omit it here.
For the other direction, we repeat the reasoning of Drange et al. [7], because we need to adjust it to the sandwich variant. Suppose that there exists a subset F of fillable non-edges such that G + F is $C_4$-free. By Claim 35, the intersection of F with the fillable non-edges of each variable gadget $G^x$ has to be either equal to solution $F^x_\top$ or to solution $F^x_\bot$. Let $\alpha : \mathrm{vars} \to \{\bot, \top\}$ be a variable assignment defined as follows: for a variable x, if the aforementioned intersection is $F^x_\top$ then we set $\alpha(x) = \top$, and otherwise, if it is $F^x_\bot$, then we set $\alpha(x) = \bot$. To verify that α is a satisfying assignment, suppose, for the sake of contradiction, that some clause c is not satisfied under α. By Claim 36, at least one of the edges $u^c_1 v^c_1$, $u^c_2 v^c_2$, and $u^c_3 v^c_3$ must belong to F, say $u^c_i v^c_i$. Let x be the variable in the i-th literal of c. Since this literal does not satisfy c, by the construction of α we infer that the two vertices in $G^x$ that are adjacent to $u^c_i$ and $v^c_i$ are connected by a filled edge of $F^x_{\alpha(x)}$. Hence, $u^c_i$, $v^c_i$, and these two vertices form a $C_4$ in G + F, with both diagonals being non-fillable. This is a contradiction with G + F being $C_4$-free.
The hardness of Sandwich House-Free Edge Completion is established by a reduction from Sandwich C 4 -Free Edge Completion.
Lemma 37. There is a polynomial-time reduction which, given an instance G of Sandwich C 4 -Free Edge Completion with n vertices, m edges, and k fillable non-edges, with the additional assumption that the graph spanned by fillable non-edges contains no C 4 subgraph, constructs an equivalent instance G′ of Sandwich House-Free Edge Completion with O(n + m) vertices and edges, and k fillable non-edges. Consequently, Sandwich House-Free Edge Completion is NP-hard.
Proof. Starting from G, construct G′ as follows: for each edge uv ∈ E(G), introduce a new vertex w uv and make it adjacent to u and to v. The fillable non-edges in graph G′ are only the ones that were fillable in the original instance G; that is, every non-edge incident to any of the new vertices is non-fillable. We claim that the output instance G′ of Sandwich House-Free Edge Completion has a solution if and only if the input instance G of Sandwich C 4 -Free Edge Completion has a solution.
Suppose first that G′ has a solution F . Since every non-edge incident to any vertex of V (G′) \ V (G) is non-fillable, F consists only of non-edges that were fillable in the original instance G. We claim that F is also a solution to instance G of Sandwich C 4 -Free Edge Completion. For this, it suffices to verify that G+F has no induced C 4 . For the sake of contradiction, suppose there exists some induced C 4 in G + F , and call it D. Since in G there was no C 4 formed by four fillable non-edges, at least one edge uv of D is an original edge of G. For this edge we have created vertex w uv , which is adjacent both to u and to v. Since the non-edges connecting w uv to the other two vertices of D are not fillable, we infer that V (D)∪{w uv } induces a house in G′+F . This is a contradiction with G′+F being house-free.
For the other direction, suppose the original instance G has a solution F ; that is, F consists only of fillable non-edges and G + F is C 4 -free. We claim that G′ + F is house-free, and hence F is also a solution to the instance G′ of Sandwich House-Free Edge Completion. For the sake of contradiction, suppose G′ + F contains some induced house D; let D′ be the C 4 contained in D. At least one vertex of D′ does not belong to V (G), because otherwise D′ would be an induced C 4 in G+F , which is C 4 -free by assumption. Hence, this vertex is of the form w uv for some edge uv of G. Note that u and v are the only two neighbors of w uv in G′+F , and hence they must be also its neighbors on the 4-cycle D′. However, uv is an edge of G, which contradicts the supposition that D′ is an induced C 4 .
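The construction from the proof of Lemma 37 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper: the function names (`add_roof_vertices`, `has_induced_c4`, `has_induced_house`, `fill`), the adjacency-dictionary representation, and the four-vertex toy instance are all hypothetical, and the induced-subgraph checks are brute force, so they are only suitable for tiny graphs.

```python
from itertools import combinations

def induced_degrees(adj, verts):
    """Degrees of `verts` inside the subgraph they induce."""
    inside = set(verts)
    return {v: len(adj[v] & inside) for v in verts}

def has_induced_c4(adj):
    # Four vertices induce a C4 exactly when each has induced degree 2.
    return any(all(d == 2 for d in induced_degrees(adj, q).values())
               for q in combinations(list(adj), 4))

def has_induced_house(adj):
    # A house is a C4 plus a roof vertex adjacent to two consecutive cycle
    # vertices: degree sequence (2, 2, 2, 3, 3) with the degree-3 vertices adjacent.
    for q in combinations(list(adj), 5):
        deg = induced_degrees(adj, q)
        if sorted(deg.values()) != [2, 2, 2, 3, 3]:
            continue
        x, y = [v for v in q if deg[v] == 3]
        if y in adj[x]:
            return True
    return False

def add_roof_vertices(vertices, edges):
    """Build G' from G: one new vertex w_uv, adjacent to u and v, per edge uv."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in edges:
        w = ("w", u, v)
        adj[w] = {u, v}
        adj[u].add(w)
        adj[v].add(w)
    return adj

def fill(adj, non_edges):
    """Return a copy of the graph with the given non-edges added."""
    out = {v: set(nb) for v, nb in adj.items()}
    for u, v in non_edges:
        out[u].add(v)
        out[v].add(u)
    return out

# Hypothetical toy instance: the path a-b-c-d with ad as the only fillable non-edge.
g_prime = add_roof_vertices("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
print(has_induced_house(g_prime))                       # False
print(has_induced_house(fill(g_prime, [("a", "d")])))   # True
```

In the toy instance, filling ad closes the 4-cycle a-b-c-d, and the new vertex attached to the edge ab turns it into an induced house, which mirrors the first direction of the proof.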

From sandwich problems to hardness of approximation
Having proven the NP-hardness of sandwich problems, we can use them to prove the hardness of approximation for the standard variants, as in Lemmas 11 and 42. For this, we need analogues of Lemmas 11 and 42, which provide reductions from sandwich problems by turning the additional hard constraints into an approximation gap. However, the proofs of Lemmas 11 and 42 use the assumption of 3-connectedness, which is not available in our current setting. Hence, we need to verify by hand that the same strategy still works.
Lemma 38. Let (Π, Π′) be one of the following pairs of problems:
• Sandwich C 4 -Free Edge Deletion and C 4 -Free Edge Deletion;
• Sandwich C 5 -Free Edge Deletion and C 5 -Free Edge Deletion;
• Sandwich C 4 -Free Edge Completion and C 4 -Free Edge Completion;
• Sandwich House-Free Edge Completion and House-Free Edge Completion.
Let p(·) be a polynomial with p(ℓ) ≥ ℓ for all positive ℓ. Then there is a polynomial-time reduction which, given an instance G of Π, creates an instance (G′, k) of Π′ such that:
• k is the number of deletable edges, resp. fillable non-edges, in G;
• if G is a YES instance of Π, then (G′, k) is a YES instance of Π′;
• if G is a NO instance of Π, then (G′, p(k)) is a NO instance of Π′.
Figure 10: Gadgets attached to undeletable edges, resp. non-fillable non-edges, in the constructions in the proof of Lemma 38.
Proof. We give the proof for (Π, Π′) being Sandwich C 4 -Free Edge Deletion and C 4 -Free Edge Deletion, and then we briefly discuss how it can be modified to work for the other problem pairs. Let G be the input instance of Sandwich C 4 -Free Edge Deletion, and let k be the number of deletable edges in G. Starting from G, construct graph G′ as follows: for every undeletable edge uv ∈ E(G), add p(k) + 2 vertices w i uv , for i = 1, . . . , p(k) + 2. Each of these vertices is adjacent only to u and v. This concludes the construction of G′; we are left with verifying that G′ has the requested properties.
First, suppose that G is a YES instance of Sandwich C 4 -Free Edge Deletion, that is, there is some subset F of deletable edges of G such that G − F is C 4 -free. Obviously |F | ≤ k, because there are k deletable edges in G in total. We claim that then F is also a solution to instance (G′, k) of C 4 -Free Edge Deletion. For this, it suffices to verify that G′ − F is also C 4 -free.
For the sake of contradiction, suppose that G′ − F contains some induced C 4 ; call it D. Since G − F is C 4 -free, at least one vertex of D is outside of V (G), and hence it is of the form w i uv for some undeletable edge uv of G and i ∈ [p(k) + 2]. As uv is undeletable, we have that uv ∉ F . As w i uv has degree 2 in G′, we have that the two neighbors of w i uv on D must be u and v. However, uv is still present in G′ − F , and hence it would be a chord in the induced 4-cycle D; this is a contradiction.
For the other direction, suppose that (G′, p(k)) is a YES instance of C 4 -Free Edge Deletion, that is, there is a subset F of at most p(k) edges of G′ such that G′ − F is C 4 -free.
We first claim that F does not contain any edge of G that is undeletable. Suppose the contrary: there is some edge uv in F that is an undeletable edge of G. Recall that we have constructed p(k) + 2 vertices w i uv that are pairwise non-adjacent, and adjacent to u and v. Since |F | ≤ p(k), there have to be at least two of these vertices, say w i uv and w j uv , for which F does not contain any of the edges incident to w i uv or w j uv . Since uv ∈ F , we infer that {u, v, w i uv , w j uv } induces a C 4 in G′ − F , a contradiction.
Hence, F contains no undeletable edge of G. Consider set F′ = E(G) ∩ F : this set contains only deletable edges of G, and moreover G − F′ has to be C 4 -free due to being an induced subgraph of G′ − F . We conclude that F′ is a solution to the original instance G of Sandwich C 4 -Free Edge Deletion.
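The gap-creating gadget for C 4 -Free Edge Deletion can be checked by brute force on a tiny instance. The sketch below is only an illustration under stated assumptions: the helper names, the adjacency-dictionary representation, the single-edge base graph, and the value `budget = 2` standing in for p(k) are all hypothetical choices made here.

```python
from itertools import combinations

def has_induced_c4(adj):
    # Four vertices induce a C4 exactly when each has degree 2 among them.
    for quad in combinations(list(adj), 4):
        inside = set(quad)
        if all(len(adj[v] & inside) == 2 for v in quad):
            return True
    return False

def pad_undeletable_edges(adj, undeletable, budget):
    """Attach budget + 2 degree-2 vertices to every undeletable edge.

    `budget` stands for p(k) in the proof of Lemma 38.
    """
    padded = {v: set(nb) for v, nb in adj.items()}
    for u, v in undeletable:
        for i in range(budget + 2):
            w = ("w", u, v, i)
            padded[w] = {u, v}
            padded[u].add(w)
            padded[v].add(w)
    return padded

def delete_edges(adj, edges):
    """Return a copy of the graph with the given edges removed."""
    out = {v: set(nb) for v, nb in adj.items()}
    for u, v in edges:
        out[u].discard(v)
        out[v].discard(u)
    return out

def edge_list(adj):
    return [tuple(e) for e in {frozenset((u, v)) for u in adj for v in adj[u]}]

budget = 2                                  # hypothetical value of p(k)
base = {"u": {"v"}, "v": {"u"}}             # a single undeletable edge uv
g_prime = pad_undeletable_edges(base, [("u", "v")], budget)

# Every deletion set of size at most `budget` that contains uv leaves an induced C4.
others = [e for e in edge_list(g_prime) if set(e) != {"u", "v"}]
always_c4 = all(
    has_induced_c4(delete_edges(g_prime, [("u", "v"), *extra]))
    for r in range(budget)                  # up to budget - 1 further deletions
    for extra in combinations(others, r)
)
print(always_c4)
```

Using p(k) + 2 copies rather than p(k) + 1 matters here: a solution that deletes uv has only p(k) − 1 deletions left, so at least two attached vertices keep both of their edges and close a 4-cycle through u and v.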
To prove the claim for the remaining 3 pairs of problems, we need to design problem-specific gadgets that are attached to an undeletable edge, resp. non-fillable non-edge, to force a large cost of breaking the constraint. The constructions are given in Figure 10. More precisely:
• For Sandwich C 5 -Free Edge Deletion and C 5 -Free Edge Deletion, we add p(k) + 1 paths of length 3 and p(k) + 1 paths of length 2 between u and v, for each undeletable edge uv.
• For Sandwich C 4 -Free Edge Completion and C 4 -Free Edge Completion, we add p(k) + 1 paths of length 3 between u and v, for each non-fillable non-edge uv.
• For Sandwich House-Free Edge Completion and House-Free Edge Completion, we add p(k) + 1 paths of length 3 between u and v, for each non-fillable non-edge uv. Moreover, in each of these paths we build a triangle on the middle edge.
It is straightforward to verify that with these constructions, essentially the same reasoning as for C 4 -Free Edge Deletion goes through. We leave the details to the reader.
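For the two completion cases, the gadgets listed above can be sketched as follows. This is a hedged illustration, not the authors' construction in full: the function `attach_paths`, the vertex labels, the two-vertex base graph, and `budget = 2` standing in for p(k) are assumptions made here, and only the basic effect of filling uv is checked, not the full correctness argument.

```python
from itertools import combinations

def induced_degrees(adj, verts):
    inside = set(verts)
    return {v: len(adj[v] & inside) for v in verts}

def has_induced_c4(adj):
    # Four vertices induce a C4 exactly when each has induced degree 2.
    return any(all(d == 2 for d in induced_degrees(adj, q).values())
               for q in combinations(list(adj), 4))

def has_induced_house(adj):
    # House: degree sequence (2, 2, 2, 3, 3) with the degree-3 vertices adjacent.
    for q in combinations(list(adj), 5):
        deg = induced_degrees(adj, q)
        if sorted(deg.values()) != [2, 2, 2, 3, 3]:
            continue
        x, y = [v for v in q if deg[v] == 3]
        if y in adj[x]:
            return True
    return False

def attach_paths(adj, non_fillable, budget, roof=False):
    """Join u and v by budget + 1 paths u - a - b - v per non-fillable non-edge uv.

    With roof=True, a vertex c adjacent to a and b is added on each path,
    i.e. a triangle is built on the middle edge (the house variant).
    """
    out = {x: set(nb) for x, nb in adj.items()}

    def link(x, y):
        out.setdefault(x, set()).add(y)
        out.setdefault(y, set()).add(x)

    for u, v in non_fillable:
        for i in range(budget + 1):
            a, b = ("a", u, v, i), ("b", u, v, i)
            link(u, a)
            link(a, b)
            link(b, v)
            if roof:
                c = ("c", u, v, i)
                link(c, a)
                link(c, b)
    return out

def fill(adj, non_edges):
    out = {x: set(nb) for x, nb in adj.items()}
    for u, v in non_edges:
        out[u].add(v)
        out[v].add(u)
    return out

budget = 2                                   # hypothetical value of p(k)
base = {"u": set(), "v": set()}              # uv is a non-fillable non-edge
c4_gadget = attach_paths(base, [("u", "v")], budget)
house_gadget = attach_paths(base, [("u", "v")], budget, roof=True)

print(has_induced_c4(fill(c4_gadget, [("u", "v")])))        # True
print(has_induced_house(fill(house_gadget, [("u", "v")])))  # True
```

Each of the budget + 1 paths closes its own forbidden subgraph once uv is filled, and these subgraphs share only u and v, which is what makes violating the constraint expensive.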
The only problem left is House-Free Edge Deletion, which by complementation is equivalent to P 5 -Free Edge Completion. Note that we did not even establish hardness of the sandwich variant of this problem. The reason for this is that we find it simplest to prove the appropriate analogue of Lemma 38, stated below, using a direct reduction from Sandwich C 4 -Free Edge Deletion.
Lemma 39. Let p(·) be a polynomial with p(ℓ) ≥ ℓ for all positive ℓ. Then there is a polynomial-time reduction which, given an instance G of Sandwich C 4 -Free Edge Deletion in which every C 4 subgraph contains an undeletable edge, creates an instance (G′, k) of House-Free Edge Deletion such that:
• k is the number of deletable edges in G;
• if G is a YES instance of Sandwich C 4 -Free Edge Deletion, then (G′, k) is a YES instance of House-Free Edge Deletion;
• if G is a NO instance of Sandwich C 4 -Free Edge Deletion, then (G′, p(k)) is a NO instance of House-Free Edge Deletion.
Proof. We perform a similar construction as in the proof of Lemma 38. We start with an instance G of Sandwich C 4 -Free Edge Deletion, where every C 4 subgraph contains an undeletable edge. Let k be the number of deletable edges in G. For every undeletable edge uv in G, we add p(k) + 2 gadgets Q i uv , for i ∈ [p(k) + 2], constructed as follows. Each gadget Q i uv consists of vertices a i uv and b i uv , and edges ua i uv , ub i uv , vb i uv , a i uv b i uv . The gadgets are not adjacent to each other. The construction is depicted in Figure 11.
Let G′ be the obtained graph. We now verify that the construction satisfies the required properties.
Suppose first that the input instance G of Sandwich C 4 -Free Edge Deletion has some solution F . That is, F is a subset of deletable edges of G and G − F is C 4 -free. Obviously |F | ≤ k, because there are k deletable edges in G in total. We claim that G′ − F is house-free, and hence (G′, k) is a YES instance of House-Free Edge Deletion. For the sake of contradiction, suppose there is some induced house D in G′ − F , and let D′ be the induced C 4 contained in it. Since G − F is C 4 -free, at least one vertex w of D′ does not belong to V (G). Vertex w cannot be of the form a i uv for some undeletable edge uv, because such vertices have degree 2 in G′ − F and their neighbors are adjacent in G′ − F ; this cannot happen for a vertex of an induced C 4 . Hence, w = b i uv for some undeletable edge uv and i ∈ [p(k) + 2]. Since uv is undeletable, we have that uv ∉ F . We conclude that in G′ − F , the only pair of non-adjacent neighbors of w is a i uv and v, so these two vertices must be the neighbors of w on D′. However, we have already argued that a i uv cannot lie on an induced C 4 ; this is a contradiction.
For the other direction, suppose that the instance (G′, p(k)) of House-Free Edge Deletion has some solution F . That is, F is a subset of edges of G′ of size at most p(k) for which G′ − F is house-free.
We first claim that F does not contain any edge of G that was undeletable in the original instance of Sandwich C 4 -Free Edge Deletion. Suppose the contrary: there is some edge uv in F that is an undeletable edge of G. Recall that we have constructed p(k) + 2 gadgets Q i uv . Since |F | ≤ p(k), there have to be at least two of these gadgets, say Q i uv and Q j uv , for which F does not contain any of their edges. Since uv ∈ F , we infer that {u, v, a i uv , b i uv , b j uv } induces a house in G′ − F , a contradiction.
Hence, F contains no undeletable edge of G. Consider set F′ = E(G) ∩ F : this set contains only deletable edges of G, and we claim that it is in fact a solution to the input instance G of Sandwich C 4 -Free Edge Deletion. For the sake of contradiction, suppose G − F′ contains some induced C 4 ; call it S. By the assumption that each C 4 subgraph of G contains an undeletable edge, we conclude that S has at least one undeletable edge, say uv. Recall that for the edge uv we have constructed p(k) + 2 gadgets Q i uv . Since |F | ≤ p(k), there is at least one gadget Q i uv whose edges are disjoint with F . We conclude that S together with vertex b i uv induces a house in G′ − F , which is a contradiction. Hence G − F′ is indeed C 4 -free.
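The behaviour of the gadgets Q i uv can be checked by brute force on a tiny instance. The following is a sketch under stated assumptions, not part of the paper: the function names, the vertex labels, the single-edge base graph, and `budget = 2` standing in for p(k) are hypothetical.

```python
from itertools import combinations

def has_induced_house(adj):
    # House: five vertices with induced degree sequence (2, 2, 2, 3, 3)
    # whose two degree-3 vertices are adjacent.
    for q in combinations(list(adj), 5):
        inside = set(q)
        deg = {v: len(adj[v] & inside) for v in q}
        if sorted(deg.values()) != [2, 2, 2, 3, 3]:
            continue
        x, y = [v for v in q if deg[v] == 3]
        if y in adj[x]:
            return True
    return False

def attach_house_gadgets(adj, undeletable, budget):
    """Attach budget + 2 gadgets Q_i to each undeletable edge uv.

    Gadget Q_i has vertices a_i, b_i and edges u-a_i, u-b_i, v-b_i, a_i-b_i.
    """
    out = {x: set(nb) for x, nb in adj.items()}
    for u, v in undeletable:
        for i in range(budget + 2):
            a, b = ("a", u, v, i), ("b", u, v, i)
            out[a] = {u, b}
            out[b] = {u, v, a}
            out[u] |= {a, b}
            out[v].add(b)
    return out

def delete_edges(adj, edges):
    out = {x: set(nb) for x, nb in adj.items()}
    for u, v in edges:
        out[u].discard(v)
        out[v].discard(u)
    return out

def edge_list(adj):
    return [tuple(e) for e in {frozenset((u, v)) for u in adj for v in adj[u]}]

budget = 2                                  # hypothetical value of p(k)
base = {"u": {"v"}, "v": {"u"}}             # a single undeletable edge uv
g_prime = attach_house_gadgets(base, [("u", "v")], budget)

# Any deletion set of size at most `budget` containing uv leaves an induced house.
others = [e for e in edge_list(g_prime) if set(e) != {"u", "v"}]
always_house = all(
    has_induced_house(delete_edges(g_prime, [("u", "v"), *extra]))
    for r in range(budget)                  # up to budget - 1 further deletions
    for extra in combinations(others, r)
)
print(always_house)
```

With uv present the padded graph is house-free, while deleting uv makes two untouched gadgets form a house on {u, v, a_i, b_i, b_j}, as in the first claim of the proof.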
Having Lemmas 38 and 39, we can conclude the proof of Lemma 25 using the same reasoning as for Lemmas 11 and 42.
• For the hardness of C 4 -Free Edge Deletion, we compose the reductions of Lemmas 26 and 38 (the first problem pair).
• For the hardness of C 4 -Free Edge Completion, we compose the reductions of Lemmas 34 and 38 (the third problem pair).
• For the hardness of C 5 -Free Edge Deletion, we compose the reductions of Lemmas 30 and 38 (the second problem pair). The problem C 5 -Free Edge Completion is equivalent to C 5 -Free Edge Deletion by the complementation of the instance.
• For the hardness of P 5 -Free Edge Deletion, we compose the reductions of Lemmas 37 and 38 (the fourth problem pair) to establish the hardness of House-Free Edge Completion, and then apply the complementation of the instance.
• For the hardness of P 5 -Free Edge Completion, we compose the reductions of Lemmas 26 and 39 to establish the hardness of House-Free Edge Deletion, and then apply the complementation of the instance.
This concludes the proof of Lemma 25, and hence of Theorem 2 as well.

Conclusions
In this work we initiated the study of approximability of edge modification problems related to the classes of H-free graphs. Mirroring known kernelization hardness results, we have shown that the problems are hard to approximate whenever H is a 3-connected graph with at least two non-edges, or it is a long enough path or cycle. It therefore seems that the approximation complexity of H-free Edge Deletion (Completion) somewhat matches the kernelization complexity in the cases considered so far, so it is tempting to formulate a conjecture that for every graph H, the H-free Edge Deletion (Completion) problem admits a polynomial kernel if and only if it admits a poly(OPT)-approximation algorithm. Since neither for kernelization nor for approximability the classification is close to being complete, this conjecture should be regarded as a very distant goal. However, one very concrete open question that arises is whether Cograph Edge Deletion (equivalent to H = P 4 ) admits a poly(OPT)-approximation. Here, we expect the answer to be positive, due to the existence of the polynomial kernel of Guillemot et al. [9]. The same question can be asked about the diamond graph, that is, a K 4 minus an edge; a polynomial kernel for Diamond-Free Edge Deletion was given by Cai [5]. Also, further investigation of the links between the case of a complete graph without one edge and the Min Horn Deletion problem seems like an interesting direction.

The clause gadget H c for a clause c = ℓ1 ∨ ℓ2 ∨ ℓ3 ∈ cls is created from two copies of H. The first copy contains two labeled non-edges e ℓ1 and e ℓ2∨ℓ3 , corresponding to ℓ1 and ℓ2 ∨ ℓ3 . All other non-edges are marked as non-fillable. The second copy is created from H by deleting one edge, and the corresponding non-edge is identified with the non-edge e ℓ2∨ℓ3 from the first copy. We also pick any two other non-edges, label them as e ℓ2 and e ℓ3 , and make all the remaining non-edges non-fillable. Thus, the clause gadget has only 4 fillable non-edges: e ℓ1 , e ℓ2∨ℓ3 , e ℓ2 and e ℓ3 .
To see how the clause gadget works, observe that if we do not add an edge in the place of e ℓ1 , then we have to add the edge e ℓ2∨ℓ3 , which in turn forces us to fill either e ℓ2 or e ℓ3 . This shows that at least one of the three non-edges e ℓ1 , e ℓ2 , e ℓ3 has to be filled. Moreover, observe that for each i = 1, 2, 3, there is a solution within the clause gadget that fills only the non-edge e ℓi among the aforementioned triple: it is either {e ℓ1 } for i = 1, or {e ℓ2∨ℓ3 , e ℓi } for i = 2, 3.
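The logic of the clause gadget can be checked by enumerating all ways of filling its four fillable non-edges. The sketch below abstracts the gadget to two Boolean constraints, one per copy of H, as described in the paragraph above; the short names `e1`, `e23`, `e2`, `e3` and the function `destroys_both_copies` are hypothetical and introduced only for this illustration.

```python
from itertools import product

# Hypothetical short names for the four fillable non-edges of one clause gadget:
# e1 for the first literal, e23 for the auxiliary non-edge, e2 and e3 for the
# second and third literals.
NAMES = ("e1", "e23", "e2", "e3")

def destroys_both_copies(filled):
    # First copy of H: it is destroyed only by filling e1 or e23.
    first_copy_ok = "e1" in filled or "e23" in filled
    # Second copy: it becomes a copy of H once e23 is filled,
    # and is then destroyed only by filling e2 or e3.
    second_copy_ok = "e23" not in filled or "e2" in filled or "e3" in filled
    return first_copy_ok and second_copy_ok

all_subsets = [
    frozenset(name for name, bit in zip(NAMES, bits) if bit)
    for bits in product([False, True], repeat=len(NAMES))
]
valid = [s for s in all_subsets if destroys_both_copies(s)]

literal_edges = {"e1", "e2", "e3"}

# Every valid filling uses at least one of the three literal non-edges.
every_solution_hits_a_literal = all(s & literal_edges for s in valid)

# For each literal non-edge, the valid fillings that use it and no other literal non-edge.
single_literal_solutions = {
    e: [s for s in valid if s & literal_edges == {e}]
    for e in sorted(literal_edges)
}

print(every_solution_hits_a_literal)
for e, sols in single_literal_solutions.items():
    print(e, [sorted(s) for s in sols])
```

The enumeration covers only the fillable non-edges of a single gadget; interactions with connector and variable gadgets are outside this abstraction.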
The connector gadget C is obtained from H by (i) labeling any of its non-edges as e out , and (ii) selecting any edge not sharing any endpoint with e out , deleting it, and labeling the obtained non-edge as e in . Such an edge not sharing any endpoint with e out exists due to H being 3-connected, by the same argument as we used in the proof of Lemma 8. We mark all other non-edges as non-fillable, thus only e in and e out can be filled. Note that filling the non-edge e in forces us to fill also the non-edge e out , because we obtain an induced copy of H that could not be destroyed otherwise.
We combine those gadgets as in Lemma 8. That is, the non-edges e ℓ1 , e ℓ2 , e ℓ3 in each clause gadget H c are connected by chains of length |V (H)| + 2 of connector gadgets to the corresponding variable gadgets. When forming the chain, the connector gadgets are attached to each other by identifying the non-edge e out in one gadget with the non-edge e in in the next gadget. The chain is attached to a clause gadget by identifying the corresponding non-edge e ℓi with the non-edge e in of the first gadget of the chain. Similarly, the attachment to a variable gadget is done by identifying the non-edge e out of the last gadget of the chain with the corresponding non-edge in the variable gadget. The explained behaviour of connector gadgets implies a similar propagation of completions through the chains, as was the case for deletions in the proof of Lemma 8. It is easy to verify that the obtained graph G has O(n + m) vertices, edges, and fillable non-edges, where n and m are the cardinalities of the variable and clause sets of ϕ.
We have argued that the variable, clause, and connector gadgets have exactly the same functionality as in the proof of Lemma 8. Hence, the proof of the correctness of the reduction follows by a straightforward adaptation of the first proof; we leave checking the details to the reader.
Lemma 42. Let H be a 3-connected graph, and p(·) be a polynomial with p(ℓ) ≥ ℓ for all positive ℓ. Then there is a polynomial-time reduction which, given an instance G of Sandwich H-free Edge Completion, constructs an instance (G′, k) of H-free Edge Completion such that:
• k is the number of fillable non-edges of G,
• G′ has O(|E(G)| + p(k) · |E(G)| · |E(H)|) edges,
• if G is a YES instance, then (G′, k) is a YES instance,
• if G is a NO instance, then (G′, p(k)) is a NO instance.
Proof. Similarly as in the proof of Lemma 11, for a non-fillable non-edge uv, we add p(k) copies of a gadget constructed as follows. Take H, arbitrarily choose one of its edges e, and delete e from H. The gadget is attached to the non-edge uv by identifying the endpoints of e with u and v. The construction is presented in Figure 2.
Observe that for any subset F of non-edges in the obtained graph G′ for which G′ + F is H-free, if F contains the non-edge uv, then F also has to contain at least one non-edge within every gadget attached to uv. Otherwise the gadget would induce a copy of H. Hence, such a solution F has to fill more than p(k) non-edges.
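The counting step can be written out explicitly. As notation introduced here only for illustration, let $N_1, \dots, N_{p(k)}$ denote the sets of non-edges lying inside the $p(k)$ gadget copies attached to $uv$, other than $uv$ itself. These sets are pairwise disjoint, because distinct copies share only the vertices $u$ and $v$. If $uv \in F$, then $F \cap N_i \neq \emptyset$ for every $i$, and therefore:

```latex
\[
  |F| \;\ge\; |\{uv\}| + \sum_{i=1}^{p(k)} |F \cap N_i|
      \;\ge\; 1 + p(k) \;>\; p(k).
\]
```

So no solution of size at most $p(k)$ can fill a non-fillable non-edge of the original instance.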
This shows that the functionality of the gadgets attached to non-edges is the same as in the proof of Lemma 11. Consequently, a proof of correctness of the reduction follows by a straightforward adaptation of the first proof; we leave checking the details to the reader.
Exactly as in Section 3.1, by composing the reductions of Lemmas 41 and 42 we infer the hardness results promised in Theorem 1 concerning completion problems. This completes the proof of Theorem 1.