Distance-preserving graph contractions

Compression and sparsification algorithms are frequently applied in a preprocessing step before analyzing or optimizing large networks/graphs. In this paper we propose and study a new framework for contracting edges of a graph (merging vertices into super-vertices), with the goal of preserving pairwise distances as accurately as possible. Formally, given an edge-weighted graph, the contraction should guarantee that for any two vertices at distance $d$, the corresponding super-vertices remain at distance at least $\varphi(d)$ in the contracted graph, where $\varphi$ is a tolerance function bounding the permitted distance distortion. We present a comprehensive picture of the algorithmic complexity of the contraction problem for affine tolerance functions $\varphi(x)=x/\alpha-\beta$, where $\alpha\geq 1$ and $\beta\geq 0$ are arbitrary real-valued parameters. Specifically, we present polynomial-time algorithms for trees as well as hardness and inapproximability results for different graph classes, precisely separating easy and hard cases. Further, we prove upper and lower bounds, yielding efficient algorithms to compute (non-optimal) contractions despite our hardness results.


Introduction
When dealing with large networks, it is often beneficial to compress or sparsify the data to manageable size before analyzing or optimizing the network directly. To be useful, a meaningful compression should represent salient features of the original network with good approximation, while being much smaller in size. In this paper, we focus on a compression of undirected edge-weighted graphs that approximately maintains all distances between vertices in the graph.
In this context, an extensively studied concept is that of spanners (e.g. [PS89, ADD+93, BKMP05, AB16]). Given an undirected graph G = (V, E) and numbers α ≥ 1 and β ≥ 0, a subgraph H = (V, E ′ ), E ′ ⊆ E, is an (α, β)-spanner of G if dist H (u, v) ≤ α · dist G (u, v) + β holds for all u, v ∈ V . While the number of edges in a spanner may be much smaller than in the original graph, the number of vertices is the same for both, leaving further potential for compression untapped. For illustration, consider the road network of Europe with about 50 million vertices [BMSW13]; any spanner of it must again have about 50 million vertices and edges. However, to approximately represent distances in Europe's road network one may also merge nearby vertices into super-vertices, thus achieving a much better compression of the network. This is akin to the visual process of zooming out of a graphical representation of the map, where neighboring vertices fade into each other and edges between merged vertices vanish. At a large enough zoom level, the entire network merges into a single vertex.
In this paper we propose and study a new framework for contracting networks that formalizes this intuitive idea and makes it applicable to general graphs (even without metric embedding). Specifically, we study a contraction problem on graphs where a subset of edges C ⊆ E is contracted. We denote the resulting simple graph obtained from G by contracting the edges in C and by deleting resulting loops and multiple edges, keeping only the shortest edge between any two vertices, by G/C. For any two vertices in G, we compare their distance in G with the distance of the corresponding super-vertices in G/C. It is interesting to contrast this concept with graph spanners. When constructing a spanner, the length of the removed edges is implicitly set to ∞, resulting in an overall increase of distances. On the other hand, a contraction implicitly sets the length of the contracted edges to zero, leading to an overall decrease of distances. For both problems, the ultimate goal is to reduce the complexity of the network while maintaining an approximation guarantee on the distances.
The following example shows that contractions may be better suited than spanners to achieve that goal. In a subgraph with small radius, a spanner can at best result in a spanning tree of the same order, while a contraction can reduce the whole subgraph to a single vertex (both entail essentially the same multiplicative distance distortion). In addition, the contraction may also merge many edges entering the contracted subgraph. Clearly, the objective here is to maximize the total number of contracted and deleted edges, as this minimizes the memory required to represent the resulting network in a computer (using e.g. adjacency lists).
Given the results presented in this paper and the known results for spanners (discussed in detail below), we further believe that the combination of spanners and contractions is very powerful, promising and flexible. As the former only increases and the latter only decreases the distances, the respective distortion guarantees provably also hold for the overall distortion. In fact, both effects may even compensate each other. This is true regardless of the order in which both compression operations are applied, even when they are applied repeatedly.
In order to measure the distance distortion of the contraction, we assume a nondecreasing tolerance function ϕ : R → R, similar to the corresponding function for spanners, see e.g. [BKMP05]. We are interested in computing contractions that preserve distances in the following sense: For any two vertices u and v at distance d in G, the distance of the corresponding vertices in the contracted graph G/C must be at least ϕ (d). If this condition is satisfied, we call C a ϕ-distance preserving contraction, or ϕ-contraction for short. Formally, the algorithmic problem Contraction considered in this paper is to compute for a given graph G = (V, E) with edge lengths ℓ : E → R >0 and a given tolerance function ϕ, a ϕ-contraction C ⊆ E such that the number of contracted and deleted edges is maximized. We are specifically interested in the case where the tolerance function ϕ is an affine function ϕ(x) = x/α − β for real-valued parameters α ≥ 1 and β ≥ 0. We then simply write (α, β)-contraction instead of ϕ-contraction. See Figure 1 for some example instances of the problem Contraction.
When considering the case of a purely multiplicative error (β = 0), a slight subtlety has to be taken into account. Specifically, for a graph with positive edge lengths it is not feasible to contract a single edge. Therefore, we propose a slight modification of our original model: We say that a set C ⊆ E of edges of G is a weak ϕ-distance preserving contraction, or weak ϕ-contraction for short, if it does not contract the entire graph and, for any two vertices u and v at distance d in G, the distance of the corresponding vertices in G/C is either zero or at least ϕ(d). We will refer to the corresponding algorithmic problem as Weak Contraction. Put differently, in a weak contraction, the distances between different super-vertices satisfy the given distortion guarantee, but for vertices belonging to the same super-vertex, no guarantee is given.
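To make these definitions concrete, the following Python sketch (our own illustration, not part of the formal model; all function names are ours) checks whether a given edge set C is a (weak) (α, β)-contraction by setting the lengths of contracted edges to zero and comparing all-pairs distances before and after:

```python
import heapq
from collections import defaultdict

def dijkstra(adj, src):
    """Single-source shortest paths; adj maps u -> list of (v, length)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def is_contraction(vertices, edges, C, alpha, beta, weak=False):
    """Test dist_{l_C}(u, v) >= dist_l(u, v)/alpha - beta for all u, v.
    In the weak variant, pairs merged to distance 0 are exempt, but the
    whole graph must not collapse into a single super-vertex."""
    adj, adj_c = defaultdict(list), defaultdict(list)
    for u, v, l in edges:
        lc = 0.0 if (u, v) in C or (v, u) in C else l
        adj[u].append((v, l)); adj[v].append((u, l))
        adj_c[u].append((v, lc)); adj_c[v].append((u, lc))
    d0 = {u: dijkstra(adj, u) for u in vertices}
    dc = {u: dijkstra(adj_c, u) for u in vertices}
    if weak and all(dc[vertices[0]][v] == 0.0 for v in vertices):
        return False  # everything was contracted into one super-vertex
    for u in vertices:
        for v in vertices:
            if weak and dc[u][v] == 0.0:
                continue  # u and v belong to the same super-vertex
            if dc[u][v] < d0[u][v] / alpha - beta - 1e-12:
                return False
    return True
```

For instance, on a unit-length path a–b–c, contracting the edge {a, b} is a (1, 1)-contraction but not a (1, 0)-contraction, since the distance between a and b drops from 1 to 0.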
1.1. Our results. In this paper, we present a comprehensive picture of the algorithmic complexity of the described contraction problems. Recall that we are given an input graph with edge lengths and a tolerance function ϕ, and our goal is to compute a (weak) contraction that maximizes the total number of contracted and deleted edges. Our main results concern affine tolerance functions ϕ(x) = x/α − β with parameters α ≥ 1 and β ≥ 0. For the reader's convenience, our results are summarized in Tables 1 and 2. Within the tables and throughout this paper, n and m denote the number of vertices and edges, respectively, of the input graph under consideration.
Algorithmic results. We develop linear time greedy algorithms for Contraction with unit lengths on paths, cycles, and on trees with α = 1 (Theorems 2, 3 and 4). The first two algorithms are inspired by LP rounding techniques; the latter algorithm relies on a structural characterization of optimal solutions. We present dynamic programming algorithms solving Contraction and Weak Contraction on trees in time O(n 3 ) or O(n 5 ), respectively (Theorems 5 and 6). These dynamic programs compute optimal solutions on subtrees, in the latter case combining several Pareto optimal solutions in a two-dimensional parameter space (hence the larger running time).
Note that instead of maximizing the number of contracted and deleted edges, we could optimize for α or β while fixing the other parameters. The resulting problems are polynomially equivalent to our setting, via binary search over one of the parameters.
Hardness results. We complement these algorithms by several hardness results. First we consider the purely additive case where α = 1. We show that here both Contraction and Weak Contraction are NP-hard on cycles for any fixed β > 0, by a reduction from a variant of Partition (Theorem 7). As mentioned before, both problems can be solved efficiently on graphs without cycles, and there is a linear time algorithm for Contraction on cycles with unit lengths. By reductions from Clique we show that both the general as well as the unit lengths case of Contraction with α = 1 are hard to approximate within factors of n 1−ε or m 1/2−ε , respectively (Theorems 9 and 10).
Further we consider the purely multiplicative case where β = 0 (here Contraction is trivial). We show that in this case Weak Contraction is NP-hard on planar graphs with arbitrarily large girth and unit length edges by a reduction from a special case of Planar 3SAT (Theorem 11). Since these graphs are locally tree-like, this result constitutes another rather sharp separation from the polynomially solvable tree case. Furthermore, we show that the problem is hard to approximate within a factor of n 1−ε by a reduction from Independent Set (Theorem 12).
Asymptotic bounds. We now discuss our asymptotic bounds for contractions. In this setting, we are interested in (non-optimal) contractions for graphs with unit lengths that can be computed efficiently despite the above-mentioned hardness results. We prove that for any k ≥ 1 any graph G has a (2k − 1, 1)-contraction C such that G/C has at most n 1+1/k edges, and such a contraction can be computed in time O(m) (Theorem 13) by successively growing clusters around center vertices. Assuming Erdős' girth conjecture, we show a corresponding (not tight) lower bound (Theorem 15). For a purely additive error, we show that for any integer k ≥ 1 any graph with unit length edges has a (1, 2k)-contraction C with objective value at least km/n that can be computed in time O(m) (Theorem 16), which is asymptotically best possible for paths. The idea here is to remove the k vertices of highest degree.
One possible advantage of contraction compared to spanners is the potentially significant reduction of vertices as well as edges. To ground this intuition, we exhibit a contraction that significantly reduces the number of vertices in a graph with minimum degree D to O(n/D) (Theorem 17). We also present a lower bound (Theorem 18) showing that we cannot guarantee o(n/D) vertices, even if we allow larger approximation error.
1.2. Comparison with previous results. There are several models aiming to compress graphs while preserving distances. They differ by their choice of compression operation, such as replacing the graph by a subgraph or minor, and by whether the aim is to preserve all or only certain distances.
As discussed before, graph spanners are a concept closely related to contractions, where the length of removed edges is set to ∞ rather than to 0. Our results highlight further intrinsic similarities of the two models. As for contractions, computing optimal spanners is NP-hard (see [PS89,LS93]), even to approximate (e.g. [Kor98,EP07]). But it is still possible to obtain powerful asymptotic guarantees in both models. In particular, our (2k − 1, 1)-contraction with O(n 1+1/k ) edges has a clear analogy to the classic (2k − 1, 0)-spanner with the same number of edges [ADD+93] (note that the additive error of 1 in our result is strictly necessary, as discussed above). There is, however, a major difference between the two results: whereas the (2k − 1, 0)-spanner can trivially be shown to be optimal assuming Erdős' girth conjecture, applying this conjecture to the contraction model only yields a lower bound of n 1+1/(2k) edges for a (2k − 1, 1)-contraction. Closing this gap thus remains as an interesting open problem in the contraction model, whose solution would likely yield further insight into the relationship to spanners.
It is interesting to note that the clustering yielding our (2k − 1, 1)-contraction was previously used in [PS89] to obtain a (4r + 1, 0)-spanner of the same density. On the other hand, no deterministic linear time algorithm computing a (2k − 1, 0)-spanner is known, though [BS03] achieves randomized linear time. Meanwhile our (2k − 1, 1)-contraction can be constructed deterministically in linear time.
For unweighted graphs, there are also spanner results that significantly sparsify the graph at the cost of a purely additive error: For example, a (1,2)-spanner with O(n 3/2 ) edges [ACIM99], or a (1,6)-spanner with O(n 4/3 ) edges [BKMP05]. We do not know if analogous results are possible in the contraction model. Lower bounds would also be interesting; for spanners, it is known that O(n 4/3 ) is the best possible guarantee if we want a constant additive error [AB16]. Finally, for spanners there are results that combine multiplicative and additive error, such as the (k, k − 1)-spanner of [BKMP05].
Gupta [Gup01] considered the problem of approximating a tree metric on a subset of the vertices by another tree, and gave a linear time algorithm computing an 8-approximation. As Chan et al. [CXKR06] observed later, on complete binary trees a solution of minimum distortion is always achieved by a minor (with possibly different edge lengths) of the input tree, so this seems to be the first investigation of contractions that approximate graph distances. Krauthgamer et al. [KNZ14] considered an extension to general graphs, studying the size of minors preserving all distances between a given terminal set of fixed size. Cheung et al. [CGH16] introduced a multiplicative distortion to this model. As here no two terminals may be merged, these approaches cannot compress a graph at all if every vertex is a terminal.
The pairwise preservers due to Coppersmith et al. [CE06] combine spanners with the aim of preserving only terminal distances. Given a weighted undirected graph G and a set of terminal pairs, a pairwise preserver is a spanning subgraph inducing exactly the same terminal distances as G. Coppersmith et al. proved that for every such graph and every set of k = O(n 1/2 ) terminal pairs there exists a pairwise preserver of size O(kn 1/2 + n).
1.3. Further related work. The preservation of graph properties other than distances has been studied as well. Biedl et al. [BBV00] considered contractions in capacitated networks with the goal of maintaining the maximum flow in the network. Here an edge e is called useless if, for every capacity function, there is a maximum flow not using e. Biedl et al. showed that finding all useless edges is NP-complete, but solvable in O(n 2 ) time on certain planar graphs. For undirected networks, Misiołek et al. [MC05] gave an algorithm finding all useless edges in O(n + m) time. Toivonen et al. [ZMT10] considered a more general model aiming to maintain the quality of paths with respect to any given function, e.g., distance or capacity. They investigated strategies of removing edges, without decreasing the quality of the best path between any pair of vertices.
Graph simplification problems have also been studied in several other contexts, and we conclude this section by mentioning two such examples: Hübler et al. [HKBG08] studied a problem related to graph mining, examining how to choose an induced subgraph with a given number of vertices and with similar topological properties as the input graph. Dörfler and Bullo [DB13] considered the problem of exactly preserving the effective resistance among a set of terminal vertices of an electrical network via an algebraic network compression known as Kron reduction.
1.4. Outline of this paper. In Section 2 we introduce important definitions and notation that will be used throughout this paper. In Section 3 we discuss our three greedy algorithms for solving special cases of Contraction on paths, cycles and trees. In Section 4 we present our two dynamic programming algorithms for Contraction and Weak Contraction on trees. Sections 5 and 6 are devoted to proving our hardness results, focusing on the case of additive and multiplicative error, respectively. In Section 7 we present our asymptotic results on contractions.

Preliminaries
Throughout this paper we consider simple undirected graphs G (without parallel edges or loops). We let V (G) and E(G) denote the vertex and edge set of G, respectively, and we define n(G) := |V (G)| and m(G) := |E(G)|. If the context is clear, we simply write V , E, n and m. We also use the notation [n] := {1, 2, . . . , n}. We assume that G is connected, otherwise the contraction problem can be solved independently for each connected component. Edge lengths are given by a function ℓ : E → R >0 . The distance dist ℓ (u, v) between two vertices u and v is the length of a shortest path between u and v in G with respect to ℓ.
Given a subset of edges C ⊆ E, we denote the resulting simple graph obtained from G by contracting the edges in C, deleting resulting loops and keeping only the shortest edge between any two vertices by G/C. We denote the number of deleted loops and multi-edges by ∆(C) (thus m(G/C) = m(G) − |C| − ∆(C)). Instead of contracting a set C ⊆ E of edges in G, setting their edge lengths to zero has the same effect on the distances in the resulting graph. This is somewhat cleaner conceptually, so we will often adopt this viewpoint. Specifically, we let ℓ C be the new length function that assigns 0 to every edge in C, and that is equal to the original edge lengths ℓ on the edges E \ C.
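The contraction operation itself can be sketched with a union-find structure (a hypothetical implementation of ours; the text does not prescribe one), which also yields the count ∆(C) of deleted loops and parallel edges:

```python
def contract(n, edges, C):
    """edges: list of (u, v, length) with vertices 0..n-1; C: set of edge
    indices to contract.  Returns (edges of G/C, delta), where delta counts
    the deleted loops and parallel edges."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in C:
        u, v, _ = edges[i]
        parent[find(u)] = find(v)  # merge endpoints into one super-vertex
    best = {}   # (super-u, super-v) -> shortest surviving edge length
    delta = 0
    for i, (u, v, l) in enumerate(edges):
        if i in C:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            delta += 1  # loop inside a super-vertex: deleted
            continue
        key = (min(ru, rv), max(ru, rv))
        if key in best:
            delta += 1  # parallel edge: keep only the shortest
            best[key] = min(best[key], l)
        else:
            best[key] = l
    return [(u, v, l) for (u, v), l in best.items()], delta
```

On a unit-length triangle, contracting one edge deletes the resulting parallel edge, so m(G/C) = 3 − 1 − 1 = 1, in line with m(G/C) = m(G) − |C| − ∆(C).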
A tolerance function is a non-decreasing function ϕ : R → R. Roughly speaking, this function describes by how much the distance between two vertices may drop when contracting edges (i.e., setting edge lengths to zero). Formally, given a graph G with edge lengths ℓ and a tolerance function ϕ, we say that a subset of edges C ⊆ E is a ϕ-distance preserving contraction or ϕ-contraction for short, if

dist ℓ C (u, v) ≥ ϕ(dist ℓ (u, v)) (1)

holds for any two vertices u and v in G. Similarly, we say that C is a weak ϕ-distance preserving contraction or weak ϕ-contraction for short, if (1) or dist ℓ C (u, v) = 0 holds for any two vertices u and v, and if the graph (V, C) is disconnected (equivalently, if G/C is not a single vertex). The last condition prevents solutions C ⊆ E for which the graph is contracted to a single vertex. If ϕ(x) = x/α − β, then we simply write (weak) (α, β)-contraction instead of (weak) ϕ-contraction.
An instance of the problem Contraction or Weak Contraction is a triple (G, ℓ, ϕ), where G is the underlying graph, ℓ the length function and ϕ the tolerance function, and the objective is to find a (weak) ϕ-distance preserving contraction C ⊆ E such that

Φ(C) := |C| + ∆(C) (2)

is maximized. This quantity equals the number of edges we save when going from G to G/C. Note that, for instance, on trees we have Φ(C) = |C| for any (weak) contraction C.

(Weak) Contraction
Input: A graph G = (V, E) with edge lengths ℓ : E → R >0 and a tolerance function ϕ.
Task: Find a (weak) ϕ-contraction C ⊆ E maximizing Φ(C).

In this context we sometimes refer to a set of edges that forms a (weak) contraction as a feasible solution, and to a (weak) contraction of maximum size as an optimal solution.
We begin by proving that our contraction model behaves nicely when contracting edges in phases, i.e., the total error is simply the error accumulated over the contraction phases (but not more). To state this result we denote the composition of tolerance functions ϕ and ψ as (ψ • ϕ)(x) := ψ(ϕ(x)).
Theorem 1. Let C be a (weak) ϕ-contraction for G, and let C ′ be a (weak) ψ-contraction for G/C. Then C ∪ C ′ is a (weak) (ψ • ϕ)-contraction for G.
Proof. We only prove the statement for contractions ϕ and ψ. The proof for weak contractions works analogously. Let ℓ denote the edge lengths of G and consider a pair of vertices u, v ∈ V (G). Then we have dist ℓ C∪C ′ (u, v) ≥ ψ(dist ℓ C (u, v)) by the definition of C ′ and dist ℓ C (u, v) ≥ ϕ(dist ℓ (u, v)) by the definition of C. Combining these inequalities and using that ψ is non-decreasing we obtain dist ℓ C∪C ′ (u, v) ≥ ψ(ϕ(dist ℓ (u, v))), as desired.
Note that Theorem 1 only talks about the feasibility of repeated contractions, but not about their optimality when searching for contractions of maximum cardinality. With respect to solution quality, contracting in phases may be arbitrarily bad: Consider a star with k unit length edges and additive tolerance functions ϕ(x) = ψ(x) = x − 1. An optimum (ψ • ϕ)-contraction contains all k edges, whereas finding an optimal ϕ-contraction C and then an optimal ψ-contraction of G/C allows contracting only one edge in each phase, leading to a (ψ • ϕ)-contraction of value 2.
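This gap can be checked numerically on small instances; the brute-force sketch below (our own, exponential-time and only for illustration) computes the maximum size of a ϕ-contraction of a small unit-length graph:

```python
from itertools import combinations

def opt_contraction_size(n, edges, phi):
    """Brute-force the largest phi-contraction of a small unit-length graph;
    edges is a list of vertex pairs, phi the tolerance function."""
    INF = float("inf")
    def apd(zero):  # all-pairs distances; edges in `zero` have length 0
        d = [[INF] * n for _ in range(n)]
        for i in range(n):
            d[i][i] = 0
        for e in edges:
            u, v = e
            w = 0 if e in zero else 1
            d[u][v] = d[v][u] = min(d[u][v], w)
        for k in range(n):  # Floyd-Warshall
            for i in range(n):
                for j in range(n):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        return d
    base = apd(frozenset())
    for r in range(len(edges), -1, -1):  # try large solutions first
        for C in combinations(edges, r):
            dc = apd(frozenset(C))
            if all(dc[u][v] >= phi(base[u][v])
                   for u in range(n) for v in range(n)):
                return r
    return 0

# Star with k = 5 unit-length edges: center 0, leaves 1..5.
k = 5
star = [(0, i) for i in range(1, k + 1)]
print(opt_contraction_size(k + 1, star, lambda x: x - 1))  # 1
print(opt_contraction_size(k + 1, star, lambda x: x - 2))  # 5
```

One phase with ϕ(x) = x − 1 permits only a single contracted star edge, while the composed tolerance (ψ • ϕ)(x) = x − 2 permits all k edges at once, matching the example above.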

Greedy algorithms
In this section we consider three special cases of the problem Contraction with affine tolerance function ϕ(x) = x/α − β. We obtain simple greedy algorithms computing maximum size ϕ-contractions in O(n) time on paths and cycles with unit lengths, and on trees with unit lengths and α = 1.
3.1. Paths with unit length edges. In this section we consider the special case of contracting a path P n with n − 1 unit length edges ℓ = 1 and the tolerance function ϕ(x) = x/α − β. In this case optimal solutions have a very special structure, which leads to a straightforward greedy algorithm running in linear time. Recall that as a path is a tree, our objective function satisfies Φ(C) = |C| for any contraction C.
Observe that a solution C ⊆ E(P n ) for the instance (P n , ℓ, ϕ) of the problem Contraction is feasible, if and only if every subpath P ′ ⊆ P n satisfies the condition

|E(P ′ ) ∩ C| ≤ (1 − 1/α) · |E(P ′ )| + β. (3)

This observation leads to the following natural greedy algorithm Greedy(P n , α, β): The algorithm considers the edges e 1 , e 2 , . . . , e n−1 of P n as they are encountered when starting from one of the two end vertices of P n . It iteratively constructs a solution C for the subpath on the first i edges e 1 , e 2 , . . . , e i for i = 1, 2, . . . , n − 1, by initializing C := ∅, and by adding the edge e i to C if and only if the condition |C| + 1 ≤ (1 − 1/α)i + β is satisfied (so after adding e i to C, (3) is still satisfied).
Theorem 2. Let P n be a path with unit length edges ℓ = 1 and consider the tolerance function ϕ(x) = x/α − β, α, β ≥ 1. The set of edges computed by the algorithm Greedy(P n , α, β) is an optimal solution for the instance (P n , ℓ, ϕ) of the problem Contraction, and it is computed in time O(n).
Proof. Let C ⊆ E(P n ) be the set of edges computed by the algorithm Greedy(P n , α, β). Clearly, we have |C| = ⌊(1 − 1/α)|E(P n )| + β⌋, and this is optimal according to (3). However, it remains to show that C is feasible. For 1 ≤ i ≤ j ≤ n − 1 = |E(P n )| we let P i,j denote the subpath of P n formed by the edges e i , e i+1 , . . . , e j . By the definition of our algorithm we know that |P 1,i ∩ C| = ⌊(1 − 1/α)i + β⌋, from which we obtain that

|P i,j ∩ C| = ⌊(1 − 1/α)j + β⌋ − ⌊(1 − 1/α)(i − 1) + β⌋ ≤ (1 − 1/α)(j − i + 1) + 1 ≤ (1 − 1/α)|E(P i,j )| + β,

where we used the assumption β ≥ 1 in the last step. Using (3) it thus follows that C is feasible.
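The algorithm Greedy(P n , α, β) takes only a few lines; a sketch under our own naming conventions (unit lengths, β ≥ 1):

```python
def greedy_path(n, alpha, beta):
    """Greedy(P_n, alpha, beta) for a unit-length path on n vertices: scan
    the edges e_1, ..., e_{n-1} from one end and contract e_i whenever the
    prefix condition |C| + 1 <= (1 - 1/alpha)*i + beta remains satisfied."""
    C = []
    for i in range(1, n):  # edge e_i joins vertices i-1 and i
        if len(C) + 1 <= (1 - 1 / alpha) * i + beta:
            C.append(i)
    return C

# On P_9 with alpha = 2 and beta = 1 the greedy solution contracts
# floor((1 - 1/2)*8 + 1) = 5 of the 8 edges.
print(len(greedy_path(9, 2, 1)))  # 5
```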
3.2. Cycles with unit length edges. In this section we consider the special case of contracting a cycle C n with n vertices and unit length edges ℓ = 1 and the tolerance function ϕ(x) = x/α−β, α ≥ 1, β ≥ 0. For this case we present a greedy algorithm running in linear time. The main purpose of this result is to clearly separate the polynomially solvable cases of Contraction from the NP-hard cases, and the case of a cycle with unit length edges precisely forms this boundary on the polynomially solvable side. Recall in this context that we can solve Contraction in polynomial time on any tree (this will be proved in Section 4.1 below), and that Contraction is NP-hard already on a cycle for α = 1 (with arbitrary edge lengths; we will show this in Section 5.1 below). We first argue that on a cycle it is equivalent to maximize the number of contracted edges |C| or to maximize our objective function Φ(C) defined in (2). This is because the set of pairs (|C|, Φ(C)) for all feasible contractions C in a cycle G = C n is given by {(1, 1), (2, 2), . . . , (n − 3, n − 3), (n − 2, n − 1), (n − 1, n), (n, n)}, so it forms a monotone function, implying that maximizing either one of the two quantities is equivalent. Based on this argument, for the rest of this section we consider maximizing the number |C| of contracted edges.
Observe that a solution C ⊆ E(C n ) (C n is the cycle we want to contract, and C is the set of edges to be contracted) for the instance (C n , ℓ, ϕ) of the problem Contraction is feasible, if and only if every subpath P ⊆ C n of length d := |E(P )| ∈ {1, 2, . . . , n − 1} satisfies the condition |P ∩ C| ≤ ⌊d − min{d, n − d}/α + β⌋. (4) Rounding down on the right-hand side of (4) is justified because |P ∩ C| is always an integer. Defining

λ ′ := min d∈{1,2,...,n−1} ⌊d − min{d, n − d}/α + β⌋/d (5a)    and    λ := min{λ ′ , 1}, (5b)

we obtain from (4) that λ ∈ [0, 1] is the maximal amount by which we can contract each edge in a uniform fractional solution. Inspired by the rounding technique from [BOR80], we turn this fractional solution into an integer optimal solution, yielding the following greedy algorithm Greedy(C n , α, β): The algorithm considers the edges e 1 , e 2 , . . . , e n of C n as they are encountered when walking around the cycle. It iteratively constructs a solution C by initializing C := ∅ and by adding the edge e i to C if and only if ⌊λi⌋ − ⌊λ(i − 1)⌋ = 1 for all i = 1, 2, . . . , n (since λ ∈ [0, 1], this difference is always either 0 or 1). Note that we contract all edges of C n if and only if λ = 1.
Theorem 3. Let C n be a cycle with unit length edges ℓ = 1 and consider the tolerance function ϕ(x) = x/α − β, α ≥ 1, β ≥ 0. The set of edges computed by the algorithm Greedy(C n , α, β) is an optimal solution for the instance (C n , ℓ, ϕ) of the problem Contraction, and it is computed in time O(n).
The next lemma shows that the contraction computed by our algorithm has the maximum possible size.

Lemma 3.1. Every feasible solution C ⊆ E(C n ) for the instance (C n , ℓ, ϕ) of the problem Contraction satisfies |C| ≤ ⌊λn⌋.

Proof. If λ = 1 this inequality is trivial. So let us assume that λ = λ ′ < 1 and that the minimum in (5a) is attained for some d ∈ {1, 2, . . . , n − 1}. Starting at some vertex u of the cycle, we walk along the cycle and cover it with n consecutive paths P 1 , P 2 , . . . , P n of length d each (P i+1 starts where P i ends). The sum of the lengths of the paths is nd, so this process ends at the starting vertex u, and each edge of the cycle and each edge of C is covered exactly d times. Applying (4) to each of the n paths we therefore obtain

d · |C| = |P 1 ∩ C| + · · · + |P n ∩ C| ≤ n · ⌊d − min{d, n − d}/α + β⌋ = λdn.

As |C| must be integral this inequality yields the desired bound |C| ≤ ⌊λn⌋.
With Lemma 3.1 in hand, we are now ready to prove Theorem 3.
Proof of Theorem 3. In this proof we will use that for any two real numbers x and y we have

⌊x⌋ − ⌊y⌋ ≤ ⌈x − y⌉. (6)

Let C ⊆ E(C n ) be the set of edges computed by the algorithm Greedy(C n , α, β). Clearly, we have |C| = (⌊λ · 1⌋ − ⌊λ · 0⌋) + · · · + (⌊λn⌋ − ⌊λ(n − 1)⌋) = ⌊λn⌋, which is optimal by Lemma 3.1. However, it remains to show that C is feasible. We consider a path P of length d := |E(P )| ∈ {1, 2, . . . , n−1} on the edges e k , e k+1 , . . . , e k+d−1 (indices are considered cyclically modulo n, so e n+i = e i ). We distinguish two cases: If k + d − 1 ≤ n, we have

|P ∩ C| = ⌊λ(k + d − 1)⌋ − ⌊λ(k − 1)⌋. (7)

If k + d − 1 > n, we obtain

|P ∩ C| = ⌊λn⌋ − ⌊λ(k − 1)⌋ + ⌊λ(k + d − 1 − n)⌋. (8)

Applying (6) and using that ⌈⌊x⌋⌉ = ⌊x⌋ shows that the right-hand sides of (7) and (8) can both be bounded from above by ⌊d − min{d, n − d}/α + β⌋, proving that C is indeed feasible by (4).
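A sketch of Greedy(C n , α, β) (our implementation; exact rational arithmetic avoids floating-point issues with the floors, and λ is taken as the largest uniform per-edge contraction amount permitted by (4)):

```python
from fractions import Fraction
from math import floor

def greedy_cycle(n, alpha, beta):
    """Greedy(C_n, alpha, beta) for a unit-length cycle: compute the largest
    uniform fractional contraction amount lam allowed by (4), then round it;
    returns the indices i of the contracted edges e_i."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    lam = min([Fraction(1)] +
              [Fraction(floor(d - Fraction(min(d, n - d)) / alpha + beta), d)
               for d in range(1, n)])
    # contract e_i iff floor(lam*i) - floor(lam*(i-1)) = 1
    return [i for i in range(1, n + 1)
            if floor(lam * i) - floor(lam * (i - 1)) == 1]

# On C_12 with alpha = 3 and beta = 1 we get lam = 3/4 (attained at d = 4),
# so 9 of the 12 edges are contracted.
print(len(greedy_cycle(12, 3, 1)))  # 9
```

Note that for β = 0 the minimum is 0 (witnessed by d = 1), so no edge is contracted, consistent with the discussion of the purely multiplicative case.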
3.3. Trees with unit length edges and additive error. In this section we consider the special case of contracting a tree T with unit length edges ℓ = 1 and the tolerance function ϕ(x) = x − β (purely additive error; we can assume w.l.o.g. that β is an integer). Note that in this setting the objective function defined in (2) satisfies Φ(C) = |C| for any contraction C. It turns out that in this case, optimal solutions have a very special structure that can be exploited to compute them in linear time. Specifically, an optimal solution is obtained by taking all edges of T which have the property that only short paths start from one of their end vertices. Formally, for the tree T and d ∈ N ≥0 , we let L(T, d) denote the set of all edges e of T which have one end vertex v such that all paths that start at v and do not contain e have length at most d − 1 (together with e these paths have length at most d). E.g., we have L(T, 0) = ∅, and the set L(T, 1) consists of all the edges incident to a leaf (see Figure 2).

Theorem 4. Let T be a tree with unit length edges ℓ = 1, consider the tolerance function ϕ(x) = x − β for an integer β ≥ 0, and let d := ⌊β/2⌋. An optimal solution for the instance (T, ℓ, ϕ) of the problem Contraction is given by C := L(T, d) if β is even and by C := L(T, d) ∪ {e} for an arbitrary edge e ∈ E \ L(T, d) if β is odd, and it can be computed in time O(n).

Proof. We define C := L(T, d) if β is even and C := L(T, d) ∪ {e}, for some e ∈ E \ L(T, d), if β is odd. We first argue that C is a feasible solution. To see this note that for the given tolerance function we only need to verify that the path P between any two leaves u, v of T contains at most β edges from C. Consider all the edges of P for which both end vertices have distance at least d from both u and v. None of those edges is in L(T, d) by its definition. It follows that |P ∩ L(T, d)| ≤ 2d = 2⌊β/2⌋ and therefore |P ∩ C| ≤ β.
To prove that C is a solution of maximum size we argue by induction over β. The claim is trivially true for β = 0 and β = 1 (in these cases |C| = 0 and |C| = 1, respectively). So let D be an arbitrary feasible solution of the instance (T, ℓ, ϕ) of the problem Contraction for some β ≥ 2. We need to show that |C| ≥ |D|. To this end we let V * ⊆ V (T ) denote the set of leaves of T and we define E * := L(T, 1). Moreover, we let T ′ denote the tree obtained from T by removing all leaves. We first consider the case that one of the sets E * \ D and D \ E * is empty: if D ⊆ E * , then |D| ≤ |E * | ≤ |C|; and if E * ⊆ D, then D \ E * is a feasible solution of the instance (T ′ , ℓ, ϕ ′ ) with ϕ ′ (x) = x − (β − 2), and the claim follows from the induction hypothesis. It remains to consider the case that both sets E * \ D and D \ E * are nonempty, so there is an edge e ′ ∈ E * \ D and an edge f ∈ D \ E * . We denote the leaf incident to e ′ by v. We will now remove an edge e ∈ D \ E * from D and add e ′ instead to obtain another feasible solution D ′ satisfying |D| = |D ′ |. Repeating this exchange argument and applying the reasoning from the first case then proves the lemma. The edge e ∈ D \ E * to be removed from D is obtained by considering the path that connects v and f in T and that contains f , and by choosing the first edge from D (or equivalently, from D \ E * ) that is encountered when following this path from v to f . It may happen that e = f is the first such edge we encounter. To complete the proof of the lemma it remains to show that D ′ := (D \ {e}) ∪ {e ′ } is feasible. To prove this we only need to check paths which start in v and contain e ′ but not e. Let P ′ be such a path, let Q be any path that also starts in v and does contain e, and consider the path P := (P ′ \ Q) ∪ (Q \ P ′ ) (see Figure 3). Here and in the following we slightly abuse notation and interpret these set unions/differences/intersections in terms of the edge sets of the graphs. As D is feasible and as P ∩ Q contains e, the number of edges in D or D ′ on P ′ \ Q = P \ Q is at most β − 1. By the choice of e, the number of edges of D ′ on P ′ ∩ Q is 1 (the only edge of D ′ on this path is e ′ ). Combining both bounds, we obtain that the number of edges from D ′ on P ′ is at most β − 1 + 1 = β, as desired.
This completes the proof.
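The set L(T, d) can be computed by brute force as follows (our quadratic-time sketch for illustration only; the linear-time computation claimed above requires more care):

```python
from collections import defaultdict, deque

def L(tree_edges, d):
    """Compute L(T, d) for a unit-length tree: all edges e = {u, v} with an
    endpoint whose every path avoiding e has length at most d - 1.
    One BFS per edge endpoint, hence quadratic time."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v); adj[v].append(u)
    def ecc_avoiding(v, banned):
        # longest path length from v in T that does not use the edge `banned`
        dist, queue = {v: 0}, deque([v])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if {x, y} == banned or y in dist:
                    continue
                dist[y] = dist[x] + 1
                queue.append(y)
        return max(dist.values())
    return [e for e in tree_edges
            if any(ecc_avoiding(v, set(e)) <= d - 1 for v in e)]

# On the path a-b-c-d-e, L(T, 1) consists of the two edges incident to a
# leaf; for beta = 2 (d = 1) contracting exactly these edges is optimal.
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(L(path, 1))  # [('a', 'b'), ('d', 'e')]
```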

Dynamic programs for general trees
In this section we describe dynamic programming algorithms for the problems Contraction and Weak Contraction on trees with general edge lengths and affine tolerance functions. Recall that on trees our objective function satisfies Φ(C) = |C| for any contraction C.
4.1. Contraction on trees. In this section we describe a dynamic programming algorithm for the problem of computing an optimal contraction of a tree T with arbitrary edge lengths ℓ : E → R >0 and an affine tolerance function ϕ(x) = x/α − β, α ≥ 1, β ≥ 0, generalizing the solution for the special case presented at the beginning of the previous section. The goal is to prove the following result.
Theorem 5. Let T be a tree with edge lengths ℓ : E → R >0 and consider the tolerance function ϕ(x) = x/α − β, α ≥ 1, β ≥ 0. An optimal solution for the instance (T, ℓ, ϕ) of the problem Contraction can be computed by dynamic programming in time O(n 3 ).
Observe that a solution C ⊆ E is feasible if and only if for any two vertices u and v of T we have load_{C,α}(u, v) ≤ β, where the load between u and v is defined as
load_{C,α}(u, v) := Σ_{e ∈ C ∩ P(u,v)} ℓ(e) − (1 − 1/α) · dist_ℓ(u, v), (9a)
with P(u, v) denoting the unique path between u and v in T. For any subtree T′ of T containing the vertex v we further define the load of T′ at v as
load_{C,α}(T′, v) := max_{u ∈ T′} load_{C,α}(u, v). (9b)
The next lemma states a criterion for when feasible solutions of subtrees can be combined to a feasible solution of the entire tree. The definitions (9a), (9b) and the lemma are illustrated in Figure 4.
Lemma 4.1. Let v be a vertex of T, and let T_1 and T_2 be two subtrees of T with T_1 ∪ T_2 = T and T_1 ∩ T_2 = {v}. A set C ⊆ E is a feasible solution for the instance (T, ℓ, ϕ) of the problem Contraction if and only if C ∩ T_i is a feasible solution for the instance (T_i, ℓ, ϕ) for i = 1, 2 and load_{C,α}(T_1, v) + load_{C,α}(T_2, v) ≤ β.
Proof. Observe that the path between two vertices u ∈ T_1 and w ∈ T_2 contains the vertex v, so we obtain load_{C,α}(u, w) = load_{C,α}(u, v) + load_{C,α}(v, w) from (9a). Using (9b) it follows that the condition load_{C,α}(u, w) ≤ β holding for all such pairs of vertices u, w is equivalent to load_{C,α}(T_1, v) + load_{C,α}(T_2, v) ≤ β.
We will use this lemma to formulate our dynamic programming algorithm. The idea is to compute optimal solutions for subtrees and to combine them to an optimal solution for the entire tree.
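To make the load criterion concrete, here is a small brute-force reference sketch (our own, not the dynamic program of Theorem 5): it evaluates load_{C,α}(u, v) = Σ_{e∈C∩P(u,v)} ℓ(e) − (1 − 1/α)·dist_ℓ(u, v) directly from the definition for every vertex pair and finds a maximum feasible contraction by exhaustive enumeration. The edge-list representation and all names are ours, and the search is exponential, so this only serves as a sanity check on tiny trees.

```python
from itertools import combinations

def tree_paths(n, edges):
    """All-pairs unique tree paths, as lists of edge indices (DFS from each vertex)."""
    adj = {v: [] for v in range(n)}
    for i, (u, v, _l) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    paths = {}
    for s in range(n):
        par = {s: None}
        stack = [s]
        while stack:
            x = stack.pop()
            for (y, i) in adj[x]:
                if y not in par:
                    par[y] = (x, i)
                    stack.append(y)
        for t in range(n):
            p, x = [], t
            while par[x] is not None:
                x, i = par[x]
                p.append(i)
            paths[s, t] = p
    return paths

def load(C, alpha, path, edges):
    """load_{C,alpha}(u,v): contracted length on the u-v path minus (1-1/alpha)*dist."""
    dist = sum(edges[i][2] for i in path)
    contracted = sum(edges[i][2] for i in path if i in C)
    return contracted - (1 - 1 / alpha) * dist

def feasible(C, alpha, beta, n, edges, paths):
    """C is feasible iff load_{C,alpha}(u,v) <= beta for every vertex pair."""
    return all(load(C, alpha, paths[u, v], edges) <= beta + 1e-9
               for u in range(n) for v in range(u + 1, n))

def optimal_contraction(n, edges, alpha, beta):
    """Maximum-size feasible C by exhaustive search (exponential; tiny trees only)."""
    paths = tree_paths(n, edges)
    for k in range(len(edges), 0, -1):
        for C in combinations(range(len(edges)), k):
            if feasible(set(C), alpha, beta, n, edges, paths):
                return set(C)
    return set()
```

For example, on a path with three unit-length edges and ϕ(x) = x − 1, every path through two contracted edges would lose length 2 > 1, so at most one edge can be contracted.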
To describe the algorithm we introduce a few definitions. An ordered rooted tree is a rooted tree with a specified left-to-right ordering for the children of each vertex. Given the tree T, we can pick an arbitrary vertex as the root, and for each descendant of the root an arbitrary left-to-right ordering of its children, yielding an ordered rooted tree (different roots and orderings yield different ordered rooted trees, but any one of them is good for our purposes). We slightly abuse notation in the following and use T to denote this ordered rooted tree. All trees considered in the rest of this section are ordered and rooted. For any vertex v of T, we let T_v denote the subtree of T rooted at v, and we use c(v) to denote the number of children of v. If u_1, u_2, . . . , u_{c(v)} are the children of v (in the specified ordering), we write T_{v,i}, i ∈ {1, . . . , c(v)}, for the subtree of T that contains v, u_i and all the descendants of u_i. We also define T_{v,0} := {v}. Furthermore, we let T+_{v,i} denote the union of the subtrees T_{v,0}, T_{v,1}, . . . , T_{v,i}, so that T+_{v,c(v)} = T_v. These definitions are illustrated in Figure 4. Using these definitions it follows straightforwardly from (9a) and (9b) that for any set of edges C ⊆ T_{v,i} the load load_{C,α}(T_{v,i}, v) can be expressed in terms of load_{C,α}(T_{u_i}, u_i) and the length of the edge {v, u_i}, depending on whether this edge belongs to C; this yields the relations (10a) and (10b). Note that the load increases if the edge {v, u_i} is added (see (10a)), and it decreases otherwise (see (10b)). Moreover, for any set of edges C ⊆ T+_{v,i} and any i = 1, 2, . . . , c(v) we obtain from those definitions that load_{C,α}(T+_{v,i}, v) = max{load_{C,α}(T+_{v,i−1}, v), load_{C,α}(T_{v,i}, v)}. (11) These rules allow us to compute the load of all subtrees of T in a bottom-up fashion. Our dynamic program maintains the minimum load of all subtrees of T in three-dimensional matrices L and L+. We begin defining these matrices in an abstract way, and then establish several recursive relations which directly translate into a dynamic program. Specifically, for v ∈ V, i ∈ {0, 1, . . . , c(v)} and s ∈ {0, 1, . . . , m}, the entries L(v, i, s) and L+(v, i, s) are defined as the minimum of load_{C,α}(T_{v,i}, v) and of load_{C,α}(T+_{v,i}, v), respectively, over all feasible solutions C of size s for the instances given by the trees T_{v,i} and T+_{v,i}, respectively, of the problem Contraction. (12) The matrices contain the minimum achievable load, not the corresponding set of edges.

Lemma 4.2. Let v be a vertex of T and let u_1, u_2, . . . , u_{c(v)} be the children of v. Then the matrices L and L+ defined in and directly after (12) satisfy the relations (13a)-(13d).

The most interesting of these recursive relations are of course (13c) and (13d). The relation (13c) captures the two possibilities of either adding the edge {v, u_i} or not adding it to a partial solution in the tree T+_{u_i,c(u_i)} = T_{u_i} to obtain a solution for the tree T_{v,i} (recall (10)). The relation (13d), on the other hand, describes how to distribute s contraction edges in T+_{v,i} among the two subtrees T+_{v,i−1} and T_{v,i} (t is the number of edges contracted in the first tree, and s − t the number of edges in the second tree, respectively).
Proof. The relations (13a) and (13b) follow immediately from the definitions of the trees T v,i and T + v,i and from (12). The relation (13c) follows from (10) and (12). The relation (13d) follows from (11) and (12) with the help of Lemma 4.1.
We are now ready to prove Theorem 5.
Proof of Theorem 5. Given the instance (T, ℓ, ϕ), we fix an arbitrary root r of T and an arbitrary ordering of the children of each vertex, making T an ordered rooted tree. We then compute the entries of the matrices L and L + using Lemma 4.2. We first initialize various entries using (13a) and (13b), and compute the remaining entries in a bottom-up fashion moving upwards from the leaves to the root. Specifically, at a vertex v with children u 1 , u 2 , . . . , u c(v) for which all the entries of L and L + have already been computed, we first compute L(v, i, s) for all i ∈ {1, 2, . . . , c(v)} and s ∈ {1, 2, . . . , m} using (13c), and then L + (v, i, s) for all i ∈ {1, 2, . . . , c(v)} and s ∈ {1, 2, . . . , m} using (13d).
Let s * be the largest s such that L + (r, c(r), s) ≤ β. From (12) we obtain that s * is the size of an optimal solution of the instance (T, ℓ, ϕ). The corresponding set of edges C ⊆ E can be obtained by keeping track of the arguments for which the minima and maxima in (13c) and (13d) are attained in each step.
Clearly, L and L + both have O(n 2 ) entries, and computing each entry takes time O(n), so the running time of our dynamic program is O(n 3 ).

4.2. Weak Contraction on trees.
In this section we consider the problem of computing weak contractions for a tree T with affine tolerance function ϕ(x) = x/α − β. Here, our main result is a dynamic programming algorithm that builds on the algorithmic ideas presented in Section 4.1.
Theorem 6. Let T be a tree with edge lengths ℓ : E → R >0 and consider the tolerance function ϕ(x) = x/α − β, α ≥ 1, β ≥ 0. An optimal solution for the instance (T, ℓ, ϕ) of the problem Weak Contraction can be computed by dynamic programming in time O(n 5 ).
In this setting we need to specifically keep track of pairs of vertices whose distance remains positive when contracting a set of edges C ⊆ E (i.e., not all edges in between these vertices are contracted). To this end we extend the definitions (9) as follows: For any vertex v of T and any subtree T′ containing v we define the weak load of T′ at v as
wload_{C,α}(T′, v) := max{ load_{C,α}(u, v) : u ∈ T′ and dist_{ℓ_C}(u, v) > 0 }, (14)
where the maximum over the empty set is defined as −∞. Note that in the maximization we have to consider all vertices u such that at least one edge on the path from u to v is not in C. This definition together with (9b) yields wload_{C,α}(T′, v) ≤ load_{C,α}(T′, v). In contrast to the load, the weak load may be negative.
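For intuition, the weak load of the entire tree at a vertex v can be computed by brute force directly from (14) as reconstructed above; the representation and all names in this sketch are our own.

```python
import math

def wload_at(v, C, alpha, n, edges):
    """Weak load of the whole tree at v, following (14) as reconstructed above:
    the maximum of load(u, v) over vertices u such that at least one edge on
    the v-u path is NOT contracted; -inf if no such vertex exists."""
    adj = {x: [] for x in range(n)}
    for i, (a, b, _l) in enumerate(edges):
        adj[a].append((b, i))
        adj[b].append((a, i))
    best = -math.inf
    stack = [(v, [])]            # DFS carrying the list of path edge indices
    seen = {v}
    while stack:
        x, path = stack.pop()
        if x != v and any(i not in C for i in path):
            dist = sum(edges[i][2] for i in path)
            contracted = sum(edges[i][2] for i in path if i in C)
            best = max(best, contracted - (1 - 1 / alpha) * dist)
        for (y, i) in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append((y, path + [i]))
    return best
```

On the path 0-1-2 with unit lengths and α = 1, contracting only the edge {0, 1} gives weak load 1 at vertex 0, while contracting both edges gives −∞; with α = 2 and C = ∅ the weak load at 0 is −1/2, illustrating that it may be negative.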
The following lemma is the counterpart to Lemma 4.1 for weak contractions. It describes how to combine feasible solutions on subtrees to a feasible solution of the entire tree. There is one important subtlety here: While the notion of a weak contraction forbids contracting all edges of T, we clearly have to allow this for partial solutions on subtrees of T (as long as some other edge not in the subtree is not contracted, this might still yield a feasible solution).
Lemma 4.3. Let v be a vertex of T, and let T_1 and T_2 be two subtrees of T with T_1 ∪ T_2 = T and T_1 ∩ T_2 = {v}. A set C ⊊ E is a feasible solution for the instance (T, ℓ, ϕ) of the problem Weak Contraction if and only if for i = 1, 2 either T_i ⊆ C or C ∩ T_i is a feasible solution for the instance (T_i, ℓ, ϕ), and moreover
load_{C,α}(T_1, v) + wload_{C,α}(T_2, v) ≤ β and wload_{C,α}(T_1, v) + load_{C,α}(T_2, v) ≤ β. (15)
Proof. Let C ⊊ E. For the rest of the proof we omit the subscripts C and α and simply write load instead of load_{C,α} and wload instead of wload_{C,α}.
We first assume that C is a feasible solution for the instance (T, ℓ, ϕ) of the problem Weak Contraction, i.e., any two vertices u, w of T with dist_{ℓ_C}(u, w) > 0 satisfy the condition load(u, w) ≤ β. This is true in particular for all pairs of vertices u, w ∈ T_i, i = 1, 2, implying that either T_i ⊆ C or C ∩ T_i is a feasible solution for the instance (T_i, ℓ, ϕ). If wload(T_2, v) = −∞, the claimed inequality load(T_1, v) + wload(T_2, v) ≤ β is trivially satisfied. So suppose that wload(T_2, v) is a finite number, and let u ∈ T_1 and w ∈ T_2 be such that load(u, v) = load(T_1, v), and dist_{ℓ_C}(v, w) > 0 as well as load(v, w) = wload(T_2, v). Then we also have dist_{ℓ_C}(u, w) > 0, so we know that load(u, w) ≤ β by the assumption that C is feasible for (T, ℓ, ϕ). Combining this last inequality with the relation load(u, w) = load(u, v) + load(v, w) = load(T_1, v) + wload(T_2, v) proves that load(T_1, v) + wload(T_2, v) ≤ β, as claimed. The proof of the second inequality wload(T_1, v) + load(T_2, v) ≤ β works symmetrically. This proves one direction of the equivalence.
To prove the reverse direction, we now assume that the conditions of the lemma are satisfied, and we consider any two vertices u ∈ T_1 and w ∈ T_2 with dist_{ℓ_C}(u, w) > 0 (pairs within the same subtree T_i are covered by the feasibility of C ∩ T_i). Then at least one edge on the path from u to w is not in C, so dist_{ℓ_C}(u, v) > 0 or dist_{ℓ_C}(v, w) > 0. In the first case we obtain load(u, w) = load(u, v) + load(v, w) ≤ wload(T_1, v) + load(T_2, v) ≤ β (the last inequality holds by assumption). This proves that load(u, w) ≤ β, as desired. The proof of the other case dist_{ℓ_C}(v, w) > 0 works symmetrically. This completes the proof of the lemma.
As in Section 4.1, we view T as an ordered rooted tree, and consider its subtrees T_v, T_{v,i} and T+_{v,i} for all v ∈ V and i ∈ {0, 1, . . . , c(v)} (recall the definitions given after Lemma 4.1). Let us briefly highlight the differences between Lemmas 4.1 and 4.3. The dynamic programming algorithm presented in Section 4.1 exploits the fact that the optimal way to contract exactly |C| = s edges in a subtree T_v of T rooted at a particular vertex v is to contract a set of edges that minimizes load_{C,α}(T_v, v). This is possible as the optimality condition in Lemma 4.1 only depends on this parameter. Here the situation is more complicated, as Lemma 4.3 also considers wload_{C,α}(T_v, v). Figure 5 illustrates that it is not sufficient to minimize only one of these parameters.
Figure 5. Example of the behavior of the parameters load() and wload() for (α, β) = (2, 1/2). Consider the tree T in (c) and (d) with length functions that differ only in the value they assign to the topmost edge. Parts (a) and (b) of the figure show the subtree T_v of T and two different subsets of edges C and C′ of T_v, respectively, drawn by dashed edges. For the length function in (c), C is feasible and optimal, but C′ is not feasible. For the length function in (d), on the other hand, C′ is feasible and can be extended to an optimal solution of size 3, but C cannot be extended.
Consequently, we keep track of an entire Pareto front of non-dominated partial solutions (see Figure 6). Formally, we define the set F(T_v, s) of feasible partial solutions of size s as the family of all sets C ⊆ T_v with |C| = s such that either T_v ⊆ C, or C is a feasible solution for the instance (T_v, ℓ, ϕ). We say that C ∈ F(T_v, s) dominates C′ ∈ F(T_v, s) if load_{C,α}(T_v, v) ≤ load_{C′,α}(T_v, v) and wload_{C,α}(T_v, v) ≤ wload_{C′,α}(T_v, v), and we let the Pareto front P(T_v, v, s) be a minimal subfamily of F(T_v, s) whose members dominate every solution in F(T_v, s). Note that two distinct solutions may dominate each other, so there may be several different such minimal families, all with the same pairs of load and weak load values, and any choice among them is equally good for us. This definition is illustrated in Figure 6.
The following crucial lemma asserts that the number of points on the Pareto front, i.e., the size of the family P (T v , v, s) is at most n + 1. This property is essential for our dynamic programming approach, and it does not follow immediately from the definition of P (T v , v, s), as the set of feasible solutions F (T v , s) is typically of exponential size.

Lemma 4.4. For any vertex v of T and any s ∈ {0, 1, . . . , m}, the Pareto front P(T_v, v, s) has size at most n + 1.
Proof. By the definitions (9) and (14) we have wload_{C,α}(T_v, v) ≤ load_{C,α}(T_v, v) for any C ∈ F(T_v, s), and if this inequality is strict, then the maximum in (9b) is attained by a vertex u with dist_{ℓ_C}(u, v) = 0. Again by the previously mentioned definitions this implies that load_{C,α}(T_v, v) = dist_ℓ(u, v)/α for one of the at most n vertices u of T_v. As any two solutions on the Pareto front are non-dominating, their load values are pairwise distinct, so at most n of them satisfy wload < load, and at most one of them satisfies wload = load. This proves that P(T_v, v, s) has size at most n + 1.

We now describe recursive relations for the weak load that are analogous to (10) and (11) for the load. It follows straightforwardly from (9) and (14) that for any vertex v of T and its children u_i, i = 1, 2, . . . , c(v), and for any set of edges C ⊆ T_{v,i}, the weak load wload_{C,α}(T_{v,i}, v) satisfies the relations (17a) and (17b). Note that the weak load increases if the edge {v, u_i} is added (see (17a)). On the other hand, if the edge {v, u_i} is not added, it may decrease or increase (the right-hand side of (17b) refers to the load, not to the weak load). Moreover, for any set of edges C ⊆ T+_{v,i} and any i = 1, 2, . . . , c(v), the definition (14) readily implies the relation (18) for wload_{C,α}(T+_{v,i}, v), analogous to (11).

These rules together with the corresponding relations (10) and (11) allow us to compute the weak load and the load of all Pareto-optimal partial solutions in a bottom-up fashion, similar to the approach taken in Section 4.1. Before, it was sufficient to compute one optimal partial solution for every subtree T_{v,i} and T+_{v,i}, i ∈ {1, 2, . . . , c(v)}, and every possible size s of the contracted set of edges, but now our dynamic program keeps track of the entire Pareto fronts P(T_{v,i}, v, s) and P(T+_{v,i}, v, s). We store the corresponding pairs of load and weak load values on the Pareto front in separate four-dimensional matrices W, W+, L and L+ (the entries of W and W+ are certain weak load values, and the entries of L and L+ are the corresponding load values). We begin defining these matrices in an abstract way, and then establish several recursive relations which directly translate into a dynamic programming algorithm. Specifically, the entries W(v, i, s, λ), L(v, i, s, λ), W+(v, i, s, λ) and L+(v, i, s, λ) are defined in (19) for v ∈ V, i ∈ {0, 1, . . . , c(v)}, s ∈ {0, 1, . . . , m} and the relevant load values λ. If there is no set C satisfying these requirements, the corresponding entries are set to ∞ (see (20); an analogous relation holds for the entries of L+).
The recursive relations satisfied by the matrices W , L, W + and L + defined before are captured by the following two lemmas. The initialization steps and the recursive computation of W and L are treated in Lemma 4.5. The recursive computation of W + and L + is somewhat more technical, and is treated separately in Lemma 4.6.
Note that the relations (21a)-(21f) are the initialization steps, and the relations (21g)-(21j) capture the two possibilities of either adding or not adding the edge {v, u_i} to a partial solution in the tree T+_{u_i,c(u_i)} = T_{u_i} to obtain a solution for the tree T_{v,i} (recall (10) and (17)).
We only refer to well-defined entries of W+ and L+ in (21h) and in the definition of ν. Note that we either have ν ≤ λ or ν = ∞, while µ may also take a value in the open interval (λ, ∞).
Proof. The relations (21a)-(21f) follow immediately from the definitions of the trees T_{v,i} and T+_{v,i} and the definitions of the respective matrices given in (19) and afterwards. The relations (21g) and (21i) follow from (17) and the definitions of W and L, respectively: Consider a partial solution C ∈ F(T_{v,i}, s). If load_{C,α}(T_{v,i}, v) = 0, then C does not contain the edge {v, u_i}, so we have W(v, i, s, 0) = µ. The other cases of (21g) as well as (21i) follow by an analogous observation. The relation (21h) is closely related to (21g). If µ ≠ ν, then (21h) follows immediately from (21g) and the definitions of W and L. If µ = ν ≤ min{β, λ}, then both a partial solution containing the edge {v, u_i} as well as one missing this edge minimize the weak load. As the weak load is bounded from above by the load, we get L(v, i, s, λ) = W(v, i, s, λ) = µ in this case. This implies (21h). An analogous argument yields (21j).
The following lemma describes the recursive relations satisfied by the entries of W+ and L+. Specifically, the lemma describes how to distribute s contraction edges in T+_{v,i} among the two subtrees T+_{v,i−1} and T_{v,i}.
Proof. The relation (22c) follows by combining the definitions (19a) and (22a) with the relations (11), (18) and the condition (15) from Lemma 4.3. The argument for (22d) is analogous, using the definitions (19b) and (22b) instead of (19a) and (22a). The relation (22g) follows by combining the definitions (16a) and (22e) (recall also (20)) with the relations (11), (18) and the condition (15) from Lemma 4.3. The argument for (22h) is analogous, using the definitions (19a) and (22f) instead of (16a) and (22e).
We can trivially compute the quantities W(t), L(t), W*(t) and L*(t) as defined in (22) in time O(n²) each; the following lemma shows that this can in fact be done in time O(n).
Proof. We define the sequence P_1 of all pairs of finite numbers (L+(v, i−1, t, λ), W+(v, i−1, t, λ)) for λ ∈ Λ(T+_{v,i−1}, v), in increasing order of λ-values. Similarly, we define the sequence P_2 of all pairs of finite numbers (L(v, i, s−t, λ), W(v, i, s−t, λ)) for λ ∈ Λ(T_{v,i}, v), in increasing order of λ-values. By Lemma 4.4 each of these lists has size O(n). Note that these sequences correspond to the Pareto fronts P(T+_{v,i−1}, v, t) and P(T_{v,i}, v, s−t), respectively. Some pairs of points may appear multiple times consecutively in P_1 and P_2, and in a preprocessing step we eliminate these duplicates in time O(n). We know that after this preprocessing step, the first entries in the simplified lists P_1 and P_2 are strictly increasing, and the second entries are strictly decreasing (recall Figure 6).
We first argue how to compute W (t) and L(t). We begin by discarding all pairs from each list whose first entry (L + or L, respectively) is strictly greater than λ in time O(n). We then process the remaining lists P 1 and P 2 beginning at the last entries (L + j , W + j ) and (L k , W k ) (with smallest W + or W -values, respectively) in two phases.
In the first phase we compute W (t) as follows: If L + j + W k > β, we discard the last element of P 1 by decreasing j by 1 (by our sorting of the lists we know that L + j + W k ′ > β for all k ′ ≤ k). If W + j + L k > β, we discard the last element of P 2 by decreasing k by 1 (by our sorting of the lists we know that W + j ′ + L k > β for all j ′ ≤ j). Once L + j + W k ≤ β and W + j + L k ≤ β for the first time, we have found W (t) = max{W + j , W k }. If this never happens we know that W (t) = ∞. This computation is correct by the definition of Π(t, λ) in Lemma 4.6 and by (22a), and it takes time O(n).
In the second phase we compute L(t) as follows: If W (t) = ∞, we know that L(t) = ∞, too. Otherwise we distinguish two cases: If W + j ≥ W k , we decrease k further as long as both inequalities W + j ≥ W k and L + j + W k ≤ β are still satisfied (so that they still hold for the final k). If W + j ≤ W k , we decrease j further as long as both inequalities W + j ≤ W k and W + j + L k ≤ β are still satisfied (so that they still hold for the final j). In the end we set L(t) = max{L + j , L k }. Note that in the first case, the third constraint W + j + L k ≤ β remains valid by the monotonicity L k ′ ≤ L k for all k ′ ≤ k, and in the second case, the third constraint L + j + W k ≤ β remains valid by the monotonicity L + j ′ ≤ L + j for all j ′ ≤ j. Therefore, the correctness of the computation of L(t) follows from (22b).
The procedure to compute W * (t) and L * (t) processes P 1 and P 2 (as obtained from the preprocessing step explained in the beginning) starting at the first entries (L + j , W + j ), j = 1, and (L k , W k ), k = 1, in two phases very similarly to before. We omit the details here.
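The two phases above translate into simple two-pointer walks. The following standalone sketch implements phase one (computing W(t)); the list-of-pairs interface, the parameter lam for the threshold λ, and all names are our own simplification, not the paper's notation.

```python
import math

def merge_W(P1, P2, beta, lam):
    """Phase one of the merge: P1 and P2 are Pareto fronts, given as lists of
    (load, weak load) pairs with strictly increasing loads and strictly
    decreasing weak loads. Walking two pointers from the ends with the
    smallest weak loads, return max(W1, W2) for the first pair satisfying
    L1 + W2 <= beta and W1 + L2 <= beta, or math.inf if none exists."""
    P1 = [p for p in P1 if p[0] <= lam]   # discard pairs with load > lambda
    P2 = [p for p in P2 if p[0] <= lam]
    j, k = len(P1) - 1, len(P2) - 1
    while j >= 0 and k >= 0:
        (L1, W1), (L2, W2) = P1[j], P2[k]
        if L1 + W2 > beta:
            j -= 1   # L1 cannot pair with W2 nor with any larger weak load in P2
        elif W1 + L2 > beta:
            k -= 1   # symmetric discard from P2
        else:
            return max(W1, W2)
    return math.inf
```

Each pointer only moves leftwards, so the walk takes time linear in the lengths of the two lists, matching the O(n) bound claimed above.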
We are now ready to prove Theorem 6.
Proof of Theorem 6. Given the instance (T, ℓ, ϕ), we fix an arbitrary root r of T and an arbitrary ordering of the children of each vertex, making T an ordered rooted tree.
We begin by precomputing and sorting all of the sets Λ(T v,i , v) and Λ(T + v,i , v), v ∈ V , i ∈ {0, 1, . . . , c(v)}, and we maintain them as sorted lists throughout the algorithm. This takes time O(n 2 log n) in total (recall Lemma 4.4).
Each of the matrices W , L, W + and L + has O(n 3 ) entries (recall Lemma 4.4). Computing an entry of W or L takes O(n) time by Lemma 4.5, while computing an entry of W + or L + can be achieved in time O(n 2 ) by Lemma 4.7, so the running time of our dynamic program is O(n 5 ).

Hardness for additive tolerance functions
In this section we prove that the problems Contraction and Weak Contraction for the tolerance function ϕ(x) = x − β (purely additive error) are hard already on cycles (Section 5.1 below). We then prove that Contraction with the same tolerance function is hard to approximate for general graphs and for bipartite graphs (Section 5.2).

5.1. Hardness of Contraction and Weak Contraction.
Recall that we can compute optimal (weak) (α, β)-contractions in polynomial time on trees (this was shown in Sections 4.1 and 4.2), and have a linear time algorithm for Contraction on cycles with unit length edges (this was shown in Section 3.2). We now show that the problem with α = 1 is NP-hard on cycles with arbitrary edge lengths.
Theorem 7. For any fixed β > 0, the problems Contraction and Weak Contraction with tolerance function ϕ(x) = x − β are NP-hard on cycles.
Theorem 7 (where β is not part of the input) follows immediately from Theorem 8 below (where β is part of the input). The reason is that an instance with α = 1 does not change when multiplying all edge lengths and β by the same constant. The rest of this section is devoted to proving Theorem 8. For our proof we will use the following variant of the well-known problem Partition, referred to as Close-to-1 Partition. To state the problem we say that a set of positive real numbers {a_1, a_2, . . . , a_n} is close to 1 if Σ_{i=1}^n a_i = n and ε := Σ_{i=1}^n |a_i − 1| < 1/5.

Close-to-1 Partition
Input: A set of positive real numbers {a_1, a_2, . . . , a_n} that is close to 1.
Output: 'Yes' if there is a subset I ⊆ [n] such that Σ_{i∈I} a_i = Σ_{i∈[n]\I} a_i, 'No' otherwise.
Note that for a 'Yes'-instance of this problem, the solution I ⊆ [n] must satisfy Σ_{i∈I} a_i = Σ_{i∈[n]\I} a_i = n/2. Moreover, as Σ_{i=1}^n |a_i − 1| < 1/5, both |I| and |[n] \ I| differ from n/2 by less than 1/5, so |I| = |[n] \ I| = n/2. In particular, this implies that n is even.
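The size-n/2 observation can be checked by brute force on a toy instance; the solver below and the concrete instance are our own illustrations (exponential time, so tiny inputs only).

```python
from itertools import combinations

def close_to_1_partition(a):
    """Brute-force solver: return a subset I of indices with equal split,
    or None. Exponential; for illustration on tiny instances only."""
    n, total = len(a), sum(a)
    for k in range(n + 1):
        for I in combinations(range(n), k):
            if abs(sum(a[i] for i in I) - total / 2) < 1e-9:
                return set(I)
    return None

# a 'Yes'-instance that is close to 1: sum = 4 and sum of |a_i - 1| = 0.14 < 1/5
a = [1.04, 0.96, 1.03, 0.97]
I = close_to_1_partition(a)   # any solution necessarily has size n/2 = 2
```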
In the classical problem Partition, the input set is not constrained to be close to 1. Partition was shown to be NP-complete already in Karp's seminal paper [Kar72]. The fact that Close-to-1 Partition is also NP-complete follows from a straightforward rescaling argument.
Lemma 5.1. The problem Close-to-1 Partition is NP-complete.
Proof. Given an instance {a 1 , a 2 , . . . , a n } of Partition, we first add n additional zeroes a n+1 = a n+2 = · · · = a 2n = 0 to the instance (by this we ensure that a partition with equal sums is transformed into one where both partition classes have the same number n of summands). We then linearly transform all the a i according to a ′ i := (a i + C)/D, where C and D are sufficiently large constants so that the transformed values a ′ i are close to 1. The transformed set of numbers has even cardinality 2n, is close to 1, and it admits a partition into two sets of size n with equal sum if and only if the original instance allows a partition into two sets with equal sum.
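The rescaling in this proof can be sketched as follows; the explicit constants C and D below are our own choice (the argument only requires them to be sufficiently large).

```python
def to_close_to_1(a):
    """Rescaling step of the reduction sketched above; the concrete choice of
    the constants C and D below is our own. Pads the instance with n zeros,
    then maps a_i -> (a_i + C) / D so that the 2n values sum to 2n and each
    lies within 1/(20n) of 1, making the result close to 1."""
    n = len(a)
    padded = list(a) + [0.0] * n                     # equal class sizes afterwards
    s = sum(padded)
    spread = max(padded) - min(padded)
    C = 20 * n * (spread + 1)                        # large enough shift
    D = (s + 2 * n * C) / (2 * n)                    # forces the sum to be 2n
    return [(x + C) / D for x in padded]
```

Since the transformation is affine and all values are shifted equally, equal-sum partitions into classes of size n are preserved in both directions.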
Proof of Theorem 8. We first focus on the problem Contraction. We reduce Close-to-1 Partition, which is NP-complete by Lemma 5.1, to the problem Contraction on a cycle with tolerance function ϕ(x) = x − β, β ≥ 0.
Let I = {a_1, a_2, . . . , a_n} be an instance of Close-to-1 Partition such that a_1 ≥ a_2 ≥ · · · ≥ a_n. This ensures that all a_i that are bigger than 1 appear before all a_i that are smaller than 1, which is the only property of the ordering that we exploit in the proof later on. The instance of Contraction we construct is on the cycle C_{2n+4} with 2n + 4 edges. We label the vertices of the cycle by walking around the cycle as follows: The first n + 1 vertices are labelled u_0, u_1, . . . , u_n, then there are two special vertices v_1, v_2, and the remaining n + 1 vertices are labelled w_0, w_1, . . . , w_n, see Figure 7. We denote the subpath of the cycle given by all edges {u_i, u_{i+1}} by P_u, and the subpath given by all edges {w_i, w_{i+1}} by P_w.
We will show that J has an optimal solution of cardinality (and thus of value) n + 2 if and only if I is a 'Yes'-instance. In particular, we will see that any feasible solution of J of size n + 2 contains the two edges of length ε and exactly n/2 edges with length a_i, i ∈ I, from P_u and the corresponding edges with length 2 − a_i, i ∈ I, from P_w. Such solutions correspond to subsets of [n] in the following natural way: For any subset I ⊆ [n] of size n/2 we let C(I) be the subset of edges of the cycle C_{2n+4} consisting of the two edges of length ε and of all edges {u_{i−1}, u_i} and {w_{i−1}, w_i} (of length a_i or 2 − a_i, respectively) for all i ∈ I. Thus we will show that C(I) is an optimal solution of the instance J of Contraction if and only if Σ_{i∈I} a_i = Σ_{i∈[n]\I} a_i = n/2, i.e., I is a 'Yes'-instance of Close-to-1 Partition.
Both directions of this equivalence are captured and proved as Claim 2 and 4 below. Claims 1 and 3 are auxiliary statements used in the proof of these two main claims.
For any path P on the cycle we let ℓ(P ) denote the sum of ℓ(e) over all edges e of P . For all i ∈ [n] we denote by P ⊐ i and P ⊏ i the path on the cycle between the vertices u i and w i that contains and that does not contain the edge {v 1 , v 2 }, respectively (in Figure 7, these are the right and left segment of the cycle).
Claim 1: For all i ∈ [n], the number ℓ(P⊐_i) lies in the interval [n + β′ + ε, n + β′ + 2ε] and the number ℓ(P⊏_i) lies in the interval [n + β′ + 2ε, n + β′ + 3ε]. In particular, we have dist_ℓ(u_i, w_i) = ℓ(P⊐_i).
Proof of Claim 1: Note that the condition Σ_{i=1}^n a_i = n implies that ε = 2 Σ_{i: a_i < 1} (1 − a_i). (23) By our assumption a_1 ≥ a_2 ≥ · · · ≥ a_n, the numbers ℓ(P⊐_i) form a unimodal sequence for i = 0, 1, . . . , n that is maximized for i = 0 and i = n, proving that ℓ(P⊐_i) ≤ n + β′ + 2ε (note that ℓ(P_u) = ℓ(P_w) = n). By (23) the minimum of this unimodal sequence is at most ε smaller than the maximum. This proves the first part of the claim. As ℓ(P⊐_i) + ℓ(P⊏_i) = 2(n + β′ + 2ε), we obtain the second part of the claim. The last part of the claim is an immediate consequence of the first two.
Claim 2: If I ⊆ [n] is a solution of the instance I of Close-to-1 Partition such that Σ_{i∈I} a_i = Σ_{i∈[n]\I} a_i = n/2, then C(I) is a (1, β)-contraction.
Proof of Claim 2: It suffices to prove that there is no pair of vertices whose distance decreases by more than β when contracting the edges in C(I).
We start by verifying this for the pairs u_i, w_i for i ∈ [n]. We first consider the path P⊐_i between u_i and w_i. Observe that Σ_{e∈C(I)∩P⊐_i} ℓ(e) lies in the interval [n/2 + ε, n/2 + 2ε] = [β − ε, β]. Similarly to before, this follows from the observation that by the assumption a_1 ≥ a_2 ≥ · · · ≥ a_n those sums form a unimodal sequence for i = 0, 1, . . . , n that is maximized for i = 0 and i = n, and by using (23) (recall also that |I| = n/2). Consequently, we have Σ_{e∈C(I)∩P⊐_i} ℓ(e) ≤ β. (24) Since Σ_{e∈C(I)} ℓ(e) = n + 2ε = 2β − 2ε, we obtain that Σ_{e∈C(I)∩P⊏_i} ℓ(e) ≤ β − ε. (25) Combining (24) and (25) proves that the distance between u_i and w_i decreases by at most β when contracting the edges in C(I). Now consider two vertices u_i and w_j, j < i (the case j > i can be treated analogously). Let P⊐_{i,j} and P⊏_{i,j} be the path on the cycle between the vertices u_i and w_j that contains and that does not contain the edge {v_1, v_2}, respectively. Using that P⊐_{i,j} ⊆ P⊐_i, we obtain Σ_{e∈C(I)∩P⊐_{i,j}} ℓ(e) ≤ β (27) from (24). We know that a_i ≤ 1 + 1/5 ≤ 8/5 and consequently 2 − a_i ≥ 2/5 ≥ 2ε (28) by the assumption that the input {a_1, a_2, . . . , a_n} of the instance I is close to 1 (there is plenty of leeway in all those inequalities). Furthermore, we have a corresponding estimate (29) for the path P⊏_{i,j}, whose second-to-last inequality follows from Claim 1. Combining those observations yields (30). Combining (27) and (30) proves that the distance between u_i and w_j decreases by at most β as well. From (27) and (30) we can derive analogous relations for the remaining cases where we need to consider the distance between a vertex u_i, i ∈ [n], and a vertex w ∈ {v_1, v_2, u_0, u_1, . . . , u_{i−1}, u_{i+1}, . . . , u_n}, between a vertex w_i, i ∈ [n], and a vertex u ∈ {v_1, v_2, w_0, w_1, . . . , w_{i−1}, w_{i+1}, . . . , w_n}, and between the vertices v_1 and v_2. This completes the proof of Claim 2.
Claim 3: Every (1, β)-contraction C contains at most n/2 edges in (P_u ∪ P_w) ∩ P⊐_i and at most n/2 edges in (P_u ∪ P_w) ∩ P⊏_i for all i ∈ [n].
Proof of Claim 3: Note that for any I ⊆ [n] and k ∈ {0, 1, . . . , n} we have Σ_{i∈I: i>k} a_i + Σ_{i∈I: i≤k} (2 − a_i) ≥ |I| − ε by the definition of ε. Consequently, assuming for the sake of contradiction that C contains strictly more than n/2 edges in (P_u ∪ P_w) ∩ P⊐_i, the total length of the contracted edges on P⊐_i exceeds β. Similarly, assuming that C contains strictly more than n/2 edges in (P_u ∪ P_w) ∩ P⊏_i also yields a contracted length of more than β, where we used that ε < 1/5 in the second-to-last step. This contradicts the fact that C is a (1, β)-contraction, proving Claim 3.
Claim 4: Let C be a feasible solution of the instance J of Contraction. Then we have |C| ≤ n + 2, and if |C| = n + 2, we have C = C(I) for some set I ⊆ [n] with Σ_{i∈I} a_i = Σ_{i∈[n]\I} a_i = n/2.
Proof of Claim 4: As C does not contain any of the edges of length β′ or β′ + 2ε, we have |C| ≤ n + 2 by Claim 3 (the +2 comes from the two edges of length ε that may be contained in C). Suppose now that |C| = n + 2. Applying Claim 3 again shows that C must contain both edges of length ε, exactly n/2 edges of P_u and exactly n/2 edges of P_w, and that C contains the edge {w_{i−1}, w_i} if and only if it contains the edge {u_{i−1}, u_i}; we let I := {i ∈ [n] : {u_{i−1}, u_i} ∈ C}. By Claim 1 we have dist_ℓ(u_0, w_0) = ℓ(P⊐_0) and dist_ℓ(u_n, w_n) = ℓ(P⊐_n). As C is a (1, β)-contraction containing the two edges of length ε we thus obtain Σ_{i∈I} a_i = Σ_{e∈C∩P_u} ℓ(e) ≤ β − 2ε = n/2. Similarly, we have Σ_{i∈[n]\I} a_i = Σ_{i∈I} (2 − a_i) = Σ_{e∈C∩P_w} ℓ(e) ≤ β − 2ε = n/2. As Σ_{i∈[n]} a_i = n, these two inequalities must be tight, yielding Σ_{i∈I} a_i = Σ_{i∈[n]\I} a_i = n/2.
Combining Claims 2 and 4 proves the statement of the theorem for the problem Contraction.
The hardness proof for the problem Weak Contraction is analogous, only the argument that no feasible solution C of the instance J contains one of the edges of length β ′ or β ′ + 2ε (this is part of the proof of Claim 4) has to be adjusted.
The reader might be tempted to 'simplify' the previous reduction proof by omitting the four special edges of length ε, β ′ and β ′ + 2ε and by setting β := n/2 instead. However, this would invalidate Claim 2 (specifically, the estimate (25) would not always hold).

5.2. Inapproximability of Contraction.
We are able to extend the aforementioned hardness result for Contraction as follows.
Theorem 9. For any fixed β > 0 and ε > 0, it is NP-hard to approximate the problem Contraction with tolerance function ϕ(x) = x − β to within a factor of n^{1−ε}.
For the following theorem the additive error is fixed to β = 1.
Theorem 10. For any ε > 0, it is NP-hard to approximate the problem Contraction with tolerance function ϕ(x) = x − 1 on bipartite graphs with unit length edges ℓ = 1 to within a factor of m 1/2−ε .
Our reductions are based on the inapproximability of the well-known Clique problem. Recall that a clique in a graph G is a complete subgraph of G.

Clique
Input: A graph G.
Output: A clique in G of maximum size.
It was shown in [Zuc07] that for any ε > 0, it is NP-hard to approximate Clique to within a factor of n 1−ε .
The following lemma will be used in our proofs. It shows that for (1, β)-contractions the feasibility condition (1) need not be checked for all pairs of vertices u and v, but only for those satisfying certain extra conditions.
Lemma 5.2. A set C ⊆ E is a (1, β)-contraction if and only if all pairs of vertices u, v ∈ V with the property that every shortest path with respect to ℓ_C between u and v starts and ends with an edge from C satisfy condition (1).
These relations together with Lemma 5.2 show that C(U ) is a (1, β)-contraction in H if and only if U is a clique in G.
As n(H) differs from n(G) only by a constant factor, an n^{1−ε}-approximation algorithm for Contraction would yield an n^{1−ε′}-approximation algorithm for Clique via this reduction. Together with the aforementioned inapproximability of Clique [Zuc07] this proves the theorem.
The rest of this section is devoted to proving Theorem 10, so we now focus on (1, 1)-contractions in bipartite graphs with unit length edges ℓ = 1. The next lemma characterizes the structure of contractions in this setting.
Lemma 5.3. Let G = (V, E) be a bipartite graph with unit edge lengths ℓ = 1 and let C ⊆ E be a set of edges. Then (i) if C is a (1, 1)-contraction, then C is a matching, and (ii) C is a (1, 1)-contraction if and only if all two-element subsets of C are (1, 1)-contractions.
Proof. (i) Suppose for the sake of contradiction that C contains a path (u, v, w) on two edges. As G is bipartite, it has no triangles, so dist_ℓ(u, w) = 2 and dist_{ℓ_C}(u, w) = 0 < dist_ℓ(u, w) − 1, a contradiction to the assumption that C is a (1, 1)-contraction.
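Part (i) can be verified on the smallest bipartite example, a 4-cycle with unit edge lengths; the helper below is our own toy check, not part of the paper's construction.

```python
def dist_after_contraction(n, edges, C, s, t):
    """Shortest s-t distance when edges in C get length 0 and all others
    length 1 (unit lengths); plain Bellman-Ford-style relaxation."""
    INF = float("inf")
    d = [INF] * n
    d[s] = 0
    for _ in range(n):
        for i, (u, v) in enumerate(edges):
            w = 0 if i in C else 1
            if d[u] + w < d[v]:
                d[v] = d[u] + w
            if d[v] + w < d[u]:
                d[u] = d[v] + w
    return d[t]

# 4-cycle 0-1-2-3-0: bipartite, hence triangle-free
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
# contracting the 2-path 0-1-2 collapses vertices 0 and 2, which were at
# distance 2, to distance 0 < 2 - 1, so {0, 1} is not a (1, 1)-contraction
```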
With Lemma 5.3 in hand, we are now ready to prove Theorem 10. For any set of vertices U ⊆ V we define C(U ) := {f u : u ∈ U } (see Figure 9). Claim 1: If U ⊆ V is a clique in G, then C(U ) is a (1, 1)-contraction in H and Φ(C(U )) = |U |.
For any set of edges C ⊆ E(H), we let U (C) be the set of vertices v ∈ V for which (v, 1) is incident to an edge in C. Proof of Claim 2: C is a matching by Lemma 5.3 (i). Let u, v ∈ U (C). We will show that e = {u, v} ∈ E by applying Lemma 5.3 (ii) to the two edges in C incident to (u, 1) and (v, 1). To prove that e ∈ E it suffices to show that dist ℓ ((u, 1), (v, 1)) = 2.
Every edge in H is either incident to s or to a vertex of the form (v, 1), v ∈ V . Since at most one of the edges incident to s can be in C, the definition of U (C) shows that the size of U (C) is either |C| − 1 or |C|. Therefore, to finish the proof of Claim 2, it suffices to show that Φ(C) ≤ |C| + 2. If C contains no two edges that are connected by more than one edge in H, then we have Φ(C) = |C|. Otherwise we consider two such edges f and g from C. It is easy to check that either f or g must be incident to s, so suppose that the edge f contains s. We first consider the case that f = {s, x e } for some edge e = {u, v} ∈ E. In this case it follows that g = {(u, 1), (u, 2)} or g = {(v, 1), (v, 2)}, so we have Φ(C) = |C| + 2. Now consider the case that f = {s, (u, 2)} for some vertex u ∈ V .
In this case it follows that g = {(u, 1), x e } for exactly one edge e ∈ E incident to u in G, showing that Φ(C) = |C| + 2. In all three cases we have Φ(C) ≤ |C| + 2, as claimed.
Combining Claims 1 and 2 will allow us to prove the following claim:
Claim 3: If there is an n^{1/2−ε}-approximation algorithm for Contraction, then there is an n^{1−ε/2}-approximation algorithm for Clique.
Proof of Claim 3: Suppose for the sake of contradiction that such an approximation algorithm for Contraction exists. We use it to compute a clique in a given instance G of Clique as follows: We construct I = (H(G), ℓ, ϕ), compute a solution C of Contraction for this instance, and define the clique U (C) as before (recall Claim 2). If U (C) ≠ ∅, we return U (C), otherwise we return an arbitrary vertex of G. We denote the clique computed in this fashion by U .
We may assume that n(G) ≥ 16^{1/ε}, and in particular n(H) ≥ 16^{1/ε}. By assumption we know that Φ(C) ≥ Φ(C ∗ )/n(H)^{1/2−ε}, where C ∗ is an optimal solution of I; in particular, Φ(C) is positive. Combining these observations yields the claimed approximation guarantee for Clique, where the second inequality holds because of Claim 2, and the last inequality involving the clique number ω(G) holds because of Claim 1.

Hardness for multiplicative tolerance function
By Theorem 7, the problem Weak Contraction with purely additive tolerance function ϕ(x) = x − β is NP-hard on cycles. In this section we prove the hardness and inapproximability of this problem also in the case of a purely multiplicative tolerance function ϕ(x) = x/α, α ≥ 1. Recall that the problem Contraction is trivial for this tolerance function (we may not contract any edges).

Hardness of planar Weak Contraction.
To state the main result of this section recall that the girth of a graph G is defined as the minimum length of a cycle in G.
Theorem 11. For any g ≥ 2, the problem Weak Contraction with tolerance function ϕ(x) = x/2 is NP-hard for planar graphs with girth at least 3g and unit length edges ℓ = 1.
Theorem 11 implies that Weak Contraction is hard for a general multiplicative tolerance function ϕ(x) = x/α, α ≥ 1, but it leaves open the question whether this remains true for fixed values of α other than 2 (when α is not part of the input). The arguments given in this section for α = 2 carry over straightforwardly to any fixed value 2 ≤ α < 3, but not to α ≥ 3 (for α < 2 and unit length edges the problem is trivial).
We first characterize the set of feasible solutions in this special case.
Lemma 6.1. Let G = (V, E) be a graph with girth at least 6 and unit length edges ℓ = 1, and consider the tolerance function ϕ(x) = x/2. Furthermore, let C ⊆ E be a set of edges such that (V, C) is disconnected. Then C is a weak (2, 0)-contraction if and only if for any two edges e, f ∈ C either e and f are incident and both contain a degree-1 vertex, or any path containing e and f also contains at least two edges not in C.
Recall that the assumption that (V, C) is disconnected prevents solutions C ⊆ E for which the contracted graph G/C is a single vertex. Note that Lemma 6.1 does not require G to be planar.
Proof. To prove the equivalence, we need the following auxiliary claim: Claim: If C is a weak (2, 0)-contraction, then every component of (V, C) that is not a single edge is a star with the property that each of its vertices except the center of the star has degree 1 in G.
Proof of Claim: Let M be a component of (V, C) with more than one edge. Clearly, there must be an edge {u, v} with vertices u / ∈ V (M ) and v ∈ V (M ). If M contains a path P on two edges starting at v and ending at some vertex w, then dist ℓ (u, w) = 3 and dist ℓ C (u, w) = 1, a contradiction to the assumption that C is a weak (2, 0)-contraction (note that P ∪ {u, v} is the shortest path between u and w, as the girth of G is at least 6). Thus the edges of M must form a star centered at v. By the same argument, no vertex outside M can be connected to any vertex of M other than v. This proves the claim.
We first assume that C is a weak (2, 0)-contraction, and we need to show that any two edges e, f ∈ C satisfy the conditions of the lemma. If e and f are incident, the statement follows from the auxiliary claim from before. If e and f are not incident, we consider an inclusion-minimal path P containing both e and f . We let u and v be the end vertices of P , u ′ the other end vertex of e, and v ′ the other end vertex of f (u ′ and v ′ are the vertices at distance 1 from the ends of the path). If the distance between u ′ and v ′ were only 1, we would have dist ℓ (u, v) = 3 and dist ℓ C (u, v) = 1 (here we again need the assumption that the girth is at least 6), a contradiction to the assumption that C is a weak (2, 0)-contraction. Therefore at least two edges lie between u ′ and v ′ . The auxiliary claim from before implies that no two incident edges on P between u and v are contained in C, therefore P must contain at least two edges not in C. This proves one direction of the equivalence.
To prove the other direction, we now assume that any two edges e, f ∈ C satisfy the conditions of the lemma, and we need to show that C is a weak (2, 0)-contraction. Consider any two vertices u and v with dist ℓ C (u, v) > 0, and any path P between u and v. As no inner vertex of P is a leaf, we know that between any two consecutive edges from C on P there are at least two edges not in C. This proves that dist ℓ C (u, v) ≥ dist ℓ (u, v)/2, as desired.
This completes the proof of the lemma.
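The equivalence in Lemma 6.1 can be sanity-checked by brute force on small graphs. The sketch below is our own illustrative code, not part of the paper; it assumes that condition (1) for unit length edges reads dist ℓ C (u, v) ≥ dist ℓ (u, v)/α − β for every pair of vertices ending up in different super-vertices, and computes distances under ℓ C with a 0/1-BFS in which contracted edges get length 0.

```python
from collections import deque

def bfs01(adj, zero_edges, src):
    # 0/1-BFS: edges in zero_edges have length 0, all others length 1.
    dist = {v: float('inf') for v in adj}
    dist[src] = 0
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            w = 0 if frozenset((u, v)) in zero_edges else 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)   # zero-length edges go to the front
                else:
                    dq.append(v)
    return dist

def satisfies_tolerance(adj, C, alpha, beta):
    # Check dist_{l_C}(u, v) >= dist_l(u, v)/alpha - beta for all pairs
    # of vertices in different components of (V, C), i.e. pairs that are
    # not merged into the same super-vertex (assumed reading of (1)).
    zero = {frozenset(e) for e in C}
    for u in adj:
        d1 = bfs01(adj, set(), u)   # unit-length distances in G
        d0 = bfs01(adj, zero, u)    # distances after contracting C
        for v in adj:
            if d0[v] > 0 and d0[v] < d1[v] / alpha - beta:
                return False
    return True
```

On a 6-cycle (girth 6, no degree-1 vertices), contracting a single edge passes the check, while contracting two incident edges fails it, exactly as the lemma predicts.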
For a given propositional formula F in conjunctive normal form (CNF) the bipartite variable-clause graph Γ(F ) is defined as follows: The two partition classes of Γ(F ) are given by the sets of variables and clauses of F , and there is an edge between a variable x and a clause c if x appears in c. If c contains x as a positive or negative literal, we call the corresponding edge of Γ(F ) a positive or negative edge, respectively. A planar drawing of Γ(F ), where positive and negative edges appear in cyclically contiguous intervals around every variable vertex, is called contiguous.
We call a k-CNF formula regular if every clause contains exactly k literals, no clause contains a literal twice, and every variable appears at least once as a positive literal and at least once as a negative literal in the formula.
Consider now the following variant of 3SAT.

Contiguous Planar 3SAT
Input: A regular 3-CNF formula F and a contiguous planar drawing of Γ(F ).
Output: 'Yes' if F has a satisfying assignment, 'No' otherwise.
Proof. The more general variant of Contiguous Planar 3SAT not requiring F to be regular was shown to be NP-complete in [dBK12]. We now show how to reduce this more general variant to Contiguous Planar 3SAT, which will prove the lemma. Given a (not necessarily regular) 3-CNF formula F we first eliminate all variables appearing only as negative or only as positive literals and all clauses containing exactly one literal, as well as multiple appearances of literals in the same clause. This yields a formula F ′ in which all clauses have two or three literals, no clause contains a literal twice, and every variable appears at least once as a positive literal and at least once as a negative literal in F ′ . Moreover, since Γ(F ′ ) is a subgraph of Γ(F ), we also obtain a contiguous planar drawing of Γ(F ′ ). As a last step we eliminate clauses c with two literals by introducing a new variable x for each of them and replacing c by the equivalent formula (c ∨ x) ∧ (c ∨ x̄). It is easy to check that the resulting formula F ′′ is regular and equisatisfiable to F , and to obtain a contiguous planar drawing of Γ(F ′′ ), see Figure 10.
Proof of Theorem 11. We first present the proof for the case g = 2, and then sketch how to generalize it to larger values of g. We reduce Contiguous Planar 3SAT to Weak Contraction. Consider an instance F of Contiguous Planar 3SAT with variables x 1 , x 2 , . . . , x n and clauses c 1 , c 2 , . . . , c m .
Given the formula F , we construct from it a graph G = G(F ) as follows, see Figures 11 and 12. For every variable x i , i ∈ [n], we add a variable gadget H(x i ) as shown on the left hand side of Figure 11 to the graph G. The vertices u i and ū i will be used later to connect this gadget to other parts of the graph. The idea of the variable gadget is that an optimal solution of our instance of Weak Contraction contains either the four edges corresponding to setting x i to true or the four edges corresponding to setting x i to false.
For every clause c j , j ∈ [m], we add a clause gadget H(c j ) (a star with three edges) as shown on the right hand side of Figure 11 to the graph G. The vertices v 1 j , v 2 j and v 3 j will be used later to connect this gadget to other parts of the graph. The idea of the clause gadget is that a feasible solution contains at most one of these three edges, and if it does contain one of them, this restricts the choice we have inside the respective neighbouring variable gadget.
Figure 11. Variable gadget (left) and clause gadget (right) used in the proof of Theorem 11 for g = 2. The vertices used for connecting those gadgets to the rest of the graph are marked in black.
We connect the variable and clause gadgets in G as follows (see Figure 12): For every j ∈ [m] and k ∈ [3], if the k-th literal in the clause c j is x i , we add an edge connecting u i to v k j , and if the k-th literal in the clause c j is x̄ i , we add an edge connecting ū i to v k j . We refer to the edges added to G in this step as connection edges.
This completes the definition of the graph G = G(F ). It is easy to see that this graph is planar. Specifically, a planar embedding can be obtained from the given planar embedding of Γ(F ) by replacing variable vertices x i in Γ(F ) by the variable gadgets H(x i ) in G, and by replacing clause vertices c j by the clause gadgets H(c j ). Using that for each variable vertex x i in Γ(F ) the positive and negative edges appear in cyclically contiguous intervals around x i , the connection edges in G (that connect the variable and clause gadgets) can also be drawn in a planar fashion.
Moreover, it is easy to check that G has girth 6 and no degree-1 vertices. Now consider the instance I := (G, ℓ, ϕ) of the problem Weak Contraction with ℓ = 1 (unit length edges) and the tolerance function ϕ(x) = x/2. Lemma 6.1 implies that any feasible solution of I is a matching, as G has no vertices of degree 1. As G contains no cycles of length 3 or 4, there cannot be two edges between the vertex sets of two different components of (V, C) for any feasible solution C. This implies that our objective function satisfies Φ(C) = |C|.
We proceed to show that F is satisfiable if and only if I has an optimal solution of cardinality (and thus of value) 4n + m. Specifically, a satisfying assignment of F corresponds to a solution that contains exactly all edges of either T i or F i in H(x i ) for every variable i ∈ [n] (corresponding to the value true or false assigned to this variable, respectively) and exactly one edge in H(c j ) for each clause j ∈ [m] (corresponding to a literal that satisfies this clause).
Formally, for any variable assignment τ : {x 1 , x 2 , . . . , x n } → {true, false}, we define the set of edges C(τ ) ⊆ E(G) as follows: C(τ ) contains all edges of T i for any variable x i , i ∈ [n], that τ sets to true, and it contains all edges of F i for any variable x i that τ sets to false. Moreover, for every clause c j , j ∈ [m], that is satisfied by τ , we choose an index k ∈ [3] of a literal in c j that is satisfied by τ and add the edge e k j to C(τ ). The following claim is an immediate consequence of Lemma 6.1.
Proof of Claim 2: The first and last statement are immediate consequences of Claim 1. The argument for the second statement is as follows: For all i ∈ [n] we let E i denote the set of edges {h i , t i , f i } plus the connection edges incident to u i , and we let Ē i denote the set of edges {h̄ i , t̄ i , f̄ i } plus the connection edges incident to ū i . By Claim 1, C contains at most two edges from E i , and if the intersection size is two, then C must contain the edge h i . Similarly, C contains at most two edges from Ē i , and if the intersection size is two, then C must contain the edge h̄ i . As C cannot contain h i and h̄ i simultaneously, C contains at most three edges from E i ∪ Ē i , and if the intersection size is three, then C must contain either h i or h̄ i . Again by Claim 1, C contains at most two edges from the 6-cycle in H(x i ). However, if C contains one of the edges h i or h̄ i , it contains at most one edge from this 6-cycle. This proves that C indeed contains at most four edges from H(x i ) + .
Note that every edge of G belongs to exactly one subgraph H(x i ) + or H(c j ). So if |C| = 4n + m, we know by Claim 2 that C contains exactly four edges from H(x i ) for all i ∈ [n] and exactly one edge from H(c j ) for all j ∈ [m], and none of the connection edges in G.
Claim 3: For any i ∈ [n], if C contains four edges from H(x i ) and if f i is not among them, then those edges must be T i . On the other hand, if t i is not among them, those edges must be F i . In particular, these two cases cannot occur simultaneously.
Proof of Claim 3: If C contains four edges from H(x i ) and f i is not among them, Claim 1 enforces taking first the edge t i , then t ′ i and t ′′ i , and eventually t̄ i . This proves the first part of the statement. The argument for the second part is symmetric. The third part of the statement is a consequence of the first two.
So given a solution C of I of size 4n + m, we can derive from it a satisfying assignment τ of F as follows: For every clause c j , j ∈ [m], we consider the unique edge e k j from H(c j ) that belongs to C. We follow the connection edge incident to e k j , leading to the corresponding variable gadget H(x i ) and connecting to either u i or ū i . If the connection edge connects to u i , then by Claim 1, f i ∉ C, so by Claim 3, the four edges of H(x i ) contained in C must be T i , and we define τ (x i ) := true. If the connection edge connects to ū i , then by Claim 1, t i ∉ C, so by Claim 3, the four edges of H(x i ) contained in C must be F i , and we define τ (x i ) := false. This process does not lead to any contradicting variable assignments by the last statement of Claim 3. However, this process may leave some variables x i undefined, and we can set them arbitrarily, e.g., τ (x i ) := true. By construction, each clause receives a satisfying literal, so the assignment τ is indeed a satisfying assignment of F . This proves that F is satisfiable if and only if I has a feasible solution of size 4n + m (which must be optimal by Claim 2), completing the proof of the theorem in the case g = 2.
For values g ≥ 3, the construction of the gadgets H(x i ) and H(c j ) can be generalized as follows: We subdivide each of the edges h i , h̄ i and h ′′ i , and each of the edges e 1 j , e 2 j and e 3 j into 1 + 3(g − 2) edges. Then the resulting graph G = G(F ) clearly has girth 3g, and the above arguments can easily be modified to show that any solution C of I contains at most 1 + 3(g − 2) = 3g − 5 edges from H(c j ) for all j ∈ [m], and at most 4 + 3(g − 2) = 3g − 2 edges from H(x i ) + for all i ∈ [n], and that F is satisfiable if and only if I has an optimal solution of size (3g − 2)n + (3g − 5)m. This completes the proof.
Inapproximability of Weak Contraction.
We are able to further extend our hardness results for Weak Contraction as follows: We proceed to show that I has a feasible solution of value k if and only if G has an independent set of size k. This is an immediate consequence of Claim 3 below. To prove Claim 3 we need the following two auxiliary claims.
Claim 1: For any induced subgraph of H that is a path on two edges, a feasible solution C of I does not contain only the longer of the two edges (either it contains none of the two, only the shorter of the two if there is one, or both).
Proof of Claim 1: Consider a path on two edges {u, v}, {v, w}, each of length 2, in H such that {u, w} ∉ E(H), and suppose for the sake of contradiction that {u, v} ∈ C, but {v, w} ∉ C. Then we have dist ℓ (u, w) = 4 and dist ℓ C (u, w) = 2, violating condition (1) for the given tolerance function. A similar contradiction arises if one of the edges has length 1 and the other length 2, and only the edge of length 2 is contracted. This proves the claim.
Claim 2: No feasible solution of I contains an edge of length 2.
Proof of Claim 2: Assume for the sake of contradiction that a feasible solution C contains an edge e of length 2. Note that any edge f of H may be reached from e via a walk e 1 . . . e k where e 1 = e and e k = f , where for all i < k we have ℓ(e i ) = 2 and the edges e i and e i+1 induce a path in H. Successively applying Claim 1 to the subgraphs induced by e i and e i+1 for i < k shows that f ∈ C. Thus C contracts every edge of H, violating the condition that a weak contraction must not contract all edges.
Claim 2 implies that our objective function satisfies Φ(C) = |C| for every feasible solution C of I, because H never contains two edges between two different connected components of (V, C).
Claim 3: A set of edges C is a feasible solution of I with |C| = k if and only if C = C(U ) for an independent set U of size k in G.
Proof of Claim 3: Let C be a feasible solution of I. By Claim 2, C contains only edges of length 1, so we have C = C(U ) for some set of vertices U in G. If two such vertices u, v ∈ U were connected by an edge, we would have dist ℓ (u, v) = 4 and dist ℓ C (u, v) = 2, violating condition (1) for the given tolerance function. It follows that U is an independent set.
To prove the other direction of the equivalence, let U be an independent set in G and consider the set of edges C(U ) in H. To verify that C(U ) is a weak (3/2, 0)-contraction, it suffices to check condition (1) between the end vertices of paths on two edges, one of length 1 from C(U ) and the other of length 2, and for paths on k edges that start and end with an edge of length 1 from C(U ). In the first case the contraction C(U ) changes the distance from 3 to 2, which is compatible with (1). In the second case the contraction C(U ) changes the distance from 2k − 2 to 2k − 4, which is also compatible with (1): indeed, 2k − 4 ≥ (2k − 2)/(3/2) holds precisely when k ≥ 4, and k ≥ 4 follows from the assumption that U is an independent set.
Claim 3 implies that I has a feasible solution with k edges if and only if G has an independent set of size k. As n(H) = 3n(G) = O(n(G)), the theorem follows from the [Zuc07] result.

Asymptotic bounds
In this section we show how to compute contractions that are not necessarily optimal (i.e., they do not maximize the objective function (2)), but that can be computed efficiently despite our hardness results from the previous sections. The main results of this section are Theorem 13 and the corresponding (not tight) lower bound (Theorem 15), which is conditional on the validity of Erdős' girth conjecture. We also prove that a contraction significantly reduces the number of vertices in dense input graphs (Theorem 17), and provide a corresponding matching lower bound (Theorem 18).
Throughout this section, we assume all graphs to have unit length edges ℓ = 1.
Theorem 13. Let k ≥ 1 be a real number. Any graph G has a (2k − 1, 1)-contraction C such that the contracted graph G/C has at most n^{1+1/k} edges, and such a contraction can be computed in time O(m).
In Theorem 13 and all following statements and proofs, n and m refer to the number of vertices and edges of the input graph G, not of the contracted graph G/C.
Setting k := log_2 n in Theorem 13 yields the following corollary.

Corollary 14.
Any graph G has a (2 log_2 n − 1, 1)-contraction C such that the contracted graph G/C has at most 2n edges, and such a contraction can be computed in time O(m).
To prove Theorem 13, we use a clustering approach as presented in [Awe85]. Specifically, the following crucial lemma appears in a slightly weaker form in that paper. To state the lemma we introduce a few definitions. For any real number r ≥ 1, we define an r-partition of a graph G = (V, E) as a set of clusters P i ⊆ V , i ∈ [l], with corresponding cluster centers p i ∈ P i , where the P i are required to form a partition of the vertex set V and where dist ℓ (p i , u) ≤ r − 1 for all u ∈ P i and i ∈ [l]. We denote the resulting r-partition by P := {(p i , P i ) : i ∈ [l]}. We write ρ(P ) for the number of pairs 1 ≤ i < j ≤ l for which P i and P j are connected by at least one edge, and we refer to this quantity as the density of P .
Lemma 7.1. Let r ≥ 1 be a real number. Any graph G has an r-partition P with density ρ(P ) ≤ n^{1+1/r}, and such a partition can be computed in time O(m).
Proof. The idea of the algorithm is to build an r-partition P of G iteratively in rounds. In each round, we build a new cluster and remove all vertices of that cluster from the graph, processing the subgraph on the remaining vertices in the next round. The algorithm proceeds until all vertices are assigned to a cluster. In round i, we choose an arbitrary vertex p i as a cluster center, and define layers L i,0 , L i,1 , . . . around the vertex p i , where the layer L i,j consists of all vertices at distance exactly j from p i (this distance is measured in the subgraph of G under consideration in this round). We continue computing these layers as long as the number of vertices in the new layer is at least the number of vertices in all previous layers times the factor n^{1/r}. The cluster P i is defined as the union of all layers around p i satisfying this expansion condition. We refer to the first layer violating this condition (which is not added to P i anymore) as the rejected layer. We let P denote the partition of the vertices of G computed in this fashion.
To verify that P is indeed an r-partition, we proceed to show that each vertex within a cluster has distance at most r − 1 from the center vertex of that cluster, and that the density ρ(P ) of the partition is at most n^{1+1/r}. Intuitively, the expansion condition in the definition of the layers ensures that a cluster has few layers and that the number of edges that go to unclustered vertices is small.
Consider a cluster P i with center vertex p i and the layers L i,0 , L i,1 , . . . , L i,d . Suppose for the sake of contradiction that d ≥ r. By the definition of the layers in the algorithm we know that |L i,j | ≥ n^{1/r} · Σ_{k=0}^{j−1} |L i,k | holds for all j ∈ [d], implying that |L i,j | ≥ n^{j/r}. Consequently, the size of the cluster satisfies |P i | = Σ_{j=0}^{d} |L i,j | ≥ |L i,0 | + |L i,d | ≥ 1 + n^{r/r} = n + 1, a contradiction.
We now show that ρ(P ) ≤ n^{1+1/r}. The key idea is that the number of vertices in the rejected layer of a cluster P i is at most n^{1/r} · |P i |. Thus the number of edges from P i to clusters that are created later is at most n^{1/r} · |P i |. For every edge between two clusters we let the cluster that is created first account for that edge. Summing over all these edges between clusters yields the desired upper bound ρ(P ) ≤ n · n^{1/r} = n^{1+1/r}.
Using breadth-first search, the partitioning algorithm described above runs in time O(m) (recall that G is assumed to be connected). This completes the proof of the lemma.
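The layer-growing procedure of this proof can be sketched compactly. The following snippet is our own illustrative implementation (the name r_partition is ours; adjacency is assumed to be given as a dict from vertices to neighbour lists), choosing the smallest unclustered vertex as the next center for determinism.

```python
def r_partition(adj, r):
    # Sketch of Lemma 7.1: grow BFS layers around a center as long as the
    # new layer has at least n^(1/r) times as many vertices as all
    # previous layers of the cluster combined; the first layer violating
    # this expansion condition is rejected and left for later rounds.
    n = len(adj)
    expand = n ** (1.0 / r)
    unclustered = set(adj)
    clusters = []
    while unclustered:
        center = min(unclustered)        # any unclustered vertex works
        cluster, frontier = {center}, {center}
        while True:
            layer = {v for u in frontier for v in adj[u]
                     if v in unclustered and v not in cluster}
            if len(layer) < expand * len(cluster):
                break                    # rejected layer
            cluster |= layer
            frontier = layer
        clusters.append((center, cluster))
        unclustered -= cluster
    return clusters
```

Since every frontier is contained in the cluster being built and each vertex is clustered exactly once, each edge is examined a constant number of times, so the running time is linear in m up to set-operation overhead.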
With Lemma 7.1 in hand, we are now ready to prove Theorem 13.
Proof of Theorem 13. Given G = (V, E), we first compute a k-partition P into l clusters as described by Lemma 7.1. We define the set C of contracted edges as the union of all edges within the clusters, C := {{u, v} ∈ E : u, v ∈ P i for some i ∈ [l]}. We thus contract each cluster into a single vertex and remove from every set of resulting parallel edges all but a single edge.
We proceed to show that C is a (2k − 1, 1)-contraction, i.e., we show that dist ℓ C (u, v) ≥ dist ℓ (u, v)/(2k − 1) − 1 for all u, v ∈ V . Consider two vertices u ∈ P i and v ∈ P j , where i and j might be equal. Let Q u,v be the shortest path from u to v in G with edge lengths ℓ C (all edges from C receive length zero). The length d of Q u,v is the number of edges on that path that connect different clusters. Note that Q u,v enters and leaves each of the d + 1 visited clusters at most once, using at most 2k − 2 edges in every cluster, so in G (where all edges have unit lengths) we get dist ℓ (u, v) ≤ d + (d + 1)(2k − 2).
Combining these observations we obtain dist ℓ C (u, v) = d ≥ (dist ℓ (u, v) − (2k − 2))/(2k − 1) ≥ dist ℓ (u, v)/(2k − 1) − 1, proving the claim. It remains to show that the contracted graph G/C has at most n^{1+1/k} edges, which is an immediate consequence of the upper bound m(G/C) = ρ(P ) ≤ n^{1+1/k} given by Lemma 7.1. This completes the proof of the theorem.
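With a partition in hand, the contraction step of this proof amounts to contracting all intra-cluster edges and keeping a single edge per adjacent cluster pair. A minimal sketch (our own helper; it assumes clusters is a list of (center, vertex set) pairs, e.g. as produced by a partition routine, and edges is a list of vertex pairs):

```python
def contract_by_partition(edges, clusters):
    # Contract every edge whose endpoints lie in the same cluster, and
    # keep one super-edge per pair of clusters joined by at least one
    # edge (this removes parallel edges, as in the proof of Theorem 13).
    cid = {}
    for i, (_, cluster) in enumerate(clusters):
        for v in cluster:
            cid[v] = i
    C = [(u, v) for (u, v) in edges if cid[u] == cid[v]]
    super_edges = {frozenset((cid[u], cid[v]))
                   for (u, v) in edges if cid[u] != cid[v]}
    return C, super_edges
```

The number of super-edges equals the density ρ(P ) of the partition, which is what Lemma 7.1 bounds.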
Theorem 15. Assuming Erdős' girth conjecture, there exists for any integer k ≥ 2 a graph G such that any (k − 1, 1)-contraction C results in a graph G/C with Ω(n^{1+1/k}) edges.
Proof. For a given integer k ≥ 2 let G be a graph that is guaranteed by Erdős' girth conjecture, i.e., G has girth 2k + 1 and Ω(n^{1+1/k}) edges. Consider any (k − 1, 1)-contraction C on G, and consider a connected component of the graph (V, C). Applying (1) shows that dist ℓ (u, v) ≤ k − 1 holds for any two vertices u and v in that component. Using that the girth of G is 2k + 1, it follows that for any cycle in G, the connected component of (V, C) does not contain a contiguous segment of cycle edges of length at least half of the cycle. This implies that all connected components of the graph (V, C) are trees with diameter at most k − 1. Therefore, the total number of edges within all connected components of (V, C) is at most n. We will further argue that there is at most one edge between any two connected components. Suppose for the sake of contradiction that there are two components of (V, C) with two different edges connecting them, say {u, v} and {u ′ , v ′ }, where u and u ′ lie in the same connected component and v and v ′ in the other. As the diameter of each component is at most k − 1, it follows that in G there is a path from u to u ′ of length at most k − 1, and a path from v to v ′ of length at most k − 1. Together with the two edges connecting the components we obtain a cycle of length at most 2(k − 1) + 2 = 2k, contradicting the assumption that G has girth 2k + 1. Therefore, the resulting graph after the contraction has Ω(m) = Ω(n^{1+1/k}) edges.
We proceed by proving our asymptotic bounds for contractions with a purely additive error.
Theorem 16. Any graph G has a (1, 2k)-contraction C for any integer 0 ≤ k ≤ n/2 with objective value Φ(C) ≥ km/n that can be computed in time O(m).
This result implies that the number of edges in G/C is at most m − km/n. If G is a path (so that m = n − 1), no (1, 2k)-contraction has an objective value greater than 2k, while km/n = k(1 − 1/n), showing that the objective value in Theorem 16 can be improved by at most a factor of two.
Proof. Let U be the set of k vertices in G of highest degree. Then we have Σ_{u∈U} deg(u) ≥ (k/n) · Σ_{v∈V} deg(v) = (k/n) · 2m = 2km/n.
Let C be the set of edges incident to any vertex in U . As each edge is incident to at most two vertices in U , we get |C| ≥ (1/2) · Σ_{u∈U} deg(u) ≥ km/n from the previous inequality. As no shortest path visits a vertex in U twice, C is indeed a (1, 2k)-contraction. The set C can be computed as follows: We first compute the degrees of all vertices in time O(m), then find the k-th largest element in this list in time O(n), and by another linear time sweep over this list we select k vertices of highest degree. Overall, the required time is O(m).
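The selection step can be sketched as follows; for brevity this illustration uses heapq.nlargest (O(n log k)) in place of the linear-time selection described in the proof.

```python
import heapq

def additive_contraction(adj, k):
    # Sketch of Theorem 16: contract every edge incident to one of the
    # k highest-degree vertices U.  A shortest path visits each vertex
    # of U at most once, losing at most 2 length units per visit, so
    # this yields a (1, 2k)-contraction with |C| >= k*m/n.
    U = set(heapq.nlargest(k, adj, key=lambda v: len(adj[v])))
    C = {frozenset((u, v)) for u in U for v in adj[u]}
    return U, C
```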
All of the results above show that contractions can be effectively used to reduce the number of edges in a dense graph. But one possible advantage of using contraction instead of spanners is that it also has the potential to reduce the number of vertices in the graph. Unfortunately, for constant approximation errors, it is not possible to guarantee more than a constant-factor reduction in general graphs: it is not hard to see that given a path on n vertices, any (k, 1)-contraction will still result in at least n/k vertices. The same problem applies to general dense graphs, since they could still contain a long path within them. That being said, it seems likely that in practice contraction can lead to significant vertex reduction in many dense graphs. We ground this practical intuition with the following theoretical result for the special case of graphs with large minimum degree.
Theorem 17. Let D be an integer. Any graph G with minimum degree at least D has a (5, 1)-contraction C such that the contracted graph G/C has at most n/D vertices, and such a contraction can be computed in time O(m).
Proof. Recall the definition of an r-partition. For a cluster P i with center vertex p i , we refer to the maximum distance of the cluster vertices from p i as the radius of that cluster; in an r-partition, every cluster has radius at most r − 1.
We will show how to construct a 3-partition with at most n/D clusters P i . Using the exact same argument as in the proof of Theorem 13, such a 3-partition yields the desired (5, 1)-contraction. Our construction first builds clusters of radius 1, and then extends them to clusters of radius 2. The clustering with radius 1 proceeds very similarly to the clustering in the proof of Lemma 7.1. The crucial difference is that we choose as center vertices only vertices whose degree in the remaining graph is at least D. If no such vertices are left, the clustering process terminates, and every remaining unclustered vertex has degree strictly less than D in the remaining graph. Since those vertices have degree at least D in the original graph, each of them must be adjacent to a vertex in a radius-1 cluster. We can thus assign each of those vertices to such a cluster arbitrarily, yielding a clustering of all vertices of G with radius 2.
The number of clusters is at most n/D because by construction every cluster contains at least D vertices. This shows that the number of vertices in the contracted graph is at most n/D.
This algorithm can be implemented in time O(m) by using an adjacency list representation where we keep track of degree information after removing an edge from the graph.
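The two-phase construction can be sketched as follows (an illustrative variant of ours, assuming minimum degree at least D, which guarantees that every vertex left over after the first phase already has a clustered neighbour when it is attached):

```python
def degree_clustering(adj, D):
    # Phase 1: any vertex with >= D still-unclustered neighbours becomes
    # a center of a radius-1 cluster containing those neighbours.
    cid, clusters = {}, []
    for p in adj:
        if p in cid:
            continue
        free = [v for v in adj[p] if v not in cid]
        if len(free) >= D:
            i = len(clusters)
            clusters.append({p, *free})
            for v in (p, *free):
                cid[v] = i
    # Phase 2: attach each leftover vertex to an adjacent cluster,
    # growing the radius to at most 2.
    for v in adj:
        if v not in cid:
            anchor = next(u for u in adj[v] if u in cid)
            cid[v] = cid[anchor]
            clusters[cid[anchor]].add(v)
    return clusters
```

Every base cluster contains at least D + 1 vertices, so the number of clusters (and hence of super-vertices) is at most n/D.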
To see that we cannot guarantee fewer than n/D vertices, even with a larger approximation error, consider the graph G that consists of n/D isolated D-cliques. We now show that even if G is connected, we cannot guarantee o(n/D) vertices in the contracted graph, even if we allow a larger (constant) approximation error.
Theorem 18. Let D and k be integers. There exists a graph G with minimum degree D such that any (k, 1)-contraction C results in a graph G/C with Ω(n/(kD)) vertices.
Proof. Assume for simplicity that n is divisible by D. We construct the graph G as follows. We partition the n vertices into n/D layers, with each layer containing exactly D vertices. For 1 ≤ i < n/D, all vertices in layer i receive an edge to all vertices in layer i + 1. Clearly all vertices in the resulting graph have degree at least D. Let u and v be two vertices in layers i and j, respectively. Then clearly we have dist ℓ (u, v) ≥ |j − i|. Now let C be any (k, 1)-contraction on G, and consider the connected components of the graph (V, C). Applying (1) shows that dist ℓ (u, v) ≤ k holds for any two vertices u and v in the same component. Combining these two inequalities shows that every connected component contains vertices from at most k layers. As there are n/D layers, the contracted graph has at least n/(kD) vertices.
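The construction in this proof is easy to generate explicitly; the sketch below (our own helper, layered_graph) builds the n/D layers and completely joins consecutive layers.

```python
def layered_graph(n, D):
    # Lower-bound construction from Theorem 18: n/D layers of D vertices
    # each, with all edges between consecutive layers, so every vertex
    # has degree at least D while the graph has a path-like structure.
    assert n % D == 0
    layers = [list(range(i * D, (i + 1) * D)) for i in range(n // D)]
    adj = {v: [] for v in range(n)}
    for a, b in zip(layers, layers[1:]):
        for u in a:
            for v in b:
                adj[u].append(v)
                adj[v].append(u)
    return adj
```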