Improving TSP tours using dynamic programming over tree decomposition

Given a traveling salesman problem (TSP) tour $H$ in a graph $G$, a $k$-move is an operation which removes $k$ edges from $H$ and adds $k$ edges of $G$ so that a new tour $H'$ is formed. The popular $k$-OPT heuristic for TSP finds a local optimum by starting from an arbitrary tour $H$ and then improving it by a sequence of $k$-moves. Until 2016, the only known algorithm to find an improving $k$-move for a given tour was the naive solution running in $O(n^k)$ time. At ICALP'16 de Berg, Buchin, Jansen and Woeginger showed an $O(n^{\lfloor 2k/3 \rfloor+1})$-time algorithm. We show an algorithm which runs in $O(n^{(1/4+\epsilon_k)k})$ time, where $\lim \epsilon_k = 0$. It improves over the state of the art for every $k=5,\ldots,10$. For the most practically relevant case $k=5$ we provide a slightly refined algorithm running in $O(n^{3.4})$ time. We also show that for the $k=4$ case, improving over the $O(n^3)$-time algorithm of de Berg et al. would be a major breakthrough: an $O(n^{3-\epsilon})$-time algorithm for any $\epsilon>0$ would imply an $O(n^{3-\delta})$-time algorithm for the ALL PAIRS SHORTEST PATHS problem, for some $\delta>0$.


Introduction
In the Traveling Salesman Problem (TSP) one is given a complete graph G = (V, E) and a weight function w : E → N. The goal is to find a Hamiltonian cycle in G (also called a tour) of minimum weight. This is one of the central problems in computer science and operations research. It is well known to be NP-hard and has been researched from different perspectives, most notably using approximation [1,4,24], exponential-time algorithms [12,15] and heuristics [23,20,5].
In practice, TSP is often solved by means of local search heuristics: we begin with an arbitrary Hamiltonian cycle in G, and then the cycle is modified by means of some local changes in a series of steps. After each step the weight of the cycle should improve; when the algorithm cannot find any improvement, it stops. One of the most successful examples of this approach is the k-opt heuristic, where in each step an improving k-move is performed. Given a Hamiltonian cycle H in a graph G = (V, E), a k-move is an operation that removes k edges from H and adds k edges of G so that the resulting set of edges H′ is a new Hamiltonian cycle. The k-move is improving if the weight of H′ is smaller than the weight of H. The k-opt heuristic was introduced in 1958 by Croes [5] for k = 2, and then applied for k = 3 by Lin [19] in 1965. In 1972 Lin and Kernighan designed a complicated heuristic which uses k-moves for unbounded values of k, though it restricts the search space to so-called sequential k-moves. A variant of this heuristic called LKH, implemented by Helsgaun [13], solves optimally instances with up to 85,900 cities. Among other modifications, the variant searches for non-sequential 4- and 5-moves. From the theory perspective, the quality of the solutions returned by k-opt, as well as the length of the sequence of k-moves needed to find a local optimum, was studied, among others, by Johnson, Papadimitriou and Yannakakis [14], Krentel [17] and Chandra, Karloff and Tovey [3]. More recently, smoothed analysis of the running time and approximation ratio was investigated by Manthey and Veenstra [18] and Künnemann and Manthey [21].
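To make the local-search step concrete, the smallest case k = 2 can be implemented by exhaustive search, with the reconnection realized as a segment reversal. The sketch below is our own illustration (not code from the paper); it assumes a tour given as a vertex list and a symmetric weight matrix:

```python
import itertools

def tour_weight(tour, w):
    """Total weight of a tour given as a list of vertices; w is a weight matrix."""
    return sum(w[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def improving_2_move(tour, w):
    """Naive O(n^2) search for an improving 2-move: remove the tour edges
    (tour[i], tour[i+1]) and (tour[j], tour[j+1]) and reconnect by reversing
    the segment between them; return the improved tour, or None if the tour
    is 2-opt locally optimal."""
    n = len(tour)
    for i, j in itertools.combinations(range(n), 2):
        a, b = tour[i], tour[(i + 1) % n]
        c, d = tour[j], tour[(j + 1) % n]
        if len({a, b, c, d}) < 4:      # the two removed edges must be disjoint
            continue
        # gain = weight of removed edges minus weight of added edges
        if w[a][b] + w[c][d] > w[a][c] + w[b][d]:
            return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
    return None
```

A full 2-opt run would simply repeat this step until it returns None.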
In this paper we study the k-opt heuristic, but we focus on its basic ingredient, namely finding a single improving k-move. The decision problem k-opt Detection is to decide, given a tour H in an edge-weighted complete graph G, if there is an improving k-move. In its optimization version, called k-opt Optimization, the goal is to find a k-move that gives the largest weight improvement, if any. Unfortunately, this is a computationally hard problem. Namely, Marx [22] has shown that k-opt Detection is W[1]-hard, which means that it is unlikely to be solvable in f(k)·n^{O(1)} time for any function f. Later Guo, Hartung, Niedermeier and Suchý [11] proved that there is no algorithm running in time n^{o(k/log k)}, unless the Exponential Time Hypothesis (ETH) fails. This explains why in practice people use exhaustive search running in O(n^k) time for every fixed k, or faster algorithms which explore only a very restricted subset of all possible k-moves.
Recently, de Berg, Buchin, Jansen and Woeginger [7] have shown that it is possible to improve over the naive exhaustive search. For every fixed k ≥ 3 their algorithm runs in time O(n^{⌊2k/3⌋+1}) and uses O(n) space. In particular, it gives O(n^3) time for k = 4. Thus, the algorithm of de Berg et al. is of high practical interest: the complexity of the k = 4 case now matches the complexity of the k = 3 case, and hence it seems that one can use 4-opt in all the applications where 3-opt was fast enough. De Berg et al. show also that progress for k = 3 is unlikely, namely they show that k-opt Detection has an O(n^{3−ε})-time algorithm for some ε > 0 iff the All Pairs Shortest Paths problem can be solved in O(n^{3−δ}) time for some δ > 0.
Our Results. In this paper we extend the line of research started in [7]: we show an algorithm running in time O(n^{(1/4+ε_k)k}) and using space O(n^{(1/8+ε_k)k}) for every fixed k, where lim ε_k = 0. We are able to compute the values of ε_k for k ≤ 10. These values show that our algorithm improves the state of the art for every k = 5, . . . , 10 (see Table 1). A different adjustment of the parameters of our algorithm results in time O(n^{k/2+3/2}) and additional space of O(√n), which improves the state of the art for every k ≥ 8.
We also show a good reason why we could not improve over the O(n^3)-time algorithm of de Berg et al. for 4-opt Optimization: an O(n^{3−ε})-time algorithm for some ε > 0 would imply that All Pairs Shortest Paths can be solved in time O(n^{3−δ}) for some δ > 0. Note that although the family of 4-moves contains all 3-moves, it is still possible that there is no improving 3-move, but there is an improving 4-move. Thus the previous lower bound of de Berg et al. does not imply our lower bound, though our reduction is essentially an extension of the one by de Berg et al. [7] with a few additional technical tricks.
We also devote special attention to the k = 5 case of the k-opt Optimization problem, hoping that it can still be of practical interest. Our generic algorithm works in O(n^{3.67}) time in this case, and we refine it further to run in O(n^{3.4}) time; similar refinements should also be possible for larger values of k. In Table 1 we present the running times for k = 5, . . . , 10.
Our Approach. Our algorithm applies dynamic programming on a tree decomposition. This is a standard method for dealing with some sparse graphs, like series-parallel graphs or outerplanar graphs. However, in our case we work with complete graphs. The trick is to work on an implicit structure, called the dependence graph D. Graph D has k vertices which correspond to the k edges of H that are chosen to be removed. A subset of edges of D corresponds to the pattern of edges to be added (as we will see, the number of such patterns is bounded for every fixed k, and one can iterate over all patterns). The dependence graph can be thought of as a sketch of the solution, which needs to be embedded in the input graph G. Graph D is designed so that if it has a separator S, such that D − S falls apart into two parts A and B, then once we find an optimal embedding of A ∪ S for some fixed embedding of S, one can forget about the embedding of A. This intuition can be formalized as dynamic programming on a tree decomposition of D, which is basically a tree of separators in D. The idea sketched above leads to an algorithm running in time O(n^{(1/3+ε_k)k}) for every fixed k, where lim ε_k = 0.
The reason for the exponent in the running time is that D is of maximum degree 4 and hence it has treewidth at most (1/3 + ε_k)k, as shown by Fomin et al. [8].
The further improvement to O(n^{(1/4+ε_k)k}) is obtained by yet another idea. We partition the n edges of H into n^{1/4} buckets of size n^{3/4} and we consider all possible distributions of the k edges to remove into buckets. If there are many nonempty buckets, then the graph D has fewer edges, because some dependencies are forced by putting the corresponding edges into different buckets. As a result, the treewidth of D decreases and the dynamic programming runs faster. The case when there are few nonempty buckets does not give a large speed-up in the dynamic programming, but the number of such distributions is small.

Preliminaries
Throughout the paper let w_1, w_2, . . . , w_n and e_1, . . . , e_n be the sequences of, respectively, subsequent vertices and edges visited by H, so that e_i = {w_i, w_{i+1}} for i = 1, . . . , n−1 and e_n = {w_n, w_1}. For i = 1, . . . , n−1 we call w_i the left endpoint of e_i and w_{i+1} the right endpoint of e_i. Also, w_n is the left endpoint of e_n and w_1 is its right endpoint.
We work with undirected graphs in this paper. An edge between vertices u and v is denoted either as {u, v} or shortly as uv.

Connection patterns and embeddings
Formally, a k-move is a pair of sets (E−, E+), both of cardinality k, where E− ⊆ {e_1, . . . , e_n}, E+ ⊆ E(G), and (E(H) \ E−) ∪ E+ is a Hamiltonian cycle. This is the most intuitive definition of a k-move; however, it has a drawback, namely that it is impossible to specify E+ without specifying E− first. For this reason, instead of listing the edges of E+ explicitly, we will define a connection pattern, which together with E− expressed as an embedding fully specifies a k-move.
A k-embedding (or shortly: embedding) is any function f : [k] → [n]. A connection k-pattern (or shortly: connection pattern) is any perfect matching M in the complete graph on the vertex set [2k]. We call a connection pattern M valid when one obtains a single cycle from M by identifying vertex 2i with vertex (2i + 1) mod 2k for every i = 1, . . . , k.
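The validity condition is easy to test mechanically. The following sketch (our own illustration, using the 1-based labels of the text) contracts each identified pair {2i, (2i + 1) mod 2k} to a single vertex i and checks that the matching edges then form one cycle on [k]:

```python
from collections import defaultdict

def is_valid_connection_pattern(M, k):
    """True iff the perfect matching M on {1,...,2k} becomes a single cycle
    after identifying vertex 2i with vertex (2i+1) mod 2k for i = 1..k."""
    def cls(x):
        # index i of the identified pair {2i, (2i+1) mod 2k} containing x
        if x % 2 == 0:
            return x // 2
        return k if x == 1 else (x - 1) // 2

    deg, adj = defaultdict(int), defaultdict(list)
    for a, b in M:
        u, v = cls(a), cls(b)
        if u == v:
            return False            # self-loop: cannot lie on a single cycle
        deg[u] += 1; deg[v] += 1
        adj[u].append(v); adj[v].append(u)
    if any(deg[i] != 2 for i in range(1, k + 1)):
        return False
    # k vertices, k edges, all degrees 2: a single cycle iff connected
    seen, stack = {1}, [1]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v); stack.append(v)
    return len(seen) == k
```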
Let us show that every pair (E−, E+) that defines a k-move has a corresponding pair of an embedding and a connection pattern, consequently giving an intuitive explanation of the above definition of embeddings and connection patterns. Consider a k-move Q = (E−, E+). Let E− = {e_{i_1}, . . . , e_{i_k}}, where i_1 < i_2 < · · · < i_k. For every j = 1, . . . , k, let v_{2j−1} and v_{2j} be the left and right endpoint of e_{i_j}, respectively. An embedding of the k-move Q is the function f : [k] → [n] defined by f(j) = i_j for every j = 1, . . . , k; note that it is increasing. A connection pattern of Q is any perfect matching M on [2k] such that E+ = {{v_a, v_b} : {a, b} ∈ M}. Note that at least one such matching always exists, and if E− contains two incident edges then there is more than one such matching. Note also that M is valid, because otherwise after applying the k-move Q we would not get a Hamiltonian cycle.
Conversely, consider a pair (f, M), where f is an increasing embedding and M is a valid connection pattern. We define E−_f = {e_{f(j)} | j = 1, . . . , k}. For every j = 1, . . . , k, let v_{2j−1} and v_{2j} be the left and right endpoint of e_{f(j)}, respectively. Then we also define E+_{f,M} = {{v_a, v_b} : {a, b} ∈ M}, and one can verify that (E−_f, E+_{f,M}) is a k-move. Because of the equivalence shown above, in what follows we abuse the notation slightly and a k-move Q can be described both by a pair of sets of edges to remove and add (E−_Q, E+_Q) and by an embedding-connection pattern pair (f_Q, M_Q). The gain of Q is defined as gain(Q) = w(E−_Q) − w(E+_Q), where for a set of edges F we put w(F) = Σ_{e∈F} w(e). Given a connection pattern M and an embedding f, we can also define the M-gain of f, denoted by gain_M(f) = gain(Q), where Q is the k-move defined by (f, M). Note that k-opt Optimization asks for a k-move with maximum gain.
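Under this correspondence the gain is straightforward to evaluate. A small helper (again our own illustration, assuming a tour stored as a vertex list and a symmetric weight matrix w) computes gain(Q) = w(E−_Q) − w(E+_Q) from an increasing embedding f and a connection pattern M:

```python
def k_move_gain(tour, w, f, M):
    """Gain w(E-) - w(E+) of the k-move given by the increasing embedding f
    (1-based indices of the tour edges e_1..e_n, with e_i = {w_i, w_{i+1}})
    and a connection pattern M on {1,...,2k}: vertices 2j-1 and 2j of M stand
    for the left and right endpoints of the j-th removed edge."""
    n = len(tour)
    v = {}
    for j, idx in enumerate(f, start=1):
        v[2 * j - 1] = tour[idx - 1]   # left endpoint w_idx
        v[2 * j] = tour[idx % n]       # right endpoint w_{idx+1} (w_1 if idx = n)
    removed = sum(w[v[2 * j - 1]][v[2 * j]] for j in range(1, len(f) + 1))
    added = sum(w[v[a]][v[b]] for a, b in M)
    return removed - added
```

For example, a 2-move is improving exactly when this value is positive.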
We note that the notion of a connection pattern of a k-move was essentially introduced by de Berg et al. [7] under the name of 'signature', though they used a permutation instead of a matching; we find the matching formulation more natural. They also show that one can reduce the problem k-opt Optimization so that it suffices to consider only k-moves where E− contains pairwise non-incident edges (this assumption makes the connection pattern of a k-move unique), but we do not find it helpful in the description of our algorithm.

Tree decomposition and nice tree decomposition
To make the paper self-contained, in this section we recall the definitions of tree and path decompositions and state their basic properties which will be used later in the paper. The content of this section comes from the textbook of Cygan et al. [6].
A tree decomposition of a graph G is a pair T = (T, {X_t}_{t∈V(T)}), where T is a tree whose every node t is assigned a vertex subset X_t ⊆ V(G), called a bag, such that the following three conditions hold:
(T1) ⋃_{t∈V(T)} X_t = V(G), i.e., every vertex of G belongs to at least one bag.
(T2) For every uv ∈ E(G), there exists a node t of T such that u, v ∈ X_t.
(T3) For every u ∈ V(G), the set {t ∈ V(T) | u ∈ X_t} induces a connected subtree of T.
The width of a tree decomposition T = (T, {X_t}_{t∈V(T)}), denoted by w(T), equals max_{t∈V(T)} |X_t| − 1. The treewidth of a graph G, denoted by tw(G), is the minimum possible width of a tree decomposition of G. When E is a set of edges and V(E) is the set of endpoints of all edges in E, by tw(E) we denote the treewidth of the graph (V(E), E).
A path decomposition is a tree decomposition T = (T, {X_t}_{t∈V(T)}) where T is a path. Then T is more conveniently represented by the sequence of bags (X_1, . . . , X_{|V(T)|}) corresponding to successive vertices of the path. The pathwidth of a graph G, denoted by pw(G), is the minimum possible width of a path decomposition of G.
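Conditions (T1)-(T3) can also be verified programmatically, which may help in digesting them. The sketch below is our own illustration, with bags given as a dictionary and the tree as an edge list (it does not re-check that T itself is a tree):

```python
def is_tree_decomposition(graph_edges, vertices, tree_edges, bags):
    """Verify conditions (T1)-(T3) for a candidate tree decomposition.
    bags: dict mapping tree node -> set of graph vertices."""
    nodes = list(bags)
    # (T1): every vertex of G appears in at least one bag
    if any(all(v not in bags[t] for t in nodes) for v in vertices):
        return False
    # (T2): both endpoints of every edge of G share some bag
    if any(all(not (u in bags[t] and v in bags[t]) for t in nodes)
           for u, v in graph_edges):
        return False
    # (T3): the nodes whose bags contain v induce a connected subtree of T
    for v in vertices:
        occ = {t for t in nodes if v in bags[t]}
        seen, stack = set(), [next(iter(occ))]
        while stack:
            t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            stack.extend(b if a == t else a
                         for a, b in tree_edges
                         if (a == t and b in occ) or (b == t and a in occ))
        if seen != occ:
            return False
    return True
```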
In what follows we frequently use the notion of a nice tree decomposition, introduced by Kloks [16]. These tree decompositions are more structured, making it easier to describe dynamic programming over the decomposition. A tree decomposition T = (T, {X_t}_{t∈V(T)}) can be rooted by choosing a node r ∈ V(T), called the root of T, which introduces natural parent-child and ancestor-descendant relations in the tree T. A rooted tree decomposition (T, {X_t}_{t∈V(T)}) is nice if X_r = ∅, X_ℓ = ∅ for every leaf ℓ of T, and every non-leaf node of T is of one of the following three types: • Introduce node: a node t with exactly one child t′ such that X_t = X_{t′} ∪ {w} for some vertex w ∉ X_{t′}.
• Forget node: a node t with exactly one child t′ such that X_t = X_{t′} \ {w} for some vertex w ∈ X_{t′}.
• Join node: a node t with two children t_1, t_2 such that X_t = X_{t_1} = X_{t_2}.
A path decomposition is nice when it is nice as a tree decomposition after rooting the path at one of its endpoints. (Note that it then contains no join nodes.)

Proposition 1 (see Lemma 7.4 in [6]). Given a tree (resp. path) decomposition T = (T, {X_t}_{t∈V(T)}) of G of width at most k, one can in time O(k^2 · max(|V(T)|, |V(G)|)) compute a nice tree (resp. path) decomposition of G of width at most k that has at most O(k|V(G)|) nodes.
Lemma 2 (see Lemma 7.3 in [6]). Let T = (T, {X_t}_{t∈V(T)}) be a tree decomposition of a graph G and let ab be an edge of T. The forest T − ab obtained from T by deleting edge ab consists of two connected components T_a (containing a) and T_b (containing b). Then the set X_a ∩ X_b separates ⋃_{t∈V(T_a)} X_t from ⋃_{t∈V(T_b)} X_t in G, i.e., every path in G between these two vertex sets contains a vertex of X_a ∩ X_b.

The algorithm
In this section we present our algorithms for k-opt Optimization. The brute-force algorithm verifies all possible k-moves. In other words, it iterates over all possible valid connection patterns and increasing embeddings. The brilliant observation of de Berg et al. [7] is that we can iterate only over all possible connection patterns, whose number is bounded by (2k)!. In other words, we fix a valid connection pattern M and from now on, our goal is to find an increasing embedding f : [k] → [n] which, together with M, defines a k-move giving the largest weight improvement over all k-moves with connection pattern M. Instead of doing this by enumerating all Θ(n^k) embeddings, de Berg et al. [7] fix carefully selected ⌊2k/3⌋ values of f in all n^{⌊2k/3⌋} possible ways, and then show that the optimal choice of the remaining values can be found by a simple dynamic programming running in O(nk) time. Our idea is to find the optimal embedding for a given connection pattern using a different, more efficient approach.

Basic setup
Informally speaking, instead of guessing some values of f, we guess an approximation of f defined by appropriate bucketing. For each approximation b, finding an optimal embedding consistent with b is done by dynamic programming over a tree decomposition. We would like to note that even without bucketing (i.e., by using a single trivial bucket of size n) our algorithm works in n^{(1/3+ε_k)k} time. Therefore the notion of bucketing is used to further improve the running time, but it is not essential for performing the dynamic programming on a tree decomposition.
More precisely, we partition the set [n], corresponding to the edges of H, into buckets. Each bucket is an interval {i, i + 1, . . . , j} ⊆ [n], for some 1 ≤ i ≤ j ≤ n. Let n_b be the number of buckets and let B_j denote the j-th bucket, for j = 1, . . . , n_b. A bucket assignment is any nondecreasing function b : [k] → [n_b]. Unless explicitly modified, we use all buckets of the same size ⌈n^α⌉, for a constant α which we set later; then, for j = 1, . . . , n_b, the j-th bucket is the set B_j = {(j − 1)⌈n^α⌉ + 1, . . . , min(j⌈n^α⌉, n)}. Given a bucket assignment b, we say that a partial embedding f : S → [n], where S ⊆ [k], is b-monotone if
(M1) f(j) ∈ B_{b(j)} for every j ∈ S, and
(M2) f(j) < f(j + 1) for every pair {j, j + 1} ⊆ S with b(j) = b(j + 1).
Note that a b-monotone embedding f : [k] → [n] is always increasing, but a b-monotone partial embedding does not even need to be non-decreasing (this seemingly artificial design simplifies some of our proofs). In what follows, we present an efficient dynamic programming (DP) algorithm which, given a valid connection pattern M and a bucket assignment b, finds a b-monotone embedding of maximum M-gain. To this end, we need to introduce the gain of a partial embedding. Let f : S → [n] be a b-monotone partial embedding, for some S ⊆ [k]. For every j ∈ S, let v_{2j−1} and v_{2j} be the left and right endpoint of e_{f(j)}, respectively. We define
gain_M(f) = Σ_{j∈S} w(e_{f(j)}) − Σ { w({v_a, v_b}) : {a, b} ∈ M, ⌈a/2⌉ ∈ S and ⌈b/2⌉ ∈ S }.
Note that gain_M(f) does not necessarily represent the actual cost gain of the choice of the edges to remove represented by f. Indeed, assume that for some pair {a, b} ∈ M we have i = ⌈a/2⌉ ∈ S and j = ⌈b/2⌉ ∉ S. Then we say that i interferes with j, which means that we plan to add an edge between an endpoint of the i-th deleted edge and an endpoint of the j-th deleted edge. Note that since i ∈ S (the i-th edge is chosen) and j ∉ S (the j-th edge is not chosen yet), this edge to be added is not known yet, and its cost is not represented in gain_M(f); however, the value of f(i) influences this cost. Consider the following set of interfering pairs:
I_M = {{⌈a/2⌉, ⌈b/2⌉} : {a, b} ∈ M, ⌈a/2⌉ ≠ ⌈b/2⌉}.
Note that I_M is obtained from M by identifying vertex 2i − 1 with vertex 2i for every i = 1, . . . , k (and the new vertex is simply called i). In particular, this implies the following simple property of I_M.
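Conditions (M1) and (M2) translate directly into code. The following checker is our own illustration, with the partial embedding f given as a dictionary from a subset of [k] to [n]:

```python
def is_b_monotone(f, b, buckets):
    """Check that a partial embedding f is b-monotone:
    (M1) f(j) lies in bucket B_{b(j)}, and
    (M2) f(j) < f(j+1) whenever j and j+1 are both in the domain of f
         and are assigned to the same bucket.
    b: dict vertex -> bucket index; buckets: dict bucket index -> set."""
    if any(f[j] not in buckets[b[j]] for j in f):
        return False                                          # (M1) violated
    return all(f[j] < f[j + 1]
               for j in f if j + 1 in f and b[j] == b[j + 1])  # (M2)
```

Note that a partial embedding defined on non-consecutive indices of one bucket can be decreasing and still pass this check, in line with the remark above.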

Dynamic programming over tree decomposition
Given a connection pattern M and a bucket assignment b, we define the dependence multigraph D_{M,b} = ([k], I_M ∪ O_b), where O_b = {{i, i + 1} : 1 ≤ i < k, b(i) = b(i + 1)}. The vertices of the graph correspond to the k edges to be removed from H (i.e., j corresponds to the j-th deleted edge in the sequence e_1, . . . , e_n). The edges of D_{M,b} correspond to dependencies between the edges to remove (equivalently, elements of the domain of an embedding). The edges from O_b are order dependencies: edge {i, i + 1} means that the (i + 1)-th deleted edge should appear further on H than the i-th deleted edge. (Note that in O_b there are no edges between the last element of a bucket and the first element of the next bucket; this is because the corresponding constraint is forced by the assignment to buckets.) The edges from I_M are cost dependencies (resulting from interference explained in Section 3.1).
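For concreteness, the edge multiset of D_{M,b} can be generated as follows (our own sketch; edges are returned as a list, so parallel edges of the multigraph are kept):

```python
def dependence_edges(M, bucket_of, k):
    """Edge multiset I_M ∪ O_b of the dependence multigraph D_{M,b} on [k].
    I_M: contract endpoints 2i-1 and 2i of the i-th removed edge to vertex i.
    O_b: order edges {i, i+1} for consecutive removed edges that b puts into
    the same bucket (bucket_of: dict j -> b(j))."""
    I_M = []
    for a, b in M:
        u, v = (a + 1) // 2, (b + 1) // 2   # endpoint 2i-1 or 2i -> vertex i
        if u != v:                           # pairs inside one removed edge vanish
            I_M.append((min(u, v), max(u, v)))
    O_b = [(i, i + 1) for i in range(1, k) if bucket_of[i] == bucket_of[i + 1]]
    return I_M + O_b
```

The treewidth of the resulting graph is what governs the cost of the dynamic programming below.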
The goal of this section is a proof of the following theorem.

Theorem 4. Let M be a valid connection pattern and let b be a bucket assignment with every bucket of size at most s. Then a b-monotone embedding of maximum M-gain can be found in time O*(s^{tw(D_{M,b})+1}).

Let T = (T, {X_t}_{t∈V(T)}) be a nice tree decomposition of D_{M,b} of minimum width. Such a decomposition can be found in O*(1.7347^k) time by an algorithm of Fomin and Villanger [10], though for practical purposes a simpler O*(2^k)-time algorithm is advised by Bodlaender et al. [2]. For every t ∈ V(T) we denote by V_t the union of all the bags in the subtree of T rooted at t.
For every node t ∈ V(T), and for every b-monotone function f : X_t → [n], we will compute the following value:
T_t[f] = max { gain_M(g) | g : V_t → [n] is b-monotone and g|_{X_t} = f }.
Then, if r is the root of T, and ∅ denotes the unique partial embedding with empty domain, T_r[∅] is the required maximum M-gain of a b-monotone embedding. The embedding itself (and hence the corresponding k-move) can also be found using standard DP techniques. The values of T_t[f] are computed in a bottom-up fashion. Let us now present the formulas for computing these values, depending on the kind of node in the tree T.
Leaf node. When t is a leaf of T, we know that X_t = V_t = ∅, and we just put T_t[∅] = 0.
Introduce node. Assume X_t = X_{t′} ∪ {i}, for some i ∉ X_{t′}, where t′ is the only child of t. Then, we claim that for every b-monotone function f : X_t → [n],
T_t[f] = T_{t′}[f|_{X_{t′}}] + gain_M(f) − gain_M(f|_{X_{t′}}).     (1)
We show that (1) holds by showing the two relevant inequalities. Let g be a function for which the maximum in the definition of T_t[f] is attained, and let g′ = g|_{V_{t′}}. Then g′ is b-monotone and g′|_{X_{t′}} = f|_{X_{t′}}. Moreover, by Lemma 2 every neighbor of i in D_{M,b} that belongs to V_{t′} lies in X_{t′}, so gain_M(g) = gain_M(g′) + gain_M(f) − gain_M(f|_{X_{t′}}), and hence T_t[f] ≤ T_{t′}[f|_{X_{t′}}] + gain_M(f) − gain_M(f|_{X_{t′}}). Now we proceed to the other inequality. Assume g′ is a function for which the maximum in the definition of T_{t′}[f|_{X_{t′}}] is attained, and extend it to g : V_t → [n] by setting g(i) = f(i). Then g is b-monotone, and by the same argument as above gain_M(g) = gain_M(g′) + gain_M(f) − gain_M(f|_{X_{t′}}), which gives the desired lower bound on T_t[f]. This finishes the proof that (1) holds.
Forget node. Assume X_t = X_{t′} \ {i}, for some i ∈ X_{t′}, where node t′ is the only child of t. Then the definition of T_t[f] implies that
T_t[f] = max { T_{t′}[f′] | f′ : X_{t′} → [n] is b-monotone and f′|_{X_t} = f }.     (2)
Join node. Assume X_t = X_{t_1} = X_{t_2}, for some nodes t, t_1 and t_2, where t_1 and t_2 are the only children of t. Then, we claim that for every b-monotone function f : X_t → [n],
T_t[f] = T_{t_1}[f] + T_{t_2}[f] − gain_M(f).     (3)
Let us first show the ≤ inequality. Let g be a function for which the maximum in the definition of T_t[f] is attained. Let g_1 = g|_{V_{t_1}} and g_2 = g|_{V_{t_2}}. Note that g_1 and g_2 are b-monotone because g is b-monotone. This, together with the fact that g_1|_{X_{t_1}} = g_2|_{X_{t_2}} = f and that, by Lemma 2, no edge of D_{M,b} joins V_{t_1} \ X_t with V_{t_2} \ X_t, implies that gain_M(g) = gain_M(g_1) + gain_M(g_2) − gain_M(f), and hence T_t[f] ≤ T_{t_1}[f] + T_{t_2}[f] − gain_M(f). Now we proceed to the ≥ inequality. Assume g_1 (resp. g_2) is a function for which the maximum in the definition of T_{t_1}[f] (resp. T_{t_2}[f]) is attained. Let g : V_t → [n] be the function such that g|_{V_{t_1}} = g_1 and g|_{V_{t_2}} = g_2; it is well defined since V_{t_1} ∩ V_{t_2} = X_t and g_1|_{X_t} = g_2|_{X_t} = f. Note that g|_{X_t} = f. Then gain_M(g) = gain_M(g_1) + gain_M(g_2) − gain_M(f), and it remains to check that g is b-monotone. Condition (M1) is immediate, since g_1 and g_2 are b-monotone. For (M2), consider any {j, j + 1} ∈ O_b such that {j, j + 1} ⊆ V_t. If {j, j + 1} ⊆ V_{t_1} or {j, j + 1} ⊆ V_{t_2}, then g(j) < g(j + 1) by b-monotonicity of g_1 or g_2, respectively. Hence, by symmetry, we can assume j ∈ V_{t_1} \ V_{t_2} and j + 1 ∈ V_{t_2} \ V_{t_1}. However, this cannot happen, because then X_t would not separate j from j + 1, a contradiction with Lemma 2.
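The recurrences can be exercised on a toy abstraction of the problem. The sketch below is our own (it is not the paper's implementation): it runs the introduce and forget steps over a nice path decomposition, with generic gain/cost callbacks standing in for the tour-dependent terms of gain_M; join nodes are omitted for brevity:

```python
def dp_path_decomposition(edges, domains, vertex_gain, edge_cost, ops):
    """DP over a nice path decomposition given as a sequence of operations
    ('introduce', i) / ('forget', i); every vertex is introduced and later
    forgotten exactly once, and both endpoints of every edge share some bag.
    Maximizes  sum_i vertex_gain(i, f(i)) - sum_{ij in edges} edge_cost(i, j, f(i), f(j))
    over assignments f with f(i) in domains[i] (edge_cost assumed symmetric)."""
    edge_set = {frozenset(e) for e in edges}
    tables = {frozenset(): 0.0}   # key: current bag assignment as {(vertex, value)}
    for op, i in ops:
        new = {}
        if op == 'introduce':
            # introduce step: extend f by a value for i, adding i's gain and
            # subtracting costs of edges between i and the rest of the bag
            for key, val in tables.items():
                for x in domains[i]:
                    delta = vertex_gain(i, x)
                    for j, y in key:
                        if frozenset((i, j)) in edge_set:
                            delta -= edge_cost(i, j, x, y)
                    nk = key | {(i, x)}
                    new[nk] = max(new.get(nk, float('-inf')), val + delta)
        else:  # 'forget' step: maximize over the value assigned to i
            for key, val in tables.items():
                nk = frozenset(p for p in key if p[0] != i)
                new[nk] = max(new.get(nk, float('-inf')), val)
        tables = new
    return tables[frozenset()]    # all vertices forgotten: the optimum
```

The running time is governed by the product of the domain sizes inside each bag, exactly as in the analysis below.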
Running time. Since |V(T)| = O(k), in order to complete the proof of Theorem 4 it suffices to prove the following lemma.

Lemma 5. Let t be a node of T and, for every i ∈ [k], let s_i = |B_{b(i)}|. Given the tables of the children of t, all values T_t[f] for b-monotone functions f : X_t → [n] can be computed in time O(k · Π_{i∈X_t} s_i) if t is a leaf, an introduce node or a join node, and in time O(k · Π_{i∈X_{t′}} s_i) if t is a forget node with child t′.

Proof. For a leaf node the claim is trivial. For an introduce node, a direct evaluation of (1) for all b-monotone functions f : X_t → [n] takes O(k · Π_{i∈X_t} s_i) time, since by (M1) there are at most Π_{i∈X_t} s_i such functions and each of them is processed in O(k) time. For a forget node, a direct evaluation of (2) for all b-monotone functions f : X_t → [n] amounts to scanning all b-monotone functions f′ : X_{t′} → [n], which takes O(k · Π_{i∈X_{t′}} s_i) time. Finally, for a join node a direct evaluation of (3) takes O(k · Π_{i∈X_t} s_i) time.

An algorithm running in time O(n^{(1/3+ε)k}) for k large enough

We will make use of the following theorem due to Fomin, Gaspers, Saurabh, and Stepanov [8].
Theorem 6 (Fomin et al. [8]). For any ε > 0, there exists an integer n_ε such that for every graph G with n > n_ε vertices,
pw(G) ≤ (1/6)n_3 + (1/3)n_4 + (13/30)n_5 + (23/45)n_6 + n_{≥7} + εn,
where n_i is the number of vertices of degree i in G for any i ∈ {3, . . . , 6} and n_{≥7} is the number of vertices of degree at least 7.
We actually use the following corollary, which is rather immediate.
Corollary 7. For any ε > 0, there exists an integer n_ε such that for every multigraph G with n > n_ε vertices and m edges in which every vertex v ∈ V(G) satisfies 2 ≤ deg_G(v) ≤ 4, the pathwidth of G is at most (m − n)/3 + εn.
Proof. Suppressing a vertex of degree 2 (i.e., replacing it and its two incident edges by a single edge) changes neither the pathwidth for n large enough nor the quantity m − n, so we may assume that every vertex of G has degree 3 or 4. Then the corollary follows from Theorem 6 by the following chain:
pw(G) ≤ (1/6)n_3 + (1/3)n_4 + εn = (n_3 + 2n_4)/6 + εn = (Σ_{v∈V(G)} (deg_G(v) − 2))/6 + εn = (2m − 2n)/6 + εn = (m − n)/3 + εn.
By Lemma 8 it follows that the running time in Theorem 4 is bounded by O(n^{(α/3+ε)k}). If we do not use the buckets at all, i.e., α = 1 and we have one big bucket of size n, we get the O(n^{(1/3+ε)k}) bound. By iterating over all at most (2k)! connection patterns we get the following result, which already improves over the state of the art for large enough k.

An algorithm running in time O(n^{(1/4+ε)k}) for k large enough
Let M_k be the set of all valid connection k-patterns.
For every M ∈ M_k, the optimal value of α_M can be found by a simple LP (see Section 3.6). The claim follows.

Saving space
The algorithm from Theorem 11, as described above, uses O(n^{(1/4+ε_k)k}) space. However, a closer look reveals that the space can be decreased to O(n^{(1/8+ε_k)k}). This is done by exploiting some properties of the specific tree decomposition of graphs of maximum degree 4, described by Fomin et al. [8], which we used in Theorem 6. This decomposition is obtained as follows. Let D be a k-vertex graph of maximum degree 4. As long as D contains a vertex v of degree 4, we remove v. As a result we get a set of removed vertices S and a subgraph D′ = D − S of maximum degree 3. Then we construct a tree decomposition T′ of D′, of width at most (1/6 + ε_k)k, given in the paper of Fomin and Høie [9]. The tree decomposition T of D is then obtained by adding S to every bag of T′. An inductive argument (see [8]) shows that the width of T is at most (1/3)k_4 + (1/6)k_3 + ε_k·k, where k_3 and k_4 denote the numbers of vertices of degree 3 and 4 in D. We suppose that more space/time trade-offs are possible by finding small sets of vertices whose removal makes the width of the tree decomposition small.

Small values of k
The value of c(k) in Lemma 10 can be computed using a computer program for small values of k, by enumerating all connection patterns and using formula (5) to find the optimum α. We used a C++ implementation (see http://www.mimuw.edu.pl/˜kowalik/localtsp/localtsp.cpp for the source code) including a simple O(2^k)-time dynamic programming for computing treewidth described in the work of Bodlaender et al. [2]. For every valid connection pattern M our program finds the value of min_{α∈[0,1]} max_{A⊆P_k} ((1 − α)(k − |A|) + α(tw(I_M ∪ A) + 1)) by solving a simple linear program, as follows.
minimize v
subject to v ≥ (1 − α)(k − |A|) + α(tw(I_M ∪ A) + 1) for every A ⊆ P_k,
0 ≤ α ≤ 1.
We get the running times for k = 5, . . . , 10 described in Table 2. It turns out that for k = 5, . . . , 10 the running time does not grow when we fix the same size of the buckets n^α for all connection patterns; hence in Table 2 we also present the values of α.
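Since every constraint is linear in α, the same optimum can also be found without an LP solver by inspecting the endpoints of [0, 1] and the pairwise intersections of the constraint lines. The sketch below is our own illustration; tw_of is a hypothetical callback that must return tw(I_M ∪ A) (a real implementation would compute the treewidth, e.g., by the O(2^k) DP mentioned above):

```python
import itertools

def min_max_linear(lines):
    """min over alpha in [0,1] of max_i (a_i + b_i*alpha) for lines (a_i, b_i).
    The upper envelope of finitely many lines is convex and piecewise linear,
    so the minimum is attained at alpha = 0, alpha = 1, or at an intersection
    point of two of the lines."""
    cands = {0.0, 1.0}
    for (a1, b1), (a2, b2) in itertools.combinations(lines, 2):
        if b1 != b2:
            x = (a2 - a1) / (b1 - b2)
            if 0.0 <= x <= 1.0:
                cands.add(x)
    def val(x):
        return max(a + b * x for a, b in lines)
    best = min(cands, key=val)
    return best, val(best)

def exponent_for_pattern(k, tw_of):
    """min_alpha max_{A ⊆ P_k} ((1-alpha)(k-|A|) + alpha(tw(I_M ∪ A)+1))
    for one connection pattern, with P_k = {{i, i+1} : 1 <= i < k}."""
    P = [(i, i + 1) for i in range(1, k)]
    lines = []
    for r in range(len(P) + 1):
        for A in itertools.combinations(P, r):
            t = tw_of(frozenset(A))
            # (1-alpha)(k-|A|) + alpha(t+1) = (k-|A|) + alpha*((t+1) - (k-|A|))
            lines.append((k - len(A), (t + 1) - (k - len(A))))
    return min_max_linear(lines)
```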

A refined analysis of 5-opt Optimization
In this section we focus on the 5-opt Optimization problem. This is the first case where our findings may have practical relevance, which motivates us towards a deepened analysis. It turns out that to get the entry for k = 5 in Table 2 we do not need a computer, and the proof is rather short, as one can see below.

Proof. Let D = ([5], I_M ∪ A) be the dependence multigraph. Since K_5 is the only 5-vertex graph with treewidth larger than 3, and D has at most 9 different edges, we note that tw(D) ≤ 3.

One can see that the tight cases in the above proof are |A| = 0 and |A| = 2. A closer look at the |A| = 2 case reveals that the source of hardness of this case is a single (up to isomorphism) graph ([5], I_M ∪ A) of treewidth 3. It turns out that using a different bucket partition design one can save some running time in this particular case. The details are given in the proof of Theorem 16. However, first we need a simple technical lemma, which extends Lemma 5 to general (not necessarily nice) path decompositions (it is true also for tree decompositions, but we do not need it).

Proof. We create a nice path decomposition of D as follows. For every q = 1, . . . , r − 1 we insert between X_q and X_{q+1} a sequence of forget nodes (one for every j ∈ X_q \ X_{q+1}) followed by a sequence of introduce nodes (one for every j ∈ X_{q+1} \ X_q). Thus, the resulting path decomposition has at most rk nodes. It is clear that for each of the added forget nodes with a bag X we have Π_{i∈X} s_i ≤ Π_{i∈X_q} s_i, and for each of the added introduce nodes with a bag X we have Π_{i∈X} s_i ≤ Π_{i∈X_{q+1}} s_i. The claim follows by Lemma 5.

Proof. We will refine the proof of Theorem 16 by looking closer at the |A| = 2 case.
CASE 1.2.2: ab ∈ I_M. Then I_M has the single cycle abdce, and D is the same 5-cycle, so it has pathwidth 2.
CASE 1.2.3: ad ∈ I_M. Then I_M has the single cycle adbce. Note that D contains K_4 as a minor, so it has treewidth 3. It follows that we need to modify the algorithm.
We partition the bucket containing a and b into n^{α/2} buckets of size n^{α/2} and we consider all possible assignments of a and b to these buckets.
First consider the assignments where a and b are in the same small bucket. There are at most n^{3(1−α)} · n^{α/2} = n^{3−2.5α} such assignments. Consider a path decomposition of D consisting of two adjacent nodes p and q with bags X_p = {a, b, c, d} and X_q = {a, b, e}. Note that each of the bags contains two vertices from a bucket of size n^{α/2} and at most two vertices from a bucket of size n^α. By Lemma 15, nodes p and q can be processed in time O(n^{2·α/2} · n^{2α}) = O(n^{3α}). Hence the computations for the assignments where a and b are in the same small bucket take O(n^{3+α/2}) time in total. Now consider the assignments where a and b are in different small buckets. There are at most n^{3(1−α)} · n^{2·α/2} = n^{3−2α} such assignments. However, the corresponding dependence graph no longer contains the order edge between a and b, since this constraint is forced by the bucket assignment. To sum up, by Case 1 of the proof of Theorem 14, Case 1 of the proof of Theorem 16 and Cases 1 and 2 above, the algorithm works in time O(n^{5−2α} + n^{3+α/2} + n^{2+(5/3)α} + n^{1+3α}). Putting α = 4/5 finishes the proof.

Lower bound for k = 4
In this section we show a hardness result for 4-opt Optimization. More precisely, we work with the decision version, called 4-opt Detection, where the input is the same as in 4-opt Optimization and the goal is to determine if there is a 4-move which improves the weight of the given Hamiltonian cycle. To this end, we reduce from the Negative Edge-Weighted Triangle problem, where the input is an undirected complete graph G and a weight function w : E(G) → Z. The goal is to determine whether G contains a triangle whose total edge weight is negative.
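For reference, Negative Edge-Weighted Triangle itself is trivially solvable in cubic time; beating this bound is the well-known barrier that the reduction transfers to 4-opt Detection. A naive check (our own illustration):

```python
import itertools

def has_negative_triangle(w):
    """Naive O(n^3) check: does the complete graph with symmetric integer
    weight matrix w contain a triangle of negative total weight?"""
    n = len(w)
    return any(w[i][j] + w[j][k] + w[i][k] < 0
               for i, j, k in itertools.combinations(range(n), 3))
```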
Lemma 18. Every instance I = (G, w) of Negative Edge-Weighted Triangle can be reduced in O(|V(G)|^2) time to an instance I′ = (G′, w′, C) of 4-opt Detection such that G contains a triangle of negative weight iff I′ admits an improving 4-move. Moreover, |V(G′)| = O(|V(G)|), and the maximum absolute weight in w′ is larger than the maximum absolute weight in w only by a constant factor.
Proof. For every vertex v_i of G we introduce in G′ four vertices a_i, b_i, a′_i and b′_i. Let W be the maximum absolute value of a weight in w. Then let M_1 = 5W + 1 and M_2 = 21M_1 + 1, and let
w′(e) = M_1 if e = (a_i, b_i) for some i,
w′(e) = −3M_1 if e = (a′_i, b′_i) for some i,
w′(e) = 0 if e = (a_i, b′_i) for some i,
w′(e) = w(v_i, v_j) if e = (a_i, b_j) for some i < j,
w′(e) = w(v_i, v_j) if e = (a′_i, b_j) for some j < i,
w′(e) = −M_2 if e is any other edge of the cycle C defined below,
w′(e) = M_2 in the other case.
Note that the cases are not overlapping. (Note also that although some weights are negative, we can get an equivalent instance with nonnegative weights by adding M_2 to all the weights.) Let C = a_1, b_1, . . . , a_n, b_n, b′_n, a′_n, . . . , b′_1, a′_1.

If there is a negative triangle v_i, v_j, v_k for some i < j < k in G, then we can improve C by removing the edges (a_i, b_i), (a_j, b_j), (a_k, b_k) and (a′_k, b′_k) and inserting the edges (a_i, b_j), (a_j, b_k), (a_k, b′_k) and (a′_k, b_i). We obtain a new Hamiltonian cycle (see Figure 1). The total weight of the removed edges is M_1 + M_1 + M_1 + (−3M_1) = 0, and the total weight of the inserted edges is w(v_i, v_j) + w(v_j, v_k) + 0 + w(v_k, v_i) < 0, so the 4-move is improving.

Let us now assume that C can be improved by removing 4 edges and inserting 4 edges. Note that all the edges of weight −M_2 belong to C and all the edges of weight M_2 do not belong to C. All the other edges have absolute values of their weights bounded by 3M_1. Therefore even a single edge of weight −M_2 cannot be removed and even a single edge of weight M_2 cannot be inserted, because a loss of M_2 cannot be compensated by any other 7 edges (inserted or removed), as they can result in a gain of at most 7 · 3M_1 < M_2. Hence in the following we treat the edges of weights ±M_2 as fixed, i.e., they cannot be inserted into or removed from the cycle. Note that the only edges of C that can be removed are the edges of the form (a_i, b_i) (of weight M_1) and (a′_i, b′_i) (of weight −3M_1). All the edges of weight −3M_1 already belong to C, and all the remaining edges of the graph that can be inserted into or removed from the cycle are the edges of weight M_1 belonging to C and the edges with absolute values of their weights bounded by W. Therefore we cannot remove more than one edge of weight −3M_1 from C, because a loss of 6M_1 cannot be compensated by any 2 removed and 4 inserted edges (we could potentially gain only 2M_1 + 4W < 3M_1).
Figure 1: A simplified view of the instance (G ′ , w ′ , C) together with an example of a 4-move.
The added edges are marked as blue (dashed) and the removed edges are marked as red (dotted).
Hence we can remove at most one edge of weight −3M_1 from C. For the same reason, if we do remove one edge of weight −3M_1 (i.e., of the form (a′_i, b′_i)) from C, we need to also remove three edges of weight M_1 (i.e., of the form (a_j, b_j)) in order to compensate the loss of 3M_1 (otherwise we could compensate up to 2M_1 + 5W < 3M_1).
Note that the only edges that can be added (i.e., the edges with weights less than M_2 that do not belong to C) are the edges of the form (a_i, b_j) for i < j, (a′_i, b_j) for j < i, and (a_i, b′_i). Therefore, if the edges removed from G[V_up] are (a_{i_1}, b_{i_1}), . . . , (a_{i_ℓ}, b_{i_ℓ}) for some i_1 < . . . < i_ℓ (and no other edges belonging to G[V_up]), then in order to close the cycle we need to insert some edge incident to b_{i_1}; but since for any i_0 < i_1 there is no removed edge (a_{i_0}, b_{i_0}), it cannot be an edge of the form (a_{i_0}, b_{i_1}). Hence it has to be an edge of the form (a′_j, b_{i_1}) for some j > i_1. But then also the edge (a′_j, b′_j) has to be removed. Therefore, if we remove at least one edge of the form (a_i, b_i), then we need to remove also an edge of the form (a′_j, b′_j) (and, as we know, this implies that at least three edges of the form (a_i, b_i) have to be removed). So if any edge is removed, then exactly three edges of the form (a_i, b_i) and exactly one edge of the form (a′_j, b′_j) have to be removed. Note that this implies also that the total weight of the removed edges has to be equal to zero.
Clearly the move has to remove at least one edge in order to improve the weight of the cycle. Let us assume that the removed edges are (a_i, b_i), (a_j, b_j) and (a_k, b_k) for some i < j < k, and (a′_ℓ, b′_ℓ) for some ℓ. For the reason mentioned in the previous paragraph, in order to obtain a Hamiltonian cycle one of the inserted edges has to be the edge (a′_ℓ, b_i). Also, the vertex b_j has to be connected to something, but the vertex a′_ℓ is already taken, and hence it has to be connected to the vertex a_i. Similarly, the vertex b_k has to be connected to a_j, because a′_ℓ and a_i are already taken. Thus a_k has to be connected to b′_ℓ, and this means that k = ℓ. The total weight change of the move is negative, and therefore the total weight of the added edges has to be negative (since the total weight of the removed edges is equal to zero). Thus we have w(v_i, v_j) + w(v_j, v_k) + w(v_k, v_i) = w′(a_i, b_j) + w′(a_j, b_k) + w′(a′_k, b_i) + w′(a_k, b′_k) < 0. So v_i, v_j, v_k is a negative triangle in (G, w).

Theorem 19.
If there is ε > 0 such that 4-opt Detection admits an algorithm running in time O(n^{3−ε} · polylog(M)), then there is δ > 0 such that both Negative Edge-Weighted Triangle and All Pairs Shortest Paths admit algorithms running in time O(n^{3−δ} · polylog(M)), where in all cases we refer to n-vertex input graphs with integer weights in {−M, . . . , M}.
Proof. The first part of the claim follows from Lemma 18, while the second part follows from the reduction of All Pairs Shortest Paths to Negative Edge-Weighted Triangle by Vassilevska Williams and Williams (Theorem 1.1 in [25]).