Abstract 1 Introduction 2 Preliminaries 3 Complexity of Counting 2-Optimal Tours 4 Counting 2-Optima in Random Instances 5 Discussion References

Counting Locally Optimal Tours in the TSP

Bodo Manthey ORCID Faculty of Electrical Engineering, Mathematics, and Computer Science, University of Twente, The Netherlands Jesse van Rhijn ORCID Faculty of Electrical Engineering, Mathematics, and Computer Science, University of Twente, The Netherlands
Abstract

We show that the problem of counting 2-optimal tours in instances of the Travelling Salesperson Problem (TSP) on complete graphs is #P-complete. In addition, we show that the expected number of 2-optimal tours in random instances of the TSP on complete graphs is O(1.2098nn!). Based on numerical experiments, we conjecture that the true bound is at most O(n!), which is approximately the square root of the total number of tours.

Keywords and phrases:
Travelling salesman problem, probabilistic analysis, local search, heuristics, 2-opt
Funding:
Jesse van Rhijn: Supported by NWO grant OCENW.KLEIN.176.
Copyright and License:
[Uncaptioned image] © Bodo Manthey and Jesse van Rhijn; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Randomness, geometry and discrete structures
; Theory of computation Approximation algorithms analysis ; Theory of computation Discrete optimization
Related Version:
Full Version: https://arxiv.org/abs/2410.18650
Acknowledgements:
We thank Alexander E. Black for helpful discussions that led to the geometric perspective outlined in Section 4.4.
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Introduction

The Travelling Salesperson Problem is among the best-studied problems in computer science. It can be stated compactly: given a weighted graph G=(V,E) with edge weights w:E, find the Hamiltonian cycle (tour) on G with the smallest total weight. The TSP is a classic example of a hard optimization problem, being even among Karp’s original 21 NP-hard problems [23].

Owing to this hardness, practitioners often turn to approximate methods. One extremely successful method is local search [25]. This is a general optimization framework where one modifies an existing (sub-optimal) solution into a better solution.

The simplest local search heuristic for the TSP is 2-opt [2]. This heuristic takes as its input a tour T, and finds two sets of two edges each, {e1,e2}T and {f1,f2}T, such that exchanging {e1,e2} for {f1,f2} yields again a tour T, and the total weight of T is strictly less than the total weight of T. This procedure is repeated with the new tour, and stops once no such edges exist. The resulting tour is said to be locally optimal with respect to the 2-opt neighborhood.

A convenient way to view 2-opt (and other local search heuristics) is via the transition graph 𝒯. This directed graph contains a node for every tour of G. An arc (T1,T2) exists in 𝒯 if and only if T2 can be obtained from T1 by a 2-opt step and T2 has strictly lower cost than T1. The sinks of 𝒯 are exactly the locally optimal tours of G. A run of 2-opt can then be characterized by a directed path through 𝒯 ending in a sink.

Much research has previously focused on understanding the running time of 2-opt [9, 16, 17, 26, 27] and its approximation ratio [6, 15, 16, 20, 24]. On the other hand, little is known about the structure of 𝒯. In this paper, we are concerned with counting the sinks of 𝒯, which is equivalent to counting 2-optimal tours in the instance represented by 𝒯.

There are practical reasons to study the transition graphs of local search heuristics. First, observe that the transition graph of 2-opt for the random TSP instances we consider is a type of random directed acyclic graph. A run of 2-opt can be viewed as a path through this random DAG, so that the the length of the longest path in 𝒯 is an upper bound for the number of iterations 2-opt can perform. This upper bound is however rather crude: If we consider a run of 2-opt with a random initialization, then the probability that we start the run on a node of the longest path is likely small. If most paths are much shorter, then this can provide a better explanation for the practical running time of 2-opt than only studying the longest path.

Structural results on the transition graph may in addition have implications for the running time of metaheuristics. In particular, 2-opt is often used as the basis of simulated annealing, a physics-inspired metaheuristic [2, Chapter 8]. It has long been known that the structure of the transition graph strongly influences the running time of this algorithm. Structural parameters of this graph often enter convergence results and running time estimates [19, 21, 28]. A recent result by Chen et al. [10] especially illustrates this point. They showed that the Metropolis process (in essence, simulated annealing at a constant temperature) is unable to find even very large planted cliques in an Erdős-Rényi random graph. Their analysis hinges on several structural results on cliques in such random graphs.

In particular, the aforementioned results rely (roughly) on the following structural property of the transition graph: If S is the set of desired cliques, then the set δS of neighbors of S is small. Since simulated annealing must pass through δS to reach S, it can be shown that the hitting time of S is large in expectation.

The result by Chen et al., as well as the preceding result by Jerrum [21], deals with the purely discrete problem of finding cliques. However, simulated annealing is often applied to weighted problems [1], which yields significant challenges in understanding the transition graph. We believe that understanding more about the structure of transition graphs is key to proving rigorous results on simulated annealing for weighted problems.

To be explicit, for 2-opt, if S is the set of 2-optimal tours, then trivially |δS|O(n2)|S|. Thus, if |S| is sufficiently small (Theorem 3), then we easily obtain a lower bound for the hitting time of the 2-optimal tours. We do not perform this analysis here, since (i) it is not the main focus of this paper, and (ii) the resulting bound is still somewhat weak: a better bound likely needs a more thorough structural investigation of the transition graph.

Results

We start by showing that the problem of counting 2-optimal tours is #P-complete, even on complete weighted graphs. Recall that #P is the counting analogue to NP, asking not whether a solution exists but how many solutions exist. A formal definition of #P is provided in Section 2.

Our result is in fact slightly stronger. To state it in full, we need the notion of a path cover. A set of paths 𝒫 in a graph G is a path cover if every vertex of G is contained in exactly one path of 𝒫. We then have the following.

Theorem 1.

Let f2-opt be a function that maps a complete weighted graph on the vertex set V to the number of 2-optimal tours on this graph. Using |V| calls to f2-opt, we can compute the number of path covers of size  for each 1|V| in polynomial time, using f2-opt as an oracle.

This result yields #P-hardness of #2Opt on complete graphs as a corollary. This counting problem asks the following question: Given a weighted graph G, how many 2-optimal tours are there on G? Note that counting Hamiltonian cycles is trivial on complete graphs, whereas hardness of counting 2-optimal tours in the same setting is not immediately obvious.

Theorem 2.

#2Opt is #P-complete, even on complete graphs.

We note that the result remains true for metric TSP instances on complete graphs, which can be seen to hold by adding a sufficiently large number to every edge weight of the original instance.

While counting 2-optimal tours on complete graphs is thus likely intractable in general, we may still wonder about the average case. In the case where the edge weights are given by independent uniformly distributed random variables, we obtain the following upper bound on the number of 2-optimal tours.

Theorem 3.

Let G be a complete graph on n=2k+1 vertices, with edge weights drawn independently from U[0,1] for each edge. Then the expected number of 2-optimal tours on G is bounded from above by O(1.2098nn!).

In the process of proving Theorem 3 we obtain a link between the number of 2-optimal tours and the probability that all entries of a multivariate normal random vector are positive. This quantity is also known as the positive orthant probability. To estimate this probability we prove the following theorem.

Theorem 4.

Let X be a multivariate normal vector with zero mean and covariance matrix Σ. The positive orthant probability (+d)=(X+d) satisfies

(+d)exp(12i=1dΣii1(𝔼[Xi2|+d]))2d1ed/2.

In particular, if the diagonal elements of Σ1 are each Σii1=1, then

(+d)2d+1ed/2exp(12𝔼[X22|+d]).

Theorem 4 makes no assumptions on the covariance matrix Σ, and thus holds for any set of zero-mean multivariate normal variables. Hence, it may be of independent interest in applications where bounds on positive orthant probabilities are necessary.

2 Preliminaries

2.1 Notation and Definitions

We start with some notational shorthand. Given a symbol a, we write am for the string consisting of m copies of a. Throughout, log() denotes the logarithm to base 2. We denote the positive orthant of d by =+d{xd|xi>0,i[d]}. The negative orthant d is defined similarly.

Let G=(V(G),E(G)) be a simple graph. For T a tour through G, we call the edges of T the tour-edges and the edges of E(G)T the chord-edges. A 2-change on G then removes two tour-edges from T and adds two chord-edges to T.

For a fixed tour, if two tour-edges are removed then there is only one choice of chord-edges that yields a new tour. Thus, we can characterize a 2-change fully by the tour-edges it removes. If a 2-change removes the tour-edges e and f we denote this 2-change by ST(e,f), omitting the subscript T when the tour is clear from the context. We say that two 2-changes on T are chord-disjoint if they have no chord-edges in common.

Given a set 𝒮 of 2-changes, we define P(𝒮)={{e,f}ST(e,f)𝒮} as the set of pairs of tour-edges that participate in the 2-changes in 𝒮. For eT, we define ke(𝒮)=|{pP(𝒮)ep}|, the number of 2-changes in 𝒮 in which e participates. For each of these quantities, we may omit the argument 𝒮 whenever the set meant is clear from context.

Counting Complexity

For our complexity results, we state the definitions of #P and #P-completeness taken verbatim from Arora and Barak [4].

Definition 5.

A function f:{0,1} is in #P if there exists a polynomial p: and a polynomial-time Turing machine M such that for every x{0,1},

f(x)=|{y{0,1}p(x)|M(x,y)=1}|.

For completeness, we need to recall the complexity class FP, which is the functional analogue to P. This means that FP is the set of functions f:{0,1}{0,1} computable by a polynomial-time deterministic Turing machine. Moreover, we denote by FPf the set of such functions where the Turing machine additionally has access to an oracle for f.

Definition 6.

A function f is #P-complete if it belongs to #P and every g#P is in FPf.

By this definition, if we can solve some #P-complete problem in polynomial time, then we can solve every problem in #P in polynomial time.

2.2 Multivariate Normal Distribution

We require some basic facts about multivariate normal distributions. Let μd and let Σd×d be symmetric and positive definite. The multivariate normal distribution 𝒩(μ,Σ) is defined by the probability density function

f(x)=exp(12(xμ)TΣ1(xμ))(2π)d/2detΣ (1)

with support d.

Let X𝒩d(μ,Σ). The positive orthant probability of X is the probability that X falls in the positive orthant d+. When μ=0 this quantity is also referred to simply as the orthant probability, without specifiying which orthant, as each orthant is related by flipping the sign of a set of coordinates.

3 Complexity of Counting 2-Optimal Tours

We now proceed to show hardness of counting 2-optimal tours on complete graphs. (For space reasons, we defer some proofs to the appendix.) We start by recalling the notion of a path cover. Given a simple graph G=(V,E), a path cover 𝒫 of G is a collection of vertex-disjoint paths such that every vV belongs to exactly one path of 𝒫. The size of 𝒫 is the number of paths in 𝒫. Note that a path cover of size 1 is equivalent to a Hamiltonian path, and that a path cover may contain paths that consist of a single vertex.

From an instance G=(V,E) of #HamPath we construct a family of instances Gm=(VS,EF) of #2Opt for m|V|+1. First, we add a new set S of vertices to Gm, where |S|=m|V|+1. To ensure that the reduction is polynomial-time computable, we also require m2|V|; the reason for this choice will be apparent shortly. The edges of Gm are the edges of G, which are assigned a weight of 0 plus the set of edges F, which contains:

  • all missing edges between the vertices of V, with weight M=4, which we call non-edges;

  • all edges between the vertices of S, with weight N=2, which we call S-edges;

  • all edges between the vertices of V and the vertices of S, with weight L=1, which we call (V,S)-edges.

The precise values of MN and L are not important, only that they are related as MN=2L. See Figure 1 for a schematic depiction of the reduction.

By this construction, Gm is a complete graph on |V|+m vertices. We claim that by computing the number f2-opt(Gm) of 2-optimal tours on Gm for m{|V|+1,,2|V|} we can compute the number of path covers of G of size 1|V|. To this end, we first characterize the 2-optimal tours on Gm.

(a) The reduction we use to prove #P-hardness of #2Opt. The set V represents the vertices of the original graph, and S is the set of m vertices added in the reduction. Each depicted edge is labelled with its weight. Note that in Gm, the non-edges of G are added as well (represented here by the dashed edge).
(b) A 2-optimal tour through Gm consisting of two segments. Solid black curves represent paths through V, solid gray lines are paths through S, and dotted lines are (V,S)-edges.
Figure 1: Schematic depiction of the reduction we use to prove #P-hardness of #2Opt, and of a 2-optimal tour in the image instance.
Lemma 7.

A tour T through Gm is 2-optimal if and only if it contains no non-edges.

Proof.

To begin, we note that since |S|>|V|, any tour must contain at least one S-edge. Now suppose T contains a non-edge uvE, and let ab be an S-edge in T. Assume these vertices are traversed in the written order on T. Then we replace uv and ab by ua and vb, a 2-change which decreases the tour length by M+N2L>0. Hence, T is not 2-optimal, proving one direction.

For the other direction, suppose T contains no non-edges. A 2-change that adds a non-edge to the tour increases the tour length by at least M2L0; hence, any improving 2-change cannot add any non-edges to the tour. Any 2-changes on T must then involve only edges of GS-edges and (V,S)-edges. We go case by case, considering the possible types of the edges removed by any 2-change on T.

Two S-edges.

Since the vertices involved are all vertices of S the added edges are also S-edges, and hence the 2-change does not change the length of the tour at all and is therefore not feasible.

Two edges of G.

Since these edges have zero weight the tour length cannot decrease, and the 2-change is not feasible.

Two (V,S)-edges.

There are two possibilities for the added edges. In one case we obtain one S-edge and one edge of G, for an improvement of 2LN=0. In the other case we obtain again two (V,S)-edges, again for no improvement.

One S-edge ab and one edge uv of G.

The added edges are au and bv, both of which are (V,S)-edges and thus of weight L. The tour is thus shortened by N2L=0, and the 2-change yields no improvement.

One (V,S)-edge au and one edge vw of G.

The added edges are av and uw. The added edges are again a (V,S)-edge and an edge of G, yielding no improvement.

One (V,S)-edge au and one S-edge bc.

The added edges consist of one (V,S)-edge and one S-edge, and there is no change in tour length.

These cases cover all possibilities, and hence a tour that contains no non-edges is 2-optimal, concluding the proof.

Let T be a 2-optimal tour through Gm. If we remove from T all edges incident to S, then we obtain a path cover of G. We call the size of the resulting path cover the number of segments of T. On the other hand, from any path cover of G, we can construct many 2-optimal tours by connecting the paths in the cover using vertices of S.

More formally, we say that a path cover 𝒫 corresponds to T if we obtain 𝒫 by removing the edges of T incident to S. Note that two distinct path covers of G correspond to two disjoint sets of 2-optimal tours through Gm, and every 2-optimal tour corresponds to exactly one path cover. The following lemma counts the number of 2-optimal tours that correspond to a single path cover of size .

Lemma 8.

A path cover of size  corresponds to exactly 21m!(m1)!(m)! 2-optimal tours in Gm with  segments.

Proof.

By Lemma 7, the 2-optimal tours are exactly those tours which contain no non-edges. Given a path cover 𝒫 of size , we can then construct a 2-optimal tour in Gm as follows.

Any tour must visit all vertices in S in some order. Hence, we first fix a tour TS through S; there are 12(m1)! such choices. Next, we break  edges of TS; there are (m) choices of which edges to break.

We must now insert the  paths of 𝒫 in place of these broken edges, reconnecting the endpoints of each path to the endpoints of the broken edge it replaces. Note that whenever we perform this operation, there are two possible ways to connect the path. Moreover, there are ! ways to match an endpoint to a broken edge, for a total of 2! ways to insert the paths.

Putting the pieces together, we thus have

(m1)!2(m)2!=21m!(m1)!(m)!

ways to construct a 2-optimal tour through Gm.

Let c(,m)=21m!(m1)!(m)!, and let C|V|×|V| be a matrix with entries Cij=c(i,j+|V|). Recall that  runs from 1 to |V| and m runs from |V|+1 to 2|V|.

Lemma 9.

The matrix C defined above has full rank.

Proof.

To start, we write out the entries of C explicitly. Letting n=|V|,

Cij=2i1(n+j)!(n+j1)!(n+ji)!.

Scaling any row or column of a matrix uniformly does not change its rank. Hence, we multiply column j by 1(n+j)!(n+j1)!, and subsequently multiply row i by 1/2i1. Finally, interchanging any two rows also does not change the rank of the matrix, and hence we also mirror the rows of the matrix. The resulting matrix C has entries

Cij=1(i+j1)!.

To show that this matrix has full rank, we first recall that a square matrix has full rank if and only if its determinant is nonzero. While we could in principle compute the determinant exactly, this is tedious and not necessary for our purposes. Hence, we use a concise argument from analysis to only show detC0 [30], reproduced here for the sake of completeness. The proof uses the notion of the Wronskian W(x) of a set of functions fj(x), which is the determinant of the matrix with (i,j)-th element fj(i)(x). If the functions are analytic, then W(x) vanishes identically if and only if the functions are linearly dependent [5].

We observe that the determinant of C is, up to a sign, the Wronskian of the functions fj(x)=xn+j1(n+j1)! evaluated at x=1. Since these functions are polynomials of different degrees, they are linearly independent and hence from analyticity of the polynomials it follows that the Wronskian does not vanish identically. By factoring out all the powers of x from the rows and columns, we see that the Wronskian is a power of x multiplied by its value at x=1, which is detC up to a sign. As the Wronskian does not vanish identically, we must have detC0.

Now we proceed to our main algorithmic result, from which #P-completeness of #2Opt follows as a corollary.

Theorem 1. [Restated, see original statement.]

Let f2-opt be a function that maps a complete weighted graph on the vertex set V to the number of 2-optimal tours on this graph. Using |V| calls to f2-opt, we can compute the number of path covers of size  for each 1|V| in polynomial time, using f2-opt as an oracle.

Proof.

For a graph G, let Gm be the complete weighted graph resulting from the reduction described above. Let a(G) be the number of path covers of size , and let b(Gm) be the number of 2-optimal tours on Gm consisting of  segments. From Lemma 8, we have

b(Gm)=c(,m)a(G).

We have f2-opt(Gm)==1|V|b(Gm)==1|V|c(,m)a(G).

We aim to compute the numbers a(G) for each [|V|]. Let a=(a(G))[|V|], and let b=(f2-opt(Gm))m=|V|+12|V|. Then the above yields the matrix equation b=CTa.

By Lemma 9, the matrix C has full rank, and is hence invertible. While C is rather ill-conditioned, the elements of C can be encoded using a number of bits polynomial in |V|+|E|. Thus, after making |V| calls to f2-opt to compute the vector b, we can compute a in polynomial time. The entries a of a are, by construction, exactly the number of path covers of size  of G.

Theorem 2. [Restated, see original statement.]

#2Opt is #P-complete, even on complete graphs.

Proof.

Membership in #P is obvious, since the problem of verifying 2-optimality of a tour is in P. For hardness, we rely on the #P-hardness of #HamPath, which was shown by Valiant [29]. Note that the number of Hamiltonian paths through a graph G is exactly the number of path covers of size =1. By Theorem 1, given oracle access to f2-opt, we can solve #HamPath – and thus every problem in #P – in polynomial time, and hence #2Opt is #P-complete when restricted to complete graphs.

4 Counting 2-Optima in Random Instances

While counting 2-optimal tours is in general hard on complete graphs, we can still provide some results for special cases. (Again, missing proofs may be found in the appendix.) In this section, we restrict our attention to complete graphs on n=2k+1 vertices for some integer k. The weights of the edges of our graphs are drawn independently from the uniform distribution on [0,1].

The strategy we use is to bound the probability that a given tour is 2-optimal. Since we assume the input graph is complete, this probability is the same for all tours. It then suffices to multiply this probability by the total number of tours, which is 12(n1)!.

To bound the probability of a tour T being 2-optimal we find a large set 𝒮 of mutually chord-disjoint 2-changes on T. We then apply a general result that links the probability of 2-optimality of T to how often each edge of T is used in 𝒮 (Lemma 10). Details of the construction of this set are given in Section 4.2.

4.1 Orthant Probabilities and 2-Optimality

Let 𝒮 be a set of chord-disjoint 2-changes on a tour T on the complete graph on n vertices, and define for xE

g𝒮(x)=exp({e,f}P(𝒮)xexfke(𝒮)kf(𝒮)).

Now let X=(Xe)eT,ke(𝒮)>0 be a sequence of independent half-normal distributed random variables with unit variance. We define the function

𝒢(𝒮)=𝔼[g𝒮(X)].

Observe that 𝒢 is solely a function of 𝒮.

Lemma 10.

Let G be a complete graph on n vertices, with each edge of G independently assigned a weight drawn from U[0,1]. Let T be any tour through G. Let 𝒮 be a set of chord-disjoint 2-changes on T. Then

(T is 2-optimal)𝒢(𝒮)eTke(𝒮)>0π2ke(𝒮).
Proof sketch.

We start by conditioning on the values of the weights on the edges of T. Since 𝒮 is chord-disjoint, the events {Δ(S)0}S𝒮 are independent subject to this conditioning. Since the event that T is 2-optimal is equivalent to S𝒮(Δ(S)0), the probability of this event then separates into the product of the probabilities that each 2-change in 𝒮 is non-improving. These probabilities are then straightforwardly computed. In the resulting expression, we recognize the function 𝒢 defined above, as well as the joint density of a multivariate half-normal distribution, from which the lemma follows after normalizing this density.

Bounding (T is 2-optimal) now involves two steps. First, we must construct a set 𝒮 of chord-disjoint 2-changes such that each edge of T is used in many 2-changes in 𝒮. Second, given this set, we must bound 𝒢(𝒮).

We remark now that 𝒢(𝒮) is trivially bounded from above by 1. However, this leaves a factor of about (π/2)n/2. Although this factor is small compared to eT,ke>01ke(𝒮) for the set 𝒮 we construct, leaving it in is somewhat unsatisfactory. We thus make an attempt to prove a stronger bound for 𝒢.

Computing 𝒢(𝒮) directly is unfeasible. It helps to recast it in terms of a positive orthant probability.

Lemma 11.

For a set 𝒮 of chord-disjoint 2-changes on a tour T, let k denote the number of tour-edges with ke(𝒮)>0. Label these edges of T arbitrarily from 1 to k. For any I[k], the function 𝒢(𝒮) is bounded from above by 2|I|detΣ(iIZi>0), where (Zi)iI is distributed according to a multivariate normal distribution with mean 0 and inverse covariance matrix Σ1|I|×|I| with entries indexed by I,

Σij1={1, if i=j,sij, otherwise,

where sij1/kikj.

Proof.

Since g𝒮(x)1 and g𝒮 is decreasing in each variable, we can bound g𝒮 from above uniformly by setting the coefficients of any subset of the variables to zero. Setting these to zero for all variables outside I then leaves

𝒢(𝒮)(2π)|I|/2+|I|exp(12iIxi2)exp(iPI(𝒮)xixjkikj)iIdxi,

where PI(𝒮) is the same as P(𝒮), except we keep only pairs with both elements in I.

Observe that exixj/kikjesijxixj, allowing us to replace the coefficients of these products. We now only need to insert the appropriate normalization factor for a multivariate normal distribution. Comparing the resulting expression with Equation 1 completes the proof.

Still, computing the positive orthant probability directly is rather difficult. Explicit formulas are known for low-dimensional cases, as well as general recursive formulas [3, 11, 12], but none of these are particularly helpful in bounding 𝒢(𝒮). As we only need a nontrivial upper bound, some simplifications are possible. In Theorem 4 (restated below), we show that it suffices to bound the expected squared norm of a multivariate normal vector.

Theorem 4. [Restated, see original statement.]

Let X be a multivariate normal vector with zero mean and covariance matrix Σ. The positive orthant probability (+d)=(X+d) satisfies

(+d)exp(12i=1dΣii1(𝔼[Xi2|+d]))2d1ed/2.

In particular, if the diagonal elements of Σ1 are each Σii1=1, then

(+d)2d+1ed/2exp(12𝔼[X22|+d]).

4.2 Finding Chord-Disjoint 2-Changes

Figure 2: Colors of the edges in the tour T at stage t=3, for n=24+1. The dotted lines are drawn to show the boundaries of each segment Ti more clearly. The segments of the tour are numbered starting at the right and proceeding counterclockwise. The 2-changes we consider in the proof of Lemma 12 are then the 2-changes formed from the red edges in Ti (drawn black) and the blue edges in Ti+1 (drawn gray) that appear in even positions along T, for i odd. Note that the last edge along T is drawn dashed to indicate that it is not used in the construction of 𝒮.

Next, we construct a set 𝒮 of 2-changes such that any two 2-changes in 𝒮 are chord-disjoint, and most of the edges of T participate in many 2-changes in 𝒮. The construction we provide here works for complete graphs with n=2k+1 vertices for some integer k.

The construction of 𝒮 proceeds as follows. Recall that we ordered the edges of T as (e1,e2,,en). We define the following process on T, occurring in stages. At stage t we divide the tour into 2t equal segments {T1,,T2t}, starting at e1. In each stage, we color all the edges in each odd segment red and the edges in each even segment blue. The only exception to this rule is the last edge en, which we color black; this edge is not used to form any 2-changes. See Figure 2 for an illustration at stage t=3.

At each stage we consider the 2-changes formed by the red edges in each odd segment Ti together with the even blue edges in its successor segment Ti+1. We say that these are the 2-changes added in stage t, and denote the set of these 2-changes by 𝒮t. We continue this process for log(n1)1 stages. Note that in the final stage, each segment contains two colored edges.

With some effort, one can show that 𝒮 is chord-disjoint.

Lemma 12.

The 2-changes in 𝒮=t=1logn1𝒮t are chord-disjoint.

We now determine how often each edge of T is used in 𝒮, which we need to apply Lemma 10.

Figure 3: Illustration of the construction used in the proof of Lemma 13. The path indicated in bold assigns a binary string 011 to each edge in T5.
Lemma 13.

For 𝒮 as constructed above, we have {ke(𝒮)}eT={0,1,2,,n3}. Moreover, there are two edges with ke(𝒮)=0 and two edges with ke(𝒮)=n12.

Proof.

We maintain the same order on the edges of T as used in the construction of 𝒮. Using this order, we call an edge even if it appears in an even position in this order, and odd otherwise. Note that the last edge en is not considered in the construction of 𝒮, and so ken=0. For the remainder of the proof, we consider T with en removed.

For convenience, denote by ket the number of 2-changes that e participates in at stage t, so that ke=t=1logn1ket. We observe the following property of 𝒮.

Fact.

Consider stage t in the construction of 𝒮. If an edge e is red in stage t, then ket=n/2t+1. If e is blue, then ket=n/2t if e is even, and ket=0 otherwise.

For any edge of T, we can count the number of 2-changes of 𝒮 it participates in as follows. (See Figure 3 for an illustration.) Construct a rooted directed binary tree H, where the nodes of H are sub-paths of T. The root of H is T itself, while the children of any node P of H are the first and the second halves of the edges in P under the ordering of the edges of T, labelled P1 and P2 respectively. Thus, the children of T are P1={e1,,e(n1)/2} and P2={e(n1)/2+1,,en1}. We moreover label the arcs of H. If P1 and P2 are the children of a node P as described above, then the arc a1=(P,P1) gets a label L(a1)=1, while a2=(P,P2) gets a label L(a2)=0.

We construct H in this way from the root, continuing until H has depth log(n1)1. (We define the depth of a node v in a rooted directed tree as the number of arcs in the directed path from the root to v. The depth of the tree itself is the largest depth among all nodes of the tree.) Then the nodes at depth t of H are the parts into which T is partitioned at stage t. Note that the leaves of H each contain two successive edges of T.

Let a=(P,Q) be an arc in H with L(a)=1. From the construction of 𝒮, it follows that any edge eQ is colored red in the stage corresponding to the depth of Q. Conversely, if L(a)=0, then Q is colored blue in this stage.

For each leaf P of H, we then consider the path P(v)=Ta1P1a2alog(n1)1Q from the root to this leaf. Following this path, we collect the labels of the arcs along this path into a string x(Q), so x(Q)=L(a1)L(a2)L(alog(n1)1). For eQ, we set xe=x(Q).

Given xe for some edge eT, we know that e is red in stage t if and only if xe(t)=1. We thus know exactly in which stages e is colored red, and in which it is colored blue. Using this information together with the fact above, we can derive formulae for ke for any eT. There are two distinct cases.

Case 1: e odd.

Since e is odd, it only participates in any 2-change when it is colored red. Thus, stage t contributes xe(t)(n1)/2t+1 2-changes, and so we count

ke =(n1)t=1log(n1)1xe(t)12t+1=t=1log(n1)1xe(t)2log(n1)1t
=j=0log(n1)2xe(log(n1)1j)2j.

Note that for a given bit string xe, this is simply the decimal expansion of the binary number represented by xe.

Since every bit string of length log(n1)1 is present for some odd edge (there are 2log(n1)1 leaves in H and each leaf corresponds to a distinct string), we find that {keodd eT}={0,1,,(n1)/21}.

Case 2: e even.

An even edge contributes (n1)/2t+1 2-changes at stage t if it is red, and (n1)/2t 2-changes if it is blue. Thus, the contribution at this stage is (n1)/2txe(t)(n1)/2t+1, and so we have

ke =(n1)(t=1log(n1)12tt=2log(n1)1xe(t)12t+1)
=n3n12t=1log(n1)1xe(t)12t
=n3j=0log(n1)2xe(log(n1)1j)2j.

As in the previous case, we recognize here the decimal expansion of xe in the second term. Thus, {keeven eT}={(n1)/21,(n1)/2,,n2,n3}.

4.3 Putting the Pieces Together

We return to the positive orthant probability by constructing an inverse covariance matrix Σ1 corresponding to 𝒮 according to Lemma 11. In the following, we sort the edges of T by decreasing value of ke(𝒮).

Observe that the first d=(n3)/2 edges of T in this order each form 2-changes with one another in 𝒮. Note that d is an integer, as n=2k+1 is odd. The entries of Σ1 corresponding to these 2-changes are Σij1=1/kikj1n3. Thus, the inverse covariance matrix Σ1 constructed from 𝒮 is upper-left triangular, except with ones on the diagonal. We can then write it in block form,

Σ1=(Σ~1AATI),

where the diagonal entries of Σ~1d×d are each 1, and the off-diagonal entries each satisfy Σ~ij11n3=12d. We consider the matrix Σ^1, which is identical to Σ~1, but with the off-diagonal entries replaced by 12d.

Comparing Σ^1 to Lemma 11, we proceed to bound the positive orthant probability associated with Σ^1. A trivial bound for this probability is 2d. Lemma 14 therefore represents a non-trivial, if modest, improvement.

Lemma 14.

Let X be distributed according to 𝒩d(0,Σ^), where Σ^1d×d has unit diagonal entries and off-diagonal entries 12d. The positive orthant probability of X is bounded from above by O(2de2d9π).

The following is now an immediate consequence of Lemmas 11 and 14, recalling that d=(n3)/2.

Lemma 15.

For 𝒮 as constructed above, 𝒢(𝒮)=O(en9π).

Proof.

By Lemma 11, we need to multiply the bound from Lemma 14 by a factor 2ddetΣ^. It is easily seen from the proof of the latter lemma that this determinant is O(1).

Finally, we combine all of the above for the last crucial lemma, from which our main result follows directly.

Lemma 16.

Let G be a complete graph on n=2k+15 vertices, with edge weights drawn independently from U[0,1] for each edge. Let T be any tour through G. The probability that T is 2-optimal is bounded from above by

O(cn(n2)!),

where c=π2e19π<1.2098.

Proof.

Let 𝒮 be a set of 2-changes on G constructed as described above, and let ke be the number of 2-changes in 𝒮 in which eT participates. From Lemma 13 we have

eTke>01ke=1n1211(n3)!=O(1(n2)!).

Using this result in Lemma 10 together with Lemma 15 yields the claim.

This leads to our last result (Theorem 3), using the fact that the number of tours on a complete graph is 12(n1)!.

4.4 Numerical Experiment

It seems unlikely that the bound in Theorem 3 is tight. A simple numerical estimate of 𝒢(𝒮) already shows that it could be improved to O(1.0223nn!), but even this may be a coarse approximation; after all, in constructing 𝒮, we discard many 2-changes.

To estimate the number of 2-optimal tours numerically, one could take a naïve approach: simply fix a tour T, generate edge weights from U[0,1], and check 2-optimality of T with these weights. By repeating this experiment we can estimate (T is 2-optimal). However, since this probability is super-exponentially small (Lemma 16), this is rather inefficient.

We can do better by taking a different view of the problem. Fix again an arbitrary tour T. We write the edge weights as a vector w[0,1]E. Given T, we can determine all n(n3)/2 possible 2-changes on T. Local optimality of T means that all these 2-changes yield a negative improvement. Thus, for a 2-change that removes edges e1 and e2 and adds f1 and f2, this yields an inequality

we1+we2wf1wf20. (2)

Each possible 2-change on T yields such a constraint. Together with the constraints 0we1 for each edge, this yields a convex polytope P embedded in m. Since the edge weights are i.i.d uniform random variables,

(T is 2-optimal)=vol(P),

the |E|-dimensional volume of P.

We remark now that our approach through Lemma 10 is equivalent to bounding the volume of a polytope that contains P, by discarding some of the inequalities (Equation 2). It may be possible to use methods from convex geometry to compute better estimates of vol(P); we leave this discussion to Section 5.

Computing the volume of an arbitrary polytope is itself not an easy task: this problem is #P-hard in general, as shown by Dyer and Frieze [13]. In the same work however, they show that volume approximation can be done efficiently by randomized algorithms. We use the open-source software package Volesti [8, 14] to numerically approximate the volume of P.

For a range of different n, we compute the inequalities that define P, and output them in a format readable for Volesti. We used the “Cooling Balls” algorithm of Volesti, with an error parameter of 0.001 (see the specification of Volesti [8]), and approximated vol(P) up to n=40. The results of these computations are shown in Figure 4.

Figure 4: Estimated volume of the 2-opt polytope Pn, for different values of n, as computed by Volesti. For comparison, the function n1/n! is also plotted.

5 Discussion

The result we present in Theorem 2 is to our knowledge the first hardness result for counting locally optimal solutions for a natural, weighted local search problem. We note that it is easy to show that counting the local optima is #P-hard for some local search problem. For instance, Johnson et al. [22] provided a local search problem whose locally optimal solutions are exactly the satisfying assignments of an instance of Boolean Satisfiability (plus one extra solution). However, their construction is rather artificial, whereas 2-opt is a heuristic often used in practice. Moreover, for unweighted problems, Goldberg et al. [18] previously showed some hardness results for counting maximal independent sets (which can be seen as locally optimal independent sets under the simple neighborhood of vertex addition).

Theorem 3 is to our knowledge the first non-trivial bound on the number of locally optimal solutions for a local search problem. It must be noted that we only proved Theorem 3 for specific values of n, namely n=2k+1 for k. This restriction simplifies mainly the construction of 𝒮, our set of chord-disjoint 2-changes. We believe that the results can be extended at the cost of some complexity in the proofs. For the sake of simplicity, we chose not to do so.

The bound in Theorem 3 shows that in expectation, the number of 2-optimal tours in a random TSP instance is approximately the square root of the total number of tours. While still a rather large number, it is nonetheless a super-exponentially small fraction of the total number of tours. While our analysis is specialized to 2-opt, we believe the techniques can be straightforwardly extended to other heuristics, so long as the instance is sufficiently symmetric and the improvement steps can be described by a fixed number of random variables (e.g., four random edge weights for 2-opt). The extension would be more involved if the number of variables per 2-opt step is not fixed, or if different solutions differ in structure.

Limitations

A natural question concerns the tightness of the bound in Theorem 3. We have not succeeded in constructing a lower bound for the number of 2-optimal tours. Presumably such a construction would be rather involved, since there is little structure in this random instance model. Nonetheless, we can still argue that Theorem 3 can be improved significantly from two directions: bounding 𝒢(𝒮) and constructing 𝒮.

Lemma 15 is far from optimal. A simple numerical experiment to estimate 𝒢(𝒮) yields 𝒢(𝒮)=O(1.226n). This results in a bound of O(1.0223nn!) in Theorem 3. We also note that, even though our set 𝒮 achieves essentially the largest number of chord-disjoint 2-changes possible on any tour, it may not be an optimal choice: different choices may lead to better sequences of values for ke.

Using the trivial bound of 𝒢(𝒮)1 instead of Lemma 15 in Lemma 16, we would obtain in Theorem 3 a bound of approximately O(1.2534nn!). The difference with our bound is thus less than a factor of 1.04n, which is a rather minute improvement for the amount of trouble we went through to obtain it. We therefore regard the calculations leading to Lemma 15 more as a proof-of-concept that significant improvements are still possible to obtain rigorously, and as a demonstration of the effort required to achieve anything non-trivial.

Another possible direction for improving the bound in Theorem 3 lies in polyhedron volume estimation. In Section 4.4, we defined a polytope P such that the volume of P is the probability that a tour on n vertices is 2-optimal. While computing the volume of a polytope is #P-hard in general [13], it may be possible to obtain an asymptotic formula for our specific case, as was done by Canfield and McKay for the well-studied Birkhoff polytope [7].

A conjecture

In light of these observations, and the estimate resulting from the numerical experiments, we conjecture the following.

Conjecture 17.

Let G be a complete graph on n vertices, with edge weights drawn independently from U[0,1] for each edge. Then the expected number of 2-optimal tours on G is bounded from above by O(n!).

The numerical data suggest that the expected number of tours may even be cnn! for some c<1 (as opposed to c>1 as in Theorem 3), although we could not perform the numerical experiments for large enough n to state this with confidence.

References

  • [1] Emile Aarts and Jan Korst. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. John Wiley & Sons, Inc., USA, 1989.
  • [2] Emile Aarts and Jan Karel Lenstra, editors. Local Search in Combinatorial Optimization. Princeton University Press, 2003. doi:10.2307/j.ctv346t9c.
  • [3] I. G. Abrahamson. Orthant Probabilities for the Quadrivariate Normal Distribution. The Annals of Mathematical Statistics, 35(4):1685–1703, December 1964. doi:10.1214/aoms/1177700391.
  • [4] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge, 2009. doi:10.1017/CBO9780511804090.
  • [5] Maxime Bôcher. Certain cases in which the vanishing of the Wronskian is a sufficient condition for linear dependence. Transactions of the American Mathematical Society, 2(2):139–149, 1901. doi:10.2307/1986214.
  • [6] Ulrich A. Brodowsky and Stefan Hougardy. The Approximation Ratio of the 2-Opt Heuristic for the Euclidean Traveling Salesman Problem. In DROPS-IDN/v2/Document/10.4230/LIPIcs.STACS.2021.18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.STACS.2021.18.
  • [7] E. Rodney Canfield and Brendan D. McKay. The asymptotic volume of the Birkhoff polytope. Online Journal of Analytic Combinatorics, 2009.
  • [8] Apostolos Chalkis and Vissarion Fisikopoulos. Volesti: Volume Approximation and Sampling for Convex Polytopes in R. The R Journal, 13(2):642–660, 2021.
  • [9] Barun Chandra, Howard Karloff, and Craig Tovey. New Results on the Old k-opt Algorithm for the Traveling Salesman Problem. SIAM Journal on Computing, 28(6):1998–2029, January 1999. doi:10.1137/S0097539793251244.
  • [10] Zongchen Chen, Elchanan Mossel, and Ilias Zadik. Almost-Linear Planted Cliques Elude the Metropolis Process. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Proceedings, pages 4504–4539. Society for Industrial and Applied Mathematics, January 2023. doi:10.1137/1.9781611977554.ch171.
  • [11] M. C. Cheng. The Orthant Probabilities of Four Gaussian Variates. The Annals of Mathematical Statistics, 40(1):152–161, 1969. URL: https://www.jstor.org/stable/2239206.
  • [12] F. N. David. A note on the evaluation of the multivariate normal integral. Biometrika, 40(3-4):458–459, December 1953. doi:10.1093/biomet/40.3-4.458.
  • [13] M. E. Dyer and A. M. Frieze. On the Complexity of Computing the Volume of a Polyhedron. SIAM Journal on Computing, 17(5):967–974, October 1988. doi:10.1137/0217060.
  • [14] Ioannis Z. Emiris and Vissarion Fisikopoulos. Practical Polytope Volume Approximation. ACM Trans. Math. Softw., 44(4):38:1–38:21, June 2018. doi:10.1145/3194656.
  • [15] Christian Engels and Bodo Manthey. Average-case approximation ratio of the 2-opt algorithm for the TSP. Operations Research Letters, 37(2):83–84, March 2009. doi:10.1016/j.orl.2008.12.002.
  • [16] Matthias Englert, Heiko Röglin, and Berthold Vöcking. Worst Case and Probabilistic Analysis of the 2-Opt Algorithm for the TSP. Algorithmica, 68(1):190–264, January 2014. Corrected version: https://arxiv.org/abs/2302.06889. doi:10.1007/s00453-013-9801-4.
  • [17] Matthias Englert, Heiko Röglin, and Berthold Vöcking. Smoothed Analysis of the 2-Opt Algorithm for the General TSP. ACM Transactions on Algorithms, 13(1):10:1–10:15, September 2016. doi:10.1145/2972953.
  • [18] Leslie Ann Goldberg, Rob Gysel, and John Lapinskas. Approximately counting locally-optimal structures. Journal of Computer and System Sciences, 82(6):1144–1160, 2016. doi:10.1016/j.jcss.2016.04.001.
  • [19] Bruce Hajek. Cooling Schedules for Optimal Annealing. Mathematics of Operations Research, 13(2):311–329, 1988. doi:10.1287/MOOR.13.2.311.
  • [20] Stefan Hougardy, Fabian Zaiser, and Xianghui Zhong. The approximation ratio of the 2-Opt Heuristic for the metric Traveling Salesman Problem. Operations Research Letters, 48(4):401–404, July 2020. doi:10.1016/j.orl.2020.05.007.
  • [21] Mark Jerrum. Large Cliques Elude the Metropolis Process. Random Structures & Algorithms, 3(4):347–359, 1992. doi:10.1002/rsa.3240030402.
  • [22] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. How easy is local search? Journal of Computer and System Sciences, 37(1):79–100, August 1988. doi:10.1016/0022-0000(88)90046-3.
  • [23] Richard M. Karp. Reducibility among Combinatorial Problems. In Raymond E. Miller, James W. Thatcher, and Jean D. Bohlinger, editors, Complexity of Computer Computations: Proceedings of a Symposium on the Complexity of Computer Computations, The IBM Research Symposia Series, pages 85–103. Springer US, Boston, MA, 1972. doi:10.1007/978-1-4684-2001-2_9.
  • [24] Marvin Künnemann, Bodo Manthey, and Rianne Veenstra. Smoothed Analysis of the 2-Opt Heuristic for the TSP under Gaussian Noise, August 2023. Comment: Combination of an ISAAC 2013 paper by Bodo Manthey and Rianne Veenstra and an ICALP 2015 paper by Marvin Künnemann and Bodo Manthey. The results of the ISAAC 2013 paper have been improved. doi:10.48550/arXiv.2308.00306.
  • [25] S. Lin and B. W. Kernighan. An Effective Heuristic Algorithm for the Traveling-Salesman Problem. Operations Research, 21(2):498–516, April 1973. doi:10.1287/opre.21.2.498.
  • [26] Bodo Manthey and Jesse van Rhijn. Improved Smoothed Analysis of 2-Opt for the Euclidean TSP. In ISAAC 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ISAAC.2023.52.
  • [27] Bodo Manthey and Rianne Veenstra. Smoothed Analysis of the 2-Opt Heuristic for the TSP: Polynomial Bounds for Gaussian Noise. In Leizhen Cai, Siu-Wing Cheng, and Tak-Wah Lam, editors, Algorithms and Computation, Lecture Notes in Computer Science, pages 579–589, Berlin, Heidelberg, 2013. Springer. Full, improved version: https://arxiv.org/abs/2308.00306. doi:10.1007/978-3-642-45030-3_54.
  • [28] Andreas Nolte and Rainer Schrader. A Note on the Finite Time Behavior of Simulated Annealing. Mathematics of Operations Research, 25(3):476–484, 2000. doi:10.1287/MOOR.25.3.476.12211.
  • [29] Leslie G. Valiant. The Complexity of Enumeration and Reliability Problems. SIAM Journal on Computing, 8(3):410–421, August 1979. doi:10.1137/0208032.
  • [30] Qiaochu Yuan. Answer to “Determinant of a matrix involving factorials”, November 2022.