Counting Locally Optimal Tours in the TSP
Abstract
We show that the problem of counting 2-optimal tours in instances of the Travelling Salesperson Problem (TSP) on complete graphs is #P-complete. In addition, we show that the expected number of 2-optimal tours in random instances of the TSP on complete graphs is . Based on numerical experiments, we conjecture that the true bound is at most , which is approximately the square root of the total number of tours.
Keywords and phrases:
Travelling salesman problem, probabilistic analysis, local search, heuristics, 2-optFunding:
Jesse van Rhijn: Supported by NWO grant OCENW.KLEIN.176.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Randomness, geometry and discrete structures ; Theory of computation Approximation algorithms analysis ; Theory of computation Discrete optimizationAcknowledgements:
We thank Alexander E. Black for helpful discussions that led to the geometric perspective outlined in Section 4.4.Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał SkrzypczakSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The Travelling Salesperson Problem is among the best-studied problems in computer science. It can be stated compactly: given a weighted graph with edge weights , find the Hamiltonian cycle (tour) on with the smallest total weight. The TSP is a classic example of a hard optimization problem, being even among Karp’s original 21 NP-hard problems [23].
Owing to this hardness, practitioners often turn to approximate methods. One extremely successful method is local search [25]. This is a general optimization framework where one modifies an existing (sub-optimal) solution into a better solution.
The simplest local search heuristic for the TSP is 2-opt [2]. This heuristic takes as its input a tour , and finds two sets of two edges each, and , such that exchanging for yields again a tour , and the total weight of is strictly less than the total weight of . This procedure is repeated with the new tour, and stops once no such edges exist. The resulting tour is said to be locally optimal with respect to the 2-opt neighborhood.
A convenient way to view 2-opt (and other local search heuristics) is via the transition graph . This directed graph contains a node for every tour of . An arc exists in if and only if can be obtained from by a 2-opt step and has strictly lower cost than . The sinks of are exactly the locally optimal tours of . A run of 2-opt can then be characterized by a directed path through ending in a sink.
Much research has previously focused on understanding the running time of 2-opt [9, 16, 17, 26, 27] and its approximation ratio [6, 15, 16, 20, 24]. On the other hand, little is known about the structure of . In this paper, we are concerned with counting the sinks of , which is equivalent to counting 2-optimal tours in the instance represented by .
There are practical reasons to study the transition graphs of local search heuristics. First, observe that the transition graph of 2-opt for the random TSP instances we consider is a type of random directed acyclic graph. A run of 2-opt can be viewed as a path through this random DAG, so that the the length of the longest path in is an upper bound for the number of iterations 2-opt can perform. This upper bound is however rather crude: If we consider a run of 2-opt with a random initialization, then the probability that we start the run on a node of the longest path is likely small. If most paths are much shorter, then this can provide a better explanation for the practical running time of 2-opt than only studying the longest path.
Structural results on the transition graph may in addition have implications for the running time of metaheuristics. In particular, 2-opt is often used as the basis of simulated annealing, a physics-inspired metaheuristic [2, Chapter 8]. It has long been known that the structure of the transition graph strongly influences the running time of this algorithm. Structural parameters of this graph often enter convergence results and running time estimates [19, 21, 28]. A recent result by Chen et al. [10] especially illustrates this point. They showed that the Metropolis process (in essence, simulated annealing at a constant temperature) is unable to find even very large planted cliques in an Erdős-Rényi random graph. Their analysis hinges on several structural results on cliques in such random graphs.
In particular, the aforementioned results rely (roughly) on the following structural property of the transition graph: If is the set of desired cliques, then the set of neighbors of is small. Since simulated annealing must pass through to reach , it can be shown that the hitting time of is large in expectation.
The result by Chen et al., as well as the preceding result by Jerrum [21], deals with the purely discrete problem of finding cliques. However, simulated annealing is often applied to weighted problems [1], which yields significant challenges in understanding the transition graph. We believe that understanding more about the structure of transition graphs is key to proving rigorous results on simulated annealing for weighted problems.
To be explicit, for 2-opt, if is the set of 2-optimal tours, then trivially . Thus, if is sufficiently small (Theorem 3), then we easily obtain a lower bound for the hitting time of the 2-optimal tours. We do not perform this analysis here, since (i) it is not the main focus of this paper, and (ii) the resulting bound is still somewhat weak: a better bound likely needs a more thorough structural investigation of the transition graph.
Results
We start by showing that the problem of counting 2-optimal tours is #P-complete, even on complete weighted graphs. Recall that #P is the counting analogue to NP, asking not whether a solution exists but how many solutions exist. A formal definition of #P is provided in Section 2.
Our result is in fact slightly stronger. To state it in full, we need the notion of a path cover. A set of paths in a graph is a path cover if every vertex of is contained in exactly one path of . We then have the following.
Theorem 1.
Let be a function that maps a complete weighted graph on the vertex set to the number of 2-optimal tours on this graph. Using calls to , we can compute the number of path covers of size for each in polynomial time, using as an oracle.
This result yields #P-hardness of #2Opt on complete graphs as a corollary. This counting problem asks the following question: Given a weighted graph , how many 2-optimal tours are there on ? Note that counting Hamiltonian cycles is trivial on complete graphs, whereas hardness of counting 2-optimal tours in the same setting is not immediately obvious.
Theorem 2.
#2Opt is #P-complete, even on complete graphs.
We note that the result remains true for metric TSP instances on complete graphs, which can be seen to hold by adding a sufficiently large number to every edge weight of the original instance.
While counting 2-optimal tours on complete graphs is thus likely intractable in general, we may still wonder about the average case. In the case where the edge weights are given by independent uniformly distributed random variables, we obtain the following upper bound on the number of 2-optimal tours.
Theorem 3.
Let be a complete graph on vertices, with edge weights drawn independently from for each edge. Then the expected number of 2-optimal tours on is bounded from above by .
In the process of proving Theorem 3 we obtain a link between the number of 2-optimal tours and the probability that all entries of a multivariate normal random vector are positive. This quantity is also known as the positive orthant probability. To estimate this probability we prove the following theorem.
Theorem 4.
Let be a multivariate normal vector with zero mean and covariance matrix . The positive orthant probability satisfies
In particular, if the diagonal elements of are each , then
Theorem 4 makes no assumptions on the covariance matrix , and thus holds for any set of zero-mean multivariate normal variables. Hence, it may be of independent interest in applications where bounds on positive orthant probabilities are necessary.
2 Preliminaries
2.1 Notation and Definitions
We start with some notational shorthand. Given a symbol , we write for the string consisting of copies of . Throughout, denotes the logarithm to base 2. We denote the positive orthant of d by . The negative orthant is defined similarly.
Let be a simple graph. For a tour through , we call the edges of the tour-edges and the edges of the chord-edges. A 2-change on then removes two tour-edges from and adds two chord-edges to .
For a fixed tour, if two tour-edges are removed then there is only one choice of chord-edges that yields a new tour. Thus, we can characterize a 2-change fully by the tour-edges it removes. If a 2-change removes the tour-edges and we denote this 2-change by , omitting the subscript when the tour is clear from the context. We say that two 2-changes on are chord-disjoint if they have no chord-edges in common.
Given a set of -changes, we define as the set of pairs of tour-edges that participate in the 2-changes in . For , we define , the number of 2-changes in in which participates. For each of these quantities, we may omit the argument whenever the set meant is clear from context.
Counting Complexity
For our complexity results, we state the definitions of #P and #P-completeness taken verbatim from Arora and Barak [4].
Definition 5.
A function is in #P if there exists a polynomial and a polynomial-time Turing machine such that for every ,
For completeness, we need to recall the complexity class FP, which is the functional analogue to P. This means that FP is the set of functions computable by a polynomial-time deterministic Turing machine. Moreover, we denote by the set of such functions where the Turing machine additionally has access to an oracle for .
Definition 6.
A function is #P-complete if it belongs to #P and every is in .
By this definition, if we can solve some #P-complete problem in polynomial time, then we can solve every problem in #P in polynomial time.
2.2 Multivariate Normal Distribution
We require some basic facts about multivariate normal distributions. Let and let be symmetric and positive definite. The multivariate normal distribution is defined by the probability density function
| (1) |
with support d.
Let . The positive orthant probability of is the probability that falls in the positive orthant . When this quantity is also referred to simply as the orthant probability, without specifiying which orthant, as each orthant is related by flipping the sign of a set of coordinates.
3 Complexity of Counting 2-Optimal Tours
We now proceed to show hardness of counting 2-optimal tours on complete graphs. (For space reasons, we defer some proofs to the appendix.) We start by recalling the notion of a path cover. Given a simple graph , a path cover of is a collection of vertex-disjoint paths such that every belongs to exactly one path of . The size of is the number of paths in . Note that a path cover of size 1 is equivalent to a Hamiltonian path, and that a path cover may contain paths that consist of a single vertex.
From an instance of #HamPath we construct a family of instances of #2Opt for . First, we add a new set of vertices to , where . To ensure that the reduction is polynomial-time computable, we also require ; the reason for this choice will be apparent shortly. The edges of are the edges of , which are assigned a weight of 0 plus the set of edges , which contains:
-
all missing edges between the vertices of , with weight , which we call non-edges;
-
all edges between the vertices of , with weight , which we call -edges;
-
all edges between the vertices of and the vertices of , with weight , which we call -edges.
The precise values of , and are not important, only that they are related as . See Figure 1 for a schematic depiction of the reduction.
By this construction, is a complete graph on vertices. We claim that by computing the number of 2-optimal tours on for we can compute the number of path covers of of size . To this end, we first characterize the 2-optimal tours on .
Lemma 7.
A tour through is 2-optimal if and only if it contains no non-edges.
Proof.
To begin, we note that since , any tour must contain at least one -edge. Now suppose contains a non-edge , and let be an -edge in . Assume these vertices are traversed in the written order on . Then we replace and by and , a 2-change which decreases the tour length by . Hence, is not 2-optimal, proving one direction.
For the other direction, suppose contains no non-edges. A 2-change that adds a non-edge to the tour increases the tour length by at least ; hence, any improving 2-change cannot add any non-edges to the tour. Any 2-changes on must then involve only edges of , -edges and -edges. We go case by case, considering the possible types of the edges removed by any 2-change on .
- Two -edges.
-
Since the vertices involved are all vertices of the added edges are also -edges, and hence the 2-change does not change the length of the tour at all and is therefore not feasible.
- Two edges of .
-
Since these edges have zero weight the tour length cannot decrease, and the 2-change is not feasible.
- Two -edges.
-
There are two possibilities for the added edges. In one case we obtain one -edge and one edge of , for an improvement of . In the other case we obtain again two -edges, again for no improvement.
- One -edge and one edge of .
-
The added edges are and , both of which are -edges and thus of weight . The tour is thus shortened by , and the 2-change yields no improvement.
- One -edge and one edge of .
-
The added edges are and . The added edges are again a -edge and an edge of , yielding no improvement.
- One -edge and one -edge .
-
The added edges consist of one -edge and one -edge, and there is no change in tour length.
These cases cover all possibilities, and hence a tour that contains no non-edges is 2-optimal, concluding the proof.
Let be a 2-optimal tour through . If we remove from all edges incident to , then we obtain a path cover of . We call the size of the resulting path cover the number of segments of . On the other hand, from any path cover of , we can construct many 2-optimal tours by connecting the paths in the cover using vertices of .
More formally, we say that a path cover corresponds to if we obtain by removing the edges of incident to . Note that two distinct path covers of correspond to two disjoint sets of 2-optimal tours through , and every 2-optimal tour corresponds to exactly one path cover. The following lemma counts the number of 2-optimal tours that correspond to a single path cover of size .
Lemma 8.
A path cover of size corresponds to exactly 2-optimal tours in with segments.
Proof.
By Lemma 7, the 2-optimal tours are exactly those tours which contain no non-edges. Given a path cover of size , we can then construct a 2-optimal tour in as follows.
Any tour must visit all vertices in in some order. Hence, we first fix a tour through ; there are such choices. Next, we break edges of ; there are choices of which edges to break.
We must now insert the paths of in place of these broken edges, reconnecting the endpoints of each path to the endpoints of the broken edge it replaces. Note that whenever we perform this operation, there are two possible ways to connect the path. Moreover, there are ways to match an endpoint to a broken edge, for a total of ways to insert the paths.
Putting the pieces together, we thus have
ways to construct a 2-optimal tour through .
Let , and let be a matrix with entries . Recall that runs from to and runs from to .
Lemma 9.
The matrix defined above has full rank.
Proof.
To start, we write out the entries of explicitly. Letting ,
Scaling any row or column of a matrix uniformly does not change its rank. Hence, we multiply column by , and subsequently multiply row by . Finally, interchanging any two rows also does not change the rank of the matrix, and hence we also mirror the rows of the matrix. The resulting matrix has entries
To show that this matrix has full rank, we first recall that a square matrix has full rank if and only if its determinant is nonzero. While we could in principle compute the determinant exactly, this is tedious and not necessary for our purposes. Hence, we use a concise argument from analysis to only show [30], reproduced here for the sake of completeness. The proof uses the notion of the Wronskian of a set of functions , which is the determinant of the matrix with -th element . If the functions are analytic, then vanishes identically if and only if the functions are linearly dependent [5].
We observe that the determinant of is, up to a sign, the Wronskian of the functions evaluated at . Since these functions are polynomials of different degrees, they are linearly independent and hence from analyticity of the polynomials it follows that the Wronskian does not vanish identically. By factoring out all the powers of from the rows and columns, we see that the Wronskian is a power of multiplied by its value at , which is up to a sign. As the Wronskian does not vanish identically, we must have .
Now we proceed to our main algorithmic result, from which #P-completeness of #2Opt follows as a corollary.
Theorem 1. [Restated, see original statement.]
Let be a function that maps a complete weighted graph on the vertex set to the number of 2-optimal tours on this graph. Using calls to , we can compute the number of path covers of size for each in polynomial time, using as an oracle.
Proof.
For a graph , let be the complete weighted graph resulting from the reduction described above. Let be the number of path covers of size , and let be the number of 2-optimal tours on consisting of segments. From Lemma 8, we have
We have .
We aim to compute the numbers for each . Let , and let . Then the above yields the matrix equation .
By Lemma 9, the matrix has full rank, and is hence invertible. While is rather ill-conditioned, the elements of can be encoded using a number of bits polynomial in . Thus, after making calls to to compute the vector , we can compute in polynomial time. The entries of are, by construction, exactly the number of path covers of size of .
Theorem 2. [Restated, see original statement.]
#2Opt is #P-complete, even on complete graphs.
Proof.
Membership in #P is obvious, since the problem of verifying 2-optimality of a tour is in P. For hardness, we rely on the #P-hardness of #HamPath, which was shown by Valiant [29]. Note that the number of Hamiltonian paths through a graph is exactly the number of path covers of size . By Theorem 1, given oracle access to , we can solve #HamPath – and thus every problem in #P – in polynomial time, and hence #2Opt is #P-complete when restricted to complete graphs.
4 Counting 2-Optima in Random Instances
While counting 2-optimal tours is in general hard on complete graphs, we can still provide some results for special cases. (Again, missing proofs may be found in the appendix.) In this section, we restrict our attention to complete graphs on vertices for some integer . The weights of the edges of our graphs are drawn independently from the uniform distribution on .
The strategy we use is to bound the probability that a given tour is 2-optimal. Since we assume the input graph is complete, this probability is the same for all tours. It then suffices to multiply this probability by the total number of tours, which is .
To bound the probability of a tour being 2-optimal we find a large set of mutually chord-disjoint 2-changes on . We then apply a general result that links the probability of -optimality of to how often each edge of is used in (Lemma 10). Details of the construction of this set are given in Section 4.2.
4.1 Orthant Probabilities and 2-Optimality
Let be a set of chord-disjoint 2-changes on a tour on the complete graph on vertices, and define for
Now let be a sequence of independent half-normal distributed random variables with unit variance. We define the function
Observe that is solely a function of .
Lemma 10.
Let be a complete graph on vertices, with each edge of independently assigned a weight drawn from . Let be any tour through . Let be a set of chord-disjoint 2-changes on . Then
Proof sketch.
We start by conditioning on the values of the weights on the edges of . Since is chord-disjoint, the events are independent subject to this conditioning. Since the event that is 2-optimal is equivalent to , the probability of this event then separates into the product of the probabilities that each 2-change in is non-improving. These probabilities are then straightforwardly computed. In the resulting expression, we recognize the function defined above, as well as the joint density of a multivariate half-normal distribution, from which the lemma follows after normalizing this density.
Bounding now involves two steps. First, we must construct a set of chord-disjoint 2-changes such that each edge of is used in many 2-changes in . Second, given this set, we must bound .
We remark now that is trivially bounded from above by 1. However, this leaves a factor of about . Although this factor is small compared to for the set we construct, leaving it in is somewhat unsatisfactory. We thus make an attempt to prove a stronger bound for .
Computing directly is unfeasible. It helps to recast it in terms of a positive orthant probability.
Lemma 11.
For a set of chord-disjoint 2-changes on a tour , let denote the number of tour-edges with . Label these edges of arbitrarily from to . For any , the function is bounded from above by , where is distributed according to a multivariate normal distribution with mean 0 and inverse covariance matrix with entries indexed by ,
where .
Proof.
Since and is decreasing in each variable, we can bound from above uniformly by setting the coefficients of any subset of the variables to zero. Setting these to zero for all variables outside then leaves
where is the same as , except we keep only pairs with both elements in .
Observe that , allowing us to replace the coefficients of these products. We now only need to insert the appropriate normalization factor for a multivariate normal distribution. Comparing the resulting expression with Equation 1 completes the proof.
Still, computing the positive orthant probability directly is rather difficult. Explicit formulas are known for low-dimensional cases, as well as general recursive formulas [3, 11, 12], but none of these are particularly helpful in bounding . As we only need a nontrivial upper bound, some simplifications are possible. In Theorem 4 (restated below), we show that it suffices to bound the expected squared norm of a multivariate normal vector.
Theorem 4. [Restated, see original statement.]
Let be a multivariate normal vector with zero mean and covariance matrix . The positive orthant probability satisfies
In particular, if the diagonal elements of are each , then
4.2 Finding Chord-Disjoint 2-Changes
Next, we construct a set of 2-changes such that any two 2-changes in are chord-disjoint, and most of the edges of participate in many 2-changes in . The construction we provide here works for complete graphs with vertices for some integer .
The construction of proceeds as follows. Recall that we ordered the edges of as . We define the following process on , occurring in stages. At stage we divide the tour into equal segments , starting at . In each stage, we color all the edges in each odd segment red and the edges in each even segment blue. The only exception to this rule is the last edge , which we color black; this edge is not used to form any 2-changes. See Figure 2 for an illustration at stage .
At each stage we consider the 2-changes formed by the red edges in each odd segment together with the even blue edges in its successor segment . We say that these are the 2-changes added in stage , and denote the set of these 2-changes by . We continue this process for stages. Note that in the final stage, each segment contains two colored edges.
With some effort, one can show that is chord-disjoint.
Lemma 12.
The 2-changes in are chord-disjoint.
We now determine how often each edge of is used in , which we need to apply Lemma 10.
Lemma 13.
For as constructed above, we have . Moreover, there are two edges with and two edges with .
Proof.
We maintain the same order on the edges of as used in the construction of . Using this order, we call an edge even if it appears in an even position in this order, and odd otherwise. Note that the last edge is not considered in the construction of , and so . For the remainder of the proof, we consider with removed.
For convenience, denote by the number of 2-changes that participates in at stage , so that . We observe the following property of .
Fact.
Consider stage in the construction of . If an edge is red in stage , then . If is blue, then if is even, and otherwise.
For any edge of , we can count the number of 2-changes of it participates in as follows. (See Figure 3 for an illustration.) Construct a rooted directed binary tree , where the nodes of are sub-paths of . The root of is itself, while the children of any node of are the first and the second halves of the edges in under the ordering of the edges of , labelled and respectively. Thus, the children of are and . We moreover label the arcs of . If and are the children of a node as described above, then the arc gets a label , while gets a label .
We construct in this way from the root, continuing until has depth . (We define the depth of a node in a rooted directed tree as the number of arcs in the directed path from the root to . The depth of the tree itself is the largest depth among all nodes of the tree.) Then the nodes at depth of are the parts into which is partitioned at stage . Note that the leaves of each contain two successive edges of .
Let be an arc in with . From the construction of , it follows that any edge is colored red in the stage corresponding to the depth of . Conversely, if , then is colored blue in this stage.
For each leaf of , we then consider the path from the root to this leaf. Following this path, we collect the labels of the arcs along this path into a string , so . For , we set .
Given for some edge , we know that is red in stage if and only if . We thus know exactly in which stages is colored red, and in which it is colored blue. Using this information together with the fact above, we can derive formulae for for any . There are two distinct cases.
- Case 1: odd.
-
Since is odd, it only participates in any 2-change when it is colored red. Thus, stage contributes 2-changes, and so we count
Note that for a given bit string , this is simply the decimal expansion of the binary number represented by .
Since every bit string of length is present for some odd edge (there are leaves in and each leaf corresponds to a distinct string), we find that .
- Case 2: even.
-
An even edge contributes 2-changes at stage if it is red, and 2-changes if it is blue. Thus, the contribution at this stage is , and so we have
As in the previous case, we recognize here the decimal expansion of in the second term. Thus, .
4.3 Putting the Pieces Together
We return to the positive orthant probability by constructing an inverse covariance matrix corresponding to according to Lemma 11. In the following, we sort the edges of by decreasing value of .
Observe that the first edges of in this order each form 2-changes with one another in . Note that is an integer, as is odd. The entries of corresponding to these 2-changes are . Thus, the inverse covariance matrix constructed from is upper-left triangular, except with ones on the diagonal. We can then write it in block form,
where the diagonal entries of are each 1, and the off-diagonal entries each satisfy . We consider the matrix , which is identical to , but with the off-diagonal entries replaced by .
Comparing to Lemma 11, we proceed to bound the positive orthant probability associated with . A trivial bound for this probability is . Lemma 14 therefore represents a non-trivial, if modest, improvement.
Lemma 14.
Let be distributed according to , where has unit diagonal entries and off-diagonal entries . The positive orthant probability of is bounded from above by .
Lemma 15.
For as constructed above, .
Proof.
By Lemma 11, we need to multiply the bound from Lemma 14 by a factor . It is easily seen from the proof of the latter lemma that this determinant is .
Finally, we combine all of the above for the last crucial lemma, from which our main result follows directly.
Lemma 16.
Let be a complete graph on vertices, with edge weights drawn independently from for each edge. Let be any tour through . The probability that is 2-optimal is bounded from above by
where .
Proof.
Let be a set of 2-changes on constructed as described above, and let be the number of 2-changes in in which participates. From Lemma 13 we have
Using this result in Lemma 10 together with Lemma 15 yields the claim.
This leads to our last result (Theorem 3), using the fact that the number of tours on a complete graph is .
4.4 Numerical Experiment
It seems unlikely that the bound in Theorem 3 is tight. A simple numerical estimate of already shows that it could be improved to , but even this may be a coarse approximation; after all, in constructing , we discard many 2-changes.
To estimate the number of 2-optimal tours numerically, one could take a naïve approach: simply fix a tour , generate edge weights from , and check 2-optimality of with these weights. By repeating this experiment we can estimate . However, since this probability is super-exponentially small (Lemma 16), this is rather inefficient.
We can do better by taking a different view of the problem. Fix again an arbitrary tour . We write the edge weights as a vector . Given , we can determine all possible 2-changes on . Local optimality of means that all these 2-changes yield a negative improvement. Thus, for a 2-change that removes edges and and adds and , this yields an inequality
| (2) |
Each possible 2-change on yields such a constraint. Together with the constraints for each edge, this yields a convex polytope embedded in m. Since the edge weights are i.i.d uniform random variables,
the -dimensional volume of .
We remark now that our approach through Lemma 10 is equivalent to bounding the volume of a polytope that contains , by discarding some of the inequalities (Equation 2). It may be possible to use methods from convex geometry to compute better estimates of ; we leave this discussion to Section 5.
Computing the volume of an arbitrary polytope is itself not an easy task: this problem is #P-hard in general, as shown by Dyer and Frieze [13]. In the same work however, they show that volume approximation can be done efficiently by randomized algorithms. We use the open-source software package Volesti [8, 14] to numerically approximate the volume of .
For a range of different , we compute the inequalities that define , and output them in a format readable for Volesti. We used the “Cooling Balls” algorithm of Volesti, with an error parameter of 0.001 (see the specification of Volesti [8]), and approximated up to . The results of these computations are shown in Figure 4.
5 Discussion
The result we present in Theorem 2 is to our knowledge the first hardness result for counting locally optimal solutions for a natural, weighted local search problem. We note that it is easy to show that counting the local optima is #P-hard for some local search problem. For instance, Johnson et al. [22] provided a local search problem whose locally optimal solutions are exactly the satisfying assignments of an instance of Boolean Satisfiability (plus one extra solution). However, their construction is rather artificial, whereas 2-opt is a heuristic often used in practice. Moreover, for unweighted problems, Goldberg et al. [18] previously showed some hardness results for counting maximal independent sets (which can be seen as locally optimal independent sets under the simple neighborhood of vertex addition).
Theorem 3 is to our knowledge the first non-trivial bound on the number of locally optimal solutions for a local search problem. It must be noted that we only proved Theorem 3 for specific values of , namely for . This restriction simplifies mainly the construction of , our set of chord-disjoint 2-changes. We believe that the results can be extended at the cost of some complexity in the proofs. For the sake of simplicity, we chose not to do so.
The bound in Theorem 3 shows that in expectation, the number of 2-optimal tours in a random TSP instance is approximately the square root of the total number of tours. While still a rather large number, it is nonetheless a super-exponentially small fraction of the total number of tours. While our analysis is specialized to 2-opt, we believe the techniques can be straightforwardly extended to other heuristics, so long as the instance is sufficiently symmetric and the improvement steps can be described by a fixed number of random variables (e.g., four random edge weights for 2-opt). The extension would be more involved if the number of variables per 2-opt step is not fixed, or if different solutions differ in structure.
Limitations
A natural question concerns the tightness of the bound in Theorem 3. We have not succeeded in constructing a lower bound for the number of 2-optimal tours. Presumably such a construction would be rather involved, since there is little structure in this random instance model. Nonetheless, we can still argue that Theorem 3 can be improved significantly from two directions: bounding and constructing .
Lemma 15 is far from optimal. A simple numerical experiment to estimate yields . This results in a bound of in Theorem 3. We also note that, even though our set achieves essentially the largest number of chord-disjoint 2-changes possible on any tour, it may not be an optimal choice: different choices may lead to better sequences of values for .
Using the trivial bound of instead of Lemma 15 in Lemma 16, we would obtain in Theorem 3 a bound of approximately . The difference with our bound is thus less than a factor of , which is a rather minute improvement for the amount of trouble we went through to obtain it. We therefore regard the calculations leading to Lemma 15 more as a proof-of-concept that significant improvements are still possible to obtain rigorously, and as a demonstration of the effort required to achieve anything non-trivial.
Another possible direction for improving the bound in Theorem 3 lies in polyhedron volume estimation. In Section 4.4, we defined a polytope such that the volume of is the probability that a tour on vertices is 2-optimal. While computing the volume of a polytope is #P-hard in general [13], it may be possible to obtain an asymptotic formula for our specific case, as was done by Canfield and McKay for the well-studied Birkhoff polytope [7].
A conjecture
In light of these observations, and the estimate resulting from the numerical experiments, we conjecture the following.
Conjecture 17.
Let be a complete graph on vertices, with edge weights drawn independently from for each edge. Then the expected number of 2-optimal tours on is bounded from above by .
The numerical data suggest that the expected number of tours may even be for some (as opposed to as in Theorem 3), although we could not perform the numerical experiments for large enough to state this with confidence.
References
- [1] Emile Aarts and Jan Korst. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. John Wiley & Sons, Inc., USA, 1989.
- [2] Emile Aarts and Jan Karel Lenstra, editors. Local Search in Combinatorial Optimization. Princeton University Press, 2003. doi:10.2307/j.ctv346t9c.
- [3] I. G. Abrahamson. Orthant Probabilities for the Quadrivariate Normal Distribution. The Annals of Mathematical Statistics, 35(4):1685–1703, December 1964. doi:10.1214/aoms/1177700391.
- [4] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge, 2009. doi:10.1017/CBO9780511804090.
- [5] Maxime Bôcher. Certain cases in which the vanishing of the Wronskian is a sufficient condition for linear dependence. Transactions of the American Mathematical Society, 2(2):139–149, 1901. doi:10.2307/1986214.
- [6] Ulrich A. Brodowsky and Stefan Hougardy. The Approximation Ratio of the 2-Opt Heuristic for the Euclidean Traveling Salesman Problem. In DROPS-IDN/v2/Document/10.4230/LIPIcs.STACS.2021.18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.STACS.2021.18.
- [7] E. Rodney Canfield and Brendan D. McKay. The asymptotic volume of the Birkhoff polytope. Online Journal of Analytic Combinatorics, 2009.
- [8] Apostolos Chalkis and Vissarion Fisikopoulos. Volesti: Volume Approximation and Sampling for Convex Polytopes in R. The R Journal, 13(2):642–660, 2021.
- [9] Barun Chandra, Howard Karloff, and Craig Tovey. New Results on the Old k-opt Algorithm for the Traveling Salesman Problem. SIAM Journal on Computing, 28(6):1998–2029, January 1999. doi:10.1137/S0097539793251244.
- [10] Zongchen Chen, Elchanan Mossel, and Ilias Zadik. Almost-Linear Planted Cliques Elude the Metropolis Process. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Proceedings, pages 4504–4539. Society for Industrial and Applied Mathematics, January 2023. doi:10.1137/1.9781611977554.ch171.
- [11] M. C. Cheng. The Orthant Probabilities of Four Gaussian Variates. The Annals of Mathematical Statistics, 40(1):152–161, 1969. URL: https://www.jstor.org/stable/2239206.
- [12] F. N. David. A note on the evaluation of the multivariate normal integral. Biometrika, 40(3-4):458–459, December 1953. doi:10.1093/biomet/40.3-4.458.
- [13] M. E. Dyer and A. M. Frieze. On the Complexity of Computing the Volume of a Polyhedron. SIAM Journal on Computing, 17(5):967–974, October 1988. doi:10.1137/0217060.
- [14] Ioannis Z. Emiris and Vissarion Fisikopoulos. Practical Polytope Volume Approximation. ACM Trans. Math. Softw., 44(4):38:1–38:21, June 2018. doi:10.1145/3194656.
- [15] Christian Engels and Bodo Manthey. Average-case approximation ratio of the 2-opt algorithm for the TSP. Operations Research Letters, 37(2):83–84, March 2009. doi:10.1016/j.orl.2008.12.002.
- [16] Matthias Englert, Heiko Röglin, and Berthold Vöcking. Worst Case and Probabilistic Analysis of the 2-Opt Algorithm for the TSP. Algorithmica, 68(1):190–264, January 2014. Corrected version: https://arxiv.org/abs/2302.06889. doi:10.1007/s00453-013-9801-4.
- [17] Matthias Englert, Heiko Röglin, and Berthold Vöcking. Smoothed Analysis of the 2-Opt Algorithm for the General TSP. ACM Transactions on Algorithms, 13(1):10:1–10:15, September 2016. doi:10.1145/2972953.
- [18] Leslie Ann Goldberg, Rob Gysel, and John Lapinskas. Approximately counting locally-optimal structures. Journal of Computer and System Sciences, 82(6):1144–1160, 2016. doi:10.1016/j.jcss.2016.04.001.
- [19] Bruce Hajek. Cooling Schedules for Optimal Annealing. Mathematics of Operations Research, 13(2):311–329, 1988. doi:10.1287/MOOR.13.2.311.
- [20] Stefan Hougardy, Fabian Zaiser, and Xianghui Zhong. The approximation ratio of the 2-Opt Heuristic for the metric Traveling Salesman Problem. Operations Research Letters, 48(4):401–404, July 2020. doi:10.1016/j.orl.2020.05.007.
- [21] Mark Jerrum. Large Cliques Elude the Metropolis Process. Random Structures & Algorithms, 3(4):347–359, 1992. doi:10.1002/rsa.3240030402.
- [22] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. How easy is local search? Journal of Computer and System Sciences, 37(1):79–100, August 1988. doi:10.1016/0022-0000(88)90046-3.
- [23] Richard M. Karp. Reducibility among Combinatorial Problems. In Raymond E. Miller, James W. Thatcher, and Jean D. Bohlinger, editors, Complexity of Computer Computations: Proceedings of a Symposium on the Complexity of Computer Computations, The IBM Research Symposia Series, pages 85–103. Springer US, Boston, MA, 1972. doi:10.1007/978-1-4684-2001-2_9.
- [24] Marvin Künnemann, Bodo Manthey, and Rianne Veenstra. Smoothed Analysis of the 2-Opt Heuristic for the TSP under Gaussian Noise, August 2023. Comment: Combination of an ISAAC 2013 paper by Bodo Manthey and Rianne Veenstra and an ICALP 2015 paper by Marvin Künnemann and Bodo Manthey. The results of the ISAAC 2013 paper have been improved. doi:10.48550/arXiv.2308.00306.
- [25] S. Lin and B. W. Kernighan. An Effective Heuristic Algorithm for the Traveling-Salesman Problem. Operations Research, 21(2):498–516, April 1973. doi:10.1287/opre.21.2.498.
- [26] Bodo Manthey and Jesse van Rhijn. Improved Smoothed Analysis of 2-Opt for the Euclidean TSP. In ISAAC 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ISAAC.2023.52.
- [27] Bodo Manthey and Rianne Veenstra. Smoothed Analysis of the 2-Opt Heuristic for the TSP: Polynomial Bounds for Gaussian Noise. In Leizhen Cai, Siu-Wing Cheng, and Tak-Wah Lam, editors, Algorithms and Computation, Lecture Notes in Computer Science, pages 579–589, Berlin, Heidelberg, 2013. Springer. Full, improved version: https://arxiv.org/abs/2308.00306. doi:10.1007/978-3-642-45030-3_54.
- [28] Andreas Nolte and Rainer Schrader. A Note on the Finite Time Behavior of Simulated Annealing. Mathematics of Operations Research, 25(3):476–484, 2000. doi:10.1287/MOOR.25.3.476.12211.
- [29] Leslie G. Valiant. The Complexity of Enumeration and Reliability Problems. SIAM Journal on Computing, 8(3):410–421, August 1979. doi:10.1137/0208032.
- [30] Qiaochu Yuan. Answer to “Determinant of a matrix involving factorials”, November 2022.
