Algorithms for the Diverse k-SAT Problem:
The Geometry of Satisfying Assignments
Abstract
Given a k-CNF formula and an integer s, we study algorithms that obtain s solutions to the formula that are as dispersed as possible. For s = 2, this problem of computing the diameter of a k-CNF formula was initiated by Crescenzi and Rossi, who showed strong hardness results even for k = 2. The current best upper bound [Angelsmark and Thapper '04] goes to 4^n as k grows. As our first result, we show that this quadratic blow-up is not necessary: utilizing the Fast Fourier Transform (FFT), we give a 2^n * poly(n)-time exact algorithm for computing the diameter of any k-CNF formula.
For s > 2, the problem was raised in the SAT community (Nadel '11) and several heuristics have been proposed for it, but no algorithms with theoretical guarantees are known. We give exact algorithms based on the FFT and on clique-finding; the latter runs in time polynomial in |S| for constant s, where |S| is the size of the solution space of the formula and the exponent depends on s and the matrix multiplication exponent omega.
However, current SAT algorithms for finding one solution run in time 2^{n(1 - Omega(1/k))}, which is much faster than all of the above running times. As our main result, we analyze two popular SAT algorithms, PPZ (Paturi, Pudlák, Zane '97) and Schöning's ('02), and show that in time poly(s) * 2^{n(1 - Omega(1/k))} they can be used to approximate the diameter as well as the dispersion (s > 2) problem. While we need to modify Schöning's original algorithm for technical reasons, we show that the PPZ algorithm, without any modification, samples solutions in a geometric sense. We believe this geometric sampling property of PPZ may be of independent interest.
Finally, we focus on diverse solutions to NP-complete optimization problems, and give bi-approximations running in time c^n with c < 2 for several problems such as Maximum Independent Set, Minimum Vertex Cover, Minimum Hitting Set, Feedback Vertex Set, Multicut on Trees, and Interval Vertex Deletion. For all of these problems, all existing exact methods for finding optimally diverse solutions have a runtime with at least an exponential dependence on the number of solutions s. Our methods show that by relaxing to bi-approximations, this dependence on s can be made polynomial.
Keywords and phrases:
Exponential time algorithms, Satisfiability, k-SAT, PPZ, Schöning, Dispersion, Diversity
Category:
Track A: Algorithms, Complexity and Games
Funding:
Ioana O. Bercea: Received funding from Basic Algorithms Research Copenhagen (BARC), supported by VILLUM Foundation Grants 16582 and 54451.
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis
1 Introduction
In this work, we start by asking a simple question: what is the complexity of computing the diameter of a k-SAT solution space? That is, given a satisfiable k-CNF formula, we want to output two satisfying assignments with maximum Hamming distance between them. More generally, what if we want multiple satisfying assignments that are maximally far apart? One can also think of this as finding a binary code with an optimal rate/distance tradeoff, where each codeword must satisfy the given k-CNF formula. We give exact and approximate exponential-time algorithms for these problems and show that existing well-known algorithms for finding one solution can be leveraged to output multiple, reasonably far apart, solutions.
Crescenzi and Rossi [18] formulated the diameter computation problem for general Constraint Satisfaction Problems (CSPs), under the name Maximum Hamming Distance. They studied the approximability of the problem and gave a complete classification based on Schaefer's criteria for the satisfiability of CSPs [46]. In particular, they also showed that the diameter problem is NP-hard even for 2-SAT. (They in fact show that it is PolyAPX-hard. Moreover, while not explicitly stated, their reduction immediately gives an optimal inapproximability for the diameter of a 2-CNF formula.) On the constructive side, Angelsmark and Thapper [4] gave an algorithm that outputs a diameter pair in polynomial space and time roughly c_k^{2n}, whenever there exists a c_k^n-time algorithm for finding one satisfying assignment. Under standard complexity assumptions (SETH), c_k tends to 2 as k grows, so the above approach is unlikely to run in time better than 4^n. (We use the notation Õ(·) to hide polynomial factors in n.)
This already raises the interesting question of the optimal running time needed for finding a diameter pair (i.e., its exponential complexity [13]). In the case of graphs, it is known that a quadratic blow-up in time is unavoidable for diameter, assuming the Orthogonal Vectors Hypothesis [52, 3]. Should we also expect a quadratic blow-up in time for the diameter of k-SAT? We first show that this is not the case: using a Fourier-analytic approach, we show how to compute a diameter pair deterministically in time 2^n * poly(n) (Theorem 6).
Dispersion.
The problem of computing diverse satisfying assignments to a k-CNF formula was explicitly raised by Nadel [40]. Generating diverse solutions has many applications [10, 1, 8], and several other works have focused on finding multiple solutions to either SAT or constraint programming [2, 43, 29, 44, 38, 23, 6]. However, all of the above works are heuristic in nature, and we could not find any algorithm for computing dispersed solutions to k-SAT with provable guarantees. Our work provides the first exact and approximate algorithms for computing diverse solutions to a k-CNF formula.
There are many different ways to define the dispersion of a set of points (see Table 1 in [36]). We consider the two most popular measures of dispersion: the minimum pairwise distance and the sum of pairwise distances (the latter is equivalent to the average pairwise distance). We will use d(·,·) to denote the Hamming distance. By the dispersion problem, we mean: given a k-CNF formula F and an integer s, return a set A of s satisfying assignments to F that maximizes the minimum pairwise distance or the sum of pairwise distances. If the k-CNF formula does not have s distinct satisfying assignments, we allow the algorithm to return a multiset. Unless stated otherwise, our results will be for the minimum version of dispersion.
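For concreteness, the two dispersion measures can be computed directly for any explicit set of assignments. The following Python sketch is our own illustration (not from the paper); it treats assignments as 0/1 tuples.

```python
from itertools import combinations

def hamming(x, y):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(x, y))

def min_dispersion(points):
    """Minimum pairwise Hamming distance over a set of assignments."""
    return min(hamming(x, y) for x, y in combinations(points, 2))

def sum_dispersion(points):
    """Sum of pairwise Hamming distances over a set of assignments."""
    return sum(hamming(x, y) for x, y in combinations(points, 2))
```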
Exact algorithms.
We show that we can extend our Fourier-analytic approach for diameter to dispersion, obtaining an exact algorithm (Theorem 9). Furthermore, we also get a faster algorithm based on clique-finding (Theorem 10) that runs in time polynomial in |S|, where S is the set of satisfying assignments of the formula and the exponent depends on s and the matrix multiplication exponent omega [53].
Faster approximations.
Even with our improvements, the above exact algorithms still run in time at least 2^n in the worst case. What if we allow approximations? Two questions arise:
-
Can one obtain a bound of the form f(s) * c^n? If so, must f have exponential dependence on s, or can it be made polynomial in s?
-
The current fastest k-SAT algorithms for finding one solution run in time 2^{n(1 - Omega(1/k))}. Can one get a bound of this form for dispersion as well? The best runtime for finding s dispersed solutions that one could hope for is poly(s) * 2^{n(1 - Omega(1/k))}, as this is roughly the time taken to find any set of s solutions. Can we achieve this?
Main result, informal. There exist randomized algorithms with a run time of poly(s) * 2^{n(1 - Omega(1/k))} that, given a k-CNF formula on n variables and a parameter s, return a set of s satisfying assignments that approximately maximize the minimum and the sum of pairwise distances. Moreover, for several optimization problems, there exist algorithms with a similar runtime that are bi-approximations, i.e., return approximately-optimal solutions that are also approximately-maximally-diverse.
In addition to these results being a step towards bridging the gap between the theory and practice of finding diverse solutions, what is surprising is that the way we arrive at them reveals novel and interesting aspects of two extremely well-studied algorithms for finding one solution to a given formula: PPZ and Schöning's algorithm.
PPZ and Schöning's algorithms.
The complexity of the k-SAT problem has a long and rich history [34, 35, 13, 21]. In a foundational work, Paturi, Pudlák, and Zane [42] presented a remarkably simple and elegant randomized algorithm for k-SAT. Their algorithm runs in time 2^{n - n/k} and outputs a satisfying assignment with high probability if one exists. A few years later, Schöning [51] developed another surprisingly simple random-walk-based algorithm running in time 2^{n(1 - log_2(k/(k-1)))} (normally presented as (2 - 2/k)^n, which we have rewritten for ease of comparison with PPZ), which runs faster than the PPZ algorithm for all k. Over time, these approaches have been reanalyzed and sometimes improved in a variety of technically subtle and involved ways [33, 11, 41, 31, 30, 39, 49, 28, 47, 48], including the PPSZ algorithm by Paturi, Pudlák, Saks and Zane [41], which is the current fastest algorithm for k-SAT.
In our work, we ask whether the PPZ and Schöning algorithms can exploit the global geometry of the solution space and go beyond finding just one satisfying assignment. Namely, can they be used to approximate the diameter and the dispersion for k-SAT? We remark that the main result above is not a black-box result that works with any SAT solver; we only know how to use the PPZ and Schöning algorithms for this purpose. To familiarize the reader with these two algorithms, we provide their pseudocodes next.
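Since the original pseudocode figures are not reproduced here, the following simplified Python sketches convey the structure of the two algorithms; clause representation, retry counts, and helper names are our own illustrative choices, not the papers' exact formulations. A formula is a list of clauses, each a list of signed variable indices (DIMACS-style).

```python
import random

def satisfies(formula, a):
    """formula: list of clauses of signed ints; a: dict var -> bool."""
    return all(any((l > 0) == a[abs(l)] for l in c) for c in formula)

def ppz_iteration(formula, n):
    """One PPZ iteration: random variable order plus unit propagation.
    Returns a satisfying assignment (dict) or None."""
    y = {v: random.random() < 0.5 for v in range(1, n + 1)}
    order = random.sample(range(1, n + 1), n)   # random permutation
    a = {}
    for v in order:
        forced = None
        for c in formula:
            if any(abs(l) in a and (l > 0) == a[abs(l)] for l in c):
                continue                        # clause already satisfied
            unset = [l for l in c if abs(l) not in a]
            if len(unset) == 1 and abs(unset[0]) == v:
                forced = unset[0] > 0           # unit clause forces v
                break
        a[v] = forced if forced is not None else y[v]
    return a if satisfies(formula, a) else None

def schoening(formula, n, tries=200):
    """Schöning-style random walk: random start, flip a variable of a
    random unsatisfied clause, walk length 3n, restart on failure."""
    for _ in range(tries):
        a = {v: random.random() < 0.5 for v in range(1, n + 1)}
        for _ in range(3 * n):
            unsat = [c for c in formula
                     if not any((l > 0) == a[abs(l)] for l in c)]
            if not unsat:
                return a
            a[abs(random.choice(random.choice(unsat)))] ^= True
        if satisfies(formula, a):
            return a
    return None
```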
Farthest Point Oracles.
Gonzalez [24] proposed the farthest-insertion algorithm and showed that it gives a 1/2-approximation to the minimum version of the dispersion problem: given a metric space of N points, find a set of s points in it that maximizes the minimum pairwise distance. This was later extended to the sum version by [12]. The algorithm builds the set iteratively; in the i-th iteration it adds the point that maximizes the minimum (resp. sum of) distance to the points in the solution so far. Moreover, the factor 1/2 is tight assuming the Exponential Time Hypothesis (ETH), so in a sense, farthest insertion is the best possible (polynomial-time) algorithm for dispersion [22].
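Farthest insertion on an explicit point set under the Hamming metric can be sketched as follows (our own minimal illustration; the choice of starting point and tie-breaking are arbitrary).

```python
def hamming(x, y):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(x, y))

def farthest_insertion(points, s):
    """Gonzalez-style greedy: repeatedly add the point whose minimum
    distance to the current solution is largest (a 1/2-approximation
    for the min version of dispersion)."""
    sol = [points[0]]                 # arbitrary starting point
    while len(sol) < s:
        nxt = max((p for p in points if p not in sol),
                  key=lambda p: min(hamming(p, q) for q in sol))
        sol.append(nxt)
    return sol
```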
In our setting, a farthest point oracle takes as input a k-CNF formula F (with set S of satisfying assignments) and a set (or multiset) A, and outputs a satisfying assignment that is "far away" from the assignments in A. Namely, for z in {0,1}^n, we let d_min(z, A) = min over a in A of d(z, a) and d_sum(z, A) = sum over a in A of d(z, a). Then, for some approximation factor c <= 1, the output assignment a would satisfy either
d_min(a, A) >= c * max over y in S of d_min(y, A), or d_sum(a, A) >= c * max over y in S of d_sum(y, A),
for the min and the sum version, respectively.
In Section 2, we describe our main technical lemmas on the PPZ and Schöning algorithms. This is followed by the algorithms for diameter and dispersion implied by these lemmas (Section 3). As mentioned in the informal result statement, our techniques extend to finding diverse solutions to optimization problems as well. These results are formally described in Section 4.
2 Main Technical Lemmata
Recall that we are aiming for a runtime of the form poly(s) * 2^{n(1 - Omega(1/k))}. The question therefore is: can we implement farthest point insertion in such time? We now state the two main technical lemmas that form the core of our analysis.
Lemma 1 (PPZ performs geometric sampling).
For any z in {0,1}^n, with probability comparable to the PPZ success probability, each iteration of the PPZ algorithm outputs a satisfying assignment a such that d(z, a) is at least a constant factor (depending on k) times the largest distance from z to any satisfying assignment. The iteration of PPZ does not depend on z.
Lemma 2 (Modified Schöning's algorithm is a farthest point oracle).
There exists an algorithm, running in Schöning-type time, that takes a k-CNF formula F and a point z in {0,1}^n as input, and outputs a satisfying assignment a such that d(z, a) is at least a constant factor (depending on k) times the largest distance from z to any satisfying assignment. Here, z is used explicitly inside the iteration.
We sketch the proofs in Section 5. Three remarks are in order.
 Remark 3.
Lemma 1 requires several insights into the behavior of PPZ. PPZ is not a traditional local search algorithm; it falls in the random restriction paradigm [48]. The analysis of PPZ [42] is local in nature: the authors bound the probability of arriving at a solution that is j-isolated, meaning that exactly n - j neighbors of it are also satisfying assignments. This probability is then summed over all satisfying assignments, resulting in the PPZ run-time bound of 2^{n - n/k}. In Lemma 1, on the other hand, we are interested in bounding the probability that PPZ returns a solution that is far away from a given point z. The fact that PPZ, without any modifications based on z, automatically returns such far-away solutions was surprising to us. We leave it as an open question whether the more involved, PPZ-based, state-of-the-art algorithm of Paturi, Pudlák, Saks and Zane (PPSZ) [41] can also be shown to exhibit similar behavior.
 Remark 4.
Unlike for PPZ, we could not prove that Schöning's original algorithm works directly as an approximate farthest point oracle. Our modification of Schöning's algorithm controls both the region of starting assignments and the length of the Schöning walk. Instead of Schöning's analysis, which bounds the probability of finding any solution starting from a random point, we bound the probability of finding a solution that is far away from the given point z. As a plus, in addition to giving us a farthest point oracle, this also allows us to obtain a tradeoff between runtime and approximation factors. More details can be found in Section 5.
 Remark 5.
We investigate other promising candidate approaches for k-CNF dispersion that do not use the PPZ or Schöning algorithms. First, we show that the approach of solving the dispersion problem via uniform sampling algorithms [50] does not necessarily give a good approximation compared to our approach, even for the diameter. Furthermore, we consider yet another promising approach via the Min-Ones problem, which asks for the minimum Hamming weight solution to a SAT formula [20]. While an algorithm for the Min-Ones problem can be used to give a 0.5-approximation of the diameter, we also observe that this approach is unlikely to extend to finding more than two diverse solutions, as the reduction to diameter does not generalize. We refer the reader to the full version of this paper for more details [7].
Lemma 1 and Lemma 2 give us algorithms for computing a set with approximately maximum dispersion for both the min and the sum versions. These are stated formally in Section 3. Moreover, we get a variety of applications: diverse solutions to several optimization problems and CSPs, and reanalyzing SAT algorithms when the formula has many diverse assignments. These are presented in Section 4.
3 Results on Diameter and Dispersion
Throughout the paper, we let F denote a k-CNF formula on n variables (unless otherwise specified). Given such an F, we let S(F) denote the set of satisfying assignments of F. We start by formally defining the diameter problem. For a given formula F, let diam(F) be defined as the maximum of d(x, y) over x, y in S(F), where S(F) is non-empty. Note that when F has a unique satisfying assignment, diam(F) is simply 0. On the other hand, if F is not satisfiable, we leave diam(F) undefined. For a multiset A of satisfying assignments, define sum(A) as the sum of pairwise distances in A and min(A) as the minimum pairwise distance in A. We then define Opt-sum as the maximum value of sum(A) over all multisets A with s satisfying assignments (including multiplicities), and its set variant as the maximum such value over all sets of s distinct satisfying assignments. Further, we define Opt-min as the maximum value of min(A) over all sets A with s distinct satisfying assignments.
3.1 Computing diameter exactly and approximately
Computing diameter exactly.
We first study the exponential complexity of computing diam(F). Specifically, we prove the following theorem.
Theorem 6 (Exact Diameter).
Let F be a k-CNF formula on n variables. There exists a deterministic algorithm that uses 2^n * poly(n) time and space, and outputs a pair of satisfying assignments x, y with d(x, y) = diam(F).
Prior to our work, the best known exact algorithm was by Angelsmark and Thapper [5]. Their algorithm runs in time quadratic in the running time for solving the k-SAT problem, using polynomial space. Our result significantly improves the running time of their algorithm (but uses substantially more space).
Our technique is also different from other techniques in the literature; in particular, this algorithm does not depend on any SAT algorithm. Our main observation is that computing diam(F) can be reduced to computing the convolution of the Boolean function represented by F with itself. We then use the fact that such a convolution can be computed within the above stated time and space bounds using the Fast Fourier Transform.
Our technique for the exact diameter is fairly general and does not depend on the fact that the solution space corresponds to a k-CNF formula. For any Boolean function f whose value f(x) can be computed in polynomial time for a given x, our algorithm can be used to exactly compute the diameter of the set of satisfying inputs of f with the above performance guarantees.
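As an illustration of the convolution idea (a sketch under our own conventions, not the paper's algorithm): over the hypercube, the XOR-convolution of the solution indicator with itself can be computed via the Walsh-Hadamard transform in O(2^n * n) time, and the diameter is then the largest Hamming weight of any difference vector with a nonzero pair count.

```python
def walsh_hadamard(a):
    """In-place Walsh-Hadamard transform of a list of length 2^n."""
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def diameter(sat_indicator):
    """sat_indicator: list of 0/1 of length 2^n; entry x is 1 iff the
    assignment with bitmask x is satisfying. Returns the maximum Hamming
    distance between two satisfying assignments, or -1 if unsatisfiable."""
    N = len(sat_indicator)
    f = walsh_hadamard(list(sat_indicator))
    conv = walsh_hadamard([v * v for v in f])
    # conv[z] = N * (number of pairs (x, y) with x XOR y == z)
    best = -1
    for z in range(N):
        if conv[z] > 0:
            best = max(best, bin(z).count("1"))
    return best
```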
Approximating the diameter.
Next, we give algorithms for approximating diam(F). (All approximation algorithms we present here use polynomial space.) As a warm-up, here is a simple way to approximate it. We start by using the best known algorithm to find a single satisfying assignment of F; call it a. We can then (in polynomial time) change F to F' by negating some of the variables so that a is mapped to the all-ones assignment, which then satisfies F'. One can then use the best known algorithm for the Min-Ones problem, which finds a satisfying assignment of F' with the minimum number of 1s in it, say b. It is easy to see that the Hamming distance between a and (the assignment corresponding to) b gives a 0.5-approximation to the diameter of F. By using the best known algorithms for k-SAT by Paturi, Pudlák, Saks, and Zane [41] and for Min-Ones by Fomin, Gaspers, Lokshtanov and Saurabh [20], it is easy to see that we can obtain this approximation in exponential time dominated by the Min-Ones algorithm.
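The warm-up reduction can be sketched as follows. The brute-force enumerations below are stand-ins for the actual k-SAT and Min-Ones solvers, and the helper names are our own; only the variable-negation trick is the point.

```python
from itertools import product

def satisfies(formula, a):
    """formula: clauses of signed ints; a: tuple of bools, 1-indexed vars."""
    return all(any((l > 0) == a[abs(l) - 1] for l in c) for c in formula)

def flip_formula(formula, a):
    """Negate variables so that solution a maps to the all-ones assignment."""
    return [[l if a[abs(l) - 1] else -l for l in c] for c in formula]

def half_approx_diameter(formula, n):
    """0.5-approximate diameter via a Min-Ones oracle (brute-force stand-ins)."""
    sols = [x for x in product([False, True], repeat=n) if satisfies(formula, x)]
    if not sols:
        return None
    a = sols[0]                                   # any one solution
    g = flip_formula(formula, a)                  # a becomes all-ones in g
    b = min((x for x in product([False, True], repeat=n) if satisfies(g, x)),
            key=lambda x: sum(x))                 # Min-Ones on g
    # undo the flip to recover b as a solution of the original formula
    b_orig = tuple(bi if a[i] else not bi for i, bi in enumerate(b))
    return sum(x != y for x, y in zip(a, b_orig))
```

By the triangle inequality, the farthest solution from a is at least half the diameter away, which is what the minimum-weight solution of the flipped formula realizes.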
Here, we obtain better running times with a small loss in the approximation factor. From here on, we assume that k >= 3 unless stated otherwise.
Theorem 7 (PPZ approximating the diameter).
Let F be a k-CNF formula on n variables. There exists a randomized algorithm running in time Õ(2^{n - n/k}) that takes F as input and, if F is satisfiable, outputs a pair of satisfying assignments whose distance is at least a constant fraction (depending on k) of diam(F), with high probability.
The running time of the algorithm here is exactly the same as the running time of the algorithm achieved in [42], which solves the k-SAT problem. Our result demonstrates that the diameter can be approximated in the same time used to compute a single satisfying assignment. In fact, we achieve this running time by repeatedly invoking the PPZ algorithm. At the heart of the analysis of the PPZ algorithm lies the Satisfiability Coding Lemma from [42]. Informally speaking, the Satisfiability Coding Lemma says that if the solutions of a k-CNF instance are well-separated then they have small descriptions. In our proof, we generalise this lemma. We discuss our proof idea in detail in Section 5.
Next, we show how to approximate the diameter within the running time guarantees of Schöning's algorithm for k-SAT. Specifically, we prove the following theorem.
Theorem 8 (Schöning approximating the diameter).
Let F be a k-CNF formula on n variables. There exists a randomized algorithm running in Schöning-type time that takes F as input and, if F is satisfiable, outputs a pair of satisfying assignments whose distance is at least a constant fraction (depending on k) of diam(F), with high probability.
In fact, Theorem 8 is one instance of a smooth tradeoff between the approximation factor and the running time. Notice that the running time obtained here is better than that obtained using Theorem 7, which in turn is faster than the naive algorithm that uses Min-Ones. We incur some loss in the approximation factors to obtain these speedups. As stated, the result gives non-trivial approximation factors when k is sufficiently large. We refer the reader to the full version of this paper [7] for more details.
3.2 Computing dispersion exactly and approximately
We extend all the algorithms from Section 3.1 and obtain bounds for the dispersion problem.
Exact algorithms for dispersion.
We start with the problem of exactly computing Opt-min and Opt-sum. The obvious algorithm for computing all of these quantities would be a brute-force search over all size-s subsets of the solution space, which requires time exponential in both n and s. We observe that we can extend the Fourier-analytic approach we used in Theorem 6 to do better. We also provide an alternate algorithm for dispersion based on clique-finding that runs in time polynomial in |S| (with exponent depending on s and the matrix multiplication exponent omega [53]) and uses comparable space. As such, it can be much faster than the Fourier-based algorithm when the size of the solution set is much less than 2^n. We formally state these results below.
Theorem 9 (Exact Dispersion using FFT).
Let f be a Boolean function given as a black box and let s be a given parameter. Then, there exist deterministic algorithms that make 2^n oracle calls to f and, in addition to that, use exponential time and space, and provide the following guarantees.
1. The output of the first algorithm is a multiset A of s satisfying assignments maximizing the sum of pairwise distances.
2. The output of the second algorithm is a set A of s distinct satisfying assignments maximizing the sum of pairwise distances.
3. If f has at least s satisfying assignments, the output of the third algorithm is a set A of s distinct satisfying assignments maximizing the minimum pairwise distance.
Theorem 10 (Exact Dispersion using Clique Finding).
There exist deterministic algorithms that, given as input a non-empty set S of binary strings of size |S| and a parameter s, run in time polynomial in |S| (with exponent governed by omega * s / 3), use comparable space, and have the following behaviour.
1. The output of the first algorithm is a multiset A of s elements of S such that sum(A) is maximized.
2. The output of the second algorithm is a set A of s distinct elements of S such that sum(A) is maximized.
3. And, as long as |S| >= s, the output of the third algorithm is a set A of s distinct elements of S such that min(A) is maximized.
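The clique-based approach for the min version can be sketched as follows (the paper's algorithms use fast matrix multiplication for clique detection; the naive subset check below is a brute-force stand-in for illustration only): for each threshold t, connect two solutions iff their distance is at least t, and test for an s-clique.

```python
from itertools import combinations

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def has_clique(adj, nodes, s):
    """Naive check for a clique of size s (fine only for tiny instances)."""
    return any(all(v in adj[u] for u, v in combinations(sub, 2))
               for sub in combinations(nodes, s))

def opt_min_dispersion(solutions, s):
    """Largest t such that some s solutions are pairwise at distance >= t."""
    n = len(solutions[0])
    best = 0
    for t in range(1, n + 1):
        adj = {u: {v for v in solutions if u != v and hamming(u, v) >= t}
               for u in solutions}
        if has_clique(adj, solutions, s):
            best = t
    return best
```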
Approximating dispersion.
We now turn to approximation algorithms for dispersion. Our goal is to obtain approximation algorithms for all versions of the dispersion problem, as we did for the diameter.
We saw that Min-Ones can be used to give a 0.5-approximation to the diameter. However, it is not clear how to use it to approximate the dispersion. We elaborate in Section 5.
Approximating Opt-sum.
We show that the PPZ as well as Schöning's algorithms can be modified to approximate Opt-sum. Formally,
Theorem 11 (PPZ approximating Opt-sum).
Let F be a k-CNF formula on n variables. There exists a randomized algorithm running in time poly(s) * Õ(2^{n - n/k}) that takes F and an integer s as input and, if F is satisfiable, with probability at least 1 - o(1), outputs a multiset A of size s whose sum of pairwise distances is within a constant factor (depending on k) of Opt-sum.
 Remark 12.
When s > 2, this algorithm achieves better approximation ratios for small values of s than stated above. Note that as k and s become large, the approximation factor tends to a fixed constant. For more details, we refer the reader to the full version of this paper [7, Section 3].
We note that we can design algorithms for the set (distinct-solutions) version of Opt-sum with exactly the same approximation factors for certain parameter regimes of s (as earlier, we refer the reader to the full version of this paper for more details).
Approximating Opt-min.
Next, we show that our techniques can be used to approximate Opt-min as well. Formally,
Theorem 13 (PPZ approximating Opt-min).
Let F be a k-CNF formula on n variables. There exists a randomized algorithm running in time poly(s) * Õ(2^{n - n/k}) that takes F and an integer s as input and, if F is satisfiable and |S(F)| >= s, with probability at least 1 - o(1), outputs a set A of size s such that min(A) approximates Opt-min up to a factor involving the inverse binary entropy function. (The function H^{-1} denotes the inverse of the binary entropy function H restricted to the domain [0, 1/2]. The domain of H^{-1} is [0, 1] and its range is [0, 1/2].)
Note that, in the above statement, the approximation factor is non-trivial (i.e., strictly positive) only for a bounded range of s. We note that we can also obtain Schöning-type running time bounds for approximating Opt-min. We achieve this by extending Theorem 8.
Approximating Opt-min for heavy-weight solutions.
We now consider a heavy-weight variant of Opt-min. Formally, for a k-CNF formula F and a weight w, we let S_w(F) denote the set of satisfying assignments to F with Hamming weight at least w. We then define the heavy-weight analogue of Opt-min with S_w(F) in place of S(F), and let MinWt(F) denote the minimum Hamming weight of assignments in S(F). We show that the approach developed for approximating Opt-min via Schöning's algorithm can also be used to return dispersed satisfying assignments of heavy weight.
Theorem 14 (Schöning for weighted dispersion).
Let F be a k-CNF formula on n variables, and let w be a weight parameter. There exist algorithms that take F as input and, with high probability and in Schöning-type time, output:
1. a multiset A of size s approximating the sum version of heavy-weight dispersion, if F is satisfiable and S_w(F) is non-empty;
2. a set A of size s approximating the min version of heavy-weight dispersion, if F is satisfiable and |S_w(F)| >= s.
 Remark 15.
We note that when , this just reduces to an algorithm for approximating . The approximation factors in Theorem 14 are non-trivial only for . However, just like the case of Theorem 8, Theorem 14 can be generalized, obtaining running time bounds for any and for a larger range of approximation factors. Further, we can prove that an analogous result exists for the sum of distances dispersion measure.
We refer the reader to [7, Section 4] for the complete theorem statements and proofs of our Schöning-type results.
4 Generalizations and applications
1. Isometric Reductions.
Dispersion has also been studied when the space is induced by solutions to some NP-complete optimization problem [10, 9]. To address this optimization aspect, we first generalize our techniques to give dispersed solutions of high (or low) Hamming weight. (In a recent work, Gurumukhani, Paturi, Pudlák, Saks, and Talebanfard [26] consider the problem of enumerating satisfying assignments of Hamming weight at least w for a given k-CNF formula, assuming that satisfying assignments of smaller weight do not exist. They show that this problem has interesting connections to circuit lower bounds.) Namely, given w, all of our solutions will have Hamming weight at least (or at most) approximately w, and their dispersion will be close to that of an optimally dispersed set wherein all solutions have weight at least (or at most) w. We then formalize a set of reductions that preserve the size of the solution set and the distances between solutions. We call such reductions isometric. As a result, we can approximate dispersion for problems such as Maximum Independent Set, Minimum Vertex Cover and Minimum Hitting Set.
2. Using the monotone local search framework for diverse solutions.
Our second application allows us to compute diverse solutions to optimization problems that perhaps do not allow isometric reductions to SAT. In this case, we show how to use the monotone local search framework of Fomin, Gaspers, Lokshtanov and Saurabh [20]. This allows us to extend our results to a variety of problems, including Feedback Vertex Set, Multicut on Trees, and Minimum d-Hitting Set (see Table 1 for a sample of the results that can be obtained using this technique; the table provides the running time guarantees to obtain approximately optimal, approximately maximally diverse solutions by plugging into the run-time bounds of our generalized theorem [7, Theorem 36]).
For all of these problems, all existing exact methods for finding a set of optimal, maximally diverse solutions have a runtime with at least an exponential dependence on the number of solutions s [10, 9]. Our methods show that by relaxing to bi-approximations, this dependence on s can be made polynomial.
Optimization Problem | One optimal solution [20] | Multiple approximately optimal, approximately dispersed solutions
---|---|---
Vertex cover | ||
Maximum independent Set | ||
Feedback Vertex Set | ||
Subset Feedback Vertex Set | ||
Feedback Vertex Set in Tournaments | ||
Group Feedback Vertex Set | ||
Vertex -Partization | ||
Interval Vertex Deletion | ||
Proper Interval Vertex Deletion | ||
Block Graph Vertex Deletion | ||
Cluster Vertex Deletion | ||
Thread Graph Vertex Deletion | ||
Multicut on Trees | ||
3-Hitting Set | ||
4-Hitting Set | ||
Weighted Feedback Vertex Set | ||
Weighted 3-Hitting Set |
3. On faster SAT algorithms.
Another compelling reason to study the diversity of the solution space of a k-CNF formula is that the existence of far-apart solutions might be used to study the computational complexity of k-SAT and its variants. Indeed, the geometry of the solution space has been studied extensively, both to obtain faster SAT solvers (parameterised by the number of solutions, such as in Hirsch [32] and Kane and Watanabe [37]) and in the random SAT setting, e.g., the diameter by Feige, Flaxman and Vilenchik [19] and the giant connected component by Chen, Mani, Moitra [17].
Consider a formula F with |S(F)| >= 2^{dn} for some d > 0. For such a formula, it is known that PPZ scales optimally, i.e., it finds one solution in time that improves exponentially with d [14]. Cardinal, Nummenpalo and Welzl [15] proved a weaker result for Schöning's algorithm, but nevertheless, both PPZ and Schöning run faster if the solution space is large. In fact, the same is true for PPSZ [47].
Taking this idea a step further, we investigate the runtime of the PPZ and Schöning algorithms when S(F) contains many well-dispersed solutions. For example, if S(F) contains a Hamming code that achieves the Gilbert-Varshamov bound, we can show an exponential improvement in the runtime of Schöning's algorithm. Similarly, using the geometric sampling property of PPZ in Lemma 1, we obtain an improved runtime in this setting. In this sense, if having more (solutions) is better [47], then our results formalize the intuition that more dispersed solutions are even better. To be precise, for every radius r, we define a packing number of the solution space: by definition, for every r there exists a set of that many satisfying assignments such that the balls of radius r around them are disjoint, and for r = 0 the packing number is simply |S(F)|.
Theorem 16.
Let F be a k-CNF formula. If F is satisfiable, Schöning's algorithm succeeds in finding a satisfying assignment within a number of iterations that decreases as the packing numbers defined above grow.
Theorem 17.
Let F be a k-CNF formula. If F is satisfiable, the PPZ algorithm succeeds in finding a satisfying assignment to F within a number of iterations that decreases as the packing numbers defined above grow.
4. Relation to coding theory.
We mention a connection that might be of independent interest. The dispersion problem can be restated in the language of coding theory: namely, we are looking for codewords that also satisfy a given k-CNF formula. If the formula is trivial (satisfied by all assignments), then it is known that a uniformly random code achieves the Gilbert-Varshamov bound [45]. When the formula is not trivial, the algorithms presented in this work provide such a code. Moreover, our result says that the code can be found in time proportional to the running times of PPZ and Schöning (when the size of the code is small). Additionally, in practice, one also wants codes that have succinct representations, e.g. linear codes [27, 25]. While our codes do not exhibit this property, it would indeed be interesting to extend our algorithms in this direction.
5. CSPs.
Finally, since Schöningâs algorithm for finding one solution generalizes to CSPs, we also give algorithms obtaining diverse solutions to CSPs.
5 Technical Overview: Proof sketches for Lemma 1 and Lemma 2
In this section, we outline the main techniques behind Lemma 1 and Lemma 2, which show that the PPZ and Schöning algorithms can be employed as approximate farthest point oracles. Because of this approximation, slightly more work needs to be done in order to bound the overall approximation factors for dispersion. For the sum dispersion problem, we also adapt the local search algorithm of Cevallos, Eisenbrand, and Zenklusen [16] to our setting to get an improved approximation factor.
Lemma 1: PPZ samples geometrically
The PPZ algorithm consists of repeating the following procedure 2^{n - n/k} times: sample an assignment y in {0,1}^n and a permutation p of the variables uniformly and independently at random, then call a deterministic subroutine Modify(y, p) that runs in polynomial time and outputs another assignment a. The algorithm stops once a satisfies the formula.
The analysis is based on bounding the probability that, for a randomly chosen y and p, Modify leads to some satisfying assignment a. For any a in S(F), let tau(a) denote the probability that an iteration outputs a, and for any set A of satisfying assignments, let tau(A) denote the probability that an iteration outputs a satisfying assignment in A.
The lower bound that PPZ gives on tau(a) uses the local geometry of S(F) around a in the following sense: we say that a is j-isolated if, out of the n neighboring assignments to a in the Boolean hypercube, at least j of them are not satisfying. The key observation in the analysis of the PPZ algorithm, called the Satisfiability Coding Lemma [42], states that for every j-isolated satisfying assignment a, it holds that tau(a) >= 2^{-(n - j/k)}. Intuitively, the more isolated a solution is, the more choices of y and p lead to it through Modify.
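The notion of j-isolation can be checked directly on small instances; the snippet below is our own illustrative helper, enumerating the n hypercube neighbors of a solution.

```python
from itertools import product

def satisfies(formula, a):
    """formula: clauses of signed ints; a: tuple of bools, 1-indexed vars."""
    return all(any((l > 0) == a[abs(l) - 1] for l in c) for c in formula)

def isolation(formula, a, n):
    """j such that a is j-isolated: the number of the n hypercube
    neighbors of a that do NOT satisfy the formula."""
    j = 0
    for i in range(n):
        b = list(a)
        b[i] = not b[i]
        if not satisfies(formula, tuple(b)):
            j += 1
    return j
```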
Our renewed analysis of PPZ shows that, for any fixed assignment z, an iteration is also likely to output satisfying assignments that are far away from z. We state Lemma 1 formally in [7, Lemma 18], which shows that, with probability comparable to the PPZ success probability, each iteration of the PPZ algorithm outputs a satisfying assignment whose distance from z is a constant fraction (depending on k) of the largest possible.
Thus, we get that PPZ is also an approximate farthest point oracle. More interestingly, the run of PPZ does not depend on z, and therefore we say that PPZ samples geometrically. We note that the original analysis does not take into account distances between solutions, i.e., the probability of finding a solution only depends on the number of its immediate neighbors that are non-solutions. This in itself is a local feature that does not capture global properties like the diameter/dispersion of the solution space. Indeed, our analysis differs from the original PPZ analysis precisely in that it exploits this global information (which is needed for diameter/diversity, but not if we just want to find one solution).
In order to exploit global geometric properties of the solution space, we view the set $S$ of satisfying assignments as a subgraph $G$ of the $n$-dimensional hypercube graph. We then divide the vertices in $S$ into layers, where layer $L_i$ consists of all the vertices at distance $i$ from $a$ (in the hypercube). We also define $L_{\geq i} = \bigcup_{j \geq i} L_j$. Now, we want to show that assignments in higher layers will be reached by PPZ with good probability. We do this by proving that for large enough $i$, either $|L_{\geq i}|$ is large or the number of cut edges between $L_{\geq i}$ and $S \setminus L_{\geq i}$ is small in $G$.
We then use the original Satisfiability Coding Lemma and the fact that an assignment is $j$-isolated if and only if its degree in $G$ is $n - j$, to show that, for any subset $T$ of the vertices in $G$, it holds that
$$p(T) \;\geq\; |T| \cdot 2^{-n + \frac{1}{k}\left(n - \frac{2|E(T)| + |E(T, S \setminus T)|}{|T|}\right)},$$
where $E(T)$ denotes the edges in $G$ between vertices in $T$ and $E(T, S \setminus T)$ denotes the edges in $G$ between $T$ and $S \setminus T$. We then use the edge-isoperimetric lemma for subgraphs of the hypercube, which upper-bounds the number of edges in a subgraph by a function of the number of vertices in the subgraph. To complete the proof of Lemma 1, we lower bound the probability $p(L_{\geq i})$, where $L_{\geq i}$ are the assignments in $S$ that are far away from $a$.
We also show that the above analysis can be extended to prove that, for any subset $A$ of assignments, with probability at least $\tilde{\Omega}(2^{-n(1-1/k)})$, each iteration of the PPZ algorithm outputs a satisfying assignment $z$ whose total Hamming distance to $A$ is approximately as large as that of the satisfying assignment farthest from $A$.
This directly implies the existence of an approximate farthest point oracle for the max-sum objective that runs in the same time as the PPZ algorithm [7, Section 3]. However, we were not able to show a similar lower bound with respect to the minimum distance from $A$. Instead, we can use Lemma 1 to show that for every satisfying assignment $z^*$, each iteration of the PPZ algorithm outputs a satisfying assignment within bounded Hamming distance from $z^*$ (invoke Lemma 1 on the antipode of $z^*$). We can also assume that we have a lower bound on the optimal dispersion of the right order (just exhaustively search all the balls around assignments in $A$ until the search cost matches the PPZ running time). Thus, we get an approximate farthest point oracle running in the same time as the PPZ algorithm for the min-dispersion problem as well.
Lemma 2: Modified Schöning's algorithm is a farthest point oracle
Our second approach for designing farthest point oracles uses Schöning's algorithm [51]. At its core, Schöning's algorithm is a local search algorithm that does a random walk from some starting assignment $y$. The main subroutine takes $y$ as input and, as long as there is a clause that is unsatisfied, picks one of its literals at random and flips its value. Schöning showed that, if there exists a satisfying assignment within Hamming distance $d$ from $y$, then within $3n$ steps, the above random walk outputs a satisfying assignment with probability at least $(k-1)^{-d}$, up to polynomial factors. By picking the starting point uniformly at random from $\{0,1\}^n$ and letting the random walk go for $3n$ steps, one can then show that the subroutine succeeds with probability at least $(2(1-1/k))^{-n}$, up to polynomial factors.
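Schöning's original restart-and-walk scheme can be sketched as follows. This is an illustrative Python sketch of the unmodified algorithm (the clause encoding and function names are our own; a practical solver would use far more careful data structures):

```python
import random

def schoening_walk(clauses, n, steps, rng=random):
    """One run of Schöning's local search: random start, then repeatedly
    pick an unsatisfied clause and flip a uniformly random literal in it."""
    # clauses: list of lists of non-zero ints; literal v means x_|v| = (v > 0)
    a = [rng.random() < 0.5 for _ in range(n)]  # a[i] is the value of x_{i+1}
    for _ in range(steps):
        unsat = next((c for c in clauses
                      if not any((l > 0) == a[abs(l) - 1] for l in c)), None)
        if unsat is None:
            return a  # all clauses satisfied
        v = abs(rng.choice(unsat)) - 1
        a[v] = not a[v]
    return None

def schoening(clauses, n, tries=1000, rng=random):
    """Restart the walk from fresh uniform starting points, 3n steps each."""
    for _ in range(tries):
        a = schoening_walk(clauses, n, 3 * n, rng)
        if a is not None:
            return a
    return None
```

The modification described below changes only the two randomized choices: where the starting point is sampled from, and how long the walk runs.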
We modify Schöning's algorithm by picking the starting point and the length of the random walk more carefully. Suppose we are promised that there exists a satisfying assignment $z^*$ at distance $d$ (in max-sum or max-min) from some set $A$ of assignments. We then restrict our starting points to be sampled such that they are also guaranteed to be approximately at distance $d$ from $A$. From there, we perform a random walk of small length such that any satisfying assignment we find is also guaranteed to be far away from $A$. The probability that we succeed depends on bounding the set of good starting points: those that are close to the promised $z^*$ (not just far from $A$), since these are the ones most likely to reach a satisfying assignment within the length of the random walk. This is the most technically involved step of our analysis. We thus get a farthest point oracle for diameter and all versions of dispersion. Moreover, the Schöning strategy can also find heavy-weight assignments. This is done by artificially adding the all-zeros assignment $\vec{0}$ to the set $A$ (thus, an assignment that is far from $\vec{0}$ in Hamming distance will also have a large weight).
References
- [1] Amir Abboud, Vincent Cohen-Addad, Euiwoong Lee, and Pasin Manurangsi. Improved approximation algorithms and lower bounds for search-diversification problems. In Mikolaj Bojanczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4-8, 2022, Paris, France, volume 229 of LIPIcs, pages 7:1–7:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.ICALP.2022.7.
- [2] Sabih Agbaria, Dan Carmi, Orly Cohen, Dmitry Korchemny, Michael Lifshits, and Alexander Nadel. Sat-based semiformal verification of hardware. In Roderick Bloem and Natasha Sharygina, editors, Proceedings of 10th International Conference on Formal Methods in Computer-Aided Design, FMCAD 2010, Lugano, Switzerland, October 20-23, pages 25–32. IEEE, 2010. URL: https://ieeexplore.ieee.org/document/5770929/.
- [3] Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 136–150. IEEE Computer Society, 2015. doi:10.1109/FOCS.2015.18.
- [4] Ola Angelsmark and Johan Thapper. Algorithms for the maximum Hamming distance problem. In Boi Faltings, Adrian Petcu, François Fages, and Francesca Rossi, editors, International Workshop on Constraint Solving and Constraint Logic Programming, volume 3419 of Lecture Notes in Computer Science, pages 128–141. Springer, 2004. doi:10.1007/11402763_10.
- [5] Ola Angelsmark and Johan Thapper. A microstructure based approach to constraint satisfaction optimisation problems. In Ingrid Russell and Zdravko Markov, editors, Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference, Clearwater Beach, Florida, USA, pages 155–160. AAAI Press, 2005. URL: http://www.aaai.org/Library/FLAIRS/2005/flairs05-026.php.
- [6] Andrea Arcuri and Lionel C. Briand. Formal analysis of the probability of interaction fault detection using random testing. IEEE Trans. Software Eng., 38(5):1088–1099, 2012. doi:10.1109/TSE.2011.85.
- [7] Per Austrin, Ioana O Bercea, Mayank Goswami, Nutan Limaye, and Adarsh Srinivasan. Algorithms for the diverse--sat problem: the geometry of satisfying assignments. arXiv preprint arXiv:2408.03465, 2024. doi:10.48550/arXiv.2408.03465.
- [8] Nikhil Bansal, Kamal Jain, Anna Kazeykina, and Joseph Naor. Approximation algorithms for diversified search ranking. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming, 37th International Colloquium, ICALP 2010, Bordeaux, France, July 6-10, 2010, Proceedings, Part II, volume 6199 of Lecture Notes in Computer Science, pages 273–284. Springer, 2010. doi:10.1007/978-3-642-14162-1_23.
- [9] Julien Baste, Michael R. Fellows, Lars Jaffke, Tomáš Masařík, Mateus de Oliveira Oliveira, Geevarghese Philip, and Frances A. Rosamond. Diversity of solutions: An exploration through the lens of fixed-parameter tractability theory. Artif. Intell., 303:103644, 2022. doi:10.1016/j.artint.2021.103644.
- [10] Julien Baste, Lars Jaffke, Tomáš Masařík, Geevarghese Philip, and Günter Rote. FPT algorithms for diverse collections of hitting sets. Algorithms, 12(12):254, 2019. doi:10.3390/a12120254.
- [11] Sven Baumer and Rainer Schuler. Improving a probabilistic 3-SAT algorithm by dynamic search and independent clause pairs. In Enrico Giunchiglia and Armando Tacchella, editors, Theory and Applications of Satisfiability Testing, 6th International Conference, SAT 2003. Santa Margherita Ligure, Italy, May 5-8, 2003 Selected Revised Papers, volume 2919 of Lecture Notes in Computer Science, pages 150–161. Springer, 2003. doi:10.1007/978-3-540-24605-3_12.
- [12] Allan Borodin, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In Michael Benedikt, Markus Krötzsch, and Maurizio Lenzerini, editors, Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012, pages 155–166. ACM, 2012. doi:10.1145/2213556.2213580.
- [13] Chris Calabro. The exponential complexity of satisfiability problems. PhD thesis, University of California, San Diego, USA, 2009. URL: http://www.escholarship.org/uc/item/0pk5w64k.
- [14] Chris Calabro, Russell Impagliazzo, Valentine Kabanets, and Ramamohan Paturi. The complexity of unique k-SAT: An isolation lemma for k-CNFs. J. Comput. Syst. Sci., 74(3):386–393, 2008. doi:10.1016/j.jcss.2007.06.015.
- [15] Jean Cardinal, Jerri Nummenpalo, and Emo Welzl. Solving and sampling with many solutions: Satisfiability and other hard problems. In Daniel Lokshtanov and Naomi Nishimura, editors, 12th International Symposium on Parameterized and Exact Computation, IPEC 2017, September 6-8, 2017, Vienna, Austria, volume 89 of LIPIcs, pages 11:1–11:12. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.IPEC.2017.11.
- [16] Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. An improved analysis of local search for max-sum diversification. Math. Oper. Res., 44(4):1494–1509, 2019. doi:10.1287/moor.2018.0982.
- [17] Zongchen Chen and Nitya Mani. From algorithms to connectivity and back: Finding a giant component in random k-sat. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 3437–3470. SIAM, 2023. doi:10.1137/1.9781611977554.ch132.
- [18] Pierluigi Crescenzi and Gianluca Rossi. On the Hamming distance of constraint satisfaction problems. Theor. Comput. Sci., 288(1):85–100, 2002. doi:10.1016/S0304-3975(01)00146-3.
- [19] Uriel Feige, Abraham D. Flaxman, and Dan Vilenchik. On the diameter of the set of satisfying assignments in random satisfiable k-cnf formulas. SIAM J. Discret. Math., 25(2):736–749, 2011. doi:10.1137/090749323.
- [20] Fedor V. Fomin, Serge Gaspers, Daniel Lokshtanov, and Saket Saurabh. Exact algorithms via monotone local search. J. ACM, 66(2):8:1–8:23, March 2019. doi:10.1145/3284176.
- [21] Fedor V. Fomin and Petteri Kaski. Exact exponential algorithms. Commun. ACM, 56(3):80–88, 2013. doi:10.1145/2428556.2428575.
- [22] Jie Gao, Mayank Goswami, Karthik C. S., Meng-Tsung Tsai, Shih-Yu Tsai, and Hao-Tsung Yang. Obtaining approximately optimal and diverse solutions via dispersion. In Armando Castañeda and Francisco Rodríguez-Henríquez, editors, LATIN 2022: Theoretical Informatics - 15th Latin American Symposium, Guanajuato, Mexico, November 7-11, 2022, Proceedings, volume 13568 of Lecture Notes in Computer Science, pages 222–239. Springer, 2022. doi:10.1007/978-3-031-20624-5_14.
- [23] Carla P. Gomes, Ashish Sabharwal, and Bart Selman. Near-uniform sampling of combinatorial spaces using XOR constraints. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, volume 19, pages 481–488. MIT Press, 2006. URL: https://proceedings.neurips.cc/paper/2006/hash/4110a1994471c595f7583ef1b74ba4cb-Abstract.html.
- [24] Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293–306, 1985. doi:10.1016/0304-3975(85)90224-5.
- [25] Elena Grigorescu, Tali Kaufman, and Madhu Sudan. Succinct representation of codes with applications to testing. SIAM J. Discret. Math., 26(4):1618–1634, 2012. doi:10.1137/100818364.
- [26] Mohit Gurumukhani, Ramamohan Paturi, Pavel Pudlák, Michael E. Saks, and Navid Talebanfard. Local enumeration and majority lower bounds. CoRR, abs/2403.09134, 2024. doi:10.48550/arXiv.2403.09134.
- [27] Venkatesan Guruswami, Johan Håstad, and Swastik Kopparty. On the list-decodability of random linear codes. In Leonard J. Schulman, editor, Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 409–416. ACM, 2010. doi:10.1145/1806689.1806747.
- [28] Thomas Dueholm Hansen, Haim Kaplan, Or Zamir, and Uri Zwick. Faster k-sat algorithms using biased-ppsz. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 578–589. ACM, 2019. doi:10.1145/3313276.3316359.
- [29] Emmanuel Hebrard, Brahim Hnich, Barry O'Sullivan, and Toby Walsh. Finding diverse and similar solutions in constraint programming. In Manuela M. Veloso and Subbarao Kambhampati, editors, Proceedings, The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, July 9-13, 2005, Pittsburgh, Pennsylvania, USA, volume 5, pages 372–377. AAAI Press / The MIT Press, 2005. URL: http://www.aaai.org/Library/AAAI/2005/aaai05-059.php.
- [30] Timon Hertli. Breaking the PPSZ barrier for unique 3-sat. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, volume 8572 of Lecture Notes in Computer Science, pages 600–611. Springer, 2014. doi:10.1007/978-3-662-43948-7_50.
- [31] Timon Hertli, Robin A. Moser, and Dominik Scheder. Improving PPSZ for 3-sat using critical variables. In Thomas Schwentick and Christoph Dürr, editors, 28th International Symposium on Theoretical Aspects of Computer Science, STACS 2011, March 10-12, 2011, Dortmund, Germany, volume 9 of LIPIcs, pages 237–248. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2011. doi:10.4230/LIPIcs.STACS.2011.237.
- [32] Edward A. Hirsch. A fast deterministic algorithm for formulas that have many satisfying assignments. Log. J. IGPL, 6(1):59–71, 1998. doi:10.1093/jigpal/6.1.59.
- [33] Thomas Hofmeister, Uwe Schöning, Rainer Schuler, and Osamu Watanabe. A probabilistic 3-SAT algorithm further improved. In Helmut Alt and Afonso Ferreira, editors, STACS 2002, 19th Annual Symposium on Theoretical Aspects of Computer Science, Antibes - Juan les Pins, France, March 14-16, 2002, Proceedings, volume 2285 of Lecture Notes in Computer Science, pages 192–202. Springer, 2002. doi:10.1007/3-540-45841-7_15.
- [34] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. J. Comput. Syst. Sci., 62(2):367–375, 2001. doi:10.1006/jcss.2000.1727.
- [35] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001. doi:10.1006/jcss.2001.1774.
- [36] Piotr Indyk, Sepideh Mahabadi, Mohammad Mahdian, and Vahab S. Mirrokni. Composable core-sets for diversity and coverage maximization. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, Snowbird, UT, USA, June 22-27, 2014, pages 100–108. ACM, 2014. doi:10.1145/2594538.2594560.
- [37] Daniel Kane and Osamu Watanabe. A short implicant of a CNF formula with many satisfying assignments. Algorithmica, 76(4):1203–1223, 2016. doi:10.1007/s00453-016-0125-z.
- [38] Nathan Kitchen and Andreas Kuehlmann. Stimulus generation for constrained random simulation. In Georges G. E. Gielen, editor, 2007 International Conference on Computer-Aided Design, ICCAD 2007, San Jose, CA, USA, November 5-8, 2007, pages 258–265. IEEE Computer Society, 2007. doi:10.1109/ICCAD.2007.4397275.
- [39] Sixue Liu. Chain, generalization of covering code, and deterministic algorithm for k-sat. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 88:1–88:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.ICALP.2018.88.
- [40] Alexander Nadel. Generating diverse solutions in SAT. In Karem A. Sakallah and Laurent Simon, editors, Theory and Applications of Satisfiability Testing - SAT 2011 - 14th International Conference, SAT 2011, Ann Arbor, MI, USA, June 19-22, 2011. Proceedings, volume 6695 of Lecture Notes in Computer Science, pages 287–301. Springer, 2011. doi:10.1007/978-3-642-21581-0_23.
- [41] Ramamohan Paturi, Pavel Pudlák, Michael E. Saks, and Francis Zane. An improved exponential-time algorithm for k-sat. J. ACM, 52(3):337–364, 2005. doi:10.1145/1066100.1066101.
- [42] Ramamohan Paturi, Pavel Pudlák, and Francis Zane. Satisfiability coding lemma. Chic. J. Theor. Comput. Sci., 1999:566, 1999. URL: http://cjtcs.cs.uchicago.edu/articles/1999/11/contents.html.
- [43] Thierry Petit and Andrew C. Trapp. Enriching solutions to combinatorial problems via solution engineering. INFORMS J. Comput., 31(3):429–444, 2019. doi:10.1287/ijoc.2018.0855.
- [44] Quentin Plazar, Mathieu Acher, Gilles Perrouin, Xavier Devroey, and Maxime Cordy. Uniform sampling of SAT solutions for configurable systems: Are we there yet? In 12th IEEE Conference on Software Testing, Validation and Verification, ICST 2019, Xi'an, China, April 22-27, 2019, pages 240–251. IEEE, 2019. doi:10.1109/ICST.2019.00032.
- [45] Ron M. Roth. Introduction to Coding Theory. Cambridge University Press, 2006.
- [46] Thomas J. Schaefer. The complexity of satisfiability problems. In Richard J. Lipton, Walter A. Burkhard, Walter J. Savitch, Emily P. Friedman, and Alfred V. Aho, editors, Proceedings of the 10th Annual ACM Symposium on Theory of Computing, May 1-3, 1978, San Diego, California, USA, pages 216–226. ACM, 1978. doi:10.1145/800133.804350.
- [47] Dominik Scheder. PPSZ for k ≥ 5: More is better. ACM Trans. Comput. Theory, 11(4):25:1–25:22, 2019. doi:10.1145/3349613.
- [48] Dominik Scheder. PPSZ is better than you think. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 205–216. IEEE, 2021. doi:10.1109/FOCS52979.2021.00028.
- [49] Dominik Scheder and John P. Steinberger. PPSZ for general k-sat - making Hertli's analysis simpler and 3-sat faster. In Ryan O'Donnell, editor, 32nd Computational Complexity Conference, CCC 2017, July 6-9, 2017, Riga, Latvia, volume 79 of LIPIcs, pages 9:1–9:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.CCC.2017.9.
- [50] Manuel Schmitt and Rolf Wanka. Exploiting independent subformulas: A faster approximation scheme for #k-sat. Inf. Process. Lett., 113(9):337–344, 2013. doi:10.1016/j.ipl.2013.02.013.
- [51] Uwe Schöning. A probabilistic algorithm for k-sat based on limited local search and restart. Algorithmica, 32(4):615–623, 2002. doi:10.1007/s00453-001-0094-7.
- [52] Virginia Vassilevska Williams. Some open problems in fine-grained complexity. SIGACT News, 49(4):29–35, December 2018. doi:10.1145/3300150.3300158.
- [53] Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. New bounds for matrix multiplication: from alpha to omega. In David P. Woodruff, editor, Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, Alexandria, VA, USA, January 7-10, 2024, pages 3792–3835. SIAM, 2024. doi:10.1137/1.9781611977912.134.