On the Satisfiability of Random -SAT Formulas with -Wise Independent Clauses
Abstract
The problem of identifying the satisfiability threshold of random -SAT formulas has received a lot of attention during the last decades and has inspired the study of other threshold phenomena in random combinatorial structures. The classical assumption in this line of research is that, for a given set of Boolean variables, each clause is drawn uniformly at random among all sets of three literals from these variables, independently from other clauses. Here, we keep the uniform distribution of each clause, but deviate significantly from the independence assumption and consider richer families of probability distributions. For integer parameters , , and , we denote by the family of probability distributions that produce formulas with clauses, each selected uniformly at random from all sets of three literals from the variables, so that the clauses are -wise independent. Our aim is to make general statements about the satisfiability or unsatisfiability of formulas produced by distributions in for different values of the parameters , , and .
Our technical results are as follows: First, all probability distributions in with return unsatisfiable formulas with high probability. This result is tight. We show that there exists a probability distribution with so that a random formula drawn from is almost always satisfiable. In contrast, for , any probability distribution returns an unsatisfiable formula with high probability. This is our most surprising and technically involved result. Finally, for any integer , any probability distribution with returns a satisfiable formula with high probability.
Keywords and phrases:
Random -SAT, -wise independence, Random bipartite graphCopyright and License:
2012 ACM Subject Classification:
Theory of computation Generating random combinatorial structures ; Mathematics of computing Combinatoric problems ; Mathematics of computing Probability and statisticsFunding:
The research is supported by National Key R&D Program of China (2023YFA1009500) and “the Fundamental Research Funds for the Central Universities”.Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Satisfiability of propositional formulas (SAT) is one of the most renowned problems in theoretical computer science. It appeared in the first lists of NP-complete problems independently proposed by Cook and Levin [44], and is pivotal for many developments in modern complexity theory. Today, many lower bounds on the running time of algorithms rely on the Exponential Time Hypothesis for solving SAT [11, 18, 36, 37]. On the practical side, SAT solvers are frequently deployed in hardware circuit design, model checking, program verification, automated planning and scheduling, as well as in solving real-life instantiations of combinatorial optimization problems such as FCC spectrum auctions. Modern SAT solvers often find solutions to large industrial instances with thousands or even millions of variables despite the NP-hardness of the problem. However, there is still a large discrepancy between the performance of SAT solvers on those instances and theoretical average-case predictions, which have been studied in great depth under the line of research on random SAT.
Random SAT
A -CNF formula over variables is composed of OR-clauses, each containing exactly literals of different variables. In the most commonly studied random SAT model, a formula is generated uniformly at random from all possible -CNF formulas over variables and clauses. The most prominent theoretical question related to random SAT is to identify the satisfiability threshold such that is equal to when , and equal to when . It has been established [15] that -SAT has , and its phase transition window [10] is . For , the asymptotic -SAT threshold was shown to be as [17] (improving previous results from [3]), while for large enough the exact value of was determined in [20]. However, the question of identifying for small values of remains open. In particular, random -SAT has attracted a lot of attention. For the lower bound part, it has been shown in a series of papers [15, 12, 32, 1, 35, 39] that (the currently best known bound is due to [35, 39]). The upper bound part is studied by [27, 38, 21, 19]; the currently best known bound is due to [19]. The estimate was derived from numerical experiments [40] (see also [14, 41]).
A more recent line of work [29, 30, 31, 42] extends the standard model of random -SAT to non-uniform distributions. Their motivation comes from the empirical observation that, in practice, CNF formulas often have rather different frequencies/probabilities for the variables to appear in each clause (following a power-law distribution instead of a uniform one). Namely, Friedrich and Rothenberger [31] proposed a non-uniform random model, where the literals are selected independently at random in each clause of the random -CNF with and where probabilities may vary across different variables. They find satisfiability threshold of non-uniform random -SAT for certain regimes depending on . However, the non-uniform model of [31] does not capture the community biases/correlations (i.e., the fact that certain variables are more likely to appear together in a clause), which are often observed in practice [6]. This leads us to the question of whether it is possible to relax the strong independence assumption in the existing random SAT literature.
Relaxation of independence
We first observe that it does not make much sense to study distributions of SAT formulas with arbitrary correlations over the clauses. Indeed, by allowing correlation between several clauses, one may enforce that the random formula contains large fixed sub-formulas corresponding to NP-hard SAT variants. This would be at odds with our goal of studying average-case complexity. Therefore, we must keep a certain degree of independence in the distribution of instances. We propose to consider the relaxation of mutual independence over clauses in a random formula to -wise independence for a small constant . To keep the new model tractable, we focus on -SAT and uniform distribution of literals within each clause. I.e., we assume that (i) every -OR-clause of a random -CNF formula has three literals of three distinct variables drawn uniformly at random among all such triplets of literals and that (ii) given this marginal distribution of each clause , the distribution over the clause set in is only -wise independent instead of the mutually independent distribution in the standard model. This is a natural generalization that has been considered in a number of different settings but, to the best of our knowledge, not in the context of random SAT. Note that the smaller is, the bigger the set of possible distributions . Furthermore, for small values of , a -wise independent distribution can still capture a large class of dependencies among clauses but at the same time does not allow correlation between any -tuples of clauses. In mathematical terms, the family of discrete -wise independent distributions naturally appears when we map the set of distributions to the set of their low-degree moments. Specifically, if a distribution is supported on the -dimensional binary cube , then all its moments of degree up to can be described as . As low-degree moments (basically, the image of ) are extremely important in statistical analysis, it is equally important to study the kernel of the aforementioned mapping, which exactly corresponds to the family of -wise independent distributions. Let us provide additional justifications of our framework by discussing some of the theoretical work on random -SAT and on other settings with a similar -wise independence relaxation.
- Pseudo-randomness.
-
Historically, the -wise relaxation of independence has been actively used in the literature on derandomization and pseudo-randomness, as it allows to significantly reduce the amount of random bits needed to generate random objects. For example, Alon and Nussboim [5] consider random Erdős-Rényi graphs and examine the minimal degree of independence needed to achieve a variety of graph properties and statistics (such as connectivity, existence of perfect matchings, existence of Hamiltonian cycles, clique and chromatic numbers, etc.) that match those in the mutually independent case. Benjamini et al. [9] consider similar questions for monotone boolean functions. The motivation in [5] comes from the fact that there are efficient constructions of -wise independent distributions with “low degree of independence” (say ) that utilize only random bits, i.e., much fewer than the polynomial number of random bits required to generate mutually independent distributions. While some of this motivation can be applied to our setting of random -SAT, it is a conceptually different story. Indeed, the perspective of pseudo-random generation is through the lenses of “probability theory”, where one controls the distributions and can simply choose one that satisfies necessary conditions such as, e.g., -wise independence. On the other hand, our motivation stems from “statistics”, as our ideal model should have a reasonable fit to empirical observations. So, we would like to use as minimal assumptions as possible and study small (constant) degrees of independence.
- Refutability of -SAT.
-
While the research on lower bounds for random -SAT often comes up with certain simple heuristics that efficiently find a satisfying assignment (see the surveys by Achlioptas [2] and Flaxman [26]), it is extremely hard to find an efficient refutation of a unsatisfiable -SAT formula. Indeed, the common approach to refute a given SAT formula is proof in resolution. Chvatal and Szemeredi [16] first showed that a random -CNF formula with clauses (which is almost surely unsatisfiable) almost surely admits exponential size proof in resolution. Later, Ben-Sasson and Wigderson [8] derived similar result for much larger . On the positive side, [28] gave the first polynomial time algorithm via spectral techniques that almost surely111Refutation in this case is an algorithm with one-sided error: it always refutes the formula correctly by producing certain certificates, or says that the formula might be correct. refutes a random -SAT formula with clauses. The best known bound on is due to Feige and Ofek [25] who proved that, for a sufficiently large constant , random -SAT formulas with clauses can be almost surely refuted in polynomial time using another spectral graph algorithm. We note that a similar situation (extremely high probability of unsatisfiability for a random formula and inability to efficiently confirm it) is unlikely to happen in our -clause independent model for constant . Indeed, the main proof approach for dealing with arbitrary -wise independent distribution is to define a -wise statistic, which differentiates any satisfiable formula from a typical unsatisfiable one.
- Testing -wise independence.
-
The property of -wise independence of a distribution with components can be tested using many samples in polynomial time, when is a constant [4, 43]. This is a useful property to have, as it allows one to verify with only polynomially many instances of random -SAT, whether these instances conform to -wise independence or not.
- Robust mechanism design.
-
A recent line of work in robust mechanism design also considers families of -wise independent Bayesian priors in single and multi-unit auctions [13, 22, 33, 34]. Their motivation is similar to ours, as they also rely on the statistical point of view to justify the extension of the results for mutual independent priors typically assumed in Bayesian mechanism design to -wise independent ones.
1.1 Problem formulation
We consider random -CNF formulas with variables generated from a distribution over clauses, where the mutual independence assumption over clauses is relaxed to -wise independence. We use the term -clause independence to refer to such distributions. We denote such families of distributions by , where each has identical marginals uniformly distributed over all possible OR-clauses and those marginals are only assumed to be -wise independent in . We would like to understand the following question for small values of :
How does the satisfiability threshold of random -SAT formulas behave under any -clause independent distribution ?
As the distribution is not unique, there might be a large gap between lower and upper estimates of . To this end, we formally define the lower satisfiability threshold as an upper bound on , such that a random formula drawn from a distribution in with clauses has . Similarly, the upper satisfiability threshold is a lower bound on , such that the random formula with clauses has . What kind of bounds on upper and lower thresholds should we expect?
Reasonable expectations
The condition only says something about configurations of at most clauses and does not put any other restrictions on the random formula . As the degree of independence is a small constant, any argument that gives bounds on or can only rely on statistics of at most a constant number of clauses. Hence, it is rather likely that bounds on and come together with efficient procedures of, respectively, finding a satisfying assignment for a random formula , or certifying that is not satisfiable. Hence, given the prior work on random -SAT for , we get the following picture:
- Upper satisfiability threshold.
-
The best known result for refuting -CNF formulas efficiently is due to Feige and Ofek [25], who show how to do it only for a large number of clauses . Furthermore, for any smaller number of clauses , a random -CNF formula is likely to have only exponential in proof size for any unsatisfiability proof in resolution [8]. Hence, it is out of reach to aim for a better bound on than while relying only on -wise independence for some constant . In fact, the best known positive result on efficiently computable proofs of unsatisfiability in resolution is due to Beame et al. [7], who show that an ordered DLL algorithm executed on a random -SAT instance with clauses terminates in polynomial time.
- Lower satisfiability threshold.
-
As the proofs for the lower bounds on often establish simple procedures that find satisfying assignments with high probability, it is still possible that is of similar order as the lower bounds on for . Thus, the most ambitious result would be to show that for constant that increases with . A more modest goal is to aim for for a constant , where as .
1.2 Our results
We obtain the following bounds on the upper and lower satisfiability thresholds and for various values of .
Upper satisfiability thresholds
We first consider small degrees of independence, i.e., . In both cases, we show that , meaning that one needs almost all possible clauses in a -clause (as well as -clause) independent formula to ensure that it is unsatisfiable (see Theorem 4 and Theorem 7). The most nontrivial part is to construct the distribution with and . Our construction is based on “-XOR formulas” (i.e., OR-clauses that have either one or three literals that are satisfied by a randomly planted truth assignment), which aligns well with the intuition developed in previous work [24, 25]. The main technical difficulty is to ensure -clause independence by adding a small fraction of unsatisfiable instances and checking all -wise statistics.
Our most exciting and technically involved result (see Theorem 8) is our proof that , i.e., a random formula with clauses is unsatisfiable with large probability for any -clause independent distribution . It is worth noting that such a bound is much harder to get under the -wise independence assumption than in the case of a mutually independent distribution . Indeed, Feige and Ofek [25] describe a very simple refutation algorithm for that fixes a variable and considers all clauses containing or (there will be such clauses in expectation). Then, after deleting (or ), one can reduce the problem to the refutation of the respective random -CNF sub-instance, which can be easily verified in polynomial time and has a low satisfiability threshold of . This simple approach obviously fails for -clause independent distributions. We instead construct a bipartite multigraph between pairs of distinct literals on one side and all singleton literals on the other, in which every OR-clause in corresponds to three different edges. We then carefully examine the statistic that counts subgraphs in for a random . We find that the expected value of for random is only slightly larger than its absolute minimal value, while at the same time is significantly larger than its expectation when is satisfiable. Our argument bears certain similarities with the argument in [25], which also looked at intersections of two literals between pairs of clauses but used the -XOR principle and a differently constructed non-bipartite graph.
Lower satisfiability thresholds
We show (in Theorem 14) that for any . I.e., any -clause independent random formula is satisfiable with high probability if it contains at most clauses. The argument is simple: for any -clause independent distribution, we look at the -uniform hypergraph that corresponds to the variables of a random formula produced according to this distribution, and argue that this graph does not have Berge-cycles, with high probability. We also provide an informal justification that this bound is asymptotically tight, i.e., that . Specifically, we outline a plausible approach for constructing a -clause independent distribution with clauses such that most of its formulas are unsatisfiable. Our approach is built upon existing constructions of dense hyper-graphs with large girth. It is interesting to note that, in both the proof of the result and the approach for showing that , we only need to consider variables and can completely ignore the distribution over the literals.
2 Preliminaries
Let , , , be boolean variables. A literal is a boolean variable or the negation of it. For convenience, we usually represent a literal as a variable-sign pair, i.e., , where is the positive sign + if the literal is a boolean variable and the negative sign - if it is its negation. We define . An instance of the Satisfiability problem in conjunctive normal form (or SAT instance, for short) is a boolean formula over a subset of the variables which is a conjunction of disjunctive clauses, each clause containing literals with different variables. In a -SAT instance, every clause has exactly literals. Each SAT instance can be described as a multiset of clauses. We denote the size of an instance as .
Given the number of variables , let be the set of variables. Let be the set of all unordered triplets of variables and be the set of all possible clauses, i.e., all triplets with literals of different variables. We will usually omit the dependency of and on , when the value of is clear from the context. To refer to the variable (the set of variables) or the sign (the set of signs) of a literal (a clause ), we define operators and , so that if, e.g., , then and .
A truth assignment is a vector , where or at the coordinate corresponds to the “true” or “false” value of , respectively. A truth assignment is satisfying for an instance when each clause has at least one true literal under . An instance is satisfiable if there exists a satisfying assignment; otherwise, is unsatisfiable.
The random -SAT model assumes that the instance is drawn from a probability distribution over all possible -SAT instances with clauses and variables. The main object of study in such a model is the probability of such a random instance being satisfiable as a function of and .
2.1 Random -SAT without mutual independence
In the standard random -SAT model, each clause is drawn uniformly at random among all clauses in , independently from the other clauses. A constructive definition is given by Model 1. We denote the distribution over instances generated by Model 1 as .
Our main focus will be on -wise relaxations of the independent distribution.
Definition 1 (-clause independent random SAT).
A distribution for selecting a random SAT instance is -clause independent if the set of any fixed clauses is distributed uniformly at random over all -tuples of possible clauses, i.e.,
for every and . We denote the family of all -clause independent distributions with clauses over variables as .
We remark that, in contrast to the random -SAT model (Model 1), which defines a single distribution for given and , the family contains many different distributions.
By definition, the probability of any event that depends only on a subset of at most clauses in the -clause independent distribution is the same as for . I.e., the expectations of the indicator function are the same for and . Hence, by linearity of expectation, any statistic that involves only clauses must be the same for and , i.e.,
| (1) |
We would like to understand what are the largest/smallest possible number of clauses for a random -SAT formula to be almost surely satisfiable/unsatisfiable under any -clause independent distribution. Using to denote the event that the SAT instance is satisfiable, we define the following satisfiability “thresholds”.
Definition 2 (Upper satisfiability threshold).
The upper satisfiability threshold is defined as follows. For integer , is the minimum integer such that
Definition 3 (Lower satisfiability threshold).
The lower satisfiability threshold is defined as follows. For integer , is the maximum integer such that
Extending the line of research on the standard random -SAT model, we would like to have as tight estimates of and as possible.
3 Tight bounds for the upper satisfiability threshold
We begin with a technical warm up and prove asymptotically tight bounds on the upper satisfiability threshold of -clause independent random -SAT.
Theorem 4.
.
Theorem 4 follows by the next two lemmas. Lemma 5 provides an upper bound on . In the proof, we introduce the technique that we will use later in Section 5 to get a much more involved upper bound on .
Lemma 5.
For any with , .
Proof.
Let be the number of identical clause pairs in the instance . On the one hand, Equation (1) implies that the expectation of is the same when is drawn from a -clause independent distribution and . On the other hand, if an instance has a satisfying assignment, at least of possible clauses from must not appear in , which means that the value of is significantly higher than its expectation. These two observations will allow us to get the desired bound on the probability .
Let and define . We shall derive two lower bounds on the value the random variable can get: an unconditional lower bound (not far from ) and a lower bound on when is satisfiable (this will be significantly larger than ). We note that
| (2) |
We next derive , , and . We denote as the number of possible clauses and . First, we observe that
| (3) |
To derive the lower bounds and , we observe that any given clause type contributes pairs to , where is the number of type clauses in . I.e.,
| (4) |
The minimum of under the constraint is achieved when all the variables are equal, i.e., equal to . Thus, we get the following lower bound on :
| (5) |
To derive the lower bound on for a satisfiable formula , we note that at least a -fraction of the clause types are not present in . I.e., at least of the variables have value equal to in (4). Similarly to (5), the minimum value of is achieved when all the remaining variables are equal to each other, getting the value . That is,
| (6) |
We plug the bounds (3), (5), and (6) into (2) to get
After simple algebraic transformation, this is equivalent to the inequality , which implies that .
To lower-bound , we use a specific -clause independent distribution, which is depicted as Model 2. The next lemma proves the correctness of our construction. The proof is deferred to the full version.
Lemma 6.
Model 2 defines a -clause independent probability distribution that generates 3-SAT instances of size that are satisfiable with probability at least .
4 A tight lower bound for
Clearly, Lemma 5 also provides an upper bound on . The proof of the next statement follows by presenting a matching lower bound through Model 3.
Theorem 7.
.
We give an explicit construction (see Model 3) of a -clause independent distribution with variables and (an even number of) clauses, such that . Our construction in Model 3 follows the same pattern as in Model 2, but uses an additional step and minor modifications to ensure -clause independence.222The first and third block of Model 3 correspond to the two blocks of Model 2. The only difference in the first block is that the matching of is not to all the variable triplets in but only to of them. The proof is deferred to the full version.
5 An upper bound for
Our next result is rather surprising as it indicates that -wise independence allows for a steep decrease in the upper satisfiability threshold compared to - and -wise independence.
Theorem 8.
.
Proof.
We will prove the theorem by showing that for any positive integers and and any -clause independent probability distribution , it holds . The claim is obvious for . We will assume that and , and will show that for every -clause independent probability distribution . Note that for , the probability bound of follows by selecting uniformly at random a subset of clauses.
We will use a graph representation of -SAT instances defined as follows. Given a -SAT instance consisting of clauses over variables, the bipartite multi-graph has a node corresponding to each (unordered) pair of literals from different variables at the left node side and a node corresponding to each literal at the right node side . Hence, and . For every clause of , has the three edges between the node corresponding to the pair of literals and the node corresponding to literal for .
The main proof idea of Theorem 8 is to analyze the statistic defined as the number of distinct subgraphs in graph . Namely, we first derive an upper bound on the expectation (see Lemma 9) when is drawn from the -clause independent probability distribution . We then give two lower bounds on the value of the random variable by considering an underlying simple subgraph of . Note that the underlying simple subgraph corresponds to a smaller instance with distinct clauses, in which we remove all duplicated clauses in . As we show in Lemma 10, this does not significantly reduce the size of the instance. Our first lower bound (Lemma 11) on holds for any instance and is very close to the upper bound on the expectation of . On the other hand, when is satisfiable, we manage to give a significantly stronger lower bound on in Lemma 12. We then conclude the proof of Theorem 8 by relating all these upper and lower bounds with in Lemma 13. We defer the proofs of Lemmas 9, 10, 11, and 12 to the full version.
Lemma 9.
For any , it holds that .
Lemma 10.
.
Lemma 11.
Let be an instance with variables and distinct clauses. Then,
Lemma 12.
Let be a statisfiable instance with variables and distinct clauses. Then
Lemma 13.
For any , we have .
Proof.
6 Bounds on the lower satisfiability threshold
In this section, we present our upper bound on the lower satisfiability threshold and explain why we believe that it is the tight bound for every degree of independence.
Theorem 14.
For every integer , .
Proof.
We prove the theorem by showing that for every integers and , any -clause independent distribution satisfies .
Let be a -uniform hypergraph defined by an instance drawn from a -clause independent distribution , and , i.e., every node in represents a variable and every hyperegde in represents the variable triplet of a clause. We consider simple Berge-cycles (or, simply, cycles) of length . A cycle of length is defined as a set of distinct hyperedges such that pairs of consecutive edges share exactly one vertex ( and ) and all other pairs of hyperedges are disjoint. In a cycle of length , the two hyperedges and have at least two vertices in common. We similarly define a path of length , where pairs of consecutive edges share exactly one vertex ( for ) and all other pairs of hyperedges are disjoint. We observe that any unsatisfiable instance must contain a subgraph in such that every variable in appears in at least two hyperedges of , which in turn implies that has a cycle. That translates into the following upper bound on the probability that instance is not satisfiable:
| (7) |
In the above derivation, counts the number of distinct cycles of length in , and counts the number of distinct paths of lengths in . For each ordered tuple of hyperedges in (there are such orderings), we can easily calculate the probability that they form a path. Thus,
| (8) |
For each ordering of edges, the first edge can be chosen arbitrarily; the second edge has exactly one vertex in common with and the remaining two vertices are chosen from vertices; each of the remaining edges for has exactly one of the two vertices of different from and has two other vertices chosen from . The upper bound follows after simplifying each of the terms. We similarly derive an upper bound on for as follows:
where for each ordering of edges (now orderings correspond to the same cycle), the probabilities to choose the first edges can be computed in the same way as for ; for the last hyper-edge , there are ways to select a vertex of together with a vertex of , and choices for the new vertex in . Hence,
| (9) |
For we have
| (10) |
We conclude the proof by plugging estimates (8),(9), and (10) into (7).
The second inequality follows since .
6.1 On the tightness of the bound for
Theorem 14 says that -wise independence of is enough to guarantee satisfiability of a random formula for the number of clauses of order . On the other hand, for the mutually independent distribution a random formula is unsatisfiable with high probability for . Furthermore, our analysis in Theorem 14 is essentially tight and it seems unlikely that there is a better bound than . We discuss below why this is the case, by outlining a plausible way for constructing -clause independent distribution with and .
Informal outline of the construction
In our construction of the -clause independent distribution we are only concerned with the distribution of clauses over variables, as all literals will be assigned uniformly at random and independently. Then, a sufficient condition for a random formula to be unsatisfiable with large probability is that the -uniform hypergraph constructed in Theorem 14 has a dense subgraph, i.e., a subgraph on vertices (corresponding to variables in ) with at least hyperedges. Indeed, one can simply count the expected number of satisfying assignments of a random formula on variables and fixed clauses with randomly assigned literals: initially, all assignment are satisfying, but then, as we add clauses one-by-one, the expected number of satisfying assignments reduces each time exactly by a factor (each satisfying assignment disappears with probability ). At the end, we get that the expected number of satisfying assignments is , which means that is unsatisfiable most of the times.
Now, we want to make sure that our hypergraph often has such a “dense” subgraph , to get an unsatisfiable random formula. Notice that needs only have a constant number of vertices. Also, note that according to the analysis in Theorem 14, we need to avoid any cycles of length smaller than or equal to , as the probability of having such a cycle is of order in for and any -clause independent distribution . Luckily, there are many construction of such -uniform hypergraphs on vertices with large number of hyperedges , and also of large girth (e.g., see [23]). We shall “plant” (essentially insert as a connected component into ) with large probability in our construction. However, we need to make sure that by inserting into our graph , we still have enough room to match for all . Next, we give a high level idea how one could achieve this.
As usual for the construction of distributions with identical marginals, we symmetrize over all possible permutations of variables in . Then, our goal is to match the expected numbers for each isomorphism class of configurations of hyperedges in for with . It is useful to take note of the structure of hypergraph for when : it consists of connected components, each of which is a tree of size at most ; the number of connected components of size is a constant that grows slightly faster than linearly in and, more generally, the number of connected components of size is . There is also a probability event of having a Berge-cycle or a connected component of size at least in for . We can ignore in our construction of those events, by adding small probability mass to that consists of conditional on any of these rare events (can be easily achieved via rejection sampling from ). In this way, we only need to worry about matching expected numbers of forest configurations of size in between and . We can pick the constant in sufficiently large, so that the constant size subgraph contributes fewer trees of each type than their respective expected numbers in for . Then, we can add a few more connected components that are trees of size to , so that we match -wise statistics on all tree configurations with . By having and a few trees of size in the graph for , we only utilize a constant number of variables. Then, we would like to keep adding smaller connected components that consist of trees with strictly less than hyperedges and eventually match -wise statistics on all forest-like configurations of size .
It seems plausible that the approach outlined above should work and yield a -clause independent distribution . Apart from heavy notation that would be needed to formalize all steps in the above outline, the main technical hurdle is to ensure that statistics for all “forest-like” configurations of hyperedges with more than one connected component are perfectly matched with for . Note, however, that even if our approach fails and there is a stronger version of Theorem 14 with and for any -clause independent distribution , such a theorem would require a rather nontrivial argument that relies on subtle dependencies between multiple -wise statistics.
References
- [1] Dimitris Achlioptas. Setting 2 variables at a time yields a new lower bound for random 3-SAT. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (STOC), pages 28–37, 2000. doi:10.1145/335305.335309.
- [2] Dimitris Achlioptas. Random satisfiabiliy. In Handbook of Satisfiability, volume 336 of Frontiers in Artificial Intelligence and Applications, pages 437–462. IOS Press, 2nd edition, 2021. doi:10.3233/FAIA200993.
- [3] Dimitris Achlioptas and Yuval Peres. The threshold for random k-sat is . Journal of the American Mathematical Society, 17:947–973, 2004. URL: http://www.jstor.org/stable/20161221.
- [4] Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing k-wise and almost k-wise independence. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), pages 496–505, 2007. doi:10.1145/1250790.1250863.
- [5] Noga Alon and Asaf Nussboim. k-wise independent random graphs. In Proceedings of 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 813–822, 2008. doi:10.1109/FOCS.2008.61.
- [6] Carlos Ansótegui, Jesús Giráldez-Cru, and Jordi Levy. The community structure of SAT formulas. In Proceedings of 15th International Conference on Theory and Applications of Satisfiability Testing (SAT), pages 410–423, 2012. doi:10.1007/978-3-642-31612-8_31.
- [7] Paul Beame, Richard Karp, Toniann Pitassi, and Michael Saks. The efficiency of resolution and davis–putnam procedures. SIAM Journal on Computing, 31(4):1048–1075, April 2002. doi:10.1137/S0097539700369156.
- [8] Eli Ben-Sasson and Avi Wigderson. Short proofs are narrow—resolution made simple. Journal of the ACM, 48(2):149–169, 2001. doi:10.1145/375827.375835.
- [9] Itai Benjamini, Ori Gurel-Gurevich, and Ron Peled. On k-wise independent distributions and boolean functions. arXiv:math, abs/1201.3261, 2012. doi:10.48550/arXiv.1201.3261.
- [10] Béla Bollobás, Christian Borgs, Jennifer T. Chayes, Jeong Han Kim, and David Bruce Wilson. The scaling window of the 2-SAT transition. Random Structures and Algorithms, 18(3):201–256, 2001. doi:10.1002/RSA.1006.
- [11] Karl Bringmann. Why walking the dog takes time: Frechet distance has no strongly subquadratic algorithms unless SETH fails. In Proceedings of 55th IEEE Annual Symposium on Foundations of Computer Science, (FOCS), pages 661–670, 2014. doi:10.1109/FOCS.2014.76.
- [12] Andrei Z. Broder, Alan M. Frieze, and Eli Upfal. On the satisfiability and maximum satisfiability of random 3-CNF formulas. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 322–330, 1993. URL: http://dl.acm.org/citation.cfm?id=313559.313794.
- [13] Ioannis Caragiannis, Nick Gravin, Pinyan Lu, and Zihe Wang. Relaxing the independence assumption in sequential posted pricing, prophet inequality, and random bipartite matching. In Proceedings of 17th International Conference on Web and Internet Economics (WINE), pages 131–148, 2021. doi:10.1007/978-3-030-94676-0_8.
- [14] Peter C. Cheeseman, Bob Kanefsky, and William M. Taylor. Where the really hard problems are. In Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI), pages 331–340, 1991. URL: http://ijcai.org/Proceedings/91-1/Papers/052.pdf.
- [15] Vasek Chvátal and Bruce A. Reed. Mick gets some (the odds are on his side). In Proceedings of 33rd Annual Symposium on Foundations of Computer Science (FOCS), pages 620–627, 1992. doi:10.1109/SFCS.1992.267789.
- [16] Vašek Chvátal and Endre Szemerédi. Many hard examples for resolution. Journal of the ACM, 35(4):759–768, 1988. doi:10.1145/48014.48016.
- [17] Amin Coja-Oghlan. The asymptotic k-sat threshold. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 804–813, 2014. doi:10.1145/2591796.2591822.
- [18] Marek Cygan, Jesper Nederlof, Marcin Pilipczuk, Michal Pilipczuk, Johan M. M. van Rooij, and Jakub Onufry Wojtaszczyk. Solving connectivity problems parameterized by treewidth in single exponential time. ACM Transactions on Algorithms, 18(2):17:1–17:31, 2022. doi:10.1145/3506707.
- [19] Josep Díaz, Lefteris M. Kirousis, Dieter Mitsche, and Xavier Pérez-Giménez. On the satisfiability threshold of formulas with three literals per clause. Theoretical Computer Science, 410(30-32):2920–2934, 2009. doi:10.1016/J.TCS.2009.02.020.
- [20] Jian Ding, Allan Sly, and Nike Sun. Proof of the satisfiability conjecture for large . Annals of Mathematics, 196(1):1–388, 2022. doi:10.4007/annals.2022.196.1.1.
- [21] Olivier Dubois, Yacine Boufkhad, and Jacques Mandler. Typical random 3-SAT formulae and the satisfiability threshold. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 126–127, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338243.
- [22] Shaddin Dughmi, Yusuf Hakan Kalayci, and Neel Patel. Limitations of stochastic selection problems with pairwise independent priors. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC), pages 479–490, 2024. doi:10.1145/3618260.3649718.
- [23] David Ellis and Nathan Linial. On regular hypergraphs of high girth. The Electronic Journal of Combinatorics, 21(1):1, 2014. doi:10.37236/3851.
- [24] Uriel Feige. Relations between average case complexity and approximation complexity. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 534–543, 2002. doi:10.1145/509907.509985.
- [25] Uriel Feige and Eran Ofek. Easily refutable subformulas of large random 3cnf formulas. Theory of Computing, 3(1):25–43, 2007. doi:10.4086/TOC.2007.V003A002.
- [26] Abraham Flaxman. Random planted -sat. In Ming-Yang Kao, editor, Encyclopedia of Algorithms, pages 1728–1732. Springer New York, New York, NY, 2016. doi:10.1007/978-1-4939-2864-4_330.
- [27] John Franco and Marvin C. Paull. Probabilistic analysis of the davis putnam procedure for solving the satisfiability problem. Discrete Applied Mathematics, 5(1):77–87, 1983. doi:10.1016/0166-218X(87)90032-1.
- [28] Joel Friedman, Andreas Goerdt, and Michael Krivelevich. Recognizing more unsatisfiable random k-SAT instances efficiently. SIAM Journal on Computing, 35(2):408–430, 2005. doi:10.1137/S009753970444096X.
- [29] Tobias Friedrich, Anton Krohmer, Ralf Rothenberger, Thomas Sauerwald, and Andrew M. Sutton. Bounds on the satisfiability threshold for power law distributed random SAT. In Proceedings of 25th Annual European Symposium on Algorithms (ESA), pages 37:1–37:15, 2017. doi:10.4230/LIPICS.ESA.2017.37.
- [30] Tobias Friedrich and Ralf Rothenberger. Sharpness of the satisfiability threshold for non-uniform random k-SAT. In Proceedings of 21st International Conference on Theory and Applications of Satisfiability Testing (SAT), pages 273–291, 2018. doi:10.1007/978-3-319-94144-8_17.
- [31] Tobias Friedrich and Ralf Rothenberger. The satisfiability threshold for non-uniform random 2-SAT. In Proceedings of 46th International Colloquium on Automata, Languages, and Programming, (ICALP), pages 61:1–61:14, 2019. doi:10.4230/LIPICS.ICALP.2019.61.
- [32] Alan M. Frieze and Stephen Suen. Analysis of two simple heuristics on a random instance of k-SAT. Journal of Algorithms, 20(2):312–355, 1996. doi:10.1006/JAGM.1996.0016.
- [33] Nick Gravin and Zhiqi Wang. On robustness to k-wise independence of optimal bayesian mechanisms. In Proceedings of the 65th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 1275–1293, 2024. doi:10.1109/FOCS61266.2024.00084.
- [34] Anupam Gupta, Jinqiao Hu, Gregory Kehne, and Roie Levin. Pairwise-independent contention resolution. In Proceddings of 25th International Conference on Integer Programming and Combinatorial Optimization (IPCO), pages 196–209, 2024. doi:10.1007/978-3-031-59835-7_15.
- [35] Mohammad Taghi Hajiaghayi and Gregory B Sorkin. The satisfiability threshold of random 3-SAT is at least 3.52. arXiv:math, abs/0310193, 2003. doi:10.48550/arXiv.math/0310193.
- [36] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. Journal of Computer and System Sciences, 62(2):367–375, 2001. doi:10.1006/JCSS.2000.1727.
- [37] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? Journal of Computer and System Sciences, 63(4):512–530, 2001. doi:10.1006/JCSS.2001.1774.
- [38] Anil Kamath, Rajeev Motwani, Krishna V. Palem, and Paul G. Spirakis. Tail bounds for occupancy and the satisfiability threshold conjecture. Random Structures and Algorithms, 7(1):59–80, 1995. doi:10.1002/RSA.3240070105.
- [39] Alexis C. Kaporis, Lefteris M. Kirousis, and Efthimios G. Lalas. The probabilistic analysis of a greedy satisfiability algorithm. Random Structures and Algorithms, 28(4):444–480, 2006. doi:10.1002/RSA.20104.
- [40] Marc Mezard, Giorgio Parisi, and Riccardo Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002. doi:10.1126/science.1073287.
- [41] David G. Mitchell, Bart Selman, and Hector J. Levesque. Hard and easy distributions of SAT problems. In Proceedings of the 10th National Conference on Artificial Intelligence (AAAI), pages 459–465, 1992. URL: http://www.aaai.org/Library/AAAI/1992/aaai92-071.php.
- [42] Oleksii Omelchenko and Andrei A. Bulatov. Satisfiability threshold for power law random 2-sat in configuration model. Theoretical Computer Science, 888:70–94, 2021. doi:10.1016/J.TCS.2021.07.028.
- [43] Ronitt Rubinfeld and Ning Xie. Testing non-uniform k-wise independent distributions over product spaces. In Proceedings of 37th International Colloquium on Automata, Languages, and Programming (ICALP), pages 565–581, 2010. doi:10.1007/978-3-642-14165-2_48.
- [44] Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2013.
