
One-Shot Learning for k-SAT

Andreas Galanis (University of Oxford, UK), Leslie Ann Goldberg (University of Oxford, UK), Xusheng Zhang (University of Oxford, UK)
Abstract

Consider a k-SAT formula Φ where every variable appears at most d times, and let σ be a satisfying assignment of Φ sampled proportionally to $e^{\beta m(\sigma)}$ where $m(\sigma)$ is the number of variables set to true and β is a real parameter. Given Φ and σ, can we learn the value of β efficiently?

This problem falls into a recent line of works about single-sample (“one-shot”) learning of Markov random fields. The k-SAT setting we consider here was recently studied by Galanis, Kandiros, and Kalavasis (SODA’24) where they showed that single-sample learning is possible when roughly $d\lesssim 2^{k/6.45}$ and impossible when $d\gtrsim (k+1)2^{k-1}$. Crucially, for their impossibility results they used the existence of unsatisfiable instances which, aside from the gap in d, left open the question of whether the feasibility threshold for one-shot learning is dictated by the satisfiability threshold of k-SAT formulas of bounded degree.

Our main contribution is to answer this question negatively. We show that one-shot learning for k-SAT is infeasible well below the satisfiability threshold; in fact, we obtain impossibility results for degrees d as low as $k^2$ when β is sufficiently large, and bootstrap this to small values of β when d scales exponentially with k, via a probabilistic construction. On the positive side, we simplify the analysis of the learning algorithm and obtain significantly stronger bounds on d in terms of β. In particular, for the uniform case $\beta=0$ that has been studied extensively in the sampling literature, our analysis shows that learning is possible under the condition $d\lesssim 2^{k/2}$. This is nearly optimal (up to constant factors) in the sense that it is known that sampling a uniformly-distributed satisfying assignment is NP-hard for $d\gtrsim 2^{k/2}$.

Keywords and phrases:
Computational Learning Theory, k-SAT, Maximum likelihood estimation
Category:
Track A: Algorithms, Complexity and Games
Copyright and License:
© Andreas Galanis, Leslie Ann Goldberg, and Xusheng Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Maximum likelihood estimation; Theory of computation → Machine learning theory; Theory of computation → Problems, reductions and completeness
Related Version:
Full Version: https://arxiv.org/abs/2502.07135 [22]
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Footnote: For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. All data is provided in full in the results section of this paper.

A key task that arises in statistical inference is to estimate the underlying parameters of a distribution, frequently based on the assumption that one has access to a sufficiently large number of independent and identically distributed (i.i.d.) samples. However, in many settings it is critical to perform the estimation with substantially fewer samples, driven by constraints in data availability, computational cost, or real-time decision-making requirements. In this paper, we consider the extreme setting where only a single sample is available and investigate the feasibility of parameter estimation in this case. We refer to this setting as “one-shot learning”.

Markov random fields (also known as undirected graphical models) are a canonical framework used to model high-dimensional distributions. The seminal work of [11] initiated the study of one-shot learning for the Ising and spin glass models, a significant class of Markov random fields that includes the well-known Sherrington-Kirkpatrick and Hopfield models. This approach was later explored in greater depth for the Ising model by [3] and subsequently extended to tensor or weighted variants of the Ising model in [25, 35, 15]. Beyond the Ising model, [17, 18] examined one-shot learning in more general settings, notably including logistic regression and higher-order spin systems, obtaining various algorithmic results in “soft-constrained” models, i.e., models where the distribution is supported on the entire state space. [4] showed that efficient parameter estimation using one sample is still possible under the presence of hard constraints which prohibit certain states, relaxing the soft-constrained assumption with “permissiveness”; canonical Markov random fields in this class include various combinatorial models such as the hardcore model (weighted independent sets). Notably, in all these cases, one-shot learning is always feasible with mild average-degree assumptions on the underlying graph (assuming of course access to an appropriate sample).

More recently, [23] investigated one-shot learning for hard-constrained models that are not permissive, focusing primarily on k-SAT and proper colourings; in contrast to soft-constrained models, they showed that one-shot learning is not always possible and investigated its feasibility under various conditions. However, their results left one important question open for k-SAT, namely identifying the “right” feasibility threshold. In particular, their impossibility results were based on the existence of unsatisfiable instances for k-SAT, suggesting that it might be the satisfiability threshold that is most relevant for one-shot learning. Here we refute this in a strong way. We show infeasibility well below the satisfiability threshold, and obtain positive results that align closely with the conjectured threshold for sampling satisfying assignments.

1.1 Definitions and Main Results

In the k-SAT model, we consider the state space $\Omega_n:=\{\mathsf{TRUE},\mathsf{FALSE}\}^{n}$, where each element is an assignment to n Boolean variables. The support of the Markov random field is then restricted to the set of assignments that satisfy a given k-CNF formula. More precisely, we define $\Phi_{n,k,d}$ as the set of CNF formulas with n variables such that each clause has exactly k distinct variables and each variable appears in at most d clauses. For an assignment $\sigma\in\Omega_n$ and a formula $\Psi\in\Phi_{n,k,d}$, we denote by $\sigma\models\Psi$ the event that σ satisfies Ψ and we denote by $m(\sigma)$ the number of variables that are assigned to 𝖳𝖱𝖴𝖤 in σ. (See Section 1.4 for further details.)

We study the weighted k-SAT model parametrized by β. For a fixed formula $\Psi\in\Phi_{n,k,d}$, the probability for each assignment $\sigma\in\Omega_n$ is given by

$\Pr_{\Psi,\beta}[\sigma]=\dfrac{e^{\beta m(\sigma)}\,\mathbb{1}[\sigma\models\Psi]}{\sum_{\sigma'\in\Omega_n}e^{\beta m(\sigma')}\,\mathbb{1}[\sigma'\models\Psi]}.$ (1)

Let $\Omega_n(\Psi):=\{\sigma\in\Omega_n:\sigma\models\Psi\}$ be the support of $\Pr_{\Psi,\beta}$. When $\beta=0$, this distribution reduces to the uniform distribution over all satisfying assignments $\Omega_n(\Psi)$. For general $\beta\neq0$, it biases the distribution toward assignments with more 𝖳𝖱𝖴𝖤s if $\beta>0$ and toward those with more 𝖥𝖠𝖫𝖲𝖤s if $\beta<0$.
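To make the definition concrete, the following small Python sketch (an illustration added here, not part of the formal development) enumerates the satisfying assignments of a toy formula and computes the distribution (1) by brute force; the clause encoding as tuples of signed variable indices is an assumption of the example.

```python
from itertools import product
from math import exp

# A clause is a tuple of literals; literal (i, True) means x_i, (i, False) means the negation of x_i.
# Toy 3-CNF on 4 variables (each variable appears in at most 2 clauses).
phi = [((0, True), (1, True), (2, False)),
       ((1, False), (2, True), (3, True))]

def satisfies(sigma, clauses):
    """sigma is a tuple of booleans; a clause is satisfied if some literal matches."""
    return all(any(sigma[i] == sign for i, sign in c) for c in clauses)

def weighted_ksat_distribution(clauses, n, beta):
    """Return {sigma: Pr_{Psi,beta}[sigma]} over satisfying assignments, as in (1)."""
    weights = {}
    for sigma in product([True, False], repeat=n):
        if satisfies(sigma, clauses):
            m = sum(sigma)                       # m(sigma): number of TRUEs
            weights[sigma] = exp(beta * m)
    Z = sum(weights.values())                    # partition function Z(beta; Psi)
    return {sigma: w / Z for sigma, w in weights.items()}

dist = weighted_ksat_distribution(phi, n=4, beta=0.5)
print(len(dist), "satisfying assignments; total mass =", sum(dist.values()))
```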

We consider the following one-shot learning task for β. The learner knows the parameters d, k and a fixed formula $\Psi\in\Phi_{n,k,d}$. Additionally, the learner has access to a single sample $\sigma\in\Omega_n(\Psi)$ drawn from the distribution $\Pr_{\Psi,\beta}[\cdot]$. The learner also knows that β lies within a specified range $|\beta|\le B$, but it does not know the exact value of β. The goal is to estimate β using these inputs.

To quantify the accuracy of our estimate, we say that $\hat\beta$ is an ε-estimate if $|\beta-\hat\beta|\le\epsilon$. Typically we want ε to decrease as n increases, so that $\epsilon\to0$ as $n\to\infty$. In this case we call $\hat\beta$ a consistent estimator. On the other hand, if there exists a constant $\epsilon_0>0$ such that $\limsup_{n\to\infty}|\hat\beta-\beta|\ge\epsilon_0$, then $\hat\beta$ is not a consistent estimator and we say that β is not identifiable by $\hat\beta$. Finally, if β is not identifiable by any $\hat\beta$, we say it is impossible to estimate β.

Our main algorithmic result is a linear-time one-shot learning algorithm for β in the weighted k-SAT model.

Theorem 1.

Let $B>0$ be a real number. Let $d,k\ge3$ be integers such that

$d\le\frac{1}{e^{3}k}\,\big(1+e^{-B}\big)^{k/2}.$ (2)

There is an estimation algorithm which, for any β with $|\beta|\le B$, given any input $\Phi\in\Phi_{n,k,d}$ and a sample $\sigma\sim\Pr_{\Phi,\beta}$, outputs in $O(n+\log(nB))$ time an $O(n^{-1/2})$-estimate $\hat\beta(\sigma)$ such that

$\Pr_{\Phi,\beta}\big[\,|\hat\beta(\sigma)-\beta|=O(n^{-1/2})\,\big]=1-e^{-\Omega(n)}.$

Our results improve upon the conditions in [23], which ensure a consistent estimate under the requirement $d\lesssim(1+e^{-B})^{k/6.45}$. Based on the corresponding threshold for approximate sampling, the conjectured “true” threshold for d is of the order $(1+e^{-B})^{k/2}$. Consequently, our improved condition in (2) is only off by a polynomial factor in k relative to this conjectured threshold.

For comparison with the approximate sampling threshold – commonly stated for the uniform k-SAT distribution – we specialize to $B\to0$. In that limit, our algorithmic result for single-sample learning holds roughly when $d\lesssim 2^{k/2}$. The best currently known result for efficient sampling, due to [38], holds under the condition $d\lesssim 2^{k/4.82}$; see also the series of works [33, 19, 30, 29]. It is conjectured that the sharp condition for efficient sampling is $d\lesssim 2^{k/2}$, supported by matching hardness results for monotone formulas. It is known in particular that for $d\gtrsim 2^{k/2}$, no efficient sampling algorithm exists (unless NP = RP).

To complement our algorithmic result, we also present impossibility results, suggesting that conditions like (2) are nearly sharp.

Theorem 2.

Let β be a real number such that $|\beta|>1$. Let $k\ge4$ be an even integer, and let n be a multiple of $k/2$ that is large enough. If

$d\ge k^{3}\left(1+\frac{e}{e^{|\beta|}-e}\right)^{k/2},$ (3)

then there exists a formula $\Phi\in\Phi_{n,k,d}$ such that it is impossible to estimate β from a sample $\sigma\sim\Pr_{\Phi,\beta}$ with high probability.

For the parameter d around the satisfiability threshold, specifically at the uniquely satisfiable threshold u(k) (see, e.g., [32] on the connection between these two thresholds), if

$d\ge u(k)=\Theta\!\left(\frac{2^{k}}{k}\right),$ (4)

there exists a formula $\Phi\in\Phi_{n,k,d}$ such that it is impossible to estimate β from any number of samples $\sigma\sim\Pr_{\Phi,\beta}$, because $\Omega_n(\Phi)$ is a deterministic set consisting of a single satisfying assignment that does not depend on β. [23] explicitly constructs such a formula Φ, though it requires an additional $O(k^{2})$ factor relative to (4), representing the previous best known condition for the impossibility of estimation. Condition (3) in Theorem 2 not only relaxes (4) when $|\beta|$ grows large, but it also features the correct $k/2$ exponent, matching that in both (2) and the conjectured threshold. Indeed, when $B=|\beta|$, conditions (2) and (3) both take the form

$\big(1+O(e^{-|\beta|})\big)^{k/2}\,k^{O(1)}.$ (5)

These findings partially indicate that, at least for the k-SAT model, the sampling threshold is more relevant to one-shot learning than the satisfiability threshold.

In addition, we find that if we allow β to be proportional to k, then learning becomes impossible for a significantly larger range of d. Specifically, unlike condition (3), which requires d to be exponential in k, here we only need d to be quadratic in k, leading to a much sparser formula when k is large.

Theorem 3.

Let $k\ge4$ be an even integer. For all β such that $|\beta|\ge k\ln2$ the following holds. Let n be a multiple of $k/2$ that is large enough. If $d\ge k^{2}/2$, then there exists a formula $\Phi\in\Phi_{n,k,d}$ such that it is impossible to estimate β from a sample $\sigma\sim\Pr_{\Phi,\beta}$ with high probability.

 Remark 4.

In the regimes where Theorem 2 or Theorem 3 apply, the corresponding formula Φ ensures that there is a single assignment that is the output with all but exponentially-small probability, regardless of the value of β. Hence, the proofs of these theorems guarantee that it is impossible to learn β even from exponentially many independent samples. Moreover, for any pair $(\beta_1,\beta_2)$ such that $|\beta_1|>|\beta_2|\ge|\beta|$ and $\beta_1\beta_2>0$, no hypothesis test for $H_0:\beta=\beta_1$ versus $H_1:\beta=\beta_2$ can distinguish $\Pr_{\Phi,\beta_1}$ from $\Pr_{\Phi,\beta_2}$; that is, there exists no sequence of consistent test functions $\phi_n:\Omega_n\to\{0,1\}$ such that

$\lim_{n\to\infty}\mathbb{E}_{\sigma\sim\Pr_{\Phi,\beta_1}}\phi_n(\sigma)=0\quad\text{and}\quad\lim_{n\to\infty}\mathbb{E}_{\sigma\sim\Pr_{\Phi,\beta_2}}\phi_n(\sigma)=1.$

1.2 Proof overview

We prove Theorem 1 by using the maximum pseudo-likelihood estimator. Establishing the consistency of this estimator – as stated in Theorem 5 – requires demonstrating that the log-pseudo-likelihood function is (strongly) concave; see (11) for the precise formulation. In the k-SAT setting, showing such concavity amounts to showing that, with high probability over samples drawn from $\Pr_{\Phi,\beta}[\cdot]$, each sample contains a linear number of “flippable variables”. We prove this property in Lemma 6 by applying the Lovász Local Lemma (LLL), which enables us to compare $\Pr_{\Phi,\beta}[\cdot]$ to a suitable product distribution – under which the number of flippable variables is guaranteed to be linear. Notably, we apply the LLL directly to a non-local set of variables, in contrast to previous analyses that confined the application of the LLL to local neighborhoods – a restriction that typically imposes stronger constraints on the parameter regime. By circumventing these stronger constraints, our approach achieves its guarantee under the nearly optimal LLL condition.

The main technical novelty of this paper is our negative results. To explain these, we begin by outlining the proof of Theorem 3. For simplicity, we focus on the case where $\beta\ge0$. On a high level, we construct a gadget formula $\Psi_0$ for which the all-true assignment $\sigma^{+}$ carries almost all of the probability mass under $\Pr_{\Psi_0,\beta}[\cdot]$ provided that $\beta\ge k\ln2$. Consequently, a sample $\sigma\sim\Pr_{\Psi_0,\beta}[\cdot]$ drawn from this distribution is nearly deterministic, offering virtually no information about β and thus rendering learning impossible. The key property of this gadget is established in Lemma 8. Specifically, the lemma ensures that $\sigma^{+}$ satisfies $\Psi_0$ and that any other assignment σ with fewer than $2n/k$ variables set to 𝖥𝖠𝖫𝖲𝖤 is not a satisfying assignment of $\Psi_0$. We achieve this by incorporating a cyclic structure over the variables that enforces global correlation among the 𝖥𝖠𝖫𝖲𝖤 values in the assignments. In particular, there are no flippable variables in $\sigma^{+}$.

Towards the proof of Theorem 2, we first leverage the gadget $\Psi_0$ to show the existence of a stronger gadget $\Psi_2$, parametrized by $b>1$ in Lemma 13, which guarantees that the all-true assignment $\sigma^{+}$ satisfies $\Psi_2$ and any other assignment with fewer than $n/b$ variables set to 𝖥𝖠𝖫𝖲𝖤 fails to satisfy $\Psi_2$. Then we choose b appropriately in terms of β to make sure that $\sigma^{+}$ carries nearly all of the probability mass, using some more technical estimates for the corresponding partition function. To build $\Psi_2$, we take a finite number of replicas of $\Psi_0$ on randomly permuted sets of variables. The existence of the desired formula is established using the probabilistic method; specifically, we demonstrate an upper bound on the expectation of $m(\sigma)$ for any satisfying assignment $\sigma\ne\sigma^{+}$, over the choice of the permutations.

1.3 Related work

Parameter Estimation in Markov Random Fields.

A large body of work has focused on parameter estimation under the one-shot learning paradigm (see, e.g., [11, 3, 17, 18, 25, 15, 35]), particularly for Ising-like models in statistical physics and for dependent regression models in statistics. In this work, we follow a similar approach by establishing the consistency of the maximum pseudo-likelihood estimator. Earlier studies (e.g., [26, 14, 13, 24]) have also explored parameter estimation in Markov random fields using the maximum likelihood estimator.

Before our work, the papers [4, 23] were the first to study one-shot learning in hard-constrained models. In particular, the hardcore model analysed in [4] can be viewed as a weighted monotone 2-SAT model, and one natural extension of the hardcore model to k-uniform hypergraphs corresponds to the class of weighted monotone k-SAT models – a special case of the weighted k-SAT models that we consider. Because a typical assignment in these monotone formulas possesses Ω(n) flippable variables, the pseudo-likelihood estimator remains consistent across all parameter regimes, and no phase transition is expected. The weighted k-SAT problem was analysed in [23], where the authors derived both a consistency condition and an impossibility condition, though a substantial gap remained between them. By tightening the bounds on both ends, our work considerably narrows this gap, nearly closing it entirely.

Related Works in Structural Learning/Testing.

An alternative direction in learning Markov Random Fields involves estimating the interaction matrix between variables – a question originally posed by [12]. For the Ising model, this problem has been extensively studied (see, e.g., [6, 37, 9] and the references therein), and subsequent work has extended the results to higher-order models [31, 28, 20]. Recent work [39, 15, 21] has also considered the joint learning of both structure and parameters. Moreover, [36] establishes the information-theoretic limits on what any learner can achieve, and similar analyses have been conducted for hardcore models [8, 7]. While some approaches in this line of work require multiple independent samples, as noted in [15], it is also possible to reduce learning with O(1) samples to a class of special cases within one-shot learning. Related problems in one-shot testing for Markov random fields have also been studied in [10, 16, 34, 2, 5].

1.4 𝒌-SAT Notation

We use standard notation for the k-SAT problem. A formula $\Phi=(V,C)$ denotes a CNF (Conjunctive Normal Form) formula with variables $V=\{x_1,\dots,x_n\}$ and clauses C. We use $\sigma(x_i)$ and $\sigma_i$ to denote the truth value of the variable $x_i$ under an assignment $\sigma:V\to\{\mathsf{TRUE},\mathsf{FALSE}\}$. For any clause $c\in C$, $\mathrm{var}(c)$ denotes the set of variables appearing in c (negated or not). The degree of a variable $x_j$ in Φ is the number of clauses in which $x_j$ or $\neg x_j$ appears, namely $|\{c\in C: x_j\in\mathrm{var}(c)\}|$. The degree of Φ is the maximum, over $x_j\in V$, of the degree of $x_j$ in Φ. As noted in the introduction, $m(\sigma)$ denotes the number of variables that are assigned to 𝖳𝖱𝖴𝖤 by an assignment σ, namely $m(\sigma):=|\{i\in[n]:\sigma_i=\mathsf{TRUE}\}|$.

2 Maximum pseudo-likelihood estimator: the proof of Theorem 1

Section 2.1 introduces the fundamentals of the maximum pseudo-likelihood estimator and analyses its running time for solving the weighted k-SAT problem. In Sections 2.2 and 2.3, we establish the estimator’s consistency.

2.1 Overview of maximum (pseudo)-likelihood estimation

For a k-SAT formula Ψ and a satisfying assignment σ, we will use $f(\beta;\sigma)$ to denote the quantity $\Pr_{\Psi,\beta}(\sigma)$ from (1), i.e., $f(\beta;\sigma)=e^{\beta m(\sigma)}/Z(\beta;\Psi)$, where $Z(\beta;\Psi)$ is the normalising constant of the distribution (the partition function).

A standard approach to parameter estimation is to find $\hat\beta_{\mathrm{MLE}}(\sigma):=\arg\max_\beta f(\beta;\sigma)$, which is commonly referred to as the maximum likelihood estimate (MLE). However, two main obstacles arise when applying MLE directly to the weighted k-SAT problem. First, (approximately) computing $Z(\beta;\Psi)$ is generally intractable because it is an NP-hard computation. Second, even if an approximation algorithm exists for computing $\hat\beta_{\mathrm{MLE}}(\sigma)$, there is no guarantee of its consistency, i.e., there is no guarantee that with high probability it is close to β. Hence, we use a computationally more tractable variant of the MLE from [1], called maximum pseudo-likelihood estimation. Let $f_i(\beta;\sigma)$ be the conditional probability of $\sigma_i$, in a distribution with parameter β, conditioned on the values of the remaining variables $\sigma_{-i}:=(\sigma_j)_{j\ne i}$. The maximum pseudo-likelihood estimate (MPLE) is defined as

$\hat\beta_{\mathrm{MPLE}}(\sigma):=\arg\max_\beta\prod_{i\in V}f_i(\beta;\sigma)=\arg\max_\beta\sum_{i\in V}\ln f_i(\beta;\sigma).$

Here, the objective function $F(\beta;\sigma):=\sum_{i\in V}\ln f_i(\beta;\sigma)$ is the so-called log-pseudo-likelihood function. For the weighted k-SAT problem, it is not hard to compute $f_i(\beta;\sigma)$ as

$f_i(\beta;\sigma)=\dfrac{e^{\beta\sigma_i}}{e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]+\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{FALSE})]},$

where $\sigma_{-i}(\sigma_i\to\mathsf{TRUE})$ denotes the assignment σ with its i-th coordinate set to 𝖳𝖱𝖴𝖤 (similarly for 𝖥𝖠𝖫𝖲𝖤), $\sigma_i$ in the exponent is interpreted as 1 if $\sigma_i=\mathsf{TRUE}$ and 0 otherwise, and $\mathbb{1}[\omega]$ is shorthand for $\mathbb{1}[\omega\models\Psi]$. So we can write the log-pseudo-likelihood function for k-SAT as

$F(\beta;\sigma)=\sum_{i\in V}\ln\!\left(\dfrac{e^{\beta\sigma_i}}{e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]+\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{FALSE})]}\right)$
$=\beta m(\sigma)-\sum_{i\in V}\ln\!\left(e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]+\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{FALSE})]\right).$

For a fixed σ, $F(\cdot;\sigma):\mathbb{R}\to\mathbb{R}$ is a function of β. By taking the derivative with respect to β, we obtain

$\dfrac{\partial F(\beta;\sigma)}{\partial\beta}=m(\sigma)-\sum_{i\in V}\dfrac{e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]}{e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]+\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{FALSE})]}.$ (6)

Clearly, $\frac{\partial F(\beta;\sigma)}{\partial\beta}$ is a decreasing function of β, which implies that $F(\beta;\sigma)$ has a unique global maximum, achieved when $\frac{\partial F(\beta;\sigma)}{\partial\beta}=0$. Therefore, $\hat\beta_{\mathrm{MPLE}}(\sigma)$ can be uniquely defined as the maximizer of $F(\cdot;\sigma)$.

Moreover, provided $|\hat\beta_{\mathrm{MPLE}}(\sigma)|\le 2B$, an ε-close estimate of $\hat\beta_{\mathrm{MPLE}}(\sigma)$ can be computed using $O(\ln(B/\epsilon))$ steps of binary search for the solution to $\frac{\partial F(\beta;\sigma)}{\partial\beta}=0$. At each step of the binary search, we evaluate $\frac{\partial F(\beta;\sigma)}{\partial\beta}$ and adjust the binary search interval based on its sign. A naive evaluation of $\frac{\partial F(\beta;\sigma)}{\partial\beta}$ as in (6) would require Θ(n) operations per step. We can reduce this by exploiting the fact that the summand

$S_i(\beta):=\dfrac{e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]}{e^{\beta}\,\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{TRUE})]+\mathbb{1}[\sigma_{-i}(\sigma_i\to\mathsf{FALSE})]}$

can take only three values, namely $0$, $1$, and $e^{\beta}/(1+e^{\beta})$. Hence, by grouping the summands according to their values, we obtain

$m(\sigma)-\sum_{i\in V}S_i(\beta)=m(\sigma)-\big|\{i\in V: S_i(\beta)=1\}\big|-\frac{e^{\beta}}{1+e^{\beta}}\,\Big|\Big\{i\in V: S_i(\beta)=\tfrac{e^{\beta}}{1+e^{\beta}}\Big\}\Big|.$ (7)

Crucially, the sets $\{i\in V: S_i(\beta)=1\}$ and $\{i\in V: S_i(\beta)=\frac{e^{\beta}}{1+e^{\beta}}\}$ do not depend on β, and thus can be computed only once, before the binary search. After this preprocessing, each evaluation of $\frac{\partial F(\beta;\sigma)}{\partial\beta}$ can be done in O(1) time using the decomposition in (7). Overall, to achieve an $O(n^{-1/2})$-close estimate, the total running time is $O(n+\log(nB))$.
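The following Python sketch illustrates the estimator described above under simplifying assumptions: the per-variable classification is done by naively re-checking all clauses (so the preprocessing here is not the linear-time routine of the actual algorithm), the binary search simply returns an endpoint if the derivative has no zero in $[-2B,2B]$, and the clause encoding and the name mple_estimate are ours.

```python
from math import exp

def mple_estimate(clauses, sigma, B, eps=1e-6):
    """Maximum pseudo-likelihood estimate of beta via binary search on (6)/(7).

    clauses: list of clauses, each a tuple of (variable index, sign) literals;
    sigma:   a satisfying assignment as a sequence of booleans.
    """
    n = len(sigma)
    m = sum(sigma)  # m(sigma): number of TRUE variables

    def satisfies_with(i, value):
        """Does sigma, with sigma_i set to `value`, satisfy every clause?"""
        s = list(sigma)
        s[i] = value
        return all(any(s[j] == sign for j, sign in c) for c in clauses)

    # Preprocessing (naive, O(n * |C| * k)): classify each variable once, independently of beta.
    # S_i(beta) = 1              if only the TRUE setting of x_i satisfies Psi,
    # S_i(beta) = 0              if only the FALSE setting satisfies Psi,
    # S_i(beta) = e^b/(1 + e^b)  if x_i is flippable (both settings satisfy Psi).
    only_true = flippable = 0
    for i in range(n):
        t, f = satisfies_with(i, True), satisfies_with(i, False)
        if t and f:
            flippable += 1
        elif t:
            only_true += 1

    def derivative(beta):
        # F'(beta; sigma) as in (7), evaluated in O(1) time after preprocessing.
        return m - only_true - exp(beta) / (1 + exp(beta)) * flippable

    lo, hi = -2 * B, 2 * B
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if derivative(mid) > 0:   # F' is decreasing in beta, so the zero lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```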

2.2 Consistency of MPLE

In Section 2.1 we gave an algorithm with running time $O(n+\log(nB))$ which takes as input a formula Ψ and an assignment $\sigma\in\Omega_n(\Psi)$, and outputs an estimate $\hat\beta(\sigma)$ which satisfies

$|\hat\beta(\sigma)-\hat\beta_{\mathrm{MPLE}}(\sigma)|=O(n^{-1/2}),$

provided $|\hat\beta_{\mathrm{MPLE}}(\sigma)|\le 2B$. Theorem 1 follows immediately from Theorem 5 below, which demonstrates $O(n^{-1/2})$-consistency of the MPLE.

Theorem 5.

Let β be a real number, and let $d,k\ge3$ be integers such that

$d\le\frac{1}{e^{3}k}\,\big(1+e^{-|\beta|}\big)^{k/2}.$ (8)

For any integer n and any $\Phi\in\Phi_{n,k,d}$,

$\Pr_{\Phi,\beta}\big[\,|\hat\beta_{\mathrm{MPLE}}(\sigma)-\beta|=O(n^{-1/2})\,\big]=1-e^{-\Omega(n)}.$ (9)

As in the standard literature [11], the consistency result can be proved using bounds on the derivatives of the log-pseudo-likelihood function F. In particular, following the analysis in [23], (9) is a consequence of the following two bounds:

  1. A uniform linear upper bound on the second moment of the first derivative of F: for all β,

     $\mathbb{E}_{\Phi,\beta}\left[\left(\frac{\partial F(\beta;\sigma)}{\partial\beta}\right)^{2}\right]\le kdn.$ (10)

  2. With high probability over $\sigma\sim\Pr_{\Phi,\beta}$, a uniform linear lower bound on the negative second derivative of F:

     $\Pr_{\Phi,\beta}\left[\inf_{\beta\in\mathbb{R}}\left(-\frac{\partial^{2}F(\beta;\sigma)}{\partial\beta^{2}}\right)=\Omega(n)\right]=1-e^{-\Omega(n)}.$ (11)

Moreover, Lemma 7 in [23] proves the bound (10) for all $\beta\in\mathbb{R}$ and all $d,k\ge3$, and it also gives a combinatorial expression (equations (9) and (10) in [23]) for $\frac{\partial^{2}F(\beta;\sigma)}{\partial\beta^{2}}$ that will be useful for proving (11). To state this expression, we introduce the notion of flippable variables. We say that a variable $v_i$ is flippable in σ if the assignment obtained by flipping the value of $v_i$, while keeping the values of the other variables of σ, is still a satisfying assignment, that is, $\sigma_{-i}(\neg\sigma_i)\models\Psi$. We use $e_{v_i}(\sigma)$ to denote the indicator of the event that the variable $v_i$ is flippable in a satisfying assignment σ. It was shown that

$\dfrac{\partial^{2}F(\beta;\sigma)}{\partial\beta^{2}}=-\dfrac{e^{\beta}}{(e^{\beta}+1)^{2}}\sum_{v\in V}e_{v}(\sigma).$ (12)

Hence, proving (11) reduces to establishing a linear lower bound on the number of flippable variables. The main ingredient in the proof of our positive result is Lemma 6, which provides such a lower bound under the condition (8).

Lemma 6.

Let β be a real number, and let $d,k\ge3$ be integers such that

$d\le\frac{1}{e^{3}k}\,\big(1+e^{-|\beta|}\big)^{k/2}.$

Then, for a fixed $\Phi=(V,C)\in\Phi_{n,k,d}$,

$\Pr_{\Phi,\beta}\Big[\sum_{v\in V}e_{v}(\sigma)=\Omega(n)\Big]=1-e^{-\Omega(n)}.$ (13)

Using Lemma 6 and the identity (12), we derive (11) under the condition (8), thereby completing the proof of Theorem 5. The proof of Lemma 6 will be presented in the next section.
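Identity (12) can also be sanity-checked numerically; the snippet below (an illustration on a toy formula, with helper names of our choosing) compares a finite-difference estimate of the second derivative of F with the right-hand side of (12).

```python
from math import exp, log

# Toy 3-CNF and a satisfying assignment (clause encoding as in the earlier sketches).
phi = [((0, True), (1, True), (2, False)),
       ((1, False), (2, True), (3, True))]
sigma = (True, True, True, False)
n = len(sigma)

def sat_with(i, value):
    s = list(sigma)
    s[i] = value
    return all(any(s[v] == sign for v, sign in c) for c in phi)

def F(beta):
    # log-pseudo-likelihood: beta*m(sigma) - sum_i ln(e^beta * 1[TRUE ok] + 1[FALSE ok])
    return beta * sum(sigma) - sum(
        log(exp(beta) * sat_with(i, True) + sat_with(i, False)) for i in range(n))

beta, h = 0.7, 1e-4
second_diff = (F(beta + h) - 2 * F(beta) + F(beta - h)) / h ** 2
num_flippable = sum(sat_with(i, True) and sat_with(i, False) for i in range(n))
identity_rhs = -exp(beta) / (1 + exp(beta)) ** 2 * num_flippable
print(second_diff, identity_rhs)   # the two values agree up to O(h^2)
```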

2.3 Proof of Lemma 6: applying the Lovász local lemma in a batch

The following version of the Lovász local lemma (LLL), from [27], will be useful for our proof of Lemma 6.

Lemma 7 (Lemma 26, [27]).

Let $\Phi=(V,C)$ be a CNF formula. Let μ be the product distribution on $\{\mathsf{TRUE},\mathsf{FALSE}\}^{|V|}$ in which each variable is set to 𝖳𝖱𝖴𝖤 independently with probability $e^{\beta}/(1+e^{\beta})$. For each $c\in C$, let $A_c$ be the event that clause c is unsatisfied. If there exists a sequence $\{x_c\}_{c\in C}$ such that for each $c\in C$,

$\Pr_{\mu}[A_c]\le x_c\prod_{j\in\Gamma(c)}(1-x_j),$ (14)

where $\Gamma(c)\subseteq C$ is the set of clauses that contain a variable in $\mathrm{var}(c)$, then Φ has a satisfying assignment. Moreover, the distributions $\Pr_{\Phi,\beta}$ and $\Pr_{\mu}$ can be related as follows: for any event E that is completely determined by the assignment of a set S of variables,

$\Pr_{\Phi,\beta}[E]\le\Pr_{\mu}[E]\prod_{j\in\Gamma(E)}\frac{1}{1-x_j},$ (15)

where $\Gamma(E)$ denotes the set of all clauses that contain a variable in S.

In many previous works utilising the LLL for sampling purposes, Lemma 7, or a variant of it, is typically applied to local events, including, for instance, the event that a specific variable is flippable or, more generally, events happening in the neighbourhood of a vertex. This is the case in the approach of [23] which, in the end, imposes stricter conditions on d (relative to k) since it requires “marking” variables appropriately (see [33]). Here, we prove Lemma 6 by applying Lemma 7 directly to a batch of variables scattered around the graph in one go, removing the need for marking and relaxing significantly the conditions on d. This simple idea enables our stronger result. In particular, we set $x_c=(d^{2}k+1)^{-1}$ and let E be the event $\big\{\sum_{i=1}^{R}e_{v_i}(\sigma)<\frac{R}{3}\big\}$, where $R=\Omega(n)$ and $v_1,\dots,v_R$ are well-separated variables in the hypergraph corresponding to Φ. We refer the reader to the full paper for a detailed proof of this lemma.
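As a rough numerical illustration of why condition (8) is compatible with the choice $x_c=(d^{2}k+1)^{-1}$, the following sketch checks a symmetric form of (14) using the worst-case bounds $\Pr_\mu[A_c]\le(1+e^{-|\beta|})^{-k}$ and $|\Gamma(c)|\le kd$; this is only a sanity check under these assumptions, not the argument from the full paper.

```python
from math import exp, floor

def check_lll(k, beta):
    """Symmetric check of (14) with x_c = 1/(d^2 k + 1), at the largest d allowed by (8)."""
    d = floor((1 + exp(-abs(beta))) ** (k / 2) / (exp(3) * k))   # largest degree allowed by (8)
    if d < 3:
        return d, None                                           # condition (8) admits no d >= 3 here
    p_bad = (1 + exp(-abs(beta))) ** (-k)                        # worst-case bound on Pr_mu[A_c]
    x = 1.0 / (d * d * k + 1)                                    # the choice x_c from the text
    return d, p_bad <= x * (1 - x) ** (k * d)                    # symmetric form of condition (14)

for k in (20, 25, 30, 40):
    print(k, check_lll(k, beta=0.0))   # expect True whenever a degree d >= 3 is allowed
```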

3 Impossibility of learning: proofs of Theorems 2 and 3

For the construction of the impossibility instances, we begin with a “gadget” that will serve as a building block.

Lemma 8.

Let $k\ge4$ be an even integer and let $n\ge k$ be a multiple of $k/2$. If $d\ge k^{2}/2$, then there is a formula $\Psi_0\in\Phi_{n,k,d}$ such that if an assignment σ of $\Psi_0$ is satisfying then either

  1. 1.

    σ has n 𝖳𝖱𝖴𝖤s, or

  2. 2.

    σ has at least 2n/k 𝖥𝖠𝖫𝖲𝖤s.

Similarly (by symmetry), there is a formula $\Psi_1\in\Phi_{n,k,d}$ such that, for any satisfying assignment σ of $\Psi_1$, either σ has n 𝖥𝖠𝖫𝖲𝖤s, or σ has at least $2n/k$ 𝖳𝖱𝖴𝖤s.

The instances $\Psi_0$ and $\Psi_1$ will lead to a proof of Theorem 3. For clarity, we denote by $\sigma^{+}$ the assignment to the n variables with n 𝖳𝖱𝖴𝖤s.

Proof of Theorem 3.

Assume $\beta\ge k\ln2$ without loss of generality. Let $\Psi_0$ be the formula given by Lemma 8 (when $\beta\le-k\ln2$ we use $\Psi_1$ instead). Suppose σ is drawn from $\Pr_{\Psi_0,\beta}$, and we directly estimate the probability of $\{\sigma=\sigma^{+}\}$: since σ has at most $2^{n}$ possibilities and on the event $\{\sigma\ne\sigma^{+}\}$ the number of 𝖳𝖱𝖴𝖤s is at most $n-\frac{2n}{k}$, we have

$\Pr_{\Psi_0,\beta}[\sigma=\sigma^{+}]\ \ge\ \frac{e^{\beta n}}{e^{\beta n}+2^{n}\,e^{\beta(n-2n/k)}}\ \ge\ \frac{2^{kn}}{2^{kn}+2^{n+k(n-2n/k)}}\ =\ \frac{1}{1+2^{-n}}.$

As the samples from $\Pr_{\Psi_0,\beta}$ are insensitive to β with high probability, learning β from σ is impossible.
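For intuition, the displayed bound can be checked numerically; the snippet below (an illustration only, using the equivalent form $1/(1+2^{n}e^{-2\beta n/k})$ of the first lower bound) evaluates it at the threshold $\beta=k\ln2$ of Theorem 3.

```python
from math import exp, log

def pr_sigma_plus_lower_bound(n, k, beta):
    # e^{beta n} / (e^{beta n} + 2^n e^{beta(n - 2n/k)})  ==  1 / (1 + 2^n e^{-2 beta n / k})
    return 1.0 / (1.0 + 2 ** n * exp(-2 * beta * n / k))

for n, k in [(20, 4), (30, 6), (60, 6)]:
    beta = k * log(2)                                   # the threshold |beta| >= k ln 2
    print(n, k, pr_sigma_plus_lower_bound(n, k, beta))  # approximately 1/(1 + 2^{-n}), i.e. close to 1
```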

We now present a detailed description of the formula $\Psi_0$ of Lemma 8. Let $k\ge4$ be an even integer and let $n\ge k$ be a multiple of $k/2$. Let $N=\{0,\dots,n-1\}$. The variables of $\Psi_0$ are $\{x_i\mid i\in N\}$.

For the construction, it will be helpful to group the variables into batches of $k/2$ variables in a cyclic manner. Specifically, for an index i, consider the i-th batch of indices $\Xi_i=\{i+j\ (\mathrm{mod}\ n)\mid j\in\{0,\dots,k/2-1\}\}$ and let $C_i=\{x_\ell\mid\ell\in\Xi_i\}$ be the corresponding variables in the i-th batch. We now introduce two types of length-$(k/2)$ clauses, $W_{i,\ell}$ and $\Pi_i$, that will be used to form the final length-k clauses. Specifically, for each ℓ in the batch $\Xi_i$, $W_{i,\ell}$ is the length-$(k/2)$ clause with variable set $C_i$ in which $x_\ell$ appears positively and all other variables are negated. Let $\Pi_i$ be the length-$(k/2)$ clause with variable set $C_i$ in which all variables are negated. Finally,

$\Psi_0:=\bigwedge_{i\in N,\ \ell\in\Xi_i}\big(W_{i,\ell}\vee\Pi_{i+k/2}\big).$

(so $\Psi_0$ is the formula with variable set $\{x_i\mid i\in N\}$ and clause set $\{W_{i,\ell}\vee\Pi_{i+k/2}\mid i\in N,\ \ell\in\Xi_i\}$). The formula $\Psi_1$ is obtained from $\Psi_0$ by negating all of the literals.

Note that $\Psi_0,\Psi_1\in\Phi_{n,k,d}$ for every integer $d\ge k^{2}/2$, since for each $j\in N$, the literals $x_j$ and $\neg x_j$ occur (together) $k^{2}/2$ times. To see this, note that for each $j\in N$, $x_j$ or $\neg x_j$ appears in $W_{i,\ell}\vee\Pi_{i+k/2}$ for all $i\in\{j-k+1\ (\mathrm{mod}\ n),\dots,j\}$ and all $\ell\in\Xi_i$, so this amounts to k different i's and $k/2$ different ℓ's. In the proof of Lemma 9 we will demonstrate that $\Psi_0$ (and analogously, $\Psi_1$) satisfies the requirements of Lemma 8.
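The construction can also be checked mechanically on small instances. The sketch below (our own test harness, with a clause encoding of our choosing) builds $\Psi_0$ for $n=8$, $k=4$, confirms that every variable has degree $k^2/2$, and verifies by brute force that $\sigma^{+}$ satisfies $\Psi_0$ while every other satisfying assignment has at least $2n/k$ variables set to 𝖥𝖠𝖫𝖲𝖤.

```python
from itertools import product

def build_psi0(n, k):
    """Clauses of Psi_0, each a frozenset of literals (variable index, sign);
    sign True means the positive literal x_v, sign False means its negation."""
    assert k % 2 == 0 and n % (k // 2) == 0 and n >= k
    half = k // 2

    def batch(i):                        # the index batch Xi_i, taken cyclically
        return [(i + j) % n for j in range(half)]

    clauses = []
    for i in range(n):
        pi_next = [(v, False) for v in batch(i + half)]       # Pi_{i + k/2}: all negated
        for ell in batch(i):
            w = [(v, v == ell) for v in batch(i)]             # W_{i,ell}: x_ell positive, rest negated
            clauses.append(frozenset(w + pi_next))
    return clauses

def satisfies(sigma, clauses):
    return all(any(sigma[v] == sign for v, sign in c) for c in clauses)

n, k = 8, 4
psi0 = build_psi0(n, k)
degrees = [sum(any(v == j for v, _ in c) for c in psi0) for j in range(n)]
print("max degree:", max(degrees), "(k^2/2 =", k * k // 2, ")")
print("sigma^+ satisfies Psi_0:", satisfies((True,) * n, psi0))
gap_ok = all(n - sum(s) >= 2 * n // k
             for s in product([True, False], repeat=n)
             if satisfies(s, psi0) and not all(s))
print("every other satisfying assignment has at least", 2 * n // k, "FALSEs:", gap_ok)
```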

Lemma 9.

Let $k\ge4$ be an even integer and let $n\ge k$ be a multiple of $k/2$. If $\sigma\ne\sigma^{+}$ satisfies $\Psi_0$ then, for all $l\in\{0,1,\dots,\frac{2n}{k}-1\}$, σ assigns at least one variable in $C_{lk/2}\cup C_{(l+1)k/2}$ to 𝖥𝖠𝖫𝖲𝖤, and σ has at least $\frac{2n}{k}$ 𝖥𝖠𝖫𝖲𝖤s in total.

Proof of Lemma 9.

We will use the following claim as a key step of the proof.

Claim 10.

If $\sigma\ne\sigma^{+}$ satisfies $\Psi_0$, and σ assigns one variable $x_a$ in $C_{\ell k/2}$ to 𝖥𝖠𝖫𝖲𝖤, then either

  1. 1.

σ assigns a variable $x_b$ in $C_{(\ell+1)k/2}$ to 𝖥𝖠𝖫𝖲𝖤, or

  2. 2.

All variables in $C_{(\ell+1)k/2}$ are assigned to 𝖳𝖱𝖴𝖤 in σ, and there exist variables $x_c\in C_{(\ell+2)k/2}$ and $x_d\in C_{\ell k/2}\setminus\{x_a\}$ that are assigned to 𝖥𝖠𝖫𝖲𝖤 in σ.

Proof of the Claim.

Suppose we are not in the first case; we will show that σ assigns 𝖥𝖠𝖫𝖲𝖤 to some $x_c\in C_{(\ell+2)k/2}$ and some $x_d\in C_{\ell k/2}\setminus\{x_a\}$. Since σ satisfies $\Psi_0$, it satisfies at least one of $W_{\ell k/2,a}$ and $\Pi_{(\ell+1)k/2}$. Since the variables in $C_{(\ell+1)k/2}$ are all assigned to 𝖳𝖱𝖴𝖤 by the assumption that we are not in the first case, σ does not satisfy $\Pi_{(\ell+1)k/2}$. Hence σ satisfies $W_{\ell k/2,a}$; since $\sigma(x_a)=\mathsf{FALSE}$, this means that σ assigns a variable $x_d$ to 𝖥𝖠𝖫𝖲𝖤 for some $d\in\Xi_{\ell k/2}\setminus\{a\}$. In particular, the set $\{j\in\Xi_{\ell k/2}:\sigma(x_j)=\mathsf{FALSE}\}$ is not empty. Let $j^{*}=\max\{j\in\Xi_{\ell k/2}:\sigma(x_j)=\mathsf{FALSE}\}$. Note that σ does not satisfy $W_{j^{*},j^{*}}$ (its positive literal $x_{j^{*}}$ is 𝖥𝖠𝖫𝖲𝖤, and by the maximality of $j^{*}$ together with the case assumption every other variable of $\Xi_{j^{*}}$ is 𝖳𝖱𝖴𝖤), so for σ to satisfy $W_{j^{*},j^{*}}\vee\Pi_{j^{*}+k/2}$, it must satisfy $\Pi_{j^{*}+k/2}$, which means that σ assigns a variable $x_c\in C_{j^{*}+k/2}$ to 𝖥𝖠𝖫𝖲𝖤. Since $x_c\notin C_{(\ell+1)k/2}$ and $C_{j^{*}+k/2}\subseteq C_{(\ell+1)k/2}\cup C_{(\ell+2)k/2}$, we establish that $x_c\in C_{(\ell+2)k/2}$. This concludes the proof of the claim.

We now show how to use the claim to prove the lemma. Fix an assignment $\sigma\ne\sigma^{+}$ that satisfies $\Psi_0$. Fix an index $j^{(0)}$ so that $x_{j^{(0)}}$ is assigned 𝖥𝖠𝖫𝖲𝖤 by σ. By the symmetry of $\Psi_0$, we may assume $j^{(0)}\in\Xi_0$. Let $\ell^{(0)}=0$ and $G^{(0)}=\{x_{j^{(0)}}\}$. Consider three sequences $\{j^{(t)}\}_{t\ge0}$, $\{\ell^{(t)}\}_{t\ge0}$ and $\{G^{(t)}\}_{t\ge0}$ defined recursively as follows. For every non-negative integer t, apply the claim to $a=j^{(t)}\in\Xi_{\ell^{(t)}k/2}$: in the first case we let $\ell^{(t+1)}=\ell^{(t)}+1$, $j^{(t+1)}=b\in\Xi_{(\ell^{(t)}+1)k/2}$ and $G^{(t+1)}=G^{(t)}\cup\{x_b\}$; in the latter case we let $\ell^{(t+1)}=\ell^{(t)}+2$, $j^{(t+1)}=c\in\Xi_{(\ell^{(t)}+2)k/2}$ and $G^{(t+1)}=G^{(t)}\cup\{x_c,x_d\}$. By induction on t (with base case $t=0$) we conclude that (i) $x_{j^{(t)}}$ with $j^{(t)}\in\Xi_{\ell^{(t)}k/2}$ is assigned to 𝖥𝖠𝖫𝖲𝖤, (ii) σ assigns all variables in $G^{(t)}$ to 𝖥𝖠𝖫𝖲𝖤, and (iii) $\ell^{(t)}+2\ge\ell^{(t+1)}\ge\ell^{(t)}+1$ for $t<T$, where $T:=\min\{t>0:\ell^{(t)}=\tfrac{2n}{k}-1\text{ or }\tfrac{2n}{k}\}$ is a stopping time of $\{j^{(t)},\ell^{(t)},G^{(t)}\}_{t\ge0}$. By construction, for all $l\in\{0,1,\dots,\tfrac{2n}{k}-1\}$, $G^{(T)}\cap\big(C_{lk/2}\cup C_{(l+1)k/2}\big)$ is not empty. Hence we have proved the first part of the lemma.

Next we will show that $|G^{(T)}|\ge\frac{2n}{k}$. For this, observe that for $t<T$, $|G^{(t+1)}|=|G^{(t)}|+\ell^{(t+1)}-\ell^{(t)}$. Since $|G^{(0)}|=1$, $\ell^{(0)}=0$ and $\ell^{(T)}\ge\frac{2n}{k}-1$, it holds that $|G^{(T)}|\ge\frac{2n}{k}$.

Proof of Lemma 8.

In the case of $\Psi_0$, first note that for all $i\in N$ and all $\ell\in\Xi_i$, $\sigma^{+}$ satisfies the clause $W_{i,\ell}\vee\Pi_{i+k/2}$, so $\sigma^{+}$ satisfies $\Psi_0$. Also, if $\sigma\ne\sigma^{+}$, then the lemma follows immediately from Lemma 9. The case of $\Psi_1$ holds analogously.

The names of the indices of the variables of $\Psi_0$ are not very important, and when we generalise the construction in Lemma 13 it will be useful to consider an arbitrary permutation of them. Here is the notation that we will use. Let π be any permutation of $N=\{0,\dots,n-1\}$. We use the notation $\pi(i)$ to denote the element of N that i is mapped to by π. We will construct a formula $\Psi_0^{\pi}$. Taking $\mathrm{id}$ to be the identity permutation on N, the formula $\Psi_0$ that was already defined is $\Psi_0^{\mathrm{id}}$.

For an index i, let $\Xi_i^{\pi}=\{\pi(i+j\ (\mathrm{mod}\ n))\mid j\in\{0,\dots,k/2-1\}\}$ and let $C_i^{\pi}=\{x_\ell\mid\ell\in\Xi_i^{\pi}\}$. For $\ell\in\Xi_i^{\pi}$, let $W_{i,\ell}^{\pi}$ be the length-$(k/2)$ clause with variable set $C_i^{\pi}$ in which $x_\ell$ appears positively and all other variables are negated. Let $\Pi_i^{\pi}$ be the length-$(k/2)$ clause with variable set $C_i^{\pi}$ in which all variables are negated. Then $\Psi_0^{\pi}:=\bigwedge_{i\in N,\ \ell\in\Xi_i^{\pi}}\big(W_{i,\ell}^{\pi}\vee\Pi_{i+k/2}^{\pi}\big)$. The proof that $\Psi_0^{\pi}\in\Phi_{n,k,d}$ for any $d\ge k^{2}/2$ is exactly the same as in the case $\pi=\mathrm{id}$. The formula $\Psi_1^{\pi}$ is obtained from $\Psi_0^{\pi}$ by negating all of the literals.

The following corollary follows immediately from the proof of Lemma 9 (by renaming the indices using π) and the fact that $C_0,C_{k/2},\dots,C_{(2n/k-1)k/2}$ are disjoint sets.

Corollary 11.

Let $k\ge4$ be an even integer and let $n\ge k$ be a multiple of $k/2$. Let π be a permutation of N. If $\sigma\ne\sigma^{+}$ satisfies $\Psi_0^{\pi}$ then there exists a subset $\mathcal{L}^{\pi}(\sigma)\subseteq\{0,1,\dots,\frac{2n}{k}-1\}$ of size at least $\frac{n}{k}$ such that, for all $\ell\in\mathcal{L}^{\pi}(\sigma)$, σ assigns at least one variable in $C^{\pi}_{\ell k/2}$ to 𝖥𝖠𝖫𝖲𝖤.

 Remark 12.

While Corollary 11 does not guarantee the uniqueness of $\mathcal{L}^{\pi}(\sigma)$, in what follows we define $\mathcal{L}^{\pi}(\sigma)$ to be the lexicographically smallest set among all the smallest sets satisfying the corollary. Since there are finitely many such sets and they can be lexicographically ordered, $\mathcal{L}^{\pi}(\sigma)$ is a unique and well-defined set for given π and σ.

Lemma 8 provides a formula $\Psi_0$ with a gap property for the number of 𝖥𝖠𝖫𝖲𝖤s in its satisfying assignments. In the next lemma, we use $\Psi_0$ to build a larger formula $\Psi_2$ that amplifies this gap property, making the gap arbitrarily large.

Lemma 13.

Let $k\ge4$ be an even integer, let $b>1$ be a real number, and let d be an integer satisfying

$d\ge k^{3}\left(1-\frac{1}{b}\right)^{-k/2}.$

Let $n\ge k$ be a sufficiently large multiple of $k/2$. Then there is a formula $\Psi_2\in\Phi_{n,k,d}$ such that if an assignment σ satisfies $\Psi_2$ then either

  1. 1.

    σ has n 𝖳𝖱𝖴𝖤s, or

  2. 2.

    σ has at least n/b 𝖥𝖠𝖫𝖲𝖤s.

Similarly (by symmetry), there is a formula $\Psi_3\in\Phi_{n,k,d}$ such that, for any satisfying assignment σ of $\Psi_3$, either σ has n 𝖥𝖠𝖫𝖲𝖤s, or σ has at least $n/b$ 𝖳𝖱𝖴𝖤s.

Specifically, the set of clauses in Ψ2 is of the form

$\bigcup_{\pi\in\Gamma}\big\{\,W_{i,\ell}^{\pi}\vee\Pi_{i+k/2}^{\pi}\ \big|\ i\in N,\ \ell\in\Xi_i^{\pi}\,\big\},$

where Γ is a set of approximately $\frac{2k}{b}\big(1-\frac{1}{b}\big)^{-k/2}$ independently-chosen permutations of N. We use the probabilistic method to show that, with positive probability over the choice of Γ, every satisfying assignment $\sigma\ne\sigma^{+}$ of $\Psi_2$ has at least $n/b$ variables assigned to 𝖥𝖠𝖫𝖲𝖤. To facilitate this counting argument, we apply Corollary 11. For a complete proof of this lemma, we direct the reader to the full paper.
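A minimal sketch of this construction is given below, with the number of replicas left as an explicit parameter (the paper chooses it in terms of b and k as above, and only keeps a choice of Γ for which the amplified gap property holds); the helper names are ours.

```python
import random
from itertools import product

def build_psi0_pi(n, k, pi):
    """Clauses of Psi_0^pi: the gadget Psi_0 with indices renamed through the permutation pi."""
    half = k // 2

    def batch(i):                        # Xi_i^pi: a cyclic batch of permuted indices
        return [pi[(i + j) % n] for j in range(half)]

    clauses = []
    for i in range(n):
        pi_next = [(v, False) for v in batch(i + half)]
        for ell in batch(i):
            clauses.append(frozenset([(v, v == ell) for v in batch(i)] + pi_next))
    return clauses

def build_psi2(n, k, num_replicas, seed=0):
    """Union of independently permuted copies of Psi_0, as in the display above."""
    rng = random.Random(seed)
    clauses = set()
    for _ in range(num_replicas):
        perm = list(range(n))
        rng.shuffle(perm)
        clauses |= set(build_psi0_pi(n, k, perm))
    return list(clauses)

# Brute-force look at the amplified gap on a toy instance.
n, k = 8, 4
psi2 = build_psi2(n, k, num_replicas=3)
sat = lambda s: all(any(s[v] == sign for v, sign in c) for c in psi2)
fewest_false = min((n - sum(s) for s in product([True, False], repeat=n)
                    if sat(s) and not all(s)), default=None)
print("fewest FALSEs among satisfying assignments other than sigma^+:", fewest_false)
```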

We present the following more general result, from which Theorem 2 follows as a corollary. To prove Theorem 2, we apply Lemma 13 to show that the distribution PrΨ2,β is sharply concentrated at a single point, σ+. This parallels the approach used in Theorem 3, but requires more delicate estimates. The proof of this theorem is also provided in the full version of the paper.

Theorem 14.

For all $\beta^{*}\ne0$, let $\alpha=\alpha(|\beta^{*}|)$ be any real in $(0,1)$ satisfying

$|\beta^{*}|+\frac{\alpha}{1-\alpha}\ln\alpha+\ln(1-\alpha)>0.$ (16)

Let $k\ge4$ be an even integer and let β be such that $|\beta|\ge|\beta^{*}|$. Let n be a multiple of $k/2$ that is large enough. If

$d\ge k^{3}\alpha^{-k/2},$ (17)

then there exists a formula $\Phi\in\Phi_{n,k,d}$ such that it is impossible to estimate β from a sample $\sigma\sim\Pr_{\Phi,\beta}$ with high probability.

Finally, we prove Theorem 2 as a special case of Theorem 14.

Proof.

We give an explicit $\alpha:(1,\infty)\to(0,1)$ such that $\alpha(|\beta|)$ satisfies (16) for any $|\beta|>1$:

$\alpha(\beta)=1-\frac{1}{e^{\beta-1}}=\frac{e^{\beta}-e}{e^{\beta}}.$ (18)

Thus, we obtain the explicit lower bound (3) on d by plugging (18) into (17).

Next we verify that the choice of α in (18) satisfies (16). As in the proof of Theorem 14, we define $f(\alpha):=-\frac{\alpha}{1-\alpha}\ln\alpha-\ln(1-\alpha)$. To verify (16), it suffices to show that $f(\alpha(\beta))<\beta$ for all $\beta>1$. After some algebraic simplifications, we arrive at

$f(\alpha(\beta)) = -\left[\frac{e^{\beta}-e}{e^{\beta}}\cdot\frac{1}{e^{1-\beta}}\,\ln\!\left(\frac{e^{\beta}-e}{e^{\beta}}\right)+\ln\frac{1}{e^{\beta-1}}\right]$
$= -\left[(e^{\beta-1}-1)\big(\ln(e^{\beta}-e)-\beta\big)+1-\beta\right]$
$= \ln(e^{\beta}-e)-1+\beta e^{\beta-1}-e^{\beta-1}\ln(e^{\beta}-e)$
$= \beta+\ln(1-e^{1-\beta})-1+\beta e^{\beta-1}-e^{\beta-1}\big[\beta+\ln(1-e^{1-\beta})\big]$
$= \beta+\ln(1-e^{1-\beta})-1-e^{\beta-1}\ln(1-e^{1-\beta})$
$= \beta-1+(1-e^{\beta-1})\ln(1-e^{1-\beta}).$

Notice for all 0<x<1, by Taylor’s expansion,

$\left(1-\frac{1}{x}\right)\ln(1-x)=-\left(1-\frac{1}{x}\right)\sum_{n=1}^{\infty}\frac{x^{n}}{n}=1+\sum_{n=1}^{\infty}\left(\frac{1}{n+1}-\frac{1}{n}\right)x^{n}=1-\sum_{n=1}^{\infty}\frac{x^{n}}{n(n+1)}<1.$

Hence, by setting $x=e^{1-\beta}$, we have shown that $f(\alpha(\beta))<\beta$.
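A quick numerical check of this conclusion (an illustration only; the function names are ours):

```python
from math import exp, log

def f(alpha):
    # f(alpha) = -(alpha/(1-alpha)) ln(alpha) - ln(1-alpha); condition (16) reads |beta*| > f(alpha)
    return -(alpha / (1 - alpha)) * log(alpha) - log(1 - alpha)

def alpha_of(beta):
    # the explicit choice (18): alpha(beta) = 1 - e^{1-beta} = (e^beta - e)/e^beta
    return 1 - exp(1 - beta)

for beta in (1.01, 1.5, 2.0, 5.0, 10.0):
    a = alpha_of(beta)
    print(beta, round(a, 6), f(a) < beta)   # expect True for every beta > 1
```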

References

  • [1] Julian Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2):192–236, 1974.
  • [2] Ivona Bezáková, Antonio Blanca, Zongchen Chen, Daniel Štefankovič, and Eric Vigoda. Lower bounds for testing graphical models: colorings and antiferromagnetic Ising models. J. Mach. Learn. Res., 21(1), 2020. URL: https://jmlr.org/papers/v21/19-580.html.
  • [3] Bhaswar B. Bhattacharya and Sumit Mukherjee. Inference in Ising models. Bernoulli, 24(1):493–525, 2018.
  • [4] Bhaswar B. Bhattacharya and Kavita Ramanan. Parameter estimation for undirected graphical models with hard constraints. IEEE Transactions on Information Theory, 67(10):6790–6809, 2021. doi:10.1109/TIT.2021.3094404.
  • [5] Antonio Blanca, Zongchen Chen, Daniel Stefankovič, and Eric Vigoda. Hardness of identity testing for restricted boltzmann machines and potts models. Journal of Machine Learning Research, 22(152):1–56, 2021. URL: https://jmlr.org/papers/v22/20-1162.html.
  • [6] Guy Bresler. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’15, pages 771–782, 2015. doi:10.1145/2746539.2746631.
  • [7] Guy Bresler, David Gamarnik, and Devavrat Shah. Hardness of parameter estimation in graphical models. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’14, pages 1062–1070, 2014. URL: https://proceedings.neurips.cc/paper/2014/hash/26dd0dbc6e3f4c8043749885523d6a25-Abstract.html.
  • [8] Guy Bresler, David Gamarnik, and Devavrat Shah. Structure learning of antiferromagnetic Ising models. In Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014.
  • [9] Guy Bresler, David Gamarnik, and Devavrat Shah. Learning graphical models from the glauber dynamics. IEEE Transactions on Information Theory, 64(6):4072–4080, 2018. doi:10.1109/TIT.2017.2713828.
  • [10] Guy Bresler and Dheeraj Nagaraj. Optimal single sample tests for structured versus unstructured network data. In Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 1657–1690, 2018. URL: http://proceedings.mlr.press/v75/bresler18a.html.
  • [11] Sourav Chatterjee. Estimation in spin glasses: A first step. The Annals of Statistics, 35(5):1931–1946, 2007.
  • [12] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968. doi:10.1109/TIT.1968.1054142.
  • [13] Francis Comets. On consistency of a class of estimators for exponential families of markov random fields on the lattice. The Annals of Statistics, 20(1):455–468, 1992.
  • [14] Francis Comets and Basilis Gidas. Asymptotics of Maximum Likelihood Estimators for the Curie-Weiss Model. The Annals of Statistics, 19(2):557–578, 1991. doi:10.1214/aos/1176348111.
  • [15] Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, and Anthimos Vardis Kandiros. Learning Ising models from one or multiple samples. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 161–168, 2021. doi:10.1145/3406325.3451074.
  • [16] Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. Testing Ising models. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’18, pages 1989–2007, 2018. doi:10.1137/1.9781611975031.130.
  • [17] Constantinos Daskalakis, Nishanth Dikkala, and Ioannis Panageas. Regression from dependent observations. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 881–889, 2019. doi:10.1145/3313276.3316362.
  • [18] Constantinos Daskalakis, Nishanth Dikkala, and Ioannis Panageas. Logistic regression with peer-group effects via inference in higher-order ising models. In AISTATS, pages 3653–3663, 2020. URL: http://proceedings.mlr.press/v108/daskalakis20a.html.
  • [19] Weiming Feng, Heng Guo, Yitong Yin, and Chihao Zhang. Fast sampling and counting k-sat solutions in the local lemma regime. J. ACM, 68(6), October 2021. doi:10.1145/3469832.
  • [20] Jason Gaitonde, Ankur Moitra, and Elchanan Mossel. Bypassing the noisy parity barrier: Learning higher-order markov random fields from dynamics, 2024. arXiv:2409.05284.
  • [21] Jason Gaitonde and Elchanan Mossel. A unified approach to learning Ising models: Beyond independence and bounded width. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 503–514. Association for Computing Machinery, 2024. doi:10.1145/3618260.3649674.
  • [22] Andreas Galanis, Leslie Ann Goldberg, and Xusheng Zhang. One-shot learning for k-sat, 2025. doi:10.48550/arXiv.2502.07135.
  • [23] Andreas Galanis, Alkis Kalavasis, and Anthimos Vardis Kandiros. Learning hard-constrained models with one sample. Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3184–3196, 2024.
  • [24] Charles J. Geyer and Elizabeth A. Thompson. Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society. Series B (Methodological), 54(3):657–699, 1992.
  • [25] Promit Ghosal and Sumit Mukherjee. Joint estimation of parameters in Ising model. The Annals of Statistics, 48(2):785–810, 2020.
  • [26] B. Gidas. Consistency of maximum likelihood and pseudo-likelihood estimators for gibbs distributions. In Wendell Fleming and Pierre-Louis Lions, editors, Stochastic Differential Systems, Stochastic Control Theory and Applications, pages 129–145, 1988.
  • [27] Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the lovász local lemma. J. ACM, 66(3), April 2019. doi:10.1145/3310131.
  • [28] Linus Hamilton, Frederic Koehler, and Ankur Moitra. Information theoretic properties of Markov random fields, and their algorithmic applications. In Advances in Neural Information Processing Systems, volume 30, 2017.
  • [29] Kun He, Chunyang Wang, and Yitong Yin. Deterministic counting lovász local lemma beyond linear programming. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3388–3425, 2023. doi:10.1137/1.9781611977554.CH130.
  • [30] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. Towards the sampling lovász local lemma. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 173–183, 2022. doi:10.1109/FOCS52979.2021.00025.
  • [31] Adam Klivans and Raghu Meka. Learning graphical models using multiplicative weights. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 343–354, 2017. doi:10.1109/FOCS.2017.39.
  • [32] William Matthews and Ramamohan Paturi. Uniquely satisfiable k-sat instances with almost minimal occurrences of each variable. In Ofer Strichman and Stefan Szeider, editors, Theory and Applications of Satisfiability Testing – SAT 2010, pages 369–374, 2010. doi:10.1007/978-3-642-14186-7_34.
  • [33] Ankur Moitra. Approximate counting, the lovász local lemma, and inference in graphical models. J. ACM, 66(2), April 2019. doi:10.1145/3268930.
  • [34] Rajarshi Mukherjee, Sumit Mukherjee, and Ming Yuan. Global testing against sparse alternatives under Ising models. The Annals of Statistics, 46(5):2062–2093, 2018.
  • [35] Somabha Mukherjee, Jaesung Son, and Bhaswar B Bhattacharya. Estimation in tensor Ising models. Information and Inference: A Journal of the IMA, 11(4):1457–1500, 2022.
  • [36] Narayana P. Santhanam and Martin J. Wainwright. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7):4117–4134, 2012. doi:10.1109/TIT.2012.2191659.
  • [37] Marc Vuffray, Sidhant Misra, Andrey Y. Lokhov, and Michael Chertkov. Interaction screening: efficient and sample-optimal learning of ising models. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 2603–2611, 2016.
  • [38] Chunyang Wang and Yitong Yin. A sampling Lovász local lemma for large domain sizes. In 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), pages 129–150, 2024. doi:10.1109/FOCS61266.2024.00019.
  • [39] Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, and Zhiwei Steven Wu. Privately learning Markov random fields. In Proceedings of the 37th International Conference on Machine Learning, ICML’20, 2020.