
Hardness of Sampling for the Anti-Ferromagnetic Ising Model on Random Graphs

Neng Huang, University of Michigan, Ann Arbor, MI, USA; Will Perkins, Georgia Institute of Technology, Atlanta, GA, USA; Aaron Potechin, University of Chicago, Chicago, IL, USA
Abstract

We prove a hardness of sampling result for the anti-ferromagnetic Ising model on random graphs of average degree d for large constant d, proving that when the normalized inverse temperature satisfies β>1 (asymptotically corresponding to the condensation threshold), then w.h.p. over the random graph there is no stable sampling algorithm that can output a sample close in W2 distance to the Gibbs measure. The results also apply to a fixed-magnetization version of the model, showing that there are no stable sampling algorithms for low but positive temperature max and min bisection distributions. These results show a gap in the tractability of search and sampling problems: while there are efficient algorithms to find near optimizers, stable sampling algorithms cannot access the Gibbs distribution concentrated on such solutions.

Our techniques involve extensions of the interpolation technique relating behavior of the mean field Sherrington-Kirkpatrick model to behavior of Ising models on random graphs of average degree d for large d. While previous interpolation arguments compared the free energies of the two models, our argument compares the average energies and average overlaps in the two models.

Keywords and phrases:
Random graph, spin glass, sampling algorithm
Funding:
Neng Huang: Work was primarily done when NH was affiliated with the University of Chicago, supported partly by the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL) with NSF grant ECCS-2216912.
Will Perkins: Supported in part by NSF grant CCF-2309708.
Copyright and License:
© Neng Huang, Will Perkins, and Aaron Potechin; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Probability and statistics; Theory of computation → Randomness, geometry and discrete structures
Related Version:
Full Version: https://arxiv.org/abs/2409.03974
Editors:
Raghu Meka

1 Introduction

The study of disordered systems is at the intersection of statistical physics, probability theory, and computer science. Models of disordered systems, such as the Edwards–Anderson model, Sherrington–Kirkpatrick model, random constraint satisfaction problems, and combinatorial optimization problems on random networks, have been studied from many different perspectives, including as toy models of complex physical systems and as a source of random computational problems on which to test algorithms and understand the sources of algorithmic intractability.

In the algorithmic context, disordered systems present at least two natural algorithmic challenges: the first is the search problem, that of finding a high quality solution to an optimization problem or finding a near ground state in the language of statistical physics; the second is the sampling problem, that of (approximately) sampling from a Gibbs measure that weights solutions exponentially by their objective value.

One striking phenomenon that has been discovered is a gap between the tractability of search and sampling. That is, for certain choices of parameters in some models of disordered systems, the search problem is tractable (there exist polynomial-time or even near-linear time algorithms to find high quality solutions), while no efficient sampling algorithms are known, and in fact broad classes of sampling algorithms can be proved to be ineffective. So while high quality solutions can be found efficiently, these solutions are not typical, in the sense that the distribution of solutions found by a given search algorithm is far from the equilibrium Gibbs measure.

A prime example of this phenomenon occurs in the Sherrington–Kirkpatrick (SK) model [36], an Ising model on the complete graph on n vertices in which the n(n−1)/2 coupling constants are independent Gaussian random variables (see e.g. [32] for a mathematical survey of the model). Here, after appropriate normalization, the key parameter is β, the inverse temperature of the model. Parisi famously predicted a formula for the limiting free energy of the SK model as a function of β [33, 34]; this formula is non-analytic at β = 1, indicating the existence of a phase transition at this point. The Parisi formula was proved rigorously by Guerra [23] and Talagrand [39], with methods that have proved very influential in mathematics, physics, and computer science in the past two decades. Along with the Parisi formula comes information about the SK Gibbs measure itself; in particular, Parisi identified the “average overlap” of two independent samples from the Gibbs measure as an order parameter for the phase transition [35] and developed the theory of a hierarchy of “replica symmetry breaking”. Thus the equilibrium picture of the SK model is fairly well understood, with the phase transition identified at β = 1.

Turning to the algorithmic search problem, the task is to find a solution σ ∈ {±1}^n that (approximately) maximizes the probability mass function of the SK model; that is, the task of finding a ground state or near ground state of the model. Montanari [30], partly inspired by the work of Subag on spherical spin glasses [38], proposed an algorithm for this task based on approximate message passing, which finds an approximate optimizer under the widely believed assumption that the SK model has “no overlap gap” (sometimes also referred to as “full replica symmetry breaking”) for large enough β. This approach was later extended to mixed p-spin models [3], which can be thought of as the generalization of the Ising model to hypergraphs.

For the sampling problem, however, the picture is less optimistic. It was predicted by physicists that the simple Glauber dynamics should converge fast as long as β < 1 [37], but progress on a rigorous proof of this statement has only come recently. A series of recent works [7, 22, 18, 5] showed that Glauber dynamics does indeed converge quickly if β < 1/4 (more specifically, in O(n log n) steps). A more recent work [6] has further extended the range of rapid convergence to values beyond β = 1/4. Using a different approach based on stochastic localization, El Alaoui, Montanari, and Sellke [4] gave a sampling algorithm that approximately samples from the SK Gibbs measure for β < 1/2, with the approximation guarantee in Wasserstein distance rather than the total variation distance used in previous works. This result was later improved to the entire replica-symmetric phase β < 1 by Celentano [11]. On the other hand, obstacles seem to arise after the phase transition at β = 1. El Alaoui, Montanari, and Sellke showed that when β > 1, the onset of disorder chaos naturally obstructs stable sampling algorithms [4]. Similar ideas were used recently by El Alaoui and Gamarnik to rule out stable sampling algorithms for the symmetric binary perceptron problem [1].

One can ask whether this gap in tractability for the search and sampling problems is universal for certain classes of disordered systems. The SK model is an example of a mean-field model: all spins (variables) interact with each other. Mean-field models have the advantage of being mathematically tractable to some degree, but the disadvantage of being unrealistic from the physics perspective as well as the perspective of large real-world networks. Seeking a trade-off between these two aspects, physicists have studied diluted mean-field models: that is, statistical physics models on random graphs or hypergraphs of constant average degree [29]. These models inherit some of the symmetries of mean-field models while also having some non-trivial local geometry. The study of diluted mean field models in physics led to rich and surprising predictions for the behavior of optimization problems on random graphs and random constraint satisfaction problems [27, 25, 20]. Rigorously proving these predictions is a major task in mathematics and theoretical computer science.

The specific diluted mean-field model we address here is the anti-ferromagnetic Ising model on sparse random graphs. The Hamiltonian of this model is the number of edges in the random graph cut by the given configuration, and the ground states correspond to the maximum cuts in the graph. Finding a maximum cut in a graph is one of the best-known constraint satisfaction problems, and due to this connection this model has been studied extensively in computer science as well as statistical physics. It is known that as the average degree of the random graph increases, many aspects of this model can be understood via the SK model. Dembo, Montanari, and Sen showed that the typical size of a maximum cut in both Erdős–Rényi random graphs and random regular graphs can be related to the ground state energy of the SK model [21]. They proved this by interpolating the free energy between the two models (described in more detail below). El Alaoui, Montanari, and Sellke then gave an algorithm for finding a near maximum cut in a random regular graph by adapting Montanari’s message-passing algorithm for the SK model [2]. Similar results have also been shown for the more general mixed p-spin models and their diluted counterparts [24, 13].

It is natural to ask if this connection between the anti-ferromagnetic Ising model and the SK model can be extended further. In particular, does the search vs. sampling phenomenon also arise in the anti-ferromagnetic Ising model on random graphs? In this paper, we give an affirmative answer to this question.

1.1 Main Results

Before stating our main results we introduce some necessary definitions and notation. We refer to an element of {−1,1}^n as a configuration. Given a graph G with adjacency matrix A_G, we define the Ising Hamiltonian H_G(σ) = Σ_{1≤i≤j≤n} (A_G)_{ij} σ(i)σ(j). Note that (|E(G)| + H_G(σ))/2 is exactly the number of edges in the graph whose endpoints are assigned the same spin by σ. In particular, a ground state (i.e., a configuration that minimizes the Hamiltonian) of H_G corresponds to a maximum cut in the graph. For any inverse temperature parameter β, H_G induces the Gibbs distribution

μ_{β,G}(σ) = exp(−β H_G(σ)) / Z_G(β), (1.1)

where

Z_G(β) = Σ_{σ∈{−1,1}^n} exp(−β H_G(σ)) (1.2)

is the partition function, the normalizing constant of the Gibbs distribution.
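As a quick numeric sanity check of the identity above, the following sketch (our own illustration, not code from the paper) verifies on a small random multigraph that (|E(G)| + H_G(σ))/2 counts the edges whose endpoints receive the same spin:

```python
import random

def ising_hamiltonian(edges, sigma):
    # H_G(sigma) = sum over edges {i,j} (with multiplicity) of sigma(i)*sigma(j)
    return sum(sigma[i] * sigma[j] for (i, j) in edges)

def monochromatic_edges(edges, sigma):
    # edges whose endpoints are assigned the same spin (uncut edges)
    return sum(1 for (i, j) in edges if sigma[i] == sigma[j])

rng = random.Random(0)
n = 12
edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(40)]
for _ in range(10):
    sigma = [rng.choice([-1, 1]) for _ in range(n)]
    H = ising_hamiltonian(edges, sigma)
    # (|E(G)| + H_G(sigma)) / 2 = number of monochromatic edges
    assert (len(edges) + H) // 2 == monochromatic_edges(edges, sigma)
```

Each cut edge contributes −1 to H_G and each uncut edge +1, so the identity is exact (including self-loops, which are never cut).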

In this paper, as is common in prior works, the random graph model we consider is the Poissonized random multigraph model 𝔾^Po_{n,d}, where we first sample the number of edges from a Poisson distribution with mean dn/2 (so that the expected average degree is d), and then for each edge sample both of its endpoints independently and uniformly at random from [n] = {1,2,…,n}.
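This model is straightforward to simulate. Below is a minimal sketch (ours; the function names are our own) that samples G ∼ 𝔾^Po_{n,d} using only the standard library, splitting the Poisson draw into low-mean pieces so that Knuth's product method does not underflow:

```python
import math, random

def poisson_sample(mean, rng):
    # Po(mean) as a sum of independent Po(mean/k) pieces (Poisson superposition),
    # keeping each piece's mean <= 1 so Knuth's product method does not underflow
    if mean <= 0:
        return 0
    k = max(1, math.ceil(mean))
    lam = mean / k
    total = 0
    for _ in range(k):
        L, p, x = math.exp(-lam), 1.0, -1
        while p > L:
            x += 1
            p *= rng.random()
        total += x
    return total

def sample_poissonized_multigraph(n, d, rng):
    """Sample G ~ G^Po_{n,d}: Po(dn/2) edges, each with both endpoints
    chosen independently and uniformly from {0, ..., n-1}."""
    m = poisson_sample(d * n / 2, rng)
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]

rng = random.Random(1)
G = sample_poissonized_multigraph(200, 5, rng)
# the number of edges concentrates around d*n/2 = 500 (std ~ 22)
assert 350 < len(G) < 650
assert all(0 <= u < 200 and 0 <= v < 200 for (u, v) in G)
```

Note that the resulting graph may contain self-loops and parallel edges, as the model intends.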

Given any two measures μ1,μ2 over {1,1}n, we define their normalized Wasserstein distance to be

W_{2,n}(μ_1, μ_2) = inf_{π∈Γ(μ_1,μ_2)} ( E_{(σ_1,σ_2)∼π} [ (1/n) Σ_{i=1}^n (σ_1(i) − σ_2(i))^2 ] )^{1/2}, (1.3)

where Γ(μ_1,μ_2) is the set of all couplings of μ_1 and μ_2; that is, measures over {−1,1}^n × {−1,1}^n whose marginal on the first argument is μ_1 and whose marginal on the second is μ_2.
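For intuition, when μ_1 and μ_2 are point masses there is only one coupling, so W_{2,n} reduces to a normalized Euclidean distance between the two configurations. A tiny worked example (our own illustration):

```python
import math

def w2n_point_masses(s1, s2):
    # the only coupling of two point masses is the pair (s1, s2) itself,
    # so the infimum in the definition of W_{2,n} is attained trivially
    n = len(s1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)) / n)

s1 = [1, 1, -1, -1]
s2 = [1, -1, -1, 1]
# the spins disagree in 2 of 4 coordinates, each contributing (±2)^2 = 4,
# so W_{2,n} = sqrt(8 / 4) = sqrt(2)
assert abs(w2n_point_masses(s1, s2) - math.sqrt(2)) < 1e-12
```

In general, W_{2,n} between point masses equals 2·√(Hamming fraction), so it takes values in [0, 2].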

Our main result is a hardness of sampling theorem, in terms of the W_{2,n} distance, for the measure μ_{β,G} with G ∼ 𝔾^Po_{n,d}, against stable sampling algorithms. Informally, a sampling algorithm is stable if its output is insensitive to small perturbations of the input; see Definition 12 for a formal definition.

Theorem 1.

Let {ALG_n}_{n≥1} be a family of randomized sampling algorithms, where ALG_n takes as input an n-vertex (multi-)graph G and an inverse temperature parameter β and produces an output law μ^ALG_{β,G} over {−1,1}^n. For every β > 1 and d sufficiently large, if {ALG_n}_{n≥1} is stable at inverse temperature parameter β/√d, then

lim inf_{n→∞} E_{G∼𝔾^Po_{n,d}} [ W_{2,n}(μ^ALG_{β/√d,G}, μ_{β/√d,G}) ] > 0. (1.4)

That is, stable algorithms cannot sample from the model with vanishing error in Wasserstein distance.

It is known that the replica symmetry breaking threshold for 𝔾^Po_{n,d} occurs at the Kesten–Stigum bound β(d) = (1/2) log( (√d+1)/(√d−1) ) (unless otherwise specified, all logarithms in this paper are natural logarithms) [31, 19], so the threshold β = 1 corresponds exactly to the normalized limit lim_{d→∞} √d · β(d) = 1. Note that β = 1 is also the replica symmetry breaking threshold of the Sherrington–Kirkpatrick model.

As it turns out, the same proof for Theorem 1 yields sampling hardness for near-maximum and near-minimum bisections as well. Let A_n = {σ ∈ {−1,1}^n : |Σ_{i=1}^n σ(i)| ≤ 1} be the set of configurations in which the numbers of +1s and −1s differ by at most one, which we refer to as bisections. The Gibbs distribution μ_{β,G} restricted to A_n gives

μ^bis_{β,G}(σ) = exp(−β H_G(σ)) / Z^bis_G(β), (1.5)

where

Z^bis_G(β) = Σ_{σ∈A_n} exp(−β H_G(σ)) (1.6)

is the bisection partition function. Note that μ^bis_{β,G} prefers bisections that cut more edges when β > 0, and bisections that cut fewer edges when β < 0. This model is also known as the zero-magnetization Ising model and has been studied both in the statistical physics literature on disordered systems [28] and recently in computer science [10, 8, 26].

Theorem 2.

Let {ALG_n}_{n≥1} be a family of randomized sampling algorithms, where ALG_n takes as input an n-vertex (multi-)graph G and an inverse temperature parameter β and produces an output law μ^ALG_{β,G} over {−1,1}^n. For every β with |β| > 1 and d sufficiently large, if {ALG_n}_{n≥1} is stable at inverse temperature parameter β/√d, then

lim inf_{n→∞} E_{G∼𝔾^Po_{n,d}} [ W_{2,n}(μ^ALG_{β/√d,G}, μ^bis_{β/√d,G}) ] > 0. (1.7)

Theorems 1 and 2 exhibit a gap between search and sampling for the maximum cut and maximum/minimum bisection problems: while a search algorithm is known under the widely believed “No Overlap Gap” conjecture [2] (the result in [2] is stated for random regular graphs, but it can be transferred to other graph models using e.g. the argument in [13]), no stable algorithm can sample from the Gibbs distribution with arbitrary precision in the Wasserstein metric. Put another way, the algorithm of [2] finds solutions of value (1−ϵ)·OPT for any fixed ϵ > 0; Theorems 1 and 2 rule out stable algorithms that sample approximately uniformly from solutions of these values, since a standard reduction of counting to sampling (e.g. [9]) reduces the task of sampling from the Gibbs distribution to sampling uniformly from solutions of a given value.

1.2 Techniques

To show that sampling from the anti-ferromagnetic Ising Gibbs distribution on random graphs is hard for stable sampling algorithms, we use the framework that was used by [4] to show that sampling from the Sherrington-Kirkpatrick Gibbs distribution is hard. The key quantity in this approach is the overlap between two configurations σ1,σ2, defined by

R_{1,2}(σ_1, σ_2) := (1/n) Σ_{i=1}^n σ_1(i) σ_2(i).

The framework consists of the following two steps:

1. Show that w.h.p. over the disorder, the average of the squared overlap R_{1,2}(σ_1,σ_2)^2 with respect to two independent samples σ_1 and σ_2 from the Gibbs distribution is at least some positive constant independent of n. In particular, this is expected to hold when the Gibbs distribution exhibits replica symmetry breaking, which explains the threshold β = 1.

2. Show that the Gibbs distribution exhibits disorder chaos. That is, if we perturb the input slightly (the random coupling constants in the case of the SK model, or the random graph in the case of the Ising model; see Section 2 for a formal definition of this perturbation) and then take a sample σ_1 from the Gibbs distribution of the original system and a sample σ_2 from the Gibbs distribution of the perturbed system, then with high probability R_{1,2}(σ_1,σ_2) is very close to zero.

Combining these two steps and using a connection between R1,2 and W2,n distance (see Lemma 15), we have that for arbitrarily small perturbations, the W2,n distance between the original Gibbs distribution and the perturbed Gibbs distribution is at least some positive constant. However, for stable sampling algorithms, if we make the perturbation sufficiently small then the W2,n distance between the old and new output distributions can also be arbitrarily small. This implies that the output distribution of any stable sampling algorithm will have strictly positive W2,n distance from the target Gibbs distribution.
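To build intuition for the first step, the Gibbs average of the squared overlap can be computed exactly on tiny instances by enumeration, using the identity ⟨R_{1,2}²⟩ = (1/n²) Σ_{i,j} ⟨σ_i σ_j⟩² for two independent replicas. The sketch below is our own illustration, with an arbitrary small coupling matrix and the convention μ(σ) ∝ exp(−βH(σ)), H(σ) = Σ_{i<j} J_{ij} σ_i σ_j; it is not code from the paper:

```python
import itertools, math

def gibbs_avg_R2(J, beta):
    """Exact two-replica average E_{s1,s2 ~ mu x mu}[R_{1,2}(s1,s2)^2] for
    mu(s) proportional to exp(-beta * H(s)), H(s) = sum_{i<j} J[i][j]*s_i*s_j,
    by brute-force enumeration (feasible only for tiny n)."""
    n = len(J)
    states = list(itertools.product([-1, 1], repeat=n))
    weights = [math.exp(-beta * sum(J[i][j] * s[i] * s[j]
                                    for i in range(n) for j in range(i + 1, n)))
               for s in states]
    Z = sum(weights)
    # two-point correlations <s_i s_j> under the Gibbs measure
    corr = [[sum(w * s[i] * s[j] for w, s in zip(weights, states)) / Z
             for j in range(n)] for i in range(n)]
    # since the replicas are independent, <R^2> = (1/n^2) sum_{i,j} <s_i s_j>^2
    return sum(corr[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2

n = 6
J = [[1.0] * n for _ in range(n)]   # complete graph, antiferromagnetic for beta > 0
# at beta = 0 the measure is uniform, <s_i s_j> = 1 iff i = j, so <R^2> = 1/n
assert abs(gibbs_avg_R2(J, 0.0) - 1.0 / n) < 1e-12
```

The framework concerns the regime where this quantity stays bounded away from zero as n grows, which enumeration cannot reach; the sketch only illustrates the quantity being controlled.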

In order to carry out this framework for sparse random graph models, we prove new structural properties of the Gibbs measures μ_{β/√d,G} and μ^bis_{β/√d,G} and connect them to those of the Gibbs measures μ_{β,𝐠} and μ^bis_{β,𝐠} of the SK model (see Section 2 for the formal definitions of μ_{β,𝐠} and μ^bis_{β,𝐠}). This further extends the close connection between the behavior of the SK model and the sparse random graph model. Previously, it was shown by Dembo, Montanari, and Sen [21] that the free energy of the sparse model converges to that of the SK model as the average degree d increases to infinity. We first extend this to the average Hamiltonian of a sample drawn from the Gibbs distribution, showing that this quantity also converges in the same manner, with or without the bisection restriction. This is done by using convexity of the free energy together with the well-known observation that the average Hamiltonian can be obtained by taking the derivative of the free energy (see (3.1) and (3.2)). The main difficulty here is establishing that the limiting free energy is the same regardless of the bisection restriction; the details can be found in the full version of this paper.

Proposition 3.

(a) For every β ≥ 0, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[ E_{σ∼μ_{β,𝐠}}[H_SK(σ)] ] − (1/√d) E_G[ E_{σ∼μ_{β/√d,G}}[H_G(σ)] ] | = 0. (1.8)

(b) For every β ∈ ℝ, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[ E_{σ∼μ^bis_{β,𝐠}}[H_SK(σ)] ] − (1/√d) E_G[ E_{σ∼μ^bis_{β/√d,G}}[H_G(σ)] ] | = 0. (1.9)

Our main technical contribution is the following proposition, which says that as the average degree d goes to infinity, the expected average squared overlap between two independent samples from the Gibbs measure μ_{β/√d,G} converges to that of two independent samples from the Gibbs measure μ_{β,𝐠} of the SK model, again with or without the bisection restriction. This gives the first ingredient in the sampling hardness framework. For the SK Hamiltonian, there is a known connection between the average Hamiltonian and the average squared overlap, which can be established using the Gaussian integration by parts trick (see e.g. [32] for more details). In this paper, we show that one can use the Stein–Chen identity for the Poisson distribution (see Lemma 24) in place of the Gaussian integration by parts trick to establish a similar connection for the sparse model in the large degree limit.

Proposition 4.

(a) For any β ≥ 0,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[ E_{σ_1,σ_2∼μ_{β,𝐠}}[R_{1,2}(σ_1,σ_2)^2] ] − E_G[ E_{σ_1,σ_2∼μ_{β/√d,G}}[R_{1,2}(σ_1,σ_2)^2] ] | = 0. (1.10)

(b) For any β ∈ ℝ,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[ E_{σ_1,σ_2∼μ^bis_{β,𝐠}}[R_{1,2}(σ_1,σ_2)^2] ] − E_G[ E_{σ_1,σ_2∼μ^bis_{β/√d,G}}[R_{1,2}(σ_1,σ_2)^2] ] | = 0. (1.11)

Finally, to obtain disorder chaos for the sparse model, which is the second component in the sampling hardness framework, we use the fact that two configurations sampled from the coupled system for the SK model have nontrivial overlap with only exponentially small probability (see Theorem 29). This creates a gap in free energy between the coupled and uncoupled systems, which can then be transferred to the sparse model using the interpolation results of [17, 16]. We then translate this gap back into a disorder chaos statement.

1.3 Organization

The paper is organized as follows. In Section 2 we set up the notation and give a detailed overview of the proof. In Section 3 and Section 4, we establish the correspondence of average energy and average overlap between the two models. Finally in Section 5, we transfer the disorder chaos property from the SK model to the sparse random graph models.

2 Overview of the proof

Gibbs distributions

For σ ∈ {−1,1}^n and 𝐗 ∈ ℝ^{n×n}, we define the Hamiltonian

H(σ;𝐗) := −Σ_{i,j=1}^n 𝐗_{ij} σ(i)σ(j).

For any β, it induces the following Gibbs distribution on {1,+1}n:

μ_{β,𝐗}(σ) = exp(β H(σ;𝐗)) / Z(β,𝐗),

where Z(β,𝐗) = Σ_{σ∈{−1,1}^n} exp(β H(σ;𝐗)) is the normalizing factor, commonly referred to as the partition function; β is commonly referred to as the inverse temperature parameter in statistical physics. We also consider the restriction of the above distribution to bisections σ ∈ A_n, with

μ^bis_{β,𝐗}(σ) = exp(β H(σ;𝐗)) / Z^bis(β,𝐗),  Z^bis(β,𝐗) = Σ_{σ∈A_n} exp(β H(σ;𝐗)).

We refer to such restricted versions as bisection models. In the statistical physics literature, this is sometimes also called the model with zero magnetization, and the version without the bisection restriction is called the model with non-fixed magnetization. Here, the term magnetization refers to the quantity m(σ) = (1/n) Σ_{i=1}^n σ(i). We sometimes also write μ(σ; β, 𝐗) and μ^bis(σ; β, 𝐗) instead of μ_{β,𝐗}(σ) and μ^bis_{β,𝐗}(σ) if we wish to stress the dependence on β and 𝐗.

𝐗 is a random matrix drawn from some distribution, and in this paper we consider two important cases. In the first case, 𝐗 = 𝐠, where 𝐠_{ij} ∼ N(0, 1/(2n)) independently. This is the Sherrington–Kirkpatrick model. In the second case, 𝐗 = 𝐀, where each entry 𝐀_{ij} is an independent Po(d/(2n)) random variable for some parameter d > 0. This case corresponds to sparse random (multi-)graphs with average degree d, where the vertex set is [n] = {1,2,…,n} and the multiplicity of the edge {i,j} is 𝐀_{ij} + 𝐀_{ji} if i ≠ j and 𝐀_{ii} if i = j (note that here 𝐀 is not the adjacency matrix of the graph and can be asymmetric). Any configuration σ can be thought of as the indicator vector of a cut, where the vertices assigned +1 by σ are on one side and those assigned −1 are on the other side, and H(σ;𝐀) is equal to the difference between the number of edges crossing the cut and the number of edges not crossing the cut. When β > 0, the Gibbs measure prefers configurations that cut more edges; this corresponds to the Maximum Cut problem for the non-fixed magnetization model and the Maximum Bisection problem for the zero-magnetization model, and is known as the anti-ferromagnetic Ising model in the statistical physics literature. When β < 0, the non-fixed magnetization model corresponds to the Minimum Cut problem while the zero-magnetization model corresponds to the Minimum Bisection problem; this is known as the ferromagnetic Ising model in the statistical physics literature.
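The correspondence between H(σ;𝐀) and cut sizes can be checked by hand on a toy matrix. The following sketch is our own illustration, using the convention H(σ;𝐀) = −Σ_{i,j} 𝐀_{ij} σ(i)σ(j) stated above (the matrix entries are arbitrary small integers, not a sample of the model):

```python
def H(sigma, A):
    # H(sigma; A) = -sum_{i,j} A_ij * sigma(i) * sigma(j)
    n = len(A)
    return -sum(A[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))

# a small asymmetric nonnegative integer matrix playing the role of A
A = [[0, 2, 0],
     [1, 0, 0],
     [0, 3, 1]]
# edge multiplicities (1-indexed vertices): {1,2}: A_12 + A_21 = 3;
# {2,3}: A_23 + A_32 = 3; a self-loop at vertex 3: A_33 = 1
sigma = [1, -1, 1]
# the cut (+1 side {1,3}, -1 side {2}) crosses 6 edges; only the loop is uncut
cut, uncut = 6, 1
assert H(sigma, A) == cut - uncut == 5
```

Each crossing edge has σ(i)σ(j) = −1 and so contributes +its multiplicity to H, while each non-crossing edge (including loops) contributes −its multiplicity, which gives the stated difference.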

For notational convenience, we sometimes drop the random matrix and write H_SK(σ) := H(σ;𝐠) and H_d(σ) := H(σ;𝐀). We sometimes refer to these two models as the dense and sparse models, respectively.

Gibbs average

For any function f : {−1,1}^n → ℝ, we can define its average with respect to μ or μ^bis:

⟨f(σ)⟩_{β,𝐗} = Σ_{σ∈{−1,1}^n} f(σ) μ_{β,𝐗}(σ),  ⟨f(σ)⟩^bis_{β,𝐗} = Σ_{σ∈A_n} f(σ) μ^bis_{β,𝐗}(σ).

We can extend this definition to functions f : ({−1,1}^n)^k → ℝ which take multiple configurations as inputs, in which case we assume that the configurations are sampled independently from the Gibbs distribution:

⟨f(σ_1,…,σ_k)⟩_{β,𝐗} = Σ_{σ_1,…,σ_k ∈ {−1,1}^n} μ_{β,𝐗}(σ_1) ⋯ μ_{β,𝐗}(σ_k) f(σ_1,…,σ_k),

and the average over μ𝖻𝗂𝗌 is defined similarly. We sometimes drop the subscripts if they are clear from context. One function of particular interest to us is the overlap between two configurations, defined as

R_{1,2}(σ_1,σ_2) = (1/n) Σ_{i=1}^n σ_1(i) σ_2(i).

R_{1,2} gives the normalized inner product between two configurations. If ⟨|R_{1,2}(σ_1,σ_2)|⟩_{β,𝐗} is close to zero, then two configurations independently sampled from the Gibbs distribution μ_{β,𝐗} are typically nearly orthogonal.

Free energy

The free energy for these models is defined as follows.

Φ_{n,SK}(β) = (1/n) E_𝐠[log Z(β,𝐠)],  Φ_{n,d}(β) = (1/n) E_𝐀[log Z(β,𝐀)].

We can also define the free energy in a similar way when restricted to bisections:

Φ^bis_{n,SK}(β) = (1/n) E_𝐠[log Z^bis(β,𝐠)],  Φ^bis_{n,d}(β) = (1/n) E_𝐀[log Z^bis(β,𝐀)].

The following proposition is a simple consequence of the Cauchy–Schwarz inequality.

Proposition 5.

Both logZ(β,𝐗) and logZ𝖻𝗂𝗌(β,𝐗) are convex in β.

Proof.

For any β_1, β_2 ∈ ℝ, applying the Cauchy–Schwarz inequality to exp(β_1 H(σ;𝐗)/2) and exp(β_2 H(σ;𝐗)/2), we have

Z((β_1+β_2)/2, 𝐗) = Σ_{σ∈{−1,1}^n} exp( ((β_1+β_2)/2) H(σ;𝐗) )
  ≤ ( Σ_{σ∈{−1,1}^n} exp(β_1 H(σ;𝐗)) )^{1/2} ( Σ_{σ∈{−1,1}^n} exp(β_2 H(σ;𝐗)) )^{1/2}
  = Z(β_1,𝐗)^{1/2} Z(β_2,𝐗)^{1/2}.

Taking the logarithm of both sides, we get that log Z(β,𝐗) is midpoint-convex, hence (by continuity) convex in β. The convexity of log Z^bis(β,𝐗) can be shown in the same way.
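The midpoint inequality in the proof can be confirmed numerically on a tiny instance by brute-force enumeration (our own sketch, with an arbitrary random coupling matrix, not code from the paper):

```python
import itertools, math, random

def logZ(X, beta):
    # log Z(beta, X), with Z = sum_sigma exp(beta * H(sigma; X)) and
    # H(sigma; X) = -sum_{i,j} X_ij * sigma(i) * sigma(j)
    n = len(X)
    return math.log(sum(
        math.exp(-beta * sum(X[i][j] * s[i] * s[j]
                             for i in range(n) for j in range(n)))
        for s in itertools.product([-1, 1], repeat=n)))

rng = random.Random(0)
n = 5
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
b1, b2 = 0.3, 1.7
# midpoint convexity established by Cauchy-Schwarz in Proposition 5
assert logZ(X, (b1 + b2) / 2) <= (logZ(X, b1) + logZ(X, b2)) / 2 + 1e-12
```

The inequality holds for every fixed 𝐗, so convexity survives taking the expectation over the disorder, which is exactly Corollary 6.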

By taking expectation over the randomness of 𝐗 we immediately obtain the following corollary.

Corollary 6.

Φn,SK(β),Φn,SK𝖻𝗂𝗌(β),Φn,d(β),Φn,d𝖻𝗂𝗌(β) are all convex in β.

Correspondence between dense and sparse models

It is known that as the average degree d gets larger, the sparse model will “converge” to the dense model in some sense. Dembo, Montanari, and Sen proved this for the free energy of these two models using a clever interpolation argument:

Lemma 7 ([21]).

There exist constants c_1, c_2, c_3 > 0 independent of n, β, d such that for any β ∈ ℝ,

|Φ^bis_{n,d}(β/√d) − Φ^bis_{n,SK}(β)| ≤ c_1 |β|^3/√d + c_2 β^4/d + c_3 |β| √d / n^2. (2.1)

In this paper, we further extend this correspondence to the average energy ⟨H(σ)⟩ as well as the average squared overlap ⟨R_{1,2}(σ_1,σ_2)^2⟩.

Proposition 8 (Restatement of Proposition 3).

(a) For every β ≥ 0, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[⟨H_SK(σ)⟩_{β,𝐠}] − (1/√d) E_𝐀[⟨H_d(σ)⟩_{β/√d,𝐀}] | = 0. (2.2)

(b) For every β ∈ ℝ, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[⟨H_SK(σ)⟩^bis_{β,𝐠}] − (1/√d) E_𝐀[⟨H_d(σ)⟩^bis_{β/√d,𝐀}] | = 0. (2.3)
Proposition 9 (Restatement of Proposition 4).

(a) For any β ≥ 0,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[⟨R_{1,2}^2⟩_{β,𝐠}] − E_𝐀[⟨R_{1,2}^2⟩_{β/√d,𝐀}] | = 0. (2.4)

(b) For any β ∈ ℝ,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[⟨R_{1,2}^2⟩^bis_{β,𝐠}] − E_𝐀[⟨R_{1,2}^2⟩^bis_{β/√d,𝐀}] | = 0. (2.5)

We prove Proposition 8 in Section 3 and Proposition 9 in Section 4.

Disorder chaos and stable algorithms

Disorder chaos is a well-studied phenomenon in spin glass theory. The term “disorder” refers to the random matrix 𝐗, and “chaos” describes what happens to the Gibbs distribution if we slightly perturb 𝐗. For the SK model, consider the following notion of perturbation. Let 𝐠, 𝐠′ be two independent copies of Gaussian matrices with 𝐠_{ij}, 𝐠′_{ij} ∼ N(0, 1/(2n)) independently for each i, j. Given any perturbation parameter t ≥ 0, we consider the two measures μ_{β,𝐠}(σ) and μ_{β,𝐠_t}(σ), where 𝐠_t = √(1−t) 𝐠 + √t 𝐠′. Define ⟨f(σ_1,σ_2)⟩_{β,𝐠,𝐠_t} to be the average of f(σ_1,σ_2) where σ_1 ∼ μ_{β,𝐠} and σ_2 ∼ μ_{β,𝐠_t} are sampled independently from these two distributions, i.e.,

⟨f(σ_1,σ_2)⟩_{β,𝐠,𝐠_t} = Σ_{σ_1,σ_2 ∈ {−1,1}^n} f(σ_1,σ_2) μ_{β,𝐠}(σ_1) μ_{β,𝐠_t}(σ_2).

When t = 0 there is no perturbation, so ⟨f(σ_1,σ_2)⟩_{β,𝐠,𝐠_t} is simply ⟨f(σ_1,σ_2)⟩_{β,𝐠}, and it is known that two such samples σ_1, σ_2 will likely have nontrivial overlap.
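Numerically, the interpolation 𝐠_t = √(1−t) 𝐠 + √t 𝐠′ preserves the marginal law while decorrelating the coupled pair: Var((𝐠_t)_{ij}) = Var(𝐠_{ij}) and Cov(𝐠_{ij}, (𝐠_t)_{ij}) = √(1−t) · Var(𝐠_{ij}). A seeded Monte Carlo sketch of this (ours; unit-variance entries for simplicity, whereas the model uses N(0, 1/(2n))):

```python
import math, random

def perturb(g, gp, t):
    # entrywise interpolation g_t = sqrt(1-t)*g + sqrt(t)*g'
    a, b = math.sqrt(1 - t), math.sqrt(t)
    return [a * x + b * y for x, y in zip(g, gp)]

rng = random.Random(0)
N, t = 200_000, 0.3
g  = [rng.gauss(0, 1) for _ in range(N)]
gp = [rng.gauss(0, 1) for _ in range(N)]
gt = perturb(g, gp, t)

var_gt = sum(x * x for x in gt) / N
cov = sum(x * y for x, y in zip(g, gt)) / N
# the marginal law is preserved (variance 1), while the covariance with the
# original disorder is sqrt(1-t) < 1: t > 0 genuinely decouples the two systems
assert abs(var_gt - 1.0) < 0.02
assert abs(cov - math.sqrt(1 - t)) < 0.02
```

So for small t the perturbed disorder is statistically almost indistinguishable from the original; disorder chaos says the Gibbs measure nevertheless changes drastically.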

Theorem 10 (See e.g. [4]).

If |β|>1, then there exists ϵ=ϵ(β)>0 such that

lim inf_{n→∞} E_𝐠[⟨R_{1,2}(σ_1,σ_2)^2⟩_{β,𝐠}] ≥ ϵ(β). (2.6)

However, when t > 0, the overlap tends to zero in the n → ∞ limit.

Theorem 11 ([12]).

For all β ∈ ℝ and t ∈ (0,1], we have

lim_{n→∞} E_{𝐠,𝐠′}[⟨R_{1,2}(σ_1,σ_2)^2⟩_{β,𝐠,𝐠_t}] = 0. (2.7)

If we think of the expressions on the left-hand sides of (2.6) and (2.7) as a function of t, then (2.6) and (2.7) together show that this function is not right-continuous at t = 0 when |β| > 1. It was shown in [4] that this property very naturally obstructs a class of sampling algorithms for μ_{β,𝐠}.

Definition 12 (Definition 2.2 in [4]).

Let {ALG_n}_{n≥1} be a family of sampling algorithms for the Gibbs measure μ_{β,𝐠}, where ALG_n takes an n×n matrix 𝐠 and β as input and outputs an assignment in {−1,1}^n. Let μ^ALG_{β,𝐠} be the output law of ALG_n. We say that {ALG_n}_{n≥1} is stable (with respect to disorder at β) if

lim_{t→0} lim sup_{n→∞} E[W_{2,n}(μ^ALG_{β,𝐠}, μ^ALG_{β,𝐠_t})] = 0. (2.8)

Intuitively, (2.8) means that stable algorithms are not able to make the leap at the discontinuity t=0 implied by (2.6) and (2.7), and therefore must be producing a distribution that is bounded away from the Gibbs distribution μβ,𝐠 in terms of the W2,n distance. This intuition is formalized in [4] as the following theorem.

Theorem 13 (Theorem 2.6 in [4]).

Fix β with |β| > 1, and let {ALG_n}_{n≥1} be a family of sampling algorithms for the Gibbs measure μ_{β,𝐠} that is stable with respect to disorder at β. Then

lim inf_{n→∞} E_𝐠[W_{2,n}(μ_{β,𝐠}, μ^ALG_{β,𝐠})] > 0. (2.9)

In this paper, we obstruct stable sampling algorithms for the sparse models as well. For the sparse models, we consider a slightly different notion of perturbation. Fix the average degree d > 0 and the perturbation parameter t ∈ [0,1]. We take three independent random matrices 𝐀^{(1−t)}, 𝐀^{(t,1)}, 𝐀^{(t,2)} such that 𝐀^{(1−t)}_{ij} ∼ Po((1−t)d/(2n)) and 𝐀^{(t,1)}_{ij}, 𝐀^{(t,2)}_{ij} ∼ Po(td/(2n)), independently for all i, j ∈ [n]. We then define 𝐀 = 𝐀^{(1−t)} + 𝐀^{(t,1)} and 𝐀_t = 𝐀^{(1−t)} + 𝐀^{(t,2)}. Similarly to Definition 12, we say that a family of sampling algorithms {ALG_n}_{n≥1} with inputs β and 𝐀 is stable at inverse temperature β if

lim_{t→0} lim sup_{n→∞} E[W_{2,n}(μ^ALG_{β,𝐀}, μ^ALG_{β,𝐀_t})] = 0. (2.10)

As an example, it was recently shown that algorithms implemented by Boolean circuits of bounded depth are stable [1].
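The coupled pair above is well-defined because 𝐀 and 𝐀_t have the same entrywise marginal law: by Poisson superposition, Po((1−t)λ) + Po(tλ) has the law of Po(λ). A quick numeric confirmation of this standard fact via pmf convolution (our own sketch):

```python
import math

def pois_pmf(lam, k):
    # P[Po(lam) = k]
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, t, K = 2.0, 0.3, 30
# pmf of the sum Po((1-t)*lam) + Po(t*lam), by discrete convolution
conv = [sum(pois_pmf((1 - t) * lam, j) * pois_pmf(t * lam, k - j)
            for j in range(k + 1)) for k in range(K)]
target = [pois_pmf(lam, k) for k in range(K)]
# superposition: the sum is again Poisson, with mean (1-t)*lam + t*lam = lam
assert all(abs(a - b) < 1e-12 for a, b in zip(conv, target))
```

In particular 𝐀 and 𝐀_t each have the law of the original sparse disorder, while sharing only the common component 𝐀^{(1−t)}.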

When t=0, we have by Proposition 9 and Theorem 10 that for β>1,

lim_{d→∞} lim inf_{n→∞} E_𝐀[⟨R_{1,2}^2⟩_{β/√d,𝐀}] > 0, (2.11)

and for β with |β|>1

lim_{d→∞} lim inf_{n→∞} E_𝐀[⟨R_{1,2}^2⟩^bis_{β/√d,𝐀}] > 0. (2.12)

On the other hand, if t>0, we show in the following lemma that the overlap tends to zero as n goes to infinity.

Lemma 14.

Fix t>0.

(a) For any β ≥ 0,

lim_{d→∞} lim sup_{n→∞} E[⟨R_{1,2}^2⟩_{β/√d,𝐀,𝐀_t}] = 0. (2.13)

(b) For any β ∈ ℝ,

lim_{d→∞} lim sup_{n→∞} E[⟨R_{1,2}^2⟩^bis_{β/√d,𝐀,𝐀_t}] = 0. (2.14)

The following lemma establishes the connection between the overlap and the W2,n distance.

Lemma 15 (Lemma 5.3 in [4]).

Fix n, and let μ_1, μ_2, ν_1, ν_2 be distributions over {−1,+1}^n. We have

| E_{(σ,σ′)∼μ_1⊗ν_1}[|R_{1,2}(σ,σ′)|] − E_{(σ,σ′)∼μ_2⊗ν_2}[|R_{1,2}(σ,σ′)|] | ≤ W_{2,n}(μ_1,μ_2) + W_{2,n}(ν_1,ν_2). (2.15)
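For point masses both sides of Lemma 15 are explicit (the left side via single configurations, the right side via normalized Euclidean distances), so the statement can be verified directly. A randomized sanity check (our own sketch, not from the paper):

```python
import math, random

def R(s1, s2):
    # normalized overlap R_{1,2}
    return sum(a * b for a, b in zip(s1, s2)) / len(s1)

def w2n_points(s1, s2):
    # W_{2,n} between the point masses delta_{s1} and delta_{s2}
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)) / len(s1))

rng = random.Random(0)
n = 20
for _ in range(500):
    a, b, c, d = ([rng.choice([-1, 1]) for _ in range(n)] for _ in range(4))
    # Lemma 15 with mu_1, mu_2, nu_1, nu_2 point masses at a, b, c, d
    lhs = abs(abs(R(a, c)) - abs(R(b, d)))
    rhs = w2n_points(a, b) + w2n_points(c, d)
    assert lhs <= rhs + 1e-12
```

The underlying reason is Cauchy–Schwarz: |R(a,c) − R(b,c)| ≤ (1/√n)‖a−b‖₂ · (1/√n)‖c‖₂ = w2n_points(a,b), and then the triangle inequality.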

We now have the ingredients needed for proving Theorem 1. Theorem 2 can be proved in the same way and we omit the details here.

Proof of Theorem 1.

We drop the inverse temperature parameter in the notation for brevity. For any t > 0, choosing μ_1 = μ_2 = ν_1 = μ_{β/√d,𝐀} and ν_2 = μ_{β/√d,𝐀_t} in Lemma 15, we get that

W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t}) ≥ | ⟨|R_{1,2}(σ,σ′)|⟩_𝐀 − ⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t} |. (2.16)

Taking expectation on both sides, we obtain

E_{𝐀,𝐀_t}[W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})] ≥ E_{𝐀,𝐀_t}[ | ⟨|R_{1,2}(σ,σ′)|⟩_𝐀 − ⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t} | ]
  ≥ E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_𝐀] − E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t}].

In the second inequality we used the fact that |a − b| ≥ |a| − |b| for any a, b ∈ ℝ. Taking lim inf on both sides, we obtain

lim inf_{n→∞} E_{𝐀,𝐀_t}[W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})]
  ≥ lim inf_{n→∞} E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_𝐀] − lim sup_{n→∞} E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t}].

By (2.11) and (2.13), if d is sufficiently large, for some ϵ=ϵ(β)>0 independent of t,

lim inf_{n→∞} E_{𝐀,𝐀_t}[W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})] ≥ ϵ. (2.17)

By the triangle inequality, we have

W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})
  ≤ W_{2,n}(μ_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀}) + W_{2,n}(μ^ALG_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀_t}) + W_{2,n}(μ^ALG_{β/√d,𝐀_t}, μ_{β/√d,𝐀_t}).

Since 𝐀 and 𝐀t have the same distribution, we have

E[W_{2,n}(μ_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀})] = E[W_{2,n}(μ_{β/√d,𝐀_t}, μ^ALG_{β/√d,𝐀_t})]. (2.18)

Since {ALG_n}_{n≥1} is stable at inverse temperature β/√d, by (2.10), taking t sufficiently small we have

lim sup_{n→∞} E[W_{2,n}(μ^ALG_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀_t})] ≤ ϵ/2. (2.19)

It follows that

lim inf_{n→∞} E[W_{2,n}(μ^ALG_{β/√d,𝐀}, μ_{β/√d,𝐀})] ≥ ϵ/4.

Free energy of coupled bisection models

In subsequent proofs, we will need the following results on the free energy of coupled bisection models. More specifically, we show that the limiting free energy does not change under the bisection constraint for both the dense and sparse models (for the sparse model we require d → ∞). Intuitively, in the case of the SK model this holds because the symmetry of its disorder implies that there is nothing special about the all-ones direction. The sparse case requires more care since it no longer has this symmetry, but we overcome this difficulty using Poisson concentration arguments. Due to space constraints, proofs are omitted here, and we refer interested readers to the full version of this paper.

Let us define (β is omitted from the notation for simplicity)

Z^S_{𝐠,𝐠_t} = Σ_{σ_1,σ_2 ∈ {−1,1}^n : R_{1,2}(σ_1,σ_2) ∈ S} exp( β (H(σ_1;𝐠) + H(σ_2;𝐠_t)) )

and

Z^{S,bis}_{𝐠,𝐠_t} = Σ_{σ_1,σ_2 ∈ A_n : R_{1,2}(σ_1,σ_2) ∈ S} exp( β (H(σ_1;𝐠) + H(σ_2;𝐠_t)) ).

Similarly for the sparse model, let us define

Z^S_{𝐀,𝐀_t} = Σ_{σ_1,σ_2 ∈ {−1,1}^n : R_{1,2}(σ_1,σ_2) ∈ S} exp( (β/√d) (H(σ_1;𝐀) + H(σ_2;𝐀_t)) ), (2.20)
Z^{S,bis}_{𝐀,𝐀_t} = Σ_{σ_1,σ_2 ∈ A_n : R_{1,2}(σ_1,σ_2) ∈ S} exp( (β/√d) (H(σ_1;𝐀) + H(σ_2;𝐀_t)) ). (2.21)
Theorem 16.

Let S ⊆ [−1,1]. For any β ∈ ℝ, we have

lim_{n→∞} (1/n) E[ log Z^{S,bis}_{𝐠,𝐠_t} − log Z^S_{𝐠,𝐠_t} ] = 0.

By letting t = 0 and S = [−1,1], we immediately obtain the following.

Corollary 17.

We have

lim_{n→∞} ( Φ^bis_{n,SK}(β) − Φ_{n,SK}(β) ) = 0.
Lemma 18.

Fix β ∈ ℝ, d > 0 and t ∈ [0,1]. For every δ ∈ (0,1/2), there exists a constant C = C(β,d) > 0 such that for all sufficiently large n,

Pr[ | log Z^S_{𝐀,𝐀_t} − E[log Z^S_{𝐀,𝐀_t}] | ≥ n^{1/2+δ} ] ≤ exp(−C n^{2δ}) (2.22)

and

Pr[ | log Z^{S,bis}_{𝐀,𝐀_t} − E[log Z^{S,bis}_{𝐀,𝐀_t}] | ≥ n^{1/2+δ} ] ≤ exp(−C n^{2δ}). (2.23)
Theorem 19.

Let $0\le a<b$. For every $\epsilon>0$, if $d$ is sufficiently large, then for every $\beta\in[a,b]$,

\[\limsup_{n\to\infty}\frac{1}{n}\big|\mathbf{E}\log Z^{S}_{\mathbf{A},\mathbf{A}_t}-\mathbf{E}\log Z^{S,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big|\le\epsilon.\tag{2.24}\]

Again, by letting $t=0$ and $S=[-1,1]$, we immediately obtain the following.

Theorem 20.

Let $0\le a<b$. For every $\epsilon>0$, if $d$ is sufficiently large, then for every $\beta\in[a,b]$,

\[\limsup_{n\to\infty}\big|\Phi_{n,d}(\beta/\sqrt{d})-\Phi^{\mathsf{bis}}_{n,d}(\beta/\sqrt{d})\big|\le\epsilon.\tag{2.25}\]

3 Interpolating the average energy between SK and the diluted model

In this section we prove Proposition 8 and establish the interpolation of average energy between sparse and dense models. By Lemma 7, we already know that the free energy of the sparse model converges to that of the dense model as the average degree d goes to infinity. The following connection between free energy and average energy is well-known:

\[\frac{\partial}{\partial\beta}\mathbf{E}[\log Z(\beta,\mathbf{X})]=\mathbf{E}\Bigg[\sum_{\sigma\in\{-1,1\}^n}\frac{H(\sigma;\mathbf{X})\exp(\beta H(\sigma;\mathbf{X}))}{Z(\beta,\mathbf{X})}\Bigg]=\mathbf{E}\big[\langle H(\sigma;\mathbf{X})\rangle_{\beta,\mathbf{X}}\big].\tag{3.1}\]

Similarly,

\[\frac{\partial}{\partial\beta}\mathbf{E}[\log Z^{\mathsf{bis}}(\beta,\mathbf{X})]=\mathbf{E}\Bigg[\sum_{\sigma\in A_n}\frac{H(\sigma;\mathbf{X})\exp(\beta H(\sigma;\mathbf{X}))}{Z^{\mathsf{bis}}(\beta,\mathbf{X})}\Bigg]=\mathbf{E}\big[\langle H(\sigma;\mathbf{X})\rangle^{\mathsf{bis}}_{\beta,\mathbf{X}}\big].\tag{3.2}\]

To obtain convergence of the partial derivatives, we use the following elementary fact, sometimes known as Griffiths' lemma in statistical physics.

Proposition 21 (See e.g. [41]).

Let $(f_\alpha)_{\alpha\ge0}$ be a family of convex, differentiable functions that converges pointwise on an open interval $I$ to a function $f$. Then $\lim_{\alpha\to\infty}f'_\alpha(x)=f'(x)$ at every $x\in I$ where $f'(x)$ exists.
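For intuition, here is a numerical illustration of the lemma on a hypothetical family (not from the paper): $f_\alpha(x)=\frac{1}{\alpha}\log(2\cosh(\alpha x))$ is convex and differentiable and converges pointwise to $f(x)=|x|$ as $\alpha\to\infty$, so $f'_\alpha(x)=\tanh(\alpha x)$ must converge to $\mathrm{sign}(x)$ at every $x\neq 0$, where $f$ is differentiable.

```python
import math

# Illustration of Griffiths' lemma on a hypothetical family:
# f_a(x) = log(2 cosh(a x)) / a is convex and differentiable and converges
# pointwise (as a -> infinity) to f(x) = |x|.
def f(a, x):
    y = abs(a * x)
    # numerically stable form of log(2 cosh(a x)) / a = (|ax| + log(1+e^{-2|ax|})) / a
    return (y + math.log1p(math.exp(-2 * y))) / a

def fprime(a, x):
    return math.tanh(a * x)  # derivative of f(a, .) in x

x = 0.3  # a point where f(x) = |x| is differentiable, with f'(x) = 1
gaps = [abs(fprime(a, x) - 1.0) for a in (1, 10, 100)]
print(gaps)  # shrinks toward 0 as a grows
```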

Proof of Proposition 8.

Proposition 8 says that

(a) For every $\beta\ge0$,
\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle_{\beta,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|=0.\]

(b) For every $\beta\in\mathbb{R}$,
\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle^{\mathsf{bis}}_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|=0.\]

We first prove part (b). Assume for the sake of contradiction that for some $\beta_0$,

\[\limsup_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle^{\mathsf{bis}}_{\beta_0/\sqrt{d},\mathbf{A}}\big]\Big|=\epsilon>0.\tag{3.3}\]

It follows that we can choose a sequence of pairs (di,ni)i such that

\[\lim_{i\to\infty}\frac{1}{n_i}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0,\mathbf{g}}\big]-\frac{1}{\sqrt{d_i}}\mathbf{E}_{\mathbf{A}}\big[\langle H_{d_i}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0/\sqrt{d_i},\mathbf{A}}\big]\Big|=\epsilon.\tag{3.4}\]

Note that for every $i$, $n_i$ can be chosen sufficiently large so that the $o_n(1)$ term in Lemma 7 is at most $1/d_i$; under this assumption, the family $(\Phi^{\mathsf{bis}}_{n_i,d_i}(\beta/\sqrt{d_i}))_i$, viewed as functions of $\beta$, converges pointwise (in fact uniformly on any bounded interval) to $\lim_{i\to\infty}\Phi^{\mathsf{bis}}_{n_i,SK}(\beta)$. Here we remark that the limit $\lim_{n\to\infty}\Phi_{n,SK}(\beta)$ is known to exist and to be differentiable in $\beta$ [39, 40], and by Corollary 17 we have

\[\lim_{n\to\infty}\Phi_{n,SK}(\beta)=\lim_{n\to\infty}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\lim_{i\to\infty}\Phi^{\mathsf{bis}}_{n_i,SK}(\beta).\tag{3.5}\]

By Proposition 21, we have

\[\lim_{i\to\infty}\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n_i,d_i}(\beta/\sqrt{d_i})=\frac{\partial}{\partial\beta}\lim_{n\to\infty}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\lim_{n\to\infty}\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\lim_{i\to\infty}\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n_i,SK}(\beta)\tag{3.6}\]

for every β. By (3.2), we have

\[\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n,d}(\beta/\sqrt{d})=\frac{1}{\sqrt{d}\,n}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle^{\mathsf{bis}}_{\beta/\sqrt{d},\mathbf{A}}\big],\tag{3.7}\]

and

\[\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\frac{1}{n}\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta,\mathbf{g}}\big].\tag{3.8}\]

(3.6), (3.7), and (3.8) imply that

\[\lim_{i\to\infty}\frac{1}{n_i}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0,\mathbf{g}}\big]-\frac{1}{\sqrt{d_i}}\mathbf{E}_{\mathbf{A}}\big[\langle H_{d_i}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0/\sqrt{d_i},\mathbf{A}}\big]\Big|=0.\tag{3.9}\]

However, this contradicts (3.4), so we obtain part (b).

For part (a), note that by Theorem 20, for any bounded open interval $I\subseteq[0,+\infty)$, we can choose the sequence $(d_i,n_i)_i$ to satisfy the additional requirement that

\[\lim_{i\to\infty}\Phi^{\mathsf{bis}}_{n_i,d_i}(\beta/\sqrt{d_i})=\lim_{i\to\infty}\Phi_{n_i,d_i}(\beta/\sqrt{d_i})\tag{3.10}\]

for all $\beta\in I$. In particular, we can choose $I$ to include any desired $\beta_0>0$ (when $\beta=0$ the Gibbs measure is uniform and part (a) trivially holds). We can then invoke Proposition 21 again and proceed with the same proof.

4 Interpolating the average overlap between SK and the diluted model

In this section we prove Proposition 9. Since the proofs are mostly the same, we will only prove item (a) in Proposition 9. We first state the following well-known fact which relates the average energy to the average overlap of the Gibbs distribution in the SK model.

Lemma 22 (See e.g. [32]).

For every $\beta\in\mathbb{R}$, we have

\[\frac{1}{n}\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle_{\beta,\mathbf{g}}\big]=\frac{\beta}{2}\mathbf{E}_{\mathbf{g}}\big[\langle 1-R_{1,2}^2\rangle_{\beta,\mathbf{g}}\big].\tag{4.1}\]

The proof of Lemma 22 uses the following proposition.

Proposition 23.

Let $\mathbf{X}\in\mathbb{R}^{n\times n}$ and $\beta\in\mathbb{R}$. We have

\[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta,\mathbf{X})=-\beta n^2\big(1-\langle R_{1,2}^2\rangle_{\beta,\mathbf{X}}\big).\tag{4.2}\]
Proof.

For every $\sigma\in\{-1,1\}^n$ and $i,j\in[n]$, we have
\begin{align}
\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta,\mathbf{X})&=\frac{\partial}{\partial\mathbf{X}_{ij}}\Big(\frac{\exp(\beta H(\sigma;\mathbf{X}))}{Z(\beta,\mathbf{X})}\Big)\nonumber\\
&=(-\beta)\Big(\sigma(i)\sigma(j)\mu(\sigma;\beta,\mathbf{X})-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma';\beta,\mathbf{X})\,\mu(\sigma;\beta,\mathbf{X})\Big).\tag{4.3}
\end{align}

Summing over all $\sigma,i,j$, we obtain
\begin{align}
&\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta,\mathbf{X})\nonumber\\
&\qquad=\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,(-\beta)\Big(\sigma(i)\sigma(j)\mu(\sigma;\beta,\mathbf{X})-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma';\beta,\mathbf{X})\,\mu(\sigma;\beta,\mathbf{X})\Big)\nonumber\\
&\qquad=-\beta n^2\big(1-\langle R_{1,2}^2\rangle_{\beta,\mathbf{X}}\big).\nonumber
\end{align}
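The identity (4.2) can be verified numerically by finite differences on a tiny instance. The sketch below assumes the Hamiltonian convention $H(\sigma;X)=-\sum_{i,j}X_{ij}\sigma(i)\sigma(j)$ with Gibbs weights $\exp(\beta H)$ (an illustrative choice consistent with the derivation above); the matrix entries and $\beta$ are arbitrary.

```python
import itertools, math

# Finite-difference check of the identity in Proposition 23 for n = 3.
# Assumed convention: mu(sigma) proportional to exp(beta * H(sigma; X)) with
# H(sigma; X) = -sum_{i,j} X_ij sigma_i sigma_j.
n, beta = 3, 0.7
X = [[0.3, -0.5, 0.1], [0.2, 0.0, -0.4], [0.7, 0.6, -0.2]]

def H(s, X):
    return -sum(X[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def gibbs(X):
    configs = list(itertools.product([-1, 1], repeat=n))
    w = [math.exp(beta * H(s, X)) for s in configs]
    Z = sum(w)
    return configs, [wi / Z for wi in w]

configs, mu = gibbs(X)

# Left-hand side of (4.2): sum_sigma sum_ij sigma_i sigma_j d(mu)/d(X_ij),
# with the derivative approximated by central differences.
h = 1e-5
lhs = 0.0
for i in range(n):
    for j in range(n):
        Xp = [row[:] for row in X]; Xp[i][j] += h
        Xm = [row[:] for row in X]; Xm[i][j] -= h
        _, mup = gibbs(Xp)
        _, mum = gibbs(Xm)
        for k, s in enumerate(configs):
            lhs += s[i] * s[j] * (mup[k] - mum[k]) / (2 * h)

# Right-hand side: -beta * n^2 * (1 - <R_{1,2}^2>), with the overlap average
# taken over two independent replicas from mu.
R2 = sum(mu[a] * mu[b] * (sum(sa[i] * sb[i] for i in range(n)) / n) ** 2
         for a, sa in enumerate(configs) for b, sb in enumerate(configs))
rhs = -beta * n * n * (1 - R2)
print(abs(lhs - rhs))  # small: finite-difference error only
```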

Lemma 22 can be proved using Proposition 23 and the Gaussian integration by parts formula. We prove a statement similar to Lemma 22, but for the sparse model. We will use the following lemma, known as the Stein-Chen identity for the Poisson distribution, in place of Gaussian integration by parts.

Lemma 24 ([14]).

Let $X\sim\mathrm{Po}(\lambda)$. For any bounded function $f$, we have

\[\mathbf{E}[Xf(X)]=\lambda\,\mathbf{E}[f(X+1)].\tag{4.4}\]
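The identity is exact and easy to check by truncating the Poisson series; the truncation level and the bounded test function in the sketch below are arbitrary choices.

```python
import math

# Check of the Stein-Chen identity E[X f(X)] = lambda * E[f(X+1)] for
# X ~ Po(lambda). Both sides are evaluated by truncating the Poisson pmf at K;
# the tail beyond K is negligible for this lambda.
lam, K = 2.5, 80

def f(k):
    return math.sin(k) / (1 + k)  # an arbitrary bounded test function

pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(K)]
lhs = sum(k * f(k) * pmf[k] for k in range(K))
rhs = lam * sum(f(k + 1) * pmf[k] for k in range(K))
print(abs(lhs - rhs))  # ~0 up to truncation and floating-point error
```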
Lemma 25.

Let $\mathbf{J}_{ij}$ be the matrix with $1$ in the $(i,j)$-th entry and $0$ everywhere else. Then there exists $\xi\in[0,1]$ such that

\[\Big|\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\mathbf{J}_{ij})-\mu(\sigma;\beta/\sqrt{d},\mathbf{X})-\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{X})\Big|\le\frac{3\beta^2}{d}\,\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij}).\tag{4.5}\]
Proof.

Since $\mu$ is infinitely differentiable in each entry of $\mathbf{X}$, by Taylor's Theorem, there exists $\xi\in[0,1]$ such that

\[\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\mathbf{J}_{ij})-\mu(\sigma;\beta/\sqrt{d},\mathbf{X})-\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{X})=\frac{1}{2}\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij}).\tag{4.6}\]

It remains to bound $\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij})$. By (4.3), we have (omitting $\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij}$ from the notation for simplicity)

\begin{align}
\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma)&=\frac{\partial}{\partial\mathbf{X}_{ij}}\Bigg(\Big(-\frac{\beta}{\sqrt{d}}\Big)\Big(\sigma(i)\sigma(j)-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma')\Big)\mu(\sigma)\Bigg)\nonumber\\
&=\Big(\frac{\beta}{\sqrt{d}}\Big)^2\Big(-1+\sum_{\sigma',\sigma''\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\sigma''(i)\sigma''(j)\mu(\sigma')\mu(\sigma'')\Big)\mu(\sigma)\nonumber\\
&\qquad+\Big(\frac{\beta}{\sqrt{d}}\Big)^2\Big(\sigma(i)\sigma(j)-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma')\Big)^2\mu(\sigma).\nonumber
\end{align}

Since

\[\Big|\sum_{\sigma',\sigma''\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\sigma''(i)\sigma''(j)\mu(\sigma')\mu(\sigma'')\Big|=\big|\langle\sigma(i)\sigma(j)\rangle\big|^2\le1\tag{4.7}\]

and similarly

\[\Big|\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma')\Big|=\big|\langle\sigma(i)\sigma(j)\rangle\big|\le1,\tag{4.8}\]

it follows that

\[\Big|\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma)\Big|\le\frac{\beta^2}{d}(2+4)\,\mu(\sigma)=\frac{6\beta^2}{d}\,\mu(\sigma).\tag{4.9}\]

(4.6) and (4.9) together imply (4.5).

Proposition 26.

Let $\mathbf{J}_{ij}$ be the matrix with $1$ in the $(i,j)$-th entry and $0$ everywhere else. For any $\xi\in\mathbb{R}$, we have

\[e^{-2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X})\le\mu(\sigma;\beta,\mathbf{X}+\xi\mathbf{J}_{ij})\le e^{2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X}).\tag{4.10}\]
Proof.

We have

\[\exp\big(\beta H(\sigma;\mathbf{X}+\xi\mathbf{J}_{ij})\big)=\exp\big(\beta H(\sigma;\mathbf{X})+\beta H(\sigma;\xi\mathbf{J}_{ij})\big)=\exp\big(\beta H(\sigma;\mathbf{X})\big)\,e^{-\beta\xi\sigma(i)\sigma(j)},\tag{4.11}\]

which implies that

\[e^{-\beta|\xi|}\exp\big(\beta H(\sigma;\mathbf{X})\big)\le\exp\big(\beta H(\sigma;\mathbf{X}+\xi\mathbf{J}_{ij})\big)\le e^{\beta|\xi|}\exp\big(\beta H(\sigma;\mathbf{X})\big).\tag{4.12}\]

Summing over σ{1,1}n, we obtain

\[e^{-\beta|\xi|}Z(\beta,\mathbf{X})\le Z(\beta,\mathbf{X}+\xi\mathbf{J}_{ij})\le e^{\beta|\xi|}Z(\beta,\mathbf{X}).\tag{4.13}\]

Since these quantities are all strictly positive, we have

\[\mu(\sigma;\beta,\mathbf{X}+\xi\mathbf{J}_{ij})=\frac{\exp(\beta H(\sigma;\mathbf{X}+\xi\mathbf{J}_{ij}))}{Z(\beta,\mathbf{X}+\xi\mathbf{J}_{ij})}\le\frac{e^{\beta|\xi|}\exp(\beta H(\sigma;\mathbf{X}))}{e^{-\beta|\xi|}Z(\beta,\mathbf{X})}=e^{2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X})\tag{4.14}\]

and similarly

\[e^{-2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X})\le\mu(\sigma;\beta,\mathbf{X}+\xi\mathbf{J}_{ij}).\]
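Proposition 26 can be sanity-checked on a random instance: perturbing a single entry of $\mathbf{X}$ by $\xi$ changes every Gibbs probability by a factor of at most $e^{2\beta|\xi|}$. The explicit Hamiltonian form in the sketch is an illustrative assumption, as before.

```python
import itertools, math, random

# Check of the two-sided bound (4.10) on a random instance. Assumed form
# (for illustration): H(s; X) = -sum_{i,j} X_ij s_i s_j, mu ~ exp(beta * H).
n, beta, xi = 4, 0.8, 0.6
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]

def mu(X):
    configs = list(itertools.product([-1, 1], repeat=n))
    w = [math.exp(beta * -sum(X[i][j] * s[i] * s[j]
                              for i in range(n) for j in range(n)))
         for s in configs]
    Z = sum(w)
    return [wi / Z for wi in w]

Xp = [row[:] for row in X]
Xp[0][1] += xi  # perturb a single entry by xi
lo, hi = math.exp(-2 * beta * abs(xi)), math.exp(2 * beta * abs(xi))
ratios = [a / b for a, b in zip(mu(Xp), mu(X))]
print(all(lo <= r <= hi for r in ratios))
```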

Lemma 27.

For every $\beta\in\mathbb{R}$, we have
\begin{align}
\frac{1}{\sqrt{d}\,n}\mathbf{E}\big[\langle H_d(\sigma)\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]&=-\frac{\sqrt{d}}{2}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]+\frac{\beta}{2}\mathbf{E}\big[\langle 1-R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]+O_d\Big(\frac{1}{\sqrt{d}}\Big).\nonumber
\end{align}
Proof.

For simplicity, we drop the subscripts from $\langle\cdot\rangle$. We have
\begin{align}
\frac{1}{\sqrt{d}\,n}\mathbf{E}\big[\langle H_d(\sigma)\rangle\big]&=-\frac{1}{\sqrt{d}\,n}\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mathbf{A}_{ij}\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]\nonumber\\
&=-\frac{1}{\sqrt{d}\,n}\cdot\frac{d}{2n}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\mathbf{J}_{ij})\Big]\nonumber\\
&=-\frac{\sqrt{d}}{2n^2}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\mathbf{J}_{ij})\Big].\tag{4.15}
\end{align}

Here in the second equality we invoked Lemma 24. By Lemma 25, for some $\xi_{\sigma,i,j}\in[0,1]$ we have
\begin{align}
&\Big|\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\mathbf{J}_{ij})\Big]\nonumber\\
&\qquad-\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\Big(\mu(\sigma;\beta/\sqrt{d},\mathbf{A})+\frac{\partial}{\partial\mathbf{A}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big)\Big]\Big|\nonumber\\
&\quad\le\frac{3\beta^2}{d}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\xi_{\sigma,i,j}\mathbf{J}_{ij})\Big]\nonumber\\
&\quad\le\frac{3\beta^2}{d}\,n^2e^{2\beta/\sqrt{d}}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]=\frac{3\beta^2}{d}\,n^2e^{2\beta/\sqrt{d}},\tag{4.16}
\end{align}

where the last inequality is due to Proposition 26. We observe that

\[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A})=n^2\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle.\tag{4.17}\]

By Proposition 23, we also have

\[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\frac{\partial}{\partial\mathbf{A}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})=-\frac{\beta}{\sqrt{d}}\,n^2\big(1-\langle R_{1,2}^2\rangle\big).\tag{4.18}\]

The lemma follows by combining (4.15), (4.16), (4.17), and (4.18).

The next proposition deals with the first term in the above lemma. Note that this term is trivially bounded in the bisection models for all $\beta$, since under the bisection constraint the magnetization is at most $1/n$ in absolute value.

Proposition 28.

For every $\beta\ge0$, we have

\[\lim_{d\to\infty}\sqrt{d}\,\limsup_{n\to\infty}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]=0.\tag{4.19}\]
Proof.

Fix an arbitrary $\epsilon>0$ and let $S(n,\epsilon)=\{\sigma\in\{-1,1\}^n:-\epsilon n\le\sum_{i=1}^n\sigma(i)\le\epsilon n\}$. We have

\begin{align}
\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]&=\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]\nonumber\\
&\le\mathbf{E}\Big[\sum_{\sigma\in S(n,\epsilon/d^{1/4})}\big(\epsilon d^{-1/4}\big)^2\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]+\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n\setminus S(n,\epsilon/d^{1/4})}1\cdot\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]\nonumber\\
&\le\big(\epsilon d^{-1/4}\big)^2+\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n\setminus S(n,\epsilon/d^{1/4})}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big].\tag{4.20}
\end{align}

Recall that $A_n=\{\sigma\in\{-1,1\}^n:|\sum_{i=1}^n\sigma(i)|\le1\}$. Given $\sigma\in\{-1,1\}^n\setminus S(n,\epsilon/d^{1/4})$, take a $\tau\in A_n$ such that $R_{1,2}(\tau,\sigma)=\max_{\sigma'\in A_n}R_{1,2}(\sigma',\sigma)$. By Poisson concentration properties (see the full version for details), there exists a constant $C=C(\epsilon)>0$ such that with probability at least $1-2\exp(-Cd^{1/4}n)$ we have $H_d(\sigma)\le H_d(\tau)-\frac{\epsilon^2\sqrt{d}\,n}{4}$, and consequently

\[\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\le\mu(\tau;\beta/\sqrt{d},\mathbf{A})\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big).\tag{4.21}\]

By taking expectations, we obtain

\begin{align}
\mathbf{E}\big[\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\big]&\le\mathbf{E}\big[\mu(\tau;\beta/\sqrt{d},\mathbf{A})\big]\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big)+2\exp(-Cd^{1/4}n)\nonumber\\
&\le\frac{1}{|A_n|}\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big)+2\exp(-Cd^{1/4}n).\tag{4.22}
\end{align}

Here in the last inequality we used the fact that $\mathbf{E}[\mu(\tau;\beta/\sqrt{d},\mathbf{A})]$ is the same for all $\tau\in A_n$.

Combining (4.20) and (4.22), we obtain

\[\sqrt{d}\,\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]\le\epsilon^2+\sqrt{d}\,2^n\Big(\frac{1}{|A_n|}\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big)+2\exp(-Cd^{1/4}n)\Big).\tag{4.23}\]

If d is sufficiently large, the above gives

\[\sqrt{d}\,\limsup_{n\to\infty}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]\le\epsilon^2.\tag{4.24}\]

The proposition follows since ϵ is chosen arbitrarily.

We are now ready to prove Proposition 9.

Proof of Proposition 9.

For part (a), by Lemma 22, Lemma 27, and Proposition 28, we have

\begin{align}
&\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle_{\beta,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|\nonumber\\
&\quad=\lim_{d\to\infty}\Bigg(\frac{\beta}{2}\limsup_{n\to\infty}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle 1-R_{1,2}^2\rangle_{\beta,\mathbf{g}}\big]-\mathbf{E}_{\mathbf{A}}\big[\langle 1-R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|\nonumber\\
&\qquad\qquad+\frac{\sqrt{d}}{2}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]+O_d\Big(\frac{1}{\sqrt{d}}\Big)\Bigg)\nonumber\\
&\quad=\frac{\beta}{2}\lim_{d\to\infty}\limsup_{n\to\infty}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g}}\big]-\mathbf{E}_{\mathbf{A}}\big[\langle R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|.\nonumber
\end{align}

Part (a) then follows by applying Proposition 8. Part (b) follows in a similar manner and we omit the details.

5 Disorder chaos for sparse models

Let $\mathbf{g},\mathbf{g}_t,\mathbf{A},\mathbf{A}_t$ be random matrices as defined in Section 2 and let $S\subseteq[-1,1]$. Recall that we also defined the following shorthand notation:

\[Z^{S}_{\mathbf{g},\mathbf{g}_t}=\sum_{\substack{\sigma_1,\sigma_2\in\{-1,1\}^n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\big(\beta(H(\sigma_1;\mathbf{g})+H(\sigma_2;\mathbf{g}_t))\big),\tag{5.1}\]
\[Z^{S,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}=\sum_{\substack{\sigma_1,\sigma_2\in A_n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\big(\beta(H(\sigma_1;\mathbf{g})+H(\sigma_2;\mathbf{g}_t))\big),\tag{5.2}\]
\[Z^{S}_{\mathbf{A},\mathbf{A}_t}=\sum_{\substack{\sigma_1,\sigma_2\in\{-1,1\}^n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\Big(\frac{\beta}{\sqrt{d}}(H(\sigma_1;\mathbf{A})+H(\sigma_2;\mathbf{A}_t))\Big),\tag{5.3}\]
\[Z^{S,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}=\sum_{\substack{\sigma_1,\sigma_2\in A_n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\Big(\frac{\beta}{\sqrt{d}}(H(\sigma_1;\mathbf{A})+H(\sigma_2;\mathbf{A}_t))\Big).\tag{5.4}\]

It is known that the SK model exhibits disorder chaos at any temperature.

Theorem 29 (Theorem 9 in [15]).

Let $\beta\in\mathbb{R}$ and $t>0$. Fix an arbitrary $\epsilon>0$ and let $I_\epsilon=[-1,-\epsilon]\cup[\epsilon,1]$. There exists some constant $K>0$ such that for every $n$,

\[\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\le K\exp(-n/K).\tag{5.5}\]

Theorem 29 immediately implies that the overlap of two configurations sampled from the coupled system $\mu_{\mathbf{g},\mathbf{g}_t}$ is nearly zero.

Corollary 30.

If t>0, then

\[\lim_{n\to\infty}\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g},\mathbf{g}_t}\big]=0.\tag{5.6}\]
Proof.

For any ϵ>0, we have

\begin{align}
\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g},\mathbf{g}_t}\big]&\le\epsilon^2\,\mathbf{E}\Bigg[\frac{Z^{[-\epsilon,\epsilon]}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]+1\cdot\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\nonumber\\
&\le\epsilon^2+K\exp(-n/K).\nonumber
\end{align}

Here we used Theorem 29 and the fact that $Z^{[-\epsilon,\epsilon]}_{\mathbf{g},\mathbf{g}_t}\le Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}$. It follows that

\[\limsup_{n\to\infty}\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g},\mathbf{g}_t}\big]\le\epsilon^2\]

for arbitrary $\epsilon>0$, and the corollary follows.
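The decomposition used in this proof is deterministic and holds for any coupled Gibbs measure: the overlap square is at most $\epsilon^2$ on $[-\epsilon,\epsilon]$ and at most $1$ on $I_\epsilon$. The following sketch checks it by brute force on a small instance, with an assumed Hamiltonian form and identical disorder in the two replicas for simplicity.

```python
import itertools, math, random

# Brute-force check that <R_{1,2}^2> <= eps^2 + Z^{I_eps} / Z^{[-1,1]} for a
# coupled Gibbs measure. Assumptions (illustrative): H(s) = -sum g_ij s_i s_j,
# and the same disorder g in both replicas.
n, beta, eps = 4, 0.7, 0.5
random.seed(2)
g = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]

def H(s):
    return -sum(g[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

configs = list(itertools.product([-1, 1], repeat=n))
pairs = [(s1, s2) for s1 in configs for s2 in configs]
w = {p: math.exp(beta * (H(p[0]) + H(p[1]))) for p in pairs}
Z_full = sum(w.values())

def R(p):
    return sum(a * b for a, b in zip(p[0], p[1])) / n

R2_avg = sum(R(p) ** 2 * w[p] for p in pairs) / Z_full
Z_Ieps = sum(w[p] for p in pairs if abs(R(p)) >= eps)  # mass on I_eps
print(R2_avg <= eps ** 2 + Z_Ieps / Z_full)
```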

The exponentially small fraction in Theorem 29 can be translated into a constant gap between the free energy of coupled and uncoupled systems.

Proposition 31.

For every ϵ>0, there exists some constant K>0 such that

\[\limsup_{n\to\infty}\Big(\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.7}\]
Proof.

By Theorem 29 and Jensen's inequality,
\begin{align}
\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}\big]&=\frac{1}{n}\mathbf{E}\Bigg[\log\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\nonumber\\
&\le\frac{1}{n}\log\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\le-\frac{1}{K}+O\Big(\frac{1}{n}\Big).\nonumber
\end{align}

The proposition follows by taking n to infinity.
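The only probabilistic ingredient here beyond Theorem 29 is Jensen's inequality for the concave logarithm, $\mathbf{E}[\log W]\le\log\mathbf{E}[W]$, which a quick Monte Carlo check illustrates on a hypothetical positive random variable (the log-normal below is an arbitrary choice, not from the paper).

```python
import math, random

# Monte Carlo illustration of Jensen's inequality E[log W] <= log E[W]
# for a hypothetical positive random variable W (here log-normal, so that
# E[log W] = 0 while log E[W] = 1/2).
random.seed(3)
samples = [math.exp(random.gauss(0, 1)) for _ in range(50000)]
e_log = sum(math.log(w) for w in samples) / len(samples)
log_e = math.log(sum(samples) / len(samples))
print(e_log <= log_e)
```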

The following theorem, which is a generalized version of Lemma 7, allows us to transfer this free energy gap to sparse models.

Theorem 32 (See e.g. [17, 16]).

For any $\beta\in\mathbb{R}$,

\[\Big|\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big]\Big|\le O_d\Big(\frac{1}{\sqrt{d}}\Big)+d\cdot o_n(1).\tag{5.8}\]

We are now ready to prove Lemma 14.

Proof of Lemma 14.

As before, we only prove part (a). Fix some arbitrary $\epsilon>0$. By Theorem 16, for both $S=I_\epsilon$ and $S=[-1,1]$ we have

\[\lim_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{S,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]-\mathbf{E}\big[\log Z^{S}_{\mathbf{g},\mathbf{g}_t}\big]\Big)=0.\tag{5.9}\]

By Proposition 31, for some constant K=K(ϵ)>0, we have

\[\limsup_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]-\mathbf{E}\big[\log Z^{[-1,1],\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.10}\]

We can then use Theorem 32 to transfer this gap to the sparse model and obtain

\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big]-\mathbf{E}\big[\log Z^{[-1,1],\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.11}\]

By Theorem 19, we then have

\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}\big]-\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.12}\]

This means that if d is sufficiently large, then for all sufficiently large n we have

\[\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}\big]\le-\frac{1}{2K}.\tag{5.13}\]

By Lemma 18, we have that with probability at least $1-o_n(1)$,

\[\frac{1}{n}\log Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}-\frac{1}{n}\log Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}\le-\frac{1}{4K},\tag{5.14}\]

which rearranges to

\[\frac{Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}}{Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}}\le\exp\Big(-\frac{n}{4K}\Big).\tag{5.15}\]

It follows that

\[\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A},\mathbf{A}_t}\big]\le\epsilon^2+\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}}{Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}}\Bigg]\le\epsilon^2+\exp\Big(-\frac{n}{4K}\Big)+o_n(1).\tag{5.16}\]

By taking $n$ to infinity, we have $\limsup_{n\to\infty}\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A},\mathbf{A}_t}\big]\le\epsilon^2$. This completes the proof.

References

  • [1] Ahmed El Alaoui and David Gamarnik. Hardness of sampling solutions from the Symmetric Binary Perceptron, July 2024. arXiv:2407.16627 [cs, math]. doi:10.48550/arXiv.2407.16627.
  • [2] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Local algorithms for Maximum Cut and Minimum Bisection on locally treelike regular graphs of large degree, November 2021. arXiv:2111.06813 [math-ph]. doi:10.48550/arXiv.2111.06813.
  • [3] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Optimization of mean-field spin glasses. The Annals of Probability, 49(6):2922–2960, November 2021. Publisher: Institute of Mathematical Statistics. doi:10.1214/21-AOP1519.
  • [4] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Sampling from the Sherrington-Kirkpatrick Gibbs measure via algorithmic stochastic localization, March 2022. arXiv:2203.05093 [cond-mat]. doi:10.48550/arXiv.2203.05093.
  • [5] Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong. Entropic independence I: Modified log-Sobolev inequalities for fractionally log-concave distributions and high-temperature Ising models. arXiv preprint arXiv:2106.04105, 2021.
  • [6] Nima Anari, Frederic Koehler, and Thuy-Duong Vuong. Trickle-down in localization schemes and applications. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 1094–1105, New York, NY, USA, 2024. Association for Computing Machinery. doi:10.1145/3618260.3649622.
  • [7] Roland Bauerschmidt and Thierry Bodineau. A very simple proof of the LSI for high temperature spin systems. Journal of Functional Analysis, 276(8):2582–2588, 2019.
  • [8] Roland Bauerschmidt, Thierry Bodineau, and Benoit Dagallier. Kawasaki dynamics beyond the uniqueness threshold. arXiv preprint arXiv:2310.04609, 2023. doi:10.48550/arXiv.2310.04609.
  • [9] Ivona Bezáková, Daniel Štefankovič, Vijay V Vazirani, and Eric Vigoda. Accelerating simulated annealing for the permanent and combinatorial counting problems. SIAM Journal on Computing, 37(5):1429–1454, 2008. doi:10.1137/050644033.
  • [10] Charlie Carlson, Ewan Davies, Alexandra Kolla, and Will Perkins. Computational thresholds for the fixed-magnetization Ising model. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1459–1472, 2022. doi:10.1145/3519935.3520003.
  • [11] Michael Celentano. Sudakov–Fernique post-AMP, and a new proof of the local convexity of the TAP free energy. The Annals of Probability, 52(3):923–954, 2024. doi:10.1214/23-AOP1675.
  • [12] Sourav Chatterjee. Disorder chaos and multiple valleys in spin glasses. arXiv preprint, 2009. arXiv:0907.3381.
  • [13] Antares Chen, Neng Huang, and Kunal Marwaha. Local algorithms and the failure of log-depth quantum advantage on sparse random CSPs, October 2023. arXiv:2310.01563 [quant-ph]. doi:10.48550/arXiv.2310.01563.
  • [14] Louis H. Y. Chen. Poisson Approximation for Dependent Trials. The Annals of Probability, 3(3):534–545, June 1975. Publisher: Institute of Mathematical Statistics. doi:10.1214/aop/1176996359.
  • [15] Wei-Kuo Chen. Variational representations for the Parisi functional and the two-dimensional Guerra–Talagrand bound. The Annals of Probability, 45(6A):3929–3966, November 2017. Publisher: Institute of Mathematical Statistics. doi:10.1214/16-AOP1154.
  • [16] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, and Mustazee Rahman. Suboptimality of local algorithms for a class of max-cut problems. The Annals of Probability, 47(3):1587–1618, May 2019. Publisher: Institute of Mathematical Statistics. doi:10.1214/18-AOP1291.
  • [17] Wei-Kuo Chen and Dmitry Panchenko. Disorder chaos in some diluted spin glass models. The Annals of Applied Probability, 28(3):1356–1378, June 2018. doi:10.1214/17-AAP1331.
  • [18] Yuansi Chen and Ronen Eldan. Localization schemes: A framework for proving mixing bounds for Markov chains. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 110–122. IEEE, 2022.
  • [19] Amin Coja-Oghlan, Philipp Loick, Balázs F. Mezei, and Gregory B. Sorkin. The Ising Antiferromagnet and Max Cut on Random Regular Graphs. SIAM Journal on Discrete Mathematics, 36(2):1306–1342, June 2022. Publisher: Society for Industrial and Applied Mathematics. doi:10.1137/20M137999X.
  • [20] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(6):066106, 2011.
  • [21] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, March 2017. Publisher: Institute of Mathematical Statistics. doi:10.1214/15-AOP1084.
  • [22] Ronen Eldan, Frederic Koehler, and Ofer Zeitouni. A spectral condition for spectral gap: fast mixing in high-temperature Ising models. Probability theory and related fields, 182(3):1035–1051, 2022.
  • [23] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Communications in mathematical physics, 233:1–12, 2003.
  • [24] Chris Jones, Kunal Marwaha, Juspreet Singh Sandhu, and Jonathan Shi. Random Max-CSPs Inherit Algorithmic Hardness from Spin Glasses. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 77:1–77:26, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2023.77.
  • [25] Florent Krzakała, Andrea Montanari, Federico Ricci-Tersenghi, Guilhem Semerjian, and Lenka Zdeborová. Gibbs states and the set of solutions of random constraint satisfaction problems. Proceedings of the National Academy of Sciences, 104(25):10318–10323, 2007. doi:10.1073/PNAS.0703685104.
  • [26] Aiya Kuchukova, Marcus Pappik, Will Perkins, and Corrine Yap. Fast and slow mixing of the Kawasaki dynamics on bounded-degree graphs. arXiv preprint, 2024. doi:10.48550/arXiv.2405.06209.
  • [27] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.
  • [28] Marc Mézard and Giorgio Parisi. Mean-field theory of randomly frustrated systems with finite connectivity. Europhysics Letters, 3(10):1067, 1987.
  • [29] Marc Mézard and Giorgio Parisi. The Bethe lattice spin glass revisited. The European Physical Journal B-Condensed Matter and Complex Systems, 20:217–233, 2001.
  • [30] Andrea Montanari. Optimization of the Sherrington–Kirkpatrick hamiltonian. SIAM Journal on Computing, 0(0):FOCS19–1–FOCS19–38, 2021. doi:10.1137/20M132016X.
  • [31] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3):431–461, August 2015. doi:10.1007/s00440-014-0576-6.
  • [32] Dmitry Panchenko. The Sherrington-Kirkpatrick Model. Springer Monographs in Mathematics. Springer New York, NY, 2013.
  • [33] Giorgio Parisi. Infinite number of order parameters for spin-glasses. Physical Review Letters, 43(23):1754, 1979.
  • [34] Giorgio Parisi. A sequence of approximated solutions to the SK model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115, 1980.
  • [35] Giorgio Parisi. Order parameter for spin-glasses. Physical Review Letters, 50(24):1946, 1983.
  • [36] David Sherrington and Scott Kirkpatrick. Solvable model of a spin-glass. Physical review letters, 35(26):1792, 1975.
  • [37] H. Sompolinsky and Annette Zippelius. Dynamic theory of the spin-glass phase. Phys. Rev. Lett., 47:359–362, August 1981. doi:10.1103/PhysRevLett.47.359.
  • [38] Eliran Subag. Following the ground states of full-RSB spherical spin glasses. Communications on Pure and Applied Mathematics, 74(5):1021–1044, 2021.
  • [39] Michel Talagrand. The Parisi Formula. Annals of Mathematics, 163(1):221–263, 2006. Publisher: Annals of Mathematics. URL: https://www.jstor.org/stable/20159953.
  • [40] Michel Talagrand. Parisi measures. Journal of Functional Analysis, 231(2):269–286, February 2006. doi:10.1016/j.jfa.2005.03.001.
  • [41] Michel Talagrand. Mean Field Models for Spin Glasses: Volume I: Basic Examples. Springer, Berlin, Heidelberg, 2011. doi:10.1007/978-3-642-15202-3.