
Hardness of Sampling for the Anti-Ferromagnetic Ising Model on Random Graphs

Neng Huang, University of Michigan, Ann Arbor, MI, USA; Will Perkins, Georgia Institute of Technology, Atlanta, GA, USA; Aaron Potechin, University of Chicago, Chicago, IL, USA
Abstract

We prove a hardness of sampling result for the anti-ferromagnetic Ising model on random graphs of average degree d for large constant d, proving that when the normalized inverse temperature satisfies β>1 (asymptotically corresponding to the condensation threshold), then w.h.p. over the random graph there is no stable sampling algorithm that can output a sample close in W2 distance to the Gibbs measure. The results also apply to a fixed-magnetization version of the model, showing that there are no stable sampling algorithms for low but positive temperature max and min bisection distributions. These results show a gap in the tractability of search and sampling problems: while there are efficient algorithms to find near optimizers, stable sampling algorithms cannot access the Gibbs distribution concentrated on such solutions.

Our techniques involve extensions of the interpolation technique relating behavior of the mean field Sherrington-Kirkpatrick model to behavior of Ising models on random graphs of average degree d for large d. While previous interpolation arguments compared the free energies of the two models, our argument compares the average energies and average overlaps in the two models.

Keywords and phrases:
Random graph, spin glass, sampling algorithm
Funding:
Neng Huang: Work was primarily done when NH was affiliated with the University of Chicago, supported partly by the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL) with NSF grant ECCS-2216912.
Will Perkins: Supported in part by NSF grant CCF-2309708.
Copyright and License:
© Neng Huang, Will Perkins, and Aaron Potechin; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Probability and statistics; Theory of computation → Randomness, geometry and discrete structures
Related Version:
Full Version: https://arxiv.org/abs/2409.03974
Editors:
Raghu Meka

1 Introduction

The study of disordered systems is at the intersection of statistical physics, probability theory, and computer science. Models of disordered systems, such as the Edwards–Anderson model, Sherrington–Kirkpatrick model, random constraint satisfaction problems, and combinatorial optimization problems on random networks, have been studied from many different perspectives, including as toy models of complex physical systems and as a source of random computational problems on which to test algorithms and understand the sources of algorithmic intractability.

In the algorithmic context, disordered systems present at least two natural algorithmic challenges: the first is the search problem, that of finding a high quality solution to an optimization problem or finding a near ground state in the language of statistical physics; the second is the sampling problem, that of (approximately) sampling from a Gibbs measure that weights solutions exponentially by their objective value.

One striking phenomenon that has been discovered is a gap between the tractability of search and sampling. That is, for certain choices of parameters in some models of disordered systems, the search problem is tractable (there exist polynomial-time or even near-linear time algorithms to find high quality solutions), while no efficient sampling algorithms are known, and in fact broad classes of sampling algorithms can be proved to be ineffective. So while high quality solutions can be found efficiently, these solutions are not typical, in the sense that the distribution of solutions found by a given search algorithm is far from the equilibrium Gibbs measure.

A prime example of this phenomenon occurs in the Sherrington–Kirkpatrick (SK) model [36], an Ising model on the complete graph on n vertices in which the n(n−1)/2 coupling constants are independent Gaussian random variables (see e.g. [32] for a mathematical survey of the model). Here, after appropriate normalization, the key parameter is β, the inverse temperature of the model. Parisi famously predicted a formula for the limiting free energy of the SK model as a function of β [33, 34]; this formula is non-analytic at β = 1, indicating the existence of a phase transition at this point. The Parisi formula was proved rigorously by Guerra [23] and Talagrand [39], with methods that have proved very influential in mathematics, physics, and computer science in the past two decades. Along with the Parisi formula comes information about the SK Gibbs measure itself; in particular, Parisi identified the “average overlap” of two independent samples from the Gibbs measure as an order parameter for the phase transition [35] and developed the theory of a hierarchy of “replica symmetry breaking”. Thus the equilibrium picture of the SK model is fairly well understood, with the phase transition identified at β = 1.

Turning to the algorithmic search problem, the task is to find a solution σ ∈ {±1}^n that (approximately) maximizes the probability mass function of the SK model; that is, the task of finding a ground state or near ground state of the model. Montanari [30], partly inspired by the work of Subag on spherical spin glasses [38], proposed an algorithm for this task based on approximate message passing, which finds an approximate optimizer under the widely believed assumption that the SK model has “no overlap gap” (sometimes also referred to as “full replica symmetry breaking”) for large enough β. This approach was later extended to mixed p-spin models [3], which can be thought of as the generalization of the Ising model to hypergraphs.

For the sampling problem, however, the picture is less optimistic. It was predicted by physicists that the simple Glauber dynamics should converge fast as long as β < 1 [37], but progress on a rigorous proof of this statement has only come recently. A series of recent works [7, 22, 18, 5] showed that Glauber dynamics does indeed converge quickly if β < 1/4 (more specifically, in O(n log n) steps). A more recent work [6] has further extended the range of rapid convergence to values beyond β = 1/4. Using a different approach based on stochastic localization, El Alaoui, Montanari, and Sellke [4] gave a sampling algorithm that approximately samples from the SK Gibbs measure for β < 1/2, with the approximation guarantee in Wasserstein distance rather than the total variation distance used in previous works. This result was later improved to the entire replica-symmetric phase β < 1 by Celentano [11]. On the other hand, obstacles seem to arise after the phase transition at β = 1. El Alaoui, Montanari, and Sellke showed that when β > 1, the onset of disorder chaos naturally obstructs stable sampling algorithms [4]. Similar ideas were used recently by El Alaoui and Gamarnik to rule out stable sampling algorithms for the symmetric binary perceptron problem [1].

One can ask whether this gap in tractability for the search and sampling problems is universal for certain classes of disordered systems. The SK model is an example of a mean-field model: all spins (variables) interact with each other. Mean-field models have the advantage of being mathematically tractable to some degree, but the disadvantage of being unrealistic from the physics perspective as well as the perspective of large real-world networks. Seeking a trade-off between these two aspects, physicists have studied diluted mean-field models: that is, statistical physics models on random graphs or hypergraphs of constant average degree [29]. These models inherit some of the symmetries of mean-field models while also having some non-trivial local geometry. The study of diluted mean field models in physics led to rich and surprising predictions for the behavior of optimization problems on random graphs and random constraint satisfaction problems [27, 25, 20]. Rigorously proving these predictions is a major task in mathematics and theoretical computer science.

The specific diluted mean-field model we address here is the anti-ferromagnetic Ising model on sparse random graphs. The Hamiltonian of this model is the number of edges in the random graph cut by the given configuration, and the ground states correspond to the maximum cuts in the graph. Finding a maximum cut in a graph is one of the best-known constraint satisfaction problems, and due to this connection this model has been studied extensively in computer science as well as statistical physics. It is known that as the average degree of the random graph increases, many aspects of this model can be understood via the SK model. Dembo, Montanari, and Sen showed that the typical size of a maximum cut in both Erdős–Rényi random graphs and random regular graphs can be related to the ground state energy of the SK model [21]. They proved this by interpolating the free energy between the two models (described in more detail below). El Alaoui, Montanari, and Sellke then gave an algorithm for finding a near maximum cut in a random regular graph by adapting Montanari’s message-passing algorithm for the SK model [2]. Similar results have also been shown for the more general mixed p-spin models and their diluted counterparts [24, 13].

It is natural to ask if this connection between the anti-ferromagnetic Ising model and the SK model can be extended further. In particular, does the search vs. sampling phenomenon also arise in the anti-ferromagnetic Ising model on random graphs? In this paper, we give an affirmative answer to this question.

1.1 Main Results

Before stating our main results we introduce some necessary definitions and notation. We refer to an element of {−1,1}^n as a configuration. Given a graph G with adjacency matrix A_G, we define the Ising Hamiltonian H_G(σ) = Σ_{1≤i≤j≤n} (A_G)_{ij} σ(i)σ(j). Note that (|E(G)| + H_G(σ))/2 is exactly the number of edges in the graph whose endpoints are assigned the same spin by σ. In particular, a ground state (i.e., a configuration that minimizes the Hamiltonian) of H_G corresponds to a maximum cut in the graph. For any inverse temperature parameter β, H_G induces the Gibbs distribution

μ_{β,G}(σ) = exp(−β H_G(σ)) / Z_G(β), (1.1)

where

Z_G(β) = Σ_{σ∈{−1,1}^n} exp(−β H_G(σ)) (1.2)

is the partition function, the normalizing constant of the Gibbs distribution.
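As a quick numeric sanity check of the identity above, the following sketch (our own illustration, not code from the paper) verifies on a small random multigraph that (|E(G)| + H_G(σ))/2 counts the edges whose endpoints receive the same spin:

```python
import random

def ising_hamiltonian(edges, sigma):
    # H_G(sigma) = sum over edges {i,j} (with multiplicity) of sigma(i)*sigma(j)
    return sum(sigma[i] * sigma[j] for (i, j) in edges)

def monochromatic_edges(edges, sigma):
    # edges whose endpoints are assigned the same spin (uncut edges)
    return sum(1 for (i, j) in edges if sigma[i] == sigma[j])

rng = random.Random(0)
n = 12
edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(40)]
for _ in range(10):
    sigma = [rng.choice([-1, 1]) for _ in range(n)]
    H = ising_hamiltonian(edges, sigma)
    # (|E(G)| + H_G(sigma)) / 2 = number of monochromatic edges
    assert (len(edges) + H) // 2 == monochromatic_edges(edges, sigma)
```

Each cut edge contributes −1 to H_G and each uncut edge +1, so the identity is exact (including self-loops, which are never cut).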

In this paper, as is common in prior works, the random graph model we consider is the Poissonized random multigraph model 𝔾^Po_{n,d}, where we first sample the number of edges from a Poisson distribution with mean dn/2 (so that the expected average degree is d), and then for each edge sample both of its endpoints independently and uniformly at random from [n] = {1,2,…,n}.
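This model is straightforward to simulate. Below is a minimal sketch (ours; the function names are our own) that samples G ∼ 𝔾^Po_{n,d} using only the standard library, splitting the Poisson draw into low-mean pieces so that Knuth's product method does not underflow:

```python
import math, random

def poisson_sample(mean, rng):
    # Po(mean) as a sum of independent Po(mean/k) pieces (Poisson superposition),
    # keeping each piece's mean <= 1 so Knuth's product method does not underflow
    if mean <= 0:
        return 0
    k = max(1, math.ceil(mean))
    lam = mean / k
    total = 0
    for _ in range(k):
        L, p, x = math.exp(-lam), 1.0, -1
        while p > L:
            x += 1
            p *= rng.random()
        total += x
    return total

def sample_poissonized_multigraph(n, d, rng):
    """Sample G ~ G^Po_{n,d}: Po(dn/2) edges, each with both endpoints
    chosen independently and uniformly from {0, ..., n-1}."""
    m = poisson_sample(d * n / 2, rng)
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]

rng = random.Random(1)
G = sample_poissonized_multigraph(200, 5, rng)
# the number of edges concentrates around d*n/2 = 500 (std ~ 22)
assert 350 < len(G) < 650
assert all(0 <= u < 200 and 0 <= v < 200 for (u, v) in G)
```

Note that the resulting graph may contain self-loops and parallel edges, as the model intends.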

Given any two measures μ1,μ2 over {1,1}n, we define their normalized Wasserstein distance to be

W_{2,n}(μ_1, μ_2) = inf_{π∈Γ(μ_1,μ_2)} ( E_{(σ_1,σ_2)∼π} [ (1/n) Σ_{i=1}^n (σ_1(i) − σ_2(i))^2 ] )^{1/2}, (1.3)

where Γ(μ_1,μ_2) is the set of all couplings of μ_1 and μ_2; that is, measures over {−1,1}^n × {−1,1}^n whose marginal on the first argument is μ_1 and whose marginal on the second is μ_2.
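For intuition, when μ_1 and μ_2 are point masses there is only one coupling, so W_{2,n} reduces to a normalized Euclidean distance between the two configurations. A tiny worked example (our own illustration):

```python
import math

def w2n_point_masses(s1, s2):
    # the only coupling of two point masses is the pair (s1, s2) itself,
    # so the infimum in the definition of W_{2,n} is attained trivially
    n = len(s1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)) / n)

s1 = [1, 1, -1, -1]
s2 = [1, -1, -1, 1]
# the spins disagree in 2 of 4 coordinates, each contributing (±2)^2 = 4,
# so W_{2,n} = sqrt(8 / 4) = sqrt(2)
assert abs(w2n_point_masses(s1, s2) - math.sqrt(2)) < 1e-12
```

In general, W_{2,n} between point masses equals 2·√(Hamming fraction), so it takes values in [0, 2].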

Our main result is a hardness of sampling theorem, in terms of the W_{2,n} distance, for the measure μ_{β,G} with G ∼ 𝔾^Po_{n,d}, against stable sampling algorithms. Informally, a sampling algorithm is stable if its output is insensitive to small perturbations of the input; see Definition 12 for a formal definition.

Theorem 1.

Let {ALG_n}_{n≥1} be a family of randomized sampling algorithms, where ALG_n takes as input an n-vertex (multi-)graph G and an inverse temperature parameter β and produces an output law μ^ALG_{β,G} over {−1,1}^n. For every β > 1 and d sufficiently large, if {ALG_n}_{n≥1} is stable at inverse temperature parameter β/√d, then

lim inf_{n→∞} E_{G∼𝔾^Po_{n,d}} [ W_{2,n}(μ^ALG_{β/√d,G}, μ_{β/√d,G}) ] > 0. (1.4)

That is, stable algorithms cannot sample from the model with vanishing error in Wasserstein distance.

It is known that the replica symmetry breaking threshold for 𝔾^Po_{n,d} occurs at the Kesten–Stigum bound β(d) = (1/2) log( (√d+1)/(√d−1) ) (unless otherwise specified, all logarithms in this paper are natural logarithms) [31, 19], so the threshold β = 1 corresponds exactly to the normalized limit lim_{d→∞} √d · β(d) = 1. Note that β = 1 is also the replica symmetry breaking threshold of the Sherrington–Kirkpatrick model.

As it turns out, the same proof for Theorem 1 yields sampling hardness for near-maximum and near-minimum bisections as well. Let A_n = {σ ∈ {−1,1}^n : |Σ_{i=1}^n σ(i)| ≤ 1} be the set of configurations in which the numbers of +1s and −1s differ by at most one, which we refer to as bisections. The Gibbs distribution μ_{β,G} restricted to A_n gives

μ^bis_{β,G}(σ) = exp(−β H_G(σ)) / Z^bis_G(β), (1.5)

where

Z^bis_G(β) = Σ_{σ∈A_n} exp(−β H_G(σ)) (1.6)

is the bisection partition function. Note that μ^bis_{β,G} prefers bisections that cut more edges when β > 0, and bisections that cut fewer edges when β < 0. This model is also known as the zero-magnetization Ising model and has been studied both in the statistical physics literature on disordered systems [28] and recently in computer science [10, 8, 26].

Theorem 2.

Let {ALG_n}_{n≥1} be a family of randomized sampling algorithms, where ALG_n takes as input an n-vertex (multi-)graph G and an inverse temperature parameter β and produces an output law μ^ALG_{β,G} over {−1,1}^n. For every β with |β| > 1 and d sufficiently large, if {ALG_n}_{n≥1} is stable at inverse temperature parameter β/√d, then

lim inf_{n→∞} E_{G∼𝔾^Po_{n,d}} [ W_{2,n}(μ^ALG_{β/√d,G}, μ^bis_{β/√d,G}) ] > 0. (1.7)

Theorems 1 and 2 exhibit a gap between search and sampling for the maximum cut and maximum/minimum bisection problems: while a search algorithm is known under the widely believed “No Overlap Gap” conjecture [2] (the result in [2] is stated for random regular graphs, but it can be transferred to other graph models using e.g. the argument in [13]), no stable algorithm can sample from the Gibbs distribution with arbitrary precision in the Wasserstein metric. Put another way, the algorithm of [2] finds solutions of value (1−ϵ)·OPT for any fixed ϵ > 0; Theorems 1 and 2 rule out stable algorithms that sample approximately uniformly from solutions of these values, since a standard reduction of counting to sampling (e.g. [9]) reduces the task of sampling from the Gibbs distribution to sampling uniformly from solutions of a given value.

1.2 Techniques

To show that sampling from the anti-ferromagnetic Ising Gibbs distribution on random graphs is hard for stable sampling algorithms, we use the framework that was used by [4] to show that sampling from the Sherrington-Kirkpatrick Gibbs distribution is hard. The key quantity in this approach is the overlap between two configurations σ1,σ2, defined by

R_{1,2}(σ_1, σ_2) := (1/n) Σ_{i=1}^n σ_1(i) σ_2(i).

The framework consists of the following two steps:

1. Show that w.h.p. over the disorder, the average of the squared overlap R_{1,2}(σ_1,σ_2)^2 with respect to two independent samples σ_1 and σ_2 from the Gibbs distribution is at least some positive constant independent of n. In particular, this is expected to hold when the Gibbs distribution exhibits replica symmetry breaking, which explains the threshold β = 1.

2. Show that the Gibbs distribution exhibits disorder chaos. That is, if we perturb the input slightly (the random coupling constants in the case of the SK model, or the random graph in the case of the Ising model; see Section 2 for a formal definition of this perturbation) and then take a sample σ_1 from the Gibbs distribution of the original system and a sample σ_2 from the Gibbs distribution of the perturbed system, then with high probability R_{1,2}(σ_1,σ_2) is very close to zero.

Combining these two steps and using a connection between R1,2 and W2,n distance (see Lemma 15), we have that for arbitrarily small perturbations, the W2,n distance between the original Gibbs distribution and the perturbed Gibbs distribution is at least some positive constant. However, for stable sampling algorithms, if we make the perturbation sufficiently small then the W2,n distance between the old and new output distributions can also be arbitrarily small. This implies that the output distribution of any stable sampling algorithm will have strictly positive W2,n distance from the target Gibbs distribution.
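To build intuition for the first step, the Gibbs average of the squared overlap can be computed exactly on tiny instances by enumeration, using the identity ⟨R_{1,2}²⟩ = (1/n²) Σ_{i,j} ⟨σ_i σ_j⟩² for two independent replicas. The sketch below is our own illustration, with an arbitrary small coupling matrix and the convention μ(σ) ∝ exp(−βH(σ)), H(σ) = Σ_{i<j} J_{ij} σ_i σ_j; it is not code from the paper:

```python
import itertools, math

def gibbs_avg_R2(J, beta):
    """Exact two-replica average E_{s1,s2 ~ mu x mu}[R_{1,2}(s1,s2)^2] for
    mu(s) proportional to exp(-beta * H(s)), H(s) = sum_{i<j} J[i][j]*s_i*s_j,
    by brute-force enumeration (feasible only for tiny n)."""
    n = len(J)
    states = list(itertools.product([-1, 1], repeat=n))
    weights = [math.exp(-beta * sum(J[i][j] * s[i] * s[j]
                                    for i in range(n) for j in range(i + 1, n)))
               for s in states]
    Z = sum(weights)
    # two-point correlations <s_i s_j> under the Gibbs measure
    corr = [[sum(w * s[i] * s[j] for w, s in zip(weights, states)) / Z
             for j in range(n)] for i in range(n)]
    # since the replicas are independent, <R^2> = (1/n^2) sum_{i,j} <s_i s_j>^2
    return sum(corr[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2

n = 6
J = [[1.0] * n for _ in range(n)]   # complete graph, antiferromagnetic for beta > 0
# at beta = 0 the measure is uniform, <s_i s_j> = 1 iff i = j, so <R^2> = 1/n
assert abs(gibbs_avg_R2(J, 0.0) - 1.0 / n) < 1e-12
```

The framework concerns the regime where this quantity stays bounded away from zero as n grows, which enumeration cannot reach; the sketch only illustrates the quantity being controlled.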

In order to carry out this framework for sparse random graph models, we prove new structural properties of the Gibbs measures μ_{β/√d,G} and μ^bis_{β/√d,G} and connect them to those of the Gibbs measures μ_{β,𝐠} and μ^bis_{β,𝐠} of the SK model (see Section 2 for the formal definitions of μ_{β,𝐠} and μ^bis_{β,𝐠}). This further extends the close connection between the behavior of the SK model and the sparse random graph model. Previously, it was shown by Dembo, Montanari, and Sen [21] that the free energy of the sparse model converges to that of the SK model as the average degree d increases to infinity. We first extend this to the average Hamiltonian of a sample drawn from the Gibbs distribution, showing that this quantity also converges in the same manner, with or without the bisection restriction. This is done by using convexity of the free energy together with the well-known observation that the average Hamiltonian can be obtained by taking the derivative of the free energy (see (3.1) and (3.2)). The main difficulty here is establishing that the limiting free energy is the same regardless of the bisection restriction; the details can be found in the full version of this paper.

Proposition 3.

(a) For every β ≥ 0, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[ E_{σ∼μ_{β,𝐠}}[H_SK(σ)] ] − (1/√d) E_G[ E_{σ∼μ_{β/√d,G}}[H_G(σ)] ] | = 0. (1.8)

(b) For every β ∈ ℝ, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[ E_{σ∼μ^bis_{β,𝐠}}[H_SK(σ)] ] − (1/√d) E_G[ E_{σ∼μ^bis_{β/√d,G}}[H_G(σ)] ] | = 0. (1.9)

Our main technical contribution is the following proposition, which says that as the average degree d goes to infinity, the expected average squared overlap between two independent samples from the Gibbs measure μ_{β/√d,G} converges to that of two independent samples from the Gibbs measure μ_{β,𝐠} of the SK model, again with or without the bisection restriction. This gives the first ingredient in the sampling hardness framework. For the SK Hamiltonian, there is a known connection between the average Hamiltonian and the average squared overlap, which can be established using the Gaussian integration by parts trick (see e.g. [32] for more details). In this paper, we show that one can use the Stein–Chen identity for the Poisson distribution (see Lemma 24) in place of the Gaussian integration by parts trick to establish a similar connection for the sparse model in the large degree limit.

Proposition 4.

(a) For any β ≥ 0,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[ E_{σ_1,σ_2∼μ_{β,𝐠}}[R_{1,2}(σ_1,σ_2)^2] ] − E_G[ E_{σ_1,σ_2∼μ_{β/√d,G}}[R_{1,2}(σ_1,σ_2)^2] ] | = 0. (1.10)

(b) For any β ∈ ℝ,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[ E_{σ_1,σ_2∼μ^bis_{β,𝐠}}[R_{1,2}(σ_1,σ_2)^2] ] − E_G[ E_{σ_1,σ_2∼μ^bis_{β/√d,G}}[R_{1,2}(σ_1,σ_2)^2] ] | = 0. (1.11)

Finally, to obtain disorder chaos for the sparse model, which is the second component in the sampling hardness framework, we use the fact that two configurations sampled from the coupled system for the SK model have nontrivial overlap with only exponentially small probability (see Theorem 29). This creates a gap in free energy between the coupled and uncoupled systems, which can then be transferred to the sparse model using the interpolation results of [17, 16]. We then translate this gap back into a disorder chaos statement.

1.3 Organization

The paper is organized as follows. In Section 2 we set up the notation and give a detailed overview of the proof. In Section 3 and Section 4, we establish the correspondence of average energy and average overlap between the two models. Finally in Section 5, we transfer the disorder chaos property from the SK model to the sparse random graph models.

2 Overview of the proof

Gibbs distributions

For σ ∈ {−1,1}^n and 𝐗 ∈ ℝ^{n×n}, we define the Hamiltonian

H(σ;𝐗) := −Σ_{i,j=1}^n 𝐗_{ij} σ(i)σ(j).

For any β, it induces the following Gibbs distribution on {1,+1}n:

μ_{β,𝐗}(σ) = exp(β H(σ;𝐗)) / Z(β,𝐗),

where Z(β,𝐗) = Σ_{σ∈{−1,1}^n} exp(β H(σ;𝐗)) is the normalizing factor, commonly referred to as the partition function; β is commonly referred to as the inverse temperature parameter in statistical physics. We also consider the restriction of the above distribution to bisections σ ∈ A_n, with

μ^bis_{β,𝐗}(σ) = exp(β H(σ;𝐗)) / Z^bis(β,𝐗),  Z^bis(β,𝐗) = Σ_{σ∈A_n} exp(β H(σ;𝐗)).

We refer to such restricted versions as bisection models. In the statistical physics literature, this is sometimes also called the model with zero magnetization, and the version without the bisection restriction is called the model with non-fixed magnetization. Here, the term magnetization refers to the quantity m(σ) = (1/n) Σ_{i=1}^n σ(i). We sometimes also write μ(σ; β, 𝐗) and μ^bis(σ; β, 𝐗) instead of μ_{β,𝐗}(σ) and μ^bis_{β,𝐗}(σ) if we wish to stress the dependence on β and 𝐗.

𝐗 is a random matrix drawn from some distribution, and in this paper we consider two important cases. In the first case, 𝐗 = 𝐠, where 𝐠_{ij} ∼ N(0, 1/(2n)) independently. This is the Sherrington–Kirkpatrick model. In the second case, 𝐗 = 𝐀, where each entry 𝐀_{ij} is an independent Po(d/(2n)) random variable for some parameter d > 0. This case corresponds to sparse random (multi-)graphs with average degree d, where the vertex set is [n] = {1,2,…,n} and the multiplicity of the edge {i,j} is 𝐀_{ij} + 𝐀_{ji} if i ≠ j and 𝐀_{ii} if i = j (note that here 𝐀 is not the adjacency matrix of the graph and can be asymmetric). Any configuration σ can be thought of as the indicator vector of a cut, where the vertices assigned +1 by σ are on one side and those assigned −1 are on the other side, and H(σ;𝐀) is equal to the difference between the number of edges crossing the cut and the number of edges not crossing the cut. When β > 0, the Gibbs measure prefers configurations that cut more edges; this corresponds to the Maximum Cut problem for the non-fixed magnetization model and the Maximum Bisection problem for the zero-magnetization model, and is known as the anti-ferromagnetic Ising model in the statistical physics literature. When β < 0, the non-fixed magnetization model corresponds to the Minimum Cut problem while the zero-magnetization model corresponds to the Minimum Bisection problem; this is known as the ferromagnetic Ising model in the statistical physics literature.
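The correspondence between H(σ;𝐀) and cut sizes can be checked by hand on a toy matrix. The following sketch is our own illustration, using the convention H(σ;𝐀) = −Σ_{i,j} 𝐀_{ij} σ(i)σ(j) stated above (the matrix entries are arbitrary small integers, not a sample of the model):

```python
def H(sigma, A):
    # H(sigma; A) = -sum_{i,j} A_ij * sigma(i) * sigma(j)
    n = len(A)
    return -sum(A[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))

# a small asymmetric nonnegative integer matrix playing the role of A
A = [[0, 2, 0],
     [1, 0, 0],
     [0, 3, 1]]
# edge multiplicities (1-indexed vertices): {1,2}: A_12 + A_21 = 3;
# {2,3}: A_23 + A_32 = 3; a self-loop at vertex 3: A_33 = 1
sigma = [1, -1, 1]
# the cut (+1 side {1,3}, -1 side {2}) crosses 6 edges; only the loop is uncut
cut, uncut = 6, 1
assert H(sigma, A) == cut - uncut == 5
```

Each crossing edge has σ(i)σ(j) = −1 and so contributes +its multiplicity to H, while each non-crossing edge (including loops) contributes −its multiplicity, which gives the stated difference.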

For notational convenience, we sometimes drop the random matrix and write H_SK(σ) := H(σ;𝐠) and H_d(σ) := H(σ;𝐀). We sometimes refer to these two models as the dense and sparse models, respectively.

Gibbs average

For any function f : {−1,1}^n → ℝ, we can define its average with respect to μ or μ^bis:

⟨f(σ)⟩_{β,𝐗} = Σ_{σ∈{−1,1}^n} f(σ) μ_{β,𝐗}(σ),  ⟨f(σ)⟩^bis_{β,𝐗} = Σ_{σ∈A_n} f(σ) μ^bis_{β,𝐗}(σ).

We can extend this definition to functions f : ({−1,1}^n)^k → ℝ which take multiple configurations as inputs, in which case we assume that the configurations are sampled independently from the Gibbs distribution:

⟨f(σ_1,…,σ_k)⟩_{β,𝐗} = Σ_{σ_1,…,σ_k ∈ {−1,1}^n} μ_{β,𝐗}(σ_1) ⋯ μ_{β,𝐗}(σ_k) f(σ_1,…,σ_k),

and the average over μ𝖻𝗂𝗌 is defined similarly. We sometimes drop the subscripts if they are clear from context. One function of particular interest to us is the overlap between two configurations, defined as

R_{1,2}(σ_1,σ_2) = (1/n) Σ_{i=1}^n σ_1(i) σ_2(i).

R_{1,2} gives the normalized inner product between two configurations. If ⟨|R_{1,2}(σ_1,σ_2)|⟩_{β,𝐗} is close to zero, then two configurations independently sampled from the Gibbs distribution μ_{β,𝐗} are typically nearly orthogonal.

Free energy

The free energy for these models is defined as follows.

Φ_{n,SK}(β) = (1/n) E_𝐠[log Z(β,𝐠)],  Φ_{n,d}(β) = (1/n) E_𝐀[log Z(β,𝐀)].

We can also define the free energy in a similar way when restricted to bisections:

Φ^bis_{n,SK}(β) = (1/n) E_𝐠[log Z^bis(β,𝐠)],  Φ^bis_{n,d}(β) = (1/n) E_𝐀[log Z^bis(β,𝐀)].

The following proposition is a simple consequence of the Cauchy–Schwarz inequality.

Proposition 5.

Both logZ(β,𝐗) and logZ𝖻𝗂𝗌(β,𝐗) are convex in β.

Proof.

For any β_1, β_2 ∈ ℝ, applying the Cauchy–Schwarz inequality to exp(β_1 H(σ;𝐗)/2) and exp(β_2 H(σ;𝐗)/2), we have

Z((β_1+β_2)/2, 𝐗) = Σ_{σ∈{−1,1}^n} exp( ((β_1+β_2)/2) H(σ;𝐗) )
  ≤ ( Σ_{σ∈{−1,1}^n} exp(β_1 H(σ;𝐗)) )^{1/2} ( Σ_{σ∈{−1,1}^n} exp(β_2 H(σ;𝐗)) )^{1/2}
  = Z(β_1,𝐗)^{1/2} Z(β_2,𝐗)^{1/2}.

Taking the logarithm of both sides, we get that log Z(β,𝐗) is midpoint-convex, hence (by continuity) convex in β. The convexity of log Z^bis(β,𝐗) can be shown in the same way.
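The midpoint inequality in the proof can be confirmed numerically on a tiny instance by brute-force enumeration (our own sketch, with an arbitrary random coupling matrix, not code from the paper):

```python
import itertools, math, random

def logZ(X, beta):
    # log Z(beta, X), with Z = sum_sigma exp(beta * H(sigma; X)) and
    # H(sigma; X) = -sum_{i,j} X_ij * sigma(i) * sigma(j)
    n = len(X)
    return math.log(sum(
        math.exp(-beta * sum(X[i][j] * s[i] * s[j]
                             for i in range(n) for j in range(n)))
        for s in itertools.product([-1, 1], repeat=n)))

rng = random.Random(0)
n = 5
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
b1, b2 = 0.3, 1.7
# midpoint convexity established by Cauchy-Schwarz in Proposition 5
assert logZ(X, (b1 + b2) / 2) <= (logZ(X, b1) + logZ(X, b2)) / 2 + 1e-12
```

The inequality holds for every fixed 𝐗, so convexity survives taking the expectation over the disorder, which is exactly Corollary 6.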

By taking expectation over the randomness of 𝐗 we immediately obtain the following corollary.

Corollary 6.

Φn,SK(β),Φn,SK𝖻𝗂𝗌(β),Φn,d(β),Φn,d𝖻𝗂𝗌(β) are all convex in β.

Correspondence between dense and sparse models

It is known that as the average degree d gets larger, the sparse model will “converge” to the dense model in some sense. Dembo, Montanari, and Sen proved this for the free energy of these two models using a clever interpolation argument:

Lemma 7 ([21]).

There exist constants c_1, c_2, c_3 > 0 independent of n, β, d such that for any β ∈ ℝ,

|Φ^bis_{n,d}(β/√d) − Φ^bis_{n,SK}(β)| ≤ c_1 |β|^3/√d + c_2 β^4/d + c_3 |β| √d / n^2. (2.1)

In this paper, we further extend this correspondence to the average energy ⟨H(σ)⟩ as well as the average squared overlap ⟨R_{1,2}(σ_1,σ_2)^2⟩.

Proposition 8 (Restatement of Proposition 3).

(a) For every β ≥ 0, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[⟨H_SK(σ)⟩_{β,𝐠}] − (1/√d) E_𝐀[⟨H_d(σ)⟩_{β/√d,𝐀}] | = 0. (2.2)

(b) For every β ∈ ℝ, we have

lim_{d→∞} lim sup_{n→∞} (1/n) | E_𝐠[⟨H_SK(σ)⟩^bis_{β,𝐠}] − (1/√d) E_𝐀[⟨H_d(σ)⟩^bis_{β/√d,𝐀}] | = 0. (2.3)
Proposition 9 (Restatement of Proposition 4).

(a) For any β ≥ 0,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[⟨R_{1,2}^2⟩_{β,𝐠}] − E_𝐀[⟨R_{1,2}^2⟩_{β/√d,𝐀}] | = 0. (2.4)

(b) For any β ∈ ℝ,

lim_{d→∞} lim sup_{n→∞} | E_𝐠[⟨R_{1,2}^2⟩^bis_{β,𝐠}] − E_𝐀[⟨R_{1,2}^2⟩^bis_{β/√d,𝐀}] | = 0. (2.5)

We prove Proposition 8 in Section 3 and Proposition 9 in Section 4.

Disorder chaos and stable algorithms

Disorder chaos is a well-studied phenomenon in spin glass theory. The term “disorder” refers to the random matrix 𝐗, and “chaos” describes what happens to the Gibbs distribution if we slightly perturb 𝐗. For the SK model, consider the following notion of perturbation. Let 𝐠, 𝐠′ be two independent copies of Gaussian matrices with 𝐠_{ij}, 𝐠′_{ij} ∼ N(0, 1/(2n)) independently for each i, j. Given any perturbation parameter t ≥ 0, we consider the two measures μ_{β,𝐠}(σ) and μ_{β,𝐠_t}(σ), where 𝐠_t = √(1−t) 𝐠 + √t 𝐠′. Define ⟨f(σ_1,σ_2)⟩_{β,𝐠,𝐠_t} to be the average of f(σ_1,σ_2) where σ_1 ∼ μ_{β,𝐠} and σ_2 ∼ μ_{β,𝐠_t} are sampled independently from these two distributions, i.e.,

⟨f(σ_1,σ_2)⟩_{β,𝐠,𝐠_t} = Σ_{σ_1,σ_2 ∈ {−1,1}^n} f(σ_1,σ_2) μ_{β,𝐠}(σ_1) μ_{β,𝐠_t}(σ_2).

When t = 0 there is no perturbation, so ⟨f(σ_1,σ_2)⟩_{β,𝐠,𝐠_t} is simply ⟨f(σ_1,σ_2)⟩_{β,𝐠}, and it is known that two such samples σ_1, σ_2 will likely have nontrivial overlap.
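Numerically, the interpolation 𝐠_t = √(1−t) 𝐠 + √t 𝐠′ preserves the marginal law while decorrelating the coupled pair: Var((𝐠_t)_{ij}) = Var(𝐠_{ij}) and Cov(𝐠_{ij}, (𝐠_t)_{ij}) = √(1−t) · Var(𝐠_{ij}). A seeded Monte Carlo sketch of this (ours; unit-variance entries for simplicity, whereas the model uses N(0, 1/(2n))):

```python
import math, random

def perturb(g, gp, t):
    # entrywise interpolation g_t = sqrt(1-t)*g + sqrt(t)*g'
    a, b = math.sqrt(1 - t), math.sqrt(t)
    return [a * x + b * y for x, y in zip(g, gp)]

rng = random.Random(0)
N, t = 200_000, 0.3
g  = [rng.gauss(0, 1) for _ in range(N)]
gp = [rng.gauss(0, 1) for _ in range(N)]
gt = perturb(g, gp, t)

var_gt = sum(x * x for x in gt) / N
cov = sum(x * y for x, y in zip(g, gt)) / N
# the marginal law is preserved (variance 1), while the covariance with the
# original disorder is sqrt(1-t) < 1: t > 0 genuinely decouples the two systems
assert abs(var_gt - 1.0) < 0.02
assert abs(cov - math.sqrt(1 - t)) < 0.02
```

So for small t the perturbed disorder is statistically almost indistinguishable from the original; disorder chaos says the Gibbs measure nevertheless changes drastically.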

Theorem 10 (See e.g. [4]).

If |β|>1, then there exists ϵ=ϵ(β)>0 such that

lim inf_{n→∞} E_𝐠[⟨R_{1,2}(σ_1,σ_2)^2⟩_{β,𝐠}] ≥ ϵ(β). (2.6)

However, when t > 0, the overlap tends to zero in the n → ∞ limit.

Theorem 11 ([12]).

For all β ∈ ℝ and t ∈ (0,1], we have

lim_{n→∞} E_{𝐠,𝐠′}[⟨R_{1,2}(σ_1,σ_2)^2⟩_{β,𝐠,𝐠_t}] = 0. (2.7)

If we think of the expressions on the left-hand sides of (2.6) and (2.7) as a function of t, then (2.6) and (2.7) together show that this function is not right-continuous at t = 0 when |β| > 1. It was shown in [4] that this property very naturally obstructs a class of sampling algorithms for μ_{β,𝐠}.

Definition 12 (Definition 2.2 in [4]).

Let {ALG_n}_{n≥1} be a family of sampling algorithms for the Gibbs measure μ_{β,𝐠}, where ALG_n takes an n×n matrix 𝐠 and β as input and outputs an assignment in {−1,1}^n. Let μ^ALG_{β,𝐠} be the output law of ALG_n. We say that {ALG_n}_{n≥1} is stable (with respect to disorder at β) if

lim_{t→0} lim sup_{n→∞} E[W_{2,n}(μ^ALG_{β,𝐠}, μ^ALG_{β,𝐠_t})] = 0. (2.8)

Intuitively, (2.8) means that stable algorithms are not able to make the leap at the discontinuity t=0 implied by (2.6) and (2.7), and therefore must be producing a distribution that is bounded away from the Gibbs distribution μβ,𝐠 in terms of the W2,n distance. This intuition is formalized in [4] as the following theorem.

Theorem 13 (Theorem 2.6 in [4]).

Fix β with |β| > 1, and let {ALG_n}_{n≥1} be a family of sampling algorithms for the Gibbs measure μ_{β,𝐠} that is stable with respect to disorder at β. Then

lim inf_{n→∞} E_𝐠[W_{2,n}(μ_{β,𝐠}, μ^ALG_{β,𝐠})] > 0. (2.9)

In this paper, we obstruct stable sampling algorithms for the sparse models as well. For the sparse models, we consider a slightly different notion of perturbation. Fix the average degree d > 0 and the perturbation parameter t ∈ [0,1]. We take three independent random matrices 𝐀^{(1−t)}, 𝐀^{(t,1)}, 𝐀^{(t,2)} such that 𝐀^{(1−t)}_{ij} ∼ Po((1−t)d/(2n)) and 𝐀^{(t,1)}_{ij}, 𝐀^{(t,2)}_{ij} ∼ Po(td/(2n)), independently for all i, j ∈ [n]. We then define 𝐀 = 𝐀^{(1−t)} + 𝐀^{(t,1)} and 𝐀_t = 𝐀^{(1−t)} + 𝐀^{(t,2)}. Similarly to Definition 12, we say that a family of sampling algorithms {ALG_n}_{n≥1} with inputs β and 𝐀 is stable at inverse temperature β if

lim_{t→0} lim sup_{n→∞} E[W_{2,n}(μ^ALG_{β,𝐀}, μ^ALG_{β,𝐀_t})] = 0. (2.10)

As an example, it was recently shown that algorithms implemented by Boolean circuits of bounded depth are stable [1].
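The coupled pair above is well-defined because 𝐀 and 𝐀_t have the same entrywise marginal law: by Poisson superposition, Po((1−t)λ) + Po(tλ) has the law of Po(λ). A quick numeric confirmation of this standard fact via pmf convolution (our own sketch):

```python
import math

def pois_pmf(lam, k):
    # P[Po(lam) = k]
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, t, K = 2.0, 0.3, 30
# pmf of the sum Po((1-t)*lam) + Po(t*lam), by discrete convolution
conv = [sum(pois_pmf((1 - t) * lam, j) * pois_pmf(t * lam, k - j)
            for j in range(k + 1)) for k in range(K)]
target = [pois_pmf(lam, k) for k in range(K)]
# superposition: the sum is again Poisson, with mean (1-t)*lam + t*lam = lam
assert all(abs(a - b) < 1e-12 for a, b in zip(conv, target))
```

In particular 𝐀 and 𝐀_t each have the law of the original sparse disorder, while sharing only the common component 𝐀^{(1−t)}.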

When t=0, we have by Proposition 9 and Theorem 10 that for β>1,

lim_{d→∞} lim inf_{n→∞} E_𝐀[⟨R_{1,2}^2⟩_{β/√d,𝐀}] > 0, (2.11)

and for β with |β|>1

lim_{d→∞} lim inf_{n→∞} E_𝐀[⟨R_{1,2}^2⟩^bis_{β/√d,𝐀}] > 0. (2.12)

On the other hand, if t>0, we show in the following lemma that the overlap tends to zero as n goes to infinity.

Lemma 14.

Fix t>0.

(a) For any β ≥ 0,

lim_{d→∞} lim sup_{n→∞} E[⟨R_{1,2}^2⟩_{β/√d,𝐀,𝐀_t}] = 0. (2.13)

(b) For any β ∈ ℝ,

lim_{d→∞} lim sup_{n→∞} E[⟨R_{1,2}^2⟩^bis_{β/√d,𝐀,𝐀_t}] = 0. (2.14)

The following lemma establishes the connection between the overlap and the W2,n distance.

Lemma 15 (Lemma 5.3 in [4]).

Fix n, and let μ_1, μ_2, ν_1, ν_2 be distributions over {−1,+1}^n. We have

| E_{(σ,σ′)∼μ_1⊗ν_1}[|R_{1,2}(σ,σ′)|] − E_{(σ,σ′)∼μ_2⊗ν_2}[|R_{1,2}(σ,σ′)|] | ≤ W_{2,n}(μ_1,μ_2) + W_{2,n}(ν_1,ν_2). (2.15)
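For point masses both sides of Lemma 15 are explicit (the left side via single configurations, the right side via normalized Euclidean distances), so the statement can be verified directly. A randomized sanity check (our own sketch, not from the paper):

```python
import math, random

def R(s1, s2):
    # normalized overlap R_{1,2}
    return sum(a * b for a, b in zip(s1, s2)) / len(s1)

def w2n_points(s1, s2):
    # W_{2,n} between the point masses delta_{s1} and delta_{s2}
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)) / len(s1))

rng = random.Random(0)
n = 20
for _ in range(500):
    a, b, c, d = ([rng.choice([-1, 1]) for _ in range(n)] for _ in range(4))
    # Lemma 15 with mu_1, mu_2, nu_1, nu_2 point masses at a, b, c, d
    lhs = abs(abs(R(a, c)) - abs(R(b, d)))
    rhs = w2n_points(a, b) + w2n_points(c, d)
    assert lhs <= rhs + 1e-12
```

The underlying reason is Cauchy–Schwarz: |R(a,c) − R(b,c)| ≤ (1/√n)‖a−b‖₂ · (1/√n)‖c‖₂ = w2n_points(a,b), and then the triangle inequality.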

We now have the ingredients needed for proving Theorem 1. Theorem 2 can be proved in the same way and we omit the details here.

Proof of Theorem 1.

We drop the inverse temperature parameter in the notation for brevity. For any t > 0, choosing μ_1 = μ_2 = ν_1 = μ_{β/√d,𝐀} and ν_2 = μ_{β/√d,𝐀_t} in Lemma 15, we get that

W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t}) ≥ | ⟨|R_{1,2}(σ,σ′)|⟩_𝐀 − ⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t} |. (2.16)

Taking expectation on both sides, we obtain

E_{𝐀,𝐀_t}[W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})] ≥ E_{𝐀,𝐀_t}[ | ⟨|R_{1,2}(σ,σ′)|⟩_𝐀 − ⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t} | ]
  ≥ E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_𝐀] − E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t}].

In the second inequality we used the fact that |a − b| ≥ |a| − |b| for any a, b ∈ ℝ. Taking lim inf on both sides, we obtain

lim inf_{n→∞} E_{𝐀,𝐀_t}[W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})]
  ≥ lim inf_{n→∞} E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_𝐀] − lim sup_{n→∞} E_{𝐀,𝐀_t}[⟨|R_{1,2}(σ,σ′)|⟩_{𝐀,𝐀_t}].

By (2.11) and (2.13), if d is sufficiently large, for some ϵ=ϵ(β)>0 independent of t,

lim inf_{n→∞} E_{𝐀,𝐀_t}[W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})] ≥ ϵ. (2.17)

By the triangle inequality, we have

W_{2,n}(μ_{β/√d,𝐀}, μ_{β/√d,𝐀_t})
  ≤ W_{2,n}(μ_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀}) + W_{2,n}(μ^ALG_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀_t}) + W_{2,n}(μ^ALG_{β/√d,𝐀_t}, μ_{β/√d,𝐀_t}).

Since 𝐀 and 𝐀t have the same distribution, we have

E[W_{2,n}(μ_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀})] = E[W_{2,n}(μ_{β/√d,𝐀_t}, μ^ALG_{β/√d,𝐀_t})]. (2.18)

Since {ALG_n}_{n≥1} is stable at inverse temperature β/√d, by (2.10), taking t sufficiently small we have

lim sup_{n→∞} E[W_{2,n}(μ^ALG_{β/√d,𝐀}, μ^ALG_{β/√d,𝐀_t})] ≤ ϵ/2. (2.19)

It follows that

lim inf_{n→∞} E[W_{2,n}(μ^ALG_{β/√d,𝐀}, μ_{β/√d,𝐀})] ≥ ϵ/4.

Free energy of coupled bisection models

In subsequent proofs, we will need the following results on the free energy of coupled bisection models. More specifically, we show that the limiting free energy does not change under the bisection constraint for both the dense and sparse models (for the sparse model we require d → ∞). Intuitively, in the case of the SK model this holds because the symmetry of its disorder implies that there is nothing special about the all-ones direction. The sparse case requires more care since it no longer has this symmetry, but we overcome this difficulty using Poisson concentration arguments. Due to space constraints, proofs are omitted here, and we refer interested readers to the full version of this paper.

Let us define (β is omitted from the notation for simplicity)

Z^S_{𝐠,𝐠_t} = Σ_{σ_1,σ_2 ∈ {−1,1}^n : R_{1,2}(σ_1,σ_2) ∈ S} exp( β (H(σ_1;𝐠) + H(σ_2;𝐠_t)) )

and

Z^{S,bis}_{𝐠,𝐠_t} = Σ_{σ_1,σ_2 ∈ A_n : R_{1,2}(σ_1,σ_2) ∈ S} exp( β (H(σ_1;𝐠) + H(σ_2;𝐠_t)) ).

Similarly for the sparse model, let us define

Z^S_{𝐀,𝐀_t} = Σ_{σ_1,σ_2 ∈ {−1,1}^n : R_{1,2}(σ_1,σ_2) ∈ S} exp( (β/√d) (H(σ_1;𝐀) + H(σ_2;𝐀_t)) ), (2.20)
Z^{S,bis}_{𝐀,𝐀_t} = Σ_{σ_1,σ_2 ∈ A_n : R_{1,2}(σ_1,σ_2) ∈ S} exp( (β/√d) (H(σ_1;𝐀) + H(σ_2;𝐀_t)) ). (2.21)
Theorem 16.

Let S ⊆ [−1,1]. For any β ∈ ℝ, we have

lim_{n→∞} (1/n) E[ log Z^{S,bis}_{𝐠,𝐠_t} − log Z^S_{𝐠,𝐠_t} ] = 0.

By letting t = 0 and S = [−1,1], we immediately obtain the following.

Corollary 17.

We have

lim_{n→∞} ( Φ^bis_{n,SK}(β) − Φ_{n,SK}(β) ) = 0.
Lemma 18.

Fix β ∈ ℝ, d > 0 and t ∈ [0,1]. For every δ ∈ (0,1/2), there exists a constant C = C(β,d) > 0 such that for all sufficiently large n,

Pr[ | log Z^S_{𝐀,𝐀_t} − E[log Z^S_{𝐀,𝐀_t}] | ≥ n^{1/2+δ} ] ≤ exp(−C n^{2δ}) (2.22)

and

Pr[ | log Z^{S,bis}_{𝐀,𝐀_t} − E[log Z^{S,bis}_{𝐀,𝐀_t}] | ≥ n^{1/2+δ} ] ≤ exp(−C n^{2δ}). (2.23)
Theorem 19.

Let $0\le a<b$. For every $\epsilon>0$, if $d$ is sufficiently large, then for every $\beta\in[a,b]$,

\[\limsup_{n\to\infty}\frac{1}{n}\big|\mathbf{E}\log Z^{S}_{\mathbf{A},\mathbf{A}_t}-\mathbf{E}\log Z^{S,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big|\le\epsilon.\tag{2.24}\]

Again, by letting $t=0$ and $S=[-1,1]$, we immediately obtain the following.

Theorem 20.

Let $0\le a<b$. For every $\epsilon>0$, if $d$ is sufficiently large, then for every $\beta\in[a,b]$,

\[\limsup_{n\to\infty}\big|\Phi_{n,d}(\beta/\sqrt{d})-\Phi^{\mathsf{bis}}_{n,d}(\beta/\sqrt{d})\big|\le\epsilon.\tag{2.25}\]

3 Interpolating the average energy between SK and the diluted model

In this section we prove Proposition 8 and establish the interpolation of average energy between sparse and dense models. By Lemma 7, we already know that the free energy of the sparse model converges to that of the dense model as the average degree d goes to infinity. The following connection between free energy and average energy is well-known:

\[\frac{\partial}{\partial\beta}\mathbf{E}[\log Z(\beta,\mathbf{X})]=\mathbf{E}\Bigg[\sum_{\sigma\in\{-1,1\}^n}\frac{H(\sigma;\mathbf{X})\exp(\beta H(\sigma;\mathbf{X}))}{Z(\beta,\mathbf{X})}\Bigg]=\mathbf{E}\big[\langle H(\sigma;\mathbf{X})\rangle_{\beta,\mathbf{X}}\big].\tag{3.1}\]

Similarly,

\[\frac{\partial}{\partial\beta}\mathbf{E}[\log Z^{\mathsf{bis}}(\beta,\mathbf{X})]=\mathbf{E}\Bigg[\sum_{\sigma\in A_n}\frac{H(\sigma;\mathbf{X})\exp(\beta H(\sigma;\mathbf{X}))}{Z^{\mathsf{bis}}(\beta,\mathbf{X})}\Bigg]=\mathbf{E}\big[\langle H(\sigma;\mathbf{X})\rangle^{\mathsf{bis}}_{\beta,\mathbf{X}}\big].\tag{3.2}\]

To obtain convergence of the partial derivatives, we use the following elementary fact, sometimes known as Griffiths' lemma in statistical physics.

Proposition 21 (See e.g. [41]).

Let $(f_\alpha)_{\alpha\ge0}$ be a family of convex, differentiable functions that converges pointwise on an open interval $I$ to a function $f$. Then $\lim_{\alpha\to\infty}f'_\alpha(x)=f'(x)$ at every $x\in I$ where $f'(x)$ exists.
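For intuition, here is a numerical illustration of the lemma on a hypothetical family (not from the paper): $f_\alpha(x)=\frac{1}{\alpha}\log(2\cosh(\alpha x))$ is convex and differentiable and converges pointwise to $f(x)=|x|$ as $\alpha\to\infty$, so $f'_\alpha(x)=\tanh(\alpha x)$ must converge to $\mathrm{sign}(x)$ at every $x\neq 0$, where $f$ is differentiable.

```python
import math

# Illustration of Griffiths' lemma on a hypothetical family:
# f_a(x) = log(2 cosh(a x)) / a is convex and differentiable and converges
# pointwise (as a -> infinity) to f(x) = |x|.
def f(a, x):
    y = abs(a * x)
    # numerically stable form of log(2 cosh(a x)) / a = (|ax| + log(1+e^{-2|ax|})) / a
    return (y + math.log1p(math.exp(-2 * y))) / a

def fprime(a, x):
    return math.tanh(a * x)  # derivative of f(a, .) in x

x = 0.3  # a point where f(x) = |x| is differentiable, with f'(x) = 1
gaps = [abs(fprime(a, x) - 1.0) for a in (1, 10, 100)]
print(gaps)  # shrinks toward 0 as a grows
```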

Proof of Proposition 8.

Proposition 8 says that

(a) For every $\beta\ge0$,
\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle_{\beta,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|=0.\]

(b) For every $\beta\in\mathbb{R}$,
\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle^{\mathsf{bis}}_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|=0.\]

We first prove part (b). Assume for the sake of contradiction that for some $\beta_0$,

\[\limsup_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle^{\mathsf{bis}}_{\beta_0/\sqrt{d},\mathbf{A}}\big]\Big|=\epsilon>0.\tag{3.3}\]

It follows that we can choose a sequence of pairs (di,ni)i such that

\[\lim_{i\to\infty}\frac{1}{n_i}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0,\mathbf{g}}\big]-\frac{1}{\sqrt{d_i}}\mathbf{E}_{\mathbf{A}}\big[\langle H_{d_i}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0/\sqrt{d_i},\mathbf{A}}\big]\Big|=\epsilon.\tag{3.4}\]

Note that for every $i$, $n_i$ can be chosen sufficiently large so that the $o_n(1)$ term in Lemma 7 is at most $1/d_i$; under this assumption, the family $(\Phi^{\mathsf{bis}}_{n_i,d_i}(\beta/\sqrt{d_i}))_i$, viewed as functions of $\beta$, converges pointwise (in fact uniformly on any bounded interval) to $\lim_{i\to\infty}\Phi^{\mathsf{bis}}_{n_i,SK}(\beta)$. Here we remark that the limit $\lim_{n\to\infty}\Phi_{n,SK}(\beta)$ is known to exist and to be differentiable in $\beta$ [39, 40], and by Corollary 17 we have

\[\lim_{n\to\infty}\Phi_{n,SK}(\beta)=\lim_{n\to\infty}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\lim_{i\to\infty}\Phi^{\mathsf{bis}}_{n_i,SK}(\beta).\tag{3.5}\]

By Proposition 21, we have

\[\lim_{i\to\infty}\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n_i,d_i}(\beta/\sqrt{d_i})=\frac{\partial}{\partial\beta}\lim_{n\to\infty}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\lim_{n\to\infty}\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\lim_{i\to\infty}\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n_i,SK}(\beta)\tag{3.6}\]

for every β. By (3.2), we have

\[\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n,d}(\beta/\sqrt{d})=\frac{1}{\sqrt{d}\,n}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle^{\mathsf{bis}}_{\beta/\sqrt{d},\mathbf{A}}\big],\tag{3.7}\]

and

\[\frac{\partial}{\partial\beta}\Phi^{\mathsf{bis}}_{n,SK}(\beta)=\frac{1}{n}\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta,\mathbf{g}}\big].\tag{3.8}\]

(3.6), (3.7), and (3.8) imply that

\[\lim_{i\to\infty}\frac{1}{n_i}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0,\mathbf{g}}\big]-\frac{1}{\sqrt{d_i}}\mathbf{E}_{\mathbf{A}}\big[\langle H_{d_i}(\sigma)\rangle^{\mathsf{bis}}_{\beta_0/\sqrt{d_i},\mathbf{A}}\big]\Big|=0.\tag{3.9}\]

However, this contradicts (3.4), so we obtain part (b).

For part (a), note that by Theorem 20, for any bounded open interval $I\subseteq[0,+\infty)$, we can choose the sequence $(d_i,n_i)_i$ to satisfy the additional requirement that

\[\lim_{i\to\infty}\Phi^{\mathsf{bis}}_{n_i,d_i}(\beta/\sqrt{d_i})=\lim_{i\to\infty}\Phi_{n_i,d_i}(\beta/\sqrt{d_i})\tag{3.10}\]

for all $\beta\in I$. In particular, we can choose $I$ to include any desired $\beta_0>0$ (when $\beta=0$ the Gibbs measure is uniform and part (a) trivially holds). We can then invoke Proposition 21 again and proceed with the same proof.

4 Interpolating the average overlap between SK and the diluted model

In this section we prove Proposition 9. Since the proofs are mostly the same, we will only prove item (a) in Proposition 9. We first state the following well-known fact which relates the average energy to the average overlap of the Gibbs distribution in the SK model.

Lemma 22 (See e.g. [32]).

For every $\beta\in\mathbb{R}$, we have

\[\frac{1}{n}\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle_{\beta,\mathbf{g}}\big]=\frac{\beta}{2}\mathbf{E}_{\mathbf{g}}\big[\langle 1-R_{1,2}^2\rangle_{\beta,\mathbf{g}}\big].\tag{4.1}\]

The proof of Lemma 22 uses the following proposition.

Proposition 23.

Let $\mathbf{X}\in\mathbb{R}^{n\times n}$ and $\beta\in\mathbb{R}$. We have

\[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta,\mathbf{X})=-\beta n^2\big(1-\langle R_{1,2}^2\rangle_{\beta,\mathbf{X}}\big).\tag{4.2}\]
Proof.

For every $\sigma\in\{-1,1\}^n$ and $i,j\in[n]$, we have
\begin{align}
\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta,\mathbf{X})&=\frac{\partial}{\partial\mathbf{X}_{ij}}\Big(\frac{\exp(\beta H(\sigma;\mathbf{X}))}{Z(\beta,\mathbf{X})}\Big)\nonumber\\
&=(-\beta)\Big(\sigma(i)\sigma(j)\mu(\sigma;\beta,\mathbf{X})-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma';\beta,\mathbf{X})\,\mu(\sigma;\beta,\mathbf{X})\Big).\tag{4.3}
\end{align}

Summing over all $\sigma,i,j$, we obtain
\begin{align}
&\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta,\mathbf{X})\nonumber\\
&\qquad=\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,(-\beta)\Big(\sigma(i)\sigma(j)\mu(\sigma;\beta,\mathbf{X})-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma';\beta,\mathbf{X})\,\mu(\sigma;\beta,\mathbf{X})\Big)\nonumber\\
&\qquad=-\beta n^2\big(1-\langle R_{1,2}^2\rangle_{\beta,\mathbf{X}}\big).\nonumber
\end{align}
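The identity (4.2) can be verified numerically by finite differences on a tiny instance. The sketch below assumes the Hamiltonian convention $H(\sigma;X)=-\sum_{i,j}X_{ij}\sigma(i)\sigma(j)$ with Gibbs weights $\exp(\beta H)$ (an illustrative choice consistent with the derivation above); the matrix entries and $\beta$ are arbitrary.

```python
import itertools, math

# Finite-difference check of the identity in Proposition 23 for n = 3.
# Assumed convention: mu(sigma) proportional to exp(beta * H(sigma; X)) with
# H(sigma; X) = -sum_{i,j} X_ij sigma_i sigma_j.
n, beta = 3, 0.7
X = [[0.3, -0.5, 0.1], [0.2, 0.0, -0.4], [0.7, 0.6, -0.2]]

def H(s, X):
    return -sum(X[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def gibbs(X):
    configs = list(itertools.product([-1, 1], repeat=n))
    w = [math.exp(beta * H(s, X)) for s in configs]
    Z = sum(w)
    return configs, [wi / Z for wi in w]

configs, mu = gibbs(X)

# Left-hand side of (4.2): sum_sigma sum_ij sigma_i sigma_j d(mu)/d(X_ij),
# with the derivative approximated by central differences.
h = 1e-5
lhs = 0.0
for i in range(n):
    for j in range(n):
        Xp = [row[:] for row in X]; Xp[i][j] += h
        Xm = [row[:] for row in X]; Xm[i][j] -= h
        _, mup = gibbs(Xp)
        _, mum = gibbs(Xm)
        for k, s in enumerate(configs):
            lhs += s[i] * s[j] * (mup[k] - mum[k]) / (2 * h)

# Right-hand side: -beta * n^2 * (1 - <R_{1,2}^2>), with the overlap average
# taken over two independent replicas from mu.
R2 = sum(mu[a] * mu[b] * (sum(sa[i] * sb[i] for i in range(n)) / n) ** 2
         for a, sa in enumerate(configs) for b, sb in enumerate(configs))
rhs = -beta * n * n * (1 - R2)
print(abs(lhs - rhs))  # small: finite-difference error only
```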

Lemma 22 can be proved using Proposition 23 and the Gaussian integration by parts formula. We prove a statement similar to Lemma 22, but for the sparse model. We will use the following lemma, known as the Stein-Chen identity for the Poisson distribution, in place of Gaussian integration by parts.

Lemma 24 ([14]).

Let $X\sim\mathrm{Po}(\lambda)$. For any bounded function $f$, we have

\[\mathbf{E}[Xf(X)]=\lambda\,\mathbf{E}[f(X+1)].\tag{4.4}\]
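The identity is exact and easy to check by truncating the Poisson series; the truncation level and the bounded test function in the sketch below are arbitrary choices.

```python
import math

# Check of the Stein-Chen identity E[X f(X)] = lambda * E[f(X+1)] for
# X ~ Po(lambda). Both sides are evaluated by truncating the Poisson pmf at K;
# the tail beyond K is negligible for this lambda.
lam, K = 2.5, 80

def f(k):
    return math.sin(k) / (1 + k)  # an arbitrary bounded test function

pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(K)]
lhs = sum(k * f(k) * pmf[k] for k in range(K))
rhs = lam * sum(f(k + 1) * pmf[k] for k in range(K))
print(abs(lhs - rhs))  # ~0 up to truncation and floating-point error
```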
Lemma 25.

Let $\mathbf{J}_{ij}$ be the matrix with $1$ in the $(i,j)$-th entry and $0$ everywhere else. Then there exists $\xi\in[0,1]$ such that

\[\Big|\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\mathbf{J}_{ij})-\mu(\sigma;\beta/\sqrt{d},\mathbf{X})-\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{X})\Big|\le\frac{3\beta^2}{d}\,\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij}).\tag{4.5}\]
Proof.

Since $\mu$ is infinitely differentiable in each entry of $\mathbf{X}$, by Taylor's Theorem, there exists $\xi\in[0,1]$ such that

\[\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\mathbf{J}_{ij})-\mu(\sigma;\beta/\sqrt{d},\mathbf{X})-\frac{\partial}{\partial\mathbf{X}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{X})=\frac{1}{2}\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij}).\tag{4.6}\]

It remains to bound $\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma;\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij})$. By (4.3), we have (omitting $\beta/\sqrt{d},\mathbf{X}+\xi\mathbf{J}_{ij}$ from the notation for simplicity)

\begin{align}
\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma)&=\frac{\partial}{\partial\mathbf{X}_{ij}}\Bigg(\Big(-\frac{\beta}{\sqrt{d}}\Big)\Big(\sigma(i)\sigma(j)-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma')\Big)\mu(\sigma)\Bigg)\nonumber\\
&=\Big(\frac{\beta}{\sqrt{d}}\Big)^2\Big(-1+\sum_{\sigma',\sigma''\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\sigma''(i)\sigma''(j)\mu(\sigma')\mu(\sigma'')\Big)\mu(\sigma)\nonumber\\
&\qquad+\Big(\frac{\beta}{\sqrt{d}}\Big)^2\Big(\sigma(i)\sigma(j)-\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma')\Big)^2\mu(\sigma).\nonumber
\end{align}

Since

\[\Big|\sum_{\sigma',\sigma''\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\sigma''(i)\sigma''(j)\mu(\sigma')\mu(\sigma'')\Big|=\big|\langle\sigma(i)\sigma(j)\rangle\big|^2\le1\tag{4.7}\]

and similarly

\[\Big|\sum_{\sigma'\in\{-1,1\}^n}\sigma'(i)\sigma'(j)\mu(\sigma')\Big|=\big|\langle\sigma(i)\sigma(j)\rangle\big|\le1,\tag{4.8}\]

it follows that

\[\Big|\frac{\partial^2}{\partial\mathbf{X}_{ij}^2}\mu(\sigma)\Big|\le\frac{\beta^2}{d}(2+4)\,\mu(\sigma)=\frac{6\beta^2}{d}\,\mu(\sigma).\tag{4.9}\]

(4.6) and (4.9) together imply (4.5).

Proposition 26.

Let $\mathbf{J}_{ij}$ be the matrix with $1$ in the $(i,j)$-th entry and $0$ everywhere else. For any $\xi\in\mathbb{R}$, we have

\[e^{-2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X})\le\mu(\sigma;\beta,\mathbf{X}+\xi\mathbf{J}_{ij})\le e^{2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X}).\tag{4.10}\]
Proof.

We have

\[\exp\big(\beta H(\sigma;\mathbf{X}+\xi\mathbf{J}_{ij})\big)=\exp\big(\beta H(\sigma;\mathbf{X})+\beta H(\sigma;\xi\mathbf{J}_{ij})\big)=\exp\big(\beta H(\sigma;\mathbf{X})\big)\,e^{-\beta\xi\sigma(i)\sigma(j)},\tag{4.11}\]

which implies that

\[e^{-\beta|\xi|}\exp\big(\beta H(\sigma;\mathbf{X})\big)\le\exp\big(\beta H(\sigma;\mathbf{X}+\xi\mathbf{J}_{ij})\big)\le e^{\beta|\xi|}\exp\big(\beta H(\sigma;\mathbf{X})\big).\tag{4.12}\]

Summing over σ{1,1}n, we obtain

\[e^{-\beta|\xi|}Z(\beta,\mathbf{X})\le Z(\beta,\mathbf{X}+\xi\mathbf{J}_{ij})\le e^{\beta|\xi|}Z(\beta,\mathbf{X}).\tag{4.13}\]

Since these quantities are all strictly positive, we have

\[\mu(\sigma;\beta,\mathbf{X}+\xi\mathbf{J}_{ij})=\frac{\exp(\beta H(\sigma;\mathbf{X}+\xi\mathbf{J}_{ij}))}{Z(\beta,\mathbf{X}+\xi\mathbf{J}_{ij})}\le\frac{e^{\beta|\xi|}\exp(\beta H(\sigma;\mathbf{X}))}{e^{-\beta|\xi|}Z(\beta,\mathbf{X})}=e^{2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X})\tag{4.14}\]

and similarly

\[e^{-2\beta|\xi|}\,\mu(\sigma;\beta,\mathbf{X})\le\mu(\sigma;\beta,\mathbf{X}+\xi\mathbf{J}_{ij}).\]
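Proposition 26 can be sanity-checked on a random instance: perturbing a single entry of $\mathbf{X}$ by $\xi$ changes every Gibbs probability by a factor of at most $e^{2\beta|\xi|}$. The explicit Hamiltonian form in the sketch is an illustrative assumption, as before.

```python
import itertools, math, random

# Check of the two-sided bound (4.10) on a random instance. Assumed form
# (for illustration): H(s; X) = -sum_{i,j} X_ij s_i s_j, mu ~ exp(beta * H).
n, beta, xi = 4, 0.8, 0.6
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]

def mu(X):
    configs = list(itertools.product([-1, 1], repeat=n))
    w = [math.exp(beta * -sum(X[i][j] * s[i] * s[j]
                              for i in range(n) for j in range(n)))
         for s in configs]
    Z = sum(w)
    return [wi / Z for wi in w]

Xp = [row[:] for row in X]
Xp[0][1] += xi  # perturb a single entry by xi
lo, hi = math.exp(-2 * beta * abs(xi)), math.exp(2 * beta * abs(xi))
ratios = [a / b for a, b in zip(mu(Xp), mu(X))]
print(all(lo <= r <= hi for r in ratios))
```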

Lemma 27.

For every $\beta\in\mathbb{R}$, we have
\begin{align}
\frac{1}{\sqrt{d}\,n}\mathbf{E}\big[\langle H_d(\sigma)\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]&=-\frac{\sqrt{d}}{2}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]+\frac{\beta}{2}\mathbf{E}\big[\langle 1-R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]+O_d\Big(\frac{1}{\sqrt{d}}\Big).\nonumber
\end{align}
Proof.

For simplicity, we drop the subscripts from $\langle\cdot\rangle$. We have
\begin{align}
\frac{1}{\sqrt{d}\,n}\mathbf{E}\big[\langle H_d(\sigma)\rangle\big]&=-\frac{1}{\sqrt{d}\,n}\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mathbf{A}_{ij}\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]\nonumber\\
&=-\frac{1}{\sqrt{d}\,n}\cdot\frac{d}{2n}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\mathbf{J}_{ij})\Big]\nonumber\\
&=-\frac{\sqrt{d}}{2n^2}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\mathbf{J}_{ij})\Big].\tag{4.15}
\end{align}

Here in the second equality we invoked Lemma 24. By Lemma 25, for some $\xi_{\sigma,i,j}\in[0,1]$ we have
\begin{align}
&\Big|\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\mathbf{J}_{ij})\Big]\nonumber\\
&\qquad-\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\Big(\mu(\sigma;\beta/\sqrt{d},\mathbf{A})+\frac{\partial}{\partial\mathbf{A}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big)\Big]\Big|\nonumber\\
&\quad\le\frac{3\beta^2}{d}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\mu(\sigma;\beta/\sqrt{d},\mathbf{A}+\xi_{\sigma,i,j}\mathbf{J}_{ij})\Big]\nonumber\\
&\quad\le\frac{3\beta^2}{d}\,n^2e^{2\beta/\sqrt{d}}\,\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]=\frac{3\beta^2}{d}\,n^2e^{2\beta/\sqrt{d}},\tag{4.16}
\end{align}

where the last inequality is due to Proposition 26. We observe that

\[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\,\mu(\sigma;\beta/\sqrt{d},\mathbf{A})=n^2\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle.\tag{4.17}\]

By Proposition 23, we also have

\[\sum_{\sigma\in\{-1,1\}^n}\sum_{i,j=1}^n\sigma(i)\sigma(j)\frac{\partial}{\partial\mathbf{A}_{ij}}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})=-\frac{\beta}{\sqrt{d}}\,n^2\big(1-\langle R_{1,2}^2\rangle\big).\tag{4.18}\]

The lemma follows by combining (4.15), (4.16), (4.17), and (4.18).

The next proposition deals with the first term in the above lemma. Note that this term is trivially bounded in the bisection models for all $\beta$, since under the bisection constraint the magnetization is at most $1/n$ in absolute value.

Proposition 28.

For every $\beta\ge0$, we have

\[\lim_{d\to\infty}\sqrt{d}\,\limsup_{n\to\infty}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]=0.\tag{4.19}\]
Proof.

Fix an arbitrary $\epsilon>0$ and let $S(n,\epsilon)=\{\sigma\in\{-1,1\}^n:-\epsilon n\le\sum_{i=1}^n\sigma(i)\le\epsilon n\}$. We have

\begin{align}
\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]&=\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n}\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]\nonumber\\
&\le\mathbf{E}\Big[\sum_{\sigma\in S(n,\epsilon/d^{1/4})}\big(\epsilon d^{-1/4}\big)^2\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]+\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n\setminus S(n,\epsilon/d^{1/4})}1\cdot\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big]\nonumber\\
&\le\big(\epsilon d^{-1/4}\big)^2+\mathbf{E}\Big[\sum_{\sigma\in\{-1,1\}^n\setminus S(n,\epsilon/d^{1/4})}\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\Big].\tag{4.20}
\end{align}

Recall that $A_n=\{\sigma\in\{-1,1\}^n:|\sum_{i=1}^n\sigma(i)|\le1\}$. Given $\sigma\in\{-1,1\}^n\setminus S(n,\epsilon/d^{1/4})$, take a $\tau\in A_n$ such that $R_{1,2}(\tau,\sigma)=\max_{\sigma'\in A_n}R_{1,2}(\sigma',\sigma)$. By Poisson concentration properties (see the full version for details), there exists a constant $C=C(\epsilon)>0$ such that with probability at least $1-2\exp(-Cd^{1/4}n)$ we have $H_d(\sigma)\le H_d(\tau)-\frac{\epsilon^2\sqrt{d}\,n}{4}$, and consequently

\[\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\le\mu(\tau;\beta/\sqrt{d},\mathbf{A})\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big).\tag{4.21}\]

By taking expectations, we obtain

\begin{align}
\mathbf{E}\big[\mu(\sigma;\beta/\sqrt{d},\mathbf{A})\big]&\le\mathbf{E}\big[\mu(\tau;\beta/\sqrt{d},\mathbf{A})\big]\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big)+2\exp(-Cd^{1/4}n)\nonumber\\
&\le\frac{1}{|A_n|}\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big)+2\exp(-Cd^{1/4}n).\tag{4.22}
\end{align}

Here in the last inequality we used the fact that $\mathbf{E}[\mu(\tau;\beta/\sqrt{d},\mathbf{A})]$ is the same for all $\tau\in A_n$.

Combining (4.20) and (4.22), we obtain

\[\sqrt{d}\,\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]\le\epsilon^2+\sqrt{d}\,2^n\Big(\frac{1}{|A_n|}\exp\Big(-\frac{\beta\epsilon^2n}{4}\Big)+2\exp(-Cd^{1/4}n)\Big).\tag{4.23}\]

If d is sufficiently large, the above gives

\[\sqrt{d}\,\limsup_{n\to\infty}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]\le\epsilon^2.\tag{4.24}\]

The proposition follows since ϵ is chosen arbitrarily.

We are now ready to prove Proposition 9.

Proof of Proposition 9.

For part (a), by Lemma 22, Lemma 27, and Proposition 28, we have

\begin{align}
&\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle H_{SK}(\sigma)\rangle_{\beta,\mathbf{g}}\big]-\frac{1}{\sqrt{d}}\mathbf{E}_{\mathbf{A}}\big[\langle H_d(\sigma)\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|\nonumber\\
&\quad=\lim_{d\to\infty}\Bigg(\frac{\beta}{2}\limsup_{n\to\infty}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle 1-R_{1,2}^2\rangle_{\beta,\mathbf{g}}\big]-\mathbf{E}_{\mathbf{A}}\big[\langle 1-R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|\nonumber\\
&\qquad\qquad+\frac{\sqrt{d}}{2}\mathbf{E}\Big[\Big\langle\Big(\frac{1}{n}\sum_{i=1}^n\sigma(i)\Big)^2\Big\rangle_{\beta/\sqrt{d},\mathbf{A}}\Big]+O_d\Big(\frac{1}{\sqrt{d}}\Big)\Bigg)\nonumber\\
&\quad=\frac{\beta}{2}\lim_{d\to\infty}\limsup_{n\to\infty}\Big|\mathbf{E}_{\mathbf{g}}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g}}\big]-\mathbf{E}_{\mathbf{A}}\big[\langle R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A}}\big]\Big|.\nonumber
\end{align}

Part (a) then follows by applying Proposition 8. Part (b) follows in a similar manner and we omit the details.

5 Disorder chaos for sparse models

Let $\mathbf{g},\mathbf{g}_t,\mathbf{A},\mathbf{A}_t$ be random matrices as defined in Section 2 and let $S\subseteq[-1,1]$. Recall that we also defined the following shorthand notation:

\[Z^{S}_{\mathbf{g},\mathbf{g}_t}=\sum_{\substack{\sigma_1,\sigma_2\in\{-1,1\}^n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\big(\beta(H(\sigma_1;\mathbf{g})+H(\sigma_2;\mathbf{g}_t))\big),\tag{5.1}\]
\[Z^{S,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}=\sum_{\substack{\sigma_1,\sigma_2\in A_n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\big(\beta(H(\sigma_1;\mathbf{g})+H(\sigma_2;\mathbf{g}_t))\big),\tag{5.2}\]
\[Z^{S}_{\mathbf{A},\mathbf{A}_t}=\sum_{\substack{\sigma_1,\sigma_2\in\{-1,1\}^n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\Big(\frac{\beta}{\sqrt{d}}(H(\sigma_1;\mathbf{A})+H(\sigma_2;\mathbf{A}_t))\Big),\tag{5.3}\]
\[Z^{S,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}=\sum_{\substack{\sigma_1,\sigma_2\in A_n\\ R_{1,2}(\sigma_1,\sigma_2)\in S}}\exp\Big(\frac{\beta}{\sqrt{d}}(H(\sigma_1;\mathbf{A})+H(\sigma_2;\mathbf{A}_t))\Big).\tag{5.4}\]

It is known that the SK model exhibits disorder chaos at any temperature.

Theorem 29 (Theorem 9 in [15]).

Let $\beta\in\mathbb{R}$ and $t>0$. Fix an arbitrary $\epsilon>0$ and let $I_\epsilon=[-1,-\epsilon]\cup[\epsilon,1]$. There exists some constant $K>0$ such that for every $n$,

\[\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\le K\exp(-n/K).\tag{5.5}\]

Theorem 29 immediately implies that the overlap of two configurations sampled from the coupled system $\mu_{\mathbf{g},\mathbf{g}_t}$ is nearly zero.

Corollary 30.

If t>0, then

\[\lim_{n\to\infty}\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g},\mathbf{g}_t}\big]=0.\tag{5.6}\]
Proof.

For any ϵ>0, we have

\begin{align}
\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g},\mathbf{g}_t}\big]&\le\epsilon^2\,\mathbf{E}\Bigg[\frac{Z^{[-\epsilon,\epsilon]}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]+1\cdot\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\nonumber\\
&\le\epsilon^2+K\exp(-n/K).\nonumber
\end{align}

Here we used Theorem 29 and the fact that $Z^{[-\epsilon,\epsilon]}_{\mathbf{g},\mathbf{g}_t}\le Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}$. It follows that

\[\limsup_{n\to\infty}\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta,\mathbf{g},\mathbf{g}_t}\big]\le\epsilon^2\]

for arbitrary $\epsilon>0$, and the corollary follows.
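The decomposition used in this proof is deterministic and holds for any coupled Gibbs measure: the overlap square is at most $\epsilon^2$ on $[-\epsilon,\epsilon]$ and at most $1$ on $I_\epsilon$. The following sketch checks it by brute force on a small instance, with an assumed Hamiltonian form and identical disorder in the two replicas for simplicity.

```python
import itertools, math, random

# Brute-force check that <R_{1,2}^2> <= eps^2 + Z^{I_eps} / Z^{[-1,1]} for a
# coupled Gibbs measure. Assumptions (illustrative): H(s) = -sum g_ij s_i s_j,
# and the same disorder g in both replicas.
n, beta, eps = 4, 0.7, 0.5
random.seed(2)
g = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]

def H(s):
    return -sum(g[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

configs = list(itertools.product([-1, 1], repeat=n))
pairs = [(s1, s2) for s1 in configs for s2 in configs]
w = {p: math.exp(beta * (H(p[0]) + H(p[1]))) for p in pairs}
Z_full = sum(w.values())

def R(p):
    return sum(a * b for a, b in zip(p[0], p[1])) / n

R2_avg = sum(R(p) ** 2 * w[p] for p in pairs) / Z_full
Z_Ieps = sum(w[p] for p in pairs if abs(R(p)) >= eps)  # mass on I_eps
print(R2_avg <= eps ** 2 + Z_Ieps / Z_full)
```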

The exponentially small fraction in Theorem 29 can be translated into a constant gap between the free energy of coupled and uncoupled systems.

Proposition 31.

For every ϵ>0, there exists some constant K>0 such that

\[\limsup_{n\to\infty}\Big(\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.7}\]
Proof.

By Theorem 29 and Jensen's inequality,
\begin{align}
\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}\big]&=\frac{1}{n}\mathbf{E}\Bigg[\log\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\nonumber\\
&\le\frac{1}{n}\log\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{g},\mathbf{g}_t}}{Z^{[-1,1]}_{\mathbf{g},\mathbf{g}_t}}\Bigg]\le-\frac{1}{K}+O\Big(\frac{1}{n}\Big).\nonumber
\end{align}

The proposition follows by taking n to infinity.
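The only probabilistic ingredient here beyond Theorem 29 is Jensen's inequality for the concave logarithm, $\mathbf{E}[\log W]\le\log\mathbf{E}[W]$, which a quick Monte Carlo check illustrates on a hypothetical positive random variable (the log-normal below is an arbitrary choice, not from the paper).

```python
import math, random

# Monte Carlo illustration of Jensen's inequality E[log W] <= log E[W]
# for a hypothetical positive random variable W (here log-normal, so that
# E[log W] = 0 while log E[W] = 1/2).
random.seed(3)
samples = [math.exp(random.gauss(0, 1)) for _ in range(50000)]
e_log = sum(math.log(w) for w in samples) / len(samples)
log_e = math.log(sum(samples) / len(samples))
print(e_log <= log_e)
```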

The following theorem, which is a generalized version of Lemma 7, allows us to transfer this free energy gap to sparse models.

Theorem 32 (See e.g. [17, 16]).

For any $\beta\in\mathbb{R}$,

\[\Big|\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big]\Big|\le O_d\Big(\frac{1}{\sqrt{d}}\Big)+d\cdot o_n(1).\tag{5.8}\]

We are now ready to prove Lemma 14.

Proof of Lemma 14.

As before, we only prove part (a). Fix some arbitrary $\epsilon>0$. By Theorem 16, for both $S=I_\epsilon$ and $S=[-1,1]$ we have

\[\lim_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{S,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]-\mathbf{E}\big[\log Z^{S}_{\mathbf{g},\mathbf{g}_t}\big]\Big)=0.\tag{5.9}\]

By Proposition 31, for some constant K=K(ϵ)>0, we have

\[\limsup_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]-\mathbf{E}\big[\log Z^{[-1,1],\mathsf{bis}}_{\mathbf{g},\mathbf{g}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.10}\]

We can then use Theorem 32 to transfer this gap to the sparse model and obtain

\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{I_\epsilon,\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big]-\mathbf{E}\big[\log Z^{[-1,1],\mathsf{bis}}_{\mathbf{A},\mathbf{A}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.11}\]

By Theorem 19, we then have

\[\lim_{d\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big(\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}\big]-\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}\big]\Big)\le-\frac{1}{K}.\tag{5.12}\]

This means that if d is sufficiently large, then for all sufficiently large n we have

\[\frac{1}{n}\mathbf{E}\big[\log Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}\big]-\frac{1}{n}\mathbf{E}\big[\log Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}\big]\le-\frac{1}{2K}.\tag{5.13}\]

By Lemma 18, we have that with probability at least $1-o_n(1)$,

\[\frac{1}{n}\log Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}-\frac{1}{n}\log Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}\le-\frac{1}{4K},\tag{5.14}\]

which rearranges to

\[\frac{Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}}{Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}}\le\exp\Big(-\frac{n}{4K}\Big).\tag{5.15}\]

It follows that

\[\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A},\mathbf{A}_t}\big]\le\epsilon^2+\mathbf{E}\Bigg[\frac{Z^{I_\epsilon}_{\mathbf{A},\mathbf{A}_t}}{Z^{[-1,1]}_{\mathbf{A},\mathbf{A}_t}}\Bigg]\le\epsilon^2+\exp\Big(-\frac{n}{4K}\Big)+o_n(1).\tag{5.16}\]

By taking $n$ to infinity, we have $\limsup_{n\to\infty}\mathbf{E}\big[\langle R_{1,2}^2\rangle_{\beta/\sqrt{d},\mathbf{A},\mathbf{A}_t}\big]\le\epsilon^2$. This completes the proof.

References

  • [1] Ahmed El Alaoui and David Gamarnik. Hardness of sampling solutions from the Symmetric Binary Perceptron, July 2024. arXiv:2407.16627 [cs, math]. doi:10.48550/arXiv.2407.16627.
  • [2] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Local algorithms for Maximum Cut and Minimum Bisection on locally treelike regular graphs of large degree, November 2021. arXiv:2111.06813 [math-ph]. doi:10.48550/arXiv.2111.06813.
  • [3] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Optimization of mean-field spin glasses. The Annals of Probability, 49(6):2922–2960, November 2021. Publisher: Institute of Mathematical Statistics. doi:10.1214/21-AOP1519.
  • [4] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Sampling from the Sherrington-Kirkpatrick Gibbs measure via algorithmic stochastic localization, March 2022. arXiv:2203.05093 [cond-mat]. doi:10.48550/arXiv.2203.05093.
  • [5] Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong. Entropic independence I: Modified log-Sobolev inequalities for fractionally log-concave distributions and high-temperature Ising models. arXiv preprint arXiv:2106.04105, 2021.
  • [6] Nima Anari, Frederic Koehler, and Thuy-Duong Vuong. Trickle-down in localization schemes and applications. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 1094–1105, New York, NY, USA, 2024. Association for Computing Machinery. doi:10.1145/3618260.3649622.
  • [7] Roland Bauerschmidt and Thierry Bodineau. A very simple proof of the LSI for high temperature spin systems. Journal of Functional Analysis, 276(8):2582–2588, 2019.
  • [8] Roland Bauerschmidt, Thierry Bodineau, and Benoit Dagallier. Kawasaki dynamics beyond the uniqueness threshold. arXiv preprint arXiv:2310.04609, 2023. doi:10.48550/arXiv.2310.04609.
  • [9] Ivona Bezáková, Daniel Štefankovič, Vijay V Vazirani, and Eric Vigoda. Accelerating simulated annealing for the permanent and combinatorial counting problems. SIAM Journal on Computing, 37(5):1429–1454, 2008. doi:10.1137/050644033.
  • [10] Charlie Carlson, Ewan Davies, Alexandra Kolla, and Will Perkins. Computational thresholds for the fixed-magnetization Ising model. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1459–1472, 2022. doi:10.1145/3519935.3520003.
  • [11] Michael Celentano. Sudakov–Fernique post-AMP, and a new proof of the local convexity of the TAP free energy. The Annals of Probability, 52(3):923–954, 2024. doi:10.1214/23-AOP1675.
  • [12] Sourav Chatterjee. Disorder chaos and multiple valleys in spin glasses. arXiv preprint, 2009. arXiv:0907.3381.
  • [13] Antares Chen, Neng Huang, and Kunal Marwaha. Local algorithms and the failure of log-depth quantum advantage on sparse random CSPs, October 2023. arXiv:2310.01563 [quant-ph]. doi:10.48550/arXiv.2310.01563.
  • [14] Louis H. Y. Chen. Poisson Approximation for Dependent Trials. The Annals of Probability, 3(3):534–545, June 1975. Publisher: Institute of Mathematical Statistics. doi:10.1214/aop/1176996359.
  • [15] Wei-Kuo Chen. Variational representations for the Parisi functional and the two-dimensional Guerra–Talagrand bound. The Annals of Probability, 45(6A):3929–3966, November 2017. Publisher: Institute of Mathematical Statistics. doi:10.1214/16-AOP1154.
  • [16] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, and Mustazee Rahman. Suboptimality of local algorithms for a class of max-cut problems. The Annals of Probability, 47(3):1587–1618, May 2019. Publisher: Institute of Mathematical Statistics. doi:10.1214/18-AOP1291.
  • [17] Wei-Kuo Chen and Dmitry Panchenko. Disorder chaos in some diluted spin glass models. The Annals of Applied Probability, 28(3):1356–1378, June 2018. doi:10.1214/17-AAP1331.
  • [18] Yuansi Chen and Ronen Eldan. Localization schemes: A framework for proving mixing bounds for Markov chains. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 110–122. IEEE, 2022.
  • [19] Amin Coja-Oghlan, Philipp Loick, Balázs F. Mezei, and Gregory B. Sorkin. The Ising Antiferromagnet and Max Cut on Random Regular Graphs. SIAM Journal on Discrete Mathematics, 36(2):1306–1342, June 2022. Publisher: Society for Industrial and Applied Mathematics. doi:10.1137/20M137999X.
  • [20] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(6):066106, 2011.
  • [21] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, March 2017. Publisher: Institute of Mathematical Statistics. doi:10.1214/15-AOP1084.
  • [22] Ronen Eldan, Frederic Koehler, and Ofer Zeitouni. A spectral condition for spectral gap: fast mixing in high-temperature Ising models. Probability theory and related fields, 182(3):1035–1051, 2022.
  • [23] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Communications in mathematical physics, 233:1–12, 2003.
  • [24] Chris Jones, Kunal Marwaha, Juspreet Singh Sandhu, and Jonathan Shi. Random Max-CSPs Inherit Algorithmic Hardness from Spin Glasses. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 77:1–77:26, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2023.77.
  • [25] Florent Krzakała, Andrea Montanari, Federico Ricci-Tersenghi, Guilhem Semerjian, and Lenka Zdeborová. Gibbs states and the set of solutions of random constraint satisfaction problems. Proceedings of the National Academy of Sciences, 104(25):10318–10323, 2007. doi:10.1073/PNAS.0703685104.
  • [26] Aiya Kuchukova, Marcus Pappik, Will Perkins, and Corrine Yap. Fast and slow mixing of the Kawasaki dynamics on bounded-degree graphs. arXiv preprint, 2024. doi:10.48550/arXiv.2405.06209.
  • [27] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.
  • [28] Marc Mézard and Giorgio Parisi. Mean-field theory of randomly frustrated systems with finite connectivity. Europhysics Letters, 3(10):1067, 1987.
  • [29] Marc Mézard and Giorgio Parisi. The Bethe lattice spin glass revisited. The European Physical Journal B-Condensed Matter and Complex Systems, 20:217–233, 2001.
  • [30] Andrea Montanari. Optimization of the Sherrington–Kirkpatrick hamiltonian. SIAM Journal on Computing, 0(0):FOCS19–1–FOCS19–38, 2021. doi:10.1137/20M132016X.
  • [31] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3):431–461, August 2015. doi:10.1007/s00440-014-0576-6.
  • [32] Dmitry Panchenko. The Sherrington-Kirkpatrick Model. Springer Monographs in Mathematics. Springer New York, NY, 2013.
  • [33] Giorgio Parisi. Infinite number of order parameters for spin-glasses. Physical Review Letters, 43(23):1754, 1979.
  • [34] Giorgio Parisi. A sequence of approximated solutions to the SK model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115, 1980.
  • [35] Giorgio Parisi. Order parameter for spin-glasses. Physical Review Letters, 50(24):1946, 1983.
  • [36] David Sherrington and Scott Kirkpatrick. Solvable model of a spin-glass. Physical review letters, 35(26):1792, 1975.
  • [37] H. Sompolinsky and Annette Zippelius. Dynamic theory of the spin-glass phase. Phys. Rev. Lett., 47:359–362, August 1981. doi:10.1103/PhysRevLett.47.359.
  • [38] Eliran Subag. Following the ground states of full-RSB spherical spin glasses. Communications on Pure and Applied Mathematics, 74(5):1021–1044, 2021.
  • [39] Michel Talagrand. The Parisi Formula. Annals of Mathematics, 163(1):221–263, 2006. Publisher: Annals of Mathematics. URL: https://www.jstor.org/stable/20159953.
  • [40] Michel Talagrand. Parisi measures. Journal of Functional Analysis, 231(2):269–286, February 2006. doi:10.1016/j.jfa.2005.03.001.
  • [41] Michel Talagrand. Mean Field Models for Spin Glasses: Volume I: Basic Examples. Springer, Berlin, Heidelberg, 2011. doi:10.1007/978-3-642-15202-3.