
Efficient Parallel Ising Samplers via Localization Schemes

Xiaoyu Chen, State Key Laboratory for Novel Software Technology, New Cornerstone Science Laboratory, Nanjing University, China; Hongyang Liu, State Key Laboratory for Novel Software Technology, New Cornerstone Science Laboratory, Nanjing University, China; Yitong Yin, State Key Laboratory for Novel Software Technology, New Cornerstone Science Laboratory, Nanjing University, China; Xinyuan Zhang, State Key Laboratory for Novel Software Technology, New Cornerstone Science Laboratory, Nanjing University, China
Abstract

We introduce efficient parallel algorithms for sampling from the Gibbs distribution and estimating the partition function of Ising models. These algorithms achieve parallel efficiency, with polylogarithmic depth and polynomial total work, and are applicable to Ising models in the following regimes: (1) ferromagnetic Ising models with external fields; (2) Ising models with interaction matrix $J$ of operator norm $\|J\|_2<1$.

Our parallel Gibbs sampling approaches are based on localization schemes, which have proven highly effective in establishing rapid mixing of Gibbs sampling. In this work, we employ two such localization schemes to obtain efficient parallel Ising samplers: the field dynamics induced by negative-field localization, and restricted Gaussian dynamics induced by stochastic localization. This shows that localization schemes are powerful tools, not only for achieving rapid mixing but also for the efficient parallelization of Gibbs sampling.

Keywords and phrases:
Localization scheme, parallel sampling, Ising model
Category:
RANDOM
Funding:
Xinyuan Zhang: NSFC Young Student Basic Research Program (PhD candidate, No. 623B2051).
Copyright and License:
© Xiaoyu Chen, Hongyang Liu, Yitong Yin, and Xinyuan Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Random walks and Markov chains; Theory of computation → Parallel algorithms
Related Version:
Previous Version: https://arxiv.org/abs/2505.05185
Funding:
This work has been partially supported by the New Cornerstone Science Foundation.
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

The Ising model, introduced by Lenz and Ising [26] in statistical physics to study the criticality of ferromagnetism, has since found numerous applications and has been extensively studied across various fields, including probability theory, statistical learning, and computer science.

Let $G=(V,E)$ be a connected undirected graph with $n=|V|$ vertices and $m=|E|$ edges. Let $\boldsymbol{\beta}\in(0,+\infty)^E$ represent the edge activities, and let $\boldsymbol{\lambda}\in[0,1]^V$ represent the external fields. The Gibbs distribution $\mu^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}$ of the Ising model on the graph $G$ with parameters $\boldsymbol{\beta}$ and $\boldsymbol{\lambda}$ is supported on $2^V$ and is given by:

$\forall S\subseteq V,\quad \mu^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}(S):=\frac{1}{Z^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}}\prod_{e\in m(S)}\beta_e\prod_{v\in S}\lambda_v,$

where $m(S):=\{e=(u,v)\in E\mid u,v\in S\text{ or }u,v\notin S\}$ is the set of “monochromatic” edges, i.e., edges whose endpoints are either both in $S$ or both outside of $S$. The normalizing factor, called the partition function, is given by $Z^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}:=\sum_{S\subseteq V}\prod_{e\in m(S)}\beta_e\prod_{v\in S}\lambda_v$.

The Ising model is called ferromagnetic if $\boldsymbol{\beta}\in(1,+\infty)^E$, meaning that pairwise interactions favor monochromatic edges, and the model is called anti-ferromagnetic if $\boldsymbol{\beta}\in(0,1)^E$.
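To make the definition concrete, the following brute-force sketch (our illustration, not part of the paper; exponential time, intended only for tiny instances) evaluates the weight of a configuration $S\subseteq V$ and the partition function exactly as defined above.

```python
from itertools import combinations

def ising_weight(S, V, E, beta, lam):
    """Unnormalized Gibbs weight of S ⊆ V: product of beta_e over monochromatic
    edges times product of lambda_v over v in S."""
    S = set(S)
    w = 1.0
    for (u, v) in E:
        if (u in S) == (v in S):       # edge is monochromatic
            w *= beta[(u, v)]
    for v in S:
        w *= lam[v]
    return w

def ising_partition_function(V, E, beta, lam):
    """Z by enumerating all 2^|V| configurations (feasible only for tiny graphs)."""
    return sum(ising_weight(S, V, E, beta, lam)
               for k in range(len(V) + 1)
               for S in combinations(V, k))

# Toy instance: a triangle with edge activity 1.5 and external field 0.5.
V = [0, 1, 2]
E = [(0, 1), (1, 2), (0, 2)]
beta = {e: 1.5 for e in E}
lam = {v: 0.5 for v in V}
print(ising_partition_function(V, E, beta, lam))
```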

A central problem in the study of Ising models is sampling from the Gibbs distribution. This task is essential not only for estimating the partition function but also for various other inference and learning problems associated with the Ising model.

1.1 Parallel sampler for ferromagnetic Ising model

In a seminal work by [29], it was shown that for ferromagnetic Ising models, approximately sampling from the Gibbs distribution and approximately computing the partition function are both tractable in polynomial time. This result, along with celebrated breakthroughs in polynomial-time approximations of permanents [28, 30] and volumes of convex bodies [15], remarkably showcased the power of random sampling in approximately solving #P-hard inference problems on polynomial-time Turing machines.

Motivated by emerging applications in large-scale data and large models, parallel Gibbs sampling has recently drawn considerable attention [37, 3, 5, 1, 36, 4, 22, 20, 23, 21]. Despite these advances, the tractability of the ferromagnetic Ising model through parallel algorithms remains unresolved, in contrast to the successes of polynomial-time sequential algorithms.

In the theory of computing, the parallel complexity of the ferromagnetic Ising model is of foundational significance. In a seminal work, [39] asked whether an 𝖱𝖭𝖢 algorithm (with polylogarithmic depth on polynomial processors) exists for sampling bipartite perfect matchings, which would imply an 𝖱𝖭𝖢 approximation algorithm for the permanent of Boolean matrices. To this day, this remains a major open problem. [41] conjectured that no 𝖱𝖭𝖢 algorithm exists for this task, making it a rare example of a polynomial-time tractable problem suspected to be intrinsically sequential, yet not known to be P-complete. The challenge of parallelizing the ferromagnetic Ising sampler is closely tied to the problem of sampling matchings, as both problems were resolved sequentially using the same canonical path argument [28, 29].

In this work, we present an efficient parallel sampler in the 𝖱𝖭𝖢 class for the general ferromagnetic Ising model with nonzero external fields. To the best of our knowledge, this is the first 𝖱𝖭𝖢 sampler for the Ising model beyond the critical phase-transition threshold.

Theorem 1 (ferromagnetic Ising sampler).

Let $\delta\in(0,1)$ be a constant. There is a parallel algorithm that, given $\epsilon\in(0,1)$ and an Ising model on a graph $G=(V,E)$ with parameters $\boldsymbol{\beta}\in[1+\delta,+\infty)^E$ and $\boldsymbol{\lambda}\in[0,1-\delta]^V$, outputs a sample from the Gibbs distribution $\mu^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}$ within total variation distance $\epsilon$ in $\left(\epsilon^{-1/\log n}\cdot\log n\right)^{O_\delta(1)}$ parallel time using $O_\delta\!\left(m^2\log\frac{n}{\epsilon}\right)$ processors.

One can view this result as a parallel counterpart to [29]. Through a standard reduction via non-adaptive annealing, this 𝖱𝖭𝖢 Ising sampler can be turned into an 𝖱𝖭𝖢 algorithm for approximating the Ising partition function $Z^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}$. Specifically, applying the work-efficient parallel annealing algorithm recently developed in [38], the parallel Ising sampler stated in Theorem 1 can be transformed into a randomized parallel algorithm which returns a $(1\pm\epsilon)$-approximation of the Ising partition function in $\left(\epsilon^{-1/\log n}\cdot\log n\right)^{O_\delta(1)}$ parallel time on $\widetilde{O}_\delta(m^2 n/\epsilon^2)$ machines.

The parallel Ising sampler in Theorem 1 is implied by a parallel sampler for the random cluster model (see Section 2.2 for definition). The correspondence between the Ising model and the random cluster model is due to the well-known Edwards-Sokal coupling [16] (see Lemma 6).

Theorem 2 (random cluster sampler).

Let $\delta\in(0,1)$ be a constant. There is a parallel algorithm that, given $\epsilon\in(0,1)$ and a random cluster model on a graph $G=(V,E)$ with parameters $\boldsymbol{p}\in[\delta,1)^E$ and $\boldsymbol{\lambda}\in[0,1-\delta]^V$, outputs a sample from the random cluster distribution $\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$ within total variation distance $\epsilon$ in $\left(\epsilon^{-1/\log n}\cdot\log n\right)^{O_\delta(1)}$ parallel time on $O_\delta\!\left(m^2\log\frac{n}{\epsilon}\right)$ processors.

Theorem 1 follows from Theorem 2 via the Edwards-Sokal coupling, as formally described in Section 3. Our main technical contribution in this part is a parallel sampler for the random cluster model, as stated in Theorem 2, whose efficiency is proved in Section 4.

1.2 Parallel sampler for Ising models with contracting interaction matrix

An Ising model can also be formulated through its interaction matrix. Let $V$ be a set of $n$ vertices. The Gibbs distribution $\mu^{\mathrm{Ising}}_{J,h}$ of the Ising model, supported on $\{\pm1\}^V$, is defined by the interaction matrix $J\in\mathbb{R}^{V\times V}$, which is a symmetric positive semi-definite matrix, and the external fields $h\in\mathbb{R}^V$:

$\forall\sigma\in\{\pm1\}^V,\quad \mu^{\mathrm{Ising}}_{J,h}(\sigma)\propto\exp\!\left(\tfrac12\sigma^\top J\sigma+h^\top\sigma\right). \qquad (1)$

Recent studies have established rapid mixing for Ising models in terms of their interaction norm [18, 6], showing that Glauber dynamics mixes rapidly when $\|J\|_2<1$, whereas for $\|J\|_2>1$ sampling becomes intractable [34, 24, 31]. For parallel sampling, [36] proposed an 𝖱𝖭𝖢 algorithm under a stricter condition that requires $J$ to have a constant $\infty$-norm. However, under the broader condition $\|J\|_2<1$, the best known parallel algorithms [32, 37] achieve only a depth of $\widetilde{O}(n)$, highlighting a significant gap in achieving optimal parallelism.

In this work, we present an 𝖱𝖭𝖢 parallel sampler for the Ising model under the condition $\|J\|_2<1$. To the best of our knowledge, this is the first 𝖱𝖭𝖢 Ising sampler that matches this critical threshold for rapid mixing in terms of the interaction norm.

Theorem 3 (Ising sampler).

Let $\eta\in(0,1)$ be a constant. There is a parallel algorithm that, given $\epsilon\in(0,1)$ and an Ising model with an interaction matrix $J\in\mathbb{R}^{V\times V}$ satisfying

$0\preceq\tfrac{\eta}{2}I\preceq J\preceq\left(1-\tfrac{\eta}{2}\right)I$

and arbitrary external fields $h\in\mathbb{R}^V$, outputs a sample from the Gibbs distribution $\mu^{\mathrm{Ising}}_{J,h}$ within total variation distance $\epsilon$ in $O_\eta(\log^4(n/\epsilon))$ parallel time using $\widetilde{O}_\eta(n^3/\epsilon^2)$ processors.

 Remark 4.

Note that for the Ising model with interaction matrix $J$, for every constant $C$, taking $J'=J+CI$ results in the same Gibbs distribution. Thus, in the regime where $\|J\|_2\le1-\eta$, for which Glauber dynamics is known to mix rapidly [18, 6], one can add a diagonal matrix to the interaction matrix $J$ in order to satisfy the condition of Theorem 3.

The 𝖱𝖭𝖢 sampler stated in Theorem 3 is presented in Section 5.

1.3 Technique overview

Classic local Markov chains, such as Glauber dynamics, are inherently sequential, updating the spin of a single site at each step based on its neighbors. Recently, [36, 37] proposed a generic parallelization framework using correlated sampling (or universal coupling), enabling polylogarithmic-depth simulation of Glauber dynamics under a relaxed Dobrushin condition. When applied to the Ising model, this yields an 𝖱𝖭𝖢 sampler in the uniqueness regime $\boldsymbol{\beta}\in\left(\frac{\Delta-2}{\Delta},\frac{\Delta}{\Delta-2}\right)^E$, where $\Delta$ is the maximum degree. However, these conditions do not hold for the Ising models considered in this work.

Instead of directly parallelizing a local Markov chain, our approach focuses on simulating “global” Markov chains that update O(n) spins in each step, satisfying the following:

  • The chain mixes in polylogarithmic steps;

  • Each step of the chain has an efficient parallel implementation in 𝖱𝖭𝖢.

These together ensure an 𝖱𝖭𝖢 sampler with polylog depth and polynomial total work.

For the ferromagnetic Ising model, this global chain corresponds to the field dynamics introduced by [10], while for the Ising model with a contracting interaction matrix, the global chain is the restricted Gaussian dynamics (also known as the proximal sampler) introduced by [33]. Both of these global Markov chains can be viewed as arising from different localization schemes for the Ising Gibbs sampling.

Proposed by [13], localization schemes provide an abstract framework that generalizes high-dimensional expander walks and encompasses a wide range of stochastic processes. This framework can be interpreted in terms of noising and denoising processes, as discussed in [17, 9]. Let $X\sim\mu$ be drawn from the target distribution $\mu$. A noisy channel $D$, which is a Markov chain, is applied to $X$ to obtain a “noised” sample $Y\sim D(X,\cdot)$. The joint distribution of $(X,Y)$ is denoted as $\pi$. The denoising process $U$ represents the time reversal of $D$: it draws $Z$ from the posterior distribution $U(Y,\cdot):=\pi(\cdot\mid Y)$. The Markov chains associated with the localization scheme update the current state $X$ to a new state $Z$ according to the rules:

  • Add noise to $X$ through the noisy channel: sample $Y\sim D(X,\cdot)$;

  • Denoise $Y$ via the posterior distribution: sample $Z\sim U(Y,\cdot)$.

In particular, when the noisy channel D corresponds to the continuous-time down walk channel, which drops each element in a set X with a fixed probability, the localization process is called the negative-field localization, and the above associated global Markov chain is the field dynamics. Alternatively, when D corresponds to a Gaussian noise channel, which adds Gaussian noise to the sample X, the localization process is called stochastic localization, and the resulting global Markov chain becomes the restricted Gaussian dynamics.
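Schematically, the chain associated with any localization scheme is the same two-step update with different kernels plugged in. The sketch below is our own illustration (the function names are hypothetical, not the paper's notation); the concrete denoising distributions for the two schemes used in this paper appear in Section 2.3.2 and Section 5.

```python
import random

def localization_step(x, noise, denoise, rng=random):
    """One transition of the chain associated with a localization scheme:
    x -> y ~ D(x, .) (noising), then y -> z ~ U(y, .) (denoising from the posterior)."""
    y = noise(x, rng)
    z = denoise(y, rng)
    return z

def down_channel(theta):
    """Noise channel of the negative-field localization (viewed at a fixed time):
    keep each element of the current set independently with probability theta."""
    def noise(x, rng):
        return {e for e in x if rng.random() < theta}
    return noise
```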

The localization scheme has previously proven remarkably effective in establishing rapid mixing of Glauber dynamics up to criticality [10, 11, 7, 13, 12, 8, 9]. Specifically, the localization process can effectively “tilt” the model parameters towards sub-critical directions. This allows the mixing properties established in sub-critical regimes to be conserved up to criticality, provided the associated global chains mix rapidly.

In the current work, we explore another perspective on localization schemes: leveraging the associated global Markov chains to achieve parallel efficiency in sampling. This task is highly non-trivial, as these chains were originally designed as analytical tools for studying mixing times, and their efficient parallel simulation poses significant challenges.

For the ferromagnetic Ising model with external fields, we leverage the field dynamics of the random-cluster model to design a parallel sampler. The field dynamics is the global Markov chain induced by negative-field localization. Each update consists of two steps:

  1. 1.

    A noising step $Y\sim D(X,\cdot)$, which drops each element in $X$ with a fixed probability (as in Line 4 of Algorithm 1), and is straightforward to implement in parallel.

  2. 2.

    A denoising step $Z\sim U(Y,\cdot)$, which requires sampling from a tilted posterior distribution $U(Y,\cdot)$ and is non-trivial. We simulate the denoising step via Glauber dynamics on $U(Y,\cdot)$, which corresponds to a sub-critical (low-temperature) random-cluster model, where rapid mixing is easier to establish.

A similar idea was employed in [12] to develop a near-linear time sequential sampler. Here, we further parallelize this Glauber dynamics on U(Y,) using the parallel simulation algorithm introduced in [37], which yields Algorithm 2. A key challenge, however, is that despite being tilted to the low-temperature regime, the Dobrushin condition required in [37] to ensure efficient parallel simulation does not hold. To overcome this, we establish a new “coupling with the stationary” criterion, which significantly relaxes the Dobrushin condition and guarantees the efficiency of parallel simulation of Glauber dynamics. Altogether, this yields parallel random-cluster and ferromagnetic-Ising samplers with polylogarithmic depth and polynomial total work.

For the Ising model with a contracting interaction matrix J, where J2<1, we leverage the restricted Gaussian dynamics (also known as the proximal sampler) to design a parallel sampler. This dynamics defines a global Markov chain, which is the associated chain of the stochastic localization for the Ising model. Each update consists of two steps:

  1. 1.

    A noising step $Y\sim D(X,\cdot)$, which introduces Gaussian noise to the current sample.

  2. 2.

    A denoising step $Z\sim U(Y,\cdot)$, which resamples from a posterior distribution $U(Y,\cdot)$.

With appropriately chosen parameters, $U(Y,\cdot)$ becomes a product distribution, allowing an efficient parallel implementation of the denoising process. The non-trivial step is the noising step $Y\sim D(X,\cdot)$, which samples from a high-dimensional Gaussian distribution with mean $X$ and covariance $J^{-1}$ (as in Line 3 of Algorithm 3). A key observation is that the Gaussian distribution is log-concave, so the noising step can be efficiently approximated using recently developed parallelizations of Langevin Monte Carlo [2].

In a related prior work [5], an efficient parallel algorithm was developed for transport-stable distributions by implementing stochastic localization in discrete steps. However, their approach is fundamentally different from ours. Their method leverages the log-concavity of the tilted (or “noised”) distribution D(X,), which enables efficient parallel sampling. In contrast, our approach relies on the rapid mixing of the proximal sampler, which also admits an efficient parallel implementation.

2 Preliminaries

2.1 Notation

Given a finite nonempty ground set $E$, we will use boldface letters, such as $\boldsymbol{p}$, to denote a vector in $\mathbb{R}^E$. Given a vector $\boldsymbol{a}\in\mathbb{R}^E$ and a function $f:\mathbb{R}\to\mathbb{R}$, we write $\boldsymbol{b}=f(\boldsymbol{a})$ for the vector $\boldsymbol{b}\in\mathbb{R}^E$ satisfying $b_e=f(a_e)$ for all $e\in E$.

Throughout the paper, we use log to denote the natural logarithm.

2.2 Random cluster model

Let $G=(V,E)$ be a connected, undirected graph. The random cluster model on $G=(V,E)$ with edge probabilities $\boldsymbol{p}\in[0,1]^E$ and vertex weights $\boldsymbol{\lambda}\in[0,1]^V$ is defined as follows. The random cluster distribution $\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$ is supported on the power set $2^E$, where each $S\subseteq E$ is assigned a weight:

$w^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}(S):=\prod_{e\in S}p_e\prod_{e\in E\setminus S}(1-p_e)\prod_{C\in\kappa(V,S)}\Big(1+\prod_{j\in C}\lambda_j\Big), \qquad (2)$

where $\kappa(V,S)$ is the set of connected components in the graph $(V,S)$. The distribution $\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$ is then given by:

$\forall S\subseteq E,\quad \mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}(S):=\frac{w^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}(S)}{Z^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}},$

where the normalizing factor $Z^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}:=\sum_{S\subseteq E}w^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}(S)$ is the partition function.
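The weight (2) is easy to evaluate directly on small instances. The sketch below (our illustration; the helper names are ours) computes connected components with a plain graph search and then the random-cluster weight of a subgraph $S\subseteq E$.

```python
from math import prod

def connected_components(vertices, edges):
    """Connected components of the graph (vertices, edges), via iterative DFS."""
    adj = {v: [] for v in vertices}
    for (u, v) in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(w for w in adj[u] if w not in comp)
        seen |= comp
        comps.append(comp)
    return comps

def rc_weight(S, V, E, p, lam):
    """Unnormalized random-cluster weight of S ⊆ E as in Eq. (2)."""
    S = set(S)
    w = 1.0
    for e in E:
        w *= p[e] if e in S else (1.0 - p[e])
    for C in connected_components(V, S):
        w *= 1.0 + prod(lam[v] for v in C)
    return w
```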

2.3 Markov chains and mixing time

Let Ω be a finite state space, and let (Xt)t0 be a Markov chain over Ω with transition matrix P. We use P to represent the Markov chain for convenience. It is well known that a finite Markov chain P converges to a unique stationary distribution μ if P is irreducible and aperiodic (see [35] for the definitions of these concepts). The mixing time of a Markov chain P is defined by

$T_{\mathrm{mix}}(\epsilon):=\max_{x_0\in\Omega}\min\{t\mid d_{\mathrm{TV}}(P^t(x_0,\cdot),\mu)<\epsilon\},$

where $d_{\mathrm{TV}}(\boldsymbol{p},\boldsymbol{q}):=\frac12\|\boldsymbol{p}-\boldsymbol{q}\|_1$ is the total variation distance.

2.3.1 Glauber dynamics

The Glauber dynamics is a canonical Markov chain for sampling from joint distributions. Let $\mu$ be a distribution over $2^E$ on a finite ground set $E$. For any $X\subseteq E$ and $u\in E$, let $\mu_u^X$ be the marginal probability of $u$ given $X$, formally defined as:

$\mu_u^X:=\frac{\mu(X\cup\{u\})}{\mu(X\cup\{u\})+\mu(X\setminus\{u\})}.$

The Glauber dynamics PGD updates a configuration XE according to the following rule:

  • Pick an element uE uniformly at random;

  • With probability $\mu_u^X$, replace $X$ with $X\cup\{u\}$; otherwise, replace $X$ with $X\setminus\{u\}$.

It is well known that μ is the unique stationary distribution of this chain [35].
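For illustration, one step of this chain can be written against an abstract marginal oracle (a sketch with hypothetical names; for the random-cluster model the oracle can be instantiated from the weights in Eq. (2)).

```python
import random

def glauber_step(X, ground_set, marginal, rng=random):
    """One Glauber update for a distribution mu over subsets of ground_set.
    `marginal(u, X)` must return mu_u^X = mu(X ∪ {u}) / (mu(X ∪ {u}) + mu(X \\ {u}))."""
    X = set(X)
    u = rng.choice(ground_set)
    if rng.random() < marginal(u, X):
        X.add(u)
    else:
        X.discard(u)
    return X
```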

2.3.2 Field dynamics

Field dynamics is a novel Markov chain introduced in [10]. Let $\mu$ be a distribution over $2^E$ on a finite ground set $E$. The field dynamics $P^{\mathrm{FD}}_\theta$ with parameter $\theta\in(0,1)$ updates a configuration $X\subseteq E$ according to the following rule:

  • Add each element $u\in E$ into a set $S$ independently with probability $\theta$;

  • Replace $X$ with a random $Y\subseteq E$ that follows the law of $\theta*\mu$ conditioned on $Y\subseteq X\cup S$, where $\theta*\mu$ denotes the distribution supported on $2^E$ and defined as:

    $\forall T\subseteq E,\quad (\theta*\mu)(T)\propto\theta^{|T|}\mu(T).$

The field dynamics can be thought of as an adaptive block dynamics, where a block of sites is chosen for updating, adapted to the current configuration. It was shown in [10] that μ is the unique stationary distribution of the field dynamics.

For the random cluster model on graph G=(V,E) with parameters 𝒑 and 𝝀, the second step of the field dynamics corresponds to:

  • Replace $X$ with a random subgraph $Y$ distributed as $\mu^{\mathrm{RC}}_{S\cup X,\boldsymbol{p}',\boldsymbol{\lambda}}$, where $\boldsymbol{p}'=\frac{\boldsymbol{p}}{\boldsymbol{p}+\theta(1-\boldsymbol{p})}$.
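A toy sequential sketch of one such transition is given below (our illustration). The denoising step is done by brute-force enumeration over subsets of $S\cup X$, so it is only meant to make the transition rule explicit; Algorithm 2 replaces it with a parallel simulation of Glauber dynamics. The `weight` argument is assumed to evaluate the random-cluster weight (2), e.g. the `rc_weight` helper sketched in Section 2.2.

```python
import random
from itertools import combinations

def field_dynamics_step_rc(X, V, E, p, lam, theta, weight, rng=random):
    """One field-dynamics transition for the random-cluster model mu_{E,p,lam}:
    noising:   S contains each edge of E independently with probability theta;
    denoising: resample Y from mu_{S∪X, p', lam} with p'_e = p_e / (p_e + theta*(1-p_e)).
    Brute-force denoising, for tiny instances only."""
    S = {e for e in E if rng.random() < theta}
    U = list(S | set(X))
    p_tilt = {e: p[e] / (p[e] + theta * (1.0 - p[e])) for e in E}
    subsets, weights = [], []
    for k in range(len(U) + 1):
        for Y in combinations(U, k):
            subsets.append(set(Y))
            weights.append(weight(set(Y), V, U, p_tilt, lam))
    return rng.choices(subsets, weights=weights, k=1)[0]
```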

2.4 Model of computation

We assume the concurrent-read exclusive-write (𝖢𝖱𝖢𝖶) parallel random access machine (𝖯𝖱𝖠𝖬) [27] as our model of computation, where concurrent writes to the same memory location are allowed, and an arbitrary value written concurrently is stored. The computational complexity is measured by the number of rounds (parallel time steps) and the number of processors (or machines) used in the worst case.

We use 𝖭𝖢 to refer to both the class of algorithms that run in polylogarithmic time using a polynomial number of processors and the class of problems solvable by such algorithms. 𝖱𝖭𝖢 denotes the randomized counterpart of 𝖭𝖢.

3 Parallel random-cluster and Ising samplers via field dynamics

We propose a parallel algorithm for sampling from the random-cluster model, based on the simulation of field dynamics. Consider a random-cluster model defined on a graph $G=(V,E)$ with edge probabilities $\boldsymbol{p}\in(0,1]^E$ and vertex weights $\boldsymbol{\lambda}\in[0,1)^V$.

The main algorithm is described in Algorithm 1.

Algorithm 1 Random cluster field dynamics.

Algorithm 1 simulates the field dynamics (defined in Section 2.3.2), the Markov chain induced by the negative-field localization process for the random cluster model $\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$. At each transition step, the algorithm samples from $\mu^{\mathrm{RC}}_{S\cup X,\boldsymbol{p}',\boldsymbol{\lambda}}$, where $\boldsymbol{p}'=\frac{\boldsymbol{p}}{\boldsymbol{p}+\theta(1-\boldsymbol{p})}$. This step is implemented using the $\mathsf{parallel\text{-}RC}$ subroutine, described in Algorithm 2.

Table 1: Parameters assumed by Algorithm 1.
parameter | value
$\theta$ | $e^{-100}\exp\!\big(\frac{10\log(\epsilon/2)}{\log n}\big)\exp\!\big(-\frac{140}{(1-\lambda_{\max})^2}\big)\cdot p_{\min}/\log n$
$N_0$ | $\max\Big\{\exp\!\big(\frac{60}{(1-\lambda_{\max})^2}\big),\ \frac{3}{p_{\min}},\ \sqrt{\frac{\log(2/\epsilon^2)}{\log n}}\Big\}$
$T_{\mathrm{FD}}$ | $\big(\frac{e}{\theta}\big)^{\frac{5}{(1-\lambda_{\max})^2}}\big(2\log n+\log\log\frac{2}{p_{\min}}+\log\frac{2}{\epsilon^2}\big)$
$T_{\mathrm{GD}}$ | $2m\big(\log m+\log(8T_{\mathrm{FD}}/\epsilon)\big)$
$T_{\mathrm{PA}}$ | $3\log\big(\frac{8T_{\mathrm{GD}}T_{\mathrm{FD}}}{\epsilon}\big)$

The subroutine $\mathsf{parallel\text{-}RC}(G,\boldsymbol{p},\boldsymbol{\lambda},\epsilon)$, as presented in Algorithm 2, is a parallel algorithm designed for sampling from the random cluster distribution $\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$. It simulates the Glauber dynamics for $\mu=\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$ through a parallel simulation approach introduced in [36], using correlated sampling, also known as universal coupling. Specifically, Algorithm 2 employs inverse transform sampling as the universal coupler. However, prior analyses in [36, 37] rely on a variant of the Dobrushin condition, which does not hold in our context, and thus overcoming this barrier requires novel techniques.
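Algorithm 2 itself is not reproduced in this text, so the following purely sequential Python emulation is only a schematic of the correlated-sampling idea (hypothetical names; initialization and indexing details are simplified relative to the actual algorithm): all update sites and uniform random numbers are drawn up front, and each of the $T_{\mathrm{PA}}$ rounds re-evaluates every update with the inverse-transform rule against the previous round's guesses.

```python
import random

def parallel_simulation_of_glauber(E, X0, marginal, T_GD, T_PA, rng=random):
    """Sequential emulation of T_PA rounds of synchronous re-evaluation of a
    T_GD-step Glauber trajectory.  `marginal(e, sigma)` is an oracle for mu_e^sigma;
    in the real algorithm all i are processed in parallel within a round."""
    E = list(E)
    sites = [rng.choice(E) for _ in range(T_GD)]        # e_1, ..., e_{T_GD}
    unis = [rng.random() for _ in range(T_GD)]          # r_1, ..., r_{T_GD}
    # Y[i] in {0, 1}: current guess for the outcome of update i+1
    Y = [1 if sites[i] in X0 else 0 for i in range(T_GD)]

    def guessed_state(num_prev):
        """Configuration after the first num_prev updates, under the current guesses."""
        latest = {e: (1 if e in X0 else 0) for e in E}
        for j in range(num_prev):
            latest[sites[j]] = Y[j]
        return {e for e, bit in latest.items() if bit}

    for _ in range(T_PA):
        # Inverse-transform coupler: update i changes its decision only if the
        # marginal computed from the new guessed state crosses the fixed uniform r_i.
        Y = [1 if unis[i] < marginal(sites[i], guessed_state(i)) else 0
             for i in range(T_GD)]
    return guessed_state(T_GD)
```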

Let n=|V| and m=|E|. The parameters θ, N0, TFD, TGD, and TPA assumed by the algorithms are specified in Table 1. We will show that the algorithm is both efficient and correct under these parameters.

Algorithm 2 𝗉𝖺𝗋𝖺𝗅𝗅𝖾𝗅-𝖱𝖢(G,𝐩,𝝀,ϵ).

Specifically, we show that if the random cluster model is in the low-temperature regime (i.e., corresponding to an Ising model with large β>1), then it holds simultaneously:

  1. 1.

    The Glauber dynamics gets sufficiently close to μE,𝐩,𝝀RC within TGD steps.

  2. 2.

    The parallel iterative updates (the for loop in Line 9 of Algorithm 2) stabilize in TPA rounds, faithfully simulating the TGD-step Glauber dynamics with high probability.

Therefore, Algorithm 2 is an efficient parallel sampler in the low-temperature regime.

As a result of the localization scheme, the field dynamics in Algorithm 1 “tilts” the current instance μE,𝐩,𝝀RC to a new random cluster distribution μSX,𝐩,𝝀RC at low temperature in each round. Within this regime, the Glauber dynamics is rapidly mixing and can be faithfully simulated by Algorithm 2 in parallel. The field dynamics mixes in polylogarithmic rounds according to [12].

Combining everything together, we obtain the following theorem for parallel sampler.

Theorem 5.

Let $\delta\in(0,1)$ be a constant, and let $\boldsymbol{p}\in[\delta,1)^E$ and $\boldsymbol{\lambda}\in[0,1-\delta]^V$. Then:

  1. 1.

    Algorithm 1 halts in $\left(\epsilon^{-1/\log n}\cdot\log n\right)^{O_\delta(1)}$ parallel time using $O_\delta\!\left(m^2\log\frac{n}{\epsilon}\right)$ processors.

  2. 2.

    Algorithm 1 outputs a random $X\subseteq E$ with $d_{\mathrm{TV}}(X,\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}})\le\epsilon$.

Our main theorem for the random cluster sampler (Theorem 2) follows directly from Theorem 5. Our main theorem for the Ising sampler (Theorem 1) follows from Theorem 5 as well, through the Edwards-Sokal coupling [16], which connects the distributions of the Ising and random cluster models. For our purposes, we use a variant of the ES coupling from [19].

Lemma 6 ([19, Proposition 2.3]).

For the Ising model on a graph $G=(V,E)$ with parameters $\boldsymbol{\beta}\in(1,+\infty)^E$ and $\boldsymbol{\lambda}\in(0,1)^V$, the following process generates a sample $X\sim\mu^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}}$:

  1. 1.

    Sample $S\subseteq E$ according to the distribution $\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$, where $\boldsymbol{p}=1-\boldsymbol{\beta}^{-1}$;

  2. 2.

    For each connected component $C$ in the graph $(V,S)$, add $C$ to $X$ with probability $\frac{\prod_{u\in C}\lambda_u}{1+\prod_{u\in C}\lambda_u}$.

Furthermore, the ES coupling can be efficiently implemented in parallel.

Lemma 7.

Item 2 of Lemma 6 can be computed in O(logn) parallel time using O(m) processors.

Proof.

Use the connected component algorithm in [40], to compute all components of the graph G=(V,S) in O(logn) parallel time using O(n+m) processors. For each connected component C, the product uCλu can be computed in O(log|C|) parallel time using O(|C|) processors by divide-and-conquer. The overall complexity is O(logn) parallel time using O(m) processors.
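A sequential sketch of Item 2 of Lemma 6 is given below (our illustration; a small union-find stands in for the parallel connected-components algorithm of [40], and the function name is hypothetical).

```python
import random

def edwards_sokal_lift(S, V, lam, rng=random):
    """Given a random-cluster sample S ⊆ E over vertex set V, include each connected
    component C of (V, S) in the Ising sample independently with probability
    (prod_{u in C} lam_u) / (1 + prod_{u in C} lam_u)."""
    parent = {v: v for v in V}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for (u, v) in S:
        parent[find(u)] = find(v)
    components = {}
    for v in V:
        components.setdefault(find(v), set()).add(v)

    X = set()
    for C in components.values():
        w = 1.0
        for u in C:
            w *= lam[u]
        if rng.random() < w / (1.0 + w):
            X |= C
    return X
```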

Proof of Theorem 1.

By Lemma 6, a sample from μ𝜷,𝝀Ising can be generated as follows:

  1. 1.

    Generate a random subset $S\subseteq E$ using Algorithm 1, where $\boldsymbol{p}=1-\boldsymbol{\beta}^{-1}$;

  2. 2.

    Apply Item 2 of Lemma 6 to construct X.

By Theorem 5, we have $d_{\mathrm{TV}}(S,\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}})\le\epsilon$, which, combined with Lemma 6, ensures that $d_{\mathrm{TV}}(X,\mu^{\mathrm{Ising}}_{\boldsymbol{\beta},\boldsymbol{\lambda}})\le\epsilon$. The efficiency follows from Theorem 5 and Lemma 7.

It remains to prove Theorem 5. The key steps of the proof are outlined in Section 4, while the full proof is deferred to Appendix C.

4 A coupling-with-stationary criterion for parallel simulation

In this section, we outline the key ideas underlying the proof of Theorem 5.

Consider the random cluster model on $G=(V,E)$ with parameters $\boldsymbol{p}\in(0,1]^E$ and $\boldsymbol{\lambda}\in[0,1)^V$. The distribution $\mu=\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$ is defined over subgraphs $S\subseteq E$ and is proportional to the weight function $w=w^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$, given by:

$\mu(S)\propto w(S):=\prod_{e\in S}p_e\prod_{f\in E\setminus S}(1-p_f)\prod_{C\in\kappa(V,S)}\Big(1+\prod_{j\in C}\lambda_j\Big),$

where κ(V,S) denotes the set of all connected components in the graph (V,S).

Consider the Glauber dynamics $(X_t)_{t\ge0}$ for sampling from $\mu$, starting from an initial state $X_0\subseteq E$. At each step, the process updates $X_{i-1}$ to $X_i$ as follows: Select an edge $e_i\in E$ and a real number $r_i\in[0,1]$ uniformly at random, and update $X_i$ according to:

$X_i=\begin{cases} X_{i-1}\cup\{e_i\} & \text{if } r_i<\mu_{e_i}^{X_{i-1}},\\ X_{i-1}\setminus\{e_i\} & \text{otherwise,}\end{cases} \qquad (3)$

where $\mu_{e_i}^{X_{i-1}}$ represents the marginal probability of $e_i$ given $X_{i-1}$, formally defined as

$\mu_e^X:=\frac{\mu(X\cup\{e\})}{\mu(X\cup\{e\})+\mu(X\setminus\{e\})}=\frac{w(X\cup\{e\})}{w(X\cup\{e\})+w(X\setminus\{e\})}.$

It is straightforward to verify that (Xt)t0 is the Glauber dynamics defined in Section 2.3.1.
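For a single edge, the marginal used in (3) can be evaluated from two weight evaluations; the sketch below (our illustration) takes an abstract `weight` function, e.g. the `rc_weight` helper from Section 2.2, whereas the paper computes the same quantity with a parallel connected-components algorithm.

```python
def rc_edge_marginal(e, X, weight):
    """mu_e^X = w(X ∪ {e}) / (w(X ∪ {e}) + w(X \\ {e})), where `weight` evaluates
    the unnormalized weight w of a subgraph."""
    X = set(X)
    w_in = weight(X | {e})
    w_out = weight(X - {e})
    return w_in / (w_in + w_out)

# Usage sketch: rc_edge_marginal(e, X, lambda S: rc_weight(S, V, E, p, lam))
```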

We will show that Algorithm 2 faithfully simulates the Glauber dynamics, provided that a “coupling with stationarity” condition holds, as formally stated below.

Condition 1.

Let $\eta\in(0,1)$ be a parameter. There is a $\mathcal{G}\subseteq2^E$ satisfying the following conditions:

  1. 1.

    For any $X,Y\in\mathcal{G}$, the total variation in marginal probabilities across edges satisfies

    $\sum_{e\in E}\left|\mu_e^X-\mu_e^Y\right|\le\frac{|X\oplus Y|}{2}.$
  2. 2.

    For any $S:E\to2^E$, consider the random configuration $\sigma\subseteq E$ generated by including each edge $e\in E$ independently with probability $\mu_e^{S(e)}$. Then, we have

    $\Pr[\sigma\notin\mathcal{G}]\le\eta.$

Condition 1 describes a “coupling with stationary” style criterion, akin to the one introduced in [25]. This criterion significantly relaxes the Dobrushin condition used in [36, 37] for ensuring the efficiency of parallel simulation, such as Algorithm 2. Rather than requiring distance decay between all pairs of configurations, it only enforces distance decay in expectation between good configurations X,Y𝒢 within the one-step optimal coupling of Glauber dynamics, while ensuring that these good configurations appear frequently throughout the dynamics.

We now state the main theorem of this section.

Theorem 8.

If Condition 1 holds with parameter $\eta$, then for any $T_{\mathrm{GD}},T_{\mathrm{PA}}\ge1$, the output $X_{\mathrm{ALG}}$ of Algorithm 2 and the Glauber dynamics $(X_t)_{0\le t\le T_{\mathrm{GD}}}$ defined in (3), with initial state $X_0\subseteq E$ generated as in Line 1 of Algorithm 2, satisfy the following bound on the total variation distance:

$d_{\mathrm{TV}}(X_{\mathrm{ALG}},X_{T_{\mathrm{GD}}})\le T_{\mathrm{GD}}\left(2^{-(T_{\mathrm{PA}}-1)}+4\eta\right).$
 Remark 9 (generality of the criterion).

Note that Algorithm 2, Condition 1, and Theorem 8 are stated abstractly for Glauber dynamics applied to a general joint distribution μ over variables with a Boolean domain. Therefore, these results are applicable to any such distribution.

The following theorem is an application of Theorem 8 to low-temperature random cluster models, establishing that Algorithm 2 is both accurate and efficient in this regime.

Theorem 10.

Let $G=(V,E)$, $\boldsymbol{p},\boldsymbol{\lambda}$, and $\epsilon$ be the input to Algorithm 2. Assume that $n=|V|\ge3$, $\epsilon\le\frac12$, and the following condition holds:

$(1-p_{\min})\log n \le \min\left\{e^{-40}\exp\!\left(-\frac{5\log(1/\epsilon)}{\log n}\right),\ \frac{1-\lambda_{\max}}{27}\right\}.$

Then, for any $T_{\mathrm{GD}}\ge 2m(\log m+\log(4/\epsilon))$ and $T_{\mathrm{PA}}\ge3\log(4T_{\mathrm{GD}}/\epsilon)$, the output $X_{\mathrm{ALG}}$ of Algorithm 2 satisfies

$d_{\mathrm{TV}}(X_{\mathrm{ALG}},\mu)\le\epsilon. \qquad (4)$

Theorem 10 is a key step in establishing the accuracy and efficiency of Algorithm 1. Under the parameterization given in Table 1, each instance of the random cluster model passed to Algorithm 2 within Algorithm 1 falls within the low-temperature regime required by Theorem 10. The theorem holds because these low-temperature random cluster models satisfy Condition 1.

The proofs of Theorem 8 and Theorem 10 are deferred to Appendix A and Appendix B, respectively. Finally, the proof of Theorem 5 is completed in Appendix C.

5 Parallel Ising sampler via restricted Gaussian dynamics

We introduce a parallel sampler for the Ising model with contracting interaction norm. Our algorithm approximately implements the following restricted Gaussian dynamics (also known as the proximal sampler, described in Algorithm 3), the Markov chain induced by the stochastic localization process for Ising Gibbs sampling.

Algorithm 3 Restricted Gaussian Dynamics.

The efficient implementation of Algorithm 3 follows from the following observations:

  1. 1.

    Suppose $0\preceq\frac{\eta}{2}I\preceq J\preceq(1-\frac{\eta}{2})I$. Given an error bound $0<\epsilon_0<1$, the Gaussian sampling step (Line 3) of Algorithm 3 can be approximated within total variation error $\epsilon_0$. Specifically, there exists an algorithm that, given $x_{i-1}$ and $J^{-1}$, produces a sample from a distribution $\pi$ satisfying $d_{\mathrm{TV}}(\pi,\mathcal{N}(x_{i-1},J^{-1}))\le\epsilon_0$. This approximation can be achieved in $O_\eta(\log^3(n/\epsilon_0^2))$ parallel time using $\widetilde{O}(n^3/\epsilon_0^2)$ processors.

  2. 2.

    The denoising step (Line 4) of Algorithm 3 involves sampling from a product distribution whose marginals can be computed in O(logn) parallel time. This step can be implemented faithfully with no error using O(logn) parallel time.

  3. 3.

    Assuming perfect simulations of Line 3 and Line 4 in Algorithm 3, the total variation distance between the output X and the target distribution μJ,hIsing does not exceed ϵ/2 after TRGD=Oη(log(n/ϵ)) iterations of the outer loop in Algorithm 3.

The detailed analysis of Item 1, Item 2, and Item 3 is provided in Appendix D.
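Algorithm 3 is not reproduced in this text; the toy sketch below (our illustration, with exact Gaussian sampling via numpy in place of the parallel Langevin subroutine of Item 1) shows one idealized iteration under the convention $\mu(x)\propto\exp(\frac12 x^\top Jx + h^\top x)$: a Gaussian noising step with covariance $J^{-1}$ followed by a coordinatewise (product) denoising step.

```python
import numpy as np

def rgd_step(x, J, h, rng):
    """One idealized restricted-Gaussian-dynamics iteration:
    noising:   y ~ N(x, J^{-1});
    denoising: resample x' coordinatewise with Pr[x'_v = +1] = sigmoid(2 (J y + h)_v)."""
    y = rng.multivariate_normal(mean=x, cov=np.linalg.inv(J))
    probs = 1.0 / (1.0 + np.exp(-2.0 * (J @ y + h)))
    return np.where(rng.random(len(h)) < probs, 1.0, -1.0)

# Toy usage with a symmetric matrix whose eigenvalues lie strictly inside (0, 1).
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
M = (A + A.T) / 2
M = 0.4 * M / np.linalg.norm(M, 2)          # symmetric, operator norm 0.4
J = 0.5 * np.eye(n) + M                     # eigenvalues in [0.1, 0.9]
h = 0.1 * rng.standard_normal(n)
x = np.ones(n)
for _ in range(50):
    x = rgd_step(x, J, h, rng)
print(x)
```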

Assuming these properties hold, we now proceed with the proof of Theorem 3.

Proof of Theorem 3.

Let $X$ denote the output generated by Algorithm 3. Under the assumption that the entire algorithm can be perfectly simulated, Item 3 ensures that the total variation distance between $X$ and the target distribution $\mu^{\mathrm{Ising}}_{J,h}$ does not exceed $\epsilon/2$. Now, we implement Line 3 of Algorithm 3 using the approximate oracle provided by Item 1, setting the parameter $\epsilon_0=\epsilon/(2T_{\mathrm{RGD}})$. Let $Y$ denote the output produced by our implementation of Algorithm 3. By a simple coupling argument, we obtain $d_{\mathrm{TV}}(X,Y)\le T_{\mathrm{RGD}}\cdot\epsilon_0=\epsilon/2$. Applying the triangle inequality, we have $d_{\mathrm{TV}}(Y,\mu^{\mathrm{Ising}}_{J,h})\le d_{\mathrm{TV}}(X,\mu^{\mathrm{Ising}}_{J,h})+d_{\mathrm{TV}}(X,Y)\le\epsilon$. Thus, our implementation produces an approximate sample within $O_\eta(\log^4(n/\epsilon))$ parallel time using $\widetilde{O}_\eta(n^3/\epsilon^2)$ processors, as guaranteed by Item 1, Item 2, and Item 3.

References

  • [1] Nima Anari, Callum Burgess, Kevin Tian, and Thuy-Duong Vuong. Quadratic speedups in parallel sampling from determinantal distributions. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2023), pages 367–377. ACM, 2023. doi:10.1145/3558481.3591104.
  • [2] Nima Anari, Sinho Chewi, and Thuy-Duong Vuong. Fast parallel sampling under isoperimetry. In The Thirty Seventh Annual Conference on Learning Theory (ICML 2024), volume 247 of Proceedings of Machine Learning Research, pages 161–185. PMLR, 2024. URL: https://proceedings.mlr.press/v247/anari24a.html.
  • [3] Nima Anari, Ruiquan Gao, and Aviad Rubinstein. Parallel sampling via counting. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC 2024), pages 537–548. ACM, 2024. doi:10.1145/3618260.3649744.
  • [4] Nima Anari, Nathan Hu, Amin Saberi, and Aaron Schild. Sampling arborescences in parallel. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021), volume 185 of LIPIcs, pages 83:1–83:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.ITCS.2021.83.
  • [5] Nima Anari, Yizhi Huang, Tianyu Liu, Thuy-Duong Vuong, Brian Xu, and Katherine Yu. Parallel discrete sampling via continuous walks. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC 2023), pages 103–116. ACM, 2023. doi:10.1145/3564246.3585207.
  • [6] Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong. Entropic independence I: Modified log-Sobolev inequalities for fractionally log-concave distributions and high-temperature Ising models. arXiv preprint, 2021. arXiv:2106.04105.
  • [7] Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong. Entropic independence: optimal mixing of down-up random walks. In Proceedings of the 54th Annual ACM Symposium on Theory of Computing (STOC 2022), pages 1418–1430. ACM, 2022. doi:10.1145/3519935.3520048.
  • [8] Nima Anari, Frederic Koehler, and Thuy-Duong Vuong. Trickle-down in localization schemes and applications. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC 2024), pages 1094–1105. ACM, 2024. doi:10.1145/3618260.3649622.
  • [9] Xiaoyu Chen, Zongchen Chen, Yitong Yin, and Xinyuan Zhang. Rapid mixing at the uniqueness threshold. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC 2025), pages 879–890. ACM, 2025. doi:10.1145/3717823.3718260.
  • [10] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Rapid mixing of Glauber dynamics via spectral independence for all degrees. In 62nd IEEE Annual Symposium on Foundations of Computer Science (FOCS 2021), pages 137–148. IEEE, 2021. doi:10.1109/FOCS52979.2021.00022.
  • [11] Xiaoyu Chen, Weiming Feng, Yitong Yin, and Xinyuan Zhang. Optimal mixing for two-state anti-ferromagnetic spin systems. In 63rd IEEE Annual Symposium on Foundations of Computer Science (FOCS 2022), pages 588–599. IEEE, 2022. doi:10.1109/FOCS54457.2022.00062.
  • [12] Xiaoyu Chen and Xinyuan Zhang. A near-linear time sampler for the Ising model with external field. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms (SODA 2023), pages 4478–4503. SIAM, 2023. doi:10.1137/1.9781611977554.CH170.
  • [13] Yuansi Chen and Ronen Eldan. Localization schemes: A framework for proving mixing bounds for markov chains (extended abstract). In 63rd IEEE Annual Symposium on Foundations of Computer Science (FOCS 2022), pages 110–122. IEEE, 2022. doi:10.1109/FOCS54457.2022.00018.
  • [14] Richard Cole. Parallel merge sort. SIAM J. Comput., 17(4):770–785, 1988. doi:10.1137/0217049.
  • [15] Martin Dyer, Alan Frieze, and Ravi Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM (JACM), 38(1):1–17, 1991. doi:10.1145/102782.102783.
  • [16] Robert G. Edwards and Alan D. Sokal. Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D (3), 38(6):2009–2012, 1988.
  • [17] Ahmed El Alaoui and Andrea Montanari. An information-theoretic view of stochastic localization. IEEE Trans. Inform. Theory, 68(11):7423–7426, 2022. doi:10.1109/TIT.2022.3180298.
  • [18] Ronen Eldan, Frederic Koehler, and Ofer Zeitouni. A spectral condition for spectral gap: fast mixing in high-temperature Ising models. Probability Theory and Related Fields, pages 1–17, 2021. doi:10.1007/s00440-021-01085-x.
  • [19] Weiming Feng, Heng Guo, and Jiaheng Wang. Swendsen-wang dynamics for the ferromagnetic ising model with external fields. Inf. Comput., 294:105066, 2023. doi:10.1016/J.IC.2023.105066.
  • [20] Weiming Feng, Thomas P. Hayes, and Yitong Yin. Distributed Metropolis sampler with optimal parallelism. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA 2021), pages 2121–2140. SIAM, 2021. doi:10.1137/1.9781611976465.127.
  • [21] Weiming Feng, Yuxin Sun, and Yitong Yin. What can be sampled locally? Distributed Comput., 33(3-4):227–253, 2020. doi:10.1007/S00446-018-0332-8.
  • [22] Weiming Feng, Nisheeth K. Vishnoi, and Yitong Yin. Dynamic sampling from graphical models. SIAM J. Comput., 50(2):350–381, 2021. doi:10.1137/20M1315099.
  • [23] Manuela Fischer and Mohsen Ghaffari. A simple parallel and distributed sampling technique: Local glauber dynamics. In 32nd International Symposium on Distributed Computing (DC 2018), volume 121 of LIPIcs, pages 26:1–26:11. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.DISC.2018.26.
  • [24] Andreas Galanis, Alkis Kalavasis, and Anthimos Vardis Kandiros. On sampling from Ising models with spectral constraints. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024), volume 317 of LIPIcs, pages 70:1–70:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.APPROX/RANDOM.2024.70.
  • [25] Thomas P. Hayes and Eric Vigoda. Coupling with the stationary distribution and improved sampling for colorings and independent sets. Ann. Appl. Probab., 16(3):1297–1318, 2006. doi:10.1214/105051606000000330.
  • [26] Ernst Ising. Beitrag zur theorie des ferro-und paramagnetismus. PhD thesis, Grefe & Tiedemann Hamburg, 1924.
  • [27] Joseph JáJá. An Introduction to Parallel Algorithms. Addison-Wesley, Reading, MA, 1992.
  • [28] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM J. Comput., 18(6):1149–1178, 1989. doi:10.1137/0218077.
  • [29] Mark Jerrum and Alistair Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM J. Comput., 22(5):1087–1116, 1993. doi:10.1137/0222066.
  • [30] Mark Jerrum, Alistair Sinclair, and Eric Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. Journal of the ACM (JACM), 51(4):671–697, 2004. doi:10.1145/1008731.1008738.
  • [31] Dmitriy Kunisky. Optimality of Glauber dynamics for general-purpose Ising model sampling and free energy approximation. In Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms (SODA 2024), pages 5013–5028. SIAM, 2024. doi:10.1137/1.9781611977912.180.
  • [32] Holden Lee. Parallelising glauber dynamics. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024), volume 317 of LIPIcs, pages 49:1–49:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.APPROX/RANDOM.2024.49.
  • [33] Yin Tat Lee, Ruoqi Shen, and Kevin Tian. Structured logconcave sampling with a restricted Gaussian oracle. In The Annual Conference on Learning Theory (COLT 2021), pages 2993–3050, 2021. URL: http://proceedings.mlr.press/v134/lee21a.html.
  • [34] David A. Levin, Malwina J. Luczak, and Yuval Peres. Glauber dynamics for the mean-field Ising model: cut-off, critical power law, and metastability. Probab. Theory Related Fields, 146(1-2):223–265, 2010. doi:10.1007/s00440-008-0189-z.
  • [35] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2017.
  • [36] Hongyang Liu and Yitong Yin. Simple parallel algorithms for single-site dynamics. In STOC, pages 1431–1444. ACM, 2022. doi:10.1145/3519935.3519999.
  • [37] Hongyang Liu and Yitong Yin. Parallelize single-site dynamics up to dobrushin criterion. J. ACM, 72(1), January 2025. doi:10.1145/3708558.
  • [38] Hongyang Liu, Yitong Yin, and Yiyao Zhang. Work-efficient parallel counting via sampling. arXiv preprint arXiv:2408.09719, 2024. doi:10.48550/arXiv.2408.09719.
  • [39] Ketan Mulmuley, Umesh V Vazirani, and Vijay V Vazirani. Matching is as easy as matrix inversion. In STOC, pages 345–354. ACM, 1987. doi:10.1145/28395.383347.
  • [40] Yossi Shiloach and Uzi Vishkin. An O(logn) parallel connectivity algorithm. J. Algorithms, 3(1):57–67, 1982. doi:10.1016/0196-6774(82)90008-6.
  • [41] Shang-Hua Teng. Independent sets versus perfect matchings. Theoret. Comput. Sci., 145(1-2):381–390, 1995. doi:10.1016/0304-3975(94)00289-U.

Appendix A Parallel simulation of Glauber dynamics (Proof of Theorem 8)

We assume that Condition 1 holds for a given parameter η[0,1].

Let $Y^\ell=(Y_i^\ell)_{1\le i\le T_{\mathrm{GD}}}$, where $Y_i^\ell$ is constructed in Line 9 of Algorithm 2 and indicates the result of the $i$-th update in the $\ell$-th iteration. Note that the $Y^\ell$ for $1\le\ell\le T_{\mathrm{PA}}$ are determined by the random choices $X$ and $(e_i,r_i)_{i=1}^{T_{\mathrm{GD}}}$ generated in Line 1 and Line 3 of Algorithm 2, respectively. The next lemma shows that $Y^\ell$ stabilizes fast with high probability.

Lemma 11.

Given parameters $T_{\mathrm{PA}},T_{\mathrm{GD}}\ge1$, for any $1\le\ell\le T_{\mathrm{PA}}$, it holds that

$\Pr\left[Y^\ell\ne Y^{\ell-1}\right]\le\left(2^{-(\ell-1)}+4\eta\right)T_{\mathrm{GD}}. \qquad (5)$
Proof.

For $1\le i\le T_{\mathrm{GD}}$, define

$\mathrm{Pred}_i := \{\mathrm{pred}_i(e) : e\in E\}\setminus\{0\}.$

Intuitively, $\mathrm{Pred}_i$ is the set of earlier updates that have an impact on the result of the $i$-th update.

We claim that for $\ell\ge2$ and $1\le i\le T_{\mathrm{GD}}$,

$\Pr\left[Y_i^\ell\ne Y_i^{\ell-1}\right]\le\left(\frac{1}{2m}\sum_{j\in\mathrm{Pred}_i}\Pr\left[Y_j^{\ell-1}\ne Y_j^{\ell-2}\right]\right)+2\eta. \qquad (6)$

Assuming (6), the following can be proved by induction on $1\le\ell\le T_{\mathrm{PA}}$:

$\Pr\left[Y_i^\ell\ne Y_i^{\ell-1}\right]\le 2^{-(\ell-1)}+4\eta. \qquad (7)$

For the induction basis $\ell=1$, (7) holds trivially since $2^{-(\ell-1)}+4\eta\ge1$. We now assume (7) holds for $\ell-1\ge1$. By (6) and the induction hypothesis,

$\Pr\left[Y_i^\ell\ne Y_i^{\ell-1}\right]\le\frac{|\mathrm{Pred}_i|}{2m}\left(2^{-(\ell-2)}+4\eta\right)+2\eta\le 2^{-(\ell-1)}+4\eta,$

where the last inequality uses $|\mathrm{Pred}_i|\le m$.

Lemma 11 follows from (7) by taking the union bound over all 1iTGD.

It remains to prove the claim (6), which completes the proof of Lemma 11.

Let <i denote the random choices used by the first i1 updates, formally

<i:=(ej,j)1j<i.

Recall the rules for Yi being updated in Line 13. The probability of YiYi1 conditioning on <i and ei is given by

𝐏𝐫[YiYi1<i,ei] =|μeiσiμeiσi1|, (8)

where μeS=μ(S{e})μ(S{e})+μ(S{e}) is the marginal probability.

Let 𝒢2E be the good event in Condition 1. When σi,σi1𝒢, by Condition 1,

𝐏𝐫[YiYi1<i]=1meE|μeσiμeσi1||σiσi1|2m. (9)

Recall the construction of σi in Line 11. Configurations σi and σi1 differ at site eE if and only if predi(e)>1 and Ypredi(e)1Ypredi(e)2. Note that for different e1,e2E with predi(e1),predi(e2)>0, it holds that predi(e1)predi(e2). Therefore, |σiσi1| can be bounded by

|σiσi1|jPrediI[Yj1Yj2]. (10)

Combining (9), (10), and by the law of total probability, we have

𝐏𝐫[YiYi1]12mjPredi𝐏𝐫[Yj1Yj2]+𝐏𝐫[(σi𝒢)(σi1𝒢)]. (11)

Now, fix any 1tTPA, 1iTGD. By the definition of σit and the updating rule of Yit,

eE,I[eσit]={Ypredi(e)t1,t>1predi(e)1,I[eX],otherwise.

It is straightforward to verify that, in Algorithm 2, Ypredi(e)t1 and I[eX] are independently generated according to μeσpredi(e)t1 and μeE, respectively. We can define the function S:E2E as

eE,S(e):={σpredi(e)t1,t>1predi(e)1,E,otherwise.

Note that σit is generated by the same rule as Item 2 of Condition 1 using this S. This, according to Item 2 of Condition 1, implies that 𝐏𝐫[σit𝒢]η. Note that i,t are fixed arbitrarily. By a union bound, we have

𝐏𝐫[(σi𝒢)(σi1𝒢)]𝐏𝐫[σi𝒢]+𝐏𝐫[σi1𝒢]2η. (12)

Therefore, (6) holds. This completes the proof of Lemma 11.

Theorem 8 is a corollary of Lemma 11.

Proof of Theorem 8.

We couple XALG and XTGD by using the same random choices X and (ei,i)i=1TGD in the two processes. Let (Zi)i=0TGD be the sequence of configurations defined by

Zi:={eE(ji(e)=0eX)(ji(e)1Yji(e)TPA=1)},

where ji(e)=max{j1jiej=e}{0}. Recall the rules for Yi being updated in Line 13, and the construction of σi in Line 11. It holds that for all 1iTGD,

Zi={Zi1{ei}i<μ(σiTPA{ei})μ(σiTPA{ei})+μ(σiTPA{ei}),Zi1{ei}otherwise. (13)

Now assume that YTPA=YTPA1. For all 1iTGD, Zi1 satisfies

Zi1 ={eE(ji1(e)=0eX)(ji1(e)1Yji1(e)TPA=1)}
={eE(predi(e)=0eX)(predi(e)1Ypredi(e)TPA=1)}
={eE(predi(e)=0eX)(predi(e)1Ypredi(e)TPA1=1)}=σiTPA.

Therefore, the σiTPA in (13) can be replaced with Zi, and hence the sequence (Zi)i=0TGD is identical to (Xi)i=0TGD because the two processes have the same transition rules (13) and (3) and initial state Z0=X0. Hence,

𝐏𝐫[XALGXTGD]𝐏𝐫[YTPAYTPA1](2(TPA1)+4η)TGD,

where the last inequality follows from Lemma 11. This concludes the proof.

Appendix B Low-temperature random clusters (Proof of Theorem 10)

Recall that $G=(V,E)$, $\boldsymbol{p},\boldsymbol{\lambda}$, and $\epsilon$ are the input of Algorithm 2, where it holds that $n=|V|\ge3$, $\epsilon\le1/2$, and

$(1-p_{\min})\log n \le \min\left\{e^{-40}\exp\!\left(-\frac{5\log(1/\epsilon)}{\log n}\right),\ \frac{1-\lambda_{\max}}{27}\right\}.$

Furthermore, $T_{\mathrm{GD}}\ge 2m(\log m+\log(4/\epsilon))$ and $T_{\mathrm{PA}}\ge 3\log(4T_{\mathrm{GD}}/\epsilon)$ are parameters used in Algorithm 2. These assumptions will be used throughout.

First, under the above assumption, it is known that the Glauber dynamics for the distribution μ=μE,𝒑,𝝀RC of the random cluster model is rapidly mixing.

Lemma 12 ([12, Theorem 5.1]).

Let (Xi)i=0TGD be the Glauber dynamics on the distribution μ with the initial state X0 generated as in Line 1 of Algorithm 2, then it holds that

$d_{\mathrm{TV}}(X_{T_{\mathrm{GD}}},\mu)\le\epsilon/2.$
 Remark 13.

In [12], the initial state of the Glauber dynamics was set to $E$ instead of the random $X_0$. However, the proof of [12, Theorem 5.1] holds so long as the initial state follows the law of a product distribution whose marginal probabilities are bounded from below by $1-3(1-p_{\min})\log n$. By the argument in [12, Lemma 5.5], this is satisfied by the random initial state $X_0$ in Lemma 12.

We are now ready to prove Theorem 10.

Proof of Theorem 10.

We first prove Theorem 10 by assuming Condition 1 with parameter $\eta=\frac{\epsilon}{16T_{\mathrm{GD}}}$. By Theorem 8 and the assumption on the parameters,

$d_{\mathrm{TV}}(X_{\mathrm{ALG}},X_{T_{\mathrm{GD}}})\le 2T_{\mathrm{GD}}\cdot 2^{-3\log(4T_{\mathrm{GD}}/\epsilon)}+4T_{\mathrm{GD}}\cdot\frac{\epsilon}{16T_{\mathrm{GD}}}\le\epsilon/2.$

Therefore, by Lemma 12 and the triangle inequality, it holds that

$d_{\mathrm{TV}}(X_{\mathrm{ALG}},\mu)\le d_{\mathrm{TV}}(X_{\mathrm{ALG}},X_{T_{\mathrm{GD}}})+d_{\mathrm{TV}}(X_{T_{\mathrm{GD}}},\mu)\le\epsilon.$

The remaining proof is dedicated to verifying Condition 1 with parameter $\eta=\frac{\epsilon}{16T_{\mathrm{GD}}}$.

Verifying Item 1 of Condition 1

Let $\mathcal{C}$ be the family of vertex sets with large expansion:

$\mathcal{C}:=\left\{S\subseteq V \,\middle|\, |S|\le n/2 \text{ and } |E(S,V\setminus S)|\ge|S|\log n\right\},$

where $E(S,V\setminus S)$ is the set of edges between $S$ and $V\setminus S$. We define the good event

$\mathcal{G}:=\left\{X\subseteq E \,\middle|\, \forall S\in\mathcal{C},\ |X\cap E(S,V\setminus S)|>0\right\}.$

We remark that this is the same good event used in [12, Section 5]. Consider first $X,Y\in\mathcal{G}$ with $|X\oplus Y|=1$, say $X\oplus Y=\{f\}$. It has been established in [12, Eq. (14)] that

$\sum_{e\in E\setminus\{f\}}\left|\mu_e^X-\mu_e^Y\right|\le\frac12.$

Note that we also have $|\mu_f^X-\mu_f^Y|=0$ by definition. Therefore,

$\sum_{e\in E}\left|\mu_e^X-\mu_e^Y\right|\le\frac12. \qquad (14)$

For $X,Y\in\mathcal{G}$ with $|X\oplus Y|>1$, let $e_1,e_2,\dots,e_i$ be the edges in $X\setminus Y$ and let $f_1,f_2,\dots,f_j$ be the edges in $Y\setminus X$. Construct the following path of configurations $P:=(P_0,P_1,\dots,P_{i+j})$ from $X$ to $Y$:

  • let $P_0=X$;

  • for $1\le t\le j$, let $P_t=P_{t-1}\cup\{f_t\}$;

  • for $j<t\le i+j$, let $P_t=P_{t-1}\setminus\{e_{t-j}\}$.

Note that $|P_{t-1}\oplus P_t|=1$ for all $1\le t\le i+j$, and $P_t\in\mathcal{G}$ for all $0\le t\le i+j$. Hence,

$\sum_{e\in E}\left|\mu_e^X-\mu_e^Y\right|\le\sum_{e\in E}\sum_{1\le t\le i+j}\left|\mu_e^{P_{t-1}}-\mu_e^{P_t}\right|=\sum_{1\le t\le i+j}\sum_{e\in E}\left|\mu_e^{P_{t-1}}-\mu_e^{P_t}\right|\le\frac{|X\oplus Y|}{2},$

where the last inequality follows from (14) and the fact that $i+j=|X\oplus Y|$.

Verifying Item 2 of Condition 1

Fix $S:E\to2^E$; the random configuration $\sigma$ is generated by including each $e\in E$ independently with probability $\mu_e^{S(e)}$.

Lemma 14 ([12, Lemma 5.6]).

For any $T\subseteq E$ and $e\in E$, it holds that

$\mu_e^T\ge 1-3(1-p_{\min})\log n.$

For simplicity, let $K=(1-p_{\min})\log n$. Lemma 14 then implies a marginal lower bound for $\sigma$: for each $e\in E$, $\Pr[e\in\sigma]\ge 1-3K$.

Since each edge $e$ is added to $\sigma$ independently, by the definition of $\mathcal{G}$,

$\Pr[\sigma\notin\mathcal{G}]\le\sum_{S\in\mathcal{C}}(3K)^{|E(S,V\setminus S)|}\le\sum_{S\in\mathcal{C}}(3K)^{|S|\log n}\le\sum_{j=1}^{+\infty}n^j\cdot n^{j\log(3K)}\le n^{\log(27K)}.$

Recall that we choose

TGD =2m(logm+log(2/ϵ))4n2log(4n2ϵ)16n4ϵn10ϵ,

where in the last inequality, we use the assumption that n3. By assumption, it holds that

nlog(27K) =exp(lognlog(27K))
exp(lognlog(e40exp(3log(1/ϵ)logn)))
=exp(logn(403log(1/ϵ)logn))
=exp(40logn3log(1/ϵ))=ϵ3n40ϵ16TGD,

where in the last inequality, we use n3. This finishes the verification.

Appendix C Analysis of field dynamics (Proof of Theorem 5)

We now prove Theorem 5. The Item 1 (efficiency) of Theorem 5 will be proved in Section C.1, and Item 2 (accuracy) of Theorem 5 will be proved in Section C.2.

C.1 Efficiency

We first bound the efficiency of Algorithm 2, assuming the setting of parameters in Table 1.

Proposition 15.

Algorithm 2 terminates in $O(T_{\mathrm{PA}}\log m)$ parallel time steps on $O(mT_{\mathrm{GD}})$ machines.

Proof.

The parallel complexity of Algorithm 2 is dominated by Line 7 and Line 12.

Line 7 requires computing $\mathrm{pred}_i(e)=\max\left(\{j\mid 1\le j<i,\ e_j=e\}\cup\{0\}\right)$ for each $1\le i\le T_{\mathrm{GD}}$ and $e\in E$ in parallel. For each edge $e\in E$, the update list $\mathcal{L}_e$ is defined as:

$\mathcal{L}_e:=\{j\ge1\mid e_j=e\}\cup\{0\}. \qquad (15)$

To implement Line 7, for each $e\in E$, we first store a sorted update list $\mathcal{L}_e$ using parallel merge sort [14], which costs $O(\log T_{\mathrm{GD}})$ parallel time on $O(T_{\mathrm{GD}})$ machines. Then each predecessor $\mathrm{pred}_i(e)$ can be computed in $O(\log T_{\mathrm{GD}})$ parallel time on $O(1)$ machines by a binary search on the sorted $\mathcal{L}_e$. The whole process costs $O(\log T_{\mathrm{GD}})$ parallel time on $O(mT_{\mathrm{GD}})$ machines in total.
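The same data structure is easy to state sequentially (our illustration, with hypothetical names; in the parallel setting the sort is replaced by parallel merge sort and the binary searches run concurrently):

```python
from bisect import bisect_left

def predecessor_oracle(update_sites):
    """Given the update sequence e_1, ..., e_T, build per-edge sorted update lists and
    answer pred_i(e) = max({j : 1 <= j < i, e_j = e} ∪ {0}) by binary search."""
    lists = {}
    for j, e in enumerate(update_sites, start=1):
        lists.setdefault(e, []).append(j)      # appended in increasing order, hence sorted

    def pred(i, e):
        L = lists.get(e, [])
        k = bisect_left(L, i)                  # number of updates of e strictly before i
        return L[k - 1] if k > 0 else 0

    return pred

# pred = predecessor_oracle(["a", "b", "a", "c"]); pred(4, "a") == 3, pred(2, "a") == 1, pred(1, "a") == 0
```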

In Line 12, the marginal probability $\mu_{e_i}^{\sigma}=\frac{\mu(\sigma\cup\{e_i\})}{\mu(\sigma\cup\{e_i\})+\mu(\sigma\setminus\{e_i\})}=\frac{w(\sigma\cup\{e_i\})}{w(\sigma\cup\{e_i\})+w(\sigma\setminus\{e_i\})}$ can be computed by calculating the weights of $\sigma\cup\{e_i\}$ and $\sigma\setminus\{e_i\}$ (defined in (2)), where each weight can be computed by a connected-components algorithm such as [40], which costs $O(\log m)$ parallel time on $O(n+m)$ machines. Thus, each iteration of Line 9 costs $O(\log m)$ parallel time on $O(mT_{\mathrm{GD}})$ machines, and these $O(mT_{\mathrm{GD}})$ machines can be re-used in the next iteration. In Line 1, the marginal probability $\mu_e^E$ can be computed in the same way, which costs $O(\log m)$ parallel time on $O(n+m)$ machines for each edge $e\in E$.

For the other costs, the preprocessing part (from Line 2 to Line 5) of Algorithm 2 can be computed in $O(1)$ parallel time on $O(T_{\mathrm{GD}})$ machines; and in Line 16, the result $X$ of Algorithm 2 can be computed in $O(1)$ parallel time on $O(m)$ machines, since $j_e$ is the maximum of the sorted list $\mathcal{L}_e$.

Overall, Algorithm 2 runs in O(TPAlogm) parallel time on O(mTGD) machines.

Proof of Item 1 of Theorem 5.

When $|V|\le N_0$, Algorithm 1 generates $X$ by brute force, which costs $2^{O(N_0^2)}=\exp\!\left(O_\delta\!\left(\frac{\log(1/\epsilon)}{\log n}\right)\right)$ in total work. When $|V|>N_0$, Algorithm 1 terminates within $T_{\mathrm{FD}}$ iterations, and in each iteration, it generates a random $S$ using $O(1)$ parallel time on $O(m)$ machines and calls Algorithm 2. By Proposition 15, Algorithm 1 terminates within $O(T_{\mathrm{PA}}\log m)\cdot T_{\mathrm{FD}}=\left(\exp\!\left(\frac{\log(1/\epsilon)}{\log n}\right)\cdot\log n\right)^{O_\delta(1)}$ parallel time steps on $O(mT_{\mathrm{GD}})=O_\delta(m^2\log(n/\epsilon))$ machines.

C.2 Accuracy of sampling

We now prove Item 2 of Theorem 5, the accuracy of sampling of Algorithm 1, still assuming the setting of parameters in Table 1. A key step has already been provided in Theorem 10. To complete the proof, we are going to establish the followings.

Lemma 16.

If for every UE, the output Y of 𝗉𝖺𝗋𝖺𝗅𝗅𝖾𝗅-𝖱𝖢((V,U),𝐩,𝛌,(2TFD)1ϵ) always satisfies dTV(Y,μU,𝐩,𝛌RC)(2TFD)1ϵ, then the output X of Algorithm 1 satisfies dTV(X,μE,𝐩,𝛌RC)ϵ.

Lemma 17.

Assuming n=|V|N0, it holds that for every UE, the output Y of the subroutine 𝗉𝖺𝗋𝖺𝗅𝗅𝖾𝗅-𝖱𝖢((V,U),𝐩,𝛌,(2TFD)1ϵ) satisfies

dTV(Y,μU,𝒑,𝝀RC)ϵ2TFD.

Item 2 of Theorem 5 follows directly by combining Lemma 16 and Lemma 17.

Lemma 16 follows from the rapid mixing of the field dynamics.

Lemma 18 ([12, Lemma 3.5]).

The mixing time of the field dynamics satisfies

$\forall\epsilon\in(0,1),\quad T_{\mathrm{mix}}(\epsilon)\le\left(\frac{e}{\theta}\right)^{\frac{5}{(1-\lambda_{\max})^2}}\left(2\log n+\log\log\frac{2}{p_{\min}}+\log\frac{1}{2\epsilon^2}\right).$
Proof of Lemma 16.

Without loss of generality, we assume that |V|>N0. For 0tTFD, let Xt be the configuration X after the t-th iteration of Algorithm 1, and let Yt be the configuration Y after the t-th iteration of the field dynamics PθFD starting from the same initial state X0=Y0=E.

Let $\mu=\mu^{\mathrm{RC}}_{E,\boldsymbol{p},\boldsymbol{\lambda}}$ for short. By the triangle inequality and the coupling lemma, it holds that

$d_{\mathrm{TV}}(X_{T_{\mathrm{FD}}},\mu)\le d_{\mathrm{TV}}(X_{T_{\mathrm{FD}}},Y_{T_{\mathrm{FD}}})+d_{\mathrm{TV}}(Y_{T_{\mathrm{FD}}},\mu)\le\Pr[X_{T_{\mathrm{FD}}}\ne Y_{T_{\mathrm{FD}}}]+d_{\mathrm{TV}}(Y_{T_{\mathrm{FD}}},\mu),$

for any coupling $(X_t,Y_t)$ of the two processes $X_t$ and $Y_t$. And it is obvious that if $X_{T_{\mathrm{FD}}}\ne Y_{T_{\mathrm{FD}}}$, then there must exist $1\le i\le T_{\mathrm{FD}}$ such that $X_i\ne Y_i$ but $X_j=Y_j$ for all $0\le j<i$. Hence, it holds that

$\Pr[X_{T_{\mathrm{FD}}}\ne Y_{T_{\mathrm{FD}}}]\le\sum_{i=1}^{T_{\mathrm{FD}}}\Pr[X_i\ne Y_i\text{ and }\forall j<i,\ X_j=Y_j]\le\sum_{i=1}^{T_{\mathrm{FD}}}\Pr[X_i\ne Y_i\mid X_{i-1}=Y_{i-1}].$

For any 1iTFD, conditioned on Xi1=Yi1, we construct a coupling of Xi and Yi:

  1. 1.

    generate a random SE by adding each eE independently with probability θ;

  2. 2.

    generate (Xi,Yi) according to the optimal coupling of μ^U,𝒑,𝝀RC and μU,𝒑,𝝀RC, where U=SXi1 and μ^U,𝒑,𝝀RC is the distribution of the output of 𝗉𝖺𝗋𝖺𝗅𝗅𝖾𝗅-𝖱𝖢((V,U),𝒑,𝝀,(2TFD)1ϵ).

By the coupling lemma and the assumption of Lemma 16, it holds that

𝐏𝐫[XiYiXi1=Yi1] maxUEdTV(μ^U,𝒑,𝝀RC,μU,𝒑,𝝀RC)ϵ2TFD.

By Lemma 18 and our choice of TFD, we have

$d_{\mathrm{TV}}(X_{T_{\mathrm{FD}}},\mu)\le T_{\mathrm{FD}}\cdot\frac{\epsilon}{2T_{\mathrm{FD}}}+\frac{\epsilon}{2}=\epsilon.$

Now, it only remains to prove Lemma 17.

Proof of Lemma 17.

It is sufficient to verify the condition of Theorem 10 with error bound ϵ2TFD on the new instance μU,𝒑,𝝀RC:

(1pmin)lognmin{e40exp(5log(2TFD/ϵ)logn),1λmax27}. (16)

By our assumption nN0max{4,3pmin,log(2/ϵ2)logn}, it holds that

logTFD6(1λmax)2log(1/K)+14(1λmax)2logn, (17)

where for convenience, we denote

K:=θlognpmin=e100exp(10log(ϵ/2)logn)exp(140(1λmax)2).

The first term in the right hand side of (16) can be bounded by

e40exp (5log(2TFD/ϵ)logn)=e40exp(5logTFDlogn+5log(ϵ/2)logn)
(by (17)) e50exp(5log(ϵ/2)logn)exp(70(1λmax)2+30(1λmax)2logKlogn)
e50exp(5log(ϵ/2)logn)exp(70(1λmax)2)K1/2,

where the last inequality holds since nN0exp(60(1λmax)2).

Note that the 𝒑=𝒑𝒑+θ(1𝒑) constructed in Algorithm 1 satisfies

pmin=mineEpe=pminpmin+θ(1pmin)lognlogn+K1Klogn.

Therefore, we have the following two bounds on (1pmin)logn:

(1pmin)lognK e50exp(5log(ϵ/2)logn)exp(70(1λmax)2)K1/2
e40exp(5log(2TFD/ϵ)logn), (18)
(1pmin)lognK exp(140(1λmax)2)(1λmax)21401λmax27. (19)

Combining (18) and (19), we conclude the proof of Lemma 17.

Appendix D Analysis of restricted Gaussian dynamics

In this section, we verify Item 1, Item 2, and Item 3 stated in Section 5, completing the overall proof.

Item 1 requires an efficient method for approximately sampling in parallel from the Gaussian distribution 𝒩(xi1,J1). The Langevin process is a well-established technique for Gaussian sampling, and recent work has introduced a parallelization for this approach.

Theorem 19 ([2, Corollary 14]).

Let $\alpha,\beta>0$ be constants. Define $\kappa:=\beta/\alpha$. Let $\pi\propto\exp(-V)$ be a distribution over $\mathbb{R}^d$ such that $V:\mathbb{R}^d\to\mathbb{R}$ satisfies

$0\prec\alpha I\preceq\nabla^2V(x)\preceq\beta I$

for all $x\in\mathbb{R}^d$. Given the minimizer $x^\star$ of $V$ and an error bound $\epsilon_0$, there exists a parallel sampling algorithm $\mathcal{A}$ that generates $Y$ such that $d_{\mathrm{TV}}(Y,\pi)\le\epsilon_0$ by using $O(\kappa\log\kappa\log^2(d/\epsilon_0^2))$ parallel rounds and at most $7\max\{\kappa d/\epsilon_0^2,\kappa^2\}$ gradient evaluations per round.

Simulating Line 3 requires sampling from $\mathcal{N}(0,J^{-1})$. To apply Theorem 19, we define the potential function $V(x)=\frac12 x^\top Jx$. Under the constraint $0\preceq\frac{\eta}{2}I\preceq J\preceq(1-\frac{\eta}{2})I$, we set $\kappa=2/\eta$ and $d=n$. Evaluating the gradient $\nabla V(x)=Jx$ takes $O(\log n)$ parallel rounds on $O(n^2)$ machines. Thus, the total complexity of the parallelized Langevin algorithm is $O(\kappa\log\kappa\log^2(d/\epsilon_0^2))\cdot O(\log n)=O_\eta(\log^3(n/\epsilon_0))$ parallel time, and each parallel round requires $7\max\{\kappa d/\epsilon_0^2,\kappa^2\}\cdot O(n^2)=\widetilde{O}_\eta(n^3/\epsilon_0^2)$ machines. Thus, Item 1 is verified.
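For intuition, the primitive that [2] parallelizes is the plain (unadjusted) Langevin iteration; a minimal sequential sketch for the Gaussian potential $V(x)=\frac12 x^\top Jx$ is given below (our illustration; step size and iteration count are not tuned to match the guarantees of Theorem 19).

```python
import numpy as np

def ula_gaussian(J, num_steps, step_size, rng):
    """Unadjusted Langevin algorithm x_{k+1} = x_k - h * grad V(x_k) + sqrt(2h) * g_k
    with V(x) = x^T J x / 2 (so grad V(x) = J x), targeting approximately N(0, J^{-1});
    h must be small relative to 1 / ||J||_2 for the discretization bias to stay small."""
    n = J.shape[0]
    x = np.zeros(n)
    for _ in range(num_steps):
        g = rng.standard_normal(n)
        x = x - step_size * (J @ x) + np.sqrt(2.0 * step_size) * g
    return x
```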

Verifying Item 2 follows directly by expanding the distribution formula in Line 4. Since the interaction matrix $J$ is symmetric, we have

$\mu^{\mathrm{Ising}}_{J,h}(x)\cdot\exp\!\left(-\tfrac12(y_i-x)^\top J(y_i-x)\right)\propto\exp\!\left(\tfrac12 x^\top Jx+h^\top x-\tfrac12(y_i-x)^\top J(y_i-x)\right)=\exp\!\left(\langle Jy_i+h,\,x\rangle-\tfrac12 y_i^\top Jy_i\right).$

It follows that the coordinates of the above distribution are independent. The cost of computing $Jy_i+h$ is $O(\log n)$ parallel rounds on $O(n^2)$ machines.

Verifying Item 3 follows from the rapid mixing of the restricted Gaussian dynamics, which has been established recently.

Theorem 20 ([13, 9]).

Let $\delta\in(0,1)$ be a constant, and let $P$ be the transition matrix of the restricted Gaussian dynamics. If $\|J\|_2\le1-\delta$, then the restricted Gaussian dynamics exhibits entropy decay with rate $\delta$, i.e., for all $f:\Omega(\mu)\to\mathbb{R}_{\ge0}$,

$\mathrm{Ent}_\mu[Pf]\le(1-\delta)\,\mathrm{Ent}_\mu[f].$

Consequently, by Pinsker’s inequality, we obtain

$d_{\mathrm{TV}}(X,\mu)\le\sqrt{d_{\mathrm{KL}}(X\,\|\,\mu)/2}\le\sqrt{\exp(-\delta T_{\mathrm{RGD}})\log(1/\mu(x_0))},$

where $X$ is the output of Algorithm 3 and $x_0$ is the initial state.

Since 0η2IJ(1η2)I, it follows that for all x,y{±1}n,

μ(x)μ(y)exp(2η).

Thus, for any x0Ω(μ), we have 1/μ(x0)2nexp(2/η). Applying Theorem 20, we ensure dTV(X,μ)ϵ/2 by choosing

$T_{\mathrm{RGD}}=\frac{2}{\eta}\left(\log\frac{4}{\epsilon^2}+\log\left(n\log2+\frac{2}{\eta}\right)\right)=O_\eta(\log(n/\epsilon)).$

Thus, Item 3 is verified.