Abstract 1 Introduction 2 Preliminaries 3 Spectral Algorithms for Refuting Semirandom 𝒌-LIN(𝔽) for Even 𝒌 References

Spectral Refutations of Semirandom k-LIN over Larger Fields

Nicholas Kocurek ORCID University of Washington, Seattle, WA, USA Peter Manohar ORCID The Institute for Advanced Study, Princeton, NJ, USA
Abstract

We study the problem of strongly refuting semirandom k-LIN(𝔽) instances: systems of k-sparse inhomogeneous linear equations over a finite field 𝔽. For the case of 𝔽=𝔽2, this is the well-studied problem of refuting semirandom instances of k-XOR, where the works of [18, 20] establish a tight trade-off between runtime and clause density for refutation: for any choice of a parameter , they give an nO()-time algorithm to certify that there is no assignment that can satisfy more than 12+ε-fraction of constraints in a semirandom k-XOR instance, provided that the instance has O(n)(n)k/21logn/ε4 constraints, and the work of [28] provides good evidence that this tight up to a polylog(n) factor via lower bounds for the Sum-of-Squares hierarchy. However, for larger fields, the only known results for this problem are established via black-box reductions to the case of 𝔽2, resulting in a |𝔽|3k gap between the current best upper and lower bounds.

In this paper, we give an algorithm for refuting semirandom k-LIN(𝔽) instances with the “correct” dependence on the field size |𝔽|. For any choice of a parameter , our algorithm runs in (|𝔽|n)O()-time and strongly refutes semirandom k-LIN(𝔽) instances with at least O(n)(|𝔽|n)k/21log(n|𝔽|)/ε4 constraints. We give good evidence that this dependence on the field size |𝔽| is optimal by proving a lower bound for the Sum-of-Squares hierarchy that matches this threshold up to a polylog(n|𝔽|) factor. Our results also extend beyond finite fields to the more general case of m and arbitrary finite Abelian groups. Our key technical innovation is a generalization of the “𝔽2 Kikuchi matrices” of [36, 18] to larger fields, and finite Abelian groups more generally.

Keywords and phrases:
Spectral Algorithms, CSP Refutation, Kikuchi Matrices
Category:
APPROX
Copyright and License:
[Uncaptioned image] © Nicholas Kocurek and Peter Manohar; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Approximation algorithms analysis
Funding:
This material is based upon work supported by the National Science Foundation under Grant No. DMS-1926686.
Related Version:
Full Version: https://arxiv.org/abs/2508.18185
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

A k-LIN(𝔽) instance over a finite field 𝔽 is a collection of k-sparse 𝔽-linear inhomogeneous equations in n variables x1,,xn. That is, each equation has the form iIαixi=bI, where |I|=k, αi𝔽{0}, and bI𝔽. For worst-case instances, there has been a long line of work on developing algorithms for and proving hardness of determining the value, i.e., the maximum fraction of satisfiable constraints, of a k-LIN(𝔽) instance, with a special focus on the case of 𝔽=𝔽2, where the k-LIN(𝔽2) problem is commonly referred to as “k-XOR”. For k-LIN(𝔽) instances with O(n) constraints, Håstad’s PCP [19] shows that for k3, it is NP-hard to decide if the instance has value 1|𝔽|+ε (a random assignment is near-optimal) or 1ε (nearly satisfiable). On the algorithmic side, the work of [6] gives a PTAS for k-XOR instances with nO(k) constraints (“maximally dense” instances). And, assuming the exponential time hypothesis [21], the work of [15] gives an essentially tight “runtime vs. number of constraints” trade-off for worst-case instances. For the case of k=2, the 2-LIN(𝔽) problem is closely related to the Unique Games Conjecture [25], which conjectures that deciding if a 2-LIN(𝔽) instance has value 1|𝔽|+ε or 1ε is NP-hard when |𝔽| is sufficiently large as a function of ε.

Despite its worst-case hardness, there have been many works on designing algorithms for random k-XOR instances. As random k-XOR instances with n constraints have value 12+ε with high probability, the natural problem to consider is the task of (strong) refutation, where the algorithmic goal is to output a certificate that the value of the random instance is at most 12+ε. This problem has been the focus of several works [16, 10, 2, 34], with the ultimate goal being to understand the trade-off between “runtime” and “number of constraints”. That is, given n and a “time budget” of nO() for a parameter (which may be super-constant, e.g., nδ for constant δ>0), how many constraints, as a function of n and , are required to refute a random k-XOR in nO() time? This question was essentially answered by [34], which gives for any , an nO()-time algorithm that, with high probability over a random instance, certifies that a random k-XOR instance has maximum value at most 12+ε if it has at least n(n/)k/21poly(log(n),ε1) constraints. This trade-off between runtime and number of constraints is conjectured to be essentially optimal, with evidence coming in the form of lower bounds in various restricted computational models [17, 13, 35, 9, 32, 29, 7, 28]. More recently, there has been a flurry of work [1, 18, 20] on designing algorithms for k-XOR in the harder semirandom model, where the “left-hand sides” of the equations are worst case, and only the “right-hand sides” bI are chosen at random. These works show that one can refute semirandom instances at the same runtime vs. number of constraints trade-off as shown in [34]. That is, semirandom instances are just as easy to refute as fully random ones.

Thus, for the Boolean case of 𝔽=𝔽2, we have a near-complete understanding: for any choice of the parameter , if the number of constraints in the semirandom k-XOR instance is at least n(n/)k/21poly(log(n),ε1), then the algorithm of [1, 18, 20] can refute the instance in nO() time, and if the number of constraints is smaller than n(n/)k/21poly(log(n),ε1), the lower bound of, e.g., [28] provides good evidence that there is no algorithm to refute in nO() time, even for random instances.

What can we say about semirandom k-LIN over finite fields 𝔽𝔽2? There is a simple reduction to the Boolean case (see Appendix B in [2]) that loses an |𝔽|3k factor in the number of constraints. That is, the reduction gives an algorithm to refute such instances with at least |𝔽|3kn(n/)k/21poly(log(n),ε1) constraints. On the side of lower bounds, the work [28] establishes the same lower bound as in the case of 𝔽2, i.e., with no field-dependent factor. Thus, for e.g., 𝔽 with |𝔽|=nδ, there a large gap between the upper and lower bounds.

Understanding the optimal dependence on the field size for refuting semirandom k-LIN instances has many potential applications. One immediate application is obtaining better attacks on the (sparse) learning parity with noise (LPN) assumption commonly used in cryptography. The k-sparse LPN assumption111The phrase “sparse LPN” sometimes refers to the case when the secret is sparse. Here, we mean that the equations are sparse. is the distinguishing variant of the refutation problem for k-LIN(𝔽) – the “dense” LPN assumption removes the sparsity requirement on the equations – and is considered a foundational assumption in cryptography. While many cryptographic applications only use this assumption over the field 𝔽2 [5], many applications such as constructing indistinguishability obfuscation [22, 23, 33] and others [12, 11] require fields of much larger (even superpolynomial) size.

As another application, one could hope to prove stronger lower bounds for information-theoretic private information retrieval (PIR) schemes. Recent work of [4] has led to a flurry of improvements in lower bounds for binary locally decodable [4, 8, 24] and locally correctable codes [26, 37, 3, 27] by establishing a connection between these lower bounds and refuting “semirandom-like” instances of k-LIN over 𝔽2. While these results can be extended to larger, constant-sized alphabets (see, e.g., Appendix A in [4]), information-theoretic PIR schemes are essentially equivalent to locally decodable codes over alphabets of poly(n) size. Thus, the large loss in |𝔽| above prevents the approach of [4] from being able to prove stronger PIR lower bounds.

1.1 Our results

In this paper, we investigate the dependence on the field size in the number of constraints required to refute semirandom k-LIN(𝔽) instances. As our main results, we give both an algorithm and a matching Sum-of-Squares lower bound with the “correct” polynomial dependence on the field size |𝔽|.

Before stating our main results, we formally define semirandom k-LIN instances.

Definition 1 ((Semirandom) k-LIN over 𝔽).

An instance of k-LIN(𝔽) is =(,{bv}v), where is a set of k-sparse vectors222A vector v𝔽n is k-sparse if |{i:vi0}|=k. in 𝔽n and bv𝔽 for all v. We view as representing the system of linear equations with variables x1,,xn specified by i=1nvixi=bv for each v. The value of the instance, which we denote by val(), is the maximum over x𝔽n of the fraction of constraints satisfied by x. That is, val()=maxx𝔽n1||v1(i=1nvixi=bv).

An instance of k-LIN is random if is a random subset of k-sparse vectors and each bv is drawn independently and uniformly from 𝔽. An instance of k-LIN is semirandom if each bv is drawn independently and uniformly from 𝔽 (but may be arbitrary).

The first main result of this paper gives a refutation algorithm for semirandom k-LIN(𝔽) for any field 𝔽.

Theorem 2 (Tight refutation of semirandom k-LIN(𝔽)).

Fix k/2. There is an algorithm that takes as input a k-LIN(𝔽) instance =(,{bv}v) in n variables and outputs a number algval()[0,1] in time (|𝔽|n)O() with the following two guarantees:

  1. (1)

    algval()val() for every instance ;

  2. (2)

    If ||Ω(n)(n|𝔽|)k/21log(|𝔽|n)ε4 and is drawn from the semirandom distribution described in Definition 1, then with probability 11poly(n) over the draw of the semirandom instance, i.e., the randomness of {bv}v, it holds that algval()1|𝔽|+ε.

As a byproduct of the analysis of Theorem 2, we also establish an extremal combinatorics statement on the existence of short linear dependencies in any sufficiently dense collection of k-sparse vectors over a finite field 𝔽.

Theorem 3 (Short linear dependencies in k-sparse vectors over 𝔽).

Let be a set of k-sparse vectors in 𝔽n with ||Ω(n)(n|𝔽|)k/21log(|𝔽|n). Then, there exists a set 𝒱 with |𝒱|log(|𝔽|n) and non-zero coefficients {αv}v𝒱 in 𝔽 such that:

v𝒱αvv=0.

That is, 𝒱 is a linearly dependent subset of .

Theorem 3 is a generalization of the hypergraph Moore bound, or Feige’s conjecture on the existence of short even covers in hypergraphs [14] (first proven in [18] for the case of 𝔽2) to arbitrary finite fields. The hypergraph Moore bound establishes a rate vs. distance trade-off for binary LDPC codes (see [30]). One can similarly view Theorem 3 as establishing such a trade-off for LDPC codes over general finite fields.

The key technical innovation in our proofs of Theorems 2 and 3 is the introduction of a new Kikuchi matrix for any finite field 𝔽 (Definition 10). Our Kikuchi matrices are a generalization of the “𝔽2 Kikuchi matrices” of [36, 18] to other fields, and finite Abelian groups more generally. As we point out after Definition 10, the natural generalization of the 𝔽2 Kikuchi matrix is not sufficient, and we have to add an additional condition to the matrix to make the analysis work (see Remark 11).

In our second main result, we prove a Sum-of-Squares lower bound for refuting k-LIN(𝔽) instances that nearly matches the threshold in Theorem 2.

Theorem 4 (Sum-of-Squares lower bounds for refuting random k-LIN, informal).

Fix k3 and nmax(|𝔽|,k)k. Let be a random k-LIN(𝔽) instance ||O(n)(n|𝔽|)k/21ε2. Then, with large probability over the draw of , it holds that

  1. (1)

    val()1|𝔽|+ε;

  2. (2)

    The degree-O~() Sum-of-Squares relaxation for k-LIN(𝔽) fails to refute .

We note that Theorem 4 requires k3 as Theorem 2 gives a polynomial-time algorithm when ||=O(n) and k=2.

The threshold in Theorem 4 matches the threshold in Theorem 2 up to the “lower order” poly(logn,ε1) factor. However, it has a one limitation: the degree of the Sum-of-Squares relaxation can only be at most n/|𝔽|, rather than the whole of range of O(1) to Ω(n). That is, Theorem 4 can only give a subexponential-time lower bound of 2n1δ for the Sum-of-Squares algorithm when |𝔽|=nδ. It turns out that this is nearly necessary, as there is a very simple refutation algorithm, implementable within degree O~() Sum-of-Squares, that out-performs the algorithm in Theorem 2 when |𝔽| is very large.

Theorem 5 (Simple refutation algorithm).

Fix n/2k. There is an algorithm that takes as input a k-LIN(𝔽) instance =(,{bv}v) in n variables and outputs a number algval()[0,1] in time (|𝔽|n)O() with the following two guarantees:

  1. (1)

    algval()val() for every instance ;

  2. (2)

    If ||Ω(n)(n)k1lognε2 and is drawn from the fully random distribution described in Definition 1, then with probability 11poly(n) over the draw of the random instance, it holds that algval()1|𝔽|+ε;

  3. (3)

    If ||Ω(n)(n)k1lognε3 and is drawn from the semirandom distribution described in Definition 1, then with probability 11poly(n) over the draw of the semirandom instance, i.e., the randomness of {bv}v, it holds that algval()1|𝔽|+ε.

Theorem 5 is nearly identical to Theorem 2: the main difference is that, ignoring the lower order poly(logn,ε1) term, the “constraint threshold” is now n(n/)k1 instead of n(n|𝔽|/)k/21, and this is smaller when n/|𝔽|12/k. As this algorithm is “captured” by the Sum-of-Squares hierarchy, the Sum-of-Squares lower bound in Theorem 4 is false when n/|𝔽|12/k. Thus, the range of where the lower bound in Theorem 4 holds is nearly tight: we show it holds for n/|𝔽|, and it is false for n/|𝔽|12/k.

We also extend our refutation result for k-LIN(𝔽) to k-LIN(m) for composite m, and more generally to any finite Abelian group G. Below, we define the natural extension of Definition 1 to Abelian groups, and then state our generalization of Theorem 2 to this setting. Recall that by the fundamental theorem of finite Abelian groups, any finite Abelian group G is isomorphic to i=1rmi for some m1,,mr.

Definition 6 ((Semirandom) k-LIN over an Abelian group G).

Let G=i=1rmi for some m1,,mr. An instance of k-LIN(G) is =(,{bv}v), where is a set of k-sparse vectors in Gn and bvG for all v. We view as representing the system of linear equations with variables x1,,xn specified by v,x=bv for each v. Note that the inner product notation here represents v,x=i=1nvixi where the multiplication is direct product multiplication over each mi. The value of the instance, which we denote by val(), is the maximum over x𝔽n of the fraction of constraints satisfied by x. That is, val()=maxxGn1||v1(v,x=bv).

An instance of k-LIN is random if is a random subset of k-sparse vectors and each bv is drawn independently and uniformly from G.

An instance of k-LIN is semirandom if each bv is drawn independently and uniformly from G (but may be arbitrary).

Definition 6 allows each equation to have coefficients on the variables, which was natural in the case of finite fields (Definition 1), but may appear strange in the case of a general finite Abelian group, where it is perhaps more natural to only have coefficients that are 1. However, because we are working with semirandom instances, the “left-hand sides” of the equations are arbitrary, and so semirandom instances “capture” the special case where the coefficients are all 1. We choose this perhaps nonstandard definition because it is more general; it seamlessly captures both the case of a finite field 𝔽 or a ring m, where coefficients are natural, and also a finite Abelian group.

Our final result generalizes Theorem 2 to the case of k-LIN(G).

Theorem 7 (Tight refutation of semirandom k-LIN(G)).

Fix k/2. There is an algorithm that takes as input a k-LIN(G) instance =(,{bv}v in n variables and outputs a number algval()[0,1] in time (|G|n)O() with the following two guarantees:

  1. (1)

    algval()val() for every instance ;

  2. (2)

    If ||Ω(n)(n|G|)k/21log(|G|n)ε5 and is drawn from the semirandom distribution described in Definition 6, then with probability 11poly(n) over the draw of the semirandom instance, i.e., the randomness of {bv}v, it holds that algval()1|G|+ε.

The proof of Theorem 7 encounters additional technical difficulties compared to Theorem 2, arising from zero divisors in m for composite m. Roughly, this allows a semirandom instance to embed equations within a subgroup of m, by, e.g., choosing only equations where the coefficients are divisible by some integer d2, and handling this issue requires an additional step and an extra factor ε1.

2 Preliminaries

2.1 Basic notation

We let [n] denote the set {1,,n}. For two subsets S,T[n], we let ST denote the symmetric difference of S and T, i.e., ST:={i:(iSiT)(iSiT)}. For a natural number t, we let ([n]t) be the collection of subsets of [n] of size exactly t.

For a rectangular matrix Am×n, we let A2:=maxxm,yn:x2=y2=1xAy denote the spectral norm of A.

For a vector v𝔽n, we let supp(v):={i:vi0} and wt(v):=|supp(v)|. For a field 𝔽 with char(𝔽)=p, we let Tr() denote the trace map of 𝔽 over 𝔽p.

For a matrix An×n, we let tr(A) be the trace of A, i.e., i=1nAi,i. This should not be confused with the trace map for field elements, which we denote by Tr(). For two vectors x,yn we define the following inner product:

x,y=xy=i=1nxi¯yi.

2.2 Fourier analysis

Let G be an Abelian group isomorphic to m1××mr via the isomorphism ψ. For m, we let ωm:=e2πim. For α,xG, we define

χα(x)=i=1rωmiψ(α)iψ(x)i.

These functions form a Fourier basis for G, as shown in [31]. This extends to a Fourier basis for Gn as follows. For v,xGn, we define

χv(x)=i=1nχvi(xi).

For a function f:Gn, we have that for each xGn,

f(x)=vGnf^(v)χv(x),

where f^(v)=𝔼xGn[f(x)χv(x)¯].

For the special case of functions f:𝔽n with char(𝔽)=p, we note that the standard Fourier basis is simply

χv(x)=ωpTr(v,x).

2.3 Binomial coefficient inequalities

In this section, we state and prove the following fact about binomial coefficients that we will use.

Proposition 8.

Let n,,q be positive integers with n. Let q be constant and ,n be asymptotically large with n/2. Then,

(nq)(n)=Θ((n)q),
(nq)(n)=Θ(1).

Proof.

We have that

(nq)(n)=(q)(n+qq).

Using that (ab)b(ab)(eab)b finishes the proof of the first equation.

We also have that

(nq)(n)=(nq)!(n)!n!(nq)!=i=0q1nini=i=0q1(1ni),

and this is Θ(1) since n/2 and q is constant.

3 Spectral Algorithms for Refuting Semirandom 𝒌-LIN(𝔽) for Even 𝒌

As our proof overview, we will give a complete proof of Theorem 2 in the case when k is even. As in [18, 20], the proof is substantially simpler in the case of even k. We will assume familiarity with the notation and conventions defined in Section 2.

Our refutation algorithm for semirandom k-LIN roughly follows the framework established in [18, 20]. The main technical tool we use is a generalization of the Kikuchi matrix of [36] for 𝔽2 to arbitrary finite fields 𝔽. Analyzing the spectral norm of this matrix requires a more complicated trace moment calculation as compared to the case of 𝔽2, and requires a careful choice of the Kikuchi matrix (see Remark 11).

We let char(𝔽)=p and ωp=e2πi/p denote a primitive p-th root of unity in .

3.1 Step 1: Expressing a 𝒌-LIN(𝔽) instance as a polynomial in

As the first step in the proof, we make the following observation, which shows that we can express the fraction of constraints satisfied by an assignment x𝔽n as a polynomial in n variables in .

Observation 9.

For a k-LIN(𝔽) instance =(,{bv}v), let val(,x) denote the fraction of constraints satisfied by an assignment x𝔽n. Then, we can express val(,x) as a polynomial in . That is,

val(,x)=1|𝔽|+1|||𝔽|vβ𝔽ωpTr(βbv)χβv(x)¯:=1|𝔽|+Φ(x).

Proof.

Recall that a constraint in takes the form v,x=bv for v, where x𝔽n are the variables. The indicator variable for this event is simply:

1(v,x=bv)=𝔼β𝔽[ωpTr(βbvβv,x)]=1|𝔽|β𝔽ωpTr(βbv)χβv(x)¯.

where p=char(𝔽). Indeed, if v,x=bv, then Tr(βbvβv,x)=0 for all β𝔽. If bvv,x0, i.e., it is some α𝔽, then 𝔼β𝔽[ωpTr(βα)]=𝔼β𝔽[ωpTr(β)]=0. Hence, it follows that

val(,x)=1||v1(v,x=bv)=1||v1|𝔽|β𝔽ωpTr(βbv)χβv(x)¯
=1|𝔽|+1|||𝔽|vβ𝔽ωpTr(βbv)χβv(x)¯,

which finishes the proof.

3.2 Step 2: Expressing 𝚽(𝒙) as a quadratic form on a Kikuchi matrix

In light of ˜9, it thus remains to find a certificate that bounds maxx𝔽nΦ(x). We do this by generalizing the analysis of [18] and constructing a Kikuchi matrix whose spectral norm provides a certificate bounding the maximum value of Φ.

Definition 10.

(Even-arity Kikuchi matrix over 𝔽). Let k/2n/2 be a parameter,333Note that it suffices to prove Theorem 2 for in this range. and let N=|𝔽|(n). For each k-sparse vector v𝔽n and β𝔽, we define a matrix Av,βN×N as follows. First, we identify N with the set of -sparse vectors in 𝔽n. Then, for -sparse vectors U,V𝔽n, we let

Av,β(U,V)={1Uv,βV0otherwise

where we say Uv,βV if UV=βv and supp(U)supp(V)=supp(v).

Let Φ(x)=1|𝔽|||vβ𝔽cv,βχβv be a polynomial defined by a set of k-sparse vectors from 𝔽n and complex coefficients {cv,β}vβ𝔽. We define the level- Kikuchi matrix for this polynomial to be A=vβ𝔽cv,βAv,β. We refer to the graph (with complex weights) defined by the underlying adjacency matrix as the Kikuchi graph.

 Remark 11.

Our Kikuchi matrix in Definition 10 has an additional condition that supp(U)supp(V)=supp(v). A perhaps more natural generalization of the 𝔽2 Kikuchi matrix of [36, 18] would be the matrix where this condition is removed, i.e., we only require that UV=βv. As we shall see, when we do the trace moment calculation at the end of Section 3.3, it is crucial that for any U and v, there is at most one β𝔽 such that Uv,βV for some V. That is, the number of edges adjacent to U “coming from” a constraint v is at most 1. If this uniqueness of β did not hold, we would lose an additional factor of 𝔽 in the number of constraints that we require in Theorem 2. This is a substantial increase, as e.g., this would increase the number of constraints when k=2 to n|𝔽| when the correct dependence is n. The condition that supp(U)supp(V)=supp(v) implies uniqueness of β above, and without this condition we could have a U with supp(v)U, in which case Uαv=V for an -sparse vector V in 𝔽n for all α𝔽.

 Remark 12.

We note that in Definition 10, we have Av,β=Aβv,1. The reason we use the above definition with two parameters v and β is that it will be more convenient when counting walks in the matrix A, as it makes explicit the choice of v and β. Note that in , there could exist v and v with βv=v for some β𝔽, and we need to count these terms separately.

Observation 13.

The Kikuchi matrix A is always Hermitian.

Proof.

To see this note that UV=βvVU=βv, χβ¯=χβ, and is commutative.

The following observation shows that we can express Φ(x) as a quadratic form on the matrix A defined in Definition 10. Thus, A2 bounds maxx𝔽nΦ(x).

Observation 14.

For x𝔽n define yN as follows. For each -sparse U𝔽n, we set yU=χU(x). Then

Φ(x)=1|||𝔽|ΔyAy,

where Δ:=(kk/2)(nkk/2)|𝔽|k/2.

Proof.

yAy =U,V𝔽nwt(U)=wt(V)=A(U,V)χU(x)¯χV(x)
=U,V𝔽nwt(U)=wt(U)=v,β𝔽1(Uv,βV)cv,βχU(x)¯χV(x)
=U,V𝔽nwt(U)=wt(U)=v,β𝔽1(Uv,βV)cv,βχUV(x)¯
=U,V𝔽nwt(U)=wt(U)=v,β𝔽1(Uv,βV)cv,βχβv(x)¯.

For each v and β𝔽, the term cv,βχβv(x)¯ appears once for each pair of vertices (U,V) with Uv,βV. Let us now argue that the number of such pairs (U,V) is exactly Δ=(kk/2)(nkk/2)|𝔽|k/2. We will count the number of pairs (U,V) by first specifying supp(U) and supp(V), and then by specifying Ui for each isupp(U) (and same for V). We first require that supp(U)supp(V)=supp(v), which in turn means that supp(U) has intersection exactly k/2 with supp(v) and likewise for supp(V). Thus, we can pay (kk/2) to count the number of ways to split supp(v) into two equal parts. Second, we need to specify supp(U)supp(v), which is equal to supp(V)supp(v), which is (nkk/2) choices. Finally, we need to specify Ui for each isupp(U) and Vi for each isupp(V). For each isupp(U)supp(v), we set Ui=(βv)i, and for each isupp(U)supp(v), we can set Ui to be any element in 𝔽. Note that specifying U then determines V, so we have |𝔽|k/2 choices. This finishes the proof.

Next, we compute the average degree (or number of non-zero entries) in a row/column in A.

Observation 15.

For U𝔽n with wt(U)=, we define the graph degree as normal:

deg(U):=|{βvβ𝔽,v s.t. V𝔽n,wt(V)=,Uv,βV}|.

Then 𝔼[deg(U)]|𝔽|2(|𝔽|n)k/2||.

Proof.

Each v contributes |𝔽|Δ to the total degree, so the average degree is 𝔼[deg(U)]=|||𝔽|ΔN. We then have:

𝔼[deg(U)]=|𝔽|ΔN||=|𝔽|k/2+1(kk/2)(nkk/2)|𝔽|(n)|||𝔽|2(|𝔽|n)k/2||,

where the last inequality follows from Proposition 8.

3.3 Step 3: Bounding the spectral norm of 𝑨 via the trace moment method

The following spectral norm bound now implies Theorem 2.

Lemma 16.

Let A be the level- Kikuchi matrix over 𝔽n defined in Definition 10 for the k-LIN instance =(,{bv}v). Let ΓN×N be the diagonal matrix Γ=D+d𝕀 where DU,U:=deg(U) and d=𝔼[deg(U)]. Suppose that the bv’s are drawn independently and uniformly from 𝔽, i.e., the instance is semirandom (Definition 1). Then, with probability 11poly(n), it holds that

Γ1/2AΓ1/22O(log(|𝔽|n)d).

We postpone the proof of Lemma 16 to the end of this section, and now finish the proof of Theorem 2.

Proof of Theorem 2 from Lemma 16.

Let =(,{bv}v) be the input to the algorithm. Given , the algorithm constructs the matrix A and computes algval()=1|𝔽|+2|𝔽||𝔽|A~2, where A~=Γ1/2AΓ1/2. It remains to argue that this quantity has the desired properties.

Let Φ(x) be the polynomial defined in ˜9. For each x𝔽n, letting yn be the vector defined in ˜14, we have

Φ(x)=1|𝔽|||ΔyAy=1|𝔽|||Δ(Γ1/2y)A~(Γ1/2y)1|𝔽|||ΔA~2Γ1/2y22
=1|𝔽|||ΔA~2tr(Γ)=2|𝔽||𝔽|A~2,

where we use that Γ1/2y22=yΓy=UΓU|yU|2=UΓU=tr(Γ) since |yU|=1 for all U, and that tr(Γ)=2|||𝔽|Δ. Hence,

val()=1|𝔽|+maxx𝔽nΦ(x)1|𝔽|+2|𝔽||𝔽|A~2,

which proves Item 1 in Theorem 2.

To prove Item 2, we observe that by Lemma 16, if is semirandom, then with high probability over the draw of the bv’s, it holds that

A~2O(log(|𝔽|n)d).

From ˜15, we have d|𝔽|2(|𝔽|n)k/2||. Hence, if we have that ||Cnlog(|𝔽|n)(|𝔽|n)k/21ε2 for a sufficiently large constant C, then A~2ε with probability 11/poly(n). This proves Item 2.

It remains to prove Lemma 16, which we do using the trace moment method.

Proof of Lemma 16.

By ˜13, we have that A~2tr((Γ1A)2t)1/2t for any positive integer t. Because the bv’s are drawn independently from 𝔽, the matrix A~ is a random matrix. By Markov’s inequality,

Pr[tr((Γ1A)2t)N𝔼[tr((Γ1A)2t)]]1N.

We note this event is the same as tr((Γ1A)2t)1/2tN1/2t𝔼[tr((Γ1A)2t)]1/2t, and for 2tlogN we have N1/2tO(1). This immediately gives us that with probability 11N, A~2O(𝔼[tr((Γ1A)2t)]1/2t). We then have that

𝔼[tr((Γ1A)2t)] =𝔼[tr((Γ1v,β𝔽cv,βAv,β)2t)]
=𝔼[tr((v1,β1),,(v2t,β2t)×𝔽i=12tΓ1cvi,βiAvi,βi)]
=(v1,β1),,(v2t,β2t)×𝔽𝔼[tr(i=12tΓ1cvi,βiAvi,βi)]
=(v1,β1),,(v2t,β2t)×𝔽𝔼[i=12tcvi,βi]tr(i=12tΓ1Avi,βi).

Let us now make the following observation. Let (v1,β1),,(v2t,β2t)×𝔽 be a term in the above sum. Fix v, and let R(v) denote the set of i[2t] such that vi=v. We observe that if for some v, iR(v)βi0, then 𝔼[i=12tcvi,βi]=0. Indeed, this is because bv is independent for each v, and so 𝔼[i=12tcvi,βi]=v𝔼[iR(v)cv,βi], and

𝔼[iR(v)cv,βi]=𝔼[iR(v)ωpTr(βibv)]=𝔼[ωpTr((iR(v)βi)bv)].

Then, since bv is uniform from 𝔽, it follows that 𝔼[ωpTr((iR(v)βi)bv)]=0 if iR(v)βi0, and 𝔼[ωpTr((iR(v)βi)bv)]=1 if iR(v)βi=0. This motivates the following definition.

Definition 17 (Trivially closed sequence).

Let (v1,β1),,(v2t,β2t)×𝔽. We say that (v1,β1),,(v2t,β2t)×𝔽 is trivially closed with respect to v if it holds that iR(v)βi=0. We say that the sequence is trivially closed if it is trivially closed with respect to all v.

With the above definition in hand, we have shown that

𝔼[tr((Γ1A)2t)]=(v1,β1),,(v2t,β2t)trivially closedtr(i=12tΓ1Avi,βi).

The following lemma yields the desired bound on 𝔼[tr((Γ1A)2t)].

Lemma 18.

(v1,β1),,(v2t,β2t)trivially closedtr(i=12tΓ1Avi,βi)N22t(2td)t.

With Lemma 18, we thus have the desired bound 𝔼[tr((Γ1A)2t)]. Taking t to be clog2N for a sufficiently large constant c and applying Markov’s inequality finishes the proof.

Proof of Lemma 18.

We bound the sum as follows. First, we observe that for a trivially closed sequence (v1,β1),,(v2t,β2t), we have

tr(i=12tΓ1Avi,βi)=U0,U1,,U2t1i=02t1ΓUi11(Uivi+1,βi+1Ui+1),

where we define U2t=U0. Thus, the sum that we wish to bound in Lemma 18 simply counts the total weight of “trivially closed walks” U0,v1,β1,U1,,U2t1,v2t,β2t,U2t (where U2t=U0) in the Kikuchi graph A, where the weight of a walk is simply i=02t1ΓUi1.

Let us now bound this total weight by encoding a walk U0,v1,β1,U1,,U2t1,v2t,β2t,U2t as follows.

  • First, we write down the start vertex U0.

  • For i=1,,2t, we let zi be 1 if vi=vj for some j<i. In this case, we say that the edge is “old”. Otherwise zi=0, and we say that the edge is “new”.

  • For i=1,,2t, if zi is 1 then we encode Ui by writing down the smallest j[2t] such that vi=vj. We note that we do not need to specify the element βi, as for any vertex U, there is at most one V and one β𝔽 such that 1(Uvi,βV). As pointed out in Remark 11, this crucially saves us a factor of |𝔽| in the total number of constraints that we require.

  • For i=1,,2t, if zi is 0 then we encode Ui by writing down an integer in 1,,deg(Ui1) that specifies the edge we take to move to Ui from Ui1 (we associate [deg(Ui1)] to the edges adjacent to Ui1 with an arbitrary fixed map).

With the above encoding, we can now bound the total weight of all trivially closed walks as follows. First, let us consider the total weight of walks for some fixed choice of z1,,z2t. We have N choices for the start vertex U0. For each i=1,,2t where zi=0, we have deg(Ui1) choices for Ui, and we multiply by a weight of ΓUi111deg(Ui1). For each i=1,,2t where zi=1, we have at most 2t choices for the index j<i, and we multiply by a weight of ΓUi111d. Hence, the total weight for a specific z1,,z2t is at most N(2td)r, where r is the number of zi such that zi=1.

Finally, we observe that any trivially closed walk must have rt. Hence, after summing over all z1,,z2t, we have the final bound of N22t(2td)t, which finishes the proof.

References

  • [1] Jackson Abascal, Venkatesan Guruswami, and Pravesh K. Kothari. Strongly refuting all semi-random Boolean CSPs. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 – 13, 2021, pages 454–472. SIAM, 2021. doi:10.1137/1.9781611976465.28.
  • [2] Sarah R. Allen, Ryan O’Donnell, and David Witmer. How to Refute a Random CSP. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 689–708. IEEE Computer Society, 2015. doi:10.1109/FOCS.2015.48.
  • [3] Omar Alrabiah and Venkatesan Guruswami. Near-tight bounds for 3-query locally correctable binary linear codes via rainbow cycles. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024. IEEE, 2024. doi:10.1109/FOCS61266.2024.00112.
  • [4] Omar Alrabiah, Venkatesan Guruswami, Pravesh K. Kothari, and Peter Manohar. A near-cubic lower bound for 3-query locally decodable codes from semirandom CSP refutation. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1438–1448. ACM, 2023. doi:10.1145/3564246.3585143.
  • [5] Benny Applebaum, Boaz Barak, and Avi Wigderson. Public-key cryptography from different assumptions. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 171–180. ACM, 2010. doi:10.1145/1806689.1806715.
  • [6] Sanjeev Arora, David R. Karger, and Marek Karpinski. Polynomial time approximation schemes for dense instances of NP-hard problems. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, 29 May-1 June 1995, Las Vegas, Nevada, USA, pages 284–293. ACM, 1995. doi:10.1145/225058.225140.
  • [7] Boaz Barak, Siu On Chan, and Pravesh K. Kothari. Sum of Squares Lower Bounds from Pairwise Independence. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 97–106. ACM, 2015. doi:10.1145/2746539.2746625.
  • [8] Arpon Basu, Jun-Ting Hsieh, Pravesh Kothari, and Andrew Lin. Improved lower bounds for all odd-query locally decodable codes. Electron. Colloquium Comput. Complex., pages TR24–189, 2024. URL: https://eccc.weizmann.ac.il/report/2024/189.
  • [9] Siavosh Benabbas, Konstantinos Georgiou, Avner Magen, and Madhur Tulsiani. SDP gaps from pairwise independence. Theory of Computing, 8(1):269–289, 2012. doi:10.4086/TOC.2012.V008A012.
  • [10] Amin Coja-Oghlan, Andreas Goerdt, and André Lanka. Strong refutation heuristics for random k-SAT. Combinatorics, Probability & Computing, 16(1):5, 2007.
  • [11] Henry Corrigan-Gibbs, Alexandra Henzinger, Yael Kalai, and Vinod Vaikuntanathan. Somewhat homomorphic encryption from linear homomorphism and sparse LPN. IACR Cryptol. ePrint Arch., page 1760, 2024. URL: https://eprint.iacr.org/2024/1760.
  • [12] Quang Dao, Yuval Ishai, Aayush Jain, and Huijia Lin. Multi-party homomorphic secret sharing and sublinear MPC from sparse LPN. In Advances in Cryptology – CRYPTO 2023 – 43rd Annual International Cryptology Conference, CRYPTO 2023, Santa Barbara, CA, USA, August 20-24, 2023, Proceedings, Part II, volume 14082 of Lecture Notes in Computer Science, pages 315–348. Springer, 2023. doi:10.1007/978-3-031-38545-2_11.
  • [13] Uriel Feige. Relations between average case complexity and approximation complexity. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 534–543, 2002. doi:10.1145/509907.509985.
  • [14] Uriel Feige. Small linear dependencies for binary vectors of low weight. In Building Bridges: Between Mathematics and Computer Science, pages 283–307. Springer, 2008.
  • [15] Dimitris Fotakis, Michael Lampis, and Vangelis Th. Paschos. Sub-exponential Approximation Schemes for CSPs: From Dense to Almost Sparse. In 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17-20, 2016, Orléans, France, volume 47 of LIPIcs, pages 37:1–37:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.STACS.2016.37.
  • [16] Andreas Goerdt and André Lanka. Recognizing more random unsatisfiable 3-sat instances efficiently. Electron. Notes Discret. Math., 16:21–46, 2003. doi:10.1016/S1571-0653(04)00461-5.
  • [17] Dima Grigoriev. Linear lower bound on degrees of positivstellensatz calculus proofs for the parity. Theoretical Computer Science, 259(1):613–622, 2001. doi:10.1016/S0304-3975(00)00157-2.
  • [18] Venkatesan Guruswami, Pravesh K. Kothari, and Peter Manohar. Algorithms and certificates for Boolean CSP refutation: smoothed is no harder than random. In STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 – 24, 2022, pages 678–689. ACM, 2022. doi:10.1145/3519935.3519955.
  • [19] Johan Håstad. Some optimal inapproximability results. Journal of the ACM (JACM), 48(4):798–859, 2001. doi:10.1145/502090.502098.
  • [20] Jun-Ting Hsieh, Pravesh K. Kothari, and Sidhanth Mohanty. A simple and sharper proof of the hypergraph Moore bound. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 2324–2344. SIAM, 2023. doi:10.1137/1.9781611977554.CH89.
  • [21] Russell Impagliazzo and Ramamohan Paturi. On the Complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001. doi:10.1006/JCSS.2000.1727.
  • [22] Aayush Jain, Huijia Lin, and Amit Sahai. Indistinguishability obfuscation from well-founded assumptions. In STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 60–73. ACM, 2021. doi:10.1145/3406325.3451093.
  • [23] Aayush Jain, Huijia Lin, and Amit Sahai. Indistinguishability obfuscation from LPN over 𝔽p, DLIN, and PRGs in NC0. In Advances in Cryptology – EUROCRYPT 2022 – 41st Annual International Conference on the Theory and Applications of Cryptographic Techniques, Trondheim, Norway, May 30 – June 3, 2022, Proceedings, Part I, volume 13275 of Lecture Notes in Computer Science, pages 670–699. Springer, 2022. doi:10.1007/978-3-031-06944-4_23.
  • [24] Oliver Janzer and Peter Manohar. A kqq2 lower bound for odd query locally decodable codes from bipartite kikuchi graphs. Electron. Colloquium Comput. Complex., pages TR24–187, 2024. URL: https://eccc.weizmann.ac.il/report/2024/187.
  • [25] Subhash Khot. On the power of unique 2-prover 1-round games. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 767–775. ACM, 2002. doi:10.1145/509907.510017.
  • [26] Pravesh K. Kothari and Peter Manohar. An exponential lower bound for linear 3-query locally correctable codes. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024, pages 776–787. ACM, 2024. doi:10.1145/3618260.3649640.
  • [27] Pravesh K. Kothari and Peter Manohar. Exponential lower bounds for smooth 3-lccs and sharp bounds for designs. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024. IEEE, 2024. doi:10.1109/FOCS61266.2024.00110.
  • [28] Pravesh K. Kothari, Ryuhei Mori, Ryan O’Donnell, and David Witmer. Sum of squares lower bounds for refuting any CSP. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 132–145. ACM, 2017. doi:10.1145/3055399.3055485.
  • [29] Ryuhei Mori and David Witmer. Lower Bounds for CSP Refutation by SDP Hierarchies. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2016, September 7-9, 2016, Paris, France, volume 60 of LIPIcs, pages 41:1–41:30, 2016. doi:10.4230/LIPICS.APPROX-RANDOM.2016.41.
  • [30] Assaf Naor and Jacques Verstraëte. Parity check matrices and product representations of squares. Combinatorica, 28(2):163–185, 2008. doi:10.1007/S00493-008-2195-2.
  • [31] Ryan O’Donnell. Analysis of boolean functions. Cambridge University Press, 2014.
  • [32] Ryan O’Donnell and David Witmer. Goldreich’s PRG: evidence for near-optimal polynomial stretch. In 2014 IEEE 29th Conference on Computational Complexity (CCC), pages 1–12. IEEE, 2014. doi:10.1109/CCC.2014.9.
  • [33] Seyoon Ragavan, Neekon Vafa, and Vinod Vaikuntanathan. Indistinguishability obfuscation from bilinear maps and LPN variants. In Theory of Cryptography – 22nd International Conference, TCC 2024, Milan, Italy, December 2-6, 2024, Proceedings, Part IV, volume 15367 of Lecture Notes in Computer Science, pages 3–36. Springer, 2024. doi:10.1007/978-3-031-78023-3_1.
  • [34] Prasad Raghavendra, Satish Rao, and Tselil Schramm. Strongly refuting random CSPs below the spectral threshold. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 121–131. ACM, 2017. doi:10.1145/3055399.3055417.
  • [35] Grant Schoenebeck. Linear level lasserre lower bounds for certain k-csps. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 593–602. IEEE Computer Society, 2008. doi:10.1109/FOCS.2008.74.
  • [36] Alexander S. Wein, Ahmed El Alaoui, and Cristopher Moore. The Kikuchi Hierarchy and Tensor PCA. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 1446–1468. IEEE Computer Society, 2019. doi:10.1109/FOCS.2019.000-2.
  • [37] Tal Yankovitz. A stronger bound for linear 3-lcc. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024. IEEE, 2024. doi:10.1109/FOCS61266.2024.00109.