
Generalised Linial–Nisan Conjecture Is False for DNFs

Yaroslav Alekseev (Technion, Haifa, Israel), Mika Göös (EPFL, Lausanne, Switzerland), Ziyi Guan (EPFL, Lausanne, Switzerland), Gilbert Maystre (EPFL, Lausanne, Switzerland), Artur Riazanov (EPFL, Lausanne, Switzerland), Dmitry Sokolov (EPFL, Lausanne, Switzerland), Weiqiang Yuan (EPFL, Lausanne, Switzerland)
Abstract

Aaronson (STOC 2010) conjectured that almost k-wise independence fools constant-depth circuits; he called this the generalised Linial–Nisan conjecture. Aaronson himself later found a counterexample for depth-3 circuits. We give here an improved counterexample for depth-2 circuits (DNFs). This shows, for instance, that Bazzi’s celebrated result (k-wise independence fools DNFs) cannot be generalised in a natural way. We also propose a way to circumvent our counterexample: We define a new notion of pseudorandomness called local couplings and show that it fools DNFs and even decision lists.

Keywords and phrases:
pseudorandomness, DNFs, bounded independence
Funding:
Yaroslav Alekseev: Supported by ISF grant 507/24.
Copyright and License:
© Yaroslav Alekseev, Mika Göös, Ziyi Guan, Gilbert Maystre, Artur Riazanov, Dmitry Sokolov, and Weiqiang Yuan; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Circuit complexity; Theory of computation → Pseudorandomness and derandomization
Acknowledgements:
We thank Shalev Ben-David and Avishay Tal for helpful email communication.
Funding:
This project was supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00026.
Editors:
Srikanth Srinivasan

1 Introduction

Linial and Nisan [16] conjectured that “k-wise independent” distributions fool constant-depth circuits (class AC0). More specifically, a distribution 𝒟 over {0,1}^n is called k-independent if the marginal distribution on every k-sized subset of bits is uniform. We say that 𝒟 δ-fools a circuit C if the circuit cannot distinguish 𝒟 from the uniform distribution on {0,1}^n:

|Pr_{𝒙∼𝒟}[C(𝒙)=1] − Pr_{𝒙∼{0,1}^n}[C(𝒙)=1]| ≤ δ.

The Linial–Nisan conjecture was first proved for depth-2 circuits (DNFs and CNFs) by Bazzi [4] (with a simplification by Razborov [21]) and then for every AC0-circuit by Braverman [7]. Indeed, Braverman showed that every size-s AC0-circuit is o(1)-fooled by poly(log s)-independence.

Aaronson [1] asked whether the Linial–Nisan conjecture could be strengthened to hold also for “almost k-wise independence,” a seemingly modest generalisation. We say that a distribution 𝒟 over {0,1}^n is (ε,k)-independent if for every subset I ⊆ [n], |I| = k, the marginal distribution on the bits in I is multiplicatively close to uniform in the sense that for every α ∈ {0,1}^I,

(1−ε)·2^{−k} ≤ Pr_{𝒙∼𝒟}[𝒙_I = α] ≤ (1+ε)·2^{−k}.
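For a concrete feel for this definition, the multiplicative condition can be verified by brute force on toy examples. The following Python sketch (our own illustration, not part of the original text; all names are hypothetical) enumerates all k-subsets I and patterns α for a finitely supported distribution:

from itertools import combinations, product

def is_eps_k_independent(dist, n, k, eps):
    """dist: dict mapping n-bit tuples to probabilities (summing to 1).
    Returns True iff every k-subset marginal is within a (1 +/- eps)
    multiplicative factor of the uniform value 2**-k."""
    for I in combinations(range(n), k):
        for alpha in product((0, 1), repeat=k):
            # marginal probability that the bits indexed by I equal alpha
            p = sum(pr for x, pr in dist.items()
                    if all(x[i] == a for i, a in zip(I, alpha)))
            if not (1 - eps) * 2**-k <= p <= (1 + eps) * 2**-k:
                return False
    return True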
Generalised Linial–Nisan Conjecture (GLN).

Let 𝒟 be a (1/n^{Ω(1)}, n^{Ω(1)})-independent distribution over {0,1}^n. Then 𝒟 o(1)-fools every AC0-circuit of size 2^{n^{o(1)}}.

Aaronson’s original motivation for this conjecture was to resolve a problem in quantum complexity theory. He showed that a positive resolution of GLN would imply the separation BQP ⊄ PH relative to an oracle. (This separation was subsequently proved by Raz and Tal [20] by a different approach.) Later, Aaronson himself found a counterexample to GLN for depth-3 circuits [2], but he still re-posed the conjecture (and thought it “plausible”) for depth-2 circuits. Our main result here is to refute the GLN conjecture in this remaining case.

Theorem 1 (Main result).

There exists a (1/n^{Ω(1)}, n^{Ω(1)})-independent distribution 𝒟 over {0,1}^n and an O(log^3 n)-width DNF formula F such that

Pr_{𝒙∼𝒟}[F(𝒙)=1] − Pr_{𝒙∼{0,1}^n}[F(𝒙)=1] ≥ Ω(1).

Let us make two notes about the parameters here. First, our formula F will have quasi-polynomial size, whereas Aaronson’s depth-3 counterexample has only polynomial size; hence his example achieves slightly better parameters (at the cost of larger depth). Second, our construction can be varied to produce the following tradeoff: by increasing the DNF width to any w ≤ n^{o(1)}, we can make the distribution (exp(−w^{Ω(1)}), n^{Ω(1)})-independent (see Section 2.5).

1.1 Implications and related work

One consequence of the failure of the GLN conjecture concerns the construction of pseudorandom generators (PRGs) for DNFs. It is known that for k ≥ Ω(log n) there exist (o(1),k)-independent distributions with support size 2^{O(k)} [18, 3], which is smaller than the n^{Ω(k)} that is required for truly k-wise independent distributions [8]. Thus Theorem 1 rules out a natural approach (“output an almost k-wise independent distribution”) to improving the seed length of PRGs. For the current state-of-the-art PRGs for DNFs, see [9, 22, 17]; see also the survey [13].

Another lesson from Theorem 1 concerns the further development of circuit lower bound methods. We find it important to seek alternative proofs of central theorems such as Bazzi’s [4, 21] and its extensions [7]. The existing proofs use the polynomial method to approximate a DNF with a low-degree polynomial. Is there a more “combinatorial” proof of Bazzi’s theorem? One such more combinatorial approach is the top-down lower bound method [12, 11], which often uses entropy-based arguments to analyse circuits. We interpret the failure of GLN as a challenge to such top-down methods. While the method is in a formal sense complete (it can prove any lower bound that is true), the typical entropy-counting arguments have a hard time distinguishing almost k-wise independent distributions from truly k-wise independent ones, suggesting that any top-down proof of Bazzi’s theorem would require substantially new ideas.

Finally, we mention that – besides (almost) k-wise independence – several other notions of pseudorandomness have been considered in the literature [6, 5, 14].

1.2 Workaround: Local couplings

To complement our main result, we also propose a way to circumvent the failure of GLN via a new notion of pseudorandomness called local couplings. This notion is useful for fooling depth-2 circuit models, but not depth-3 models; in particular, we show the following claims:

  1. Local couplings fool DNFs (query complexity analogue of NP).

  2. Local couplings fool decision lists (query complexity analogue of P^NP).

  3. Local couplings do not fool depth-3 circuits (query complexity analogue of Σ₂^P).

Definition 2 (Local couplings).

A pair of jointly distributed random variables (𝒙,𝒚) ∈ ({0,1}^n)^2 is an ε-semi-coupling if for every y ∈ supp(𝒚) and i ∈ [n],

Pr[𝒙_i ≠ 𝒚_i | 𝒚 = y] ≤ ε.

We say that (𝐱,𝐲) is an ε-coupling if both (𝐱,𝐲) and (𝐲,𝐱) are ε-semi-couplings.
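As an illustration (ours, with hypothetical names), the least ε for a finitely supported pair can be computed directly from Definition 2; the ε-coupling condition is then just this quantity evaluated in both directions.

def semi_coupling_eps(joint, n):
    """joint: dict mapping pairs (x, y) of n-bit tuples to probabilities.
    Returns max over y in supp(y) and i in [n] of Pr[x_i != y_i | y = y],
    i.e. the least eps for which (x, y) is an eps-semi-coupling."""
    best = 0.0
    for y0 in {y for (_, y) in joint}:
        mass = sum(pr for (_, y), pr in joint.items() if y == y0)
        for i in range(n):
            bad = sum(pr for (x, y), pr in joint.items() if y == y0 and x[i] != y0[i])
            best = max(best, bad / mass)
    return best

def coupling_eps(joint, n):
    """(x, y) is an eps-coupling iff both directions are eps-semi-couplings."""
    swapped = {(y, x): pr for (x, y), pr in joint.items()}
    return max(semi_coupling_eps(joint, n), semi_coupling_eps(swapped, n))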

The notion of a local coupling was somewhat implicit in Aaronson’s analysis [2] of his depth-3 counterexample. Local couplings are also a stronger variant of a notion proposed by Zhandry [23] that he called “substitution distance.”

Claims 1–2

A width-k decision list is a sequence of pairs {(T_i, a_i)}_{i∈[m]} where the T_i are k-terms (conjunctions of at most k literals) and a_i ∈ {0,1} are output values. A decision list defines f : {0,1}^n → {0,1} as follows: f(x) = a_i where i = min{i ∈ [m] : T_i(x) = 1}. The decision list width of a function is polynomially equivalent to the number of DNF queries necessary to compute the function [10, Appendix A]; in other words, it is indeed a query complexity analogue of P^NP. The following theorem (Section 3.1) formalises Claims 1–2 when 𝒚 is uniformly distributed.
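For concreteness, here is a small Python sketch (our illustration; the term encoding is our own choice) that evaluates such a decision list, returning a_i for the first term that fires:

def eval_decision_list(pairs, x, default=0):
    """pairs: list of (term, a) in order, where a term is a list of literals
    (i, b) requiring x[i] == b (so width k means at most k literals per term).
    Returns a_i for i = min{i : T_i(x) = 1}; `default` is a fall-through value
    for the case that no term fires (usually excluded by a trailing empty term)."""
    for term, a in pairs:
        if all(x[i] == b for i, b in term):
            return a
    return default

# Example: (x_1=1 and x_2=0) -> 1, (x_0=1) -> 0, (empty term) -> 1
example = [([(1, 1), (2, 0)], 1), ([(0, 1)], 0), ([], 1)]
assert eval_decision_list(example, (1, 1, 0)) == 1   # first term fires
assert eval_decision_list(example, (1, 0, 0)) == 0   # second term fires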

Theorem 3.

Let f be computed by a width-k decision list. For any ε-coupling (𝐱,𝐲),

Pr[f(𝒙) ≠ f(𝒚)] ≤ 2kε.

Claim 3

Aaronson’s [2] original counterexample involved a distribution 𝒟 related to a certain surjectivity function, which can be computed by a small depth-3 circuit. We observe (Section 3.2) that Aaronson’s distribution 𝒟 can indeed be locally coupled with the uniform distribution, which implies that local couplings do not fool depth-3 circuits (Claim 3). We can furthermore conclude (using Theorem 3) that 𝒟 fools decision lists. This claim was already made earlier by Aaronson [2, Theorem 3], but his proof contained a mistake, which we can now fix with the notion of local couplings. (The mistake is acknowledged on the author’s homepage. An implication of the claim would have been the separation Π₂^P ⊄ P^NP relative to a random oracle; that result however follows, using a function different from surjectivity, from the more recent result that PH is infinite in the random oracle model [15].) Finally, we also show (Section 3.3) that an ε-semi-coupling is not enough by itself to fool DNFs – one truly needs the two-sided condition of an ε-coupling.

2 Counterexample

In this section, we prove Theorem 1 by constructing a DNF formula F and an associated almost k-independent distribution 𝒟 that F can distinguish from uniform. We first (Section 2.1) construct a weak example that distinguishes 𝒟 from uniform with advantage 1/poly(log n). Then (Section 2.4) we amplify this advantage to Ω(1) by using a standard majority trick.

2.1 Construction

Our starting point is the address function Addr : {0,1}^m × {0,1}^{2^m} → {0,1} defined as Addr(a,p) := p_a. Here we write p_a to mean p_{int(a)} where int(a) ∈ [2^m] is the integer corresponding naturally to the bitstring a. Let us first observe that Addr together with the uniform distribution over Addr^{−1}(1) “almost works” as the counterexample in Theorem 1. For (𝒂,𝒑) ∼ Addr^{−1}(1) the distribution of 𝒑 is already (o(1), 2^{Ω(m)})-independent. The reason the whole (𝒂,𝒑) does not have the same property is that, for example, fixing all bits of 𝒂 to some a forces 𝒑_a = 1, so for some I ⊆ [m + 2^m] containing all bits describing 𝒂 and the bit 𝒑_a, the settings of (𝒂,𝒑)_I with 𝒑_a = 0 have probability zero.

To avoid this issue we hide the bits of the address using the usual tribes function:

Tribes(A) := ⋁_{j∈[r]} ⋀_{k∈[m]} A_{k,j}.

The input here is an m×r boolean matrix and the function returns 1 iff the matrix contains an all-1 column. It is well known [19, §4.2] that if we choose r := 2^m ln 2, the function becomes balanced, meaning that Pr_{𝑨∼{0,1}^{m×r}}[Tribes(𝑨) = 1] = 1/2 + o(1).
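A minimal Python sketch of Tribes with this choice of r (our illustration; the rounding of r and the Monte-Carlo check are our own additions):

import math
import random

def tribes(A):
    """A: an m x r boolean matrix given as a list of rows.
    Returns 1 iff some column of A is all-1."""
    m, r = len(A), len(A[0])
    return int(any(all(A[k][j] for k in range(m)) for j in range(r)))

def random_matrix(m, r):
    return [[random.randint(0, 1) for _ in range(r)] for _ in range(m)]

if __name__ == "__main__":
    m = 6
    r = round(2**m * math.log(2))          # r := 2^m ln 2
    trials = 20000
    freq = sum(tribes(random_matrix(m, r)) for _ in range(trials)) / trials
    print(f"empirical Pr[Tribes = 1] ~ {freq:.3f}")   # close to 1/2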

A natural attempt to define a counterexample would be to consider the distinguishing function Addr((Tribes(A^1),…,Tribes(A^m)), p). This does not work since this function requires polynomial DNF width, as the negation of Tribes reduces to it. We fix this by replacing the Addr function by its monotone version: mAddr : {0,1}^m × {0,1}^{2^m} → {0,1} is defined as

mAddr(a,p) := 0 if |a| < m/2;  p_a if |a| = m/2;  1 if |a| > m/2,   where |a| is the Hamming weight of a.

We are now ready to define our function f : ({0,1}^{m×r})^m × {0,1}^{2^m} → {0,1} by (see also Figure 1)

f(A^1,…,A^m, p) := mAddr((Tribes(A^1),…,Tribes(A^m)), p).   (1)

Note that the input size of f is n := m²r + 2^m. Claim 4 below constructs a narrow DNF for f.
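Continuing the sketch above (again our own illustration; `tribes` is the function from the previous sketch, and reading the address with the first bit most significant is a convention we pick here), mAddr and the distinguisher f of (1) can be written directly:

def m_addr(a, p):
    """a: tuple of m address bits, p: list of 2^m bits; the monotone address function."""
    m, weight = len(a), sum(a)
    if 2 * weight < m:
        return 0
    if 2 * weight > m:
        return 1
    return p[int("".join(map(str, a)), 2)]   # one concrete choice of int(a)

def f(As, p):
    """As: list of m matrices A^1,...,A^m, p: list of 2^m bits; equation (1)."""
    return m_addr(tuple(tribes(A) for A in As), p)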

Figure 1: Illustration of mAddr(Tribes(A^1),…,Tribes(A^4), p). Blue cells correspond to 1-input bits, white cells correspond to 0-input bits. The address a = (Tribes(A^1),…,Tribes(A^4)) is (0,1,0,1), so it satisfies |a| = 4/2. Hence the function outputs p_{int(a)} = p_{11} = 1.
Claim 4.

There exists a DNF F of width O(log² n) that computes f.

Proof.

A DNF is commonly viewed as a collection of 1-certificates: f is computable by a k-DNF iff for each point x ∈ f^{−1}(1) there exists a certificate comprised of a subset of input bits I ⊆ [n] of size k and α ∈ {0,1}^I such that x_I = α implies f(x) = 1. Hence it is enough to provide a certificate of width O(m²) = O(log² n) for each 1-input of f. Consider a 1-input x = (A^1,…,A^m, p) and let a := (Tribes(A^1),…,Tribes(A^m)). If |a| > m/2, a 1-certificate is simply a set of matrices H ⊆ [m] of size |H| = m/2 + 1 together with an all-1 column (a 1^m-column) in each of those matrices. Such certificates fix (m/2+1)·m variables. A similar idea can certify 1-inputs with |a| = m/2, at the cost of adding the corresponding bit p_a.

We now define 𝒟 as the distribution of the random variable 𝒙 defined below.

Definition 5.

Let 𝒙 = (𝑨^1, 𝑨^2, …, 𝑨^m, 𝒑) over {0,1}^n be sampled as follows:

  1. Sample 𝑨^i ∼ {0,1}^{m×r} uniformly and independently for each i ∈ [m].

  2. Sample 𝒑 ∼ {0,1}^{2^m} uniformly and independently.

  3. Let 𝒂 = (Tribes(𝑨^1),…,Tribes(𝑨^m)); if |𝒂| = m/2, fix 𝒑_𝒂 = 1.
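To make the definition concrete, here is a Python sampler for 𝒟 (our sketch; it reuses `tribes`, `random_matrix` and `f` from the sketches above and is feasible only for very small m), together with a rough Monte-Carlo comparison of Pr[f = 1] under 𝒟 and under the uniform distribution, in the spirit of Lemma 7 below:

import math
import random

def sample_uniform(m):
    r = round(2**m * math.log(2))
    As = [random_matrix(m, r) for _ in range(m)]
    p = [random.randint(0, 1) for _ in range(2**m)]
    return As, p

def sample_D(m):
    """One sample (A^1,...,A^m, p) from Definition 5."""
    As, p = sample_uniform(m)                        # Steps 1-2
    a = tuple(tribes(A) for A in As)
    if 2 * sum(a) == m:                              # Step 3: fix p_a := 1
        p[int("".join(map(str, a)), 2)] = 1
    return As, p

if __name__ == "__main__":
    m, trials = 4, 3000
    p_D = sum(f(*sample_D(m)) for _ in range(trials)) / trials
    p_U = sum(f(*sample_uniform(m)) for _ in range(trials)) / trials
    print(f"Pr_D[f=1] ~ {p_D:.3f}  vs  Pr_uniform[f=1] ~ {p_U:.3f}")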

We show that 𝒟 is (n^{−1/5}, n^{1/5})-independent, yet f distinguishes 𝒟 from the uniform distribution, which together with Claim 4 implies the following weaker version of Theorem 1:

Lemma 6.

The distribution 𝒟 as in Definition 5 is (n^{−1/5}, n^{1/5})-independent, but there is an O(log² n)-DNF F such that Pr_{𝒙∼𝒟}[F(𝒙)=1] − Pr_{𝒙∼{0,1}^n}[F(𝒙)=1] = Ω(log^{−1/2} n).

We reduce the proof of Lemma 6 to the following two lemmas:

Lemma 7.

Pr_{𝒙∼𝒟}[f(𝒙)=1] − Pr_{𝒙∼{0,1}^n}[f(𝒙)=1] = Ω(1/√m), where m is as in Definition 5.

Lemma 8.

The distribution 𝒟 as in Definition 5 is (ε,k)-independent for k ≤ 2^m and ε = k·2^{−m/2+1}.

Proof of Lemma 6.

𝒟 is distributed over {0,1}^n where n := m²·2^m·ln 2 + 2^m. Apply Lemma 8 with k = n^{1/5} and ε = n^{1/5}·2^{−m/2+1} ≤ n^{−1/5}. Then by Lemma 7 and Claim 4 there exists an O(m²) = O(log² n)-DNF that Ω(1/√m) = Ω(log^{−1/2} n)-distinguishes 𝒟 from 𝒰, where the latter is uniformly distributed over {0,1}^n.

2.2 Proof of Lemma 7

Let 𝒙 := (𝑨^1,…,𝑨^m, 𝒑) ∼ 𝒟 and let 𝒚 ∼ {0,1}^n be uniform. Since the matrices 𝑨^i are uniformly generated, it is possible to couple 𝒙 and 𝒚 by defining 𝒚 := (𝑨^1,…,𝑨^m, 𝒑′) where 𝒑′ ∼ {0,1}^{2^m} is uniform and independent of everything else. Note that the address parts of the two inputs coincide and, in particular, they share the event E := “|𝒂| = m/2”.

Observe that Pr[f(𝒙)=1 | ¬E] = Pr[f(𝒚)=1 | ¬E] by the definition of 𝒟: if |𝒂| ≠ m/2 then the fix in Step 3 of Definition 5 is not applied and 𝒑 is uniform. On the other hand, we have Pr[f(𝒙)=1 | E] = 1. Indeed, if E holds, we have f(𝒙) = 𝒑_𝒂 = 1 by the definition of f and 𝒟.

Pr[f(𝒚)=1 | E] = Pr[𝒑′_𝒂 = 1 | E]
 = ∑_{a∈{0,1}^m} Pr[𝒑′_a = 1 | E, 𝒂 = a]·Pr[𝒂 = a | E]
 = ∑_{a∈{0,1}^m} Pr[𝒑′_a = 1]·Pr[𝒂 = a | E] = 1/2.

Thus, Pr[f(𝒙)=1] − Pr[f(𝒚)=1] = Pr[E]/2 and so it remains to bound Pr[E]. For that, we need the following simple fact:

Lemma 9.

Let 𝒙 be distributed over {0,1}^n according to a product distribution such that |Pr[𝒙_i = 1] − 1/2| ≤ ε for every i ∈ [n]. Then Δ(𝒙,𝒖) := max_{E⊆{0,1}^n} |Pr[𝒙 ∈ E] − Pr[𝒖 ∈ E]| ≤ 2nε, where 𝒖 ∼ {0,1}^n is uniform.

Proof.

Let us couple 𝒙 with 𝒖 as follows: suppose Pr[𝒙_i = 1] = 1/2 + p. We set 𝒖_i := 𝒙_i with probability 1/(1+2|p|), and otherwise (with probability 1 − 1/(1+2|p|) ≤ 2|p| ≤ 2ε) we set 𝒖_i := ⟦p < 0⟧ (that is, 1 if p < 0 and 0 otherwise). Then Pr[𝒖_i = 1] = (1/2+p)/(1+2|p|) + ⟦p < 0⟧·(1 − 1/(1+2|p|)) = 1/2, so 𝒖 is indeed uniformly distributed. Moreover, Pr[𝒙 ≠ 𝒖] ≤ 2nε by a union bound over the coordinates, so Δ(𝒙,𝒖) ≤ 2nε.

Note that each bit 𝒂i is close to being balanced:

Pr[𝒂_i = 1] = 1 − (1 − 2^{−m})^r = 1 − (1/e + Θ(2^{−m}))^{ln 2} = 1/2 + Θ(2^{−m}).

As all 𝒂_i are independent, we can use Lemma 9 to get sharp bounds on the probability that their sum is exactly m/2: Pr[E] ≥ Pr_{𝒙∼{0,1}^m}[|𝒙| = m/2] − Θ(m·2^{−m}) = Ω(1/√m).

2.3 Proof of Lemma 8

We need to show that for every I ⊆ [n] of size k and for every α ∈ {0,1}^I we have (1−ε)·2^{−k} ≤ Pr[𝒙_I = α] ≤ (1+ε)·2^{−k}. We now classify the bits of I and α. Let I_i ⊆ [m]×[r] for i ∈ [m] be the set of bits of 𝑨^i in I. Let J ⊆ {0,1}^m be the set of bit indices of 𝒑 that belong to I (we identify the indices with their bit representations). Let α_i ∈ {0,1}^{I_i} and β ∈ {0,1}^J be the corresponding parts of α.

Since 𝑨^1,…,𝑨^m are uniformly distributed it suffices to show that

(1−ε)·2^{−|J|} ≤ Pr[𝒑_J = β | ∀i∈[m]: 𝑨^i_{I_i} = α_i] ≤ (1+ε)·2^{−|J|}.

Let J_{m/2} := {s ∈ J : |s| = m/2}. Intuitively, the only non-uniformity in 𝒙_I is introduced when 𝒂 ∈ J_{m/2}, as this is the only case where 𝒑 is changed from uniform. We make this intuition precise in the following claim.

Claim 10.

For any event E that is a function of 𝑨^1,…,𝑨^m we have

(1 − Pr[𝒂 ∈ J_{m/2} | E])·2^{−|J|} ≤ Pr[𝒑_J = β | E] ≤ (1 + Pr[𝒂 ∈ J_{m/2} | E])·2^{−|J|}.
Proof.

Let J_i := {j ∈ J_{m/2} : β_j = i} for i ∈ {0,1}. By the law of total probability we get

Pr[𝒑_J = β | E] = Pr[𝒑_J = β | E, 𝒂 ∈ J_0]·Pr[𝒂 ∈ J_0 | E]
 + Pr[𝒑_J = β | E, 𝒂 ∈ J_1]·Pr[𝒂 ∈ J_1 | E]
 + 2^{−|J|}·Pr[𝒂 ∉ J_{m/2} | E]   (2)
 = 0 + 2^{−(|J|−1)}·Pr[𝒂 ∈ J_1 | E] + 2^{−|J|}·Pr[𝒂 ∉ J_{m/2} | E]   (3)
 = 2^{−|J|}·(Pr[𝒂 ∉ J_{m/2} | E] + 2·Pr[𝒂 ∈ J_1 | E]).   (4)

In (2) and (3) we use that, given 𝒂, the event E is independent of 𝒑. Since (4) is minimized when J_1 = ∅ and maximized when J_1 = J_{m/2}, we have the claim.

Now let E be the event “∀i∈[m]: 𝑨^i_{I_i} = α_i”. Let us compute Pr[𝒂 = s | E] for s ∈ J_{m/2} ⊆ {0,1}^m. Since s ∈ J_{m/2} we have |s| = m/2; wlog let s = 0^{m/2}1^{m/2}. Since the bits of 𝒂, denoted 𝒂_1,…,𝒂_m, are independent and E is a conjunction of independent events, we have

Pr[𝒂 = s | E] = ∏_{ℓ∈[m/2]} Pr[𝒂_ℓ = 0 | E] · ∏_{ℓ∈[m]∖[m/2]} Pr[𝒂_ℓ = 1 | E]
 ≤ ∏_{ℓ∈[m/2]} Pr[𝒂_ℓ = 0 | 𝑨^ℓ_{I_ℓ} = α_ℓ].

Let us fix ℓ ∈ [m/2] and bound Pr[𝒂_ℓ = 0 | 𝑨^ℓ_{I_ℓ} = α_ℓ]. By definition 𝒂_ℓ = Tribes(𝑨^ℓ), so it equals 0 iff no column of 𝑨^ℓ is all-1; in particular, all columns that do not contain bits of I_ℓ must not be all-1. For each of these columns the probability that it is not all-1 is 1 − 2^{−m}. Since there are at least 2^m ln 2 − |I_ℓ| such columns we get

Pr[𝒂 = s | E] ≤ ∏_{ℓ∈[m/2]} (1 − 2^{−m})^{2^m ln 2 − |I_ℓ|}
 = (1 − 2^{−m})^{(m/2)·2^m ln 2} · (1 − 2^{−m})^{−∑_{ℓ∈[m/2]} |I_ℓ|}
 ≤ 2^{−m/2}·(1 − 2^{−m})^{−k}
 ≤ 2^{−m/2+1}.

Thus, Pr[𝒂 ∈ J_{m/2} | E] ≤ |J|·2^{−m/2+1} ≤ k·2^{−m/2+1}, so we conclude the proof by Claim 10.

2.4 Amplification

In this section we reduce Theorem 1 to Lemma 6. The construction is a simple variation of the majority vote of several instances of f. We prove that our construction indeed amplifies the distinguishing probability in the following lemma.

Lemma 11.

Suppose 𝒙 is distributed over {0,1}^n and there exists a function g : {0,1}^n → {0,1} such that

Pr[g(𝒙)=1] − Pr_{𝒖∼{0,1}^n}[g(𝒖)=1] ≥ δ,

for some δ depending on n. Let α = (Pr[g(𝒙)=1] + Pr[g(𝒖)=1])/2. Then for t = 2/δ² we have

Pr[∑_{i∈[t]} g(𝒙^i) ≥ tα] − Pr[∑_{i∈[t]} g(𝒖^i) ≥ tα] ≥ Ω(1),

where 𝒙^1,…,𝒙^t are independent samples of 𝒙 and 𝒖^1,…,𝒖^t ∼ {0,1}^n are uniform and independent.

Proof.

Let p_x = 𝔼[g(𝒙)]. Since 𝔼[∑_{i∈[t]} g(𝒙^i)] = t·p_x, we have by Hoeffding's inequality,

Pr[∑_{i∈[t]} g(𝒙^i) ≥ αt] = 1 − Pr[∑_{i∈[t]} g(𝒙^i) < αt] ≥ 1 − e^{−2t²(p_x−α)²/t} ≥ 1 − e^{−(tδ)²/(2t)}.

Similarly, we can conclude that Pr[∑_{i∈[t]} g(𝒖^i) ≥ αt] ≤ e^{−(tδ)²/(2t)}, hence,

Pr[∑_{i∈[t]} g(𝒙^i) ≥ αt] − Pr[∑_{i∈[t]} g(𝒖^i) ≥ αt] ≥ 1 − 2e^{−tδ²/2}.

With t = 2/δ², we conclude the proof. We now need to show that a narrow DNF can check whether ∑_{i∈[t]} f(x^i) ≥ αt. In fact, this is true for any monotone function composed with a narrow DNF:

Lemma 12.

Let f : {0,1}^n → {0,1} be a function that can be computed by an ℓ-DNF D. Let g : {0,1}^t → {0,1} be a monotone function. Then g∘f^t(x^1,…,x^t) := g(f(x^1),…,f(x^t)) can be computed by a tℓ-DNF.

Proof.

Since f can be computed by an ℓ-DNF, a 1-certificate of f is a satisfying assignment for one term of D, which has size at most ℓ. Since g is monotone we can certify that g∘f^t(x^1,…,x^t) = 1 by giving a 1-certificate that D(x^i) = 1 for every i ∈ [t] where that is the case. Such a certificate has size at most tℓ, which implies that g∘f^t can be computed by a tℓ-DNF.

Finally, we need to show that independent copies of an (ε,k)-independent distribution comprise an (O(εt),k)-independent distribution:

Lemma 13.

If 𝒟 is (ε,k)-independent, then the product distribution 𝒟^t is (O(εt), k)-independent.

Proof.

Suppose 𝒙 ∼ 𝒟. Let 𝒙^t ∼ 𝒟^t consist of t independent copies of 𝒙. Fix a k-subset I ⊆ [nt] and α ∈ {0,1}^I. For every i ∈ [t], we define I_i and α_i to be the parts of I and α, respectively, that fall in the i-th copy 𝒙^i. Then,

Pr[𝒙^t_I = α] = ∏_{i∈[t]} Pr[(𝒙^i)_{I_i} = α_i] = ∏_{i∈[t]} Pr[𝒙_{I_i} = α_i].

Since 𝒙 is (ε,k)-independent, for every i ∈ [t], (1−ε)·2^{−|I_i|} ≤ Pr[𝒙_{I_i} = α_i] ≤ (1+ε)·2^{−|I_i|}. Hence, for small enough ε:

(1 − 2tε)·2^{−k} ≤ 2^{−∑_{i∈[t]}|I_i|}·(1−ε)^t ≤ Pr[𝒙^t_I = α] ≤ 2^{−∑_{i∈[t]}|I_i|}·(1+ε)^t ≤ 2^{−k}·(1 + 2tε).

Proof of Theorem 1.

Let s be a natural number to be fixed later. Let 𝒟 be the (s^{−1/5}, s^{1/5})-independent distribution over {0,1}^s from Lemma 6. Let D be the O(log² s)-DNF such that

Pr_{𝒙∼𝒟}[D(𝒙)=1] − Pr_{𝒖∼{0,1}^s}[D(𝒖)=1] = Ω(log^{−1/2} s).

From Lemma 13, for every t, 𝒟^t is (O(t·s^{−1/5}), s^{1/5})-independent. By Lemma 11, for φ(x^1,…,x^t) := ⟦∑_{i=1}^t D(x^i) ≥ αt⟧ (that is, 1 if ∑_{i=1}^t D(x^i) ≥ αt and 0 otherwise), when t = O(log s),

Pr_{𝒙^t∼𝒟^t}[φ(𝒙^t)] − Pr_{𝒖^t∼{0,1}^{st}}[φ(𝒖^t)] = Ω(1).

Moreover, φ can be computed by an O(t·log² s)-DNF by Lemma 12. Choosing t = O(log s) and ts = n, we get that there exists an (O(log n · n^{−1/5}), Ω((n/log n)^{1/5}))-independent distribution 𝒟^t over {0,1}^n that can be Ω(1)-distinguished from the uniform distribution by an O(log³ n)-DNF, which implies the claim.
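As a small sketch of the amplified distinguisher (our illustration; `dnf` stands for any 0/1-valued predicate such as D above, and `alpha` for the threshold of Lemma 11), φ simply thresholds the number of accepting copies, which by Lemma 12 increases the DNF width by at most a factor of t:

def threshold_distinguisher(dnf, inputs, alpha):
    """inputs: a list of t independent samples x^1, ..., x^t.
    Accepts iff sum_i dnf(x^i) >= alpha * t, i.e. the monotone threshold
    function of Lemma 11 composed with the DNF as in Lemma 12."""
    t = len(inputs)
    return int(sum(dnf(x) for x in inputs) >= alpha * t)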

2.5 Variation: Tradeoff between width and error

We finally sketch an extension of our construction that gives a tradeoff between DNF width and ε.

Theorem 14.

For any w ≥ Ω(log n) there exists a function f_w : {0,1}^n → {0,1} computable by a w^{O(1)}-DNF and an (n^{−Ω(w)}, n^{Ω(1)})-independent distribution 𝒟 over {0,1}^n such that

Pr_{𝒙∼𝒟}[f_w(𝒙)] − Pr_{𝒙∼{0,1}^n}[f_w(𝒙)] ≥ Ω(1).
Proof sketch.

We define a “monotone xor” of the functions Addr as follows: g : ({0,1}^m)^w × ({0,1}^{2^m})^w → {0,1} where g(a^1,…,a^w, p^1,…,p^w) := p^1_{a^1} ⊕ ⋯ ⊕ p^w_{a^w} if |a| = wm/2, where a = (a^1,…,a^w) and |a| denotes its total Hamming weight; if |a| ≠ wm/2 the value of g is 1 iff |a| > wm/2. The distinguisher f_w is then defined by hiding the bits of a in Tribes instances:

f_w(A^1,…,A^{mw}, p^1,…,p^w) := g(Tribes(A^1),…,Tribes(A^{mw}), p^1,…,p^w).

We sample 𝒙 from the distribution 𝒟 in two steps:

  1. Sample 𝒙 = (𝑨^1,…,𝑨^{mw}, 𝒑^1,…,𝒑^w) uniformly at random.

  2. If for 𝒂 = Tribes^{mw}(𝑨) it happens that |𝒂| = wm/2 and g(𝒂,𝒑) = 0, we flip a random bit among 𝒑^1_{𝒂^1},…,𝒑^w_{𝒂^w}.

The Ω(1/√(mw))-distinguishability of 𝒟 from the uniform distribution by f_w is shown analogously to Lemma 7. Then, according to Section 2.4, we increase the width of the DNF by a factor of O(mw) to get an Ω(1)-distinguisher. The result then follows by choosing the appropriate constants in the Ω and big-O notation.

Now we show the (n^{−Ω(w)}, n^{Ω(1)})-independence of 𝒟: analogously to Claim 10, one can show that to establish that 𝒟 is (O(ε), k)-independent it suffices to bound Pr[𝒂^1 ∈ J_1 ∧ ⋯ ∧ 𝒂^w ∈ J_w | 𝑨_I = α] by O(ε) for all J_1,…,J_w ⊆ [2^m] and I ⊆ ([m]×[2^m ln 2])×[mw] such that |J_1| + ⋯ + |J_w| + |I| ≤ k. Now for every j = (j_1,…,j_w) ∈ J_1×⋯×J_w such that |j| = mw/2 we have, analogously to Lemma 8, Pr[𝒂 = j | 𝑨_I = α] ≤ 2^{−mw/2+w} as long as |I| ≤ 2^m ln 2. Assuming that |J| ≤ k ≤ 2^{m/4} = n^{Ω(1)} we get that ∏_{i∈[w]} |J_i| ≤ 2^{mw/4} and therefore ε ≤ 2^{−mw/4+w} = n^{−Ω(w)}.

3 Local couplings

3.1 Couplings fool decision lists: Proof of Theorem 3

Let T_1,…,T_M be the k-terms in the decision list defining f. It is sufficient to show that for L(x) := min{i ∈ [M] : T_i(x) = 1} we have Pr[L(𝒙) ≠ L(𝒚)] ≤ 2kε. We show that Pr[L(𝒙) ≤ L(𝒚)] and Pr[L(𝒚) ≤ L(𝒙)] are both high and conclude the statement from that. Let us show Pr[L(𝒙) ≤ L(𝒚)] ≥ 1 − kε using that (𝒙,𝒚) is an ε-semi-coupling. Denoting by supp(T_i) ⊆ [n] the set of input bits mentioned in the term T_i, we write

Pr[L(𝒙) ≤ L(𝒚)] = ∑_{i∈[M]} Pr[L(𝒙) ≤ i | L(𝒚) = i]·Pr[L(𝒚) = i]
 ≥ ∑_{i∈[M]} Pr[T_i(𝒙) = 1 | L(𝒚) = i]·Pr[L(𝒚) = i]
 ≥ ∑_{i∈[M]} Pr[𝒙_{supp(T_i)} = 𝒚_{supp(T_i)} | L(𝒚) = i]·Pr[L(𝒚) = i]
 ≥ ∑_{i∈[M]} Pr[L(𝒚) = i]·(1 − ∑_{j∈supp(T_i)} Pr[𝒙_j ≠ 𝒚_j | L(𝒚) = i]).

In order to conclude that Pr[L(𝒙) ≤ L(𝒚)] ≥ 1 − kε it suffices to show that Pr[𝒙_j ≠ 𝒚_j | L(𝒚) = i] ≤ ε. This follows from the law of total probability:

Pr[𝒙_j ≠ 𝒚_j | L(𝒚) = i] = ∑_{y : L(y) = i} Pr[𝒚 = y | L(𝒚) = i]·Pr[𝒙_j ≠ 𝒚_j | 𝒚 = y] ≤ ε.

Now the same argument shows that, since (𝒚,𝒙) is an ε-semi-coupling, we have Pr[L(𝒚) ≤ L(𝒙)] ≥ 1 − kε. We conclude Theorem 3 by the union bound.

3.2 Surjectivity fools decision lists

Aaronson [2] refuted the GLN conjecture by considering the following distribution:

Definition 15.

For every n = m²·2^m, let N = m·2^m. Define 𝒟_n (or simply 𝒟 when n is clear from the context) as the distribution of 𝒙 = (𝒙_1,…,𝒙_N) ∈ ({0,1}^m)^N generated as follows:

  1. Sample 𝒙′ = (𝒙′_1,…,𝒙′_N) ∼ ({0,1}^m)^N uniformly.

  2. Sample 𝒚 ∼ {0,1}^m uniformly.

  3. For each i ∈ [N], let 𝒙_i := 𝒙′_i if 𝒙′_i ≠ 𝒚; otherwise 𝒙_i is sampled uniformly from {0,1}^m ∖ {𝒚}.
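A Python sketch of this sampler (our illustration; each m-bit block is encoded as an integer in {0,…,2^m−1}, and we also return 𝒙′ and 𝒚 so the coupling used below can be inspected):

import random

def sample_aaronson(m):
    """Sample x ~ D_n of Definition 15 (n = m^2 * 2^m, N = m * 2^m)."""
    N = m * 2**m
    x_prime = [random.randrange(2**m) for _ in range(N)]     # Step 1
    y = random.randrange(2**m)                                # Step 2
    x = []
    for xi in x_prime:                                        # Step 3
        if xi != y:
            x.append(xi)
        else:
            v = random.randrange(2**m - 1)                    # uniform over {0,1}^m \ {y}
            x.append(v if v < y else v + 1)
    return x, x_prime, y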

Aaronson proved the following.

Theorem 16 ([2]).

For every n = m²·2^m, 𝒟 is (k·2^{−m+1}, k)-independent for all k ≤ 2^{m−1}. Moreover, there is a depth-3 AC0 circuit C : {0,1}^n → {0,1} of size O(n²) such that

|Pr_{𝒖∼{0,1}^n}[C(𝒖)=1] − Pr_{𝒙∼𝒟}[C(𝒙)=1]| ≥ Ω(1).

We prove that Aaronson’s counterexample, however, cannot refute the GLN conjecture for more restricted models, not even for decision lists.

Lemma 17.

For every n = m²·2^m and every decision list L : {0,1}^n → {0,1} of width k,

|Pr_{𝒖∼{0,1}^n}[L(𝒖)=1] − Pr_{𝒙∼𝒟}[L(𝒙)=1]| ≤ 2k·log² n/n.
Proof.

Let 𝒙, 𝒙′ be as in Definition 15. Note that 𝒙 ∼ 𝒟 and 𝒙′ ∼ {0,1}^n. By Theorem 3, it suffices to show that 𝒙 is 2^{−m}-coupled with 𝒙′, where 2^{−m} ≤ log² n/n.

By definition, we need to show that (𝒙,𝒙′) and (𝒙′,𝒙) are 2^{−m}-semi-couplings. The former directly follows from Definition 15: for every x′ ∈ {0,1}^n and i ∈ [N],

Pr[𝒙_i ≠ 𝒙′_i | 𝒙′ = x′] = Pr[𝒙′_i = 𝒚 | 𝒙′ = x′] = 2^{−m}.

Regarding the latter, fix any x ∈ supp(𝒟) and i ∈ [N]. For each y ∈ {0,1}^m ∖ Im(x) we have

Pr[𝒙′_i ≠ 𝒙_i | 𝒙 = x, 𝒚 = y] = Pr[𝒙′_i = y | 𝒙 = x, 𝒚 = y]
 = Pr[𝒙′_i = y | 𝒙_i = x_i, 𝒚 = y]   (5)
 = Pr[𝒙′_i = y ∧ 𝒙_i = x_i | 𝒚 = y] / Pr[𝒙_i = x_i | 𝒚 = y]
 = ((2^m − 1)^{−1}·2^{−m}) / (2^m − 1)^{−1} = 2^{−m}.

Crucially, (5) holds since, given 𝒚 = y, the random variables {(𝒙′_j, 𝒙_j)}_{j∈[N]} are independent of each other. We conclude by the law of total probability:

Pr[𝒙′_i ≠ 𝒙_i | 𝒙 = x] = ∑_{y∈{0,1}^m∖Im(x)} Pr[𝒚 = y | 𝒙 = x]·Pr[𝒙′_i ≠ 𝒙_i | 𝒙 = x, 𝒚 = y] = 2^{−m}.

3.3 Semi-couplings do not fool DNFs

In this section we give an example of a semi-coupling (𝒙,𝒖), where 𝒖 ∼ {0,1}^n, such that 𝒙 can be distinguished from 𝒖 by a polylogarithmic-width DNF. First, observe that we can interpret the definition of 𝒙 in Definition 5 as a coupling with the uniform distribution: we sample 𝑨^1,…,𝑨^m, 𝒑′ uniformly and then modify 𝒑′ in the location 𝒂 = (Tribes(𝑨^1),…,Tribes(𝑨^m)). With 𝒑′ being the state of 𝒑 before the change, this defines some coupling between 𝒙 and the uniformly distributed (𝑨^1,…,𝑨^m, 𝒑′). This, however, is not a semi-coupling: if we fix 𝑨^1,…,𝑨^m to some value such that |𝒂| = m/2 and fix 𝒑′ such that 𝒑′_𝒂 = 0, then 0 = 𝒑′_𝒂 ≠ 𝒑_𝒂 = 1 with probability 1.

We modify the distribution from Definition 5 by replacing each bit of 𝐩 with an instance of Tribes.

Lemma 18.

There exists an n^{−0.6}-semi-coupling (𝒙,𝒖) with 𝒖 ∼ {0,1}^n and an O(log² n)-DNF that Ω(log^{−1/2} n)-distinguishes 𝒙 from 𝒖.

Proof.

Consider the smallest m such that m²·2^m·ln 2 + 2m·2^{2m}·ln 2 ≥ n. We define the coupling as follows:

  1. Sample 𝑨 = (𝑨^1,…,𝑨^m) ∼ ({0,1}^{m×2^m ln 2})^m uniformly.

  2. Sample 𝑷 = (𝑷_1,…,𝑷_{2^m}) ∼ ({0,1}^{2m×2^{2m} ln 2})^{2^m} uniformly.

  3. Take 𝑸 = 𝑷.

  4. Define 𝒂 ∈ {0,1}^m by 𝒂_i = Tribes(𝑨^i) for each i ∈ [m].

  5. If |𝒂| = m/2, choose 𝒋 ∼ [2^{2m} ln 2] uniformly and force (𝑸_𝒂)_{ℓ,𝒋} := 1 for each ℓ ∈ [2m].
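A Python sketch of this coupled sampler (ours, reusing `tribes` from the sketch in Section 2.1; only tiny values of m are feasible since the 𝑷-part consists of 2^m matrices):

import copy
import math
import random

def sample_coupled_pair(m):
    """Returns (x, u) = ((A, Q), (A, P)) following Steps 1-5 above."""
    r_a = round(2**m * math.log(2))            # columns of each A^i
    r_p = round(2**(2 * m) * math.log(2))      # columns of each P matrix
    A = [[[random.randint(0, 1) for _ in range(r_a)] for _ in range(m)]
         for _ in range(m)]
    P = [[[random.randint(0, 1) for _ in range(r_p)] for _ in range(2 * m)]
         for _ in range(2**m)]
    Q = copy.deepcopy(P)
    a = tuple(tribes(Ai) for Ai in A)
    if 2 * sum(a) == m:                        # Step 5
        idx = int("".join(map(str, a)), 2)     # which matrix Q_a, address read as an integer
        j = random.randrange(r_p)
        for row in range(2 * m):
            Q[idx][row][j] = 1                 # force column j of Q_a to all-1
    return (A, Q), (A, P)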

Local coupling.

We claim that 𝒙 := (𝑨,𝑸) is 2^{−2m+1}-semi-coupled with 𝒖 := (𝑨,𝑷). Fix some A ∈ supp(𝑨) and P ∈ supp(𝑷). Then for the bits of 𝒙 that correspond to 𝑨 the coupling condition is trivially satisfied, as these bits are shared with 𝒖. The remaining bits are indexed by a ∈ {0,1}^m, i ∈ [2m], j ∈ [2^{2m} ln 2]; we need to bound the probability:

Pr[(𝑷_a)_{i,j} ≠ (𝑸_a)_{i,j} | 𝑨 = A, 𝑷 = P] = Pr[(𝑸_a)_{i,j} ≠ (P_a)_{i,j} | 𝑨 = A, 𝑷 = P].

If |a| ≠ m/2 or a ≠ (Tribes(A^1),…,Tribes(A^m)), then this probability is 0, since Step 5 does not touch 𝑸_a and so 𝑸_a = 𝑷_a. If |a| = m/2 and a = (Tribes(A^1),…,Tribes(A^m)), we have

Pr[(𝑸_a)_{i,j} ≠ (P_a)_{i,j} | 𝑨 = A, 𝑷 = P] ≤ Pr[𝒋 = j] = 1/(2^{2m} ln 2) ≤ 2^{−2m+1} ≤ n^{−0.6}.
Distinguishability.

We take the distinguishing function f from Lemma 7 and define the new distinguisher F′ : supp(𝑨)×supp(𝑷) → {0,1} as

F′(A^1,…,A^m, P_1,…,P_{2^m}) := f(A^1,…,A^m, Tribes(P_1),…,Tribes(P_{2^m})).

Let E be the event “|𝒂| = m/2”. As in Lemma 7 we observe that Pr[𝑷 = 𝑸 | ¬E] = 1, so Pr[F′(𝑨,𝑷) = 1 | ¬E] = Pr[F′(𝑨,𝑸) = 1 | ¬E]. By the construction of 𝑸 and F′ we have Pr[F′(𝑨,𝑸) = 1 | E] = 1. On the other hand,

Pr[F′(𝑨,𝑷)=1 | E] = Pr[f(𝑨, (Tribes(𝑷_1),…,Tribes(𝑷_{2^m}))) = 1 | E]
 ≤ Pr_{𝒙∼{0,1}^{2^m}}[f(𝑨,𝒙)=1 | E] + O(2^{−2m}·2^m)   (by Lemma 9)
 ≤ 1/2 + O(2^{−m}) ≤ 2/3.   (analogous to Lemma 7)

Formally, to show the last inequality, we will do the following:

Pr_{𝒙∼{0,1}^{2^m}}[f(𝑨,𝒙)=1 | E] = Pr_{𝒙∼{0,1}^{2^m}}[𝒙_𝒂 = 1 | E]
 = ∑_{a∈{0,1}^m} Pr[𝒙_a = 1 | E, 𝒂 = a]·Pr[𝒂 = a | E]
 = ∑_{a∈{0,1}^m} Pr[𝒙_a = 1]·Pr[𝒂 = a | E] = 1/2.

Then, as shown in Lemma 7, Pr[E] = Ω(1/√m). Altogether this gives us that F′ Ω(1/√m)-distinguishes 𝒙 and 𝒖.

It remains to observe that the 1-certificate complexity of F′ is at most O(m²): to the certificate of f from Claim 4 we add a certificate that Tribes(P_j) = 1, where j = (Tribes(A^1),…,Tribes(A^m)). Thus there exists a DNF of width O(m²) that computes F′.

In order to get the Ω(1)-distinguishability we follow the amplification in Section 2.4:

Theorem 19.

There exists a 1/√n-semi-coupling (𝒙,𝒖), where 𝒖 ∼ {0,1}^n, and an O(log³ n)-width DNF that Ω(1)-distinguishes 𝒙 from 𝒖.

Proof.

The proof is identical to that of Theorem 1. Take 𝒙 over {0,1}^s that is s^{−0.6}-semi-coupled with 𝒖 ∼ {0,1}^s; then the random variable 𝒙^t comprised of t = O(log s) independent copies of 𝒙, namely 𝒙^t = (𝒙^1,…,𝒙^t), is s^{−0.6}-semi-coupled with t independent copies of 𝒖, 𝒖^t = (𝒖^1,…,𝒖^t). On the other hand, by Lemma 12 and Lemma 11, there exists an O(t·log² s) = O(log³ n)-DNF that Ω(1)-distinguishes 𝒙^t and 𝒖^t. Since s^{−0.6} ≤ n^{−1/2}, we get the claim.

References

  • [1] Scott Aaronson. BQP and the Polynomial Hierarchy. In 42nd ACM Symposium on Theory of Computing, STOC, pages 141–150. ACM, 2010. doi:10.1145/1806689.1806711.
  • [2] Scott Aaronson. A Counterexample to the Generalized Linial-Nisan Conjecture. CoRR, abs/1110.6126, 2011. arXiv:1110.6126.
  • [3] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple Constructions of Almost k-wise Independent Random Variables. Random Structures and Algorithms, 3(3):289–304, 1992. doi:10.1002/rsa.3240030308.
  • [4] Louay Bazzi. Polylogarithmic independence can fool DNF formulas. SIAM Journal on Computing, 38(6):2220–2272, 2009. doi:10.1137/070691954.
  • [5] Andrej Bogdanov, Krishnamoorthy Dinesh, Yuval Filmus, Yuval Ishai, Avi Kaplan, and Akshayaram Srinivasan. Bounded Indistinguishability for Simple Sources. In 13th Innovations in Theoretical Computer Science Conference, ITCS, volume 215 of LIPIcs, pages 26:1–26:18. Schloss Dagstuhl, 2022. doi:10.4230/LIPIcs.ITCS.2022.26.
  • [6] Andrej Bogdanov, Yuval Ishai, Emanuele Viola, and Christopher Williamson. Bounded Indistinguishability and the Complexity of Recovering Secrets. In 36th Advances in Cryptology, CRYPTO, pages 593–618. Springer, 2016. doi:10.1007/978-3-662-53015-3_21.
  • [7] Mark Braverman. Poly-logarithmic Independence Fools Bounded-Depth Boolean Circuits. Communications of the ACM, 54(4):108–115, 2011. doi:10.1145/1924421.1924446.
  • [8] Benny Chor, Oded Goldreich, Johan Håstad, Joel Friedman, Steven Rudich, and Roman Smolensky. The Bit Extraction Problem or t-resilient Functions. In 26th Annual Symposium on Foundations of Computer Science, FOCS. IEEE, 1985. doi:10.1109/sfcs.1985.55.
  • [9] Anindya De, Omid Etesami, Luca Trevisan, and Madhur Tulsiani. Improved Pseudorandom Generators for Depth 2 Circuits. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 504–517. Springer Berlin Heidelberg, 2010. doi:10.1007/978-3-642-15369-3_38.
  • [10] Mika Göös, Pritish Kamath, Toniann Pitassi, and Thomas Watson. Query-to-Communication Lifting for P^NP. Computational Complexity, 28(1):113–144, 2019.
  • [11] Mika Göös, Artur Riazanov, Anastasia Sofronova, and Dmitry Sokolov. Top-Down Lower Bounds for Depth-Four Circuits. In 64th Annual Symposium on Foundations of Computer Science, FOCS, pages 1048–1055. IEEE, 2023. doi:10.1109/FOCS57990.2023.00063.
  • [12] Johan Håstad, Stasys Jukna, and Pavel Pudlák. Top-Down Lower Bounds for Depth 3 Circuits. In 34th Annual Symposium on Foundations of Computer Science, FOCS, pages 124–129. IEEE Computer Society, 1993. doi:10.1109/SFCS.1993.366875.
  • [13] Pooya Hatami and William Hoza. Paradigms for Unconditional Pseudorandom Generators. Foundations and Trends in Theoretical Computer Science, 16(1–2):1–210, 2024. doi:10.1561/0400000109.
  • [14] William Hoza. Fooling Near-Maximal Decision Trees. Technical report, ECCC, 2025. URL: https://eccc.weizmann.ac.il/report/2025/003/.
  • [15] Johan Håstad, Benjamin Rossman, Rocco Servedio, and Li-Yang Tan. An Average-Case Depth Hierarchy Theorem for Boolean Circuits. Journal of the ACM, 64(5), 2017. doi:10.1145/3095799.
  • [16] Nathan Linial and Noam Nisan. Approximate inclusion-exclusion. Combinatorica, 10(4):349–365, 1990. doi:10.1007/BF02128670.
  • [17] Xin Lyu. Improved Pseudorandom Generators for AC0 Circuits. In 37th Computational Complexity Conference, CCC, volume 234 of LIPIcs, pages 34:1–34:25. Schloss Dagstuhl, 2022. doi:10.4230/LIPIcs.CCC.2022.34.
  • [18] Joseph Naor and Moni Naor. Small-Bias Probability Spaces: Efficient Constructions and Applications. SIAM Journal on Computing, 22(4):838–856, 1993. doi:10.1137/0222053.
  • [19] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • [20] Ran Raz and Avishay Tal. Oracle separation of BQP and PH. In 51st ACM Symposium on Theory of Computing, STOC, pages 13–23. ACM, 2019. doi:10.1145/3313276.3316315.
  • [21] Alexander Razborov. A Simple Proof of Bazzi’s Theorem. ACM Transactions on Computation Theory, 1(1):3:1–3:5, 2009. doi:10.1145/1490270.1490273.
  • [22] Avishay Tal. Tight Bounds on the Fourier Spectrum of AC0. In 32nd Computational Complexity Conference, CCC, pages 15:1–15:31. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.CCC.2017.15.
  • [23] Mark Zhandry. Toward Separating QMA from QCMA with a Classical Oracle. In 16th Innovations in Theoretical Computer Science Conference, ITCS, volume 325 of LIPIcs, pages 95:1–95:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPICS.ITCS.2025.95.