
Fooling Near-Maximal Decision Trees

William M. Hoza, Department of Computer Science, The University of Chicago, IL, USA
Zelin Lv, Department of Computer Science, The University of Chicago, IL, USA
Abstract

For any constant α > 0, we construct an explicit pseudorandom generator (PRG) that fools n-variate decision trees of size m with error ε and seed length (1+α)·log₂ m + O(log(1/ε) + log log n). For context, one can achieve seed length (2+o(1))·log₂ m + O(log(1/ε) + log log n) using well-known constructions and analyses of small-bias distributions, but such a seed length is trivial when m ≥ 2^{n/2}. Our approach is to develop a new variant of the classic concept of almost k-wise independence, which might be of independent interest. We say that a distribution X over {0,1}^n is k-wise ε-probably uniform if every Boolean function f that depends on only k variables satisfies 𝔼[f(X)] ≥ (1−ε)·𝔼[f]. We show how to sample a k-wise ε-probably uniform distribution using a seed of length (1+α)·k + O(log(1/ε) + log log n).

Meanwhile, we also show how to construct a set H ⊆ 𝔽₂ⁿ such that every feasible system of k linear equations in n variables over 𝔽₂ has a solution in H. The cardinality of H and the time complexity of enumerating H are at most 2^{k+o(k)+polylog(n)}, whereas small-bias distributions would give a bound of 2^{2k+O(log(n/k))}.

By combining our new constructions with work by Chen and Kabanets (TCS 2016), we obtain nontrivial PRGs and hitting sets for linear-size Boolean circuits. Specifically, we get an explicit PRG with seed length (1−Ω(1))·n that fools circuits of size 2.99n over the U₂ basis, and we get a hitting set with time complexity 2^{(1−Ω(1))·n} for circuits of size 2.49n over the B₂ basis.

Keywords and phrases:
almost k-wise independence, decision trees, pseudorandom generators
Category:
RANDOM
Copyright and License:
© William M. Hoza and Zelin Lv; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Pseudorandomness and derandomization
Related Version:
Full Version: https://eccc.weizmann.ac.il/report/2025/003/ [28]
Acknowledgements:
We thank Avishay Tal for valuable comments on a draft of this paper and for a discussion about the Fourier spectra of decision trees. We thank Frederic Koehler for pointing out the connection with Huber’s contamination model. We thank Alicia Torres Hoza for helpful comments on drafts of this paper. Zelin Lv thanks Aaron Potechin for valuable discussions.
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

How many coin flips does it take to sample n bits that appear random from the perspective of an observer who only looks at 0.9n of the bits?

1.1 Almost 𝒌-wise uniformity and 𝒌-wise probable uniformity

Almost k-wise uniformity is a well-studied concept that provides one possible way of formalizing the question posed above.

Definition 1 (Almost k-wise uniformity).

Let X be a distribution over {0,1}^n, let k ∈ [n], and let ε ∈ [0,1]. We say that X is ε-almost k-wise uniform if, for every size-k set S ⊆ [n], the total variation distance between X_S and U_k is at most ε. Here X_S denotes the projection of X to the coordinates in S, and U_k denotes the uniform distribution over {0,1}^k. If ε = 0, we simply say that X is k-wise uniform. An (ε-almost) k-wise uniform generator is a function G: {0,1}^s → {0,1}^n such that G(U_s) is (ε-almost) k-wise uniform. We refer to s as the seed length of G.

When k ≥ (1/2 + Ω(1))·n and ε = 0, Karloff and Mansour showed that every k-wise uniform generator has seed length at least n − O(1) [31], which might be disappointing. On the bright side, the seed length can be improved if a small positive error (ε > 0) is permitted. Using a connection with “small-bias distributions” [39], Alon, Goldreich, Håstad, and Peralta constructed an explicit ε-almost k-wise uniform generator with seed length k + O(log(k/ε) + log log n) [6]. (We consider a generator G to be explicit if G(x) can be computed in poly(n) time, given the parameters, in this case n, k, and ε, and the seed x.) Notably, their seed length is meaningful even for large k such as k = 0.9n.

In this work, we introduce a new variant of almost k-wise uniformity, called k-wise probable uniformity, which strengthens Definition 1. There are two equivalent definitions, described below.

Definition 2 (k-wise probable uniformity).

Let X be a distribution over {0,1}^n, let k ∈ [n], and let ε ∈ [0,1]. We say that X is k-wise ε-probably uniform if it satisfies either of the following two equivalent conditions.

  1. For every size-k set S ⊆ [n], there exists a distribution E over {0,1}^k such that the distribution X_S can be written as the mixture distribution X_S ≡ (1−ε)·U_k + ε·E. That is, the distribution X_S is identical to the following distribution: With probability 1−ε, sample a k-bit string uniformly at random, and with probability ε, sample a string according to E.

  2. For every k-junta f: {0,1}^n → {0,1} (a k-junta is a function that depends on at most k variables), we have 𝔼[f(X)] ≥ (1−ε)·𝔼[f], where 𝔼[f] is shorthand for 𝔼[f(U_n)].

(See Section 3 for a proof that the two conditions above are equivalent.) We say that G: {0,1}^s → {0,1}^n is a k-wise ε-probably uniform generator if G(U_s) is k-wise ε-probably uniform.

We find the first condition above to be more conceptually appealing. It is clearly a strengthening of ε-almost k-wise uniformity, and it inspires the terminology “k-wise ε-probably uniform.” On the other hand, we find the second condition above to be easier to work with mathematically.

The concept of k-wise probable uniformity is motivated primarily by an application to fooling decision trees, which we will discuss momentarily, but we also consider it to be an interesting concept in its own right. Using a standard nonconstructive argument (see the full version of this paper [28, §4.4]), one can show that there exists a non-explicit k-wise ε-probably uniform generator with seed length (throughout this paper, log(·) denotes the base-two logarithm)

k + log k + 2·log(1/ε) + log log(n/k) + O(1). (1)

The challenge is to construct an explicit generator.

Classic results regarding small-bias generators [39, 6] imply that there is an explicit k-wise ε-probably uniform generator with seed length 2k + O(log k + log(1/ε) + log log n). (If X is k-wise γ-biased, then X is k-wise (γ·2^k)-probably uniform (see Lemmas 17 and 18). Alon, Goldreich, Håstad, and Peralta construct an explicit k-wise γ-biased generator with seed length 2·log(1/γ) + O(log k + log log n) [6]. Choose γ = ε·2^{−k}.) However, this seed length is unsatisfactory, because it is trivial when k ≥ n/2. Meanwhile, Bshouty used a different approach (the method of conditional probabilities with pessimistic estimators) to construct a generator G: {0,1}^s → {0,1}^n such that

(1−ε)·𝔼[f] ≤ 𝔼[f(G(U_s))] ≤ (1+ε)·𝔼[f]

for every Boolean k-junta f [15], which is even stronger than Definition 2. Furthermore, his generator’s seed length matches Equation 1. However, his generator’s time complexity is more than (n choose k)·2^k [15]. His generator can therefore be considered “explicit” only when k = O(1), whereas we are primarily interested in the case k = Θ(n).

In this work, we present an explicit k-wise ε-probably uniform generator with seed length (1+α)·k + O(log(1/ε) + log log n), where α is an arbitrarily small positive constant and the constant hiding under the big-O depends on α.

Theorem 3 (Explicit k-wise probably uniform generator).

For every n, k ∈ ℕ and ε ∈ (0,1), there exists an explicit k-wise ε-probably uniform generator G: {0,1}^s → {0,1}^n with seed length

s = k + O(k^{2/3}·log^{1/3}(k/ε) + log(1/ε) + log log n).

The simpler seed length bound (1+α)·k + O(log(1/ε) + log log n) follows from Theorem 3 by the weighted AM-GM inequality.
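To spell out that last step (a routine calculation we include for convenience): for any β > 0, the weighted AM-GM inequality gives

$$k^{2/3}\,\log^{1/3}(k/\varepsilon) \;=\; (\beta k)^{2/3}\,\bigl(\beta^{-2}\log(k/\varepsilon)\bigr)^{1/3} \;\le\; \tfrac{2}{3}\,\beta k + \tfrac{1}{3}\,\beta^{-2}\log(k/\varepsilon),$$

so taking β to be a sufficiently small constant depending on α (and noting that the log k part of log(k/ε) is dominated by α·k) yields the seed length (1+α)·k + O_α(log(1/ε) + log log n).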

1.2 Fooling decision trees

Instead of modeling the observer as a k-junta, we can consider the more powerful model of depth-k decision trees. A decision tree T makes queries to the input x and then produces a Boolean output value T(x). The crucial feature of the decision tree model is that the tree can adaptively decide which variable to query next, based on the results of previous queries. (See Definition 11 for a precise definition.) Consequently, the output T(x) of a depth-k decision tree T might depend on all n variables even if k ≪ n. The problem of sampling bits that “appear random” to depth-k decision trees can be formalized using the concept of a pseudorandom generator (PRG).

Definition 4 (PRGs).

Let X be a distribution over {0,1}^n, let f: {0,1}^n → {0,1}, and let ε ∈ (0,1). We say that X fools f with error ε if

|𝔼[f(X)] − 𝔼[f]| ≤ ε.

We say that G: {0,1}^s → {0,1}^n is a pseudorandom generator (PRG) that fools f with error ε if G(U_s) fools f with error ε. The parameter s is called the seed length of the PRG. If ℱ is a class of functions f: {0,1}^n → {0,1}, we say that X (respectively G) fools ℱ with error ε if X (respectively G) fools every f ∈ ℱ with error ε.

Almost k-wise uniformity is the special case of Definition 4 in which we take ℱ to be the class of all Boolean k-juntas. The aforementioned concept of small-bias distributions is another special case. By definition, a distribution X is k-wise γ-biased if it fools all functions of the form f(x) = ⊕_{i∈S} x_i, where S ⊆ [n] and |S| ≤ k, with error γ/2 [39].

To fool decision trees, one could try using a generic small-bias generator. This approach works extremely well in the nonadaptive setting, as mentioned previously. In the adaptive setting, the approach still works fairly well, but it turns out that the parameters are worse. Specifically, Kushilevitz and Mansour’s analysis [34] implies that if X is k-wise γ-biased, then X fools depth-k size-m decision trees with error γ·m. Every depth-k decision tree has size at most 2^k, so we can choose γ = ε·2^{−k}. By combining this reduction with one of Alon, Goldreich, Håstad, and Peralta’s k-wise γ-biased generators [6], one can construct an explicit PRG that fools depth-k decision trees with error ε and seed length 2k + O(log(k/ε) + log log n). This seed length is sufficient for many purposes, but we emphasize that it gives us nothing nontrivial for trees of depth k ≥ n/2.
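To make the parameters in this reduction explicit (a routine calculation): with the choice γ = ε·2^{−k}, the error and seed length become

$$\gamma \cdot m \;\le\; \gamma \cdot 2^k \;=\; \varepsilon, \qquad 2\log(1/\gamma) + O(\log k + \log\log n) \;=\; 2k + 2\log(1/\varepsilon) + O(\log k + \log\log n),$$

which is the bound quoted above and is where the leading constant 2 comes from.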

In this paper, we show how to improve the leading constant from 2 to 1+α for any constant α>0, as a consequence of our new k-wise ε-probably uniform generator. More generally, we prove the following.

Theorem 5 (Fooling near-maximal decision trees).

Let n, m ∈ ℕ and ε ∈ (0,1). There exists an explicit PRG G: {0,1}^s → {0,1}^n that fools n-variate decision trees of size m with error ε and seed length

s = log m + O(log^{2/3} m · log^{1/3}((log m)/ε) + log(1/ε) + log log n).

Observe that our PRG is meaningful even for trees of near-maximal size such as m = 2^{0.9n}. Furthermore, it turns out that Theorem 5 extends to the more powerful model of size-m “subcube partitions.” See Section 5 for further details.

1.3 A hitting set for systems of equations over 𝔽𝟐

We also study a certain linear-algebraic variant of k-wise uniformity. We prove the following.

Theorem 6 (Hitting set for systems of equations over 𝔽₂).

For every n, k ∈ ℕ, there exists H ⊆ 𝔽₂ⁿ such that:

  1. For every A ∈ 𝔽₂^{k×n} and every b ∈ image(A), there exists x ∈ H such that Ax = b.

  2. Given the parameters n and k, the set H can be enumerated in time T (and hence |H| ≤ T), where T = 2^{k + O((k·log k·log n)^{2/3} + log n)}.

We should compare Theorem 6 to what one can get by using a small-bias distribution. One can show that if X is n-wise γ-biased, then |Pr[AX = b] − Pr[AU_n = b]| ≤ γ [34, 9]. If b ∈ image(A), then Pr[AU_n = b] ≥ 2^{−k} by the rank-nullity theorem. Therefore, if we choose γ < 2^{−k}, the set H := Supp(X) satisfies Item 1 of Theorem 6. Plugging in one of Alon, Goldreich, Håstad, and Peralta’s γ-biased generators [6] would give us |H| ≤ 2^{2k+O(log(n/k))}. Essentially, Theorem 6 improves the coefficient of k in the exponent from 2 − o(1) to 1 + o(1), although our dependence on n is worse.

Andreev, Baskakov, Clementi, and Rolim previously claimed to prove a similar theorem, with a bound of |H| ≤ 2^{k+O((n/k)·log n)} [9]. This would be incomparable to Theorem 6: better when k = Θ(n) and worse when k ≪ n. However, there seems to be a mistake in their analysis. (Andreev, Baskakov, Clementi, and Rolim partition the variables into blocks, x = (x₁,…,x_s), and they say that the condition Ax = b can be written as a conjunction of conditions A₁x₁ = b₁, …, A_s x_s = b_s [9, Appendix B, preprint version]. But this is not true in general.)

1.4 Applications: Pseudorandomness for linear-size Boolean circuits

Our results are motivated by applications in the area of circuit complexity. We consider circuits over the “B₂” and “U₂” bases. A B₂-circuit is a circuit in which each gate computes an arbitrary function ϕ: {0,1}² → {0,1}. A U₂-circuit is the same, except that gates are not permitted to compute the XOR function or its complement. Chen and Kabanets used “gate elimination” methods to establish, among other results, close connections between linear-size circuits and near-maximal decision trees [19]:

  • Every U₂-circuit of size (3−α)·n can be simulated by a decision tree of size 2^{(1−Ω(α²))·n} [19].

  • Every B₂-circuit of size (2.5−α)·n can be simulated by a parity decision tree of size 2^{(1−Ω(α²))·n} [19]. (A “parity decision tree” is defined like an ordinary decision tree, except that in each step, the tree can query to learn the parity of any subset of the variables, instead of querying just a single variable.)

They posed the problem of designing PRGs that fool general Boolean circuits [19]. By combining their simulations with our constructions, we are able to solve their problem, at least in part. First of all, we get a PRG that fools U₂-circuits of size (3−α)·n:

Corollary 7 (Fooling circuits over the U₂ basis).

For every n ∈ ℕ and α ∈ (0,3), there exists an explicit PRG G: {0,1}^s → {0,1}^n that fools n-variate U₂-circuits of size (3−α)·n with error n·2^{−Ω(α⁶n)} and seed length s = (1−Ω(α²))·n.

Proof.

By Chen and Kabanets’ work [19], we know every U₂-circuit of size (3−α)·n can be simulated by a decision tree of size 2^{(1−cα²)·n} for some constant c > 0. By Theorem 5, we can fool such a tree with error n·2^{−c′α⁶n} and seed length

(1−cα²)·n + O(n^{2/3}·(c′α⁶n)^{1/3} + c′α⁶n) = n − cα²n + O(c′^{1/3}·α²n).

This is n − Ω(α²n) provided we choose c′ to be a sufficiently small constant based on c.

Second, we consider B₂-circuits. We have not managed to construct a genuine PRG that fools B₂-circuits, but we can at least use Theorem 6 to construct a hitting set for B₂-circuits. A hitting set is a relaxation of a PRG, defined as follows.

Definition 8.

Let H ⊆ {0,1}^n, let ℱ be a class of functions f: {0,1}^n → {0,1}, and let ε ∈ (0,1). We say that H is an ε-hitting set for ℱ if, for every f ∈ ℱ such that 𝔼[f] > ε, there exists x ∈ H such that f(x) = 1.

Corollary 9 (A hitting set for circuits over the B₂ basis).

For every n ∈ ℕ and α ∈ (0,2.5), there exists a value ε = 2^{−Ω(α²n)} and a set H ⊆ {0,1}^n such that:

  1. H is an ε-hitting set for B₂-circuits of size (2.5−α)·n.

  2. Given the parameters n and α, the set H can be enumerated in time 2^{(1−Ω(α²))·n + Õ(n^{2/3})}.

(The proof of Corollary 9 is in Section 6.)

1.4.1 Discussion

In general, the main motivation behind PRGs is that many algorithms and protocols rely on a large number of random bits, but producing truly random bits can sometimes be difficult or expensive. We think of randomness as a computational resource, similar to time or space. We try to use as little “true randomness” as possible to sample bits that are “random enough” to run randomized algorithms and protocols without distorting their behavior. With this motivation in mind, we believe that the problem of fooling U2-circuits is extremely natural.

The PRG of Corollary 7 is the first of its kind. (To be fair, we should compare Corollary 7 to a different and rather trivial approach that one could use to construct PRGs that fool circuits. In general, if h: {0,1}^{n−1} → {0,1} is average-case hard for circuits of size c·n, then the generator G(x) = (x, h(x)) maps n−1 bits to n bits and fools circuits of size c·n. Similarly, the generator G′(x, y) = (x, y, h(x), h(y)) maps n′−2 bits to n′ bits and fools circuits of size (c/2)·n′, where n′ = 2n. One can similarly try G′′(x, y, z) = (x, y, z, h(x), h(y), h(z)), etc. One can instantiate this approach with known average-case hardness results for circuits over the U₂ basis or the full binary basis [19, 23]. However, the PRGs that can be constructed using this approach have seed length n − O(1).) The seed length is what makes Corollary 7 interesting. If α is constant, then our PRG has linear stretch. Note that the challenge of constructing PRGs that fool Boolean circuits is strictly harder than the notorious challenge of proving circuit lower bounds. In more detail, suppose that one could construct a poly(m)-time computable PRG G: {0,1}^{βm−1} → {0,1}^m that fools U₂-circuits of size c·m with error 0.49, where β ∈ (0,1] and c > 1 are constants. Let n = βm, and define G′: {0,1}^{n−1} → {0,1}^n by truncating the output of G. The indicator function for the image of G′ would be an example of a function in 𝖭𝖯 that cannot be computed by U₂-circuits of size (c/β)·n. Currently, the best lower bound known on the size of U₂-circuits computing some function in 𝖭𝖯 is (5−o(1))·n [30].

Hitting sets are commonly used to solve the so-called “GAP-SAT” problem for ℱ, i.e., the problem of distinguishing the case f ≡ 0 from the case 𝔼[f] > ε, given f ∈ ℱ. Indeed, if H is an ε-hitting set for ℱ, then we can solve GAP-SAT for ℱ by computing ⋁_{x∈H} f(x). In this context, we should compare Corollary 9 to prior circuit analysis algorithms. Savinov designed a SAT algorithm for B₂-circuits of size m with time complexity O(2^{0.389667·m}) [46, 35], improving prior work by Nurk [41]. Golovnev, Kulikov, Smal, and Tamaki designed a #SAT algorithm for B₂-circuits of size 2.99n with time complexity 2^{(1−Ω(1))·n} [23], improving a result by Chen and Kabanets [19]. These prior algorithms solve problems that are harder than GAP-SAT, and furthermore they can handle circuits that are larger than what Corollary 9 can handle. However, Corollary 9 is superior to these prior results in one respect, namely, we can solve GAP-SAT even if we only have query access to the circuit in question. Note that the “black box” nature of hitting sets is crucial in some applications. For example, Cheng and Hoza showed that optimal explicit hitting sets for space-bounded computation would imply 𝖫 = 𝖡𝖯𝖫, whereas it remains an open problem to prove 𝖫 = 𝖡𝖯𝖫 if we merely assume the existence of an optimal GAP-SAT algorithm for space-bounded computation [20, 43].

1.5 Overview of our new constructions

1.5.1 Our 𝒌-wise probably uniform generator (Theorem 3)

The starting point of our construction is the well-known sampling properties of pairwise uniform hash functions. Let f: {0,1}^n → {0,1} be any nonzero k-junta, or more generally any function such that 𝔼[f] ≥ 2^{−k}. If we sample a hash function h: {0,1}^{k+O(log(1/ε))} → {0,1}^n from a pairwise uniform family, then with high probability over the choice of h, we have

𝔼_x[f(h(x))] ≥ (1−ε)·𝔼[f].

(This follows from Chebyshev’s inequality.)

We can think of h as a PRG with an excellent seed length. The only trouble is that sampling h itself is expensive. In general, sampling a hash function h: {0,1}^q → {0,1}^ℓ from a pairwise uniform family costs Θ(q+ℓ) truly random bits, so in our case, the cost is Θ(n + log(1/ε)) truly random bits, which is much more than we can afford.

We can slightly decrease the cost of sampling h by composing with a γ-almost k-wise uniform generator, where γ ≪ ε·2^{−k}, with seed length ℓ = O(k + log(1/ε) + log log n). Such a generator fools f with error γ, which is negligible. Now the output length of h is decreased from n down to ℓ, hence the cost of sampling h is “only” O(k + log(1/ε) + log log n). However, this cost is still more than we can afford.

To explain how we bring the cost down to o(k), for simplicity’s sake, let us assume that ε = 1/poly(k) and let us neglect log log n terms. We can assume without loss of generality that f is simply a conjunction of k literals, because every k-junta can be written as a sum of such functions. Our approach is to pseudorandomly partition the n coordinates into r = Θ̃(k^{1/3}) buckets: [n] = B₁ ∪ ⋯ ∪ B_r. In expectation, each bucket contains k/r of the k relevant variables. With high probability, each bucket has at most k₀ of the variables, where k₀ = k/r + Õ(√(k/r)) = k/r + Õ(k^{1/3}).

We can write f(x) = f₁(x)⋯f_r(x), where f_i(x) only depends on variables in B_i, so f_i is a k₀-junta. We sample a hash function h: {0,1}^{k₀+O(log k)} → {0,1}^n such that with high probability over the choice of h, we have

𝔼_x[f_i(h(x))] ≥ (1 − 1/poly(k))·𝔼[f_i].

For each bucket B_i independently, we sample x at random and put h(x) in B_i. Crucially, we reuse the same hash function h for all of the buckets, which is justified by a simple union bound. The cost of sampling h is O(k₀) = Õ(k^{2/3}) truly random bits, and the cost of sampling the x values is

r·(k₀ + O(log k)) = k + Õ(k^{2/3}).

A more careful calculation, also taking into account the cost of sampling the partition [n] = B₁ ∪ ⋯ ∪ B_r, leads to the seed length bound that appears in Theorem 3.
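The sketch below is a schematic illustration of this construction (our illustration, not the exact generator of Section 4: true randomness stands in for the balanced partition generator of Lemma 23 and for the small-bias composition of Lemma 19, and all parameter choices and names are placeholders).

```python
import random

def sketch_generator(n: int, r: int, q: int, ell: int) -> list[int]:
    """Schematic sketch of the bucketed construction of Section 1.5.1.
    True randomness stands in for the pseudorandom ingredients: the
    partition Z should come from a balanced partition generator
    (Lemma 23), and h should be composed with a small-bias generator
    stretching ell bits up to n bits (Lemma 19); both are elided here."""
    # Step 1: partition the n coordinates into r buckets.
    Z = [random.randrange(r) for _ in range(n)]
    # Step 2: sample ONE pairwise uniform hash h: {0,1}^q -> {0,1}^ell
    # (a Toeplitz matrix plus a shift, as in Section 2.2), reused for
    # every bucket; a union bound justifies the reuse.
    a = [random.getrandbits(1) for _ in range(q + ell - 1)]
    b = [random.getrandbits(1) for _ in range(ell)]
    def h(x: list[int]) -> list[int]:
        return [(sum(a[i + j] & x[j] for j in range(q)) + b[i]) % 2
                for i in range(ell)]
    # Step 3: one fresh q-bit seed per bucket, hashed once.
    hashed = [h([random.getrandbits(1) for _ in range(q)]) for _ in range(r)]
    # Step 4: coordinate i reads a bit of its bucket's hashed seed.
    # (Indexing modulo ell stands in for the omitted small-bias stretch.)
    return [hashed[Z[i]][i % ell] for i in range(n)]

print("".join(map(str, sketch_generator(n=64, r=4, q=10, ell=16))))
```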

Observe that in this construction, there are some “bad events” that occur with probability roughly ε, namely, we might get a “bad” partition of the variables into buckets or we might get a “bad” hash function h. Let B be the union of these bad events. To analyze the impact of these bad events, let X be the output distribution of our generator and let f be an arbitrary Boolean k-junta. Then

𝔼[f(X)] = Pr[B]·𝔼[f(X) ∣ B] + Pr[¬B]·𝔼[f(X) ∣ ¬B],

where we refer to the first term on the right-hand side as (∗). The quantity (∗) is certainly nonnegative, which allows us to prove 𝔼[f(X)] ≥ (1−ε)·𝔼[f]. On the other hand, note that (∗) might be much larger than 𝔼[f], and hence we are not able to prove an upper bound of the form 𝔼[f(X)] ≤ (1+ε)·𝔼[f]. Thankfully, such an upper bound is not necessary for our applications.

1.5.2 Our hitting set for systems of equations over 𝔽𝟐 (Theorem 6)

The first step of the proof of Theorem 6 is to apply a rank condenser due to Forbes and Guruswami [22]. This allows us to assume without loss of generality that k ≥ Ω(n/log n). The next step is to partition the variables into t equal-sized blocks, each containing n/t variables, where t ≈ n^{2/3}. This induces a partition of the columns of A: A = [A₁ A₂ ⋯ A_t]. Let k_i be the contribution of A_i to the rank of A, so k₁ + ⋯ + k_t ≤ k. A lemma by Andreev, Clementi, and Rolim says that if H_ℓ is a hitting set for systems of ℓ equations in n/t variables, then there is some x ∈ H_{k₁} × ⋯ × H_{k_t} such that Ax = b [10]. We construct H_ℓ for every ℓ by a simple brute-force algorithm, which we can afford because the number of variables is small, and then we output the union of H_{k₁} × ⋯ × H_{k_t} over all possible partitions k = k₁ + ⋯ + k_t.

1.6 Limitations of 𝒌-wise 𝜸-biased generators

A great deal of effort has been spent trying to optimize the constant factors in the seed lengths of small-bias generators [39, 5, 6, 12, 15, 51, 13]. Researchers have also developed many sophisticated techniques for proving that small-bias generators fool various models of computation; see Hatami and Hoza’s survey for a few examples [27]. The reader might reasonably wonder whether one could have proven our results by simply improving known constructions or analyses of k-wise γ-biased distributions. We prove that the answer is no. In more detail, in the full version of this paper [28, §6], we present examples showing that if the support of every k-wise γ-biased distribution is a 0.49-hitting set for U₂-circuits of size 2n, then k ≥ (2/3)·n and γ ≤ O(2^{−n/2}). Then, we observe that Karloff and Mansour’s work [31] can be extended to prove the following lower bound on the seed length of k-wise γ-biased generators in the regime k ≥ (1/2 + Ω(1))·n.

Theorem 10 (Seed length lower bound for k-wise γ-biased generators).

Let G: {0,1}^s → {0,1}^n be a k-wise γ-biased generator, where k = (1/2+α)·n for some α ∈ (0,1/2]. Then

s ≥ min{n, 2·log(1/γ)} − log(1/α) − O(1).

Consequently, if one tries using a generic k-wise γ-biased generator to hit U₂-circuits of size 2n, then the seed length will inevitably be at least n − O(1). Thus, the concept of k-wise γ-biased distributions is inherently too weak to prove Corollaries 7 and 9. In turn, this implies that the concept of k-wise γ-bias is also too weak to prove our main results (Theorems 3, 5, and 6), of which Corollaries 7 and 9 are applications. (Our results are actually quantitatively stronger in various respects; see the full version of this paper [28] for details.)

For context, a sequence of prior works [44, 21, 4, 6, 2, 3, 15] has shown that every k-wise γ-biased generator G: {0,1}^s → {0,1}^n has seed length at least

min{log (n choose k/2), 2·log(1/γ) + log log (n choose k/2) − log log(1/γ)} − O(1). (2)

Equation 2 and Theorem 10 are incomparable in general, but our new Theorem 10 is superior in the parameter regime in which we are interested. In particular, if γ = O(2^{−n/2}) and k = c·n for a constant 1/2 < c < 1, then the prior bound Equation 2 is (1−Ω(1))·n, whereas our new Theorem 10 gives a bound of n − O(1).

1.7 Related work

1.7.1 Approximate forms of 𝒌-wise uniformity

Prior researchers have studied several different ways of quantifying what it means for a distribution X over {0,1}n to be “approximately” k-wise uniform.

  • We could require that the total variation distance between X_S and U_k is at most ε for every size-k set S ⊆ [n]. This is the definition of an ε-almost k-wise uniform distribution (Definition 1). See, for example, work by Naor and Naor [39] and work by Alon, Goldreich, Håstad, and Peralta [6].

  • We could require that |Pr[⊕_{i∈S} X_i = 1] − Pr[⊕_{i∈S} X_i = 0]| ≤ ε for every nonempty set S ⊆ [n] of size at most k [39]. This is the definition of a k-wise ε-biased distribution. See, for example, the works mentioned above [39, 6].

  • We could require that the ℓ_∞ distance between X_S and U_k is at most ε for every size-k set S ⊆ [n]. See, for example, work by Alon, Goldreich, Håstad, and Peralta [6] and work by Bshouty [15].

  • We could require that X is ε-close in total variation distance to some exactly k-wise uniform distribution X′. See, for example, work by Alon, Goldreich, and Mansour [7]; work by Alon, Andoni, Kaufman, Matulef, Rubinfeld, and Xie [3]; and work by O’Donnell and Zhao [42].

Despite the attention paid to all of the above variations, we seem to be the first to study the concept of k-wise probable uniformity.

1.7.2 Huber’s contamination model

Our notion of “probable uniformity” is similar to Huber’s contamination model in the theory of robust statistics [29]. A key difference is that in Huber’s model, contamination is applied to an unknown distribution, whereas in a k-wise probably uniform distribution, every k coordinates are distributed according to a contaminated version of the uniform distribution.

1.7.3 Universal sets

A set H ⊆ {0,1}^n is called k-universal if, for every size-k set S ⊆ [n] and every z ∈ {0,1}^k, there exists x ∈ H such that x_S = z. The concept of k-universal sets has been studied in many prior works going back more than half a century [32, 18, 52, 1, 48, 11, 5, 39, 40, 14]. The best explicit construction, due to Naor, Schulman, and Srinivasan [40], has cardinality 2^{k+O(log²k)}·log n. Our constructions were inspired by Naor, Schulman, and Srinivasan’s universal set construction [40].

The notion of k-wise probable uniformity can be considered a strengthening of k-universality, because if X is k-wise probably uniform, then the support of X is k-universal. Consequently, Theorem 3 implies the existence of an explicit k-universal set with cardinality 2^{k+Õ(k^{2/3})}·polylog n, but this is inferior to Naor, Schulman, and Srinivasan’s construction [40]. (A k-universal set H is typically considered “explicit” if the entire set can be computed in poly(|H|) time. Our set has stronger explicitness guarantees, which might possibly be of value, but note that Naor, Schulman, and Srinivasan already constructed a k-universal set of cardinality 2^{k+o(k)}·log n with similar explicitness guarantees [40].) Our k-wise probably uniform generator also has similarities with a recent construction of a “biased” variant of universal sets by Harel, Hoza, Vardi, Evron, Srebro, and Soudry [26].

Similarly, the set H of Theorem 6 is k-universal, because the condition x_S = z can be expressed as a system of k equations. Once again, the cardinality of this set is greater than the cardinality of Naor, Schulman, and Srinivasan’s universal set [40].

1.7.4 PRGs based on pseudorandom partitions of the variables

The trick of pseudorandomly partitioning the variables into buckets is not new; similar tricks have been used in many prior PRG constructions. For a few examples that are especially similar to our work, see work by Meka and Zuckerman [38], work by Lovett, Reingold, Trevisan, and Vadhan [36], and work by Gopalan, Kane, and Meka [25].

1.7.5 Correlation bounds for general circuit models

In general, PRGs are intimately related to correlation bounds, aka average-case hardness. Loosely speaking, correlation bounds are a prerequisite to designing PRGs. See, e.g., Hatami and Hoza’s survey [27, Chapter 4] for further discussion. Chen and Kabanets proved the first correlation bounds for general, unbounded-depth circuit models [19], and our PRG for U2-circuits uses their work, as mentioned previously. Golovnev, Kulikov, Smal, and Tamaki subsequently proved better correlation bounds [23].

2 Preliminaries

2.1 Decision tree models

Below we record the standard definitions of decision trees and parity decision trees.

Definition 11 (Decision trees).

An n-variate decision tree is a rooted tree T in which each internal node is labeled with a variable from among x₁,…,x_n; each internal node has two outgoing edges labeled 0 and 1; and each leaf is labeled either 0 or 1. The tree T computes a Boolean function T: {0,1}^n → {0,1} defined inductively as follows. If T consists of a single leaf labeled b ∈ {0,1}, then we define T(x) ≡ b. Otherwise, let x_i be the variable labeling the root node. Given an input x ∈ {0,1}^n, we start at the root node and traverse the outgoing edge labeled with the value x_i. This leads to a vertex u, which is the root of a subtree T′. Then we set T(x) = T′(x). The depth of the tree is the length of the longest path from the root to a leaf. The size of the tree is the total number of leaves.
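To make the definition concrete, here is a minimal evaluator for this tree model (an illustrative sketch; the representation is ours, not from the paper, and variables are 0-indexed).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A decision-tree node: a leaf carries a label in {0,1};
    an internal node queries variable x_var and branches on its value."""
    label: Optional[int] = None          # set for leaves
    var: Optional[int] = None            # set for internal nodes (0-indexed)
    child0: Optional["Node"] = None      # followed when x_var == 0
    child1: Optional["Node"] = None      # followed when x_var == 1

def evaluate(tree: Node, x: list[int]) -> int:
    """Traverse from the root, querying one variable per step (adaptively)."""
    node = tree
    while node.label is None:
        node = node.child1 if x[node.var] == 1 else node.child0
    return node.label

# Example: a depth-2, size-3 tree computing x_1 AND x_2.
tree = Node(var=0,
            child0=Node(label=0),
            child1=Node(var=1, child0=Node(label=0), child1=Node(label=1)))
assert evaluate(tree, [1, 1]) == 1 and evaluate(tree, [1, 0]) == 0
```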

Definition 12 (Parity decision trees [34]).

A parity decision tree on variables x₁,…,x_n is a rooted tree T defined exactly as in Definition 11, except that each internal node is labeled by a non-empty subset S ⊆ [n]. The internal node queries ⊕_{i∈S} x_i and has two outgoing edges labeled 0 and 1 corresponding to the value of that parity. Leaves are labeled by output bits in {0,1}, and evaluation proceeds exactly as for ordinary decision trees. The depth of a parity decision tree is the length of the longest root-to-leaf path, and its size is the number of leaves. Equivalently, a parity decision tree computes a Boolean function

f(x₁,…,x_n) = T(⊕_{i∈S₁} x_i, …, ⊕_{i∈S_m} x_i),

where T is an ordinary decision tree on m inputs and S₁,…,S_m ⊆ [n].

2.2 Pairwise uniform hashing

We rely on the standard notion of pairwise uniform hashing, aka “strongly universal hashing,” introduced in Carter and Wegman’s seminal papers [16, 53].

Definition 13 (Pairwise uniform families of hash functions).

A family ℋ of hash functions h: {0,1}^q → {0,1}^ℓ is called pairwise uniform if, for every two distinct x, x′ ∈ {0,1}^q, if we sample h ∈ ℋ uniformly at random, then (h(x), h(x′)) is distributed uniformly at random over {0,1}^{2ℓ}.

Theorem 14 (Explicit pairwise uniform families of hash functions).

For every q, ℓ ∈ ℕ, there exists an explicit pairwise uniform family ℋ of hash functions h: {0,1}^q → {0,1}^ℓ such that h ∈ ℋ can be sampled using a seed of length O(q+ℓ). (That is, given a seed x ∈ {0,1}^{O(q+ℓ)} and an input y ∈ {0,1}^q, the value h_x(y) can be computed in poly(q, ℓ) time, where h_x is the hash function corresponding to the seed x.)

For example, if we define h_{a,b}(x) = (a ⊛ x) + b, where ⊛ is convolution mod 2 and + is bitwise XOR, then {h_{a,b} : a ∈ {0,1}^{q+ℓ−1} and b ∈ {0,1}^ℓ} is a pairwise uniform family [37]. The reason pairwise uniform hashing is useful for us is given by the following relative-error sampling lemma.
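A minimal implementation of this family (our sketch, interpreting a as an ℓ×q Toeplitz matrix over 𝔽₂ whose (i,j) entry is a_{i+j}, which is one standard way to realize the convolution):

```python
import random

def sample_pairwise_uniform_hash(q: int, ell: int):
    """Sample h_{a,b}(x) = (a convolved with x) + b over GF(2):
    a defines an ell x q Toeplitz matrix (entry (i,j) = a[i+j]) and
    b is a uniform additive shift, so output bit i is
    <row i of the matrix, x> XOR b[i]."""
    a = [random.getrandbits(1) for _ in range(q + ell - 1)]
    b = [random.getrandbits(1) for _ in range(ell)]
    def h(x: list[int]) -> list[int]:
        return [(sum(a[i + j] & x[j] for j in range(q)) + b[i]) % 2
                for i in range(ell)]
    return h

h = sample_pairwise_uniform_hash(q=8, ell=4)
print(h([1, 0, 1, 1, 0, 0, 1, 0]))  # a uniformly distributed 4-bit output
```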

Lemma 15 (Pairwise uniformity sampling lemma).

Let ℋ be a pairwise uniform family of hash functions h: {0,1}^q → {0,1}^ℓ. Let f: {0,1}^ℓ → {0,1} and let μ = 𝔼[f]. Then for every ε ∈ (0,1),

Pr_{h∈ℋ}[h fools f with error ε·μ] ≥ 1 − 1/(2^q·ε²·μ).

Proof.

For each x ∈ {0,1}^q, define Z_x = f(h(x)), so Z_x is a random variable based on the choice of h. Then 𝔼[Z_x] = μ and Var[Z_x] = μ(1−μ) ≤ μ. Furthermore, the variables Z_x are pairwise independent. Therefore, if we let Z = Σ_x Z_x, then 𝔼[Z] = 2^q·μ and Var[Z] ≤ 2^q·μ. Therefore, by Chebyshev’s inequality, we have

Pr[|Z − 2^q·μ| ≥ ε·μ·2^q] ≤ Var[Z]/(ε²·μ²·2^{2q}) ≤ 1/(2^q·ε²·μ).

2.3 Small-bias distributions

We also rely on asymptotically optimal constructions of k-wise γ-biased generators, which were defined in Section 1.2.

Theorem 16 (Explicit k-wise γ-biased generators [39]).

For every n, k ∈ ℕ and every γ ∈ (0,1), there exists an explicit k-wise γ-biased generator G: {0,1}^s → {0,1}^n with seed length O(log(k/γ) + log log n).

The reason k-wise γ-biased generators are useful for us is that they satisfy the following two properties.

Lemma 17 (Small-bias generators fool juntas and conjunctions of literals [39, 6]).

Let X be a k-wise γ-biased distribution over {0,1}^n. Then X is ε-almost k-wise uniform, where ε = γ·2^{k/2}. Furthermore, X fools every conjunction of at most k literals with error γ.

3 Characterizing 𝒌-wise probable uniformity

The following proposition shows the equivalence of three ways of defining k-wise probably uniform distributions.

Proposition 18 (Equivalence of three definitions of k-wise probable uniformity).

Let X be a distribution over {0,1}^n, let k ∈ [n], and let ε ∈ [0,1]. Then the following are equivalent.

  1. For every k-junta f: {0,1}^n → {0,1}, we have 𝔼[f(X)] ≥ (1−ε)·𝔼[f].

  2. For every size-k set S ⊆ [n] and every z ∈ {0,1}^k, we have Pr[X_S = z] ≥ (1−ε)·2^{−k}.

  3. For every size-k set S ⊆ [n], there exists a distribution E over {0,1}^k such that one can sample from X_S by sampling from U_k with probability 1−ε and sampling from E with probability ε.

Proof.

  • (1 ⇒ 2) Consider the function f(x) = 1{x_S = z}.

  • (2 ⇒ 3) If ε = 0, then for every x ∈ {0,1}^k, we have Pr[X_S = x] ≥ 2^{−k}, which implies that X_S is exactly uniform over {0,1}^k. If ε > 0, define p: {0,1}^k → ℝ by the formula

    p(x) = (Pr[X_S = x] − (1−ε)·2^{−k}) / ε.

    Then p(x) is a probability mass function: it is nonnegative because Pr[X_S = x] ≥ (1−ε)·2^{−k}, and it sums to 1 because X_S is a probability distribution. Let E be the corresponding probability distribution.

  • (3 ⇒ 1) If f is a k-junta, then there is some set S ⊆ [n] of size k and some function g: {0,1}^k → {0,1} such that f(x) = g(x_S) for all x ∈ {0,1}^n. Therefore,

    𝔼[f(X)] = 𝔼[g(X_S)] = (1−ε)·𝔼[g(U_k)] + ε·𝔼[g(E)] ≥ (1−ε)·𝔼[f].

By definition, if X satisfies any of the three equivalent conditions in Proposition 18, then X is k-wise ε-probably uniform. The third condition in Proposition 18 motivates the name “k-wise probably uniform,” but we find it more mathematically convenient to work with the first two conditions.
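As a small sanity check on Proposition 18 (an illustrative brute-force computation we add here, feasible only for tiny n and k), one can compute the best ε in condition 2 directly:

```python
from itertools import combinations, product

def probable_uniformity_eps(dist, n, k):
    """Given dist: a dict mapping each bit tuple in {0,1}^n to its
    probability, return the smallest eps such that dist is k-wise
    eps-probably uniform, via condition 2 of Proposition 18:
    Pr[X_S = z] >= (1 - eps) * 2^{-k} for all |S| = k and z."""
    worst = 1.0
    for S in combinations(range(n), k):
        for z in product([0, 1], repeat=k):
            p = sum(pr for x, pr in dist.items()
                    if all(x[i] == zi for i, zi in zip(S, z)))
            worst = min(worst, p * 2**k)  # eps >= 1 - p * 2^k
    return max(0.0, 1.0 - worst)

# Example: uniform on {0,1}^3 except the all-ones string is removed
# (its mass spread over the other 7 strings); check 2-wise uniformity.
dist = {x: (1/7 if x != (1, 1, 1) else 0.0)
        for x in product([0, 1], repeat=3)}
print(probable_uniformity_eps(dist, n=3, k=2))  # prints 3/7 = 0.428...
```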

4 Constructing 𝒌-wise probably uniform generators

In this section, we present our new k-wise probably uniform generator, thereby proving Theorem 3. At the end of this section, for completeness’ sake, we record the standard nonconstructive proof of the existence of nonexplicit k-wise probably uniform generators with excellent seed lengths.

4.1 A small family of generators, each with a good seed length

As a first step, we begin by constructing a family of generators 𝒢 such that for any k₀-junta f, most generators g ∈ 𝒢 satisfy (1−ζ)·𝔼[f] ≤ 𝔼_x[f(g(x))] ≤ (1+ζ)·𝔼[f]. This construction is based on a combination of pairwise uniform hash functions and k-wise γ-biased generators.

Lemma 19 (Family of generators).

For every n, k₀ ∈ ℕ and ζ ∈ (0,1), there exists an explicit family 𝒢 of PRGs g: {0,1}^q → {0,1}^n satisfying the following.

  1. A generator g ∈ 𝒢 can be sampled using O(k₀ + log(1/ζ) + log log n) truly random bits.

  2. Each generator g ∈ 𝒢 has seed length q = k₀ + O(log(1/ζ)).

  3. If f: {0,1}^n → {0,1} is a k₀-junta with expectation 𝔼[f] = μ, then

     Pr_{g∈𝒢}[g fools f with error ζ·μ] ≥ 1 − ζ.
Proof.

Let G_sb: {0,1}^ℓ → {0,1}^n be a k₀-wise γ-biased generator where γ = (ζ/3)·2^{−3k₀/2} and

ℓ = O(k₀ + log(1/ζ) + log log n).

Let ℋ be a pairwise uniform family of hash functions h: {0,1}^q → {0,1}^ℓ. For each hash function h ∈ ℋ, we define a generator g(x) = G_sb(h(x)). By Theorems 14 and 16, this family 𝒢 is explicit, and a generator g ∈ 𝒢 can be sampled using O(k₀ + log(1/ζ) + log log n) truly random bits.

For the correctness proof, define f′: {0,1}^ℓ → {0,1} by f′(y) = f(G_sb(y)) and let μ′ = 𝔼[f′]. The generator G_sb fools f with error γ·2^{k₀/2} = (ζ/3)·2^{−k₀} (see Lemma 17), so |μ − μ′| ≤ (ζ/3)·μ. Furthermore, μ ≥ 2^{−k₀} unless f ≡ 0, so μ′ ≥ 2^{−k₀−1}. Therefore, by the pairwise uniformity sampling lemma (Lemma 15), we have

Pr_{h∈ℋ}[h fools f′ with error (ζ/3)·μ′] ≥ 1 − 9/(2^q·ζ²·μ′) ≥ 1 − 18·2^{k₀}/(2^q·ζ²) ≥ 1 − ζ,

provided we choose a suitable value q = k₀ + O(log(1/ζ)). Now fix an h ∈ ℋ such that the bad event above does not occur, and let g be the corresponding generator in 𝒢, i.e., g(x) = G_sb(h(x)). Then g fools f with error at most

(ζ/3)·μ′ + |μ − μ′| ≤ μ·(ζ/3)·(2 + ζ/3) ≤ ζ·μ.

4.2 Pseudorandomly partitioning the coordinates into buckets

In this subsection, we explain how to pseudorandomly partition the coordinates into buckets, [n] = B₁ ∪ ⋯ ∪ B_r, such that no single bucket gets too many of the k coordinates we care about. To be more precise, we construct a balanced partition generator, defined as follows.

Definition 20 (Balanced partition generator [38]).

A (k, k₀, δ)-balanced partition generator is a function G_vars: {0,1}^a → [r]^n such that for every set S ⊆ [n] with |S| ≤ k, with probability at least 1−δ over a uniformly random choice of seed x ∈ {0,1}^a, every bucket j ∈ [r] satisfies |{i ∈ S : G_vars(x)_i = j}| ≤ k₀.

Definition 20 is due to Meka and Zuckerman, who used the term “balanced hash family” [38, Definition 4.9]. We use the term “balanced partition generator” to avoid confusion with the hash functions that appear in the proof of Lemma 19. Our balanced partition generator will essentially consist of a d-wise γ-biased generator for appropriate values d and γ. The analysis will be based on the following bound on the moments of a sum of independent Bernoulli random variables [47]. (The exact statement of Theorem 21 does not appear in Schmidt, Siegel, and Srinivasan’s work [47], but it follows from the proof of item “(III)” in their “Theorem 4”.)

Theorem 21 (Moment bound for a sum of independent Bernoulli random variables [47]).

Let X₁,…,X_k be independent {0,1}-valued random variables. Let X = Σ_{i=1}^k X_i, let μ_i = 𝔼[X_i], and let μ = Σ_{i=1}^k μ_i. Then for every even positive integer t, we have

𝔼[(X−μ)^t] ≤ max{t^t, (t·μ)^{t/2}}.

Theorem 21 can be improved in some parameter regimes [49], but the simple bound in Theorem 21 suffices for our purposes. Using Theorem 21, we now present a tail bound for sums of random variables that satisfy a certain “near t-wise independence” condition. Similar bounds were proven in several previous papers [36, 17, 50], and our proof is almost identical to their proofs.

Corollary 22 (Tail bound for sums of nearly t-wise independent random variables).

Let X₁,…,X_k be {0,1}-valued random variables and let μ₁,…,μ_k ∈ [0,1]. Let X = Σ_{i=1}^k X_i and μ = Σ_{i=1}^k μ_i. Let t be an even positive integer, let γ ∈ (0,1), and assume that for every set S ⊆ [k] with |S| ≤ t, we have

|𝔼[∏_{i∈S} X_i] − ∏_{i∈S} μ_i| ≤ γ.

Then for every Δ > 0, we have

Pr[|X − μ| ≥ Δ] ≤ (t/Δ)^t + (√(μt)/Δ)^t + γ·(2k/Δ)^t.
Proof.

See the full version of this paper [28, §4.2].

Given Corollary 22, we are ready to construct our balanced partition generator.

Lemma 23 (Balanced partition generator).

Let n, k, r ∈ ℕ and δ ∈ (0,1). Assume r is a power of two and r ≤ k ≤ n. There exists an explicit (k, k₀, δ)-balanced partition generator G_vars: {0,1}^a → [r]^n, where

k₀ = k/r + O(√((k/r)·log(r/δ)) + log(r/δ)),

with seed length

a = O(log(r/δ)·log(2rk/log(r/δ)) + log log n).
Proof.

Identify [r]^n with {0,1}^{n·log r}. We let G_vars be a (t·log r)-wise γ-biased generator for appropriate values t = log(3r/δ) and γ = (δ/(3r))·(t/(rk))^{t/2}. The seed length bound follows from Theorem 16. For the correctness proof, assume without loss of generality that |S| = k. Sample Z ∈ [r]^n using the generator. Fix any bucket j ∈ [r]. For each i ∈ S, let X_i indicate whether Z_i = j. Then for any set T ⊆ S with |T| ≤ t, the value ∏_{i∈T} X_i can be expressed in terms of the underlying bits of Z as a conjunction of at most t·log r literals. Therefore, by Lemma 17, we have |𝔼[∏_{i∈T} X_i] − r^{−|T|}| ≤ γ. Therefore, by Corollary 22, for every Δ > 0, we have

Pr[Σ_{i∈S} X_i ≥ k/r + Δ] ≤ (t/Δ)^t + (√(kt/r)/Δ)^t + γ·(2k/Δ)^t.

We choose Δ = max{2t, 2·√(kt/r)}. Then we get

Pr[Σ_{i∈S} X_i ≥ k/r + Δ] ≤ 2^{−t} + 2^{−t} + γ·(rk/t)^{t/2} ≤ δ/(3r) + δ/(3r) + δ/(3r) = δ/r

due to our choices of t and γ. The union bound over the r buckets completes the proof.

For comparison, Lovett, Reingold, Trevisan, and Vadhan constructed an explicit (k, k₀, δ)-balanced partition generator for the special case k = Θ(r·log(1/δ)), with k₀ = O(k/r) and seed length a = O(log n + log(r/δ)·log(r·log(1/δ))) [36]. For any k, one can also use Gopalan, Kane, and Meka’s PRG for Fourier shapes [25] to construct a (k, k₀, δ)-balanced partition generator with the same value of k₀ as in Lemma 23 and with seed length a = Õ(log(n/δ)).

4.3 The full 𝒌-wise probably uniform generator

Proof of Theorem 3.

Let G_vars: {0,1}^a → [r]^n be the (k, k₀, δ)-balanced partition generator from Lemma 23 with parameters δ = ε/3 and r = (k/log(k/ε))^{1/3}, or to be more precise, r is the largest power of two that is at most (k/log(k/ε))^{1/3}. Let 𝒢 be the family of generators g: {0,1}^q → {0,1}^n from Lemma 19, using ζ = ε/(3r) and using the value k₀ from G_vars. The final generator G is defined as follows.

  1. Sample a partition Z = (Z₁,…,Z_n) ∈ [r]^n using G_vars.

  2. Sample a generator g ∈ 𝒢.

  3. Sample seeds X⁽¹⁾,…,X⁽ʳ⁾ ∈ {0,1}^q independently and uniformly at random.

  4. Output Y ∈ {0,1}^n, where

     Y_i = g(X^{(Z_i)})_i

     for every i ∈ [n].

To prove that this works, let f: {0,1}^n → {0,1} be a conjunction of k literals, say

f(x) = ∏_{i∈S} 1{x_i = b_i},

where S ⊆ [n], |S| = k, and b_i ∈ {0,1} for every i ∈ S. We will prove that 𝔼[f(Y)] ≥ (1−ε)·2^{−k}, which is sufficient by Proposition 18.

For each bucket j ∈ [r], let B_j = Z^{−1}(j). The definition of a balanced partition generator ensures that except with probability ε/3 over the choice of Z, we have |S ∩ B_j| ≤ k₀ for every j ∈ [r]. Let E₁ be this “good” event. Fix any choice of Z such that E₁ occurs.

For each j ∈ [r], define f_j: {0,1}^n → {0,1} by

f_j(x) = ∏_{i∈S∩B_j} 1{x_i = b_i},

so f(x) = f₁(x)⋯f_r(x). By Lemma 19 and the union bound over the r buckets, except with probability ε/3 over the choice of g ∈ 𝒢, we have

𝔼_{x∈{0,1}^q}[f_j(g(x))] ≥ (1 − ε/(3r))·𝔼[f_j]

for every j ∈ [r]. Let E₂ be this “good” event. Fix any choice of g such that E₂ occurs.

For any such fixing of Z and g, with respect to the choice of X⁽¹⁾,…,X⁽ʳ⁾ alone, we have

𝔼_{X⁽¹⁾,…,X⁽ʳ⁾}[f(Y)] = ∏_{j=1}^r 𝔼_{X⁽ʲ⁾}[f_j(g(X⁽ʲ⁾))] ≥ ∏_{j=1}^r (1 − ε/(3r))·𝔼[f_j] = (1 − ε/(3r))^r · 2^{−k} ≥ (1 − ε/3)·2^{−k},

by Bernoulli’s inequality. Therefore, with respect to all the randomness, we have

𝔼[f(Y)] ≥ Pr[f(Y) = 1 and E₁ and E₂] = Pr[E₁]·Pr[E₂ ∣ E₁]·Pr[f(Y) = 1 ∣ E₁, E₂] ≥ (1 − ε/3)·(1 − ε/3)·(1 − ε/3)·2^{−k} ≥ (1 − ε)·2^{−k}

by another application of Bernoulli’s inequality.

Now let us bound the seed length. By Lemma 23, the cost of sampling Z is

O(log(r/ε)·log(2rk/log(r/ε)) + log log n)
≤ O(log(k/ε)·log(2k·log(k/ε)) + log log n)
≤ O(log(k/ε)·(k/log(k/ε))^{2/3} + log(k/ε) + log log n)
= O(k^{2/3}·log^{1/3}(k/ε) + log(k/ε) + log log n).

Furthermore, the parameter k₀ is given by

k₀ = k/r + O(√((k/r)·log(r/ε)) + log(r/ε)) ≤ k/r + O(√((k/r)·log(k/ε)) + log(k/ε)).

Therefore, by Lemma 19, the cost of sampling g ∈ 𝒢 is

O(k₀ + log(k/ε) + log log n)
= O(k^{2/3}·log^{1/3}(k/ε) + k^{1/3}·log^{2/3}(k/ε) + log(k/ε) + log log n)
= O(k^{2/3}·log^{1/3}(k/ε) + log(k/ε) + log log n).

Finally, the cost of sampling X⁽¹⁾,…,X⁽ʳ⁾ is

r·q = r·k₀ + O(r·log(k/ε))
= k + O(k^{2/3}·log^{1/3}(k/ε) + k^{1/3}·log^{2/3}(k/ε) + log(k/ε))
= k + O(k^{2/3}·log^{1/3}(k/ε) + log(k/ε)).

5 Implications of 𝒌-wise probable uniformity

In this section, we will show that every k-wise probably uniform distribution fools decision trees. In fact, we will show that such distributions fool a more general model, called the subcube partition model.

Definition 24 (The subcube partition model).

A subcube partition f is a collection of terms f₁,…,f_m and values b₁,…,b_m ∈ {0,1}. Each term f_i: {0,1}^n → {0,1} is a conjunction of literals, and the sets f₁^{−1}(1),…,f_m^{−1}(1) must partition the domain {0,1}^n. That is, for every x ∈ {0,1}^n, we have Σ_{i=1}^m f_i(x) = 1. The subcube partition computes the function f: {0,1}^n → {0,1} defined by

f(x) = Σ_{i=1}^m f_i(x)·b_i.

The width of a term fi is the number of literals in the term. The width of the subcube partition is the maximum width of any term. The size of the subcube partition is the number of terms (m).
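For a concrete illustration (our example, not from the paper): the two-variable parity function is computed by a subcube partition of width 2 and size 4, and the sketch below checks the partition property of Definition 24 exhaustively.

```python
from itertools import product

# A term is a dict {variable index: required bit}: the conjunction of
# the corresponding literals; its width is the number of literals.
def term_value(term, x):
    return int(all(x[i] == b for i, b in term.items()))

# Width-2, size-4 subcube partition computing f(x) = x_0 XOR x_1.
terms = [({0: 0, 1: 0}, 0), ({0: 0, 1: 1}, 1),
         ({0: 1, 1: 0}, 1), ({0: 1, 1: 1}, 0)]

for x in product([0, 1], repeat=2):
    # Exactly one term fires on every input: the partition property.
    assert sum(term_value(t, x) for t, _ in terms) == 1
    f_x = sum(term_value(t, x) * b for t, b in terms)
    assert f_x == x[0] ^ x[1]
print("valid width-2, size-4 subcube partition for XOR")
```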

Every width-k subcube partition has size at most 2^k, because 1 = Σ_{i=1}^m 𝔼[f_i] ≥ m·2^{−k}. A decision tree of depth k and size m can be simulated by a subcube partition of width k and size m: for each leaf u, we construct a term f_u that indicates whether the tree reaches the leaf u on a given input. The converse does not hold. In fact, there exist subcube partitions of width k that cannot be simulated by decision trees of depth k^{1.99} [45, 33, 24, 8]. We now explain why k-wise probably uniform generators fool subcube partitions.

Lemma 25 (k-wise probable uniformity fools subcube partitions).

Let X be a distribution over {0,1}^n that is k-wise ε-probably uniform. Then:

  • X fools width-k subcube partitions (hence also depth-k decision trees) with error ε.

  • X fools size-m subcube partitions (hence also size-m decision trees) with error ε + m·2^{−(k+1)}.

Proof.

Let f: {0,1}^n → {0,1} be a function computed by a subcube partition with terms f₁,…,f_m and values b₁,…,b_m. Let S ⊆ [m] be the set of terms of width at most k. We will show that X fools f with error ε + Σ_{i∉S} 𝔼[f_i]. To prove it, sample R ∈ {0,1}^n uniformly at random. Then

𝔼[f(X)] = Σ_{i=1}^m b_i·𝔼[f_i(X)] ≥ Σ_{i∈S} b_i·𝔼[f_i(X)] ≥ Σ_{i∈S} b_i·(1−ε)·𝔼[f_i]
= (1−ε)·𝔼[Σ_{i∈S} b_i·f_i(R)] ≥ 𝔼[Σ_{i∈S} b_i·f_i(R)] − ε
= 𝔼[f(R) − Σ_{i∉S} b_i·f_i(R)] − ε ≥ 𝔼[f] − Σ_{i∉S} 𝔼[f_i] − ε.

Now we bound the expectation from above. Let f̄ = 1 − f. Since f̄ can also be computed by a subcube partition with the same terms f₁,…,f_m, we have

𝔼[f(X)] = 1 − 𝔼[f̄(X)] ≤ 1 − 𝔼[f̄] + ε + Σ_{i∉S} 𝔼[f_i] = 𝔼[f] + ε + Σ_{i∉S} 𝔼[f_i].

The lemma follows, because 𝔼[f_i] ≤ 2^{−(k+1)} whenever i ∉ S.

By combining Theorem 3 (our k-wise probably uniform generator) with Lemma 25, we now prove the following theorem, which generalizes Theorem 5.

Theorem 26 (Fooling near-maximal subcube partitions).

Let n, m ∈ ℕ and ε ∈ (0,1). There exists an explicit PRG G: {0,1}^s → {0,1}^n that fools n-variate subcube partitions of size m with error ε and seed length

s = log m + O(log^{2/3} m · log^{1/3}((log m)/ε) + log(1/ε) + log log n). (3)

Proof.

We use our k-wise (ε/2)-probably uniform generator, where k = log m + log(2/ε). By Lemma 25, the generator fools size-m subcube partitions with error ε/2 + m·2^{−k} = ε. By Theorem 3, the seed length is

k + O(k^{2/3}·log^{1/3}(k/ε) + log(1/ε) + log log n),

which, after substituting the choice of k, simplifies to Equation 3.
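In slightly more detail (a routine substitution we spell out for convenience): with k = log m + log(2/ε),

$$s = k + O\bigl(k^{2/3}\log^{1/3}(k/\varepsilon) + \log(1/\varepsilon) + \log\log n\bigr) = \log m + O\Bigl(\log^{2/3} m \cdot \log^{1/3}\bigl(\tfrac{\log m}{\varepsilon}\bigr) + \log(1/\varepsilon) + \log\log n\Bigr),$$

using k ≤ 2·max{log m, log(2/ε)}, so that k^{2/3}·log^{1/3}(k/ε) = O(log^{2/3} m · log^{1/3}((log m)/ε) + log(1/ε)).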

6 Hitting sets for systems of equations over 𝔽𝟐 and for 𝑩𝟐-circuits

In this section, we present our hitting set for systems of equations over 𝔽₂, thereby proving Theorem 6. Next, we show that such hitting sets can hit circuits over the B₂ basis, thus proving Corollary 9. In the full version of this paper [28], we also present a more explicit construction of hitting set generators for systems of equations over 𝔽₂, where given the seed, we can output the corresponding string in time poly(n).

6.1 Rank condenser

First, we use a rank condenser, due to Forbes and Guruswami [22], to “condense” the number of variables from n to O(k·log n).

Definition 27 (k-rank condenser).

Let 𝔽 be a field and let n ≥ k ≥ 1. A collection of matrices ℰ ⊆ 𝔽^{n×n′} is a k-rank condenser if, for every matrix A ∈ 𝔽^{k×n} with rank(A) = k, there exists M ∈ ℰ such that rank(AM) = k.

We say that ℰ is explicit if, given an index i ∈ [|ℰ|], the i-th matrix of ℰ can be constructed in time poly(n).

 Remark 28.

Stronger “lossless” variants, which bound how many matrices in ℰ can cause rank loss or how much total rank loss can occur, have been studied (see, e.g., Forbes and Guruswami [22]). The simpler notion above suffices for our purposes.

The following theorem, due to Forbes and Guruswami, shows that we can construct such condensers explicitly over 𝔽₂ while keeping the output dimension only O(k·log n).

Theorem 29.

Let n ≥ k ≥ 1. There is an explicit k-rank condenser ℰ ⊆ 𝔽₂^{n×4k·log n} with |ℰ| = poly(n).

Proof.

This follows from Forbes and Guruswami’s work [22, Corollary 8.7, preprint version], by setting the parameters appropriately.

6.2 Partition the variables

For the sake of brevity, we define the following notation.

Definition 30.

Let H ⊆ 𝔽₂ⁿ. We say that H hits codimension k if, for every affine subspace of 𝔽₂ⁿ of codimension at most k, there exists x ∈ H in this affine subspace. Equivalently, for every A ∈ 𝔽₂^{k×n} and every b ∈ image(A), there exists x ∈ H such that Ax = b.

Our goal is to construct an H that hits codimension k. We can split the n variables into consecutive blocks of arbitrary sizes. For any A ∈ 𝔽₂^{k×n}, this induces a column partition A = [A₁ A₂ ⋯ A_t], where A_i ∈ 𝔽₂^{k×n_i} and n₁ + ⋯ + n_t = n. Without loss of generality, we assume that A has full rank. Write k_i for the incremental rank contributed by block A_i, i.e., k_i = rank([A₁ A₂ ⋯ A_i]) − rank([A₁ A₂ ⋯ A_{i−1}]), so k₁ + ⋯ + k_t = k. Andreev, Clementi, and Rolim [10] stated the result that, if H_i ⊆ 𝔽₂^{n_i} hits codimension k_i for every i, then there is some x in the Cartesian product H₁ × H₂ × ⋯ × H_t such that Ax = b.

However, they skipped the proof, so we complete the proof in this subsection. We begin by showing that after fixing the first i−1 blocks, the feasible assignments to the i-th block form an affine subspace of codimension k_i. In the following, we focus on the case of partitioning A into two blocks, which will turn out to be sufficient for analyzing the general case.

Lemma 31.

Let 𝔽 be a field. Let A₁ ∈ 𝔽^{k×n₁} and A₂ ∈ 𝔽^{k×n₂}. Let b ∈ image([A₁ A₂]), and define V = {y ∈ 𝔽^{n₁} : there exists z ∈ 𝔽^{n₂} such that A₁y + A₂z = b}. Then V is an affine subspace with codimension rank([A₁ A₂]) − rank(A₂).

Proof.

Since b ∈ image([A₁ A₂]), we know there exists (y′, z′) such that A₁y′ + A₂z′ = b. Let W = A₁^{−1}(image(A₂)). We claim that V = W + y′. Indeed, if y ∈ W + y′, then y − y′ ∈ W, so there is some z such that A₁(y − y′) = A₂z. Hence

A₁y + A₂(z′ − z) = A₁y′ + A₂z′ = b,

so y ∈ V. Conversely, if y ∈ V, then there is some z such that A₁y + A₂z = b, and consequently

A₁(y − y′) = b − A₂z − A₁y′ = A₂(z′ − z),

showing that y − y′ ∈ W, i.e., y ∈ W + y′.

Now we are going to show that codim(W) = rank([A₁ A₂]) − rank(A₂). Let b₁,…,b_s be a basis of W. Extend this to a basis b₁,…,b_{n₁} of 𝔽^{n₁}, and set U = span(b_{s+1},…,b_{n₁}), so 𝔽^{n₁} = U ⊕ W. Because ker(A₁) ⊆ W, the map A₁ is injective on U. Hence dim(A₁U) = dim(U) = codim(W). On the other hand, dim(A₁U) = rank([A₁ A₂]) − rank(A₂).

Corollary 32 ([10]).

Let A = [A₁ A₂ ⋯ A_t] ∈ 𝔽₂^{k×n} be a matrix of rank k, where each block A_i ∈ 𝔽₂^{k×n_i} and n₁ + ⋯ + n_t = n. For every i ∈ [t], let k_i = rank([A_i A_{i+1} ⋯ A_t]) − rank([A_{i+1} A_{i+2} ⋯ A_t]). For each i ∈ [t], let H_i ⊆ 𝔽₂^{n_i}, and assume H_i hits codimension k_i. Then for every b ∈ image(A), there exists a vector x in the Cartesian product

H₁ × H₂ × ⋯ × H_t ⊆ 𝔽₂ⁿ

such that Ax = b.

Proof.

We prove it by induction on t. In the base case, when t = 1, the corollary follows immediately from the definition of hitting codimension k₁. Now suppose t > 1. Define A_{>1} = [A₂ A₃ ⋯ A_t], and define

V = {y ∈ 𝔽₂^{n₁} : there exists z ∈ 𝔽₂^{n−n₁} such that A₁y + A_{>1}z = b}.

By Lemma 31, V is an affine subspace with codimension k₁. Therefore, there exists y ∈ H₁ ∩ V. By the definition of V, b − A₁y ∈ image(A_{>1}). Therefore, by induction, there exists z ∈ H₂ × ⋯ × H_t such that A_{>1}z = b − A₁y. Let x = (y, z). Then x ∈ H₁ × ⋯ × H_t, and Ax = A₁y + A_{>1}z = b.

6.3 Brute-force construction

There is a simple brute-force method for constructing a set that hits codimension k with the following time complexity.

Lemma 33 (Brute-force hitting set for systems of equations).

For every n, k ∈ ℕ, there exists H ⊆ 𝔽₂ⁿ of size 2^{k+O(log(nk))} that hits codimension k, which can be constructed in time O(nk·2^{kn+n+2k}).

The proof of Lemma 33 can be found in the full version of this paper [28]. On its own, the algorithm of Lemma 33 is too slow to prove Theorem 6. However, we will only apply the brute-force method after reducing the length of the binary strings we are searching over, so we can afford the exponential time cost. Note that the size of our construction matches that of the hitting set obtained by the standard probabilistic method. This method is similar to Naor, Schulman, and Srinivasan’s work [40].

6.4 Our final hitting set for systems of equations

Proof of Theorem 6.

Without loss of generality, we assume that k ≥ log n; otherwise, we can simply use a small-bias distribution as described in the paragraph following the statement of Theorem 6. First we use Theorem 29 to construct a k-rank condenser ℰ ⊆ 𝔽₂^{n×4k·log n}, where |ℰ| = poly(n); let n′ = 4k·log n denote the number of variables after condensing. Then we partition the n′ condensed variables into t blocks of equal size, where t ≈ k^{2/3} (the exact value will be specified later). Without loss of generality, we assume that n′/t is an integer. For each i ∈ {0,1,…,n′/t}, we use Lemma 33 to construct H_i ⊆ 𝔽₂^{n′/t} that hits codimension i, in the sense of Definition 30. Then we combine them by taking a Cartesian product. Thus, the overall construction is

H = ⋃_{k₁,…,k_t ≥ 0, k₁+⋯+k_t = k} {Mx : M ∈ ℰ and x ∈ H_{k₁} × ⋯ × H_{k_t}}.

We first prove the construction is efficient (Item 2) and then prove the construction is correct (Item 1).

(Item 2). By Lemma 33, we know the size of each H_i is |H_i| = O((n′/t)·i·2^i), and the total running time to construct these hitting sets over n′/t variables is Σ_{i=1}^{n′/t} O((n′/t)·i·2^{(n′/t) + i·((n′/t)+2)}) ≤ O(2^{(n′/t) + (n′/t)² + 2(n′/t)}·(n′/t)³) = 2^{O((n′/t)²)}.

For one partition of k, namely k₁ + ⋯ + k_t = k, we have

|H_{k₁} × ⋯ × H_{k_t}| ≤ ∏_{i=1}^t k_i·2^{k_i + O(log(n′/t))} = 2^{t·O(log(n′/t)) + Σ_{i=1}^t k_i}·∏_{i=1}^t k_i
≤ 2^{t·(O(log(n′/t)) + log k) + k} ≤ 2^{O(t·log k) + k}.

Since each 0 ≤ k_i ≤ k, the total number of partitions (k₁,…,k_t) is at most (k+1)^t = 2^{t·log(k+1)} ≤ 2^{O(t·log k)}. Thus,

|H| ≤ |ℰ|·2^{O(t·log k)}·2^{O(t·log k) + k} ≤ 2^{O(log n) + O(t·log k) + k}.

Thus, the time used by our algorithm is at most

2^{O(log n + t·log k + (k·log n)²/t²) + k}.

By choosing t = O(k²·log²n/log k)^{1/3}, we get the time used by our algorithm to be

2^{k + O((k·log n·log k)^{2/3} + log n)}.

(Item 1). Now consider any A ∈ 𝔽₂^{k×n} and any b ∈ image(A). Without loss of generality, we assume that rank(A) = k. By Theorem 29, we know there is an M ∈ ℰ such that rank(A) = rank(AM), and we denote A′ := AM. Since image(AM) ⊆ image(A) and dim(image(AM)) = rank(AM) = rank(A) = dim(image(A)), we have image(AM) = image(A), thus b ∈ image(AM) as well. By partitioning A′ into t blocks of size k×(n′/t), we get

A′ = [A′₁ A′₂ ⋯ A′_t].

Let k_i be defined as in Corollary 32, i.e., k_i = rank([A′_i ⋯ A′_t]) − rank([A′_{i+1} ⋯ A′_t]). Thus, by Corollary 32, we know that there exists an x ∈ H_{k₁} × ⋯ × H_{k_t} such that A′x = b, which means A(Mx) = b and Mx ∈ H.
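For intuition, the combination step at the heart of this proof can be sketched as follows (our illustration: the per-block hitting sets H[i] from Lemma 33 and the rank condenser of Theorem 29 are treated as black boxes and omitted; the toy H below is valid but not optimized).

```python
from itertools import product

def compositions(k: int, t: int):
    """Yield all tuples (k_1, ..., k_t) of nonnegative integers summing to k."""
    if t == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, t - 1):
            yield (first,) + rest

def combine(H: dict, k: int, t: int) -> set:
    """Combination step from the proof of Theorem 6: H[i] is a set of
    bit tuples (one block's worth of variables) hitting codimension i.
    The union of Cartesian products over all compositions of k hits
    codimension k, by Corollary 32."""
    out = set()
    for parts in compositions(k, t):
        for pieces in product(*(H[ki] for ki in parts)):
            out.add(tuple(bit for piece in pieces for bit in piece))
    return out

# Toy usage: t = 2 blocks of length 2, target codimension k = 1.
H = {0: {(0, 0)},                           # any nonempty set hits codim 0
     1: set(product([0, 1], repeat=2))}     # all of F_2^2 hits codim 1
print(sorted(combine(H, k=1, t=2)))
```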

6.5 Hitting set for 𝑩𝟐-circuits

In this subsection, we will show that this hitting set can be used to hit B₂-circuits.

Corollary 34 (Restatement of Corollary 9).

For every n ∈ ℕ and α ∈ (0,2.5), there exists a value ε = 2^{−Ω(α²n)} and a set H ⊆ {0,1}^n such that:

  1. H is an ε-hitting set for B₂-circuits of size (2.5−α)·n.

  2. Given the parameters n and α, the set H can be enumerated in time 2^{(1−Ω(α²))·n + Õ(n^{2/3})}.

Proof.

Assume without loss of generality that, in any parity decision tree, the queries on every root-to-leaf path are linearly independent. First we note that any parity decision tree f can be written as f = f₁ + ⋯ + f_ℓ, where each f_i corresponds to a path from the root to an accepting leaf in f. Note that the f_i’s are disjoint and each f_i is a conjunction of parity functions. The number of parity functions in the conjunction is the depth of the corresponding leaf. If f is a size-m parity decision tree with 𝔼[f] > ε, then there is some accepting leaf that is reached with probability greater than ε/m. The depth of that leaf must be less than log(m/ε), because otherwise the probability of reaching it would be smaller. Consequently, if H ⊆ 𝔽₂ⁿ hits codimension log(m/ε), then H is an ε-hitting set for size-m parity decision trees.

By Chen and Kabanets’ work [19], we know every B₂-circuit of size (2.5−α)·n can be simulated by a parity decision tree of size m = 2^{(1−cα²)·n} for some constant c > 0. Let ε = 2^{−(c/2)·α²·n}. Then log(m/ε) = (1−(c/2)·α²)·n. By Theorem 6, a set that hits codimension log(m/ε) can be enumerated in time

2^{log(m/ε) + O((log(m/ε)·log log(m/ε)·log n)^{2/3} + log n)} = 2^{(1−Ω(α²))·n + Õ(n^{2/3})}.

 Remark 35.

In this proof, we have used the fact that hitting parity decision trees is equivalent to hitting systems of equations. We further note that hitting DNFs of parities is also equivalent to these two problems.

References

  • [1] N. Alon. Explicit construction of exponential sized families of k-independent sets. Discrete Math., 58(2):191–193, 1986. doi:10.1016/0012-365X(86)90161-5.
  • [2] Noga Alon. Perturbed identity matrices have high rank: proof and applications. Combin. Probab. Comput., 18(1-2):3–15, 2009. doi:10.1017/S0963548307008917.
  • [3] Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing k-wise and almost k-wise independence. In Proceedings of the 39th Annual Symposium on Theory of Computing (STOC), pages 496–505, 2007. doi:10.1145/1250790.1250863.
  • [4] Noga Alon, László Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567–583, 1986. doi:10.1016/0196-6774(86)90019-2.
  • [5] Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509–516, 1992. doi:10.1109/18.119713.
  • [6] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures Algorithms, 3(3):289–304, 1992. doi:10.1002/rsa.3240030308.
  • [7] Noga Alon, Oded Goldreich, and Yishay Mansour. Almost k-wise independence versus k-wise independence. Inform. Process. Lett., 88(3):107–110, 2003. doi:10.1016/S0020-0190(03)00359-4.
  • [8] Andris Ambainis, Martins Kokainis, and Robin Kothari. Nearly Optimal Separations Between Communication (or Query) Complexity and Partitions. In Proceedings of the 31st Conference on Computational Complexity (CCC), pages 4:1–4:14, 2016. doi:10.4230/LIPIcs.CCC.2016.4.
  • [9] Alexander E. Andreev, Juri L. Baskakov, Andrea E. F. Clementi, and José D. P. Rolim. Small pseudo-random sets yield hard functions: New tight explicit lower bounds for branching programs. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), pages 179–189, 1999. preprint: https://eccc.weizmann.ac.il/report/1997/053/. doi:10.1007/3-540-48523-6_15.
  • [10] Alexander E. Andreev, Andrea E. F. Clementi, and José D. P. Rolim. Efficient constructions of hitting sets for systems of linear functions. In Proceedings of the 14th Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 387–398, 1997. doi:10.1007/BFb0023475.
  • [11] Bernd Becker and Hans-Ulrich Simon. How robust is the n-cube? Inform. and Comput., 77(2):162–178, 1988. doi:10.1016/0890-5401(88)90056-9.
  • [12] Avraham Ben-Aroya and Amnon Ta-Shma. Constructing small-bias sets from algebraic-geometric codes. Theory Comput., 9:253–272, 2013. doi:10.4086/toc.2013.v009a005.
  • [13] Guy Blanc and Dean Doron. New Near-Linear Time Decodable Codes Closer to the GV Bound. In Proceedings of the 37th Annual Computational Complexity Conference (CCC), pages 10:1–10:40, 2022. doi:10.4230/LIPIcs.CCC.2022.10.
  • [14] Nader H. Bshouty. Testers and their applications [extended abstract]. In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science (ITCS), pages 327–351. ACM, New York, 2014. doi:10.1145/2554797.2554828.
  • [15] Nader H. Bshouty. Derandomizing Chernoff bound with union bound with an application to k-wise independent sets, 2016. arXiv:1608.01568.
  • [16] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. J. Comput. System Sci., 18(2):143–154, 1979. doi:10.1016/0022-0000(79)90044-8.
  • [17] L. Elisa Celis, Omer Reingold, Gil Segev, and Udi Wieder. Balls and bins: smaller hash families and faster evaluation. SIAM J. Comput., 42(3):1030–1050, 2013. doi:10.1137/120871626.
  • [18] Ashok K. Chandra, Lawrence T. Kou, George Markowsky, and Shmuel Zaks. On sets of Boolean n-vectors with all k-projections surjective. Acta Inform., 20(1):103–111, 1983. doi:10.1007/BF00264296.
  • [19] Ruiwen Chen and Valentine Kabanets. Correlation bounds and #SAT algorithms for small linear-size circuits. Theoret. Comput. Sci., 654:2–10, 2016. doi:10.1016/j.tcs.2016.05.005.
  • [20] Kuan Cheng and William M. Hoza. Hitting sets give two-sided derandomization of small space. Theory Comput., 18:Paper No. 21, 32, 2022. doi:10.4086/toc.2022.v018a021.
  • [21] Benny Chor, Oded Goldreich, Johan Håstad, Joel Friedman, Steven Rudich, and Roman Smolensky. The bit extraction problem or t-resilient functions. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science (FOCS), pages 396–407, 1985. doi:10.1109/SFCS.1985.55.
  • [22] Michael A. Forbes and Venkatesan Guruswami. Dimension Expanders via Rank Condensers. In Naveen Garg, Klaus Jansen, Anup Rao, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015), volume 40 of Leibniz International Proceedings in Informatics (LIPIcs), pages 800–814, Dagstuhl, Germany, 2015. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. preprint: https://arxiv.org/abs/1411.7455. doi:10.4230/LIPIcs.APPROX-RANDOM.2015.800.
  • [23] Alexander Golovnev, Alexander S. Kulikov, Alexander V. Smal, and Suguru Tamaki. Gate elimination: circuit size lower bounds and #SAT upper bounds. Theoret. Comput. Sci., 719:46–63, 2018. doi:10.1016/j.tcs.2017.11.008.
  • [24] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. SIAM J. Comput., 47(6):2435–2450, 2018. doi:10.1137/16M1059369.
  • [25] Parikshit Gopalan, Daniel M. Kane, and Raghu Meka. Pseudorandomness via the discrete Fourier transform. SIAM J. Comput., 47(6):2451–2487, 2018. doi:10.1137/16M1062132.
  • [26] Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, and Daniel Soudry. Provable tempered overfitting of minimal nets and typical nets, 2024. doi:10.48550/arXiv.2410.19092.
  • [27] Pooya Hatami and William Hoza. Paradigms for unconditional pseudorandom generators. Found. Trends Theor. Comput. Sci., 16(1-2):1–210, 2024. doi:10.1561/0400000109.
  • [28] William Hoza and Zelin Lv. Fooling near-maximal decision trees. ECCC preprint TR25-003, 2025. URL: https://eccc.weizmann.ac.il/report/2025/003/.
  • [29] Peter J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35:73–101, 1964. doi:10.1214/aoms/1177703732.
  • [30] Kazuo Iwama and Hiroki Morizumi. An explicit lower bound of $5n - o(n)$ for Boolean circuits. In Proceedings of the 27th International Symposium on Mathematical Foundations of Computer Science (MFCS), volume 2420 of Lecture Notes in Comput. Sci., pages 353–364. Springer, Berlin, 2002. doi:10.1007/3-540-45687-2_29.
  • [31] Howard Karloff and Yishay Mansour. On construction of k-wise independent random variables. Combinatorica, 17(1):91–107, 1997. doi:10.1007/BF01196134.
  • [32] Daniel J. Kleitman and Joel Spencer. Families of k-independent sets. Discrete Math., 6:255–262, 1973. doi:10.1016/0012-365X(73)90098-8.
  • [33] Robin Kothari, David Racicot-Desloges, and Miklos Santha. Separating decision tree complexity from subcube partition complexity. In Proceedings of the 19th International Workshop on Randomization and Computation (RANDOM), pages 915–930, 2015. doi:10.4230/LIPIcs.APPROX-RANDOM.2015.915.
  • [34] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM J. Comput., 22(6):1331–1348, 1993. doi:10.1137/0222080.
  • [35] A. A. Lialina. On the complexity of unique Circuit SAT. J. Math. Sci., 247:457–466, 2020. doi:10.1007/s10958-020-04813-1.
  • [36] Shachar Lovett, Omer Reingold, Luca Trevisan, and Salil Vadhan. Pseudorandom bit generators that fool modular sums. In Proceedings of the 13th International Workshop on Randomization and Computation (RANDOM), pages 615–630, 2009. doi:10.1007/978-3-642-03685-9_46.
  • [37] Yishay Mansour, Noam Nisan, and Prasoon Tiwari. The computational complexity of universal hashing. Theoretical Computer Science, 107(1):121–133, 1993. doi:10.1016/0304-3975(93)90257-T.
  • [38] Raghu Meka and David Zuckerman. Pseudorandom generators for polynomial threshold functions. SIAM J. Comput., 42(3):1275–1301, 2013. doi:10.1137/100811623.
  • [39] Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. Comput., 22(4):838–856, 1993. doi:10.1137/0222053.
  • [40] Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. Splitters and near-optimal derandomization. In Proceedings of 36th Annual Conference on Foundations of Computer Science (FOCS), pages 182–191, 1995. doi:10.1109/SFCS.1995.492475.
  • [41] Sergey Nurk. An $O(2^{0.4058m})$ upper bound for Circuit SAT. PDMI technical report, 2009. URL: http://www.pdmi.ras.ru/preprint/2009/09-10.html.
  • [42] Ryan O’Donnell and Yu Zhao. On Closeness to k-Wise Uniformity. In Proceedings of the 22nd International Conference on Randomization and Computation (RANDOM), pages 54:1–54:19, 2018. doi:10.4230/LIPIcs.APPROX-RANDOM.2018.54.
  • [43] Edward Pyne, Ran Raz, and Wei Zhan. Certified hardness vs. randomness for log-space. In Proceedings of the 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 989–1007, 2023. doi:10.1109/FOCS57990.2023.00061.
  • [44] C. Radhakrishna Rao. Factorial experiments derivable from combinatorial arrangements of arrays. Suppl. J. Roy. Statist. Soc., 9:128–139, 1947. doi:10.2307/2983576.
  • [45] Petr Savický. On determinism versus unambiquous nondeterminism for decision trees. ECCC preprint TR02-009, 2002. URL: https://eccc.weizmann.ac.il/report/2002/009/.
  • [46] S. V. Savinov. Upper bound for Circuit SAT. Master's thesis, St. Petersburg Academic University RAS, 2014.
  • [47] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223–250, 1995. doi:10.1137/S089548019223872X.
  • [48] Gadiel Seroussi and Nader H. Bshouty. Vector sets for exhaustive testing of logic circuits. IEEE Trans. Inform. Theory, 34(3):513–522, 1988. doi:10.1109/18.6031.
  • [49] Maciej Skorski. Tight Chernoff-Like Bounds Under Limited Independence. In Proceedings of the 26th International Conference on Randomization and Computation (RANDOM), pages 15:1–15:14, 2022. doi:10.4230/LIPIcs.APPROX/RANDOM.2022.15.
  • [50] Thomas Steinke, Salil Vadhan, and Andrew Wan. Pseudorandomness and Fourier-growth bounds for width-3 branching programs. Theory Comput., 13:Paper No. 12, 50, 2017. doi:10.4086/toc.2017.v013a012.
  • [51] Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes. In Proceedings of the 49th Annual Symposium on Theory of Computing (STOC), pages 238–251, 2017. doi:10.1145/3055399.3055408.
  • [52] Donald T. Tang and Lin S. Woo. Exhaustive test pattern generation with constant weight vectors. IEEE Transactions on Computers, C-32(12):1145–1150, 1983. doi:10.1109/TC.1983.1676175.
  • [53] Mark N. Wegman and J. Lawrence Carter. New hash functions and their use in authentication and set equality. J. Comput. System Sci., 22(3):265–279, 1981. Special issue dedicated to Michael Machtey. doi:10.1016/0022-0000(81)90033-7.