Fooling Near-Maximal Decision Trees
Abstract
For any constant , we construct an explicit pseudorandom generator (PRG) that fools -variate decision trees of size with error and seed length . For context, one can achieve seed length using well-known constructions and analyses of small-bias distributions, but such a seed length is trivial when . Our approach is to develop a new variant of the classic concept of almost -wise independence, which might be of independent interest. We say that a distribution over is -wise -probably uniform if every Boolean function that depends on only variables satisfies . We show how to sample a -wise -probably uniform distribution using a seed of length .
Meanwhile, we also show how to construct a set such that every feasible system of linear equations in variables over has a solution in . The cardinality of and the time complexity of enumerating are at most , whereas small-bias distributions would give a bound of .
By combining our new constructions with work by Chen and Kabanets (TCS 2016), we obtain nontrivial PRGs and hitting sets for linear-size Boolean circuits. Specifically, we get an explicit PRG with seed length that fools circuits of size over the basis, and we get a hitting set with time complexity for circuits of size over the basis.
Keywords and phrases: almost k-wise independence, decision trees, pseudorandom generators
Category: RANDOM
2012 ACM Subject Classification: Theory of computation → Pseudorandomness and derandomization
Acknowledgements: We thank Avishay Tal for valuable comments on a draft of this paper and for a discussion about the Fourier spectra of decision trees. We thank Frederic Koehler for pointing out the connection with Huber’s contamination model. We thank Alicia Torres Hoza for helpful comments on drafts of this paper. Zelin Lv thanks Aaron Potechin for valuable discussions.
Editors: Alina Ene and Eshan Chattopadhyay
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
How many coin flips does it take to sample bits that appear random from the perspective of an observer who only looks at of the bits?
1.1 Almost -wise uniformity and -wise probable uniformity
Almost -wise uniformity is a well-studied concept that provides one possible way of formalizing the question posed above.
Definition 1 (Almost -wise uniformity).
Let be a distribution over , let , and let . We say that is -almost -wise uniform if, for every size- set , the total variation distance between and is at most . Here denotes the projection of to the coordinates in , and denotes the uniform distribution over . If , we simply say that is -wise uniform. An (-almost) -wise uniform generator is a function such that is (-almost) -wise uniform. We refer to as the seed length of .
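For small parameters, Definition 1 can be checked exactly by brute force. The following is a minimal sketch (function names are ours, not from the paper), representing a distribution as a dict from bit tuples to probabilities and examining every size-k projection:

```python
from itertools import combinations, product

def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def is_almost_kwise_uniform(dist, n, k, eps):
    """Check whether `dist` (a dict mapping n-bit tuples to probabilities)
    is eps-almost k-wise uniform, per Definition 1: every size-k
    projection must be eps-close to uniform in total variation distance."""
    uniform_k = {y: 2.0 ** -k for y in product((0, 1), repeat=k)}
    for S in combinations(range(n), k):
        proj = {}
        for x, px in dist.items():
            y = tuple(x[i] for i in S)
            proj[y] = proj.get(y, 0.0) + px
        if tv_distance(proj, uniform_k) > eps:
            return False
    return True
```

For example, the uniform distribution over even-parity strings in {0,1}^3 passes this check for k = 2 (it is exactly pairwise uniform), while a point mass fails.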
When and , Karloff and Mansour showed that every -wise uniform generator has seed length at least [31], which might be disappointing. On the bright side, the seed length can be improved if a small positive error () is permitted. Using a connection with “small-bias distributions” [39], Alon, Goldreich, Håstad, and Peralta constructed an explicit -almost -wise uniform generator with seed length [6]. (We consider a generator to be explicit if can be computed in time, given the parameters (in this case , , and ) and the seed .) Notably, their seed length is meaningful even for large such as .
In this work, we introduce a new variant of almost -wise uniformity, called -wise probable uniformity, which strengthens Definition 1. There are two equivalent definitions, described below.
Definition 2 (-wise probable uniformity).
Let be a distribution over , let , and let . We say that is -wise -probably uniform if it satisfies either of the following two equivalent conditions.
-
1.
For every size- set , there exists a distribution over such that the distribution can be written as the mixture distribution . That is, the distribution is identical to the following distribution: With probability , sample a -bit string uniformly at random, and with probability , sample a string according to .
-
2.
For every -junta222A -junta is a function that depends on at most variables. , we have , where is a shorthand for .
(See Section 3 for a proof that the two conditions above are equivalent.) We say that is a -wise -probably uniform generator if is -wise -probably uniform.
We find the first condition above to be more conceptually appealing. It is clearly a strengthening of -almost -wise uniformity, and it inspires the terminology “-wise -probably uniform.” On the other hand, we find the second condition above to be easier to work with mathematically.
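The mixture condition can also be checked pointwise for small parameters: a size-k projection admits the required mixture decomposition exactly when it assigns every k-bit string probability at least (1 − γ)·2^(−k) (this pointwise form appears as the middle condition of Proposition 18). A minimal sketch under that equivalence, with our own function names:

```python
from itertools import combinations, product

def is_kwise_probably_uniform(dist, n, k, gamma):
    """Check the mixture condition of Definition 2 for a distribution
    given as a dict from n-bit tuples to probabilities: each size-k
    projection must dominate (1 - gamma) times the uniform distribution
    pointwise, which is equivalent to the mixture decomposition."""
    floor = (1.0 - gamma) * 2.0 ** -k
    for S in combinations(range(n), k):
        proj = {}
        for x, px in dist.items():
            y = tuple(x[i] for i in S)
            proj[y] = proj.get(y, 0.0) + px
        for y in product((0, 1), repeat=k):
            if proj.get(y, 0.0) < floor - 1e-12:
                return False
    return True
```

For instance, the distribution that is uniform over {00, 11} is 1-wise 0-probably uniform, but for k = 2 it only satisfies the condition with γ = 1, since the string 01 has probability zero.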
The concept of -wise probable uniformity is motivated primarily by an application to fooling decision trees, which we will discuss momentarily, but we also consider it to be an interesting concept in its own right. Using a standard nonconstructive argument (see the full version of this paper [28, §4.4]), one can show that there exists a non-explicit -wise -probably uniform generator with seed length (throughout this paper, denotes the base-two logarithm)
(1)
The challenge is to construct an explicit generator.
Classic results regarding small-bias generators [39, 6] imply that there is an explicit -wise -probably uniform generator with seed length . (If is -wise -biased, then is -wise -probably uniform (see Lemmas 17 and 18). Alon, Goldreich, Håstad, and Peralta construct an explicit -wise -biased generator with seed length [6]. Choose .) However, this seed length is unsatisfactory, because it is trivial when . Meanwhile, Bshouty used a different approach (the method of conditional probabilities with pessimistic estimators) to construct a generator such that
for every Boolean -junta [15], which is even stronger than Definition 2. Furthermore, his generator’s seed length matches Equation 1. However, his generator’s time complexity is more than [15]. His generator can therefore be considered “explicit” only when , whereas we are primarily interested in the case .
In this work, we present an explicit -wise -probably uniform generator with seed length , where is an arbitrarily small positive constant and the constant hiding under the big- depends on .
Theorem 3 (Explicit -wise probably uniform generator).
For every and , there exists an explicit -wise -probably uniform generator with seed length
The simpler seed length bound follows from Theorem 3 by the weighted AM-GM inequality.
1.2 Fooling decision trees
Instead of modeling the observer as a -junta, we can consider the more powerful model of depth- decision trees. A decision tree makes queries to the input and then produces a Boolean output value . The crucial feature of the decision tree model is that the tree can adaptively decide which variable to query next, based on the results of previous queries. (See Definition 11 for a precise definition.) Consequently, the output of a depth- decision tree might depend on all variables even if . The problem of sampling bits that “appear random” to depth- decision trees can be formalized using the concept of a pseudorandom generator (PRG).
Definition 4 (PRGs).
Let be a distribution over , let , and let . We say that fools with error if
We say that is a pseudorandom generator (PRG) that fools with error if fools with error . The parameter is called the seed length of the PRG. If is a class of functions , we say that (respectively ) fools with error if (respectively ) fools every with error .
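When the seed length and input length are tiny, the error in Definition 4 can be computed exactly by enumeration. A minimal sketch (our own function names; inputs and outputs are bit tuples):

```python
from itertools import product

def fooling_error(gen, seed_len, n, f):
    """Exact error with which the generator `gen` (a map from seed
    tuples to n-bit tuples) fools the test function f, computed by
    enumerating all seeds and all n-bit inputs."""
    pseudo = sum(f(gen(s)) for s in product((0, 1), repeat=seed_len))
    pseudo /= 2.0 ** seed_len
    true = sum(f(x) for x in product((0, 1), repeat=n)) / 2.0 ** n
    return abs(pseudo - true)
```

For example, the toy generator that outputs its 2-bit seed twice fools any single-variable test with error 0 but fails the parity of all four bits with error 1/2, since its output always has even parity.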
Almost -wise uniformity is the special case of Definition 4 in which we take to be the class of all Boolean -juntas. The aforementioned concept of small-bias distributions is another special case. By definition, a distribution is -wise -biased if it fools all functions of the form , where and , with error [39].
To fool decision trees, one could try using a generic small-bias generator. This approach works extremely well in the nonadaptive setting, as mentioned previously. In the adaptive setting, the approach still works fairly well, but it turns out that the parameters are worse. Specifically, Kushilevitz and Mansour’s analysis [34] implies that if is -wise -biased, then fools depth- size- decision trees with error . Every depth- decision tree has size at most , so we can choose . By combining this reduction with one of Alon, Goldreich, Håstad, and Peralta’s -wise -biased generators [6], one can construct an explicit PRG that fools depth- decision trees with error and seed length . This seed length is sufficient for many purposes, but we emphasize that it gives us nothing nontrivial for trees of depth .
In this paper, we show how to improve the leading constant from to for any constant , as a consequence of our new -wise -probably uniform generator. More generally, we prove the following.
Theorem 5 (Fooling near-maximal decision trees).
Let and . There exists an explicit PRG that fools -variate decision trees of size with error and seed length
1.3 A hitting set for systems of equations over
We also study a certain linear-algebraic variant of -wise uniformity. We prove the following.
Theorem 6 (Hitting set for systems of equations over ).
For every , there exists such that:
-
1.
For every and every , there exists such that .
-
2.
Given the parameters and , the set can be enumerated in time (and hence ), where .
We should compare Theorem 6 to what one can get by using a small-bias distribution. One can show that if is -wise -biased, then [34, 9]. If , then by the rank-nullity theorem. Therefore, if we choose , the set satisfies Item 1 of Theorem 6. Plugging in one of Alon, Goldreich, Håstad, and Peralta’s -biased generators [6] would give us . Essentially, Theorem 6 improves the coefficient of in the exponent from to , although our dependence on is worse.
Andreev, Baskakov, Clementi, and Rolim previously claimed to prove a similar theorem, with a bound of [9]. This would be incomparable to Theorem 6: better when and worse when . However, there seems to be a mistake in their analysis. (They partition the variables into blocks, , and they say that the condition can be written as a conjunction of conditions [9, Appendix B, preprint version]. But this is not true in general.)
1.4 Applications: Pseudorandomness for linear-size Boolean circuits
Our results are motivated by applications in the area of circuit complexity. We consider circuits over the “” and “” bases. A -circuit is a circuit in which each gate computes an arbitrary function . A -circuit is the same, except that gates are not permitted to compute the XOR function or its complement. Chen and Kabanets used “gate elimination” methods to establish, among other results, close connections between linear-size circuits and near-maximal decision trees [19]:
-
Every -circuit of size can be simulated by a decision tree of size [19].
-
Every -circuit of size can be simulated by a parity decision tree of size [19]. (A “parity decision tree” is defined like an ordinary decision tree, except that in each step, the tree can query to learn the parity of any subset of the variables, instead of querying just a single variable.)
They posed the problem of designing PRGs that fool general Boolean circuits [19]. By combining their simulations with our constructions, we are able to solve their problem, at least in part. First of all, we get a PRG that fools -circuits of size :
Corollary 7 (Fooling circuits over the basis).
For every and , there exists an explicit PRG that fools -variate -circuits of size with error and seed length .
Proof.
By Chen and Kabanets’ work [19], we know every -circuit of size can be simulated by a decision tree of size for some constant . By Theorem 5, we can fool such a tree with error and seed length
This is provided we choose to be a sufficiently small constant based on .
Second, we consider -circuits. We have not managed to construct a genuine PRG that fools -circuits, but we can at least use Theorem 6 to construct a hitting set for -circuits. A hitting set is a relaxation of a PRG, defined as follows.
Definition 8.
Let , let be a class of functions , and let . We say that is an -hitting set for if, for every such that , there exists such that .
Corollary 9 (A hitting set for circuits over the basis).
For every and , there exists a value and a set such that:
-
1.
is an -hitting set for -circuits of size .
-
2.
Given the parameters and , the set can be enumerated in time .
(The proof of Corollary 9 is in Section 6.)
1.4.1 Discussion
In general, the main motivation behind PRGs is that many algorithms and protocols rely on a large number of random bits, but producing truly random bits can sometimes be difficult or expensive. We think of randomness as a computational resource, similar to time or space. We try to use as little “true randomness” as possible to sample bits that are “random enough” to run randomized algorithms and protocols without distorting their behavior. With this motivation in mind, we believe that the problem of fooling -circuits is extremely natural.
The PRG of Corollary 7 is the first of its kind. (To be fair, we should compare Corollary 7 to a different and rather trivial approach that one could use to construct PRGs that fool circuits. In general, if is average-case hard for circuits of size , then the generator maps bits to bits and fools circuits of size . Similarly, the generator maps bits to bits and fools circuits of size , where . One can similarly try , etc. One can instantiate this approach with known average-case hardness results for circuits over the basis or the full binary basis [19, 23]. However, the PRGs that can be constructed using this approach have seed length .) The seed length is what makes Corollary 7 interesting. If is constant, then our PRG has linear stretch. Note that the challenge of constructing PRGs that fool Boolean circuits is strictly harder than the notorious challenge of proving circuit lower bounds. In more detail, suppose that one could construct a -time computable PRG that fools -circuits of size with error , where and are constants. Let , and define by truncating the output of . The indicator function for the image of would be an example of a function in that cannot be computed by -circuits of size . Currently, the best known lower bound on the size of -circuits computing some function in is [30].
Hitting sets are commonly used to solve the so-called “GAP-SAT” problem for , i.e., the problem of distinguishing the case from the case , given . Indeed, if is an -hitting set for , then we can solve GAP-SAT for by computing . In this context, we should compare Corollary 9 to prior circuit analysis algorithms. Savinov designed a SAT algorithm for -circuits of size with time complexity [46, 35], improving prior work by Nurk [41]. Golovnev, Kulikov, Smal, and Tamaki designed a #SAT algorithm for -circuits of size with time complexity [23], improving a result by Chen and Kabanets [19]. These prior algorithms solve problems that are harder than GAP-SAT, and furthermore they can handle circuits that are larger than what Corollary 9 can handle. However, Corollary 9 is superior to these prior results in one respect, namely, we can solve GAP-SAT even if we only have query access to the circuit in question. Note that the “black box” nature of hitting sets is crucial in some applications. For example, Cheng and Hoza showed that optimal explicit hitting sets for space-bounded computation would imply , whereas it remains an open problem to prove if we merely assume the existence of an optimal GAP-SAT algorithm for space-bounded computation [20, 43].
1.5 Overview of our new constructions
1.5.1 Our -wise probably uniform generator (Theorem 3)
The starting point of our construction is the well-known sampling properties of pairwise uniform hash functions. Let be any nonzero -junta, or more generally any function such that . If we sample a hash function from a pairwise uniform family, then with high probability over the choice of , we have
(This follows from Chebyshev’s inequality.)
We can think of as a PRG with an excellent seed length. The only trouble is that sampling itself is expensive. In general, sampling a hash function from a pairwise uniform family costs truly random bits, so in our case, the cost is truly random bits, which is much more than we can afford.
We can slightly decrease the cost of sampling by composing with a -almost -wise uniform generator, where , with seed length . Such a generator fools with error , which is negligible. Now the output length of is decreased from down to , hence the cost of sampling is “only” . However, this cost is still more than we can afford.
To explain how we bring the cost down to , for simplicity’s sake, let us assume that and let us neglect terms. We can assume without loss of generality that is simply a conjunction of literals, because every -junta can be written as a sum of such functions. Our approach is to pseudorandomly partition the coordinates into buckets: . In expectation, each bucket contains of the relevant variables. With high probability, each bucket has at most of the variables, where .
We can write , where only depends on variables in , so is a -junta. We sample a hash function such that with high probability over the choice of , we have
For each bucket independently, we sample at random and put in . Crucially, we reuse the same hash function for all of the buckets, which is justified by a simple union bound. The cost of sampling is truly random bits, and the cost of sampling the values is
A more careful calculation, also taking into account the cost of sampling the partition , leads to the seed length bound that appears in Theorem 3.
Observe that in this construction, there are some “bad events” that occur with probability roughly , namely, we might get a “bad” partition of the variables into buckets or we might get a “bad” hash function . Let be the union of these bad events. To analyze the impact of these bad events, let be the output distribution of our generator and let be an arbitrary Boolean -junta. Then
The quantity marked is certainly nonnegative, which allows us to prove . On the other hand, note that the quantity marked might be much larger than , and hence we are not able to prove an upper bound of the form . Thankfully, such an upper bound is not necessary for our applications.
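A hedged reconstruction of the decomposition described above, with $B$ the union of bad events, $X$ the generator's output, $f$ a Boolean $k$-junta, and $\mu = \mathbb{E}[f]$:

```latex
\mathbb{E}[f(X)]
  \;=\; \Pr[\neg B]\cdot\mathbb{E}\bigl[f(X)\mid \neg B\bigr]
  \;+\; \Pr[B]\cdot\mathbb{E}\bigl[f(X)\mid B\bigr].
```

Assuming this matches the intended decomposition: the bad-event term is nonnegative, so discarding it and lower-bounding the good-event term yields a one-sided bound of the form $\mathbb{E}[f(X)] \ge (1-\Pr[B])(1-\gamma')\mu$; but the bad-event term can be as large as $\Pr[B]$, which may dwarf $\gamma\mu$ when $\mu$ is tiny, so no matching upper bound follows.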
1.5.2 Our hitting set for systems of equations over (Theorem 6)
The first step of the proof of Theorem 6 is to apply a rank condenser due to Forbes and Guruswami [22]. This allows us to assume without loss of generality that . The next step is to partition the variables into equal-sized blocks, each containing variables, where . This induces a partition of the columns of : . Let be the contribution of to the rank of , so . A lemma by Andreev, Clementi, and Rolim says that if is a hitting set for systems of equations in variables, then there is some such that [10]. We construct for every by a simple brute-force algorithm, which we can afford because the number of variables is small, and then we output the union of over all possible partitions .
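The brute-force step — finding a solution to a feasible linear system over GF(2) once the number of variables is small — can be sketched as follows (function names are ours; the paper's actual construction additionally involves the rank condenser and the partition into blocks):

```python
from itertools import product

def solves(A, b, x):
    """Check A x = b over GF(2); A is a list of rows (tuples of bits)."""
    return all(sum(a * xi for a, xi in zip(row, x)) % 2 == bi
               for row, bi in zip(A, b))

def brute_force_solution(A, b, n):
    """Return some x in {0,1}^n with A x = b, or None if the system is
    infeasible. Takes 2^n * |A| * n time, which is affordable only when
    the number of variables n is small."""
    for x in product((0, 1), repeat=n):
        if solves(A, b, x):
            return x
    return None
```

For example, the system x1 + x2 = 1, x2 + x3 = 0 over GF(2) is feasible, while the pair of contradictory equations x1 = 0, x1 = 1 is not.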
1.6 Limitations of -wise -biased generators
A great deal of effort has been spent trying to optimize the constant factors in the seed lengths of small-bias generators [39, 5, 6, 12, 15, 51, 13]. Researchers have also developed many sophisticated techniques for proving that small-bias generators fool various models of computation; see Hatami and Hoza’s survey for a few examples [27]. The reader might reasonably wonder whether one could have proven our results by simply improving known constructions or analyses of -wise -biased distributions. We prove that the answer is no. In more detail, in the full version of this paper [28, §6], we present examples showing that if the support of every -wise -biased distribution is a -hitting set for -circuits of size , then and . Then, we observe that Karloff and Mansour’s work [31] can be extended to prove the following lower bound on the seed length of -wise -biased generators in the regime .
Theorem 10 (Seed length lower bound for -wise -biased generators).
Let be a -wise -biased generator, where for some . Then
Consequently, if one tries using a generic -wise -biased generator to hit -circuits of size , then the seed length will inevitably be at least . Thus, the concept of -wise -biased distributions is inherently too weak to prove Corollaries 7 and 9. In turn, this implies that the concept of -wise -bias is also too weak to prove our main results (Theorems 3, 5, and 6), of which Corollaries 7 and 9 are applications. (Our results are actually quantitatively stronger in various respects; see the full version of this paper [28] for details.)
For context, a sequence of prior works [44, 21, 4, 6, 2, 3, 15] has shown that every -wise -biased generator has seed length at least
(2)
Equation 2 and Theorem 10 are incomparable in general, but our new Theorem 10 is superior in the parameter regime in which we are interested. In particular, if and for a constant , then the prior bound Equation 2 is , whereas our new Theorem 10 gives a bound of .
1.7 Related work
1.7.1 Approximate forms of -wise uniformity
Prior researchers have studied several different ways of quantifying what it means for a distribution over to be “approximately” -wise uniform.
-
We could require that the total variation distance between and is at most for every size- set . This is the definition of an -almost -wise uniform distribution (Definition 1). See, for example, work by Naor and Naor [39] and work by Alon, Goldreich, Håstad, and Peralta [6].
Despite the attention paid to all of the above variations, we seem to be the first to study the concept of -wise probable uniformity.
1.7.2 Huber’s contamination model
Our notion of “probable uniformity” is similar to Huber’s contamination model in the theory of robust statistics [29]. A key difference is that in Huber’s model, contamination is applied to an unknown distribution, whereas in a -wise probably uniform distribution, every coordinates are distributed according to a contaminated version of the uniform distribution.
1.7.3 Universal sets
A set is called -universal if, for every size- set and every , there exists such that . The concept of -universal sets has been studied in many prior works going back more than half a century [32, 18, 52, 1, 48, 11, 5, 39, 40, 14]. The best explicit construction, due to Naor, Schulman, and Srinivasan [40], has cardinality . Our constructions were inspired by Naor, Schulman, and Srinivasan’s universal set construction [40].
The notion of -wise probable uniformity can be considered a strengthening of -universality, because if is -wise probably uniform, then the support of is -universal. Consequently, Theorem 3 implies the existence of an explicit -universal set with cardinality , but this is inferior to Naor, Schulman, and Srinivasan’s construction [40]. (A -universal set is typically considered “explicit” if the entire set can be computed in time. Our set has stronger explicitness guarantees, which might possibly be of value, but note that Naor, Schulman, and Srinivasan already constructed a -universal set of cardinality with similar explicitness guarantees [40].) Our -wise uniform generator also has similarities with a recent construction of a “biased” variant of universal sets by Harel, Hoza, Vardi, Evron, Srebro, and Soudry [26].
1.7.4 PRGs based on pseudorandom partitions of the variables
The trick of pseudorandomly partitioning the variables into buckets is not new; similar tricks have been used in many prior PRG constructions. For a few examples that are especially similar to our work, see work by Meka and Zuckerman [38], work by Lovett, Reingold, Trevisan, and Vadhan [36], and work by Gopalan, Kane, and Meka [25].
1.7.5 Correlation bounds for general circuit models
In general, PRGs are intimately related to correlation bounds, aka average-case hardness. Loosely speaking, correlation bounds are a prerequisite to designing PRGs. See, e.g., Hatami and Hoza’s survey [27, Chapter 4] for further discussion. Chen and Kabanets proved the first correlation bounds for general, unbounded-depth circuit models [19], and our PRG for -circuits uses their work, as mentioned previously. Golovnev, Kulikov, Smal, and Tamaki subsequently proved better correlation bounds [23].
2 Preliminaries
2.1 Decision tree models
Below we record the standard definitions of a decision tree and parity decision trees.
Definition 11 (Decision trees).
An -variate decision tree is a rooted tree in which each internal node is labeled with a variable from among ; each internal node has two outgoing edges labeled and ; and each leaf is labeled either or . The tree computes a Boolean function defined inductively as follows. If consists of a single leaf labeled , then we define . Otherwise, let be the variable labeling the root node. Given an input , we start at the root node and traverse the outgoing edge labeled with the value . This leads to a vertex , which is the root of a subtree . Then we set . The depth of the tree is the length of the longest path from the root to a leaf. The size of the tree is the total number of leaves.
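As an illustrative sketch (the encoding is ours, not from the paper), a decision tree can be represented as nested tuples and evaluated exactly as in Definition 11, with size counted as the number of leaves and depth as the longest root-to-leaf path:

```python
# A decision tree is either a leaf ('leaf', bit) or an internal node
# ('node', i, t0, t1): query variable x_i, then recurse into t0 if the
# answer is 0 and into t1 if it is 1. Adaptivity is built in: the two
# subtrees may query different variables next.

def evaluate(tree, x):
    while tree[0] == 'node':
        _, i, t0, t1 = tree
        tree = t1 if x[i] else t0
    return tree[1]

def size(tree):
    """Number of leaves."""
    if tree[0] == 'leaf':
        return 1
    return size(tree[2]) + size(tree[3])

def depth(tree):
    """Length of the longest root-to-leaf path."""
    if tree[0] == 'leaf':
        return 0
    return 1 + max(depth(tree[2]), depth(tree[3]))
```

For example, the AND of two variables is computed by a tree of depth 2 and size 3: query x0, and only on the 1-branch query x1.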
Definition 12 (Parity decision trees [34]).
A parity decision tree on variables is a rooted tree defined exactly as in Definition 11, except that each internal node is labeled by a non-empty subset . The internal node queries and has two outgoing edges labeled and corresponding to the value of that parity. Leaves are labeled by output bits in , and evaluation proceeds exactly as for ordinary decision trees. The depth of a parity decision tree is the length of the longest root-to-leaf path, and its size is the number of leaves. Equivalently, a parity decision tree computes a Boolean function
where is an ordinary decision tree on inputs and .
2.2 Pairwise uniform hashing
We rely on the standard notion of pairwise uniform hashing, aka “strongly universal hashing,” introduced in Carter and Wegman’s seminal papers [16, 53].
Definition 13 (Pairwise uniform families of hash functions).
A family of hash functions is called pairwise uniform if, for every two distinct , if we sample , then is distributed uniformly at random over .
Theorem 14 (Explicit pairwise uniform families of hash functions).
For every , there exists an explicit pairwise uniform family of hash functions such that can be sampled using a seed of length . (That is, given a seed and an input , the value can be computed in time, where is the hash function corresponding to the seed .)
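One concrete pairwise uniform family (an affine GF(2) map, which is standard but not necessarily the construction the theorem uses) is h_{A,b}(x) = Ax ⊕ b for a uniformly random matrix A and vector b. The sketch below verifies Definition 13 by full enumeration for tiny parameters; all names are ours:

```python
from itertools import product

def affine_hash(A, b, x):
    """h_{A,b}(x) = A x XOR b over GF(2); A is m rows of n bits each."""
    return tuple((sum(a * xi for a, xi in zip(row, x)) + bi) % 2
                 for row, bi in zip(A, b))

def is_pairwise_uniform(n, m):
    """Enumerate all seeds (A, b) and check that, for every pair of
    distinct inputs x1 != x2, the pair (h(x1), h(x2)) is uniformly
    distributed over ({0,1}^m)^2. Feasible only for tiny n, m."""
    rows = list(product((0, 1), repeat=n))
    seeds = [(A, b)
             for A in product(rows, repeat=m)
             for b in product((0, 1), repeat=m)]
    target = len(seeds) / 4.0 ** m
    for x1 in product((0, 1), repeat=n):
        for x2 in product((0, 1), repeat=n):
            if x1 == x2:
                continue
            counts = {}
            for A, b in seeds:
                pair = (affine_hash(A, b, x1), affine_hash(A, b, x2))
                counts[pair] = counts.get(pair, 0) + 1
            if len(counts) != 4 ** m or any(c != target for c in counts.values()):
                return False
    return True
```

The reason this family is pairwise uniform: h(x1) is uniform because of b, and h(x1) ⊕ h(x2) = A(x1 ⊕ x2) is uniform and independent of b because x1 ⊕ x2 is a nonzero vector.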
For example, if we define , where is convolution mod and is bitwise XOR, then is a pairwise uniform family [37]. The reason pairwise uniform hashing is useful for us is given by the following relative-error sampling lemma.
Lemma 15 (Pairwise uniformity sampling lemma).
Let be a pairwise uniform family of hash functions . Let and let . Then for every ,
Proof.
For each , define , so is a random variable based on the choice of . Then and . Furthermore, the variables are pairwise independent. Therefore, if we let , then and . Therefore, by Chebyshev’s inequality, we have
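In symbols, the Chebyshev step can be reconstructed as follows (a hedged reconstruction: $N$ denotes the size of the hash function's domain, $\mu = \mathbb{E}[f]$, $Z_x = f(h(x))$, and $\varepsilon$ is the relative error):

```latex
Z \;=\; \frac{1}{N}\sum_{x} Z_x, \qquad
\mathbb{E}[Z] = \mu, \qquad
\operatorname{Var}[Z] \;=\; \frac{1}{N^{2}}\sum_{x}\operatorname{Var}[Z_x]
  \;\le\; \frac{\mu}{N},
\qquad\text{so}\qquad
\Pr_{h}\bigl[\,|Z-\mu| \ge \varepsilon\mu\,\bigr]
  \;\le\; \frac{\operatorname{Var}[Z]}{\varepsilon^{2}\mu^{2}}
  \;\le\; \frac{1}{\varepsilon^{2}\mu N}.
```

Here each $Z_x$ is Bernoulli with mean $\mu$ (since $h(x)$ is uniform), so $\operatorname{Var}[Z_x] = \mu(1-\mu) \le \mu$, and pairwise independence lets the variances add.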
2.3 Small-bias distributions
We also rely on asymptotically optimal constructions of -wise -biased generators, which were defined in Section 1.2.
Theorem 16 (Explicit -wise -biased generators [39]).
For every and every , there exists an explicit -wise -biased generator with seed length .
The reason -wise -biased generators are useful for us is that they satisfy the following two properties.
3 Characterizing -wise probable uniformity
The following proposition shows the equivalence of three ways of defining -wise probably uniform distributions.
Proposition 18 (Equivalence of three definitions of -wise probable uniformity).
Let be a distribution over , let , and let . Then the following are equivalent.
-
1.
For every -junta , we have .
-
2.
For every size- set and every , we have .
-
3.
For every size- set , there exists a distribution over such that one can sample from by sampling from with probability and sampling from with probability .
Proof.
-
() Consider the function .
-
() If , then for every , we have , which implies that is exactly uniform over . If , define by the formula
Then is a probability mass function: it is nonnegative because , and it sums to because is a probability distribution. Let be the corresponding probability distribution.
-
() If is a -junta, then there is some set of size and some function such that for all . Therefore,
By definition, if satisfies any of the three equivalent conditions in Proposition 18, then is -wise -probably uniform. The third condition in Proposition 18 motivates the name “-wise probably uniform,” but we find it more mathematically convenient to work with the first two conditions.
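The residual distribution constructed in the proof — subtract the (1 − γ)-weighted uniform part and rescale by γ — can be computed and checked directly for a distribution over all of {0,1}^k. A minimal sketch with our own names:

```python
from itertools import product

def residual_distribution(D, k, gamma):
    """Given a distribution D over {0,1}^k (dict from k-bit tuples to
    probabilities) that dominates (1 - gamma) times uniform pointwise,
    return Q such that D = (1 - gamma) * Uniform + gamma * Q."""
    u = 2.0 ** -k
    Q = {}
    for y in product((0, 1), repeat=k):
        q = (D.get(y, 0.0) - (1.0 - gamma) * u) / gamma
        assert q >= -1e-12, "D does not dominate (1 - gamma) * Uniform"
        Q[y] = q
    return Q
```

For example, D = {0: 3/4, 1: 1/4} over one bit decomposes with γ = 1/2 into half the uniform distribution plus half a point mass on 0.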
4 Constructing -wise probably uniform generators
In this section, we present our new -wise probably uniform generator, thereby proving Theorem 3. At the end of this section, for completeness’ sake, we record the standard nonconstructive proof of the existence of nonexplicit -wise probably uniform generators with excellent seed lengths.
4.1 A small family of generators, each with a good seed length
As a first step, we construct a family of generators , such that for any -junta , most generators satisfy . This construction is based on a combination of pairwise uniform hash functions and -wise -biased generators.
Lemma 19 (Family of generators).
For every and , there exists an explicit family of PRGs satisfying the following.
-
1.
A generator can be sampled using truly random bits.
-
2.
Each generator in has seed length .
-
3.
If is a -junta with expectation , then
Proof.
Let be a -wise -biased generator where and
Let be a pairwise uniform family of hash functions . For each hash function in , we define a generator . By Theorems 14 and 16, this family is explicit and can be sampled using truly random bits.
For the correctness proof, define by and let . The generator fools with error (see Lemma 17), so . Furthermore, unless , so . Therefore, by the pairwise uniformity sampling lemma (Lemma 15), we have
provided we choose a suitable value . Now fix an such that the bad event above does not occur, and let be the corresponding generator in , i.e., . Then fools with error
4.2 Pseudorandomly partitioning the coordinates into buckets
In this subsection, we explain how to pseudorandomly partition the coordinates into buckets, , such that no single bucket gets too many of the coordinates we care about. To be more precise, we construct a balanced partition generator, defined as follows.
Definition 20 (Balanced partition generator [38]).
A -balanced partition generator is a function such that for every set with , with probability at least over a uniform random choice of seed , for every bucket , we have .
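Definition 20 concerns the maximum load that any single bucket receives from a fixed set of coordinates. The sketch below computes bucket loads for a given partition, using a truly random partition (seeded, so deterministic) as a stand-in for the pseudorandom one; all names are ours:

```python
import random

def bucket_loads(partition, T, num_buckets):
    """partition maps a coordinate index to a bucket index; return the
    number of elements of T that land in each bucket."""
    loads = [0] * num_buckets
    for i in T:
        loads[partition(i)] += 1
    return loads

# Truly random partition as a stand-in for the pseudorandom generator.
rng = random.Random(0)
n, b = 64, 8
assignment = [rng.randrange(b) for _ in range(n)]
T = set(range(32))  # |T| = 32, so the expected load is 4 per bucket
loads = bucket_loads(lambda i: assignment[i], T, b)
```

A balanced partition generator guarantees (with high probability over the seed) that no load greatly exceeds the expectation |T|/b; the truly random stand-in achieves this by a Chernoff bound.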
Definition 20 is due to Meka and Zuckerman, who used the term “balanced hash family” [38, Definition 4.9]. We use the term “balanced partition generator” to avoid confusion with the hash functions that appear in the proof of Lemma 19. Our balanced partition generator will essentially consist of a -wise -biased generator for appropriate values and . The analysis will be based on the following bound on the moments of a sum of independent Bernoulli random variables [47]. (The exact statement of Theorem 21 does not appear in Schmidt, Siegel, and Srinivasan’s work [47], but it follows from the proof of item “(III)” in their “Theorem 4.”)
Theorem 21 (Moment bound for a sum of independent Bernoulli random variables [47]).
Let be independent -valued random variables. Let , let , and let . Then for every even positive integer , we have
Theorem 21 can be improved in some parameter regimes [49], but the simple bound in Theorem 21 suffices for our purposes. Using Theorem 21, we now present a tail bound for sums of random variables that satisfy a certain “near -wise independence” condition. Similar bounds were proven in several previous papers [36, 17, 50], and our proof is almost identical to their proofs.
Corollary 22 (Tail bound for sums of nearly -wise independent random variables).
Let be -valued random variables and let . Let and . Let be an even positive integer, let , and assume that for every set with , we have
Then for every , we have
Proof.
See the full version of this paper [28, §4.2].
Given Corollary 22, we are ready to construct our balanced partition generator.
Lemma 23 (Balanced partition generator).
Let and . Assume is a power of two and . There exists an explicit -balanced partition generator , where
with seed length
Proof.
Identify with . We let be a -wise -biased generator for appropriate values and . The seed length bound follows from Theorem 16. For the correctness proof, assume without loss of generality that . Sample using the generator. Fix any bucket . For each , let indicate whether . Then for any set with , the value can be expressed in terms of the underlying bits of as a conjunction of at most literals. Therefore, by Lemma 17, we have . Hence, by Corollary 22, for every , we have
We choose . Then we get
due to our choices of and . The union bound over buckets completes the proof.
For comparison, Lovett, Reingold, Trevisan, and Vadhan constructed an explicit -balanced partition generator for the special case , with and seed length [36]. For any , one can also use Gopalan, Kane, and Meka’s PRG for Fourier shapes [25] to construct a -balanced partition generator with the same value of as in Lemma 23 and with seed length .
4.3 The full -wise probably uniform generator
Proof of Theorem 3.
Let be the -balanced partition generator from Lemma 23 with parameters and , or to be more precise, is the largest power of two that is at most . Let be the family of generators from Lemma 19, using and using the value from . The final generator is defined as follows.
1. Sample a partition using .
2. Sample a generator .
3. Sample seeds independently and uniformly at random.
4. Output , where
for every .
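The four sampling steps above can be sketched in Python, with truly random bits standing in both for the balanced partition generator of Lemma 23 and for the inner generators of Lemma 19; the function names are ours and the sketch only illustrates how the pieces compose, not the actual derandomization.

```python
import random

def truly_random_inner(n, rng):
    """Stand-in for one generator of Lemma 19: here, truly random bits."""
    return [rng.randrange(2) for _ in range(n)]

def sample_output(n, b, inner_gen, rng):
    """Toy version of the composition: partition the n coordinates into b
    buckets, run an independent copy of the inner generator per bucket,
    and let each coordinate take its value from its bucket's copy."""
    # Step 1: sample a partition of the coordinates (stand-in for Lemma 23).
    bucket_of = [rng.randrange(b) for _ in range(n)]
    # Steps 2-3: sample independent seeds/outputs, one per bucket.
    bucket_output = [inner_gen(n, rng) for _ in range(b)]
    # Step 4: assemble the final output coordinate by coordinate.
    return [bucket_output[bucket_of[i]][i] for i in range(n)]

rng = random.Random(1)
print(sample_output(16, 4, truly_random_inner, rng))
```

In the real construction, each inner copy only needs to fool the literals that land in its own bucket, which is what makes the short seeds of Lemma 19 sufficient.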
To prove that this works, let be a conjunction of literals, say
where , , and for every . We will prove that , which is sufficient by Proposition 18.
For each bucket , let . The definition of a balanced partition generator ensures that except with probability over the choice of , we have for every . Let be this “good” event. Fix any choice of such that occurs.
For each , define by
so . By Lemma 19 and the union bound over the buckets, except with probability over the choice of , we have
for every . Let be this “good” event. Fix any choice of such that occurs.
For any such fixing of and , with respect to the choice of alone, we have
by Bernoulli’s inequality. Therefore, with respect to all the randomness, we have
by another application of Bernoulli’s inequality.
5 Implications of -wise probable uniformity
In this section, we will show that every -wise probably uniform distribution fools decision trees. In fact, we will show that such distributions fool a more general model, called the subcube partition model.
Definition 24 (The subcube partition model).
A subcube partition is a collection of terms and values . Each term is a conjunction of literals, and the sets must partition the domain . That is, for every , we have . The subcube partition computes the function defined by
The width of a term is the number of literals in the term. The width of the subcube partition is the maximum width of any term. The size of the subcube partition is the number of terms.
Every width- subcube partition has size at most , because . A decision tree of depth and size can be simulated by a subcube partition of width and size : for each leaf , we construct a term that indicates whether the tree reaches the leaf on a given input. The converse does not hold. In fact, there exist subcube partitions of width that cannot be simulated by decision trees of depth [45, 33, 24, 8]. We now explain why -wise probably uniform generators fool subcube partitions.
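The simulation of a decision tree by a subcube partition can be made concrete with a small Python sketch; the nested-tuple tree encoding is our own, chosen for brevity. Each root-to-leaf path becomes one term, so a tree of depth d and size s yields a partition of width at most d with s terms.

```python
from itertools import product

def tree_to_partition(tree, term=()):
    """Convert a decision tree into a subcube partition.
    A tree is a leaf value 0/1 or a triple (i, left, right): query x[i],
    recurse into left if x[i] == 0 and into right if x[i] == 1.
    Each root-to-leaf path yields one term, a tuple of (variable, value) literals."""
    if tree in (0, 1):
        return [(term, tree)]
    i, left, right = tree
    return (tree_to_partition(left, term + ((i, 0),)) +
            tree_to_partition(right, term + ((i, 1),)))

def matches(term, x):
    return all(x[i] == v for i, v in term)

# A depth-2 tree computing XOR of x[0] and x[1].
xor_tree = (0, (1, 0, 1), (1, 1, 0))
partition = tree_to_partition(xor_tree)

# Sanity check: the terms partition {0,1}^2 and compute XOR.
for x in product(range(2), repeat=2):
    hits = [v for t, v in partition if matches(t, x)]
    assert len(hits) == 1 and hits[0] == x[0] ^ x[1]
print(len(partition), "terms, max width", max(len(t) for t, _ in partition))
# prints: 4 terms, max width 2
```

The sanity-check loop verifies the defining property of a subcube partition: every input matches exactly one term.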
Lemma 25 (-wise probable uniformity fools subcube partitions).
Let be a distribution over that is -wise -probably uniform. Then:
- fools width- subcube partitions (hence also depth- decision trees) with error .
- fools size- subcube partitions (hence also size- decision trees) with error .
Proof.
Let be a function computed by a subcube partition with terms and values . Let be the set of terms of width at most . We will show that fools with error . To prove it, sample uniformly at random. Then
Now we bound the expectation from above. Let . Since can also be computed by a subcube partition with the same terms , we have
The lemma follows, because whenever .
By combining Theorem 3 (our -wise probably uniform generator) with Lemma 25, we now prove the following theorem, which generalizes Theorem 5.
Theorem 26 (Fooling near-maximal subcube partitions).
Let and . There exists an explicit PRG that fools -variate subcube partitions of size with error and seed length
(3)
Proof.
We use our -wise -probably uniform generator, where . By Lemma 25, the generator fools size- subcube partitions with error . By Theorem 3, the seed length is
which, after substituting the choice of , simplifies to Equation 3.
6 Hitting sets for systems of equations over and for -circuits
In this section, we present our hitting set for systems of equations over , thereby proving Theorem 6. Next, we show that such hitting sets can hit circuits over the basis, thus proving Corollary 9. In the full version of this paper [28], we also present a more explicit construction of hitting set generators for systems of equations over , where given the seed, we can output the corresponding string in time .
6.1 Rank condenser
First, we use a rank condenser, due to Forbes and Guruswami [22], to “condense” the number of variables from to .
Definition 27 (-rank condenser).
Let be a field and let . A collection of matrices is a -rank condenser if, for every matrix with , there exists such that .
We say that is explicit if, given an index , the -th matrix of can be constructed in time .
Remark 28.
Stronger “lossless” variants – which bound how many matrices in can cause rank loss or how much total rank loss can occur – have been studied (see, e.g., Forbes and Guruswami [22]). The simpler notion above suffices for our purposes.
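For intuition, the rank-condenser property of Definition 27 can be checked by brute force for toy parameters. The sketch below works over GF(2) and uses coordinate-selection matrices as a (non-pseudorandom) collection; it is not the Forbes–Guruswami construction, and all names and parameter choices are ours.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        cur = row
        while cur:
            lead = cur.bit_length() - 1
            if lead not in basis:
                basis[lead] = cur
                break
            cur ^= basis[lead]
    return len(basis)

def gf2_matmul(A, B):
    """Matrix product over GF(2); matrices are lists of 0/1 rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

def to_masks(M):
    return [int("".join(map(str, row)), 2) for row in M]

def condenses(collection, M):
    """Does some matrix E in the collection preserve rank(M) in the product E @ M?"""
    r = gf2_rank(to_masks(M))
    return any(gf2_rank(to_masks(gf2_matmul(E, M))) == r for E in collection)

# Toy collection: all 2x4 matrices that select a pair of coordinates.
n = 4
selectors = [[[1 if c == i else 0 for c in range(n)],
              [1 if c == j else 0 for c in range(n)]]
             for i, j in combinations(range(n), 2)]

M = [[1, 0], [0, 1], [1, 1], [0, 0]]  # a 4x2 matrix of rank 2
print(condenses(selectors, M))         # some pair of rows of M is independent
```

The point of Theorem 29 is that an explicit collection of comparable quality exists with far fewer matrices than this exhaustive one.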
The following theorem, due to Forbes and Guruswami, shows that we can construct such condensers explicitly over while keeping the output dimension only .
Theorem 29.
Let . There is an explicit -rank condenser with .
Proof.
This follows from Forbes and Guruswami’s work [22, Corollary 8.7, preprint version], by setting the parameters appropriately.
6.2 Partition the variables
For the sake of brevity, we define the following notation.
Definition 30.
Let . We say that hits codimension if, for every affine subspace of codimension , there exists in this affine subspace. Equivalently, for every and every , there exists such that .
Our goal is to construct an that hits codimension . We can split the variables into consecutive blocks of arbitrary size. Any such split induces a column partition , where and . Without loss of generality, we assume that has full rank. Write for the incremental rank contributed by block , i.e., , so . Andreev, Clementi, and Rolim [10] stated the result that, if hits codimension for every , then there is some in the Cartesian product such that .
However, they omitted the proof, so we supply one in this subsection. We begin by showing that, after fixing the first blocks, the feasible assignments to the -th block form an affine subspace of codimension . In the following, we focus on the case of partitioning into two blocks, which will turn out to be sufficient for analyzing the general case.
Lemma 31.
Let be a field. Let and . Let , and define . Then is an affine space with codimension
Proof.
Since , we know there exists such that . Let . We claim that . Indeed, if , then , so there is some such that . Hence
so . Conversely, if , then there is some such that , and consequently
showing that , i.e. .
Now we are going to show that . Let be a basis of . Extend this to a basis of , and set so . Because , the map is injective on . Hence . On the other hand, . Combining the two bounds completes the proof.
Corollary 32 ([10]).
Let be a matrix of rank , where each block and . For every , let . For each , let , and assume hits codimension . Then for every , there exists a vector in the Cartesian product
such that .
Proof.
We prove it by induction on . In the base case, when , the corollary follows immediately from the definition of hitting codimension . Now suppose . Define , and define
By Lemma 31, is an affine space with codimension . Therefore, there exists . By the definition of , . Therefore, by induction, there exists such that . Let . Then , and .
6.3 Brute-force construction
There is a simple brute-force method for constructing a set that hits codimension with the following time complexity.
Lemma 33 (Brute-force hitting set for systems of equations).
For every , there exists of size that hits codimension , which can be constructed in time .
The proof of Lemma 33 can be found in the full version of this paper [28]. On its own, the algorithm of Lemma 33 is too slow to prove Theorem 6. However, we will only apply the brute-force method after reducing the length of the binary strings we are searching over, so we can afford the exponential time cost. Note that the size of our construction matches that of the hitting set obtained by the standard probabilistic method. The method is similar to Naor, Schulman, and Srinivasan’s work [40].
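A runnable (and, as in Lemma 33, exponential-time) toy version of the brute-force idea appears below, using a greedy cover as the selection rule; this is our own illustrative variant rather than the construction in the full version, and the parameter values are tiny.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2); each row is an integer bitmask."""
    basis = {}
    for row in rows:
        cur = row
        while cur:
            lead = cur.bit_length() - 1
            if lead not in basis:
                basis[lead] = cur
                break
            cur ^= basis[lead]
    return len(basis)

def brute_force_hitting_set(n, c):
    """Greedily build H within {0,1}^n meeting every affine subspace of
    codimension c, by enumerating all feasible systems Ax = b with
    rank(A) = c. Exponential time: toy parameters only."""
    points = [tuple(p) for p in product(range(2), repeat=n)]
    subspaces = set()
    for rows in product(points, repeat=c):
        masks = [int("".join(map(str, r)), 2) for r in rows]
        if gf2_rank(masks) < c:
            continue  # not full rank, so not codimension c
        for b in product(range(2), repeat=c):
            sols = frozenset(
                x for x in points
                if all(sum(r[i] * x[i] for i in range(n)) % 2 == bi
                       for r, bi in zip(rows, b)))
            subspaces.add(sols)
    H, unmet = [], set(subspaces)
    while unmet:  # greedy cover: take the point in the most unmet subspaces
        best = max(points, key=lambda x: sum(x in s for s in unmet))
        H.append(best)
        unmet = {s for s in unmet if best not in s}
    return H

H = brute_force_hitting_set(3, 1)
print(len(H), "points hit every affine subspace of codimension 1 in {0,1}^3")
```

Since a full-rank system with rank(A) = c is always feasible, hitting every such solution set is exactly the condition of Definition 30.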
6.4 Our final hitting set for systems of equations
Proof of Theorem 6.
Without loss of generality, we assume that ; otherwise, we can simply use a small-bias distribution as described in the paragraph following the statement of Theorem 6. First we use Theorem 29 to construct a -rank condenser , where . Then we partition the variables into blocks of equal size, where (the exact value will be specified later). Without loss of generality, we assume that is an integer. For each , we use Lemma 33 to construct that hits codimension , as defined in Definition 30. Then we combine them by taking a Cartesian product. Thus, the overall construction is
We first prove the construction is efficient (Item 2) and then prove the construction is correct (Item 1).
(Item 2). By Lemma 33, we know the size of each is , and the total running time to construct these hitting sets over variables is .
For one partition of , namely , we have
Since each , the total number of partitions is at most . Thus,
Thus, the time used by our algorithm is at most
By choosing , we get the time used by our algorithm to be
(Item 1). Now consider any and any . Without loss of generality, we assume that . By Theorem 29, we know there is an such that , and we denote . Since and , we have , thus as well. By partitioning into blocks of size , we get
Let . Thus, by Corollary 32, we know that there exists an , such that , which means and .
6.5 Hitting set for -circuits
In this subsection, we will show that this hitting set can be used for circuits.
Corollary 34 (Restatement of Corollary 9).
For every and , there exists a value and a set such that:
1. is an -hitting set for -circuits of size .
2. Given the parameters and , the set can be enumerated in time .
Proof.
Assume without loss of generality that the queries on every root-to-leaf path in are linearly independent. First, we note that any parity decision tree can be written as , where each corresponds to a path from the root to an accepting leaf in . The ’s are disjoint, and each is a conjunction of parity functions; the number of parities in the conjunction equals the depth of the corresponding leaf. If is a size- parity decision tree with , then some accepting leaf is reached with probability greater than . The depth of that leaf must be less than , because otherwise the probability of reaching it would be smaller. Consequently, if hits codimension , then is an -hitting set for size- parity decision trees.
By Chen and Kabanets’ work [19], we know every -circuit of size can be simulated by a parity decision tree of size . Let . Then . By Lemma 33, a set that hits codimension can be enumerated in time
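To illustrate the first step of the proof, the Python sketch below (with our own nested-tuple encoding of parity decision trees, used only for this example) extracts the linear system over GF(2) labelling each accepting leaf and checks that the accepting events are disjoint on a toy tree.

```python
def accepting_systems(tree, eqs=()):
    """List the linear system (over GF(2)) labelling each accepting leaf
    of a parity decision tree. A tree is a leaf 0/1 or (mask, left, right),
    branching on the parity of x restricted to mask."""
    if tree in (0, 1):
        return [eqs] if tree == 1 else []
    mask, left, right = tree
    return (accepting_systems(left, eqs + ((mask, 0),)) +
            accepting_systems(right, eqs + ((mask, 1),)))

def parity(mask, x):
    return bin(mask & x).count("1") % 2

def eval_tree(tree, x):
    if tree in (0, 1):
        return tree
    mask, left, right = tree
    return eval_tree(right if parity(mask, x) else left, x)

# Tree on 3 variables: first query x0 XOR x1 (mask 0b011), then x2 (mask 0b100).
tree = (0b011, (0b100, 0, 1), 1)
systems = accepting_systems(tree)

# Disjointness: each input satisfies the system of at most one accepting leaf,
# and it does so exactly when the tree accepts.
for x in range(8):
    satisfied = [s for s in systems
                 if all(parity(m, x) == b for m, b in s)]
    assert len(satisfied) == (1 if eval_tree(tree, x) else 0)
print(len(systems), "accepting leaves")
# prints: 2 accepting leaves
```

Each system here has one equation per query on the path, matching the observation that the number of parities equals the leaf's depth.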
Remark 35.
In this proof, we used the fact that hitting parity decision trees is equivalent to hitting systems of equations. We further note that hitting DNFs of parities is also equivalent to these two problems.
References
- [1] Noga Alon. Explicit construction of exponential sized families of -independent sets. Discrete Math., 58(2):191–193, 1986. doi:10.1016/0012-365X(86)90161-5.
- [2] Noga Alon. Perturbed identity matrices have high rank: proof and applications. Combin. Probab. Comput., 18(1-2):3–15, 2009. doi:10.1017/S0963548307008917.
- [3] Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing -wise and almost -wise independence. In Proceedings of the 39th Annual Symposium on Theory of Computing (STOC), pages 496–505, 2007. doi:10.1145/1250790.1250863.
- [4] Noga Alon, László Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567–583, 1986. doi:10.1016/0196-6774(86)90019-2.
- [5] Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509–516, 1992. doi:10.1109/18.119713.
- [6] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost -wise independent random variables. Random Structures Algorithms, 3(3):289–304, 1992. doi:10.1002/rsa.3240030308.
- [7] Noga Alon, Oded Goldreich, and Yishay Mansour. Almost -wise independence versus -wise independence. Inform. Process. Lett., 88(3):107–110, 2003. doi:10.1016/S0020-0190(03)00359-4.
- [8] Andris Ambainis, Martins Kokainis, and Robin Kothari. Nearly Optimal Separations Between Communication (or Query) Complexity and Partitions. In Proceedings of the 31st Conference on Computational Complexity (CCC), pages 4:1–4:14, 2016. doi:10.4230/LIPIcs.CCC.2016.4.
- [9] Alexander E. Andreev, Juri L. Baskakov, Andrea E. F. Clementi, and José D. P. Rolim. Small pseudo-random sets yield hard functions: New tight explicit lower bounds for branching programs. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), pages 179–189, 1999. preprint: https://eccc.weizmann.ac.il/report/1997/053/. doi:10.1007/3-540-48523-6_15.
- [10] Alexander E. Andreev, Andrea E. F. Clementi, and José D. P. Rolim. Efficient constructions of hitting sets for systems of linear functions. In Proceedings of the 14th Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 387–398, 1997. doi:10.1007/BFb0023475.
- [11] Bernd Becker and Hans-Ulrich Simon. How robust is the -cube? Inform. and Comput., 77(2):162–178, 1988. doi:10.1016/0890-5401(88)90056-9.
- [12] Avraham Ben-Aroya and Amnon Ta-Shma. Constructing small-bias sets from algebraic-geometric codes. Theory Comput., 9:253–272, 2013. doi:10.4086/toc.2013.v009a005.
- [13] Guy Blanc and Dean Doron. New Near-Linear Time Decodable Codes Closer to the GV Bound. In Proceedings of the 37th Annual Computational Complexity Conference (CCC), pages 10:1–10:40, 2022. doi:10.4230/LIPIcs.CCC.2022.10.
- [14] Nader H. Bshouty. Testers and their applications [extended abstract]. In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science (ITCS), pages 327–351. ACM, New York, 2014. doi:10.1145/2554797.2554828.
- [15] Nader H. Bshouty. Derandomizing chernoff bound with union bound with an application to -wise independent sets, 2016. arXiv:1608.01568.
- [16] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. J. Comput. System Sci., 18(2):143–154, 1979. doi:10.1016/0022-0000(79)90044-8.
- [17] L. Elisa Celis, Omer Reingold, Gil Segev, and Udi Wieder. Balls and bins: smaller hash families and faster evaluation. SIAM J. Comput., 42(3):1030–1050, 2013. doi:10.1137/120871626.
- [18] Ashok K. Chandra, Lawrence T. Kou, George Markowsky, and Shmuel Zaks. On sets of Boolean -vectors with all -projections surjective. Acta Inform., 20(1):103–111, 1983. doi:10.1007/BF00264296.
- [19] Ruiwen Chen and Valentine Kabanets. Correlation bounds and #SAT algorithms for small linear-size circuits. Theoret. Comput. Sci., 654:2–10, 2016. doi:10.1016/j.tcs.2016.05.005.
- [20] Kuan Cheng and William M. Hoza. Hitting sets give two-sided derandomization of small space. Theory Comput., 18:Paper No. 21, 32, 2022. doi:10.4086/toc.2022.v018a021.
- [21] Benny Chor, Oded Goldreich, Johan Håstad, Joel Friedman, Steven Rudich, and Roman Smolensky. The bit extraction problem or t-resilient functions. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science (FOCS), pages 396–407, 1985. doi:10.1109/SFCS.1985.55.
- [22] Michael A. Forbes and Venkatesan Guruswami. Dimension Expanders via Rank Condensers. In Naveen Garg, Klaus Jansen, Anup Rao, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015), volume 40 of Leibniz International Proceedings in Informatics (LIPIcs), pages 800–814, Dagstuhl, Germany, 2015. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. preprint: https://arxiv.org/abs/1411.7455. doi:10.4230/LIPIcs.APPROX-RANDOM.2015.800.
- [23] Alexander Golovnev, Alexander S. Kulikov, Alexander V. Smal, and Suguru Tamaki. Gate elimination: circuit size lower bounds and #SAT upper bounds. Theoret. Comput. Sci., 719:46–63, 2018. doi:10.1016/j.tcs.2017.11.008.
- [24] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. SIAM J. Comput., 47(6):2435–2450, 2018. doi:10.1137/16M1059369.
- [25] Parikshit Gopalan, Daniel M. Kane, and Raghu Meka. Pseudorandomness via the discrete Fourier transform. SIAM J. Comput., 47(6):2451–2487, 2018. doi:10.1137/16M1062132.
- [26] Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, and Daniel Soudry. Provable tempered overfitting of minimal nets and typical nets, 2024. doi:10.48550/arXiv.2410.19092.
- [27] Pooya Hatami and William Hoza. Paradigms for unconditional pseudorandom generators. Found. Trends Theor. Comput. Sci., 16(1-2):1–210, 2024. doi:10.1561/0400000109.
- [28] William Hoza and Zelin Lv. Fooling near-maximal decision trees. ECCC preprint TR25-003, 2025. URL: https://eccc.weizmann.ac.il/report/2025/003/.
- [29] Peter J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35:73–101, 1964. doi:10.1214/aoms/1177703732.
- [30] Kazuo Iwama and Hiroki Morizumi. An explicit lower bound of for Boolean circuits. In Proceedings of the 27th International Symposium on Mathematical Foundations of Computer Science (MFCS), volume 2420 of Lecture Notes in Comput. Sci., pages 353–364. Springer, Berlin, 2002. doi:10.1007/3-540-45687-2_29.
- [31] Howard Karloff and Yishay Mansour. On construction of -wise independent random variables. Combinatorica, 17(1):91–107, 1997. doi:10.1007/BF01196134.
- [32] Daniel J. Kleitman and Joel Spencer. Families of -independent sets. Discrete Math., 6:255–262, 1973. doi:10.1016/0012-365X(73)90098-8.
- [33] Robin Kothari, David Racicot-Desloges, and Miklos Santha. Separating decision tree complexity from subcube partition complexity. In Proceedings of the 19th International Workshop on Randomization and Computation (RANDOM), pages 915–930, 2015. doi:10.4230/LIPIcs.APPROX-RANDOM.2015.915.
- [34] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM J. Comput., 22(6):1331–1348, 1993. doi:10.1137/0222080.
- [35] A. A. Lialina. On the complexity of unique circuit SAT. J. Math. Sci., 247:457–466, 2020. doi:10.1007/s10958-020-04813-1.
- [36] Shachar Lovett, Omer Reingold, Luca Trevisan, and Salil Vadhan. Pseudorandom bit generators that fool modular sums. In Proceedings of the 13th International Workshop on Randomization and Computation (RANDOM), pages 615–630, 2009. doi:10.1007/978-3-642-03685-9_46.
- [37] Yishay Mansour, Noam Nisan, and Prasoon Tiwari. The computational complexity of universal hashing. Theoretical Computer Science, 107(1):121–133, 1993. doi:10.1016/0304-3975(93)90257-T.
- [38] Raghu Meka and David Zuckerman. Pseudorandom generators for polynomial threshold functions. SIAM J. Comput., 42(3):1275–1301, 2013. doi:10.1137/100811623.
- [39] Joseph Naor and Moni Naor. Small-bias probability spaces: efficient constructions and applications. SIAM J. Comput., 22(4):838–856, 1993. doi:10.1137/0222053.
- [40] Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. Splitters and near-optimal derandomization. In Proceedings of 36th Annual Conference on Foundations of Computer Science (FOCS), pages 182–191, 1995. doi:10.1109/SFCS.1995.492475.
- [41] Sergey Nurk. An upper bound for circuit SAT. PDMI technical report, 2009. URL: http://www.pdmi.ras.ru/preprint/2009/09-10.html.
- [42] Ryan O’Donnell and Yu Zhao. On Closeness to -Wise Uniformity. In Proceedings of the 22nd International Conference on Randomization and Computation (RANDOM), pages 54:1–54:19, 2018. doi:10.4230/LIPIcs.APPROX-RANDOM.2018.54.
- [43] Edward Pyne, Ran Raz, and Wei Zhan. Certified hardness vs. randomness for log-space. In Proceedings of the 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 989–1007, 2023. doi:10.1109/FOCS57990.2023.00061.
- [44] C. Radhakrishna Rao. Factorial experiments derivable from combinatorial arrangements of arrays. Suppl. J. Roy. Statist. Soc., 9:128–139, 1947. doi:10.2307/2983576.
- [45] Petr Savický. On determinism versus unambiquous nondeterminism for decision trees. ECCC preprint TR02-009, 2002. URL: https://eccc.weizmann.ac.il/report/2002/009/.
- [46] S. V. Savinov. Upper bound for circuit SAT. Master’s thesis, St. Petersburg Academic University RAS, 2014.
- [47] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223–250, 1995. doi:10.1137/S089548019223872X.
- [48] Gadiel Seroussi and Nader H. Bshouty. Vector sets for exhaustive testing of logic circuits. IEEE Trans. Inform. Theory, 34(3):513–522, 1988. doi:10.1109/18.6031.
- [49] Maciej Skorski. Tight Chernoff-Like Bounds Under Limited Independence. In Proceedings of the 26th International Conference on Randomization and Computation (RANDOM), pages 15:1–15:14, 2022. doi:10.4230/LIPIcs.APPROX/RANDOM.2022.15.
- [50] Thomas Steinke, Salil Vadhan, and Andrew Wan. Pseudorandomness and Fourier-growth bounds for width-3 branching programs. Theory Comput., 13:Paper No. 12, 50, 2017. doi:10.4086/toc.2017.v013a012.
- [51] Amnon Ta-Shma. Explicit, almost optimal, epsilon-balanced codes. In Proceedings of the 49th Annual Symposium on Theory of Computing (STOC), pages 238–251, 2017. doi:10.1145/3055399.3055408.
- [52] Donald T. Tang and Lin S. Woo. Exhaustive test pattern generation with constant weight vectors. IEEE Transactions on Computers, C-32(12):1145–1150, 1983. doi:10.1109/TC.1983.1676175.
- [53] Mark N. Wegman and J. Lawrence Carter. New hash functions and their use in authentication and set equality. J. Comput. System Sci., 22(3):265–279, 1981. Special issue dedicated to Michael Machtey. doi:10.1016/0022-0000(81)90033-7.
