Circuit Lower Bounds for MCSP from Local Pseudorandom Generators

The Minimum Circuit Size Problem (MCSP) asks whether a given truth table of a Boolean function f can be computed by a Boolean circuit of size at most θ, for a given parameter θ. We improve several circuit lower bounds for MCSP, using pseudorandom generators (PRGs) that are local; a PRG is called local if its output bit strings, when viewed as truth tables of Boolean functions, can be computed by Boolean circuits of small size. We get new and improved lower bounds for MCSP that almost match the best-known lower bounds against several circuit models. Specifically, we show that computing MCSP, on functions with a truth table of length N, requires

• N^{3−o(1)}-size de Morgan formulas, improving the recent N^{2−o(1)} lower bound by Hirahara and Santhanam (CCC, 2017),
• N^{2−o(1)}-size formulas over an arbitrary basis or general branching programs (no non-trivial lower bound was known for MCSP against these models), and
• 2^{Ω(N^{1/(d+1.01)})}-size depth-d AC^0 circuits, improving the (implicit, in their work) exponential size lower bound by Allender et al. (SICOMP, 2006).

The AC^0 lower bound stated above matches the best-known AC^0 lower bound (for PARITY) up to a small additive constant in the depth. Also, for the special case of depth-2 circuits (i.e., CNFs or DNFs), we get an optimal lower bound of 2^{Ω(N)} for MCSP.

We also get almost-quadratic lower bounds against formulas over an arbitrary basis as well as general branching programs; these almost match the best-known lower bounds against these models [19].

Theorem 2. Let C be either a formula over any basis or a branching program that computes MCSP on truth tables of length N. Then C must have size at least N^{2−o(1)}.

For small-depth circuits, we have the following improved lower bound for MCSP, whose dependence on the depth matches the one in the PARITY lower bound, up to a small additive constant.
For the special case of depth-2 circuits, we obtain an optimal 2^{Ω(N)} lower bound. Also, in this article, we give a fine-grained analysis of the approach of obtaining MCSP lower bounds from average-case hardness via the Nisan-Wigderson framework (see Section 7).

Our Techniques
For a class C of N-variate Boolean functions, a pseudorandom generator (PRG) against C is a deterministic efficiently-computable function G mapping short binary strings (seeds) to longer binary strings, so that every function in C accepts G's output on a uniformly random seed with about the same probability as it accepts an actual uniformly random string.
A key notion in this work is that of a local PRG. We say that a PRG is local if its N -bit output (viewed as the truth table of some function) has small circuit complexity. More precisely, for any fixed seed to the PRG, there exists a small circuit such that, given j ∈ [N ] as an input, the circuit computes the jth bit of the PRG output, where the complexity of the circuit is measured relative to its input length, namely, log N . Note that our notion of local PRGs does not require that the PRG in question is explicit; that is, we do not require that a local PRG can be computed by some uniform algorithm.
Local PRGs in the context of MCSP (and related problems) have been studied in previous works (see, e.g., References [2, 12, 13, 23]). In this work, we refine the previous approaches and obtain stronger circuit lower bounds by establishing strong locality properties of certain PRG constructions.

MCSP Lower Bounds from Local PRGs. Suppose we have a local PRG against some class of circuits C of size s, and we want to show that MCSP cannot be computed by any size-s circuit in C. Suppose some size-s circuit C in C computes MCSP. Using the fact that a random function has almost maximum circuit complexity, we have that C will output false on most of its inputs (by setting the size parameter θ to be a non-trivial quantity that is asymptotically smaller than 2^n/n, where n is the input length of the function). If we replace the uniformly random inputs with the outputs of the local PRG, then, by the definition of a PRG, C will still output false with large probability. However, since the PRG is local, all of its outputs have circuit complexity smaller than the size parameter θ, and hence must be accepted by C. A contradiction.

To get a strong lower bound, we would like to make the above argument work for large s. Note that the local complexity of the PRG, λ(s), is a function of the size of the circuit C, and we need this local complexity to be "non-trivial" to reach a contradiction. Therefore, we want to choose s so that this local complexity remains asymptotically smaller than 2^n/n. As a result, the final lower bound (i.e., the largest s that we can choose) is determined by the local complexity λ. So the main question we study in our article is: What is the smallest local complexity of a PRG against a given circuit class?
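In symbols, the argument runs as follows (a schematic summary of the text above, writing δ for the PRG error and U_N for a uniformly random truth table):

```latex
\Pr_{x \sim U_N}\left[C(x) = 0\right] \;\ge\; 1 - o(1)
\quad\Longrightarrow\quad
\Pr_{z}\left[C(G(z)) = 0\right] \;\ge\; 1 - o(1) - \delta .
```

But locality gives CC(G(z)) ≤ λ(s) < θ for every seed z, so C, computing MCSP, must accept every output G(z); hence Pr_z[C(G(z)) = 0] = 0, a contradiction whenever δ < 1 − o(1).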

MCSP Lower Bound Against de Morgan Formulas.
Our formula lower bound for MCSP is obtained by applying the framework described above to a local PRG against formulas. The state-of-the-art PRG against formulas is given by Impagliazzo et al. [14], which we refer to as the IMZ PRG. Their PRG has a seed length of s^{1/3+o(1)} for size-s formulas (note that such a PRG is useful against sub-cubic formulas only). If we want to utilize the IMZ PRG to get an MCSP lower bound against formulas, then we will need to argue that the IMZ PRG is local.
In fact, to get an almost-cubic lower bound, we will need such a PRG to be strongly local, in the sense that any single output bit of the PRG (on any given fixed seed) can be computed by a circuit of size comparable to its seed length, which is s^{1/3+o(1)}. However, by inspecting the construction, the IMZ PRG does not seem to have such a property, and a straightforward implementation seems to require a circuit of size at least s^{2/3} (see Appendix B for more details), which yields a weaker lower bound for MCSP.
To overcome this issue, we present an alternative PRG useful against sub-cubic formulas, which is strongly local. The construction of this PRG can be viewed as a modification of the IMZ PRG. At a high level, it is based on the Ajtai-Wigderson construction [1], which is a framework for constructing PRGs against computations that can be simplified under (pseudo)random restrictions. This framework is then combined with the ideas for reducing (recycling) random bits using an extractor, by exploiting communication bottlenecks in computations [21]. Our modification, particularly the utilization of the Ajtai-Wigderson construction, allows us to compute any output bit of the PRG efficiently by reducing the number of calls to the extractor. Using some crucial observations on the circuit complexity of certain pseudorandom objects, we get a PRG that is locally computable by an s^{1/3+o(1)}-size circuit.

MCSP Lower Bounds Against Formulas Over an Arbitrary Basis or Branching Programs.
The MCSP lower bounds against formulas over an arbitrary basis or branching programs are obtained similarly to those for de Morgan formulas. The idea is to construct strongly local PRGs against these models by modifying the PRGs in Reference [14]. Then, by applying our "MCSP circuit lower bounds from local PRGs" framework, we get the desired lower bounds.

MCSP Lower Bounds Against AC^0.
We use a local PRG against AC^0 to get MCSP lower bounds. To get a lower bound matching the one in Theorem 3, we can use the state-of-the-art PRG against AC^0 by Trevisan and Xue [31], which has a seed length of (log s)^{d+O(1)} for size-s depth-d AC^0 circuits. By a careful analysis of the construction of this PRG, we can show that the Trevisan-Xue PRG is strongly local and can be used to get an MCSP lower bound that is close to the one stated in Theorem 3. However, in this article, we will present a more direct proof of such a lower bound by using the pseudorandom switching lemma for constant-depth circuits, which is due to Trevisan and Xue [31] as well, and is a key ingredient in their PRG.
The idea is to show that for any small-depth circuit of size less than the claimed lower bound, there is some locally computable restriction that turns the circuit into a constant function, but leaves many variables unrestricted. However, MCSP cannot be constant under such a restriction, because depending on the partial assignment to the unrestricted variables, the resulting input function (which is composed of the restriction and the partial assignment) can be either easy or hard. Such an approach based on pseudorandom restrictions can also be applied to the special case of depth-2 circuits to get optimal CNF (and DNF) lower bounds for MCSP.

Remainder of the Article
We give the necessary background in Section 2. In Section 3, we describe our framework of using local PRGs to obtain lower bounds for MCSP. We prove the almost-cubic de Morgan formula lower bound for MCSP (Theorem 1) in Section 4 and the almost-quadratic lower bounds against formulas over an arbitrary basis and branching programs (Theorem 2) in Section 5. The improved AC 0 lower bounds for MCSP (Theorems 3 and 4) are proved in Section 6. In Section 7, we discuss the framework of proving MCSP lower bounds from average-case hardness. Finally, we give some open problems in Section 8.

Notation
For any computational model, we use the term size to refer to its complexity measure. For example, if the model is circuits of some fixed depth, then the size is the number of gates in the circuit.
For a positive integer n that is a power of two, we use the following notation:

• [n] denotes the set {1, . . . , n}. We will sometimes identify [n] with {0, 1}^{log n}, in a natural way.
• F_n denotes the field with n elements. Again, we will sometimes identify F_n with {0, 1}^{log n}, where the elements of F_n are represented by (log n)-bit strings.
• U_n denotes the uniform distribution over {0, 1}^n.
• We use Õ(·) to hide polylogarithmic factors; that is, Õ(f) denotes O(f · polylog(f)).
• For any f : {0, 1}^n → {0, 1}, tt(f) denotes the truth table of f, and CC(f) denotes its circuit complexity, that is, the size of the smallest Boolean circuit that computes f.

Pseudorandomness
Definition 5 (Pseudorandom generators). Let G : {0, 1}^r → {0, 1}^n be a function, F be a class of Boolean functions, and 0 < ε < 1. We say that G is a pseudorandom generator of seed length r that ε-fools F if, for every function f ∈ F, it is the case that

| Pr_{z∼U_r}[f(G(z)) = 1] − Pr_{x∼U_n}[f(x) = 1] | ≤ ε.

A multidimensional distribution is called k-wise independent if any k coordinates of the distribution are uniformly distributed.

Definition 6 (k-wise independence).
A distribution X over [m]^n is called k-wise independent with parameter p if, for any 1 ≤ i_1 < · · · < i_k ≤ n and every b_1, . . . , b_k ∈ [m], we have

Pr[X_{i_1} = b_1 ∧ · · · ∧ X_{i_k} = b_k] = p^k.

If k = 2, then we call this distribution pair-wise independent with parameter p. If p = 1/m, then we just refer to this distribution as k-wise independent.
We will need the following concentration bound for k-wise independent distributions, which is an application of Cantelli's inequality.
Proposition 7. For any 0 < p < 1, let X_1, . . . , X_n be pair-wise independent random variables over {0, 1} such that Pr[X_i = 1] = p for each i ∈ [n]. Then, it is the case that

Pr[ Σ_{i∈[n]} X_i ≤ pn/2 ] ≤ 4/(4 + pn).

The following simple fact will be convenient for us.
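A bound of this form follows in one line from Cantelli's inequality, writing S = Σ_i X_i, so that E[S] = pn and, by pairwise independence, Var[S] = np(1−p) ≤ np:

```latex
\Pr\!\left[S \le \tfrac{pn}{2}\right]
= \Pr\!\left[S - \mathbb{E}[S] \le -\tfrac{pn}{2}\right]
\le \frac{\operatorname{Var}[S]}{\operatorname{Var}[S] + (pn/2)^2}
\le \frac{np}{np + p^2 n^2/4}
= \frac{4}{4 + pn}.
```

The middle inequality is Cantelli's (one-sided Chebyshev) inequality; the last step uses that x/(x + t^2) is increasing in x.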
Lemma 8. Let X and Y be two random variables that take values in {0, 1}, and let E be some event. If |E[X | E] − E[Y | E]| ≤ ε_1 and Pr[¬E] ≤ ε_2, then |E[X] − E[Y]| ≤ ε_1 + ε_2.

Proof. We have E[X] = E[X | E] · Pr[E] + E[X | ¬E] · Pr[¬E], and similarly for Y. Then,

E[X] − E[Y] = (E[X | E] − E[Y | E]) · Pr[E] + (E[X | ¬E] − E[Y | ¬E]) · Pr[¬E] ≤ ε_1 + ε_2.

The fact that E[Y] − E[X] ≤ ε_1 + ε_2 can be shown similarly.

Random Restrictions
A restriction for an n-variate Boolean function f, usually denoted as ρ ∈ {0, 1, *}^n, specifies a way of fixing the values of some subset of the variables of f. That is, if ρ_i is *, we leave the ith variable unrestricted, and otherwise we fix its value to be ρ_i ∈ {0, 1}. We denote by f|_ρ the restricted function after the variables are restricted according to ρ, and denote by ρ^{−1}(*) the set of unrestricted variables. A random restriction is then a distribution over restrictions. We will often view sampling a random restriction as a two-step process: The first step is selecting (in some random manner) a subset of unrestricted variables (also called the "star" or "*" variables), and the second step is fixing (in some random manner) the values of all the other variables. Then, a random restriction over n variables can also be specified by a pair (σ, β) ∈ {0, 1}^n × {0, 1}^n, where σ (as a characteristic string) specifies the set of unrestricted variables, and β specifies the values for fixing the restricted variables.
We say that a random restriction (or random selection) is p-regular if each variable is left unrestricted with probability p. One way to generate a p-regular random restriction is to leave each variable, independently, unrestricted with probability p, and otherwise assign to it a 0 or a 1, uniformly at random. Such a random restriction is called a (truly) p-random restriction. Note that to sample such a restriction, we can first pick a string in {0, 1}^{n·log(1/p)}, viewed as an element of [1/p]^n, to specify the selection of the unrestricted variables, where a coordinate is unrestricted if and only if all of its corresponding log(1/p) bits are 0, and then a string in {0, 1}^n to specify the values assigned to each of the restricted variables. So sampling a restriction in this way requires n · log(1/p) + n random bits. We can also generate a restriction in a pseudorandom manner, which may use fewer random bits. For example, one way to do this is to use a limited-independence distribution, so that each variable is set to be unrestricted with probability p, and any k of the variables are independent. Note that such a "pseudorandom selection" can be obtained using a k-wise independent distribution on [1/p]^n. Also, we can let each variable be assigned a 0 or a 1 uniformly at random in a way such that any k of the variables are independent; this again can be done using a k-wise independent distribution on {0, 1}^n.
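The bit-counting scheme for a truly p-random restriction can be sketched in code. This is a toy illustration (not from the paper), with Python's seeded `random` standing in for the n·log(1/p) + n truly random bits; it assumes 1/p is a power of two, as in the text.

```python
import random

def sample_p_random_restriction(n, log_inv_p, rng=None):
    """Truly p-random restriction with p = 2**(-log_inv_p), sampled exactly as
    in the text: n*log(1/p) selection bits (variable i is a star iff all of
    its log(1/p) bits are 0, which happens with probability exactly p), plus
    n value bits for the restricted variables; n*(log(1/p)+1) bits in total."""
    rng = rng or random.Random(0)
    rho = []
    for _ in range(n):
        bits = [rng.randrange(2) for _ in range(log_inv_p)]  # log(1/p) selection bits
        value = rng.randrange(2)   # drawn for every variable; used only if restricted
        rho.append('*' if all(b == 0 for b in bits) else value)
    return rho

rho = sample_p_random_restriction(1000, 3)  # p = 1/8
stars = rho.count('*')                      # roughly 1000/8 = 125 star variables
```

With a fixed seed the output is deterministic; the star count concentrates around pn = 125.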
Finally, note that we can also get a restriction by combining a sequence of restrictions ρ 1 , . . . , ρ t , in a natural way, namely, by applying the sub-restrictions one by one. In this case, we write the final restriction as ρ 1 • · · · • ρ t .

Simple Facts About Boolean Circuits
We refer to a textbook [15] for a general introduction to Boolean circuits.

Lemma 11. For any integer t > 0, there exists a circuit C of size Õ(t) such that, given any string x ∈ {0, 1}^t, the circuit does the following:
• If x = 0^t, then C outputs (0, 0^{log t}).
• If x ≠ 0^t, then C outputs (1, q), where q ∈ {0, 1}^{log t} is the index of the first bit in x that is not 0.

Proof. Define z^{(0)} = (0, 0^{log t}) and, for i = 1, . . . , t, define z^{(i)} recursively as follows: if the first coordinate of z^{(i−1)} is 1, then z^{(i)} = z^{(i−1)}; otherwise, z^{(i)} = (1, bin(i)) if x_i = 1, and z^{(i)} = z^{(i−1)} if x_i = 0, where bin(i) ∈ {0, 1}^{log t} is the binary encoding of i. Note that each z^{(i)} can be computed in polylog(t) size given z^{(i−1)} and x_i. Using a circuit of size Õ(t), we can compute z^{(t)}, which is our output.
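The recursion in the proof can be mirrored directly in code (a toy sketch, using 0-based indices rather than the 1-based indices of the lemma):

```python
def first_nonzero_index(x):
    """Linear scan from Lemma 11: returns (found, q), where found = 1 and q is
    the (0-based) index of the first nonzero bit of x, or (0, 0) if x = 0^t.
    Each loop iteration computes z(i) from z(i-1) and x_i, as in the proof."""
    z = (0, 0)  # z(0) = (0, 0^{log t})
    for i, xi in enumerate(x):
        found, _ = z
        if not found and xi == 1:
            z = (1, i)   # record the first 1 seen; later bits leave z unchanged
    return z
```

Each iteration is constant-size here; in the circuit each step costs polylog(t) gates, giving Õ(t) in total.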
The following circuit upper bound for the addressing (storage access) function is well-known (see, e.g., Reference [35]); we include a proof for completeness.

Lemma 12. For any integers t, m > 0, there exists a circuit of size O(t · m) such that, given any string y = (y_1, . . . , y_t), where y_i ∈ {0, 1}^m for each i, and an index i ∈ {0, 1}^{log t}, the circuit outputs y_i.

Proof. We first look at the first bit (i.e., the least significant bit in binary) of i and output either the first half of y (i.e., y_1, . . . , y_{t/2}), if the first bit is 0, or the second half (i.e., y_{(t/2)+1}, . . . , y_t), if the first bit is 1; denote this output by y^{(1)}. This can be done by a circuit of size c · t · m, for some constant c > 0. Then, we look at the second bit of i and output either the first half or the second half of y^{(1)}, denoted y^{(2)}. This can be done by a circuit of size c · t · m/2. We repeat this process log t times, in total, until we get y^{(log t)}, which is y_i. The circuit complexity of this procedure is c · t · m · (1 + 1/2 + 1/4 + · · ·) ≤ 2c · t · m = O(t · m).
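The halving procedure can be sketched as follows (a toy illustration; index bits are consumed most-significant-first here, whereas the proof consumes them least-significant-first, which differs only in how the blocks are laid out):

```python
def address(y, i_bits):
    """Storage-access (multiplexer) function from Lemma 12: y is a list of t
    m-bit blocks and i_bits the log(t) index bits.  Each round keeps the half
    of the remaining blocks selected by the next bit, so the circuit sizes
    c*t*m, c*t*m/2, c*t*m/4, ... sum to O(t*m)."""
    blocks = list(y)
    for bit in i_bits:
        half = len(blocks) // 2
        blocks = blocks[half:] if bit else blocks[:half]
    return blocks[0]
```

For example, with t = 8 blocks, the index bits [1, 0, 1] select block number 5.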

THE "MCSP CIRCUIT LOWER BOUNDS FROM LOCAL PRGs" FRAMEWORK
We first describe how to use local PRGs to obtain circuit lower bounds for MCSP.

Remark 14. Definition 13 is a notable departure from earlier work on PRGs, in that there is no requirement that a local PRG be easy to compute. Instead, the utility of the PRG is derived from the requirement that each of the functions g_z is easy to compute.

Proof. Let C be a device in the computational model such that C computes MCSP on truth tables of length N. Suppose C has size s, and let G be an (N, s, λ(N, s))-local PRG against C with some seed length r.
For the sake of contradiction, suppose that s exceeds the claimed bound. On the one hand, since most functions require circuits of size greater than N/(c log N) (Theorem 10) and C computes MCSP, we have that C rejects most of its inputs.

It is easy to see that a local hitting set generator (HSG) is sufficient for the above argument to work. HSGs are a weak version of PRGs with the following property: For every function f in the class that accepts many of its inputs, the HSG outputs such an accepted input on at least one of its seeds.

ALMOST-CUBIC DE MORGAN FORMULA LOWER BOUNDS FOR MCSP
In this section, we present our almost-cubic de Morgan formula lower bound for MCSP. By saying "formula" within this section, we refer to formulas over the de Morgan basis (AND, OR, and NOT). By the size of a formula, we mean its usual leaf complexity, i.e., the number of leaves in the tree representation of the formula.
Theorem 16 (Theorem 1, restated). Any de Morgan formula computing MCSP on truth tables of length N must have size at least N^3 / 2^{O(log^{2/3} N)}.
We will construct a strongly local PRG useful against sub-cubic formulas. That is, given as input an index j, the jth bit of the PRG can be computed by a circuit of size comparable to its seed length, which in our case is around s^{1/3} for size-s formulas.
Given the local PRG in Lemma 17, we can combine it with our Theorem 15 to obtain a formula lower bound for MCSP.
Proof of Theorem 16. Let s ≤ N^3 be such that MCSP on truth tables of length N can be computed by some formula of size s. By Theorem 15 and Lemma 17, we obtain the claimed lower bound on s. The rest of this section is devoted to proving Lemma 17.

Almost-linear-size k-Independent Generators
The PRG in Lemma 17 will use k-wise independent distributions. Recall that a multidimensional distribution is called k-wise independent if any k coordinates of the distribution are uniformly distributed (see Definition 6).
A k-independent generator is a function from binary strings to binary strings that takes as input a random seed and stretches that seed to a string that follows a k-wise independent distribution. We will need efficient and local constructions for k-independent generators as well as some other pseudorandom objects. These objects can be constructed using finite fields; we need the following result, which says that finite field arithmetic can be performed by almost-linear-size circuits.
Fact 18 (See, e.g., References [8, 34]). For any integer ℓ > 0, let the elements of F_{2^ℓ} be represented by ℓ-bit strings. Then, addition over F_{2^ℓ} can be performed by a circuit of size O(ℓ), and multiplication over F_{2^ℓ} can be performed by a circuit of size Õ(ℓ).
We now describe an efficient construction for k-independent generators, using the fact that finite field arithmetic can be done using almost linear-size circuits.
Lemma 19. For any integer k > 0, there exists a k-independent generator G : {0, 1}^r → [m]^n, with r = k · max{log n, log m}, such that the following holds. There exists a circuit of size k · Õ(max{log n, log m}) such that, given j ∈ {0, 1}^{log n} and a seed z ∈ {0, 1}^r, the circuit computes the jth coordinate of G(z).
Proof. Let n′ = max{n, m} and suppose n′ = 2^ℓ. We view the elements of F_{n′} as ℓ-bit strings. Consider the function g that, on input j ∈ F_{n′} and z = (z_1, . . . , z_k) ∈ F_{n′}^k, outputs g(j, z) = Σ_{i∈[k]} z_i · j^{i−1}, with arithmetic over F_{n′}. It is known (see Reference [32, Proposition 3.33]) that the function G : F_{n′}^k → F_{n′}^{n} given as G(z)_j = g(j, z) is a k-independent generator. Using Fact 18, it is easy to implement a circuit of size k · Õ(ℓ) that computes g(j, z). Note that to get an output in [m], we can simply output the first log m bits of G(z)_j, since the field has characteristic 2.
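The polynomial-evaluation construction can be demonstrated concretely. This toy uses a small prime field F_5 instead of the F_{2^ℓ} of the proof, purely to keep the arithmetic simple; the sanity check at the end enumerates all 25 seeds and verifies exact pairwise independence of the outputs at two evaluation points.

```python
from itertools import product

def k_independent_generator(seed, j, q):
    """Coordinate j of a k-independent generator over the prime field F_q:
    evaluate the degree-(k-1) polynomial with coefficient vector `seed`
    (= z_1, ..., z_k) at the point j, via Horner's rule."""
    acc = 0
    for z in reversed(seed):   # k multiplications and additions in F_q
        acc = (acc * j + z) % q
    return acc

# Sanity check of 2-wise independence over F_5: as the length-2 seed ranges
# over all 25 values, the output pair at points j=1 and j=2 takes each of
# the 25 possible values exactly once, i.e., it is uniform on F_5 x F_5.
q = 5
counts = {}
for z1, z2 in product(range(q), repeat=2):
    pair = (k_independent_generator([z1, z2], 1, q),
            k_independent_generator([z1, z2], 2, q))
    counts[pair] = counts.get(pair, 0) + 1
```

The uniformity holds because the seed-to-output map is an invertible linear map over F_q for any two distinct evaluation points (a Vandermonde argument).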

Almost-linear-size Extractors
Our PRG will make use of randomness extractors. Here, we describe an extractor that is computable by a circuit of size that is almost linear in the length of its input. We start by reviewing some basic definitions regarding extractors.
Definition 20 (ε-closeness and statistical distance). Let 0 ≤ ε ≤ 1. We say that two distributions X and Y (over some universe D) are ε-close if their statistical distance, defined as max_{S⊆D} | Pr[X ∈ S] − Pr[Y ∈ S] |, is at most ε.

Definition 22 (Extractors). A function E : {0, 1}^ℵ × {0, 1}^d → {0, 1}^m is a (k, ε)-extractor if, for every distribution X over {0, 1}^ℵ with min-entropy at least k, the distribution E(X, U_d) is ε-close to U_m.
We now state the extractor, which, for a high min-entropy source, extracts a constant fraction of the min-entropy, using seeds of polylogarithmic length. The construction and circuit complexity of this extractor are presented in Appendix A.

Strongly Local PRG Useful Against Sub-cubic de Morgan Formulas
For a formula F, let L(F) denote the size (measured by the number of leaves) of F. We need the following pseudorandom shrinkage lemma for de Morgan formulas, which says that there exists a p-regular restriction, where the unrestricted variables are selected pseudorandomly and the restricted variables are fixed truly randomly, such that with high probability the size of the restricted formula will "shrink" by a factor of p^2.

Lemma 24. For any 0 < p < 1 and any de Morgan formula F on N variables of size s, there exists a p-regular pseudorandom selection D over N variables that is samplable using r = 2^{O(log^{2/3} s)} random bits, such that, except with small probability over the selection and a truly random assignment to the restricted variables, the restricted formula has size at most p^2 · s · 2^{O(log^{2/3} s)}. Moreover, there exists a circuit of size 2^{O(log^{2/3} s)} such that, given j ∈ {0, 1}^{log N} and a seed z ∈ {0, 1}^r, the circuit computes the jth coordinate of D(z).
We are now ready to show our PRG in Lemma 17.
Proof of Lemma 17. The construction is as follows: We first sample a p-regular pseudorandom selection from Lemma 24. Then, we fill the star coordinates, specified by the pseudorandom selection, in the output string with the output of some extractor that takes a min-entropy source sample and a short seed. (More precisely, the star coordinates are filled with the output of some limited-independence generator that takes the output of an extractor as a seed.) We then sample another pseudorandom selection and fill the star coordinates specified by this pseudorandom selection, but this time only those that have not been filled in previous steps, again with the output of the same extractor using the same min-entropy source sample but a different short seed. We continue this way until all the coordinates are filled.
More formally, our PRG uses the following parameters:

• p = 1/s^{1/3}, the expected fraction of unrestricted variables in each of the pseudorandom selections;
• ε = 1/poly(N) and ε_0 = ε/(10t), which specify the error of the PRG;
• t = ln(4N/ε)/p = s^{1/3} · O(log N), the number of steps needed so that all the coordinates will be filled with probability 1 − ε/4;
• s_0 = s^{1/3} · 2^{O(log^{2/3} s)}, the size of the formula after being simplified by a pseudorandom restriction;
• k ≥ s_0, the amount of independence needed to fool the simplified formula, and r_k = k · log N, the seed length for the k-independent generator;
• ℵ, the length of the min-entropy source for the extractor, which is such that ℵ ≥ 2 · log(1/ε_0) + c · s_0 · log s_0, where c > 0 is some constant, and such that Ω(ℵ) ≥ r_k; we can take ℵ = Θ(s_0 · log s_0 + r_k);
• polylog(ℵ), the seed length of the extractor;
• 2^{O(log^{2/3} s)}, the number of random bits for sampling a pseudorandom selection.
The seed of our PRG consists of a string X ∈ {0, 1}^ℵ, which is the min-entropy source sample for the extractor; strings Y_1, . . . , Y_t, where each Y_i is the seed of an extractor; and t strings of length 2^{O(log^{2/3} s)} each, where each such string is the seed for sampling a pseudorandom selection.
The construction of the PRG proceeds in the following two stages.

Stage 1. Compute a sequence of t p-regular pseudorandom selections σ_1, . . . , σ_t, and let S_i be the set of star coordinates, in the ith pseudorandom selection, that did not appear in the preceding sets S_1, . . . , S_{i−1}. Here, ∧ denotes a coordinate-wise AND operation (i.e., coordinate-wise multiplication of Boolean vectors) and ∨ denotes a coordinate-wise OR operation.

Stage 2. For each i ∈ [t], compute Z_i = G_k(E(X, Y_i)), where E is the extractor and G_k is the k-independent generator, and fill the coordinates in S_i of the output string with the corresponding coordinates of Z_i.
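The fill-in logic of the two stages can be sketched as a toy Python program. The real pseudorandom objects are replaced here by seeded stand-ins (the lambdas below are assumptions for illustration, not the paper's constructions); only the "fill fresh stars at each step" bookkeeping is faithful.

```python
import random

def toy_prg(N, t, p, seed):
    """Toy version of the two-stage construction: at step i, sample a
    p-regular selection sigma_i (Stage 1) and fill its not-yet-filled star
    coordinates with pseudorandom bits Z_i (Stage 2)."""
    rng = random.Random(seed)
    # Stand-ins for the real objects: a p-regular pseudorandom selection and
    # the k-independent expansion G_k(E(X, Y_i)) of an extractor output.
    sample_selection = lambda: [rng.random() < p for _ in range(N)]
    extract_then_expand = lambda: [rng.randrange(2) for _ in range(N)]
    out = [None] * N
    for _ in range(t):
        sigma = sample_selection()
        Z = extract_then_expand()
        for j in range(N):
            # S_i: stars of sigma_i that were not filled in any earlier step
            if sigma[j] and out[j] is None:
                out[j] = Z[j]
    return out

bits = toy_prg(N=256, t=200, p=1/8, seed=0)
```

With t on the order of ln(4N/ε)/p steps, every coordinate is filled with high probability, matching the choice of t in the parameter list.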
Correctness. Next, we show that the above PRG ε-fools N-variate formulas of size s. First, note that, by our choice of t, with probability 1 − ε/4, S = S_1 ∪ · · · ∪ S_t covers all N coordinates. To proceed, we will use a hybrid argument. Let G denote the distribution given by the PRG described above, and let U be the uniform distribution. Note that if in the above construction we replace Z_i, for all i ∈ [t], with U, then we get the uniform distribution. Now we can start from there and gradually replace U with the Z_i's, step by step, for a total of t steps. We will argue that after each replacement step, the expected value of the function does not change by much. Let B_i be the distribution where we have replaced U with Z_1, . . . , Z_i in the first i steps and S = [N]. Note that using the distributions B_i would require that S = [N], and this could result in dependencies among the sets S_i. This is the reason for introducing the distributions A_i; we shall later make use of the fact that the selections σ_i, which come up in the definitions of the sets S_i, are independent.
Now, for the sake of contradiction, suppose there exists a size-s formula f on N variables such that Equation (1) holds. Both the expectations in Equation (1) are over the respective hybrid distributions, and we remove the absolute value without loss of generality, obtaining Equation (2). Let f (where the randomness is over W_i) denote the restricted function after the first i steps. Then, the left-hand side of Equation (2) becomes Equation (3). Note that, at this point, we can view ρ_{i+1} = (σ_{i+1}, U) as a pseudorandom restriction (in the sense of Lemma 24) applied to f. Next, let f be the random function defined as the restricted function of f under ρ_{i+1} (note that the randomness is over W_i, and also the pseudorandom restriction ρ_{i+1}). Now Equation (3) becomes Equation (4). Note that in the above, we abuse the notation and use U and Z_{i+1} to denote U|_{S_{i+1}} and Z_{i+1}|_{S_{i+1}}, respectively. Next, we want to show that the difference between the two expectations in Equation (4) is at most 3ε_0 = 3ε/(10t) ≤ ε/(2t), which would give a contradiction, by Equation (2). The intuition is the following. On the one hand, f is obtained by a pseudorandom restriction ρ_{i+1}, and so, with high probability, it has size at most s_0. On the other hand, Z_{i+1} is obtained using an extractor that is supposed to extract enough random bits for an s_0-independent generator.
The issue, however, is that f depends on X , the source sample of the extractor. Therefore, f may contain information about X , so that X is not truly random anymore. Nonetheless, being a formula of size at most s 0 , f cannot contain too much information, and so cannot take too much entropy away from X . We make this argument more formal next.
Let us define the set of good functions for f , namely, where c is some constant. Let E denote the event f ∈ F . We first show the following.
Proof of Claim 25. Note that, by the pseudorandom shrinkage lemma (Lemma 24), we have the required shrinkage probability bound; in fact, our choices of s_0 and ε_0 were informed by our intention to make this inequality hold. Also note that, under the condition that L(f) ≤ s_0, there can be at most s_0^{O(s_0)} choices for f, since a formula of size s_0 can be specified using O(s_0 log s_0) bits (Proposition 9). The claim follows.

Let us now analyze Equation (4) while conditioning on the event E. We show the following.
Proof of Claim 26. First, note that, conditioned on E, X still has large min-entropy. More precisely, for every g ∈ F, the conditional min-entropy of X remains high; this is because, for every x, we have Pr[X = x | E] ≤ Pr[X = x]/Pr[E]. Then, by the definition of the extractor, the output E(X, Y_{i+1}) remains close to uniform, and the claim follows, since s_0-wise independent distributions fool size-s_0 formulas.
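The conditioning step can be written out explicitly; it shows that conditioning on an event E costs at most log(1/Pr[E]) bits of min-entropy:

```latex
\Pr[X = x \mid E] \;=\; \frac{\Pr[X = x \wedge E]}{\Pr[E]}
\;\le\; \frac{\Pr[X = x]}{\Pr[E]}
\;\le\; 2^{-H_\infty(X) + \log(1/\Pr[E])} .
```

Since Pr[E] is close to 1 here (Claim 25), the entropy loss is negligible, and the extractor's min-entropy requirement is still met.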
Combining Claim 25, Claim 26, and Lemma 8, we get that the quantity in Equation (4) is at most 3ε 0 , which leads to a contradiction. This completes the proof of the correctness.
Locality. To see that the jth bit of the PRG can be computed using a circuit of size s^{1/3} · 2^{O(log^{2/3} s)}, we observe the following equivalent construction:

(1) Compute the jth bits of the t pseudorandom selections, (σ_1)_j, . . . , (σ_t)_j.
(2) Retrieve Y_q, where q is the smallest integer such that (σ_q)_j is a star.
(3) Compute (Z_q)_j = G_k(E(X, Y_q))_j as the jth bit of the PRG.

Note that Step 1 can be done using a circuit of size t · 2^{O(log^{2/3} s)} = s^{1/3} · 2^{O(log^{2/3} s)}, by the pseudorandom shrinkage lemma (Lemma 24). Also, Step 2 can be done by first computing q from the sequence ((σ_i)_j)_{i∈[t]} using a circuit of size Õ(t) (Lemma 11), and then outputting Y_q from (Y_i)_{i∈[t]} using a circuit of size t · polylog(N) (Lemma 12). Finally, Step 3 can be done by a circuit of size Õ(ℵ) using the efficient extractor (Lemma 23) and the limited-independence generator (Lemma 19).
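The equivalence between the full construction and the per-bit computation can be checked mechanically. In this toy (an illustration, not the paper's objects), the selections and fill values are deterministic functions of the seed via a hash, so the full output and the Steps 1-3 per-bit recomputation can be compared coordinate by coordinate:

```python
import hashlib

def _bit(tag, i, j, seed):
    """Deterministic pseudorandom bit; a stand-in for the real objects."""
    h = hashlib.sha256(f"{seed}:{tag}:{i}:{j}".encode()).digest()
    return h[0] & 1

def sigma(i, j, seed):   # 1 iff coordinate j is a star in selection i (toy p = 1/2)
    return _bit("sel", i, j, seed)

def Z(i, j, seed):       # stand-in for (G_k(E(X, Y_i)))_j
    return _bit("fill", i, j, seed)

def full_output(N, t, seed):
    """Stages 1 and 2: at each step, fill the fresh stars of sigma_i with Z_i."""
    out = [None] * N
    for i in range(t):
        for j in range(N):
            if sigma(i, j, seed) and out[j] is None:
                out[j] = Z(i, j, seed)
    return out

def local_bit(j, t, seed):
    """Steps 1-3 of the locality argument: find the first selection in which
    coordinate j is a star, and emit that step's fill value."""
    for q in range(t):
        if sigma(q, j, seed):
            return Z(q, j, seed)
    return None

out = full_output(64, 40, "demo")
local = [local_bit(j, 40, "demo") for j in range(64)]
```

Both computations pick the same first index q with a star at coordinate j, so they agree everywhere, which is exactly why the per-bit circuit of Steps 1-3 computes the PRG's jth output bit.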

ALMOST-QUADRATIC LOWER BOUNDS AGAINST ARBITRARY BASIS FORMULAS AND BRANCHING PROGRAMS
Here, we prove MCSP lower bounds against formulas over an arbitrary basis, as well as branching programs. These lower bounds are obtained similarly to those for de Morgan formulas in the previous section. The idea is to construct strongly local PRGs against these models by modifying the PRGs in Reference [14].
The following pseudorandom shrinkage lemma for formulas over an arbitrary basis as well as branching programs is an analogue of Lemma 24.

Lemma 27 ([14, Lemma 4.2 and Lemma 5.3]).
There exists a constant c_0 > 0 such that the following holds. For any constant c > c_0 and any s ≥ N, let p = s^{−1/2}, and let F be a formula over any basis (or a branching program) on N variables of size s. Then, there exists a p-regular pseudorandom selection D over N variables that is samplable using r = polylog(N) random bits, under which, with high probability, the size of the restricted F shrinks by a factor of roughly p. Moreover, there exists a circuit of size 2^{O(log^{2/3} s)} such that, given j ∈ {0, 1}^{log N} and a seed z ∈ {0, 1}^r, the circuit computes the jth coordinate of D(z).
Using the above pseudorandom shrinkage lemma and an argument as in the proof of the strongly local PRG against de Morgan formulas (Lemma 17), we get the following local PRGs. The MCSP lower bound in Theorem 2 follows from Lemma 28 and Theorem 15.

IMPROVED AC^0 LOWER BOUNDS FOR MCSP
In this section, we show improved lower bounds for MCSP against constant-depth circuits.

The Case of Depth d > 2
We first show an improved lower bound against depth-d circuits that almost matches the lower bound for PARITY.
The above result is proved using the following structural property of small-depth circuits, which says that, for any such circuit, there exists some locally computable restriction that simplifies the circuit to a constant while leaving many variables unrestricted.

Lemma 30. For any size-s depth-d circuit C, there exists a restriction ρ such that
• C|_ρ is a constant function,
• many variables are left unrestricted by ρ, and
• there exists a circuit of size d · log(N) · O(log^3 s) such that, given j ∈ {0, 1}^{log N}, the circuit computes the jth coordinate of ρ.
We now prove Theorem 29 using Lemma 30.
Proof of Theorem 29. Let C be a depth-d AC^0 circuit on {0, 1}^N × {0, 1}^{log N} such that C computes MCSP on truth tables of length N, and let s be the size of C.

For a size parameter λ = d · log(N) · O(log^3 s), let C′ = C(·, λ), and let ρ be a restriction from Lemma 30 for C′. By Lemma 30, we have that C′|_ρ is a constant function. First, note that this constant must be 1. To see this, note that any partial assignment to the unrestricted variables yields an input truth table of a function f : {0, 1}^{log N} → {0, 1} that is composed of the restriction ρ and that partial assignment; by Item 3 of Lemma 30, such a function f can be computed by a λ-size circuit, and so C, computing MCSP, must accept. However, there can be 2^{|ρ^{−1}(*)|} different functions corresponding to the different partial assignments to the unrestricted variables. Since there are at most 2^{O(λ log λ)} different circuits of size at most λ, for C′|_ρ to be constant and equal to 1, we must have 2^{|ρ^{−1}(*)|} ≤ 2^{O(λ log λ)}, which, by a simple calculation, implies s = 2^{Ω(N^{1/(d+1+γ)})}, for any constant γ > 0.
The proof of Lemma 30 uses the pseudorandom switching lemma due to Trevisan and Xue [31], which we revisit below. The (pseudorandom) switching lemma says that a depth-2 circuit is likely to be simplified after being hit by a (pseudo)random restriction.
Below, when we refer to the size of a DNF or CNF, we mean the number of its terms or clauses, respectively.
Lemma 31 (Pseudorandom Switching Lemma, Reference [31, Lemma 7]). For any integers t, w > 0, s ≥ N, and any 0 < p, ε0 < 1, let F be an N-variate w-CNF or w-DNF of size s, and let D be a distribution over {0, 1}^{N·log(1/p)} × {0, 1}^N for sampling a pseudorandom restriction that ε0-fools CNFs of size s · 2^{w·(log(1/p)+1)}. Then, with high probability over ρ ∼ D, the restricted function F|ρ can be computed by a depth-t decision tree.

Lemma 32 (Following Reference [31, Theorem 11]). For any integers d, t > 0, s ≥ N, and any (480/N)^{1/(d−2)} < p < 1 and 0 < ε0 < 1, there exists a distribution D over {0, 1}^{N·log(1/p)} × {0, 1}^N for sampling a pseudorandom restriction such that
• for any size-s depth-d circuit C on N variables, we have that Pr_{ρ∼D}[C|ρ is not a t-DNF or t-CNF] ≤ s · 2^{2t+1} · (10p log s)^t + ε0 · 2^{(t+1)(2t+log s)},
• with probability at least 2/3, the number of unrestricted variables is at least Ω(p^{d−2} · N), and
• there exists a circuit of size d · log(N) · O(log^3 s) that, given j ∈ {0, 1}^{log N}, computes the jth coordinate of ρ.

Proof (sketch). The proof is similar to that of Theorem 11 in Reference [31]. The idea is to apply the pseudorandom switching lemma (Lemma 31) repeatedly. Each time, we sample a pseudorandom restriction using some distribution that ε0-fools CNFs of size s0 = s · 2^{w·(log(1/p)+1)}, for w = t. By Lemma 31, each time, with high probability, the two bottom layers can be computed by depth-t decision trees, so we can switch them to t-DNFs or t-CNFs, and hence reduce the depth of the circuit by one as we merge them with the layer above.
One difference here from the argument in Reference [31] is that we only apply the pseudorandom switching lemma d − 1 times, instead of d times, since we only need the final restricted circuit to be a t-DNF or t-CNF (rather than a depth-t decision tree as in the original statement of Reference [31], which would require an additional application of the pseudorandom switching lemma). Note that we use the parameter p = 1/40 for the first iteration. Another difference is that, to sample a pseudorandom restriction, we use a k-wise independent distribution (say, over [1/p]^{2N}), instead of the PRG against depth-2 circuits of Reference [6], for an appropriate choice of k; here we use the fact that such a k-wise independent distribution ε0-fools s0-clause CNFs [29, Theorem 22]. Note that the expected number of unrestricted variables is Ω(p^{d−2} · N); then Item 2 follows from the fact that the random restriction is pair-wise independent, together with Proposition 7.
Finally, it is easy to get Item 3 using Lemma 19.
We are now ready to show Lemma 30.
Proof of Lemma 30. By Lemma 32, using the parameters t = O(log s), p = 1/O(log s), and ε0 = 2^{−O(log^2 s)}, we get a restriction ρ0 such that the circuit restricted by ρ0 is a width-O(log s) DNF or CNF, with probability at least 1 − 1/poly(N). Note that, by Item 2 of Lemma 32, ρ0 leaves at least N/O(log s)^{d−2} variables unrestricted, with constant probability. Therefore, by a union bound, with some constant probability, we get a restriction ρ0 that both simplifies the circuit to a width-O(log s) DNF or CNF and leaves N/O(log s)^{d−2} variables unrestricted. Once we have such a restriction, we can make the restricted circuit constant by further fixing at most O(log s) variables (e.g., the variables of a single term or clause); denote this further restriction by ρ1. The final restriction is ρ = ρ0 ∘ ρ1.
We now show the last item. Note that our final restriction consists of two parts, ρ0 and ρ1, where ρ0 is a restriction from Lemma 32 and ρ1 is a restriction that fixes O(log s) variables. To compute the final restriction, given an index j ∈ {0, 1}^{log N}, we first check whether the jth variable is fixed by ρ1 and, if so, output the fixed value. This can be done by hard-wiring the O(log s) variables that are fixed by ρ1, together with their fixed values, using a circuit of size at most O(log s · log N). Otherwise, we output the jth coordinate of ρ0, which can be done with a circuit of size d · log(N) · O(log^3 s), by Item 3 of Lemma 32.
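As a toy illustration of this two-part lookup (all names are ours, and the circuit for ρ0 from Item 3 of Lemma 32 is abstracted as a callable):

```python
# Toy model of computing the j-th coordinate of rho = rho0 ∘ rho1:
# rho1's O(log s) fixed positions are "hard-wired" in a dict; everything
# else defers to the circuit for rho0 (abstracted here as a function).
STAR = "*"

def make_local_restriction(rho0_coord, rho1_fixed):
    """rho0_coord(j) -> 0, 1, or STAR; rho1_fixed: {j: bit}."""
    def coord(j):
        if j in rho1_fixed:          # check the hard-wired table first
            return rho1_fixed[j]
        return rho0_coord(j)         # else output the j-th bit of rho0
    return coord

# rho0 leaves even positions unrestricted and fixes odd positions to 0 (toy)
rho0 = lambda j: STAR if j % 2 == 0 else 0
rho = make_local_restriction(rho0, rho1_fixed={0: 1, 4: 1})
assert [rho(j) for j in range(6)] == [1, 0, STAR, 0, 1, 0]
```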

The Case of Depth 2
Here, we show that computing MCSP requires depth-2 circuits of almost maximum size.

Theorem 33 (Theorem 4, restated). Any CNF or DNF computing MCSP on truth tables of length N must have size 2^{Ω(N)}.
To prove Theorem 33, we will utilize the following lemma and corollary.

Lemma 34. Let 0 < δ < 1. Any N-variate CNF or DNF of size s ≤ 2^{δN} can be fixed to a constant by applying a restriction that sets O(√δ · N) variables.
Proof. We will show the lemma for the case of DNFs; the proof is easily adapted to the case of CNFs. Fix the constant A := 1/√δ. We choose the restriction in question in two phases. In Phase 1, we show that we can set at most O(√δ · N) variables to bring the width of the DNF down to A log s = √δ · N. In Phase 2, we can easily set the variables of any single term of the remaining DNF to fix the function. Since Phase 2 is trivial, let us henceforth focus on Phase 1.
To this end, imagine that we choose a uniformly random input variable x_i (for some i) and set it to a random value. If T is any term in the DNF of width greater than A log s, then T is set to 0 in this step with probability at least (A log s)/(2N).
Repeating this process t := 2√δ · N times, we see that the probability that T survives is at most (1 − (A log s)/(2N))^t ≤ e^{−t·(A log s)/(2N)} = e^{−δN} < 2^{−δN} ≤ 1/s. By a union bound over the (at most) s terms of the DNF in question, there is a restriction ρ that restricts O(√δ · N) variables and sets all the terms of width greater than A log s to 0. This completes Phase 1.

We are now able to prove the main result of this subsection (Theorem 33) by using Corollary 35.
Proof of Theorem 33 (sketch). One may prove Theorem 33 by using Corollary 35 exactly as we did in the proof of Theorem 29 with Lemma 30.
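The quantitative steps of Phase 1 in the proof of Lemma 34 can be sanity-checked numerically (the parameter values below are ours, chosen only for illustration):

```python
import math

# Toy parameters illustrating Lemma 34 (delta and N are our choices).
delta = 0.04
N = 1000
s = 2 ** (delta * N)          # DNF size bound s <= 2^(delta*N)
A = 1 / math.sqrt(delta)      # the constant A := 1/sqrt(delta)
wide = A * math.log2(s)       # width threshold A*log s = sqrt(delta)*N
t = 2 * math.sqrt(delta) * N  # number of single-variable restrictions

# A term of width > wide is killed in one step with prob >= wide/(2N);
# after t steps it survives with prob <= (1 - wide/(2N))^t <= e^{-delta*N}.
survive = (1 - wide / (2 * N)) ** t
assert math.isclose(wide, math.sqrt(delta) * N)
assert survive <= math.exp(-delta * N)
# Union bound over the s terms: some wide term survives with prob < 1,
# so a restriction of t = O(sqrt(delta)*N) variables killing all wide terms exists.
assert s * survive < 1
```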

MCSP CIRCUIT LOWER BOUNDS FROM AVERAGE-CASE HARD FUNCTIONS

The Nisan-Wigderson Generator
It is well known in the field of derandomization that if we have a function that is average-case hard against some circuit class C, then we can get a PRG for C by plugging the hard function into the Nisan-Wigderson framework [20] (provided that the hard function is not too hard to compute and that C satisfies some mild conditions). The construction involves computing some combinatorial design with some suitably chosen parameters; a design is a list of subsets (over some universe) that have some combinatorial properties (see Definition 36). Also, to compute a single bit of such a PRG, we need to compute the corresponding subset of the design. There are known design constructions such that any single subset of the design can be computed efficiently and locally (without computing the whole design). Therefore, using such a local design, we can get a locally computable PRG, which can be used to obtain an MCSP lower bound against C.
The idea of using Nisan-Wigderson PRGs to study MCSP and related problems has been explored before (e.g. References [2,12,23]). However, the previous works were content with the fact that the output of a PRG has circuit complexity at most polynomial in the seed length. Here, we provide a more fine-grained analysis of the local complexity of the Nisan-Wigderson PRG, which depends on the parameters that we choose for the design, and in turn will depend on the "usefulness" of the average-case hard function. This allows us to turn average-case hardness against some circuit class C into a lower bound for MCSP against the same class, where such a lower bound is more quantitatively linked to the average-case hardness.
We first review the Nisan-Wigderson framework.

Lemma 37 (Local Designs). For any positive integers N and α, let ℓ := N^{1/(α+1)} (assumed, for simplicity, to be a prime power). Then there exists an (N, ℓ^2, ℓ, α)-design S_1, . . . , S_N over a universe of size r = ℓ^2 such that, given j ∈ {0, 1}^{log N} and z ∈ {0, 1}^r, the projection z|_{S_j} can be computed by a circuit of size O(N^{2/(α+1)}).
Proof. Consider the field F_ℓ with ℓ elements. We identify the universe [r] with F_ℓ × F_ℓ, of size ℓ^2. Let {e_1, . . . , e_ℓ} be the elements of the field (in lexicographic order). For each j ∈ {0, 1}^{log N}, we view j as an element in [ℓ]^{α+1} and identify it with a degree-α polynomial p_j ∈ F_ℓ[x]. Let S_j = {(e_1, p_j(e_1)), . . . , (e_ℓ, p_j(e_ℓ))}.
Note that, for all j, the set S_j is a subset of F_ℓ × F_ℓ of size ℓ, and, for two different sets S_j and S_k, we have |S_j ∩ S_k| ≤ α, as the difference p_j − p_k is a nonzero polynomial of degree at most α, and thus has at most α roots.
Note that we can hard-wire (e_k, e_k^2, . . . , e_k^α) into the circuit, for all k ∈ [ℓ], using size ℓ · α · O(log ℓ). Then computing p_j(e_k), for any k, can be done with a circuit of size α · O(log ℓ) (using Fact 18). As a result, S_j can be computed in size ℓ · α · O(log ℓ). Once we have the set S_j, we divide the input z into ℓ equal-size blocks. For each element (a, b) in S_j, we output the bth bit of the ath block, using Lemma 12, in O(ℓ) size. Then, computing z|_{S_j} takes size ℓ · O(ℓ) = O(N^{2/(α+1)}).
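The polynomial-based design in the proof can be implemented directly. The following toy sketch (parameter values ours; we take ℓ prime so that arithmetic mod ℓ realizes the field) checks the size and intersection properties:

```python
# Toy instance of the design from Lemma 37: the universe is
# F_ell x F_ell, and j's base-ell digits are the coefficients of the
# degree-alpha polynomial p_j.
ell, alpha = 7, 2
N = ell ** (alpha + 1)                   # number of sets: 343

def p(j, e):
    """Evaluate p_j(e) over F_ell, coefficients = base-ell digits of j."""
    coeffs = [(j // ell**i) % ell for i in range(alpha + 1)]
    return sum(c * e**i for i, c in enumerate(coeffs)) % ell

def S(j):
    return {(e, p(j, e)) for e in range(ell)}

sets = [S(j) for j in range(N)]
assert all(len(s) == ell for s in sets)              # each set has size ell
# p_j - p_k is a nonzero polynomial of degree <= alpha, so <= alpha roots:
assert all(len(sets[j] & sets[k]) <= alpha
           for j in range(50) for k in range(j))     # spot-check
```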
Definition 38 (Average-case Hardness). Let C be a class of circuits on N variables. We say that a function f is (s, ε)-hard against C if, for every C ∈ C of size s, it is the case that Pr_x[C(x) = f(x)] ≤ 1/2 + ε. Let DNF_α denote the class of DNF circuits on α variables. Note that every α-variate Boolean function can be computed by a DNF of size at most 2^α.
Theorem 39 (Nisan-Wigderson generator [20]). Let C be a class of circuits on N variables of size s. Let S_1, . . . , S_N be an (N, r, ℓ, α)-design, and let f : {0, 1}^ℓ → {0, 1} be a function that is (s, ε/N)-hard against C composed with DNF_α at the inputs. Then NW_f : {0, 1}^r → {0, 1}^N, defined by NW_f(z)_j := f(z|_{S_j}), is a PRG that ε-fools C.
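The map NW_f(z)_j = f(z|_{S_j}) can be sketched over the design of Lemma 37 (toy parameters ours; PARITY is merely a runnable stand-in for the hard function f):

```python
# Runnable sketch of the NW generator over the Lemma 37 design.
ell, alpha = 5, 1
N = ell ** (alpha + 1)                  # 25 output bits
r = ell * ell                           # seed length = universe size

def S(j):                               # design set, as in Lemma 37
    coeffs = [(j // ell**i) % ell for i in range(alpha + 1)]
    return [(e, sum(c * e**i for i, c in enumerate(coeffs)) % ell)
            for e in range(ell)]

def nw(f, z):
    assert len(z) == r
    # z|_{S_j}: the seed bit at universe point (a, b) is z[a*ell + b]
    return [f([z[a * ell + b] for (a, b) in S(j)]) for j in range(N)]

parity = lambda bits: sum(bits) % 2     # stand-in for the hard function
seed = [1, 0, 1, 1, 0] * ell
out = nw(parity, seed)
assert len(out) == N and set(out) <= {0, 1}
```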
Combining Theorem 39 with the design construction in Lemma 37, we immediately get the following.
Theorem 40 (Local Nisan-Wigderson Generator). Let C be a class of circuits on N variables of size s. For any α = α(N, s), if there exists a function f : {0, 1}^{N^{1/(α+1)}} → {0, 1} that is hard on average against C (in the sense of Theorem 39) and computable by a circuit of size λ(N, s), then there exists an (N, s, λ(N, s) + O(N^{2/(α+1)}))-local PRG against C.

We remark that the above local Nisan-Wigderson generator has local complexity comparable to its seed length (for this particular local design and modulo the circuit complexity of the hard function).

Applications
Next, we demonstrate the use of such local PRGs in obtaining lower bounds for MCSP from average-case hardness results.
One of the restricted circuit classes that has been well studied in circuit complexity is the class of constant-depth circuits augmented with a few SYM (symmetric) or THR (linear threshold) gates (see, e.g., References [17, 18, 25, 33]). A SYM gate computes a symmetric function, i.e., a Boolean function whose output depends only on the sum of its input variables. A THR gate computes a linear threshold function, i.e., a Boolean function defined as the sign of some linear form, over Boolean variables, with real coefficients. We will combine the above local Nisan-Wigderson framework with the following average-case lower bounds against the class of constant-depth circuits augmented with a few (sublinearly many) symmetric and linear threshold gates.

Proof. Let C be the class of AC^0 circuits of size ℓ^{τ log ℓ}, for some sufficiently small constant τ > 0, with at most ℓ^{0.249} SYM or THR gates. Then, for ℓ = N^{1/(α+1)}, if we can show the existence of some efficiently computable function f : {0, 1}^ℓ → {0, 1} with the required average-case hardness against C, then Theorem 40 yields the desired local PRG against C.

We remark that the above example does not take advantage of the fact that the local complexity of the Nisan-Wigderson PRG is almost the same as its seed length. This is because, in this case, the seed length has some arbitrary constant in the exponent.
Combining Corollary 42 with Theorem 15, we get the following.
Theorem 43. There exists a constant γ > 0 such that the following holds. Let C be the class of constant-depth AC^0 circuits augmented with at most 2^{γ√(log N)} SYM or THR gates. Then, any circuit in C computing MCSP on truth tables of length N must have size N^{Ω(log N)}.
As another application of our framework, combined with the Nisan-Wigderson generator, we show that separating P/poly (non-uniform circuits of polynomial size) from some restricted circuit class, such as TC^0 (non-uniform constant-depth polynomial-size circuits with threshold gates) or NC^1 (non-uniform polynomial-size logarithmic-depth circuits), implies MCSP lower bounds against the same class of circuits. More precisely, we show that if there exists some function in P/poly that is mildly hard against TC^0 (respectively, NC^1), then MCSP cannot be computed by TC^0 (respectively, NC^1) circuits.
Theorem 44. If there exists a function in P/poly that requires size-s TC^0 (respectively, NC^1) circuits to compute within error 1/poly(n), for some superpolynomial size function s, then MCSP requires superpolynomial-size TC^0 (respectively, NC^1) circuits.
Proof (sketch). Let s(n) = n^{ω(1)}, and let f = {f_n}_n, with f_n : {0, 1}^n → {0, 1}, be a function that requires size-s(n) TC^0 circuits to compute with error at most 1/poly(n). Using standard hardness amplification tools, such as the direct product theorem and the XOR lemma (see, e.g., Reference [3, Section 4]), we can amplify f to a function in P/poly that is strongly hard on average. By plugging the amplified function into the Nisan-Wigderson construction (Theorem 39), we get a local PRG against TC^0; this implies MCSP ∉ TC^0 by Theorem 15. The same argument applies to NC^1.

OPEN PROBLEMS
Our de Morgan formula lower bound for MCSP is still slightly weaker than the state-of-the-art de Morgan formula lower bound due to Tal [28], which is Ω(N^3/(log N · (log log N)^2)). Can the MCSP lower bound be improved? Are there better constructions of local PRGs against formulas? Or, are there alternative proofs that do not rely on local PRGs?
What are other restricted models of computation against which we can show MCSP lower bounds using local PRGs? The recent "random walk PRG" by Chattopadhyay et al. [4] is also local and can be used to get MCSP lower bounds. However, as a general PRG that can be used to fool a variety of restricted models, it has sub-optimal usefulness (which is determined by its seed length) compared to the best-known lower bounds for most of those models.

APPENDICES

A CIRCUIT COMPLEXITY OF THE NISAN-ZUCKERMAN EXTRACTOR
In this section, we will describe the construction of the Nisan-Zuckerman extractor [21] and show that it can be computed by a circuit of almost-linear size.
In proving Lemma 23, we start with some definitions. The extractor works for sources of high min-entropy.
Definition 46 (Dense source). We say that a distribution over {0, 1}^n is a δ-source if it has min-entropy at least δ · n.
The extractor will make use of universal hashing, which we define below.
H is also called a universal hash family if it is 2-wise independent.
It is easy to see that any k-wise independent hashing family can be defined using some k-wise independent distribution. As a result, by Lemma 19, we have the following construction of k-wise independent hash families.

The Nisan-Zuckerman extractor consists of two parts. The first part, block-wise source conversion, takes the source of high min-entropy and converts it into an almost block-wise source by building a list of "blocks." The second part, block-wise source extraction, takes the resulting block-wise source of the previous part and extracts the randomness block-by-block, using some hash-based extractor. Next, we describe some basic component functions as well as how they are combined to perform the respective task of each part. The main focus here is the circuit complexity of these procedures, and we will not go into details about their correctness; interested readers are referred to Reference [21, Section 5].
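As a concrete example of the universal (2-wise independent) hashing used above, consider the standard family h_{a,b}(x) = ((a·x + b) mod p) mod m over a prime p (our illustrative choice, verified by enumeration):

```python
# A concrete universal (2-wise independent) hash family.
from itertools import product

p, m = 13, 4

def h(a, b, x):
    return ((a * x + b) % p) % m

# Universality: for any fixed pair x != y, a uniformly random (a, b)
# with a != 0 collides with probability at most about 1/m.
x, y = 3, 7
collisions = sum(h(a, b, x) == h(a, b, y)
                 for a, b in product(range(1, p), range(p)))
assert collisions / ((p - 1) * p) <= 1 / m
```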
In the following, we only work with δ-sources and block-wise δ′-sources where δ and δ′ are constants.
Block-wise source converter D. This function has the following parameters:
• n, the size of the original input;
• δ, the quality of the input source;
• ℓ_s ≤ · · · ≤ ℓ_1 ≤ n, the sizes of the blocks; and
• k, the amount of independence used.
We first describe how to build one block, using a function that we call B. To build the ith block, on input x ∈ {0, 1}^n and y_i ∈ {0, 1}^{k log n}, the function B first divides x into ℓ_i contiguous disjoint sets A_1, . . . , A_{ℓ_i}, each of size m_i = n/ℓ_i. It then uses the (k log n)-bit string y_i to pick, k-wise independently, indices j_1, . . . , j_{ℓ_i}, where j_q ∈ [m_i] for each q ∈ [ℓ_i], and outputs the ℓ_i-bit vector ((A_1)_{j_1}, . . . , (A_{ℓ_i})_{j_{ℓ_i}}). The block-wise source converter D works by applying B, with independent seeds y_1, . . . , y_s, to produce the s blocks.
Claim 50. The function D can be computed using a circuit of size s · k · O (n).
Proof. It is sufficient to show that outputting the ith block takes a circuit of size k · O(n). On input y_i ∈ {0, 1}^{k·log n}, we can compute, using Lemma 19, the indices (j_1, . . . , j_{ℓ_i}) with a circuit of size k · O(n). Then, for each index j_q, with q ∈ [ℓ_i], we can compute (A_q)_{j_q} using a circuit of size O(m_i) (by Lemma 12), for a total of ℓ_i · O(m_i) = O(n).
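The block-builder B can be sketched as follows (a toy model with names ours; the polynomial evaluation is only a simple stand-in for the k-wise independent sampling of Lemma 19):

```python
# Toy sketch of the block-builder B: split x into l_i chunks of size
# m_i = n // l_i and output one bit from each chunk, at positions derived
# from the seed (stand-in: polynomial over a prime modulus, which is
# k-wise independent over random coefficients).
def build_block(x, seed_coeffs, l_i, p):
    n = len(x)
    m_i = n // l_i
    block = []
    for q in range(l_i):
        # degree-(k-1) polynomial evaluation gives the sampled index j_q
        j_q = sum(c * q**e for e, c in enumerate(seed_coeffs)) % p % m_i
        chunk = x[q * m_i:(q + 1) * m_i]       # A_q
        block.append(chunk[j_q])               # (A_q)_{j_q}
    return block

x = [int(b) for b in "1011001110100101"]       # n = 16
blk = build_block(x, seed_coeffs=[3, 5, 2], l_i=4, p=13)  # k = 3
assert len(blk) == 4 and set(blk) <= {0, 1}
```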
Block-wise source extractor C. This function has s + 1 parameters:
• δ′, the quality of the block source, and
• ℓ_1, . . . , ℓ_s, the block sizes, where ℓ_{i−1}/ℓ_i = 1 + δ′/4 for all 1 < i ≤ s.
The way the block-wise source extractor C works is described below.
(1) Input: x_1 ∈ {0, 1}^{ℓ_1}, . . . , x_s ∈ {0, 1}^{ℓ_s} and y_0 ∈ {0, 1}^{2ℓ_s}. (2) For each i, we consider a universal family of hash functions H_i, given by Lemma 49, where each function in H_i is described by 2ℓ_i bits. It was shown in Reference [21] that if x_1, . . . , x_s are chosen from a block-wise δ′-source and y_0 is uniform, then the output of the function C is 2 · 2^{−δ′ℓ_s/4}-close to uniform.
Claim 51. The function C can be computed using a circuit of size s · O(ℓ_1).
Proof. Note that, given h_i ∈ {0, 1}^{2ℓ_i} and x_i ∈ {0, 1}^{ℓ_i}, we can compute h_i(x_i) using a circuit of size O(ℓ_i) (by Lemma 49). Then, to compute h_0, we need to compute h_i for i = s − 1, . . . , 0, which takes a circuit of size Σ_i O(ℓ_i). The above is at most s · O(ℓ_1), since ℓ_1 is the largest among ℓ_1, . . . , ℓ_s.
Claim 52. The function E can be computed using a circuit of size n · polylog(n/ε).
Proof. This follows easily from Claims 50 and 51.

B THE IMZ PRG IS "ALMOST STRONGLY LOCAL"
Here, we show that the IMZ PRG [14] is "almost strongly local," in the sense that, for most of its seeds, the output of the PRG can be computed by some circuit of size comparable to its seed length. It is easy to see that such a PRG is sufficient to obtain MCSP lower bounds using our framework (see Theorem 15).
We first need a version of the pseudorandom shrinkage lemma, in which we select and fix the variables both in a pseudorandom manner (note that in Lemma 24, we select the variables pseudorandomly and then fix the variables in a truly random manner). Such a pseudorandom shrinkage lemma is provided in Reference [14].
To compute the jth output bit, we need to find the i's for which we need to compute (Z_i)_j and to select the corresponding Y_i's. This can be done by using divide-and-conquer and t "bins" with a fixed polylog(N) number of "slots," each of size log t, to store those indices i. Here, each slot is a set of gates that hold the bits of an index i, and each bin is a set of slots.
More specifically, we will look through (ρ_i)_j for i ∈ [t] and "copy" into the bin those i's for which (ρ_i)_j = *. For the first step, we store the index "1" in the leftmost slot of the bin iff (ρ_1)_j = *. At the next step, we look at the current bin and the next index, say i′. We then create a new bin that holds all the indices in the previous bin, and also i′ iff (ρ_{i′})_j = *; here, the indices are stored in the leftmost slots, and the rest of the slots are marked as "empty." Since each bin is of size polylog(N) and merging the current bin with a new index can be done in polynomial time (which implies that it can be done by a polylog(N)-size circuit), each step can be done by a circuit of size at most polylog(N). After t steps, we will have a bin that stores all of the star indices (some of the slots in the bin can be empty). Therefore, the whole procedure can be done by a circuit of size O(t) · polylog(N).
Once we have the indices, we retrieve the corresponding Y_i's (using Lemma 12). We then compute the extractor on each of these Y_i's (with the same min-entropy source sample X) and apply the limited-independence generator to the output of the extractor to get the jth bit for each of those i's. We also need to make sure that we produce only 0's for the i's that come from the "empty" slots of the bin where the indices are stored. Once we have those bits, we XOR them, and then we XOR the resulting bit with the XOR of the non-star values in ((ρ_i)_j)_{i∈[t]}. The latter can be obtained by taking the XOR of the values in ((ρ_i)_j)_{i∈[t]}, treating the stars as 0's.
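The bin-based collection of star indices described above can be sketched as a sequential pass, one merge step per index (a toy model; names and the fixed-capacity padding convention are ours):

```python
# Sketch of the star-index collection: one merge step per index i in [t],
# mirroring the circuit's t polylog(N)-size steps.
def collect_star_indices(rho_bits_j, capacity):
    """rho_bits_j[i-1] is (rho_i)_j, with None standing for *.
    Returns a fixed-capacity 'bin' padded with None ('empty' slots)."""
    bin_slots = []
    for i, v in enumerate(rho_bits_j, start=1):   # merge index i into the bin
        if v is None:                              # (rho_i)_j = *
            bin_slots.append(i)
    assert len(bin_slots) <= capacity              # polylog(N) slots suffice
    return bin_slots + [None] * (capacity - len(bin_slots))

bins = collect_star_indices([0, None, 1, None, None, 0], capacity=4)
assert bins == [2, 4, 5, None]
```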

ACKNOWLEDGMENTS
We thank the anonymous ICALP'19 and TOCT reviewers for their excellent comments and suggestions. In particular, we extend special thanks to one of our TOCT reviewers for improving our 2^{N/O(log^2 N)} MCSP size lower bound against DNFs to an optimal 2^{Ω(N)}, where N is the size of the input truth table to MCSP.