Small-depth Multilinear Formula Lower Bounds for Iterated Matrix Multiplication, with Applications

In this paper, we study the algebraic formula complexity of multiplying $d$ many $2\times 2$ matrices, denoted $\mathrm{IMM}_{d}$, and show that the well-known divide-and-conquer algorithm cannot be significantly improved at any depth, as long as the formulas are multilinear. Formally, for each depth $\Delta \leq \log d$, we show that any product-depth $\Delta$ multilinear formula for $\mathrm{IMM}_d$ must have size $\exp(\Omega(\Delta d^{1/\Delta}))$. It follows that any multilinear circuit of product-depth $\Delta$ for the same polynomial must have size $\exp(\Omega(d^{1/\Delta}))$. In particular, any polynomial-sized multilinear formula for $\mathrm{IMM}_d$ must have depth $\Omega(\log d)$, and any polynomial-sized multilinear circuit for $\mathrm{IMM}_d$ must have depth $\Omega(\log d/\log \log d)$. Both these bounds are tight up to constant factors. Our lower bound has the following consequences.

1. Depth-reduction: A well-known result of Brent (JACM 1974) implies that any formula of size $s$ can be converted to one of size $s^{O(1)}$ and depth $O(\log s)$; further, this reduction continues to hold for multilinear formulas. Our lower bound implies that any depth-reduction in the multilinear setting cannot reduce the depth to $o(\log s)$ without a superpolynomial blow-up in size.

2. Separations from general formulas: Our result, along with a non-trivial upper bound for $\mathrm{IMM}_{d}$ implied by a result of Gupta, Kamath, Kayal and Saptharishi (SICOMP 2016), shows that for any size $s$ and product-depth $\Delta = o(\log s)$, general formulas of size $s$ and product-depth $\Delta$ cannot be converted to multilinear formulas of size $s^{O(1)}$ and product-depth $\Delta$, when the underlying field has characteristic zero.


Introduction
Algebraic complexity theory is the study of the complexity of those computational problems that can be phrased as computing a multivariate polynomial f(x_1, ..., x_N) ∈ F[x_1, ..., x_N] over elements x_1, ..., x_N ∈ F. Many central algorithmic problems, such as the Determinant, the Permanent, Matrix multiplication etc., can be cast in this framework.

This lower bound strengthens a result of Nisan and Wigderson [NW97], who prove a similar lower bound in the more restricted set-multilinear setting.
Our result is also qualitatively different from the previous lower bounds for multilinear formulas, since IMM_{2,d} does in fact have polynomial-sized formulas of product-depth O(log d) (via the divide-and-conquer approach), whereas we show a superpolynomial lower bound for product-depth o(log d). This observation leads to interesting consequences for multilinear formula complexity in general, which we now describe.
Depth Reduction. An important theme in circuit complexity is the interplay between the size of a formula or circuit and its depth [Bre74,Spi73,VSBR83,AV08,Tav15]. In the context of algebraic formulas, a result of Brent [Bre74] says that any formula of size s can be converted into another of size s^{O(1)} and depth O(log s). Further, the proof of this result also yields the same statement for multilinear formulas.
Can the result of Brent be improved? Theorem 1 implies that the answer is no in the multilinear setting. More precisely, since the IMM_{2,d} polynomial (over O(d) variables) has formulas of size poly(d) and depth O(log d) but no formulas of size d^{O(1)} and depth o(log d) (by Theorem 1), we see that any multilinear depth-reduction procedure that reduces the depth of a size-s formula to o(log s) must incur a superpolynomial blow-up in size. This strengthens a result of Raz and Yehudayoff [RY09], whose results imply that any depth-reduction of multilinear formulas to depth o(√(log s)/log log s) must incur a superpolynomial blow-up in size. It is also an analogue in the algebraic setting of some recent results proved for Boolean circuits [Ros15,RS17].
Multilinear vs. general formulas. Shpilka and Yehudayoff [SY10] ask the question of whether general formulas can be more efficient at computing multilinear polynomials than multilinear formulas. This is an important question, since we have techniques for proving lower bounds for multilinear formulas, whereas the same question for general formulas (or even depth-3 formulas over large fields) remains wide open.
We are able to make progress towards this question here by showing a separation between the two models for small depths when the underlying field has characteristic zero. We do this by using Theorem 1 in conjunction with a (non-multilinear) formula upper bound for IMM_{2,d} over fields of characteristic zero due to Gupta et al. [GKKS16]. In particular, the result of Gupta et al. [GKKS16] implies that for any depth Δ, the polynomial IMM_{2,d} has formulas of product-depth Δ and size 2^{O(Δd^{1/(2Δ)})}, which is considerably smaller than our lower bound in the multilinear case for small Δ. From this, it follows that for any size parameter s and product-depth Δ = o(log s), general formulas of size s and product-depth Δ cannot be converted to multilinear formulas of size s^{O(1)} and product-depth Δ. Improving our result to allow for Δ = O(log s) would resolve the question entirely.
Related Work. The multilinear formula model has been the focus of a large body of work on algebraic circuit lower bounds. Nisan and Wigderson [NW97] proved some of the early results in this model by showing size lower bounds for small-depth set-multilinear circuits computing IMM_{2,d}. They showed that any product-depth Δ circuit for IMM_{2,d} must have size 2^{Ω(d^{1/Δ})}, matching the upper bound from the divide-and-conquer algorithm for Δ = o(log d/log log d).
Our lower bounds for multilinear formulas imply similar lower bounds for multilinear circuits of product-depth ∆.
Raz [Raz06] proved the first superpolynomial lower bound for multilinear formulas by showing an n^{Ω(log n)} lower bound for the n × n Determinant and Permanent polynomials. This was further strengthened by the results of Raz [Raz04] and Raz and Yehudayoff [RY08] to a similar lower bound for an explicit polynomial family that has polynomial-sized multilinear circuits. In particular, these results show the tightness of the depth-reduction procedure for algebraic circuits in the multilinear setting [VSBR83,RY08].
Similar polynomial families were also used in the work of Raz and Yehudayoff [RY09] to prove exponential lower bounds for multilinear constant-depth circuits. By proving a tight lower bound for depth-∆ circuits computing an explicit polynomial (similar to the construction of Raz [Raz04]), Raz and Yehudayoff [RY09] showed superpolynomial separations between multilinear circuits of different depths.
In particular, the result of Raz and Yehudayoff [RY09] implies that the polynomial families of [Raz04,RY08], which have formulas of size n^{O(log n)}, cannot be computed by formulas of size less than some s(n) = n^{ω(log n)} if the product-depth Δ = o(log n/log log n). This yields the superpolynomial separation between formulas of size s and depth o(√(log s)/log log s) alluded to above. Unfortunately, these polynomials also have nearly optimal formulas of depth just O(log n) = O(√(log s)), so they cannot be used to obtain the optimal size-s vs. depth-o(log s) separation we obtain here.
Dvir et al. [DMPY12] showed that there is an explicit polynomial on n variables that has multilinear ABPs of size poly(n) but no multilinear formulas of size less than n^{Ω(log n)}. One might hope that this yields a superpolynomial lower bound for multilinear formulas computing IMM_{N,d} for some N, d, but this unfortunately does not seem to be the case. The reason is that while any polynomial f on n variables that has an ABP of size poly(n) can be reduced via variable substitutions to IMM_{N,d} for N, d = n^{O(1)}, this reduction might substitute different variables in the IMM_{N,d} polynomial by the same variable x of f and in the process destroy multilinearity.
Gupta et al. [GKKS16] showed the surprising result that general (i.e. non-multilinear) formulas of depth 3 can beat the divide-and-conquer approach for computing IMM_{n,d}, when the underlying field has characteristic zero. Their result implies that, in this setting, IMM_{n,d} has product-depth 1 formulas of size n^{O(√d)}, as opposed to the n^{O(d)}-sized formula that is obtained from the traditional divide-and-conquer approach. Using the self-reduction properties of IMM_{n,d}, this can easily be seen to imply the existence of n^{O(Δd^{1/(2Δ)})}-sized formulas of product-depth Δ. This construction uses the fact that the formulas are allowed to be non-multilinear. Our result shows that this cannot be avoided.
Proof Overview. The proof follows a two-step process as in [SY10,DMPY12]. The first step is a "product lemma" where we show that any multilinear polynomial f on n variables that has a small multilinear formula can also be computed as a sum of a small number of polynomials, each of which is a product of many polynomials on disjoint sets of variables; if such a term is the product of t polynomials, we call it a t-product polynomial. It is known [SY10, Lemma 3.5] that if f has a formula of size s, then we can ensure a decomposition into a sum of at most s many Ω(log n)-product polynomials. We show that if the formula is further known to have depth Δ, then the number of factors can be increased to Ω(Δn^{1/Δ}). In particular, note that this is ω(log n) as long as Δ = o(log n): this allows us to obtain superpolynomial lower bounds for up to this range of parameters.
Similar lemmas were already known in the small-depth setting [RY09], but they do not achieve the parameters of our lemma here. However, the lemma of [RY09] satisfies the additional condition that every factor of each t-product polynomial in the decomposition depends on a "large" number of variables. Here, we only get that each factor depends on a non-zero number of variables, but this is sufficient to prove the lower bound we want.
The second step is to use this decomposition to prove a lower bound. Specifically, we would like to say that the polynomial IMM 2,d has no small decomposition into terms of the above form. This is via a rank argument as in Raz [Raz06]. Specifically, we partition the variables X in our polynomial into two sets Y and Z and consider any polynomial f (X) as a polynomial in the variables in Y with coefficients from F[Z]. The dimension of the space of coefficients (as vectors over the base field F) is considered a measure of the complexity of f .
It is easy to come up with a partition of the underlying variable set X into Y, Z so that the complexity of IMM 2,d is as large as possible. Unfortunately, we also have simple multilinear formulas that have maximum dimension w.r.t. this partition. Hence, this notion of complexity is not by itself sufficient to prove a lower bound. At this point, we follow an idea of Raz [Raz06] and show something stronger for IMM 2,d : we show that its complexity is quite robust in the sense that it is full rank w.r.t. many different partitions.
More precisely, we carefully design a large space of restrictions ρ : X → Y ∪Z∪F such that for any restriction ρ, the resulting substitution of IMM 2,d continues to have high complexity w.r.t. the measure defined above. These restrictions are motivated by the combinatorial structure of the underlying polynomial, specifically the connection to Graph Reachability.
The last step is to show that, for any t-product polynomial f , a random restriction from the above space of restrictions transforms it with high probability into a polynomial whose measure is small. Once we have this result, it follows that given a small multilinear formula, there is a restriction that transforms each term in its decomposition (obtained from the product lemma) into a small complexity polynomial. The subadditivity of rank then shows that the entire formula now has small complexity, and hence it cannot be computing IMM 2,d which by the choice of our restriction has high complexity.

Basic setup
Unless otherwise stated, let F be an arbitrary field. Let d ∈ N be a growing integer parameter. We define X^{(1)}, ..., X^{(d)} to be disjoint sets of variables, where each X^{(i)} = {x^{(i)}_{j,k} | j, k ∈ [2]} is a set of four variables that we think of as forming a 2 × 2 matrix M^{(i)}. Let X = ∪_{i∈[d]} X^{(i)}.
A polynomial P ∈ F[X] is called multilinear if the degree of P in each variable x ∈ X is at most 1. We define the multilinear polynomial IMM_d ∈ F[X] as follows:

IMM_d := Σ_{u_1,...,u_d ∈ [2]} x^{(1)}_{u_0,u_1} · x^{(2)}_{u_1,u_2} ··· x^{(d)}_{u_{d−1},u_d}, where u_0 = 1.   (1)

Equivalently, IMM_d = M(1,1) + M(1,2), where M denotes the matrix product M^{(1)} · M^{(2)} ··· M^{(d)}. This is a slight variant of the Iterated Matrix Multiplication polynomial seen in the literature, as it is usually defined to be either the matrix entry M(1,1) or the trace M(1,1) + M(2,2). Our results can easily be seen to hold for these variants, but we deal with the definition above for some technical simplicity.
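As a quick sanity check of this definition (assuming the variant above, with u_0 fixed to 1), one can verify numerically that the sum over index sequences agrees with the first-row sum of the actual matrix product. The following sketch is illustrative and not from the paper:

```python
import itertools
import random

def imm_by_paths(mats):
    """Sum over all index sequences u_1..u_d, with u_0 fixed to 1, of the
    product x^(1)_{u_0,u_1} * ... * x^(d)_{u_{d-1},u_d}."""
    d = len(mats)
    total = 0
    for us in itertools.product(range(2), repeat=d):
        prev = 0  # u_0 = 1 in the paper's 1-indexed notation
        term = 1
        for i, u in enumerate(us):
            term *= mats[i][prev][u]
            prev = u
        total += term
    return total

def imm_by_matrix_product(mats):
    """First-row sum of the matrix product M^(1) * ... * M^(d)."""
    prod = [[1, 0], [0, 1]]
    for m in mats:
        prod = [[sum(prod[i][k] * m[k][j] for k in range(2))
                 for j in range(2)] for i in range(2)]
    return prod[0][0] + prod[0][1]

random.seed(0)
d = 6
mats = [[[random.randint(-3, 3) for _ in range(2)] for _ in range(2)]
        for _ in range(d)]
assert imm_by_paths(mats) == imm_by_matrix_product(mats)
```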
Another standard way of defining the polynomial IMM_d is via graphs. Define the edge-labelled directed acyclic graph G_d = (V, E, λ) as follows: the vertex set V consists of d + 1 layers of two vertices each, with layer i (0 ≤ i ≤ d) containing the vertices v_{i,1} and v_{i,2}; for each i ∈ [d] and j, k ∈ [2], the edge set E contains an edge from v_{i−1,j} to v_{i,k} with label x^{(i)}_{j,k}. (Figure 1: The directed acyclic graph G_d that defines the polynomial IMM_d, with its labeling.) See Figure 1 for a depiction of this graph. Given a path π in the graph G_d, λ(π) is defined to be the product of all labels of edges in π. In this notation, IMM_d can be seen to be the sum of λ(π) over all paths π from v_{0,1} to a vertex in layer d.

Multilinear formulas and circuits
We refer the reader to the standard resources (e.g. [SY10,Sap15]) for basic definitions related to algebraic circuits and formulas. Having said that, we do make a few remarks.
• All the gates in our formulas and circuits will be allowed to have unbounded fan-in.
• The size of a formula or circuit will refer to the number of gates (including input gates) in it, and the depth of the formula or circuit will refer to the maximum number of gates on a path from an input gate to the output gate.
• Further, the product-depth of the formula or circuit (as in [RY08]) will refer to the maximum number of product gates on a path from an input gate to the output gate. Note that the product-depth of a formula or circuit can be assumed to be within a factor of two of the overall depth (by collapsing sum gates if necessary).
Multilinear circuits and formulas. An algebraic formula F (resp. circuit C) computing a polynomial from F[X] is said to be multilinear if each gate in the formula (resp. circuit) computes a multilinear polynomial. Moreover, a formula F is said to be syntactic multilinear if for each multiplication gate Φ of F with children Ψ_1, ..., Ψ_t, we have Supp(Ψ_i) ∩ Supp(Ψ_j) = ∅ for each i ≠ j, where Supp(Φ) denotes the set of variables that appear in the subformula rooted at Φ. Finally, for Δ ≥ 1, we say that a multilinear formula (resp. circuit) is a (ΣΠ)^Δ Σ formula (resp. circuit) if the output gate is a sum gate and, along any path from a leaf to the output, the sum and product gates alternate, with product gates appearing in exactly Δ layers and the bottom gate being a sum gate. We can define (ΣΠ)^Δ, ΣΠΣ, ΣΠΣΠ formulas and circuits similarly.
For a gate Φ in a syntactically multilinear formula, we define a set of variables Vars(Φ) in a top-down fashion as follows.
Definition 2. Let C be a syntactically multilinear formula computing a polynomial on the variable set X. For the output gate Φ, which is a sum gate, we define Vars(Φ) = X. If Φ is a sum gate with children Ψ_1, ..., Ψ_k and Vars(Φ) = S ⊆ X, then for each i ∈ [k] we define Vars(Ψ_i) = S. If Φ is a product gate with children Ψ_1, ..., Ψ_k and Vars(Φ) = S, then we fix a partition of S into sets S_1, ..., S_k with Supp(Ψ_i) ⊆ S_i for each i ∈ [k] (such a partition exists by syntactic multilinearity) and define Vars(Ψ_i) = S_i.

It is easy to see that Vars(·) satisfies the properties listed in the following proposition.
Proposition 3. For each gate Φ in a syntactically multilinear formula C, let Vars(Φ) be defined as in Definition 2 above. Then:

1. For any gate Φ, Supp(Φ) ⊆ Vars(Φ).

2. If Φ is a sum gate with children Ψ_1, ..., Ψ_k, then Vars(Ψ_i) = Vars(Φ) for each i ∈ [k].

3. If Φ is a product gate with children Ψ_1, ..., Ψ_k, then the sets Vars(Ψ_1), ..., Vars(Ψ_k) form a partition of Vars(Φ).
We will use the following structural results that convert general multilinear circuits (resp. formulas) to (ΣΠ) ∆ Σ circuits (resp. formulas).
Lemma 4 (Raz and Yehudayoff [RY09], Claims 2.3 and 2.4). For any multilinear formula F of product-depth at most Δ and size at most s, there is a syntactic multilinear (ΣΠ)^Δ Σ formula F′ of size at most (Δ + 1)^2 · s computing the same polynomial as F.
Lemma 5 (Raz and Yehudayoff [RY09], Lemma 2.1). For any multilinear circuit C of product-depth at most Δ and size at most s, there is a syntactic multilinear (ΣΠ)^Δ Σ formula F of size at most (Δ + 1)^2 · s^{2Δ+1} computing the same polynomial as C.
We will also need the following structural result.
Lemma 6 (Raz, Shpilka and Yehudayoff [RSY08], Claim 5.6). Let F be a syntactic multilinear formula of size s computing a polynomial f, and let Φ be any gate in F computing a polynomial g. Then f can be written as f = A · g + B, where A is a multilinear polynomial in the variables X \ Vars(Φ), and B is computed by a syntactic multilinear formula of size at most s − 1 (obtained from F by replacing the gate Φ with the constant 0).

A standard divide-and-conquer approach yields the best-known multilinear formulas and circuits for IMM_d for all depths.

Lemma 7. Let Δ ≤ log d. There is a syntactic multilinear (ΣΠ)^Δ circuit C_Δ of size poly(d) · 2^{O(d^{1/Δ})} and a syntactic multilinear (ΣΠ)^Δ formula F_Δ of size 2^{O(Δd^{1/Δ})} computing IMM_d.
Proof sketch. We will first recursively construct C_Δ. Recall that the IMM_d polynomial is defined over the matrices M^{(1)}, M^{(2)}, ..., M^{(d)}. Let us divide these matrices into t = d^{1/Δ} contiguous blocks of d/t matrices each, say B_1, B_2, ..., B_t. The polynomial IMM_d can now be expressed in terms of these blocks of matrices as follows.
IMM_d = Σ_{u_1,...,u_t ∈ [2]} Π_{i∈[t]} P^{(i)}_{u_{i−1},u_i},   (2)

where P^{(i)}_{u,v} denotes the (u,v)-th entry of the product of the matrices in the i-th block. (In the special case i = 1, take u_0 = 1.) It is important to note that each of the polynomials P^{(i+1)}_{u,v} defined over the block B_{i+1}, for all i ∈ [t−1], is (almost) an instance of IMM_{d/t} over the suitable set of variables. This enables us to recurse for Δ steps while obtaining a ΣΠ layer at each step. Thus, we get a recursive bound of the form s(d) ≤ t · s(d/t) + 2^{O(t)} for the size of the (ΣΠ)^Δ circuit computing IMM_d, where the 2^{O(t)} term accounts for the top ΣΠ layer implementing (2). (All our logarithms will be to base 2.)
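The block decomposition in Equation 2 is, at bottom, associativity of matrix multiplication, and can be checked numerically. The following sketch (not from the paper) fixes u_0 = 1 as in the definition above:

```python
import itertools
import random

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_prod(mats):
    prod = [[1, 0], [0, 1]]
    for m in mats:
        prod = mat_mul(prod, m)
    return prod

random.seed(0)
d, t = 12, 4                      # t contiguous blocks of d/t matrices each
mats = [[[random.randint(-2, 2) for _ in range(2)] for _ in range(2)]
        for _ in range(d)]
blocks = [mats[i * (d // t):(i + 1) * (d // t)] for i in range(t)]
P = [mat_prod(b) for b in blocks]  # P[i][u][v] = (u,v)-entry of block i

# Right-hand side of the block decomposition (Equation 2), with u_0 = 1:
rhs = 0
for us in itertools.product(range(2), repeat=t):
    prev, term = 0, 1
    for i, u in enumerate(us):
        term *= P[i][prev][u]
        prev = u
    rhs += term

full = mat_prod(mats)
assert rhs == full[0][0] + full[0][1]  # first-row sum of the full product
```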
Upon unfurling, this recursion gives us the needed bound of poly(d) · 2^{O(d^{1/Δ})} on the size of C_Δ.

Let us now construct a multilinear formula for this polynomial (note that simply replicating the nodes of C_Δ would prove to be wasteful). Consider the polynomial expression in Equation 2. If each of the polynomials P^{(k)}_{u,v} is replaced by a fresh variable y^{(k)}_{u,v}, the resulting polynomial is an instance of IMM_t over these new variables. Then there is a ΣΠ formula F_1 (say) that computes IMM_t, of size c^t (for some constant c), whose leaves are labelled by the variables of the form y^{(k)}_{u_i,u_{i+1}}. Since each of these leaves is an instance of IMM_{d/t} (over a suitable set of variables) itself, it can further be partitioned into t contiguous chunks of d/t^2 many matrices each, which, when expressed as a ΣΠ formula (by introducing new variables), is of size c^t. We substitute the formulas obtained in this way for each of the polynomials P^{(k)}_{u_i,u_{i+1}} into F_1 to obtain a formula F_2 (say) of size c^t · c^t = c^{2t}. This is a ΣΠΣΠ formula whose leaves are variables corresponding to instances of IMM_{d/t^2}. Continuing this process for Δ steps gives us a (ΣΠ)^Δ formula F_Δ with 2^{O(Δt)} = 2^{O(Δd^{1/Δ})} many leaves.
We will show that the above bounds are nearly tight in the multilinear setting. If we remove the multilinear restriction on (ΣΠ) ∆ Σ formulas computing IMM d , we can get better upper bounds, as long as the underlying field has characteristic zero.
Lemma 8 (implicit in Gupta et al. [GKKS16]). Let F be a field of characteristic zero and let Δ ≤ log d. Then IMM_d has a (possibly non-multilinear) (ΣΠ)^Δ Σ formula of size 2^{O(Δd^{1/(2Δ)})}.

Proof sketch of Lemma 8. As in the proof of Lemma 7, we crucially use the self-reducibility of IMM_d. We need the following claim (implicit in Gupta et al. [GKKS16]) to prove this lemma.
Claim 9. For t > 1, IMM_t has a depth-three non-multilinear formula of size at most 2^{O(√t)} over any field of characteristic zero.

Proof of Claim 9. Applying Lemma 7 with Δ = 2 yields a ΣΠΣΠ formula F for IMM_t of size 2^{O(√t)}. It can be checked from the proof of Lemma 7 that this formula satisfies the additional property that all the product gates in the formula have fan-in O(√t). Over any field F of characteristic zero (the statement also holds if the characteristic of F is positive but suitably large), Gupta et al. [GKKS16] showed that any ΣΠΣΠ formula of size s where all product gates have fan-in at most k can be converted into a ΣΠΣ formula of size poly(s) · 2^{O(k)}. Applying this result to the formula F obtained above, we get that IMM_t can indeed be computed by a ΣΠΣ formula of size at most 2^{O(√t)}, over any field F of characteristic zero.
Consider the self-reduction of the IMM_d polynomial as follows. Split the d matrices being multiplied in IMM_d into t = d^{1/Δ} blocks with d/t many matrices each. Let P^{(k)}_{i,j} denote the (i,j)-th entry of the product of the matrices in the k-th block, and let Y = {y^{(k)}_{i,j} | k ∈ [t], i, j ∈ [2]} be a set of new variables. Let IMM_t(Y) be the polynomial that is obtained by replacing all the polynomials P^{(k)}_{i,j} above with the corresponding variables. From Claim 9, we know that IMM_t(Y) has a ΣΠΣ formula F_1 of size at most c^{√t} for some constant c. It is easy to see that IMM_d can now be obtained
by substituting, for each of the variables in Y (which appear at the leaves of F_1), the corresponding polynomial P^{(k)}_{i,j}. Using the above-mentioned self-reducibility property, we self-reduce IMM_{d/t} again and obtain an instance of IMM_t over a suitable set of new variables. This too has a ΣΠΣ formula of size c^{√t}. The total number of leaves of the new (ΣΠΣ)(ΣΠΣ) formula is thus at most c^{√t} · c^{√t} = c^{2√t}. Continuing this process for Δ steps yields a (ΣΠΣ)^Δ formula of size 2^{O(Δ√t)} = 2^{O(Δd^{1/(2Δ)})}. We can merge consecutive layers of Σ gates into one layer of Σ gates and thus obtain a (ΣΠ)^Δ Σ formula F_Δ of size 2^{O(Δd^{1/(2Δ)})}.
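The size bookkeeping in the sketch above amounts to the following: with t = d^{1/Δ} blocks per level, each of the Δ levels of the recursion contributes a factor of c^{√t}, so

```latex
\underbrace{c^{\sqrt{t}} \cdots c^{\sqrt{t}}}_{\Delta\ \text{levels}}
  \;=\; c^{\Delta\sqrt{t}}
  \;=\; 2^{O(\Delta\sqrt{t})}
  \;=\; 2^{O(\Delta d^{1/(2\Delta)})},
  \qquad\text{since } t = d^{1/\Delta} \text{ gives } \sqrt{t} = d^{1/(2\Delta)}.
```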
3 Lower bounds for multilinear formulas and circuits computing IMM_d

The main theorem of this section is the following lower bound.
Theorem 10. Let d ≥ 1 be a growing parameter and fix any Δ ≤ log d. Any syntactic multilinear (ΣΠ)^Δ Σ formula for IMM_d must have size 2^{Ω(Δd^{1/Δ})}.
Putting together Theorem 10 with Lemmas 4 and 5, we have the following (immediate) corollaries.
Corollary 11. Let d ≥ 1 be a growing parameter and fix any Δ ≤ log d/log log d. Any multilinear circuit of product-depth Δ for IMM_d must have size 2^{Ω(d^{1/Δ})}. In particular, any polynomial-sized multilinear circuit for IMM_d must have product-depth Ω(log d/log log d).
Corollary 12. Let d ≥ 1 be a growing parameter and fix any Δ ≤ log d. Any multilinear (ΣΠ)^Δ Σ formula for IMM_d must have size 2^{Ω(Δd^{1/Δ})}. In particular, any polynomial-sized multilinear formula for IMM_d must have product-depth Ω(log d).
Since the product-depth of a formula is at most its depth, Lemma 7 and Corollary 12 further imply the following. Choosing parameters carefully, we also obtain the following.
Corollary 14 (Separation of multilinear formulas and general formulas over characteristic zero). Let F be a field of characteristic zero. Let s ∈ N be any growing parameter and let Δ ∈ N be such that Δ = o(log s). There is an explicit multilinear polynomial F_{s,Δ} such that F_{s,Δ} has a (ΣΠ)^Δ Σ formula of size s, but any (ΣΠ)^Δ Σ multilinear formula for F_{s,Δ} must have size s^{ω(1)}.
Proof. We choose the polynomial F_{s,Δ} to be IMM_d for suitable d and then simply apply Theorem 10 and Lemma 8 to obtain the result. Details follow.

Let c be the constant in the exponent of the size bound of Lemma 8, and choose d to be the largest integer such that 2^{cΔd^{1/(2Δ)}} ≤ s; thus d^{1/(2Δ)} = Θ((log s)/Δ), which we denote by f(s). Note that f(s) = ω(1) since Δ = o(log s).
Having chosen d as above, we define F_{s,Δ} = IMM_d. Clearly, F_{s,Δ} has a (non-multilinear) formula of product-depth Δ and size at most s. On the other hand, by Theorem 10, any multilinear product-depth Δ formula for IMM_d must have size at least 2^{Ω(Δd^{1/Δ})} = s^{Ω(d^{1/(2Δ)})} = s^{Ω(f(s))} = s^{ω(1)}, which proves the claim.
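For completeness, the exponent arithmetic in the last step can be spelled out. Assuming d is chosen so that s = 2^{Θ(Δd^{1/(2Δ)})} (matching the upper bound of Lemma 8), we have

```latex
2^{\Omega(\Delta d^{1/\Delta})}
  \;=\; 2^{\Omega\left(\Delta d^{1/(2\Delta)} \cdot d^{1/(2\Delta)}\right)}
  \;=\; \left( 2^{\Theta(\Delta d^{1/(2\Delta)})} \right)^{\Omega(d^{1/(2\Delta)})}
  \;=\; s^{\Omega(d^{1/(2\Delta)})},
```

and d^{1/(2Δ)} = Θ((log s)/Δ) = ω(1) whenever Δ = o(log s).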
It can also be proved similarly that for d as chosen above, IMM_d in fact has no multilinear formulas of size s^{O(1)} and product-depth up to (2 − ε)Δ for any absolute constant ε > 0.

Proof of Theorem 10
Our proof follows a two-step argument as in [Raz06,RY09] (see the exposition in [SY10, Section 3.6]).

Step 1 - The product lemma
The first step is a "product-lemma" for multilinear formulas.
Formally, define a polynomial f ∈ F[X] to be a t-product polynomial if we can write f as f_1 ··· f_t, where we can find a partition of X into non-empty sets X^f_1, ..., X^f_t such that f_i is a multilinear polynomial from F[X^f_i]. We say that X^f_i is the set ascribed to f_i in the t-product polynomial f. We use Vars(f_i) (with a slight abuse of notation) to denote X^f_i. We drop f from the superscript if f is clear from the context.
We define f ∈ F[X] to be r-simple if f = L_1 ··· L_{r′} · G, where r′ ≤ r, is an (r′ + 1)-product polynomial in which L_1, ..., L_{r′} are polynomials of degree at most 1 and the sets X^f_1, ..., X^f_{r′} ascribed to these linear polynomials satisfy |∪_{i≤r′} X^f_i| ≥ 400r. We prove the following.
Lemma 15. Suppose Δ ≤ log d and f ∈ F[X] can be computed by a syntactic multilinear (ΣΠ)^Δ Σ formula F of size at most s. Then, f is the sum of at most s many t-product polynomials and at most s many t-simple polynomials for t = Ω(Δd^{1/Δ}).
While our proof of the product lemma is motivated by earlier work [SY10, HY11, RY09], we obtain slightly better parameters, which turns out to be crucial for proving tight lower bounds for formulas. In particular, [RY09, Claim 5.5] yields the above with t = Ω(d^{1/Δ}).
Proof of Lemma 15. Let F be the (ΣΠ)^Δ Σ syntactic multilinear formula of size at most s computing f. We use layer i to denote the layer of gates at distance i from the leaves; so in our formula, layer 1 is a sum layer, layer 2 is a product layer, and so on. Let r = Δd^{1/Δ}/400.
We will prove by induction on the size s of the formula F that f is the sum of at most s polynomials, each of which is either a t-product polynomial or a t-simple polynomial for t = Δd^{1/Δ}/1000.
The base case of the induction, corresponding to s = 0, is trivial.
Case 1: Suppose there exists a gate Φ in layer 2 such that Φ computes a polynomial g and has fan-in at least t. Then we use Lemma 6 and decompose f as Ag + B. Here Ag is a t-product polynomial. Since B is computed by a formula of size at most s − 1, we are done by induction.
Case 2: Suppose the above case does not hold, i.e. all the gates at layer 2 have fan-in at most t. Now, if there exists a gate Φ in layer 2 such that |Vars(Φ)| ≥ 400r, then we decompose F using Lemma 6 and obtain f = Ag + H, where Ag is t-simple since |Vars(Φ)| ≥ 400r ≥ 400t. Again, since H has a formula of size at most s − 1, we are done by induction.
Case 3: Now assume that neither of the above cases is applicable. Since neither Case 1 nor Case 2 applies to F, each gate Φ in layer 2 satisfies |Vars(Φ)| < p := 400r. This immediately implies that Δ ≥ 2: in the case of a ΣΠΣ formula, we would have |Vars(Φ)| = n by Proposition 3 item 2, but p = 400r ≤ d < n.
If ∆ ≥ 2, we use the following lemma.
Lemma 16. Let n, p, Δ be parameters with Δ ≥ 2 and Δ ≤ 2 log(n/p). Suppose f is computed by a syntactic multilinear (ΣΠ)^Δ Σ formula of size at most s over n variables in which every gate Φ at layer 2 satisfies |Vars(Φ)| ≤ p. Then f is the sum of at most s many T-product polynomials for some T ≥ Δ(n/p)^{1/(Δ−1)}/100.

The above lemma is applicable in our situation since we have Δ ≤ log d, n ≥ 2d, and hence Δ ≤ 2 log(n/p) for p = 400r = Δd^{1/Δ}. Lemma 16 now yields a decomposition of f as a sum of at most s many T-product polynomials, where T ≥ Δ(n/p)^{1/(Δ−1)}/100 = Ω(Δd^{1/Δ}). Since T ≥ t, these T-product polynomials are also t-product polynomials. This finishes the proof of the claim modulo the proof of Lemma 16, which we present below.

Proof of Lemma 16. We shall prove by induction on the depth Δ that we can take T = t(n, Δ) := (Δ − 1)((n/p)^{1/(Δ−1)} − 1). Since Δ ≤ 2 log(n/p), this implies that T ≥ Δ(n/p)^{1/(Δ−1)}/100. Let X denote the set of all n underlying variables.

The base case is when Δ = 2. Here, we have a ΣΠΣΠΣ formula such that for all Φ at layer 2, |Vars(Φ)| ≤ p. Let Ψ be the output (sum) gate of the formula and Ψ_1, ..., Ψ_r be the product gates feeding into it; further, let f_i be the polynomial computed by Ψ_i. We claim that each f_i is an (n/p)-product polynomial. If this is true, we are done, since f = f_1 + ··· + f_r and r is at most s.
To show that f_i is an (n/p)-product polynomial, it suffices to show that each Ψ_i has fan-in at least n/p. This follows since each Φ at layer 2 satisfies |Vars(Φ)| ≤ p, and for each sum gate Φ′ at layer 3, we have Vars(Φ′) = Vars(Φ) for any gate Φ at layer 2 feeding into Φ′ (Proposition 3 item 2). By Proposition 3 item 3, the fan-in of each Ψ_i at layer 4 must thus be at least n/p. This concludes the base case.

Now consider Δ ≥ 3. Say we have a polynomial f that is computed by a (ΣΠ)^Δ Σ formula F of size at most s and top fan-in (say) r. Let Ψ be the output gate of F and Ψ_1, ..., Ψ_r the product gates feeding into it; let f_i be the polynomial computed by Ψ_i. It suffices to show that each f_i is the sum of at most s_i many t(n, Δ)-product polynomials, where s_i is the size of the subformula rooted at Ψ_i. We show this now.
We have thus shown that no matter what k is, t ′ ≥ t(n, ∆), from which the induction step follows.
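The numerical claim T = (Δ−1)((n/p)^{1/(Δ−1)} − 1) ≥ Δ(n/p)^{1/(Δ−1)}/100 used at the start of the proof follows from a short calculation. Writing q = (n/p)^{1/(Δ−1)}:

```latex
\Delta \le 2\log(n/p) = 2(\Delta-1)\log q
  \;\implies\; \log q \ge \tfrac{\Delta}{2(\Delta-1)} \ge \tfrac12
  \;\implies\; q \ge \sqrt{2}
  \;\implies\; q - 1 \ge \bigl(1 - \tfrac{1}{\sqrt{2}}\bigr)q,
```

and since Δ − 1 ≥ Δ/2 for Δ ≥ 2,

```latex
T = (\Delta-1)(q-1)
  \;\ge\; \frac{\Delta}{2}\Bigl(1-\frac{1}{\sqrt{2}}\Bigr)q
  \;\ge\; \frac{\Delta\,(n/p)^{1/(\Delta-1)}}{100}.
```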
Step 2 - Rank measure and the hard polynomial

The second step is to show that any such decomposition for IMM_d must have many terms. Our proof of this step is inspired by the proof of the multilinear formula lower bound of Raz [Raz06] for the determinant, and also the slightly weaker lower bound of Nisan and Wigderson [NW97] for IMM_d in the set-multilinear case. Following [Raz06], we define a suitable random restriction of the IMM_d polynomial by assigning variables from the underlying variable set X to Y ∪ Z ∪ {0, 1}, where Y and Z are disjoint sets of new variables of equal size. The restriction maps distinct variables in X to distinct variables in Y ∪ Z or to constants, and hence preserves multilinearity.
Having performed the restriction, we consider the partial derivative matrix of the restricted polynomial, which is defined as follows. Let g ∈ F[Y ∪ Z] be a multilinear polynomial. Define the 2^{|Y|} × 2^{|Z|} matrix M^{(Y,Z)}(g) whose rows and columns are labelled by the distinct multilinear monomials in Y and Z respectively, and whose (m_1, m_2)-th entry is the coefficient of the monomial m_1 · m_2 in g.
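To make the definition concrete, here is a small computational sketch (not from the paper; the polynomial and variable names are illustrative) that builds M^{(Y,Z)}(g) for a toy multilinear g and computes its rank over the rationals:

```python
import itertools
from fractions import Fraction

def pd_matrix(poly, y_vars, z_vars):
    """Partial derivative matrix M^{(Y,Z)}(g): rows/columns indexed by
    subsets of y_vars / z_vars; entry (S,T) is the coefficient of the
    monomial prod(S)*prod(T) in the multilinear polynomial `poly`,
    represented as a dict {frozenset of variables: coefficient}."""
    rows = [frozenset(s) for r in range(len(y_vars) + 1)
            for s in itertools.combinations(y_vars, r)]
    cols = [frozenset(s) for r in range(len(z_vars) + 1)
            for s in itertools.combinations(z_vars, r)]
    return [[poly.get(S | T, 0) for T in cols] for S in rows]

def rank(mat):
    """Rank over the rationals via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Toy example: g = y1*z1 + y2*z2 + y1*y2*z1*z2 (multilinear in y1,y2,z1,z2)
g = {frozenset({'y1', 'z1'}): 1,
     frozenset({'y2', 'z2'}): 1,
     frozenset({'y1', 'y2', 'z1', 'z2'}): 1}
M = pd_matrix(g, ['y1', 'y2'], ['z1', 'z2'])
assert rank(M) == 3
```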
Our restriction is defined to have the following two properties.
1. The rank of M^{(Y,Z)}(g) is equal to its maximum possible value (i.e. min{2^{|Y|}, 2^{|Z|}}) with probability 1, where g is the restricted version of IMM_d.
2. On the other hand, let f be either a t-product polynomial or a t-simple polynomial, and let f′ denote its restriction under ρ. Then, the rank of M^{(Y,Z)}(f′) is small with high probability.
Now, if IMM_d has a (ΣΠ)^Δ Σ formula F of small size, then it is a sum of a small number of t-product and t-simple polynomials by Lemma 15, and hence, by a union bound, we will be able to find a restriction under which the partial derivative matrix of each of these polynomials has small rank. By the subadditivity of rank, this will imply that M^{(Y,Z)}(g) will itself have low rank, contradicting the first property of our restriction.
To make the above precise, we first define our restrictions. Let Ỹ = {y_1, ..., y_d} and Z̃ = {z_1, ..., z_d} be two disjoint sets of variables. A restriction ρ is a function mapping the variables X to elements of Ỹ ∪ Z̃ ∪ {0, 1}. We consider the following process for sampling a random restriction.
Notation. Recall that M^{(i)} is the 2 × 2 matrix whose (u, v)-th entry is x^{(i)}_{u,v}. Let I and E denote the standard 2 × 2 identity matrix and the 2 × 2 flip permutation matrix, respectively. For a ∈ {1, 2}, we use ā to denote the other element of the set.
(Algorithm S: pseudocode of the sampling procedure for the restriction ρ. It picks a random π ∈ {1, 2}^d and a random a ∈ {0, 1}^d, which together determine the set A ⊆ [d], the string b, and the values assigned by ρ to the variables x^{(i)}_{u,v}; roughly, for i ∈ A that is the j-th smallest element of A, the choice of variable depends on the parity of j, and variables outside the path are set to constants.)

We observe the following simple properties of ρ.

Observation 17. The restriction ρ satisfies the following.

1. Distinct variables in X that are not set to constants are mapped to distinct variables in Ỹ ∪ Z̃.

2. Only the variables of the form x^{(i)}_{π(i−1),π(i)} can be set to variables in Ỹ ∪ Z̃ by ρ. The rest are set to constants.
Note that b is distributed uniformly over {0, 1}^d. Given a polynomial f ∈ F[X], the restriction ρ yields a natural polynomial f|_ρ ∈ F[Y ∪ Z] by substitution. Note, moreover, that if f is multilinear then so is f|_ρ, since distinct variables in X cannot be mapped to the same variable in Y ∪ Z (Observation 17).
Lemma 18. Let ρ be sampled as above, and let m = ⌊|A|/2⌋.

1. With probability 1, rank(M^{(Y,Z)}(IMM_d|_ρ)) = 2^m.

2. If f ∈ F[X] is any t-product polynomial, then for some absolute constant ε > 0, Pr_ρ[rank(M^{(Y,Z)}(f|_ρ)) > 2^{m−εt}] ≤ 2^{−εt}.

3. If f ∈ F[X] is any r-simple polynomial, then for some absolute constant δ > 0, Pr_ρ[rank(M^{(Y,Z)}(f|_ρ)) > 2^{m−δr}] ≤ 2^{−δr}.

Given Lemmas 15 and 18, we can finish the proof of Theorem 10 as follows.
For each i ∈ [s], Lemma 18 implies that Pr_ρ[rank(M^{(Y,Z)}(f_i|_ρ)) > 2^{m−γt}] ≤ 2^{−γt}, where γ = min(ε, δ) and ε and δ are absolute constants.
Thus, unless s ≥ 2^{Ω(t)}, we see by a union bound that there exists a ρ such that for each i ∈ [s], rank(M^{(Y,Z)}(f_i|_ρ)) ≤ 2^{m−Ω(t)}; by the subadditivity of rank, rank(M^{(Y,Z)}(IMM_d|_ρ)) ≤ s · 2^{m−Ω(t)} < 2^m. From Lemma 18, we also know that for any choice of ρ in the sampling algorithm S, we have rank(M^{(Y,Z)}(IMM_d|_ρ)) ≥ 2^m. In particular, since F computes IMM_d, we must have s ≥ 2^{Ω(t)} = 2^{Ω(Δd^{1/Δ})}.

Proof of Lemma 18
Part 1: IMM_d has high rank

Let π ∈ {1, 2}^d and a ∈ {0, 1}^d be arbitrary. Note that in our sampling algorithm, ρ, A, b are completely determined given π and a.
Let us now examine the effect of ρ on IMM_d. We take the graph-theoretic view of the polynomial IMM_d as given in Section 2.1. Figure 2 illustrates how this restriction affects the variables labelling the edges of the graph G_d defined in Section 2.1. By substituting according to ρ in (1), we get that IMM_d|_ρ = ∏_{j=1}^m (1 + y_j z_j). For any S ⊆ [m], let Z_S (resp. Y_S) denote the monomial ∏_{i∈S} z_i (resp. ∏_{i∈S} y_i). Now consider the matrix M^(Y,Z)(IMM_d|_ρ), which we will simply denote by M. For the sake of simplicity let us assume that |A| = 2m. (The case |A| = 2m + 1 is similar.) Let the rows and columns of M be labelled by the subsets of [m], and let M(S, T) be the coefficient of Y_S · Z_T in IMM_d|_ρ. It is easy to see that M(S, T) = 0 if S ≠ T and 1 otherwise. That is, M is the identity matrix of size 2^m × 2^m and hence it has full rank.
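Part 1 can be sanity-checked for small m: if each pair of live layers contributes a factor (1 + y_j z_j) to the restricted product (our reading of the construction), then expanding the product shows the coefficient matrix M(S, T) is exactly the 2^m × 2^m identity. A small Python sketch, with function names our own:

```python
from itertools import combinations

def expand_restricted_imm(m):
    """Expand prod_{j=1}^m (1 + y_j * z_j), representing a polynomial as a
    dict mapping (Y-support, Z-support) pairs to coefficients."""
    poly = {(frozenset(), frozenset()): 1}
    for j in range(m):
        new = {}
        for (ys, zs), c in poly.items():
            # multiply by (1 + y_j * z_j): keep the term as-is, or add
            # y_j to the Y-support and z_j to the Z-support
            for key in [(ys, zs), (ys | {j}, zs | {j})]:
                new[key] = new.get(key, 0) + c
        poly = new
    return poly

def coefficient_matrix(m):
    """M(S, T) = coefficient of Y_S * Z_T in the expanded product."""
    poly = expand_restricted_imm(m)
    subsets = [frozenset(c) for k in range(m + 1)
               for c in combinations(range(m), k)]
    return [[poly.get((S, T), 0) for T in subsets] for S in subsets]
```

For m = 3 this produces the 8 × 8 identity matrix, matching the full-rank claim.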
Part 2: t-product polynomials have low rank. We now prove that for a t-product polynomial f, rank(M^(Y,Z)(f|_ρ)) is small with high probability. Let f be a t-product polynomial, i.e. f = f_1 f_2 · · · f_t. Let χ : X → [t] be a coloring function which assigns colors to all the variables in X so that χ^{−1}(i) = X_{f_i}, where X_{f_i} is the variable set ascribed to f_i; that is, all the variables ascribed to f_i are assigned color i under the coloring function. To prove the lemma, we will first show that, with high probability (over the choice of π), a constant fraction of the t colors appear along the path defined by π, i.e. along (π(0), π(1)), (π(1), π(2)), . . . , (π(d − 1), π(d)). Given such a multi-colored path, we will then show that, with high probability over the choice of a, many of these colors have an imbalance. A color is said to have an imbalance under ρ if more variables from X of that color are mapped to the Y variables than to the Z variables, or vice versa. We will then appeal to arguments similar to those in [Raz06, RY09, DMPY12] to conclude that the imbalance results in a low rank.
Variable coloring, t-product polynomials and imbalance. We start with some notation. Given a string π ∈ {1, 2}^d, let the path defined by π be the sequence of pairs (π(0), π(1)), (π(1), π(2)), . . . , (π(d − 1), π(d)) (we call it a path since these pairs correspond naturally to the edges of a path in the graph G_d defined in Section 2.1). We say that a color γ ∈ [t] appears in layer i if χ(x^(i)_{u,v}) = γ for some u, v ∈ {1, 2}. Let C_i denote the set of all the distinct colors appearing in layers {1, 2, . . . , i}; therefore |C_d| = t. We will also define O_{2i+1} to be the set of all the colors appearing in odd-numbered layers up to layer 2i + 1.
Analogously, let C_i^π = {χ(x^(j)_{π(j−1),π(j)}) | j ≤ i}, i.e. C_i^π contains all the distinct colors appearing along the path defined by π up to layer i. We first observe a property of C_d^π stated in the claim below.
We will assume the claim and finish the proof of Part 2 of Lemma 18; we will then prove the claim. The claim shows that, with high probability, many colors appear on the uniformly random path π. Using this, we will now show that a constant fraction of these colors also exhibit an imbalance with high probability. Using the multiplicativity of rank, we will then show that an imbalance in a large number of factors results in a low rank for the matrix M^(Y,Z)(f|_ρ).
We will say that π is good if |C_d^π| > t/100. Let L = t/100. The above claim shows that a random π is good with high probability. In what follows, we condition on picking a good π. Let a ∈ {0, 1}^d be chosen uniformly at random as in the sampling algorithm, and let ρ be defined as in the sampling algorithm for π, a.
For γ ∈ C_d^π, let π_γ denote the set of edges of color γ along the path defined by π, and let P_γ = {i | (π(i − 1), π(i)) ∈ π_γ}. It is easy to see that if the number of variables labelling edges of π_γ that are mapped to Y ∪ Z by ρ is odd, then γ must have an imbalance w.r.t. ρ. Note that this event is equivalent to the event that Σ_{i∈P_γ} a_i is odd. Hence, for any γ ∈ C_d^π, Pr[γ has an imbalance with respect to ρ along π] = 1/2. Further, since |C_d^π| ≥ L and the events corresponding to distinct γ ∈ C_d^π are mutually independent, the Chernoff bound implies Pr[at most L/4 colors have an imbalance with respect to ρ along π] ≤ 1/2^{Ω(L)}. Assuming Claim 19, we are now done. We now present the proof of Claim 19.
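The step above uses the fact that the parity of independent uniform bits over a nonempty index set is odd with probability exactly 1/2. This is easy to confirm by exhaustive enumeration (a sketch; the function name is ours):

```python
from itertools import product

def prob_parity_odd(k):
    """Pr[a_1 + ... + a_k is odd] for k independent uniform bits (k >= 1)."""
    odd = sum(1 for bits in product((0, 1), repeat=k) if sum(bits) % 2 == 1)
    return odd / 2 ** k
```

For every k ≥ 1 this returns exactly 0.5, which is why each color has an imbalance with probability 1/2 regardless of |P_γ|.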
Proof of Claim 19. We define O_{2i+1}^π to be the set of all the colors appearing in odd-numbered layers along π up to layer 2i + 1, i.e. O_{2i+1}^π = {χ(x^(2j+1)_{π(2j),π(2j+1)}) | 0 ≤ j ≤ i}. We know that |C_d| = t. Therefore, either |O_d| ≥ t/2 or |E_d| ≥ t/2, where E_d denotes the set of colors appearing in even-numbered layers. Let us assume without loss of generality that |O_d| ≥ t/2. For this part of the proof, for the sake of simplicity, we will assume that d is odd. The assumption can easily be removed by losing at most constant factors in the bound.
Let j_1, j_2, . . . , j_τ be odd indices such that for each 1 ≤ i ≤ τ − 1, |O_{j_i}| < |O_{j_{i+1}}|, i.e. each O_{j_{i+1}} contains at least one color not appearing in O_{j_i}. Let γ_1, γ_2, . . . , γ_{τ−1} be colors which appear newly in these sets. (If multiple new colors appear in a set, choose any one.) Let W_i be the indicator random variable which takes value 1 if |O_{j_i}^π| < |O_{j_{i+1}}^π| and 0 otherwise, for 1 ≤ i ≤ τ − 1. Then E[W_i] = 1/4, as the probability of the color γ_i appearing in O_{j_{i+1}}^π is equal to 1/4. Note that the W_i's are independently distributed since they depend on distinct coordinates of π. Since |O_d| ≥ t/2, we have τ ≥ t/2 and hence E[Σ_i W_i] ≥ t/8. Now we get
Pr[|C_d^π| ≤ t/100] ≤ Pr[|O_d^π| ≤ t/100] ≤ Pr[Σ_i W_i ≤ t/100] ≤ 1/2^{Ω(t)}.
As |O_d^π| ≤ |C_d^π|, the first inequality follows. If the number of times a new color appears along π within the odd layers is at most t/100, then Σ_i W_i is also at most t/100; this gives the second inequality. Finally, the last inequality follows by the Chernoff bound.
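The base probability used for E[W_i] is that a fixed edge (u, v) of a given layer is traversed by the uniformly random path with probability exactly 1/4, since π(i − 1) and π(i) are independent uniform values in {1, 2}. This can be checked directly (sketch; the function name is ours):

```python
from itertools import product

def prob_edge_on_path(u, v):
    """Pr over independent uniform pi(i-1), pi(i) in {1, 2} that the path
    traverses the fixed edge (u, v) of layer i."""
    hits = sum(1 for p, q in product((1, 2), repeat=2) if (p, q) == (u, v))
    return hits / 4
```

Each of the four possible edges of a layer is hit with probability exactly 0.25.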
Imbalance implies low rank. Let us recall that f = f_1 f_2 · · · f_t is a t-product polynomial defined over the disjoint variable partition X = X_1 ∪ X_2 ∪ · · · ∪ X_t with |X_i| ≥ 1 for all i ∈ [t]. The following lemma (see, e.g., [RY09]) will be useful in bounding rank(M^(Y,Z)(f|_ρ)).
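The linear-algebra fact at work here is that, for polynomials over disjoint halves of the variable partition, the coefficient matrix of a product is the Kronecker product of the factors' coefficient matrices, and the rank of a Kronecker product is the product of the ranks. A numerical sketch with exact rational arithmetic (the matrices are chosen arbitrarily for illustration):

```python
from fractions import Fraction

def rank(M):
    """Rank of a rational matrix via Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def kron(A, B):
    """Kronecker product of two matrices: the coefficient matrix of a
    product of polynomials over disjoint variable sets has this shape."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]
```

For example, a rank-1 matrix tensored with a rank-2 matrix yields a rank-2 matrix, so a single rank-deficient (imbalanced) factor already caps the rank of the whole product.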
Part 3: r-simple polynomials have low rank. As f is an r-simple polynomial, we know that f = (∏_{i=1}^{r′} L_i) · G, where r′ ≤ r, the L_i's are linear polynomials, for each i ∈ [r′] X_i is the set of variables ascribed to L_i, and X_{r′+1} is the set of variables ascribed to G. Moreover, |∪_{i=1}^{r′} X_i| ≥ 400r. To prove Part 3 of Lemma 18 we set up some notation. Let U = ∪_{i=1}^{r′} X_i.
In the following claim we show that if U is large to begin with, then, with high probability over the restriction ρ defined by the sampling algorithm, U|_ρ (the set of variables of Y ∪ Z that ρ assigns to variables in U) is also large.
Claim 21. If |U| ≥ 400r, then Pr[|U|_ρ| ≤ 4r] ≤ 1/2^{Ω(r)}.
We first finish the proof of Part 3 of Lemma 18 assuming this claim. We say that a restriction ρ is good if |U|_ρ| > 4r. In what follows we will condition on the event that ρ is good.
Assuming Claim 21, we are done with the proof of Part 3 of Lemma 18. Given below is the proof of Claim 21.
Proof of Claim 21. We say that a layer i ∈ [d] is touched by U if there is a variable x^(i)_{u,v} ∈ U; we call such an x^(i)_{u,v} a contact edge. Any layer touched by U has at most 4 contact edges. As |U| ≥ 400r, U touches at least 100r layers. At least half of these layers are odd-numbered, or at least half of them are even-numbered; let us assume without loss of generality that at least half of them are odd-numbered. Let these be ℓ_1, ℓ_2, . . . , ℓ_R, where R ≥ 50r. Let us fix one contact edge (u_i, v_i) per layer ℓ_i for each i ∈ [R], and denote these edges by x^(ℓ_i)_{(u_i,v_i)}. Let W_i be an indicator random variable which is set to 1 if ρ(x^(ℓ_i)_{(u_i,v_i)}) ∈ Y ∪ Z and to 0 otherwise. Note that Pr_{a,π}[W_i = 1] = 1/8, where a, π are as in the sampling algorithm: for odd layers, the probability that a fixed edge (among the 4 possible contact edges) is picked by π is exactly 1/4, and for an odd layer ℓ the probability that a_ℓ = 1 is exactly 1/2; moreover, these two events are independent. Therefore E[Σ_{i=1}^R W_i] = R/8 ≥ 5r. Since Σ_{i=1}^R W_i ≤ |U|_ρ|, we get Pr[|U|_ρ| ≤ 4r] ≤ Pr[Σ_{i=1}^R W_i ≤ 4r] ≤ 1/2^{Ω(r)}, where the last inequality is by the Chernoff bound.
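The probability computation Pr_{a,π}[W_i = 1] = 1/4 · 1/2 = 1/8 can likewise be confirmed by enumerating the three relevant random values (a sketch; the function name is ours):

```python
from itertools import product

def prob_contact_survives(u, v):
    """Pr over independent uniform pi(l-1), pi(l) in {1, 2} and a_l in {0, 1}
    that the contact edge (u, v) lies on the path and layer l is selected."""
    hits = sum(1 for p, q, b in product((1, 2), (1, 2), (0, 1))
               if (p, q) == (u, v) and b == 1)
    return hits / 8
```

Every choice of contact edge (u, v) survives with probability exactly 0.125, as used in the expectation bound above.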