Separation Between Read-once Oblivious Algebraic Branching Programs (ROABPs) and Multilinear Depth-three Circuits

We show an exponential separation between two well-studied models of algebraic computation, namely, read-once oblivious algebraic branching programs (ROABPs) and multilinear depth-three circuits. In particular, we show the following: (1) There exists an explicit n-variate polynomial computable by linear sized multilinear depth-three circuits (with only two product gates) such that every ROABP computing it requires 2Ω(n) size. (2) Any multilinear depth-three circuit computing IMMn,d (the iterated matrix multiplication polynomial formed by multiplying d, n × n symbolic matrices) has nΩ(d) size. IMMn,d can be easily computed by a poly(n,d) sized ROABP. (3) Further, the proof of (2) yields an exponential separation between multilinear depth-four and multilinear depth-three circuits: There is an explicit n-variate, degree d polynomial computable by a poly(n) sized multilinear depth-four circuit such that any multilinear depth-three circuit computing it has size nΩ(d). This improves upon the quasi-polynomial separation of Reference [36] between these two models. The hard polynomial in (1) is constructed using a novel application of expander graphs in conjunction with the evaluation dimension measure [15, 33, 34, 36], while (2) is proved via a new adaptation of the dimension of the partial derivatives measure of Reference [32]. Our lower bounds hold over any field.

at most 2^{n^δ} by "reducing" a multilinear depth-three circuit to a collection of ROABPs and "putting together" the hitting-sets of the ROABPs. This "putting together" process raises the hitting-set complexity from quasi-polynomial (for a single ROABP) to sub-exponential (for a composition of several ROABPs). Had it been the case that a multilinear depth-three circuit could be directly reduced to a single small-size ROABP, an efficient hitting set for the former would have followed immediately from References [2, 15]. One of the results in this article (Theorem 1.6) rules out this possibility. In fact, Theorem 1.6 shows something stronger, as described below.
The variable sets X = {x_1, x_2} and Y = {y_1, y_2} are completely disjoint and are called the base sets of C(X, Y). When projected on the X variables (i.e., after setting the Y variables to zero), C(X, Y) is a set-multilinear depth-three circuit in the X variables. A similar statement holds for the Y variables. Thus, every base set is associated with a set-multilinear depth-three circuit and vice versa. Any multilinear depth-three circuit can be trivially viewed as a superposition of n set-multilinear depth-three circuits with a single variable in every base set, where n is the number of variables. A crucial observation in Reference [9] is that every multilinear depth-three circuit is "almost" a superposition of n^ϵ set-multilinear depth-three circuits for some ϵ < 1, and further the associated n^ϵ base sets can be found in sub-exponential time using k-wise independent hash functions. Once we know the r = n^ϵ base sets corresponding to r set-multilinear depth-three circuits whose superposition forms a circuit of size s, finding a hitting set for the circuit in time s^r · log s follows easily by taking a direct product of hitting sets for r many set-multilinear depth-three circuits (in fact, r many ROABPs, as polynomial-sized set-multilinear depth-three circuits reduce to polynomial-sized ROABPs). We think a useful model to consider at this juncture is a superposition of constantly many set-multilinear depth-three circuits with unknown base sets. In this case, knowing the r = O(1) base sets readily gives us a quasi-polynomial time hitting-set generator, but finding these base sets from a given circuit is NP-hard for r ≥ 3 (as we show in Observation 1.1), which rules out the possibility of knowing the base sets even if we are allowed to see the circuit (as in the white-box case).
Indeed, even in the special case where the given multilinear depth-three circuit is promised to be a superposition of constantly many (say, two) set-multilinear depth-three circuits, the algorithm in Reference [9] finds and works with many base sets, and the resulting hitting-set complexity grows to roughly exp(√n). Could it be that superpositions of constantly many set-multilinear depth-three circuits efficiently reduce to ROABPs? Unfortunately, the answer to this also turns out to be negative, as Theorem 1.6 gives an explicit example of a superposition of two set-multilinear depth-three circuits computing an n-variate polynomial f such that any ROABP computing f has width 2^{Ω(n)}.
While comparing two models (here multilinear depth-three circuits and ROABPs), it is desirable to show a separation in both directions whenever an efficient reduction from one to the other seems infeasible. In this sense, we show a complete separation between the models under consideration by giving an explicit polynomial computable by a polynomial-sized ROABP such that every multilinear depth-three circuit computing it requires exponential size. In fact, this explicit polynomial is simply the iterated matrix multiplication polynomial IMM_{n,d}: the (1, 1)th entry of a product of d n × n symbolic matrices (Theorem 1.7). IMM_{n,d} can be easily computed by a polynomial-sized ROABP (see Observation 1.3). Although a 2^{Ω(d)} lower bound for multilinear depth-three circuits computing Det_d is known [36], this does not imply a lower bound for IMM_{n,d} (despite the fact that Det and IMM are both complete for algebraic branching programs (ABPs) [29]), as the projection from IMM to Det can make the circuit non-multilinear. Another related work, Reference [12], showed a separation between multilinear ABPs and multilinear formulas by exhibiting an explicit polynomial (namely, an arc-full-rank polynomial) that is computable by a linear-size multilinear ABP but requires super-polynomial size multilinear formulas. But again, multilinearity of a circuit can be lost when IMM is projected to the arc-full-rank polynomial used in Reference [12], and hence this result too does not imply a lower bound for IMM. An extension of Theorem 1.7 to a super-polynomial lower bound for multilinear formulas computing IMM would have interesting consequences in separating noncommutative formulas and noncommutative ABPs.
In a contemporary work [25], some of the authors of this work and Sébastien Tavenas have been able to show an n^{Ω(√d)} lower bound for multilinear depth-four circuits computing IMM_{n,d} by significantly extending a few of the ideas present in this work and building upon (thereby improving) the work of Reference [16]. In summary, poly-sized ROABPs and poly-sized multilinear depth-three circuits have provably different computational powers, although they share a non-trivial intersection, as poly-sized set-multilinear depth-three circuits are contained in both models.
An interesting outcome of the proof of the lower bound for multilinear depth-three circuits computing IMM is an exponential separation between multilinear depth-three and multilinear depth-four circuits. Previously, Reference [36] showed a super-polynomial separation between multilinear constant depth h and depth h + 1 circuits, which when applied to the depth-three versus depth-four setting gives a quasi-polynomial separation between the two models. In comparison, Theorem 1.8 gives an exponential separation.
The models and our results. We define the relevant models and state our results now.

Definition 1.1 (Algebraic Branching Program). An Algebraic Branching Program (ABP) in the variables X = {x_1, x_2, . . . , x_n} is a directed acyclic graph with a source vertex s and a sink vertex t. It has (d + 1) sets or layers of vertices V_1, V_2, . . . , V_{d+1}, where V_1 and V_{d+1} contain only s and t, respectively. The width of an ABP is the maximum number of vertices in any of the (d + 1) layers. Every edge in an ABP starts from a vertex in V_i and is directed to a vertex in V_{i+1}. The edges in an ABP are labelled by polynomials over a base field F. The weight of a path between any two vertices u and v in an ABP is the product of the edge labels on the path from u to v. An ABP computes the sum of the weights of all the paths from s to t.
A special kind of ABP, namely, ROABP, is defined in Reference [15].

Definition 1.2 (Read-Once Oblivious Algebraic Branching Program).
A Read-Once Oblivious Algebraic Branching Program (ROABP) over a field F has an associated permutation π : [n] → [n] of the variables in X. The number of variables equals the number of layers of vertices minus one, i.e., n = (d + 1) − 1 = d. The label associated with an edge from a vertex in V_i to a vertex in V_{i+1} is a univariate polynomial over F in the variable x_{π(i)}.
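To make the model concrete, here is a minimal sketch (ours, not code from the article) of evaluating an ROABP presented as a product of matrices whose entries are univariate polynomials. The representation — coefficient lists, a permutation, and a point given as a dict — is an illustrative assumption; the first matrix is 1 × w and the last is w × 1, so the product is the computed value.

```python
def eval_poly(coeffs, x):
    """Evaluate a univariate polynomial given as a coefficient list
    (lowest degree first) via Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def eval_roabp(matrices, perm, point):
    """matrices[i] is a matrix (list of rows) whose entries are coefficient
    lists of univariate polynomials in the variable x_{perm[i]}; point maps
    variable index -> field value.  The product of the evaluated matrices
    (1 x w, then w x w, ..., then w x 1) is the ROABP's output."""
    def mat_eval(M, x):
        return [[eval_poly(e, x) for e in row] for row in M]

    def mat_mul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    prod = mat_eval(matrices[0], point[perm[0]])
    for i in range(1, len(matrices)):
        prod = mat_mul(prod, mat_eval(matrices[i], point[perm[i]]))
    return prod[0][0]
```

For example, the width-2 ROABP with first layer (x_1, 1) and second layer (x_2 + 1, x_2)^T computes x_1(x_2 + 1) + x_2 = x_1 x_2 + x_1 + x_2.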

Definition 1.3 (Multilinear Depth-four and Depth-three Circuits). A multilinear depth-four (ΣΠΣΠ) circuit C over a field F computes a polynomial of the form C = Σ_{i ∈ [s]} Π_{j ∈ [d_i]} Q_{ij}, where Q_{ij} is a multilinear polynomial for every i ∈ [s] and j ∈ [d_i], and the polynomials Q_{i1}, . . . , Q_{id_i} are over disjoint sets of variables for every i ∈ [s]. If the Q_{ij}'s are linear polynomials, then C is a multilinear depth-three (ΣΠΣ) circuit. The parameter s is the top fan-in of C.
A multilinear depth-three circuit C is set-multilinear with respect to a partition X = X_1 ⊔ X_2 ⊔ · · · ⊔ X_d if every linear polynomial in C is over the variables of a single part. The sets X_1, X_2, . . . , X_d are called the colors of X. If |X_j| = 1 for every j ∈ [d], then we say X has singleton colors and C is a set-multilinear depth-three circuit with singleton colors.
As a bridge between multilinear and set-multilinear depth-three circuits, we define a model called superposition of set-multilinear depth-three circuits.

Definition 1.5 (Superposition of Set-multilinear Depth-three Circuits).
A multilinear depth-three (ΣΠΣ) circuit C over a field F is a superposition of t set-multilinear depth-three circuits over the variable partition X = Y_1 ⊔ Y_2 ⊔ · · · ⊔ Y_t if, for every i ∈ [t], C is a set-multilinear depth-three circuit in the Y_i variables over the field F(X \ Y_i). The sets Y_1, . . . , Y_t are called the base sets of C. Further, we restrict each Y_i to have singleton colors for every i ∈ [t].
Note that although the notion of superposition makes sense even if Y i 's do not have singleton colors, we restrict to singletons as this model itself captures multilinear depth-three circuits. We make the following initial observation for superposition of set-multilinear depth-three circuits.
The proof of the observation appears in Section 6.1. We now state the main results of this article. In Theorem 1.6, we use P to denote the set of prime numbers. Theorem 1.6 (Main Theorem 1).
(1) There is an explicit family of 2n-variate polynomials {f_n}_{n ∈ P, n ≥ 11} over any field F such that the following hold: f_n is computable by a multilinear depth-three circuit C over F with top fan-in three, and C is also a superposition of two set-multilinear depth-three circuits. Any ROABP over F computing f_n has width 2^{Ω(n)}. (2) There is an explicit family of 3n-variate polynomials {g_n}_{n ∈ P} over any field F such that the following hold: g_n is computable by a multilinear depth-three circuit C over F with top fan-in two, and C is also a superposition of three set-multilinear depth-three circuits. Any ROABP over F computing g_n has width 2^{Ω(n)}.
We prove Theorem 1.6 in Section 3. The tightness of the theorem is exhibited by Observation 1.2, whose proof is in Section 6.1. Thus, it follows from Theorem 1.6 that if we increase either the top fan-in or the number of variables per linear polynomial from two to three in multilinear depth-three circuits, then there exist polynomials computed by such circuits for which any ROABP has exponential width. We now state the "converse" of Theorem 1.6. Theorem 1.7. Any multilinear depth-three circuit (over any field) computing IMM_{n,d}, the (1, 1)th entry of a product of d n × n symbolic matrices, has top fan-in n^{Ω(d)} for n ≥ 6. Theorem 1.7 also implies a lower bound for the determinant; see Corollary 4.2. We prove Theorem 1.7 in Section 4. It is not hard to observe the following. Observation 1.3. IMM_{n,d} can be computed by a width-n^2 ROABP.
The proof of Observation 1.3, given in Section 6.1, presents a brute-force way to compute IMM_{n,d} by an ROABP, whereas a more careful analysis yields a width-2n ROABP computing IMM_{n,d}. Thus, Theorem 1.6, Theorem 1.7 and Observation 1.3 together imply a complete separation between multilinear depth-three circuits and ROABPs. As a consequence of the proof of Theorem 1.7, we also get an exponential separation between multilinear depth-three and multilinear depth-four circuits. Theorem 1.8. There is an explicit n-variate, degree-d polynomial computable by a poly(n)-sized multilinear depth-four circuit such that any multilinear depth-three circuit computing it has size n^{Ω(d)}. We prove Theorem 1.8 in Section 4. The hard polynomials used in Theorem 1.6 belong to a special class of multilinear depth-three circuits: they are both superpositions of constantly many set-multilinear depth-three circuits and simultaneously sums of constantly many set-multilinear depth-three circuits. Here is an example of a circuit from this class: C(X, Y) is a superposition of two set-multilinear depth-three circuits with base sets X = {x_1} ∪ {x_2} and Y = {y_1} ∪ {y_2}. But C(X, Y) is also a sum of two set-multilinear depth-three circuits with {x_1, y_2}, {x_2, y_1} being the colors in the first set-multilinear depth-three circuit (corresponding to the first two products) and {x_1, y_1}, {x_2, y_2} the colors in the second set-multilinear depth-three circuit (corresponding to the last two products). For such a subclass of multilinear depth-three circuits, we give a quasi-polynomial time hitting set by extending the proof technique of Reference [3]. Theorem 1.9. Let C_{n,m,l,s} be the subclass of multilinear depth-three circuits computing n-variate polynomials such that every circuit in C_{n,m,l,s} is a superposition of at most m set-multilinear depth-three circuits and simultaneously a sum of at most l set-multilinear depth-three circuits, and has top fan-in bounded by s. There is a hitting-set generator for C_{n,m,l,s} running in (ns)^{O(lm·log s)} time.
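Observation 1.3 can be made concrete with a small sketch (ours, not the article's code): sweeping a row vector through the matrices evaluates IMM_{n,d} in exactly the left-to-right, one-matrix-at-a-time order in which a poly(n, d)-width ROABP reads its variables.

```python
def imm_1_1(matrices):
    """(1,1) entry of the product of the given n x n matrices, computed by
    sweeping a row vector left to right -- mirroring the order in which a
    poly(n, d)-width ROABP reads the matrix entries."""
    n = len(matrices[0])
    row = [1] + [0] * (n - 1)          # e_1^T, the first standard basis row
    for M in matrices:
        row = [sum(row[k] * M[k][j] for k in range(n)) for j in range(n)]
    return row[0]                      # first entry of e_1^T * M_1 * ... * M_d
```

At every point during the sweep, the state is a length-n row vector whose entries are the intermediate values; this is the intuition behind the width bound in Observation 1.3.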
When m and l are bounded by poly(log ns), we get quasi-polynomial time hitting sets. The proof of Theorem 1.9, which extends the shift and rank concentration technique of Reference [3], is given in Section 5. To our understanding, even if m and l are constants, Reference [9]'s algorithm yields an exp(√n) hitting-set complexity. Also, Reference [18] has recently given an (ndw)^{O(l·2^l·log(ndw))} time hitting-set generator for n-variate, individual (variable) degree-d polynomials computed by a sum of l ROABPs, each of width less than w. A sum of l set-multilinear depth-three circuits reduces to a sum of l ROABPs, as set-multilinear depth-three circuits readily reduce to poly-sized ROABPs. But observe the doubly exponential dependence on l in their result. In contrast, in Theorem 1.9 the dependence is singly exponential in l. So the hitting-set complexity remains quasi-polynomial for l = (log n)^{O(1)}, whereas Reference [18] gives an exponential time hitting-set generator when applied to the model in Theorem 1.9. However, it is also important to note that the model considered in Theorem 1.9 is somewhat weaker than the sum-of-ROABPs model in Reference [18] because of the additional restriction that our model is also a superposition of m set-multilinear depth-three circuits.
Proof ideas for Theorems 1.6 and 1.7. Theorem 1.6 is proved by connecting the notion of edge expansion (Definition 2.4) with the evaluation dimension measure (Definition 2.1). Starting with an explicit 3-regular bipartite expander G, we associate distinct variables with distinct vertices. Every edge now corresponds to a linear polynomial: the sum of the variables associated with the two vertices on which the edge is incident. A multilinear depth-three circuit C is derived from the expander G as follows: C has three product terms, each formed by taking the product of the linear polynomials associated with the edges of a matching in G. Now, the edge expansion of G can be used to argue that for every subset S of variables of a certain size there exists a product term in C that has high evaluation dimension with respect to S. Further, one can show that high evaluation dimension of a product term implies high evaluation dimension of C with respect to S, by restricting the circuit modulo two linear polynomials to nullify the other two product terms. However, for every ROABP there is a set S (of any size) such that the evaluation dimension of the ROABP with respect to S is bounded by its width. This gives a lower bound on the width of any ROABP computing the same polynomial as C, thereby proving part 1 of Theorem 1.6. Part 2 is proved similarly, but now we associate the edges and vertices of a bipartite expander G with variables and linear polynomials, respectively. Circuit C is formed by adding two product terms, each formed by multiplying the linear polynomials associated with the left or the right vertex set of G. As before, the edge expansion of G implies that for every set S of variables of a certain size there is a product term of C with high evaluation dimension, and this in turn implies high evaluation dimension of C.
While writing this article, we came to know about a recent work by Jukna [22] that uses Ramanujan graphs to give an alternate proof of a known exponential lower bound for monotone arithmetic circuits. To our understanding, Jukna's proof also implicitly relates expansion with evaluation dimension, but the argument in Reference [22] is directed towards monotone circuits and does not seem to imply any of the lower bounds shown in this work. In particular, the hard polynomial in Reference [22] could have any complexity, whereas in our case we need the hard polynomial to be computable by a small multilinear depth-three circuit. Theorem 1.7 is proved by introducing a new variant of the dimension of the space of partial derivatives measure, inspired by References [32, 34]. At a high level, the idea is to consider a polynomial f in two sets of variables X and Y such that |Y| ≫ |X|. If we take derivatives of f with respect to all degree-k monomials in the Y-variables and set all the Y-variables to zero after taking derivatives, then we do expect to get a "large" space of derivatives (especially when f is a "hard" polynomial), simply because |Y| is large. However, in any multilinear depth-three circuit C computing f, the dimension of the space of derivatives of a product term is influenced only by the number of linear polynomials containing X-variables, as all the Y-variables are set to zero subsequently. Thus, the measure is somewhat small for a product term of C, as |X| ≪ |Y|. By subadditivity of the measure (Lemma 2.3), this implies high top fan-in of C computing f. A notable difference from References [34, 36] is that the variable sets X and Y are fixed deterministically, a priori, and not by random partitioning of the entire set of variables.

PRELIMINARIES
Measures. We use two complexity measures, namely, evaluation dimension and a novel variant of the dimension of the space of partial derivatives, to prove Theorems 1.6 and 1.7, respectively. Evaluation dimension was first defined in Reference [15]. Let X be a set of variables.

Definition 2.1 (Evaluation Dimension).
The evaluation dimension of a polynomial g ∈ F[X] with respect to a set S ⊆ X, denoted Evaldim_S(g), is defined as the dimension of the F-linear span of the polynomials obtained from g by evaluating the variables in S at all points of F^{|S|}, i.e., Evaldim_S(g) := dim(span_F {g|_{S=a} : a ∈ F^{|S|}}). Evaluation dimension is a nearly equivalent variant of another measure, the rank of the partial derivatives matrix, first defined in Reference [30] to prove lower bounds for non-commutative models. The rank of the partial derivatives matrix was also used in References [12, 33–36] to prove lower bounds and separations for several multilinear models. The two measures are identical over fields of characteristic zero (or of sufficiently large size).
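As an illustration of Definition 2.1, the following sketch (our own, with an assumed dict-of-monomials representation) computes Evaldim_S for small multilinear polynomials over the rationals: it restricts the polynomial at every 0/1 assignment to S and takes the rank of the resulting coefficient vectors. For multilinear polynomials, two evaluation points per variable already span the whole evaluation space, so the 0/1 grid suffices.

```python
from fractions import Fraction
from itertools import product

def restrict(poly, assignment):
    """poly: dict mapping frozenset of variable names -> coefficient
    (a multilinear polynomial).  Substitute the variables in `assignment`
    and return the restricted polynomial over the remaining variables."""
    out = {}
    for mono, coeff in poly.items():
        c, rest = Fraction(coeff), set(mono)
        for v, val in assignment.items():
            if v in rest:
                c *= val
                rest.discard(v)
        key = frozenset(rest)
        out[key] = out.get(key, Fraction(0)) + c
    return out

def rank(rows):
    """Row rank over the rationals by Gaussian elimination."""
    rows, r = [list(row) for row in rows], 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def evaldim(poly, S):
    """Evaluation dimension of a multilinear polynomial with respect to S:
    rank of the restrictions of poly over all 0/1 assignments to S."""
    monos = sorted({frozenset(m - set(S)) for m in poly}, key=sorted)
    mat = []
    for bits in product([0, 1], repeat=len(S)):
        res = restrict(poly, dict(zip(S, map(Fraction, bits))))
        mat.append([res.get(m, Fraction(0)) for m in monos])
    return rank(mat)
```

For example, for f = x_1 y_1 + x_2 y_2, the restrictions at the four 0/1 assignments to {x_1, x_2} span the two-dimensional space generated by y_1 and y_2.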
The partial derivatives measure was introduced in Reference [32]. The following is a simple variant of this measure that is also inspired by the measure used in Reference [34].
Definition 2.2. Let f ∈ F[X ⊔ Y], where X and Y are disjoint sets of variables, and let Y^k be the set of all monomials of degree k ∈ N in the Y variables. Define the measure PD_{Y^k}(f) := dim(span_F {(∂f/∂m)|_{Y=0} : m ∈ Y^k}), i.e., the dimension of the space of partial derivatives of f with respect to degree-k Y-monomials, where the Y-variables are set to zero after differentiation. In proving Theorem 1.7, we apply the above measure with a significant difference (or skew) between the number of X and Y variables; it is this imbalance that plays a crucial role in the proof. Both the above measures obey the property of subadditivity. The proof of Lemma 2.3 is in Section 6.2.
Expander Graphs. A vital ingredient that helps us construct the hard polynomials in Theorem 1.6 is a family of explicit 3-regular expanders. We recall a few basic definitions from Reference [21].
Before stating an explicitly constructible 3-regular expander graph family, we make precise the notion of explicit expander graphs.
Definition 2.7 (Mildly Explicit Expanders). Let G = {G_i}_{i ∈ N} be a family of d-regular expanders such that the number of vertices in G_i is bounded by a polynomial in i. G is mildly explicit if there exists an algorithm that takes input i and constructs G_i in time polynomial in the size of G_i.
A family of mildly explicit expanders. We use P to denote the set of prime numbers. Reference [21] mentions a family of mildly explicit 3-regular p-vertex expanders {G_p}_{p ∈ P} such that for every graph G_p in the family, h(G_p) > (3/2)·10^{-4}. The vertices of G_p correspond to elements of Z_p. A vertex x in G_p is connected to x + 1, x − 1 and to its inverse x^{-1} (operations are modulo p, the inverse of 0 is defined as 0, and a self-loop increases the degree of the vertex by one). We refer the reader to Reference [21], Section 11.1.2, for more details. Denote this family of 3-regular p-vertex expanders by S. Double Cover. The proof of Theorem 1.6 works with bipartite expanders. It is standard to transform a d-regular expander graph into a d-regular bipartite expander graph by taking its double cover.
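The description above translates directly into code. The sketch below (ours) builds the edge multiset of G_p and checks 3-regularity under the stated self-loop convention: the cycle edges x ~ x+1 give degree two, and the inversion edges (with self-loops at the fixed points 0, 1 and p−1 counting once) give the third.

```python
def expander_edges(p):
    """Edge multiset of the 3-regular expander on Z_p from Reference [21]:
    x is adjacent to x+1, x-1 and x^{-1} (mod p), with 0^{-1} := 0."""
    edges = []
    for x in range(p):
        edges.append((x, (x + 1) % p))            # covers both x+1 and x-1
        inv = pow(x, p - 2, p) if x != 0 else 0   # modular inverse via Fermat
        if x <= inv:                              # add each inversion edge once
            edges.append((x, inv))
    return edges

def degrees(p):
    """Vertex degrees, with a self-loop adding one to the degree
    (the convention of Reference [21])."""
    deg = [0] * p
    for u, v in expander_edges(p):
        if u == v:
            deg[u] += 1
        else:
            deg[u] += 1
            deg[v] += 1
    return deg
```

Mild explicitness is visible here: constructing G_p takes time polynomial in p.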

Definition 2.8 (Double Cover). The double cover of a graph G = (V, E) is the bipartite graph G̃ = (V_L ⊔ V_R, Ẽ), where V_L and V_R are two copies of V, and for every edge {u, v} ∈ E the edges {u_L, v_R} and {v_L, u_R} belong to Ẽ.
Lemma 2.9. h(G̃_p) > (3/2)·10^{-4} for every p. This has been argued in Section 11.1.2 of Reference [21], where it is shown that the normalized value of λ(G_p) is bounded away from 1 for every p. If {λ_1, . . . , λ_p} is the spectrum of G_p, then {±λ_1, . . . , ±λ_p} are exactly the eigenvalues of the adjacency matrix of the bipartite graph G̃_p. Hence, λ(G_p) is the second largest eigenvalue of A_{G̃_p}. By applying Cheeger's inequality (Theorem 2.6), h(G̃_p) > (3/2)·10^{-4} for every p, as G̃_p is 3-regular.
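Definition 2.8 is a one-line transformation; the sketch below (our code) builds the double cover on vertex labels 0..p−1 (left) and p..2p−1 (right), with a self-loop lifting to a single cross edge. The test uses the 4-cycle, whose double cover is the 8-cycle.

```python
def double_cover(edges, n):
    """Double cover of a multigraph on vertices 0..n-1: left copies keep
    their labels, right copies are shifted by n; each edge {u, v} lifts to
    (u_L, v_R) and (v_L, u_R), and a self-loop at u lifts to (u_L, u_R)."""
    cover = []
    for u, v in edges:
        if u == v:
            cover.append((u, n + u))   # self-loop -> one cross edge
        else:
            cover.append((u, n + v))
            cover.append((v, n + u))
    return cover
```

By construction every lifted edge goes from the left copy to the right copy, so bipartiteness is immediate, and degrees are preserved vertex by vertex.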
Hitting-set generators. In Theorem 1.9, we give a quasi-polynomial time hitting-set generator for a subclass of multilinear depth-three circuits.

Definition 2.10 (Hitting-set Generators).
A hitting-set generator for a class of circuits C is a Turing machine H that takes (1^n, 1^s) as input and outputs a set {a_1, . . . , a_m} ⊆ Z^n such that for every circuit C ∈ C of size bounded by s and computing a nonzero n-variate polynomial over a field F ⊃ Z, there is an i ∈ [m] for which C(a_i) ≠ 0. The complexity of H is its running time.^6 Hitting-set generators are also defined as a polynomial map h = (h_1, h_2, . . . , h_n) : F^t → F^n such that for every circuit C ∈ C computing a nonzero n-variate polynomial, C(h_1, h_2, . . . , h_n) is a nonzero t-variate polynomial. If |F| > n, then it is not hard to argue that the two definitions are equivalent (see Section 4.1 of Reference [42]).
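The first definition can be illustrated with the trivial, exponential-size generator (ours, not a construction from the article) based on the standard interpolation fact that the grid {0, . . . , D}^n hits every nonzero n-variate polynomial of individual degree at most D.

```python
from itertools import product

def grid_hitting_set(n, D):
    """The grid {0, ..., D}^n hits every nonzero n-variate polynomial of
    individual degree at most D; listing it is a (trivially exponential)
    hitting-set generator."""
    return list(product(range(D + 1), repeat=n))

def hits(black_box, points):
    """Certify nonzeroness of a black-box polynomial on the given points."""
    return any(black_box(*a) != 0 for a in points)
```

The whole point of Theorem 1.9 is to replace such exponential-size sets with quasi-polynomial-size ones for the circuit class at hand.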
Technical Lemmas. The following lemmas are used in the proof of Theorem 1.6. Lemma 2.11 follows from Hall's marriage theorem [19]. Lemma 2.11. A d-regular bipartite graph can be split into d edge-disjoint perfect matchings. ^6 Hitting-set generators can be defined similarly over finite fields by considering field extensions.
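Lemma 2.11 can be realized algorithmically: repeatedly find a perfect matching (here via Kuhn's augmenting-path algorithm) and peel it off; removing a perfect matching from a d-regular bipartite graph leaves a (d−1)-regular one, so Hall's condition keeps holding. The adjacency-dict representation below is our own assumption.

```python
def split_into_matchings(adj, d):
    """Split a d-regular bipartite multigraph into d perfect matchings.
    adj: dict mapping each left vertex to its list of right neighbours
    (with multiplicity).  Returns a list of d dicts, each a perfect
    matching given as left vertex -> right vertex."""
    adj = {u: list(vs) for u, vs in adj.items()}
    matchings = []
    for _ in range(d):
        match_l, match_r = {}, {}

        def augment(u, seen):
            # Kuhn's augmenting-path search from left vertex u.
            for v in adj[u]:
                if v in seen:
                    continue
                seen.add(v)
                if v not in match_r or augment(match_r[v], seen):
                    match_l[u], match_r[v] = v, u
                    return True
            return False

        for u in adj:
            if u not in match_l:
                augment(u, set())
        matchings.append(dict(match_l))
        for u, v in match_l.items():
            adj[u].remove(v)   # peel the matching off; regularity drops by one
    return matchings
```

On K_{3,3} (3-regular) this produces three edge-disjoint permutations, exactly as Lemma 2.11 promises.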
Proof. Consider the following F-evaluations of {y_1, y_2, . . . , y_n}: for every S ⊆ [n], if j ∈ S set y_j = 1, else set y_j = 0. There are m = 2^n such evaluations. By taking appropriate F-linear combinations of these evaluations of the polynomial Σ_{S ⊆ [n]} y_S · g_S, one can get the m polynomials {g_S}_{S ⊆ [n]}. Since these m polynomials are given to be F-linearly independent, the evaluation dimension with respect to {y_1, . . . , y_n} is at least 2^n.
Proof. Without loss of generality, assume the permutation π associated with the ROABP R is the identity permutation. Hence, R can be equivalently viewed as a product of n matrices M_1(x_1) · M_2(x_2) · · · M_n(x_n), where the entries of M_j are univariate polynomials in x_j and the width of R is k. Let S = {x_1, . . . , x_i}. Under any evaluation of the S-variables, the partial product M_1 · · · M_i becomes a vector in F^{1×k}, so the resulting polynomial is an F-linear combination of the k entries of M_{i+1} · · · M_n, which are polynomials in F[x_{i+1}, . . . , x_n] that do not depend on which evaluation of the {x_1, . . . , x_i} variables we began with. Hence, the evaluation dimension of g(X) with respect to S is upper bounded by k.

LOWER BOUNDS FOR ROABP: PROOF OF THEOREM 1.6
Proof of Part 1. Construction of the polynomial family. We construct a family of 2n-variate multilinear polynomials {f_n}_{n ∈ P, n ≥ 11} from the explicit family of 3-regular expander graphs S (described in Section 2). From an n-vertex graph G = (V, E) in S, construct a polynomial f(X, Y) in variables X = {x_1, . . . , x_n} and Y = {y_1, . . . , y_n} as follows: Let G̃ = (L ⊔ R, Ẽ) be the double cover of G. By Lemma 2.9, h(G̃) > (3/2)·10^{-4}. With every vertex in L (similarly, R) associate a unique variable in X (respectively, Y); thus the vertices in L and R are identified with the X and Y variables, respectively. An edge between x_i and y_j is associated with the linear polynomial (x_i + y_j). By Lemma 2.11, G̃ can be split into three edge-disjoint perfect matchings. Corresponding to every perfect matching, we have a product term formed by taking the product of the linear polynomials associated with the edges of the matching. The polynomial f(X, Y) is the sum of the three product terms corresponding to the three edge-disjoint perfect matchings of G̃. It is easy to show the following claim, whose proof is given in Section 6.3.
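For this particular family there is a shortcut worth noting: each of the three neighbor maps of G_p is a permutation of Z_p, so the double cover's edges split into three perfect matchings directly, one per map (double edges of G̃ then simply occur in two matchings as distinct multigraph edges). The sketch below (our code; the article itself appeals to Lemma 2.11, which works for any 3-regular bipartite graph) builds these matchings and evaluates the resulting f(X, Y).

```python
def matchings_from_generators(p):
    """In the double cover of G_p, each neighbor map sigma (a permutation
    of Z_p) yields the perfect matching {(u_L, sigma(u)_R) : u in Z_p}.
    The maps x -> x+1, x -> x-1, x -> x^{-1} give the three matchings."""
    sigmas = [lambda x: (x + 1) % p,
              lambda x: (x - 1) % p,
              lambda x: pow(x, p - 2, p) if x != 0 else 0]
    return [[(u, s(u)) for u in range(p)] for s in sigmas]

def eval_f(p, xs, ys):
    """f(X, Y): the sum, over the three matchings, of the product of
    (x_u + y_v) over the matching's edges."""
    total = 0
    for matching in matchings_from_generators(p):
        term = 1
        for u, v in matching:
            term *= xs[u] + ys[v]
        total += term
    return total
```

Each product term uses every X-variable and every Y-variable exactly once, which is why f is multilinear (as stated in Claim 3.1).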
Claim 3.1. Polynomial f (constructed above) is computed by a multilinear depth-three circuit C of size Θ(n) and top fan-in three, and C is a superposition of two set-multilinear depth-three circuits.
High evaluation dimension of f (X, Y). It turns out that the evaluation dimension of f(X, Y) with respect to any subset of variables of size n/10 is large. Lemma 3.1. For every subset S of n/10 variables from X ⊔ Y, Evaldim_S(f) ≥ 2^{ϵn}, where ϵ > 0 is an absolute constant.
Proof. Consider any subset S of n/10 variables from X ⊔ Y. With respect to the set S, we can classify the linear polynomials in the product terms of f(X, Y) into three types: untouched, if neither of the two variables in the linear polynomial belongs to S; partially touched, if exactly one of the two variables belongs to S; and completely touched, if both variables belong to S. Call the three product terms of f: P_1, P_2, and P_3. Claim 3.2. There exists a set X_0 ⊆ X of (7n/10 − 4) X-variables such that every x ∈ X_0 appears in an untouched linear polynomial in every P_i (for i ∈ [3]), and further, if (x + y_{j_1}), (x + y_{j_2}) and (x + y_{j_3}) are these linear polynomials in P_1, P_2, and P_3, respectively, then y_{j_1}, y_{j_2}, y_{j_3} are pairwise distinct.
Proof. For every i ∈ [3], let D_i denote the set of touched linear polynomials in product gate i. Hence, |D_1| + |D_2| + |D_3| ≤ 3n/10. Thus, the number of X-variables that are part of these touched linear polynomials is at most 3n/10, as every linear polynomial has exactly one X-variable. This implies that at least 7n/10 X-variables are part of untouched linear polynomials in every product gate. As f(X, Y) is constructed from G̃, two product gates contain the same linear polynomial l if and only if there is a double edge between the endpoints of the edge corresponding to l in G̃. The graph G̃ is the double cover of the n-vertex graph G ∈ S, where n ≥ 11 is a prime. A double edge between vertices u_L and v_R in G̃ implies the existence of a double edge between vertices u and v in G. The vertices of G are identified with elements of Z_n. A vertex a in G is connected to a + 1, a − 1 and a^{-1} (operations are modulo n and the inverse of 0 is 0). Thus, there is a double edge incident on a vertex a in G if and only if two of the vertices a + 1, a − 1 and a^{-1} coincide. If a + 1 = a − 1 mod n, then 2 = 0 mod n, which cannot be true as n ≥ 11. Hence, if there is a double edge incident on a, then either a + 1 = a^{-1} mod n or a − 1 = a^{-1} mod n, i.e., a^2 + a − 1 = 0 mod n or a^2 − a − 1 = 0 mod n. This means G has exactly two sets of double edges, arising from the roots of these two quadratics, if 5 (the discriminant of both) is a square in Z_n; otherwise, G has no double edge. As a double edge in G gives rise to two double edges in G̃, the latter has at most four double edges. Thus, at most four out of the 7n/10 X-variables are part of untouched linear polynomials that appear in more than one product gate. We remove these (at most four) variables; X_0 is the set of the remaining X-variables, of size at least (7n/10 − 4). Every variable in X_0 has the desired property stated in the claim. Claim 3.3. Let B_i denote the set of partially touched linear polynomials in P_i for i ∈ [3]. Then max_{i ∈ [3]} |B_i| ≥ 10^{-6} · n. Proof. Let T = max_{i ∈ [3]} |B_i|. Recall that f has been constructed from the bipartite expander G̃, with the vertices of G̃ identified with the variable set X ⊔ Y.
We denote the set of vertices in G̃ corresponding to the variables in S also by S, and denote the set of edges going from S to S̄ = (L ⊔ R) \ S in G̃ by Ẽ(S, S̄). Using the expansion property of G̃, |Ẽ(S, S̄)| ≥ h(G̃) · |S| > (3/2)·10^{-4} · (n/10). Every edge in Ẽ(S, S̄) corresponds to a partially touched linear polynomial. Since G̃ is 3-regular, at least |Ẽ(S, S̄)|/3 of these edges correspond to distinct partially touched linear polynomials. By assumption, the number of such partially touched linear polynomials is at most 3T; and so T ≥ |Ẽ(S, S̄)|/9 > 10^{-6} · n.
The next claim completes the proof of Lemma 3.1. Claim 3.4. Evaldim_S(f) ≥ 2^{ϵn}, where ϵ = 10^{-6}. Proof. Without loss of generality, assume |B_1| ≥ ϵn. Pick two variables, say x and x′, from the set X_0 (as described in Claim 3.2); here |X_0| ≥ 7n/10 − 4 ≥ 2 for n ≥ 11. Let (x + y_{j_2}) and (x′ + y_{j_3}) be the untouched linear polynomials appearing in P_2 and P_3, respectively. By substituting x = −y_{j_2} and x′ = −y_{j_3} in f, the terms P_2 and P_3 vanish but P_1 does not (by Claim 3.2). Let f̃ be the polynomial f after the substitution. The polynomial f̃ has only one product term P̃_1 (i.e., P_1 under the substitution), and P̃_1 has as many partially touched linear polynomials as P_1. At this point, we use the following observation.
Since the linear polynomials (x + y_{j_2}) and (x′ + y_{j_3}) are untouched, the variables x, x′, y_{j_2}, y_{j_3} do not belong to S, and hence the polynomials {ĥ_1, . . . , ĥ_t} span the space V, where ĥ_i is the polynomial h_i under the substitution x = −y_{j_2} and x′ = −y_{j_3}. Below, we show Evaldim_S(P̃_1) ≥ 2^{ϵn}. Suppose P̃_1 has T ≥ ϵn partially touched linear polynomials {l_1, l_2, . . . , l_T}. For every r ∈ [T], let l_r = z_r + u_r, where z_r ∈ S and u_r ∈ (X ∪ Y) \ S. Let Z = {z_1, z_2, . . . , z_T}. Then substitute all variables in S \ Z to 1, and let P̂_1 be the polynomial P̃_1 after this substitution. It follows easily that Evaldim_Z(P̂_1) ≤ Evaldim_S(P̃_1), as Z ⊆ S. Let q be the polynomial formed by multiplying all linear polynomials in P̂_1 that are free of the variables in Z. Then, P̂_1 = q · Π_{r ∈ [T]} (z_r + u_r) = q · Σ_{ν ⊆ [T]} z_ν · u_{[T]\ν}, where z_ν = Π_{j ∈ ν} z_j and u_{[T]\ν} = Π_{j ∈ [T]\ν} u_j. Since q is Z-free, by Lemma 2.12 we have Evaldim_Z(P̂_1) ≥ 2^T ≥ 2^{ϵn}. This completes the proof of Claim 3.4. From Lemmas 2.13 and 3.1, we conclude that any ROABP computing f(X, Y) has width at least 2^{ϵn}.

Proof of Part 2
Construction of the polynomial family. Similar to part 1, we construct a family of 3n-variate multilinear polynomials {g_n}_{n ∈ P} from the explicit family of 3-regular expanders S, but this time edges are associated with variables and vertices with linear polynomials. From an n-vertex graph G = (V, E) in S, construct a polynomial g(X, Y, Z) in variables X = {x_1, . . . , x_n}, Y = {y_1, . . . , y_n} and Z = {z_1, . . . , z_n} as follows: Let G̃ = (L ⊔ R, Ẽ) be the double cover of G; as before, h(G̃) > (3/2)·10^{-4}. The edges of G̃ can be split into three edge-disjoint perfect matchings (by Lemma 2.11). Label the edges of the first perfect matching by distinct X-variables, the edges of the second matching by distinct Y-variables, and the edges of the third by distinct Z-variables. The vertices of G̃ now correspond to linear polynomials naturally: if the three edges incident on a vertex are labelled x_i, y_j, and z_k, then associate the linear polynomial (x_i + y_j + z_k) with the vertex. Let P_1 be the product of the linear polynomials associated with the vertices of L, and P_2 the product of the linear polynomials associated with the vertices of R. The polynomial g(X, Y, Z) is the sum of P_1 and P_2. The following claim is easy to show (just like Claim 3.1). For completeness, we prove Claim 3.5 in Section 6.3.

Claim 3.5. Polynomial g (constructed above) is computed by a multilinear depth-three circuit C of size Θ(n) and top fan-in two, and C is a superposition of three set-multilinear depth-three circuits.
High evaluation dimension of g(X, Y, Z). The proof of the following lemma is similar to that of Lemma 3.1; differences arise only due to the "dual" nature of g. Proof. Let T = max_{i∈[2]} |B_i|. It is easy to observe the following.
Let C be the set of vertices in G̃ corresponding to the completely touched linear polynomials in either of the product gates; thus |C| = |C_1| + |C_2| and n/15 − 8T/3 ≤ |C| ≤ n/15. Each edge in Ẽ(C, C̄) connects a vertex corresponding to a completely touched linear polynomial to a vertex corresponding to a partially touched linear polynomial. Using the expansion of G̃, |Ẽ(C, C̄)| ≥ h(G̃)·|C|. Since the edges in Ẽ(C, C̄) are associated with variables in S, a vertex corresponding to a partially touched linear polynomial has at most two edges from Ẽ(C, C̄) incident on it. Hence, the number of vertices corresponding to partially touched linear polynomials is at least |Ẽ(C, C̄)|/2. But, by assumption, the number of such vertices is at most 2T. Thus, 2T ≥ h(G̃)·|C|/2 ≥ (h(G̃)/2)·(n/15 − 8T/3), which gives T ≥ ϵn for a suitable constant ϵ > 0. The proof of the next claim is much like that of Claim 3.4.
Proof. Without loss of generality, assume |B_1| ≥ ϵn. Since G̃ is the double cover of a graph G ∈ S, it is easy to argue that no two vertices in G̃ have all three edges in common. Hence, every linear polynomial is unique to a product gate, i.e., if l is a linear factor of P_2, then l is not a linear factor of P_1. Pick an untouched linear polynomial (x + y + z) in P_2 such that x is part of an untouched linear polynomial in P_1; we know there are at least n − 2n/10 = 4n/5 such X-variables. By substituting x = −(y + z), P_2 vanishes but P_1 remains nonzero. Let ĝ be the polynomial we get after this substitution. Then ĝ has just one product term, P̂_1 (corresponding to P_1 after the substitution), and P̂_1 has as many partially touched linear polynomials as P_1. From here on, an argument similar to the one used to prove Claim 3.4 completes the proof. This completes the proof of Lemma 3.2. From Lemmas 2.13 and 3.2, we conclude that any ROABP computing g has width at least 2^{ϵn}.

LOWER BOUNDS FOR MULTILINEAR DEPTH THREE CIRCUITS
The proofs of Theorems 1.7 and 1.8 are inspired by a particular kind of projection of IMM_{n,d} considered in Reference [16]. We say a polynomial f is a simple projection of another polynomial g if f is obtained from g by setting some of its variables to field constants.
Proof of Theorem 1.7. The proof proceeds by constructing an ABP M of width n with d + 1 layers of vertices such that (a) the polynomial computed by M, say f, is a simple projection of IMM_{n,d}, and (b) any multilinear depth-three circuit computing f has top fan-in n^{Ω(d)}. Since an ABP can be viewed equivalently as a product of matrices, we describe M using matrices. Matrices X^{(1)} and X^{(2k)} are a row and a column vector of size n, respectively; the uth entry of X^{(1)} (similarly, X^{(2k)}) is x^{(1)}_u (respectively, x^{(2k)}_u). All the remaining matrices X^{(2)}, . . . , X^{(2k−1)} are diagonal matrices in the X-variables, i.e., for i ∈ [2, 2k − 1], the (u, u)th entry of X^{(i)} is x^{(i)}_u and all other entries are zero. The matrices are placed as follows: between two adjacent Y-matrices, Y^{(i)} and Y^{(i+1)}, we have three matrices ordered from left to right as X^{(2i)}, A^{(i)}, and X^{(2i+1)}, for every i ∈ [1, k − 1]; X^{(1)} is on the left of Y^{(1)}, and X^{(2k)} is on the right of Y^{(k)}. Naturally, we have the following relation between k and d: d = 4k − 1, i.e., k = (d + 1)/4. Thus, |X| = 2nk and |Y| = n²k. This imbalance between the number of X- and Y-variables plays a vital role in the proof. Denote the polynomial computed by the ABP M as f(X, Y).
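For concreteness, the matrix product defining f can be multiplied out symbolically for small n and k. The sketch below is ours, not from the paper: it represents multilinear polynomials as dictionaries from variable sets to coefficients, and — since the entries of the A^{(i)} matrices are not specified in this excerpt — it assumes they are all-ones matrices, an assumption under which the stated count |Ỹ^k| = n^{2k} comes out as expected.

```python
# Multilinear polynomials as dicts: frozenset of variable names -> coefficient.
ONE = {frozenset(): 1}

def var(name):
    return {frozenset([name]): 1}

def pmul(p, q):
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 | m2  # variable supports are disjoint here, so union = product
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def matmul(A, B):
    C = [[{} for _ in range(len(B[0]))] for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            cell = {}
            for t in range(len(B)):
                for m, c in pmul(A[i][t], B[t][j]).items():
                    cell[m] = cell.get(m, 0) + c
            C[i][j] = {m: c for m, c in cell.items() if c != 0}
    return C

n, k = 2, 2  # d = 4k - 1 = 7 matrices: X1 Y1 X2 A1 X3 Y2 X4
Xrow  = [[var(f"x1_{u}") for u in range(n)]]                                    # X^(1), 1 x n
Xcol  = [[var(f"x4_{u}")] for u in range(n)]                                    # X^(2k), n x 1
Xd    = lambda i: [[var(f"x{i}_{u}") if u == v else {} for v in range(n)]
                   for u in range(n)]                                           # diagonal X^(i)
Ym    = lambda i: [[var(f"y{i}_{u}_{v}") for v in range(n)] for u in range(n)]  # Y^(i)
Aones = [[dict(ONE) for _ in range(n)] for _ in range(n)]                       # assumed A^(i)

M = Xrow
for B in [Ym(1), Xd(2), Aones, Xd(3), Ym(2), Xcol]:
    M = matmul(M, B)
f = M[0][0]
```

Under this assumption, f has exactly n^{2k} monomials, each picking one Y-variable per Y-matrix, and distinct Y-choices yield distinct X-parts — the combinatorial fact behind the lower bound on PD_{Y^k}(f) below.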
The following claim is easy to verify, as f is a simple projection of IMM_{n,d}; the proof of Claim 4.1 is given in Section 6.4. Lower bounding PD_{Y^k}(f). Let Ỹ^k ⊆ Y^k be the set of monomials formed by picking exactly one Y-variable from each of the matrices Y^{(1)}, . . . , Y^{(k)} and taking their product. Then, |Ỹ^k| = n^{2k}. Recall that PD_{Y^k}(f) denotes the skewed partial derivative measure of f, as defined in Definition 2.2.
Proof. The derivative of f with respect to a monomial m ∈ Y^k is nonzero if and only if m ∈ Ỹ^k, and such a derivative ∂f/∂m is a multilinear degree-2k monomial in the X-variables. The derivatives of f with respect to two distinct monomials m and m′ in Ỹ^k give two distinct multilinear degree-2k monomials in the X-variables. Hence, PD_{Y^k}(f) ≥ |Ỹ^k| = n^{2k}. Upper bounding PD_{Y^k} of a multilinear depth-three circuit.
We need to upper bound the dimension of the "skewed" partial derivatives of a term T_i = T (say). Let T = ∏_{j=1}^{q} l_j, where each l_j is a linear polynomial. Since T is multilinear, at most |X| of these linear polynomials contain X-variables. Without loss of generality, assume the linear polynomials l_1, . . . , l_p contain X-variables and the remaining l_{p+1}, . . . , l_q are X-free (here p ≤ |X|). Let Q = ∏_{j=p+1}^{q} l_j; then T = Q · ∏_{j=1}^{p} l_j. We take the derivative of T with respect to a monomial m ∈ Y^k and then substitute the Y-variables to zero. Applying the product rule of differentiation, and observing that differentiating a linear polynomial with respect to a variable turns it into a constant, we have

  [∂T/∂m]_{Y=0} = ∑_{S⊆[p], |S|≤k} α_S · ∏_{j∈[p]\S} l_j(X, 0),

where the α_S's are constants from the field and m is a representative element of the set Y^k. Hence, every such derivative can be expressed as a linear combination of ∑_{t=0}^{k} C(p, t) ≤ (k + 1)·C(|X|, k) polynomials, where the last inequality holds because p ≤ |X| and k ≤ |X|/2. It follows from Claim 4.2 and Lemma 4.1 that the top fan-in s of any multilinear depth-three circuit computing f(X, Y) satisfies

  s ≥ n^{2k} / ((k + 1)·C(|X|, k)) ≥ n^{2k} / ((k + 1)·(2en)^k) = n^{Ω(d)},

as n ≥ 6 and k ≤ |X|/2 (required in Lemma 4.1). Claim 4.1 now completes the proof of Theorem 1.7. Theorem 1.7 implies the following corollary (already known due to Reference [36]), as IMM_{n,d} is a simple projection of Det_{nd×nd}, the determinant of an nd × nd symbolic matrix [44]. Proof of Theorem 1.8. We now show that the polynomial f(X, Y), computed by the ABP M, can also be computed by a multilinear depth-four circuit of size O(n²d) with top fan-in just one. The ABP M has k matrices, Y^{(1)}, . . . , Y^{(k)}, containing the Y-variables. Associate with each matrix Y^{(i)} the two matrices containing X-variables on its immediate left and right, X^{(2i−1)} and X^{(2i)}. Every monomial in f is formed by picking exactly one variable from every matrix and taking their product.
Once we pick y^{(i)}_{u,v} from Y^{(i)}, this automatically fixes the variables picked from X^{(2i−1)} and X^{(2i)}, as these matrices are diagonal (or a row/column vector). Moreover, any variable can be picked from Y^{(i)} irrespective of which other Y-variables are picked from Y^{(1)}, . . . , Y^{(k)}. This observation can be easily formalized to show that

  f(X, Y) = ∏_{i=1}^{k} ( ∑_{u,v∈[n]} x^{(2i−1)}_u · y^{(i)}_{u,v} · x^{(2i)}_v ).

The size of this multilinear ΠΣΠ circuit is O(n²k) = O(n²d).

PROOF OF THEOREM 1.9
We prove Theorem 1.9 in this section. In particular, we use the shift and rank concentration technique of Reference [3] to give a quasi-polynomial time hitting set for a restricted class of multilinear depth-three circuits. The model we consider is a multilinear depth-three circuit that is both a superposition of m set-multilinear depth-three circuits and simultaneously a sum of l set-multilinear depth-three circuits, where m and l are constants. Before we prove Theorem 1.9, we briefly review the shift and rank concentration technique from Reference [3].
Shift and rank concentration. Suppose we wish to check whether a polynomial computed by a set-multilinear depth-three circuit is identically zero. Let the given circuit be C = ∑_{i∈[s]} ∏_{j∈[d]} l_{i,j}, where the base sets are X_j = {x_{j,1}, . . . , x_{j,n}} and the l_{i,j}'s are linear polynomials in the variables X_j. We view the polynomial C as an s-coordinate vector whose ith coordinate is the polynomial computed by the ith product gate; a dot product with the all-ones vector 1 gives back the polynomial C. In shift and rank concentration, we shift each variable x_{j,r} to x_{j,r} + t_{j,r}, where the t_{j,r}'s are formal variables. Let T_j = {t_{j,1}, t_{j,2}, . . . , t_{j,n}}, T = ∪_{j∈[d]} T_j, S ⊆ X, ν_S = ∏_{x_{j,r}∈S} x_{j,r}, and let Z_{ν_S} be the coefficient vector over F(T) corresponding to the monomial ν_S in the shifted circuit C(X + T). The idea is to use a map τ : t_{j,r} → t^{ω_{j,r}}, where t is a fresh variable different from the X- and T-variables, such that

  span_{F(t)} { τ(Z_{ν_S}) : |S| ≤ log s } = span_{F(t)} { τ(Z_{ν_S}) : S ⊆ X },

where span_{F(t)}{·} denotes the span of the coefficient vectors corresponding to the different monomials in the shifted polynomial, and |S| equals the support of the monomial ν_S. We say that such a map τ achieves log s concentration. Reference [3] showed that it suffices to try (nd)^{O(log s)} many maps to find one achieving log s concentration, and that the ω_{j,r}'s of such a map are bounded by (nd)^{O(log s)}. After shifting with the desired map, the polynomial C is nonzero if and only if the shifted polynomial has a monomial of support at most log s with a nonzero coefficient in F(t). Thus, we check whether the shifted polynomial has such a monomial by projecting onto all possible choices of log s variables and testing whether the shifted polynomial is nonzero using Reference [27], in (nd)^{O(log s)} time. We now prove Theorem 1.9. Theorem 1.9 (restated).
Let C_{n,m,l,s} be the subclass of multilinear depth-three circuits computing n-variate polynomials such that every circuit in C_{n,m,l,s} is a superposition of at most m set-multilinear depth-three circuits and simultaneously a sum of at most l set-multilinear depth-three circuits, and has top fan-in s. There is a hitting-set generator for C_{n,m,l,s} running in (ns)^{O(lm·log s)} time.
Proof. The circuit C is a superposition of m set-multilinear depth-three circuits with base sets X_1, X_2, . . . , X_m. The circuit C is also a sum of l set-multilinear depth-three circuits C_1, C_2, . . . , C_l with top fan-ins s_1, s_2, . . . , s_l, respectively, where s_1 + s_2 + · · · + s_l = s. We make the following assumptions on C: (1) for all u ∈ [m], |X_u| = a and X_u = {x_{u,1}, x_{u,2}, . . . , x_{u,a}}; (2) every product gate in C computes a degree-a polynomial in the X-variables.
The second assumption allows us to associate with circuit C_k m permutations σ_{k,1}, . . . , σ_{k,m} of [a], corresponding to the base sets X_1, . . . , X_m, respectively, such that C_k computes the polynomial

  C_k = ∑_{i∈[s_k]} ∏_{j∈[a]} ( α_{i,j} + ∑_{u∈[m]} z^{(i)}_{u,σ_{k,u}(j)} · x_{u,σ_{k,u}(j)} ),

where α_{i,j}, z^{(i)}_{u,σ_{k,u}(j)} ∈ F for i ∈ [s_k], u ∈ [m], and j ∈ [a]. These assumptions are without loss of generality, and the arguments continue to hold in their absence; in particular, they enable us to present the main ideas of the proof clearly. We outline these ideas briefly below, after setting up a few more notations. We have m sets of shift variables, T_u = {t_{u,1}, t_{u,2}, . . . , t_{u,a}} for all u ∈ [m], and T = ∪_{u∈[m]} T_u. For convenience, we denote the union of the first r base sets by U_r, i.e., U_r = ∪_{u∈[r]} X_u, and W_r = X \ U_r.
Proof outline. The variable x_{u,j} is shifted to x_{u,j} + t_{u,j}. We first argue that, after this shift, there is a monomial μ in the X-variables of support at most m·log s with a nonzero coefficient in F[T] if and only if C computes a nonzero polynomial. Naturally, this is true for any polynomial, but the way we prove it for C(X) is what enables us to show, in the second part, that we can construct a map setting t_{u,j} to t^{ω_{u,j}}, where t is a fresh variable and ω_{u,j} has an appropriately small value, such that after applying the map, μ has a nonzero coefficient polynomial in F[t]. We argue the first part iteratively: in the first iteration, we show there is a monomial μ_1 in the X_1-variables of support at most log s whose coefficient polynomial in F[W_1 ∪ T] is nonzero. We induct on this nonzero coefficient polynomial, which is computed by a depth-three circuit that is a superposition of m − 1 set-multilinear depth-three circuits and a sum of l set-multilinear depth-three circuits. In particular, at step r, we have a polynomial in F[W_{r−1} ∪ T] computed by a depth-three circuit that is a superposition of m − (r − 1) set-multilinear depth-three circuits and a sum of l set-multilinear depth-three circuits. We show that such a polynomial has a monomial μ_r in the X_r-variables of support at most log s whose coefficient polynomial in F[W_r ∪ T] is nonzero. Thus, at the end of step m, the product of the monomials μ = ∏_{r=1}^{m} μ_r has support at most m·log s and a nonzero coefficient polynomial in F[T]. The next part of the proof is the most important: here we argue that we can efficiently construct a map that sets t_{u,j} to t^{ω_{u,j}}, where t is a fresh variable and ω_{u,j} is bounded by (ns)^{O(lm·log s)}, such that after applying the map, at every step r, μ_r has a nonzero coefficient over F[t, W_r], and hence μ has a nonzero coefficient over F[t].
Once we show this, finding a hitting set is easy: project onto all possible choices of m·log s variables and test whether the shifted polynomial is nonzero over F[t] using sparse PIT [27].
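The shape of the t_{u,j} → t^{ω_{u,j}} maps can be illustrated with plain Kronecker weights: each shift variable is sent to a power of a single fresh variable t so that distinct T-monomials remain distinguishable. This sketch is ours and uses exponentially large weights for simplicity; the point of Reference [3] is that weights of magnitude only (nd)^{O(log s)}, found by trying (nd)^{O(log s)} candidate maps, already suffice for the small set of monomials that matter.

```python
import itertools

# Kronecker-style weights: with base (max_degree + 1), the weighted degree of a
# monomial is its base-(max_degree + 1) representation, so distinct exponent
# vectors map to distinct powers of the single fresh variable t.
def kronecker_weights(num_vars, max_degree):
    base = max_degree + 1
    return [base ** i for i in range(num_vars)]

def image_exponent(exponents, weights):
    # t_1^{e_1} * ... * t_k^{e_k}  |-->  t^{sum_i e_i * w_i}
    return sum(e * w for e, w in zip(exponents, weights))

nvars, maxdeg = 4, 2
w = kronecker_weights(nvars, maxdeg)
monomials = list(itertools.product(range(maxdeg + 1), repeat=nvars))
images = [image_exponent(m, w) for m in monomials]
```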

Part 1:
The polynomial computed by C after shifting the variables x_{u,j} to x_{u,j} + t_{u,j}, for all u ∈ [m] and j ∈ [a], is denoted C(X + T). We argue inductively that C(X + T), when viewed as a polynomial over F[T], has a monomial μ of support at most m·log s with a nonzero coefficient over F[T]. We present the inductive step here; the base case can be argued similarly. At step r, we are ensured that there are monomials μ_1, μ_2, . . . , μ_{r−1} in the variables X_1, X_2, . . . , X_{r−1}, respectively, each of support at most log s, such that ∏_{i=1}^{r−1} μ_i has a nonzero coefficient over F[W_{r−1} ∪ T]. It is easy to observe that this coefficient, viewed as a polynomial over F[T], is computed by a circuit C^{(r)} that is a superposition of m − (r − 1) set-multilinear depth-three circuits with base sets X_r, . . . , X_m, and a sum of l set-multilinear depth-three circuits. For k ∈ [l], the kth set-multilinear depth-three circuit, denoted C^{(r)}_k, is the circuit computing the coefficient of the monomial ∏_{i=1}^{r−1} μ_i in C_k. Without loss of generality, and reusing symbols, the polynomial C^{(r)}_k(W_{r−1} ∪ T) computed by the circuit C^{(r)}_k can be represented in the same form as before. Without loss of generality, we may assume that σ_{k,r} is the identity permutation for all k ∈ [l]. Further, for convenience of notation, in each product gate we can include the first (r − 1)·log s linear polynomials by assuming that z^{(i)}_{u,σ_{k,u}(j)} = 0 and α_{i,j} = 1 for all k ∈ [l], i ∈ [s_k], u ∈ [m], and j ∈ [(r − 1)·log s].
Also, using the same argument as in Reference [3], we may assume that α_{i,j} = 1 for all k ∈ [l], i ∈ [s_k], and j ∈ [a]. For convenience, we define

  ρ_{i,j} = z^{(i)}_{1,σ_{k,1}(j)}·t_{1,σ_{k,1}(j)} + · · · + z^{(i)}_{r−1,σ_{k,r−1}(j)}·t_{r−1,σ_{k,r−1}(j)} + z^{(i)}_{r+1,σ_{k,r+1}(j)}·(x_{r+1,σ_{k,r+1}(j)} + t_{r+1,σ_{k,r+1}(j)}) + · · · + z^{(i)}_{m,σ_{k,m}(j)}·(x_{m,σ_{k,m}(j)} + t_{m,σ_{k,m}(j)}),

so that every linear polynomial of C^{(r)}_k can be expressed as (1 + ρ_{i,j}) + z^{(i)}_{r,j}·x_{r,j}. We view the polynomial C^{(r)}(W_{r−1} ∪ T) = ∑_{k=1}^{l} C^{(r)}_k(W_{r−1} ∪ T) as a polynomial in the X_r-variables over the function field F(W_r ∪ T), and we will prove that there is a monomial μ_r in the X_r-variables of support at most log s with a nonzero coefficient polynomial in F[W_r ∪ T].
Since this holds for all i ∈ [s], from Equation (6) we get a relation, Equation (7), among the vectors Z_{ν_I}, for all I ⊆ J. The correctness of the proof hinges on the coefficient of the (log s + 1)-support monomial, i.e., g_J(W_r ∪ T), being nonzero in these equations. In this part, we analyze the structure of g_J(W_r ∪ T), for all r ∈ [m] and all J ⊆ [a] with |J| = log s + 1, to argue that a map ψ setting t_{u,j} to t^{ω_{u,j}} can be constructed in time (ns)^{O(lm·log s)}, with the ω_{u,j}'s also bounded by (ns)^{O(lm·log s)}, such that g_J(t, W_r) ∈ F(t, W_r) remains nonzero after applying this map. It follows immediately that after applying ψ, the monomial μ has a nonzero coefficient over F[t]. We begin by proving the following claim.
Claim 5.2. The expression denoted g_J(W_r ∪ T) in Equation (7) is a ratio of two polynomials, each involving only a small number of variables from W_r ∪ T. Proof. Recall Equations (3), (4), and (5). Define vectors Z_{ν_I} ∈ F[W_r ∪ T]^s, for all I ⊆ J, such that the ith entry of Z_{ν_I} is ∏_{j∈I} z^{(i)}_{r,j} · ∏_{j∈J\I} (1 + ρ_{i,j}). Since for all i ∈ [s] and j ∈ J, ρ_{i,j} is a linear polynomial in 2(m − 1) variables from W_r ∪ T, the expression ∏_{j∈J\I} (1 + ρ_{i,j}) is a polynomial in O(m·log s) variables from W_r ∪ T. We call the variables appearing in ∏_{j∈J\I} (1 + ρ_{i,j}) the variable set of the expression. Recall that the circuit C^{(r)} is a sum of l set-multilinear depth-three circuits, and if two such expressions correspond to product gates from the same set-multilinear depth-three circuit, then they have the same variable set. From Claim 5.2, it follows that the number of monomials in the T-variables in the numerator and denominator of g_J(W_r ∪ T) is s^{O(lm·log s)}. Thus, the number of monomials in the T-variables in the numerator/denominator of g_J(W_r ∪ T), at every iteration r ∈ [m] and for all J ⊆ [a] of size log s + 1, is (ns)^{O(lm·log s)}. The map ψ sends these (ns)^{O(lm·log s)} monomials in the T-variables to distinct monomials in F[t], and it is standard to compute such a map ψ(t_{u,j}) = t^{ω_{u,j}} in (ns)^{O(lm·log s)} time with the ω_{u,j}'s also bounded by (ns)^{O(lm·log s)} [27].

PROOF OF TECHNICAL CLAIMS

6.1 Proofs of Observations in Section 1
Observation 1.1 (restated). Given a circuit C that is a superposition of t set-multilinear circuits on unknown base sets Y_1, Y_2, . . . , Y_t, finding t base sets Y′_1, Y′_2, . . . , Y′_t such that C is a superposition of t set-multilinear circuits on base sets Y′_1, Y′_2, . . . , Y′_t is NP-hard when t > 2.
Proof. We reduce the t-coloring problem to this problem. Given a graph G, the t-coloring problem asks whether the vertices of G can be colored with t colors such that no two adjacent vertices receive the same color. Suppose we are given a graph G = (V, E) with V = {u_1, . . . , u_n}; identify these vertices with n variables. From G, construct a circuit C as follows: C contains a product gate P multiplying the n variables, (u_1)(u_2) · · · (u_n), and for every edge between two vertices u_i and u_j in G, a product gate P_{u_i,u_j} consisting of the single linear polynomial (u_i + u_j). We argue below that C is a superposition of t set-multilinear depth-three circuits if and only if G is t-colorable.
Suppose G is t-colorable. Then the vertices of G with the same color form a valid base set; thus, the t color classes of G correspond to t valid base sets of C, and C is a superposition of t set-multilinear depth-three circuits. In the reverse direction, suppose C is a superposition of t set-multilinear depth-three circuits, i.e., C has t base sets. A t-coloring of G is obtained by giving every base set a unique color. If two variables u_i and u_j belong to the same base set, then, as u_i and u_j appear in different linear polynomials of the product gate P, there is no product gate of C in which u_i and u_j appear in the same linear polynomial. But this implies there is no edge between u_i and u_j in G, for otherwise C would contain the product gate P_{u_i,u_j} with the single linear polynomial (u_i + u_j).
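The correspondence in the reduction can be checked mechanically. Below is an illustrative sketch (the function names are ours, not from the paper): a coloring of G assigns variables to valid base sets exactly when no linear polynomial of any product gate of C contains two variables from the same base set.

```python
# Each product gate is a list of linear polynomials; each linear polynomial is
# represented by the list of variables appearing in it.
def circuit_from_graph(vertices, edges):
    gate_P = [[v] for v in vertices]              # P = (u_1)(u_2)...(u_n)
    edge_gates = [[[u, v]] for (u, v) in edges]   # P_{u,v} = (u + v), one per edge
    return [gate_P] + edge_gates

def is_valid_base_partition(gates, coloring):
    # coloring: variable -> base-set index (color)
    for gate in gates:
        for lin in gate:
            colors = [coloring[v] for v in lin]
            if len(colors) != len(set(colors)):
                return False    # two same-set variables in one linear polynomial
    return True

triangle = circuit_from_graph([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
```

On the triangle, a proper 3-coloring yields a valid base-set partition, while any coloring that repeats a color on an edge fails, mirroring the if-and-only-if above.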

Observation 1.2 (restated). A polynomial computed by a multilinear ΣΠΣ circuit with top fan-in two and at most two variables per linear polynomial can also be computed by an ROABP of constant width.
Proof. Let C be a multilinear depth-three circuit with top fan-in two and at most two variables per linear polynomial, computing the polynomial f(X) in the n variables {x_1, . . . , x_n}. Let σ : [n] → [n] be a permutation. Then, without loss of generality, f(X) can be expressed as

  f(X) = ∏_{i∈[n], i odd} (1 + x_i + x_{i+1}) + ∏_{i∈[n], i odd} (1 + x_{σ(i)} + x_{σ(i+1)}).

For simplicity, we have assumed that the coefficients of the x_i's and the constant term in every linear polynomial are 1, and that n is even; the argument can be adapted appropriately to prove the general case. Let P_1 = ∏_{i∈[n], i odd} (1 + x_i + x_{i+1}) and P_2 = ∏_{i∈[n], i odd} (1 + x_{σ(i)} + x_{σ(i+1)}). The product gates P_1 and P_2 can easily be computed individually by ROABPs of width two, but with different variable orderings. We express the two ROABPs in the same variable ordering and add the polynomials computed by them (P_1 and P_2) to get an ROABP computing f.
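A width-two ROABP for a single product gate such as P_1 can be made explicit. The sketch below is ours, under the paper's normalization that all coefficients equal 1: it evaluates ∏_{i odd} (1 + x_i + x_{i+1}) reading one variable per layer, where after reading x_i the two states carry p and p·(1 + x_i), with p the product of the completed factors.

```python
# Width-two ROABP evaluation of P = prod over odd i of (1 + x_i + x_{i+1}).
def eval_width2_roabp(xs):           # xs = [x_1, ..., x_n], n even
    p = 1                            # product of the completed factors
    for i in range(0, len(xs), 2):
        s = (p, p * (1 + xs[i]))     # layer reading x_i: states (p, p*(1 + x_i))
        p = s[0] * xs[i + 1] + s[1]  # layer reading x_{i+1}: p*(1 + x_i + x_{i+1})
    return p

def eval_direct(xs):
    p = 1
    for i in range(0, len(xs), 2):
        p *= 1 + xs[i] + xs[i + 1]
    return p
```

The two-state update is exactly a product of 1×2 and 2×1 edge-weight matrices, so two vertices per layer suffice.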
We partition the linear polynomials of P_1 and P_2 into sets {L_{11}, L_{12}, . . . , L_{1k}} and {L_{21}, L_{22}, . . . , L_{2k}}, respectively, such that for every t ∈ [k], the sets of variables appearing in the linear polynomials of L_{1t} and of L_{2t} are equal, and this set is completely disjoint from the set of variables appearing in the linear polynomials of L_{mr}, for m ∈ [2] and r ∈ [k] \ {t}. We give a "greedy" partition procedure below. Mark all the linear polynomials in P_1 and P_2 as unpicked, and initialize t = 1 and i = 1:
(1) Pick an unpicked linear polynomial l_p = (1 + x_i + x_{i+1}) in P_1 and put it in L_{1t}. Mark l_p as picked. Store the value i in temp: temp = i.
(2) Let l_q = (1 + x_{i+1} + x_j) be the linear polynomial of P_2 in which the variable x_{i+1} appears. Put l_q in L_{2t} and mark l_q as picked.
(3) If j is equal to temp, then increment t and start from step 1.
(4) Else, set i = j and let l_r = (1 + x_i + x_{i+1}) be the linear polynomial of P_1 in which the variable x_i appears. Put l_r in L_{1t} and mark l_r as picked.
(5) Repeat from step 2.
Clearly, for every t ∈ [k], the sets of variables appearing in the linear polynomials of L_{1t} and of L_{2t} are equal, and this set is disjoint from the set of variables appearing in the linear polynomials of L_{mr}, for m ∈ [2] and r ∈ [k] \ {t}. Notice that if some of the coefficients of the variables in the linear polynomials were zero (instead of 1, as assumed above), or if some linear polynomial involved just a single variable, then the sets of variables appearing in the linear polynomials of L_{1t} and L_{2t} may not be the same, but their union would still be completely disjoint from the set of variables appearing in the linear polynomials of L_{mr}, for m ∈ [2] and r ∈ [k] \ {t}. The partition procedure above can be adjusted appropriately to handle these cases.
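The partition procedure is essentially a walk along the cycles of the union of two perfect matchings. A compact sketch (ours; the variables play the role of matching endpoints, assuming the generic form (1 + x_i + x_j) for every linear polynomial):

```python
# p1_edges, p2_edges: perfect matchings on the same variable set, one edge per
# linear polynomial. The union of two perfect matchings is a disjoint union of
# cycles; walking each cycle collects the P1-edges into L_{1t} and the
# P2-edges into L_{2t}.
def greedy_partition(p1_edges, p2_edges):
    nxt1 = {}
    for a, b in p1_edges:
        nxt1[a], nxt1[b] = b, a
    nxt2 = {}
    for a, b in p2_edges:
        nxt2[a], nxt2[b] = b, a
    unpicked = {frozenset(e) for e in p1_edges}
    parts = []
    while unpicked:
        start = min(min(e) for e in unpicked)   # deterministic starting variable
        L1, L2 = [], []
        i = start
        while True:
            j = nxt1[i]                         # step along a P1 polynomial
            L1.append((i, j))
            unpicked.discard(frozenset((i, j)))
            kk = nxt2[j]                        # step along a P2 polynomial
            L2.append((j, kk))
            if kk == start:
                break
            i = kk
        parts.append((L1, L2))
    return parts

p1 = [(1, 2), (3, 4), (5, 6)]
p2 = [(2, 3), (4, 1), (6, 5)]   # a matching induced by a permutation sigma
parts = greedy_partition(p1, p2)
```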
We express the two ROABPs computing P_1 and P_2 in the same variable ordering as a sequence of k parts. In part t, we compute the products of the linear polynomials in L_{1t} and in L_{2t} separately, using two ROABPs with the same variable ordering. Finally, we connect the ROABPs from these k parts to give a single ROABP of width six. We now describe how to construct an ROABP corresponding to the linear polynomials in L_{1t} and L_{2t}. Arrange the linear polynomials of L_{1t} and L_{2t} in the order they are picked during the partition process. Suppose after this arrangement we have L_{1t} = {(1 + x_i + x_{i+1}), (1 + x_j + x_{j+1}), . . . , (1 + x_l + x_{l+1})} and L_{2t} = {(1 + x_{i+1} + x_j), (1 + x_{j+1} + x_k), . . . , (1 + x_{l+1} + x_i)}. Figure 2 shows the two ROABPs computing the products of the linear polynomials in L_{1t} and L_{2t}, respectively. Consider the input and output nodes of L_{1t} and L_{2t}, marked in Figure 2, as the sources and sinks of these two ROABPs, respectively. The variables are arranged such that, except for x_i, all variables appear in the same order in the two ROABPs. We order x_i by breaking the second ROABP into two parts, as shown in Figure 3: the first part computes the polynomial in which x_i does not appear, and the second part brings x_i to the beginning and computes the polynomial in which x_i appears. Finally, we add these two parts by adding an extra layer. In the general case, where some coefficients of the variables in the linear polynomials are zero or some linear polynomials involve just a single variable, the variables in the linear polynomials of L_{1t} and L_{2t} may define a path instead of a cycle. Handling this case is easier, as both ROABPs can be expressed in the same variable ordering to begin with. At the end, we have, from the k pairs of sets of linear polynomials, k directed acyclic graphs, each consisting of two ROABPs with a consistent variable ordering.
We connect these k graphs by adding weight-1 edges between the input nodes of L_{1r}, L_{2r} and the output nodes of L_{1(r+1)}, L_{2(r+1)}, respectively, for r ∈ [k − 1]. The resulting graph is an ROABP of width six computing f.

Proof. We transform the width-n ABP computing IMM_{n,d} into a width-n² ROABP computing the same polynomial. Let {X^{(1)}, X^{(2)}, . . . , X^{(d)}} be the d matrices of IMM_{n,d}; the (j, k)th entry of X^{(i)} is x^{(i)}_{j,k}. We replace the matrix X^{(i)} by n² + 2 matrices: A^{(i,1)}, A^{(i,2)}, and A^{(i,j,k)} for j, k ∈ [n]. A^{(i,1)} and A^{(i,2)} are rectangular matrices of dimensions n × n² and n² × n, respectively, and the A^{(i,j,k)} are diagonal matrices of dimension n² × n². Ordered from left to right, A^{(i,1)} and A^{(i,2)} are first and last, respectively, and A^{(i,j_1,k_1)} comes before A^{(i,j_2,k_2)} if j_1 < j_2, or if j_1 = j_2 and k_1 < k_2. The (a, a)th entry of A^{(i,j,k)} is x^{(i)}_{j,k} if a = n·(j − 1) + k, and 1 otherwise. The (a, b)th entry of A^{(i,1)} is 1 if (a − 1)·n + 1 ≤ b ≤ a·n, and 0 otherwise; similarly, the (a, b)th entry of A^{(i,2)} is 1 if b ≡ a (mod n), and 0 otherwise. Figure 4 shows the part of the ROABP corresponding to the split of a matrix X into n² + 2 matrices, for n = 4, as explained above. When we split X^{(i)} into n² + 2 matrices as above, the corresponding part of the ROABP computing the product of these n² + 2 matrices has n vertices in both the leftmost and the rightmost layer of vertices, and there is a unique path from the jth vertex in the leftmost layer to the kth vertex in the rightmost layer with weight x^{(i)}_{j,k}. Hence, the product of the n² + 2 matrices arranged as above is X^{(i)}.
To transform the ABP computing IMM_{n,d} into an ROABP, we have introduced, between every pair of adjacent layers of vertices of the ABP, n² + 1 layers with n² vertices in each layer; hence, the width of the ROABP is n².
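The split of X^{(i)} into n² + 2 matrices can be verified numerically. The following sketch (ours) builds A^{(i,1)}, the n² diagonal matrices, and A^{(i,2)} exactly as described above, using 1-based indices j, k and diagonal position a = n(j − 1) + k, and multiplies them out:

```python
def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def split_matrices(X, n):
    # A^(i,1): n x n^2, entry (a, b) = 1 iff (a-1)*n + 1 <= b <= a*n
    A1 = [[1 if (a - 1) * n + 1 <= b <= a * n else 0
           for b in range(1, n * n + 1)] for a in range(1, n + 1)]
    # A^(i,j,k): n^2 x n^2 diagonal, entry (a, a) = x_{j,k} iff a = n*(j-1) + k, else 1
    diags = []
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            D = [[0] * (n * n) for _ in range(n * n)]
            for a in range(1, n * n + 1):
                D[a - 1][a - 1] = X[j - 1][k - 1] if a == n * (j - 1) + k else 1
            diags.append(D)
    # A^(i,2): n^2 x n, entry (a, b) = 1 iff b = a (mod n)
    A2 = [[1 if b % n == a % n else 0 for b in range(1, n + 1)]
          for a in range(1, n * n + 1)]
    return [A1] + diags + [A2]

n = 3
X = [[10 * j + k for k in range(1, n + 1)] for j in range(1, n + 1)]
mats = split_matrices(X, n)
prod = mats[0]
for M_ in mats[1:]:
    prod = matmul(prod, M_)
```

Filling X with distinct numeric entries and checking that the product of the n² + 2 matrices reproduces X confirms the unique-path argument above.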

Proofs of Observations and Claims in Section 3
Claim 3.1 (restated). Polynomial f (as constructed in Section 3, proof of part 1) is computed by a multilinear depth-three circuit C of size Θ(n) and top fan-in three, and C is a superposition of two set-multilinear depth-three circuits.
Proof. Since f is a sum of three product terms, where each product term is a product of linear polynomials on disjoint sets of variables, it can be computed by a multilinear depth-three circuit C