Sylvester–Gallai for Arrangements of Subspaces

In this work we study arrangements of k-dimensional subspaces V_1, …, V_n ⊂ C^ℓ. Our main result shows that, if every pair V_a, V_b of subspaces is contained in a dependent triple (a triple V_a, V_b, V_c contained in a 2k-dimensional space), then the entire arrangement must be contained in a subspace whose dimension depends only on k (and not on n). The theorem holds under the assumption that V_a ∩ V_b = {0} for every pair (otherwise it is false). This generalizes the Sylvester–Gallai theorem (or Kelly's theorem for complex numbers), which proves the k = 1 case.
Our proof also handles arrangements in which many pairs (instead of all) appear in dependent triples, generalizing the quantitative results of Barak et al. (Proc Natl Acad Sci USA 110(48):19213–19219, 2013). One of the main ingredients in the proof is a strengthening of a theorem of Barthe (Invent Math 134(2):335–361, 1998) (from the k = 1 to the k > 1 case) proving the existence of a linear map that makes the angles between pairs of subspaces large on average. Such a mapping can be found, unless there is an obstruction in the form of a low dimensional subspace intersecting many of the spaces in the arrangement (in which case one can use a different argument to prove the main theorem).


Introduction
The Sylvester-Gallai (SG) theorem states that, for n points v_1, v_2, …, v_n ∈ R^d, if for every pair v_i, v_j there is a third point v_k on the line passing through v_i, v_j, then all points must lie on a single line. This was first posed by Sylvester [15], and was solved by Melchior [14]. It was also conjectured independently by Erdős [10] and proved shortly after by Gallai. We refer the reader to the survey [5] for more information about the history and various generalizations of this theorem. The complex version of this theorem was proved by Kelly [12] (see also [8,9] for alternative proofs) and states that if v_1, v_2, …, v_n ∈ C^d and for every pair v_i, v_j there is a third v_k on the same complex line, then all points are contained in some complex plane (over the complex numbers, there are planar examples and so this theorem is tight).
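The SG condition and its conclusion can be checked mechanically for small planar point sets; the following sketch (the helper names are ours, not the paper's) tests the hypothesis by brute force over all pairs.

```python
import itertools
import numpy as np

def collinear(p, q, r, tol=1e-9):
    # p, q, r are collinear in the plane iff det[q - p, r - p] = 0
    u, v = q - p, r - p
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

def sg_hypothesis(points, tol=1e-9):
    """Does every pair lie on a line containing a third point of the set?"""
    idx = range(len(points))
    return all(
        any(collinear(points[i], points[j], points[k], tol)
            for k in idx if k not in (i, j))
        for i, j in itertools.combinations(idx, 2))

# Five points on one line: the SG hypothesis holds, and (as the theorem
# forces) the whole set is collinear.
line = [np.array([t, 2.0 * t + 1.0]) for t in range(5)]

# The three corners of a triangle violate the hypothesis.
triangle = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
```

The check is quadratic in n per pair; it is meant only to make the combinatorial condition concrete.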
In [8] (based on earlier work in [4]), the following quantitative variant of the SG theorem was proved. For a set S ⊂ C^ℓ we denote by dim(S) the smallest d such that S is contained in a d-dimensional subspace of C^ℓ.

Theorem 1.1 ([8]) Let v_1, v_2, …, v_n ∈ C^ℓ be such that, for every i ∈ [n], there exist at least δn values of j ∈ [n] \ {i} such that the line through v_i, v_j contains a third point v_k, k ∉ {i, j}. Then dim({v_1, …, v_n}) = O(1/δ).

(The dependence on δ is asymptotically tight.) From here on, we will work with homogeneous subspaces (passing through zero) instead of affine subspaces (lines/planes etc). The difference is not crucial to our results and the affine version can always be derived by intersecting with a generic hyperplane. In this setting, the above theorem will be stated for a set of one-dimensional subspaces, each spanned by some v_i (and no two v_i's being a multiple of each other), and collinearity of v_i, v_j, v_k is replaced with the three vectors being linearly dependent (i.e., contained in a 2-dimensional subspace).
One natural high dimensional variant of the SG theorem, studied in [4,11], replaces 3-wise dependencies with t-wise dependencies (e.g., every triple is in some coplanar four-tuple). In this work, we raise another natural high-dimensional variant in which the points themselves are replaced with k-dimensional subspaces. We consider such arrangements with many 3-wise dependencies (defined appropriately) and attempt to prove that the entire arrangement lies in some low dimensional space. We will consider arrangements V_1, …, V_n ⊂ C^ℓ in which each V_i is k-dimensional and each pair satisfies V_{i_1} ∩ V_{i_2} = {0}. A dependency can then be defined as a triple V_{i_1}, V_{i_2}, V_{i_3} of k-dimensional subspaces that are contained in a single 2k-dimensional subspace. The pairwise zero intersections guarantee that every pair of subspaces defines a unique 2k-dimensional space (their span), and so this definition of dependency behaves in a similar way to collinearity. For example, if V_{i_1}, V_{i_2}, V_{i_3} are dependent and V_{i_2}, V_{i_3}, V_{i_4} are dependent, then V_{i_1}, V_{i_2}, V_{i_4} are also dependent. This would not hold if we allowed some pairs to have non-zero intersections. In fact, if we allow non-zero intersections then we can construct an arrangement of two dimensional spaces with many dependent triples and with dimension as large as √n (see below).

We now state our main theorem, generalizing Theorem 1.1 (with slightly worse parameters) to the case k > 1. We use the standard V + U notation to denote the subspace spanned by all vectors in V ∪ U, and big 'O' notation to hide absolute constants.

Theorem 1.2 Let V_1, V_2, …, V_n ⊂ C^ℓ be k-dimensional subspaces such that V_i ∩ V_{i'} = {0} for all i ≠ i' ∈ [n]. Suppose that, for every i_1 ∈ [n], there exist at least δn values of i_2 ∈ [n] \ {i_1} such that V_{i_1} + V_{i_2} contains some V_{i_3} with i_3 ∉ {i_1, i_2}. Then dim(V_1 + V_2 + ⋯ + V_n) = O(k^4/δ^2).
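The rank-based definition of dependency, and the transitivity property just mentioned, can be checked numerically; the sketch below (helper names are ours) builds a small example with k = 2 in R^6.

```python
import numpy as np

def dim_sum(*bases):
    # dim(V_a + V_b + ...) = rank of the stacked basis rows
    return np.linalg.matrix_rank(np.vstack(bases))

def dependent_triple(k, Va, Vb, Vc):
    # dependent triple: three k-dimensional spaces inside one 2k-dim space
    return dim_sum(Va, Vb, Vc) <= 2 * k

k = 2
e = np.eye(6)
V1 = e[[0, 1]]                                      # span{e1, e2}
V2 = e[[2, 3]]                                      # span{e3, e4}
V3 = np.vstack([e[0] + e[2], e[1] + e[3]])          # inside V1 + V2
V4 = np.vstack([e[0] + 2 * e[2], e[1] + 2 * e[3]])  # inside V1 + V2

# pairwise zero intersection: dim(Vi + Vj) = 2k for every pair
pairwise_ok = all(dim_sum(A, B) == 2 * k
                  for A, B in [(V1, V2), (V1, V3), (V1, V4),
                               (V2, V3), (V2, V4), (V3, V4)])
```

Here (V1, V2, V3) and (V2, V3, V4) are dependent, and, as the transitivity observation predicts, so is (V1, V2, V4).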
The condition V_i ∩ V_{i'} = {0} is needed due to the following example. Set k = 2 and n = ℓ(ℓ − 1)/2, and let {e_1, e_2, …, e_ℓ} be the standard basis of R^ℓ. Define the n spaces to be V_{ij} = span{e_i, e_j} with 1 ≤ i < j ≤ ℓ. Now, for each (i, j) ≠ (i', j'), the sum V_{ij} + V_{i'j'} will contain a third space (since the size of {i, j, i', j'} is at least three). However, this arrangement has dimension ℓ > √n. The bound O(k^4/δ^2) is probably not tight and we conjecture that it could be improved to O(k/δ), possibly with a modification of our proof. One can always construct an arrangement with dimension 2k/δ by partitioning the subspaces into 1/δ groups, each contained in a single 2k-dimensional space.
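The grid example above is easy to verify directly; the following sketch (taking ℓ = 5) checks that every pair of the n = ℓ(ℓ−1)/2 spaces participates in a dependent triple while the arrangement spans ℓ > √n dimensions.

```python
import itertools
import numpy as np

l = 5                                   # ambient dimension
e = np.eye(l)
pairs = list(itertools.combinations(range(l), 2))
V = {p: e[list(p)] for p in pairs}      # V_ij = span{e_i, e_j}; n = l(l-1)/2

def contained(A, B):
    # span(A) is inside span(B) iff appending A's rows keeps the rank of B
    return np.linalg.matrix_rank(np.vstack([B, A])) == np.linalg.matrix_rank(B)

# every pair of distinct spaces has a third space inside its sum
every_pair_has_third = all(
    any(contained(V[r], np.vstack([V[p], V[q]]))
        for r in pairs if r not in (p, q))
    for p, q in itertools.combinations(pairs, 2))

n = len(pairs)
dim = int(np.linalg.matrix_rank(np.vstack(list(V.values()))))  # = l
```

The non-zero pairwise intersections (e.g. V_{12} ∩ V_{13} = span{e_1}) are exactly what lets the dimension grow with n here.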
Overview of the Proof A preliminary observation is that it suffices to prove the theorem over R. This is because an arrangement of k-dimensional complex subspaces can be translated into an arrangement of 2k-dimensional real subspaces (this is proved at the end of Sect. 2). Hence, we will now focus on real arrangements.
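The complex-to-real translation can be sketched concretely: a k-dimensional complex subspace of C^ℓ, viewed over R, becomes a 2k-dimensional real subspace of R^{2ℓ}, spanned by the real forms of v and √−1·v for each basis vector v (the function name below is ours).

```python
import numpy as np

def realify(basis):
    """Real basis (rows) of the 2k-dim real subspace of R^(2l) corresponding
    to the complex row span of `basis` in C^l: real spans of v and i*v."""
    rows = []
    for v in np.atleast_2d(basis):
        for w in (v, 1j * v):
            rows.append(np.concatenate([w.real, w.imag]))
    return np.vstack(rows)

# a 1-dimensional complex subspace of C^2 becomes a 2-dimensional real one
V = np.array([[1.0 + 2.0j, 3.0 - 1.0j]])
R = realify(V)
```

Complex linear dependencies between subspaces translate into real ones under this map, which is why the real case suffices.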
The proof of the theorem is considerably simpler when the arrangement of subspaces V_1, …, V_n satisfies an extra 'robustness' condition, namely that every two spaces have an angle bounded away from zero. More formally, for every two unit vectors v_1 ∈ V_{i_1} and v_2 ∈ V_{i_2} we have |⟨v_1, v_2⟩| ≤ 1 − τ for some absolute constant τ > 0. This condition implies that, when we have a dependency of the form V_{i_3} ⊂ V_{i_1} + V_{i_2}, every unit vector in V_{i_3} can be obtained as a linear combination, with bounded coefficients (in absolute value), of unit vectors from V_{i_1}, V_{i_2}. Fixing an orthogonal basis for each subspace and using the conditions of the theorem, we are able to construct many local linear dependencies between the basis elements. We then show (using the bound on the coefficients in the linear combinations) that the space of linear dependencies between all basis vectors, considered as a subspace of R^{kn}, contains the rows of an nk × nk matrix that has large entries on the diagonal and small entries off the diagonal. Since matrices of this form have high rank (by a simple spectral argument), we conclude that the original set of basis vectors must span a space of small dimension.
To handle the general case, we show that, unless some low dimensional subspace W intersects many of the spaces V i in the arrangement, we can find a change of basis that makes the angles between the spaces large on average (in which case, the previous argument works). This gives us the overall strategy of the proof: If such a W exists, we project W to zero and continue by induction. The loss in the overall dimension is bounded by the dimension of W , which can be chosen to be small enough. Otherwise (if such W does not exist) we apply the change of basis and use it to bound the dimension.
The change of basis is found by generalizing a theorem of Barthe [1] (see [7] for a more accessible treatment) from the k = 1 case (arrangement of points) to higher dimension. We state this result here since we believe it could be of independent interest. To state the theorem we must first introduce the following, somewhat technical, definition.

Definition 1.3 (Admissible basis set, admissible basis vector) Given a list of vector spaces V = (V_1, V_2, …, V_n), a set H ⊆ [n] is called a V-admissible basis set if

V_i ∩ ( Σ_{i'∈H\{i}} V_{i'} ) = {0} for every i ∈ H, and Σ_{i∈H} V_i = Σ_{i∈[n]} V_i;

i.e. if every space with index in H has intersection {0} with the span of the other spaces with indices in H, and the spaces with indices in H span the entire space Σ_{i∈[n]} V_i.
A V-admissible basis vector is any indicator vector 1_H of some V-admissible basis set H (where the ith entry of 1_H equals 1 if i ∈ H and 0 otherwise).
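Definition 1.3 reduces to two rank computations; a minimal sketch (helper names ours):

```python
import numpy as np

def rk(mats):
    return np.linalg.matrix_rank(np.vstack(mats)) if mats else 0

def is_admissible_basis_set(V, H):
    """Definition 1.3: the spaces indexed by H form a direct sum that spans
    the sum of all spaces in V (rows of each matrix span one subspace)."""
    chosen = [V[i] for i in H]
    direct_sum = rk(chosen) == sum(rk([V[i]]) for i in H)
    spans_everything = rk(chosen) == rk(V)
    return direct_sum and spans_everything

def indicator(n, H):
    return np.array([1 if i in H else 0 for i in range(n)])

e = np.eye(3)
V = [e[[0]], e[[1]], np.array([[1.0, 1.0, 0.0]])]  # three lines in a plane
```

For these three lines, {0, 1} and {1, 2} are admissible basis sets, while {0, 1, 2} fails the direct-sum requirement and {0} fails the spanning one.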
The following theorem is proved in Sect. 3.

Theorem 1.4 Let V = (V_1, V_2, …, V_n) be a list of subspaces of R^ℓ with V_1 + V_2 + ⋯ + V_n = R^ℓ, and let p = (p_1, p_2, …, p_n) ∈ R^n be a point in the convex hull of all V-admissible basis vectors. Then, for every ε > 0, there exists an invertible linear map M : R^ℓ → R^ℓ such that Σ_{i∈[n]} p_i Proj_{M(V_i)} is ε-close to the ℓ × ℓ identity matrix (here Proj_U denotes the orthogonal projection onto the subspace U).
The connection to the explanation given in the proof overview is as follows: If there is no subspace W of low dimension that intersects many of the spaces V 1 , . . . , V n , then one can show that there exists a vector p in the convex hull of all V-admissible basis vectors such that the entries of p are not too small. This is enough to show that the average angle between pairs of spaces is large since otherwise one can derive a contradiction to the inequality which says that the sum of orthogonal projections of any unit vector must be relatively small.
The proof of the one dimensional case in [1] proceeds by defining a strictly convex function f(t_1, …, t_m) on R^m and showing that the function is bounded. This means that there must exist a point at which all partial derivatives of f vanish. Solving the resulting equations gives an invertible matrix that defines the required change of basis. We follow a similar strategy, defining an appropriate bounded function f(t_1, …, t_m, R_1, …, R_n) in more variables, where the extra variables R_1, …, R_n represent the action of the orthogonal group O(k) on each of the spaces. In our case, however, we cannot show that f is strictly convex, and so a maximum might not exist. Nevertheless, we are still able to show that there exists a point at which all partial derivatives are very small (smaller than any ε > 0), which is sufficient for our purposes.

Connection to Locally Correctable Codes
A q-query Locally Correctable Code (LCC) over a field F is a d-dimensional subspace C ⊂ F^n that allows for 'local correction' of codewords (elements of C) in the following sense. Let y ∈ C and suppose we have query access to y' such that y'_i = y_i for at least (1 − δ)n indices i ∈ [n] (think of y' as a noisy version of y). Then, for every i, we can probabilistically pick q positions in y' and, from their (possibly incorrect) values, recover the correct value of y_i with high probability (over the choice of queries). LCCs play an important role in theoretical computer science (mostly over finite fields but recently also over the reals, see [6]) and are still poorly understood. In particular, when q is a constant greater than 2, there are exponential gaps between the dimension of explicit constructions and the proven upper bounds. In [3] it was observed that q-LCCs are essentially equivalent to configurations of points with many local dependencies. A variant of Theorem 1.1 shows, for example, that a 2-query LCC in R^n has dimension bounded by (1/δ)^{O(1)}. Our results can be interpreted in this framework as dimension upper bounds for 2-query LCCs in which each coordinate is replaced by a 'block' of k coordinates. Our results then show that, even under this relaxation, the dimension still cannot increase with n. The case of 3-query LCCs over the reals is still wide open (some modest progress was made recently in [7]) and we hope that the methods developed in this work could lead to further progress on this tough problem.
Organization In Sect. 2, we define the notion of (α, δ)-systems (which generalizes the SG condition) and reduce our k-dimensional Sylvester-Gallai theorem to a more general theorem, Theorem 2.8, on the dimension of (α, δ)-systems (this part also includes the reduction from complex to real arrangements). Then, in Sect. 3, we prove the generalization of Barthe's theorem (Theorem 1.4). Finally, in Sect. 4, we prove our main result regarding (α, δ)-systems.

Reduction to (α, δ)-Systems
The notion of an (α, δ)-system is used to 'organize' the dependent triples in the arrangement in a more convenient form so that each space is in many triples and every pair of spaces is together only in a few dependent triples. We also allow dependent pairs as those might arise when we apply a linear map on the arrangement.

Definition 2.1 ((α, δ)-System) Given a list of vector spaces V = (V_1, V_2, …, V_n), a list S = (S_1, S_2, …, S_w) of sets is an (α, δ)-system of V if the following four conditions hold.
1. Every S_j is a subset of [n] of size either 3 or 2.
2. If S_j contains 3 elements i_1, i_2 and i_3, then each of V_{i_1}, V_{i_2}, V_{i_3} is contained in the sum of the other two. If S_j contains 2 elements i_1 and i_2, then V_{i_1} = V_{i_2}.
3. Every i ∈ [n] is contained in at least δn sets of S.
4. Every pair {i_1, i_2} (i_1 ≠ i_2 ∈ [n]) appears together in at most α sets of S.
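The four conditions can be checked mechanically; in the sketch below the dependency required of a triple is taken to be that each space lies in the sum of the other two (our reading of condition 2), and all names are illustrative.

```python
import numpy as np
from itertools import combinations

def contained(A, B):
    return np.linalg.matrix_rank(np.vstack([B, A])) == np.linalg.matrix_rank(B)

def same_space(A, B):
    return contained(A, B) and contained(B, A)

def is_alpha_delta_system(V, S, alpha, delta):
    """Check the four conditions of Definition 2.1."""
    n = len(V)
    for T in S:
        T = list(T)
        if len(T) not in (2, 3):
            return False                                  # condition 1
        if len(T) == 3:
            if not all(contained(V[i], np.vstack([V[j] for j in T if j != i]))
                       for i in T):
                return False                              # condition 2, triples
        elif not same_space(V[T[0]], V[T[1]]):
            return False                                  # condition 2, pairs
    if any(sum(i in T for T in S) < delta * n for i in range(n)):
        return False                                      # condition 3
    return all(sum(i1 in T and i2 in T for T in S) <= alpha
               for i1, i2 in combinations(range(n), 2))   # condition 4

# three lines through the origin in R^2 form one dependent triple
V = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
S = [{0, 1, 2}]
```

With n = 3 and the single triple above, each index is in one set, so the system is a (1, 1/3)-system but not a (1, 1/2)-system.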
Note that we allow δ > 1 in an (α, δ)-system. This is different from the statement of the Sylvester-Gallai theorem where δ ∈ [0, 1]. We have the following simple observations.

Lemma 2.2 Let V = (V_1, V_2, …, V_n) be a list of vector spaces with an (α, δ)-system S = (S_1, S_2, …, S_w). Then δ/α ≤ 3/2.

Proof We consider the sum Σ_{j∈[w]} |S_j|. By the definition of an (α, δ)-system, every i ∈ [n] is contained in at least δn sets, so

Σ_{j∈[w]} |S_j| = Σ_{i∈[n]} |{j : i ∈ S_j}| ≥ δn².

Then we consider the number of pairs Σ_{j∈[w]} (|S_j| choose 2). Since (|S_j| choose 2) ≥ |S_j|/3 when |S_j| ∈ {2, 3}, and since every pair appears together in at most α sets, we can see

δn²/3 ≤ (1/3) Σ_{j∈[w]} |S_j| ≤ Σ_{j∈[w]} (|S_j| choose 2) ≤ α (n choose 2) ≤ αn²/2.

It follows that δ/α ≤ 3/2.

Lemma 2.3 Let V = (V_1, V_2, …, V_n) be a list of vector spaces with an (α, δ)-system. Then there is a sublist V' of V, containing at least δn/(2α) of the spaces, that has an (α, δ/2)-system.

Proof We iteratively remove all V_i's that appear in less than δn/2 sets, and the sets they appear in. There are n V_i's in total, so we can remove at most n · δn/2 sets. When the procedure ends, we still have at least δn² − δn²/2 ≥ δn²/2 sets. So we do not remove all of V_1, V_2, …, V_n. For a remaining V_i, since it appears in at least δn/2 sets, and each pair of indices appears together in at most α sets, we must still have at least δn/(2α) vector spaces left. Let V' be the list of these spaces and S' be the list of the remaining sets. We can see that S' is an (α, δ/2)-system of V'.
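The pruning procedure in the proof of Lemma 2.3 can be sketched as follows (illustrative; `prune` is our name):

```python
def prune(n, sets, delta):
    """The procedure from Lemma 2.3: repeatedly drop every index appearing in
    fewer than delta*n/2 of the surviving sets, along with those sets."""
    sets = list(sets)
    alive = set(range(n))
    changed = True
    while changed:
        changed = False
        for i in sorted(alive):
            if sum(i in S for S in sets) < delta * n / 2:
                sets = [S for S in sets if i not in S]
                alive.discard(i)
                changed = True
    return alive, sets

# index 3 appears only once and is pruned (with its set); 0, 1, 2 survive
sets = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {1, 2, 3}]
alive, remaining = prune(4, sets, delta=1.0)
```

Note that removing one index can drop other indices below the threshold, which is why the loop repeats until stable.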

Lemma 2.4 Let V = (V_1, V_2, …, V_n) be a list of vector spaces with an (α, δ)-system S = (S_1, S_2, …, S_w). Then for any linear map P : R^ℓ → R^ℓ, the list V' of nonzero spaces among P(V_1), …, P(V_n) has an (α, δn/n')-system, where n' is the number of nonzero spaces.

Suppose we remove all zero ({0}) spaces in V in the following way:
1. Let n' be the number of nonzero (not {0}) vector spaces in V, and φ be a one-to-one mapping from the indices of nonzero spaces to [n'].
2. Define V' = (V'_1, …, V'_{n'}), where V'_{φ(i)} = P(V_i) for every nonzero P(V_i), to be all the nonzero spaces; for each S_j, let S'_j = {φ(i) : i ∈ S_j, P(V_i) ≠ {0}}.
3. We remove the S'_j's that are empty.

Let S' be the list of the remaining sets.

Proof We first consider an S_j containing 3 elements i_1, i_2 and i_3. If none of the three images is zero, each of them is still contained in the sum of the other two (linear maps preserve containment), so S'_j is a valid triple. If exactly one image, say P(V_{i_1}), is zero, then P(V_{i_2}) ⊆ P(V_{i_1}) + P(V_{i_3}) = P(V_{i_3}) and, symmetrically, P(V_{i_3}) ⊆ P(V_{i_2}); hence P(V_{i_2}) = P(V_{i_3}) and S'_j is a valid pair. If two images are zero, the third is contained in their sum and is also zero, so S'_j is empty and is removed.

We then consider an S_j containing 2 elements i_1 and i_2. Since V_{i_1} = V_{i_2}, also P(V_{i_1}) = P(V_{i_2}); either both are zero (and S'_j is removed) or S'_j is a valid pair.

In summary, the first two requirements of the definition of an (α, δ)-system are satisfied. We can also see that each i ∈ [n'] is contained in at least δ'n' = δn sets and each pair is contained in at most α sets, because we have only removed sets containing only indices of zero spaces. Therefore the third and fourth requirements are also satisfied.
Combining the above two lemmas, we have the following corollary.
Corollary Let V = (V_1, V_2, …, V_n) be a list of vector spaces with an (α, δ)-system, and P : R^ℓ → R^ℓ be any linear map. Define V' = (V'_1, V'_2, …, V'_{n'}) to be the list of nonzero spaces in P(V_1), P(V_2), …, P(V_n). Then V' has an (α, δ')-system, where δ' = δn/n'.

Theorem 1.2 will be derived from the following, more general statement, saying that the entire arrangement is contained in a low dimensional space if there is an (α, δ)-system.
We can easily reduce the high dimensional Sylvester-Gallai problem in C (Theorem 1.2) to the setting of Theorem 2.8 in R as shown below.

Proof of Theorem 1.2 using Theorem 2.8 Let V_1, V_2, …, V_n ⊂ C^ℓ be as in Theorem 1.2 and, for each i ∈ [n], fix a basis u_{i1}, …, u_{ik} of V_i. Identify C^ℓ with R^{2ℓ} by writing a vector v ∈ C^ℓ as (Re(v), Im(v)) ∈ R^{2ℓ}, and define

V'_i = span{ (Re(u_{is}), Im(u_{is})), (Re(√−1·u_{is}), Im(√−1·u_{is})) : s ∈ [k] },

where the span is taken over real numbers. Since in Σ_s (λ_s + √−1·μ_s) u_{is} the coefficients λ_1, λ_2, …, λ_k, μ_1, μ_2, …, μ_k can take all values in R, each V'_i is exactly the image of V_i under this identification, and the claim is proved.

We call a 2k-dimensional subspace U ⊂ C^ℓ special if it contains at least three of V_1, V_2, …, V_n. We define the size of a special space as the number of spaces among V_1, V_2, …, V_n contained in it. For a special space of size r, we take the r² − r triples of indices of the spaces in it with the properties in Claim 2.10. Let S be the family of all these triples. We claim that S is a (6, 3δ)-system of V. One can see that every pair in [n] appears in at most 6 triples, because the corresponding two spaces are contained in at most one special space, and the pair appears at most 6 times in the triples constructed from this special space. For every j ∈ [n], there are at least δn values of j' ∈ [n] \ {j} such that there is a special space containing V_j and V_{j'}; by Claim 2.10, this implies that the number of triples that j appears in is at least 3δn. Therefore S is a (6, 3δ)-system of V, and hence also of the real arrangement V' = (V'_1, V'_2, …, V'_n), whose spaces are 2k-bounded. By Theorem 2.8, dim(V'_1 + V'_2 + ⋯ + V'_n) = O(k^4/δ^2). Note that V_1 + V_2 + ⋯ + V_n is the span of the vectors u_{is}, i ∈ [n], s ∈ [k] (span with complex coefficients), while V'_1 + V'_2 + ⋯ + V'_n is the span of their real images together with those of √−1·u_{is} (span with real coefficients), so the complex dimension of the former is at most the real dimension of the latter.

We thus have dim(V_1 + V_2 + ⋯ + V_n) = O(k^4/δ^2).

A Generalization of Barthe's Theorem
We prove Theorem 1.4 in the following three subsections. In the fourth and last subsection, we state a convenient variant of the theorem (Theorem 3.8) that will be used later in the proof of our main result. The idea of the proof is similar to [1] (see also [7, Sect. 5]): one considers a maximum point of a suitable function and uses the fact that all derivatives vanish there. Here we consider a similar function f, defined in Sect. 3.1. However, since our problem is more complicated, it is unclear whether we can find a maximum point at which all derivatives are 0. Instead, we will show in Sect. 3.2 that there is a point with very small derivatives, which is sufficient for our proof of the theorem in Sect. 3.3.

The Function and Basic Properties
We use the notations V = (V_1, V_2, …, V_n) and p = (p_1, p_2, …, p_n) that are introduced in the statement of Theorem 1.4. Let k_1, k_2, …, k_n be the dimensions of V_1, V_2, …, V_n respectively and m = k_1 + k_2 + ⋯ + k_n. Throughout our proof, we index the coordinates of R^m by pairs (i, j) with i ∈ [n] and j ∈ [k_i]. We define a vector γ ∈ R^m by γ_{ij} = p_i for every i ∈ [n], j ∈ [k_i]. Fix an orthonormal basis {u_{i1}, …, u_{ik_i}} of each V_i. For t ∈ R^m and orthogonal matrices R_1 ∈ O(k_1), …, R_n ∈ O(k_n), we define

f(t, R_1, …, R_n) = ⟨γ, t⟩ − log det( Σ_{i∈[n]} Σ_{j∈[k_i]} e^{t_{ij}} x_{ij} x_{ij}^T ),

where, for every i ∈ [n], the vectors x_{ij} are given by [x_{i1}, …, x_{ik_i}] = [u_{i1}, …, u_{ik_i}] R_i. We note that here, for every i ∈ [n], j ∈ [k_i], x_{ij} is a function of R_i, and {x_{i1}, …, x_{ik_i}} is another orthonormal basis of V_i.

The next lemma shows that the function f is bounded over its domain. The proof is similar to Proposition 3 in [1]. For completeness, we include the proof here. In the proof, we will use the Cauchy–Binet formula, which states that for an ℓ × m matrix A and an m × ℓ matrix B,

det(AB) = Σ_{I⊆[m], |I|=ℓ} det(A_I) det(B_I),    (1)

where A_I denotes the ℓ × ℓ matrix that consists of A's columns with indices in I, and B_I denotes the ℓ × ℓ matrix that consists of B's rows with indices in I.

Lemma 3.1 The function f is bounded from above over its domain.

Proof We use t_I to denote the sum of the entries of t with indices in I, and L_I to denote the ℓ × ℓ submatrix of L = [x_{11}, …, x_{nk_n}] containing only the columns with indices in I. Writing the matrix inside the determinant as L diag(e^{t_{11}}, …, e^{t_{nk_n}}) L^T and applying (1), we then have

det( Σ_{i,j} e^{t_{ij}} x_{ij} x_{ij}^T ) = Σ_{I⊆[m], |I|=ℓ} e^{t_I} det(L_I)².    (2)

Write p = Σ_H μ_H 1_H as a convex combination of V-admissible basis vectors; for an admissible basis set H, let I(H) = {(i, j) : i ∈ H, j ∈ [k_i]} (an index set of size ℓ), set μ_{I(H)} = μ_H, and set μ_I = 0 for all other index sets I. Using Eqs. (1) and (2), together with the concavity of the logarithm,

⟨γ, t⟩ = Σ_I μ_I t_I ≤ log( Σ_I μ_I e^{t_I} ) ≤ log( max_{I : μ_I ≠ 0} (μ_I / det(L_I)²) · Σ_I e^{t_I} det(L_I)² ),

so that f(t, R_1, …, R_n) ≤ log max_{I : μ_I ≠ 0} (μ_I / det(L_I)²) (by (2)). The right side is a function of the orthogonal matrices R_1, R_2, …, R_n because L_I is a function of them. We use f̃(R_1, R_2, …, R_n) to denote the right side of the above inequality. For μ_I ≠ 0, I must be a good basis set. Hence det(L_I) ≠ 0 no matter what the orthogonal matrices R_1, R_2, …, R_n are, and f̃ is a well-defined continuous function. Since f̃ is defined on the compact set O(k_1) × O(k_2) × ⋯ × O(k_n), it must have a finite upper bound. And that is also an upper bound for the function f.
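Both the Cauchy–Binet formula (1) and the determinant expansion (2) are easy to sanity-check numerically on random matrices:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
l, m = 3, 5
A = rng.standard_normal((l, m))
B = rng.standard_normal((m, l))

# (1): det(AB) as a sum over l-subsets of column/row indices
lhs = np.linalg.det(A @ B)
rhs = sum(np.linalg.det(A[:, list(I)]) * np.linalg.det(B[list(I), :])
          for I in combinations(range(m), l))

# (2): det(sum_j e^{t_j} a_j a_j^T) = sum_I e^{t_I} det(A_I)^2
t = rng.standard_normal(m)
X = (A * np.exp(t)) @ A.T            # = A diag(e^t) A^T
lhs2 = np.linalg.det(X)
rhs2 = sum(np.exp(t[list(I)].sum()) * np.linalg.det(A[:, list(I)]) ** 2
           for I in combinations(range(m), l))
```

Here the columns of A play the role of the vectors x_{11}, …, x_{nk_n}.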

Finding a Point with Small Derivatives
We first define some notation. Let

X = Σ_{i∈[n]} Σ_{j∈[k_i]} e^{t_{ij}} x_{ij} x_{ij}^T.

Note that X is always a positive definite matrix, since for any w ≠ 0 we have w^T X w = Σ_{i,j} e^{t_{ij}} ⟨x_{ij}, w⟩² > 0 when x_{11}, …, x_{nk_n} span the entire space (implied by V_1 + V_2 + ⋯ + V_n = R^ℓ). Let M be an ℓ × ℓ full rank matrix satisfying M^T M = X^{−1}. We can consider M as a function of t, R_1, R_2, …, R_n.
In a later part of the proof we will show that the linear map obtained from M satisfies the requirement in Theorem 1.4 when t, R_1, R_2, …, R_n take appropriate values. We first find an appropriate value of (R_1, R_2, …, R_n) = (R*_1(t), R*_2(t), …, R*_n(t)) for every t ∈ R^m, and then find some t* with specific properties.

Lemma 3.2 For every t ∈ R^m, there exist orthogonal matrices (R*_1(t), R*_2(t), …, R*_n(t)) ∈ O(k_1) × O(k_2) × ⋯ × O(k_n) satisfying the following two conditions:
1. f(t, R*_1(t), …, R*_n(t)) = max over all (R_1, …, R_n) of f(t, R_1, …, R_n);
2. for every i ∈ [n] and every j ≠ j' ∈ [k_i] with t_{ij} = t_{ij'}, the vectors M x_{ij} and M x_{ij'} are orthogonal.

Proof The first condition can be satisfied by the compactness of O(k_1) × O(k_2) × ⋯ × O(k_n). We will show how to change (R*_1(t), R*_2(t), …, R*_n(t)), which already satisfies the first condition, so that it satisfies the second condition while preserving the first condition.

Fix an i ∈ [n] and partition the indices of (t_{i1}, t_{i2}, …, t_{ik_i}) into equivalence classes J_1, J_2, …, J_b ⊆ [k_i] such that for j, j' in the same class t_{ij} = t_{ij'} and for j, j' in different classes t_{ij} ≠ t_{ij'}. We use t_{J_r} to denote the common value of t_{ij} for j ∈ J_r, and L_{J_r} to denote the matrix consisting of all columns x_{ij} with j ∈ J_r. The terms in X that depend on R_i are

Σ_{r∈[b]} e^{t_{J_r}} L_{J_r} L_{J_r}^T = Σ_{r∈[b]} e^{t_{J_r}} (L_{J_r} Q_r)(L_{J_r} Q_r)^T,

where Q_r can be taken to be any |J_r| × |J_r| orthogonal matrix. This means that if we change R*_i(t) to R*_i(t) · diag(Q_1, …, Q_b) (the block-diagonal matrix in which the submatrix with row and column indices J_r is Q_r), or equivalently change L_{J_r} to L_{J_r} Q_r for every r ∈ [b], the matrix X does not change; hence M and f do not change, and the first condition is preserved as f is still the maximum for the fixed t.

For every r ∈ [b], we can find a Q_r such that the columns of M L_{J_r} Q_r are orthogonal (consider the singular value decomposition of M L_{J_r}). Change R*_i(t) to R*_i(t) diag(Q_1, …, Q_b) and the second condition is satisfied for this i while preserving the first condition. Doing this for every i, we obtain an (R*_1(t), R*_2(t), …, R*_n(t)) satisfying both conditions.

From now on we use R*_1(t), R*_2(t), …, R*_n(t) to denote matrices satisfying the conditions in Lemma 3.2.
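The SVD step at the end of the proof can be made concrete: for a matrix A (playing the role of M L_{J_r}), taking Q to be the matrix of right singular vectors makes the columns of AQ orthogonal, while AQ(AQ)^T = AA^T, so the contribution to X is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))          # stand-in for M @ L_Jr

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Q = Vt.T                                  # orthogonal |Jr| x |Jr| matrix
B = A @ Q                                 # = U @ diag(s): orthogonal columns

orthogonal_columns = np.allclose(B.T @ B, np.diag(s ** 2))
outer_unchanged = np.allclose(A @ A.T, B @ B.T)
```

The Gram matrix B^T B is diagonal (the squared singular values), which is exactly the pairwise orthogonality required by condition 2.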

Lemma 3.3 For any ε > 0, there exists t* ∈ R^m such that for every i ∈ [n] and j ∈ [k_i],

| ∂f/∂t_{ij} (t*, R*_1(t*), …, R*_n(t*)) | ≤ ε,

the partial derivative being taken with the orthogonal matrices held fixed.
This lemma follows immediately from the following more general lemma.

Lemma 3.4 Let A be a compact subset of some Euclidean space. Let f : R^m × A → R and y* : R^m → A be functions satisfying the following properties:

1. f is bounded.
2. f(x, y) is continuous, and for every fixed y ∈ A, f(x, y) as a function of x is differentiable on R^m.
3. For every x ∈ R^m, f(x, y*(x)) = max_{y∈A} f(x, y).

Then, for every ε > 0, there exists an x* ∈ R^m such that for every i ∈ [m], |∂f/∂x_i (x*, y*(x*))| ≤ ε (the derivative taken with y = y*(x*) fixed).

Proof We use f*(x) to denote f(x, y*(x)); by Property 3 and the compactness of A, f* is continuous. For the sake of contradiction, assume that for any x ∈ R^m there is an index i ∈ [m] such that |∂f/∂x_i| > ε at (x, y*(x)). Then, for every x, there is a point x' ≠ x with

f*(x') ≥ f(x', y*(x)) ≥ f*(x) + ε‖x' − x‖:    (3)

move a sufficiently small distance from x along the coordinate direction i, keeping y = y*(x) fixed.

Define g(x, y) = f(x, y) − f*(0) − ε‖x‖ and G = {(x, y) ∈ R^m × A : g(x, y) ≥ 0}. Applying (3) at x = 0 yields a point x_0 ≠ 0 with f*(x_0) ≥ f*(0) + ε‖x_0‖, so (x_0, y*(x_0)) ∈ G and G ≠ ∅. By Property 1, f(x, y) is bounded, and any (x, y) with sufficiently large ‖x‖ cannot be in G. Hence G is bounded. Since g(x, y) is a continuous function by Property 2, the set G = g^{−1}([0, +∞)) must be closed. Therefore G is compact. Thus we can find Z = max{‖x‖ : (x, y) ∈ G} and pick (x_1, y_1) ∈ G with ‖x_1‖ = Z; note Z ≥ ‖x_0‖ > 0.

Let B_{≤Z} = {x ∈ R^m : ‖x‖ ≤ Z}, and let x*_1 be a maximum point of the continuous function f* on the compact set B_{≤Z}. Then f*(x*_1) ≥ f*(x_1) ≥ f(x_1, y_1) ≥ f*(0) + εZ. Applying (3) at x*_1 gives a point x_2 ≠ x*_1 with

f*(x_2) ≥ f*(x*_1) + ε‖x_2 − x*_1‖ > f*(x*_1).

By the maximality of f*(x*_1) on B_{≤Z}, we must have ‖x_2‖ > Z. But then

f*(x_2) ≥ f*(x*_1) + ε(‖x_2‖ − ‖x*_1‖) ≥ f*(0) + εZ + ε(‖x_2‖ − Z) = f*(0) + ε‖x_2‖,

so (x_2, y*(x_2)) ∈ G. By the definition of Z, there should be ‖x_2‖ ≤ Z, a contradiction.

Thus the lemma is proved.

Proof of Theorem 1.4
Fix some ε > 0. We apply Lemma 3.3 and obtain a t*. In the remaining proof we will use X, M and x_{ij} (i ∈ [n], j ∈ [k_i]) to denote their values at t = t* and (R_1, …, R_n) = (R*_1(t*), …, R*_n(t*)).

Lemma 3.5 For every i_0 ∈ [n] and j_0 ≠ j_0' ∈ [k_{i_0}], the vectors M x_{i_0 j_0} and M x_{i_0 j_0'} are orthogonal.

Proof If t*_{i_0 j_0} = t*_{i_0 j_0'}, this is guaranteed by Lemma 3.2. We only consider the case that t*_{i_0 j_0} ≠ t*_{i_0 j_0'}. Let θ ∈ R be a variable, and define x'_{ij} for i ∈ [n], j ∈ [k_i] as follows:

x'_{i_0 j_0} = cos θ · x_{i_0 j_0} − sin θ · x_{i_0 j_0'},  x'_{i_0 j_0'} = sin θ · x_{i_0 j_0} + cos θ · x_{i_0 j_0'},

and x'_{ij} = x_{ij} for all other pairs (i, j). We consider the following function h : R → R,

h(θ) = ⟨γ, t*⟩ − log det( Σ_{i∈[n]} Σ_{j∈[k_i]} e^{t*_{ij}} x'_{ij} x'^T_{ij} ).

Claim 3.6 h(θ) has a maximum at θ = 0.

Proof Let R(θ) be the k_{i_0} × k_{i_0} orthogonal matrix obtained from the identity matrix by changing the (j_0, j_0) and (j_0', j_0') entries to cos θ, the (j_0, j_0') entry to sin θ, and the (j_0', j_0) entry to −sin θ. We can see that R(0) is the identity matrix and that [x'_{i_0 1}, …, x'_{i_0 k_{i_0}}] = [x_{i_0 1}, …, x_{i_0 k_{i_0}}] R(θ). Therefore, by the first condition of Lemma 3.2,

h(θ) = f(t*, R*_1(t*), …, R*_{i_0}(t*) R(θ), …, R*_n(t*)) ≤ f(t*, R*_1(t*), …, R*_n(t*)) = h(0)

for all θ ∈ R. Thus the claim is proved.

Since h is differentiable, Claim 3.6 gives h'(0) = 0. A direct computation shows that h'(0) = 2(e^{t*_{i_0 j_0}} − e^{t*_{i_0 j_0'}}) ⟨M x_{i_0 j_0}, M x_{i_0 j_0'}⟩, and since t*_{i_0 j_0} ≠ t*_{i_0 j_0'}, we conclude ⟨M x_{i_0 j_0}, M x_{i_0 j_0'}⟩ = 0, proving Lemma 3.5.
We define M* to be the matrix M obtained above. It remains to show that M* is invertible. Assume it is not invertible; then there is a nonzero vector w orthogonal to the range of M*. We have Proj_{M*(V_i)}(w) = 0 for every i ∈ [n]. This contradicts the fact that Σ_{i∈[n]} p_i Proj_{M*(V_i)} is ε-close to the identity matrix (for ε < 1). Therefore M* is invertible. Thus Theorem 1.4 is proved.

A Convenient Form of Theorem 1.4
We give Theorem 3.8, which is implied by Theorem 1.4 and is the form that will be used in our proof. Before stating the theorem, we need to define admissible sets and admissible vectors in Definition 3.7; these have weaker requirements than admissible basis sets and admissible basis vectors (Definition 1.3), as they are not required to span the entire arrangement. Note that, with a slight abuse of notation, we use Proj_{M(V_i)} to denote both the projection matrix and the projection map.

Definition 3.7 (Admissible set, admissible vector) Given a list of vector spaces V = (V_1, V_2, …, V_n), a set H ⊆ [n] is V-admissible if every space with index in H has intersection {0} with the span of the other spaces with indices in H. A V-admissible vector is the indicator vector 1_H of a V-admissible set H.

Theorem 3.8 Given a list of vector spaces V = (V_1, V_2, …, V_n), each V_i ⊆ R^ℓ, let p = (p_1, p_2, …, p_n) be a point in the convex hull of all V-admissible vectors. Then, for every ε > 0, there exists an invertible linear map M : R^ℓ → R^ℓ such that for every unit vector w ∈ R^ℓ,

Σ_{i∈[n]} p_i ‖Proj_{M(V_i)}(w)‖² ≤ 1 + ε.
Proof We use V to denote V_1 + V_2 + ⋯ + V_n. Let d = dim(V) and {b_1, b_2, …, b_d} be some orthonormal basis of V. We construct (V', p') satisfying the conditions in Theorem 1.4 in the following two steps.

1. In this step, we construct V' and p' so that p' is in the convex hull of all V'-admissible basis vectors. Define V_{n+1} = span{b_1}, V_{n+2} = span{b_2}, …, V_{n+d} = span{b_d} and V' = (V_1, …, V_n, V_{n+1}, …, V_{n+d}). For every V-admissible set H ⊆ [n], we can see that H is also V'-admissible, and there is a subset G ⊆ {n+1, n+2, …, n+d} such that H' = H ∪ G is a V'-admissible basis set. Assume

p = Σ_H μ_H 1_H,

where the sum is over V-admissible sets H, μ_H ∈ [0, 1] and Σ_H μ_H = 1. We define

p' = Σ_H μ_H 1_{H'},

where H' is the V'-admissible basis set extended from H as above. We can see that p is a prefix of p', and p' is in the convex hull of all V'-admissible basis vectors.

2. In this step, we modify V' so that the vector spaces span the entire Euclidean space. We find an isomorphism linear map P : V → R^d and replace each space in V' by its image under P. We can see that V'_1 + V'_2 + ⋯ + V'_{n+d} = R^d and p' is in the convex hull of all V'-admissible basis vectors. Hence (V', p') satisfies the conditions in Theorem 1.4.

Apply Theorem 1.4 on (V', p'). There exists an invertible linear map M' : R^d → R^d such that Σ_{i∈[n+d]} p'_i Proj_{M'(V'_i)} is ε-close to the identity; in particular, for every unit vector w ∈ R^d, we have

Σ_{i∈[n+d]} p'_i ‖Proj_{M'(V'_i)}(w)‖² ≤ 1 + ε.

Note that the linear map P defined in Step 2 only changes an orthonormal basis, and P^{−1} ∘ M' ∘ P is an invertible linear map on V. We can extend this map to R^ℓ and find an invertible linear map M : R^ℓ → R^ℓ such that M(v) = P^{−1}(M'(P(v))) for every v ∈ V. Then for every unit vector w ∈ V,

Σ_{i∈[n+d]} p'_i ‖Proj_{M(V_i)}(w)‖² ≤ 1 + ε.

It is easy to see that the same inequality holds for every unit vector w ∈ R^ℓ. Recall that in Step 1, p is a prefix of p'. The theorem is proved, because the above inequality is stronger than required.

Proof of the Main Theorem
Theorem 2.8 will follow from the following theorem using a simple recursive argument.
Theorem 4.1 Let V = (V_1, V_2, …, V_n) be a list of k-bounded vector spaces (each of dimension at most k) with an (α, δ)-system, and let d = dim(V_1 + V_2 + ⋯ + V_n). Then for any β ∈ (0, 1), at least one of these two cases holds:
1. d ≤ 40αk³/(βδ);
2. there is a subspace W with dim(W) ≤ βd such that V_i ∩ W ≠ {0} for at least δn/(20α) values of i ∈ [n].

Proof of Theorem 2.8 using Theorem 4.1 Initially, let V^(0) = V and t = 0. We apply Theorem 4.1 on V^(t). If the first case holds, we stop. Otherwise, we take the subspace W given by the second case, apply a linear map projecting W to zero, and let V^(t+1) be the resulting list of nonzero spaces (which again has an (α, δ')-system by Lemma 2.4); the loss in dimension is at most dim(W). Let t ← t + 1 and repeat the procedure.

Proof of Theorem 4.1: A Special Case
In this subsection, we consider the case that all vector spaces are 'well separated'.

Definition 4.2 Two vector spaces V and V' are τ-separated if

|⟨u, u'⟩| ≤ 1 − τ

for any two unit vectors u ∈ V and u' ∈ V'.

Lemma 4.3 Let V and V' be τ-separated vector spaces with orthonormal bases {u_1, …, u_{k_1}} and {u'_1, …, u'_{k_2}} respectively. Then, for any λ_1, …, λ_{k_1}, μ_1, …, μ_{k_2} ∈ R,

‖λ_1u_1 + ⋯ + λ_{k_1}u_{k_1} + μ_1u'_1 + ⋯ + μ_{k_2}u'_{k_2}‖² ≥ τ(Σ_s λ_s² + Σ_s μ_s²).

Proof Let v = λ_1u_1 + λ_2u_2 + ⋯ + λ_{k_1}u_{k_1} and w = μ_1u'_1 + μ_2u'_2 + ⋯ + μ_{k_2}u'_{k_2}. We have

‖v + w‖² = ‖v‖² + ‖w‖² + 2⟨v, w⟩ ≥ ‖v‖² + ‖w‖² − 2(1 − τ)‖v‖‖w‖ ≥ τ(‖v‖² + ‖w‖²) = τ(Σ_s λ_s² + Σ_s μ_s²).

We will need the following lower bound for the rank of a diagonally dominant matrix. The same lemma for Hermitian matrices was proved in [4]. Here we change the proof slightly and show that the conclusion also holds for an arbitrary matrix.

Lemma 4.5 Let D be an m × m real matrix in which every diagonal entry equals L > 0 and the sum of the squares of the off-diagonal entries is at most S. Then rank(D) ≥ m²L²/(mL² + S).

Proof The trace of D equals mL and is the sum of the (complex) eigenvalues of D, so by the Cauchy–Schwarz inequality, m²L² = tr(D)² ≤ rank(D) · Σ|λ_i|² ≤ rank(D) · ‖D‖²_F ≤ rank(D)(mL² + S), where the middle inequality is Schur's inequality.
Theorem 4.6 Let V = (V_1, …, V_n) be a list of k-bounded vector spaces with an (α, δ)-system S = (S_1, …, S_w) such that, for every j ∈ [w] and every i ≠ i' ∈ S_j, the spaces V_i and V_{i'} are τ-separated. Then dim(V_1 + V_2 + ⋯ + V_n) ≤ αk/(τδ).

Proof Let k_1, k_2, …, k_n be the dimensions of V_1, V_2, …, V_n, and m = k_1 + k_2 + ⋯ + k_n. For every i ∈ [n], fix B_i = {u_{i1}, u_{i2}, …, u_{ik_i}} to be some orthonormal basis of V_i. We use A to denote the m × ℓ matrix whose rows are u^T_{11}, …, u^T_{nk_n}. We will bound d = rank(A) by constructing a high rank m × m matrix D satisfying DA = 0.

For s ∈ [m], let ψ(s) ∈ [n] be the index such that the sth row of A belongs to the basis of V_{ψ(s)}; in other words, the sth row of A is a vector in B_{ψ(s)}.

Claim 4.7 For every s ∈ [m], there is a vector y_s ∈ R^m satisfying y_s^T A = 0^T, y_{ss} = δn, and Σ_{t≠s} y²_{st} ≤ αδn/τ.
Proof Say the sth row of A is u^T, where u ∈ B_{ψ(s)}. Let J ⊆ [w] be a set of size |J| = δn such that for every j ∈ J, S_j contains ψ(s). We construct a vector c_j for every j ∈ J as follows. If S_j = {ψ(s), i_1, i_2} is a triple, then u ∈ V_{ψ(s)} ⊆ V_{i_1} + V_{i_2}, so u can be written as a linear combination of the basis vectors in B_{i_1} ∪ B_{i_2}; let c_j have entry 1 in position s, the negated coefficients of this combination in the positions corresponding to B_{i_1} and B_{i_2}, and 0 elsewhere. Since V_{i_1} and V_{i_2} are τ-separated, the coefficient bound proved above gives a sum of squared coefficients at most ‖u‖²/τ = 1/τ. If S_j = {ψ(s), i_1} is a pair, then V_{ψ(s)} = V_{i_1}, and the analogous c_j has squared coefficients summing to 1 ≤ 1/τ. In either case we obtain a c_j such that c_j^T A = 0^T, c_{js} = 1 and Σ_{t≠s} c²_{jt} ≤ 1/τ. We define

y_s = Σ_{j∈J} c_j.

We have y_s^T A = 0^T and y_{ss} = |J| = δn. We consider Σ_{t≠s} y²_{st}. From the above construction of c_j, we can see that c_{jt} ≠ 0 (t ≠ s) only when ψ(t) ≠ ψ(s) and {ψ(s), ψ(t)} ⊆ S_j. Hence for every t ≠ s, there are at most α nonzero values in {c_{jt}}_{j∈J}, and by the Cauchy–Schwarz inequality y²_{st} ≤ α Σ_{j∈J} c²_{jt}. It follows that

Σ_{t≠s} y²_{st} ≤ α Σ_{j∈J} Σ_{t≠s} c²_{jt} ≤ α · δn · (1/τ) = αδn/τ.

Thus the claim is proved.
Define D to be the matrix consisting of rows y_1^T, y_2^T, …, y_m^T. Then every entry on the diagonal of D is δn, and the sum of the squares of all entries off the diagonal is at most αδnm/τ. Applying Lemma 4.5 on D (with L = δn and S = αδnm/τ), we have

rank(D) ≥ m²(δn)²/(m(δn)² + αδnm/τ) = m·δn/(δn + α/τ).

By DA = 0, the rank of A is

d ≤ m − rank(D) = m·(α/τ)/(δn + α/τ) ≤ mα/(τδn) ≤ αk/(τδ),

where the last inequality uses m ≤ kn.
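The spectral bound rank(D) ≥ (tr D)²/‖D‖²_F behind this step can be observed numerically on a random matrix with constant diagonal and small off-diagonal entries (the constants here are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m, L = 50, 1.0
D = 0.05 * rng.standard_normal((m, m))
np.fill_diagonal(D, L)                     # diagonal = L, small off-diagonal

S = float(np.sum(D ** 2) - m * L ** 2)     # off-diagonal Frobenius mass
bound = m ** 2 * L ** 2 / (m * L ** 2 + S) # rank >= trace^2 / ||D||_F^2
rank = int(np.linalg.matrix_rank(D))
```

With off-diagonal mass S much smaller than mL², the bound is close to m, i.e. the matrix is nearly full rank, which is exactly how it forces dim(A) to be small via DA = 0.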

Lemma 4.8 Assume that neither case of Theorem 4.1 holds, and let q = δn/(20α). Then there exist a subset I ⊆ [n] with |I| ≥ (1 − δ/(10α))n and a distribution D over V-admissible sets such that Pr_{H∼D}[i ∈ H] ≥ βd/(kn) for every i ∈ I.

Claim 4.9 For a subset E ⊆ [n] of size greater than q, we can find a V-admissible set H ⊆ E with size at least βd/k.
Proof Initially let H = ∅. In each step we pick an i_0 ∈ E with V_{i_0} ∩ (Σ_{i∈H} V_i) = {0}, and add i_0 to H. If such an i_0 does not exist, the procedure terminates. If |H| < βd/k when it terminates, then for every i_0 ∈ E, V_{i_0} has a nonzero vector contained in the space Σ_{i∈H} V_i, which has dimension at most βd. Since |E| > q, this contradicts the condition that the second case of Theorem 4.1 does not hold. Hence |H| ≥ βd/k, and the claim is proved.
We repeatedly find V-admissible sets H_1, H_2, … such that H_i ⊆ [n] \ (H_1 ∪ ⋯ ∪ H_{i−1}) and |H_i| ≥ βd/k using the above claim, stopping once at most q indices remain uncovered. Since the sets are disjoint, we can find at most n/(βd/k) = nk/(βd) such V-admissible sets in total. Let I be the union of these V-admissible sets. We have |I| ≥ n − q ≥ (1 − δ/(10α))n. Let D be the uniform distribution on these V-admissible sets. We can see that the probability Pr_{H∼D}[i ∈ H] ≥ βd/(kn) for every i ∈ I. Thus the lemma is proved.
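The greedy construction in Claim 4.9 can be sketched as follows (illustrative helper names):

```python
import numpy as np

def rk(mats):
    return np.linalg.matrix_rank(np.vstack(mats)) if mats else 0

def greedy_admissible(V, E):
    """Claim 4.9's procedure: grow H inside E while the chosen spaces still
    form a direct sum (the new space meets their span only in {0})."""
    H = []
    for i in E:
        if rk([V[j] for j in H] + [V[i]]) == rk([V[j] for j in H]) + rk([V[i]]):
            H.append(i)
    return H

e = np.eye(4)
# the third space lies in the span of the first two and is skipped
V = [e[[0]], e[[1]], np.array([[1.0, 1.0, 0.0, 0.0]]), e[[2]]]
H = greedy_admissible(V, [0, 1, 2, 3])
```

Each accepted index keeps the running sum a direct sum, so the output is always a V-admissible set.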
We assume that neither case of Theorem 4.1 holds and apply Lemma 4.8. For i ∈ [n], we use k_i to denote the dimension of V_i, and p_i to denote Pr_{H∼D}[i ∈ H]. Then p_i ≥ βd/(kn) for every i ∈ I. We apply Theorem 3.8 with p = (p_1, p_2, …, p_n) and obtain an invertible linear map M : R^ℓ → R^ℓ such that for any unit vector w ∈ R^ℓ,

Σ_{i∈[n]} p_i ‖Proj_{V'_i}(w)‖² ≤ 1 + ε,

where V'_i denotes M(V_i). Since p_i ≥ βd/(kn) for every i ∈ I, we have

Σ_{i∈I} ‖Proj_{V'_i}(w)‖² ≤ (1 + ε)kn/(βd).

We will reduce the problem to the special case discussed in the previous subsection. We say a pair {i_1, i_2} ⊆ [n] is bad if V'_{i_1}, V'_{i_2} are not 1/2-separated. Let S = (S_1, S_2, …, S_w) be the (α, δ)-system of V. By Lemma 2.4, S is also an (α, δ)-system of V' = (V'_1, V'_2, …, V'_n). We estimate the number of sets among S_1, S_2, …, S_w containing a bad pair.
Since there are k_{i_0} ≤ k values of j_0 ∈ [k_{i_0}], the number of i's for which V'_{i_0} and V'_i are not 1/2-separated is at most k · 4k²n/(βd) ≤ 4k³n/(βd) ≤ δn/(10α).
In the last inequality we used the assumption d > 40αk 3 /(βδ).
The number of bad pairs is therefore at most δn²/(5α). We remove all S_j's that contain a bad pair and use S' to denote the list of the remaining sets. Since each pair appears in at most α sets, we have removed at most δn²/5 sets.
Since we have removed all sets containing bad pairs, every two spaces appearing together in a set of S' are 1/2-separated. Pruning as in Lemma 2.3, we obtain q ≥ δn/(20α) spaces V'_{i_1}, V'_{i_2}, …, V'_{i_q} that, together with the remaining sets, satisfy the conditions of Theorem 4.6 (with τ = 1/2 and with δ replaced by a constant fraction of δ). By Theorem 4.6,

dim(V'_{i_1} + V'_{i_2} + ⋯ + V'_{i_q}) ≤ O(αk/δ) ≤ βd.

In the last inequality we used the assumption d > 40αk³/(βδ). Recall that the linear map M is invertible, so the space V_{i_1} + V_{i_2} + ⋯ + V_{i_q} has the same dimension as V'_{i_1} + V'_{i_2} + ⋯ + V'_{i_q}. Therefore there are q ≥ δn/(20α) spaces V_{i_1}, V_{i_2}, …, V_{i_q} contained in a subspace of dimension at most βd. The second case of Theorem 4.1 holds, which violates our assumption that neither case of Theorem 4.1 holds. Therefore Theorem 4.1 is proved.