The Paulsen Problem Made Simple

The Paulsen problem is a basic problem in operator theory that was resolved in a recent tour-de-force work of Kwok, Lau, Lee and Ramachandran. In particular, they showed that every $\epsilon$-nearly equal norm Parseval frame in $d$ dimensions is within squared distance $O(\epsilon d^{13/2})$ of an equal norm Parseval frame. We give a dramatically simpler proof based on the notion of radial isotropic position, and along the way show an improved bound of $O(\epsilon d^2)$.


Introduction
The Paulsen problem is a basic problem in operator theory that was resolved in a recent work of Kwok, Lau, Lee and Ramachandran [12]. To state the problem, we need the following definition:

Definition 1. We say that a set of vectors $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$ is an equal norm Parseval frame if
$$\sum_{i=1}^n v_i v_i^T = I_{d \times d} \quad \text{and} \quad \|v_i\|^2 = \frac{d}{n} \text{ for each } i,$$
and an $\epsilon$-nearly equal norm Parseval frame if
$$(1-\epsilon) I_{d \times d} \preceq \sum_{i=1}^n v_i v_i^T \preceq (1+\epsilon) I_{d \times d} \quad \text{and} \quad (1-\epsilon)\frac{d}{n} \leq \|v_i\|^2 \leq (1+\epsilon)\frac{d}{n} \text{ for each } i.$$
When we drop the condition on the norm of each vector, we refer to the set of vectors as a Parseval frame or an $\epsilon$-nearly Parseval frame respectively.

Let $\mathcal{F}$ denote the set of all equal norm Parseval frames. Lastly, for two sequences of vectors $V = v_1, v_2, \ldots, v_n$ and $W = w_1, w_2, \ldots, w_n$ of the same length, we let
$$\mathrm{dist}^2(V, W) = \sum_{i=1}^n \|v_i - w_i\|^2.$$
With this terminology in hand, the Paulsen problem asks: for an $\epsilon$-nearly equal norm Parseval frame $V$, is $\mathrm{dist}^2(V, \mathcal{F}) = \min_{W \in \mathcal{F}} \mathrm{dist}^2(V, W)$ bounded by a fixed polynomial in $\epsilon$ and $d$?
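To make Definition 1 concrete, here is a small numerical sketch (using numpy; the helper name `frame_errors` is ours, not from the paper) that measures how far a set of vectors is from being an equal norm Parseval frame:

```python
import numpy as np

def frame_errors(V):
    """V is an (n, d) array whose rows are the frame vectors.
    Returns (eps_op, eps_norm): the operator-norm distance of
    sum_i v_i v_i^T from the identity, and the largest relative
    deviation of ||v_i||^2 from d/n."""
    n, d = V.shape
    S = V.T @ V                                   # sum_i v_i v_i^T
    eps_op = np.linalg.norm(S - np.eye(d), ord=2)
    eps_norm = np.max(np.abs(np.sum(V**2, axis=1) * n / d - 1.0))
    return eps_op, eps_norm

# Example: stacking two copies of the rows of an orthogonal matrix,
# scaled by sqrt(d/n) with n = 2d, gives an equal norm Parseval frame.
d = 4
Q = np.linalg.qr(np.random.default_rng(0).normal(size=(d, d)))[0]
V = np.sqrt(0.5) * np.vstack([Q, Q])
eps_op, eps_norm = frame_errors(V)                # both are ~0
```

A set of vectors is an $\epsilon$-nearly equal norm Parseval frame when both returned quantities are at most $\epsilon$.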
See [12] and references therein for a detailed account of the history of the Paulsen problem along with earlier bounds on the squared distance that were polynomial in $\epsilon$, $d$ and $n$. Through a tour-de-force utilizing operator scaling, connections to dynamical systems and ideas from smoothed analysis, Kwok, Lau, Lee and Ramachandran [12] proved that the squared distance is at most $O(\epsilon d^{13/2})$. The paper was 104 pages long and highly complex. Our main result is a dramatically simpler proof of the Paulsen conjecture, that also yields a much better bound:

Theorem 1 (Main). For any $\epsilon$-nearly equal norm Parseval frame $V$, there is an equal norm Parseval frame $W$ with $\mathrm{dist}^2(V, W) \leq O(\epsilon d^2)$.

In terms of lower bounds, Cahill and Casazza [4] gave a family of examples of $\epsilon$-nearly equal norm Parseval frames where the squared distance to the closest equal norm Parseval frame is at least $\Omega(\epsilon d)$. It is an interesting open question to close this gap.
Our main idea is to make use of the notion of radial isotropic position. In the next section, we define it formally. But to understand it informally, it is useful to compare it to the more familiar notion of placing a set of vectors in isotropic position: Given a set of vectors $V = v_1, v_2, \ldots, v_n \in \mathbb{R}^d$, is there an invertible affine transformation that generates a new set of vectors $Y = Av_1 + b, Av_2 + b, \ldots, Av_n + b$ that has mean zero and identity covariance? It is well known that there is such a transformation if and only if $\sum_i v_i v_i^T$ has full rank. However such a transformation can also stretch out some directions much more than others, e.g. if all but one of the vectors in $V$ are contained in a $(d-1)$-dimensional subspace. In this case, the set of vectors after applying the transformation would be quite far from where it started out, in total squared distance. Informally, radial isotropic position asks for a linear transformation $A$ so that the renormalized vectors $w_i = Av_i / \|Av_i\|$ have the property that $\sum_i w_i w_i^T$ is a scalar multiple of the identity. The transformation is now nonlinear but is particularly well suited for constructing a nearby equal norm Parseval frame.
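The defining condition of radial isotropic position is easy to check numerically (the helper below is our own illustration, not from the paper): after applying $A$ and renormalizing to unit vectors, the second-moment matrix should equal $(n/d) I$.

```python
import numpy as np

def radial_isotropy_gap(V, A):
    """Frobenius distance of sum_i w_i w_i^T from (n/d) I, where
    w_i = A v_i / ||A v_i||.  Zero means A places the rows of V
    in radial isotropic position."""
    n, d = V.shape
    W = V @ A.T
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    return np.linalg.norm(W.T @ W - (n / d) * np.eye(d))

# The rows of an orthogonal matrix are already in radial isotropic
# position (take A = I): here n = d, so sum_i w_i w_i^T = I.
Q = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))[0]
gap = radial_isotropy_gap(Q, np.eye(3))           # ~0
```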
One can now ask the same sort of question as before: When can a set of vectors be placed in radial isotropic position? Barthe [2] gave a complete characterization of when this is and is not possible, which in turn plays a key role in our proof. It turns out that a sufficient condition is that every $d$ vectors are linearly independent. Now we construct an equal norm Parseval frame as follows: First we renormalize the vectors in $V$ and then we perturb them. Perturbations play a delicate role in [12]. They give a dynamical system which constructs an equal norm Parseval frame from an $\epsilon$-nearly equal norm Parseval frame as its input. In order to bound the total squared distance between the input and output, they need to lower bound the convergence rate. They do this through a certain pseudorandom property (Definition 4.3.2) which they show holds when the input is appropriately perturbed. In our proof, all we need is that the perturbations do not move the set of points by too much in squared distance and that afterwards every $d$ of them are linearly independent. The latter condition guarantees that there is a linear transformation that places them in radial isotropic position. Let $W$ be the set of vectors, after applying the linear transformation and renormalizing. By definition, it is an equal norm Parseval frame. Our main technical contribution is in bounding the squared distance between $V$ and $W$, which we do through some elementary but subtle algebraic manipulations.
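The role of the perturbation can be seen in a toy sketch (brute force over subsets, so only viable for small $n$; the helper is ours): a tiny perturbation makes every $d$ of the vectors linearly independent, which is the only property the argument above needs.

```python
import numpy as np
from itertools import combinations

def in_general_position(U):
    """True iff every d of the n row vectors are linearly independent
    (checks all d-element subsets, so only viable for small n)."""
    n, d = U.shape
    return all(np.linalg.matrix_rank(U[list(S)]) == d
               for S in combinations(range(n), d))

U = np.array([[1., 0.], [2., 0.], [0., 1.], [1., 1.]])  # rows 0, 1 parallel
# A random perturbation works with probability 1; this one is fixed
# for reproducibility.
U_pert = U + 1e-6 * np.array([[0., 1.], [1., 0.], [0., 0.], [0., 0.]])
# in_general_position(U) is False; in_general_position(U_pert) is True
```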
Taking a step back, the notion of radial isotropic position seems quite powerful and mysterious but has thus far only found a handful of applications. Forster [7] used it to prove a remarkable lower bound in communication complexity (by lower bounding the sign rank of the Hadamard matrix). Hardt and Moitra [11] gave the first algorithm for computing the transformation that places a set of vectors in radial isotropic position (under a slight strengthening of Barthe's conditions). They also gave applications to linear regression in the presence of outliers. Dvir, Saraf and Wigderson [5] used it to prove superquadratic lower bounds for 3-query locally correctable codes over the reals. Here we use it to give a simple proof of the Paulsen conjecture. Are there other exciting applications waiting to be discovered?

Connections to Operator Scaling and the Brascamp-Lieb Inequality
Radial isotropic position is itself a special case of the more general notion of geometric position [1,2], where we are given an $n$-tuple of linear transformations $B_i : \mathbb{R}^d \to \mathbb{R}^{d_i}$ and a coefficient vector $c$, and we seek invertible linear transformations $A_1, A_2, \ldots, A_n$ and $A$ so that the maps $B_i' = A_i B_i A$ satisfy
$$B_i' (B_i')^T = I_{d_i \times d_i} \text{ for each } i \quad \text{and} \quad \sum_{i=1}^n c_i (B_i')^T B_i' = I_{d \times d}.$$
If we set $d_i = 1$ for all $i$, then each linear transformation $B_i$ can be written as the inner product with some vector $v_i$. Now if we also set $c_i = \frac{d}{n}$ for all $i$, it is easy to check that $A$ places the set of vectors $v_1, v_2, \ldots, v_n$ in radial isotropic position.
It turns out that having $A_1, A_2, \ldots, A_n$ and $A$ that place $B_1, B_2, \ldots, B_n$ in geometric position with respect to the vector $c$ yields an explicit expression for the best constant $C$ for which the inequality
$$\int_{\mathbb{R}^d} \prod_{i=1}^n f_i(B_i x)^{c_i} \, dx \leq C \prod_{i=1}^n \left( \int_{\mathbb{R}^{d_i}} f_i \right)^{c_i}$$
holds for all nonnegative integrable functions $f_1, f_2, \ldots, f_n$. This is called the Brascamp-Lieb inequality.
Finally, in terms of how to compute $A_1, A_2, \ldots, A_n$ and $A$, a popular approach is operator scaling [10], and there has been considerable recent progress in bounding the number of iterations it needs [8,9]. As we mentioned, Kwok, Lau, Lee and Ramachandran [12] used operator scaling to solve the Paulsen conjecture. In this sense, our approach and theirs are closely related in that they both revolve around algorithms (in our case the ellipsoid algorithm) for computing radial isotropic position. Perhaps the main technical divergence is that they track how the squared distance changes after each iteration of operator scaling, while we are able to bound the squared distance based solely on the transformation that places $v_1, v_2, \ldots, v_n$ into radial isotropic position. It is also worth mentioning that if, instead of proving existence of a nearby equal norm Parseval frame, we want to find it up to some target precision $\delta$, the approaches based on operator scaling typically require the number of iterations to be polynomial in $1/\delta$. In contrast, we will give algorithms whose running time is polynomial in $\log 1/\delta$.
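As a rough illustration of the scaling viewpoint, here is a naive fixed-point heuristic (our own sketch; it is neither the ellipsoid algorithm used in this paper nor the operator scaling of [12]): alternately renormalize the vectors and whiten by their normalized second-moment matrix. Empirically this drives a generic set of vectors toward radial isotropic position.

```python
import numpy as np

def inv_sqrt(M):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def toward_radial_isotropy(V, iters=3000):
    """Naive alternating scaling: normalize the rows, whiten by the
    normalized second moment, repeat.  Returns the renormalized
    vectors and the final gap ||(d/n) sum_i w_i w_i^T - I||_F."""
    n, d = V.shape
    X = V.copy()
    for _ in range(iters):
        W = X / np.linalg.norm(X, axis=1, keepdims=True)
        M = (d / n) * W.T @ W            # equals I at radial isotropy
        X = X @ inv_sqrt(M)
    W = X / np.linalg.norm(X, axis=1, keepdims=True)
    return W, np.linalg.norm((d / n) * W.T @ W - np.eye(d))

rng = np.random.default_rng(3)
V = rng.normal(size=(8, 3))              # generic: every 3 rows independent
W, gap = toward_radial_isotropy(V)       # gap shrinks toward 0
```

Note the contrast discussed above: iterative schemes of this flavor need many iterations to reach precision $\delta$, while the ellipsoid-based approach runs in time polynomial in $\log 1/\delta$.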

Radial Isotropic Position and the Proof
First we introduce some of the basic concepts and results about radial isotropic position. We will do so in slightly more generality than we will ultimately need.
Definition 2. We say that a set of vectors $u_1, u_2, \ldots, u_n \in \mathbb{R}^d$ is in radial isotropic position with respect to a coefficient vector $c$ if
$$\sum_{i=1}^n c_i \frac{u_i u_i^T}{\|u_i\|^2} = I_{d \times d}.$$
Note that if we take the trace of both sides in the expression, we get the necessary condition that $\sum_{i=1}^n c_i = d$. In fact we will only ever consider the case when each $c_i = \frac{d}{n}$. We will also need the following key definition:

Definition 3. For a set $U$ of vectors $u_1, u_2, \ldots, u_n \in \mathbb{R}^d$, its basis polytope is defined as
$$B(U) = \Big\{ c \in \mathbb{R}^n_{\geq 0} : \sum_{i=1}^n c_i = d \text{ and } \sum_{i \in S} c_i \leq \dim(\mathrm{span}\{u_i\}_{i \in S}) \text{ for all } S \subseteq [n] \Big\}.$$
(The basis polytope is more commonly defined as the convex hull of the indicator vectors of the subsets of $d$ linearly independent vectors from $U$.) The alternative definition we gave in Definition 3 will be more directly useful for our purposes, and was proven to be equivalent by Edmonds [6]. He used this equivalence to give a separation oracle for the basis polytope, which in turn plays a key role in the algorithm of Hardt and Moitra [11] for computing the linear transformation that puts a set of vectors into radial isotropic position. We will also rely on the following theorem of Barthe:

Theorem 2 ([2]). Suppose $c$ is in the relative interior of the basis polytope $B(U)$. Then there is an invertible linear transformation $A$ that places $u_1, u_2, \ldots, u_n$ in radial isotropic position with respect to $c$.

Now we are ready to prove our main theorem:

Proof of Theorem 1: Let $U = u_1, u_2, \ldots, u_n$ with
$$u_i = \sqrt{\frac{d}{n}} \frac{v_i}{\|v_i\|} + \eta_i,$$
where $\eta_i$ is a perturbation. What we need from these perturbations is just that they make every set of $d$ vectors in $U$ be linearly independent, and that, provided the norm of each perturbation is polynomially small in $\epsilon$, $1/d$ and $1/n$, they have a negligible effect on our squared distance bounds. As we go along, we will quantify how small we need the perturbation to be. First we want to bound the squared distance between $V$ and $U$. We can upper bound
$$\|v_i - u_i\|^2 \leq \left( \|v_i\| - \sqrt{\frac{d}{n}} \right)^2 + \gamma \leq \epsilon \frac{d}{n},$$
where $\gamma \leq \|\eta_i\|^2 + 2\|\eta_i\|$ is a term that depends on the perturbation, and if $\gamma \leq (1 - \sqrt{1-\epsilon}) \frac{d}{n}$ then the last inequality holds. Now summing over all pairs of vectors we get that $\mathrm{dist}^2(V, U) \leq \epsilon d$.
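The conditions of Definition 3 can be checked directly by brute force for small $n$ (our own sketch; the check is exponential in $n$, so it is purely illustrative — Edmonds' separation oracle is what makes this efficient in general):

```python
import numpy as np
from itertools import combinations

def in_basis_polytope(U, c, tol=1e-9):
    """Checks the conditions of Definition 3: sum_i c_i = d and, for
    every subset S of the rows of U, sum_{i in S} c_i is at most the
    dimension of the span of {u_i : i in S}."""
    n, d = U.shape
    if abs(c.sum() - d) > tol:
        return False
    return all(c[list(S)].sum() <= np.linalg.matrix_rank(U[list(S)]) + tol
               for k in range(1, n + 1)
               for S in combinations(range(n), k))

rng = np.random.default_rng(4)
U = rng.normal(size=(5, 2))              # generic: every 2 rows independent
c = np.full(5, 2 / 5)                    # c_i = d/n
U_flat = np.outer(np.arange(1., 6.), [1., 0.])   # all rows parallel
# c is in B(U), but not in B(U_flat): taking S = [n] gives
# sum_i c_i = 2 > 1 = dim span.
```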
Next we observe that the vectors in $U$ are still a nearly equal norm Parseval frame. Before adding the perturbation, each vector $v_i$ was scaled by a factor between $(1+\epsilon)^{-1/2}$ and $(1-\epsilon)^{-1/2}$. Also if we take $\|\eta_i\| \leq \frac{\epsilon}{2n}$ for each $i$, then we conclude that the vectors in $U$ are a $4\epsilon$-nearly equal norm Parseval frame. Now we will utilize Theorem 2. We work with the coefficient vector $c$ where $c_i = \frac{d}{n}$ for each $i$. It is easy to check from Definition 3 that, because every set of $d$ vectors from $U$ is linearly independent, $c$ is in the relative interior of their basis polytope. Hence by Theorem 2 we are guaranteed that there is a linear transformation $A$ that places them in radial isotropic position with respect to $c$. We claim that we can assume without loss of generality that $A$ is a nonnegative diagonal matrix whose entries are sorted in non-increasing order along the diagonal. This follows by writing the singular value decomposition $A = C \Sigma D^T$ and observing that radial isotropic position is preserved under taking orthogonal transformations. Thus $A = D \Sigma D^T$ also places the vectors in $U$ in radial isotropic position. Now if we change basis so that $A$ is diagonal (and make the same transformation to the vectors in $U$), we have the desired conclusion. Now suppose that $M$ is a diagonal matrix with entries $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ along the diagonal with the property that it places the vectors in $U$ in radial isotropic position with respect to $c$. Then set $W = w_1, w_2, \ldots, w_n$ with
$$w_i = \sqrt{\frac{d}{n}} \frac{M u_i}{\|M u_i\|}.$$
By construction, this is an equal norm Parseval frame. What remains is to bound the total squared distance between $U$ and $W$. In Lemma 1, we show that $\mathrm{dist}^2(U, W) \leq 16 \epsilon d^2 + 8 \gamma d^2$, where $\gamma$ is again another negligible term. Concretely we can choose $\gamma = \max_i \frac{n}{d} \left( \|\eta_i\|^2 + 2 \|\eta_i\| \right)$ when we apply Lemma 1.
Finally, for any three vectors $a$, $b$ and $c$ we have the triangle-like inequality
$$\|a - c\|^2 \leq 2\|a - b\|^2 + 2\|b - c\|^2,$$
and when we apply this for all triples of vectors $v_i$, $u_i$ and $w_i$ we have
$$\mathrm{dist}^2(V, W) \leq 2\,\mathrm{dist}^2(V, U) + 2\,\mathrm{dist}^2(U, W) \leq 2\epsilon d + 32\epsilon d^2 + 16\gamma d^2 \leq O(\epsilon d^2),$$
where the last inequality follows if $\gamma \leq \epsilon$. This now completes the proof.
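The triangle-like inequality is just $\|a-c\|^2 = \|(a-b)+(b-c)\|^2 \leq 2\|a-b\|^2 + 2\|b-c\|^2$, using $2\langle x, y\rangle \leq \|x\|^2 + \|y\|^2$ on the cross term. A quick numerical sanity check (our own, on arbitrary vectors):

```python
import numpy as np

rng = np.random.default_rng(5)
a, b, c = rng.normal(size=(3, 6))
lhs = np.sum((a - c) ** 2)
rhs = 2 * np.sum((a - b) ** 2) + 2 * np.sum((b - c) ** 2)
# lhs <= rhs holds for every choice of a, b and c
```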
Lemma 1. With $U$, $M$ and $W$ as defined in the proof of Theorem 1, and under the condition that for each $i$ the squared norm of $u_i$ is between $(1-\gamma)\frac{d}{n}$ and $(1+\gamma)\frac{d}{n}$, we have
$$\mathrm{dist}^2(U, W) \leq 16 \epsilon d^2 + 8 \gamma d^2.$$

Proof. First we introduce a notion of majorization:

Definition 4. For $d$-element sequences $x$ and $y$, we say that $y$ majorizes $x$, written $y \succeq x$, if the entries of $x$ and $y$ have the same sum and, for every $k$, the sum of the $k$ largest entries of $y$ is at least the sum of the $k$ largest entries of $x$.

Next we introduce a notion of distance, similar to the Wasserstein distance, but for vectors that are not necessarily nonnegative: for $x, y \in \mathbb{R}^d$ whose entries have the same sum, let $W(x, y)$ be the minimum total cost of transforming $x$ into $y$, where moving mass $\delta$ from coordinate $j$ to coordinate $k$ costs $\delta |j - k|$. It is easy to see that $\frac{1}{2} \|x - y\|_1 \leq W(x, y)$, which follows because $W$ is a type of transport cost for changing $x$ into $y$ and each difference between the vectors must be moved distance at least one.
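Both notions are easy to experiment with numerically (the helpers are ours). For signed vectors on coordinates $1, \ldots, d$ with equal sums, the one-dimensional transport cost equals the sum of the absolute prefix-sum differences, and the lower bound $\frac{1}{2}\|x - y\|_1 \leq W(x, y)$ can be checked directly:

```python
import numpy as np

def transport_cost(x, y):
    """1-D transport cost between equal-sum (possibly signed) vectors
    on coordinates 1..d with unit spacing: the cost equals the sum of
    the absolute prefix-sum differences."""
    return np.sum(np.abs(np.cumsum(x - y)[:-1]))

def majorizes(y, x, tol=1e-9):
    """True iff y majorizes x: equal sums, and every prefix of the
    entries sorted in decreasing order dominates."""
    ys, xs = np.sort(y)[::-1], np.sort(x)[::-1]
    return (abs(ys.sum() - xs.sum()) < tol
            and np.all(np.cumsum(ys) >= np.cumsum(xs) - tol))

x = np.array([0.5, 0.5, 0.0])
y = np.array([1.0, 0.0, 0.0])             # y majorizes x, not vice versa
w = transport_cost(x, y)                  # |−0.5| + |0| = 0.5
half_l1 = 0.5 * np.sum(np.abs(x - y))     # 0.5 <= w, as claimed
```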
Lastly we define some useful sequences of helper vectors. Let $\overline{W} = \overline{w}_1, \overline{w}_2, \ldots, \overline{w}_n$ with
$$\overline{w}_i = \frac{\|u_i\|}{\|M u_i\|} M u_i.$$
And let $x_i$ be the result of entrywise squaring $u_i$ and let $y_i$ be the result of entrywise squaring $\overline{w}_i$. By construction, the sum of the entries in $x_i$ and in $y_i$ are the same. We prove in Claim 1 that $y_i \succeq x_i$ for each $i$. Now we have
$$\mathrm{dist}^2(U, \overline{W}) = \sum_{i=1}^n \|u_i - \overline{w}_i\|^2 \leq \sum_{i=1}^n \|x_i - y_i\|_1 \leq 2 \sum_{i=1}^n W(x_i, y_i),$$
where the first inequality follows because for any real values $a$ and $b$ with the same sign we have $(a - b)^2 \leq |a^2 - b^2|$. Since $M$ put the vectors $U$ in radial isotropic position with respect to $c$ and the squared norm of each $u_i$ is between $(1-\gamma)\frac{d}{n}$ and $(1+\gamma)\frac{d}{n}$, we have that $\sum_{i=1}^n ((\overline{w}_i)_j)^2 \geq 1 - \gamma$ for every coordinate $j$. And because the vectors $U = u_1, u_2, \ldots, u_n$ are a $4\epsilon$-nearly Parseval frame, we have $\sum_{i=1}^n ((u_i)_j)^2 \leq 1 + 4\epsilon$ for every coordinate $j$, which gives
$$\sum_{i=1}^n W(x_i, y_i) \leq \sum_{k=1}^{d-1} (d-k)(4\epsilon + \gamma) \leq \frac{(4\epsilon + \gamma) d^2}{2}.$$
To complete the proof, using the fact that for each $i$, $w_i$ and $\overline{w}_i$ differ by a scaling factor whose square is between $1-\gamma$ and $1+\gamma$, we have $\mathrm{dist}^2(\overline{W}, W) \leq \gamma d$, and putting it all together we get
$$\mathrm{dist}^2(U, W) \leq 2\,\mathrm{dist}^2(U, \overline{W}) + 2\,\mathrm{dist}^2(\overline{W}, W) \leq 2(4\epsilon + \gamma)d^2 + 2\gamma d \leq 16\epsilon d^2 + 8\gamma d^2,$$
again using the triangle-like inequality.

An Algorithm for the Paulsen Problem
Every step of the proof of Theorem 1 is straightforward to implement algorithmically, except for the step where we compute the transformation $A$ that places the set of vectors $U$ in radial isotropic position. Fortunately, Hardt and Moitra [11] gave an algorithm for computing $A$ under a slight strengthening of Barthe's conditions which holds in our setting. Informally, they require the vector $c$ to be strictly inside the basis polytope, i.e. $c \in (1-\alpha)B(U)$ for some $\alpha > 0$, where $(1-\alpha)B(U)$ denotes the basis polytope scaled down by a factor of $1-\alpha$. We will state a special case of their main theorem, which is sufficient for our purposes.

Theorem 3 ([11]). Let $\delta > 0$ and $\alpha > 0$. Suppose $U = u_1, u_2, \ldots, u_n \in \mathbb{R}^d$ has the property that every set of $d$ vectors is linearly independent. Then given $c \in (1-\alpha)B(U)$, there is an algorithm to find a linear transformation $A$ so that
$$\sum_{i=1}^n c_i \frac{(A u_i)(A u_i)^T}{\|A u_i\|^2} = I_{d \times d} + J,$$
where $\|J\|_\infty \leq \delta$. The running time is polynomial in $1/\alpha$, $\log 1/\delta$ and $L$, where $L$ is an upper bound on the bit complexity of $U$ and $c$.
By combining their algorithm with our proof of Theorem 1 we get:

Theorem 4. Suppose $V = v_1, v_2, \ldots, v_n \in \mathbb{R}^d$ is an $\epsilon$-nearly equal norm Parseval frame. Then given $\delta > 0$, there is an algorithm to compute a $\delta$-nearly equal norm Parseval frame $W$ with $\mathrm{dist}^2(V, W) \leq 40 \epsilon d^2$ whose running time is polynomial in $\log 1/\delta$ and $L$, where $L$ is an upper bound on the bit complexity of $V$.
Proof. We perturb $V$ as in the proof of Theorem 1 and run the algorithm in Theorem 3 on $U$ with $c$ where $c_i = \frac{d}{n}$ for all $i$ and some $\delta'$ to be specified later. First we note that when every set of $d$ vectors of $U$ is linearly independent and $n > d$, then $c \in (1-\alpha)B(U)$ for $\alpha = \frac{1}{n}$. Moreover we only needed the perturbation to be polynomially small in $\epsilon$, $1/d$ and $1/n$, and hence we can ensure that its bit complexity is a polynomial in the bit complexity of $V$.
Finally we can choose $\delta' = \delta/d^3$ so that the output is a $\delta$-nearly equal norm Parseval frame. Note that our bound on the squared distance between $V$ and $W$ in Lemma 1 used the fact that $W$ was an equal norm Parseval frame. But it is easy to see that the slack in the bounds we used can accommodate a $\delta$-nearly equal norm Parseval frame instead.
This answers an open question of [12], where they ask whether there is an algorithm for finding an equal norm Parseval frame up to some precision δ whose running time is polynomial in log 1/δ.