Guessing Efficiently for Constrained Subspace Approximation
Abstract
In this paper we study the constrained subspace approximation problem. Given a set of $n$ points in $\mathbb{R}^d$, the goal of the subspace approximation problem is to find a $k$-dimensional subspace that best approximates the input points. More precisely, for a given $p \ge 1$, we aim to minimize the $p$th power of the $\ell_p$ norm of the error vector $(\|a_1 - Pa_1\|, \dots, \|a_n - Pa_n\|)$, where $a_1, \dots, a_n$ denote the input points, $P$ denotes the projection matrix onto the subspace, and the norms are Euclidean. In constrained subspace approximation (CSA), we additionally have constraints on the projection matrix $P$. In its most general form, we require $P$ to belong to a given subset $\mathcal{P}$ that is described explicitly or implicitly.
We introduce a general framework for constrained subspace approximation. Our approach, which we term coreset-guess-solve, yields either $(1+\varepsilon)$-multiplicative or additive approximations for a variety of constraints. We show that it provides new algorithms for partition-constrained subspace approximation with applications to fair subspace approximation, $k$-means clustering, and projected non-negative matrix factorization, among others. Specifically, while we recover the best known bounds for $k$-means clustering in Euclidean spaces, we improve the known results for the remainder of the problems.
Keywords and phrases: parameterized complexity, low rank approximation, fairness, non-negative matrix factorization, clustering
Category: Track A: Algorithms, Complexity and Games
Funding: Aditya Bhaskara: Supported by NSF CCF-2047288.
Copyright and License: David P. Woodruff; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification: Theory of computation → Continuous optimization
Editors: Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis
1 Introduction
Large data sets, often represented as collections of high-dimensional points, naturally arise in fields such as machine learning, data mining, and computational geometry. Despite their high-dimensional nature, these points typically exhibit low intrinsic dimensionality. Identifying (or summarizing) this underlying low-dimensional structure is a fundamental algorithmic question with numerous applications to data analysis. We study a general formulation, that we call the subspace approximation problem.
In subspace approximation, given a set of points $a_1, \dots, a_n \in \mathbb{R}^d$ and a rank parameter $k$, we consider the problem of “best approximating” the input points with a $k$-dimensional subspace of the high-dimensional space. Here the goal is to find a rank-$k$ projection $P$ that minimizes the projection costs $\|a_i - Pa_i\|_2$, aggregated over $i$. The choice of aggregation leads to different well-studied formulations. In the $\ell_p$-subspace approximation problem, the objective is $\sum_i \|a_i - Pa_i\|_2^p$. Formally, denoting by $A$ the matrix whose $i$th column is $a_i$, the $\ell_p$-subspace approximation problem asks to find a rank-$k$ projection matrix $P$ that minimizes $\|A - PA\|_{p,2}^p$. For different choices of $p$, $\ell_p$-subspace approximation captures some well-studied problems, notably the median hyperplane problem (when $p = 1$), the principal component analysis (PCA) problem (when $p = 2$), and the center hyperplane problem (when $p = \infty$).
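To make the objective concrete, here is a minimal numpy sketch (with our own variable names, not taken from the paper) that evaluates the $\ell_p$-subspace approximation cost of a candidate subspace and compares an arbitrary subspace against the PCA solution for $p = 2$:

```python
import numpy as np

def subspace_approx_cost(A, U, p=2.0):
    """Cost of approximating the columns of A (d x n) by the column span of U (d x k):
    the p-th power of the l_p norm of the vector of per-column Euclidean residuals."""
    Q, _ = np.linalg.qr(U)              # orthonormal basis of the candidate subspace
    R = A - Q @ (Q.T @ A)               # residuals (I - P)A with P = QQ^T
    col_norms = np.linalg.norm(R, axis=0)
    return float(np.sum(col_norms ** p))

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 100))
k = 3
# For p = 2 the optimal subspace is spanned by the top-k left singular vectors (PCA).
U_opt = np.linalg.svd(A, full_matrices=False)[0][:, :k]
print(subspace_approx_cost(A, U_opt, p=2),
      subspace_approx_cost(A, rng.standard_normal((20, k)), p=2))
```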
Subspace approximation turns out to be NP-hard for all $p \neq 2$. For $p \ge 2$, semidefinite programming helps achieve a constant factor approximation (for fixed $p$) for the problem [19]. Matching hardness results were also shown for this case, first assuming the Unique Games Conjecture [19], and then based only on $\mathrm{P} \neq \mathrm{NP}$ [23]. For other values of $p$, hardness results were first shown in the work of [13].
Due to the ubiquitous applications of subspace approximation in various domains, several “constrained” versions of the problem have been extensively studied as well [20, 44, 33, 2, 8, 14]. In the most general setting of the constrained $\ell_p$-subspace approximation problem, we are additionally given a collection $\mathcal{P}$ of rank-$k$ projection matrices (specified either explicitly or implicitly) and the goal is to find a projection matrix in $\mathcal{P}$ minimizing the objective, i.e.,

$$\min_{P \in \mathcal{P}} \ \|A - PA\|_{p,2}^p. \tag{1}$$
Some examples of problems in constrained subspace approximation include the well-studied column subset selection [7, 39, 18, 11, 24, 6, 1], where the projection matrices are constrained to project onto the span of $k$ of the original vectors; $k$-means clustering, in which the set of projection matrices is specified by the partitionings of the points into $k$ clusters (see [14] for a reference); and many more which we will describe in this paper.
1.1 Our Contributions and Applications
In this paper, we provide a general algorithmic framework for constrained $\ell_p$-subspace approximation that yields either $(1+\varepsilon)$-multiplicative or additive error approximations to the objective (depending on the setting), with running time exponential in the rank parameter $k$. We apply the framework to several classes of constrained subspace approximation, leading to new results or results matching the state-of-the-art for these problems. Note that since the problems we consider are typically APX-hard (including $k$-means, and even the unconstrained version of $\ell_p$-subspace approximation for $p \neq 2$), an exponential running time is necessary for our results, assuming the Exponential Time Hypothesis; see the discussion in Section 2. Before presenting our results, we start with an informal description of the framework.
Overview of Approach.
Our approach is based on coresets [21] (see also [16, 15, 27] and references therein), but turns out to differ from the standard approach in a subtle yet important way. Recall that a (strong) coreset for an optimization problem on a set of points $X$ is a subset $S \subseteq X$ such that for any solution, the cost on $S$ is approximately the same as the cost on $X$, up to an appropriate scaling. In the formulation of $\ell_p$-subspace approximation above, a coreset for a dataset $A$ would be a subset $B$ of its columns, with far fewer columns than $A$, such that for all $k$-dimensional subspaces, each defined by some projection $P$, the cost $\|B - PB\|_{p,2}^p$ approximates $\|A - PA\|_{p,2}^p$ up to scaling. Thus the goal becomes to minimize the former quantity.
In the standard coreset approach, first a coreset $S$ is obtained, and then a problem-specific enumeration procedure is used to find a near optimal solution. For example, for the $k$-means clustering objective, one can consider all the $k$-partitions of the points in the coreset $S$; each partition leads to a set of candidate centers, and the best of these candidate solutions will be an approximate solution to the full instance. Similarly for (unconstrained) $\ell_p$-subspace approximation, one observes that for an optimal solution, the basis vectors of the chosen subspace must lie in the span of the vectors of $S$, and thus one can enumerate over combinations of the vectors of $S$. Each combination gives a candidate subspace, and the best of these candidate solutions is an approximate solution to the full instance.
However, this approach does not work in general for constrained subspace approximation. To see this, consider the very simple constraint of having the basis vectors of the subspace come from some given subspace $\mathcal{S}$. Here, the coreset for $\ell_p$-subspace approximation on $A$ will be some set $S$ that is “oblivious” to the subspace $\mathcal{S}$. Thus, enumerating over combinations of $S$ may not yield any vectors in $\mathcal{S}$!
Our main idea is to avoid enumeration over candidate solutions; instead, we view the solution (the basis matrix $W$) simply as a set of variables. We then note that since the goal is to use $W$ to approximate the coreset vectors, there must be some combination of the columns of $W$ (equivalently, a set of coefficients) that approximates each vector in the coreset. If the coreset size is $m$, there are only $km$ coefficients in total, and we can thus hope to enumerate these coefficients, over an appropriate net, in time exponential in $km$. For every given choice of coefficients, we can then solve an optimization problem to find the optimal $W$. For the constraints we consider (including the simple example above), this problem turns out to be convex, and can thus be solved efficiently!
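The following is a small, self-contained sketch of the coreset-guess-solve idea for the simple constraint mentioned above (the subspace must lie inside a given subspace). It assumes the coreset $B$ has already been computed, enumerates coefficient matrices over a crude grid in place of a true $\varepsilon$-net, and solves an unconstrained (hence convex) least-squares problem for each guess; all function names and the tiny dimensions are ours.

```python
import itertools
import numpy as np

# Sketch of coreset-guess-solve for one simple constraint: the rank-k subspace must
# lie inside a given subspace S (columns of S_basis).  B plays the role of the
# already-computed coreset; the guessing is over the k x m coefficient matrix Y,
# and for each guess the best constrained W is found by an ordinary least-squares
# solve, since W = S_basis @ Z turns the constraint into an unconstrained problem.

def solve_for_W(S_basis, Y, B):
    # min_Z || B - (S_basis Z) Y ||_F  is linear least squares in Z (vectorized),
    # using vec(S Z Y) = (Y^T kron S) vec(Z).
    M = np.kron(Y.T, S_basis)
    z, *_ = np.linalg.lstsq(M, B.flatten(order="F"), rcond=None)
    Z = z.reshape(S_basis.shape[1], Y.shape[0], order="F")
    return S_basis @ Z

rng = np.random.default_rng(1)
d, k, m = 4, 1, 3
S_basis = np.linalg.qr(rng.standard_normal((d, 2)))[0]   # constraint subspace, dim 2
B = rng.standard_normal((d, m))                           # stand-in for a coreset of A

radius = np.linalg.norm(B, axis=0).max()
grid = np.linspace(-radius, radius, 7)                    # crude stand-in for an eps-net
best = (np.inf, None)
for coeffs in itertools.product(grid, repeat=k * m):      # guess all k*m coefficients
    Y = np.array(coeffs).reshape(k, m)
    W = solve_for_W(S_basis, Y, B)
    cost = np.linalg.norm(B - W @ Y, "fro") ** 2
    best = min(best, (cost, W), key=lambda t: t[0])
print("best guessed cost:", best[0])
```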
This simple idea yields additive approximation guarantees for a range of problems. We then observe that in specific settings of interest, we can obtain $(1+\varepsilon)$-multiplicative approximations by avoiding the guessing of the coefficients. In these settings, once the coefficients have been fixed, there is a closed form for the optimal basis vectors, in the form of low degree polynomials of the coefficients. We can then use the literature on solving polynomial systems of equations (viewing the coefficients as variables) to obtain algorithms that are more efficient than guessing. The framework is described more formally in Section 3.
We believe our general technique of using coresets to reduce the number of coefficients needed, in order to turn a constrained non-convex optimization problem into a convex one, may be of broader applicability. We note it is fundamentally different from the “guess a sketch” technique for variable reduction in [34, 3, 4, 30] and the techniques for reducing variables in non-negative matrix factorization [32]. To support this statement: the guess-a-sketch technique requires the existence of a small sketch, and consequently has only been applied to approximation with entrywise $\ell_p$-norms and weighted variants [34, 3, 30], whereas our technique applies to a much wider family of norms.
Relation to Prior Work.
We briefly discuss the connection to prior work on binary matrix factorization using coresets. The work of [40] considers binary matrix factorization by constructing a strong coreset that reduces the number of distinct rows via importance sampling, leveraging the discrete structure of binary inputs. Our framework generalizes these ideas to continuous settings: we use strong coresets not merely to reduce distinct rows, but to reduce the number of variables in a polynomial system for solving continuous constrained optimization problems. This enables us to extend the approach to real-valued matrices and to more general loss functions. Our framework can be seen as a generalization and unification of prior coreset-based “guessing” strategies, adapting them to significantly broader settings.
Applications.
We apply our framework to the following applications. Each of these settings can be viewed as subspace approximation with a constraint on the subspace (i.e., on the projection matrix), or on properties of the associated basis vectors. Below we describe these applications, mention how they can be formulated as Constrained Subspace Approximation, and state our results for them. See Table 1 for a summary.
Table 1: Summary of our results.

| Problem | Guarantee | Prior Work |
|---|---|---|
| PC-$\ell_p$-Subspace Approx. | additive (Theorem 21); multiplicative for $p=2$ (Theorem 22) | |
| Constrained Subspace Est. | multiplicative-additive (Corollary 18); multiplicative (Theorem 19) | |
| PNMF | additive (Theorem 1); multiplicative (Theorem 2) | |
| $k$-Means Clustering | $(1+\varepsilon)$-approximation (Theorem 3) | [21] |
| Sparse PCA | additive; multiplicative for the min formulation (Theorem 4) | [17] |
1.1.1 Subspace Approximation with Partition Constraints
First, we study a generalization of $\ell_p$-subspace approximation where we have partition constraints on the subspace. More specifically, we consider PC-$\ell_p$-subspace approximation, where besides the point set $A$, we are given subspaces $S_1, \dots, S_\ell$ along with capacities $k_1, \dots, k_\ell$ such that $\sum_i k_i = k$. The set of valid projections is then implicitly defined to be the set of projections onto the subspaces obtained by selecting $k_i$ vectors from $S_i$ for each $i$ and taking their span.
PC-$\ell_p$-subspace approximation can be viewed as a variant of data summarization with “fair representation”. Specifically, when $S_i$ is the span of the vectors (or points) in group $i$, then by setting the capacities properly (depending on the application or the choice of policy makers), PC-$\ell_p$-subspace approximation captures the problem of finding a summary of the input data in which groups are fairly represented. This corresponds to the equitable representation criterion, a popular notion studied extensively in algorithmic fairness, e.g., in clustering [29, 28, 10, 26]. (We note that these fair representation definitions differ from those in the line of work on fair PCA and column subset selection [35, 38, 31, 37], where the objective contributions (i.e., projection costs) of different groups must either be equal (if possible) or ensure that the maximum incurred cost is minimized. We focus on the question of groups having equal, or appropriately bounded, representation among the chosen low-dimensional directions. This distinction is also found in algorithmic fairness studies of other problems, such as clustering.) We show the following results for PC-subspace approximation:
-
First, in Theorem 21, we show, for any $p \ge 1$, an algorithm for PC-$\ell_p$-subspace approximation that returns a solution with a small additive error, where the error bound depends on the accuracy parameter and on the condition number of the optimal choice of vectors from the given subspaces.
-
For $p = 2$, which is one of the most common loss functions for PC-$\ell_p$-subspace approximation, we also present a multiplicative approximation guarantee. There exists a $(1+\varepsilon)$-approximation algorithm whose running time is stated in terms of the bit complexity of each element in the input and the sum of the dimensions of the input subspaces $S_1, \dots, S_\ell$. The formal statement is in Theorem 22. (A small sketch of evaluating a candidate solution follows this list.)
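To fix ideas, the following sketch evaluates the PC-$\ell_p$-subspace approximation objective for a candidate selection of vectors satisfying the capacity constraints; the function name and the toy instance are ours, and the sketch only evaluates a solution rather than searching for one.

```python
import numpy as np

# S_list[i] is an orthonormal basis for group subspace S_i, caps[i] its capacity k_i;
# "picked" selects caps[i] basis columns from each S_list[i].  Names are ours.

def pc_cost(A, S_list, caps, picked, p=2.0):
    cols = []
    for S_i, k_i, idx in zip(S_list, caps, picked):
        assert len(idx) == k_i, "capacity constraint violated"
        cols.append(S_i[:, idx])
    V = np.hstack(cols)                       # chosen vectors, k = sum(caps) in total
    Q, _ = np.linalg.qr(V)                    # project onto their span
    resid = np.linalg.norm(A - Q @ (Q.T @ A), axis=0)
    return float(np.sum(resid ** p))

rng = np.random.default_rng(2)
d, n = 6, 40
A = rng.standard_normal((d, n))
S_list = [np.linalg.qr(rng.standard_normal((d, 3)))[0] for _ in range(2)]  # two groups
print(pc_cost(A, S_list, caps=[1, 1], picked=[[0], [2]], p=2))
```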
1.1.2 Constrained Subspace Estimation
The Constrained Subspace Estimation problem originates from the signal processing community [36], and aims to find a subspace of dimension $k$ that best approximates a collection of experimentally measured subspaces, under the constraint that it intersects a model-based subspace $W$ in at least a predetermined number of dimensions $s$, i.e., $\dim(V \cap W) \ge s$. This problem arises in applications such as beamforming, where the model-based subspace is used to encode the available prior information about the problem. The paper [36] formulates and motivates the problem, and further presents an algorithm based on a semidefinite relaxation of this non-convex problem, whose performance is demonstrated only via numerical simulation.
We show in Section 4.1 that this problem can be reduced to at most $k$ instances of PC-$\ell_2$-subspace approximation in which the number of parts is two. This gives us the following results for the constrained subspace estimation problem.
-
In Corollary 18, we show a multiplicative-additive approximation guarantee.
-
In Theorem 19, we show a multiplicative approximation guarantee, under the assumption that the input has integer entries of bounded absolute value.
1.1.3 Projective Non-Negative Matrix Factorization
Projective Non-Negative Matrix Factorization (PNMF) [45] (see also [46, 43]) is a variant of Non-Negative Matrix Factorization (NMF), used for dimensionality reduction and data analysis, particularly for datasets with non-negative values such as images and texts. In NMF, a non-negative matrix $X$ is factorized into the product of two non-negative matrices $W$ and $H$ such that $X \approx WH$, where $W$ contains basis vectors and $H$ represents coefficients. In PNMF, the aim is to approximate the data matrix by projecting it onto a subspace spanned by non-negative vectors, similar to NMF; however, in PNMF the factorization is constrained to be projective.
Formally, PNMF can be formulated as a constrained $\ell_2$-subspace approximation as follows: the set of feasible projection matrices $\mathcal{P}$ consists of all matrices that can be written as $WW^\top$, where $W$ is an orthonormal $d \times k$ matrix with all non-negative entries.
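As an illustration of the PNMF constraint set, the sketch below evaluates the objective for a feasible $W$; columns with disjoint supports are one simple way to be simultaneously non-negative and orthonormal (the construction and names are ours).

```python
import numpy as np

# The PNMF projection is P = W W^T where W has orthonormal columns and
# nonnegative entries.  Normalized indicator-like columns with disjoint
# supports satisfy both constraints.

def pnmf_cost(X, W):
    return np.linalg.norm(X - W @ (W.T @ X), "fro") ** 2

d, k = 6, 2
W = np.zeros((d, k))
W[:3, 0] = 1.0 / np.sqrt(3)     # support {0,1,2}
W[3:, 1] = 1.0 / np.sqrt(3)     # support {3,4,5}, disjoint => W^T W = I and W >= 0
assert np.allclose(W.T @ W, np.eye(k)) and (W >= 0).all()

rng = np.random.default_rng(3)
X = rng.random((d, 10))          # nonnegative data
print(pnmf_cost(X, W))
```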
We show the following results:
Theorem 1 (Additive approximation for NMF).
Given an instance of non-negative matrix factorization and any $\varepsilon > 0$, there is an algorithm that computes a feasible solution whose cost exceeds the optimum by at most an additive error controlled by $\varepsilon$.
Theorem 2 (Multiplicative approximation for NMF).
Given an instance of non-negative matrix factorization whose entries are integers of bounded absolute value, there is an algorithm that computes a feasible solution whose cost is within a $(1+\varepsilon)$ factor of the optimum.
1.1.4 -Means Clustering
$k$-means is a popular clustering objective widely used in data analysis and machine learning. Given a set of vectors $a_1, \dots, a_n$ and a parameter $k$, the goal of $k$-means clustering is to partition these vectors into $k$ clusters such that the sum of the squared distances of all points to their corresponding cluster center is minimized, i.e., to minimize $\sum_i \|a_i - \mu_{C(a_i)}\|^2$, where $C(a_i)$ denotes the cluster that $a_i$ belongs to and $\mu_{C(a_i)}$ denotes its center. It is an easy observation that once the clustering is determined, the cluster centers must be the centroids of the points in each cluster. It is shown in [14] that this problem is an instance of constrained subspace approximation. More precisely, the set of valid projection matrices consists of all matrices that can be written as $X_C X_C^\top$, where $X_C$ is the $n \times k$ normalized cluster-indicator matrix whose $(i,j)$ entry is $1/\sqrt{|C_j|}$ if $a_i \in C_j$ and $0$ otherwise. Note that $X_C$ has orthonormal columns and thus $X_C X_C^\top$ is an orthogonal projection matrix. Further, note that in our language we need to apply constrained subspace approximation to the matrix whose rows are the points, i.e., to $A^\top$. In Theorem 3, we show a $(1+\varepsilon)$-approximation algorithm for $k$-means whose dependence on $k$ and $\varepsilon$ matches that of [21].
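The following sketch (with our own variable names) numerically verifies the equivalence used above: for a fixed clustering, the $k$-means cost equals the subspace-approximation cost under the normalized cluster-indicator projection.

```python
import numpy as np

# M has the points as rows; labels[i] is the cluster of point i; X_C is the
# normalized cluster-indicator matrix described in the text.

def kmeans_cost(M, labels, k):
    cost = 0.0
    for j in range(k):
        pts = M[labels == j]
        cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return cost

def csa_cost(M, labels, k):
    n = M.shape[0]
    X_C = np.zeros((n, k))
    for j in range(k):
        idx = np.where(labels == j)[0]
        X_C[idx, j] = 1.0 / np.sqrt(len(idx))     # orthonormal columns
    P = X_C @ X_C.T                               # rank-k projection
    return np.linalg.norm(M - P @ M, "fro") ** 2

rng = np.random.default_rng(4)
M = rng.standard_normal((50, 5))
labels = rng.integers(0, 3, size=50)
print(np.isclose(kmeans_cost(M, labels, 3), csa_cost(M, labels, 3)))   # True
```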
Theorem 3.
Given an instance of $k$-means, there is an algorithm that computes a $(1+\varepsilon)$-approximate solution to $k$-means; its running time's dependence on $k$ and $\varepsilon$ matches that of [21].
1.1.5 Sparse PCA
The goal of Principal Component Analysis (PCA) is to find linear combinations of the features (dimensions), called principal components, that capture most of the mass of the data. As mentioned earlier, PCA is the subspace approximation problem with $p = 2$. However, the obtained principal components are typically linear combinations of all features, which makes interpretability of the components more difficult. As such, Sparse PCA, the optimization problem obtained from PCA by adding a sparsity constraint on the principal components, has been studied, which provides higher data interpretability [17, 47, 9, 25, 5].
Sparse PCA can be formulated as a constrained subspace approximation problem in which the set of projection matrices is constrained to those that can be written as $WW^\top$, where $W$ is a $d \times k$ orthonormal matrix such that the total number of non-zero entries in $W$ is at most $s$, for a given parameter $s$.
$$\max_{W \in \mathbb{R}^{d \times k}} \ \|W^\top A\|_F^2 \quad \text{s.t.} \quad W^\top W = I_k, \ \ \mathrm{nnz}(W) \le s. \tag{sparse-PCA-max}$$

Program sparse-PCA-max can also be formulated as a minimization version:

$$\min_{W \in \mathbb{R}^{d \times k}} \ \|A - WW^\top A\|_F^2 \quad \text{s.t.} \quad W^\top W = I_k, \ \ \mathrm{nnz}(W) \le s. \tag{sparse-PCA-min}$$
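The following sketch (ours, for illustration only) evaluates both objectives for a feasible sparse $W$ and checks that, for any $W$ with orthonormal columns, the two objectives sum to $\|A\|_F^2$, which is why an additive guarantee for one transfers to the other.

```python
import numpy as np

# A candidate W with orthonormal columns and at most s nonzero entries overall.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 30))
k, s = 2, 4

W = np.zeros((6, k))                 # sparse orthonormal W: disjoint 2-sparse columns
W[[0, 1], 0] = [0.6, 0.8]
W[[2, 3], 1] = [0.8, -0.6]
assert np.allclose(W.T @ W, np.eye(k)) and np.count_nonzero(W) <= s

obj_max = np.linalg.norm(W.T @ A, "fro") ** 2            # sparse-PCA-max objective
obj_min = np.linalg.norm(A - W @ (W.T @ A), "fro") ** 2  # sparse-PCA-min objective
print(np.isclose(obj_max + obj_min, np.linalg.norm(A, "fro") ** 2))   # True
```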
We give an algorithm that computes an additive approximate solution, which translates to a $(1+\varepsilon)$-multiplicative approximate solution to the minimization formulation of the problem.
Theorem 4.
Given an instance of sparse-PCA, there is an algorithm that computes an additive approximate solution to both sparse-PCA-max and sparse-PCA-min. This additive guarantee translates to a $(1+\varepsilon)$-approximate solution to sparse-PCA-min, because the additive error is bounded in terms of a lower bound on the sparse-PCA-min objective value.
1.1.6 Column Subset Selection with a Partition Constraint
Column subset selection (CSS) is a popular data summarization technique [8, 14, 1], where given a matrix $A$, the goal is to find $k$ columns of $A$ that best approximate all columns of $A$. Since in CSS a subset of columns of the matrix is picked as the summary of the matrix $A$, enforcing partition constraints naturally captures the problem of column subset selection with fair representation. More formally, in column subset selection with partition constraints (PC-column subset selection), we are given a partitioning of the columns of $A$ into $\ell$ groups, $A_1, \dots, A_\ell$, along with capacities $k_1, \dots, k_\ell$ with $\sum_i k_i = k$; the set of valid subspaces is obtained by picking $k_i$ columns from each group $A_i$ and projecting onto the span of these chosen columns of $A$.
In Theorem 5, we show that PC-column subset selection is hard to approximate to any factor in polynomial time, even if there are only two groups, and even when we allow a solution to violate the capacity constraints by a large factor. This is in sharp contrast with the standard column subset selection problem, for which efficient algorithms with tight guarantees are known.
Theorem 5.
Assuming $\mathrm{P} \neq \mathrm{NP}$, the PC-column subset selection problem is hard to approximate to any multiplicative factor in polynomial time, even in the following special cases:
(i) The case of two groups, where the capacities on all the groups are the same parameter.
(ii) The case where the capacities on all the groups are the same parameter, and we allow a solution to violate the capacities by a factor that grows with the total number of columns in the instance.
2 Preliminaries
We will heavily use standard notation for vector and matrix quantities. For a matrix $A$, we denote by $A_i$ the $i$th column of $A$ and by $A^{(i)}$ the $i$th row. We denote by $\|A\|_F$ the Frobenius norm, which is simply $\big(\sum_{i,j} A_{ij}^2\big)^{1/2}$, where $A_{ij}$ is the entry in the $i$th row and $j$th column of $A$. We also use mixed norms $\|A\|_{p,2} = \big(\sum_j \|A_j\|_2^p\big)^{1/p}$, i.e., the $\ell_p$ norm of the vector whose entries are the $\ell_2$ norms of the columns of $A$.
We also use $\sigma_{\min}(A)$ to denote the least singular value of a matrix and $\sigma_{\max}(A)$ to denote the largest singular value. The condition number $\kappa(A)$ is the ratio of the largest to the smallest singular value.
In analyzing the running times of our algorithms, we will use the following basic primitives for SVD computation and least squares regression. These are standard results from numerical linear algebra; while there are several improvements using randomization, these bounds will not be dominant in our running time, so we do not optimize them.
Lemma 6 (SVD Computation; see [22]).
Given $A \in \mathbb{R}^{d \times n}$, computing the reduced matrix $B$ as in Lemma 12 takes time polynomial in $n$, $d$, and $H$, where $H$ is the maximum bit complexity of any element of $A$.
Lemma 7 (Least Squares Regression; see [22]).
Given $A$ and a target matrix $B$ with $m$ columns, the optimization problem $\min_X \|AX - B\|_F^2$ can be solved in time polynomial in the dimensions and in $H$, where $H$ is the maximum bit length of any entry in the input.
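For concreteness, a minimal numpy stand-in for this regression primitive (the variable shapes are ours) is:

```python
import numpy as np

# Solve min_X ||A X - B||_F^2 with a multi-column target B via least squares.
rng = np.random.default_rng(6)
A = rng.standard_normal((100, 8))
B = rng.standard_normal((100, 5))
X, residuals, rank, _ = np.linalg.lstsq(A, B, rcond=None)
print(X.shape, np.allclose(A.T @ (A @ X - B), 0, atol=1e-8))  # normal equations hold
```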
Remark on the Exponential in Running Times.
In all of our results, it is natural to ask whether the exponential dependence on the parameters is necessary. We note that many of the problems we study are APX-hard, and thus obtaining $(1+\varepsilon)$-multiplicative factors for arbitrarily small $\varepsilon$ will necessarily require super-polynomial time in the worst case. For problems that generalize $\ell_p$-subspace approximation (e.g., the PC-$\ell_p$-subspace approximation problem, Section 4.2), the work of [23] showed APX-hardness, while [13] showed NP-hardness. In our reductions, we in fact have the stronger property that the YES and NO instances are separated by a substantial gap in objective value (expressed in terms of the matrix used in the reduction). Thus, assuming the Exponential Time Hypothesis, even the additive error guarantee in general requires an exponential dependence on at least one of the problem parameters.
3 Framework for Constrained Subspace Approximation
Given a matrix $A \in \mathbb{R}^{d \times n}$ and a special collection $\mathcal{P}$ of rank-$k$ projection matrices, we are interested in selecting the projection matrix that minimizes the sum of projection costs (raised to the $p$th power) of the columns of $A$ onto it. More compactly, the optimization problem is

$$\min_{P \in \mathcal{P}} \ \|A - PA\|_{p,2}^p. \tag{CSA}$$

A more geometric and equivalent interpretation is that we have a collection of data points $a_1, \dots, a_n$ and we would like to approximate these data points by a $k$-dimensional subspace $V$ while satisfying certain constraints on the subspace:

$$\min_{V} \ \sum_{i=1}^n \mathrm{dist}(a_i, V)^p \quad \text{s.t. the projection onto } V \text{ belongs to } \mathcal{P}. \tag{CSA-geo}$$

See Lemma 9 for a proof of the equivalence, together with a factored formulation (CSA-fac) in which $P$ is written as $WW^\top$ for a matrix $W$ with orthonormal columns, and the objective is $\min_{W, Y} \|A - WY\|_{p,2}^p$ subject to $WW^\top \in \mathcal{P}$. We provide a unified framework to obtain approximately optimal solutions for various special collections $\mathcal{P}$. In our framework, there are three steps to obtaining an approximate solution to any instance of CSA.
-
1.
Build a coreset: Reduce the size of the problem by replacing $A$ with a different matrix $B$ that has far fewer columns (depending only on $k$ and the accuracy parameter, not on $n$). The property we need to guarantee is that the projection cost is approximately preserved, possibly with an additive error independent of the projection:
(5) Such a $B$ (for $p = 2$) has been referred to as a Projection-Cost-Preserving Sketch with one-sided error in [14]. See Definition 10, Theorem 11, and Lemma 12 for results obtaining such a $B$ for various $p$. Lemma 14 shows that approximate solutions to reduced instances satisfying Equation 5 are also approximate solutions to the original instance.
-
2.
Guess Coefficients: Since the projection matrix $P$ is of rank $k$, it can be represented as $P = WW^\top$ such that $W$ has orthonormal columns. Using this, observe that the residual matrix $B - PB$
can be represented as $B - WY$, where $Y = W^\top B$ is a $k \times m$ matrix. The norm of each column of $Y$ can be bounded by the norm of the corresponding column of $B$. This allows us to guess every column of $Y$ inside a $k$-dimensional ball of radius at most the norm of the corresponding column of $B$. Using a net with appropriate granularity, we guess the optimal $Y$ up to an additive error (a crude net construction is sketched after this list).
-
3.
Solve: For every fixed $Y$ in the search space above, we solve the constrained regression problem $\min_{W:\, WW^\top \in \mathcal{P}} \|B - WY\|_{p,2}^p$
exactly. If $\widehat{Y}$ is the guess that induces the minimum cost, and $\widehat{W}$ is the corresponding minimizer of the constrained regression problem, we return the projection matrix $\widehat{W}\widehat{W}^\top$.
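As referenced in the guessing step above, here is a crude grid-based construction of a $\delta$-net of a $k$-dimensional ball; the function name and the particular construction are ours and are meant only to illustrate how the net size scales.

```python
import itertools
import numpy as np

# A grid-based delta-net of the k-dimensional Euclidean ball of radius r:
# every point of the ball is within distance delta of some returned grid point.

def ball_net(k, r, delta):
    step = delta / np.sqrt(k)                  # grid cubes then have diameter <= delta
    ticks = np.arange(-r, r + step, step)
    pts = [np.array(p) for p in itertools.product(ticks, repeat=k)
           if np.linalg.norm(p) <= r + delta]
    return pts

net = ball_net(k=2, r=1.0, delta=0.5)
print(len(net))                                # net size grows like (O(r/delta))^k
```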
The following lemma formalizes the framework above and can be used as a black box application for several specific instances of CSA.
Lemma 8.
Given an instance $(A, \mathcal{P})$ of CSA, for $p \ge 1$:
-
1.
Let $T_1$ be the time taken to obtain a smaller instance $B$ such that the approximate cost property in Equation 5 is satisfied, and let $m$ be the number of columns of $B$.
-
2.
Let $T_2$ be the time taken to solve the constrained regression problem for any fixed $Y$:
(6) Then for any granularity parameter $\delta > 0$, we obtain a solution $\widehat{P} = \widehat{W}\widehat{W}^\top$ such that
(7) in time $T_1$ plus $T_2$ times the size of the search space over coefficient matrices.
Here, the remaining quantities are as in Equation 5 and the proof below.
Proof.
Let the optimal solution to the reduced instance $(B, \mathcal{P})$ be $P^* = W^*(W^*)^\top$ and let $Y^* = (W^*)^\top B$. Since the columns of $W^*$ are unit vectors, the norm of the $j$th column of $Y^*$ is at most the norm of the $j$th column of $B$. We will approximately guess the columns of $Y^*$ using epsilon-nets. For each $j$, we search for the $j$th column of $Y^*$ using a $\delta$-net inside a $k$-dimensional ball of radius $\|B_j\|$ centered at the origin. The size of the net for each column of $Y^*$ is exponential in $k$, and hence the total search space over matrices has size exponential in $km$.
For each , we solve the constrained regression problem in Equation 6. Let be the matrix for which the cost is minimized and be the corresponding minimizer to the constrained regression problem respectively. Consider the solution . The cost of this solution on reduced instance is
(8) | ||||
Let $\widetilde{Y}$ be the matrix in the search space whose $j$th column is within distance $\delta$ of the $j$th column of $Y^*$ for every $j$. Using the cost minimality of the returned solution, we conclude that the above cost is at most
(9) | ||||
(10) |
It remains to upper bound the difference . If we let and for , then
(11) | ||||
Using the fact that , we know that | ||||
(12) |
This implies that each error term
(13) | ||||
(Triangle inequality) | ||||
() | ||||
( is increasing in ) |
Summing up, the total error is at most for . This implies that
(14) | ||||
Using the property of $B$ from Equation 5, we conclude that
(15) | ||||
setting in Equation 5 and using the fact that gives . Plugging this in the equation above gives | ||||
(16) |
The total time taken by the algorithm is $T_1$ plus $T_2$ times the size of the search space, as claimed.
Lemma 9.
The formulations CSA, CSA-geo, and the factored formulation CSA-fac, in which we minimize $\|A - WY\|_{p,2}^p$ over matrices $W$ with orthonormal columns satisfying $WW^\top \in \mathcal{P}$ and over arbitrary coefficient matrices $Y$, all have the same optimal value.
Proof.
First, we will prove the equivalence between CSA and CSA-fac.
- 1. Given a feasible solution $P \in \mathcal{P}$ to CSA, write $P = WW^\top$ for some $W$ with orthonormal columns and set $Y = W^\top A$; then $A - WY = A - PA$, so the optimal value of CSA-fac is at most that of CSA.
-
2.
For the other direction, it suffices to show that for any fixed choice of $W$ such that $W^\top W = I_k$, an optimal choice of $Y$ is $W^\top A$. In order to see this, observe that the problem is
(17) where $a_i$ and $y_i$ are the columns of $A$ and $Y$ respectively. Since the cost function decomposes into separate problems for each column, we can push the minimization inside. (18) Using the normal equations, the optimal choice for $y_i$ satisfies $W^\top W y_i = W^\top a_i$. Since the columns of $W$ are orthonormal, this implies that $y_i = W^\top a_i$ for each $i$ and hence $Y = W^\top A$.
Now we show the equivalence between CSA-fac and CSA-geo. Observe that CSA-geo can be re-written as
because the column span of $P = WW^\top$ is identical to the column span of $W$; replacing the subspace by the column span of $W$ gives CSA-fac.
Definition 10 (Strong coresets; as defined in [41]).
Let $A \in \mathbb{R}^{d \times n}$, $p \ge 1$, and $\varepsilon \in (0,1)$. Then a diagonal matrix $S \in \mathbb{R}^{n \times n}$ with non-negative entries is a strong coreset for $\ell_p$ subspace approximation if for all rank-$k$ projection matrices $P$, we have
(19) $\ \|(I - P) A S\|_{p,2}^p = (1 \pm \varepsilon)\, \|(I - P) A\|_{p,2}^p.$
The number of non-zero entries of $S$ will be referred to as the size of the coreset.
Theorem 11 (Theorems 1.3 and 1.4 of [42]).
Let $A \in \mathbb{R}^{d \times n}$ and $p \ge 1$ be given, and let $\varepsilon \in (0,1)$. There is an efficient algorithm which, with high probability, constructs a strong coreset $S$ that satisfies Definition 10 and has size:
(20) |
Remark.
Note that for any $S$ that satisfies the property in Definition 10, we can scale it up so that it satisfies the one-sided condition in Equation 5.
For many of the applications, we have $p = 2$. For this case, the choice of the reduced matrix $B$ that replaces $A$ is simply a matrix of scaled left singular vectors of $A$. More formally,
Lemma 12.
When $p = 2$, if $A = \sum_i \sigma_i u_i v_i^\top$ is the singular value decomposition of $A$ (where $\sigma_i$ is the $i$th largest singular value and $u_i, v_i$ are the left and right singular vectors corresponding to $\sigma_i$), then the matrix $B$ whose columns are the scaled left singular vectors $\sigma_i u_i$ satisfies Equation 5 for $p = 2$.
The proof is deferred to Appendix A.1.
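The following numpy sketch illustrates the $p = 2$ phenomenon behind Lemma 12 in its simplest form: if $B$ retains all scaled left singular vectors (no truncation), projection costs are preserved exactly for every projection; the truncated $B$ used in Lemma 12 keeps far fewer columns at the price of a small additive term. The variable names are ours.

```python
import numpy as np

# Replacing A by B = U * diag(sigma) (its scaled left singular vectors) preserves
# ||A - PA||_F^2 exactly for every projection P, since A = B V^T with V orthonormal.

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 200))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
B = U * sigma                                  # d x rank matrix of scaled left singular vectors

Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # arbitrary rank-3 projection P = QQ^T
cost_A = np.linalg.norm(A - Q @ (Q.T @ A), "fro") ** 2
cost_B = np.linalg.norm(B - Q @ (Q.T @ B), "fro") ** 2
print(np.isclose(cost_A, cost_B))              # True
```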
Remark 13.
Notice that when $p = 2$, Lemma 12 proves the condition in Equation 49:
which is stronger than the condition in Equation 5.
Lemma 14.
If $(A, \mathcal{P})$ is an instance of CSA and $B$ is a matrix that satisfies Equation 5, and $\widehat{P} \in \mathcal{P}$ satisfies
(21)
then $\widehat{P}$ is an approximate solution to the instance $(A, \mathcal{P})$, i.e.,
(22) |
-
1.
More generally, if is an approximate solution to such that
for some , then we have
-
2.
For the specific case when , if is an exact solution to , then we have
Proof.
-
1.
Using the approximate optimality of for the instance , we have
(23) Using the lower-bound and upper-bound from Equation 5 for the LHS and RHS, we get (24) Since and , we get (25) -
2.
Using the optimality of $\widehat{P}$ for the reduced instance $(B, \mathcal{P})$ with $p = 2$, we have
(26) Using Remark 13, we know that for any rank-$k$ projection matrix the costs on $A$ and on $B$ differ by a constant independent of the projection (see Equation 49). Using this, we get
Canceling out the common constant gives the inequality we claimed.
Lemma 15 (Lemma 4.1 in [12]).
If a matrix has integer entries bounded in magnitude by $\gamma$ and has rank $\rho$, then its smallest nonzero singular value admits an explicit lower bound in terms of the dimensions, $\gamma$, and $\rho$. This in turn yields a lower bound on the optimal cost whenever the optimum is nonzero.
4 Applications
We present two applications to illustrate our framework. The remaining applications, as well as our hardness result for fair column-based approximation, are deferred to the full version.
4.1 Constrained Subspace Estimation [36]
In constrained subspace estimation, we are given a collection of target subspaces $U_1, \dots, U_m$ and a model subspace $W$. The goal is to find a subspace $V$ of dimension $k$ with $\dim(V \cap W) \ge s$ that maximizes the average overlap between $V$ and the target subspaces. More formally, writing $P$ for the orthogonal projection onto $V$ and $\Pi_i$ for the orthogonal projection onto $U_i$, the problem can be formulated as the mathematical program:

$$\max_{P} \ \sum_{i=1}^m \mathrm{tr}(P\,\Pi_i) \quad \text{s.t.} \quad P \text{ is a rank-}k \text{ orthogonal projection}, \ \ \dim(\mathrm{range}(P) \cap W) \ge s. \tag{CSE-max}$$
Let us assume that the constraint $\dim(\mathrm{range}(P) \cap W) \ge s$ is actually an exact constraint $\dim(\mathrm{range}(P) \cap W) = s'$, because we can solve the problem separately for each case $s' = s, s+1, \dots, k$. Since $\sum_i \Pi_i$ is a PSD matrix, let it be $MM^\top$ for some matrix $M$. Changing the optimization problem from a maximization problem to a minimization problem (using $\mathrm{tr}(P\,MM^\top) = \|PM\|_F^2 = \|M\|_F^2 - \|M - PM\|_F^2$ for a projection $P$), we get

$$\min_{P} \ \|M - PM\|_F^2 \quad \text{s.t.} \quad P \text{ is a rank-}k \text{ orthogonal projection}, \ \ \dim(\mathrm{range}(P) \cap W) = s'. \tag{CSE-min}$$
Proof.
Let $B$ be the reduced matrix obtained from $M$ as in Lemma 12. Using Lemma 14, it is sufficient to focus on the reduced instance with $B$ in place of $M$.
Any subspace $V$ of dimension $k$ such that $\dim(V \cap W) = s'$ can be represented equivalently as the span of $s'$ vectors chosen from $W$ together with $k - s'$ additional vectors:
Using these observations and Lemma 9, we can focus on the following subspace estimation program
(32) | ||||
(33) | ||||
(34) | ||||
Since is unconstrained, we can replace the condition in Equation 33 with the much simpler condition . This gives | ||||
(CSE-min-reduced) | ||||
(35) | ||||
(36) |
Lemma 17.
For any fixed guess of the coefficients and of $s'$, the program CSE-min-reduced can be solved exactly in polynomial time.
Proof.
For a fixed guess, the objective is a convex quadratic in the remaining variables and the constraints are linear in them. A linearly constrained convex quadratic program can be solved efficiently.
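A minimal sketch of this solve step, assuming the cvxpy package is available, and using placeholder shapes and a linear equality constraint of our own choosing to stand in for the actual constraints of CSE-min-reduced:

```python
import cvxpy as cp
import numpy as np

# For a fixed guessed coefficient matrix Y, minimizing ||B - W Y||_F^2 subject to
# linear constraints on W is a linearly constrained convex quadratic program.

rng = np.random.default_rng(8)
d, k, m = 6, 2, 5
B = rng.standard_normal((d, m))        # reduced target matrix
Y = rng.standard_normal((k, m))        # a fixed guess of the coefficient matrix
C = rng.standard_normal((3, d))        # hypothetical linear constraints C W = 0

W = cp.Variable((d, k))
prob = cp.Problem(cp.Minimize(cp.sum_squares(B - W @ Y)), [C @ W == 0])
prob.solve()
print(prob.status, prob.value)
```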
Corollary 18 (Additive approximation for CSE).
Lemma 15 gives a lower bound on OPT when the entries of the input matrix are integers of bounded magnitude.
Theorem 19 (Multiplicative approximation for CSE).
Given an instance of constrained subspace estimation in which the input has integer entries of bounded absolute value, there is an algorithm that obtains a subspace $V$ with $\dim(V) = k$ and $\dim(V \cap W) \ge s$ such that
in time.
Proof.
Using Lemma 15, we obtain a lower bound on OPT. Setting the accuracy parameter in Corollary 18 accordingly gives the desired time complexity and multiplicative guarantee.
4.2 Partition Constrained -Subspace Approximation
We now consider the PC--subspace approximation problem, which generalizes the subspace approximation and subspace estimation problems.
Definition 20 (Partition Constrained -Subspace Approximation).
In the PC-$\ell_p$-subspace approximation problem, we are given a set of target vectors as columns of a matrix $A \in \mathbb{R}^{d \times n}$, a set of subspaces $S_1, \dots, S_\ell$, and a sequence of capacity constraints $k_1, \dots, k_\ell$ where $\sum_i k_i = k$. The goal is to select $k$ vectors in total, $k_i$ from subspace $S_i$, such that their span captures as much of $A$ as possible. Formally, the goal is to select vectors $v_1, \dots, v_k$, such that for every $i$, exactly $k_i$ of them belong to $S_i$, so as to minimize $\|A - PA\|_{p,2}^p$, where $P$ is the projection onto $\mathrm{span}(v_1, \dots, v_k)$.
Our results will give algorithms with running times exponential in $k$ for PC-$\ell_p$-subspace approximation. Given this goal, we can focus on the setting where every $k_i = 1$, since we can replace each $S_i$ in the original formulation with $k_i$ copies of $S_i$, with a budget of $1$ for each copy.
PC-$\ell_p$-subspace approximation with Unit Capacity.
Given a set of vectors as columns of a matrix $A$ and subspaces $S_1, \dots, S_k$, select a vector $v_i \in S_i$ for each $i$ in order to minimize $\sum_j \mathrm{dist}(a_j, \mathrm{span}(v_1, \dots, v_k))^p$, where $p \ge 1$ is a given parameter. A more compact formulation is
(PC--SA-geo) | ||||
(37) | ||||
(38) | ||||
Using Lemma 9, the two other equivalent formulations are | ||||
(PC--SA) | ||||
(39) | ||||
(40) | ||||
(PC--SA-fac) | ||||
(41) | ||||
(42) |
In what follows, we thus focus on the unit capacity version. We can use our general framework to derive an additive error approximation, for any $p \ge 1$.
Theorem 21.
There exists an algorithm for PC-$\ell_p$-subspace approximation which returns a solution with additive error controlled by the accuracy parameter and the condition number of an optimal solution for the PC-subspace approximation problem PC-$\ell_p$-SA-fac.
For the special case of $p = 2$, it turns out that we can obtain a $(1+\varepsilon)$-multiplicative approximation, using a novel idea. We now outline this approach. As described in our framework, we start by constructing the reduced instance, where $B$ is the reduced set of target vectors and $S_1, \dots, S_k$ is the given collection of subspaces of $\mathbb{R}^d$. We define $A_i$ to be some fixed orthonormal basis for the space $S_i$. Recall that any solution to PC-$\ell_p$-subspace approximation is defined by (a) the vector $x_i$ that expresses the chosen $v_i \in S_i$ in the basis $A_i$ (we have one such vector for each $i$), and (b) a set of combination coefficients used to represent the columns of $B$ using the vectors $v_1, \dots, v_k$. We collect the vectors $x_i$ into one long vector $x$ and the coefficients into a matrix $Y$.
Theorem 22.
Let $(A, S_1, \dots, S_k)$ be an instance of PC-$\ell_2$-subspace approximation, and suppose that the bit complexity of each element in the input is bounded by $H$. Suppose there exists an (approximately) optimal solution defined by a pair $(x, Y)$ of bounded bit complexity. Then there exists an algorithm that outputs a solution whose objective value is within a $(1+\varepsilon)$ factor of the optimum objective value, with running time exponential in the number of variables and polynomial in the bit complexities.
Algorithm Overview.
Recall that $A_i$ specifies an orthonormal basis for $S_i$. Let $v_i = A_i x_i$, where the $x_i$ are variables. Define $V(x)$ to be the matrix consisting of $k$ column blocks, the $i$th of which is $A_i x_i$, and let $x$ denote the vector obtained by stacking the $x_i$ vertically, as shown below:
The problem PC-$\ell_2$-subspace approximation can now be expressed as the regression problem:
(43) $\ \min_{x, Y} \ \|B - V(x)\, Y\|_F^2.$
Written this way, it is clear that for any fixed $x$, the optimization problem with respect to $Y$ is simply a regression problem. For the sake of exposition, suppose that for the optimal solution $x^*$, the matrix $V(x^*)$ turns out to have full column rank (i.e., $V(x^*)^\top V(x^*)$ is invertible). In this case, we can write down the normal equations and solve them using Cramer's rule! More specifically, let $D = \det(V(x)^\top V(x))$ and consider the matrices obtained by replacing a column of $V(x)^\top V(x)$ with the corresponding right-hand-side column of the normal equations. Using Cramer's rule, each entry of the optimal $Y$ is a ratio of two such determinants, each a polynomial in the entries of $x$.
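The following small numpy check (ours) illustrates the Cramer's-rule step on a generic regression instance: each least-squares coefficient is a ratio of determinants, i.e., a ratio of polynomials in the entries of the design matrix.

```python
import numpy as np

# For a full-column-rank V, the least-squares coefficients solve the normal
# equations (V^T V) y = V^T b; Cramer's rule writes each coordinate as a ratio
# of determinants.

rng = np.random.default_rng(9)
V = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

G, rhs = V.T @ V, V.T @ b
D = np.linalg.det(G)
y_cramer = np.array([np.linalg.det(np.column_stack([G[:, :j], rhs, G[:, j + 1:]])) / D
                     for j in range(3)])
y_lstsq = np.linalg.lstsq(V, b, rcond=None)[0]
print(np.allclose(y_cramer, y_lstsq))          # True
```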
The key observation now is that substituting this back into the objective yields an optimization problem over the variables $x$ alone. First, observe that using the normal equations, the objective can be simplified as
Suppose $\tau$ is a real-valued parameter that is a guess for the objective value. We then consider the following feasibility problem:
(44) | ||||
(45) |
The idea is to solve this feasibility problem using the literature on solving polynomial systems. This leaves two main gaps: guessing $\tau$, and handling the case where the optimal solution does not have full column rank. We handle the former issue using known quantitative bounds on the solution values of polynomial systems, and the latter using pre-multiplication by random matrices of different sizes.
References
- [1] Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, and Morteza Zadimoghaddam. Greedy column subset selection: New bounds and distributed algorithms. In International Conference on Machine Learning, pages 2539–2548, 2016. URL: http://proceedings.mlr.press/v48/altschuler16.html.
- [2] Megasthenis Asteris, Dimitris Papailiopoulos, and Alexandros Dimakis. Nonnegative sparse pca with provable guarantees. In International Conference on Machine Learning, pages 1728–1736. PMLR, 2014. URL: http://proceedings.mlr.press/v32/asteris14.html.
- [3] Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, and David P. Woodruff. A PTAS for $\ell_p$-low rank approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 747–766, 2019.
- [4] Frank Ban, David P. Woodruff, and Qiuyi (Richard) Zhang. Regularized weighted low rank approximation. CoRR, abs/1911.06958, 2019. arXiv:1911.06958.
- [5] Christos Boutsidis, Petros Drineas, and Malik Magdon-Ismail. Sparse features for pca-like linear regression. Advances in Neural Information Processing Systems, 24, 2011.
- [6] Christos Boutsidis, Petros Drineas, and Malik Magdon-Ismail. Near-optimal column-based matrix reconstruction. SIAM Journal on Computing, 43(2):687–717, 2014. doi:10.1137/12086755X.
- [7] Christos Boutsidis, Michael W Mahoney, and Petros Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 968–977, 2009. doi:10.1137/1.9781611973068.105.
- [8] Christos Boutsidis, Anastasios Zouzias, Michael W Mahoney, and Petros Drineas. Randomized dimensionality reduction for $k$-means clustering. IEEE Transactions on Information Theory, 61(2):1045–1062, 2014. doi:10.1109/TIT.2014.2375327.
- [9] Jorge Cadima and Ian T Jolliffe. Loading and correlations in the interpretation of principal components. Journal of applied Statistics, 22(2):203–214, 1995.
- [10] Ashish Chiplunkar, Sagar Kale, and Sivaramakrishnan Natarajan Ramamoorthy. How to solve fair $k$-center in massive data models. In International Conference on Machine Learning, pages 1877–1886, 2020.
- [11] Ali Civril and Malik Magdon-Ismail. Column subset selection via sparse approximation of SVD. Theoretical Computer Science, 421:1–14, 2012. doi:10.1016/J.TCS.2011.11.019.
- [12] Kenneth L Clarkson and David P Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 205–214, 2009. doi:10.1145/1536414.1536445.
- [13] Kenneth L. Clarkson and David P. Woodruff. Input sparsity and hardness for robust subspace approximation. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 310–329, 2015. doi:10.1109/FOCS.2015.27.
- [14] Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for -means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 163–172, 2015. doi:10.1145/2746539.2746569.
- [15] Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, and Omar Ali Sheikh-Omar. Improved coresets for Euclidean $k$-means. In Proceedings of the 36th International Conference on Neural Information Processing Systems, 2024.
- [16] Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset framework for clustering. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 169–182, 2021. doi:10.1145/3406325.3451022.
- [17] Alberto Del Pia. Sparse pca on fixed-rank matrices. Mathematical Programming, 198(1):139–157, 2023. doi:10.1007/S10107-022-01769-9.
- [18] Amit Deshpande and Luis Rademacher. Efficient volume sampling for row/column subset selection. In 2010 ieee 51st annual symposium on foundations of computer science, pages 329–338, 2010. doi:10.1109/FOCS.2010.38.
- [19] Amit Deshpande, Madhur Tulsiani, and Nisheeth K. Vishnoi. Algorithms and hardness for subspace approximation. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 482–496, USA, 2011. doi:10.1137/1.9781611973082.39.
- [20] Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, and Vishwanathan Vinay. Clustering large graphs via the singular value decomposition. Machine learning, 56:9–33, 2004. doi:10.1023/B:MACH.0000033113.59016.96.
- [21] Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for $k$-means clustering based on weak coresets. In Proceedings of the twenty-third annual symposium on Computational geometry, pages 11–18, 2007. doi:10.1145/1247069.1247072.
- [22] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 4th edition, 2013.
- [23] Venkatesan Guruswami, Prasad Raghavendra, Rishi Saket, and Yi Wu. Bypassing ugc from some optimal geometric inapproximability results. ACM Trans. Algorithms, 12(1), February 2016. doi:10.1145/2737729.
- [24] Venkatesan Guruswami and Ali Kemal Sinop. Optimal column-based low-rank matrix reconstruction. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1207–1214, 2012. doi:10.1137/1.9781611973099.95.
- [25] Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity. Monographs on statistics and applied probability, 143(143):8, 2015.
- [26] Sedjro Salomon Hotegni, Sepideh Mahabadi, and Ali Vakilian. Approximation algorithms for fair range clustering. In International Conference on Machine Learning, pages 13270–13284. PMLR, 2023.
- [27] Lingxiao Huang, Jian Li, and Xuan Wu. On optimal coreset construction for Euclidean $(k,z)$-clustering. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1594–1604, 2024. doi:10.1145/3618260.3649707.
- [28] Matthew Jones, Huy Nguyen, and Thy Nguyen. Fair $k$-centers via maximum matching. In International Conference on Machine Learning, pages 4940–4949, 2020.
- [29] Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair $k$-center clustering for data summarization. In International Conference on Machine Learning, pages 3448–3457, 2019. URL: http://proceedings.mlr.press/v97/kleindessner19a.html.
- [30] Arvind V. Mahankali and David P. Woodruff. Optimal column subset selection and a fast PTAS for low rank approximation. CoRR, abs/2007.10307, 2020.
- [31] Antonis Matakos, Bruno Ordozgoiti, and Suhas Thejaswi. Fair column subset selection. arXiv preprint arXiv:2306.04489, 2023. doi:10.48550/arXiv.2306.04489.
- [32] Ankur Moitra. An almost optimal algorithm for computing nonnegative rank. SIAM J. Comput., 45(1):156–173, 2016. doi:10.1137/140990139.
- [33] Dimitris Papailiopoulos, Alexandros Dimakis, and Stavros Korokythakis. Sparse pca through low-rank approximations. In International Conference on Machine Learning, pages 747–755. PMLR, 2013. URL: http://proceedings.mlr.press/v28/papailiopoulos13.html.
- [34] Ilya Razenshteyn, Zhao Song, and David P. Woodruff. Weighted low rank approximations with provable guarantees. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 250–263, 2016. doi:10.1145/2897518.2897639.
- [35] Samira Samadi, Uthaipon Tantipongpipat, Jamie H Morgenstern, Mohit Singh, and Santosh Vempala. The price of fair PCA: One extra dimension. In Advances in neural information processing systems, pages 10976–10987, 2018.
- [36] Ignacio Santamaria, Javier Vía, Michael Kirby, Tim Marrinan, Chris Peterson, and Louis Scharf. Constrained subspace estimation via convex optimization. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 1200–1204. IEEE, 2017. doi:10.23919/EUSIPCO.2017.8081398.
- [37] Zhao Song, Ali Vakilian, David Woodruff, and Samson Zhou. On socially fair regression and low-rank approximation. In Advances in Neural Information Processing Systems, 2024.
- [38] Uthaipon Tantipongpipat, Samira Samadi, Mohit Singh, Jamie H Morgenstern, and Santosh Vempala. Multi-criteria dimensionality reduction with applications to fairness. In Advances in Neural Information Processing Systems, pages 15135–15145, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/2201611d7a08ffda97e3e8c6b667a1bc-Abstract.html.
- [39] Joel A Tropp. Column subset selection, matrix factorization, and eigenvalue optimization. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 978–986. SIAM, 2009. doi:10.1137/1.9781611973068.106.
- [40] Ameya Velingker, Maximilian Vötsch, David P. Woodruff, and Samson Zhou. Fast $(1+\varepsilon)$-approximation algorithms for binary matrix factorization. In Proceedings of the 40th International Conference on Machine Learning, ICML'23. JMLR.org, 2023.
- [41] David P. Woodruff and Taisuke Yasuda. Nearly linear sparsification of subspace approximation, 2024. doi:10.48550/arXiv.2407.03262.
- [42] David P. Woodruff and Taisuke Yasuda. Ridge leverage score sampling for subspace approximation, 2025. arXiv:2407.03262.
- [43] Zhirong Yang and Erkki Oja. Linear and nonlinear projective nonnegative matrix factorization. IEEE Transactions on Neural Networks, 21:734–749, 2010. doi:10.1109/TNN.2010.2041361.
- [44] Xiao-Tong Yuan and Tong Zhang. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14(4), 2013. doi:10.5555/2567709.2502610.
- [45] Zhijian Yuan and Erkki Oja. Projective nonnegative matrix factorization for image compression and feature extraction. In Image Analysis: 14th Scandinavian Conference, SCIA 2005, Joensuu, Finland, June 19-22, 2005. Proceedings 14, pages 333–342. Springer, 2005. doi:10.1007/11499145_35.
- [46] Zhijian Yuan, Zhirong Yang, and Erkki Oja. Projective nonnegative matrix factorization: Sparseness, orthogonality, and clustering, 2009.
- [47] Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286, 2006.
Appendix A Missing Proofs
A.1 Proof of Lemma 12
Proof.
For two arbitrary projection matrices of rank $k$, consider the difference
(46) | |||
(47) | |||
(48) | |||
() | |||
(rank of ) | |||
() | |||
() | |||
() |
If we let , then we have
for any projection matrix of rank at most $k$. This can be rewritten as
(49) |
Using the fact that , we get
The last fact follows from the observation that
(50) | ||||
() |