
Decremental (1+ϵ)-Approximate Maximum Eigenvector: Dynamic Power Method

Deeksha Adil Institute for Theoretical Studies, ETH Zürich, Switzerland Thatchaphol Saranurak Department of Computer Science, University of Michigan, Ann Arbor, MI, USA
Abstract

We present a dynamic algorithm for maintaining a $(1+\epsilon)$-approximate maximum eigenvector and eigenvalue of a positive semi-definite matrix $\mathbf{A}$ undergoing decreasing updates, i.e., updates which may only decrease eigenvalues. Given a vector $\mathbf{v}$ updating $\mathbf{A}\leftarrow\mathbf{A}-\mathbf{v}\mathbf{v}^\top$, our algorithm takes $\tilde{O}(\mathrm{nnz}(\mathbf{v}))$ amortized update time, i.e., polylogarithmic per non-zero in the update vector.

Our technique is based on a novel analysis of the influential power method in the dynamic setting. The two previous sets of techniques have the following drawbacks (1) algebraic techniques can maintain exact solutions but their update time is at least polynomial per non-zeros, and (2) sketching techniques admit polylogarithmic update time but suffer from a crude additive approximation.

Our algorithm exploits an oblivious adversary. Interestingly, we show that any algorithm with polylogarithmic update time per non-zeros that works against an adaptive adversary and satisfies an additional natural property would imply a breakthrough for checking psd-ness of matrices in O~(n2) time, instead of O(nω) time.

Keywords and phrases:
Power Method, Dynamic Algorithms
Category:
Track A: Algorithms, Complexity and Games
Funding:
Deeksha Adil: Supported by Dr. Max Rössler, the Walter Haefner Foundation and the ETH Zürich Foundation.
Thatchaphol Saranurak: Supported by NSF grant CCF-2238138.
Copyright and License:
© Deeksha Adil and Thatchaphol Saranurak; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Algorithm design techniques
Related Version:
Full Version: https://arxiv.org/abs/2402.17929
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Computing eigenvalues and eigenvectors of a matrix is predominant in several applications, including principal component analysis, clustering of high-dimensional data, semidefinite programming, spectral graph partitioning, and algorithms such as Google’s PageRank. In the era of massive and dynamic datasets, developing algorithms capable of efficiently updating spectral information as the input changes has become indispensable.

The study of the change in the spectrum of a matrix undergoing updates spans over five decades, with [13] providing the first algebraic characterization of the change for positive semi-definite matrices via the secular equation. This result offered an explicit formula for exactly computing all new eigenvalues when a matrix undergoes a single rank-one update, assuming knowledge of the entire eigen-decomposition of the initial matrix. Subsequently, [12] showed how to additionally compute eigenvectors explicitly, and handle the case of repeated eigenvalues of the initial matrix. A decade later, [5] extended the works of [13, 12] to compute all new eigenvalues and eigenvectors of a positive semi-definite matrix undergoing an update of the form of a small-rank matrix, or a single batch update. Further works extend these techniques to computing singular values [27], as well as computing new eigenvalues and eigenvectors when only the partial eigen-decomposition of the initial matrix is known [21]. However, all these works require a full eigen-decomposition of the initial matrix and only handle a single update to the matrix.

Another independent line of work aims at finding a small and sparse matrix that has eigenvalues close to the original matrix. A recent work in this direction by [10] provides a universal sparsifier $\mathbf{S}$ for positive semi-definite matrices $\mathbf{A}$, such that $\mathbf{S}$ has at most $n/\epsilon^4$ non-zero entries and $\|\mathbf{A}-\mathbf{A}\circ\mathbf{S}\|\le\epsilon n$. [28] give an alternate technique using Gaussian sketches with $O(1/\epsilon^2)$ rows to approximate all eigenvalues of $\mathbf{A}$ to an additive error of $\epsilon\|\mathbf{A}\|_F$. The algorithms that find sparse matrices approximating the eigenvalues in these works may be extended to handle updates to the initial matrix quickly. However, the approximation of the eigenvalues achieved is quite crude, i.e., at least an $\epsilon n$ additive factor. This approximation is also shown to be tight for such techniques.

Now consider the simpler task of only maintaining the maximum eigenvalue and eigenvector of a matrix $\mathbf{A}$ as it undergoes updates. As we have seen above, known methods based on algebraic techniques require full spectral information before computing the new eigenvalues, and maintaining the entire eigen-decomposition can be slow. Sparsification-based algorithms are fast but only achieve large additive approximations of at least $\epsilon n$, which is not quite satisfactory. Works on streaming PCA, which have the similar goal of maintaining large eigenvectors, focus on optimizing the space complexity instead of the runtime [2]. Previous works based on dynamically maintaining the matrix inverse can maintain $(1+\epsilon)$-multiplicative approximations to the maximum eigenvalue of an $n\times n$ symmetric matrix undergoing single-entry updates with an update time of $O(n^{1.447}/\mathrm{poly}(\epsilon))$ [26]. This was further improved to $O(n^{1.407}/\mathrm{poly}(\epsilon))$ [32]. The update time is even slower for row, column, or rank-one updates. In any case, it takes polynomial time per non-zero of the update. This leads to the following natural question:

Is there a dynamic algorithm that can maintain a multiplicative approximation of the maximum eigenvalue of a matrix using polylogarithmic update time per non-zeros in the update?

In this paper, we study the problem of maintaining a (1+ϵ)-multiplicative approximation to the maximum eigenvalue and eigenvector of a positive semi-definite matrix as it undergoes a sequence of rank-one updates that may only decrease the eigenvalues. We note that this is equivalent to finding the minimum eigenvalue and eigenvector of a positive semi-definite matrix undergoing rank-one updates that may only increase the eigenvalues. We now formally state our problem.

Problem 1 (Decremental Approximate Maximum Eigenvalue and Eigenvector).

We are given a size parameter $n$, an accuracy parameter $\epsilon\in(0,1)$, a psd matrix $\mathbf{A}_0\succeq 0$ of size $n\times n$, and an online sequence of vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_T$ that update $\mathbf{A}_t\leftarrow\mathbf{A}_{t-1}-\mathbf{v}_t\mathbf{v}_t^\top$ with a promise that $\mathbf{A}_t\succeq 0$ for all $t$. The goal is to maintain an $\epsilon$-approximate eigenvalue $\lambda_t$ and an $\epsilon$-approximate eigenvector $\mathbf{w}_t\in\mathbb{R}^n$ of $\mathbf{A}_t$ for all times $t$. That is, $\lambda_t$ and a unit vector $\mathbf{w}_t$ satisfy

$$\max_{\mathbf{u}:\,\text{unit}}\mathbf{u}^\top\mathbf{A}_t\mathbf{u}\ \ge\ \lambda_t\ \ge\ (1-\epsilon)\max_{\mathbf{u}:\,\text{unit}}\mathbf{u}^\top\mathbf{A}_t\mathbf{u}, \tag{1}$$

and

$$\mathbf{w}_t^\top\mathbf{A}_t\mathbf{w}_t\ \ge\ (1-\epsilon)\max_{\mathbf{u}:\,\text{unit}}\mathbf{u}^\top\mathbf{A}_t\mathbf{u}. \tag{2}$$
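For concreteness, here is a minimal numpy sketch of the setting of Problem 1; the random matrix, the particular choice of decreasing updates, and the brute-force eigendecomposition standing in for a dynamic algorithm are all illustrative assumptions, not part of our method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 50, 0.1

# Start from a random PSD matrix A_0 = B B^T.
B = rng.standard_normal((n, n))
A = B @ B.T

for t in range(5):
    # A decreasing update A_t <- A_{t-1} - v_t v_t^T that keeps A_t PSD:
    # shrink 10% of the mass along the current maximum eigenvector.
    evals, evecs = np.linalg.eigh(A)
    u = evecs[:, -1]
    v = np.sqrt(0.1 * evals[-1]) * u
    A = A - np.outer(v, v)

    # Any valid output (lambda_t, w_t) must satisfy Eq. (1) and Eq. (2);
    # here brute force plays the role of the dynamic algorithm.
    lam_max = np.linalg.eigvalsh(A)[-1]
    lam_t, w_t = lam_max, u
    assert (1 - eps) * lam_max <= lam_t <= lam_max + 1e-9
    assert w_t @ A @ w_t >= (1 - eps) * lam_max - 1e-9
```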

1.1 Our Results

We give an algorithm for the Decremental Approximate Maximum Eigenvalue and Eigenvector Problem that has an amortized update time of $\tilde{O}(\mathrm{nnz}(\mathbf{v}_t))\le\tilde{O}(n)$, where $\tilde{O}$ hides polynomials in $\log n$. In other words, the total time required by our algorithm over $T$ updates is at most $\tilde{O}\left(\mathrm{nnz}(\mathbf{A}_0)+\sum_{i=1}^{T}\mathrm{nnz}(\mathbf{v}_i)\right)$. Observe that our algorithm only requires time that is $\tilde{O}(1)\times$ (time required to read the input) and can handle any number of decremental updates. Our algorithm works against an oblivious adversary, i.e., the update sequence is fixed from the beginning. This is the first algorithm that can handle a sequence of updates while providing a multiplicative approximation to the eigenvalues and eigenvectors in a total runtime of $\tilde{O}(n^2+nT)$, with an amortized update time faster than previous algorithms by a factor of $n^{\Omega(1)}$. Formally, we prove the following:

Theorem 2.

There is an algorithm for Problem 1 under a sequence of $T$ decreasing updates that, given parameters $n$, $\mathbf{A}_0$, and $\epsilon>1/n$ as input, works against an oblivious adversary with probability at least $1-1/n$ in total time

$$O\left(\frac{\log^3 n\,\log^6\frac{n}{\epsilon}\,\log\frac{\lambda_{\max}(\mathbf{A}_0)}{\lambda_{\max}(\mathbf{A}_T)}}{\epsilon^4}\left(\mathrm{nnz}(\mathbf{A}_0)+\sum_{i=1}^{T}\mathrm{nnz}(\mathbf{v}_i)\right)\right).$$

Our algorithm is a novel adaptation of the classical power method (see, e.g., [29]) to the dynamic setting, along with a new analysis that may be of independent interest.

Our work can also be viewed as the first step towards generalizing the dynamic algorithms for solving positive linear programs of [11] to solving dynamic positive semi-definite programs. We discuss this connection in detail in Appendix A.

1.2 Towards Separation between Oblivious and Adaptive adversaries

We also explore the possibility of removing the assumption of an oblivious adversary and working against an adaptive adversary. Recall that an update sequence is generated by an adaptive adversary when it may depend on the outputs of the algorithm.

We show that if there is an algorithm for Problem 1 with a total running time of at most $\tilde{O}(n^2)$ such that the output $\mathbf{w}_t$'s satisfy an additional natural property (which is satisfied by the output of the power method, but not by our algorithm), then it would contradict a well-known hardness barrier in numerical linear algebra, thereby ruling out such algorithms.

We first state the barrier formally, and then state our result for adaptive adversaries. Recall that, given a matrix $\mathbf{A}$, the condition number of $\mathbf{A}$ is $\frac{\max_{\mathbf{x}:\,\text{unit}}\|\mathbf{A}\mathbf{x}\|}{\min_{\mathbf{x}:\,\text{unit}}\|\mathbf{A}\mathbf{x}\|}$.

Problem 3 (Checking psdness with certificate).

Given $\delta\in(0,1)$, a parameter $\kappa$, and a symmetric matrix $\mathbf{A}$ of size $n\times n$ with condition number at most $\kappa$, either

  • Compute a matrix $\mathbf{X}$ where $\|\mathbf{A}-\mathbf{X}\mathbf{X}^\top\|\le\delta\min_{\|\mathbf{x}\|=1}\|\mathbf{A}\mathbf{x}\|$, certifying that $\mathbf{A}$ is a psd matrix, or

  • Report that 𝑨 is not a psd matrix.

Recall that $\mathbf{A}$ is a psd matrix iff there exists $\mathbf{X}$ such that $\mathbf{A}=\mathbf{X}\mathbf{X}^\top$. The matrix $\mathbf{X}$ is called a vector realization of $\mathbf{A}$, and note that $\mathbf{X}$ is not unique. The problem above asks to compute a (highly accurate) vector realization of $\mathbf{A}$. Both the eigendecomposition and the Cholesky decomposition of $\mathbf{A}$ give a solution for Problem 3. (Given an eigendecomposition $\mathbf{A}=\mathbf{Q}\Lambda\mathbf{Q}^\top$ where $\Lambda$ is a non-negative diagonal matrix and $\mathbf{Q}$ is an orthogonal matrix, we set $\mathbf{X}=\mathbf{Q}\Lambda^{1/2}$. Given a Cholesky decomposition $\mathbf{A}=\mathbf{L}\mathbf{L}^\top$ where $\mathbf{L}$ is a lower triangular matrix, we set $\mathbf{X}=\mathbf{L}$.) Banks et al. [7] showed how to compute, with high accuracy, an eigendecomposition of a psd matrix in $O(n^{\omega+\eta})$ time for any constant $\eta>0$, where $\omega>2.37$ is the matrix multiplication exponent. Hence, a vector realization of $\mathbf{A}$ can be found in the same time. Observe that, when $\delta<\kappa$, Problem 3 is at least as hard as certifying that a given matrix $\mathbf{A}$ is a psd matrix. It is a notorious open problem whether certifying the psd-ness of a matrix can be done in time $o(n^\omega)$, even when the condition number is polynomial in $n$. Therefore, we view Problem 3 as a significant barrier and formalize it as follows.

Hypothesis 4 (PSDness Checking Barrier).

There is a constant $\eta>0$ such that every randomized algorithm that solves Problem 3 correctly with probability at least $2/3$ on instances with $\kappa/\delta\le\mathrm{poly}(n)$ requires $n^{2+\eta}$ time.

Our negative result states that, assuming Hypothesis 4, there is no algorithm for Problem 1 against adaptive adversaries with sub-polynomial update time that maintains 𝒘t satisfying an additional property stated below.

Property 5.

For every $t$, let $\mathbf{u}_i(\mathbf{A}_t)$ denote the eigenvectors of $\mathbf{A}_t$ and $\lambda_i(\mathbf{A}_t)$ the corresponding eigenvalues. For all $i$ such that $\lambda_i(\mathbf{A}_t)\le\lambda_1(\mathbf{A}_t)/2$,

$$\left(\mathbf{w}_t^\top\mathbf{u}_i(\mathbf{A}_t)\right)^2\ \le\ \frac{1}{n^2}\cdot\frac{\lambda_i(\mathbf{A}_t)}{\lambda_1(\mathbf{A}_t)}.$$
Theorem 6.

Assuming Hypothesis 4, there is no algorithm for Problem 1 that maintains $\mathbf{w}_t$'s additionally satisfying Property 5 and, given parameters $n$ and $\epsilon=\min\left\{1-\frac{1}{n^{o(1)}},\ \frac{1-\delta}{1+\delta}\right\}$ as input, works against an adaptive adversary in time $n^{o(1)}\left(\mathrm{nnz}(\mathbf{A}_0)+\sum_{t=1}^{T}\mathrm{nnz}(\mathbf{v}_t)\right)$.

Let us motivate Property 5 from several angles. First, in the static setting, this property can be easily satisfied, since the static power method strongly satisfies it (see Lemma 14). Second, the statement itself is a natural property that we might expect from an algorithm. Consider a “bad” eigenvector $\mathbf{u}_i$ whose corresponding eigenvalue is very small, i.e., less than half of the maximum one. The property states that the projection of the output $\mathbf{w}_t$ along $\mathbf{u}_i$ should be very small, i.e., a polynomial factor smaller than the projection along the maximum eigenvector. This is intuitively useful because we do not want the output $\mathbf{w}_t$ to point in the “bad” direction; it should mainly point along the approximately maximum eigenvector. Third, we formalize this intuition and crucially exploit Property 5 to prove the reduction of Theorem 6. Specifically, Property 5 allows us to decrease the eigenvalues of a psd matrix while maintaining psd-ness, which is crucial for us. See Section 4 for more details. Lastly, our current algorithm from Theorem 2, a dynamic version of the power method, actually maintains $\mathbf{w}_t$'s that satisfy this property at certain snapshots, but not at every step. Unfortunately, we see neither how to strengthen our algorithm to satisfy Property 5 nor how to remove it from the requirement of Theorem 6. We leave both possibilities as open problems.

Understanding the power and limitations of dynamic algorithms against an oblivious adversary vs. an adaptive adversary has become one of the main research programs in the area of dynamic algorithms. However, finding a natural dynamic problem that separates oblivious and adaptive adversaries is still a wide-open problem. (Beimel et al. [9] gave the first separations for artificial problems assuming a strong cryptographic assumption. Bateni et al. [8] show a separation between the adversaries for the $k$-center clustering problem; however, their result relies on the existence of a black box which the adaptive adversary can control.) In this paper, we suggest the problem of maintaining approximate maximum eigenvectors as a natural candidate for such a separation and give some preliminary evidence for it.

Organization

In the following sections, we begin with some preliminaries required to prove our results in Section 2, followed by our algorithm and its analysis in Section 3, and our conditional lower bound in Section 4. In Section 5, we present some open problems. In the appendix (Section A), we show connections with positive semi-definite programs.

2 Preliminaries

Let $\mathbf{A}_0$ denote the initial matrix, and let $\lambda_0=\lambda_{\max}(\mathbf{A}_0)=\|\mathbf{A}_0\|$ denote its maximum eigenvalue. The following are the key definitions we use in our analysis.

Definition 7 (ϵ-max span and dimension).

We define $\mathrm{span}(\epsilon,\mathbf{A})$ to be the space spanned by all eigenvectors of $\mathbf{A}$ corresponding to eigenvalues $\lambda$ satisfying $\lambda\ge(1-\epsilon)\lambda_0$. Let $\dim(\epsilon,\mathbf{A})$ denote the dimension of the space $\mathrm{span}(\epsilon,\mathbf{A})$.

We emphasize that λ0 depends only on 𝑨0. So, it is a static value that does not change through time. We will use the following linear algebraic notations.

Definition 8.

Let $S_1$ and $S_2$ be two subspaces of $\mathbb{R}^n$. The sum $S_1+S_2$ is the space

$$S_1+S_2=\{s=s_1+s_2\ :\ s_1\in S_1,\ s_2\in S_2\}.$$

The complement $\bar{S}$ of a subspace $S$ is the vector space such that $S+\bar{S}=\mathbb{R}^n$ and $S\cap\bar{S}=\{0\}$. The difference $S_1\setminus S_2$ is defined as

$$S_1\setminus S_2 = S_1\cap\bar{S_2}.$$

Next, we list standard facts about high-dimensional probability needed in our analysis.

Lemma 9 (Chernoff Bound).

Let $x_1,\ldots,x_m$ be independent random variables such that $a\le x_i\le b$ for all $i$. Let $x=\sum_i x_i$ and let $\mu=\mathbb{E}[x]$. Then for all $\delta>0$,

$$\Pr[x\ge(1+\delta)\mu]\le\exp\left(-\frac{2\delta^2\mu^2}{m(b-a)^2}\right),$$
$$\Pr[x\le(1-\delta)\mu]\le\exp\left(-\frac{\delta^2\mu^2}{m(b-a)^2}\right).$$
Lemma 10 (Norm of Gaussian Vector).

A random vector $\mathbf{v}\in\mathbb{R}^n$ with every coordinate chosen independently from the normal distribution $N(0,1)$ satisfies

$$\Pr\left[\left|\|\mathbf{v}\|^2-n\right|\le 2(1+\delta)\delta n\right]\ \ge\ 1-e^{-\delta^2 n}.$$
Proof.

The vector $\mathbf{v}$ has entries drawn from $N(0,1)$, so each $\mathbf{v}_i^2$ follows a $\chi^2$ distribution. From Lemma 1 of [18] we have the following tail bound for a sum of $\chi^2$ random variables:

$$\Pr\left[\left|\sum_i\mathbf{v}_i^2-n\right|>2\sqrt{nx}+2x\right]\le e^{-x}.$$

Choosing $x=n\delta^2$ gives

$$\Pr\left[\left|\|\mathbf{v}\|^2-n\right|\le 2\delta(1+\delta)n\right]\ge 1-e^{-\delta^2 n},$$

as required.
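As a quick numerical sanity check (not part of the proof), the following snippet estimates the failure probability in Lemma 10 empirically; the values of $n$, $\delta$, and the number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta, trials = 1000, 0.1, 2000

# Each row is an n-dimensional standard Gaussian vector; sum of squares is ||v||^2.
norms_sq = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)
emp_fail = np.mean(np.abs(norms_sq - n) > 2 * (1 + delta) * delta * n)
print(emp_fail, np.exp(-delta ** 2 * n))  # empirical failure rate vs. the e^{-delta^2 n} bound
```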

Lemma 11 (Distribution of χ2 Variable).

Let $x\sim N(0,1)$ be a Gaussian random variable. Then,

$$\Pr\left[x^2\ge\frac{1}{n^4}\right]\ \ge\ 1-\frac{1}{n^2}.$$
Proof.

The probability density function of $y=x^2$ is

$$f(y)=\frac{1}{\sqrt{2}\,\Gamma(1/2)}\,y^{-1/2}e^{-y/2}.$$

It is known that $\Gamma(1/2)=\sqrt{\pi}$. Now,

$$\Pr\left[x^2\le\frac{1}{n^4}\right]=\int_0^{1/n^4}\frac{1}{\sqrt{2}\,\Gamma(1/2)}\,y^{-1/2}e^{-y/2}\,dy\ \le\ \frac{e^0}{\sqrt{2\pi}}\int_0^{1/n^4}y^{-1/2}\,dy\ =\ \frac{2}{\sqrt{2\pi}}\cdot\frac{1}{n^2}\ \le\ \frac{1}{n^2}.$$

3 Algorithms against an Oblivious Adversary

To prove Theorem 2, we first reduce Problem 1 to solving a normalized threshold version of the problem where we assume that initially, the maximum eigenvalue is not much bigger than one. Then we want to maintain a certificate that the maximum eigenvalue is not much less than one until no such certificate exists. This is formalized below.

Problem 12 (DecMaxEV$(\epsilon,\mathbf{A}_0,\mathbf{v}_1,\ldots,\mathbf{v}_T)$).

Let $\mathbf{A}_0$ be an $n\times n$ symmetric PSD matrix such that $\lambda_{\max}(\mathbf{A}_0)\le 1+\frac{\epsilon}{\log n}$. The DecMaxEV$(\epsilon,\mathbf{A}_0,\mathbf{v}_1,\ldots,\mathbf{v}_T)$ problem asks to find, for every $t$, a vector $\mathbf{w}_t$ such that

$$\|\mathbf{w}_t\|=1\quad\text{and}\quad\mathbf{w}_t^\top\mathbf{A}_t\mathbf{w}_t\ge 1-40\epsilon,$$

or return False, indicating that $\lambda_{\max}(\mathbf{A}_t)\le 1-\frac{\epsilon}{\log n}$.

We defer the proof of the standard reduction stated below to the appendix.

Lemma 13.

Given an algorithm $\mathcal{A}$ that solves the decision problem DecMaxEV$(\epsilon,\mathbf{A}_0,\mathbf{v}_1,\ldots,\mathbf{v}_T)$ (Problem 12) for any $\epsilon>0$, $\mathbf{A}_0\succeq 0$, and vectors $\mathbf{v}_1,\ldots,\mathbf{v}_T$ in time $\mathcal{T}$, we can solve Problem 1 in total time $O\left(\frac{\log^2 n\,\log\frac{n}{\epsilon}}{\epsilon}\,\mathrm{nnz}(\mathbf{A}_0)+\log\frac{n}{\epsilon}\,\log\frac{\lambda_{\max}(\mathbf{A}_0)}{\lambda_{\max}(\mathbf{A}_T)}\,\mathcal{T}\right)$.

Next, we describe Algorithm 1, which can be viewed as an algorithm for Problem 12 when there are no updates. Our algorithm essentially applies power iteration, the standard algorithm for finding an approximate maximum eigenvalue and eigenvector of a matrix. In the algorithm, we make $R=O(\log n)$ independent copies to boost the success probability.

Algorithm 1 DecMaxEV with no update.
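The pseudocode of Algorithm 1 is deferred to the full version. The following is a minimal numpy sketch of the static power method with $R=O(\log n)$ independent Gaussian starts, in the spirit of PowerMethod$(\epsilon,\mathbf{A})$; the exact values of $R$, the iteration count $K$, and the function name are illustrative choices rather than the constants used in the analysis.

```python
import numpy as np

def power_method(A, eps, R=None, rng=None):
    """Return R candidate unit vectors, each obtained by K power iterations.

    A is assumed symmetric PSD.  With K = O(log(n/eps)/eps) iterations, some
    candidate w satisfies w^T A w >= (1 - eps/2) * lambda_max(A) w.h.p.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    R = 10 * int(np.ceil(np.log(n))) if R is None else R
    K = int(np.ceil(np.log(n / eps) / eps))
    W = []
    for _ in range(R):
        w = rng.standard_normal(n)        # random Gaussian start
        for _ in range(K):                # repeated multiplication by A
            w = A @ w
            w /= np.linalg.norm(w)
        W.append(w)
    return W

# Picking the best candidate gives an approximate maximum eigenvector:
#   W = power_method(A, eps); w_best = max(W, key=lambda w: w @ A @ w)
```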

Below, we state the guarantees of the power method.

Lemma 14.

Let $\epsilon>0$ and $\mathbf{A}\succeq 0$. Let $\mathbf{W}$ be as defined in Line 9 in the execution of PowerMethod$(\epsilon,\mathbf{A})$. With probability at least $1-1/n^{10}$, for some $\mathbf{w}\in\mathbf{W}$, it holds that $\mathbf{w}^\top\mathbf{A}\mathbf{w}\ge\left(1-\frac{\epsilon}{2}\right)\lambda_{\max}(\mathbf{A})$. The total time taken by the algorithm is at most $O\left(\mathrm{nnz}(\mathbf{A})\frac{\log n\,\log\frac{n}{\epsilon}}{\epsilon}\right)$.

Furthermore, let $\lambda_i$ and $\mathbf{u}_i$ denote the eigenvalues and eigenvectors of $\mathbf{A}$. For all $i$ such that $\lambda_i(\mathbf{A})\le\frac{\lambda_{\max}(\mathbf{A})}{2}$, with probability at least $1-2/n^{10}$, $\left[\mathbf{w}^\top\mathbf{u}_i\right]^2\le\frac{1}{n^8}\cdot\frac{\lambda_i}{\lambda_1}$.

We note that the last statement of the above lemma says that the vectors returned by the power method satisfy Property 5; we state this for completeness, but it is not required by our algorithm. The following result is a direct consequence of Lemma 14.

Corollary 15.

Let $\epsilon>0$ and $\mathbf{A}\succeq 0$. Let $\mathbf{W}$ be as defined in Line 9 in the execution of PowerMethod$(\epsilon,\mathbf{A})$. If $\lambda_{\max}(\mathbf{A})\ge 1-\epsilon$, then with probability at least $1-1/n^{10}$, $\mathbf{w}^\top\mathbf{A}\mathbf{w}\ge 1-5\epsilon$ for some $\mathbf{w}\in\mathbf{W}$. Furthermore, if $\lambda_{\max}(\mathbf{A})\ge 1-\epsilon/\log n$, then with probability at least $1-1/n^{10}$, $\mathbf{w}^\top\mathbf{A}\mathbf{w}\ge 1-\epsilon$ for some $\mathbf{w}\in\mathbf{W}$. The total time taken by the algorithm is at most $O\left(\mathrm{nnz}(\mathbf{A})\frac{\log n\,\log\frac{n}{\epsilon}}{\epsilon}\right)$.

Observe that, if the algorithm returns $[r_0,\mathbf{W}]$, then $(\mathbf{w}^{(r)})^\top\mathbf{A}\mathbf{w}^{(r)}\ge 1-5\epsilon$ for $r=r_0$, and $\mathbf{w}^{(r_0)}$ is therefore a solution to Problem 12 when there is no update. The power method and its analysis are standard, and we thus defer the proof of Lemma 14 to the appendix.

Next, in Algorithms 2 and 3, we describe an algorithm for Problem 12 when we have an online sequence of updates $\mathbf{v}_1,\ldots,\mathbf{v}_T$. The algorithm starts by initializing $R=O(\log n)$ copies of the approximate maximum eigenvectors from the power method. Given a sequence of updates, as long as one of the copies witnesses that the current matrix $\mathbf{A}_t$ still has a large eigenvalue, i.e., there exists $r$ with $(\mathbf{w}_t^{(r)})^\top\mathbf{A}_t\mathbf{w}_t^{(r)}\ge 1-40\epsilon$, we can simply return $\mathbf{w}_t^{(r)}$ as the solution to Problem 12. Otherwise, $(\mathbf{w}_t^{(r)})^\top\mathbf{A}_t\mathbf{w}_t^{(r)}<1-40\epsilon$ for all $r\le R$, and none of the vectors from the previous call to the power method witnesses a large eigenvalue anymore. In this case, we simply recompute these vectors by calling the power method again. If the power method reports that there is no large eigenvector, then we return False from then on. Otherwise, we continue in the same manner. Note that our algorithm is very simple, but, as we will see, the analysis is not straightforward.

Algorithm 2 Initialization.
Algorithm 3 Update algorithm at time $t$ ($\mathbf{A}_{t-1}$, $r_t$, $\mathbf{W}_{t-1}=[\mathbf{w}_{t-1}^{(r)}:r=1,\ldots,R]$, and $\epsilon$ are maintained).
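The pseudocode of Algorithms 2 and 3 is likewise deferred to the full version. The sketch below (reusing the `power_method` stub above) mirrors the structure just described: keep $R$ candidate witnesses, check them against the threshold $1-40\epsilon$, recompute them via the power method when they all fail, and report False once even a fresh call finds no witness. The class interface and the explicit dense maintenance of $\mathbf{A}_t$ are simplifications for illustration; the actual algorithm only needs the maintained quadratic forms, which change by exactly $(\mathbf{w}^\top\mathbf{v})^2$ per update.

```python
import numpy as np

class DecMaxEV:
    """Sketch of Algorithms 2-3: maintain a witness that lambda_max(A_t) is large."""

    def __init__(self, A0, eps):
        self.A = np.array(A0, dtype=float)
        self.eps = eps
        self.dead = False
        self._recompute()

    def _recompute(self):
        self.W = power_method(self.A, self.eps)          # R = O(log n) candidates
        self.vals = [w @ self.A @ w for w in self.W]     # maintained quadratic forms

    def _witness(self):
        for w, val in zip(self.W, self.vals):
            if val >= 1 - 40 * self.eps:                 # threshold from Problem 12
                return w
        return None

    def update(self, v):
        """Process A <- A - v v^T; return a witness w_t, or False forever after."""
        if self.dead:
            return False
        self.A -= np.outer(v, v)
        # Each quadratic form drops by exactly (w^T v)^2, so this step costs
        # O(R * nnz(v)) -- this is where the polylog-per-nonzero update time comes from.
        self.vals = [val - (w @ v) ** 2 for w, val in zip(self.W, self.vals)]
        w = self._witness()
        if w is not None:
            return w
        self._recompute()                                # all old candidates died
        w = self._witness()
        if w is None:
            self.dead = True                             # no large eigenvalue remains
            return False
        return w
```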

3.1 Proof Overview

The overall proof of Theorem 2, including both correctness and the runtime bound, depends on the number of executions of Line 4 in Algorithm 3. If the number of executions of Line 4 is bounded by $\mathrm{poly}(\log n/\epsilon)$, then the remaining analysis is straightforward. Therefore, the majority of our analysis is dedicated to proving this key lemma, i.e., a $\mathrm{poly}(\log n/\epsilon)$ bound on the number of calls to the power method:

Lemma 16 (Key Lemma).

The number of executions of Line 4 over all updates is bounded by $O\left(\frac{\log n\,\log^5\frac{n}{\epsilon}}{\epsilon^2}\right)$ with probability at least $1-\frac{1}{n}$.

Given the key lemma, the correctness and runtime analyses are quite straightforward. We now give an overview of the proof of Lemma 16.

Let us consider what happens between two consecutive calls to Line 4, say at $\mathbf{A}$ and $\tilde{\mathbf{A}}=\mathbf{A}-\sum_{i=1}^{k}\mathbf{v}_i\mathbf{v}_i^\top$. We first define the following subspaces of $\mathbf{A}$ and $\tilde{\mathbf{A}}$, using the notation of Definition 8.

Definition 17 (Subspaces of $\mathbf{A}$ and $\tilde{\mathbf{A}}$).

Given $\epsilon>0$, $\mathbf{A}$, and $\tilde{\mathbf{A}}$, define for $\nu=0,1,\ldots,15\log\frac{n}{\epsilon}-1$:

$$T_\nu=\mathrm{span}\left(\frac{(\nu+1)\epsilon}{5\log\frac{n}{\epsilon}},\mathbf{A}\right)\setminus\mathrm{span}\left(\frac{\nu\epsilon}{5\log\frac{n}{\epsilon}},\mathbf{A}\right),$$

and

$$\tilde{T}_\nu=\mathrm{span}\left(\frac{(\nu+1)\epsilon}{5\log\frac{n}{\epsilon}},\tilde{\mathbf{A}}\right)\setminus\mathrm{span}\left(\frac{\nu\epsilon}{5\log\frac{n}{\epsilon}},\tilde{\mathbf{A}}\right).$$

That is, the spaces $T_\nu$ and $\tilde{T}_\nu$ are spanned by eigenvectors of $\mathbf{A}$ and $\tilde{\mathbf{A}}$, respectively, corresponding to eigenvalues between $\left(1-\frac{(\nu+1)\epsilon}{5\log\frac{n}{\epsilon}}\right)\lambda_0$ and $\left(1-\frac{\nu\epsilon}{5\log\frac{n}{\epsilon}}\right)\lambda_0$.

Let $d_\nu=\dim(T_\nu)$ and $\tilde{d}_\nu=\dim(\tilde{T}_\nu)$. Also define

$$\tilde{T}=\mathrm{span}(3\epsilon,\tilde{\mathbf{A}}),\qquad T=\mathrm{span}(3\epsilon,\mathbf{A}),$$

and let $d=\dim(T)$ and $\tilde{d}=\dim(\tilde{T})$.

Observe that $T=\sum_{\nu=0}^{15\log\frac{n}{\epsilon}-1}T_\nu$ and similarly $\tilde{T}=\sum_{\nu=0}^{15\log\frac{n}{\epsilon}-1}\tilde{T}_\nu$. We next define some indices/levels corresponding to large subspaces, which we call “important levels”.

Definition 18 (Important ν).

We say a level $\nu$ is important if

$$d_\nu\ \ge\ \frac{\epsilon}{600\log^3\frac{n}{\epsilon}}\sum_{\nu'<\nu}d_{\nu'}.$$

We will use $\mathcal{I}$ to denote the set of levels $\nu$ that are important.

The main technical lemma that implies Lemma 16 is the following:

Lemma 19 (Measure of Progress).

Let $\epsilon>0$ and let $\mathbf{W}=[\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(R)}]$ be as defined in Line 9 in the execution of PowerMethod$(\epsilon,\mathbf{A})$. Let $\mathbf{v}_1,\ldots,\mathbf{v}_k$ be a sequence of updates generated by an oblivious adversary and define $\tilde{\mathbf{A}}=\mathbf{A}-\sum_{i=1}^{k}\mathbf{v}_i\mathbf{v}_i^\top$.

Suppose that $\lambda_{\max}(\mathbf{A})\ge 1-\epsilon$ and $\mathbf{w}^\top\tilde{\mathbf{A}}\mathbf{w}<1-40\epsilon$ for all $\mathbf{w}\in\mathbf{W}$. Then, with probability at least $1-\frac{50\log\frac{n}{\epsilon}}{n^2}$, for some $\nu\in\mathcal{I}$,

  • $\dim(T_\nu\setminus\tilde{T})\ge\frac{\epsilon}{300\log\frac{n}{\epsilon}}\,d_\nu$ if $d_\nu\ge\frac{3000\log n\,\log\frac{n}{\epsilon}}{\epsilon}$, or

  • $\dim(T_\nu\setminus\tilde{T})\ge 1$ if $d_\nu<\frac{3000\log n\,\log\frac{n}{\epsilon}}{\epsilon}$.

We prove this lemma in the full version. Intuitively, it means that whenever Line 4 of Algorithm 3 is executed, there is some important level $\nu$ such that an $\Omega(\epsilon/\mathrm{polylog}(n/\epsilon))$-fraction of the eigenvalues of $\mathbf{A}$ at level $\nu$ have decreased in value. This is the crucial place where we exploit the oblivious adversary.

Given Lemma 19, the remaining proof of Lemma 16 follows a potential function analysis, presented in detail in the full version. We consider the potentials $\Phi_j=\sum_{\nu=0}^{j}d_\nu$ for $j=0,\ldots,15\log\frac{n}{\epsilon}-1$. The main observation is that $\Phi_j$ is non-increasing over time for all $j$, and whenever there exists $\nu_0$ that satisfies the condition of Lemma 19, $\Phi_{\nu_0}$ decreases by $\dim(T_{\nu_0}\setminus\tilde{T})$. Since $\dim(T_{\nu_0}\setminus\tilde{T})\ge\Omega(\epsilon/\mathrm{polylog}(n/\epsilon))\,d_{\nu_0}$ and $\nu_0\in\mathcal{I}$, i.e., $\Phi_{\nu_0}=\sum_{\nu<\nu_0}d_\nu+d_{\nu_0}\le d_{\nu_0}\left(\frac{O(\log^3\frac{n}{\epsilon})}{\epsilon}+1\right)$, we can prove that $\Phi_{\nu_0}$ decreases by a multiplicative factor of $\left(1-\Omega(\epsilon^2/\mathrm{polylog}(n/\epsilon))\right)$. As a result, every time our algorithm executes Line 4, $\Phi_j$ decreases by a multiplicative factor for some $j$, and since we have at most $15\log\frac{n}{\epsilon}$ values of $j$, we can only have $\mathrm{poly}(\log n/\epsilon)$ executions of Line 4.
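To make the counting explicit (a rough sketch with constants suppressed, not the exact bound of the full version): each $\Phi_j$ starts at most $n$, never increases, and is a non-negative integer, so it can shrink by a factor of $\left(1-\Omega(\epsilon^2/\mathrm{polylog}(n/\epsilon))\right)$ at most

$$O\left(\frac{\log n}{\log\frac{1}{1-\Omega(\epsilon^2/\mathrm{polylog}(n/\epsilon))}}\right)=O\left(\frac{\log n\cdot\mathrm{polylog}(n/\epsilon)}{\epsilon^2}\right)$$

times before it reaches zero. Summing over the at most $15\log\frac{n}{\epsilon}$ values of $j$, and recalling that every execution of Line 4 causes at least one such drop, gives the $\mathrm{poly}(\log n/\epsilon)$ bound of Lemma 16 up to the exact powers of the logarithms.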

It remains to describe how we prove Lemma 19 at a high level. We can write $\mathbf{w}^\top\tilde{\mathbf{A}}\mathbf{w}$ for any $\mathbf{w}\in\mathbf{W}$ as

$$\mathbf{w}^\top\tilde{\mathbf{A}}\mathbf{w}=\mathbf{w}^\top\mathbf{A}\mathbf{w}-\mathbf{w}^\top\mathbf{V}\mathbf{w},$$

for $\mathbf{V}=\sum_{i=1}^{k}\mathbf{v}_i\mathbf{v}_i^\top$. Our strategy is to show that:

If $\dim(T_\nu\setminus\tilde{T})$ does not satisfy the inequalities in Lemma 19 for any $\nu$,
then $\mathbf{w}^\top\mathbf{V}\mathbf{w}\le 35\epsilon$ for all $\mathbf{w}\in\mathbf{W}$. (3.1)

Given (3.1), which is formalized in the full version, we can conclude Lemma 19 because, from the definitions of $\mathbf{A}$ and $\tilde{\mathbf{A}}$, we have that for some $\mathbf{w}\in\mathbf{W}$, $\mathbf{w}^\top\mathbf{A}\mathbf{w}\ge 1-5\epsilon$ by Corollary 15 and $\mathbf{w}^\top\tilde{\mathbf{A}}\mathbf{w}<1-40\epsilon$. As a result, for this $\mathbf{w}$, $\mathbf{w}^\top\mathbf{V}\mathbf{w}>35\epsilon$. Now, by the contrapositive of (3.1), we have that $\dim(T_\nu\setminus\tilde{T})$ is large for some $\nu$.

To prove (3.1), we further decompose $\mathbf{w}^\top\mathbf{V}\mathbf{w}$ as

$$\mathbf{w}^\top\mathbf{V}\mathbf{w}=\mathbf{w}^\top\mathbf{V}_{\tilde{T}}\mathbf{w}+\sum_{\nu=0}^{15\log\frac{n}{\epsilon}-1}\mathbf{w}^\top\mathbf{V}_{T_\nu\setminus\tilde{T}}\mathbf{w}+\mathbf{w}^\top\mathbf{V}_{\bar{T}}\mathbf{w}.$$

In the above equation, $\mathbf{V}_{\tilde{T}}=\Pi_{\tilde{T}}\mathbf{V}\Pi_{\tilde{T}}$, $\mathbf{V}_{T_\nu\setminus\tilde{T}}=\Pi_\nu\mathbf{V}\Pi_\nu$, and $\mathbf{V}_{\bar{T}}=\Pi_{\bar{T}}\mathbf{V}\Pi_{\bar{T}}$, where $\Pi_{\tilde{T}}$, $\Pi_\nu$, and $\Pi_{\bar{T}}$ denote the projection matrices onto the spaces $\tilde{T}$, $T_\nu\setminus\tilde{T}$, and $\bar{T}$, respectively. (If a subspace $S$ is spanned by vectors $\mathbf{u}_1,\ldots,\mathbf{u}_k$ and $\mathbf{U}=[\mathbf{u}_1,\ldots,\mathbf{u}_k]$, recall that the projection matrix onto $S$ is $\mathbf{U}(\mathbf{U}^\top\mathbf{U})^{-1}\mathbf{U}^\top$.) Refer to the full version for a proof of why such a split is possible. Our proof of (3.1) then bounds the terms on the right-hand side. Let us consider each term separately.

  1. $\mathbf{w}^\top\mathbf{V}_{\tilde{T}}\mathbf{w}$: We prove that this is always at most $10\epsilon(1+\epsilon)$. From the definition of $\mathbf{V}$,

$$\mathbf{w}^\top\mathbf{V}_{\tilde{T}}\mathbf{w}=\mathbf{w}^\top\Pi_{\tilde{T}}\mathbf{A}\Pi_{\tilde{T}}\mathbf{w}-\mathbf{w}^\top\Pi_{\tilde{T}}\tilde{\mathbf{A}}\Pi_{\tilde{T}}\mathbf{w}.$$

Since $\Pi_{\tilde{T}}\mathbf{w}$ is the projection of $\mathbf{w}$ onto the large eigenspace of $\tilde{\mathbf{A}}$, the second term on the right-hand side above is large, i.e., at least $(1-10\epsilon)\lambda_0\|\Pi_{\tilde{T}}\mathbf{w}\|^2$. The first term on the right-hand side can be bounded as $\mathbf{w}^\top\Pi_{\tilde{T}}\mathbf{A}\Pi_{\tilde{T}}\mathbf{w}\le\|\mathbf{A}\|\,\|\Pi_{\tilde{T}}\mathbf{w}\|^2\le\lambda_0\|\Pi_{\tilde{T}}\mathbf{w}\|^2$. Therefore, the difference is at most $10\epsilon\lambda_0\|\Pi_{\tilde{T}}\mathbf{w}\|^2\le 10\epsilon\lambda_0\|\mathbf{w}\|^2=10\epsilon\lambda_0\le 10\epsilon(1+\epsilon)$.

  2. $\mathbf{w}^\top\mathbf{V}_{\bar{T}}\mathbf{w}$: Observe that this term corresponds to the projection of $\mathbf{w}$ onto the space spanned by the eigenvectors of $\mathbf{A}$ with eigenvalues at most $1-3\epsilon$. Let $\mathbf{u}_i$ and $\lambda_i$ denote such an eigenvector and eigenvalue pair with $\lambda_i<1-3\epsilon$. The power method guarantees that $\mathbf{w}^\top\mathbf{u}_i\le\lambda_i^{2K}$ (where $K$ is the number of power iterations), and $\lambda_i^{2K}\le(1-3\epsilon)^{2K}\le\mathrm{poly}(\epsilon/n)$ is tiny. So we have that $\mathbf{w}^\top\mathbf{V}_{\bar{T}}\mathbf{w}\le\epsilon$.



    Before we look at the final case, we define a basis for the space Tν.

    Definition 20 (Basis for Tν).

Let $T_\nu$ be as defined in Definition 17. Define indices $a_\nu$ and $b_\nu$ with $b_\nu-a_\nu+1=d_\nu$ such that a basis of $T_\nu$ is given by $\mathbf{u}_{a_\nu},\ldots,\mathbf{u}_{b_\nu}$, where $\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_n$ are the eigenvectors of $\mathbf{A}$ in decreasing order of eigenvalues.

  3. $\mathbf{w}^\top\mathbf{V}_{T_\nu\setminus\tilde{T}}\mathbf{w}$: For this discussion, we will ignore constant factors and assume that the high-probability events hold. Let $\Pi_\nu$ denote the projection matrix onto the space $T_\nu\setminus\tilde{T}$. Observe that $\mathbf{w}^\top\mathbf{V}_{T_\nu\setminus\tilde{T}}\mathbf{w}=\mathbf{w}^\top\Pi_\nu\mathbf{V}\Pi_\nu\mathbf{w}\le\|\mathbf{V}\|\,\|\Pi_\nu\mathbf{w}\|^2\le(1+\epsilon)\|\Pi_\nu\mathbf{w}\|^2$, where the last inequality holds because $\tilde{\mathbf{A}}=\mathbf{A}-\mathbf{V}\succeq 0$ and therefore $\|\mathbf{V}\|\le\|\mathbf{A}\|\le(1+\epsilon)$. Hence, it suffices to bound $\|\Pi_\nu\mathbf{w}\|^2=O(\epsilon)$.

We can write $\mathbf{w}=\frac{\sum_{i=1}^{n}\lambda_i^K\alpha_i\mathbf{u}_i}{\sqrt{\sum_{i=1}^{n}\lambda_i^{2K}\alpha_i^2}}$, where the $\lambda_i,\mathbf{u}_i$'s are the eigenvalues and eigenvectors of $\mathbf{A}$ and $\alpha_i\sim N(0,1)$. Define $\mathbf{z}=\sum_{i=1}^{n}z_i\mathbf{u}_i$ where $z_i=\lambda_i^K\alpha_i$; that is, $\mathbf{w}=\frac{\mathbf{z}}{\|\mathbf{z}\|}$. Since $\Pi_\nu\mathbf{w}=\Pi_\nu\mathbf{z}/\|\mathbf{z}\|$, it suffices to show that $\|\Pi_\nu\mathbf{z}\|^2\le O(\epsilon)\|\mathbf{z}\|^2$. We show this in two separate cases. In both cases, we start with the following bound,

$$\|\Pi_\nu\mathbf{z}\|^2\ \lesssim\ \lambda_{a_\nu}^{2K}\dim(T_\nu\setminus\tilde{T}),$$

which holds with high probability. To see this, let $\mathbf{g}_\nu$ be a vector with i.i.d. $N(0,1)$ coordinates in the space $T_\nu\setminus\tilde{T}$. We can couple $\mathbf{g}_\nu$ with $\Pi_\nu\mathbf{z}$ so that $\|\Pi_\nu\mathbf{z}\|$ is dominated by $\lambda_{a_\nu}^K\|\mathbf{g}_\nu\|$, so $\|\Pi_\nu\mathbf{z}\|^2\le\lambda_{a_\nu}^{2K}\|\mathbf{g}_\nu\|^2$. By Lemma 10, the squared norm of a Gaussian vector concentrates around its dimension, so $\|\mathbf{g}_\nu\|^2\lesssim\dim(T_\nu\setminus\tilde{T})$ with high probability, proving the inequality. Next, we bound $\dim(T_\nu\setminus\tilde{T})$ in terms of $\|\mathbf{z}\|$ in two cases.

When $\nu\notin\mathcal{I}$.

From the definition of the important levels, we have

$$\dim(T_\nu\setminus\tilde{T})\ \le\ d_\nu\ \le\ \frac{O(\epsilon)}{\log^3\frac{n}{\epsilon}}\sum_{\nu'<\nu}d_{\nu'}.$$

Now, we have $\sum_{\nu'<\nu}d_{\nu'}\lesssim\sum_{i=1}^{b_{\nu-1}}\alpha_i^2$ because the $\alpha_i\sim N(0,1)$ are Gaussian and the squared norm of a Gaussian vector concentrates around its dimension (Lemma 10). Since $\alpha_i=z_i/\lambda_i^K$, we have that

$$\sum_{\nu'<\nu}d_{\nu'}\ \lesssim\ \sum_{i=1}^{b_{\nu-1}}\alpha_i^2=\sum_{i=1}^{b_{\nu-1}}\frac{z_i^2}{\lambda_i^{2K}}\ \le\ \|\mathbf{z}\|^2/\lambda_{b_{\nu-1}}^{2K}.$$

Therefore, we have

$$\|\Pi_\nu\mathbf{z}\|^2\ \lesssim\ \lambda_{a_\nu}^{2K}\dim(T_\nu\setminus\tilde{T})\ \le\ \left(\frac{\lambda_{a_\nu}}{\lambda_{b_{\nu-1}}}\right)^{2K}\frac{O(\epsilon)}{\log^3\frac{n}{\epsilon}}\|\mathbf{z}\|^2\ \le\ O(\epsilon)\|\mathbf{z}\|^2,$$

where the last inequality is trivial because $\lambda_{b_{\nu-1}}\ge\lambda_{a_\nu}$ by definition.

When $\nu\in\mathcal{I}$.

In this case, according to (3.1), we can assume

$$\dim(T_\nu\setminus\tilde{T})\ \le\ \epsilon\,d_\nu.$$

Again, by Lemma 10, we have that $d_\nu\lesssim\sum_{i=a_\nu}^{b_\nu}\alpha_i^2$ because the $\alpha_i\sim N(0,1)$ are Gaussian. Since $\alpha_i=z_i/\lambda_i^K$, we have

$$d_\nu\ \lesssim\ \sum_{i=a_\nu}^{b_\nu}\alpha_i^2=\sum_{i=a_\nu}^{b_\nu}\frac{z_i^2}{\lambda_i^{2K}}\ \le\ \|\mathbf{z}\|^2/\lambda_{b_\nu}^{2K}.$$

Therefore,

$$\|\Pi_\nu\mathbf{z}\|^2\ \lesssim\ \lambda_{a_\nu}^{2K}\dim(T_\nu\setminus\tilde{T})\ \le\ \left(\frac{\lambda_{a_\nu}}{\lambda_{b_\nu}}\right)^{2K}\epsilon\,\|\mathbf{z}\|^2\ \le\ O(\epsilon)\|\mathbf{z}\|^2,$$

where the last inequality holds because

$$\left(\frac{\lambda_{a_\nu}}{\lambda_{b_\nu}}\right)^{2K}\le\left(\frac{1-\frac{\nu\epsilon}{5\log\frac{n}{\epsilon}}}{1-\frac{(\nu+1)\epsilon}{5\log\frac{n}{\epsilon}}}\right)^{2K}\le\left(1+\frac{\epsilon}{2\log\frac{n}{\epsilon}}\right)^{2K}\le O(1).$$

From these three cases, we conclude that if $\dim(T_\nu\setminus\tilde{T})$ is small for all $\nu$, then $\mathbf{w}^\top\mathbf{V}\mathbf{w}\le 35\epsilon$, proving our claim.

We defer the formal proofs of the claims made in this section to the full version.

4 Conditional Lower Bounds for an Adaptive Adversary

In this section, we will prove a conditional hardness result for algorithms against adaptive adversaries. In particular, we will prove Theorem 6. Consider Algorithm 4 for solving Problem 3. The only step in Algorithm 4 whose implementation is not specified is Line 8. We will implement this step using an algorithm for Problem 1.

Algorithm 4 Algorithm for Checking PSDness.

High-level idea

Overall, our hardness result uses the idea that an adaptive adversary can use the returned maximum eigenvectors to perform updates. This can happen $n$ times, and in the process we would recover the entire eigen-decomposition of the matrix, which is hard. Now consider Algorithm 4. We claim that Algorithm 4 solves Problem 3. At first glance, this claim looks suspicious, because the input matrix for Problem 3 might not be PSD, while the dynamic algorithm for Problem 1 at Line 8 has guarantees only when the matrices remain PSD. However, the reduction does work by crucially exploiting Property 5. The high-level idea is as follows (see also the sketch after the two cases below).

  • If the input matrix $\mathbf{A}$ is initially PSD, then we can show that $\mathbf{A}_t$ remains PSD for all $t$ by exploiting Property 5 (see Lemma 21). So the approximation guarantee of the algorithm at Line 8 is valid at all steps. From this guarantee, $\|\mathbf{A}_T\|$ must be tiny, since we keep decreasing the approximately maximum eigenvalues (see Lemma 22). At the end, the reduction returns $\mathbf{X}$.

  • If the input matrix $\mathbf{A}$ is initially not PSD, there must exist a direction $\mathbf{v}$ such that $\mathbf{v}^\top\mathbf{A}\mathbf{v}<0$. Since in the reduction we update $\mathbf{A}_T=\mathbf{A}-\mathbf{W}$ for some $\mathbf{W}\succeq 0$, we must have $\mathbf{v}^\top\mathbf{A}_T\mathbf{v}\le\mathbf{v}^\top\mathbf{A}\mathbf{v}$. That is, this negative direction remains negative or becomes even more negative, no matter what guarantees the algorithm at Line 8 has. So $\|\mathbf{A}_T\|$ cannot be tiny. We can distinguish whether $\|\mathbf{A}_T\|$ is tiny using the static power method at Line 11, and hence we return False in this case (see Lemma 22).
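To make the reduction concrete, here is a hedged numpy sketch in the spirit of the two cases above. The interface `dyn_max_eig.init`/`dyn_max_eig.update` for the dynamic Problem 1 routine, the parameter `num_rounds` (playing the role of the number of updates $T$ chosen in Algorithm 4), and the exact-norm final check (the paper instead uses the static power method at Line 11) are all illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def check_psd(A, delta, dyn_max_eig, num_rounds):
    """Hedged sketch of Algorithm 4: test PSD-ness via a dynamic max-eigenvector routine.

    `dyn_max_eig` is a hypothetical black box for Problem 1 with methods
    init(A) -> (mu_0, w_0) and update(v) -> (mu_t, w_t), each returning an
    approximate maximum eigenvalue/eigenvector pair of the current matrix.
    """
    A_t = np.array(A, dtype=float)
    cols = []
    mu, w = dyn_max_eig.init(A_t)
    for _ in range(num_rounds):
        mu = max(mu, 0.0)                 # guard: mu can be meaningless if A is not PSD
        v = np.sqrt(mu / 10.0) * w        # remove roughly a tenth of the top eigenvalue
        A_t -= np.outer(v, v)             # decremental update A_t <- A_{t-1} - v v^T
        cols.append(v)
        mu, w = dyn_max_eig.update(v)     # adaptive: the next update depends on this output
    X = np.column_stack(cols)             # candidate vector realization of A
    sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
    # If A was PSD the residual A - X X^T (= A_T) is tiny; a surviving negative
    # direction keeps it large otherwise.
    if np.linalg.norm(A - X @ X.T, 2) <= delta * sigma_min:
        return X
    return False
```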

Analysis

We prove the guarantees of the output of Algorithm 4 when 𝒘t’s satisfy Property 5 for all t.

Lemma 21.

In Algorithm 4, let the $\mathbf{w}_t$'s, $t=1,\ldots,T$, be generated such that they additionally satisfy Property 5. If $\mathbf{A}_0\succeq 0$, then $\mathbf{A}_t\succeq 0$ for all $t$.

We would like to point out that our parameter $\epsilon$ is quite large. This means that our reduction works even if we only find a crude approximation to the maximum eigenvector, as long as it points along directions with large eigenvalues, since $\mathbf{w}$ also has to satisfy Property 5. We defer the proof of the above to the appendix.

Lemma 22.

In Algorithm 4, let the $\mathbf{w}_t$'s, $t=1,\ldots,T$, be generated such that they additionally satisfy Property 5.

  • If $\mathbf{A}\succeq 0$, then Algorithm 4 returns $\mathbf{X}$ such that $\|\mathbf{A}-\mathbf{X}\mathbf{X}^\top\|\le\delta\min_{\|\mathbf{x}\|=1}\|\mathbf{A}\mathbf{x}\|$.

  • If 𝑨 is not psd, then Algorithm 4 returns False.

Proof of Theorem 6

We are now ready to prove our conditional lower bound.

Proof.

Let $\mathcal{A}(\epsilon,\mathbf{A}_0,\mathbf{v}_1,\ldots,\mathbf{v}_T)$ denote an algorithm for Problem 1 that maintains an $\epsilon$-approximate maximum eigenvalue (1), $\mu_t$, and eigenvector (2), $\mathbf{w}_t$, for matrices $\mathbf{A}_t=\mathbf{A}_{t-1}-\mathbf{v}_t\mathbf{v}_t^\top$ such that the $\mathbf{w}_t$'s satisfy Property 5. We will show that if the total update time of $\mathcal{A}$ is $n^{o(1)}\left(\mathrm{nnz}(\mathbf{A}_0)+\sum_{t=1}^{T}\mathrm{nnz}(\mathbf{v}_t)\right)$, then there is an $n^{2+o(1)}$-time algorithm for Problem 3, which contradicts Hypothesis 4.

Given an instance $(\delta,\kappa,\mathbf{A})$ of Problem 3, we will run Algorithm 4 where Line 8 is implemented using $\mathcal{A}$. We generate the input and the update sequence for $\mathcal{A}$ as follows. Set $\mathbf{A}_0\leftarrow\mathbf{A}$. Set $\epsilon$ and $T$ according to Algorithm 4. For $1\le t\le T$, we set $\mathbf{v}_t=\frac{1}{\sqrt{10}}\sqrt{\mu_{t-1}}\,\mathbf{w}_{t-1}$ according to Line 7 of Algorithm 4. We note that this is a valid update sequence for Problem 1 when $\mathbf{A}\succeq 0$, since from Lemma 21, if $\mathbf{A}\succeq 0$ then $\mathbf{A}_t=\mathbf{A}-\sum_{i=0}^{t-1}\frac{\mu_i}{10}\mathbf{w}_i\mathbf{w}_i^\top\succeq 0$.

Now we describe what we return as an answer for Problem 3. From Lemma 22, if $\mathbf{A}$ is not PSD, then Algorithm 4 returns False and we report that the matrix is not PSD. Additionally, if $\mathbf{A}\succeq 0$, the algorithm returns $\mathbf{X}=\frac{1}{\sqrt{10}}\left[\sqrt{\mu_1}\mathbf{w}_1\ \sqrt{\mu_2}\mathbf{w}_2\ \cdots\ \sqrt{\mu_T}\mathbf{w}_T\right]$ as a certificate that $\mathbf{A}$ is PSD. This completes the reduction from Problem 3 to Problem 1.

The total time required by the reduction is

$$O\left(n^{o(1)}\left(\mathrm{nnz}(\mathbf{A})+\sum_{t=1}^{T}\mathrm{nnz}(\mathbf{w}_t)\right)\right)\ \le\ O\left(n^{o(1)}\left(n^2+Tn\right)\right)\ \le\ O\left(n^{2+o(1)}\log\frac{\kappa}{\delta}\right),$$

which is at most $n^{2+o(1)}$ when $\kappa/\delta\le\mathrm{poly}(n)$.

To conclude, we indeed obtain an $n^{2+o(1)}$-time algorithm for Problem 3. Assuming Hypothesis 4, the algorithm $\mathcal{A}$ cannot have $n^{o(1)}\left(\mathrm{nnz}(\mathbf{A}_0)+\sum_{t=1}^{T}\mathrm{nnz}(\mathbf{v}_t)\right)$ total update time.

5 Conclusion and Open Problems

Upper Bounds

We have presented a novel extension of the power method to the dynamic setting. Our algorithm from Theorem 2 maintains a multiplicative (1+ϵ)-approximate maximum eigenvalue and eigenvector of a positive semi-definite matrix that undergoes decremental updates from an oblivious adversary. The algorithm has polylogarithmic amortized update time per non-zeros in the updates.

Our algorithm is simple, but our analysis is quite involved. While we believe a tighter analysis that improves our logarithmic factors is possible, it is an interesting open problem to give a simpler analysis for our algorithm. Other natural questions are whether we can get similar algorithms in incremental or fully dynamic settings and whether one can get a worst-case update time.

Lower Bounds

We have shown a conditional lower bound for a class of algorithms against an adaptive adversary in Theorem 6. It would also be very exciting to generalize our lower bound to hold for any algorithm against an adaptive adversary, as that would imply an oblivious-vs-adaptive separation for a natural dynamic problem.

Incremental Updates

We believe that the corresponding incremental problem, i.e., where the matrix is updated as $\mathbf{A}_t\leftarrow\mathbf{A}_{t-1}+\mathbf{v}_t\mathbf{v}_t^\top$, cannot be solved in polylogarithmic amortized update time, even when the update sequence $\mathbf{v}_t$ comes from an oblivious adversary. At a high level, the incremental version of our problem seems significantly harder for the following reasons. When we perform decremental updates to a matrix, the new maximum eigenvector must lie in the eigenspace spanned by the original maximum eigenvectors. Furthermore, it is easy to detect whether the maximum eigenvalue has gone down, as we have shown in this paper. In the incremental setting, it is possible that after an update the maximum eigenvalue has gone up and the new maximum eigenvector is a direction that is neither a previous maximum eigenvector nor the update direction; in such cases, we cannot detect this quickly using known information about previous eigenvectors and update directions. This can happen $n$ times, and in every such case we have to compute the eigenvalue and eigenvector from scratch. Therefore, we leave lower bounds and algorithms for the incremental setting as an open problem.

Dynamic SDPs

As discussed in Appendix A, Theorem 2 can be viewed as a starting point towards a dynamic algorithm for general positive semi-definite programs. Can we make further progress? The dynamic semi-definite program problem, even with just two matrix constraints, already seems to be challenging.

One promising approach to attack this problem is to dynamize the matrix multiplicative weights update (MMWU) method for solving a packing/covering SDP [23], since the corresponding approach was successful for linear programs: the near-optimal algorithms of [11] are essentially dynamized multiplicative weight update methods for positive linear programs. However, in our preliminary study exploring this approach, we could only obtain an algorithm that solves Problem 24, which has a single matrix constraint, and solves Problem 1 partially, i.e., maintains an approximate eigenvalue only. The main barrier in this approach is that the algorithm requires maintaining the exponential of the sum of the constraint matrices, and to do this fast, we need that for any two constraint matrices $\mathbf{A}$ and $\mathbf{B}$, $e^{\mathbf{A}+\mathbf{B}}=e^{\mathbf{A}}e^{\mathbf{B}}$, which holds only when $\mathbf{A}$ and $\mathbf{B}$ commute, i.e., $\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}$. When $\mathbf{A}$ and $\mathbf{B}$ are diagonal this is true, and therefore we can obtain the required algorithms for positive LPs. Even when we have just two constraint matrices, one of which is diagonal, this remains an issue, as the matrices still do not commute in general.
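The non-commutativity obstacle is easy to see numerically. The snippet below (purely illustrative, using scipy's matrix exponential) shows $e^{\mathbf{A}+\mathbf{B}}\ne e^{\mathbf{A}}e^{\mathbf{B}}$ already for one diagonal and one non-diagonal PSD matrix, the case mentioned above.

```python
import numpy as np
from scipy.linalg import expm

A = np.diag([1.0, 2.0])                  # diagonal PSD constraint matrix
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])               # non-diagonal PSD constraint matrix

print(np.allclose(A @ B, B @ A))                     # False: A and B do not commute
print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # False: the exponential does not factor
```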

References

  • [1] Zeyuan Allen-Zhu, Yin Tat Lee, and Lorenzo Orecchia. Using optimization to obtain a width-independent, parallel, simpler, and faster positive sdp solver. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 1824–1831. SIAM, 2016. doi:10.1137/1.9781611974331.CH127.
  • [2] Zeyuan Allen-Zhu and Yuanzhi Li. First efficient convergence for streaming k-pca: a global, gap-free, and near-optimal rate. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 487–492. IEEE, 2017. doi:10.1109/FOCS.2017.51.
  • [3] Zeyuan Allen-Zhu and Lorenzo Orecchia. Nearly-linear time positive lp solver with faster convergence rate. In Proceedings of the forty-seventh annual ACM symposium on Theory of Computing, pages 229–236, 2015. doi:10.1145/2746539.2746573.
  • [4] Zeyuan Allen-Zhu and Lorenzo Orecchia. Nearly linear-time packing and covering lp solvers: Achieving width-independence and-convergence. Mathematical Programming, 175:307–353, 2019. doi:10.1007/S10107-018-1244-X.
  • [5] Peter Arbenz, Walter Gander, and Gene H Golub. Restricted rank modification of the symmetric eigenvalue problem: Theoretical considerations. Linear Algebra and its Applications, 104:75–95, 1988.
  • [6] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 227–236, 2007. doi:10.1145/1250790.1250823.
  • [7] Jess Banks, Jorge Garza-Vargas, Archit Kulkarni, and Nikhil Srivastava. Pseudospectral shattering, the sign function, and diagonalization in nearly matrix multiplication time. Foundations of Computational Mathematics, pages 1–89, 2022.
  • [8] MohammadHossein Bateni, Hossein Esfandiari, Hendrik Fichtenberger, Monika Henzinger, Rajesh Jayaram, Vahab Mirrokni, and Andreas Wiese. Optimal fully dynamic k-center clustering for adaptive and oblivious adversaries. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2677–2727. SIAM, 2023. doi:10.1137/1.9781611977554.CH101.
  • [9] Amos Beimel, Haim Kaplan, Yishay Mansour, Kobbi Nissim, Thatchaphol Saranurak, and Uri Stemmer. Dynamic algorithms against an adaptive adversary: Generic constructions and lower bounds. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1671–1684, 2022. doi:10.1145/3519935.3520064.
  • [10] Rajarshi Bhattacharjee, Gregory Dexter, Cameron Musco, Archan Ray, and David P Woodruff. Universal matrix sparsifiers and fast deterministic algorithms for linear algebra. arXiv preprint arXiv:2305.05826, 2023.
  • [11] Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak. Dynamic algorithms for packing-covering lps via multiplicative weight updates. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1–47. SIAM, 2023. doi:10.1137/1.9781611977554.CH1.
  • [12] James R Bunch, Christopher P Nielsen, and Danny C Sorensen. Rank-one modification of the symmetric eigenproblem. Numerische Mathematik, 31(1):31–48, 1978.
  • [13] Gene H Golub. Some modified matrix eigenvalue problems. SIAM review, 15(2):318–334, 1973.
  • [14] Garud Iyengar, David J Phillips, and Cliff Stein. Feasible and accurate algorithms for covering semidefinite programs. In Scandinavian Workshop on Algorithm Theory, pages 150–162. Springer, 2010.
  • [15] Garud Iyengar, David J Phillips, and Clifford Stein. Approximating semidefinite packing programs. SIAM Journal on Optimization, 21(1):231–268, 2011. doi:10.1137/090762671.
  • [16] Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, and Kevin Tian. Positive semidefinite programming: Mixed, parallel, and width-independent, 2021. arXiv:2002.04830.
  • [17] Philip Klein and Hsueh-I Lu. Efficient approximation algorithms for semidefinite programs arising from max cut and coloring. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 338–347, 1996. doi:10.1145/237814.237980.
  • [18] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. Annals of statistics, pages 1302–1338, 2000.
  • [19] Yin Tat Lee and He Sun. An sdp-based algorithm for linear-sized spectral sparsification. In Proceedings of the 49th annual acm sigact symposium on theory of computing, pages 678–687, 2017. doi:10.1145/3055399.3055477.
  • [20] Michael Luby and Noam Nisan. A parallel approximation algorithm for positive linear programming. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 448–457, 1993. doi:10.1145/167088.167211.
  • [21] Roy Mitz, Nir Sharon, and Yoel Shkolnisky. Symmetric rank-one updates from partial spectrum with an application to out-of-sample extension. SIAM Journal on Matrix Analysis and Applications, 40(3):973–997, 2019. doi:10.1137/18M1172120.
  • [22] Lorenzo Orecchia, Sushant Sachdeva, and Nisheeth K Vishnoi. Approximating the exponential, the Lanczos method and an Õ(m)-time spectral algorithm for balanced separator. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1141–1160, 2012. doi:10.1145/2213977.2214080.
  • [23] Richard Peng, Kanat Tangwongsan, and Peng Zhang. Faster and simpler width-independent parallel algorithms for positive semidefinite programming. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 101–108, 2012.
  • [24] Serge A Plotkin, David B Shmoys, and Éva Tardos. Fast approximation algorithms for fractional packing and covering problems. Mathematics of Operations Research, 20(2):257–301, 1995. doi:10.1287/MOOR.20.2.257.
  • [25] Kent Quanrud. Nearly linear time approximations for mixed packing and covering problems without data structures or randomization. In Symposium on Simplicity in Algorithms, pages 69–80. SIAM, 2020. doi:10.1137/1.9781611976014.11.
  • [26] Piotr Sankowski. Dynamic transitive closure via dynamic matrix inverse. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 509–517. IEEE, 2004.
  • [27] Peter Stange. On the efficient update of the singular value decomposition. In PAMM: Proceedings in Applied Mathematics and Mechanics, volume 8, pages 10827–10828. Wiley Online Library, 2008.
  • [28] William Swartworth and David P Woodruff. Optimal eigenvalue approximation via sketching. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 145–155, 2023. doi:10.1145/3564246.3585102.
  • [29] Lloyd N Trefethen and David Bau. Numerical linear algebra, volume 181. Siam, 2022.
  • [30] Luca Trevisan. Parallel approximation algorithms by positive linear programming. Algorithmica, 21(1):72–88, 1998. doi:10.1007/PL00009209.
  • [31] Luca Trevisan. Max cut and the smallest eigenvalue. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 263–272, 2009. doi:10.1145/1536414.1536452.
  • [32] Jan van den Brand, Danupon Nanongkai, and Thatchaphol Saranurak. Dynamic matrix inverse: Improved algorithms and matching conditional lower bounds. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 456–480. IEEE, 2019. doi:10.1109/FOCS.2019.00036.
  • [33] Di Wang, Satish Rao, and Michael W Mahoney. Unified acceleration method for packing and covering problems via diameter reduction. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016.

Appendix A Connections to Dynamic Positive Semi-definite Programs

This section discusses connections between our Problem 1 and the dynamic versions of positive semi-definite programs. Using this connection, we conclude that Theorem 2 implies a dynamic algorithm for a special case of the dynamic covering SDP problem.

We first define packing and covering semi-definite programs (SDPs). (Some papers, e.g., [16], flip our definition of packing and covering SDPs by considering their dual forms.)

Definition 23 (Packing/Covering SDP).

Let $\mathbf{C}$ and $\mathbf{A}_i$, $i=1,2,\ldots,m$, be $n\times n$ symmetric PSD matrices, and let the $b_i$'s denote positive real numbers. The packing SDP problem asks to find

$$\max_{\mathbf{Y}\succeq 0}\ \mathrm{Tr}[\mathbf{C}\mathbf{Y}]\quad\text{s.t. }\ \mathrm{Tr}[\mathbf{A}_i\mathbf{Y}]\le b_i,\ i=1,\ldots,m.$$

The covering SDP problem asks to find

$$\min_{\mathbf{Y}\succeq 0}\ \mathrm{Tr}[\mathbf{C}\mathbf{Y}]\quad\text{s.t. }\ \mathrm{Tr}[\mathbf{A}_i\mathbf{Y}]\ge b_i,\ i=1,\ldots,m.$$

Note that when the matrices 𝑪 and 𝑨i’s are all diagonal matrices, the packing and covering SDP problems above are precisely the well-studied packing and covering LP problems, respectively. Near-linear time algorithms for (1+ϵ)-approximately solving packing and covering LPs are very well-studied in a long line of work, some of which are [3, 4, 33, 25], and these problems have many applications, such as in graph embedding [24], approximation algorithms [20, 30], scheduling [24], to name a few.

Dynamic LPs

Near-optimal dynamic algorithms for packing and covering LPs were shown in [11]. The paper studies two kinds of updates: restricting and relaxing updates. Restricting updates can only shrink the feasible region, whereas relaxing updates can only grow it. In [11], the authors gave a deterministic algorithm that maintains a $(1+\epsilon)$-approximate solution to packing and covering LPs undergoing only restricting updates or only relaxing updates in total time $\tilde{O}(N/\epsilon^3+t/\epsilon)$, where $N$ is the total number of non-zeros in the initial input and the updates, and $t$ is the number of updates. Hence, this is optimal up to logarithmic factors.

A natural question is whether one can generalize the near-optimal dynamic LP algorithms with polylogarithmic overhead by [11] to work with SDPs since SDPs capture many further applications such as maximum cuts [15, 17], Sparse PCA [15], sparsest cuts [14], and balanced separators [22], among many others.

Static SDPs

Unfortunately, the algorithms for solving packing and covering SDPs are much more limited, even in the static setting. Near-linear time algorithms are known only for covering SDPs when the cost matrix 𝑪=𝑰 is the identity [23, 1].

The fundamental barrier to working with a general psd matrix $\mathbf{C}$ in covering SDPs is that doing so is as hard as approximating the minimum eigenvalue of $\mathbf{C}$ (consider the program $\min_{\mathbf{Y}\succeq 0}\mathrm{Tr}[\mathbf{C}\mathbf{Y}]$ such that $\mathrm{Tr}[\mathbf{Y}]\ge 1$). To the best of our knowledge, near-linear time algorithms for approximating the minimum eigenvalue assume a near-linear-time solver for $\mathbf{C}$, i.e., the ability to compute $\mathbf{C}^{-1}\mathbf{x}$ in near-linear time given $\mathbf{x}$. This can be done, for example, by applying the power method to $\mathbf{C}^{-1}$. When $\mathbf{C}^{-1}$ admits a fast solver, one can sometimes approximately solve a covering SDP fast, such as for spectral sparsification of graphs [19] and the max-cut problem [6, 31].

For packing SDPs, there is simply no near-linear time algorithm known. An algorithm for approximately solving packing SDPs and even the generalization to mixed packing-covering SDPs was claimed in [16], but there is an issue in the convergence analysis even for pure packing SDPs. Fast algorithms for this problem, hence, remain open.

Dynamic SDPs

Since near-linear time static algorithms are prerequisites for dynamic algorithms with polylogarithmic overhead, we can only hope for a dynamic algorithm for covering SDPs when $\mathbf{C}$ is the identity. Below, we show that our algorithm from Theorem 2 implies a dynamic algorithm for maintaining a covering SDP solution when there is a single constraint and the updates are restricting. This follows because this problem is equivalent to Problem 1.

We first define the dynamic covering problem with a single constraint under restricting updates.

Problem 24 (Covering SDP with a Single Matrix Constraint under Restricting Updates).

Given $\mathbf{A}_0\succeq 0$, an accuracy parameter $\epsilon>0$, and an online sequence of vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_T$ that update $\mathbf{A}_t\leftarrow\mathbf{A}_{t-1}-\mathbf{v}_t\mathbf{v}_t^\top$ such that $\mathbf{A}_t\succeq 0$, the problem asks to explicitly maintain, for all $t$, a $(1+\epsilon)$-approximate optimal value $\nu_t$ of the SDP, i.e.,

$$\nu_t\ \le\ (1+\epsilon)\,\mathrm{OPT}_t,\qquad\text{where }\ \mathrm{OPT}_t\stackrel{\mathrm{def}}{=}\min_{\mathbf{Y}\succeq 0,\ \mathrm{Tr}[\mathbf{A}_t\mathbf{Y}]\ge 1}\mathrm{Tr}[\mathbf{Y}].$$

Furthermore, given a query request, return a matrix $\mathbf{Q}^{(t)}$ where $\mathbf{Y}^{(t)}=\mathbf{Q}^{(t)}(\mathbf{Q}^{(t)})^\top\in\mathbb{R}^{n\times n}$ is a $(1+\epsilon)$-approximate optimal solution, i.e.,

$$\mathrm{Tr}[\mathbf{A}_t\mathbf{Y}^{(t)}]\ge 1\ \text{ and }\ \mathrm{Tr}[\mathbf{Y}^{(t)}]\le(1+\epsilon)\,\mathrm{OPT}_t.$$

Problem 24 is equivalent to Problem 1 in the following sense: given an algorithm for Problem 1, we can obtain an algorithm for Problem 24 with the same total update time and optimal query time. Conversely, given an algorithm for Problem 24, we can obtain an algorithm for the eigenvalue-only version of Problem 1 with the same total update time.

Proposition 25.

The following holds,

  1. Given an algorithm for Problem 1 with update time $\mathcal{T}$, there is an algorithm for Problem 24 with update time $\mathcal{T}$ and query time $O(n)$, i.e., the time required to produce $\mathbf{Q}^{(t)}$ at any time $t$.

  2. Given an algorithm for Problem 24 with update time $\tilde{\mathcal{T}}$, there is an algorithm for Problem 1 that only maintains the approximate eigenvalues, with update time at most $\tilde{\mathcal{T}}$.

Proof.

Let us first characterize the solution to Problem 24 at any $t$ for $\epsilon=0$. Since any feasible solution $\mathbf{Y}$ is PSD, it can be written as $\mathbf{Y}=\sum_{i=1}^{n}p_i\mathbf{y}_i\mathbf{y}_i^\top$ for some unit vectors $\mathbf{y}_i$ and $p_i\ge 0$. Now $\mathrm{Tr}[\mathbf{Y}]=\sum_i p_i$ and

$$1\ \le\ \mathrm{Tr}[\mathbf{A}_t\mathbf{Y}]=\sum_{i=1}^{n}p_i\,\mathrm{Tr}[\mathbf{A}_t\mathbf{y}_i\mathbf{y}_i^\top]=\sum_{i=1}^{n}p_i\,\mathbf{y}_i^\top\mathbf{A}_t\mathbf{y}_i\ \le\ \lambda_{\max}(\mathbf{A}_t)\sum_{i=1}^{n}p_i.$$

The above implies that for any feasible solution $\mathbf{Y}^{(t)}$, it must hold that $\sum_{i=1}^{n}p_i\ge 1/\lambda_{\max}(\mathbf{A}_t)$, and equality holds iff the $\mathbf{y}_i$'s are maximum eigenvectors of $\mathbf{A}_t$. Since the problem asks to minimize $\sum_{i=1}^{n}p_i$, the optimal solution is $\mathbf{Y}^{(t)}=\frac{1}{\lambda_{\max}(\mathbf{A}_t)}\mathbf{u}\mathbf{u}^\top$, where $\mathbf{u}$ is the maximum eigenvector of $\mathbf{A}_t$. Note that here $\mathbf{Q}^{(t)}=\frac{1}{\sqrt{\lambda_{\max}(\mathbf{A}_t)}}\mathbf{u}$.

We now show the first part by proving that, at any $t$, given $\epsilon$, the solution to Problem 1 gives a solution to Problem 24. Let $\lambda_t$ and $\mathbf{w}_t$ denote an $(\epsilon/2)$-approximate solution to Problem 1 at some time $t$. Consider the solution $\mathbf{Q}^{(t)}=\frac{1}{\sqrt{(1-\epsilon/2)\lambda_t}}\mathbf{w}_t$, which gives $\mathbf{Y}^{(t)}=\frac{1}{(1-\epsilon/2)\lambda_t}\mathbf{w}_t\mathbf{w}_t^\top$. Then,

$$\mathrm{Tr}[\mathbf{A}_t\mathbf{Y}^{(t)}]=\frac{\mathbf{w}_t^\top\mathbf{A}_t\mathbf{w}_t}{(1-\epsilon/2)\lambda_t}\ \ge\ \frac{(1-\epsilon/2)\lambda_{\max}(\mathbf{A}_t)}{(1-\epsilon/2)\lambda_{\max}(\mathbf{A}_t)}\ \ge\ 1.$$

Therefore, $\mathbf{Y}^{(t)}$ satisfies the constraint of Problem 24. Next, we look at the objective value: $\nu_t=\mathrm{Tr}[\mathbf{Y}^{(t)}]=\frac{1}{(1-\epsilon/2)\lambda_t}\le\frac{1}{(1-\epsilon/2)^2\lambda_{\max}(\mathbf{A}_t)}\le(1+\epsilon)\,\mathrm{OPT}_t$. We can maintain $\mathbf{Q}^{(t)}$ by just maintaining $\mathbf{w}_t$ and $\lambda_t$, which requires no extra time. The time required to obtain $\mathbf{Q}^{(t)}$ from $\mathbf{w}_t$ and $\lambda_t$, which is the query time, is at most $O(\mathrm{nnz}(\mathbf{w}_t))=O(n)$, and the value of $\nu_t$ can be obtained in $O(1)$ time.
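The first direction is just a rescaling; the following minimal numpy check (with an exact eigenpair standing in for the output of Problem 1, and tolerances chosen arbitrarily) verifies the mapping $\mathbf{Q}^{(t)}=\mathbf{w}_t/\sqrt{(1-\epsilon/2)\lambda_t}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps = 30, 0.1
B = rng.standard_normal((n, n))
A_t = B @ B.T                                    # current PSD matrix

# Stand-in for an (eps/2)-approximate answer to Problem 1 (here: the exact top pair).
evals, evecs = np.linalg.eigh(A_t)
lam_t, w_t = evals[-1], evecs[:, -1]

# Mapping from the proof: Q^(t) = w_t / sqrt((1 - eps/2) * lam_t).
Q = w_t / np.sqrt((1 - eps / 2) * lam_t)
Y = np.outer(Q, Q)

opt = 1.0 / evals[-1]                            # OPT_t = 1 / lambda_max(A_t)
assert np.trace(A_t @ Y) >= 1 - 1e-9             # feasible for Problem 24
assert np.trace(Y) <= (1 + eps) * opt + 1e-9     # within the (1 + eps) objective bound
```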

To see the other direction, consider the solution $\mathbf{Q}^{(t)},\nu_t$ of Problem 24. We can set $\lambda_t=\frac{1}{\nu_t}$. This implies that

$$\lambda_t=\frac{1}{\nu_t}\ \ge\ \frac{\lambda_{\max}(\mathbf{A}_t)}{1+\epsilon}\ \ge\ (1-\epsilon)\lambda_{\max}(\mathbf{A}_t),$$

as required. In this case, we do not require any extra time.

By plugging Theorem 2 into Proposition 25, we conclude the following.

Corollary 26.

There is a randomized algorithm for Problem 24 under a sequence of $T$ restricting updates that, given $n$, $\mathbf{A}_0$, and $\epsilon>1/n$ as input, works against an oblivious adversary with probability at least $1-1/n$ in total update time

$$O\left(\frac{\log^3 n\,\log^6\frac{n}{\epsilon}\,\log\frac{\lambda_{\max}(\mathbf{A}_0)}{\lambda_{\max}(\mathbf{A}_T)}}{\epsilon^4}\left(\mathrm{nnz}(\mathbf{A}_0)+\sum_{i=1}^{T}\mathrm{nnz}(\mathbf{v}_i)\right)\right),$$

and query time O(n).