
Deterministic Complexity Analysis of Hermitian Eigenproblems

Aleksandros Sobczyk (work performed while at IBM Research and ETH Zurich; currently at Huawei Zurich Research Center)
IBM Research Europe, Rüschlikon, Switzerland
ETH Zurich, Zurich, Switzerland
Abstract

In this work we revisit the arithmetic and bit complexity of Hermitian eigenproblems. Recently, [BGVKS, FOCS 2020] proved that a (non-Hermitian) matrix 𝐀 can be diagonalized with a randomized algorithm in O(n^ω log²(n/ϵ)) arithmetic operations, where ω ≈ 2.371 is the square matrix multiplication exponent, and [Shah, SODA 2025] significantly improved the bit complexity for the Hermitian case. Our main goal is to obtain similar deterministic complexity bounds for various Hermitian eigenproblems. In the Real RAM model, we show that a Hermitian matrix can be diagonalized deterministically in O(n^ω log(n) + n² polylog(n/ϵ)) arithmetic operations, improving the classic deterministic Õ(n³) algorithms, and derandomizing the aforementioned state-of-the-art. The main technical step is a complete, detailed analysis of a well-known divide-and-conquer tridiagonal eigensolver of Gu and Eisenstat [GE95], when accelerated with the Fast Multipole Method, asserting that it can accurately diagonalize a symmetric tridiagonal matrix in nearly-O(n²) operations. In finite precision, we show that an algorithm by Schönhage [Sch72] to reduce a Hermitian matrix to tridiagonal form is stable in the floating point model, using O(log(n/ϵ)) bits of precision. This leads to a deterministic algorithm to compute all the eigenvalues of a Hermitian matrix in O(n^ω μ(log(n/ϵ)) + n² polylog(n/ϵ)) bit operations, where μ(b) ∈ Õ(b) is the bit complexity of a single floating point operation on b bits. This improves the best known Õ(n³) deterministic and O(n^ω log²(n/ϵ) μ(log(n/ϵ))) randomized complexities. We conclude with some other useful subroutines such as computing spectral gaps, condition numbers, and spectral projectors, and with some open problems.

Keywords and phrases:
Hermitian eigenproblem, eigenvalues, SVD, tridiagonal reduction, matrix multiplication time, diagonalization, bit complexity
Category:
Track A: Algorithms, Complexity and Games
Copyright and License:
© Aleksandros Sobczyk; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Divide and conquer; Theory of computation → Numeric approximation algorithms
Related Version:
Full Version: https://arxiv.org/abs/2410.21550 [79]
Acknowledgements:
I am grateful to Efstratios Gallopoulos, Daniel Kressner, and David Woodruff for helpful discussions.
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Eigenproblems arise naturally in many applications. Given a matrix 𝐀, the goal is to compute (a subset of) the eigenvalues λ and/or the eigenvectors 𝐯, which satisfy

𝐀𝐯=λ𝐯.

The properties of the given matrix, as well as the quantities that need to be computed, can vary depending on the application, giving rise to different variants of the eigenproblem. These include (i) eigenvalue problems, such as the approximation of eigenvalues, singular values, spectral gaps, and condition numbers, (ii) eigenspace problems, which refer to the approximation of eigenvectors, spectral projectors, and invariant subspaces, and (iii) diagonalization problems, which involve the (approximate) computation of all the eigenvalues and eigenvectors of a matrix (or pencil), i.e., a full spectral factorization and/or the Singular Value Decomposition (SVD).

In this work we focus on algorithms for Hermitian eigenproblems, i.e., the special case where the input matrix is Hermitian. Our motivation to dedicate the analysis to this special class is twofold. First, they arise in many fundamental applications in Machine Learning, Spectral Graph Theory, and Scientific Computing. For example, the SVD, which is ubiquitous for many applications such as low rank approximations [69, 39, 35, 22], directly reduces to a Hermitian eigenproblem. Second, the spectral theorem states that a Hermitian matrix 𝐀 can always be written in a factored form 𝐀 = 𝐐𝚲𝐐^*, where 𝐐 is unitary and 𝚲 is diagonal with real entries. This alleviates several difficulties of the non-Hermitian case, which can lead to efficient dedicated algorithms.

Algorithms for eigenproblems have been studied for decades, some of the earliest being attributed to Jacobi [53, 43]. We refer to standard textbooks for an overview of the rich literature [31, 44, 70, 12, 73]. Some landmark works include the power method [60], the Lanczos algorithm [57], and the paramount QR algorithm [37, 38, 56], which has been recognized as one of the “top ten algorithms of the twentieth century” [33], signifying the importance of the eigenproblem in science and engineering. Given a Hermitian tridiagonal matrix 𝐓 with size n×n, the algorithm can compute a set of approximate eigenvalues in O(n² log(n/ϵ)) arithmetic operations, based on the seminal analyses of Wilkinson [84] and Dekker and Traub [25]. A set of approximate eigenvectors can also be returned in O(n³ log(n/ϵ)) operations. In conjunction with the classic unitary similarity transformation algorithm of Householder [52], the shifted-QR algorithm has carried the computational burden of solving eigenvalue problems for decades, both in theory and in practice. A detailed bit complexity analysis was provided recently in [8, 7, 9].

Despite the daunting literature, several details regarding the computational complexity of many algorithms remain unclear. It is well-known, for example, that the cubic arithmetic complexity to compute the eigenvalues of a dense matrix is not optimal: a classic work by Pan and Chen [68] showed that the eigenvalues can be approximated in O(n^ω) arithmetic operations, albeit without detailing how to also compute eigenvectors, or to fully diagonalize a matrix. Here ω ≥ 2 is the matrix multiplication exponent, i.e. the smallest number such that two n×n matrices can be multiplied in O(n^{ω+η}) arithmetic operations, for any η > 0. The current best known upper bound is ω < 2.371339 [1], and we will write n^ω instead of n^{ω+η} hereafter for simplicity. Recently, Banks, Garza-Vargas, Kulkarni, and Srivastava [6] described a numerically stable randomized algorithm to compute a provably accurate diagonalization, in Õ(n^ω) operations, improving the previous best-known bounds, specifically, Õ(n³) (Hermitian) and O(n^{10}) (non-Hermitian) [3]. [78] further improved the analysis for the Hermitian case, and several works have studied extensions and related applications [28, 29, 74, 80]. To date, we are not aware of any deterministic algorithm that achieves the same arithmetic (or bit) complexity with provable approximation guarantees, even for the Hermitian case. In this work, we proceed step-by-step and analyze several variants of the aforementioned eigenvalue, eigenspace, and diagonalization problems, in different models of computation, and report complexity upper bounds with provable approximation guarantees.

1.1 Models of computation and complexity

From the Abel–Ruffini theorem it is known that the eigenvalues and/or the eigenvectors of matrices of size n ≥ 5 cannot, in general, be computed exactly: even in exact arithmetic, they can only be approximated. Before analyzing algorithms, we first need to clarify what the quantities of interest are, how accuracy is measured, and what the underlying model of computation is. The two main models that we use to analyze algorithms are the Real RAM and the Floating Point model, described below.

Exact real arithmetic (Real RAM):

For the Real RAM model we follow the definition of [36]. The machine has a memory (RAM) and registers that store real numbers in infinite precision. Moving real numbers between the memory and the registers takes constant time. A processor can execute arithmetic operations {+, −, ×, /, √·, >} on the real numbers stored in the registers exactly, without any errors, in constant time. Other functions such as log(x), exp(x), and trigonometric functions, are not explicitly available, but they can be efficiently approximated up to some additive error, e.g. with a truncated polynomial expansion, using a polylogarithmic number of basic arithmetic operations. Bit-wise operations on real numbers are forbidden, since this can give the machine unreasonable computing power [48, 76, 36].

Floating point:

In this model, a real number α is rounded to a floating point number 𝗳𝗹(α) = s × 2^{e−t} × m, where s = ±1, e is the exponent, t is the bias, and m is the significand (or “mantissa”). Floating point operations also introduce rounding errors, i.e., for two floating point numbers α and β, each operation ∘ ∈ {+, −, ×, /} satisfies:

𝗳𝗹(α ∘ β) = (1+θ)(α ∘ β),  and also  𝗳𝗹(a) = (1+θ)a, (1)

where |θ| ≤ u, and u is the machine precision. Assuming a total of b bits for each number, every floating point operation costs μ(b) bit operations, where typically μ(b) ∈ O(b²), or even Õ(b), with more advanced algorithms [77, 40, 49]. More details are provided in Appendix D of the full version [79].
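As a small illustration of the error model (ours, not part of the formal definitions), the following Python snippet checks the guarantee of Eq. (1) for IEEE double precision, where the machine precision is u = 2^{−53}; the helper name check_fl_model is hypothetical.

    from fractions import Fraction

    u = 2.0 ** -53  # unit roundoff of IEEE double precision (the model's machine precision u)

    def check_fl_model(a, b):
        """Verify fl(a o b) = (1 + theta)(a o b) with |theta| <= u for o in {+, -, *, /}."""
        computed = [a + b, a - b, a * b, a / b]        # floating point results
        fa, fb = Fraction(a), Fraction(b)              # exact rational copies of the float inputs
        exact = [fa + fb, fa - fb, fa * fb, fa / fb]   # exact reference values
        for c, e in zip(computed, exact):
            theta = abs(Fraction(c) - e) / abs(e)      # relative rounding error of this operation
            assert theta <= u

    check_fl_model(0.1, 0.3)   # every operation is correctly rounded, so the assertions pass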

In Section 3.4 we will also use as a subroutine an algorithm from [13, 15], which was originally analyzed in the Boolean RAM model. We describe in detail how to use it in the corresponding section.

Arithmetic and boolean complexity

Given a model of computation, the arithmetic complexity of an algorithm is quantified in terms of the arithmetic operations executed. The boolean (or bit) complexity accounts for the total number of boolean operations (i.e., boolean gates with maximum fan-in two).

1.2 Notation

Matrices and vectors are denoted with bold capital and small letters, respectively. ‖𝐀‖ is the spectral norm of 𝐀, ‖𝐀‖_F its Frobenius norm, κ(𝐀) = ‖𝐀‖‖𝐀^+‖ is the condition number, and Λ(𝐀) its spectrum. For the complexity analysis we denote the geometric series S_x(m) = Σ_{l=1}^{m} (2^{x−2})^l. As already mentioned, for simplicity, the complexity of multiplying two n×n matrices will be denoted as O(n^ω), instead of O(n^{ω+η}), for arbitrarily small η > 0. We also use the standard notation ω(a,b,c) for the complexity of multiplying two rectangular matrices with sizes n^a × n^b and n^b × n^c in time O(n^{ω(a,b,c)}), and therefore ω := ω(1,1,1). For example, for a = c = 1 and b = 2, the best known bound is ω(1,2,1) ≤ 3.250035 [1], which is slightly better than naively performing n square multiplications in O(n·n^ω) ≈ O(n^{3.371339}). μ(b) denotes the bit complexity of a single floating point operation on b bits.

1.3 Methods and Contributions

1.3.1 Real RAM

We start with the analysis in exact arithmetic, which is the simplest model to analyze numerical algorithms. The goal is to obtain end-to-end upper bounds for the arithmetic complexity of approximately diagonalizing symmetric tridiagonal matrices and, ultimately, dense Hermitian matrices, as well as to approximate the SVD. To measure the accuracy, we follow the notion of backward-stability (or backward-approximation) for Hermitian diagonalization from [63]. Formally, the following problems are considered.

Problem 1.1.

Backward-approximate diagonalization problems in exact arithmetic.

  1. (i)

    Symmetric arrowhead/tridiagonal diagonalization: Given a symmetric arrowhead or tridiagonal matrix 𝐀 with size n×n, ‖𝐀‖ ≤ 1, and accuracy ϵ ∈ (0,1), compute a diagonal matrix 𝚲~ and a matrix 𝐔~, such that 𝐔~ = 𝐔 + 𝐄_𝐔, where 𝐔 is orthogonal and ‖𝐄_𝐔‖ ≤ ϵ, and ‖𝐀 − 𝐔𝚲~𝐔^⊤‖ ≤ ϵ.

  2. (ii)

    Hermitian diagonalization: Given a Hermitian matrix 𝐀 with size n×n, ‖𝐀‖ ≤ 1, and accuracy ϵ ∈ (0,1), compute a diagonal matrix 𝚲~ and a matrix 𝐔~, such that 𝐔~ = 𝐔 + 𝐄_𝐔, where 𝐔 is unitary and ‖𝐄_𝐔‖ ≤ ϵ, and ‖𝐀 − 𝐔𝚲~𝐔^*‖ ≤ ϵ.

  3. (iii)

    SVD: Given a matrix 𝐀 ∈ ℂ^{m×n}, m = n^k, k ≥ 1, ‖𝐀‖ ≤ 1, and accuracy ϵ ∈ (0,1), compute a diagonal matrix 𝚺~, an m×n matrix 𝐔~ = 𝐔 + 𝐄_𝐔, and an n×n matrix 𝐕~ = 𝐕 + 𝐄_𝐕, such that 𝐔^*𝐔 = 𝐕^*𝐕 = 𝐈, ‖𝐄_𝐔‖, ‖𝐄_𝐕‖ ≤ ϵ, and ‖𝐀 − 𝐔𝚺~𝐕^*‖ ≤ ϵ.

To obtain rigorous complexity guarantees for these problems, with respect to all the involved parameters, we start from Problem 1.1-(i), namely, diagonalization of symmetric tridiagonal matrices. To that end, we revisit the so-called fast tridiagonal eigensolvers, which aim to reduce the complexity from cubic to O~(n2) operations. Many such algorithms have been studied in the literature [34, 32, 14, 13, 41, 83, 66, 81, 10], most of which are based on the divide-and-conquer (DC) strategy of Cuppen [24]. The algorithms in all of the aforementioned works have been rigorously analyzed, however, explicit complexity bounds in terms of strictly solving Problem 1.1-(i) are not detailed. We resolve this by providing an end-to-end complexity analysis of the algorithm of Gu and Eisenstat [46]. In their original work, the authors outlined how to accelerate several parts of the algorithm with the Fast Multipole Method (FMM) [71], which could eventually lead to a final complexity of O~(n2). However, the actual analysis of this approach and the FMM details were not provided. [61] further extended the analysis in floating point, but it also relies on a numerically stable FMM implementation, which is not detailed. In this work, we use the elegant FMM analysis of [45, 58, 21], which is particularly suited for the problems considered. It is summarized in the following Proposition 1.1.

Proposition 1.1 (FMM).

There exists an algorithm, which we refer to as (ϵ,n)-approximate FMM (or (ϵ,n)-FMM, for short), which takes as input

  • a kernel function k(x) ∈ {log(|x|), 1/x, 1/x²},

  • 2n+m real numbers {x_1,…,x_m} ∪ {c_1,…,c_n} ∪ {y_1,…,y_n}, and a constant C, such that m ≤ n and for all i ∈ [m], j ∈ [n] it holds that

    |x_i|, |c_j|, |y_j| < C  and  |x_i − y_j| ≥ Ω(poly(ϵ/n)).

It returns m values f~(x_1),…,f~(x_m) such that |f~(x_i) − f(x_i)| ≤ ϵ, for all i ∈ [m], where f(x_i) = Σ_{j=1}^{n} c_j k(x_i − y_j), in a total of O(n log^ξ(n/ϵ)) arithmetic operations, where ξ ≥ 1 is a small constant that is independent of ϵ and n.
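Before the proof, it may help to spell out the sums that the (ϵ,n)-FMM approximates. A direct evaluation costs O(nm) operations, as in the following numpy sketch (illustrative only; the function name is ours), whereas the (ϵ,n)-FMM returns the same values up to additive error ϵ in O(n log^ξ(n/ϵ)) operations.

    import numpy as np

    def naive_kernel_sum(x, y, c, kernel="1/x"):
        """Direct O(n*m) evaluation of f(x_i) = sum_j c_j * k(x_i - y_j),
        for the kernels k of Proposition 1.1: log|x|, 1/x, or 1/x^2."""
        diff = x[:, None] - y[None, :]        # m x n matrix of pairwise differences x_i - y_j
        if kernel == "log":
            K = np.log(np.abs(diff))
        elif kernel == "1/x":
            K = 1.0 / diff
        else:                                  # kernel == "1/x^2"
            K = 1.0 / diff**2
        return K @ c                           # the m values f(x_1), ..., f(x_m)

    # tiny example with well-separated sources y_j and targets x_i
    y = np.array([0.1, 0.2, 0.3]); c = np.array([0.5, -1.0, 2.0])
    x = np.array([0.7, 0.9])
    print(naive_kernel_sum(x, y, c))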

Proof.

The result follows from the analysis of [45, 58, 21]. A detailed description and a short proof are provided for completeness in the full version [79], specifically in Appendix B, Proposition B.1.

By taking advantage of the FMM, the analysis from Section 2 leads to the following Theorem 1.2, whose proof can be found in Section 2.2.1.

Theorem 1.2.

Given an unreduced symmetric tridiagonal matrix 𝐓 with size n×n, ‖𝐓‖ ≤ 1, an accuracy ϵ ∈ (0,1/2), and an (ϵ,n)-FMM implementation as in Proposition 1.1, the recursive algorithm of [46] (Algorithm 1) returns an approximately orthogonal matrix 𝐔~ and a diagonal matrix 𝚲~ such that

‖𝐓 − 𝐔~𝚲~𝐔~^⊤‖ ≤ ϵ,  ‖𝐔~^⊤𝐔~ − 𝐈‖ ≤ ϵ/n²,

or, stated alternatively,

𝐔~ = 𝐔 + 𝐄_𝐔,  𝐔^⊤𝐔 = 𝐈,  ‖𝐄_𝐔‖ ≤ ϵ/n²,  ‖𝐓 − 𝐔𝚲~𝐔^⊤‖ ≤ ϵ,

using a total of O(n² log^{ξ+1}(n/ϵ)) arithmetic operations and comparisons, where ξ ≥ 1 is a small constant that depends on the specific FMM implementation and is independent of ϵ and n.

Table 1: Comparison of algorithms for Problems 1.1 (i)–(iii) in the Real RAM model. Here ξ ≥ 1 is a small constant which depends on the FMM implementation (see Prop. 1.1). The algorithms marked with (R) are randomized and succeed with high probability (at least 1 − 1/poly(n)). Each entry lists the arithmetic complexity, followed by a comment.

Prob. 1.1-(i), Arrowhead/Trid. diagonalization:
  • Refs. [24, 65, 46]: O(n³) + Õ(n²) — conjectured Õ(n²)
  • Theorem 1.2: O(n² log^{ξ+1}(n/ϵ)) — req. FMM (Prop. 1.1)

Prob. 1.1-(ii), Hermitian diagonalization:
  • Refs. [6, 78] (R): O(n^ω log²(n/ϵ)) — –
  • Ref. [11] (R): Õ(n^{ω+1}) — conjectured Õ(n^ω)
  • Refs. [25, 84, 51]: O(n³ log(n/ϵ)) — shifted-QR
  • Ref. [63]: Õ(n³) — req. separated spectrum
  • Ref. [68]: O(n^ω + n polylog(n/ϵ)) — only eigenvalues
  • Corollary 2.4: O(n^ω log(n) + n² log^{ξ+1}(n/ϵ)) — req. FMM (Prop. 1.1)

Prob. 1.1-(iii), SVD:
  • Shifted-QR on 𝐀^*𝐀: O(n^{ω(1,k,1)} + n³ log(n/ϵ)) — partial error analysis
  • Refs. [6, 78, 55] (R): O(n^{ω(1,k,1)} + n^ω log²(n/ϵ)) — partial error analysis
  • Theorem 1.3: O(n^{ω(1,k,1)} + n^ω log(n) + n² polylog(nκ(𝐀)/ϵ)) — req. FMM (Prop. 1.1)

This result has a direct application to the dense Hermitian diagonalization problem. The best-known complexities of deterministic algorithms, with end-to-end analysis, are at least cubic. For example, the aforementioned shifted-QR algorithm requires O(n³ log(n/ϵ)) arithmetic operations, even for tridiagonal matrices. The cubic barrier originates from the accumulation of the elementary rotation matrices to form the final eigenvector matrix. The so-called spectral divide-and-conquer methods (see e.g. [31, 63]) also have at least cubic complexities in the deterministic case. The two main difficulties in their analysis are the basis computation of spectral projectors, for which the best-known deterministic complexity bounds are Ω(n³), e.g. via Strong Rank-Revealing QR (RRQR) factorization [47], and the choice of suitable splitting points, which relies on the existence of spectral gaps.

Randomness can help to overcome certain difficulties in the analysis. [26] analyzed a numerically stable Randomized URV decomposition, which can be used to replace RRQR for basis computations in spectral DC algorithms. A highly parallelizable Hermitian diagonalization algorithm with end-to-end analysis was proposed in [11]. While the reported sequential arithmetic complexity is O(nω+1), the authors conjectured that it can be further reduced to O~(nω). The first end-to-end O~(nω) complexity upper bound was achieved in [6]. One of the main techniques was to use random perturbations to ensure that the pseudospectrum is well-separated, which helps to find splitting points in spectral DC. [78] further improved the analysis for Hermitian matrices.

Theorem 1.2 can be directly used to obtain a simple and deterministic solution for the Hermitian diagonalization problem. Specifically, Corollary 2.4 states that the problem can be solved in O(n^ω log(n) + n² polylog(n/ϵ)) arithmetic operations, which is both faster and fully deterministic. This is achieved by combining Theorem 1.2 with the (rather overlooked) algorithm of Schönhage [75], who proved that a Hermitian matrix can be reduced to tridiagonal form with unitary similarity transformations in O(n^ω log(n)·c(ω)) arithmetic operations, where c(ω) = O(1) if ω is treated as a fixed constant larger than 2, while c(ω) = O(log(n)) if it turns out that ω = 2.

As a consequence, Theorem 1.3 reports similar deterministic complexity results for the SVD. The SVD is a fundamental computational kernel in many applications, such as Principal Component Analysis [54], and it is also widely used as a subroutine for many other advanced algorithms, e.g. [69, 39, 35, 18, 20, 23, 19, 22], to name a few. A straightforward algorithm to compute it is to first form the Gramian matrix 𝐀^*𝐀, and then diagonalize it. Other classic SVD algorithms, such as the widely adopted Golub-Kahan bidiagonalization and its variants [42], or polar decomposition-based methods [63, 62], avoid the computation of 𝐀^*𝐀 for numerical stability reasons, but they also rely on a diagonalization algorithm as a subroutine. [55] elaborated on a matrix multiplication time SVD algorithm, by using [6] to diagonalize 𝐀^*𝐀, albeit without fully completing the backward stability analysis. The following Theorem 1.3 states our main result. The proof can be found in the full version [79], in Appendix C.4.
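As an aside, the straightforward Gramian-based reduction mentioned above can be written in a few lines of numpy; this is only an illustration of the reduction (and of why it is avoided when κ(𝐀) is large, since it squares the condition number), not of the algorithm behind Theorem 1.3.

    import numpy as np

    def svd_via_gramian(A):
        """SVD of A (m x n, m >= n, full column rank) via the Hermitian eigenproblem for A^* A."""
        G = A.conj().T @ A                      # n x n Hermitian Gramian
        evals, V = np.linalg.eigh(G)            # eigenvalues in ascending order
        idx = np.argsort(evals)[::-1]           # reorder so that singular values are descending
        sigma = np.sqrt(np.maximum(evals[idx], 0.0))
        V = V[:, idx]
        U = (A @ V) / sigma                     # left singular vectors, column j scaled by 1/sigma_j
        return U, sigma, V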

Theorem 1.3 (SVD).

Let 𝐀 ∈ ℂ^{m×n}, m = n^k, k ≥ 1. Assume that 1/n^c ≤ ‖𝐀‖ ≤ 1, for some constant c. Given accuracy ϵ ∈ (0,1/2) and an (ϵ,n)-FMM (see Prop. 1.1), we can compute three matrices 𝐔~ ∈ ℂ^{m×n}, 𝚺~ ∈ ℝ^{n×n}, 𝐕~ ∈ ℂ^{n×n} such that 𝚺~ is diagonal with positive diagonal elements and

‖𝐀 − 𝐔~𝚺~𝐕~^*‖ ≤ ϵ,  ‖𝐔~^*𝐔~ − 𝐈‖ ≤ ϵ,  ‖𝐕~^*𝐕~ − 𝐈‖ ≤ ϵ/(κ(𝐀)n²) ≤ ϵ,

or, stated alternatively,

‖𝐀 − 𝐔𝚺~𝐕^*‖ ≤ ϵ,  𝐔~ = 𝐔 + 𝐄_𝐔,  ‖𝐄_𝐔‖ ≤ ϵ,  𝐔^*𝐔 = 𝐈,
𝐕~ = 𝐕 + 𝐄_𝐕,  ‖𝐄_𝐕‖ ≤ ϵ/n²,  𝐕^*𝐕 = 𝐈.

The algorithm requires a total of at most O(n^{ω(1,k,1)} + n^ω log(n) + n² polylog(nκ(𝐀)/ϵ)) arithmetic operations, where κ(𝐀) = ‖𝐀‖‖𝐀^+‖.

To summarize this section, in Table 1 we list all the deterministic arithmetic complexities achieved in the Real RAM model, for all the aforementioned problems, and compare with some important existing algorithms.

1.3.2 Finite precision

Similar deterministic complexity upper bounds are obtained for several problems in finite precision. In particular, we study the following problems, for which we seek to bound the boolean complexity, i.e., the total number of bit operations.

Problem 1.2.

Main problems in finite precision.

  1. (i)

    Tridiagonal reduction: Given a Hermitian matrix 𝐀 with floating point elements, reduce 𝐀 to tridiagonal form using (approximate) unitary similarity transformations. In particular, return a tridiagonal matrix 𝐓~, and (optionally) an approximately unitary matrix 𝐐~, such that

    ‖𝐐~^*𝐐~ − 𝐈‖ ≤ ϵ,  and  ‖𝐀 − 𝐐~𝐓~𝐐~^*‖ ≤ ϵ‖𝐀‖,
  2. (ii)

    Hermitian eigenvalues: Given a Hermitian matrix 𝐀, ‖𝐀‖ ≤ 1, and accuracy ϵ ∈ (0,1), compute a set of approximate eigenvalues λ~_i such that |λ~_i − λ_i(𝐀)| ≤ ϵ.

Table 2: Boolean complexity for Problems 1.2, for matrix size n×n and accuracy ϵ ∈ (0,1). Each entry lists the boolean complexity, followed by a comment.

Prob. 1.2-(i), Tridiag. Reduction:
  • Refs. [52, 50]: O(n³ μ(log(n/ϵ))) — standard Householder reduction
  • Theorem 3.2: O(n^ω log(n) μ(log(n/ϵ))) — [75] with stable fast QR [26]

Prob. 1.2-(ii), Herm. Eigenvalues:
  • Ref. [78]: O(n^ω log²(n/ϵ) μ(log(n/ϵ))) — randomized, Pr[success] ≥ 1 − 1/n
  • Theorem 3.4: O(n^ω μ(log(n/ϵ)) + n² polylog(n/ϵ)) — deterministic, Thm. 3.2 + [15]

Regarding deterministic algorithms with end-to-end analysis, the standard approach is to first reduce the Hermitian matrix to tridiagonal form with Householder transformations [52], which can be done stably in O(n³) arithmetic operations using O(log(n/ϵ)) bits of precision; see e.g. [50]. Thereafter, there is a plethora of algorithms (e.g. the ones mentioned in the previous section) for the eigenvalues of the tridiagonal matrix, with varying complexities and stability properties. However, the total boolean complexity cannot be lower than Ω(n³ μ(log(n/ϵ))) due to the Householder reduction step. Other well-known deterministic and numerically stable algorithms in the literature also require at least n³ arithmetic operations to compute all the eigenvalues [67, 30, 63], and at least polylog(n, 1/ϵ) bits of precision in a floating point machine. The arithmetic complexity of the algorithm of [68] scales as O(n^ω) with respect to the matrix size n, but the boolean complexity can increase up to O(n^{ω+1}) in rational arithmetic. [59] described a randomized algorithm to compute only the largest eigenvalue in nearly O(n^ω) bit complexity. The fastest algorithm to compute all the eigenvalues of a Hermitian matrix is [78], which requires O(n^ω polylog(n/ϵ)) boolean operations and succeeds with high probability.

Randomized eigenvalue algorithms have also been studied in the sketching/streaming setting [2, 64, 82]. The (optimal) algorithm of [82] has not been analyzed in finite precision, but, due to its simplicity, it should be straightforward to do so. The algorithm approximates all the eigenvalues of a Hermitian matrix 𝐀 up to additive error ϵ‖𝐀‖_F. However, to reduce the error to a spectral-norm bound ϵ‖𝐀‖, the algorithm internally needs to diagonalize a matrix with size Ω(n), and therefore it does not provide any improvement over other Hermitian eigenvalue solvers. Nevertheless, our main results can be directly applied as the main eigenvalue subroutine of this algorithm, and can help to analyze its bit complexity.

To improve upon the aforementioned Hermitian eigenvalue algorithms, we first prove in Theorem 3.2 that Schönhage’s algorithm is numerically stable in the floating point model of computation. Thereafter, we carefully combine it with the algorithms of [13, 14, 15], which provably and deterministically approximate all the eigenvalues of a symmetric tridiagonal matrix in Õ(n²) boolean operations. The latter algorithm is analyzed in the Boolean RAM model; therefore, in order to use it we need to convert the floating point elements of the tridiagonal matrix that is returned by Theorem 3.2 to integers, which can be done efficiently under reasonable assumptions on the initial number of bits used to represent the floating point numbers. This is described in detail in the proof of our main Theorem 3.4. Our result derandomizes and slightly improves the final bit complexity of the algorithm of [78], which has the currently best known bit complexity for this problem. Table 2 summarizes this discussion.

As a direct consequence of Theorem 3.4, in Section 4 of the full version [79] we also provide the analysis for several other useful subroutines related to eigenvalue/eigenvector computations, including:

  1. (i)

    Singular values and condition number: In [79, Proposition 4.1] we describe how to obtain relative error approximations of singular values. In [79, Corollary 4.1] we show how to compute the condition number.

  2. (ii)

    Definite pencil eigenvalues: In [79, Corollary 4.2] we demonstrate how to extend Theorem 3.4 to compute the eigenvalues of Hermitian-definite pencils.

  3. (iii)

    Spectral gaps: In [79, Corollary 4.4] we show how to compute the spectral gap and the midpoint between any two eigenvalues of a Hermitian-definite pencil. Our algorithm is deterministic and requires significantly fewer bits of precision than the algorithm of [80], which described a randomized algorithm for this problem that is slightly faster than applying [6] as a black-box, but only computes a single spectral gap.

  4. (iv)

    Spectral projector: [79, Corollary 4.5] details how to compute spectral projectors on invariant subspaces of Hermitian-definite pencils, which are important for many applications.

1.4 Outline

The paper is organized as follows. In Section 2 we analyze the algorithm of [46] when implemented with the FMM, and its application to Hermitian diagonalization and the SVD. In Section 3 it is proved that Schönhage’s algorithm is numerically stable in floating point, and it is used as a preprocessing step to compute the eigenvalues of Hermitian matrices. We finally conclude and state some open problems in Section 4.

2 Diagonalization of symmetric tridiagonal and arrowhead matrices in Real RAM

Our main target is to compute the eigenvalues and the eigenvectors of symmetric tridiagonal matrices in nearly O(n²) time. To derive the desired result, we provide an analysis of the divide-and-conquer algorithm of Gu and Eisenstat [46], when implemented with the FMM from Proposition 1.1.

The algorithm of [46] first “divides” the problem by partitioning the (unreduced) tridiagonal matrix 𝐓 as follows:

\[
\mathbf{T} = \begin{pmatrix}
\mathbf{T}_1 & \beta_{k+1}\mathbf{e}_k & \\
\beta_{k+1}\mathbf{e}_k^\top & \alpha_{k+1} & \beta_{k+2}\mathbf{e}_1^\top \\
 & \beta_{k+2}\mathbf{e}_1 & \mathbf{T}_2
\end{pmatrix}.
\]

If one has access to the spectral decompositions of 𝐓_1 and 𝐓_2, i.e. 𝐓_1 = 𝐐_1𝐃_1𝐐_1^⊤ and 𝐓_2 = 𝐐_2𝐃_2𝐐_2^⊤, then 𝐓 can be factorized as

\[
\begin{pmatrix}
 & \mathbf{Q}_1 & \\
1 & & \\
 & & \mathbf{Q}_2
\end{pmatrix}
\begin{pmatrix}
\alpha_{k+1} & \beta_{k+1}\mathbf{l}_1 & \beta_{k+2}\mathbf{f}_2 \\
\beta_{k+1}\mathbf{l}_1^\top & \mathbf{D}_1 & \\
\beta_{k+2}\mathbf{f}_2^\top & & \mathbf{D}_2
\end{pmatrix}
\begin{pmatrix}
 & 1 & \\
\mathbf{Q}_1^\top & & \\
 & & \mathbf{Q}_2^\top
\end{pmatrix}
= \mathbf{Q}\mathbf{H}\mathbf{Q}^\top, \qquad (2)
\]

where 𝐥_1 is the last row of 𝐐_1 and 𝐟_2 is the first row of 𝐐_2, and 𝐇 has the so-called arrowhead structure. Thus, given this form, to compute the spectral decomposition of 𝐓, it suffices to diagonalize 𝐇. One can then apply the algorithm recursively to compute the spectral decompositions of 𝐓_1 and 𝐓_2, and, finally, at the “conquering stage,” combine the solutions with the eigendecomposition of 𝐇. For the individual steps to be computed efficiently, we will need to appropriately use the FMM from Proposition 1.1.
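To make the divide step concrete, the following numpy sketch (illustrative only; dense arithmetic, no FMM) assembles the arrowhead matrix 𝐇 of Eq. (2) from the spectral factorizations of 𝐓_1 and 𝐓_2.

    import numpy as np

    def arrowhead_from_children(alpha, beta1, beta2, Q1, D1, Q2, D2):
        """Build the arrowhead matrix H of Eq. (2), where T1 = Q1 diag(D1) Q1^T and
        T2 = Q2 diag(D2) Q2^T, alpha = alpha_{k+1}, beta1 = beta_{k+1}, beta2 = beta_{k+2}."""
        l1, f2 = Q1[-1, :], Q2[0, :]            # last row of Q1 and first row of Q2
        k, m = len(D1), len(D2)
        H = np.zeros((1 + k + m, 1 + k + m))
        H[0, 0] = alpha
        H[0, 1:1 + k] = beta1 * l1
        H[0, 1 + k:] = beta2 * f2
        H[1:, 0] = H[0, 1:]                     # symmetry of the first row/column
        H[1:1 + k, 1:1 + k] = np.diag(D1)
        H[1 + k:, 1 + k:] = np.diag(D2)
        return H                                # H is unitarily similar to T, so Lambda(H) = Lambda(T)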

2.1 Symmetric arrowhead diagonalization

The first step is to provide an end-to-end complexity analysis for the arrowhead diagonalization algorithm of [46], when implemented with the FMM. We start with an n×n arrowhead matrix of the form

\[
\mathbf{H} = \begin{pmatrix} \alpha & \mathbf{z}^\top \\ \mathbf{z} & \mathbf{D} \end{pmatrix}, \qquad (3)
\]

where 𝐃 is a diagonal matrix, 𝐳 is a vector of size n−1 and α is a scalar. Without loss of generality we assume that ‖𝐇‖ ≤ 1. The main result is stated in Theorem 2.1. The full proof of Theorem 2.1 relies on several technical lemmas that can be found in the full version [79, Appendices A and B]. Here we briefly describe the methodology, which was originally described in [46], but it was not analyzed in detail.

  1. 1.

    Deflation: In the first step, the matrix 𝐇 is preprocessed to ensure that it satisfies the following: d_{i+1} − d_i ≥ τ, and |𝐳_i| ≥ τ, where τ ∈ (0,1) is an appropriately chosen parameter that will be specified later. If this holds, it implies several useful properties; in particular, the eigenvalues λ_i of 𝐇 are the roots of the secular equation

    f(λ) = λ − α + Σ_{j=2}^{n} 𝐳_j²/(d_j − λ), (4)

    and they satisfy the following:

    λ_1 < d_2 < λ_2 < ⋯ < d_n < λ_n, (5)
    min{|d_i − λ_i|, |d_i − λ_{i−1}|} ≥ τ³/(n+1),  for all i = 2,…,n. (6)

    We refer to [79, Lemma A.1] for more details. This allows us to efficiently utilize the FMM in the subsequent steps. The full deflation procedure takes O(n log(n)) steps, and it is described in [79, Proposition A.1].

  2. 2.

    Eigenvalues: The eigenvalues of the deflated matrix can be conveniently approximated as the roots of Eq. (4). An FMM-based root finder is described in [79, Lemma B.1], which requires a total of at most O(n polylog(n, 1/ϵ, 1/τ)) arithmetic operations (a naive sketch of this step is given after this list).

  3. 3.

    From [17, 46], it is known that the approximate eigenvalues returned by the root finder are the exact eigenvalues of another arrowhead matrix 𝐇̂ = (α̂  𝐳̂^⊤; 𝐳̂  𝐃), of the same form as Eq. (3). There is an analytical expression for the elements of 𝐇̂. [79, Lemma B.2] describes how to approximate those elements up to error ϵ with the FMM in O(n polylog(n, 1/ϵ, 1/τ)) arithmetic operations.

  4. 4.

    Next, given that we know the exact eigenvalues and the approximate elements of the matrix 𝐇^, we can focus on computing its eigenvectors. In particular, in [79, Lemma B.3] it is shown how to use the FMM to approximate the inner products between the eigenvectors of 𝐇^ and some arbitrary unit vector 𝐛. Computing such inner products with all the columns of the identity, we obtain the final approximate eigenvector matrix of 𝐇 and, ultimately, an approximate diagonalization of 𝐇.
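Referring back to the root-finding in item 2 above, here is a naive sketch (ours) of that step: plain bisection on the secular equation (4), bracketing each root via the interlacing property (5). Each evaluation of f costs O(n), so this version needs O(n²·iters) operations; [79, Lemma B.1] replaces the direct evaluations by the FMM to reach the stated O(n polylog(n, 1/ϵ, 1/τ)) bound.

    import numpy as np

    def secular_roots_bisection(alpha, d, z, iters=60):
        """Roots of f(lam) = lam - alpha + sum_j z_j^2 / (d_j - lam), where d holds
        d_2 < ... < d_n.  By (5) there is one root below d_2, one in each (d_i, d_{i+1}),
        and one above d_n; with ||H|| <= 1 all of them lie in [-2, 2]."""
        d = np.asarray(d, dtype=float); z = np.asarray(z, dtype=float)
        f = lambda lam: lam - alpha + np.sum(z**2 / (d - lam))
        lefts  = np.concatenate(([-2.0], d))    # left bracket of every root
        rights = np.concatenate((d, [2.0]))     # right bracket of every root
        roots = []
        for lo, hi in zip(lefts, rights):
            a, b = lo + 1e-12, hi - 1e-12       # stay strictly between the poles
            for _ in range(iters):              # plain bisection
                mid = 0.5 * (a + b)
                if f(a) * f(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
        return np.array(roots)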

Theorem 2.1.

Given a symmetric arrowhead matrix 𝐇 ∈ ℝ^{n×n} as in Eq. (3), with ‖𝐇‖ ≤ 1, an accuracy parameter ϵ ∈ (0,1), a matrix 𝐁 with r columns 𝐁_i, i ∈ [r], where ‖𝐁_i‖ ≤ 1, and an (ϵ,n)-FMM implementation (see Prop. 1.1), we can compute a diagonal matrix 𝚲~, and a matrix 𝐐~_𝐁, such that

‖𝐇 − 𝐐𝚲~𝐐^⊤‖ ≤ ϵ,  |(𝐐^⊤𝐁 − 𝐐~_𝐁)_{i,j}| ≤ ϵ/n²,

where 𝐐 ∈ ℝ^{n×n} is (exactly) orthogonal, in O(nr log^{ξ+1}(n/ϵ)) arithmetic operations and comparisons, where ξ ≥ 1 is a small constant that depends on the specific FMM implementation and is independent of ϵ and n.

Alternatively, if one only wants to compute a set of approximate eigenvalues λ~_1,…,λ~_n, such that |λ_i(𝐇) − λ~_i| ≤ ϵ, the complexity reduces to O(n log(1/ϵ) log^ξ(n/ϵ)) arithmetic operations.

Proof.

The full proof follows the methodology described above, and it is provided in [79, Theorem B.1] in the full version.

 Remark 2.2.

We note that, if one is only interested in a full diagonalization of the arrowhead matrix, the use of the FMM for Step 3 is redundant, i.e., we can naively compute the elements of 𝐇^ exactly without the FMM in O(n2) operations, without affecting the final complexity. However, it is beneficial if we need to compute only a few matrix-vector products with the eigenvector matrix.

2.2 Tridiagonal diagonalization

Given the analysis for arrowhead diagonalization, we can now proceed to tridiagonal matrices. The next lemma bounds the error of the reduction to arrowhead form when the spectral factorizations of the matrices 𝐓1 and 𝐓2 in Equation (2) are approximate rather than exact. This will be used as an inductive step for the final algorithm.

Lemma 2.3.

Let ϵ ∈ (0,1/2) be a given accuracy parameter and let

\[
\mathbf{T} = \begin{pmatrix}
\mathbf{T}_1 & \beta_{k+1}\mathbf{e}_k & \\
\beta_{k+1}\mathbf{e}_k^\top & \alpha_{k+1} & \beta_{k+2}\mathbf{e}_1^\top \\
 & \beta_{k+2}\mathbf{e}_1 & \mathbf{T}_2
\end{pmatrix}
\]

be a tridiagonal matrix with size n ≥ 3 and ‖𝐓‖ ≤ 1, where 𝐓_1 = 𝐔_1𝐃_1𝐔_1^⊤ and 𝐓_2 = 𝐔_2𝐃_2𝐔_2^⊤ are the exact spectral factorizations of 𝐓_1 and 𝐓_2. Let 𝐔~_1, 𝐃~_1, 𝐔~_2, 𝐃~_2 be approximate spectral factorizations of 𝐓_1, 𝐓_2. If these factors satisfy

‖𝐓_{1,2} − 𝐔~_{1,2}𝐃~_{1,2}𝐔~_{1,2}^⊤‖ ≤ ϵ_1,  ‖𝐔~_{1,2}^⊤𝐔~_{1,2} − 𝐈‖ ≤ ϵ_1/n,

for some ϵ_1 ∈ (0,1/2), where 𝐃~_1, 𝐃~_2 are both diagonal, then, assuming an (ϵ,n)-FMM implementation as in Prop. 1.1, we can compute a diagonal matrix 𝐃~ and an approximately orthogonal matrix 𝐔~ such that

‖𝐔~^⊤𝐔~ − 𝐈‖ ≤ 3(ϵ_1 + ϵ)/n,  and  ‖𝐓 − 𝐔~𝐃~𝐔~^⊤‖ ≤ 2ϵ_1 + 7ϵ,

in a total of O(n² log^{ξ+1}(n/ϵ)) arithmetic operations and comparisons, where ξ ≥ 1 is a small constant that depends on the specific FMM implementation and is independent of ϵ and n.

Proof.

The proof can be found in the full version, in [79, Appendix C, Lemma C.1].

Lemma 2.3 gives rise to the following recursive algorithm. We can finally proceed with the proof of Theorem 1.2, which gives the complexity of Algorithm 1.

Algorithm 1 Recursive algorithm based on [46] to diagonalize a symmetric tridiagonal matrix.
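For concreteness, the following is a minimal dense-arithmetic sketch of the recursion behind Algorithm 1 (ours; illustrative only: it diagonalizes the arrowhead matrix with a dense eigensolver instead of the deflation and FMM-accelerated steps of Theorem 2.1, so it does not attain the complexity of Theorem 1.2).

    import numpy as np

    def dc_tridiagonal_eig(diag, off):
        """Divide-and-conquer eigendecomposition of the symmetric tridiagonal matrix with
        diagonal `diag` (length n) and off-diagonal `off` (length n-1), following the
        splitting of Eq. (2).  Returns (eigenvalues, eigenvector matrix)."""
        diag = np.asarray(diag, dtype=float); off = np.asarray(off, dtype=float)
        n = len(diag)
        if n <= 2:                                   # base case: solve directly
            T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
            return np.linalg.eigh(T)
        k = n // 2                                   # split at row/column k
        lam1, Q1 = dc_tridiagonal_eig(diag[:k], off[:k - 1])
        lam2, Q2 = dc_tridiagonal_eig(diag[k + 1:], off[k + 1:])
        beta1, beta2 = off[k - 1], off[k]
        l1, f2 = Q1[-1, :], Q2[0, :]
        # arrowhead matrix H of Eq. (2); Algorithm 1 diagonalizes it via Theorem 2.1
        head = np.concatenate(([diag[k]], beta1 * l1, beta2 * f2))
        H = np.diag(np.concatenate(([0.0], lam1, lam2)))
        H[0, :] = head
        H[:, 0] = head
        lam, W = np.linalg.eigh(H)
        # undo the similarity of Eq. (2): eigenvectors of T are Q @ W
        Q = np.zeros((n, n))
        Q[:k, 1:k + 1] = Q1
        Q[k, 0] = 1.0
        Q[k + 1:, k + 1:] = Q2
        return lam, Q @ W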

2.2.1 Proof of Theorem 1.2

Proof.

Let 𝐓^{(p)} be a matrix at recursion depth p. We can always divide 𝐓^{(p)} such that the sizes of 𝐓_1^{(p)} and 𝐓_2^{(p)} differ by at most 1. If we keep dividing in this manner, then the recursion tree has depth at most d = log(n).

The correctness follows by induction. Consider the base case d, where 𝐓^{(d)} has size at most 5×5, which means that 𝐓_1^{(d)}, 𝐓_2^{(d)} have size 1×1 or 2×2. We can compute the exact diagonalization of 𝐓_1^{(d)}, 𝐓_2^{(d)}, and therefore they satisfy the requirements of Lemma 2.3 for ϵ_1 = 0. Thus, if we apply Lemma 2.3 with parameter ϵ′ to compute the matrices 𝐔~^{(d)} and 𝐃~^{(d)}, they will satisfy the following bounds:

err_{𝐔~}^{(d)} := ‖𝐔~^{(d)⊤}𝐔~^{(d)} − 𝐈‖ ≤ 3(ϵ_1 + ϵ′)/n = 3ϵ′/n,
err_{𝐓}^{(d)} := ‖𝐓^{(d)} − 𝐔~^{(d)}𝐃~^{(d)}𝐔~^{(d)⊤}‖ ≤ 7ϵ′.

We need to bound the error at the root (depth p = 0). As long as the error at depth p+1 is smaller than 1/2, the error at depth p is bounded by

err_{𝐔~}^{(p)} ≤ 3 err_{𝐔~}^{(p+1)} + 3ϵ′/n < 4 err_{𝐔~}^{(p+1)} + 4ϵ′/n,
err_{𝐓}^{(p)} ≤ 2 err_{𝐓}^{(p+1)} + 7ϵ′ < 8 err_{𝐓}^{(p+1)} + 8ϵ′.

Solving the recursion we have

err_{𝐔~}^{(0)} < Σ_{p=1}^{log(n)} 4^p ϵ′/n = O(n²)·ϵ′/n = O(n)·ϵ′,  err_{𝐓}^{(0)} < Σ_{p=1}^{log(n)} 8^p ϵ′ = O(n³)·ϵ′.

It thus suffices to run the algorithm with initial error ϵ′ = cϵ/n³ for some small constant c.

The complexity analysis, for the chosen ϵ′, follows. For size n, the complexity is given by

T(n) ≤ 2T(n/2) + Cn² log^{ξ+1}(n/ϵ′).

Solving the recursion yields T(n) = O(n² log^{ξ+1}(n/ϵ′)) = O(n² log^{ξ+1}(n/ϵ)).

The final matrices 𝐔~ = 𝐔~^{(0)} and 𝚲~ = 𝐃~^{(0)} satisfy ‖𝐔~^⊤𝐔~ − 𝐈‖ ≤ ϵ/n² and ‖𝐓 − 𝐔~𝚲~𝐔~^⊤‖ ≤ ϵ, which means that:

‖𝚲~‖ ≤ ‖𝐔~^{−1}‖²(‖𝐓‖ + ‖𝐓 − 𝐔~𝚲~𝐔~^⊤‖) ≤ (1+ϵ)/(1 − ϵ/n²).

We can then write 𝐔~ = 𝐔 + 𝐄_𝐔, where 𝐔 is orthogonal and ‖𝐄_𝐔‖ ≤ ϵ/n², which gives

‖𝐓 − 𝐔𝚲~𝐔^⊤‖ ≤ ‖𝐓 − 𝐔~𝚲~𝐔~^⊤‖ + 2‖𝚲~‖‖𝐄_𝐔‖ + ‖𝚲~‖‖𝐄_𝐔‖²
≤ ϵ + 2·(1+ϵ)/(1−ϵ/n²)·(ϵ/n²) + (1+ϵ)/(1−ϵ/n²)·(ϵ/n²)² = ϵ(1 + (1+ϵ)(2+ϵ/n²)/(n²−ϵ)).

Rescaling ϵ by a small constant slightly larger than one gives the alternative statement.

2.3 Hermitian diagonalization

Given an algorithm to diagonalize tridiagonal matrices, the following corollary is immediate.

Corollary 2.4.

Let 𝐀 be a Hermitian matrix of size n with ‖𝐀‖ ≤ 1. Given accuracy ϵ ∈ (0,1/2), and an (ϵ,n)-FMM implementation of Prop. 1.1, we can compute a matrix 𝐐~ and a diagonal matrix 𝚲~ such that

‖𝐀 − 𝐐~𝚲~𝐐~^*‖ ≤ ϵ,  ‖𝐐~^*𝐐~ − 𝐈‖ ≤ ϵ/n².

The algorithm requires a total of O(n^ω log(n) + n² log^{ξ+1}(n/ϵ)) arithmetic operations and comparisons, where ξ ≥ 1 is a small constant that depends on the specific FMM implementation and is independent of ϵ and n.

Proof.

The first step is to use the algorithm of Schönhage [75] to reduce the matrix to tridiagonal form in O(nωlog(n)) arithmetic operations using unitary similarity transformations. Thereafter we diagonalize the tridiagonal matrix using Theorem 1.2. The full proof, as well as an alternative statement of the result, can be found in [79, Corollary C.2, Appendix C.3].

3 Stability of tridiagonal reduction

In this section we analyze the numerical stability and the boolean complexity of Schönhage’s algorithm in the floating point model. For this we will use the stable matrix multiplication and backward-stable QR factorization algorithms of [26, 27] as subroutines.

3.1 Matrix nomenclature

Schönhage [75] used a block variant of Rutishauser’s algorithm [72] to reduce a matrix to tridiagonal form, where elementary rotations are replaced with block factorizations; see also [16, 4, 5] for similar methodologies. We start with an n×n block-pentadiagonal matrix 𝐀(k,s,t), where k ∈ {0,…,log(n)−2} (we assume without loss of generality that n is a power of two). The matrix 𝐀(k,s,t) is partitioned into b_k × b_k blocks of size n_k × n_k each, where b_k = n/n_k and n_k = 2^k. The integer s ∈ {2,…,b_k} denotes that all the blocks 𝐀_{i,i−2} and 𝐀_{i−2,i}, for all i = 2,…,s, are equal to zero; s = 2 is a special case denoting a full block-pentadiagonal matrix. The integer t ∈ {s+2,…,b_k} denotes that the matrix has two additional nonzero blocks in the third off-diagonals, specifically at positions 𝐀_{t,t−3} and 𝐀_{t−3,t}. These blocks are often called the “bulge” in the literature. When t = 0, there is no bulge. As a consequence, the matrix 𝐀(k,2,0) is full block-pentadiagonal, while the matrix 𝐀(k,b_k,0) is block-tridiagonal. An illustration of these definitions is shown in Equation (7). A box is placed around the bulge in the second matrix.

\[
\begin{pmatrix}
\mathbf{A}_{1,1} & \mathbf{A}_{1,2} & \mathbf{A}_{1,3} & & & & & \\
\mathbf{A}_{2,1} & \mathbf{A}_{2,2} & \mathbf{A}_{2,3} & \mathbf{A}_{2,4} & & & & \\
\mathbf{A}_{3,1} & \mathbf{A}_{3,2} & \mathbf{A}_{3,3} & \mathbf{A}_{3,4} & \mathbf{A}_{3,5} & & & \\
 & \mathbf{A}_{4,2} & \mathbf{A}_{4,3} & \mathbf{A}_{4,4} & \mathbf{A}_{4,5} & \mathbf{A}_{4,6} & & \\
 & & \mathbf{A}_{5,3} & \mathbf{A}_{5,4} & \mathbf{A}_{5,5} & \mathbf{A}_{5,6} & \mathbf{A}_{5,7} & \\
 & & & \mathbf{A}_{6,4} & \mathbf{A}_{6,5} & \mathbf{A}_{6,6} & \mathbf{A}_{6,7} & \mathbf{A}_{6,8} \\
 & & & & \mathbf{A}_{7,5} & \mathbf{A}_{7,6} & \mathbf{A}_{7,7} & \mathbf{A}_{7,8} \\
 & & & & & \mathbf{A}_{8,6} & \mathbf{A}_{8,7} & \mathbf{A}_{8,8}
\end{pmatrix}
\equiv \mathbf{A}(k,2,0),
\]
\[
\begin{pmatrix}
\mathbf{A}_{1,1} & \mathbf{A}_{1,2} & & & & & & \\
\mathbf{A}_{2,1} & \mathbf{A}_{2,2} & \mathbf{A}_{2,3} & & & & & \\
 & \mathbf{A}_{3,2} & \mathbf{A}_{3,3} & \mathbf{A}_{3,4} & \mathbf{A}_{3,5} & \boxed{\mathbf{A}_{3,6}} & & \\
 & & \mathbf{A}_{4,3} & \mathbf{A}_{4,4} & \mathbf{A}_{4,5} & \mathbf{A}_{4,6} & & \\
 & & \mathbf{A}_{5,3} & \mathbf{A}_{5,4} & \mathbf{A}_{5,5} & \mathbf{A}_{5,6} & \mathbf{A}_{5,7} & \\
 & & \boxed{\mathbf{A}_{6,3}} & \mathbf{A}_{6,4} & \mathbf{A}_{6,5} & \mathbf{A}_{6,6} & \mathbf{A}_{6,7} & \mathbf{A}_{6,8} \\
 & & & & \mathbf{A}_{7,5} & \mathbf{A}_{7,6} & \mathbf{A}_{7,7} & \mathbf{A}_{7,8} \\
 & & & & & \mathbf{A}_{8,6} & \mathbf{A}_{8,7} & \mathbf{A}_{8,8}
\end{pmatrix}
\equiv \mathbf{A}(k,4,6). \qquad (7)
\]
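The block-sparsity patterns of Eq. (7) can also be generated programmatically; the following small sketch (ours) marks the nonzero blocks of 𝐀(k,s,t) for b_k blocks per dimension, and reproduces the second matrix of Eq. (7) for b_k = 8, s = 4, t = 6.

    def block_pattern(bk, s, t):
        """Return a bk x bk 0/1 pattern of nonzero blocks of A(k, s, t): the block-tridiagonal
        part is always present; the second off-diagonal blocks A_{i,i-2}, A_{i-2,i} are present
        only when they have not been eliminated yet (i.e. for indices beyond s); a bulge sits
        at A_{t,t-3}, A_{t-3,t} when t > 0.  Indices are 1-based, as in Eq. (7)."""
        P = [[0] * bk for _ in range(bk)]
        for i in range(1, bk + 1):
            for j in range(1, bk + 1):
                if abs(i - j) <= 1:                       # tridiagonal blocks
                    P[i - 1][j - 1] = 1
                elif abs(i - j) == 2 and max(i, j) > s:   # pentadiagonal blocks not yet eliminated
                    P[i - 1][j - 1] = 1
        if t:                                             # the "bulge"
            P[t - 1][t - 4] = P[t - 4][t - 1] = 1
        return P

    for row in block_pattern(8, 4, 6):                    # second matrix in Eq. (7)
        print(*row)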

3.2 Rotations

The algorithm defines two types of block rotations, R_i and R̄_j, which are unitary similarity transformations with the following properties.

Definition 3.1 (Rotations).

The algorithm of [75] uses the following two types of rotations:

  1. 1.

    R_i(𝐀(k,i,0)), for i = 2,…,b_k−1, operates on a block-pentadiagonal matrix without a bulge. It transforms the matrix 𝐀(k,i,0) to a matrix 𝐀(k,i+1,i+3). In particular, the block 𝐀_{i,i−1} becomes upper triangular, the block 𝐀_{i+1,i−1} becomes zero, and a new bulge block arises at 𝐀_{i,i+3}. Due to symmetry, 𝐀_{i−1,i} becomes lower triangular, 𝐀_{i−1,i+1} is eliminated, and 𝐀_{i+3,i} becomes non-zero.

  2. 2.

    R̄_j(𝐀(k,s,j+1)), for some j = s+1,…,b_k−1, operates on a block-pentadiagonal matrix with a bulge at positions 𝐀_{j+1,j−2}, 𝐀_{j−2,j+1}. It transforms the matrix 𝐀(k,s,j+1) to a matrix 𝐀(k,s,j+3), such that the bulge is moved two positions “down-and-right”, i.e. the blocks 𝐀_{j−2,j+1} and 𝐀_{j+1,j−2} become zero and the blocks 𝐀_{j+3,j} and 𝐀_{j,j+3} become the new bulge. In addition, the matrices 𝐀_{j,j−2} and 𝐀_{j+1,j−1} become upper triangular, and, by symmetry, the matrices 𝐀_{j−2,j} and 𝐀_{j−1,j+1} become lower triangular.

In the full version, in [79, Lemmas E.1 and E.2] it is proved that both rotation types can be stably implemented in floating point using fast QR factorizations.
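In the spirit of the QR-based implementation mentioned above, a schematic dense sketch of the first rotation type could look as follows (ours; illustrative only, without the fast backward-stable QR of [26], and with 0-based block indices).

    import numpy as np

    def rotation_Ri(A, i, nk):
        """Apply the rotation R_i of Definition 3.1 to the full matrix A, viewed in blocks of
        size nk x nk.  A QR factorization of the stacked blocks [A_{i,i-1}; A_{i+1,i-1}] gives
        a unitary Q; applying Q^* to block rows i, i+1 and Q to block columns i, i+1 (a
        similarity) makes A_{i,i-1} upper triangular, zeroes A_{i+1,i-1}, and creates the
        bulge at A_{i,i+3} (and A_{i+3,i} by symmetry)."""
        r = slice(i * nk, (i + 2) * nk)          # block rows/columns i and i+1
        c = slice((i - 1) * nk, i * nk)          # block column i-1
        Q, _ = np.linalg.qr(A[r, c], mode="complete")
        A[r, :] = Q.conj().T @ A[r, :]           # act on block rows i, i+1
        A[:, r] = A[:, r] @ Q                    # act on block columns i, i+1
        return A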

3.3 Recursive bandwidth halving

The following Algorithm 2 halves the bandwidth of a symmetric matrix using rotations. Its complexity and stability properties are stated in the full version [79, Lemma E.3]. Applying this algorithm recursively gives the main Theorem 3.2.

Algorithm 2 Halves the bandwidth of a Hermitian matrix with unitary rotations.
Theorem 3.2.

There exists a floating point implementation of the tridiagonal reduction algorithm of [75], which takes as input a Hermitian matrix 𝐀, and returns a tridiagonal matrix 𝐓~, and (optionally) an approximately unitary matrix 𝐐~. If the machine precision u satisfies u ≤ ϵ/(c·n^{β+4}), where ϵ ∈ (0,1), and c, β are constants that are independent of n, which translates to O(log(n) + log(1/ϵ)) bits of precision, then the following hold:

‖𝐐~^*𝐐~ − 𝐈‖ ≤ ϵ,  and  ‖𝐀 − 𝐐~𝐓~𝐐~^*‖ ≤ ϵ‖𝐀‖.

The algorithm executes at most:

  • O(n² S_ω(log(n))) floating point operations to return only 𝐓~, where S_x(m) = Σ_{l=1}^{m} (2^{x−2})^l.

  • If 𝐀 is banded with 1 ≤ d ≤ n bands, the floating point operations reduce to O(n² S_ω(log(d))).

  • If 𝐐~ is also returned, the complexity increases to O(n² C_ω(n)), where C_x(n) := Σ_{k=2}^{log(n)−2} (S_x(log(n)−1) − S_x(k)).

If ω is treated as a constant ω ≈ 2.371, the corresponding complexities are O(n^ω), O(n² d^{ω−2}), and O(n^ω log(n)), respectively.
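For intuition on these bounds: S_x(m) is a geometric series with ratio 2^{x−2}, which for x > 2 is dominated by its last term. A short derivation (ours) for the first bullet reads

    S_\omega(\log n) = \sum_{l=1}^{\log n} \bigl(2^{\omega-2}\bigr)^{l}
      \;\le\; \frac{2^{\omega-2}}{2^{\omega-2}-1}\,\bigl(2^{\omega-2}\bigr)^{\log n}
      \;=\; O\bigl(n^{\omega-2}\bigr),
    \qquad\text{hence}\qquad
    n^{2}\,S_\omega(\log n) \;=\; O\bigl(n^{\omega}\bigr).

For ω = 2 every term of the series equals 1 and S_2(log n) = log n, consistent with the factor c(ω) = O(log(n)) discussed in Section 1.3.1.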

Proof.

The proof can be found in the full version [79, Theorem E.3, Appendix E].

3.4 Eigenvalues of Hermitian matrices

We now have all the prerequisites to compute the eigenvalues of Hermitian matrices in nearly matrix multiplication time in finite precision. For this we can use the eigenvalue solver of [13, 15], which has Õ(n²) boolean complexity, albeit in the Boolean RAM model, instead of floating point. The algorithm accepts as input symmetric tridiagonal matrices with bounded integer entries.

Theorem 3.3 (Imported from [13, 15]).

Let 𝐓 be a symmetric tridiagonal matrix with integer elements bounded in magnitude by 2^m for some m. Let ϵ = 2^{−u} ∈ (0,1) be a desired accuracy. Algorithm 4.1 of [13] computes a set of approximate eigenvalues λ~_i (which are returned as rationals) such that |λ~_i − λ_i(𝐓)| < ϵ. The algorithm requires O(n² b log²(n) log(nb)(log²(b) + log(n)) log(log(nb))) boolean operations, where b = m + u.

Theorem 3.4.

Let 𝐀 be a (banded) Hermitian matrix, with ‖𝐀‖ ≤ 1 and 1 ≤ d ≤ n−1 off-diagonals, and let ϵ ∈ (0,1) be an accuracy parameter. Assume that the elements of 𝐀 are floating point numbers on a machine with precision u, t = log(1/u) bits for the significand, and p = O(log(log(n))) bits for the exponent. There exists an algorithm that returns a set of n approximate eigenvalues λ~_1,…,λ~_n such that

|λ~_i − λ_i(𝐀)| ≤ ϵ,

using at most O(n² S_ω(log(d)) μ(log(n/ϵ)) + n² polylog(n/ϵ)) boolean operations, where μ(b) is the bit complexity of a floating point operation on b bits, and n² S_ω(log(d)) = O(n² d^{ω−2}) if ω is treated as a constant greater than two.

Proof.

First, recall that each element of a floating point matrix 𝐀 is represented by a floating point number 𝐀_{i,j} = ±1 × 2^{e_{i,j}} × m_{i,j}, where e_{i,j} ∈ [−2^p, 2^p] is its exponent, p is the number of exponent bits, and m_{i,j} is the significand, which is an integer with t = log(1/u) bits. To represent all the elements of 𝐀 with normalized floating point numbers it is sufficient to use p = O(log(log(A))), where A = max{‖𝐀‖_max, 1/‖𝐀‖_min}. By assumption, ‖𝐀‖_max ≤ ‖𝐀‖ ≤ 1. It is common to assume that 1/‖𝐀‖_min ≤ poly(n), which means that O(log(log(n))) bits for the exponent are sufficient.

To compute the eigenvalues of 𝐀 we work as follows. We first run the algorithm of Theorem 3.2 to reduce 𝐀 to tridiagonal form 𝐓~, with parameter ϵ′ := ϵ/(n·2^{2^p+1}) and O(log(n/ϵ)) bits of precision, and a total of O(n² S_ω(log(d))) floating point operations, which is equal to O(n² d^{ω−2}) if ω is treated as a constant. It holds that

|λ_i(𝐓~) − λ_i(𝐀)| ≤ ϵ′‖𝐀‖ ≤ ϵ′·n·2^{2^p+1} ≤ ϵ,

where in the second inequality we used that ‖𝐀‖ ≤ n‖𝐀‖_max ≤ n·2^{2^p+1}.

Therefore, it suffices to approximate the eigenvalues of 𝐓~. For this we first transform 𝐓~ to a tridiagonal matrix with integer values and use the algorithm of [13] to compute its eigenvalues. A floating point matrix can be transformed to one with integer entries as follows. Assuming that we have p bits for the floating point exponent, and t = log(1/u) bits for the significand, the floating point numbers with O(p+t) bits can be transformed to integers of O(2^{p+1}+t) bits by multiplying all the elements with 2^{2^p+t}. This is achieved by first allocating a tridiagonal matrix with 3n−2 integers with O(2^{p+1}+t) = O(log²(n) + log(n/ϵ)) bits each, that are initially set to zero. We then visit each floating point element 𝐓~_{i,j} of the original matrix, which is represented as a number ±1 × 2^{e_{i,j}} × m_{i,j}, where e_{i,j} ∈ [−2^p, 2^p] is its exponent and m_{i,j} is the significand, which has t bits. We copy the t bits of m_{i,j} at the positions e_{i,j}+2^p, e_{i,j}+2^p+1, …, e_{i,j}+2^p+t of the corresponding element of the new matrix. This takes O(n(2^{p+1}+t)) = O(n(log²(n)+log(n/ϵ))) boolean operations in total to allocate the new matrix and copy the elements.
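A sketch of this conversion using Python’s exact rational arithmetic (ours; scale_bits plays the role of the common scaling exponent 2^p + t):

    from fractions import Fraction

    def tridiagonal_to_integers(diag, off, scale_bits):
        """Multiply every entry of the floating point tridiagonal matrix (diagonal `diag`,
        off-diagonal `off`) by 2**scale_bits and return exact Python integers; scale_bits
        must be large enough to clear all exponents, as in the scaling step of the proof."""
        def to_int(x):
            q = Fraction(x) * (1 << scale_bits)
            assert q.denominator == 1           # the scaled entry is an exact integer
            return q.numerator
        return [to_int(x) for x in diag], [to_int(x) for x in off]

    # example: entries with exponents down to 2**-10 become integers after scaling by 2**(10+53)
    d_int, o_int = tridiagonal_to_integers([0.5, -0.25], [2.0 ** -10], scale_bits=63)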

Now that we have an integer-valued symmetric tridiagonal matrix 𝐓, with O(2^{p+1}+t) = O(log²(n) + log(n/ϵ)) bits per element, we can compute its eigenvalues up to additive accuracy ϵ with Theorem 3.3. Specifically, we set ϵ″ = ϵ·2^{2^p+t}, and run the algorithm with the matrix 𝐓 as input and u = log(1/ϵ″). Let

b = u + m = log(1/ϵ″) + 2^{p+1} + t = log(1/(ϵ·2^{2^p+t})) + log(2^{2^{p+1}+t}) = O(log(n/ϵ)),

where the last bound follows from the assumption that p = O(log(log(n))). The returned eigenvalues λ′_i satisfy

|λ′_i − λ_i(𝐓)| ≤ ϵ″  ⟹  |λ′_i/2^{2^p+t} − λ_i(𝐓~)| ≤ ϵ.

Therefore, if we set λ~_i = λ′_i/2^{2^p+t}, we finally obtain that |λ~_i − λ_i(𝐀)| ≤ 2ϵ. Rescaling ϵ gives the final bound. The algorithm runs in

B = O(n² b log²(n) log(nb)(log²(b) + log(n)) log(log(nb)))
  = O(n² log(n/ϵ) log²(n) log(n log(n/ϵ)) [log²(log(n/ϵ)) + log(n)] log(log(n log(n/ϵ))))

boolean operations.

4 Conclusion

In this work we provided a deterministic complexity analysis for Hermitian eigenproblems. In the Real RAM model, we reported nearly-linear complexity upper bounds for the full diagonalization of arrowhead and tridiagonal matrices, and nearly matrix multiplication-type complexities for diagonalizing Hermitian matrices and for the SVD. This was achieved by analyzing the divide-and-conquer algorithm of [46], when implemented with the Fast Multipole Method. We also showed that the tridiagonal reduction algorithm of [75] is numerically stable in the floating point model. This paved the way to obtain improved deterministic boolean complexities for computing the eigenvalues, singular values, spectral gaps, and spectral projectors, of Hermitian matrices and Hermitian-definite pencils. Some interesting questions for future research are the following:

  1. 1.

    Stability of arrowhead diagonalization: The FMM-accelerated arrowhead diagonalization algorithm was only analyzed in the Real RAM model. Several works [46, 83, 66, 21] have provided stabilization techniques for related algorithms in floating point, albeit without an end-to-end complexity analysis. Such an analysis would be insightful to better understand the boolean complexity of (Hermitian) diagonalization.

  2. 2.

    Condition number in the SVD complexity: The complexity of the SVD in Theorem 1.3 has a polylogarithmic dependence on the condition number. Frustratingly, we were not able to remove it at the time of this writing.

  3. 3.

    Non-Hermitian diagonalization: Schönhage also proved that a non-Hermitian matrix can be reduced to upper Hessenberg form in matrix multiplication time [75]. In this work we only provided the stability analysis for the Hermitian case. It would be interesting to investigate whether the Hessenberg reduction algorithm can be used to diagonalize non-Hermitian matrices in matrix multiplication time deterministically (e.g. in conjunction with [8, 7, 9]).

  4. 4.

    Deterministic pseudospectral shattering: One of the main techniques of the seminal work of [6] is called “pseudospectral shattering.” The main idea is to add a tiny random perturbation to the original matrix to ensure that the minimum eigenvalue gap between any pair of eigenvalues will be at least polynomial in 1/n. We highlight that the arrowhead deflation preprocessing step has this precise effect: the pseudospectrum is shattered with respect to a deterministic grid. Can this be generalized to obtain deterministic pseudospectral shattering techniques, for Hermitian or even non-Hermitian matrices?

References

  • [1] Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. More asymmetry yields faster matrix multiplication. In Proc. 2025 ACM-SIAM Symposium on Discrete Algorithms, pages 2005–2039. SIAM, 2025. doi:10.1137/1.9781611978322.63.
  • [2] Alexandr Andoni and Huy L Nguyen. Eigenvalues of a matrix in the streaming model. In Proc. 24th ACM-SIAM Symposium on Discrete Algorithms, pages 1729–1737. SIAM, 2013. doi:10.1137/1.9781611973105.124.
  • [3] Diego Armentano, Carlos Beltrán, Peter Bürgisser, Felipe Cucker, and Michael Shub. A stable, polynomial-time algorithm for the eigenpair problem. Journal of the European Mathematical Society, 20(6):1375–1437, 2018.
  • [4] Grey Ballard, James Demmel, and Nicholas Knight. Communication avoiding successive band reduction. ACM SIGPLAN Notices, 47(8):35–44, 2012. doi:10.1145/2145816.2145822.
  • [5] Grey Ballard, James Demmel, and Nicholas Knight. Avoiding communication in successive band reduction. ACM Transactions on Parallel Computing, 1(2):1–37, 2015. doi:10.1145/2686877.
  • [6] Jess Banks, Jorge Garza-Vargas, Archit Kulkarni, and Nikhil Srivastava. Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time. Foundations of Computational Mathematics, pages 1–89, 2022.
  • [7] Jess Banks, Jorge Garza-Vargas, and Nikhil Srivastava. Global Convergence of Hessenberg Shifted QR II: Numerical Stability. arXiv, 2022. doi:10.48550/arXiv.2205.06810.
  • [8] Jess Banks, Jorge Garza-Vargas, and Nikhil Srivastava. Global convergence of Hessenberg Shifted QR I: Exact Arithmetic. Foundations of Computational Mathematics, pages 1–34, 2024.
  • [9] Jess Banks, Jorge Garza-Vargas, and Nikhil Srivastava. Global Convergence of Hessenberg Shifted QR III: Approximate Ritz Values via Shifted Inverse Iteration. SIAM Journal on Matrix Analysis and Applications, 46(2):1212–1246, 2025.
  • [10] Jesse L Barlow. Error analysis of update methods for the symmetric eigenvalue problem. SIAM Journal on Matrix Analysis and Applications, 14(2):598–618, 1993. doi:10.1137/0614042.
  • [11] Michael Ben-Or and Lior Eldar. A Quasi-Random Approach to Matrix Spectral Analysis. In Proc. 9th Innovations in Theoretical Computer Science Conference, pages 6:1–6:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.ITCS.2018.6.
  • [12] Rajendra Bhatia. Perturbation bounds for matrix eigenvalues. SIAM, 2007.
  • [13] Dario Bini and Victor Pan. Parallel complexity of tridiagonal symmetric eigenvalue problem. In Proc. 2nd ACM-SIAM Symposium on Discrete Algorithms, pages 384–393, 1991. URL: http://dl.acm.org/citation.cfm?id=127787.127855.
  • [14] Dario Bini and Victor Pan. Practical improvement of the divide-and-conquer eigenvalue algorithms. Computing, 48(1):109–123, 1992. doi:10.1007/BF02241709.
  • [15] Dario Bini and Victor Y Pan. Computing matrix eigenvalues and polynomial zeros where the output is real. SIAM Journal on Computing, 27(4):1099–1115, 1998. doi:10.1137/S0097539790182482.
  • [16] Christian H Bischof, Bruno Lang, and Xiaobai Sun. A framework for symmetric band reduction. ACM Transactions on Mathematical Software, 26(4):581–601, 2000. doi:10.1145/365723.365735.
  • [17] D Boley and Gene Howard Golub. Inverse eigenvalue problems for band matrices. In Numerical Analysis: Proceedings of the Biennial Conference Held at Dundee, June 28–July 1, 1977, pages 23–31. Springer, 1977.
  • [18] Christos Boutsidis and David P Woodruff. Optimal CUR matrix decompositions. In Proc. 46th ACM Symposium on Theory of Computing, pages 353–362, 2014. doi:10.1145/2591796.2591819.
  • [19] Christos Boutsidis, David P Woodruff, and Peilin Zhong. Optimal principal component analysis in distributed and streaming models. In Proc. 48th ACM Symposium on Theory of Computing, pages 236–249, 2016. doi:10.1145/2897518.2897646.
  • [20] Christos Boutsidis, Anastasios Zouzias, Michael W Mahoney, and Petros Drineas. Randomized dimensionality reduction for k-means clustering. IEEE Transactions on Information Theory, 61(2):1045–1062, 2014. doi:10.1109/TIT.2014.2375327.
  • [21] Difeng Cai and Jianlin Xia. A stable matrix version of the fast multipole method: stabilization strategies and examples. ETNA-Electronic Transactions on Numerical Analysis, 54, 2020.
  • [22] Kenneth L Clarkson and David P Woodruff. Low-rank approximation and regression in input sparsity time. Journal of the ACM, 63(6):1–45, 2017.
  • [23] Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proc. 47th ACM Symposium on Theory of Computing, pages 163–172, 2015. doi:10.1145/2746539.2746569.
  • [24] Jan JM Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numerische Mathematik, 36:177–195, 1980.
  • [25] TJ Dekker and JF Traub. The shifted QR algorithm for Hermitian matrices. Linear Algebra and its Applications, 4(3):137–154, 1971.
  • [26] James Demmel, Ioana Dumitriu, and Olga Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59–91, 2007. doi:10.1007/S00211-007-0114-X.
  • [27] James Demmel, Ioana Dumitriu, Olga Holtz, and Robert Kleinberg. Fast matrix multiplication is stable. Numerische Mathematik, 106(2):199–224, 2007. doi:10.1007/S00211-007-0061-6.
  • [28] James Demmel, Ioana Dumitriu, and Ryan Schneider. Fast and inverse-free algorithms for deflating subspaces. arXiv, 2024. arXiv:2310.00193.
  • [29] James Demmel, Ioana Dumitriu, and Ryan Schneider. Generalized Pseudospectral Shattering and Inverse-Free Matrix Pencil Diagonalization. Foundations of Computational Mathematics, pages 1–77, 2024.
  • [30] James Demmel and Krešimir Veselić. Jacobi’s method is more accurate than QR. SIAM Journal on Matrix Analysis and Applications, 13(4):1204–1245, 1992. doi:10.1137/0613074.
  • [31] James W Demmel. Applied numerical linear algebra. SIAM, 1997. doi:10.1137/1.9781611971446.
  • [32] Inderjit Singh Dhillon. A new O(N²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. University of California, Berkeley, 1997.
  • [33] Jack Dongarra and Francis Sullivan. Guest Editors Introduction to the top 10 algorithms. Computing in Science & Engineering, 2(01):22–23, 2000. doi:10.1109/MCISE.2000.814652.
  • [34] Jack J Dongarra and Danny C Sorensen. A fully parallel algorithm for the symmetric eigenvalue problem. SIAM Journal on Scientific and Statistical Computing, 8(2):s139–s154, 1987.
  • [35] Petros Drineas, Michael W Mahoney, and Shan Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844–881, 2008. doi:10.1137/07070471X.
  • [36] Jeff Erickson, Ivor Van Der Hoog, and Tillmann Miltzow. Smoothing the gap between NP and ER. SIAM Journal on Computing, 0(0):FOCS20–102–FOCS20–138, 2022.
  • [37] John GF Francis. The QR transformation a unitary analogue to the LR transformation—Part 1. The Computer Journal, 4(3):265–271, 1961. doi:10.1093/COMJNL/4.3.265.
  • [38] John GF Francis. The QR transformation—Part 2. The Computer Journal, 4(4):332–345, 1962.
  • [39] Alan Frieze, Ravi Kannan, and Santosh Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025–1041, 2004. doi:10.1145/1039488.1039494.
  • [40] Martin Fürer. Faster integer multiplication. In Proc. 39th ACM Symposium on Theory of Computing, pages 57–66, 2007. doi:10.1145/1250790.1250800.
  • [41] Doron Gill and Eitan Tadmor. An O(N²) Method for Computing the Eigensystem of N×N Symmetric Tridiagonal Matrices by the Divide and Conquer Approach. SIAM Journal on Scientific and Statistical Computing, 11(1):161–173, 1990. doi:10.1137/0911010.
  • [42] Gene Golub and William Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.
  • [43] Gene H Golub and Henk A Van der Vorst. Eigenvalue computation in the 20th century. Journal of Computational and Applied Mathematics, 123(1-2):35–65, 2000.
  • [44] Gene H Golub and Charles F Van Loan. Matrix Computations. Johns Hopkins University Press, 2013.
  • [45] Ming Gu and Stanley C Eisenstat. A stable and fast algorithm for updating the singular value decomposition, 1993.
  • [46] Ming Gu and Stanley C Eisenstat. A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. SIAM Journal on Matrix Analysis and Applications, 16(1):172–191, 1995. doi:10.1137/S0895479892241287.
  • [47] Ming Gu and Stanley C Eisenstat. Efficient algorithms for computing a strong rank-revealing qr factorization. SIAM Journal on Scientific Computing, 17(4):848–869, 1996. doi:10.1137/0917055.
  • [48] Juris Hartmanis and Janos Simon. On the power of multiplication in random access machines. In 15th Annual Symposium on Switching and Automata Theory, pages 13–23. IEEE, 1974. doi:10.1109/SWAT.1974.20.
  • [49] David Harvey and Joris Van Der Hoeven. Integer multiplication in time O(n log n). Annals of Mathematics, 193(2):563–617, 2021.
  • [50] Nicholas J Higham. Accuracy and stability of numerical algorithms. SIAM, 2002.
  • [51] Walter Hoffmann and Beresford N Parlett. A new proof of global convergence for the tridiagonal QL algorithm. SIAM Journal on Numerical Analysis, 15(5):929–937, 1978.
  • [52] Alston S Householder. Unitary triangularization of a nonsymmetric matrix. Journal of the ACM, 5(4):339–342, 1958. doi:10.1145/320941.320947.
  • [53] C.G.J. Jacobi. Über ein leichtes Verfahren die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen. Journal für die reine und angewandte Mathematik, 30:51–94, 1846. URL: http://eudml.org/doc/147275.
  • [54] Ian T Jolliffe. Principal component analysis for special types of data. Springer, 2002.
  • [55] Praneeth Kacham and David P Woodruff. Faster Algorithms for Schatten-p Low Rank Approximation. In Proc. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
  • [56] Vera N Kublanovskaya. On some algorithms for the solution of the complete eigenvalue problem. USSR Computational Mathematics and Mathematical Physics, 1(3):637–657, 1962.
  • [57] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4), 1950.
  • [58] Oren E Livne and Achi Brandt. N roots of the secular equation in O(N) operations. SIAM Journal on Matrix Analysis and Applications, 24(2):439–453, 2002. doi:10.1137/S0895479801383695.
  • [59] Anand Louis and Santosh S Vempala. Accelerated newton iteration for roots of black box polynomials. In Proc. 57th IEEE Symposium on Foundations of Computer Science, pages 732–740. IEEE, 2016. doi:10.1109/FOCS.2016.83.
  • [60] RV Mises and Hilda Pollaczek-Geiringer. Praktische Verfahren der Gleichungsauflösung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 9(1):58–77, 1929.
  • [61] Cameron Musco, Christopher Musco, and Aaron Sidford. Stability of the Lanczos method for matrix function approximation. In Proc. 29th ACM-SIAM Symposium on Discrete Algorithms, pages 1605–1624. SIAM, 2018. doi:10.1137/1.9781611975031.105.
  • [62] Yuji Nakatsukasa, Zhaojun Bai, and François Gygi. Optimizing Halley’s iteration for computing the matrix polar decomposition. SIAM Journal on Matrix Analysis and Applications, 31(5):2700–2720, 2010. doi:10.1137/090774999.
  • [63] Yuji Nakatsukasa and Nicholas J Higham. Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM Journal on Scientific Computing, 35(3):A1325–A1349, 2013. doi:10.1137/120876605.
  • [64] Deanna Needell, William Swartworth, and David P Woodruff. Testing positive semidefiniteness using linear measurements. In Proc. 2022 IEEE Symposium on Foundations of Computer Science, pages 87–97. IEEE, 2022. doi:10.1109/FOCS54457.2022.00016.
  • [65] Dianne P O’Leary and Gilbert W Stewart. Computing the eigenvalues and eigenvectors of symmetric arrowhead matrices. Journal of Computational Physics, 90(2):497–505, 1990.
  • [66] Xiaofeng Ou and Jianlin Xia. Superdc: superfast divide-and-conquer eigenvalue decomposition with improved stability for rank-structured matrices. SIAM Journal on Scientific Computing, 44(5):A3041–A3066, 2022.
  • [67] Chris C Paige. Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem. Linear algebra and its Applications, 34:235–258, 1980.
  • [68] Victor Y Pan and Zhao Q Chen. The complexity of the matrix eigenproblem. In Proc. 31st ACM Symposium on Theory of Computing, pages 507–516, 1999. doi:10.1145/301250.301389.
  • [69] Christos H Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: A probabilistic analysis. In Proc. 17th ACM Symposium on Principles of Database Systems, pages 159–168, 1998. doi:10.1145/275487.275505.
  • [70] Beresford N Parlett. The Symmetric Eigenvalue Problem. SIAM, 1998.
  • [71] Vladimir Rokhlin. Rapid solution of integral equations of classical potential theory. Journal of computational physics, 60(2):187–207, 1985.
  • [72] H Rutishauser. On Jacobi rotation patterns. In Proceedings of Symposia in Applied Mathematics, volume 15, pages 219–239, 1963.
  • [73] Yousef Saad. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011.
  • [74] Ryan Schneider. Pseudospectral divide-and-conquer for the generalized eigenvalue problem. University of California, San Diego, 2024.
  • [75] Arnold Schönhage. Unitäre transformationen großer matrizen. Numerische Mathematik, 20:409–417, 1972.
  • [76] Arnold Schönhage. On the power of random access machines. In Proc. International Colloquium on Automata, Languages, and Programming, pages 520–529. Springer, 1979. doi:10.1007/3-540-09510-1_42.
  • [77] Arnold Schönhage and Volker Strassen. Fast multiplication of large numbers. Computing, 7:281–292, 1971.
  • [78] Rikhav Shah. Hermitian Diagonalization in Linear Precision. In Proc. 2025 ACM-SIAM Symposium on Discrete Algorithms, pages 5599–5615. SIAM, 2025. doi:10.1137/1.9781611978322.192.
  • [79] Aleksandros Sobczyk. Deterministic complexity analysis of Hermitian eigenproblems. arXiv, 2025. doi:10.48550/arXiv.2410.21550.
  • [80] Aleksandros Sobczyk, Marko Mladenovic, and Mathieu Luisier. Invariant subspaces and PCA in nearly matrix multiplication time. Advances in Neural Information Processing Systems, 37:19013–19086, 2024.
  • [81] Nevena Jakovčević Stor, Ivan Slapničar, and Jesse L Barlow. Accurate eigenvalue decomposition of real symmetric arrowhead matrices and applications. Linear Algebra and its Applications, 464:62–89, 2015.
  • [82] William Swartworth and David P Woodruff. Optimal eigenvalue approximation via sketching. In Proc. 55th ACM Symposium on Theory of Computing, pages 145–155, 2023. doi:10.1145/3564246.3585102.
  • [83] James Vogel, Jianlin Xia, Stephen Cauley, and Venkataramanan Balakrishnan. Superfast divide-and-conquer method and perturbation analysis for structured eigenvalue solutions. SIAM Journal on Scientific Computing, 38(3):A1358–A1382, 2016. doi:10.1137/15M1018812.
  • [84] James Hardy Wilkinson. Global convergence of tridiagonal QR algorithm with origin shifts. Linear Algebra and its Applications, 1(3):409–420, 1968.