
Simple Norm Bounds for Polynomial Random Matrices via Decoupling

Madhur Tulsiani (Toyota Technological Institute at Chicago, IL, USA) and June Wu (University of Chicago, IL, USA)
Abstract

We present a new method for obtaining norm bounds for random matrices, where each entry is a low-degree polynomial in an underlying set of independent real-valued random variables. Such matrices arise in a variety of settings in the analysis of spectral and optimization algorithms, which require understanding the spectrum of a random matrix depending on data obtained as independent samples.

Using ideas of decoupling and linearization from analysis, we show a simple way of expressing norm bounds for such matrices, in terms of matrices of lower-degree polynomials corresponding to derivatives. Iterating this method gives a simple bound with an elementary proof, which can recover many bounds that previously required more involved techniques.

Keywords and phrases:
Matrix Concentration, Decoupling, Graph Matrices
Funding:
Madhur Tulsiani: NSF grants CCF-1816372 and CCF-2326685.
Copyright and License:
© Madhur Tulsiani and June Wu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Probability and statistics; Theory of computation → Theory and algorithms for application domains
Editors:
Raghu Meka

1 Introduction

For fixed matrices 𝐂𝟏,,𝐂𝐧 and independent scalar random variables x1,,xn, consider the problem of analyzing the random matrix

𝐌 = 𝐂_1 x_1 + ⋯ + 𝐂_n x_n.

Note that the entries of the random matrix 𝐌 are not necessarily independent, but are (possibly correlated) linear functions of the independent random variables x_1,…,x_n. Such matrices, which arise in a variety of applications in algorithms, statistics, and numerical linear algebra, can often be shown to concentrate around their mean, using a rich selection of matrix deviation inequalities [34]. Moreover, when the random variables x_1,…,x_n are Gaussian, recent breakthrough results have obtained even sharper concentration guarantees depending on the structure of the matrices 𝐂_1,…,𝐂_n [5, 9], already leading to applications in discrepancy theory [6] and quantum information theory [20].

In contrast to the above random matrices which are linear functions of independent random variables, several recent applications in spectral algorithms and lower bounds for statistical problems require understanding the (expected) norms of random matrices of the form

𝐅 = ∑_{S ⊆ [n], |S| ≤ d} 𝐂_S ∏_{i∈S} x_i.

The above random matrix, which we denote as a matrix-valued function 𝐅(𝐱), is a (multilinear, in the example above) low-degree polynomial in the vector of random variables 𝐱 = (x_1,…,x_n). Such matrix-valued polynomial functions arise naturally in a variety of applications, including algorithms for tensor decomposition and completion [13, 12, 18, 10], orbit recovery [24], power-sum decompositions and learning of Gaussian mixtures [4], and lower bounds for Sum-of-Squares hierarchies [2, 7, 16, 28]. In general, such matrices are often an important technical component in the analysis of spectral algorithms, which require understanding the spectrum of some random matrix depending on data obtained as independent samples. Given the expressive power of polynomials, one can often write the matrix entries as low-degree polynomials in the data.

In several applications above, one is interested in obtaining bounds on the spectral norm of the deviation matrix 𝐅(𝐱) − 𝔼𝐅(𝐱), which hold with high probability over the choice of 𝐱. Although matrix deviation inequalities for nonlinear random matrices have been a subject of active research in recent work [26, 22, 3, 14, 15], the analyses in the applications above have often needed to rely on estimating matrix norms via direct computations of trace moments. While these do yield the sharp bounds required for applications, they require somewhat intricate computations and ingenious combinatorial arguments tailored to the applications at hand.

Analyzing concentration via moment estimates.

We consider concentration bounds on a scalar polynomial f(𝐱) with mean zero. Using Markov’s inequality, this can be reduced to computing moment estimates, since

ℙ[|f(𝐱)| ≥ λ] = ℙ[(f(𝐱))^{2t} ≥ λ^{2t}] ≤ λ^{−2t} · 𝔼[(f(𝐱))^{2t}].

Note that while in some cases 𝔼[(f(𝐱))^{2t}] can be computed by direct expansion, it often involves an intricate analysis of the structure of terms with degrees growing with t, and therefore indirect methods may be more convenient. Methods for obtaining concentration of (scalar) polynomial functions have been the subject of a large body of work, including results by Kim and Vu [17], Latała [21], Schudy and Sviridenko [31, 32], Adamczak and Wolff [1], and Bobkov, Götze and Sambale [8]. A useful comparison to direct computation is the method of Adamczak and Wolff, which is applicable to functions f(𝐱) of a broad range of random vectors 𝐱 (with not necessarily independent coordinates) from a distribution obeying the Poincaré inequality. They obtain such moment bounds by recursive applications of the Poincaré inequality, reducing the computation of the moments of f to that of its derivatives. For a degree-d polynomial function f, one can consider its coefficients as forming a (symmetric) order-d tensor, and their results obtain bounds on the moments of f in terms of the norms of various lower-order "flattenings" of this coefficient tensor. Thus, the problem of estimating moments of the random variable f(𝐱) can be reduced, in a black-box way, to understanding the norms of a small number (depending on d) of deterministic tensors. Any dependence on the problem structure can then be limited to understanding these norms, where often crude estimates can suffice.
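To make the moment-method reduction concrete, here is a minimal numerical sketch (our own illustration, not from the paper): it estimates both sides of the Markov bound above for a small hand-picked multilinear polynomial with Rademacher inputs. The polynomial f, the threshold λ = 3, and the choice t = 2 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, lam, trials = 4, 2, 3.0, 200_000

# A mean-zero multilinear quadratic: f(x) = x1*x2 + x2*x3 + x3*x4.
def f(x):
    return x[..., 0] * x[..., 1] + x[..., 1] * x[..., 2] + x[..., 2] * x[..., 3]

x = rng.choice([-1.0, 1.0], size=(trials, n))        # Rademacher inputs
vals = f(x)

tail = np.mean(np.abs(vals) >= lam)                   # P[|f(x)| >= lam]
bound = np.mean(vals ** (2 * t)) / lam ** (2 * t)     # lam^{-2t} E[(f(x))^{2t}]
print(tail, bound)  # here the tail is ~0.25 and the moment bound ~0.26
```

For this particular polynomial the bound happens to be nearly tight; in general the slack depends on the choice of t.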

The matrix analog of the Markov argument involves the Schatten 2t-norm ‖·‖_{2t}, defined for a matrix 𝐌 with non-zero singular values σ_1,…,σ_r by ‖𝐌‖_{2t}^{2t} := ∑_{j∈[r]} σ_j^{2t}. For a function 𝐅 with 𝔼[𝐅(𝐱)] = 𝟎, we have the following bound using Schatten norms.

ℙ[σ_1(𝐅) ≥ λ] ≤ λ^{−2t} · 𝔼‖𝐅‖_{2t}^{2t} = λ^{−2t} · 𝔼 tr[(𝐅(𝐱)^𝖳 𝐅(𝐱))^t].

Several applications requiring norm bounds for matrix-valued polynomial functions start with the above inequality, and often rely on direct expansion of the trace for the matrix power (𝐅(𝐱)^𝖳𝐅(𝐱))^t. Recall that when 𝐀 is the adjacency matrix of a graph, tr(𝐀^{2t}) can be interpreted as the number of closed walks of length 2t. Similarly, when working with matrices 𝐅(𝐱) where each entry can be interpreted as arising from a combinatorial pattern, the trace of (𝐅(𝐱)^𝖳𝐅(𝐱))^t can be viewed as counting copies of such patterns "chained" together in a certain way. Estimating the trace then amounts to estimating the (expected) number of such chainings [2].
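The walk-counting identity behind the trace method can be checked directly. The following sketch (our own illustration) verifies that tr(𝐀^{2t}) counts the closed walks of length 2t in a small graph, here a 4-cycle:

```python
import numpy as np
from itertools import product

# Adjacency matrix of a 4-cycle; tr(A^{2t}) counts closed walks of length 2t.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
t = 2
trace_power = np.trace(np.linalg.matrix_power(A, 2 * t))

# Brute-force count of closed walks v0 -> v1 -> ... -> v_{2t-1} -> v0.
count = 0
for walk in product(range(4), repeat=2 * t):
    count += all(A[walk[i], walk[(i + 1) % (2 * t)]] for i in range(2 * t))
print(trace_power, count)   # both equal the number of closed walks of length 2t
```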

Obtaining bounds on 𝔼‖𝐅‖_{2t}^{2t} via this method requires first using problem structure to decompose 𝐅 into matrices with such combinatorial patterns, and then using (ingenious) combinatorial arguments to understand the expected number of occurrences of such chains of patterns. The focus of this work is understanding alternative general methods for estimating such norm bounds, when the underlying problem may not necessarily have a combinatorial flavor, or when such decompositions may be hard to analyze.

Decoupling.

Decoupling inequalities were developed in the study of U-statistics [27], multiple stochastic integration [23], and polynomial chaos [19], and have found important applications in applied mathematics, theoretical computer science, applied probability theory, and statistics. Some examples include the study of compressed sensing [30], query complexity [25], the proof of the Hanson–Wright inequality [35], and learning mixtures of Gaussians [11]. Moreover, the inequalities are applicable in both scalar and matrix settings, and even more broadly in Banach spaces.

For a degree-d homogeneous multilinear polynomial f(𝐱) in n independent random variables, standard decoupling inequalities (see Section 3) can be used to obtain

𝔼_𝐱[(f(𝐱))^{2t}] ≤ C_d^{2t} · 𝔼_{𝐱^{(1)},…,𝐱^{(d)}}[(f(𝐱^{(1)},…,𝐱^{(d)}))^{2t}],

where f(𝐱(1),,𝐱(d)) denotes the polynomial in dn random variables obtained by replacing the d variables in each (ordered) monomial by the corresponding d coordinates coming from the d independent random vectors 𝐱(1),,𝐱(d), and Cd is a constant depending on the degree d. One can now fix the vectors 𝐱(1),,𝐱(d1), and consider the expectation

𝔼_{𝐱^{(d)}}[(f(𝐱^{(1)},…,𝐱^{(d)}))^{2t}] = 𝔼_{𝐱^{(d)}}[(∑_{i∈[n]} f_i(𝐱^{(1)},…,𝐱^{(d−1)}) · x_i^{(d)})^{2t}],

where we write f(𝐱(1),,𝐱(d)) as a linear form in 𝐱(d) with coefficients fi(𝐱(1),,𝐱(d1)). The moments of such a linear form can be understood (for example) using the Khintchine inequality, which bounds it in terms of a polynomial depending only on 𝐱(1),,𝐱(d1).
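The scalar decoupling step can be illustrated numerically. The sketch below (our own, with an arbitrary coefficient matrix a and Rademacher inputs) compares the moments of a degree-2 multilinear form with those of its decoupled version, scaled by C_d^{2t} with C_d = d^d = 4:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 6, 2, 200_000

# Coefficients of a degree-2 multilinear form f(x) = sum_{i != j} a_ij x_i x_j.
a = rng.standard_normal((n, n))
np.fill_diagonal(a, 0.0)

x = rng.choice([-1.0, 1.0], size=(trials, n))
x1 = rng.choice([-1.0, 1.0], size=(trials, n))   # independent copy x^{(1)}
x2 = rng.choice([-1.0, 1.0], size=(trials, n))   # independent copy x^{(2)}

coupled = np.einsum('ij,ti,tj->t', a, x, x)       # f(x)
decoupled = np.einsum('ij,ti,tj->t', a, x1, x2)   # f(x^{(1)}, x^{(2)})

lhs = np.mean(coupled ** (2 * t))
rhs = 4 ** (2 * t) * np.mean(decoupled ** (2 * t))   # C_d^{2t} with C_d = 4
print(lhs, rhs)   # empirically lhs <= rhs, with plenty of slack
```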

Similarly, in the matrix case, we use decoupling inequalities to reduce the problem of estimating moments for degree-d homogeneous (and multilinear) matrix-valued polynomials, to that of estimating moments for linear matrix-valued functions. At this point, one can apply standard and well-known linear matrix concentration inequalities (we use the matrix Rosenthal inequality) which yield a bound in terms of a degree-(d1) matrix-valued polynomial function of the vectors 𝐱(1),,𝐱(d1). Iterating this process gives a simple method for obtaining norm bounds, applicable for multilinear polynomial functions on n independent random variables (with any reasonable distribution).

While this method does not immediately apply for non-multilinear polynomials, it suffices for many of the applications mentioned above. Moreover, for arbitrary polynomials in Gaussian random variables, we can also obtain a simple recursion as an immediate consequence of the more recent Poincaré inequalities of Huang and Tropp [15]. Together, these suffice for most applications of interest.

Methods and results: a technical overview

Let x_1,…,x_n be i.i.d. real-valued random variables with 𝔼[x_i] = 0, 𝔼[x_i^2] = 1 and |x_i| ≤ L for each i∈[n]. Let 𝒯_n^d ⊆ [n]^d denote the set of ordered d-tuples 𝐢 = (i_1,…,i_d) with i_1,…,i_d all distinct. For a fixed sequence of (deterministic) matrices {𝐀_𝐢}_{𝐢∈𝒯_n^d} ⊆ ℝ^{d_1×d_2}, we consider a matrix-valued multilinear polynomial function defined as

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} 𝐀_𝐢 ∏_{j∈[d]} x_{i_j}.

We write the polynomial in terms of ordered tuples for decoupling, but since the variables x1,,xn commute, we can assume that for any permutation σ:[d][d] permuting the d coordinates of each tuple, 𝐀𝐢=𝐀σ(𝐢) (this is referred to as permutation symmetry of 𝐅). We can use the decoupling inequalities from Section 3 to show that

‖𝐅(𝐱)‖_{4t} ≤ C_d · ‖𝐅(𝐱^{(1)},…,𝐱^{(d)})‖_{4t} = C_d · ‖∑_{𝐢∈𝒯_n^d} 𝐀_𝐢 ∏_{j∈[d]} x_{i_j}^{(j)}‖_{4t},

where ‖·‖_{4t} denotes the (expected) Schatten norm (𝔼[tr(𝐅(𝐱)^𝖳𝐅(𝐱))^{2t}])^{1/4t}.

We remark that while the constant C_d is of the form d^d in general, this can be improved when the underlying set of indices [n] has more structure. For example, when [n] corresponds to the set of pairs in a base set [r] and each of the monomials involves at most k elements of [r] (has index degree k), it is possible to improve the constant to k^k. This is essentially the "vertex partitioning" argument used in works on graph matrices, and is discussed in Section 3 in the language of decoupling.

As mentioned earlier, we can now "linearize" the function 𝐅(𝐱^{(1)},…,𝐱^{(d)}) by fixing 𝐱^{(1)},…,𝐱^{(d−1)} and treating it only as a function of 𝐱^{(d)}, i.e., we consider

𝔼‖∑_{𝐢∈𝒯_n^d} 𝐀_𝐢 ∏_{j∈[d]} x_{i_j}^{(j)}‖_{4t}^{4t} = 𝔼_{𝐱^{(1)},…,𝐱^{(d−1)}} 𝔼_{𝐱^{(d)}} ‖∑_{k∈[n]} (∑_{𝐢∈𝒯_n^d : i_d=k} 𝐀_𝐢 ∏_{j∈[d−1]} x_{i_j}^{(j)}) · x_k^{(d)}‖_{4t}^{4t}
= 𝔼_{𝐱^{(1)},…,𝐱^{(d−1)}} 𝔼_{𝐱^{(d)}} ‖∑_{k∈[n]} (∂_{x_k^{(d)}}𝐅) · x_k^{(d)}‖_{4t}^{4t}.

To bound the inner expectation, we can now use the matrix Rosenthal inequality (Lemma 8) which says that for a collection of centered, independent random matrices {𝐘k} with finite moments, we have

𝔼‖∑_k 𝐘_k‖_{4t}^{4t} ≤ (16t)^{3t} {‖(∑_k 𝔼 𝐘_k𝐘_k^𝖳)^{1/2}‖_{4t}^{4t} + ‖(∑_k 𝔼 𝐘_k^𝖳𝐘_k)^{1/2}‖_{4t}^{4t}} + (8t)^{4t} (∑_k 𝔼‖𝐘_k‖_{4t}^{4t}).

Taking 𝐘_k = (∂_{x_k^{(d)}}𝐅) · x_k^{(d)}, we can now compute the expectations (over 𝐱^{(d)}) in the RHS as

∑_k 𝔼 𝐘_k𝐘_k^𝖳 = [∂_{x_1^{(d)}}𝐅 ⋯ ∂_{x_n^{(d)}}𝐅] · [∂_{x_1^{(d)}}𝐅 ⋯ ∂_{x_n^{(d)}}𝐅]^𝖳,
∑_k 𝔼 𝐘_k^𝖳𝐘_k = [∂_{x_1^{(d)}}𝐅^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅^𝖳] · [∂_{x_1^{(d)}}𝐅^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅^𝖳]^𝖳,
∑_k 𝔼‖𝐘_k‖_{4t}^{4t} = 𝔼‖diag(x_1^{(d)} ∂_{x_1^{(d)}}𝐅, …, x_n^{(d)} ∂_{x_n^{(d)}}𝐅)‖_{4t}^{4t} ≤ L^{4t} · ‖diag(∂_{x_1^{(d)}}𝐅, …, ∂_{x_n^{(d)}}𝐅)‖_{4t}^{4t}.
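The three identities above are straightforward to verify numerically. In the sketch below (our own illustration), fixed matrices D_k play the role of the derivatives ∂_{x_k^{(d)}}𝐅 (which are constant once 𝐱^{(1)},…,𝐱^{(d−1)} are fixed), and we check how the Rosenthal terms assemble into the row, column, and diagonal block matrices, up to transposes that do not change Schatten norms:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(2)
n, p, q = 4, 3, 2
D = [rng.standard_normal((p, q)) for _ in range(n)]   # stand-ins for the derivatives

# With Y_k = D_k x_k and E[x_k^2] = 1, the Rosenthal terms become deterministic.
row = np.hstack(D)                  # blocks side by side
col = np.vstack(D)                  # blocks stacked
diag = block_diag(*D)               # blocks on the diagonal

sum_YYt = sum(Dk @ Dk.T for Dk in D)        # sum_k E[Y_k Y_k^T]
assert np.allclose(sum_YYt, row @ row.T)    # = [D_1 ... D_n][D_1 ... D_n]^T

sum_YtY = sum(Dk.T @ Dk for Dk in D)        # sum_k E[Y_k^T Y_k]
assert np.allclose(sum_YtY, col.T @ col)

schatten = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
# The last Rosenthal term collects the blocks on a diagonal:
assert np.isclose(schatten(diag, 4), sum(schatten(Dk, 4) for Dk in D))
print("identities verified")
```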

Iterating the above argument, we can prove a bound in terms of a collection of partial derivative matrices 𝐅_{a,b,c} (see Section 2 for details).

Note that in the above definition, we do not specify the order of differentiation, which can be ignored due to the permutation symmetry of 𝐅 and the fact that changing the order does not change the Schatten norms (see Section 2 and Section 4 for details). This argument yields the following bound for all homogeneous multilinear polynomial functions.

Theorem 1 (Restatement of Theorem 10).

Let 𝐱 = {x_i}_{i=1}^n be a sequence of i.i.d. random variables with 𝔼x_i = 0, 𝔼x_i^2 = 1 and |x_i| ≤ L for all 1 ≤ i ≤ n. Let {𝐀_𝐢}_{𝐢∈𝒯_n^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a permutation symmetric, homogeneous multilinear polynomial random matrix of degree d as

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b,c ≥ 0 with d = a+b+c. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱)‖_{4t}^{4t} ≤ ∑_{a,b,c : a+b+c=d} (48dt)^{4dt} L^{4ct} ‖𝐅_{a,b,c}‖_{4t}^{4t}.

We remark that the condition |x_i| ≤ L can be replaced by a condition on the growth of the moments of x_i, which holds for subgaussian variables. The above bound is in terms of the norms of a constant number of deterministic matrices 𝐅_{a,b,c}, as is also the case for general moment bounds on scalar polynomials [1]. We will see later that these deterministic matrices can easily be interpreted in several cases of interest, to recover the bounds obtained via combinatorial methods.
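To see what the deterministic matrices 𝐅_{a,b,c} look like concretely, the following sketch (our own, for d = 2 with 2×2 coefficient blocks) assembles all six of them from the second partials ∂_i∂_j𝐅 = 2𝐀_{ij} and prints the Schatten norms entering the bound of Theorem 1. The particular ordering of blocks is one of several equivalent choices, since Schatten norms are invariant to it:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
n = 4
# Symmetric 2x2 coefficient blocks A[i,j] = A[j,i], zero on the diagonal,
# for F(x) = sum_{i != j} A[i,j] x_i x_j (degree d = 2).
A = rng.standard_normal((n, n, 2, 2))
A = (A + A.transpose(1, 0, 2, 3)) / 2
A[np.arange(n), np.arange(n)] = 0.0

dd = [[2 * A[i, j] for j in range(n)] for i in range(n)]   # d_i d_j F

schatten = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
pairs = [(i, j) for i in range(n) for j in range(n)]

F200 = np.vstack([dd[i][j] for i, j in pairs])        # both derivatives go "down"
F020 = np.hstack([dd[i][j] for i, j in pairs])        # both go "right"
F110 = np.block(dd)                                   # one down, one right
F002 = block_diag(*[dd[i][j] for i, j in pairs])      # both on the diagonal
F101 = block_diag(*[np.vstack([dd[i][j] for i in range(n)]) for j in range(n)])
F011 = block_diag(*[np.hstack([dd[i][j] for i in range(n)]) for j in range(n)])

for name, M in [("F200", F200), ("F110", F110), ("F020", F020),
                ("F101", F101), ("F011", F011), ("F002", F002)]:
    print(name, schatten(M, 4) ** 0.25)   # the norms entering Theorem 1 (t = 1)
```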

Gaussian polynomial matrices.

While the above bounds are only for multilinear polynomials, they can also be extended to arbitrary polynomials of independent Gaussian random variables, using standard techniques to approximate them by multilinear polynomials. However, for the case of Gaussian polynomials, the recent Poincaré inequalities of Huang and Tropp [14] directly yield a simple bound, which is easier to apply. They show that for a Hermitian matrix-valued function 𝐇 of Gaussian random variables,

𝔼‖𝐇(𝐱) − 𝔼𝐇(𝐱)‖_{2t}^{2t} ≤ (2t)^{2t} · 𝔼 tr(∑_{i=1}^n (∂_i𝐇(𝐱))^2)^t.

As before, one can apply this argument inductively to obtain a bound in terms of the matrices 𝐅_{a,b}, which are defined similarly to the matrices 𝐅_{a,b,c} above, with c = 0. Since we no longer have 𝔼𝐅_{a,b} = 𝟎 for a+b < d, the bound is in terms of the expected matrices 𝔼𝐅_{a,b} for all a,b with a+b ≤ d.

Theorem 2 (Restatement of Theorem 14).

Let 𝐱 ∼ 𝒩(0,𝕀_n) and let {𝐀_𝐢}_{𝐢∈[n]^d} be a sequence of deterministic matrices of the same dimension. Define a degree-d homogeneous Gaussian polynomial random matrix as

𝐅(𝐱) = ∑_{𝐢∈[n]^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b ≥ 0. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{2t}^{2t} ≤ (2d^2t)^{2t} (∑_{1≤a+b≤d} ‖𝔼𝐅_{a,b}‖_{2t}^{2t}).

While the above theorem is stated for homogeneous polynomials, the proof is identical for non-homogeneous ones, with the sum in the definition ranging over all tuple sizes.
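As a quick sanity check of the Gaussian recursion (our own illustration, not from the paper), the d = 1 case of Theorem 2 can be tested by Monte Carlo: for a linear Gaussian matrix polynomial, the matrices 𝐅_{1,0} and 𝐅_{0,1} are simply the deterministic stacks of the coefficients (up to transposes, which do not affect Schatten norms).

```python
import numpy as np

rng = np.random.default_rng(4)
n, t, trials = 5, 2, 20_000
A = [rng.standard_normal((3, 3)) for _ in range(n)]

schatten_pow = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()

# Monte-Carlo estimate of E ||F(x)||_{2t}^{2t} for F(x) = sum_i A_i x_i.
lhs = np.mean([schatten_pow(sum(a * g for a, g in zip(A, rng.standard_normal(n))),
                            2 * t)
               for _ in range(trials)])

# Theorem 2 with d = 1: the right-hand side uses the deterministic stacks.
rhs = (2 * t) ** (2 * t) * (schatten_pow(np.vstack(A), 2 * t)
                            + schatten_pow(np.hstack(A), 2 * t))
print(lhs, rhs)   # lhs is (much) smaller than rhs
```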

Do try this at home: applying the framework

We now discuss how to interpret the matrices 𝐅a,b,c arising in the bound in Theorem 1 to recover the norm bounds for graph matrices derived via combinatorial methods. This is discussed with more formal details in Section 6, but we present an intuitive (at least for the authors) version of the argument here.

Graph matrices are defined using constant-sized template "shapes" and provide a convenient basis for expressing (large) random matrices where the entries are low-degree polynomials in the (normalized) indicators of edges in a G_{n,p} random graph. Such matrices and their norm bounds have been used for a large number of applications [2].

Let N = \binom{n}{2}, fix a canonical bijection between [N] and \binom{[n]}{2}, and let [N] index the space of all possible edges in a random graph on n vertices. For each e = {i,j} ∈ \binom{[n]}{2}, letting Δ_p = √((1−p)/p), let x_e independently be −1/Δ_p with probability (1−p) and Δ_p with probability p, so that 𝔼x_e = 0, 𝔼x_e^2 = 1, and |x_e| ≤ Δ_p.

Let (V(τ), E(τ)) be a graph on k vertices with d edges, and let U_τ, V_τ be ordered subsets of V(τ) of sizes k_1 and k_2 respectively. The tuple τ = (V(τ), E(τ), U_τ, V_τ) is referred to as a "shape" and is used to define the following graph matrix 𝐌_τ, with rows and columns indexed by 𝒯_n^{k_1} and 𝒯_n^{k_2} respectively:

𝐌_τ[𝐢,𝐣] := ∑_{ψ∈𝒯_n^k} 𝟙{ψ(U_τ) = 𝐢} · 𝟙{ψ(V_τ) = 𝐣} · ∏_{e_0∈E(τ)} x_{ψ(e_0)},

where we interpret a tuple ψ∈𝒯_n^k as an injective function ψ : V(τ) → [n] in the canonical way. If the function ψ maps U_τ to 𝐢 (the row index) and V_τ to 𝐣 (the column index), we add to 𝐌_τ[𝐢,𝐣] the monomial given by the product of the variables indexed by the images of all edges in the "pattern" E(τ) under ψ. Note that each nonzero entry of the matrix-valued function 𝐌_τ is a multilinear homogeneous polynomial in the variables {x_e}_{e∈\binom{[n]}{2}} of degree d = |E(τ)|. We denote 𝐌_τ by 𝐅 for ease of notation in the discussion below.
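A graph matrix for the simplest nontrivial shape, a length-2 path u–w–v with U_τ = (u) and V_τ = (v), can be built in a few lines; the sketch below (our own illustration) uses the p-biased variables defined above and compares the spectral norm with the n^{(k−r)/2} scaling discussed later (k = 3 and minimum separator size r = 1 for this shape):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 0.5

# p-biased, normalized edge variables x_e with E x_e = 0, E x_e^2 = 1.
Delta = np.sqrt((1 - p) / p)
X = np.where(rng.random((n, n)) < p, Delta, -1 / Delta)
X = np.triu(X, 1)
X = X + X.T                                # symmetric with zero diagonal

# Shape tau: a path u - w - v with U_tau = (u), V_tau = (v), so k = 3, d = 2.
# M[i, j] = sum over a not in {i, j} of x_{ia} x_{aj}; the zero diagonal of X
# already kills a = i and a = j, and i = j is excluded by injectivity of psi.
M = X @ X
np.fill_diagonal(M, 0.0)

r = 1                                      # the minimum vertex separator is {w}
print(np.linalg.norm(M, 2), n ** ((3 - r) / 2))   # both scale as n, up to constants
```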

Figure 1: A shape τ.

As will be apparent from the analysis below, the norm bounds for such matrices have different characterizations in the "dense" regime where p = Ω(1) and the "sparse" one where p = o(1). This is because the subgaussian norm L = Δ_p ≈ p^{−1/2} is bounded in the first case, and growing in the second case. The bounds for the first case are stated in terms of the size of the minimum vertex separator separating U_τ from V_τ in the shape τ. For the second case, one needs to augment the definition of the minimum vertex separator, to use a cost based on the subgaussian norm Δ_p. We will show how to recover both these characterizations, using the matrices 𝐅_{a,b,c} given by Theorem 1.

Dense graph matrices.

We start by considering p=1/2 so that the random variables xe are just independent Rademacher variables. Taking d to be constant, the bound in Theorem 1 involves only a constant number of matrices, and will depend on the dominant term. To understand each of the terms, we first consider the matrices derived after one step of the recursion.

𝐅_{1,0,0} = [∂_{x_1}𝐅 ⋯ ∂_{x_N}𝐅]^𝖳,  𝐅_{0,1,0} = [∂_{x_1}𝐅 ⋯ ∂_{x_N}𝐅],  and  𝐅_{0,0,1} = diag(∂_{x_1}𝐅, …, ∂_{x_N}𝐅).

Since tr(∑_i 𝐀_i𝐀_i^𝖳)^{2t} + tr(∑_i 𝐀_i^𝖳𝐀_i)^{2t} ≥ ∑_i tr(𝐀_i𝐀_i^𝖳)^{2t} and L = 1, we will have that ‖𝐅_{1,0,0}‖_{4t}^{4t} + ‖𝐅_{0,1,0}‖_{4t}^{4t} ≥ L^{4t}‖𝐅_{0,0,1}‖_{4t}^{4t}. This will also be true (up to constant factors) for any constant L, since we are interested in the 4t-th root of the above trace. Using a similar argument, we can ignore terms with c > 0 for now (we will come back to these in the sparse case) and focus on matrices 𝐅_{a,b,0}.

We now interpret the matrices 𝐅_{1,0,0} and 𝐅_{0,1,0}. The row-space of 𝐅_{1,0,0} is now indexed by pairs (𝐢,e) for e∈\binom{[n]}{2} (respectively (𝐣,e) for the column space of 𝐅_{0,1,0}). An entry now includes terms corresponding to maps ψ for which ψ(U_τ) = 𝐢, ψ(V_τ) = 𝐣, and ψ(e_0) = e for some e_0∈E(τ), so that ∂_{x_e}𝐅[𝐢,𝐣] ≠ 0.

Decomposing 𝐅_{1,0,0} into |E(τ)| matrices (one for each choice of e_0), we can view each 𝐅_{1,0,0,e_0} as simply a new graph matrix where we have deleted the edge e_0 and included both its endpoints in U_τ (respectively in V_τ for 𝐅_{0,1,0}) to obtain a new shape τ′. Similarly, each 𝐅_{a,b,0} can be decomposed into |E(τ)|^{a+b} graph matrices, where we delete a+b edges in total, and include the endpoints in U_τ for a of them and in V_τ for b of them. This is illustrated in Figure 2, where we color the edges red or green depending on their inclusion in U_τ or V_τ.

Figure 2: 𝐅1,0,0, 𝐅0,1,0 and 𝐅2,3,0.

The bound in Theorem 1 is then in terms of these new graph matrices where we delete all d edges, and split their endpoints between U_τ and V_τ to obtain U, V with U∪V = V(τ) = [k]. For an entry 𝐌[𝐢,𝐣] of any such matrix, there is at most one ψ : V(τ) → [n] such that ψ(U) = 𝐢 and ψ(V) = 𝐣 (assuming τ has no isolated vertices), and 𝐌[𝐢,𝐣] = 𝟙{ψ(U) = 𝐢} · 𝟙{ψ(V) = 𝐣}. Taking S = U∩V, this matrix can simply be written as a block-diagonal matrix with n^{|S|} blocks of size n^{|U∖S|} × n^{|V∖S|}. It is easy to check that the spectral norm of such a matrix is n^{(|U∖S|+|V∖S|)/2} = n^{(k−|S|)/2}, since |U∪V| = k and S = U∩V.

Thus, each of the terms in the bound will correspond to graph matrices with different U and V (obtained by coloring edges red or green according to the above process), and the dominant term will be the one with the smallest value of |U∩V|. We now claim that U∩V must be a vertex separator separating U_τ and V_τ in the shape τ, i.e., any path between them must pass through U∩V. If the path contains both red and green edges, then it contains one vertex with both red and green edges incident on it, which is then in U∩V (since U is obtained by adding endpoints of red edges to U_τ, and V similarly for green edges). If the path is entirely red, then we have an endpoint of a red edge in V_τ, which is a vertex in U∩V, and similarly for green paths.

Thus, we have n^{(k−|S|)/2} ≤ n^{(k−r)/2}, where r is the size of the minimum vertex separator between U_τ and V_τ, which recovers the known bound for dense graph matrices [2].

Sparse graph matrices.

The argument for sparse graph matrices is almost the same as above, except that now we also need to consider terms of the form 𝐅_{a,b,c} for c > 0. Just as we interpreted 𝐅_{a,b,0} as coloring a edges from E(τ) red and including their endpoints in U ⊇ U_τ, and b edges green with endpoints in V ⊇ V_τ, we can now interpret 𝐅_{a,b,c} as having c edges whose endpoints are included in both U and V (say these edges are colored yellow). This simply reflects the fact that the derivatives in the definition of 𝐅_{a,b,c+1} are placed as diagonal blocks.

Figure 3: 𝐅0,0,1.

As we saw above, increasing the intersection of U and V decreases the norm, and so the matrices with c > 0 should have a smaller Schatten norm. However, they are now included in the bound with a multiplicative factor of L^c. Thus, we simply look for a vertex separator S maximizing L^{e(S)} · n^{(k−|S|)/2}, where e(S) counts the (yellow) edges contained in S = U∩V. This is precisely the "sparse vertex separator" which determines the norm bound for such sparse graph matrices [16, 29].

2 Preliminaries and Notation

Vectors are denoted by bold face lower case letters 𝐱,𝐲 and 𝐳. Deterministic matrices are denoted by bold face upper case letters 𝐀 and 𝐁. Matrix-valued functions are denoted by bold face upper case letters 𝐗, 𝐘, 𝐅, 𝐌, 𝐇 and 𝐏.

Sets and indices.

For a set S, let |S| denote the number of distinct elements of S. We denote [n] = {1,…,n} for any positive integer n. For tuples 𝐢 := (i_1,…,i_m) ∈ [n]^m, define

𝒯_n^m := {(i_1,…,i_m) ∈ [n]^m : i_j ≠ i_k for all j ≠ k ∈ [m]}.

Matrix norms.

Let ℝ^{d_1×d_2} be the space of all d_1×d_2 real matrices, and let ℍ^d be the subspace of ℝ^{d×d} containing all Hermitian matrices. We write ‖·‖_2 for the ℓ_2 operator norm, ‖·‖_F for the Frobenius norm, and tr(·) for the trace.

Definition 3 (Schatten norm).

For 𝐀d1×d2 and t1, the Schatten 2t-norm is defined as

‖𝐀‖_{2t} := (tr(𝐀^𝖳𝐀)^t)^{1/2t}.
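For reference, a small helper (our own, not from the paper) computes the Schatten 2t-norm from singular values and checks it against the trace formula of Definition 3:

```python
import numpy as np

def schatten(A: np.ndarray, two_t: int) -> float:
    """Schatten 2t-norm via singular values: (sum_j sigma_j^{2t})^{1/2t}."""
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** two_t).sum() ** (1.0 / two_t))

A = np.random.default_rng(6).standard_normal((3, 5))
t = 2
# Agrees with the trace formula (tr (A^T A)^t)^{1/2t}:
via_trace = np.trace(np.linalg.matrix_power(A.T @ A, t)) ** (1 / (2 * t))
print(schatten(A, 2 * t), via_trace)
```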

Polynomial random matrices.

Let (Ω, ℱ, μ) be a probability space. Introduce the random vector 𝐱 := (x_1,…,x_n) ∈ Ω, where x_1,…,x_n are independent random variables. A polynomial random matrix 𝐅(𝐱) is a matrix-valued polynomial 𝐅 : Ω → ℝ^{d_1×d_2}; we construct a polynomial random matrix 𝐅(𝐱) by drawing a copy of 𝐱 ∼ μ. Note that although 𝐱 has independent entries, the entries of 𝐅(𝐱) can be dependent on (or correlated with) each other; unlike Wigner matrices, whose entries are i.i.d., polynomial random matrices can have dependencies among their entries.

Definition 4 (Permutation Symmetric Property).

Suppose 𝐅(𝐱) is a degree-d homogeneous multilinear polynomial random matrix, i.e.,

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j),

where {𝐀_𝐢}_{𝐢∈𝒯_n^d} is a multi-indexed sequence of deterministic matrices of the same dimension. 𝐅(𝐱) is permutation symmetric if 𝐀_{i_1,…,i_d} = 𝐀_{i_{σ(1)},…,i_{σ(d)}} for any permutation σ∈𝔖_d and any (i_1,…,i_d)∈𝒯_n^d.

Partial derivatives.

Let 𝐅(𝐱) be an n-variate polynomial random matrix of degree D. For a,b,c ≥ 0 and d = a+b+c, we consider the block matrices 𝐅_{a,b,c} containing all d-th order partial derivatives of 𝐅(𝐱). We define 𝐅_{a,b,c} recursively via 𝐅_{0,0,0} = 𝐅(𝐱) and

𝐅_{a+1,b,c} = [∂_{x_1}𝐅_{a,b,c} ⋯ ∂_{x_n}𝐅_{a,b,c}]^𝖳,
𝐅_{a,b+1,c} = [∂_{x_1}𝐅_{a,b,c} ⋯ ∂_{x_n}𝐅_{a,b,c}],
𝐅_{a,b,c+1} = diag(∂_{x_1}𝐅_{a,b,c}, …, ∂_{x_n}𝐅_{a,b,c}).

It is evident from the definition that the block matrices 𝐅_{a+1,b,c}, 𝐅_{a,b+1,c}, 𝐅_{a,b,c+1} are assembled from the sub-blocks {∂_{x_i}𝐅_{a,b,c}}_{i=1}^n arranged vertically, horizontally, and diagonally respectively. The order of increments between a and b does not affect the resulting matrix, i.e.,

𝐅_{a+1,b+1,c} = [∂_{x_1}𝐅_{a,b+1,c} ⋯ ∂_{x_n}𝐅_{a,b+1,c}]^𝖳 = [∂_{x_1}𝐅_{a+1,b,c} ⋯ ∂_{x_n}𝐅_{a+1,b,c}],

both equal to the n×n block matrix whose (i,j) block is ∂_{x_i}∂_{x_j}𝐅_{a,b,c}.

However, the order of increments involving c affects the resulting matrix, i.e.,

[∂_{x_1}𝐅_{a,b,c+1} ⋯ ∂_{x_n}𝐅_{a,b,c+1}] ≠ diag(∂_{x_1}𝐅_{a,b+1,c}, …, ∂_{x_n}𝐅_{a,b+1,c}).

Nevertheless, the order of increments does not affect the Schatten norm. Since we will only be interested in the Schatten norms of these matrices, we will simply write 𝐅_{a,b+1,c+1} without specifying the order. It is easy to see that 𝐅_{a,b,c} is a deterministic matrix if a+b+c = D, and 𝐅_{a,b,c} = 𝟎 if a+b+c > D.
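The order-of-increments phenomenon is easy to see numerically. In the sketch below (our own illustration), symmetric stand-ins B[i][j] = B[j][i] play the role of the second partials ∂_{x_i}∂_{x_j}𝐅_{a,b,c}; incrementing b then c and incrementing c then b produce different matrices with identical Schatten norms:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(7)
n = 3
# Symmetric stand-ins for the second partials: B[i][j] = d_i d_j F = B[j][i].
B = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        B[i][j] = B[j][i] = rng.standard_normal((2, 2))

# Increment b then c: a diagonal of row-blocks.
bc = block_diag(*[np.hstack(B[i]) for i in range(n)])
# Increment c then b: a row of diagonal-blocks.
cb = np.hstack([block_diag(*B[i]) for i in range(n)])

schatten = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
print(np.array_equal(bc, cb))            # False: the matrices differ
print(schatten(bc, 4), schatten(cb, 4))  # equal Schatten norms
```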

Constants.

We write C for universal constants and C_a for constants depending only on some parameter a. The values of these constants may differ from one instance to another.

3 Decoupling Inequalities

In this work, we focus on decoupling inequalities for moments, which are fairly elementary and can be interpreted combinatorially as partitioning the random vector 𝐱.

Lemma 5 (see [30] Lemma 6.21).

Let 𝐱 = {x_j}_{j=1}^n be a sequence of independent random variables with 𝔼x_j = 0 for all j = 1,…,n. Let {𝐁_𝐢}_{𝐢∈𝒯_n^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Then for t ≥ 1,

𝔼‖∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}⋯x_{i_d}‖_{2t} ≤ d^d · 𝔼‖∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}^{(1)}⋯x_{i_d}^{(d)}‖_{2t},

where 𝐱(1),,𝐱(d) denote d independent copies of 𝐱.

Improved Decoupling for Random Variables with Structured Indices

For a generic degree-d multilinear polynomial random matrix, an application of Lemma 5 yields a decoupling constant of d^d, which might not be optimal for polynomial random matrices with additional structure. We will show that this additional structure enables a tighter decoupling inequality through a careful partitioning.

Building on Lemma 5, we prove a decoupling inequality for polynomial random matrices in which the indices of the random variables have a graph structure. Let G = (V,E) be a fixed simple graph with |V| = k and |E| = d. We denote the vertices by V(G) = {v_1,…,v_k} and the ordered edge set by E(G) = {(i_1,j_1),…,(i_d,j_d)}. For any φ∈𝒯_n^k, we write φ(i) for the i-th component of φ, and φ(E) for {(φ(i_1),φ(j_1)),…,(φ(i_d),φ(j_d))}.

Lemma 6.

Let 𝐳 = {Z_{φ(i),φ(j)}}_{φ∈𝒯_n^k, (i,j)∈E} be a double sequence of independent random variables with 𝔼Z_{φ(i),φ(j)} = 0 for all (i,j)∈E and φ∈𝒯_n^k. Let {𝐁_{φ(E)}}_{φ∈𝒯_n^k} be a multi-indexed sequence of deterministic matrices of the same dimension. Then for d ≤ k(k−1) and t ≥ 1,

𝔼‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}‖_{2t} ≤ k^k · 𝔼‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)}‖_{2t},

where 𝐳(1),,𝐳(d) denote d independent copies of 𝐳.

 Remark 7.

It is crucial for the indices of all monomials to share the same structure, so that d independent copies of 𝐳 suffice. Otherwise, k(k−1) independent copies are needed in general.

Proof.

Consider a random partition of the ground set [n] into k parts, and let 𝐫 be the vertex partitioner. Formally, let 𝐫 = (r_1,…,r_n) be a sequence of independent random variables with each r_p uniformly distributed on {1,…,k}, i.e., for 1 ≤ p ≤ n,

ℙ(r_p = 1) = ℙ(r_p = 2) = ⋯ = ℙ(r_p = k) = 1/k.

Define an event A := {r_{φ(v_1)} = 1, …, r_{φ(v_k)} = k} for each φ∈𝒯_n^k. Conditioned on 𝐳,

𝔼_𝐫[𝟙_A(r_{φ(v_1)},…,r_{φ(v_k)})] = 1/k^k.

It follows that

F := 𝔼_𝐳‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}‖_{2t}
= k^k · 𝔼_𝐳‖𝔼_𝐫[∑_{φ∈𝒯_n^k} 𝟙_A(r_{φ(v_1)},…,r_{φ(v_k)}) 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}]‖_{2t}
≤ k^k · 𝔼_𝐫 𝔼_𝐳‖∑_{φ∈𝒯_n^k} 𝟙_A(r_{φ(v_1)},…,r_{φ(v_k)}) 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}‖_{2t},

where the last step is due to Jensen's inequality and Fubini's theorem. Since the inequality holds in expectation over 𝐫, there must exist a fixed realization of 𝐫 satisfying it.

Fix 𝐫 and define the vertex partitioning with respect to 𝐫 as

P_1 = {p∈[n] : r_p = 1}, …, P_k = {p∈[n] : r_p = k},

which in turn induces an edge partitioning,

P^e := {(P_i, P_j)}_{(i,j)∈𝒯_k^2}.

Notice that there are k(k−1) elements in P^e, but we only have d edges to partition. So we select d elements out of P^e in the following way: fix an arbitrary φ∈𝒯_n^k and choose

P^1 = (P_{φ(i_1)}, P_{φ(j_1)}), …, P^d = (P_{φ(i_d)}, P_{φ(j_d)}).

Let us call {P^1,…,P^d} a d-edge partitioning. In other words, we may select any d elements out of P^e to form a d-edge partitioning, so long as the pattern of the indices of the (P_i,P_j)'s matches that of E(G). Since the elements of P^e have no intersections, the partitions in {P^1,…,P^d} also have no intersections. Since the Z_{φ(i),φ(j)}'s are independent random variables, replacing Z_{φ(i_1),φ(j_1)}⋯Z_{φ(i_d),φ(j_d)} with Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} for any φ(E) ∈ P^1×⋯×P^d does not change the distribution. Hence

F ≤ k^k · 𝔼‖∑_{φ(E)∈P^1×⋯×P^d} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)}‖_{2t}.   (1)

Denote all variables in P1××Pd by

𝒵 := {Z_{φ(i_1),φ(j_1)}^{(1)}, …, Z_{φ(i_d),φ(j_d)}^{(d)} : (φ(i_1),φ(j_1))∈P^1, …, (φ(i_d),φ(j_d))∈P^d},

and denote the rest of the variables by 𝒵^c. It remains to extend the sum on the right-hand side of (1) to be over all φ∈𝒯_n^k. Observe that

𝔼_{𝒵^c}[∑_{φ(E)∉P^1×⋯×P^d} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} | 𝒵] = 𝟎.   (2)

Denote P1××Pd by 𝒫. Since the sum in (2) has conditional expectation zero, we can add it to (1) to get

F ≤ k^k · 𝔼_𝒵‖∑_{φ(E)∈𝒫} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} + 𝔼_{𝒵^c}[∑_{φ(E)∉𝒫} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} | 𝒵]‖_{2t}.

Since the two sums on the right-hand side are independent conditioned on 𝒵, conditional application of Jensen’s inequality yields the desired inequality,

F ≤ k^k · 𝔼_{𝐳^{(1)},…,𝐳^{(d)}}‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)}‖_{2t}.

4 Multilinear Polynomial Random Matrices

In this section, we prove a moment bound for multilinear polynomial random matrices with bounded, normalized random variables. We follow our recursion framework by first decoupling the polynomial random matrices and then applying the matrix Rosenthal inequality recursively to sums of (conditionally) independent random matrices.

To this end, we derive the non-Hermitian matrix Rosenthal inequality from the Hermitian matrix Rosenthal inequality by Hermitian dilation (see [34] Sec. 2.1.16).

Lemma 8 (Non-Hermitian Matrix Rosenthal Inequality).

Suppose that t = 1 or t ≥ 1.5. Consider a finite sequence {𝐘_k}_{k≥1} of centered, independent random matrices, and assume that 𝔼‖𝐘_k‖_{4t}^{4t} < ∞. Then

𝔼‖∑_k 𝐘_k‖_{4t}^{4t} ≤ (16t)^{3t} {‖(∑_k 𝔼 𝐘_k𝐘_k^𝖳)^{1/2}‖_{4t}^{4t} + ‖(∑_k 𝔼 𝐘_k^𝖳𝐘_k)^{1/2}‖_{4t}^{4t}} + (8t)^{4t} (∑_k 𝔼‖𝐘_k‖_{4t}^{4t}).

Proof.

For a sequence of Hermitian matrices {𝐗_k}_{k≥1} satisfying all the assumptions above, we have the (Hermitian) matrix Rosenthal inequality ([22], Corollary 7.4):

𝔼‖∑_k 𝐗_k‖_{4t}^{4t} ≤ (16t)^{3t} ‖(∑_k 𝔼𝐗_k^2)^{1/2}‖_{4t}^{4t} + (8t)^{4t} ∑_k 𝔼‖𝐗_k‖_{4t}^{4t}.

We use Hermitian dilation to extend the inequality to non-Hermitian matrices by setting 𝐗_k = 𝓗(𝐘_k), the Hermitian dilation of 𝐘_k, and noticing that

𝔼‖∑_k [𝟎 𝐘_k; 𝐘_k^𝖳 𝟎]‖_{4t}^{4t} = 𝔼 tr[((∑_k 𝐘_k)(∑_k 𝐘_k^𝖳))^{2t} 𝟎; 𝟎 ((∑_k 𝐘_k^𝖳)(∑_k 𝐘_k))^{2t}] = 2 · 𝔼‖∑_k 𝐘_k‖_{4t}^{4t},
‖(∑_k 𝔼[𝐘_k𝐘_k^𝖳 𝟎; 𝟎 𝐘_k^𝖳𝐘_k])^{1/2}‖_{4t}^{4t} = tr([∑_k 𝔼𝐘_k𝐘_k^𝖳 𝟎; 𝟎 ∑_k 𝔼𝐘_k^𝖳𝐘_k]^{2t}) = ‖(∑_k 𝔼𝐘_k𝐘_k^𝖳)^{1/2}‖_{4t}^{4t} + ‖(∑_k 𝔼𝐘_k^𝖳𝐘_k)^{1/2}‖_{4t}^{4t},
∑_k 𝔼‖[𝟎 𝐘_k; 𝐘_k^𝖳 𝟎]‖_{4t}^{4t} = 2 · ∑_k 𝔼‖𝐘_k‖_{4t}^{4t}.

Hence, the result follows.
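The bookkeeping in this proof reduces to the fact that the dilation duplicates singular values, which the following sketch (our own illustration) confirms numerically:

```python
import numpy as np

rng = np.random.default_rng(8)
Y = rng.standard_normal((3, 5))
t = 2

# Hermitian dilation H(Y) = [[0, Y], [Y^T, 0]].
H = np.block([[np.zeros((3, 3)), Y], [Y.T, np.zeros((5, 5))]])

schatten_pow = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
# The dilation has singular values {sigma_i(Y)} twice (plus zeros), hence
# ||H(Y)||_{4t}^{4t} = 2 ||Y||_{4t}^{4t}, as used in the proof of Lemma 8.
print(schatten_pow(H, 4 * t), 2 * schatten_pow(Y, 4 * t))
```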

Claim 9.

Let 𝐅(𝐱) be a homogeneous multilinear polynomial random matrix of degree D. Let 𝐅(𝐱^{(1)},…,𝐱^{(D)}) be the decoupled 𝐅(𝐱), i.e.,

𝐅(𝐱^{(1)},…,𝐱^{(D)}) = ∑_{𝐣∈𝒯_n^D} 𝐀_𝐣 x_{j_1}^{(1)}⋯x_{j_D}^{(D)},

where {𝐀_𝐣}_{𝐣∈𝒯_n^D} is a multi-indexed sequence of deterministic matrices of the same dimension. For some fixed a,b,c ≥ 0 and k = a+b+c < D, let 𝐅_{a,b,c} be the block matrix of k-th order partial derivatives of 𝐅(𝐱^{(1)},…,𝐱^{(D)}). Then 𝐅_{a,b,c} is a homogeneous multilinear polynomial random matrix of degree d = D−k. Furthermore,

𝔼‖𝐅_{a,b,c}‖_{4t}^{4t} ≤ (16t)^{3t}(𝔼‖𝐅_{a+1,b,c}‖_{4t}^{4t} + 𝔼‖𝐅_{a,b+1,c}‖_{4t}^{4t}) + (8t)^{4t} L^{4t} 𝔼‖𝐅_{a,b,c+1}‖_{4t}^{4t}.

Proof.

Without loss of generality, we differentiate 𝐅(𝐱^{(1)},…,𝐱^{(D)}) with respect to 𝐱^{(D)} first, then 𝐱^{(D−1)}, and so on. It is straightforward to see that after k = a+b+c rounds of differentiation, 𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)}) is a homogeneous polynomial random matrix of degree d = D−k, and

𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)}) = ∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}^{(1)}⋯x_{i_d}^{(d)},

where {𝐁_𝐢}_{𝐢∈𝒯_n^d} is a multi-indexed sequence of deterministic matrices of the same dimension. More specifically, each 𝐁_𝐢 is a block matrix whose blocks consist of the matrices 𝐀_𝐣 with 𝐣∈𝒯_n^D such that j_1 = i_1, …, j_d = i_d.

Conditioned on 𝐱^{(1)},…,𝐱^{(d−1)}, the matrix 𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)}) is a sum of centered, (conditionally) independent random matrices in 𝐱^{(d)}. It follows that

𝔼(‖𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)})‖_{4t}^{4t}) = 𝔼(𝔼_{𝐱^{(d)}}(‖∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}^{(1)}⋯x_{i_d}^{(d)}‖_{4t}^{4t} | 𝐱^{(1)},…,𝐱^{(d−1)}))
≤ (16t)^{3t} 𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳)^{1/2}‖_{4t}^{4t}
+ (16t)^{3t} 𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}))^{1/2}‖_{4t}^{4t}
+ (8t)^{4t} L^{4t} ∑_{i_d=1}^n 𝔼‖∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}‖_{4t}^{4t},

where the inequality is due to the matrix Rosenthal inequality (Lemma 8), applied with 𝐘_{i_d} = (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}) · x_{i_d}^{(d)}, together with |x_{i_d}^{(d)}| ≤ L for the last term. Now we have

𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳)^{1/2}‖_{4t}^{4t} = 𝔼‖([∂_{x_1^{(d)}}𝐅_{a,b,c} ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}][∂_{x_1^{(d)}}𝐅_{a,b,c} ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}]^𝖳)^{1/2}‖_{4t}^{4t} = 𝔼‖𝐅_{a,b+1,c}‖_{4t}^{4t},

𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}))^{1/2}‖_{4t}^{4t} = 𝔼‖([∂_{x_1^{(d)}}𝐅_{a,b,c}^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}^𝖳][∂_{x_1^{(d)}}𝐅_{a,b,c}^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}^𝖳]^𝖳)^{1/2}‖_{4t}^{4t} = 𝔼‖𝐅_{a+1,b,c}‖_{4t}^{4t},

∑_{i_d=1}^n 𝔼‖∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}‖_{4t}^{4t} = 𝔼 tr(diag(∂_{x_1^{(d)}}𝐅_{a,b,c}(∂_{x_1^{(d)}}𝐅_{a,b,c})^𝖳, …, ∂_{x_n^{(d)}}𝐅_{a,b,c}(∂_{x_n^{(d)}}𝐅_{a,b,c})^𝖳))^{2t} = 𝔼‖𝐅_{a,b,c+1}‖_{4t}^{4t}.

We present a moment bound for permutation symmetric, homogeneous multilinear polynomial random matrices of degree d.

Theorem 10 (Homogeneous Multilinear Recursion).

Let 𝐱 = {x_i}_{i=1}^n be a sequence of i.i.d. random variables with 𝔼x_i = 0, 𝔼x_i^2 = 1 and |x_i| ≤ L for all 1 ≤ i ≤ n. Let {𝐀_𝐢}_{𝐢∈𝒯_n^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a permutation symmetric, homogeneous multilinear polynomial random matrix of degree d as

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b,c ≥ 0 with d = a+b+c. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ ∑_{a,b,c : a+b+c=d} (48dt)^{4dt} L^{4ct} ‖𝐅_{a,b,c}‖_{4t}^{4t}.

Proof.

Let 𝐱^{(1)},…,𝐱^{(d)} be d independent copies of 𝐱. We decouple 𝐅(𝐱) by Lemma 5:

E := 𝔼‖𝐅(𝐱)‖_{4t}^{4t} ≤ d^{4dt} · 𝔼‖𝐅(𝐱^{(1)},…,𝐱^{(d)})‖_{4t}^{4t}.

We take the partial derivative with respect to 𝐱^{(d)} by applying Lemma 8 to get

E ≤ d^{4dt}(16t)^{3t}(𝔼‖𝐅_{1,0,0}‖_{4t}^{4t} + 𝔼‖𝐅_{0,1,0}‖_{4t}^{4t}) + d^{4dt}(8t)^{4t}L^{4t}𝔼‖𝐅_{0,0,1}‖_{4t}^{4t}.

Note that 𝐅_{1,0,0}, 𝐅_{0,1,0}, and 𝐅_{0,0,1} are functions of the variables 𝐱^{(1)},…,𝐱^{(d−1)}. We take partial derivatives with respect to the rest of the variables until the 𝐅_{a,b,c}'s become deterministic matrices. Applying Claim 9 recursively and using the permutation symmetric property of 𝐅(𝐱), we have

E ≤ ∑_{a,b,c : a+b+c=d} (16dt)^{4dt} L^{4ct} (d!/(a!·b!·c!)) ‖𝐅_{a,b,c}‖_{4t}^{4t}.

Since d!/(a!b!c!) ≤ d!/((d/3)!)^3 and d!/((d/3)!)^3 ≤ (3√3/(2πd)) · 3^d,

E ≤ ∑_{a,b,c : a+b+c=d} (48dt)^{4dt} L^{4ct} ‖𝐅_{a,b,c}‖_{4t}^{4t}.

As a corollary, we derive a moment bound for general multilinear polynomial random matrices of degree D. The polynomial is split into homogeneous parts, and each part is bounded separately. Let 𝐅_{=d}(𝐱) denote the degree-d homogeneous part of 𝐅(𝐱).

Corollary 11 (Multilinear Recursion).

Let 𝐱 = {x_i}_{i=1}^n be a sequence of i.i.d. random variables with 𝔼x_i = 0, 𝔼x_i^2 = 1 and |x_i| ≤ L for all 1 ≤ i ≤ n. Let {𝐀_𝐢}_{𝐢∈𝒯_n^d} be multi-indexed sequences of deterministic matrices of the same dimension. Define a degree-D multilinear polynomial random matrix as

𝐅(𝐱) = ∑_{d=1}^D ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Suppose 𝐅_{=d}(𝐱) is permutation symmetric for 1 ≤ d ≤ D. Let a,b,c ≥ 0 with d = a+b+c. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ ∑_{d=1}^D D^{4t} (48dt)^{4dt} (∑_{a,b,c : a+b+c=d} L^{4ct} ‖𝐅_{a,b,c}^{=d}‖_{4t}^{4t}).

Proof.

We rewrite 𝐅(𝐱) as a formal sum of its homogeneous parts, 𝐅(𝐱) = ∑_{d=1}^D 𝐅_{=d}(𝐱). By the trace inequality (see [33], Theorem 3.1), we have

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ D^{4t} ∑_{d=1}^D 𝔼‖𝐅_{=d}(𝐱) − 𝔼𝐅_{=d}(𝐱)‖_{4t}^{4t}.

By Theorem 10,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ D^{4t} ∑_{d=1}^D (48dt)^{4dt} (∑_{a,b,c : a+b+c=d} L^{4ct} ‖𝐅_{a,b,c}^{=d}‖_{4t}^{4t}).

5 Gaussian Polynomial Random Matrices

In this section, we prove a moment bound for polynomial random matrices in Gaussian random variables. While the decoupling technique could work for Gaussian polynomial random matrices as well, we base our recursion framework on the following bound for simplicity.

Lemma 12 (Polynomial Moments, see [14] Theorem 7.1).

Let 𝐇(𝐱) : Ω → ℍ^d be a function with 𝐱 ∼ 𝒩(0,𝕀_n). For t = 1 or t ≥ 1.5,

𝔼‖𝐇(𝐱) − 𝔼𝐇(𝐱)‖_{2t}^{2t} ≤ (2t)^{2t} · 𝔼 tr(∑_{i=1}^n (∂_i𝐇(𝐱))^2)^t.   (3)

The non-Hermitian version of Lemma 12 can be easily obtained by the technique of Hermitian dilation.

Lemma 13 (Non-Hermitian Polynomial Moments).

Let 𝐅(𝐱) : Ω → ℝ^{d_1×d_2} be a function with 𝐱 ∼ 𝒩(0,𝕀_n). For t = 1 or t ≥ 1.5,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{2t}^{2t} ≤ (2t)^{2t} (𝔼‖(∑_{i=1}^n ∂_i𝐅(𝐱) ∂_i𝐅^𝖳(𝐱))^{1/2}‖_{2t}^{2t} + 𝔼‖(∑_{i=1}^n ∂_i𝐅^𝖳(𝐱) ∂_i𝐅(𝐱))^{1/2}‖_{2t}^{2t}).
Theorem 14 (Gaussian Recursion).

Let 𝐱 ∼ 𝒩(0,𝕀_n) and let {𝐀_𝐢}_{𝐢∈[n]^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a degree-d homogeneous Gaussian polynomial random matrix as

𝐏(𝐱) = ∑_{𝐢∈[n]^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b ≥ 0. Then for 2t ∈ ℕ,

𝔼‖𝐏(𝐱) − 𝔼𝐏(𝐱)‖_{2t}^{2t} ≤ (2d^2t)^{2t} (∑_{1≤a+b≤d} ‖𝔼𝐏_{a,b}‖_{2t}^{2t}).

Proof.

Start by applying Claim 15 with k = 0, and apply it recursively until k = d−1 to get the desired bound.

Claim 15.

Let 𝐱 ∼ 𝒩(0,𝕀_n) and let {𝐀_𝐢}_{𝐢∈[n]^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a degree-d homogeneous Gaussian polynomial random matrix as

𝐏(𝐱) = ∑_{𝐢∈[n]^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b ≥ 0 with a+b = k < d, and let 𝐏_{a,b}(𝐱) be the k-th partial derivative block matrix associated to 𝐏(𝐱). Then for 2t ∈ ℕ,

𝔼‖𝐏_{a,b} − 𝔼𝐏_{a,b}‖_{2t}^{2t} ≤ (4t)^{2t} (𝔼‖𝐏_{a+1,b} − 𝔼𝐏_{a+1,b}‖_{2t}^{2t} + ‖𝔼𝐏_{a+1,b}‖_{2t}^{2t})
+ (4t)^{2t} (𝔼‖𝐏_{a,b+1} − 𝔼𝐏_{a,b+1}‖_{2t}^{2t} + ‖𝔼𝐏_{a,b+1}‖_{2t}^{2t}).

Proof.

By the non-Hermitian polynomial moment bound (Lemma 13), we have

E := 𝔼‖𝐏_{a,b} − 𝔼𝐏_{a,b}‖_{2t}^{2t} ≤ (2t)^{2t} (𝔼‖(∑_{i=1}^n ∂_i𝐏_{a,b} ∂_i𝐏_{a,b}^𝖳)^{1/2}‖_{2t}^{2t} + 𝔼‖(∑_{i=1}^n ∂_i𝐏_{a,b}^𝖳 ∂_i𝐏_{a,b})^{1/2}‖_{2t}^{2t})
= (2t)^{2t} (𝔼‖[∂_1𝐏_{a,b} ⋯ ∂_n𝐏_{a,b}]‖_{2t}^{2t} + 𝔼‖[∂_1𝐏_{a,b} ⋯ ∂_n𝐏_{a,b}]^𝖳‖_{2t}^{2t})
= (2t)^{2t} (𝔼‖𝐏_{a+1,b}‖_{2t}^{2t} + 𝔼‖𝐏_{a,b+1}‖_{2t}^{2t}),

where the last equality uses our notation for partial derivative block matrices introduced in Section 2. Notice that 𝐏_{a+1,b} and 𝐏_{a,b+1} are homogeneous Gaussian polynomial random matrices of degree d−k−1. Since 𝐏_{a+1,b} and 𝐏_{a,b+1} are not necessarily centered,

E ≤ (2t)^{2t} (𝔼‖𝐏_{a+1,b} − 𝔼𝐏_{a+1,b} + 𝔼𝐏_{a+1,b}‖_{2t}^{2t} + 𝔼‖𝐏_{a,b+1} − 𝔼𝐏_{a,b+1} + 𝔼𝐏_{a,b+1}‖_{2t}^{2t})
≤ (4t)^{2t} (𝔼‖𝐏_{a+1,b} − 𝔼𝐏_{a+1,b}‖_{2t}^{2t} + ‖𝔼𝐏_{a+1,b}‖_{2t}^{2t} + 𝔼‖𝐏_{a,b+1} − 𝔼𝐏_{a,b+1}‖_{2t}^{2t} + ‖𝔼𝐏_{a,b+1}‖_{2t}^{2t}),

where the last inequality is due to the trace inequality (see [33] Theorem 3.1).

6 Application: Graph Matrices

Denote the Erdős–Rényi random graph on n vertices by 𝒢_{n,p}, where each edge is present with probability p independently of all other edges. When 𝒢_{n,p} is viewed as a probability space, it is equal to (Ω, ℱ, μ), where Ω is the sample space of all possible graphs on n vertices. Any G∈Ω is a random vector representing all edges, and each coordinate of G, denoted G_{ij}, is an independent random variable representing a single edge. We can construct a random graph by drawing a copy of G ∼ μ. To adopt the convention of p-biased Fourier analysis, we normalize G_{ij} such that 𝔼G_{ij} = 0 and 𝔼G_{ij}^2 = 1, which leads to the sample space Ω = {√((1−p)/p), −√(p/(1−p))}^{\binom{n}{2}}.

Definition 16 (Shape, see e.g. [29] Definition 4.2).

A shape is a tuple τ=(V(τ),E(τ),Uτ,Vτ) where (V(τ),E(τ)) is a graph and Uτ, Vτ are ordered subsets of the vertices.

Definition 17 (Graph matrix, see e.g. [29] Definition 4.4).

Given a shape τ, the associated graph matrix 𝐌 : Ω → ℝ^{d_1×d_2} is a matrix-valued function such that

𝐌[I,J] = ∑_{φ∈𝒯_n^k : φ(U_τ)=I, φ(V_τ)=J} ∏_{(u,v)∈E(τ)} G_{φ(u),φ(v)}.

In other words, 𝐌 maps an input graph G∈Ω to an (n!/(n−|I|)!) × (n!/(n−|J|)!) matrix whose rows and columns are indexed by the tuples I and J respectively.

Definition 18 (Vertex Separator, see e.g. [29] Definition 4.8).

For any shape τ, a vertex separator is a subset of vertices S ⊆ V(τ) such that there is no path from U_τ to V_τ in τ∖S, which is the shape obtained by deleting S and all edges adjacent to S. We write S_τ for a vertex separator of minimum size.

Claim 19.

For any shape τ, let {E_1, E_2} be an arbitrary cover of E(τ). Let S = V(E_1)∩V(E_2) be the set of vertices shared by the edges of E_1 and E_2. If S contains V(E_1)∩V_τ and V(E_2)∩U_τ, then S is a vertex separator.

Proof.

Since S = V(E_1)∩V(E_2), there is no path from V(E_1)∖S to V(E_2)∖S. Additionally, if S contains V(E_1)∩V_τ, then V(E_1)∖S is not adjacent to V_τ and V(E_1)∖S contains U_τ∖S. If S also contains V(E_2)∩U_τ, then V(E_2)∖S is not adjacent to U_τ and V(E_2)∖S contains V_τ∖S. Since there is no path from U_τ∖S to V_τ∖S, S is a vertex separator.

Theorem 20.

For any shape τ and 2t ∈ ℕ,

𝔼‖𝐌‖_{4t}^{4t} ≤ ((48t|V(τ)|)^{4t|V(τ)|} (C|E(τ)|)^{|E(τ)|} n^{|V(τ)|}) · (√((1−p)/p))^{4t|E(S_τ)|} · n^{2t(|V(τ)|−|S_τ|)},

where C is an absolute constant and E(S_τ) is the set of all edges adjacent to S_τ.

Proof.

First, let |V(τ)| = k and write 𝐌 as a polynomial whose coefficients are matrices,

𝐌 = ∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} ∏_{(i,j)∈E(τ)} G_{φ(i),φ(j)},

where the [I,J]-entry of 𝐁_{φ(E)} is

𝐁_{φ(E)}[I,J] = 1 if φ(U_τ) = I and φ(V_τ) = J, and 0 otherwise.

Note that the 𝐁_{φ(E)}'s have the same dimension as 𝐌, and their rows and columns are indexed in the same way as those of 𝐌. Since U_τ and V_τ are ordered and each φ assigns one set of values to the vertices in U_τ and V_τ, there is only one nonzero entry in each 𝐁_{φ(E)}. But there might be multiple φ's for which the 𝐁_{φ(E)}'s are identical, due to free vertices.

Let |E(τ)| = d, and rewrite 𝐌 in a permutation symmetric way:

𝐌 = ∑_{φ∈𝒯_n^k} (1/d! · ∑_{σ∈𝔖_d} 𝐁_{σ(φ(E))} G_{φ(i_{σ(1)}),φ(j_{σ(1)})}⋯G_{φ(i_{σ(d)}),φ(j_{σ(d)})})
= (1/d!) ∑_{φ∈𝒯_n^k} 𝐀_{φ(E)} G_{φ(i_1),φ(j_1)}⋯G_{φ(i_d),φ(j_d)},

where 𝐀_{φ(E)} = ∑_{σ∈𝔖_d} 𝐁_{σ(φ(E))}. Notice that the indices of the random variables of 𝐌 have the same structure as in Lemma 6, so we can decouple 𝐌 using Lemma 6:

F := 𝔼‖𝐌‖_{4t}^{4t} ≤ |V(τ)|^{4t|V(τ)|} (1/d!)^{4t} 𝔼‖∑_{φ∈𝒯_n^k} 𝐀_{φ(E)} G_{φ(i_1),φ(j_1)}^{(1)}⋯G_{φ(i_d),φ(j_d)}^{(d)}‖_{4t}^{4t},

where G^{(1)},…,G^{(d)} denote d independent copies of G. Since 𝐌 is a multilinear polynomial random matrix, 𝔼𝐌 = 𝟎. An application of Theorem 10 yields

F ≤ ∑_{a,b,c : a+b+c=|E(τ)|} (48t|V(τ)|)^{4t|V(τ)|} (1/|E(τ)|!)^{4t} (√((1−p)/p))^{4ct} ‖𝐌_{a,b,c}‖_{4t}^{4t},

where {𝐌a,b,c} are partial derivative block matrices associated to 𝐌.

Fix a triple a,b,c; then 𝐌_{a,b,c} is a block matrix whose blocks are comprised of {𝐀_{φ(E)}}_{φ∈𝒯_n^k}. More specifically, the [I,J]-th block of 𝐌_{a,b,c} is

𝐌_{a,b,c}[I,J] = ∂_{G_{I_1}}⋯∂_{G_{I_{a+c}}} ∂_{G_{J_1}}⋯∂_{G_{J_b}} 𝐌 = 𝐀_{I∪J} = ∑_{σ∈𝔖_d} 𝐁_{σ(I∪J)}.   (4)

Algebraically, the last equality is due to the commutativity of partial derivative operators. Combinatorially, if we fix an arbitrary ordering on the edge set E(τ), the last expression sums over all permutations of the d edges in E(τ). Now let us delve into the combinatorics. Let E(τ) = E_1∪E_2, where E_1 contains a+c edges and E_2 contains b+c edges, with c overlapping edges. Let S = V(E_1)∩V(E_2) be the set of vertices shared by E_1 and E_2. The row and column blocks of 𝐌_{a,b,c} are indexed by φ(E_1) and φ(E_2) respectively. Let us write

𝐌_{a,b,c} = ∑_{σ∈𝔖_d} 𝐌_{a,b,c,σ},

where 𝐌_{a,b,c,σ}[I,J] = 𝐁_{σ(I∪J)}. Applying the trace inequality (see [33], Theorem 3.1), it follows that

F ≤ ∑_{a,b,c : a+b+c=|E(τ)|} ∑_{σ∈𝔖_d} (48t|V(τ)|)^{4t|V(τ)|} (√((1−p)/p))^{4ct} ‖𝐌_{a,b,c,σ}‖_{4t}^{4t}.   (5)

For any two distinct σ_1, σ_2 ∈ 𝔖_d, we have ‖𝐌_{a,b,c,σ_1}‖_{4t}^{4t} = ‖𝐌_{a,b,c,σ_2}‖_{4t}^{4t}, since 𝐌_{a,b,c,σ_1} and 𝐌_{a,b,c,σ_2} are identical after permuting rows and columns. So we will focus on bounding ‖𝐌_{a,b,c,σ_0}‖_{4t}^{4t}, where σ_0 is the identity permutation. For the identity permutation, we have 𝐌_{a,b,c,σ_0}[I,J] = 𝐁_{I∪J} by (4). Denote 𝐌_{a,b,c,σ_0} by 𝐅:

E := ‖𝐌_{a,b,c,σ_0}‖_{4t}^{4t} = ‖𝐅‖_{4t}^{4t} = tr(𝐅^𝖳𝐅)^{2t}
= tr ∑_{I_1,…,I_{2t}∈[n]^{2(a+c)}} ∑_{J_1,…,J_{2t}∈[n]^{2(b+c)}} 𝐅^𝖳[I_1,J_1] 𝐅[I_1,J_2] 𝐅^𝖳[I_2,J_2] 𝐅[I_2,J_3] ⋯ 𝐅^𝖳[I_{2t},J_{2t}] 𝐅[I_{2t},J_1].   (6)

First of all, 𝐁_{I_i∪J_i} ≠ 𝟎 implies that I_i∪J_i = φ(E_1∪E_2) for some φ∈𝒯_n^k and I_i(S) = J_i(S), for 1 ≤ i ≤ 2t. Similarly, 𝐁_{I_i∪J_{i+1}} ≠ 𝟎 implies that I_i∪J_{i+1} = φ(E_1∪E_2) for some φ∈𝒯_n^k and I_i(S) = J_{i+1}(S), for 1 ≤ i ≤ 2t (the additions in the subscripts are mod 2t, i.e., J_{2t+1} = J_1). Secondly, for each 𝐁_{I_i∪J_i} ≠ 𝟎, there is only one nonzero entry, namely 𝐁_{I_i∪J_i}[I_i∪J_i(U_τ), I_i∪J_i(V_τ)] = 1. So 𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} ≠ 𝟎 if I_i∪J_i(U_τ) = I_i∪J_{i+1}(U_τ), and the only nonzero entry is

𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}}[I_i∪J_i(V_τ), I_i∪J_{i+1}(V_τ)] = 1.

It follows that 𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} ≠ 𝟎 for all 1 ≤ i ≤ 2t if J_1(V(E_2)∩U_τ) = J_2(V(E_2)∩U_τ) = ⋯ = J_{2t}(V(E_2)∩U_τ). Lastly, 𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} 𝐁_{I_{i+1}∪J_{i+1}}^𝖳 𝐁_{I_{i+1}∪J_{i+2}} ≠ 𝟎 if I_i∪J_{i+1}(V_τ) = I_{i+1}∪J_{i+1}(V_τ), and the only nonzero entry is

𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} 𝐁_{I_{i+1}∪J_{i+1}}^𝖳 𝐁_{I_{i+1}∪J_{i+2}}[I_i∪J_i(V_τ), I_{i+1}∪J_{i+2}(V_τ)] = 1.

It follows that 𝐁_{I_1∪J_1}^𝖳 𝐁_{I_1∪J_2} 𝐁_{I_2∪J_2}^𝖳 𝐁_{I_2∪J_3} ⋯ 𝐁_{I_{2t}∪J_{2t}}^𝖳 𝐁_{I_{2t}∪J_1} ≠ 𝟎 if I_1(V(E_1)∩V_τ) = I_2(V(E_1)∩V_τ) = ⋯ = I_{2t}(V(E_1)∩V_τ), and the only nonzero entry is

𝐁_{I_1∪J_1}^𝖳 𝐁_{I_1∪J_2} 𝐁_{I_2∪J_2}^𝖳 𝐁_{I_2∪J_3} ⋯ 𝐁_{I_{2t}∪J_{2t}}^𝖳 𝐁_{I_{2t}∪J_1}[I_1∪J_1(V_τ), I_{2t}∪J_1(V_τ)] = 1.

Since I_1(V(E_1)∩V_τ) = I_{2t}(V(E_1)∩V_τ), we have I_1∪J_1(V_τ) = I_{2t}∪J_1(V_τ). So if the summand is nonzero, it has a 1 on its diagonal.

To summarize, each summand in (6) is nonzero if and only if I_i∪J_i = φ_i(E_1∪E_2) and I_i∪J_{i+1} = φ_i′(E_1∪E_2) for some φ_i, φ_i′ ∈ 𝒯_n^k, the I_i's and J_i's agree on S, the I_i's agree on V(E_1)∩V_τ, and the J_i's agree on V(E_2)∩U_τ, for 1 ≤ i ≤ 2t. Since each nonzero summand contributes a 1 on the diagonal, we can simply count the number of nonzero summands to compute the trace in (6). Notice that the number of nonzero summands is equal to the number of φ_1,…,φ_{2t}, φ_1′,…,φ_{2t}′ that satisfy all the constraints. Hence

E ≤ n^{|S|} · n^{|V(E_1)∩V_τ|} · n^{|V(E_2)∩U_τ|} · n^{2t|V(E_1)∖V_τ∖S|} · n^{2t|V(E_2)∖U_τ∖S|}.   (7)

For a fixed triple a,b,c, (7) provides an upper bound on ‖𝐌_{a,b,c,σ}‖_{4t}^{4t} for any permutation σ∈𝔖_d. But what is an upper bound on ‖𝐌_{a,b,c,σ}‖_{4t}^{4t} over all possible triples a,b,c? By varying a,b,c, we can always make S = V(E_1)∩V(E_2) large enough to contain the vertices in V(E_1)∩V_τ and V(E_2)∩U_τ. Thus S is a vertex separator by Claim 19. It follows that V(E_1)∖V_τ∖S = V(E_1)∖S and V(E_2)∖U_τ∖S = V(E_2)∖S. Hence

E ≤ n^{|V(τ)|} · n^{2t|V(E_1)∖S|} · n^{2t|V(E_2)∖S|} ≤ n^{|V(τ)|} · n^{2t(|V(τ)|−|S_τ|)},   (8)

where Sτ is a vertex separator of minimum size. Substituting (8) into (5) yields the desired bound.

Corollary 21.

For any given shape τ and any ε > 0, with probability at least 1−ε,

‖𝐌‖ ≤ |V(τ)|^{|V(τ)|} (C log(|E(τ)|^{|E(τ)|} n^{|V(τ)|}/ε))^{|V(τ)|} (√((1−p)/p))^{|E(S_τ)|} n^{(|V(τ)|−|S_τ|)/2},

where C>0 is an absolute constant.

Proof.

By Markov's inequality and Theorem 20, we have

ℙ(‖𝐌‖ ≥ θ) ≤ ℙ(‖𝐌‖_{4t}^{4t} ≥ θ^{4t}) ≤ θ^{−4t} 𝔼‖𝐌‖_{4t}^{4t}
≤ θ^{−4t} ((48t|V(τ)|)^{4t|V(τ)|} (C|E(τ)|)^{|E(τ)|} n^{|V(τ)|}) (√((1−p)/p))^{4t|E(S_τ)|} n^{2t(|V(τ)|−|S_τ|)}.   (9)

Set the right-hand side of (9) to ε by taking

θ = (ε^{−1/4t} (48t|V(τ)|)^{|V(τ)|} (C|E(τ)|)^{|E(τ)|/4t} n^{|V(τ)|/4t}) · (√((1−p)/p))^{|E(S_τ)|} · n^{(|V(τ)|−|S_τ|)/2}.

Take

t = (1/4) · log(|E(τ)|^{|E(τ)|} n^{|V(τ)|}/ε)

to complete the proof.
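As a quick empirical companion to Corollary 21 (our own illustration, not part of the paper), the sketch below measures the spectral norm of the path-shape graph matrix from Section 1 across several values of n; the ratio to the predicted n^{(|V(τ)|−|S_τ|)/2} = n scaling stays roughly constant, as expected up to logarithmic factors.

```python
import numpy as np

rng = np.random.default_rng(9)

def path_graph_matrix_norm(n: int, p: float = 0.5) -> float:
    """Spectral norm of the path-shape graph matrix (k = 3, d = 2) on G(n, p)."""
    Delta = np.sqrt((1 - p) / p)
    X = np.where(rng.random((n, n)) < p, Delta, -1 / Delta)
    X = np.triu(X, 1)
    X = X + X.T                 # symmetric p-biased edge variables, zero diagonal
    M = X @ X
    np.fill_diagonal(M, 0.0)    # the path shape excludes the diagonal
    return np.linalg.norm(M, 2)

for n in (50, 100, 200, 400):
    # Corollary 21 predicts n^{(|V| - |S_tau|)/2} = n scaling, up to log factors.
    print(n, path_graph_matrix_norm(n) / n)   # roughly constant ratio
```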

References

  • [1] Radosław Adamczak and Paweł Wolff. Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probability Theory and Related Fields, 162(3):531–586, 2015.
  • [2] Kwangjun Ahn, Dhruv Medarametla, and Aaron Potechin. Graph matrices: Norm bounds and applications. arXiv preprint, 2021. arXiv:1604.03423.
  • [3] Richard Aoun, Marwa Banna, and Pierre Youssef. Matrix Poincaré inequalities and concentration. Advances in Mathematics, 371:107251, 2020.
  • [4] Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari, and Jeff Xu. Polynomial-time power-sum decomposition of polynomials. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 956–967, 2022. doi:10.1109/FOCS54457.2022.00094.
  • [5] Afonso S Bandeira, March T Boedihardjo, and Ramon van Handel. Matrix concentration inequalities and free probability. Inventiones Mathematicae, pages 1–69, 2023.
  • [6] Nikhil Bansal, Haotian Jiang, and Raghu Meka. Resolving matrix Spencer conjecture up to poly-logarithmic rank. In Proceedings of the 55th ACM Symposium on Theory of Computing, pages 1814–1819, 2023. doi:10.1145/3564246.3585103.
  • [7] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019. doi:10.1137/17M1138236.
  • [8] Sergey G Bobkov, Friedrich Götze, and Holger Sambale. Higher order concentration of measure. Communications in Contemporary Mathematics, 21(03):1850043, 2019.
  • [9] Tatiana Brailovskaya and Ramon van Handel. Universality and sharp matrix concentration inequalities. arXiv preprint, 2022. arXiv:2201.05142.
  • [10] Jingqiu Ding, Tommaso d’Orsi, Chih-Hung Liu, David Steurer, and Stefan Tiegel. Fast algorithm for overcomplete order-3 tensor decomposition. In Conference on Learning Theory, pages 3741–3799. PMLR, 2022. URL: https://proceedings.mlr.press/v178/ding22a.html.
  • [11] Rong Ge, Qingqing Huang, and Sham M. Kakade. Learning mixtures of Gaussians in high dimensions. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 761–770, New York, NY, USA, 2015. Association for Computing Machinery. doi:10.1145/2746539.2746616.
  • [12] Samuel B Hopkins, Tselil Schramm, and Jonathan Shi. A robust spectral algorithm for overcomplete tensor decomposition. In Conference on Learning Theory, pages 1683–1722. PMLR, 2019. URL: http://proceedings.mlr.press/v99/hopkins19b.html.
  • [13] Samuel B Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the 48th ACM Symposium on Theory of Computing, pages 178–191, 2016. doi:10.1145/2897518.2897529.
  • [14] De Huang and Joel A. Tropp. From Poincaré inequalities to nonlinear matrix concentration. Bernoulli, 2020.
  • [15] De Huang and Joel A. Tropp. Nonlinear matrix concentration via semigroup methods. Electronic Journal of Probability, 26:Art. No. 8, January 2021.
  • [16] Chris Jones, Aaron Potechin, Goutham Rajendran, Madhur Tulsiani, and Jeff Xu. Sum-of-squares lower bounds for sparse independent set. In Proceedings of the 62nd IEEE Symposium on Foundations of Computer Science, 2021.
  • [17] Jeong Han Kim and Van H Vu. Concentration of multivariate polynomials and its applications. Combinatorica, 20(3):417–434, 2000. doi:10.1007/S004930070014.
  • [18] Bohdan Kivva and Aaron Potechin. Exact nuclear norm, completion and decomposition for random overcomplete tensors via degree-4 sos. arXiv preprint arXiv:2011.09416, 2020. arXiv:2011.09416.
  • [19] Stanislaw Kwapien. Decoupling Inequalities for Polynomial Chaos. The Annals of Probability, 15(3):1062–1071, 1987.
  • [20] Cécilia Lancien and Pierre Youssef. A note on quantum expanders, 2023. arXiv:2302.07772.
  • [21] Rafał Latała. Estimates of moments and tails of Gaussian chaoses. The Annals of Probability, 34(6):2315–2331, 2006.
  • [22] Lester Mackey, Michael Jordan, Richard Chen, Brendan Farrell, and Joel Tropp. Matrix concentration inequalities via the method of exchangeable pairs. The Annals of Probability, 42, 2012.
  • [23] Terry R. McConnell and Murad S. Taqqu. Double integration with respect to symmetric stable processes. Technical Report 618, Cornell University, 1984.
  • [24] Ankur Moitra and Alexander S Wein. Spectral methods from tensor networks. In Proceedings of the 51st ACM Symposium on Theory of Computing, pages 926–937, 2019. doi:10.1145/3313276.3316357.
  • [25] Ryan O’Donnell and Yu Zhao. Polynomial bounds for decoupling, with applications. In Proceedings of the 31st Conference on Computational Complexity, CCC ’16, Dagstuhl, DEU, 2016. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2016.24.
  • [26] Daniel Paulin, Lester Mackey, and Joel A. Tropp. Efron–Stein inequalities for random matrices. The Annals of Probability, 44(5):3431–3473, 2016.
  • [27] Víctor H. Peña and Evarist Giné. Decoupling: From dependence to independence. Springer-Verlag, 1999.
  • [28] Goutham Rajendran. Nonlinear Random Matrices and Applications to the Sum of Squares Hierarchy. PhD thesis, University of Chicago, 2022.
  • [29] Goutham Rajendran and Madhur Tulsiani. Concentration of polynomial random matrices via efron-stein inequalities. Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2023.
  • [30] Holger Rauhut. Compressive sensing and structured random matrices. In Theoretical Foundations and Numerical Methods for Sparse Recovery, pages 1–92. De Gruyter, Berlin, New York, 2010.
  • [31] Warren Schudy and Maxim Sviridenko. Bernstein-like concentration and moment inequalities for polynomials of independent random variables: multilinear case. arXiv preprint, 2011. arXiv:1109.5193.
  • [32] Warren Schudy and Maxim Sviridenko. Concentration and moment inequalities for polynomials of independent random variables. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 437–446. SIAM, 2012. doi:10.1137/1.9781611973099.37.
  • [33] Khalid Shebrawi and Hussien Albadawi. Trace inequalities for matrices. Bulletin of the Australian Mathematical Society, 87(1):139–148, 2013.
  • [34] Joel A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.
  • [35] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.