
Simple Norm Bounds for Polynomial Random Matrices via Decoupling

Madhur Tulsiani (Toyota Technological Institute at Chicago, IL, USA) and June Wu (University of Chicago, IL, USA)
Abstract

We present a new method for obtaining norm bounds for random matrices, where each entry is a low-degree polynomial in an underlying set of independent real-valued random variables. Such matrices arise in a variety of settings in the analysis of spectral and optimization algorithms, which require understanding the spectrum of a random matrix depending on data obtained as independent samples.

Using ideas of decoupling and linearization from analysis, we show a simple way of expressing norm bounds for such matrices, in terms of matrices of lower-degree polynomials corresponding to derivatives. Iterating this method gives a simple bound with an elementary proof, which can recover many bounds that previously required more involved techniques.

Keywords and phrases:
Matrix Concentration, Decoupling, Graph Matrices
Funding:
Madhur Tulsiani: NSF grants CCF-1816372 and CCF-2326685.
Copyright and License:
© Madhur Tulsiani and June Wu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Probability and statistics; Theory of computation → Theory and algorithms for application domains
Editors:
Raghu Meka

1 Introduction

For fixed matrices 𝐂𝟏,,𝐂𝐧 and independent scalar random variables x1,,xn, consider the problem of analyzing the random matrix

𝐌 = 𝐂_1 x_1 + ⋯ + 𝐂_n x_n.

Note that the entries of the random matrix 𝐌 are not necessarily independent, but are (possibly correlated) linear functions of the independent random variables x_1,…,x_n. Such matrices, which arise in a variety of applications in algorithms, statistics, and numerical linear algebra, can often be shown to concentrate around their mean, using a rich selection of matrix deviation inequalities [34]. Moreover, when the random variables x_1,…,x_n are Gaussian, recent breakthrough results have obtained even sharper concentration guarantees depending on the structure of the matrices 𝐂_1,…,𝐂_n [5, 9], already leading to applications in discrepancy theory [6] and quantum information theory [20].

In contrast to the above random matrices which are linear functions of independent random variables, several recent applications in spectral algorithms and lower bounds for statistical problems require understanding the (expected) norms of random matrices of the form

𝐅 = ∑_{S ⊆ [n], |S| ≤ d} 𝐂_S ∏_{i∈S} x_i.

The above random matrix, which we denote as a matrix-valued function 𝐅(𝐱), is a (multilinear, in the example above) low-degree polynomial in the vector of random variables 𝐱 = (x_1,…,x_n). Such matrix-valued polynomial functions arise naturally in a variety of applications, including algorithms for tensor decomposition and completion [13, 12, 18, 10], orbit recovery [24], power-sum decompositions and learning of Gaussian mixtures [4], and lower bounds for Sum-of-Squares hierarchies [2, 7, 16, 28]. In general, such matrices are often an important technical component in the analysis of spectral algorithms, which require understanding the spectrum of some random matrix depending on data obtained as independent samples. Given the expressive power of polynomials, one can often write the matrix entries as low-degree polynomials in the data.

In several applications above, one is interested in obtaining bounds on the spectral norm of the deviation matrix 𝐅(𝐱) − 𝔼𝐅(𝐱), which hold with high probability over the choice of 𝐱. Although matrix deviation inequalities for nonlinear random matrices have been a subject of active research in recent work [26, 22, 3, 14, 15], the analyses in the applications above have often needed to rely on estimating matrix norms via direct computations of trace moments. While these do yield the sharp bounds required for applications, they require somewhat intricate computations and ingenious combinatorial arguments tailored to the applications at hand.

Analyzing concentration via moment estimates.

We consider concentration bounds on a scalar polynomial f(𝐱) with mean zero. Using Markov’s inequality, this can be reduced to computing moment estimates, since

ℙ[|f(𝐱)| ≥ λ] = ℙ[(f(𝐱))^{2t} ≥ λ^{2t}] ≤ λ^{−2t} · 𝔼[(f(𝐱))^{2t}].

Note that while in some cases 𝔼[(f(𝐱))^{2t}] can be computed by direct expansion, it often involves an intricate analysis of the structure of terms with degrees growing with t, and therefore indirect methods may be more convenient. Methods for obtaining concentration of (scalar) polynomial functions have been the subject of a large body of work, including results by Kim and Vu [17], Latała [21], Schudy and Sviridenko [31, 32], Adamczak and Wolff [1], and Bobkov, Götze and Sambale [8]. A useful comparison to direct computation is the method of Adamczak and Wolff, which is applicable to functions f(𝐱) of a broad range of random vectors 𝐱 (with not necessarily independent coordinates) from a distribution obeying the Poincaré inequality. They obtain such moment bounds by recursive applications of the Poincaré inequality, reducing the computation of the moments of f to that of its derivatives. For a degree-d polynomial function f, one can consider its coefficients as forming a (symmetric) order-d tensor, and their results obtain bounds on the moments of f in terms of the norms of various lower-order "flattenings" of this coefficient tensor. Thus, the problem of estimating moments of the random variable f(𝐱) can be reduced, in a black-box way, to understanding the norms of a small number (depending on d) of deterministic tensors. Any dependence on the problem structure can then be limited to understanding these norms, where often crude estimates can suffice.
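To make the moment-method reduction concrete, here is a minimal numerical sketch (our own illustration, not from the paper): it estimates both sides of the Markov bound above for a small hand-picked multilinear polynomial with Rademacher inputs. The polynomial f, the threshold λ = 3, and the choice t = 2 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, lam, trials = 4, 2, 3.0, 200_000

# A mean-zero multilinear quadratic: f(x) = x1*x2 + x2*x3 + x3*x4.
def f(x):
    return x[..., 0] * x[..., 1] + x[..., 1] * x[..., 2] + x[..., 2] * x[..., 3]

x = rng.choice([-1.0, 1.0], size=(trials, n))        # Rademacher inputs
vals = f(x)

tail = np.mean(np.abs(vals) >= lam)                   # P[|f(x)| >= lam]
bound = np.mean(vals ** (2 * t)) / lam ** (2 * t)     # lam^{-2t} E[(f(x))^{2t}]
print(tail, bound)  # here the tail is ~0.25 and the moment bound ~0.26
```

For this particular polynomial the bound happens to be nearly tight; in general the slack depends on the choice of t.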

The matrix analog of the Markov argument involves the Schatten 2t-norm ‖·‖_{2t}, defined for a matrix 𝐌 with non-zero singular values σ_1,…,σ_r by ‖𝐌‖_{2t}^{2t} := ∑_{j∈[r]} σ_j^{2t}. For a function 𝐅 with 𝔼[𝐅(𝐱)] = 𝟎, we have the following bound using Schatten norms.

ℙ[σ_1(𝐅) ≥ λ] ≤ λ^{−2t} · 𝔼‖𝐅‖_{2t}^{2t} = λ^{−2t} · 𝔼 tr[(𝐅(𝐱)^𝖳 𝐅(𝐱))^t].

Several applications requiring norm bounds for matrix-valued polynomial functions start with the above inequality, and often rely on direct expansion of the trace for the matrix power (𝐅(𝐱)^𝖳𝐅(𝐱))^t. Recall that when 𝐀 is the adjacency matrix of a graph, tr(𝐀^{2t}) can be interpreted as the number of closed walks of length 2t. Similarly, when working with matrices 𝐅(𝐱) where each entry can be interpreted as arising from a combinatorial pattern, the trace of (𝐅(𝐱)^𝖳𝐅(𝐱))^t can be viewed as counting copies of such patterns "chained" together in a certain way. Estimating the trace then amounts to estimating the (expected) number of such chainings [2].
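The walk-counting identity behind the trace method can be checked directly. The following sketch (our own illustration) verifies that tr(𝐀^{2t}) counts the closed walks of length 2t in a small graph, here a 4-cycle:

```python
import numpy as np
from itertools import product

# Adjacency matrix of a 4-cycle; tr(A^{2t}) counts closed walks of length 2t.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
t = 2
trace_power = np.trace(np.linalg.matrix_power(A, 2 * t))

# Brute-force count of closed walks v0 -> v1 -> ... -> v_{2t-1} -> v0.
count = 0
for walk in product(range(4), repeat=2 * t):
    count += all(A[walk[i], walk[(i + 1) % (2 * t)]] for i in range(2 * t))
print(trace_power, count)   # both equal the number of closed walks of length 2t
```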

Obtaining bounds on 𝔼‖𝐅‖_{2t}^{2t} via this method requires first using problem structure to decompose 𝐅 into matrices with such combinatorial patterns, and then using (ingenious) combinatorial arguments to understand the expected number of occurrences of such chains of patterns. The focus of this work is understanding alternative general methods for estimating such norm bounds, when the underlying problem may not necessarily have a combinatorial flavor, or when such decompositions may be hard to analyze.

Decoupling.

Decoupling inequalities were developed in the study of U-statistics [27], multiple stochastic integration [23], and polynomial chaos [19], and have found important applications in applied mathematics, theoretical computer science, applied probability theory, and statistics. Some examples include the study of compressed sensing [30], query complexity [25], the proof of the Hanson–Wright inequality [35], and learning mixtures of Gaussians [11]. Moreover, the inequalities are applicable in both scalar and matrix settings, and even more broadly in Banach spaces.

For a degree-d homogeneous multilinear polynomial f(𝐱) in n independent random variables, standard decoupling inequalities (see Section 3) can be used to obtain

𝔼_𝐱[(f(𝐱))^{2t}] ≤ C_d^{2t} · 𝔼_{𝐱^{(1)},…,𝐱^{(d)}}[(f(𝐱^{(1)},…,𝐱^{(d)}))^{2t}],

where f(𝐱(1),,𝐱(d)) denotes the polynomial in dn random variables obtained by replacing the d variables in each (ordered) monomial by the corresponding d coordinates coming from the d independent random vectors 𝐱(1),,𝐱(d), and Cd is a constant depending on the degree d. One can now fix the vectors 𝐱(1),,𝐱(d1), and consider the expectation

𝔼_{𝐱^{(d)}}[(f(𝐱^{(1)},…,𝐱^{(d)}))^{2t}] = 𝔼_{𝐱^{(d)}}[(∑_{i∈[n]} f_i(𝐱^{(1)},…,𝐱^{(d−1)}) · x_i^{(d)})^{2t}],

where we write f(𝐱(1),,𝐱(d)) as a linear form in 𝐱(d) with coefficients fi(𝐱(1),,𝐱(d1)). The moments of such a linear form can be understood (for example) using the Khintchine inequality, which bounds it in terms of a polynomial depending only on 𝐱(1),,𝐱(d1).
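The scalar decoupling step can be illustrated numerically. The sketch below (our own, with an arbitrary coefficient matrix a and Rademacher inputs) compares the moments of a degree-2 multilinear form with those of its decoupled version, scaled by C_d^{2t} with C_d = d^d = 4:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 6, 2, 200_000

# Coefficients of a degree-2 multilinear form f(x) = sum_{i != j} a_ij x_i x_j.
a = rng.standard_normal((n, n))
np.fill_diagonal(a, 0.0)

x = rng.choice([-1.0, 1.0], size=(trials, n))
x1 = rng.choice([-1.0, 1.0], size=(trials, n))   # independent copy x^{(1)}
x2 = rng.choice([-1.0, 1.0], size=(trials, n))   # independent copy x^{(2)}

coupled = np.einsum('ij,ti,tj->t', a, x, x)       # f(x)
decoupled = np.einsum('ij,ti,tj->t', a, x1, x2)   # f(x^{(1)}, x^{(2)})

lhs = np.mean(coupled ** (2 * t))
rhs = 4 ** (2 * t) * np.mean(decoupled ** (2 * t))   # C_d^{2t} with C_d = 4
print(lhs, rhs)   # empirically lhs <= rhs, with plenty of slack
```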

Similarly, in the matrix case, we use decoupling inequalities to reduce the problem of estimating moments for degree-d homogeneous (and multilinear) matrix-valued polynomials, to that of estimating moments for linear matrix-valued functions. At this point, one can apply standard and well-known linear matrix concentration inequalities (we use the matrix Rosenthal inequality) which yield a bound in terms of a degree-(d1) matrix-valued polynomial function of the vectors 𝐱(1),,𝐱(d1). Iterating this process gives a simple method for obtaining norm bounds, applicable for multilinear polynomial functions on n independent random variables (with any reasonable distribution).

While this method does not immediately apply for non-multilinear polynomials, it suffices for many of the applications mentioned above. Moreover, for arbitrary polynomials in Gaussian random variables, we can also obtain a simple recursion as an immediate consequence of the more recent Poincaré inequalities of Huang and Tropp [15]. Together, these suffice for most applications of interest.

Methods and results: a technical overview

Let x_1,…,x_n be i.i.d. real-valued random variables with 𝔼[x_i] = 0, 𝔼[x_i^2] = 1 and |x_i| ≤ L for each i∈[n]. Let 𝒯_n^d ⊆ [n]^d denote the set of ordered d-tuples 𝐢 = (i_1,…,i_d) with i_1,…,i_d all distinct. For a fixed sequence of (deterministic) matrices {𝐀_𝐢}_{𝐢∈𝒯_n^d} ⊆ ℝ^{d_1×d_2}, we consider a matrix-valued multilinear polynomial function defined as

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} 𝐀_𝐢 ∏_{j∈[d]} x_{i_j}.

We write the polynomial in terms of ordered tuples for decoupling, but since the variables x1,,xn commute, we can assume that for any permutation σ:[d][d] permuting the d coordinates of each tuple, 𝐀𝐢=𝐀σ(𝐢) (this is referred to as permutation symmetry of 𝐅). We can use the decoupling inequalities from Section 3 to show that

‖𝐅(𝐱)‖_{4t} ≤ C_d · ‖𝐅(𝐱^{(1)},…,𝐱^{(d)})‖_{4t} = C_d · ‖∑_{𝐢∈𝒯_n^d} 𝐀_𝐢 ∏_{j∈[d]} x_{i_j}^{(j)}‖_{4t},

where ‖·‖_{4t} denotes the (expected) Schatten norm (𝔼[tr(𝐅(𝐱)^𝖳𝐅(𝐱))^{2t}])^{1/4t}.

We remark that while the constant C_d is of the form d^d in general, this can be improved when the underlying set of indices [n] has more structure. For example, when [n] corresponds to the set of pairs in a base set [r] and each of the monomials involves at most k elements of [r] (has index degree k), it is possible to improve the constant to k^k. This is essentially the "vertex partitioning" argument used in works on graph matrices, and is discussed in Section 3 in the language of decoupling.

As mentioned earlier, we can now "linearize" the function 𝐅(𝐱^{(1)},…,𝐱^{(d)}) by fixing 𝐱^{(1)},…,𝐱^{(d−1)} and treating it only as a function of 𝐱^{(d)}, i.e., we consider

𝔼‖∑_{𝐢∈𝒯_n^d} 𝐀_𝐢 ∏_{j∈[d]} x_{i_j}^{(j)}‖_{4t}^{4t} = 𝔼_{𝐱^{(1)},…,𝐱^{(d−1)}} 𝔼_{𝐱^{(d)}} ‖∑_{k∈[n]} (∑_{𝐢∈𝒯_n^d : i_d=k} 𝐀_𝐢 ∏_{j∈[d−1]} x_{i_j}^{(j)}) · x_k^{(d)}‖_{4t}^{4t}
= 𝔼_{𝐱^{(1)},…,𝐱^{(d−1)}} 𝔼_{𝐱^{(d)}} ‖∑_{k∈[n]} (∂_{x_k^{(d)}}𝐅) · x_k^{(d)}‖_{4t}^{4t}.

To bound the inner expectation, we can now use the matrix Rosenthal inequality (Lemma 8) which says that for a collection of centered, independent random matrices {𝐘k} with finite moments, we have

𝔼‖∑_k 𝐘_k‖_{4t}^{4t} ≤ (16t)^{3t} {‖(∑_k 𝔼 𝐘_k𝐘_k^𝖳)^{1/2}‖_{4t}^{4t} + ‖(∑_k 𝔼 𝐘_k^𝖳𝐘_k)^{1/2}‖_{4t}^{4t}} + (8t)^{4t} (∑_k 𝔼‖𝐘_k‖_{4t}^{4t}).

Taking 𝐘_k = (∂_{x_k^{(d)}}𝐅) · x_k^{(d)}, we can now compute the expectations (over 𝐱^{(d)}) in the RHS as

∑_k 𝔼 𝐘_k𝐘_k^𝖳 = [∂_{x_1^{(d)}}𝐅 ⋯ ∂_{x_n^{(d)}}𝐅] · [∂_{x_1^{(d)}}𝐅 ⋯ ∂_{x_n^{(d)}}𝐅]^𝖳,
∑_k 𝔼 𝐘_k^𝖳𝐘_k = [∂_{x_1^{(d)}}𝐅^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅^𝖳] · [∂_{x_1^{(d)}}𝐅^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅^𝖳]^𝖳,
∑_k 𝔼‖𝐘_k‖_{4t}^{4t} = 𝔼‖diag(x_1^{(d)} ∂_{x_1^{(d)}}𝐅, …, x_n^{(d)} ∂_{x_n^{(d)}}𝐅)‖_{4t}^{4t} ≤ L^{4t} · ‖diag(∂_{x_1^{(d)}}𝐅, …, ∂_{x_n^{(d)}}𝐅)‖_{4t}^{4t}.
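The three identities above are straightforward to verify numerically. In the sketch below (our own illustration), fixed matrices D_k play the role of the derivatives ∂_{x_k^{(d)}}𝐅 (which are constant once 𝐱^{(1)},…,𝐱^{(d−1)} are fixed), and we check how the Rosenthal terms assemble into the row, column, and diagonal block matrices, up to transposes that do not change Schatten norms:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(2)
n, p, q = 4, 3, 2
D = [rng.standard_normal((p, q)) for _ in range(n)]   # stand-ins for the derivatives

# With Y_k = D_k x_k and E[x_k^2] = 1, the Rosenthal terms become deterministic.
row = np.hstack(D)                  # blocks side by side
col = np.vstack(D)                  # blocks stacked
diag = block_diag(*D)               # blocks on the diagonal

sum_YYt = sum(Dk @ Dk.T for Dk in D)        # sum_k E[Y_k Y_k^T]
assert np.allclose(sum_YYt, row @ row.T)    # = [D_1 ... D_n][D_1 ... D_n]^T

sum_YtY = sum(Dk.T @ Dk for Dk in D)        # sum_k E[Y_k^T Y_k]
assert np.allclose(sum_YtY, col.T @ col)

schatten = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
# The last Rosenthal term collects the blocks on a diagonal:
assert np.isclose(schatten(diag, 4), sum(schatten(Dk, 4) for Dk in D))
print("identities verified")
```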

Iterating the above argument, we can prove a bound in terms of a collection of partial derivative matrices 𝐅_{a,b,c} (see Section 2 for details).

Note that in the above definition, we do not specify the order of differentiation, which can be ignored due to the permutation symmetry of 𝐅 and the fact that changing the order does not change the Schatten norms (see Section 2 and Section 4 for details). This argument yields the following bound for all homogeneous multilinear polynomial functions.

Theorem 1 (Restatement of Theorem 10).

Let 𝐱 = {x_i}_{i=1}^n be a sequence of i.i.d. random variables with 𝔼x_i = 0, 𝔼x_i^2 = 1 and |x_i| ≤ L for all 1 ≤ i ≤ n. Let {𝐀_𝐢}_{𝐢∈𝒯_n^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a permutation symmetric, homogeneous multilinear polynomial random matrix of degree d as

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b,c ≥ 0 with d = a+b+c. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱)‖_{4t}^{4t} ≤ ∑_{a,b,c : a+b+c=d} (48dt)^{4dt} L^{4ct} ‖𝐅_{a,b,c}‖_{4t}^{4t}.

We remark that the condition |x_i| ≤ L can be replaced by a condition on the growth of the moments of x_i, which holds for subgaussian variables. The above bound is in terms of the norms of a constant number of deterministic matrices 𝐅_{a,b,c}, as is also the case for general moment bounds on scalar polynomials [1]. We will see later that these deterministic matrices can easily be interpreted in several cases of interest, to recover the bounds obtained via combinatorial methods.
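To see what the deterministic matrices 𝐅_{a,b,c} look like concretely, the following sketch (our own, for d = 2 with 2×2 coefficient blocks) assembles all six of them from the second partials ∂_i∂_j𝐅 = 2𝐀_{ij} and prints the Schatten norms entering the bound of Theorem 1. The particular ordering of blocks is one of several equivalent choices, since Schatten norms are invariant to it:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
n = 4
# Symmetric 2x2 coefficient blocks A[i,j] = A[j,i], zero on the diagonal,
# for F(x) = sum_{i != j} A[i,j] x_i x_j (degree d = 2).
A = rng.standard_normal((n, n, 2, 2))
A = (A + A.transpose(1, 0, 2, 3)) / 2
A[np.arange(n), np.arange(n)] = 0.0

dd = [[2 * A[i, j] for j in range(n)] for i in range(n)]   # d_i d_j F

schatten = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
pairs = [(i, j) for i in range(n) for j in range(n)]

F200 = np.vstack([dd[i][j] for i, j in pairs])        # both derivatives go "down"
F020 = np.hstack([dd[i][j] for i, j in pairs])        # both go "right"
F110 = np.block(dd)                                   # one down, one right
F002 = block_diag(*[dd[i][j] for i, j in pairs])      # both on the diagonal
F101 = block_diag(*[np.vstack([dd[i][j] for i in range(n)]) for j in range(n)])
F011 = block_diag(*[np.hstack([dd[i][j] for i in range(n)]) for j in range(n)])

for name, M in [("F200", F200), ("F110", F110), ("F020", F020),
                ("F101", F101), ("F011", F011), ("F002", F002)]:
    print(name, schatten(M, 4) ** 0.25)   # the norms entering Theorem 1 (t = 1)
```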

Gaussian polynomial matrices.

While the above bounds are only for multilinear polynomials, they can also be extended to arbitrary polynomials of independent Gaussian random variables, using standard techniques to approximate them by multilinear polynomials. However, for the case of Gaussian polynomials, the recent Poincaré inequalities of Huang and Tropp [14] directly yield a simple bound, which is easier to apply. They show that for a Hermitian matrix-valued function 𝐇 of Gaussian random variables,

𝔼‖𝐇(𝐱) − 𝔼𝐇(𝐱)‖_{2t}^{2t} ≤ (2t)^{2t} · 𝔼 tr(∑_{i=1}^n (∂_i𝐇(𝐱))^2)^t.

As before, one can apply this argument inductively to obtain a bound in terms of the matrices 𝐅_{a,b}, which are defined similarly to the matrices 𝐅_{a,b,c} above, with c = 0. Since we no longer have 𝔼𝐅_{a,b} = 𝟎 for a+b < d, the bound is in terms of the expected matrices 𝔼𝐅_{a,b} for all a,b with a+b ≤ d.

Theorem 2 (Restatement of Theorem 14).

Let 𝐱 ∼ 𝒩(0,𝕀_n) and let {𝐀_𝐢}_{𝐢∈[n]^d} be a sequence of deterministic matrices of the same dimension. Define a degree-d homogeneous Gaussian polynomial random matrix as

𝐅(𝐱) = ∑_{𝐢∈[n]^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b ≥ 0. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{2t}^{2t} ≤ (2d^2t)^{2t} (∑_{1≤a+b≤d} ‖𝔼𝐅_{a,b}‖_{2t}^{2t}).

While the above theorem is stated for homogeneous polynomials, the proof is identical for non-homogeneous ones, with the sum in the definition ranging over all tuple sizes.
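As a quick sanity check of the Gaussian recursion (our own illustration, not from the paper), the d = 1 case of Theorem 2 can be tested by Monte Carlo: for a linear Gaussian matrix polynomial, the matrices 𝐅_{1,0} and 𝐅_{0,1} are simply the deterministic stacks of the coefficients (up to transposes, which do not affect Schatten norms).

```python
import numpy as np

rng = np.random.default_rng(4)
n, t, trials = 5, 2, 20_000
A = [rng.standard_normal((3, 3)) for _ in range(n)]

schatten_pow = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()

# Monte-Carlo estimate of E ||F(x)||_{2t}^{2t} for F(x) = sum_i A_i x_i.
lhs = np.mean([schatten_pow(sum(a * g for a, g in zip(A, rng.standard_normal(n))),
                            2 * t)
               for _ in range(trials)])

# Theorem 2 with d = 1: the right-hand side uses the deterministic stacks.
rhs = (2 * t) ** (2 * t) * (schatten_pow(np.vstack(A), 2 * t)
                            + schatten_pow(np.hstack(A), 2 * t))
print(lhs, rhs)   # lhs is (much) smaller than rhs
```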

Do try this at home: applying the framework

We now discuss how to interpret the matrices 𝐅a,b,c arising in the bound in Theorem 1 to recover the norm bounds for graph matrices derived via combinatorial methods. This is discussed with more formal details in Section 6, but we present an intuitive (at least for the authors) version of the argument here.

Graph matrices are defined using constant-sized template "shapes" and provide a convenient basis for expressing (large) random matrices where the entries are low-degree polynomials in the (normalized) indicators of edges in a G_{n,p} random graph. Such matrices and their norm bounds have been used for a large number of applications [2].

Let N = \binom{n}{2}, fix a canonical bijection between [N] and \binom{[n]}{2}, and let [N] index the space of all possible edges in a random graph on n vertices. For each e = {i,j} ∈ \binom{[n]}{2}, letting Δ_p = √((1−p)/p), let x_e independently be −1/Δ_p with probability (1−p) and Δ_p with probability p, so that 𝔼x_e = 0, 𝔼x_e^2 = 1, and |x_e| ≤ Δ_p.

Let (V(τ), E(τ)) be a graph on k vertices with d edges, and let U_τ, V_τ be ordered subsets of V(τ) of sizes k_1 and k_2 respectively. The tuple τ = (V(τ), E(τ), U_τ, V_τ) is referred to as a "shape" and is used to define the following graph matrix 𝐌_τ, with rows and columns indexed by 𝒯_n^{k_1} and 𝒯_n^{k_2} respectively:

𝐌_τ[𝐢,𝐣] := ∑_{ψ∈𝒯_n^k} 𝟙{ψ(U_τ) = 𝐢} · 𝟙{ψ(V_τ) = 𝐣} · ∏_{e_0∈E(τ)} x_{ψ(e_0)},

where we interpret a tuple ψ∈𝒯_n^k as an injective function ψ : V(τ) → [n] in the canonical way. If the function ψ maps U_τ to 𝐢 (the row index) and V_τ to 𝐣 (the column index), we add to 𝐌_τ[𝐢,𝐣] the monomial given by the product of the variables indexed by the images of all edges in the "pattern" E(τ) under ψ. Note that each nonzero entry of the matrix-valued function 𝐌_τ is a multilinear homogeneous polynomial in the variables {x_e}_{e∈\binom{[n]}{2}} of degree d = |E(τ)|. We denote 𝐌_τ by 𝐅 for ease of notation in the discussion below.
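A graph matrix for the simplest nontrivial shape, a length-2 path u–w–v with U_τ = (u) and V_τ = (v), can be built in a few lines; the sketch below (our own illustration) uses the p-biased variables defined above and compares the spectral norm with the n^{(k−r)/2} scaling discussed later (k = 3 and minimum separator size r = 1 for this shape):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 0.5

# p-biased, normalized edge variables x_e with E x_e = 0, E x_e^2 = 1.
Delta = np.sqrt((1 - p) / p)
X = np.where(rng.random((n, n)) < p, Delta, -1 / Delta)
X = np.triu(X, 1)
X = X + X.T                                # symmetric with zero diagonal

# Shape tau: a path u - w - v with U_tau = (u), V_tau = (v), so k = 3, d = 2.
# M[i, j] = sum over a not in {i, j} of x_{ia} x_{aj}; the zero diagonal of X
# already kills a = i and a = j, and i = j is excluded by injectivity of psi.
M = X @ X
np.fill_diagonal(M, 0.0)

r = 1                                      # the minimum vertex separator is {w}
print(np.linalg.norm(M, 2), n ** ((3 - r) / 2))   # both scale as n, up to constants
```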

Figure 1: A shape τ.

As will be apparent from the analysis below, the norm bounds for such matrices have different characterizations in the "dense" regime where p = Ω(1) and the "sparse" one where p = o(1). This is because the subgaussian norm L = Δ_p ≈ p^{−1/2} is bounded in the first case, and growing in the second case. The bounds for the first case are stated in terms of the size of the minimum vertex separator separating U_τ from V_τ in the shape τ. For the second case, one needs to augment the definition of the minimum vertex separator, to use a cost based on the subgaussian norm Δ_p. We will show how to recover both these characterizations, using the matrices 𝐅_{a,b,c} given by Theorem 1.

Dense graph matrices.

We start by considering p=1/2 so that the random variables xe are just independent Rademacher variables. Taking d to be constant, the bound in Theorem 1 involves only a constant number of matrices, and will depend on the dominant term. To understand each of the terms, we first consider the matrices derived after one step of the recursion.

𝐅_{1,0,0} = [∂_{x_1}𝐅 ⋯ ∂_{x_N}𝐅]^𝖳,  𝐅_{0,1,0} = [∂_{x_1}𝐅 ⋯ ∂_{x_N}𝐅],  and  𝐅_{0,0,1} = diag(∂_{x_1}𝐅, …, ∂_{x_N}𝐅).

Since tr(∑_i 𝐀_i𝐀_i^𝖳)^{2t} + tr(∑_i 𝐀_i^𝖳𝐀_i)^{2t} ≥ ∑_i tr(𝐀_i𝐀_i^𝖳)^{2t} and L = 1, we will have that ‖𝐅_{1,0,0}‖_{4t}^{4t} + ‖𝐅_{0,1,0}‖_{4t}^{4t} ≥ L^{4t}‖𝐅_{0,0,1}‖_{4t}^{4t}. This will also be true (up to constant factors) for any constant L, since we are interested in the 4t-th root of the above trace. Using a similar argument, we can ignore terms with c > 0 for now (we will come back to these in the sparse case) and focus on matrices 𝐅_{a,b,0}.

We now interpret the matrices 𝐅_{1,0,0} and 𝐅_{0,1,0}. The row-space of 𝐅_{1,0,0} is now indexed by pairs (𝐢,e) for e∈\binom{[n]}{2} (respectively (𝐣,e) for the column space of 𝐅_{0,1,0}). An entry now includes terms corresponding to maps ψ for which ψ(U_τ) = 𝐢, ψ(V_τ) = 𝐣, and ψ(e_0) = e for some e_0∈E(τ), so that ∂_{x_e}𝐅[𝐢,𝐣] ≠ 0.

Decomposing 𝐅_{1,0,0} into |E(τ)| matrices (one for each choice of e_0), we can view each 𝐅_{1,0,0,e_0} as simply a new graph matrix where we have deleted the edge e_0 and included both its endpoints in U_τ (respectively in V_τ for 𝐅_{0,1,0}) to obtain a new shape τ′. Similarly, each 𝐅_{a,b,0} can be decomposed into |E(τ)|^{a+b} graph matrices, where we delete a+b edges in total, and include the endpoints in U_τ for a of them and in V_τ for b of them. This is illustrated in Figure 2, where we color the edges red or green depending on their inclusion in U_τ or V_τ.

Figure 2: 𝐅1,0,0, 𝐅0,1,0 and 𝐅2,3,0.

The bound in Theorem 1 is then in terms of these new graph matrices where we delete all d edges, and split their endpoints between U_τ and V_τ to obtain U, V with U∪V = V(τ) = [k]. For an entry 𝐌[𝐢,𝐣] of any such matrix, there is at most one ψ : V(τ) → [n] such that ψ(U) = 𝐢 and ψ(V) = 𝐣 (assuming τ has no isolated vertices), and 𝐌[𝐢,𝐣] = 𝟙{ψ(U) = 𝐢} · 𝟙{ψ(V) = 𝐣}. Taking S = U∩V, this matrix can simply be written as a block-diagonal matrix with n^{|S|} blocks of size n^{|U∖S|} × n^{|V∖S|}. It is easy to check that the spectral norm of such a matrix is n^{(|U∖S|+|V∖S|)/2} = n^{(k−|S|)/2}, since |U∪V| = k and S = U∩V.

Thus, each of the terms in the bound will correspond to graph matrices with different U and V (obtained by coloring edges red or green according to the above process), and the dominant term will be the one with the smallest value of |U∩V|. We now claim that U∩V must be a vertex separator separating U_τ and V_τ in the shape τ, i.e., any path between them must pass through U∩V. If the path contains both red and green edges, then it contains one vertex with both red and green edges incident on it, which is then in U∩V (since U is obtained by adding endpoints of red edges to U_τ, and V similarly for green edges). If the path is entirely red, then we have an endpoint of a red edge in V_τ, which is a vertex in U∩V, and similarly for green paths.

Thus, we have n^{(k−|S|)/2} ≤ n^{(k−r)/2}, where r is the size of the minimum vertex separator between U_τ and V_τ, which recovers the known bound for dense graph matrices [2].

Sparse graph matrices.

The argument for sparse graph matrices is almost the same as above, except that now we also need to consider terms of the form 𝐅_{a,b,c} for c > 0. Just as we interpreted 𝐅_{a,b,0} as coloring a edges from E(τ) red and including their endpoints in U ⊇ U_τ, and b edges green with endpoints in V ⊇ V_τ, we can now interpret 𝐅_{a,b,c} as having c edges whose endpoints are included in both U and V (say these edges are colored yellow). This simply reflects the fact that the derivatives in the definition of 𝐅_{a,b,c+1} are placed as diagonal blocks.

Figure 3: 𝐅0,0,1.

As we saw above, increasing the intersection of U and V decreases the norm, and so the matrices with c > 0 should have a smaller Schatten norm. However, they are now included in the bound with a multiplicative factor of L^c. Thus, we simply look for a vertex separator S maximizing L^{e(S)} · n^{(k−|S|)/2}, where e(S) counts the (yellow) edges contained in S = U∩V. This is precisely the "sparse vertex separator" which determines the norm bound for such sparse graph matrices [16, 29].

2 Preliminaries and Notation

Vectors are denoted by bold face lower case letters 𝐱,𝐲 and 𝐳. Deterministic matrices are denoted by bold face upper case letters 𝐀 and 𝐁. Matrix-valued functions are denoted by bold face upper case letters 𝐗, 𝐘, 𝐅, 𝐌, 𝐇 and 𝐏.

Sets and indices.

For a set S, let |S| denote the number of distinct elements of S. We denote [n] = {1,…,n} for any positive integer n. For tuples 𝐢 := (i_1,…,i_m) ∈ [n]^m, define

𝒯_n^m := {(i_1,…,i_m) ∈ [n]^m : i_j ≠ i_k for all j ≠ k ∈ [m]}.

Matrix norms.

Let ℝ^{d_1×d_2} be the space of all d_1×d_2 real matrices, and let ℍ^d be the subspace of ℝ^{d×d} containing all Hermitian matrices. We write ‖·‖_2 for the ℓ_2 operator norm, ‖·‖_F for the Frobenius norm, and tr(·) for the trace.

Definition 3 (Schatten norm).

For 𝐀d1×d2 and t1, the Schatten 2t-norm is defined as

‖𝐀‖_{2t} := (tr(𝐀^𝖳𝐀)^t)^{1/2t}.
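For reference, a small helper (our own, not from the paper) computes the Schatten 2t-norm from singular values and checks it against the trace formula of Definition 3:

```python
import numpy as np

def schatten(A: np.ndarray, two_t: int) -> float:
    """Schatten 2t-norm via singular values: (sum_j sigma_j^{2t})^{1/2t}."""
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** two_t).sum() ** (1.0 / two_t))

A = np.random.default_rng(6).standard_normal((3, 5))
t = 2
# Agrees with the trace formula (tr (A^T A)^t)^{1/2t}:
via_trace = np.trace(np.linalg.matrix_power(A.T @ A, t)) ** (1 / (2 * t))
print(schatten(A, 2 * t), via_trace)
```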

Polynomial random matrices.

Let (Ω, ℱ, μ) be a probability space. Introduce the random vector 𝐱 := (x_1,…,x_n) ∈ Ω, where x_1,…,x_n are independent random variables. A polynomial random matrix 𝐅(𝐱) is a matrix-valued polynomial 𝐅 : Ω → ℝ^{d_1×d_2}; we construct a polynomial random matrix 𝐅(𝐱) by drawing a copy of 𝐱 ∼ μ. Note that although 𝐱 has independent entries, the entries of 𝐅(𝐱) can be dependent on (or correlated with) each other; unlike Wigner matrices, whose entries are i.i.d., polynomial random matrices can have dependencies among their entries.

Definition 4 (Permutation Symmetric Property).

Suppose 𝐅(𝐱) is a degree-d homogeneous multilinear polynomial random matrix, i.e.,

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j),

where {𝐀_𝐢}_{𝐢∈𝒯_n^d} is a multi-indexed sequence of deterministic matrices of the same dimension. 𝐅(𝐱) is permutation symmetric if 𝐀_{i_1,…,i_d} = 𝐀_{i_{σ(1)},…,i_{σ(d)}} for any permutation σ∈𝔖_d and any (i_1,…,i_d)∈𝒯_n^d.

Partial derivatives.

Let 𝐅(𝐱) be an n-variate polynomial random matrix of degree D. For a,b,c ≥ 0 and d = a+b+c, we consider the block matrices 𝐅_{a,b,c} containing all d-th order partial derivatives of 𝐅(𝐱). We define 𝐅_{a,b,c} recursively via 𝐅_{0,0,0} = 𝐅(𝐱) and

𝐅_{a+1,b,c} = [∂_{x_1}𝐅_{a,b,c} ⋯ ∂_{x_n}𝐅_{a,b,c}]^𝖳,
𝐅_{a,b+1,c} = [∂_{x_1}𝐅_{a,b,c} ⋯ ∂_{x_n}𝐅_{a,b,c}],
𝐅_{a,b,c+1} = diag(∂_{x_1}𝐅_{a,b,c}, …, ∂_{x_n}𝐅_{a,b,c}).

It is evident from the definition that the block matrices 𝐅_{a+1,b,c}, 𝐅_{a,b+1,c}, 𝐅_{a,b,c+1} are assembled from the sub-blocks {∂_{x_i}𝐅_{a,b,c}}_{i=1}^n arranged vertically, horizontally, and diagonally respectively. The order of increments between a and b does not affect the resulting matrix, i.e.,

𝐅_{a+1,b+1,c} = [∂_{x_1}𝐅_{a,b+1,c} ⋯ ∂_{x_n}𝐅_{a,b+1,c}]^𝖳 = [∂_{x_1}𝐅_{a+1,b,c} ⋯ ∂_{x_n}𝐅_{a+1,b,c}],

both equal to the n×n block matrix whose (i,j) block is ∂_{x_i}∂_{x_j}𝐅_{a,b,c}.

However, the order of increments involving c affects the resulting matrix, i.e.,

[∂_{x_1}𝐅_{a,b,c+1} ⋯ ∂_{x_n}𝐅_{a,b,c+1}] ≠ diag(∂_{x_1}𝐅_{a,b+1,c}, …, ∂_{x_n}𝐅_{a,b+1,c}).

Nevertheless, the order of increments does not affect the Schatten norm. Since we will only be interested in the Schatten norms of these matrices, we will simply write 𝐅_{a,b+1,c+1} without specifying the order. It is easy to see that 𝐅_{a,b,c} is a deterministic matrix if a+b+c = D, and 𝐅_{a,b,c} = 𝟎 if a+b+c > D.
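The order-of-increments phenomenon is easy to see numerically. In the sketch below (our own illustration), symmetric stand-ins B[i][j] = B[j][i] play the role of the second partials ∂_{x_i}∂_{x_j}𝐅_{a,b,c}; incrementing b then c and incrementing c then b produce different matrices with identical Schatten norms:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(7)
n = 3
# Symmetric stand-ins for the second partials: B[i][j] = d_i d_j F = B[j][i].
B = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        B[i][j] = B[j][i] = rng.standard_normal((2, 2))

# Increment b then c: a diagonal of row-blocks.
bc = block_diag(*[np.hstack(B[i]) for i in range(n)])
# Increment c then b: a row of diagonal-blocks.
cb = np.hstack([block_diag(*B[i]) for i in range(n)])

schatten = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
print(np.array_equal(bc, cb))            # False: the matrices differ
print(schatten(bc, 4), schatten(cb, 4))  # equal Schatten norms
```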

Constants.

We write C for universal constants and C_a for constants depending only on some parameter a. The values of these constants may differ from one instance to another.

3 Decoupling Inequalities

In this work, we focus on decoupling inequalities for moments, which are fairly elementary and can be interpreted combinatorially as partitioning the random vector 𝐱.

Lemma 5 (see [30] Lemma 6.21).

Let 𝐱 = {x_j}_{j=1}^n be a sequence of independent random variables with 𝔼x_j = 0 for all j = 1,…,n. Let {𝐁_𝐢}_{𝐢∈𝒯_n^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Then for t ≥ 1,

𝔼‖∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}⋯x_{i_d}‖_{2t} ≤ d^d · 𝔼‖∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}^{(1)}⋯x_{i_d}^{(d)}‖_{2t},

where 𝐱(1),,𝐱(d) denote d independent copies of 𝐱.

Improved Decoupling for Random Variables with Structured Indices

For a generic degree-d multilinear polynomial random matrix, an application of Lemma 5 yields a decoupling constant of d^d, which might not be optimal for polynomial random matrices with additional structure. We will show that this additional structure enables a tighter decoupling inequality through a careful partitioning.

Building on Lemma 5, we prove a decoupling inequality for polynomial random matrices in which the indices of the random variables have a graph structure. Let G = (V,E) be a fixed simple graph with |V| = k and |E| = d. We denote the vertices by V(G) = {v_1,…,v_k} and the ordered edge set by E(G) = {(i_1,j_1),…,(i_d,j_d)}. For any φ∈𝒯_n^k, we write φ(i) for the i-th component of φ, and φ(E) for {(φ(i_1),φ(j_1)),…,(φ(i_d),φ(j_d))}.

Lemma 6.

Let 𝐳 = {Z_{φ(i),φ(j)}}_{φ∈𝒯_n^k, (i,j)∈E} be a double sequence of independent random variables with 𝔼Z_{φ(i),φ(j)} = 0 for all (i,j)∈E and φ∈𝒯_n^k. Let {𝐁_{φ(E)}}_{φ∈𝒯_n^k} be a multi-indexed sequence of deterministic matrices of the same dimension. Then for d ≤ k(k−1) and t ≥ 1,

𝔼‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}‖_{2t} ≤ k^k · 𝔼‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)}‖_{2t},

where 𝐳(1),,𝐳(d) denote d independent copies of 𝐳.

 Remark 7.

It is crucial for the indices of all monomials to share the same structure, so that d independent copies of 𝐳 suffice. Otherwise, k(k−1) independent copies are needed in general.

Proof.

Consider a random partition of the ground set [n] into k parts, and let 𝐫 be the vertex partitioner. Formally, let 𝐫 = (r_1,…,r_n) be a sequence of independent random variables with each r_p uniformly distributed on {1,…,k}, i.e., for 1 ≤ p ≤ n,

ℙ(r_p = 1) = ℙ(r_p = 2) = ⋯ = ℙ(r_p = k) = 1/k.

Define an event A := {r_{φ(v_1)} = 1, …, r_{φ(v_k)} = k} for each φ∈𝒯_n^k. Conditioned on 𝐳,

𝔼_𝐫[𝟙_A(r_{φ(v_1)},…,r_{φ(v_k)})] = 1/k^k.

It follows that

F := 𝔼_𝐳‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}‖_{2t}
= k^k · 𝔼_𝐳‖𝔼_𝐫[∑_{φ∈𝒯_n^k} 𝟙_A(r_{φ(v_1)},…,r_{φ(v_k)}) 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}]‖_{2t}
≤ k^k · 𝔼_𝐫 𝔼_𝐳‖∑_{φ∈𝒯_n^k} 𝟙_A(r_{φ(v_1)},…,r_{φ(v_k)}) 𝐁_{φ(E)} ∏_{(i,j)∈E} Z_{φ(i),φ(j)}‖_{2t},

where the last step is due to Jensen's inequality and Fubini's theorem. Since the inequality holds in expectation over 𝐫, there must exist a fixed realization of 𝐫 satisfying it.

Fix 𝐫 and define the vertex partitioning with respect to 𝐫 as

P_1 = {p∈[n] : r_p = 1}, …, P_k = {p∈[n] : r_p = k},

which in turn induces an edge partitioning,

P^e := {(P_i, P_j)}_{(i,j)∈𝒯_k^2}.

Notice that there are k(k−1) elements in P^e, but we only have d edges to partition. So we select d elements out of P^e in the following way: fix an arbitrary φ∈𝒯_n^k and choose

P^1 = (P_{φ(i_1)}, P_{φ(j_1)}), …, P^d = (P_{φ(i_d)}, P_{φ(j_d)}).

Let us call {P^1,…,P^d} a d-edge partitioning. In other words, we may select any d elements out of P^e to form a d-edge partitioning, so long as the pattern of the indices of the (P_i,P_j)'s matches that of E(G). Since the elements of P^e have no intersections, the partitions in {P^1,…,P^d} also have no intersections. Since the Z_{φ(i),φ(j)}'s are independent random variables, replacing Z_{φ(i_1),φ(j_1)}⋯Z_{φ(i_d),φ(j_d)} with Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} for any φ(E) ∈ P^1×⋯×P^d does not change the distribution. Hence

F ≤ k^k · 𝔼‖∑_{φ(E)∈P^1×⋯×P^d} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)}‖_{2t}.   (1)

Denote all variables in P1××Pd by

𝒵 := {Z_{φ(i_1),φ(j_1)}^{(1)}, …, Z_{φ(i_d),φ(j_d)}^{(d)} : (φ(i_1),φ(j_1))∈P^1, …, (φ(i_d),φ(j_d))∈P^d},

and denote the rest of the variables by 𝒵^c. It remains to extend the sum on the right-hand side of (1) to be over all φ∈𝒯_n^k. Observe that

𝔼_{𝒵^c}[∑_{φ(E)∉P^1×⋯×P^d} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} | 𝒵] = 𝟎.   (2)

Denote P1××Pd by 𝒫. Since the sum in (2) has conditional expectation zero, we can add it to (1) to get

F ≤ k^k · 𝔼_𝒵‖∑_{φ(E)∈𝒫} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} + 𝔼_{𝒵^c}[∑_{φ(E)∉𝒫} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)} | 𝒵]‖_{2t}.

Since the two sums on the right-hand side are independent conditioned on 𝒵, conditional application of Jensen’s inequality yields the desired inequality,

F ≤ k^k · 𝔼_{𝐳^{(1)},…,𝐳^{(d)}}‖∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} Z_{φ(i_1),φ(j_1)}^{(1)}⋯Z_{φ(i_d),φ(j_d)}^{(d)}‖_{2t}.

4 Multilinear Polynomial Random Matrices

In this section, we prove a moment bound for multilinear polynomial random matrices with bounded, normalized random variables. We follow our recursion framework by first decoupling the polynomial random matrices and then applying the matrix Rosenthal inequality recursively to sums of (conditionally) independent random matrices.

To this end, we derive the non-Hermitian matrix Rosenthal inequality from the Hermitian matrix Rosenthal inequality by Hermitian dilation (see [34] Sec. 2.1.16).

Lemma 8 (Non-Hermitian Matrix Rosenthal Inequality).

Suppose that t = 1 or t ≥ 1.5. Consider a finite sequence {𝐘_k}_{k≥1} of centered, independent random matrices, and assume that 𝔼‖𝐘_k‖_{4t}^{4t} < ∞. Then

𝔼‖∑_k 𝐘_k‖_{4t}^{4t} ≤ (16t)^{3t} {‖(∑_k 𝔼 𝐘_k𝐘_k^𝖳)^{1/2}‖_{4t}^{4t} + ‖(∑_k 𝔼 𝐘_k^𝖳𝐘_k)^{1/2}‖_{4t}^{4t}} + (8t)^{4t} (∑_k 𝔼‖𝐘_k‖_{4t}^{4t}).

Proof.

For a sequence of Hermitian matrices {𝐗_k}_{k≥1} satisfying all the assumptions above, we have the (Hermitian) matrix Rosenthal inequality ([22], Corollary 7.4):

𝔼‖∑_k 𝐗_k‖_{4t}^{4t} ≤ (16t)^{3t} ‖(∑_k 𝔼𝐗_k^2)^{1/2}‖_{4t}^{4t} + (8t)^{4t} ∑_k 𝔼‖𝐗_k‖_{4t}^{4t}.

We use Hermitian dilation to extend the inequality to non-Hermitian matrices by setting 𝐗_k = 𝓗(𝐘_k), the Hermitian dilation of 𝐘_k, and noticing that

𝔼‖∑_k [𝟎 𝐘_k; 𝐘_k^𝖳 𝟎]‖_{4t}^{4t} = 𝔼 tr[((∑_k 𝐘_k)(∑_k 𝐘_k^𝖳))^{2t} 𝟎; 𝟎 ((∑_k 𝐘_k^𝖳)(∑_k 𝐘_k))^{2t}] = 2 · 𝔼‖∑_k 𝐘_k‖_{4t}^{4t},
‖(∑_k 𝔼[𝐘_k𝐘_k^𝖳 𝟎; 𝟎 𝐘_k^𝖳𝐘_k])^{1/2}‖_{4t}^{4t} = tr([∑_k 𝔼𝐘_k𝐘_k^𝖳 𝟎; 𝟎 ∑_k 𝔼𝐘_k^𝖳𝐘_k]^{2t}) = ‖(∑_k 𝔼𝐘_k𝐘_k^𝖳)^{1/2}‖_{4t}^{4t} + ‖(∑_k 𝔼𝐘_k^𝖳𝐘_k)^{1/2}‖_{4t}^{4t},
∑_k 𝔼‖[𝟎 𝐘_k; 𝐘_k^𝖳 𝟎]‖_{4t}^{4t} = 2 · ∑_k 𝔼‖𝐘_k‖_{4t}^{4t}.

Hence, the result follows.
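The bookkeeping in this proof reduces to the fact that the dilation duplicates singular values, which the following sketch (our own illustration) confirms numerically:

```python
import numpy as np

rng = np.random.default_rng(8)
Y = rng.standard_normal((3, 5))
t = 2

# Hermitian dilation H(Y) = [[0, Y], [Y^T, 0]].
H = np.block([[np.zeros((3, 3)), Y], [Y.T, np.zeros((5, 5))]])

schatten_pow = lambda M, r: (np.linalg.svd(M, compute_uv=False) ** r).sum()
# The dilation has singular values {sigma_i(Y)} twice (plus zeros), hence
# ||H(Y)||_{4t}^{4t} = 2 ||Y||_{4t}^{4t}, as used in the proof of Lemma 8.
print(schatten_pow(H, 4 * t), 2 * schatten_pow(Y, 4 * t))
```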

Claim 9.

Let 𝐅(𝐱) be a homogeneous multilinear polynomial random matrix of degree D. Let 𝐅(𝐱^{(1)},…,𝐱^{(D)}) be the decoupled 𝐅(𝐱), i.e.,

𝐅(𝐱^{(1)},…,𝐱^{(D)}) = ∑_{𝐣∈𝒯_n^D} 𝐀_𝐣 x_{j_1}^{(1)}⋯x_{j_D}^{(D)},

where {𝐀_𝐣}_{𝐣∈𝒯_n^D} is a multi-indexed sequence of deterministic matrices of the same dimension. For some fixed a,b,c ≥ 0 and k = a+b+c < D, let 𝐅_{a,b,c} be the block matrix of k-th order partial derivatives of 𝐅(𝐱^{(1)},…,𝐱^{(D)}). Then 𝐅_{a,b,c} is a homogeneous multilinear polynomial random matrix of degree d = D−k. Furthermore,

𝔼‖𝐅_{a,b,c}‖_{4t}^{4t} ≤ (16t)^{3t}(𝔼‖𝐅_{a+1,b,c}‖_{4t}^{4t} + 𝔼‖𝐅_{a,b+1,c}‖_{4t}^{4t}) + (8t)^{4t} L^{4t} 𝔼‖𝐅_{a,b,c+1}‖_{4t}^{4t}.

Proof.

Without loss of generality, we differentiate 𝐅(𝐱^{(1)},…,𝐱^{(D)}) with respect to 𝐱^{(D)} first, then 𝐱^{(D−1)}, and so on. It is straightforward to see that after k = a+b+c rounds of differentiation, 𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)}) is a homogeneous polynomial random matrix of degree d = D−k, and

𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)}) = ∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}^{(1)}⋯x_{i_d}^{(d)},

where {𝐁_𝐢}_{𝐢∈𝒯_n^d} is a multi-indexed sequence of deterministic matrices of the same dimension. More specifically, each 𝐁_𝐢 is a block matrix whose blocks consist of the matrices 𝐀_𝐣 with 𝐣∈𝒯_n^D such that j_1 = i_1, …, j_d = i_d.

Conditioned on 𝐱^{(1)},…,𝐱^{(d−1)}, the matrix 𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)}) is a sum of centered, (conditionally) independent random matrices in 𝐱^{(d)}. It follows that

𝔼(‖𝐅_{a,b,c}(𝐱^{(1)},…,𝐱^{(d)})‖_{4t}^{4t}) = 𝔼(𝔼_{𝐱^{(d)}}(‖∑_{𝐢∈𝒯_n^d} 𝐁_𝐢 x_{i_1}^{(1)}⋯x_{i_d}^{(d)}‖_{4t}^{4t} | 𝐱^{(1)},…,𝐱^{(d−1)}))
≤ (16t)^{3t} 𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳)^{1/2}‖_{4t}^{4t}
+ (16t)^{3t} 𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}))^{1/2}‖_{4t}^{4t}
+ (8t)^{4t} L^{4t} ∑_{i_d=1}^n 𝔼‖∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}‖_{4t}^{4t},

where the inequality is due to the matrix Rosenthal inequality (Lemma 8), applied with 𝐘_{i_d} = (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}) · x_{i_d}^{(d)}, together with |x_{i_d}^{(d)}| ≤ L for the last term. Now we have

𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳)^{1/2}‖_{4t}^{4t} = 𝔼‖([∂_{x_1^{(d)}}𝐅_{a,b,c} ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}][∂_{x_1^{(d)}}𝐅_{a,b,c} ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}]^𝖳)^{1/2}‖_{4t}^{4t} = 𝔼‖𝐅_{a,b+1,c}‖_{4t}^{4t},

𝔼‖(∑_{i_d=1}^n (∂_{x_{i_d}^{(d)}}𝐅_{a,b,c})^𝖳(∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}))^{1/2}‖_{4t}^{4t} = 𝔼‖([∂_{x_1^{(d)}}𝐅_{a,b,c}^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}^𝖳][∂_{x_1^{(d)}}𝐅_{a,b,c}^𝖳 ⋯ ∂_{x_n^{(d)}}𝐅_{a,b,c}^𝖳]^𝖳)^{1/2}‖_{4t}^{4t} = 𝔼‖𝐅_{a+1,b,c}‖_{4t}^{4t},

∑_{i_d=1}^n 𝔼‖∂_{x_{i_d}^{(d)}}𝐅_{a,b,c}‖_{4t}^{4t} = 𝔼 tr(diag(∂_{x_1^{(d)}}𝐅_{a,b,c}(∂_{x_1^{(d)}}𝐅_{a,b,c})^𝖳, …, ∂_{x_n^{(d)}}𝐅_{a,b,c}(∂_{x_n^{(d)}}𝐅_{a,b,c})^𝖳))^{2t} = 𝔼‖𝐅_{a,b,c+1}‖_{4t}^{4t}.

We present a moment bound for permutation symmetric, homogeneous multilinear polynomial random matrices of degree d.

Theorem 10 (Homogeneous Multilinear Recursion).

Let 𝐱 = {x_i}_{i=1}^n be a sequence of i.i.d. random variables with 𝔼x_i = 0, 𝔼x_i^2 = 1 and |x_i| ≤ L for all 1 ≤ i ≤ n. Let {𝐀_𝐢}_{𝐢∈𝒯_n^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a permutation symmetric, homogeneous multilinear polynomial random matrix of degree d as

𝐅(𝐱) = ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b,c ≥ 0 with d = a+b+c. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ ∑_{a,b,c : a+b+c=d} (48dt)^{4dt} L^{4ct} ‖𝐅_{a,b,c}‖_{4t}^{4t}.

Proof.

Let 𝐱^{(1)},…,𝐱^{(d)} be d independent copies of 𝐱. We decouple 𝐅(𝐱) by Lemma 5:

E := 𝔼‖𝐅(𝐱)‖_{4t}^{4t} ≤ d^{4dt} · 𝔼‖𝐅(𝐱^{(1)},…,𝐱^{(d)})‖_{4t}^{4t}.

We take the partial derivative with respect to 𝐱^{(d)} by applying Lemma 8 to get

E ≤ d^{4dt}(16t)^{3t}(𝔼‖𝐅_{1,0,0}‖_{4t}^{4t} + 𝔼‖𝐅_{0,1,0}‖_{4t}^{4t}) + d^{4dt}(8t)^{4t}L^{4t}𝔼‖𝐅_{0,0,1}‖_{4t}^{4t}.

Note that 𝐅_{1,0,0}, 𝐅_{0,1,0}, and 𝐅_{0,0,1} are functions of the variables 𝐱^{(1)},…,𝐱^{(d−1)}. We take partial derivatives with respect to the rest of the variables until the 𝐅_{a,b,c}'s become deterministic matrices. Applying Claim 9 recursively and using the permutation symmetric property of 𝐅(𝐱), we have

E ≤ ∑_{a,b,c : a+b+c=d} (16dt)^{4dt} L^{4ct} (d!/(a!·b!·c!)) ‖𝐅_{a,b,c}‖_{4t}^{4t}.

Since d!/(a!b!c!) ≤ d!/((d/3)!)^3 and d!/((d/3)!)^3 ≤ (3√3/(2πd)) · 3^d,

E ≤ ∑_{a,b,c : a+b+c=d} (48dt)^{4dt} L^{4ct} ‖𝐅_{a,b,c}‖_{4t}^{4t}.

As a corollary, we derive a moment bound for general multilinear polynomial random matrices of degree D. The polynomial is split into homogeneous parts, and each part is bounded separately. Let 𝐅_{=d}(𝐱) denote the degree-d homogeneous part of 𝐅(𝐱).

Corollary 11 (Multilinear Recursion).

Let 𝐱 = {x_i}_{i=1}^n be a sequence of i.i.d. random variables with 𝔼x_i = 0, 𝔼x_i^2 = 1 and |x_i| ≤ L for all 1 ≤ i ≤ n. Let {𝐀_𝐢}_{𝐢∈𝒯_n^d} be multi-indexed sequences of deterministic matrices of the same dimension. Define a degree-D multilinear polynomial random matrix as

𝐅(𝐱) = ∑_{d=1}^D ∑_{𝐢∈𝒯_n^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Suppose 𝐅_{=d}(𝐱) is permutation symmetric for 1 ≤ d ≤ D. Let a,b,c ≥ 0 with d = a+b+c. Then for 2t ∈ ℕ,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ ∑_{d=1}^D D^{4t} (48dt)^{4dt} (∑_{a,b,c : a+b+c=d} L^{4ct} ‖𝐅_{a,b,c}^{=d}‖_{4t}^{4t}).

Proof.

We rewrite 𝐅(𝐱) as a formal sum of its homogeneous parts, 𝐅(𝐱) = ∑_{d=1}^D 𝐅_{=d}(𝐱). By the trace inequality (see [33], Theorem 3.1), we have

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ D^{4t} ∑_{d=1}^D 𝔼‖𝐅_{=d}(𝐱) − 𝔼𝐅_{=d}(𝐱)‖_{4t}^{4t}.

By Theorem 10,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{4t}^{4t} ≤ D^{4t} ∑_{d=1}^D (48dt)^{4dt} (∑_{a,b,c : a+b+c=d} L^{4ct} ‖𝐅_{a,b,c}^{=d}‖_{4t}^{4t}).

5 Gaussian Polynomial Random Matrices

In this section, we prove a moment bound for polynomial random matrices in Gaussian random variables. While the decoupling technique could work for Gaussian polynomial random matrices as well, we base our recursion framework on the following bound for simplicity.

Lemma 12 (Polynomial Moments, see [14] Theorem 7.1).

Let 𝐇(𝐱) : Ω → ℍ^d be a function with 𝐱 ∼ 𝒩(0,𝕀_n). For t = 1 or t ≥ 1.5,

𝔼‖𝐇(𝐱) − 𝔼𝐇(𝐱)‖_{2t}^{2t} ≤ (2t)^{2t} · 𝔼 tr(∑_{i=1}^n (∂_i𝐇(𝐱))^2)^t.   (3)

The non-Hermitian version of Lemma 12 can be easily obtained by the technique of Hermitian dilation.

Lemma 13 (Non-Hermitian Polynomial Moments).

Let 𝐅(𝐱) : Ω → ℝ^{d_1×d_2} be a function with 𝐱 ∼ 𝒩(0,𝕀_n). For t = 1 or t ≥ 1.5,

𝔼‖𝐅(𝐱) − 𝔼𝐅(𝐱)‖_{2t}^{2t} ≤ (2t)^{2t} (𝔼‖(∑_{i=1}^n ∂_i𝐅(𝐱) ∂_i𝐅^𝖳(𝐱))^{1/2}‖_{2t}^{2t} + 𝔼‖(∑_{i=1}^n ∂_i𝐅^𝖳(𝐱) ∂_i𝐅(𝐱))^{1/2}‖_{2t}^{2t}).
Theorem 14 (Gaussian Recursion).

Let 𝐱 ∼ 𝒩(0,𝕀_n) and let {𝐀_𝐢}_{𝐢∈[n]^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a degree-d homogeneous Gaussian polynomial random matrix as

𝐏(𝐱) = ∑_{𝐢∈[n]^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b ≥ 0. Then for 2t ∈ ℕ,

𝔼‖𝐏(𝐱) − 𝔼𝐏(𝐱)‖_{2t}^{2t} ≤ (2d^2t)^{2t} (∑_{1≤a+b≤d} ‖𝔼𝐏_{a,b}‖_{2t}^{2t}).

Proof.

Start by applying Claim 15 with k = 0, and apply it recursively until k = d−1 to get the desired bound.

Claim 15.

Let 𝐱 ∼ 𝒩(0,𝕀_n) and let {𝐀_𝐢}_{𝐢∈[n]^d} be a multi-indexed sequence of deterministic matrices of the same dimension. Define a degree-d homogeneous Gaussian polynomial random matrix as

𝐏(𝐱) = ∑_{𝐢∈[n]^d} (𝐀_𝐢 ∏_{j∈{i_1,…,i_d}} x_j).

Let a,b ≥ 0 with a+b = k < d, and let 𝐏_{a,b}(𝐱) be the k-th partial derivative block matrix associated to 𝐏(𝐱). Then for 2t ∈ ℕ,

𝔼‖𝐏_{a,b} − 𝔼𝐏_{a,b}‖_{2t}^{2t} ≤ (4t)^{2t} (𝔼‖𝐏_{a+1,b} − 𝔼𝐏_{a+1,b}‖_{2t}^{2t} + ‖𝔼𝐏_{a+1,b}‖_{2t}^{2t})
+ (4t)^{2t} (𝔼‖𝐏_{a,b+1} − 𝔼𝐏_{a,b+1}‖_{2t}^{2t} + ‖𝔼𝐏_{a,b+1}‖_{2t}^{2t}).

Proof.

By the non-Hermitian polynomial moment bound (Lemma 13), we have

E := 𝔼‖𝐏_{a,b} − 𝔼𝐏_{a,b}‖_{2t}^{2t} ≤ (2t)^{2t} (𝔼‖(∑_{i=1}^n ∂_i𝐏_{a,b} ∂_i𝐏_{a,b}^𝖳)^{1/2}‖_{2t}^{2t} + 𝔼‖(∑_{i=1}^n ∂_i𝐏_{a,b}^𝖳 ∂_i𝐏_{a,b})^{1/2}‖_{2t}^{2t})
= (2t)^{2t} (𝔼‖[∂_1𝐏_{a,b} ⋯ ∂_n𝐏_{a,b}]‖_{2t}^{2t} + 𝔼‖[∂_1𝐏_{a,b} ⋯ ∂_n𝐏_{a,b}]^𝖳‖_{2t}^{2t})
= (2t)^{2t} (𝔼‖𝐏_{a+1,b}‖_{2t}^{2t} + 𝔼‖𝐏_{a,b+1}‖_{2t}^{2t}),

where the last equality uses our notation for partial derivative block matrices introduced in Section 2. Notice that 𝐏_{a+1,b} and 𝐏_{a,b+1} are homogeneous Gaussian polynomial random matrices of degree d−k−1. Since 𝐏_{a+1,b} and 𝐏_{a,b+1} are not necessarily centered,

E ≤ (2t)^{2t} (𝔼‖𝐏_{a+1,b} − 𝔼𝐏_{a+1,b} + 𝔼𝐏_{a+1,b}‖_{2t}^{2t} + 𝔼‖𝐏_{a,b+1} − 𝔼𝐏_{a,b+1} + 𝔼𝐏_{a,b+1}‖_{2t}^{2t})
≤ (4t)^{2t} (𝔼‖𝐏_{a+1,b} − 𝔼𝐏_{a+1,b}‖_{2t}^{2t} + ‖𝔼𝐏_{a+1,b}‖_{2t}^{2t} + 𝔼‖𝐏_{a,b+1} − 𝔼𝐏_{a,b+1}‖_{2t}^{2t} + ‖𝔼𝐏_{a,b+1}‖_{2t}^{2t}),

where the last inequality is due to the trace inequality (see [33] Theorem 3.1).

6 Application: Graph Matrices

Denote the Erdős–Rényi random graph on n vertices by 𝒢_{n,p}, where each edge is present with probability p independently of all other edges. When 𝒢_{n,p} is viewed as a probability space, it is equal to (Ω, ℱ, μ), where Ω is the sample space of all possible graphs on n vertices. Any G∈Ω is a random vector representing all edges, and each coordinate of G, denoted G_{ij}, is an independent random variable representing a single edge. We can construct a random graph by drawing a copy of G ∼ μ. To adopt the convention of p-biased Fourier analysis, we normalize G_{ij} such that 𝔼G_{ij} = 0 and 𝔼G_{ij}^2 = 1, which leads to the sample space Ω = {√((1−p)/p), −√(p/(1−p))}^{\binom{n}{2}}.

Definition 16 (Shape, see e.g. [29] Definition 4.2).

A shape is a tuple τ=(V(τ),E(τ),Uτ,Vτ) where (V(τ),E(τ)) is a graph and Uτ, Vτ are ordered subsets of the vertices.

Definition 17 (Graph matrix, see e.g. [29] Definition 4.4).

Given a shape τ, the associated graph matrix 𝐌 : Ω → ℝ^{d_1×d_2} is a matrix-valued function such that

𝐌[I,J] = ∑_{φ∈𝒯_n^k : φ(U_τ)=I, φ(V_τ)=J} ∏_{(u,v)∈E(τ)} G_{φ(u),φ(v)}.

In other words, 𝐌 maps an input graph G∈Ω to an (n!/(n−|I|)!) × (n!/(n−|J|)!) matrix whose rows and columns are indexed by the tuples I and J respectively.

Definition 18 (Vertex Separator, see e.g. [29] Definition 4.8).

For any shape τ, a vertex separator is a subset of vertices S ⊆ V(τ) such that there is no path from U_τ to V_τ in τ∖S, which is the shape obtained by deleting S and all edges adjacent to S. We write S_τ for a vertex separator of minimum size.

Claim 19.

For any shape τ, let {E_1, E_2} be an arbitrary cover of E(τ). Let S = V(E_1)∩V(E_2) be the set of vertices shared by the edges of E_1 and E_2. If S contains V(E_1)∩V_τ and V(E_2)∩U_τ, then S is a vertex separator.

Proof.

Since S = V(E_1)∩V(E_2), there is no path from V(E_1)∖S to V(E_2)∖S. Additionally, if S contains V(E_1)∩V_τ, then V(E_1)∖S is not adjacent to V_τ and V(E_1)∖S contains U_τ∖S. If S also contains V(E_2)∩U_τ, then V(E_2)∖S is not adjacent to U_τ and V(E_2)∖S contains V_τ∖S. Since there is no path from U_τ∖S to V_τ∖S, S is a vertex separator.

Theorem 20.

For any shape τ and 2t ∈ ℕ,

𝔼‖𝐌‖_{4t}^{4t} ≤ ((48t|V(τ)|)^{4t|V(τ)|} (C|E(τ)|)^{|E(τ)|} n^{|V(τ)|}) · (√((1−p)/p))^{4t|E(S_τ)|} · n^{2t(|V(τ)|−|S_τ|)},

where C is an absolute constant and E(S_τ) is the set of all edges adjacent to S_τ.

Proof.

First, let |V(τ)| = k and write 𝐌 as a polynomial whose coefficients are matrices,

𝐌 = ∑_{φ∈𝒯_n^k} 𝐁_{φ(E)} ∏_{(i,j)∈E(τ)} G_{φ(i),φ(j)},

where the [I,J]-entry of 𝐁_{φ(E)} is

𝐁_{φ(E)}[I,J] = 1 if φ(U_τ) = I and φ(V_τ) = J, and 0 otherwise.

Note that the 𝐁_{φ(E)}'s have the same dimension as 𝐌, and their rows and columns are indexed in the same way as those of 𝐌. Since U_τ and V_τ are ordered and each φ assigns one set of values to the vertices in U_τ and V_τ, there is only one nonzero entry in each 𝐁_{φ(E)}. But there might be multiple φ's for which the 𝐁_{φ(E)}'s are identical, due to free vertices.

Let |E(τ)| = d, and rewrite 𝐌 in a permutation symmetric way:

𝐌 = ∑_{φ∈𝒯_n^k} (1/d! · ∑_{σ∈𝔖_d} 𝐁_{σ(φ(E))} G_{φ(i_{σ(1)}),φ(j_{σ(1)})}⋯G_{φ(i_{σ(d)}),φ(j_{σ(d)})})
= (1/d!) ∑_{φ∈𝒯_n^k} 𝐀_{φ(E)} G_{φ(i_1),φ(j_1)}⋯G_{φ(i_d),φ(j_d)},

where 𝐀_{φ(E)} = ∑_{σ∈𝔖_d} 𝐁_{σ(φ(E))}. Notice that the indices of the random variables of 𝐌 have the same structure as in Lemma 6, so we can decouple 𝐌 using Lemma 6:

F := 𝔼‖𝐌‖_{4t}^{4t} ≤ |V(τ)|^{4t|V(τ)|} (1/d!)^{4t} 𝔼‖∑_{φ∈𝒯_n^k} 𝐀_{φ(E)} G_{φ(i_1),φ(j_1)}^{(1)}⋯G_{φ(i_d),φ(j_d)}^{(d)}‖_{4t}^{4t},

where G^{(1)},…,G^{(d)} denote d independent copies of G. Since 𝐌 is a multilinear polynomial random matrix, 𝔼𝐌 = 𝟎. An application of Theorem 10 yields

F ≤ ∑_{a,b,c : a+b+c=|E(τ)|} (48t|V(τ)|)^{4t|V(τ)|} (1/|E(τ)|!)^{4t} (√((1−p)/p))^{4ct} ‖𝐌_{a,b,c}‖_{4t}^{4t},

where {𝐌a,b,c} are partial derivative block matrices associated to 𝐌.

Fix a triple a,b,c; then 𝐌_{a,b,c} is a block matrix whose blocks are comprised of {𝐀_{φ(E)}}_{φ∈𝒯_n^k}. More specifically, the [I,J]-th block of 𝐌_{a,b,c} is

𝐌_{a,b,c}[I,J] = ∂_{G_{I_1}}⋯∂_{G_{I_{a+c}}} ∂_{G_{J_1}}⋯∂_{G_{J_b}} 𝐌 = 𝐀_{I∪J} = ∑_{σ∈𝔖_d} 𝐁_{σ(I∪J)}.   (4)

Algebraically, the last equality is due to the commutativity of partial derivative operators. Combinatorially, if we fix an arbitrary ordering on the edge set E(τ), the last expression sums over all permutations of the d edges in E(τ). Now let us delve into the combinatorics. Let E(τ) = E_1∪E_2, where E_1 contains a+c edges and E_2 contains b+c edges, with c overlapping edges. Let S = V(E_1)∩V(E_2) be the set of vertices shared by E_1 and E_2. The row and column blocks of 𝐌_{a,b,c} are indexed by φ(E_1) and φ(E_2) respectively. Let us write

𝐌_{a,b,c} = ∑_{σ∈𝔖_d} 𝐌_{a,b,c,σ},

where 𝐌_{a,b,c,σ}[I,J] = 𝐁_{σ(I∪J)}. Applying the trace inequality (see [33], Theorem 3.1), it follows that

F ≤ ∑_{a,b,c : a+b+c=|E(τ)|} ∑_{σ∈𝔖_d} (48t|V(τ)|)^{4t|V(τ)|} (√((1−p)/p))^{4ct} ‖𝐌_{a,b,c,σ}‖_{4t}^{4t}.   (5)

For any two distinct σ_1, σ_2 ∈ 𝔖_d, we have ‖𝐌_{a,b,c,σ_1}‖_{4t}^{4t} = ‖𝐌_{a,b,c,σ_2}‖_{4t}^{4t}, since 𝐌_{a,b,c,σ_1} and 𝐌_{a,b,c,σ_2} are identical after permuting rows and columns. So we will focus on bounding ‖𝐌_{a,b,c,σ_0}‖_{4t}^{4t}, where σ_0 is the identity permutation. For the identity permutation, we have 𝐌_{a,b,c,σ_0}[I,J] = 𝐁_{I∪J} by (4). Denote 𝐌_{a,b,c,σ_0} by 𝐅:

E := ‖𝐌_{a,b,c,σ_0}‖_{4t}^{4t} = ‖𝐅‖_{4t}^{4t} = tr(𝐅^𝖳𝐅)^{2t}
= tr ∑_{I_1,…,I_{2t}∈[n]^{2(a+c)}} ∑_{J_1,…,J_{2t}∈[n]^{2(b+c)}} 𝐅^𝖳[I_1,J_1] 𝐅[I_1,J_2] 𝐅^𝖳[I_2,J_2] 𝐅[I_2,J_3] ⋯ 𝐅^𝖳[I_{2t},J_{2t}] 𝐅[I_{2t},J_1].   (6)

First of all, 𝐁_{I_i∪J_i} ≠ 𝟎 implies that I_i∪J_i = φ(E_1∪E_2) for some φ∈𝒯_n^k and I_i(S) = J_i(S), for 1 ≤ i ≤ 2t. Similarly, 𝐁_{I_i∪J_{i+1}} ≠ 𝟎 implies that I_i∪J_{i+1} = φ(E_1∪E_2) for some φ∈𝒯_n^k and I_i(S) = J_{i+1}(S), for 1 ≤ i ≤ 2t (the additions in the subscripts are mod 2t, i.e., J_{2t+1} = J_1). Secondly, for each 𝐁_{I_i∪J_i} ≠ 𝟎, there is only one nonzero entry, namely 𝐁_{I_i∪J_i}[I_i∪J_i(U_τ), I_i∪J_i(V_τ)] = 1. So 𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} ≠ 𝟎 if I_i∪J_i(U_τ) = I_i∪J_{i+1}(U_τ), and the only nonzero entry is

𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}}[I_i∪J_i(V_τ), I_i∪J_{i+1}(V_τ)] = 1.

It follows that 𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} ≠ 𝟎 for all 1 ≤ i ≤ 2t if J_1(V(E_2)∩U_τ) = J_2(V(E_2)∩U_τ) = ⋯ = J_{2t}(V(E_2)∩U_τ). Lastly, 𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} 𝐁_{I_{i+1}∪J_{i+1}}^𝖳 𝐁_{I_{i+1}∪J_{i+2}} ≠ 𝟎 if I_i∪J_{i+1}(V_τ) = I_{i+1}∪J_{i+1}(V_τ), and the only nonzero entry is

𝐁_{I_i∪J_i}^𝖳 𝐁_{I_i∪J_{i+1}} 𝐁_{I_{i+1}∪J_{i+1}}^𝖳 𝐁_{I_{i+1}∪J_{i+2}}[I_i∪J_i(V_τ), I_{i+1}∪J_{i+2}(V_τ)] = 1.

It follows that 𝐁_{I_1∪J_1}^𝖳 𝐁_{I_1∪J_2} 𝐁_{I_2∪J_2}^𝖳 𝐁_{I_2∪J_3} ⋯ 𝐁_{I_{2t}∪J_{2t}}^𝖳 𝐁_{I_{2t}∪J_1} ≠ 𝟎 if I_1(V(E_1)∩V_τ) = I_2(V(E_1)∩V_τ) = ⋯ = I_{2t}(V(E_1)∩V_τ), and the only nonzero entry is

𝐁_{I_1∪J_1}^𝖳 𝐁_{I_1∪J_2} 𝐁_{I_2∪J_2}^𝖳 𝐁_{I_2∪J_3} ⋯ 𝐁_{I_{2t}∪J_{2t}}^𝖳 𝐁_{I_{2t}∪J_1}[I_1∪J_1(V_τ), I_{2t}∪J_1(V_τ)] = 1.

Since I_1(V(E_1)∩V_τ) = I_{2t}(V(E_1)∩V_τ), we have I_1∪J_1(V_τ) = I_{2t}∪J_1(V_τ). So if the summand is nonzero, it has a 1 on its diagonal.

To summarize, each summand in (6) is nonzero if and only if I_i∪J_i = φ_i(E_1∪E_2) and I_i∪J_{i+1} = φ_i′(E_1∪E_2) for some φ_i, φ_i′ ∈ 𝒯_n^k, the I_i's and J_i's agree on S, the I_i's agree on V(E_1)∩V_τ, and the J_i's agree on V(E_2)∩U_τ, for 1 ≤ i ≤ 2t. Since each nonzero summand contributes a 1 on the diagonal, we can simply count the number of nonzero summands to compute the trace in (6). Notice that the number of nonzero summands is equal to the number of φ_1,…,φ_{2t}, φ_1′,…,φ_{2t}′ that satisfy all the constraints. Hence

E ≤ n^{|S|} · n^{|V(E_1)∩V_τ|} · n^{|V(E_2)∩U_τ|} · n^{2t|V(E_1)∖V_τ∖S|} · n^{2t|V(E_2)∖U_τ∖S|}.   (7)

For a fixed triple a,b,c, (7) provides an upper bound on ‖𝐌_{a,b,c,σ}‖_{4t}^{4t} for any permutation σ∈𝔖_d. But what is an upper bound on ‖𝐌_{a,b,c,σ}‖_{4t}^{4t} over all possible triples a,b,c? By varying a,b,c, we can always make S = V(E_1)∩V(E_2) large enough to contain the vertices in V(E_1)∩V_τ and V(E_2)∩U_τ. Thus S is a vertex separator by Claim 19. It follows that V(E_1)∖V_τ∖S = V(E_1)∖S and V(E_2)∖U_τ∖S = V(E_2)∖S. Hence

E ≤ n^{|V(τ)|} · n^{2t|V(E_1)∖S|} · n^{2t|V(E_2)∖S|} ≤ n^{|V(τ)|} · n^{2t(|V(τ)|−|S_τ|)},   (8)

where Sτ is a vertex separator of minimum size. Substituting (8) into (5) yields the desired bound.

Corollary 21.

For any given shape τ and any ε > 0, with probability at least 1−ε,

‖𝐌‖ ≤ |V(τ)|^{|V(τ)|} (C log(|E(τ)|^{|E(τ)|} n^{|V(τ)|}/ε))^{|V(τ)|} (√((1−p)/p))^{|E(S_τ)|} n^{(|V(τ)|−|S_τ|)/2},

where C>0 is an absolute constant.

Proof.

By Markov's inequality and Theorem 20, we have

ℙ(‖𝐌‖ ≥ θ) ≤ ℙ(‖𝐌‖_{4t}^{4t} ≥ θ^{4t}) ≤ θ^{−4t} 𝔼‖𝐌‖_{4t}^{4t}
≤ θ^{−4t} ((48t|V(τ)|)^{4t|V(τ)|} (C|E(τ)|)^{|E(τ)|} n^{|V(τ)|}) (√((1−p)/p))^{4t|E(S_τ)|} n^{2t(|V(τ)|−|S_τ|)}.   (9)

Set the right-hand side of (9) to ε by taking

θ = (ε^{−1/4t} (48t|V(τ)|)^{|V(τ)|} (C|E(τ)|)^{|E(τ)|/4t} n^{|V(τ)|/4t}) · (√((1−p)/p))^{|E(S_τ)|} · n^{(|V(τ)|−|S_τ|)/2}.

Take

t = (1/4) · log(|E(τ)|^{|E(τ)|} n^{|V(τ)|}/ε)

to complete the proof.
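As a quick empirical companion to Corollary 21 (our own illustration, not part of the paper), the sketch below measures the spectral norm of the path-shape graph matrix from Section 1 across several values of n; the ratio to the predicted n^{(|V(τ)|−|S_τ|)/2} = n scaling stays roughly constant, as expected up to logarithmic factors.

```python
import numpy as np

rng = np.random.default_rng(9)

def path_graph_matrix_norm(n: int, p: float = 0.5) -> float:
    """Spectral norm of the path-shape graph matrix (k = 3, d = 2) on G(n, p)."""
    Delta = np.sqrt((1 - p) / p)
    X = np.where(rng.random((n, n)) < p, Delta, -1 / Delta)
    X = np.triu(X, 1)
    X = X + X.T                 # symmetric p-biased edge variables, zero diagonal
    M = X @ X
    np.fill_diagonal(M, 0.0)    # the path shape excludes the diagonal
    return np.linalg.norm(M, 2)

for n in (50, 100, 200, 400):
    # Corollary 21 predicts n^{(|V| - |S_tau|)/2} = n scaling, up to log factors.
    print(n, path_graph_matrix_norm(n) / n)   # roughly constant ratio
```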

References

  • [1] Radosław Adamczak and Paweł Wolff. Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probability Theory and Related Fields, 162(3):531–586, 2015.
  • [2] Kwangjun Ahn, Dhruv Medarametla, and Aaron Potechin. Graph matrices: Norm bounds and applications. arXiv preprint, 2021. arXiv:1604.03423.
  • [3] Richard Aoun, Marwa Banna, and Pierre Youssef. Matrix Poincaré inequalities and concentration. Advances in Mathematics, 371:107251, 2020.
  • [4] Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari, and Jeff Xu. Polynomial-time power-sum decomposition of polynomials. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 956–967, 2022. doi:10.1109/FOCS54457.2022.00094.
  • [5] Afonso S Bandeira, March T Boedihardjo, and Ramon van Handel. Matrix concentration inequalities and free probability. Inventiones Mathematicae, pages 1–69, 2023.
  • [6] Nikhil Bansal, Haotian Jiang, and Raghu Meka. Resolving matrix Spencer conjecture up to poly-logarithmic rank. In Proceedings of the 55th ACM Symposium on Theory of Computing, pages 1814–1819, 2023. doi:10.1145/3564246.3585103.
  • [7] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019. doi:10.1137/17M1138236.
  • [8] Sergey G Bobkov, Friedrich Götze, and Holger Sambale. Higher order concentration of measure. Communications in Contemporary Mathematics, 21(03):1850043, 2019.
  • [9] Tatiana Brailovskaya and Ramon van Handel. Universality and sharp matrix concentration inequalities. arXiv preprint, 2022. arXiv:2201.05142.
  • [10] Jingqiu Ding, Tommaso d’Orsi, Chih-Hung Liu, David Steurer, and Stefan Tiegel. Fast algorithm for overcomplete order-3 tensor decomposition. In Conference on Learning Theory, pages 3741–3799. PMLR, 2022. URL: https://proceedings.mlr.press/v178/ding22a.html.
  • [11] Rong Ge, Qingqing Huang, and Sham M. Kakade. Learning mixtures of Gaussians in high dimensions. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 761–770, New York, NY, USA, 2015. Association for Computing Machinery. doi:10.1145/2746539.2746616.
  • [12] Samuel B Hopkins, Tselil Schramm, and Jonathan Shi. A robust spectral algorithm for overcomplete tensor decomposition. In Conference on Learning Theory, pages 1683–1722. PMLR, 2019. URL: http://proceedings.mlr.press/v99/hopkins19b.html.
  • [13] Samuel B Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the 48th ACM Symposium on Theory of Computing, pages 178–191, 2016. doi:10.1145/2897518.2897529.
  • [14] De Huang and Joel A. Tropp. From Poincaré inequalities to nonlinear matrix concentration. Bernoulli, 2020.
  • [15] De Huang and Joel A. Tropp. Nonlinear matrix concentration via semigroup methods. Electronic Journal of Probability, 26:Art. No. 8, January 2021.
  • [16] Chris Jones, Aaron Potechin, Goutham Rajendran, Madhur Tulsiani, and Jeff Xu. Sum-of-squares lower bounds for sparse independent set. In Proceedings of the 62nd IEEE Symposium on Foundations of Computer Science, 2021.
  • [17] Jeong Han Kim and Van H Vu. Concentration of multivariate polynomials and its applications. Combinatorica, 20(3):417–434, 2000. doi:10.1007/S004930070014.
  • [18] Bohdan Kivva and Aaron Potechin. Exact nuclear norm, completion and decomposition for random overcomplete tensors via degree-4 sos. arXiv preprint arXiv:2011.09416, 2020. arXiv:2011.09416.
  • [19] Stanislaw Kwapien. Decoupling Inequalities for Polynomial Chaos. The Annals of Probability, 15(3):1062–1071, 1987.
  • [20] Cécilia Lancien and Pierre Youssef. A note on quantum expanders, 2023. arXiv:2302.07772.
  • [21] Rafał Latała. Estimates of moments and tails of Gaussian chaoses. The Annals of Probability, 34(6):2315–2331, 2006.
  • [22] Lester Mackey, Michael Jordan, Richard Chen, Brendan Farrell, and Joel Tropp. Matrix concentration inequalities via the method of exchangeable pairs. The Annals of Probability, 42, 2012.
  • [23] Terry R. McConnell and Murad S. Taqqu. Double integration with respect to symmetric stable processes. Technical Report 618, Cornell University, 1984.
  • [24] Ankur Moitra and Alexander S Wein. Spectral methods from tensor networks. In Proceedings of the 51st ACM Symposium on Theory of Computing, pages 926–937, 2019. doi:10.1145/3313276.3316357.
  • [25] Ryan O’Donnell and Yu Zhao. Polynomial bounds for decoupling, with applications. In Proceedings of the 31st Conference on Computational Complexity, CCC ’16, Dagstuhl, DEU, 2016. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2016.24.
  • [26] Daniel Paulin, Lester Mackey, and Joel A. Tropp. Efron–Stein inequalities for random matrices. The Annals of Probability, 44(5):3431–3473, 2016.
  • [27] Víctor H. Peña and Evarist Giné. Decoupling: From dependence to independence. Springer-Verlag, 1999.
  • [28] Goutham Rajendran. Nonlinear Random Matrices and Applications to the Sum of Squares Hierarchy. PhD thesis, University of Chicago, 2022.
  • [29] Goutham Rajendran and Madhur Tulsiani. Concentration of polynomial random matrices via efron-stein inequalities. Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2023.
  • [30] Holger Rauhut. Compressive sensing and structured random matrices. In Theoretical Foundations and Numerical Methods for Sparse Recovery, pages 1–92. De Gruyter, Berlin, New York, 2010.
  • [31] Warren Schudy and Maxim Sviridenko. Bernstein-like concentration and moment inequalities for polynomials of independent random variables: multilinear case. arXiv preprint, 2011. arXiv:1109.5193.
  • [32] Warren Schudy and Maxim Sviridenko. Concentration and moment inequalities for polynomials of independent random variables. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 437–446. SIAM, 2012. doi:10.1137/1.9781611973099.37.
  • [33] Khalid Shebrawi and Hussien Albadawi. Trace inequalities for matrices. Bulletin of the Australian Mathematical Society, 87(1):139–148, 2013.
  • [34] Joel A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.
  • [35] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.