Abstract 1 Introduction 2 Preliminaries 3 The Proof Framework 4 Simplified Proof of Communication Complexity Lower Bounds 5 Proof of the Main Theorem References

A Lower Bound on the Trace Norm of Boolean Matrices and Its Applications

Tsun-Ming Cheung School of Computer Science, McGill University, Montreal, Canada Hamed Hatami ORCID School of Computer Science, McGill University, Montreal, Canada Kaave Hosseini ORCID Department of Computer Science, University of Rochester, NY, USA Aleksandar Nikolov Department of Computer Science, University of Toronto, Canada Toniann Pitassi Department of Computer Science, Columbia University, New York, NY, USA Morgan Shirley ORCID Department of Computer Science, University of Victoria, Canada
Abstract

We present a simple method based on a variant of Hölder’s inequality to lower-bound the trace norm of Boolean matrices. As the main result, we obtain an exponential separation between the randomized decision tree depth and the spectral norm (i.e. the Fourier L1-norm) of a Boolean function. This answers an open question of Cheung, Hatami, Hosseini and Shirley (CCC 2023). As immediate consequences, we obtain the following results.

  • We give an exponential separation between the logarithm of the randomized and the deterministic parity decision tree size. This is in sharp contrast with the standard binary decision tree setting where the logarithms of randomized and deterministic decision tree size are essentially polynomially related, as shown recently by Chattopadhyay, Dahiya, Mande, Radhakrishnan, and Sanyal (STOC 2023).

  • We give an exponential separation between the approximate and the exact spectral norm for Boolean functions.

  • We give an exponential separation for XOR functions between the deterministic communication complexity with oracle access to Equality function (DEQ) and randomized communication complexity. Previously, such a separation was known for general Boolean matrices by Chattopadhyay, Lovett, and Vinyals (CCC 2019) using the Integer Inner Product (IIP) function.

  • Finally, our method gives an elementary and short proof for the mentioned exponential DEQ lower bound of Chattopadhyay, Lovett, and Vinyals for Integer Inner Product (IIP).

Keywords and phrases:
Boolean function complexity, parity decision trees, randomized communication complexity
Funding:
Hamed Hatami: Supported by an NSERC grant.
Kaave Hosseini: Partially supported by the Goergen Institute for Data Science at the University of Rochester.
Morgan Shirley: Research was done while the author was a student at the University of Toronto. Supported by an NSERC grant.
Copyright and License:
[Uncaptioned image] © Tsun-Ming Cheung, Hamed Hatami, Kaave Hosseini, Aleksandar Nikolov, Toniann Pitassi, and Morgan Shirley; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Communication complexity
; Theory of computation Oracles and decision trees
Editors:
Raghu Meka

1 Introduction

We start by recalling two fundamental notions related to the complexities of Boolean functions: the randomized parity decision tree complexity and the spectral norms.

A parity decision tree (PDT) for a Boolean function f:𝔽2n{0,1} is similar to a standard decision tree with the following strengthened query power. Instead of a single variable, each internal node can query the parity (sum over 𝔽2) of any subset of variables, e.g., x1x5x6. The parity decision tree complexity of f, denoted by PDTdepth(f), is the smallest depth of a PDT that outputs the correct value of f(x) on every input x. A randomized parity decision tree (RPDT) of depth at most d is a probability distribution over PDTs of depth at most d. It computes f with error ϵ if, for every input x, the RPDT outputs f(x) with probability at least 1ϵ. The randomized parity decision tree complexity, denoted by RPDTdepth(f), is the smallest depth of an RPDT computing f with error ϵ=1/3.

Parity decision trees are closely related to the Fourier expansions of Boolean functions. The spectral norm (also known as algebra norm) of a function f:𝔽2n, denoted by f^1, is the sum of the absolute values of the Fourier coefficients.

It is easy to see from the Fourier expansion of a PDT that every Boolean function f:𝔽2n{0,1} satisfies logf^1PDTdepth(f). The randomized counterpart of this inequality does not hold as illustrated by h, the indicator function of the standard basis vectors {e1,,en}: as observed in [14], RPDTdepth(h)=O(1), but logh^1=Θ(logn). Nonetheless, this still leaves the possibility of logf^1=O~(RPDTdepth(f)) open, where O~() hides polylogarithmic dependencies on n. Cheung et al. [8] specifically asked whether such a relation could hold, which if confirmed, would have established the stronger lower bound on RPDTdepth.

The main result of this work gives an example in which logf^1 is exponential in RPDTdepth(f), answering the question by [8] in the negative.

Theorem 1 (Main theorem).

There exists a function f:𝔽2n{0,1} with

RPDTdepth(f)=O(logn) and logf^1=Ω(n).

In the converse direction, the quadratic gap RPDTdepth(f)=O(f^12) holds for every Boolean function (see e.g. [14, Lemma 2.7]). Chattopadhyay, Mande, and Sherif [5] proved that the sink function SINK satisfies RPDTdepth(SINK)=Ω(SINK^1). This result and Theorem 1 together demonstrate that the measures logf^1 and RPDTdepth(f) are not polynomially related in both ways.

The proof technique used in the main theorem involves an application of Hölder’s inequality with carefully-chosen parameters (see Section 3) that allow for a combinatorial interpretation of the problem. This framework is useful not only for the question considered above but also in communication complexity, where we will use the Hölder’s inequality technique to simplify the proofs of existing lower bounds. For the rest of this section, we will discuss applications of both the proof technique and of the main theorem itself.

1.1 Communication Complexity: The Power of Oracle Access to Equality

Arguably the most well-known communication problem is the Equality function eq, in which the two parties compare if their inputs are identical. In the model of public-coin randomized communication, two parties are given a publicly accessible random string and are allowed to make errors with probability bounded away from 1/2. It is an elementary result that eq:{0,1}n×{0,1}n{0,1} has a randomized protocol requiring only O(1) bits of communication, while any deterministic protocol of eq requires n+1 bits of communication.

A natural question would be whether Equality fully captures the power of randomness in communication. A formal formulation of the question is to compare the relative power of randomized protocols and deterministic protocols with oracle access to Equality or, in short, DEQ protocols. A DEQ protocol performs communications as usual; in addition, the parties are given access to an oracle that computes Equality function, and each oracle call is charged at a cost of one bit. The seminal work by Chattopadhyay, Lovett and Vinyals [4] considered this question and proved an exponential separation between the DEQ complexity and randomized communication complexity (denoted as R) of the Integer Inner Product function (IIP) in dimension at least 6.

For a chosen dimension k and parameter n, the Boolean matrix IIPk(n):{2n,,2n}k×{2n,,2n}k{0,1} is defined by

IIPk[(x1,,xk),(y1,,yk)]={1if i=1kxiyi=00otherwise.

For this communication problem, each player holds a Θ(kn)-bit input. As we only consider the case when k=O(1), we treat IIPk(n) as a Θ(n)-bit communication problem.

Chattopadhyay, Lovett, and Vinyals presented a (one-way) randomized protocol for IIPk(n) of cost O(klogn). For the lower bound, they introduced a lower bound technique which we call relative area method (see Theorem 13), and showed that no DEQ protocol could compute the function with fewer than Ω(n) bits of communication.

Theorem 2 ([4]).

For constant k6, R(IIPk(n))=O(logn) and DEQ(IIPk(n))=Ω(n).

In a subsequent work, Cheung et al. [8] obtained a strengthening of Theorem 2 via a different approach of spectral methods. It was shown in [14] that

DEQ(A)12logAntr (1)

for any boolean matrix A{0,1}m×n, where Antr is the normalized trace norm of A with the normalization factor from the matrix dimensions (see Definition 10 for the precise definition).

Cheung et al. [8] observed that for k3, the Θ(2kn)×Θ(2kn) matrix IIPk(n) contains a 24n/3×24n/3 point-line incidence matrix PL(4n/3) as a submatrix. The matrix PL(n) is defined on the domains 𝒫==[2n/4]×[23n/4]2, and the entries of PL(n):𝒫×{0,1} are given by

PL[(x,x),(y,y)]={1if xy+x=y0otherwise.

They proved that the normalized trace norm of PL(n) is exponential in n, which shows that DEQ(PL(n))=Ω(n). Since communication complexity does not increase under restriction, this subsequently implies Theorem 2.

Theorem 3 ([8]).

The 2m×2m Boolean matrix PL(m) satisfies PL(m)ntr=Ω(2m/32). Consequently, for k3,

DEQ(IIPk(n))n48+O(1).

A common shortcoming of both proofs of Theorem 2 in [4, 8] is their highly technical nature. In Section 4, we give a considerably simplified proof – with improved linear factor on DEQ(IIPk(n)) – based on the Hölder’s inequality technique. Here, we consider a different submatrix of IIPk, which is the point-line incidence system PL that proves the optimality of Szemerédi–Trotter theorem. We will provide the precise definition of PL(m) in Section 4.

Theorem 4 (Improved version of Theorem 3).

The 2m+1×2m Boolean matrix PL(m) satisfies PL(m)ntr=Ω(2m/6). Consequently, for k3,

DEQ(IIPk(n))n8+O(1).

Nondeterministic communication

When analyzing the power of Equality in communication, another model of interest is nondeterministic protocols with oracle access to Equality, or NEQ for short. For the precise definition of the NEQ model, we refer readers to [19]. NEQ and its relationships with related models have been studied previously, both implicitly in [12] and explicitly in [19]. In the latter work, Pitassi, Shirley and Shraibman analyzed the relative area method used by [4] to lower bound DEQ(IIP6) and showed that the same technique is applicable to NEQ complexity as well. It is not clear that the lower bound technique of [8] on the DEQ complexity of IIP3 works for NEQ, so prior to this work it was open whether IIP3 is hard for NEQ. We observe that the analysis in Theorem 4 can be adapted to the relative area method, and therefore yields an almost linear lower bound on NEQ(IIPk) for every k3.

Theorem 5.

For constant k3,

NEQ(IIPk(n))=Ω(nlogn).

As far as we know, this method cannot be improved to give a linear lower bound on NEQ(IIP3). We leave resolving this logarithmic gap as an open question.

xor-lifts

A related open problem posed in [8] is whether an exponential separation of DEQ and holds for an xor-lift. For a Boolean function f:𝔽2n{0,1}, the xor-lift f:𝔽2n×𝔽2n{0,1} is defined by f(x,y)=f(xy) for each x,y𝔽2n. It can be verified that IIPk is not an xor-lift, so the DEQ-vs-R separation for the xor-lift case was open.

The matrix class of xor-lifts is interesting in complexity theory since many complexity measures of the matrix f can be related (or even equated) to complexity measures of the Boolean function f. Indeed, the query-to-communication connections allow us to translate the separation question to a purely query-complexity setting. For any Boolean function f:𝔽2n{0,1}, the inequality

R(f)2RPDTdepth(f) (2)

is evident from the standard simulation of a randomized parity decision tree by a communication protocol. A result of [10] shows that

fntr=f^1. (3)

Immediate from Equations 1, 2, and 3, we obtain the exponential DEQ-vs-R separation for xor-lifts from Theorem 1.

Corollary 6.

There exists a Boolean function f:𝔽2n{0,1} such that its xor-lift f satisfies R(f)=O(logn) and DEQ(f)=Ω(n).

1.2 Query Complexity: The Power of Randomness

Understanding the power of randomized versus deterministic algorithms is a fundamental problem in complexity theory. Depending on the computational model, this problem varies from fully resolved to forbiddingly out of reach. In the standard decision tree model, it is well known that randomness does not significantly reduce the number of queries. One early result in complexity theory by Nisan [18] showed that a randomized decision tree of depth d computing a Boolean function can be simulated by a deterministic decision tree of depth at most O(d3). On the other end of the spectrum, the randomized-versus-deterministic problem remains overwhelmingly difficult for Turing Machines.

A generic framework to study the relative powers of two computation models is to study the relations of the complexity classes defined by suitable complexity measures. For the sake of comparisons between measures, we consider the complexity measures normalized to the range [0,n]. Given a complexity measure C() for a deterministic computation model, P is defined to be the class of functions f:𝔽2n{0,1} (more accurately, sequences of functions fn) such that C(f)polylog(n). We define the class BPP similarly for the randomized counterpart of C(). The inclusion PBPP for every measure is obvious, so the randomized-versus-deterministic problem amounts to whether the classes P and BPP are equal.

We consider the two natural measures for trees, namely depth and logarithmic size (for normalization purposes) and four types of decision trees: (standard) decision tree (DT), parity decision tree (PDT), randomized decision tree (RDT), and randomized parity decision tree (RPDT). This branches into eight complexity measures in concern, and we define the other six measures DTdepth, logDTsize, RDTdepth, logRDTsize, PDTdepth and logPDTsize in an analogous manner to RPDTdepth and logRPDTsize. Based on the type of queries equipped and the complexity measure, the P-vs-BPP question branches into four pairs for comparison.

Standard decision trees

Nisan [18] showed that DTdepth(f)O(RDTdepth(f)3) in 1988 and settled that P=BPP in the depth setting. The question in the size setting remained unsettled for decades until recently Chattopadhyay, Dahiya, Mande, Radhakrishnan, and Sanyal [3] showed that

logDTsize(f)=O(log4(RDTsize(f))log3n),

concluding that P=BPP in this setting as well.

Parity decision trees

In the depth setting, the two classes P and BPP are strongly separated by the simple function of OR. It is well-known that the OR function on n bits satisfies RPDTdepth(OR)=O(1) but PDTdepth(OR)=n, which provides the optimal separation and hence PBPP in this setting.

Note that however PDTsize(OR)=O(n), so OR does not attest a separation between P and BPP in the size setting. Indeed, previous to this work the P-vs-BPP question was unknown in the size setting. As mentioned that logf^1PDTdepth(f) for every function f, an immediate corollary of Theorem 1 implies that PBPP in the size setting for parity decision trees.

Corollary 7.

There is a function f:𝔽2n{0,1} such that logRPDTsize(f)=O(logn), but logPDTsize(f)=Ω(n).

In fact, the randomized parity decision tree in our construction is non-adaptive and has a one-sided error, hence we obtain the stronger separation PRP111RP denotes the class of functions computable by RPDT with constant one-sided error and poly-logarithmic costs in the size setting parity decision tree size model. It remains an interesting open problem to determine whether a separation of O(1)-vs-Ω(n) for logRPDTsize and logPDTsize is possible, as in the case of depth illustrated by the OR function.

Question 8.

Is there a boolean function f:𝔽2n{0,1} such that logRPDTsize(f)=O(1), but logPDTsize(f)=Ω(n), or even logPDTsize(f)=ω(logn)?

1.3 Fourier Analysis: Approximate versus Exact Spectral Norms

The notion of approximate norms is customary in complexity theory as complexity measures for models with error tolerance. The approximate spectral norm of a function f with error ϵ, denoted as L^1,ϵ(f)222We adopt this function-like notation to emphasize that approximate spectral norm is not a norm despite what the name suggests, is the minimum g^1 for some function g such that fgϵ. As in the case of other complexity measures, we adopt the canonical choice ϵ=1/3 when referring to constant error. Spectral norm and approximate spectral norm of a Boolean function are fundamental parameters that find applications in many areas such as learning theory [17], complexity theory [21, 22, 23, 1], communication complexity [23, 14, 8], Fourier analysis [23, 11] and additive combinatorics [13, 20, 9].

It is natural to ask whether the spectral norm of a Boolean function is upper bounded by its approximate spectral norm. Towards answering this question, Cheung et al. [9] observed that if f is the indicator function of n-bit strings of Hamming-weight 1, then L^1,1/3(f)=O(1) but f^1=Ω(n). Nevertheless, this separation does not rule out the possibility of polynomial relations with dependency on n such as f^1=poly(L^1,1/3(f),n). Using the result by [14] that logf^1RPDTdepth(f) for any Boolean function f, Theorem 1 immediately rules out the possibility of polynomial dependency.

Corollary 9.

There exists a function f:𝔽2n{0,1} such that L^1,1/3(f)=nO(1) but f^1=2Ω(n).

2 Preliminaries

For a positive integer k, we denote [k]{1,,k}. All logarithms in this paper are base 2.

We denote the indicator function of a predicate P as 𝟏[P] and the indicator function of a set S as 𝟏S. For a random variable r and a set S, we write rS to indicate that r is uniformly sampled from S. Throughout this work, we adopt the standard big-O notations in computer science.

2.1 Schatten norms

For a vector vk and p[1,], we denote the p-norm of v as vp. For a matrix Am×n, the singular values of A are the square roots of the eigenvalues of AA, which we denote by σ1(A)σ2(A)σmin{m,n}(A)0. The central matrix norms in this paper are Schatten p-norm, which is defined as

ASp=(iσi(A)p)1/p

for p[1,), and AS=σ1(A). Put differently, Schatten p-norm of a matrix is the p norm of the vector [σ1(A),σ2(A),,σmin{m,n}(A)]. One property of Schatten norms inherited from p norms is the monotonicity of Schatten p-norm in p: ASpASq for 1p<q.

The Schatten 1-norm, 2-norm and -norm are frequently used and are commonly known as trace norm, Frobenius norm, and spectral norm respectively:

Atr =AS1=iσi(A);
AF =AS2=iσi(A)2=ijAij2;
A =AS=σ1(A)=maxxn{0}Ax2x.

For applications in theoretical computer science, it is more convenient to work with the normalized trace norm as a complexity measure.

Definition 10 (Normalized trace norm).

For Am×n, the normalized trace norm of A is

AntrAtrmn=1mniσi(A).

We remind readers that unlike essentially all other complexity measures and matrix norms used in this work, the normalized trace norm may increase upon restriction to a submatrix.

2.2 Fourier analysis of Boolean functions

This section gives a basic overview of Fourier analysis on the Boolean cube. For every η𝔽2n, define the Fourier character χη:𝔽2n{±1} as χη(x)=(1)x,η, where x,η is the standard inner product over 𝔽2n. For a function f:𝔽2n, its Fourier expansion is given by

f=η𝔽2nf^(η)χη, (4)

where the Fourier coefficient f^(η) is defined as f^(η)𝔼x𝔽2n[f(x)χη(x)].

For a function f:𝔽2n and p[1,), the Fourier p-norm is defined as

f^p=(x𝔽2n|f^(η)|p)1/p,

and f^=maxη|f^(η)|. In other words, the Fourier p-norm is the p norm of the vector of Fourier coefficients. Fourier 1-norm is better known as the spectral norm of f, i.e. f^1η𝔽2n|f^(η)|.

For an Abelian group, the convolution of two real-valued functions f,g:G is a function defined to be

fg(x)𝔼yG[f(y)g(xy)].

It is a standard fact in Fourier analysis that the Fourier transformation of convolution of two functions is the pointwise product of their respective Fourier transforms: fg^=f^g^.

Lastly, Parseval’s identity states that the squared Fourier 2-norm of a function f is equal to its second moment (under uniform probability measure):

f^22=η𝔽2nf^(η)2=𝔼x𝔽2n[f(x)2].

3 The Proof Framework

The key proof technique throughout this work is Hölder’s inequality with a nuanced choice of p-norms. Hölder’s inequality states that for any p,q[1,] satisfying 1p+1q=1,

|u,v|upvq (5)

for any real vectors u,vn. A generalization of Hölder’s inequality known as Littlewood’s inequality [15], which can be deduced from the standard Hölder’s inequality, relates the p-norms of a function for some “interpolated” tuples of p. The inequality states that for 1p0<p<p1,

vpvp0θvp11θ (6)

for any vector vn, where θ(0,1) satisfies 1p=θp0+1θp1. As Schatten norms and Fourier norms are defined based on p norms, the above inequality also holds with replacing the p norms with these families of norms.

Setting the parameters (p,p0,p1)=(2,1,4) (so that θ=1/3) for Equation 6 yields the following lower bound for 1-norm (after some algebraic manipulations):

f1f23f42. (7)

As this is the sole parametrization of Hölder’s inequality that we employ, we colloquially refer to Equation 7 as “the Hölder’s inequality” in the rest of this work.

Among all the possible parametrizations for Littlewood’s inequality, the choice made in Equation 7 of (p,p0,p1)=(2,1,4) is appealing because of the nice physical interpretations of 2-norm and 4-norm for Boolean matrices (the Schatten norms) and Boolean functions (the Fourier norms). One main inspiration of Equation 7 as a practical bound is the work of Chazelle and Lvov on hereditary discrepancy [7]. Chazelle and Lvov proved a hereditary discrepancy lower bound of a matrix in terms of its Schatten 2-norm and 4-norm, highlighting the combinatorial interpretations of these norms for Boolean matrices. In the next section, we illustrate the combinatorial perspective of Equation 7 with a simpler proof of Theorem 3.

4 Simplified Proof of Communication Complexity Lower Bounds

It is customary to associate a Boolean matrix A with the biadjacency matrix of a bipartite graph G=(UV,E). In view of this, the square of Schatten 2-norm i.e. Frobenius norm of A is simply the number of edges of G. The Schatten 4-norm of A also admits a graph-theoretic interpretation:

AS44=Tr((AA)2)=i,jUk,VAikAiAjkAj=i,jUk,V𝟏[{ik,i,jk,j}E].

In other words, AS44 is precisely the count of possibly degenerate 4-cycles in G. This quantity becomes especially easy to evaluate when the underlying graph G does not contain non-degenerate 4-cycles (C4-free), or equivalently, A does not contain a 2×2 all-one submatrix.

For a C4-free bipartite graph, all contributions to AS44 come from the counts of edges and paths of length 2. Furthermore, if G is almost balanced in the sense that the average degree of G is close to the maximum degree, Equation 7 indeed yields a non-trivial lower bound for Antr:

Proposition 11.

Let G=(UV,E) be a C4-free bipartite graph with maximum degree Δ(G). For xUV, let dx be the degree of x. If A{0,1}U×V is the biadjacency matrix of G, Then

Atr|E|3/2xUVdx2|E|2Δ(G).
Proof.

By Hölder’s inequality (Equation 7),

AtrAF3AS42=|E|3/2Tr((AA)2).

Since G is C4-free, the expression of Tr((AA)2) is simplified to

Tr((AA)2) =iUkVAik+iUk,VkAikAi+i,jUijkVAikAjk
=|E|+iUdi(di1)+kVdk(dk1)
=(xUVdx2)|E|

and the required bounds follow. Since every two distinct points define a unique line, the bipartite graph G associated with any point-line incidence system (with no duplicated lines) is C4-free. From Proposition 11, one can readily derive an improved exponential normalized trace norm lower bound for PL using the same point-incidence submatrix in [8].

We prove a better bound by considering a point-line incidence system PL attributed to Paul Erdős (see [6, Lemma 6.25]). This point-line incidence system is constructed to show the tightness of the Szemerédi–Trotter incidence bound. Concretely, the 2n+1×2n matrix PL(n):𝒫×{0,1} is defined as follows: the input domains are 𝒫=[2n/3]×[22n/3+1] and =[2n/3]×[22n/3], and the entries are given by

PL(n)[(x,x),(y,y)]={1if xy+y=x0otherwise.

We first show that IIP3 contains PL as a submatrix.

Claim 12.

The matrix IIP3(n) contains the matrix PL(3n/23/2) as a submatrix.

Proof.

The condition of incidence of PL can be rewritten as xy+(1)y+x(1)=0, which is an instance of integer inner product. For PL(3n/23/2), the magnitude of each entry is bounded by 223(32n32)+1=2n, therefore PL(3n/23/2) is a submatrix of IIP3(n).

Proof of Theorem 4.

As noted above, PL(m) is C4-free because no two distinct lines intersect at the same pair of points. For the matrix PL(m), each line is incident to exactly one point of the form (i,x)𝒫 for each i[2m/3]. Therefore the bipartite graph defined by PL contains ||2m/3=24m/3 edges. Also, each point is incident to at most one line of the form (j,y) for each j[2m/3], hence the maximum degree of the graph is 2m/3. By Proposition 11,

PL(m)ntrΩ(12m24m/322m/3)=Ω(2m/6).

By 12 and Equation 1, we obtain

DEQ(IIPk(n))DEQ(IIP3(n))DEQ(PL(3n/23/2))12(163n2)+O(1)=n8+O(1).

As mentioned earlier, Chattopadhyay, Lovett, and Vinyals [4] proved a linear lower bound on DEQ(IIPk) for k6 by a method different from the approach in [8] and this work. The method used in [4], the relative area method, considered a parametrized weighted sum of monochromatic rectangle partition of a Boolean matrix. The following DEQ lower bound is deduced from the communication complexity lower bound with more general oracle access.

Theorem 13 ([4, Lemmas 3.5 and 3.7]).

Suppose M is a Boolean matrix with α 1-entries, and the maximum area of any 1-monochromatic rectangle in M is β. For any constant η(1/2,1),

DEQ(M)=Ω(log(αβ1η|M|η)).

For the matrix PL(m), as observed in the proof of Theorem 4, α=|E(G)|=24m/3 and β=Δ(G)=2m/3 since PL(m) is C4-free. Applying Theorem 13 with η=1/2+ϵ for some very small constant ϵ>0, we obtain a linear DEQ lower bound with a slightly inferior constant due to the slackness in η. We can also use this technique to lower bound NEQ, which proves Theorem 5.

Theorem 14 ([19]).

Let M be a 2m×2m Boolean matrix, α be the number of 1-entries in M and let β be the size of the largest 1-monochromatic rectangle in M. Then

NEQ(M)=Ω(log(αβ2m)1logm).
Proof of Theorem 5.

As noted before, α=|E(G)|=24m/3 and β=Δ(G)=2m/3. For m=3n/23/2, applying Theorem 14 gives

NEQ(PL(m))=Ω(nlogn)

and therefore the same lower bound holds for NEQ(IIPk(n)) for k3.

5 Proof of the Main Theorem

The Hölder’s inequality supplies a large lower bound on 1-norm in the scenario of large 2-norm and small 4-norm. The proof of Theorem 3 utilizes the C4-free property and sufficient edge density of PL to show the desired trace norm lower bound. We adopt the same strategy in exhibiting a function that the separation of randomized parity decision tree depth and spectral norm.

As in the matrix case, we begin by examining the physical interpretations of p-norms in the Hölder’s inequality. A fundamental quantity in additive combinatorics known as the additive energy, coincides with the fourth power of Fourier 4-norm in Boolean function setting.

Definition 15 (Additive energy).

For f:G{0,1} over an Abelian group G, the additive energy of f is defined as

E(f)𝔼x,y,zG[f(x)f(y)f(z)f(x+yz)].

For a Boolean function f:𝔽2n{0,1}, tt is straightforward to show that E(f)=f^44:

E(f) =𝔼t𝔽2n[𝔼x𝔽2n[f(x)f(tx)]𝔼z𝔽2n[f(z)f(tz)]]
=𝔼t𝔽2n[(ff(t))2]
=η𝔽2nff^(η)2=ηGf^(η)4=f^44.

It is also direct from Parseval’s identity that f^22=𝔼x[f(x)]. Therefore in the Boolean function setting, Equation 7 states that

f^1f^23f^42=𝔼x[f(x)]3E(f). (8)

Emulating the approach in Section 4 to guarantee a small 4-norm, we consider a Boolean function f:𝔽2n{0,1} that satisfies f(x)f(y)f(z)f(x+y+z)=0 for any distinct x,y,z𝔽2n as an analogue of C4-free bipartite graphs. For such a function, the additive energy of f is lower than typical as only O(|𝔽2n|2) tuples of (x,y,z)(𝔽2n)3 could contribute to E(f). A function with this property is precisely the indicator function of a Sidon set.

Definition 16 (Sidon set).

For an Abelian group G, a set SG is called a Sidon set if for every a,b,c,dS such that a+b=c+d, then {a,b}={c,d}.

The indicator of a Sidon set emerges as a natural candidate function that attains the exponential spectral norm lower bound stated in Theorem 1. Indeed, one can show that the spectral norm of an indicator of a sufficiently big Sidon set is large.

Proposition 17.

Every Sidon set S of a finite Abelian group G satisfies 𝟏S^1|S|/2.

Proof.

Since S is a Sidon set, 𝟏S(x)𝟏S(y)𝟏S(z)𝟏S(x+yz)=1 implies that z=x or z=y. Thus

E(𝟏S)2|G|𝔼x,yG[𝟏S(x)𝟏S(y)]=2|G|×(|S||G|)2=2|S|2|G|3.

Applying Hölder’s inequality (Equation 8) gives

𝟏S^1𝔼x[𝟏S(x)]3E(𝟏S)=|S|3/|G|32|S|2/|G|3=|S|2.

By Cauchy-Schwarz inequality and Parseval identity, it is direct to check that for any set TG, the spectral norm of 𝟏T is at most |T|. The above proposition shows that this upper bound is tight up to constant factor for a Sidon set.

By considering the possible sums of a pair of elements, it is easy to see that for a Sidon set S, we have (|S|2)|G| and hence |S|=O(|G|). In the case of G=2n 333For clarity, in the rest of this section, we use the notation 2 to refer to the additive group of size 2., a well-known construction of a Sidon set matching the size upper bound is the Bose–Chaudhuri–Hocquenghem (BCH) code [2, 16]. This code is constructed from polynomials over a finite field of characteristic 2.

Denote 𝔽2m the characteristic-2 finite field of size 2m. The elements of 𝔽2m can be isomorphically identified with univariate 𝔽2-polynomials of degree at most m1, where multiplication is defined modulus a fixed irreducible polynomial P(x)𝔽2[x] of degree m. Viewing as additive groups, 𝔽2m𝔽2[x]/P(x) is isomorphic to 2m.

For 𝔽p where p is a prime, the Sidon set construction in [2] is given by the tuples (a,ak) over all a in a suitable subfield and a suitable exponent k. For p=2, the construction takes the form of BCH(S), where for a set S we define

BCH(S){(a,a3):aS}.

For the sake of completeness, we include the proof for this specific construction.

Theorem 18 ([2]).

For every even number m, the set BCH(𝔽2m/2)𝔽2m/2×𝔽2m/22m is a Sidon set.

Proof.

Let (a,a3) and (b,b3) be two pairs with a prescribed sum (u,v)𝔽2m/2×𝔽2m/2. This means a+b=u and a3+b3=v, which implies that

u3=(a+b)3=a3+b3+(a+b)ab=v+uab.

Suppose ab i.e. u0, then a(u+a)=ab=u2+vu1. The same calculation concludes that a and b are the roots of the quadratic equation x(x+u)=u2+vu1. Since a quadratic equation has at most two roots, {a,b} is uniquely determined by u and v.

Theorem 18 allows us to construct a large Sidon set in 2n. Since a subset of a Sidon set remains a Sidon set, it remains to select a suitably structured subset whose indicator function is computable with a short randomized parity decision tree. As mentioned earlier, every element in 𝔽2n can be uniquely identified with a polynomial in 𝔽2[x] of degree at most n1. For d, denote 𝒫d the set of 𝔽2-polynomials of degree at most d1. We show that BCH(𝒫n/4) is the desired Sidon set.

Theorem 19.

Let n=4d for some d, and S=BCH(𝒫d)𝒫d×𝒫3d𝔽2d×𝔽23d2n. The Boolean function 𝟏S:𝔽2n{0,1} satisfies that RPDTdepth(𝟏S)=O(logn) and 𝟏S^1=Ω(2n/8).

Proof.

Note that S is a Sidon set of size |𝒫d|=2d=2n/4. The spectral norm lower bound follows from Proposition 17. It remains to upper-bound the randomized parity decision tree complexity.

We first describe a randomized algorithm that computes 𝟏S. Suppose a pair (u,v)𝔽2[x]×𝔽2[x] with deg(u)<d and deg(v)<3d is a given. To determine whether (u,v)S, we interpret vu3 as a polynomial in 𝔽2m[x], where m=log(3d)+2, and check whether vu30. This can be accomplished using the standard randomized polynomial identity testing algorithm: pick a random t𝔽2m and evaluate v(t)u(t)3𝔽2m. The algorithm declares that “vu3” if any of the evaluated values is non-zero, and declares “v=u3” otherwise.

Notice that this randomized algorithm has a one-sided error, as it always makes the correct declaration when v=u3 (i.e. 𝟏S(u,v)=1). In the case of vu3, since the polynomial vu3 has at most 3d roots, the probability of error is at most 3d/2m1/4.

Next, we convert the randomized algorithm into a randomized parity decision tree. For every t𝔽2m, the set

Ht{u𝒫d:u(t)=0}𝒫d2d

is a linear subspace of co-dimension m in 2d, and each coset of Ht takes a fixed value when evaluated at t. Therefore, the value of u(t) is determined by m deterministic linear queries to u. Similarly, the value of v(t) is determined by m deterministic linear queries to v. Hence, we construct the randomized parity decision tree as follows: pick a random t𝔽2m and make the 2m parity queries that determine the values of u(t),v(t)𝔽2m, and the decision tree outputs whether v(t) and u(t)3 are equal.

5.1 Payley-Zygmud inequality and typical Fourier coefficients

Here instead of Hölder’s inequality, we use the Payley-Zygmund inequality to deduce a stronger conclusion than Proposition 17 which could be of independent interest. We show that for the indicator of a Sidon set, a constant fraction of the Fourier coefficients is large. We have shown that for a Sidon set S in an Abelian group G,

χG^|𝟏S^(χ)|2=|S||G| and χG^|𝟏S^(χ)|42|S|2|G|3.

Consider the random variable X=|𝟏S^(χ)|2 where χ is a character picked uniformly at random. The above bounds translate to 𝔼[X]=|S|/|G|2 and 𝔼[X2]=2|S|2/|G|4=2𝔼[X]2. By Paley-Zygmund inequality, we have for any δ(0,1),

Pr[X>δ𝔼[X]](1δ)2(𝔼[X])2𝔼[X2](1δ)22.

Taking δ=1/2 for example, this gives

Prχ[|𝟏S^(χ)|>|S|2|G|]18.

References

  • [1] Nikhil Bansal and Makrand Sinha. k-forrelation optimally separates quantum and classical query complexity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 1303–1316, New York, NY, USA, 2021. Association for Computing Machinery. doi:10.1145/3406325.3451040.
  • [2] R.C. Bose and D.K. Ray-Chaudhuri. On a class of error correcting binary group codes. Information and Control, 3(1):68–79, March 1960. doi:10.1016/s0019-9958(60)90287-4.
  • [3] Arkadev Chattopadhyay, Yogesh Dahiya, Nikhil S. Mande, Jaikumar Radhakrishnan, and Swagato Sanyal. Randomized versus deterministic decision tree size. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC ’23. ACM, June 2023. doi:10.1145/3564246.3585199.
  • [4] Arkadev Chattopadhyay, Shachar Lovett, and Marc Vinyals. Equality alone does not simulate randomness. In 34th Computational Complexity Conference (CCC 2019), 2019. doi:10.4230/LIPICS.CCC.2019.14.
  • [5] Arkadev Chattopadhyay, Nikhil S. Mande, and Suhail Sherif. The log-approximate-rank conjecture is false. J. ACM, 67(4):23:1–23:28, 2020. doi:10.1145/3396695.
  • [6] Bernard Chazelle. The discrepancy method: randomness and complexity. Cambridge University Press, 2001.
  • [7] Bernard Chazelle and Alexey Lvov. A trace bound for the hereditary discrepancy. In Proceedings of the sixteenth annual symposium on Computational geometry, pages 64–69, 2000. doi:10.1145/336154.336179.
  • [8] Tsun-Ming Cheung, Hamed Hatami, Kaave Hosseini, and Morgan Shirley. Separation of the factorization norm and randomized communication complexity. In 38th Computational Complexity Conference (CCC 2023). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.CCC.2023.1.
  • [9] Tsun-Ming Cheung, Hamed Hatami, Rosie Zhao, and Itai Zilberstein. Boolean functions with small approximate spectral norm. Electron. Colloquium Comput. Complex., TR22-041, 2022. URL: https://eccc.weizmann.ac.il/report/2022/041, arXiv:TR22-041.
  • [10] Kenneth R. Davidson and Allan P. Donsig. Norms of schur multipliers. Illinois Journal of Mathematics, 51(3), July 2007. doi:10.1215/ijm/1258131101.
  • [11] Uma Girish, Avishay Tal, and Kewen Wu. Fourier Growth of Parity Decision Trees. In 36th Computational Complexity Conference (CCC 2021), volume 200 of Leibniz International Proceedings in Informatics (LIPIcs), pages 39:1–39:36, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2021.39.
  • [12] Mika Göös, Toniann Pitassi, and Thomas Watson. The landscape of communication complexity classes. computational complexity, 27(2):245–304, June 2018. doi:10.1007/s00037-018-0166-6.
  • [13] Ben Green and Tom Sanders. Boolean functions with small spectral norm. Geometric and Functional Analysis, 18(1):144–162, March 2008. doi:10.1007/s00039-008-0654-y.
  • [14] Lianna Hambardzumyan, Hamed Hatami, and Pooya Hatami. Dimension-free bounds and structural results in communication complexity. Israel Journal of Mathematics, 253(2):555–616, 2023.
  • [15] G H Hardy, J E Littlewood, and Georg Polya. Cambridge mathematical library: Inequalities. Cambridge University Press, Cambridge, England, February 1988.
  • [16] Alexis Hocquenghem. Codes correcteurs d’erreurs. Chiffres, 2:147–156, 1959.
  • [17] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the fourier spectrum. In Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing, STOC ’91, pages 455–464, New York, NY, USA, 1991. Association for Computing Machinery. doi:10.1145/103418.103466.
  • [18] Noam Nisan. Crew prams and decision trees. In Proceedings of the twenty-first annual ACM symposium on Theory of computing, pages 327–335, 1989. doi:10.1145/73007.73038.
  • [19] Toniann Pitassi, Morgan Shirley, and Adi Shraibman. The strength of equality oracles in communication. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 89:1–89:19, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2023.89.
  • [20] Tom Sanders. Boolean functions with small spectral norm, revisited. Math. Proc. Camb. Philos. Soc., 167(02):335–344, September 2019.
  • [21] Amir Shpilka, Avishay Tal, and Ben lee Volk. On the structure of boolean functions with small spectral norm. Computational Complexity, 26(1):229–273, September 2017. doi:10.1007/s00037-015-0110-y.
  • [22] Avishay Tal. Tight Bounds on the Fourier Spectrum of AC0. In 32nd Computational Complexity Conference (CCC 2017), volume 79 of Leibniz International Proceedings in Informatics (LIPIcs), pages 15:1–15:31, Dagstuhl, Germany, 2017. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2017.15.
  • [23] Hing Yin Tsang, Chung Hoi Wong, Ning Xie, and Shengyu Zhang. Fourier sparsity, spectral norm, and the log-rank conjecture. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 658–667, 2013. doi:10.1109/FOCS.2013.76.