
Sparsity Lower Bounds for Probabilistic Polynomials

Josh Alman, Columbia University, New York, NY, USA; Arkadev Chattopadhyay, TIFR Mumbai, India; Ryan Williams, CSAIL and EECS, MIT, Cambridge, MA, USA
Abstract

Probabilistic polynomials over commutative rings offer a powerful way of representing Boolean functions. Although many degree lower bounds for such representations have been proved, sparsity lower bounds (counting the number of monomials in the polynomials) have not been so common. Sparsity upper bounds are of great interest for potential algorithmic applications, since sparse probabilistic polynomials are the key technical tool behind the best known algorithms for many core problems, including dense All-Pairs Shortest Paths, and the existence of sparser polynomials would lead to breakthrough algorithms for these problems.

In this paper, we prove several strong lower bounds on the sparsity of probabilistic and approximate polynomials computing Boolean functions when 0 means “false”. Our main result is that the AND of $n$ ORs of $c\log n$ variables requires probabilistic polynomials (over any commutative ring which isn't too large) of sparsity $n^{\Omega(\log c)}$ to achieve even 1/4 error. The lower bound is tight, and it rules out a large class of polynomial-method approaches for refuting the APSP and SETH conjectures via matrix multiplication. Our other results include:

  • Every probabilistic polynomial (over a commutative ring) for the disjointness function on two n-bit vectors requires exponential sparsity in order to achieve exponentially low error.

  • A generic lower bound showing that any function requiring probabilistic polynomials of degree $d$ must require probabilistic polynomials of sparsity $\Omega(2^d)$.

  • Building on earlier work, we consider the probabilistic rank of Boolean functions which generalizes the notion of sparsity for probabilistic polynomials, and prove separations of probabilistic rank and probabilistic sparsity.

Some of our results and lemmas are basis independent. For example, over any basis $\{a,b\}$ for true and false where $a \neq b$, and any commutative ring $R$, the AND function on $n$ variables has no probabilistic $R$-polynomial with $2^{o(n)}$ sparsity, $o(n)$ degree, and $1/2^{o(n)}$ error simultaneously. This AND lower bound is our main technical lemma used in the above lower bounds.

Keywords and phrases:
Probabilistic Polynomials, Sparsity, Orthogonal Vectors, Probabilistic Rank
Funding:
Josh Alman: Work supported in part by NSF Grant CCF-2238221.
Arkadev Chattopadhyay: Supported by funds of Department of Atomic Energy, Govt. of India, under project RTI4001, and a Google India Research Award.
Ryan Williams: Work supported in part by NSF CCF-2127597 and NSF CCF-2420092.
Copyright and License:
© Josh Alman, Arkadev Chattopadhyay, and Ryan Williams; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Algebraic complexity theory; Theory of computation → Models of computation; Mathematics of computing → Discrete mathematics
Editors:
Raghu Meka

1 Introduction

Let $R$ be a ring. The sparsity of a polynomial $p \in R[x_1,\ldots,x_n]$, denoted by $\mathrm{sparsity}(p)$, is the number of monomials of $p$. A probabilistic polynomial in $n$ variables is a distribution $\mathcal{P}$ on $n$-variate polynomials. Its degree and sparsity are the maximum degree and sparsity, respectively, of any polynomial in its support. The error of $\mathcal{P}$ on a function $f : \{0,1\}^n \to \{0,1\}$ is the function from $\mathbb{N}$ to $[0,1]$ defined as

$$\mathrm{error}(\mathcal{P},f)(n) = \max_{x \in \{0,1\}^n} \Pr_{p \sim \mathcal{P}}[p(x) \neq f(x)].$$
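To make these definitions concrete, here is a minimal Python sketch (purely illustrative, not taken from the paper) that represents an $\mathbb{F}_2$ polynomial as a set of multilinear monomials and computes the sparsity, degree, and error of a small explicit distribution by brute force:

```python
# A minimal sketch: an F_2 polynomial as a set of multilinear monomials,
# each monomial a frozenset of variable indices.
from itertools import product

def evaluate(poly, x):
    """Evaluate a multilinear F_2 polynomial (a set of monomials) at x."""
    return sum(all(x[i] for i in mono) for mono in poly) % 2

def sparsity(poly):
    return len(poly)

def degree(poly):
    return max((len(mono) for mono in poly), default=0)

def error(dist, f, n):
    """max over x in {0,1}^n of Pr_{p ~ dist}[p(x) != f(x)], where dist is
    an explicit list of (probability, polynomial) pairs."""
    return max(
        sum(pr for pr, p in dist if evaluate(p, x) != f(x))
        for x in product([0, 1], repeat=n)
    )
```

Here a probabilistic polynomial is just an explicit list of (probability, polynomial) pairs; the brute-force loop over $\{0,1\}^n$ is of course only feasible for toy sizes.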

Probabilistic polynomials with low sparsity are the main technical component in a number of recent randomized algorithms for core problems. This includes the best known randomized algorithm for the dense case of All-Pairs Shortest Paths (APSP) [36], the best known randomized algorithm for Orthogonal Vectors (OV) [1], the best known deterministic algorithms for these problems (which derandomizes the aforementioned probabilistic polynomials) [11], as well as Constraint Satisfaction problems [37, 15], All-Pairs Nearest Neighbor problems  [3, 2], systems of polynomial equations over finite fields [23], online matrix-vector multiplication [22], and Stable Matching [26]. APSP and OV are especially of interest, as substantially faster algorithms for them would have many applications (see the survey of Vassilevska-Williams [38]). Moreover, it has become popular to conjecture that it is impossible to improve the best-known runtimes of these problems by polynomial factors.

For all of these algorithms, the running time improvement is completely determined by the sparsity of a certain probabilistic polynomial: if we can find a sparser polynomial, we would directly improve the runtime. This leads to the main motivating question of this paper:

Is it possible to design sparser probabilistic polynomials to speed up these algorithms further, and in particular, can we design polynomials sparse enough to refute some of the popular conjectures?

In this paper, we prove unconditionally that several known probabilistic polynomial constructions are already optimal, in a broad sense. The particular Boolean functions of interest for APSP and OV are as follows.

For a function $c : \mathbb{N} \to \mathbb{N}$, the Boolean function $OV_c : \{0,1\}^{2n \cdot c(n)\log n} \to \{0,1\}$ is defined by:

$$OV_c(x,y) = \bigvee_{(i_1,i_2) \in [n]^2} \ \bigwedge_{j=1}^{c(n)\log n} \left(\overline{x_{i_1,j}} \vee \overline{y_{i_2,j}}\right).$$

The best known algorithm for OV [1] uses a probabilistic $\mathbb{F}_2$-polynomial representation of the $OV_c$ function with $n^{O(\log c)}$ sparsity to quickly check the orthogonality of many pairs of vectors. The OV problem for $n$ input vectors of dimension $c\log n$ could be solved in truly sub-quadratic time if a probabilistic polynomial with sparsity $n^{O(1)}$ could be designed for $OV_c$. In particular, if there were a polynomial for $OV_c$ with sparsity $n^k$ (for some universal $k$) that worked for every $c \geq 1$, that would yield a fast enough algorithm for OV to refute the Strong Exponential Time Hypothesis (SETH) [18, 10]. For completeness, we prove this in Theorem 27 in Appendix B.

Chan and Williams [36, 11] showed that $OV_{n/\log n}$ can be used to solve APSP as well. Their algorithm, the best known randomized algorithm for APSP in dense graphs, crucially uses a probabilistic $\mathbb{F}_2$-polynomial representation of the $OV_{n/\log n}$ function with $n^{O(\log n)}$ sparsity, in order to get an $n^3/2^{\Omega(\sqrt{\log n})}$ time algorithm. If the sparsity could be improved to $n^{O(1)}$, then APSP would be solvable in truly subcubic time – an algorithmic breakthrough.

Our main result shows that in a broad sense, the sparsities of the known probabilistic polynomials for $OV_c$ are already optimal, up to a constant factor in the exponent:

Theorem 1.

For every $c : \mathbb{N} \to \mathbb{N}$, every $\varepsilon > 0$ and every finite commutative ring $R$ such that $|R| \leq 2^{O(n^{1/2-\varepsilon})}$, $R$-probabilistic polynomials with error 1/4 for $OV_c$ which use the 0 of $R$ to represent “false” require sparsity $n^{\Omega(\log c(n))}$.

Theorem 1 is quite powerful, as it holds for all finite commutative rings rather than just finite fields. For instance, many simple functions like OR are known to have smaller polynomial representations when working modulo a composite number m rather than over a finite field [6], but we rule out a sparsity improvement over such rings as well.

Theorem 1 also rules out sparser representations over larger rings (even infinite rings like $\mathbb{Z}$) without very large coefficients. For instance, given a sparse probabilistic polynomial for $OV_c$ over the integers, we could compute all of its coefficients modulo 2 to get a probabilistic polynomial over $\mathbb{F}_2$ with at most the same degree, sparsity, and error. Even if we are working over a commutative ring $R$ where such a trick doesn't work, it takes $n^{1/2-o(1)}$ bits to describe elements of $R$ once $|R|$ exceeds $2^{O(n^{1/2-\varepsilon})}$. Working with such large coefficients is prohibitive when trying to solve OV or APSP, since it would presumably multiply the runtime by such a large polynomial factor, unless polynomial evaluation techniques substantially different from the currently known ones (FFT and fast matrix multiplication) are used.

The (only) caveat of our lower bound in Theorem 1 is that it relies on using the 0 of $R$ for false (which is very natural when working with rings like $\mathbb{F}_2$ and $\mathbb{Z}$). We leave open the problem of proving a “basis independent” lower bound (where false can correspond to any value in $R$).

A General Lower Bound for AND and OR

Our main technical tool is a general (basis independent) lower bound for probabilistic polynomials representing the AND and OR functions, which is also of independent interest. The celebrated probabilistic polynomial constructions of Razborov and Smolensky, which are the key component behind the aforementioned probabilistic polynomials in many recent algorithms (including APSP and OV) yield:

Theorem 2 ([29, 32]).

For all primes $p$ and integers $t \geq 1$, OR and AND on $n$ variables have $\mathbb{F}_p$-probabilistic polynomials of degree $(p-1)t$, sparsity $O(n^{(p-1)t})$, and error $1/p^t$.
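For intuition, here is a Python sketch of the $p=2$ case of Theorem 2 for OR: sample $t$ random subsets $S_1,\ldots,S_t \subseteq [n]$ and use the $\mathbb{F}_2$ polynomial $1 + \prod_{j=1}^t (1 + \sum_{i \in S_j} x_i)$, which has degree $t$ and at most $O(n^t)$ monomials after expansion. (The sketch evaluates the sampled polynomial rather than expanding it.)

```python
# A sketch of the p = 2 case of Theorem 2 for OR (Razborov-Smolensky).
import random

def sample_or_polynomial(n, t):
    """Sample t random subsets; return an evaluator for the F_2 polynomial
    1 + prod_j (1 + sum_{i in S_j} x_i), which has degree t."""
    subsets = [[i for i in range(n) if random.random() < 0.5] for _ in range(t)]
    def p(x):
        acc = 1
        for S in subsets:
            acc = acc * (1 + sum(x[i] for i in S)) % 2
        return (1 + acc) % 2
    return p

# On x = 0 the output is always 0; on any nonzero x, each factor
# 1 + sum_{i in S_j} x_i vanishes with probability 1/2 independently,
# so the one-sided error is Pr[all factors are 1] = 2^{-t}.
```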

Using this construction, there are $\mathbb{F}_p$-probabilistic polynomials for AND on $n$ variables with low values for any two choices of the sparsity, degree, and error measures:

  • Sparsity $O(n^{p-1})$ and degree $p-1$ (but error $\Omega(1/p)$) follows from Theorem 2 with $t=1$.

  • Degree $\log n$ and error $1/p^{(\log n)/(p-1)}$ (but sparsity $n^{\Omega(\log n)}$) follows from Theorem 2 with $t=(\log n)/(p-1)$.

  • Sparsity 1 and error 0 (but degree $\Omega(n)$) is achieved by the exact polynomial $x_1 x_2 \cdots x_n$.

A natural question arises: is there a probabilistic polynomial for AND with low degree, low sparsity, and low error simultaneously?

Question 1.

Is there a probabilistic polynomial for AND on $n$ variables, over any commutative ring, which has $O(\log n)$ degree, $1/\mathrm{poly}(n)$ error, and $\mathrm{poly}(n)$ sparsity?

While Question 1 is fundamental in its own right, a positive answer would also contradict a non-uniform version of SETH, by using it to construct sparse probabilistic polynomials for $OV_c$; see the discussion in Appendix B for more details. In constructing these polynomials, there is a choice of which values in our ring $R$ map to true and false. We could use $\{0,1\}$, or $\{-1,1\}$, or even a basis like $\{1,2\}$ [16]. While the degrees of polynomials do not change under such transformations, it is known that in general, the sparsity of polynomials can be very sensitive to this map [21]. Could this be exploited to refute SETH?

We unconditionally prove a strong no answer to Question 1:

Theorem 3.

For every commutative ring $R$, every distinct $a,b \in R$, and all $d \geq 1$ and $e > 0$, there is a $c \geq 1$ such that for all sufficiently large $m$, the AND function on $m$ variables does not have an $R$-probabilistic polynomial, where $a$ represents false and $b$ represents true, of sparsity $2^{dm/c}$, degree $m/c$, and error $2^{-em/c}$.

Indeed, a polynomial giving a yes answer to Question 1, upon setting all but $m = \omega(\log n)$ variables to 1 (or whichever element of the ring $R$ corresponds to true), would contradict Theorem 3, since sparsity, degree, and error cannot increase upon setting variables. We emphasize that Theorem 3 holds over any commutative ring $R$ and any choice of basis $\{a,b\}$.

Further Results

Our technical lemmas can be applied to yield several sparsity lower bounds for various algebraic representations of natural and well-studied Boolean functions. First, we give an exponential sparsity-error tradeoff lower bound for probabilistic polynomials for the disjointness function DISJ and the “complements” function COMP. Informally, DISJ determines whether two given bit-vectors have zero inner product, and COMP determines whether two given bit-vectors are binary complements of each other; it is very closely related to the equality function (see Section 2 for formal definitions).

Theorem 4.

For every commutative ring $R$, every function $f : \mathbb{N} \to \mathbb{N}$, and all constants $e > 0$ and $d \geq 1$, there is a $c \geq 1$ such that the disjointness of two $n$-bit vectors does not have an $R$-probabilistic polynomial of sparsity $f(c) \cdot 2^{dn/c}$ and error $f(c)/2^{en/c}$ which uses the 0 of $R$ to represent “false”.

Theorem 5.

Theorem 4 also holds with disjointness replaced by COMP.

That is, we cannot obtain a probabilistic polynomial for disjointness with subexponential sparsity and subexponential error simultaneously, regardless of the degree. (Observe that the exact representation of disjointness of two $n$-bit vectors has $2^n$ sparsity and 0 error.)

In the appendices, we also apply the above results in a number of other ways.

In Appendix A, we give a simple equivalence between probabilistic polynomials and certain depth-3 circuits, which makes some prior work relevant. In Appendix B, we discuss how sparse probabilistic polynomials for simple 𝖠𝖢0 functions would refute SETH:

Theorem 6 (Informal; see Theorems 24 and 27).

For any commutative ring R, either of the following would refute SETH:

  • A low-degree, low-sparsity, and low-error R-probabilistic polynomial for AND (over 0/1), or

  • An $R$-probabilistic polynomial for $OR_n \circ AND_{c\log n} \circ OR_2$ with sparsity $n^k$ and constant error, for any fixed integer $k$.

In Appendix C, we investigate probabilistic polynomials over the $\{-1,1\}$ basis instead of $\{0,1\}$. While the degree of a polynomial does not depend on the basis, the sparsity can be very sensitive to such a change, and we give examples where probabilistic polynomial sparsity lower bounds in the $\{0,1\}$ setting do not hold in the $\{-1,1\}$ setting. Building on our lower bound Theorem 5 over the $\{0,1\}$ basis, we prove:

Theorem 7.

Over any commutative ring $R$ which doesn't have characteristic 2, the complements function COMP has a probabilistic polynomial with sparsity $\mathrm{poly}(n)$ and error $1/\mathrm{poly}(n)$ over the basis $\{-1,1\}$, and does not over the basis $\{0,1\}$.

In Appendix D, we consider a notion related to the probabilistic sparsity of a function, introduced by [4], called the probabilistic rank. The probabilistic rank of a function f measures the extent to which the truth table matrix, or communication matrix, of f can be probabilistically represented by low rank matrices. Probabilistic sparsity upper bounds give corresponding probabilistic rank upper bounds, a fact which is used by the aforementioned “polynomial method” algorithms to reduce polynomial evaluation to fast matrix multiplication. For some functions, the probabilistic rank could potentially be much smaller than the probabilistic sparsity. This is interesting, both for the prospect of being able to design faster algorithms using rank instead of sparsity, and given the connection between probabilistic rank and the matrix rigidity problem [4]. In Appendix D, we prove a separation between the two notions in a number of settings over 𝔽2, including showing that the two notions can be arbitrarily far apart for a natural function:

Theorem 8.

For any $k > \Omega(n/\log(1/\varepsilon))$, the $\varepsilon$-probabilistic sparsity of $MAJ \circ XOR$ over $\mathbb{F}_2$, where the MAJ has fan-in $n$ and each XOR has fan-in $k$, is strictly greater than its $\varepsilon$-probabilistic rank. In particular, as $k$ increases, the $\varepsilon$-probabilistic sparsity grows unboundedly while the $\varepsilon$-probabilistic rank remains fixed.

We note that there are more straightforward constructions of functions f which separate probabilistic sparsity and probabilistic rank, such as any function of large probabilistic sparsity which depends only on the first half of its input bits and thus has rank 1. However, the functions which are most interesting in algorithmic applications typically depend on both the first and second halves of the input variables similarly.

In Appendix E, we observe how bounds on one of the sparsity or degree of probabilistic polynomials for a function can give bounds on the other as well. We notably show that, if a function requires probabilistic degree $d$, then it must require probabilistic sparsity $\Omega(2^d)$. Hence, probabilistic degree lower bounds, which are more common in the literature, imply probabilistic sparsity lower bounds as well. We also relate probabilistic degree to probabilistic rank in a better-than-trivial way: any $n$-variate Boolean function with a probabilistic polynomial of degree $d$ and error $\varepsilon$ has $\varepsilon$-probabilistic rank at most $O\big(\binom{n/2}{\leq d/2}\big)$, compared to the trivial $O\big(\binom{n/2}{\leq d}\big)$.

Intuition for our Results

Our main proof technique, which we use to prove Theorem 3, and then again in some of our additional results, is a novel combination of two well-known ideas from the literature on lower bounds for Boolean functions and their polynomial representations.

The first idea is carefully chosen random restrictions. Random restrictions are convenient for studying probabilistic polynomial sparsity, since we can study each monomial of a probabilistic polynomial separately, and analyze the distribution of its resulting degree after applying a restriction. We give random restrictions that do not restrict too many variables, but that substantially lower the degree of any sparse polynomial.

The second idea is a variant of the Schwartz-Zippel Lemma (Lemma 13 below), which roughly says that nonzero low degree multivariate polynomials must be nonzero on many points of the Boolean hypercube. This is especially interesting, for instance, when applied to the OR function, which has only one Boolean root; it implies that low degree probabilistic polynomials for OR must have high error.

Our proof strategy combines these two ideas in a new way. Given a sparse probabilistic polynomial for AND with low degree and error, we design a random restriction so that the restriction of at least one polynomial in the support of the probabilistic polynomial violates the Schwartz-Zippel Lemma, giving a contradiction. Previous work using random restrictions to get sparsity lower bounds for other types of polynomials has typically combined the random restrictions with simple degree lower bounds. Here, we need some novel technical arguments in order to apply the Schwartz-Zippel Lemma with the trade-off between the error and degree of the resulting restriction.

Of course, our description here (necessarily) glosses over many important details. In addition to random restrictions and the Schwartz-Zippel Lemma, our techniques use multiparty communication complexity, algebraic results about commutative rings (especially in our proof that Theorem 3 holds over any basis), and prior work in the area of polynomial representations of Boolean functions. Our proof extending Theorem 3 to any basis, which we give in Appendix G, is very general, and could be of independent interest for proving basis-independent versions of other results. It shows that for any function $f$, a simultaneous lower bound on the degree and sparsity of a probabilistic polynomial for $f$ over any one basis implies a sparsity lower bound over any other basis as well.

Prior Work

Work on particular depth-three circuit size lower bounds (namely, $\mathsf{MAJ} \circ \mathsf{SYM} \circ \mathsf{AND}$ circuits) can be seen as lower bounds on the sparsity of probabilistic polynomials (see Appendix A for an overview of the simple connection). For example:

  • Razborov and Wigderson [30] showed that a simple depth-three $\mathsf{AC}^0[2]$ function $f$ requires $\mathsf{MAJ} \circ \mathsf{SYM} \circ \mathsf{AND}$ circuits of size $n^{\Omega(\log n)}$. This implies $f \notin \mathbb{F}_p\text{-}\mathrm{SDE}_{0,1}[n^{o(\log n)}, n, 1/2-\varepsilon]$ for any fixed prime $p$, a sparsity/degree/error lower bound for probabilistic polynomials computing $f$.

  • Chattopadhyay [12] gave an $\mathsf{AC}^0$ function requiring exponential-size $\mathsf{MAJ} \circ \mathsf{SYM} \circ \mathsf{ANY}$ circuits when the bottom fan-in is $o(\log\log n)$; this corresponds to an $\Omega(\log\log n)$ degree lower bound for probabilistic polynomials computing an $\mathsf{AC}^0$ function.

  • Beame and Huynh [8] give a depth-9 $\mathsf{AC}^0$ circuit that needs $\mathsf{MAJ} \circ \mathsf{SYM} \circ \mathsf{AND}$ circuits of size $n^{\Omega(\log n)}$, corresponding to a probabilistic $n^{\Omega(\log n)}$-sparsity lower bound for this $\mathsf{AC}^0$ function.

  • Sherstov [31] gives a depth-3 $\mathsf{AC}^0$ circuit with a tighter $\mathsf{MAJ} \circ \mathsf{SYM} \circ \mathsf{AND}$ circuit size lower bound. Essentially, if the bottom fan-in is $(1/2-\varepsilon)\log n$ for some $\varepsilon > 0$, then the circuit size must be exponential.

Our Lemma 20, which we prove on the way to proving Theorem 1, can be viewed as extending this line of work by showing a probabilistic $n^{\Omega(\log n)}$-sparsity lower bound for a depth-2 $\mathsf{AC}^0$ circuit.

Polynomial sparsity has been studied in a stronger setting, where the polynomials in question only need to correlate with a target function $f$ rather than probabilistically represent $f$, by Lovett and Srinivasan [24]. Their main result is that polynomials over $\mathbb{F}_2$ with $n^{o(\log n)}$ monomials have exponentially low correlation with the $\mathsf{MOD}_3$ function. Note that a sparse probabilistic polynomial for a Boolean function $f$ implies the existence of a sparse polynomial with high correlation with $f$, but not vice versa. Hence, Lovett and Srinivasan's lower bound applies to probabilistic polynomials as well. Their technique has similarities to some of ours – they also use a type of variable restriction to reduce low sparsity polynomials to low degree polynomials – but their “tree restriction” technique and analysis isn't powerful enough to apply in our setting where, as we will see, we are constrained in how we are allowed to restrict our variables based on the function we are computing.

Sparsity has also been studied in the setting of polynomial threshold functions, where the sign of a polynomial dictates its output (as a Boolean function). Krause and Pudlák [21, 20] give a function $f$ that has exponentially-high sparsity as a polynomial threshold function in the $\{-1,1\}$ basis, but polynomially-low sparsity over $\{0,1\}$. Other references on sparsity in the polynomial threshold function setting include Basu et al. [7], O'Donnell and Servedio [28], and Hansen and Podolskii [16]. It should be noted that these polynomial threshold function sparsity lower bounds do not suffice to prove probabilistic polynomial sparsity lower bounds: they correspond to lower bounds against depth-2 $\mathsf{MAJ} \circ \mathsf{AND}$ circuits, compared to our lower bounds against probabilistic polynomials, which correspond to depth-3 $\mathsf{MAJ} \circ \mathsf{SYM} \circ \mathsf{AND}$ circuits.

The degrees of probabilistic polynomials are more well-studied, starting with the work of Razborov and Smolensky [29, 32], although even some basic questions remain open. For instance, while AND is known to have constant-degree probabilistic polynomials for constant error over any fixed finite field, it is unknown what degree is needed over $\mathbb{R}$: there is a gap between the best upper bound of $O(\log n)$ [9, 34, 5] and the best lower bound of $\tilde{\Omega}(\log n)$ [25, 17]. Our connection between sparsity and degree which we prove in Appendix E makes use of the probabilistic degree of AND, and is thus only (close to) tight over a finite field.

2 Preliminaries

Definitions and Notation

As usual, all logarithms are assumed to be base-two. For gates $\mathcal{G}_1$ and $\mathcal{G}_2$, we write $\mathcal{G}_1 \circ \mathcal{G}_2$ to denote the circuit whose output gate is a $\mathcal{G}_1$ whose inputs are all copies of $\mathcal{G}_2$ on disjoint variables. We write $AND_k$, $OR_k$, and $XOR_k$ to denote the AND, OR, and XOR functions on $k$ inputs, respectively, and MAJ is the majority function, which computes whether at least half of its inputs are 1.

Let $\mathcal{F} = \{f_n : \{0,1\}^n \to \{0,1\}\}$ be a decision problem, construed as an infinite family of Boolean functions (one for each $n$). To facilitate the presentation, we define a (non-uniform) complexity class for problems which are computable by sparse, low-degree, and low-error probabilistic polynomials over $R$:

Definition 9.

A decision problem $\mathcal{F}$ is in the class $R\text{-}\mathrm{SDE}[s(n), d(n), e(n)]$ if for all but finitely many $n$, there is a probabilistic $R$-polynomial $\mathcal{P}_n$ for $f_n$ with sparsity at most $s(n)$, degree at most $d(n)$, and error at most $e(n)$. We analogously define $R\text{-}\mathrm{SE}[s(n), e(n)]$ and $R\text{-}\mathrm{DE}[d(n), e(n)]$ when there is no bound on the degree or the sparsity, respectively, and $R\text{-}\mathrm{SDE}_{a,b}[s(n), d(n), e(n)]$ for probabilistic polynomials over the basis where “false” means $a \in R$ and “true” means $b \in R$ (when omitted, the default is that $a=0$ and $b=1$).

With this notation, we can succinctly state results about probabilistic polynomials for Boolean functions.

Many of our lower bounds concern two different Boolean functions, the disjointness function, and the complements function.

Definition 10.

For vectors $x, y \in \{0,1\}^n$, the disjointness function, $\mathrm{DISJ} : \{0,1\}^{2n} \to \{0,1\}$, is given by $\mathrm{DISJ}(x_1,\ldots,x_n,y_1,\ldots,y_n) = \bigwedge_{i=1}^n (x_i \vee y_i)$. (Note that this definition negates the variables compared to how the disjointness function often appears in the literature.)

Definition 11.

For a vector $y \in \{0,1\}^n$, we write $\bar{y}$ to denote the complement of $y$, given by $\bar{y}_i = 1 - y_i$ for each $1 \leq i \leq n$. The equality function, $\mathrm{EQ} : \{0,1\}^{2n} \to \{0,1\}$, is given by, for $x, y \in \{0,1\}^n$, $\mathrm{EQ}(x,y) = 1$ if $x = y$ and $\mathrm{EQ}(x,y) = 0$ otherwise. In other words, $\mathrm{EQ}(x_1,\ldots,x_n,y_1,\ldots,y_n) := \bigwedge_{i=1}^n (x_i \oplus y_i \oplus 1)$.

The complements function, $\mathrm{COMP} : \{0,1\}^{2n} \to \{0,1\}$, is given by, for $x, y \in \{0,1\}^n$, $\mathrm{COMP}(x,y) = \mathrm{EQ}(x,\bar{y})$. In other words, $\mathrm{COMP}(x_1,\ldots,x_n,y_1,\ldots,y_n) := \bigwedge_{i=1}^n (x_i \oplus y_i)$.

Chernoff Bound

In a few of our proofs, we will use a standard Chernoff bound for sums of independent Bernoulli random variables.

Lemma 12 (Chernoff bound).

If $X_1,\ldots,X_n$ are independent Bernoulli random variables with sum $X = X_1 + \cdots + X_n$, and $\mu$ is the expected value of $X$, then we have:

$\Pr[X \leq (1-\delta)\mu] \leq \exp\{-\delta^2\mu/2\}$ for all $0 < \delta < 1$, and
$\Pr[X \geq (1+\delta)\mu] \leq \exp\{-\delta^2\mu/3\}$ for all $0 < \delta \leq 1$.
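As a quick sanity check (not part of the paper), one can compare the lower-tail bound against simulation:

```python
# Empirical sanity check of the lower-tail bound in Lemma 12 (a sketch).
import math, random

n, q, delta, trials = 1000, 0.3, 0.25, 20000
mu = n * q
hits = sum(
    sum(random.random() < q for _ in range(n)) <= (1 - delta) * mu
    for _ in range(trials)
)
print(hits / trials, "<=", math.exp(-delta**2 * mu / 2))
```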

Useful Facts About Polynomials

We state two key facts from past work which we will use in the proofs of our main results. First, we will need the following variant of the Schwartz-Zippel-DeMillo-Lipton Lemma (which appears, for instance, in [27, Lemma 2.6]).

Lemma 13 (The 0-1 Schwartz-Zippel Lemma).

Let $p$ be a nonzero multilinear $n$-variate polynomial over any commutative ring $R$, of total degree at most $d$. Then $\Pr_{x \in \{0,1\}^n}[p(x) \neq 0] \geq 1/2^d$.
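Lemma 13 can also be checked by brute force for tiny parameters; the following Python snippet (illustrative only) enumerates all nonzero multilinear polynomials of degree at most $d$ over $\mathbb{F}_2$ and confirms that each is nonzero on at least a $1/2^d$ fraction of $\{0,1\}^n$:

```python
# Brute-force verification of Lemma 13 over F_2 for tiny n and d (a sketch):
# every nonzero multilinear polynomial of degree <= d on n variables is
# nonzero on at least 2^(n-d) of the 2^n Boolean points.
from itertools import combinations, product

n, d = 4, 2
monos = [frozenset(c) for r in range(d + 1) for c in combinations(range(n), r)]
points = list(product([0, 1], repeat=n))

def num_nonzeros(poly):
    return sum(sum(all(x[i] for i in m) for m in poly) % 2 == 1 for x in points)

assert all(
    num_nonzeros([m for j, m in enumerate(monos) if (mask >> j) & 1]) >= 2 ** (n - d)
    for mask in range(1, 1 << len(monos))
)
print("Lemma 13 verified for n = 4, d <= 2")
```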

A proof can be found in Appendix G for completeness. Note that Lemma 13 is tight for $p(x_1,\ldots,x_d) = \prod_{i=1}^d x_i$. Second, we need a simple lower bound on the probabilistic degree of $AND \circ OR$ circuits which can be derived from a communication complexity lower bound. We refer the reader to the introduction of Sherstov's paper [31] for the relevant definitions (of the number-on-forehead communication model, and the $k$-party set disjointness problem), as we will only need the statement of Corollary 15 for this paper.

Theorem 14 ([31] Theorem 1.1).

The number-on-forehead communication complexity of $k$-party set disjointness on $n$ elements with error 1/3 is at least $\Omega(\sqrt{n}/(2^k k))$.

Corollary 15.

For every $\varepsilon > 0$ and every $a > 0$, there is a $c > 0$ such that: for every sufficiently large positive integer $n$, every positive integer $k$ with $k < c\log n$, and every finite commutative ring $R$ with $|R| \leq 2^{an^{1/2-\varepsilon}}$, the function $AND \circ OR$, where the AND has fan-in $n$ and each OR has fan-in $k$, does not have a probabilistic polynomial of error 1/3 and degree less than $k$.

Proof (sketch).

From such a polynomial, we can design a 1/3-error number-on-forehead communication protocol for $k$-party set disjointness: draw a polynomial from the probabilistic polynomial, then map each monomial to a player whose variables do not appear in that monomial. Each player can evaluate their monomials and report the sum, from which the players can compute set disjointness with the desired error. Each player only needs to communicate one element of $R$, using $O(\log|R|) = O(an^{1/2-\varepsilon})$ bits of communication, so the total communication is $O(akn^{1/2-\varepsilon})$. If $k < c\log n$ for sufficiently small $c > 0$, then this contradicts Theorem 14.

3 Sparsity Lower Bounds for Probabilistic Polynomials

We begin by proving a probabilistic polynomial lower bound for AND:

Theorem 16.

For every commutative ring $R$, and all $d \geq 1$ and $e > 0$, there is a $c \geq 1$ such that for all sufficiently large $m$, the AND function on $m$ variables is not in $R\text{-}\mathrm{SDE}[2^{dm/c}, m/c, 2^{-em/c}]$.

For notational simplicity, we use the substitution $m = c\log n$ in our proof:

Restatement of Theorem 16. For every commutative ring $R$, and all $d \geq 1$ and $e > 0$, there is a $c \geq 1$ such that for all sufficiently large $n$, the AND function on $c\log n$ variables is not in $R\text{-}\mathrm{SDE}[n^d, \log n, 1/n^e]$.

Theorem 16 is a special case of Theorem 3, restricted to when the basis is {0,1}. Theorems 4 and 5 will follow from this. We will later prove the more general Theorem 3, which holds for any basis over R, in Appendix G. In other words, we are currently focusing on the {0,1} basis, but we later generalize this statement to hold over any basis. Before we give the proof, let us show that Theorem 16 implies Theorems 4 and 5.

Proof of Theorem 4.

Let $P$ be an $R\text{-}\mathrm{SE}[f(c)2^{dn/c}, f(c)/2^{en/c}]$ representation of $AND_n \circ OR_2$. From each of the $OR_2$ gates, pick at random one variable feeding into the gate and fix it to 0. Thus, for each $c'$, every monomial in $P$ of degree at least $c'n/c$ survives with probability at most $1/2^{c'n/c}$. The probability that $P$ under this restriction has degree greater than $c'n/c$ is therefore at most $f(c)2^{dn/c} \cdot \frac{1}{2^{c'n/c}}$. We pick $c'$ with $c > c' > d$.

Note that under this restriction, the disjointness function reduces to $AND_n$. Thus, we have constructed a probabilistic polynomial for $AND_n$ of degree $c'n/c$ with error $f(c)2^{dn/c}2^{-c'n/c} + f(c)2^{-en/c}$. If $c$ is large enough, this contradicts Theorem 16. Hence our assumption about the existence of $P$ must be false.

Proof of Theorem 5.

The proof is identical, since restricting one input of each XOR2 gate to 0 reduces COMP to AND as well.

We now begin proving Theorem 16. We start with a simple case to illustrate our use of the 0-1 Schwartz-Zippel Lemma (Lemma 13):

Lemma 17.

For every commutative ring $R$, and all $e > 1$, there is a $c > 1$ such that for all sufficiently large $n$, the AND function on $c\log n$ variables is not in $R\text{-}\mathrm{DE}[\log n, 1/n^e]$.

Proof.

Let $\mathcal{P}$ be a probabilistic $R$-polynomial with error at most $1/n^e$ computing the AND of $c\log n$ bits, such that every polynomial in the distribution has degree at most $\log n$.

First we claim that, without loss of generality, the support of $\mathcal{P}$ may be assumed to not contain the identically zero polynomial $z$. Since $\mathcal{P}$ is a polynomial for AND with error at most $1/n^e$, on the point $y = (1,\ldots,1)$ we must have $\Pr_{p \sim \mathcal{P}}[p(y) = 1] \geq 1 - 1/n^e$. Therefore, if $z$ is in the support of $\mathcal{P}$, it must have probability at most $1/n^e$ of being chosen. Hence if we simply replace $z$ in $\mathcal{P}$ with the polynomial which is identically the constant 1, the new probability of error is at most $2/n^e$.

Next, we prove there is no distribution $\mathcal{P}$ satisfying the above properties, if $e > 1$. Otherwise, by fixing randomness in $\mathcal{P}$ (i.e., by picking any polynomial in the support of $\mathcal{P}$ which achieves at most the average error), we can identify a fixed polynomial $p$ of degree at most $\log n$ which disagrees with the AND function on at most a $1/n^e < 1/n$ fraction of inputs. By the previous paragraph, $p$ is not identically zero. Since the AND of $c\log n$ variables is nonzero on only one point in $\{0,1\}^{c\log n}$, it follows that the polynomial $p$ is nonzero on at most $n^c/n^e + 1$ points in $\{0,1\}^{c\log n}$. Hence the nonzero degree-$\log n$ polynomial $p$ satisfies

$$\Pr_{x \in \{0,1\}^{c\log n}}[p(x) \neq 0] \leq 1/n^e + 1/n^c,$$

contradicting Lemma 13 when $e, c > 1$.

Before we turn to the harder part of the proof, where $e \leq 1$, we sketch the main ideas. Our first step is to hit a presumed probabilistic polynomial for AND with a random restriction of 1-inputs. The probabilities are chosen carefully so that the degree of the polynomial decreases considerably, yet the number of variables restricted is not too small, with high probability. That is, we shrink the degree of the polynomial, but we do not force it to a constant. Next, we select a non-zero polynomial from the remaining probabilistic polynomial distribution, and argue that with high probability, its degree has dropped significantly below $e\log n$, but its error on the AND function remains about $O(1/n^e)$. This contradicts Lemma 13, which says that a non-zero polynomial of degree $d$ must be non-zero on at least a $1/2^d$ fraction of the Boolean points. While the general idea of the proof is not too complicated, we need to be careful with the details, and choose parameters very particularly so that all of the required constraints hold. Let us now give the details:

Proof of Theorem 16.

Assume there is a probabilistic polynomial $\mathcal{P}$ over $R$ for AND with $d, e > 0$ such that for all $c \geq 1$, the sparsity is at most $n^d$, the degree is at most $\log n$, and the error is at most $1/n^e$. As in the proof of Lemma 17, we may assume that the zero polynomial is not in the support of $\mathcal{P}$. Let $\alpha, \beta, \gamma > 0$ be real-valued parameters to be set later. Define

$$g := (1+\alpha)(\log n)/\gamma, \quad\text{and}\quad \ell := (1-\beta)(c\log n)/\gamma.$$

Consider the following construction $\mathcal{Q}$ of a probabilistic polynomial for AND on $\ell$ bits:

  • Given $(x_1,\ldots,x_\ell) \in \{0,1\}^\ell$, sample $p(y_1,\ldots,y_{c\log n}) \sim \mathcal{P}$.

  • For $i = 1,\ldots,c\log n$, independently and with probability $1 - 1/\gamma$, assign $y_i$ to 1 in the polynomial $p$. Let $p'$ be the remaining polynomial in $vars(p') \leq c\log n$ variables.

  • If $vars(p') \geq \ell$ and $\deg(p') \leq g$, then output $p'(x_1,\ldots,x_\ell,1,\ldots,1)$ (where the $1,\ldots,1$ indicates $vars(p') - \ell$ ones).

  • Otherwise, output a random bit. (A code sketch of this sampling procedure appears after this list.)
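Here is a minimal Python sketch of one sample from $\mathcal{Q}$. The sampler `sample_P` and the representation of polynomials as dictionaries from monomials to coefficients are our own illustrative assumptions, not part of the paper:

```python
# A sketch of sampling from Q; polynomials are dicts {frozenset of variable
# indices: coefficient}, and sample_P() is an assumed sampler for P.
import random
from collections import Counter

def restrict_to_one(poly, alive):
    """Set every variable outside `alive` to 1, i.e. drop it from each
    monomial; coefficients of coinciding monomials are summed."""
    out = Counter()
    for mono, coef in poly.items():
        out[frozenset(mono & alive)] += coef
    return {m: c for m, c in out.items() if c != 0}

def sample_Q(sample_P, num_vars, gamma, g, ell):
    p = sample_P()
    # Keep each variable alive with probability 1/gamma; set the rest to 1.
    alive = frozenset(i for i in range(num_vars) if random.random() < 1 / gamma)
    p_prime = restrict_to_one(p, alive)
    deg = max((len(m) for m in p_prime), default=0)
    if len(alive) >= ell and deg <= g:
        return ("poly", p_prime, alive)          # cases 1-3: use p'
    return ("coin", random.randint(0, 1), None)  # case 4: a random bit
```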

Our central claim is that, for an appropriate setting of $\alpha, \beta, \gamma$, the probabilistic polynomial $\mathcal{Q}$ has degree at most $g$ and computes the AND of $\ell$ bits with error at most $O(1/n^e)$. The degree of $\mathcal{Q}$ and the AND functionality follow by construction; we need to bound the error. This will follow from positing a series of inequalities in the analysis that imply a small error bound, then proving at the end that the inequalities can be satisfied.

Define $err := \Pr[(\deg(p') > g) \vee (vars(p') < \ell)]$, where the probability is over the sampling in $\mathcal{P}$ (of step 1) and the random restriction to $p'$ (of step 2). Observe that $err$ is precisely the probability that case 4 is reached in the above procedure. Further observe that the error of $\mathcal{Q}$ is at most $1/n^e + err$: the error inherited from $\mathcal{P}$ is at most $1/n^e$, and case 4 is reached with probability $err$.

We now claim that $err \leq 2/n^e$, when $\alpha, \beta, \gamma > 0$ are set appropriately. In particular, provided that

$\alpha^2(1+\alpha)/(3\gamma^2) \geq d+e$, and  (1)
$\beta^2 c/(2\ln(2)\gamma) \geq e$,  (2)

we will have that both $\Pr[\deg(p') > g]$ and $\Pr[vars(p') < \ell]$ are at most $1/n^e$.

Observe that the number of variables $vars(p')$ in $p'$ can be seen as a sum of independent 0-1 random variables $Y_1,\ldots,Y_{c\log n}$, such that $\Pr[Y_i = 1] = 1/\gamma$. Hence $vars(p')$ has expectation $(c\log n)/\gamma$ and by a Chernoff bound (Lemma 12),

$$\Pr[vars(p') < \ell] = \Pr\Big[\textstyle\sum_i Y_i < (1-\beta)(c\log n)/\gamma\Big] \leq \exp\{-\beta^2(c\log n)/(2\gamma)\}.$$

But if (2) holds, then $\Pr[vars(p') < \ell] \leq \exp\{-\beta^2(c\log n)/(2\gamma)\} \leq \exp\{-e\ln n\} = 1/n^e$.

We will analyze $\Pr[\deg(p') > g]$ by considering the monomials of $p$. Let $m$ be a monomial in $p \sim \mathcal{P}$ of degree $\deg(m) \leq \log n$. If $\deg(m) \leq g$, then the monomial $m$ cannot possibly contribute to the event that $\deg(p') > g$ (the degree cannot increase by setting variables). So we may assume $\deg(m) > g$. There is a corresponding randomly reduced monomial $m'$ in $p'$, obtained by setting each of the variables in monomial $m$ to 1 with probability $1 - 1/\gamma$. (Note this is the worst case; in some commutative rings $R$, the monomial $m'$ could also cancel with another monomial and disappear from $p'$.) The degree $\deg(m')$ can be seen as a sum of independent 0-1 random variables $Z_1,\ldots,Z_{\deg(m)}$ where $\Pr[Z_i = 1] = 1/\gamma$. By a Chernoff bound (Lemma 12),

$$\Pr[\deg(m') > (1+\alpha)\deg(m)/\gamma] = \Pr\Big[\textstyle\sum_i Z_i > (1+\alpha)\deg(m)/\gamma\Big] \leq \exp\{-\alpha^2\deg(m)/(3\gamma)\} < \exp\{-\alpha^2 g/(3\gamma)\} = \exp\{-\alpha^2(1+\alpha)(\log n)/(3\gamma^2)\}.$$

Now, if (1) holds, then $(\alpha^2(1+\alpha))/(3\ln(2)\gamma^2) \geq d+e$ and therefore

$$\Pr[\deg(m') > (1+\alpha)\deg(m)/\gamma] < \exp\{-\alpha^2(1+\alpha)(\log n)/(3\gamma^2)\} \leq \exp\{-(d+e)\ln n\} = 1/n^{d+e}.$$

This bound and $\deg(m) \leq \log n$ hold for every monomial $m$ in $p$, and there are at most $n^d$ monomials, so

$$\Pr[\deg(p') > (1+\alpha)(\log n)/\gamma] \leq \Pr[(\exists\text{ monomial } m')\,\deg(m') > (1+\alpha)(\log n)/\gamma] \leq \Pr[(\exists\text{ monomial } m)\,\deg(m') > (1+\alpha)\deg(m)/\gamma] \leq n^d/n^{d+e} = 1/n^e,$$

by the union bound.

We have proved that, assuming $\alpha, \beta, \gamma$ are set properly, $\mathcal{Q}$ has degree at most $g$ and computes the AND of $\ell$ bits with error at most $O(1/n^e)$. Finally, suppose that

$(1+\alpha)/\gamma < e$.  (3)

Then, $\mathcal{Q}$ is a probabilistic polynomial for the AND on $\ell$ variables, with degree at most $g = (1+\alpha)(\log n)/\gamma < e\log n$ and error $O(1/n^e)$.

Fixing the randomness in $\mathcal{Q}$, we are in an analogous situation to the earlier case of $e > 1$: we can find a single non-zero degree-$g$ polynomial $q$ that differs from the AND of $\ell$ variables on an $O(1/n^e)$-fraction of points. Since the AND is nonzero on a $1/2^\ell$ fraction of points, the polynomial $q$ is non-zero on at most an $O(1/n^e) + 1/2^\ell$ fraction of points. Provided that

$(1-\beta)c/\gamma > e$,  (4)

we will have $\ell = (1-\beta)(c\log n)/\gamma > e\log n$ and $1/n^e > 1/2^\ell$, hence $q$ is non-zero on at most an $O(1/n^e)$ fraction of points. But then the degree of $q$ is $\deg(q) \leq g \leq (e-\varepsilon)\log n$, for some $\varepsilon > 0$. This contradicts Lemma 13.

It remains to show that (1), (2), (3), and (4) can be simultaneously satisfied to yield the above contradiction, which we do in Lemma 46 in Appendix F.

The full proof of Theorem 3, which generalizes the above Theorem 16 to any basis $\{a,b\}$ over $R$, can be found in Appendix G. It shows that picking representatives in $R$ for true and false other than 1 and 0 cannot help to overcome the lower bound of Theorem 16. The generalization is unfortunately not so straightforward to prove, since our proof of Theorem 16 critically uses the Schwartz-Zippel lemma, Lemma 13, which does not hold for general $a, b$ over an arbitrary commutative ring $R$. For a simple example, if $(a,b) = (0,3)$ over $R = \mathbb{Z}/6\mathbb{Z}$, then the nonzero linear polynomial $p(x) = 2x$ is nonetheless zero on both $a$ and $b$. We solve this issue by taking any probabilistic polynomial $\mathcal{P}$ over $R$ with a basis $\{a,b\}$ where the Schwartz-Zippel lemma does not hold, and performing a combinatorial transformation to yield a new probabilistic polynomial $\mathcal{Q}$ over $R$ with a different basis where the Schwartz-Zippel lemma does hold, such that $\mathcal{Q}$ has the same degree and error as $\mathcal{P}$, and only mildly greater sparsity.

3.1 Probabilistic Sparsity Lower Bounds for Compositions

We next show how probabilistic degree lower bounds for Boolean functions f can lead to probabilistic sparsity lower bounds for compositions of f with simple functions like OR and XOR.

Theorem 18.

Let $R$ be any commutative ring, and let $f : \{0,1\}^n \to \{0,1\}$ be any Boolean function such that any probabilistic polynomial of error $\varepsilon$ for $f$ over $R$ requires degree $d$. Then, for every $0 < a < \varepsilon$, every $(\varepsilon - a)$-error probabilistic polynomial over $R$ for the $kn$-variate function $f \circ OR_k$, the composition of $f$ with ORs of $k$ disjoint variables, requires sparsity at least $a \cdot k^d$ over the $\{0,1\}$ basis.

Proof.

Label the variables of our function of interest, $f \circ OR_k$, by $x_{i,j}$ for $i \in \{1,\ldots,n\}$ and $j \in \{1,\ldots,k\}$, so that our function can be written $f(\bigvee_{j=1}^k x_{1,j}, \ldots, \bigvee_{j=1}^k x_{n,j})$. Consider the following random restriction $\rho$: for each $i = 1,\ldots,n$, choose one $j^* \in \{1,\ldots,k\}$ uniformly and independently at random, and set $x_{i,j}$ to 0 for each $j \neq j^*$. Notice in particular that after applying $\rho$ to the inputs of our function $f \circ OR_k$, the result is the function $f$ on the remaining $n$ variables which were not set to 0.

Assume to the contrary that a probabilistic polynomial $P$ exists for $f \circ OR_k$ with sparsity less than $a \cdot k^d$. We construct a probabilistic polynomial for $f$ as follows:

  1. Draw a $p \sim P$, and a random restriction $\rho$ as described above.

  2. Let $p_\rho$ be the polynomial obtained by substituting the 0s chosen by $\rho$ into the variables of $p$.

  3. If $\deg(p_\rho) < d$ then output $p_\rho$, otherwise output a random bit.

Since the restriction $\rho$ transforms $f \circ OR_k$ into $f$, we know that $p_\rho$ as constructed in step 2 is a probabilistic polynomial for $f$ with error $(\varepsilon - a)$. If we can show that we actually return $p_\rho$ with probability at least $1-a$ in step 3, then we know our probabilistic polynomial has error at most $(\varepsilon - a) + a = \varepsilon$, and it by definition has degree less than $d$, which will contradict our assumption about $f$ as desired.

Observe that the monomials of $p_\rho$ are a subset of the monomials of $p$. Moreover, a monomial of $p$ appears in $p_\rho$ if and only if none of its variables were set to 0. For a monomial $m$ in $p$ of degree at least $d$, the probability that $m$ appears in $p_\rho$ is at most $1/k^d$: if both $x_{i,j_1}$ and $x_{i,j_2}$ appear in $m$ for some $i$ and $j_1 \neq j_2$, then $m$ is definitely set to zero by the restriction $\rho$, and otherwise, each variable in $m$ is set to 0 with probability $1 - 1/k$, independently of the rest.

Therefore, if $p$ has sparsity $|p|$, then the expected number of monomials of degree at least $d$ in $p_\rho$ is at most $|p|/k^d \leq a$. Hence, by Markov's inequality, the probability that $p_\rho$ has degree at least $d$, meaning it has at least one monomial of degree at least $d$, is at most $a$, as desired.
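To make the restriction concrete, here is a small Python sketch of $\rho$ and the survival test for a monomial (names are illustrative, not from the paper):

```python
# A sketch of the random restriction rho from the proof of Theorem 18.
import random

def sample_rho(n, k):
    """For each OR block i, pick the one column j* that is NOT set to 0."""
    return {i: random.randrange(k) for i in range(n)}

def survives(monomial, rho):
    """monomial: a set of (block, column) pairs. It survives the restriction
    iff none of its variables is set to 0, i.e. every variable it touches
    is the kept column of its block."""
    return all(rho[i] == j for (i, j) in monomial)

# A monomial touching two columns of the same block never survives; a monomial
# touching d distinct blocks survives with probability exactly k^{-d}.
```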

Corollary 19.

Theorem 18 also holds with $f \circ OR_k$ replaced by $f \circ XOR_k$.

Applying Theorem 18 and Corollary 15, we can prove that a simple depth-2 $\mathsf{AC}^0$ circuit requires probabilistic sparsity $n^{\Omega(\log n)}$. (Note the lower bound is tight, as mentioned in the introduction.)

Lemma 20.

For every $\varepsilon > 0$, every finite commutative ring $R$ such that $|R| \leq 2^{O(n^{1/2-\varepsilon})}$, and every function $g : \mathbb{N} \to \mathbb{N}$, $R$-probabilistic polynomials with error 1/4 for $AND \circ OR$, where the AND has fan-in $n$ and each OR has fan-in $g(n)\log n$, require sparsity $n^{\Omega(\log g(n))}$ over the $\{0,1\}$ basis.

Proof.

Pick a sufficiently small $c > 0$ such that Corollary 15 holds, meaning $f := AND_n \circ OR_{c\log n}$ requires $R$-probabilistic degree $\Omega(\log n)$ for error 1/3. Applying Theorem 18 to $f$ with $a = 1/12$ and $k = g(n)$, we see that $f \circ OR_{g(n)}$ requires $R$-probabilistic sparsity $(g(n))^{\Omega(\log n)} = n^{\Omega(\log g(n))}$ for error 1/4. Collapsing the two bottom layers of ORs, we see that $f \circ OR_{g(n)} = AND_n \circ OR_{c \cdot g(n)\log n}$, which implies the desired result since $c \cdot g(n)\log n < g(n)\log n$.

We can finally prove our main theorem:

Proof of Theorem 1.

Setting $y_{i,j} = 1$ for all $i, j$ changes $OV_c$ into the negation of the function from Lemma 20 (with $g = c$). The same lower bounds hold, since setting variables does not increase the degree, sparsity, or error.

4 Conclusion

By combining multiple techniques (random restrictions, Schwartz-Zippel, communication complexity lower bounds, known degree lower bounds, etc.), we have shown sparsity lower bounds for probabilistic and approximate polynomials computing several natural functions of interest in algorithm design. Perhaps the main question left open by our work is whether there is a polynomially-sparse constant-error probabilistic polynomial for the depth-three $OR_n \circ AND_n \circ OR_2$ circuit over some basis $\{a,b\}$ with $a \neq b$, working in some ring $R$. Even more generally, does this circuit have low probabilistic rank over some ring? In this paper, we proved several strong sparsity lower bounds over the basis $\{0,1\}$ which significantly narrow the search space for such polynomials, but some of the steps in our proofs require that the basis contains 0 (“killing” monomials by setting variables to 0).

For the complements (COMP) function, there is a nice sparsity upper bound over $\{-1,1\}$ (compared to the lower bound over $\{0,1\}$), so it is still possible (but unlikely) that the $\{-1,1\}$ basis could support sparse probabilistic polynomials for depth-3 $\mathsf{AC}^0$ functions. More basis-independent methods for sparsity lower bounds (such as Theorem 3) would be of great interest, as well as new techniques for constructing sparse polynomial representations.

References

  • [1] Amir Abboud, Ryan Williams, and Huacheng Yu. More applications of the polynomial method to algorithm design. In SODA, pages 218–230, 2015.
  • [2] Josh Alman, Timothy M Chan, and Ryan Williams. Polynomial representations of threshold functions and algorithmic applications. In FOCS, pages 467–476, 2016. doi:10.1109/FOCS.2016.57.
  • [3] Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. In FOCS, pages 136–150, 2015. doi:10.1109/FOCS.2015.18.
  • [4] Josh Alman and Ryan Williams. Probabilistic rank and matrix rigidity. In STOC, pages 641–652, 2017. doi:10.1145/3055399.3055484.
  • [5] James Aspnes, Richard Beigel, Merrick Furst, and Steven Rudich. The expressive power of voting polynomials. Combinatorica, 14(2):135–148, 1994. doi:10.1007/BF01215346.
  • [6] David A Mix Barrington, Richard Beigel, and Steven Rudich. Representing boolean functions as polynomials modulo composite numbers. Computational Complexity, 4(4):367–382, 1994. doi:10.1007/BF01263424.
  • [7] Saugata Basu, Nayantara Bhatnagar, Parikshit Gopalan, and Richard J. Lipton. Polynomials that sign represent parity and descartes’ rule of signs. Computational Complexity, 17(3):377–406, 2008. doi:10.1007/S00037-008-0244-2.
  • [8] Paul Beame and Trinh Huynh. Multiparty communication complexity and threshold circuit size of 𝖠𝖢𝟢. SIAM Journal on Computing, 41(3):484–518, 2012. doi:10.1137/100792779.
  • [9] Richard Beigel, Nick Reingold, and Daniel Spielman. The perceptron strikes back. In Structure in Complexity Theory Conference, pages 286–291. IEEE, 1991. doi:10.1109/SCT.1991.160270.
  • [10] Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. The complexity of satisfiability of small depth circuits. In Parameterized and Exact Complexity (IWPEC), pages 75–85, 2009. doi:10.1007/978-3-642-11269-0_6.
  • [11] Timothy M Chan and Ryan Williams. Deterministic apsp, orthogonal vectors, and more: Quickly derandomizing razborov-smolensky. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1246–1255. Society for Industrial and Applied Mathematics, 2016. doi:10.1137/1.9781611974331.CH87.
  • [12] Arkadev Chattopadhyay. Discrepancy and the power of bottom fan-in in depth-three circuits. In FOCS, pages 449–458. IEEE, 2007. doi:10.1109/FOCS.2007.30.
  • [13] Ernie Croot, Vsevolod F. Lev, and Péter Pál Pach. Progression-free sets in $\mathbb{Z}_4^n$ are exponentially small. Annals of Mathematics, 185:331–337, 2017.
  • [14] Zeev Dvir and Benjamin L Edelman. Matrix rigidity and the croot-lev-pach lemma. Theory Of Computing, 15(8):1–7, 2019. doi:10.4086/TOC.2019.V015A008.
  • [15] Jiawei Gao, Russell Impagliazzo, Antonina Kolokolova, and R. Ryan Williams. Completeness for first-order properties on sparse structures with algorithmic applications. In SODA, pages 2162–2181, 2017. doi:10.1137/1.9781611974782.141.
  • [16] Kristoffer Arnsfelt Hansen and Vladimir V Podolskii. Polynomial threshold functions and boolean threshold circuits. Information and Computation, 240:56–73, 2015. doi:10.1016/J.IC.2014.09.008.
  • [17] Prahladh Harsha and Srikanth Srinivasan. On polynomial approximations to $\mathsf{AC}^0$. Random Structures & Algorithms, 54(2):289–303, 2019. doi:10.1002/RSA.20786.
  • [18] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. JCSS, 62(2):367–375, 2001. doi:10.1006/JCSS.2000.1727.
  • [19] Swastik Kopparty and Srikanth Srinivasan. Certifying polynomials for $\mathsf{AC}^0[\oplus]$ circuits, with applications. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2012, pages 36–47, 2012. doi:10.4230/LIPICS.FSTTCS.2012.36.
  • [20] Matthias Krause and Pavel Pudlák. On the computational power of depth-2 circuits with threshold and modulo gates. Theoretical Computer Science, 174(1-2):137–156, 1997. doi:10.1016/S0304-3975(96)00019-9.
  • [21] Matthias Krause and Pavel Pudlák. Computing boolean functions by polynomials and threshold circuits. computational complexity, 7(4):346–370, 1998. doi:10.1007/S000370050015.
  • [22] Kasper Green Larsen and R. Ryan Williams. Faster online matrix-vector multiplication. In SODA, pages 2182–2189, 2017. doi:10.1137/1.9781611974782.142.
  • [23] Daniel Lokshtanov, Ramamohan Paturi, Suguru Tamaki, R. Ryan Williams, and Huacheng Yu. Beating brute force for systems of polynomial equations over finite fields. In SODA, pages 2190–2202, 2017. doi:10.1137/1.9781611974782.143.
  • [24] Shachar Lovett and Srikanth Srinivasan. Correlation bounds for poly-size $\mathsf{AC}^0$ circuits with $n^{1-o(1)}$ symmetric gates. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 640–651. Springer, 2011.
  • [25] Raghu Meka, Oanh Nguyen, and Van Vu. Anti-concentration for polynomials of independent random variables. arXiv preprint, 2015. arXiv:1507.00829.
  • [26] Daniel Moeller, Ramamohan Paturi, and Stefan Schneider. Subquadratic algorithms for succinct stable matching. In Computer Science Symposium in Russia, pages 294–308, 2016. doi:10.1007/978-3-319-34171-2_21.
  • [27] Noam Nisan and Mario Szegedy. On the degree of boolean functions as real polynomials. Computational complexity, 4(4):301–313, 1994. doi:10.1007/BF01263419.
  • [28] Ryan O’Donnell and Rocco A Servedio. Extremal properties of polynomial threshold functions. In IEEE Conference on Computational Complexity, pages 3–12. Citeseer, 2003. doi:10.1109/CCC.2003.1214406.
  • [29] A. A. Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
  • [30] Alexander Razborov and Avi Wigderson. $n^{\Omega(\log n)}$ lower bounds on the size of depth-3 threshold circuits with AND gates at the bottom. Information Processing Letters, 45(6):303–307, 1993. doi:10.1016/0020-0190(93)90041-7.
  • [31] Alexander A Sherstov. Communication lower bounds using directional derivatives. Journal of the ACM (JACM), 61(6):34, 2014.
  • [32] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In STOC, pages 77–82, 1987. doi:10.1145/28395.28404.
  • [33] Srikanth Srinivasan. On improved degree lower bounds for polynomial approximation. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS, pages 201–212, 2013. doi:10.4230/LIPICS.FSTTCS.2013.201.
  • [34] Jun Tarui. Probabilistic polynomials, $\mathsf{AC}^0$ functions and the polynomial-time hierarchy. Theoretical Computer Science, 113(1):167–183, 1993. doi:10.1016/0304-3975(93)90214-E.
  • [35] Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theoretical Computer Science, 348(2-3):357–365, 2005. doi:10.1016/J.TCS.2005.09.023.
  • [36] Ryan Williams. Faster all-pairs shortest paths via circuit complexity. In STOC, pages 664–673, 2014. doi:10.1145/2591796.2591811.
  • [37] Ryan Williams. The polynomial method in circuit complexity applied to algorithm design (invited talk). In 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS, pages 47–60, 2014.
  • [38] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In Proceedings of the ICM, 2018.

Appendix A Depth-3 Circuits and Probabilistic Polynomials

The study of probabilistic polynomials over $\mathbb{F}_p$ is equivalent to studying depth-three circuits of a certain form; let us recall the simple connection. In the following, let $\|x\|$ denote the number of 1s in the binary string $x$.

Definition 21.

The approximate majority of separation $\delta$ on $n$ bits, denoted $\mathrm{MAJ}_\delta$, is defined only on $x \in \{0,1\}^n$ with $\|x\| \notin ((1/2-\delta)n, (1/2+\delta)n)$, and is given by $\mathrm{MAJ}_\delta(x) = 1$ when $\|x\| \geq (1/2+\delta)n$, and $\mathrm{MAJ}_\delta(x) = 0$ when $\|x\| \leq (1/2-\delta)n$.

Lemma 22.

If f is computable by a depth-three circuit with an approximate majority of separation δ at the output, PARITY gates with fan-in at most s on the middle layer, and AND gates of fan-in at most d at the bottom layer, then there is an 𝔽2 probabilistic polynomial for f with degree d, sparsity s, and error δ.

Proof.

An 𝔽2 polynomial can be viewed as a PARITYAND circuit, and vice versa: each monomial in such a polynomial is computing the AND of its constituent variables, and then the output of the polynomial is the 𝔽2 sum of these monomials, which is a PARITY. We thus get our probabilistic polynomial by selecting a uniformly random PARITYAND subcircuit which feeds into the top MAJδ gate of our circuit, and outputting it as an 𝔽2 polynomial.

Lemma 23.

For any constant $c > 0$, if there is an $\mathbb{F}_2$ probabilistic polynomial for $f$ with degree $d$, sparsity $s$, and error $\delta$, then $f$ is computable by a depth-three circuit with an approximate majority of separation $(1-c)\delta$ and fan-in $O(n/\delta)$ at the output, PARITY gates with fan-in at most $s$ on the middle layer, and AND gates of fan-in at most $d$ at the bottom layer.

If we didn’t bound the fan-in of the approximate majority gate, then the proof would be almost identical to the proof of Lemma 22. In order to bound the fan-in by O(n), we need to use a Chernoff bound:

Proof.

Randomly sample $t$ polynomials $p_1,\ldots,p_t$ from the probabilistic polynomial, for a parameter $t$ to be selected later. For each $x \in \{0,1\}^n$, by the Chernoff bound, the probability that less than a $(1-c)\delta$ fraction of the polynomials $p_1,\ldots,p_t$ output the correct answer on $x$ is at most $\exp\{-c^2\delta t\}$. By setting $t = an/(c^2\delta)$ for a sufficiently large constant $a$, this can be made less than $2^{-n}$. Then, by the union bound, there is a probability less than 1 that, for some $x \in \{0,1\}^n$, less than a $(1-c)\delta$ fraction of the polynomials $p_1,\ldots,p_t$ output the correct answer on $x$. Hence, by the probabilistic method, there must be a choice of $p_1,\ldots,p_t$ such that at least a $(1-c)\delta$ fraction give the correct answer on every $x$. We can convert these polynomials into the desired circuit as in the proof of Lemma 22.

It is interesting to note, via Lemmas 22 and 23, that we may assume that any probabilistic polynomial with constant error over 𝔽2 is the uniform distribution on only O(n) different polynomials.

Appendix B SETH Predictions about Probabilistic Polynomials

Several algorithms based on the polynomial method in circuit complexity (such as [36, 37, 1, 3, 2]) have the following form: (a) identify a “subcircuit” $C$ such that, if $C$ could be evaluated on many inputs rapidly, we could solve a desired problem faster, (b) efficiently convert $C$ into a polynomial (probabilistic, approximate, etc.) so that the evaluation task becomes algebraic, and (c) solve the evaluation task rapidly using algebraic algorithms, such as a fast Fourier transform or a matrix multiplication. How far can this approach be pushed? A significant question has been whether this kind of approach can be used to solve CNF-SAT fast enough to refute the pesky Strong Exponential Time Hypothesis (SETH).

We observe in this section that a low-degree, low-sparsity, and low-error probabilistic polynomial for AND would have contradicted SETH (recall that we unconditionally prove in this paper that there is no such polynomial). Along the way, we demonstrate other “predictions” about polynomial representations obtained by assuming SETH.

Theorem 24.

For any commutative ring $R$, a low-degree, low-sparsity, and low-error $R$-probabilistic polynomial for AND (over 0/1) implies that there is a universal $\varepsilon > 0$ such that for all $k \geq 3$, there is a non-uniform circuit family of size $2^{(1-\varepsilon)n}$ for solving $k$-SAT on $n$ variables. (Moreover, an efficiently-samplable probabilistic polynomial for AND with these properties would refute the original Strong Exponential Time Hypothesis. All interesting probabilistic polynomial constructions we are aware of, such as those found in the works [29, 32, 9, 5, 19, 33, 3, 2], are efficiently samplable.)

The proof of Theorem 24 follows from a few claims. First, we note that a low-degree/low-sparsity/low-error probabilistic polynomial for AND over 0/1 is equivalent to a low-sparsity/low-error probabilistic polynomial for DISJ over 0/1. Let $R$ be any commutative ring.

Proposition 25.

For any $n, s, d, e \geq 0$, if $AND_n \in R\text{-}\mathrm{SDE}[s, d, e]$ then $AND_n \circ OR_2 \in R\text{-}\mathrm{SE}[s \cdot 3^d, e]$.

Proof.

From a probabilistic polynomial for $AND_n$, substitute the exact OR of two variables, $OR(x,y) = x + y - xy$, into each variable to get a probabilistic polynomial of the same error for $AND_n \circ OR_2$. In this substitution, each original monomial is replaced by a product of at most $d$ polynomials, each of which is the OR of two variables and has three monomials; so the new polynomial has at most $s \cdot 3^d$ monomials.
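For a concrete instance of this substitution (using the sympy library; purely illustrative), take the exact polynomial $x_1 x_2$ for $AND_2$ and substitute $OR(x,y) = x + y - xy$ into each variable:

```python
# Substituting OR(x, y) = x + y - x*y into AND_2(u, v) = u*v (a sketch).
from sympy import symbols, expand

x1, y1, x2, y2 = symbols("x1 y1 x2 y2")
OR = lambda a, b: a + b - a * b
p = expand(OR(x1, y1) * OR(x2, y2))   # the polynomial for AND_2 of OR_2
print(p)                              # 9 = 1 * 3^2 monomials, as predicted
print(len(p.as_ordered_terms()))      # -> 9
```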

Next, we note that a probabilistic polynomial for disjointness (ANDnOR2) with low sparsity and error implies the probabilistic polynomial needed for refuting SETH:

Theorem 26.

For any $n, s \geq 0$, if $AND_{c\log n} \circ OR_2 \in R\text{-}\mathrm{SE}[s, 1/(8n)]$ then $OR_n \circ AND_{c\log n} \circ OR_2$ has an $R$-probabilistic polynomial $\mathcal{P}$ of sparsity $O(n^2 s^2)$ and error 11/24 which (when there is no error) outputs 0 for false and a nonzero value for true.

Proof.

We define $\mathcal{P}$. Let $\mathcal{Q}$ be the $R\text{-}\mathrm{SE}[s, 1/(8n)]$ probabilistic polynomial for $AND_{c\log n} \circ OR_2$. Draw a $q \sim \mathcal{Q}$. By a union bound, with probability at least 7/8, $q$ correctly computes the 0 or 1 value of $AND_{c\log n} \circ OR_2$ on the $n$ different inputs which we want to compute the OR of. Hence, there are $n$ polynomials $q_1,\ldots,q_n$ which each have at most $s$ monomials, and we want to compute their OR assuming they all evaluate to 0 or 1.

The subring $R'$ of $R$ generated by 1 consists of all integer multiples of 1. It is isomorphic to $\mathbb{Z}/m\mathbb{Z}$ for some nonnegative integer $m \neq 1$. (The $m = 0$ case is when no positive integer multiple of 1 is 0 in $R$, and hence $R' = \mathbb{Z}$.) We consider three cases depending on whether $m = 0$, $m = 2$, or $m > 2$.

If $m = 0$, then simply $q_1 + \cdots + q_n$ suffices to compute OR over $\mathbb{Z}$. This has sparsity at most $ns$, and no additional error, for a total error of 1/8.

If $m = 2$, then let $S_1, S_2 \subseteq \{1,\ldots,n\}$ be two uniformly random subsets, and we pick

$$1 + \prod_{i=1}^{2}\Big(1 + \sum_{j \in S_i} q_j\Big).$$

This is 0 when all the $q_j$ are 0, and each $\sum_{j \in S_i} q_j$ is 1 with probability 1/2 otherwise, so the entire polynomial is 1 with probability at least 3/4. Hence the total error is 3/8, and the sparsity is $O(n^2 s^2)$.

If $m > 2$, then pick uniformly and independently random $a_1,\ldots,a_n \in \{1,\ldots,m\}$, and pick the polynomial $\sum_{i=1}^n a_i q_i$. If each $q_i = 0$ then this is 0, and otherwise it is 0 with probability at most $1/m$. The sparsity is at most $ns$, and the error is at most $1/8 + 1/m \leq 11/24$.

Finally, we note that an $n^{O(1)}$-sparse constant-error probabilistic polynomial (in the sense of Theorem 26) for $OR_n \circ AND_{c\log n} \circ OR_2$ for every $c \geq 1$ would refute (non-uniform) SETH:

Theorem 27.

Suppose there is a fixed $k \geq 1$ such that for all $c \geq 1$, $OR_n \circ AND_{c\log n} \circ OR_2 \in R\text{-}\mathrm{SE}[n^k, 11/24]$. Then there is a universal $\varepsilon > 0$ such that for all $k' \geq 3$, there is a non-uniform circuit family of size $2^{(1-\varepsilon)n}$ for solving $k'$-SAT on $n$ variables.

Proof (sketch).

The Orthogonal Vectors problem asks: given a collection of bit vectors, is there a pair of them which are orthogonal? We show that the hypothesis implies that for all $c \geq 1$, the Orthogonal Vectors problem with $n$ vectors from $\{0,1\}^{c\log n}$ can be solved in $n^{2-\delta}$ time with $O(n^{2-\delta})$ advice, for a universal $\delta > 0$. The theorem then follows from a (by now) standard reduction (as in [35, 3]).

By our hypothesis, we can store (as non-uniform advice) a probabilistic polynomial $\mathcal{P}$ with sparsity $n^{\alpha k}$ and error $11/24$ for the function $\mathsf{OR}_{n^{\alpha}} \circ \mathsf{AND}_{c\log n} \circ \mathsf{OR}_2$, where $\alpha \in (0, 1/k)$ is a sufficiently small constant. Note this requires storing a distribution over only $O(n)$ distinct $n^{\alpha k}$-sparse polynomials (see Section A).

To solve Orthogonal Vectors on $n$ vectors, we proceed just as in Abboud–Williams–Yu [1]: partition the set of vectors into $O(n^{1-\alpha/2})$ parts, with each part containing at most $n^{\alpha/2}$ vectors. For all $O(n^{2-\alpha})$ pairs $(P_1, P_2)$ of parts, we want to evaluate the probabilistic polynomial $\mathcal{P}$ on all pairs of vectors in the union $P_1 \cup P_2$ of the two parts. This gives an input of $n^{\alpha}$ pairs of vectors; feeding this input to a polynomial $p \sim \mathcal{P}$, we can estimate whether there is an orthogonal pair among the vectors in $P_1 \cup P_2$.

The problem of solving Orthogonal Vectors now reduces to: evaluate a random $p \sim \mathcal{P}$ with sparsity $n^{\alpha k}$ on $O(n^{2-\alpha})$ different pairs of input vectors. When $\alpha k < 0.3$ (for example), this problem can be solved in $n^{2-\alpha+o(1)}$ time, via fast rectangular matrix multiplication. To have a high probability of success, we only need to sample $O(\log n)$ random $p$ from our probabilistic polynomial $\mathcal{P}$.
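To illustrate the batch-evaluation step, here is a sketch (ours, under simplifying assumptions: integer coefficients, and a plain numpy product standing in for fast rectangular matrix multiplication). Because each monomial of a polynomial in the variables of a pair $(u, v)$ splits into a $u$-part and a $v$-part, evaluating an $s$-sparse polynomial on all pairs of vectors reduces to one $(N \times s)$ times $(s \times N)$ matrix product:

```python
# A sketch (ours, not the paper's algorithm) of the batch-evaluation step in
# Theorem 27: a polynomial of sparsity s in variables (u, v) splits each
# monomial into a u-part and a v-part, so evaluating it on all pairs of rows
# of U and V is one (N x s) @ (s x N) matrix product -- the step that fast
# rectangular matrix multiplication accelerates.
import numpy as np

rng = np.random.default_rng(0)
d, s, N = 6, 10, 32                      # vector length, sparsity, batch size
U = rng.integers(0, 2, size=(N, d))      # N left vectors
V = rng.integers(0, 2, size=(N, d))      # N right vectors

# A random s-sparse polynomial: monomial j is (product of u-vars in u_supp[j])
# times (product of v-vars in v_supp[j]), with integer coefficient coef[j].
u_supp = [rng.choice(d, size=2, replace=False) for _ in range(s)]
v_supp = [rng.choice(d, size=2, replace=False) for _ in range(s)]
coef = rng.integers(1, 5, size=s)

L = np.stack([U[:, uj].prod(axis=1) * cj for uj, cj in zip(u_supp, coef)], axis=1)
R = np.stack([V[:, vj].prod(axis=1) for vj in v_supp], axis=0)
batch = L @ R                            # all N^2 evaluations at once

# Compare against direct evaluation of one entry.
i, j = 3, 7
direct = sum(c * U[i, uj].prod() * V[j, vj].prod()
             for c, uj, vj in zip(coef, u_supp, v_supp))
assert batch[i, j] == direct
print("batch evaluation matches direct evaluation:", batch[i, j])
```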

For similar reasons, many types of low-sparsity polynomial representations for depth-3 $\mathsf{AC}^0$, including low-sparsity approximate polynomials over the reals, would refute this form of SETH.

Appendix C Sparsity for $-1/1$ Probabilistic Polynomials

In many settings, the choice of basis (which field element corresponds to “true” and which to “false”) can have a large impact on the sparsity of polynomial representations of Boolean functions. For instance, over $\{0,1\}$, computing XOR exactly on $n$ inputs requires $\Omega(2^n)$ monomials, whereas over $\{-1,1\}$ it requires only one monomial.

We will show in this section that the choice of basis can make a big difference in the sparsity of probabilistic polynomials for the complements function (defined in Section 2). Although Theorem 16 holds over any choice of basis for our polynomials, our proof of Theorem 5 relies heavily on the fact that we are working over a basis where “false” corresponds to $0$, so that setting a variable to false eliminates every monomial containing it.

Suppose we look instead at the $\{-1,1\}$ basis, with the standard convention that “true” corresponds to $-1$. In this setting, the statement of Theorem 5 is no longer true, since we can construct a probabilistic polynomial for the complements function with polynomial sparsity and polynomial error:

Lemma 28.

For any $\varepsilon > 0$, and any commutative ring $R$ which doesn't have characteristic $2$, the AND of $n$ variables has a probabilistic polynomial over $R$ of sparsity $O(1/\varepsilon)$ and error $\varepsilon$ over the basis $\{-1,1\}$.

Proof.

Pick a random subset $S \subseteq \{1, \ldots, n\}$, and consider the polynomial $E(x_1, \ldots, x_n) = 1 + (-1)^{|S|} \prod_{i \in S} x_i$. If $\mathsf{AND}(x) = 1$ (the AND is false), then some nonempty subset of the $x_i$'s are $1$'s, and the probability that $\prod_{i \in S} x_i = -(-1)^{|S|}$ is $1/2$. Hence, in this case, $E = 0$ with probability $1/2$. If $\mathsf{AND}(x) = -1$, then all $x_i$'s are $-1$, and we always have that $E = 2$. Hence, if we take $k = \lceil \log_2(1/\varepsilon) \rceil$ independent copies $E_1, \ldots, E_k$ of the above $E$, then the polynomial $P := 1 - \frac{1}{2^{k-1}} \prod_{i=1}^{k} E_i$ has the desired properties.
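The two cases in this proof are easy to confirm empirically. The sketch below (ours) samples $E$ with true $= -1$ and checks that $E = 2$ always on the all-true input, and $E = 0$ with probability $1/2$ otherwise:

```python
# A quick empirical check (our sketch, not from the paper) of the {-1,1}-basis
# construction in Lemma 28, with true = -1 and false = +1:
# E = 1 + (-1)^|S| * prod_{i in S} x_i always equals 2 on the all-true input,
# and vanishes with probability 1/2 otherwise.
import random

def sample_E(x):
    """One draw of E on input x in {-1,+1}^n, for a uniformly random subset S."""
    S = [i for i in range(len(x)) if random.random() < 0.5]
    prod = 1
    for i in S:
        prod *= x[i]
    return 1 + (-1) ** len(S) * prod

n, trials = 10, 100_000
all_true = [-1] * n
assert all(sample_E(all_true) == 2 for _ in range(1000))

not_all_true = [-1] * n
not_all_true[4] = 1  # one false input
zeros = sum(sample_E(not_all_true) == 0 for _ in range(trials)) / trials
print(f"Pr[E = 0 | AND false] ~ {zeros:.3f} (expected 1/2)")
# Multiplying k independent copies drives the error down to 2^-k, giving the
# polynomial P = 1 - 2^{-(k-1)} * E_1 * ... * E_k from the proof.
```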

Theorem 29.

For any $\varepsilon > 0$, and any commutative ring $R$ which doesn't have characteristic $2$, COMP has a probabilistic polynomial over $R$ of sparsity $O(1/\varepsilon)$ and error $\varepsilon$ over the basis $\{-1,1\}$.

Proof.

The XOR of two variables is a single monomial over the $\{-1,1\}$ basis (namely $xy$), and so we simply substitute these monomials into the probabilistic polynomial for AND from Lemma 28.

Theorem 29 with $\varepsilon = 1/\mathrm{poly}(n)$ shows that COMP has a probabilistic polynomial with polynomial sparsity and any polynomial error over the $\{-1,1\}$ basis, whereas Theorem 5 says that no such probabilistic polynomial can exist over the $\{0,1\}$ basis.

Finally, we remark that just as in the $\{0,1\}$ case, we can still construct probabilistic polynomials for OR over $\{-1,1\}$ that achieve any two of the three goals of low sparsity, low degree, and low error:

  • Low sparsity and low error come from Theorem 29,

  • Low degree and low error still follow from Theorem 2 with $t = (\log n)/(p-1)$, since degree and error are unchanged upon changing basis, and

  • Low sparsity and low degree still follow from Theorem 2 with $t = 1$, since then the degree is still $p-1$, and so by exploiting multilinearity the sparsity must be $O(n^{p-1})$.

Appendix D Separation Between Probabilistic Rank and Probabilistic Sparsity

Definition 30.

Let $a : [2^{n/2}] \to \{0,1\}^{n/2}$ be the canonical bijection, $R$ be any ring, $f : \{0,1\}^n \to \{0,1\}$ be any Boolean function, and $A \cup B = \{x_1, \ldots, x_n\}$ be any partition of the input variables of $f$ into two sets of size $|A| = |B| = n/2$. For $x, y \in \{0,1\}^{n/2}$, let $f(A|x, B|y)$ denote $f$ evaluated at the assignment where the variables in $A$ are set to $x$ and the variables in $B$ are set to $y$. The truth table matrix $M_{f,A,B}$ of $f$ is the $2^{n/2} \times 2^{n/2}$ matrix given by $M_{f,A,B}(i,j) = f(A|a(i), B|a(j))$.

For $\varepsilon \geq 0$, and any $2^{n/2} \times 2^{n/2}$ matrix $M$ over $R$, the $\varepsilon$-probabilistic rank of $M$ is the minimum $r$ for which there is a distribution $\mathcal{D}$ on $2^{n/2} \times 2^{n/2}$ matrices over $R$ of rank at most $r$ such that for all $i, j$:

$\Pr_{D \sim \mathcal{D}}[D(i,j) \neq M(i,j)] \leq \varepsilon.$

The $\varepsilon$-probabilistic rank of $f$ is the maximum, over all such partitions $A \cup B$, of the $\varepsilon$-probabilistic rank of $M_{f,A,B}$ over $R$.

In this section, we compare the probabilistic sparsity of a function with its probabilistic rank. We know that the probabilistic sparsity is always an upper bound on the probabilistic rank:

Lemma 31 ([4] Corollary 2.1).

For any ring $R$, if $f \in R\text{-}\mathsf{SE}[m, \varepsilon]$, then $f$ has $\varepsilon$-probabilistic rank at most $m$ over $R$.

Using our probabilistic polynomial sparsity lower bounds, we are able to prove a separation in two different settings.

D.1 Complements

First, by using Theorem 5, we can give a simple explicit function, the complements function (defined in Section 2), which has substantially lower probabilistic rank than probabilistic sparsity over 𝔽2.

Recall from Theorem 5 that COMP requires superpolynomial sparsity for any polynomial error over $\mathbb{F}_2$. In contrast, EQ is known to have low probabilistic rank, by simulating a randomized communication protocol for equality (similar to our proof of Theorem 29), and it follows that COMP has low probabilistic rank as well:

Proposition 32 ([4] Lemma D.2).

EQ has ε-probabilistic rank O(1/ε) over any field, including 𝔽2.

Proposition 33.

COMP has ε-probabilistic rank O(1/ε) over any field, including 𝔽2.

Proof.

The communication matrix of COMP is the same as that of EQ, up to a permutation of columns that swaps $y$ with $\bar{y}$ for each $y \in \{0,1\}^{\log n}$.

Corollary 34.

COMP has polynomial probabilistic rank for any polynomial error over $\mathbb{F}_2$, whereas it requires superpolynomial sparsity for any polynomial error over $\mathbb{F}_2$.

Proof.

Follows from Proposition 33 and Theorem 5.

D.2 Compositions with high fan-in XORs

Second, we combine Corollary 19 with Lemma 31 and a simple upper bound construction to prove a separation between probabilistic rank and probabilistic sparsity for functions of the form $f \circ \mathsf{XOR}_k$, over $\mathbb{F}_2$.

Let $f : \{0,1\}^n \to \{0,1\}$ be any Boolean function, and let $\varepsilon > 0$, $k \geq 1$ be any values. Let $d$ be the smallest value such that $f$ has an $\mathbb{F}_2$-probabilistic polynomial of error $\varepsilon$ and degree $d$. We will prove a separation for the $nk$-variate function

$f \circ \mathsf{XOR}_k(x_{1,1}, \ldots, x_{n,k}) := f\Big(\bigoplus_{j=1}^{k} x_{1,j},\ \ldots,\ \bigoplus_{j=1}^{k} x_{n,j}\Big).$
Proposition 35.

Over $\mathbb{F}_2$, the function $f \circ \mathsf{XOR}_k$ has $\varepsilon$-probabilistic rank at most

$r := \sum_{i=0}^{2d} \binom{2n}{i}.$
Proof.

Suppose we have partitioned our $nk$ variables into parts $A, B$, and for $1 \leq i \leq n$, let $A_i \subseteq A$ and $B_i \subseteq B$ be the corresponding partition of the variables $x_{i,1}, \ldots, x_{i,k}$. Define $y_i := \bigoplus_{x_{i,j} \in A_i} x_{i,j} \in \{0,1\}$ and $z_i := \bigoplus_{x_{i,j} \in B_i} x_{i,j} \in \{0,1\}$, and note that $y_i$ and $z_i$ are each functions of the variables in only one of the parts.

By assumption, $f$ has an $\mathbb{F}_2$-probabilistic polynomial $P$ of error $\varepsilon$ and degree $d$. Suppose we draw a $p \sim P$, and consider the expression

$p(\chi(y_1, z_1),\ \chi(y_2, z_2),\ \ldots,\ \chi(y_n, z_n)),$ (5)

where $\chi : \{0,1\}^2 \to \{0,1\}$ is the exact degree-$2$ polynomial for computing XOR on two variables. Expression (5) is therefore a probabilistic expression for $f \circ \mathsf{XOR}_k$, and it is a degree-$2d$ polynomial in the $2n$ variables $y_1, \ldots, y_n$ and $z_1, \ldots, z_n$, where each depends only on the variables in one side of the partition. Since a multilinear polynomial of degree $2d$ in $2n$ variables has at most $r$ monomials, the probabilistic rank is hence at most $r$ by Lemma 31.

Meanwhile, recall from Corollary 19 that $f \circ \mathsf{XOR}_k$ requires $\mathbb{F}_2$-probabilistic sparsity $\varepsilon k^{d}$. Combining these two bounds can show a separation, for instance:

Theorem 36.

For any

$k > \Big(\frac{ne}{d}\Big)^{2},$

the $\varepsilon$-probabilistic sparsity of $f \circ \mathsf{XOR}_k$ is strictly greater than its $\varepsilon$-probabilistic rank. In particular, as $k$ increases, the $\varepsilon$-probabilistic sparsity grows unboundedly while the $\varepsilon$-probabilistic rank remains fixed.

Proof.

With this bound on $k$, we have

$k^{d} > \Big(\frac{ne}{d}\Big)^{2d} \geq \sum_{i=0}^{2d} \binom{2n}{i},$

as desired, where the second inequality is the standard bound $\sum_{i=0}^{m} \binom{N}{i} \leq (eN/m)^{m}$ on sums of binomial coefficients, applied with $N = 2n$ and $m = 2d$. We can make the separation arbitrarily big by making $k$ sufficiently large, since increasing $k$ does not change the rank upper bound (the bound from Proposition 35 has no dependence on $k$), but increases the sparsity lower bound to as large as we would like.
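As a numerical sanity check (ours, with arbitrary sample values of $n$ and $d$), the chain of inequalities can be verified directly:

```python
# Numerical sanity check (ours) of the bound used in Theorem 36: for
# k > (n*e/d)^2 we get k^d > (n*e/d)^(2d) >= sum_{i<=2d} C(2n, i).
from math import comb, e

n, d = 40, 3
k = int((n * e / d) ** 2) + 1            # smallest k exceeding (ne/d)^2
lhs = k ** d
mid = (n * e / d) ** (2 * d)
rhs = sum(comb(2 * n, i) for i in range(2 * d + 1))
print(lhs > mid >= rhs, lhs, round(mid), rhs)  # prints True
```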

We can now apply known upper and lower bounds to get separations for other explicit functions. Consider, for instance, the majority function:

Theorem 37 ([29, 32]).

$\mathbb{F}_2$-probabilistic polynomials for MAJ on $n$ bits with error $\varepsilon$ require degree $\Omega(\sqrt{n \log(1/\varepsilon)})$.

Theorem 38 ([3]).

There is an $\mathbb{F}_2$-probabilistic polynomial for MAJ on $n$ bits with error $\varepsilon$ and degree $O(\sqrt{n \log(1/\varepsilon)})$.

Combining these with Theorem 36 yields:

Theorem 39.

There is a constant $c > 0$ such that, for any

$k > \frac{cn}{\log(1/\varepsilon)},$

the $\varepsilon$-probabilistic sparsity of $\mathsf{MAJ} \circ \mathsf{XOR}$ over $\mathbb{F}_2$, where the MAJ has fan-in $n$ and each XOR has fan-in $k$, is strictly greater than its $\varepsilon$-probabilistic rank. In particular, as $k$ increases, the $\varepsilon$-probabilistic sparsity grows unboundedly while the $\varepsilon$-probabilistic rank remains fixed.

Appendix E Relationship between Degree, Sparsity, and Rank

Since the degree and sparsity of a probabilistic polynomial are related concepts, we are able to bound one using bounds on the other. Bounding the degree of a probabilistic polynomial will bound the sparsity of the polynomial, since there are only so many monomials of a given degree:

Proposition 40.

For any decision problem $f$, any $d, \varepsilon > 0$, and any prime $p$, if $f \in \mathbb{F}_p\text{-}\mathsf{DE}[d, \varepsilon]$, then $f \in \mathbb{F}_p\text{-}\mathsf{SDE}[m, d, \varepsilon]$, where

$m := \sum_{i=0}^{d} \binom{n}{i} \leq (1+n)^{d}.$
Proof.

Let $P$ be an $\mathbb{F}_p\text{-}\mathsf{DE}[d, \varepsilon]$ representation of $f$. Since $x_i^2 = x_i$ whenever $x_i \in \{0,1\}$, we may assume that every polynomial in the support of $P$ is multilinear. Hence, each polynomial in the support of $P$ is a multilinear polynomial of degree at most $d$, which has at most $m$ monomials. $P$ is therefore an $\mathbb{F}_p\text{-}\mathsf{SDE}[m, d, \varepsilon]$ representation of $f$.

Perhaps more surprisingly, decision problems with low sparsity probabilistic polynomials must have low degree representations as well:

Proposition 41.

For any decision problem $f$, any $m, \varepsilon > 0$, and any prime $p$, if $f \in \mathbb{F}_p\text{-}\mathsf{SE}[m, \varepsilon]$, then $f \in \mathbb{F}_p\text{-}\mathsf{DE}[d, 2\varepsilon]$, where

$d := (p-1) \log_p(m/\varepsilon).$
Proof.

Let $P$ be an $\mathbb{F}_p\text{-}\mathsf{SE}[m, \varepsilon]$ representation of $f$. We design a new probabilistic polynomial for $f$, defined as follows:

  1. Draw a polynomial $g(x_1, \ldots, x_n)$ from $P$. We can write $g$ as a sum of monomials, $g(x_1, \ldots, x_n) = \sum_{i=1}^{m'} a_i m_i$, where $m' \leq m$, each $a_i \in \mathbb{F}_p$, and each $m_i$ is a monomial.

  2. Recall by Theorem 2 that $\mathsf{AND} \in \mathbb{F}_p\text{-}\mathsf{SDE}[O(n^{(p-1)t}), (p-1)t, 1/p^{t}]$ for all $t \geq 1$. Draw a polynomial $q$ from the corresponding probabilistic polynomial on $n$ variables with $t = \log_p(m/\varepsilon)$.

  3. Each monomial $m_i$ computes the AND of some set $x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}$ of variables. It can therefore be probabilistically computed as $q(x_{i,1}, \ldots, x_{i,n_i}, 1, 1, \ldots, 1)$, where there are $n - n_i$ ‘$1$’s. We can thus output the polynomial

    $\sum_{i=1}^{m'} a_i \cdot q(x_{i,1}, \ldots, x_{i,n_i}, 1, 1, \ldots, 1).$

This has degree $(p-1)t = d$, and by a union bound over the error of the replacement for each of the $m' \leq m$ monomials, it has error at most $\varepsilon + m/p^{t} = 2\varepsilon$.
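For $p = 2$, a Razborov-style construction achieves the parameters quoted from Theorem 2 in step 2: take $t$ uniformly random subsets $S_j$ and output $\prod_{j=1}^{t}\big(1 + \sum_{i \in S_j}(1 + x_i)\big)$ over $\mathbb{F}_2$. The sketch below (ours; we assume this flavor of construction, which matches the stated degree $(p-1)t = t$ and error $1/p^t = 2^{-t}$) checks the error empirically; each factor is affine over $\mathbb{F}_2$, so every sample has degree at most $t$:

```python
# A minimal sketch (ours) of a Razborov-style probabilistic polynomial for AND
# over F_2 with degree t and error 2^-t, matching the parameters used in
# step 2 of Proposition 41's proof for p = 2.
import random

def sample_and_poly(n, t):
    """Return one sample as a function x -> value in F_2."""
    subsets = [[i for i in range(n) if random.random() < 0.5] for _ in range(t)]
    def q(x):
        val = 1
        for S in subsets:
            # each factor 1 + sum_{i in S} (1 + x_i) is affine over F_2
            val *= (1 + sum(1 + x[i] for i in S)) % 2
        return val % 2
    return q

n, t, trials = 12, 4, 20_000
x_false = [1] * n
x_false[7] = 0   # AND is false
errs = sum(sample_and_poly(n, t)(x_false) != 0 for _ in range(trials)) / trials
print(f"error on a false input ~ {errs:.4f} (bound 2^-t = {2**-t})")
assert sample_and_poly(n, t)([1] * n) == 1  # never errs when AND is true
```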

At most a logarithmic factor is lost when converting between these two bounds: if $f$ has probabilistic degree $d$ then it has probabilistic sparsity at most roughly $n^{d}$, and if it has probabilistic sparsity $n^{d}$ then it has probabilistic degree at most roughly $d(p-1)\log_p(n)$.

Proposition 41 allows us to prove probabilistic sparsity lower bounds from probabilistic degree lower bounds, which are much more common in the literature. For instance:

Corollary 42.

If $f : \{0,1\}^n \to \{0,1\}$ is any Boolean function which requires $\mathbb{F}_2$-probabilistic degree $d$ for error $\varepsilon$, then $f$ requires $\mathbb{F}_2$-probabilistic sparsity $\varepsilon \cdot 2^{d}$ for error $\varepsilon$.

Proof.

A lower sparsity representation would yield a probabilistic degree less than d using Proposition 41.

Combining Proposition 40 with Lemma 31, we see that if $f$ has a probabilistic polynomial of degree $d$, then it has probabilistic rank at most $\sum_{i=0}^{d} \binom{n}{i}$. Applying ideas from the recent breakthrough work of Croot, Lev, and Pach [13] on Roth's theorem over $\mathbb{Z}_4^n$, we can derive an improved bound on the rank from the degree:

Proposition 43.

Let $n$ be even. Let $f : \{0,1\}^n \to \{0,1\}$ be any Boolean function with a probabilistic polynomial of degree $d$ and error $\varepsilon$. Then, the $\varepsilon$-probabilistic rank of $f$ is at most $m := 2\sum_{i=0}^{\lfloor d/2 \rfloor} \binom{n/2}{i}$.

We note that Dvir and Edelman [14] also used this observation when studying the rigidity of a certain family of matrices. Proposition 43 will follow immediately from Theorem 45 below, but first we need a definition.

Definition 44.

Let $R$ be any ring, $n$ be an even integer, and $a : [2^{n/2}] \to \{0,1\}^{n/2}$ be any bijection between the two sets. For any function $f : \{0,1\}^n \to R$, the Boolean rank of $f$ over $R$ is the minimum $r$ such that there are matrices $A, B$ of dimensions $2^{n/2} \times r$ and $r \times 2^{n/2}$ with $(AB)[i,j] = f(a(i), a(j))$ for all $i, j$.

Theorem 45.

Let $n$ be even. Let $P$ be a multilinear polynomial in $n$ variables with degree at most $d$ over a ring $R$. Then, the Boolean rank of $P$ over $R$ is at most $m := 2\sum_{i=0}^{\lfloor d/2 \rfloor} \binom{n/2}{i}$.

Proof.

Think of $P$ as being over two sets of variables, $\{x_1, \ldots, x_{n/2}\}$ and $\{y_1, \ldots, y_{n/2}\}$. We begin with some notation: for a set $I \subseteq [n/2]$, let $x_I$ denote $\prod_{i \in I} x_i$ (and similarly $y_J$ for $J \subseteq [n/2]$), and let $c_{I,J}$ be the coefficient in $P$ of the monomial $x_I y_J$. Let $I_1, \ldots, I_t$ be a list of all subsets of $[n/2]$ of cardinality at most $d/2$. Given a monomial $M$ and a variable assignment $A \in \{0,1\}^{n/2}$, we let $M|_A$ denote the evaluation of $M$ on the assignment $A$.

We want to prepare two matrices $A$ (of dimensions $2^{n/2} \times m$) and $B$ (of dimensions $m \times 2^{n/2}$), and prove for all $i, j$ that

$(AB)[i,j] = P(a(i), a(j)).$

For each $i \in [2^{n/2}]$, we make a $2t$-dimensional vector

$A[i,:] := \Big( x_{I_1}|_{a(i)},\ \ldots,\ x_{I_t}|_{a(i)},\ \sum_{J \subseteq [n/2],\, |J| \geq |I_1|} c_{J, I_1}\, x_J|_{a(i)},\ \ldots,\ \sum_{J \subseteq [n/2],\, |J| \geq |I_t|} c_{J, I_t}\, x_J|_{a(i)} \Big).$

Correspondingly, for all $j \in [2^{n/2}]$ we make a $2t$-dimensional vector

$B[:,j] := \Big( \sum_{J \subseteq [n/2],\, |J| > |I_1|} c_{I_1, J}\, y_J|_{a(j)},\ \ldots,\ \sum_{J \subseteq [n/2],\, |J| > |I_t|} c_{I_t, J}\, y_J|_{a(j)},\ y_{I_1}|_{a(j)},\ \ldots,\ y_{I_t}|_{a(j)} \Big).$

Since every monomial of degree at most $d$ over $\{x_1, \ldots, x_{n/2}, y_1, \ldots, y_{n/2}\}$ contains either at most $\lfloor d/2 \rfloor$ $x_i$'s or at most $\lfloor d/2 \rfloor$ $y_i$'s, each monomial $c_{I,J}\, x_I y_J$ of $P$ is counted exactly once above: by the first $t$ coordinates if $|I| < |J|$ (so $I \in \{I_1, \ldots, I_t\}$), and by the last $t$ coordinates if $|I| \geq |J|$ (so $J \in \{I_1, \ldots, I_t\}$). We therefore observe as desired that

$\langle A[i,:],\, B[:,j] \rangle = \sum_{I, J} c_{I,J}\, x_I|_{a(i)}\, y_J|_{a(j)} = P(a(i), a(j)).$
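The decomposition can be checked end-to-end on small instances. The sketch below (ours, with the arbitrary choices $R = \mathbb{Z}$, $h = n/2 = 4$, and a random $P$ of degree at most $d = 3$) builds the $2t$-dimensional vectors exactly as above and verifies that their inner products reproduce $P$ on every pair of $0/1$ assignments:

```python
# A small end-to-end check (our sketch over R = Z) of the decomposition in the
# proof of Theorem 45: build the 2t-dimensional row/column vectors and verify
# that their inner products reproduce P on every pair of 0/1 assignments.
from itertools import combinations, product
import random

h, d = 4, 3                     # h = n/2 variables per side, total degree <= d
random.seed(1)

# Random multilinear P of degree <= d: coeffs[(I, J)] is the coefficient of
# x_I * y_J, with I, J subsets of range(h) and |I| + |J| <= d.
coeffs = {}
subsets = [frozenset(c) for r in range(h + 1) for c in combinations(range(h), r)]
for I, J in product(subsets, repeat=2):
    if len(I) + len(J) <= d and random.random() < 0.3:
        coeffs[(I, J)] = random.randint(-3, 3)

small = [I for I in subsets if len(I) <= d // 2]  # the list I_1, ..., I_t
ev = lambda S, a: all(a[i] for i in S)            # evaluates x_S at a in {0,1}^h

def row(a):   # A[i, :] for the assignment a to the x-variables
    left = [ev(I, a) for I in small]
    right = [sum(c for (J, I2), c in coeffs.items()
                 if I2 == I and len(J) >= len(I) and ev(J, a)) for I in small]
    return left + right

def col(b):   # B[:, j] for the assignment b to the y-variables
    left = [sum(c for (I2, J), c in coeffs.items()
                if I2 == I and len(J) > len(I) and ev(J, b)) for I in small]
    right = [ev(I, b) for I in small]
    return left + right

for a in product([0, 1], repeat=h):
    for b in product([0, 1], repeat=h):
        direct = sum(c for (I, J), c in coeffs.items() if ev(I, a) and ev(J, b))
        assert sum(u * v for u, v in zip(row(a), col(b))) == direct
print("rank bound 2t =", 2 * len(small), "vs trivial bound", 2 ** h)
```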

Appendix F Missing Proof

Lemma 46.

There is a setting of $\alpha$, $\beta$, and $\gamma$ which satisfies inequalities (1), (2), (3), and (4) from the proof of Theorem 16.

Proof.

Recall that $d, e$ are fixed, and we may set $c \geq 1$ to be arbitrarily large.

Setting

$\beta := 2\sqrt{\frac{\ln(2)\, e\, \gamma}{c}},$

inequality (2) immediately holds. Setting $\gamma := \alpha + 2$, we have $\frac{\alpha+1}{\gamma} = \frac{\alpha+1}{\alpha+2} < 1$, so (3) holds. To satisfy (1), we need

$\frac{\alpha^2(1+\alpha)}{3\ln(2)\gamma^2} = \frac{\alpha^2 + \alpha^3}{\ln(2)(3\alpha^2 + 12\alpha + 12)} \geq d + e.$

Setting $\alpha > 1$ to be sufficiently large satisfies this inequality. Finally, we turn to (4). Substituting in our above solution for $\beta$, we have

$(1-\beta)\frac{c}{\gamma} = \frac{c - 2\ln(2)^{1/2}\, e^{1/2}\, \gamma^{1/2}\, c^{1/2}}{\gamma}.$

As $\gamma$ is only a function of $d$ and $e$, which are fixed constants, and $c$ can be made arbitrarily large, the above expression can be made larger than $e$. This completes the proof.

Appendix G Generalization to a-b basis

In this section, we generalize Theorem 16 to hold for a general basis, where we can choose any two distinct elements $a, b \in R$ from our commutative ring $R$ to represent true and false. What we prove is:

Reminder of Theorem 3. For every commutative ring $R$, every pair of distinct $a, b \in R$, and all $d \geq 1$ and $e > 0$, there is a $c \geq 1$ such that for all sufficiently large $m$, the AND function on $m$ variables is not in $R\text{-}\mathsf{SDE}_{a,b}[2^{dm/c},\, m/c,\, 2^{-em/c}]$.

We begin by recalling and proving the version of the Schwartz-Zippel Lemma, Lemma 13, which we use in the proof of Theorem 16.

Reminder of Lemma 13. Let $p$ be a nonzero multilinear $n$-variate polynomial over any commutative ring $R$, of total degree at most $d$. Then $\Pr_{x \in \{0,1\}^n}[p(x) \neq 0] \geq 1/2^{d}$.

Proof.

Let $d' \leq d$ be the total degree of $p$. Without loss of generality, we may assume $p$ has the monomial $x_1 x_2 \cdots x_{d'}$ with a nonzero coefficient. Consider any $\{0,1\}$ assignment to the remaining $n - d'$ variables. The resulting polynomial $p'$ still has the monomial $x_1 x_2 \cdots x_{d'}$ with the same coefficient (no monomial of $p$ properly contains it, since $p$ has degree $d'$), and so $p'$ is a nonzero multilinear polynomial of degree $d'$ in $d'$ variables. It remains to show that $p'$ is nonzero on at least one of the $2^{d'}$ assignments to the remaining $d'$ variables, which will imply as desired that $\Pr_{x \in \{0,1\}^n}[p(x) \neq 0] \geq 1/2^{d'} \geq 1/2^{d}$.

Let $m$ be a monomial in $p'$ of lowest degree with a nonzero coefficient, let $c \neq 0$ be its coefficient, and let $S \subseteq \{x_1, \ldots, x_{d'}\}$ be the set of variables in $m$ (it may be that $m$ is the constant term, in which case $S = \emptyset$). Then, consider the assignment which sets the variables in $S$ to $1$, and the remaining variables to $0$. Our polynomial $p'$ must evaluate to $c$ on this assignment, since the monomial $m$ will have all variables set to $1$, and all other monomials will have at least one variable set to $0$ by how we chose $m$ (any other monomial supported inside $S$ would have degree smaller than that of $m$). This is the desired assignment since $c \neq 0$.
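The lemma is easy to stress-test, even over rings with zero-divisors, where it still holds for the $\{0,1\}$ basis. The following sketch (ours) exhaustively verifies the $1/2^d$ bound for random multilinear polynomials over $\mathbb{Z}/6\mathbb{Z}$:

```python
# An exhaustive check (ours) of Lemma 13 over the ring Z/6Z: every nonzero
# multilinear polynomial of degree <= d on n variables is nonzero on at least
# a 1/2^d fraction of 0/1 inputs.
from itertools import combinations, product
import random

n, d, m = 4, 2, 6
monos = [frozenset(c) for r in range(d + 1) for c in combinations(range(n), r)]
for _ in range(200):
    p = {S: random.randrange(m) for S in monos}  # random coefficients in Z/6Z
    if all(c == 0 for c in p.values()):
        continue  # the lemma only applies to nonzero polynomials
    nonzero = sum(
        sum(c for S, c in p.items() if all(x[i] for i in S)) % m != 0
        for x in product([0, 1], repeat=n))
    assert nonzero >= 2 ** n / 2 ** d   # Lemma 13: Pr[p(x) != 0] >= 1/2^d
print("Lemma 13 verified on random polynomials over Z/6Z")
```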

By carefully examining our proof of Theorem 16, we see that we only used the fact that we were working with the basis {0,1} in three ways:

  • We need that substituting a value in for one variable in a monomial decreases the degree of that monomial by at least one; this is true regardless of what basis we are working over.

  • We need that the correct output when the AND is false is 0, and that the correct output when AND is true is any nonzero value; by subtracting b from our polynomial we can always assume this is the case, while changing the number of monomials by at most one and leaving the degree unchanged.

  • We need the Schwartz-Zippel lemma (Lemma 13) to hold for our choice of basis.

If we could generalize Lemma 13 to hold for any basis $\{a, b\}$, then our same proof would work as stated to prove Theorem 3. Unfortunately this is impossible in general: if $a, b \in R$ are such that $a - b$ is a zero-divisor, then Lemma 13 does not hold when we use the $\{a, b\}$ basis. For example, working over $R = \mathbb{Z}/6\mathbb{Z}$, with $(a, b) = (0, 3)$, the single-variable linear polynomial $p(x) = 2x$ is a nonzero polynomial, but is zero on all inputs from the basis.

Although it is possible to salvage this method, at least for most choices of $\{a, b\}$, we will instead proceed in a different way, which does not rely on how our original proof works, except for the fact (as remarked) that Theorem 3 holds when the input basis (the values we feed to the polynomial for true and false) is $\{0,1\}$, and the output basis (the values the polynomial outputs for true and false) is $\{0, b\}$ for any nonzero $b \in R$.

We begin with the input basis $\{0, b\}$ for any nonzero $b \in R$, where true maps to $b$ but false still maps to $0$.

Lemma 47.

For every commutative ring $R$, every nonzero $b \in R$, and all $d \geq 1$ and $e > 0$, there is a $c \geq 1$ such that for all sufficiently large $m$, the AND function on $m$ variables is not in $R\text{-}\mathsf{SDE}_{0,b}[2^{dm/c},\, m/c,\, 2^{-em/c}]$.

Proof.

For a given $d$ and $e$, we make the same choice of $c \geq 1$ as in Theorem 16. Assume to the contrary that AND is in $R\text{-}\mathsf{SDE}_{0,b}[2^{dm/c}, m/c, 2^{-em/c}]$, and let $p$ be the corresponding probabilistic polynomial. We will convert $p$ into an $R\text{-}\mathsf{SDE}_{0,1}[2^{dm/c}, m/c, 2^{-em/c}]$ representation, which will contradict Theorem 16.

Draw a polynomial $q$ from $p$, and consider any nonzero monomial $\mathfrak{m}$ of $q$, with degree $\mathfrak{d}$ and nonzero coefficient $\mathfrak{a}$. If any of the variables in $\mathfrak{m}$ is assigned $0$, then $\mathfrak{m}$ evaluates to $0$, and otherwise it evaluates to $\mathfrak{a} b^{\mathfrak{d}}$.

Consider the new polynomial $q'$, given by

$q'(x) = \sum_{\mathfrak{a}\mathfrak{m} \in q} \mathfrak{a}\, b^{\mathfrak{d}}\, \mathfrak{m},$

where the sum ranges over the monomials $\mathfrak{a}\mathfrak{m}$ of $q$.

In other words, we have multiplied each monomial $\mathfrak{m}$ from $q$ by $b^{\mathfrak{d}}$. Now, for any $x \in \{0,1\}^n$, let $x' \in \{0,b\}^n$ be given by $x'_i = b \cdot x_i$. Then we always have that $q'(x) = q(x')$. Since $x$ corresponds to the same true/false assignment over the $\{0,1\}$ basis as $x'$ does over the $\{0,b\}$ basis, this means that the resulting distribution on $q'$ is a probabilistic polynomial with input basis $\{0,1\}$ and output basis $\{0,b\}$ for AND with the same sparsity, degree, and error as $p$, as desired.
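The rescaling is mechanical. The following toy example (ours; the ring $R = \mathbb{Z}$, the value $b = 3$, and the sample polynomial are arbitrary choices) verifies the identity $q'(x) = q(x')$ on all $0/1$ inputs:

```python
# A tiny worked example (ours, over R = Z with b = 3) of the rescaling in
# Lemma 47: multiplying each monomial of q by b^(its degree) turns a polynomial
# reading inputs from {0, b} into one reading inputs from {0, 1} with
# identical outputs.
from itertools import product

b = 3
q = {frozenset(): 2, frozenset([0]): 1, frozenset([0, 1]): -4}  # sample poly
q_prime = {S: c * b ** len(S) for S, c in q.items()}            # scale by b^deg

def evaluate(poly, x):
    """Evaluate a multilinear polynomial {support: coeff} at the point x."""
    total = 0
    for S, c in poly.items():
        term = c
        for i in S:
            term *= x[i]
        total += term
    return total

for x in product([0, 1], repeat=2):
    x_scaled = [b * xi for xi in x]      # the corresponding {0, b}-basis input
    assert evaluate(q_prime, x) == evaluate(q, x_scaled)
print("q'(x) = q(b*x) for every x in {0,1}^2")
```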

Finally, we generalize to all input bases {a,b}.

Proof of Theorem 3.

For a given $d \geq 1$ and $e > 0$, let $c(d, e)$ be the choice of $c$ which is made by Lemma 47. We will choose $c = c(d+1, e)$ here. Similar to before, our plan is to convert from an $R\text{-}\mathsf{SDE}_{a,b}[2^{dm/c}, m/c, 2^{-em/c}]$ representation to an $R\text{-}\mathsf{SDE}_{0,b-a}[2^{(d+1)m/c}, m/c, 2^{-em/c}]$ representation, and then to invoke Lemma 47.

Assume to the contrary that AND is in $R\text{-}\mathsf{SDE}_{a,b}[2^{dm/c}, m/c, 2^{-em/c}]$, and let $p$ be the corresponding probabilistic polynomial. Draw a polynomial $q$ from $p$, and consider the new polynomial $q'$ given by

$q'(x_1, x_2, \ldots, x_n) = q(x_1 + a,\ x_2 + a,\ \ldots,\ x_n + a).$

Our desired probabilistic polynomial over the $\{0, b-a\}$ input basis will be the resulting distribution on $q'$. The polynomial $q'$ clearly has the same degree as $q$, and errs on the same true/false inputs as $q$.

Consider any nonzero monomial $\mathfrak{m}$ of $q$, with degree $\mathfrak{d}$ and nonzero coefficient $\mathfrak{a}$. Each variable in $\mathfrak{m}$ has been replaced by the sum of two terms, and so $\mathfrak{m}$ has been replaced by at most $2^{\mathfrak{d}}$ monomials when expanded out. Since $p$ had degree at most $m/c$, we have that $2^{\mathfrak{d}} \leq 2^{m/c}$, and so since $q$ had at most $2^{dm/c}$ monomials, we have that $q'$ has at most $2^{dm/c} \cdot 2^{m/c} = 2^{(d+1)m/c}$ monomials. Our probabilistic polynomial is therefore an $R\text{-}\mathsf{SDE}_{0,b-a}[2^{(d+1)m/c},\, m/c,\, 2^{-em/c}]$ representation, as desired.
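The monomial blow-up is easy to check directly. The sketch below (ours, with the arbitrary choices $R = \mathbb{Z}$, shift $a = 2$, and a toy $q$) expands $q(x + a)$ via $\prod_{i \in S}(x_i + a) = \sum_{T \subseteq S} a^{|S|-|T|}\, x_T$ and confirms the $2^{\mathfrak{d}}$ factor:

```python
# A short check (ours, over R = Z) of the monomial blow-up in the proof of
# Theorem 3: substituting x_i + a into a multilinear polynomial multiplies the
# monomial count by at most 2^degree.
from itertools import combinations

a = 2
q = {frozenset([0, 1, 2]): 5, frozenset([1]): -1}   # sparsity 2, degree 3

shifted = {}
for S, c in q.items():
    for T in (frozenset(t) for r in range(len(S) + 1)
              for t in combinations(sorted(S), r)):
        # the monomial prod_{i in S}(x_i + a) contributes a^{|S|-|T|} * x_T
        shifted[T] = shifted.get(T, 0) + c * a ** (len(S) - len(T))

deg = max(len(S) for S in q)
print(len(shifted), "monomials <= bound", len(q) * 2 ** deg)
assert len(shifted) <= len(q) * 2 ** deg
```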

We note finally that, although we have been limiting ourselves to the same input basis $\{a, b\}$ for all of the $n$ variables, we could actually choose a different input basis for each variable (and a separate output basis as usual), and the above proof still holds.