Sparsity Lower Bounds for Probabilistic Polynomials
Abstract
Probabilistic polynomials over commutative rings offer a powerful way of representing Boolean functions. Although many degree lower bounds for such representations have been proved, sparsity lower bounds (counting the number of monomials in the polynomials) have not been so common. Sparsity upper bounds are of great interest for potential algorithmic applications, since sparse probabilistic polynomials are the key technical tool behind the best known algorithms for many core problems, including dense All-Pairs Shortest Paths, and the existence of sparser polynomials would lead to breakthrough algorithms for these problems.
In this paper, we prove several strong lower bounds on the sparsity of probabilistic and approximate polynomials computing Boolean functions when $0$ means "false". Our main result is that the AND of ORs of variables requires probabilistic polynomials (over any commutative ring which isn't too large) of high sparsity to achieve even modest error. The lower bound is tight, and it rules out a large class of polynomial-method approaches for refuting the APSP and SETH conjectures via matrix multiplication. Our other results include:
- Every probabilistic polynomial (over a commutative ring) for the disjointness function on two $n$-bit vectors requires exponential sparsity in order to achieve exponentially low error.
- A generic lower bound: any function requiring probabilistic polynomials of degree $d$ must require probabilistic polynomials of sparsity $2^{\Omega(d)}$.
- Building on earlier work, we consider the probabilistic rank of Boolean functions, which generalizes the notion of sparsity for probabilistic polynomials, and prove separations of probabilistic rank and probabilistic sparsity.
Some of our results and lemmas are basis independent. For example, over any basis $\{\alpha, \beta\} \subseteq R$ for true and false where $\alpha \neq \beta$, and any commutative ring $R$, the AND function on $n$ variables has no probabilistic $R$-polynomial with low sparsity, low degree, and low error simultaneously. This AND lower bound is our main technical lemma used in the above lower bounds.
Keywords and phrases: Probabilistic Polynomials, Sparsity, Orthogonal Vectors, Probabilistic Rank
Funding: Josh Alman: Work supported in part by NSF Grant CCF-2238221.
2012 ACM Subject Classification: Theory of computation → Algebraic complexity theory; Theory of computation → Models of computation; Mathematics of computing → Discrete mathematics
Editors: Raghu Meka
1 Introduction
Let $R$ be a ring. The sparsity of a polynomial $p$ is the number of monomials of $p$ with nonzero coefficients. A probabilistic polynomial $P$ in $n$ variables is a distribution on $n$-variate polynomials. Its degree and sparsity are the maximum degree and sparsity, respectively, of any polynomial in its support. The error of $P$ on a function $f$ is the function from $\{0,1\}^n$ to $[0,1]$ defined as $x \mapsto \Pr_{p \sim P}[p(x) \neq f(x)]$.
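To make these definitions concrete, here is a minimal Python sketch (ours, not from the paper; the sparse dictionary representation and all names are illustrative) of a probabilistic polynomial over $\mathbb{Z}$ and its three measures:

```python
import itertools
from fractions import Fraction

# A polynomial over Z, represented sparsely: frozenset of variable
# indices -> nonzero coefficient. (Multilinear monomials lose nothing
# on {0,1} inputs, where x^2 = x.)
def evaluate(poly, x):
    return sum(c for mono, c in poly.items() if all(x[i] for i in mono))

def sparsity(poly):
    return len(poly)                                  # number of monomials

def degree(poly):
    return max((len(mono) for mono in poly), default=0)

# A probabilistic polynomial: a list of (probability, polynomial) pairs.
# Its error at x is the probability of drawing a p with p(x) != f(x).
def error_at(dist, f, x):
    return sum(pr for pr, poly in dist if evaluate(poly, x) != f(x))

def worst_case_error(dist, f, n):
    return max(error_at(dist, f, x)
               for x in itertools.product((0, 1), repeat=n))

n = 3
exact_and = {frozenset(range(n)): 1}                  # x1*x2*x3
dist = [(Fraction(1), exact_and)]
print(sparsity(exact_and), degree(exact_and),
      worst_case_error(dist, lambda x: int(all(x)), n))   # 1 3 0
```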
Probabilistic polynomials with low sparsity are the main technical component in a number of recent randomized algorithms for core problems. This includes the best known randomized algorithm for the dense case of All-Pairs Shortest Paths (APSP) [36], the best known randomized algorithm for Orthogonal Vectors (OV) [1], the best known deterministic algorithms for these problems (which derandomizes the aforementioned probabilistic polynomials) [11], as well as Constraint Satisfaction problems [37, 15], All-Pairs Nearest Neighbor problems [3, 2], systems of polynomial equations over finite fields [23], online matrix-vector multiplication [22], and Stable Matching [26]. APSP and OV are especially of interest, as substantially faster algorithms for them would have many applications (see the survey of Vassilevska-Williams [38]). Moreover, it has become popular to conjecture that it is impossible to improve the best-known runtimes of these problems by polynomial factors.
For all of these algorithms, the running time improvement is completely determined by the sparsity of a certain probabilistic polynomial: if we could find a sparser polynomial, we would directly improve the runtime. This leads to the main motivating question of this paper:
Is it possible to design sparser probabilistic polynomials to speed up these algorithms further, and in particular, can we design polynomials sparse enough to refute some of the popular conjectures?
In this paper, we prove unconditionally that several known probabilistic polynomial constructions are already optimal, in a broad sense. The particular Boolean functions of interest for APSP and OV are as follows.
For a function $f$ on pairs of vectors, the Boolean function of interest takes many pairs of vectors as input and outputs the OR of $f$ over all the pairs.

The best known algorithm for OV [1] uses a probabilistic polynomial representation of this function with low sparsity to quickly check the orthogonality of many pairs of vectors. The OV problem for input vectors of dimension $c \log n$ could be solved in truly subquadratic time if a sufficiently sparse probabilistic polynomial could be designed for this function. In particular, if there were a polynomial with sparsity $n^{\delta}$ (for some universal $\delta$) that worked for every $c$, that would yield a fast enough algorithm for OV to refute the Strong Exponential Time Hypothesis (SETH) [18, 10]. For completeness, we prove this in Theorem 27 in Appendix B.
Chan and Williams [36, 11] showed that this function can be used to solve APSP as well. Their algorithm, the best known randomized algorithm for APSP in dense graphs, crucially uses a probabilistic polynomial representation of the function with low sparsity, in order to get an $n^3 / 2^{\Omega(\sqrt{\log n})}$ time algorithm. If the sparsity could be improved sufficiently, then APSP would be solvable in truly subcubic time, an algorithmic breakthrough.
Our main result shows that in a broad sense, the sparsities of the known probabilistic polynomials for are already optimal, up to a constant factor in the exponent:
Theorem 1.
For every choice of parameters in the appropriate range, and every finite commutative ring $R$ such that $|R|$ is not too large, $R$-probabilistic polynomials with the given error for the AND of ORs which use the $0$ of $R$ to represent "false" require high sparsity.
Theorem 1 is quite powerful, as it holds for all finite commutative rings rather than just finite fields. For instance, many simple functions like OR are known to have smaller polynomial representations when working modulo a composite number rather than over a finite field [6], but we rule out a sparsity improvement over such rings as well.
Theorem 1 also rules out sparser representations over larger rings (even infinite rings like $\mathbb{Z}$) without very large coefficients. For instance, given a sparse probabilistic polynomial over the integers, we could compute all of its coefficients modulo a suitable modulus $m$ to get a probabilistic polynomial over $\mathbb{Z}_m$ with at most the same degree, sparsity, and error. Even if we are working over a commutative ring $R$ where such a trick doesn't work, it takes $\Omega(\log |R|)$ bits to describe elements of $R$ when $R$ is large. Working with such large coefficients is prohibitive when trying to solve OV or APSP, since it would presumably multiply the runtime by such a large polynomial factor, unless polynomial evaluation techniques substantially different from the current known techniques (FFT and fast matrix multiplication) are used.
The (only) caveat of our lower bound in Theorem 1 is that it relies on using the $0$ of $R$ for false (which is very natural when working with $0$ and $1$). We leave open the problem of proving a "basis independent" lower bound (where false can correspond to any value in $R$).
A General Lower Bound for AND and OR
Our main technical tool is a general (basis independent) lower bound for probabilistic polynomials representing the AND and OR functions, which is also of independent interest. The celebrated probabilistic polynomial constructions of Razborov and Smolensky, which are the key component behind the aforementioned probabilistic polynomials in many recent algorithms (including APSP and OV), yield:
Theorem 2 ([29, 32]).
For all primes $p$ and integers $t \geq 1$, OR and AND on $n$ variables have $\mathbb{F}_p$-probabilistic polynomials of degree $t(p-1)$, sparsity $O(n^{t(p-1)})$, and error $2^{-t}$.
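The construction is simple enough to state in code. The following Python sketch (ours) implements our understanding of the Razborov-Smolensky probabilistic polynomial for OR over $\mathbb{F}_p$, namely $1 - \prod_{j=1}^{t}(1 - (\sum_{i \in S_j} x_i)^{p-1})$ for uniformly random subsets $S_1, \ldots, S_t$; all parameter values below are illustrative.

```python
import random

def rs_or_polynomial(n, t, p):
    """Sample the Razborov-Smolensky probabilistic polynomial for OR_n
    over F_p, returned as an evaluator. Degree is t*(p-1)."""
    subsets = [[i for i in range(n) if random.random() < 0.5] for _ in range(t)]
    def evaluate(x):
        acc = 1
        for S in subsets:
            s = sum(x[i] for i in S) % p
            acc = (acc * (1 - pow(s, p - 1, p))) % p   # Fermat: s^(p-1) = [s != 0]
        return (1 - acc) % p
    return evaluate

# On the all-zeros input the polynomial is always 0; on any x with some
# x_i = 1, each factor vanishes with probability >= 1/2, so error <= 2^-t.
random.seed(0)
n, t, p = 30, 10, 3
x = [0] * n; x[7] = 1
errs = sum(rs_or_polynomial(n, t, p)(x) != 1 for _ in range(2000))
print(errs / 2000)   # empirically close to 0, consistent with error <= 2^-t
```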
Using this construction, there are $\mathbb{F}_p$-probabilistic polynomials for AND on $n$ variables with low values for any two choices of the sparsity, degree, and error measures:
- Low sparsity and low degree (but constant error) follows from Theorem 2 with constant $t$.
- Low degree and low error (but high sparsity) follows from Theorem 2 with large $t$.
- Sparsity $1$ and error $0$ (but degree $n$) is achieved by the exact polynomial $\prod_{i=1}^{n} x_i$.
A natural question arises: is there a probabilistic polynomial for AND with low degree, low sparsity, and low error simultaneously?
Question 1.
Is there a probabilistic polynomial for AND on $n$ variables, over any commutative ring, which has low degree, low error, and low sparsity simultaneously?
While Question 1 is fundamental in its own right, a positive answer would also contradict a non-uniform version of SETH, by using it to construct sparse probabilistic polynomials for the disjointness function; see the discussion in Appendix B for more details. In constructing these polynomials, there is a choice of which values in our ring map to true and false. We could use $\{0, 1\}$, or $\{-1, 1\}$, or even more exotic bases [16]. While the degrees of polynomials do not change under such transformations, it is known that in general, the sparsity of polynomials can be very sensitive to this map [21]. Could this be exploited to refute SETH?
We unconditionally prove a strong negative answer to Question 1:
Theorem 3.
For every commutative ring $R$, every distinct $\alpha, \beta \in R$, and all sparsity, degree, and error bounds in the appropriate range, there is a constant such that for all sufficiently large $n$, the AND function on $n$ variables does not have an $R$-probabilistic polynomial, where $\alpha$ represents false and $\beta$ represents true, of the given sparsity, degree, and error.
Indeed, a polynomial giving a yes answer to Question 1, upon setting all but $n$ of its variables to 1 (or whichever element of the ring corresponds to true), would contradict Theorem 3, since sparsity, degree, and error cannot increase upon setting variables. We emphasize that Theorem 3 holds over any commutative ring $R$ and any choice of basis $\{\alpha, \beta\}$.
Further Results
Our technical lemmas can be applied to yield several sparsity lower bounds for various algebraic representations of natural and well-studied Boolean functions. First, we give an exponential sparsity-error tradeoff lower bound for probabilistic polynomials for the disjointness function and the "complements" function. Informally, the former determines whether two given bit-vectors have zero inner product, and the latter determines whether two given bit-vectors are binary complements of each other; it is very closely related to the equality function (see Section 2 for formal definitions).
Theorem 4.
For every commutative ring $R$ and constants in the appropriate range, there is a constant such that the disjointness of two $n$-bit vectors does not have an $R$-probabilistic polynomial of subexponential sparsity and subexponentially small error which uses the $0$ of $R$ to represent "false".
Theorem 5.
Theorem 4 also holds with disjointness replaced by the complements function.
That is, we cannot obtain a probabilistic polynomial for disjointness with subexponential sparsity and subexponentially small error simultaneously, regardless of the degree. (Observe that the exact representation of disjointness of two $n$-bit vectors has exponential sparsity and zero error.)
In the appendices, we also apply the above results in a number of other ways.
In Appendix A, we give a simple equivalence between probabilistic polynomials and certain depth-3 circuits, which makes some prior work relevant. In Appendix B, we discuss how sparse probabilistic polynomials for simple functions would refute SETH:
Theorem 6 (Informal; see Theorems 24 and 27).
For any commutative ring $R$, either of the following would refute SETH:
- A low-degree, low-sparsity, and low-error $R$-probabilistic polynomial for AND (over $\{0,1\}$), or
- An $R$-probabilistic polynomial for disjointness on $c \log n$-bit vectors with small polynomial sparsity and constant error, for any fixed integer $c$.
In Appendix C, we investigate probabilistic polynomials over the basis $\{-1,1\}$ instead of $\{0,1\}$. While the degree of a polynomial does not depend on the basis, the sparsity can be very sensitive to such a change, and we give examples where probabilistic polynomial sparsity lower bounds in the $\{0,1\}$ setting do not hold in the $\{-1,1\}$ setting. Building on our lower bound Theorem 5 over the $\{0,1\}$ basis, we prove:
Theorem 7.
Over any commutative ring which doesn't have characteristic $2$, the complements function has a probabilistic polynomial with polynomial sparsity and polynomially small error over the basis $\{-1,1\}$, and does not over the basis $\{0,1\}$.
In Appendix D, we consider a notion related to the probabilistic sparsity of a function, introduced by [4], called the probabilistic rank. The probabilistic rank of a function measures the extent to which the truth table matrix, or communication matrix, of can be probabilistically represented by low rank matrices. Probabilistic sparsity upper bounds give corresponding probabilistic rank upper bounds, a fact which is used by the aforementioned “polynomial method” algorithms to reduce polynomial evaluation to fast matrix multiplication. For some functions, the probabilistic rank could potentially be much smaller than the probabilistic sparsity. This is interesting, both for the prospect of being able to design faster algorithms using rank instead of sparsity, and given the connection between probabilistic rank and the matrix rigidity problem [4]. In Appendix D, we prove a separation between the two notions in a number of settings over , including showing that the two notions can be arbitrarily far apart for a natural function:
Theorem 8.
For appropriate parameters, the $\epsilon$-probabilistic sparsity of $\mathsf{MAJ} \circ \mathsf{XOR}$, where the $\mathsf{MAJ}$ has fan-in $n$ and each $\mathsf{XOR}$ has fan-in $k$, is strictly greater than its $\epsilon$-probabilistic rank. In particular, as $k$ increases, the $\epsilon$-probabilistic sparsity grows unboundedly while the $\epsilon$-probabilistic rank remains fixed.
We note that there are more straightforward constructions of functions which separate probabilistic sparsity and probabilistic rank, such as any function of large probabilistic sparsity which depends only on the first half of its input bits and thus has rank 1. However, the functions which are most interesting in algorithmic applications typically depend on both the first and second halves of the input variables similarly.
In Appendix E, we observe how bounds on one of the sparsity or degree of probabilistic polynomials for a function can give bounds on the other as well. We notably show that, if a function requires probabilistic degree $d$, then it must require probabilistic sparsity $2^{\Omega(d)}$. Hence, probabilistic degree lower bounds, which are more common in the literature, imply probabilistic sparsity lower bounds as well. We also relate probabilistic degree to probabilistic rank in a better-than-trivial way: any $n$-variate Boolean function with a probabilistic polynomial of degree $d$ and error $\epsilon$ has $\epsilon$-probabilistic rank substantially smaller than the trivial bound.
Intuition for our Results
Our main proof technique, which we use to prove Theorem 3, and then again in some of our additional results, is a novel combination of two well-known ideas from the literature on lower bounds for Boolean functions and their polynomial representations.
The first idea is carefully chosen random restrictions. Random restrictions are convenient for studying probabilistic polynomial sparsity, since we can study each monomial of a probabilistic polynomial separately, and analyze the distribution of its resulting degree after applying a restriction. We give random restrictions that do not restrict too many variables, but that substantially lower the degree of any sparse polynomial.
The second idea is a variant of the Schwartz-Zippel Lemma (Lemma 13 below), which roughly says that nonzero low degree multivariate polynomials cannot vanish on almost all of the Boolean hypercube. This is especially interesting, for instance, when applied to the AND function, which is nonzero at only one Boolean point; it says that low degree probabilistic polynomials for AND must have high error.
Our proof strategy combines these two ideas in a new way. Given a sparse probabilistic polynomial for with low degree and error, we design a random restriction so that the restriction of at least one polynomial in the support of the probabilistic polynomial violates the Schwartz-Zippel Lemma, giving a contradiction. Previous work using random restrictions to get sparsity lower bounds for other types of polynomials has typically combined the random restrictions with simple degree lower bounds. Here, we need some novel technical arguments in order to apply the Schwartz-Zippel Lemma with the trade-off between the error and degree of the resulting restriction.
Of course, our description here (necessarily) glosses over many important details. In addition to random restrictions and the Schwartz-Zippel Lemma, our techniques use multiparty communication complexity, algebraic results about commutative rings (especially in our proof that Theorem 3 holds over any basis), and prior work in the area of polynomial representations of Boolean functions. Our proof extending Theorem 3 to any basis, which we give in Appendix G, is very general, and could be of independent interest for proving basis-independent versions of other results. It shows that for any function $f$, a simultaneous lower bound on the degree and sparsity of a probabilistic polynomial for $f$ over any one basis implies a sparsity lower bound over any other basis as well.
Prior Work
Work on lower bounds for a particular kind of depth-three circuit (namely, approximate-majority-of-PARITY-of-AND circuits) can be seen as lower bounds on the sparsity of probabilistic polynomials (see Appendix A for an overview of the simple connection). For example:
- Razborov and Wigderson [30] showed that a simple depth-three function requires such circuits of size $n^{\Omega(\log n)}$. This implies, for any fixed prime $p$, a sparsity/degree/error lower bound for probabilistic polynomials computing this function.
- Chattopadhyay [12] gave a function requiring exponential-size circuits when the bottom fan-in is bounded; this corresponds to a degree lower bound for probabilistic polynomials computing that function.
- Beame and Huynh [8] give a function computed by a small depth-9 circuit that needs such circuits of large size, corresponding to a probabilistic sparsity lower bound for this function.
- Sherstov [31] gives a depth-3 circuit with a tighter circuit size lower bound: essentially, if the bottom fan-in is small enough, then the circuit size must be exponential.
Our Lemma 20, which we prove on the way to proving Theorem 1, can be viewed as extending this line of work by showing a probabilistic -sparsity lower bound for a depth-2 circuit.
Polynomial sparsity has been studied in a stronger setting, where the polynomials in question only need to correlate with a target function rather than probabilistically represent , by Lovett and Srinivasan [24]. Their main result is that polynomials over with monomials have exponentially low correlation with the function. Note that a sparse probabilistic polynomial for a Boolean function implies the existence of a sparse polynomial with high correlation with , but not vice versa. Hence, Lovett and Srinivasan’s lower bound applies to probabilistic polynomials as well. Their technique has similarities to some of ours – they also use a type of variable restriction to reduce low sparsity polynomials to low degree polynomials – but their “tree restriction” technique and analysis isn’t powerful enough to apply in our setting where, as we will see, we are constrained in how we are allowed to restrict our variables based on the function we are computing.
Sparsity has also been studied in the setting of polynomial threshold functions, where the sign of a polynomial dictates its output (as a Boolean function). Krause and Pudlák [21, 20] give a function that has exponentially-high sparsity as a polynomial threshold function in the $\{0,1\}$ basis, but polynomially-low sparsity over $\{-1,1\}$. Other references on sparsity in the polynomial threshold function setting include Basu et al. [7], O'Donnell and Servedio [28], and Hansen and Podolskii [16]. It should be noted that these polynomial threshold function sparsity lower bounds do not suffice to prove probabilistic polynomial sparsity lower bounds: they correspond to lower bounds against certain depth-2 circuits, compared to our lower bounds against probabilistic polynomials, which correspond to certain depth-3 circuits.
The degrees of probabilistic polynomials are more well-studied, starting with the work of Razborov and Smolensky [29, 32], although even some basic questions remain open. For instance, while the OR function is known to have constant-degree probabilistic polynomials for constant error over any fixed finite field, it is unknown what degree it requires over the reals: there is a gap between the best upper bound [9, 34, 5] and the best lower bound [25, 17]. Our connection between sparsity and degree, which we prove in Appendix E, makes use of the probabilistic degree of OR, and is thus only (close to) tight over a finite field.
2 Preliminaries
Definitions and Notation
As usual, all logarithms are assumed to be base-two. For gates $G$ and $H$, we write $G \circ H$ to denote the circuit whose output gate is a $G$ whose inputs are all copies of $H$ on disjoint variables. We write $\mathsf{AND}_n$, $\mathsf{OR}_n$, and $\mathsf{XOR}_n$ to denote the AND, OR, and XOR functions on $n$ inputs, respectively, and $\mathsf{MAJ}_n$ is the majority function which computes whether at least half of its $n$ inputs are $1$.
Let $f$ be a decision problem, construed as an infinite family of Boolean functions (one function $f_n : \{0,1\}^n \to \{0,1\}$ for each input length $n$). To facilitate the presentation, we define a (non-uniform) complexity class for problems which are computable by sparse, low-degree, and low-error probabilistic polynomials over a ring $R$:
Definition 9.
A decision problem $f$ is in the class $R$-$\mathsf{SDE}[s, d, \epsilon]$ if for all but finitely many $n$, there is a probabilistic $R$-polynomial for $f_n$ with sparsity at most $s(n)$, degree at most $d(n)$, and error at most $\epsilon(n)$. We analogously define $R$-$\mathsf{SD}[s, d]$ and $R$-$\mathsf{DE}[d, \epsilon]$ when there is no bound on the error or the sparsity, respectively, and versions for probabilistic polynomials over the basis $\{\alpha, \beta\}$ where "false" means $\alpha$ and "true" means $\beta$ (when omitted, the default is that $\alpha = 0$ and $\beta = 1$).
With this notation, we can succinctly state results about probabilistic polynomials for Boolean functions.
Many of our lower bounds concern two different Boolean functions: the disjointness function and the complements function.
Definition 10.
For vectors $x, y \in \{0,1\}^n$, the disjointness function is given by $\mathsf{DISJ}(x, y) = \bigwedge_{i=1}^{n} (x_i \vee y_i)$. (Note that this definition negates the variables compared to how the disjointness function often appears in the literature.)
Definition 11.
For a vector $x \in \{0,1\}^n$, we write $\bar{x}$ to denote the complement of $x$, given by $\bar{x}_i = 1 - x_i$ for each $i$. The equality function is given by, for $x, y \in \{0,1\}^n$, $\mathsf{EQ}(x, y) = 1$ if $x = y$ and $\mathsf{EQ}(x, y) = 0$ otherwise. In other words, $\mathsf{EQ}(x, y) = \bigwedge_{i=1}^{n} \lnot(x_i \oplus y_i)$.
The complements function is given by, for $x, y \in \{0,1\}^n$, $\mathsf{COMP}(x, y) = \mathsf{EQ}(x, \bar{y})$. In other words, $\mathsf{COMP}(x, y) = \bigwedge_{i=1}^{n} (x_i \oplus y_i)$.
Chernoff Bound
In a few of our proofs, we will use a standard Chernoff bound for sums of independent Bernoulli random variables.
Lemma 12 (Chernoff bound).
If $X_1, \ldots, X_n$ are independent Bernoulli random variables with sum $X$, and $\mu$ is the expected value of $X$, then for all $0 \leq \delta \leq 1$ we have: $\Pr[|X - \mu| \geq \delta \mu] \leq 2 e^{-\delta^2 \mu / 3}$.
Useful Facts About Polynomials
We state two key facts from past work which we will use in the proofs of our main results. First, we will need the following variant of the Schwartz-Zippel-DeMillo-Lipton Lemma (which appears, for instance, in [27, Lemma 2.6]).
Lemma 13 (The 0-1 Schwartz-Zippel Lemma).
Let $p$ be a nonzero multilinear $n$-variate polynomial over any commutative ring $R$, of total degree at most $d$. Then $p(x) \neq 0$ for at least $2^{n-d}$ points $x \in \{0,1\}^n$.
A proof can be found in Appendix G for completeness. Note that Lemma 13 is tight for $p(x) = x_1 x_2 \cdots x_d$, which is nonzero on exactly $2^{n-d}$ Boolean points. Second, we need a simple lower bound on the probabilistic degree of $\mathsf{AND} \circ \mathsf{OR}$ circuits which can be derived from a communication complexity lower bound. We refer the reader to the introduction of Sherstov's paper [31] for the relevant definitions (of the number-on-forehead communication model, and the $k$-party set disjointness problem), as we will only need the statement of Corollary 15 for this paper.
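As a sanity check, the following short brute-force computation (ours, not part of the paper) verifies the tightness claim on a small instance:

```python
import itertools

def nonzero_count(poly, n):
    # poly: dict mapping frozenset of variable indices -> integer coefficient
    cnt = 0
    for x in itertools.product((0, 1), repeat=n):
        val = sum(c for mono, c in poly.items() if all(x[i] for i in mono))
        if val != 0:
            cnt += 1
    return cnt

n, d = 6, 3
# The monomial x_1 x_2 x_3 has degree d and is nonzero on exactly 2^(n-d)
# Boolean points, witnessing that the bound of Lemma 13 is tight.
tight = {frozenset(range(d)): 1}
print(nonzero_count(tight, n), 2 ** (n - d))   # both 8
```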
Theorem 14 ([31] Theorem 1.1).
The number-on-forehead communication complexity of $k$-party set disjointness on $n$ elements with error $1/3$ is at least $\Omega(\sqrt{n} / (2^k k))$.
Corollary 15.
For every $\delta > 0$ and every error bound, there is a constant $c > 0$ such that: for every sufficiently large positive integer $n$, every positive integer $k$ with $k$ sufficiently small compared to $n$, and every finite commutative ring $R$ with $|R|$ not too large, the function $\mathsf{AND} \circ \mathsf{OR}$, where the $\mathsf{AND}$ has fan-in $n$ and each $\mathsf{OR}$ has fan-in $k$, does not have a probabilistic polynomial of the given error and degree less than $k$.
Proof (sketch).
From such a polynomial, we can design a low-error number-on-forehead communication protocol for $k$-party set disjointness: draw a polynomial from the probabilistic polynomial, then map each monomial to a player none of whose variables appear in that monomial (such a player exists since the degree is less than $k$). Each player can evaluate their monomials and report the sum, from which the players can compute set disjointness with the desired error. Each player only needs to communicate one element of $R$, using $O(\log |R|)$ bits of communication, so the total communication is $O(k \log |R|)$. If this is below the bound of Theorem 14, which happens when $k$ and $|R|$ are sufficiently small, then this contradicts Theorem 14.
3 Sparsity Lower Bounds for Probabilistic Polynomials
We begin by proving a probabilistic polynomial lower bound for AND:
Theorem 16.
For every commutative ring $R$ and all parameter settings in the appropriate range, there is a constant such that for all sufficiently large $n$, the AND function on $n$ variables is not in the corresponding class $R$-$\mathsf{SDE}$.
For notational simplicity, we use a substitution of parameters in our proof:

Restatement of Theorem 16. For every commutative ring $R$ and all parameter settings in the appropriate range, there is a constant such that for all sufficiently large $n$, the AND function on $n$ variables is not in the corresponding class $R$-$\mathsf{SDE}$.
Theorem 16 is a special case of Theorem 3, restricted to when the basis is $\{0, 1\}$. Theorems 4 and 5 will follow from it. We will later prove the more general Theorem 3, which holds for any basis, in Appendix G. In other words, we are currently focusing on the $\{0, 1\}$ basis, but we later generalize this statement to hold over any basis. Before we give the proof, let us show that Theorem 16 implies Theorems 4 and 5.
Proof of Theorem 4.
Let $P$ be a probabilistic polynomial representation of the disjointness function. From each of the $n$ $\mathsf{OR}$ gates, pick at random one variable feeding into the gate and fix it to $0$. Thus, for each $p$ in the support of $P$, every monomial in $p$ of degree at least $d$ survives with probability at most $2^{-d/2}$: a monomial containing both variables of some gate is always killed, and a monomial touching $j$ gates (here $j \geq d/2$) survives with probability at most $2^{-j}$. The probability that $p$ under this restriction has degree greater than $d$ is therefore at most $s \cdot 2^{-d/2}$, where $s$ is the sparsity of $P$. We set $d$ appropriately for the contradiction below.

Note that under this restriction, the disjointness function reduces to the AND of the surviving variables. Thus, we have constructed a probabilistic polynomial for AND of degree at most $d$ with error at most $\epsilon + s \cdot 2^{-d/2}$. If $n$ is large enough, this contradicts Theorem 16. Hence our assumption about the existence of $P$ must be false.
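To illustrate the survival calculation in this proof, here is a small Monte Carlo sketch (ours; the variable naming and parameters are illustrative) of the restriction that sets one random input of each OR gate to 0:

```python
import random

def survival_probability(monomial, n, trials=100_000):
    """Estimate the probability that a monomial over variables
    {('x', i), ('y', i)} survives the restriction that, for each gate i,
    sets one of x_i, y_i (chosen uniformly) to 0."""
    random.seed(1)
    alive = 0
    for _ in range(trials):
        killed = {(random.choice('xy'), i) for i in range(n)}
        alive += killed.isdisjoint(monomial)
    return alive / trials

n = 20
mono = {('x', i) for i in range(8)}           # degree 8, one variable per gate
print(survival_probability(mono, n), 2.0 ** -8)  # ~ 2^-8
mono2 = {('x', 0), ('y', 0)}                  # both variables of one gate
print(survival_probability(mono2, n))         # 0: always killed
```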
Proof of Theorem 5.
The proof is identical, since restricting one input of each $\mathsf{XOR}$ gate to $0$ reduces the complements function to AND as well.
We now begin proving Theorem 16. We start with a simple case to illustrate our use of the 0-1 Schwartz-Zippel Lemma (Lemma 13):
Lemma 17.
For every commutative ring $R$ and all degree and error bounds in the appropriate range, there is a constant such that for all sufficiently large $n$, the AND function on $n$ variables is not in the corresponding class $R$-$\mathsf{DE}$.
Proof.
Let $P$ be a probabilistic $R$-polynomial with error at most $\epsilon$ computing the AND of $n$ bits, such that every polynomial in the distribution has degree at most $d$.

First we claim that, without loss of generality, the support of $P$ may be assumed not to contain the identically zero polynomial. Since $P$ is a probabilistic polynomial for AND with error at most $\epsilon$, on the all-ones point a sampled polynomial must output $1$ with probability at least $1 - \epsilon$. Therefore, if the zero polynomial is in the support of $P$, it must have probability at most $\epsilon$ of being chosen. Hence if we simply replace it in $P$ with the polynomial which is identically the constant $1$, the new probability of error is at most $2\epsilon$.

Next, we prove there is no distribution satisfying the above properties when $\epsilon$ is small enough. Otherwise, by fixing the randomness in $P$ (i.e., by picking any polynomial in the support of $P$ which achieves at most the average error), we can identify a fixed polynomial $p$ of degree at most $d$ which disagrees with the AND function on at most a $2\epsilon$ fraction of inputs. By the previous paragraph, $p$ is not identically zero. Since the AND of $n$ variables is nonzero on only one point in $\{0,1\}^n$, it follows that the polynomial $p$ is nonzero on at most $2\epsilon \cdot 2^n + 1$ points in $\{0,1\}^n$. Hence the nonzero degree-$d$ polynomial $p$ satisfies

$|\{x \in \{0,1\}^n : p(x) \neq 0\}| \leq 2\epsilon \cdot 2^n + 1,$

contradicting Lemma 13 when $2\epsilon \cdot 2^n + 1 < 2^{n-d}$.
Before we turn to the harder part of the proof, we sketch the main ideas. Our first step is to hit a presumed probabilistic polynomial for AND with a random restriction that sets a random subset of the inputs to $1$. The probabilities are chosen carefully so that the degree of the polynomial decreases considerably, yet the number of surviving variables is not too small, with high probability. That is, we shrink the degree of the polynomial, but we do not force it to a constant. Next, we select a non-zero polynomial from the remaining probabilistic polynomial distribution, and argue that with high probability, its degree has dropped significantly, but its error on the AND function remains small. This contradicts Lemma 13, which says that a non-zero polynomial of degree $d$ must be non-zero on at least $2^{n-d}$ points. While the general idea of the proof is not too complicated, we need to be careful with the details, and choose parameters very particularly so that all of the required constraints hold. Let us now give the details:
Proof of Theorem 16.
Assume there is a probabilistic polynomial $P$ over $R$ for AND such that, for all sufficiently large $n$, the sparsity is at most $s$, the degree is at most $d$, and the error is at most $\epsilon$. As in the proof of Lemma 17, we may assume that the zero polynomial is not in the support of $P$. Let $q$ and $d'$ be real-valued parameters to be set later (the required inequalities on them are satisfied in Lemma 46 in Appendix F).
Consider the following construction of a probabilistic polynomial $P'$ for AND on fewer bits (a small numerical illustration of step 2's effect on a single monomial follows this list):
- Given an input $x$, sample $p \sim P$.
- For each variable, independently with probability $1 - q$, assign $1$ to it in the polynomial $p$. Let $p'$ be the remaining polynomial in the surviving variables.
- If enough variables survive and $p'$ has degree at most $d'$, then output $p'$ evaluated with the restricted positions fixed to ones.
- Otherwise, output a random bit.
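Here is a minimal Python sketch (ours, not from the paper) of the key distributional fact used below: after step 2, a monomial of degree $k$ keeps each of its variables independently with probability $q$, so its surviving degree is Binomial($k, q$), whose upper tail the proof controls with the Chernoff bound (Lemma 12). All parameter values are illustrative.

```python
import random

def surviving_degrees(k, q, trials=200_000):
    """Each variable of a degree-k monomial survives the restriction
    independently with probability q (otherwise it is set to 1), so the
    surviving degree is Binomial(k, q)."""
    random.seed(2)
    return [sum(random.random() < q for _ in range(k)) for _ in range(trials)]

k, q, threshold = 40, 0.25, 20
degs = surviving_degrees(k, q)
print(sum(degs) / len(degs))                         # mean ~ k*q = 10
print(sum(d > threshold for d in degs) / len(degs))  # Chernoff-small tail
```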
Our central claim is that, for an appropriate setting of the parameters, the probabilistic polynomial $P'$ has degree at most $d'$ and computes the AND of the surviving bits with small error. The degree bound and AND functionality follow by construction; we need to bound the error. This will follow from positing a series of inequalities in the analysis that imply a small error bound, then proving at the end that the inequalities can be satisfied.
Define err to be the probability that case 4 is reached in the above procedure, where the probability is over the sampling of $p$ (in step 1) and the random restriction (in step 2). Observe that the error of $P'$ is at most the sum of two terms: the probability of error inherited from $P$, which is at most $\epsilon$, and the probability err of reaching case 4.
We now claim that err is small when the parameters are set appropriately. In particular, provided that the two parameter inequalities

(1)

(2)

hold (their precise forms are arranged in Lemma 46 in Appendix F), we will have that both failure probabilities from step 3 are suitably small.
Observe that the number of variables surviving in $p'$ can be seen as a sum of $n$ independent 0-1 random variables, one indicating whether each variable survived the restriction. Hence it has expectation $qn$, and by a Chernoff bound (Lemma 12), it deviates from this expectation by a constant factor with exponentially small probability. But if (2) holds, then this failure probability is at most the required bound.
We will analyze the degree of $p'$ by considering the monomials of $p$. Let $m$ be a monomial in $p$ of degree $k$. If $k \leq d'$, then the monomial cannot possibly contribute to the event that $p'$ has degree greater than $d'$ (the degree cannot increase by setting variables). So we may assume $k > d'$. There is a corresponding randomly reduced monomial in $p'$, obtained by setting each of the variables in the monomial to $1$ with probability $1 - q$. (Note this is the worst case; in some commutative rings $R$, the monomial could also cancel with another monomial and disappear from $p'$.) The degree of the reduced monomial can be seen as a sum of $k$ independent 0-1 random variables, each with mean $q$. By a Chernoff bound (Lemma 12), it exceeds $d'$ with exponentially small probability. Now, if (1) holds, then this probability is small enough. This bound holds for every monomial in $p$, and there are at most $s$ monomials, so the probability that $p'$ has degree greater than $d'$ is at most $s$ times that bound, by the union bound.
We have proved that, assuming the parameters are set properly, $P'$ has degree at most $d'$ and computes the AND of the surviving bits with small error. Finally, suppose that

(3)

also holds. Then $P'$ is a probabilistic polynomial for the AND on the surviving variables, with degree at most $d'$ and correspondingly small error.
Fixing the randomness in $P'$, we are in a situation analogous to the earlier case of Lemma 17: we can find a single non-zero degree-$d'$ polynomial that differs from the AND of the surviving variables on only a small fraction of points. Since the AND is nonzero on only one point, the polynomial is non-zero on only a slightly larger fraction of points. Provided that

(4)

also holds, the polynomial is non-zero on fewer than a $2^{-d'}$ fraction of the points. But then its degree must be greater than $d'$. This contradicts Lemma 13.
It remains to show that (1), (2), (3), and (4) can be simultaneously satisfied to yield the above contradiction, which we do in Lemma 46 in Appendix F.
The full proof of Theorem 3, which generalizes the above Theorem 16 to any basis, can be found in Appendix G. It shows that picking representatives in $R$ for true and false other than $1$ and $0$ cannot help to overcome the lower bound of Theorem 16. It is unfortunately not so straightforward to prove Theorem 3 from here, since our proof of Theorem 16 critically uses the Schwartz-Zippel lemma, Lemma 13, which does not hold for a general basis over a commutative ring $R$. For a simple example, if $\{\alpha, \beta\} = \{0, 2\}$ over $R = \mathbb{Z}_4$, then the nonzero linear polynomial $2 x_1$ is nonetheless zero on both $x_1 = 0$ and $x_1 = 2$. We solve this issue by taking any probabilistic polynomial over $R$ with a basis where the Schwartz-Zippel lemma does not hold, and performing a combinatorial transformation to yield a new probabilistic polynomial over $R$ with a different basis where the Schwartz-Zippel lemma does hold, such that the new polynomial has the same degree and error as, and only mildly greater sparsity than, the original.
3.1 Probabilistic Sparsity Lower Bounds for Compositions
We next show how probabilistic degree lower bounds for a Boolean function $f$ can lead to probabilistic sparsity lower bounds for compositions of $f$ with simple functions like $\mathsf{XOR}$ and $\mathsf{OR}$.
Theorem 18.
Let $R$ be any commutative ring, and let $f$ be any Boolean function on $n$ bits such that any probabilistic polynomial of error $\epsilon$ for $f$ over $R$ requires degree at least $d$. Then, for every $k$, every probabilistic polynomial over $R$ of error sufficiently below $\epsilon$ for the $nk$-variate function $f \circ \mathsf{XOR}_k$, the composition of $f$ with $\mathsf{XOR}$s of disjoint variables, requires sparsity $\Omega(k^{d})$ over the $\{0,1\}$ basis.
Proof.
Label the variables of our function of interest by $x_{i,j}$ for $i \in [n]$ and $j \in [k]$, so that our function can be written as $f$ applied to the $n$ values $\mathsf{XOR}(x_{i,1}, \ldots, x_{i,k})$. Consider the following random restriction $\rho$: for each $i$, choose one $j_i \in [k]$ uniformly and independently at random, and set $x_{i,j}$ to $0$ for each $j \neq j_i$. Notice in particular that after applying $\rho$ to the inputs of our function, the result is the function $f$ on the remaining variables which were not set to $0$.
Assume to the contrary that a probabilistic polynomial exists for with sparsity less than . We construct a probabilistic polynomial for as follows:
-
1.
Draw a , and a random restriction as described above.
-
2.
Let be the polynomial obtained by substituting s chosen by into the variables of .
-
3.
If then output , otherwise output a random bit.
Since the restriction $\rho$ transforms $f \circ \mathsf{XOR}_k$ into $f$, we know that $p'$ as constructed in step 2 is a probabilistic polynomial for $f$ with the same error as $P$. If we can show that we actually return $p'$ with good probability in step 3, then we know our probabilistic polynomial has only slightly larger error, and it by definition has degree less than $d$, which will contradict our assumption about $f$ as desired.

Observe that the monomials of $p'$ are a subset of the monomials of $p$. Moreover, a monomial of $p$ appears in $p'$ if and only if none of its variables were set to $0$. For a monomial $m$ in $p$ of degree at least $d$, the probability that $m$ appears in $p'$ is at most $k^{-d}$: if both $x_{i,j}$ and $x_{i,j'}$ appear in $m$ for some $i$ and $j \neq j'$, then $m$ is definitely set to zero upon restriction by $\rho$, and otherwise, each variable in $m$ is set to $0$ with probability $1 - 1/k$ independently of the rest.

Therefore, if $p$ has sparsity $s$, then the expected number of monomials of degree at least $d$ in $p'$ is at most $s \cdot k^{-d}$. Hence, by Markov's inequality, the probability that $p'$ has degree at least $d$, meaning it has at least one monomial of degree at least $d$, is at most $s \cdot k^{-d}$, as desired.
Corollary 19.
Theorem 18 also holds with $\mathsf{XOR}$ replaced by $\mathsf{OR}$.
Applying Theorem 18 and Corollary 15, we can prove that a simple depth-2 $\mathsf{AND} \circ \mathsf{OR}$ circuit requires large probabilistic sparsity. (Note the lower bound is tight, as mentioned in the introduction.)
Lemma 20.
For every error bound in the appropriate range, every finite commutative ring $R$ such that $|R|$ is not too large, and every suitable fan-in function, $R$-probabilistic polynomials with the given error for $\mathsf{AND} \circ \mathsf{OR}$, where the $\mathsf{AND}$ has fan-in $n$ and each $\mathsf{OR}$ has fan-in $k$, require sparsity exponential in the $\mathsf{OR}$ fan-in over the $\{0,1\}$ basis.
Proof.
Pick a sufficiently small constant such that Corollary 15 holds, meaning $\mathsf{AND} \circ \mathsf{OR}$ requires large $R$-probabilistic degree for the given error. Applying Theorem 18 (in the form of Corollary 19) to this function, we see that $\mathsf{AND} \circ \mathsf{OR} \circ \mathsf{OR}$ requires large $R$-probabilistic sparsity for the given error. Collapsing the two bottom layers of $\mathsf{OR}$s, we see that $\mathsf{AND} \circ \mathsf{OR} \circ \mathsf{OR}$ is an $\mathsf{AND} \circ \mathsf{OR}$ circuit with larger bottom fan-in, which implies the desired result.
We can finally prove our main theorem:
Proof of Theorem 1.
Setting variables appropriately changes the AND-of-ORs function of Theorem 1 into the negation of the function from Lemma 20. The same lower bounds hold, since setting variables does not increase the degree, sparsity, or error.
4 Conclusion
By combining multiple techniques (random restrictions, Schwartz-Zippel, communication complexity lower bounds, known degree lower bounds, etc.), we have shown sparsity lower bounds for probabilistic and approximate polynomials computing several natural functions of interest in algorithm design. Perhaps the main question left open by our work is whether there is a polynomially-sparse constant-error probabilistic polynomial for the relevant depth-three circuit over some basis in which "false" is not represented by $0$, working in some ring $R$. Even more generally, does this circuit have low probabilistic rank over some ring? In this paper, we proved several strong sparsity lower bounds over the $\{0,1\}$ basis which significantly narrow the search space for such polynomials, but some of the steps in our proofs require that the basis contains $0$ ("killing" monomials by setting variables to $0$).
For the complements function, there is a nice sparsity upper bound over the $\{-1,1\}$ basis (compared to the lower bound over $\{0,1\}$), so it is still possible (but unlikely) that the $\{-1,1\}$ basis could support sparse probabilistic polynomials for depth-3 functions. More basis-independent methods for sparsity lower bounds (such as Theorem 3) would be of great interest, as well as new techniques for constructing sparse polynomial representations.
References
- [1] Amir Abboud, Ryan Williams, and Huacheng Yu. More applications of the polynomial method to algorithm design. In SODA, pages 218–230, 2015.
- [2] Josh Alman, Timothy M Chan, and Ryan Williams. Polynomial representations of threshold functions and algorithmic applications. In FOCS, pages 467–476, 2016. doi:10.1109/FOCS.2016.57.
- [3] Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. In FOCS, pages 136–150, 2015. doi:10.1109/FOCS.2015.18.
- [4] Josh Alman and Ryan Williams. Probabilistic rank and matrix rigidity. In STOC, pages 641–652, 2017. doi:10.1145/3055399.3055484.
- [5] James Aspnes, Richard Beigel, Merrick Furst, and Steven Rudich. The expressive power of voting polynomials. Combinatorica, 14(2):135–148, 1994. doi:10.1007/BF01215346.
- [6] David A Mix Barrington, Richard Beigel, and Steven Rudich. Representing boolean functions as polynomials modulo composite numbers. Computational Complexity, 4(4):367–382, 1994. doi:10.1007/BF01263424.
- [7] Saugata Basu, Nayantara Bhatnagar, Parikshit Gopalan, and Richard J. Lipton. Polynomials that sign represent parity and descartes’ rule of signs. Computational Complexity, 17(3):377–406, 2008. doi:10.1007/S00037-008-0244-2.
- [8] Paul Beame and Trinh Huynh. Multiparty communication complexity and threshold circuit size of AC^0. SIAM Journal on Computing, 41(3):484–518, 2012. doi:10.1137/100792779.
- [9] Richard Beigel, Nick Reingold, and Daniel Spielman. The perceptron strikes back. In Structure in Complexity Theory Conference, pages 286–291. IEEE, 1991. doi:10.1109/SCT.1991.160270.
- [10] Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. The complexity of satisfiability of small depth circuits. In Parameterized and Exact Complexity (IWPEC), pages 75–85, 2009. doi:10.1007/978-3-642-11269-0_6.
- [11] Timothy M Chan and Ryan Williams. Deterministic apsp, orthogonal vectors, and more: Quickly derandomizing razborov-smolensky. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1246–1255. Society for Industrial and Applied Mathematics, 2016. doi:10.1137/1.9781611974331.CH87.
- [12] Arkadev Chattopadhyay. Discrepancy and the power of bottom fan-in in depth-three circuits. In FOCS, pages 449–458. IEEE, 2007. doi:10.1109/FOCS.2007.30.
- [13] Ernie Croot, Vsevolod F Lev, and Péter Pál Pach. Progression-free sets in Z_4^n are exponentially small. Annals of Mathematics, 185:331–337, 2017.
- [14] Zeev Dvir and Benjamin L Edelman. Matrix rigidity and the croot-lev-pach lemma. Theory Of Computing, 15(8):1–7, 2019. doi:10.4086/TOC.2019.V015A008.
- [15] Jiawei Gao, Russell Impagliazzo, Antonina Kolokolova, and R. Ryan Williams. Completeness for first-order properties on sparse structures with algorithmic applications. In SODA, pages 2162–2181, 2017. doi:10.1137/1.9781611974782.141.
- [16] Kristoffer Arnsfelt Hansen and Vladimir V Podolskii. Polynomial threshold functions and boolean threshold circuits. Information and Computation, 240:56–73, 2015. doi:10.1016/J.IC.2014.09.008.
- [17] Prahladh Harsha and Srikanth Srinivasan. On polynomial approximations to AC^0. Random Structures & Algorithms, 54(2):289–303, 2019. doi:10.1002/RSA.20786.
- [18] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. JCSS, 62(2):367–375, 2001. doi:10.1006/JCSS.2000.1727.
- [19] Swastik Kopparty and Srikanth Srinivasan. Certifying polynomials for AC^0(parity) circuits, with applications. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2012, pages 36–47, 2012. doi:10.4230/LIPICS.FSTTCS.2012.36.
- [20] Matthias Krause and Pavel Pudlák. On the computational power of depth-2 circuits with threshold and modulo gates. Theoretical Computer Science, 174(1-2):137–156, 1997. doi:10.1016/S0304-3975(96)00019-9.
- [21] Matthias Krause and Pavel Pudlák. Computing boolean functions by polynomials and threshold circuits. computational complexity, 7(4):346–370, 1998. doi:10.1007/S000370050015.
- [22] Kasper Green Larsen and R. Ryan Williams. Faster online matrix-vector multiplication. In SODA, pages 2182–2189, 2017. doi:10.1137/1.9781611974782.142.
- [23] Daniel Lokshtanov, Ramamohan Paturi, Suguru Tamaki, R. Ryan Williams, and Huacheng Yu. Beating brute force for systems of polynomial equations over finite fields. In SODA, pages 2190–2202, 2017. doi:10.1137/1.9781611974782.143.
- [24] Shachar Lovett and Srikanth Srinivasan. Correlation bounds for poly-size circuits with symmetric gates. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 640–651. Springer, 2011.
- [25] Raghu Meka, Oanh Nguyen, and Van Vu. Anti-concentration for polynomials of independent random variables. arXiv preprint, 2015. arXiv:1507.00829.
- [26] Daniel Moeller, Ramamohan Paturi, and Stefan Schneider. Subquadratic algorithms for succinct stable matching. In Computer Science Symposium in Russia, pages 294–308, 2016. doi:10.1007/978-3-319-34171-2_21.
- [27] Noam Nisan and Mario Szegedy. On the degree of boolean functions as real polynomials. Computational complexity, 4(4):301–313, 1994. doi:10.1007/BF01263419.
- [28] Ryan O’Donnell and Rocco A Servedio. Extremal properties of polynomial threshold functions. In IEEE Conference on Computational Complexity, pages 3–12. Citeseer, 2003. doi:10.1109/CCC.2003.1214406.
- [29] A. A. Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
- [30] Alexander Razborov and Avi Wigderson. n^{Ω(log n)} lower bounds on the size of depth-3 threshold circuits with AND gates at the bottom. Information Processing Letters, 45(6):303–307, 1993. doi:10.1016/0020-0190(93)90041-7.
- [31] Alexander A Sherstov. Communication lower bounds using directional derivatives. Journal of the ACM (JACM), 61(6):34, 2014.
- [32] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In STOC, pages 77–82, 1987. doi:10.1145/28395.28404.
- [33] Srikanth Srinivasan. On improved degree lower bounds for polynomial approximation. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS, pages 201–212, 2013. doi:10.4230/LIPICS.FSTTCS.2013.201.
- [34] Jun Tarui. Probabilistic polynomials, ac0 functions and the polynomial-time hierarchy. Theoretical computer science, 113(1):167–183, 1993. doi:10.1016/0304-3975(93)90214-E.
- [35] Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theoretical Computer Science, 348(2-3):357–365, 2005. doi:10.1016/J.TCS.2005.09.023.
- [36] Ryan Williams. Faster all-pairs shortest paths via circuit complexity. In STOC, pages 664–673, 2014. doi:10.1145/2591796.2591811.
- [37] Ryan Williams. The polynomial method in circuit complexity applied to algorithm design (invited talk). In 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS, pages 47–60, 2014.
- [38] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In Proceedings of the ICM, 2018.
Appendix A Depth-3 Circuits and Probabilistic Polynomials
The study of probabilistic polynomials over $\mathbb{F}_2$ is equivalent to studying depth-three circuits of a certain form; let us recall the simple connection. In the following, let $|x|$ denote the number of 1s in the binary string $x$.
Definition 21.
The approximate majority of separation $\delta$ on $m$ bits is defined only on $x \in \{0,1\}^m$ with $|x| \leq (1/2 - \delta) m$ or $|x| \geq (1/2 + \delta) m$, and is given by $1$ when $|x| \geq (1/2 + \delta) m$, and $0$ when $|x| \leq (1/2 - \delta) m$.
Lemma 22.
If $f$ is computable by a depth-three circuit with an approximate majority of separation $\delta$ at the output, PARITY gates with fan-in at most $s$ on the middle layer, and AND gates of fan-in at most $d$ at the bottom layer, then there is an $\mathbb{F}_2$ probabilistic polynomial for $f$ with degree $d$, sparsity $s$, and error $1/2 - \delta$.
Proof.
An $\mathbb{F}_2$ polynomial can be viewed as a $\mathsf{PARITY} \circ \mathsf{AND}$ circuit, and vice versa: each monomial in such a polynomial is computing the AND of its constituent variables, and then the output of the polynomial is the mod-2 sum of these monomials, which is a PARITY. We thus get our probabilistic polynomial by selecting a uniformly random $\mathsf{PARITY} \circ \mathsf{AND}$ subcircuit which feeds into the top gate of our circuit, and outputting it as an $\mathbb{F}_2$ polynomial.
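For concreteness, here is a tiny Python sketch (ours) of the correspondence: a sparse $\mathbb{F}_2$ polynomial evaluated exactly as the corresponding PARITY-of-ANDs circuit.

```python
from functools import reduce

# A sparse F2 polynomial: a list of monomials, each a tuple of variable indices.
poly = [(0, 1), (2,), (0, 2, 3)]   # x0*x1 + x2 + x0*x2*x3 over F2

def eval_as_parity_of_ands(poly, x):
    # Bottom AND gates compute the monomials; the middle PARITY gate is
    # exactly the mod-2 sum of the monomials.
    return reduce(lambda a, b: a ^ b,
                  (int(all(x[i] for i in mono)) for mono in poly), 0)

print(eval_as_parity_of_ands(poly, [1, 1, 1, 0]))  # 1 + 1 + 0 = 0 mod 2
```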
Lemma 23.
For any constant $\delta > 0$, if there is an $\mathbb{F}_2$ probabilistic polynomial for $f$ with degree $d$, sparsity $s$, and error $1/2 - \delta$, then $f$ is computable by a depth-three circuit with an approximate majority of constant separation and fan-in $O(n)$ at the output, PARITY gates with fan-in at most $s$ on the middle layer, and AND gates of fan-in at most $d$ at the bottom layer.

If we didn't bound the fan-in of the approximate majority gate, then the proof would be almost identical to the proof of Lemma 22. In order to bound the fan-in by $O(n)$, we need to use a Chernoff bound:
Proof.
Randomly sample $t$ polynomials $p_1, \ldots, p_t$ from the probabilistic polynomial, for a parameter $t$ to be selected later. For each input $x$, by the Chernoff bound, the probability that less than a suitable constant fraction of the polynomials output the correct answer on $x$ is at most $2^{-\Omega(t)}$. By setting $t = cn$ for a sufficiently large constant $c$, this can be made less than $2^{-n}$. Then, by the union bound, there is a probability less than 1 that there exists any $x$ on which less than that fraction of the polynomials output the correct answer. Hence, by the probabilistic method, there must be a choice of $p_1, \ldots, p_t$ such that at least that fraction give the correct answer on every $x$. We can convert these polynomials into the desired circuit as in the proof of Lemma 22.
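The parameter setting here is a standard Chernoff calculation; the following sketch (ours, with illustrative constants: per-input error $1/3$ and a margin of $1/6$) computes how many sampled polynomials suffice so that the union bound over all $2^n$ inputs goes through, confirming that linear fan-in $t = O(n)$ is enough.

```python
import math

def samples_needed(n, eps=1/3, delta=1/6):
    """Number t of independent draws so that, on a fixed input where each
    draw is correct with probability >= 1 - eps, fewer than a
    (1 - delta)(1 - eps) > 1/2 fraction are correct with probability
    < 2^-n (lower-tail Chernoff: exp(-delta^2 (1 - eps) t / 2))."""
    return math.ceil(2 * n * math.log(2) / (delta ** 2 * (1 - eps)))

for n in (50, 100, 200):
    print(n, samples_needed(n))   # grows linearly in n: fan-in O(n) suffices
```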
Appendix B SETH Predictions about Probabilistic Polynomials
Several algorithms based on the polynomial method in circuit complexity (such as [36, 37, 1, 3, 2]) have the following form: (a) we identify a "subcircuit" $C$ which, if it could be evaluated on many inputs rapidly, would let us solve a desired problem faster, (b) we efficiently convert $C$ into a polynomial (probabilistic, approximate, etc.) so that the evaluation task becomes algebraic, and (c) we solve the evaluation task rapidly using algebraic algorithms, such as a fast Fourier transform or a fast matrix multiplication. How far can this approach be pushed? A significant question has been whether this kind of approach can be used to solve CNF-SAT fast enough to refute the pesky Strong Exponential Time Hypothesis (SETH).
We observe in this section that a low-degree, low-sparsity, and low-error probabilistic polynomial for AND would have contradicted SETH (recall that we unconditionally prove in this paper that there is no such polynomial). Along the way, we demonstrate other “predictions” about polynomial representations obtained by assuming SETH.
Theorem 24.
For any commutative ring $R$, a low-degree, low-sparsity, and low-error $R$-probabilistic polynomial for AND (over $\{0,1\}$) implies that there is a universal $\delta < 1$ such that for all $k$, there is a non-uniform circuit family of size $O(2^{\delta n})$ for solving $k$-SAT on $n$ variables. (Moreover, an efficiently-samplable probabilistic polynomial for AND with these properties would refute the original Strong Exponential Time Hypothesis. All interesting probabilistic polynomial constructions we are aware of, such as those found in the works [29, 32, 9, 5, 19, 33, 3, 2], are efficiently samplable.)
The proof of Theorem 24 follows from a few claims. First, we note that a low-degree/low-sparsity/low-error probabilistic polynomial for AND over 0/1 is equivalent to a low-sparsity/low-error probabilistic polynomial for $\mathsf{AND} \circ \mathsf{OR}_2$ over 0/1. Let $R$ be any commutative ring.
Proposition 25.
For any $s$, $d$, and $\epsilon$, if AND has an $R$-probabilistic polynomial of sparsity $s$, degree $d$, and error $\epsilon$, then $\mathsf{AND} \circ \mathsf{OR}_2$ has an $R$-probabilistic polynomial of sparsity $s \cdot 3^d$ and error $\epsilon$.
Proof.
From a probabilistic polynomial for AND, substitute the exact OR of two variables, $x + y - xy$, into each input to get a probabilistic polynomial of the same error for $\mathsf{AND} \circ \mathsf{OR}_2$. In this substitution, each original monomial of degree at most $d$ is replaced by a product of at most $d$ polynomials, each of which is the OR of two variables and has three monomials; so the new polynomial has at most $s \cdot 3^d$ monomials.
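The substitution can be checked mechanically; here is a small sympy sketch (ours, with illustrative names) confirming the $3^d$ blowup for the exact AND polynomial on $d = 6$ variables:

```python
import sympy as sp

d = 6
x = sp.symbols(f'x0:{d}')
u = sp.symbols(f'u0:{d}')
v = sp.symbols(f'v0:{d}')
AND = sp.prod(x)                    # exact AND: one monomial of degree d

def or2(a, b):                      # exact OR of two 0/1 variables
    return a + b - a * b

cnf = sp.expand(AND.subs({x[i]: or2(u[i], v[i]) for i in range(d)}))
print(len(cnf.args))                # 3**6 = 729 monomials after expansion
```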
Next, we note that a probabilistic polynomial for disjointness with low sparsity and error implies the probabilistic polynomial needed for refuting SETH:
Theorem 26.
For any $s$ and $\epsilon$, if disjointness has an $R$-probabilistic polynomial of sparsity $s$ and error $\epsilon$, then the OR of $t$ disjointness instances has an $R$-probabilistic polynomial of sparsity polynomial in $t$ and $s$, and error $O(t\epsilon) + 1/3$, which (when there is no error) outputs $0$ for false and a nonzero value for true.
Proof.
Let $t$ denote the number of disjointness instances. Let $P$ be the probabilistic polynomial for disjointness. Draw a $p \sim P$. By a union bound, with probability at least $1 - t\epsilon$, $p$ correctly computes the $0$ or $1$ value of disjointness on the $t$ different inputs whose OR we want to compute. Hence, there are polynomials $p_1, \ldots, p_t$ which each have at most $s$ monomials, and we want to compute their OR assuming they all evaluate to $0$ or $1$.
The subring of $R$ generated by $1$ consists of all integer multiples of $1$. It is isomorphic to $\mathbb{Z}_m$ for some nonnegative integer $m$. (The case $m = 0$ is when no positive integer multiple of $1$ equals $0$ in $R$, and hence the subring is isomorphic to $\mathbb{Z}$.) We consider three cases depending on whether $m = 0$, $m = 2$, or $m > 2$.
If $m = 0$, then the sum $p_1 + \cdots + p_t$ simply suffices to compute OR over $\mathbb{Z}$: it is $0$ exactly when all the $p_i$ are $0$. This has sparsity at most $t \cdot s$, and no additional error, for a total error of $t\epsilon$.

If $m = 2$, then let $S_1, S_2 \subseteq [t]$ be two uniformly random subsets, and we pick a polynomial computing the OR of the two subset sums $\sum_{i \in S_1} p_i$ and $\sum_{i \in S_2} p_i$.

This is $0$ when all the $p_i$ are $0$, and each subset sum is nonzero with probability $1/2$ otherwise, so the entire polynomial is nonzero with probability at least $3/4$. Hence the total error is at most $t\epsilon + 1/4$, and the sparsity is polynomial in $t$ and $s$.

If $m > 2$, then pick uniformly and independently random $c_1, \ldots, c_t \in \mathbb{Z}_m$, and pick the polynomial $\sum_i c_i p_i$. If each $p_i = 0$ then this is $0$, and otherwise it is $0$ with probability at most $1/m \leq 1/3$. The sparsity is at most $t \cdot s$, and the error is at most $t\epsilon + 1/3$.
Finally, we note that a low-sparsity constant-error probabilistic polynomial (in the sense of Theorem 26) for disjointness for every vector dimension would refute (non-uniform) SETH:
Theorem 27.
Suppose there is a sufficiently small fixed $\delta > 0$ such that for all constants $c$, disjointness on $c \log n$-bit vectors has a probabilistic polynomial of sparsity $n^{\delta}$ and constant error as in Theorem 26. Then there is a universal $\delta' < 1$ such that for all $k$, there is a non-uniform circuit family of size $O(2^{\delta' n})$ for solving $k$-SAT on $n$ variables.
Proof (sketch).
The Orthogonal Vectors problem asks: given a collection of $n$ bit vectors, is there a pair of them which is orthogonal? We show that the hypothesis implies that for all $c$, the Orthogonal Vectors problem with $n$ vectors from $\{0,1\}^{c \log n}$ can be solved in truly subquadratic time with advice, for a universal savings in the exponent. The theorem then follows from a (by now) standard reduction (as in [35, 3]).
By our hypothesis, we can store (as non-uniform advice) a probabilistic polynomial with sparsity $n^{\delta}$ and error $\epsilon$ for the disjointness function on $c \log n$-bit vectors, where $\epsilon$ is a sufficiently small constant. Note this requires storing a distribution on only $O(n)$ distinct sparse polynomials (see Appendix A).

To solve Orthogonal Vectors on $n$ vectors, we proceed just as in Abboud-Williams-Yu [1]: partition the set of vectors into parts, with each part containing at most $g$ vectors, for a parameter $g$. For all pairs of parts, we want to evaluate the probabilistic polynomial on all pairs of vectors in the union of the two parts. This gives an input of $O(g^2)$ pairs of vectors; feeding this input to a polynomial computing the OR of disjointness over these pairs (as in Theorem 26), we can estimate whether there is an orthogonal pair among the vectors in the two parts.

The problem of solving Orthogonal Vectors now reduces to: evaluate a random polynomial of low sparsity on many different pairs of input vectors. When the sparsity is at most $n^{0.1}$ (for example), this problem can be solved in nearly optimal time, via fast rectangular matrix multiplication. To have a high probability of success, we only need to sample a few random polynomials from our probabilistic polynomial.
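To illustrate the final evaluation step, here is a toy numpy sketch (ours; the dimensions, the random polynomial, and all names are illustrative stand-ins) of how a sparse polynomial whose monomials split across the two halves of the input is evaluated on all pairs of vectors with a single matrix product. In the real algorithm this product is rectangular and is performed with fast matrix multiplication; plain numpy stands in here.

```python
import numpy as np
rng = np.random.default_rng(0)

# Evaluate p(a, b) = sum_k c_k * monA_k(a) * monB_k(b) on all pairs
# (a_i, b_j) at once with a single matrix product.
n, d, s = 1000, 20, 50                  # vectors, dimension, monomials
A = rng.integers(0, 2, (n, d))
B = rng.integers(0, 2, (n, d))
monA = [rng.choice(d, 3, replace=False) for _ in range(s)]
monB = [rng.choice(d, 3, replace=False) for _ in range(s)]
c = rng.integers(-2, 3, s)

MA = np.stack([c[k] * A[:, monA[k]].prod(axis=1) for k in range(s)], axis=1)
MB = np.stack([B[:, monB[k]].prod(axis=1) for k in range(s)], axis=1)
vals = MA @ MB.T                        # vals[i, j] = p(a_i, b_j)
print(vals.shape)                       # (1000, 1000)
```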
For similar reasons, many types of low sparsity polynomial representations for depth-3 circuits, including low sparsity approximate polynomials over the reals, would refute this form of SETH.
Appendix C Sparsity for -1/1 Probabilistic Polynomials
In many settings, the choice of basis (which field element corresponds to "true" and which to "false") can have a large impact on the sparsity of polynomial representations of Boolean functions. For instance, over the $\{0,1\}$ basis (in characteristic other than $2$), computing XOR exactly on $n$ inputs requires $2^n - 1$ monomials, whereas over the $\{-1,1\}$ basis it requires only the one monomial $\prod_{i=1}^{n} x_i$.
We will show in this section that the choice of basis can make a big difference in the sparsity of probabilistic polynomials for the complements function (defined in Section 2). Although our AND lower bound generalizes to any choice of basis (Theorem 3), our proof of Theorem 5 relies heavily on the fact that we are working over a basis where "false" corresponds to 0, and hence setting a variable to false eliminates every monomial that it is a part of.
Suppose we look instead at the $\{-1,1\}$ basis, with the convention that "true" corresponds to $1$ and "false" to $-1$. In this setting, the statement of Theorem 5 is no longer true, since we can construct a probabilistic polynomial for the complements function with polynomial sparsity and polynomially small error:
Lemma 28.
For any $t \geq 1$, and any commutative ring $R$ which doesn't have characteristic 2, the AND of $n$ variables has a probabilistic polynomial over $R$ of sparsity $2^t$ and error $2^{-t}$ over the basis $\{-1,1\}$.
Proof.
Pick a random subset $S \subseteq [n]$, and consider the polynomial $(1 + \prod_{i \in S} x_i)/2$. If $x$ is not the all-true input, then some nonempty subset of the $x_i$'s are $-1$'s, and the probability that $\prod_{i \in S} x_i = -1$ is exactly $1/2$. Hence, in this case, the polynomial is $0$ with probability $1/2$. If $x$ is the all-true input, then all $x_i$'s are $1$, and we always have that the polynomial is $1$. Hence, if we take $t$ independent copies of the above and multiply them together, then the resulting polynomial has the desired properties.
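The probabilistic heart of this proof is the fact that, whenever some input is $-1$, the product over a uniformly random subset is a uniformly random sign. A brute-force check (ours):

```python
def subset_product_distribution(x):
    """Distribution of prod_{i in S} x_i over all subsets S of [n],
    for x in {-1,1}^n (the empty product is 1)."""
    counts = {1: 0, -1: 0}
    for mask in range(2 ** len(x)):
        p = 1
        for i, xi in enumerate(x):
            if mask >> i & 1:
                p *= xi
        counts[p] += 1
    return counts

print(subset_product_distribution([1, 1, 1, 1]))    # {1: 16, -1: 0}
print(subset_product_distribution([1, -1, 1, -1]))  # {1: 8, -1: 8}
```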
Theorem 29.
For any $t \geq 1$, and any commutative ring $R$ which doesn't have characteristic 2, the complements function has a probabilistic polynomial over $R$ of sparsity $2^t$ and error $2^{-t}$ over the basis $\{-1,1\}$.
Proof.
The XOR of two variables is (up to sign) the single monomial $xy$ over the $\{-1,1\}$ basis, and so we simply substitute these monomials into the probabilistic polynomial for AND from Lemma 28.
Theorem 29 with $t = O(\log n)$ shows that the complements function has a probabilistic polynomial with polynomial sparsity and polynomially small error over the $\{-1,1\}$ basis, whereas Theorem 5 says that no such probabilistic polynomial can exist over the $\{0,1\}$ basis.
Finally, we remark that just as in the $\{0,1\}$ case, we can still construct probabilistic polynomials for OR over $\{-1,1\}$ that achieve any two of the three goals of low sparsity, low degree, and low error:
- Low sparsity and error comes from Theorem 29,
- Low degree and error still follows from Theorem 2 with large $t$, since degree and error are unchanged upon changing basis, and
- Low sparsity and degree still follows from Theorem 2 with constant $t$, since then the degree is still constant, and so by exploiting multilinearity, the sparsity must be polynomial.
Appendix D Separation Between Probabilistic Rank and Probabilistic Sparsity
Definition 30.
Let $\sigma : [2^{n/2}] \to \{0,1\}^{n/2}$ be the canonical bijection, $R$ be any ring, $f$ be any Boolean function on $n$ variables, and $(A, B)$ be any partition of the input variables to $f$ into two sets of size $n/2$. For $i, j \in [2^{n/2}]$, let $f(\sigma(i), \sigma(j))$ denote $f$ evaluated at the assignment where the variables in $A$ are set according to $\sigma(i)$ and the variables in $B$ are set according to $\sigma(j)$. The truth table matrix of $f$ with respect to $(A, B)$ is the $2^{n/2} \times 2^{n/2}$ matrix $M$ given by $M[i][j] = f(\sigma(i), \sigma(j))$.

For $\epsilon > 0$, and any matrix $M$ over $R$, the $\epsilon$-probabilistic rank of $M$ is the minimum $r$ for which there is a distribution $\mathcal{M}$ on matrices over $R$ of rank at most $r$ such that for all $i, j$: $\Pr_{N \sim \mathcal{M}}[N[i][j] \neq M[i][j]] \leq \epsilon$.

The $\epsilon$-probabilistic rank of $f$ is the maximum, over all such partitions $(A, B)$, of the $\epsilon$-probabilistic rank of the truth table matrix of $f$ with respect to $(A, B)$.
In this section, we compare the probabilistic sparsity of a function with its probabilistic rank. We know that the probabilistic sparsity is always an upper bound on the probabilistic rank:
Lemma 31 ([4] Corollary 2.1).
For any ring $R$, if $f$ has a probabilistic polynomial over $R$ with sparsity $s$ and error $\epsilon$, then $f$ has $\epsilon$-probabilistic rank at most $s$ over $R$.
Using our probabilistic polynomial sparsity lower bounds, we are able to prove a separation in two different settings.
D.1 Complements
First, by using Theorem 5, we can give a simple explicit function, the complements function (defined in Section 2), which has substantially lower probabilistic rank than probabilistic sparsity over the $\{0,1\}$ basis.
Recall from Theorem 5 that the complements function requires superpolynomial sparsity for any polynomially small error over $\{0,1\}$. In contrast, the equality function is known to have low probabilistic rank, by simulating a randomized communication protocol (similar to our proof of Theorem 29), and it follows that the complements function has low probabilistic rank as well:
Proposition 32 ([4] Lemma D.2).
The equality function on two $n$-bit vectors has $\epsilon$-probabilistic rank $O(1/\epsilon)$ over any field.
Proposition 33.
The complements function on two $n$-bit vectors has $\epsilon$-probabilistic rank $O(1/\epsilon)$ over any field.
Proof.
The communication matrix of the complements function is the same as that of the equality function, up to a permutation of columns that swaps $y$ with $\bar{y}$ for each $y \in \{0,1\}^n$.
Corollary 34.
$\mathrm{COMP}_n$ has polynomial probabilistic rank for any polynomial error over $\mathbb{F}_2$, whereas it requires superpolynomial sparsity for any polynomial error over $\mathbb{F}_2$.
Proof.
Combine Proposition 33 (taking $\epsilon = 1/\mathrm{poly}(n)$) with Theorem 5.
D.2 Compositions with high fan-in XORs
Second, we combine Corollary 19 with Lemma 31 and a simple upper bound construction to prove a separation between probabilistic rank and probabilistic sparsity for functions of the form $f \circ \mathrm{XOR}$, over a field $\mathbb{F}$.
Let $f : \{0,1\}^n \to \{0,1\}$ be any Boolean function, and $\epsilon > 0$ and $m \in \mathbb{N}$ be any values. Let $d$ be the smallest value such that $f$ has an $\mathbb{F}$-probabilistic polynomial of error $\epsilon$ and degree $d$. We will prove a separation for the $nm$-variate function $f \circ \mathrm{XOR}_m$, which applies $\mathrm{XOR}$ to each of $n$ disjoint blocks of $m$ variables and feeds the $n$ results into $f$.
Proposition 35.
Over $\mathbb{F}$, the function $f \circ \mathrm{XOR}_m$ has $\epsilon$-probabilistic rank at most $\sum_{k=0}^{2d} \binom{2n}{k}$.
Proof.
Suppose we have partitioned our $nm$ variables into parts $(X, Y)$, and for $i \in [n]$, let $X_i$ and $Y_i$ be the corresponding partition of the variables in block $i$. Define $\ell_i := \mathrm{XOR}(X_i)$ and $r_i := \mathrm{XOR}(Y_i)$, and note that $\ell_i$ and $r_i$ are each functions of the variables in only one of the parts.
By assumption, $f$ has an $\mathbb{F}$-probabilistic polynomial of error $\epsilon$ and degree $d$. Suppose we draw a polynomial $p$ from it, and consider the expression
$p\big(g(\ell_1, r_1),\; g(\ell_2, r_2),\; \ldots,\; g(\ell_n, r_n)\big), \qquad (5)$
where $g(u, v) = u + v - 2uv$ is the exact degree-2 polynomial for computing XOR on two variables. Expression (5) is therefore a probabilistic expression for $f \circ \mathrm{XOR}_m$, and it is a degree $2d$ polynomial in the variables $\ell_1, \ldots, \ell_n$ and $r_1, \ldots, r_n$, where each depends only on the variables in one side of the partition. Since the $\ell_i$ and $r_i$ take values in $\{0,1\}$ we may multilinearize, so the expression has at most $\sum_{k=0}^{2d} \binom{2n}{k}$ monomials, and the probabilistic rank is hence at most $\sum_{k=0}^{2d} \binom{2n}{k}$ by Lemma 31.
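A quick sanity check (our code) of the two facts the proof uses: $g(u,v) = u + v - 2uv$ computes XOR exactly on $\{0,1\}$, and each block's XOR splits as $g(\ell_i, r_i)$ with $\ell_i, r_i$ depending on only one side of the partition.

```python
from itertools import product

g = lambda u, v: u + v - 2 * u * v
assert all(g(u, v) == (u ^ v) for u, v in product([0, 1], repeat=2))

m = 5  # XOR fan-in; split each block as 2 + 3 variables across (X, Y)
for block in product([0, 1], repeat=m):
    left, right = block[:2], block[2:]
    l = sum(left) % 2    # a {0,1}-valued function of the X-side only
    r = sum(right) % 2   # a {0,1}-valued function of the Y-side only
    assert g(l, r) == sum(block) % 2
print("g(l, r) recovers each block XOR; substituting into a degree-d "
      "polynomial for f yields degree <= 2d in the l's and r's")
```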
Meanwhile, recall from Corollary 19 that $f \circ \mathrm{XOR}_m$ requires $\epsilon$-probabilistic sparsity $m^{\Omega(d)}$. Combining these two bounds can show a separation, for instance:
Theorem 36.
For any $m \geq (2n+1)^{C}$, where $C$ is a sufficiently large constant (depending only on the constant in the exponent of Corollary 19),
the $\epsilon$-probabilistic sparsity of $f \circ \mathrm{XOR}_m$ is strictly greater than its $\epsilon$-probabilistic rank. In particular, as $m$ increases, the $\epsilon$-probabilistic sparsity grows unboundedly while the $\epsilon$-probabilistic rank remains fixed.
Proof.
With this bound on $m$, we have
$m^{\Omega(d)} \;>\; (2n+1)^{2d} \;\geq\; \sum_{k=0}^{2d} \binom{2n}{k},$
as desired, where the second inequality is a standard bound on binomial coefficients. We can make the separation arbitrarily big by making $m$ sufficiently large, since increasing $m$ does not change the rank upper bound (the bound from Proposition 35 has no dependence on $m$), but increases the sparsity lower bound to as large as we would like.
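For concreteness, one standard bound that suffices here, under our parameterization: since $\binom{2n}{k} \leq (2n)^k$, we have
$\sum_{k=0}^{2d} \binom{2n}{k} \;\leq\; \sum_{k=0}^{2d} (2n)^k \;\leq\; (2n+1)^{2d},$
so any $m$ with $m^{\Omega(d)} > (2n+1)^{2d}$, i.e., $m \geq (2n+1)^{C}$ for a large enough constant $C$, yields the claimed separation.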
We can now apply known upper and lower bounds on probabilistic degree to get separations for other explicit functions. Consider, for instance, the majority function $\mathrm{MAJ}$:
Theorem 38 ([3]).
For any field $\mathbb{F}$, there is an $\mathbb{F}$-probabilistic polynomial for $\mathrm{MAJ}$ on $n$ bits with error $\epsilon$ and degree $O(\sqrt{n \log(1/\epsilon)})$.
Combining these with Theorem 36 yields:
Theorem 39.
There is a constant $C > 1$ such that, for any $m \geq n^{C}$,
the $\epsilon$-probabilistic sparsity of $\mathrm{MAJ} \circ \mathrm{XOR}$ over $\mathbb{F}$, where the $\mathrm{MAJ}$ has fan-in $n$ and each $\mathrm{XOR}$ has fan-in $m$, is strictly greater than its $\epsilon$-probabilistic rank. In particular, as $m$ increases, the $\epsilon$-probabilistic sparsity grows unboundedly while the $\epsilon$-probabilistic rank remains fixed.
Appendix E Relationship between Degree, Sparsity, and Rank
Since the degree and sparsity of a probabilistic polynomial are related concepts, we are able to bound one using bounds on the other. Bounding the degree of a probabilistic polynomial will bound the sparsity of the polynomial, since there are only so many monomials of a given degree:
Proposition 40.
For any decision problem $f$ on $n$ input bits, any $\epsilon > 0$, and any prime $p$, if $f$ has an $\mathbb{F}_p$-probabilistic polynomial of degree $d$ and error $\epsilon$, then it has one of degree $d$, error $\epsilon$, and sparsity $s$, where $s = \sum_{k=0}^{d} \binom{n}{k}$.
Proof.
Let $\mathcal{P}$ be a degree-$d$, error-$\epsilon$ representation of $f$. Since $x^2 = x$ whenever $x \in \{0,1\}$, we may assume that any polynomial in the support of $\mathcal{P}$ is multilinear. Hence, each polynomial in the support of $\mathcal{P}$ is a multilinear polynomial of degree at most $d$, which has at most $\sum_{k=0}^{d} \binom{n}{k}$ monomials. $\mathcal{P}$ is therefore a representation of $f$ with sparsity $s = \sum_{k=0}^{d} \binom{n}{k}$.
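The multilinearization step is mechanical; here is a small sketch (our code) that caps exponents at 1 and counts the resulting monomials against the $\sum_{k \leq d} \binom{n}{k}$ bound.

```python
from math import comb

def multilinearize(poly):
    """poly: dict mapping exponent tuples to coefficients."""
    out = {}
    for expts, c in poly.items():
        key = tuple(min(e, 1) for e in expts)   # x_i^e -> x_i on {0,1} inputs
        out[key] = out.get(key, 0) + c
    return {k: c for k, c in out.items() if c != 0}

p = {(3, 0, 2): 5, (1, 0, 1): -2, (0, 1, 0): 7}   # 5x^3z^2 - 2xz + 7y
print(multilinearize(p))                           # {(1, 0, 1): 3, (0, 1, 0): 7}
n, d = 3, 2
print("monomial bound:", sum(comb(n, k) for k in range(d + 1)))
```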
Perhaps more surprisingly, decision problems with low sparsity probabilistic polynomials must have low degree representations as well:
Proposition 41.
For any decision problem $f$, any $\epsilon > 0$, and any prime $p$, if $f$ has an $\mathbb{F}_p$-probabilistic polynomial of sparsity $s$ and error $\epsilon$, then it has one of degree $d$ and error $2\epsilon$, where $d = O(\log(s/\epsilon))$.
Proof.
Let $\mathcal{P}$ be a sparsity-$s$, error-$\epsilon$ representation of $f$. We design a new probabilistic polynomial for $f$, defined as follows:
-
1.
Draw a polynomial $q$ from $\mathcal{P}$. We can write $q$ as a sum of monomials as $q = \sum_{i=1}^{s'} c_i m_i$, where $s' \leq s$, each $c_i \in \mathbb{F}_p \setminus \{0\}$, and each $m_i$ is a monomial.
-
2.
Recall by Theorem 2 that $\mathrm{AND}$ has an $\mathbb{F}_p$-probabilistic polynomial of degree $O(\log(1/\delta))$ and error $\delta$, for all $\delta > 0$. Draw a polynomial $A$ from the corresponding probabilistic polynomial on $n$ variables with $\delta = \epsilon / s$.
-
3.
Each monomial $m_i$ computes the AND of some set $S_i$ of variables. It can therefore be probabilistically computed as $A(x_{i_1}, \ldots, x_{i_{|S_i|}}, 1, \ldots, 1)$, where there are $n - |S_i|$ '1's. We can thus output the polynomial $\sum_{i=1}^{s'} c_i \cdot A(x_{i_1}, \ldots, x_{i_{|S_i|}}, 1, \ldots, 1)$.
This has degree $O(\log(s/\epsilon))$, and by a union bound over the error of the replacement for each monomial, it has error at most $\epsilon + s \cdot (\epsilon/s) = 2\epsilon$.
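Here is a minimal sketch (our construction, over $\mathbb{F}_2$ for concreteness) of the replacement step: a probabilistic polynomial for the AND of a set $S$ of variables with degree $t$ and one-sided error $2^{-t}$, of the kind Theorem 2 provides. The function name `prob_and` and the parameters are ours.

```python
import random

def prob_and(S, xs, t):
    # prod_{j=1}^t (1 + sum_{i in T_j} (1 + x_i)) over F_2, for random T_j <= S;
    # each factor is 1 when all x_i = 1, and 0 with prob 1/2 otherwise
    val = 1
    for _ in range(t):
        T = [i for i in S if random.random() < 0.5]
        val *= (1 + sum(1 + xs[i] for i in T)) % 2
    return val % 2

n, t, trials = 8, 7, 4000
S = list(range(n))
errs = 0
for _ in range(trials):
    xs = [random.randint(0, 1) for _ in range(n)]
    truth = int(all(xs[i] for i in S))
    errs += (prob_and(S, xs, t) != truth)
print(f"empirical error {errs / trials:.4f}  (bound 2^-t = {2 ** -t})")
# Substituting such a draw for each of the s monomials of a sparse polynomial
# adds, by a union bound, error at most s * 2^-t, giving degree O(log(s/eps)).
```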
At most a logarithmic factor is lost when converting between these two bounds; if $f$ has probabilistic degree $d$ then it can have probabilistic sparsity at most $\sum_{k=0}^{d} \binom{n}{k} \leq (n+1)^d$, and if it has probabilistic sparsity $s$ then it can have probabilistic degree $O(\log(s/\epsilon))$.
Proposition 41 allows us to prove probabilistic sparsity lower bounds from probabilistic degree lower bounds, which are much more common in the literature. For instance:
Corollary 42.
If $f$ is any Boolean function which requires $\mathbb{F}_p$-probabilistic degree $d$ for error $2\epsilon$, then $f$ requires $\mathbb{F}_p$-probabilistic sparsity $\epsilon \cdot 2^{\Omega(d)}$ for error $\epsilon$.
Proof.
A lower-sparsity representation would yield a probabilistic degree less than $d$ using Proposition 41, a contradiction.
Combining Proposition 40 with Lemma 31, we see that if $f$ has a probabilistic polynomial of degree $d$, then it has probabilistic rank at most $\sum_{k=0}^{d} \binom{n}{k}$. Applying ideas from the recent breakthrough work of Croot, Lev, and Pach [13] on Roth's theorem over $\mathbb{Z}_4^n$, we can derive an improved bound on the rank from the degree:
Proposition 43.
Let $n$ be even. Let $f$ be any Boolean function with a probabilistic polynomial of degree $d$ and error $\epsilon$. Then, the $\epsilon$-probabilistic rank of $f$ is at most $2 \sum_{k=0}^{\lfloor d/2 \rfloor} \binom{n/2}{k}$.
We note that Dvir and Edelman [14] also used this observation when studying the rigidity of a certain family of matrices. Proposition 43 will follow immediately from Theorem 45 below, but first we need a definition.
Definition 44.
Let $R$ be any ring, $n$ be an even integer, and $\phi : \{0,1\}^{n/2} \to [2^{n/2}]$ be any bijection between the two sets. For any function $p : \{0,1\}^{n/2} \times \{0,1\}^{n/2} \to R$, the Boolean rank of $p$ over $R$ is the minimum $r$ such that there are matrices $L$ and $P$ over $R$ of dimensions $2^{n/2} \times r$ and $r \times 2^{n/2}$, respectively, with $p(a, b) = (L \cdot P)[\phi(a), \phi(b)]$ for all $a, b \in \{0,1\}^{n/2}$.
Theorem 45.
Let $n$ be even. Let $p$ be a multilinear polynomial in $n$ variables with degree at most $d$ over a ring $R$. Then, the Boolean rank of $p$ over $R$ is at most $2 \sum_{k=0}^{\lfloor d/2 \rfloor} \binom{n/2}{k}$.
Proof.
Think of $p$ as being over two sets of variables, $x = (x_1, \ldots, x_{n/2})$ and $y = (y_1, \ldots, y_{n/2})$. We begin with some notation: For sets $S, T \subseteq [n/2]$, let $x_S$ denote $\prod_{i \in S} x_i$ (and similarly $y_T$), and let $c_{S,T}$ be the coefficient in $p$ of the monomial $x_S y_T$. Let $S_1, \ldots, S_\ell$ be a list of all subsets of $[n/2]$ of cardinality at most $\lfloor d/2 \rfloor$, so that $\ell = \sum_{k=0}^{\lfloor d/2 \rfloor} \binom{n/2}{k}$. Given a monomial $m$ and a variable assignment $a$, we let $m(a)$ denote the evaluation of $m$ on the assignment $a$.
We want to prepare two matrices $L$ (of dimensions $2^{n/2} \times 2\ell$) and $P$ (of dimensions $2\ell \times 2^{n/2}$), and prove for all $a, b \in \{0,1\}^{n/2}$ that $p(a, b) = (L \cdot P)[\phi(a), \phi(b)]$.
For each $a \in \{0,1\}^{n/2}$, we make a $2\ell$-dimensional vector $L[\phi(a)]$ with entries, for $j \in [\ell]$:
$L[\phi(a)]_j = x_{S_j}(a), \qquad L[\phi(a)]_{\ell+j} = \sum_{S : |S| > \lfloor d/2 \rfloor} c_{S, S_j}\, x_S(a).$
Correspondingly, for all $b \in \{0,1\}^{n/2}$ we make a $2\ell$-dimensional vector $P[\phi(b)]$ with entries, for $j \in [\ell]$:
$P[\phi(b)]_j = \sum_{T \subseteq [n/2]} c_{S_j, T}\, y_T(b), \qquad P[\phi(b)]_{\ell+j} = y_{S_j}(b).$
Since every monomial $x_S y_T$ of $p$ has $|S| + |T| \leq d$, it contains either at most $\lfloor d/2 \rfloor$ $x$'s or at most $\lfloor d/2 \rfloor$ $y$'s, so each monomial is accounted for exactly once above: through its $x$-side if $|S| \leq \lfloor d/2 \rfloor$, and through its $y$-side otherwise. We observe as desired that
$(L \cdot P)[\phi(a), \phi(b)] = \sum_{j=1}^{\ell} x_{S_j}(a) \sum_{T} c_{S_j, T}\, y_T(b) \;+\; \sum_{j=1}^{\ell} \Big( \sum_{S : |S| > \lfloor d/2 \rfloor} c_{S, S_j}\, x_S(a) \Big)\, y_{S_j}(b) \;=\; p(a, b).$
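The factorization is easy to verify programmatically; the following sketch (our code, with `half` playing the role of $n/2$) builds $L$ and $P$ for a random multilinear polynomial of degree $d$, routing each monomial $x_S y_T$ through whichever side has at most $\lfloor d/2 \rfloor$ variables, and checks $p(a, b) = (L \cdot P)[\phi(a), \phi(b)]$.

```python
import numpy as np
from itertools import combinations, product

half, d = 3, 2                                   # n = 6 variables, degree 2
rng = np.random.default_rng(1)
low = [frozenset(c) for k in range(d // 2 + 1)
       for c in combinations(range(half), k)]    # subsets of size <= d/2
idx = {S: j for j, S in enumerate(low)}
ell = len(low)

# random multilinear polynomial: coefficients c[S, T] with |S| + |T| <= d
terms = {}
for ks in range(d + 1):
    for S in combinations(range(half), ks):
        for kt in range(d - ks + 1):
            for T in combinations(range(half), kt):
                terms[(frozenset(S), frozenset(T))] = int(rng.integers(-3, 4))

ev = lambda S, a: int(all(a[i] for i in S))      # evaluate x_S at point a
pts = list(product([0, 1], repeat=half))
L = np.zeros((len(pts), 2 * ell), dtype=int)
P = np.zeros((2 * ell, len(pts)), dtype=int)
for u, a in enumerate(pts):
    for j, S in enumerate(low):
        L[u, j] = ev(S, a)                                   # x_S(a)
        L[u, ell + j] = sum(c * ev(Sm, a) for (Sm, T), c in terms.items()
                            if len(Sm) > d // 2 and T == S)  # r_S(a)
for v, b in enumerate(pts):
    for j, S in enumerate(low):
        P[j, v] = sum(c * ev(T, b) for (Sm, T), c in terms.items()
                      if Sm == S)                            # q_S(b)
        P[ell + j, v] = ev(S, b)                             # y_S(b)

direct = lambda a, b: sum(c * ev(S, a) * ev(T, b) for (S, T), c in terms.items())
M = L @ P
assert all(M[u, v] == direct(a, b)
           for u, a in enumerate(pts) for v, b in enumerate(pts))
print("rank factorization verified; width =", 2 * ell)
```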
Appendix F Missing Proof
Lemma 46.
Proof.
Recall that all parameters other than one free parameter are fixed constants, and we may set the free parameter to be arbitrarily large.

Setting the first parameter as above, inequality (2) immediately holds. Setting the second, (3) holds as well. To satisfy (1), we need the free parameter to be sufficiently large, and setting it so satisfies this inequality. Finally, we turn to (4). Substituting in our above solution, the resulting expression depends only on the fixed constants and on the free parameter. As the former are fixed and the latter can be made arbitrarily large, the above expression can be made larger than the required bound. This completes the proof.
Appendix G Generalization to the $\{a, b\}$ basis
In this section, we generalize Theorem 16 to hold for a general basis, where we can choose any two elements from our commutative ring to represent true and false. What we prove is:
Reminder of Theorem 3. For every commutative ring $R$, every pair of distinct $a, b \in R$ (representing true and false, respectively), and all sparsity and degree parameters as in Theorem 16, there is an $\epsilon > 0$ such that for all sufficiently large $n$, the AND function on $n$ variables has no probabilistic polynomial over $R$ with respect to the $\{a, b\}$ basis achieving that sparsity, that degree, and error $\epsilon$ simultaneously.
We begin by recalling and proving the $\{0,1\}$-basis version of the Schwartz–Zippel Lemma, Lemma 13, which we use in the proof of Theorem 16.
Reminder of Lemma 13. Let $p$ be a nonzero multilinear $n$-variate polynomial over any commutative ring $R$, of total degree at most $d$. Then $\Pr_{x \in \{0,1\}^n}[p(x) \neq 0] \geq 2^{-d}$.
Proof.
Let $d$ be the total degree of $p$. Without loss of generality, we may assume $p$ has the monomial $x_1 x_2 \cdots x_d$ with a nonzero coefficient. Consider any assignment to the remaining variables $x_{d+1}, \ldots, x_n$. The resulting polynomial $q$ still has the monomial $x_1 \cdots x_d$ with the same coefficient, and so it is a nonzero multilinear polynomial of degree $d$ in $d$ variables. It remains to show that $q$ is nonzero on at least one of the $2^d$ assignments to the remaining variables $x_1, \ldots, x_d$, which will imply as desired that $\Pr_{x \in \{0,1\}^n}[p(x) \neq 0] \geq 2^{n-d}/2^n = 2^{-d}$.
Let $m$ be a monomial in $q$ of lowest degree with a nonzero coefficient, let $c$ be its coefficient, and let $V$ be the set of variables in $m$ (it may be that $m$ is the constant term, in which case $V = \emptyset$). Then, consider the assignment which sets the variables in $V$ to $1$, and the remaining variables to $0$. Our polynomial $q$ must evaluate to $c$ on this assignment, since monomial $m$ will have all variables set to $1$, and all other monomials will have at least one variable set to $0$ by how we chose $m$. This is the desired assignment since $c \neq 0$.
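A brute-force check (our code) of the bound in Lemma 13 over the integers: random nonzero multilinear polynomials of degree $d$ are nonzero on at least a $2^{-d}$ fraction of $\{0,1\}^n$.

```python
import random
from itertools import product

n = 6
for _ in range(200):
    # random nonzero multilinear polynomial: exponent tuple -> coefficient
    monos = {tuple(random.randint(0, 1) for _ in range(n)):
             random.randint(-2, 2) for _ in range(5)}
    monos = {m: c for m, c in monos.items() if c != 0}
    if not monos:
        continue
    d = max(sum(m) for m in monos)   # formal total degree
    nonzero = sum(
        1 for x in product([0, 1], repeat=n)
        if sum(c * all(x[i] for i in range(n) if m[i])
               for m, c in monos.items()) != 0
    )
    assert nonzero >= 2 ** (n - d), (monos, nonzero)
print("bound Pr[p != 0] >= 2^-d verified on random examples")
```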
By carefully examining our proof of Theorem 16, we see that we only used the fact that we were working with the $\{0,1\}$ basis in three ways:
-
We need that substituting a value in $\{0,1\}$ for one variable in a monomial decreases the degree of that monomial by at least one; this is true regardless of what basis we are working over.
-
We need that the correct output when the AND is false is $0$, and that the correct output when the AND is true is any nonzero value; by subtracting a constant from our polynomial we can always assume this is the case, while changing the number of monomials by at most one and leaving the degree unchanged.
-
We need the Schwartz-Zippel lemma (Lemma 13) to hold for our choice of basis.
If we could generalize Lemma 13 to hold for any basis $\{a, b\}$, then our same proof would work as stated to prove Theorem 3. Unfortunately this is impossible in general: if $a, b$ are such that $a - b$ is a zero-divisor, then Lemma 13 does not hold when we use the $\{a, b\}$ basis. For example, working over $\mathbb{Z}_6$, with $a = 2$ and $b = 0$, the single-variable linear polynomial $3x$ is a nonzero polynomial, but is zero on all inputs from the $\{2, 0\}$ basis.
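Concretely (our instantiation of the zero-divisor counterexample):

```python
# Over Z_6 with basis {0, 2} (so a - b = 2 is a zero divisor with 3),
# the nonzero polynomial p(x) = 3x vanishes on both basis values.
p = lambda x: (3 * x) % 6
print([p(v) for v in (0, 2)])   # [0, 0], yet p is not the zero polynomial
```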
Although it is possible to salvage this method, at least for most choices of $a$ and $b$, we will instead proceed in a different way, which does not rely on how our original proof works, except for the fact (as remarked) that Theorem 3 holds when the input basis (what values we input to the polynomial for true and false) is $\{0, 1\}$, and the output basis (what values the polynomial outputs for true and false) is $\{0, v\}$ for any nonzero $v$.
We begin with the input basis $\{0, b\}$ for any nonzero $b \in R$, where true maps to $b$ but false still maps to $0$.
Lemma 47.
For every commutative ring $R$, every nonzero $b \in R$, and all sparsity and degree parameters as in Theorem 16, there is an $\epsilon > 0$ such that for all sufficiently large $n$, the AND function on $n$ variables has no probabilistic polynomial over $R$ with respect to the $\{0, b\}$ input basis achieving that sparsity, that degree, and error $\epsilon$ simultaneously.
Proof.
For given sparsity and degree parameters, we make the same choice of $\epsilon$ as in Theorem 16. Assume to the contrary that AND has such a probabilistic polynomial over the $\{0, b\}$ input basis, and let $\mathcal{P}$ be the corresponding probabilistic polynomial. We will convert $\mathcal{P}$ into a representation over the $\{0, 1\}$ input basis, which will contradict Theorem 16.
Draw a polynomial $p$ from $\mathcal{P}$, and consider any nonzero monomial $m$ of $p$, with degree $k$ and nonzero coefficient $c$. If any of the variables in $m$ is assigned to $0$, then $m$ evaluates to $0$, and otherwise it evaluates to $c \cdot b^k$.
Consider the new polynomial $p'$, given by
$p'(x) \;=\; \sum_{\text{monomials } m \text{ of } p} b^{\deg(m)}\, m(x).$
In other words, we have multiplied each monomial $m$ from $p$ by $b^{\deg(m)}$. Now, for any $x \in \{0,1\}^n$, let $x' \in \{0, b\}^n$ be given by $x'_i = b \cdot x_i$. Then we always have that $p'(x) = p(x')$. Since $x$ corresponds to the same true/false assignment over the $\{0,1\}$ basis as $x'$ does over the $\{0, b\}$ basis, this means that the resulting distribution on $p'$ is a probabilistic polynomial with input basis $\{0,1\}$ and output basis $\{0, b\}$ for AND with the same sparsity, degree, and error as $\mathcal{P}$, as desired.
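A small sketch (our code) of this conversion: multiplying each monomial $m$ by $b^{\deg(m)}$ yields $p'$ with $p'(x) = p(b \cdot x)$, so $\{0,1\}$ inputs to $p'$ simulate $\{0, b\}$ inputs to $p$, with the monomial count and degree unchanged.

```python
from fractions import Fraction
from itertools import product

b = Fraction(3)
p = {(1, 2, 0): Fraction(2), (0, 0, 1): Fraction(-1)}   # 2*x0*x1^2 - x2
p_prime = {m: c * b ** sum(m) for m, c in p.items()}    # m -> b^deg(m) * m

def evaluate(poly, xs):
    total = Fraction(0)
    for m, c in poly.items():
        term = c
        for e, x in zip(m, xs):
            term *= x ** e
        total += term
    return total

for x in product([0, 1], repeat=3):
    xs = [Fraction(v) for v in x]
    assert evaluate(p_prime, xs) == evaluate(p, [b * v for v in xs])
print("p'(x) = p(bx) on {0,1} inputs; sparsity and degree unchanged")
```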
Finally, we generalize to all input bases $\{a, b\}$.
Proof of Theorem 3.
For given sparsity and degree parameters, let $\epsilon$ be the choice which is made by Lemma 47 when applied with a sparsity bound that is $2^d$ times larger, where $d$ is the degree parameter (note $a - b \neq 0$ since $a \neq b$, so Lemma 47 applies); we will choose the same $\epsilon$ here. Similar to before, our plan is to convert from an $\{a, b\}$-basis representation to a $\{0, a-b\}$-basis representation, and then to invoke Lemma 47.
Assume to the contrary that AND has such a probabilistic polynomial with respect to the $\{a, b\}$ basis, and let $\mathcal{P}$ be the corresponding probabilistic polynomial. Draw a polynomial $p$ from $\mathcal{P}$, and consider the new polynomial $p'$ given by
$p'(x_1, \ldots, x_n) \;=\; p(x_1 + b,\; x_2 + b,\; \ldots,\; x_n + b).$
Our desired probabilistic polynomial over the input basis $\{0, a - b\}$ will be the resulting distribution on $p'$. The polynomial $p'$ clearly has the same degree as $p$, and errs on the same true/false inputs as $p$, since the input $x \in \{0, a-b\}^n$ is mapped to the input $x + (b, \ldots, b) \in \{a, b\}^n$ representing the same truth assignment.
Consider any nonzero monomial $m$ of $p$, with degree $k$ and nonzero coefficient $c$. Each variable in $m$ has been replaced by the sum of two terms, and so $m$ has been replaced by at most $2^k$ monomials when expanded out. Since $m$ had degree at most $d$, we have that $2^k \leq 2^d$, and so since $p$ had at most $s$ monomials, we have that $p'$ has at most $s \cdot 2^d$ monomials. Our probabilistic polynomial is therefore a representation with sparsity at most $2^d$ times that of $\mathcal{P}$, as desired.
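A small sketch (our code, using sympy) of this final substitution and the sparsity blowup it incurs; the polynomial and the values of $a$ and $b$ are arbitrary choices for illustration.

```python
import sympy as sp

x0, x1 = sp.symbols('x0 x1')
a, b = 5, 2
p = x0**2 * x1 + 4 * x1   # degree d = 3, two monomials
# replace each variable x_i by x_i + b; each degree-k monomial expands
# into at most 2^k monomials, so sparsity grows by a factor of at most 2^d
p_shift = sp.expand(p.subs({x0: x0 + b, x1: x1 + b}, simultaneous=True))
print(p_shift, "| monomials:", len(p_shift.as_ordered_terms()))

# {a-b, 0} inputs to p_shift agree with the corresponding {a, b} inputs to p
for u, v in [(a - b, 0), (0, a - b), (a - b, a - b), (0, 0)]:
    assert p_shift.subs({x0: u, x1: v}) == p.subs({x0: u + b, x1: v + b})
```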
We note finally that, although we have been limiting ourselves to the same input basis for all of the variables, we could actually choose a different input basis for each variable, and then another output basis as usual, and the above proof still holds.