Direct Sums for Parity Decision Trees
Abstract
Direct sum theorems state that the cost of solving $k$ instances of a problem is at least $k$ times the cost of solving a single instance. We prove the first such results in the randomised parity decision tree model. We show that a direct sum theorem holds whenever (1) the lower bound for parity decision trees is proved using the discrepancy method; or (2) the lower bound is proved relative to a product distribution.
Keywords and phrases: direct sum, parity decision trees, query complexity
Funding: Tyler Besselman: Supported by the National Natural Science Foundation of China Grant No. 62102260, NYTP Grant No. 20121201, and NYU Shanghai Boost Fund.
Copyright and License: Weiqiang Yuan; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification: Theory of computation → Oracles and decision trees
Acknowledgements: We thank Farzan Byramji for useful comments on an earlier version of this paper.
Editors: Srikanth Srinivasan
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
One of the most basic questions that can be asked for any model of computation is:
How does the cost of computing $k$ independent instances scale with $k$?
A direct sum theorem states that if the cost of solving a single copy is $c$, then solving $k$ copies has cost at least $\Omega(k \cdot c)$, which matches the trivial algorithm that solves the $k$ copies separately. Direct sums have been studied exhaustively for randomised query complexity $\mathsf{R^{dt}}$, randomised communication complexity $\mathsf{R^{cc}}$, and other concrete models of computation; see Section 1.3 for prior work. In this work, we initiate the study of direct sum problems for randomised parity decision tree complexity $\mathsf{R^{pt}}$, a computational model sandwiched between the widely-studied $\mathsf{R^{dt}}$ and $\mathsf{R^{cc}}$.
Parity decision trees
Parity decision trees generalise the usual notion of decision trees by allowing parity queries. To compute a function $f\colon\{0,1\}^n\to\{0,1\}$ on input $x\in\{0,1\}^n$, a deterministic parity decision tree $T$ performs queries of the form "what is $\langle q, x\rangle \bmod 2$?" where $q\in\{0,1\}^n$ and $\langle q,x\rangle = \sum_i q_i x_i$. Once enough queries have been made, $T$ outputs $f(x)$. Parity decision trees are more powerful than ordinary decision trees: We have $\mathsf{D^{pt}}(f) \leq \mathsf{D^{dt}}(f)$ where $\mathsf{D^{dt}}(f)$ (resp. $\mathsf{D^{pt}}(f)$) denotes the (parity) decision tree complexity of $f$, defined as the least depth of a deterministic (parity) decision tree computing $f$. On the other hand, the $n$-bit XOR function is an example where $\mathsf{D^{dt}}(\mathrm{XOR}_n) = n$ while $\mathsf{D^{pt}}(\mathrm{XOR}_n) = 1$. We define a randomised parity decision tree $\mathcal{T}$ as a distribution over deterministic parity trees $T$. Then $\mathsf{R^{pt}_\epsilon}(f)$ is defined as the worst-case depth (over both the input and the randomness of the tree) of the best randomised parity tree computing $f$ with error $\epsilon$, that is, $\Pr_{T\sim\mathcal{T}}[T(x)\neq f(x)] \leq \epsilon$ for all $x$. As usual, we let $\mathsf{R^{pt}}(f) := \mathsf{R^{pt}_{1/3}}(f)$. To simplify notation, we drop the superscript pt and write $\mathsf{R}$ and $\mathsf{D}$ for short.
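To make the query model concrete, the following minimal Python sketch (our illustration, not code from the paper) evaluates a parity query $\langle q, x\rangle \bmod 2$ and implements the depth-1 parity tree for the $n$-bit XOR function mentioned above.

```python
# Minimal sketch of the parity-query model (our illustration, not the paper's code).
def parity_query(q, x):
    """Answer the query <q, x> mod 2 for q, x in {0,1}^n."""
    return sum(qi & xi for qi, xi in zip(q, x)) % 2

def xor_tree(x):
    """Depth-1 parity decision tree for the n-bit XOR function:
    a single query to the all-ones vector already determines the answer."""
    return parity_query([1] * len(x), x)

assert xor_tree([1, 0, 1, 1]) == 1   # the XOR of these bits is 1
assert xor_tree([1, 0, 1, 0]) == 0
```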
Our main research question is now formulated as follows. Let $f^k$ denote the direct sum function that takes $k$ instances and returns the value of $f$ on each of them, $f^k(x_1,\dots,x_k) := (f(x_1),\dots,f(x_k))$. We study the following question.
Question 1.
Do we have $\mathsf{R}(f^k) \geq \Omega(k)\cdot\mathsf{R}(f)$ for every function $f$?
We show two (incomparable) main results: We answer Question 1 affirmatively when the randomised parity decision tree lower bound is proved using the discrepancy method (Section 1.1), or when the lower bound is proved relative to a product distribution (Section 1.2).
1.1 First result: Direct sum for discrepancy
Discrepancy is one of the oldest-known methods for proving randomised communication lower bounds [56, 3]. Let us tailor its definition to the setting of randomised parity trees. Thinking of $\{0,1\}^n$ as the vector space $\mathbb{F}_2^n$, consider some affine subspace $A\subseteq\mathbb{F}_2^n$ and a probability distribution $\mu$ over the inputs $\{0,1\}^n$. The discrepancy of $f$ measures how biased $f$ is on $A$. Namely, let $\mu_b(A) := \mu(A\cap f^{-1}(b))$ for $b\in\{0,1\}$. The difference $|\mu_0(A)-\mu_1(A)|$ is called the bias of $A$ under $\mu$. We define $\mathrm{disc}(f)$ as the minimum over distributions $\mu$ of the maximum bias an affine subspace $A$ can attain. Finally, the discrepancy bound is defined as $\mathsf{disc}(f) := \log(1/\mathrm{disc}(f))$. As in communication complexity, it is not hard to see that $\mathsf{R}(f) \geq \Omega(\mathsf{disc}(f))$; see Section 3 for details.
Theorem 1.
We have $\mathsf{R}(f^k) \geq \Omega(k\cdot\mathsf{disc}(f))$ for any function $f$.
In particular, if we have a function $f$ whose randomised parity decision tree complexity matches its discrepancy bound, $\mathsf{R}(f) = \Theta(\mathsf{disc}(f))$, then Theorem 1 shows $\mathsf{R}(f^k) \geq \Omega(k\cdot\mathsf{R}(f))$, answering Question 1 for that function. To prove Theorem 1, we first establish a particularly simple characterisation of $\mathrm{disc}$ that relies on affine spaces defined by a single constraint. We then prove a perfect direct sum (and even an XOR lemma) for discrepancy using Fourier analysis.
1.2 Second result: Direct sum for product distributions
The standard approach for proving randomised lower bounds is to use Yao's principle [55], which states that $\mathsf{R}_\epsilon(f) = \max_\mu \mathsf{D}_{\mu,\epsilon}(f)$. Here $\mathsf{D}_{\mu,\epsilon}(f)$ is the distributional $\epsilon$-error complexity of $f$, defined as the least depth of a (deterministic) parity tree $T$ such that $\Pr_{x\sim\mu}[T(x)\neq f(x)]\leq\epsilon$. We say that a distribution $\mu$ over $\{0,1\}^n$ is product if it can be written as the product of $n$ independent Bernoulli distributions. We define the best lower bound provable using a product distribution as $\mathsf{prod}_\epsilon(f) := \max\{\mathsf{D}_{\mu,\epsilon}(f) : \mu \text{ is a product distribution}\}$.
Our second result answers Question 1 affirmatively (modulo logarithmic factors) whenever the randomised parity decision tree lower bound is proved relative to a product distribution.
Theorem 2.
We have $\mathsf{R}(f^k) \geq \Omega(k/\log n)\cdot\mathsf{prod}(f)$ for any $n$-bit function $f$.
We show moreover that the $\log n$-factor loss in Theorem 2 can be avoided when the hard distribution is the uniform distribution (or more generally any bounded-bias distribution). One should compare this to the state-of-the-art in communication complexity, where the quantitatively best distributional direct sum results are also for product distributions and suffer logarithmic-factor losses [37, 4].
To prove Theorem 2, we introduce a new complexity measure tailored for product distributions, which we call skew complexity $\mathsf{S}$ and which we define precisely in Section 4. We prove that this new measure admits a perfect direct sum theorem, $\mathsf{S}(f^k) \geq k\cdot\mathsf{S}(f)$, and that it characterises the measure $\mathsf{D}_\mu$ up to an $O(\log n)$ factor. (We also show that the logarithmic loss is necessary for our approach: there is a function and a product distribution for which the skew measure is an $\Omega(\log n)$ factor smaller than the distributional complexity.) We give a more in-depth technical overview in Section 2.
Comparison of main results
We also show that our two main results (Theorems 1 and 2) are incomparable: For some functions $f$, our first result gives a much stronger lower bound for $\mathsf{R}(f^k)$ than the second result – and vice versa. See Section 7 for the proof.
Lemma 3.
The complexity measures $\mathsf{disc}$ and $\mathsf{prod}$ are incomparable:
-
1.
There is an -bit function such that while .
-
2.
There is an -bit function such that while .
1.3 Related work
Parity decision trees
Even though the direct sum problem for parity decision trees has not been studied before, the model has been studied extensively. Parity decision trees were first defined by Kushilevitz and Mansour [40] in the context of learning theory. Several prior works have studied their basic combinatorial properties [57, 46] as well as Fourier-analytic properties [28, 27], often with connections to the log-rank conjecture [54, 53, 48, 20, 32, 43]; see also the survey [39]. There are various lifting theorems involving parity decision trees: lifting from to [31], from to [19, 5, 1], and from to [52, 17]. These lifting theorems have played a central role in proving lower bounds for proof systems that can reason using parities [33, 22, 25, 11, 18, 2].
Decision trees
In the decision tree model with classical queries, a deterministic direct sum theorem, $\mathsf{D^{dt}}(f^k) \geq k\cdot\mathsf{D^{dt}}(f)$, and even the stronger composition theorem, $\mathsf{D^{dt}}(f\circ g) \geq \mathsf{D^{dt}}(f)\cdot\mathsf{D^{dt}}(g)$, are easy to show by combining adversary strategies [50]. In the randomised case, an optimal direct sum result, $\mathsf{R^{dt}}(f^k) \geq \Omega(k\cdot\mathsf{R^{dt}}(f))$, is known [38, 36, 21]. Whether a composition theorem holds for randomised query complexity, $\mathsf{R^{dt}}(f\circ g) \geq \Omega(\mathsf{R^{dt}}(f)\cdot\mathsf{R^{dt}}(g))$ (for total $f$ and $g$), is a major open problem [10, 6, 8, 7, 49]. In the randomised setting, it is possible that the direct sum problem requires strictly more than $k\cdot\mathsf{R^{dt}}(f)$ queries: if one wants to succeed in computing all $k$ copies with constant probability, then a naive application of the union bound would require each copy to be computed with error $O(1/k)$. Results stating that one sometimes has $\mathsf{R^{dt}}(f^k) = \Omega(k\cdot\mathsf{R^{dt}}_{1/k}(f))$ are called "strong" direct sum theorems [12, 13] and they sometimes hold even for composed functions [9, 16, 29].
Communication complexity
The direct sum question for deterministic communication complexity was posed in [23] and it remains a notoriously difficult open problem [34]. By contrast, in the randomised setting, the direct sum problem is characterised by information complexity [15], which has inspired a line of works too numerous to cite here; see [35, §1.1] for an up-to-date overview. One of the key findings is that a direct sum for communication protocols is false in full generality in the distributional setting [26, 47]. We leave open the intriguing possibility that the information complexity approach can be adapted to parity decision trees. Historically, one of the first direct sum theorems proved for randomised communication was for the discrepancy bound [51, 41] (analogously to our Theorem 1). Here, discrepancy is known to be equivalent to the -norm [42]. We also mention that a near-optimal direct sum theorem holds for product distributions [4] (analogously to our Theorem 2).
1.4 Open question: Deterministic direct sum
The main question left open by our work is Question 1, namely, whether admits a direct sum theorem for all functions . However, we would also like to highlight the analogous question in the deterministic case . As discussed above, this is a long-standing open problem in the case of deterministic communication complexity . The best results so far are:
We observe in Section A.1 that both approaches have analogues in the parity setting.
Theorem 4.
For any function and ,
-
1.
,
-
2.
.
We leave it as an open question whether a perfect direct sum theorem holds for deterministic parity decision trees. We think one should attack this problem before addressing the (presumably much harder) problem for deterministic communication complexity.
2 Technical overview
We focus here on our second main result, Theorem 2, which is technically the much more involved of the two. Our main technical result is the following direct sum result for distributional complexity. Here $\mu^k := \mu\times\cdots\times\mu$ ($k$ times).
Theorem 5.
There exists a universal constant such that the following holds. For any , product distribution over , and ,
When , Theorem 2 follows by taking . Indeed, let be the distribution achieving the maximum for . Using the easy direction of the minimax principle:
The remaining case is handled separately using ad-hoc methods in Lemma 38. We now give an overview of the proof of Theorem 5.
Warm-up: Uniform distribution
We showcase the basic proof technique by sketching the proof in the simple case where is the uniform distribution. Fix an -bit function and let be the uniform distribution over . In the uniform (and more generally in the bounded-bias) case, we are actually able to avoid the additive/factor loss and obtain, for all ,
| (1) |
Fix a parity decision tree $T$ of depth $d$ computing the $k$ copies of $f$ with error at most $\epsilon$ when the input is drawn from $U^{k}$. We show how to extract a tree that computes a single copy with comparable error and expected depth roughly $d/k$. Leaves of $T$ correspond to affine subspaces of $\mathbb{F}_2^{nk}$ of codimension at most $d$. More generally, one can associate with any node $v$ of $T$ the set $Q_v$ of linear constraints that led to the node ($|Q_v|$ is the depth of the node $v$; the root is at level 0) and the vector of desired values. The set of inputs that reach node $v$ is then the affine subspace on which all these constraints are satisfied.
Of relevance here are the pure constraints one can extract from $Q_v$. A pure constraint for copy $i$ is a non-trivial element of $\mathrm{span}(Q_v)$ supported only on the variables of copy $i$. To be more precise, the number of pure constraints that can be extracted for copy $i$ at node $v$ is the pure rank $\mathrm{prank}_i(v) := \dim(\mathrm{span}(Q_v)\cap V_i)$, where $V_i$ denotes the space of query vectors supported only on copy $i$.
We describe next two illustrative examples when there are $k=2$ copies.
-
1.
Node $v$ corresponds to constraints "$x^{(1)}_1 \oplus x^{(2)}_1 = 0$" and "$x^{(1)}_2 \oplus x^{(2)}_1 = 1$". Then $\mathrm{prank}_1(v) = 1$, as it is possible to extract the pure parity constraint $x^{(1)}_1 \oplus x^{(1)}_2 = 1$ by adding the two constraints. On the other hand, $\mathrm{prank}_2(v) = 0$.
-
2.
Node $v$ corresponds to constraints "$x^{(1)}_1 \oplus x^{(2)}_1 = 0$" and "$x^{(1)}_2 \oplus x^{(2)}_2 = 1$". Then $\mathrm{prank}_1(v) = 0$, as it is not possible to extract a pure constraint for the first copy (and likewise $\mathrm{prank}_2(v) = 0$).
Observation 6.
For any node $v$, we have $\sum_{i=1}^{k}\mathrm{prank}_i(v) \leq |Q_v|$.
As the second example highlights, it is possible for the inequality to be strict. This is a notable difference with classical decision trees: for any subcube, the numbers of bits fixed within each copy sum up to the total number of fixed bits in the subcube.
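The pure rank of a node reduces to plain linear algebra over $\mathbb{F}_2$: a pure constraint for copy $i$ is an element of $\mathrm{span}(Q_v)$ supported only on copy $i$, so $\mathrm{prank}_i(v)$ is the dimension of the kernel of the projection of $\mathrm{span}(Q_v)$ onto the coordinates of the other copies. The following Python sketch (our illustration; the helper names are not from the paper) computes it this way and reproduces the two examples above.

```python
# Illustrative sketch (ours, not the paper's code): computing pure ranks over GF(2).
def gf2_rank(vectors):
    """Rank over GF(2) of a list of 0/1 vectors (each a list of bits)."""
    basis = []
    for v in vectors:
        r = int("".join(map(str, v)), 2)
        for b in basis:
            r = min(r, r ^ b)          # reduce r against the current basis
        if r:
            basis.append(r)
    return len(basis)

def prank(queries, copy, n):
    """dim(span(queries) ∩ V_copy) for length-(n*k) query vectors, copy 0-indexed:
    equals rank(queries) minus the rank of their projections onto the other copies."""
    others = [q[:copy * n] + q[(copy + 1) * n:] for q in queries]
    return gf2_rank(queries) - gf2_rank(others)

# Coordinates ordered (x1^(1), x2^(1), x1^(2), x2^(2)): copy 1 first, then copy 2.
q1 = [1, 0, 1, 0]   # x1^(1) + x1^(2)
q2 = [0, 1, 1, 0]   # x2^(1) + x1^(2)
q3 = [0, 1, 0, 1]   # x2^(1) + x2^(2)
assert prank([q1, q2], copy=0, n=2) == 1 and prank([q1, q2], copy=1, n=2) == 0  # example 1
assert prank([q1, q3], copy=0, n=2) == 0 and prank([q1, q3], copy=1, n=2) == 0  # example 2
```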
Where to plant ?
The overarching idea of our result is that under the uniform distribution, queries that increase the pure rank for a copy are the only ones that bring usable information about that copy. It is thus enough to find a copy with low expected pure rank in $T$ and plant the real instance there. To make this precise, taking the expectation over leaves $\ell$ of $T$ when the input is drawn from $U^k$ and applying Observation 6 implies the existence of some copy $i$ with low expected pure rank, $\mathbb{E}_\ell[\mathrm{prank}_i(\ell)] \leq d/k$.
Let us fix this advantageous copy to be the first one. On input $x$ we run the tree $T$ with $x$ planted as the first copy and delay actual querying of bits of $x$ as much as possible. Suppose that the process has reached node $v$ with constraint set $Q_v$ and there is a new parity query $q$ to be answered. If $q \in \mathrm{span}(Q_v)$, the answer to that query can be deduced from previous answers (an optimised tree would not make such a query). If $q \notin \mathrm{span}(Q_v)$, we say that $q$ is critical for $v$ if it would increase the pure rank for the first copy. If $q$ is critical, there is no way to avoid making a parity query to the real input $x$ and our algorithm does so. If $q$ is not critical, it is enough to answer with a uniform random bit (that is, move to a random child of $v$ in $T$) without querying $x$ at all.
To see this, further split , where is the constraint for the first copy and is the constraint for the rest of the copies. If has and , it must be that . Since is drawn from the uniform distribution we thus have for any fixed consistent with :
| (2) |
Correctness and efficiency
Let us call the above randomised tree solving one copy $T'$. Correctness can be argued by showing that the distribution of leaves attained in the process for $T'$ is the same as the distribution of leaves attained by $T$ on an input drawn from $U^k$. On the other hand, $T'$ has small expected depth, as a real query to $x$ is only ever made $\mathrm{prank}_1(\ell)$ times on the path to each leaf $\ell$. In conclusion, $T'$ has the following guarantees:
-
1.
.
-
2.
.
Using Markov's inequality, it is possible to derandomise $T'$ to get a deterministic parity tree solving $f$ with a worst-case guarantee instead of an average-case one. This step introduces a parameter controlling a trade-off between cost and error and yields the desired result (1).
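For concreteness, the derandomisation step is the standard truncation argument (our rendering, with $\gamma$ as the trade-off parameter): if $T'$ has expected depth at most $d/k$ and expected error at most $\epsilon$ over a uniform input, then truncating every run after $\frac{1}{\gamma}\cdot\frac{d}{k}$ queries gives, by Markov's inequality,
$$\Pr\Bigl[\mathrm{depth}(T') > \tfrac{1}{\gamma}\cdot\tfrac{d}{k}\Bigr] \;\leq\; \gamma,$$
so the truncated tree has worst-case depth $\frac{d}{\gamma k}$ and error at most $\epsilon + \gamma$; fixing the best choice of the internal randomness then yields a deterministic parity tree with the same guarantees.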
2.1 Beyond uniform: The skew measure
Observe that (2) can fail badly for non-uniform $\mu$. As an illustrative example, suppose that two random bits are generated with $x_1 \sim \mathrm{Ber}(1/2)$ and $x_2 \sim \mathrm{Ber}(1/8)$. The constraint $x_1 \oplus x_2 = 0$ is not pure from the point of view of the first bit. However, since $x_2$ is skewed towards being 0, the realisation of the constraint gives information about $x_1$: $\Pr[x_1 = 0 \mid x_1\oplus x_2 = 0] = 7/8$. Thus, it seems one needs to query $x_1$ to answer the query even though the query is not critical for the first bit!
To circumvent this, we introduce the skew measure. This new measure is built around the observation that each bit of an input can be sampled independently in two steps. Indeed, the following process is equivalent to sampling $x_2 \sim \mathrm{Ber}(1/8)$:
-
1.
Let $\rho$ be "0" with probability 3/4 and $*$ with probability 1/4.
-
2.
If $\rho = 0$, return "0"; else return a uniformly random bit.
Note that if we are "lucky" and $\rho = *$, we are back in the uniform case and (2) holds again. If not, we have pre-emptively fixed the returned bit to the value 0. The skew measure explicitly splits product distributions into a random partial fixing followed by a uniform distribution over the unfixed bits. A tree computing in this model gets help from the fixing because it reduces the complexity of the function; on the unfixed bits, it is on the other hand easier to analyse the behaviour of the tree, as we are back in the uniform case.
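As a sanity check, the split used above is just a convex decomposition of a biased bit: a bit with $\Pr[x=1]=p\leq 1/2$ can be sampled by first fixing it to 0 with probability $1-2p$ and otherwise drawing it uniformly. A small exact-arithmetic sketch (ours, not the paper's):

```python
from fractions import Fraction

def two_step_prob_of_one(p):
    """Pr[bit = 1] under the two-step process: fix the bit to 0 w.p. 1 - 2p, else draw uniformly."""
    fix_prob = 1 - 2 * p
    return (1 - fix_prob) * Fraction(1, 2)

p = Fraction(1, 8)                   # the biased bit from the example above
assert 1 - 2 * p == Fraction(3, 4)   # pre-fixed to 0 with probability 3/4, as in the text
assert two_step_prob_of_one(p) == p  # the two-step process reproduces Ber(1/8)
```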
In Sections 5 and 6, we show a perfect direct sum for the skew measure and that, perhaps surprisingly, this new measure is only an $O(\log n)$-factor away from $\mathsf{D}_\mu$.
3 Direct sum for disc
The goal of this section is to prove Theorem 1, restated here for convenience. See 1 Let us start by defining discrepancy formally. We denote by the set of all affine subspaces of and the set of affine subspaces of codimension 1. Note that all spaces can be written as for some and .
Definition 7.
Let be a boolean function and be a distribution over . The (parity) discrepancy of with respect to is defined as:
The (parity) discrepancy of is where ranges over all distributions.
Observe that for all non-constant and by standard arguments, (see Lemma 42). Using the latter, the only thing left to get Theorem 1 is to prove a direct sum result for discrepancy. We do this in a very strong way by actually establishing an XOR lemma for disc. Let denote the function that takes instance and aggregates their result under using XOR, so that .
Lemma 8.
For any function , distribution and ,
This result is the strongest possible. Indeed, we cannot omit the “” on the right because of the counterexample : we have for any distribution . In Section A.3 we revisit this XOR lemma and show that it also holds in the distribution-free setting, with . As a final comment, we note that it is easier to work with instead of in the discrepancy setting, as it is somewhat tedious to define discrepancy for multi-valued functions. Before formally proving Lemma 8, we show how it is used to prove the main result Theorem 1.
Proof of Theorem 1.
Any decision tree computing can be converted to a decision tree computing . This is achieved by replacing the label of each leaf by its parity . This operation does not increase the error probability or cost and so, using the easy direction of Yao’s principle:
| (Lemma 41) | ||||
| (Lemma 42) | ||||
| (Lemma 8) | ||||
If , then the string of inequalities yields . If is constant, the claim is vacuously true. Finally, we show that for any non-constant , which completes the claim. Indeed, if , then .
To this end, let be a non-constant function and a distribution over which is balanced over -inputs and -inputs, i.e. . Let be the best deterministic parity decision tree for and suppose toward contradiction that it has strictly less than leaves. Let be the set of solutions which appear as a label on a leaf of . We have and since is balanced, any solution is equally likely so that:
Thus, errs with probability : a contradiction. We now proceed to prove Lemma 8 in three steps.
3.1 Step 1: Characterisation of discrepancy
Much like discrepancy for communication protocols can be characterised by a matrix norm of the communication matrix [51, 42], we show that the parity discrepancy of $f$ with respect to $\mu$ is characterised by the largest Fourier coefficient of a related function $g$. This characterisation has two purposes. First, proving an XOR lemma requires exploring all the possible ways for the copies to sum to 1. This kind of convolution operation is greatly simplified in the Fourier domain, where it simply corresponds to standard multiplication. Second, the characterisation is also quite convenient for proving lower bounds on $\mathrm{disc}$ (which we do in Sections 7 and 8): it shows that the maximum bias is (almost) attained for affine spaces of codimension 1 already.
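For intuition, the codimension-1 characterisation can be checked by brute force on small examples. In the Python sketch below (ours, not the paper's code), we take the associated function to be $g(x) = 2^n\,\mu(x)\,(-1)^{f(x)}$, which is one natural normalisation under which $\hat g(\gamma) = \sum_x \mu(x)(-1)^{f(x)\oplus\langle\gamma,x\rangle}$; the exact normalisation used in Lemma 9 may differ by constant factors.

```python
from itertools import product

def max_fourier_coeff(g, n):
    """max_gamma |hat g(gamma)|, with hat g(gamma) = E_x[g(x) (-1)^<gamma,x>]."""
    pts = list(product([0, 1], repeat=n))
    return max(abs(sum(g[x] * (-1) ** sum(a * b for a, b in zip(gamma, x)) for x in pts)) / 2 ** n
               for gamma in pts)

def max_codim1_bias(f, mu, n):
    """Max over subspaces A = {x : <gamma,x> = b} of |mu(A ∩ f^-1(0)) - mu(A ∩ f^-1(1))|."""
    pts = list(product([0, 1], repeat=n))
    return max(abs(sum(mu[x] * (-1) ** f[x] for x in pts
                       if sum(a * c for a, c in zip(gamma, x)) % 2 == b))
               for gamma in pts for b in (0, 1))

# Tiny example: f = AND on 2 bits, mu = uniform.
n = 2
pts = list(product([0, 1], repeat=n))
f = {x: int(all(x)) for x in pts}
mu = {x: 1 / 4 for x in pts}
g = {x: (2 ** n) * mu[x] * (-1) ** f[x] for x in pts}   # assumed normalisation of g
# The two quantities agree up to a factor of 2 (cf. Lemma 9).
assert max_codim1_bias(f, mu, n) <= max_fourier_coeff(g, n) + 1e-9
assert max_fourier_coeff(g, n) <= 2 * max_codim1_bias(f, mu, n) + 1e-9
```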
The function
We relate a real-valued boolean function $h\colon\{0,1\}^n\to\mathbb{R}$ with its Fourier transform $\hat h$ using the usual character basis:
$$\hat h(\gamma) \;=\; \mathbb{E}_{x\sim U}\bigl[h(x)\,(-1)^{\langle\gamma,x\rangle}\bigr] \qquad \text{[Fourier transform]}$$
$$h(x) \;=\; \sum_{\gamma\in\{0,1\}^n} \hat h(\gamma)\,(-1)^{\langle\gamma,x\rangle} \qquad \text{[Inverse Fourier transform]}$$
See also [45] for more background on Fourier analysis. We use $\|\hat h\|_\infty$ to denote the maximum absolute value of a Fourier coefficient of $h$. To analyse $\mathrm{disc}_\mu(f)$, we introduce an associated function $g = g_{f,\mu}$ (a signed, $\mu$-weighted version of $f$) and prove the following characterisation.
Lemma 9.
For every function and distribution over :
Proof.
The first inequality holds immediately because . For the second, fix a maximizing . Suppose that and fix its constraints and for so that . Observe that the vectors are linearly independent. Let so that and observe that
We focus on analysing terms . Let and observe that whenever , . Indeed, if is a linear combination of in :
On the other hand, for all . Indeed, Letting we have . Because , the constraint splits in half and thus . Factoring in those observations, we get:
Recall that has codimension and as such and , implying the desired inequality . We now prove the third inequality of the lemma. Fix any maximum Fourier coefficient and observe:
Fix the maximizing argument to and define . Note that and as such:
3.2 Step 2: Direct sum for the maximum Fourier coefficient
The outer-product of functions is defined as the function with . Next is a direct sum result for its max Fourier coefficient.
Claim 10.
For any , .
Proof.
Let ; for any , the definition of Fourier transform implies
From this, the equivalence is immediate:
3.3 Step 3: Conclusion
Proof of Lemma 8.
Let be the function associated with and in Lemma 9. It is possible to express as the -fold outer-product of : . Indeed, for , we have:
Thus, using the characterisation of Lemma 9 and Claim 10 times:
The XOR-lemma follows directly. We now show the other direction, . To do so, fix some maximizing and define which is concatenation of copies of . Formally:
Now, it is easy to check that and the claim follows.
4 Direct sum for $\mathsf{D}_\mu$ part I: proof organisation
The goal of this section is to prepare the ground for a proof of our main technical contribution: a direct sum for parity trees in the distributional setting (restated below). See 5 As stated in Section 2, this is sufficient to prove Theorem 2 whenever . The remaining case is proved in Lemma 38 in Section A.2. We thus focus on proving Theorem 5 in the next two sections (this section is devoted to introducing the necessary definitions and technical lemmas).
4.1 Two strengthenings of Theorem 5
For technical convenience, we study distributional complexity for randomised trees. For a deterministic parity tree we let be the number of queries made by on input . If is a randomised tree and is a distribution, we define and in the natural way with:
Finally, we define . It is clear that but a converse result is more complicated, as the derandomisation can increase both the error and the depth simultaneously.
Claim 11.
For any , over and , .
We delay a proof of this folklore fact to Section A.4. We also refer readers to [36] which proves the analogue for ordinary decision trees. With this tool in hand, we can reduce Theorem 5 to the following theorem.
Theorem 12.
There exists a universal constant such that the following holds. For any , product distribution , and ,
Definition 13.
We say that a product distribution over is -bounded for some if for every .
In the next sections, we also show the following qualitative improvement over Theorem 12 for bounded distributions.
Theorem 14.
For any , -bounded distribution and ,
Let us highlight the difference between Theorem 12 and Theorem 14: the latter is free from both the factor and the extra error . This theorem is especially interesting when the hard distribution for the function at hand (e.g. MAJ) is close to the uniform one.
4.2 The Skew measure
For the rest of this paper, we let be the uniform distribution. Let be a distribution over and . We use to denote the mass of with respect to . When , we let be the distribution of conditioned on . Let be a partial assignment corresponding to the sub-cube . We use to denote .
4.2.1 Random partial fixings
Let be a product distribution over . We say that is -biased if for every . For the rest of the paper, we will assume without loss of generality that any encountered input distribution is -biased. Indeed, should not be -biased, we can apply the following iterative transformation. Let and . For every , if – the coordinate is already biased in the right direction – we simply leave and . Otherwise, let:
Observe that is -biased and for every and . Now that we are certain that is -biased, let . We define next the random partial fixing distribution with respect to . The intuition comes from the observation that each bit of can be written as a convex combination of the fixed bit “” and a uniform bit.
Definition 15 (Random Partial Fixing).
The random partial fixing with respect to , denoted , is a distribution of partial assignments sampled as follows: For each , we set independently
Observe that the following alternative two-step process is equivalent to sampling an input directly from . First, sample and then sample and return .
4.2.2 The new measure
Given a parity decision tree and a partial assignment over the input string, let denote the pruned by
-
1.
fixing all the variables in the support of ,
-
2.
removing redundant queries (those can be written as a linear combination of previous queries).
For randomised parity decision tree , we define as the distribution of , where .
Definition 16.
For every randomised parity decision tree and product distribution , define the skew average cost . Let be a function. For , we define the skew measure with:
Claim 17.
For any , product distribution , and , . Furthermore, equality holds if .
Proof.
The claim is immediate as for every randomised parity tree and product distribution .
4.3 Proof plan
The proofs of Theorems 12 and 14 are carried out in two steps. First, we prove a perfect direct sum for the skew measure in Section 5.
Theorem 18.
We have for any function , product and .
As a second step, we demonstrate in Section 6 that . We first prove a lossless conversion for product distribution which are constant-bounded. We then extend this to general product distributions for which we lose a -factor. Let us recall here that the loss for general (unbounded) product distribution is inherent to the skew measure. Indeed, we show in Section 8 the existence of some and for which but .
Theorem 19.
For any , product distribution , , we have
Theorem 20.
For any and -bounded product distribution , we have
Combining the results above it is now straightforward to conclude and prove Theorems 12 and 14. For instance, the proof of the former goes as follows.
Proof of Theorem 12.
| (Claim 17) | |||||
| (Theorem 18) | |||||
| (Theorem 19) | |||||
4.4 Some notation
Let us finish this section by defining some notations which will be useful for the rest of the paper. Let be a parity decision tree on input . We define as the set of nodes of and as the set of leaves of . For each node , we define the following: (items marked with are only defined for non-leaf nodes)
-
: the set of nodes on the root-to- path (including the root, excluding )
-
: the depth of
-
: the query made at node
-
the child of corresponding to the query outcome , where
-
: an boolean matrix with column vectors
-
of dimension .
-
: the labels on the root-to- path
For every boolean matrix , we use to denote the rank of (understood as a matrix over and let be the column space of . For every , let stand for the sub-matrix of consisting of row with indices in S. For every and , we denote by .
Let and be two distributions over . We use to denote the total variation distance between and and write if .
5 Direct sum for $\mathsf{D}_\mu$ part II: direct sum for S
In this section, we prove a perfect direct sum for S (restated below). A direct consequence of this fact is a perfect direct sum for distributional parity query complexity under the uniform distribution. See 18
Corollary 21.
We have for any function and .
Proof.
Combine Claim 17 with Theorem 18. To prove Theorem 18, our overall strategy is to take a tree achieving and extract a tree computing a single copy of under to within error while having cost bounded by . To do so, we employ the extraction strategy hinted at in Section 2. The extractor works as long as the input distributions are uniform, which is the case after the random partial fixing step of S.
5.1 Extracting a single instance under uniform distributions
Let be a deterministic parity tree taking inputs and returning labels in . We assume without loss of generality that the queries along any root-to-leaf path are linearly independent. Let be the label associated with the leaf . For , we define the linear subspace of query vectors that are zero everywhere except for copy :
We say a node is critical with respect to if and denote the set of critical indices at node with . Finally, we let be the relative depth of with respect to instance and highlight that . The algorithm which extracts a tree for the -th instance out of is described in Algorithm 1.
Observe that it is indeed possible to compute the value of from and on line 5: Since , we have . On the other hand, as , we have . Thus , which means that can be written as a linear combination of the columns of : where are some ancestors of . This in turn implies that .
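The reconstruction described above is again just linear algebra over $\mathbb{F}_2$: when the new query lies in the span of the previously asked queries, its answer is forced, namely the same linear combination applied to the previously recorded answers. A sketch of this step (ours, not the paper's code):

```python
# Sketch (ours): if query q lies in span of prior queries, derive its forced answer.
def solve_in_span(prior_queries, prior_answers, q):
    """Return the forced answer <q, x> mod 2 if q is in span(prior_queries),
    given the recorded answers <q_j, x>; return None if q is independent."""
    n = len(q)
    # Gaussian elimination over GF(2) on augmented rows (query | answer).
    rows = [list(pq) + [a] for pq, a in zip(prior_queries, prior_answers)]
    target = list(q) + [0]
    pivots = []
    for row in rows:
        r = row[:]
        for p, prow in pivots:
            if r[p]:
                r = [a ^ b for a, b in zip(r, prow)]
        lead = next((i for i in range(n) if r[i]), None)
        if lead is not None:
            pivots.append((lead, r))
    for p, prow in pivots:
        if target[p]:
            target = [a ^ b for a, b in zip(target, prow)]
    if any(target[:n]):
        return None                      # q is not in the span: a real query is needed
    return target[n]                     # forced answer

# Example: knowing x1+x2 = 1 and x2+x3 = 0 forces x1+x3 = 1.
assert solve_in_span([[1, 1, 0], [0, 1, 1]], [1, 0], [1, 0, 1]) == 1
```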
We stress that although is a deterministic tree, is a randomized decision tree with internal randomness inherited from the bits . Our main technical claim is that for any fixed , the algorithm perfectly simulates a run of when the input is on a random input and . In a nutshell, the randomness of the other instances can be substituted with the internal randomness . To make this precise, we let be the set of inputs leading to the node .
Claim 22.
For any , .
Proof.
Let us fix and for simplicity. We establish an alternative description of that puts pure constraints on instance 1 first. Pick independent vectors and extend them arbitrarily to a basis of . As each vector of this basis can be expressed as a linear combination of , it is possible to apply those linear combinations to and obtain values such that . The set of inputs that can reach node in a run of thus corresponds to
If , the statement follows directly as both probabilities are zero. However, if ,
This is so because a node can only be reached by having the “right” coin tosses of (provided that ). Thus, it remains to show that if .
Let and be the indices of the bits of every copy but the first one. Fix the boolean matrix and observe that by construction. We show that too. If , we can find a non-empty set such that . This implies that . But is linearly independent of – this contradicts . Therefore, if , we use this observation to conclude:
5.2 Proof of Theorem 18
We are now ready to show Theorem 18. Let be a randomised parity decision tree which witnesses . For each , define the randomized decision tree with:
-
1.
Sample .
-
2.
Sample .
-
3.
Let .
-
4.
Return .
We show in Lemma 23 that simultaneously for all . On the other hand, we show in Lemma 24 that . By an averaging argument, this shows the existence of a copy with cost and therefore . The remainder of this section is devoted to proving both claims.
Lemma 23.
For every , .
Proof.
It is enough to prove the statement assuming is a deterministic parity tree and . Let be the distribution of in the step 3 of generating . Fix some and note that . We also define . Using Claim 22 on a leaf yields:
Thus:
Observe now that for any , we have . Using the definition of thus yields:
Lemma 24.
.
Proof.
It is sufficient to prove this for the case where is a deterministic tree . We have:
where the third equality is due to the fact that the operations of applying and fixing variables are commutable. Let be a partial fixing and . The probability that node is visited during the process when the input is is . Observe that only makes queries to to reach . As such, we have:
The inequality is due to the fact that . This is because for each and so
To conclude, we have
6 Direct sum for $\mathsf{D}_\mu$ part III: from S to $\mathsf{D}_\mu$
In this section, we show how to convert parity tree of the model to the more common model and prove Theorems 19 and 20. Let us fix for this section a boolean function together with some -biased product distribution over . Let be a deterministic parity tree trying to solve against . We begin by establishing an alternative view of the quantity . For any fixed , define the product distribution over with:
| (3) |
Sampling , and completing for all is equivalent to first sampling and then some . One can therefore see the process of as follows:
-
1.
Sample , .
-
2.
Run on .
-
3.
Every time attempts to make a query, check if simplifies the query: .
We describe this alternative view in detail in Algorithm 2. With this new interpretation, we can recast the quantity with
| (4) |
The idea to convert algorithms to ones is to simulate the process of Algorithm 2 by maintaining an incomplete but consistent view of . Initially, – i.e. nothing is known about – and we gradually update based on the queries we get. For instance, if , then ˜3 asserts . This scheme helps to relate the cost of the converted algorithm with . The description of the converted algorithm is given in Algorithm 3.
Definition 25.
Let be a fixing. The following are subsets of indices:
We also write to mean and likewise for other sets.
Let be the set of all possible that could be at the start of an iteration of Algorithm 3 at node . We now prove an invariant of Algorithm 3 and then its correctness.
Lemma 26.
For any state and that Algorithm 3 could be in at the start of a while iteration (line 3), it holds that:
Proof.
We prove the claim by induction on . The statement is true when is the root because both and are empty. Let us now assume that the statement is true for some and and prove that the invariant carries over to the next iteration regardless of the query outcomes and the randomness of the process. If is the updated value of at line 19, this amounts to showing that . We consider three cases.
Case .
Then, there is no update for and . Since for all , we have , as desired.
Case and for all .
Then, and . By definition of , we still have for all , so .
Case and for some .
Then and it must hold that . On the other hand,
Where the inequality follows from the fact that . Finally, this implies .
Lemma 27.
For any , .
Proof.
It is not hard to see that if Algorithm 3 gets the correct value of at each iteration of the while loop, it perfectly simulates . Thus, it suffices to show that whenever , the algorithm can compute the value of from the previous query outcomes. Lemma 26 and its proof implies that if , then . Thus can be written as a linear combination of column vectors of . Namely, , where are some ancestors of . On the other hand, we know that for all . Consequently, we have
Thus, Algorithm 3 follows the same path of vertices as , irrespective of the randomness . Consequently, its outputs correspond to the one of . We now turn our attention to the efficiency of Algorithm 3. We shall start with the special case of being a constant-bounded distribution. In this particular case, we obtain a lossless conversion. We then turn our attention to general product distributions, for which Algorithm 3 suffers a factor. This loss factor is inherent to reducing to as Section 8 shows.
6.1 Conversion for constant-bounded distribution
We now prove a strong efficiency result for Algorithm 3 in the special case where is -bounded (see Definition 13). A proof of our goal (Theorem 20) then follows easily.
Lemma 28.
We have .
Before proving this, we need an alternative view of the randomness used in the for-loop of Algorithm 3 (line 8 to 16). At the start of the process, a random partial fixing is generated. The algorithm is then deterministic: whenever some is queried in the for-loop, this is replaced by a query to . The algorithm updates with and exits the loop if . This process is given in detail in Algorithm 4.
Note that as is a product distribution, one can actually implement Algorithm 4 without querying all of at the start. Indeed, it is enough to query whenever one needs the value of , similarly to Algorithm 3. This implies that both processes are equivalent.
Suppose one runs Algorithm 4 on and . Fix some state the algorithm could be in at the start of the while loop (line 5). We let be the distribution of conditioned on reaching state . Furthermore, for a fixed and reachable with we let be the marginal distribution of conditioned on reaching state and . We now develop explicit formulations for those distributions.
Explicit definition of
Let be the distribution over defined as follows:
-
1.
For all , fix .
-
2.
For all , sample .
-
3.
Determine by solving
Explicit definition of
Let be the product distribution over defined as follows:
-
1.
For all such that , let with probability and else.
-
2.
For all such that , fix .
-
3.
For all , fix .
Claim 29.
For every reachable state and in Algorithm 4, we have
-
1.
;
-
2.
.
We delay the proof of this technical lemma to Section A.5. We can now prove the efficiency of our algorithm for -bounded distributions.
Proof of Lemma 28.
To relate Algorithm 2 with Algorithm 4, it is helpful to insert the book-keeping of in Algorithm 2 (lines 5 to 16, without 10) in between Algorithms 2 and 4 of Algorithm 2. This doesn’t change the number of queries or guarantees of Algorithm 2 but now both processes share the same state space over . For and , define and as the number of queries each process makes:
Using ˜4, it is thus enough to prove that when and . We have:
As both algorithms follow the same path in the state space, this expectation can be computed with respect to the code of Algorithm 4. Fix some state and observe that if there exists some such that , then by Lemma 26,
Therefore, for and , we have
The last equality is due to the fact that for all , if then . Let . We can now substitute for and for using Claim 29:
Thus, if and , we have
We now bound the expected number of queries made by . When , skips making a query at . On the other hand, when , the algorithm goes over and stops making queries as soon as it hits some . This probability is independent for each and can be computed explicitly using Claim 29. For and :
Therefore, if and ,
With this in hand, we can now prove Theorem 20, which we restate below for convenience. See 20
Proof.
Let be a randomised parity tree such that and . Define to be the randomised algorithm obtained by sampling and returning Algorithm 3 applied to . Using Lemma 27, we immediately obtain that . On the other hand:
Thus, , as desired.
6.2 Conversion for general product distribution
Algorithm 3 is not efficient for arbitrary product distribution since queries can be very biased so that . In such cases, we cannot even afford to pay one query as the corresponding expected increment for is .
To overcome this obstacle, we introduce the following idea. Run the algorithm as if every query returned , i.e. assuming for all (this is likely to happen for very biased distributions). This generates a list of indices for which we assume . Upon reaching a leaf, we check efficiently whether one of those is actually . If no such exists, we’re done – at the cost of no real queries! On the other hand, if a is found, we backtrack to this state and restart the procedure. Since we’ve found , it must be that and the algorithm has to pay one query there.
The process that “runs assuming ” and produces a list of indices to check is described in Algorithm 6. Then, the updated algorithm for converting an algorithm to a one is formulated in Algorithm 5.
How to run line 4?
This problem can be formulated as follows. Let be the search problem that asks for the index of the first (running from left to right) ’1’ in or if . Even though a simple adversary argument shows that one cannot perfectly compute by making parity queries, a folklore result [24, 44, 30], proves that there is a randomised protocol making queries that computes with some small error.
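One simple (though not query-optimal) way to realise such a subroutine with parity queries is a binary search in which each "is this prefix all-zero?" test is implemented by a few random-subset parity queries: a non-zero prefix answers a uniformly random subset parity with 1 with probability 1/2, so repeating the test drives the error down geometrically. The Python sketch below is our illustration of this idea only; it does not achieve the exact query bound of Lemma 30.

```python
import random

def parity(q, x):
    return sum(qi & xi for qi, xi in zip(q, x)) % 2

def prefix_is_nonzero(x, hi, reps):
    """Test whether x[:hi] contains a 1 using `reps` random-subset parity queries.
    Never errs on an all-zero prefix; misses a non-zero prefix w.p. 2^-reps."""
    for _ in range(reps):
        q = [random.randint(0, 1) if i < hi else 0 for i in range(len(x))]
        if parity(q, x):
            return True
    return False

def find_first_one(x, reps=20):
    """Binary search for the position of the first 1 in x (None if x is all-zero)."""
    if not prefix_is_nonzero(x, len(x), reps):
        return None
    lo, hi = 0, len(x)            # invariant: x[:lo] is all-zero, x[:hi] is non-zero
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_is_nonzero(x, mid, reps):
            hi = mid
        else:
            lo = mid
    return lo                      # x[lo] is the first 1 (with high probability)

random.seed(0)
assert find_first_one([0, 0, 1, 0, 1, 0]) == 2
assert find_first_one([0, 0, 0, 0]) is None
```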
Lemma 30.
For any , .
Proof.
This folklore fact is discussed for the parity context in Section A.4. We let be the parity tree obtained by running Algorithm 5 with error parameter on line 4. Given two indices , we say if appears strictly earlier than in , and if or . Fix any . Let denote the first index in such that and suppose that is added to when . Observe that if such exists, for all . As a consequence, we know that must be reached. Moreover, we can immediately get the values of by flipping biased coins for all . Therefore, given , one can perfectly simulate Algorithm 3 by going over and updating , until finding the first index such that . We are now ready to prove the correctness and efficiency of .
Lemma 31.
For any fixed , .
Proof.
The randomness of stems from and the randomness involved in running the FFO algorithm at line 4. To analyse the latter, observe that line 4 is called at most times and each call fails with probability at most , hence:
If no call fails the discussion above implies that Algorithm 5 behaves identically to the earlier Algorithm 3. Hence, correctness of the former (Lemma 27) implies .
Lemma 32.
We have .
Proof.
We first prove that the expected number of iterations of the outer while-loop is low assuming that the algorithm always gets the correct index at line 4. Similar to what we did in Section 6.1, we view the randomness used in the for-loop (line 6 to 15) in Algorithm 5 as a pre-generated partial assignment . Note that the bits of are independent. If is the first index in with , we know that and for all . At the same time, for all are revealed to the algorithm one by one. As soon as some is found, the algorithm quits the loop.
For each and , consider running on input using randomness . Define as the number of iterations of the outer while loop when always gets the correct on line 4. Let denote the final state of . Since in each iteration except for the last one, we update some as , we have . By Lemma 26, we further have
where is the unique leaf at which terminates given . Since for all , , we have , hence . On the other hand, by definition we have
Lemma 30 asserts that line line 4 can be implemented to error using parity queries. Since all those calls are completed successfully with probability , we finally have:
See 19
Proof.
7 Separations I: disc vs. prod
Proof.
For the first item, we can consider the -bit majority function . It follows from [14, Theorem 1.2] that where the hard distribution is uniform. By contrast, it is not hard to see that (if we query for a random , it will have bias toward predicting ). We prove the second item by a probabilistic argument. Consider a random function , which is set with independently for each . In Claim 33, we show that and in Claim 34 that with high probability.
Claim 33.
With probability , .
Proof.
For each non-constant function , we define the “hard” distribution as
To prove the claim, it suffices to show . Using Lemma 9, this can be further simplified to prove:
To that end, fix any , note that and observe that by a Chernoff bound,
Using a similar argument, we can also show . By definition, , we thus have . Finally, observe that and so using a union bound,
Claim 34.
With probability , .
Proof.
Let denote the set of -biased product distributions, where . As observed in Section 4, it suffices to show .
As a first attempt, one might want to prove that with sufficiently high probability for any fixed and then apply union bound over all . However, this cannot be done directly since is infinite. Luckily, we can circumvent this barrier by discretizing . Let us define . For every and , consider the following two cases:
-
If , then . Observe that
thus .
-
Otherwise, we devise the following protocol: Sort . Pick the top inputs , then we check if our input is in . If yes, we output , otherwise we output . Formally, we define the function where
Since testing whether can be done with queries with success probability , by choosing and running the testing for every , one can show . It remains to prove that with high probability. Observe that for each , . Therefore:
For those , we have , which implies that .
By union bound over , we can deduce that
Consider now any product distribution , define its rounded version :
Observe that , thus we have for any parity tree and . Together with the string of inequalities developed above, we conclude that with probability at least ,
8 Separations II: S vs. $\mathsf{D}_\mu$
The goal of this section is to provide the following example of a function.
Theorem 35.
There exists a function and a product distribution such that and .
Recall that by Theorem 19, this is the largest possible gap between S and . To prove the separation, we use the function which takes two inputs and returns the value associated with the location of the first “” in . More precisely, we let be the location (from left to right) of the first “” in and if and let . We choose as hard distribution the product distribution where for each :
Let us note that the choice of in the distribution is arbitrary: any for constant is enough to guarantee that with high probability and get the lower bound.
Proof of Theorem 35.
We first prove that . Consider the following simple brute-force query algorithm that computes $f$: Query the bits of $x$ one by one from left to right, until finding the first index $i$ such that $x_i = 1$. Then query $y_i$ and return it if such an index exists. Otherwise (if no such index exists), simply return the default value.
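For concreteness, here is a sketch (ours, not the paper's code) of the brute-force algorithm just described, counting the number of queries it makes; the default value returned on an all-zero $x$ is our placeholder. Under the hard product distribution the first "1" in $x$ appears early in expectation, so the expected query count stays small.

```python
def brute_force(x, y, default=0):
    """Query x_1, x_2, ... until the first 1 is found, then query y at that index.
    Returns (output, number_of_queries)."""
    queries = 0
    for i, xi in enumerate(x):
        queries += 1                 # one (single-bit parity) query to x_i
        if xi == 1:
            queries += 1             # one query to y_i
            return y[i], queries
    return default, queries          # x is all-zero: return the (assumed) default value

out, cost = brute_force([0, 0, 1, 0], [1, 0, 1, 1])
assert out == 1 and cost == 4        # three queries to x, one to y
```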
Observe that . Thus we only need to show . Let denote the set of for which . Note that forms a partition of . By the definition of , we have . For all , queries the same set of variables on . Moreover, sample and for each , since , we have that . Therefore,
We conclude that
Let us now turn our attention to . The lower-bound is covered in Claim 36. The upper bound is a direct consequence of for . More interestingly, one can actually show the stronger statement . Indeed, has exactly one “” in the first bits with probability for large enough. In that case, a simple binary search amongst the first bits of using parity queries is enough to find that location and return the corresponding bit of .
Claim 36.
Proof.
Using the characterisation of the bias with codimension-1 subspace Lemma 9, it is enough to show:
Fix an affine space of codimension 1 that maximize the above expression, i.e. some and such that . To simplify notation, we assume in what follows that but the proof is similar for the case . Let us partition in two sets:
We have:
Let us suppose without loss of generality that so that it is enough to show . Note that if , we’re done. If not, we can re-express the bias in the language of probability:
Let us denote the quantity within the absolute value by and the event by . Observe that can be conveniently decomposed as where and . With this, we have:
Recall that is a codimension-1 space and is the uniform distribution over . Thus, if (the number of non-zero entries in ) is zero or , it must be that for all and . In that case, the claim is proven because and so . We can thus assume that and fix to be the unique coordinate such that . Now, observe that for all and , and so that:
Finally, we use the fact that the event with is unlikely to happen if has large mass under . In any case, the probability is maximized for and hence:
The event can happen because or , thus we bound the bias with
References
- [1] Yaroslav Alekseev, Yuval Filmus, and Alexander Smal. Lifting Dichotomies. In 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1–9:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CCC.2024.9.
- [2] Yaroslav Alekseev and Dmitry Itsykson. Lifting to bounded-depth and regular resolutions over parities via games. Technical Report TR24-128, ECCC, 2024. URL: https://eccc.weizmann.ac.il/report/2024/128/.
- [3] Laszlo Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory. In 27th Annual Symposium on Foundations of Computer Science (sfcs 1986), pages 337–347, 1986. doi:10.1109/SFCS.1986.15.
- [4] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao. How to compress interactive communication. SIAM Journal on Computing, 42(3):1327–1363, 2013. doi:10.1137/100811969.
- [5] Paul Beame and Sajin Koroth. On Disperser/Lifting Properties of the Index and Inner-Product Functions. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ITCS.2023.14.
- [6] Shalev Ben-David and Eric Blais. A tight composition theorem for the randomized query complexity of partial functions: Extended abstract. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 240–246, 2020. doi:10.1109/FOCS46700.2020.00031.
- [7] Shalev Ben-David and Eric Blais. A new minimax theorem for randomized algorithms. J. ACM, 70(6), 2023. doi:10.1145/3626514.
- [8] Shalev Ben-David, Eric Blais, Mika Göös, and Gilbert Maystre. Randomised Composition and Small-Bias Minimax . In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 624–635. IEEE Computer Society, 2022. doi:10.1109/FOCS54457.2022.00065.
- [9] Shalev Ben-David, Mika Göös, Robin Kothari, and Thomas Watson. When Is Amplification Necessary for Composition in Randomized Query Complexity? In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020), volume 176 of Leibniz International Proceedings in Informatics (LIPIcs), pages 28:1–28:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.APPROX/RANDOM.2020.28.
- [10] Shalev Ben-David and Robin Kothari. Randomized query complexity of sabotaged and composed functions. Theory of Computing, 14(5):1–27, 2018. doi:10.4086/toc.2018.v014a005.
- [11] Sreejata Kishor Bhattacharya, Arkadev Chattopadhyay, and Pavel Dvořák. Exponential Separation Between Powers of Regular and General Resolution over Parities. In 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:32. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CCC.2024.23.
- [12] Eric Blais and Joshua Brody. Optimal separation and strong direct sum for randomized query complexity. In Proceedings of the 34th Computational Complexity Conference, CCC ’19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.CCC.2019.29.
- [13] Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan. A Strong Direct Sum Theorem for Distributional Query Complexity. In 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:30. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CCC.2024.16.
- [14] Mark Braverman, Ankit Garg, Denis Pankratov, and Omri Weinstein. Information lower bounds via self-reducibility. Theory of Computing Systems, 59(2):377–396, 2015. doi:10.1007/s00224-015-9655-z.
- [15] Mark Braverman and Anup Rao. Information equals amortized communication. IEEE Transactions on Information Theory, 60(10):6058–6069, 2014. doi:10.1109/TIT.2014.2347282.
- [16] Joshua Brody, Jae Tak Kim, Peem Lerdputtipongporn, and Hariharan Srinivasulu. A strong XOR lemma for randomized query complexity. Theory of Computing, 19(11):1–14, 2023. doi:10.4086/toc.2023.v019a011.
- [17] Farzan Byramji and Russell Impagliazzo. Lifting to randomized parity decision trees. Technical Report TR24-202, ECCC, 2024. URL: https://eccc.weizmann.ac.il/report/2024/202/.
- [18] Arkadev Chattopadhyay and Pavel Dvorak. Super-critical trade-offs in resolution over parities via lifting. Technical Report TR24-132, ECCC, 2024. URL: https://eccc.weizmann.ac.il/report/2024/132/.
- [19] Arkadev Chattopadhyay, Nikhil Mande, Swagato Sanyal, and Suhail Sherif. Lifting to Parity Decision Trees via Stifling. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 33:1–33:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ITCS.2023.33.
- [20] Tsun Ming Cheung, Hamed Hatami, Rosie Zhao, and Itai Zilberstein. Boolean functions with small approximate spectral norm. Discrete Analysis, 2024. doi:10.19086/da.122971.
- [21] Andrew Drucker. Improved direct product theorems for randomized query complexity. Comput. Complex., 21(2):197–244, 2012. doi:10.1007/s00037-012-0043-7.
- [22] Klim Efremenko, Michal Garlík, and Dmitry Itsykson. Lower bounds for regular resolution over parities. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, volume 41 of STOC ’24, pages 640–651. ACM, 2024. doi:10.1145/3618260.3649652.
- [23] Tomás Feder, Eyal Kushilevitz, Moni Naor, and Noam Nisan. Amortized communication complexity. SIAM Journal on Computing, 24(4):736–750, 1995. doi:10.1137/S0097539792235864.
- [24] U. Feige, D. Peleg, P. Raghavan, and E. Upfal. Computing with unreliable information. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, STOC ’90, pages 128–137. Association for Computing Machinery, 1990. doi:10.1145/100216.100230.
- [25] Yuval Filmus, Edward Hirsch, Artur Riazanov, Alexander Smal, and Marc Vinyals. Proving Unsatisfiability with Hitting Formulas. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), volume 287 of Leibniz International Proceedings in Informatics (LIPIcs), pages 48:1–48:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ITCS.2024.48.
- [26] Anat Ganor, Gillat Kol, and Ran Raz. Exponential separation of information and communication for boolean functions. J. ACM, 63(5), 2016. doi:10.1145/2907939.
- [27] Uma Girish, Makrand Sinha, Avishay Tal, and Kewen Wu. Fourier growth of communication protocols for XOR functions. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 721–732, 2023. doi:10.1109/FOCS57990.2023.00047.
- [28] Uma Girish, Avishay Tal, and Kewen Wu. Fourier growth of parity decision trees. In Proceedings of the 36th Computational Complexity Conference, CCC ’21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CCC.2021.39.
- [29] Mika Göös and Gilbert Maystre. A majority lemma for randomised query complexity. In Proceedings of the 36th Computational Complexity Conference, CCC ’21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CCC.2021.18.
- [30] Nathaniel Harms and Artur Riazanov. Better Boosting of Communication Oracles, or Not. In Siddharth Barman and Sławomir Lasota, editors, 44th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2024), volume 323 of Leibniz International Proceedings in Informatics (LIPIcs). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.FSTTCS.2024.25.
- [31] Hamed Hatami, Kaave Hosseini, and Shachar Lovett. Structure of protocols for XOR functions. SIAM Journal on Computing, 47(1):208–217, 2018. doi:10.1137/17M1136869.
- [32] Hamed Hatami, Kaave Hosseini, Shachar Lovett, and Anthony Ostuni. Refuting Approaches to the Log-Rank Conjecture for XOR Functions. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), volume 297 of Leibniz International Proceedings in Informatics (LIPIcs), pages 82:1–82:11. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ICALP.2024.82.
- [33] Dmitry Itsykson and Dmitry Sokolov. Resolution over linear equations modulo two. Annals of Pure and Applied Logic, 171(1):102722, 2020. doi:10.1016/j.apal.2019.102722.
- [34] Siddharth Iyer and Anup Rao. An XOR lemma for deterministic communication complexity, 2024. doi:10.48550/arXiv.2407.01802.
- [35] Siddharth Iyer and Anup Rao. XOR lemmas for communication via marginal information. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 652–658. Association for Computing Machinery, 2024. doi:10.1145/3618260.3649726.
- [36] Rahul Jain, Hartmut Klauck, and Miklos Santha. Optimal direct sum results for deterministic and randomized decision tree complexity. Information Processing Letters, 110(20):893–897, 2010. doi:10.1016/j.ipl.2010.07.020.
- [37] Rahul Jain, Jaikumar Radhakrishnan, and Pranab Sen. A direct sum theorem in communication complexity via message compression. In Jos C. M. Baeten, Jan Karel Lenstra, Joachim Parrow, and Gerhard J. Woeginger, editors, Automata, Languages and Programming, pages 300–315. Springer Berlin Heidelberg, 2003. doi:10.1007/3-540-45061-0_26.
- [38] Hartmut Klauck, Robert Špalek, and Ronald de Wolf. Quantum and classical strong direct product theorems and optimal time-space tradeoffs. SIAM Journal on Computing, 36(5):1472–1493, 2007. doi:10.1137/05063235X.
- [39] A. Knop, S. Lovett, S. McGuire, and W. Yuan. Guest column: Models of computation between decision trees and communication. SIGACT News, 52(2):46–70, 2021. doi:10.1145/3471469.3471479.
- [40] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993. doi:10.1137/0222080.
- [41] Troy Lee, Adi Shraibman, and Robert Špalek. A direct product theorem for discrepancy. In 2008 23rd Annual IEEE Conference on Computational Complexity, pages 71–80, 2008. doi:10.1109/CCC.2008.25.
- [42] Nati Linial and Adi Shraibman. Learning complexity vs. communication complexity. In 2008 23rd Annual IEEE Conference on Computational Complexity, pages 53–63, 2008. doi:10.1109/CCC.2008.28.
- [43] Nikhil Mande and Swagato Sanyal. On parity decision trees for fourier-sparse boolean functions. ACM Trans. Comput. Theory, 16(2), 2024. doi:10.1145/3647629.
- [44] Noam Nisan. The communication complexity of threshold gates. Proc. of Combinatorics, Paul Erdős is Eighty, 1993.
- [45] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014. doi:10.1017/CBO9781139814782.
- [46] Ryan O'Donnell, John Wright, Yu Zhao, Xiaorui Sun, and Li-Yang Tan. A composition theorem for parity kill number. In 2014 IEEE Conference on Computational Complexity (CCC), pages 144–154. IEEE Computer Society, 2014. doi:10.1109/CCC.2014.22.
- [47] Anup Rao and Makrand Sinha. Simplified separation of information and communication. Theory of Computing, 14(20):1–29, 2018. doi:10.4086/toc.2018.v014a020.
- [48] Swagato Sanyal. Fourier sparsity and dimension. Theory of Computing, 15(11):1–13, 2019. doi:10.4086/toc.2019.v015a011.
- [49] Swagato Sanyal. Randomized query composition and product distributions. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS), volume 289 of LIPIcs, pages 56:1–56:19. Schloss Dagstuhl, 2024. doi:10.4230/LIPIcs.STACS.2024.56.
- [50] Petr Savický. On determinism versus unambiguous nondeterminism for decision trees. Technical Report TR02-009, Electronic Colloquium on Computational Complexity (ECCC), 2002. URL: http://eccc.hpi-web.de/report/2002/009/.
- [51] Ronen Shaltiel. Towards proving strong direct product theorems. computational complexity, 12(1):1–22, 2003. doi:10.1007/s00037-003-0175-x.
- [52] Alexander Shekhovtsov and Vladimir Podolskii. Randomized lifting to semi-structured communication complexity via linear diversity. In 16th Innovations in Theoretical Computer Science Conference (ITCS), LIPIcs, pages 78:1–78:21. Schloss Dagstuhl, 2025. doi:10.4230/LIPIcs.ITCS.2025.78.
- [53] Amir Shpilka, Avishay Tal, and Ben Volk. On the structure of Boolean functions with small spectral norm. computational complexity, 26(1):229–273, 2017. doi:10.1007/s00037-015-0110-y.
- [54] Hing Yin Tsang, Chung Hoi Wong, Ning Xie, and Shengyu Zhang. Fourier sparsity, spectral norm, and the log-rank conjecture. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 658–667, 2013. doi:10.1109/FOCS.2013.76.
- [55] Andrew Yao. Probabilistic computations: Toward a unified measure of complexity. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science, SFCS ’77, pages 222–227. IEEE Computer Society, 1977. doi:10.1109/SFCS.1977.24.
- [56] Andrew Yao. Lower bounds by probabilistic arguments. In Proceedings of the 24th Annual Symposium on Foundations of Computer Science, SFCS ’83, pages 420–428. IEEE Computer Society, 1983. doi:10.1109/SFCS.1983.30.
- [57] Zhiqiang Zhang and Yaoyun Shi. On the parity complexity measures of boolean functions. Theoretical Computer Science, 411(26-28):2612–2618, 2010. doi:10.1016/j.tcs.2010.03.027.
Appendix A
A.1 Direct sums for D
In this appendix, we prove that the best-known direct sum results for deterministic communication complexity can also be obtained in the parity decision tree setting. We restate our theorem below for convenience. See Theorem 4. Let us first introduce a couple of definitions. Fix a function . A parity certificate for is an affine space such that and for any , . Similarly to the classical case, the parity certificate complexity is the smallest codimension of an affine space that certifies the value, where the maximum is taken over the hardest possible input. We also write for the number of non-zero Fourier coefficients of . To prove Theorem 4, it suffices to prove a direct sum for parity certificate complexity and to combine it with the following two results:
Lemma 37.
For any and , .
Proof of Lemma 37.
Fix an input attaining and suppose, towards a contradiction, that . This implies in particular that there exists an affine space described by equations (where ) that certifies the value of the input composed of copies of . Define for with:
Observe that and as such there must be some with . Fix for simplicity . Using Gaussian elimination, one can re-express where
1. the constraints in are exclusively on bits of the first copy, and
2. any constraint in has at least one bit of a copy other than the first.
Since is about the first copy only, it can be identified with a single-copy affine space where in a natural way. Observe that as . Because the codimension of is strictly less than , there must be some with . Note that fixing leaves the system of linear constraints feasible and as such there exists such that : a contradiction since .
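For reference, the notions used in this subsection can be written out as follows. The symbols pC, spar and the displayed form of the direct sum are our rendering of the quantities described in words above, not necessarily the paper's original notation.

```latex
% Hedged rendering (our notation) of the definitions above.
% pC(f,x): smallest codimension of an affine certificate for f at x,
% for f : F_2^m -> {0,1}; f^n denotes n independent copies of f.
\[
  \mathrm{pC}(f,x) \;=\; \min\bigl\{ \operatorname{codim}(H) \;:\;
    H \subseteq \mathbb{F}_2^{m} \text{ affine},\ x \in H,\ f \text{ is constant on } H \bigr\},
  \qquad
  \mathrm{pC}(f) \;=\; \max_{x} \, \mathrm{pC}(f,x).
\]
\[
  \mathrm{spar}(f) \;=\; \bigl|\,\{\, S \subseteq [m] \;:\; \widehat{f}(S) \neq 0 \,\}\,\bigr| ,
  \qquad\text{and the direct sum proved above reads}\qquad
  \mathrm{pC}(f^{\,n}) \;\ge\; n \cdot \mathrm{pC}(f).
\]
```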
A.2 Omitted case of Theorem 2
Lemma 38.
If , we have .
Proof.
Fix a hard product distribution for . If , the claim follows trivially. Otherwise, we have and, using Claim 39 with , it must be that . Using Claim 17 and Theorem 18, we thus have:
Claim 39.
For any , product distribution and , we have .
Proof.
Fix a deterministic decision tree and consider the zero-query decision tree obtained by applying Algorithm 7 to . To relate and , we go through Algorithm 5. Again, let be the tree obtained by applying Algorithm 5 to with error zero on line 4. We stress that is a randomised decision tree depending on . On the other hand, can be seen as with fewer instructions executed. Using Lemma 31, we have:
Now, let be a randomised parity tree such that and . Let be the randomised parity tree obtained as follows:
1. Sample .
2. Return the tree obtained by applying Algorithm 7 to the sampled tree.
With the analysis above, we obtain . We remark that makes no queries and has the following error probability:
A.3 Direct sum for distribution-free discrepancy
Theorem 40.
For every function and ,
Proof.
The lower bound is a simple consequence of Lemma 8 by fixing to be a distribution such that and observing that . The other direction is more interesting, as it says that the hardest distribution for is essentially a product of copies of the hardest distribution for a single copy . Let where ranges over all distributions. Using Lemma 9, we obtain the following relation between and :
Therefore, to prove the upper bound, it is enough to show a perfect direct product for and apply it times. To this end, fix some other function and let us show that
where we recall that . We can write as the value of the following linear program, whose variables describe a distribution :
min. …   (5)
s.t. …
The dual of (5) is:
max. …   (6)
s.t. …
Let and be the optimal feasible solutions to with respect to and . By the strong duality theorem, it holds that and . We now extract a feasible solution for (6) with respect to the function . Let be defined with and observe that is a feasible solution for the dual of . By applying the strong duality theorem again, we have , as desired.
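For orientation, and following the definition of discrepancy from the introduction, the quantity optimised above can be set up as follows. This is a sketch in our own notation; the precise programs (5) and (6) may be formulated slightly differently.

```latex
% Hedged sketch (our notation): disc(f) as the value of a linear program
% whose variables mu(x) describe a probability distribution over inputs.
\[
  \mathrm{disc}_{\mu}(f) \;=\; \max_{H \text{ affine}}
    \Bigl|\, \Pr_{x\sim\mu}\bigl[x \in H,\ f(x)=0\bigr]
           - \Pr_{x\sim\mu}\bigl[x \in H,\ f(x)=1\bigr] \Bigr| ,
  \qquad
  \mathrm{disc}(f) \;=\; \min_{\mu} \, \mathrm{disc}_{\mu}(f).
\]
\begin{align*}
  \text{(primal, cf.\ (5))}\qquad
  \min\;& t \\
  \text{s.t.}\;& \textstyle\sum_{x} \mu(x) = 1, \qquad \mu(x) \ge 0 \ \text{ for all } x, \\
  & \Bigl|\,\textstyle\sum_{x \in H} \mu(x)\,(-1)^{f(x)}\Bigr| \;\le\; t
    \qquad \text{for every affine subspace } H .
\end{align*}
```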
A.4 Some facts about parity decision trees
Yao’s minimax principle is a powerful technique for analysing randomised algorithms. We adapt the statement here to parity trees; the proof is exactly the same as the original one [55].
Lemma 41.
For any and distribution over , .
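In our notation, writing D^μ_ε(f) for the least depth of a deterministic parity tree that errs with probability at most ε on an input drawn from μ (this rendering is our assumption), the direction of the minimax principle stated above reads:

```latex
% Easy direction of Yao's minimax principle for parity decision trees (our notation).
% Any randomised tree with worst-case error eps also has error at most eps on
% average over x ~ mu; fixing its best random seed gives a deterministic tree
% of the same depth with error at most eps under mu.
\[
  \mathrm{R}_{\varepsilon}(f) \;\ge\; \mathrm{D}^{\mu}_{\varepsilon}(f)
  \qquad\text{for every input distribution } \mu .
\]
```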
The following is a folklore fact relating randomised parity tree complexity and discrepancy [56, 3], which we re-prove in the parity context.
Lemma 42.
for any .
Proof.
Fix a parity decision tree of depth that makes error , and note that
As , there exists some – an affine subspace – with large correlation:
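To make the counting explicit, the standard calculation behind this folklore bound runs as follows in our notation, assuming the fixed tree T is deterministic, has depth d, outputs the bit b_ℓ at leaf ℓ, and errs with probability at most ε under μ; each leaf fixes at most d parity answers and is therefore an affine subspace.

```latex
% Folklore discrepancy lower bound for parity trees (hedged sketch, our notation).
\begin{align*}
  1 - 2\varepsilon
  \;\le\; \Pr_{x\sim\mu}\bigl[T(x)=f(x)\bigr] - \Pr_{x\sim\mu}\bigl[T(x)\neq f(x)\bigr]
  &\;=\; \sum_{\text{leaves } \ell}
     \Bigl( \mu\bigl(\ell \cap f^{-1}(b_\ell)\bigr) - \mu\bigl(\ell \cap f^{-1}(1-b_\ell)\bigr) \Bigr) \\
  &\;\le\; \sum_{\text{leaves } \ell} \mathrm{disc}_{\mu}(f)
   \;\le\; 2^{d}\,\mathrm{disc}_{\mu}(f).
\end{align*}
% Rearranging: d >= log2( (1 - 2*eps) / disc_mu(f) ); in particular some leaf
% (an affine subspace) has bias at least (1 - 2*eps) / 2^d under mu.
```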
See 30
Proof.
Let be the function that checks whether the input is and rejects otherwise. Observe that one iteration of the sumcheck protocol can be performed in one parity query. More precisely, for any , if , then:
Performing two random checks independently shows that . It is a folklore result that a (classical) randomised decision tree can solve with probability using oracle NOR-queries even if the oracle fails with probability [24, 44]. We highlight that this is an improvement over the naive method that boosts the noisy NOR queries and yields complexity . Recent work [30, §3] revisits this trick in depth for communication complexity, and its result can be re-interpreted in the context of parity decision trees as follows:
Thus, plugging in and noting that (with binary search), we get the desired result. See Lemma 11.
Proof.
Let be a randomised PDT such that and . To prove the lemma, it suffices to construct a deterministic parity tree of depth with . Sample . We construct a new tree by pruning as follows: we remove all the nodes of of depth greater than . If any node of depth becomes a leaf, we label it with an arbitrary bit. Note that has depth . Finally, let denote the distribution over inherited from .
We observe that for each , both and happen only if . Moreover, by Markov’s inequality,
Therefore, . By an averaging argument, there exists some of depth that computes with error , as desired.
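Spelled out in our notation, and under our assumption that the guarantee on the randomised tree is a bound D on its expected number of queries and that the pruning threshold is D/δ, the two estimates above combine as follows.

```latex
% Pruning argument (hedged sketch, our notation): sample T from the randomised tree,
% and let T' be T with every node at depth greater than D/delta turned into a leaf.
\[
  \Pr_{T,\; x\sim\mu}\bigl[T'(x) \neq f(x)\bigr]
  \;\le\; \Pr_{T,\; x\sim\mu}\bigl[T(x) \neq f(x)\bigr]
        \;+\; \Pr_{T,\; x\sim\mu}\bigl[\mathrm{cost}(T,x) > D/\delta\bigr]
  \;\le\; \varepsilon + \delta ,
\]
% where cost(T,x) is the number of queries T makes on x. Markov's inequality gives
% Pr[cost > D/delta] <= delta from E[cost] <= D, and averaging over the randomness of T
% yields a fixed tree of depth at most D/delta and error at most eps + delta under mu.
```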
A.5 Omitted proofs of Section 6
In this appendix, we prove Claim 29, an alternative description of the distributions of Section 6.1. Let . We write if and are consistent over the entries on which both are set. That is, for all , if and , then . Claim 29 follows from Claims 43 and 44.
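Writing ⋆ for an entry that has not yet been set (a notational choice of ours), the consistency relation just described is:

```latex
% Consistency of partially fixed strings (our notation: star marks an unset entry).
\[
  x \sim y
  \quad\Longleftrightarrow\quad
  \forall\, i :\ \bigl( x_i \neq \star \ \wedge\ y_i \neq \star \bigr)
    \;\Longrightarrow\; x_i = y_i ,
  \qquad
  x, y \in \bigl( \{0,1\} \cup \{\star\} \bigr)^{m} .
\]
```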
Claim 43.
For every reachable state , consistent and , .
Proof.
Upon inspection of , it is enough to prove that for all :
Fix . We prove this by induction on the state space consistent with . The entry-point of the state space is . In this case, the statement holds by definition. Suppose now that the statement is true for state . Depending on the value of , there are several possible next states. Observe, however, that the next vertex of to be visited does not depend on , as it is fixed to be . For any fixed , we have:
Note that there can be only one state from which can be reached, namely . Indeed, suppose that there is another state from which can be reached. Then and have a common ancestor . Since the paths diverged after , it must be that and thus : a contradiction. Thus, we have the following equivalence:
Therefore, we have:
… (7)
We can now use the inductive hypothesis on . Since implies , the numerator of (7) simplifies to:
Let and observe that the denominator of (7) is equal to:
Claim 44.
For every reachable state and , .
Proof.
Fix some and . Upon inspection of , it is enough to prove that
where is an indicator set to 1 if and only if for all , implies , and for all . By Bayes' rule we have:
where
To analyse , we have:
On the other hand, the second component of is clearly zero if . For instance, cannot be reached if does not satisfy all equations on the path to . Thus, we have:
Combining those two observations, we get:
Observe that the last two products do not involve at all and can thus be cancelled in the initial expression:
Finally, observe that fixes the value of all the bits of except for . Thus, the summation in the denominator equals 1 and the claim follows.